In C#, the volatile keyword ensures that reads and writes have acquire and release semantics, respectively. However, does it say anything about introduced reads or writes?
For instance:
volatile Thing something;
volatile int aNumber;
void Method()
{
// Are these lines...
var local = something;
if (local != null)
local.DoThings();
// ...guaranteed not to be transformed into these by compiler, jitter or processor?
if (something != null)
something.DoThings(); // <-- Second read!
// Are these lines...
if (aNumber == 0)
aNumber = 1;
// ...guaranteed not to be transformed into these by compiler, jitter or processor?
var temp = aNumber;
if (temp == 0)
temp = 1;
aNumber = temp; // <-- An out-of-thin-air write!
}
Here's what the C# spec¹ has to say about Execution Order:
Execution of a C# program proceeds such that the side effects of each executing thread are preserved at critical execution points. A side effect is defined as a read or write of a volatile field ...
The execution environment is free to change the order of execution of a C# program, subject to the following constraints:
...
The ordering of side effects is preserved with respect to volatile reads and writes ...
I would certainly consider introducing new side effects to be changing the order of side effects, but it's not explicitly stated like that here.
¹ The link in the answer is to the C# 6 spec, which is listed as a draft. The C# 5 spec isn't a draft, but it is not available online, only as a download. The wording in this section is identical, as far as I can see.
This wording from the C# spec:
The ordering of side effects is preserved with respect to volatile
reads and writes...
may be interpreted as implying that read and write introductions on volatile variables are not allowed, but it is really ambiguous and it depends on the meaning of "ordering." If it is referring to relative ordering of existing accesses, then introducing new reads or writes does not change that and so it would not violate this part of the spec. If it is referring to the exact position of all memory accesses in program order, then introducing new accesses would violate the spec.
This article says that reads of non-volatile variables might be introduced, but it does not explicitly say whether this is disallowed for volatile variables.
This Q/A discusses how to prevent read introduction (but there is no discussion of write introduction).
In the comments under this article, two Microsoft employees (at the least at the time the comments were written) explicitly state that read and write introductions on volatile variables are not allowed.
Stephen Toub
"read introduction" is one mechanism by which a memory reordering
might be introduced.
Igor Ostrovsky
Elsewhere in the C# specification, a volatile read is defined to be a
"side effect". As a result, repeating the read of m_paused would be
equivalent to adding another side effect, which is not allowed.
I think we can conclude from these comments that introducing an out-of-thin-air side effect in C# (any kind of side effect, anywhere in the code) is not allowed.
A related quote from the CLI standard states the following in Section I.12.6.7:
An optimizing compiler that converts CIL to native code shall not
remove any volatile operation, nor shall it coalesce multiple volatile
operations into a single operation.
As far as I know, the CLI does not explicitly talk about introducing new side effects.
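Regardless of how the spec is read, you can sidestep the question defensively. Here is a minimal sketch using Volatile.Read (it assumes something is declared as an ordinary, non-volatile field, since passing a volatile field by ref produces warning CS0420):

Thing local = Volatile.Read(ref something); // one explicit acquire read, snapshotted into a local
if (local != null)
    local.DoThings(); // operates on the snapshot; the local cannot be replaced by a re-read of the field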
I wonder whether you have misunderstood what volatile means. Volatile can be used with types that can be read or written as an atomic action.
There is no acquire/release of a lock, only a barrier to compile-time and run-time reordering to provide lockless acquire/release semantics (https://preshing.com/20120913/acquire-and-release-semantics/). On non-x86 this may require barrier instructions in the asm, but not taking a lock.
volatile indicates that a field may be modified by other threads, which is why reads/writes need to be treated as atomic and not optimised.
Your question is a little ambiguous.
1/ If you mean, will the compiler transform:
var local = something;
if (local != null) local.DoThings();
into:
if (something != null) something.DoThings();
then the answer is no.
2/ If you mean, will "DoThings()" be called twice on the same object:
var local = something;
if (local != null) local.DoThings();
if (something != null) something.DoThings();
then the answer is mostly yes, unless another thread has changed the value of "something" before the second "DoThings()" is invoked. In that case it could give you a runtime error: if, after the "if" condition is evaluated and before "DoThings" is called, another thread sets "something" to null, then you will get a runtime error. I assume this is why you have your "var local = something;".
3/ If you mean will the following cause two reads:
if (something != null) something.DoThings();
then yes, one read for the condition and a second read when it invokes DoThings() (assuming that something is not null). Were it not marked volatile then the compiler might manage that with a single read.
In any event the implementation of the function "DoThings()" needs to be aware that it could be called by multiple threads, so would need to consider incorporating a combination of locks and its own volatile members.
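For illustration only, "aware" might look something like the following sketch (the Thing type and its state are hypothetical):

class Thing
{
    private readonly object _gate = new object();
    private int _count;

    public void DoThings()
    {
        // Serialize concurrent callers so the internal state stays consistent.
        lock (_gate)
        {
            _count++;
        }
    }
}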
Related
In C#, this is the standard code for invoking an event in a thread-safe way:
var handler = SomethingHappened;
if(handler != null)
handler(this, e);
Where, potentially on another thread, the compiler-generated add method uses Delegate.Combine to create a new multicast delegate instance, which it then sets on the compiler-generated field (using interlocked compare-exchange).
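Roughly, that compiler-generated accessor corresponds to this hand-written sketch (the backing field name is illustrative; the actual generated code differs in detail):

private EventHandler _somethingHappened;

public event EventHandler SomethingHappened
{
    add
    {
        EventHandler original, updated;
        do
        {
            original = _somethingHappened;
            updated = (EventHandler)Delegate.Combine(original, value);
        }
        while (Interlocked.CompareExchange(ref _somethingHappened, updated, original) != original);
    }
    remove
    {
        EventHandler original, updated;
        do
        {
            original = _somethingHappened;
            updated = (EventHandler)Delegate.Remove(original, value);
        }
        while (Interlocked.CompareExchange(ref _somethingHappened, updated, original) != original);
    }
}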
(Note: for the purposes of this question, we don't care about code that runs in the event subscribers. Assume that it's thread-safe and robust in the face of removal.)
In my own code, I want to do something similar, along these lines:
var localFoo = this.memberFoo;
if(localFoo != null)
localFoo.Bar(localFoo.baz);
Where this.memberFoo could be set by another thread. (It's just one thread, so I don't think it needs to be interlocked - but maybe there's a side-effect here?)
(And, obviously, assume that Foo is "immutable enough" that we're not actively modifying it while it is in use on this thread.)
Now I understand the obvious reason that this is thread-safe: reads from reference fields are atomic. Copying to a local ensures we don't get two different values. (Apparently only guaranteed from .NET 2.0, but I assume it's safe in any sane .NET implementation?)
But what I don't understand is: What about the memory occupied by the object instance that is being referenced? Particularly in regards to cache coherency? If a "writer" thread does this on one CPU:
thing.memberFoo = new Foo(1234);
What guarantees that the memory where the new Foo is allocated doesn't happen to be in the cache of the CPU the "reader" is running on, with uninitialized values? What ensures that localFoo.baz (above) doesn't read garbage? (And how well guaranteed is this across platforms? On Mono? On ARM?)
And what if the newly created foo happens to come from a pool?
thing.memberFoo = FooPool.Get().Reset(1234);
This seems no different, from a memory perspective, to a fresh allocation - but maybe the .NET allocator does some magic to make the first case work?
My thinking, in asking this, is that a memory barrier would be required to ensure - not so much that memory accesses cannot be moved around, given the read is dependent - but as a signal to the CPU to flush any cache invalidations.
My source for this is Wikipedia, so make of that what you will.
(I might speculate that maybe the interlocked-compare-exchange on the writer thread invalidates the cache on the reader? Or maybe all reads cause invalidation? Or pointer dereferences cause invalidation? I'm particularly concerned how platform-specific these things sound.)
Update: Just to make it more explicit that the question is about CPU cache invalidation and what guarantees .NET provides (and how those guarantees might depend on CPU architecture):
Say we have a reference stored in field Q (a memory location).
On CPU A (writer) we initialize an object at memory location R, and write a reference to R into Q
On CPU B (reader), we dereference field Q, and get back memory location R
Then, on CPU B, we read a value from R
Assume the GC does not run at any point. Nothing else interesting happens.
Question: What prevents R from being in B's cache, from before A has modified it during initialisation, such that when B reads from R it gets stale values, in spite of it getting a fresh version of Q to know where R is in the first place?
(Alternate wording: what makes the modification to R visible to CPU B at or before the point that the change to Q is visible to CPU B.)
(And does this only apply to memory allocated with new, or to any memory?)
Note: I've posted a self-answer here.
This is a really good question. Let us consider your first example.
var handler = SomethingHappened;
if(handler != null)
handler(this, e);
Why is this safe? To answer that question you first have to define what you mean by "safe". Is it safe from a NullReferenceException? Yes, it is pretty trivial to see that caching the delegate reference locally eliminates that pesky race between the null check and the invocation. Is it safe to have more than one thread touching the delegate? Yes, delegates are immutable, so there is no way that one thread can cause the delegate to get into a half-baked state. The first two are obvious.

But what about a scenario where thread A is doing this invocation in a loop and thread B, at some later point in time, assigns the first event handler? Is that safe in the sense that thread A will eventually see a non-null value for the delegate? The somewhat surprising answer to this is probably. The reason is that the default implementations of the add and remove accessors for the event create memory barriers. I believe early versions of the CLR took an explicit lock and later versions used Interlocked.CompareExchange. If you implemented your own accessors and omitted a memory barrier, then the answer could be no. I think in reality it highly depends on whether Microsoft added memory barriers to the construction of the multicast delegate itself.
On to the second and more interesting example.
var localFoo = this.memberFoo;
if(localFoo != null)
localFoo.Bar(localFoo.baz);
Nope. Sorry, this actually is not safe. Let us assume memberFoo is of type Foo which is defined like the following.
public class Foo
{
public int baz = 0;
public int daz = 0;
public Foo()
{
baz = 5;
daz = 10;
}
public void Bar(int x)
{
int result = x / daz; // throws DivideByZeroException when daz == 0
}
}
And then let us assume another thread does the following.
this.memberFoo = new Foo();
Despite what some may think, there is nothing that mandates that instructions have to be executed in the order in which they were defined in the code, as long as the intent of the programmer is logically preserved. The C# or JIT compilers could actually formulate the following sequence of instructions:
/* 1 */ set register = alloc-memory-and-return-reference(typeof(Foo));
/* 2 */ set register.baz = 0;
/* 3 */ set register.daz = 0;
/* 4 */ set this.memberFoo = register;
/* 5 */ set register.baz = 5; // Foo.ctor
/* 6 */ set register.daz = 10; // Foo.ctor
Notice how the assignment to memberFoo occurs before the constructor is run. That is valid because it does not have any unintended side-effects from the perspective of the thread executing it. It could, however, have a major impact on other threads. What happens if your null check of memberFoo on the reading thread occurs when the writing thread has just finished instruction #4? The reader will see a non-null value and then attempt to invoke Bar before the daz variable is set to 10. daz will still hold its default value of 0, thus leading to a divide-by-zero error. Of course, this is mostly theoretical because Microsoft's implementation of the CLR creates a release fence on writes that would prevent this. But the specification would technically allow for it. See this question for related content.
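For completeness: the usual way to close this hole without relying on CLR implementation details is to publish the reference with release semantics. A sketch (assuming memberFoo is an ordinary, non-volatile field):

// Volatile.Write has release semantics: the constructor's writes to baz and
// daz cannot be reordered past the store that publishes the reference.
Foo tmp = new Foo();
Volatile.Write(ref this.memberFoo, tmp);

Declaring memberFoo as volatile achieves the same effect, since every ordinary assignment to a volatile field is a release write.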
I think I have figured out what the answer is. But I'm not a hardware guy, so I'm open to being corrected by someone more familiar with how CPUs work.
The .NET 2.0 memory model guarantees:
Writes cannot move past other writes from the same thread.
This means that the writing CPU (A in the example) will never write the reference to an object into memory (to Q) until after it has written out the contents of the object being constructed (to R). So far, so good. This cannot be re-ordered:
R = <data>
Q = &R
Let's consider the reading CPU (B). What is to stop it reading from R before it reads from Q?
On a sufficiently naïve CPU, one would expect it to be impossible to read from R without first reading from Q. We must first read Q to get the address of R. (Note: it is safe to assume that the C# compiler and JIT behave this way.)
But, if the reading CPU has a cache, couldn't it have stale memory for R in its cache, but receive the updated Q?
The answer seems to be no. For sane cache coherency protocols, invalidation is implemented as a queue (hence "invalidation queue"). So R will always be invalidated before Q is invalidated.
Apparently the only hardware where this is not the case is the DEC Alpha (according to Table 1, here). It is the only listed architecture where dependent reads can be re-ordered. (Further reading.)
Capturing a reference to an immutable object guarantees thread safety (in the sense of consistency; it does not guarantee that you get the latest value).
The list of event handlers is immutable, so capturing a reference to the current value is enough for thread safety. The whole object is consistent because it never changes after initial creation.
Your sample code does not explicitly state whether Foo is immutable, so you get all sorts of problems figuring out whether the object can change, i.e. directly by setting properties. Note that the code would be "unsafe" even in the single-threaded case, as you can't guarantee that a particular instance of Foo does not change.
On CPU caches and the like: the only change that can invalidate data at its actual location in memory for a truly immutable object is the GC's compaction. The GC ensures all necessary locks/cache consistency, so managed code will never observe a change in the bytes referenced by your cached pointer to an immutable object.
When this is evaluated:
thing.memberFoo = new Foo(1234);
First new Foo(1234) is evaluated, which means that the Foo constructor executes to completion. Then thing.memberFoo is assigned the value. This means that any other thread reading from thing.memberFoo is not going to read an incomplete object. It's either going to read the old value, or it's going to read the reference to the new Foo object after its constructor has completed. Whether this new object is in the cache or not is irrelevant; the reference being read won't point to the new object until after the constructor has completed.
The same thing happens with the object pool. Everything on the right evaluates completely before the assignment happens.
In your example, B will never get the reference to R before R's constructor has run, because A does not write R to Q until A has completed constructing R. If B reads Q before that, it will get whatever value was already in Q. If R's constructor throws an exception, then Q will never be written to.
C#'s order of operations guarantees this will happen this way. Assignment operators have the lowest precedence, and new and function-call operators have the highest precedence. This guarantees that the new is evaluated before the assignment is evaluated. This is required for things like exceptions: if the constructor throws, the object being allocated is in an invalid state and you don't want that assignment to occur, multithreaded or not.
It seems to me you should be using volatile (see this article) in this case. This ensures the compiler doesn't perform optimisations that assume access by a single thread.
Events used to use locks, but as of C# 4 they use lock-free synchronisation; I'm not sure exactly which mechanism (see this article).
EDIT:
The Interlocked methods use memory barriers, which will ensure all threads read the updated value (on any sane system). So long as you perform all updates with Interlocked, you can safely read the value from any thread without a memory barrier. This is the pattern used in the System.Collections.Concurrent classes.
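As an illustration of that pattern, a sketch (the names are hypothetical):

private Foo _current;

// Writers publish through Interlocked, which implies a full fence.
public void Publish(Foo next) => Interlocked.Exchange(ref _current, next);

// Readers can then read the field directly; Volatile.Read is used here only
// to make the acquire semantics explicit.
public Foo Current => Volatile.Read(ref _current);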
I understand the volatile keyword in C++ fairly well. But in C#, it appears to take a different meaning, more so related to multi-threading. I thought bool operations were atomic, and I thought that if the operation were atomic, you wouldn't have threading concerns. What am I missing?
https://msdn.microsoft.com/en-us/library/x13ttww7.aspx
I thought bool operations were atomic
They are indeed atomic.
and I thought that if the operation were atomic, you wouldn't have threading concerns.
That is where you have an incomplete picture. Imagine you have two threads running on separate cores, each with its own cache layers. Thread #1 has foo in its cache, and Thread #2 updates the value of foo. Thread #1 won't see the change unless foo is marked volatile, a lock is acquired, the Interlocked class is used, or Thread.MemoryBarrier() is called explicitly; any of these causes the cached value to be invalidated, guaranteeing that you read the most up-to-date value:
Using the volatile modifier ensures that one thread retrieves the most up-to-date value written by another thread.
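The canonical illustration is a stop flag. A minimal sketch (the Worker class is hypothetical):

class Worker
{
    // Without volatile, the JIT may hoist the read of _stop out of the loop,
    // and Run() could spin forever even after Stop() has been called.
    private volatile bool _stop;

    public void Stop() => _stop = true;

    public void Run()
    {
        while (!_stop)
        {
            // do work
        }
    }
}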
Eric Lippert has a great post about volatility where he explains:
The true semantics of volatile reads
and writes are considerably more complex than I've outlined here; in
fact they do not actually guarantee that every processor stops what it
is doing and updates caches to/from main memory. Rather, they provide
weaker guarantees about how memory accesses before and after reads and
writes may be observed to be ordered with respect to each other.
Edit:
As per xanatos's comment: this doesn't mean that volatile guarantees an immediate read; it guarantees a read of the most up-to-date value.
The volatile keyword in C# is all about reading/writing reordering, so it is something quite esoteric.
http://www.albahari.com/threading/part4.aspx#_Memory_Barriers_and_Volatility
(that I consider to be one of the "bibles" about threading) writes:
The volatile keyword instructs the compiler to generate an acquire-fence on every read from that field, and a release-fence on every write to that field. An acquire-fence prevents other reads/writes from being moved before the fence; a release-fence prevents other reads/writes from being moved after the fence. These “half-fences” are faster than full fences because they give the runtime and hardware more scope for optimization.
It is something quite unreadable :-)
Now... What it doesn't mean:
It doesn't mean that a value will be read now/will be written now
It simply means that if you read something from a volatile variable, nothing that comes after this "special" read can be moved before it. So it creates a barrier: by reading from a volatile variable, you guarantee that all the subsequent reads and writes of any other variable (volatile or not) actually happen after it.
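In code, the acquire guarantee looks like this (a sketch; _flag is a volatile bool and _data is an ordinary int, both hypothetical):

if (_flag)               // volatile read: acquire
{
    int d = _data;       // cannot be moved before the read of _flag
    Console.WriteLine(d);
}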
A volatile write is probably more important. It is a guarantee provided in part by Intel CPUs, and one that wasn't provided by the first version of Java: no write reordering. The problem was:
object myrefthatissharedwithotherthreads = new MyClass(5);
where
class MyClass
{
public int Value;
MyClass(int value)
{
Value = value;
}
}
Now... that expression can be imagined to be:
var temp = new MyClass();
temp.Value = 5;
myrefthatissharedwithotherthreads = temp;
where temp is something generated by the compiler that you can't see.
If the writes can be reordered, you could have:
var temp = new MyClass();
myrefthatissharedwithotherthreads = temp;
temp.Value = 5;
and another thread could see a partially initialized MyClass, because the value of myrefthatissharedwithotherthreads is readable before the class MyClass has finished initializing.
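In C#, the usual fix is to declare the shared reference volatile, which gives the publishing assignment release semantics. A sketch:

class Holder
{
    // volatile: the assignment in Publish is a release write, so the
    // constructor's write of Value cannot be moved after the publication
    // of the reference.
    private volatile MyClass myrefthatissharedwithotherthreads;

    public void Publish()
    {
        myrefthatissharedwithotherthreads = new MyClass(5);
    }
}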
I have read a lot of contradictory information (MSDN, SO, etc.) about volatile and VolatileRead (read-acquire fence).
I understand the memory-access reordering restrictions they imply; what I'm still completely confused about is the freshness guarantee, which is very important to me.
The MSDN doc for volatile mentions:
(...) This ensures that the most up-to-date value is present in the field at all times.
The MSDN doc for volatile fields mentions:
A read of a volatile field is called a volatile read. A volatile read has "acquire semantics"; that is, it is guaranteed to occur prior to any references to memory that occur after it in the instruction sequence.
The .NET code for VolatileRead is:
public static int VolatileRead(ref int address)
{
int ret = address;
MemoryBarrier(); // Call MemoryBarrier to ensure the proper semantic in a portable way.
return ret;
}
According to the MSDN MemoryBarrier doc, a memory barrier prevents reordering. However, this doesn't seem to have any implications for freshness - correct?
How, then, can one get a freshness guarantee?
And is there a difference between marking a field volatile and accessing it with VolatileRead and VolatileWrite semantics? I'm currently doing the latter in my performance-critical code that needs to guarantee freshness, but readers sometimes get a stale value. I'm wondering if marking the field volatile would make the situation different.
EDIT1:
What I'm trying to achieve: the guarantee that reader threads will get as recent a value of the shared variable (written by multiple writers) as possible - ideally no older than the cost of a context switch or whatever other operations may postpone the immediate write of the state.
If volatile or a higher-level construct (e.g. lock) has this guarantee (does it?), then how is that achieved?
EDIT2:
The very condensed question should have been: how do I get a guarantee of as fresh a value as possible during reads? Ideally without locking (as exclusive access is not needed and there is potential for high contention).
From what I learned here, I'm wondering if this might be the solution (the solving(?) line is marked with a comment):
private SharedState _sharedState;
private SpinLock _spinLock = new SpinLock(false);
public void Update(SharedState newValue)
{
bool lockTaken = false;
_spinLock.Enter(ref lockTaken);
_sharedState = newValue;
if (lockTaken)
{
_spinLock.Exit();
}
}
public SharedState GetFreshSharedState
{
get
{
Thread.MemoryBarrier(); // <---- This is added to give readers freshness guarantee
var value = _sharedState;
Thread.MemoryBarrier();
return value;
}
}
The MemoryBarrier call was added to make sure both - reads and writes - are wrapped by full fences (same as lock code - as indicated here http://www.albahari.com/threading/part4.aspx#_The_volatile_keyword 'Memory barriers and locking' section)
Does this look correct or is it flawed?
EDIT3:
Thanks to very interesting discussions here I learned quite a few things and I actually was able to distill to the simplified unambiguous question that I have about this topic. It's quite different from the original one so I rather posted a new one here: Memory barrier vs Interlocked impact on memory caches coherency timing
I think this is a good question. But it is also difficult to answer. I am not sure I can give you a definitive answer to your questions. It is not your fault really. It is just that the subject matter is complex and really requires knowing details that might not be feasible to enumerate. Honestly, it really seems like you have educated yourself on the subject quite well already. I have spent a lot of time studying the subject myself and I still do not fully understand everything. Nevertheless, I will still attempt some semblance of an answer here anyway.
So what does it mean for a thread to read a fresh value anyway? Does it mean the value returned by the read is guaranteed to be no older than 100ms, 50ms, or 1ms? Or does it mean the value is the absolute latest? Or does it mean that if two reads occur back-to-back then the second is guaranteed to get a newer value assuming the memory address changed after the first read? Or does it mean something else altogether?
I think you are going to have a hard time getting your readers to work correctly if you are thinking about things in terms of time intervals. Instead think of things in terms of what happens when you chain reads together. To illustrate my point consider how you would implement an interlocked-like operation using arbitrarily complex logic.
public static T InterlockedOperation<T>(ref T location, T operand, Func<T, T, T> op)
    where T : class // Interlocked.CompareExchange<T> requires a reference type
{
T initial, computed;
do
{
initial = location;
computed = op(initial, operand); // op supplies the specific operation
}
while (Interlocked.CompareExchange(ref location, computed, initial) != initial);
return computed;
}
In the code above we can create any interlocked-like operation if we exploit the fact that the second read of location via Interlocked.CompareExchange will be guaranteed to return a newer value if the memory address received a write after the first read. This is because the Interlocked.CompareExchange method generates a memory barrier. If the value has changed between reads then the code spins around the loop repeatedly until location stops changing. This pattern does not require that the code use the latest or freshest value; just a newer value. The distinction is crucial.¹
A lot of lock-free code I have seen works on this principle. That is, the operations are usually wrapped in loops such that the operation is continually retried until it succeeds. It does not assume that the first attempt uses the latest value, nor does it assume that every use of the value is the latest. It only assumes that the value is newer after each read.
Try to rethink how your readers should behave. Try to make them more agnostic about the age of the value. If that is simply not possible and all writes must be captured and processed then you may be forced into a more deterministic approach like placing all writes into a queue and having the readers dequeue them one-by-one. I am sure the ConcurrentQueue class would help in that situation.
If you can reduce the meaning of "fresh" to only "newer" then placing a call to Thread.MemoryBarrier after each read, using Volatile.Read, using the volatile keyword, etc. will absolutely guarantee that one read in a sequence will return a newer value than a previous read.
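To make the "newer, not latest" contract concrete, here is a sketch of a reader built on it (the _sequence field is hypothetical):

private int _sequence;

public void WaitForAtLeast(int target)
{
    var spinner = new SpinWait();
    // Each Volatile.Read is guaranteed to return a value at least as new as
    // the previous read, so the loop makes progress once a writer advances
    // _sequence; no single read is assumed to be the latest.
    while (Volatile.Read(ref _sequence) < target)
    {
        spinner.SpinOnce();
    }
}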
¹ The ABA problem opens up a new can of worms.
A memory barrier does provide this guarantee. We can derive the "freshness" property that you are looking for from the reordering properties that a barrier guarantees.
By freshness you probably mean that a read returns the value of the most recent write.
Let's say we have these operations, each on a different thread:
x = 1
x = 2
print(x)
How could we possibly print a value other than 2? Without volatile, the read could move one slot upwards and return 1. volatile prevents such reordering, though: the write cannot move backwards in time.
In short, volatile guarantees that you see the most recent value.
Strictly speaking I'd need to differentiate between volatile and a memory barrier here. The latter one is a stronger guarantee. I have simplified this discussion because volatile is implemented using memory barriers, at least on x86/x64.
Like many other people, I've always been confused by volatile reads/writes and fences. So now I'm trying to fully understand what these do.
So, a volatile read is supposed to (1) exhibit acquire-semantics and (2) guarantee that the value read is fresh, i.e., it is not a cached value. Let's focus on (2).
Now, I've read that, if you want to perform a volatile read, you should introduce an acquire fence (or a full fence) after the read, like this:
int local = shared;
Thread.MemoryBarrier();
How exactly does this prevent the read operation from using a previously cached value?
According to the definition of a fence (no read/stores are allowed to be moved above/below the fence), I would insert the fence before the read, preventing the read from crossing the fence and being moved backwards in time (aka, being cached).
How does preventing the read from being moved forwards in time (or subsequent instructions from being moved backwards in time) guarantee a volatile (fresh) read? How does it help?
Similarly, I believe that a volatile write should introduce a fence after the write operation, preventing the processor from moving the write forward in time (aka, delaying the write). I believe this would make the processor flush the write to the main memory.
But to my surprise, the C# implementation introduces the fence before the write!
[MethodImplAttribute(MethodImplOptions.NoInlining)] // disable optimizations
public static void VolatileWrite(ref int address, int value)
{
MemoryBarrier(); // Call MemoryBarrier to ensure the proper semantic in a portable way.
address = value;
}
Update
According to this example, apparently taken from "C# 4 in a Nutshell", fence 2, placed after a write, is supposed to force the write to be flushed to main memory immediately, and fence 3, placed before a read, is supposed to guarantee a fresh read:
class Foo
{
    int _answer;
    bool _complete;

    void A()
    {
        _answer = 123;
        Thread.MemoryBarrier(); // Barrier 1
        _complete = true;
        Thread.MemoryBarrier(); // Barrier 2
    }

    void B()
    {
        Thread.MemoryBarrier(); // Barrier 3
        if (_complete)
        {
            Thread.MemoryBarrier(); // Barrier 4
            Console.WriteLine(_answer);
        }
    }
}
The ideas in this book (and my own personal beliefs) seem to contradict the ideas behind C#'s VolatileRead and VolatileWrite implementations.
How exactly does this prevent the read operation from using a
previously cached value?
It does no such thing. A volatile read does not guarantee that the latest value will be returned. In plain English all it really means is that the next read will return a newer value and nothing more.
How does preventing the read from being moved forwards in time (or
subsequent instructions from being moved backwards in time) guarantee
a volatile (fresh) read? How does it help?
Be careful with the terminology here. Volatile is not synonymous with fresh. As I already mentioned above, its real usefulness lies in how two or more volatile reads are chained together. The next read in a sequence of volatile reads will absolutely return a newer value than the previous read of the same address. Lock-free code should be written with this premise in mind. That is, the code should be structured to work on the principle of dealing with a newer value and not the latest value. This is why most lock-free code spins in a loop until it can verify that the operation completed successfully.
The ideas in this book (and my own personal beliefs) seem to
contradict the ideas behind C#'s VolatileRead and VolatileWrite
implementations.
Not really. Remember volatile != fresh. Yes, if you want a "fresh" read then you need to place an acquire-fence before the read. But that is not the same as doing a volatile read. What I am saying is that if the implementation of VolatileRead had the call to Thread.MemoryBarrier before the read instruction then it would not actually produce a volatile read. It would, however, produce a fresh read.
The important thing to understand is that volatile does not only mean "cannot cache value"; it also gives important visibility guarantees (to be exact, it's entirely possible for a volatile write to go only to cache; it depends solely on the hardware and the cache coherency protocols it uses).
A volatile read gives acquire semantics, while a volatile write has release semantics. An acquire fence means that you cannot reorder reads or writes before the fence, while a release fence means you cannot move them after the fence. The linked answer in the comments explains that actually quite nicely.
Now the question is, if we don't have any memory barrier before the load how is it guaranteed that we'll see the newest value? The answer to that is: Because we also put memory barriers after each volatile write to guarantee that.
Doug Lea wrote a great summary on which barriers exist, what they do and where to put them for volatile reads/writes for the JMM as a help for compiler writers, but the text is also quite useful for other people. Volatile reads and writes give the same guarantees in both Java and the CLR so that's generally applicable.
Source - scroll down to the "Memory Barriers" section (I'd copy the interesting parts, but the formatting doesn't survive it..)
Since Java 5, the volatile keyword has release/acquire semantics to make side-effects visible to other threads (including assignments to non-volatile variables!). Take these two variables, for example:
int i;
volatile int v;
Note that i is a regular, non-volatile variable. Imagine thread 1 executing the following statements:
i = 42;
v = 0;
At some later point in time, thread 2 executes the following statements:
int some_local_variable = v;
print(i);
According to the Java memory model, the write of v in thread 1 followed by the read of v in thread 2 ensures that thread 2 sees the write to i executed in thread 1, so the value 42 is printed.
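For reference, here is a direct C# transliteration of those Java snippets (same field names):

int i;
volatile int v;

// Thread 1:
i = 42;
v = 0;  // volatile write

// Thread 2 (at some later point in time):
int some_local_variable = v;  // volatile read
Console.WriteLine(i);         // does this reliably print 42?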
My question is: does volatile have the same release/acquire semantics in C#?
The semantics of "volatile" in C# are defined in sections 7.10 and 14.5.4 of the C# 6.0 specification. Rather than reproduce them here, I encourage you to look it up in the spec, and then decide that it is too complicated and dangerous to use "volatile", and go back to using locks. That's what I always do.
See "7.10 Execution order" and "14.5.4 Volatile fields" in the C# 6.0 specification.
In the C# 5.0 specification, see "3.10 Execution Order" and "10.5.3 Volatile Fields"
Well, I believe it ensures that if some_local_variable is read as 0 (due to the write to v), i will be read as 42.
The tricky part is "at some later point in time". While commonly volatility is talked about in terms of "flushing" writes, that's not how it's actually defined in the spec (either Java or C#).
From the C# 4 language spec, section 10.5.3:
For non-volatile fields, optimization techniques that reorder instructions can lead to unexpected and unpredictable results in multi-threaded programs that access fields without synchronization such as that provided by the lock-statement (§8.12). These optimizations can be performed by the compiler, by the run-time system, or by hardware. For volatile fields, such reordering optimizations are restricted:
A read of a volatile field is called a volatile read. A volatile read has “acquire semantics”; that is, it is guaranteed to occur prior to any references to memory that occur after it in the instruction sequence.
A write of a volatile field is called a volatile write. A volatile write has “release semantics”; that is, it is guaranteed to happen after any memory references prior to the write instruction in the instruction sequence.
There's then an example which is quite similar to yours, but conditional on the value read from the volatile variable.
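That example is along these lines (paraphrased from the spec rather than quoted verbatim):

class Test
{
    public static int result;
    public static volatile bool finished;

    static void Thread2()
    {
        result = 143;
        finished = true;  // volatile write (release)
    }

    static void Main()
    {
        finished = false;
        new Thread(new ThreadStart(Thread2)).Start();
        for (;;)
        {
            if (finished)  // volatile read (acquire)
            {
                // If the read of finished sees true, the read of result
                // is guaranteed to see 143.
                Console.WriteLine("result = {0}", result);
                return;
            }
        }
    }
}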
And like Eric, I'd strongly avoid relying on volatile myself. It's hard to reason about, and best left to the Joe Duffy/Stephen Toubs of the world.