In C#, we know that a bool is atomic - then why is it valid to mark it as volatile? what is the difference and what is a good (or even practical) use-case for one versus the other?
bool _isPending;
Versus
volatile bool _isPending; // Is this realistic, or insanity?
I have done some reading here and here, and I'm trying to ensure that I fully understand the inner workings of the two. I want to understand when it is appropriate to use one vs the other, or if just bool is enough.
The supposition of your question is that you believe that volatile makes an access atomic. But volatility and atomicity are completely different things, so stop conflating them.
Volatility is the property that the compiler and runtime are restricted from making certain optimizations involving moving reads and writes of variables forwards and backwards in time with respect to each other, and more generally, with respect to other important events such as starting and stopping threads, running constructors, and so on. Consult the C# specification for a detailed list of how operations may or may not be re-ordered with respect to visible side effects.
Atomicity is the property that a particular operation can only be observed as not started or totally completed, and never "halfway done".
As you can see from the definitions, those two things have nothing whatsoever to do with each other.
In C#, all reads and writes of references, bools, and integer types 4 bytes or smaller are guaranteed to be atomic.
Now, in C# there is some slight non-orthogonality between atomicity and volatility, in that only fields of atomic types may be marked as volatile. You may not make a volatile double, for example. It would be really weird and dangerous to say "we're going to restrict how reads and writes may be optimized but still allow tearing". Since volatility does not cause atomicity, you don't want to put users in a position of thinking that an operation is atomic just because it is also volatile.
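A quick sketch of how the compiler enforces that rule (the field names are illustrative; the commented-out declarations fail to compile with error CS0677):

class Flags
{
    volatile bool _isPending;     // OK: bool reads/writes are atomic
    volatile int _count;          // OK: 4 bytes or smaller
    volatile object _reference;   // OK: reference types are allowed
    // volatile double _price;   // error CS0677: a volatile field cannot be of type 'double'
    // volatile long _ticks;     // error CS0677: 8 bytes, could tear on 32-bit platforms
}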
You should read my series of articles that explains in far more detail what the differences between these things are, and what volatile actually does, and why you do not understand nearly enough to be using it safely.
https://ericlippert.com/2011/05/26/atomicity-volatility-and-immutability-are-different-part-one/
https://ericlippert.com/2011/05/31/atomicity-volatility-and-immutability-are-different-part-two/
https://ericlippert.com/2011/06/16/atomicity-volatility-and-immutability-are-different-part-three/
https://web.archive.org/web/20160323025740/http://blog.coverity.com/2014/03/12/can-skip-lock-reading-integer/
If you think you understand volatility after reading all that, I invite you to try to solve the puzzle I pose here:
https://web.archive.org/web/20160729162225/http://blog.coverity.com/2014/03/26/reordering-optimizations/
If there are updates to variables in the preceding or subsequent code and the order in which the updates occurs is critical, then marking the field as volatile will ensure that an update to that field will happen after any previous updates and before any subsequent updates.
In other words, if _isPending is volatile, then the compiler will not cause these instructions to execute in a different order:
_someVariable = 10;
_isPending = true;
_someOtherVariable = 5;
Whether multi-threaded or not, if we've written code that breaks depending on whether these updates in adjacent lines occur in the specified order then something is wrong. We should ask why that sequence matters. (If there is a scenario where that matters, imagine trying to explain it in a comment so that no one makes a breaking change to the code.)
To nearly anyone reading the code above, it would appear that the order of those operations doesn't matter at all. If it does matter, then someone else reading our code can't possibly understand what's going on. They could do some refactoring, reorder those lines of code, and break everything without knowing it. It might even work when they test it and then fail unpredictably and inconsistently when it's deployed.
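For what it's worth, the canonical scenario where the order genuinely matters is publishing data behind a flag. A minimal sketch, reusing the field names from above:

class Publisher
{
    int _someVariable;
    volatile bool _isPending;

    void Produce()
    {
        _someVariable = 10;   // must become visible before the flag is set
        _isPending = true;    // volatile write (release): prior writes can't move below it
    }

    void Consume()
    {
        if (_isPending)                  // volatile read (acquire): later reads can't move above it
        {
            int value = _someVariable;   // sees 10, not a stale value
        }
    }
}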
I agree with Eric Lippert's comment in the answer you linked:
Frankly, I discourage you from ever making a volatile field. Volatile fields are a sign that you are doing something downright crazy: you're attempting to read and write the same value on two different threads without putting a lock in place.
I suppose I failed to directly answer the question. volatile is valid for a type (including bool) because it's possible to perform an atomic operation on that type. volatile protects against compiler optimizations. According to the documentation for volatile,
This ensures that the most up-to-date value is present in the field at all times.
But if the field can't be represented in 32 bits or less then preventing compiler optimizations can't guarantee that anyway.
1. Out of curiosity, what do operations like the following do behind the scenes when they get called, for example, from 2 or 3 threads at the same time?
Interlocked.Add(ref myInt, 24);
Interlocked.Increment(ref counter);
Does C# create an internal queue that tells, for example, Thread 2 "now it's your turn to do the operation", then tells Thread 1 "now it's your turn", and then Thread 3 "you do the operation", so that they will not interfere with each other?
2. Why doesn't C# do this automatically?
Isn't it obvious that when a programmer writes something like the following inside a multi-threaded method:
myByte++;
Sum = int1 + int2 + int3;
and these variables are shared with other threads, he wants each of these operations to be executed as an atomic operation, without interruptions?
Why does the programmer have to tell it explicitly to do so?
Isn't it clear that that's what every programmer wants? Don't these Interlocked methods just add unnecessary complication to the language?
Thanks.
What do operations like the following do behind the scenes?
As far as how it's implemented internally, CPU hardware arbitrates which core gets ownership of the cache line when there's contention. See Can num++ be atomic for 'int num'? for a C++ and x86 asm / cpu-architecture explanation of the details.
Re: why CPUs and compilers want to load early and store late:
see Java instruction reordering and CPU memory reordering
Atomic RMWs prevent that, as do seq_cst store semantics on most ISAs, where you do a plain store and then a full barrier. (AArch64 has a special interaction between stlr and ldar to prevent StoreLoad reordering of seq_cst operations, but still allow reordering with other operations.)
Isn't it obvious that when a programmer writes something like the following inside a multi-threaded method [...]
What does that even mean? It's not running the same method in multiple threads that's a problem, it's accessing shared data. How is the compiler supposed to know which data will be accessed non-readonly from multiple threads at the same time, not inside a critical section?
There's no reasonable way to prove this in general, only in some simplistic cases. If a compiler were to try, it would have to be conservative, erring on the side of making more things atomic, at a huge cost in performance. The other kind of mistake would be a correctness problem, and if that could just happen when the compiler guesses wrong based on some undocumented heuristics, it would make the language unusable for multi-threaded programs.
Besides that, not all multi-threaded code needs sequential consistency all the time; often acquire/release or relaxed atomics are fine, but sometimes they aren't. It makes a lot of sense for programmers to be explicit about what ordering and atomicity their algorithm is built on.
Also you carefully design lock-free multi-threaded code to do things in a sensible order. In C++, you don't have to use Interlock..., but instead you make a variable std::atomic<int> shared_int; for example. (Or use std::atomic_ref<int> to do atomic operations on variables that other code can access non-atomically, like using Interlocked functions).
Having no explicit indication in the source of which operations are atomic with what ordering semantics would make it harder to read and maintain such code. Correct lock-free algorithms don't just happen by having the compiler turn individual operators into atomic ops.
Promoting every operation to atomic would destroy performance. Most data isn't shared, even in functions that access some shared data structures.
Atomic RMW (like x86 lock add [rdi], eax) is much slower than a non-atomic operation, especially since non-atomic lets the compiler optimize variables into registers.
An atomic RMW on x86 is a full memory barrier, so making every operation atomic would destroy memory-level parallelism every time you use a += or ++.
e.g. one per 18 cycles throughput on Skylake for lock xadd [mem], reg if hot in L1d cache, vs. one per 0.25 cycles for add reg, reg (https://uops.info), not to mention removing opportunities to optimize away and combine operations, and reducing the ability of out-of-order execution to overlap work.
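To make that trade-off concrete, here is a minimal C# sketch (class, field, and iteration counts are illustrative) contrasting a plain ++, which compiles to a separable read-add-write, with Interlocked.Increment:

using System.Threading;
using System.Threading.Tasks;

class CounterDemo
{
    static int _racy;     // plain ++ is three separable steps: read, add, write
    static int _atomic;   // Interlocked.Increment is a single atomic read-modify-write

    static void Main()
    {
        Task.WaitAll(Task.Run(Loop), Task.Run(Loop), Task.Run(Loop));
        // _racy typically ends up below 300000 because some increments were lost;
        // _atomic always ends up exactly 300000.
        System.Console.WriteLine($"{_racy} vs {_atomic}");
    }

    static void Loop()
    {
        for (int i = 0; i < 100_000; i++)
        {
            _racy++;
            Interlocked.Increment(ref _atomic);
        }
    }
}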
This is a partial answer to the question you asked in the comments:
Why not? If I, as a programmer know exactly where I should put this protections, why can't the compiler?
In order for the compiler to do that, it would need to understand all possible execution paths through your program. This is effectively the Path Testing problem discussed here: https://softwareengineering.stackexchange.com/questions/277693/does-path-coverage-guarantee-finding-all-bugs
That article states that this is equivalent to the halting problem, which is computer-science-ese for saying it's an unsolvable problem.
The cool thing is that you want to do this in a world where you have multiple threads of execution running on possibly multiple processors. That makes an unsolvable problem that much harder to solve.
On the other hand, the programmer should know what his/her program does...
It is known that, unlike Java's volatiles, .NET's allow reordering of volatile writes with subsequent volatile reads from another location. When that is a problem, placing a MemoryBarrier between them is recommended, or Interlocked.Exchange can be used instead of the volatile write.
That works, but MemoryBarrier can be a performance killer when used in highly optimized lock-free code.
I thought about it a bit and came up with an idea. I want somebody to tell me whether I'm on the right track.
So, the idea is the following:
We want to prevent reordering between these two accesses:
volatile1 write
volatile2 read
From the .NET memory model we know that:
1) writes to a variable cannot be reordered with a following read from the same variable
2) no volatile accesses can be eliminated
3) no memory accesses can be reordered with a previous volatile read
To prevent unwanted reordering between write and read we introduce a dummy volatile read from the variable we've just written to:
A) volatile1 write
B) volatile1 read [to a visible (accessible | potentially shared) location]
C) volatile2 read
In such case B cannot be reordered with A as they both access the same variable,
C cannot be reordered with B because two volatile reads cannot be reordered with each other, and transitively C cannot be reordered with A.
And the question:
Am I right? Can that dummy volatile read be used as a lightweight memory barrier for such scenario?
Here I will use an arrow notation to conceptualize the memory barriers. I use an up arrow ↑ and a down arrow ↓ to represent volatile writes and reads respectively. Think of the arrow head as pushing away any other reads or writes. So no other memory access can move past the arrow head, but they can move past the tail.
Consider your first example. This is how it would be conceptualized.
↑
volatile1 write // A
volatile2 read // B
↓
So clearly we can see that the read and the write are allowed to switch positions. You are correct.
Now consider your second example. You claimed that introducing a dummy read would prevent the write of A and the read of B from getting swapped.
↑
volatile1 write // A
volatile1 read // A
↓
volatile2 read // B
↓
We can see that B is prevented from floating up by the dummy read of A. We can also see that the read of A cannot float down because, by inference, that would be the same as B moving up before A. But, notice that we have no ↑ arrow that would prevent the write to A from floating down (remember it can still move past the tail of an arrow). So no, at least theoretically, injecting a dummy read of A will not prevent the original write of A and the read of B from getting swapped because the write to A is still allowed to move downward.
I had to really think about this scenario. One thing I pondered for quite some time is whether the read and write to A are locked together in tandem. If so, then that would prevent the write to A from moving down, because it would have to take the read with it, which we already said was prevented. So if you go with that school of thought then your solution might just work.

But, I read the specification again and I see nothing special mentioned about volatile accesses to the same variable. Obviously, the thread has to execute in a manner that is logically consistent with the original program sequence (that is mentioned in the specification). But, I can visualize ways the compiler or hardware could optimize (or otherwise reorder) that tandem access of A and still get the same result. So, I simply have to side with caution here and assume that the write to A can move down.

Remember, a volatile read does not mean "fresh read from main memory". The write to A could be cached in a register and the read then comes from that register, delaying the actual write to a later time. Volatile semantics do not prevent that scenario as far as I know.
The correct solution would be to put a call to Thread.MemoryBarrier in between the accesses. You can see how this is conceptualized with the arrow notation.
↑
volatile1 write // A
↑
Thread.MemoryBarrier
↓
volatile2 read // B
↓
Now you can see that the read is not allowed to float up and the write is not allowed to float down preventing the swap.
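In C# terms, the corrected sequence would look something like this minimal sketch (volatile1/volatile2 are stand-ins for whatever fields the real algorithm uses):

using System.Threading;

class Fence
{
    volatile int _volatile1;
    volatile int _volatile2;

    int WriteThenRead()
    {
        _volatile1 = 1;          // A: volatile write (release semantics)
        Thread.MemoryBarrier();  // full fence: A cannot move down, B cannot move up
        return _volatile2;       // B: volatile read (acquire semantics)
    }
}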
You can see some of my other memory barrier answers using this arrow notation here, here, and here just to name a few.
I forgot to post the soon-found answer back to SO. Better late than never...
Turns out it is impossible, thanks to how processors (at least the x86/x64 kind) optimize memory accesses.
I found the answer while reading Intel's manuals on its processors. Example 8-5, "Intra-Processor Forwarding is Allowed", looked suspicious. Googling for "store buffer forwarding" led to Joe Duffy's blog posts (first and second - please read them).
To optimize writes, the processor uses store buffers (per-processor queues of write operations). Buffering writes locally allows a further optimization: satisfying reads from previously buffered writes to the same memory location that haven't left the processor yet. The technique is called store-buffer forwarding (or store-to-load forwarding).
The end result in our case is that since the read at B is satisfied from local storage (the store buffer), it is not considered a volatile read and can be reordered with further volatile reads from another memory location (C).
It seems like a violation of the rule "volatile reads don't reorder with each other". Yes, it is a violation, but a very rare and exotic one.
Why did it happen? Probably because Intel released its first formal document on the memory model of its processors years after .NET (and its JIT compiler) saw the light of day.
So the answer is: no, the dummy reading (B) doesn't prevent reordering between A and C and cannot be used as a lightweight memory barrier.
EDIT The conclusions I drew from the C# specs are wrong, see below. END EDIT
I surely am not someone 'authorized', but I think you haven't understood the memory model correctly.
Quoting the C# specification, section §10.10 Execution order, third bullet point on page 105:
The ordering of side effects is preserved with respect to volatile reads and writes.
Volatile reads and writes are defined as "side-effects" and this paragraph states that the ordering of side-effects is preserved.
So it is my understanding that your whole question is based on an incorrect assumption: volatile reads and writes cannot be reordered.
I think you got confused with that fact that with respect to non-volatile memory operations, volatile reads and writes are only half-fences.
EDIT this article: The C# Memory Model in Theory and Practice, Part 2 states exactly the opposite and supports your assertion that volatile reads can move up past an unrelated volatile write. The suggested solution is to introduce a MemoryBarrier where it matters.
Comment by Daniel below also says that the CLI spec is more specific about this than the C# spec and allows this reordering.
Now I find that the C# spec I quoted above is confusing! But given that on x86 the same instructions are used for a volatile memory access and a regular memory access, it makes perfect sense that they are subject to the same half-fence reordering issues. END EDIT
Let me disagree with the accepted answer from Brian Gideon.
OmariO, your solution to the problem (the dummy read) looks perfectly correct to me. As you mentioned, writes to a variable cannot be reordered with a following read from the same variable. If that reordering were possible, it would make the code incorrect even in the single-threaded case (the read operation could return a value other than the one written by the previous write). That is, it would violate the fundamental rule of any memory model: the single-threaded execution of a program must not be logically changed.
Also, guys (Brian and OmariO), please don't mix up memory operations that have acquire/release semantics with acquire/release memory fences. E.g. a read-acquire operation is not the same as an acquire fence. They have similar semantics, but the distinction between them is very important. The best explanation of these terms that I know of is in Jeff Preshing's great blog:
Acquire and Release Semantics
Acquire and Release Fences
I'm using this configuration:
.NET framework 4.5
Windows Server 2008 R2
HP DL360p Gen8 (2 * Xeon E5-2640, x64)
I have this field somewhere in my program:
protected int HedgeVolume;
I access this field from several threads. I assume that, since I have a multi-processor system, these threads may execute on different processors.
What should I do to guarantee that any time I use this field the most recent value is read? And to make sure that when I write a value it becomes available to all other threads immediately?
What should I do?
1) just leave the field as is
2) declare it volatile
3) use the Interlocked class to access the field
4) use the .NET 4.5 Volatile.Read / Volatile.Write methods to access the field
5) use lock
I only need the simplest way to make my program work on this configuration; I don't need it to work on other computers, servers, or operating systems. I also want minimal latency, so I'm looking for the fastest solution that will always work on this standard configuration (multi-processor Intel x64, .NET 4.5).
Your question is missing one key element... How important is the integrity of the data in that field?
volatile gives you performance, but if a thread is currently writing changes to the field, you won't get that data until it's done, so you might read out-of-date information and potentially overwrite changes another thread is currently making. If the data is sensitive, you might get bugs that are very hard to track down. However, if you are doing very quick updates, overwriting the value without reading it, and don't care that once in a while you get data that is outdated by a few milliseconds, go for it.
lock guarantees that only one thread can access the field at a time. You can put it only on the methods that write the field and leave the reading methods alone. The downside is that it is slow and may block a thread while another is performing its task. However, you are sure your data stays valid.
Interlocked exists to make a read-modify-write a single atomic step, so that a context switch or a parallel thread cannot interleave in the middle of the update. My opinion? Don't use it unless you know exactly why you would be using it and exactly how to use it. It gives you options, but with great options come great problems. Note that it won't prevent parallel threads from performing their tasks simultaneously; it only makes the individual operation atomic.
You want to use Volatile.Read().
As you are running on x86, all writes in C# are the equivalent of Volatile.Write(); you would only need it on architectures with weaker memory models, such as Itanium.
Volatile.Read() will ensure that you get the latest copy regardless of which thread last wrote it.
There is a fantastic write-up here: C# Memory Model Explained.
A summary of it includes:
On some processors, not only must the compiler avoid certain optimizations on volatile reads and writes, it also has to use special instructions. On a multi-core machine, different cores have different caches. The processors may not bother to keep those caches coherent by default, and special instructions may be needed to flush and refresh the caches.
Hopefully that much is obvious: besides the need for volatile to stop the compiler from optimising accesses away, there is the processor to consider as well.
However, in C# all writes are volatile (unlike say in Java), regardless of whether you write to a volatile or a non-volatile field. So, the above situation actually never happens in C#. A volatile write updates the thread's cache, and then flushes the entire cache to main memory.
You do not need Volatile.Write(). A more authoritative source is here: Joe Duffy, CLR Memory Model. However, you may need it to stop the compiler from reordering the write.
Since all C# writes are volatile, you can think of all writes as going straight to main memory. A regular, non-volatile read can read the value from the thread's cache, rather than from main memory.
You need Volatile.Read()
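Putting the recommendation together, a minimal sketch against the HedgeVolume field from the question (the class and method names are just illustrative):

using System.Threading;

class Hedger
{
    protected int HedgeVolume;

    protected int ReadVolume()
    {
        // Volatile.Read has acquire semantics and defeats register caching,
        // so this observes the most recently published value.
        return Volatile.Read(ref HedgeVolume);
    }

    protected void WriteVolume(int value)
    {
        // On x86 a plain store already has release semantics, but Volatile.Write
        // also stops the compiler/JIT from reordering it (and is portable).
        Volatile.Write(ref HedgeVolume, value);
    }
}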
When you start designing a concurrent program, you should consider these options in order of preference:
1) Isolation: each thread has its own private data
2) Immutability: threads can see shared state, but it never changes
3) Mutable shared state: protect all access to shared state with locks
If you get to (3), then how fast do you actually need this to be?
Acquiring an uncontested lock takes on the order of 10 ns (10⁻⁸ seconds) - that's fast enough for most applications and is the easiest way to guarantee correctness.
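A minimal sketch of option 3 applied to the field from the question (the property shape is illustrative):

class LockedVolume
{
    readonly object _gate = new object();
    int _hedgeVolume;   // only ever touched while holding _gate

    public int HedgeVolume
    {
        get { lock (_gate) return _hedgeVolume; }
        set { lock (_gate) _hedgeVolume = value; }
    }
}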
Using any of the other options you mention takes you into the realm of low-lock programming, which is insanely difficult to get correct.
If you want to learn how to write concurrent software, you should read these:
Intro: Joe Albahari's free e-book - will take about a day to read
Bible: Joe Duffy's "Concurrent Programming on Windows" - will take about a month to read
Depends what you DO. For reading only, volatile is easiest; Interlocked allows a little more control. A lock is unnecessary, as it is heavier-weight than the problem you describe requires. Not sure about Volatile.Read/Write - never used them.
volatile - bad, there are some issues (see Joe Duffy's blog)
if all you do is read the value or unconditionally write a value - use Volatile.Read and Volatile.Write
if you need to read and subsequently write an updated value - use the lock syntax. You can, however, achieve the same effect without lock using the Interlocked class's functionality, but it is more complex: it involves CompareExchange to ensure that you are updating the value you read (i.e. that it has not been modified since the read), plus logic to retry if it was. A sketch follows below.
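A minimal sketch of that lock-free read-modify-write; the class name and the add-a-delta update are stand-ins for whatever transform you actually need:

using System.Threading;

class CasVolume
{
    int _hedgeVolume;

    // Atomically apply delta without a lock: the classic CompareExchange retry loop.
    public int AddToVolume(int delta)
    {
        int seen, desired;
        do
        {
            seen = Volatile.Read(ref _hedgeVolume);   // snapshot the current value
            desired = seen + delta;                   // compute the update from the snapshot
            // CompareExchange writes 'desired' only if the field still equals 'seen';
            // it returns whatever value was actually there before the call.
        } while (Interlocked.CompareExchange(ref _hedgeVolume, desired, seen) != seen);
        return desired;
    }
}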
From this I understand that you want to be able to read the last value that was written to a field. Let's make an analogy with the SQL data-concurrency problem. If you want to be able to read the last value of a field, you must use atomic instructions. If someone is writing to a field, all of the threads must be blocked from reading until that thread finishes its writing transaction. After that, every read of that field will be safe. The problem is not with reading so much as with writing. A lock on that field whenever it's written should be enough, if you ask me...
First have a look here: Volatile vs. Interlocked vs. lock
The volatile modifier surely is a good option for a multi-core CPU.
But is this enough? It depends on how you calculate the new HedgeVolume value!
If your new HedgeVolume does not depend on the current HedgeVolume, then you're done with volatile.
But if HedgeVolume[x] = f(HedgeVolume[x-1]), then you need some thread synchronisation to guarantee that HedgeVolume doesn't change while you calculate and assign the new value. Both the lock and Interlocked scenarios would be suitable in this case.
I had a similar question and found this article to be extremely helpful. It's a very long read, but I learned a LOT!
I've been reading Joe Duffy's book on Concurrent programming. I have kind of an academic question about lockless threading.
First: I know that lockless threading is fraught with peril (if you don't believe me, read the sections in the book about memory model)
Nevertheless, I have a question:
suppose I have a class with an int property on it.
The value referenced by this property will be read very frequently by multiple threads
It is extremely rare that the value will change, and when it does it will be a single thread that changes it.
If it does change while another operation that uses it is in flight, no one is going to lose a finger (the first thing anyone using it does is copy it to a local variable)
I could use locks (or a ReaderWriterLockSlim to keep the reads concurrent).
I could mark the variable volatile (lots of examples where this is done)
However, even volatile can impose a performance hit.
What if I use VolatileWrite when it changes, and leave the access normal for reads. Something like this:
public class MyClass
{
    private int _TheProperty;

    internal int TheProperty
    {
        get { return _TheProperty; }
        set { System.Threading.Thread.VolatileWrite(ref _TheProperty, value); }
    }
}
I don't think that I would ever try this in real life, but I'm curious about the answer (more than anything, as a checkpoint of whether I understand the memory model stuff I've been reading).
Marking a variable as "volatile" has two effects.
1) Reads and writes have acquire and release semantics, so that reads and writes of other memory locations will not "move forwards and backwards in time" with respect to reads and writes of this memory location. (This is a simplification, but you take my point.)
2) The code generated by the jitter will not "cache" a value that seems to logically be unchanging.
Whether the former point is relevant in your scenario, I don't know; you've only described one memory location. Whether or not it is important that you have only volatile writes but not volatile reads is something that is up to you to decide.
But it seems to me that the latter point is quite relevant. If you have a spin lock on a non-volatile variable:
while(this.prop == 0) {}
the jitter is within its rights to generate this code as though you'd written
if (this.prop == 0) { while (true) {} }
Whether it actually does so or not, I don't know, but it has the right to. If what you want is for the code to actually re-check the property on each go round the loop, marking it as volatile is the right way to go.
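For completeness, the volatile version of that wait, as a minimal sketch:

class Waiter
{
    volatile int prop;   // volatile forces the jitter to re-read the field on each iteration

    void WaitForSignal()
    {
        while (prop == 0) { }   // now genuinely re-checked on every pass
    }
}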
The question is whether the reading thread will ever see the change. It's not just a matter of whether it sees it immediately.
Frankly I've given up on trying to understand volatility - I know it doesn't mean quite what I thought it used to... but I also know that with no kind of memory barrier on the reading thread, you could be reading the same old data forever.
The "performance hit" of volatile is because the compiler now generates code to actually check the value instead of optimizing that away - in other words, you'll have to take that performance hit regardless of what you do.
At the CPU level, yes every processor will eventually see the change to the memory address. Even without locks or memory barriers. Locks and barriers would just ensure that it all happened in a relative ordering (w.r.t other instructions) such that it appeared correct to your program.
The problem isn't cache coherency (I hope Joe Duffy's book doesn't make that mistake). The caches stay coherent; it just takes time, and the processors don't bother to wait for that to happen unless you enforce it. So instead, the processor moves on to the next instruction, which may or may not end up happening before the previous one, because each memory read/write may take a different amount of time. Ironically, because of the time it takes for the processors to agree on coherency, some cache lines become coherent faster than others (i.e. depending on whether the line was Modified, Exclusive, Shared, or Invalid, it takes more or less work to get it into the necessary state).
So a read may appear old or from an out-of-date cache, but really it just happened earlier than expected (typically because of look-ahead and branch prediction). When it really was read, the cache was coherent; it has just changed since then. So the value wasn't old when you read it, but it is now that you need it. You just read it too soon. :-(
Or equivalently, it was written later than the logic of your code thought it would be written.
Or both.
Anyhow, if this were C/C++, even without locks/barriers, you would eventually get the updated values (within a few hundred cycles typically, as that's about how long memory takes). In C/C++ you could use volatile (the weak, non-thread volatile) to ensure that the value wasn't read from a register. (Now there's a non-coherent cache! i.e. the registers.)
In C# I don't know enough about CLR to know how long a value could stay in a register, nor how to ensure you get a real re-read from memory. You've lost the 'weak' volatile.
I would suspect as long as the variable access doesn't completely get compiled away, you will eventually run out of registers (x86 doesn't have many to start with) and get your re-read.
But no guarantees that I see. If you could limit your volatile-read to a particular point in your code that was often, but not too often (ie start of next task in a while(things_to_do) loop) then that might be the best you can do.
This is the pattern I use when the "last writer wins" pattern is applicable to the situation. I had used the volatile keyword, but after seeing this pattern in a code example from Jeffrey Richter, I started using it instead.
For normal things (like memory-mapped devices), the cache-coherency protocols going on within/between the CPU/CPUs is there to ensure that different threads sharing that memory get a consistent view of things (i.e., if I change the value of a memory location in one CPU, it will be seen by other CPUs that have the memory in their caches). In this regard volatile will help to ensure that the optimizer doesn't optimize away memory accesses (which are always going through cache anyway) by, say, reading the value cached in a register. The C# documentation seems pretty clear on this. Again, the application programmer doesn't generally have to deal with cache-coherency themselves.
I highly recommend reading the freely available paper "What Every Programmer Should Know About Memory". A lot of magic goes on under the hood that mostly prevents shooting oneself in the foot.
In C#, reads and writes of the int type are atomic.
Since you said that only one thread writes to it, you should never have contention over what the proper value is, and as long as you are caching a local copy, you should never get dirty data.
You may, however, want to declare it volatile if an OS thread will be doing the update.
Also keep in mind that some operations are not atomic and can cause problems if you have more than one writer. For example, even though the bool type won't corrupt if you have more than one writer, a statement like this:
a = !a;
is not atomic. If two threads read at the same time, you have a race condition.
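If you do need an atomic toggle, one approach (a sketch, not the only way) is to store the flag as an int and retry with Interlocked.CompareExchange, since Interlocked has no bool overloads:

using System.Threading;

class Toggle
{
    int _flag;   // 0 = false, 1 = true; int because Interlocked has no bool overloads

    public bool Flip()
    {
        int seen, flipped;
        do
        {
            seen = Volatile.Read(ref _flag);   // snapshot the current state
            flipped = seen == 0 ? 1 : 0;       // compute the toggled value
        } while (Interlocked.CompareExchange(ref _flag, flipped, seen) != seen);
        return flipped == 1;
    }
}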
In other words, can I do something with a volatile variable that could not also be solved with a normal variable and the Interlocked class?
EDIT: question largely rewritten
To answer this question, I dived a bit further in the matter and found out a few things about volatile and Interlocked that I wasn't aware of. Let's clear that out, not only for me, but for this discussion and other people reading up on this:
volatile reads/writes are supposed to be immune to reordering. This covers only the read and write themselves; it does not cover any other action;
volatility is not enforced at the CPU, i.e. hardware, level (on x86, ordinary reads and writes already have acquire and release semantics). It does prevent compiler and CLR optimizations;
Interlocked uses atomic, lock-prefixed assembly instructions for CompareExchange (cmpxchg), Increment (inc), etc.;
Interlocked does use a lock sometimes: a hardware lock (a bus or cache-line lock) on multi-processor systems; on uni-processor systems there is no hardware lock;
Interlocked is different from volatile in that it uses a full fence, where volatile uses a half fence.
A read following a write can be reordered when you use volatile; that can't happen with Interlocked. VolatileRead and VolatileWrite have the same reordering issue as volatile (link thanks to Brian Gideon).
Now that we have the rules, we can define an answer to your question:
Technically: yes, there are things you can do with volatile that you cannot do with Interlocked:
Syntax: with Interlocked you cannot simply write a = b; with a volatile field, plain reads and assignments work, but this is obvious;
You can read a different value after you write it to a volatile variable because of reordering. You cannot do this with Interlocked. In other words: you can be less safe with volatile than you can be with Interlocked.
Performance: volatile is faster than Interlocked.
Semantically: no, because Interlocked simply provides a superset of operations and is safer to use because it applies full fencing. You can't do anything with volatile that you cannot do with Interlocked and you can do a lot with Interlocked that you cannot do with volatile:
static volatile int x = 0;
x++; // non-atomic read-modify-write, even though x is volatile
static int y = 0;
Interlocked.Increment(ref y); // atomic
Scope: yes, declaring a variable volatile makes it volatile for every single access. It is impossible to force this behavior any other way, hence volatile cannot be replaced with Interlocked. This is needed in scenarios where other libraries, interfaces or hardware can access your variable and update it anytime, or need the most recent version.
If you ask me, this last bit is the actual real need for volatile and may make it ideal where two processes share memory and need to read or write without locking. Declaring a variable as volatile is much safer in this context than expecting all programmers to use Interlocked (which the compiler cannot enforce).
EDIT: The following quote was part of my original answer, I'll leave it in ;-)
A quote from the C# Programming Language standard:

For non-volatile fields, optimization techniques that reorder instructions can lead to unexpected and unpredictable results in multi-threaded programs that access fields without synchronization such as that provided by the lock-statement. These optimizations can be performed by the compiler, by the runtime system, or by hardware. For volatile fields, such reordering optimizations are restricted:

A read of a volatile field is called a volatile read. A volatile read has "acquire semantics"; that is, it is guaranteed to occur prior to any references to memory that occur after it in the instruction sequence.

A write of a volatile field is called a volatile write. A volatile write has "release semantics"; that is, it is guaranteed to happen after any memory references prior to the write instruction in the instruction sequence.
Update: question largely rewritten, corrected my original response and added a "real" answer
This is a fairly complex topic. I find Joseph Albahari's writeup to be one of the more definitive and accurate sources for multithreading concepts in the .NET Framework that might help answer your question.
But, to quickly summarize: there is a lot of overlap between the volatile keyword and the Interlocked class as far as how they can be used. And of course both go way above and beyond what you can do with a normal variable.
Yes - you can look at the value directly.
As long as you ONLY use the Interlocked class to access the variable then there is no difference. What volatile does is it tells the compiler that the variable is special and when optimizing it shouldn't assume that the value hasn't changed.
Take this loop:
bool done = false;
...
while (!done)
{
    // ... do stuff which the compiler can prove doesn't touch done ...
}
If you set done to true in another thread, you would expect the loop to exit. However, if done is not marked as volatile, the compiler is allowed to notice that the loop body can never change done, and it can optimize away the exit comparison.
This is one of the difficult things about multithreaded programming - many of the problems only come up in certain situations.
I won't attempt to be an authority on this subject but I would highly recommend that you take a look at this article by the vaunted Jon Skeet.
Also take a look at the final part of this answer which details what volatile should be used for.
Yes, you can gain some performance by using a volatile variable instead of a lock.
lock is a full memory barrier and can give you the same characteristics as a volatile variable, as well as many others. As has already been said, volatile just ensures that in multi-threaded scenarios, if a CPU changes a value in its cache line, the other CPUs see the value immediately; it does not provide any locking semantics at all.
The thing is, lock is a lot more powerful than volatile, and you should use volatile when you can to avoid unnecessary locks.