Reading value before InterLocked.CompareExchange [duplicate]

Reading value before InterLocked.CompareExchange [duplicate] - c#

The following example comes from the MSDN.
public class ThreadSafe
{
// Field totalValue contains a running total that can be updated
// by multiple threads. It must be protected from unsynchronized
// access.
private float totalValue = 0.0F;
// The Total property returns the running total.
public float Total { get { return totalValue; }}
// AddToTotal safely adds a value to the running total.
public float AddToTotal(float addend)
{
float initialValue, computedValue;
do
{
// Save the current running total in a local variable.
initialValue = totalValue;
// Add the new value to the running total.
computedValue = initialValue + addend;
// CompareExchange compares totalValue to initialValue. If
// they are not equal, then another thread has updated the
// running total since this loop started. CompareExchange
// does not update totalValue. CompareExchange returns the
// contents of totalValue, which do not equal initialValue,
// so the loop executes again.
}
while (initialValue != Interlocked.CompareExchange(ref totalValue,
computedValue, initialValue));
// If no other thread updated the running total, then
// totalValue and initialValue are equal when CompareExchange
// compares them, and computedValue is stored in totalValue.
// CompareExchange returns the value that was in totalValue
// before the update, which is equal to initialValue, so the
// loop ends.
// The function returns computedValue, not totalValue, because
// totalValue could be changed by another thread between
// the time the loop ends and the function returns.
return computedValue;
}
}
Should the totalValue not be declared as volatile to get the freshest value possible? I imagine that if you get a dirty value from a CPU cache then the call to Interlocked.CompareExchange should take care of getting the freshest value and cause the loop to try again. Would the volatile keyword potentially save one unnecessary loop?
I guess it isn't 100% necessary to have the volatile keyword because the method has overloads that takes datatype such as long that don't support the volatile keyword.

No, volatile wouldn't be helpful at all, and certainly not for this reason. It would just give that first read "acquire" semantics instead of effectively relaxed, but either way will compile to similar asm that runs a load instruction.
if you get a dirty value from a CPU cache
CPU caches are coherent, so anything you read from CPU cache is the current globally agreed-on value for this line. "Dirty" just means it doesn't match DRAM contents, and will have to get written-back if / when evicted. A load value can also be forwarded from the store buffer, for a value this thread stored recently that isn't yet globally visible, but that's fine, Interlocked methods are full barriers that result in waiting for the store buffer to drain as well.
If you mean stale, then no, that's impossible, cache coherency protocols like MESI prevent that. This is why Interlocked things like CAS aren't horribly slow if the cache line is already owned by this core (MESI Modified or Exclusive state). See Myths Programmers Believe about CPU Caches which talks some about Java volatiles, which I think are similar to C# volatile.
This C++11 answer also explains some about cache coherency and asm. (Note that C++11 volatile is significantly different from C#, and doesn't imply any thread-safety or ordering, but does still imply the asm must do a load or a store, not optimize into a register.)
On non-x86, running extra barrier instructions after the initial read (to give those acquire semantics) before you even try a CAS just slows things down. (On x86 including x86-64, a volatile read compiles to the same asm as a plain read, except it prevents compile-time reordering).
A volatile read can't be optimized into just using a value in a register if the current thread just wrote something via a non-interlocked = assignment. That's not helpful either; if we just stored something and remember in a register what we stored, a load that does store-forwarding from the store buffer is morally equivalent to just using the register value.
Most of the good use-cases for lock-free atomics are when contention is lowish, so usually things can succeed without hardware having to wait a long time for access / ownership of the cache line. So you want to make the uncontended case as fast as possible. Avoid volatile even if there was anything to gain from it in highly-contended cases, which I don't think there is anyway.
If you ever did any plain stores (assignments with =, not interlocked RMW), volatile would have an effect on those, too. That might mean waiting for the store buffer to drain before later memory ops in this thread can run, if C# volatile gives semantics like C++ memory_order_seq_cst. In that case, you'd be slowing down the code involving the stores a lot, if you didn't need ordering wrt. other loads/stores. If you did such a store before this CAS code, yeah you'd be waiting until the store (and all previous stores) were globally visible to try reloading it. This would mean a reload + CAS the CPU is waiting to do right after are very likely to not have to spin because the CPU will have ownership of that line, but I think you'd effectively get similar behaviour from the full barrier that's part of an Interlocked CAS.

You could get some insights by studying the source code of the ImmutableInterlocked.Update method:
/// <summary>
/// Mutates a value in-place with optimistic locking transaction semantics
/// via a specified transformation function.
/// The transformation is retried as many times as necessary to win the
/// optimistic locking race.
/// </summary>
public static bool Update<T>(ref T location, Func<T, T> transformer)
where T : class
{
Requires.NotNull(transformer, "transformer");
bool successful;
T oldValue = Volatile.Read(ref location);
do
{
T newValue = transformer(oldValue);
if (ReferenceEquals(oldValue, newValue))
{
// No change was actually required.
return false;
}
T interlockedResult = Interlocked.CompareExchange(ref location,
newValue, oldValue);
successful = ReferenceEquals(oldValue, interlockedResult);
oldValue = interlockedResult; // we already have a volatile read
// that we can reuse for the next loop
}
while (!successful);
return true;
}
You can see that the method starts by making a volatile read on the location argument. I think that there are two reasons for that:
This method has a little twist by avoiding the Interlocked.CompareExchange operation, in case the new value happens to be the same with the already stored value.
The transformer delegate has an unknown computational complexity, so invoking it on a potentially stale value could be much more costly than the cost of the initial Volatile.Read.

It does not matter since Interlocked.CompareExchange inserts memory barriers.
initialValue = totalValue;
At this point the totalValue could be anything. Stale value from cache, just replaced, who knowns. While volatile would prevent reading a cached value, the value might become stale just after it was read, so volatile does not solve anything.
Interlocked.CompareExchange(ref totalValue, computedValue, initialValue)
At this point we have memory barriers that ensures totalValue is up to date. If it is equal to initialValue, then we also know that the initialValue was not stale when we started the computation. If it is not equal we try again, and since we have issued a memory barrier we do not risk getting the same stale value the next iteration.
Edit:
I find it very unlikely that there would be any performance difference. If there is no contention there is little reason for the value to be stale. If there is high contention the time will be dominated by needing to loop.

Related

Variable freshness guarantee in .NET (volatile vs. volatile read)

I have read many contradicting information (msdn, SO etc.) about volatile and VoletileRead (ReadAcquireFence).
I understand the memory access reordering restriction implication of those - what I'm still completely confused about is the freshness guarantee - which is very important for me.
msdn doc for volatile mentions:
(...) This ensures that the most up-to-date value is present in the field at all times.
msdn doc for volatile fields mentions:
A read of a volatile field is called a volatile read. A volatile read has "acquire semantics"; that is, it is guaranteed to occur prior to any references to memory that occur after it in the instruction sequence.
.NET code for VolatileRead is:
public static int VolatileRead(ref int address)
{
int ret = address;
MemoryBarrier(); // Call MemoryBarrier to ensure the proper semantic in a portable way.
return ret;
}
According to msdn MemoryBarrier doc Memory barrier prevents reordering. However this doesn't seem to have any implications on freshness - correct?
How then one can get freshness guarantee?
And is there difference between marking field volatile and accessing it with VolatileRead and VolatileWrite semantic? I'm currently doing the later in my performance critical code that needs to guarantee freshness, however readers sometimes get stale value. I'm wondering if marking the state volatile will make situation different.
EDIT1:
What I'm trying to achieve - get the guarantee that reader threads will get as recent value of shared variable (written by multiple writers) as possible - ideally no older than what is the cost of context switch or other operations that may postpone the immediate write of state.
If volatile or higher level construct (e.g. lock) have this guarantee (do they?) than how do they achieve this?
EDIT2:
The very condensed question should have been - how do I get guarantee of as fresh value during reads as possible? Ideally without locking (as exclusive access is not needed and there is potential for high contention).
From what I learned here I'm wondering if this might be the solution (solving(?) line is marked with comment):
private SharedState _sharedState;
private SpinLock _spinLock = new SpinLock(false);
public void Update(SharedState newValue)
{
bool lockTaken = false;
_spinLock.Enter(ref lockTaken);
_sharedState = newValue;
if (lockTaken)
{
_spinLock.Exit();
}
}
public SharedState GetFreshSharedState
{
get
{
Thread.MemoryBarrier(); // <---- This is added to give readers freshness guarantee
var value = _sharedState;
Thread.MemoryBarrier();
return value;
}
}
The MemoryBarrier call was added to make sure both - reads and writes - are wrapped by full fences (same as lock code - as indicated here http://www.albahari.com/threading/part4.aspx#_The_volatile_keyword 'Memory barriers and locking' section)
Does this look correct or is it flawed?
EDIT3:
Thanks to very interesting discussions here I learned quite a few things and I actually was able to distill to the simplified unambiguous question that I have about this topic. It's quite different from the original one so I rather posted a new one here: Memory barrier vs Interlocked impact on memory caches coherency timing

I think this is a good question. But, it is also difficult to answer. I am not sure I can give you a definitive answer to your questions. It is not your fault really. It is just that the subject matter is complex and really requires knowing details that might not be feasible to enumerate. Honestly, it really seems like you have educated yourself on the subject quite well already. I have spent a lot of time studying the subject myself and I still do not fully understand everything. Nevertheless, I will still attempt an some semblance of an answer here anyway.
So what does it mean for a thread to read a fresh value anyway? Does it mean the value returned by the read is guaranteed to be no older than 100ms, 50ms, or 1ms? Or does it mean the value is the absolute latest? Or does it mean that if two reads occur back-to-back then the second is guaranteed to get a newer value assuming the memory address changed after the first read? Or does it mean something else altogether?
I think you are going to have a hard time getting your readers to work correctly if you are thinking about things in terms of time intervals. Instead think of things in terms of what happens when you chain reads together. To illustrate my point consider how you would implement an interlocked-like operation using arbitrarily complex logic.
public static T InterlockedOperation<T>(ref T location, T operand)
{
T initial, computed;
do
{
initial = location;
computed = op(initial, operand); // where op is replaced with a specific implementation
}
while (Interlocked.CompareExchange(ref location, computed, initial) != initial);
return computed;
}
In the code above we can create any interlocked-like operation if we exploit the fact that the second read of location via Interlocked.CompareExchange will be guaranteed to return a newer value if the memory address received a write after the first read. This is because the Interlocked.CompareExchange method generates a memory barrier. If the value has changed between reads then the code spins around the loop repeatedly until location stops changing. This pattern does not require that the code use the latest or freshest value; just a newer value. The distinction is crucial.1
A lot of lock free code I have seen works on this principal. That is the operations are usually wrapped into loops such that the operation is continually retried until it succeeds. It does not assume that the first attempt is using the latest value. Nor does it assume every use of the value be the latest. It only assumes that the value is newer after each read.
Try to rethink how your readers should behave. Try to make them more agnostic about the age of the value. If that is simply not possible and all writes must be captured and processed then you may be forced into a more deterministic approach like placing all writes into a queue and having the readers dequeue them one-by-one. I am sure the ConcurrentQueue class would help in that situation.
If you can reduce the meaning of "fresh" to only "newer" then placing a call to Thread.MemoryBarrier after each read, using Volatile.Read, using the volatile keyword, etc. will absolutely guarantee that one read in a sequence will return a newer value than a previous read.
1The ABA problem opens up a new can of worms.

A memory barrier does provide this guarantee. We can derive the "freshness" property that you are looking for from the reording properties that a barrier guarantees.
By freshness you probably mean that a read returns the value of the most recent write.
Let's say we have these operations, each on a different thread:
x = 1
x = 2
print(x)
How could we possibly print a value other than 2? Without volatile the read can move one slot upwards and return 1. Volatile prevents reorderings, though. The write cannot move backwards in time.
In short, volatile guarantees you to see the most recent value.
Strictly speaking I'd need to differentiate between volatile and a memory barrier here. The latter one is a stronger guarantee. I have simplified this discussion because volatile is implemented using memory barriers, at least on x86/x64.

Is lock or volatile required when worker threads write non-competitively to local or class variables?

For the case below, when there is no competition for writes between the worker threads, are locks or volatile still required? Any difference in the answer if "Peek" access is not required at "G".
class A
{
Object _o; // need volatile (position A)?
Int _i; // need volatile (position B)?
Method()
{
Object o;
Int i;
Task [] task = new Task[2]
{
Task.Factory.StartNew(() => {
_o = f1(); // use lock() (position C)?
o = f2(); // use lock() (position D)?
}
Task.Factory.StartNew(() => {
_i = g1(); // use lock() (position E)?
i = g2(); // use lock() (position F)?
}
}
// "Peek" at _o, _i, o, i (position G)?
Task.WaitAll(tasks);
// Use _o, _i, o, i (position H)?
}

The safe thing to do is to not do this in the first place. Don't write a value on one thread and read the value on another thread in the first place. Make a Task<object> and a Task<int> that return the values to the thread that needs them, rather than making tasks that modify variables across threads.
If you are hell bent on writing to variables across threads then you need to guarantee two things. First, that the jitter does not choose optimizations that would cause reads and writes to be moved around in time, and second, that a memory barrier is introduced. The memory barrier limits the processor from moving reads and writes around in time in certain ways.
As Brian Gideon notes in his answer, you get a memory barrier from the WaitAll, but I do not recall offhand if that is a documented guarantee or just an implementation detail.
As I said, I would not do this in the first place. If I were forced to, I would at least make the variables I was writing to marked as volatile.

Writes to reference types (i.e. Object) and word-sized value types (i.e. int in a 32 bit system) are atomic. This means that when you peek at the values (position 6) you can be sure that you either get the old value or the new value, but not something else (if you had a type such as a large struct it could be spliced, and you could read the value when it was half way through being written). You don't need a lock or volatile, so long as you're willing to accept the potential risk of reading stale values.
Note that because there is no memory barrier introduced at this point (a lock or use of volatile both add one) it's possible that the variable has been updated in the other thread, but the current thread isn't observing that change; it can be reading a "stale" value for (potentially) quite some time after it has been changed in the other thread. The use of volatile will ensure that the current thread can observe changes to the variable sooner.
You can be sure that you'll have the appropriate value after the call to WaitAll, even without a lock or volatile.
Also note that while you can be sure the reference to the reference type is written atomically, your program makes no guarantee about the observed order of any changes to the actual object that the reference refers to. Even if, from the point of view of the background thread, the object is initialized before it is assigned to the instance field, it may not happen in that order. The other thread can therefore observe the write of the reference tot he object but then follow that reference and find an object in an initialize, or partially initialized, state. Introducing a memory barrier (i.e. through the use of a volatile variable can potentially allow you to prevent the runtime from making such re-orderings, thus ensuring that doesn't happen. This is why it's better to just not do this in the first place and to just have the two tasks return the results that they generate rather than manipulating a closed over variable.
WaitAll will introduce a memory barrier, in addition to ensuring that the two tasks are actually finished, which means that you know that the variables are up-to-date and will not have the old stale values.

At position G you may observe the values _o and _i may retain their initialized values null and 0 respectively or they may contain the values written by the tasks. It is unpredictable at this position.
However, at position H you force the issue in two different ways. First, you have guaranteed that both tasks finished and thus the writes are completed. Second, Task.WaitAll will generate a memory barrier which will guarantee that the main thread will observe the new values published by the tasks.
So, in this particular example an explicit lock or memory barrier generator (volatile) is not technically required.

Reordering of operations around volatile

I'm currently looking at a copy-on-write set implementation and want to confirm it's thread safe. I'm fairly sure the only way it might not be is if the compiler is allowed to reorder statements within certain methods. For example, the Remove method looks like:
public bool Remove(T item)
{
var newHashSet = new HashSet<T>(hashSet);
var removed = newHashSet.Remove(item);
hashSet = newHashSet;
return removed;
}
Where hashSet is defined as
private volatile HashSet<T> hashSet;
So my question is, given that hashSet is volatile does it mean that the Remove on the new set happens before the write to the member variable? If not, then other threads may see the set before the remove has occurred.
I haven't actually seen any issues with this in production, but I just want to confirm it is guaranteed to be safe.
UPDATE
To be more specific, there's another method to get an IEnumerator:
public IEnumerator<T> GetEnumerator()
{
return hashSet.GetEnumerator();
}
So the more specific question is: is there a guarantee that the returned IEnumerator will never throw a ConcurrentModificationException from a remove?
UPDATE 2
Sorry, the answers are all addressing the thread safety from multiple writers. Good points are raised, but that's not what I'm trying to find out here. I'd like to know if the compiler is allowed to re-order the operations in Remove to something like this:
var newHashSet = new HashSet<T>(hashSet);
hashSet = newHashSet; // swapped
var removed = newHashSet.Remove(item); // swapped
return removed;
If this was possible, it would mean that a thread could call GetEnumerator after hashSet had been assigned, but before item was removed, which could lead to the collection being modified during enumeration.
Joe Duffy has a blog article that states:
Volatile on loads means ACQUIRE, no more, no less. (There are
additional compiler optimization restrictions, of course, like not
allowing hoisting outside of loops, but let’s focus on the MM aspects
for now.) The standard definition of ACQUIRE is that subsequent
memory operations may not move before the ACQUIRE instruction; e.g.
given { ld.acq X, ld Y }, the ld Y cannot occur before ld.acq X.
However, previous memory operations can certainly move after it; e.g.
given { ld X, ld.acq Y }, the ld.acq Y can indeed occur before the ld
X. The only processor Microsoft .NET code currently runs on for which
this actually occurs is IA64, but this is a notable area where CLR’s
MM is weaker than most machines. Next, all stores on .NET are RELEASE
(regardless of volatile, i.e. volatile is a no-op in terms of jitted
code). The standard definition of RELEASE is that previous memory
operations may not move after a RELEASE operation; e.g. given { st X,
st.rel Y }, the st.rel Y cannot occur before st X. However,
subsequent memory operations can indeed move before it; e.g. given {
st.rel X, ld Y }, the ld Y can move before st.rel X.
The way I read this is that the call to newHashSet.Remove requires a ld newHashSet and the write to hashSet requires a st.rel newHashSet. From the above definition of RELEASE no loads can move after the store RELEASE, so the statements cannot be reordered! Could someone confirm please confirm my interpretation is correct?

Consider using Interlocked.Exchange - it will guarantee ordering, or Interlocked.CompareExchange which as side benifit will let you detect (and potentially recover from) simultaneous writes to collection. Clearly it adds some additional level of synchronization so it is different from your current code, but easier to reason about.
public bool Remove(T item)
{
var old = hashSet;
var newHashSet = new HashSet<T>(old);
var removed = newHashSet.Remove(item);
var current = Interlocked.CompareExchange(ref hashSet, newHashSet, old);
if (current != old)
{
// failed... throw or retry...
}
return removed;
}
And I think you'll still need volatile hashSet in this case.

EDITED:
Thanks for clarifying the presence of an external lock for calls to Remove (and other collection mutations).
Because of RELEASE semantics you will not end up storing a new value to hashSet until after the value to the variable removed has been assigned (because st removed can't be moved after st.rel hashSet).
Therefore, the 'snapshot' behaviour of GetEnumerator will work as intended, at least with respect to Remove and other mutators implemented in a similar fashion.

I can't speak for C#, but in C volatile in principle indicates and only indicates that the content of the variable could change at any time. It offers no constraints regarding compiler or CPU re-ordering. All you get is that the compiler/CPU will always read the value from memory, rather than trusting a cached version.
I believe however in recent MSVC (and so quite possibly C#), reading a volatile acts as a memoy barrier for loads and writing acts as a memory barrier for stores, e.g. the CPU will stall till all loads have completed and no loads may escape this by being re-ordered below the volatile read (although later independent loads may still move up above the memory barrier); and the CPU will stall till are stores have completed and no stores may escape this by being re-ordered below the volatile write (although later independent writes may still move up above the memory barrier).
When only a single thread is writing a given variable (but many are reading), only memory barriers are required for correct operation. When multiple threads may write to a given variable, atomic operations have to be used as CPU design is such that there is basically a race condition on write, such that a write can be lost.

Fields read from/written by several threads, Interlocked vs. volatile

There are a fair share of questions about Interlocked vs. volatile here on SO, I understand and know the concepts of volatile (no re-ordering, always reading from memory, etc.) and I am aware of how Interlocked works in that it performs an atomic operation.
But my question is this: Assume I have a field that is read from several threads, which is some reference type, say: public Object MyObject;. I know that if I do a compare exchange on it, like this: Interlocked.CompareExchange(ref MyObject, newValue, oldValue) that interlocked guarantees to only write newValue to the memory location that ref MyObject refers to, if ref MyObject and oldValue currently refer to the same object.
But what about reading? Does Interlocked guarantee that any threads reading MyObject after the CompareExchange operation succeeded will instantly get the new value, or do I have to mark MyObject as volatile to ensure this?
The reason I'm wondering is that I've implemented a lock-free linked list which updates the "head" node inside itself constantly when you prepend an element to it, like this:
[System.Diagnostics.DebuggerDisplay("Length={Length}")]
public class LinkedList<T>
{
LList<T>.Cell head;
// ....
public void Prepend(T item)
{
LList<T>.Cell oldHead;
LList<T>.Cell newHead;
do
{
oldHead = head;
newHead = LList<T>.Cons(item, oldHead);
} while (!Object.ReferenceEquals(Interlocked.CompareExchange(ref head, newHead, oldHead), oldHead));
}
// ....
}
Now after Prepend succeeds, are the threads reading head guaranteed to get the latest version, even though it's not marked as volatile?
I have been doing some empirical tests, and it seems to be working fine, and I have searched here on SO but not found a definitive answer (a bunch of different questions and comments/answer in them all say conflicting things).

Does Interlocked guarantee that any threads reading MyObject after the CompareExchange operation succeeded will instantly get the new value, or do I have to mark MyObject as volatile to ensure this?
Yes, subsequent reads on the same thread will get the new value.
Your loop unrolls to this:
oldHead = head;
newHead = ... ;
Interlocked.CompareExchange(ref head, newHead, oldHead) // full fence
oldHead = head; // this read cannot move before the fence
EDIT:
Normal caching can happen on other threads. Consider:
var copy = head;
while ( copy == head )
{
}
If you run that on another thread, the complier can cache the value of head and never see the update.

Your code should work fine. Though it is not clearly documented the Interlocked.CompareExchange method will produce a full-fence barrier. I suppose you could make one small change and omit the Object.ReferenceEquals call in favor of relying on the != operator which would perform reference equality by default.
For what it is worth the documentation for the InterlockedCompareExchange Win API call is much better.
This function generates a full memory barrier (or fence) to ensure
that memory operations are completed in order.
It is a shame the same level documenation does not exist on the .NET BCL counterpart Interlocked.CompareExchange because it is very likely they map to the exact same underlying mechanisms for the CAS.
Now after Prepend succeeds, are the threads reading head guaranteed to
get the latest version, even though it's not marked as volatile?
No, not necessarily. If those threads do not generate an acquire-fence barrier then there is no guarantee that they will read the latest value. Make sure that you perform a volatile read upon any use of head. You have already ensured that in Prepend with the Interlocked.CompareExchange call. Sure, that code may go through the loop once with an stale value of head, but the next iteration will be refreshed due to the Interlocked operation.
So if the context of your question was in regards to other threads who are also executing Prepend then nothing more needs to be done.
But if the context of your question was in regards to other threads executing another method on LinkedList then make sure you use Thread.VolatileRead or Interlocked.CompareExchange where appropriate.
Side note...there may be a micro-optimization that could be performed on the following code.
newHead = LList<T>.Cons(item, oldHead);
The only problem I see with this is that memory is allocated on every iteration of the loop. During periods of high contention the loop may spin around several times before it finally succeeds. You could probably lift this line outside of the loop as long as you reassign the linked reference to oldHead on every iteration (so that you get the fresh read). This way memory is only allocated once.

Using memory barriers

In the following code sample, does the memory barrier in FuncA is required to ensure that the most up-to-date value is read?
class Foo
{
DateTime m_bar;
void FuncA() // invoked by thread X
{
Thread.MemoryBarrier(); // is required?
Console.WriteLine(m_bar);
}
void FuncB() // invoked by thread Y
{
m_bar = DateTime.Now;
}
}
EDIT: If not, how can I ensure that FuncA will read the most recent value? (I want to make sure that the recent value is actually store in the processor's cache) [wihout using locks]

Looks like a big "No" to me. Thread.MemoryBarrier() only syncs up memory access within the thread that implemented it.
From MSDN:
The processor executing the current thread cannot reorder instructions in such a way that memory accesses prior to the call to MemoryBarrier execute after memory accesses that follow the call to MemoryBarrier.

I suggest you store Datetime as number of ticks (it is of type "long", i.e. Int64), you can easily transform from ticks (new DateTime (ticks)) and to ticks (myDateTime.Ticks). Then you can use Interlocked.Read to read value and Interlocked.Exchange to write value in fast non-locking operations.

Yes, the memory barrier is needed so that you can get the most up to date value.
If the memory barrier is not present then it is possible for thread X to read the value of m_bar from its own cache line while that value hasn't been written back to main memory (the change has been make local to Thread Y). You can achieve the same effect by declaring the variable as volatile:
The volatile modifier is usually used for a field that is accessed by multiple threads without using the lock statement to serialize access. Using the volatile modifier ensures that one thread retrieves the most up-to-date value written by another thread.
A good entry on that matter (probably the best) is this one by Joe Duffy: Volatile reads and writes, and timeliness

A memory barrier does infact the same thing as wat locking does, guarantee the field will get its
latest value from memory on entering the lock and be written to memory before exiting the lock.
Making sure a field's value is always read or written to memory and never optimized by reading or writing
it first to the cpu's cache can also be achieved by using the volatile keyword.
Unless primitive integral types and reference types DateTime cannot be cached in CPU registers ad so
need not (and cannot) be declared with the volatile keyword.

This actually doesn't matter since on 32bit architectures one can get a torn read in such situation

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.