I have read many contradicting information (msdn, SO etc.) about volatile and VoletileRead (ReadAcquireFence).
I understand the memory access reordering restriction implication of those - what I'm still completely confused about is the freshness guarantee - which is very important for me.
msdn doc for volatile mentions:
(...) This ensures that the most up-to-date value is present in the field at all times.
msdn doc for volatile fields mentions:
A read of a volatile field is called a volatile read. A volatile read has "acquire semantics"; that is, it is guaranteed to occur prior to any references to memory that occur after it in the instruction sequence.
.NET code for VolatileRead is:
public static int VolatileRead(ref int address)
{
int ret = address;
MemoryBarrier(); // Call MemoryBarrier to ensure the proper semantic in a portable way.
return ret;
}
According to msdn MemoryBarrier doc Memory barrier prevents reordering. However this doesn't seem to have any implications on freshness - correct?
How then one can get freshness guarantee?
And is there difference between marking field volatile and accessing it with VolatileRead and VolatileWrite semantic? I'm currently doing the later in my performance critical code that needs to guarantee freshness, however readers sometimes get stale value. I'm wondering if marking the state volatile will make situation different.
EDIT1:
What I'm trying to achieve - get the guarantee that reader threads will get as recent value of shared variable (written by multiple writers) as possible - ideally no older than what is the cost of context switch or other operations that may postpone the immediate write of state.
If volatile or higher level construct (e.g. lock) have this guarantee (do they?) than how do they achieve this?
EDIT2:
The very condensed question should have been - how do I get guarantee of as fresh value during reads as possible? Ideally without locking (as exclusive access is not needed and there is potential for high contention).
From what I learned here I'm wondering if this might be the solution (solving(?) line is marked with comment):
private SharedState _sharedState;
private SpinLock _spinLock = new SpinLock(false);
public void Update(SharedState newValue)
{
bool lockTaken = false;
_spinLock.Enter(ref lockTaken);
_sharedState = newValue;
if (lockTaken)
{
_spinLock.Exit();
}
}
public SharedState GetFreshSharedState
{
get
{
Thread.MemoryBarrier(); // <---- This is added to give readers freshness guarantee
var value = _sharedState;
Thread.MemoryBarrier();
return value;
}
}
The MemoryBarrier call was added to make sure both - reads and writes - are wrapped by full fences (same as lock code - as indicated here http://www.albahari.com/threading/part4.aspx#_The_volatile_keyword 'Memory barriers and locking' section)
Does this look correct or is it flawed?
EDIT3:
Thanks to very interesting discussions here I learned quite a few things and I actually was able to distill to the simplified unambiguous question that I have about this topic. It's quite different from the original one so I rather posted a new one here: Memory barrier vs Interlocked impact on memory caches coherency timing
I think this is a good question. But, it is also difficult to answer. I am not sure I can give you a definitive answer to your questions. It is not your fault really. It is just that the subject matter is complex and really requires knowing details that might not be feasible to enumerate. Honestly, it really seems like you have educated yourself on the subject quite well already. I have spent a lot of time studying the subject myself and I still do not fully understand everything. Nevertheless, I will still attempt an some semblance of an answer here anyway.
So what does it mean for a thread to read a fresh value anyway? Does it mean the value returned by the read is guaranteed to be no older than 100ms, 50ms, or 1ms? Or does it mean the value is the absolute latest? Or does it mean that if two reads occur back-to-back then the second is guaranteed to get a newer value assuming the memory address changed after the first read? Or does it mean something else altogether?
I think you are going to have a hard time getting your readers to work correctly if you are thinking about things in terms of time intervals. Instead think of things in terms of what happens when you chain reads together. To illustrate my point consider how you would implement an interlocked-like operation using arbitrarily complex logic.
public static T InterlockedOperation<T>(ref T location, T operand)
{
T initial, computed;
do
{
initial = location;
computed = op(initial, operand); // where op is replaced with a specific implementation
}
while (Interlocked.CompareExchange(ref location, computed, initial) != initial);
return computed;
}
In the code above we can create any interlocked-like operation if we exploit the fact that the second read of location via Interlocked.CompareExchange will be guaranteed to return a newer value if the memory address received a write after the first read. This is because the Interlocked.CompareExchange method generates a memory barrier. If the value has changed between reads then the code spins around the loop repeatedly until location stops changing. This pattern does not require that the code use the latest or freshest value; just a newer value. The distinction is crucial.1
A lot of lock free code I have seen works on this principal. That is the operations are usually wrapped into loops such that the operation is continually retried until it succeeds. It does not assume that the first attempt is using the latest value. Nor does it assume every use of the value be the latest. It only assumes that the value is newer after each read.
Try to rethink how your readers should behave. Try to make them more agnostic about the age of the value. If that is simply not possible and all writes must be captured and processed then you may be forced into a more deterministic approach like placing all writes into a queue and having the readers dequeue them one-by-one. I am sure the ConcurrentQueue class would help in that situation.
If you can reduce the meaning of "fresh" to only "newer" then placing a call to Thread.MemoryBarrier after each read, using Volatile.Read, using the volatile keyword, etc. will absolutely guarantee that one read in a sequence will return a newer value than a previous read.
1The ABA problem opens up a new can of worms.
A memory barrier does provide this guarantee. We can derive the "freshness" property that you are looking for from the reording properties that a barrier guarantees.
By freshness you probably mean that a read returns the value of the most recent write.
Let's say we have these operations, each on a different thread:
x = 1
x = 2
print(x)
How could we possibly print a value other than 2? Without volatile the read can move one slot upwards and return 1. Volatile prevents reorderings, though. The write cannot move backwards in time.
In short, volatile guarantees you to see the most recent value.
Strictly speaking I'd need to differentiate between volatile and a memory barrier here. The latter one is a stronger guarantee. I have simplified this discussion because volatile is implemented using memory barriers, at least on x86/x64.
Related
(This is a repeat of: How to correctly read an Interlocked.Increment'ed int field? but, after reading the answers and comments, I'm still not sure of the right answer.)
There's some code that I don't own and can't change to use locks that increments an int counter (numberOfUpdates) in several different threads. All calls use:
Interlocked.Increment(ref numberOfUpdates);
I want to read numberOfUpdates in my code. Now since this is an int, I know that it can't tear. But what's the best way to ensure that I get the latest value possible? It seems like my options are:
int localNumberOfUpdates = Interlocked.CompareExchange(ref numberOfUpdates, 0, 0);
Or
int localNumberOfUpdates = Thread.VolatileRead(numberOfUpdates);
Will both work (in the sense of delivering the latest value possible regardless of optimizations, re-orderings, caching, etc.)? Is one preferred over the other? Is there a third option that's better?
I'm a firm believer in that if you're using interlocked to increment shared data, then you should use interlocked everywhere you access that shared data. Likewise, if you use insert you favorite synchronization primitive here to increment shared data, then you should use insert you favorite synchronization primitive here everywhere you access that shared data.
int localNumberOfUpdates = Interlocked.CompareExchange(ref numberOfUpdates, 0, 0);
Will give you exactly what your looking for. As others have said interlocked operations are atomic. So Interlocked.CompareExchange will always return the most recent value. I use this all the time for accessing simple shared data like counters.
I'm not as familiar with Thread.VolatileRead, but I suspect it will also return the most recent value. I'd stick with interlocked methods, if only for the sake of being consistent.
Additional info:
I'd recommend taking a look at Jon Skeet's answer for why you may want to shy away from Thread.VolatileRead(): Thread.VolatileRead Implementation
Eric Lippert discusses volatility and the guarantees made by the C# memory model in his blog at http://blogs.msdn.com/b/ericlippert/archive/2011/06/16/atomicity-volatility-and-immutability-are-different-part-three.aspx. Straight from the horses mouth: "I don't attempt to write any low-lock code except for the most trivial usages of Interlocked operations. I leave the usage of "volatile" to real experts."
And I agree with Hans's point that the value will always be stale at least by a few ns, but if you have a use case where that is unacceptable, its probably not well suited for a garbage collected language like C# or a non-real-time OS. Joe Duffy has a good article on the timeliness of interlocked methods here: http://joeduffyblog.com/2008/06/13/volatile-reads-and-writes-and-timeliness/
Thread.VolatileRead(numberOfUpdates) is what you want. numberOfUpdates is an Int32, so you already have atomicity by default, and Thread.VolatileRead will ensure volatility is dealt with.
If numberOfUpdates is defined as volatile int numberOfUpdates; you don't have to do
this, as all reads of it will already be volatile reads.
There seems to be confusion about whether Interlocked.CompareExchange is more appropriate. Consider the following two excerpts from the documentation.
From the Thread.VolatileRead documentation:
Reads the value of a field. The value is the latest written by any processor in a computer, regardless of the number of processors or the state of processor cache.
From the Interlocked.CompareExchange documentation:
Compares two 32-bit signed integers for equality and, if they are equal, replaces one of the values.
In terms of the stated behavior of these methods, Thread.VolatileRead is clearly more appropriate. You do not want to compare numberOfUpdates to another value, and you do not want to replace its value. You want to read its value.
Lasse makes a good point in his comment: you might be better off using simple locking. When the other code wants to update numberOfUpdates it does something like the following.
lock (state)
{
state.numberOfUpdates++;
}
When you want to read it, you do something like the following.
int value;
lock (state)
{
value = state.numberOfUpdates;
}
This will ensure your requirements of atomicity and volatility without delving into more-obscure, relatively low-level multithreading primitives.
Will both work (in the sense of delivering the latest value possible regardless of optimizations, re-orderings, caching, etc.)?
No, the value you get is always stale. How stale the value might be is entirely unpredictable. The vast majority of the time it will be stale by a few nanoseconds, give or take, depending how quickly you act on the value. But there is no reasonable upper-bound:
your thread can lose the processor when it context-switches another thread onto the core. Typical delays are around 45 msec with no firm upper-bound. This does not mean that another thread in your process also gets switched-out, it can keep motoring and continue to mutate the value.
just like any user-mode code, your code is subjected to page-faults as well. Incurred when the processor needs RAM for another process. On a heavily loaded machine that can and will page-out active code. As sometimes happens to the mouse driver code for example, leaving a frozen mouse cursor.
managed threads are subject to near-random garbage collection pauses. Tends to be the lesser problem since it is likely that another thread that's mutating the value will be paused as well.
Whatever you do with the value needs to take this into account. Needless to say perhaps, that's very, very difficult. Practical examples are hard to come by. The .NET Framework is a very large chunk of battle-scarred code. You can see the cross-reference to usage of VolatileRead from the Reference Source. Number of hits: 0.
Well, any value you read will always be somewhat stale as Hans Passant said. You can only control a guarantee that other shared values are consistent with the one you've just read using memory fences in the middle of code reading several shared values without locks (ie: are at the same degree of "staleness")
Fences also have the effect of defeating some compiler optimizations and reordering thus preventing unexpected behavior in release mode on different platforms.
Thread.VolatileRead will cause a full memory fence to be emitted so that no reads or writes can be reordered around your read of the int (in the method that's reading it). Obviously if you're only reading a single shared value (and you're not reading something else shared and the order and consistency of them both is important), then it may not seem necessary...
But I think that you will need it anyway to defeat some optimizations by the compiler or CPU so that you don't get the read more "stale" than necessary.
A dummy Interlocked.CompareExchange will do the same thing as Thread.VolatileRead (full fence and optimization defeating behavior).
There is a pattern followed in the framework used by CancellationTokenSource
http://referencesource.microsoft.com/#mscorlib/system/threading/CancellationTokenSource.cs#64
//m_state uses the pattern "volatile int32 reads, with cmpxch writes" which is safe for updates and cannot suffer torn reads.
private volatile int m_state;
public bool IsCancellationRequested
{
get { return m_state >= NOTIFYING; }
}
// ....
if (Interlocked.CompareExchange(ref m_state, NOTIFYING, NOT_CANCELED) == NOT_CANCELED) {
}
// ....
The volatile keyword has the effect of emitting a "half" fence. (ie: it blocks reads/writes from being moved before the read to it, and blocks reads/writes from being moved after the write to it).
It seems like my options are:
int localNumberOfUpdates = Interlocked.CompareExchange(ref numberOfUpdates, 0, 0);
Or
int localNumberOfUpdates = Thread.VolatileRead(numberOfUpdates);
Starting from the .NET Framework 4.5, there is a third option:
int localNumberOfUpdates = Volatile.Read(ref numberOfUpdates);
The Interlocked methods impose full fences, while the Volatile methods impose half fences¹. So using the static methods of the Volatile class is a potentially more economic way of reading atomically the latest value of an int variable or field.
Alternatively, if the numberOfUpdates is a field of a class, you could declare it as volatile. Reading a volatile field is equivalent with reading it with the Volatile.Read method.
I should mention one more option, which is to simply read the numberOfUpdates directly, without the help of either the Interlocked or the Volatile. We are not supposed to do this, but demonstrating an actual problem caused by doing so might be impossible. The reason is that the memory models of the most commonly used CPUs are stronger than the C# memory model. So if your machine has such a CPU (for example x86 or x64), you won't be able to write a program that fails as a result of reading directly the field. Nevertheless personally I never use this option, because I am not an expert in CPU architectures and memory protocols, nor I have the desire to become one. So I prefer to use either the Volatile class or the volatile keyword, whatever is more convenient in each case.
¹ With some exceptions, like reading/writing an Int64 or double on a 32-bit machine.
Not sure why nobody mentioned Interlocked.Add(ref localNumberOfUpdates, 0), but seems the simplest way to me...
I've found in our project's code the following implementation of double-lock checking:
public class SomeComponent
{
private readonly object mutex = new object();
public SomeComponent()
{
}
public bool IsInitialized { get; private set; }
public void Initialize()
{
this.InitializeIfRequired();
}
protected virtual void InitializeIfRequired()
{
if (!this.OnRequiresInitialization())
{
return;
}
lock (this.mutex)
{
if (!this.OnRequiresInitialization())
{
return;
}
try
{
this.OnInitialize();
}
catch (Exception)
{
throw;
}
this.IsInitialized = true;
}
}
protected virtual void OnInitialize()
{
//some code here
}
protected virtual bool OnRequiresInitialization()
{
return !this.IsInitialized;
}
}
From my point of view, this is the wrong implementation due to the absence of guarantees that different threads will see the freshest value of the IsInitialized property.
And the question is "Am I right?".
Update:
The scenario that I'm afraid to happen, is the following:
Step 1. Thread1 is executed on Processor1 and writes true into IsInitialized inside the lock section. This time old value of IsInitialized ( it's false) is in the cache of Processor1 As we know, processors have store buffers, so Processor1 can put new value (true) into its store buffer, not into its cache.
Step 2. Thread2 is inside InitializeIfRequired, executed on Processor2 and reads IsInitialized. There is no value of IsInitialized inside the cache of Processor2, so Processor2 ask the value of IsInitialized from other processors' caches or from memory. Processor1 has the value of IsInitialized inside its cache ( but remember it's old value, the updated value is still in the store buffer of Processor1 ), so it sends the old value to Processor2. As a result, Thread2 can read false instead of true.
Update 2:
If the lock (this.mutex) flushes processors' store buffers, then everything is ok, but is that guaranteed?
this is the wrong implementation due to the absence of guarantees that different threads will see the freshest value of the IsInitialized property. The question is "Am I right?".
You are correct that this is a broken implementation of double-checked locking. You are wrong in multiple subtle ways about why it is wrong.
First, let's disabuse you of your wrongness.
The belief that there is a "freshest" value of any variable in a multithreaded program is a bad belief, for two reasons. The first reason is that yes, C# makes guarantees about certain constraints on how reads and writes may be re-ordered. However, those guarantees do not include any promise that a globally consistent ordering exists and can be deduced by all threads. It is legal in the C# memory model for there to be reads and writes on variables, and for there to be ordering constraints on those reads and writes. But in cases where those constraints are not strong enough to enforce exactly one ordering of reads and writes, it is permissible for there to be no "canonical" order observed by all threads. It is permitted for two threads to agree that the constraints were all met, but still disagree upon what order was chosen. This logically implies that the notion that there is a single, canonical "freshest" value for each variable is simply wrong. Different threads can disagree as to which writes are "fresher" than others.
The second reason is that even without this weird property that the model admits two threads to disagree on the sequence of reads and writes, it would still be wrong to say that in any low-lock program you have a way to read the "freshest" value. All the primitive operations you have guarantee you is that certain writes and reads will not be moved forwards or backwards in time past certain points in the code. Nothing in there says anything whatsoever about "freshest", whatever that means. The best you can say is that some reads will read a fresher value. The notion of "freshest" is not defined by the memory model.
Another way you are wrong is very subtle indeed. You are doing a great job of reasoning about what might happen based on processors flushing caches. But nowhere in the C# documentation does it say one word about processors flushing caches! That's a chip implementation detail that is subject to change any time your C# program runs on a different architecture. Do not reason about processors flushing caches unless you know your program will run on exactly one architecture, and that you thoroughly understand that architecture. Rather, reason about the constraints imposed by the memory model. I am aware that the documentation on the model is sorely lacking, but that's the thing you should be reasoning about, because that's what you can actually depend on.
The other way that you are wrong is that though yes, the implementation is broken, it is not broken because you are not reading an up-to-date value of the initialized flag. The problem is that the initialized state that is controlled by the flag is not subject to restrictions on being moved around in time!
Let's make your example a bit more concrete:
private C c = null;
protected virtual void OnInitialize()
{
c = new C();
}
And a usage site:
this.InitializeIfRequired();
this.c.Frob();
Now we come to the real problem. Nothing is stopping the reads of IsInitialized and c from being moved around in time.
Suppose threads Alpha and Bravo are both running this code. Thread Bravo wins the race and the first thing it does is reads c as null. Remember, it is allowed to do so because there is no ordering constraint on the reads and writes because Bravo is never going to enter the lock.
Realistically, how might this happen? The C# compiler or the jitter are permitted to move the read instruction earlier, but they don't. Briefly returning to the real world of cached architectures, the read of c might be logically moved up in front of the read of the flag because c is already in the cache. Maybe it was close to a different variable that was read recently. Or maybe branch prediction is predicting that the flag is going to cause you to skip the lock, and the processor pre-fetches the value. But again, it doesn't matter what the real-world scenario is; that's all chip implementation details. The C# spec permits this read to be done early, so assume that at some point it will be done early!
Back to our scenario. We immediately switch to thread Alpha.
Thread Alpha runs as you expect it to. It sees that the flag says that initialization is required, takes the lock, initializes c, sets the flag, and leaves.
Now thread Bravo runs again, the flag now says that initialization is not required, and so we use the version of c that we read earlier, and dereference null.
Double-checked locking is correct in C# as long as you strictly follow the exact double-checked locking pattern. The moment you diverge from it even slightly you are off in the weeds of horrible, unreproducible, race condition bugs like the one I just described. Just don't go there:
Don't share memory across threads. The takeaway that I take from knowing everything I just told you is I am not smart enough to write multithreaded code that shares memory and works by design. I am only smart enough to write multithreaded code that works by accident, and that's not acceptable to me.
If you must share memory across threads, lock every access, without exception. It's not that expensive! And you know what is more expensive? Dealing with a series of unreproducible fatal crashes that all lose user data.
If you must share memory across threads and you must have low lock lazy initialization good heavens do not write it yourself. Use Lazy<T>; it contains a correct implementation of low-locked lazy initialization that you can rely on being correct on all processor architectures.
Follow-up question:
If the lock (this.mutex) flushes processors' store buffers, then everything is ok, but is that guaranteed?
To clarify, this question is about whether the initialized flag is read correctly in the double-checked locking scenario. Let's again address your misconceptions here.
The initialized flag is guaranteed to be read correctly inside the lock because it is written inside the lock.
However, the correct way to think about this, as I mentioned before, is not to reason anything about flushing caches. The correct way to reason about this is that the C# specification puts restrictions on how reads and writes can be moved around in time with respect to locks.
In particular, a read inside a lock may not be moved to before the lock, and a write inside a lock may not be moved to after the lock. Those facts, combined with the fact that locks provide mutual exclusion, is sufficient to conclude that the read of the initialized flag is correct inside the lock.
Again, if you are not comfortable making these kinds of deductions -- and I am not! -- then do not write low-lock code.
Like many other people, I've always been confused by volatile reads/writes and fences. So now I'm trying to fully understand what these do.
So, a volatile read is supposed to (1) exhibit acquire-semantics and (2) guarantee that the value read is fresh, i.e., it is not a cached value. Let's focus on (2).
Now, I've read that, if you want to perform a volatile read, you should introduce an acquire fence (or a full fence) after the read, like this:
int local = shared;
Thread.MemoryBarrier();
How exactly does this prevent the read operation from using a previously cached value?
According to the definition of a fence (no read/stores are allowed to be moved above/below the fence), I would insert the fence before the read, preventing the read from crossing the fence and being moved backwards in time (aka, being cached).
How does preventing the read from being moved forwards in time (or subsequent instructions from being moved backwards in time) guarantee a volatile (fresh) read? How does it help?
Similarly, I believe that a volatile write should introduce a fence after the write operation, preventing the processor from moving the write forward in time (aka, delaying the write). I believe this would make the processor flush the write to the main memory.
But to my surprise, the C# implementation introduces the fence before the write!
[MethodImplAttribute(MethodImplOptions.NoInlining)] // disable optimizations
public static void VolatileWrite(ref int address, int value)
{
MemoryBarrier(); // Call MemoryBarrier to ensure the proper semantic in a portable way.
address = value;
}
Update
According to this example, apparently taken from "C# 4 in a Nutshell", fence 2 , placed after a write is supposed to force the write to be flushed to main memory immediately, and fence 3, placed before a read, is supposed to guarantee a fresh read:
class Foo{
int _answer;
bool complete;
void A(){
_answer = 123;
Thread.MemoryBarrier(); // Barrier 1
_complete = true;
Thread.MemoryBarrier(); // Barrier 2
}
void B(){
Thread.MemoryBarrier(); // Barrier 3;
if(_complete){
Thread.MemoryBarrier(); // Barrier 4;
Console.WriteLine(_answer);
}
}
}
The ideas in this book (and my own personal beliefs) seem to contradict the ideas behind C#'s VolatileRead and VolatileWrite implementations.
How exactly does this prevent the read operation from using a
previously cached value?
It does no such thing. A volatile read does not guarantee that the latest value will be returned. In plain English all it really means is that the next read will return a newer value and nothing more.
How does preventing the read from being moved forwards in time (or
subsequent instructions from being moved backwards in time) guarantee
a volatile (fresh) read? How does it help?
Be careful with the terminology here. Volatile is not synonymous with fresh. As I already mentioned above its real usefulness lies in how two or more volatile reads are chained together. The next read in a sequence of volatile reads will absolutely return a newer value than the previous read of the same address. Lock-free code should be written with this premise in mind. That is, the code should be structured to work on the principal of dealing with a newer value and not the latest value. This is why most lock-free code spins in a loop until it can verify that the operation completely successfully.
The ideas in this book (and my own personal beliefs) seem to
contradict the ideas behind C#'s VolatileRead and VolatileWrite
implementations.
Not really. Remember volatile != fresh. Yes, if you want a "fresh" read then you need to place an acquire-fence before the read. But, that is not the same as doing a volatile read. What I am saying is that if the implementation of VolatileRead had the call to Thread.MemoryBarrier before the read instruction then it would not actually produce a volatile read. If would produce fresh a read though.
The important thing to understand is that volatile does not only mean "cannot cache value", but also gives important visibility guarantees (to be exact, it's entirely possible to have a volatile write that only goes to cache; depends solely on the hardware and its used cache coherency protocols)
A volatile read gives acquire semantics, while a volatile write has release semantics. An acquire fence means that you cannot reorder reads or writes before the fence, while a release fence means you cannot move them after the fence. The linked answer in the comments explains that actually quite nicely.
Now the question is, if we don't have any memory barrier before the load how is it guaranteed that we'll see the newest value? The answer to that is: Because we also put memory barriers after each volatile write to guarantee that.
Doug Lea wrote a great summary on which barriers exist, what they do and where to put them for volatile reads/writes for the JMM as a help for compiler writers, but the text is also quite useful for other people. Volatile reads and writes give the same guarantees in both Java and the CLR so that's generally applicable.
Source - scroll down to the "Memory Barriers" section (I'd copy the interesting parts, but the formatting doesn't survive it..)
I'm currently looking at a copy-on-write set implementation and want to confirm it's thread safe. I'm fairly sure the only way it might not be is if the compiler is allowed to reorder statements within certain methods. For example, the Remove method looks like:
public bool Remove(T item)
{
var newHashSet = new HashSet<T>(hashSet);
var removed = newHashSet.Remove(item);
hashSet = newHashSet;
return removed;
}
Where hashSet is defined as
private volatile HashSet<T> hashSet;
So my question is, given that hashSet is volatile does it mean that the Remove on the new set happens before the write to the member variable? If not, then other threads may see the set before the remove has occurred.
I haven't actually seen any issues with this in production, but I just want to confirm it is guaranteed to be safe.
UPDATE
To be more specific, there's another method to get an IEnumerator:
public IEnumerator<T> GetEnumerator()
{
return hashSet.GetEnumerator();
}
So the more specific question is: is there a guarantee that the returned IEnumerator will never throw a ConcurrentModificationException from a remove?
UPDATE 2
Sorry, the answers are all addressing the thread safety from multiple writers. Good points are raised, but that's not what I'm trying to find out here. I'd like to know if the compiler is allowed to re-order the operations in Remove to something like this:
var newHashSet = new HashSet<T>(hashSet);
hashSet = newHashSet; // swapped
var removed = newHashSet.Remove(item); // swapped
return removed;
If this was possible, it would mean that a thread could call GetEnumerator after hashSet had been assigned, but before item was removed, which could lead to the collection being modified during enumeration.
Joe Duffy has a blog article that states:
Volatile on loads means ACQUIRE, no more, no less. (There are
additional compiler optimization restrictions, of course, like not
allowing hoisting outside of loops, but let’s focus on the MM aspects
for now.) The standard definition of ACQUIRE is that subsequent
memory operations may not move before the ACQUIRE instruction; e.g.
given { ld.acq X, ld Y }, the ld Y cannot occur before ld.acq X.
However, previous memory operations can certainly move after it; e.g.
given { ld X, ld.acq Y }, the ld.acq Y can indeed occur before the ld
X. The only processor Microsoft .NET code currently runs on for which
this actually occurs is IA64, but this is a notable area where CLR’s
MM is weaker than most machines. Next, all stores on .NET are RELEASE
(regardless of volatile, i.e. volatile is a no-op in terms of jitted
code). The standard definition of RELEASE is that previous memory
operations may not move after a RELEASE operation; e.g. given { st X,
st.rel Y }, the st.rel Y cannot occur before st X. However,
subsequent memory operations can indeed move before it; e.g. given {
st.rel X, ld Y }, the ld Y can move before st.rel X.
The way I read this is that the call to newHashSet.Remove requires a ld newHashSet and the write to hashSet requires a st.rel newHashSet. From the above definition of RELEASE no loads can move after the store RELEASE, so the statements cannot be reordered! Could someone confirm please confirm my interpretation is correct?
Consider using Interlocked.Exchange - it will guarantee ordering, or Interlocked.CompareExchange which as side benifit will let you detect (and potentially recover from) simultaneous writes to collection. Clearly it adds some additional level of synchronization so it is different from your current code, but easier to reason about.
public bool Remove(T item)
{
var old = hashSet;
var newHashSet = new HashSet<T>(old);
var removed = newHashSet.Remove(item);
var current = Interlocked.CompareExchange(ref hashSet, newHashSet, old);
if (current != old)
{
// failed... throw or retry...
}
return removed;
}
And I think you'll still need volatile hashSet in this case.
EDITED:
Thanks for clarifying the presence of an external lock for calls to Remove (and other collection mutations).
Because of RELEASE semantics you will not end up storing a new value to hashSet until after the value to the variable removed has been assigned (because st removed can't be moved after st.rel hashSet).
Therefore, the 'snapshot' behaviour of GetEnumerator will work as intended, at least with respect to Remove and other mutators implemented in a similar fashion.
I can't speak for C#, but in C volatile in principle indicates and only indicates that the content of the variable could change at any time. It offers no constraints regarding compiler or CPU re-ordering. All you get is that the compiler/CPU will always read the value from memory, rather than trusting a cached version.
I believe however in recent MSVC (and so quite possibly C#), reading a volatile acts as a memoy barrier for loads and writing acts as a memory barrier for stores, e.g. the CPU will stall till all loads have completed and no loads may escape this by being re-ordered below the volatile read (although later independent loads may still move up above the memory barrier); and the CPU will stall till are stores have completed and no stores may escape this by being re-ordered below the volatile write (although later independent writes may still move up above the memory barrier).
When only a single thread is writing a given variable (but many are reading), only memory barriers are required for correct operation. When multiple threads may write to a given variable, atomic operations have to be used as CPU design is such that there is basically a race condition on write, such that a write can be lost.
There are a fair share of questions about Interlocked vs. volatile here on SO, I understand and know the concepts of volatile (no re-ordering, always reading from memory, etc.) and I am aware of how Interlocked works in that it performs an atomic operation.
But my question is this: Assume I have a field that is read from several threads, which is some reference type, say: public Object MyObject;. I know that if I do a compare exchange on it, like this: Interlocked.CompareExchange(ref MyObject, newValue, oldValue) that interlocked guarantees to only write newValue to the memory location that ref MyObject refers to, if ref MyObject and oldValue currently refer to the same object.
But what about reading? Does Interlocked guarantee that any threads reading MyObject after the CompareExchange operation succeeded will instantly get the new value, or do I have to mark MyObject as volatile to ensure this?
The reason I'm wondering is that I've implemented a lock-free linked list which updates the "head" node inside itself constantly when you prepend an element to it, like this:
[System.Diagnostics.DebuggerDisplay("Length={Length}")]
public class LinkedList<T>
{
LList<T>.Cell head;
// ....
public void Prepend(T item)
{
LList<T>.Cell oldHead;
LList<T>.Cell newHead;
do
{
oldHead = head;
newHead = LList<T>.Cons(item, oldHead);
} while (!Object.ReferenceEquals(Interlocked.CompareExchange(ref head, newHead, oldHead), oldHead));
}
// ....
}
Now after Prepend succeeds, are the threads reading head guaranteed to get the latest version, even though it's not marked as volatile?
I have been doing some empirical tests, and it seems to be working fine, and I have searched here on SO but not found a definitive answer (a bunch of different questions and comments/answer in them all say conflicting things).
Does Interlocked guarantee that any threads reading MyObject after the CompareExchange operation succeeded will instantly get the new value, or do I have to mark MyObject as volatile to ensure this?
Yes, subsequent reads on the same thread will get the new value.
Your loop unrolls to this:
oldHead = head;
newHead = ... ;
Interlocked.CompareExchange(ref head, newHead, oldHead) // full fence
oldHead = head; // this read cannot move before the fence
EDIT:
Normal caching can happen on other threads. Consider:
var copy = head;
while ( copy == head )
{
}
If you run that on another thread, the complier can cache the value of head and never see the update.
Your code should work fine. Though it is not clearly documented the Interlocked.CompareExchange method will produce a full-fence barrier. I suppose you could make one small change and omit the Object.ReferenceEquals call in favor of relying on the != operator which would perform reference equality by default.
For what it is worth the documentation for the InterlockedCompareExchange Win API call is much better.
This function generates a full memory barrier (or fence) to ensure
that memory operations are completed in order.
It is a shame the same level documenation does not exist on the .NET BCL counterpart Interlocked.CompareExchange because it is very likely they map to the exact same underlying mechanisms for the CAS.
Now after Prepend succeeds, are the threads reading head guaranteed to
get the latest version, even though it's not marked as volatile?
No, not necessarily. If those threads do not generate an acquire-fence barrier then there is no guarantee that they will read the latest value. Make sure that you perform a volatile read upon any use of head. You have already ensured that in Prepend with the Interlocked.CompareExchange call. Sure, that code may go through the loop once with an stale value of head, but the next iteration will be refreshed due to the Interlocked operation.
So if the context of your question was in regards to other threads who are also executing Prepend then nothing more needs to be done.
But if the context of your question was in regards to other threads executing another method on LinkedList then make sure you use Thread.VolatileRead or Interlocked.CompareExchange where appropriate.
Side note...there may be a micro-optimization that could be performed on the following code.
newHead = LList<T>.Cons(item, oldHead);
The only problem I see with this is that memory is allocated on every iteration of the loop. During periods of high contention the loop may spin around several times before it finally succeeds. You could probably lift this line outside of the loop as long as you reassign the linked reference to oldHead on every iteration (so that you get the fresh read). This way memory is only allocated once.