I'm currently looking at a copy-on-write set implementation and want to confirm it's thread safe. I'm fairly sure the only way it might not be is if the compiler is allowed to reorder statements within certain methods. For example, the Remove method looks like:
public bool Remove(T item)
{
    var newHashSet = new HashSet<T>(hashSet);
    var removed = newHashSet.Remove(item);
    hashSet = newHashSet;
    return removed;
}
Where hashSet is defined as
private volatile HashSet<T> hashSet;
So my question is, given that hashSet is volatile does it mean that the Remove on the new set happens before the write to the member variable? If not, then other threads may see the set before the remove has occurred.
I haven't actually seen any issues with this in production, but I just want to confirm it is guaranteed to be safe.
UPDATE
To be more specific, there's another method to get an IEnumerator:
public IEnumerator<T> GetEnumerator()
{
return hashSet.GetEnumerator();
}
So the more specific question is: is there a guarantee that the returned IEnumerator will never throw a "collection was modified" exception (InvalidOperationException in .NET) because of a concurrent remove?
UPDATE 2
Sorry, the answers are all addressing the thread safety from multiple writers. Good points are raised, but that's not what I'm trying to find out here. I'd like to know if the compiler is allowed to re-order the operations in Remove to something like this:
var newHashSet = new HashSet<T>(hashSet);
hashSet = newHashSet; // swapped
var removed = newHashSet.Remove(item); // swapped
return removed;
If this was possible, it would mean that a thread could call GetEnumerator after hashSet had been assigned, but before item was removed, which could lead to the collection being modified during enumeration.
Joe Duffy has a blog article that states:
Volatile on loads means ACQUIRE, no more, no less. (There are
additional compiler optimization restrictions, of course, like not
allowing hoisting outside of loops, but let’s focus on the MM aspects
for now.) The standard definition of ACQUIRE is that subsequent
memory operations may not move before the ACQUIRE instruction; e.g.
given { ld.acq X, ld Y }, the ld Y cannot occur before ld.acq X.
However, previous memory operations can certainly move after it; e.g.
given { ld X, ld.acq Y }, the ld.acq Y can indeed occur before the ld
X. The only processor Microsoft .NET code currently runs on for which
this actually occurs is IA64, but this is a notable area where CLR’s
MM is weaker than most machines. Next, all stores on .NET are RELEASE
(regardless of volatile, i.e. volatile is a no-op in terms of jitted
code). The standard definition of RELEASE is that previous memory
operations may not move after a RELEASE operation; e.g. given { st X,
st.rel Y }, the st.rel Y cannot occur before st X. However,
subsequent memory operations can indeed move before it; e.g. given {
st.rel X, ld Y }, the ld Y can move before st.rel X.
The way I read this is that the call to newHashSet.Remove requires a ld newHashSet and the write to hashSet requires a st.rel newHashSet. From the above definition of RELEASE, memory operations that precede the store-release cannot move after it, so the statements cannot be reordered! Could someone please confirm that my interpretation is correct?
Consider using Interlocked.Exchange - it will guarantee ordering - or Interlocked.CompareExchange, which as a side benefit will let you detect (and potentially recover from) simultaneous writes to the collection. Clearly it adds an additional level of synchronization, so it is different from your current code, but it is easier to reason about.
public bool Remove(T item)
{
    var old = hashSet;
    var newHashSet = new HashSet<T>(old);
    var removed = newHashSet.Remove(item);
    var current = Interlocked.CompareExchange(ref hashSet, newHashSet, old);
    if (current != old)
    {
        // failed... throw or retry...
    }
    return removed;
}
And I think you'll still need volatile hashSet in this case.
EDITED:
Thanks for clarifying the presence of an external lock for calls to Remove (and other collection mutations).
Because of RELEASE semantics, you will not end up storing the new value to hashSet until after the value of the variable removed has been assigned (because st removed can't be moved after st.rel hashSet).
Therefore, the 'snapshot' behaviour of GetEnumerator will work as intended, at least with respect to Remove and other mutators implemented in a similar fashion.
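Given that external lock on the write path, here is a hedged sketch of the simpler Interlocked.Exchange variant mentioned above (Exchange publishes with a full fence, and the CAS retry becomes unnecessary once writers are serialized; it assumes using System.Threading, and the compiler will emit a benign CS0420 warning if hashSet stays volatile):
public bool Remove(T item)
{
    var newHashSet = new HashSet<T>(hashSet);      // copy the current snapshot
    var removed = newHashSet.Remove(item);         // mutate only the private copy
    Interlocked.Exchange(ref hashSet, newHashSet); // full-fence publish of the new set
    return removed;
}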
I can't speak for C#, but in C volatile in principle indicates and only indicates that the content of the variable could change at any time. It offers no constraints regarding compiler or CPU re-ordering. All you get is that the compiler/CPU will always read the value from memory, rather than trusting a cached version.
I believe, however, that in recent MSVC (and so quite possibly C#), reading a volatile acts as a memory barrier for loads and writing acts as a memory barrier for stores. That is, the CPU will stall until all loads have completed, and no load may escape this by being re-ordered below the volatile read (although later independent loads may still move up above the memory barrier); and the CPU will stall until all stores have completed, and no store may escape this by being re-ordered below the volatile write (although later independent writes may still move up above the memory barrier).
When only a single thread is writing a given variable (but many are reading), only memory barriers are required for correct operation. When multiple threads may write to a given variable, atomic operations have to be used as CPU design is such that there is basically a race condition on write, such that a write can be lost.
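In C# terms, a minimal sketch of the multiple-writer problem described above and the atomic fix (the Counter class and field name are illustrative, not from the question):
using System.Threading;

class Counter
{
    private int _value;

    // Racy with multiple writers: two threads can read the same value,
    // both add one, and one of the increments is lost.
    public void UnsafeIncrement() => _value++;

    // Atomic read-modify-write: no increment can be lost.
    public void SafeIncrement() => Interlocked.Increment(ref _value);

    public int Value => Volatile.Read(ref _value);
}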
Related
I've found in our project's code the following implementation of double-lock checking:
public class SomeComponent
{
    private readonly object mutex = new object();

    public SomeComponent()
    {
    }

    public bool IsInitialized { get; private set; }

    public void Initialize()
    {
        this.InitializeIfRequired();
    }

    protected virtual void InitializeIfRequired()
    {
        if (!this.OnRequiresInitialization())
        {
            return;
        }
        lock (this.mutex)
        {
            if (!this.OnRequiresInitialization())
            {
                return;
            }
            try
            {
                this.OnInitialize();
            }
            catch (Exception)
            {
                throw;
            }
            this.IsInitialized = true;
        }
    }

    protected virtual void OnInitialize()
    {
        //some code here
    }

    protected virtual bool OnRequiresInitialization()
    {
        return !this.IsInitialized;
    }
}
From my point of view, this is the wrong implementation due to the absence of guarantees that different threads will see the freshest value of the IsInitialized property.
And the question is "Am I right?".
Update:
The scenario that I'm afraid to happen, is the following:
Step 1. Thread1 is executed on Processor1 and writes true into IsInitialized inside the lock section. At this time the old value of IsInitialized (false) is in the cache of Processor1. As we know, processors have store buffers, so Processor1 can put the new value (true) into its store buffer, not into its cache.
Step 2. Thread2 is inside InitializeIfRequired, executed on Processor2, and reads IsInitialized. There is no value of IsInitialized inside the cache of Processor2, so Processor2 asks for the value of IsInitialized from other processors' caches or from memory. Processor1 has the value of IsInitialized inside its cache (but remember, it's the old value; the updated value is still in the store buffer of Processor1), so it sends the old value to Processor2. As a result, Thread2 can read false instead of true.
Update 2:
If the lock (this.mutex) flushes processors' store buffers, then everything is ok, but is that guaranteed?
this is the wrong implementation due to the absence of guarantees that different threads will see the freshest value of the IsInitialized property. The question is "Am I right?".
You are correct that this is a broken implementation of double-checked locking. You are wrong in multiple subtle ways about why it is wrong.
First, let's disabuse you of your wrongness.
The belief that there is a "freshest" value of any variable in a multithreaded program is a bad belief, for two reasons. The first reason is that yes, C# makes guarantees about certain constraints on how reads and writes may be re-ordered. However, those guarantees do not include any promise that a globally consistent ordering exists and can be deduced by all threads. It is legal in the C# memory model for there to be reads and writes on variables, and for there to be ordering constraints on those reads and writes. But in cases where those constraints are not strong enough to enforce exactly one ordering of reads and writes, it is permissible for there to be no "canonical" order observed by all threads. It is permitted for two threads to agree that the constraints were all met, but still disagree upon what order was chosen. This logically implies that the notion that there is a single, canonical "freshest" value for each variable is simply wrong. Different threads can disagree as to which writes are "fresher" than others.
The second reason is that even without this weird property that the model admits two threads to disagree on the sequence of reads and writes, it would still be wrong to say that in any low-lock program you have a way to read the "freshest" value. All the primitive operations you have guarantee you is that certain writes and reads will not be moved forwards or backwards in time past certain points in the code. Nothing in there says anything whatsoever about "freshest", whatever that means. The best you can say is that some reads will read a fresher value. The notion of "freshest" is not defined by the memory model.
Another way you are wrong is very subtle indeed. You are doing a great job of reasoning about what might happen based on processors flushing caches. But nowhere in the C# documentation does it say one word about processors flushing caches! That's a chip implementation detail that is subject to change any time your C# program runs on a different architecture. Do not reason about processors flushing caches unless you know your program will run on exactly one architecture, and that you thoroughly understand that architecture. Rather, reason about the constraints imposed by the memory model. I am aware that the documentation on the model is sorely lacking, but that's the thing you should be reasoning about, because that's what you can actually depend on.
The other way that you are wrong is that though yes, the implementation is broken, it is not broken because you are not reading an up-to-date value of the initialized flag. The problem is that the initialized state that is controlled by the flag is not subject to restrictions on being moved around in time!
Let's make your example a bit more concrete:
private C c = null;

protected virtual void OnInitialize()
{
    c = new C();
}
And a usage site:
this.InitializeIfRequired();
this.c.Frob();
Now we come to the real problem. Nothing is stopping the reads of IsInitialized and c from being moved around in time.
Suppose threads Alpha and Bravo are both running this code. Thread Bravo wins the race and the first thing it does is reads c as null. Remember, it is allowed to do so because there is no ordering constraint on the reads and writes because Bravo is never going to enter the lock.
Realistically, how might this happen? The C# compiler or the jitter are permitted to move the read instruction earlier, but they don't. Briefly returning to the real world of cached architectures, the read of c might be logically moved up in front of the read of the flag because c is already in the cache. Maybe it was close to a different variable that was read recently. Or maybe branch prediction is predicting that the flag is going to cause you to skip the lock, and the processor pre-fetches the value. But again, it doesn't matter what the real-world scenario is; that's all chip implementation details. The C# spec permits this read to be done early, so assume that at some point it will be done early!
Back to our scenario. We immediately switch to thread Alpha.
Thread Alpha runs as you expect it to. It sees that the flag says that initialization is required, takes the lock, initializes c, sets the flag, and leaves.
Now thread Bravo runs again, the flag now says that initialization is not required, and so we use the version of c that we read earlier, and dereference null.
Double-checked locking is correct in C# as long as you strictly follow the exact double-checked locking pattern. The moment you diverge from it even slightly you are off in the weeds of horrible, unreproducible, race condition bugs like the one I just described. Just don't go there:
Don't share memory across threads. The takeaway that I take from knowing everything I just told you is I am not smart enough to write multithreaded code that shares memory and works by design. I am only smart enough to write multithreaded code that works by accident, and that's not acceptable to me.
If you must share memory across threads, lock every access, without exception. It's not that expensive! And you know what is more expensive? Dealing with a series of unreproducible fatal crashes that all lose user data.
If you must share memory across threads and you must have low lock lazy initialization good heavens do not write it yourself. Use Lazy<T>; it contains a correct implementation of low-locked lazy initialization that you can rely on being correct on all processor architectures.
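For illustration only, a hedged sketch of how the earlier SomeComponent example might look rebuilt on Lazy<T> (the C stub and member names here are mine, not from the original code):
using System;
using System.Threading;

public class C
{
    public void Frob() { } // stub standing in for the real initialized object
}

public class SomeComponent
{
    // ExecutionAndPublication gives the usual "initialize exactly once under a lock" behaviour.
    private readonly Lazy<C> lazyC =
        new Lazy<C>(() => new C(), LazyThreadSafetyMode.ExecutionAndPublication);

    public bool IsInitialized => lazyC.IsValueCreated;

    // All users go through Value, so no hand-rolled double-checked locking is needed.
    public void UseC() => lazyC.Value.Frob();
}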
Follow-up question:
If the lock (this.mutex) flushes processors' store buffers, then everything is ok, but is that guaranteed?
To clarify, this question is about whether the initialized flag is read correctly in the double-checked locking scenario. Let's again address your misconceptions here.
The initialized flag is guaranteed to be read correctly inside the lock because it is written inside the lock.
However, the correct way to think about this, as I mentioned before, is not to reason anything about flushing caches. The correct way to reason about this is that the C# specification puts restrictions on how reads and writes can be moved around in time with respect to locks.
In particular, a read inside a lock may not be moved to before the lock, and a write inside a lock may not be moved to after the lock. Those facts, combined with the fact that locks provide mutual exclusion, is sufficient to conclude that the read of the initialized flag is correct inside the lock.
Again, if you are not comfortable making these kinds of deductions -- and I am not! -- then do not write low-lock code.
In C#, this is the standard code for invoking an event in a thread-safe way:
var handler = SomethingHappened;
if(handler != null)
handler(this, e);
Where, potentially on another thread, the compiler-generated add method uses Delegate.Combine to create a new multicast delegate instance, which it then sets on the compiler-generated field (using interlocked compare-exchange).
(Note: for the purposes of this question, we don't care about code that runs in the event subscribers. Assume that it's thread-safe and robust in the face of removal.)
In my own code, I want to do something similar, along these lines:
var localFoo = this.memberFoo;
if(localFoo != null)
localFoo.Bar(localFoo.baz);
Where this.memberFoo could be set by another thread. (It's just one thread, so I don't think it needs to be interlocked - but maybe there's a side-effect here?)
(And, obviously, assume that Foo is "immutable enough" that we're not actively modifying it while it is in use on this thread.)
Now I understand the obvious reason that this is thread-safe: reads from reference fields are atomic. Copying to a local ensures we don't get two different values. (Apparently only guaranteed from .NET 2.0, but I assume it's safe in any sane .NET implementation?)
But what I don't understand is: What about the memory occupied by the object instance that is being referenced? Particularly in regards to cache coherency? If a "writer" thread does this on one CPU:
thing.memberFoo = new Foo(1234);
What guarantees that the memory where the new Foo is allocated doesn't happen to be in the cache of the CPU the "reader" is running on, with uninitialized values? What ensures that localFoo.baz (above) doesn't read garbage? (And how well guaranteed is this across platforms? On Mono? On ARM?)
And what if the newly created foo happens to come from a pool?
thing.memberFoo = FooPool.Get().Reset(1234);
This seems no different, from a memory perspective, to a fresh allocation - but maybe the .NET allocator does some magic to make the first case work?
My thinking, in asking this, is that a memory barrier would be required to ensure - not so much that memory accesses cannot be moved around, given the read is dependent - but as a signal to the CPU to flush any cache invalidations.
My source for this is Wikipedia, so make of that what you will.
(I might speculate that maybe the interlocked-compare-exchange on the writer thread invalidates the cache on the reader? Or maybe all reads cause invalidation? Or pointer dereferences cause invalidation? I'm particularly concerned how platform-specific these things sound.)
Update: Just to make it more explicit that the question is about CPU cache invalidation and what guarantees .NET provides (and how those guarantees might depend on CPU architecture):
Say we have a reference stored in field Q (a memory location).
On CPU A (writer) we initialize an object at memory location R, and write a reference to R into Q
On CPU B (reader), we dereference field Q, and get back memory location R
Then, on CPU B, we read a value from R
Assume the GC does not run at any point. Nothing else interesting happens.
Question: What prevents R from being in B's cache, from before A has modified it during initialisation, such that when B reads from R it gets stale values, in spite of it getting a fresh version of Q to know where R is in the first place?
(Alternate wording: what makes the modification to R visible to CPU B at or before the point that the change to Q is visible to CPU B.)
(And does this only apply to memory allocated with new, or to any memory?)
Note: I've posted a self-answer here.
This is a really good question. Let us consider your first example.
var handler = SomethingHappened;
if(handler != null)
handler(this, e);
Why is this safe? To answer that question you first have to define what you mean by "safe". Is it safe from a NullReferenceException? Yes, it is pretty trivial to see that caching the delegate reference locally eliminates that pesky race between the null check and the invocation. Is it safe to have more than one thread touching the delegate? Yes, delegates are immutable so there is no way that one thread can cause the delegate to get into a half-baked state. The first two are obvious. But, what about a scenario where thread A is doing this invocation in a loop and thread B at some later point in time assigns the first event handler? Is that safe in the sense that thread A will eventually see a non-null value for the delegate? The somewhat surprising answer to this is probably. The reason is that the default implementations of the add and remove accessors for the event create memory barriers. I believe the early version of the CLR took an explicit lock and later versions used Interlocked.CompareExchange. If you implemented your own accessors and omitted a memory barrier then the answer could be no. I think in reality it highly depends on whether Microsoft added memory barriers to the construction of the multicast delegate itself.
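For reference, a hedged sketch of the commonly cited CompareExchange-based accessor shape (an approximation of what the compiler generates for a field-like event, not decompiled output; assumes using System and System.Threading):
private EventHandler somethingHappened; // compiler-generated-style backing field

public event EventHandler SomethingHappened
{
    add
    {
        EventHandler original, updated;
        do
        {
            original = somethingHappened;
            updated = (EventHandler)Delegate.Combine(original, value);
        }
        while (Interlocked.CompareExchange(ref somethingHappened, updated, original) != original);
    }
    remove
    {
        EventHandler original, updated;
        do
        {
            original = somethingHappened;
            updated = (EventHandler)Delegate.Remove(original, value);
        }
        while (Interlocked.CompareExchange(ref somethingHappened, updated, original) != original);
    }
}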
On to the second and more interesting example.
var localFoo = this.memberFoo;
if(localFoo != null)
localFoo.Bar(localFoo.baz);
Nope. Sorry, this actually is not safe. Let us assume memberFoo is of type Foo which is defined like the following.
public class Foo
{
    public int baz = 0;
    public int daz = 0;

    public Foo()
    {
        baz = 5;
        daz = 10;
    }

    public void Bar(int x)
    {
        // Throws DivideByZeroException if daz is still 0.
        int result = x / daz;
    }
}
And then let us assume another thread does the following.
this.memberFoo = new Foo();
Despite what some may think there is nothing that mandates that instructions have to be executed in the order that they were defined in the code as long as the intent of the programmer is logically preserved. The C# or JIT compilers could actually formulate the following sequence of instructions.
/* 1 */ set register = alloc-memory-and-return-reference(typeof(Foo));
/* 2 */ set register.baz = 0;
/* 3 */ set register.daz = 0;
/* 4 */ set this.memberFoo = register;
/* 5 */ set register.baz = 5; // Foo.ctor
/* 6 */ set register.daz = 10; // Foo.ctor
Notice how the assignment to memberFoo occurs before the constructor is run. That is valid because it does not have any unintended side-effects from the perspective of the thread executing it. It could, however, have a major impact on other threads. What happens if your null check of memberFoo on the reading thread occurs when the writing thread has just finished instruction #4? The reader will see a non-null value and then attempt to invoke Bar before the daz variable got set to 10. daz will still hold its default value of 0, leading to a divide-by-zero error. Of course, this is mostly theoretical because Microsoft's implementation of the CLR creates a release-fence on writes that would prevent this. But the specification would technically allow for it. See this question for related content.
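If you want the ordering to be explicit rather than relying on the implementation's release-fence-on-write behaviour, one hedged option (on .NET 4.5+) is to publish through Volatile.Write, or equivalently a volatile field:
// Sketch: the release semantics of the volatile write keep the constructor's
// stores to baz and daz from moving past the publication of the reference.
Foo temp = new Foo();
Volatile.Write(ref this.memberFoo, temp); // publish only after construction's writes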
I think I have figured out what the answer is. But I'm not a hardware guy, so I'm open to being corrected by someone more familiar with how CPUs work.
The .NET 2.0 memory model guarantees:
Writes cannot move past other writes from the same thread.
This means that the writing CPU (A in the example), will never write a reference to an object into memory (to Q), until after it has written out contents of that object being constructed (to R). So far, so good. This cannot be re-ordered:
R = <data>
Q = &R
Let's consider the reading CPU (B). What is to stop it reading from R before it reads from Q?
On a sufficiently naïve CPU, one would expect it to be impossible to read from R without first reading from Q. We must first read Q to get the address of R. (Note: it is safe to assume that the C# compiler and JIT behave this way.)
But, if the reading CPU has a cache, couldn't it have stale memory for R in its cache, but receive the updated Q?
The answer seems to be no. For sane cache coherency protocols, invalidation is implemented as a queue (hence "invalidation queue"). So R will always be invalidated before Q is invalidated.
Apparently the only hardware where this is not the case is the DEC Alpha (according to Table 1, here). It is the only listed architecture where dependent reads can be re-ordered. (Further reading.)
Capturing a reference to an immutable object guarantees thread safety (in the sense of consistency; it does not guarantee that you get the latest value).
The list of event handlers is immutable, and thus it is enough for thread safety to capture a reference to the current value. The whole object will be consistent as it never changes after initial creation.
Your sample code does not explicitly state whether Foo is immutable, so you get all sorts of problems figuring out whether the object can change or not, e.g. directly by setting properties. Note that the code would be "unsafe" even in the single-threaded case, as you can't guarantee that a particular instance of Foo does not change.
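For example, a hedged sketch of an "immutable enough" Foo matching the shape used in the question (readonly fields set only in the constructor):
public sealed class Foo
{
    public readonly int baz; // readonly: cannot change after construction

    public Foo(int baz)
    {
        this.baz = baz;
    }

    public void Bar(int value)
    {
        // Uses only constructor-initialized, read-only state.
    }
}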
On CPU caches and the like: the only change that can invalidate data at the actual location in memory for a truly immutable object is the GC's compaction. That code ensures all necessary locks/cache consistency - so managed code would never observe a change in the bytes referenced by your cached pointer to an immutable object.
When this is evaluated:
thing.memberFoo = new Foo(1234);
First new Foo(1234) is evaluated, which means that the Foo constructor executes to completion. Then thing.memberFoo is assigned the value. This means that any other thread reading from thing.memberFoo is not going to read an incomplete object. It's either going to read the old value, or it's going to read the reference to the new Foo object after its constructor has completed. Whether this new object is in the cache or not is irrelevant; the reference being read won't point to the new object until after the constructor has completed.
The same thing happens with the object pool. Everything on the right evaluates completely before the assignment happens.
In your example, B will never get the reference to R before R's constructor has run, because A does not write R to Q until A has completed constructing R. If B reads Q before that, it will get whatever value was already in Q. If R's constructor throws an exception, then Q will never be written to.
C# order of operations guarantees this will happen this way. Assignment operators have the lowest precedence, and new and function call operators have the highest precedence. This guarantees that the new will evaluate before the assignment is evaluated. This is required for things like exceptions -- if an exception is thrown by the constructor then the object being allocated will be in an invalid state and you don't want that assignment to occur regardless of whether you're multithreaded or not.
It seems to me you should be using volatile (see this article) in this case. This ensures the compiler doesn't perform optimisations that assume access by a single thread.
Events used to use locks, but as of C# 4 use lock-free synchronisation - I'm not sure exactly what (see this article).
EDIT:
The Interlocked methods use memory barriers which will ensure all threads read the updated value (on any sane system). So long as you perform all updates with Interlocked you can safely read the value from any thread without a memory barrier. This is the pattern used in the System.Collections.Concurrent classes.
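A hedged sketch of that pattern from the last paragraph (the field and method names are illustrative, not from the question):
private Foo current; // shared reference, only ever written via Interlocked

public void Publish(Foo newValue)
{
    // Full fence on the writer side, as described above.
    Interlocked.Exchange(ref current, newValue);
}

public Foo ReadCurrent()
{
    // A plain read of a reference is atomic; per the answer above,
    // the Interlocked writes provide the necessary barriers.
    return current;
}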
I have read a lot of contradictory information (msdn, SO etc.) about volatile and VolatileRead (ReadAcquireFence).
I understand the memory-access reordering restrictions they imply - what I'm still completely confused about is the freshness guarantee, which is very important for me.
msdn doc for volatile mentions:
(...) This ensures that the most up-to-date value is present in the field at all times.
msdn doc for volatile fields mentions:
A read of a volatile field is called a volatile read. A volatile read has "acquire semantics"; that is, it is guaranteed to occur prior to any references to memory that occur after it in the instruction sequence.
.NET code for VolatileRead is:
public static int VolatileRead(ref int address)
{
    int ret = address;
    MemoryBarrier(); // Call MemoryBarrier to ensure the proper semantic in a portable way.
    return ret;
}
According to the msdn MemoryBarrier doc, a memory barrier prevents reordering. However, this doesn't seem to have any implications for freshness - correct?
How then can one get a freshness guarantee?
And is there a difference between marking a field volatile and accessing it with VolatileRead and VolatileWrite semantics? I'm currently doing the latter in my performance-critical code that needs to guarantee freshness, but readers sometimes get a stale value. I'm wondering if marking the field volatile would make a difference.
EDIT1:
What I'm trying to achieve - a guarantee that reader threads will get as recent a value of the shared variable (written by multiple writers) as possible - ideally no older than the cost of a context switch or other operations that may postpone the immediate write of state.
If volatile or a higher-level construct (e.g. lock) has this guarantee (does it?), then how is it achieved?
EDIT2:
The very condensed question should have been: how do I get a guarantee of as fresh a value as possible during reads? Ideally without locking (as exclusive access is not needed and there is potential for high contention).
From what I learned here I'm wondering if this might be the solution (solving(?) line is marked with comment):
private SharedState _sharedState;
private SpinLock _spinLock = new SpinLock(false);

public void Update(SharedState newValue)
{
    bool lockTaken = false;
    _spinLock.Enter(ref lockTaken);
    _sharedState = newValue;
    if (lockTaken)
    {
        _spinLock.Exit();
    }
}

public SharedState GetFreshSharedState
{
    get
    {
        Thread.MemoryBarrier(); // <---- This is added to give readers freshness guarantee
        var value = _sharedState;
        Thread.MemoryBarrier();
        return value;
    }
}
The MemoryBarrier call was added to make sure both reads and writes are wrapped by full fences (the same as lock-based code, as indicated here http://www.albahari.com/threading/part4.aspx#_The_volatile_keyword in the 'Memory barriers and locking' section).
Does this look correct or is it flawed?
EDIT3:
Thanks to very interesting discussions here I learned quite a few things and I actually was able to distill to the simplified unambiguous question that I have about this topic. It's quite different from the original one so I rather posted a new one here: Memory barrier vs Interlocked impact on memory caches coherency timing
I think this is a good question. But it is also difficult to answer. I am not sure I can give you a definitive answer to your questions. It is not your fault really. It is just that the subject matter is complex and really requires knowing details that might not be feasible to enumerate. Honestly, it really seems like you have educated yourself on the subject quite well already. I have spent a lot of time studying the subject myself and I still do not fully understand everything. Nevertheless, I will still attempt some semblance of an answer here anyway.
So what does it mean for a thread to read a fresh value anyway? Does it mean the value returned by the read is guaranteed to be no older than 100ms, 50ms, or 1ms? Or does it mean the value is the absolute latest? Or does it mean that if two reads occur back-to-back then the second is guaranteed to get a newer value assuming the memory address changed after the first read? Or does it mean something else altogether?
I think you are going to have a hard time getting your readers to work correctly if you are thinking about things in terms of time intervals. Instead think of things in terms of what happens when you chain reads together. To illustrate my point consider how you would implement an interlocked-like operation using arbitrarily complex logic.
public static T InterlockedOperation<T>(ref T location, T operand)
{
    T initial, computed;
    do
    {
        initial = location;
        computed = op(initial, operand); // where op is replaced with a specific implementation
    }
    while (Interlocked.CompareExchange(ref location, computed, initial) != initial);
    return computed;
}
In the code above we can create any interlocked-like operation if we exploit the fact that the second read of location via Interlocked.CompareExchange will be guaranteed to return a newer value if the memory address received a write after the first read. This is because the Interlocked.CompareExchange method generates a memory barrier. If the value has changed between reads then the code spins around the loop repeatedly until location stops changing. This pattern does not require that the code use the latest or freshest value; just a newer value. The distinction is crucial.1
A lot of lock-free code I have seen works on this principle. That is, the operations are usually wrapped in loops such that the operation is continually retried until it succeeds. It does not assume that the first attempt is using the latest value. Nor does it assume that every use of the value is the latest. It only assumes that the value is newer after each read.
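As a concrete instance of the retry loop above, a hedged sketch of an "interlocked max" built the same way (assumes using System and System.Threading):
public static int InterlockedMax(ref int location, int operand)
{
    int initial, computed;
    do
    {
        initial = location;                    // possibly stale read
        computed = Math.Max(initial, operand); // the "op" for this operation
    }
    while (Interlocked.CompareExchange(ref location, computed, initial) != initial);
    return computed;
}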
Try to rethink how your readers should behave. Try to make them more agnostic about the age of the value. If that is simply not possible and all writes must be captured and processed then you may be forced into a more deterministic approach like placing all writes into a queue and having the readers dequeue them one-by-one. I am sure the ConcurrentQueue class would help in that situation.
If you can reduce the meaning of "fresh" to only "newer" then placing a call to Thread.MemoryBarrier after each read, using Volatile.Read, using the volatile keyword, etc. will absolutely guarantee that one read in a sequence will return a newer value than a previous read.
1The ABA problem opens up a new can of worms.
A memory barrier does provide this guarantee. We can derive the "freshness" property that you are looking for from the reordering properties that a barrier guarantees.
By freshness you probably mean that a read returns the value of the most recent write.
Let's say we have these operations, each on a different thread:
x = 1
x = 2
print(x)
How could we possibly print a value other than 2? Without volatile the read can move one slot upwards and return 1. Volatile prevents reorderings, though. The write cannot move backwards in time.
In short, volatile guarantees you to see the most recent value.
Strictly speaking I'd need to differentiate between volatile and a memory barrier here. The latter one is a stronger guarantee. I have simplified this discussion because volatile is implemented using memory barriers, at least on x86/x64.
There are a fair share of questions about Interlocked vs. volatile here on SO, I understand and know the concepts of volatile (no re-ordering, always reading from memory, etc.) and I am aware of how Interlocked works in that it performs an atomic operation.
But my question is this: Assume I have a field that is read from several threads, which is some reference type, say: public Object MyObject;. I know that if I do a compare exchange on it, like this: Interlocked.CompareExchange(ref MyObject, newValue, oldValue) that interlocked guarantees to only write newValue to the memory location that ref MyObject refers to, if ref MyObject and oldValue currently refer to the same object.
But what about reading? Does Interlocked guarantee that any threads reading MyObject after the CompareExchange operation succeeded will instantly get the new value, or do I have to mark MyObject as volatile to ensure this?
The reason I'm wondering is that I've implemented a lock-free linked list which updates the "head" node inside itself constantly when you prepend an element to it, like this:
[System.Diagnostics.DebuggerDisplay("Length={Length}")]
public class LinkedList<T>
{
    LList<T>.Cell head;

    // ....

    public void Prepend(T item)
    {
        LList<T>.Cell oldHead;
        LList<T>.Cell newHead;
        do
        {
            oldHead = head;
            newHead = LList<T>.Cons(item, oldHead);
        } while (!Object.ReferenceEquals(Interlocked.CompareExchange(ref head, newHead, oldHead), oldHead));
    }

    // ....
}
Now after Prepend succeeds, are the threads reading head guaranteed to get the latest version, even though it's not marked as volatile?
I have been doing some empirical tests, and it seems to be working fine, and I have searched here on SO but not found a definitive answer (a bunch of different questions and comments/answer in them all say conflicting things).
Does Interlocked guarantee that any threads reading MyObject after the CompareExchange operation succeeded will instantly get the new value, or do I have to mark MyObject as volatile to ensure this?
Yes, subsequent reads on the same thread will get the new value.
Your loop unrolls to this:
oldHead = head;
newHead = ... ;
Interlocked.CompareExchange(ref head, newHead, oldHead) // full fence
oldHead = head; // this read cannot move before the fence
EDIT:
Normal caching can happen on other threads. Consider:
var copy = head;
while ( copy == head )
{
}
If you run that on another thread, the compiler can cache the value of head and never see the update.
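A hedged fix for that particular loop is to force an acquire read on each iteration, for example with Volatile.Read (available on .NET 4.5+):
var copy = head;
// Volatile.Read keeps the JIT from hoisting the read of head out of the loop,
// so the loop observes the update published by the other thread.
while (copy == Volatile.Read(ref head))
{
}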
Your code should work fine. Though it is not clearly documented, the Interlocked.CompareExchange method will produce a full-fence barrier. I suppose you could make one small change and omit the Object.ReferenceEquals call in favor of relying on the != operator, which performs reference equality by default.
For what it is worth the documentation for the InterlockedCompareExchange Win API call is much better.
This function generates a full memory barrier (or fence) to ensure
that memory operations are completed in order.
It is a shame the same level of documentation does not exist on the .NET BCL counterpart Interlocked.CompareExchange, because it is very likely they map to the exact same underlying mechanism for the CAS.
Now after Prepend succeeds, are the threads reading head guaranteed to
get the latest version, even though it's not marked as volatile?
No, not necessarily. If those threads do not generate an acquire-fence barrier then there is no guarantee that they will read the latest value. Make sure that you perform a volatile read upon any use of head. You have already ensured that in Prepend with the Interlocked.CompareExchange call. Sure, that code may go through the loop once with a stale value of head, but the next iteration will be refreshed due to the Interlocked operation.
So if the context of your question was in regards to other threads who are also executing Prepend then nothing more needs to be done.
But if the context of your question was in regards to other threads executing another method on LinkedList then make sure you use Thread.VolatileRead or Interlocked.CompareExchange where appropriate.
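For example, a hedged sketch of what that might look like in another hypothetical method on LinkedList (Volatile.Read on .NET 4.5+, or Thread.VolatileRead on older frameworks):
public bool IsEmpty
{
    get
    {
        // Acquire-read of the shared head so we observe the most recently published cell.
        return Volatile.Read(ref head) == null;
    }
}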
Side note...there may be a micro-optimization that could be performed on the following code.
newHead = LList<T>.Cons(item, oldHead);
The only problem I see with this is that memory is allocated on every iteration of the loop. During periods of high contention the loop may spin around several times before it finally succeeds. You could probably lift this line outside of the loop as long as you reassign the linked reference to oldHead on every iteration (so that you get the fresh read). This way memory is only allocated once.
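A hedged sketch of that micro-optimization follows; it assumes LList<T>.Cell exposes a settable tail link (named Tail here), which the original snippet does not show - if the cell type is strictly immutable, the allocation has to stay inside the loop:
public void Prepend(T item)
{
    LList<T>.Cell oldHead;
    var newHead = LList<T>.Cons(item, head); // allocate the new cell once, outside the loop
    do
    {
        oldHead = head;
        newHead.Tail = oldHead; // re-point the existing cell at the freshly read head
    } while (!Object.ReferenceEquals(
                 Interlocked.CompareExchange(ref head, newHead, oldHead), oldHead));
}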
When implementing a thread-safe list or queue, is it necessary to lock on the List.Count property before returning the Count, i.e.:
//...
public int Count
{
    get
    {
        lock (_syncObject)
        {
            return _list.Count;
        }
    }
}
//...
Is it necessary to take a lock because the underlying _list.Count may not be a volatile variable?
Yes, that is necessary, but mostly irrelevant. Without the lock you could be reading stale values or even, depending on the inner working and the type of _list.Count, introduce errors.
But note that using that Count property is problematic, any calling code cannot really rely on it:
if (myStore.Count > 0) // this Count's getter locks internally
{
var item = myStore.Dequeue(); // not safe, myStore could be empty
}
So you should aim for a design where checking the Count and acting on it are combined:
ItemType GetNullOrFirst()
{
    lock (_syncObject)
    {
        if (_list.Count > 0)
        {
            ....
        }
    }
}
Additional:
is it necessary to take a lock because the underlying _list.Count may not be a volatile variable?
The _list.Count is not a variable but a property. It cannot be marked volatile. Whether it is thread-safe depends on the getter code of the property, but it will usually be safe. But unreliable.
It depends.
Now, theoretically, because the underlying value is not threadsafe, just about anything could go wrong because of something the implementation does. In practice though, it's a read from a variable of 32-bits and so guaranteed to be atomic. Hence it might be stale, but it will be a real stale value rather than garbage caused by reading half a value before a change and the other half after it.
So the question is, does staleness matter? Possibly it doesn't. The more you can put up with staleness the better for performance (because the more you can put up with it the less you need to do to ensure you don't have anything stale). E.g. if you were putting the count out to the UI and the collection was changing rapidly, then just the time it takes for the person to read the number and process it in their own brain is going to be enough to make it obsolete anyway, and hence stale.
If however, you needed it to ensure an operation was reasonable before it was attempted, then that staleness is going to cause problems.
However, that staleness is going to happen anyway, because in between your locked (and guaranteed to be fresh) read and the operation happening, there's the opportunity for the collection to change, so you've got a race condition even when locking.
Hence, if freshness is important, you are going to have to lock at a higher level. With this being the case, the lower-level lock is just a waste. Hence you can probably do without it at that point.
The important thing here is that even with a class that is threadsafe in every regard, the best it can guarantee is that each operation will be fresh (though even that may not be guaranteed; indeed when more than one core is involved "fresh" begins to become meaningless, as changes that are near to truly simultaneous can happen) and that each operation won't put the object into an invalid state. It is still possible to write non-threadsafe code with threadsafe objects (indeed, very many non-threadsafe objects are composed of ints, strings and bools, or of objects that are in turn composed of them, and each of those is threadsafe in and of itself).
One thing that can be useful with mutable classes intended for multithreaded use are synchronised "Try" methods. E.g. to use a List as a stack we need the following operations:
See if the list is empty, reporting failure if it is.
Obtain the "top" value.
Remove the top value.
Even if we synchronise each of those individually, the code doing so won't be threadsafe, as something can happen between each step. However, we can provide a Pop() method that does all three in one synchronised method. Analogous approaches are also useful with lock-free classes (where different techniques are used to ensure the method either succeeds or fails completely, and without damage to the object).
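For example, a hedged sketch of such a combined method over the _list/_syncObject fields from the question (it assumes _list is a List<ItemType>, where ItemType is the placeholder element type used in the earlier snippet):
public bool TryPop(out ItemType item)
{
    lock (_syncObject)
    {
        // Check the count and act on it inside one critical section,
        // so the collection cannot change between the two steps.
        if (_list.Count > 0)
        {
            int last = _list.Count - 1;
            item = _list[last];   // "top" of the stack
            _list.RemoveAt(last); // remove it before releasing the lock
            return true;
        }

        item = default(ItemType);
        return false;
    }
}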
No, you don't need to lock, but the caller should; otherwise something like this might happen:
count is n
thread asks for count
Count returns n
another thread dequeues
count is n - 1
thread asking for count sees count is n when count is actually n - 1