There is a very good article by Joe Albahari explaining volatile in C#: Threading in C#: PART 4: ADVANCED THREADING.
To illustrate instruction reordering, Joe uses this example:
public class IfYouThinkYouUnderstandVolatile
{
    private volatile int x, y;

    private void Test1() // Executed on one thread
    {
        this.x = 1;     // Volatile write (release-fence)
        int a = this.y; // Volatile read (acquire-fence)
    }

    private void Test2() // Executed on another thread
    {
        this.y = 1;     // Volatile write (release-fence)
        int b = this.x; // Volatile read (acquire-fence)
    }
}
Basically, what he is saying is that a and b could both end up containing 0 when the methods run on different threads in parallel.
In other words, the optimizer or the processor could reorder the instructions as follows:
public class IfYouThinkYouUnderstandVolatileReordered
{
    private volatile int x, y;

    private void Test1() // Executed on one thread
    {
        int tempY = this.y; // Volatile read (moved before the write)
        this.x = 1;         // Volatile write (release-fence)
        int a = tempY;      // Use the already-read value
    }

    private void Test2() // Executed on another thread
    {
        int tempX = this.x; // Volatile read (moved before the write)
        this.y = 1;         // Volatile write (release-fence)
        int b = tempX;      // Use the already-read value
    }
}
The reason this can happen even though we are using volatile is that a volatile read following a volatile write may be moved before the write: store-load reordering is the one reordering that volatile does not prevent.
So far I understand what is happening here.
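For contrast, an explicit full fence between the write and the read rules this reordering out. A minimal sketch using Thread.MemoryBarrier(), added here for illustration rather than taken from the article:

public class FixedWithFullFence
{
    private volatile int x, y;

    private void Test1() // Executed on one thread
    {
        this.x = 1;             // Volatile write (release-fence)
        Thread.MemoryBarrier(); // Full fence: the read below cannot move before the write above
        int a = this.y;         // Volatile read (acquire-fence)
    }

    private void Test2() // Executed on another thread
    {
        this.y = 1;
        Thread.MemoryBarrier();
        int b = this.x;
    }
}

With the full fences in place, at least one thread must observe the other's write, so a and b can no longer both end up as 0.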
My question is: can this reordering work across stack frames? That is, can a volatile write instruction be moved after a volatile read instruction that happens in another method (or property accessor)?
Have a look at the following code: it works with properties instead of accessing the instance variables directly.
What about reordering in this case? Could it happen in any case? Or could it only happen if the property access is inlined by the compiler?
public class IfYouThinkYouUnderstandVolatileWithProps
{
    private volatile int x, y;

    public int PropX
    {
        get { return this.x; }
        set { this.x = value; }
    }

    public int PropY
    {
        get { return this.y; }
        set { this.y = value; }
    }

    private void Test1() // Executed on one thread
    {
        this.PropX = 1;     // Volatile write (release-fence)
        int a = this.PropY; // Volatile read (acquire-fence)
    }

    private void Test2() // Executed on another thread
    {
        this.PropY = 1;     // Volatile write (release-fence)
        int b = this.PropX; // Volatile read (acquire-fence)
    }
}
As stated in ECMA-335:
I.12.6.4 Optimization
Conforming implementations of the CLI are free to execute programs using any technology that guarantees, within a single thread of execution, that side-effects and exceptions generated by a thread are visible in the order specified by the CIL. For this purpose only volatile operations (including volatile reads) constitute visible side-effects. (Note that while only volatile operations constitute visible side-effects, volatile operations also affect the visibility of non-volatile references.)
Volatile operations are specified in §I.12.6.7. There are no ordering guarantees relative to exceptions injected into a thread by another thread (such exceptions are sometimes called "asynchronous exceptions" (e.g., System.Threading.ThreadAbortException)).
So, obviously, the implementation is allowed to inline all that code, and then the situation is exactly the same as before.
You should not reason about such high-level things, because you cannot control them: the JIT has many reasons to inline or not to inline.
Reordering is a good model that lets you reason about the possible outcomes of parallel code execution. But what really happens is not only literal reordering of read/write instructions: it can be actual reordering, or the JIT caching values in CPU registers, or effects of speculative execution by the CPU itself, or how the memory controller does its job.
Think in terms of reads and writes of memory locations of pointer size or smaller. Use the reordering model of such reads and writes, and don't rely on today's specifics of the JIT or the CPU your program runs on.
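For example, instead of relying on where a volatile field access ends up after inlining, you can attach the ordering semantics to each call site explicitly with the Volatile class; a sketch, not from the quoted text:

using System.Threading;

public class ExplicitVolatileOps
{
    private int x, y; // plain fields; the ordering semantics live at each access below

    private void Test1() // Executed on one thread
    {
        Volatile.Write(ref this.x, 1);     // release semantics, regardless of inlining
        int a = Volatile.Read(ref this.y); // acquire semantics
    }
}

Note this gives the same half-fence guarantees as the volatile keyword, so it still permits the store-load reordering discussed above; it just makes the fences independent of method boundaries.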
Is the following code thread-safe?
public object DemoObject { get; set; }

public void DemoMethod()
{
    if (DemoObject is IDemoInterface demo)
    {
        demo.DoSomething();
    }
}
If other threads modify DemoObject (e.g. set to null) while DemoMethod is being processed, is it guaranteed that within the if block the local variable demo will always be assigned correctly (to an instance of type IDemoInterface)?
The is construct here is atomic, much like Interlocked. However, the behavior of this code is almost 100% non-deterministic. Unless the objective is to create unpredictable, non-deterministic behavior, this is a bug.
A valid usage example of this code: in a game, to simulate some non-deterministic event such as "Neo from The Matrix catching a bullet in mid air", this method may be more non-deterministic than simply using a pseudo-random number generator.
In any scenario where deterministic, predictable behavior is expected, this code is a bug.
Explanation:
if (DemoObject is IDemoInterface demo)
is evaluated, and demo is assigned, effectively atomically: the shared reference is read only once.
Thereafter, within the if statement:
even if DemoObject is set to null by another thread, demo has already been assigned, and the DoSomething() call executes on the already-assigned instance.
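The pattern-matching form behaves like taking a local snapshot of the reference first; roughly equivalent hand-written code (not the compiler's exact output):

public void DemoMethod()
{
    object snapshot = DemoObject; // the shared reference is read exactly once
    IDemoInterface demo = snapshot as IDemoInterface;
    if (demo != null)
    {
        demo.DoSomething();       // runs on the snapshot; later writes to DemoObject cannot affect it
    }
}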
To answer your comment questions:
why is there a race?
The race condition is by design in this code. In the example code below:
16 threads are competing to set the value of DemoObject to null
while another 16 threads are competing to set the value of DemoObject to an instance of DemoClass.
At the same time 16 threads are competing to execute DoSomething() whenever they win the race condition when DemoObject is NOT null.
See: What is a race condition?
and why can I not predict whether DoSomething() will execute?
DoSomething() will execute each time
if (DemoObject is IDemoInterface demo)
evaluates to true. Each time DemoObject is null or does not implement IDemoInterface, it will NOT execute.
You cannot predict when it will execute. You can only predict that it will execute whenever the thread executing DoSomething() manages to get a reference to a non-null instance of DemoObject. In other words, when a thread running DemoMethod() manages to win the race condition:
A) after a thread running DemoMethod_Assign() wins the race condition
B) and before a thread running DemoMethod_Null() wins the race condition
Caveat: as per my understanding (someone else please clarify this point), DemoObject may be both null and not null at the same time across different threads.
DemoObject may be read from a cache or from main memory. We cannot mark it volatile, since it is an auto-implemented property rather than a field (a field, even of reference type, could be declared volatile). Therefore the state of DemoObject may simultaneously be null for one thread and not null for another, meaning its value is non-deterministic. Schrödinger's cat is both dead and alive at the same time; we have much the same situation here.
There are no locks or memory barriers in this code with respect to DemoObject. However, a thread context switch forces the equivalent of a memory barrier, so any thread resuming after a context switch will see the value of DemoObject as last flushed to main memory. A different thread may meanwhile have altered DemoObject without that change having been flushed yet, which raises the question of which value is the accurate one: the value fetched from main memory, or the value not yet flushed to it.
Note: someone else please clarify this caveat, as I may have missed something.
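One way to remove the uncertainty described in this caveat is to back the property with a plain field and read it with explicit acquire semantics. A sketch, assuming DemoObject is changed from an auto-property into a field:

private static object _demoObject; // field instead of the auto-property

private static void DemoMethod()
{
    // Volatile.Read has acquire semantics: it returns a reference at least as
    // recent as the latest release-write it observes, and later reads in this
    // thread cannot move above it.
    object snapshot = Volatile.Read(ref _demoObject);
    if (snapshot is IDemoInterface demo)
    {
        demo.DoSomething();
    }
}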
Here is some code to validate everything above except the caveat. I ran this console-app test on a machine with 64 logical cores; a NullReferenceException is never thrown.
using System;
using System.Collections.Generic;
using System.Threading;

internal class Program
{
    private static ManualResetEvent BenchWaitHandle = new ManualResetEvent(false);

    private class DemoClass : IDemoInterface
    {
        public void DoSomething()
        {
            Interlocked.Increment(ref Program.DidSomethingCount);
        }
    }

    private interface IDemoInterface
    {
        void DoSomething();
    }

    private static object DemoObject { get; set; }

    public static volatile int DidSomethingCount = 0;

    private static void DemoMethod()
    {
        BenchWaitHandle.WaitOne();
        for (int i = 0; i < 100000000; i++)
        {
            try
            {
                if (DemoObject is IDemoInterface demo)
                {
                    demo.DoSomething();
                }
            }
            catch (Exception ex)
            {
                Console.WriteLine(ex.ToString());
            }
        }
    }

    private static bool m_IsRunning = false;
    private static object RunningLock = new object();

    private static bool IsRunning
    {
        get { lock (RunningLock) { return m_IsRunning; } }
        set { lock (RunningLock) { m_IsRunning = value; } }
    }

    private static void DemoMethod_Assign()
    {
        BenchWaitHandle.WaitOne();
        while (IsRunning)
        {
            DemoObject = new DemoClass();
        }
    }

    private static void DemoMethod_Null()
    {
        BenchWaitHandle.WaitOne();
        while (IsRunning)
        {
            DemoObject = null;
        }
    }

    static void Main(string[] args)
    {
        List<Thread> threadsListDoWork = new List<Thread>();
        List<Thread> threadsList = new List<Thread>();
        BenchWaitHandle.Reset();
        for (int I = 0; I < 16; I++)
        {
            threadsListDoWork.Add(new Thread(new ThreadStart(DemoMethod)));
            threadsList.Add(new Thread(new ThreadStart(DemoMethod_Assign)));
            threadsList.Add(new Thread(new ThreadStart(DemoMethod_Null)));
        }
        foreach (Thread t in threadsListDoWork)
        {
            t.Start();
        }
        foreach (Thread t in threadsList)
        {
            t.Start();
        }
        IsRunning = true;
        BenchWaitHandle.Set();
        foreach (Thread t in threadsListDoWork)
        {
            t.Join();
        }
        IsRunning = false;
        foreach (Thread t in threadsList)
        {
            t.Join();
        }
        Console.WriteLine("Did Something {0} times", DidSomethingCount);
        Console.ReadLine();
    }
}

//On the last run this printed
//Did Something 112780926 times
//Which means that DemoMethod() threads won the race condition slightly over 7% of the time.
I read recently about memory barriers and the reordering issue and now I have some confusion about it.
Consider the following scenario:
private object _object1 = null;
private object _object2 = null;
private bool _usingMethod1 = false;

private object MyObject
{
    get
    {
        if (_usingMethod1)
        {
            return _object1;
        }
        else
        {
            return _object2;
        }
    }
    set
    {
        if (_usingMethod1)
        {
            _object1 = value;
        }
        else
        {
            _object2 = value;
        }
    }
}
private void Update()
{
    _usingMethod1 = true;
    SomeProperty = FooMethod();
    //..
    _usingMethod1 = false;
}
In the Update method, is the _usingMethod1 = true statement always executed before getting or setting the property, or can we not guarantee that because of reordering?
Should we use volatile, like:
private volatile bool _usingMethod1 = false;
If we use lock, can we then guarantee that every statement within the lock will be executed in order, like:
private void FooMethod()
{
    object locker = new object();
    lock (locker)
    {
        x = 1;
        y = a;
        i++;
    }
}
The subject of memory barriers is quite complex. It even trips up the experts from time to time. When we talk about a memory barrier we are really combining two different ideas.
Acquire fence: A memory barrier in which other reads & writes are not allowed to move before the fence.
Release fence: A memory barrier in which other reads & writes are not allowed to move after the fence.
A memory barrier that creates only one of the two is sometimes called a half-fence. A memory barrier that creates both is sometimes called a full-fence.
The volatile keyword creates half-fences. Reads of volatile fields have acquire semantics while writes have release semantics. That means no instruction can be moved before a read or after a write.
The lock keyword creates full-fences on both boundaries (entry and exit). That means no instruction can be moved either before or after each boundary.
However, all of this is moot if we are only concerned with one thread. Ordering, as it is perceived by that thread, is always preserved. In fact, without that fundamental guarantee no program would ever work right. The real issue is how other threads perceive reads and writes. That is where you need to be concerned.
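To make the one-way nature of these fences concrete, here is an illustrative sketch (the field names are made up for this example):

private volatile bool _flag;
private int _data;

private void Writer()
{
    _data = 42;   // ordinary write: cannot move below the volatile write (release-fence)
    _flag = true; // volatile write
}

private void Reader()
{
    bool f = _flag; // volatile read (acquire-fence)
    int d = _data;  // ordinary read: cannot move above the volatile read
}

If Reader sees _flag == true, it is guaranteed to see _data == 42, because neither side's operations can cross its half-fence in the forbidden direction.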
So to answer your questions:
From a single thread's perspective...yes. From another thread's perspective...no.
It depends. That might work, but I would need a better understanding of what you are trying to achieve.
From another thread's perspective...no. The reads and writes are free to move around within the boundaries of the lock. They just cannot move outside those boundaries. That is why it is important for other threads to also create memory barriers.
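In other words, the other threads must take the same lock for the ordering to do them any good; an illustrative sketch (field names made up):

private readonly object _gate = new object();
private int _x, _y;

private void Writer()
{
    lock (_gate)
    {
        _x = 1; // may be reordered with the next line inside the lock...
        _y = 2; // ...but neither write can move outside the lock's boundaries
    }
}

private void Reader()
{
    lock (_gate) // taking the same lock makes the writer's completed changes visible
    {
        int a = _x;
        int b = _y; // sees either both of the writes or neither, never a torn mixture
    }
}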
The volatile keyword doesn't accomplish anything here. It has very weak guarantees; it does not imply a memory barrier. Your code doesn't show another thread getting created, so it is hard to guess whether locking is required. It is, however, a hard requirement if two threads can execute Update() at the same time and use the same object.
Beware that your lock code as posted doesn't lock anything. Each thread would have its own instance of the "locker" object. You have to make it a private field of your class, created by the constructor or an initializer. Thus:
private object locker = new object();

private void Update()
{
    lock (locker)
    {
        _usingMethod1 = true;
        SomeProperty = FooMethod();
        //..
        _usingMethod1 = false;
    }
}
Note that there will also be a race on the SomeProperty assignment.
If I have a variable which is assigned to at the start of the program and then the program creates a few threads and then refers to it, is it thread safe?
private int myVal;

private void StartOfApp()
{
    myVal = 99;
}

private void MethodCalledFromVariousThreads()
{
    int i = 100;
    if (i > myVal) // Is reading this variable thread safe?
    {
        //Do Stuff
    }
}
It's fine as long as you can guarantee that StartOfApp completes the assignment before MethodCalledFromVariousThreads runs.
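Starting the threads only after the assignment gives you exactly that guarantee, because Thread.Start establishes the required ordering between the starting thread and the started one. A minimal sketch (hypothetical wiring, reusing the names above):

private int myVal;

private void StartOfApp()
{
    myVal = 99; // completes before Start() below
    // Everything written before Start() is visible to the new thread.
    var worker = new Thread(MethodCalledFromVariousThreads);
    worker.Start();
}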
This would be better, imo:
private const int myVal = 99;
private void MethodCalledFromVariousThreads()
{
    int i = 100;
    if (i > myVal) // Is reading this variable thread safe?
    {
        //Do Stuff
    }
}
Yes, this is thread safe. Because you never write to the variable (I assume), its data is, in essence, immutable. (OK, it really is mutable, because this is C#, but you get the idea.) Because of this, it will always return the same value, and it is therefore thread safe to read from.
If you never write to a variable, except when creating it, then it will always be thread safe to read from.
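If the value genuinely never changes after construction, you can also state that intent in the type system with readonly (or const, as above); a small sketch, with a hypothetical containing class:

private class Config // hypothetical container for the value
{
    private readonly int myVal;

    public Config()
    {
        myVal = 99; // assigned once during construction; never written again
    }

    public bool IsAboveThreshold(int i)
    {
        return i > myVal; // safe to read from any thread once the object is published
    }
}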
While learning threading, I found memory barriers (fences) really not easy to understand. In my case here I want to employ 10 threads to simultaneously increment an Int32 number x (via x++) 100 times each, and get the result 10 * 100 = 1000.
So this is actually an atomicity problem, and as far as I know there are a number of concurrency tools to achieve it:
Interlocked.Increment
exclusive lock (lock, monitor, Mutex, Semaphore, etc.)
ReaderWriterLockSlim
If there are better ways, please guide me. I tried to use a volatile read/write, but it failed:
for (int i = 0; i < 10000; i++)
{
    Thread.VolatileRead(ref x);
    Thread.VolatileWrite(ref x, x + 1);
}
My investigation code is tidied below:
private const int MaxThreadCount = 10;
private Thread[] m_Workers = new Thread[MaxThreadCount];
private volatile int m_Counter = 0;
private Int32 x = 0;

protected void btn_DoWork_Click(object sender, EventArgs e)
{
    for (int i = 0; i < MaxThreadCount; i++)
    {
        m_Workers[i] = new Thread(IncreaseNumber) { Name = "Thread " + (i + 1) };
        m_Workers[i].Start();
    }
}

void IncreaseNumber()
{
    try
    {
        for (int i = 0; i < 10000; i++)
            Interlocked.Increment(ref x);

        // Increases Counter and decides whether or not to set the finish signal
        m_Counter++;
        if (m_Counter == MaxThreadCount)
        {
            // Print finish information on UI
            m_Counter = 0;
        }
    }
    catch (Exception ex)
    {
        throw;
    }
}
My question is: how can I use a memory barrier to replace Interlocked, given that "all of Interlocked's methods generate a full fence"? I tried to modify the increment loop as below, but it failed, and I don't understand why...
for (int i = 0; i < 10000; i++)
{
    Thread.MemoryBarrier();
    x++;
    Thread.MemoryBarrier();
}
The memory barrier just keeps memory operations from moving from one side of the barrier to the other. Your issue is this:
Thread A reads the value of X.
Thread B reads the value of X.
Thread A adds one to the value it read.
Thread B adds one to the value it read.
Thread A writes back the value it calculated.
Thread B writes back the value it calculated.
Oops, two increments only added one. Memory barriers are not atomic operations and they are not locks. They just enforce ordering, not atomicity.
Unfortunately, the x86 architecture does not offer any atomic operations that don't include a full fence. It is what it is. On the bright side, the full fence is heavily optimized. (For example, it does not ever lock any bus.)
You can't "use MemoryBarrier to replace Interlocked". They are two different tools.
Use MemoryBarrier, volatile etc to control re-ordering of reads and writes. Use Interlocked, lock etc for atomicity.
(Besides, are you aware that calling MemoryBarrier also generates a full fence*, as do VolatileRead and VolatileWrite? So if you're trying to avoid Interlocked, lock etc for performance reasons, there's a good chance that your alternatives will be less performant as well as more likely to be broken.)
*In the standard Microsoft CLR, at least. I'm not sure about Mono etc.
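If you want to see what the atomicity tool is doing for you, an atomic increment is equivalent to a compare-and-swap retry loop; a sketch (illustrative, not the BCL's actual implementation):

static int AtomicIncrement(ref int location)
{
    int current, next;
    do
    {
        current = Volatile.Read(ref location); // snapshot the current value
        next = current + 1;                    // compute the new value from the snapshot
        // CompareExchange stores 'next' only if 'location' still equals 'current';
        // it returns the value that was actually there, so any other result means
        // another thread won the race and we must retry.
    } while (Interlocked.CompareExchange(ref location, next, current) != current);
    return next;
}

This is why a plain x++ between two memory barriers still loses updates: nothing makes the read-modify-write sequence indivisible, whereas the loop above retries until its write is based on an unchanged snapshot.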