I've been reading Joe Duffy's book on Concurrent programming. I have kind of an academic question about lockless threading.
First: I know that lockless threading is fraught with peril (if you don't believe me, read the sections in the book about memory model)
Nevertheless, I have a question:
Suppose I have a class with an int property on it.
The value referenced by this property will be read very frequently by multiple threads
It is extremely rare that the value will change, and when it does it will be a single thread that changes it.
If it does change while another operation that uses it is in flight, no one is going to lose a finger (the first thing anyone using it does is copy it to a local variable)
I could use locks (or a ReaderWriterLockSlim to keep the reads concurrent).
I could mark the variable volatile (lots of examples where this is done)
However, even volatile can impose a performance hit.
What if I use VolatileWrite when it changes, and leave the access normal for reads? Something like this:
public class MyClass
{
    private int _TheProperty;
    internal int TheProperty
    {
        get { return _TheProperty; }
        set { System.Threading.Thread.VolatileWrite(ref _TheProperty, value); }
    }
}
I don't think that I would ever try this in real life, but I'm curious about the answer (more than anything, as a checkpoint of whether I understand the memory model stuff I've been reading).
Marking a variable as "volatile" has two effects.
1) Reads and writes have acquire and release semantics, so that reads and writes of other memory locations will not "move forwards and backwards in time" with respect to reads and writes of this memory location. (This is a simplification, but you take my point.)
2) The code generated by the jitter will not "cache" a value that seems to logically be unchanging.
Whether the former point is relevant in your scenario, I don't know; you've only described one memory location. Whether or not it is important that you have only volatile writes but not volatile reads is something that is up to you to decide.
But it seems to me that the latter point is quite relevant. If you have a spin lock on a non-volatile variable:
while(this.prop == 0) {}
the jitter is within its rights to generate this code as though you'd written
if (this.prop == 0) { while (true) {} }
Whether it actually does so or not, I don't know, but it has the right to. If what you want is for the code to actually re-check the property on each go round the loop, marking it as volatile is the right way to go.
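As a rough sketch of the spin-wait case, here is one way to force a real re-read on every pass without marking the field volatile. SpinDemo, _flag, Signal and WaitForFlag are made-up names for illustration; Volatile.Read and Volatile.Write assume .NET 4.5 or later.

```csharp
using System;
using System.Threading;

// Hypothetical example type; none of these names come from the question.
public static class SpinDemo
{
    private static int _flag; // 0 = keep waiting, 1 = go

    // Publishes the change that the waiting thread is spinning on.
    public static void Signal()
    {
        Volatile.Write(ref _flag, 1);
    }

    // Spins until another thread calls Signal.
    public static bool WaitForFlag()
    {
        // Volatile.Read performs a real load on every pass, so the jitter may
        // not rewrite this as "if (_flag == 0) { while (true) {} }".
        while (Volatile.Read(ref _flag) == 0)
        {
            Thread.SpinWait(1);
        }
        return true;
    }

    public static void Main()
    {
        var waiter = new Thread(() => WaitForFlag());
        waiter.Start();
        Thread.Sleep(50);   // give the waiter time to start spinning
        Signal();
        waiter.Join();      // returns once the waiter has seen the flag
        Console.WriteLine("waiter observed the flag");
    }
}
```

With a plain read of _flag in the loop condition, the program might still terminate in practice, but nothing obliges the jitter to keep the load inside the loop.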
The question is whether the reading thread will ever see the change. It's not just a matter of whether it sees it immediately.
Frankly I've given up on trying to understand volatility - I know it doesn't mean quite what I thought it used to... but I also know that with no kind of memory barrier on the reading thread, you could be reading the same old data forever.
The "performance hit" of volatile is because the compiler now generates code to actually check the value instead of optimizing that away - in other words, you'll have to take that performance hit regardless of what you do.
At the CPU level, yes every processor will eventually see the change to the memory address. Even without locks or memory barriers. Locks and barriers would just ensure that it all happened in a relative ordering (w.r.t other instructions) such that it appeared correct to your program.
The problem isn't cache-coherency (I hope Joe Duffy's book doesn't make that mistake). The caches stay coherent - it is just that this takes time, and the processors don't bother to wait for that to happen - unless you enforce it. So instead, the processor moves on to the next instruction, which may or may not end up happening before the previous one (because each memory read/write may take a different amount of time. Ironically because of the time for the processors to agree on coherency, etc. - this causes some cachelines to be coherent faster than others (ie depending on whether the line was Modified, Exclusive, Shared, or Invalid it takes more or less work to get into the necessary state).)
So a read may appear old or from an out of date cache, but really it just happened earlier than expected (typically because of look-ahead and branch prediction). When it really was read, the cache was coherent, it has just changed since then. So the value wasn't old when you read it, but it is now when you need it. You just read it too soon. :-(
Or equivalently, it was written later than the logic of your code thought it would be written.
Or both.
Anyhow, if this was C/C++, even without locks/barriers, you would eventually get the updated values. (within a few hundred cycles typically, as memory takes about that long). In C/C++ you could use volatile (the weak non-thread volatile) to ensure that the value wasn't read from a register. (Now there's a non-coherent cache! ie the registers)
In C# I don't know enough about CLR to know how long a value could stay in a register, nor how to ensure you get a real re-read from memory. You've lost the 'weak' volatile.
I would suspect as long as the variable access doesn't completely get compiled away, you will eventually run out of registers (x86 doesn't have many to start with) and get your re-read.
But no guarantees that I see. If you could limit your volatile-read to a particular point in your code that was often, but not too often (ie start of next task in a while(things_to_do) loop) then that might be the best you can do.
This is the pattern I use when the 'last writer wins' pattern is applicable to the situation. I had used the volatile keyword, but after seeing this pattern in a code example from Jeffrey Richter, I started using it.
For normal things (like memory-mapped devices), the cache-coherency protocols going on within/between the CPU/CPUs are there to ensure that different threads sharing that memory get a consistent view of things (i.e., if I change the value of a memory location in one CPU, it will be seen by other CPUs that have the memory in their caches). In this regard volatile will help to ensure that the optimizer doesn't optimize away memory accesses (which are always going through cache anyway) by, say, reading the value cached in a register. The C# documentation seems pretty clear on this. Again, the application programmer doesn't generally have to deal with cache-coherency themselves.
I highly recommend reading the freely available paper "What Every Programmer Should Know About Memory". A lot of magic goes on under the hood that mostly prevents shooting oneself in the foot.
In C#, reads and writes of an int are atomic.
Since you said that only one thread writes to it, you should never have contention as to what is the proper value, and as long as you are caching a local copy, you should never get dirty data.
You may, however, want to declare it volatile if an OS thread will be doing the update.
Also keep in mind that some operations are not atomic, and can cause problems if you have more than one writer. For example, even though the bool type won't corrupt if you have more than one writer, a statement like this:
a = !a;
is not atomic. If two threads read at the same time, you have a race condition.
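If more than one thread really does need to flip the value, one possible fix is a compare-and-swap loop. This is only a sketch: ToggleDemo and Toggle are invented names, and the flag is stored as an int because Interlocked.CompareExchange has no bool overload.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical example type, for illustration only.
public static class ToggleDemo
{
    private static int _flag; // 0 = false, 1 = true

    // Atomically flips _flag and returns the new value.
    public static bool Toggle()
    {
        while (true)
        {
            int seen = Volatile.Read(ref _flag);
            int flipped = seen ^ 1;
            // Store 'flipped' only if nobody changed _flag since we read it;
            // otherwise loop and try again with the fresh value.
            if (Interlocked.CompareExchange(ref _flag, flipped, seen) == seen)
            {
                return flipped == 1;
            }
        }
    }

    public static void Main()
    {
        // An even number of atomic toggles must leave the flag where it started.
        Parallel.For(0, 100000, i => Toggle());
        Console.WriteLine(Volatile.Read(ref _flag)); // 0
    }
}
```

Replacing the body of Toggle with a plain "_flag = _flag ^ 1" would let two threads read the same old value, and the final result would no longer be predictable.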
The documentation for Volatile.Write says the following:
Writes the specified object reference to the specified field. On
systems that require it, inserts a memory barrier that prevents the
processor from reordering memory operations as follows: If a read or
write appears before this method in the code, the processor cannot
move it after this method.
and
value T
The object reference to write. The reference is written
immediately so that it is visible to all processors in the computer.
But it seems like quotes 1 and 2 are contradictory.
For the second quote to be true, I would think that the first quote would have to be changed as follows:
If a read or
write appears after this method in the code, the processor cannot
move it before this method.
Does Volatile.Write actually mean that other threads are guaranteed to pick up the write in a timely fashion, or is the second quote misleading?
It seems to me as though all these "Volatile"/"Memory Barriers" are focused on ensuring that if writes are exposed to other threads they are exposed in the correct order, but I can't seem to find what would actually force them to be exposed.
I understand that it may be hard/impossible to expose writes to other threads immediately, but without volatile writes/reads there are cases when the writes are exposed never. So it seems there must be a way to ensure that writes are exposed "eventually", but I'm unsure what that is. Is it that writes are always exposed in .NET but reads can be cached? And if so does Volatile.Read stop this caching behaviour?
(Note I have read through Joseph Albahari's Threading in C# which tends to suggest I need explicit memory barriers before my reads and after my writes, although it's not clear why even that should be effective as the documentation for Thread.MemoryBarrier doesn't seem to explicitly say that the writes are shown to other threads).
You are misunderstanding the concept of barriers a little bit. As you wrote
The object reference to write. The reference is written immediately so that it is visible to all processors in the computer.
So the really important unit here is a processor, not thread.
So, there are processors, processor caches, store buffers and invalidation queues involved.
When a processor writes something into the memory, it looks like that or similar to that
The subject is at the store buffer level. As you can see, there are a lot of things going on when you write or read something, and it does not happen instantly for all the processors in the system. At the beginning, a read or write command is placed into the processor's store buffer, and those commands can be reordered; in other words, the processor may execute them in a different order.
While that happens, other processors don't know about the change (if the operation is a write), and the current processor doesn't know about changes other processors have made.
When you place a barrier, that means the operations in the store buffer or invalidation queue must be completed before any further read or write can be performed. That is necessary to bring the CPU caches up to date across processors. So there is basically no mechanism for synchronizing data across threads; we are syncing data across processors.
When thread A writes something on processor 1 and thread B reads something on processor 1, they both start by looking into the store buffer first, so they read current data whether or not any barriers are placed.
It's just an overview of the mechanics involved; maybe I'm wrong in some details. You can find complete info if you read about the MESI protocol and this PDF with an explanation of invalidation queues and store buffers.
I agree with you that the description in the MSDN documentation is a bit confusing. I would say that "immediately" is a strong word here, as it is for any subject related to parallel processes. The result won't be visible immediately, but the documentation doesn't say that; it says that the value will be written immediately, that is, as soon as the results of all prior load/store operations become globally visible, the store operation that writes the value is initiated.
As for memory barriers, they can only guarantee the exposure (global visibility) of prior operations, because in essence a memory barrier is an instruction that makes the CPU "wait" until all pending load/store operations are globally visible. The moment at which the value written by Volatile.Write itself becomes globally visible is the concern of neither the barrier nor Volatile.Write.
Now, about the suggestion to use barriers in lock-free programming: of course it makes sense, because it ensures the order of global visibility, which matters on multi-core systems. When you cannot be sure that event B always happens after event A, you just can't build reliable logic that is supposed to run in multi-core environments.
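To make the ordering point concrete, here is a small sketch of the usual "write the data, then set a flag" hand-off. PublishDemo, _data and _ready are invented for illustration. The only guarantee relied on is the documented one: a write that appears before Volatile.Write cannot move after it, and a read that appears after Volatile.Read cannot move before it.

```csharp
using System;
using System.Threading;

// Hypothetical example type, for illustration only.
public static class PublishDemo
{
    private static int _data;   // payload, written before the flag
    private static int _ready;  // 0 = not published, 1 = published

    public static void Producer()
    {
        _data = 42;
        // Release: the write to _data above cannot be moved after this write.
        Volatile.Write(ref _ready, 1);
    }

    public static int Consumer()
    {
        // Acquire: the read of _data below cannot be moved before this read.
        while (Volatile.Read(ref _ready) == 0)
        {
            Thread.Yield();
        }
        return _data;
    }

    public static void Main()
    {
        int seen = 0;
        var consumer = new Thread(() => seen = Consumer());
        consumer.Start();
        Producer();
        consumer.Join();
        Console.WriteLine(seen); // 42
    }
}
```

Nothing here promises how quickly the consumer sees the flag; it only promises that once it does see it, it also sees the data written before it.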
I'm maintaining a legacy application that uses strings to lock values in a cache. It does so something like this:
object Cache(string key, Func<object> createObjToCache)
{
    object result = Get(key);
    if (result == null)
    {
        string internKey = string.Intern(key);
        lock (internKey) {
            result = Get(key);
            if (result == null)
            {
                result = createObjToCache();
                Add(key, result);
            }
        }
    }
    return result;
}
I've two questions about this code. First, is string.Intern() thread safe? Is it possible that two threads on two separate CPUs with two identical strings would return different references? If not, is that a possible bottleneck: does string.Intern block?
Secondly I'm concerned that this application might be using a huge number of strings as keys. I'd like to be able to monitor the amount of memory that the intern pool uses to store all these strings, but I can't find a performance counter for this on .Net Memory. Is there one somewhere else?
NOTE:
I'm aware that this implementation sucks. However I need to make the case to management before re-writing what they see as a critical bit of code. Hence I could use facts and stats on exactly how bad it sucks rather than alternative solutions.
Also Get() and Add() are not in the original code. I've replaced the original code to keep this question simple. We can assume that Add() will not fail if it is called twice with the same or different keys.
MSDN does not make any mention of thread-safety on string.Intern, so you're right in that it is very undefined what would happen if two threads called Intern for a new key at exactly the same time. I want to say "it'll probably work OK", but that isn't a guarantee. There is no guarantee AFAIK. The implementation is extern, so peeking at the implementation means looking at the runtime itself.
Frankly, there are so many reasons not to do this that it is hard to get excited about answering these specific questions. I'd be tempted to look at some kind of Dictionary<string,object> or ThreadSafeDictionary<string,object> (where the object here is simply a new object() that I can use for the lock) - without all the issues related to string.Intern. Then I can a: query the size, b: discard it at whim, c: have parallel isolated containers, etc.
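A minimal sketch of that idea, assuming ConcurrentDictionary (available from .NET 4) as the thread-safe dictionary; KeyLocks, For and Count are hypothetical names, not part of the original code.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper: one private lock object per key, kept in a container
// that can be counted, cleared or replaced, unlike the intern pool.
public static class KeyLocks
{
    private static readonly ConcurrentDictionary<string, object> Locks =
        new ConcurrentDictionary<string, object>();

    // Returns the lock object for a key. Under a race the factory may run
    // more than once, but only one object is stored and returned to everyone.
    public static object For(string key)
    {
        return Locks.GetOrAdd(key, k => new object());
    }

    public static int Count
    {
        get { return Locks.Count; }
    }

    public static void Main()
    {
        // Two distinct string instances with the same contents.
        string a = new string(new[] { 'k', '1' });
        string b = new string(new[] { 'k', '1' });
        Console.WriteLine(ReferenceEquals(a, b));                     // False
        Console.WriteLine(ReferenceEquals(KeyLocks.For(a), For(b)));  // True
        Console.WriteLine(Count);                                     // 1
    }
}
```

The caller would then write lock (KeyLocks.For(key)) { ... } instead of locking on the interned string, and Count gives a direct figure for how many keys are being tracked.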
First is string.Intern() thread safe?
Unless something has changed (my info on this is quite old, and I'm not curious enough to take a look at the current implementation), yes. This however is about the only good thing with this idea.
Indeed, it's not fully a good thing. string.Intern() locks globally which is one of the things that can make it slow.
Secondly I'm concerned that this application might be using a huge number of strings as keys.
If that cache lives forever then that's an issue (or not, if the memory use is sufficiently low) whether you intern or not. In which case you have the wrong approach to the right potential issue to investigate:
I'd like to be able to monitor the amount of memory that the intern pool uses to store all these strings,
If they weren't interned but still lived forever in that cache, you'd still be using the same amount of memory for the strings themselves, and the extra memory overhead of the interning wouldn't really be the issue.
There are a few reasons why one might want to intern a key, and not all of them are even bad (if the strings being interned are going to all appear regularly throughout the lifetime of the application then interning could even reduce memory use), but it seems here that the reason is to make sure that the key locked on is the same instance that another attempt to use the same string would use.
This might be thread safety at the wrong place, if Add() isn't thread-safe enough to guarantee that two simultaneous insertions of different keys can't put it into an invalid state (if Add() isn't explicitly thread-safe, then it does not make this guarantee).
If the cache is threadsafe, then this is likely extra thread safety for no good reason. Since objToCache has already been created and races will result in one being thrown away, it might be fine to let them race and have a brief period of two objToCache existing before one is collected. If not then MemoryCache.AddOrGetExisting or ConcurrentDictionary.GetOrAdd deal with this issue much better than this.
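For what it's worth, here is a sketch of the ConcurrentDictionary.GetOrAdd route. SimpleCache is a hypothetical stand-in for the real cache, and wrapping the value in Lazy<object> is my own addition so that the factory runs at most once per key even when two threads race.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical stand-in for the legacy cache; not the original implementation.
public class SimpleCache
{
    private readonly ConcurrentDictionary<string, Lazy<object>> _items =
        new ConcurrentDictionary<string, Lazy<object>>();

    public object Cache(string key, Func<object> createObjToCache)
    {
        // GetOrAdd may build more than one Lazy wrapper under a race, but only
        // one is stored, and only the stored wrapper's factory ever runs.
        Lazy<object> entry = _items.GetOrAdd(
            key,
            k => new Lazy<object>(createObjToCache, LazyThreadSafetyMode.ExecutionAndPublication));
        return entry.Value;
    }

    public int Count
    {
        get { return _items.Count; }
    }
}

public static class CacheDemo
{
    public static void Main()
    {
        var cache = new SimpleCache();
        int calls = 0;
        Func<object> factory = () =>
        {
            Interlocked.Increment(ref calls);
            return new object();
        };

        object first = cache.Cache("k", factory);
        object second = cache.Cache("k", factory);
        Console.WriteLine(ReferenceEquals(first, second)); // True
        Console.WriteLine(calls);                          // 1
    }
}
```

No string is interned and no lock statement is needed; if creating a throwaway duplicate is acceptable, the Lazy wrapper can be dropped and the object stored directly.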
In C#, we know that a bool is atomic - then why is it valid to mark it as volatile? what is the difference and what is a good (or even practical) use-case for one versus the other?
bool _isPending;
Versus
volatile bool _isPending; // Is this realistic, or insanity?
I have done some reading here and here, and I'm trying to ensure that I fully understand the inner workings of the two. I want to understand when it is appropriate to use one vs the other, or if just bool is enough.
In C#, we know that a bool is atomic - then why is it valid to mark it as volatile? what is the difference and what is a good (or even practical) use-case for one versus the other?
The supposition of your question is that you believe that volatile makes an access atomic. But volatility and atomicity are completely different things, so stop conflating them.
Volatility is the property that the compiler and runtime are restricted from making certain optimizations involving moving reads and writes of variables forwards and backwards in time with respect to each other, and more generally, with respect to other important events such as starting and stopping threads, running constructors, and so on. Consult the C# specification for a detailed list of how operations may or may not be re-ordered with respect to visible side effects.
Atomicity is the property that a particular operation can only be observed as not started or totally completed, and never "halfway done".
As you can see from the definitions, those two things have nothing whatsoever to do with each other.
In C#, all accesses to references, bools, and integer types of size 4 and smaller are guaranteed to be atomic.
Now, in C# there is some slight non-orthogonality between atomicity and volatility, in that only fields of atomic types may be marked as volatile. You may not make a volatile double, for example. It would be really weird and dangerous to say "we're going to restrict how reads and writes may be optimized but still allow tearing". Since volatility does not cause atomicity, you don't want to put users in a position of thinking that an operation is atomic just because it is also volatile.
You should read my series of articles that explains in far more detail what the differences between these things are, and what volatile actually does, and why you do not understand nearly enough to be using it safely.
https://ericlippert.com/2011/05/26/atomicity-volatility-and-immutability-are-different-part-one/
https://ericlippert.com/2011/05/31/atomicity-volatility-and-immutability-are-different-part-two/
https://ericlippert.com/2011/06/16/atomicity-volatility-and-immutability-are-different-part-three/
https://web.archive.org/web/20160323025740/http://blog.coverity.com/2014/03/12/can-skip-lock-reading-integer/
If you think you understand volatility after reading all that, I invite you to try to solve the puzzle I pose here:
https://web.archive.org/web/20160729162225/http://blog.coverity.com/2014/03/26/reordering-optimizations/
If there are updates to variables in the preceding or subsequent code and the order in which the updates occurs is critical, then marking the field as volatile will ensure that an update to that field will happen after any previous updates and before any subsequent updates.
In other words, if _isPending is volatile, then the compiler will not cause these instructions to execute in a different order:
_someVariable = 10;
_isPending = true;
_someOtherVariable = 5;
Whether multi-threaded or not, if we've written code that breaks depending on whether these updates in adjacent lines occur in the specified order then something is wrong. We should ask why that sequence matters. (If there is a scenario where that matters, imagine trying to explain it in a comment so that no one makes a breaking change to the code.)
To nearly anyone reading the code above it would appear that the order of those operations doesn't matter at all. If they do matter that means that someone else who reads our code can't possibly understand what's going on. They could do some refactoring, reorder those lines of code, and break everything without knowing it. It might even work when they test it and then fail unpredictably and inconsistently when it's deployed.
I agree with Eric Lippert's comment in the answer you linked:
Frankly, I discourage you from ever making a volatile field. Volatile
fields are a sign that you are doing something downright crazy: you're
attempting to read and write the same value on two different threads
without putting a lock in place.
I suppose I failed to directly answer the question. volatile is valid for a type (including bool) because it's possible to perform an atomic operation on that type. volatile protects from compiler optimizations. According to the documentation for volatile,
This ensures that the most up-to-date value is present in the field at
all times.
But if the field can't be represented in 32 bits or less then preventing compiler optimizations can't guarantee that anyway.
I'm using such configuration:
.NET framework 4.5
Windows Server 2008 R2
HP DL360p Gen8 (2 * Xeon E5-2640, x64)
I have such field somewhere in my program:
protected int HedgeVolume;
I access this field from several threads. I assume that, as I have a multi-processor system, it's possible that these threads are executing on different processors.
What should I do to guarantee that any time I use this field the most recent value is "read"? And to make sure that when I "write" a value it becomes available to all other threads immediately?
What should I do?
just leave field as is.
declare it volatile
use Interlocked class to access the field
use .NET 4.5 Volatile.Read, Volatile.Write methods to access the field
use lock
I only need the simplest way to make my program work on this configuration; I don't need it to work on other computers, servers, or operating systems. Also, I want minimal latency, so I'm looking for the fastest solution that will always work on this standard configuration (multiprocessor Intel x64, .NET 4.5).
Your question is missing one key element... How important is the integrity of the data in that field?
volatile gives you performance, but if a thread is currently writing changes to the field, you won't get that data until it's done, so you might access out-of-date information and potentially overwrite changes another thread is currently making. If the data is sensitive, you might get bugs that are very hard to track down. However, if you are doing very quick updates, overwriting the value without reading it, and don't care that once in a while you get outdated (by a few ms) data, go for it.
lock guarantees that only one thread can access the field at a time. You can put it only on the methods that write the field and leave the reading method alone. The downside is that it is slow, and it may block a thread while another is performing its task. However, you are sure your data stays valid.
Interlocked exists to make a single update atomic, so that a scheduler context switch in the middle of it cannot leave the variable half-updated. My opinion? Don't use it unless you know exactly why you would be using it and exactly how to use it. It gives options, but with great options come great problems. It does not stop a context switch from happening while a variable is being updated; it makes the update itself indivisible. It might not do what you think it does, and it won't prevent parallel threads from performing their tasks simultaneously.
You want to use Volatile.Read().
As you are running on x86, all writes in C# are the equivalent of Volatile.Write(); you only need to use it explicitly on Itanium.
Volatile.Read() will ensure that you get the latest copy regardless of which thread last wrote it.
There is a fantastic write up here, C# Memory Model Explained
Summary of it includes,
On some processors, not only must the compiler avoid certain
optimizations on volatile reads and writes, it also has to use special
instructions. On a multi-core machine, different cores have different
caches. The processors may not bother to keep those caches coherent by
default, and special instructions may be needed to flush and refresh
the caches.
Hopefully that much is obvious, other than the need for volatile to stop the compiler from optimising it, there is the processor as well.
However, in C# all writes are volatile (unlike say in Java),
regardless of whether you write to a volatile or a non-volatile field.
So, the above situation actually never happens in C#. A volatile write
updates the thread’s cache, and then flushes the entire cache to main
memory.
You do not need Volatile.Write(). A more authoritative source is here: Joe Duffy, CLR Memory Model. However, you may need it to stop the compiler reordering it.
Since all C# writes are volatile, you can think of all writes as going
straight to main memory. A regular, non-volatile read can read the
value from the thread’s cache, rather than from main memory.
You need Volatile.Read()
When you start designing a concurrent program, you should consider these options in order of preference:
1) Isolation: each thread has its own private data
2) Immutability: threads can see shared state, but it never changes
3) Mutable shared state: protect all access to shared state with locks
If you get to (3), then how fast do you actually need this to be?
Acquiring an uncontested lock takes on the order of 10ns (10^-8 seconds) - that's fast enough for most applications and is the easiest way to guarantee correctness.
Using any of the other options you mention takes you into the realm of low-lock programming, which is insanely difficult to get correct.
If you want to learn how to write concurrent software, you should read these:
Intro: Joe Albahari's free e-book - will take about a day to read
Bible: Joe Duffy's "Concurrent Programming on Windows" - will take about a month to read
Depends what you DO. For reading only, volatile is easiest, interlocked allows a little more control. Lock is unnecessary as it is more granular than the problem you describe. Not sure about Volatile.Read/Write, never used them.
volatile - bad, there are some issues (see Joe Duffy's blog)
if all you do is read the value or unconditionally write a value - use Volatile.Read and Volatile.Write
if you need to read and subsequently write an updated value - use the lock syntax. You can however achieve the same effect without lock using the Interlocked class's functionality, but this is more complex (it involves CompareExchange calls to ensure that the value you are updating has not been modified since the read operation, plus logic to retry if it was modified since the read).
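The two options in the last point can be sketched side by side. HedgeBook, UpdateWithLock and UpdateWithCas are hypothetical names; only HedgeVolume comes from the question. In real code you would pick one of the two for a given field rather than mixing them, since the lock does nothing to stop a concurrent CompareExchange.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical holder type, for illustration only.
public class HedgeBook
{
    private readonly object _gate = new object();
    private int _hedgeVolume;

    public int HedgeVolume
    {
        get { return Volatile.Read(ref _hedgeVolume); }
    }

    // Option 1: read-modify-write under a lock.
    public int UpdateWithLock(Func<int, int> f)
    {
        lock (_gate)
        {
            _hedgeVolume = f(_hedgeVolume);
            return _hedgeVolume;
        }
    }

    // Option 2: the same update with CompareExchange and a retry loop.
    // f may run more than once, so it should be free of side effects.
    public int UpdateWithCas(Func<int, int> f)
    {
        while (true)
        {
            int seen = Volatile.Read(ref _hedgeVolume);
            int updated = f(seen);
            // Store only if the field still holds the value we read.
            if (Interlocked.CompareExchange(ref _hedgeVolume, updated, seen) == seen)
            {
                return updated;
            }
        }
    }
}

public static class HedgeDemo
{
    public static void Main()
    {
        var book = new HedgeBook();
        // The two phases run one after the other, never concurrently.
        Parallel.For(0, 50000, i => book.UpdateWithCas(v => v + 1));
        Parallel.For(0, 50000, i => book.UpdateWithLock(v => v + 1));
        Console.WriteLine(book.HedgeVolume); // 100000
    }
}
```

For a plain increment, Interlocked.Increment or Interlocked.Add would do the job directly; the retry loop only earns its keep when the new value is some other function of the old one.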
From this I understand that you want to be able to read the last value that was written to a field. Let's make an analogy with the SQL data concurrency problem. If you want to be able to read the last value of a field, you must use atomic instructions. If someone is writing a field, all other threads must be blocked from reading until that thread has finished the writing transaction. After that, every read on that thread will be safe. The problem is not so much with reading as with writing. A lock on that field whenever it's written should be enough if you ask me ...
First have a look here: Volatile vs. Interlocked vs. lock
The volatile modifier surely is a good option for a multi-core CPU.
But is this enough? It depends on how you calculate the new HedgeVolume value!
If your new HedgeVolume does not depend on the current HedgeVolume then you're done with volatile.
But if HedgeVolume[x] = f(HedgeVolume[x-1]) then you need some thread synchronisation to guarantee that HedgeVolume doesn't change while you calculate and assign the new value. Both lock and Interlocked scenarios would be suitable in this case.
I had a similar question and found this article to be extremely helpful. It's a very long read, but I learned a LOT!
Is it necessary to acquire a lock on a variable before reading it from multiple threads?
The short answer is: it depends.
The long answer is:
If it is not a shared value, i.e, only one thread can see it (or use it), you don't need any synchronization.
If it is an immutable value, i.e., you set it only once and then only ever read, it is safe to do so without synchronization (as long as you don't start reading before the first write completes).
If it is a "primitive" type of at most 32-bits (e.g. byte, short, int) you can get stale (old) data when reading. If that doesn't bother you, you're set. If stale data is undesirable, making the variable volatile can fix this problem without additional synchronization for reads. But if you have racing writers, you will need to follow the same advice as for longs below.
If it is a "primitive" type longer than 32-bits (e.g. long, decimal, double) you need synchronization, otherwise you could read "half" of one value, "half" of another, and get crazy results. For this the recommended approach is to use the methods in the Interlocked class, for both reads and writes.
If it is a reference type, you will need synchronization to avoid seeing an invalid state (Jeff Lamb's picture example is a good one). The lock statement might be enough for that. Again, you need to lock for both reads and writes.
There are some other points to consider (how long to lock, for example), but I think these are enough to answer your question.
It depends on the type of variable and your platform. For example, reading Int64s is not guaranteed to be atomic on 32 bit machines. Hence, Interlocked.Read.
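A small sketch of the Int64 case, using Interlocked.Read for loads and Interlocked.Exchange for stores. Int64Holder and its members are made-up names. The writer alternates between 0 and -1 (all bits clear, all bits set), so a torn read would show up as any other value.

```csharp
using System;
using System.Threading;

// Hypothetical wrapper type, for illustration only.
public class Int64Holder
{
    private long _value;

    // One atomic 64-bit load, even in a 32-bit process.
    public long Get()
    {
        return Interlocked.Read(ref _value);
    }

    // One atomic 64-bit store; returns the previous value.
    public long Set(long newValue)
    {
        return Interlocked.Exchange(ref _value, newValue);
    }
}

public static class Int64Demo
{
    public static void Main()
    {
        var holder = new Int64Holder();
        bool torn = false;

        var writer = new Thread(() =>
        {
            for (int i = 0; i < 1000000; i++)
            {
                holder.Set((i & 1) == 0 ? 0L : -1L);
            }
        });
        writer.Start();

        // Read concurrently with the writer and look for a half-written value.
        while (writer.IsAlive)
        {
            long v = holder.Get();
            if (v != 0L && v != -1L)
            {
                torn = true;
            }
        }
        writer.Join();

        Console.WriteLine(torn); // False
    }
}
```

In a 64-bit process, plain reads of an aligned long will usually not tear either, but that is a property of the hardware rather than something the language promises.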
If the loading of the value is done in 1 assembly instruction, it's not necessary to get a lock. You don't care if the value changed 10 minutes ago or 1 microsecond ago. You just want the value now.
However, if you're loading a HUGE array or picture or something, it'd probably be a good idea to lock it out. In theory, you can get preempted while loading the data and have half of the first item and half of the second item.
If it's a simple variable, though, like a bool or int, it's not necessary.
In addition to the answers below, you can also take a read lock using ReaderWriterLockSlim.
That would allow you to take only a read lock when reading and a write lock when modifying your variable. Multiple threads can hold a read lock at the same time, but as soon as a thread requests a write lock, all new requests are blocked until it is complete.
This sort of locking would be useful if you are doing a lot of reads and not many writes.
As with most multithreading issues, research it enough to understand whether it really fits your problem; ReaderWriterLockSlim is not suitable for every locking situation.
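Roughly, the read-lock/write-lock split looks like this. GuardedValue is a hypothetical wrapper; the ReaderWriterLockSlim calls themselves are the standard ones from System.Threading.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical wrapper type, for illustration only.
public class GuardedValue
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private int _value;

    // Many threads may hold the read lock at the same time.
    public int Read()
    {
        _lock.EnterReadLock();
        try { return _value; }
        finally { _lock.ExitReadLock(); }
    }

    // The write lock is exclusive: it waits for current readers to leave and
    // holds back new ones until the write is done.
    public void Write(int value)
    {
        _lock.EnterWriteLock();
        try { _value = value; }
        finally { _lock.ExitWriteLock(); }
    }
}

public static class GuardedValueDemo
{
    public static void Main()
    {
        var guarded = new GuardedValue();
        guarded.Write(7);

        // Concurrent readers all see the value written above.
        Parallel.For(0, 1000, i =>
        {
            if (guarded.Read() != 7)
            {
                throw new InvalidOperationException("unexpected value");
            }
        });

        Console.WriteLine(guarded.Read()); // 7
    }
}
```

For a single int this is heavier than it needs to be; the pattern pays off when the protected state is larger than one field and reads greatly outnumber writes.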
It depends on whether or not is it a local or shared variable, and whether something else may write to it in the meantime, and what you're going to do after reading it.
If you make a decision based on the variable, consider that the next line of code may then be based on data which is now stale.
The answer is: it depends. If the value of the variable does not change while the threads are accessing it, a lock is not needed; otherwise, it's needed.
Also, you can use the Interlocked.XXX family of methods to keep reads and writes of the variable atomic.
Reading does not require a lock; as long as you don't care about the 'correctness' of the read. It is only dangerous if you attempt to write without a lock.
If it is a constant, no.
If it is an updatable value, yes, if you need consistency.
For updatable values where the exact value must be managed, then yes, you should use a lock or other synchronization method for reads and writes; and perhaps block during the entire scope where the value is used.
It is 100% necessary unless you are 100% sure that the variable's value won't change while the reader threads are running.
Necessary? No.
...but if it's possible that another thread could try to write to it during the read (like a collection, etc.) then it might be a good idea.
As long as it doesn't change during other threads' execution, you don't need to lock it.
If it changes, you should lock it.
If the variable is never written to by someone (at least at the time it is accessible), you don't need to lock it, because there are no possibilities for missed updates. The same goes if you don't care about missed updates (meaning it is not a problem if you get an older value). Otherwise you should use some sort of synchronization