My question may be a newbie one or a duplicate, but I wonder what happens when several threads try to read a static variable at the same time. I'm not interested in synchronization right now; I just want to know whether they read it simultaneously or in turn.
UPDATE:
My question is more in the domain of physics, or something like that: whether the threads read the variable at the same moment in time.
If the value of the variable does not change (no thread writes to it), then reads by multiple threads are a safe operation and do not require additional synchronization such as locking. Otherwise you have to consider locking around the write operations.
UPDATE: Regarding question update
Physically, on a single-core CPU, only one instruction can execute at a time (simplified; ignoring CPU pipelines), so there is no chance of two threads accessing the same memory location in the same quantum of time.
They can't be accessing it truly simultaneously - at some point the CPU will be sequencing the reads.
If it is a type that a processor core reads in one go (on all platforms), then the read is an atomic operation. If it is a larger type that takes more than one operation to read or write, then it is not atomic, and you could read torn values that are the product of another thread changing it partially while you are reading/writing it.
I searched for an answer, and I know at a high level how to use lock etc. in a multithreaded environment. This question has bugged me for a long time, and I think I am not the only one. TL;DR at the end.
In my case I want to prevent a method that gets called from multiple threads from being executed while it is being called by another thread.
Now a normal lock scenario would look like this in C#:
static readonly object _locker = new object();
private static int _counter;

public void Increase()
{
    lock (_locker)
    {
        _counter++; // Do more than this here
    }
}
As I understand it, the object _locker acts like a bool that indicates whether the method is currently being executed. If the method is "free", set it to locked, execute the method, and free it afterwards. If the method is locked, wait until it is unlocked, then lock, execute, and unlock.
Side question 1: Does calling this method repeatedly guarantee queue-like behavior? Ignoring the fact that the blocking in the parent thread could cause problems; imagine Increase() is the last call in the parent thread.
Side question 2: Using an object in a boolean way feels odd. Does every object contain an "is-being-used" flag, and is a "raw" object just being used to contain this flag? Why not a Boolean?
Side question 3: How can lock() modify a readonly field?
Functionally I could write it like this also:
static Boolean _locker = false;
private static int _counter;

public void Increase()
{
    while (_locker) // Waits for the lock to be free
    {
    }
    _locker = true;  // Takes the lock
    _counter++;      // Do more than this here
    _locker = false; // Releases the lock
}
While the lock example looks sophisticated and safe, the second one just feels wrong, and somehow rings alarm bells in my head.
Is it possible for this method to be executed at the exact same CPU cycle by two cores simultaneously?
I know this is "but sometimes" taken to the extreme. I believe an OS scheduler does spread threads from one application across multiple cores, so why shouldn't the assembly instruction "load the value of _locker for comparison" be executed on two cores at the same time? Even if the method is entered one cycle apart, the "read for comparison" and the "write true to _locker" would be executed at the same time.
This doesn't even take into account that one line of C# can translate to multiple assembly instructions, and a thread could be interrupted after confirming locked==0 but before writing locked=1. Since one line of C# can result in many assembly instructions, couldn't even the lock() itself be interrupted?
Obviously these problems are somehow solved or avoided. I would really appreciate an explanation of where my thought process is wrong or what I am missing.
TL;DR: Can a lock() statement be executed at the exact same time by two CPU cores? I can't explain how software could avoid this without a big performance impact.
Yes, two cores can take two different locks at the same time. On modern CPUs, the atomic RMW operation only needs a "cache lock", not a global bus lock. For example, this test code (on Godbolt) is C++ that compiles to a loop which just repeats an xchg [rdi], ecx, with each thread using a different std::atomic<int> object in a different cache line. The total runtime of the program on my i7-6700k is 463 ms whether it runs on 1 or 4 threads, which rules out any kind of system-wide bus lock and confirms that the CPU just uses a MESI cache lock within the core doing the RMW to make it atomic without disturbing the operations of other cores. Uncontended locks scale perfectly when each thread only locks/unlocks its own lock repeatedly.
Taking a lock that was last released by another core will stall this one for maybe hundreds of clock cycles (40 to 70 nanoseconds is a typical inter-core latency) for the RFO (Read For Ownership) to complete and get exclusive ownership of the cache line, but won't have to retry or anything. Atomic RMW involves a memory barrier (on x86), so memory operations after the lock can't even get started, so the CPU core may be stalled for a while. There is significant cost here, compared to normal loads/stores, which out-of-order exec can't hide as well as some other things.
No, two cores can't take the same lock at the same time (note 1); that's the whole point of a mutex. Correctly implemented mutexes don't have the same bug as your example, which spin-waits and then separately stores a true.
(Note 1: There are counted locks / semaphores that you can use to allow up to n threads into a critical section, for some fixed n, where the resource management problem you want to solve is something other than simple mutual exclusion. But you're only talking about mutexes.)
The critical operation in taking a lock is an atomic RMW, for example x86 xchg [rcx], eax or lock cmpxchg [rcx], edx, that stores a 1 (true) and, as part of the same operation, checks what the old value was. (Can num++ be atomic for 'int num'?). In C++, that would mean using std::atomic<bool> lock; / old = lock.exchange(true);. In C#, you have Interlocked.Exchange(). That closes the race window your attempt contains, where two threads could both exit the while(_locker){} loop and then both blindly store _locker = true.
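To make this concrete, here is a minimal sketch of a spinlock built on Interlocked.Exchange (the class and member names are made up for illustration; a real program should use lock or Monitor rather than rolling its own):

```csharp
using System;
using System.Threading;

class SpinLockDemo
{
    // 0 = free, 1 = held. Interlocked works on int, not bool.
    private static int _locker = 0;
    private static int _counter = 0;

    public static int Counter => _counter;

    public static void Increase()
    {
        // Atomically store 1 AND read the old value in one RMW operation.
        // If the old value was already 1, another thread holds the lock.
        while (Interlocked.Exchange(ref _locker, 1) == 1)
        {
            Thread.SpinWait(1); // brief back-off between attempts
        }
        try
        {
            _counter++; // critical section: do more than this here
        }
        finally
        {
            Volatile.Write(ref _locker, 0); // release with a volatile store
        }
    }

    static void Main()
    {
        var threads = new Thread[4];
        for (int i = 0; i < threads.Length; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int j = 0; j < 100000; j++) Increase();
            });
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();
        Console.WriteLine(Counter); // 400000: no increments are lost
    }
}
```

Because the exchange both writes the 1 and reports the old value in a single atomic step, there is no window where two threads can both observe "free" and both proceed.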
Also note that rolling your own spin-loop has problems if you don't use volatile or Volatile.Read() to stop the compiler from assuming that no other threads are writing a variable you're reading/writing. (Without volatile, while(foo){} can optimize into if(!foo) infinite_loop{} by hoisting the apparently loop-invariant load out of the loop).
(The other interesting part of implementing a lock is what to do if it's not available the first time you try: how long to keep spinning (and exactly how to spin, e.g. the x86 pause instruction between read-only checks), using CPU time while waiting, before falling back to a system call that gives up the CPU to another thread or process and has the OS wake you back up when the lock is, or might be, available again. But that's all performance tuning; actually taking the lock revolves around an atomic RMW.)
Of course, if you're going to roll your own at all, make the increment itself a lock-free atomic RMW with Interlocked.Increment(ref counter);, as per the example in MS's docs.
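If the critical section really is just the increment, the lock can be dropped entirely; a sketch (names are illustrative):

```csharp
using System;
using System.Threading;

class InterlockedDemo
{
    private static int _counter;

    public static int Counter => _counter;

    public static void Increase()
    {
        // One lock-free atomic RMW replaces lock + an ordinary ++.
        Interlocked.Increment(ref _counter);
    }

    static void Main()
    {
        var threads = new Thread[4];
        for (int i = 0; i < threads.Length; i++)
        {
            threads[i] = new Thread(() =>
            {
                for (int j = 0; j < 100000; j++) Increase();
            });
            threads[i].Start();
        }
        foreach (var t in threads) t.Join();
        Console.WriteLine(Counter); // 400000
    }
}
```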
Does every object contain an "is-being-used" flag, and is a "raw" object just being used to contain this flag? Why not Boolean?
We know from object sizes that C# doesn't do that. Probably you should just use lock (counter){ counter++; } instead of inventing a separate lock object. Using a dummy object would make sense if you didn't have an existing object you wanted to manage, but instead some more abstract resource, like calling into some function. (Correct me if I'm wrong, I don't use C#; I'm just here for the cpu-architecture and assembly tags. Does lock() require an object, not a primitive type like int?)
I'd guess that they instead do what normal C++ implementations of std::atomic<T> do for objects too large to be lock-free: a hash table of actual mutexes or spinlocks, indexed by C# object address. (Where is the lock for a std::atomic?)
Even if that guess isn't exactly what C# does, that's the kind of mental model that can make sense of this ability to lock anything without reserving space in every object.
This can create extra contention (by using the same mutex for two different objects). It could even introduce deadlocks where there shouldn't have been any, which is something the implementation would have to work around. Perhaps by putting the identity of the object being locked into the mutex, so another thread that indexes the same mutex can see that it's actually being used to lock a different object, and then do something about it... This is perhaps where being a "managed" language comes in; Java apparently does the same thing where you can lock any object without having to define a separate lock.
(C++ std::atomic doesn't have this problem because the mutexes are taken/released inside library functions, with no possibility to try to take two locks at the same time.)
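As a toy illustration of the "locks kept outside the objects" mental model (this is emphatically not the CLR's real sync-block mechanism; the class name is made up), one could lazily associate a dedicated lock object with any reference object:

```csharp
using System;
using System.Runtime.CompilerServices;

// Toy model only: pair each object with its own dedicated lock object in a
// weak table, so no space is reserved inside the object itself and entries
// vanish when the keyed object is collected. The real CLR uses object
// headers plus a sync-block table; this just illustrates the idea of
// keeping the locks external to the locked objects.
static class LockTable
{
    private static readonly ConditionalWeakTable<object, object> Locks =
        new ConditionalWeakTable<object, object>();

    public static object For(object o) => Locks.GetValue(o, _ => new object());
}

class Demo
{
    static void Main()
    {
        var dict = new System.Collections.Generic.Dictionary<string, int>();
        lock (LockTable.For(dict)) // behaves like lock (dict) in this model
        {
            dict["x"] = 1;
        }
        // Same object always maps to the same lock:
        Console.WriteLine(ReferenceEquals(LockTable.For(dict), LockTable.For(dict)));
    }
}
```

Note that giving every object its own lock (rather than hashing into a fixed pool of mutexes) also sidesteps the extra-contention and deadlock problems described above, at the cost of allocating a lock object per locked object.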
Do CPU cores tick at the same time?
Not necessarily, e.g. Intel "server" chips (most Xeons) let each core control its frequency-multiplier independently. However, even in a multi-socket system, the clock for all cores is normally still derived from the same source, so they can keep their TSC (which counts reference cycles, not core clocks) synced across cores.
Intel "client" chips, like desktop/laptop chips such as i7-6700, actually do use the same clock for all cores. A core is either in low-power sleep (clock halted) or running at the same clock frequency as any other active cores.
None of this has anything to do with locking, or making atomic RMW operations truly atomic, and probably should be split off to a separate Q&A. I'm sure there are plenty of non-x86 examples, but I happen to know how Intel CPUs do things.
I want to protect access to a resource in the following manner:
All threads can read concurrently, except during an update (if the update is not atomic).
Only one thread can be assigned the task of updating, until the next time an update is required.
This may seem like a simple question of using a proper lock, or possibly making all operations atomic, but that is not it, I think.
If I just have a write lock for updating (i.e. ReaderWriterLockSlim), or use non-locking code, nothing prevents more than one thread from running the update procedure (or queuing up to do so). If I use locking to block threads before checking whether the resource needs updating, they can't execute concurrently and are effectively serialized.
I could have specific threads performing all of the checking and updating of the resource, and utilize something like a ManualResetEvent to put other reading threads on hold until updating is finished. (Or if the updating is implemented as an atomic operation, just settle for having specific update threads.)
However, I'm uncertain about best practice, and I would like to ask if you think that the requirements may be met with less effort, or if I'm way off in any of my assumptions.
I think you are looking for a ReaderWriterLockSlim. Use the exclusive lock mode for writing.
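A minimal sketch of that pattern (the class and member names are made up): readers share the read lock, and an update takes the exclusive write lock.

```csharp
using System;
using System.Threading;

class SharedResource
{
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();
    private int _value;

    // Any number of threads may hold the read lock concurrently.
    public int Read()
    {
        _rw.EnterReadLock();
        try { return _value; }
        finally { _rw.ExitReadLock(); }
    }

    // Exclusive: waits for current readers to leave and keeps new ones out.
    public void Update(int newValue)
    {
        _rw.EnterWriteLock();
        try { _value = newValue; }
        finally { _rw.ExitWriteLock(); }
    }
}
```

To also ensure only one thread gets assigned the update task, the thread that notices staleness can take the upgradeable read lock (which only one thread may hold at a time) before upgrading to the write lock, so the other threads simply keep reading.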
A few questions about accessing a local variable from multiple threads at once:
I have multiple threads writing and reading the value of a variable; should I synchronize access to it or not?
The variable is updated every few seconds by Thread1, and it is read and written to the database every few seconds by Thread2.
What problems can occur if I don't have any logic depending on it and don't have any concurrency issues?
Should I use volatile for this?
EDIT:
I would like to emphasize that I don't have any concurrency issues. Here's my specific scenario:
a. My variable's name is pingLatency, and it measures ping latency.
b. Thread1 sends a ping to 8.8.8.8 every 10 seconds and writes the latency to pingLatency.
c. Thread2 updates a corresponding database field with the value of pingLatency every 10 seconds.
d. Thread2 updates the same database row each time.
Now, I'm using this database field to monitor network connectivity. My question is: can there be a situation where the variable is not updated, or where an exception is thrown due to thread-safety issues? I want to avoid using lock because it just seems like overkill.
What do you think?
Yes, you should synchronize access to it. If it is a primitive type, there are methods to do this for you without locks.
no comment
Not sure what you mean by this... most likely you'll end up inserting the wrong value into the DB.
Don't use volatile, per Eric Lippert, it's overly complicated and the semantics are very weird.
Be careful of breaking the memory model, C# by and large follows most other languages in using sequential consistency for data-race-free programs (SC-DRF). Volatile breaks this, so just use locks to prevent a data race.
As for lock, it's not as heavy as one might imagine. In the scenario you describe, the lock will rarely be contended, so acquiring it should be painless in most cases.
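For the pingLatency scenario that could look like this sketch (the class and member names are made up); at once-per-10-seconds rates the lock is essentially never contended, so its cost is negligible:

```csharp
using System;

class PingMonitor
{
    private readonly object _gate = new object();
    private int _pingLatency; // milliseconds

    // Thread1 (the pinger) calls this every 10 seconds.
    public void Record(int latencyMs)
    {
        lock (_gate) { _pingLatency = latencyMs; }
    }

    // Thread2 (the DB writer) calls this every 10 seconds.
    public int ReadForDb()
    {
        lock (_gate) { return _pingLatency; }
    }
}
```

Since the shared state is a single aligned int, reads and writes of it are atomic in .NET anyway; the risk without synchronization is reading a stale value, not a torn value or an exception, and the lock (or Volatile.Read/Volatile.Write) addresses the staleness.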
If you want .NET-managed parallelism, use the built-in good stuff: Task Parallelism. This will manage the threads for you, and you can use the built-in thread-safe collections, e.g. ConcurrentBag as the thread-safe counterpart to an array/list, etc.
If access to your variable is atomic and there are no logical problems, you are OK.
According to this, you can tell whether you are using an atomic variable.
I need to write a class that holds a Dictionary. The Dictionary will be accessed by multiple threads.
Each access will be very short.
I expect:
Every minute, an access that adds or removes entries.
Every 2 seconds, I need to create a copy of the dictionary to run checks over it (and then, e.g., call the DB).
Multiple times per second, I update a field of the value of one of the keys (the value is a struct). The same key will not be accessed concurrently.
Which locking strategy would you choose and why?
My first shot is to use ReaderWriterLockSlim. But after reading that it's at least two times slower than Monitor, I'm not so sure anymore, since every time I access the dict I'm only going to hold the lock very briefly.
tia
Martin
Given that the most frequent operation is writing and you never need multiple concurrent readers as far as I can see, I would just use a normal lock. I can't see that ReaderWriterLockSlim would help you, and it will certainly give you more complex code.
EDIT: One possible optimization is to have the writing threads just add to a list of changes. The reading thread would then need to lock that list only while it applies the changes to the underlying dictionary, before processing it. Assuming the dictionary is very large but the list of changes is relatively small, the writing threads will be blocked for a lot less time.
In fact, you could use something like a Queue instead of a list, and potentially make the reading thread dequeue small batches and yield, reducing the latency of the writing threads even further. So if there were 100 changes to process, the reading thread might read 10 of them, then yield so that any writing threads blocked waiting to add to the queue could get their turn.
There are ever more complex solutions to this, depending on how critical the performance is, particularly in terms of latency - but I'd start off with just a lock round the dictionary.
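The change-queue idea above could be sketched like this (the types and names are illustrative, not a definitive implementation):

```csharp
using System;
using System.Collections.Generic;

// Writers only ever lock the small pending queue; a single reader thread
// drains it in batches into the big dictionary, returning between batches
// (and e.g. calling Thread.Yield) so blocked writers get their turn.
class ChangeBuffer
{
    private readonly object _gate = new object();
    private readonly Queue<KeyValuePair<string, int>> _pending =
        new Queue<KeyValuePair<string, int>>();
    private readonly Dictionary<string, int> _dict =
        new Dictionary<string, int>();

    public void Write(string key, int value)
    {
        lock (_gate)
        {
            _pending.Enqueue(new KeyValuePair<string, int>(key, value));
        }
    }

    // Reader thread: apply at most batchSize pending changes per call.
    public void DrainBatch(int batchSize)
    {
        lock (_gate)
        {
            while (batchSize-- > 0 && _pending.Count > 0)
            {
                KeyValuePair<string, int> change = _pending.Dequeue();
                _dict[change.Key] = change.Value;
            }
        }
    }

    public bool TryGet(string key, out int value)
    {
        lock (_gate) { return _dict.TryGetValue(key, out value); }
    }
}
```

The expensive periodic processing (the checks and the DB call) then happens outside the lock, against a copy, so writers are only ever blocked for the short enqueue or batch-apply steps.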
Please explain what are the main differences and when should I use what.
The focus is on multi-threaded web applications.
lock allows only one thread to execute the code at the same time. ReaderWriterLock may allow multiple threads to read at the same time or have exclusive access for writing, so it might be more efficient. If you are using .NET 3.5 ReaderWriterLockSlim is even faster. So if your shared resource is being read more often than being written, use ReaderWriterLockSlim. A good example for using it is a file that you read very often (on each request) and you update the contents of the file rarely. So when you read from the file you enter a read lock so that many requests can open it for reading and when you decide to write you enter a write lock. Using a lock on the file will basically mean that you can serve one request at a time.
Consider using ReaderWriterLock if you have lots of threads that only need to read the data and are getting blocked waiting for the lock, and you don't often need to change the data.
However ReaderWriterLock may block a thread that is waiting to write for a long time.
Therefore only use ReaderWriterLock after you have confirmed you get high contention for the lock in “real life” and you have confirmed you can’t redesign your locking design to reduce how long the lock is held for.
Also consider whether you could instead store the shared data in a database and let it take care of all the locking; this is a lot less likely to give you a hard time tracking down bugs, provided a database is fast enough for your application.
In some cases you may also be able to use the ASP.NET cache to handle shared data, and just remove the item from the cache when the data changes. The next read can put a fresh copy in the cache.
Remember:
"The best kind of locking is the locking you don't need (i.e. don't share data between threads)."
Monitor and the underlying "sync block" that can be associated with any reference object (the mechanism underlying C#'s lock) support exclusive execution: only one thread can ever hold the lock. This is simple and efficient.
ReaderWriterLock (or, in V3.5, the better ReaderWriterLockSlim) provide a more complex model. Avoid unless you know it will be more efficient (i.e. have performance measurements to support yourself).
The best kind of locking is the locking you don't need (i.e. don't share data between threads).
ReaderWriterLock allows you to have multiple threads hold the ReadLock at the same time... so that your shared data can be consumed by many threads at once. As soon as a WriteLock is requested no more ReadLocks are granted and the code waiting for the WriteLock is blocked until all the threads with ReadLocks have released them.
The WriteLock can only ever be held by one thread, allowing your 'data updates' to appear atomic from the point of view of the consuming parts of your code.
The Lock on the other hand only allows one thread to enter at a time, with no allowance for threads that are simply trying to consume the shared data.
ReaderWriterLockSlim is a newer, more performant version of ReaderWriterLock, with better support for recursion and the ability for a thread to move smoothly from what is essentially a ReadLock to the WriteLock (UpgradeableReadLock).
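The upgrade path can be sketched with a made-up lazy-cache example (names are illustrative):

```csharp
using System;
using System.Threading;

class LazyCache
{
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();
    private int _value = -1; // -1 means "not computed yet"

    public int GetOrCompute()
    {
        // Upgradeable read: runs concurrently with plain readers, but only
        // one thread may hold it at a time, and it can upgrade to the write
        // lock without releasing and reacquiring in between.
        _rw.EnterUpgradeableReadLock();
        try
        {
            if (_value == -1)
            {
                _rw.EnterWriteLock();
                try { _value = 42; /* stands in for an expensive computation */ }
                finally { _rw.ExitWriteLock(); }
            }
            return _value;
        }
        finally
        {
            _rw.ExitUpgradeableReadLock();
        }
    }
}
```

Because only one thread can hold the upgradeable lock, the check-then-write sequence cannot race with another upgrader, while plain readers are still allowed in concurrently until the write lock is actually taken.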
ReaderWriterLock/Slim is specifically designed to help you efficiently lock in a multiple consumer/ single producer scenario. Doing so with the lock statement is possible, but not efficient. RWL/S gets the upper hand by being able to aggressively spinlock to acquire the lock. That also helps you avoid lock convoys, a problem with the lock statement where a thread relinquishes its thread quantum when it cannot acquire the lock, making it fall behind because it won't be rescheduled for a while.
It is true that ReaderWriterLockSlim is FASTER than ReaderWriterLock. But the memory consumption by ReaderWriterLockSlim is outright outrageous. Try attaching a memory profiler and see for yourself. I would pick ReaderWriterLock anyday over ReaderWriterLockSlim.
I would suggest looking through http://www.albahari.com/threading/part4.aspx#_Reader_Writer_Locks. It talks about ReaderWriterLockSlim (which you want to use instead of ReaderWriterLock).