Parallelism and Synchronization - C#

Imagine you are using parallelism on a multi-core system.
Isn't it entirely possible that the same instructions may be executed simultaneously on different cores?
Take the following code:
int i = 0;
if (blockingCondition)
{
    lock (objLock)
    {
        i++;
    }
}
In my head, it seems very possible that, on a system with multiple cores and parallelism, the blockingCondition could be checked at precisely the same moment, leading to the lock being attempted at the same moment, and so on... Is this true?
If so, how can you ensure synchronization across processors?
Also, does the .NET TPL handle this type of synchronization? What about other languages?
EDIT
Please note that this is not about threads, but about Tasks and parallel processing.
EDIT 2
OK, thanks for the information everyone. So is it true that the OS will ensure that writing to memory is serialized, ensuring multi-core synchronization via volatile reads?

To understand why this works, bear in mind:
Locking a lock (i.e. incrementing a lock semaphore on the object) is an operation that blocks if the object is already locked.
The two steps of locking, (a) checking that the lock semaphore is free and (b) actually locking the object, are performed 'simultaneously'; i.e. they are a monolithic or atomic operation as far as the relationship between CPU and memory is concerned.
Therefore, you can see that, if 2 threads enter your if-block, one of the two threads will acquire the lock, and the other will block until the first one has finished the if.

A lock like you have described here is a "Monitor"-style lock on the objLock. As you've noted, it is entirely possible, on a multi-core system, for the two "lock" calls to begin simultaneously. However, any high-level application environment which uses monitors will have translated the monitor into semaphore requests (or, depending on your OS and language particulars, mutex requests) in the compiled byte code.
Semaphores are implemented at the operating system and/or hardware level, and higher-level languages bind to them. At the OS level, they are "guaranteed" to be atomic. That is, any program acquiring a semaphore is guaranteed to be the only one doing so at that point in time. If two programs, or two threads within a program, attempt to acquire the lock at the same time, one will go first (and succeed), and the other will go second (and fail).
At this point, the "how do you ensure synchronisation" stops being a problem for the application programmer to worry about, and starts being a problem for the operating system designer and the hardware designer.
The upshot of it is, as an application coder, you can safely assume that "lock(objLock)" will be an atomic call no matter how many CPUs you plug into your system.
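To make that atomicity concrete, here is a minimal sketch of a test-and-set spinlock built on Interlocked.CompareExchange, which performs the "check it's free" and "mark it held" steps as a single indivisible operation. The class and field names are illustrative, not a real API, and real code should prefer lock or System.Threading.SpinLock:
using System.Threading;

class SimpleSpinLock
{
    private int _state; // 0 = free, 1 = held

    public void Enter()
    {
        // CompareExchange writes 1 only if the current value is 0,
        // and returns the old value, all in one atomic hardware operation.
        while (Interlocked.CompareExchange(ref _state, 1, 0) != 0)
        {
            Thread.SpinWait(1); // lock was held; spin briefly and retry
        }
    }

    public void Exit()
    {
        Volatile.Write(ref _state, 0); // publish the release to other cores
    }
}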

Your concern is precisely why we need a special mechanism like lock and cannot simply use a boolean flag.
The solution to your 'simultaneous' problem is in the algorithm that lock (which calls Monitor.Enter()) uses. It involves memory barriers and knowledge of the very low-level memory mechanics to ensure that no two threads can acquire the lock at the same time.
Note: I'm talking about .NET only, not Java.
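For reference, a lock statement is roughly equivalent to the following Monitor-based expansion (a sketch; the exact code the compiler emits varies by version):
using System.Threading;

class LockExpansionSketch
{
    private static readonly object objLock = new object();
    private static int i;

    public void Increment()
    {
        // Roughly what the compiler emits for: lock (objLock) { i++; }
        bool lockTaken = false;
        try
        {
            Monitor.Enter(objLock, ref lockTaken); // atomic acquire, or block
            i++;
        }
        finally
        {
            if (lockTaken)
                Monitor.Exit(objLock); // release; the write becomes visible to other threads
        }
    }
}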

Related

Is it possible for two lock() statements to be executed at the same time on two CPU cores? Do CPU cores tick at the same time?

I searched for an answer and I know at a high level how to use lock etc. in a multithreading environment. This question has bugged me for a long time and I think I am not the only one. TL;DR at the end.
In my case I want to prevent a method that gets called from multiple threads from being executed while it is already being executed by another thread.
Now a normal lock scenario would look like this in C#:
static readonly object _locker = new object();
private static int _counter;

public void Increase()
{
    lock (_locker)
    {
        _counter++; //Do more than this here
    }
}
As I understand it, the object _locker acts like a bool which indicates whether the method is currently being executed. If the method is "free", set it to locked, execute the method, and free it afterwards. If the method is locked, wait until it is unlocked, then lock, execute, and unlock.
Side question 1: Does calling this method repeatedly guarantee queue-like behavior? Ignoring the fact that the blocking in the parent thread could cause problems. Imagine Increase() is the last call in the parent thread.
Side question 2: Using an object in a boolean way feels odd. Does every object contain an "is-being-used" flag, and is a "raw" object just being used to contain this flag? Why not a Boolean?
Side question 3: How can lock() modify a readonly field?
Functionally I could write it like this also:
static Boolean _locker = false;
private static int _counter;

public void Increase()
{
    while (_locker) //Waits for lock==0
    {
    }
    _locker = true; //Sets lock=1
    _counter++; //Do more than this here
    _locker = false; //Sets lock=0
}
While the lock example looks sophisticated and safe, the second one just feels wrong, and somehow rings alarm bells in my head.
Is it possible for this method to be executed at the exact same CPU cycle by two cores simultaneously?
I know this is "but sometimes" taken to the extreme. I believe an OS scheduler does split threads from one application across multiple cores, so why shouldn't the assembly instruction "load the value of _locker for comparison" be executed on two cores at the same time? Even if the method is entered one cycle apart, the "read for comparison" and "write true to _locker" would be executed at the same time.
This doesn't even take into account that one line of C# could/will translate to multiple assembly instructions, and a thread could be interrupted after confirming _locker == false but before writing _locker = true. Because one line of C# can result in many assembly instructions, couldn't even the lock() itself be interrupted?
Obviously these problems are somehow solved or avoided; I would really appreciate an explanation of where my thought process is wrong or what I am missing.
TL;DR Can a lock() statement be executed at the exact same time by two CPU cores? I can't see how software could avoid this without a big performance impact.
Yes, two cores can take two different locks at the same time. The atomic RMW operation only needs a "cache lock", not a global bus lock, on modern CPUs. e.g. this test code (on Godbolt) is C++ code that compiles to a loop that just repeats an xchg [rdi], ecx, with each thread using a different std::atomic<int> object in a different cache line. The total runtime of the program on my i7-6700k is 463ms whether it runs on 1 or 4 threads, so that rules out any kind of system-wide bus lock, confirming that the CPU just uses a MESI cache-lock within the core doing the RMW to make sure it's atomic without disturbing operations of other cores. Uncontended locks scale perfectly when each thread is only locking/unlocking its own lock repeatedly.
Taking a lock that was last released by another core will stall this one for maybe hundreds of clock cycles (40 to 70 nanoseconds is a typical inter-core latency) for the RFO (Read For Ownership) to complete and get exclusive ownership of the cache line, but won't have to retry or anything. Atomic RMW involves a memory barrier (on x86), so memory operations after the lock can't even get started, so the CPU core may be stalled for a while. There is significant cost here, compared to normal loads/stores, which out-of-order exec can't hide as well as some other things.
No, two cores can't take the same lock at the same time (see note 1); that's the whole point of a mutex. Correctly-implemented ones don't have the same bug as your example, which spin-waits and then separately stores a true.
(Note 1: There are counted locks / semaphores that you can use to allow up to n threads into a critical section, for some fixed n, where the resource management problem you want to solve is something other than simple mutual exclusion. But you're only talking about mutexes.)
The critical operation in taking a lock is an atomic RMW, for example x86 xchg [rcx], eax or lock cmpxchg [rcx], edx, that stores a 1 (true) and as part of the same operation checks what the old value was. (Can num++ be atomic for 'int num'?). In C++, that would mean using std::atomic<bool> lock; / old = lock.exchange(true); In C#, you have Interlocked.Exchange(). That closes the race window your attempt contained, where two threads could exit the while(_locker){} loop and then both blindly store _locker = true.
Also note that rolling your own spin-loop has problems if you don't use volatile or Volatile.Read() to stop the compiler from assuming that no other threads are writing a variable you're reading/writing. (Without volatile, while(foo){} can optimize into if(!foo) infinite_loop{} by hoisting the apparently loop-invariant load out of the loop).
(The other interesting part of implementing a lock is what to do if it's not available the first time you try. e.g. how long you keep spinning (and if so exactly how, e.g. the x86 pause instruction between read-only checks), using CPU time while waiting, before falling back to making a system call to give up the CPU to another thread or process, and have the OS wake you back up when the lock is or might be available again. But that's all performance tuning; actually taking the lock revolves around an atomic RMW.)
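Putting those pieces together, here is a sketch of what a corrected version of the question's spin-wait could look like in C#; it is illustrative only (real code should just use lock or System.Threading.SpinLock):
using System.Threading;

static class SpinLockSketch
{
    private static int _locker; // 0 = free, 1 = held

    public static void Enter()
    {
        // Atomic RMW: store 1 and learn the old value in one indivisible step.
        while (Interlocked.Exchange(ref _locker, 1) == 1)
        {
            // Lock was already held: spin with read-only checks until it looks
            // free, then retry the RMW. Volatile.Read stops the JIT from
            // hoisting this apparently loop-invariant load out of the loop.
            while (Volatile.Read(ref _locker) == 1)
            {
                Thread.SpinWait(1);
            }
        }
    }

    public static void Exit()
    {
        Volatile.Write(ref _locker, 0);
    }
}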
Of course, if you're going to do any rolling-your-own, make the increment itself a lock-free atomic RMW with Interlocked.Increment(ref counter);, as per the example in MS's docs.
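A minimal sketch of that approach, assuming the counter is the only shared state:
using System.Threading;

class Counter
{
    private static int _counter;

    public void Increase()
    {
        // Lock-free atomic increment; no lock object needed for this case.
        Interlocked.Increment(ref _counter);
    }
}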
Does every object contain an "is-being-used" flag, and is a "raw" object just being used to contain this flag? Why not a Boolean?
We know from object sizes that C# doesn't do that. Probably you should just use lock (counter){ counter++; } instead of inventing a separate lock object. Using a dummy object would make sense if you didn't have an existing object you wanted to manage, but instead some more abstract resource like calling into some function. (Correct me if I'm wrong, I don't use C#; I'm just here for the cpu-architecture and assembly tags. Does lock() require an object, not a primitive type like int?)
I'd guess that they instead do what normal C++ implementations of std::atomic<T> do for objects too large to be lock-free: a hash table of actual mutexes or spinlocks, indexed by C# object address. (Where is the lock for a std::atomic?)
Even if that guess isn't exactly what C# does, that's the kind of mental model that can make sense of this ability to lock anything without reserving space in every object.
This can create extra contention (by using the same mutex for two different objects). It could even introduce deadlocks where there shouldn't have been any, which is something the implementation would have to work around. Perhaps by putting the identity of the object being locked into the mutex, so another thread that indexes the same mutex can see that it's actually being used to lock a different object, and then do something about it... This is perhaps where being a "managed" language comes in; Java apparently does the same thing where you can lock any object without having to define a separate lock.
(C++ std::atomic doesn't have this problem because the mutexes are taken/released inside library functions, with no possibility to try to take two locks at the same time.)
Do CPU cores tick at the same time?
Not necessarily, e.g. Intel "server" chips (most Xeons) let each core control its frequency-multiplier independently. However, even in a multi-socket system, the clock for all cores is normally still derived from the same source, so they can keep their TSC (which counts reference cycles, not core clocks) synced across cores.
Intel "client" chips, like desktop/laptop chips such as i7-6700, actually do use the same clock for all cores. A core is either in low-power sleep (clock halted) or running at the same clock frequency as any other active cores.
None of this has anything to do with locking, or making atomic RMW operations truly atomic, and probably should be split off to a separate Q&A. I'm sure there are plenty of non-x86 examples, but I happen to know how Intel CPUs do things.

Does C# lock affect other processes on the same computer system?

I'm trying to understand whether a process can interfere with another process running on the same hardware. This could come up in a wide range of products, e.g. VMware, or something as simple as running multiple .NET applications.
If one particular process locks repeatedly (using Interlocked operations or the lock keyword, in C# terms), will its intensive use of locking affect the performance of other processes? The setting is a heavily loaded web system, and I am experiencing some situational delays; I would like to determine whether a delay could be caused by a dense loop of locks running in a completely separate Windows kernel thread.
If there is no isolation, will application domains in .NET help me in this case?
Thanks for your answer
No it won't. A lock in C#, and .Net overall, is local to a process. It can't directly affect other processes on the machine.
A lock statement operates on a particular instance of an object. In order for a lock to effect multiple processes they would all have to lock on the same instance of an object. This is not possible since objects are local to a process.
I'm trying to understand whether a process can interfere with another process running on the same hardware
Is there anything that led you to this question, or are you simply imagining some scenario on a whim?
A lock is local to the process running those threads. If you want to synchronize across processes, consider using a named (system-wide) Semaphore or Mutex.
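For example, a sketch of cross-process mutual exclusion with a named Mutex (the name "Global\MyAppLock" is illustrative; a named Semaphore works the same way for counted access):
using System;
using System.Threading;

class CrossProcessLock
{
    public static void DoExclusiveWork()
    {
        // Any process that opens a mutex with the same name shares it.
        using (var mutex = new Mutex(initiallyOwned: false, name: @"Global\MyAppLock"))
        {
            mutex.WaitOne(); // blocks until no other process holds the mutex
            try
            {
                Console.WriteLine("Only one process at a time runs this.");
            }
            finally
            {
                mutex.ReleaseMutex();
            }
        }
    }
}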
will it affect the performance of other processes due to its intensive usage of lock?
Short answer, no. Of course, unfettered and whimsical use of lock will probably lead to some live-lock/deadlock scenarios.
No, that's not going to be a problem... you're only locking within your own worker process. Other tasks have their own processes. While locks are useful for specific tasks, I'd recommend you keep them to a minimum, since they introduce waits into your application.

ReaderWriterLock vs lock{}

Please explain what the main differences are and when I should use which.
The focus is on multi-threaded web applications.
lock allows only one thread to execute the code at the same time. ReaderWriterLock may allow multiple threads to read at the same time or have exclusive access for writing, so it might be more efficient. If you are using .NET 3.5 ReaderWriterLockSlim is even faster. So if your shared resource is being read more often than being written, use ReaderWriterLockSlim. A good example for using it is a file that you read very often (on each request) and you update the contents of the file rarely. So when you read from the file you enter a read lock so that many requests can open it for reading and when you decide to write you enter a write lock. Using a lock on the file will basically mean that you can serve one request at a time.
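A sketch of the file scenario described above, using ReaderWriterLockSlim (the path and class names are illustrative):
using System.IO;
using System.Threading;

class CachedFile
{
    private static readonly ReaderWriterLockSlim _rwLock = new ReaderWriterLockSlim();
    private const string FilePath = "data.txt"; // illustrative path

    public static string ReadContents()
    {
        _rwLock.EnterReadLock(); // many readers may hold this at once
        try { return File.ReadAllText(FilePath); }
        finally { _rwLock.ExitReadLock(); }
    }

    public static void UpdateContents(string text)
    {
        _rwLock.EnterWriteLock(); // exclusive: waits for all readers to exit
        try { File.WriteAllText(FilePath, text); }
        finally { _rwLock.ExitWriteLock(); }
    }
}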
Consider using ReaderWriterLock if you have lots of threads that only need to read the data, these threads are getting blocked waiting for the lock, and you don't often need to change the data.
However ReaderWriterLock may block a thread that is waiting to write for a long time.
Therefore only use ReaderWriterLock after you have confirmed you get high contention for the lock in “real life” and you have confirmed you can’t redesign your locking design to reduce how long the lock is held for.
Also consider whether you could instead store the shared data in a database and let it take care of all the locking; this is a lot less likely to give you a hard time tracking down bugs, provided a database is fast enough for your application.
In some cases you may also be able to use the ASP.NET cache to handle shared data, and just remove the item from the cache when the data changes. The next read can put a fresh copy in the cache.
Remember: "The best kind of locking is the locking you don't need (i.e. don't share data between threads)."
Monitor and the underlying "syncblock" that can be associated with any reference object—the underlying mechanism under C#'s lock—support exclusive execution. Only one thread can ever have the lock. This is simple and efficient.
ReaderWriterLock (or, in V3.5, the better ReaderWriterLockSlim) provide a more complex model. Avoid unless you know it will be more efficient (i.e. have performance measurements to support yourself).
The best kind of locking is the locking you don't need (i.e. don't share data between threads).
ReaderWriterLock allows you to have multiple threads hold the ReadLock at the same time... so that your shared data can be consumed by many threads at once. As soon as a WriteLock is requested no more ReadLocks are granted and the code waiting for the WriteLock is blocked until all the threads with ReadLocks have released them.
The WriteLock can only ever be held by one thread, allowing your 'data updates' to appear atomic from the point of view of the consuming parts of your code.
The lock statement, on the other hand, only allows one thread to enter at a time, with no allowance for threads that are simply trying to consume the shared data.
ReaderWriterLockSlim is a newer, more performant version of ReaderWriterLock, with better support for recursion and the ability for a thread to move smoothly from what is essentially a read lock to the write lock (UpgradeableReadLock), as the sketch below shows.
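A sketch of that upgrade pattern (class and field names are illustrative):
using System.Threading;

class UpgradeSketch
{
    private static readonly ReaderWriterLockSlim _rwLock = new ReaderWriterLockSlim();
    private static int _sharedValue;

    public static void EnsureAtLeast(int minimum)
    {
        _rwLock.EnterUpgradeableReadLock(); // read access, plus the right to upgrade
        try
        {
            if (_sharedValue < minimum)
            {
                _rwLock.EnterWriteLock(); // upgrade to exclusive access
                try { _sharedValue = minimum; }
                finally { _rwLock.ExitWriteLock(); }
            }
        }
        finally
        {
            _rwLock.ExitUpgradeableReadLock();
        }
    }
}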
ReaderWriterLock/Slim is specifically designed to help you efficiently lock in a multiple consumer/ single producer scenario. Doing so with the lock statement is possible, but not efficient. RWL/S gets the upper hand by being able to aggressively spinlock to acquire the lock. That also helps you avoid lock convoys, a problem with the lock statement where a thread relinquishes its thread quantum when it cannot acquire the lock, making it fall behind because it won't be rescheduled for a while.
It is true that ReaderWriterLockSlim is FASTER than ReaderWriterLock. But the memory consumption of ReaderWriterLockSlim is outright outrageous. Try attaching a memory profiler and see for yourself. I would pick ReaderWriterLock any day over ReaderWriterLockSlim.
I would suggest looking through http://www.albahari.com/threading/part4.aspx#_Reader_Writer_Locks. It talks about ReaderWriterLockSlim (which you want to use instead of ReaderWriterLock).

When should each thread synchronization object be used?

Under what circumstances should each of the following synchronization objects be used?
ReaderWriter lock
Semaphore
Mutex
Since wait() will return once for each time post() is called, semaphores are a basic producer-consumer model - the simplest form of inter-thread message except maybe signals. They are used so one thread can tell another thread that something has happened that it's interested in (and how many times), and for managing access to resources which can have at most a fixed finite number of users. They offer ordering guarantees needed for multi-threaded code.
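As a C# illustration of that producer-consumer pattern (a sketch using SemaphoreSlim; the names are illustrative):
using System.Collections.Concurrent;
using System.Threading;

class ProducerConsumerSketch
{
    private static readonly ConcurrentQueue<int> _queue = new ConcurrentQueue<int>();
    private static readonly SemaphoreSlim _itemsAvailable = new SemaphoreSlim(0);

    public static void Produce(int item)
    {
        _queue.Enqueue(item);
        _itemsAvailable.Release(); // "post": wake one waiting consumer
    }

    public static int Consume()
    {
        _itemsAvailable.Wait(); // "wait": blocks until an item has been posted
        _queue.TryDequeue(out int item); // guaranteed to find an item after Wait
        return item;
    }
}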
Mutexes do what they say on the tin - "mutual exclusion". They ensure that the right to access some resource is "held" by only one thread at a time. This gives guarantees of atomicity and ordering needed for multi-threaded code. On most OSes, they also offer reasonably sophisticated waiter behaviour, in particular to avoid priority inversion.
Note that a semaphore can easily be used to implement mutual exclusion, but that because a semaphore does not have an "owner thread", you don't get priority inversion avoidance with semaphores. So they are not suitable for all uses which require a "lock".
ReaderWriter locks are an optimisation over mutexes, in cases where you will have a lot of contention, most accesses are read-only, and simultaneous reads are permissible for the data structure being protected. In such cases, exclusion is required only when a writer is involved - readers don't need to be excluded from each other. To promote a reader to writer all other readers must finish (or abort and start waiting to retry if they also wish to become writers) before the writer lock is acquired. ReaderWriter locks are likely to be slower in cases where they aren't faster, due to the additional book-keeping they do over mutexes.
Condition variables are for allowing threads to wait on certain facts or combinations of facts being true, where the condition in question is more complex than just "it has been poked" as for semaphores, or "nobody else is using it" for mutexes and the writer part of reader-writer locks, or "no writers are using it" for the reader part of reader-writer locks. They are also used where the triggering condition is different for different waiting threads, but depends on some or all of the same state (memory locations or whatever).
Spin locks are for when you will be waiting a very short period of time (like a few cycles) on one processor or core, while another core (or piece of hardware such as an I/O bus) simultaneously does some work that you care about. In some cases they give a performance enhancement over other primitives such as semaphores or interrupts, but must be used with extreme care (since lock-free algorithms are difficult in modern memory models) and only when proven necessary (since bright ideas to avoid system primitives are often premature optimisation).
Btw, these answers aren't C# specific (hence for example the comment about "most OSes"). Richard makes the excellent point that in C# you should be using plain old locks where appropriate. I believe Monitors are a mutex/condition variable pair rolled into one object.
I would say each of them can be "the best" - depends on the use case ;-)
Simple answer: almost never.
The best type of locking is to not need a lock (no shared mutable state).
If you do need a lock, try to use a Monitor (via a lock statement), unless you have specific needs for something different (in which case see Onebyone's answer).
Additionally, prefer ReaderWriterLockSlim to ReaderWriterLock (except in the extremely rare case of requiring the latter's fairness).

lock keyword in C#

I understand the main function of the lock keyword from MSDN
lock Statement (C# Reference)
The lock keyword marks a statement block as a critical section by obtaining the mutual-exclusion lock for a given object, executing a statement, and then releasing the lock.
When should the lock be used?
For instance it makes sense with multi-threaded applications because it protects the data. But is it necessary when the application does not spin off any other threads?
Are there performance issues with using lock?
I have just inherited an application that uses lock everywhere, and it is single-threaded. I want to know whether I should leave them in; are they even necessary?
Please note this is more of a general knowledge question; the application's speed is fine. I want to know whether this is a good design pattern to follow in the future, or whether it should be avoided unless absolutely needed.
When should the lock be used?
A lock should be used to protect shared resources in multithreaded code. Not for anything else.
But is it necessary when the application does not spin off any other threads?
Absolutely not. It's just a time waster. However, do be sure that you're not implicitly using system threads. For example, if you use asynchronous I/O you may receive callbacks from a random thread, not your original thread.
Is there performance issues with using lock?
Yes. They're not very big in a single-threaded application, but why make calls you don't need?
...if that is a good design pattern to follow in the future[?]
Locking everything willy-nilly is a terrible design pattern. If your code is cluttered with random locking and then you do decide to use a background thread for some work, you're likely to run into deadlocks. Sharing a resource between multiple threads requires careful design, and the more you can isolate the tricky part, the better.
All the answers here seem right: locks' usefulness is to block threads from accessing locked code concurrently. However, there are many subtleties in this field, one of which is that locked blocks of code are automatically marked as critical regions by the Common Language Runtime.
The effect of code being marked as critical is that, if the region cannot be executed in its entirety, the runtime may consider your entire Application Domain potentially jeopardized and therefore unload it from memory. To quote MSDN:
For example, consider a task that attempts to allocate memory while holding a lock. If the memory allocation fails, aborting the current task is not sufficient to ensure stability of the AppDomain, because there can be other tasks in the domain waiting for the same lock. If the current task is terminated, other tasks could be deadlocked.
Therefore, even though your application is single-threaded, this may be a hazard for you. Consider that one method in a locked block throws an exception that is eventually not handled within the block. Even if the exception is dealt with as it bubbles up through the call stack, your critical region of code didn't finish normally. And who knows how the CLR will react?
For more info, read this article on the perils of Thread.Abort().
Bear in mind that there might be reasons why your application is not as single-threaded as you think. Async I/O in .NET may well call-back on a pool thread, for example, as do some of the various timer classes (not the Windows Forms Timer, though).
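A quick way to see such hidden threads in action (a sketch using System.Threading.Timer):
using System;
using System.Threading;

class HiddenThreadsDemo
{
    static void Main()
    {
        Console.WriteLine($"Main thread: {Thread.CurrentThread.ManagedThreadId}");

        // The callback fires on a thread-pool thread, even though this
        // program never explicitly creates a thread.
        using (var timer = new Timer(
            _ => Console.WriteLine($"Callback thread: {Thread.CurrentThread.ManagedThreadId}"),
            null, dueTime: 0, period: 500))
        {
            Thread.Sleep(2000); // let a few callbacks fire
        }
    }
}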
Generally speaking, if your application is single-threaded, you're not going to get much use out of the lock statement. Not knowing your application exactly, I don't know if they're useful or not - but I suspect not. Further, if your application is using lock everywhere, I don't know that I would feel all that confident about it working in a multi-threaded environment anyway - did the original developer actually know how to develop multi-threaded code, or did they just add lock statements everywhere in the vague hope that that would do the trick?
lock should be used around code that modifies shared state, state that is modified concurrently by other threads, and those other threads must take the same lock.
A lock is actually a memory access serializer, the threads (that take the lock) will wait on the lock to enter until the current thread exits the lock, so memory access is serialized.
To answer your question: lock is not needed in a single-threaded application, and it does have performance side effects. An uncontended Monitor lock is a cheap user-mode operation, but a contended lock falls back to kernel synchronization objects, and that escalation creates a transition from user mode to kernel mode.
If you're interested in multithreading performance a good place to start is MSDN threading guidelines
You can have performance issues with locking variables, but normally, you'd construct your code to minimize the lengths of time that are spent inside a 'locked' block of code.
As far as removing the locks goes, it'll depend on what exactly the code is doing. Even though it's single-threaded, if your object is implemented as a Singleton, it's possible that you'll have multiple clients using an instance of it (in memory, on a server) at the same time.
Yes, there will be some performance penalty when using lock, but it is generally negligible enough not to matter.
Using locks (or any other mutual-exclusion statement or construct) is generally only needed in multi-threaded scenarios where multiple threads (either of your own making or from your caller) have the opportunity to interact with the object and change the underlying state or data maintained. For example, if you have a collection that can be accessed by multiple threads you don't want one thread changing the contents of that collection by removing an item while another thread is trying to read it.
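For example, a sketch of guarding a shared collection so that a removal can't race a read (the names are illustrative):
using System.Collections.Generic;

class SharedCollection
{
    private readonly List<int> _items = new List<int>();
    private readonly object _sync = new object();

    public void Add(int item)
    {
        lock (_sync) { _items.Add(item); }
    }

    public bool TryTakeFirst(out int item)
    {
        lock (_sync)
        {
            if (_items.Count == 0) { item = 0; return false; }
            item = _items[0];
            _items.RemoveAt(0); // read and remove happen under one lock
            return true;
        }
    }
}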
lock(token) is only used to mark one or more blocks of code that should not run simultaneously on multiple threads. If your application is single-threaded, it's protecting against a condition that can't exist.
And locking does incur a performance hit, adding instructions to check for simultaneous access before code is executed. It should only be used where necessary.
See the question about 'Mutex' in C#. And then look at these two questions regarding use of the 'lock(Object)' statement specifically.
There is no point in having locks in the app if there is only one thread and yes, it is a performance hit although it does take a fair number of calls for that hit to stack up into something significant.
