I need to write a class that holds a Dictionary. The Dictionary will be accessed by multiple threads.
Each access will be very short.
I expect:
Every minute, an access that adds or removes entries.
Every 2 seconds, I need to create a copy of that dictionary to run checks over it (and then e.g. call the DB).
Multiple times per second, I'm updating a field of the value for one of the keys. (The value is a struct.) The same key will not be accessed concurrently.
Which locking strategy would you choose and why?
My first shot is to use ReaderWriterLockSlim. But after reading that it's at least two times slower than Monitor, I'm not so sure anymore, since every time I access the dict I'm only going to hold the lock very briefly.
Thanks in advance,
Martin
Given that the most frequent operation is writing and you never need multiple concurrent readers as far as I can see, I would just use a normal lock. I can't see that ReaderWriterLockSlim would help you, and it will certainly give you more complex code.
EDIT: One possible optimization is to have the writing threads just add to a list of changes. The reading thread would then only need to lock that list while it applies the changes to the underlying dictionary, before processing the dictionary. Assuming the dictionary is very large but the list of changes is relatively small, that means the writing threads will be blocked for a lot less time.
In fact, you could use something like a Queue instead of a list, and have the reading thread dequeue small batches and yield, reducing the latency of the writing threads even further. So if there were 100 changes to process, the reading thread might take 10 of them, apply them, and then yield so that any writing threads blocked waiting to add to the queue could get their turn.
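A rough sketch of that idea (the class and method names are just illustrative, and it assumes the underlying dictionary is only ever touched by the reading thread):

using System.Collections.Generic;
using System.Threading;

class ChangeBufferedDictionary
{
    private readonly object _queueLock = new object();
    private readonly Queue<KeyValuePair<string, int>> _pending =
        new Queue<KeyValuePair<string, int>>();
    private readonly Dictionary<string, int> _dict = new Dictionary<string, int>();

    // Called by the writing threads: a very short lock, just an enqueue.
    public void Update(string key, int value)
    {
        lock (_queueLock)
        {
            _pending.Enqueue(new KeyValuePair<string, int>(key, value));
        }
    }

    // Called by the reading thread every 2 seconds.
    public Dictionary<string, int> DrainAndSnapshot(int batchSize)
    {
        while (true)
        {
            var batch = new List<KeyValuePair<string, int>>(batchSize);
            lock (_queueLock)
            {
                while (_pending.Count > 0 && batch.Count < batchSize)
                    batch.Add(_pending.Dequeue());
            }
            if (batch.Count == 0)
                break;

            // Apply outside the lock: only this thread reads _dict.
            foreach (var change in batch)
                _dict[change.Key] = change.Value;

            Thread.Yield(); // let any blocked writers get their turn
        }
        return new Dictionary<string, int>(_dict); // the copy to run checks over
    }
}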
There are ever more complex solutions to this, depending on how critical the performance is, particularly in terms of latency - but I'd start off with just a lock around the dictionary.
I searched for an answer, and I know at a high level how to use lock etc. in a multithreaded environment. This question has bugged me for a long time, and I think I am not the only one. TL;DR at the end.
In my case I want to prevent a method which gets called from multiple threads to be executed while it is being called by another thread.
Now a normal lock scenario would look like this in C#:
static readonly object _locker = new object();
private static int _counter;

public void Increase()
{
    lock (_locker)
    {
        _counter++; // Do more than this here
    }
}
As I understand it, the object _locker acts like a bool which indicates whether the method is currently being executed. If the method is "free", lock it, execute the method, and free it afterwards. If the method is locked, wait until it is unlocked, then lock, execute and unlock.
Side question 1: Does calling this method repeatedly guarantee queue-like behavior? Ignore the fact that blocking in the parent thread could cause problems; imagine Increase() is the last call in the parent thread.
Side question 2: Using an object in a boolean way feels odd. Does every object contain an "is-being-used" flag, with a "raw" object just being used to contain this flag? Why not a Boolean?
Side question 3: How can lock() modify a readonly field?
Functionally I could write it like this also:
static Boolean _locker = false;
private static int _counter;

public void Increase()
{
    while (_locker) // Waits for _locker == false
    {
    }
    _locker = true;  // Takes the "lock"
    _counter++;      // Do more than this here
    _locker = false; // Releases the "lock"
}
While the lock example looks sophisticated and safe, the second one just feels wrong, and somehow rings alarm bells in my head.
Is it possible for this method to be executed at the exact same CPU cycle by two cores simultaneously?
I know this is a "but what if" scenario taken to the extreme. I believe an OS scheduler does split threads from one application across multiple cores, so why shouldn't the assembly instruction "load the value of _locker for comparison" be executed on two cores at the same time? And even if the method is entered one cycle apart, the "read for comparison" on one core and the "write true to _locker" on the other would execute at the same time.
This doesn't even take into account that one line of C# can translate to multiple assembly instructions, so a thread could be interrupted after confirming _locker == false but before writing _locker = true. And since one line of C# can result in many assembly instructions, couldn't even the lock() itself be interrupted?
Obviously these problems are somehow solved or avoided; I would really appreciate an explanation of where my thought process is wrong or what I am missing.
TL;DR: Can a lock() statement be executed at the exact same time by two CPU cores? I can't see how software could avoid this without a big performance impact.
Yes, two cores can take two different locks at the same time. The atomic RMW operation only needs a "cache lock", not a global bus lock, on modern CPUs. e.g. this test code (on Godbolt) is C++ code that compiles to a loop that just repeats an xchg [rdi], ecx, with each thread using a different std::atomic<int> object in a different cache line. The total runtime of the program on my i7-6700k is 463ms whether it runs on 1 or 4 threads, so that rules out any kind of system-wide bus lock, confirming that the CPU just uses a MESI cache-lock within the core doing the RMW to make sure it's atomic without disturbing operations of other cores. Uncontended locks scale perfectly when each thread is only locking/unlocking its own lock repeatedly.
Taking a lock that was last released by another core will stall this one for maybe hundreds of clock cycles (40 to 70 nanoseconds is a typical inter-core latency) for the RFO (Read For Ownership) to complete and get exclusive ownership of the cache line, but won't have to retry or anything. Atomic RMW involves a memory barrier (on x86), so memory operations after the lock can't even get started, so the CPU core may be stalled for a while. There is significant cost here, compared to normal loads/stores, which out-of-order exec can't hide as well as some other things.
No, two cores can't take the same lock at the same time (see note 1); that's the whole point of a mutex. Correctly implemented ones don't have the bug in your example, which spin-waits and then separately stores true.
(Note 1: There are counted locks / semaphores that you can use to allow up to n threads into a critical section, for some fixed n, where the resource-management problem you want to solve is something other than simple mutual exclusion. But you're only talking about mutexes.)
The critical operation in taking a lock is an atomic RMW, for example x86 xchg [rcx], eax or lock cmpxchg [rcx], edx, that stores a 1 (true) and, as part of the same operation, checks what the old value was. (Can num++ be atomic for 'int num'?). In C++, that would mean using std::atomic<bool> lock; / old = lock.exchange(true); in C#, you have Interlocked.Exchange(). That closes the race window your attempt contained, where two threads could exit the while(_locker){} loop and then both blindly store _locker = true.
Also note that rolling your own spin-loop has problems if you don't use volatile or Volatile.Read() to stop the compiler from assuming that no other threads are writing a variable you're reading/writing. (Without volatile, while(foo){} can optimize into if(!foo) infinite_loop{} by hoisting the apparently loop-invariant load out of the loop).
(The other interesting part of implementing a lock is what to do if it's not available the first time you try. e.g. how long you keep spinning (and if so exactly how, e.g. the x86 pause instruction between read-only checks), using CPU time while waiting, before falling back to making a system call to give up the CPU to another thread or process, and have the OS wake you back up when the lock is or might be available again. But that's all performance tuning; actually taking the lock revolves around an atomic RMW.)
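To make the acquire side concrete, here is a toy C# sketch (illustrative only: it shows the atomic RMW plus read-only spinning; a production lock would eventually fall back to the OS, as described above):

using System.Threading;

class ToySpinLock
{
    private int _locked; // 0 = free, 1 = taken

    public void Enter()
    {
        // Atomic RMW: store 1 and get the old value in one indivisible step.
        // If the old value was already 1, another thread holds the lock.
        while (Interlocked.Exchange(ref _locked, 1) == 1)
        {
            // Read-only spin until the lock looks free, then retry the RMW.
            // Volatile.Read stops the JIT from hoisting the load out of the loop.
            while (Volatile.Read(ref _locked) == 1)
            {
                Thread.SpinWait(1); // CPU-friendly pause between checks
            }
        }
    }

    public void Exit()
    {
        Volatile.Write(ref _locked, 0); // release: publish the store
    }
}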
Of course, if you're going to roll your own, make the increment itself a lock-free atomic RMW with Interlocked.Increment(ref _counter);, as per the example in MS's docs.
Does every object contain an "is-being-used" flag, with a "raw" object just being used to contain this flag? Why not a Boolean?
We know from object sizes that C# doesn't do that. Probably you should just use lock (counter){ counter++; } instead of inventing a separate lock object. Using a dummy object would make sense if you didn't have an existing object you wanted to manage, but instead some more abstract resource like calling into some function. (Correct me if I'm wrong, I don't use C#; I'm just here for the cpu-architecture and assembly tags. Does lock() require an object, not a primitive type like int?)
I'd guess that they instead do what normal C++ implementations of std::atomic<T> do for objects too large to be lock-free: a hash table of actual mutexes or spinlocks, indexed by C# object address. Where is the lock for a std::atomic?
Even if that guess isn't exactly what C# does, that's the kind of mental model that can make sense of this ability to lock anything without using reserving space in every object.
This can create extra contention (by using the same mutex for two different objects). It could even introduce deadlocks where there shouldn't have been any, which is something the implementation would have to work around. Perhaps by putting the identity of the object being locked into the mutex, so another thread that indexes the same mutex can see that it's actually being used to lock a different object, and then do something about it... This is perhaps where being a "managed" language comes in; Java apparently does the same thing where you can lock any object without having to define a separate lock.
(C++ std::atomic doesn't have this problem because the mutexes are taken/released inside library functions, with no possibility to try to take two locks at the same time.)
Do CPU cores tick at the same time?
Not necessarily, e.g. Intel "server" chips (most Xeons) let each core control its frequency-multiplier independently. However, even in a multi-socket system, the clock for all cores is normally still derived from the same source, so they can keep their TSC (which counts reference cycles, not core clocks) synced across cores.
Intel "client" chips, like desktop/laptop chips such as i7-6700, actually do use the same clock for all cores. A core is either in low-power sleep (clock halted) or running at the same clock frequency as any other active cores.
None of this has anything to do with locking, or making atomic RMW operations truly atomic, and probably should be split off to a separate Q&A. I'm sure there are plenty of non-x86 examples, but I happen to know how Intel CPUs do things.
Let's say I have a timer (e.g. a System.Timers.Timer), and we know each Elapsed event will get put into the thread pool. If events come rapidly enough, how does the thread pool manage access to shared variables (e.g. a global int counter)? Does the manager use semaphores/locks under the hood?
Or does it do nothing, and simply make a copy of the shared variables at the start of the thread-pool work, so that the last thread to finish sets the final variable value?
Unfortunately I can't really test this, because the order in which the Elapsed events fire is not guaranteed (e.g. using a counter variable is not reliable), as they may fire out of order.
Thanks
You have to manage multi-threaded access to shared variables yourself.
There are many answers on StackOverflow and Google explaining how to do this, search for "thread safety C#".
I've worked on huge projects with many potential threading issues, and the code I write just works. I'm damn good at writing thread safe code these days, as I've already made all of the possible mistakes.
If you are just learning to write thread-safe code, then it's easy to get overwhelmed by the huge amount of information out there. You might find pages that cover eight different types of synchronization primitives. You will find huge discussions on the topic, and only half of them will be helpful.
If you are following the learning curve for the first time, I would recommend that you ignore said noise for now, and instead focus on mastering these two rules first:
Rule 1
If any two threads write to some shared primitive (like a long, or a Dictionary, or a List), put a lock around every access to this shared primitive. Aim for a situation where, by the time the lock exits, the data structure is completely updated. This is the heart of writing thread-safe code: all other rules for threading can be derived from this one.
Example:
// This _dictLock should be initialized once on program startup, and should be global.
static readonly object _dictLock = new object();

// This data structure can be accessed by multiple threads.
public static Dictionary<string, int> dict = new Dictionary<string, int>();

public static void AddHello()
{
    lock (_dictLock)
    {
        if (dict.ContainsKey("Hello") == false)
        {
            dict.Add("Hello", 42);
        }
    } // Lock exits: data structure is now completely 100% updated. Google "atomic access C#".
}
Rule 2
Try not to have locks within locks. This can create deadlocks if the locks are entered in the wrong order, as in the sketch below. If you only lock around the primitives (e.g. dictionary, long, string), then this shouldn't be an issue.
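For example (hypothetical names; two threads take the same two locks in opposite orders):

class DeadlockExample
{
    static readonly object _lockA = new object();
    static readonly object _lockB = new object();

    // Thread 1 runs this: takes A, then B.
    public static void Method1()
    {
        lock (_lockA)
        {
            lock (_lockB) { /* ... */ }
        }
    }

    // Thread 2 runs this: takes B, then A. If each thread acquires its
    // first lock before the other takes its second, both wait forever.
    public static void Method2()
    {
        lock (_lockB)
        {
            lock (_lockA) { /* ... */ }
        }
    }
}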
Guideline 1
If you are just learning, use nothing but lock; see how to use lock. It's difficult to go wrong if you use just this, as the lock is automatically released when the lock block exits. You can graduate to other types of locks, like reader-writer locks, later on. Don't bother with ConcurrentDictionary or Interlocked.Increment yet; focus on getting the basics correct.
Guideline 2
Try to spend as little time in locks as possible. Don't put a lock around a huge block of code; put locks around the smallest possible portions of the code, usually a dictionary or a long. A lock is blindingly fast unless it's contended, so this technique works well for creating thread-safe code that is fast.
Cause of 95% of meaningful threading issues?
In my experience, the single biggest cause of thread-unsafe code is Dictionary. Even ConcurrentDictionary is not immune to this: it needs manual locking to be correct if the access is spread over multiple lines, as the sketch below shows. If you get this right, you will eliminate 95% of the meaningful threading issues in your code.
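A minimal sketch of that multi-line pitfall (the key name is arbitrary):

using System.Collections.Concurrent;

class ConcurrentDictionaryPitfall
{
    static readonly ConcurrentDictionary<string, int> dict =
        new ConcurrentDictionary<string, int>();

    static void NotAtomic()
    {
        // Each call below is individually thread safe, but together they form
        // a read-modify-write spread over two operations: two threads can both
        // read the same old value, and one increment is lost.
        int current;
        if (dict.TryGetValue("counter", out current))
        {
            dict["counter"] = current + 1;
        }
    }

    static void Atomic()
    {
        // AddOrUpdate performs the whole check-then-update as one logical step.
        dict.AddOrUpdate("counter", 1, (key, old) => old + 1);
    }
}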
The thread pool can't magically make your shared mutable variables thread-safe. It has no control over them and it does not even know they exist.
Be aware of the fact that timer ticks can happen concurrently (even at low frequencies) and after the timer has been disposed. You need to perform any synchronization necessary.
The thread pool itself is thread-safe in the sense that it can successfully process concurrent work items (which is kind of the point).
Please explain the main differences, and when I should use which.
The focus is on multi-threaded web applications.
lock allows only one thread to execute the code at a time. ReaderWriterLock may allow multiple threads to read at the same time, or give exclusive access for writing, so it might be more efficient. If you are using .NET 3.5, ReaderWriterLockSlim is even faster. So if your shared resource is read more often than it is written, use ReaderWriterLockSlim. A good example of a use for it is a file that you read very often (on each request) and whose contents you update rarely. When you read from the file you enter a read lock, so that many requests can open it for reading, and when you decide to write you enter a write lock. Using a lock on the file would basically mean that you can serve one request at a time.
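A minimal sketch of that file example (the class and member names are illustrative):

using System.Threading;

class CachedFile
{
    private readonly ReaderWriterLockSlim _rwLock = new ReaderWriterLockSlim();
    private string _contents = "";

    // Many requests can hold the read lock concurrently.
    public string Read()
    {
        _rwLock.EnterReadLock();
        try { return _contents; }
        finally { _rwLock.ExitReadLock(); }
    }

    // Rare updates take exclusive access; new readers wait until it's done.
    public void Write(string newContents)
    {
        _rwLock.EnterWriteLock();
        try { _contents = newContents; }
        finally { _rwLock.ExitWriteLock(); }
    }
}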
Consider using ReaderWriterLock if you have lots of threads that only need to read the data, those threads are getting blocked waiting for the lock, and you don't often need to change the data.
However, ReaderWriterLock may block a thread that is waiting to write for a long time.
Therefore, only use ReaderWriterLock after you have confirmed that you get high contention for the lock in "real life" and that you can't redesign your locking to reduce how long the lock is held.
Also consider whether you could instead store the shared data in a database and let it take care of all the locking; that is a lot less likely to give you a hard time tracking down bugs, if a database is fast enough for your application.
In some cases you may also be able to use the ASP.NET cache to handle shared data, and just remove the item from the cache when the data changes. The next read can put a fresh copy in the cache.
Remember:
"The best kind of locking is the locking you don't need (i.e. don't share data between threads)."
Monitor and the underlying "syncblock" that can be associated with any reference object (the mechanism underlying C#'s lock) support exclusive execution: only one thread can ever hold the lock. This is simple and efficient.
ReaderWriterLock (or, in .NET 3.5, the better ReaderWriterLockSlim) provides a more complex model. Avoid it unless you know it will be more efficient (i.e. you have performance measurements to back that up).
The best kind of locking is the locking you don't need (i.e. don't share data between threads).
ReaderWriterLock allows multiple threads to hold the read lock at the same time, so that your shared data can be consumed by many threads at once. As soon as a write lock is requested, no more read locks are granted, and the code waiting for the write lock is blocked until all the threads holding read locks have released them.
The write lock can only ever be held by one thread, allowing your "data updates" to appear atomic from the point of view of the consuming parts of your code.
A plain lock, on the other hand, only allows one thread to enter at a time, with no allowance for threads that simply want to read the shared data.
ReaderWriterLockSlim is a newer, more performant version of ReaderWriterLock, with better support for recursion and the ability to move a thread smoothly from what is essentially a read lock to the write lock (the upgradeable read lock, sketched below).
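A minimal sketch of that upgrade path (the cache logic is a placeholder, just to show the lock transitions):

using System.Threading;

class RefreshableCache
{
    private readonly ReaderWriterLockSlim _rwLock = new ReaderWriterLockSlim();
    private string _data = "";

    public string GetData()
    {
        // Only one thread at a time may hold the upgradeable lock, but it
        // coexists with ordinary read locks until it actually upgrades.
        _rwLock.EnterUpgradeableReadLock();
        try
        {
            if (_data.Length == 0)            // read phase: readers still allowed
            {
                _rwLock.EnterWriteLock();     // upgrade: waits for readers to drain
                try { _data = "loaded"; }
                finally { _rwLock.ExitWriteLock(); }
            }
            return _data;
        }
        finally { _rwLock.ExitUpgradeableReadLock(); }
    }
}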
ReaderWriterLock/Slim is specifically designed to help you lock efficiently in a multiple-consumer/single-producer scenario. Doing so with the lock statement is possible, but not efficient. RWL/S gets the upper hand by being able to aggressively spin to acquire the lock. That also helps you avoid lock convoys, a problem with the lock statement where a thread relinquishes its quantum when it cannot acquire the lock, falling behind because it won't be rescheduled for a while.
It is true that ReaderWriterLockSlim is faster than ReaderWriterLock, but its memory consumption is outright outrageous. Try attaching a memory profiler and see for yourself. I would pick ReaderWriterLock any day over ReaderWriterLockSlim.
I would suggest looking through http://www.albahari.com/threading/part4.aspx#_Reader_Writer_Locks. It talks about ReaderWriterLockSlim (which you want to use instead of ReaderWriterLock).
I have used the generic queue in the C# collections, and everyone says that it is better to use the System.Collections.Generic.Queue object because of thread safety.
Please advise on the right decision about using the Queue object, and explain how it is thread safe.
"Thread safe" is a bit of an unfortunate term because it doesn't really have a solid definition. Basically it means that certain operations on the object are guaranteed to behave sensibly when the object is being operated on via multiple threads.
Consider the simplest example: a counter. Suppose you have two threads that are incrementing a counter. If the sequence of events goes:
1. Thread one reads from counter, gets zero.
2. Thread two reads from counter, gets zero.
3. Thread one increments zero, writes one to counter.
4. Thread two increments zero, writes one to counter.
Then notice how the counter has "lost" one of the increments. Simple increment operations on counters are not threadsafe; to make them threadsafe you can use locks, or Interlocked.Increment, as in the sketch below.
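A minimal sketch of both versions (names are illustrative):

using System.Threading;

class Counter
{
    private static int _counter;

    // NOT threadsafe: _counter++ is a separate read, add and write, so two
    // threads can both read the same old value and one increment is lost.
    public static void UnsafeIncrement()
    {
        _counter++;
    }

    // Threadsafe: the whole read-modify-write is one atomic operation.
    public static void SafeIncrement()
    {
        Interlocked.Increment(ref _counter);
    }
}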
Similarly with queues. Non-threadsafe queues can "lose" enqueues the same way that non-threadsafe counters can lose increments. Worse, non-threadsafe queues can even crash or produce crazy results if you use them improperly in a multi-threaded scenario.
The difficulty with "thread safe" is that it is not clearly defined. Does it simply mean "will not crash"? Does it mean that sensible results will be produced? For example, suppose you have a "threadsafe" collection. Is this code correct?
if (!collection.IsEmpty) Console.WriteLine(collection[0]);
No. Even if the collection is "threadsafe", that doesn't mean that this code is correct; another thread could have made the collection empty after the check but before the WriteLine, and therefore this code could crash even if the object is allegedly "threadsafe". Actually determining that every relevant combination of operations is threadsafe is an extremely difficult problem.
Now to come to your actual situation: anyone who is telling you "you should use the Queue class, it is better because it is threadsafe" probably does not have a clear idea of what they're talking about. First off, Queue is not threadsafe. Second, whether Queue is threadsafe or not is completely irrelevant if you are only using the object on a single thread! If you have a collection that is going to be accessed on multiple threads, then, as I indicated in my example above, you have an extremely difficult problem to solve, regardless of whether the collection itself is "threadsafe". You have to determine that every combination of operations you perform on the collection is also threadsafe. This is a very difficult problem, and if it is one you face, then you should use the services of an expert on this difficult topic.
A type that is thread safe can be safely accessed from multiple threads without concern for concurrency. This usually means that the type is read-only.
Interestingly enough, Queue<T> is not thread safe - it can support concurrent reads as long as the queue isn't modified but that isn't the same thing as thread safety.
In order to think about thread safety consider what would happen if two threads were accessing a Queue<T> and a third thread came along and began either adding to or removing from this Queue<T>. Since this type does not restrict this behavior it is not thread safe.
In dealing with multithreading, you usually have to deal with concurrency issues. The term "concurrency issues" refers to issues that are specifically introduced by the possibility of interleaving instructions from two different execution contexts on a resource shared by both. Here, in terms of thread safety, the execution contexts are two threads within a process; however, in related subjects they might be processes.
Thread safety measures are put in place primarily to regain determinism with regard to what happens when threads context-switch (which is otherwise controlled by the OS and thus essentially nondeterministic in user-level programs): to prevent tasks from being left half-finished, or two contexts from writing to the same location in memory one after the other. Most measures use hardware-supported test-and-set instructions and the like, as well as software-level synchronization constructs, to force all other execution contexts to stay away from a data type while another one is doing work that should not be interrupted.
Usually, objects that are read-only are thread-safe. Many objects that are not read-only can have read-only data accesses occur from multiple threads without issue, as long as the object is not modified in the middle; but that is not thread safety. Thread safety is when everything is done to a data type to prevent any modifications to it by one thread from causing data corruption or deadlock, even when dealing with many concurrent reads and writes.
I was googling for some advice about this and I found some links. The most obvious was this one, but in the end what I'm wondering is how well my code is implemented.
I have basically two classes. One is the Converter and the other is ConverterThread.
I create an instance of this Converter class, which has a property ThreadNumber that tells me how many threads should be run at the same time (this is read from the user). This application will be used on multi-CPU systems (physically, e.g. 8 CPUs), so the assumption is that this will speed up the import.
The Converter instance reads a file that can range from 100 MB to 800 MB, and each line of this file is a tab-delimited record that is imported to another destination, like a database.
The ConverterThread class simply runs inside the thread (new Thread(ConverterThread.StartThread)) and has event notification, so when its work is done it can notify the Converter class, and then I can sum up the progress of all these threads and notify the user (in the GUI, for example) about how many of these records have been imported and how many bytes have been read.
However, I seem to be having some trouble, because I get random errors about the file not being readable, or the sum of the progress (percentage) going above 100%, which is impossible. I think this happens because the threads are not being well managed, and the information returned by the event is probably malformed (since it "travels" from one thread to another).
Do you have any advice on better practices for implementing threads so I can accomplish this?
Thanks in advance.
I read very large files in some of my own code and, I have to tell you, I am skeptical of any claim that adding threads to a read operation would actually improve the overall read performance. In fact, adding threads might actually reduce performance by causing head seeks. It is highly likely that any file operations of this type would be I/O bound, not CPU bound.
Given that the author of the post you referenced never actually provided the 'real' code, his claims that multiple threads will speed up I/O remain untestable by others. Any attempt to improve hard disk read/write performance by adding threads would most certainly be I/O bound, unless he is doing some serious number crunching between reads, or has stumbled upon some happy coincidence having to do with the disk cache, in which case the performance improvement might be unreproducible on another machine with different hardware characteristics.
Generally, when files of this size are involved, an additional 20% or 30% improvement in performance is not going to matter much, even if it is possible utilizing threads, because such a task would most certainly be considered a background task (not real-time). I use multiple threads for this kind of work, not because it improves read performance on one file, but because multiple files can be processed simultaneously in the background.
Before using threads to do this, I carefully benchmarked the software to see if threads would actually improve overall throughput. The results of the tests (on my development machine) were that using the same number of threads as the number of processor cores produced the maximum possible throughput. But that was processing ONE file per thread.
Multiple threads reading a file at the same time is asking for trouble. I would set up a producer-consumer model, such that the producer reads the lines of the file, perhaps into a buffer, and then hands them out to the consumer threads as they complete their current workload. It does mean you have a blocking point where the lines are handed out, but if processing takes much longer than reading, it shouldn't be that big of a deal. If reading is the slow part, then you really don't need multiple consumers anyway.
You should try to have just one thread read the file, since multiple threads will likely be bound by the I/O anyway. Then you can feed the lines into a thread-safe queue from which multiple threads can dequeue lines to parse.
You won't be able to tell the progress of any one thread, because no thread has a defined amount of work. However, you should be able to track approximate progress by keeping track of how many items (in total) have been added to the queue and how many have been taken out. Obviously, as your file-reader thread puts more lines into the queue, your progress will appear to decrease because more lines are available; but presumably you should be able to fill the queue faster than the workers can process the lines. A sketch of this design follows.
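A sketch of that single-reader, multiple-parser design using BlockingCollection (the class name and the parsing step are placeholders; progress is tracked with a single Interlocked counter to avoid the kind of cross-thread races described in the question):

using System;
using System.Collections.Concurrent;
using System.IO;
using System.Threading;
using System.Threading.Tasks;

class FileImporter
{
    private long _processed; // shared progress counter

    public void Import(string path, int workerCount)
    {
        // Bounded queue: the reader blocks if it gets too far ahead.
        using (var lines = new BlockingCollection<string>(boundedCapacity: 10000))
        {
            // Consumers: parse lines and (hypothetically) write them to the database.
            var workers = new Task[workerCount];
            for (int i = 0; i < workerCount; i++)
            {
                workers[i] = Task.Run(() =>
                {
                    foreach (string line in lines.GetConsumingEnumerable())
                    {
                        // ParseAndImport(line); // placeholder for the real work
                        Interlocked.Increment(ref _processed); // safe progress count
                    }
                });
            }

            // Single producer: one thread reads the file sequentially.
            foreach (string line in File.ReadLines(path))
                lines.Add(line);
            lines.CompleteAdding(); // tells the consumers no more lines are coming

            Task.WaitAll(workers);
        }
        Console.WriteLine("Imported {0} records.", Interlocked.Read(ref _processed));
    }
}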