I have used the generic queue from the C# collections, and everyone says that it is better to use an object of System.Collections.Generic.Queue because of thread safety.
Please advise on whether using a Queue object is the right decision, and in what way it is thread safe.
"Thread safe" is a bit of an unfortunate term because it doesn't really have a solid definition. Basically it means that certain operations on the object are guaranteed to behave sensibly when the object is being operated on via multiple threads.
Consider the simplest example: a counter. Suppose you have two threads that are incrementing a counter. If the sequence of events goes:
Thread one reads from counter, gets zero.
Thread two reads from counter, gets zero.
Thread one increments zero, writes one to counter.
Thread two increments zero, writes one to counter.
Then notice how the counter has "lost" one of the increments. Simple increment operations on counters are not threadsafe; to make them threadsafe you can use locks, or Interlocked.Increment.
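To make the lost-increment sequence above concrete, here is a minimal sketch. The class and method names (CounterDemo, Run) and the worker counts are made up for illustration; the only library calls are Parallel.For and Interlocked.Increment.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CounterDemo
{
    // Hypothetical helper: several workers increment two shared counters,
    // one with a plain ++ and one with Interlocked.Increment.
    public static (int UnsafeTotal, int SafeTotal) Run(int workers = 4, int perWorker = 100_000)
    {
        int unsafeCounter = 0;
        int safeCounter = 0;

        Parallel.For(0, workers, _ =>
        {
            for (int i = 0; i < perWorker; i++)
            {
                unsafeCounter++;                        // read, add, write: three steps, not atomic
                Interlocked.Increment(ref safeCounter); // a single atomic read-modify-write
            }
        });

        return (unsafeCounter, safeCounter);
    }
}
```

Calling CounterDemo.Run() always gives a SafeTotal of workers * perWorker. UnsafeTotal can never be larger than that, and on a multi-core machine it will often be smaller, though a particular run is not guaranteed to lose anything.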
Similarly with queues. Not-threadsafe-queues can "lose" enqueues the same way that not-threadsafe counters can lose increments. Worse, not threadsafe queues can even crash or produce crazy results if you use them in a multi-threaded scenario improperly.
The difficulty with "thread safe" is that it is not clearly defined. Does it simply mean "will not crash"? Does it mean that sensible results will be produced? For example, suppose you have a "threadsafe" collection. Is this code correct?
if (!collection.IsEmpty) Console.WriteLine(collection[0]);
No. Even if the collection is "threadsafe", that doesn't mean that this code is correct; another thread could have made the collection empty after the check but before the writeline and therefore this code could crash, even if the object is allegedly "threadsafe". Actually determining that every relevant combination of operations is threadsafe is an extremely difficult problem.
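One way around the check-then-act problem is to use an operation that does the check and the action as a single step. The sketch below uses ConcurrentQueue<T>.TryDequeue for that; CheckThenActDemo and Drain are hypothetical names, and this only fixes this one combination of operations, not every combination you might perform.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class CheckThenActDemo
{
    // Hypothetical helper: drains the queue from several workers and
    // returns how many items were taken in total.
    public static int Drain(ConcurrentQueue<int> queue, int workers = 4)
    {
        int taken = 0;
        Parallel.For(0, workers, _ =>
        {
            // Racy pattern: if (!queue.IsEmpty) { ...use the first item... }
            // Another thread can empty the queue between the check and the use.
            // TryDequeue performs the emptiness check and the removal atomically.
            while (queue.TryDequeue(out int item))
            {
                Interlocked.Increment(ref taken);
            }
        });
        return taken;
    }
}
```

Each item is handed to exactly one worker, so the total taken equals the number of items that were in the queue, and no worker ever touches an item that is no longer there.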
Now to come to your actual situation: anyone who is telling you "you should use the Queue class, it is better because it is threadsafe" probably does not have a clear idea of what they're talking about. First off, Queue is not threadsafe. Second, whether Queue is threadsafe or not is completely irrelevant if you are only using the object on a single thread! If you have a collection that is going to be accessed on multiple threads, then, as I indicated in my example above, you have an extremely difficult problem to solve, regardless of whether the collection itself is "threadsafe". You have to determine that every combination of operations you perform on the collection is also threadsafe. This is a very difficult problem, and if it is one you face, then you should use the services of an expert on this difficult topic.
A type that is thread safe can be safely accessed from multiple threads without concern for concurrency. This usually means that the type is read-only.
Interestingly enough, Queue<T> is not thread safe - it can support concurrent reads as long as the queue isn't modified but that isn't the same thing as thread safety.
In order to think about thread safety consider what would happen if two threads were accessing a Queue<T> and a third thread came along and began either adding to or removing from this Queue<T>. Since this type does not restrict this behavior it is not thread safe.
In dealing with multithreading, you usually have to deal with concurrency issues. The term "concurrency issues" refers to issues that are specifically introduced by the possibility of interleaving instructions from two different execution contexts on a resource shared by both. Here, in terms of thread safety, the execution contexts are two threads within a process; however, in related subjects they might be processes.
Thread safety measures are put in place to achieve two goals primarily. First is to regain determinism with regard to what happens if the threads context-switch (which is otherwise controlled by the OS and thus basically nondeterministic in user-level programs), to prevent certain tasks from being left half-finished or two contexts writing to the same location in memory one after the other. Most measures simply use a little bit of hardware-supported test-and-set instructions and the like, as well as software-level synchronization constructs to force all other execution contexts to stay away from a data type while another one is doing work that should not be interrupted.
Usually, objects that are read-only are thread-safe. Many objects that are not read-only are able to have data accesses (read-only) occur with multiple threads without issue, if the object is not modified in the middle. But this is not thread safety. Thread safety is when all manner of things are done to a data type to prevent any modifications to it by one thread from causing data corruption or deadlock even when dealing with many concurrent reads and writes.
I just found out about the existence of ArrayPool, but its documentation is somewhat lacking.
I'd like to know if Rent(.) and Return(.) are thread-safe.
Edit: looks like I didn't notice the "Thread Safety" part of documentation; but reading some of the comments and answers I was relieved I wasn't the only one that didn't.
Update Comment from ta.speot.is
It literally says on
Thread safety
This class is thread-safe. All members may be used by multiple threads concurrently.
Original
It doesn't say it on learn.microsoft.com, however there are a few references to the fact that it is
Add a new System.Buffers namespace to the BCL for Resource Pooling
The Pool will be lightweight and thread-safe, allowing for fast Rent and Return calls from any thread within the process, along with minimal locking overhead, and 0 heap allocations on most Rent calls (exceptions to this will be called out below in the description of the Rent function).
Pooling large arrays with ArrayPool
Recommended: use the ArrayPool.Shared property, which returns a shared pool instance. It’s thread safe and all you need to remember is that it has a default max array length, equal to 2^20 (1024*1024 = 1 048 576).
its documentation is somewhat lacking.
You can read about thread safety under Thread Safety:
Thread Safety
This class is thread-safe. All members may be used by multiple threads concurrently.
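As a rough illustration of what that statement allows, the sketch below calls Rent and Return on ArrayPool<byte>.Shared from several workers at once. ArrayPoolDemo and RentConcurrently are hypothetical names. Note that the thread safety applies to the pool itself; a rented array is an ordinary array and should only be used by the code that rented it.

```csharp
using System;
using System.Buffers;
using System.Threading;
using System.Threading.Tasks;

public static class ArrayPoolDemo
{
    // Hypothetical helper: rents and returns buffers concurrently and
    // returns the number of completed rent/return cycles.
    public static int RentConcurrently(int tasks = 8, int rentalsPerTask = 1000)
    {
        int completed = 0;
        Parallel.For(0, tasks, _ =>
        {
            for (int i = 0; i < rentalsPerTask; i++)
            {
                // The returned array is at least 1024 long, possibly longer.
                byte[] buffer = ArrayPool<byte>.Shared.Rent(1024);
                try
                {
                    buffer[0] = 1; // use the buffer; it is not cleared for you by default
                }
                finally
                {
                    ArrayPool<byte>.Shared.Return(buffer);
                }
                Interlocked.Increment(ref completed);
            }
        });
        return completed;
    }
}
```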
Let's say I have a timer (e.g. a System.Timers.Timer), and we know each elapsed event will get put into the threadpool. If events come rapidly enough, how does the threadpool manage access to shared variables (e.g. a global int counter)? Does the manager use semaphores/locks under the hood?
Or does it not do anything, and just simply make a copy of shared variables at the start of the threadpool, and the last thread to finish will set the correct variable value?
Unfortunately I can't really test this because the order in which events fire is not guaranteed (e.g. using a counter variable is not reliable) between each elapsed event, as they may be fired out of order.
Thanks
You have to manage multi-threaded access to shared variables yourself.
There are many answers on StackOverflow and Google explaining how to do this, search for "thread safety C#".
I've worked on huge projects with many potential threading issues, and the code I write just works. I'm damn good at writing thread safe code these days, as I've already made all of the possible mistakes.
If you are just learning to write thread safe code, then it's easy to get overwhelmed by the huge amount of information out there. You might find some pages that cover the 8 different types of synchronization primitives. You will find huge discussions on the topic, and only half of them will be helpful.
If you are following the learning curve for the first time, I would recommend that you ignore said noise for now, and instead focus on mastering these two rules first:
Rule 1
If any two threads write to some shared primitive (like a long or a Dictionary or a List), put a lock around the access to this shared primitive. Aim for a situation so that when the lock is finished, the data structure is completely updated. This is the heart of writing thread safe code: all other rules for threading can be derived from this one.
Example:
// Requires: using System.Collections.Generic;
static class SharedState
{
    // _dictLock is created once on program startup and shared by every thread that touches dict.
    static readonly object _dictLock = new object();
    // This data structure can be accessed by multiple threads.
    public static Dictionary<string, int> dict = new Dictionary<string, int>();

    public static void AddHelloOnce()
    {
        lock (_dictLock)
        {
            // The check and the add happen under the same lock, so no other
            // thread can add "Hello" in between.
            if (!dict.ContainsKey("Hello"))
            {
                dict.Add("Hello", 42);
            }
        } // Lock exits: data structure is now completely 100% updated. Google "atomic access C#".
    }
}
Rule 2
Try not to have locks within locks. This can create deadlocks if the locks are entered in the wrong order. If you only lock around the primitives (e.g. dictionary, long, string, etc), then this shouldn't be an issue.
Guideline 1
If you are just learning, use nothing but lock (see how to use lock). It's difficult to go wrong if you just do this, as the lock is automatically released when the lock block exits. You can graduate to other types of locks, like reader-writer locks, later on. Don't bother with ConcurrentDictionary or Interlocked.Increment yet - focus on getting the basics correct.
Guideline 2
Try to spend as little time in locks as possible. Don't put a lock around a huge block of code; put locks around the smallest possible portions of the code, usually a dictionary or a long. A lock is blindingly fast unless it's contended, so this technique seems to work well to create thread safe code that is fast.
Cause of 95% of meaningful threading issues?
In my experience, the single biggest cause of thread-unsafe code is Dictionary. Even ConcurrentDictionary is not immune to this - it needs manual locking to be correct if the access is spread over multiple lines. If you get this right, you will eliminate 95% of meaningful threading issues in your code.
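To illustrate the ConcurrentDictionary point: each individual call is safe, but a read followed by a write on separate lines is not one atomic operation. Besides the manual locking mentioned above, an alternative for simple cases is to fold the whole update into a single call such as AddOrUpdate. The sketch below assumes a simple hit counter; WordCountDemo and CountHits are hypothetical names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class WordCountDemo
{
    // Hypothetical helper: several workers bump the same key and the
    // final count is returned.
    public static int CountHits(int tasks = 4, int hitsPerTask = 10_000)
    {
        var counts = new ConcurrentDictionary<string, int>();
        Parallel.For(0, tasks, _ =>
        {
            for (int i = 0; i < hitsPerTask; i++)
            {
                // Racy multi-line version (each call is safe, the pair is not):
                //   counts.TryGetValue("Hello", out int n);
                //   counts["Hello"] = n + 1;
                // Single-call version: the read and the write are applied atomically.
                counts.AddOrUpdate("Hello", 1, (key, current) => current + 1);
            }
        });
        return counts["Hello"];
    }
}
```

The update delegate may be invoked more than once if another thread changes the value in the meantime, so it should be free of side effects. For updates that span several keys or several collections, a plain lock as in Rule 1 is still the simpler option.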
The thread pool can't magically make your shared mutable variables thread-safe. It has no control over them and it does not even know they exist.
Be aware of the fact that timer ticks can happen concurrently (even at low frequencies) and after the timer has been disposed. You need to perform any synchronization necessary.
The thread pool itself is thread-safe in the sense that it can successfully process concurrent work items (which is kind of the point).
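A small sketch of "perform any synchronization necessary" for the shared counter in the question. Instead of a real timer it queues work items directly with ThreadPool.QueueUserWorkItem so the result is repeatable; PoolCounterDemo and Run are hypothetical names.

```csharp
using System;
using System.Threading;

public static class PoolCounterDemo
{
    // One lock object guards every access to the shared counter.
    private static readonly object _counterLock = new object();

    // Hypothetical helper: queues work items that each increment a shared
    // counter under the lock, waits for all of them, returns the total.
    public static int Run(int workItems = 1000)
    {
        int counter = 0;
        using (var done = new CountdownEvent(workItems))
        {
            for (int i = 0; i < workItems; i++)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    // The pool does nothing to protect counter; the lock does.
                    lock (_counterLock) { counter++; }
                    done.Signal();
                });
            }
            done.Wait(); // block until every work item has signalled
        }
        return counter;
    }
}
```

With the lock in place the total always equals the number of work items, regardless of the order in which the pool runs them.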
A few questions about accessing a local variable from multiple threads at once:
I have multiple threads writing and reading the value of a variable; should I synchronize access to it or not?
The variable is being updated every few seconds from Thread1 and it's being read and written to the database every few seconds from Thread2.
Which problems can occur if I don't hold any logic and don't have any concurrency issues?
Should I use volatile for this?
EDIT:
I would like to emphasize that I don't have any concurrency issues. Here's my specific scenario:
a. My variable's name is pingLatency and it measures ping latency
b. Thread1 is sending a ping to 8.8.8.8 every 10 seconds and writes the latency to pingLatency
c. Thread2 updates a corresponding field with the value of pingLatency every 10 seconds.
d. Thread2 updates the same database row each time.
Now, I'm using this database field to monitor network connectivity. My question is: can there be a situation where the variable is not updated, or where it would throw an exception due to thread safety issues? I want to avoid using lock because it just seems like overkill.
What do you think?
Yes you should synchronize access to it, if it is a primitive type there are methods to do this for you without locks
no comment
not sure what you mean by this... most likely you'll end up inserting the wrong value into the DB
Don't use volatile, per Eric Lippert, it's overly complicated and the semantics are very weird.
Be careful of breaking the memory model. C# by and large follows most other languages in using sequential consistency for data-race-free programs (SC-DRF). Volatile breaks this, so just use locks to prevent a data race.
As for lock it's not as heavy as one might imagine, in most cases the lock won't be contended in the scenario you imagine. So acquiring the lock should be painless in most cases.
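For the pingLatency case, the "methods to do this for you without locks" could look like the sketch below. It assumes the latency is stored as a long in milliseconds; LatencyMonitor, Publish and Read are hypothetical names, and the only library calls are Interlocked.Exchange and Interlocked.Read.

```csharp
using System;
using System.Threading;

public sealed class LatencyMonitor
{
    // Shared between the pinging thread (writer) and the database thread (reader).
    private long _pingLatencyMs;

    // Called by Thread1 after each ping: atomically replaces the stored value.
    public void Publish(long latencyMs) => Interlocked.Exchange(ref _pingLatencyMs, latencyMs);

    // Called by Thread2 before each database update: atomically reads the
    // 64-bit value, so the reader never sees a half-written (torn) value.
    public long Read() => Interlocked.Read(ref _pingLatencyMs);
}
```

This only covers a single value. If Thread2 ever needs several fields that must be consistent with each other, a lock around all of them is the more straightforward choice.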
If you want .NET-managed parallelism, use the built-in facilities such as the Task Parallel Library. It will manage the threads for you, and you can use the built-in thread-safe collections in place of the ordinary ones, for example ConcurrentBag where you would otherwise use an array or list.
If access to your variable is atomic and there are no logical problems you are OK.
according to this you can know if you are using an atomic variable.
Under what circumstances should each of the following synchronization objects be used?
ReaderWriter lock
Semaphore
Mutex
Since wait() will return once for each time post() is called, semaphores are a basic producer-consumer model - the simplest form of inter-thread message except maybe signals. They are used so one thread can tell another thread that something has happened that it's interested in (and how many times), and for managing access to resources which can have at most a fixed finite number of users. They offer ordering guarantees needed for multi-threaded code.
Mutexes do what they say on the tin - "mutual exclusion". They ensure that the right to access some resource is "held" by only one thread at a time. This gives guarantees of atomicity and ordering needed for multi-threaded code. On most OSes, they also offer reasonably sophisticated waiter behaviour, in particular to avoid priority inversion.
Note that a semaphore can easily be used to implement mutual exclusion, but that because a semaphore does not have an "owner thread", you don't get priority inversion avoidance with semaphores. So they are not suitable for all uses which require a "lock".
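Since the rest of these threads are about C#, here is a sketch of the "at most a fixed finite number of users" case using SemaphoreSlim, where Wait plays the role of wait() and Release the role of post(). SemaphoreDemo and Run are hypothetical names, and the Thread.Sleep only stands in for real work.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class SemaphoreDemo
{
    // Hypothetical helper: runs many tasks through a section limited to
    // maxUsers at a time and returns the highest concurrency observed.
    public static int Run(int maxUsers = 3, int tasks = 20)
    {
        int inside = 0, peak = 0;
        object peakLock = new object();
        using (var slots = new SemaphoreSlim(maxUsers, maxUsers))
        {
            var work = new Task[tasks];
            for (int i = 0; i < tasks; i++)
            {
                work[i] = Task.Run(() =>
                {
                    slots.Wait(); // "wait": take a slot, blocking if none are free
                    try
                    {
                        int now = Interlocked.Increment(ref inside);
                        lock (peakLock) { if (now > peak) peak = now; }
                        Thread.Sleep(5); // pretend to use the limited resource
                        Interlocked.Decrement(ref inside);
                    }
                    finally
                    {
                        slots.Release(); // "post": hand the slot back
                    }
                });
            }
            Task.WaitAll(work);
        }
        return peak;
    }
}
```

The returned peak is never above maxUsers. Note that any thread may call Release, which is the "no owner thread" property described above.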
ReaderWriter locks are an optimisation over mutexes, in cases where you will have a lot of contention, most accesses are read-only, and simultaneous reads are permissible for the data structure being protected. In such cases, exclusion is required only when a writer is involved - readers don't need to be excluded from each other. To promote a reader to writer all other readers must finish (or abort and start waiting to retry if they also wish to become writers) before the writer lock is acquired. ReaderWriter locks are likely to be slower in cases where they aren't faster, due to the additional book-keeping they do over mutexes.
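In C#, the read-mostly case described above might be sketched with ReaderWriterLockSlim as follows. Settings, TryGet and Set are hypothetical names; whether this actually beats a plain lock depends on the workload and should be measured.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class Settings
{
    private readonly ReaderWriterLockSlim _rw = new ReaderWriterLockSlim();
    private readonly Dictionary<string, int> _values = new Dictionary<string, int>();

    // Many threads may hold the read lock at the same time.
    public bool TryGet(string key, out int value)
    {
        _rw.EnterReadLock();
        try { return _values.TryGetValue(key, out value); }
        finally { _rw.ExitReadLock(); }
    }

    // The write lock is exclusive: it waits for readers to leave and
    // keeps new readers and writers out until it is released.
    public void Set(string key, int value)
    {
        _rw.EnterWriteLock();
        try { _values[key] = value; }
        finally { _rw.ExitWriteLock(); }
    }
}
```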
Condition variables are for allowing threads to wait on certain facts or combinations of facts being true, where the condition in question is more complex than just "it has been poked" as for semaphores, or "nobody else is using it" for mutexes and the writer part of reader-writer locks, or "no writers are using it" for the reader part of reader-writer locks. They are also used where the triggering condition is different for different waiting threads, but depends on some or all of the same state (memory locations or whatever).
Spin locks are for when you will be waiting a very short period of time (like a few cycles) on one processor or core, while another core (or piece of hardware such as an I/O bus) simultaneously does some work that you care about. In some cases they give a performance enhancement over other primitives such as semaphores or interrupts, but must be used with extreme care (since lock-free algorithms are difficult in modern memory models) and only when proven necessary (since bright ideas to avoid system primitives are often premature optimisation).
Btw, these answers aren't C# specific (hence for example the comment about "most OSes"). Richard makes the excellent point that in C# you should be using plain old locks where appropriate. I believe Monitors are a mutex/condition variable pair rolled into one object.
I would say each of them can be "the best" - depends on the use case ;-)
Simple answer: almost never.
The best type of locking is to not need a lock (no shared mutable state).
If you do need a lock, try and use a Monitor (via a lock statement), unless you have specific needs for something different (in which case see Onebyone's answer).
Additionally, prefer ReaderWriterLockSlim to ReaderWriterLock (except in the extremely rare case of requiring the latter's fairness).
I understand the main function of the lock key word from MSDN
lock Statement (C# Reference)
The lock keyword marks a statement block as a critical section by obtaining the mutual-exclusion lock for a given object, executing a statement, and then releasing the lock.
When should the lock be used?
For instance it makes sense with multi-threaded applications because it protects the data. But is it necessary when the application does not spin off any other threads?
Are there performance issues with using lock?
I have just inherited an application that is using lock everywhere, and it is single threaded and I want to know should I leave them in, are they even necessary?
Please note this is more of a general knowledge question, the application speed is fine, I want to know if that is a good design pattern to follow in the future or should this be avoided unless absolutely needed.
When should the lock be used?
A lock should be used to protect shared resources in multithreaded code. Not for anything else.
But is it necessary when the application does not spin off any other threads?
Absolutely not. It's just a time waster. However do be sure that you're not implicitly using system threads. For example if you use asynchronous I/O you may receive callbacks from a random thread, not your original thread.
Are there performance issues with using lock?
Yes. They're not very big in a single-threaded application, but why make calls you don't need?
...if that is a good design pattern to follow in the future[?]
Locking everything willy-nilly is a terrible design pattern. If your code is cluttered with random locking and then you do decide to use a background thread for some work, you're likely to run into deadlocks. Sharing a resource between multiple threads requires careful design, and the more you can isolate the tricky part, the better.
All the answers here seem right: locks' usefulness is to block threads from accessing locked code concurrently. However, there are many subtleties in this field, one of which is that locked blocks of code are automatically marked as critical regions by the Common Language Runtime.
The effect of code being marked as critical is that, if the entire region cannot be entirely executed, the runtime may consider that your entire Application Domain is potentially jeopardized and, therefore, unload it from memory. To quote MSDN:
For example, consider a task that attempts to allocate memory while holding a lock. If the memory allocation fails, aborting the current task is not sufficient to ensure stability of the AppDomain, because there can be other tasks in the domain waiting for the same lock. If the current task is terminated, other tasks could be deadlocked.
Therefore, even though your application is single-threaded, this may be a hazard for you. Consider that one method in a locked block throws an exception that is eventually not handled within the block. Even if the exception is dealt with as it bubbles up through the call stack, your critical region of code didn't finish normally. And who knows how the CLR will react?
For more info, read this article on the perils of Thread.Abort().
Bear in mind that there might be reasons why your application is not as single-threaded as you think. Async I/O in .NET may well call-back on a pool thread, for example, as do some of the various timer classes (not the Windows Forms Timer, though).
Generally speaking, if your application is single threaded, you're not going to get much use out of the lock statement. Not knowing your application exactly, I don't know if they're useful or not - but I suspect not. Further, if your application is using lock everywhere, I don't know that I would feel all that confident about it working in a multi-threaded environment anyway - did the original developer actually know how to develop multi-threaded code, or did they just add lock statements everywhere in the vague hope that that would do the trick?
lock should be used around the code that modifies shared state, that is, state that is modified by other threads concurrently, and those other threads must take the same lock.
A lock is actually a memory access serializer, the threads (that take the lock) will wait on the lock to enter until the current thread exits the lock, so memory access is serialized.
To answer your question: lock is not needed in a single-threaded application, and it does have a performance cost. Note, though, that a C# lock (a Monitor) is normally acquired in user mode when it is uncontended; a transition to kernel mode is generally only needed when a thread actually has to block and wait for the lock.
If you're interested in multithreading performance a good place to start is MSDN threading guidelines
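To see what "taking the lock" and "exiting the lock" mean in practice, the sketch below spells out roughly what the compiler generates for a lock statement, using Monitor.Enter and Monitor.Exit. LockExpansionDemo, RunLocked and IsHeldByCaller are hypothetical names used only for this illustration.

```csharp
using System;
using System.Threading;

public static class LockExpansionDemo
{
    private static readonly object _gate = new object();

    // Roughly equivalent to: lock (_gate) { body(); }
    public static void RunLocked(Action body)
    {
        bool lockTaken = false;
        try
        {
            Monitor.Enter(_gate, ref lockTaken); // waits here while another thread holds _gate
            body();
        }
        finally
        {
            // Runs even if body() throws, so the lock is not left held.
            if (lockTaken) Monitor.Exit(_gate);
        }
    }

    // True while the calling thread holds the lock on _gate.
    public static bool IsHeldByCaller() => Monitor.IsEntered(_gate);
}
```

Because the release sits in a finally block, an exception thrown inside the block still releases the lock, although the data the lock was protecting may have been left half-updated.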
You can have performance issues with locking variables, but normally, you'd construct your code to minimize the lengths of time that are spent inside a 'locked' block of code.
As for removing the locks, it will depend on what exactly the code is doing. Even though it's single threaded, if your object is implemented as a Singleton, it's possible that you'll have multiple clients using an instance of it (in memory, on a server) at the same time.
Yes, there will be some performance penalty when using lock but it is generally negligible enough to not matter.
Using locks (or any other mutual-exclusion statement or construct) is generally only needed in multi-threaded scenarios where multiple threads (either of your own making or from your caller) have the opportunity to interact with the object and change the underlying state or data maintained. For example, if you have a collection that can be accessed by multiple threads you don't want one thread changing the contents of that collection by removing an item while another thread is trying to read it.
lock (token) is only used to mark one or more blocks of code that should not run simultaneously in multiple threads. If your application is single-threaded, it's protecting against a condition that can't exist.
And locking does incur a performance hit, adding instructions to check for simultaneous access before code is executed. It should only be used where necessary.
See the question about 'Mutex' in C#. And then look at these two questions regarding use of the 'lock(Object)' statement specifically.
There is no point in having locks in the app if there is only one thread and yes, it is a performance hit although it does take a fair number of calls for that hit to stack up into something significant.