I have a class that spins off a BackgroundWorker to do some processor-intensive work. The background worker reads a few strings that are declared as fields of the class... do I need to lock around those strings? The BackgroundWorker never writes the strings; they simply represent some directory locations that are set in the constructor of the class and are hardly ever written to by the class after the constructor (and never written to by the BackgroundWorker). So it's possible the background worker could read a string while it is also being written to by the main class object, though that is pretty unlikely. But wouldn't both of those operations (the read by the background worker and the write by the main class) be atomic for a string anyway?
Thanks,
-Robert
Edit: I don't care about the string being out of date or anything (that wouldn't be a big problem in my app), I'm more worried about getting the "object is in use elsewhere" exception.
Strings in .NET are immutable; they can't change. What happens is that the reference will point to a totally different string but the strings themselves won't be changed.
So if you don't particularly mind that the background workers might not all use the same string if you change it, then you should be fine. Example: Worker A reads the string, something else changes it, Worker B reads the new string; maybe this doesn't cause problems, maybe it does. But accessing the strings themselves is definitely safe.
To quote from the documentation:
This type is thread safe.
ETA: A very good point by Martinho Fernandes in the comments below: thread-safe objects do not automagically mean that everything you do with them is thread-safe as well. He even wrote a blog post on that, which spares me the work of repeating everything he said :-)
If you don't use a lock, the worst case would be that one of your background workers reads an outdated copy of the string from the perspective of your main class and thread. You will never (under any circumstance) encounter the "object is in use elsewhere" exception mentioned in the question when working with strings.
As stated correctly in another answer, strings are immutable and cannot be changed after they are created. Any change to an existing string transparently results in a new string being created on the heap, without any impact on the previous string object.
Using a lock (at the cost of a possibly measurable performance impact) will ensure that your background workers read the latest copy of the string.
Yes, reads and writes will be atomic to string variables. That is because only the variable reference is ever changed. Strings are immutable so any operation that modifies the contents of the string will also create a new instance of the string. It is the reference to that new instance that is swapped out via the variable. But, that is not the main issue.
The main issue has to do with the staleness of the string variable itself. Without the appropriate synchronization mechanisms the writes in one thread may not be seen in another thread.
Bottom line...if there is even a remote chance that another thread will modify the string variable then you will need to synchronize access to it from the worker thread and most likely your main thread as well.
Edit: Since staleness is of no concern to you then you will probably be okay without using locks. However, the assumption is that you have initialized the string variable to something before the worker thread starts.
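A minimal sketch of the lock-free case described above. The class `DirectorySettings` and its members are hypothetical names made up for illustration; they are not from the question. It assumes the field is assigned in the constructor before any worker starts, marks it `volatile` so a reassignment by the owning thread is not hidden from the worker, and has the worker copy the reference to a local once so that it works with a single string for the whole operation.

```csharp
using System;

// Hypothetical holder for a directory path that is set in the constructor,
// occasionally reassigned by the owning thread, and only read by a worker.
public class DirectorySettings
{
    // Assigning a reference is atomic. volatile gives reads of this field
    // acquire semantics, so the JIT cannot keep reusing an old reference
    // that it hoisted out of a loop.
    private volatile string _outputDirectory;

    public DirectorySettings(string outputDirectory)
    {
        _outputDirectory = outputDirectory;
    }

    // Called by the owning thread: points the field at a different string.
    public void ChangeDirectory(string newDirectory)
    {
        _outputDirectory = newDirectory;
    }

    // Called by the worker: take one snapshot and use only that local copy.
    public string BuildPath(string fileName)
    {
        string snapshot = _outputDirectory;
        return snapshot + "/" + fileName;
    }
}
```

The worker may still see the old directory for a while after `ChangeDirectory` is called; the sketch only removes the need for a lock, not the staleness.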
If you are not doing any writing or modifying any shared variables then you don't need to use lock.
There are several strings A B & C? Does it matter if the background worker is working on (say) v3 of A and B and v2 of C? If so then you need a lock around the update of the whole set.
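One way to keep A, B and C on the same version is sketched below. `PathSet` and `PathRegistry` are hypothetical names, and bundling the values into one immutable object is a design choice of this sketch rather than something the answer prescribes. The writer replaces the whole set under a lock, and the reader takes one snapshot instead of reading three separate fields.

```csharp
using System;

// Hypothetical immutable bundle of three related strings.
public sealed class PathSet
{
    public readonly string A;
    public readonly string B;
    public readonly string C;

    public PathSet(string a, string b, string c)
    {
        A = a;
        B = b;
        C = c;
    }
}

public class PathRegistry
{
    private readonly object _sync = new object();
    private PathSet _current;

    public PathRegistry(PathSet initial)
    {
        _current = initial;
    }

    // Writer: replaces all three values in one step under the lock.
    public void Update(string a, string b, string c)
    {
        PathSet next = new PathSet(a, b, c);
        lock (_sync) { _current = next; }
    }

    // Reader: returns one snapshot, so A, B and C always belong to the same version.
    public PathSet Snapshot()
    {
        lock (_sync) { return _current; }
    }
}
```

A worker would call `Snapshot()` once at the start of a unit of work and use only that object until the work is done.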
A second subtle problem is that C# might choose to cache values in registers or otherwise reorder your code, so the threads don't see the same view of "reality". See this discussion and answers to this SO question.
My recommendation: write the conspicuously correct code using synchronisation. In this scenario the performance impact is surely trivial. That way the code maintainer doesn't even need to worry. If benchmarking reveals this to be a performance problem, then be very, very careful as you study writing clever thread-safe code.
Yes, I would recommend locking; if it is such a small task as only reading, the cost should be fairly trivial. Just be aware of deadlock, where a thread has taken the lock guarding a string and is waiting for another thread that holds some other lock.
You don't need the lock since such a race would only occur if two threads attempt to assign a value concurrently. Since .NET strings are immutable, the result you get is never corrupt - in the worst case, it's outdated.
Related
Let's say I have a timer (e.g. a System.Timers.Timer), and we know each Elapsed event will get put into the thread pool. If events come rapidly enough, how does the thread pool manage access to shared variables (e.g. a global int counter)? Does the manager use semaphores/locks under the hood?
Or does it not do anything, and just simply make a copy of shared variables at the start of the threadpool, and the last thread to finish will set the correct variable value?
Unfortunately I can't really test this because the order in which the events fire is not guaranteed (e.g. using a counter variable is not reliable) between each Elapsed event, as they may be fired out of order.
Thanks
You have to manage multi-threaded access to shared variables yourself.
There are many answers on StackOverflow and Google explaining how to do this, search for "thread safety C#".
I've worked on huge projects with many potential threading issues, and the code I write just works. I'm damn good at writing thread safe code these days, as I've already made all of the possible mistakes.
If you are just learning to write thread safe code, then it's easy to get overwhelmed by the huge amount of information out there. You might find some pages that cover the 8 different types of synchronization primitives. You will find huge discussions on the topic, and only half of them will be helpful.
If you are following the learning curve for the first time, I would recommend that you ignore said noise for now, and instead focus on mastering these two rules first:
Rule 1
If any two threads write to some shared primitive (like a long or a Dictionary or a List), put a lock around the access to this shared primitive. Aim for a situation so that when the lock is finished, the data structure is completely updated. This is the heart of writing thread safe code: all other rules for threading can be derived from this one.
Example:
// This lock object should be initialized once on program startup, and should be
// shared by every thread that touches dict.
static readonly object _dictLock = new object();
// This data structure can be accessed by multiple threads.
public static Dictionary<string, int> dict = new Dictionary<string, int>();

// Inside any method that reads or writes dict:
lock (_dictLock)
{
    if (!dict.ContainsKey("Hello"))
    {
        dict.Add("Hello", 42);
    }
} // Lock exits: data structure is now completely 100% updated. Google "atomic access C#".
Rule 2
Try not to have locks within locks. This can create deadlocks if the locks are entered in the wrong order. If you only lock around the primitives (e.g. dictionary, long, string, etc), then this shouldn't be an issue.
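When nested locks cannot be avoided, the usual way to stay out of the wrong-order situation described in Rule 2 is to always take the locks in one fixed order. The sketch below assumes a hypothetical `Account` type with a unique `Id` that is used only to decide that order; none of these names come from the answer.

```csharp
using System;

// Hypothetical account type; Id is used only to pick a global lock order.
public class Account
{
    public readonly int Id;
    public readonly object Sync = new object();
    public decimal Balance;

    public Account(int id, decimal balance)
    {
        Id = id;
        Balance = balance;
    }
}

public static class Bank
{
    // Always lock the account with the smaller Id first, so two concurrent
    // transfers A->B and B->A cannot each hold one lock and wait for the other.
    public static void Transfer(Account from, Account to, decimal amount)
    {
        if (from.Id == to.Id) return; // same account: nothing to move

        Account first = from.Id < to.Id ? from : to;
        Account second = from.Id < to.Id ? to : from;

        lock (first.Sync)
        {
            lock (second.Sync)
            {
                from.Balance -= amount;
                to.Balance += amount;
            }
        }
    }
}
```

If the two locks were taken in argument order instead, a transfer from A to B running at the same time as a transfer from B to A could deadlock.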
Guideline 1
If you are just learning, use nothing but lock; see how to use lock. It's difficult to go wrong if you use just this, as the lock is automatically released when the lock block exits. You can graduate to other types of locks, like reader-writer locks, later on. Don't bother with ConcurrentDictionary or Interlocked.Increment yet - focus on getting the basics correct.
Guideline 2
Try to spend as little time in locks as possible. Don't put a lock around a huge block of code; put locks around the smallest possible portions of the code, usually an access to a dictionary or a long. A lock is blindingly fast unless it's contended, so this technique seems to work well to create thread safe code that is fast.
Cause of 95% of meaningful threading issues?
In my experience, the single biggest cause of thread-unsafe code is Dictionary. Even ConcurrentDictionary is not immune to this - it needs manual locking to be correct if the access is spread over multiple lines. If you get this right, you will eliminate 95% of meaningful threading issues in your code.
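To illustrate the multi-line problem with ConcurrentDictionary, here is a small sketch using a hypothetical `HitCounter` class. The racy version reads on one line and writes on the next; the second version swaps in `AddOrUpdate`, which is an alternative to the manual lock the answer mentions, not the answer's own technique.

```csharp
using System;
using System.Collections.Concurrent;

public static class HitCounter
{
    private static readonly ConcurrentDictionary<string, int> _hits =
        new ConcurrentDictionary<string, int>();

    // Racy even with ConcurrentDictionary: another thread can change the value
    // between the read on one line and the write on the next.
    public static void RecordRacy(string key)
    {
        int current;
        _hits.TryGetValue(key, out current);
        _hits[key] = current + 1;
    }

    // The whole read-modify-write is handed to the dictionary as one operation.
    public static void Record(string key)
    {
        _hits.AddOrUpdate(key, 1, (k, old) => old + 1);
    }

    public static int Get(string key)
    {
        int value;
        return _hits.TryGetValue(key, out value) ? value : 0;
    }
}
```

Each individual call in `RecordRacy` is thread-safe, yet the method as a whole can lose increments, which is exactly the trap described above.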
The thread pool can't magically make your shared mutable variables thread-safe. It has no control over them and it does not even know they exist.
Be aware of the fact that timer ticks can happen concurrently (even at low frequencies) and after the timer has been disposed. You need to perform any synchronization necessary.
The thread pool itself is thread-safe in the sense that it can successfully process concurrent work items (which is kind of the point).
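A minimal sketch of doing that synchronization yourself, assuming a hypothetical `TickCounter` class whose `OnElapsed` method is attached to a `System.Timers.Timer`. The handler may run on several thread-pool threads at once, so the shared counter is only touched inside a lock.

```csharp
using System;
using System.Timers;

public class TickCounter
{
    private readonly object _sync = new object();
    private int _count;

    // Suitable as a System.Timers.Timer.Elapsed handler. Ticks may overlap,
    // so the update is guarded by a lock.
    public void OnElapsed(object sender, ElapsedEventArgs e)
    {
        lock (_sync) { _count++; }
    }

    // Reads take the same lock so they see the latest value.
    public int Count
    {
        get { lock (_sync) { return _count; } }
    }
}
```

Usage would look like `timer.Elapsed += counter.OnElapsed;` on an existing timer instance.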
I am trying to run two separate threads, A and B. A and B work on totally different data, and A only needs a small part of the data from B. They both need to be running all the time. How can I retrieve the data from thread B without interrupting B?
I am new to multithreading; could you explain with examples?
That's not how threads work, threads don't “own” data (most of the time). You can access data that was used or created on another thread just like any other data, but it can be very dangerous to do so.
The problem is that most data structures are not ready to be accessed from more than one thread at the same time (they are not thread-safe). There are several ways to fix that:
1. Use lock (or some other synchronization construct) to access the shared resource. Doing this makes sure that only one thread accesses the resource at a time, so it's safe. This is the most general approach (it works every time), it's probably the most common solution and the one that is easiest to get right (just lock on the right lock object every time you access the resource). But it can hurt performance, because it can make threads wait on each other a lot.
2. Don't share data between threads. If you have several operations that you want to run in parallel, some require resource A and others require resource B, run those that require A on one thread and those that require B on another thread. This way, you can be sure that only one thread accesses A or B, so it's safe. Another variant of this is if each thread has a copy of the resource.
3. Use special thread-safe data structures. For example, in .NET 4 there is a whole namespace of thread-safe collections: System.Collections.Concurrent.
4. Use immutable data structures. If the structure doesn't change, it's safe to access it from several threads at the same time. For example, because of this it's safe to share a string between several threads.
5. Use special constructs that avoid locking, like Interlocked operations or volatile operations. This is how most of the structures from #3 are implemented internally and it's a solution that can be much more performant than #1. But it's also very hard to do this right, which is why you should avoid it unless you really know what you're doing.
You have several options and it can be all confusing. But the best option usually is to just use a lock to access the shared resource, or use a thread-safe structure from a library and doing that is not hard. But if you find out that's not enough, you can go for the more advanced alternatives, but it will be hard to get right.
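As a sketch of the lock-based option applied to the question, thread B can publish its latest result into a small shared object and thread A can read it whenever it likes, without stopping B. `SharedReading` and `SharedReadingDemo` are hypothetical names, and the demo joins B before reading only so that the result is predictable.

```csharp
using System;
using System.Threading;

// Hypothetical "mailbox": thread B publishes its latest result, thread A reads it.
public class SharedReading
{
    private readonly object _sync = new object();
    private double _latest;
    private bool _hasValue;

    // Called by B each time it has a new value.
    public void Publish(double value)
    {
        lock (_sync)
        {
            _latest = value;
            _hasValue = true;
        }
    }

    // Called by A; returns false if B has not published anything yet.
    public bool TryRead(out double value)
    {
        lock (_sync)
        {
            value = _latest;
            return _hasValue;
        }
    }
}

public static class SharedReadingDemo
{
    public static double Run()
    {
        var shared = new SharedReading();
        var b = new Thread(() =>
        {
            for (int i = 1; i <= 1000; i++) shared.Publish(i);
        });
        b.Start();
        b.Join(); // joined here only so the demo has a predictable result

        double seen;
        shared.TryRead(out seen);
        return seen;
    }
}
```

In a real program A would call `TryRead` while B is still running; both threads hold the lock only for the few instructions needed to copy the value.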
I have used the generic queue from the C# collections, and everyone says that it is better to use System.Collections.Generic.Queue because of thread safety.
Please advise on the right decision to use Queue object, and how it is thread safe?
"Thread safe" is a bit of an unfortunate term because it doesn't really have a solid definition. Basically it means that certain operations on the object are guaranteed to behave sensibly when the object is being operated on via multiple threads.
Consider the simplest example: a counter. Suppose you have two threads that are incrementing a counter. If the sequence of events goes:
Thread one reads from counter, gets zero.
Thread two reads from counter, gets zero.
Thread one increments zero, writes one to counter.
Thread two increments zero, writes one to counter.
Then notice how the counter has "lost" one of the increments. Simple increment operations on counters are not threadsafe; to make them threadsafe you can use locks, or Interlocked.Increment.
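The lost-increment scenario can be avoided with an atomic increment. The sketch below uses a hypothetical `CounterDemo` class and `Parallel.For` to generate concurrent increments; it is one way to do it, with a lock around the increment being the other.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CounterDemo
{
    private static int _counter;

    public static int Run(int increments)
    {
        _counter = 0;
        // Iterations may run on different thread-pool threads at the same time.
        Parallel.For(0, increments, i =>
        {
            // Atomic read-modify-write. A plain _counter++ here could lose
            // updates in exactly the way described above.
            Interlocked.Increment(ref _counter);
        });
        return _counter;
    }
}
```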
Similarly with queues. Not-threadsafe-queues can "lose" enqueues the same way that not-threadsafe counters can lose increments. Worse, not threadsafe queues can even crash or produce crazy results if you use them in a multi-threaded scenario improperly.
The difficulty with "thread safe" is that it is not clearly defined. Does it simply mean "will not crash"? Does it mean that sensible results will be produced? For example, suppose you have a "threadsafe" collection. Is this code correct?
if (!collection.IsEmpty) Console.WriteLine(collection[0]);
No. Even if the collection is "threadsafe", that doesn't mean that this code is correct; another thread could have made the collection empty after the check but before the writeline and therefore this code could crash, even if the object is allegedly "threadsafe". Actually determining that every relevant combination of operations is threadsafe is an extremely difficult problem.
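For a concrete case, `ConcurrentQueue<T>` offers operations that combine the check and the read, which sidesteps the check-then-act gap. The helper `PeekDemo.FirstOrNull` below is a hypothetical name used only for this sketch.

```csharp
using System;
using System.Collections.Concurrent;

public static class PeekDemo
{
    // Returns the first item, or null if the queue is empty at the moment of the call.
    public static string FirstOrNull(ConcurrentQueue<string> queue)
    {
        string first;
        // The emptiness check and the read happen as one operation, so no
        // other thread can empty the queue in between.
        return queue.TryPeek(out first) ? first : null;
    }
}
```

This fixes only this one combination of operations; any other sequence of calls on the queue still has to be reasoned about separately.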
Now to come to your actual situation: anyone who is telling you "you should use the Queue class, it is better because it is threadsafe" probably does not have a clear idea of what they're talking about. First off, Queue is not threadsafe. Second, whether Queue is threadsafe or not is completely irrelevant if you are only using the object on a single thread! If you have a collection that is going to be accessed on multiple threads, then, as I indicated in my example above, you have an extremely difficult problem to solve, regardless of whether the collection itself is "threadsafe". You have to determine that every combination of operations you perform on the collection is also threadsafe. This is a very difficult problem, and if it is one you face, then you should use the services of an expert on this difficult topic.
A type that is thread safe can be safely accessed from multiple threads without concern for concurrency. This usually means that the type is read-only.
Interestingly enough, Queue<T> is not thread safe - it can support concurrent reads as long as the queue isn't modified but that isn't the same thing as thread safety.
In order to think about thread safety consider what would happen if two threads were accessing a Queue<T> and a third thread came along and began either adding to or removing from this Queue<T>. Since this type does not restrict this behavior it is not thread safe.
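If a plain `Queue<T>` has to be shared, every access needs to go through the same lock. A minimal sketch with a hypothetical `LockedQueue<T>` wrapper follows; it exposes a combined check-and-remove so callers never need a separate `Count` test followed by `Dequeue`.

```csharp
using System;
using System.Collections.Generic;

public class LockedQueue<T>
{
    private readonly object _sync = new object();
    private readonly Queue<T> _items = new Queue<T>();

    public void Enqueue(T item)
    {
        lock (_sync) { _items.Enqueue(item); }
    }

    // Check and remove inside one critical section.
    public bool TryDequeue(out T item)
    {
        lock (_sync)
        {
            if (_items.Count == 0)
            {
                item = default(T);
                return false;
            }
            item = _items.Dequeue();
            return true;
        }
    }
}
```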
In dealing with multithreading, you usually have to deal with concurrency issues. The term "concurrency issues" refers to issues that are specifically introduced by the possibility of interleaving instructions from two different execution contexts on a resource shared by both. Here, in terms of thread safety, the execution contexts are two threads within a process; however, in related subjects they might be processes.
Thread safety measures are put in place to achieve two goals primarily. First is to regain determinism with regard to what happens if the threads context-switch (which is otherwise controlled by the OS and thus basically nondeterministic in user-level programs), to prevent certain tasks from being left half-finished or two contexts writing to the same location in memory one after the other. Most measures simply use a little bit of hardware-supported test-and-set instructions and the like, as well as software-level synchronization constructs to force all other execution contexts to stay away from a data type while another one is doing work that should not be interrupted.
Usually, objects that are read-only are thread-safe. Many objects that are not read-only are able to have data accesses (read-only) occur with multiple threads without issue, if the object is not modified in the middle. But this is not thread safety. Thread safety is when all manner of things are done to a data type to prevent any modifications to it by one thread from causing data corruption or deadlock even when dealing with many concurrent reads and writes.
I have multiple threads starting at roughly the same time, all executing the same code path. Each thread needs to write records to a table in a database. If the table doesn't exist it should be created. Obviously two or more threads could see the table as missing, and try to create it.
What is the preferred approach to ensure that this particular block of code is executed only once, by only one thread?
While I'm writing in C# on .NET 2.0, I assume that the approach would be framework/language neutral.
Something like this should work...
private object lockObject = new object();

private void CreateTableIfNotPresent()
{
    lock (lockObject)
    {
        // check for table presence and create it if necessary,
        // all inside this block
    }
}
Have your threads call the CreateTableIfNotPresent function. The lock block will ensure that no two threads can execute the code inside the block concurrently, so no thread will be able to view the table as not present while another is creating it.
This is a classical application for either a Mutex or a Semaphore.
A mutex ensures that a specific piece of code (or several pieces of code) can only be run by a single thread at a time. You could be clever and use a different mutex for each table, or simply constrain the whole initialisation block to one thread at a time.
A semaphore (or set of semaphores) could perform exactly the same function.
Most lock implementations will use a mutex internally, so look at what lock code is already available in the language or libraries you are using.
@ebpower has it right that in certain applications it would actually be more efficient to catch the exception caused by an attempt to create the same table multiple times, though this may not be the case in your example.
However there are many other ways of proceeding. For example, you could use a single-threaded ExecutorService (sorry, I could only find a Java reference) that has responsibility for creating any tables that your worker threads discover are missing. If it gets two requests for the same table, it simply ignores the later ones.
A variant on a Memoizer (remembering table references, creating them first if necessary) would also work under the circumstances. The book Java Concurrency In Practice walks through the implementation of a nice Memoizer class, but this would be pretty simple to port to any other language with effective concurrency building blocks.
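A rough C# take on the memoizer idea is sketched below. It is not the implementation from the book, and it relies on `ConcurrentDictionary` and `Lazy<T>`, which need .NET 4 or later, so it would not compile on the .NET 2.0 mentioned in the question. `TableInitializer` and the `createTable` delegate are hypothetical stand-ins for the real database code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public class TableInitializer
{
    private readonly ConcurrentDictionary<string, Lazy<bool>> _created =
        new ConcurrentDictionary<string, Lazy<bool>>();
    private readonly Action<string> _createTable;

    // createTable is a placeholder for the real "create if missing" database call.
    public TableInitializer(Action<string> createTable)
    {
        _createTable = createTable;
    }

    public void EnsureTable(string tableName)
    {
        // GetOrAdd may build several Lazy wrappers under contention, but every
        // caller gets back the one stored in the dictionary, and that Lazy
        // runs its factory only once.
        Lazy<bool> lazy = _created.GetOrAdd(tableName, name =>
            new Lazy<bool>(
                () => { _createTable(name); return true; },
                LazyThreadSafetyMode.ExecutionAndPublication));

        // Blocks until the creation started by whichever thread got there first is done.
        bool ignored = lazy.Value;
    }
}
```

Threads that arrive while the table is being created wait on `lazy.Value` and then continue, so none of them writes to a table that does not exist yet.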
This is what Semaphores are for.
You may not even need to bother with locks since your database shouldn't let you create multiple tables with the same name. Why not just catch the appropriate exceptions and if two threads try to create the same table, one wins and continues on, while the other recovers and continues on.
I'd use a thread sync object such as ManualResetEvent though it sounds to me like you're willing a race condition which may mean you have a design problem
Some posts have suggested Mutexes - this is overkill unless your threads are running in different processes.
Others have suggested using locks - this is fine but locking can lead to over-pessimistic locks on data which can negate the benefit of using threads in the first place.
A more fundamental question is why are you doing it this way at all? What benefit does threading bring to the problem domain? Does concurrency solve your problem?
You may want to try static constructors to get a reference of the table.
According to MSDN (.NET 2.0), a static constructor is used to initialize any static data, or to perform a particular action that needs to be performed once only.
Also, the CLR automatically guarantees that a static constructor executes only once per AppDomain and is thread-safe.
For more info, check Chapter 8 of CLR via C# by Jeffrey Richter.
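A small sketch of the static-constructor approach, using a hypothetical `TableSetup` class. The counter stands in for the real table-creation work and exists only so that the once-only behaviour can be observed.

```csharp
using System;
using System.Threading;

public static class TableSetup
{
    private static int _initCount;

    // Runs at most once per AppDomain, before the first use of any static member.
    static TableSetup()
    {
        // Placeholder for the real "create table if missing" work.
        Interlocked.Increment(ref _initCount);
    }

    public static int InitCount
    {
        get { return _initCount; }
    }

    // Worker threads call this; the first call triggers the static constructor
    // and concurrent callers wait until it has finished.
    public static void EnsureInitialized()
    {
    }
}
```

One caveat with this design: if the static constructor throws, the type stays unusable for the rest of the AppDomain's life and later accesses fail with a TypeInitializationException, so work that can fail, such as database access, needs care here.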
I understand the main function of the lock key word from MSDN
lock Statement (C# Reference)
The lock keyword marks a statement block as a critical section by obtaining the mutual-exclusion lock for a given object, executing a statement, and then releasing the lock.
When should the lock be used?
For instance it makes sense with multi-threaded applications because it protects the data. But is it necessary when the application does not spin off any other threads?
Are there performance issues with using lock?
I have just inherited an application that is using lock everywhere, and it is single threaded and I want to know should I leave them in, are they even necessary?
Please note this is more of a general knowledge question, the application speed is fine, I want to know if that is a good design pattern to follow in the future or should this be avoided unless absolutely needed.
When should the lock be used?
A lock should be used to protect shared resources in multithreaded code. Not for anything else.
But is it necessary when the application does not spin off any other threads?
Absolutely not. It's just a time waster. However do be sure that you're not implicitly using system threads. For example if you use asynchronous I/O you may receive callbacks from a random thread, not your original thread.
Are there performance issues with using lock?
Yes. They're not very big in a single-threaded application, but why make calls you don't need?
...if that is a good design pattern to follow in the future[?]
Locking everything willy-nilly is a terrible design pattern. If your code is cluttered with random locking and then you do decide to use a background thread for some work, you're likely to run into deadlocks. Sharing a resource between multiple threads requires careful design, and the more you can isolate the tricky part, the better.
All the answers here seem right: the usefulness of locks is to block threads from accessing locked code concurrently. However, there are many subtleties in this field, one of which is that locked blocks of code are automatically marked as critical regions by the Common Language Runtime.
The effect of code being marked as critical is that, if the entire region cannot be entirely executed, the runtime may consider that your entire Application Domain is potentially jeopardized and, therefore, unload it from memory. To quote MSDN:
For example, consider a task that attempts to allocate memory while holding a lock. If the memory allocation fails, aborting the current task is not sufficient to ensure stability of the AppDomain, because there can be other tasks in the domain waiting for the same lock. If the current task is terminated, other tasks could be deadlocked.
Therefore, even though your application is single-threaded, this may be a hazard for you. Consider that one method in a locked block throws an exception that is eventually not handled within the block. Even if the exception is dealt with as it bubbles up through the call stack, your critical region of code didn't finish normally. And who knows how the CLR will react?
For more info, read this article on the perils of Thread.Abort().
Bear in mind that there might be reasons why your application is not as single-threaded as you think. Async I/O in .NET may well call-back on a pool thread, for example, as do some of the various timer classes (not the Windows Forms Timer, though).
Generally speaking, if your application is single threaded, you're not going to get much use out of the lock statement. Not knowing your application exactly, I don't know if they're useful or not - but I suspect not. Further, if your application is using lock everywhere I don't know that I would feel all that confident about it working in a multi-threaded environment anyway - did the original developer actually know how to develop multi-threaded code, or did they just add lock statements everywhere in the vague hope that that would do the trick?
lock should be used around the code that modifies shared state, that is, state that is modified by other threads concurrently, and those other threads must take the same lock.
A lock is actually a memory access serializer, the threads (that take the lock) will wait on the lock to enter until the current thread exits the lock, so memory access is serialized.
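To make the enter/exit behaviour concrete, the sketch below shows a method written with the lock statement next to roughly what the compiler generates for it from C# 4 onwards, using `Monitor.Enter` and `Monitor.Exit`. `SafeTotal` is a hypothetical class used only for illustration.

```csharp
using System;
using System.Threading;

public class SafeTotal
{
    private readonly object _sync = new object();
    private int _total;

    // Written with the lock statement.
    public void AddWithLock(int amount)
    {
        lock (_sync) { _total += amount; }
    }

    // Roughly the expanded form of the method above.
    public void AddWithMonitor(int amount)
    {
        bool lockTaken = false;
        try
        {
            // Waits here until no other thread holds _sync.
            Monitor.Enter(_sync, ref lockTaken);
            _total += amount;
        }
        finally
        {
            // Released even if the body throws.
            if (lockTaken) Monitor.Exit(_sync);
        }
    }

    public int Total
    {
        get { lock (_sync) { return _total; } }
    }
}
```

Both methods take the same monitor on `_sync`, so they exclude each other as well as themselves.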
To answer your question: lock is not needed in a single-threaded application, and it does have a performance cost. Note, however, that an uncontended lock in C# is acquired in user mode; a transition to kernel mode happens only when the lock is contended and the waiting thread has to block.
If you're interested in multithreading performance a good place to start is MSDN threading guidelines
You can have performance issues with locking variables, but normally, you'd construct your code to minimize the lengths of time that are spent inside a 'locked' block of code.
As far as removing the locks goes, it'll depend on what exactly the code is doing. Even though it's single threaded, if your object is implemented as a Singleton, it's possible that you'll have multiple clients using an instance of it (in memory, on a server) at the same time.
Yes, there will be some performance penalty when using lock, but it is generally negligible enough to not matter.
Using locks (or any other mutual-exclusion statement or construct) is generally only needed in multi-threaded scenarios where multiple threads (either of your own making or from your caller) have the opportunity to interact with the object and change the underlying state or data maintained. For example, if you have a collection that can be accessed by multiple threads you don't want one thread changing the contents of that collection by removing an item while another thread is trying to read it.
Lock(token) is only used to mark one or more blocks of code that should not run simultaneously in multiple threads. If your application is single-threaded, it's protecting against a condition that can't exist.
And locking does invoke a performance hit, adding instructions to check for simultaneous access before code is executed. It should only be used where necessary.
See the question about 'Mutex' in C#. And then look at these two questions regarding use of the 'lock(Object)' statement specifically.
There is no point in having locks in the app if there is only one thread and yes, it is a performance hit although it does take a fair number of calls for that hit to stack up into something significant.