I know we need to take care of thread safety for static member variables inside the class. Do we need to worry about the instance member variables?
It depends on whether you want your type to be thread-safe... and what you mean by that.
Most of the time I think it's entirely reasonable to document that the type isn't thread-safe, but can be used safely from different threads with appropriate synchronization. Most .NET types fall into this category.
That way you can usually make sure that only "coordinating" objects need to worry about synchronization, rather than putting a lock in every method and property - a strategy which is painful, and doesn't really address the wider synchronization issues you're likely to run into anyway.
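As a minimal sketch of that idea (the Coordinator name and its members are illustrative, not from any real API), the coordinating object owns the lock while the collection it wraps stays simple:

public class Coordinator
{
    private readonly object _gate = new object();
    private readonly List<int> _values = new List<int>(); // not thread-safe by itself

    public void Add(int value)
    {
        lock (_gate) // callers never touch _values without this lock
        {
            _values.Add(value);
        }
    }

    public int[] Snapshot()
    {
        lock (_gate) // copy out under the lock so callers get a stable view
        {
            return _values.ToArray();
        }
    }
}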
Of course, types which will naturally be used from multiple threads - ones specifically designed to enable concurrency, service locators, etc. - should be thread-safe, and should be documented as such. Likewise, fully immutable types are naturally thread-safe to start with.
Finally, there's the matter of what counts as "thread-safe" to start with. You should read Eric Lippert's blog post on the matter to clarify what sort of thing you should be thinking about and documenting.
Yes, you should care, because the same instance method could be passed as a callback to multiple threads. Example:
var instance = new Foo();
ThreadPool.QueueUserWorkItem(instance.SomeInstanceMethod);
ThreadPool.QueueUserWorkItem(instance.SomeInstanceMethod);
The instance method now needs to be synchronized because in this case the shared state is the instance itself.
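A minimal sketch of what that synchronization might look like (the _sync and _count members are illustrative):

public class Foo
{
    private readonly object _sync = new object();
    private int _count; // instance state shared by both queued work items

    // Matches the WaitCallback signature expected by QueueUserWorkItem.
    public void SomeInstanceMethod(object state)
    {
        lock (_sync) // serialize access to the shared instance state
        {
            _count++;
        }
    }
}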
Consider the following code:
public void Execute()
{
    Task.Factory.StartNew(Iterate);
    Task.Factory.StartNew(Add);
}

private List<int> _list = Enumerable.Range(1, 10).ToList();

private void Iterate()
{
    foreach (var item in _list)
    {
        Console.WriteLine(item);
    }
}

private void Add()
{
    _list.Add(_list.Count);
}
Most of the time, this code will result in:
InvalidOperationException: Collection was modified; enumeration operation may not execute.
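A straightforward fix (a sketch) is to make both methods take the same lock, so the enumeration and the mutation can never interleave:

private readonly object _listLock = new object();

private void Iterate()
{
    lock (_listLock) // Add cannot run while we enumerate
    {
        foreach (var item in _list)
        {
            Console.WriteLine(item);
        }
    }
}

private void Add()
{
    lock (_listLock) // writers take the same lock
    {
        _list.Add(_list.Count);
    }
}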
According to Reference assignment is atomic so why is Interlocked.Exchange(ref Object, Object) needed?, reference assignment is guaranteed to be atomic on all .NET platforms. Will this code be atomic?
public static List<MyType> _items;
public static List<MyType> Items
{
    get
    {
        if (_items == null)
        {
            _items = JsonConvert.DeserializeObject<List<MyType>>(ConfigurationManager.AppSettings["Items"]);
        }
        return _items;
    }
}
I know multiple objects might be created, as explained here. But will Items be atomic (I mean, will it always be either null or a complete List, and never something in between)?
No, this code is not atomic - if Items is accessed from multiple threads in parallel, _items may actually get created more than once and different callers may receive a different value.
This code needs locking because it first performs a read, then a branch, and then a write (after an expensive deserialization call). The read and the write are each atomic by themselves, but - without a lock - there's nothing to prevent the system from switching to another thread between the read and the write.
In pseudo(ish) code, this is what may happen:
if (_items == null)
// Thread may be interrupted here.
{
// Thread may be interrupted inside this call in many places,
// so another thread may enter the body of the if() and
// call this same function again.
var s = ConfigurationManager.AppSettings.get_Item("Items");
// Thread may be interrupted inside this call in many places,
// so another thread may enter the body of the if() and
// call this same function again.
var i = JsonConvert.DeserializeObject(s);
// Thread may be interrupted here.
_items = i;
}
// Thread may be interrupted here.
return (_items);
This shows you that without locking it's possible for multiple callers to get a different instance of the Items list.
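For illustration, a minimal locked version of the property (keeping the original names, and deliberately using a plain lock rather than double-checked locking):

private static readonly object _itemsLock = new object();
public static List<MyType> _items;

public static List<MyType> Items
{
    get
    {
        lock (_itemsLock) // serializes the read-check-write sequence
        {
            if (_items == null)
            {
                _items = JsonConvert.DeserializeObject<List<MyType>>(
                    ConfigurationManager.AppSettings["Items"]);
            }
            return _items;
        }
    }
}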
You should look into using Lazy<T>, which will make this sort of initialization a lot simpler and safer.
When should I use Lazy<T>?
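As a sketch, the same property rewritten with Lazy<T>; its default thread-safety mode (ExecutionAndPublication) guarantees the factory runs only once:

private static readonly Lazy<List<MyType>> _items =
    new Lazy<List<MyType>>(() =>
        JsonConvert.DeserializeObject<List<MyType>>(
            ConfigurationManager.AppSettings["Items"]));

public static List<MyType> Items
{
    get { return _items.Value; } // first caller runs the factory; later callers block, then reuse
}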
Also, keep in mind that List<T> itself is not thread-safe - you may want to use a different type (like ConcurrentDictionary<T1, T2> or ReadOnlyCollection<T>) or you may need to use locking around all operations against this list.
Rob, in the comments, pointed out that the question may be about whether a given assignment is atomic - a single assignment (that is a single write) of a reference is guaranteed to be atomic but that doesn't make this code safe because there's more than a single assignment here.
When implementing a class intended to be thread-safe, should I include a memory barrier at the end of its constructor, in order to ensure that any internal structures have completed being initialized before they can be accessed? Or is it the responsibility of the consumer to insert the memory barrier before making the instance available to other threads?
Simplified question:
Is there a race hazard in the code below that could give erroneous behaviour due to the lack of a memory barrier between the initialization and the access of the thread-safe class? Or should the thread-safe class itself protect against this?
ConcurrentQueue<int> queue = null;
Parallel.Invoke(
() => queue = new ConcurrentQueue<int>(),
() => queue?.Enqueue(5));
Note that it is acceptable for the program to enqueue nothing, as would happen if the second delegate executes before the first. (The null-conditional operator ?. protects against a NullReferenceException here.) However, it should not be acceptable for the program to throw an IndexOutOfRangeException, NullReferenceException, enqueue 5 multiple times, get stuck in an infinite loop, or do any of the other weird things caused by race hazards on internal structures.
Elaborated question:
Concretely, imagine that I were implementing a simple thread-safe wrapper for a queue. (I'm aware that .NET already provides ConcurrentQueue<T>; this is just an example.) I could write:
public class ThreadSafeQueue<T>
{
private readonly Queue<T> _queue;
public ThreadSafeQueue()
{
_queue = new Queue<T>();
// Thread.MemoryBarrier(); // Is this line required?
}
public void Enqueue(T item)
{
lock (_queue)
{
_queue.Enqueue(item);
}
}
public bool TryDequeue(out T item)
{
lock (_queue)
{
if (_queue.Count == 0)
{
item = default(T);
return false;
}
item = _queue.Dequeue();
return true;
}
}
}
This implementation is thread-safe once initialized. However, if the initialization itself races with access from another consumer thread, then race hazards could arise, whereby the latter thread accesses the instance before the internal Queue<T> has been initialized. As a contrived example:
ThreadSafeQueue<int> queue = null;
Parallel.For(0, 10000, i =>
{
if (i == 0)
queue = new ThreadSafeQueue<int>();
else if (i % 2 == 0)
queue?.Enqueue(i);
else
{
int item = -1;
if (queue?.TryDequeue(out item) == true)
Console.WriteLine(item);
}
});
It is acceptable for the code above to miss some numbers; however, without the memory barrier, it could also throw a NullReferenceException (or produce some other weird result) due to the internal Queue<T> not having been initialized by the time Enqueue or TryDequeue is called.
Is it the responsibility of the thread-safe class to include a memory barrier at the end of its constructor, or is it the consumer who should include a memory barrier between the class's instantiation and its visibility to other threads? What is the convention in the .NET Framework for classes marked as thread-safe?
Edit: This is an advanced threading topic, so I understand the confusion in some of the comments. An instance can appear as half-baked if accessed from other threads without proper synchronization. This topic is discussed extensively within the context of double-checked locking, which is broken under the ECMA CLI specification without the use of memory barriers (such as through volatile). Per Jon Skeet:
The Java memory model doesn't ensure that the constructor completes before the reference to the new object is assigned to instance. The Java memory model underwent a reworking for version 1.5, but double-check locking is still broken after this without a volatile variable (as in C#).
Without any memory barriers, it's broken in the ECMA CLI specification too. It's possible that under the .NET 2.0 memory model (which is stronger than the ECMA spec) it's safe, but I'd rather not rely on those stronger semantics, especially if there's any doubt as to the safety.
Lazy<T> is a very good choice for thread-safe initialization. I think it should be left to the consumer to provide that:
var queue = new Lazy<ThreadSafeQueue<int>>(() => new ThreadSafeQueue<int>());
Parallel.For(0, 10000, i =>
{
    if (i % 2 == 0)
        queue.Value.Enqueue(i);
    else
    {
        int item = -1;
        if (queue.Value.TryDequeue(out item))
            Console.WriteLine(item);
    }
});
Should thread-safe class have a memory barrier at the end of its constructor?
I do not see a reason for this. The queue is a local variable that is assigned from one thread and accessed from another. Such concurrent access should be synchronized, and it is the responsibility of the accessing code to do so. It has nothing to do with the constructor or the type of the variable; such access should always be explicitly synchronized, or you are entering a dangerous area even for primitive types (even if the assignment is atomic, you may get caught in some cache trap). If the access to the variable is properly synchronized, it does not need any support in the constructor.
I'll attempt to answer this interesting and well-presented question, based on the comments by Servy and Douglas, and on information coming from other related questions. What follows are just my own assumptions, not solid information from a reputable source.
Thread-safe classes have properties and methods that can be safely invoked by multiple threads concurrently, but their constructors are not thread-safe. This means that it is entirely possible for a thread to "see" an instance of a thread-safe class having an invalid state, provided that the instance is constructed concurrently by another thread.
Adding the line Thread.MemoryBarrier(); at the end of the constructor is not enough to make the constructor thread-safe, because this statement only affects the thread that runs the constructor¹. The other threads that may concurrently access the under-construction instance are not affected. Memory visibility is cooperative, and one thread cannot change what another thread "sees" by altering the other thread's execution flow (or by invalidating the local cache of the CPU core that the other thread is running on) in a non-cooperative manner.
The correct and robust way to ensure that all threads are seeing the instance having a valid state, is to include proper memory barriers in all threads. This can be achieved by either declaring the instance as volatile, in case it is a field of a class, or otherwise using the methods of the static Volatile class:
ThreadSafeQueue<int> queue = null;
Parallel.For(0, 10000, i =>
{
if (i == 0)
Volatile.Write(ref queue, new ThreadSafeQueue<int>());
else if (i % 2 == 0)
Volatile.Read(ref queue)?.Enqueue(i);
else
{
int item = -1;
if (Volatile.Read(ref queue)?.TryDequeue(out item) == true)
Console.WriteLine(item);
}
});
In this particular example it would be simpler and more efficient to instantiate the queue variable before invoking the Parallel.For method. Doing so would render the explicit Volatile invocations unnecessary. The Parallel.For method uses Tasks internally, and the TPL includes the appropriate memory barriers at the beginning/end of each task. Memory barriers are generated implicitly and automatically by the .NET infrastructure, by any built-in mechanism that starts a thread or causes a delegate to execute on another thread. (citation)
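A sketch of that simpler arrangement:

// Create the queue up front; the Parallel.For task machinery publishes it
// safely to the worker delegates, so no Volatile calls are needed.
var queue = new ThreadSafeQueue<int>();
Parallel.For(0, 10000, i =>
{
    if (i % 2 == 0)
        queue.Enqueue(i);
    else
    {
        int item;
        if (queue.TryDequeue(out item))
            Console.WriteLine(item);
    }
});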
I'll repeat that I'm not 100% confident about the correctness of the information presented above.
¹ Quoting from the documentation of the Thread.MemoryBarrier method: Synchronizes memory access as follows: The processor executing the current thread cannot reorder instructions in such a way that memory accesses prior to the call to MemoryBarrier() execute after memory accesses that follow the call to MemoryBarrier().
No, you don't need a memory barrier in the constructor. Your assumption, even though demonstrating some creative thought, is wrong. No thread can get a half-baked instance of queue. The new reference is "visible" to the other threads only when the initialization is done. Suppose thread_1 is the first thread to initialize queue - it goes through the ctor code, but queue's reference in the main stack is still null! Only when thread_1 exits the constructor code does it assign the reference.
See comments below and OP elaborated question.
I have a lightly used dictionary which is hardly ever going to be read or updated, since the individual items raise events and return their results with their event args. In fact, the dictionary is always going to be updated by the same thread. I was thinking about adding a simple lock just to be safe. I was wondering if I can just place the lock in the get accessor. Does this work?
Dictionary<string,Indicator> indicators = new Dictionary<string,Indicator>();
Dictionary<string, Indicator> Indicators
{
get
{
lock (indicators)
{
return indicators;
}
}
}
public void AddIndicator(Indicator i)
{
lock (indicators)
{
indicators.Add(i.Name, i);
}
}
That doesn't do anything particularly useful, no.
In particular, if you have:
x = foo.Indicators["blah"]
then the indexer will be executed without the thread holding the lock... so it's not thread-safe. Think of the above code like this:
Dictionary<string, Indicator> indicators = foo.Indicators;
// By now, your property getter has completed, and the lock has been released...
x = indicators["blah"];
Do you ever need to do anything with the collection other than access it via the indexer? If not, you might want to just replace the property with a method:
public Indicator GetIndicator(string name)
{
lock (indicators)
{
return indicators[name];
}
}
(You may want to use TryGetValue instead, etc - it depends on what you're trying to achieve.)
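For example, a TryGetValue-based variant might look like this (a sketch):

public bool TryGetIndicator(string name, out Indicator indicator)
{
    lock (indicators)
    {
        return indicators.TryGetValue(name, out indicator);
    }
}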
Personally I'd prefer to use a reference to a privately-owned-and-otherwise-unused lock object rather than locking on the collection reference, but that's a separate matter.
As mentioned elsewhere, ConcurrentDictionary is your friend if you're using .NET 4, but of course it's not available prior to that :(
In addition to Jon's input, I'll say: don't lock on the collection indicators itself anyway. From MSDN:
Use caution when locking on instances, for example lock(this) in C# or SyncLock(Me) in Visual Basic. If other code in your application, external to the type, takes a lock on the object, deadlocks could occur.
It is recommended to use a dedicated object instance to lock onto. There are other places where this is covered with more details and reasons why - even here on SO, should you care to search for the information when you have time.
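A sketch of that dedicated-lock pattern applied to the code above (_indicatorsLock is illustrative):

private readonly object _indicatorsLock = new object();

public void AddIndicator(Indicator i)
{
    lock (_indicatorsLock) // private object: code outside the type can't lock on it
    {
        indicators.Add(i.Name, i);
    }
}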
Alternatively, you could use ConcurrentDictionary which handles the thread safety for you.
Short answer: YES, that works. But, as mentioned by Jon, it does not lock as intended when you go through the indexer.
Suppose I have this method:
void Foo(int bar)
{
// do stuff
}
Here is the behavior I want Foo to have:
If thread 1 calls Foo(1) and thread 2 calls Foo(2), both threads can run concurrently.
If thread 1 calls Foo(1) and thread 2 calls Foo(1), both threads cannot run concurrently.
Is there a good, standard way in .NET to specify this type of behavior? I have a solution that uses a dictionary of objects to lock on, but that feels kind of messy.
Use a dictionary that provides different lock objects for the different arguments. Set up the dictionary when you instantiate the underlying object (or statically, if applicable):
var locks = new Dictionary<int, object>() {
{1, new Object()},
{2, new Object()},
…
};
And then use it inside your method:
void Foo(int bar) {
lock (locks[bar]) {
…
}
}
I wouldn’t say that this solution is messy; on the contrary, providing a fine lock granularity is commendable. And since locks on value types don’t work in .NET (you can’t lock on a value type directly, and boxing one would produce a fresh object each time, so no two callers would ever contend on the same lock), having a mapping is the obvious solution.
Be careful though: the above only works as long as the dictionary isn’t concurrently modified and read. It is therefore best to treat the dictionary as read-only after its set-up.
Bottom line: you can't lock on value types.
The dictionary you're using is the best approach I can think of. It's kludgey, but it works.
Personally, I'd pursue an architectural solution that makes the locking unnecessary, but I don't know enough about your system to give you pointers there.
Using a plain Dictionary is not enough; you should use ConcurrentDictionary or implement a data structure that supports multi-threaded access, as sketched below.
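A sketch of that approach, using GetOrAdd so a lock object is created on demand for each previously unseen argument:

private static readonly ConcurrentDictionary<int, object> _locks =
    new ConcurrentDictionary<int, object>();

void Foo(int bar)
{
    // GetOrAdd is thread-safe: concurrent first calls with the same key
    // still end up locking on the same object instance.
    lock (_locks.GetOrAdd(bar, _ => new object()))
    {
        // do stuff
    }
}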
Creating a Dictionary<> so that you can lock on a value seems overkill to me. I got this working using a string. There are people (e.g. Jon Skeet) who do not like this approach (and for valid reasons - see this post: Is it OK to use a string as a lock object?)
But I have a way to mitigate for those concerns: intern the string on the fly and combine it with an unique identifier.
// you should insert your own guid here
string lockIdentifier = "a8ef3042-e866-4667-8673-6e2268d5ab8e";
public void Foo(int bar)
{
lock (string.Intern(string.Format("{0}-{1}", lockIdentifier, bar)))
{
// do stuff
}
}
What happens is that distinct values are stored in a string intern pool (which crosses AppDomain boundaries). Adding lockIdentifier to the string ensures that the string won't conflict with interned strings used in other applications, meaning the lock will only take effect in your own application.
So the intern pool will return a reference to an interned string - this is ok to lock on.
UPDATED: now using a read-only collection based on comments below
I believe that the following code should be thread safe "lock free" code, but want to make sure I'm not missing something...
public class ViewModel : INotifyPropertyChanged
{
    //INotifyPropertyChanged and other boring stuff goes here...

    // AsReadOnly() returns ReadOnlyCollection<string>, so the field must be
    // declared as such (volatile is allowed on reference-type fields).
    private volatile ReadOnlyCollection<string> _data;

    public IEnumerable<string> Data
    {
        get { return _data; }
    }

    //this function is called on a timer and runs on a background thread
    private void RefreshData()
    {
        List<string> newData = ACallToAService();
        _data = newData.AsReadOnly();
        OnPropertyChanged("Data"); // yes, this dispatches to the UI thread
    }
}
Specifically, I know that I could use a lock(_lock) or even an Interlocked.Exchange() but I don't believe that there is a need for it in this case. The volatile keyword should be sufficient (to make sure the value isn't cached), no? Can someone please confirm this, or else let me know what I don't understand about threading :)
I have no idea whether that is "safe" or not; it depends on precisely what you mean by "safe". For example, if you define "safe" as "a consistent ordering of all volatile writes is guaranteed to be observed from all threads", then your program is not guaranteed to be "safe" on all hardware.
The best practice here is to use a lock unless you have an extremely good reason not to. What is your extremely good reason to write this risky code?
UPDATE: My point is that low-lock or no-lock code is extremely risky and that only a small number of people in the world actually understand it. Let me give you an example, from Joe Duffy:
// deeply broken, do not use!
class Singleton {
    private static object slock = new object();
    private static Singleton instance;
    private static bool initialized;

    private Singleton() {}

    public static Singleton Instance {
        get {
            if (!initialized) {
                lock (slock) {
                    if (!initialized) {
                        instance = new Singleton();
                        initialized = true;
                    }
                }
            }
            return instance;
        }
    }
}
This code is broken; it is perfectly legal for a correct implementation of the C# compiler to write you a program that returns null for the instance. Can you see how? If not, then you have no business doing low-lock or no-lock programming; you will get it wrong.
I can't figure out this stuff myself; it breaks my brain. That's why I try to never do low-lock programming that departs in any way from standard practices that have been analyzed by experts.
It depends on what the intent is. The get/set of the list is atomic (even without volatile) and non-cached (volatile), but callers can mutate the list, which is not guaranteed thread-safe.
There is also a race condition that could lose data:
obj.Data.Add(value);
Here value could easily be discarded.
I would use an immutable (read-only) collection.
I think that if you have only the two threads you described, your code is correct and safe. And you don't even need that volatile; it is useless here.
But please don't call it "thread safe", as it is safe only for your two threads using it your special way.
I believe that this is safe in itself (even without volatile), however there may be issues depending on how other threads use the Data property.
Provided that you can guarantee that all other threads read and cache the value of Data once before enumerating it (and don't try to cast it to some broader interface to perform other operations), and that they make no consistency assumptions across a second access to the property, then you should be OK. If you can't make that guarantee (and it'd be hard to if, e.g., one of the consumers is the framework itself via data-binding, and hence code that you do not control), then you can't say that it's safe.
For example, this would be safe:
foreach (var item in x.Data)
{
// do something with item
}
And this would be safe (provided that the JIT isn't allowed to optimise away the local, which I think is the case):
var data = x.Data;
var item1 = FindItem(data, a);
var item2 = FindItem(data, b);
DoSomething(item1, item2);
The above two might act on stale data, but it will always be consistent data. But this would not necessarily be safe:
var item1 = FindItem(x.Data, a);
var item2 = FindItem(x.Data, b);
DoSomething(item1, item2);
This one could possibly be searching two different states of the collection (before and after some thread replaces it), so it may not be safe to operate on items found in each separate enumeration, as they may not be consistent with each other.
The issue would be worse with a broader interface; e.g. if Data exposed IList<T>, you'd have to watch for consistency of Count and indexer operations as well.
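As a contrived sketch of that broader-interface hazard (assuming Data were exposed as IList<string> rather than IEnumerable<string>):

// Each access to x.Data may return a different list instance, so the
// Count and the indexer can come from two different snapshots.
for (int i = 0; i < x.Data.Count; i++)
{
    Console.WriteLine(x.Data[i]); // may index a newer, shorter list
}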