Thread-safety accessing a List<> of read-only objects - c#

I have a list of objects which will be accessible by many objects across many threads. To ensure thread safety I have made the list and its object read-only. My only concern is the iterators of the List<> object because I remember reading something about iterator thread safety issues. Do I have a problem?
For clarification: in the BarObservable class, the List < Bar > bar is read-only. The individual bars of the list are also read-only. The MarketDataAdaptor class uses BarService to add new bars to BarsObservable class. The diagram doesn't show this but the IBarObservers are passed a reference to the List < Bar > . They can't write to it but they do use the iterator of the List. Meanwhile the final bar is updated and once finalized a new bar is added to the end of the list.

As I understand it, you currently provide two immutability guarantees:
There is an unchanging reference(s) to a List<Bar> object.
The Bar type itself is immutable or is, by convention, instances of it are not mutated after they are added to the list.
Neither of these is sufficient to deal with any concurrent reader / writer scenarios since the List<T> type itself is not thread-safe.
If you have multiple unsynchronized writers, you could corrupt the list.
If you have a single writer, with reader(s) on other threads, you probably won't corrupt the list. On the other hand, the readers won't work correctly. If you're lucky, iterating the list in the middle of a write will throw a "Collection changed during enumeration" exception. If you're not, your program will silently lose its functional correctness.
Now you could try synchronizing access to the list with locks, ReaderWriterLockSlims, etc. How you do this will be specific to the Producer / Consumer relationships of your particular situation. For example, you could lock mutation during enumeration by either of these:
Hold up an attempt to mutate as long as there is an active undisposed enumerator.
Every time an enumerator is requested, block writers. Copy the list to another list; return its enumerator, and then unblock writers.
But I would suggest, if you are on .NET 4.0, to take a look at the thread-safe collection classes in the System.Collections.Concurrent namespace. In particular, you may find the BlockingCollection<T> class to be exactly what you need.
Finally, I would look at the overall design to see if you can solve this problem in a lock-free manner.

Related

Object is only read by multiple threads, does it needs synchronization

The collection object that I have is a singleton type, which contains a list of particular object, each index in the list is read by multiple threads, so that they can query an integer property value to be used by a thread local variable. Does this case need any safety using synchronization, in my view No, but posting the question to be doubly sure.
There's no update happening to the object mentioned above on the multiple threads, they are just reading. In my view even ReaderWriterLockSlim needn't be used here, since there's no write. Please confirm my understanding.
Code is something like:
Here NumOfLocs, threadProp are specific to a thread and collection count and objects doesn't change, while threads are reading, they are just fixed in the beginning during initialization
int NumOfLocs = collectionObject.LocCollection.Count;
int threadProp = collectionObject.LocCollection[index].Prop
You wouldn't need synchronization if you're only reading the collection. If you wanted to update the collection however, there's a list of Thread-safe collection classes available in System.Collections.Concurrent which you could use. See here for the MSDN documentation.
Usually functions that are meant to read the state do not change the state. But sometimes some function of some object will change some internal object's state, contrary to common sense. This for instance could happen if object is caching something, or rearranging internal structure. It is impossible to tell up front what some object does in any of its functions, without knowing the internal workings of the object.
If it's standard .NET object then there's probably a documentation for it that will tell you if the object is thread safe for reading. If it's some third party object then you have to ask that third party. If you coded the object then only you know.

When to use BlockingCollection and when ConcurrentBag instead of List<T>?

The accepted answer to the question "Why does this Parallel.ForEach code freeze the program up?" advises to substitute the List usage by ConcurrentBag in a WPF application.
I'd like to understand whether a BlockingCollection can be used in this case instead?
You can indeed use a BlockingCollection, but there is absolutely no point in doing so.
First off, note that BlockingCollection is a wrapper around a collection that implements IProducerConsumerCollection<T>. Any type that implements that interface can be used as the underlying storage:
When you create a BlockingCollection<T> object, you can specify not
only the bounded capacity but also the type of collection to use. For
example, you could specify a ConcurrentQueue<T> object for first in,
first out (FIFO) behavior, or a ConcurrentStack<T> object for last
in,first out (LIFO) behavior. You can use any collection class that
implements the IProducerConsumerCollection<T> interface. The default
collection type for BlockingCollection<T> is ConcurrentQueue<T>.
This includes ConcurrentBag<T>, which means you can have a blocking concurrent bag. So what's the difference between a plain IProducerConsumerCollection<T> and a blocking collection? The documentation of BlockingCollection says (emphasis mine):
BlockingCollection<T> is used as a wrapper for an
IProducerConsumerCollection<T> instance, allowing removal attempts
from the collection to block until data is available to be removed.
Similarly, a BlockingCollection<T> can be created to enforce an
upper-bound on the number of data elements allowed in the
IProducerConsumerCollection<T> [...]
Since in the linked question there is no need to do either of these things, using BlockingCollection simply adds a layer of functionality that goes unused.
List<T> is a collection designed to use in single thread
applications.
ConcurrentBag<T> is a class of Collections.Concurrent namespace designed
to simplify using collections in multi-thread environments. If you
use ConcurrentCollection you will not have to lock your
collection to prevent corruption by other threads. You can insert
or take data from your collection with no need to write special locking codes.
BlockingCollection<T> is designed to get rid of the requirement of checking if new data is available in the shared collection between threads. if there is new data inserted into the shared collection then your consumer thread will awake immediately. So you do not have to check if new data is available for consumer thread in certain time intervals typically in a while loop.
Whenever you find the need for a thread-safe List<T>, in most cases neither the ConcurrentBag<T> nor the BlockingCollection<T> are going to be your best option. Both collections are specialized for facilitating producer-consumer scenarios, so unless you have more than one threads that are concurrently adding and removing items from the collection, you should look for other options (with the best candidate being the ConcurrentQueue<T> in most cases).
Regarding especially the ConcurrentBag<T>, it's an extremely specialized class targeting mixed producer-consumer scenarios. This means that each worker-thread is expected to be both a producer and a consumer (that adds and removes items from the same collection). It could be a good candidate for the internal storage of an ObjectPool class, but beyond that it is hard to imagine any advantageous usage scenario for this class.
People usually think that the ConcurrentBag<T> is the thread-safe equivalent of a List<T>, but it's not. The similarity of the two APIs is misleading. Calling Add to a List<T> results to adding an item at the end of the list. Calling Add to a ConcurrentBag<T> results instead to the item being added at a random slot inside the bag. The ConcurrentBag<T> is essentially unordered. It is not optimized for being enumerated, and does a lousy job when it is commanded to do so. It maintains internally a bunch of thread-local queues, so the order of its contents is dominated by which thread did what, not by when did something happened. Before each enumeration of the ConcurrentBag<T>, all these thread-local queues are copied to an array, adding pressure to the garbage collector (source code). So for example the line var item = bag.First(); results in a copy of the whole collection, for returning just one element.
These characteristics make the ConcurrentBag<T> a less than ideal choice for storing the results of a Parallel.For/Parallel.ForEach loop.
A better thread-safe substitute of the List<T>.Add is the ConcurrentQueue<T>.Enqueue method. "Enqueue" is a less familiar word than "Add", but it actually does what you expect it to do.
There is nothing that a ConcurrentBag<T> can do that a ConcurrentQueue<T> can't. For example neither collection offers a way to remove a specific item from the collection. If you want a concurrent collection with a TryRemove method that has a key parameter, you could look at the ConcurrentDictionary<K,V> class.
The ConcurrentBag<T> appears frequently in the Task Parallel Library-related examples in Microsoft's documentation. Like here for example.
Whoever wrote the documentation, apparently they valued more the tiny usability advantage of writing Add instead of Enqueue, than the behavioral/performance disadvantage of using the wrong collection. This makes some sense considering that the examples were authored at a time when the TPL was new, and the goal was the fast adoption of the library by developers who were mostly unfamiliar with parallel programming. I get it, Enqueue is a scary word when you see it for the first time. Unfortunately now there is a whole generation of developers that have incorporated the ConcurrentBag<T> in their mental tools, although it has no business being there, considering
how specialized this collection is.
In case you want to collect the results of a Parallel.ForEach loop in exactly the same order as the source elements, you can use a List<T> protected with a lock. In most cases the overhead will be negligible, especially if the work inside the loop is chunky. An example is shown below, featuring the Select LINQ operator for getting the index of each element.
var indexedSource = source.Select((item, index) => (item, index));
List<TResult> results = new();
Parallel.ForEach(indexedSource, parallelOptions, entry =>
{
var (item, index) = entry;
TResult result = GetResult(item);
lock (results)
{
while (results.Count <= index) results.Add(default);
results[index] = result;
}
});
This is for the case that the source is a deferred sequence with unknown size. If you know its size beforehand, it is even simpler. Just preallocate a TResult[] array, and update it in parallel without locking:
TResult[] results = new TResult[source.Count];
Parallel.For(0, source.Count, parallelOptions, i =>
{
results[i] = GetResult(source[i]);
});
The TPL includes memory barriers at the end of task executions, so all the values of the results array will be visible from the current thread (citation).
Yes, you could use BlockingCollection for that. finishedProxies would be defined as:
BlockingCollection<string> finishedProxies = new BlockingCollection<string>();
and to add an item, you would write:
finishedProxies.Add(checkResult);
And when it's done, you could create a list from the contents.

List<T>: Moving from immutable to changeable structure

I am currently using quite heavily some List and I am looping very frequently via foreach over these lists.
Originally List was immuteable afer the startup. Now I have a requirement to amend the List during runtime from one thread only (a kind of listener). I need to remove from the List in object A and add to the list of object B. A and B are instances of the same class.
Unfortunaly there is no Synchronized List. What would you suggest me to do in this case? in my case speed is more important than synchronisation, thus I am currently working with copies of the lists for add/remove to avoid that the enumerators fail.
Do you have any other recommended way to deal with this?
class X {
List<T> Related {get; set;}
}
In several places and in different threads I am then using
foreach var x in X.Related
Now I need to basically perform in yet another thread
a.Related.Remove(t);
b.Related.Add(t);
To avoid potential exceptions, I am currently doing this:
List<T> aNew=new List<T> (a.Related);
aNew.Remove(t);
a.Related=aNew;
List<T>bNew=new List<T>(b.Related){t};
b.Related=bNew;
Is this correct to avoid exceptions?
From this MSDN post: http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx
"...the only way to ensure thread safety is to lock the collection during the entire enumeration. "
Consider using for loops and iterate over your collection in reverse. This way you do not have the "enumerators fail", and as you are going backwards over your collection it is consistent from the POV of the loop.
It's hard to discuss the threading aspects as there is limited detail.
Update
If your collections are small, and you only have 3-4 "potential" concurrent users, I would suggest using a plain locking strategy as suggested by #Jalal although you would need to iterate backwards, e.g.
private readonly object _syncObj = new object();
lock (_syncObj)
{
for (int i = list.Count - 1; i >= 0; i--)
{
//remove from the list and add to the second one.
}
}
You need to protect all accesses to your lists with these lock blocks.
Your current implementation uses the COW (Copy-On-Write) strategy, which can be effective in some scenarios, but your particular implementation suffers from the fact that two or more threads take a copy, make their changes, but then could potentially overwrite the results of other threads.
Update
Further to your question comment, if you are guaranteed to only have one thread updating the collections, then your use of COW is valid, as there is no chance of multiple threads making updates and updates being lost by overwriting by multiple threads. It's a good use of the COW strategy to achieve lock free synchronization.
If you bring other threads in to update the collections, my previous locking comments stand.
My only concern would be that the other "reader" threads may have cached values for the addresses of the original lists, and may not see the new addresses when they are updated. In this case make the list variables volatile.
Update
If you do go for the lock-free strategy there is still one more pitfall, there will still be a gap between setting a.Related and b.Related, in which case your reader threads could be iterating over out-of-date collections e.g. item a could have been removed from list1 but not yet added to list2 - item a will be in neither lists. You could also swap the issue around and add to list2 before removing from list1, in which case item a would be in both lists - duplicates.
If consistency is important you should use locking.
You should lock before you handle the lists since you are in multithreading mode, the lock operation itself does not affect the speed here, the lock operation is executed in nanoseconds about 10 ns depending on the machine. So:
private readonly object _listLocker = new object();
lock (_listLocker)
{
for (int itemIndex = 0; itemIndex < list.Count; itemIndex++)
{
//remove from the first list and add to the second one.
}
}
If you are using framework 4.0 I encorage you to use ConcurrentBag instead of list.
Edit: code snippet:
List<T> aNew=new List<T> (a.Related);
This will work if only all interaction with the collection "including add remove replace items" managed this way. Also you have to use System.Threading.Interlocked.CompareExchange and System.Threading.Interlocked.Exchange methods to replace the existing collection with the new modified. if that is not the case then you are doing nothing by coping
This will not work. for instance consider a thread trying to get an item from the collection, at the same time another thread replace the collection. this could leave the item retrieved in a not constant data. also consider while you are coping the collection, another thread want to insert item to the collection at the same time while you are coping?
This will throw exception indicates that the collection modified.
Another thing is that you are coping the whole collection to a new list to handle it. certainly this will harm the performance, and I think using synchronization mechanism such as lock reduce the performance pallet, and it is the much appropriated thing to do while to handle multithreading scenarios.

Iterating Collection In Two Threads

This question relates both to C# and Java
If you have a collection which is not modified and that collection reference is shared between two threads what happens when you iterate on each thread??
ThreadA: Collection.iterator
ThreadA: Collection.moveNext
ThreadB: Collection.iterator
ThreadB: Collection.moveNext
Will threadB see the first element?
Is the iterator always reset when it is requested? What happens if this is interleaved so movenext and item is interleaved? Is there a danger that you dont process all elements??
It works as expected because each time you request the iterator you get a new one.
If it didn't you wouldn't be able to do foreach followed by foreach on the same collection!
By convention, an Iterator is implemented such that the traversal action never alters the collection's state. It only points to the current position in the collection, and manages the iteration logic. Therefore, if you scan the same collection by N different threads, everything should work fine.
However, note that Java's Iterator allows removal of items, and ListIterator even supports the set operation. If you want to use these actions by at least one of the threads, you will probably face concurrency problems (ConcurrentModificationException), unless the Iterator is specifically designed for such scenarios (such as with ConcurrentHashMap's iterators).
In Java (and I am pretty sure also in C#), the standard API collections typically do not have single iterator. Each call to iterator() produces a new one, which has its own internal index or pointer, so that as long as both threads acquire their own iterator object, there should be no problem.
However, this is not guaranteed by the interface, nor is the ability of two iterators to work concurrently without problems. For custom implementations of collections, all bets are off.
In c# - yes, java - seems to be but I'm not familiar.
About c# see http://csharpindepth.com/Articles/Chapter6/IteratorBlockImplementation.aspx and http://csharpindepth.com/Articles/Chapter11/StreamingAndIterators.aspx
At least in C#, all of the standard collections can be enumerated simultaneously on different threads. However, enumeration on any thread will blow up if you modify the underlying collection during enumeration (as it should.) I don't believe any sane developer writing a collection class would have their enumerators mutate the collection state in a way that interferes with enumeration, but it's possible. If you're using a standard collection, however, you can safely assume this and therefore use locking strategies like Single Writer / Multiple Reader when synchronizing collection access.

Safe to get Count value from generic collection without locking the collection?

I have two threads, a producer thread that places objects into a generic List collection and a consumer thread that pulls those objects out of the same generic List. I've got the reads and writes to the collection properly synchronized using the lock keyword, and everything is working fine.
What I want to know is if it is ok to access the Count property without first locking the collection.
JaredPar refers to the Count property in his blog as a decision procedure that can lead to race conditions, like this:
if (list.Count > 0)
{
return list[0];
}
If the list has one item and that item is removed after the Count property is accessed but before the indexer, an exception will occur. I get that.
But would it be ok to use the Count property to, say, determine the initial size a completely different collection? The MSDN documentation says that instance members are not guaranteed to be thread safe, so should I just lock the collection before accessing the Count property?
I suspect it's "safe" in terms of "it's not going to cause anything to go catastrophically wrong" - but that you may get stale data. That's because I suspect it's just held in a simple variable, and that that's likely to be the case in the future. That's not the same as a guarantee though.
Personally I'd keep it simple: if you're accessing shared mutable data, only do so in a lock (using the same lock for the same data). Lock-free programming is all very well if you've got appropriate isolation in place (so you know you've got appropriate memory barriers, and you know that you'll never be modifying it in one thread while you're reading from it in another) but it sounds like that isn't the case here.
The good news is that acquiring an uncontested lock is incredibly cheap - so I'd go for the safe route if I were you. Threading is hard enough without introducing race conditions which are likely to give no significant performance benefit but at the cost of rare and unreproducible bugs.

Categories

Resources