Is the enumerator thread safe after getting it under a lock? - C#

I am wondering if the returned enumerator is thread safe:
public IEnumerator<T> GetEnumerator()
{
    lock (_sync)
    {
        return _list.GetEnumerator();
    }
}
Multiple threads (also within lock() blocks) are adding data to this list, and one thread enumerates its contents. When the enumerating thread is done, it clears the list. Will it then be safe to use the enumerator obtained from this method?
I.e., does the enumerator point to a copy of the list at the instant it was asked for, or does it always point back to the list itself, which may or may not be manipulated by another thread during its enumeration?
If the enumerator is not thread safe, then the only other course of action I can see is to create a copy of the list and return that. This is, however, not ideal, as it will generate a lot of garbage (this method is called about 60 times per second).

No, not at all. The lock synchronizes only the call to _list.GetEnumerator; enumerating a list is a lot more than that: it includes reading the IEnumerator.Current property, calling IEnumerator.MoveNext, and so on.
You either need a lock over the whole foreach (I assume you enumerate via foreach), or you need to make a copy of the list, as in the sketch below.
A better option is to take a look at the thread-safe collections provided out of the box.
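A rough sketch of those first two options, assuming a wrapper class that owns the same _list and _sync fields as the question (the ForEach and Snapshot method names are made up for illustration):
using System;
using System.Collections.Generic;

public class SyncedContainer<T>
{
    private readonly List<T> _list = new List<T>();
    private readonly object _sync = new object();

    public void Add(T item)
    {
        lock (_sync) { _list.Add(item); }
    }

    // Option 1: hold the lock for the entire enumeration.
    public void ForEach(Action<T> action)
    {
        lock (_sync)
        {
            foreach (var item in _list)
                action(item);
        }
    }

    // Option 2: copy under the lock and enumerate the copy outside it.
    public List<T> Snapshot()
    {
        lock (_sync) { return new List<T>(_list); }
    }
}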

According to the documentation, to guarantee thread safety you have to lock the collection during the entire iteration over it.
The enumerator does not have exclusive access to the collection;
therefore, enumerating through a collection is intrinsically not a
thread-safe procedure. To guarantee thread safety during enumeration,
you can lock the collection during the entire enumeration. To allow
the collection to be accessed by multiple threads for reading and
writing, you must implement your own synchronization.
Another option may be to define your own custom iterator and create a new instance of it for every thread. That way every thread will have its own read-only Current cursor into the same collection.

Multiple threads (also within lock() blocks) are adding data to this list, and one thread enumerates its contents. When the
enumerating thread is done, it clears the list. Will it then be safe
to use the enumerator obtained from this method?
No. See reference here: http://msdn.microsoft.com/en-us/library/system.collections.ienumerator.aspx
An enumerator remains valid as long as the collection remains
unchanged. If changes are made to the collection, such as adding,
modifying, or deleting elements, the enumerator is irrecoverably
invalidated and the next call to MoveNext or Reset throws an
InvalidOperationException. If the collection is modified between
MoveNext and Current, Current returns the element that it is set to,
even if the enumerator is already invalidated. The enumerator does not
have exclusive access to the collection; therefore, enumerating
through a collection is intrinsically not a thread-safe procedure.
Even when a collection is synchronized, other threads can still modify
the collection, which causes the enumerator to throw an exception. To
guarantee thread safety during enumeration, you can either lock the
collection during the entire enumeration or catch the exceptions
resulting from changes made by other threads.
..
does the enumerator point to a copy of the list at the instant it was asked for, or does it always point back to the list itself,
which may or may not be manipulated by another thread during its
enumeration?
It depends on the collection. See the concurrent collections: ConcurrentStack, ConcurrentQueue, and ConcurrentBag all take snapshots of the collection when GetEnumerator() is called and return elements from the snapshot. The underlying collection may change without changing the snapshot. On the other hand, ConcurrentDictionary doesn't take a snapshot, so changing the collection while iterating will immediately affect things per the rules above.
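As a hedged illustration of that snapshot behavior, here is a minimal console sketch (the values are arbitrary):
using System;
using System.Collections.Concurrent;

class SnapshotDemo
{
    static void Main()
    {
        var queue = new ConcurrentQueue<int>(new[] { 1, 2, 3 });

        // GetEnumerator() on ConcurrentQueue represents a moment-in-time
        // snapshot of the queue's contents.
        var snapshot = queue.GetEnumerator();

        // Mutations made after the enumerator was obtained...
        queue.Enqueue(4);

        // ...do not invalidate it and are not visible to it.
        while (snapshot.MoveNext())
            Console.WriteLine(snapshot.Current); // prints 1, 2, 3
    }
}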
A trick I sometimes use in this case is to create a temporary collection to iterate so the original is free while I use the snapshot:
foreach (var item in items.ToList())
{
    // work with item; the original collection is free to change meanwhile
}
If your list is too large and this causes GC churn, then locking is probably your best bet. If locking is too heavy, maybe consider a partial iteration each timeslice, if that is feasible.
You said:
When the enumerating thread is done it clears the list.
Nothing says you have to process the whole list at once. You could instead remove a range of items, hand them to a separate processing thread, let that batch finish, and repeat. Perhaps iteration and lists aren't the best model here. Consider ConcurrentQueue, with which you could build a producer/consumer model where consumers just steadily remove items to process, without any iteration, as in the sketch below.
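A minimal producer/consumer sketch along those lines; the string item type, the item count, and the back-off are assumptions for illustration:
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

class ProducerConsumerSketch
{
    static void Main()
    {
        var queue = new ConcurrentQueue<string>();
        var done = new CancellationTokenSource();

        // The producer enqueues without any explicit locking.
        var producer = Task.Run(() =>
        {
            for (int i = 0; i < 100; i++)
                queue.Enqueue($"item {i}");
            done.Cancel();
        });

        // The consumer steadily drains items; no enumerator is ever
        // invalidated because nothing iterates the queue directly.
        var consumer = Task.Run(() =>
        {
            while (!done.IsCancellationRequested || !queue.IsEmpty)
            {
                if (queue.TryDequeue(out var item))
                    Console.WriteLine(item);
                else
                    Thread.Sleep(1); // back off briefly when the queue is empty
            }
        });

        Task.WaitAll(producer, consumer);
    }
}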

Related

Is it thread-safe to replace a concurrent collection when there may be iterators?

I've been reading various things and it seems like this should work but I want to be sure. I have a static property that is supposed to be a cache (plus some other functionality related to the cache data). It stores the actual data in a ConcurrentBag, and it has an IEnumerable method to [filter and] yield return values from this bag. It updates itself like this:
ConcurrentBag<Foo> NewBag = GetNewThings();
Cache = NewBag;
From what I've read, it seems like this should work, though I didn't expect it to. I expected this to blow up any iterators that were reading when this happens. However, I read that if another thread were iterating through the old list, it would finish on that instance of the list while the new list gets swapped in. A second (new) thread would start on the new list, even while the old thread is still iterating through its old instance. This seems like magic though, so I'm probably wrong, yes?
The other threads only iterate the list, the only writing happens here on these two lines.
I read that if another thread were iterating through the old list, it
would finish on that instance of the list while the new list gets
swapped. A second (new) thread would start on the new list, even while
the old thread is still iterating through their old instance. This
seems magic though, so I'm probably wrong, yes?
What you read is correct.
This is thread-safe, but you might want to make the variable volatile or use Volatile.Read and Volatile.Write to ensure immediate visibility to all threads.
The original instance assigned to Cache still exists, and that's the one the old thread is iterating over.
The new thread operates on the GetNewThings() result.
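A minimal sketch of the volatile variant described above; the Foo type, its IsInteresting property, and the GetNewThings stub are assumptions standing in for the question's real code:
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

class Foo
{
    public bool IsInteresting { get; set; }
}

static class FooCache
{
    // volatile ensures readers always see the most recently published bag
    // rather than a stale reference cached by the JIT or CPU.
    private static volatile ConcurrentBag<Foo> _cache = new ConcurrentBag<Foo>();

    public static void Refresh()
    {
        ConcurrentBag<Foo> newBag = GetNewThings();
        _cache = newBag; // readers still enumerating the old bag are unaffected
    }

    public static IEnumerable<Foo> GetInteresting()
    {
        // Capture the reference once so the whole enumeration uses one instance.
        var snapshot = _cache;
        return snapshot.Where(f => f.IsInteresting);
    }

    private static ConcurrentBag<Foo> GetNewThings() => new ConcurrentBag<Foo>();
}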

C# Concurrent Data Structure which can be Iterated through

I am looking for a thread-safe concurrent data structure which I am able to iterate through. Order does not matter to me.
I want to have a Collection of Objects.
One thread will be iterating through this collection continuously, calling each object's Object.Add(Item).
eg:
Collection<Object> collection = new Collection<Object>();

// Run on a new thread
while (tasksAvailable)
{
    foreach (var item in collection)
    {
        item.Add(task); // I get tasks from a ConcurrentQueue
    }
}
Another thread/timer will attempt to add, edit, or remove these objects from the collection (though less frequently than the first thread).
What collection can I use? I am currently using a ConcurrentBag, but I am not sure that, when I foreach over it, every object in the collection will receive the task I provide in the loop.
Alternatively, I was thinking I could use a normal List<Object> and simply lock it when I add, edit, or remove objects from it. However, objects I am not working with would also get blocked, which I would prefer to avoid; i.e., I would prefer a fine-grained lock.
Create your own wrapper class around IList that provides thread-safe access, or try ConcurrentBag<T>; a sketch of the wrapper approach follows.
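A minimal sketch of such a wrapper; the class name and the copy-on-enumerate policy are choices made for illustration, not the only way to do it:
using System.Collections;
using System.Collections.Generic;

public class SynchronizedList<T> : IEnumerable<T>
{
    private readonly List<T> _items = new List<T>();
    private readonly object _sync = new object();

    public void Add(T item)
    {
        lock (_sync) { _items.Add(item); }
    }

    public bool Remove(T item)
    {
        lock (_sync) { return _items.Remove(item); }
    }

    // Enumerate over a copy so writers are only blocked for the duration
    // of the copy, not for the whole foreach.
    public IEnumerator<T> GetEnumerator()
    {
        List<T> snapshot;
        lock (_sync) { snapshot = new List<T>(_items); }
        return snapshot.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}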

iterating through ConcurrentDictionary and modifying the collection

What is the correct way to iterate through a ConcurrentDictionary collection and delete some entries?
As I understand it, ConcurrentDictionary implements locking at the level of a cell, and it is important for me to iterate through the whole collection and remove irrelevant data based on some condition. I am afraid that during my iteration another thread will add data that should be filtered out, after my iterator has already passed that point in the collection.
So how can I achieve this with ConcurrentDictionary, or should I use an ordinary Dictionary with a locking mechanism on every touch of the collection?
Thanks.
ConcurrentDictionary's main advantage IMO is that it's lock free for the reading threads, which is usually why you'd use it; if you don't need that advantage then you can just use a dictionary with a lock.
You could always ensure the filter criteria are passed to any adds that occur after your filter iteration starts.
Otherwise, if non-blocking reads are more important than the occasional inconsistent entry, you could just repeat your filter until it finds no more invalid items.
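A minimal sketch of that repeat-until-clean approach, assuming string keys, int values, and a placeholder isStale predicate:
using System;
using System.Collections.Concurrent;

static class CacheCleanup
{
    public static void RemoveStaleEntries(ConcurrentDictionary<string, int> dict,
                                          Func<int, bool> isStale)
    {
        // Enumerating a ConcurrentDictionary is safe while other threads modify
        // it, but it is not a snapshot, so repeat until a pass removes nothing.
        bool removedAny;
        do
        {
            removedAny = false;
            foreach (var pair in dict)
            {
                // Note: TryRemove removes the key even if another thread has
                // just updated its value; acceptable for this sketch.
                if (isStale(pair.Value) && dict.TryRemove(pair.Key, out _))
                    removedAny = true;
            }
        } while (removedAny);
    }
}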
You definitely can't guarantee that no entries will be added to the ConcurrentDictionary while you iterate it, short of locking the dictionary. But if you do lock, then there is no sense in using ConcurrentDictionary; use Dictionary instead.
Btw, why is it important for you to end the iteration with no 'bad' entries left? The dictionary can be populated with more of them the moment after you release the lock. Maybe it's better not to let unwanted entries get into the dictionary in the first place?

Thread safe OfType

I am facing the problem that this code:
enumerable.OfType<Foo>().Any(x => x.Footastic == true);
isn't thread safe and throws an "enumeration has changed" exception.
Is there a good way to overcome this issue?
I already tried the following, but it didn't always work (the exception seems not to fire that often):
public class Foo
{
    public void DoSomeMagicWithCollection(IEnumerable enumerable)
    {
        lock (enumerable)
        {
            enumerable.OfType<Foo>().Any(x => x.Footastic == true);
        }
    }
}
If you're getting an exception that the underlying collection has changed while enumerating it, given that this code clearly doesn't mutate the collection itself, it means that another thread is mutating the collection while you're trying to iterate it.
There is no possible solution to this problem other than simply not doing that. What's happening is that the enumerator of the List (or whatever collection type that is) is throwing the exception and preventing further enumeration because it can see that the list was modified during the enumeration. There is no way for the enumerators of OfType or Any that wrap it to recover from that. The underlying enumerator is refusing to give them the data from the list; they can't do anything about that.
You need to use some sort of synchronization mechanism to prevent another thread from mutating the collection while this thread is enumerating it. Your lock doesn't prevent another thread from using the collection; it simply prevents any code that locks on the same instance from running concurrently. You need to have any code that could possibly mutate the list also lock on the same object to properly synchronize them.
Another possibility would be to use a collection that is inherently designed to be accessed from multiple threads at the same time. There are several such collections in the System.Collections.Concurrent namespace. They may or may not fit your needs. They will take care of synchronizing access to their data (to a point) on their own, without you needing to explicitly lock when accessing them.
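A minimal sketch of the shared-lock approach; the FooRepository class, its _syncRoot field, and the Add method are assumptions, not the asker's actual code:
using System.Collections.Generic;
using System.Linq;

public class Foo
{
    public bool Footastic { get; set; }
}

public class FooRepository
{
    private readonly List<object> _items = new List<object>();
    private readonly object _syncRoot = new object();

    // Writers and readers must lock on the same object.
    public void Add(object item)
    {
        lock (_syncRoot) { _items.Add(item); }
    }

    public bool AnyFootastic()
    {
        // Hold the lock for the entire enumeration so no writer can
        // invalidate the enumerator halfway through.
        lock (_syncRoot)
        {
            return _items.OfType<Foo>().Any(x => x.Footastic);
        }
    }
}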

List<T>: Moving from immutable to changeable structure

I am currently using some List<T> instances quite heavily, and I am looping over these lists very frequently via foreach.
Originally the List was immutable after startup. Now I have a requirement to amend the List at runtime from one thread only (a kind of listener). I need to remove an item from the List in object A and add it to the List in object B. A and B are instances of the same class.
Unfortunately there is no synchronized List. What would you suggest I do in this case? In my case speed is more important than synchronization, so I am currently working with copies of the lists for add/remove to avoid the enumerators failing.
Do you have any other recommended way to deal with this?
class X
{
    List<T> Related { get; set; }
}
In several places and in different threads I am then using
foreach (var x in X.Related)
Now I basically need to perform, in yet another thread:
a.Related.Remove(t);
b.Related.Add(t);
To avoid potential exceptions, I am currently doing this:
List<T> aNew = new List<T>(a.Related);
aNew.Remove(t);
a.Related = aNew;

List<T> bNew = new List<T>(b.Related) { t };
b.Related = bNew;
Is this correct to avoid exceptions?
From this MSDN post: http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx
"...the only way to ensure thread safety is to lock the collection during the entire enumeration. "
Consider using for loops and iterating over your collection in reverse. This way the enumerators do not fail, and because you are going backwards over the collection it stays consistent from the point of view of the loop.
It's hard to discuss the threading aspects as there is limited detail.
Update
If your collections are small, and you only have 3-4 "potential" concurrent users, I would suggest using a plain locking strategy as suggested by @Jalal, although you would need to iterate backwards, e.g.
private readonly object _syncObj = new object();

lock (_syncObj)
{
    for (int i = list.Count - 1; i >= 0; i--)
    {
        // remove from the list and add to the second one
    }
}
You need to protect all accesses to your lists with these lock blocks.
Your current implementation uses the COW (Copy-On-Write) strategy, which can be effective in some scenarios, but your particular implementation suffers from the fact that two or more threads take a copy, make their changes, but then could potentially overwrite the results of other threads.
Update
Further to your question comment, if you are guaranteed to only have one thread updating the collections, then your use of COW is valid, as there is no chance of multiple threads making updates and updates being lost by overwriting by multiple threads. It's a good use of the COW strategy to achieve lock free synchronization.
If you bring other threads in to update the collections, my previous locking comments stand.
My only concern would be that the other "reader" threads may have cached values for the addresses of the original lists, and may not see the new addresses when they are updated. In this case make the list variables volatile.
Update
If you do go for the lock-free strategy there is still one more pitfall: there will be a gap between setting a.Related and b.Related, during which your reader threads could be iterating over out-of-date collections. For example, an item could have been removed from the first list but not yet added to the second, so it would be in neither list. You could also swap the issue around and add to the second list before removing from the first, in which case the item would be in both lists - duplicates.
If consistency is important you should use locking.
You should lock before you handle the lists, since you are in a multithreaded scenario. The lock operation itself does not affect speed here; it executes in nanoseconds, about 10 ns depending on the machine. So:
private readonly object _listLocker = new object();

lock (_listLocker)
{
    for (int itemIndex = 0; itemIndex < list.Count; itemIndex++)
    {
        // remove from the first list and add to the second one
    }
}
If you are using Framework 4.0, I encourage you to use ConcurrentBag instead of List.
Edit: code snippet:
List<T> aNew = new List<T>(a.Related);
This will work only if all interactions with the collection (including adding, removing, and replacing items) are managed this way. You also have to use the System.Threading.Interlocked.CompareExchange and System.Threading.Interlocked.Exchange methods to replace the existing collection with the newly modified one, as sketched below; if that is not the case, then you gain nothing by copying.
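A minimal sketch of the Interlocked-based publish described above; the Node class, its _related field, and exposing the list as IReadOnlyList<T> are assumptions for illustration:
using System.Collections.Generic;
using System.Threading;

public class Node<T>
{
    // Readers copy this reference once and enumerate that private instance.
    private List<T> _related = new List<T>();

    public IReadOnlyList<T> Related => _related;

    // Copy-on-write update performed by the writer thread: build a new list,
    // then publish it atomically.
    public void ReplaceRelated(List<T> updated)
    {
        Interlocked.Exchange(ref _related, updated);
    }

    // Remove an item with a CompareExchange retry loop so concurrent writers
    // cannot silently overwrite each other's changes.
    public void RemoveRelated(T item)
    {
        while (true)
        {
            var current = _related;
            var copy = new List<T>(current);
            copy.Remove(item);
            if (Interlocked.CompareExchange(ref _related, copy, current) == current)
                return; // the swap succeeded and no other writer raced us
        }
    }
}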
This will not work. For instance, consider a thread trying to get an item from the collection at the same time another thread replaces the collection; this could leave the retrieved item pointing at inconsistent data. Also, while you are copying the collection, what happens if another thread wants to insert an item into it at the same time?
This will throw an exception indicating that the collection was modified.
Another thing is that you are copying the whole collection into a new list to handle it. That will certainly hurt performance, and I think a synchronization mechanism such as lock carries a smaller performance penalty and is the more appropriate approach for handling multithreading scenarios.
