iterating through ConcurrentDictionary and modifying the collection - c#

What is the correct way to iterate through a ConcurrentDictionary collection and delete some entries?
As I understand it, ConcurrentDictionary implements locking at the level of individual cells, and it is important for me to iterate through the whole collection and remove irrelevant data based on some condition. I am afraid that during my iteration another thread will add data that should have been filtered out, at a position my iterator has already passed.
So how can I achieve this with ConcurrentDictionary, or should I use an ordinary Dictionary with a lock around every access to the collection?
Thanks.

ConcurrentDictionary's main advantage IMO is that it's lock free for the reading threads, which is usually why you'd use it; if you don't need that advantage then you can just use a dictionary with a lock.
You could always ensure the filter criteria are passed to any adds that occur after your filter iteration starts.
Otherwise, if non-blocking reads are more important than the occasional inconsistent entry, you could just repeat your filter until it finds no more invalid items.
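As a rough illustration of the "repeat your filter" idea, here is a minimal sketch. The names DictionaryFilter, RemoveWhere and isInvalid are made up for this example. Note that TryRemove by key removes whatever value is currently stored under that key, even if another thread replaced it after the predicate ran.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

public static class DictionaryFilter
{
    // Removes every entry for which isInvalid returns true, and repeats the
    // pass until one full pass removes nothing. Returns the total removed.
    public static int RemoveWhere<TKey, TValue>(
        ConcurrentDictionary<TKey, TValue> dictionary,
        Func<KeyValuePair<TKey, TValue>, bool> isInvalid)
    {
        int totalRemoved = 0;
        int removedThisPass;
        do
        {
            removedThisPass = 0;
            // Enumerating a ConcurrentDictionary while it is being modified
            // is allowed; the enumeration is just not a snapshot.
            foreach (KeyValuePair<TKey, TValue> entry in dictionary)
            {
                if (isInvalid(entry) && dictionary.TryRemove(entry.Key, out _))
                {
                    removedThisPass++;
                }
            }
            totalRemoved += removedThisPass;
        } while (removedThisPass > 0);

        return totalRemoved;
    }
}
```

This still cannot stop another thread from adding an invalid entry right after the last pass finishes; it only narrows the window.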

You definitely can't guarantee that no entries will be added to a ConcurrentDictionary while you iterate over it without locking the dictionary. But if you do lock it, there is no sense in using ConcurrentDictionary; use Dictionary instead.
By the way, why is it important for you to end the iteration with no 'bad' entries left? The dictionary can be populated with some of them the moment after you release the lock. Maybe it's better not to let unwanted entries get into the dictionary in the first place?
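One way to keep unwanted entries out is to put the check in front of every add. The sketch below assumes a hypothetical wrapper called FilteredStore with string keys and int values; the types and the isAllowed predicate are placeholders, not something from the question.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class FilteredStore
{
    private readonly ConcurrentDictionary<string, int> _items =
        new ConcurrentDictionary<string, int>();
    private readonly Func<string, int, bool> _isAllowed;

    public FilteredStore(Func<string, int, bool> isAllowed)
    {
        _isAllowed = isAllowed;
    }

    // Returns false when the entry is rejected by the filter
    // or the key is already present.
    public bool TryAdd(string key, int value)
    {
        return _isAllowed(key, value) && _items.TryAdd(key, value);
    }

    public int Count
    {
        get { return _items.Count; }
    }
}
```

If the criteria can change over time, entries that were valid when added may still need an occasional sweep.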

Related

is enumerator thread safe after getting with lock?

I am wondering if the returned enumerator is thread safe:
public IEnumerator<T> GetEnumerator()
{
    lock (_sync)
    {
        return _list.GetEnumerator();
    }
}
Suppose I have multiple threads that are adding data (also within lock() blocks) to this list and one thread enumerating the contents of this list. When the enumerating thread is done, it clears the list. Will it then be safe to use the enumerator obtained from this method?
I.e., does the enumerator point to a copy of the list as of the instant it was requested, or does it always point back to the list itself, which may or may not be manipulated by another thread during its enumeration?
If the enumerator is not thread safe, then the only other course of action I can see is to create a copy of the list and return that. This is, however, not ideal, as it will generate lots of garbage (this method is called about 60 times per second).
No, not at all. This lock synchronizes only access to the _list.GetEnumerator method, whereas enumerating a list involves a lot more than that: it includes reading the IEnumerator.Current property, calling IEnumerator.MoveNext, and so on.
You either need a lock over the foreach (I assume you enumerate via foreach), or you need to make a copy of the list.
A better option is to take a look at the thread-safe collections provided out of the box.
According to the documentation, to guarantee thread safety you have to lock the collection during the entire iteration over it.
The enumerator does not have exclusive access to the collection;
therefore, enumerating through a collection is intrinsically not a
thread-safe procedure. To guarantee thread safety during enumeration,
you can lock the collection during the entire enumeration. To allow
the collection to be accessed by multiple threads for reading and
writing, you must implement your own synchronization.
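To make "lock the collection during the entire enumeration" concrete, here is a minimal sketch. LockedList and ProcessAndClear are invented names; the idea is only that the adds and the whole foreach (plus the clear the question mentions) share one lock object.

```csharp
using System;
using System.Collections.Generic;

public sealed class LockedList<T>
{
    private readonly object _sync = new object();
    private readonly List<T> _list = new List<T>();

    public void Add(T item)
    {
        lock (_sync)
        {
            _list.Add(item);
        }
    }

    // Runs the action for every item and then clears the list, all under
    // one lock, so no Add from another thread can interleave.
    // The action must not call Add on this instance: C# locks are
    // reentrant, so that would modify the list in the middle of the foreach.
    public int ProcessAndClear(Action<T> action)
    {
        lock (_sync)
        {
            foreach (T item in _list)
            {
                action(item);
            }
            int count = _list.Count;
            _list.Clear();
            return count;
        }
    }
}
```

The cost is that producers block for as long as the processing takes.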
Another option may be to define your own custom iterator and create a new instance of it for every thread, so that every thread has its own copy of the read-only Current pointer into the same collection.
If I have multiple threads whom are adding data (also within lock() blocks) to this list and one thread enumerating the contents of this
list. When the enumerating thread is done it clears the list. Will it
then be safe to use the enumerator gotten from this method.
No. See reference here: http://msdn.microsoft.com/en-us/library/system.collections.ienumerator.aspx
An enumerator remains valid as long as the collection remains
unchanged. If changes are made to the collection, such as adding,
modifying, or deleting elements, the enumerator is irrecoverably
invalidated and the next call to MoveNext or Reset throws an
InvalidOperationException. If the collection is modified between
MoveNext and Current, Current returns the element that it is set to,
even if the enumerator is already invalidated. The enumerator does not
have exclusive access to the collection; therefore, enumerating
through a collection is intrinsically not a thread-safe procedure.
Even when a collection is synchronized, other threads can still modify
the collection, which causes the enumerator to throw an exception. To
guarantee thread safety during enumeration, you can either lock the
collection during the entire enumeration or catch the exceptions
resulting from changes made by other threads.
..
does the enumerator point to a copy of the list at the instance it was asked for or does it always point back to the list itself, which
may or may not be manipulated by another thread during its
enumeration?
Depends on the collection. See the concurrent collections. ConcurrentStack, ConcurrentQueue, and ConcurrentBag all take a snapshot of the collection when GetEnumerator() is called and return elements from that snapshot, so the underlying collection may change without changing the snapshot. ConcurrentDictionary, on the other hand, doesn't take a snapshot, so changes made to the collection while you iterate may show up in the enumeration, although its enumerator stays safe to use and does not throw.
A trick I sometimes use in this case is to create a temporary collection to iterate so the original is free while I use the snapshot:
foreach (var item in items.ToList())
{
    //
}
If your list is too large and this causes GC churn, then locking is probably your best bet. If locking is too heavy, maybe consider a partial iteration each timeslice, if that is feasible.
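If other threads are adding to the list, the copy itself also has to happen under the lock, otherwise ToList() can run into the same problem. A minimal sketch of that, with SnapshotList as a hypothetical wrapper name:

```csharp
using System.Collections.Generic;

public sealed class SnapshotList<T>
{
    private readonly object _sync = new object();
    private readonly List<T> _list = new List<T>();

    public void Add(T item)
    {
        lock (_sync)
        {
            _list.Add(item);
        }
    }

    // Copies the current contents under the lock. The caller can then
    // enumerate the copy without holding any lock.
    public List<T> Snapshot()
    {
        lock (_sync)
        {
            return new List<T>(_list);
        }
    }
}
```

The lock is held only for the copy, so writers are blocked briefly, at the price of one allocation per snapshot.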
You said:
When the enumerating thread is done it clears the list.
Nothing says you have to process the whole list at once. You could instead remove a range of items, hand them to a separate enumerating thread, let that process them, and repeat. Perhaps iteration and lists aren't the best model here. Consider ConcurrentQueue, with which you could build a producer/consumer model where consumers just steadily remove items to process, without iteration.
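A minimal sketch of the consumer side of such a queue, assuming a hypothetical helper named QueueDrain. Producers would simply call Enqueue from any thread; the consumer takes items one at a time instead of enumerating and clearing.

```csharp
using System;
using System.Collections.Concurrent;

public static class QueueDrain
{
    // Dequeues and processes items until the queue is observed empty.
    // Returns how many items were processed in this call.
    public static int Drain<T>(ConcurrentQueue<T> queue, Action<T> process)
    {
        int processed = 0;
        while (queue.TryDequeue(out T item))
        {
            process(item);
            processed++;
        }
        return processed;
    }
}
```

Called about 60 times per second, this would pick up whatever was enqueued since the previous call without copying the collection.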

Should I use a lock or ToList()

To avoid the famous error:
Collection was modified; enumeration operation may not execute
when iterating an IEnumerator, should I just use ToList() as suggested in this question, or should I use lock when updating the list?
My concern is that ToList() is simply creating a new list each time, while lock will only have a small delay at some points when updating the list.
So, should I use ToList() or lock?
The error you are seeing tells you that you are trying to modify a collection while you are enumerating it. The lock keyword applies to multi-threaded situations, where it makes sure only one thread at a time is in the critical section. It has nothing to do with "locking" a collection so that you can enumerate it while also modifying it.
The error you are seeing typically occurs with code like below
foreach (var item in items)
{
    // Inside this loop you are enumerating the collection.
    // This means you can't add or delete anything from items.
    items.Add(newItem); // This will produce the error you are seeing
}
This error might also occur if you have one thread enumerating the collection while another thread is adding to or removing from it. So if you know you have a multi-threaded situation, you can probably use lock, although it is a bit tricky, may hurt your concurrency performance, and can even lead to race conditions if not applied everywhere. In a multi-threaded situation, using just ToList may not be enough either, because you need to make sure only one thread has access to the collection while you are calling ToList. So you may need to use ToList along with lock, and that may be the simpler option overall, depending on your scenario. To see an example of how to do thread-safe modify-while-enumerate, see this answer: Making a "modify-while-enumerating" collection thread-safe.
If you know you don't have multiple threads, then ToList is probably your only option and you don't need lock at all. I would also suggest checking whether you really need to modify the collection while enumerating it. In many cases (but not all), you can design things a bit differently to avoid it in the first place.
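For the single-threaded case, a minimal sketch of the ToList approach looks like this. The method name DuplicateEvens and the rule it applies are invented purely for illustration: the loop enumerates a copy, so adding to the original list inside the loop does not invalidate the enumerator.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SingleThreadedEdit
{
    // Appends a second copy of every even number to the same list.
    public static void DuplicateEvens(List<int> items)
    {
        // ToList() creates a separate copy; the foreach runs over that copy.
        foreach (int item in items.ToList())
        {
            if (item % 2 == 0)
            {
                items.Add(item); // safe: we are not enumerating 'items' itself
            }
        }
    }
}
```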

ConcurrentDictionary enumeration and locking

If I have a ConcurrentDictionary, do I need to lock it when looping thru it using foreach?
No. From the docs for ConcurrentDictionary.GetEnumerator:
The enumerator returned from the dictionary is safe to use concurrently with reads and writes to the dictionary, however it does not represent a moment-in-time snapshot of the dictionary. The contents exposed through the enumerator may contain modifications made to the dictionary after GetEnumerator was called.
As long as you're okay with that, you don't need any kind of locking.
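A small sketch of what that looks like in practice, under the assumption of one writer task and one reader loop (ConcurrentEnumerationDemo and Run are made-up names). The reader keeps enumerating with foreach while the writer adds entries, and no lock is taken anywhere.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ConcurrentEnumerationDemo
{
    // Enumerates the dictionary repeatedly while another task adds entries.
    // Returns the number of entries once the writer has finished.
    public static int Run(int writes)
    {
        var dictionary = new ConcurrentDictionary<int, int>();

        Task writer = Task.Run(() =>
        {
            for (int i = 0; i < writes; i++)
            {
                dictionary[i] = i;
            }
        });

        long checksum = 0;
        while (!writer.IsCompleted)
        {
            // No lock: each pass may see some, all, or none of the
            // entries that are added while it runs.
            foreach (KeyValuePair<int, int> entry in dictionary)
            {
                checksum += entry.Value;
            }
        }

        writer.Wait(); // rethrows if the writer task failed
        return dictionary.Count;
    }
}
```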

How should I keep 2 ConcurrentDictionaries in sync?

I like the lock-free operation of the ConcurrentDictionary and use it in two objects:
ConcurrentDictionary<DateTime, myObj> myIndexByDate
ConcurrentDictionary<myObjSummary, ConcurrentDictionary<int, myObj>> myObjectSummaryIndex
These two objects need to stay in sync. Is the only way to do this to use a lock, thus losing all the benefits of the ConcurrentDictionary?
I would create a custom class holding the 2 dictionaries and use a lock only in the methods that can change the dictionaries (Add, Delete).
You don't lose the benefits of the concurrent dictionary, as this approach requires much less code than you would need with a normal dictionary.
ConcurrentDictionary is only "thread-safe" for operations on a particular instance, e.g. a single ConcurrentDictionary.TryAdd() call is atomic with respect to other operations on that same dictionary.
This doesn't mean that, when you get a value from one dictionary and add it to another, the value still exists in the original dictionary while you're adding it to the second.
You probably have an invariant stating that while you're moving a value from one dictionary to the other, no values in the original dictionary can be removed (or at least not that value, but that's a little more difficult to guarantee with ConcurrentDictionary).
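A minimal sketch of the "custom class with a lock on the write methods" idea. DualIndex, the int/DateTime keys and the string values are placeholders rather than the types from the question. Writers are serialized by one lock, while reads go straight to the concurrent dictionaries.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class DualIndex
{
    private readonly object _writeLock = new object();
    private readonly ConcurrentDictionary<int, string> _byId =
        new ConcurrentDictionary<int, string>();
    private readonly ConcurrentDictionary<DateTime, string> _byDate =
        new ConcurrentDictionary<DateTime, string>();

    // Both indexes are updated under the same lock, so two writers
    // can never interleave their updates.
    public void Add(int id, DateTime date, string value)
    {
        lock (_writeLock)
        {
            _byId[id] = value;
            _byDate[date] = value;
        }
    }

    public bool Remove(int id, DateTime date)
    {
        lock (_writeLock)
        {
            bool removedById = _byId.TryRemove(id, out _);
            bool removedByDate = _byDate.TryRemove(date, out _);
            return removedById && removedByDate;
        }
    }

    // Reads take no lock.
    public bool TryGetById(int id, out string value)
    {
        return _byId.TryGetValue(id, out value);
    }

    public bool TryGetByDate(DateTime date, out string value)
    {
        return _byDate.TryGetValue(date, out value);
    }
}
```

A reader can still catch the brief moment where one index has been updated and the other has not; if that matters, reads need the lock too.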

List<T>: Moving from immutable to changeable structure

I currently use several List<T> instances quite heavily, and I loop over them very frequently via foreach.
Originally the List was immutable after startup. Now I have a requirement to amend the List during runtime, from one thread only (a kind of listener). I need to remove an item from the List in object A and add it to the List of object B. A and B are instances of the same class.
Unfortunately there is no synchronized List. What would you suggest I do in this case? In my case speed is more important than synchronization, so I am currently working with copies of the lists for add/remove, to avoid making the enumerators fail.
Do you have any other recommended way to deal with this?
class X
{
    List<T> Related { get; set; }
}
In several places and in different threads I am then using
foreach (var x in X.Related)
Now I need to basically perform in yet another thread
a.Related.Remove(t);
b.Related.Add(t);
To avoid potential exceptions, I am currently doing this:
List<T> aNew = new List<T>(a.Related);
aNew.Remove(t);
a.Related = aNew;
List<T> bNew = new List<T>(b.Related) { t };
b.Related = bNew;
Is this correct to avoid exceptions?
From this MSDN post: http://msdn.microsoft.com/en-us/library/6sh2ey19.aspx
"...the only way to ensure thread safety is to lock the collection during the entire enumeration. "
Consider using for loops and iterate over your collection in reverse. This way you do not have the "enumerators fail", and as you are going backwards over your collection it is consistent from the POV of the loop.
It's hard to discuss the threading aspects as there is limited detail.
Update
If your collections are small and you only have 3-4 "potential" concurrent users, I would suggest using a plain locking strategy as suggested by @Jalal, although you would need to iterate backwards, e.g.
private readonly object _syncObj = new object();

lock (_syncObj)
{
    for (int i = list.Count - 1; i >= 0; i--)
    {
        // remove from the list and add to the second one.
    }
}
You need to protect all accesses to your lists with these lock blocks.
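Filled in, the backwards loop under a lock might look roughly like this. ListMover, MoveWhere and shouldMove are hypothetical names; the sync object is assumed to be the same one every reader and writer of both lists uses.

```csharp
using System;
using System.Collections.Generic;

public static class ListMover
{
    // Moves every matching item from source to target while holding the lock.
    // Iterating backwards means RemoveAt never shifts an index that the
    // loop has yet to visit. Returns the number of items moved.
    public static int MoveWhere<T>(
        List<T> source, List<T> target, object sync, Predicate<T> shouldMove)
    {
        lock (sync)
        {
            int moved = 0;
            for (int i = source.Count - 1; i >= 0; i--)
            {
                T item = source[i];
                if (shouldMove(item))
                {
                    source.RemoveAt(i);
                    target.Add(item);
                    moved++;
                }
            }
            return moved;
        }
    }
}
```

Because of the reverse iteration, moved items arrive in the target list in reverse order.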
Your current implementation uses the COW (Copy-On-Write) strategy, which can be effective in some scenarios, but your particular implementation suffers from the fact that two or more threads take a copy, make their changes, but then could potentially overwrite the results of other threads.
Update
Further to your question comment, if you are guaranteed to only have one thread updating the collections, then your use of COW is valid, as there is no chance of multiple threads making updates and updates being lost by overwriting by multiple threads. It's a good use of the COW strategy to achieve lock free synchronization.
If you bring other threads in to update the collections, my previous locking comments stand.
My only concern would be that the other "reader" threads may have cached values for the addresses of the original lists, and may not see the new addresses when they are updated. In this case make the list variables volatile.
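A minimal sketch of single-writer copy-on-write with a volatile field. RelatedHolder is an invented class name standing in for the question's class; it assumes exactly one thread ever calls AddRelated or RemoveRelated.

```csharp
using System.Collections.Generic;

public sealed class RelatedHolder<T>
{
    // volatile so that reader threads see a newly published list reference.
    private volatile List<T> _related = new List<T>();

    // Readers grab the current list and enumerate it. A published list
    // is never mutated again, so the foreach cannot fail.
    public IReadOnlyList<T> Related
    {
        get { return _related; }
    }

    // Single writer only: copy, change the copy, publish the copy.
    public void AddRelated(T item)
    {
        var copy = new List<T>(_related) { item };
        _related = copy;
    }

    public bool RemoveRelated(T item)
    {
        var copy = new List<T>(_related);
        bool removed = copy.Remove(item);
        _related = copy;
        return removed;
    }
}
```

A reader that fetched Related before an update keeps iterating the old list, which is the intended behaviour here.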
Update
If you do go for the lock-free strategy, there is still one more pitfall: there will be a gap between setting a.Related and setting b.Related, during which your reader threads could be iterating over out-of-date collections. For example, item a could have been removed from list1 but not yet added to list2, so item a would be in neither list. You could swap the order and add to list2 before removing from list1, in which case item a would briefly be in both lists, i.e. duplicated.
If consistency is important you should use locking.
You should take a lock before you touch the lists, since you are working with multiple threads. The lock operation itself does not noticeably affect speed here: an uncontended lock takes on the order of nanoseconds (about 10 ns, depending on the machine). So:
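private readonly object _listLocker = new object();

lock (_listLocker)
{
    for (int itemIndex = 0; itemIndex < list.Count; itemIndex++)
    {
        // remove from the first list and add to the second one.
        // After removing at itemIndex, do not advance the index,
        // or the item that shifted into that slot is skipped.
    }
}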
If you are using .NET Framework 4.0, I encourage you to use ConcurrentBag instead of List.
Edit: regarding the code snippet:
List<T> aNew = new List<T>(a.Related);
This will work only if all interaction with the collection, including adding, removing, and replacing items, is managed this way. You also have to use the System.Threading.Interlocked.CompareExchange and System.Threading.Interlocked.Exchange methods to replace the existing collection with the new, modified one. If that is not the case, then copying achieves nothing.
Without that, this will not work. For instance, consider a thread trying to get an item from the collection at the same time as another thread replaces the collection: the retrieved item could come from inconsistent data. Also consider what happens if, while you are copying the collection, another thread inserts an item into it.
That will throw an exception indicating that the collection was modified.
Another point is that you are copying the whole collection to a new list in order to change it. This will certainly harm performance, and I think using a synchronization mechanism such as lock reduces the performance penalty and is the more appropriate thing to do in multithreading scenarios.
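For reference, the Interlocked-based replacement mentioned in this answer could be sketched as below. CowList is a hypothetical name, and this is only one way to write the retry loop: each writer copies the current list, changes the copy, and publishes it only if nobody else published in the meantime.

```csharp
using System.Collections.Generic;
using System.Threading;

public sealed class CowList<T>
{
    private List<T> _items = new List<T>();

    // Readers get the currently published list, which is never mutated.
    public IReadOnlyList<T> Snapshot
    {
        get { return Volatile.Read(ref _items); }
    }

    public void Add(T item)
    {
        while (true)
        {
            List<T> current = Volatile.Read(ref _items);
            var updated = new List<T>(current) { item };

            // Publish only if _items is still the list we copied from;
            // otherwise another writer won the race, so copy again.
            List<T> witnessed = Interlocked.CompareExchange(ref _items, updated, current);
            if (ReferenceEquals(witnessed, current))
            {
                return;
            }
        }
    }
}
```

Every Add copies the whole list, so this only pays off when writes are rare compared to reads.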
