Should I use a lock or ToList() - c#

To avoid the famous error:
Collection was modified; enumeration operation may not execute
when iterating an IEnumerator, should I just use ToList() as suggested in this question, or should I use lock when updating the list?
My concern is that ToList() is simply creating a new list each time, while lock will only have a small delay at some points when updating the list.
So, should I use ToList() or lock?

The error you are seeing tells you that you are trying to modify a collection while you are enumerating it. The lock keyword applies to multi-threaded situations and makes sure only one thread at a time is in a critical section. It has nothing to do with "locking" a collection so that you can enumerate it while also modifying it.
The error you are seeing typically occurs with code like the following:
foreach (var item in items)
{
    // Inside this loop you are now enumerating the collection.
    // This means you can't add or delete anything from items.
    items.Add(newItem); // This will produce the error you are seeing
}
This error might also occur if one thread is enumerating the collection while another thread is adding to or removing from it. So if you know you have a multi-threaded situation, you can probably use lock, although it is a bit tricky, may hurt your concurrency performance, and may even cause race conditions if done incorrectly. For a multi-threaded situation, using just ToList may not be enough either, because ToList itself enumerates the collection, so you need to make sure only one thread has access to the collection while you are calling ToList. So you may need to use ToList along with lock, which may be the simpler option overall depending on your scenario. For an example of thread-safe modify-while-enumerate, see this answer: Making a "modify-while-enumerating" collection thread-safe.
If you know you don't have multiple threads, then ToList is probably your simplest option and you don't need lock at all. I would also suggest checking whether you really need to modify the collection while enumerating it. In many cases (but not all), you can change the design a bit to avoid it in the first place.
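A minimal sketch of the "ToList along with lock" idea described above. The names SnapshotList, Snapshot and SnapshotDemo are made up for illustration and are not from the original answer. The copy is taken while the lock is held, and the loop then runs over the copy, so adding to the underlying list inside the loop does not invalidate the enumerator.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical wrapper: every access to the inner list goes through the same lock.
public class SnapshotList<T>
{
    private readonly object _gate = new object();
    private readonly List<T> _items = new List<T>();

    public void Add(T item)
    {
        lock (_gate) { _items.Add(item); }
    }

    // Copies the list while holding the lock; callers enumerate the copy
    // after the lock has been released.
    public List<T> Snapshot()
    {
        lock (_gate) { return _items.ToList(); }
    }
}

public static class SnapshotDemo
{
    // Adds to the list while looping over a snapshot of it.
    public static int Run()
    {
        var list = new SnapshotList<int>();
        list.Add(1);
        list.Add(2);

        foreach (var item in list.Snapshot())
        {
            list.Add(item * 10); // safe: the loop enumerates the copy, not the live list
        }

        return list.Snapshot().Count;
    }
}
```

The trade-off is the one raised in the question: each Snapshot call allocates a new list, while the lock is held only for the duration of the copy rather than for the whole loop.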

Related

iterating through ConcurrentDictionary and modifying the collection

What is the correct way to iterate through a ConcurrentDictionary and delete some entries?
As I understand it, ConcurrentDictionary implements locking at the level of individual cells, and it is important for me to iterate through the whole collection and remove irrelevant data based on some condition. I am afraid that during my iteration another thread will add data that should be filtered out, at a position my iterator has already passed.
So how can I achieve this with ConcurrentDictionary, or should I use an ordinary Dictionary with a locking mechanism on every access to the collection?
Thanks.
ConcurrentDictionary's main advantage IMO is that it's lock free for the reading threads, which is usually why you'd use it; if you don't need that advantage then you can just use a dictionary with a lock.
You could always ensure the filter criteria are passed to any adds that occur after your filter iteration starts.
Otherwise, if non-blocking reads are more important than the occasional inconsistent entry, you could just repeat your filter until it finds no more invalid items.
You definitely can't guarantee that no entries will be added to a ConcurrentDictionary while you iterate it without locking the dictionary. But if you do lock, then there is no sense in using ConcurrentDictionary; use Dictionary instead.
By the way, why is it important for you to end the iteration with no 'bad' entries left? The dictionary can be populated with any of them the moment after the lock is released. Maybe it's better not to let unwanted entries get into the dictionary at all?
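A rough sketch of the "repeat your filter until it finds no more invalid items" suggestion. DictionaryFilter and RemoveWhere are hypothetical names. It relies on the fact that enumerating a ConcurrentDictionary while other threads (or the loop itself) modify it does not throw. It does not guarantee the dictionary stays clean afterwards, since another thread can add an invalid entry right after the last pass.

```csharp
using System;
using System.Collections.Concurrent;

public static class DictionaryFilter
{
    // Removes entries whose value is invalid and returns how many were removed.
    // Passes are repeated until one full pass finds nothing left to remove.
    public static int RemoveWhere<TKey, TValue>(
        ConcurrentDictionary<TKey, TValue> map,
        Func<TValue, bool> isInvalid)
    {
        int removed = 0;
        bool foundAny;
        do
        {
            foundAny = false;
            foreach (var pair in map)
            {
                // TryRemove returns false if another thread already removed the key.
                if (isInvalid(pair.Value) && map.TryRemove(pair.Key, out _))
                {
                    removed++;
                    foundAny = true;
                }
            }
        } while (foundAny);

        return removed;
    }
}
```

Note that TryRemove works by key, so if another thread replaces the value between the check and the removal, the newer value is removed as well. Whether that matters depends on the scenario.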

Thread safe OfType

I am facing the problem that this code:
enumerable.OfType<Foo>().Any(x => x.Footastic == true);
isn't thread safe and throws an "enumeration has changed" exception.
Is there a good way to overcome this issue?
I already tried the following, but it didn't always work (the exception just seems to fire less often):
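using System.Collections;
using System.Linq;

public class Foo
{
    public bool Footastic { get; set; }

    public void DoSomeMagicWithCollection(IEnumerable enumerable)
    {
        // Only blocks other code that also locks on this same instance
        lock (enumerable)
        {
            enumerable.OfType<Foo>().Any(x => x.Footastic == true);
        }
    }
}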
If you're getting an exception that the underlying collection has changed while enumerating it, given that this code clearly doesn't mutate the collection itself, it means that another thread is mutating the collection while you're trying to iterate it.
There is no possible solution to this problem other than simply not doing that. What's happening is that the enumerator of the List (or whatever collection type that is) is throwing the exception and preventing further enumeration because it can see that the list was modified during the enumeration. There is no way for the enumerators of OfType or Any that wrap it to possibly recover from that. The underlying enumerator is refusing to give them the data from the list. They can't do anything about that.
You need to use some sort of synchronization mechanism to prevent another thread from mutating the collection while this thread is enumerating it. Your lock doesn't prevent another thread from using the collection; it simply prevents any code that locks on the same instance from running. You need to have any code that could possibly mutate the list also lock on the same object to properly synchronize them.
Another possibility would be to use a collection that is inherently designed to be accessed from multiple threads at the same time. There are several such collections in the System.Collections.Concurrent namespace. They may or may not fit your needs. They will take care of synchronizing access to their data (to a point) on their own, without you needing to explicitly lock when accessing them.
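To illustrate the point about every mutating code path locking on the same object, here is a small sketch. FooRegistry and its members are hypothetical, and Foo is reduced to the one property the question uses. The list is private, so the only ways to touch it are methods that take the same lock.

```csharp
using System.Collections.Generic;
using System.Linq;

public class Foo
{
    public bool Footastic { get; set; }
}

// Hypothetical owner of the shared list: readers and writers share one lock object.
public class FooRegistry
{
    private readonly object _sync = new object();
    private readonly List<object> _items = new List<object>();

    // Writers take the lock, so they cannot run while an enumeration is in progress.
    public void Add(object item)
    {
        lock (_sync) { _items.Add(item); }
    }

    // The whole query is evaluated inside the lock; Any() finishes enumerating
    // before the lock is released.
    public bool AnyFootastic()
    {
        lock (_sync)
        {
            return _items.OfType<Foo>().Any(x => x.Footastic);
        }
    }
}
```

Returning a lazy query such as `_items.OfType<Foo>()` from inside the lock would not be safe, because the enumeration would then happen after the lock is released.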

IProducerConsumerCollection<T>.TryAdd/.TryTake - when do they return true/false?

When I call IProducerConsumerCollection<T>.TryAdd(<T>) or IProducerConsumerCollection<T>.TryTake(out <T>) will these ever fail because another thread is using the collection?
Or is it the case that if there is space to Add or something to Take even after the other thread has finished with the collection, it will always return true?
Nothing that I can see here: http://msdn.microsoft.com/en-us/library/dd287147.aspx
While in theory the collections could reject take/add requests for any reason, the only reason I know about is Add failing because the collection has reached its capacity, and Take failing because the collection is empty.
The collections are designed from the get-go to be used from multiple threads - so if there are items left, even if two threads try to Take at the same time, they should both get an item and a return value of true.
For example, BlockingCollection<T>, which is a high-level abstraction over the interface (it doesn't implement the interface itself, though) with bounding and blocking capabilities, may throw one of the following:
ObjectDisposedException on TryAdd(T) or TryTake(out T) once the collection is disposed.
InvalidOperationException on TryAdd(T) if the collection is marked as complete for adding. Think of a situation where two producers add values to a collection, one marks the collection as complete, and then the other tries to add to it.
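The cases above can be observed with a bounded BlockingCollection<int>. This is only a single-threaded sketch, and TryAddTryTakeDemo is a made-up name. It shows TryAdd returning false when the collection is at capacity, TryTake returning false when it is empty, and TryAdd throwing after CompleteAdding.

```csharp
using System;
using System.Collections.Concurrent;

public static class TryAddTryTakeDemo
{
    // Returns: { addedFirst, addedSecond, tookFirst, tookSecond, threwAfterComplete }
    public static bool[] Run()
    {
        using (var queue = new BlockingCollection<int>(boundedCapacity: 1))
        {
            bool addedFirst = queue.TryAdd(1);   // there is room for one item
            bool addedSecond = queue.TryAdd(2);  // collection is at capacity

            bool tookFirst = queue.TryTake(out int item); // one item available
            bool tookSecond = queue.TryTake(out item);    // collection is empty

            queue.CompleteAdding();
            bool threwAfterComplete = false;
            try
            {
                queue.TryAdd(3);
            }
            catch (InvalidOperationException)
            {
                // Adding after CompleteAdding is reported as an exception, not as false.
                threwAfterComplete = true;
            }

            return new[] { addedFirst, addedSecond, tookFirst, tookSecond, threwAfterComplete };
        }
    }
}
```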

Iterating Collection In Two Threads

This question relates to both C# and Java.
If you have a collection which is not modified, and that collection reference is shared between two threads, what happens when you iterate on each thread?
ThreadA: Collection.iterator
ThreadA: Collection.moveNext
ThreadB: Collection.iterator
ThreadB: Collection.moveNext
Will ThreadB see the first element?
Is the iterator always reset when it is requested? What happens if the calls are interleaved, so that moveNext and item access alternate between the threads? Is there a danger that you don't process all elements?
It works as expected because each time you request the iterator you get a new one.
If it didn't you wouldn't be able to do foreach followed by foreach on the same collection!
By convention, an Iterator is implemented such that the traversal action never alters the collection's state. It only points to the current position in the collection, and manages the iteration logic. Therefore, if you scan the same collection by N different threads, everything should work fine.
However, note that Java's Iterator allows removal of items, and ListIterator even supports the set operation. If you want to use these actions by at least one of the threads, you will probably face concurrency problems (ConcurrentModificationException), unless the Iterator is specifically designed for such scenarios (such as with ConcurrentHashMap's iterators).
In Java (and I am pretty sure also in C#), the standard API collections typically do not have a single iterator. Each call to iterator() produces a new one, which has its own internal index or pointer, so that as long as both threads acquire their own iterator object, there should be no problem.
However, this is not guaranteed by the interface, nor is the ability of two iterators to work concurrently without problems. For custom implementations of collections, all bets are off.
In C#, yes; in Java, it seems so, but I'm not familiar with it.
About C#, see http://csharpindepth.com/Articles/Chapter6/IteratorBlockImplementation.aspx and http://csharpindepth.com/Articles/Chapter11/StreamingAndIterators.aspx
At least in C#, all of the standard collections can be enumerated simultaneously on different threads. However, enumeration on any thread will blow up if you modify the underlying collection during enumeration (as it should.) I don't believe any sane developer writing a collection class would have their enumerators mutate the collection state in a way that interferes with enumeration, but it's possible. If you're using a standard collection, however, you can safely assume this and therefore use locking strategies like Single Writer / Multiple Reader when synchronizing collection access.
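A small C# sketch of the "each request gets a new iterator" point, kept single-threaded so that the interleaving from the question is deterministic. IndependentEnumerators is a made-up name. The two enumerators stand in for ThreadA and ThreadB.

```csharp
using System.Collections.Generic;

public static class IndependentEnumerators
{
    // Returns { current element of a, current element of b } after interleaved calls.
    public static int[] FirstElements()
    {
        IEnumerable<int> shared = new List<int> { 10, 20, 30 };

        // Each GetEnumerator call returns a separate enumerator with its own position.
        using (IEnumerator<int> a = shared.GetEnumerator())
        using (IEnumerator<int> b = shared.GetEnumerator())
        {
            a.MoveNext(); // a is at 10
            a.MoveNext(); // a is at 20
            b.MoveNext(); // b starts from the beginning and is at 10

            return new[] { a.Current, b.Current };
        }
    }
}
```

Advancing one enumerator has no effect on the other, which is why the second one still sees the first element.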

Safe to get Count value from generic collection without locking the collection?

I have two threads, a producer thread that places objects into a generic List collection and a consumer thread that pulls those objects out of the same generic List. I've got the reads and writes to the collection properly synchronized using the lock keyword, and everything is working fine.
What I want to know is if it is ok to access the Count property without first locking the collection.
JaredPar refers to the Count property in his blog as a decision procedure that can lead to race conditions, like this:
if (list.Count > 0)
{
return list[0];
}
If the list has one item and that item is removed after the Count property is accessed but before the indexer, an exception will occur. I get that.
But would it be ok to use the Count property to, say, determine the initial size of a completely different collection? The MSDN documentation says that instance members are not guaranteed to be thread safe, so should I just lock the collection before accessing the Count property?
I suspect it's "safe" in terms of "it's not going to cause anything to go catastrophically wrong" - but that you may get stale data. That's because I suspect it's just held in a simple variable, and that that's likely to be the case in the future. That's not the same as a guarantee though.
Personally I'd keep it simple: if you're accessing shared mutable data, only do so in a lock (using the same lock for the same data). Lock-free programming is all very well if you've got appropriate isolation in place (so you know you've got appropriate memory barriers, and you know that you'll never be modifying it in one thread while you're reading from it in another) but it sounds like that isn't the case here.
The good news is that acquiring an uncontested lock is incredibly cheap - so I'd go for the safe route if I were you. Threading is hard enough without introducing race conditions which are likely to give no significant performance benefit but at the cost of rare and unreproducible bugs.
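A minimal sketch of the "only touch shared mutable data inside the lock" advice, assuming a hypothetical WorkQueue<T> wrapper around the shared List. Count is read under the same lock as every other access, and the check-then-read from the question becomes one locked operation so the item cannot disappear between the two steps.

```csharp
using System.Collections.Generic;

// Hypothetical wrapper: every member takes the same lock before touching the list.
public class WorkQueue<T>
{
    private readonly object _sync = new object();
    private readonly List<T> _items = new List<T>();

    public void Add(T item)
    {
        lock (_sync) { _items.Add(item); }
    }

    // The value may be stale as soon as the lock is released, which is fine
    // for something like sizing another collection.
    public int Count
    {
        get { lock (_sync) { return _items.Count; } }
    }

    // Check and read under one lock, so no removal can happen in between.
    public bool TryPeekFirst(out T first)
    {
        lock (_sync)
        {
            if (_items.Count > 0)
            {
                first = _items[0];
                return true;
            }

            first = default(T);
            return false;
        }
    }
}
```

A caller could then size a different collection with something like `new List<string>(queue.Count)`, treating the number as a hint rather than a guarantee.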
