Is this a safe use of Dictionary in a multi-threaded method? - c#

I have a collection of items (SortedPoints) that I iterate over using Parallel.ForEach. Each item will become a key in a Dictionary named Stripes. Computing the value for each item is expensive and is done in the method BuildStripes.
Parallel.ForEach(SortedPoints, point =>
    Stripes[point] = BuildStripes(point, pointToPosition)
);
I can make Stripes a ConcurrentDictionary, but I was wondering if this would work:
1) Make Stripes a regular Dictionary.
2) Iterate over all points serially and fill Stripes with mappings to an empty object.
3) Iterate over all points in parallel and replace the mapping in Stripes with the actual value returned by BuildStripes.
foreach (var point in SortedPoints)
    Stripes[point] = emptyStripe;
Parallel.ForEach(SortedPoints, point =>
    Stripes[point] = BuildStripes(point, pointToPosition)
);
Is setting the value for a key thread-safe if each thread works on a separate set of keys and each key is pre-loaded into the Dictionary serially as I outlined? I looked at the source code for Dictionary and it looks safe, but these collections are subtle beasts and parallel bugs are hard to spot.
Once the dictionary is created, I never modify it again and all accesses are reads.

Let's look at the facts. A threading error could happen if:
A new item is added? No, every key is already present before the parallel phase starts.
The dictionary is resized while an item is being added? This is not a problem, because no keys are added in parallel, so your dictionary has a fixed size.
Two threads try to set the value of the same key? That won't happen, provided your SortedPoints collection contains distinct items (does it?).
Is there any other case? I don't see one. I think you are safe to go with this method.
But just use a regular ConcurrentDictionary for readability, of course! Perhaps you can gain some performance with the plain Dictionary, but unless you benchmark it, there is no reason not to use a ConcurrentDictionary.
Some docs about what ConcurrentDictionary deals with.
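For comparison, here is a minimal sketch of the ConcurrentDictionary version recommended above. The int key and value types and the BuildStripe helper are hypothetical stand-ins for the question's point type and its expensive BuildStripes method.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class StripeBuilder
{
    // Hypothetical stand-in for the expensive BuildStripes call in the question.
    private static int BuildStripe(int point) => point * point;

    public static ConcurrentDictionary<int, int> Build(IEnumerable<int> sortedPoints)
    {
        var stripes = new ConcurrentDictionary<int, int>();

        // The indexer setter is safe to call from several threads at once,
        // so no pre-loading of keys is needed.
        Parallel.ForEach(sortedPoints, point =>
        {
            stripes[point] = BuildStripe(point);
        });

        return stripes;
    }
}
```

Calling StripeBuilder.Build(new[] { 1, 2, 3, 4 }) returns a dictionary with four entries. Once it is built, read-only access from any thread needs no further care.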

Related

Collection that lets access item by key but doesn't require duplicate checking on addition?

I'm asking for something that's a bit weird, but here is my requirement (which is all a bit computation intensive, which I couldn't find anywhere so far)..
I need a collection of <TKey, TValue> of about 30 items. But the collection is used in massively nested foreach loops that would iterate possibly almost up to a billion times, seriously. The operations on collection are trivial, something that would look like:
Dictionary<Position, Value> _cells = new Dictionary<Position, Value>();
_cells.Clear();
_cells.Add(Position.p1, v1);
_cells.Add(Position.p2, v2);
//etc
In short, nothing more than adding about 30 items and clearing the collection. The values will also be read from somewhere else at some point, and that retrieval has to be by key, so I need something along the lines of a Dictionary. Since I'm trying to squeeze every ounce out of the CPU, I'm looking for micro-optimizations as well. For one, I do not need the collection to check whether a duplicate already exists while adding (this typically makes a dictionary slower than a List<T> for addition). I know I won't be passing duplicate keys.
Since Add method would do some checks, I tried this instead:
_cells[Position.p1] = v1;
_cells[Position.p2] = v2;
//etc
But this is still about 200 ms slower for about 10k iterations than a typical List<T> implementation like this:
List<KeyValuePair<Position, Value>> _cells = new List<KeyValuePair<Position, Value>>();
_cells.Add(new KeyValuePair<Position, Value>(Position.p1, v1));
_cells.Add(new KeyValuePair<Position, Value>(Position.p2, v2));
//etc
Now that could scale to a noticeable amount of time over the full iteration. Note that in the above case I read items from the list by index (which was OK for testing purposes). A regular List<T> has many problems for us, the main one being that we cannot access an item by key.
My questions, in short, are:
Is there a custom collection class that would let me access an item by key, yet bypass the duplicate checking while adding? Any third-party open source collection would do.
Or else, please point me to a good starting point for implementing my own collection class from the IDictionary<TKey, TValue> interface.
Update:
I went by MiMo's suggestion and List was still faster. Perhaps it has got to do with overhead of creating the dictionary.
My suggestion would be to start with the source code of Dictionary<TKey, TValue> and change it to optimize for your specific situation.
You don't have to support removal of individual key/value pairs, which might help simplify the code. There also appear to be some checks on the validity of keys etc. that you could get rid of.
But this is still a few ms slower for about ten iterations than a typical List implementation like this
A few milliseconds slower for ten iterations of adding just 30 values? I don't believe that. Adding just a few values should take microscopic amounts of time, unless your hashing/equality routines are very slow. (That can be a real problem. I've seen code improved massively by tweaking the key choice to be something that's hashed quickly.)
If it's really taking milliseconds longer, I'd urge you to check your diagnostics.
But it's not surprising that it's slower in general: it's doing more work. For a list, it just needs to check whether or not it needs to grow the buffer, then write to an array element, and increment the size. That's it. No hashing, no computation of the right bucket.
Is there a custom collection class that would let access item by key, yet bypass the duplicate checking while adding?
No. The very work you're trying to avoid is what makes it quick to access by key later.
When do you need to perform a lookup by key, however? Do you often use collections without ever looking up a key? How big is the collection by the time you perform a key lookup?
Perhaps you should build a list of key/value pairs, and only convert it into a dictionary when you've finished writing and are ready to start looking up.
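A minimal sketch of that last suggestion, assuming string keys and int values in place of the question's Position and Value types (the CellIndex class name is made up for illustration). It only pays off if lookups happen far less often than the fill-and-clear cycle, which the question does not tell us.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class CellIndex
{
    // Converts the cheap write-phase list into a dictionary for the read phase.
    // Hashing is paid only when this is called, not on every List.Add.
    // ToDictionary throws ArgumentException if a key occurs twice.
    public static Dictionary<string, int> Build(List<KeyValuePair<string, int>> cells)
    {
        return cells.ToDictionary(kv => kv.Key, kv => kv.Value);
    }
}
```

The inner loops keep appending to the List; CellIndex.Build is called only at the point where lookup by key is actually needed.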

Is there List type that can be Enumerated while it is Changing?

Sometimes it is useful to enumerate a list while it is changing.
e.g.
foreach (var item in listOfEntities)
    item.Update();
// somewhere else (with someEntity contained in listOfEntities)
// an add or remove is made:
someEntity.OnUpdate += (s,e) => listOfEntities.Remove(someEntity);
This will fail if listOfEntities is a List<T>.
There are workarounds like making a copy or a simple for-loop, each with different drawbacks, but I would like to know if there is a list type in the framework (or open source) that supports this.
Look at the collections in System.Collections.Concurrent. There's no list there, but according to the documentation, an enumerator of these collections "represents a moment-in-time snapshot of the contents of the [collection]".
These collections are designed for access from multiple threads, so they will be better suited to applications like the code sample you posted.
This has nothing to do with List<T>; it is a limitation of the enumerator. If you change the state of the collection underneath the enumerator it will throw, period.
You could use a for loop, but you will then run into logical errors as you index into a collection after the number of items has changed.
It's probably a bad idea to swap items in and out of a collection while you are enumerating it in another thread. I would stick with the tried and true method of recording the items to be removed in another collection or locking the collection while it is being enumerated.
I'm not claiming this is an impossible problem to solve, I just don't know of an easy way to do it.
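The "record the items to be removed in another collection" approach could look roughly like this for the single-threaded case. The DeferredRemoval class and the updateAndCheckRemove callback are hypothetical names; the callback stands in for item.Update() and returns true when the item wants to be removed.

```csharp
using System;
using System.Collections.Generic;

public static class DeferredRemoval
{
    public static void UpdateAll<T>(List<T> items, Func<T, bool> updateAndCheckRemove)
    {
        var toRemove = new List<T>();

        // First pass: enumerate without touching the list itself.
        foreach (var item in items)
        {
            if (updateAndCheckRemove(item))
                toRemove.Add(item);
        }

        // Second pass: the enumeration is over, so removing is safe.
        foreach (var item in toRemove)
            items.Remove(item);
    }
}
```

If other threads can also modify the list, both passes would additionally have to run under a lock shared with those threads.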

I have read that it is bad practice to iterate over a HashSet. Should I be calling .ToList() on it first?

I have a collection of items called RegisteredItems. I do not care about the order of the items in RegisteredItems, only that they exist.
I perform two types of operations on RegisteredItems:
Find and return item by property.
Iterate over collection and have side-effect.
According to: When should I use the HashSet<T> type? Robert R. says,
"It's somewhat dangerous to iterate over a HashSet because doing so
imposes an order on the items in the set. That order is not really a
property of the set. You should not rely on it. If ordering of the
items in a collection is important to you, that collection isn't a
set."
There are some scenarios where my collection would contain 50-100 items. I realize this is not a large amount of items, but I was still hoping to reap the rewards of using a HashSet instead of List.
I have found myself looking at the following code and wondering what to do:
LayoutManager.Instance.RegisteredItems.ToList().ForEach( item => item.DoStuff() );
vs
foreach (var item in LayoutManager.Instance.RegisteredItems)
{
    item.DoStuff();
}
RegisteredItems used to return an IList<T>, but now it returns a HashSet. I felt that, if I was using HashSet for efficiency, it would be improper to cast it as a List. Yet, the above quote from Robert leaves me feeling uneasy about iterating over it, as well.
What's the right call in this scenario? Thanks
If you don't care about order, use a HashSet<>. The quote is about using HashSet<> being dangerous when you're worried about order. If you run this code multiple times, and the items are operated on in different order, will you care? If not, then you're fine. If yes, then don't use a HashSet<>. Arbitrarily converting to a List first doesn't really solve the problem.
And I'm not certain, but I suspect that .ToList() will iterate over the HashSet<> to do that, so, now you're walking the collection twice.
Don't prematurely optimize. If you only have 100 items, just use a HashSet<> and move on. If you start caring about order, change it to a List<> then and use it as a list everywhere.
If you really don't care about order, and you know that your set must not contain duplicates (and that is what you want), go ahead and use a HashSet.
In the quoted question, I think he's saying that if you iterate over a Set, you can easily trick yourself into thinking that the items are in a certain order. For example, it'd be easy to treat the first iterated item differently, but you aren't guaranteed that will remain the first iterated item.
As long as you keep this in mind, and consider the Set unordered, iterating over it is fine.

Double checked locking on Dictionary "ContainsKey"

My team is currently debating this issue.
The code in question is something along the lines of
if (!myDictionary.ContainsKey(key))
{
    lock (_SyncObject)
    {
        if (!myDictionary.ContainsKey(key))
        {
            myDictionary.Add(key,value);
        }
    }
}
Some of the posts I've seen say that this may be a big NO NO (when using TryGetValue). Yet members of our team say it is ok since "ContainsKey" does not iterate on the key collection but checks if the key is contained via the hash code in O(1). Hence they claim there is no danger here.
I would like to get your honest opinions regarding this issue.
Don't do this. It's not safe.
You could be calling ContainsKey from one thread while another thread calls Add. That's simply not supported by Dictionary<TKey, TValue>. If Add needs to reallocate buckets etc, I can imagine you could get some very strange results, or an exception. It may have been written in such a way that you don't see any nasty effects, but I wouldn't like to rely on it.
It's one thing using double-checked locking for simple reads/writes to a field, although I'd still argue against it - it's another to make calls to an API which has been explicitly described as not being safe for multiple concurrent calls.
If you're on .NET 4, ConcurrentDictionary is probably the way forward. Otherwise, just lock on every access.
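As a rough sketch of the ConcurrentDictionary route, the whole check-lock-check-add pattern collapses into one GetOrAdd call. The ValueCache class, the string/int types, and the factory parameter are assumptions made for illustration.

```csharp
using System;
using System.Collections.Concurrent;

public static class ValueCache
{
    private static readonly ConcurrentDictionary<string, int> _values =
        new ConcurrentDictionary<string, int>();

    // Returns the existing value for the key, or adds the one produced by the factory.
    // Under contention the factory may run more than once, but only one
    // result is stored and every caller gets that stored value.
    public static int GetOrCreate(string key, Func<string, int> factory)
    {
        return _values.GetOrAdd(key, factory);
    }
}
```

If the factory has side effects or is very expensive, the "may run more than once" behaviour matters; in that case locking on every access is the simpler choice.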
If you are in a multithreaded environment, you may prefer to look at using a ConcurrentDictionary. I blogged about it a couple of months ago, you might find the article useful: http://colinmackay.co.uk/blog/2011/03/24/parallelisation-in-net-4-0-the-concurrent-dictionary/
This code is incorrect. The Dictionary<TKey, TValue> type does not support simultaneous read and write operations. Even though your Add call is made within the lock, the first ContainsKey call is not. Hence it easily allows a violation of the simultaneous read/write rule and can lead to corruption of your dictionary instance.
It doesn't look thread-safe, but it would probably be hard to make it fail.
The iteration vs hash lookup argument doesn't hold, there could be a hash-collision for instance.
If this dictionary is rarely written and often read, then I often employ safe double locking by replacing the entire dictionary on write. This is particularly effective if you can batch writes together to make them less frequent.
For example, this is a cut down version of a method we use that tries to get a schema object associated with a type, and if it can't, then it goes ahead and creates schema objects for all the types it finds in the same assembly as the specified type to minimize the number of times the entire dictionary has to be copied:
public static Schema GetSchema(Type type)
{
    if (_schemaLookup.TryGetValue(type, out Schema schema))
        return schema;
    lock (_syncRoot) {
        if (_schemaLookup.TryGetValue(type, out schema))
            return schema;
        var newLookup = new Dictionary<Type, Schema>(_schemaLookup);
        foreach (var t in type.Assembly.GetTypes()) {
            var newSchema = new Schema(t);
            newLookup.Add(t, newSchema);
        }
        _schemaLookup = newLookup;
        return _schemaLookup[type];
    }
}
So the dictionary in this case will be rebuilt, at most, as many times as there are assemblies with types that need schemas. For the rest of the application lifetime the dictionary accesses will be lock-free. The dictionary copy becomes a one-time initialization cost per assembly. The dictionary swap is thread-safe because reference writes are atomic, so the whole reference gets switched at once and readers see either the old dictionary or the new one, never a half-built one.
You can apply similar principles in other situations as well.

Threadsafe foreach enumeration of lists

I need to enumerate through a generic IList<> of objects. The contents of the list may change, with items being added or removed by other threads, and this will kill my enumeration with a "Collection was modified; enumeration operation may not execute."
What is a good way of doing a thread-safe foreach on an IList<>, preferably without cloning the entire list? It is not possible to clone the actual objects referenced by the list.
Cloning the list is the easiest and best way, because it ensures your list won't change out from under you. If the list is simply too large to clone, consider putting a lock around it that must be taken before reading/writing to it.
There is no such operation. The best you can do is
lock (collection) {
    foreach (object o in collection) {
        ...
    }
}
Your problem is that an enumeration does not allow the IList to change. This means you have to avoid this while going through the list.
A few possibilities come to mind:
Clone the list. Now each enumerator has its own copy to work on.
Serialize the access to the list. Use a lock to make sure no other thread can modify it while it is being enumerated.
Alternatively, you could write your own implementation of IList and IEnumerator that allows the kind of parallel access you need. However, I'm afraid this won't be simple.
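The first two possibilities can be combined: serialize all access through one lock and enumerate a copy taken under that lock. A minimal sketch, with GuardedList as a hypothetical wrapper name:

```csharp
using System.Collections.Generic;

public sealed class GuardedList<T>
{
    private readonly object _gate = new object();
    private readonly List<T> _items = new List<T>();

    public void Add(T item)
    {
        lock (_gate) _items.Add(item);
    }

    public bool Remove(T item)
    {
        lock (_gate) return _items.Remove(item);
    }

    // Copies only the references, not the objects they point to,
    // so this works even when the objects themselves cannot be cloned.
    public T[] Snapshot()
    {
        lock (_gate) return _items.ToArray();
    }
}
```

A foreach over list.Snapshot() cannot throw "Collection was modified", because the array it walks is never changed. The trade-off is that the loop may see items that another thread has removed in the meantime.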
ICollection MyCollection;
// Instantiate and populate the collection
lock (MyCollection.SyncRoot) {
    // Some operation on the collection, which is now thread safe.
}
From MSDN
You'll find that's a very interesting topic.
The best approach relies on the ReadWriteResourceLock, which used to have big performance issues due to the so-called Convoy Problem.
The best article I've found treating the subject is this one by Jeffrey Richter which exposes its own method for a high performance solution.
So the requirements are: you need to enumerate through an IList<> without making a copy while simultaneously adding and removing elements.
Could you clarify a few things? Are insertions and deletions happening only at the beginning or end of the list?
If modifications can occur at any point in the list, how should the enumeration behave when elements are removed or added near or on the location of the enumeration's current element?
This is certainly doable by creating a custom IEnumerable object with perhaps an integer index, but only if you can control all access to your IList<> object (for locking and maintaining the state of your enumeration). But multithreaded programming is a tricky business under the best of circumstances, and this is a complex problem.
Foreach depends on the fact that the collection will not change. If you want to iterate over a collection that can change, use the normal for construct and be prepared for nondeterministic behavior. Locking might be a better idea, depending on what you're doing.
Default behavior for a simple indexed data structure like a linked list, B-tree, or hash table is to enumerate in order from the first element to the last. It would not cause a problem to insert an element into the data structure after the iterator had already passed that point, or to insert one that the iterator would enumerate once it arrived there, and such an event could be detected by the application and handled if the application required it. Detecting a change in the collection and throwing an error during enumeration was, I can only imagine, someone's (bad) idea of what they thought the programmer would want. Indeed, Microsoft has fixed their collections to work correctly. They have called their shiny new unbroken collections ConcurrentCollections (System.Collections.Concurrent) in .NET 4.0.
I recently spent some time multi-threading a large application and had a lot of issues with foreach operating on lists of objects shared across threads.
In many cases you can use the good old for loop and immediately assign the object to a local copy to use inside the loop. Just keep in mind that all threads writing to the objects in your list should write to different data in those objects. Otherwise, use a lock or a copy as the other contributors suggest.
Example:
foreach (var p in Points)
{
    // work with p...
}
Can be replaced by:
for (int i = 0; i < Points.Count; i++)
{
    Point p = Points[i];
    // work with p...
}
Wrap the list in a locking object for reading and writing. You can even iterate with multiple readers at once if you have a suitable lock, that allows multiple concurrent readers but also a single writer (when there are no readers).
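One way to sketch such a wrapper is with ReaderWriterLockSlim from System.Threading, which allows many concurrent readers or a single writer. The ReadWriteList class name and its two methods are made up for illustration; a real wrapper would expose more of IList<T>.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class ReadWriteList<T>
{
    private readonly ReaderWriterLockSlim _lock = new ReaderWriterLockSlim();
    private readonly List<T> _items = new List<T>();

    public void Add(T item)
    {
        // Blocks until no reader or other writer holds the lock.
        _lock.EnterWriteLock();
        try { _items.Add(item); }
        finally { _lock.ExitWriteLock(); }
    }

    public void ForEach(Action<T> action)
    {
        // Several threads may hold the read lock and enumerate at the same time.
        _lock.EnterReadLock();
        try
        {
            foreach (var item in _items)
                action(item);
        }
        finally { _lock.ExitReadLock(); }
    }
}
```

Note that the action passed to ForEach must not call Add on the same list: with the default no-recursion policy, taking the write lock while holding the read lock throws a LockRecursionException.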
This is something that I've recently had to deal with and to me it really depends on what you're doing with the list.
If you need to use the list at a point in time (given the number of elements currently in it) AND another thread can only ADD to the end of the list, then maybe you just switch out to a FOR loop with a counter. At the point you grab the counter, you're only seeing X numbers of elements in the list. You can walk through the list (while others are adding to the end of it) . . . should not cause a problem.
Now, if the list needs to have items taken OUT of it by other threads, or CLEARED by other threads, then you'll need to implement one of the locking mechanisms mentioned above. Also, you may want to look at some of the newer "concurrent" collection classes (though I don't believe they implement IList, so you may need to refactor to use a dictionary).
