Multi-threading list pattern advice


I have made an application that also contains a folder/file scanner. I'm running into a problem with the threading structure.
How it works:
For each folder/file it finds, it starts a thread. Inside each thread, a function checks a shared list to see whether a similar item has already been found, so that it can add to the existing item. If no similar item is found, it adds the item to that list. The threads run in parallel (asynchronously).
Problem:
Because it's asynchronous, the list check sometimes fails. This happens because there is a window between the check and the add: the check can report that there is no similar item while there certainly is one (another thread is in the middle of adding it). The result is the same item appearing in the list twice.
I have also tried making the threads wait on each other. I really like the effect this has on the front end (items are added to the list nicely, in real time), but it takes way too long for a large number of folders/files.
Now I'm thinking of mixing the two approaches: I would really like to combine the speed of the async threads with the safety of waiting on each thread.
Does anybody have an idea?

You should lock the entire section of code that checks the list and adds a value.
Something like this:
private void YourThreadMethod(object state)
{
    // long-running operation (outside the lock)
    lock (dictionary)
    {
        if (!dictionary.ContainsKey(yourItemKey))
        {
            // construct object, possibly a long-running operation
            dictionary.Add(yourItemKey, createdObject);
        }
    }
}
This way, every thread has to wait until the list is free to use. If you want a more advanced solution, look into the ReaderWriterLockSlim class, which gives you finer-grained control.
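A minimal runnable sketch of the locking approach applied to the scanner in the question. It assumes (hypothetically) that "similar" means "same file name" and that "adding to the existing item" means appending the path to that item's list; FileGroups, AddPath and CountFor are illustrative names. Any slow per-file work stays outside the lock, and only the check-and-add is serialized.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical container: groups scanned paths by file name.
public sealed class FileGroups
{
    private readonly Dictionary<string, List<string>> _groups =
        new Dictionary<string, List<string>>(StringComparer.OrdinalIgnoreCase);
    private readonly object _sync = new object();

    public void AddPath(string path)
    {
        // Slow per-file work (hashing, reading metadata, ...) belongs here, outside the lock.
        string key = Path.GetFileName(path);

        lock (_sync)
        {
            // Check and add happen under the same lock, so no other thread
            // can insert the same key between the two steps.
            if (!_groups.TryGetValue(key, out List<string> paths))
            {
                paths = new List<string>();
                _groups.Add(key, paths);
            }
            paths.Add(path);
        }
    }

    public int GroupCount
    {
        get { lock (_sync) { return _groups.Count; } }
    }

    public int CountFor(string key)
    {
        lock (_sync)
        {
            return _groups.TryGetValue(key, out List<string> paths) ? paths.Count : 0;
        }
    }
}
```

Because AddPath is thread-safe, it can be called from something like Parallel.ForEach(paths, groups.AddPath), which reuses thread-pool threads instead of starting one thread per file.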

The sleekest approach is to use a ConcurrentDictionary<string, byte> when yourItemKey is of type string (otherwise adapt TKey and use a proper IEqualityComparer<T> or implement IEquatable<T>):
private readonly ConcurrentDictionary<string, byte> _list = new ConcurrentDictionary<string, byte>();
private void Foo(object state)
{
    // looong operation
    this._list.TryAdd(yourItemKey, 0);
}
public void Bar()
{
    // this is how to query the content
    this._list.Keys...;
}
The trick is not to use too complex an object as the key (for example, one that needs disposal or holds external references; I'd prefer a string representation), and to use a small type for the value, which just acts as a marker.
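A sketch of the same idea as a small reusable class, assuming string keys; ConcurrentStringSet is a hypothetical name. The bool returned by TryAdd tells each caller whether it was the one that inserted the key, which is exactly the "new item vs. existing item" decision the question needs, made atomically.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;

// A thread-safe "set" built on ConcurrentDictionary; the byte value is only a marker.
public sealed class ConcurrentStringSet
{
    private readonly ConcurrentDictionary<string, byte> _items =
        new ConcurrentDictionary<string, byte>();

    // Returns true only for the one caller that actually inserted the key.
    public bool Add(string key) => _items.TryAdd(key, 0);

    public bool Contains(string key) => _items.ContainsKey(key);

    // Keys returns a snapshot copy, so this is safe while other threads keep adding.
    public IReadOnlyList<string> Snapshot() => _items.Keys.ToList();
}
```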

I would consider using one of the thread-safe collections in C#. If you only need to collect results, something like a ConcurrentBag<T> will be more efficient than using a lock, although it does not prevent duplicates on its own.
If you need to check for an existing item before adding (i.e. there is a delay between checking and adding), use ConcurrentDictionary. Its TryAdd method returns false if an item with the same key is already in the dictionary.
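If "adding to the existing item" means updating some value rather than just rejecting the duplicate, AddOrUpdate does the check and the update in one call. A minimal sketch, assuming the per-item data is just an occurrence count (ItemCounter and Record are illustrative names):

```csharp
using System.Collections.Concurrent;

public static class ItemCounter
{
    // Adds the key with count 1, or atomically increments the existing count.
    // The update delegate may run more than once under contention, so it only
    // computes a new value from the old one and has no side effects.
    public static int Record(ConcurrentDictionary<string, int> counts, string key) =>
        counts.AddOrUpdate(key, 1, (_, current) => current + 1);
}
```

Because the update delegate can be invoked more than once for the same key before one result wins, it should stay a pure function of the old value; mutating a shared List<T> inside it would not be safe.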

Related

Standard solution for an asynchronous call interfering with a collection during a loop?

I'm working on a program in C# that uses asynchronous network calls. Some of these modify a collection that the rest of the program loops through from time to time. The problem is that when an asynchronous call tries to modify the collection during the loop, an exception is thrown and the program crashes.
I'm fairly new to this type of programming and this seems like a common problem that should have a standard solution.
What I tried was to set up a bool that I switch off before looping through the collection and check in the asynchronous method. If it is off, I modify a different collection and later apply those changes to the original in the rest of the program. The problem is that the asynchronous method also has a loop over this collection, so when it is called in quick succession, it can interfere with that loop.
I figure a better solution would be to set up the code so that, while the bool denoting that the collection is not safe to modify is true, any of these calls are delayed. However, I don't know if this is a good enough solution (since I figure the asynchronous call could begin first, and then the program could get to the unsafe part).
This is not the actual program, but an example of the problem for clarity:
private List<Stuff> myList = new List<Stuff>();
private void NetworkDataReceived(Stuff s)
{
    myList.Add(s);
}
private void SomeOtherMethod()
{
    foreach (Stuff s in myList)
    {
        DoSomethingWithStuff(s);
    }
}
In the example above, when NetworkDataReceived is called while SomeOtherMethod is running, the program crashes with the following InvalidOperationException: "Collection was modified; enumeration operation may not execute."
I'd appreciate it if somebody who has experience in this kind of programming in C# could give me a pointer to how to resolve this issue.

List<T> is not thread-safe, so you cannot have multiple threads reading and writing to it at the same time.
You can use one of the thread-safe collections provided by the .NET Framework instead. One example of such a collection is the ConcurrentQueue<T> class.
Quoting from the documentation:
Multiple threads can safely and efficiently add or remove items from these collections, without requiring additional synchronization in user code.
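A sketch of the question's example rewritten on top of ConcurrentQueue<T>; Stuff is reduced to a placeholder with an Id, Receiver is a hypothetical wrapper, and the loop body stands in for DoSomethingWithStuff. Enumerating a ConcurrentQueue<T> walks a moment-in-time snapshot, so the foreach no longer throws when NetworkDataReceived runs concurrently; items enqueued during the loop simply show up on the next pass.

```csharp
using System.Collections.Concurrent;

// Hypothetical stand-in for the question's Stuff type.
public sealed class Stuff
{
    public int Id { get; }
    public Stuff(int id) => Id = id;
}

public sealed class Receiver
{
    private readonly ConcurrentQueue<Stuff> _items = new ConcurrentQueue<Stuff>();

    // Called from the network thread(s).
    public void NetworkDataReceived(Stuff s) => _items.Enqueue(s);

    // Safe while NetworkDataReceived runs on another thread: the enumerator
    // works on a snapshot instead of throwing InvalidOperationException.
    public int SomeOtherMethod()
    {
        int seen = 0;
        foreach (Stuff s in _items)
        {
            seen++; // stand-in for DoSomethingWithStuff(s)
        }
        return seen;
    }
}
```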

Lock and refresh implementation

I have a key-to-task mapping and I need to run the task only if the task for the given key is not already running. Pseudo code follows. I believe there is a lot of scope for improvement. I'm locking on the map and hence almost serializing access to CacheFreshener. Is there a better way of doing this? When I'm trying to lock key k1, there is no point in a cache freshener call for key k2 waiting for the lock.
class CacheFreshener
{
    private ConcurrentDictionary<string, bool> lockMap;
    public void RefreshData(string key, Func<string, bool> cacheMissAction)
    {
        lock (lockMap)
        {
            if (lockMap.ContainsKey(key))
            {
                // no-op
                return;
            }
            else
            {
                lockMap.Add(key, true);
            }
        }
        // if you are here, the task is not already present
        cacheMissAction(key);
        lock (lockMap) // Do we need to lock here??
        {
            lockMap.Remove(key);
        }
    }
}

As requested, here is a more detailed explanation of what I was getting at in my comments…
The basic issue here seems to be concurrency, i.e. two or more threads accessing the same object at the same time. This is the scenario ConcurrentDictionary is designed for. If you use the IDictionary methods ContainsKey() and Add() separately, then you need explicit synchronization (but only for that operation; in this particular scenario it wouldn't strictly be needed when calling Remove()) to ensure they are performed as a single atomic operation. But the ConcurrentDictionary class anticipates this need and includes the TryAdd() method to accomplish the same thing without explicit synchronization.
<aside>
It is not entirely clear to me what the intent behind the code example is. The code appears to store an object in the "cache" only for the duration of the invocation of the cacheMissAction delegate; the key is removed immediately afterward. So it is not really caching anything per se. It just prevents more than one thread from being in the process of invoking cacheMissAction at a time (subsequent threads will skip invoking it, but also cannot count on it having completed by the time their call to the RefreshData() method returns).
</aside>
But taking the code example as given, it's clear that no explicit locking is actually required. The ConcurrentDictionary class already provides thread-safe access (i.e. non-corruption of the data structure when used concurrently from multiple threads), and it provides the TryAdd() method as a mechanism for adding a key (and its value, though here that's just always a bool literal of true) to the dictionary that will ensure that only one thread ever has a key in the dictionary at a time.
So we can rewrite the code to look like this instead and accomplish the same goal:
private readonly ConcurrentDictionary<string, bool> lockMap = new ConcurrentDictionary<string, bool>();
public void RefreshData(string key, Func<string, bool> cacheMissAction)
{
    if (!lockMap.TryAdd(key, true))
    {
        return;
    }
    // if you are here, the task was not already present
    cacheMissAction(key);
    lockMap.TryRemove(key, out _);
}
No lock statement is needed for either the add or remove, as the TryAdd() handles the entire "check for key and add if not present" operation atomically.
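One gap in this version: if cacheMissAction throws, the key is never removed, and that key can never be refreshed again. A sketch that releases the key in a finally block; having RefreshData return a bool to report whether this call did the refresh is an addition for illustration, not part of the original code.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class CacheFreshener
{
    private readonly ConcurrentDictionary<string, bool> _inProgress =
        new ConcurrentDictionary<string, bool>();

    // Returns true if this call ran cacheMissAction, false if another thread
    // was already refreshing the same key.
    public bool RefreshData(string key, Func<string, bool> cacheMissAction)
    {
        if (!_inProgress.TryAdd(key, true))
        {
            return false;
        }

        try
        {
            cacheMissAction(key);
        }
        finally
        {
            // Always release the key, even if cacheMissAction throws;
            // otherwise the key would stay "in progress" forever.
            _inProgress.TryRemove(key, out _);
        }
        return true;
    }
}
```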
I will note that using a dictionary to do the job of a set could be considered inefficient. If the collection is not likely to be large, it's no big deal, but I do find it odd that Microsoft repeated the mistake from the pre-generics days, when you had to use the non-generic dictionary class Hashtable to store a set until HashSet<T> came along. Now we have all these easy-to-use classes in System.Collections.Concurrent, but no thread-safe implementation of ISet<T> among them. Sigh…
That said, if you do prefer a somewhat more efficient approach in terms of storage (this is not necessarily a faster implementation, depending on the concurrent access patterns of the object), something like this would work as an alternative:
private readonly HashSet<string> lockSet = new HashSet<string>();
private readonly object _lock = new object();
public void RefreshData(string key, Func<string, bool> cacheMissAction)
{
    lock (_lock)
    {
        if (!lockSet.Add(key))
        {
            return;
        }
    }
    // if you are here, the task was not already present
    cacheMissAction(key);
    lock (_lock)
    {
        lockSet.Remove(key);
    }
}
In this case, you do need the lock statement, because the HashSet<T> class is not inherently thread-safe. This is of course very similar to your original implementation, just using the more set-like semantics of HashSet<T> instead.

Cross-Thread access of a field in C#

Suppose a class has an array (it doesn't really matter of what). One thread is adding data to that array, while another thread needs to process the data that is already in it. With my limited knowledge of multithreading, how could this work? The first problem I can think of is an item being added while the other thread is processing what's already there. At first I thought that wouldn't be a problem, since the processor thread would pick it up the next time it ran, but then I realized that while the processor thread removes the items it has already processed, the adding thread would not see this change, possibly (?) wreaking havoc. Is there a good way to implement this behavior?

What you've described is basically the readers-writers problem. If you want to handle multithreading safely, you're going to need either a concurrent collection or a lock. The simplest implementation of a lock is just locking on an object:
private readonly object myLock = new object();
public MyClass ReadFromSharedArray()
{
    lock (myLock)
    {
        // do whatever here
    }
}
public void WriteToSharedArray(MyClass data)
{
    lock (myLock)
    {
        // do whatever here
    }
}
There are better locks, such as ReaderWriterLockSlim, but this sort of basic implementation should be a good starting point.
Also, you mentioned adding to and removing from arrays; I'm assuming you meant Lists (or, better yet, a Queue). There's a ConcurrentQueue<T>, which could be a good replacement.
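A minimal producer/consumer sketch for the add-while-processing scenario, using ConcurrentQueue<T>; QueueDemo and Run are illustrative names, and summing integers stands in for real processing. TryDequeue removes each item exactly once, so the consumer can remove processed items while the producer keeps adding, with no extra locking.

```csharp
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class QueueDemo
{
    public static long Run(int itemCount)
    {
        var queue = new ConcurrentQueue<int>();
        long sum = 0;

        // Producer: adds items on a thread-pool thread.
        Task producer = Task.Run(() =>
        {
            for (int i = 1; i <= itemCount; i++)
            {
                queue.Enqueue(i);
            }
        });

        // Consumer (here: the calling thread). Keep going until the producer
        // has finished AND everything it added has been processed.
        while (!producer.IsCompleted || !queue.IsEmpty)
        {
            if (queue.TryDequeue(out int item))
            {
                sum += item; // stand-in for "process the item"
            }
            else
            {
                Thread.Yield(); // nothing available yet
            }
        }

        producer.Wait(); // surfaces any exception thrown by the producer
        return sum;
    }
}
```

If the consumer should block rather than spin while the queue is empty, BlockingCollection<T> (which uses a ConcurrentQueue<T> by default) with CompleteAdding() and GetConsumingEnumerable() is a common alternative.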

C# Is this method thread safe?

Consider the following code:
Dictionary<string, string> list = new Dictionary<string, string>();
object lockObj = new object();
public void MyMethod(string a) {
    if (list.Contains(a))
        return;
    lock (lockObj) {
        list.Add(a, "someothervalue");
    }
}
Assuming I'm calling MyMethod("mystring") from different threads concurrently.
Would it be possible for more than one thread (let's say two) to reach the if (!list.Contains(a)) statement at the same time (a few CPU cycles apart), with both threads evaluating it as false, so that one thread enters the critical region while the other waits outside, and then the second thread enters and adds "mystring" to the list again after the first thread exits, resulting in the dictionary trying to add a duplicate key?

No, it's not thread-safe. You need the lock around the list.Contains too, as it is possible for a thread to be switched out and back in again between the if test and adding the data. Another thread may have added data in the meantime.

You need to lock the entire operation (check and add) or multiple threads may attempt to add the same value.
I would recommend using ConcurrentDictionary<TKey, TValue>, since it is designed to be thread-safe.
private readonly ConcurrentDictionary<string, string> _items =
    new ConcurrentDictionary<string, string>();
public void MyMethod(string item, string value)
{
    _items.AddOrUpdate(item, value, (i, v) => value);
}

You need to lock around the whole statement. The way your code is now, you can run into issues on the .Contains portion.

You should check the list again after locking, e.g.:
public void MyMethod(string a) {
    if (list.ContainsKey(a))
        return;
    lock (lockObj) {
        if (list.ContainsKey(a))
            return;
        list.Add(a, "someothervalue");
    }
}

private readonly Dictionary<string, string> list = new Dictionary<string, string>();
public void MyMethod(string a) {
    lock (list) {
        if (list.ContainsKey(a))
            return;
        list.Add(a, "someothervalue");
    }
}
Check out this guide to locking; it's good.
A few guidelines to bear in mind:
Generally lock around a private static object when locking on multiple writeable values
Do not lock on things with scope outside the class or local method such as lock(this), which could lead to deadlocks!
You may lock on the object being changed if it is the only concurrently accessed object
Ensure the object you lock is not null!
You can only lock on reference types

I am going to assume that you meant to write ContainsKey instead of Contains. Contains on a Dictionary is explicitly implemented, so it is not accessible via the type you declared.1
Your code is not safe. The reason is that nothing prevents ContainsKey and Add from executing at the same time. There are actually some quite remarkable failure scenarios that this would introduce. Having looked at how Dictionary is implemented, I can see that your code could cause a situation where the data structure contains duplicates. And I mean it literally contains duplicates; the exception will not necessarily be thrown. The other failure scenarios just keep getting stranger and stranger, but I will not go into those here.
One trivial modification to your code might involve a variation of the double-checked locking pattern.
public void MyMethod(string a)
{
    if (!dictionary.ContainsKey(a))
    {
        lock (dictionary)
        {
            if (!dictionary.ContainsKey(a))
            {
                dictionary.Add(a, "someothervalue");
            }
        }
    }
}
This, of course, is not any safer for the reason I already stated. Actually, the double-checked locking pattern is notoriously difficult to get right in all but the simplest cases (like the canonical implementation of a singleton). There are many variations on this theme. You can try it with TryGetValue or the default indexer, but ultimately all of these variations are just dead wrong.
So how could this be done correctly without taking a lock? You could try ConcurrentDictionary. It has the method GetOrAdd which is really useful in these scenarios. Your code would look like this.
public void MyMethod(string a)
{
    // The variable 'dictionary' is a ConcurrentDictionary.
    dictionary.GetOrAdd(a, "someothervalue");
}
That is all there is to it. The GetOrAdd function will check to see if the item exists. If it does not then it will be added. Otherwise, it will leave the data structure alone. This is all done in a thread-safe manner. In most cases the ConcurrentDictionary does this without waiting on a lock.2
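A small sketch showing the "first one wins" behavior: many threads call GetOrAdd with the same key but different values, only one value is stored, and every caller gets that stored value back (GetOrAddDemo and Race are illustrative names).

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class GetOrAddDemo
{
    public static (int Count, bool AllSawSameValue) Race()
    {
        var dictionary = new ConcurrentDictionary<string, string>();
        var results = new ConcurrentBag<string>();

        // 100 parallel callers race to add "mystring" with different values.
        Parallel.For(0, 100, i =>
        {
            results.Add(dictionary.GetOrAdd("mystring", "value" + i));
        });

        // Every caller should have received the single value that was stored.
        string stored = dictionary["mystring"];
        bool allSame = true;
        foreach (string r in results)
        {
            if (r != stored) allSame = false;
        }
        return (dictionary.Count, allSame);
    }
}
```

With the overload that takes a value factory, GetOrAdd(key, k => ...), the factory may run more than once when threads race, even though only one result is stored, so it should not have side effects.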
1By the way, your variable name is obnoxious too. If it were not for Servy's comment I may have missed the fact that we were talking about a Dictionary as opposed to a List. In fact, based on the Contains call I first thought we were talking about a List.
2On the ConcurrentDictionary, readers are completely lock-free. However, writers (adds, updates and removes) take a lock. This includes the GetOrAdd function. The difference is that the data structure maintains several locks, each guarding a subset of the buckets, so in most cases there is little or no lock contention. That is why this data structure is said to be "low lock" or "concurrent" as opposed to "lock free".

You can first do a non-locking check, but if you want to be thread-safe you need to repeat the check again within the lock. This way you don't lock unless you have to and ensure thread safety.
Dictionary<string, string> list = new Dictionary<string, string>();
object lockObj = new object();
public void MyMethod(string a) {
    if (list.ContainsKey(a))
        return;
    lock (lockObj) {
        if (!list.ContainsKey(a)) {
            list.Add(a, "someothervalue");
        }
    }
}

C# working with singleton from two different threads

I am using the singleton pattern in a WPF app, but I have doubts about how to make it work with multiple threads.
I have a class called Monitor which maintains a list of "settings" to watch for different "devices". Outline shown below.
On my main thread I am calling
Monitor.getMonitor().register(watchlist) or Monitor.getMonitor().unregister(...) depending on the user input, and I have a DispatcherTimer running every 200 ms that calls
Monitor.getMonitor().update()
public class Monitor
{
    private Hashtable Master; // key = device, value = list of settings to watch
    private static Monitor instance = new Monitor();
    private Monitor() {}
    public static Monitor getMonitor()
    {
        return instance;
    }
    public void register(watchlist){...}
    public void unregister(...){...}
    public void update(){...}
}
register()/unregister() perform add/remove on the hashtable.
update() only reads from the hashtable.
Depending on the number of devices and settings, update() iterates over the hashtable and its contents, getting the latest values.
The main thread may be calling register and unregister quite often, and I want the GUI to stay responsive. What's a good way to do this?
Do I lock the hashtable around add/remove and the iteration, or just surround the iteration in update with a try/catch (i.e. fail gracefully) to catch any weird state the hashtable might get into (no locking), or is there a better way to do this? (If update fails, no problem; it's going to run again in 200 ms anyway.)
I'm not very sure about what is going on, because the code as-is hasn't really shown any problems, which itself makes me a bit uneasy because it just seems wrong. Thanks for any suggestions...

See my article on singleton implementations to make the singleton fetching itself threadsafe.
Yes, you'll need to lock when you modify or iterate over the hashtable. You could use a ReaderWriterLock (or preferably ReaderWriterLockSlim in .NET 3.5) to allow multiple readers at a time. If you need to do a lot of work while you're iterating, you could always lock, take a copy, unlock, and then work on the copy, so long as the work doesn't mind the copy being slightly stale.
(If you're using .NET 2.0+, I'd suggest using the generic collections such as Dictionary<TKey, TValue> instead of Hashtable. I'd also suggest you rename your methods in line with .NET conventions. That code's got a distinct Java accent at the moment ;)
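A sketch combining the suggestions above: a generic Dictionary instead of Hashtable, .NET-style names, a ReaderWriterLockSlim, and copying the data under the read lock so the slow polling happens on the copy. It assumes devices and settings can be identified by strings, and the class is renamed DeviceMonitor to avoid confusion with System.Threading.Monitor; all names here are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading;

public sealed class DeviceMonitor
{
    public static DeviceMonitor Instance { get; } = new DeviceMonitor();

    // key = device, value = settings to watch (assumed to be strings)
    private readonly Dictionary<string, List<string>> _watched =
        new Dictionary<string, List<string>>();
    private readonly ReaderWriterLockSlim _rwLock = new ReaderWriterLockSlim();

    private DeviceMonitor() { }

    public void Register(string device, IEnumerable<string> settings)
    {
        _rwLock.EnterWriteLock();
        try
        {
            if (!_watched.TryGetValue(device, out List<string> list))
            {
                list = new List<string>();
                _watched.Add(device, list);
            }
            list.AddRange(settings);
        }
        finally { _rwLock.ExitWriteLock(); }
    }

    public bool Unregister(string device)
    {
        _rwLock.EnterWriteLock();
        try { return _watched.Remove(device); }
        finally { _rwLock.ExitWriteLock(); }
    }

    // Copy under the read lock, then do the slow polling on the copy without
    // holding the lock, so Register/Unregister are never blocked for long.
    public int Update()
    {
        List<KeyValuePair<string, string[]>> snapshot;
        _rwLock.EnterReadLock();
        try
        {
            snapshot = _watched
                .Select(kv => new KeyValuePair<string, string[]>(kv.Key, kv.Value.ToArray()))
                .ToList();
        }
        finally { _rwLock.ExitReadLock(); }

        int polled = 0;
        foreach (var entry in snapshot)
        {
            polled += entry.Value.Length; // stand-in for reading each setting from the device
        }
        return polled;
    }
}
```

Because the copy is taken under a short lock, a plain lock statement would work nearly as well here; the reader/writer lock mainly pays off if several threads call Update concurrently.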

Yes, you should lock each operation:
public class Monitor
{
    private Hashtable Master; // key = device, value = list of settings to watch
    ...
    private readonly object tableLock = new object();
    public void register(watchlist)
    {
        lock (tableLock) {
            // do stuff
        }
    }
}
You shouldn't consider using a try/catch block - exceptions shouldn't be considered as a "normal" situation, and you might end up with a corrupted object state without any exception.

How many rows are there? Unless the update() loop takes a long time to do the iterations, I'd probably lock. If the main thread is potentially doing a lot of register/unregister calls, then update might fail repeatedly -- if it fails for 20 or 30 consecutive calls, is that a problem?
That code looks ok to me. I'd probably make the class sealed. I'd also use a typed dictionary vs. a Hashtable.
