Standard solution for asynch call interfering with collection during loop?

I'm working on a program in C# that uses asynchronous network calls. Some of these modify a collection that the rest of the program loops through from time to time. The problem is that when an asynchronous call tries to modify the collection during the loop, an exception is thrown and the program crashes.
I'm fairly new to this type of programming and this seems like a common problem that should have a standard solution.
What I tried was to set up a bool that I switch off before looping through the collection and check in the asynchronous method. If it is not on I modify a different collection, and make the changes based on that to the original in the rest of the program. The problem is that the asynchronous method also has a loop over this collection, so when it is called in succession quickly it can interfere with that loop.
I figure a better solution would be to set up the code so that while the bool denoting that the collection is not safe to modify is true, any of these calls are delayed. However I don't know if this is a good enough solution (since I figure that the asynchronous call could first begin, then the program could get to the unsafe part).
This is not the actual program, but an example of the problem for clarity:
private List<Stuff> myList = new List<Stuff>();

// Called from the network layer, possibly on another thread.
private void NetworkDataReceived(Stuff s)
{
    myList.Add(s);
}

private void SomeOtherMethod()
{
    foreach (Stuff s in myList)
    {
        DoSomethingWithStuff(s);
    }
}
In the example above when NetworkDataReceived is called while SomeOtherMethod is also running, the program crashes with the following InvalidOperationException: "Collection was modified; enumeration operation may not execute."
I'd appreciate it if somebody who has experience in this kind of programming in C# could give me a pointer to how to resolve this issue.

List<T> is not thread safe, so you cannot have multiple threads reading from and writing to it at the same time.
You can use one of the thread-safe collections provided by the .NET framework to do that. One example of such collection is the ConcurrentQueue<T> class.
Quoting from the reference above:
Multiple threads can safely and efficiently add or remove items from these collections, without requiring additional synchronization in user code.
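As a rough sketch of how the example from the question might look with ConcurrentQueue<T>: the class name StuffProcessor and the Id property are made up for illustration, and the counter stands in for DoSomethingWithStuff. Enumerating a ConcurrentQueue<T> walks a moment-in-time snapshot, so items enqueued during the loop do not cause an exception (they are simply not visited by that loop).

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class Stuff
{
    public int Id { get; set; } // hypothetical payload
}

public class StuffProcessor
{
    // Thread-safe collection: Enqueue and enumeration may overlap safely.
    private readonly ConcurrentQueue<Stuff> _items = new ConcurrentQueue<Stuff>();

    // Called from the network callback, possibly on another thread.
    public void NetworkDataReceived(Stuff s)
    {
        _items.Enqueue(s);
    }

    // The foreach sees a snapshot taken when enumeration starts, so
    // concurrent Enqueue calls do not invalidate the loop.
    public int SomeOtherMethod()
    {
        int processed = 0;
        foreach (Stuff s in _items)
        {
            processed++; // stand-in for DoSomethingWithStuff(s)
        }
        return processed;
    }
}
```

If the order of items or indexed access matters, a plain List<T> guarded by a lock in both methods is the other common option.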

Related

Why my code does not speed up with a multithreaded Parallel.For loop?

I tried to transform a simple sequential loop into a parallel computed loop with the System.Threading.Tasks library.
The code compiles and returns correct results, but it does not save any computation time; on the contrary, it takes longer.
EDIT: Sorry guys, I have probably oversimplified the question and made some errors doing that.
To append additional information, I am running the code on an i7-4700QM, and it is referenced in a Grasshopper script.
Here is the actual code. I also switched to non-thread-local variables:
public static class LineNet
{
    public static List<Ray> SolveCpu(List<Speaker> sources, List<Receiver> targets, List<Panel> surfaces)
    {
        // Thread-safe bag shared by all parallel iterations.
        ConcurrentBag<Ray> rays = new ConcurrentBag<Ray>();
        for (int i = 0; i < sources.Count; i++)
        {
            Parallel.For(
                0,
                targets.Count,
                j =>
                {
                    Line path = new Line(sources[i].Position, targets[j].Position);
                    Ray ray = new Ray(path, i, j);
                    if (Utils.CheckObstacles(ray, surfaces))
                    {
                        rays.Add(ray);
                    }
                }
            );
        }
        // Copy the bag into the declared return type (order is not preserved).
        return new List<Ray>(rays);
    }
}
The Grasshopper implementation just collects sources, targets, and surfaces, calls the method Solve, and returns rays.
I understand that dispatching workload to threads is expensive, but is it so expensive?
Or is the ConcurrentBag just preventing parallel calculation?
Plus, my classes are immutable (?), but if I use a common List the kernel aborts the operation and throws an exception, is someone able to tell why?

Without a good Minimal, Complete, and Verifiable code example that reliably reproduces the problem, it is not possible to provide a definitive answer. The code you posted does not even appear to be an excerpt of real code, because the type declared as the return type of the method isn't the same as the value actually returned by the return statement.
However, the code you posted certainly does not seem like a good use of Parallel.For(). Your Line constructor would have to be fairly expensive to justify parallelizing the task of creating the items. And to be clear, that's the only possible win here.
At the end, you still need to aggregate all of the Line instances that you created into a single list, so all those intermediate lists created for the Parallel.For() tasks are just pure overhead. And the aggregation is necessarily serialized (i.e. only one thread at a time can be adding an item to the result collection), and in the worst way (each thread only gets to add a single item before it gives up the lock and another thread has a chance to take it).
Frankly, you'd be better off storing each local List<T> in a collection, and then aggregating them all at once in the main thread after Parallel.For() returns. Not that that would be likely to make the code perform better than a straight-up non-parallelized implementation. But at least it would be less likely to be worse. :)
The bottom line is that you don't seem to have a workload that could benefit from parallelization. If you think otherwise, you'll need to explain the basis for that thought in a clearer, more detailed way.
"if I use a common List the kernel aborts the operation and throws an exception, is someone able to tell why?"
You're already using (it appears) List<T> as the local data for each task, and indeed that should be fine, as tasks don't share their local data.
But if you are asking why you get an exception if you try to use List<T> instead of ConcurrentBag<T> for the result variable, well that's entirely to be expected. The List<T> class is not thread safe, but Parallel.For() will allow each task it runs to execute the localFinally delegate concurrently with all the others. So you have multiple threads all trying to modify the same not-thread-safe collection concurrently. This is a recipe for disaster. You're fortunate you get the exception; the actual behavior is undefined, and it's just as likely you'll simply corrupt the data structure as cause a run-time exception.
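To make the localFinally point concrete, here is a minimal sketch of the thread-local pattern with a plain List<int> as the shared result. The workload (collecting squares of even numbers) and the names LocalListDemo and gate are invented for illustration and are not from the question. Each task fills its own list without locking, and only the merge in localFinally takes the lock, once per task rather than once per item.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class LocalListDemo
{
    // Collects the squares of all even numbers in [0, count).
    public static List<int> CollectEvenSquares(int count)
    {
        var results = new List<int>();
        object gate = new object();

        Parallel.For(
            0,
            count,
            // localInit: one private list per worker task, nothing shared.
            () => new List<int>(),
            // body: touches only the task's own list, so no lock is needed.
            (i, loopState, local) =>
            {
                if (i % 2 == 0)
                {
                    local.Add(i * i);
                }
                return local;
            },
            // localFinally: may run concurrently on several threads, so the
            // shared List<int> must be protected by a lock.
            local =>
            {
                lock (gate)
                {
                    results.AddRange(local);
                }
            });

        return results;
    }
}
```

Removing the lock from the last delegate reproduces the situation described above: several threads calling AddRange on the same List<int> at once, with undefined results.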

How to map ImmutableArray without getting it cast to IEnumerable which is not thread safe?

So I'm working in a multithreaded environment and I want to use ImmutableArray all the time because it's thread safe.
Unfortunately, ImmutableArray implements thread unsafe interfaces and so Select method from LINQ returns IEnumerable.
This way, my thread safe variable becomes thread unsafe.
How do I map from ImmutableArray to ImmutableArray?

It seems that there are a lot of misunderstandings behind this question. First, you need to look at the source code for the Select method and learn about the yield keyword.
Second, LINQ methods are made to be short-lived. You have various threads doing various processing tasks. Are you using a pipeline situation, where you want to transform data in one thread and pass the result to another thread? You have to be careful with the yield keyword in that situation; essentially, you need to flush (er, realize, for lack of a better word) your collections before passing them to the next thread so that the actual work is done in the present thread. In that scenario, object ownership kicks in and you don't need thread-safe collections.
In short, the enumerable returned from calling Select on ImmutableArray is perfectly thread-safe. You can realize it at any point and it won't give you any errors. Of course it will only iterate through the data that was contained in your collection at the time you called Select. It won't know anything about newly assigned instances.
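One way to "realize" the projection and get an ImmutableArray back is to call ToImmutableArray() on the result of Select. This is only a small sketch; the method name Lengths and the string-to-length mapping are placeholders.

```csharp
using System;
using System.Collections.Immutable;
using System.Linq;

public static class ImmutableMapDemo
{
    public static ImmutableArray<int> Lengths(ImmutableArray<string> words)
    {
        // Select is lazy; ToImmutableArray() runs the projection right here,
        // on the calling thread, and stores the results in a new array.
        return words.Select(w => w.Length).ToImmutableArray();
    }
}
```

After this call nothing lazy is left to hand over, so the result can be passed to another thread like any other immutable value.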

Multi-threading list pattern advice

I have made an application which also contains a folder/file scanner. I'm coming across a problem with the threading structure.
How it works:
For each folder/file it finds, it starts a thread. There is a function inside each thread that uses a list to check whether a similar item has already been found, so that it can add to the existing item. If none is found, it adds the item to that list. The threads are executed in parallel (async).
Problem:
Because it's async, it will sometimes fail on the list check. This happens because there is a time gap between the check and the add to the list. The check can report that there is no similar item when there certainly is one, which results in the same item occurring in the list twice.
I have also tried making the threads wait on each other. I really like the effect this has on the frontend (items are nicely added to the list in real time), but this takes way too long for a lot of folders/files.
Now I'm thinking of making a mix of the two approaches. I would really like to combine the speed of async threads with the safety of waiting on each thread.
Anybody any idea?

You should lock the entire code part that checks the list and adds a value.
Something like this:
private void YourThreadMethod(object state)
{
    // long taking operation
    lock (dictionary)
    {
        if (!dictionary.ContainsKey(yourItemKey))
        {
            // construct object, long taking operation
            dictionary.Add(yourItemKey, createdObject);
        }
    }
}
In this way, every thread will have to wait until the list is free to use. If you want a more advanced solution, you could read into the ReaderWriterLockSlim class which gives a more fine grained solution.

The sleekest approach is to use a ConcurrentDictionary<string, byte> when yourItemKey is of type string (otherwise adapt TKey and use a proper IEqualityComparer or implement IEquatable):
private readonly ConcurrentDictionary<string, byte> _list = new ConcurrentDictionary<string, byte>();

private void Foo(object state)
{
    // looong operation
    this._list.TryAdd(yourItemKey, 0);
}

public void Bar()
{
    // this is how to query the content
    this._list.Keys...;
}
The trick behind that is to not use a too complex object as the key, which may need disposal or has external references (I'd prefer any string representation), and a small type for the value, which just acts as a marker.

I would consider using one of the thread safe collections in C#. For your case something like a ConcurrentBag will be more efficient than using a lock.
In case there is a time delay between checking and adding, you can use ConcurrentDictionary. It has a TryAdd method which will return false if an item with the same key is already in the dictionary.
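A small sketch of why TryAdd closes the gap between "check" and "add": the check and the insert happen as one atomic step, so only the first thread to report a key gets true. The names ScannerDemo and CountInsertions and the "item" key scheme are invented for this illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class ScannerDemo
{
    // Simulates many scanner threads reporting the same few keys.
    // Returns how many TryAdd calls actually inserted a new key.
    public static int CountInsertions(
        ConcurrentDictionary<string, byte> seen, int reports, int distinctKeys)
    {
        int inserted = 0;
        Parallel.For(0, reports, i =>
        {
            string key = "item" + (i % distinctKeys);
            // Check and insert are one atomic operation; only the first
            // caller for a given key gets true.
            if (seen.TryAdd(key, 0))
            {
                Interlocked.Increment(ref inserted);
            }
        });
        return inserted;
    }
}
```

When TryAdd returns false, the item already exists, which is the point where a scanner would merge into the existing entry instead of adding a duplicate.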

Thread safe OfType

I am facing the problem that this code:
enumerable.OfType<Foo>().Any(x => x.Footastic == true);
isn't thread safe and throws a "collection was modified" exception during enumeration.
Is there a good way to overcome this issue?
I already tried the following, but it didn't always work (the exception just seems to occur less often):
public class Foo
{
    public void DoSomeMagicWithCollection(IEnumerable enumerable)
    {
        lock (enumerable)
        {
            enumerable.OfType<Foo>().Any(x => x.Footastic == true);
        }
    }
}

If you're getting an exception that the underlying collection has changed while enumerating it, given that this code clearly doesn't mutate the collection itself, it means that another thread is mutating the collection while you're trying to iterate it.
There is no possible solution to this problem other than simply not doing that. What's happening is that the enumerator of the List (or whatever collection type that is) is throwing the exception and preventing further enumeration because it can see that the list was modified during the enumeration. There is no way for the enumerators of OfType or Any that wrap it to possibly recover from that. The underlying enumerator is refusing to give them the data from the list. They can't do anything about that.
You need to use some sort of synchronization mechanism to prevent another thread from mutating the collection while this thread is enumerating it. Your lock doesn't prevent another thread from using the collection; it simply prevents any code that locks on the same instance from running. You need to have any code that could possibly mutate the list also lock on the same object to properly synchronize them.
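A minimal sketch of that idea, assuming the collection can be wrapped in a class that owns it: FooRegistry and _gate are hypothetical names, and the Foo class here is a stripped-down stand-in for the one in the question. Both the writer and the reader take the same lock, so a mutation can never overlap an enumeration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public class Foo
{
    public bool Footastic { get; set; }
}

public class FooRegistry
{
    private readonly List<object> _items = new List<object>();
    // Hypothetical dedicated lock object shared by every reader and writer.
    private readonly object _gate = new object();

    public void Add(object item)
    {
        lock (_gate)
        {
            _items.Add(item);
        }
    }

    public bool AnyFootastic()
    {
        lock (_gate)
        {
            // Any() finishes enumerating before the lock is released.
            return _items.OfType<Foo>().Any(x => x.Footastic);
        }
    }
}
```

The important part is that the lazy query is fully evaluated inside the lock; returning the unevaluated OfType<Foo>() sequence to the caller would move the enumeration outside the protected region again.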
Another possibility would be to use a collection that is inherently designed to be accessed from multiple threads at the same time. There are several such collections in the System.Collections.Concurrent namespace. They may or may not fit your needs. They will take care of synchronizing access to their data (to a point) on their own, without you needing to explicitly lock when accessing them.

Does using ConcurrentDictionary TryGetValue within an if statement make the if contents thread-safe?

If I have a ConcurrentDictionary and use the TryGetValue within an if statement, does this make the if statement's contents thread safe? Or must you lock still within the if statement?
Example:
ConcurrentDictionary<Guid, Client> m_Clients;
Client client;
//Does this if make the contents within it thread-safe?
if (m_Clients.TryGetValue(clientGUID, out client))
{
    //Users is a list.
    client.Users.Add(item);
}
or do I have to do:
ConcurrentDictionary<Guid, Client> m_Clients;
Client client;
//Does this if make the contents within it thread-safe?
if (m_Clients.TryGetValue(clientGUID, out client))
{
    lock (client)
    {
        //Users is a list.
        client.Users.Add(item);
    }
}

Yes, you have to lock inside the if statement. The only guarantee you get from ConcurrentDictionary is that its own methods are thread safe.

The accepted answer could be misleading, depending on your point of view and the scope of thread safety you are trying to achieve. This answer is aimed at people who stumble on this question while learning about threading and concurrency:
It's true that locking on the output of the dictionary retrieval (the Client object) makes some of the code thread safe, but only the code that accesses that retrieved object within the lock. In the example, it's possible that another thread removes that object from the dictionary after the current thread retrieves it. (Even though there are no statements between the retrieval and the lock, other threads can still execute in between.) Then this code would add the item to the Users list of a Client that is no longer in the concurrent dictionary. That could cause an exception, a synchronization problem, or a race condition.
It depends on what the rest of the program is doing. But in the scenario I'm describing, it would be safer to put the lock around the entire dictionary retrieval. And then a regular dictionary might be faster and simpler than a concurrent dictionary, as long as you always lock on it while using it!

While both of the current answers are technically true I think that the potential exists for them to be a little misleading and they don't express ConcurrentDictionary's big strengths. Maybe the OP's original way of solving the problem with locks worked in that specific circumstance but this answer is aimed more generally towards people learning about ConcurrentDictionary for the first time.
ConcurrentDictionary is designed so that you don't have to use locks. It has several specialty methods designed around the idea that some other thread could modify the object in the dictionary while you're currently working on it.
For a simple example, the TryUpdate method lets you check whether a key's value has changed between when you got it and the moment that you're trying to update it. If the value that you've got matches the value currently in the ConcurrentDictionary, you can update it and TryUpdate returns true. If not, TryUpdate returns false. The documentation for the TryUpdate method can make this a little confusing because it doesn't make it explicitly clear why there is a comparison value, but that's the idea behind the comparison value.
If you want a little more control around adding or updating, you could use one of the overloads of the AddOrUpdate method to either add a value for a key if it doesn't exist at the moment that you're trying to add it, or update the value if some other thread has already added a value for the specified key. The context of whatever you're trying to do will dictate the appropriate method to use. The point is that, rather than locking, try taking a look at the specialty methods that ConcurrentDictionary provides and prefer those over trying to come up with your own locking solution.
In the case of OP's original question, I would suggest that instead of this:
ConcurrentDictionary<Guid, Client> m_Clients;
Client client;
//Does this if make the contents within it thread-safe?
if (m_Clients.TryGetValue(clientGUID, out client))
{
    //Users is a list.
    client.Users.Add(item);
}
One might try the following instead*:
ConcurrentDictionary<Guid, Client> m_Clients;
Client originalClient;
if (m_Clients.TryGetValue(clientGUID, out originalClient))
{
    //The Client object will need to implement IEquatable if more
    //than an object instance comparison needs to be done. This
    //sample code assumes that Client implements IEquatable.
    //If copying a Client is not trivial, you'll probably want to
    //also implement a simple type of copy in a method of the Client
    //object. This sample code assumes that the Client object has
    //a ShallowCopy method to do this copy for simplicity's sake.
    Client modifiedClient = originalClient.ShallowCopy();
    //Make whatever modifications to modifiedClient that need to get
    //made...
    modifiedClient.Users.Add(item);
    //Now update the value in the ConcurrentDictionary
    if (!m_Clients.TryUpdate(clientGUID, modifiedClient, originalClient))
    {
        //Do something if the Client object was updated in between
        //when it was retrieved and when the code here tries to
        //modify it.
    }
}
*Note in the example above, I'm using TryUpdate for ease of demonstrating the concept. In practice, if you need to make sure that an object gets added if it doesn't exist or updated if it does, the AddOrUpdate method would be the ideal option because the method handles all of the looping required to check for add vs update and take the appropriate action.
It might seem like it's a little harder at first because it may be necessary to implement IEquatable and, depending on how instances of Client need to be copied, some sort of copying functionality but it pays off in the long run if you're working with ConcurrentDictionary and objects within it in any serious way.
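For comparison, here is a rough sketch of the AddOrUpdate route. It deliberately simplifies the scenario: instead of a Client object, the dictionary value is assumed to be an ImmutableList<string> of user names, and ClientUsers and AddUser are made-up names. Because the value is immutable, every update produces a new list, and AddOrUpdate re-runs the update delegate if another thread changed the value in the meantime.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Immutable;

public static class ClientUsers
{
    // Hypothetical stand-in for m_Clients: maps a client id to an
    // immutable list of user names.
    public static void AddUser(
        ConcurrentDictionary<Guid, ImmutableList<string>> clients,
        Guid clientId,
        string user)
    {
        clients.AddOrUpdate(
            clientId,
            // Key absent: start a new list containing just this user.
            _ => ImmutableList.Create(user),
            // Key present: build a new list from the current one. The
            // delegate may run more than once under contention.
            (_, existing) => existing.Add(user));
    }
}
```

Since the delegates can be invoked several times, they should be free of side effects; building a new immutable list fits that requirement, whereas mutating a shared List<T> inside the delegate would not.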
