I have an application that has to create RegistrationInputs. Of every Input I'm turning into a RegistrationInput I save the ID in a hashset of integers to make sure I'm never processing the same Input more than once. I need to make the registrationinputs of my array of inputs asynchronously, but if I see during the creation of one of the RegistrationInputs that any of the values isn't correct, I return null and remove the ID from the hashset.
Is what I'm doing thread-safe? Also is this the best way to asynchronously process data? I already tried Parallel.Foreach with an async lambda but that returns async void so I can't await it.
Inputs[] events = GetInputs();
List<Task<RegistrationInput>> tasks = new List<Task<RegistrationInput>>();
foreach (var ev in events)
tasks.Add(ProcessEvent(ev));
tempInputs = await Task.WhenAll<RegistrationInput>(tasks);
Is what I'm doing thread-safe?
No, a HashSet<T> is not thread-safe. If you need to modify it from multiple threads, you'll need to use a lock:
Any public static (Shared in Visual Basic) members of this type are
thread safe. Any instance members are not guaranteed to be thread
safe.
The best thing you could do is make those concurrent operations completely unaware of each other, and have some higher level mechanism that makes sure no two IDs are queried twice.
Also is this the best way to asynchrously process data?
It seems to me that you are on the right track with executing ProcessEvent concurrently for each event. Only thing I would do is perhaps re-write the foreach loop to use Enumerable.Select, but that is a matter of flavor:
Inputs[] events = GetInputs();
var tasks = events.Select(ev => ProcessEvent(ev));
tempInputs = await Task.WhenAll<RegistrationInput>(tasks);
You can use a ConcurentDictionary<ProcessEvent, byte> and just use the Keys.
The usage of byte as type of Value is to minimize the amount of memory used. If you don't have any memory considerations you can use anything else.
It is thread safe and you can have all functionalities in HashSet
Related
I have a C# .NET program that uses an external API to process events for real-time stock market data. I use the API callback feature to populate a ConcurrentDictionary with the data it receives on a stock-by-stock basis.
I have a set of algorithms that each run in a constant loop until a terminal condition is met. They are called like this (but all from separate calling functions elsewhere in the code):
Task.Run(() => ExecutionLoop1());
Task.Run(() => ExecutionLoop2());
...
Task.Run(() => ExecutionLoopN());
Each one of those functions calls SnapTotals():
public void SnapTotals()
{
foreach (KeyValuePair<string, MarketData> kvpMarketData in
new ConcurrentDictionary<string, MarketData>(Handler.MessageEventHandler.Realtime))
{
...
The Handler.MessageEventHandler.Realtime object is the ConcurrentDictionary that is updated in real-time by the external API.
At a certain specific point in the day, there is an instant burst of data that comes in from the API. That is the precise time I want my ExecutionLoop() functions to do some work.
As I've grown the program and added more of those execution loop functions, and grown the number of elements in the ConcurrentDictionary, the performance of the program as a whole has seriously degraded. Specifically, those ExecutionLoop() functions all seem to freeze up and take much longer to meet their terminal condition than they should.
I added some logging to all of the functions above, and to the function that updates the ConcurrentDictionary. From what I can gather, the ExecutionLoop() functions appear to access the ConcurrentDictionary so often that they block the API from updating it with real-time data. The loops are dependent on that data to meet their terminal condition so they cannot complete.
I'm stuck trying to figure out a way to re-architect this. I would like for the thread that updates the ConcurrentDictionary to have a higher priority but the message events are handled from within the external API. I don't know if ConcurrentDictionary was the right type of data structure to use, or what the alternative could be, because obviously a regular Dictionary would not work here. Or is there a way to "pause" my execution loops for a few milliseconds to allow the market data feed to catch up? Or something else?
Your basic approach is sound except for one fatal flaw: they are all hitting the same dictionary at the same time via iterators, sets, and gets. So you must do one thing: in SnapTotals you must iterate over a copy of the concurrent dictionary.
When you iterate over Handler.MessageEventHandler.Realtime or even new ConcurrentDictionary<string, MarketData>(Handler.MessageEventHandler.Realtime) you are using the ConcurrentDictionary<>'s iterator, which even though is thread-safe, is going to be using the dictionary for the entire period of iteration (including however long it takes to do the processing for each and every entry in the dictionary). That is most likely where the contention occurs.
Making a copy of the dictionary is much faster, so should lower contention.
Change SnapTotals to
public void SnapTotals()
{
var copy = Handler.MessageEventHandler.Realtime.ToArray();
foreach (var kvpMarketData in copy)
{
...
Now, each ExecutionLoopX can execute in peace without write-side contention (your API updates) and without read-side contention from the other loops. The write-side can execute without read-side contention as well.
The only "contention" should be for the short duration needed to do each copy.
And by the way, the dictionary copy (an array) is not threadsafe; it's just a plain array, but that is ok because each task is executing in isolation on its own copy.
I think that your main problem is not related to the ConcurrentDictionary, but to the large number of ExecutionLoopX methods. Each of these methods saturates a CPU core, and since the methods are more than the cores of your machine, the whole CPU is saturated. My assumption is that if you find a way to limit the degree of parallelism of the ExecutionLoopX methods to a number smaller than the Environment.ProcessorCount, your program will behave and perform better. Below is my suggestion for implementing this limitation.
The main obstacle is that currently your ExecutionLoopX methods are monolithic: they can't be separated to pieces so that they can be parallelized. My suggestion is to change their return type from void to async Task, and place an await Task.Yield(); inside the outer loop. This way it will be possible to execute them in steps, with each step being the code from the one await to the next.
Then create a TaskScheduler with limited concurrency, and a TaskFactory that uses this scheduler:
int maxDegreeOfParallelism = Environment.ProcessorCount - 1;
TaskScheduler scheduler = new ConcurrentExclusiveSchedulerPair(
TaskScheduler.Default, maxDegreeOfParallelism).ConcurrentScheduler;
TaskFactory taskFactory = new TaskFactory(scheduler);
Now you can parallelize the execution of the methods, by starting the tasks with the taskFactory.StartNew method instead of the Task.Run:
List<Task> tasks = new();
tasks.Add(taskFactory.StartNew(() => ExecutionLoop1(data)).Unwrap());
tasks.Add(taskFactory.StartNew(() => ExecutionLoop2(data)).Unwrap());
tasks.Add(taskFactory.StartNew(() => ExecutionLoop3(data)).Unwrap());
tasks.Add(taskFactory.StartNew(() => ExecutionLoop4(data)).Unwrap());
//...
Task.WaitAll(tasks.ToArray());
The .Unwrap() is needed because the taskFactory.StartNew returns a nested task (Task<Task>). The Task.Run method is also doing this unwrapping internally, when the action is asynchronous.
An online demo of this idea can be found here.
The Environment.ProcessorCount - 1 configuration means that one CPU core will be available for other work, like the communication with the external API and the updating of the ConcurrentDictionary.
A more cumbersome implementation of the same idea, using iterators and the Parallel.ForEach method instead of async/await, can be found in the first revision of this answer.
If you're not squeamish about mixing operations in a task, you could redesign such that instead of task A doing A things, B doing B things, C doing C things, etc. you can reduce the number of tasks to the number of processors, and thus run fewer concurrently, greatly easing contention.
So, for example, say you have just two processors. Make a "general purpose/pluggable" task wrapper that accepts delegates. So, wrapper 1 would accept delegates to do A and B work. Wrapper 2 would accept delegates to do C and D work. Then ask each wrapper to spin up a task that calls the delegates in a loop over the dictionary.
This would of course need to be measured. What I am proposing is, say, 4 tasks each doing 4 different types of processing. This is 4 units of work per loop over 4 loops. This is not the same as 16 tasks each doing 1 unit of work. In that case you have 16 loops.
16 loops intuitively would cause more contention than 4.
Again, this is a potential solution that should be measured. There is one drawback for sure: you will have to ensure that a piece of work within a task doesn't affect any of the others.
Are there any special rules or caveats about calling outside methods from within a Parallel.ForEach()? I want to make sure that I avoid any quirky multi-threading issues. Consider the following Parallel.ForEach() routine:
public void GetGroupUsers()
{
Parallel.ForEach(
groups.ToList(),
new ParallelOptions() { MaxDegreeOfParallelism = Environment.ProcessorCount },
group =>
{
//get associated user id's by group name
userGuids = GetUserGuidsForGroup(1);
//if no group-users then continue
if (userGuids == null) return;
//update group user map
foreach (var userGuid in userGuids)
groupUserMap.Add(new KeyValuePair<Guid, Guid>(group.Key, userGuid));
});
}
private List<Guid> GetUserGuidsForGroup(int groupNumber)
{
var userGuids = new List<Guid>();
//do a repo lookup of user guids for the groupNumber parameter and add to list
return userGuids;
}
Are there any special rules or caveats about calling outside methods from within a Parallel.ForEach()?
Rules / Caveats remain same you need when writing a simple multi threading application. Just that here threads are provisioned via pool, instead of being created by the user. When you call .Net framework apis, you are still calling outside / framework methods. In case of .Net framework api call you just need to ensure that every thread gets its own unique object to make a call (thus thread safe) or the shared object does the method call in a Thread Safe region. Focus on Write calls (which change state of an object) instead of Read calls. To list few important ones:
Avoid Race condition, don't write / update a shared / global / static variable without making a thread safe region using constructs like Interlocked, Lock, Semaphore, mutex
Try using Thread safe / Concurrent collections, as they do better since you don't need to worry about explicit synchronization
Read are generally thread safe, but Writes are not and can lead to Race condition (more often), Deadlock (all this is easy to avoid using Concurrent collections)
Using something like ThreadLocal / ThreadStatic, be careful, since here number_of_threads != number_of_Elements to be processed, which means same thread can / will be used for multiple iterations, thus it would retain the local state, which you might not expect, thus it might need explicit resetting the state of such variables. This often leads to issues.
Differentiate between IO calls and Computing calls, since Parallel APIs are poor choice for IO bound calls, Async-Await is a better choice
My question is little theoretical I want to know whether List<object> is thread-safe if I used Parallel.For in this way. Please see below:
public static List<uint> AllPrimesParallelAggregated(uint from, uint to)
{
List<uint> result = new List<uint>();
Parallel.For((int)from, (int)to,
() => new List<uint>(), // Local state initializer
(i, pls, local) => // Loop body
{
if (IsPrime((uint)i))
{
local.Add((uint)i);
}
return local;
},
local => // Local to global state combiner
{
lock (result)
{
result.AddRange(local);
}
});
return result;
}
Is local list is thread safe? Whether I've the correct data in result list without data is being alter due to multiple threads as using I'm having normal loop?
Note: I'm not worry about list order. I want to know about length of the list and data.
Is this solution thread-safe? Technically yes, in reality no.
The very notion of a thread-safe List (as opposed to a queue or a bag) is that the list is safe with respect to order, or, more strictly, index, since there is no key other than an ascending integer. In a parallel world, it's sort of a nonsensical concept, when you think about it. That is why the System.Collections.Concurrent namespace contains a ConcurrentBag and a ConcurrentQueue but no ConcurrentList.
Since you are asking about the thread safety of a list, I am assuming the requirement for your software is to generate a list that is in ascending order. If that is the case, no, your solution will not work. While the code is technically thread-safe, the threads may finish in any order and your variable result will end up non-sorted.
If you wish to use parallel computation, you must store your results in a bag, then when all threads are finished sort the bag to generate the ordered list. Otherwise you must perform the computation in series.
And since you have to use a bag anyway, you may as well use ConcurrentBag, and then you won't have to bother with the lock{} statement.
List Is not thread safe. but your current algorithm works. as it was explained in other answer and comments.
This is the description about localInit of For.Parallel
The <paramref name="localInit"/> delegate is invoked once for each thread that participates in the loop's execution and returns the initial local state for each of those threads. These initial states are passed to the first
IMO You are adding unnecessary complexity inside your loop. I would use ConcurrentBag instead. which is thread safe by design.
ConcurrentBag<uint> result = new ConcurrentBag<uint>();
Parallel.For((long) from, (long) to,
(i, PLS) =>
{
if (IsPrime((uint)i))
{
result.Add((uint)i); // this is thread safe. don't worry
}
});
return result.OrderBy(I => I).ToList(); // order if that matters
See concurrent bag here
All public and protected members of ConcurrentBag are thread-safe and may be used concurrently from multiple threads.
A List<T> is not thread-safe. For the way you are using it, thread safety is not required though. You need thread safety when you access a resource concurrently from multiple threads. You are not doing that, since you are working on local lists.
At the end you are adding the contents of the local lists into the result variable. Since you use lock for this operation, you are thread-safe within that block.
So your solution is probably fine.
Recently I've stumbled upon a Parralel.For loop that performs way better than a regular for loop for my purposes.
This is how I use it:
Parallel.For(0, values.Count, i =>Products.Add(GetAllProductByID(values[i])));
It made my application work a lot faster, but still not fast enough. My question to you guys is:
Does Parallel.Foreach performs faster than Parallel.For?
Is there some "hybrid" method with whom I can combine my Parralel.For loop to perform even faster (i.e. use more CPU power)? If yes, how?
Can someone help me out with this?
If you want to play with parallel, I suggest using Parallel Linq (PLinq) instead of Parallel.For / Parallel.ForEach , e.g.
var Products = Enumerable
.Range(0, values.Count)
.AsParallel()
//.WithDegreeOfParallelism(10) // <- if you want, say 10 threads
.Select(i => GetAllProductByID(values[i]))
.ToList(); // <- this is thread safe now
With a help of With methods (e.g. WithDegreeOfParallelism) you can try tuning you implementation.
There are two related concepts: asynchronous programming and multithreading. Basically, to do things "in parallel" or asynchronously, you can either create new threads or work asynchronously on the same thread.
Keep in mind that either way you'll need some mechanism to prevent race conditions. From the Wikipedia article I linked to, a race condition is defined as follows:
A race condition or race hazard is the behavior of an electronic,
software or other system where the output is dependent on the sequence
or timing of other uncontrollable events. It becomes a bug when events
do not happen in the order the programmer intended.
As a few people have mentioned in the comments, you can't rely on the standard List class to be thread-safe - i.e. it might behave in unexpected ways if you're updating it from multiple threads. Microsoft now offers special "built-in" collection classes (in the System.Collections.Concurrent namespace) that'll behave in the expected way if you're updating it asynchronously or from multiple threads.
For well-documented libraries (and Microsoft's generally pretty good about this in their documentation), the documentation will often explicitly state whether the class or method in question is thread-safe. For example, in the documentation for System.Collections.Generic.List, it states the following:
Public static (Shared in Visual Basic) members of this type are thread
safe. Any instance members are not guaranteed to be thread safe.
In terms of asynchronous programming (vs. multithreading), my standard illustration of this is as follows: suppose you go a restaurant with 10 people. When the waiter comes by, the first person he asks for his order isn't ready; however, the other 9 people are. Thus, the waiter asks the other 9 people for their orders and then comes back to the original guy. (It's definitely not the case that they'll get a second waiter to wait for the original guy to be ready to order and doing so probably wouldn't save much time anyway). That's how async/await typically works (the exception being that some of the Task Parallel library calls, like Thread.Run(...), actually are executing on other threads - in our illustration, bringing in a second waiter - so make sure you check the documentation for which is which).
Basically, which you choose (asynchronously on the same thread or creating new threads) depends on whether you're trying to do something that's I/O-bound (i.e. you're just waiting for an operation to complete or for a result) or CPU-bound.
If your main purpose is to wait for a result from Ebay, it would probably be better to work asynchronously in the same thread as you may not get much of a performance benefit for using multithreading. Think back to our analogy: bringing in a second waiter just to wait for the first guy to be ready to order isn't necessarily any better than just having the waiter to come back to him.
I'm not sitting in front of an IDE so forgive me if this syntax isn't perfect, but here's an approximate idea of what you can do:
public async Task GetResults(int[] productIDsToGet) {
var tasks = new List<Task>();
foreach (int productID in productIDsToGet) {
Task task = GetResultFromEbay(productID);
tasks.Add(task);
}
// Wait for all of the tasks to complete
await Task.WhenAll(tasks);
}
private async Task GetResultFromEbay(int productIdToGet) {
// Get result asynchronously from eBay
}
I've an application that makes use of parallelization for processing data.
The main program is in C#, while one of the routine for analyzing data is on an external C++ dll. This library scans data and calls a callback everytime a certain signal is found within the data. Data should be collected, sorted and then stored into HD.
Here is my first simple implementation of the method invoked by the callback and of the method for sorting and storing data:
// collection where saving found signals
List<MySignal> mySignalList = new List<MySignal>();
// method invoked by the callback
private void Collect(int type, long time)
{
lock(locker) { mySignalList.Add(new MySignal(type, time)); }
}
// store signals to disk
private void Store()
{
// sort the signals
mySignalList.Sort();
// file is a object that manages the writing of data to a FileStream
file.Write(mySignalList.ToArray());
}
Data is made up of a bidimensional array (short[][] data) of size 10000 x n, with n variable. I use parallelization in this way:
Parallel.For(0, 10000, (int i) =>
{
// wrapper for the external c++ dll
ProcessData(data[i]);
}
Now for each of the 10000 arrays I estimate that 0 to 4 callbacks could be fired. I'm facing a bottleneck and given that my CPU resources are not over-utilized, I suppose that the lock (together with thousand of callbacks) is the problem (am I right or there could be something else?). I've tried the ConcurrentBag collection but performances are still worse (in line with other user findings).
I thought that a possible solution for use lock-free code would be to have multiple collections. Then it would be necessary a strategy to make each thread of the parallel process working on a single collection. Collections could be for instance inside a dictionary with thread ID as key, but I do not know any .NET facility for this (I should know the threads ID for initialize the dictionary before launching the parallelization). Could be this idea feasible and, in case yes, does exist some .NET tool for this? Or alternatively, any other idea to speed up the process?
[EDIT]
I've followed the Reed Copsey's suggestion and I used the following solution (according to the profiler of VS2010, before the burden for locking and adding to the list was taking 15% of the resources, while now only 1%):
// master collection where saving found signals
List<MySignal> mySignalList = new List<MySignal>();
// thread-local storage of data (each thread is working on its List<MySignal>)
ThreadLocal<List<MySignal>> threadLocal;
// analyze data
private void AnalizeData()
{
using(threadLocal = new ThreadLocal<List<MySignal>>(() =>
{ return new List<MySignal>(); }))
{
Parallel.For<int>(0, 10000,
() =>
{ return 0;},
(i, loopState, localState) =>
{
// wrapper for the external c++ dll
ProcessData(data[i]);
return 0;
},
(localState) =>
{
lock(this)
{
// add thread-local lists to the master collection
mySignalList.AddRange(local.Value);
local.Value.Clear();
}
});
}
}
// method invoked by the callback
private void Collect(int type, long time)
{
local.Value.Add(new MySignal(type, time));
}
thought that a possible solution for use lock-free code would be to have multiple collections. Then it would be necessary a strategy to make each thread of the parallel process working on a single collection. Collections could be for instance inside a dictionary with thread ID as key, but I do not know any .NET facility for this (I should know the threads ID for initialize the dictionary before launching the parallelization). Could be this idea feasible and, in case yes, does exist some .NET tool for this? Or alternatively, any other idea to speed up the process?
You might want to look at using ThreadLocal<T> to hold your collections. This automatically allocates a separate collection per thread.
That being said, there are overloads of Parallel.For which work with local state, and have a collection pass at the end. This, potentially, would allow you to spawn your ProcessData wrapper, where each loop body was working on its own collection, and then recombine at the end. This would, potentially, eliminate the need for locking (since each thread is working on it's own data set) until the recombination phase, which happens once per thread (instead of once per task,ie: 10000 times). This could reduce the number of locks you're taking from ~25000 (0-4*10000) down to a few (system and algorithm dependent, but on a quad core system, probably around 10 in my experience).
For details, see my blog post on aggregating data with Parallel.For/ForEach. It demonstrates the overloads and explains how they work in more detail.
You don't say how much of a "bottleneck" you're encountering. But let's look at the locks.
On my machine (quad core, 2.4 GHz), a lock costs about 70 nanoseconds if it's not contended. I don't know how long it takes to add an item to a list, but I can't imagine that it takes more than a few microseconds. But let's it takes 100 microseconds (I would be very surprised to find that it's even 10 microseconds) to add an item to the list, taking into account lock contention. So if you're adding 40,000 items to the list, that's 4,000,000 microseconds, or 4 seconds. And I would expect one core to be pegged if this were the case.
I haven't used ConcurrentBag, but I've found the performance of BlockingCollection to be very good.
I suspect, though, that your bottleneck is somewhere else. Have you done any profiling?
The basic collections in C# aren't thread safe.
The problem you're having is due to the fact that you're locking the entire collection just to call an add() method.
You could create a thread-safe collection that only locks single elements inside the collection, instead of the whole collection.
Lets look at a linked list for example.
Implement an add(item (or list)) method that does the following:
Lock collection.
A = get last item.
set last item reference to the new item (or last item in new list).
lock last item (A).
unclock collection.
add new items/list to the end of A.
unlock locked item.
This will lock the whole collection for just 3 simple tasks when adding.
Then when iterating over the list, just do a trylock() on each object. if it's locked, wait for the lock to be free (that way you're sure that the add() finished).
In C# you can do an empty lock() block on the object as a trylock().
So now you can add safely and still iterate over the list at the same time.
Similar solutions can be implemented for the other commands if needed.
Any built-in solution for a collection is going to involve some locking. There may be ways to avoid it, perhaps by segregating the actual data constructs being read/written, but you're going to have to lock SOMEWHERE.
Also, understand that Parallel.For() will use the thread pool. While simple to implement, you lose fine-grained control over creation/destruction of threads, and the thread pool involves some serious overhead when starting up a big parallel task.
From a conceptual standpoint, I would try two things in tandem to speed up this algorithm:
Create threads yourself, using the Thread class. This frees you from the scheduling slowdowns of the thread pool; a thread starts processing (or waiting for CPU time) when you tell it to start, instead of the thread pool feeding requests for threads into its internal workings at its own pace. You should be aware of the number of threads you have going at once; the rule of thumb is that the benefits of multithreading are overcome by the overhead when you have more than twice the number of active threads as "execution units" available to execute threads. However, you should be able to architect a system that takes this into account relatively simply.
Segregate the collection of results, by creating a dictionary of collections of results. Each results collection is keyed to some token carried by the thread doing the processing and passed to the callback. The dictionary can have multiple elements READ at one time without locking, and as each thread is WRITING to a different collection within the Dictionary there shouldn't be a need to lock those lists (and even if you did lock them you wouldn't be blocking other threads). The result is that the only collection that has to be locked such that it would block threads is the main dictionary, when a new collection for a new thread is added to it. That shouldn't have to happen often if you're smart about recycling tokens.