Will Parallel.ForEach block until it is done? - c#

If I have code similar to this:
foreach (Item child in item.Children)
{
    // Do stuff
    ParallelOptions options = new ParallelOptions();
    options.MaxDegreeOfParallelism = 3;
    Parallel.ForEach(items, options, i => DoStuff());
}
Is the Parallel.ForEach going to finish all of its items before moving on to the next foreach item?

Yes - Parallel.ForEach will block. It's a synchronous method, which internally does its work in parallel.

I've gone with a slightly bizarre way to demonstrate the desired property below, because I can't find any nice excerpt from the documentation for e.g. Parallel.ForEach that just comes out and states that the loop is completed before the method returns:
Yes. Note that the return type of Parallel.ForEach is a ParallelLoopResult, which contains information that can only be available once all of the operations have completed, such as IsCompleted:
Gets whether the loop ran to completion, such that all iterations of the loop were executed and the loop didn't receive a request to end prematurely.
ParallelLoopResult is a struct - and so whatever value is returned from Parallel.ForEach cannot be altered after the return from that method.
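To see the blocking behaviour directly, here is a minimal sketch (the counter and loop body are invented for illustration): by the time Parallel.ForEach returns, every iteration has already run, and IsCompleted is set on the returned struct.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int processed = 0;

// Parallel.ForEach runs its iterations on multiple threads, but the call
// itself does not return until every iteration has finished.
ParallelLoopResult result = Parallel.ForEach(
    Enumerable.Range(0, 100),
    new ParallelOptions { MaxDegreeOfParallelism = 3 },
    i => Interlocked.Increment(ref processed));

// By this line all 100 iterations are done - no extra waiting is needed.
Console.WriteLine($"processed = {processed}, completed = {result.IsCompleted}");
```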

Related

Why are parallel tasks not running?

I have this code, it is the skeleton of larger functionality stripped down to prove the problem:
var tasks = Enumerable.Range(0, 10)
.Select(laneNo => Task.Run(() => Console.WriteLine($"Starting generator for lane {laneNo}")));
for (int r = 0; ; ++r)
{
    Task.Delay(TimeSpan.FromSeconds(3)).Wait();
    Console.WriteLine($"Iteration {r} at {DateTime.Now}");
}
I never see "Starting generator" printed to Console but I do see the iteration fire every 3 seconds - something is causing those tasks not to progress (in the real code they run for a significant period but removing that doesn't affect the problem).
Why are the first bunch of Tasks not running? My theory is it's related to Task.Delay?
Your LINQ statement is never materialized. LINQ operators like Select, Where, OrderBy, etc. work as building blocks that you chain together, but they are not executed until you run the result through a foreach or use operators that do not return enumerables, such as ToArray, ToList, First, Last, etc.
If you call ToList at the end you should see all of the tasks executing, but if you only call First you should see just one, because the iteration of your original Range will terminate after the first element.
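A minimal sketch of that difference (the side effect in the selector is there purely so we can count invocations): nothing runs until the query is materialized, and First only pulls a single element.

```csharp
using System;
using System.Linq;

int started = 0;

// Select only *defines* the work; the lambda has not run yet.
var query = Enumerable.Range(0, 10).Select(i => { started++; return i; });
Console.WriteLine(started); // still 0 - nothing has executed

// Materializing the query with ToList forces every element through the lambda.
var list = query.ToList();
Console.WriteLine(started); // now 10

// First only pulls one element, so only one lambda invocation happens.
started = 0;
var first = query.First();
Console.WriteLine(started); // 1
```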
LINQ Select has deferred execution; it simply defines an iterator, so your Tasks are not being generated.
You could make use of Task.WhenAll(IEnumerable<Task>), which will iterate the enumerable (starting every Task) and return a new Task that completes once all the provided tasks have completed:
var tasks = Enumerable.Range(0, 10)
.Select(laneNo => Task.Run(() => Console.WriteLine($"Starting generator for lane {laneNo}")));
await Task.WhenAll(tasks);

Parallel.ForEach with async lambda waiting for all iterations to complete

Recently I have seen several SO threads related to Parallel.ForEach mixed with async lambdas, but all proposed answers were some kind of workaround.
Is there any way how could I write:
List<int> list = new List<int>();
Parallel.ForEach(arrayValues, async (item) =>
{
    var x = await LongRunningIoOperationAsync(item);
    list.Add(x);
});
How can I ensure that list will contain all items from all iterations executed within the lambdas?
How will Parallel.ForEach generally work with async lambdas? If it hits an await, will it hand over its thread to the next iteration?
I assume the ParallelLoopResult.IsCompleted field is not the proper one, as it will return true when all iterations have executed, regardless of whether their actual lambda jobs are finished?
Recently I have seen several SO threads related to Parallel.ForEach mixed with async lambdas, but all proposed answers were some kind of workaround.
Well, that's because Parallel doesn't work with async. And from a different perspective, why would you want to mix them in the first place? They do opposite things. Parallel is all about adding threads and async is all about giving up threads. If you want to do asynchronous work concurrently, then use Task.WhenAll. That's the correct tool for the job; Parallel is not.
That said, it sounds like you want to use the wrong tool, so here's how you do it...
How can I ensure that list will contain all items from all iterations executed within the lambdas?
You'll need some kind of signal that some code can block on until the processing is done, e.g., a CountdownEvent or a Monitor. On a side note, you'll need to protect access to the non-thread-safe List<T> as well.
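A sketch of that workaround, with a hypothetical LongRunningIoOperationAsync standing in for the question's method: a CountdownEvent blocks the caller until every async lambda has signalled, and a lock protects the shared list. (Task.WhenAll, as noted above, is still the cleaner tool.)

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the question's async I/O call.
static async Task<int> LongRunningIoOperationAsync(int item)
{
    await Task.Delay(50);
    return item * 2;
}

var values = Enumerable.Range(0, 10).ToArray();
var results = new List<int>();
var gate = new object();                      // protects the non-thread-safe List<T>
using var countdown = new CountdownEvent(values.Length);

Parallel.ForEach(values, async item =>
{
    // Parallel.ForEach considers this iteration done at the first await,
    // so the async lambda signals for itself when the work really finishes.
    var x = await LongRunningIoOperationAsync(item);
    lock (gate) results.Add(x);
    countdown.Signal();
});

countdown.Wait();                             // block until all ten lambdas have signalled
Console.WriteLine(results.Count);             // 10
```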
How will Parallel.ForEach generally work with async lambdas? If it hits an await, will it hand over its thread to the next iteration?
Since Parallel doesn't understand async lambdas, when the first await yields (returns) to its caller, Parallel will assume that iteration of the loop is complete.
I assume the ParallelLoopResult.IsCompleted field is not the proper one, as it will return true when all iterations have executed, regardless of whether their actual lambda jobs are finished?
Correct. As far as Parallel knows, it can only "see" the method up to the first await that returns to its caller. So it doesn't know when the async lambda is complete. It will also assume iterations are complete too early, which throws its partitioning off.
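A tiny demonstration of that early completion (delay lengths are arbitrary): the call returns almost immediately while the real work is still pending.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int finished = 0;

// Parallel.ForEach sees each async lambda "return" at its first await,
// so this call comes back almost immediately, with no work actually done yet.
Parallel.ForEach(Enumerable.Range(0, 5), async i =>
{
    await Task.Delay(100);
    Interlocked.Increment(ref finished);
});

Console.WriteLine(finished);   // almost certainly still 0 at this point
Thread.Sleep(2000);            // give the orphaned continuations time to finish
Console.WriteLine(finished);   // 5
```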
You don't need Parallel.For/ForEach here; you just need to await a list of tasks.
Background
In short, you need to be very careful with async lambdas and whether you are passing them as an Action or a Func<Task>.
Your problem arises because Parallel.For / ForEach is not suited to the async/await pattern or IO-bound tasks; it is suited to CPU-bound workloads. Its overloads essentially take Action parameters and let the task scheduler create the tasks for you.
If you want to run multiple async tasks at the same time, use Task.WhenAll, or a TPL Dataflow block (or something similar), which can deal effectively with both CPU-bound and IO-bound workloads; said more directly, they can deal with tasks, which is what an async method is.
Unless you need to do more inside your lambda (which you haven't shown), just use a Select and WhenAll:
var tasks = items.Select(LongRunningIoOperationAsync);
var results = await Task.WhenAll(tasks); // here is your list of int
If you do, you can still use await:
var tasks = items.Select(async (item) =>
{
    var x = await LongRunningIoOperationAsync(item);
    // do other stuff
    return x;
});
var results = await Task.WhenAll(tasks);
Note: If you need the extended functionality of Parallel.ForEach (namely the options to control max concurrency), there are several approaches; however, Rx or Dataflow might be the most succinct.
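One such approach, sketched here with SemaphoreSlim (the operation name is a placeholder for the question's async call): the semaphore caps how many operations are in flight at once, which is roughly what MaxDegreeOfParallelism gives you in Parallel.ForEach.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Placeholder for the question's async I/O call.
static async Task<int> LongRunningIoOperationAsync(int item)
{
    await Task.Delay(50);
    return item * 2;
}

var items = Enumerable.Range(0, 20).ToArray();
using var throttle = new SemaphoreSlim(3);       // at most 3 operations in flight

var tasks = items.Select(async item =>
{
    await throttle.WaitAsync();                  // wait for a free slot
    try
    {
        return await LongRunningIoOperationAsync(item);
    }
    finally
    {
        throttle.Release();                      // free the slot for the next task
    }
});

var results = await Task.WhenAll(tasks);
Console.WriteLine(results.Length);               // 20
```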

Should Parallel.ForEach be awaited?

A weird thing is happening here. I thought Parallel.ForEach would wait until all of its tasks are complete before moving on. But then, I have something like this:
List<string> foo(List<A> list)
{
    Dictionary<string, bool> dictionary = new Dictionary<string, bool>();
    Parallel.ForEach(list, element =>
    {
        dictionary[element.Id] = true;
        if (element.SomeMethod())
        {
            dictionary[element.Id] = false;
        }
    });
    List<string> selectedIds = (from element in list where !dictionary[element.Id] select element.Id).ToList();
    return selectedIds;
}
and then I'm getting System.Collections.Generic.KeyNotFoundException (sometimes, not always) on the select line. As you can see, I'm initializing the dictionary for every possible key (the Ids of the list's elements), and then getting this exception, which made me think that this line might be reached before the execution of the Parallel.ForEach completes... Is that right? If so, how can I wait until all branches of this Parallel.ForEach complete?
Parallel.ForEach doesn't need to be awaited, as it doesn't return a Task and isn't asynchronous. When the call to that method returns, the iteration is already done.
However, Parallel.ForEach uses multiple threads concurrently, and Dictionary isn't thread-safe.
You probably have a race condition on your hands, and you should be using the thread-safe ConcurrentDictionary instead.
This specific case can be solved in a simpler way by using PLinq's AsParallel:
list.AsParallel().Where(element => element.SomeMethod()).Select(element => element.Id).ToList();
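For completeness, a sketch of the ConcurrentDictionary fix (tuples stand in for the question's A elements, and a Flag field simulates SomeMethod's result):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Tuples stand in for the question's element type; Flag simulates SomeMethod().
var list = new List<(string Id, bool Flag)>
{
    ("a", true), ("b", false), ("c", true)
};

// ConcurrentDictionary is safe to write from multiple loop iterations at once.
var dictionary = new ConcurrentDictionary<string, bool>();
Parallel.ForEach(list, element =>
{
    dictionary[element.Id] = !element.Flag;      // true unless "SomeMethod" fired
});

// Same selection as the original: ids whose entry ended up false.
var selectedIds = list.Where(e => !dictionary[e.Id]).Select(e => e.Id).ToList();
Console.WriteLine(string.Join(",", selectedIds)); // a,c
```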

Creating a task inside a foreach loop

I have a typical foreach loop that calls a method where the parameter is an element of the collection we're looping over; something like this:
foreach (byte x in SomeCollection)
{
    SomeMethod(x);
}
The problem is that SomeMethod takes a long time to run. I want to move the call into a new task so that the loop just creates the tasks and the thread that ran the loop simply continues. How do I do this in a thread-safe way?
Edit
I had a performance issue because SomeMethod makes several DB calls. So I converted the loop to a Parallel.ForEach, but that didn't make much of a difference because each thread then calls the DB. What I'm looking to do is just create Tasks that will run in the background and let the main thread continue.
One way would be to use Parallel.ForEach to do this:
Parallel.ForEach(SomeCollection, x => SomeMethod(x));
The code would wait for all calls of SomeMethod to complete before proceeding, but the individual calls may run in parallel.
If you don't want to wait for the calls to finish, wrap this call in StartNew:
Task.Factory.StartNew(() => Parallel.ForEach(SomeCollection, x => SomeMethod(x)));
What thread safety do you expect? This will be thread-safe:
foreach (byte x in SomeCollection) { Task.Factory.StartNew(() => SomeMethod(x)); }
as long as your method does not modify any shared state that is not itself thread-safe.
You could do something like:
IEnumerable<Task> doWork()
{
    foreach (var x in SomeCollection)
        yield return Task.Run(() => SomeMethod(x));
}
await Task.WhenAll(doWork());
This will run them all at the same time.

run a method multiple times simultaneously in c#

I have a method that returns XML elements, but that method takes some time to finish and return a value.
What I have now is
foreach (var t in s)
{
    r.Add(method(t));
}
but this only runs the next statement after the previous one finishes. How can I make the calls run simultaneously?
You should be able to use tasks for this:
//first start a task for each element in s, and add the tasks to the tasks collection
var tasks = new List<Task>();
foreach (var t in s)
{
    tasks.Add(Task.Factory.StartNew(method(t)));
}
//then wait for all tasks to complete asynchronously
Task.WaitAll(tasks);
//then add the result of all the tasks to r in a thread-safe fashion
foreach (var task in tasks)
{
    r.Add(task.Result);
}
EDIT
There are some problems with the code above. See the code below for a working version. Here I have also rewritten the loops to use LINQ for readability (and, in the case of the first loop, to avoid problems caused by closing over t inside the lambda expression).
var tasks = s.Select(t => Task<int>.Factory.StartNew(() => method(t))).ToArray();
//then wait for all tasks to complete asynchronously
Task.WaitAll(tasks);
//then add the result of all the tasks to r in a thread-safe fashion
r = tasks.Select(task => task.Result).ToList();
You can use Parallel.ForEach, which will utilize multiple threads to do the execution in parallel. You have to make sure that all code called is thread-safe and can be executed in parallel.
Parallel.ForEach(s, t => r.Add(method(t)));
From what I'm seeing, you are updating a shared collection inside the loop. This means that if you execute the loop in parallel, a data race will occur, because multiple threads will try to update a non-synchronized collection (assuming r is a List or something like it) at the same time, leaving it in an inconsistent state.
To execute correctly in parallel, you will need to wrap that section of code inside a lock statement:
object locker = new object();
Parallel.ForEach(s,
    t =>
    {
        lock (locker) r.Add(method(t));
    });
However, this will make the execution effectively serial, because each thread needs to acquire the lock, and two threads cannot hold it at the same time.
The better solution would be to have a local list for each thread, add the partial results to that list, and then merge the lists when all threads have finished. Probably @Øyvind Knobloch-Bråthen's second solution is the best one, assuming method(t) is the real CPU hog in this case.
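That per-thread-list pattern is built into Parallel.ForEach via its localInit/localFinally overload; here is a sketch, where method is a stand-in that just doubles its input:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

static int method(int t) => t * 2;               // stand-in for the slow method

var s = Enumerable.Range(0, 100).ToArray();
var r = new List<int>();
var gate = new object();

Parallel.ForEach(
    s,
    () => new List<int>(),                       // localInit: one list per worker thread
    (t, state, local) =>
    {
        local.Add(method(t));                    // no locking on the hot path
        return local;
    },
    local =>
    {
        lock (gate) r.AddRange(local);           // localFinally: merge once per thread
    });

Console.WriteLine(r.Count);                      // 100
```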
Modification to the correct answer for this question
change
tasks.Add(Task.Factory.StartNew(method(t)));
to
//solution will be the following code
tasks.Add(Task.Factory.StartNew(() => { method(t); }));
