How can I wait till the Parallel.ForEach completes - c#

I'm using TPL in my current project and using Parallel.Foreach to spin many threads. The Task class contains Wait() to wait till the task gets completed. Like that, how I can wait for the Parallel.ForEach to complete and then go into executing next statements?

You don't have to do anything special, Parallel.Foreach() will wait until all its branched tasks are complete. From the calling thread you can treat it as a single synchronous statement and for instance wrap it inside a try/catch.
Update:
The old Parallel class methods are not a good fit for async (Task based) programming. But starting with dotnet 6 we can use Parallel.ForEachAsync()
await Parallel.ForEachAsync(items, (item, cancellationToken) =>
{
await ...
});
There are a few overloads available and the 'body' method should return a ValueTask.

You don't need that with Parallel.Foreach: it only executes the foreach in as many thread as there are processors available, but it returns synchronously.
More information can be found here

As everyone here said - you dont need to wait for it. What I can add from my experience:
If you have an async body to execute and you await some async calls inside, it just ran through my code and did not wait for anything. So I just replaced the await with .Result - then it worked as intended. I couldnt find out though why is that so :/

if you are storing results from the tasks in a List, make sure to use a thread-safe data structure such as ConcurrentBag, otherwise, some results would be missing because of concurrent write issues.

I believe that you can use IsCompleted like follows:
if(Parallel.ForEach(files, f => ProcessFiles(f)).IsCompleted)
{
// DO STUFF
}

Related

Two ways of calling asynchronous method. C#

Are there differences between this two approaches? Or program runs similarly in this two cases? If there are differences could you say what are these differences.
First approach:
Task myTask = MyFunctionAsync();
await myTask;
Second approach:
await MyFunctionAsync();
Short version: "not really, at least not in an interesting way"
Long version: awaitables aren't limited to Task/Task<T>, so it is possible (trivially so, in fact) to create code that compiles fine with:
await MyFunctionAsync();
but doesn't compile with:
Task myTask = MyFunctionAsync();
await myTask;
simply because MyFunctionAsync() returns something that isn't a task. ValueTask<int> would be enough for this, but you can make exotic awaitables if you want. But: if we replace Task with var, i.e.
var myTask = MyFunctionAsync();
await myTask;
then now the only difference is that we can refer to myTask at other points in the code, if we want to. This isn't exactly uncommon; the two main scenarios being
combining multiple checks over concurrent code, perhaps using WhenAny or WhenAll
(usually in the case of ValueTask[<T>]) checking whether the awaitable completed synchronously, to avoid the state machine overhead in the synchronous case
They are effectively the same. The difference is that the first way lets you do more steps before you wait for a response. So you could start many tasks concurrently in the first way, and then await them all together with await Task.WhenAll(myListOfTasks)
For example:
var myTasks = myEmployees.Select(e => ProcessPayrollAsync(e));
await Task.WhenAll(myTasks);
I would use the first way if you need to for concurrency and the second way if its a simple case because its shorter.
In that particular case, the 2 forms of code are executed in a similar way. Homewer, consider this:
public async Task<int> CalculateResult(InputData data) {
// This queues up the work on the threadpool.
var expensiveResultTask = Task.Run(() => DoExpensiveCalculation(data));
// Note that at this point, you can do some other work concurrently,
// as CalculateResult() is still executing!
// Execution of CalculateResult is yielded here!
var result = await expensiveResultTask;
return result;
}
As the comments in the code above point out, between a task is running and the await call you can execute any other concurrent code.
For more information, read this article.
What really helped me to understand async-await was the cook analogy descrived by Eric Lippert in this interview. Search somewhere in the middle for async-await.
Here he describes a cook making breakfast. Once he put on the kettle to boil water for the tea, he doesn't wait idly for the water to cook. Instead he puts the bread in the toaster, tea in the teapot and starts slicing tomatoes: whenever he has to wait for another apparatus or other cook to finish his job, he doesn't wait idly, but starts the next task, until he needs the results for one of tasks previous tasks.
Async await, does the same: whenever another process has to do something, where your thread can do nothing but wait for the other process to finish, the thread can look around to see if it can do other things instead. You typically see async-await when another lengthy process is involved: Writing data to a file, Querying data from a database or from the internet. Whenever your thread has to do this, the thread can order the other process to do something while it continues to do other things:
Task<string> taskReadFile = ReadMyFileAsync(...);
// not awaiting, we can do other things:
DoSomethingElse();
// now we need the result of the file:
string fetchedData = await taskReadFile;
So what happens. ReadMyFileAsync is async, so you know that somewhere deep inside it, the method awaits. In fact, your compiler will warn you if you forget to await.
Once your thread sees the await, it knows that the results of the await are needed, so it can't continue. Instead it goes up the call stack to continue processing (in my example DoSomethingElse()), until it sees an await. It goes up the call stack again and continues processing, etc.
So in fact there is no real difference between your first and your second method. You can compare it with:
double x = Math.Sin(4.0)
versus
double a = 4.0;
double x = Math.Sin(a);
Officially the only difference is that after these statements you can still use a. Similarly you can use information from the task after the await:
Task<MyData> myTask = FetchMyDataAsync(...);
MyData result = await myTask;
// if desired you can investigate myTask
if (result == null)
{
// why is it null, did someone cancel my task?
if (Task.IsCanceled)
{
Yell("Hey, who did cancel my task!");
}
}
But most of the times you are not interested in the task. If you don't have anything else to do while the task is executing, I'd just await for it:
MyData fetchedData = await FetchMyDataAsync(...)

Parallel.ForEach with async lambda waiting forall iterations to complete

recently I have seen several SO threads related to Parallel.ForEach mixed with async lambdas, but all proposed answers were some kind of workarounds.
Is there any way how could I write:
List<int> list = new List<int>[]();
Parallel.ForEach(arrayValues, async (item) =>
{
var x = await LongRunningIoOperationAsync(item);
list.Add(x);
});
How can I ensure that list will contain all items from all iterations executed withing lambdas in each iteration?
How will generally Parallel.ForEach work with async lambdas, if it hit await will it hand over its thread to next iteration?
I assume ParallelLoopResult IsCompleted field is not proper one, as it will return true when all iterations are executed, no matter if their actual lambda jobs are finished or not?
recently I have seen several SO threads related to Parallel.ForEach mixed with async lambdas, but all proposed answers were some kind of workarounds.
Well, that's because Parallel doesn't work with async. And from a different perspective, why would you want to mix them in the first place? They do opposite things. Parallel is all about adding threads and async is all about giving up threads. If you want to do asynchronous work concurrently, then use Task.WhenAll. That's the correct tool for the job; Parallel is not.
That said, it sounds like you want to use the wrong tool, so here's how you do it...
How can I ensure that list will contain all items from all iterations executed withing lambdas in each iteration?
You'll need to have some kind of a signal that some code can block on until the processing is done, e.g., CountdownEvent or Monitor. On a side note, you'll need to protect access to the non-thread-safe List<T> as well.
How will generally Parallel.ForEach work with async lambdas, if it hit await will it hand over its thread to next iteration?
Since Parallel doesn't understand async lambdas, when the first await yields (returns) to its caller, Parallel will assume that interation of the loop is complete.
I assume ParallelLoopResult IsCompleted field is not proper one, as it will return true when all iterations are executed, no matter if their actual lambda jobs are finished or not?
Correct. As far as Parallel knows, it can only "see" the method to the first await that returns to its caller. So it doesn't know when the async lambda is complete. It also will assume iterations are complete too early, which throws partitioning off.
You don't need Parallel.For/ForEach here you just need to await a list of tasks.
Background
In short you need to be very careful about async lambdas, and if you are passing them to an Action or Func<Task>
Your problem is because Parallel.For / ForEach is not suited for the async and await pattern or IO bound tasks. They are suited for cpu bound workloads. Which means they essentially have Action parameters and let's the task scheduler create the tasks for you
If you want to run multiple async tasks at the same time use Task.WhenAll , or a TPL Dataflow Block (or something similar) which can deal effectively with both CPU bound and IO bound works loads, or said more directly, they can deal with tasks which is what an async method is.
Unless you need to do more inside of your lambda (for which you haven't shown), just use aSelect and WhenAll
var tasks = items .Select(LongRunningIoOperationAsync);
var results = await Task.WhenAll(tasks); // here is your list of int
If you do, you can still use the await,
var tasks = items.Select(async (item) =>
{
var x = await LongRunningIoOperationAsync(item);
// do other stuff
return x;
});
var results = await Task.WhenAll(tasks);
Note : If you need the extended functionality of Parallel.ForEach (namely the Options to control max concurrency), there are several approach, however RX or DataFlow might be the most succinct

Effects of async within a parallel for loop

I am going to start by saying that I am learning about mulithreading at the moment so it may be the case that not all I say is correct - please feel free to correct me as required. I do have a reasonable understanding of async and await.
My basic aim is as follows:
I have a body of code that currently takes about 3 seconds. I am trying to load some data at the start of the method that will be used right at the end. My plan is to load the data on a different thread right at the start - allowing the rest of the code to execute independently. Then, at the point that I need the data, the code will wait if the data is not loaded. So far this is all seems to be working fine and as I describe.
My question relates to what happens when I call a method that is async, within a parallel for loop, without awaiting it.
My code follows this structure:
public void MainCaller()
{
List<int> listFromThread = null;
var secondThread = Task.Factory.StartNew(() =>
{
listFromThread = GetAllLists().Result;
});
//Do some other stuff
secondThread.Wait();
//Do not pass this point until this thread has completed
}
public Task<List<int>> GetAllLists()
{
var intList = new List<int>(){ /*Whatever... */};
var returnList = new List<int>();
Parallel.ForEach(intList, intEntry =>
{
var res = MyMethod().Result;
returnList.AddRange(res);
});
return Task.FromResult(returnList);
}
private async Task<List<int>> MyMethod()
{
var myList = await obtainList.ToListAsync();
}
Note the Parallel for Loop calls the async method, but does not await it as it is not async itself.
This is a method that is used elsewhere, so it is valid that it is async. I know one option is to make a copy of this method that is not async, but I am trying to understand what will happen here.
My question is, can I be sure that when I reach secondThread.Wait(); the async part of the execution will be complete. Eg will wait to know wait for the async part to complete, or will async mess up the wait, or will it work seamlessly together?
It seems to me it could be possible that as the call to MyMethod is not awaited, but there is an await within MyMethod, the parallel for loop could continue execution before the awaited call has completed?
Then I think, as it is assigning it by reference, then once the assigning takes place, the value will be the correct result.
This leads me to think that as long as the wait will know to wait for the async to complete, then there is no problem - hence my question.
I guess this relates to my lack of understanding of Tasks?
I hope this is clear?
In your code there is no part that is executed asynchrounously.
In MainCaller, you start a Task and immediately Wait for it to finished.
This is a blocking operation which only introduces the extra overhead of calling
GetAllLists in another Task.
In this Task you call You start a new Task (by calling GettAllLists) but immediately
wait for this Task to finish by waiting for its Result (which is also blocking).
In the Task started by GetAllLists you have the Parallel.Foreach loop which starts
several new Tasks. Each of these 'for' Tasks will start another Task by calling
MyMethod and immediately waiting for its result.
The net result is that your code completely executes synchronously. The only parallelism is introduced in the Parallel.For loop.
Hint: a usefull thread concerning this topic: Using async/await for multiple tasks
Additionally your code contains a serious bug:
Each Task created by the Parallel.For loop will eventually add its partial List to the ReturnList by calling AddRange. 'AddRange' is not thread safe, so you need to have some synchronisation mechanism (e.g. 'Lock') or there is the possibility that your ReturnList gets corrupted or does not contain all the results. See also: Is the List<T>.AddRange() thread safe?

Task.WhenAny - What happens with remaining running tasks?

I have the following code:
List<Task<bool>> tasks = tasksQuery.ToList();
while (tasks.Any())
{
Task<bool> completedTask = await Task.WhenAny(tasks);
if (await completedTask)
return true;
tasks.Remove(completedTask);
}
It launches tasks in parallel. When first completed task returns true, the method returns true.
My questions are:
What happens with all remaining tasks that have been launched and probably still running in the background?
Is this the right approach to execute code that is async, parallel and should return after the first condition occurs, or it is better to launch them one by one and await singularly?
Incidentally, I am just reading Concurrency in C# CookBook, by Stephen Cleary, and I can refer to some parts of the book here, I guess.
From Recipe 2.5 - Discussion, we have
When the first task completes, consider whether to cancel the remaining tasks. If the other tasks are not canceled but are also never awaited, then they are abandoned. Abandoned tasks will run to completion, and their results will be ignored. Any exceptions from those abandoned tasks will also be ignored.
Another antipattern for Task.WhenAny is handling tasks as they complete. At first it seems like a reasonable approach to keep a list of tasks and remove each task from the list as it completes. The problem with this approach is that it executes in O(N^2) time, when an O(N) algorithm exists.
Besides that, I think WhenAny is surely the right approach. Just consider the following Leonid approach of passing the same CancellationToken for all tasks and cancel them after the first one returns. And even that, only if the cost of these operations is actually taxing the system.

Parallel.Invoke() without waiting on execution to finish

Please know that I'm aware that Parallel.Invoke() is meant for task synchronization. My question is this:
Is there a way to call an anonymous method using something like Parallel.Invoke in which the call does NOT wait on the execution to finish?
I thought the whole point of parallel execution (or parallel invokation) is to NOT have to wait for the task to finish. If you want to wait for a piece of code to finish executing, instead of using Parallel.Invoke, why not just call the code directly? I guess I just don't understand the point of Parallel.Invoke. The documentation just says what it does, but doesn't mention any use-cases when this would be more useful than just calling the code directly.
If you want to wait for a piece of code to finish executing, instead of using Parallel.Invoke, why not just call the code directly?
Well normally you'd call Parallel.Invoke with multiple pieces of work. If you execute those pieces of work in series, it'll (probably) take longer than executing them in parallel.
"Execute in parallel" isn't the same as "execute in the background" - you appear to be looking for the latter, but that's not what Parallel.Invoke is about.
If you just want to start tasks, use Task.Run (or Task.Factory.StartNew prior to .NET 4.5). Parallel.Invoke is specifically for executing a bunch of actions in parallel, but then waiting for those parallel actions to complete.
As a concrete example, you can perform a sort by partitioning, then recursively sorting both sides of the pivot in parallel. Doing this will make use of multiple cores, but you would usually still want to wait until that whole sort had completed before you proceed.
Simply wrap your Parallel.Invoke call in a Task if you don't want to wait for completion.
Task.Factory.StartNew( () =>
{
Parallel.Invoke( <one or more actions> );
} );
Are you looking for something like this?
async Task ParallelInvokeAsync(Action[] actions)
{
var tasks = actions.Select(a => Task.Run(a));
await Task.WhenAll(tasks);
}
Background execution:
ParallelInvokeAsync(actions);
MessageBox.Show("I'm working");
Background execution with blocking, very similar to Parallel.Invoke:
ParallelInvokeAsync(actions).Wait();
MessageBox.Show("I'm dome working");

Categories

Resources