Consider this piece of code, where there is some work being done within a for loop, and then a recursive call to process sub items. I wanted to convert DoSomething(item) and GetItems(id) to async methods, but if I await on them here, the for loop is going to wait for each iteration to finish before moving on, essentially losing the benefit of parallel processing. How could I improve the performance of this method? Is it possible to do it using async/await?
public void DoWork(string id)
{
    var items = GetItems(id); // takes time
    if (items == null)
        return;

    Parallel.ForEach(items, item =>
    {
        DoSomething(item); // takes time
        DoWork(item.subItemId);
    });
}
Instead of using Parallel.ForEach to loop over the items, you can create a sequence of tasks and then use Task.WhenAll to wait for them all to complete. As your code also involves recursion it gets slightly more complicated, and you need to combine DoSomething and DoWork into a single method, which I have aptly named DoIt:
async Task DoWork(String id) {
    var items = GetItems(id);
    if (items == null)
        return;
    var tasks = items.Select(DoIt);
    await Task.WhenAll(tasks);
}

async Task DoIt(Item item) {
    await DoSomething(item);
    await DoWork(item.subItemId);
}
Mixing Parallel.ForEach and async/await is a bad idea. Parallel.ForEach allows your code to execute in parallel, and for compute-intensive but parallelizable algorithms it gives you the best performance. async/await, on the other hand, allows your code to execute concurrently and, for instance, reuse threads that would otherwise be blocked on IO operations.
Simplified, Parallel.ForEach will set up as many threads as you have CPU cores on your computer and then partition the items you are iterating over to be executed across these threads. So Parallel.ForEach should be used once, at the bottom of your call stack, where it will fan the work out to multiple threads and wait for them to complete. Calling Parallel.ForEach recursively inside each of these threads is just crazy and will not improve performance at all.
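To make the contrast concrete, here is a minimal sketch (ComputeChecksum and DownloadAsync are hypothetical stand-ins for CPU-bound and IO-bound work, not methods from the question):

// CPU-bound: let Parallel.ForEach fan the work out across cores, once, at the bottom of the call stack.
void ChecksumAll(byte[][] blobs)
{
    Parallel.ForEach(blobs, blob => ComputeChecksum(blob));
}

// IO-bound: start the operations and await them all; no thread sits blocked while they run.
async Task DownloadAllAsync(IEnumerable<string> urls)
{
    await Task.WhenAll(urls.Select(url => DownloadAsync(url)));
}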
Recently I have seen several SO threads related to Parallel.ForEach mixed with async lambdas, but all the proposed answers were some kind of workaround.
Is there any way I could write:
List<int> list = new List<int>();
Parallel.ForEach(arrayValues, async (item) =>
{
    var x = await LongRunningIoOperationAsync(item);
    list.Add(x);
});
How can I ensure that list will contain all items from the lambdas executed in each iteration?
How will Parallel.ForEach generally work with async lambdas? If it hits an await, will it hand over its thread to the next iteration?
I assume the ParallelLoopResult.IsCompleted field is not the proper one, as it will return true when all iterations are executed, regardless of whether their actual lambda jobs are finished?
Recently I have seen several SO threads related to Parallel.ForEach mixed with async lambdas, but all the proposed answers were some kind of workaround.
Well, that's because Parallel doesn't work with async. And from a different perspective, why would you want to mix them in the first place? They do opposite things. Parallel is all about adding threads and async is all about giving up threads. If you want to do asynchronous work concurrently, then use Task.WhenAll. That's the correct tool for the job; Parallel is not.
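For reference, the "correct tool" version is roughly this (a sketch assuming LongRunningIoOperationAsync and arrayValues from the question; Task.WhenAll returns the results in the same order as the input):

var results = await Task.WhenAll(arrayValues.Select(LongRunningIoOperationAsync));
var list = results.ToList(); // all results, no locking needed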
That said, it sounds like you want to use the wrong tool, so here's how you do it...
How can I ensure that list will contain all items from the lambdas executed in each iteration?
You'll need to have some kind of a signal that some code can block on until the processing is done, e.g., CountdownEvent or Monitor. On a side note, you'll need to protect access to the non-thread-safe List<T> as well.
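A minimal sketch of that signalling approach, assuming LongRunningIoOperationAsync and arrayValues from the question (each async lambda signals the CountdownEvent when it finishes, and a lock guards the shared list):

var list = new List<int>();
using (var countdown = new CountdownEvent(arrayValues.Length))
{
    Parallel.ForEach(arrayValues, async item =>
    {
        try
        {
            var x = await LongRunningIoOperationAsync(item);
            lock (list)             // List<T> is not thread-safe
                list.Add(x);
        }
        finally
        {
            countdown.Signal();     // count down even if the operation throws
        }
    });

    countdown.Wait();               // block until every async lambda has signalled
}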
How will Parallel.ForEach generally work with async lambdas? If it hits an await, will it hand over its thread to the next iteration?
Since Parallel doesn't understand async lambdas, when the first await yields (returns) to its caller, Parallel will assume that iteration of the loop is complete.
I assume the ParallelLoopResult.IsCompleted field is not the proper one, as it will return true when all iterations are executed, regardless of whether their actual lambda jobs are finished?
Correct. As far as Parallel knows, it can only "see" the method up to the first await that returns to its caller. So it doesn't know when the async lambda is complete. It will also assume iterations are complete too early, which throws its partitioning off.
You don't need Parallel.For/ForEach here; you just need to await a list of tasks.
Background
In short, you need to be very careful with async lambdas, and with whether you are passing them as an Action or a Func<Task>.
Your problem is that Parallel.For / ForEach is not suited for the async and await pattern or IO-bound tasks. They are suited for CPU-bound workloads, which means they essentially take Action parameters and let the task scheduler create the tasks for you.
If you want to run multiple async tasks at the same time, use Task.WhenAll or a TPL Dataflow block (or something similar), which can deal effectively with both CPU-bound and IO-bound workloads; or, said more directly, they can deal with tasks, which is what an async method is.
Unless you need to do more inside your lambda (which you haven't shown), just use a Select and WhenAll:
var tasks = items.Select(LongRunningIoOperationAsync);
var results = await Task.WhenAll(tasks); // here is your list of int
If you do need to do more in the lambda, you can still use await:
var tasks = items.Select(async (item) =>
{
    var x = await LongRunningIoOperationAsync(item);
    // do other stuff
    return x;
});

var results = await Task.WhenAll(tasks);
Note: if you need the extended functionality of Parallel.ForEach (namely the options to control the maximum degree of concurrency), there are several approaches; however, Rx or TPL Dataflow might be the most succinct.
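For example, a rough sketch using TPL Dataflow's ActionBlock (this assumes LongRunningIoOperationAsync and items from above, the System.Threading.Tasks.Dataflow package, and an arbitrary concurrency limit of 4):

var results = new ConcurrentBag<int>();

var block = new ActionBlock<int>(
    async item => results.Add(await LongRunningIoOperationAsync(item)),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

foreach (var item in items)
    block.Post(item);

block.Complete();
await block.Completion; // completes only after every posted item has been processed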
I am going to start by saying that I am learning about multithreading at the moment, so it may be the case that not all I say is correct - please feel free to correct me as required. I do have a reasonable understanding of async and await.
My basic aim is as follows:
I have a body of code that currently takes about 3 seconds. I am trying to load some data at the start of the method that will be used right at the end. My plan is to load the data on a different thread right at the start - allowing the rest of the code to execute independently. Then, at the point that I need the data, the code will wait if the data is not loaded. So far this all seems to be working fine, just as I describe.
My question relates to what happens when I call a method that is async, within a parallel for loop, without awaiting it.
My code follows this structure:
public void MainCaller()
{
    List<int> listFromThread = null;
    var secondThread = Task.Factory.StartNew(() =>
    {
        listFromThread = GetAllLists().Result;
    });

    //Do some other stuff

    secondThread.Wait();
    //Do not pass this point until this thread has completed
}

public Task<List<int>> GetAllLists()
{
    var intList = new List<int>() { /*Whatever... */ };
    var returnList = new List<int>();

    Parallel.ForEach(intList, intEntry =>
    {
        var res = MyMethod().Result;
        returnList.AddRange(res);
    });

    return Task.FromResult(returnList);
}

private async Task<List<int>> MyMethod()
{
    var myList = await obtainList.ToListAsync();
    return myList;
}
Note that the Parallel.ForEach loop calls the async method, but does not await it, as the loop itself is not async.
This is a method that is used elsewhere, so it is valid that it is async. I know one option is to make a copy of this method that is not async, but I am trying to understand what will happen here.
My question is, can I be sure that when I reach secondThread.Wait(); the async part of the execution will be complete? E.g. will the Wait know to wait for the async part to complete, will the async mess up the Wait, or will they work together seamlessly?
It seems to me that it could be possible that, as the call to MyMethod is not awaited but there is an await within MyMethod, the parallel for loop could continue execution before the awaited call has completed?
Then I think that, as it is assigned by reference, once the assignment takes place the value will be the correct result.
This leads me to think that as long as the Wait knows to wait for the async to complete, then there is no problem - hence my question.
I guess this relates to my lack of understanding of Tasks?
I hope this is clear?
In your code there is no part that is executed asynchronously.
In MainCaller, you start a Task and immediately Wait for it to finish. This is a blocking operation which only introduces the extra overhead of calling GetAllLists in another Task.
In this Task you start a new Task (by calling GetAllLists) but immediately wait for this Task to finish by waiting for its Result (which is also blocking).
In the Task started by GetAllLists you have the Parallel.ForEach loop, which starts several new Tasks. Each of these 'for' Tasks will start another Task by calling MyMethod and immediately waiting for its result.
The net result is that your code executes completely synchronously. The only parallelism is introduced by the Parallel.ForEach loop.
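For contrast, a hedged sketch of what a genuinely asynchronous version could look like (the Async-suffixed names are hypothetical; the point is that GetAllLists awaits all the MyMethod calls instead of blocking on .Result):

public async Task MainCallerAsync()
{
    var listTask = GetAllListsAsync();

    //Do some other stuff

    //Do not pass this point until the lists have been loaded
    List<int> listFromThread = await listTask;
}

public async Task<List<int>> GetAllListsAsync()
{
    var intList = new List<int>() { /*Whatever... */ };

    // Start all the asynchronous calls, then wait for them without blocking a thread.
    var partialLists = await Task.WhenAll(intList.Select(intEntry => MyMethod()));
    return partialLists.SelectMany(list => list).ToList();
}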
Hint: a useful thread concerning this topic: Using async/await for multiple tasks
Additionally, your code contains a serious bug:
Each Task created by the Parallel.ForEach loop will eventually add its partial List to the returnList by calling AddRange. AddRange is not thread-safe, so you need some synchronisation mechanism (e.g. a lock), or there is the possibility that your returnList gets corrupted or does not contain all the results. See also: Is the List<T>.AddRange() thread safe?
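A minimal sketch of one way to guard the shared list inside the original GetAllLists (this keeps the blocking .Result call exactly as in the question and only addresses the thread-safety issue):

var returnList = new List<int>();
var sync = new object();

Parallel.ForEach(intList, intEntry =>
{
    var res = MyMethod().Result;
    lock (sync)                     // serialize access to the non-thread-safe list
        returnList.AddRange(res);
});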
I would like to do something like this:
public async Task MyMethod()
{
    // Do some preparation
    await Parallel.ForEachAsync(0, count, i => { /* Do some work */ });
    // do some finalization
}
However, I did not find an elegant way of doing so. I thought of a few approaches, but they are all sub-optimal:
1. Manually partitioning the range, creating tasks, and then using Task.WhenAll.
2. Using Task.Factory.StartNew(() => Parallel.For(...));. The problem is that it "wastes" a thread on the asynchronous task.
3. Using TPL Dataflow's ActionBlock and posting the integers one by one. The drawback is that it does not partition the range in a smart way like Parallel.For does, and works on each iteration one by one.
4. Manually using a Partitioner with Partitioner.Create, but it is less elegant. I want the framework to do intelligent partitioning for me.
You have a regular synchronous parallel loop that you'd like to invoke asynchronously (presumably to move it off the UI thread).
You can do this the same way you'd move any other CPU-bound work off the UI thread: using Task.Run:
public async Task MyMethod()
{
    // Do some preparation
    await Task.Run(() => Parallel.For(0, count, i => { /* Do some work */ }));
    // do some finalization
}
There is no thread "wasted" because Parallel.For will use the calling thread as one of its worker threads.
(This is recipe 7.4 "Async Wrappers for Parallel Code" in my book).
I am trying to wrap my head around how to handle multiple async/await calls in a foreach loop. I have around 20,000 rows of data that are processed by the foreach loop. Roughly my code is:
foreach (var item in data)
{
    if (ConditionA(item))
    {
        if (ConditionAB(item))
        {
            await CreateThingViaAPICall(item);
        }
        else
        {
            var result = await GetExistingRecord(item);
            var result2 = await GetOtherExistingRecord(result);
            var result3 = await GetOtherExistingRecord(result2);
            //Do processing
            ...
            await CreateThingViaAPICall();
        }
    }
    ... and so on
}
I've seen many posts saying the best way to use async in a loop is to build a list of tasks and then use Task.WhenAll. In my case I have Tasks that depend on each other as part of each iteration. How do I build up a list of tasks to execute in this case?
It's easiest if you break the processing of an individual item into a separate (async) method:
private async Task ProcessItemAsync(Item item)
{
    if (ConditionA(item))
    {
        if (ConditionAB(item))
        {
            await CreateThingViaAPICall(item);
        }
        else
        {
            var result = await GetExistingRecord(item);
            var result2 = await GetOtherExistingRecord(result);
            var result3 = await GetOtherExistingRecord(result2);
            //Do processing
            ...
            await CreateThingViaAPICall();
        }
    }
    ... and so on
}
Then process your collection like so:
var tasks = data.Select(ProcessItemAsync);
await Task.WhenAll(tasks);
This effectively wraps the multiple dependent Tasks required to process a single item into one Task, allowing those steps to happen sequentially while items of the collection itself are processed concurrently.
With tens of thousands of items, you may, for a variety of reasons, find that you need to throttle the number of Tasks running concurrently. Have a look at TPL Dataflow for this type of scenario.
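If you'd rather not take a Dataflow dependency, a simple alternative (not Dataflow itself, just a sketch using SemaphoreSlim with an arbitrary limit of 20 concurrent items) is:

var throttle = new SemaphoreSlim(20);

var tasks = data.Select(async item =>
{
    await throttle.WaitAsync();     // wait for a free slot
    try
    {
        await ProcessItemAsync(item);
    }
    finally
    {
        throttle.Release();         // free the slot for the next item
    }
});

await Task.WhenAll(tasks);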
If I'm not mistaken, the recommended way to use async/await in a foreach is to build a list of Tasks first and then call Task.WhenAll.
You're partly mistaken.
If you have multiple tasks that don't depend on each other, then it is indeed generally a very good idea to have those tasks happen in a WhenAll so that they can be scheduled together, giving better throughput.
If, however, each task depends on the result of the previous one, then this approach isn't viable. Instead you should just await them within a foreach.
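A minimal sketch of that dependent case (GetNextAsync, seed and steps are hypothetical; each call needs the previous result, so the awaits simply happen one after another inside the loop):

var current = seed;
foreach (var step in steps)
{
    // Each iteration must finish before the next one can start.
    current = await GetNextAsync(current, step);
}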
Indeed, this will work fine in any case; it's just suboptimal to have tasks wait on each other if they don't have to.
The ability to await tasks in a foreach is in fact one of the biggest gains that async/await has given us. Most code that uses await can be rewritten to use ContinueWith quite easily, if less elegantly, but loops were trickier, and if the actual end of the loop could only be found by examining the results of the tasks themselves, trickier again.
After a few hours of struggle I found a bug in my app. I considered the 2 functions below to have identical behavior, but it turned out they don't.
Can anyone tell me what's really going on under the hood, and why they behave in a different way?
public async Task MyFunction1(IEnumerable<Task> tasks) {
    await Task.WhenAll(tasks);
    Console.WriteLine("all done"); // happens AFTER all tasks are finished
}

public async Task MyFunction2(IEnumerable<Task> tasks) {
    foreach (var task in tasks) {
        await task;
    }
    Console.WriteLine("all done"); // happens BEFORE all tasks are finished
}
They'll function identically if all tasks complete successfully.
If you use WhenAll and any items fail, it still won't be completed until all of the items are finished, and it'll represent an AggregateException that wraps all errors from all tasks.
If you await each one then it'll complete as soon as it hits any item that fails, and it'll represent an exception for that one error, not any others.
The two also differ in that WhenAll will materialize the entire IEnumerable right at the start, before adding any continuations to other items. If the IEnumerable represents a collection of already existing and started tasks, then this isn't relevant, but if the act of iterating the enumerable creates and/or starts tasks, then materializing the sequence at the start would run them all in parallel, and awaiting each before fetching the next task would execute them sequentially. Below is an IEnumerable you could pass in that would behave as I've described here:
public static IEnumerable<Task> TaskGeneratorSequence()
{
    for (int i = 0; i < 10; i++)
        yield return Task.Delay(TimeSpan.FromSeconds(2));
}
Likely the most important functional difference is that Task.WhenAll can introduce concurrency when your tasks perform truly asynchronous operations, for example, IO. This may or may not be what you want depending on your situation.
For example, if your tasks are querying the database using the same EF DbContext, the next query would fire as soon as the first one is "in flight" which causes EF to blow up as it doesn't support multiple simultaneous queries using the same context.
That's because you're not awaiting each asynchronous operation individually. You're awaiting a task that represents the completion of all of those asynchronous operations. They can also be completed in any order.
However when you await each one individually in a foreach, you only fire the next task when the current one completes, preventing concurrency and ensuring serial execution.
A simple example demonstrating this behavior:
async Task Main()
{
var tasks = new []{1, 2, 3, 4, 5}.Select(i => OperationAsync(i));
foreach(var t in tasks)
{
await t;
}
await Task.WhenAll(tasks);
}
static Random _rand = new Random();
public async Task OperationAsync(int number)
{
// simulate an asynchronous operation
// taking anywhere between 100 to 3000 milliseconds
await Task.Delay(_rand.Next(100, 3000));
Console.WriteLine(number);
}
You'll see that no matter how long OperationAsync takes, with foreach you always get 1, 2, 3, 4, 5 printed. But with Task.WhenAll they are executed concurrently and printed in their completion order.