ForEach lambda async vs Task.WhenAll - c#

I have an async method like this:
private async Task SendAsync(string text) {
...
}
I also have to use this method one time for each item in a List:
List<string> textsToSend = new Service().GetMessages();
Currently my implementation is this:
List<string> textsToSend = new Service().GetMessages();
List<Task> tasks = new List<Task>(textsToSend.Count);
textsToSend.ForEach(t => tasks.Add(SendAsync(t)));
await Task.WhenAll(tasks);
With this code, I get a Task for each message that runs async the sending method.
However, I don't know if is there any different between my implementation and this one:
List<string> textsToSend = new Service().GetMessages();
textsToSend.ForEach(async t => await SendAsync(t));
In the second one, I don't have the List<Task> allocation, but I think that the first one launch all Task in parallel and the second sample, one by one.
Could you help me to clarify if is there any different between the first and second samples?
PD: I also know that C#8 supports foreach async, however I'm using C# 7

You don't even need a list, much less ForEach to execute multiple tasks and await all of them. In any case, ForEach is just a convenience function that uses `foreach.
To execute some async calls concurrently bases on a list of inputs all you need is Enumerable.Select. To await all of them to complete you only need Task.WhenAll :
var tasks=textsToSend.Select(text=>SendAsync(text));
await Task.WhenAll(tasks);
LINQ and IEnumerable in general use lazy evaluation which means Select's code won't be executed until the returned IEnumerable is iterated. In this case it doesn't matter because it's iterated in the very next line. If one wanted to force all tasks to start a call to ToArray() would be enough, eg :
var tasks=textsToSend.Select(SendAsync).ToArray();
If you wanted to execute those async calls sequentially, ie one after the other, you could use a simple foreach. There's no need for C# 8's await foreach :
foreach(var text in textsToSend)
{
await SendAsync(text);
}
The Bug
This line is simply a bug :
textsToSend.ForEach(async t => await SendAsync(t));
ForEach doesn't know anything about tasks so it never awaits for the generated tasks to complete. In fact, the tasks can't be awaited at all. The async t syntax creates an async void delegate. It's equivalent to :
async void MyMethod(string t)
{
await SendAsync(t);
}
textToSend.ForEach(t=>MyMethod(t));
This brings all the problems of async void methods. Since the application knows nothing about those async void calls, it could easily terminate before those methods complete, resulting in NREs, ObjectDisposedExceptions and other weird problems.
For reference check David Fowler's Implicit async void delegates
C# 8 and await foreach
C# 8's IAsyncEnumerable would be useful in the sequential case, if we wanted to return the results of each async operation in an iterator, as soon as we got them.
Before C# 8 there would be no way to avoid awaiting for all results, even with sequential execution. We'd have to collect all of them in a list. Assuming each operation returned a string, we'd have to write :
async Task<List<string> SendTexts(IEnumerable<string> textsToSend)
{
var results=new List<string>();
foreach(var text in textsToSend)
{
var result=await SendAsync(text);
results.Add(result);
}
}
And use it with :
var results=await SendTexts(texts);
In C# 8 we can return individual results and use them asynchronously. We don't need to cache the results before returning them either :
async IAsyncEmumerable<string> SendTexts(IEnumerable<string> textsToSend)
{
foreach(var text in textsToSend)
{
var result=await SendAsync(text);
yield return;
}
}
await foreach(var result in SendTexts(texts))
{
...
}
await foreach is only needed to consume the IAsyncEnumerable result, not produce it

that the first one launch all Task in parallel
Correct. And await Task.WhenAll(tasks); waits for all messages are sent.
The second one also sends messages in parallel but doesn't wait for all messages are sent since you don't await any task.
In your case:
textsToSend.ForEach(async t => await SendAsync(t));
is equivalent to
textsToSend.ForEach(t => SendAsync(t));
the async t => await SendAsync(t) delegate may return the task (it depends on assignable type) as SendAsync(t). In case of passing it to ForEach both async t => await SendAsync(t) and SendAsync(t) will be translated to Action<string>.
Also the first code will throw an exception if any SendAsync throws an excepion. In the second code any exception will be ignored.

Related

How to write an asynchronous method WITHOUT using Task.Delay OR await

I'm trying to learn to write my own asynchronous methods, but I'm having difficulty, because ALL of the millions of examples that I have seen online ALL use await Task.Delay inside the custom async method and I neither want to add a delay into my code, nor have any other async method to call in its place.
Let's use a simple example, where I want to create a new collection of objects, with only two properties, from a huge existing collection of objects, that each have a great many properties. Let's say this is my synchronous code:
public List<SomeLightType> ToLightCollection(List<SomeType> collection)
{
List<SomeLightType> lightCollection = new()
foreach (SomeType item in collection)
{
lightCollection.Add(new SomeLightType(item.Id, item.Name));
}
return lightCollection;
}
To make this method asynchronous, do I just need to wrap it in a Task.Run, add the async keyword and suffix on the method name, and change the return type, as follows?:
public Task<List<SomeLightType>> ToLightCollectionAsync(List<SomeType> collection)
{
List<SomeLightType> lightCollection = new()
Task.Run(() =>
{
foreach (SomeType item in collection)
{
lightCollection.Add(new SomeLightType(item.Id, item.Name));
}
});
return lightCollection;
}
Or do I also need to await the return of the Task inside the method? (The compiler gave me a warning until I added await.):
public async Task<List<SomeLightType>> ToLightCollectionAsync(List<SomeType> collection)
{
List<SomeLightType> lightCollection = new()
await Task.Run(() =>
{
foreach (SomeType item in collection)
{
lightCollection.Add(new SomeLightType(item.Id, item.Name));
}
});
return lightCollection;
}
EDIT:
Oh yes, I have just realised that I need to await this operation, otherwise the empty collection will be returned before it is populated. But still, is this the correct way to make this code run asynchronously?
ALL of the millions of examples that I have seen online ALL use await Task.Delay inside the custom async method and I neither want to add a delay into my code, nor have any other async method to call in its place.
Task.Delay is commonly used as a "placeholder" meaning "replace this with your actual asynchronous work".
I'm trying to learn to write my own asynchronous methods
Asynchronous code begins at the "other end". The most common example is with an I/O operation: you can make this asynchronous instead of blocking the calling thread. At the lowest level, this is commonly done using a TaskCompletionSource<T>, which creates a Task<T> you can return immediately, and then later when the operation completes, you can use the TaskCompletionSource<T> to complete the Task<T>.
However, as you state in the comments:
I definitely want that method to run asynchronously, as it currently takes several minutes... this is a WPF application
What you really want is not asynchronous code; you want to run some code on a background thread so it doesn't block the UI thread. The code being run is CPU-bound and has no I/O to do, so it's just going to run on a thread pool thread instead of actually being asynchronous.
Let's use a simple example... To make this method asynchronous...
To run this code on a background thread, you would use Task.Run. However, I recommend that you do not implement this method using Task.Run. If you do, then you have a method that looks asynchronous but is not actually asynchronous; it's just running synchronously on a thread pool thread - what I call "fake asynchronous" (it has an asynchronous signature but is not actually asynchronous).
IMO, it's cleaner to keep your business logic synchronous, and in this case since you want to free up the UI thread, have the UI code call it using Task.Run:
// no change
public List<SomeLightType> ToLightCollection(List<SomeType> collection)
{
List<SomeLightType> lightCollection = new()
foreach (SomeType item in collection)
{
lightCollection.Add(new SomeLightType(item.Id, item.Name));
}
return lightCollection;
}
async void Button_Click(...)
{
var source = ...
var lights = await Task.Run(() => ToLightCollection(source));
... // Do something with lights
}
Task.Run is for CPU-Bound work (see learn.microsoft.com - Async in depth.
You can avoid await Task.Run() scenarios if you return the created task directly:
public Task<List<SomeLightType>> ToLightCollectionAsync(List<SomeType> collection) => Task.Run(() =>
{
List<SomeLightType> lightCollection = new();
// Do CPU bound work
foreach (SomeType item in collection)
{
lightCollection.Add(new SomeLightType(item.Id, item.Name));
}
return lightCollection;
});
Now the caller can await for the result in an async Method to keep your UI responsive:
public async Task CallerMethod()
{
// ...
var result = await ToLightCollectionAsync(collection);
}
You also have the opportunity to perform some work during this computation.
public async Task CallerMethod()
{
var task = ToLightCollectionAsync(collection);
// Do some other work
var result = await task;
}

Nested async methods in a Parallel.ForEach

I have a method that runs multiple async methods within it. I have to iterate over a list of devices, and pass the device to this method. I am noticing that this is taking a long time to complete so I am thinking of using Parallel.ForEach so it can run this process against multiple devices at the same time.
Let's say this is my method.
public async Task ProcessDevice(Device device) {
var dev = await _deviceService.LookupDeviceIndbAsNoTracking(device);
var result = await DoSomething(dev);
await DoSomething2(dev);
}
Then DoSomething2 also calls an async method.
public async Task DoSomething2(Device dev) {
foreach(var obj in dev.Objects) {
await DoSomething3(obj);
}
}
The list of devices continuously gets larger over time, so the more this list grows, the longer it takes the program to finish running ProcessDevice() against each device. I would like to process more than one device at a time. So I have been looking into using Parallel.ForEach.
Parallel.ForEach(devices, async device => {
try {
await ProcessDevice(device);
} catch (Exception ex) {
throw ex;
}
})
It appears that the program is finishing before the device is fully processed. I have also tried creating a list of tasks, and then foreach device, add a new task running ProcessDevice to that list and then awaiting Task.WhenAll(listOfTasks);
var listOfTasks = new List<Task>();
foreach(var device in devices) {
var task = Task.Run(async () => await ProcessDevice(device));
listOfTasks.Add(task);
}
await Task.WhenAll(listOfTasks);
But it appears that the task is marked as completed before ProcessDevice() is actually finished running.
Please excuse my ignorance on this issue as I am new to parallel processing and not sure what is going on. What is happening to cause this behavior and is there any documentation that you could provide that could help me better understand what to do?
You can't mix async with Parallel.ForEach. Since your underlying operation is asynchronous, you'd want to use asynchronous concurrency, not parallelism. Asynchronous concurrency is most easily expressed with WhenAll:
var listOfTasks = devices.Select(ProcessDevice).ToList();
await Task.WhenAll(listOfTasks);
In your last example there's a few problems:
var listOfTasks = new List<Task>();
foreach (var device in devices)
{
await Task.Run(async () => await ProcessDevice(device));
}
await Task.WhenAll(listOfTasks);
Doing await Task.Run(async () => await ProcessDevice(device)); means you are not moving to the next iteration of the foreach loop until the previous one is done. Essentially, you're still doing them one at a time.
Additionally, you aren't adding any tasks to listOfTasks so it remains empty and therefore Task.WhenAll(listOfTasks) completes instantly because there's no tasks to await.
Try this:
var listOfTasks = new List<Task>();
foreach (var device in devices)
{
var task = Task.Run(async () => await ProcessDevice(device))
listOfTasks.Add(task);
}
await Task.WhenAll(listOfTasks);
I can explain the problem with Parallel.ForEach. An important thing to understand is that when the await keyword acts on an incomplete Task, it returns. It will return its own incomplete Task if the method signature allows (if it's not void). Then it is up to the caller to use that Task object to wait for the job to finish.
But the second parameter in Parallel.ForEach is an Action<T>, which is a void method, which means no Task can be returned, which means the caller (Parallel.ForEach in this case) has no way to wait until the job has finished.
So in your case, as soon as it hits await ProcessDevice(device), it returns and nothing waits for it to finish so it starts the next iteration. By the time Parallel.ForEach is finished, all it has done is started all the tasks, but not waited for them.
So don't use Parallel.ForEach with asynchronous code.
Stephen's answer is more appropriate. You can also use WSC's answer, but that can be dangerous with larger lists. Creating hundreds or thousands of new threads all at once will not help your performance.
not very sure it this if what you are asking for, but I can give example of how we start async process
private readonly Func<Worker> _worker;
private void StartWorkers(IEnumerable<Props> props){
Parallel.ForEach(props, timestamp => { _worker.Invoke().Consume(timestamp); });
}
Would recommend reading about Parallel.ForEach as it will do some part for you.

Running async methods in parallel

I've got an async method, GetExpensiveThing(), which performs some expensive I/O work. This is how I am using it:
// Serial execution
public async Task<List<Thing>> GetThings()
{
var first = await GetExpensiveThing();
var second = await GetExpensiveThing();
return new List<Thing>() { first, second };
}
But since it's an expensive method, I want to execute these calls in in parallel. I would have thought moving the awaits would have solved this:
// Serial execution
public async Task<List<Thing>> GetThings()
{
var first = GetExpensiveThing();
var second = GetExpensiveThing();
return new List<Thing>() { await first, await second };
}
That didn't work, so I wrapped them in some tasks and this works:
// Parallel execution
public async Task<List<Thing>> GetThings()
{
var first = Task.Run(() =>
{
return GetExpensiveThing();
});
var second = Task.Run(() =>
{
return GetExpensiveThing();
});
return new List<Thing>() { first.Result, second.Result };
}
I even tried playing around with awaits and async in and around the tasks, but it got really confusing and I had no luck.
Is there a better to run async methods in parallel, or are tasks a good approach?
Is there a better to run async methods in parallel, or are tasks a good approach?
Yes, the "best" approach is to utilize the Task.WhenAll method. However, your second approach should have ran in parallel. I have created a .NET Fiddle, this should help shed some light. Your second approach should actually be running in parallel. My fiddle proves this!
Consider the following:
public Task<Thing[]> GetThingsAsync()
{
var first = GetExpensiveThingAsync();
var second = GetExpensiveThingAsync();
return Task.WhenAll(first, second);
}
Note
It is preferred to use the "Async" suffix, instead of GetThings and GetExpensiveThing - we should have GetThingsAsync and GetExpensiveThingAsync respectively - source.
Task.WhenAll() has a tendency to become unperformant with large scale/amount of tasks firing simultaneously - without moderation/throttling.
If you are doing a lot of tasks in a list and wanting to await the final outcome, then I propose using a partition with a limit on the degree of parallelism.
I have modified Stephen Toub's blog elegant approach to modern LINQ:
public static Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> funcBody, int maxDoP = 4)
{
async Task AwaitPartition(IEnumerator<T> partition)
{
using (partition)
{
while (partition.MoveNext())
{
await Task.Yield(); // prevents a sync/hot thread hangup
await funcBody(partition.Current);
}
}
}
return Task.WhenAll(
Partitioner
.Create(source)
.GetPartitions(maxDoP)
.AsParallel()
.Select(p => AwaitPartition(p)));
}
How it works is simple, take an IEnumerable - dissect it into evenish partitions and the fire a function/method against each element, in each partition, at the same time. No more than one element in each partition at anyone time, but n Tasks in n partitions.
Extension Usage:
await myList.ParallelForEachAsync(myFunc, Environment.ProcessorCount);
Edit:
I now keep some overloads in a repository on Github if you need more options. It's in a NuGet too for NetStandard.
Edit 2: Thanks to comments from Theodor below, I was able to mitigate poorly written Async Tasks from blocking parallelism by using await Task.Yield();.
You can your the Task.WhenAll, which returns when all depending tasks are done
Check this question here for reference
If GetExpensiveThing is properly asynchronous (meaning it doesn't do any IO or CPU work synchronously), your second solution of invoking both methods and then awaiting the results should've worked. You could've also used Task.WhenAll.
However, if it isn't, you may get better results by posting each task to the thread-pool and using the Task.WhenAll combinator, e.g.:
public Task<IList<Thing>> GetThings() =>
Task.WhenAll(Task.Run(() => GetExpensiveThing()), Task.Run(() => GetExpensiveThing()));
(Note I changed the return type to IList to avoid awaits altogether.)
You should avoid using the Result property. It causes the caller thread to block and wait for the task to complete, unlike await or Task.WhenAll which use continuations.

How to wait the result of async operations without await?

void A()
{
foreach (var document in documents)
{
var res = records.BulkWriteAsync(operationList, writeOptions); // res is Task<BulkWriteResult<JobInfoRecord>>
}
}
After foreach I would like to wait the result of all BulkWriteAsync, how to do this? I don't want to mark A() as async and do the following
await records.BulkWriteAsync(operationList, writeOptions);
Is it good solution?
void A()
{
var tasks = new List<Task<BulkWriteResult<JobInfoRecord>>>();
foreach (var document in documents)
{
var task = records.BulkWriteAsync(operationList, writeOptions);
tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
}
I call A() in try catch if I will mark public async void A() as async I never be in catch
Well, first you want a Task that represents all the operations. The simplest way to do this is with a bit of LINQ:
Task.WhenAll(documents.Select(i => records.BulkWriteAsync(...)));
Then, you ideally want to await that task. If that isn't possible, you can try
task.GetAwaiter().GetResult();
However, make sure that none of the tasks have thread affinity - that's a great way to get a deadlock. Waiting for a task on the UI thread while the task itself needs the UI thread is a typical example.
The whole point of await is that it allows you to handle asynchronous code as if it were synchronous. So from the outside, it appears as if you never left the method until you actually get to a return (or the end of the method). For this to work, however, your method must return a Task (or Task<T>), and the callee must await your method in turn.
So a code like this:
try
{
tasks = Task.WhenAll(documents.Select(i => ...));
await tasks;
}
catch (Exception ex)
{
// Handle the exception
}
will appear to run completely synchronously, and all exceptions will be handled as usual (though since we're using Task.WhenAll, some will be wrapped in AggregateException).
However, this isn't actually possible to handle with the way .NET and C# is built, so the C# compiler cheats - await is basically a return that gives you a promise of the result you'll get in the future. And when that happens, the control returns back to where the await left the last time. Task is that promise - if you use async void, there's no way for the callee to know what's happening, and it has no option but to continue as if the asynchronous method was a run-and-forget method. If you use async Task, you can await the async method and everything "feels" synchronous again. Don't break the chain, and the illusion is perfect :)

async within a LINQ code - Clarification?

Almost every SO's answer regarding this topic , states that :
LINQ doesn't work perfectly with async
Also :
I recommend that you not think of this as "using async within LINQ"
But in Stephen's book there is a sample for :
Problem: You have a collection of tasks to await, and you want to do some
processing on each task after it completes. However, you want to do
the processing for each one as soon as it completes, not waiting for
any of the other tasks.
One of the recommended solutions was :
static async Task<int> DelayAndReturnAsync(int val)
{
await Task.Delay(TimeSpan.FromSeconds(val));
return val;
}
// This method now prints "1", "2", and "3".
static async Task ProcessTasksAsync()
{
// Create a sequence of tasks.
Task<int> taskA = DelayAndReturnAsync(2);
Task<int> taskB = DelayAndReturnAsync(3);
Task<int> taskC = DelayAndReturnAsync(1);
var tasks = new[] { taskA, taskB, taskC };
var processingTasks = tasks.Select(async t =>
{
var result = await t;
Trace.WriteLine(result);
}).ToArray();
// Await all processing to complete
await Task.WhenAll(processingTasks);
}
Question #1:
I don't understand why now async inside a LINQ statement - does work . Didn't we just say "don't think about using async within LINQ" ?
Question #2:
When the control reaches the await t here — What is actually happen? Does the control leaves the ProcessTasksAsync method ? or does it leaves the anonymous method and continue the iteration ?
I don't understand why now async inside a LINQ statement - does work . Didn't we just say "don't think about using async within LINQ" ?
async mostly doesn't work with LINQ because IEnumerable<T> extensions don't always infer the delegate type properly and defer to Action<T>. They have no special understanding of the Task class. This means the actual async delegate becomes async void, which is bad. In the case of Enumerable.Select, we have an overload which returns a Func<T> (which in turn will be Func<Task> in our case), which is equivalent to async Task, hence it works fine for async use-cases.
When the control reaches the await t here — What is actually happen? Does the control leaves the ProcessTasksAsync method ?
No, it doesn't. Enumerable.Select is about projecting all elements in the sequence. This means that for each element in the collection, await t which will yield control back to the iterator, which will continue iterating all elements. That's why you later have to await Task.WhenAll, to ensure all elements have finished execution.
Question 1:
The difference is that each task is continued with additional processing which is: Trace.WriteLine(result);. In the link you pointed to, that code does not change anything, just creates overhead of awaiting and wrapping with another task.
Question 2:
When the control reaches the await t here — What is actually happen?
It awaits for the result of ProcessTasksAsync's task, then continue with Trace.WriteLine(result);. We can say that the control leaves the ProcessTasksAsync method when we have the result and the processing is still inside the anonymous method.
At the end, we have await Task.WhenAll(processingTasks); which will await for all tasks including the additional processing (Trace.WriteLine(result);) to complete before continuing but each task does not await for the others to continue executing: Trace.WriteLine(result);
It will be better this way:
static async Task<int> DelayAndReturnAsync(int val)
{
await Task.Delay(TimeSpan.FromSeconds(val));
return val;
}
static async Task AwaitAndProcessAsync(Task<int> task)
{
var result = await task;
Console.WriteLine(result);
}
// This method now prints "1", "2", and "3".
static async Task ProcessTasksAsync()
{
// Create a sequence of tasks.
Task<int> taskA = DelayAndReturnAsync(2);
Task<int> taskB = DelayAndReturnAsync(3);
Task<int> taskC = DelayAndReturnAsync(1);
var tasks = new[] { taskA, taskB, taskC };
var processingTasks = tasks.Select(AwaitAndProcessAsync).ToArray();
// Await all processing to complete
await Task.WhenAll(processingTasks);
}
Array of Task, because AwaitAndProcessAsync returns Task.

Categories

Resources