Let's imagine some abstract code:
private async Task Main()
{
var workTask1 = DoWork1();
var workTask2 = DoWork2();
var workTask3 = DoWork3();
await Task.WhenAll(workTask1, workTask2, workTask3);
AnalyzeWork(workTask1.Result, workTask2.Result, workTask3.Result);
}
private async Task<object> DoWork1()
{
var someOperationTask1 = someOperation1();
var someOperationTask2 = someOperation2();
await Task.WhenAll(someOperationTask1, someOperationTask2);
return new object
{
SomeOperationResult1 = someOperationTask1.Result,
SomeOperationResult2 = someOperationTask2.Result,
};
}
private async Task<object> DoWork2()
{
var someOperationTask3 = someOperation3();
var someOperationTask4 = someOperation4();
await Task.WhenAll(someOperationTask3, someOperationTask4);
return new object
{
SomeOperationResult3 = someOperationTask3.Result,
SomeOperationResult4 = someOperationTask4.Result,
};
}
private async Task<object> DoWork3()
{
var someOperationTask5 = someOperation5();
var someOperationTask6 = someOperation6();
await Task.WhenAll(someOperationTask5, someOperationTask6);
return new object
{
SomeOperationResult5 = someOperationTask5.Result,
SomeOperationResult6 = someOperationTask6.Result,
};
}
Here, three methods run in parallel, and each of them consists of two parallel operations. The results of the three methods are then passed to another method.
My question: are there any restrictions? Is it OK to nest Task.WhenAll calls, and what is the difference between nested Task.WhenAll calls and a single-level Task.WhenAll?
The only restriction is the available memory of your system. The Task.WhenAll method attaches a continuation to each incomplete task, and this continuation is detached when that task completes. A continuation is a lightweight object similar to a Task. It's quite similar to what you get when you invoke the Task.ContinueWith method. Each continuation weighs roughly 100 bytes. It is unlikely to have any noticeable effect on your program, unless you need to Task.WhenAll tens of millions of tasks (or more) at once.
If you want a visual demonstration of what this method looks like inside, below is a rough sketch of its implementation:
// For demonstration purposes only. This code is full of bugs.
static Task WhenAll(params Task[] tasks)
{
var tcs = new TaskCompletionSource();
int completedCount = 0;
foreach (var task in tasks)
{
task.ContinueWith(t =>
{
completedCount++;
if (completedCount == tasks.Length) tcs.SetResult();
});
}
return tcs.Task;
}
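To make the nested-versus-flat comparison concrete, here is a minimal sketch. `NestingDemo`, `Op`, `PairAsync`, and the 100 ms delay are hypothetical stand-ins for the question's someOperationN methods, not code from the question. In both shapes all six operations are started before anything is awaited, so both should complete in roughly one delay.

```csharp
using System.Linq;
using System.Threading.Tasks;

public static class NestingDemo
{
    // Hypothetical stand-in for someOperation1..6: waits briefly, then returns its id.
    private static async Task<int> Op(int id)
    {
        await Task.Delay(100);
        return id;
    }

    // Inner level: one pair of operations combined by its own WhenAll.
    private static async Task<int> PairAsync(int a, int b)
    {
        int[] pair = await Task.WhenAll(Op(a), Op(b));
        return pair.Sum();
    }

    // Nested shape: an outer WhenAll over three methods that each use WhenAll internally.
    public static async Task<int> NestedAsync()
    {
        int[] sums = await Task.WhenAll(PairAsync(1, 2), PairAsync(3, 4), PairAsync(5, 6));
        return sums.Sum();
    }

    // Flat shape: all six operations handed to a single WhenAll.
    public static async Task<int> FlatAsync()
    {
        int[] all = await Task.WhenAll(Enumerable.Range(1, 6).Select(i => Op(i)));
        return all.Sum();
    }
}
```

Both methods return the same total. The practical difference is structural: nesting lets each method shape its own intermediate result, while the flat form needs all tasks to share one result type.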
Task.WhenAll(params System.Threading.Tasks.Task[] tasks) returns Task, but what is the proper way to acquire the task results after calling this method?
After awaiting that task, the results can be acquired from each original task by awaiting it once again, which should be fine because the tasks have already completed. It is also possible to get the result through the Task.Result property, which is often considered bad practice:
Task<TResult1> task1= ...
Task<TResult2> task2= ...
Task<TResult3> task3= ...
await Task.WhenAll(task1, task2, task3);
var a = task1.Result; // returns TResult1
var b = await task1; // also returns TResult1
Which one should I choose here and why?
If you really have only an IEnumerable<Task<TResult>> and the tasks are created on the fly (e.g. by a .Select()), enumerating it a second time would execute your tasks a second time.
So be sure that you either give an already materialized collection to Task.WhenAll() or get the results from the return value of that method:
var someTasks = Enumerable.Range(1, 10).Select(async i => { await Task.Delay(i * 100); return i; });
// Bad style, because someTasks is an IEnumerable created on the fly
await Task.WhenAll(someTasks);
foreach(var task in someTasks)
{
var taskResult = await task;
Console.WriteLine(taskResult);
}
// Okay style, because the tasks are materialized before waiting, but it is easy to use the wrong variable by mistake.
var myTasks = someTasks.ToList();
await Task.WhenAll(myTasks);
foreach(var task in myTasks)
{
Console.WriteLine(task.Result);
}
// Best style
var results = await Task.WhenAll(someTasks);
foreach(var result in results)
{
Console.WriteLine(result);
}
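The re-execution problem described above can be observed directly by counting how often the task-creating lambda runs. This is a small sketch; `DeferredSelectDemo` and `CountStartsAsync` are hypothetical names introduced only for illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class DeferredSelectDemo
{
    // Returns how many times the task-creating lambda ran.
    public static async Task<int> CountStartsAsync(bool materialize)
    {
        int starts = 0;
        IEnumerable<Task<int>> tasks = Enumerable.Range(1, 3).Select(async i =>
        {
            starts++;               // runs synchronously while the sequence is enumerated
            await Task.Delay(10);
            return i;
        });

        if (materialize)
        {
            tasks = tasks.ToList(); // enumerate once; the list holds the started tasks
        }

        await Task.WhenAll(tasks);  // first pass over the sequence
        foreach (Task<int> task in tasks)
        {
            await task;             // second pass: starts new tasks if the query is deferred
        }
        return starts;
    }
}
```

With the deferred query the lambda runs six times (three per pass); with the materialized list it runs three times.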
Update
Just read this in your question:
However, I could not find any overload that would return anything but Task.
This happens if the tasks you give to the Task.WhenAll() method don't share a common Task<T> type, for example when you want to run two tasks in parallel that return values of different types. In that case you have to keep references to the tasks and read the results individually afterwards:
public static class Program
{
public static async Task Main(string[] args)
{
var taskOne = ReturnTwo();
var taskTwo = ReturnPi();
await Task.WhenAll(taskOne, taskTwo);
Console.WriteLine(taskOne.Result);
Console.WriteLine(taskTwo.Result);
Console.ReadKey();
}
private static async Task<int> ReturnTwo()
{
await Task.Delay(500);
return 2;
}
private static async Task<double> ReturnPi()
{
await Task.Delay(500);
return Math.PI;
}
}
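Reading .Result after WhenAll, as in the example above, does not block because the tasks have already completed. One remaining difference from await is how a failure surfaces. The sketch below illustrates it; `ResultVersusAwaitDemo`, `FailAsync`, and `ObserveAsync` are hypothetical names, and the exception from WhenAll is swallowed on purpose so that the faulted task can be inspected afterwards.

```csharp
using System;
using System.Threading.Tasks;

public static class ResultVersusAwaitDemo
{
    // Hypothetical operation that always fails after a short delay.
    private static async Task<int> FailAsync()
    {
        await Task.Delay(10);
        throw new InvalidOperationException("boom");
    }

    // Returns the name of the exception type the caller sees.
    public static async Task<string> ObserveAsync(bool useAwait)
    {
        Task<int> task = FailAsync();
        try
        {
            await Task.WhenAll(task);
        }
        catch (InvalidOperationException)
        {
            // Swallowed for the demo; the task is now in the Faulted state.
        }

        try
        {
            int value = useAwait ? await task : task.Result;
            return value.ToString();
        }
        catch (Exception ex)
        {
            return ex.GetType().Name;
        }
    }
}
```

Awaiting the faulted task rethrows the original InvalidOperationException, while .Result wraps it in an AggregateException.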
Here is the overload that returns Task<TResult[]> - MS Docs
Example:
static async Task Test()
{
List<Task<string>> tasks = new List<Task<string>>();
for (int i = 0; i < 5; i++)
{
var currentTask = GetStringAsync();
tasks.Add(currentTask);
}
string[] result = await Task.WhenAll(tasks);
}
static async Task<string> GetStringAsync()
{
await Task.Delay(1000);
return "Result string";
}
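One detail worth knowing about this overload is that the result array follows the order of the input tasks, not the order in which they complete. A short sketch, using a hypothetical two-parameter variant of GetStringAsync so that the slower task comes first:

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class WhenAllOrderDemo
{
    // Hypothetical variant of GetStringAsync with a configurable value and delay.
    private static async Task<string> GetStringAsync(string value, int delayMs)
    {
        await Task.Delay(delayMs);
        return value;
    }

    public static async Task<string[]> RunAsync()
    {
        var tasks = new List<Task<string>>
        {
            GetStringAsync("slow", 100), // completes last, but is first in the list
            GetStringAsync("fast", 10),
        };
        return await Task.WhenAll(tasks);
    }
}
```

The returned array is { "slow", "fast" }, matching the list order.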
I want to create a collection of awaitable tasks, so that I can start them together and asynchronously process the result from each one as they complete.
I have this code, and a compilation error:
> cannot assign void to an implicitly-typed variable
If I understand correctly, the tasks returned by Select don't have a result type, even though the delegate passed returns an ItemViewModel:
public MainViewModel()
{
Task.Run(LoadItems);
}
async Task LoadItems()
{
IEnumerable<Task> tasks = Directory.GetDirectories(somePath)
.Select(dir => new Task(() =>
new ItemViewModel(new ItemSerializer().Deserialize(dir))));
foreach (var task in tasks)
{
var result = await task; // <-- here I get the compilation error
DoSomething(result);
}
}
You shouldn't ever use the Task constructor.
Since you're calling synchronous code (Deserialize), you could use Task.Run:
async Task LoadItems()
{
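// ToList() runs the query now, so every Task.Run is started before the first await.
// Without it, the lazy Select would start each task only when the foreach reaches it.
var tasks = Directory.GetDirectories(somePath)
.Select(dir => Task.Run(() =>
new ItemViewModel(new ItemSerializer().Deserialize(dir))))
.ToList();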
foreach (var task in tasks)
{
var result = await task;
DoSomething(result);
}
}
Alternatively, you could use Parallel or Parallel LINQ:
void LoadItems()
{
var vms = Directory.GetDirectories(somePath)
.AsParallel().Select(dir =>
new ItemViewModel(new ItemSerializer().Deserialize(dir)))
.ToList();
foreach (var vm in vms)
{
DoSomething(vm);
}
}
Or, if you make Deserialize a truly async method, then you can make it all asynchronous:
async Task LoadItems()
{
// ToList() runs the query now, so every operation is started before the first await.
var tasks = Directory.GetDirectories(somePath)
.Select(async dir =>
new ItemViewModel(await new ItemSerializer().DeserializeAsync(dir)))
.ToList();
foreach (var task in tasks)
{
var result = await task;
DoSomething(result);
}
}
Also, I recommend that you do not use fire-and-forget in your constructor. There are better patterns for asynchronous constructors.
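The question asks to process each result as it completes, whereas the loops above await the tasks in the order they were created. One possible sketch of completion-order processing uses Task.WhenAny; `CompletionOrderDemo` and `LoadAsync` are hypothetical stand-ins for the deserialization work, and collecting into a list stands in for DoSomething(result).

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class CompletionOrderDemo
{
    // Hypothetical stand-in for the real work: finishes after delayMs and returns it.
    private static async Task<int> LoadAsync(int delayMs)
    {
        await Task.Delay(delayMs);
        return delayMs;
    }

    // Starts every task up front and returns the results in completion order.
    public static async Task<List<int>> ProcessAsCompletedAsync(IEnumerable<int> delays)
    {
        // ToList() starts all tasks immediately.
        List<Task<int>> pending = delays.Select(d => LoadAsync(d)).ToList();
        var processed = new List<int>();

        while (pending.Count > 0)
        {
            Task<int> finished = await Task.WhenAny(pending);
            pending.Remove(finished);
            processed.Add(await finished); // stand-in for DoSomething(result)
        }
        return processed;
    }
}
```

Removing from a list makes this loop quadratic in the number of tasks, which is usually acceptable for a handful of directories but worth reconsidering for thousands.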
I know the question has been answered, but you can always do this too:
var serializer = new ItemSerializer();
var directories = Directory.GetDirectories(somePath);
foreach (string directory in directories)
{
await Task.Run(() => serializer.Deserialize(directory))
.ContinueWith(priorTask => DoSomething(priorTask.Result));
}
Notice I pulled out the serializer instantiation (assuming there are no side effects).
I have some code of the following form:
static async Task DoSomething(int n)
{
...
}
static void RunThreads(int totalThreads, int throttle)
{
var tasks = new List<Task>();
for (var n = 0; n < totalThreads; n++)
{
var task = DoSomething(n);
tasks.Add(task);
}
Task.WhenAll(tasks).Wait(); // all threads must complete
}
Trouble is, if I don't throttle the threads, things start falling apart. Now, I want to launch a maximum of throttle threads, and only start the new thread when an old one is complete. I've tried a few approaches and none so far has worked. Problems I have encountered include:
The tasks collection must be fully populated with all tasks, whether active or awaiting execution, otherwise the final .Wait() call only looks at the threads that it started with.
Chaining the execution seems to require use of Task.Run() or the like. But I need a reference to each task from the outset, and instantiating a task seems to kick it off automatically, which is what I don't want.
How to do this?
First, abstract away from threads. Especially since your operation is asynchronous, you shouldn't be thinking about "threads" at all. In the asynchronous world, you have tasks, and you can have a huge number of tasks compared to threads.
Throttling asynchronous code can be done using SemaphoreSlim:
// DoSomething is the static async Task DoSomething(int n) method from the question.
static void RunConcurrently(int total, int throttle)
{
var mutex = new SemaphoreSlim(throttle);
var tasks = Enumerable.Range(0, total).Select(async item =>
{
await mutex.WaitAsync();
try { await DoSomething(item); }
finally { mutex.Release(); }
});
Task.WhenAll(tasks).Wait();
}
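To check that a semaphore-based throttle behaves as intended, one can record the highest number of operations in flight. The following is a sketch under assumptions: `SemaphoreThrottleDemo` and `MeasurePeakAsync` are hypothetical names, and Task.Delay stands in for DoSomething. It awaits WhenAll instead of calling .Wait().

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class SemaphoreThrottleDemo
{
    // Runs `total` simulated operations with at most `throttle` in flight
    // and returns the highest concurrency that was observed.
    public static async Task<int> MeasurePeakAsync(int total, int throttle)
    {
        using var gate = new SemaphoreSlim(throttle);
        int running = 0, peak = 0;
        var sync = new object();

        var tasks = Enumerable.Range(0, total).Select(async _ =>
        {
            await gate.WaitAsync();
            try
            {
                lock (sync) { running++; peak = Math.Max(peak, running); }
                await Task.Delay(20); // stand-in for DoSomething(item)
                lock (sync) { running--; }
            }
            finally
            {
                gate.Release();
            }
        }).ToList(); // materialize so that each task is created exactly once

        await Task.WhenAll(tasks);
        return peak;
    }
}
```

The observed peak should never exceed the throttle value, because every operation has to acquire the semaphore before it starts.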
The simplest option IMO is to use TPL Dataflow. You just create an ActionBlock, limit it to the desired parallelism, and start posting items into it. It makes sure that only a certain number of tasks run at the same time, and when a task completes, it starts executing the next item:
async Task RunAsync(int totalThreads, int throttle)
{
var block = new ActionBlock<int>(
DoSomething,
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = throttle });
for (var n = 0; n < totalThreads; n++)
{
block.Post(n);
}
block.Complete();
await block.Completion;
}
If I understand correctly, you want to start only as many tasks as the throttle parameter allows and wait for them to finish before starting the next ones.
To wait for all started tasks to complete before starting new tasks, use the following implementation.
static async Task RunThreads(int totalThreads, int throttle)
{
var tasks = new List<Task>();
for (var n = 0; n < totalThreads; n++)
{
var task = DoSomething(n);
tasks.Add(task);
if (tasks.Count == throttle)
{
await Task.WhenAll(tasks);
tasks.Clear();
}
}
await Task.WhenAll(tasks); // wait for remaining
}
To start a new task as soon as a running one completes, you can use the following code:
static async Task RunThreads(int totalThreads, int throttle)
{
var tasks = new List<Task>();
for (var n = 0; n < totalThreads; n++)
{
var task = DoSomething(n);
tasks.Add(task);
if (tasks.Count == throttle)
{
var completed = await Task.WhenAny(tasks);
tasks.Remove(completed);
}
}
await Task.WhenAll(tasks); // all threads must complete
}
Stephen Toub gives the following example for throttling in his The Task-based Asynchronous Pattern document.
const int CONCURRENCY_LEVEL = 15;
Uri [] urls = …;
int nextIndex = 0;
var imageTasks = new List<Task<Bitmap>>();
while(nextIndex < CONCURRENCY_LEVEL && nextIndex < urls.Length)
{
imageTasks.Add(GetBitmapAsync(urls[nextIndex]));
nextIndex++;
}
while(imageTasks.Count > 0)
{
try
{
Task<Bitmap> imageTask = await Task.WhenAny(imageTasks);
imageTasks.Remove(imageTask);
Bitmap image = await imageTask;
panel.AddImage(image);
}
catch(Exception exc) { Log(exc); }
if (nextIndex < urls.Length)
{
imageTasks.Add(GetBitmapAsync(urls[nextIndex]));
nextIndex++;
}
}
Microsoft's Reactive Extensions (Rx) - NuGet "Rx-Main" - has this problem sorted very nicely.
Just do this:
static void RunThreads(int totalThreads, int throttle)
{
Observable
.Range(0, totalThreads)
.Select(n => Observable.FromAsync(() => DoSomething(n)))
.Merge(throttle)
.Wait();
}
Job done.
.NET 6 introduces Parallel.ForEachAsync. You could rewrite your code like this:
static async ValueTask DoSomething(int n)
{
...
}
static Task RunThreads(int totalThreads, int throttle)
=> Parallel.ForEachAsync(Enumerable.Range(0, totalThreads), new ParallelOptions() { MaxDegreeOfParallelism = throttle }, (i, _) => DoSomething(i));
Notes:
I had to change the return type of your DoSomething function from Task to ValueTask.
You probably want to avoid the .Wait() call, so RunThreads now returns a Task that the caller can await.
It is not obvious from your example why you need access to the individual tasks. This code does not give you access to the tasks, but might still be helpful in many cases.
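If changing the return type of DoSomething is not an option, a possible alternative is to wrap the returned Task in a ValueTask inside the lambda. The sketch below assumes .NET 6 or later; `ForEachAsyncDemo` and the summing body are hypothetical and exist only to make the example checkable.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ForEachAsyncDemo
{
    private static int _sum;

    // Still returns Task, as in the question. The body is a hypothetical stand-in.
    private static async Task DoSomething(int n)
    {
        await Task.Delay(10);
        Interlocked.Add(ref _sum, n);
    }

    // Processes 0..totalThreads-1 with at most `throttle` bodies running at once
    // and returns the sum of all processed values.
    public static async Task<int> RunThreads(int totalThreads, int throttle)
    {
        _sum = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = throttle };

        await Parallel.ForEachAsync(
            Enumerable.Range(0, totalThreads),
            options,
            (n, _) => new ValueTask(DoSomething(n))); // wrap the Task instead of changing DoSomething

        return _sum;
    }
}
```

The wrapper allocates nothing beyond the Task that DoSomething already creates, so the original signature can stay as it is.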
Here are some extension method variations that build on Sriram Sakthivel's answer.
In the usage example, calls to DoSomething are being wrapped in an explicitly cast closure to allow passing arguments.
public static async Task RunMyThrottledTasks()
{
var myArgsSource = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
await myArgsSource
.Select(a => (Func<Task<object>>)(() => DoSomething(a)))
.Throttle(2);
}
public static async Task<object> DoSomething(int arg)
{
// Await some async calls that need arg..
// ..then return result async Task..
return new object();
}
public static async Task<IEnumerable<T>> Throttle<T>(this IEnumerable<Func<Task<T>>> toRun, int throttleTo)
{
var running = new List<Task<T>>(throttleTo);
var completed = new List<Task<T>>();
foreach(var taskToRun in toRun)
{
running.Add(taskToRun());
if(running.Count == throttleTo)
{
var comTask = await Task.WhenAny(running);
running.Remove(comTask);
completed.Add(comTask);
}
}
// Wait for the tasks still in flight after the loop, then collect them too.
await Task.WhenAll(running);
completed.AddRange(running);
return completed.Select(t => t.Result).ToList();
}
public static async Task Throttle(this IEnumerable<Func<Task>> toRun, int throttleTo)
{
var running = new List<Task>(throttleTo);
foreach(var taskToRun in toRun)
{
running.Add(taskToRun());
if(running.Count == throttleTo)
{
var comTask = await Task.WhenAny(running);
running.Remove(comTask);
}
}
// Wait for the tasks still in flight after the loop.
await Task.WhenAll(running);
}
What you need is a custom task scheduler. You can derive a class from System.Threading.Tasks.TaskScheduler and implement two major functions GetScheduledTasks(), QueueTask(), along with other functions to gain complete control over throttling tasks. Here is a well documented example.
https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.taskscheduler?view=net-5.0
You can emulate the Parallel.ForEachAsync method introduced in .NET 6 on earlier versions with the following code:
public static Task ForEachAsync<T>(IEnumerable<T> source, int maxDegreeOfParallelism, Func<T, Task> body) {
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(maxDegreeOfParallelism)
select Task.Run(async delegate {
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
I had a method like this:
public async Task<MyResult> GetResult()
{
MyResult result = new MyResult();
foreach(var method in Methods)
{
string json = await Process(method);
result.Prop1 = PopulateProp1(json);
result.Prop2 = PopulateProp2(json);
}
return result;
}
Then I decided to use Parallel.ForEach:
public async Task<MyResult> GetResult()
{
MyResult result = new MyResult();
Parallel.ForEach(Methods, async method =>
{
string json = await Process(method);
result.Prop1 = PopulateProp1(json);
result.Prop2 = PopulateProp2(json);
});
return result;
}
But now I've got an error:
An asynchronous module or handler completed while an asynchronous operation was still pending.
async doesn't work well with ForEach. In particular, your async lambda is being converted to an async void method. There are a number of reasons to avoid async void (as I describe in an MSDN article); one of them is that you can't easily detect when the async lambda has completed. ASP.NET will see your code return without completing the async void method and (appropriately) throw an exception.
What you probably want to do is process the data concurrently, just not in parallel. Parallel code should almost never be used on ASP.NET. Here's what the code would look like with asynchronous concurrent processing:
public async Task<MyResult> GetResult()
{
MyResult result = new MyResult();
var tasks = Methods.Select(method => ProcessAsync(method)).ToArray();
string[] json = await Task.WhenAll(tasks);
result.Prop1 = PopulateProp1(json[0]);
...
return result;
}
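A fuller sketch of the same idea follows. It starts every call, awaits them together, and only then builds the result, so no shared object is written from several operations at once. `ConcurrentGetResultDemo`, the string method names, and the dictionary result are hypothetical simplifications of the question's Methods and MyResult.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ConcurrentGetResultDemo
{
    // Hypothetical stand-in for Process(method): pretends to fetch JSON for a method name.
    private static async Task<string> ProcessAsync(string method)
    {
        await Task.Delay(10);
        return "{\"method\":\"" + method + "\"}";
    }

    // Starts all calls, awaits them together, then fills the result sequentially.
    public static async Task<Dictionary<string, string>> GetResultAsync(IEnumerable<string> methods)
    {
        string[] names = methods.ToArray();
        string[] json = await Task.WhenAll(names.Select(name => ProcessAsync(name)));

        var result = new Dictionary<string, string>();
        for (int i = 0; i < names.Length; i++)
        {
            result[names[i]] = json[i]; // json[i] belongs to names[i]: WhenAll keeps input order
        }
        return result;
    }
}
```

Because the results are assigned after WhenAll completes, the population step needs no locking.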
.NET 6 finally added Parallel.ForEachAsync, a way to schedule asynchronous work that allows you to control the degree of parallelism:
var urlsToDownload = new []
{
"https://dotnet.microsoft.com",
"https://www.microsoft.com",
"https://twitter.com/shahabfar"
};
var client = new HttpClient();
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
await Parallel.ForEachAsync(urlsToDownload, options, async (url, token) =>
{
var targetPath = Path.Combine(Path.GetTempPath(), "http_cache", url);
var response = await client.GetAsync(url, token);
// The request will be canceled in case of an error in another URL.
if (response.IsSuccessStatusCode)
{
using var target = File.OpenWrite(targetPath);
await response.Content.CopyToAsync(target);
}
});
Alternatively, with the AsyncEnumerator NuGet Package you can do this:
using System.Collections.Async;
public async Task<MyResult> GetResult()
{
MyResult result = new MyResult();
await Methods.ParallelForEachAsync(async method =>
{
string json = await Process(method);
result.Prop1 = PopulateProp1(json);
result.Prop2 = PopulateProp2(json);
}, maxDegreeOfParallelism: 10);
return result;
}
where ParallelForEachAsync is an extension method.
Ahh, okay. I think I know what's going on now. async method => is an "async void" lambda, which is "fire and forget" (not recommended for anything other than event handlers). This means the caller cannot know when it has completed, so GetResult returns while the operations are still running. Although the technical details of my first answer are incorrect, the result is the same here: GetResult returns while the operations started by ForEach are still running. The only thing you could really do is not await Process (so that the lambda is no longer async) and instead block until Process completes in each iteration. But that will tie up at least one thread pool thread and thus stress the pool slightly, likely making the use of ForEach pointless. I would simply not use Parallel.ForEach...
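The early return can be reproduced in a small console-style sketch. `AsyncVoidLambdaDemo` is a hypothetical name, and the 500 ms delay is an arbitrary value chosen to be much longer than the loop itself takes.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class AsyncVoidLambdaDemo
{
    // Returns how many loop bodies had finished at the moment Parallel.ForEach returned.
    public static int CountFinishedWhenForEachReturns()
    {
        int finished = 0;

        // The async lambda is converted to Action<int>, i.e. it becomes async void.
        Parallel.ForEach(Enumerable.Range(0, 4), async _ =>
        {
            await Task.Delay(500);               // the lambda returns to ForEach here
            Interlocked.Increment(ref finished); // runs later, unobserved by ForEach
        });

        return Volatile.Read(ref finished);
    }
}
```

The count is typically 0, because Parallel.ForEach only sees each lambda run up to its first await and has no task to wait on for the rest.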
During startup, my application needs to get two sets of data, each of which has its own asynchronous method. If I call them one after the other, the second call starts only after the first has completed.
List<DataOne> DataCollectionOne;
List<DataTwo> DataCollectionTwo;
async void GetDatas()
{
if(sameCondOne)
DataCollectionOne = await GetDataOne();
if(sameCondTwo)
DataCollectionTwo = await GetDataTwo();
}
So I wrapped them in Task.Run calls.
void GetDatas()
{
if(sameCondOne)
Task.Run(() => RunDataOne());
if(sameCondTwo)
Task.Run(() => RunDataTwo());
}
async void RunDataOne()
{
DataCollectionOne = await GetDataOne();
}
async void RunDataTwo()
{
DataCollectionTwo = await GetDataTwo();
}
Am I doing this right?
No. You neither need nor want to spin up a new thread just to start these two asynchronous operations. Simply start both operations (calling the method is what starts the operation) and don't await either until you've started them both:
var firstTask = GetDataOne();
var secondTask = GetDataTwo();
var firstResult = await firstTask;
var secondResult = await secondTask;
To handle the conditional check just conditionally start the task, and then conditionally assign the result:
Task<T> firstTask = null;
if(shouldGetFirstTask)
firstTask = GetDataOne();
Task<T> secondTask = null;
if(shouldGetSecondTask)
secondTask = GetDataTwo();
if(firstTask != null)
DataCollectionOne = await firstTask;
if(secondTask != null)
DataCollectionTwo = await secondTask;
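The claim that calling the method is what starts the operation can be made visible with a small log. In this sketch `StartThenAwaitDemo` and `GetDataAsync` are hypothetical stand-ins for GetDataOne and GetDataTwo.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;

public static class StartThenAwaitDemo
{
    // Hypothetical stand-in for GetDataOne/GetDataTwo.
    private static async Task<int> GetDataAsync(int value, List<string> log)
    {
        log.Add($"start {value}"); // runs synchronously, as soon as the method is called
        await Task.Delay(50);
        return value;
    }

    public static async Task<List<string>> RunAsync()
    {
        var log = new List<string>();
        Task<int> first = GetDataAsync(1, log);  // operation 1 is now in flight
        Task<int> second = GetDataAsync(2, log); // operation 2 is now in flight
        log.Add("both started");

        int a = await first;
        int b = await second;
        log.Add($"results {a} {b}");
        return log;
    }
}
```

The log reads "start 1", "start 2", "both started", "results 1 2": both operations begin before either is awaited, so their delays overlap.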
First of all, you should avoid void-returning async methods (Best Practices in Asynchronous Programming).
It is a common practice to suffix async methods with Async (or TaskAsync if Async suffixed methods already exist).
If I understand you correctly, you want to spawn some conditional, independent asynchronous tasks and wait for all the tasks to end without having any of them wait for another.
Since you aren't doing anything with the results of the tasks in the orchestrating method and need some side effects instead, I would wrap the Get~ methods in Run~ methods like you did.
You could test the condition inside the Run~ methods (either by explicitly testing the condition or having it as a parameter):
async Task RunDataOneAsync()
{
if (sameCondOne)
{
DataCollectionOne = await GetDataOneAsync();
}
}
But this would still spawn unnecessary tasks if the condition is false. So it's better to keep the check on the caller side:
async Task RunDataOneAsync()
{
DataCollectionOne = await GetDataOneAsync();
}
// ...
if (sameCondOne)
{
await RunDataOneAsync();
}
In the orchestrating method, you spawn the tasks whose conditions are true and then wait for all of them to complete:
async Task RunAll()
{
var tasks = new List<Task>();
if (sameCondOne)
{
// Start the task without awaiting it, so that the others can start too.
tasks.Add(RunDataOneAsync());
}
// ...
if (sameCondN)
{
tasks.Add(RunDataNAsync());
}
await Task.WhenAll(tasks);
}