Task.WhenAny - What happens with remaining running tasks? - c#

I have the following code:
List<Task<bool>> tasks = tasksQuery.ToList();
while (tasks.Any())
{
Task<bool> completedTask = await Task.WhenAny(tasks);
if (await completedTask)
return true;
tasks.Remove(completedTask);
}
It launches tasks in parallel. When first completed task returns true, the method returns true.
My questions are:
What happens with all remaining tasks that have been launched and probably still running in the background?
Is this the right approach to execute code that is async, parallel and should return after the first condition occurs, or it is better to launch them one by one and await singularly?

Incidentally, I am just reading Concurrency in C# CookBook, by Stephen Cleary, and I can refer to some parts of the book here, I guess.
From Recipe 2.5 - Discussion, we have
When the first task completes, consider whether to cancel the remaining tasks. If the other tasks are not canceled but are also never awaited, then they are abandoned. Abandoned tasks will run to completion, and their results will be ignored. Any exceptions from those abandoned tasks will also be ignored.
Another antipattern for Task.WhenAny is handling tasks as they complete. At first it seems like a reasonable approach to keep a list of tasks and remove each task from the list as it completes. The problem with this approach is that it executes in O(N^2) time, when an O(N) algorithm exists.
Besides that, I think WhenAny is surely the right approach. Just consider the following Leonid approach of passing the same CancellationToken for all tasks and cancel them after the first one returns. And even that, only if the cost of these operations is actually taxing the system.

Related

Two ways of calling asynchronous method. C#

Are there differences between this two approaches? Or program runs similarly in this two cases? If there are differences could you say what are these differences.
First approach:
Task myTask = MyFunctionAsync();
await myTask;
Second approach:
await MyFunctionAsync();
Short version: "not really, at least not in an interesting way"
Long version: awaitables aren't limited to Task/Task<T>, so it is possible (trivially so, in fact) to create code that compiles fine with:
await MyFunctionAsync();
but doesn't compile with:
Task myTask = MyFunctionAsync();
await myTask;
simply because MyFunctionAsync() returns something that isn't a task. ValueTask<int> would be enough for this, but you can make exotic awaitables if you want. But: if we replace Task with var, i.e.
var myTask = MyFunctionAsync();
await myTask;
then now the only difference is that we can refer to myTask at other points in the code, if we want to. This isn't exactly uncommon; the two main scenarios being
combining multiple checks over concurrent code, perhaps using WhenAny or WhenAll
(usually in the case of ValueTask[<T>]) checking whether the awaitable completed synchronously, to avoid the state machine overhead in the synchronous case
They are effectively the same. The difference is that the first way lets you do more steps before you wait for a response. So you could start many tasks concurrently in the first way, and then await them all together with await Task.WhenAll(myListOfTasks)
For example:
var myTasks = myEmployees.Select(e => ProcessPayrollAsync(e));
await Task.WhenAll(myTasks);
I would use the first way if you need to for concurrency and the second way if its a simple case because its shorter.
In that particular case, the 2 forms of code are executed in a similar way. Homewer, consider this:
public async Task<int> CalculateResult(InputData data) {
// This queues up the work on the threadpool.
var expensiveResultTask = Task.Run(() => DoExpensiveCalculation(data));
// Note that at this point, you can do some other work concurrently,
// as CalculateResult() is still executing!
// Execution of CalculateResult is yielded here!
var result = await expensiveResultTask;
return result;
}
As the comments in the code above point out, between a task is running and the await call you can execute any other concurrent code.
For more information, read this article.
What really helped me to understand async-await was the cook analogy descrived by Eric Lippert in this interview. Search somewhere in the middle for async-await.
Here he describes a cook making breakfast. Once he put on the kettle to boil water for the tea, he doesn't wait idly for the water to cook. Instead he puts the bread in the toaster, tea in the teapot and starts slicing tomatoes: whenever he has to wait for another apparatus or other cook to finish his job, he doesn't wait idly, but starts the next task, until he needs the results for one of tasks previous tasks.
Async await, does the same: whenever another process has to do something, where your thread can do nothing but wait for the other process to finish, the thread can look around to see if it can do other things instead. You typically see async-await when another lengthy process is involved: Writing data to a file, Querying data from a database or from the internet. Whenever your thread has to do this, the thread can order the other process to do something while it continues to do other things:
Task<string> taskReadFile = ReadMyFileAsync(...);
// not awaiting, we can do other things:
DoSomethingElse();
// now we need the result of the file:
string fetchedData = await taskReadFile;
So what happens. ReadMyFileAsync is async, so you know that somewhere deep inside it, the method awaits. In fact, your compiler will warn you if you forget to await.
Once your thread sees the await, it knows that the results of the await are needed, so it can't continue. Instead it goes up the call stack to continue processing (in my example DoSomethingElse()), until it sees an await. It goes up the call stack again and continues processing, etc.
So in fact there is no real difference between your first and your second method. You can compare it with:
double x = Math.Sin(4.0)
versus
double a = 4.0;
double x = Math.Sin(a);
Officially the only difference is that after these statements you can still use a. Similarly you can use information from the task after the await:
Task<MyData> myTask = FetchMyDataAsync(...);
MyData result = await myTask;
// if desired you can investigate myTask
if (result == null)
{
// why is it null, did someone cancel my task?
if (Task.IsCanceled)
{
Yell("Hey, who did cancel my task!");
}
}
But most of the times you are not interested in the task. If you don't have anything else to do while the task is executing, I'd just await for it:
MyData fetchedData = await FetchMyDataAsync(...)

Parallel.ForEach with async lambda waiting forall iterations to complete

recently I have seen several SO threads related to Parallel.ForEach mixed with async lambdas, but all proposed answers were some kind of workarounds.
Is there any way how could I write:
List<int> list = new List<int>[]();
Parallel.ForEach(arrayValues, async (item) =>
{
var x = await LongRunningIoOperationAsync(item);
list.Add(x);
});
How can I ensure that list will contain all items from all iterations executed withing lambdas in each iteration?
How will generally Parallel.ForEach work with async lambdas, if it hit await will it hand over its thread to next iteration?
I assume ParallelLoopResult IsCompleted field is not proper one, as it will return true when all iterations are executed, no matter if their actual lambda jobs are finished or not?
recently I have seen several SO threads related to Parallel.ForEach mixed with async lambdas, but all proposed answers were some kind of workarounds.
Well, that's because Parallel doesn't work with async. And from a different perspective, why would you want to mix them in the first place? They do opposite things. Parallel is all about adding threads and async is all about giving up threads. If you want to do asynchronous work concurrently, then use Task.WhenAll. That's the correct tool for the job; Parallel is not.
That said, it sounds like you want to use the wrong tool, so here's how you do it...
How can I ensure that list will contain all items from all iterations executed withing lambdas in each iteration?
You'll need to have some kind of a signal that some code can block on until the processing is done, e.g., CountdownEvent or Monitor. On a side note, you'll need to protect access to the non-thread-safe List<T> as well.
How will generally Parallel.ForEach work with async lambdas, if it hit await will it hand over its thread to next iteration?
Since Parallel doesn't understand async lambdas, when the first await yields (returns) to its caller, Parallel will assume that interation of the loop is complete.
I assume ParallelLoopResult IsCompleted field is not proper one, as it will return true when all iterations are executed, no matter if their actual lambda jobs are finished or not?
Correct. As far as Parallel knows, it can only "see" the method to the first await that returns to its caller. So it doesn't know when the async lambda is complete. It also will assume iterations are complete too early, which throws partitioning off.
You don't need Parallel.For/ForEach here you just need to await a list of tasks.
Background
In short you need to be very careful about async lambdas, and if you are passing them to an Action or Func<Task>
Your problem is because Parallel.For / ForEach is not suited for the async and await pattern or IO bound tasks. They are suited for cpu bound workloads. Which means they essentially have Action parameters and let's the task scheduler create the tasks for you
If you want to run multiple async tasks at the same time use Task.WhenAll , or a TPL Dataflow Block (or something similar) which can deal effectively with both CPU bound and IO bound works loads, or said more directly, they can deal with tasks which is what an async method is.
Unless you need to do more inside of your lambda (for which you haven't shown), just use aSelect and WhenAll
var tasks = items .Select(LongRunningIoOperationAsync);
var results = await Task.WhenAll(tasks); // here is your list of int
If you do, you can still use the await,
var tasks = items.Select(async (item) =>
{
var x = await LongRunningIoOperationAsync(item);
// do other stuff
return x;
});
var results = await Task.WhenAll(tasks);
Note : If you need the extended functionality of Parallel.ForEach (namely the Options to control max concurrency), there are several approach, however RX or DataFlow might be the most succinct

When is the best place to use Task.Result instead of awaiting Task

Whilst I've been using async code in .NET for a while, I've only recently started to research it and understand what's going on. I've just been going through my code and trying to alter it so if a task can be done in parallel to some work, then it is. So for example:
var user = await _userRepo.GetByUsername(User.Identity.Name);
//Some minor work that doesn't rely on the user object
user = await _userRepo.UpdateLastAccessed(user, DateTime.Now);
return user;
Now becomes:
var userTask = _userRepo.GetByUsername(User.Identity.Name);
//Some work that doesn't rely on the user object
user = await _userRepo.UpdateLastAccessed(userTask.Result, DateTime.Now);
return user;
My understand is that the user object is now being fetched from the database WHILST some unrelated work is going on. However, things I've seen posted imply that result should be used rarely and await is preferred but I don't understand why I'd want to wait for my user object to be fetched if I can be performing some other independant logic at the same time?
Let's make sure to not bury the lede here:
So for example: [some correct code] becomes [some incorrect code]
NEVER NEVER NEVER DO THIS.
Your instinct that you can restructure your control flow to improve performance is excellent and correct. Using Result to do so is WRONG WRONG WRONG.
The correct way to rewrite your code is
var userTask = _userRepo.GetByUsername(User.Identity.Name);
//Some work that doesn't rely on the user object
user = await _userRepo.UpdateLastAccessed(await userTask, DateTime.Now);
return user;
Remember, await does not make a call asynchronous. Await simply means "if the result of this task is not yet available, go do something else and come back here after it is available". The call is already asynchronous: it returns a task.
People seem to think that await has the semantics of a co-call; it does not. Rather, await is the extract operation on the task comonad; it is an operator on tasks, not call expressions. You normally see it on method calls simply because it is a common pattern to abstract away an async operation as a method. The returned task is the thing that is awaited, not the call.
However, things I've seen posted imply that result should be used rarely and await is preferred but I don't understand why I'd want to wait for my user object to be fetched if I can be performing some other independent logic at the same time?
Why do you believe that using Result will allow you to perform other independent logic at the same time??? Result prevents you from doing exactly that. Result is a synchronous wait. Your thread cannot be doing any other work while it is synchronously waiting for the task to complete. Use an asynchronous wait to improve efficiency. Remember, await simply means "this workflow cannot progress further until this task is completed, so if it is not complete, find more work to do and come back later". A too-early await can, as you note, make for an inefficient workflow because sometimes the workflow can progress even if the task is not complete.
By all means, move around where the awaits happen to improve efficiency of your workflow, but never never never change them into Result. You have some deep misunderstanding of how asynchronous workflows work if you believe that using Result will ever improve efficiency of parallelism in the workflow. Examine your beliefs and see if you can figure out which one is giving you this incorrect intuition.
The reason why you must never use Result like this is not just because it is inefficient to synchronously wait when you have an asynchronous workflow underway. It will eventually hang your process. Consider the following workflow:
task1 represents a job that will be scheduled to execute on this thread in the future and produce a result.
asynchronous function Foo awaits task1.
task1 is not yet complete, so Foo returns, allowing this thread to run more work. Foo returns a task representing its workflow, and signs up completing that task as the completion of task1.
The thread is now free to do work in the future, including task1.
task1 completes, triggering the execution of the completion of the workflow of Foo, and eventually completing the task representing the workflow of Foo.
Now suppose Foo instead fetches Result of task1. What happens? Foo synchronously waits for task1 to complete, which is waiting for the current thread to become available, which never happens because we're in a synchronous wait. Calling Result causes a thread to deadlock with itself if the task is somehow affinitized to the current thread. You can now make deadlocks involving no locks and only one thread! Don't do this.
Async await does not mean that several threads will be running your code.
However, it will lower the time your thread will be waiting idly for processes to finish, thus finishing earlier.
Whenever the thread normally would have to wait idly for something to finish, like waiting for a web page to download, a database query to finish, a disk write to finish, the async-await thread will not be waiting idly until the data is written / fetched, but looks around if it can do other things instead, and come back later after the awaitable task is finished.
This has been described with a cook analogy in this inverview with Eric Lippert. Search somewhere in the middle for async await.
Eric Lippert compares async-await with one(!) cook who has to make breakfast. After he starts toasting the bread he could wait idly until the bread is toasted before putting on the kettle for tea, wait until the water boils before putting the tea leaves in the teapot, etc.
An async-await cook, wouldn't wait for the toasted bread, but put on the kettle, and while the water is heating up he would put the tea leaves in the teapot.
Whenever the cook has to wait idly for something, he looks around to see if he can do something else instead.
A thread in an async function will do something similar. Because the function is async, you know there is somewhere an await in the function. In fact, if you forget to program the await, your compiler will warn you.
When your thread meets the await, it goes up its call stack to see if it can do something else, until it sees an await, goes up the call stack again, etc. Once everyone is waiting, he goes down the call stack and starts waiting idly until the first awaitable process is finished.
After the awaitable process is finished the thread will continue processing the statements after the await until he sees an await again.
It might be that another thread will continue processing the statements that come after the await (you can see this in the debugger by checking the thread ID). However this other thread has the context of the original thread, so it can act as if it was the original thread. No need for mutexes, semaphores, IsInvokeRequired (in winforms) etc. For you it seems as if there is one thread.
Sometimes your cook has to do something that takes up some time without idly waiting, like slicing tomatoes. In that case it might be wise to hire a different cook and order him to do the slicing. In the mean time your cook can continue with the eggs that just finished boiling and needed peeling.
In computer terms this would be if you had some big calculations without waiting for other processes. Note the difference with for instance writing data to disk. Once your thread has ordered that the data needs to be written to disk, it normally would wait idly until the data has been written. This is not the case when doing big calculations.
You can hire the extra cook using Task.Run
async Task<DateTime> CalculateSunSet()
{
// start fetching sunset data. however don't wait for the result yet
// you've got better things to do:
Task<SunsetData> taskFetchData = FetchSunsetData();
// because you are not awaiting your thread will do the following:
Location location = FetchLocation();
// now you need the sunset data, start awaiting for the Task:
SunsetData sunsetData = await taskFetchData;
// some big calculations are needed, that take 33 seconds,
// you want to keep your caller responsive, so start a Task
// this Task will be run by a different thread:
Task<DateTime> taskBigCalculations = Taks.Run( () => BigCalculations(sunsetData, location);
// again no await: you are still free to do other things
...
// before returning you need the result of the big calculations.
// wait until big calculations are finished, keep caller responsive:
DateTime result = await taskBigCalculations;
return result;
}
In your case, you can use:
user = await _userRepo.UpdateLastAccessed(await userTask, DateTime.Now);
or perhaps more clearly:
var user = await _userRepo.GetByUsername(User.Identity.Name);
//Some work that doesn't rely on the user object
user = await _userRepo.UpdateLastAccessed(user, DateTime.Now);
The only time you should touch .Result is when you know the task has been completed. This can be useful in some scenarios where you are trying to avoid creating an async state machine and you think there's a good chance that the task has completed synchronously (perhaps using a local function for the async case), or if you're using callbacks rather than async/await, and you're inside the callback.
As an example of avoiding a state machine:
ValueTask<int> FetchAndProcess(SomeArgs args) {
async ValueTask<int> Awaited(ValueTask<int> task) => SomeOtherProcessing(await task);
var task = GetAsyncData(args);
if (!task.IsCompletedSuccessfully) return Awaited(task);
return new ValueTask<int>(SomeOtherProcessing(task.Result));
}
The point here is that if GetAsyncData returns a synchronously completed result, we completely avoid all the async machinery.
Have you considered this version?
var userTask = _userRepo.GetByUsername(User.Identity.Name);
//Some work that doesn't rely on the user object
user = await _userRepo.UpdateLastAccessed(await userTask, DateTime.Now);
return user;
This will execute the "work" while the user is retrieved, but it also has all the advantages of await that are described in Await on a completed task same as task.Result?
As suggested you can also use a more explicit version to be able to inspect the result of the call in the debugger.
var userTask = _userRepo.GetByUsername(User.Identity.Name);
//Some work that doesn't rely on the user object
user = await userTask;
user = await _userRepo.UpdateLastAccessed(user, DateTime.Now);
return user;

Difference between await Task and await Task.WhenAll

What is the difference when using await on multiple await task vs waiting on all the tasks to finish. My under standing is the scenario 2 is better in terms of performance because both asyncTask1 and asyncTask2 are preformed in parallel.
scenario 1 :
async Task Task()
{
await asyncTask1();
await asyncTask2();
}
scenario 2 :
async Task Task()
{
t1 = asyncTask1();
t2 = asyncTask2();
await Task.WhenAll(createWorkflowtask, getTaskWorkflowTask);
}
In scenario 1, tasks are run sequentially (asyncTask1 must complete before asyncTask2 starts) while in scenario 2 the two tasks can run in parallel.
You wrote: "... because both asyncTask1 and asyncTask2 are preformed in parallel."
No they are not!
Addition: Below I wrote that everything in async-await is performed by one thread. Schneider commented correctly that in async-await multiple threads can be involved. See the addition at the end.
An article that helped me a lot to understand how async-await works was this interview with Eric-Lippert who compared async/await with a cook making dinner. (Somewhere half-way, search for async-await).
Eric Lippert explains that if a cook starts doing something and finds after a while that he has nothing else to do but wait for a process to finish, this cook looks around to see if he can do something else instead of waiting.
When using async/await, there is still one thread involved. This thread can do only one thing at a time. While the thread is busy doing Task1, it can't execute Task2. Only if it finds an await in Task1, it will start executing statements from Task2. In your scenario 2 the tasks are not executed in parallel.
However there is a difference between the scenarios. In scenario 1 the first statement of task2 will not be executed before task1 completely finishes. Scenario 2 will start executing the first statements of task2 as soon as task1 encounters an await.
If you really want task2 to do something while task1 is also doing something, you'll have to start doing task2 in a separate thread. The easy method to do this in your scenario would be:
var task1 = Task.Run( () => asyncTask1())
// this statement is executed while task1 begins executing on a different thread.
// hence this thread is free to do other things, like performing statements
// from task2:
var task2 = asyncTask();
// the following statement will only be executed if task2 encounters an await
DoSomethingElse();
// when we need results from both task1 and task2:
await Task.WhenAll(new Task[] {task1, task2});
So usually, it is best only to await for a task to finish if you need the result of this task. As long as you can do other things, do these other things, they will be executed as soon as the other task starts awaiting until you start awaiting, in which case your caller will start doing things until his await etc.
The advantage of this method above doing things in parallel are manyfold:
Everything is performed by one thread: no need of mutexes, no chance of deadlock, starvation, etc
Your code looks quite sequential. Compare this with code that uses Task.ContinueWith and similar statements
There is no overhead of starting a separate thread / running a thread from the thread pool
Addition: Schneider's comment below about several threads is correct.
Some testing showed me that the thread ID of the current thread in the awaitable tasks is different than the thread ID of the calling thread.
For newbees to async-await it is important to understand that although various threads are involved, async-await does NOT mean that the tasks are performed in parallel. If you want parallelism you specifically have to say that the task must be run in parallel.
It seems that the cook in Eric Lippert's analogy is in fact a team of cooks, who constantly look around to see if they can help some of the other cooks instead of waiting for their tasks to finish. And indeed if cook Albert sees an await and starts doing something else, cook Bernard might finish the task of cook Albert.
In the first scenario you launch a task and then wait until is completed, then pass to the second one and wait until finish before exit from the method.
In the second scenario you launch the two task in parallel and then wait until tgey are completed when you call Task.WhenAll
Using Task.WhenAll
You cannot use methods with return types unless all of return types are the same and using generic version of WhenAll method
You don't have control over sequences that methods executed because these methods is executing in parallel
Unhandled exception in one method doesn't break the execution of other methods(because of parallel nature)
From MSDN
If any of the supplied tasks completes in a faulted state, the returned task will also complete in a TaskStatus.Faulted state, where its exceptions will contain the aggregation of the set of unwrapped exceptions from each of the supplied tasks.
Using await on multiple methods
You have control over sequences that functions called
You can use different return types and use that return types on next steps
Unhandled exception in one method will break the execution of other methods

How can I wait till the Parallel.ForEach completes

I'm using TPL in my current project and using Parallel.Foreach to spin many threads. The Task class contains Wait() to wait till the task gets completed. Like that, how I can wait for the Parallel.ForEach to complete and then go into executing next statements?
You don't have to do anything special, Parallel.Foreach() will wait until all its branched tasks are complete. From the calling thread you can treat it as a single synchronous statement and for instance wrap it inside a try/catch.
Update:
The old Parallel class methods are not a good fit for async (Task based) programming. But starting with dotnet 6 we can use Parallel.ForEachAsync()
await Parallel.ForEachAsync(items, (item, cancellationToken) =>
{
await ...
});
There are a few overloads available and the 'body' method should return a ValueTask.
You don't need that with Parallel.Foreach: it only executes the foreach in as many thread as there are processors available, but it returns synchronously.
More information can be found here
As everyone here said - you dont need to wait for it. What I can add from my experience:
If you have an async body to execute and you await some async calls inside, it just ran through my code and did not wait for anything. So I just replaced the await with .Result - then it worked as intended. I couldnt find out though why is that so :/
if you are storing results from the tasks in a List, make sure to use a thread-safe data structure such as ConcurrentBag, otherwise, some results would be missing because of concurrent write issues.
I believe that you can use IsCompleted like follows:
if(Parallel.ForEach(files, f => ProcessFiles(f)).IsCompleted)
{
// DO STUFF
}

Categories

Resources