Effects of async within a parallel for loop

Effects of async within a parallel for loop - c#

I am going to start by saying that I am learning about mulithreading at the moment so it may be the case that not all I say is correct - please feel free to correct me as required. I do have a reasonable understanding of async and await.
My basic aim is as follows:
I have a body of code that currently takes about 3 seconds. I am trying to load some data at the start of the method that will be used right at the end. My plan is to load the data on a different thread right at the start - allowing the rest of the code to execute independently. Then, at the point that I need the data, the code will wait if the data is not loaded. So far this is all seems to be working fine and as I describe.
My question relates to what happens when I call a method that is async, within a parallel for loop, without awaiting it.
My code follows this structure:
public void MainCaller()
{
List<int> listFromThread = null;
var secondThread = Task.Factory.StartNew(() =>
{
listFromThread = GetAllLists().Result;
});
//Do some other stuff
secondThread.Wait();
//Do not pass this point until this thread has completed
}
public Task<List<int>> GetAllLists()
{
var intList = new List<int>(){ /*Whatever... */};
var returnList = new List<int>();
Parallel.ForEach(intList, intEntry =>
{
var res = MyMethod().Result;
returnList.AddRange(res);
});
return Task.FromResult(returnList);
}
private async Task<List<int>> MyMethod()
{
var myList = await obtainList.ToListAsync();
}
Note the Parallel for Loop calls the async method, but does not await it as it is not async itself.
This is a method that is used elsewhere, so it is valid that it is async. I know one option is to make a copy of this method that is not async, but I am trying to understand what will happen here.
My question is, can I be sure that when I reach secondThread.Wait(); the async part of the execution will be complete. Eg will wait to know wait for the async part to complete, or will async mess up the wait, or will it work seamlessly together?
It seems to me it could be possible that as the call to MyMethod is not awaited, but there is an await within MyMethod, the parallel for loop could continue execution before the awaited call has completed?
Then I think, as it is assigning it by reference, then once the assigning takes place, the value will be the correct result.
This leads me to think that as long as the wait will know to wait for the async to complete, then there is no problem - hence my question.
I guess this relates to my lack of understanding of Tasks?
I hope this is clear?

In your code there is no part that is executed asynchrounously.
In MainCaller, you start a Task and immediately Wait for it to finished.
This is a blocking operation which only introduces the extra overhead of calling
GetAllLists in another Task.
In this Task you call You start a new Task (by calling GettAllLists) but immediately
wait for this Task to finish by waiting for its Result (which is also blocking).
In the Task started by GetAllLists you have the Parallel.Foreach loop which starts
several new Tasks. Each of these 'for' Tasks will start another Task by calling
MyMethod and immediately waiting for its result.
The net result is that your code completely executes synchronously. The only parallelism is introduced in the Parallel.For loop.
Hint: a usefull thread concerning this topic: Using async/await for multiple tasks
Additionally your code contains a serious bug:
Each Task created by the Parallel.For loop will eventually add its partial List to the ReturnList by calling AddRange. 'AddRange' is not thread safe, so you need to have some synchronisation mechanism (e.g. 'Lock') or there is the possibility that your ReturnList gets corrupted or does not contain all the results. See also: Is the List<T>.AddRange() thread safe?

Related

Two ways of calling asynchronous method. C#

Are there differences between this two approaches? Or program runs similarly in this two cases? If there are differences could you say what are these differences.
First approach:
Task myTask = MyFunctionAsync();
await myTask;
Second approach:
await MyFunctionAsync();

Short version: "not really, at least not in an interesting way"
Long version: awaitables aren't limited to Task/Task<T>, so it is possible (trivially so, in fact) to create code that compiles fine with:
await MyFunctionAsync();
but doesn't compile with:
Task myTask = MyFunctionAsync();
await myTask;
simply because MyFunctionAsync() returns something that isn't a task. ValueTask<int> would be enough for this, but you can make exotic awaitables if you want. But: if we replace Task with var, i.e.
var myTask = MyFunctionAsync();
await myTask;
then now the only difference is that we can refer to myTask at other points in the code, if we want to. This isn't exactly uncommon; the two main scenarios being
combining multiple checks over concurrent code, perhaps using WhenAny or WhenAll
(usually in the case of ValueTask[<T>]) checking whether the awaitable completed synchronously, to avoid the state machine overhead in the synchronous case

They are effectively the same. The difference is that the first way lets you do more steps before you wait for a response. So you could start many tasks concurrently in the first way, and then await them all together with await Task.WhenAll(myListOfTasks)
For example:
var myTasks = myEmployees.Select(e => ProcessPayrollAsync(e));
await Task.WhenAll(myTasks);
I would use the first way if you need to for concurrency and the second way if its a simple case because its shorter.

In that particular case, the 2 forms of code are executed in a similar way. Homewer, consider this:
public async Task<int> CalculateResult(InputData data) {
// This queues up the work on the threadpool.
var expensiveResultTask = Task.Run(() => DoExpensiveCalculation(data));
// Note that at this point, you can do some other work concurrently,
// as CalculateResult() is still executing!
// Execution of CalculateResult is yielded here!
var result = await expensiveResultTask;
return result;
}
As the comments in the code above point out, between a task is running and the await call you can execute any other concurrent code.
For more information, read this article.

What really helped me to understand async-await was the cook analogy descrived by Eric Lippert in this interview. Search somewhere in the middle for async-await.
Here he describes a cook making breakfast. Once he put on the kettle to boil water for the tea, he doesn't wait idly for the water to cook. Instead he puts the bread in the toaster, tea in the teapot and starts slicing tomatoes: whenever he has to wait for another apparatus or other cook to finish his job, he doesn't wait idly, but starts the next task, until he needs the results for one of tasks previous tasks.
Async await, does the same: whenever another process has to do something, where your thread can do nothing but wait for the other process to finish, the thread can look around to see if it can do other things instead. You typically see async-await when another lengthy process is involved: Writing data to a file, Querying data from a database or from the internet. Whenever your thread has to do this, the thread can order the other process to do something while it continues to do other things:
Task<string> taskReadFile = ReadMyFileAsync(...);
// not awaiting, we can do other things:
DoSomethingElse();
// now we need the result of the file:
string fetchedData = await taskReadFile;
So what happens. ReadMyFileAsync is async, so you know that somewhere deep inside it, the method awaits. In fact, your compiler will warn you if you forget to await.
Once your thread sees the await, it knows that the results of the await are needed, so it can't continue. Instead it goes up the call stack to continue processing (in my example DoSomethingElse()), until it sees an await. It goes up the call stack again and continues processing, etc.
So in fact there is no real difference between your first and your second method. You can compare it with:
double x = Math.Sin(4.0)
versus
double a = 4.0;
double x = Math.Sin(a);
Officially the only difference is that after these statements you can still use a. Similarly you can use information from the task after the await:
Task<MyData> myTask = FetchMyDataAsync(...);
MyData result = await myTask;
// if desired you can investigate myTask
if (result == null)
{
// why is it null, did someone cancel my task?
if (Task.IsCanceled)
{
Yell("Hey, who did cancel my task!");
}
}
But most of the times you are not interested in the task. If you don't have anything else to do while the task is executing, I'd just await for it:
MyData fetchedData = await FetchMyDataAsync(...)

When is the best place to use Task.Result instead of awaiting Task

Whilst I've been using async code in .NET for a while, I've only recently started to research it and understand what's going on. I've just been going through my code and trying to alter it so if a task can be done in parallel to some work, then it is. So for example:
var user = await _userRepo.GetByUsername(User.Identity.Name);
//Some minor work that doesn't rely on the user object
user = await _userRepo.UpdateLastAccessed(user, DateTime.Now);
return user;
Now becomes:
var userTask = _userRepo.GetByUsername(User.Identity.Name);
//Some work that doesn't rely on the user object
user = await _userRepo.UpdateLastAccessed(userTask.Result, DateTime.Now);
return user;
My understand is that the user object is now being fetched from the database WHILST some unrelated work is going on. However, things I've seen posted imply that result should be used rarely and await is preferred but I don't understand why I'd want to wait for my user object to be fetched if I can be performing some other independant logic at the same time?

Let's make sure to not bury the lede here:
So for example: [some correct code] becomes [some incorrect code]
NEVER NEVER NEVER DO THIS.
Your instinct that you can restructure your control flow to improve performance is excellent and correct. Using Result to do so is WRONG WRONG WRONG.
The correct way to rewrite your code is
var userTask = _userRepo.GetByUsername(User.Identity.Name);
//Some work that doesn't rely on the user object
user = await _userRepo.UpdateLastAccessed(await userTask, DateTime.Now);
return user;
Remember, await does not make a call asynchronous. Await simply means "if the result of this task is not yet available, go do something else and come back here after it is available". The call is already asynchronous: it returns a task.
People seem to think that await has the semantics of a co-call; it does not. Rather, await is the extract operation on the task comonad; it is an operator on tasks, not call expressions. You normally see it on method calls simply because it is a common pattern to abstract away an async operation as a method. The returned task is the thing that is awaited, not the call.
However, things I've seen posted imply that result should be used rarely and await is preferred but I don't understand why I'd want to wait for my user object to be fetched if I can be performing some other independent logic at the same time?
Why do you believe that using Result will allow you to perform other independent logic at the same time??? Result prevents you from doing exactly that. Result is a synchronous wait. Your thread cannot be doing any other work while it is synchronously waiting for the task to complete. Use an asynchronous wait to improve efficiency. Remember, await simply means "this workflow cannot progress further until this task is completed, so if it is not complete, find more work to do and come back later". A too-early await can, as you note, make for an inefficient workflow because sometimes the workflow can progress even if the task is not complete.
By all means, move around where the awaits happen to improve efficiency of your workflow, but never never never change them into Result. You have some deep misunderstanding of how asynchronous workflows work if you believe that using Result will ever improve efficiency of parallelism in the workflow. Examine your beliefs and see if you can figure out which one is giving you this incorrect intuition.
The reason why you must never use Result like this is not just because it is inefficient to synchronously wait when you have an asynchronous workflow underway. It will eventually hang your process. Consider the following workflow:
task1 represents a job that will be scheduled to execute on this thread in the future and produce a result.
asynchronous function Foo awaits task1.
task1 is not yet complete, so Foo returns, allowing this thread to run more work. Foo returns a task representing its workflow, and signs up completing that task as the completion of task1.
The thread is now free to do work in the future, including task1.
task1 completes, triggering the execution of the completion of the workflow of Foo, and eventually completing the task representing the workflow of Foo.
Now suppose Foo instead fetches Result of task1. What happens? Foo synchronously waits for task1 to complete, which is waiting for the current thread to become available, which never happens because we're in a synchronous wait. Calling Result causes a thread to deadlock with itself if the task is somehow affinitized to the current thread. You can now make deadlocks involving no locks and only one thread! Don't do this.

Async await does not mean that several threads will be running your code.
However, it will lower the time your thread will be waiting idly for processes to finish, thus finishing earlier.
Whenever the thread normally would have to wait idly for something to finish, like waiting for a web page to download, a database query to finish, a disk write to finish, the async-await thread will not be waiting idly until the data is written / fetched, but looks around if it can do other things instead, and come back later after the awaitable task is finished.
This has been described with a cook analogy in this inverview with Eric Lippert. Search somewhere in the middle for async await.
Eric Lippert compares async-await with one(!) cook who has to make breakfast. After he starts toasting the bread he could wait idly until the bread is toasted before putting on the kettle for tea, wait until the water boils before putting the tea leaves in the teapot, etc.
An async-await cook, wouldn't wait for the toasted bread, but put on the kettle, and while the water is heating up he would put the tea leaves in the teapot.
Whenever the cook has to wait idly for something, he looks around to see if he can do something else instead.
A thread in an async function will do something similar. Because the function is async, you know there is somewhere an await in the function. In fact, if you forget to program the await, your compiler will warn you.
When your thread meets the await, it goes up its call stack to see if it can do something else, until it sees an await, goes up the call stack again, etc. Once everyone is waiting, he goes down the call stack and starts waiting idly until the first awaitable process is finished.
After the awaitable process is finished the thread will continue processing the statements after the await until he sees an await again.
It might be that another thread will continue processing the statements that come after the await (you can see this in the debugger by checking the thread ID). However this other thread has the context of the original thread, so it can act as if it was the original thread. No need for mutexes, semaphores, IsInvokeRequired (in winforms) etc. For you it seems as if there is one thread.
Sometimes your cook has to do something that takes up some time without idly waiting, like slicing tomatoes. In that case it might be wise to hire a different cook and order him to do the slicing. In the mean time your cook can continue with the eggs that just finished boiling and needed peeling.
In computer terms this would be if you had some big calculations without waiting for other processes. Note the difference with for instance writing data to disk. Once your thread has ordered that the data needs to be written to disk, it normally would wait idly until the data has been written. This is not the case when doing big calculations.
You can hire the extra cook using Task.Run
async Task<DateTime> CalculateSunSet()
{
// start fetching sunset data. however don't wait for the result yet
// you've got better things to do:
Task<SunsetData> taskFetchData = FetchSunsetData();
// because you are not awaiting your thread will do the following:
Location location = FetchLocation();
// now you need the sunset data, start awaiting for the Task:
SunsetData sunsetData = await taskFetchData;
// some big calculations are needed, that take 33 seconds,
// you want to keep your caller responsive, so start a Task
// this Task will be run by a different thread:
Task<DateTime> taskBigCalculations = Taks.Run( () => BigCalculations(sunsetData, location);
// again no await: you are still free to do other things
...
// before returning you need the result of the big calculations.
// wait until big calculations are finished, keep caller responsive:
DateTime result = await taskBigCalculations;
return result;
}

In your case, you can use:
user = await _userRepo.UpdateLastAccessed(await userTask, DateTime.Now);
or perhaps more clearly:
var user = await _userRepo.GetByUsername(User.Identity.Name);
//Some work that doesn't rely on the user object
user = await _userRepo.UpdateLastAccessed(user, DateTime.Now);
The only time you should touch .Result is when you know the task has been completed. This can be useful in some scenarios where you are trying to avoid creating an async state machine and you think there's a good chance that the task has completed synchronously (perhaps using a local function for the async case), or if you're using callbacks rather than async/await, and you're inside the callback.
As an example of avoiding a state machine:
ValueTask<int> FetchAndProcess(SomeArgs args) {
async ValueTask<int> Awaited(ValueTask<int> task) => SomeOtherProcessing(await task);
var task = GetAsyncData(args);
if (!task.IsCompletedSuccessfully) return Awaited(task);
return new ValueTask<int>(SomeOtherProcessing(task.Result));
}
The point here is that if GetAsyncData returns a synchronously completed result, we completely avoid all the async machinery.

Have you considered this version?
var userTask = _userRepo.GetByUsername(User.Identity.Name);
//Some work that doesn't rely on the user object
user = await _userRepo.UpdateLastAccessed(await userTask, DateTime.Now);
return user;
This will execute the "work" while the user is retrieved, but it also has all the advantages of await that are described in Await on a completed task same as task.Result?
As suggested you can also use a more explicit version to be able to inspect the result of the call in the debugger.
var userTask = _userRepo.GetByUsername(User.Identity.Name);
//Some work that doesn't rely on the user object
user = await userTask;
user = await _userRepo.UpdateLastAccessed(user, DateTime.Now);
return user;

List<MyObject> does not contain a definition for GetAwaiter

I have a method that returns a List<> of an object. This method takes a while to run.
private List<MyObject> GetBigList()
{
... slow stuff
}
This method is called from 4 or 5 sources. So, I thought I would try and use async and await to keep things moving while this list builds. I added this method:
public async Task<List<MyObject>> GetBigListAsync()
{
var resultsTask = GetBigList();
var resuls = await resultsTask;
return resuls;
}
But, on this line:
var resuls = await resultsTask;
I get this error:
List<MyObject> does not contain a definition for GetAwaiter,
and no extension method 'GetAwaiter' accepting a first argument of type List<MyObject> could be found.
What am I missing?

It seems you're a newbee to async-await. What really helped me to understand what async-await does is the restaurant analogy given by Eric Lippert in this interview. Search somewhere in the middle for async await.
Here he describes that if a cook has to wait for something, instead of doing nothing he starts looking around to see if he can do something else in the meantime.
Async-await is similar. Instead of awaiting for a file to be read, a database query to return, a web page to be downloaded, your thread will go up the callstack to see if any of the callers are not awaiting and performs those statements until he sees an await. Once he sees the await the thread goes up the call stack again to see if one of the callers is not awaiting etc. After a while when the file is read, or the query is finished etc, the statements after the await are performed.
Normally while reading your big list your thread would be very busy instead of idly waiting. It's not certain that ordering another thread to do the stuff would improve the time needed to read your list. Consider measuring both methods.
One reason to use async-await, even if it would lengthen the time
needed to read the big list, would be to keep the caller (user
interface?) responsive.
To make your function async, you should do the following:
Declare the function async;
Instead of TResult return Task<TResult> and instead of void return Task;
If your function calls other async functions, consider remembering the returned task instead of await, do other useful stuff you need to do and await the task when you need the result;
If you really want to let another thread do the busy stuff. call
Task.Run( () => GetBigList())
and await when you need the results.
private async Task<List<MyObject>> GetBigListAsync()
{
var myTask = Task.Run( () => GetBigList());
// your thread is free to do other useful stuff right nw
DoOtherUsefulStuff();
// after a while you need the result, await for myTask:
List<MyObject> result = await myTask;
// you can now use the results of loading:
ProcessResult(result);
return result;
}
Once again: if you have nothing useful to do while the other thread is loading the List (like keeping UI responsive), don't do this, or at least measure if you are faster.
Other articles that helped me understanding async-await were
- Async await, by the ever so helpful Stephen Cleary,
- and a bit more advanced: Async-Wait best practices.

resultTask is just the list returned from GetBigList(), so nothing will happen async there.
What you can do is offload the task to a separate thread on the threadpool by using Task.Run and return the awaitable Task object:
// Bad code
public Task<List<MyObject>> GetBigListAsync()
{
return Task.Run(() => GetBigList());
}
While above example best matches what you were trying to do, it is not best practice. Try to make the GetBigList() async by nature or if there really is no way, leave the decision about executing the code on a separate thread to the calling code and don't hide this in the implementation F.e. if the calling code already runs async, there is no reason to spawn yet another thread. This article describes this in more detail.

A couple of years later but I feel it is worth to add for the collection, once people search for the resolution since .NET has changed quite a bit:
return await Task.FromResult(GetBigList())

Async method not running in parallel

In the following code, in the B method, the code Trace.TraceInformation("B - Started"); never gets called.
Should the method be running in parallel?
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
namespace ConsoleApplication1
{
class Program
{
private static async Task A()
{
for (;;)
{
}
}
private static async Task B()
{
Trace.TraceInformation("B - Started");
}
static void Main(string[] args)
{
var tasks = new List<Task> { A(), B() };
Task.WaitAll(tasks.ToArray());
}
}
}

Short answer
No, as you wrote your two async methods, they are indeed not running in parallel. Adding await Task.Yield(); to your first method (e.g. inside the loop) would allow them to do so, but there are more reasonable and straightforward methods, highly depending on what you actually need (interleaved execution on a single thread? Actual parallel execution on multiple threads?).
Long answer
First of all, declaring functions as async does not inherently make them run asynchronously or something. It rather simplifies the syntax to do so - read more about the concepts here: Asynchronous Programming with Async and Await
Effectively A is not asynchronous at all, as there is not a single await inside its method body. Instructions up to the first use of await run synchronously like a regular method would.
From then on, the object that you await determines what happens next, i.e. the context that the remaining method runs in.
To force execution of a task to happen on another thread, use Task.Run or similar.
In this scenario, adding await Task.Yield() does the trick since the current synchronization context is null and this happens to indeed cause the task scheduler (should be ThreadPoolTaskScheduler) to execute the remaining instuctions on a thread-pool thread - some environment or configuration might cause you to only have one of them, so things would still not run in parallel.
Summary
The moral of the story is: Be aware of the differences between two concepts:
concurrency (which is enabled by using async/await reasonably) and
parallelism (which only happens when concurrent tasks get scheduled the right way or if you enforce it using Task.Run, Thread, etc. in which case the use of async is completely irrelevant anyway)

The async modifier is not a magic spawn-a-thread-here marker. It's sole purpose is to let the compiler know a method might depend on some asynchronous operation (A complex data-processing thread, I/O...) so it has to setup a state machine to coordinate the callbacks resulting from those asynchronous operations.
To make A run on another thread you would invoke it using Task.Run which wraps the invocation on a new thread with a Task object, which you can await. Be aware that await-ing a method does not mean your code runs in parallel to A's execution all by itself: It will until the very line you await the Task object, telling the compiler you need the value that the Task object returns. In this case await-ing Task.Run(A) will effectively make your program run forever, waiting for A to return, something that will never happen (barring computer malfunction).
Do have in mind that marking a method as async but not actually awaiting anything will only have the effect of a compiler warning. If you await something that is not truly async (It returns immediately on the calling thread with something like Task.FromResult) it will mean your program takes a runtime speed penalty. It is very slight, however.

No, the methods shown are not expected to "run in parallel".
Why B is never called - you have list of tasks tasks constructed via essentially series of .Add calls - and first is result of A() is added. Since the A method does not have any await it will run to the completion synchronously on the same thread. And after that B() would be called.
Now A will never complete (it is sitting in infinite loop) so really code will not even reach call to B.
Note that even if creation would succeed code never finish WaitAll as A still sits in infinite loop.
If you want methods to "run in parallel" you need to either run them implicitly/explicitly on new threads (i.e. with Task.Run or Thread.Start) or for I/O bound calls let method to release thread with await.

How can I wait till the Parallel.ForEach completes

I'm using TPL in my current project and using Parallel.Foreach to spin many threads. The Task class contains Wait() to wait till the task gets completed. Like that, how I can wait for the Parallel.ForEach to complete and then go into executing next statements?

You don't have to do anything special, Parallel.Foreach() will wait until all its branched tasks are complete. From the calling thread you can treat it as a single synchronous statement and for instance wrap it inside a try/catch.
Update:
The old Parallel class methods are not a good fit for async (Task based) programming. But starting with dotnet 6 we can use Parallel.ForEachAsync()
await Parallel.ForEachAsync(items, (item, cancellationToken) =>
{
await ...
});
There are a few overloads available and the 'body' method should return a ValueTask.

You don't need that with Parallel.Foreach: it only executes the foreach in as many thread as there are processors available, but it returns synchronously.
More information can be found here

As everyone here said - you dont need to wait for it. What I can add from my experience:
If you have an async body to execute and you await some async calls inside, it just ran through my code and did not wait for anything. So I just replaced the await with .Result - then it worked as intended. I couldnt find out though why is that so :/

if you are storing results from the tasks in a List, make sure to use a thread-safe data structure such as ConcurrentBag, otherwise, some results would be missing because of concurrent write issues.

I believe that you can use IsCompleted like follows:
if(Parallel.ForEach(files, f => ProcessFiles(f)).IsCompleted)
{
// DO STUFF
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.