I just started learning about async/await.
There is this article by Stephen Cleary:
https://blog.stephencleary.com/2013/11/there-is-no-thread.html
I don't know much about I/O, but can someone explain to me how I can apply the async/await logic to an operation that doesn't use I/O?
Let's say we have a string[] and want to reverse all the strings in this array.
async Task<string> ReverseAsync(string s)
{
    char[] charArray = s.ToCharArray();
    Array.Reverse(charArray);
    return new string(charArray);
}
Then I want to "parallelly" call this method for each element in the array.
I understand that code like this will not be valid, but how can it be implemented using async/await?
Can someone explain to me how I can apply the async/await logic to an operation that doesn't use I/O?
You can use the new async/await in .NET to simplify concurrent operations without all the tedious mucking about of spinning up potentially expensive threads. async/await can be used irrespective of whether the operation is CPU-bound or IO-bound.
The syntax is mostly the same, except that for a CPU-bound operation you generally explicitly call Task.Run() or a variant thereof. Task can be thought of as a high-level construct representing a concurrent operation, not a synonym for Thread. Task.Run will execute code on one of the reusable threads in the thread pool, which most likely is already running. Once your operation is complete, the thread is returned to the pool.
e.g.
await Task.Run(() => CalculatePrimeNumbers(10000));
In your scenario you should try to call Task.Run() as close as you can to the top of the call stack. This:
makes your code more intuitive
makes it obvious that the called operation is asynchronous
gives client code complete control over root task creation
gives client code the chance to capture the current context (particularly useful in GUI code)
Then I want to "parallelly" call this method for each element in the array. I understand that code like this will not be valid, but how can it be implemented using async/await?
There is probably not much you can do to change the implementation, however you can certainly make use of concurrency by the way you call the method.
So given your code of:
async Task<string> ReverseAsync(string s)
{
    char[] charArray = s.ToCharArray();
    Array.Reverse(charArray);
    return new string(charArray);
}
You would call it like so (let's assume it's a button click handler in WinForms):
async void OnButtonClicked(object sender, System.EventArgs e)
{
    var reversed = await Task.Run(() => ReverseAsync("Miss Piggy is a Muppet"));
}
Note again that we use Task.Run() because the client code knows that ReverseAsync is CPU-bound. Don't use Task.Run() if you are going to call something I/O-bound such as WCF or Entity Framework.
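For contrast, here is a minimal sketch of the I/O-bound case (my illustration, not part of the original example, using HttpClient's real GetStringAsync API): you await the asynchronous API directly, and wrapping it in Task.Run would only add an unnecessary thread-pool hop.
// using System.Net.Http;
// I/O-bound work: await the async API directly; no thread is occupied while
// the request is in flight, so Task.Run would add nothing.
var http = new HttpClient();
string body = await http.GetStringAsync("https://example.com");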
More
Cleary, Stephen, Task.Run Etiquette Examples: Don't Use Task.Run in the Implementation, November 2013
Use Parallel.ForEach to execute the method and reverse many strings in the array. I suggest you process the strings in batches, say 10 by 10 or 100 by 100 depending on the size of the strings, because the overhead of scheduling work on a thread-pool thread is large for small tasks. When you batch items, each task is big enough that this overhead becomes negligible.
int batchSize = 10;
Parallel.ForEach(
    array.Select((value, index) => new { value, index })
         .GroupBy(x => x.index / batchSize),
    batch =>
    {
        foreach (var item in batch)
        {
            char[] chars = item.value.ToCharArray();
            Array.Reverse(chars);
            array[item.index] = new string(chars);
        }
    });
Related
In chapter 4.4 Dynamic Parallelism, in Stephen Cleary's book Concurrency in C# Cookbook, it says the following:
Parallel tasks may use blocking members, such as Task.Wait,
Task.Result, Task.WaitAll, and Task.WaitAny. In contrast, asynchronous
tasks should avoid blocking members, and prefer await, Task.WhenAll,
and Task.WhenAny.
I was always told that Task.Wait etc are bad because they block the current thread, and that it's much better to use await instead, so that the calling thread is not blocked.
Why is it ok to use Task.Wait etc for a parallel (which I think means CPU bound) Task?
Example:
In the example below, isn't Test1() better because the thread that calls Test1() is able to continue doing something else while it waits for the for loop to complete?
Whereas the thread that calls Test() is stuck waiting for the for loop to complete.
private static void Test()
{
    Task.Run(() =>
    {
        for (int i = 0; i < 100; i++)
        {
            //do something.
        }
    }).Wait();
}
private static async Task Test1()
{
    await Task.Run(() =>
    {
        for (int i = 0; i < 100; i++)
        {
            //do something.
        }
    });
}
EDIT:
This is the rest of the paragraph which I'm adding based on Peter Csala's comment:
Parallel tasks also commonly use AttachedToParent to create parent/child relationships between tasks. Parallel tasks should be created with Task.Run or Task.Factory.StartNew.
You've already got some great answers here, but just to chime in (sorry if this is repetitive at all):
Task was introduced in the TPL before async/await existed. When async came along, the Task type was reused instead of creating a separate "Promise" type.
In the TPL, pretty much all tasks were Delegate Tasks - i.e., they wrap a delegate (code) which is executed on a TaskScheduler. It was also possible - though rare - to have Promise Tasks in the TPL, which were created by TaskCompletionSource<T>.
The higher-level TPL APIs (Parallel and PLINQ) hide the Delegate Tasks from you; they are higher-level abstractions that create multiple Delegate Tasks and execute them on multiple threads, complete with all the complexity of partitioning and work queue stealing and all that stuff.
However, the one drawback to the higher-level APIs is that you need to know how much work you are going to do before you start. It's not possible for, e.g., the processing of one data item to add another data item(s) back at the beginning of the parallel work. That's where Dynamic Parallelism comes in.
Dynamic Parallelism uses the Task type directly. There are many APIs on the Task type that were designed for Dynamic Parallelism and should be avoided in async code unless you really know what you're doing (i.e., either your name is Stephen Toub or you're writing a high-performance .NET runtime). These APIs include StartNew, ContinueWith, Wait, Result, WaitAll, WaitAny, Id, CurrentId, RunSynchronously, and parent/child tasks. And then there's the Task constructor itself and Start which should never be used in any code at all.
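To make that concrete, here is a rough sketch (my illustration, with hypothetical rootItems, ShouldSplit, Split, and Process names, not code from the book) of the dynamic-parallelism style those APIs were designed for: work discovered on the fly becomes attached child tasks, and the caller blocks with Wait.
// The parent task spawns children as it discovers more work; AttachedToParent
// means the parent only completes once all attached children complete.
var parent = Task.Factory.StartNew(() =>
{
    foreach (var item in rootItems)
    {
        if (ShouldSplit(item))
        {
            Task.Factory.StartNew(() => Process(Split(item)),
                TaskCreationOptions.AttachedToParent);
        }
        else
        {
            Process(item);
        }
    }
});

parent.Wait(); // blocking here is the accepted style for parallel (not async) tasks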
In the particular case of Wait, yes, it does block the thread. And that is not ideal (even in parallel programming), because it blocks a literal thread. However, the alternative may be worse.
Consider the case where task A reaches a point where it has to be sure task B completes before it continues. This is the general Dynamic Parallelism case, so assume no parent/child relationship.
The old-school way to avoid this kind of blocking is to split method A up into a continuation and use ContinueWith. That works fine, but it does complicate the code - rather considerably in the case of loops. You end up writing a state machine, essentially what async does for you. In modern code, you may be able to use await, but then that has its own dangers: parallel code does not work out of the box with async, and combining the two can be tricky.
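As a small sketch of that old-school style (assumed method names ComputePartB and UsePartBResult, purely for illustration): instead of blocking on task B, the rest of method A is written as a continuation.
// Task B runs on the thread pool.
Task<int> taskB = Task.Run(() => ComputePartB());

// The "second half" of method A, hand-written as a continuation instead of
// blocking with taskB.Wait(); async/await generates an equivalent state machine.
Task continuation = taskB.ContinueWith(completed => UsePartBResult(completed.Result));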
So it really comes down to a tradeoff between code complexity vs runtime efficiency. And when you consider the following points, you'll see why blocking was common:
Parallelism is normally done on Desktop applications; it's not common (or recommended) for web servers.
Desktop machines tend to have plenty of threads to spare. I remember Mark Russinovich (long before he joined Microsoft) demoing how showing a File Open dialog on Windows spawned some crazy number of threads (over 20, IIRC). And yet the user wouldn't even notice 20 threads being spawned (and presumably blocked).
Parallel code is difficult to maintain in the first place; Dynamic Parallelism using continuations is exceptionally difficult to maintain.
Given these points, it's pretty easy to see why a lot of parallel code blocks thread pool threads: the user experience is degraded by an unnoticeable amount, but the developer experience is enhanced significantly.
The thing is, if you are using tasks to parallelize CPU-bound work, your method is likely not asynchronous, because the main benefit of async is asynchronous I/O, and you have no I/O in this case. Since your method is synchronous, you can't await anything, including the tasks you use to parallelize the computation, nor do you need to.
The valid concern you mentioned is that you would waste the current thread if you just blocked it waiting for the parallel tasks to complete. However, you should not waste it like this; it can be used as one participant in the parallel computation. Say you want to perform a parallel computation on 4 threads: use the current thread plus 3 others, instead of using 4 other threads and wasting the current one blocked waiting for them.
That's what Parallel LINQ does, for example: it uses the current thread together with thread-pool threads. Note also that its methods are not async (and should not be), but they do use tasks internally and do block waiting on them.
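For illustration (my sketch; inputs and Compute are placeholder names), a minimal PLINQ call looks like this; the calling thread joins the thread-pool threads in doing the work and "blocks" only because it is computing too:
// AsParallel() spreads the work across thread-pool threads *and* the current
// thread; ToArray() returns once all partitions are done.
int[] results = inputs.AsParallel()
                      .Select(x => Compute(x))
                      .ToArray();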
Update: about your examples.
This one:
private static void Test()
{
    Task.Run(() =>
    {
        for (int i = 0; i < 100; i++)
        {
            //do something.
        }
    }).Wait();
}
is always useless: you offload the computation to a separate thread while the current thread is blocked waiting, so one thread is simply wasted for nothing. Instead you should just do:
private static void Test()
{
    for (int i = 0; i < 100; i++)
    {
        //do something.
    }
}
This one:
private static async Task Test1()
{
    await Task.Run(() =>
    {
        for (int i = 0; i < 100; i++)
        {
            //do something.
        }
    });
}
is sometimes useful: when you need to perform a computation but don't want to block the current thread. For example, if the current thread is the UI thread and you don't want the user interface to freeze while the computation runs. However, if you are not in such an environment, for example if you are writing a general-purpose library, then it's useless too and you should stick to the synchronous version above. If the user of your library happens to be on the UI thread, they can wrap the call in Task.Run themselves. I would say that even if you are writing a UI application rather than a library, you should move all such logic (the for loop in this case) into a separate synchronous method and then wrap the call to that method in Task.Run where necessary. Like this:
private static async Task Test2()
{
    // we are on UI thread here, don't want to block it
    await Task.Run(() => {
        OurSynchronousVersionAbove();
    });
    // back on UI thread
    // do something else
}
Now say you have that synchronous method and want to parallelize the computation. You may try something like this:
static void Test1()
{
    var task1 = Task.Run(() => {
        for (int i = 0; i < 50; i++) {
            // do something
        }
    });
    var task2 = Task.Run(() => {
        for (int i = 50; i < 100; i++) {
            // do something
        }
    });
    Task.WaitAll(task1, task2);
}
That will work, but it wastes the current thread, which sits blocked for no reason waiting for the two tasks to complete. Instead, you should do it like this:
static void Test1()
{
    var task = Task.Run(() => {
        for (int i = 0; i < 50; i++) {
            // do something
        }
    });
    for (int i = 50; i < 100; i++) {
        // do something
    }
    task.Wait();
}
Now you perform the computation in parallel using 2 threads: one thread-pool thread (from Task.Run) and the current thread. And here is your legitimate use of task.Wait(). Of course, usually you should stick to existing solutions like Parallel LINQ, which does the same for you, but better.
One of the risks of Task.Wait is deadlocks. If you call .Wait on the UI thread, you will deadlock if the task needs the main thread to complete. If you call an async method on the UI thread, such deadlocks are very likely.
If you are 100% sure the task is running on a background thread, is guaranteed to complete no matter what, and that this will never change, it is fine to wait on it.
Since this is fairly difficult to guarantee, it is usually a good idea to avoid waiting on tasks at all.
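A classic illustration of that deadlock (a hedged sketch of my own, assuming a WinForms/WPF-style UI thread with a synchronization context): blocking on an async method whose continuation needs the very thread being blocked.
private async Task<string> LoadAsync()
{
    await Task.Delay(1000);        // the continuation wants to resume on the UI thread
    return "done";
}

private void Button_Click(object sender, EventArgs e)
{
    // .Result blocks the UI thread, so the continuation above can never run: deadlock.
    var text = LoadAsync().Result;
}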
I believe the point of this passage is to not use blocking operations like Task.Wait in asynchronous code.
The main point isn't that Task.Wait is preferred in parallel code; it just says that you can get away with it, while in asynchronous code it can have a really serious effect.
This is because the success of async code depends on the tasks 'letting go' (with await) so that the thread(s) can do other work. In explicitly parallel code a blocking Wait may be OK, because the other streams of work will keep going on their own dedicated threads.
As I mentioned in the comments section, if you look at the recipe as a whole it might make more sense. Let me quote the relevant part here as well.
The Task type serves two purposes in concurrent programming: it can be a parallel task or an asynchronous task. Parallel tasks may use blocking members, such as Task.Wait, Task.Result, Task.WaitAll, and Task.WaitAny. In contrast, asynchronous tasks should avoid blocking members, and prefer await, Task.WhenAll, and Task.WhenAny. Parallel tasks also commonly use AttachedToParent to create parent/child relationships between tasks. Parallel tasks should be created with Task.Run or Task.Factory.StartNew.
In contrast, asynchronous tasks should avoid blocking members and prefer await, Task.WhenAll, and Task.WhenAny. Asynchronous tasks do not use AttachedToParent, but they can form an implicit kind of parent/child relationship by awaiting another task.
IMHO, it clearly articulates that a Task (or future) can represent a job that takes advantage of async I/O, or it can represent a CPU-bound job which could run in parallel with other CPU-bound jobs.
Awaiting the former is the suggested way because otherwise you can't really take advantage of the underlying I/O driver's async capability. The latter does not require awaiting since it is not an async I/O job.
UPDATE: providing an example
As Theodor Zoulias asked in the comments section, here is a made-up example of parallel tasks where Task.WaitAll is being used.
Let's suppose we have this naive is-prime implementation. It is not efficient, but it represents something that can be considered computationally heavy. (Please also bear in mind that, for the sake of simplicity, I did not add any error-handling logic.)
static (int, bool) NaiveIsPrime(int number)
{
    int numberOfDividers = 0;
    for (int divider = 1; divider <= number; divider++)
    {
        if (number % divider == 0)
        {
            numberOfDividers++;
        }
    }
    return (number, numberOfDividers == 2);
}
And here is a sample use case which runs a couple of is-prime calculations in parallel and waits for the results in a blocking way.
List<Task<(int, bool)>> jobs = new();
for (int number = 1_010; number < 1_020; number++)
{
    var x = number;
    jobs.Add(Task.Run(() => NaiveIsPrime(x)));
}

Task.WaitAll(jobs.ToArray());

foreach (var job in jobs)
{
    (int number, bool isPrime) = job.Result;
    var isPrimeInText = isPrime ? "a prime" : "not a prime";
    Console.WriteLine($"{number} is {isPrimeInText}");
}
As you can see I haven't used any await keyword anywhere.
Here is a dotnet fiddle link
and here is a link for the prime numbers under 10 000.
I recommend using await instead of Task.Wait() for asynchronous methods/tasks, because this way the thread can be used for something else while the task is running.
However, for parallel tasks that are CPU-bound, most of the available CPU should be used. It makes sense to use Task.Wait() to block the current thread until the task is complete. This way, the CPU-bound task can make full use of the CPU resources.
Update with supplementary statement.
Parallel tasks can use blocking members such as Task.Wait(), Task.Result, Task.WaitAll, and Task.WaitAny, since their goal is to consume all available CPU resources. When working with parallel tasks, it can be beneficial to block the current thread until the task is complete, since the thread is not being used for anything else. This way, the software can fully utilize all available CPU resources instead of wasting them.
I work with Node.js, so I got very used to its programming style and its way of dealing with asynchronous operations through higher-order functions and callbacks, where most I/O events are handled in an async way by design, and if I want to make a sync operation, I need to use Promises or the await shortcut. In synchronous programming languages like Java, C#, or C++, I apparently have to do the opposite, by somehow telling the compiler that the task I want to achieve must be performed asynchronously. I tried reading through the Microsoft docs and couldn't really understand how to achieve it. I mean, I could use Threads, but for the simple task I want to process, exploring Threads is just not worth the trouble of guaranteeing thread safety.
I came across the Task class. So, suppose that I want to run a Task method multiple times in an async way, where the functions are being called in parallel. How can I do this?
private Task<int> MyCustomTask(string whatever)
{
    // I/O event that I want to be processed in async manner
}
So basically, I wanted to run this method in 'parallel' without threading.
foreach (var x in y)
{
    MyCustomTask("");
}
If you don't want to await, you can do something like this.
public class AsyncExamples
{
    public List<string> whatevers = new List<string> { "1", "2", "3" };

    private void MyCustomTask(string whatever)
    {
        // I/O event that I want to be processed in async manner
    }

    public void FireAndForgetAsync(string whatever)
    {
        Task.Run(() =>
        {
            MyCustomTask(whatever);
        });
    }

    public void DoParallelAsyncStuff()
    {
        foreach (var whatever in whatevers)
        {
            FireAndForgetAsync(whatever);
        }
    }
}
most I/O events are handled in an async way by design, and if I want to make a sync operation, I need to use Promises or the await shortcut
I believe the difference you're expressing is the difference between functional and imperative programming, not the difference between asynchronous and synchronous programming. So I think what you're saying is that asynchronous programming fits more naturally with a functional style, which I would agree with. JavaScript is mostly functional, though it also has imperative and OOP aspects. C# is more imperative and OOP than functional, although it grows more functional with each year.
However, both JavaScript and C# are synchronous by default, not asynchronous by default. A method must "opt in" to asynchrony using async/await. In that way, they are very similar.
I tried reading through the Microsoft docs and couldn't really understand how to achieve it.
Cheat sheet if you're familiar with asynchronous JavaScript:
Task<T> is Promise<T>
If you need to write a wrapper for another API (e.g., the Promise<T> constructor using resolve/reject), then the C# type you need is TaskCompletionSource<T> (see the sketch after this list).
async and await work practically the same way.
Task.WhenAll is Promise.all, and Task.WhenAny is Promise.race. There isn't a built-in equivalent for Promise.any.
Task.FromResult is Promise.resolve, and Task.FromException is Promise.reject.
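Here is the sketch referred to in the TaskCompletionSource<T> item above (my illustration; LegacyDownloader and its events are hypothetical), mirroring what you'd do with the Promise constructor's resolve/reject in JavaScript:
// Wrap a callback/event-based API into an awaitable Task,
// like `new Promise((resolve, reject) => ...)`.
Task<string> DownloadAsync(LegacyDownloader downloader, string url)
{
    var tcs = new TaskCompletionSource<string>();

    downloader.Completed += (sender, data) => tcs.TrySetResult(data);      // resolve
    downloader.Failed    += (sender, error) => tcs.TrySetException(error); // reject

    downloader.Start(url);
    return tcs.Task;   // awaitable, like the Promise returned by the constructor
}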
So, suppose that I want to run a Task method multiple times in an async way, where the functions are being called in parallel. How can I do this?
(minor pedantic note: this is asynchronous concurrency; not parallelism, which implies threads)
To do this in JS, you would take your iterable, map it over an async method (resulting in an iterable of promises), and then Promise.all those promises.
To do the same thing in C#, you would take your enumerable, Select it over an async method (resulting in an enumerable of tasks), and then Task.WhenAll those tasks.
var tasks = y.Select(x => MyCustomTask(x)).ToList();
await Task.WhenAll(tasks);
Recently I have seen several SO threads related to Parallel.ForEach mixed with async lambdas, but all the proposed answers were some kind of workaround.
Is there any way how could I write:
List<int> list = new List<int>();
Parallel.ForEach(arrayValues, async (item) =>
{
    var x = await LongRunningIoOperationAsync(item);
    list.Add(x);
});
How can I ensure that list will contain all items from all iterations executed within the lambdas?
How will Parallel.ForEach generally work with async lambdas? If it hits an await, will it hand over its thread to the next iteration?
I assume the ParallelLoopResult.IsCompleted field is not the proper one, as it will return true when all iterations have executed, regardless of whether their actual lambda jobs have finished?
Recently I have seen several SO threads related to Parallel.ForEach mixed with async lambdas, but all the proposed answers were some kind of workaround.
Well, that's because Parallel doesn't work with async. And from a different perspective, why would you want to mix them in the first place? They do opposite things. Parallel is all about adding threads and async is all about giving up threads. If you want to do asynchronous work concurrently, then use Task.WhenAll. That's the correct tool for the job; Parallel is not.
That said, it sounds like you want to use the wrong tool, so here's how you do it...
How can I ensure that list will contain all items from all iterations executed within the lambdas?
You'll need to have some kind of a signal that some code can block on until the processing is done, e.g., CountdownEvent or Monitor. On a side note, you'll need to protect access to the non-thread-safe List<T> as well.
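A rough sketch of that workaround (my illustration of the signal-and-lock idea, assuming arrayValues is an array and reusing the question's LongRunningIoOperationAsync):
var list = new List<int>();
var listLock = new object();
var countdown = new CountdownEvent(arrayValues.Length);

Parallel.ForEach(arrayValues, item =>
{
    // Parallel considers this iteration done at the first await, so we signal
    // completion ourselves from a continuation on the real async work.
    LongRunningIoOperationAsync(item).ContinueWith(t =>
    {
        lock (listLock) { list.Add(t.Result); }
        countdown.Signal();
    });
});

countdown.Wait(); // blocks until every async operation has actually finished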
How will Parallel.ForEach generally work with async lambdas? If it hits an await, will it hand over its thread to the next iteration?
Since Parallel doesn't understand async lambdas, when the first await yields (returns) to its caller, Parallel will assume that iteration of the loop is complete.
I assume the ParallelLoopResult.IsCompleted field is not the proper one, as it will return true when all iterations have executed, regardless of whether their actual lambda jobs have finished?
Correct. As far as Parallel knows, it can only "see" the method to the first await that returns to its caller. So it doesn't know when the async lambda is complete. It also will assume iterations are complete too early, which throws partitioning off.
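A tiny demo of that behavior (my own sketch, not from the answer): the loop "finishes" before any of the async bodies do.
// Each async lambda returns at its first await, so Parallel.ForEach completes
// almost immediately, long before the delayed lines print.
Parallel.ForEach(Enumerable.Range(0, 5), async i =>
{
    await Task.Delay(1000);
    Console.WriteLine($"item {i} really finished");
});
Console.WriteLine("Parallel.ForEach returned");   // typically prints first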
You don't need Parallel.For/ForEach here; you just need to await a list of tasks.
Background
In short, you need to be very careful with async lambdas, and with whether you are passing them as an Action or as a Func<Task>.
Your problem arises because Parallel.For / ForEach is not suited to the async/await pattern or I/O-bound tasks; it is suited to CPU-bound workloads. That means it essentially takes Action parameters and lets the task scheduler create the tasks for you.
If you want to run multiple async tasks at the same time, use Task.WhenAll, or a TPL Dataflow block (or something similar), which can deal effectively with both CPU-bound and I/O-bound workloads; put more directly, they can deal with tasks, which is what an async method returns.
Unless you need to do more inside your lambda (which you haven't shown), just use a Select and WhenAll:
var tasks = items.Select(LongRunningIoOperationAsync);
var results = await Task.WhenAll(tasks); // here is your list of int
If you do, you can still use await:
var tasks = items.Select(async (item) =>
{
    var x = await LongRunningIoOperationAsync(item);
    // do other stuff
    return x;
});
var results = await Task.WhenAll(tasks);
Note: If you need the extended functionality of Parallel.ForEach (namely the options to control max concurrency), there are several approaches; however, Rx or Dataflow might be the most succinct.
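For instance, here is a hedged sketch (my illustration, not part of the original answer, assuming the items are ints and reusing the question's LongRunningIoOperationAsync) of the Dataflow approach: an ActionBlock with MaxDegreeOfParallelism gives you the concurrency throttle that Parallel.ForEach's options would, while still awaiting the async work properly.
// using System.Collections.Concurrent;
// using System.Threading.Tasks.Dataflow;   // NuGet: System.Threading.Tasks.Dataflow
var results = new ConcurrentBag<int>();

var block = new ActionBlock<int>(
    async item =>
    {
        var x = await LongRunningIoOperationAsync(item);
        results.Add(x);
    },
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

foreach (var item in arrayValues)
    block.Post(item);

block.Complete();
await block.Completion;   // all async work has genuinely finished here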
I'm currently trying to improve my understanding of Multithreading and the TPL in particular.
A lot of the constructs make complete sense and I can see how they improve scalability / execution speed.
I know that for asynchronous calls that don't tie up a thread (like I/O bound calls), Task.WhenAll would be the perfect fit.
One thing I am wondering about, though, is the best practice for making CPU-bound work that I want to run in parallel asynchronous.
To make code run in parallel the obvious choice would be the Parallel class.
As an example, say I have an array of data I want to perform some number crunching on:
string[] arr = { "SomeData", "SomeMoreData", "SomeOtherData" };
Parallel.ForEach(arr, (s) =>
{
    SomeReallyLongRunningMethod(s);
});
This would run in parallel (if the analyser decides that parallel is faster than synchronous), but it would also block the thread.
Now the first thing that came to my mind was simply wrapping it all in Task.Run() ala:
string[] arr = { "SomeData", "SomeMoreData", "SomeOtherData" };
await Task.Run(() => Parallel.ForEach(arr, (s) =>
{
    SomeReallyLongRunningMethod(s);
}));
Another option would be to either have a separate Task-returning method or inline it, and use Task.WhenAll like so:
static async Task SomeReallyLongRunningMethodAsync(string s)
{
    await Task.Run(() =>
    {
        //work...
    });
}

// ...

await Task.WhenAll(arr.Select(s => SomeReallyLongRunningMethodAsync(s)));
The way I understand it is that option 1 creates a whole Task that will, for the life of it, tie up a thread to just sit there and wait until the Parallel.ForEach finishes.
Option 2 uses Task.WhenAll (for which I don't know whether it ties up a thread or not) to await all Tasks, but the Tasks had to be created manually. Some of my resources (especially MS ExamRef 70-483) have explicitly advised against manually creating Tasks for CPU-bound work, as the Parallel class is supposed to be used for it.
Now I'm left wondering about the best performing version / best practice for the problem of wanting parallel execution that can be awaited.
I hope some more experienced programmer can shed some light on this for me!
You really should use Microsoft's Reactive Framework for this. It's the perfect solution. You can do this:
string[] arr = { "SomeData", "SomeMoreData", "SomeOtherData" };
var query =
    from s in arr.ToObservable()
    from r in Observable.Start(() => SomeReallyLongRunningMethod(s))
    select new { s, r };

IDisposable subscription =
    query
        .Subscribe(x =>
        {
            /* Do something with each `x.s` and `x.r` */
            /* Values arrive as soon as they are computed */
        }, () =>
        {
            /* All Done Now */
        });
This assumes that the signature of SomeReallyLongRunningMethod is int SomeReallyLongRunningMethod(string input), but it is easy to adapt to something else.
It all runs on multiple threads in parallel.
If you need to marshal back to the UI thread you can do that with an .ObserveOn just prior to the .Subscribe call.
If you want to stop the computation early you can call subscription.Dispose().
Option 1 is the way to go, as the thread-pool thread being used for the task will also get used in the parallel loop. A similar question is answered here.
I have a method with 3 parameters for which I would like to create a thread. I know how to create a thread for a method without any parameters, or with a single object-typed parameter. The method header is:
public void LoadData(DataGridView d, RadioButton rb1, RadioButton rb2)
{
    // ...
}
In addition to Tzah's answer: you do not mention the thread's lifetime and management. This is a good place to think about it, as long as you want to write high-quality code.
If you want to use a thread from the thread pool with 3 or more parameters, see my previous answer: C# - ThreadPool QueueUserWorkItem Use?
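As a rough sketch (assuming the names dataGridView1, radioButton1, and radioButton2 from a typical form; they are not from the question), queuing the three-argument call onto the thread pool can look like this:
// The lambda closes over the three arguments and matches the WaitCallback delegate.
// (Be careful: UI controls should only be touched from the UI thread.)
ThreadPool.QueueUserWorkItem(_ => LoadData(dataGridView1, radioButton1, radioButton2));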
If you are using .NET 4.0+, consider using Tasks.
You can use a lambda expression like this:
new Thread(() => LoadData(var1, var2, var3)).Start();
or
Thread T1 = new Thread(() => LoadData(var1, var2, var3));
T1.Start();
While Tzah's answer will definitely work, the recommended way of using threads in the .NET Framework now resides with the Task Parallel Library. The TPL provides an abstraction over the ThreadPool, which manages a pool of threads for us to reuse instead of creating and destroying them, which has a non-negligible cost. It may not be suitable for all sorts of offloaded work (like very long-running, CPU-consuming tasks), but it will definitely cover most cases.
An example equivalent to your request using the TPL would be to use Task.Run:
Task task = Task.Run(() => LoadData(var1, var2, var3));