I've been considering the new async stuff in C# 5, and one particular question came up.
I understand that the await keyword is a neat compiler trick/syntactic sugar to implement continuation passing, where the remainder of the method is broken up into Task objects and queued-up to be run in order, but where control is returned to the calling method.
My problem is that I've heard that currently this is all on a single thread. Does this mean that this async stuff is really just a way of turning continuation code into Task objects and then calling Application.DoEvents() after each task completes before starting the next one?
Or am I missing something? (This part of the question is rhetorical - I'm fully aware I'm missing something :) )
It is concurrent, in the sense that many outstanding asychronous operations may be in progress at any time. It may or may not be multithreaded.
By default, await will schedule the continuation back to the "current execution context". The "current execution context" is defined as SynchronizationContext.Current if it is non-null, or TaskScheduler.Current if there's no SynchronizationContext.
You can override this default behavior by calling ConfigureAwait and passing false for the continueOnCapturedContext parameter. In that case, the continuation will not be scheduled back to that execution context. This usually means it will be run on a threadpool thread.
Unless you're writing library code, the default behavior is exactly what's desired. WinForms, WPF, and Silverlight (i.e., all the UI frameworks) supply a SynchronizationContext, so the continuation executes on the UI thread (and can safely access UI objects). ASP.NET also supplies a SynchronizationContext that ensures the continuation executes in the correct request context.
Other threads (including threadpool threads, Thread, and BackgroundWorker) do not supply a SynchronizationContext. So Console apps and Win32 services by default do not have a SynchronizationContext at all. In this situation, continuations execute on threadpool threads. This is why Console app demos using await/async include a call to Console.ReadLine/ReadKey or do a blocking Wait on a Task.
If you find yourself needing a SynchronizationContext, you can use AsyncContext from my Nito.AsyncEx library; it basically just provides an async-compatible "main loop" with a SynchronizationContext. I find it useful for Console apps and unit tests (VS2012 now has built-in support for async Task unit tests).
For more information about SynchronizationContext, see my Feb MSDN article.
At no time is DoEvents or an equivalent called; rather, control flow returns all the way out, and the continuation (the rest of the function) is scheduled to be run later. This is a much cleaner solution because it doesn't cause reentrancy issues like you would have if DoEvents was used.
The whole idea behind async/await is that it performs continuation passing nicely, and doesn't allocate a new thread for the operation. The continuation may occur on a new thread, it may continue on the same thread.
The real "meat" (the asynchronous) part of async/await is normally done separately and the communication to the caller is done through TaskCompletionSource. As written here http://blogs.msdn.com/b/pfxteam/archive/2009/06/02/9685804.aspx
The TaskCompletionSource type serves two related purposes, both alluded to by its name: it is a source for creating a task, and the source for that task’s completion. In essence, a TaskCompletionSource acts as the producer for a Task and its completion.
and the example is quite clear:
public static Task<T> RunAsync<T>(Func<T> function)
{
if (function == null) throw new ArgumentNullException(“function”);
var tcs = new TaskCompletionSource<T>();
ThreadPool.QueueUserWorkItem(_ =>
{
try
{
T result = function();
tcs.SetResult(result);
}
catch(Exception exc) { tcs.SetException(exc); }
});
return tcs.Task;
}
Through the TaskCompletionSource you have access to a Task object that you can await, but it isn't through the async/await keywords that you created the multithreading.
Note that when many "slow" functions will be converted to the async/await syntax, you won't need to use TaskCompletionSource very much. They'll use it internally (but in the end somewhere there must be a TaskCompletionSource to have an asynchronous result)
The way I like to explain it is that the "await" keyword simply waits for a task to finish but yields execution to the calling thread while it waits. It then returns the result of the Task and continues from the statement after the "await" keyword once the Task is complete.
Some people I have noticed seem to think that the Task is run in the same thread as the calling thread, this is incorrect and can be proved by trying to alter a Windows.Forms GUI element within the method that await calls. However, the continuation is run in the calling thread where ever possible.
Its just a neat way of not having to have callback delegates or event handlers for when the Task completes.
I feel like this question needs a simpler answer for people. So I'm going to oversimplify.
The fact is, if you save the Tasks and don't await them, then async/await is "concurrent".
var a = await LongTask1(x);
var b = await LongTask2(y);
var c = ShortTask(a, b);
is not concurrent. LongTask1 will complete before LongTask2 starts.
var a = LongTask1(x);
var b = LongTask2(y);
var c = ShortTask(await a, await b);
is concurrent.
While I also urge people to get a deeper understanding and read up on this, you can use async/await for concurrency, and it's pretty simple.
Related
We are using this code snippet from StackOverflow to produce a Task that completes as soon as the first of a collection of tasks completes successfully. Due to the non-linear nature of its execution, async/await is not really viable, and so this code uses ContinueWith() instead. It doesn't specify a TaskScheduler, though, which a number of sources have mentioned can be dangerous because it uses TaskScheduler.Current when most developers usually expect TaskScheduler.Default behavior from continuations.
The prevailing wisdom appears to be that you should always pass an explicit TaskScheduler into ContinueWith. However, I haven't seen a clear explanation of when different TaskSchedulers would be most appropriate.
What is a specific example of a case where it would be best to pass TaskScheduler.Current into ContinueWith(), as opposed to TaskScheduler.Default? Are there rules of thumb to follow when making this decision?
For context, here's the code snippet I'm referring to:
public static Task<T> FirstSuccessfulTask<T>(IEnumerable<Task<T>> tasks)
{
var taskList = tasks.ToList();
var tcs = new TaskCompletionSource<T>();
int remainingTasks = taskList.Count;
foreach(var task in taskList)
{
task.ContinueWith(t =>
if(task.Status == TaskStatus.RanToCompletion)
tcs.TrySetResult(t.Result));
else
if(Interlocked.Decrement(ref remainingTasks) == 0)
tcs.SetException(new AggregateException(
tasks.SelectMany(t => t.Exception.InnerExceptions));
}
return tcs.Task;
}
Probably you need to choose a task scheduler that is appropriate for actions that an executing delegate instance performs.
Consider following examples:
Task ContinueWithUnknownAction(Task task, Action<Task> actionOfTheUnknownNature)
{
// We know nothing about what the action do, so we decide to respect environment
// in which current function is called
return task.ContinueWith(actionOfTheUnknownNature, TaskScheduler.Current);
}
int count;
Task ContinueWithKnownAction(Task task)
{
// We fully control a continuation action and we know that it can be safely
// executed by thread pool thread.
return task.ContinueWith(t => Interlocked.Increment(ref count), TaskScheduler.Default);
}
Func<int> cpuHeavyCalculation = () => 0;
Action<Task> printCalculationResultToUI = task => { };
void OnUserAction()
{
// Assert that SynchronizationContext.Current is not null.
// We know that continuation will modify an UI, and it can be safely executed
// only on an UI thread.
Task.Run(cpuHeavyCalculation)
.ContinueWith(printCalculationResultToUI, TaskScheduler.FromCurrentSynchronizationContext());
}
Your FirstSuccessfulTask() probably is the example where you can use TaskScheduler.Default, because the continuation delegate instance can be safely executed on a thread pool.
You can also use custom task scheduler to implement custom scheduling logic in your library. For example see Scheduler page on Orleans framework website.
For more information check:
It's All About the SynchronizationContext article by Stephen Cleary
TaskScheduler, threads and deadlocks article by Cosmin Lazar
StartNew is Dangerous article by Stephen Cleary
I'll have to rant a bit, this is getting way too many programmers into trouble. Every programming aid that was designed to make threading look easy creates five new problems that programmers have no chance to debug.
BackgroundWorker was the first one, a modest and sensible attempt to hide the complications. But nobody realizes that the worker runs on the threadpool so should never occupy itself with I/O. Everybody gets that wrong, not many ever notice. And forgetting to check e.Error in the RunWorkerCompleted event, hiding exceptions in threaded code is a universal problem with the wrappers.
The async/await pattern is the latest, it makes it really look easy. But it composes extraordinarily poorly, async turtles all the way down until you get to Main(). They had to fix that eventually in C# version 7.2 because everybody got stuck on it. But not fixing the drastic ConfigureAwait() problem in a library. It is completely biased towards library authors knowing what they are doing, notable is that a lot of them work for Microsoft and tinker with WinRT.
The Task class bridged the gap between the two, its design goal was to make it very composable. Good plan, they could not predict how programmers were going to use it. But also a liability, inspiring programmers to ContinueWith() up a storm to glue tasks together. Even when it doesn't make sense to do so because those tasks merely run sequentially. Notable is that they even added an optimization to ensure that the continuation runs on the same thread to avoid the context switch overhead. Good plan, but creating the undebuggable problem that this web site is named for.
So yes, the advice you saw was a good one. Task is useful to deal with asynchronicity. A common problem that you have to deal with when services move into the "cloud" and latency gets to be a detail you can no longer ignore. If you ContinueWith() that kind code then you invariably care about the specific thread that executes the continuation. Provided by TaskScheduler, low odds that it isn't the one provided by FromCurrentSynchronizationContext(). Which is how async/await happened.
If current task is a child task, then using TaskScheduler.Current will mean the scheduler will be that which the task it is in, is scheduled to; and if not inside another task, TaskScheduler.Current will be TaskScheduler.Default and thus use the ThreadPool.
If you use TaskScheduler.Default, then it will always go to the ThreadPool.
The only reason you would use TaskScheduler.Current:
To avoid the default scheduler issue, you should always pass an
explicit TaskScheduler to Task.ContinueWith and Task.Factory.StartNew.
From Stephen Cleary's post ContinueWith is Dangerous, Too.
There's further explanation here from Stephen Toub on his MSDN blog.
I most certainly don't think I am capable of providing bullet proof answer but I will give my five cents.
What is a specific example of a case where it would be best to pass TaskScheduler.Current into ContinueWith(), as opposed to TaskScheduler.Default?
Imagine you are working on some web api that webserver naturally makes multithreaded. So you need to compromise your parallelism because you don't want to use all the resources of your webserver, but at the same time you want to speed up your processing time, so you decide to make custom task scheduler with lowered concurrency level because why not.
Now your api needs to query some database and order the results, but these results are millions so you decide to do it via Merge Sort(divide and conquer), then you need all your child tasks of this algorithm to be complient with your custom task scheduler (TaskScheduler.Current) because otherwise you will end up taking all the resources for the algorithm and your webserver thread pool will starve.
When to use TaskScheduler.Current, TaskScheduler.Default, TaskScheduler.FromCurrentSynchronizationContext(), or some other TaskScheduler
TaskScheduler.FromCurrentSynchronizationContext() - Specific for WPF,
Forms applications UI thread context, you use this basically when you
want to get back to the UI thread after being offloaded some work to
non-UI thread
example taken from here
private void button_Click(…)
{
… // #1 on the UI thread
Task.Factory.StartNew(() =>
{
… // #2 long-running work, so offloaded to non-UI thread
}).ContinueWith(t =>
{
… // #3 back on the UI thread
}, TaskScheduler.FromCurrentSynchronizationContext());
}
TaskScheduler.Default - Almost all the time when you don't have any specific requirements, edge cases to collate with.
TaskScheduler.Current - I think I've given one generic example above, but in general it should be used when you have either custom scheduler or you explicitly passed TaskScheduler.FromCurrentSynchronizationContext() to TaskFactory or Task.StartNew method and later you use continuation tasks or inner tasks (so pretty damn rare imo).
I'm currently getting into the async/await keywords, and went through the following questions: When correctly use Task.Run and when just async-await and Async/Await vs Threads
However even the second link doesn't answer my question, which is when to simply use
Task.Run(...)
versus
await Task.Run(...)
Is it situational or is there something to be gained by using await (and thus returning to the caller)?
The code Task.Run(...) (in both examples) sends a delegate to the thread pool and returns a task that will contain the results of that delegate. These "results" can be an actual return value, or it can be an exception. You can use the returned task to detect when the results are available (i.e., when the delegate has completed).
So, you should make use of the Task if any of the following are true:
You need to detect when the operation completed and respond in some way.
You need to know if the operation completed successfully.
You need to handle exceptions from the operation.
You need to do something with the return value of the operation.
If your calling code needs to do any of those, then use await:
await Task.Run(...);
With long-running background tasks, sometimes you want to do those (e.g., detect exceptions), but you don't want to do it right away; in this case, just save the Task and then await it at some other point in your application:
this.myBackgroundTask = Task.Run(...);
If you don't need any of the above, then you can do a "fire and forget", as such:
var _ = Task.Run(...); // or just "Task.Run(...);"
Note that true "fire and forget" applications are extremely rare. It's literally saying "run this code, but I don't care whether it completes, when it completes, or whether it was successful".
You can use Task.Run() when handling logic with fire and forget type, similar to invoking events if someone is subscribed to them. You can use this for logging, notifying, etc.
If you depend of the result or actions executed in your method, you need to use await Task.Run() as it pauses current execution until your task is finished.
I just encountered this code. I immediately started cringing and talking to myself (not nice things). The thing is I don't really understand why and can't reasonably articulate it. It just looks really bad to me - maybe I'm wrong.
public async Task<IHttpActionResult> ProcessAsync()
{
var userName = Username.LogonName(User.Identity.Name);
var user = await _user.GetUserAsync(userName);
ThreadPool.QueueUserWorkItem((arg) =>
{
Task.Run(() => _billing.ProcessAsync(user)).Wait();
});
return Ok();
}
This code looks to me like it's needlessly creating threads with ThreadPool.QueueUserWorkItem and Task.Run. Plus, it looks like it has the potential to deadlock or create serious resource issues when under heavy load. Am I correct?
The _billing.ProcessAsync() method is awaitable(async), so I would expect that a simple "await" keyword would be the right thing to do and not all this other baggage.
I believe Scott is correct with his guess that ThreadPool.QueueUserWorkItem should have been HostingEnvironment.QueueBackgroundWorkItem. The call to Task.Run and Wait, however, are entirely nonsensical - they're pushing work to the thread pool and blocking a thread pool thread on it, when the code is already on the thread pool.
The _billing.ProcessAsync() method is awaitable(async), so I would expect that a simple "await" keyword would be the right thing to do and not all this other baggage.
I strongly agree.
However, this will change the behavior of the action. It will now wait until Billing.ProcessAsync is completed, whereas before it would return early. Note that returning early on ASP.NET is almost always a mistake - I would say any "billing" processing would be even more certainly a mistake. So, replacing this mess with await will make the app more correct, but it will cause the ProcessAsync action to take longer to return to the client.
It's strange, but depending on what the author is trying to achieve, it seems ok to me to queue a work item in the thread pool from inside an async method.
This is not as starting a thread, it's just queueing an action to be done in a ThreadPool's thread when there is a free one. So the async method (ProcessAsync) can continue and don't need to care about the result.
The weird thing is the code inside the lambda to be enqueued in the ThreadPool. Not only the Task.Run() (which is superflous and just causes unnecessary overhead), but to call an async method without waiting for it to finish is bad inside a method that should be run by the ThreadPool, because it returns the control flow to the caller when awaiting something.
So the ThreadPool eventually thinks this method is finished (and the thread free for the next action in the queue), while actually the method wants to be resumed later.
This may lead to very undefined behaviour. This code may have been working (in certain circumstances), but I would not rely on it and use it as productive code.
(The same goes for calling a not-awaited async method inside Task.Run(), as the Task "thinks" it's finished while the method actually wants to be resumed later).
As solution I'd propose to simply await that async method, too:
await _billing.ProcessAsync(user);
But of course without any knowledge about the context of the code snippet I can't guarantee anything. Note that this would change the behaviour: while until now the code did not wait for _billing.ProcessAsync() to finsih, it would now do. So maybe leaving out await and just fire and forget
_billing.ProcessAsync(user);
maybe good enough, too.
What would be an appropriate way to re-write my SlowMethodAsync async method, which executes a long running task, that can be awaited, but without using Task.Run?
I can do it with Task.Run as following:
public async Task SlowMethodAsync()
{
await Task.Run(() => SlowMethod());
}
public void SlowMethod()
{
//heavy math calculation process takes place here
}
The code, as it shown above, will spawn a new thread from a thread-pool. If it can be done differently, will it run on the invocation thread, and block it any way, as the SlowMethod content is a solid chunk of math processing without any sort of yielding and time-slice surrendering.
Just to clarify that I need my method to stay asynchronous, as it unblocks my UI thread. Just looking for another possible way to do that in a different way to how it's currently done, while keeping async method signature.
async methods are meant for asynchronous operations. They enable not blocking threads for non-CPU-bound work. If your SlowMethod has any IO-bound operations which are currently executed synchronously (e.g. Stream.Read or Socket.Send) you can exchange these for async ones and await them in your async method.
In your case (math processing) the code's probably mostly CPU-bound, so other than offloading the work to a ThreadPool thread (using Task.Run) there's no reason to use async as all. Keep SlowMethod as it is and it will run on the calling thread.
Regarding your update: You definitely want to use Task.Run (or one of the Task.Factory.StartNew overloads) to offload your work to a different thread.
Actually, for your specific case, you should use
await Task.Factory.StartNew(() => SlowMethod(), TaskCreationOptions.LongRunning)
which will allow the scheduler to run your synchronous task in the appropriate place (probably in a new thread) so it doesn't gum up the ThreadPool (which isn't really designed for CPU heavy workloads).
Wouldn't it be better just to call SlowMethod synchronously? What are you gaining by awaiting it?
The async-await features make it elegant to write non-blocking code. But, while non blocking, the work performed within an async function can still be non-trivial.
When writing async code, I find it natural to write code that follows the pattern 'all the way down the rabbit hole', so to speak, where all methods within the calling tree are marked async and the APIs used are async; but even while non blocking, the executed code can take up a fair amount of the contextual thread's time.
How and when do you decide to run an async-able method concurrently on top of asynchronously? Should one err on having the new Task created higher or lower in the call tree? Are there any best practices for this type of 'optimization'?
I've been using async in production for a couple of years. There are a few core "best practices" that I recommend:
Don't block on async code. Use async "all the way down". (Corollary: prefer async Task to async void unless you have to use async void).
Use ConfigureAwait(false) wherever possible in your "library" methods.
You've already figured out the "async all the way down" part, and you're at the point that ConfigureAwait(false) becomes useful.
Say you have an async method A that calls another async method B. A updates the UI with the results of B, but B doesn't depend on the UI. So we have:
async Task A()
{
var result = await B();
myUIElement.Text = result;
}
async Task<string> B()
{
var rawString = await SomeOtherStuff();
var result = DoProcessingOnRawString(rawString);
return result;
}
In this example, I would call B a "library" method since it doesn't really need to run in the UI context. Right now, B does run in the UI thread, so DoProcessingOnRawString is causing responsiveness issues.
So, add a ConfigureAwait(false) to every await in B:
async Task<string> B()
{
var rawString = await SomeOtherStuff().ConfigureAwait(false);
var result = DoProcessingOnRawString(rawString);
return result;
}
Now, when B resumes after awaiting SomeOtherStuff (assuming it did actually have to await), it will resume on a thread pool thread instead of the UI context. When B completes, even though it's running on the thread pool, A will resume on the UI context.
You can't add ConfigureAwait(false) to A because A depends on the UI context.
You also have the option of explicitly queueing tasks to the thread pool (await Task.Run(..)), and you should do this if you have particular CPU-intensive functionality. But if your performance is suffering from "thousands of paper cuts", you can use ConfigureAwait(false) to offload a lot of the async "housekeeping" onto the thread pool.
You may find my intro post helpful (it goes into more of the "why's"), and the async FAQ also has lots of great references.
Async-await does not actually use threads in the current .NET process-space. it is designed for "blocking" IO and network operations, like database calls, web requests, some file IO.
I cannot perceive what advantage there would be in C# to what you call the rabbit-hole technique. Doing so only obscures the code and unnecessarily couples your potentially high-cpu code to your IO code.
To answer your question directly, I would only use async-await for the aforementioned IO/network scenarios, right at the point where you are doing the blocking operations, and for anything that was CPU bound I would use threading techniques to make the best use of the available CPU cores. No need to mix the two concerns.