Considerations for not awaiting a Task in an asynchronous method - c#

I'm working on a Web API project which uses Azure's managed cache service to cache database results in memory, improving response times and reducing duplicate traffic to the database. When attempting to put a new item in the cache, occasionally a cache-specific exception is thrown with a code of DataCacheErrorCode.RetryLater. To retry later without blocking, I made the method async and now await Task.Delay before trying again a short time later. Previously a developer had hardcoded a Thread.Sleep in there, which was really hurting application performance.
The method signature looks something similar to this now:
public static async Task Put(string cacheKey, object obj)
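Roughly, the body now looks like the following sketch; the DataCache calls and the 500 ms delay are illustrative assumptions based on the description above, not the exact production code:
public static async Task Put(string cacheKey, object obj)
{
    try
    {
        _cache.Put(cacheKey, obj); // _cache is a hypothetical DataCache instance
    }
    catch (DataCacheException ex)
    {
        if (ex.ErrorCode != DataCacheErrorCode.RetryLater)
            throw;

        // Instead of the old Thread.Sleep, yield the thread and retry shortly.
        await Task.Delay(TimeSpan.FromMilliseconds(500));
        _cache.Put(cacheKey, obj);
    }
}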
Following this change I get ~75 compiler warnings from all the other places in the application that called the formerly synchronous version of Put indicating:
Because this call is not awaited, execution of the current method continues before the call is completed. Consider applying the 'await' operator to the result of the call.
In this case, since Put doesn't return anything, it makes sense to me to let this operation fire-and-forget as I don't see any reason to block execution of the method that called it. I'm just wondering if there are any dangers or pitfalls for allowing a lot of these fire-and-forget Tasks running in the background as Put can be called quite often. Or should I await anyway since 99% of the time I won't get the retry error and the Task will finish almost immediately. I just want to make sure that I'm not incurring any penalties for having too many threads (or something like that).

If there is a chance Put will throw any other exception for any reason, and you don't await Put each time you insert an object into the cache, the exceptions will be swallowed inside the returned Task, which isn't being awaited. If you're on .NET 4.0, such an unobserved exception will be re-thrown on the finalizer thread when the Task is finalized. If you're on .NET 4.5, it will simply be ignored (and that might not be desirable).
Want to make sure that I'm not incurring any penalties for having too many threads or something like that.
I'm just saying this to make things clear: when you use Task.Delay, you aren't spinning up any new threads. A Task isn't always equal to a new thread being spun up. Specifically here, Task.Delay internally uses a Timer, so there is no thread overhead; if you do await it, the awaiting method simply yields its thread back until the delay completes.
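If you do go with fire-and-forget, one common mitigation (a sketch, not code from the question) is to attach a continuation that at least observes and logs faulted tasks so the exceptions aren't silently lost:
public static class TaskExtensions
{
    // Observe and log any exception from a fire-and-forget task.
    public static void FireAndForget(this Task task)
    {
        task.ContinueWith(
            t => Console.Error.WriteLine(t.Exception), // swap in your real logger
            TaskContinuationOptions.OnlyOnFaulted);
    }
}
// Usage: Put(cacheKey, obj).FireAndForget();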

The warning is telling you that you're getting fire and forget behavior in a location where you may not actually want to fire and forget. If you really do want to fire and forget and don't have a problem continuing on without ever knowing when the operation finishes, or if it even finished successfully, then you can safely ignore the warning.

One negative result of releasing a task to run unawaited is the compiler warnings themselves - 75 compiler warnings are a problem in and of themselves and they hide real warnings.
If you want to signal the compiler that you're intentionally not doing anything with the result of the task, you can use a simple extension method that does nothing, but satisfies the compiler's desire for explicitness.
// Do nothing.
public static void Release(this Task task)
{
}
Now you can call
UpdateCacheAsync(data).Release();
without any compiler warnings.
https://gist.github.com/lisardggY/396aaca7b70da1bbc4d1640e262e990a

The recommended ASP.NET way is:
HostingEnvironment.QueueBackgroundWorkItem(WorkItem);
...
async Task WorkItem(CancellationToken cancellationToken)
{
try { await ...} catch (Exception e) { ... }
}
BTW, an exception that isn't caught (or is re-thrown) on a thread other than the ASP.NET request thread can crash or restart the server process.
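For example, applied to the Put method from the question above, a sketch might look like this (the Trace logging is just a placeholder):
HostingEnvironment.QueueBackgroundWorkItem(async cancellationToken =>
{
    try
    {
        await Put(cacheKey, obj);
    }
    catch (Exception ex)
    {
        Trace.TraceError(ex.ToString()); // never let the exception escape the work item
    }
});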

Related

Usage of ConfigureAwait in .NET

I've read about ConfigureAwait in various places (including SO questions), and here are my conclusions:
ConfigureAwait(true): Runs the rest of the code on the same thread the code before the await was run on.
ConfigureAwait(false): Runs the rest of the code on the same thread the awaited code was run on.
If the await is followed by code that accesses the UI, the task should be appended with .ConfigureAwait(true). Otherwise, an InvalidOperationException will occur due to another thread accessing UI elements.
My questions are:
Are my conclusions correct?
When does ConfigureAwait(false) improve performance, and when doesn't it?
If writing a GUI application where the next lines don't access UI elements, should I use ConfigureAwait(false) or ConfigureAwait(true)?
To answer your questions more directly:
ConfigureAwait(true): Runs the rest of the code on the same thread the code before the await was run on.
Not necessarily the same thread, but the same synchronization context. The synchronization context can decide how to run the code. In a UI application, it will be the same thread. In ASP.NET, it may not be the same thread, but you will have the HttpContext available, just like you did before.
ConfigureAwait(false): Runs the rest of the code on the same thread the awaited code was run on.
This is not correct. ConfigureAwait(false) tells it that it does not need the context, so the code can be run anywhere. It could be any thread that runs it.
If the await is followed by code that accesses the UI, the task should be appended with .ConfigureAwait(true). Otherwise, an InvalidOperationException will occur due to another thread accessing UI elements.
It is not correct that it "should be appended with .ConfigureAwait(true)". ConfigureAwait(true) is the default. So if that's what you want, you don't need to specify it.
When does ConfigureAwait(false) improve performance, and when doesn't it?
Returning to the synchronization context might take time, because it may have to wait for something else to finish running. In reality, this rarely happens, or that waiting time is so minuscule that you'd never notice it.
If writing a GUI application where the next lines don't access UI elements, should I use ConfigureAwait(false) or ConfigureAwait(true)?
You could use ConfigureAwait(false), but I suggest you don't, for a few reasons:
I doubt you would notice any performance improvement.
It can introduce parallelism that you may not expect. If you use ConfigureAwait(false), the continuation can run on any thread, so you could have problems if you're accessing non-thread-safe objects. It is not common to have these problems, but it can happen.
You (or someone else maintaining this code) may add code that interacts with the UI later and exceptions will be thrown. Hopefully the ConfigureAwait(false) is easy to spot (it could be in a different method than where the exception is thrown) and you/they know what it does.
I find it's easier to not use ConfigureAwait(false) at all (except in libraries). In the words of Stephen Toub (a Microsoft employee) in the ConfigureAwait FAQ:
When writing applications, you generally want the default behavior (which is why it is the default behavior).
Edit: I've written an article of my own on this topic: .NET: Don’t use ConfigureAwait(false)
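To make the "except in libraries" point concrete, here is a sketch (the method names, _httpClient and resultTextBox are illustrative, not from the answer above):
// Library code: no dependency on the caller's context, so skip resuming on it.
public static async Task<string> DownloadTextAsync(HttpClient client, string url)
{
    var response = await client.GetAsync(url).ConfigureAwait(false);
    response.EnsureSuccessStatusCode();
    return await response.Content.ReadAsStringAsync().ConfigureAwait(false);
}

// Application (UI) code: keep the default so the continuation resumes on the UI thread.
private async void LoadButton_Click(object sender, EventArgs e)
{
    var text = await DownloadTextAsync(_httpClient, "https://example.com");
    resultTextBox.Text = text; // safe: we're back on the UI thread here
}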
ConfigureAwait(false) may improve performance if there are not many worker threads available and the thread it would need to return to is constantly busy.
ConfigureAwait(false) is recommended everywhere that coming back to the same SynchronizationContext (which is usually linked to a thread) is not needed, especially in libraries that await something internally: https://medium.com/bynder-tech/c-why-you-should-use-configureawait-false-in-your-library-code-d7837dce3d7f.
ConfigureAwait(true) (which is the default) is needed when you require the same context, but it may also lead to a deadlock in certain situations.
Consider this code:
void Main()
{
    // creating a windows form attaches a synchronization context to the current thread
    new System.Windows.Forms.Form();
    var task = DoSth();
    Console.WriteLine(task.Result);
}

async Task<int> DoSth()
{
    await Task.Delay(1000);
    return 1;
}
In this example, because the task returned by DoSth is never awaited, the main UI thread blocks on task.Result; at the same time DoSth is blocked, because after the delay it wants to come back to the UI thread. This is a deadlock, and the code will never run to completion. Adding .ConfigureAwait(false) solves the problem in this case.
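For completeness, the fix being described is a one-line change to DoSth:
async Task<int> DoSth()
{
    // The continuation no longer needs the (blocked) UI thread,
    // so the deadlock goes away.
    await Task.Delay(1000).ConfigureAwait(false);
    return 1;
}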
Using ConfigureAwait(false) in application code is normally not going to boost your application's performance in any meaningful way, because in application code you don't normally await inside loops. For example, let's consider the case where your app has a button, an async operation is started every time the user clicks the button, and the async operation includes a single await. By typing the 22 characters of .ConfigureAwait(false) after this await you have already lost time comparable to what you can hope to save from 10 users clicking this button once every minute, 8 hours per day, for 20 years each (~35,000,000 context switches in total = a few seconds of CPU processing time).
And that is before taking into account the time you need to think about whether you can safely include this configuration (depending on whether the continuation contains UI-related code), the time you'll need to reconfirm your previous assessment every time you maintain or modify the code, and the time you'll lose on debugging if your assessment was wrong.
On the other hand if your Button_Click handler contains code like this:
private async void Button_Click(object sender, EventArgs e)
{
    var client = new WebClient();
    using var stream = await client.OpenReadTaskAsync("someUrl");
    var buffer = new byte[1024];
    while ((await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
    {
        //...
    }
}
...then by all means do spend the extra time to ConfigureAwait(false) the ReadAsync task. Also consider refactoring the code by moving the stream-reading part into a separate asynchronous method, so that you can safely access UI elements anywhere inside the Button_Click handler without being distracted by technicalities that don't belong in this layer of the app.
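A sketch of that refactoring (ReadAllBytesAsync and statusLabel are illustrative names, not taken from the original snippet):
private async void Button_Click(object sender, EventArgs e)
{
    var byteCount = await ReadAllBytesAsync("someUrl");
    statusLabel.Text = $"Read {byteCount} bytes"; // UI access is safe here
}

// The IO loop lives in its own method, so ConfigureAwait(false) can be applied
// freely without worrying about UI access.
private static async Task<long> ReadAllBytesAsync(string url)
{
    var client = new WebClient();
    using var stream = await client.OpenReadTaskAsync(url).ConfigureAwait(false);
    var buffer = new byte[1024];
    long total = 0;
    int bytesRead;
    while ((bytesRead = await stream.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false)) > 0)
    {
        total += bytesRead;
    }
    return total;
}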

Difference between await and using discards

What is the difference between the below two lines.
await SkillReporterDatabase.Database.SaveAsync(someObject);
_ = SkillReporterDatabase.Database.SaveAsync(someObject);
Which one is preferred? Sometimes when I face some issue with await, I use _ = and it solves the problem. I couldn't see any resources online which explains the difference between these two.
The difference is that the discard (_) doesn't care about what happens in SaveAsync once it becomes genuinely asynchronous (which it presumably does); this has two important consequences:
you won't know if the save failed
if you perform any other operations via Database, you're probably going to be running overlapped operations on a single context/connection, which is not usually a supported scenario
So in this case, await is probably preferred. There are times when it is OK to discard a task, but that usually means when you start something in the background that has no further interaction with the current flow.
Without the await, later operations are not held up by the SaveAsync call and will therefore run concurrently with it. The discard just throws away the Task (a task is conceptually a bit like a progress bar); it never observes the outcome the way the awaited call does.
SaveAsync returns a Task (like a promise in JavaScript). Awaiting that Task asynchronously waits until the task completes and surfaces its result (or its exception). Not awaiting and instead throwing away the Task is like throwing away a pointer in C++: your program starts the task and then forgets about it. It may still complete, but the code in this method will never find out whether it did.
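To make the trade-off concrete, here is a sketch (the ContinueWith logging is a placeholder, not part of the original question):
// Preferred in most cases: failures surface here, and nothing else touches
// the database context while the save is still in flight.
await SkillReporterDatabase.Database.SaveAsync(someObject);

// Deliberate fire-and-forget: only acceptable when nothing downstream depends
// on the save, and even then the failure should be observed somewhere.
_ = SkillReporterDatabase.Database.SaveAsync(someObject)
        .ContinueWith(
            t => Console.Error.WriteLine(t.Exception), // swap in your real logger
            TaskContinuationOptions.OnlyOnFaulted);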

Use await for background threads or not?

I'm currently getting into the async/await keywords, and went through the following questions: When correctly use Task.Run and when just async-await and Async/Await vs Threads
However even the second link doesn't answer my question, which is when to simply use
Task.Run(...)
versus
await Task.Run(...)
Is it situational or is there something to be gained by using await (and thus returning to the caller)?
The code Task.Run(...) (in both examples) sends a delegate to the thread pool and returns a task that will contain the results of that delegate. These "results" can be an actual return value, or it can be an exception. You can use the returned task to detect when the results are available (i.e., when the delegate has completed).
So, you should make use of the Task if any of the following are true:
You need to detect when the operation completed and respond in some way.
You need to know if the operation completed successfully.
You need to handle exceptions from the operation.
You need to do something with the return value of the operation.
If your calling code needs to do any of those, then use await:
await Task.Run(...);
With long-running background tasks, sometimes you want to do those (e.g., detect exceptions), but you don't want to do it right away; in this case, just save the Task and then await it at some other point in your application:
this.myBackgroundTask = Task.Run(...);
If you don't need any of the above, then you can do a "fire and forget", as such:
var _ = Task.Run(...); // or just "Task.Run(...);"
Note that true "fire and forget" applications are extremely rare. It's literally saying "run this code, but I don't care whether it completes, when it completes, or whether it was successful".
You can use Task.Run() for fire-and-forget style logic, similar to invoking events when someone is subscribed to them. You can use this for logging, notifications, etc.
If you depend on the result, or on actions executed in the method, you need to use await Task.Run(), as it suspends the current method until your task is finished.
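A sketch of the "save the Task and await it later" pattern mentioned above (the Worker class and method names are illustrative):
public class Worker
{
    private Task _backgroundTask;

    public void Start()
    {
        // Kick the work off without blocking the caller.
        _backgroundTask = Task.Run(() => DoHeavyWorkAsync());
    }

    public async Task StopAsync()
    {
        // Observe the result (and any exception) before shutting down.
        if (_backgroundTask != null)
            await _backgroundTask;
    }

    private async Task DoHeavyWorkAsync()
    {
        await Task.Delay(1000); // stand-in for the real long-running work
    }
}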

Using ThreadPool and Task.Wait inside an Async/Await Method

I just encountered this code. I immediately started cringing and talking to myself (not nice things). The thing is I don't really understand why and can't reasonably articulate it. It just looks really bad to me - maybe I'm wrong.
public async Task<IHttpActionResult> ProcessAsync()
{
    var userName = Username.LogonName(User.Identity.Name);
    var user = await _user.GetUserAsync(userName);

    ThreadPool.QueueUserWorkItem((arg) =>
    {
        Task.Run(() => _billing.ProcessAsync(user)).Wait();
    });

    return Ok();
}
This code looks to me like it's needlessly creating threads with ThreadPool.QueueUserWorkItem and Task.Run. Plus, it looks like it has the potential to deadlock or create serious resource issues when under heavy load. Am I correct?
The _billing.ProcessAsync() method is awaitable (async), so I would expect that a simple await keyword would be the right thing to do, and not all this other baggage.
I believe Scott is correct with his guess that ThreadPool.QueueUserWorkItem should have been HostingEnvironment.QueueBackgroundWorkItem. The call to Task.Run and Wait, however, are entirely nonsensical - they're pushing work to the thread pool and blocking a thread pool thread on it, when the code is already on the thread pool.
The _billing.ProcessAsync() method is awaitable (async), so I would expect that a simple await keyword would be the right thing to do, and not all this other baggage.
I strongly agree.
However, this will change the behavior of the action. It will now wait until Billing.ProcessAsync is completed, whereas before it would return early. Note that returning early on ASP.NET is almost always a mistake - returning early from "billing" processing is even more certainly one. So, replacing this mess with await will make the app more correct, but it will cause the ProcessAsync action to take longer to return to the client.
It's strange, but depending on what the author is trying to achieve, it seems OK to me to queue a work item in the thread pool from inside an async method.
This is not the same as starting a thread; it just queues an action to run on a ThreadPool thread when one is free. So the async method (ProcessAsync) can continue and doesn't need to care about the result.
The weird thing is the code inside the lambda that gets enqueued on the ThreadPool. Not only is the Task.Run() superfluous (it just causes unnecessary overhead), but calling an async method without waiting for it to finish is bad inside a method that the ThreadPool runs, because the async method returns control flow to its caller as soon as it awaits something.
So the ThreadPool eventually thinks this method is finished (and the thread free for the next action in the queue), while actually the method wants to be resumed later.
This can lead to very unpredictable behaviour. The code may have been working (in certain circumstances), but I would not rely on it or use it in production code.
(The same goes for calling a not-awaited async method inside Task.Run(), as the Task "thinks" it's finished while the method actually wants to be resumed later).
As a solution, I'd propose simply awaiting that async method, too:
await _billing.ProcessAsync(user);
But of course, without any knowledge of the context of the code snippet I can't guarantee anything. Note that this would change the behaviour: until now the code did not wait for _billing.ProcessAsync() to finish, and now it would. So maybe leaving out the await and just firing and forgetting
_billing.ProcessAsync(user);
may be good enough, too.
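Putting that together, the whole action reduces to something like this sketch (noting, as above, that the response now waits for billing to complete):
public async Task<IHttpActionResult> ProcessAsync()
{
    var userName = Username.LogonName(User.Identity.Name);
    var user = await _user.GetUserAsync(userName);

    // No ThreadPool.QueueUserWorkItem, no Task.Run, no Wait():
    // the request simply completes when billing has finished.
    await _billing.ProcessAsync(user);

    return Ok();
}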

Can many instances of an async task share a reference to a concurrent collection and add items concurrently to it in C#?

I'm just beginning to learn C# threading and concurrent collections, and am not sure of the proper terminology to pose my question, so I'll describe briefly what I'm trying to do. My grasp of the subject is rudimentary at best at this point. Is my approach below even feasible as I've envisioned it?
I have 100,000 urls in a Concurrent collection that must be tested--is the link still good? I have another concurrent collection, initially empty, that will contain the subset of urls that an async request determines to have been moved (400, 404, etc errors).
I want to spawn as many of these async requests concurrently as my PC and our bandwidth will allow, and was going to start at 20 async-web-request-tasks per second and work my way up from there.
Would it work if a single async task handled both things: it would make the async request and then add the url to the BadUrls collection if it encountered a 4xx error? A new instance of that task would be spawned every 50ms:
class TestArgs
{
    public ConcurrentBag<UrlInfo> myCollection { get; set; }
    public System.Uri currentUrl { get; set; }
}

ConcurrentQueue<UrlInfo> Urls = new ConcurrentQueue<UrlInfo>();
// populate the Urls queue
<snip>
// initialize the bad urls collection
ConcurrentBag<UrlInfo> BadUrls = new ConcurrentBag<UrlInfo>();
// timer fires every 50ms, whereupon a new args object is created
// and the timer callback spawns a new task; an autoEvent would
// reset the timer and dispose of it when the queue was empty
void SpawnNewUrlTask()
{
    // if queue is empty then reset the timer
    // otherwise:
    var args = new TestArgs
    {
        myCollection = BadUrls,
        currentUrl = getNextUrl() // take an item from the queue
    };

    Task.Factory.StartNew( asyncWebRequestAndConcurrentCollectionUpdater, args);
}
public async Task asyncWebRequestAndConcurrentCollectionUpdater(TestArgs args)
{
    // make the async web request
    // add the url to the bad collection if appropriate.
}
Feasible? Way off?
The approach seems fine, but there are some issues with the specific code you've shown.
But before I get to that, there have been suggestions in the comments that Task Parallelism is the way to go. I think that's misguided. There's a common misconception that if you want to have lots of work going on in parallel, you necessarily need lots of threads. That's only true if the work is compute-bound. But the work you're doing will be IO bound - this code is going to spend the vast majority of its time waiting for responses. It will do very little computation. So in practice, even if it only used a single thread, your initial target of 20 requests per second doesn't seem like a workload that would cause a single CPU core to break into a sweat.
In short, a single thread can handle very high levels of concurrent IO. You only need multiple threads if you need parallel execution of code, and that doesn't look likely to be the case here, because there's so little work for the CPU in this particular job.
(This misconception predates await and async by years. In fact, it predates the TPL - see http://www.interact-sw.co.uk/iangblog/2004/09/23/threadless for a .NET 1.1 era illustration of how you can handle thousands of concurrent requests with a tiny number of threads. The underlying principles still apply today because Windows networking IO still basically works the same way.)
Not that there's anything particularly wrong with using multiple threads here, I'm just pointing out that it's a bit of a distraction.
Anyway, back to your code. This line is problematic:
Task.Factory.StartNew( asyncWebRequestAndConcurrentCollectionUpdater, args);
While you've not given us all your code, I can't see how that will be able to compile. The overloads of StartNew that accept two arguments require the first to be either an Action, an Action<object>, a Func<TResult>, or a Func<object,TResult>. In other words, it has to be a method that either takes no arguments, or accepts a single argument of type object (and which may or may not return a value). Your 'asyncWebRequestAndConcurrentCollectionUpdater' takes an argument of type TestArgs.
But the fact that it doesn't compile isn't the main problem. That's easily fixed. (E.g., change it to Task.Factory.StartNew(() => asyncWebRequestAndConcurrentCollectionUpdater(args));) The real issue is that what you're doing is a bit weird: you're using Task.StartNew to invoke a method that already returns a Task.
Task.StartNew is a handy way to take a synchronous method (i.e., one that doesn't return a Task) and run it in a non-blocking way. (It'll run on the thread pool.) But if you've got a method that already returns a Task, then you didn't really need to use Task.StartNew. The weirdness becomes more apparent if we look at what Task.StartNew returns (once you've fixed the compilation error):
Task<Task> t = Task.Factory.StartNew(
() => asyncWebRequestAndConcurrentCollectionUpdater(args));
That Task<Task> reveals what's happening. You've decided to wrap a method that was already asynchronous with a mechanism that is normally used to make non-asynchronous methods asynchronous. And so you've now got a Task that produces a Task.
One of the slightly surprising upshots of this is that if you were to wait for the task returned by StartNew to complete, the underlying work would not necessarily be done:
t.Wait(); // doesn't wait for asyncWebRequestAndConcurrentCollectionUpdater to finish!
All that will actually do is wait for asyncWebRequestAndConcurrentCollectionUpdater to return a Task. And since asyncWebRequestAndConcurrentCollectionUpdater is already an async method, it will return a task more or less immediately. (Specifically, it'll return a task the moment it performs an await that does not complete immediately.)
If you want to wait for the work you've kicked off to finish, you'll need to do this:
t.Result.Wait();
or, potentially more efficiently, this:
t.Unwrap().Wait();
That says: get me the Task that my async method returned, and then wait for that. This may not be usefully different from this much simpler code:
Task t = asyncWebRequestAndConcurrentCollectionUpdater("foo");
... maybe queue up some other tasks ...
t.Wait();
You may not have gained anything useful by introducing Task.Factory.StartNew.
I say "may" because there's an important qualification: it depends on the context in which you start the work. C# generates code which, by default, attempts to ensure that when an async method continues after an await, it does so in the same context in which the await was initially performed. E.g., if you're in a WPF app and you await while on the UI thread, when the code continues it will arrange to do so on the UI thread. (You can disable this with ConfigureAwait.)
So if you're in a situation in which the context is essentially serialized (either because it's single-threaded, as will be the case in a GUI app, or because it uses something resembling a rental model, e.g. the context of a particular ASP.NET request), it may actually be useful to kick an async task off via Task.Factory.StartNew, because it enables you to escape the original context. However, you've just made your life harder - tracking your tasks to completion is somewhat more complex. And you might have been able to achieve the same effect simply by using ConfigureAwait inside your async method.
And it may not matter anyway - if you're only attempting to manage 20 requests a second, the minimal amount of CPU effort required to do that means that you can probably manage it entirely adequately on one thread. (Also, if this is a console app, the default context will come into play, which uses the thread pool, so your tasks will be able to run multithreaded in any case.)
But to get back to your question, it seems entirely reasonable to me to have a single async method that picks a url off the queue, makes the request, examines the response, and if necessary, adds an entry to the bad url collection. And kicking the things off from a timer also seems reasonable - that will throttle the rate at which connections are attempted without getting bogged down with slow responses (e.g., if a load of requests end up attempting to talk to servers that are offline). It might be necessary to introduce a cap for the maximum number of requests in flight if you hit some pathological case where you end up with tens of thousands of URLs in a row all pointing to a server that isn't responding. (On a related note, you'll need to make sure that you're not going to hit any per-client connection limits with whichever HTTP API you're using - that might end up throttling the effective throughput.)
You will need to add some sort of completion handling - just kicking off asynchronous operations and not doing anything to handle the results is bad practice, because you can end up with exceptions that have nowhere to go. (In .NET 4.0, these used to terminate your process, but as of .NET 4.5, by default an unhandled exception from an asynchronous operation will simply be ignored!) And if you end up deciding that it is worth launching via Task.Factory.StartNew remember that you've ended up with an extra layer of wrapping, so you'll need to do something like myTask.Unwrap().ContinueWith(...) to handle it correctly.
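As a sketch of what "one async method per URL, plus a cap on in-flight requests and completion handling" can look like with today's APIs (HttpClient, the SemaphoreSlim cap, and a Uri property on UrlInfo are assumptions, not code from the question or answer):
static async Task CheckUrlsAsync(ConcurrentQueue<UrlInfo> urls, ConcurrentBag<UrlInfo> badUrls)
{
    var client = new HttpClient();
    var throttle = new SemaphoreSlim(20); // cap on requests in flight
    var tasks = new List<Task>();

    while (urls.TryDequeue(out var url))
    {
        await throttle.WaitAsync();
        tasks.Add(CheckOneAsync(client, url, badUrls, throttle));
    }

    // Completion handling: awaiting here surfaces any exceptions.
    await Task.WhenAll(tasks);
}

static async Task CheckOneAsync(HttpClient client, UrlInfo url, ConcurrentBag<UrlInfo> badUrls, SemaphoreSlim throttle)
{
    try
    {
        using var response = await client.GetAsync(url.Uri); // assumes UrlInfo exposes a Uri
        if (!response.IsSuccessStatusCode)
            badUrls.Add(url);
    }
    catch (HttpRequestException)
    {
        badUrls.Add(url); // DNS failures, refused connections, etc.
    }
    finally
    {
        throttle.Release();
    }
}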
Of course you can. Concurrent collections are called 'concurrent' because they can be used... concurrently by multiple threads, with certain guarantees about their behaviour.
A ConcurrentQueue will ensure that each element inserted in it is extracted exactly once (concurrent threads will never extract the same item by mistake, and once the queue is empty, all the items have been extracted by a thread).
EDIT: the only thing that could go wrong is that 50ms is not enough to complete the request, so more and more tasks accumulate in the task queue. If that happens, your memory could fill up, but the thing would work anyway. So yes, it is feasible.
Anyway, I would like to underline the fact that a task is not a thread. Even if you create 100 tasks, the framework will decide how many of them will be actually executed concurrently.
If you want to have more control on the level of parallelism, you should use asynchronous requests.
In your comments, you wrote "async web request", but I can't tell whether you wrote async just because the request runs on a different thread or because you intend to use the async API.
If you were using the async API, I'd expect to see some handler attached to the completion event, but I couldn't see it, so I assumed you're using synchronous requests issued from an asynchronous task.
If you're using asynchronous requests, then it's pointless to use tasks, just use the timer to issue the async requests, since they are already asynchronous.
When I say "asynchronous request" I'm referring to methods like WebRequest.GetResponseAsync and WebRequest.BeginGetResponse.
EDIT2: if you want to use asynchronous requests, then you can just make the requests from the timer handler. The BeginGetResponse method takes two arguments. The first one is a callback procedure that will be called to report the status of the request; you can pass the same procedure for all the requests. The second one is a user-provided object that stores state about the request; you can use this argument to differentiate between requests. You can even do it without the timer. Something like:
private readonly int desiredConcurrency = 20;

struct RequestData
{
    public UrlInfo url;
    public HttpWebRequest request;
}

/// Handles the completion of an asynchronous request.
/// When a request has been completed,
/// tries to issue a new request to another url.
private void AsyncRequestHandler(IAsyncResult ar)
{
    if (ar.IsCompleted)
    {
        RequestData data = (RequestData)ar.AsyncState;
        // EndGetResponse returns a WebResponse; cast it to inspect the status code.
        var resp = (HttpWebResponse)data.request.EndGetResponse(ar);
        if (resp.StatusCode != HttpStatusCode.OK)
        {
            BadUrls.Add(data.url);
        }
        resp.Close();

        // A request has been completed, try to start a new one
        TryIssueRequest();
    }
}

/// If urls is not empty, dequeues a url from it
/// and issues a new request to the extracted url.
private bool TryIssueRequest()
{
    RequestData rd;
    if (urls.TryDequeue(out rd.url))
    {
        rd.request = CreateRequestTo(rd.url); // TODO implement
        rd.request.BeginGetResponse(AsyncRequestHandler, rd);
        return true;
    }
    else
    {
        return false;
    }
}

// Called by a button handler, or something like that
void StartTheRequests()
{
    for (int requestCount = 0; requestCount < desiredConcurrency; ++requestCount)
    {
        if (!TryIssueRequest()) break;
    }
}
