I have always pondered about this.
Let's say we have a simple asynchronous web request using the HttpWebRequest class
class webtest1
{
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("http://www.google.com");
public webtest1()
{
this.StartWebRequest();
}
void StartWebRequest()
{
webRequest.BeginGetResponse(new AsyncCallback(FinishWebRequest), null);
}
void FinishWebRequest(IAsyncResult result)
{
webRequest.EndGetResponse(result);
}
}
The same can be achieved easily with a synchronous operation:
class webtest1
{
HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("http://www.google.com");
public webtest1()
{
webRequest.GetResponse();
}
}
So why would I want to use the more convoluted async operation when a much simpler sync operation would suffice? To save system resources?
If you make an asynchronous request you can do other things while you wait for the response to your request. If you make a synchronous request, you have to wait until you receive your response before you can do something else.
For simple programs and scripts it may not matter so much, in fact in many of those situations the easier to code and understand synchronous method would be a better design choice.
However, for non-trivial programs, such as a desktop application, a synchronous request which locks up the entire application until the request is finished causes an unacceptable user experience.
A synchronous operation will prevent you from doing anything else while waiting for the request to complete or time out. Using an asynchronous operation would let you animate something for the user to show the program is busy, or even let them carry on working with other areas of functionality.
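A minimal console sketch of that idea, assuming a hypothetical SimulatedRequestAsync in which Task.Delay stands in for the real network wait. The caller keeps doing small units of work (here, counting "animation" ticks) until the response is ready:

```csharp
using System;
using System.Threading.Tasks;

public static class ResponsiveDemo
{
    // Hypothetical stand-in for a slow web request; Task.Delay simulates the network wait.
    public static async Task<string> SimulatedRequestAsync(int delayMs)
    {
        await Task.Delay(delayMs);
        return "response";
    }

    // Starts the request, then keeps "animating" until it completes.
    // Returns the response together with the number of ticks performed while waiting.
    public static async Task<(string Response, int Ticks)> RunAsync(int delayMs)
    {
        Task<string> pending = SimulatedRequestAsync(delayMs);
        int ticks = 0;
        while (!pending.IsCompleted)
        {
            ticks++;                // e.g. advance a progress spinner
            await Task.Delay(20);   // yield instead of blocking the thread
        }
        return (await pending, ticks);
    }
}
```

In a real desktop application the UI framework's message loop plays the role of the while loop; the polling here only makes the effect visible in a console program.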
The synchronous version is simpler to code but it masks a very serious problem. Network communication, or really any I/O operation, can block for extended periods of time. Many network connections, for example, have a timeout of 2 minutes.
Doing a network operation synchronously means your application and UI will block for the entire duration of that operation. A not uncommon network hiccup could cause your app to block for several minutes with no ability to cancel. This leads to very unhappy customers.
Asynchronous becomes especially useful when you have more things going on than you have cores - for example, you might have a number of active web requests, a number of file access operations, a few DB calls, and maybe some other network operations (WCF or redis maybe). If all of those are synchronous, you are creating a lot of threads, a lot of stacks, and suffering a lot of context switches. If you can use an asynchronous API you can usually exploit pool threads for the brief moments when each operation is doing something. This is great for high throughput server environments. Having multiple cores is great, but being efficient is better.
In C# 5 this becomes, via await, no more work than your second example.
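As a rough sketch of what that looks like: the example below swaps HttpWebRequest for HttpClient (HttpWebRequest also offers GetResponseAsync, which can be awaited the same way). The class and method names are made up for illustration.

```csharp
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public class WebTest2
{
    // HttpClient is intended to be created once and reused.
    private static readonly HttpClient Client = new HttpClient();

    // Reads almost like the synchronous version, but the calling thread
    // is released while the request is in flight.
    public async Task<HttpStatusCode> GetStatusAsync(string url)
    {
        using (HttpResponseMessage response = await Client.GetAsync(url))
        {
            return response.StatusCode;
        }
    }
}
```

A caller would write something like `var status = await new WebTest2().GetStatusAsync("http://www.google.com");` from inside another async method.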
I was reading this the other day and a similar question has been pondered before:
Performance difference between Synchronous HTTP Handler and Asynchronous HTTP Handler
1) You are stuck in a single-threaded environment such as Silverlight. Here you have no choice but to use async calls or the entire user thread will lock up.
2) You have many calls that take a long time to process. Why block your entire thread when it can go on and do other things while waiting for the return? For example if I have five function calls that each take 5 seconds, I would like to start all of them right away and have them return as necessary.
3) Too much data to process in the output synchronously. If I have a program that writes 10 gigabytes of data to the console and I want to read the output, I have a chance asynchronously to process line by line. If I do this synchronously then I will run out of buffer space and lock up the program.
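Point 2 can be sketched as follows, assuming a hypothetical SlowCallAsync in which Task.Delay stands in for the long-running call. All calls are started first and only then awaited together, so the total time should be close to that of the slowest call rather than the sum of all of them:

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class ConcurrentCallsDemo
{
    // Hypothetical slow call; Task.Delay stands in for real I/O.
    public static async Task<int> SlowCallAsync(int id, int delayMs)
    {
        await Task.Delay(delayMs);
        return id;
    }

    // Starts all calls first, then awaits them together.
    public static async Task<(int[] Results, TimeSpan Elapsed)> RunAllAsync(int count, int delayMs)
    {
        var stopwatch = Stopwatch.StartNew();
        Task<int>[] calls = Enumerable.Range(1, count)
            .Select(id => SlowCallAsync(id, delayMs))
            .ToArray();                        // ToArray forces every call to start now
        int[] results = await Task.WhenAll(calls);   // results keep the order of 'calls'
        return (results, stopwatch.Elapsed);
    }
}
```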
I wanted to ask you about async/await. Namely, why does it always need to be used? (all my friends say so)
Example 1.
public async Task Boo()
{
await WriteInfoIntoFile("file.txt");
// some other logic...
}
I have a Boo method, inside which I write something to a file and then execute some logic. Asynchrony is used here so that the thread does not stop while the information is being written to the file. Everything is logical.
Example 2.
public async Task Bar()
{
var n = await GetNAsync(nId);
_uow.NRepository.Remove(n);
await _uow.CompleteAsync();
}
But for the second example, I have a question. Why here asynchronously get the entity, if without its presence it will still be impossible to work further?
why does it always need to be used?
It shouldn't always be used. Ideally (and especially for new code), it should be used for most I/O-based operations.
Why here asynchronously get the entity, if without its presence it will still be impossible to work further?
Asynchronous code is all about freeing up the calling thread. This brings two kinds of benefits, depending on where the code is running.
If the calling thread is a UI thread inside a GUI application, then asynchrony frees up the UI thread to handle user input. In other words, the application is more responsive.
If the calling thread is a server-side thread, e.g., an ASP.NET request thread, then asynchrony frees up that thread to handle other user requests. In other words, the server is able to scale further.
Depending on the context, you might or might not get some benefit. In case you call the second function from a desktop application, it allows the UI to stay responsive while the async code is being executed.
Why here asynchronously get the entity, if without its presence it will still be impossible to work further?
You are correct in the sense that this stream of work cannot proceed, but using async versions allows freeing up the thread to do other work:
I like this paragraph from Using Asynchronous Methods in ASP.NET MVC 4 to explain the benefits:
Processing Asynchronous Requests
In a web app that sees a large number of concurrent requests at start-up or has a bursty load (where concurrency increases suddenly), making web service calls asynchronous increases the responsiveness of the app. An asynchronous request takes the same amount of time to process as a synchronous request. If a request makes a web service call that requires two seconds to complete, the request takes two seconds whether it's performed synchronously or asynchronously. However during an asynchronous call, a thread isn't blocked from responding to other requests while it waits for the first request to complete. Therefore, asynchronous requests prevent request queuing and thread pool growth when there are many concurrent requests that invoke long-running operations.
Not sure what you mean by
without its presence it will still be impossible to work further
regarding example 2. As far as I can tell this code gets an entity by id from its repository asynchronously, removes it, then completes the transaction on its Unit of Work. Do you mean why it does not simply remove the entry by id? That would certainly be an improvement, but would still leave you with an asynchronous method as CompleteAsync is obviously asynchronous?
As to your general question, I don't think there is a general consensus to always use async/await.
In your second example, the async/await keywords get the value of the n variable asynchronously. This might be necessary because the GetNAsync method is likely performing some time-consuming operation, such as querying a database or calling a web service downstream, that would otherwise block the calling thread. Awaiting the call does not let the rest of Bar run early; the remaining statements resume only once n is available. What it does is release the calling thread to do other work while the query is in progress.
But if GetNAsync just calls another local method that performs a basic CPU-bound task, then async is pointless in my view. Async works well when you are sure you need to wait, such as for network calls or other I/O-bound calls that will definitely add latency to your stack.
So I have been trying to get a grasp of this for quite some time now, but I couldn't see the sense in declaring every controller endpoint as an async method.
Let's look at a GET-Request to visualize the question.
This is my way to go with simple requests, just do the work and send the response.
[HttpGet]
public IActionResult GetUsers()
{
// Do some work and get a list of all user-objects.
List<User> userList = _dbService.ReadAllUsers();
return Ok(userList);
}
Below is the async Task<IActionResult> option I see very often, it does the same as the method above but the method itself is returning a Task. One could think, that this one is better because you can have multiple requests coming in and they get worked on asynchronously BUT I tested this approach and the approach above with the same results. Both can take multiple requests at once. So why should I choose this signature instead of the one above? I can only see the negative effects of this like transforming the code into a state-machine due to being async.
[HttpGet]
public async Task<IActionResult> GetUsers()
{
// Do some work and get a list of all user-objects.
List<User> userList = _dbService.ReadAllUsers();
return Ok(userList);
}
This approach below is also something I don't get the grasp of. I see a lot of code having exactly this setup: they await one async method and then return the result. Awaiting like this makes the code sequential again instead of having the benefits of Multitasking/Multithreading. Am I wrong on this one?
[HttpGet]
public async Task<IActionResult> GetUsers()
{
// Do some work and get a list of all user-objects.
List<User> userList = await _dbService.ReadAllUsersAsync();
return Ok(userList);
}
It would be nice if you could enlighten me with facts so I can either continue developing like I do right now or know that I have been doing it wrong due to misunderstanding the concept.
Please read the "Synchronous vs. Asynchronous Request Handling" section of Intro to Async/Await on ASP.NET.
Both can take multiple requests at once.
Yes. This is because ASP.NET is multithreaded. So, in the synchronous case, you just have multiple threads invoking the same action method (on different controller instances).
For non-multithreaded platforms (e.g., Node.js), you have to make the code asynchronous to handle multiple requests in the same process. But on ASP.NET it's optional.
Awaiting like this makes the code sequential again instead of having the benefits of Multitasking/Multithreading.
Yes, it is sequential, but it's not synchronous. It's sequential in the sense that the async method executes one statement at a time, and that request isn't complete until the async method completes. But it's not synchronous - the synchronous code is also sequential, but it blocks a thread until the method completes.
So why should I choose this signature instead of the one above?
If your backend can scale, then the benefit of asynchronous action methods is scalability. Specifically, asynchronous action methods yield their thread while the asynchronous operation is in progress - in this case, GetUsers is not taking up a thread while the database is performing its query.
The benefit can be hard to see in a testing environment, because your server has threads to spare, so there's no observable difference between calling an asynchronous method 10 times (taking up 0 threads) and calling a synchronous method 10 times (taking up 10 threads, with another 54 to spare). You can artificially restrict the number of threads in your ASP.NET server and then do some tests to see the difference.
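One way to make the difference visible outside ASP.NET is a console simulation. The sketch below is only an approximation under stated assumptions: it caps the thread pool at the processor count (the pool cannot be capped lower than that), uses Thread.Sleep as a stand-in for a blocking database call and Task.Delay as a stand-in for its asynchronous counterpart, and all class and method names are invented for the example.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadPoolDemo
{
    // Runs 'count' simulated requests that each block a pool thread while waiting.
    public static TimeSpan RunBlocking(int count, int delayMs)
    {
        var sw = Stopwatch.StartNew();
        Task[] tasks = Enumerable.Range(0, count)
            .Select(_ => Task.Run(() => Thread.Sleep(delayMs)))
            .ToArray();
        Task.WaitAll(tasks);
        return sw.Elapsed;
    }

    // Runs 'count' simulated requests that wait without holding a thread.
    public static async Task<TimeSpan> RunNonBlockingAsync(int count, int delayMs)
    {
        var sw = Stopwatch.StartNew();
        Task[] tasks = Enumerable.Range(0, count)
            .Select(_ => Task.Delay(delayMs))
            .ToArray();
        await Task.WhenAll(tasks);
        return sw.Elapsed;
    }

    // Caps the pool, runs both variants with four requests per core, then restores the cap.
    public static async Task<(TimeSpan Blocking, TimeSpan NonBlocking)> CompareAsync(int delayMs)
    {
        int cores = Environment.ProcessorCount;
        ThreadPool.GetMaxThreads(out int oldWorkers, out int oldIo);
        // SetMaxThreads returns false if the runtime rejects the requested limit.
        ThreadPool.SetMaxThreads(cores, oldIo);
        try
        {
            int requests = cores * 4;
            TimeSpan blocking = RunBlocking(requests, delayMs);
            TimeSpan nonBlocking = await RunNonBlockingAsync(requests, delayMs);
            return (blocking, nonBlocking);
        }
        finally
        {
            ThreadPool.SetMaxThreads(oldWorkers, oldIo);
        }
    }
}
```

With more simulated requests than available threads, the blocking variant has to work through them in batches, while the non-blocking variant should finish in roughly one delay period. Exact timings depend on the machine.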
In a real-world server, you usually want to make it as asynchronous as possible so that your threads are available for handling other requests. Or, as described here:
Bear in mind that asynchronous code does not replace the thread pool. This isn’t thread pool or asynchronous code; it’s thread pool and asynchronous code. Asynchronous code allows your application to make optimum use of the thread pool. It takes the existing thread pool and turns it up to 11.
Bear in mind the "if" above; this particularly applies to existing code. If you have just a single SQL server backend, and if pretty much all your actions query the db, then changing them to be asynchronous may not be useful, since the scalability bottleneck is usually the db server and not the web server. But if your web app could use threads to handle non-db requests, or if your db backend is scalable (NoSQL, SQL Azure, etc), then changing action methods to be asynchronous would likely help.
For new code, I recommend asynchronous methods by default. Asynchronous makes better use of server resources and is more cloud-friendly (i.e., less expensive for pay-as-you-go hosting).
If your DB Service class has an async method for getting the user then you should see benefits. As soon as the request goes out to the DB then it is waiting on a network or disk response back from the service at the other end. While it is doing this the thread can be freed up to do other work, such as servicing other requests. As it stands in the non-async version the thread will just block and wait for a response.
With async controller actions you can also get a CancellationToken which will allow you to quit early if the token is signalled because the client at the other end has terminated the connection (but that may not work with all web servers).
[HttpGet]
public async Task<IActionResult> GetUsers(CancellationToken ct)
{
ct.ThrowIfCancellationRequested();
// Do some work and get a list of all user-objects.
List<User> userList = await _dbService.ReadAllUsersAsync(ct);
return Ok(userList);
}
So, if you have an expensive set of operations and the client disconnects, you can stop processing the request because the client won't ever receive it. This frees up the application's time and resources to deal with requests where the client is interested in a result coming back.
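The effect can be tried in a console program. In this sketch a CancellationTokenSource with CancelAfter plays the role of the disconnecting client, and a hypothetical ReadAllUsersAsync uses Task.Delay as a stand-in for a slow query that observes the token:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationDemo
{
    // Hypothetical long-running query; Task.Delay observes the token
    // the way a cancellable async database call would.
    public static async Task<string> ReadAllUsersAsync(CancellationToken ct)
    {
        await Task.Delay(TimeSpan.FromSeconds(5), ct);
        return "users";
    }

    // Simulates a client that disconnects after 'disconnectAfterMs'.
    // Returns true if the work was abandoned because of the cancellation.
    public static async Task<bool> WasCancelledAsync(int disconnectAfterMs)
    {
        using (var cts = new CancellationTokenSource())
        {
            cts.CancelAfter(disconnectAfterMs);
            try
            {
                await ReadAllUsersAsync(cts.Token);
                return false;
            }
            catch (OperationCanceledException)
            {
                // Task.Delay throws TaskCanceledException, which derives from this type.
                return true;
            }
        }
    }
}
```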
However, if you have a very simple action that requires no async calls then I probably wouldn't worry about it, and leave it as it is.
You are missing something fundamental here.
When you use async Task<> you are effectively saying, "run all the I/O code asynchronously and free up processing threads to... process".
At scale, your app will be able to serve more requests/second because your I/O is not tying up your CPU intensive work and vice versa.
You're not seeing much benefit now, because you're probably testing locally with plenty of resources at hand.
See
Understanding CPU and I/O bound for asynchronous operations
Async in depth
As you know, ASP.NET is based on a multi-threaded model of request execution. To put it simply, each request is run in its own thread, contrary to other single-threaded execution approaches like the event loop in Node.js.
Now, threads are a finite resource. If you exhaust your thread pool (available threads) you can't continue handling tasks efficiently (or in most cases at all). If you use synchronous execution for your action handlers, you occupy those threads from the pool (even if they don't need to be). It is important to understand that those threads handle requests, they don't do input/output. I/O is handled by separate processes which are independent of the ASP.NET request threads.
So, if in your action you have a DB fetch operation that takes let's say 15 seconds to execute, you're forcing your current thread to wait idle 15 seconds for the DB result to be returned so it can continue executing code. Therefore, if you have 50 such requests, you'll have 50 threads occupied in basically sleep mode. Obviously, this is not very scalable and becomes a problem soon enough.
In order to use your resources efficiently, it is a good idea to preserve the state of the executing request when an I/O operation is reached and free the thread to handle another request in the meantime. When the I/O operation completes, the state is reassembled and given to a free thread in the pool to resume the handling. That's why you use async handling. It lets you use your finite resources more efficiently and prevents such thread starvation. Of course, it is not an ultimate solution. It helps you scale your application for higher load. If you don't have the need for it, don't use it as it just adds overhead.
The result returns a task from an asynchronous (non-blocking) operation which represents some work that should be done. The task can tell you if the work is completed and if the operation returns a result, the task gives you the result which won't be available until the task is completed. You can learn more about C# asynchronous programming from the official Microsoft Docs:
https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.task?view=net-5.0
Async operations return a Task (it is not the result, but a promise that will hold the result once the task is completed).
Async methods should be awaited so that execution waits until the task is completed. If you await an async method, the value you get back is no longer a task but the result you expected.
Imagine this async method:
async Task<SomeResult> SomeMethod()
{
...
}
the following code:
var task = SomeMethod(); // task is a Task<SomeResult> which may not have completed yet
var result = await SomeMethod(); // result is of type SomeResult
Now, why do we use async methods instead of sync methods:
Imagine that a method is doing some job that may take a long time to complete. When it is done synchronously, the thread that runs it is blocked until the job completes and cannot serve anything else in the meantime. When it is asynchronous, the thread is released while the operation is pending and the rest of the method resumes later (possibly on another thread), so other requests are not held up.
We are working on an old comparator.
When a user runs a search, we call 10-30 different web services (REST, SOAP) at the same time. Pretty classic. Each web service is represented by a Client in our application.
So the code is like:
// Get the list of client requests to call
clientRqListToCall = BuildRequest(userContext);
List<Task> taskList = new List<Task>();
// Call the different clients
foreach (ClientRequest clientRq in clientRqListToCall) {
Task task = Task.Run(() => CallClient(clientRq));
taskList.Add(task);
}
// Wait for the clients until the timeout
Task mainTask = Task.WhenAll(taskList);
mainTask.ConfigureAwait(false);
mainTask.Wait(timeout);
Simple. (Not sure the ConfigureAwait is needed.) The response of each client is stored in a field of ClientRequest, so we don't use mainTask.Result (if a client times out, we need to be able to continue with the other ones, and they time out a lot! The client call behaviour is pretty similar to fire-and-forget).
The application is a little old and our search engine is synchronous. The calls to the different web services are made in the different CallClient call trees; depending on the search context, 5 to 15 different functions are called before the web service call. Each web service call is pretty long (1 to 15s each)! This point seems to be important! These are not simple ping requests.
Actions / Changes ?
So this is an I/O-bound problem. We know Task.Run works pretty well for CPU-bound problems and not for I/O, so the question is how to make this code better.
We read a lot of different article on the subject, thanks to Stephen Cleary (http://blog.stephencleary.com/2012/07/dont-block-on-async-code.html)
But we are not sure of our choice / road map, which is why I am posting this question.
We could make the code asynchronous, but we would have to rework the whole CallClient call tree (hundreds of functions). Is it the only solution? Of course we could migrate the web services one by one using the bool argument hack (https://msdn.microsoft.com/en-us/magazine/mt238404.aspx).
=> Must we start with the most costly web service (in terms of I/O), or is only the number of web service calls important, so that we should start with the easiest?
In other words, if I have one big client with a 10s average response time and a lot of data, must we make that one async first? Or should we start with the little ones (1-2s) with the same amount of data? I could be wrong, but a thread is locked synchronously until the Task.Run() delegate finishes, so obviously the 10s task locks a thread for the whole time; in terms of I/O, freeing a thread as soon as possible could be better. Is the amount of data downloaded important, or should we only think in terms of web service response time?
Task.Run uses the application thread pool. We have to choose between Task.Run(...) and Task.Factory.StartNew(..., TaskCreationOptions.LongRunning), which (most of the time) creates a new thread and so might give a better result.
=> I ran some tests on the subject using a console application; Task.Run() seems to be 25% to 33% faster than Task.Factory.StartNew in all scenarios.
Of course this is an expected result, but on a web app with around 200 users I am not sure the result would be the same; I fear the pool will fill up and the tasks will keep switching between each other without finishing.
Note: If StartNew is used, WaitAll(timeout) replaces WhenAll.
Today, on average, 20 to 50 customers can run a search at the same time. The application works without big issues and we don't have deadlocks, but sometimes we see some delay in task execution on our side. Our CPU usage is pretty low (<10%), and RAM is fine too (<25%).
I know there are plenty of questions about Tasks, but it's hard to merge them together to match our problem. We have also read contradictory advice.
I have used Parallel.ForEach to handle multiple I/O operations before; I did not see it mentioned above. I am not sure it will handle quite what you need, seeing that the function passed into the loop is the same for each item. Maybe coupled with a strategy pattern / delegates you can achieve what you need.
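For comparison, the fan-out with a timeout described in the question can also be sketched without blocking any thread, assuming the client calls can be made truly asynchronous. Everything named here (CallClientAsync, CallAllAsync, the tuple of name and latency) is hypothetical, and Task.Delay stands in for the web service round trip:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class FanOutDemo
{
    // Hypothetical async client call; Task.Delay stands in for the web service round trip.
    public static async Task<string> CallClientAsync(string name, int latencyMs)
    {
        await Task.Delay(latencyMs);
        return name + ": ok";
    }

    // Starts every call, waits at most 'timeout', and returns whatever finished in time.
    public static async Task<List<string>> CallAllAsync(
        IEnumerable<(string Name, int LatencyMs)> clients, TimeSpan timeout)
    {
        List<Task<string>> calls = clients
            .Select(c => CallClientAsync(c.Name, c.LatencyMs))
            .ToList();

        // Completes when all calls finish or the timeout elapses, whichever comes first.
        await Task.WhenAny(Task.WhenAll(calls), Task.Delay(timeout));

        // Slow calls keep running in the background; only completed ones are read.
        return calls
            .Where(t => t.Status == TaskStatus.RanToCompletion)
            .Select(t => t.Result)
            .ToList();
    }
}
```

No pool thread is held while the calls are pending, which is the main difference from wrapping synchronous calls in Task.Run. A production version would also need to cancel or otherwise clean up the calls that missed the deadline.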
I have this code (the unimportant details are that it runs on EC2 instances in AWS, processing messages on an SQS queue).
The first statement in the method gets some data over http, the second statement saves state to a local dynamo data store.
public bool HandleMessage(OrderAcceptedMessage message)
{
var order = _orderHttpClient.GetById(message.OrderId);
_localDynamoRepo.SaveAcceptedOrder(message, order);
return true;
}
The performance characteristics are that the http round trip takes 100-200 milliseconds, and the dynamo write takes around 10 milliseconds.
Both of these operations have async versions. We could write it as follows:
public async Task<bool> HandleMessage(OrderAcceptedMessage message)
{
var order = await _orderHttpClient.GetByIdAsync(message.OrderId);
await _localDynamoRepo.SaveAcceptedOrderAsync(message, order);
return true;
}
So the guidance is that since the first operation "could take longer than 50 milliseconds to execute" it should make use of async and await. (1)
But what about the second, fast operation? Which of these two arguments is correct:
Do not make it async: It does not meet the 50ms criterion and it's not worth the overhead.
Do make it async: The overhead has already been paid by the previous operation. There is already task-based asynchrony happening and it's worth using it.
1) http://blog.stephencleary.com/2013/04/ui-guidelines-for-async.html
the unimportant details are that it runs on EC2 instances in AWS, processing messages on an SQS queue
Actually, I think that's an important detail. Because this is not a UI application; it's a server application.
the guidance is that since the first operation "could take longer than 50 milliseconds to execute"
This guidance only applies to UI applications. Thus, the 50ms guideline is meaningless here.
Which of these two arguments is correct
Asynchrony is not about speed. It's about freeing up threads. The 50ms guideline for UI apps is all about freeing up the UI thread. On the server side, async is about freeing up thread pool threads.
The question is how much scalability do you want/need? If your backend is scalable, then I generally recommend async, because that frees up thread pool threads. This makes your web app more scalable and more able to react to changes in load more quickly. But this only gives you a benefit if your backend can scale along with your web app.
First notice that in web apps the biggest cost of async is reduction of productivity. This is what we are weighing the benefits against. You need to think about how much code will be infected if you make this one method async.
The benefit is saving a thread for the duration of the call. A 200ms HTTP call is a pretty good case for async (although it's impossible to say for sure because it also depends on how often you perform the call).
The 50ms criterion is not a hard number. In fact that recommendation is for realtime UI apps.
A more useful number is latency times frequency. That tells you how many threads are consumed in the long term average. Infrequent calls do not need to be optimized.
100 dynamo calls per second at 10ms come out at one thread blocked. This is nothing. So this probably is not a good candidate for async.
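The arithmetic behind that estimate is small enough to write down as a helper. The function name is made up for illustration:

```csharp
public static class ThreadCost
{
    // Long-run average number of threads held by blocking calls:
    // call rate multiplied by the time each call blocks.
    public static double AverageBlockedThreads(double callsPerSecond, double latencyMs)
    {
        return callsPerSecond * latencyMs / 1000.0;
    }
}
```

`ThreadCost.AverageBlockedThreads(100, 10)` gives 1, matching the figure above. If the 200ms HTTP call were also made 100 times per second (an assumption; the question does not state the rate), the same formula would give 20 blocked threads.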
Of course if you make the first call async you can make the second one async as well at almost no incremental productivity cost because everything is infected already.
You can run the numbers yourself and decide based on that.
This might end up in an opinionated discussion...but let's try.
tl;dr: yes, keep it async.
You are in a library and you don't care about the synchronisation context, so you should not capture it and change your code into:
var order = await _orderHttpClient.GetByIdAsync(message.OrderId).ConfigureAwait(false);
await _localDynamoRepo.SaveAcceptedOrderAsync(message, order).ConfigureAwait(false);
Besides: after the first awaited call, you'll likely end up on a thread of the thread pool. So even if you use the non-async version SaveAcceptedOrder() it will not block. However, this is nothing you should rely on and you don't necessarily know the type of the async method (CPU bound or IO bound = "async by design"). If it is IO bound, there's no need to run it on a thread.
If you're making any remote call, make it async. Yes, DynamoDB calls are fast (except where one has a sub-par hash-key, or many gigabytes of data in a single table), but you're still making them over the internet (even if you're inside AWS EC2 etc), and so you should not ignore any of the Eight Fallacies of Distributed Computing - and especially not 1) The network is reliable or 2) Latency is zero.
I have a method which has just one task to do and has to wait for that task to complete:
public async Task<JsonResult> GetAllAsync()
{
var result = await this.GetAllDBAsync();
return Json(result, JsonRequestBehavior.AllowGet);
}
public async Task<List<TblSubjectSubset>> GetAllDBAsync()
{
return await model.TblSubjectSubsets.ToListAsync();
}
It is significantly faster than when I run it without async-await.
We know
The async and await keywords don't cause additional threads to be
created. Async methods don't require multithreading because an async
method doesn't run on its own thread. The method runs on the current
synchronization context and uses time on the thread only when the
method is active
According to this link: https://msdn.microsoft.com/en-us/library/hh191443.aspx#BKMK_Threads. What is the reason for being faster when we don't have another thread to handle the job?
"Asynchronous" does not mean "faster."
"Asynchronous" means "performs its operation in a way that it does not require a thread for the duration of the operation, thus allowing that thread to be used for other work."
In this case, you're testing a single request. The asynchronous request will "yield" its thread to the ASP.NET thread pool... which has no other use for it, since there are no other requests.
I fully expect asynchronous handlers to run slower than synchronous handlers. This is for a variety of reasons: there's the overhead of the async/await state machine, and extra work when the task completes to have its thread enter the request context. Besides this, the Win32 API layer is still heavily optimized for synchronous calls (expect this to change gradually over the next decade or so).
So, why use asynchronous handlers then?
For scalability reasons.
Consider an ASP.NET server that is serving more than one request - hundreds or thousands of requests instead of a single one. In that case, ASP.NET will be very grateful for the thread returned to it during its request processing. It can immediately use that thread to handle other requests. Asynchronous requests allow ASP.NET to handle more requests with fewer threads.
This is assuming your backend can scale, of course. If every request has to hit a single SQL Server, then your scalability bottleneck will probably be your database, not your web server.
But if your situation calls for it, asynchronous code can be a great boost to your web server scalability.
For more information, see my article on async ASP.NET.
I agree with Orbittman when he mentions the overhead involved in the application architecture. It doesn't make for a very good benchmark premise since you can't be sure if the degradation can indeed be solely attributed to the async vs non-async calls.
I've created a really simple benchmark to get a rough comparison between an async and a synchronous call and async loses every time in the overall timing actually, though the data gathering section always seems to end up the same. Have a look: https://gist.github.com/mattGuima/25cb7893616d6baaf970
Having said that, the same thought regarding the architecture applies. Frameworks handle async calls differently: Async and await - difference between console, Windows Forms and ASP.NET
The main thing to remember is to never confuse async with performance gain, because it is completely unrelated and most often it will result in no gain at all, especially with CPU-bound code. Look at the Parallel library for that instead.
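For CPU-bound work, a minimal sketch using Parallel.For might look like this. The workload (a sum of squares) is only a placeholder for real computation:

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class CpuBoundDemo
{
    // Sums i*i for i in [0, n), spreading the work across available cores.
    public static long SumOfSquares(int n)
    {
        long total = 0;
        Parallel.For<long>(0, n,
            () => 0L,                                       // per-thread running sum
            (i, loopState, local) => local + (long)i * i,   // the CPU-bound work
            local => Interlocked.Add(ref total, local));    // merge each partial sum once
        return total;
    }
}
```

Here the gain, if any, comes from using several cores at once, which is a different mechanism from async/await releasing a thread during I/O.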
Async await is not the silver bullet that some people think it is and in your example is not required. If you were processing the result of the awaitable operation after you received it then you would be able to return a task and continue on the calling thread. You wouldn't have to then wait for the rest of the operation to complete. You would be correct to remove the async/await in the above code.
It's not really possible to answer the question without seeing the calling code either, as it depends on what the context is trying to do with the response. What you are getting back is not just a Task but a task in the context of the method that will continue when complete. See http://codeblog.jonskeet.uk/category/eduasync/ for much better information regarding the inner workings of async/await.
Lastly, I would question your timings, as with an Ajax request to a database and back there are other areas with potentially greater latency, such as the HTTP request and response and the DB connection itself. I assume that you're using an ORM, and that alone can cause an overhead. I wonder whether it's the async/await that is the problem.