Should all my actions using IO be async? - c#

Reading the MSDN article Using Asynchronous Methods in ASP.NET MVC 4, I drew the conclusion that I should always use async/await for I/O-bound operations.
Consider the following code, where movieManager exposes the async methods of an ORM like Entity Framework.
public class MovieController : Controller
{
    // fields and constructors

    public async Task<ActionResult> Index()
    {
        var movies = await movieManager.listAsync();
        return View(movies);
    }

    public async Task<ActionResult> Details(int id)
    {
        var movie = await movieManager.FindAsync(id);
        return View(movie);
    }
}
Will this always give me better scalability and/or performance?
How can I measure this?
Why isn't this used in the "real world"?
How about context synchronization?
Is it that bad, that I shouldn't use async I/O in ASP.NET MVC?
I know these are a lot of questions, but literature on this topic has conflicting conclusions. Some say you should always use async for I/O dependent Tasks, others say you shouldn't use async in ASP.NET applications at all.

Will this always give me better scalability and/or performance?
It may. If you only have a single database server as your backend, then your database could be your scalability bottleneck, and in that case scaling your web server won't have any effect in the wider scope of your service as a whole.
How can I measure this?
With load testing. If you want a simple proof-of-concept, you can check out this gist of mine.
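For a very rough idea of what such a proof-of-concept can look like (this is just a sketch, not the gist referenced above; the URL and request count are made up), you can fire a batch of concurrent requests against your site and time them:

using System;
using System.Diagnostics;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

class LoadTest
{
    static void Main()
    {
        const int requestCount = 200;                     // made-up number, tune for your setup
        var client = new HttpClient();
        var watch = Stopwatch.StartNew();

        // Fire all requests concurrently and wait for every response.
        var tasks = Enumerable.Range(0, requestCount)
            .Select(_ => client.GetAsync("http://localhost:12345/Movie/Index"))   // hypothetical local URL
            .ToArray();
        Task.WaitAll(tasks);

        watch.Stop();
        Console.WriteLine("{0} requests completed in {1} ms", requestCount, watch.ElapsedMilliseconds);
    }
}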
Why isn't this used in the "real world" a lot?
It is. Asynchronous request handlers before .NET 4.5 were quite painful to write, and a lot of companies just threw more hardware at the problem instead. Now that .NET 4.5 and async/await are gaining a lot of momentum, asynchronous request handling will become much more common.
How about context synchronization?
It's handled for you by ASP.NET. I have an async intro on my blog that explains how await will capture the current SynchronizationContext when you await a task. In this case it's an AspNetSynchronizationContext that represents the request, so things like HttpContext.Current, culture, etc. all get preserved across await points automatically.
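For illustration only, using the Details action from the question (a sketch, assuming the movieManager field from the question): the values read before and after the await come from the same request context, even if the continuation runs on a different thread.

public async Task<ActionResult> Details(int id)
{
    // Before the await: running on an ASP.NET request thread,
    // with AspNetSynchronizationContext installed.
    var userBefore = HttpContext.User.Identity.Name;

    var movie = await movieManager.FindAsync(id);        // the context is captured here

    // After the await: possibly a different thread, but the request context
    // (HttpContext, identity, culture) has been restored for us.
    var userAfter = HttpContext.User.Identity.Name;      // same value as userBefore
    return View(movie);
}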
Is it that bad, that I shouldn't use async I/O in ASP.NET MVC?
As a general rule, if you're on .NET 4.5, you should use async to handle any request that requires I/O. If the request is simple (i.e., does not hit a database or call another service), then just keep it synchronous.

Will this always give me better scalability and/or performance?
You answered it yourself: you need to measure and find out. Typically, async is something to add later on because it adds complexity, and complexity is the number-one concern in your code base until you have a specific problem.
How can I measure this?
Build it both ways, see which is faster (preferably for a large number of operations)
Why isn't this used in the "real world" a lot?
Because complexity is the biggest problem in software development. If code is complex, it is more error-prone and harder to debug. More bugs that are harder to fix is not a good trade-off for a potential performance advantage.
How about context synchronization?
I am assuming you mean the ASP.NET context. If so, you should not need any synchronization: make sure only one thread is hitting your context and communicate through it.
Is it that bad, that I shouldn't use async I/O in ASP.NET MVC?
Introducing async just to then have to deal with synchronization is a loss unless you really need the performance.

Putting asynchronous code in a website has a lot of downsides:
You'll get into trouble when there are dependencies between the pieces of data, as you cannot make that asynchronous.
Asynchronous work is often done for things like API requests. Have you considered that you shouldn't be doing these in a webpage? If the external service goes down, so goes your site. That doesn't scale.
Doing things asynchronously may speed up your site in some cases but you're basically introducing trouble. You always end up waiting for the slowest one, and since sometimes resources just slow down for whatever reason this means that the risk of something slowing down your site increases by a factor equal to the number of asynchronous jobs you're using. You'll have to introduce timeouts to deal with these, then error handling code, etc.
When scaling to multiple webservers because the CPU load is getting too heavy, the asynchronous work will hurt you. Everything you used to put in asynchronous code now fires simultaneously the moment the user clicks a link, and then eases down. This doesn't only apply to CPU load, but also database load and even API requests. You will see a very awful utilization pattern across all system resources: spikes of heavy usage, and then it goes down again. That doesn't scale well. Synchronous code doesn't have this problem: jobs only start after another one is done.
Asynchronous work for websites is a trap: don't go there!
Put your heavy code in a worker (or cron job) that does these things before the user asks for them. You'll have them in a database and you can keep adding features to your site without having to worry about firing too many asynchronous jobs and what not.
Performance for websites is seriously overrated. Sure, it's nice if your page renders in 50ms, but if it takes 250ms people really won't notice (to test this: put a Sleep(200) in your code).
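If you want to run that experiment, a throwaway change to the question's Index action is enough (movieManager.List() here is a hypothetical synchronous counterpart of listAsync):

public ActionResult Index()
{
    System.Threading.Thread.Sleep(200);      // artificial delay purely for the perception test
    var movies = movieManager.List();        // hypothetical synchronous counterpart of listAsync
    return View(movies);
}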
Your code becomes a lot more scalable if you just offload the work to another process and make the website an interface to only your database. Don't make your webserver do heavy work that it shouldn't do, it doesn't scale. You can have a hundred machines spending a total of 1 CPU hour per webpage - but at least it scales in a way where the page still loads in 200ms. Good luck achieving that with asynchronous code.
I would like to add a side-note here. While my opinion on asynchronous code might seem strong, it's mostly an opinion about programmers. Asynchronous code is awesome and can make a performance difference that proves all of the points I outlined wrong. However, it needs a lot of finetuning in your code to avoid the points I mention in this post, and most programmers just can't handle that.

Related

Performance profiling application with non-pattern async calls:

I'm working on an existing large enterprise application. This application has a small asynchronous method framework built into its ViewModel base class. These async methods are similar to APM and the event-based asynchronous pattern; little bits were borrowed from both established patterns.
I've been assigned to profile the performance of a particularly slow view in the application. I have been given a license to Redgate ANTS Performance Profiler for the job.
From what I have read today, ANTS is normally capable of linking async/await calls to the actual work done. However, since the application I am working on does not follow the async/await pattern, I believe I am missing out on this automatic linking of async calls to their execution and their completion handlers.
The actual work being performed is done by a service that is central to the application, so there are hundreds of things that are causing this service to perform work, constantly.
Because of this issue, what ANTS is showing me is the worker method being extremely slow, but it is giving me zero feedback on what inside the view is actually causing this slow work to be done.
I spoke to a coworker about this problem, and he told me this is why he doesn't bother with performance profilers. He told me that what he would do is put time stamped logging calls all over the view and then write a quick and dirty tool to filter the data into something consumable by a human. But this is pretty much exactly what the profiler should be doing for me.
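For concreteness, his suggestion amounts to something like the following sketch; TimedScope, the Trace sink and the commented usage are all made up for illustration:

using System;
using System.Diagnostics;

// Wrap a suspect region in a using block and the elapsed time is written on Dispose.
public sealed class TimedScope : IDisposable
{
    private readonly string _name;
    private readonly Stopwatch _watch = Stopwatch.StartNew();

    public TimedScope(string name) { _name = name; }

    public void Dispose()
    {
        _watch.Stop();
        // Replace Trace.WriteLine with whatever sink your quick-and-dirty filter tool reads.
        Trace.WriteLine(string.Format("{0:o} {1} took {2} ms",
            DateTime.UtcNow, _name, _watch.ElapsedMilliseconds));
    }
}

// Usage inside the view or view model (names are hypothetical):
// using (new TimedScope("LoadCustomerGrid"))
// {
//     centralService.BeginDoWork(request);
// }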
We talked about this for a while and concluded that for a tool to be effective with Async calls, it would either have to support a specific standard, or it would have to support something in the actual code, perhaps such as an attribute, that allows you to mark the async call and the completion handler.
Do you agree with what I've said here? If so, are there any such performance profilers for .NET that have custom attributes to annotate your problematic code with for profiling? If not, could you please enlighten me as to how I can interpret this data to determine the actual cause of the issue?
Thank you for any help.

Why shouldn't all functions be async by default?

The async-await pattern of .NET 4.5 is paradigm-changing. It's almost too good to be true.
I've been porting some IO-heavy code to async-await because blocking is a thing of the past.
Quite a few people are comparing async-await to a zombie infestation and I found it to be rather accurate. Async code likes other async code (you need an async function in order to await on an async function). So more and more functions become async and this keeps growing in your codebase.
Changing functions to async is somewhat repetitive and unimaginative work. Throw an async keyword in the declaration, wrap the return type in Task<>, and you're pretty much done. It's rather unsettling how easy the whole process is, and pretty soon a text-replacing script will automate most of the "porting" for me.
And now the question: if all my code is slowly turning async, why not just make it all async by default?
The obvious reason, I assume, is performance. Async-await has its overhead, and code that doesn't need to be async preferably shouldn't be. But if performance is the sole problem, surely some clever optimizations could remove the overhead automatically when it's not needed. I've read about the "fast path" optimization, and it seems to me that it alone should take care of most of it.
Maybe this is comparable to the paradigm shift brought on by garbage collectors. In the early GC days, freeing your own memory was definitely more efficient. But the masses still chose automatic collection in favor of safer, simpler code that might be less efficient (and even that arguably isn't true anymore). Maybe this should be the case here? Why shouldn't all functions be async?
First off, thank you for your kind words. It is indeed an awesome feature and I am glad to have been a small part of it.
If all my code is slowly turning async, why not just make it all async by default?
Well, you're exaggerating; all your code isn't turning async. When you add two "plain" integers together, you're not awaiting the result. When you add two future integers together to get a third future integer -- because that's what Task<int> is, it's an integer that you're going to get access to in the future -- of course you'll likely be awaiting the result.
The primary reason to not make everything async is because the purpose of async/await is to make it easier to write code in a world with many high latency operations. The vast majority of your operations are not high latency, so it doesn't make any sense to take the performance hit that mitigates that latency. Rather, a key few of your operations are high latency, and those operations are causing the zombie infestation of async throughout the code.
if performance is the sole problem, surely some clever optimizations can remove the overhead automatically when it's not needed.
In theory, theory and practice are similar. In practice, they never are.
Let me give you three points against this sort of transformation followed by an optimization pass.
The first point against is: async in C#/VB/F# is essentially a limited form of continuation passing. An enormous amount of research in the functional language community has gone into figuring out ways to optimize code that makes heavy use of continuation passing style. The compiler team would likely have to solve very similar problems in a world where "async" was the default and the non-async methods had to be identified and de-async-ified. The C# team is not really interested in taking on open research problems, so that's big points against right there.
A second point against is that C# does not have the level of "referential transparency" that makes these sorts of optimizations more tractable. By "referential transparency" I mean the property that the value of an expression does not depend on when it is evaluated. Expressions like 2 + 2 are referentially transparent; you can do the evaluation at compile time if you want, or defer it until runtime and get the same answer. But an expression like x+y can't be moved around in time because x and y might be changing over time.
Async makes it much harder to reason about when a side effect will happen. Before async, if you said:
M();
N();
and M() was void M() { Q(); R(); }, and N() was void N() { S(); T(); }, and R and S produce side effects, then you know that R's side effect happens before S's side effect. But if you have async void M() { await Q(); R(); } then suddenly that goes out the window. You have no guarantee whether R() is going to happen before or after S() (unless of course M() is awaited; but of course its Task need not be awaited until after N().)
Now imagine that this property of no longer knowing what order side effects happen in applies to every piece of code in your program except those that the optimizer manages to de-async-ify. Basically you have no clue anymore which expressions will be evaluated in what order, which means that all expressions need to be referentially transparent, which is hard in a language like C#.
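To make that concrete, here is a small runnable sketch of the M/N example above, with stand-in implementations for Q, R, S and T. With the synchronous versions, R always runs before S; with the async version, the ordering depends on when Q's task completes.

using System;
using System.Threading.Tasks;

static class Demo
{
    static async Task Q() { await Task.Delay(10); }            // stands in for a high-latency call
    static void R() { Console.WriteLine("R's side effect"); }
    static void S() { Console.WriteLine("S's side effect"); }
    static void T() { }

    static async Task M()          // async Task rather than async void so the caller can observe it
    {
        await Q();                 // if Q's task is not yet complete, M returns to its caller here...
        R();                       // ...and R runs later, as a continuation
    }

    static void N() { S(); T(); }

    static async Task Run()
    {
        var m = M();               // starts M, which suspends at the await
        N();                       // S() may now run before or after R()
        await m;
    }
}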
A third point against is that you then have to ask "why is async so special?" If you're going to argue that every operation should actually be a Task<T> then you need to be able to answer the question "why not Lazy<T>?" or "why not Nullable<T>?" or "why not IEnumerable<T>?" Because we could just as easily do that. Why shouldn't it be the case that every operation is lifted to nullable? Or every operation is lazily computed and the result is cached for later, or the result of every operation is a sequence of values instead of just a single value. You then have to try to optimize those situations where you know "oh, this must never be null, so I can generate better code", and so on. (And in fact the C# compiler does do so for lifted arithmetic.)
Point being: it's not clear to me that Task<T> is actually that special to warrant this much work.
If these sorts of things interest you then I recommend you investigate functional languages like Haskell, that have much stronger referential transparency and permit all kinds of out-of-order evaluation and do automatic caching. Haskell also has much stronger support in its type system for the sorts of "monadic liftings" that I've alluded to.
Why shouldn't all functions be async?
Performance is one reason, as you mentioned. Note that the "fast path" option you linked to does improve performance in the case of a completed Task, but it still requires a lot more instructions and overhead compared to a single method call. As such, even with the "fast path" in place, you're adding a lot of complexity and overhead with each async method call.
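As a reminder of what the "fast path" buys you: it only avoids the suspension when the awaited task is already complete, for example when the result comes from a cache. A hedged sketch (all names here are made up):

using System.Collections.Generic;
using System.Threading.Tasks;

public class TitleLookup
{
    private readonly Dictionary<int, string> _cache = new Dictionary<int, string>();

    // When the title is cached we return an already-completed task; awaiting it
    // takes the "fast path" and never suspends. On a cache miss the awaiter
    // really does have to yield, and the full async machinery kicks in.
    public Task<string> GetTitleAsync(int id)
    {
        string title;
        if (_cache.TryGetValue(id, out title))
            return Task.FromResult(title);

        return LoadTitleFromDatabaseAsync(id);
    }

    private async Task<string> LoadTitleFromDatabaseAsync(int id)
    {
        await Task.Delay(50);                    // stands in for a database round-trip
        return "title " + id;
    }
}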
Backwards compatibility, as well as compatibility with other languages (including interop scenarios), would also become problematic.
The other is a matter of complexity and intent. Asynchronous operations add complexity - in many cases, the language features hide this, but there are many cases where making methods async definitely adds complexity to their usage. This is especially true if you don't have a synchronization context, as the async methods then can easily end up causing threading issues that are unexpected.
In addition, there are many routines which aren't, by their nature, asynchronous. Those make more sense as synchronous operations. Forcing Math.Sqrt to be Task<double> Math.SqrtAsync would be ridiculous, for example, as there is no reason at all for that to be asynchronous. Instead of having async push through your application, you'd end up with await propagating everywhere.
This would also break the current paradigm completely, as well as cause issues with properties (which are effectively just method pairs.. would they go async too?), and have other repercussions throughout the design of the framework and language.
If you're doing a lot of IO-bound work, you'll tend to find that using async pervasively is a great addition, and many of your routines will be async. However, when you start doing CPU-bound work, making things async is in general actually not good: it hides the fact that you're using CPU cycles under an API that appears to be asynchronous but isn't truly asynchronous.
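The "appears asynchronous but is really CPU-bound" case typically looks like the following sketch (an anti-pattern shown for illustration; the method and its work are made up):

using System;
using System.Threading.Tasks;

public static class Scoring
{
    // Looks asynchronous, but all it does is move the same CPU-bound loop onto a
    // thread-pool thread; no latency is being overlapped, the cost is just hidden.
    public static Task<double> ComputeScoreAsync(int[] samples)
    {
        return Task.Run(() =>
        {
            double total = 0;
            foreach (var s in samples)
                total += Math.Sqrt(s);
            return total;
        });
    }
}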
Performance aside - async can have a productivity cost. On the client (WinForms, WPF, Windows Phone) it is a boon for productivity. But on the server, or in other non-UI scenarios, you pay productivity. You certainly don't want to go async by default there. Use it when you need the scalability advantages.
Use it when at the sweet spot. In other cases, don't.
I believe there is a good reason to make all methods async even if they don't need to be today: extensibility. Selectively making methods async only works if your code never evolves and you know that method A() is always CPU-bound (you keep it sync) and method B() is always I/O-bound (you mark it async).
But what if things change? Yes, A() is doing calculations, but at some point in the future you might have to add logging there, or reporting, or a user-defined callback whose implementation you cannot predict, or the algorithm might be extended to include not just CPU computations but also some I/O. You'll need to convert the method to async, but that breaks the API, and all the callers up the stack need to be updated as well (and they can even be different apps from different vendors). Or you'll need to add an async version alongside the sync version, but that doesn't make much difference: using the sync version would block and is thus hardly acceptable.
It would be great if it were possible to make an existing sync method async without changing the API. But in reality we don't have that option, I believe, and using the async version even if it's not currently needed is the only way to guarantee you'll never hit compatibility issues in the future.
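A sketch of the breaking change described above, with made-up types: once the method grows an awaited I/O call, its return type changes from int to Task<int>, and every caller has to follow.

using System.Threading.Tasks;

public class PriceCalculator
{
    // Before: purely CPU-bound, so it stays synchronous.
    public int ComputeTotal(int[] linePrices)
    {
        var total = 0;
        foreach (var p in linePrices) total += p;
        return total;
    }

    // Later a requirement to persist an audit record (I/O) appears. The method now
    // has to return Task<int>, and every caller up the stack changes with it.
    public async Task<int> ComputeTotalAsync(int[] linePrices, IAuditLog auditLog)
    {
        var total = ComputeTotal(linePrices);
        await auditLog.WriteAsync(total);   // the new I/O call that forced the signature change
        return total;
    }
}

public interface IAuditLog
{
    Task WriteAsync(int total);   // hypothetical audit sink
}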

Continuations: can I serialize the continuation in an F# async workflow or C# async function?

I want a serializable continuation so I can pickle async workflows to disk while waiting for new events. When the async workflow is waiting on a let!, it would be saved away along with a record of what was needed to wake it up. Instead of arbitrary in-memory IAsyncResults (or Task<T>, etc.), it would have to be, for instance, a filter criterion for incoming messages along with the continuation itself. Without language support for continuations, this might be a feat. But with computation expressions taking care of the explicit CPS transformation, it might not be too tricky and could even be more efficient. Has anyone tackled an approach like this?
You could probably use the MailboxProcessor, or Agent, type as a means of getting close to what you want. You could then use agent.PostAndAsyncReply with a timeout to retrieve the current AgentState. As mentioned above, you'll need to make the objects you are passing around serializable, but even delegates are serializable. The internals are really unrelated to async computations, though. The async computation would merely allow you a way to interact with the various agents in your program in a non-blocking fashion.
Dave Thomas and I have been working on a library called fracture-io that will provide some out-of-the-box scenarios for working with agents. We hadn't yet discussed this exact scenario, but we could probably look at baking this in ... or take a commit. :)
I also noticed that you tagged your question with callcc. I posted a sample of that operator to fssnip, but Tomas Petricek quickly posted an example of how easy it is to break with async computations. So I don't think callcc is a useful solution for this question. If you don't need async, you can look in FSharpx for the Continuation module and the callcc operator in there.
Have you looked at Windows Workflow Foundation?
http://msdn.microsoft.com/en-us/netframework/aa663328.aspx
That's probably the technology you want, assuming the events/messages are arriving in periods of hours/days/weeks and you're serializing to disk to avoid using memory/threads in the meantime. (Or else why do you want it?)

Is this a good time to use multithreading in ASP.NET MVC and how is it implemented?

I want a certain action request to trigger a set of e-mail notifications. The user does something, and it sends the emails. However, I do not want the user to wait for the page response while the system generates and sends the e-mails. Should I use multithreading for this? Will this even work in ASP.NET MVC? I want the user to get a page response back and the system to just finish sending the e-mails at its own pace. I'm not even sure if this is possible or what the code would look like. (PS: Please don't offer me an alternative solution for sending e-mails; I don't have time for that kind of reconfiguration.)
SmtpClient.SendAsync is probably a better bet than manual threading, though multi-threading will work fine with the usual caveats.
http://msdn.microsoft.com/en-us/library/x5x13z6h.aspx
As other people have pointed out, success/failure cannot be indicated deterministically when the page returns before the send is actually complete.
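A minimal sketch of that approach (SMTP settings assumed to come from web.config; the addresses and the Notifier wrapper are made up). Note that failures only surface in the SendCompleted callback, which is exactly the caveat above:

using System.Diagnostics;
using System.Net.Mail;

public static class Notifier
{
    public static void QueueNotification(string to, string subject, string body)
    {
        var client = new SmtpClient();                  // host/credentials taken from web.config
        var message = new MailMessage("noreply@example.com", to, subject, body);

        client.SendCompleted += (sender, e) =>
        {
            // The page has long since returned, so log the outcome somewhere durable.
            if (e.Error != null)
                Trace.TraceError("Email send failed: {0}", e.Error);
            message.Dispose();
            client.Dispose();
        };

        client.SendAsync(message, null);                // returns immediately; the send continues in the background
    }
}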
A couple of observations when using asynchronous operations:
1) They will come back to bite you in some way or another. It's a risk versus benefit discussion. I like the SendAsync() method I proposed because it means forms can return instantly even if the email server takes a few seconds to respond. However, because it doesn't throw an exception, you can have a broken form and not even know it.
Of course unit testing should address this initially, but what if the production configuration file gets changed to point to a broken mail server? You won't know it, you won't see it in your logs, you only discover it when someone asks you why you never responded to the form they filled out. I speak from experience on this one. There are ways around this, but in practicality, async is always more work to test, debug, and maintain.
2) Threading in ASP.Net works in some situations if you understand the ThreadPool, app domain refreshes, locking, etc. I find that it is most useful for executing several operations at once to increase performance where the end result is deterministic, i.e. the application waits for all threads to complete. This way, you gain the performance benefits while still having a clear indication of results.
3) Threading/Async operations do not increase performance, only perceived performance. There may be some edge cases where that is not true (such as processor optimizations), but it's a good rule of thumb. Improperly used, threading can hurt performance or introduce instability.
The better scenario is out of process execution. For enterprise applications, I often move things out of the ASP.Net thread pool and into an execution service.
See this SO thread: Designing an asynchronous task library for ASP.NET
I know you are not looking for alternatives, but using a MessageQueue (such as MSMQ) could be a good solution for this problem in the future. Using multithreading in ASP.NET is normally discouraged, but in your current situation I don't see why you shouldn't. It is definitely possible, but beware of the pitfalls related to multithreading (stolen from here):
• There is a runtime overhead associated with creating and destroying threads. When your application creates and destroys threads frequently, this overhead affects the overall application performance.
• Having too many threads running at the same time decreases the performance of your entire system. This is because your system is attempting to give each thread a time slot to operate inside.
• You should design your application well when you are going to use multithreading, or otherwise your application will be difficult to maintain and extend.
• You should be careful when you implement a multithreading application, because threading bugs are difficult to debug and resolve.
At the risk of violating your no-alternative-solution prime directive, I suggest that you write the email requests to a SQL Server table and use SQL Server's Database Mail feature. You could also write a Windows service that monitors the table and sends emails, logging successes and failures in another table that you view through a separate ASP.Net page.
You probably can use ThreadPool.QueueUserWorkItem
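For completeness, a sketch of that approach; SendNotificationEmails and orderId are placeholders, and the catch block matters because an unhandled exception on a pool thread would take down the worker process:

using System;
using System.Diagnostics;
using System.Threading;

// Queue the slow work and return the page immediately.
ThreadPool.QueueUserWorkItem(_ =>
{
    try
    {
        SendNotificationEmails(orderId);   // placeholder for the code that builds and sends the e-mails
    }
    catch (Exception ex)
    {
        // Swallow and log: an unhandled exception here would otherwise tear down the process.
        Trace.TraceError("Background email send failed: {0}", ex);
    }
});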
Yes this is an appropriate time to use multi-threading.
One thing to look out for, though, is how you will tell the user when the email sending ultimately fails. Not blocking the user is a good step toward improving your UI, but it still needs to avoid giving a false sense of success when the operation ultimately fails later on.
I don't know if any of the above links mentioned it, but don't forget to keep an eye on request timeout values; the queued items will still need to complete within that time period.

Scaling up Multiple HttpWebRequests?

I'm building a server application that needs to perform a lot of http requests to a couple other servers on an ongoing basis. Currently, I'm basically setting up about 30 threads and continuously running HttpWebRequests synchronously on each thread, achieving a throughput of about 30 requests per second.
I am indeed setting the ServicePoint ConnectionLimit in the app.config so that's not the limiting factor.
I need to scale this up drastically. At the very least I'll need some more CPU horse power, but I'm wondering if I would gain any advantages by using the async methods of the HttpWebRequest object (eg: .BeginGetResponse() ) as opposed to creating threads myself and using the synchronous methods (eg: .GetResponse() ) on these threads.
If I go with the async methods, I obviously have to significantly redesign my app, so I'm wondering if anyone might have some insight before I go and recode everything, in case I'm out to lunch.
Thanks!
If you are on Windows NT, the System.Net.Sockets.Socket class always uses I/O completion ports for async operations. And HttpWebRequest in async mode uses async sockets, and hence will be using IOCP.
Without doing detailed benchmarking, it is difficult to say whether your bottleneck is inside HttpWebRequest, higher up the stack in your application, or on the remote side, in the server. But offhand, async will almost certainly give you better performance, because it ends up using IOCP under the covers. And reimplementing the app for async is not that difficult.
So, I would suggest that you first change your app architecture to async. Then see how much max throughput you are getting. Then you can start benchmarking and finding out where the bottleneck is, and removing that.
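To give an idea of the redesign involved, the Begin/End shape for a single request looks roughly like this (a sketch with error handling trimmed; ProcessBody stands in for your existing handling code):

using System;
using System.IO;
using System.Net;

static class Crawler
{
    public static void StartRequest(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        request.BeginGetResponse(OnResponse, request);     // no thread is parked while the response is pending
    }

    static void OnResponse(IAsyncResult ar)
    {
        var request = (HttpWebRequest)ar.AsyncState;
        using (var response = (HttpWebResponse)request.EndGetResponse(ar))
        using (var reader = new StreamReader(response.GetResponseStream()))
        {
            ProcessBody(reader.ReadToEnd());               // placeholder for your existing handling code
        }
    }

    static void ProcessBody(string body) { /* existing per-response logic */ }
}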
The fastest result so far for me is using 75 threads running synchronous HttpWebRequest.
About 140 requests per second on a Windows 2003 server (4-core 3 GHz, 100 Mbit connection).
Async HttpWebRequest / WinSock got stuck at about 30-50 req/sec. I did not test synchronous WinSock, but I guess it would give about the same result as HttpWebRequest.
The tests were against 1,200,000 blog feeds.
I've been struggling with this for the last month, so it would be interesting to know if someone has managed to squeeze more out of .NET.
EDIT
New test: got 350 req/sec with the xfserver IOCP component. I used a bunch of threads with one instance each before seeing any greater result. The "client part" of the lib had a couple of really annoying bugs that made implementation harder than the "server part". Not what you're asking for, and not recommended, but it's a step of some kind.
Next: the former WinSock test did not use the 3.5 SocketAsyncEventArgs; that will be next.
ANSWER
The answer to your question: no, it will not be worth the effort.
The async HttpWebRequest methods offload the main thread while keeping the download in the background; they do not improve the number or scalability of requests (at least not in 3.5; it might be different in 4.0).
However, what might be worth looking at is building your own wrapper around async sockets/SocketAsyncEventArgs, where IOCP works, and perhaps implementing a begin/end pattern similar to HttpWebRequest's (for the easiest possible integration into the current code). The improvement is really enormous.
