The async-await pattern of .NET 4.5 is paradigm-changing. It's almost too good to be true.
I've been porting some IO-heavy code to async-await because blocking is a thing of the past.
Quite a few people are comparing async-await to a zombie infestation, and I find the comparison rather accurate. Async code likes other async code (you need an async function in order to await an async function), so more and more functions become async, and the infestation keeps growing through your codebase.
Changing functions to async is somewhat repetitive and unimaginative work. Throw an async keyword into the declaration, wrap the return type in Task<>, and you're pretty much done. It's rather unsettling how easy the whole process is, and pretty soon a text-replacing script will automate most of the "porting" for me.
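To illustrate, a minimal before-and-after sketch of that mechanical conversion (the method name and stream parameter are made up for the example; assumes the usual System.IO and System.Threading.Tasks usings):

// Before: synchronous, blocks the calling thread during I/O.
public string ReadGreeting(Stream stream)
{
    using (var reader = new StreamReader(stream))
    {
        return reader.ReadToEnd();
    }
}

// After: add async, wrap the return type in Task<>, and await
// the asynchronous counterpart of the blocking call.
public async Task<string> ReadGreetingAsync(Stream stream)
{
    using (var reader = new StreamReader(stream))
    {
        return await reader.ReadToEndAsync();
    }
}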
And now the question: if all my code is slowly turning async, why not just make it all async by default?
The obvious reason, I assume, is performance. Async-await has its overhead, and code that doesn't need to be async preferably shouldn't be. But if performance is the sole problem, surely some clever optimizations could remove the overhead automatically when it's not needed. I've read about the "fast path" optimization, and it seems to me that it alone should take care of most of it.
Maybe this is comparable to the paradigm shift brought on by garbage collectors. In the early GC days, freeing your own memory was definitely more efficient. But the masses still chose automatic collection for the sake of safer, simpler code, even if it might be less efficient (and even that arguably isn't true anymore). Maybe this should be the case here? Why shouldn't all functions be async?
First off, thank you for your kind words. It is indeed an awesome feature and I am glad to have been a small part of it.
If all my code is slowly turning async, why not just make it all async by default?
Well, you're exaggerating; all your code isn't turning async. When you add two "plain" integers together, you're not awaiting the result. When you add two future integers together to get a third future integer -- because that's what Task<int> is, it's an integer that you're going to get access to in the future -- of course you'll likely be awaiting the result.
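To make the contrast concrete, a small sketch (GetXAsync and GetYAsync are hypothetical Task<int>-returning methods, not from the question):

int sum = 2 + 2;  // plain integers: nothing to await

// Future integers: each Task<int> is an int you'll have access to later,
// so you await to get at the values before adding them.
async Task<int> AddFutureIntegersAsync()
{
    return await GetXAsync() + await GetYAsync();
}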
The primary reason to not make everything async is because the purpose of async/await is to make it easier to write code in a world with many high latency operations. The vast majority of your operations are not high latency, so it doesn't make any sense to take the performance hit that mitigates that latency. Rather, a key few of your operations are high latency, and those operations are causing the zombie infestation of async throughout the code.
if performance is the sole problem, surely some clever optimizations can remove the overhead automatically when it's not needed.
In theory, theory and practice are similar. In practice, they never are.
Let me give you three points against this sort of transformation followed by an optimization pass.
The first point is: async in C#/VB/F# is essentially a limited form of continuation passing. An enormous amount of research in the functional language community has gone into figuring out ways to identify how to optimize code that makes heavy use of continuation passing style. The compiler team would likely have to solve very similar problems in a world where "async" was the default and the non-async methods had to be identified and de-async-ified. The C# team is not really interested in taking on open research problems, so that's big points against right there.
A second point against is that C# does not have the level of "referential transparency" that makes these sorts of optimizations more tractable. By "referential transparency" I mean the property that the value of an expression does not depend on when it is evaluated. Expressions like 2 + 2 are referentially transparent; you can do the evaluation at compile time if you want, or defer it until runtime and get the same answer. But an expression like x+y can't be moved around in time because x and y might be changing over time.
Async makes it much harder to reason about when a side effect will happen. Before async, if you said:
M();
N();
and M() was void M() { Q(); R(); }, and N() was void N() { S(); T(); }, and R and S produce side effects, then you know that R's side effect happens before S's side effect. But if you have async Task M() { await Q(); R(); } then suddenly that guarantee goes out the window. You have no guarantee whether R() is going to happen before or after S() (unless, of course, M()'s task is awaited; but its task need not be awaited until after N()).
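Restated as code (Q2 is assumed to return a Task so it can be awaited; the M2/Q2 names are just to avoid clashing with the synchronous versions above):

void M() { Q(); R(); }   // synchronous: R() runs to completion before M() returns
void N() { S(); T(); }

// M(); N();  -- R's side effect is guaranteed to precede S's.

async Task M2() { await Q2(); R(); }

// var t = M2(); N(); await t;  -- M2() returns to its caller at the await,
// so N() starts right away; R() now runs whenever Q2's task completes,
// which may be before or after S().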
Now imagine that this property of no longer knowing what order side effects happen in applies to every piece of code in your program except those that the optimizer manages to de-async-ify. Basically you have no clue anymore which expressions will be evaluated in what order, which means that all expressions need to be referentially transparent, which is hard in a language like C#.
A third point against is that you then have to ask "why is async so special?" If you're going to argue that every operation should actually be a Task<T> then you need to be able to answer the question "why not Lazy<T>?" or "why not Nullable<T>?" or "why not IEnumerable<T>?" Because we could just as easily do that. Why shouldn't it be the case that every operation is lifted to nullable? Or every operation is lazily computed and the result is cached for later, or the result of every operation is a sequence of values instead of just a single value. You then have to try to optimize those situations where you know "oh, this must never be null, so I can generate better code", and so on. (And in fact the C# compiler does do so for lifted arithmetic.)
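For instance, here is the lifted arithmetic the C# compiler already performs for Nullable<T>, as a minimal sketch:

int? a = 2;
int? b = null;
int? sum = a + b;                  // lifted +: null if either operand is null
Console.WriteLine(sum.HasValue);   // False

int? c = 3;
Console.WriteLine(a + c);          // 5 -- both operands non-null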
Point being: it's not clear to me that Task<T> is actually that special to warrant this much work.
If these sorts of things interest you then I recommend you investigate functional languages like Haskell, which have much stronger referential transparency and permit all kinds of out-of-order evaluation and automatic caching. Haskell also has much stronger support in its type system for the sorts of "monadic liftings" that I've alluded to.
Why shouldn't all functions be async?
Performance is one reason, as you mentioned. Note that the "fast path" optimization you linked to does improve performance in the case of a completed Task, but it still requires a lot more instructions and overhead compared to a single method call. As such, even with the "fast path" in place, you're adding a lot of complexity and overhead with each async method call.
Backwards compatibility, as well as compatibility with other languages (including interop scenarios), would also become problematic.
Another is a matter of complexity and intent. Asynchronous operations add complexity - in many cases, the language features hide this, but there are many cases where making methods async definitely adds complexity to their usage. This is especially true if you don't have a synchronization context, as the async methods then can easily end up causing unexpected threading issues.
In addition, there are many routines which aren't, by their nature, asynchronous. Those make more sense as synchronous operations. Forcing Math.Sqrt to be Task<double> Math.SqrtAsync would be ridiculous, for example, as there is no reason at all for that operation to be asynchronous. Instead of having async push through your application, you'd end up with await propagating everywhere.
This would also break the current paradigm completely, as well as cause issues with properties (which are effectively just method pairs; would they go async too?), and have other repercussions throughout the design of the framework and language.
If you're doing a lot of IO-bound work, you'll tend to find that using async pervasively is a great addition, and many of your routines will be async. However, when you start doing CPU-bound work, making things async is generally not good - it's hiding the fact that you're using CPU cycles under an API that appears to be asynchronous but isn't necessarily truly asynchronous.
Performance aside - async can have a productivity cost. On the client (WinForms, WPF, Windows Phone) it is a boon for productivity. But on the server, or in other non-UI scenarios, you pay a productivity cost. You certainly don't want to go async by default there. Use it when you need the scalability advantages.
Use it when you're at that sweet spot. In other cases, don't.
I believe there is a good reason to make all methods async even if they don't need to be - extensibility. Selectively making methods async only works if your code never evolves, and you know that method A() is always CPU-bound (so you keep it sync) and method B() is always I/O-bound (so you mark it async).
But what if things change? Yes, A() is doing calculations, but at some point in the future you have to add logging there, or reporting, or a user-defined callback whose implementation you cannot predict, or the algorithm gets extended and now includes not just CPU computations but also some I/O. You'll need to convert the method to async, but this would break the API, and all the callers up the stack would need to be updated as well (and they could even be different apps from different vendors). Or you'll need to add an async version alongside the sync version, but this doesn't make much difference - using the sync version would block and is thus hardly acceptable.
It would be great if it were possible to make an existing sync method async without changing the API. But in reality we don't have that option, I believe, and using an async version even when it's not currently needed is the only way to guarantee you'll never hit compatibility issues in the future.
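As a rough sketch of what that future-proofing looks like (the names here are hypothetical): expose the Task-returning signature now, even while the body is synchronous, so callers never have to change when I/O creeps in later.

// Today: pure CPU work, but the contract is already async-shaped.
public Task<int> ComputeAsync()
{
    return Task.FromResult(ComputeCore());  // already-completed task, cheap
}

private int ComputeCore()
{
    return 42;  // placeholder for the current synchronous logic
}

// Tomorrow: the body can start doing real I/O without breaking any caller.
// public async Task<int> ComputeAsync()
// {
//     await LogToServerAsync();
//     return ComputeCore();
// }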
I'm getting acquainted with the whole async/await functionality in C# right now.
I think I know what it is good for. But I have encountered places where I do not want every method that calls a library function of mine to have to become "async"-aware.
Consider this (rough pseudo-code, not really representing the real thing, it's just about the context):
private string jokeOfTheHour;

public string GiveJokeOfTheHour()
{
    if (HourIsOver)
    {
        // Block synchronously on the async-only third-party call.
        jokeOfTheHour = thirdPartyLibrary.GetNewJoke().GetAwaiter().GetResult();
    }
    return jokeOfTheHour;
}
I have a web-back-end library function which is called up to a million times per hour (or even more).
Exactly once among these million calls per hour, the logic within uses a third-party library which only supports async calls for the methods I want to use.
I don't want the user of my library to even think that it would make any sense for them to run any code asynchronously when calling my library function, because it would only add unnecessary overhead to their code and its runtime the vast majority of the time.
The reasons I would state here are:
Separation of concerns. I know how I work; my user does not need to.
Context is everything. As a developer, having background knowledge is what lets me know which cases I need to consider when writing code and which I don't. That enables me to omit writing hundreds of lines of code for stuff that should never happen.
Now, I want to know what general rules there are for doing this. But sadly, browsing the web I can't find simple statements or rules where anybody says "In this, this, and this situation, you can stop this async keyword bubbling up your method call tree". I've just seen people (some of them Microsoft MVPs) saying that there absolutely are situations where this should be done, and stating that you should use .GetAwaiter().GetResult() as a best practice then, but they are never specific about the situations themselves.
What I am looking for is a down-to-the-ground general rule in which I can say:
Even though I might call third-party functions which are async, I do not execute asynchronously and do not want to appear as if I do. I'm a bottom-level function using caches 99.99999% of the time. I don't need my user to implement the async methodology all the way up to the point where they actually need to decide where the async execution stops (which would make the user who should benefit from my library write more code and spend more execution time).
I would really be thankful for your help :)
You seem to want your method to introduce itself with: "I'm fast". The truth is that from time to time it can actually be (very) slow. This potentially has serious consequences.
The statement
"I'm a bottom level function using caches 99.99999% of the time"
is not correct if you call your method once an hour.
It is better for consumers of your method to see "I can be slow, but if you call me often, I cache the result, so I will return fast" (which would be GiveJokeOfTheHourAsync() with a comment.)
If you want your method to always be fast, I would suggest one of these options:
Have an UpdateJokeAsync method that you call without awaiting it in your if(HourIsOver). This would mean returning a stale result until a new one has been fetched.
Update your joke using a timer (a sketch of this option follows below).
Make 'get' always return the last known joke and have UpdateJokeAsync update it.
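A minimal sketch of the timer option, with assumed names (JokeCache, IThirdPartyLibrary, and a Task<string>-returning GetNewJoke are inventions, not from the question); the background refresh keeps the public getter synchronous and always fast (uses System.Threading.Timer):

public class JokeCache
{
    private volatile string jokeOfTheHour = "No joke yet.";
    private readonly Timer timer;

    public JokeCache(IThirdPartyLibrary thirdPartyLibrary)
    {
        // Fire immediately, then refresh once per hour (error handling omitted).
        timer = new Timer(async _ =>
        {
            jokeOfTheHour = await thirdPartyLibrary.GetNewJoke();
        }, null, TimeSpan.Zero, TimeSpan.FromHours(1));
    }

    public string GiveJokeOfTheHour()
    {
        return jokeOfTheHour;  // never blocks; may be up to an hour stale
    }
}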
As I read the MSDN article Using Asynchronous Methods in ASP.NET MVC 4, I came to the conclusion that I should always use async/await for I/O-bound operations.
Consider the following code, where movieManager exposes the async methods of an ORM like Entity Framework.
public class MovieController : Controller
{
    // fields and constructors

    public async Task<ActionResult> Index()
    {
        var movies = await movieManager.ListAsync();
        return View(movies);
    }

    public async Task<ActionResult> Details(int id)
    {
        var movie = await movieManager.FindAsync(id);
        return View(movie);
    }
}
Will this always give me better scalability and/or performance?
How can I measure this?
Why isn't this used in the "real world"?
How about context synchronization?
Is it that bad, that I shouldn't use async I/O in ASP.NET MVC?
I know these are a lot of questions, but literature on this topic has conflicting conclusions. Some say you should always use async for I/O dependent Tasks, others say you shouldn't use async in ASP.NET applications at all.
Will this always give me better scalability and/or performance?
It may. If you only have a single database server as your backend, then your database could be your scalability bottleneck, and in that case scaling your web server won't have any effect in the wider scope of your service as a whole.
How can I measure this?
With load testing. If you want a simple proof-of-concept, you can check out this gist of mine.
Why isn't this used in the "real world" a lot?
It is. Asynchronous request handlers before .NET 4.5 were quite painful to write, and a lot of companies just threw more hardware at the problem instead. Now that .NET 4.5 and async/await are gaining a lot of momentum, asynchronous request handling will only become more common.
How about context synchronization?
It's handled for you by ASP.NET. I have an async intro on my blog that explains how await will capture the current SynchronizationContext when you await a task. In this case it's an AspNetSynchronizationContext that represents the request, so things like HttpContext.Current, culture, etc. all get preserved across await points automatically.
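A small sketch of what that preservation looks like in practice, reusing the question's controller (the Debug.Assert is purely illustrative; assumes System.Diagnostics and System.Threading usings):

public async Task<ActionResult> Details(int id)
{
    var cultureBefore = Thread.CurrentThread.CurrentCulture;

    var movie = await movieManager.FindAsync(id);

    // The continuation resumes on the AspNetSynchronizationContext, possibly
    // on a different thread pool thread, but HttpContext.Current, identity,
    // and culture are restored for us.
    Debug.Assert(Equals(cultureBefore, Thread.CurrentThread.CurrentCulture));
    return View(movie);
}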
Is it that bad, that I shouldn't use async I/O in ASP.NET MVC?
As a general rule, if you're on .NET 4.5, you should use async to handle any request that requires I/O. If the request is simple (i.e., does not hit a database or call another service), then just keep it synchronous.
Will this always give me better scalability and/or performance?
You answered it yourself: you need to measure and find out. Typically async is something to add later on, because it adds complexity, and complexity is the #1 concern in your code base until you have a specific, identified problem.
How can I measure this?
Build it both ways and see which is faster (preferably across a large number of operations).
Why isn't this used in the "real world" a lot?
Because complexity is the biggest problem in software development. If code is complex, it is more error-prone and harder to debug. More bugs that are harder to fix is not a good trade-off for potential performance advantages.
How about context synchronization?
I am assuming you mean the ASP.NET context. If so, you should not need any synchronization: make sure only one thread is hitting your context, and communicate through it.
Is it that bad, that I shouldn't use async I/O in ASP.NET MVC?
Introducing async just to then have to deal with synchronization is a loss unless you really need the performance.
Putting asynchronous code in a website has a lot of negative sides:
You'll get into trouble when there are dependencies between the pieces of data, as you cannot make that asynchronous.
Asynchronous work is often done for things like API requests. Have you considered that you shouldn't be doing these in a webpage? If the external service goes down, so goes your site. That doesn't scale.
Doing things asynchronously may speed up your site in some cases but you're basically introducing trouble. You always end up waiting for the slowest one, and since sometimes resources just slow down for whatever reason this means that the risk of something slowing down your site increases by a factor equal to the number of asynchronous jobs you're using. You'll have to introduce timeouts to deal with these, then error handling code, etc.
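For what it's worth, the timeout-plus-error-handling plumbing being described tends to look something like this (a hedged sketch; FetchFromApiAsync, Render, and RenderFallback are hypothetical):

var fetch = FetchFromApiAsync();
var winner = await Task.WhenAny(fetch, Task.Delay(TimeSpan.FromSeconds(2)));
if (winner == fetch)
{
    Render(await fetch);   // the external service answered in time
}
else
{
    RenderFallback();      // timed out: degrade gracefully
    // Note: the underlying request is still in flight and still needs
    // cancellation or at least an exception observer.
}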
When scaling to multiple webservers because the CPU load is getting too heavy, the asynchronous work will hurt you. Everything you used to put in asynchronous code now fires simultaneously the moment the user clicks a link, and then eases down. This doesn't only apply to CPU load, but also database load and even API requests. You will see a very awful utilization pattern across all system resources: spikes of heavy usage, and then it goes down again. That doesn't scale well. Synchronous code doesn't have this problem: jobs only start after another one is done.
Asynchronous work for websites is a trap: don't go there!
Put your heavy code in a worker (or cron job) that does these things before the user asks for them. You'll have them in a database and you can keep adding features to your site without having to worry about firing too many asynchronous jobs and what not.
Performance for websites is seriously overrated. Sure, it's nice if your page renders in 50ms, but if it takes 250ms people really won't notice (to test this: put a Sleep(200) in your code).
Your code becomes a lot more scalable if you just offload the work to another process and make the website an interface to only your database. Don't make your webserver do heavy work that it shouldn't do, it doesn't scale. You can have a hundred machines spending a total of 1 CPU hour per webpage - but at least it scales in a way where the page still loads in 200ms. Good luck achieving that with asynchronous code.
I would like to add a side-note here. While my opinion on asynchronous code might seem strong, it's mostly an opinion about programmers. Asynchronous code is awesome and can make a performance difference that proves all of the points I outlined wrong. However, it needs a lot of finetuning in your code to avoid the points I mention in this post, and most programmers just can't handle that.
I have two questions, stemming from observed behavior of C# static methods (which I may be misinterpreting):
First:
Would a recursive static method be tail call optimized in a sense by the way the static method is implemented under the covers?
Second:
Would it be equivalent to functional programming to write an entire application with static methods and no variables beyond local scope? I am wondering because I still haven't wrapped my head around this "no side effects" term I keep hearing in connection with functional programming.
Edit:
Let me mention, I do use and understand why and when to use static methods in the normal C# OO methodology, and I do understand that tail call optimization will not be explicitly done to a recursive static method. That said, I understand tail call optimization to be an attempt at stopping the creation of a new stack frame with each pass, and at a couple of points I observed what appeared to be a static method executing within the frame of its calling method, though I may have misinterpreted my observation.
Would a recursive static method be tail call optimized in a sense by the way the static method is implemented under the covers?
Static methods have nothing to do with tail recursion optimization. All the rules apply equally to instance and static methods, but personally I would never rely on the JIT optimizing away my tail calls. Moreover, the C# compiler doesn't emit the tail call instruction, yet the optimization is sometimes performed anyway. In short, you never know.
The F# compiler supports tail recursion optimization and, when possible, compiles recursion to loops.
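To illustrate what "compiles recursion to loops" means, here is the transformation sketched by hand in C# (the F# compiler performs the equivalent rewrite automatically for tail-recursive functions):

// Tail-recursive form: the recursive call is the very last operation,
// so no work remains in the current frame after it.
static long SumTo(long n, long acc)
{
    if (n == 0) return acc;
    return SumTo(n - 1, acc + n);
}

// What a tail-call-optimizing compiler effectively emits: a loop that
// reuses a single stack frame instead of growing the stack.
static long SumToLoop(long n, long acc)
{
    while (n != 0)
    {
        acc += n;
        n--;
    }
    return acc;
}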
See more details on C# vs F# behavior in this question.
Would it be equivalent to functional programming to write an entire application with static methods and no variables beyond local scope?
It's both no and yes.
Technically, nothing prevents you from calling Console.WriteLine from a static method (which is a static method itself!) which obviously has side-effects. Nothing also prevents you from writing a class (with instance methods) that does not change any state (i.e. instance methods don't access instance fields). However from the design point of view, such methods don't really make sense as instance methods, right?
If you Add an item to a .NET Framework List<T>, you will modify its state - a side effect.
If you append an item to an F# list, you will get another list, and the original will not be modified.
Note that append is indeed a static method on the List module. Writing "transformation" methods in separate modules encourages side-effect-free design, as no internal storage is available by definition, even if the language allows it (F# does, LISP doesn't). However, nothing really prevents you from writing a side-effect-free non-static method.
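The same contrast can be shown entirely in .NET; note that ImmutableList comes from the System.Collections.Immutable package, which is an addition beyond the original F# example:

var mutableList = new List<int> { 1, 2 };
mutableList.Add(3);                        // side effect: the list itself changes

var original = ImmutableList.Create(1, 2);
var appended = original.Add(3);            // no side effect: returns a new list
Console.WriteLine(original.Count);         // 2 -- the original is untouched
Console.WriteLine(appended.Count);         // 3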
Finally, if you want to grok functional language concepts, use one! It's so much more natural to write F# modules that operate immutable F# data structures than imitate the same in C# with or without static methods.
The CLR does do some tail call optimisations but only in 64-bit CLR processes. See the following for where it is done: David Broman's CLR Profiling API Blog: Tail call JIT conditions.
As for building software with just static methods and no variables beyond local scope, I've done this a lot and it's actually fine. It's just another way of doing things that is as valid as OO is. In fact, because there is no state outside the function/closure, it's safer and easier to test.
I read the entire SICP book from cover to cover first however: http://mitpress.mit.edu/sicp/
No side effects simply means that the function can be called with the same arguments as many times as you like and always return the same value. That simply defines that the result of the function is always consistent therefore does not depend on any external state. Due to this, it's trivial to parallelize the function, cache it, test it, modify it, decorate it etc.
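A small sketch of the caching point: because a pure function always returns the same value for the same argument, memoizing it can never change the program's behavior (assumes a System.Collections.Generic using).

// Pure: depends only on its argument, touches no external state.
static int Square(int x)
{
    return x * x;
}

static readonly Dictionary<int, int> cache = new Dictionary<int, int>();

// Safe only because Square is pure.
static int SquareMemoized(int x)
{
    int result;
    if (!cache.TryGetValue(x, out result))
    {
        result = Square(x);
        cache[x] = result;
    }
    return result;
}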
However, a system without side effects is typically useless, so things that do IO will always have side effects. It allows you to neatly encapsulate everything else though which is the point.
Objects are not always the best way, despite what people say. In fact, if you've ever used a LISP variant, you will no doubt determine that typical OO does sometimes get in the way.
There's a pretty good book written on this subject, http://www.amazon.com/Real-World-Functional-Programming-Examples/dp/1933988924.
And in the real world using F# unfortunately isn't always an option, due to team skills or existing codebases, which is another reason I love this book: it shows many ways to implement F# features in the code you use day to day. And to me at least, the vast reduction in state bugs, which take far longer to debug than simple logic errors, is worth the slight reduction in OOP orthodoxy.
For the most part, having no static state and operating in a static method only on the parameters given will eliminate side effects, as you're limiting yourself to pure functions. One point to watch out for, though, is retrieving data to be acted on, or saving data to a database, in such a function. Combining OOP and static methods can help here, by having your static methods delegate state-manipulating commands to lower-level objects.
Also a great help in enforcing function purity is to keep objects immutable whenever possible. Any object acted on should return a new, modified instance, and the original copy can be discarded.
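A brief sketch of that pattern (the Account type is invented for illustration):

public sealed class Account
{
    public decimal Balance { get; private set; }

    public Account(decimal balance)
    {
        Balance = balance;
    }

    // No mutation: acting on the object yields a new, modified instance.
    public Account Deposit(decimal amount)
    {
        return new Account(Balance + amount);
    }
}

// var richer = account.Deposit(10m);  // 'account' itself is unchanged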
Regarding the second question: I believe you mean "side effects" of mutable data structures, and obviously this is not a problem for (I believe) most functional languages. For instance, Haskell mostly (or even entirely?) uses immutable data structures, so this has nothing to do with "static" behaviour.
I want a serializable continuation so I can pickle async workflows to disk while waiting for new events. When the async workflow is waiting on a let!, it would be saved away along with a record of what was needed to wake it up. Instead of arbitrary in-memory IAsyncResults (or Task<T>, etc.), it would have to be, for instance, a filter criterion for incoming messages along with the continuation itself. Without language support for continuations, this might be a feat. But with computation expressions taking care of the explicit CPS transformation, it might not be too tricky and could even be more efficient. Has anyone tackled an approach like this?
You could probably use the MailboxProcessor, or Agent, type as a means of getting close to what you want. You could then use agent.PostAndAsyncReply with a timeout to retrieve the current AgentState. As mentioned above, you'll need to make the objects you are passing around serializable, but even delegates are serializable. The internals are really unrelated to async computations, though; the async computation would merely allow you a way to interact with the various agents in your program in a non-blocking fashion.
Dave Thomas and I have been working on a library called fracture-io that will provide some out-of-the-box scenarios for working with agents. We haven't yet discussed this exact scenario, but we could probably look at baking it in ... or take a commit. :)
I also noticed that you tagged your question with callcc. I posted a sample of that operator to fssnip, but Tomas Petricek quickly posted an example of how easy it is to break with async computations. So I don't think callcc is a useful solution for this question. If you don't need async, you can look in FSharpx for the Continuation module and the callcc operator in there.
Have you looked at Windows Workflow Foundation?
http://msdn.microsoft.com/en-us/netframework/aa663328.aspx
That's probably the technology you want, assuming the events/messages are arriving in periods of hours/days/weeks and you're serializing to disk to avoid using memory/threads in the meantime. (Or else why do you want it?)
I am getting two contradicting views on this. One source says there should be fewer small methods, to reduce method-call overhead, but another source says writing shorter methods is good because it lets the JIT do its optimizations.
So, which side is correct?
The overhead of actually making the method call is inconsequentially small in almost every case. You never need to worry about it unless you can clearly identify a problem down the road that requires revisiting the issue (you won't).
It's far more important that your code is simple, readable, modular, maintainable, and modifiable. Methods should do one thing, one thing only and delegate sub-things to other routines. This means your methods should be as short as they can possibly be, but not any shorter. You will see far more performance benefits by having code that is less prone to error and bugs because it is simple, than by trying to outsmart the compiler or the runtime.
The source that says methods should be long is wrong, on many levels.
Neither: you should have relatively short methods to achieve readability.
There is no one simple rule about function size. The guideline should be a function should do 'one thing'. That's a little vague but becomes easier with experience. Small functions generally lead to readability. Big ones are occasionally necessary.
Worrying about the overhead of method calls is premature optimization.
As always, it's about finding a good balance. The most important thing is that the method does one thing only. Longer methods tend to do more than one thing.
The best single criterion to guide you in sizing methods is to keep them well-testable. If you can (and actually DO!) thoroughly unit-test every single method, your code is likely to be quite good; if you skimp on testing, your code is likely to be, at best, mediocre. If a method is difficult to test thoroughly, then that method is likely to be "too big" -- trying to do too many things, and therefore also harder to read and maintain (as well as badly-tested and therefore a likely haven for bugs).
First of all, you should definitely not be micro-optimizing performance at the number-of-methods level. You will most likely not get any measurable performance benefit. Only if you have some method that is being called in a tight loop millions of times might it be worth considering - but don't begin optimizing there before you need to.
You should stick to short, concise methods that do one thing; that makes the intent of each method clear. This will give you easier-to-read code that is easier to understand and promotes code reuse.
The most important cost to consider when writing code is maintainability. You will spend much, much more time maintaining an application and fixing bugs than you ever will fixing performance problems.
In this case the almost certainly insignificant cost of calling a method is incredibly small when compared to the cost of maintaining a large, unwieldy method. Small, concise methods are easier to maintain and comprehend. Additionally, the cost of calling the method almost certainly will not have a significant performance impact on your application - and if it does, you can only ascertain that by using a profiler. Developers are notoriously bad at identifying performance problems beforehand.
Generally speaking, once a performance problem is identified, it is easy to fix. Making a method, or more importantly a code base, maintainable is a much higher cost.
Personally, I am not afraid of long methods as long as the person writing them writes them well (every sub-task separated by two newlines and a nice comment preceding it, etc. Also, indentation is very important).
In fact, many times I even prefer them (e.g. when writing code that does things in a specific order with sequential logic).
Also, I really don't understand why breaking a long method into 100 pieces will improve readability (as others suggest). Quite the opposite: you will only end up jumping all over the place and holding pieces of code in your memory just to get a complete picture of what is going on in your code. Combine that with a possible lack of comments, bad function names, and many similar function names, and you have the perfect recipe for chaos.
Also, you could go to the other extreme while trying to reduce the size of the methods: creating MANY classes and MANY functions, each of which may take MANY parameters. I don't think this improves readability either (especially for a beginner to a project who has no clue what each class/method does).
And the demand that "a function should do one thing" is very subjective. "One thing" may be anything from incrementing a variable to doing a ton of work supposedly for the "same thing".
My rule is only reusability:
The same code should not appear many times in many places. If this is the case you need a new function.
All the rest is just philosophical talk.
In a question of "why do you make your methods so big" I reply, "why not if the code is simple?".