I'm getting to grips with the async/await functionality in C# right now.
I think I know what it is good for, but I have encountered places where I do not want every method that calls a library function of mine to have to become "async" aware.
Consider this (rough pseudo-code, not really representing the real thing, it's just about the context):
string jokeOfTheHour;

public string GiveJokeOfTheHour()
{
    if (HourIsOver)
    {
        // Synchronously block on the async third-party call.
        jokeOfTheHour = thirdPartyLibrary.GetNewJoke().GetAwaiter().GetResult();
    }
    return jokeOfTheHour;
}
I have a web-back-end library function which is called up to a million times per hour (or even more).
Exactly once among these million calls per hour, the logic inside uses a third-party library that only supports async calls for the methods I want to use.
I don't want the user of my library to even think that it would make any sense for them to run any code asynchronously when calling my library function, because it would only generate unnecessary overhead in their code and at runtime the vast majority of the time.
The reasons I would state here are:
Separation of concerns. I know how I work; my user does not need to.
Context is everything. As a developer, having background knowledge is how I know which cases I need to consider when writing code and which I don't. That lets me omit writing hundreds of lines of code for things that should never happen.
Now, I want to know what general rules there are for doing this. Sadly, I can't find simple statements or rules anywhere on the web where anybody says "In this, this and this situation, you can stop this 'async' keyword bubbling up your method call tree". I've just seen people (some of them Microsoft MVPs) saying that there absolutely are situations where this should be done, also stating that you should use .GetAwaiter().GetResult() as a best practice then, but they are never specific about the situations themselves.
What I am looking for is a down-to-the-ground general rule in which I can say:
Even though I might call third-party functions which are async, I do not execute asynchronously and do not want to appear as if I do. I'm a bottom-level function using caches 99.99999% of the time. I don't need my user to implement the async methodology all the way up to the point where my actual user has to decide where the async execution stops (which would make the user who should actually benefit from my library write more code and spend more execution time).
I would really be thankful for your help :)
You seem to want your method to introduce itself with: "I'm fast". The truth is that from time to time it can actually be (very) slow. This potentially has serious consequences.
The statement
I'm a bottom level function using caches 99.99999% of the time
is not correct if you call your method once an hour.
It is better for consumers of your method to see "I can be slow, but if you call me often, I cache the result, so I will return fast" (which would be GiveJokeOfTheHourAsync() with a comment.)
If you want your method to always be fast I would suggest one of these options:
Have an UpdateJokeAsync method that you call without awaiting it in your if (HourIsOver). This would mean returning a stale result until you have fetched a new one.
Update your joke using a timer.
Make 'get' always return the last known joke and have UpdateJokeAsync update it (a rough sketch of this follows below).
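A minimal sketch of that fire-and-forget idea, reusing HourIsOver and thirdPartyLibrary from the question; updateInProgress is an extra field introduced here purely for illustration, not a definitive implementation:

string jokeOfTheHour;
Task updateInProgress;

public string GiveJokeOfTheHour()
{
    // Return the cached joke immediately; only kick off a refresh when the hour
    // is over and no refresh is already running.
    if (HourIsOver && updateInProgress == null)
    {
        updateInProgress = UpdateJokeAsync();
    }
    return jokeOfTheHour;
}

async Task UpdateJokeAsync()
{
    try
    {
        // The only await lives here, hidden inside the library.
        jokeOfTheHour = await thirdPartyLibrary.GetNewJoke();
    }
    finally
    {
        updateInProgress = null;
    }
}

In a real multi-threaded web back end you would also have to guard these fields (with a lock or Interlocked, for example), but the point is only that the public method stays synchronous while the refresh happens in the background.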
Related
During my current project, I'm calling a solution (which I'm going to refer to as solution2) every time the user presses a button (as long as the variables are correct). I'm torn between calling a method inside solution2 on each correct user input, or writing everything in the Start method and simply "activating" solution2 for each correct user input. I'm not too bothered about which one is easier (unless one of them would cause major difficulty); I'm only looking for the most optimised way to do it. Thank you for your help. -TAG
If you're torn between two different means of solving a problem, and optimization is your main concern, measure it! This is a good use of the Stopwatch class, but even just recording the current time before the call and subtracting it from the time after the function completes to get a diff will help you out. Make a (Release!) build for each solution, and run each a large number of times to establish which one is faster on average.
Once you've determined the most performant solution, keep that one, and consider leaving the performance tracking in so that you can identify bottlenecks in your code. This will allow you to isolate and correct performance problems with confidence. Ideally you can separate your implementation details into their own class so you can refactor and optimize freely without needing to change the rest of your code.
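A minimal sketch of that kind of measurement, where RunSolution1 and RunSolution2 are hypothetical stand-ins for the two approaches:

using System;
using System.Diagnostics;

static TimeSpan Measure(Action candidate, int iterations)
{
    var stopwatch = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
    {
        candidate();
    }
    stopwatch.Stop();
    return stopwatch.Elapsed;
}

// In a Release build, run each candidate many times and compare the averages:
// TimeSpan solution1Time = Measure(RunSolution1, 100_000);
// TimeSpan solution2Time = Measure(RunSolution2, 100_000);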
We've got a large C# solution with multiple APIs, SVCs and so on.
Usual sort of enterprisy mess that you get after the same code has been worked on for years by multiple people.
Anyway! We have the ability to call an external service, and we have some unit tests in place that use a Moq-like stub implementation of the service's interface.
It so happens that there can be a large delay in calling the external service and it's not anything that we can control (it's a GDS interface).
We've been working on a way to streamline the user experience for this part of our platform.
The problem is, the stub doesn't actually do much at all - and of course, is lightning fast compared to the real thing.
We want to introduce a random delay into one of the stubbed methods that will cause the call to take between 10 and 20 seconds to complete.
The naive approach is to do:
// Note: Random.Next's upper bound is exclusive, so Next(10, 21) gives 10-20 inclusive.
int sleepTimer = random.Next(10, 21);
Thread.Sleep(sleepTimer * 1000);
But something about this gives me a bad feeling.
What other ways do people have of solving this kind of scenario, or is Thread.Sleep actually Ok to use in this context ?
Thanks for your time!
-Russ
Edit: To answer some of the comments:
Basically, we don't want to call the live external service from our test suite, because it costs money and other business problems.
However, we want to test that our new processes work well, even when there's a variable delay in this essential call to the external service.
I would love to explain the exact process, but I'm not allowed to.
But yeah, the summary is that our test needs to ensure that a long running call to an external service doesn't obstruct the rest of the flow; and we need to ensure that other tasks don't get into any kind of race conditions, as they depend on the result of this call.
I agree that calling it a unit-test is somewhat incorrect now!
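One direction I'm considering (purely a sketch; the interface and type names below are made up) is to make the stubbed method async and use Task.Delay instead of Thread.Sleep, so the simulated latency doesn't block a thread while the rest of the flow is being exercised:

public class StubGdsService : IGdsService // hypothetical stub of the external service interface
{
    private static readonly Random random = new Random();

    public async Task<Availability> GetAvailabilityAsync(SearchRequest request)
    {
        // Simulate the variable latency of the real GDS call (10-20 seconds, inclusive).
        int delaySeconds = random.Next(10, 21);
        await Task.Delay(TimeSpan.FromSeconds(delaySeconds));

        return new Availability(); // canned stub response
    }
}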
With my WCF service, I am solving an issue that has both performance and design effects.
The service is a stateless RESTful PerCall service that does a lot of simple and common things, which all work fine and dandy.
But there is one operation that has started to scare me a lot recently, so here is the problem:
Clients make parametrized calls to the operation, and computing the result takes a lot of time. But the result of a call with identical parameters will always be the same until the data on the server change. And clients make an awful LOT of calls with exactly the same parameters. The server, however, cannot predict which parameters the users will ask for, so sadly the results cannot be precomputed.
So I came up with a caching layer and store the result object as a key-value pair, where the key represents the parameters which led to that result. If the relevant data change, I just flush the cache. Still simple, and no problems with this.
A client calls the service, the server receives the call, checks whether the result is already cached and returns it if so. But if the result is not cached yet, the service instance handling that call starts the computation. The computation may take up to 2 minutes (average 10-15 seconds) to finish, and in that time other clients may come in; because the result is still not in the cache, each of them would start their own computation. Which is NOT what we really want, so there is a flag indicating that someone has already started the computation with these parameters. This is the place in the code where the other callers stop and wait for the computation to be finished and inserted into the cache, from where each of the invoked instances will grab the result, return it to its client and dispose.
And this is the part, which I am really struggling with.
For now, my solution looks something like this (before you read further, I want to warn you that my experience is nowhere near a decent level and I am still a big noob at C#, WCF and all related stuff... no need to tell me I'm a noob, because I am fully aware of that):
Stopwatch sw = new Stopwatch();
sw.Start();
while (true)
{
    if (Cache.Contains(parameters) || sw.Elapsed > threshold)
        break;
    Thread.Sleep(100);
}
// ...do relevant stuff here
As you see, there are more problems with this solution:
Having the loop, the check and all this stuff not only feels ugly; with many clients waiting this way, resource usage tends to jump up.
If the operation fails (the initial caller's computation fails to deliver within the threshold), I do not really know which client should be the next one to try the computation, or how, or even whether I should run the operation again or return a fault to the client...
EDIT: This is not about synchronization; I am aware of the need for locking in some parts of my application, so my concerns are not synchronization-related.
What should I do when the relevant server-side data change while the invoked code is still performing the computation (making the result a wrong one)? ... Moreover, this has some other horrible effects on the application, but yeah, I am getting to the question here:
So, like most of the time, I did my homework and did some googling around before asking, but did not succeed in finding guidance that I would either understand or that would suit my issues and domain.
I have a strong feeling that I have to introduce some kind of (static?) event-based and/or asynchronous class (call it a layer if you will) that does some tricks and organizes and manages all these things in some kind of register-to-me-and-I-will-give-you-a-poke / poke-all-registered-threads manner. But despite being able (to a certain extent) to use the newly introduced tasks, TPL, and async-await, I not only have very limited experience in this field; more sadly, I really need help understanding how it could come together with events (or do I even need them?)... When I try and run little things in a test console application I might succeed, but bringing it into the bigger environment of my WCF application, I struggle to get a clue.
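To make that a bit more concrete, one pattern I keep running into while searching is to cache the Task<TResult> itself per parameter set, so that concurrent callers all await the same computation instead of polling the cache; a rough sketch of what I think it would look like (ParameterKey, Result and ComputeAsync are made-up names, and I am not sure it fits my case):

using System.Collections.Concurrent;
using System.Threading.Tasks;

private static readonly ConcurrentDictionary<ParameterKey, Lazy<Task<Result>>> cache =
    new ConcurrentDictionary<ParameterKey, Lazy<Task<Result>>>();

public Task<Result> GetResultAsync(ParameterKey parameters)
{
    // GetOrAdd plus Lazy<T> ensures the expensive computation is started at most once
    // per parameter set; every other caller awaits the same Task.
    var lazyTask = cache.GetOrAdd(
        parameters,
        key => new Lazy<Task<Result>>(() => ComputeAsync(key)));

    return lazyTask.Value;
}

// When the relevant server-side data change, just throw everything away.
public void Flush()
{
    cache.Clear();
}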
So guys, I will gladly welcome every kind of relevant thought, advice, guidance, link, code and criticism touching on my topic.
I am aware that it might be confusing and will do my best to clear up all misunderstandings and tricky parts; just ask me to.
Thanks for help!
The async-await pattern of .net 4.5 is paradigm changing. It's almost too good to be true.
I've been porting some IO-heavy code to async-await because blocking is a thing of the past.
Quite a few people are comparing async-await to a zombie infestation and I found it to be rather accurate. Async code likes other async code (you need an async function in order to await on an async function). So more and more functions become async and this keeps growing in your codebase.
Changing functions to async is somewhat repetitive and unimaginative work. Throw an async keyword into the declaration, wrap the return type in Task<> and you're pretty much done. It's rather unsettling how easy the whole process is, and pretty soon a text-replacing script will automate most of the "porting" for me.
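For example, a conversion of a hypothetical method might look like this (made-up names, just to illustrate how mechanical the change is):

// Before: synchronous version.
public int GetUserCount()
{
    return database.CountUsers();
}

// After: the "ported" version; the declaration gains async, the return type is
// wrapped in Task<>, and the body awaits the asynchronous counterpart of the call.
public async Task<int> GetUserCountAsync()
{
    return await database.CountUsersAsync();
}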
And now the question.. If all my code is slowly turning async, why not just make it all async by default?
The obvious reason I assume is performance. Async-await has its overhead, and code that doesn't need to be async preferably shouldn't be. But if performance is the sole problem, surely some clever optimizations could remove the overhead automatically when it's not needed. I've read about the "fast path" optimization, and it seems to me that it alone should take care of most of it.
Maybe this is comparable to the paradigm shift brought on by garbage collectors. In the early GC days, freeing your own memory was definitely more efficient. But the masses still chose automatic collection in favor of safer, simpler code that might be less efficient (and even that arguably isn't true anymore). Maybe this should be the case here? Why shouldn't all functions be async?
First off, thank you for your kind words. It is indeed an awesome feature and I am glad to have been a small part of it.
If all my code is slowly turning async, why not just make it all async by default?
Well, you're exaggerating; all your code isn't turning async. When you add two "plain" integers together, you're not awaiting the result. When you add two future integers together to get a third future integer -- because that's what Task<int> is, it's an integer that you're going to get access to in the future -- of course you'll likely be awaiting the result.
The primary reason to not make everything async is because the purpose of async/await is to make it easier to write code in a world with many high latency operations. The vast majority of your operations are not high latency, so it doesn't make any sense to take the performance hit that mitigates that latency. Rather, a key few of your operations are high latency, and those operations are causing the zombie infestation of async throughout the code.
if performance is the sole problem, surely some clever optimizations can remove the overhead automatically when it's not needed.
In theory, theory and practice are similar. In practice, they never are.
Let me give you three points against this sort of transformation followed by an optimization pass.
The first point is this: async in C#/VB/F# is essentially a limited form of continuation passing. An enormous amount of research in the functional language community has gone into figuring out how to identify and optimize code that makes heavy use of continuation passing style. The compiler team would likely have to solve very similar problems in a world where "async" was the default and the non-async methods had to be identified and de-async-ified. The C# team is not really interested in taking on open research problems, so that's big points against right there.
A second point against is that C# does not have the level of "referential transparency" that makes these sorts of optimizations more tractable. By "referential transparency" I mean the property that the value of an expression does not depend on when it is evaluated. Expressions like 2 + 2 are referentially transparent; you can do the evaluation at compile time if you want, or defer it until runtime and get the same answer. But an expression like x+y can't be moved around in time because x and y might be changing over time.
Async makes it much harder to reason about when a side effect will happen. Before async, if you said:
M();
N();
and M() was void M() { Q(); R(); }, and N() was void N() { S(); T(); }, and R and S produce side effects, then you know that R's side effect happens before S's side effect. But if you have async void M() { await Q(); R(); } then suddenly that goes out the window. You have no guarantee whether R() is going to happen before or after S() (unless of course M() is awaited; but of course its Task need not be awaited until after N().)
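To make the reordering concrete, here is a minimal sketch (Q, R, S and T are hypothetical placeholders for side-effecting work):

using System;
using System.Threading.Tasks;

static Task Q() => Task.Delay(100);                      // stands in for some asynchronous work
static void R() => Console.WriteLine("R's side effect");
static void S() => Console.WriteLine("S's side effect");
static void T() => Console.WriteLine("T's side effect");

static async Task M()
{
    await Q();   // M suspends here and control returns to the caller
    R();
}

static void N()
{
    S();
    T();
}

static async Task Caller()
{
    Task m = M();   // M runs up to the await, then yields
    N();            // so S's side effect now typically happens before R's
    await m;
}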
Now imagine that this property of no longer knowing what order side effects happen in applies to every piece of code in your program except those that the optimizer manages to de-async-ify. Basically you have no clue anymore which expressions will be evaluated in what order, which means that all expressions need to be referentially transparent, which is hard in a language like C#.
A third point against is that you then have to ask "why is async so special?" If you're going to argue that every operation should actually be a Task<T> then you need to be able to answer the question "why not Lazy<T>?" or "why not Nullable<T>?" or "why not IEnumerable<T>?" Because we could just as easily do that. Why shouldn't it be the case that every operation is lifted to nullable? Or every operation is lazily computed and the result is cached for later, or the result of every operation is a sequence of values instead of just a single value. You then have to try to optimize those situations where you know "oh, this must never be null, so I can generate better code", and so on. (And in fact the C# compiler does do so for lifted arithmetic.)
Point being: it's not clear to me that Task<T> is actually that special to warrant this much work.
If these sorts of things interest you then I recommend you investigate functional languages like Haskell, that have much stronger referential transparency and permit all kinds of out-of-order evaluation and do automatic caching. Haskell also has much stronger support in its type system for the sorts of "monadic liftings" that I've alluded to.
Why shouldn't all functions be async?
Performance is one reason, as you mentioned. Note that the "fast path" option you linked to does improve performance in the case of a completed Task, but it still requires a lot more instructions and overhead compared to a single method call. As such, even with the "fast path" in place, you're adding a lot of complexity and overhead with each async method call.
Backwards compatibility, as well as compatibility with other languages (including interop scenarios), would also become problematic.
Another is a matter of complexity and intent. Asynchronous operations add complexity - in many cases the language features hide this, but there are many cases where making methods async definitely adds complexity to their usage. This is especially true if you don't have a synchronization context, as the async methods can then easily end up causing unexpected threading issues.
In addition, there are many routines which aren't, by their nature, asynchronous. Those make more sense as synchronous operations. Forcing Math.Sqrt to be Task<double> Math.SqrtAsync would be ridiculous, for example, as there is no reason at all for that to be asynchronous. Instead of having async push through your application, you'd end up with await propagating everywhere.
This would also break the current paradigm completely, as well as cause issues with properties (which are effectively just method pairs.. would they go async too?), and have other repercussions throughout the design of the framework and language.
If you're doing a lot of IO-bound work, you'll tend to find that using async pervasively is a great addition, and many of your routines will be async. However, when you start doing CPU-bound work, making things async is in general actually not good - it hides the fact that you're burning CPU cycles under an API that appears to be asynchronous but really isn't.
Performance aside - async can have a productivity cost. On the client (WinForms, WPF, Windows Phone) it is a boon for productivity. But on the server, or in other non-UI scenarios, you pay a productivity cost. You certainly don't want to go async by default there. Use it when you need the scalability advantages.
Use it at the sweet spot. In other cases, don't.
I believe there is a good reason to make all methods async even if they don't need to be - extensibility. Selectively making methods async only works if your code never evolves and you know that method A() is always CPU-bound (so you keep it sync) and method B() is always I/O-bound (so you mark it async).
But what if things change? Yes, A() is doing calculations, but at some point in the future you may have to add logging there, or reporting, or a user-defined callback whose implementation you cannot predict, or the algorithm gets extended and now includes not just CPU computation but also some I/O. You'll need to convert the method to async, but that would break the API, and all the callers up the stack would need to be updated as well (and they could even be different apps from different vendors). Or you'll need to add an async version alongside the sync version, but that does not make much difference - using the sync version would block and is thus hardly acceptable.
It would be great if it were possible to make an existing sync method async without changing the API. But in reality we don't have that option, I believe, and using the async version even if it's not currently needed is the only way to guarantee you'll never hit compatibility issues in the future.
I want a serializable continuation so I can pickle async workflows to disk while waiting for new events. When the async workflow is waiting on a let!, it would be saved away along with a record of what is needed to wake it up. Instead of arbitrary in-memory IAsyncResults (or Task<T>, etc.), it would have to be, for instance, a filter criterion for incoming messages along with the continuation itself. Without language support for continuations, this might be a feat. But with computation expressions taking care of the explicit CPS transformation, it might not be too tricky and could even be more efficient. Has anyone tackled an approach like this?
You could probably use the MailboxProcessor, or Agent, type as a means of getting close to what you want. You could then use agent.PostAndAsyncReply with a timeout to retrieve the current AgentState. As mentioned above, you'll need to make the objects you are passing around serializable, but even delegates are serializable. The internals are really unrelated to async computations, though. The async computation would merely allow you a way to interact with the various agents in your program in a non-blocking fashion.
Dave Thomas and I have been working on a library called fracture-io that will provide some out-of-the-box scenarios for working with agents. We hadn't yet discussed this exact scenario, but we could probably look at baking this in ... or take a commit. :)
I also noticed that you tagged your question with callcc. I posted a sample of that operator to fssnip, but Tomas Petricek quickly posted an example of how easy it is to break with async computations. So I don't think callcc is a useful solution for this question. If you don't need async, you can look in FSharpx for the Continuation module and the callcc operator in there.
Have you looked at Windows Workflow Foundation?
http://msdn.microsoft.com/en-us/netframework/aa663328.aspx
That's probably the technology you want, assuming the events/messages are arriving in periods of hours/days/weeks and you're serializing to disk to avoid using memory/threads in the meantime. (Or else why do you want it?)