Is there a benefit to using the async/await pattern when you are running things in a synchronous manner?
For instance in my app I have static methods that are called by cron (hangfire) to do various IO bound tasks. A simple contrived example is this:
static void Run(string[] args)
{
var data = Test(args);
//..do stuff with returned data
}
public static List<string> Test(string[] args)
{
return Db.Select(args);
}
Is there any advantage to writing this code like so:
static void Run(string[] args)
{
var dataTask = TestAsync(args);
dataTask.Wait();
//..do stuff with returned data
}
public static async Task<List<string>> TestAsync(string[] args)
{
return await Db.SelectAsync(args);
}
My colleague tells me that I should always use this pattern and use async methods if they are available, as it adds under-the-hood optimization, but he is unable to explain why that is, and I can't really find any clear-cut explanation.
If I write my code with this type of pattern in my static methods it ends up looking like:
var data = someMethod();
data.Wait();
var data2 = someOtherMethod(data);
data2.Wait();
I understand using the async/await pattern when firing up lots of concurrent tasks, but when the code originates from a static method and has to run in order like this, is there any benefit at all? Which way should I write it?
as it adds under the hood optimization but he is unable to explain why that is
It is amazing to me how many people believe that async is always the best choice yet they cannot say why. This is a big misunderstanding in the community. Unfortunately, Microsoft is kind of pushing this notion. I believe they are doing this to simplify guidance.
Async IO helps with two things: 1) save threads 2) make GUI apps easier by doing away with thread management.
Most applications are totally unconstrained by the number of threads that are running. For those apps, async IO adds zero throughput, it costs additional CPU and complicates the code. I know, because I have measured throughput and scalability. I have worked on many applications.
In particular, the IO itself does not become faster. The only thing that changes is the way the call is initiated and completed. There are no IO optimizations here whatsoever.
Use async when it is either convenient to you or you have evidence that the number of running threads will be a problem. Do not use it by default, because productivity will be lower: there are additional ways to introduce bugs, tooling is worse, the code is longer, and debugging and profiling are harder.
Of course, there is nothing wrong with using it if it is the right tool for the job.
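To make the "the IO itself does not become faster" point concrete, here is a minimal sketch. Select and SelectAsync are hypothetical stand-ins for the question's Db.Select and Db.SelectAsync, with the I/O simulated by Thread.Sleep and Task.Delay. When the calls have to run one after another, both paths wait for the same simulated I/O, so on an otherwise idle machine they should take roughly the same wall-clock time; the async path only avoids holding a thread while it waits.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class SequentialDemo
{
    // Hypothetical blocking I/O call (simulated with Sleep).
    public static string Select(string arg)
    {
        Thread.Sleep(200);
        return arg.ToUpperInvariant();
    }

    // Hypothetical async version of the same call (simulated with Delay).
    public static async Task<string> SelectAsync(string arg)
    {
        await Task.Delay(200);
        return arg.ToUpperInvariant();
    }

    // Two dependent calls, executed synchronously.
    public static string RunSync()
    {
        var first = Select("a");
        return Select(first + "b");
    }

    // The same two dependent calls, awaited one after the other.
    public static async Task<string> RunAsync()
    {
        var first = await SelectAsync("a");
        return await SelectAsync(first + "b");
    }

    public static async Task Main()
    {
        var sw = Stopwatch.StartNew();
        var syncResult = RunSync();
        Console.WriteLine($"sync:  {syncResult} in {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        var asyncResult = await RunAsync();
        Console.WriteLine($"async: {asyncResult} in {sw.ElapsedMilliseconds} ms");
    }
}
```

Both methods return the same value; the difference would only show up under load, where the blocked thread in RunSync is a resource the process cannot use for anything else.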
I work with Node.js, so I got very used to its programming style and its way of dealing with asynchronous operations through higher-order functions and callbacks, where most I/O events are handled in an async way by design and if I want to make a sync operation, I need to use Promises or the await shortcut. In synchronous programming languages like Java, C#, and C++, apparently I'd have to do the opposite and somehow tell the compiler that the task I want to achieve must be performed asynchronously. I tried reading through the Microsoft docs and couldn't really understand how to achieve it. I could use threads, but for the simple task I want to process, exploring threads is just not worth the trouble of guaranteeing thread safety.
I came across the Task class. So, suppose that I want to run a Task method multiple times in an async way, where the functions are being called in parallel. How can I do this?
private Task<int> MyCustomTask(string whatever)
{
// I/O event that I want to be processed in async manner
}
So basically, I wanted to run this method in 'parallel' without threading.
foreach (var x in y)
{
MyCustomTask("");
}
If you don't want to await, you can do something like this.
public class AsyncExamples
{
public List<string> whatevers = new List<string> { "1", "2", "3" };
private void MyCustomTask(string whatever)
{
// I/O event that I want to be processed in async manner
}
public void FireAndForgetAsync(string whatever)
{
Task.Run(
() =>
{
MyCustomTask(whatever);
}
);
}
public void DoParallelAsyncStuff()
{
foreach (var whatever in whatevers)
{
FireAndForgetAsync(whatever);
}
}
}
most I/O events are handled in an async way by design and if I want to make a sync operation, I need to use Promises or the await shortcut
I believe the difference you're expressing is the difference between functional and imperative programming, not the difference between asynchronous and synchronous programming. So I think what you're saying is that asynchronous programming fits more naturally with a functional style, which I would agree with. JavaScript is mostly functional, though it also has imperative and OOP aspects. C# is more imperative and OOP than functional, although it grows more functional with each year.
However, both JavaScript and C# are synchronous by default, not asynchronous by default. A method must "opt in" to asynchrony using async/await. In that way, they are very similar.
I tried reading through the Microsoft docs and couldn't really understand how to achieve it.
Cheat sheet if you're familiar with asynchronous JavaScript:
Task<T> is Promise<T>
If you need to write a wrapper for another API (e.g., the Promise<T> constructor using resolve/reject), then the C# type you need is TaskCompletionSource<T>.
async and await work practically the same way.
Task.WhenAll is Promise.all, and Task.WhenAny is closest to Promise.race: it completes as soon as any task completes, whether that task succeeded or failed. There isn't a built-in equivalent for Promise.any.
Task.FromResult is Promise.resolve, and Task.FromException is Promise.reject.
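As an illustration of the TaskCompletionSource<T> item in the cheat sheet, here is a minimal sketch of wrapping a callback-style API. FetchWithCallbacks is a hypothetical API invented for this example; SetResult plays the role of resolve and SetException the role of reject.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CallbackWrapper
{
    // Hypothetical callback-style API, standing in for any library that
    // reports its outcome through success/error callbacks.
    public static void FetchWithCallbacks(string key, Action<string> onSuccess, Action<Exception> onError)
    {
        ThreadPool.QueueUserWorkItem(_ =>
        {
            if (string.IsNullOrEmpty(key))
                onError(new ArgumentException("key must not be empty"));
            else
                onSuccess("value-for-" + key);
        });
    }

    // Task-based wrapper around the callback API.
    public static Task<string> FetchAsync(string key)
    {
        var tcs = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);
        FetchWithCallbacks(
            key,
            result => tcs.SetResult(result),     // like resolve(result)
            error => tcs.SetException(error));   // like reject(error)
        return tcs.Task;
    }

    public static async Task Main()
    {
        Console.WriteLine(await FetchAsync("user42")); // value-for-user42
    }
}
```

Awaiting the returned task rethrows the exception passed to SetException, much like awaiting a rejected promise.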
So, suppose that I want to run a Task method multiple times in an async way, where the functions are being called in parallel. How can I do this?
(minor pedantic note: this is asynchronous concurrency; not parallelism, which implies threads)
To do this in JS, you would take your iterable, map it over an async method (resulting in an iterable of promises), and then Promise.all those promises.
To do the same thing in C#, you would take your enumerable, Select it over an async method (resulting in an enumerable of tasks), and then Task.WhenAll those tasks.
var tasks = y.Select(x => MyCustomTask(x)).ToList();
await Task.WhenAll(tasks);
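For a version of the two lines above that can be run as-is, here is a small sketch. MyCustomTask is a stand-in whose body is an assumption (a Task.Delay simulating the I/O, returning the string length). Because all three operations are started before the first await, the total time should be close to one delay rather than three.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Stand-in for the question's MyCustomTask; the delay simulates I/O.
    public static async Task<int> MyCustomTask(string whatever)
    {
        await Task.Delay(300);
        return whatever.Length;
    }

    public static async Task<int[]> RunAll(IEnumerable<string> items)
    {
        // ToList() forces the Select to run now, so every operation is
        // started before the first await.
        var tasks = items.Select(x => MyCustomTask(x)).ToList();

        // Results come back in the same order as the tasks.
        return await Task.WhenAll(tasks);
    }

    public static async Task Main()
    {
        var sw = Stopwatch.StartNew();
        int[] lengths = await RunAll(new[] { "a", "bb", "ccc" });
        Console.WriteLine(string.Join(",", lengths)); // 1,2,3
        Console.WriteLine($"elapsed: {sw.ElapsedMilliseconds} ms");
    }
}
```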
I have read that writing a fake async method like this is a bad idea:
public int myMethodSync()
{
//sync operations
return result;
}
public async Task<int> myMethodAsync()
{
return await Task.Run(() => myMethodSync());
}
One of the reasons I have read is that ASP.NET, for example, can have scalability problems with this kind of library: tasks use the thread pool, and ASP.NET needs the thread pool to serve each request. So the library can consume all the threads of the thread pool and block ASP.NET. It is therefore better to let the client decide how to use the thread pool.
If I am not wrong, Parallel.Invoke also uses the thread pool to run methods in parallel, so I guess that if a method in my library uses Parallel.Invoke, Parallel.ForEach, or any similar way of running code in parallel, I would have the same problem. Is that true?
My idea is to run two methods in parallel because they are independent and I could get better performance that way. So I would have something like this:
public int myMainMethodSync()
{
int result01 = myMethod01Sync();
int result02 = myMethod02Sync();
return result01 + result02;
}
private int myMethod01Sync()
{
return 0; // placeholder for the real work
}
private int myMethod02Sync()
{
return 0; // placeholder for the real work
}
public int myMainMethodAsync()
{
Task<int> myTsk01 = Task.Run(() => myMethod01Sync());
Task<int> myTsk02 = Task.Run(() => myMethod02Sync());
Task.WhenAll(myTsk01, myTsk02);
return myTsk01.Result + myTsk02.Result;
}
public int myMainMethodParallel()
{
int result01 = 0;
int result02 = 0;
Parallel.Invoke(() => result01 = myMethod01Sync(),
() => result02 = myMethod02Sync());
return result01 + result02;
}
The idea is to have a sync method that runs the two methods synchronously, so the client who uses the library knows that this method will not use the thread pool.
Then I have two options for running the methods at the same time: tasks or Parallel.Invoke.
In the case of tasks, I am using fake async methods, because I am wrapping each sync method inside a task, which uses two threads from the thread pool. If I am not wrong, this is not recommended.
The other option is to use Parallel.Invoke, which also uses threads from the thread pool, so I guess it has the same problem as tasks and is not recommended either.
In my case I would prefer to use tasks, because I can decide with a condition when to run method02Sync, so I could save the cost of assigning a thread to the second method in the cases where I know it is not needed. I guess this is not possible with Parallel.Invoke.
However, since I also implement a sync method, I let the client choose whichever method it considers better for its case. So is it really a bad option to use tasks in the async method?
If both solutions, tasks and Parallel.Invoke, are bad, then is it not recommended to run parallel code in libraries, and should it only be used at the top level, in the UI or the client of the library? I guess that would make parallelism very restrictive, because the top level could not tell the library whether or not to use threads: the library would not have any parallel methods.
In summary: is my solution of exposing sync and async methods a bad idea? Is it a bad idea to use tasks or parallel code in libraries? If one of them is the better option, which one?
Thanks.
Is my solution of exposing sync and async methods a bad idea?
Let me reformulate the question to make it more general:
Is it a good idea to expose two versions of a method with different performance characteristics?
I think that most of the time, it is a bad idea. The API of your library should be clear, you should not make the users of your library constantly keep choosing between the two options. I think it's your responsibility as a library author to make the decision, even if it's going to be the wrong one for some of your users.
If the differences between the two options are dramatic, you could consider some approach that lets your users choose between them. But I think having two separate methods is the wrong choice; something like an optional parameter would be a better approach, because it means there is a clear default.
The one exception I can think of is if the signatures of the two methods are different, like with truly async methods. But I don't think that applies to your use of Tasks to parallelize CPU-bound methods.
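The optional-parameter idea mentioned above could look something like the following sketch. Calculator, Compute, Part1 and Part2 are hypothetical names standing in for the question's methods; the point is only that there is a single public method with a clear default (sequential) and an explicit opt-in for the parallel path.

```csharp
using System;
using System.Threading.Tasks;

public static class Calculator
{
    // Hypothetical CPU-bound helpers standing in for myMethod01Sync/myMethod02Sync.
    private static int Part1() => 20;
    private static int Part2() => 22;

    // One public method; callers opt in to parallel execution explicitly.
    public static int Compute(bool runInParallel = false)
    {
        if (!runInParallel)
            return Part1() + Part2();

        int r1 = 0, r2 = 0;
        // Parallel.Invoke returns only after both actions have finished.
        Parallel.Invoke(() => r1 = Part1(), () => r2 = Part2());
        return r1 + r2;
    }

    public static void Main()
    {
        Console.WriteLine(Compute());                    // 42
        Console.WriteLine(Compute(runInParallel: true)); // 42
    }
}
```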
Is it a bad idea to use tasks or parallel code in libraries?
I think you should use them cautiously. You are right that your users might not be happy if your library uses more resources (here, threads) to make itself faster. On the other hand, most methods of parallelizing code are smart enough that they will still work fine if the number of available thread pool threads is limited. So, if you have measured that the speedup gained by parallelizing your code is significant, I think it's okay to do it.
If one of them is the better option, which one?
I think this is more a matter of which one you prefer as a matter of code style. The performance characteristics of Parallel.Invoke() with two actions and synchronously waiting for two Tasks should be comparable.
Though keep in mind that your call to Task.WhenAll doesn't really do anything, since WhenAll only returns a Task that completes when all its component Tasks complete, and you never wait on that Task. You could instead use Task.WaitAll, but I'm not sure what the point would be, since you're already implicitly waiting for both Tasks by accessing their Results.
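To illustrate the difference, here is a sketch of the two variants that do wait. Method01 and Method02 are placeholders for the question's CPU-bound methods. In the blocking variant Task.WaitAll is optional, since reading Result would block anyway; in the asynchronous variant Task.WhenAll has an effect only because the task it returns is awaited.

```csharp
using System;
using System.Threading.Tasks;

public static class WaitDemo
{
    // Placeholders for the real CPU-bound work.
    private static int Method01() => 1;
    private static int Method02() => 2;

    // Blocking variant: WaitAll blocks the calling thread until both finish.
    public static int SumBlocking()
    {
        Task<int> t1 = Task.Run(() => Method01());
        Task<int> t2 = Task.Run(() => Method02());
        Task.WaitAll(t1, t2);
        return t1.Result + t2.Result;
    }

    // Asynchronous variant: the caller's thread is released while waiting.
    public static async Task<int> SumAsync()
    {
        Task<int> t1 = Task.Run(() => Method01());
        Task<int> t2 = Task.Run(() => Method02());
        int[] results = await Task.WhenAll(t1, t2);
        return results[0] + results[1];
    }

    public static async Task Main()
    {
        Console.WriteLine(SumBlocking());    // 3
        Console.WriteLine(await SumAsync()); // 3
    }
}
```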
So here I have a function
static bool Login(SignupData sd)
{
bool success=false;
/*
Perform login-related actions here
*/
return success;
}
And there is another function
static Task<bool> LoginAsync(SignupData sd)
{
return Task.Run<bool>(()=>Login(sd));
}
Now, I've come across a rather different implementation of this pattern, where you would add the async keyword to a function which returns Task<TResult> (so that it ends up looking like: async Task<TResult> LoginAsync(SignupData sd)). In this case, even if you return TResult instead of a Task<TResult>, the program still compiles.
My question here is, which implementation should be preferred?
static Task<bool> LoginAsync(SignupData sd)
{
return Task.Run<bool>(()=>Login(sd));
}
OR this one?
async static Task<bool> LoginAsync(SignupData sd)
{
bool success=Login(sd);
return success;
}
You shouldn't be doing either. Asynchronous methods are useful if they can prevent threads from being blocked. In your case, your method doesn't avoid that, it always blocks a thread.
How to handle long blocking calls depends on the application. For UI applications, you want to use Task.Run to make sure you don't block the UI thread. For e.g. web applications, you don't want to use Task.Run, you want to just use the thread you've got already to prevent two threads from being used where one suffices.
Your asynchronous method cannot reliably know what works best for the caller, so shouldn't indicate through its API that it knows best. You should just have your synchronous method and let the caller decide.
That said, I would recommend looking for a way to create a LoginAsync implementation that's really asynchronous. If it loads data from a database, for instance, open the connection using OpenAsync, retrieve data using ExecuteReaderAsync. If it connects to a web service, connect using the asynchronous methods for whatever protocol you're using. If it logs in some other way, do whatever you need to make that asynchronous.
If you're taking that approach, the async and await keywords make perfect sense and can make such an implementation very easy to create.
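The "let the caller decide" advice could be sketched like this. Auth.Login is a hypothetical stand-in for the question's Login (it takes a plain string instead of SignupData, and Thread.Sleep simulates the blocking work). The library exposes only the synchronous method; a UI-style caller offloads it with Task.Run, while a server-style caller simply calls it on the thread it already has.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Auth
{
    // The library exposes only the synchronous method.
    public static bool Login(string user)
    {
        Thread.Sleep(200); // simulated blocking work
        return user == "alice";
    }
}

public static class Callers
{
    // UI-style caller: offloads the blocking call so its own thread stays free.
    public static Task<bool> LoginFromUiAsync(string user) =>
        Task.Run(() => Auth.Login(user));

    // Server-style caller: uses the request thread it already has.
    public static bool LoginFromServer(string user) => Auth.Login(user);

    public static async Task Main()
    {
        Console.WriteLine(await LoginFromUiAsync("alice")); // True
        Console.WriteLine(LoginFromServer("bob"));          // False
    }
}
```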
While HVD is correct, I will dive into async in an attempt to describe its intended use.
The async keyword and the accompanying await keyword are a shortcut for implementing non-blocking code patterns within your application. While they play along perfectly with the rest of the Task Parallel Library (TPL), they aren't usually used in quite the same way. Their beauty is in the elegance of how the compiler weaves in the asynchronicity and allows it to be handled without explicitly spinning off separate threads, which may or may not be what you want.
For Example, let's look at some code:
async static Task<bool> DoStuffAsync()
{
var otherAsyncResult = doOtherStuffAsync();
return await otherAsyncResult;
}
See the await keyword? It says, return to the caller, continue on until we have the result you need. Don't block, don't use a new thread, but basically return with a promise of a result when ready (A Task). The calling code can then carry on and not worry about the result until later when we have it.
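A small sketch can make the "return to the caller" behaviour visible. All names here are made up for the example, and Task.Delay stands in for real asynchronous work. The log records who runs when: the callee starts, hands control back at its first await on an incomplete task, the caller carries on, and both resume once the result is ready.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AwaitOrderDemo
{
    public static readonly List<string> Log = new List<string>();

    public static async Task<int> DoOtherStuffAsync()
    {
        Log.Add("callee: started");
        await Task.Delay(200); // not finished yet, so control returns to the caller here
        Log.Add("callee: resumed");
        return 42;
    }

    public static async Task<int> RunAsync()
    {
        Log.Clear();
        Task<int> pending = DoOtherStuffAsync(); // runs until its first await
        Log.Add("caller: continues while the operation is pending");
        int result = await pending;              // resumes here once the result is ready
        Log.Add("caller: got " + result);
        return result;
    }

    public static async Task Main()
    {
        await RunAsync();
        foreach (var line in Log)
            Console.WriteLine(line);
    }
}
```

No extra thread is created just to wait for the delay; the continuation is scheduled when the timer fires.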
Usually this ends up requiring that your code becomes non-blocking the whole way down (async all the way, as it were), and often this is a difficult transition to understand. However, if you can do it, it is incredibly powerful.
The better way to handle your code would be to make the synchronous code call the async one, and wait on it. That way you would be async as much as possible. It is always best to force that level as high as possible in your application, all the way to the UI if possible.
Hope that made sense. The TPL is a huge topic, and Async/Await really adds some interesting ways of structuring your code.
https://msdn.microsoft.com/en-us/library/hh191443.aspx
I was talking to a colleague who pointed me to the SO question about subjects being considered harmful. However, I have two cases where I have some non-deterministic code that does not seem reasonable any other way.
Non-standard event:
event handler(class, result)
{
subject.OnNext(result);
}
public delegate void _handler
([MarshalAs(UnmanagedType.Interface), In] MyClass class,
[MarshalAs(UnmanagedType.Interface), In] ResultClass result)
Parallel Tasks (Non-Deterministic number of tasks all running in parallel, starting at different times):
Task.Run(() => ...).ContinueWith(prevTask => subject.OnNext(prevTask.Result))
The subject is not exposed, only through an observable. Is there another route suggested that isn't a ton of boilerplate?
Subjects are not always harmful. There are many legitimate uses of them even within Rx itself. However, many times when a person reaches for a Subject, there's already a robust Rx method written for that scenario (and it may or may not be using subjects internally). This is the case for your 2 examples. Look at Task.ToObservable and Observable.FromEventPattern.
Another common case where subjects are misused is when a developer breaks a stream in two. They become convinced they need to subscribe to a stream and, in the callback, produce data for a new stream. They do this with a Subject. But usually they should just have used Select instead.
Observable.FromEvent
Observable.FromEvent works for more than just built-in event types: you just need to use the correct overload.
class Program
{
private static event Action<int> MyEvent;
public static void Main(string[] args)
{
Observable.FromEvent<int>(
(handler) => Program.MyEvent += handler,
(handler) => Program.MyEvent -= handler
)
.Subscribe(Console.WriteLine);
Program.MyEvent(5);
Console.ReadLine();
}
}
Task.ToObservable & Merge
If you already have access to all of your tasks, you can convert them to Observables, and Merge them into a single observable.
class Program
{
public static void Main(string[] args)
{
Observable.Merge(
// Async / Await
(
(Func<Task<string>>)
(async () => { await Task.Delay(250); return "async await"; })
)().ToObservable(),
// FromResult
Task.FromResult("FromResult").ToObservable(),
// Run
Task.Run(() => "Run").ToObservable()
)
.Subscribe(Console.WriteLine);
Console.ReadLine();
}
}
Merge Observable
Alternatively, if you do not have all of your tasks up front, you can still use Merge, but you'll need some way of communicating future tasks. In this case, I've used a subject, but you should use the simplest Observable possible to express this. If that's a subject, then by all means, use a subject.
class Program
{
public static void Main(string[] args)
{
// We use a subject here since we don't have all of the tasks yet.
var tasks = new Subject<Task<string>>();
// Make up some tasks.
var fromResult = Task.FromResult("FromResult");
var run = Task.Run(() => "Run");
Func<Task<string>> asyncAwait = async () => {
await Task.Delay(250);
return "async await";
};
// Merge any future Tasks into an observable, and subscribe.
tasks.Merge().Subscribe(Console.WriteLine);
// Send tasks.
tasks.OnNext(fromResult);
tasks.OnNext(run);
tasks.OnNext(asyncAwait());
Console.ReadLine();
}
}
Subjects
Why to use or not to use Subjects is a question I don't have the time to answer adequately. Typically speaking, however, I find that using a Subject tends to be the "easy way out" when it appears an operator does not already exist.
If you can somehow limit the exposure of a subject in terms of its visibility to the rest of the application, then by all means use a subject and do so. If you're looking for message bus functionality, however, you should rethink the design of the application, as message buses are anti-patterns.
Subjects aren't harmful. That is probably even a little too dogmatic for me (and I am the first to pooh-pooh the use of subjects). I would say that Subjects indicate a code smell. You probably could be doing it better without them, but if you keep them encapsulated within your class then at least you keep the smell in one place.
Here I would say that you are already using "non-standard" event patterns, and it seems you don't want to, or can't, change that. In this case, it seems the usage of subjects as a bridge isn't going to make it any worse than it is.
If you were starting from scratch, then I would suggest that you deeply think about your design and you will probably find that you just wouldn't need a subject.
Lastly, I agree with the other comments that you should be using FromEvent and ToTask, but you suggest these do not work. Why? I don't think you provide nearly enough of your code base to help with design questions like this, e.g. how are these nondeterministic tasks being created, and by what? What is the actual problem you are trying to solve? If you could provide a full example, you might get the amount of attention you are looking for.
Here is what a good book about Rx says regarding why and when a Subject can be harmful:
http://www.introtorx.com/Content/v1.0.10621.0/18_UsageGuidelines.html
"Avoid the use of the subject types. Rx is effectively a functional
programming paradigm. Using subjects means we are now managing state,
which is potentially mutating. Dealing with both mutating state and
asynchronous programming at the same time is very hard to get right.
Furthermore, many of the operators (extension methods) have been
carefully written to ensure that correct and consistent lifetime of
subscriptions and sequences is maintained; when you introduce
subjects, you can break this. Future releases may also see significant
performance degradation if you explicitly use subjects."
I'm facing the problem of designing methods that perform network I/O (for a reusable library). I've read this question
c# 5 await/async pattern in API design
and also other ones closer to my issue.
So, the question is: if I want to provide both async and non-async methods, how should I design them?
For example to expose a non-async version of a method, I need to do something like
public void DoSomething() {
DoSomethingAsync(CancellationToken.None).Wait();
}
and I feel it's not a great design. I'd like a suggestion (for example) on how to define private methods that can be wrapped in public ones to provide both versions.
If you want the most maintainable option, only provide an async API, which is implemented without making any blocking calls or using any thread pool threads.
If you really want to have both async and synchronous APIs, then you'll encounter a maintainability problem. You really need to implement it twice: once async and once synchronous. Both of those methods will look nearly identical so the initial implementation is easy, but you will end up with two separate nearly-identical methods so maintenance is problematic.
In particular, there's no good and simple way to just make an async or synchronous "wrapper". Stephen Toub has the best info on the subject:
Should I expose asynchronous wrappers for synchronous methods?
Should I expose synchronous wrappers for asynchronous methods?
(the short answer to both questions is "no")
However, there are some hacks you can use if you want to avoid the duplicated implementation; the best one is usually the boolean argument hack.
I agree with both Marc and Stephen (Cleary).
(BTW, I started to write this as a comment to Stephen's answer, but it turned out to be too long; let me know if it is OK to write this as an answer or not, and feel free to take bits from it and add it to Stephen's answer, in the spirit of "providing the one best answer").
It really "depends": like Marc said, it is important to know how DoSomethingAsync is asynchronous. We all agree that there is no point in having the "sync" method call the "async" method and "wait": this can be done in user code. The only advantage of having a separate method is to have actual performance gains, to have an implementation which is, under the hood, different and tailored to the synchronous scenario. This is especially true if the "async" method is creating a thread (or taking it from a thread pool): you end up with something that underneath uses two "control flows", while "promising" with its synchronous looks to be executed in the caller's context. This may even have concurrency issues, depending on the implementation.
Also in other cases, like the intensive I/O that the OP is mentioning, it may be worth having two different implementations. Most operating systems (Windows for sure) have different I/O mechanisms tailored to the two scenarios: for example, async execution of an I/O operation takes great advantage of OS-level mechanisms like I/O completion ports, which add a little overhead in the kernel (not significant, but not zero; after all, they have to do bookkeeping, dispatch, etc.), while synchronous operations have a more direct implementation.
Code complexity also varies a lot, especially in functions where multiple operations are done/coordinated.
What I would do is:
have some examples/test for typical usage and scenarios
see which API variant is used, where, and measure. Also measure the difference in performance between a "pure sync" variant and a "sync" variant that just waits on the async one (not for the whole API, but for a few selected typical cases)
based on measurement, decide if the added cost is worth it.
This is mainly because the two goals are somewhat at odds with one another. If you want maintainable code, the obvious choice is implementing sync in terms of async/wait (or the other way around) (or, even better, providing only the async variant and letting the user do the "wait"); if you want performance, you should implement the two functions differently, to exploit different underlying mechanisms (from the framework or from the OS). I think that it should not make a difference from a unit-testing point of view how you actually implement your API.
I ran into the same problem but managed to find a compromise between efficiency and maintainability using two simple facts about async methods:
an asynchronous method which does not execute any await is synchronous;
an asynchronous method which awaits only synchronous methods is synchronous.
This is best shown with an example:
//Simple synchronous method that starts a third-party component, waits for a second, and gets the result.
public ThirdPartyResult Execute(ThirdPartyOptions options)
{
ThirdPartyComponent.Start(options);
System.Threading.Thread.Sleep(1000);
return ThirdPartyComponent.GetResult();
}
To provide a maintainable sync/async version of this method, it has been split into three layers:
//Lower level - parts that work differently for sync/async version.
//When isAsync is false there are no await operators and method is running synchronously.
private static async Task Wait(bool isAsync, int milliseconds)
{
if (isAsync)
{
await Task.Delay(milliseconds);
}
else
{
System.Threading.Thread.Sleep(milliseconds);
}
}
//Middle level - the main algorithm.
//When isAsync is false the only awaited method is running synchronously,
//so the whole algorithm is running synchronously.
private async Task<ThirdPartyResult> Execute(bool isAsync, ThirdPartyOptions options)
{
ThirdPartyComponent.Start(options);
await Wait(isAsync, 1000);
return ThirdPartyComponent.GetResult();
}
//Upper level - public synchronous API.
//Internal method runs synchronously and will be already finished when Result property is accessed.
public ThirdPartyResult ExecuteSync(ThirdPartyOptions options)
{
return Execute(false, options).Result;
}
//Upper level - public asynchronous API.
public async Task<ThirdPartyResult> ExecuteAsync(ThirdPartyOptions options)
{
return await Execute(true, options);
}
The main advantage here is that the middle-level algorithm, which is the part most likely to change, is implemented only once, so developers don't have to maintain two almost identical pieces of code.
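The two facts this approach relies on can be checked with a small sketch. The Wait helper has the same shape as the one above; Execute is a simplified stand-in (an assumption for this example) that returns a number instead of calling a third-party component. With isAsync set to false, the returned task is already complete when the call returns, so reading Result does not block.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class FlagDemo
{
    // Same shape as the Wait helper above, repeated to keep the sketch self-contained.
    private static async Task Wait(bool isAsync, int milliseconds)
    {
        if (isAsync)
            await Task.Delay(milliseconds);
        else
            Thread.Sleep(milliseconds);
    }

    // Simplified stand-in for the middle-level algorithm.
    public static async Task<int> Execute(bool isAsync)
    {
        await Wait(isAsync, 100);
        return 42;
    }

    public static async Task Main()
    {
        // Sync path: the whole method ran on the calling thread before returning.
        Task<int> syncPath = Execute(isAsync: false);
        Console.WriteLine(syncPath.IsCompleted); // True
        Console.WriteLine(syncPath.Result);      // 42

        // Async path: the task is normally still pending right after the call.
        Task<int> asyncPath = Execute(isAsync: true);
        Console.WriteLine(await asyncPath);      // 42
    }
}
```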