Task in synchronous application - IO bound - c#

We are working on an old comparator.
When a user runs a search, we call 10-30 different web services (REST, SOAP) at the same time. Pretty classic. Each web service is represented by a Client in our application.
So the code is like:
// Get the list of client requests to call
clientRqListToCall = BuildRequest(userContext);
List<Task> taskList = new List<Task>();
// Call the different clients
foreach (ClientRequest clientRq in clientRqListToCall) {
    Task task = Task.Run(() => CallClient(clientRq));
    taskList.Add(task);
}
// Wait for the clients until the timeout
Task mainTask = Task.WhenAll(taskList);
mainTask.ConfigureAwait(false);
mainTask.Wait(timeout);
Simple. (Not sure the ConfigureAwait is needed.) The response of each client is stored in a field of ClientRequest, so we don't use mainTask.Result (if a client times out, we need to be able to continue with the other ones, and they time out a lot! The client calls behave much like fire-and-forget).
The application is a little old and our search engine is synchronous. The web service calls sit deep in each CallClient call tree: depending on the search context, 5 to 15 different functions are called before the web service call. Each web service call is pretty long (1 to 15s each)! This point seems important: these are not simple ping requests.
Actions / Changes ?
So this is an I/O-bound problem. We know Task.Run works well for CPU-bound work and not for I/O, so the question is: how can we make this code better?
We have read a lot of articles on the subject, thanks to Stephen Cleary (http://blog.stephencleary.com/2012/07/dont-block-on-async-code.html)
But we are not sure of our choice / road map, which is why I am posting this question.
We could make the code asynchronous, but we would have to rework the whole CallClient call tree (hundreds of functions). Is that the only solution? Of course, we could migrate the web services one by one using the bool argument hack (https://msdn.microsoft.com/en-us/magazine/mt238404.aspx).
=> Must we start with the most costly web service (in terms of I/O), or is only the number of web service calls important, so that we should start with the easiest?
In other words, if I have one big client with a 10s average response time and a lot of data, should we make that one async first? Or should we start with the small ones (1-2s) with the same amount of data? I could be wrong, but in the synchronous version a thread is blocked until the Task.Run delegate finishes, so obviously the 10s task blocks a thread the whole time, and in terms of I/O, freeing a thread as soon as possible could be better. Does the amount of data downloaded matter, or should we only think in terms of web service response time?
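For readers who have not seen it, the bool argument hack mentioned above could look roughly like the sketch below. QuoteClient and its methods are hypothetical names, and Thread.Sleep / Task.Delay only stand in for the real blocking and asynchronous web service calls.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public class QuoteClient // hypothetical client, not from the original code base
{
    // Single implementation shared by the sync and async entry points.
    private static async Task<string> CallCoreAsync(string request, bool sync)
    {
        if (sync)
            Thread.Sleep(50);      // stands in for the blocking web service call
        else
            await Task.Delay(50);  // stands in for the truly asynchronous call

        return "response to " + request;
    }

    // Existing synchronous callers keep using this signature.
    public static string Call(string request)
    {
        // With sync == true the core method never awaits an incomplete task,
        // so the returned task is already completed and this does not block on async code.
        return CallCoreAsync(request, sync: true).GetAwaiter().GetResult();
    }

    // Callers that have already been migrated use this one.
    public static Task<string> CallAsync(string request)
    {
        return CallCoreAsync(request, sync: false);
    }

    public static async Task Main()
    {
        Console.WriteLine(Call("A"));
        Console.WriteLine(await CallAsync("B"));
    }
}
```

The point of the pattern is that each client can be migrated on its own: the synchronous call tree keeps compiling while the async path is introduced next to it.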
Task.Run uses the application thread pool. We have to choose between Task.Run(...) and Task.Factory.StartNew(..., TaskCreationOptions.LongRunning), which (most of the time) creates a new thread and so may perform better.
=> I ran some tests on the subject using a console application: Task.Run() seems to be 25% to 33% faster than Task.Factory.StartNew in all scenarios.
Of course this is an expected result, but in a web app with something like 200 users, I am not sure the result would be the same. I fear the pool will fill up and the tasks will keep switching between each other without finishing.
Note: if StartNew is used, WaitAll(timeout) replaces WhenAll.
Today, on average, 20 to 50 customers can run a search at the same time. The application works without big issues and we don't have deadlocks, but sometimes we see some delay in task execution on our side. Our CPU usage is pretty low (<10%), and RAM is fine too (<25%).
I know there are plenty of questions about Tasks, but it's hard to merge them together to match our problem. We have also read contradictory advice.

I have used Parallel.ForEach to handle multiple I/O operations before, and I did not see it mentioned above. I am not sure it will handle quite what you need, seeing that the function passed into the loop is the same for each item. Maybe coupled with a strategy pattern / delegates you can achieve what you need.
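A minimal sketch of that idea, assuming each client can be wrapped in a Func<string> delegate. ParallelClients, CallAll and the two fake clients are hypothetical, and Thread.Sleep stands in for the blocking web service call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelClients
{
    // Runs every delegate, at most maxParallel at a time,
    // and collects the results keyed by client name.
    public static IDictionary<string, string> CallAll(
        IDictionary<string, Func<string>> clients, int maxParallel)
    {
        var results = new ConcurrentDictionary<string, string>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallel };

        Parallel.ForEach(clients, options, client =>
        {
            // Each delegate blocks its thread while the (simulated) I/O runs.
            results[client.Key] = client.Value();
        });

        return results;
    }

    public static void Main()
    {
        var clients = new Dictionary<string, Func<string>>
        {
            ["rest"] = () => { Thread.Sleep(100); return "rest done"; },
            ["soap"] = () => { Thread.Sleep(200); return "soap done"; },
        };

        foreach (var pair in CallAll(clients, maxParallel: 10))
            Console.WriteLine(pair.Key + ": " + pair.Value);
    }
}
```

Note that Parallel.ForEach returns only when every delegate has finished, so an overall timeout like the one in the question would still have to be handled inside each delegate.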

Related

Type of threading to use in c# for heavy IO operations

I am tasked with updating a c# application (non-gui) that is very single-threaded in its operation and add multi-threading to it to get it to turn queues of work over quicker.
Each thread will need to perform a very minimal amount of calculations, but most of the work will be calling on and wait on SQL Server requests. So, lots of waiting as compared to CPU time.
A couple of requirements will be:
Running on some limited hardware (that is, just a couple of cores). The current system, when it's being "pushed" only takes about 25% CPU. But, since it's mostly doing waits for the SQL Server to respond (different server), we would like the capability to have more threads than cores.
Be able to limit the number of threads. I also can't just have an unlimited number of threads going either. I don't mind doing the limiting myself via an Array, List, etc.
Be able to keep track of when these threads complete so that I can do some post-processing.
It just seems to me that the .NET Framework has so many different ways of doing threads, I'm not sure if one is better than the other for this task. I'm not sure if I should be using Task, Thread, ThreadPool, something else... It appears to me that async \ await model would not be a good fit in this case though as it waits on one specific task to complete.
I'm not sure if I should be using Task, Thread, ThreadPool, something else...
In your case it matters less than you would think. You can focus on what fits your (existing) code style and dataflow the best.
since it's mostly doing waits for the SQL Server to respond
Your main goal would be to get as many of those SQL queries going in parallel as possible.
Be able to limit the number of threads.
Don't worry about that too much. On 4 cores, with 25% CPU, you can easily have 100 threads going. More on 64bit. But you don't want 1000s of threads. A .net Thread uses 1MB minimum, estimate how much RAM you can spare.
So it depends on your application, how many queries can you get running at the same time. Worry about thread-safety first.
When the number of parallel queries is > 1000, you will need async/await to run on fewer threads.
As long as it is < 100, just let threads block on I/O. Parallel.ForEach() , Parallel.Invoke() etc look like good tools.
The 100 - 1000 range is the grey area.
add multi-threading to it to get it to turn queues of work over quicker.
Each thread will need to perform a very minimal amount of calculations, but most of the work will be calling on and wait on SQL Server requests. So, lots of waiting as compared to CPU time.
With that kind of processing, it's not clear how multithreading will benefit you. Multithreading is one form of concurrency, and since your workload is primarily I/O-bound, asynchrony (and not multithreading) would be the first thing to consider.
It just seems to me that the .NET Framework has so many different ways of doing threads, I'm not sure if one is better than the other for this task.
Indeed. For reference, Thread and ThreadPool are pretty much legacy these days; there are much better higher-level APIs. Task should also be rare if used as a delegate task (e.g., Task.Factory.StartNew).
It appears to me that async \ await model would not be a good fit in this case though as it waits on one specific task to complete.
await will wait on one task at a time, yes. Task.WhenAll can be used to combine multiple tasks, and then you can await the combined task.
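As a small illustration of combining tasks with Task.WhenAll, here is a sketch in which QueryAsync is a hypothetical stand-in for an asynchronous SQL Server query (Task.Delay simulates the wait):

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class WhenAllDemo
{
    // Hypothetical stand-in for an asynchronous database query.
    private static async Task<int> QueryAsync(int id)
    {
        await Task.Delay(100);
        return id * 10;
    }

    public static async Task<int[]> RunAllAsync(int[] ids)
    {
        // Start every query first; nothing is awaited yet, so the waits overlap.
        Task<int>[] queries = ids.Select(id => QueryAsync(id)).ToArray();

        // One await for the combined task; results come back in input order.
        return await Task.WhenAll(queries);
    }

    public static async Task Main()
    {
        int[] results = await RunAllAsync(new[] { 1, 2, 3 });
        Console.WriteLine(string.Join(", ", results)); // prints "10, 20, 30"
    }
}
```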
get it to turn queues of work over quicker.
Be able to limit the number of threads.
Be able to keep track of when these threads complete so that I can do some post-processing.
It sounds to me that TPL Dataflow would be the best approach for your system. Dataflow allows you to define a "pipeline" through which data flows, with some steps being asynchronous (e.g., querying SQL Server) and other steps being parallel (e.g., data processing).
I was asking a high-level question to try and get back a high-level answer.
You may be interested in my book.
The TPL Dataflow library is probably one of the best options for this job. Here is how you could construct a simple dataflow pipeline consisting of two blocks. The first block accepts a filepath and produces some intermediate data, that can be later inserted to the database. The second block consumes the data coming from the first block, by sending them to the database.
var inputBlock = new TransformBlock<string, IntermediateData>(filePath =>
{
    return GetIntermediateDataFromFilePath(filePath);
}, new ExecutionDataflowBlockOptions()
{
    MaxDegreeOfParallelism = Environment.ProcessorCount // What the local machine can handle
});
var databaseBlock = new ActionBlock<IntermediateData>(item =>
{
    SaveItemToDatabase(item);
}, new ExecutionDataflowBlockOptions()
{
    MaxDegreeOfParallelism = 20 // What the database server can handle
});
inputBlock.LinkTo(databaseBlock);
Now every time a user uploads a file, you just save the file in a temp path, and post the path to the first block:
inputBlock.Post(filePath);
And that's it. The data will flow from the first to the last block of the pipeline automatically, transformed and processed along the way, according to the configuration of each block.
This is an intentionally simplified example to demonstrate the basic functionality. A production-ready implementation will probably have more options defined, like the CancellationToken and BoundedCapacity, will watch the return value of inputBlock.Post to react in case the block can't accept the job, may have completion propagation, watch the databaseBlock.Completion property for errors etc.
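To make some of those options concrete, here is a hedged sketch with bounded capacity, completion propagation and waiting on the last block. The capacity of 100, the int payload and the Task.Delay "database write" are placeholders chosen for illustration, not recommendations.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow; // NuGet package System.Threading.Tasks.Dataflow on .NET Framework

public static class PipelineDemo
{
    // Pushes all paths through a two-block pipeline and returns how many items were "saved".
    public static async Task<int> RunAsync(string[] filePaths)
    {
        int saved = 0;

        var inputBlock = new TransformBlock<string, int>(
            path => path.Length, // placeholder for GetIntermediateDataFromFilePath
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = Environment.ProcessorCount,
                BoundedCapacity = 100 // back-pressure: the block buffers at most 100 items
            });

        var databaseBlock = new ActionBlock<int>(
            async item =>
            {
                await Task.Delay(10); // placeholder for an asynchronous database write
                Interlocked.Increment(ref saved);
            },
            new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 20, BoundedCapacity = 100 });

        // PropagateCompletion forwards completion (and faults) to the next block.
        inputBlock.LinkTo(databaseBlock, new DataflowLinkOptions { PropagateCompletion = true });

        foreach (string path in filePaths)
        {
            // SendAsync waits asynchronously while the block is full;
            // Post would return false immediately instead.
            await inputBlock.SendAsync(path);
        }

        inputBlock.Complete();          // signal that no more input will arrive
        await databaseBlock.Completion; // rethrows if a block faulted
        return saved;
    }

    public static async Task Main()
    {
        Console.WriteLine(await RunAsync(new[] { "a.txt", "b.txt", "c.txt" }));
    }
}
```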
If you are interested in following this route, it would be a good idea to study the library a bit, in order to become familiar with the options available. For example there is a TransformManyBlock available, suitable for producing multiple outputs from a single input. The BatchBlock may also be useful in some cases.
TPL Dataflow is built into .NET Core and available as a package for .NET Framework. It has some learning curve and some gotchas to be aware of, but it's nothing terrible.
It appears to me that async \ await model would not be a good fit in this case though as it waits on one specific task to complete.
That is wrong. Async/await is just syntax that simplifies the state-machine mechanism for asynchronous code. It waits without consuming any thread. In other words, the async keyword does not create a thread and await does not hold up any thread.
Be able to limit the number of threads
see How to limit the amount of concurrent async I/O operations?
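One common way to cap the number of concurrent async operations is a SemaphoreSlim used as a gate. A minimal sketch, with Task.Delay standing in for the real async I/O call and all names hypothetical:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleDemo
{
    // Runs operationCount operations with at most maxConcurrent in flight at once.
    // Returns the highest concurrency that was actually observed.
    public static async Task<int> RunThrottledAsync(int operationCount, int maxConcurrent)
    {
        var gate = new SemaphoreSlim(maxConcurrent);
        var sync = new object();
        int running = 0, peak = 0;

        async Task OneOperationAsync()
        {
            await gate.WaitAsync(); // waits without blocking a thread
            try
            {
                lock (sync) { running++; peak = Math.Max(peak, running); }
                await Task.Delay(50); // stand-in for the real async I/O call
                lock (sync) { running--; }
            }
            finally
            {
                gate.Release();
            }
        }

        await Task.WhenAll(Enumerable.Range(0, operationCount).Select(_ => OneOperationAsync()));
        return peak;
    }

    public static async Task Main()
    {
        int peak = await RunThrottledAsync(operationCount: 20, maxConcurrent: 3);
        Console.WriteLine("Peak concurrency: " + peak);
    }
}
```

No operation holds a thread while it waits at the gate or on the simulated I/O, so the limit is on operations in flight rather than on threads.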
Be able to keep track of when these threads complete so that I can do some post-processing.
If you don't use the "fire and forget" pattern, you can keep track of the task and its exceptions just by writing await task:
var task = MethodAsync();
await task;
PostProcessing();
async Task MethodAsync(){ ... }
Or for a similar approach you can use ContinueWith:
var task = MethodAsync();
await task.ContinueWith(_ => PostProcessing());
async Task MethodAsync(){ ... }
read more:
Releasing threads during async tasks
https://learn.microsoft.com/en-us/dotnet/standard/asynchronous-programming-patterns/?redirectedfrom=MSDN

Why are web apps going crazy with await / async nowadays?

I come from a back end / thick client background, so maybe I'm missing something... but I recently looked at the source for an open source JWT token server and the authors went crazy with await / async. Like on every method and every line.
I get what the pattern is for... to run long running tasks in a separate thread. In my thick client days, I would use it if a method might take a few seconds, so as not to block the GUI thread... but definitely not on a method that takes a few ms.
Is this excessive use of await / async something you need for web dev or for something like Angular? This was in a JWT token server, so not even seeing what it has to do with any of those. It's just a REST end point.
How is making every single line async going to improve performance? To me, it'll kill performance from spinning up all those threads, no?
I get what the pattern is for... to run long running tasks in a separate thread.
This is absolutely not what this pattern is for.
Await does not put the operation on a new thread. Make sure that is very clear to you. Await schedules the remaining work as the continuation of the high latency operation.
Await does not make a synchronous operation into an asynchronous concurrent operation. Await enables programmers who are working with a model that is already asynchronous to write their logic to resemble synchronous workflows. Await neither creates nor destroys asynchrony; it manages existing asynchrony.
Spinning up a new thread is like hiring a worker. When you await a task, you are not hiring a worker to do that task. You are asking "is this task already done? If not, call me back when it's done so I can keep doing work that depends on that task. In the meanwhile, I'm going to go work on this other thing over here..."
If you're doing your taxes and you find you need a number from your work, and the mail hasn't arrived yet, you don't hire a worker to wait by the mailbox. You make a note of where you were in your taxes, go get other stuff done, and when the mail comes, you pick up where you left off. That's await. It's asynchronously waiting for a result.
Is this excessive use of await / async something you need for web dev or for something like Angular?
It's to manage latency.
How is making every single line async going to improve performance?
In two ways. First, by ensuring that applications remain responsive in a world with high-latency operations. That kind of performance is important to users who don't want their apps to hang. Second, by providing developers with tools for expressing the data dependency relationships in asynchronous workflows. By not blocking on high-latency operations, system resources are freed up to work on unblocked operations.
To me, it'll kill performance from spinning up all those threads, no?
There are no threads. Concurrency is a mechanism for achieving asynchrony; it is not the only one.
Ok, so if I write code like: await someMethod1(); await someMethod2(); await someMethod3(); that is magically going to make the app more responsive?
More responsive compared to what? Compared to calling those methods without awaiting them? No, of course not. Compared to synchronously waiting for the tasks to complete? Absolutely, yes.
That's what I'm not getting I guess. If you awaited on all 3 at the end, then yeah, you're running the 3 methods in parallel.
No no no. Stop thinking about parallelism. There need not be any parallelism.
Think about it this way. You wish to make a fried egg sandwich. You have the following tasks:
Fry an egg
Toast some bread
Assemble a sandwich
Three tasks. The third task depends on the results of the first two, but the first two tasks do not depend on each other. So, here are some workflows:
Put an egg in the pan. While the egg is frying, stare at the egg.
Once the egg is done, put some toast in the toaster. Stare at the toaster.
Once the toast is done, put the egg on the toast.
The problem is that you could be putting the toast in the toaster while the egg is cooking. Alternative workflow:
Put an egg in the pan. Set an alarm that rings when the egg is done.
Put toast in the toaster. Set an alarm that rings when the toast is done.
Check your mail. Do your taxes. Polish the silverware. Whatever it is you need to do.
When both alarms have rung, grab the egg and the toast, put them together, and you have a sandwich.
Do you see why the asynchronous workflow is far more efficient? You get lots of stuff done while you're waiting for the high latency operation to complete. But you did not hire an egg chef and a toast chef. There are no new threads!
The workflow I proposed would be:
eggtask = FryEggAsync();
toasttask = MakeToastAsync();
egg = await eggtask;
toast = await toasttask;
return MakeSandwich(egg, toast);
Now, compare that to:
eggtask = FryEggAsync();
egg = await eggtask;
toasttask = MakeToastAsync();
toast = await toasttask;
return MakeSandwich(egg, toast);
Do you see how that workflow differs? This workflow is:
Put an egg in the pan and set an alarm.
Go do other work until the alarm goes off.
Get the egg out of the pan; put the bread in the toaster. Set an alarm...
Go do other work until the alarm goes off.
When the alarm goes off, assemble the sandwich.
This workflow is less efficient because we have failed to capture the fact that the toast and egg tasks are high latency and independent. But it is surely more efficient use of resources than doing nothing while you're waiting for the egg to cook.
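The difference between the two workflows can be measured with a small sketch. The 300 ms delays are arbitrary stand-ins for the high-latency egg and toast operations, and the method bodies are assumptions made for the demo.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class SandwichDemo
{
    private static async Task<string> FryEggAsync()    { await Task.Delay(300); return "egg"; }
    private static async Task<string> MakeToastAsync() { await Task.Delay(300); return "toast"; }

    // First workflow: start both tasks, then await them.
    public static async Task<TimeSpan> StartBothThenAwaitAsync()
    {
        var stopwatch = Stopwatch.StartNew();
        Task<string> eggTask = FryEggAsync();
        Task<string> toastTask = MakeToastAsync();
        await eggTask;
        await toastTask;
        return stopwatch.Elapsed;
    }

    // Second workflow: the toast is not started until the egg is done.
    public static async Task<TimeSpan> AwaitOneAtATimeAsync()
    {
        var stopwatch = Stopwatch.StartNew();
        await FryEggAsync();
        await MakeToastAsync();
        return stopwatch.Elapsed;
    }

    public static async Task Main()
    {
        Console.WriteLine("Start both, then await: " + await StartBothThenAwaitAsync());
        Console.WriteLine("Await one at a time:    " + await AwaitOneAtATimeAsync());
    }
}
```

The first timing should come out close to the longer of the two delays and the second close to their sum, and neither version creates a thread to do the waiting.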
The point of this whole thing is: threads are insanely expensive, so don't spin up new threads. Rather, make more efficient use of the thread you've got by putting it to work while you're doing high latency operations. Await is not about spinning up new threads; it is about getting more work done on one thread in a world with high latency computation.
Maybe that computation is being done on another thread, maybe it's blocked on disk, whatever. Doesn't matter. The point is, await is for managing that asynchrony, not creating it.
I'm having a difficult time understanding how asynchronous programming can be possible without using parallelism somewhere. Like, how do you tell the program to get started on the toast while waiting for the eggs without DoEggs() running concurrently, at least internally?
Go back to the analogy. You are making an egg sandwich, the eggs and toast are cooking, and so you start reading your mail. You get halfway through the mail when the eggs are done, so you put the mail aside and take the egg off the heat. Then you go back to the mail. Then the toast is done and you make the sandwich. Then you finish reading your mail after the sandwich is made. How did you do all that without hiring staff, one person to read the mail, one person to cook the egg, one to make the toast and one to assemble the sandwich? You did it all with a single worker.
How did you do that? By breaking tasks up into small pieces, noting which pieces have to be done in which order, and then cooperatively multitasking the pieces.
Kids today with their big flat virtual memory models and multithreaded processes think that this is how it's always been, but my memory stretches back to the days of Windows 3, which had none of that. If you wanted two things to happen "in parallel" that's what you did: split the tasks up into small parts and took turns executing parts. The whole operating system was based on this concept.
Now, you might look at the analogy and say "OK, but some of the work, like actually toasting the toast, is being done by a machine", and that is the source of parallelism. Sure, I didn't have to hire a worker to toast the bread, but I achieved parallelism in hardware. And that is the right way to think of it. Hardware parallelism and thread parallelism are different. When you make an asynchronous request to the network subsystem to go find you a record from a database, there is no thread that is sitting there waiting for the result. The hardware achieves parallelism at a level far, far below that of operating system threads.
If you want a more detailed explanation of how hardware works with the operating system to achieve asynchrony, read "There is no thread" by Stephen Cleary.
So when you see "async" do not think "parallel". Think "high latency operation split up into small pieces". If there are many such operations whose pieces do not depend on each other then you can cooperatively interleave the execution of those pieces on one thread.
As you might imagine, it is very difficult to write control flows where you can abandon what you are doing right now, go do something else, and seamlessly pick up where you left off. That's why we make the compiler do that work! The point of "await" is that it lets you manage those asynchronous workflows by describing them as synchronous workflows. Everywhere that there is a point where you could put this task aside and come back to it later, write "await". The compiler will take care of turning your code into many tiny pieces that can each be scheduled in an asynchronous workflow.
UPDATE:
In your last example, what would be the difference between the following two versions?
eggtask = FryEggAsync();
egg = await eggtask;
toasttask = MakeToastAsync();
toast = await toasttask;
and
egg = await FryEggAsync();
toast = await MakeToastAsync();
I assume it calls them synchronously but executes them asynchronously? I have to admit I've never even bothered to await the task separately before.
There is no difference.
When FryEggAsync is called, it is called regardless of whether await appears before it or not. await is an operator. It operates on the thing returned from the call to FryEggAsync. It's just like any other operator.
Let me say this again: await is an operator and its operand is a task. It is a very unusual operator, to be sure, but grammatically it is an operator, and it operates on a value just like any other operator.
Let me say it again: await is not magic dust that you put on a call site and suddenly that call site is remoted to another thread. The call happens when the call happens, the call returns a value, and that value is a reference to an object that is a legal operand to the await operator.
So yes,
var x = Foo();
var y = await x;
and
var y = await Foo();
are the same thing, the same as
var x = Foo();
var y = 1 + x;
and
var y = 1 + Foo();
are the same thing.
So let's go through this one more time, because you seem to believe the myth that await causes asynchrony. It does not.
async Task M() {
var eggtask = FryEggAsync();
Suppose M() is called. FryEggAsync is called. Synchronously. There is no such thing as an asynchronous call; you see a call, control passes to the callee until the callee returns. The callee returns a task which represents an egg to be made available in the future.
How does FryEggAsync do this? I don't know and I don't care. All I know is I call it, and I get an object back that represents a future value. Maybe that value is produced on a different thread. Maybe it is produced on this thread but in the future. Maybe it is produced by special-purpose hardware, like a disk controller or a network card. I don't care. I care that I get back a task.
egg = await eggtask;
Now we take that task and await asks it "are you done?" If the answer is yes, then egg is given the value produced by the task. If the answer is no then M() returns a Task representing "the work of M will be completed in the future". The remainder of M() is signed up as the continuation of eggtask, so when eggtask completes, it will call M() again and pick it up not from the beginning, but from the assignment to egg. M() is a resumable at any point method. The compiler does the necessary magic to make that happen.
So now we've returned. The thread keeps on doing whatever it does. At some point the egg is ready, so the continuation of eggtask is invoked, which causes M() to be called again. It resumes at the point where it left off: assigning the just-produced egg to egg. And now we keep on trucking:
toasttask = MakeToastAsync();
Again, the call returns a task, and we:
toast = await toasttask;
check to see if the task is complete. If yes, we assign toast. If no, then we return from M() again, and the continuation of toasttask is the remainder of M().
And so on.
Eliminating the task variables does nothing germane. Storage for the values is allocated; it's just not given a name.
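The "are you done? if not, call me back" exchange can be written out by hand against the task's awaiter. This is only a rough sketch of the idea: the state machine the compiler really generates also deals with exceptions, context capture and resuming in the middle of a method. WhenEggReady and restOfMethod are hypothetical names.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

public static class AwaiterDemo
{
    // Hand-written approximation of: var egg = await eggTask; restOfMethod(egg);
    public static void WhenEggReady(Task<string> eggTask, Action<string> restOfMethod)
    {
        TaskAwaiter<string> awaiter = eggTask.GetAwaiter();
        if (awaiter.IsCompleted)
        {
            // "Are you done?" Yes: carry on synchronously with the value.
            restOfMethod(awaiter.GetResult());
        }
        else
        {
            // No: sign up the remainder as the continuation and return to the caller.
            awaiter.OnCompleted(() => restOfMethod(awaiter.GetResult()));
        }
    }

    public static void Main()
    {
        // Already completed task: the remainder runs immediately.
        WhenEggReady(Task.FromResult("cooked egg"), egg => Console.WriteLine("now: " + egg));

        // Incomplete task: the remainder runs later, when the task completes.
        var done = new ManualResetEventSlim();
        Task<string> lateEgg = Task.Delay(100).ContinueWith(_ => "late egg");
        WhenEggReady(lateEgg, egg => { Console.WriteLine("later: " + egg); done.Set(); });

        Console.WriteLine("caller keeps working while the egg cooks");
        done.Wait(); // only so the demo process does not exit early
    }
}
```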
ANOTHER UPDATE:
is there a case to be made to call Task-returning methods as early as possible but awaiting them as late as possible?
The example given is something like:
var task = FooAsync();
DoSomethingElse();
var foo = await task;
...
There is some case to be made for that. But let's take a step back here. The purpose of the await operator is to construct an asynchronous workflow using the coding conventions of a synchronous workflow. So the thing to think about is what is that workflow? A workflow imposes an ordering upon a set of related tasks.
The easiest way to see the ordering required in a workflow is to examine the data dependence. You can't make the sandwich before the toast comes out of the toaster, so you're going to have to obtain the toast somewhere. Since await extracts the value from the completed task, there's got to be an await somewhere between the creation of the toaster task and the creation of the sandwich.
You can also represent dependencies on side effects. For example, the user presses the button, so you want to play the siren sound, then wait three seconds, then open the door, then wait three seconds, then close the door:
DisableButton();
PlaySiren();
await Task.Delay(3000);
OpenDoor();
await Task.Delay(3000);
CloseDoor();
EnableButton();
It would make no sense at all to say
DisableButton();
PlaySiren();
var delay1 = Task.Delay(3000);
OpenDoor();
var delay2 = Task.Delay(3000);
CloseDoor();
EnableButton();
await delay1;
await delay2;
Because this is not the desired workflow.
So, the actual answer to your question is: deferring the await until the point where the value is actually needed is a pretty good practice, because it increases the opportunities for work to be scheduled efficiently. But you can go too far; make sure that the workflow that is implemented is the workflow you want.
Generally this is because asynchronous functions play nicer with other async functions; otherwise you start losing the benefits of asynchronicity. As a result, functions calling async functions end up being async themselves, and it spreads throughout the entire application, e.g. if you made your interactions with a data store async, then things utilising that functionality tend to be made async as well.
As you convert synchronous code to asynchronous code, you’ll find that it works best if asynchronous code calls and is called by other asynchronous code—all the way down (or “up,” if you prefer). Others have also noticed the spreading behavior of asynchronous programming and have called it “contagious” or compared it to a zombie virus. Whether turtles or zombies, it’s definitely true that asynchronous code tends to drive surrounding code to also be asynchronous. This behavior is inherent in all types of asynchronous programming, not just the new async/await keywords.
Source: Async/Await - Best Practices in Asynchronous Programming
It's an Actor Model World, Really...
My view is that async / await are simply a way of dressing up software systems so as to avoid having to concede that, really, a lot of systems (especially those with a lot of network comms) are better seen as Actor model (or better yet, Communicating Sequential Process) systems.
With both of these the whole point is that you wait for one of several things to become complete-able, take the necessary action when one does, and then return to waiting. Specifically you're waiting for a message to arrive from somewhere else, reading it, and acting on the content. In *nix, the waiting is generally done with a call to epoll() or select().
Using await / async is simply a way of pretending that your system is still kinda synchronous method calls (and therefore familiar), whilst making it difficult to efficiently cope with things not consistently completing in the same order every time.
However, once you get over the idea that you're no longer calling methods but simply passing messages to and fro it all becomes very natural. It's very much a "please do this", "sure, here's the answer" thing, with many such interactions intertwined. Wrapping it up with a big WaitForLotsOfThings() call at the top of a loop is merely an explicit acknowledgement that your program will wait until it has something to do in response to many other programs communicating with it.
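Purely as an illustration of that "wait for a message, act on it, go back to waiting" loop, here is a sketch in C# using System.Threading.Channels as the mailbox. The choice of library, the CounterActor name and the string messages are assumptions made for the example; the text above does not prescribe any of them.

```csharp
using System;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class CounterActor
{
    // The actor owns its state; other code only talks to it through the mailbox.
    public static async Task<int> RunAsync(ChannelReader<string> mailbox)
    {
        int total = 0;

        // Wait for a message, act on it, go back to waiting.
        while (await mailbox.WaitToReadAsync())
        {
            while (mailbox.TryRead(out string message))
            {
                if (message == "increment") total++;
                else if (message == "reset") total = 0;
            }
        }

        return total; // the mailbox was completed: no more messages will arrive
    }

    public static async Task Main()
    {
        Channel<string> channel = Channel.CreateUnbounded<string>();
        Task<int> actor = RunAsync(channel.Reader);

        await channel.Writer.WriteAsync("increment");
        await channel.Writer.WriteAsync("increment");
        await channel.Writer.WriteAsync("reset");
        await channel.Writer.WriteAsync("increment");
        channel.Writer.Complete();

        Console.WriteLine(await actor); // prints 1
    }
}
```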
How Windows Makes it Hard
Unfortunately, Windows makes it very hard to implement a reactor system ("if you read that message now, you'll get it"). Windows is proactor ("that message you asked me to read? It's now been read."). It's an important distinction.
First, I'll explain reactor and proactor.
A reactor "reacts" to events. For example, if a socket becomes ready to read, only at that point does the "reactor" model program decide what to read and what to do with it.
Whereas a proactor proactively decides what it's going to do if and when the socket becomes ready, and commits to that action.
With a reactor, a message (or indeed a timeout) that means "stop listening to that other actor" is easily dealt with - you simply exclude that other actor from the list you'll listen to next time you wait (the next call to select() or epoll()).
With a proactor, it's a lot harder. How does one honour a "stop listening to that other actor" message when the socket read() has already been started with some sort of async call, which won't complete until something is read? A completed read() is a doubtful outcome given the instruction recently received?
I'm nit-picking to some extent. Reactor is very useful in systems with dynamic connectivity, Actors dropping into the system, dropping out again. Proactor is fine if you have a fixed population of actors with comms links that'll never go away. Nevertheless, given that a proactor system is easily implemented in a reactor platform, but a reactor system cannot easily be implemented on a proactor platform (time won't go backwards), I find Windows's approach particularly irritating.
So one way or other, async / await are definitely still in proactor land.
Knock on Impact
This has infected many other libraries.
C++'s Boost asio is also proactor, even on *nix, largely it seems because they wanted to have a Windows implementation.
ZeroMQ, which is a reactor framework, is limited to some extent on Windows, being based on a call to select() (which in Windows works only on sockets).
For the cygwin family of POSIX runtimes on Windows, they had to implement select(), epoll(), etc. by having a thread per file descriptor polling (yes, polling!!!!) the underlying socket / serial port / pipe for incoming data in order to recreate POSIX's routines. Yeurk! The comments on the cygwin dev's mailing lists dating back to the time when they were implementing that part make for amusing reading.
Actor Isn't Necessarily Slow
It's worth noting that the phrase "passing messages" doesn't necessarily mean passing copies around - there's plenty of formulations of the Actor Model where you're merely passing ownership of references to messages around (e.g. Dataflow, part of the Task Parallel library in C#). This makes it fast. I've not yet got round to looking at the Dataflow library, but it doesn't really make Windows reactor all of a sudden. It doesn't give you an actor model reactor system working on all sorts of data bearers like sockets, pipes, queues, etc.
Windows 10's Linux Runtime
So having just blasted Windows and its inferior proactor architecture, one intriguing point is that Windows 10 now runs Linux binaries under WSL1. How, I'd very much like to know, has Microsoft implemented the system call that underlies select(), epoll() in WSL1 given that it has to function on sockets, serial ports, pipes and everything else in the land of POSIX that is a file descriptor, when everything else on Windows can't? I'd give my hind teeth to know the answer to that question.

Should I make a fast operation async if the method is already async

I have this code (the unimportant details are that it runs on EC2 instances in AWS, processing messages on an SQS queue).
The first statement in the method gets some data over http, the second statement saves state to a local dynamo data store.
public bool HandleMessage(OrderAcceptedMessage message)
{
    var order = _orderHttpClient.GetById(message.OrderId);
    _localDynamoRepo.SaveAcceptedOrder(message, order);
    return true;
}
The performance characteristics are that the http round trip takes 100-200 milliseconds, and the dynamo write takes around 10 milliseconds.
Both of these operations have async versions. We could write it as follows:
public async Task<bool> HandleMessage(OrderAcceptedMessage message)
{
    var order = await _orderHttpClient.GetByIdAsync(message.OrderId);
    await _localDynamoRepo.SaveAcceptedOrderAsync(message, order);
    return true;
}
So the guidance is that since the first operation "could take longer than 50 milliseconds to execute" it should make use of async and await. (1)
But what about the second, fast operation? Which of these two arguments is correct:
Do not make it async: It does not meet the 50ms criterion and it's not worth the overhead.
Do make it async: The overhead has already been paid by the previous operation. There is already task-based asynchrony happening and it's worth using it.
1) http://blog.stephencleary.com/2013/04/ui-guidelines-for-async.html
the unimportant details are that it runs on EC2 instances in AWS, processing messages on an SQS queue
Actually, I think that's an important detail. Because this is not a UI application; it's a server application.
the guidance is that since the first operation "could take longer than 50 milliseconds to execute"
This guidance only applies to UI applications. Thus, the 50ms guideline is meaningless here.
Which of these two arguments is correct
Asynchrony is not about speed. It's about freeing up threads. The 50ms guideline for UI apps is all about freeing up the UI thread. On the server side, async is about freeing up thread pool threads.
The question is how much scalability do you want/need? If your backend is scalable, then I generally recommend async, because that frees up thread pool threads. This makes your web app more scalable and able to react to changes in load more quickly. But this only gives you a benefit if your backend can scale along with your web app.
First notice that in web apps the biggest cost of async is reduction of productivity. This is what we are weighing the benefits against. You need to think about how much code will be infected if you make this one method async.
The benefit is saving a thread for the duration of the call. A 200ms HTTP call is a pretty good case for async (although it's impossible to say for sure because it also depends on how often you perform the call).
The 50ms criterion is not a hard number. In fact, that recommendation is for realtime UI apps.
A more useful number is latency times frequency. That tells you how many threads are consumed in the long term average. Infrequent calls do not need to be optimized.
100 Dynamo calls per second at 10ms each come out at one thread blocked on average (100 × 0.010 s = 1). This is nothing. So this probably is not a good candidate for async.
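The "latency times frequency" estimate can be written down as a tiny helper. This is only a sketch: the 100 calls per second at 10ms are the figures from this answer, while the 100 calls per second for the 200ms HTTP call are an assumed load chosen for comparison, not something stated in the question.

```csharp
using System;

public static class ThreadEstimate
{
    // Long-term average number of blocked threads:
    // calls per second multiplied by seconds spent per call.
    public static double AverageBlockedThreads(double callsPerSecond, double latencyMs)
    {
        return callsPerSecond * latencyMs / 1000.0;
    }

    public static void Main()
    {
        // Figures from the answer: 100 Dynamo writes per second at 10 ms each.
        Console.WriteLine(AverageBlockedThreads(100, 10));  // prints 1

        // Assumed load for the HTTP call: 100 calls per second at 200 ms each.
        Console.WriteLine(AverageBlockedThreads(100, 200)); // prints 20
    }
}
```

Under that assumed load the HTTP call ties up about twenty threads to the Dynamo write's one, which is why the first call is the better candidate.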
Of course if you make the first call async you can make the second one async as well at almost no incremental productivity cost because everything is infected already.
You can run the numbers yourself and decide based on that.
This might end up in an opinionated discussion...but let's try.
tl;dr: yes, keep it async.
You are in a library and you don't care about the synchronisation context, so you should not capture it and change your code into:
var order = await _orderHttpClient.GetByIdAsync(message.OrderId).ConfigureAwait(false);
await _localDynamoRepo.SaveAcceptedOrderAsync(message, order).ConfigureAwait(false);
Besides: after the first awaited call, you'll likely end up on a thread-pool thread. So even if you use the non-async version SaveAcceptedOrder(), it will not block the original caller's thread (it would still tie up a pool thread for the duration of the write). However, this is nothing you should rely on, and you don't necessarily know the type of the async method (CPU bound or IO bound = "async by design"). If it is IO bound, there's no need to run it on a thread.
If you're making any remote call, make it async. Yes, DynamoDB calls are fast (except where one has a sub-par hash-key, or many gigabytes of data in a single table), but you're still making them over the internet (even if you're inside AWS EC2 etc), and so you should not ignore any of the Eight Fallacies of Distributed Computing - and especially not 1) The network is reliable or 2) Latency is zero.

Proper understanding of Tasks

At the risk of asking a stupid question (and I will voluntarily delete the question myself if my peers think it is a stupid question)..
I have a C# desktop app.
I upload data to my server using a WCF Service.
I am experimenting with using Tasks.
This code calls my web service...
Task t = Task.Run(() => { wcf.UploadMotionDynamicRaw(bytes); });
I am stress testing this line of code.
I call it as many times as I can per second, for a period of X.
Will this 'flood' my router if the internet is slow for whatever reason?
I can sort of guess that this will be the case...
So, how can I test whether the task has completed before calling it again? In doing this will the extra plumbing slow down my speed gains by using Task?
Finally, is using Tasks making usage of multiple cores?
Will this 'flood' my router if the internet is slow for whatever reason?
This depends on the size of the file you are uploading and your connection speed. To figure that out, just run it.
So, how can I test whether the task has completed before calling it again?
You can use Task.ContinueWith (any of the available overloads) to "catch" the task's completion and run some other method, possibly recursively.
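As an alternative to ContinueWith, a simple way to answer "has the previous call finished?" is to keep the last Task and check its IsCompleted property before starting a new one. The sketch below assumes a hypothetical Uploader class and replaces the real WCF call with Task.Delay so that it runs on its own; it is not thread-safe and is meant to be driven from a single thread.

```csharp
using System;
using System.Threading.Tasks;

public sealed class Uploader
{
    // The most recently started upload; starts out as an already-completed task.
    private Task _inFlight = Task.CompletedTask;

    public Task Completion { get { return _inFlight; } }

    // Stand-in for the real upload (wcf.UploadMotionDynamicRaw in the question);
    // here it only waits, to simulate a slow network.
    private static Task FakeUploadAsync(byte[] bytes)
    {
        return Task.Delay(200);
    }

    // Starts an upload only if the previous one has finished.
    // Returns false when the call was skipped.
    public bool TryUpload(byte[] bytes)
    {
        if (!_inFlight.IsCompleted)
            return false;

        _inFlight = FakeUploadAsync(bytes);
        return true;
    }
}

public static class UploaderDemo
{
    public static void Main()
    {
        var uploader = new Uploader();
        var data = new byte[16];

        Console.WriteLine(uploader.TryUpload(data)); // True: nothing was in flight
        Console.WriteLine(uploader.TryUpload(data)); // False: first upload still running
        uploader.Completion.Wait();
        Console.WriteLine(uploader.TryUpload(data)); // True: previous upload finished
    }
}
```

Skipping a call while one is in flight is just one policy; queueing the data instead would be another, and which one fits depends on whether every frame has to reach the server.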
In doing this will the extra plumbing slow down my speed gains by using Task?
It depends on the workload, your processor, and the timing you expect. In other words, run it and measure; there is no generic answer to this.
is using Tasks making usage of multiple cores?
Yes, whenever it figures out it is possible. Running single tasks one after another will not spread the work of a single function over multiple cores. For this you need to use Parallel.For and similar constructs. And again, .NET does not provide you with a mechanism for SIMD orchestration, so you are not guaranteed that it will run on multiple cores, but it most probably will.
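For the Parallel.For case mentioned above, a minimal sketch might look like this. The workload (summing squares) and the names are made up for illustration; the point is only that the loop body may run on several pool threads, with each worker keeping its own subtotal and merging it once at the end.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelSum
{
    // Sums i*i for i in [0, n), letting the runtime split the range across workers.
    public static long SumOfSquares(int n)
    {
        long total = 0;
        Parallel.For(
            0, n,
            () => 0L,                                           // each worker starts at zero
            (i, loopState, subtotal) => subtotal + (long)i * i, // accumulate locally
            subtotal => Interlocked.Add(ref total, subtotal));  // merge each subtotal once
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquares(1000)); // prints 332833500
    }
}
```

This helps with CPU-bound work only. For the upload in the question, which spends its time waiting on the network, extra cores make no difference.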

Why should I use asynchronous operation over synchronous operation?

I have always pondered about this.
Let's say we have a simple asynchronous web request using the HttpWebRequest class
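using System;
using System.Net;

class webtest1
{
    // WebRequest.Create needs an absolute URI, including the scheme.
    HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("http://www.google.com");

    public webtest1()
    {
        this.StartWebRequest();
    }

    void StartWebRequest()
    {
        // Returns immediately; FinishWebRequest runs when the response arrives.
        webRequest.BeginGetResponse(new AsyncCallback(FinishWebRequest), null);
    }

    void FinishWebRequest(IAsyncResult result)
    {
        webRequest.EndGetResponse(result);
    }
}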
The same can be achieved easily with a synchronous operation:
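using System.Net;

class webtest1
{
    // WebRequest.Create needs an absolute URI, including the scheme.
    HttpWebRequest webRequest = (HttpWebRequest)WebRequest.Create("http://www.google.com");

    public webtest1()
    {
        // Blocks the calling thread until the response arrives or the request times out.
        webRequest.GetResponse();
    }
}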
So why would I want to use the more convoluted async operation when a much simpler sync operation would suffice? To save system resources?
If you make an asynchronous request you can do other things while you wait for the response to your request. If you make a synchronous request, you have to wait until you receive your response before you can do something else.
For simple programs and scripts it may not matter so much, in fact in many of those situations the easier to code and understand synchronous method would be a better design choice.
However, for non-trivial programs, such as a desktop application, a synchronous request which locks up the entire application until the request is finished causes an unacceptable user experience.
A synchronous operation will prevent you from doing anything else while waiting for the request to complete or time out. Using an asynchronous operation would let you animate something for the user to show the program is busy, or even let them carry on working with other areas of functionality.
The synchronous version is simpler to code but it masks a very serious problem. Network communication, or really any I/O operation, can block, and for extended periods of time. Many network connections, for example, have a timeout of 2 minutes.
Doing a network operation synchronously means your application and UI will block for the entire duration of that operation. A not uncommon network hiccup could cause your app to block for several minutes with no ability to cancel. This leads to very unhappy customers.
Asynchronous becomes especially useful when you have more things going on than you have cores - for example, you might have a number of active web requests, a number of file access operations, a few DB calls, and maybe some other network operations (WCF or redis maybe). If all of those are synchronous, you are creating a lot of threads, a lot of stacks, and suffering a lot of context switches. If you can use an asynchronous API you can usually exploit pool threads for the brief moments when each operation is doing something. This is great for high throughput server environments. Having multiple cores is great, but being efficient is better.
In C# 5 this becomes, via await, no more work than your second example.
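As a rough illustration of that claim, here is the same request written with await. It is a sketch, not the asker's code: the WebTestAsync class and FetchStatusAsync method are names invented here, and it relies on WebRequest.GetResponseAsync, which was added in .NET 4.5 (newer code would more typically use HttpClient).

```csharp
using System;
using System.Net;
using System.Threading.Tasks;

public static class WebTestAsync
{
    // Sends the request without blocking the calling thread while waiting,
    // then returns the HTTP status code of the response.
    public static async Task<HttpStatusCode> FetchStatusAsync(string url)
    {
        var request = (HttpWebRequest)WebRequest.Create(url);
        using (var response = (HttpWebResponse)await request.GetResponseAsync())
        {
            return response.StatusCode;
        }
    }

    public static void Main()
    {
        // A console Main has nothing else to do, so it simply waits here;
        // a UI or server caller would await the method instead.
        HttpStatusCode status = FetchStatusAsync("http://www.google.com")
            .GetAwaiter().GetResult();
        Console.WriteLine(status);
    }
}
```

The method body reads top to bottom like the synchronous version; the callback plumbing from the Begin/End pattern is generated by the compiler.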
I was reading this the other day and a similar question has been pondered before:
Performance difference between Synchronous HTTP Handler and Asynchronous HTTP Handler
1) You are stuck in a single-threaded environment such as Silverlight. Here you have no choice but to use async calls, or the entire UI thread will lock up.
2) You have many calls that take a long time to process. Why block your entire thread when it can go on and do other things while waiting for the return? For example if I have five function calls that each take 5 seconds, I would like to start all of them right away and have them return as necessary.
3) Too much data to process in the output synchronously. If I have a program that writes 10 gigabytes of data to the console and I want to read the output, asynchronously I have a chance to process it line by line. If I do this synchronously, I will run out of buffer space and lock up the program.
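Point 2 in the list above (start all the slow calls right away) can be sketched as follows. The slow call is simulated with Task.Delay, and the delay is shortened from 5 seconds to 500 ms so the demo finishes quickly; SlowCallAsync and RunAllAsync are names invented for this example.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class ConcurrentCalls
{
    // Stand-in for a slow remote call; the delay simulates waiting on I/O.
    private static async Task<int> SlowCallAsync(int id, int delayMs)
    {
        await Task.Delay(delayMs);
        return id;
    }

    // Starts `count` calls at once, waits for all of them,
    // and returns how long the whole batch took.
    public static async Task<TimeSpan> RunAllAsync(int count, int delayMs)
    {
        var watch = Stopwatch.StartNew();
        Task<int>[] calls = Enumerable.Range(0, count)
            .Select(i => SlowCallAsync(i, delayMs))
            .ToArray(); // ToArray forces every call to start now
        await Task.WhenAll(calls);
        return watch.Elapsed;
    }

    public static void Main()
    {
        TimeSpan elapsed = RunAllAsync(5, 500).GetAwaiter().GetResult();
        Console.WriteLine("5 calls finished in about {0:F0} ms", elapsed.TotalMilliseconds);
    }
}
```

Because the five waits overlap, the batch should take roughly as long as one call rather than five times as long, and no thread is blocked while the delays are pending.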
