This past year we've been working on a new web app that calls into our company's existing service layer.
We made the decision to wrap up all of our service oriented calls into our own service layer (which I'll refer to as our web-service layer) so that the details of which services we use (we'll move to a new API at some point in the future) are hidden from the web layer itself.
We also decided that most of our web-service layer methods would return Task<T>.
As it stands, the underlying services that we call are not async and so there are some concerns that our web-service layer will max out the available threads and cause problems when we have a large volume of users.
I'm looking for information, one way or the other, to further understand how our decision to return Task<T> will impact our site and whether or not we need to consider changing our return types.
We'll be moving to VS2012 at some point but right now we're using VS2010 and are not using async and await.
As it stands, the underlying services that we call are not async and
so there are some concerns that our web-service layer will max out the
available threads and cause problems when we have a large volume of
users.
Yes, that's something you should be concerned about. I would recommend doing this only if you have truly asynchronous methods. Simply wrapping blocking synchronous methods in an async API will be worse than calling the synchronous methods directly from the consuming code. In an ASP.NET application you benefit from asynchronous calls only if the underlying API relies on I/O Completion Ports. Otherwise it's just a waste of resources.
The only useful scenario for doing that is when your methods can be called in parallel instead of sequentially. This is possible only if the methods do not depend on one another. In that case you could indeed wrap your sync methods in tasks.
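As a rough sketch of that parallel scenario, the snippet below starts two independent blocking calls as tasks and waits for both. GetCustomerName and GetOrderCount are hypothetical stand-ins for real service calls, and the sketch uses only .NET 4 APIs, so it does not need async and await. Each task still occupies a thread-pool thread while it blocks.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelCallsSketch
{
    // Hypothetical blocking service calls that do not depend on each other.
    public static string GetCustomerName() { Thread.Sleep(200); return "Alice"; }
    public static int GetOrderCount() { Thread.Sleep(200); return 3; }

    public static string LoadSummary()
    {
        // Start both calls so that they run side by side on thread-pool threads.
        Task<string> nameTask = Task.Factory.StartNew(() => GetCustomerName());
        Task<int> countTask = Task.Factory.StartNew(() => GetOrderCount());

        // Block until both have finished, then combine the results.
        Task.WaitAll(nameTask, countTask);
        return nameTask.Result + " has " + countTask.Result + " orders";
    }
}
```

The trade-off is that the request uses more threads for a shorter time, so this helps latency rather than scalability.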
Related
I need to use a third-party DLL which implements a TCP socket client (in C++) using blocking calls. So basically (pseudocode);
void DoRequest()
{
send(myblockingSocket,data);
recv(myblockingSocket,responsedata);
}
What is the recommended way to make these calls accessible in .NET as asynchronous calls using async-await (without changing the original DLL) ?
I read: https://learn.microsoft.com/en-us/dotnet/standard/async-in-depth#deeper-dive-into-tasks-for-an-io-bound-operation and https://learn.microsoft.com/en-us/dotnet/csharp/async and several other pages and did not find another solution than spawning a new task, which is not recommended to do on I/O bound operations because of the task creation overhead.
What is the recommended way to make these calls accessible in .NET as asynchronous calls using async-await (without changing the original DLL) ?
There is no recommended solution because this isn't possible. Either the DLL itself must be changed/replaced so that it supports asynchrony, or the asynchronous calls will just be running the synchronous code on a background thread - what I call "fake asynchrony" because it appears asynchronous but is actually taking up a thread anyway.
... did not find another solution than spawning a new task, which is not recommended to do on I/O bound operations because of the task creation overhead.
It's actually not recommended for a couple of reasons:
It lies to the upstream code. It says "this API is asynchronous" when it's not. This can lead consumers to make incorrect decisions, e.g., preferring the asynchronous API in a server scenario.
It doesn't provide any actual benefit. Implementing a method with Task.Run forces the consumers to use an additional thread. If you just kept the API synchronous, then consumers can choose to call it with Task.Run or not, depending on their needs.
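A minimal sketch of the second point, assuming a hypothetical BlockingClient.DoRequest wrapper around the DLL call: the library stays synchronous, and each consumer decides whether offloading is worth a thread.

```csharp
using System;
using System.Threading.Tasks;

public static class BlockingClient
{
    // Hypothetical synchronous wrapper around the blocking DLL call.
    public static string DoRequest(string data)
    {
        return "response:" + data;
    }
}

public static class Consumers
{
    // A UI application may offload the call to keep its UI thread responsive.
    public static Task<string> FromUiAsync(string data)
    {
        return Task.Run(() => BlockingClient.DoRequest(data));
    }

    // A server application calls it directly: the request thread would be
    // blocked either way, so Task.Run would only add a thread switch.
    public static string FromServer(string data)
    {
        return BlockingClient.DoRequest(data);
    }
}
```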
I've heard that the responsibility for threading should lie with the application and that I shouldn't use Task.Run or TaskFactory.StartNew in async methods.
However, if I have a library with methods that do quite heavy computation, then to free the threads that, for example, accept ASP.NET Core HTTP requests, couldn't I make the method async and have it run a long-running task? Or should this be a sync method, with the ASP.NET Core application responsible for starting the task?
First, let's think about why we need asynchrony.
Asynchrony is needed either for scalability or for offloading.
For scalability, exposing an async version of that call does nothing, because you typically still consume the same amount of resources you would have if you'd invoked it synchronously, or even a bit more. Scalability is achieved by decreasing the amount of resources you use, and Task.Run() does not decrease them.
For offloading, you can expose async wrappers of your sync methods. This can be very useful for responsiveness, as it allows you to offload long-running operations to a different thread, so the async wrapper does give you some benefit.
Result:
Wrapping a synchronous method in a simple asynchronous façade does not yield any scalability benefits, but it does yield offloading benefits. Even so, exposing only the synchronous method gives you some nice benefits. For example:
Surface area of your library is reduced.
Your users will know whether there are actually scalability benefits to using the exposed asynchronous APIs.
If both the synchronous method and an asynchronous wrapper around it are exposed, the developer may think they should invoke the asynchronous version for scalability reasons, but in reality they will be hurting their throughput by paying the additional offloading overhead without any scalability benefit.
The source is "Should I expose asynchronous wrappers for synchronous methods?" by Stephen Toub, and I strongly recommend reading it.
Update:
Question in the comment:
Scalability is well explained in that article with an example. Consider Thread.Sleep. There are two possible ways to implement an async version of that call:
public Task SleepAsync(int millisecondsTimeout)
{
    return Task.Run(() => Sleep(millisecondsTimeout));
}
And another new implementation:
public Task SleepAsync(int millisecondsTimeout)
{
    TaskCompletionSource<bool> tcs = null;
    var t = new Timer(delegate { tcs.TrySetResult(true); }, null, -1, -1);
    tcs = new TaskCompletionSource<bool>(t);
    t.Change(millisecondsTimeout, -1);
    return tcs.Task;
}
Both of these implementations provide the same basic behavior, both completing the returned task after the timeout has expired. However, from a scalability perspective, the latter is much more scalable. The former implementation consumes a thread from the thread pool for the duration of the wait time, whereas the latter simply relies on an efficient timer to signal the Task when the duration has expired.
So, in your case, wrapping the call with Task.Run gives you offloading, not scalability, but the user of the library is not aware of that.
The user of your library can wrap that call with Task.Run themselves, and I really think they should be the one to do it.
This doesn't exactly answer the question (I think the other answer is good enough for that), but here is some additional advice: be careful with using Task.Run in a library that other people will use. It can cause unexpected thread pool starvation for the library's users. For example, suppose a developer uses a lot of third-party libraries and all of them use Task.Run(). When the developer then uses Task.Run in their own app, it slows the app down, because the thread pool is already used up by the third-party libraries.
Parallelizing work with Parallel.ForEach is a different issue.
I'm confused about async IO operations. In this article Stephen Cleary explains that we should not use Task.Run(() => SomeIoMethod()) because truly async operations should use
standard P/Invoke asynchronous I/O system in .NET
http://blog.stephencleary.com/2013/11/there-is-no-thread.html
However, avoid “fake asynchrony” in libraries. Fake asynchrony is when
a component has an async-ready API, but it’s implemented by just
wrapping the synchronous API within a thread pool thread. That is
counterproductive to scalability on ASP.NET. One prominent example of
fake asynchrony is Newtonsoft JSON.NET, an otherwise excellent
library. It’s best to not call the (fake) asynchronous versions for
serializing JSON; just call the synchronous versions instead. A
trickier example of fake asynchrony is the BCL file streams. When a
file stream is opened, it must be explicitly opened for asynchronous
access; otherwise, it will use fake asynchrony, synchronously blocking
a thread pool thread on the file reads and writes.
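To illustrate the file stream case from the quote, here is a minimal sketch that opens a FileStream explicitly for asynchronous access through the useAsync constructor argument. The class and method names are made up for the example, and the 4096-byte buffer size is an arbitrary choice.

```csharp
using System.IO;
using System.Text;
using System.Threading.Tasks;

public static class AsyncFileSketch
{
    public static async Task<string> ReadAllTextAsync(string path)
    {
        // useAsync: true requests asynchronous (overlapped) I/O from the OS.
        // Without it, the async read methods block a thread-pool thread.
        using (var stream = new FileStream(
            path, FileMode.Open, FileAccess.Read, FileShare.Read,
            4096, useAsync: true))
        using (var reader = new StreamReader(stream, Encoding.UTF8))
        {
            return await reader.ReadToEndAsync();
        }
    }
}
```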
He advises using HttpClient, but internally it uses Task.Factory.StartNew().
Does this mean that HttpClient provides not truly async operations?
Does this mean that HttpClient provides not truly async operations?
Sort of. HttpClient is in an unusual position, since its primary implementation uses HttpWebRequest, which is only partially asynchronous.
In particular, the DNS lookup is synchronous, and I think maybe the proxy resolution, too. After that, it's all asynchronous. So, for most scenarios, the DNS is fast (usually cached) and there isn't a proxy, so it acts asynchronously. Unfortunately, there are enough scenarios (particularly from within corporate networks) where the synchronous operations can cause significant lag.
So, when the team was writing HttpClient, they had three options:
(1) Fix HttpWebRequest (and friends) allowing for fully-asynchronous operations. Unfortunately, this would have broken a fair amount of code. Due to the way inheritance is used as extension points in these objects, adding asynchronous methods would be backwards-incompatible.
(2) Write their own HttpWebRequest equivalent. Unfortunately, this would take a lot of work and they'd lose all the interoperability with existing WebRequest-related code.
(3) Queue requests to the thread pool to avoid the worst-case scenario (blocking synchronous code on the UI thread). Unfortunately, this has the side effects of degrading scalability on ASP.NET, being dependent on a free thread pool thread, and incurring the worst-case scenario cost even for best-case scenarios.
In an ideal world (i.e., when we have infinite developer and tester time), I would prefer (2), but I understand why they chose (3).
On a side note, the code you posted shows a dangerous use of StartNew, which has actually caused problems due to its use of TaskScheduler.Current. This has been fixed in .NET Core - not sure when the fix will roll back into .NET Framework proper.
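For reference, one common way to sidestep the TaskScheduler.Current problem is to pass the scheduler explicitly. The sketch below is an illustration with a made-up helper name, not the code HttpClient actually uses.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class StartNewSketch
{
    // Hypothetical helper: always queues the work to the thread pool,
    // regardless of which TaskScheduler is current at the call site.
    public static Task<int> RunOnThreadPool(Func<int> work)
    {
        return Task.Factory.StartNew(
            work,
            CancellationToken.None,
            TaskCreationOptions.DenyChildAttach,
            TaskScheduler.Default);
    }
}
```

StartNew overloads that take no scheduler use TaskScheduler.Current, which can be a UI or custom scheduler when the call is made from inside another task. Task.Run always targets the default scheduler.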
No, your assumptions are wrong.
StartNew isn't equal to the Run method.
This code is from HttpClientHandler, not HttpClient, and you didn't examine the this.startRequest code from this class. The code you're inspecting is a preparation method, which starts a task on the thread pool, and that task calls the actual code to start an HTTP request.
The HTTP connection is not created at the .NET level of abstraction, and I'm sure that inside startRequest you'll find some P/Invoke method that does the actual work for:
DNS lookup
Socket connection
Sending the request
waiting for the answer
etc.
As you can see, all of the above is logic that really should be called asynchronously, because it runs outside the .NET Framework and some of the operations can be very time-consuming. While waiting for it, the .NET thread is released back to the thread pool to process other tasks.
Why is the practice of returning a Task<T> from Web Api methods not the default and in the methods that you get when you create a new Web Api Controller in Visual Studio?
Are there any disadvantages to doing this?
public class MyController : ApiController
{
    public Task<string> Boo()
    {
        return Task.Factory.StartNew(() =>
        {
            return "Boo";
        });
    }
}
Are there any disadvantages to doing this?
Yes: you're making your code less readable, longer, and less performant for no good reason. I don't see any advantages to doing this.
When to use asynchronous operations:
Your application has to query data from external sources (external services, databases, ...). Using asynchronous operations with Task in this case is key to scalable applications, as your threads are not blocked waiting for the external sources.
You need to do a lot of compute-bound operations. Since compute-bound operations run on the CPU, parallelizing them can greatly improve the application's throughput, especially if your application runs on a multi-core computer.
With that being said, we do not always use async: http://msdn.microsoft.com/en-us/magazine/hh456402.aspx
A typical case is when we don't need to query data from external sources because it's already there:
It can actually benefit a developer to avoid using
async methods in a certain, small set of use cases, particularly for
library methods that will be accessed in a more fine-grained manner.
Typically, this is the case when it’s known that the method may
actually be able to complete synchronously because the data it’s
relying on is already available.
Asynchronous operations with Task do have overhead:
When designing asynchronous methods, the Framework developers spent a
lot of time optimizing away object allocations. This is because
allocations represent one of the largest performance costs possible in
the asynchronous method infrastructure. The act of allocating an
object is typically quite cheap. Allocating objects is akin to filling
your shopping cart with merchandise, in that it doesn't cost you much
effort to put items into your cart; it’s when you actually check out
that you need to pull out your wallet and invest significant
resources. While allocations are usually cheap, the resulting garbage
collection can be a showstopper when it comes to the application’s
performance.
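A small sketch of the "data is already available" case described above, with a made-up CachedLookup class: when the value is cached, the method returns a completed task without entering an async method, so no async state machine is created. The Task.Delay call is only a stand-in for real asynchronous I/O.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class CachedLookup
{
    private readonly ConcurrentDictionary<string, string> _cache =
        new ConcurrentDictionary<string, string>();

    public Task<string> GetAsync(string key)
    {
        string value;
        if (_cache.TryGetValue(key, out value))
        {
            // Fast path: the data is already there, so complete synchronously.
            return Task.FromResult(value);
        }
        return LoadAndCacheAsync(key);
    }

    private async Task<string> LoadAndCacheAsync(string key)
    {
        await Task.Delay(10); // stand-in for a real asynchronous lookup
        string value = key.ToUpperInvariant();
        _cache[key] = value;
        return value;
    }
}
```

Task.FromResult still allocates a Task object on each call, so caching the tasks themselves would reduce allocations further.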
I'm writing a series of ASP.Net Web Api services that basically get data from a database and return it.
We decided for now to reuse previous poorly written Data Access Objects (let's call them PoorDAO) that use ADO.Net to call stored procedures in the database.
One improvement in the future will be to rewrite that data access layer to benefit from Async data calls with Entity Framework.
Because of this, we decided to wrap the PoorDAO's in Repositories implementing an interface that exposes asynchronous methods. The idea is to keep the same interfaces for future EF asynchronous repositories :
// future common interface
public interface ICountryRepository
{
    Task<Country> GetAllCountries();
}
// current implementation hiding a PoorDAO in shame
public class CountryRepository : ICountryRepository
{
    public Task<Country> GetAllCountries()
    {
        var countries = PoorCountryDAO.GetAllcountries(); // poor static API call
        // some data transformation ...
        return Task.FromResult(result);
    }
}
What we have here is basically a synchronous operation hiding in asynchronous clothing. This is all fine, but my question is: while we're at it, wouldn't it be better to make the method entirely async and call await Task.Run(() => poorCountryDAO.GetAllcountries()) instead of just poorCountryDAO.GetAllcountries()?
As far as I can tell, this would free up the IIS thread that the Web Api service HTTP request is currently running on, and create or reuse another thread. That thread would be blocked waiting for the DB to respond instead of the IIS thread being blocked. Is that any better resource-wise? Did I totally misunderstand or overinterpret how Task.Run() works?
Edit : I came across this article which claims that in some cases, asynchronous database calls can result in an 8 fold performance improvement. His scenario is very close to mine. I can't get my head around how that could be possible given the answers here and am a bit perplexed about what to do...
Is that any better resource wise?
No; it's provably worse. The existing Task.FromResult and await is the best solution.
Task.Run, Task.Factory.StartNew, and Task.Start should not be used in an ASP.NET application. They steal threads from the same thread pool that ASP.NET uses, causing extra thread switches. Also, if they are long-running, they will mess with the default ASP.NET thread pool heuristics, possibly causing it to create and destroy threads unnecessarily.
It's the same thing: you're locking up one thread while releasing another. In theory performance is the same, although it will actually be slightly worse because of the overhead of context switching.
A few points: first, for await Task.Run(() => poorCountryDAO.GetAllcountries()), the Task.Run call already gives you a task, so you should just return it rather than awaiting it.
Note that in any case, the fact that this method's Task is really synchronous is an implementation detail. There may be a temptation to wrap the GetAllCountries() call itself in a background thread, but that's a bad idea.
In all of these cases, you're still going to be stuck wasting a thread. The scenario you desire where you free up the IIS thread completely requires the use of "Overlapped IO" for the database calls (as per your link).
Basically, in these cases right now, one way or another, a thread (either the main thread or a worker thread) is going to block when it calls PoorCountryDAO.GetAllcountries(). However, when you switch to the asynchronous DB calls, they will no longer burn a thread at all. If, however, the caller uses its own Task.Run, that will now come back to bite you.