Should data access in Web applications run on different Tasks? - c#

I'm writing a series of ASP.Net Web Api services that basically get data from a database and return it.
We decided for now to reuse previous poorly written Data Access Objects (let's call them PoorDAO) that use ADO.Net to call stored procedures in the database.
One improvement in the future will be to rewrite that data access layer to benefit from Async data calls with Entity Framework.
Because of this, we decided to wrap the PoorDAO's in Repositories implementing an interface that exposes asynchronous methods. The idea is to keep the same interfaces for future EF asynchronous repositories :
// future common interface
public interface ICountryRepository
{
    Task<Country> GetAllCountries();
}
// current implementation hiding a PoorDAO in shame
public class CountryRepository : ICountryRepository
{
    public Task<Country> GetAllCountries()
    {
        var countries = PoorCountryDAO.GetAllcountries(); // poor static API call
        // some data transformation ...
        return Task.FromResult(result);
    }
}
What we have here is basically a synchronous operation hiding in asynchronous clothing. This is all fine, but my question is: while we're at it, wouldn't it be better to make the method entirely async and call await Task.Run(() => PoorCountryDAO.GetAllcountries()) instead of just PoorCountryDAO.GetAllcountries()?
As far as I can tell, this would free up the IIS thread the Web Api service HTTP request is currently running on, and create or reuse another thread. This thread would be blocked waiting for the DB to respond instead of the IIS thread being blocked. Is that any better resource wise ? Did I totally misunderstand or overinterpret how Task.Run() works ?
Edit : I came across this article which claims that in some cases, asynchronous database calls can result in an 8 fold performance improvement. His scenario is very close to mine. I can't get my head around how that could be possible given the answers here and am a bit perplexed about what to do...

Is that any better resource wise?
No; it's provably worse. The existing Task.FromResult and await is the best solution.
Task.Run, Task.Factory.StartNew, and Task.Start should not be used in an ASP.NET application. They steal threads from the same thread pool that ASP.NET uses, causing extra thread switches. Also, if they are long-running, they will mess with the default ASP.NET thread pool heuristics, possibly causing it to create and destroy threads unnecessarily.

It's the same thing: you're locking up one thread while releasing another. In theory performance is the same, although in practice it will be slightly worse because of the overhead of context switching.

A few points: first, for await Task.Run(() => poorCountryDAO.GetAllcountries()), Task.Run(() => poorCountryDAO.GetAllcountries()) already gives you a task, so you should just return that rather than awaiting it.
Note that in any case, the fact that this method's Task is really synchronous is an implementation detail. There may be a temptation to wrap the GetAllCountries() call itself in a background thread, but that's a bad idea.
In all of these cases, you're still going to be stuck wasting a thread. The scenario you desire where you free up the IIS thread completely requires the use of "Overlapped IO" for the database calls (as per your link).
Basically, in these cases right now, one way or another, a thread (either the main thread or a worker thread) is going to block when it calls PoorCountryDAO.GetAllcountries(). However, when you switch to the asynchronous DB calls, they will no longer burn a thread at all. If, however, the caller uses its own Task.Run, that will now come back to bite you.
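To make the recommended shape concrete, here is a minimal sketch of a repository that keeps the asynchronous interface while running the legacy call synchronously on the request thread. Everything in it is an assumption for illustration: the Country record, the PoorCountryDao stub and its string[] rows, the IReadOnlyList return type, and the Async method suffix are not from the original code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public record Country(string Code, string Name);

// Hypothetical stand-in for the legacy synchronous DAO.
public static class PoorCountryDao
{
    public static List<string[]> GetAllCountries() =>
        new() { new[] { "FR", "France" }, new[] { "JP", "Japan" } };
}

public interface ICountryRepository
{
    Task<IReadOnlyList<Country>> GetAllCountriesAsync();
}

public sealed class CountryRepository : ICountryRepository
{
    public Task<IReadOnlyList<Country>> GetAllCountriesAsync()
    {
        // Runs synchronously on the calling thread. No Task.Run, so no
        // second thread-pool thread is borrowed just to block on the DAO.
        IReadOnlyList<Country> result = PoorCountryDao.GetAllCountries()
            .Select(row => new Country(row[0], row[1]))
            .ToList();

        // Wrap the already-computed value in a completed task.
        return Task.FromResult(result);
    }
}

public static class Program
{
    public static async Task Main()
    {
        ICountryRepository repository = new CountryRepository();
        IReadOnlyList<Country> countries = await repository.GetAllCountriesAsync();
        Console.WriteLine($"Loaded {countries.Count} countries");
    }
}
```

When the data layer is later rewritten with truly asynchronous database calls, only the body of GetAllCountriesAsync should need to change; callers that already await the interface method stay as they are.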

Related

Is database.SaveChanges() equivalent to await database.SaveChangesAsync()?

Is await database.SaveChangesAsync() functionally equivalent to database.SaveChanges() ? If so, are there any benefits in using the async version if I need to update the database immediately ? Are there even more overheads in doing that ?
Thanks !
Update: I'm updating the question to give a better picture of my use case. This is in response to an answer below.
I'm writing a Web API that calls MediatR. I'm doing CRUD-type transactions, and I really need to update the database and then continue from there. I don't see a need to spawn off a state machine (as I saw in a YouTube video) to process the database update. Am I thinking correctly? If that's the case, I really don't need to do an await database.SaveChangesAsync(), right? I should just do a database.SaveChanges() and let the database update happen on the current thread. That's the purpose of my question. Is this thinking correct?
Lots of codebases provide both async and sync methods. The difference is that the sync method blocks your thread. Essentially, when you await database.SaveChangesAsync(), the task registers a continuation that resumes processing once the result arrives, and in the meantime the thread returns to the pool. Performance-wise, for large systems, the more you make use of async/await the lower the drain on system resources.
Using async await also lets you do things like this
private async Task Foo()
{
    var dataBaseResultTask = database.SaveChangesAsync();
    SomeOtherWorkICanDoWithoutWaiting();
    await dataBaseResultTask;
}
In general, if you can use the async version of a method, you should. But if you're calling the database from a sync method, it's not the end of the world to use the sync variant. They are functionally the same. Generally though, the more you use C# the more async will spread through your system like a virus, eventually you're gonna have to give in.
Updating for updated question:
In an API context, crud operations make a lot of sense to use the async variant. Say you have 10 requests coming in at slightly different times, the 10 method calls will reach the database step, freeze the thread and wait for the database to respond. You now have 10 threads locked up despite your API not actually doing any processing. On the other hand if all the methods do await databaseAsync, all 10 threads will spin up, trigger the database call, and then return to the pool freeing them up for other API calls/processes. Then when replies from the database come back whatever threads are available will spin up, process the replies and then return to the pool again.
For a small API you can get away with either method but best practices wise, your use case is textbook for async await. Tl;dr; The method itself will behave the same with either approach, but use less resources with async await.
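The ten-request scenario can be simulated with a small console sketch. SaveChangesAsyncStub is a hypothetical stand-in for database.SaveChangesAsync(), with Task.Delay modelling the wait for the database; the 200 ms figure is arbitrary. Because none of the waits occupies a thread, the ten calls overlap and the total elapsed time should be close to that of a single call rather than ten times as long.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class Program
{
    // Hypothetical stand-in for database.SaveChangesAsync(): the delay models
    // waiting on the database without holding a thread.
    private static async Task<int> SaveChangesAsyncStub(int requestId)
    {
        await Task.Delay(200);
        return requestId;
    }

    public static async Task Main()
    {
        var stopwatch = Stopwatch.StartNew();

        // Start ten "requests"; each one returns to the caller at its first await.
        Task<int>[] pending = Enumerable.Range(1, 10)
            .Select(id => SaveChangesAsyncStub(id))
            .ToArray();

        // Resume only once every simulated database call has completed.
        int[] results = await Task.WhenAll(pending);
        stopwatch.Stop();

        Console.WriteLine(
            $"Completed {results.Length} calls in about {stopwatch.ElapsedMilliseconds} ms");
    }
}
```

This only illustrates that pending awaits do not consume threads; it is not a benchmark of a real database or web server.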

Do I always need to use async/await?

I wanted to ask you about async/await. Namely, why does it always need to be used? (all my friends say so)
Example 1.
public async Task Boo()
{
    await WriteInfoIntoFile("file.txt");
    // some other logic...
}
I have a Boo method, inside which I write something to a file and then execute some logic. Asynchrony is used here so that the thread is not blocked while the information is being written to the file. Everything is logical.
Example 2.
public async Task Bar()
{
    var n = await GetNAsync(nId);
    _uow.NRepository.Remove(n);
    await _uow.CompleteAsync();
}
But for the second example, I have a question. Why here asynchronously get the entity, if without its presence it will still be impossible to work further?
why does it always need to be used?
It shouldn't always be used. Ideally (and especially for new code), it should be used for most I/O-based operations.
Why here asynchronously get the entity, if without its presence it will still be impossible to work further?
Asynchronous code is all about freeing up the calling thread. This brings two kinds of benefits, depending on where the code is running.
If the calling thread is a UI thread inside a GUI application, then asynchrony frees up the UI thread to handle user input. In other words, the application is more responsive.
If the calling thread is a server-side thread, e.g., an ASP.NET request thread, then asynchrony frees up that thread to handle other user requests. In other words, the server is able to scale further.
Depending on the context, you might or might not get some benefit. In case you call the second function from a desktop application, it allows the UI to stay responsive while the async code is being executed.
Why here asynchronously get the entity, if without its presence it will still be impossible to work further?
You are correct in the sense that this stream of work cannot proceed, but using async versions allows freeing up the thread to do other work:
I like this paragraph from Using Asynchronous Methods in ASP.NET MVC 4 to explain the benefits:
Processing Asynchronous Requests
In a web app that sees a large number of concurrent requests at start-up or has a bursty load (where concurrency increases suddenly), making web service calls asynchronous increases the responsiveness of the app. An asynchronous request takes the same amount of time to process as a synchronous request. If a request makes a web service call that requires two seconds to complete, the request takes two seconds whether it's performed synchronously or asynchronously. However during an asynchronous call, a thread isn't blocked from responding to other requests while it waits for the first request to complete. Therefore, asynchronous requests prevent request queuing and thread pool growth when there are many concurrent requests that invoke long-running operations.
Not sure what you mean by
without its presence it will still be impossible to work further
regarding example 2. As far as I can tell this code gets an entity by id from its repository asynchronously, removes it, then completes the transaction on its Unit of Work. Do you mean why it does not simply remove the entry by id? That would certainly be an improvement, but would still leave you with an asynchronous method as CompleteAsync is obviously asynchronous?
As to your general question, I don't think there is a general consensus to always use async/await.
In your second example, the async/await keywords get the value of the n variable asynchronously. This might be necessary because the GetNAsync method is likely performing some time-consuming operation, such as querying a database or calling a web service downstream, that would otherwise block the calling thread. By awaiting the method, the calling thread is released to do other work while the query is in flight; the rest of the Bar method resumes once the result is available.
But if GetNAsync just calls another local method that does some basic CPU-bound task, then the async is pointless in my view. Async works well when you are sure you need to wait, such as on network calls or I/O-bound calls that will definitely add latency to your stack.

Proper method of creating, not awaiting, and ensuring completion of, Tasks

Preface: I don't have a good understanding of the underlying implementation of tasks in C#, only their usage. Apologies for anything I butcher below:
I'm unable to find a good answer to the question of "How can I start a task but not await it?" in C#. More specifically, how can I guarantee that the task completes even if the context of the task is finalized/destroyed?
_ = someFunctionAsync(); is satisfactory for launching and forgetting about a task, but what if the parent is transient? What if the task cannot complete before the parent task? This is a frequent occurrence in controller methods, and tasks written in the fashion _ = someFunctionAsync(); are subject to cancellation.
Example Code:
[HttpGet]
public IActionResult Get()
{
    _ = DoSomethingAsync();
    return StatusCode(204);
}
In order to combat this cancellation, I created a (fairly stupid, imo) static class to hold onto the tasks so that they have time to complete, but it does not work, as the tasks are cancelled when the parent controller is destroyed:
public static class IncompleteTaskManager
{
    private static ConcurrentBag<Task> _incompleteTasks = new();
    private static event EventHandler<Task>? _onTaskCompleted;
    public static void AddTask(Task t)
    {
        _onTaskCompleted += (sender, task) =>
        {
            // inner lambda parameter renamed: reusing the name "task" here does not compile
            _incompleteTasks = new ConcurrentBag<Task>(_incompleteTasks.Where(pending => pending != t));
        };
        _incompleteTasks.Add(CreateTaskWithRemovalEvent(t));
    }
    private static async Task CreateTaskWithRemovalEvent(Task t)
    {
        await t;
        _onTaskCompleted?.Invoke(null, t);
    }
}
Plus, this seems convoluted and feels like a bad solution to a simple problem. So, how the heck do I handle this? What is the proper way of starting a task, forgetting about it, but guaranteeing it runs to completion?
Edit 1, in case anyone suggests it: I've read posts suggesting that _ = Task.Run(async () => await someFunctionAsync()); may serve my needs, but this is not the case either. Though another thread runs the method, its context is lost as well and the task is cancelled, cancelling the child task.
Edit 2: I realize that the controller example is not necessarily the best, as I could simply write the code differently to respond immediately, then wait for the method to complete before disposing of the controller:
[HttpGet]
public async Task Get()
{
    base.Response.StatusCode = 204;
    await base.Response.CompleteAsync(); // Returns 204 to caller here.
    await DoSomethingAsync();
}
There's a lot to unpack here. I'll probably miss a few details, but let me share a few things that should set up a pretty good foundation.
Fundamentally, what it sounds like you're asking about is how to create background tasks in ASP.NET. In .NET Framework 4.x, there was a QueueBackgroundWorkItem method created for this purpose: it gave your task a new cancellation token to use instead of the one provided by the controller action, and it switched you to a different context for the action you provided.
In ASP.NET Core, there are more powerful (but more complicated) IHostedService implementations, including the BackgroundService, but there's nothing quite as simple as QueueBackgroundWorkItem. However, the docs include an example showing how you can use a BackgroundService to basically write your own implementation of the same thing. If you use their code, you should be able to inject an IBackgroundTaskQueue into your controller and call QueueBackgroundWorkItemAsync to enqueue a background task.
Both of these approaches take care of the need to have something await the tasks that get started. You can never truly "guarantee" completion of any given tasks, but they can at least handle the common use cases more gracefully. For example, they let your hosting environment (e.g. IIS) know that something is still running, so it doesn't automatically shut down just because no requests are coming in. And if the hosting environment is being instructed to shut down, it can signal that fact through the cancellation tokens and you can hopefully quickly get your task into a safe state for shutting down rather than being unceremoniously aborted.
They also handle the problem of uncaught exceptions in the background tasks: the exceptions are caught and logged instead of either being silently eaten or completely killing the application.
Neither of these does anything to help you maintain context about things like the current request or user. This is sensible, because the whole point is to allow an action to extend beyond the scope of any given request. So you'll need to write any code you call in these actions to not rely on HttpContext/IHttpContextAccessor or anything stateful like that. Instead, gather what information you need from the context prior to enqueueing the background task, and pass that information along as variables and parameters to downstream code. This is usually good practice anyway, since the HTTP context is a responsibility that should stay in controller-level code, while most of your business logic should think in terms of business-level models instead. And relying on state is usually best avoided where possible, to create software that's more reliable, testable, etc.
For other types of applications, there are other approaches you'd need to take. Usually it's best to do an internet search for [framework] background tasks where [framework] is the framework you're working in (WPF, e.g.). Different frameworks will have different restrictions. For example, if you write a console app that expects to run without any interaction beyond the command-line arguments, the Task returned from your Main function will probably need to await all the tasks that you start therein. A WPF app, on the other hand, might kick off several background tasks when events like button clicks are invoked, but there are tricks to make sure you do CPU-intensive work on background threads while only interacting with UI elements while on the UI thread.
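As a rough illustration of the queue-based idea for ASP.NET Core described above, here is a minimal in-memory sketch that uses only System.Threading.Channels so it runs as a console app. The BackgroundTaskQueue class, its TryEnqueue, Complete and ProcessAsync members, and the userId value are all hypothetical names for this sketch; this is not the code from the docs. In a real application the consumer loop would typically live in a BackgroundService.

```csharp
using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical in-memory work queue; names are illustrative only.
public sealed class BackgroundTaskQueue
{
    private readonly Channel<Func<CancellationToken, Task>> _queue =
        Channel.CreateUnbounded<Func<CancellationToken, Task>>();

    // Called from request-handling code; returns immediately.
    public bool TryEnqueue(Func<CancellationToken, Task> workItem) =>
        _queue.Writer.TryWrite(workItem);

    // Signals that no more work will be added, so the consumer loop can end.
    public void Complete() => _queue.Writer.Complete();

    // Consumer loop: awaits each work item so nothing is left unobserved.
    public async Task ProcessAsync(CancellationToken stoppingToken)
    {
        await foreach (var workItem in _queue.Reader.ReadAllAsync(stoppingToken))
        {
            try
            {
                await workItem(stoppingToken);
            }
            catch (Exception ex)
            {
                // Log instead of letting one failed item stop the loop.
                Console.Error.WriteLine($"Background work item failed: {ex.Message}");
            }
        }
    }
}

public static class Program
{
    public static async Task Main()
    {
        var queue = new BackgroundTaskQueue();
        using var shutdown = new CancellationTokenSource();
        Task consumer = queue.ProcessAsync(shutdown.Token);

        // Copy what is needed out of the (hypothetical) request before enqueueing.
        int userId = 42;
        queue.TryEnqueue(async cancellationToken =>
        {
            await Task.Delay(100, cancellationToken);
            Console.WriteLine($"Processed work for user {userId}");
        });

        queue.Complete();
        await consumer;
    }
}
```

Note that this queue lives in process memory, so queued work is lost if the process exits; it handles the common cases more gracefully but does not guarantee completion.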
how can I guarantee that the task completes even if the context of the task is finalized/destroyed?
...
the tasks are cancelled when the parent controller is destroyed
...
Though another thread runs the method, its context is lost as well and the task is cancelled, cancelling the child task.
Your core question is about how to run a task that continues running after the request completes. So there is no way to preserve the request context. Any solution you use must copy any necessary information out of the request context before the request completes.
Plus, this seems convoluted and feels like a bad solution to a simple problem. So, how the heck do I handle this? What is the proper way of starting a task, forgetting about it, but guaranteeing it runs to completion?
That last part is the stickler: "guaranteeing it runs to completion". Discarding tasks, using Task.Run, and using an in-memory collection of in-progress tasks are all incorrect solutions in this case.
The only correct solution is even more convoluted than these relatively simple approaches: you need a basic distributed architecture (explained in detail on my blog). Specifically:
A durable queue (e.g., Azure Queue). This holds the serialized representation of the work to be done - including any values from the request context.
A background processor (e.g., Azure Function). I prefer independent background processors, but it's also possible to use BackgroundService for this.
The durable queue is the key; it's the only way to guarantee the tasks will be executed.

Why does an async single task run faster than a normal single task?

I have a method which has just one task to do and has to wait for that task to complete:
public async Task<JsonResult> GetAllAsync()
{
    var result = await this.GetAllDBAsync();
    return Json(result, JsonRequestBehavior.AllowGet);
}
public async Task<List<TblSubjectSubset>> GetAllDBAsync()
{
    return await model.TblSubjectSubsets.ToListAsync();
}
It is significantly faster than when I run it without async-await.
We know
The async and await keywords don't cause additional threads to be created. Async methods don't require multithreading because an async method doesn't run on its own thread. The method runs on the current synchronization context and uses time on the thread only when the method is active.
According to this link: https://msdn.microsoft.com/en-us/library/hh191443.aspx#BKMK_Threads. What is the reason for being faster when we don't have another thread to handle the job?
"Asynchronous" does not mean "faster."
"Asynchronous" means "performs its operation in a way that it does not require a thread for the duration of the operation, thus allowing that thread to be used for other work."
In this case, you're testing a single request. The asynchronous request will "yield" its thread to the ASP.NET thread pool... which has no other use for it, since there are no other requests.
I fully expect asynchronous handlers to run slower than synchronous handlers. This is for a variety of reasons: there's the overhead of the async/await state machine, and extra work when the task completes to have its thread enter the request context. Besides this, the Win32 API layer is still heavily optimized for synchronous calls (expect this to change gradually over the next decade or so).
So, why use asynchronous handlers then?
For scalability reasons.
Consider an ASP.NET server that is serving more than one request - hundreds or thousands of requests instead of a single one. In that case, ASP.NET will be very grateful for the thread returned to it during its request processing. It can immediately use that thread to handle other requests. Asynchronous requests allow ASP.NET to handle more requests with fewer threads.
This is assuming your backend can scale, of course. If every request has to hit a single SQL Server, then your scalability bottleneck will probably be your database, not your web server.
But if your situation calls for it, asynchronous code can be a great boost to your web server scalability.
For more information, see my article on async ASP.NET.
I agree with Orbittman when he mentions the overhead involved in the application architecture. It doesn't make for a very good benchmark premise since you can't be sure if the degradation can indeed be solely attributed to the async vs non-async calls.
I've created a really simple benchmark to get a rough comparison between an async and a synchronous call and async loses every time in the overall timing actually, though the data gathering section always seems to end up the same. Have a look: https://gist.github.com/mattGuima/25cb7893616d6baaf970
Having said that, the same thought regarding the architecture applies. Frameworks handle async calls differently: Async and await - difference between console, Windows Forms and ASP.NET
The main thing to remember is to never confuse async with performance gain, because the two are unrelated and most often it will result in no gain at all, especially with CPU-bound code. Look at the Parallel library for that instead.
Async await is not the silver bullet that some people think it is and in your example is not required. If you were processing the result of the awaitable operation after you received it then you would be able to return a task and continue on the calling thread. You wouldn't have to then wait for the rest of the operation to complete. You would be correct to remove the async/await in the above code.
It's not really possible to answer the question without seeing the calling code either, as it depends on what the context is trying to do with the response. What you are getting back is not just a Task but a task in the context of the method that will continue when complete. See http://codeblog.jonskeet.uk/category/eduasync/ for much better information regarding the inner workings of async/await.
Lastly, I would question your timings, as with an Ajax request to a database and back there are other areas with potentially greater latency, such as the HTTP request and response and the DB connection itself. I assume that you're using an ORM, and that alone can cause an overhead. I wonder whether it's the async/await that is the problem.

How to use non-thread-safe async/await APIs and patterns with ASP.NET Web API?

This question has been triggered by EF Data Context - Async/Await & Multithreading. I've answered that one, but haven't provided any ultimate solution.
The original problem is that there are a lot of useful .NET APIs out there (like Microsoft Entity Framework's DbContext), which provide asynchronous methods designed to be used with await, yet they are documented as not thread-safe. That makes them great for use in desktop UI apps, but not for server-side apps. [EDITED] This might not actually apply to DbContext, here is Microsoft's statement on EF6 thread safety, judge for yourself. [/EDITED]
There are also some established code patterns falling into the same category, like calling a WCF service proxy with OperationContextScope (asked here and here), e.g.:
using (var docClient = CreateDocumentServiceClient())
using (new OperationContextScope(docClient.InnerChannel))
{
    return await docClient.GetDocumentAsync(docId);
}
This may fail because OperationContextScope uses thread local storage in its implementation.
The source of the problem is AspNetSynchronizationContext, which is used in asynchronous ASP.NET pages to fulfill more HTTP requests with fewer threads from the ASP.NET thread pool. With AspNetSynchronizationContext, an await continuation can be queued on a different thread from the one which initiated the async operation, while the original thread is released to the pool and can be used to serve another HTTP request. This substantially improves the server-side code scalability. The mechanism is described in great detail in It's All About the SynchronizationContext, a must-read. So, while there is no concurrent API access involved, a potential thread switch still prevents us from using the aforementioned APIs.
I've been thinking about how to solve this without sacrificing the scalability. Apparently, the only way to have those APIs back is to maintain thread affinity for the scope of the async calls potentially affected by a thread switch.
Let's say we have such thread affinity. Most of those calls are IO-bound anyway (There Is No Thread). While an async task is pending, the thread it originated on can be used to serve the continuation of another similar task whose result is already available. Thus, it shouldn't hurt scalability too much. This approach is nothing new; in fact, a similar single-threaded model is successfully used by Node.js. IMO, this is one of those things that make Node.js so popular.
I don't see why this approach could not be used in ASP.NET context. A custom task scheduler (let's call it ThreadAffinityTaskScheduler) might maintain a separate pool of "affinity apartment" threads, to improve scalability even further. Once the task has been queued to one of those "apartment" threads, all await continuations inside the task will be taking place on the very same thread.
Here's how a non-thread-safe API from the linked question might be used with such ThreadAffinityTaskScheduler:
// create a global instance of ThreadAffinityTaskScheduler - per web app
public static class GlobalState
{
    public static ThreadAffinityTaskScheduler TaScheduler { get; private set; }
    // static constructor: runs once, before the class is first used
    static GlobalState()
    {
        GlobalState.TaScheduler = new ThreadAffinityTaskScheduler(
            numberOfThreads: 10);
    }
}
// ...
// run a task which uses non-thread-safe APIs
// (the lambda must be async because it uses await)
var result = await GlobalState.TaScheduler.Run(async () =>
{
    using (var dataContext = new DataContext())
    {
        var something = await dataContext.someEntities.FirstOrDefaultAsync(e => e.Id == 1);
        var morething = await dataContext.someEntities.FirstOrDefaultAsync(e => e.Id == 2);
        // ...
        // transform "something" and "morething" into thread-safe objects and return the result
        return data;
    }
}, CancellationToken.None);
I went ahead and implemented ThreadAffinityTaskScheduler as a proof of concept, based on Stephen Toub's excellent StaTaskScheduler. The pool threads maintained by ThreadAffinityTaskScheduler are not STA threads in the classic COM sense, but they do implement thread affinity for await continuations (SingleThreadSynchronizationContext is responsible for that).
So far, I've tested this code as console app and it appears to work as designed. I haven't tested it inside an ASP.NET page yet. I don't have a lot of production ASP.NET development experience, so my questions are:
Does it make sense to use this approach over simple synchronous invocation of non-thread-safe APIs in ASP.NET (the main goal is to avoid sacrificing scalability)?
Are there alternative approaches, besides using synchronous API invocations or avoiding those APIs altogether?
Has anyone used something similar in ASP.NET MVC or Web API projects and is ready to share his/her experience?
Any advice on how to stress-test and profile this approach with ASP.NET would be appreciated.
Entity Framework will (should) handle thread jumps across await points just fine; if it doesn't, then that's a bug in EF. OTOH, OperationContextScope is based on TLS and is not await-safe.
1. Synchronous APIs maintain your ASP.NET context; this includes things such as user identity and culture that are often important during processing. Also, a number of ASP.NET APIs assume they are running on an actual ASP.NET context (I don't mean just using HttpContext.Current; I mean actually assuming that SynchronizationContext.Current is an instance of AspNetSynchronizationContext).
2-3. I have used my own single-threaded context nested directly within the ASP.NET context, in attempts to get async MVC child actions working without having to duplicate code. However, not only do you lose the scalability benefits (for that part of the request, at least), you also run into the ASP.NET APIs assuming that they're running on an ASP.NET context.
So, I have never used this approach in production. I just end up using the synchronous APIs when necessary.
You should not intertwine multithreading with asynchrony. The problem with an object not being thread-safe is when a single instance (or static) is accessed by multiple threads at the same time. With async calls the context is possibly accessed from a different thread in the continuation, but never at the same time (when not shared across multiple requests, but that isn't good in the first place).
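The distinction drawn in these answers, between state tied to a thread and state that follows the asynchronous flow, can be seen in a small console sketch. It is only an illustration with made-up field names: a [ThreadStatic] field stands in for thread-local storage of the kind OperationContextScope relies on, and AsyncLocal stands in for state that flows across await. In a console app, which has no synchronization context, the continuation after the await typically resumes on a thread-pool thread, so the thread-static value is usually no longer visible while the async-local value is.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class Program
{
    // Thread-local storage: each thread sees its own copy of this field.
    [ThreadStatic] private static string? _threadStaticValue;

    // Async-local storage: the value flows with the logical call across awaits.
    private static readonly AsyncLocal<string?> _asyncLocalValue = new();

    public static async Task Main()
    {
        _threadStaticValue = "set before await";
        _asyncLocalValue.Value = "set before await";
        int threadBefore = Environment.CurrentManagedThreadId;

        // The continuation may be scheduled on a different thread.
        await Task.Delay(50).ConfigureAwait(false);

        int threadAfter = Environment.CurrentManagedThreadId;
        Console.WriteLine($"thread before: {threadBefore}, after: {threadAfter}");
        Console.WriteLine($"thread-static: {_threadStaticValue ?? "<null>"}");
        Console.WriteLine($"async-local:   {_asyncLocalValue.Value ?? "<null>"}");
    }
}
```

At no point do two threads touch the state at the same time, which is why an object that is merely not thread-safe can survive the thread switch, while one that depends on thread-local storage cannot.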
