I am posting this partly out of interest in how the Task Parallel Library works and to spread knowledge, and partly to investigate whether my "Cancellation" updates are the reason for a new issue where the user is suddenly logged out.
The project I am working on has these components:
Web forms site. A website that acts as a portal for administering company vehicles. Referred to below as "Web".
WCF web service. A backend service on a separate machine. Referred to below as "Service".
Third party service. Referred to below as "3rd".
Note: I am using .NET 4.0. Therefore the newer updates to the Task Parallel Library are not available.
The issue that I was assigned to fix was that the login function was very slow and CPU intensive. This was later admitted to be a problem in the Third party service, but I tried to optimize the login behavior as well as I could.
The login request and response don't contain particularly much data, but several API calls are made to the Third party service to gather the response data.
1. Pre changes
The Web invokes a WCF method on the Service for gathering "session data".
This method would sometimes take so long that it would time out (I think the timeout was set to 1 minute).
A pseudo representation of the "GetSessionData" method:
var agreements = getAgreements(request);
foreach (var agreement in agreements)
{
getAgreementDetails(agreement);
var customers = getCustomersWithAgreement(agreement);
foreach (var customer in customers)
{
getCustomerInfo(customer);
getCustomerAddress(customer);
getCustomerBranches(customer);
}
}
var person = getPerson(request);
var accounts = getAccount(person.Id);
foreach (var account in accounts)
{
var accountDetail = getAccountDetail(account.Id);
foreach (var vehicle in accountDetail.Vehicles)
{
getCurrentMilageReport(vehicle.Id);
}
}
return sessionData;
See gist for code snippet.
This method quickly becomes heavy the more agreements and accounts the user has.
2. Parallel.ForEach
I figured that I could replace the foreach loops with Parallel.ForEach(). This greatly improved the speed of the method for users with many agreements and accounts.
See gist for code snippet.
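The gist itself is not shown here, but a minimal sketch of the idea, reusing the pseudo method names from above and assuming the helper calls are thread-safe and any shared collections are synchronized, would be:

    // Sketch: the outer loops from GetSessionData rewritten with Parallel.ForEach.
    var agreements = getAgreements(request);

    Parallel.ForEach(agreements, agreement =>
    {
        getAgreementDetails(agreement);

        var customers = getCustomersWithAgreement(agreement);
        Parallel.ForEach(customers, customer =>
        {
            getCustomerInfo(customer);
            getCustomerAddress(customer);
            getCustomerBranches(customer);
        });
    });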
3. Cancel
Another problem we had was that when the web service server is maxed out on CPU usage, all method calls become much slower and can result in a timeout for the user. A popular response to a timeout is to try again, so the user triggers another login attempt, which is "queued"(?) due to the high CPU usage, all while the first request has not yet returned.
We discovered that the request is still alive on the Service even if the web site times out. So we decided to implement a similar timeout on the Service side.
See gist for code snippet.
The idea is that GetSessionData(..) is invoked with a CancellationToken that triggers Cancel after about the same time as the Web timeout, so that no work is done if no one is there to show or use the results.
I also implemented the cancellation for the method calls to the Third party service.
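The gist is not reproduced here, but a minimal sketch of the idea on .NET 4.0 (where CancellationTokenSource.CancelAfter is not available, so a Timer stands in for it; the 55-second figure is an assumption, just under the Web timeout) could look like this:

    // One CancellationTokenSource per login request; a timer cancels it slightly
    // before the Web side would give up, so abandoned work is stopped.
    var cts = new CancellationTokenSource();
    var serviceTimeout = TimeSpan.FromSeconds(55);   // assumed: just under the Web timeout

    using (new System.Threading.Timer(_ => cts.Cancel(), null,
                                      (int)serviceTimeout.TotalMilliseconds,
                                      Timeout.Infinite))
    {
        return GetSessionData(request, cts.Token);
    }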
Is it correct to share the same CancellationToken for all of the loops and service calls? Could there be an issue when all threads are "aborted" by throwing the cancel exception?
See gist for code snippet.
Is it correct to share the same CancellationToken for all of the loops and service calls? Could there be an issue when all threads are "aborted" by throwing the cancel exception?
Yes, it is correct. And yes, there could be an issue with throwing a lot of exceptions at the same time, but only in specific situations with a huge amount of parallel work.
Several hints:
Use one CancellationTokenSource per complete action, for example per request. Pass the same CancellationToken from this source to every asynchronous method
You can avoid throwing an exception and just return from a method. Later, to check that the work was done and nothing was cancelled, check IsCancellationRequested on the token source
Check the token for cancellation inside loops on each iteration and just return if cancellation was requested (see the sketch after this list)
Use threads only when there is IO work, for example when you query something from a database or send requests to other services; don't use them for CPU-bound work
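A minimal sketch of the "check and return" pattern (assumed names, not the poster's actual code):

    // No exception is thrown; each iteration just bails out if cancellation was requested.
    Parallel.ForEach(agreements, agreement =>
    {
        if (token.IsCancellationRequested)
            return;                                   // skip the remaining work for this item

        getAgreementDetails(agreement);
        var customers = getCustomersWithAgreement(agreement);
        // ... same check inside the inner loop ...
    });

    // Afterwards the caller checks the token once and simply discards the
    // result if the request was abandoned.
    if (token.IsCancellationRequested)
        return null;

If you instead pass the token via ParallelOptions.CancellationToken, the loop stops scheduling new iterations but signals cancellation by throwing an OperationCanceledException, which is the exception-heavy behaviour the hint above avoids.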
I was tired at the end of the working day and suggested a bad thing. Mainly, you don't need threads for IO-bound work, for example waiting for a response from the database or the third-party service. Use threads only for CPU computations.
Also, I reviewed your code again and found several bottlenecks:
You can call GetAgreementDetail, GetFuelCards, GetServiceLevels and GetCustomers asynchronously; don't wait for each one before starting the next, run all four requests concurrently (see the sketch after this list)
You can call GetAddressByCustomer and GetBranches in parallel as well
I noticed that you use a mutex. I guess it is for protecting agreementDto.Customers and response.Customers on addition. If so, you can reduce the scope of the lock
You can start the work with Vehicles earlier, as you know UserId at the beginning of the method; do it in parallel too
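A rough sketch of the first three hints on .NET 4.0 (the DTO, helper and lock names are assumptions, not the poster's actual code):

    // Start the four per-agreement calls together instead of sequentially.
    var detailTask    = Task.Factory.StartNew(() => GetAgreementDetail(agreement));
    var fuelTask      = Task.Factory.StartNew(() => GetFuelCards(agreement));
    var levelsTask    = Task.Factory.StartNew(() => GetServiceLevels(agreement));
    var customersTask = Task.Factory.StartNew(() => GetCustomers(agreement));

    Task.WaitAll(detailTask, fuelTask, levelsTask, customersTask);

    foreach (var customer in customersTask.Result)
    {
        // Address and branches for one customer can also be fetched in parallel.
        var addressTask  = Task.Factory.StartNew(() => GetAddressByCustomer(customer));
        var branchesTask = Task.Factory.StartNew(() => GetBranches(customer));
        Task.WaitAll(addressTask, branchesTask);

        var customerDto = MapCustomer(customer, addressTask.Result, branchesTask.Result);

        lock (customersLock)              // hold the lock only for the Add itself
        {
            agreementDto.Customers.Add(customerDto);
            response.Customers.Add(customerDto);
        }
    }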
Related
I am writing an API using ASP.NET and I have some potentially long running code behind the different end points. The system uses CQRS and Event Sourcing. A Command comes into an end point and is then published as an event using MediatR. However, the Handlers are potentially long running, and some of the Requests coming in might be sent to multiple Handlers, so this process could take longer than the 12s that AWS allows before returning an error code.
Is there a way to return a response to the caller to say that the event has been created while still continuing with the process? That is to say, fire off a separate task that performs the long running piece of code and also catches and logs errors, then return a value back to the user saying the Event has been successfully created?
I believe that ASP.NET spins up a new instance each time a call is made; will the old instance die once a value is returned, killing the task?
I could be wrong on a number of points here; this is my knowledge gleaned from the internet, but I could have misunderstood articles.
Thanks.
Yes, you should pass the long-running task off to a background process and return to the user. When the task is complete, notify the user with whatever mechanism is appropriate for your site.
But do not start a new thread; what you want is a background service running for this, and use that to manage your request.
If a new thread is running the long operation, it will remain "open/live" until it finishes. You can also configure the app pool to always be active.
There are a lot of frameworks for working with long-running tasks, like Hangfire.
And to keep the user updated with the status of the task, you can use SignalR to push notifications to the UI.
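A hypothetical sketch of how those pieces could fit together (Hangfire's BackgroundJob.Enqueue plus classic SignalR 2's GlobalHost; the command, processor and hub names are made up, not the poster's code):

    using System;
    using System.Web.Http;
    using Hangfire;
    using Microsoft.AspNet.SignalR;

    public class OrdersController : ApiController
    {
        [HttpPost]
        public IHttpActionResult Create(CreateOrderCommand command)
        {
            var eventId = Guid.NewGuid();

            // Hangfire persists the job, so it survives the 12s gateway limit
            // and app pool recycles; the controller returns immediately.
            BackgroundJob.Enqueue<OrderEventProcessor>(p => p.Handle(eventId, command));

            return Ok(new { EventId = eventId, Status = "Accepted" });
        }
    }

    public class OrderEventProcessor
    {
        public void Handle(Guid eventId, CreateOrderCommand command)
        {
            // ... run the long-running handlers here, catch and log errors ...

            // Push a completion notice to the UI over SignalR (NotificationHub is an assumed Hub).
            var hub = GlobalHost.ConnectionManager.GetHubContext<NotificationHub>();
            hub.Clients.All.eventProcessed(eventId);
        }
    }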
I have put Flurl under high load, using the DownloadFileAsync method to download files on a private network from one server to another. After several hours the method starts to throw "Get TimeOut" exceptions. The only way to solve that is to restart the application.
downloadUrl.DownloadFileAsync(Helper.CreateTempFolder()).Result;
I have added a second method as a failover using HttpClient, and it downloads files fine after Flurl fails, so it is not a server problem.
private void DownloadFile(string fileUri, string locationToStoreTo)
{
using (var client = new HttpClient())
using (var response = client.GetAsync(new Uri(fileUri)).Result)
{
response.EnsureSuccessStatusCode();
var stream = response.Content.ReadAsStreamAsync().Result;
using (var fileStream = File.Create(locationToStoreTo))
{
stream.CopyTo(fileStream);
}
}
}
Do you have any idea why the "Get TimeOut" error starts popping up under high load using this method?
public static Task<string> DownloadFileAsync(this string url, string localFolderPath, string localFileName = null, int bufferSize = 4096, CancellationToken cancellationToken = default(CancellationToken));
The two download methods differ only in that Flurl re-uses an HttpClient instance for all requests while my code destroys and creates a new HttpClient object for every request. I know that creating and destroying HttpClient is time and resource consuming; I would rather use Flurl if it worked.
As others point out, you're trying to use Flurl synchronously by calling .Result. This is not supported, and under highly concurrent workloads you're most likely witnessing deadlocks.
The HttpClient solution is using a new instance for every call, and since instances aren't shared at all it's probably less prone to deadlocks. But it's inviting a whole new problem: port exhaustion.
In short, if you want to continue using Flurl then go ahead and do so, especially since you're getting smart HttpClient reuse "for free". Just use it asynchronously (with async/await) as intended. See the docs for more information and examples.
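A minimal sketch of that change, using the DownloadFileAsync signature quoted above (the surrounding method name is made up):

    // Await the Flurl call instead of blocking on .Result; the request thread is
    // freed while the download runs, which avoids the deadlock pattern under load.
    public async Task<string> DownloadAsync(string downloadUrl)
    {
        var path = await downloadUrl.DownloadFileAsync(Helper.CreateTempFolder());
        return path;
    }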
I can think of two or three possibilities (I'm sure there are others that I can't think of as well)
Server IP address has changed.
You wrote that Flurl reuses a HttpClient. I've never used, or even heard of Flurl, so I have no idea how it works. But an HttpClient re-uses a pool of connections, which is why it's efficient to reuse a single instance and why it's critical to do so in a high-volume microservice application, otherwise you're likely to exhaust all ports, but that gives a different error message, not a time out, so I know you haven't hit that case. However, while it's important to re-use an HttpClient in the short term, HttpClient will cache DNS results, which means it's important to dispose and create new HttpClients periodically. In short-lived processes, you can use a static or singleton instance. But in long running processes, you should create a new instance periodically. If you only use it to access one server, that server's DNS TTL is a good value to use.
So, what might be happening is the server changed IP addresses a few hours after your program started, and because Flurl keeps reusing the same HttpClient, it doesn't get the new IP address from the DNS entry. One way to check if this is the problem is to write the server's IP address to a log at the beginning of the process; when you encounter the problem, check whether the IP address is the same or not.
If this is the problem, you can look into ASP.NET Core 2.1's HttpClientFactory. It's a bit awkward to use outside of ASP.NET, but I did it once. It gives you re-use of HttpClients, to avoid the TCP port exhaustion problem of using more than 32k HttpClients in 120 seconds, but also avoids DNS caching issues. My memory is that it creates a new HttpClient every 5 minutes by default.
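For reference, a rough sketch of wiring it up outside ASP.NET (this assumes the Microsoft.Extensions.Http and Microsoft.Extensions.DependencyInjection packages; the class and method names are illustrative):

    using System.Net.Http;
    using System.Threading.Tasks;
    using Microsoft.Extensions.DependencyInjection;

    public static class FactoryDownloader
    {
        private static readonly IHttpClientFactory Factory =
            new ServiceCollection()
                .AddHttpClient()                 // registers IHttpClientFactory
                .BuildServiceProvider()
                .GetRequiredService<IHttpClientFactory>();

        public static async Task<byte[]> DownloadAsync(string fileUri)
        {
            // Handlers behind CreateClient are pooled and recycled periodically,
            // which avoids both port exhaustion and stale DNS entries.
            var client = Factory.CreateClient();
            return await client.GetByteArrayAsync(fileUri);
        }
    }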
Reaching the maximum connections per server
ServicePointManager.DefaultConnectionLimit sets the maximum number of HTTP connections that a client will open to a server. If your code tries to use more than this simultaneously, the requests that exceed the limit will wait for an existing HTTP connection to finish its request, then use the newly available connection. However, in the past when I was looking into this, the HTTP timeout started from when the HttpClient's method was called, not from when the HttpClient sends the request to the server over a connection. This means that if your limit is 2 and both connections are used for longer than the timeout period (for example when downloading 2 large files), other requests to download from the same server will time out, even though no HTTP request was ever sent to the server.
So, depending on your application and server, you may be able to use a higher connection limit; otherwise you need to implement request queuing in your app.
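For example, a higher limit can be set once at startup (the value here is illustrative; the default for non-ASP.NET clients is 2 connections per host):

    // Illustrative: raise the per-server connection limit before any requests are made.
    ServicePointManager.DefaultConnectionLimit = 32;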
Thread pool exhaustion
Async code is awesome for performance when used correctly in highly concurrent, IO bound workloads. I sometimes think it's a bad idea to use it anywhere else because it has such huge potential for causing weird problems when used incorrectly. Like Crowcoder wrote in a comment on the question, you shouldn't use .Result, or any code that blocks a running thread, when in an async context. Although the code sample you provided says public void DownloadFile(..., if it's actually public async Task DownloadFile(..., or if DownloadFile is called from an async method, then there's a real risk of issues. If DownloadFile is not called from an async method, but is called on the thread pool, there's the same risk of errors.
Understanding async is a huge topic, unfortunately with a lot of misinformation on the internet as well, so I can't possibly cover it in detail here. A key thing to note is that async tasks run on the thread pool. So, if you call ThreadPool.QueueUserWorkItem and block the thread that your code runs on, or if you have async tasks that you block on (for example by calling .Result), what could happen is that you block every thread in the thread pool, and when an HTTP response comes back from the network, the .NET run time has no threads available to complete the task. The problem with this idea is that there are also no threads available to signal the timeout, so I don't believe you're exhausting the thread pool (if you were, I would expect a deadlock), but I don't know how timeouts are implemented. If timeouts/timers use a dedicated thread it could be possible for a cancellation token (the thing that signals a timeout) to be set by the timer's thread, and then any code on a blocking wait for either the HTTP response or the cancellation token could be triggered. But thread pool exhaustion generally causes deadlocks, so if you're getting an error back, it's probably not this.
To check if you're having threadpool exhaustion issues, when your program starts getting the timeout errors, get a memory dump of your app (for example using Task Manager). If you have the Enterprise or Ultimate SKU of Visual Studio, you can open/debug the memory dump in VS. Otherwise you'll need to learn how to use windbg (or find another tool). When debugging the memory dump, check the number of threads. If there's a very large number of threads, that's a hint you might be on the right track. Check where the thread was at the time of the memory dump. If they're all in blocking calls like WaitForObject, or something similar, then there's a real risk you've exhausted the thread pool. I've never debugged an async task deadlock/thread pool exhaustion issue before, so I'm not sure if there's a way to get a list of tasks and see from their runstate if they're likely to be deadlocked or not. If you ever see more tasks in the running state than you have cores on your CPU, you almost certainly have blocking in an async task, however.
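A lighter-weight complementary check (illustrative, not a substitute for the memory dump) is to log thread pool headroom periodically; if the available worker count keeps falling toward zero while the timeouts appear, blocked thread pool threads are a likely suspect:

    int availableWorkers, availableIo;
    ThreadPool.GetAvailableThreads(out availableWorkers, out availableIo);

    int maxWorkers, maxIo;
    ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);

    // Workers "in use" = max - available; log this on a timer alongside the errors.
    Console.WriteLine("Worker threads in use: {0}/{1}", maxWorkers - availableWorkers, maxWorkers);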
In summary, you haven't given us enough details to give you an answer that will work with 100% certainty. You need to keep investigating to understand the problem until you can either solve it yourself, or provide us with more information. I've given you some of the most likely causes, but it could very easily be something else completely.
I have a data processing MVC application that works with uploaded file sizes ranging from 100MB to 2GB and contains a couple of long running operations. Users will upload the files and the data in those files will be processed and then finally some analysis on the data will be sent to related users/clients.
It will take at least a couple of hours to process the data, so in order to make sure the user doesn't have to wait the whole time, I've spun up a separate task to do this long running operation. This way, once the files are received by the server and stored on the disk, the user will get a response back with a ReferenceID and they can close the browser.
So far, it's been working well as intended but after reading up on issues with using Fire-and-Forget pattern in MVC and worker threads getting thrown away by IIS during recycling, I have concerns about this approach.
Is this approach still safe? If not, how can I ensure that the thread that is processing the data doesn't die until it finishes processing and sends the data to clients? (in a relatively simple way)
The app runs on .NET 4.5, so I don't think I will be able to use HostingEnvironment.QueueBackgroundWorkItem at the moment.
Does using Async/Await at the controller help?
I've also thought of using a message queue on the app server to store messages once the files are stored to disk, and then making the DataProcessor a separate service/process that listens to the queue. If the queue is recoverable, that will assure me that the messages will always get processed eventually, even if the server crashes or the thread gets thrown away before it finishes processing the data. Is this a better approach?
My current setup is something like below
Controller
public ActionResult ProcessFiles()
{
HttpFileCollectionBase uploadedFiles = Request.Files;
var isValid = ValidateService.ValidateFiles(uploadedFiles);
if(!isValid){
return View("Error");
}
var referenceId = DataProcessor.ProcessData(uploadedFiles);
return View(referenceId);
}
Business Logic
public class DataProcessor
{
    public int ProcessFiles(HttpFileCollectionBase uploadedFiles)
    {
        var referenceId = GetUniqueReferenceIdForCurrentSession();
        var location = SaveIncomingFilesToDisk(referenceId, uploadedFiles);
        //ProcessData makes a DB call and takes a few hours to complete.
        Task.Factory.StartNew(() => ProcessData(referenceId, location))
            .ContinueWith((prevTask) =>
            {
                Log.Info("Completed Processing. Carrying on with other work");
                //Below method takes about 30 mins to an hour
                SendDataToRelatedClients(referenceId);
            });
        return referenceId;
    }
}
References
http://blog.stephencleary.com/2014/06/fire-and-forget-on-asp-net.html
Apppool recycle and Asp.net with threads?
Is this approach still safe?
It was never safe.
Does using Async/Await at the controller help?
No.
The app runs on .NET 4.5, so I don't think I will be able to use HostingEnvironment.QueueBackgroundWorkItem at the moment.
I have an AspNetBackgroundTasks library that essentially does the same thing as QueueBackgroundWorkItem (with minor differences). However...
I've also thought of using a message queue on the app server to store messages once the files are stored to disk, and then making the DataProcessor a separate service/process that listens to the queue. If the queue is recoverable, that will assure me that the messages will always get processed eventually, even if the server crashes or the thread gets thrown away before it finishes processing the data. Is this a better approach?
Yes. This is the only reliable approach. It's what I call the "proper distributed architecture" in my blog post.
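A minimal sketch of what that could look like (illustrative only; MSMQ via System.Messaging stands in for "a durable queue", and the FileJob type and helper methods are made up):

    // In the MVC app, after the files are saved to disk: enqueue a small message.
    using (var queue = new MessageQueue(@".\private$\fileprocessing"))
    {
        queue.Formatter = new XmlMessageFormatter(new[] { typeof(FileJob) });
        queue.Send(new FileJob { ReferenceId = referenceId, Location = location });
    }

    // In a separate Windows service (outside IIS, so app pool recycles can't kill it):
    using (var queue = new MessageQueue(@".\private$\fileprocessing"))
    {
        queue.Formatter = new XmlMessageFormatter(new[] { typeof(FileJob) });
        while (true)
        {
            var message = queue.Receive();                 // blocks until a job arrives
            var job = (FileJob)message.Body;
            ProcessData(job.ReferenceId, job.Location);    // the hours-long work
            SendDataToRelatedClients(job.ReferenceId);
        }
    }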
No, it is not safe. Create a service application on your server that handles these requests and publishes the result. If you are hosted on Azure, take advantage of their WebJobs service.
I am developing a web-api that takes data from a client and saves it for later use. Now I have an external system that needs to know of all events, so I want to set up a notification component in my web-api.
What I do is, after the data is saved, I execute a SendNotification(message) method in my new component. Meanwhile, I don't want my client to wait or even know that we're sending notifications, so I want to return a 201 Created / 200 OK response as fast as possible to my clients.
Yes, this is a fire-and-forget scenario. I want the notification component to handle all exception cases (if the notification fails, the client of the API doesn't really care at all).
I have tried using async/await, but this does not work in the web-api, since when the request thread terminates, the async operation does so as well.
So I took a look at Task.Run().
My controller looks like so:
public IHttpActionResult PostData([FromBody] Data data) {
_dataService.saveData(data);
//This could fail, and retry strategy takes time.
Task.Run(() => _notificationHandler.SendNotification(new Message(data)));
return CreatedAtRoute<object>(...);
}
And the method in my NotificationHandler
public void SendNotification(Message message) {
//..send stuff to a notification server somewhere, synchronously.
}
I am relatively new to the C# world, and I don't know if there is a more elegant (or proper) way of doing this. Are there any pitfalls with using this method?
It really depends how long it takes. Have you looked into the possibility of QueueBackgroundWorkItem, as detailed here? If you want to implement a very fast fire-and-forget, you might also want to consider a queue to pop these messages onto so you can return from the controller immediately. You'd then have to have something which polls the queue and sends out the notifications, e.g. a Scheduled Task, Windows service, etc. IIRC, if IIS recycles during a task, the process is killed, whereas with QueueBackgroundWorkItem there is a grace period for which ASP.NET will let the work item finish its job.
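A hypothetical version of the controller above using QueueBackgroundWorkItem (System.Web.Hosting, requires .NET 4.5.2+) might look like this; the original CreatedAtRoute details are omitted:

    public IHttpActionResult PostData([FromBody] Data data)
    {
        _dataService.saveData(data);

        // ASP.NET tracks this work item and gives it a grace period on shutdown,
        // unlike a bare Task.Run.
        HostingEnvironment.QueueBackgroundWorkItem(cancellationToken =>
            _notificationHandler.SendNotification(new Message(data)));

        return Ok();   // return the 201/200 response as before
    }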
I would take a look at Hangfire. It is fairly easy to set up, it should be able to run within your ASP.NET process, and it is easy to migrate to a standalone process in case your IIS load suddenly increases.
I experimented with Hangfire a while ago, but in standalone mode. It has enough docs and an easy-to-understand API.
My understanding is the point of Task is to abstract out threads, and that a new thread is not guaranteed per Task.
I'm debugging in VS2010, and I have something similar to this:
var request = WebRequest.Create(URL);
Task.Factory.FromAsync<WebResponse>(
request.BeginGetResponse,
request.EndGetResponse).ContinueWith(
t => { /* ... Stuff to do with response ... */ });
If I make X calls to this, e.g. start up X async web requests, how am I to calculate how many simultaneous (concurrent) connections are actually being made at any given time during execution? I assume that somehow it is opening only the max it can (in the case X is very high), and the other Tasks are blocked while waiting?
Any insight into this, or how I can check with the debugger to determine how many active (open) connections exist at a given point in execution, would be great.
Basically, I'm wondering if it's handled for me, or if I have to take special consideration so that I do not appear to be attacking a server?
This won't really be specific to Task. The external connection is created as soon as you make your call to Task.Factory.FromAsync. The "task" that the Task is performing is simply waiting for the response to get back (not for it to be sent in the first place). Thus the call to BeginGetResponse will fail if your machine is unable to send any more requests, and the response will contain an error message if the server is rejecting your requests due to their belief that you are flooding them.
The only real place that Task comes into play here is the amount of time between when the response is actually received by the machine and when your continuation runs. If you are getting lots of responses, or otherwise have lots of work in the thread pool, it could take some time for it to get to your continuation.
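If you want to see the connection usage at runtime, one illustrative option is to inspect the ServicePoint for the target host (for example from the debugger or a watch window):

    var servicePoint = ServicePointManager.FindServicePoint(new Uri(URL));

    // How many connections are currently open to that host, and the per-host cap.
    Console.WriteLine("Open connections: {0}", servicePoint.CurrentConnections);
    Console.WriteLine("Connection limit: {0}", servicePoint.ConnectionLimit);

Requests beyond the limit are not blocked threads; they simply wait in line for a free connection on that ServicePoint.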