We have a function app that builds a large JSON payload (roughly 2,000 lines) every day and posts it to an API to be mapped and saved into a database.
We are using CQRS with MediatR, and the API side takes exceptionally long to create and save all the necessary information.
The problem we have is that the function's PostAsJsonAsync call waits for the API response and times out after a few minutes.
Any idea how to run this as a background task, or just post and forget? Our API is only concerned that it received the data.
Function side:
using (var client = new HttpClient())
{
    client.Timeout = new TimeSpan(0, 10, 0);
    // Times out waiting for the API
    var response = await client.PostAsJsonAsync($"{endpoint}/api/v1.0/BatchImport/Import", json);
    response.EnsureSuccessStatusCode();
}
API MediatR handler side:
public async Task<Unit> Handle(CreateBatchOrderCommand request, CancellationToken cancellationToken)
{
    // Takes long to process all the data
    foreach (var importOrder in request.Payload)
    {
        await PopulateImportDataAsync(importOrder, cancellationToken);
        await CreateOrderAsync(importOrder, cancellationToken);
    }
    return Unit.Value;
}
Cheers
The problem we have is that the function's PostAsJsonAsync call waits for the API response and times out after a few minutes.
The easiest solution is going to be just increasing that timeout. If you are talking about Azure Functions, I believe you can increase the timeout to 10 minutes.
Any idea how to run this as a background task, or just post and forget? Our API is only concerned that it received the data.
Any fire-and-forget solution is not going to end well; you'll end up with lost data. I recommend that you not use fire-and-forget at all, and this advice goes double as soon as you're in the cloud.
Assuming increasing the timeout isn't sufficient, your solution is to use a basic distributed architecture, as described on my blog:
Have your API place the incoming request into a durable queue.
Have a separate backend (e.g., Azure (Durable) Function) process that request from the queue.
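Sketched in code, that split might look like this. The controller route matches the one in the question, but the queue name, the QueueClient wiring, and the reuse of CreateBatchOrderCommand as the message body are all assumptions for illustration:

```csharp
using System.Text.Json;
using System.Threading.Tasks;
using Azure.Storage.Queues;     // Azure.Storage.Queues package
using MediatR;
using Microsoft.AspNetCore.Mvc;
using Microsoft.Azure.WebJobs;

// API side: accept the payload, enqueue it, and acknowledge immediately.
[ApiController]
[Route("api/v1.0/BatchImport")]
public class BatchImportController : ControllerBase
{
    private readonly QueueClient _queue;

    public BatchImportController(QueueClient queue) => _queue = queue;

    [HttpPost("Import")]
    public async Task<IActionResult> Import([FromBody] JsonElement payload)
    {
        // Durable: the message survives API restarts and is redelivered on failure.
        await _queue.SendMessageAsync(payload.GetRawText());
        return Accepted(); // "we received it"; processing happens later
    }
}

// Backend side: a queue-triggered function does the slow work.
public class BatchImportProcessor
{
    private readonly IMediator _mediator;

    public BatchImportProcessor(IMediator mediator) => _mediator = mediator;

    [FunctionName("ProcessBatchImport")]
    public Task Run([QueueTrigger("batch-import-queue")] string message)
    {
        var command = JsonSerializer.Deserialize<CreateBatchOrderCommand>(message);
        return _mediator.Send(command); // the existing MediatR handler, unchanged
    }
}
```

The function-side HttpClient then only waits for the enqueue, which takes milliseconds, and nothing is lost if the handler later fails: the queue redelivers the message.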
Assuming you’re on .NET Core, you could stick incoming requests into a queued background task:
https://learn.microsoft.com/en-us/aspnet/core/fundamentals/host/hosted-services?view=aspnetcore-6.0&tabs=visual-studio#queued-background-tasks
Keep in mind this chews up resources from servicing other web requests, so it will not scale well to millions of requests. The same basic principle, a queued message plus offline processing, can also be distributed across multiple services to take some of the load off the web service.
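A rough sketch of that in-process approach, using a Channel&lt;T&gt; as the work queue and a BackgroundService as the consumer. The ImportQueue/ImportWorker names, and reusing CreateBatchOrderCommand as the work item, are illustrative:

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;
using MediatR;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;

// Shared in-memory queue: the controller writes, the hosted service reads.
public class ImportQueue
{
    private readonly Channel<CreateBatchOrderCommand> _channel =
        Channel.CreateBounded<CreateBatchOrderCommand>(capacity: 100);

    public ValueTask EnqueueAsync(CreateBatchOrderCommand command) =>
        _channel.Writer.WriteAsync(command);

    public IAsyncEnumerable<CreateBatchOrderCommand> DequeueAllAsync(CancellationToken ct) =>
        _channel.Reader.ReadAllAsync(ct);
}

// Consumer: drains the queue off the request path, one item at a time.
public class ImportWorker : BackgroundService
{
    private readonly ImportQueue _queue;
    private readonly IServiceScopeFactory _scopeFactory;

    public ImportWorker(ImportQueue queue, IServiceScopeFactory scopeFactory)
    {
        _queue = queue;
        _scopeFactory = scopeFactory;
    }

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        await foreach (var command in _queue.DequeueAllAsync(stoppingToken))
        {
            // MediatR handlers are scoped, so create a scope per work item.
            using var scope = _scopeFactory.CreateScope();
            var mediator = scope.ServiceProvider.GetRequiredService<IMediator>();
            await mediator.Send(command, stoppingToken);
        }
    }
}
```

Register both with services.AddSingleton&lt;ImportQueue&gt;() and services.AddHostedService&lt;ImportWorker&gt;(); the controller then does await queue.EnqueueAsync(command) and returns Accepted(). The caveat above still applies: anything sitting in the channel is lost if the process restarts.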
Related
I have a long running task in an Azure function which I want to run in a background thread using Task.Run. I don't care about the result.
public static async Task Run(...)
{
    var taskA = await DoTaskA();
    Task.Run(new Action(MethodB));
    ....
    // return result based on taskA
}
Is this an acceptable pattern in Azure functions? (this is an HTTP trigger function)
I know this could also be done by adding a message to a queue and having another function execute the task, but I want to know the pros and cons of running long-running tasks in a background thread in Azure Functions.
It might be best to have an Azure Function run TaskA and have it post a message to a Service Bus queue, which would trigger another Azure Function running TaskB when something is posted, since no answer is needed anyway.
Here is the example shown on Microsoft's website:
[FunctionName("FunctionB")]
public static void Run(
    [ServiceBusTrigger("myqueue", AccessRights.Manage, Connection = "ServiceBusConnection")]
    string myQueueItem,
    TraceWriter log)
{
    log.Info($"C# ServiceBus queue trigger function processed message: {myQueueItem}");
    MethodB();
}
In that situation, you do not have to start a new task. Just call MethodB().
That will give you the flexibility to adjust the plan of your Azure Functions (App Service vs. Consumption plan) and minimize the overall cost.
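For completeness, the sending side (FunctionA in this arrangement) only needs to drop a message on the queue, for example via a Service Bus output binding. DoTaskA and the message shape are assumptions carried over from the question:

```csharp
[FunctionName("FunctionA")]
public static async Task<IActionResult> Run(
    [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequest req,
    [ServiceBus("myqueue", Connection = "ServiceBusConnection")] IAsyncCollector<string> queue,
    ILogger log)
{
    var resultA = await DoTaskA(); // the part the caller actually waits for

    // Hand TaskB's input to the queue; FunctionB picks it up independently.
    await queue.AddAsync(JsonConvert.SerializeObject(resultA));

    // Return the result based on taskA right away.
    return new OkObjectResult(resultA);
}
```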
Depending on how complex your scenario is, you may want to look into Durable Functions. Durable Functions gives you greater control over a variety of scenarios, including long-running tasks.
No, no and no.
Have your HTTP-triggered function return a 202 Accepted, and post the results to a blob URL later on. The 202 should include a Location header that points to the soon-to-exist blob URL, and maybe a Retry-After header as well if you have a rough idea how long the processing takes.
The long processing task should be a queue-triggered function. Why? Because things don't always go according to plan and you may need to retry processing, so why not have the retry built in?
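Roughly, the HTTP-triggered function could then look like this. The blob account/container in the Location URL, the queue name, and the 60-second estimate are made up for illustration:

```csharp
[FunctionName("StartProcessing")]
public static async Task<HttpResponseMessage> Run(
    [HttpTrigger(AuthorizationLevel.Function, "post")] HttpRequestMessage req,
    [Queue("work-items")] IAsyncCollector<string> workQueue)
{
    var id = Guid.NewGuid().ToString("N");

    // The queue-triggered function does the real work (and gets retries for free).
    await workQueue.AddAsync(id);

    var response = req.CreateResponse(HttpStatusCode.Accepted);
    // Where the result blob will eventually appear.
    response.Headers.Location =
        new Uri($"https://myaccount.blob.core.windows.net/results/{id}.json");
    // Rough estimate of how long the processing takes.
    response.Headers.RetryAfter = new RetryConditionHeaderValue(TimeSpan.FromSeconds(60));
    return response;
}
```

The client polls the Location URL and gets a 404 until the result blob exists, which is the standard asynchronous request-reply shape.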
I am trying to implement a task in a fire-and-forget manner.
Let's look at the piece of code below.
public IHttpActionResult Update(int id)
{
    var updatedResult = _updater.update(id);

    // fire and forget a task
    sendEmailToUser();

    return Ok();
}

private async Task sendEmailToUser()
{
    var httpclient = new HttpClient();
    // assume the client is initiated with the required url and other headers
    await httpclient.PostAsync("some url", null);
}
Given the above code, can I safely assume that whenever the Update endpoint is called, the sendEmailToUser task is triggered and will run to completion?
No. You should almost never start background threads in a web application. HTTP is supposed to be stateless, and the web server was designed with that in mind.
The server might be put into a sleep state when there are no incoming requests for a set period of time. During that time all background execution will be halted, including yours, and it may or may not resume when the next request comes in.
Or when IIS decides to recycle your app domain on a schedule, your thread will get killed too.
If you really need background tasks, do that with a Windows service or run it as a separate console application.
Under normal conditions, it's reasonable to expect that the task will run to completion. It will go on independently.
Your biggest concerns, in this case, should be the web API not being terminated and the task not throwing an exception.
But if OP needs to be 100% sure, there are other safer ways to code that.
I am posting this partly out of interest in how the Task Parallel Library works and to spread knowledge, and also to investigate whether my "Cancellation" updates are the cause of a new issue where the user is suddenly logged out.
The project I am working on has these components:
Web Forms site. A website that acts as a portal for administrating company vehicles. Referred to below as "Web".
WCF web service. A backend service on a separate machine. Referred to below as "Service".
Third-party service. Referred to below as "3rd".
Note: I am using .NET 4.0. Therefore the newer updates to the Task Parallel Library are not available.
The issue I was assigned to fix was that the login function was very slow and CPU-intensive. This was later admitted to be a problem in the third-party service. However, I tried to optimize the login behavior as well as I could.
The login request and response don't contain particularly much data, but to gather the response data several API calls are made to the third-party service.
1. Before the changes
The Web invokes a WCF method on the Service for gathering "session data".
This method would sometimes take so long that it would time out (I think the timeout was set to 1 minute).
A pseudo representation of the "GetSessionData" method:
var agreements = getAgreements(request);
foreach (var agreement in agreements)
{
    getAgreementDetails(agreement);
    var customers = getCustomersWithAgreement(agreement);
    foreach (var customer in customers)
    {
        getCustomerInfo(customer);
        getCustomerAddress(customer);
        getCustomerBranches(customer);
    }
}

var person = getPerson(request);
var accounts = getAccount(person.Id);
foreach (var account in accounts)
{
    var accountDetail = getAccountDetail(account.Id);
    foreach (var vehicle in accountDetail.Vehicles)
    {
        getCurrentMilageReport(vehicle.Id);
    }
}
return sessionData;
See gist for code snippet.
This method quickly becomes heavy the more agreements and accounts the user has.
2. Parallel.ForEach
I figured that I could replace the foreach loops with Parallel.ForEach(). This greatly improved the speed of the method for larger users.
See gist for code snippet.
3. Cancel
Another problem we had was that when the web service's server is maxed out on CPU usage, all method calls become much slower, which can result in a timeout for the user. And a popular response to a timeout is to try again, so the user triggers another login attempt, which is "queued"(?) due to the high CPU usage. All this while the first request has not yet returned.
We discovered that the request is still alive even if the web site times out. So we decided to implement a similar timeout on the Service side.
See gist for code snippet.
The idea is that GetSessionData(..) is invoked with a CancellationToken that triggers Cancel after about the same time as the Web timeout, so that no work is done if no one is there to show or use the results.
I also implemented the cancellation for the method calls to the Third party service.
Is it correct to share the same CancellationToken across all of the loops and service calls? Could there be an issue when all threads are "aborted" by throwing the cancellation exception?
See gist for code snippet.
Is it correct to share the same CancellationToken across all of the loops and service calls? Could there be an issue when all threads are "aborted" by throwing the cancellation exception?
Yes, it is correct. And yes, there could be an issue with throwing a lot of exceptions at the same time, but only in specific situations and with a huge amount of parallel work.
Several hints:
Use one CancellationTokenSource per complete action, for example per request, and pass the same CancellationToken from that source to every asynchronous method
You can avoid throwing an exception and just return from the method. Later, to check that the work was done and nothing was cancelled, you check IsCancellationRequested on the CancellationTokenSource
Check the token for cancellation inside loops on each iteration, and just return if cancelled
Use threads only when there is IO work, for example when you query a database or make requests to other services; don't use them for CPU-bound work
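Put together, those hints look roughly like this. CancelAfter does not exist on .NET 4.0, so a Timer stands in for it; the one-minute value mirrors the Web-side timeout from the question, and getAgreementDetails/agreements are the pseudo names used above:

```csharp
// One CancellationTokenSource per request.
var cts = new CancellationTokenSource();

// .NET 4.0 has no cts.CancelAfter(...); a one-shot timer does the same job.
var timer = new System.Threading.Timer(
    _ => cts.Cancel(), null,
    TimeSpan.FromMinutes(1), TimeSpan.FromMilliseconds(-1));

var options = new ParallelOptions { CancellationToken = cts.Token };
try
{
    Parallel.ForEach(agreements, options, agreement =>
    {
        // Check on each iteration and bail out quietly instead of throwing.
        if (cts.Token.IsCancellationRequested) return;
        getAgreementDetails(agreement);
    });
}
catch (OperationCanceledException)
{
    // Parallel.ForEach throws this once when the token fires mid-loop.
}

// Afterwards, decide whether the result is worth returning:
if (cts.Token.IsCancellationRequested)
{
    return null; // nobody is waiting for the answer any more
}
```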
I was tired at the end of the working day and suggested a bad thing. Mainly, you don't need threads for IO-bound work, for example waiting for a response from a database or a third-party service. Use threads only for CPU computations.
Also, I reviewed your code again and found several bottlenecks:
You can call GetAgreementDetail, GetFuelCards, GetServiceLevels and GetCustomers asynchronously; don't wait for each one before starting the next, run all four requests at once
You can call GetAddressByCustomer and GetBranches in parallel as well
I noticed that you use a mutex. I guess it is for protecting agreementDto.Customers and response.Customers on addition. If so, you can reduce the scope of the lock
You can start the work with Vehicles earlier, as you know UserId at the beginning of the method; do it in parallel too
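For instance, the four independent calls from the first hint could be started together and awaited as a group. On .NET 4.0 (no async/await) that means Task.Factory.StartNew plus Task.WaitAll; the method signatures are assumed from the gist:

```csharp
// Kick off the four independent third-party calls at once...
var detailTask    = Task.Factory.StartNew(() => GetAgreementDetail(agreement));
var fuelCardsTask = Task.Factory.StartNew(() => GetFuelCards(agreement));
var levelsTask    = Task.Factory.StartNew(() => GetServiceLevels(agreement));
var customersTask = Task.Factory.StartNew(() => GetCustomers(agreement));

// ...and wait for all of them together instead of one after another.
// Total time is now the slowest call, not the sum of all four.
Task.WaitAll(detailTask, fuelCardsTask, levelsTask, customersTask);

var customers = customersTask.Result;
```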
I am trying to implement file conversion using an Azure Functions solution. The conversion can take a lot of time, and I don't want the calling server to wait for the response.
I wrote a function that returns a response immediately (to indicate that the service is available and conversion has started) and runs the conversion in a separate thread. A callback URL is used to send the conversion result.
public static async Task<HttpResponseMessage> Run(HttpRequestMessage req, Stream srcBlob, Binder binder, TraceWriter log)
{
    log.Info($"C# HTTP trigger function processed a request. RequestUri={req.RequestUri}");

    // Get request model
    var input = await req.Content.ReadAsAsync<ConvertInputModel>();

    // Run convert in separate thread
    Task.Run(async () =>
    {
        // Read input blob -> convert -> upload output blob
        var convertResult = await ConvertAndUploadFile(input, srcBlob, binder, log);
        // Return result using HttpClient
        SendCallback(convertResult, input.CallbackUrl);
    });

    // Return response immediately
    return req.CreateResponse(HttpStatusCode.OK);
}
The problem is that the new task breaks the bindings: I get an exception while accessing the params. So how can I run a long-running operation in a separate thread? Or is such a solution totally wrong?
This pattern is not recommended (or supported) in Azure Functions, particularly when running in the Consumption plan, since the runtime won't be able to accurately manage your function's lifetime and will eventually shut down your service.
One of the recommended (and widely used) patterns here would be to queue up this work to be processed by another function listening on that queue, and return the response to the client right away.
With this approach you accomplish essentially the same thing, where the actual processing is done asynchronously, but in a reliable and efficient way (benefiting from automatic scaling to properly handle increased loads, if needed).
Do keep in mind that, when using the Consumption plan, there's a function timeout of 5 minutes. If the processing is expected to take longer, you'd need to run your function on a dedicated plan with Always On enabled.
Your solution of running the background work inside the Azure Function is wrong, as you suspected. You need a second service that is designed to run these long-running tasks. Here is the documentation for Microsoft's best practices for doing background jobs on Azure.
We're planning our system to have a set of publicly accessible services which call into a set of internal services, all implemented using ServiceStack.
My question is: what is the best method (in terms of performance, stability and code maintainability) for this cross-service communication?
E.g. should my public services call the internal services using a ServiceStack client, or use the Rabbit / Redis messaging system? And if the latter, can I call two or more internal services asynchronously and await the response from both?
For one-way communications, messaging offers a lot of benefits; if installing a Rabbit MQ broker is an option, Rabbit MQ provides the more industrial-strength option.
For request/reply services, where requests are transient and both endpoints are required to be up, the typed C# Service Clients allow for more direct/debuggable point-to-point communications with fewer moving parts.
Using the clients' async APIs lets you easily make multiple calls in parallel, e.g.:
//fire off 2 async requests simultaneously...
var task1 = client.GetAsync(new Request1 { ... });
var task2 = client.GetAsync(new Request2 { ... });

//additional processing if any...

//Continue when first response is received
var response1 = await task1;

//Continue after 2nd response; if it arrived before task1 completed, the await returns instantly
var response2 = await task2;
The above code continues after Request1 is completed; you can also use Task.WhenAny() to process whichever request completed first.
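A Task.WhenAny() variant of the same two calls, if you want to handle whichever response lands first (Process1/Process2 are placeholder handlers, not ServiceStack APIs):

```csharp
var task1 = client.GetAsync(new Request1 { ... });
var task2 = client.GetAsync(new Request2 { ... });

// Handle responses in completion order rather than request order.
var pending = new List<Task> { task1, task2 };
while (pending.Count > 0)
{
    var finished = await Task.WhenAny(pending);
    pending.Remove(finished);

    if (finished == task1)
        Process1(await task1); // runs first if Request1 completed first
    else
        Process2(await task2);
}
```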