Long-running queries in a web application (Azure) - C# solution

We have an ASP.NET MVC 3.0 application that reads data from the database using Entity Framework (all on Azure). We have several long-running queries (optimization has already been done) and we want to make sure the solution is scalable and prevents thread starvation.
We looked at async controllers and using I/O completion ports to run the query (using BeginExecute instead of the usual EF calls). However, async is hard to debug and increases the complexity of the code.
The proposed solution is as follows:
The web server (web role) gets a request that involves a long-running query (for example, customer segmentation).
It enters the request information into a table along with the relevant parameters and returns, thereby freeing the thread to process other requests.
We set a flag in the database that lets the UI show that the query is in progress whenever the page is refreshed.
A worker role constantly polls this table; as soon as it finds an entry, it runs the long-running query (customer segmentation) and updates the original customer table with the results.
In this case an immediate status response to the users is not necessary. Users can check back within a couple of minutes to see if their request has been worked on. Instead of the table we were planning to use Azure Queues (but I guess Azure Queues cannot notify a worker role, so a database table will do just fine). Is this a workable solution? Are there any pitfalls to doing it this way?

While Windows Azure Storage queues don't give you a notification after a message has been processed, you could implement that yourself (perhaps with Windows Azure Storage tables). The nice part about queues: They handle concurrency and failed attempts.
For instance: If you have 2 worker instances processing messages off the same queue, every time a queue message is read, the message goes invisible in the queue, for an amount of time you specify. While invisible, only the worker instance that read the message has it. If that instance finishes processing, it can just delete the queue message (and update your notification table). If it fails (maybe due to the role instance crashing), the message re-appears on the queue after the invisibility timeout expires. Going one step further: Let's say it's simply a bad message that causes your code to crash every time. You can check the dequeue count before processing the message. If it's greater than, say, 2, simply store the message in a dead-letter table and inspect it manually.
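For illustration, the read/process/delete cycle described above might look roughly like this with the classic Azure Storage client library; SaveToDeadLetter and ProcessSegmentation are hypothetical helpers, and the queue name is made up:

using System;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Queue;

public static void ProcessOneMessage(string connectionString)
{
    var account = CloudStorageAccount.Parse(connectionString);
    var queue = account.CreateCloudQueueClient().GetQueueReference("segmentation-jobs");
    queue.CreateIfNotExists();

    // Message becomes invisible to other workers for 5 minutes while we process it.
    var msg = queue.GetMessage(TimeSpan.FromMinutes(5));
    if (msg == null) return;

    if (msg.DequeueCount > 2)
    {
        SaveToDeadLetter(msg);     // hypothetical: park the poison message for inspection
        queue.DeleteMessage(msg);  // remove it so it stops reappearing
    }
    else
    {
        ProcessSegmentation(msg.AsString); // hypothetical: run the long query
        queue.DeleteMessage(msg);          // success: delete so it never reappears
    }
}

If the role instance crashes before DeleteMessage runs, the message simply becomes visible again after the 5-minute timeout, which is exactly the retry behavior described above.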
One caveat with queues: the queue messages need to represent idempotent operations (that is, since delivery is at-least-once, a message may be processed more than once, and repeated processing must produce the exact same side-effects).
If you go with a table instead of a queue, you'll need to deal with scaling (multiple threads or role instances processing the table), and dead-letter handling.

This depends. If your worker role does nothing other than delegate the heavy work to a SQL database, it seems a waste of resources and money. Using a web role with async requests allows you to reduce the cost. If heavy work needs to be done in the worker role itself, then it is a good approach.
You can also use AJAX or WebSockets: start the database query and return the response immediately. The client can either poll the web role to see if the query has finished (if you use HTTP), or the web role can notify the client directly (if you use WebSockets).

Related

Start extremely long running processes through a REST request

I'm working at an automation firm, so we create processes for industrial automation. Previously this automation was done on the machine side of things, but we're slowly transitioning to controlling the machines with C#.
On my current project the production for one day takes about 2 hours. The operators of the factory have a web interface, which we created in C# using ASP.NET Core MVC, in which they can start/pause/stop this production process.
When starting the process we await a function in our controller that is basically a while loop that controls this 2h long production process.
The problem is that when I send out the REST request to start the production, the request takes 2 hours to complete. I would prefer that the request completes immediately and the production process starts in the background of my ASP.NET Core application.
First I thought I could just leave out the await and simply do this in my controller (simplified code):
_ = _productionController.StartLongProcess(); // This contains the while loop
return Ok();
But since _productionController is scoped and all its dependencies are as well, these immediately get disposed of when the method returns and I can't access my database anymore for example.
The process should be able to continuously talk to our database to save information about the production process in case something fails, that way we can always pick off where we left off.
My question to you is now: are we tackling this the wrong way? I imagine it's bad practice to start these long-running processes in the ASP.NET controller.
How do I make sure I always have access to my DatabaseContext in this long-running process even though the REST request has already ended? Create a separate scope only for this method?
Starting with ASP.NET Core 2.1, the right way to do this (within ASP.NET) is to extend BackgroundService (or implement IHostedService).
Incoming requests can tell the background service to start the long-running operation and return immediately. You'll of course need to handle cases where duplicate requests are sent in, or new requests are sent in before the existing request is completed.
The documentation page has an example where the BackgroundService reads commands off a queue and processes them one at a time.
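As a minimal sketch of that pattern (the type names here are illustrative, not from the documentation), a channel-backed queue plus a BackgroundService that drains it might look like:

using System;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;
using Microsoft.Extensions.Hosting;

// Controllers enqueue work items; the hosted service below executes them.
public class ProductionQueue
{
    private readonly Channel<Func<CancellationToken, Task>> _channel =
        Channel.CreateUnbounded<Func<CancellationToken, Task>>();

    public ValueTask EnqueueAsync(Func<CancellationToken, Task> workItem) =>
        _channel.Writer.WriteAsync(workItem);

    public ValueTask<Func<CancellationToken, Task>> DequeueAsync(CancellationToken ct) =>
        _channel.Reader.ReadAsync(ct);
}

public class ProductionService : BackgroundService
{
    private readonly ProductionQueue _queue;
    public ProductionService(ProductionQueue queue) => _queue = queue;

    protected override async Task ExecuteAsync(CancellationToken stoppingToken)
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            var workItem = await _queue.DequeueAsync(stoppingToken);
            await workItem(stoppingToken); // the 2-hour production loop runs here
        }
    }
}

Register both in Startup with services.AddSingleton<ProductionQueue>() and services.AddHostedService<ProductionService>(); the controller then enqueues the work and returns Ok() immediately.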
How do I make sure I always have access to my DatabaseContext in this long-running process even though the REST request has already ended? Create a separate scope only for this method?
Yes, create a separate scope.
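A minimal sketch, assuming an EF Core context named ProductionDbContext (the names are illustrative, not from the question):

using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;

public class ProductionWorker
{
    private readonly IServiceScopeFactory _scopeFactory;
    public ProductionWorker(IServiceScopeFactory scopeFactory) => _scopeFactory = scopeFactory;

    public async Task RunAsync(CancellationToken ct)
    {
        // The scope is owned by the background job, not by the HTTP request,
        // so scoped services such as the DbContext live as long as the job does.
        using (var scope = _scopeFactory.CreateScope())
        {
            var db = scope.ServiceProvider.GetRequiredService<ProductionDbContext>();
            // ...the production loop can read and write the database here...
            await db.SaveChangesAsync(ct);
        }
    }
}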
My question to you is now: are we tackling this the wrong way? I imagine it's bad practice to start these long-running processes in the ASP.NET controller.
We've done something similar in the past. As long as fault-tolerance (particularly w.r.t. app restarts) and idempotence are built into the long-running-operation's logic, you should be good to go.
REST requests are expected to be short, a few seconds at maximum.
So best practice here is to offload the long-running task to a background service and return a token with which the client can poll the service to see whether the operation has finished.
The background service could be a BackgroundService in .NET Core. This is easy but not really fault-tolerant, so some sort of database persistence and retry logic would be good.
If you are on an intranet, you could also move to an inherently asynchronous transport such as RabbitMQ, where you send a StartOperation message and later receive a message back when the process has completed.
Another option would be to use Hangfire. It will allow you to enqueue the work that you want to execute to a persistent store, e.g. SQL Server, MSMQ or Redis, depending on what you have in your infrastructure. The job will then be picked up by a worker, which can run in the ASP.NET process or in a Windows service. It's distributed too, so you can have a number of worker instances running. It also supports retrying failed jobs and has a dashboard to view the jobs. Best of all, it's free!
var jobId = BackgroundJob.Enqueue(() => ExecuteLongRunningProcess(parameter1));
https://www.hangfire.io/
Following is my understanding of the issue that you have posted:
You want to initiate a long-running operation via a REST API call
You want to use an async call, but you're not sure how to maintain the DB context for a long-running operation that communicates with the database regularly while it runs
A couple of important points:
First, about how async calls actually work:
When you await an async call, the compiler-generated state machine captures the current synchronization context for the continuation; it doesn't block a thread-pool thread while the operation is in flight, because the I/O is completion-based rather than thread-based
You can use ConfigureAwait(false) in backend code to avoid resuming on the captured synchronization context, which is better for performance
The only catch: for a call to be genuinely async, the complete chain needs to be async from the entry point, otherwise the benefits can't be reaped; if you use Task.Wait or Task.Result anywhere, you may in fact cause a deadlock in ASP.NET
Regarding the long-running operation itself, the options are as follows:
A simple async call, as suggested above. This helps you serve a large number of concurrent calls (and thus scale), but the context is lost if the client goes away, and there is no way to retrieve the status of the operation afterwards
You can make a fire-and-forget call and use a mechanism like ASP.NET SignalR, which is like IObservable over the network, to notify the client when processing finishes
The best option would be a message queue like RabbitMQ, which doesn't run the risk of losing the work if the client goes down: it acts as a producer/consumer pipeline and can notify the client when it comes back up. The queue can be told when the process finishes, and thus the client can be informed; it can carry both the incoming request and the response message asynchronously
In cases where the client wants to periodically come back and check the status of the request, database persistence is important: the status can be updated at regular intervals and queried to see how far the long-running process has got. A sketch of such a status endpoint follows
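For the last option, the polling side can be very thin; here is a hypothetical ASP.NET Core status endpoint (IJobStore is an assumed persistence abstraction, not a framework type):

using System;
using System.Threading.Tasks;
using Microsoft.AspNetCore.Mvc;

[ApiController]
[Route("api/jobs")]
public class JobsController : ControllerBase
{
    private readonly IJobStore _jobs; // hypothetical: wraps the status table

    public JobsController(IJobStore jobs) => _jobs = jobs;

    // The client polls GET api/jobs/{id}/status every few seconds.
    [HttpGet("{id}/status")]
    public async Task<IActionResult> GetStatus(Guid id)
    {
        var status = await _jobs.GetStatusAsync(id); // hypothetical lookup
        return status == null ? NotFound() : (IActionResult)Ok(status);
    }
}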
My question to you is now: are we tackling this the wrong way? I imagine it's bad practice to start these long-running processes in the ASP.NET controller.
Generally, yes. Ideally an ASP.NET service does not have any long-running processes inside it - or at the very least, no long-running processes that can't be easily and quickly shut down.
Doing work outside of an HTTP request (i.e., request-extrinsic code) is most reliably achieved by adding a durable queue with a separate background processor. The "background processor" can be a Win32 service, possibly even on the same machine. With this architecture, the HTTP request places a request on the queue, and the processor picks up requests from the queue and executes them.

Sending request to ASP.Net Core Web API running on a specific node in a Service Fabric Cluster

I am working on a Service Fabric application that contains a bunch of ASP.NET Core Web APIs. When I run the application on my local Service Fabric cluster, which is configured with 5 nodes, it runs successfully and I am able to send POST requests to the exposed Web APIs. What I actually want is to hit the code running on the same cluster node with different POST requests to the APIs exposed on that particular node.
To explain further: for example, there is an API exposed on node 0 that accepts a POST request and executes a job, and there is also an API that aborts the running job. When I request a job execution, it starts executing on node 0, but when I try to abort the job, the Service Fabric cluster forwards the request to a different node, say node 1. As a result I am not able to abort the running job, because there is no running job on node 1. I don't know how to handle this situation.
As for state, I am using a stateless service of type ASP.NET Core Web API and running the app on 5 nodes of my local Service Fabric cluster.
Please suggest what should be the best approach.
Your problem arises because you are using your APIs to do a worker's task.
You should use your API to schedule the work in the background (a process or worker) and return a token or operation id to the user. The user will use this token to request the status or cancel the task.
The first step: when you call your API the first time, you could generate a GUID (or insert a row in a database) and put a message in a queue (e.g. Service Bus), then return the GUID to the caller.
The second step: a worker process runs in your cluster, listening for messages from this queue, and processes them whenever a message arrives. You can make this a single-threaded service that processes messages one by one in a loop, or a multi-threaded service that processes multiple messages with one thread per message. How far you go will depend on how complex you want it to be:
With a single-threaded listener, to scale your application you have to spin up multiple instances so that multiple tasks run in parallel; you can do that in SF with a simple scale command, and SF will distribute the service instances across your available nodes.
With a multi-threaded version you have to manage the concurrency yourself for good performance; you might have to consider memory, CPU, disk and so on, otherwise you risk putting too much load on a single node.
The third step, the cancellation: the cancellation process is easy and there are many approaches:
Using a similar approach, enqueue a cancellation message;
your service listens for the cancellation on a separate thread and cancels the running task (if it is running).
Using a different queue for the cancellation messages is better;
if you run multiple listener instances, you might consider a topic instead of a queue.
Using a cache key to store the job status, checking on every iteration whether cancellation has been requested.
A table with the job status, which you check on every iteration as you would with the cache key.
Creating a remoting endpoint to make a direct call to the service and trigger a cancellation token.
There are many approaches; these are simple ones, and you might use several in combination for better control of your tasks. A sketch of the cancellation-token idea follows.
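As one concrete illustration of the cancellation-token idea from the last bullet, a per-node registry might look like this (all names are hypothetical):

using System;
using System.Collections.Concurrent;
using System.Threading;

// Maps a job id to its CancellationTokenSource. The worker registers a token
// when it starts a job; the cancel endpoint/message handler looks it up.
public class JobRegistry
{
    private readonly ConcurrentDictionary<Guid, CancellationTokenSource> _jobs =
        new ConcurrentDictionary<Guid, CancellationTokenSource>();

    public CancellationToken Register(Guid jobId)
    {
        var cts = new CancellationTokenSource();
        _jobs[jobId] = cts;
        return cts.Token; // the job's loop passes this token to its async calls
    }

    public bool Cancel(Guid jobId)
    {
        if (!_jobs.TryRemove(jobId, out var cts)) return false; // not on this node
        cts.Cancel();
        cts.Dispose();
        return true;
    }
}

Note that this registry lives on a single node, which is exactly the routing problem described in the question; it is the queue/topic (or remoting) options above that get the cancellation request to the node actually running the job.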
You'll need some storage to do that.
Create a table (e.g. JobQueue). Before starting to process the job, you store it in the database with a status (e.g. Running; it could be an enum) and return the ID to the caller. When you need to abort/cancel the job, you call the abort method on the API, passing the ID you want to abort. In the abort method, you just update the status of the job to Aborting. Inside the first method (the one running the job), you check this table once in a while; if the status is Aborting, you stop the job (and update the status to Aborted). Or you could simply delete the row from the database once the job has been aborted or finished.
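In code, the periodic check inside the job method might look like this (JobRepository, JobStatus and DoNextChunkOfWork are hypothetical):

void RunJob(Guid jobId)
{
    var finished = false;
    while (!finished)
    {
        finished = DoNextChunkOfWork(); // hypothetical: one unit of the job's work

        // Check the JobQueue table once in a while for an abort request.
        if (JobRepository.GetStatus(jobId) == JobStatus.Aborting)
        {
            JobRepository.SetStatus(jobId, JobStatus.Aborted);
            return;
        }
    }
}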
Alternatively, if you want the data to be temporary, you could use a sixth server as a cache server and store the job state there. This cache server could be clustered as well, but then you would need to use something like Redis.

Creating MVC webservice with no timeout

I need to create an ASP.NET API service that, when called, doesn't make the caller wait for a response from the web server. Basically I have a long SQL task that I want to run, and when it's completed, send an email to tell the user the job is done. It needs to avoid the server response timeout, so something that just lets the user carry on without waiting around. I can't seem to find a way in MVC to do this; is it possible?
IMHO, I would queue this job and process it using another process outside IIS.
For example, this would be the flow:
The user performs a request to your API to start the long task, but what the API actually does on the server side is queue the whole task.
The API returns a 200 OK response specifying that the job was queued successfully. For the queue you may use Azure Service Bus, Azure Queues, MSMQ, RabbitMQ, Redis or even SQL Server, using a table to maintain job state.
Some Windows Service, Azure worker role or periodically scheduled task dequeues the task, processes it and, as soon as it ends, sends an email to notify the user that the operation is done.
Queue the task and return the response immediately.
Basically, your server-side handler (controller action, Web API method, whatever) shouldn't invoke the long-running back-end task directly. It should do something relatively fast to just queue the task and then immediately return some indication that the task has been successfully queued. Another process entirely should actually execute the long-running task.
What I would recommend is two server-side applications. One is the web application, the other is either a Windows Service or a periodically scheduled Console Application. The web application would write a record to a database table to "queue" the process. This could contain simply:
User who queued the process
When it was queued
What process was queued (if there would ever be more than one, for example)
Status of the process ("queued" initially)
Anything else you might want to store.
Just insert a record here and then return to the user. At this point the web application has done its job.
The Windows Service (or Console Application) would check this database table for "queued" records. When it finds one, update the status to "processing" (so other executions don't try to run the same one) and invoke the long-running process. When the long-running process is complete, update the status to "complete" (or just delete the record if you don't want it anymore) and notify the user. (Handle error conditions accordingly, of course. Maybe re-try, maybe notify the user of the error, etc.)
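As a sketch of how the service might claim a queued record safely when several pollers run at once (the table and column names are hypothetical; the UPDLOCK/READPAST hints stop two executions from grabbing the same row):

using System.Data.SqlClient;

static void ProcessNextQueuedJob(string connectionString)
{
    using (var conn = new SqlConnection(connectionString))
    {
        conn.Open();

        // Atomically flip one 'queued' row to 'processing' and read its key back.
        var claim = new SqlCommand(
            @"UPDATE TOP (1) QueuedProcess WITH (UPDLOCK, READPAST)
              SET Status = 'processing'
              OUTPUT inserted.Id
              WHERE Status = 'queued'", conn);

        var id = claim.ExecuteScalar();
        if (id == null) return; // nothing queued right now

        RunLongProcess((int)id); // hypothetical: the actual long-running work

        var done = new SqlCommand(
            "UPDATE QueuedProcess SET Status = 'complete' WHERE Id = @id", conn);
        done.Parameters.AddWithValue("@id", (int)id);
        done.ExecuteNonQuery();
    }
}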
By separating the concerns like this you place the appropriate responsibilities in the appropriate application contexts and provide the user with exactly the experience they're looking for. You additionally open the door for future functionality, such as queueing the process by means other than the web application or running reports on queued/running/failed/etc. processes by examining that database table.
Long story short: Don't try to hack a web application so that it doesn't behave like a web application. Use the technologies for their appropriate purposes.

How to make a web service run to finish even if the user leaves the page

I have a website where I need to take a bit of data from the user, make an ajax call to a .net webservice, and then the webservice does some work for about 5-10 minutes.
I naturally don't want the user to have to sit there that whole time, so I have made it an asynchronous AJAX call to the web service, and after the call has been sent, I redirect the user to a "you are done!" page.
What I want to happen is for the webservice to keep running to finish--and not abort--after it receives the information from the user.
From my testing, this is more or less what happens, but now I'm finding that this might be limited by time, i.e. if the web service runs past a certain amount of time, it will abort if the user isn't still connected.
I might be off here in this assessment, but this is what I THINK is going on from my testing.
So my question is whether, with .NET web services, this is indeed what happens. Does the call get aborted after some time if the user isn't still on the other end? Is there any way to disable this abort?
Thanks in advance!
When you invoke a web service, it will always finish its work, even if the user leaves the page that invoked it.
Of course web services have their own configuration, and one of the settings is the timeout.
If you're creating a WCF (SOAP) service, you can set it in its contract (by changing the binding properties); if you're creating a service with Web API or MVC (a REST/HTTP service), then you can either add it to the config file or set it programmatically in the controller, as follows:
HttpContext.Server.ScriptTimeout = 3600; //Number of seconds
That can be one reason for the web service to interrupt its work, but it is not related to what happens on the client side.
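For a WCF binding, the equivalent knobs can be set programmatically on the binding, for example (the values here are illustrative):

using System;
using System.ServiceModel;

var binding = new BasicHttpBinding
{
    SendTimeout = TimeSpan.FromHours(1),   // time allowed for the whole request/reply
    ReceiveTimeout = TimeSpan.FromHours(1) // idle time allowed on the channel
};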
Have a nice day,
Alberto
Whilst I agree that the answer here is technically correct, I just wanted to post a more robust alternative approach that avoids some of the pitfalls possible with your current approach, such as:
Web Server being bounced during the long-running processing of request
Web Server App pool being recycled during processing
Web server running out of threads due to too many long-running requests and not being able to process any more requests
I would recommend you take a thoroughly asynchronous approach and use message queues (MSMQ, for example) with a trigger on the queue that will execute the work.
The process would be:
Your page makes an Ajax call to the web service
The web service writes a message into the queue and returns right away. The message contains details of what work needs to be carried out.
User continues on your site as usual, or goes home, etc.
A trigger on the queue watches for messages, and when a message arrives in the queue, it activates a process which:
Reads the message
Performs the necessary work
Updates any back-end storage, etc, with the results of the work
This is much more robust because it totally decouples the web service from any long-running work and means that if the user makes a request and the web server goes down a moment later (for whatever reason), the work will still be queued up when the server comes back online, etc.
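For illustration, a minimal sketch of the two halves with the classic System.Messaging API (the queue path and payload are made up):

using System.Messaging;

const string Path = @".\private$\longRunningWork"; // hypothetical local queue

// Web service side: enqueue the work details and return immediately.
static void EnqueueWork(string details)
{
    if (!MessageQueue.Exists(Path))
        MessageQueue.Create(Path);

    using (var queue = new MessageQueue(Path))
    {
        queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
        queue.Send(details);
    }
}

// Triggered/worker side: receive a message and perform the work.
static void ProcessNextMessage()
{
    using (var queue = new MessageQueue(Path))
    {
        queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
        var message = queue.Receive(); // blocks until a message arrives
        var details = (string)message.Body;
        // ...perform the long-running work and update back-end storage...
    }
}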
You can read more about it here (MSMQ is the MS Message Queue tech; there are many others!)
Just my 2c

Flush HttpRuntime.Cache objects across all worker processes on IIS webserver

I have an ASP.NET web application running on IIS 7, set up in web-garden mode. I want to clear runtime cache items across all worker processes in a single step. I could set up a database key-value, but that would mean a thread on each worker process, on each of my load-balanced web servers, polling for changes on that key-value and flushing the cache. That would be a very bad mechanism, as I flush cache items at most once per day. I also cannot implement push notification using SqlCacheDependency with Service Broker notifications, as I have a MySQL database. Any thoughts? Is there any dirty workaround?
One possible workaround: expose an aspx page and hit that page multiple times using the IP and port on which the site is hosted instead of the domain name, e.g. http://ip.ip.ip.ip:82/CacheClear.aspx, so that the requests might be spread across all the worker processes within that web server, and on Page_Load, clear the cache items. But this is a really dirty hack and may not work, since all the requests may be sent to the same worker process.
You need to set up inter-process communication.
For caching there are two commonly used ways of doing this:
Set up a shared cache (memcached or the like).
Set up a message queue (e.g. MSMQ or RabbitMQ) and use it to spread state to the local caches.
A shared cache is the ultimate solution, as it means the whole cache is distributed, but it is also the most complex: it needs to be set up so that the cache load is properly distributed between nodes, and you must make sure it doesn't become a bottleneck.
The second option requires more code on your part, but it is easier if you don't want to share the cache content (as in your case).
The easiest approach is to set up a listener thread or task to handle the cache-clear or individual-entry invalidation messages. This thread is dormant when there are no messages, so the impact on performance is minimal.
You can also forgo the listener thread by handling messages as part of the usual IIS request pipeline, i.e. set up a filter/module that checks for messages in the queue and processes them before handling the request; but performance-wise the first option is (slightly) better.
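As an illustration of the listener approach, here is a rough sketch using RabbitMQ as the broker (RabbitMQ .NET client 6.x style; the host, exchange and message format are all illustrative). Each worker process declares its own server-named queue bound to a fanout exchange, so every process receives every invalidation message and clears its own local cache:

using System.Text;
using System.Web;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

public static void StartCacheInvalidationListener()
{
    var factory = new ConnectionFactory { HostName = "localhost" };
    var connection = factory.CreateConnection(); // keep alive for the app's lifetime
    var channel = connection.CreateModel();

    channel.ExchangeDeclare("cache-invalidation", ExchangeType.Fanout);
    var queueName = channel.QueueDeclare().QueueName; // exclusive, per-process queue
    channel.QueueBind(queueName, "cache-invalidation", "");

    var consumer = new EventingBasicConsumer(channel);
    consumer.Received += (sender, args) =>
    {
        var key = Encoding.UTF8.GetString(args.Body.ToArray());
        HttpRuntime.Cache.Remove(key); // evict the entry from this worker process
    };
    channel.BasicConsume(queueName, true, consumer); // autoAck: true
}

You would start this once per worker process (e.g. from Application_Start) and publish a message containing the cache key, or a well-known "flush everything" token, whenever the daily invalidation happens.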
