I wrote a C# data-centric web application.
This application needs to perform certain work asynchronously (for example, sending email or transmitting data to an external API), but I want that work to be persisted in case of a crash or restart.
I also want to pass some data that will be persisted, so that when the thread wakes up, it has this data for the invocation. By data I mean a data context, some structured object, so that when the thread wakes up it has the data for its operation; in the case of email: To, Subject and Body.
So, just to visualize it, here is an API I can think of:
public interface IAsyncService
{
    void QueueWork<T>(object dataContext) where T : IAsyncOperation;
}

public interface IAsyncOperation
{
    void ExecuteQueuedWork(object dataContext);
}
Is this scenario possible in native .NET? If not, do you know of any other possible solution?
Yes, and no.
You can't "persist a thread". That's simply impossible. Thread is a low-level thing.
However, you can have the expected result. Just persist the jobs, not threads. Job (or task, or workitem, or whatever you would like to name it) is the set of input data that defines the task to be performed, plus, optionally, the information about progress, temporary results, and similar things.
If you define the "job" just as a set of input data, you will be able to have a pool of workers that will start processing the jobs. When a worker crashes, assuming the job is still persisted, you will be able to start a new worker and let it process the failed job again from the beginning.
If you include in the "job" some temporary (partial) results, then after a crash your new worker can resume from that saved point.
Now, the granularity of savepoints (if any), the tracking of which thread does what job, and the tracking of which jobs are completed and which are not, are solely your responsibilities. You have to design and write all of that yourself. That's doable, and not that hard, but it requires a bit of planning.
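As a rough illustration (the names IJobStore, Job and Worker are hypothetical; the store could be a database table, a file or a durable queue, anything that survives a restart), a minimal sketch might look like this:

using System;

public class Job
{
    public Guid Id { get; set; }
    public string OperationType { get; set; }   // e.g. "SendEmail"
    public string DataContextJson { get; set; } // serialized input data (To, Subject, Body, ...)
    public string Status { get; set; }          // "Pending", "InProgress", "Done"
}

public interface IJobStore
{
    void Save(Job job);        // persist the job before any work starts
    Job TakeNextPending();     // atomically claim a pending job, or return null
    void MarkDone(Guid jobId);
}

public class Worker
{
    private readonly IJobStore _store;
    public Worker(IJobStore store) { _store = store; }

    // Runs on a pool of worker threads. After a crash/restart, jobs still marked
    // "Pending" or "InProgress" are simply picked up again from the beginning.
    public void ProcessOne()
    {
        Job job = _store.TakeNextPending();
        if (job == null) return;
        // ... deserialize job.DataContextJson and dispatch to the matching IAsyncOperation ...
        _store.MarkDone(job.Id);
    }
}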
Or, with a bit of luck, you might find a worker-pool/message-queueing library that does this for you; I don't remember any offhand.
What I am dealing with here is a fairly complex server application that performs a long, involved operation using a number of threads and tasks (yes, some are created as Tasks and some are "manually" created threads).
The complex and lengthy process we are talking about here is triggered by a REST API call (assume a web page generates the request object based on user input), and provides the user with an ID to poll for the progress of the operation.
I am looking for an efficient way to allow the user to click an "Abort/Stop" button that will stop this whole orchestra of threads/tasks.
(keep in mind this server can process a number of requests as described above).
I have looked into a number of options, all of which seem to require the threads/tasks themselves to monitor for an "abort flag" during their operation loop and break out of the loop should it be required.
Obviously using Thread.Abort is a big no-no.
I have thought about having some kind of abstract class (say AbortableThread) which all my "worker" threads would have to derive from, where the only abstract method would be Abort(), so that each thread can end in a clean manner, closing and finishing whatever it needs to.
This way, I would perhaps be able to keep tabs on the threads that have been spawned by a specific user request and just call Abort() in a foreach loop.
Despite that, I still have to figure out how it would break into the loops I run. So my second thought was that this abstract class could have a shouldBreak property which, again, I would be able to set from "outside", but this brings me back to (almost) square one, where I have to add logic into my threads to be able to abort.
A third idea I came up with is to have my logical loops call Abort() on every iteration, without any validation of whether an abort is currently required or not, and have the abstract base class check its shouldBreak bool inside Abort() and act accordingly should it be required.
There are no code examples and I have decided I would rather figure this out in high level before I dive into implementation.
Thank you for reading this long question!
I have several long-running threads in an MVC3 application that are meant to run forever.
I'm running into a problem where a ThreadAbortException is being raised by some other code (not mine) and I need to recover from this gracefully and restart the thread. Right now, our only recourse is to recycle the worker process for the appDomain, which is far from ideal.
Here are some details about how this code works:
A singleton service class exists for this MVC3 application. It has to be a singleton because it caches data. This service is responsible for making requests to a database. A 3rd party library is used for the actual database connection code.
In this singleton class we use a collection of classes that are called "QueryRequestors". These classes identify unique package+stored_procedure names for requests to the database, so that we can queue those calls. That is the purpose of the QueryRequestor class: to make sure calls to the same package+stored_procedure (although they may have any number of different parameters) are queued, and do not happen simultaneously. This eases our database strain considerably and improves performance.
The QueryRequestor class uses an internal BlockingCollection and an internal Task (thread) to monitor its queue (blocking collection). When a request comes into the singleton service, it finds the correct QueryRequestor class via the package+stored_procedure name, and it hands the query over to that class. The query gets put in the queue (blocking collection). The QueryRequestor's Task sees there's a request in the queue and makes a call to the database (now the 3rd party library is involved). When the results come back they are cached in the singleton service. The Task continues processing requests until the blocking collection is empty, and then it waits.
Once a QueryRequestor is created and up and running, we never want it to die. Requests come in to this service 24/7 every few minutes. If the cache in the service has data, we use it. When data is stale, the very next request gets queued (and subsequent simultaneous requests continue to use the cache, because they know someone (another thread) is already making a queued request, and this is efficient).
So the issue here is what to do when the Task inside a QueryRequestor class encounters a ThreadAbortException. Ideally I'd like to recover from that and restart the thread. Or, at the very least, dispose of the QueryRequestor (it's in a "broken" state now as far as I'm concerned) and start over. Because the next request that matches the package+stored_procedure name will create a new QueryRequestor if one is not present in the service.
I suspect the thread is being killed by the 3rd party library, but I can't be certain. All I know is that nowhere do I abort or attempt to kill the thread/task. I want it to run forever. But clearly we have to have code in place for this exception. It's very annoying when the service bombs because a thread has been aborted.
What is the best way to handle this? How can we handle this gracefully?
You can stop the re-throwing of a ThreadAbortException by calling Thread.ResetAbort().
Note that the most common cause of the exception is a Redirect call, and canceling the thread abort may cause undesired execution of request code that would otherwise be skipped because the thread was being killed. This is more of an issue in Web Forms (where the separation of code and rendering is less clear) than in MVC (where you can return special redirect results from controllers).
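A minimal sketch of that recovery loop, assuming .NET Framework (Thread.ResetAbort is not supported on .NET Core/.NET 5+), System.Threading, and a hypothetical DoWork method:

private void WorkerLoop()
{
    while (true)
    {
        try
        {
            DoWork(); // process one queued request (hypothetical)
        }
        catch (ThreadAbortException)
        {
            // Cancel the pending abort so the exception is not automatically
            // re-thrown at the end of this catch block; the loop keeps running.
            Thread.ResetAbort();
            // Log it, and decide whether continuing is actually safe here.
        }
    }
}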
Here's what I came up with for a solution, and it works quite nicely.
The real issue here isn't preventing the ThreadAbortException, because you can't prevent it anyway, and we don't want to prevent it. It's actually a good thing if we get an error report telling us this happened. We just don't want our app coming down because of it.
So, what we really needed was a graceful way to handle this Exception without bringing down the application.
The solution I came up with was to create a bool flag property on the QueryRequestor class called "IsValid". This property is set to true in the constructor of the class.
In the DoWork() call that is run on the separate thread in the QueryRequestor class, we catch the ThreadAbortException and we set this flag to FALSE. Now we can tell other code that this class is in an Invalid (broken) state and not to use it.
So now, the singleton service that makes use of this QueryRequestor class knows to check for this IsValid property. If it's not valid, it replaces the QueryRequestor with a new one, and life moves on. The application doesn't crash and the broken QueryRequestor is thrown away, replaced with a new version that can do the job.
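A condensed, hypothetical sketch of that shape (the real class also owns the BlockingCollection and the monitoring Task, which are omitted here):

public class QueryRequestor
{
    public bool IsValid { get; private set; }

    public QueryRequestor()
    {
        IsValid = true;
    }

    // Runs on the QueryRequestor's internal Task/thread.
    private void DoWork()
    {
        try
        {
            // ... take requests from the blocking collection and call the database ...
        }
        catch (ThreadAbortException)
        {
            // Mark this instance as broken; the abort is still re-thrown after this
            // catch block, which ends the Task. The singleton service checks IsValid
            // and swaps in a fresh QueryRequestor on the next matching request.
            IsValid = false;
        }
    }
}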
In testing, this worked quite well. I would intentionally call Thread.Abort() on the DoWork() thread, and watch the Debug window for output lines. The app would report that the thread had been aborted, and then the singleton service was correctly replacing the QueryRequestor. The replacement was then able to successfully handle the request.
I have a series of calculations that need to be processed - the calculations and the order they run are all defined by the user on the UI.
If they just ran one after each other, it wouldn't be too hard. However, some of the calculations need to be processed concurrently and all calculations must have the ability to be individually paused at any time. I also need to be able to re-arrange orders or add new calculations to be processed at any time. So whatever I do must be flexible enough to handle this.
On the UI, imagine a listbox (a queue, if you like) of usercontrols - with each usercontrol displaying the name of the calculation and a pause button. And I can add calculations to this list at any time during processing.
What is the best way to do this?
Should I be running each calculation in its own thread? If so, how should I store the list of running processes? How will I pass the queue to the calculation processor? How will I be able to ensure that every time the queue changes (new ordering or new calculation) the calculation processor will be made aware of this?
My initial thoughts were to have:
CalcProcessor class
CalcCalculation class
In CalcProcessor have 2 Lists of CalcCalculations. One being the "queue" as shown on the UI (perhaps a pointer to it? Or some other way to ensure it updates live), and the other being the list of currently running calculations.
Somehow I need to get the CalcCalculation to be running in its own thread to process the calculation, and be able to handle any pause events. So I need some way to transmit the info of the Pause button being pressed in the UI to the CalcProcessor object, and then into the correct CalcCalculation.
Edit in response to David Hope:
Thanks for your reply.
Yes, there are n calculations but this could change at any time due to being able to add more calculations to process on the UI.
They do not need to share data in any way. There will be a setting in the application to specify how many should run concurrently (e.g. 10 at any given time, the first 10 in the queue, and when one finishes, the next calculation in the queue will start processing).
The calculation will involve taking data from some data source (it could be a database or a file), analysing it and performing some calculations on that data. When I say the calculation needs to be paused, I don't mean pausing the thread. I just mean (for example, as I haven't written this part of the application yet) if it is reading row by row from a database and doing some live calculations, pausing at the completion of processing the current row, and continuing when the pause button is unclicked on the UI. This could be done with something as primitive as a while(notPaused) loop, provided I can get the pause information from the UI into the thread.
There are several questions here:
How to synchronize the UI and the model?
I think you got this one backwards. Your model shouldn't have a “pointer” to the queue you're showing in the UI. Instead, the queue should be in your model, and you should use databinding together with INotifyPropertyChanged and ObservableCollection to show the queue in the UI. (At least that's how it's done in WPF.)
This way, you can manipulate your queue directly from your model, and it will automatically show on the UI.
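For example (a minimal sketch; CalcCalculation follows the question, the rest is illustrative), the view model owns the queue and the ListBox simply binds to it:

using System.Collections.ObjectModel;

public class CalcQueueViewModel
{
    // ObservableCollection raises collection-changed notifications,
    // so adding or reordering items here updates the bound ListBox automatically.
    public ObservableCollection<CalcCalculation> Queue { get; private set; }

    public CalcQueueViewModel()
    {
        Queue = new ObservableCollection<CalcCalculation>();
    }
}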
How to start and monitor calculations?
I think Tasks are ideal for this. You can start a Task using Task.Factory.StartNew(). Since it seems your Tasks will take long to execute, you might consider using TaskCreationOptions.LongRunning. You can also use the Task to find out when the calculation is complete (or whether it failed with an exception).
How to pause running calculations?
You can use ManualResetEventSlim for that. Normally it would be set, but if you wanted to pause a running Task, you would Reset() it. The calculation will need to periodically call Wait() on that event. It's not possible to reasonably pause a running thread without cooperation from the calculation on that thread.
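A minimal sketch of that cooperation (ReadRows and Process are placeholders for the real per-row work):

using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public class PausableCalculation
{
    // Set = running, Reset = paused. Starts in the "running" state.
    private readonly ManualResetEventSlim _pauseGate = new ManualResetEventSlim(true);

    public Task Start()
    {
        return Task.Factory.StartNew(() =>
        {
            foreach (var row in ReadRows())
            {
                _pauseGate.Wait(); // blocks here while the UI has paused us
                Process(row);
            }
        }, TaskCreationOptions.LongRunning);
    }

    public void Pause()  { _pauseGate.Reset(); } // Pause button clicked
    public void Resume() { _pauseGate.Set(); }   // Pause button unclicked

    private IEnumerable<object> ReadRows() { yield break; } // placeholder data source
    private void Process(object row) { }                    // placeholder calculation
}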
If you were using C# 5.0, a better approach would be to use something like PauseToken.
In Framework 4.5, the answer here is the Async API, which removes the need to manage threads. For details, look at the async/await keywords.
From a broader perspective, a "CalcProcessor" class is a good idea, but I think the Task object will suffice to replace your "CalcCalculation" class. The Processor can simply hold an enumerable of Tasks. The Processor can expose methods for managing the queue, if needed, as well as returning information about its status. When your application finally reaches a state where it must have the results, you can use Task.WaitAll to block the CalcProcessor's thread until all of the tasks complete.
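A bare-bones sketch of that shape (names are illustrative, and error handling is omitted):

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public class CalcProcessor
{
    private readonly List<Task> _running = new List<Task>();

    public void Start(Action calculation)
    {
        _running.Add(Task.Factory.StartNew(calculation, TaskCreationOptions.LongRunning));
    }

    // Blocks the caller until every calculation has finished (or faulted).
    public void WaitForAll()
    {
        Task.WaitAll(_running.ToArray());
    }
}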
Without more information about the actual goal here, it's hard to give better advice.
You can use the Observer pattern to display results on the UI and to feed ordering changes back into the Processor. The State and Command patterns will help you start, pause and cancel the calculations. These patterns offer good answers to your questions at the design level. Concurrency is still a problem; they do not solve multi-threading issues themselves, but they open an easier road to managing threads.
I suggest that you haven't broken the problem down far enough, which is the reason you are frustrated.
You need to start small and build up from there. You mention, but don't define, your actual requirements; they seem to be...
Need to be able to run N calculations
Some need to be run concurrently (does this imply that they share data? If so, how are you going to share the data?)
Must be able to pause the calculation (don't use Thread.Suspend, as it potentially leaves a thread in an unstable state, which is particularly bad if you are sharing data), so you will need to build pause points into each calculation. You also need to consider how you are going to communicate the pause/unpause to the calculation.
As far as methods, there are several to consider...
Threads are an obvious choice, but they require careful tending (starting, pausing, stopping, etc...)
You could also use BackgroundWorker or possibly Parallel.ForEach
BackgroundWorker contains the framework for cancelling the worker and for reporting progress (which can be useful).
My recommendation would be to start with BackgroundWorker, potentially subclassing it to add the Pause/Resume functionality you need. Determine how you are going to manage data sharing (at the very least, use lock to protect against simultaneous access).
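For example, a hedged sketch of such a subclass (the DoWork handler would call WaitIfPaused() at its own safe points, e.g. once per processed row, and should still honor CancellationPending for cancellation):

using System.ComponentModel;
using System.Threading;

public class PausableBackgroundWorker : BackgroundWorker
{
    // Set = running, Reset = paused.
    private readonly ManualResetEventSlim _gate = new ManualResetEventSlim(true);

    public void Pause()  { _gate.Reset(); }
    public void Resume() { _gate.Set(); }

    // Called from inside the DoWork handler at safe pause points.
    public void WaitIfPaused() { _gate.Wait(); }
}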
You may find BackgroundWorker too restrictive and need to go with Threads, but I'm usually able to avoid that.
If you post more clear requirements, or samples of what you've tried and didn't work, I'll be happy to help more.
For the queue you can use a heap data structure (a priority queue). This will help you prioritize your tasks. You should also use the thread pool so the calculations are executed efficiently, and try to split your tasks into small parts.
I have this [HttpPost] action method:
[HttpPost]
public ActionResult AddReview(Review review)
{
    repository.Add(review);
    repository.Save();
    repository.UpdateSystemScoring(review.Id); // call SPROC with new Review ID.
    return View("Success", review);
}
So, basically a user clicks a button, I add the review to my database (via Entity Framework 4.0), save changes, and then I call a stored procedure with the identity field, which is the second-to-last line of code.
This needs to be done after the review is saved (as the identity field is only created once Save is called, and EF persists the changes), and it is a system-wide calculation.
From the user point of view, he/she doesn't/shouldn't care that this calculation is happening.
This procedure can take anywhere from 0-20 seconds. It does not return anything.
Is this a candidate for an asynchronous controller?
Is there a way I can add the Review, and let another asynchronous controller handle the long-running SPROC call, so the user can be taken to the Success page immediately?
I must admit (and I'm partially ashamed of this): this is a rewrite of an existing system, and in the original system (ASP.NET Web Forms) I fired off another thread in order to achieve the above task - which is why I was wondering whether the same principle can be applied to ASP.NET MVC 3.
I always try to avoid multi-threading in ASP.NET, but user experience is the #1 priority, and I do not want the page timing out.
So - is this possible? I'm also happy to hear any other ideas. Also - I can't use triggers here; I don't really want to go into too much detail why - but I can't.
I would fire a new thread (not from the thread pool) to perform this task and return immediately, especially if you don't care about the results. Asynchronous controllers are useful in situations where most of the time is spent waiting for some other system to complete the task; once that system completes the task, your application is signaled to process the result, and during the execution of the task no threads are consumed by your application. In your scenario this task could be performed by SQL Server using the asynchronous Begin*/End* methods in ADO.NET (for example SqlCommand.BeginExecuteNonQuery). You could use that if you need the results back. If you don't, firing a new thread will work just fine, as before.
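For illustration, a hedged sketch of that option applied to the question's action (ReviewRepository is hypothetical; the point is to capture only plain values and create a fresh repository/connection inside the thread rather than sharing the request-scoped EF context, accepting that the work is lost if the process recycles):

[HttpPost]
public ActionResult AddReview(Review review)
{
    repository.Add(review);
    repository.Save();

    int reviewId = review.Id; // capture only the value, not request-scoped objects
    var worker = new Thread(() =>
    {
        try
        {
            using (var scoringRepository = new ReviewRepository()) // hypothetical
            {
                scoringRepository.UpdateSystemScoring(reviewId);
            }
        }
        catch (Exception)
        {
            // Log it: nothing upstream will ever observe this failure.
        }
    });
    worker.IsBackground = true; // don't keep the process alive during shutdown
    worker.Start();

    return View("Success", review); // user sees the Success page immediately
}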
I think asynchronous controllers are more for things where the request may take a long time to return a response, but the main thread would spend most of that time waiting for another thread/process. This is mostly useful for ajax calls rather than main page load, when it is acceptable to just show a progress indicator until the response is returned.
I use a separate queueing system for this type of task, which is more robust and easier to work with but does take a bit more work to set up. If you really need to do it within the ASP.net process, a separate request is probably the best option, though there is some potential for the task not to run - for example I'm not sure what happens if the connection drops or the app pool recycles while an async task is running.
Since the scoring system takes so long to run, I would recommend using a scheduled task in SQL Server or Windows to update the scores every X minutes. Since the user doesn't know about the request, it doesn't matter whether it runs immediately.
You could add the IDs to a queue and process the queue every 30 minutes.
Otherwise, if there is a reason this needs to run immediately, you could do an async call or see if you can trim some fat off the stored proc.
I have a very similar system that I wrote. Instead of doing things synchronously, we do everything asynchronously using queues.
Action -> causes JavaScript request to web server
|
Web server puts a notification on the queue
|
Worker picks up the message from the queue and does the point calculation
|
At some point in the future the user sees the points adjusted
This allows us to handle large amounts of user load without worrying about it having an adverse effect on our calculation engine. It also means that we can add more workers to handle larger load when we need to, and remove workers when the load is light.
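A minimal in-process stand-in for that flow is sketched below (names are illustrative; the real setup uses a separate queue and worker processes, which is what allows workers to be added and removed):

using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ScoringQueue
{
    private static readonly BlockingCollection<int> ReviewIds = new BlockingCollection<int>();

    // Called from the web request: returns immediately.
    public static void Enqueue(int reviewId)
    {
        ReviewIds.Add(reviewId);
    }

    // Started once at application startup.
    public static void StartWorker()
    {
        Task.Factory.StartNew(() =>
        {
            foreach (int id in ReviewIds.GetConsumingEnumerable())
            {
                // run the point calculation / scoring for this id
            }
        }, TaskCreationOptions.LongRunning);
    }
}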
Sometimes there is a lot that needs to be done when a given Action is called. Many times, more needs to be done than just what is required to generate the next HTML for the user. In order to give the user a faster experience, I want to do only what I need to do to get them their next view and send it off, but still do more things afterwards. How can I do this, multi-threading? Would I then need to worry about making sure different threads don't step on each other's feet? Is there any built in functionality for this type of thing in ASP.NET MVC?
As others have mentioned, you can use a spawned thread to do this. I would take care to consider the 'criticality' of several edge cases:
If your background task encounters an error, and fails to do what the user expected to be done, do you have a mechanism for reporting this failure to the user?
Depending on how 'business critical' the various tasks are, using a robust/resilient message queue to store 'background tasks to be processed' will help protect against a scenario where the user requests some action, and the server responsible crashes, or is taken offline, or the IIS service is restarted, etc., and the background thread never completes.
Just food for thought on other issues you might need to address.
How can I do this, multi-threading?
Yes!
Would I then need to worry about making sure different threads don't step on each other's feet?
This is something you need to take care of anyway, since two different ASP.NET requests could arrive at the same time (from different clients) and be handled in two different worker threads simultaneously. So, any code accessing shared data needs to be written in a thread-safe way anyway, even without your new feature.
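For example, a minimal sketch of what "thread-safe" means here (the dictionary and names are illustrative):

using System.Collections.Generic;

public static class SharedScores
{
    private static readonly object Sync = new object();
    private static readonly Dictionary<int, decimal> Scores = new Dictionary<int, decimal>();

    public static void Update(int id, decimal value)
    {
        lock (Sync) // only one thread at a time may touch the shared dictionary
        {
            Scores[id] = value;
        }
    }
}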
Is there any built in functionality for this type of thing in ASP.NET MVC?
The standard .NET multi-threading techniques should work just fine here (manually starting threads, using the Task features, using the Async CTP, ...).
It depends on what you want to do and how reliable you need it to be. If it's OK for the operations pending after the response was sent to be lost, then .NET async calls, the ThreadPool or a new Thread will all work just fine. If the process crashes, the pending work is lost, but you have already accepted that this can happen.
If the work requires any reliability guarantee, for instance if the work involves updates to the site database, then you cannot rely on in-process .NET threading; you need to persist the request to do the work and then process that work even after a process restart (an app-pool recycle, as IIS so affectionately calls it).
One way to do this is to use MSMQ. Another way is to use a database table as a queue. The most reliable way is to use the database activation mechanisms, as described in Asynchronous procedure execution.
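As a hedged illustration of the "database table as a queue" option (System.Data.SqlClient; the table, columns, connectionString and reviewId are all assumed for the example), the request itself only inserts a work item, and a separate worker, scheduled job or activated procedure processes pending rows later:

using (var connection = new SqlConnection(connectionString)) // connectionString assumed available
using (var command = new SqlCommand(
    "INSERT INTO dbo.WorkQueue (ReviewId, Status, EnqueuedAt) VALUES (@id, 'Pending', GETUTCDATE())",
    connection))
{
    command.Parameters.AddWithValue("@id", reviewId); // reviewId assumed available
    connection.Open();
    command.ExecuteNonQuery();
}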
You can start a background task, then return from the action. This example uses the Task Parallel Library, found in .NET 4.0:
public ActionResult DoSomething()
{
    Task t = new Task(() => DoSomethingAsynchronously());
    t.Start();
    return View();
}
I would use MSMQ for this kind of work. Rather than spawning threads in an ASP.NET application, I'd use an asynchronous, out-of-process way to do this. It's very simple and very clean.
In fact, I've been using MSMQ in ASP.NET applications for a very long time and have never had any issues with this approach. Furthermore, having a different process (that is, an executable running outside the web application's process) do the long-running work is an ideal way to handle it, since your web application is not being used to do this work. So IIS, the thread pool and your web application can continue to do what they need to, while other processes handle the long-running tasks.
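For illustration, a minimal sketch of the send side using System.Messaging (the queue path is made up; a separate Windows service or executable would receive and process the messages):

using System.Messaging;

public static class ScoringMessenger
{
    private const string QueuePath = @".\private$\scoring"; // illustrative local private queue

    public static void Send(int reviewId)
    {
        using (var queue = new MessageQueue(QueuePath))
        {
            // The default XmlMessageFormatter serializes the body; the label is just a tag.
            queue.Send(reviewId, "UpdateSystemScoring");
        }
    }
}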
Maybe you should give it a try: Using an Asynchronous Controller in ASP.NET MVC