I found this related question but my situation is a little bit different.
I have an ASP.NET application that produces long-running tasks to be processed by a number of background processes (Windows services). Most of the tasks are similar and can be handled by most task runners. Due to different versions of the client application (where the tasks are generated by users), some tasks can only be processed by task runners of a specific version. The web server has no knowledge of the kind of task; it just sends all the tasks to the same queue using MSMQ.
If a task enters the queue, the next free task runner should receive it, decide whether it can handle this kind of task, remove it from the queue, and run it.
If the runner that received the message is not able to process this kind of task, it should put the message back on the queue so that another runner can have a look at it.
I tried to implement a conditional receive using a transaction that I can abort if the task has the wrong format:
var transaction = new MessageQueueTransaction();
transaction.Begin();
var msg = queue.Receive(TimeSpan.FromSeconds(1000), transaction);
if (CanHandle(msg))
{
    transaction.Commit();
    // handle the task
}
else
{
    transaction.Abort();
}
It seems to work, but I don't know if this is the preferable way to go.
Another problem with this solution: if there is no other free runner that can handle the message, I will receive it again and again.
Is there a way I can solve this problem using only MSMQ? The whole task data is already stored in a SQL database; the task runners access the task data over an HTTP API (that's why I rule out solutions like SQL Server Service Broker). The data sent to the message queue is only metadata used to identify the job.
If plain MSMQ is not the right tool, can I solve the problem using, for example, MassTransit (I didn't like the fact that I have to install and run the additional MassTransit RuntimeServices plus a SQL database for it)? Other suggestions?
The way you are utilizing MSMQ really circumvents some of the fundamental features of the technology. If a queue message cannot be handled by every reader, you incur a sizable performance penalty: many of your task-processing services can be sent back empty-handed when they ask for tasks. In an extreme scenario, imagine what would happen if there were only one service that could perform task type "A". If that service went down and the first task in the queue were of type "A", your entire system would shut down.
I would suggest one of two approaches:
Utilize multiple queues, one per task version. Hide task retrieval behind an API or some other service. Your service can request a task of one or more task types, or you can even allow anything. The API is then charged with figuring out which queue to pull from (i.e., map to a specific task type, pick one at random, do some sort of round-robin, etc.).
Opt for a different storage technology over queueing. If you write good enough SQL, a relational database would be more than up to the task. You just have to take a lot of care not to incur deadlocks.
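To make the first approach concrete, here is a minimal sketch of what the retrieval API could look like, assuming one private MSMQ queue per task version. The queue paths, the `TaskRetrievalApi` name, and the timeout handling are all illustrative, not a definitive design:

```csharp
using System;
using System.Messaging;

// Sketch of approach 1: one queue per task version, hidden behind a
// retrieval API so no runner ever sees a task it cannot handle.
public class TaskRetrievalApi
{
    // Maps a task version to its dedicated private queue (hypothetical paths).
    private static string QueuePathFor(string version) =>
        @".\private$\tasks_" + version;

    // A runner asks only for the versions it supports; the API decides
    // which queue to pull from.
    public Message TryReceive(string[] supportedVersions, TimeSpan timeout)
    {
        foreach (var version in supportedVersions)
        {
            using (var queue = new MessageQueue(QueuePathFor(version)))
            {
                try
                {
                    return queue.Receive(timeout);
                }
                catch (MessageQueueException e)
                    when (e.MessageQueueErrorCode == MessageQueueErrorCode.IOTimeout)
                {
                    // Nothing available for this version; try the next one.
                }
            }
        }
        return null; // no work for any supported version
    }
}
```

The routing strategy inside the API (first match, random, round-robin) can evolve without touching the runners.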
Can you create another queue? If yes, then I would create multiple queues: a GenericTaskQ, which will have all the tasks in it, plus an xTaskQ and a yTaskQ. Your xTaskRunner picks tasks from the generic queue and, if it cannot process one, puts it in yTaskQ (or whatever queue is appropriate). The same goes for yTaskRunner: if it can't handle the message, it puts it in xTaskQ. The x and y task runners should always look in their respective queues first, and only when nothing is there go look in the generic queue.
If you cannot create multiple queues, use message (task) labels (which should be unique; we normally use a GUID) to remember which tasks have already been seen by a task runner and cannot be processed. Also use Peek to check whether the message has already been looked at, before actually receiving it.
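As a rough sketch of that Peek-based idea (the sender is assumed to set a unique GUID label on every message; note that Peek only inspects the head of the queue, and another runner may receive the message between your Peek and Receive):

```csharp
using System;
using System.Collections.Generic;
using System.Messaging;

// Sketch of the single-queue variant: Peek before Receive and remember
// labels this runner has already rejected.
public class SelectiveReceiver
{
    private readonly HashSet<string> _rejected = new HashSet<string>();

    public Message TryReceive(MessageQueue queue, Func<Message, bool> canHandle)
    {
        try
        {
            var head = queue.Peek(TimeSpan.FromSeconds(1));
            if (_rejected.Contains(head.Label) || !canHandle(head))
            {
                _rejected.Add(head.Label); // remember: not for this runner
                return null;               // leave it for another runner
            }
            return queue.Receive(TimeSpan.FromSeconds(1));
        }
        catch (MessageQueueException e)
            when (e.MessageQueueErrorCode == MessageQueueErrorCode.IOTimeout)
        {
            return null; // queue was empty
        }
    }
}
```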
I wrote a C# data-centric web application.
This application needs to perform some things asynchronously (for example, sending email or transmitting some data to an external API), but I want them to be persisted in case of a crash or restart.
I also want to pass some data that will be persisted, so that when the thread wakes up it will have this data for the invocation. By data I mean a data context, some structured object, so when the thread wakes up it has the data for the operation; in the case of email: To, subject, and body.
So, just to visualize it, here is an API that I can think of:
public interface IAsyncService
{
    void QueueWork<T>(object dataContext) where T : IAsyncOperation;
}

public interface IAsyncOperation
{
    void ExecuteQueuedWork(object dataContext);
}
Is this scenario possible in native .NET? If not, do you know any other possible solution for it?
Yes, and no.
You can't "persist a thread". That's simply impossible; a thread is a low-level construct.
However, you can have the expected result. Just persist the jobs, not threads. Job (or task, or workitem, or whatever you would like to name it) is the set of input data that defines the task to be performed, plus, optionally, the information about progress, temporary results, and similar things.
If you define the "job" just as a set of input data, you will be able to have a pool of workers that will start processing the jobs. When a worker crashes, assuming the job is still persisted, you will be able to start a new worker and let it process the failed job again from the beginning.
If you include in the "job" some temporary (partial) results, then after a crash your new worker can resume from that saved point.
Now, the granularity of savepoints (if any), the tracking of which thread does which job, and the tracking of which jobs are completed and which are not are solely your responsibility. You have to design and write all of that yourself. That's doable, and not that hard, but it requires a bit of planning.
Or, with a bit of luck, you might find a worker-pool/message-queueing library that does this; I don't remember any right now.
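To illustrate the "persist the job, not the thread" idea, here is a minimal sketch of a job record for the email case. The IJobStore calls in the comments are hypothetical and stand in for whatever SQL table or durable queue you choose:

```csharp
using System;

// The job is plain serializable data: everything a worker needs to
// perform (or re-perform) the operation after a crash or restart.
public class EmailJob
{
    public Guid Id { get; set; } = Guid.NewGuid();
    public string To { get; set; }
    public string Subject { get; set; }
    public string Body { get; set; }
    public string Status { get; set; } = "Pending"; // Pending / Running / Done
    public int Attempts { get; set; }               // optional progress info
}

// Worker loop (IJobStore is a hypothetical persistence abstraction):
//   var job = store.ClaimNextPending();    // atomically mark as Running
//   try { Send(job); store.MarkDone(job.Id); }
//   catch { store.MarkPending(job.Id); }   // on failure the job survives
//                                          // and is retried from scratch
```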
I'm writing a service that has to read tasks from an AMQP message queue and perform a synchronous action based on the message type. These actions might be to send an email or hit a web service, but will generally be on the order of a couple hundred milliseconds assuming no errors.
I want this to be extensible so that other actions can be added in the future. Either way, the volume of messages could be quite high, with bursts of hundreds per second coming in.
I'm playing around with several designs, but my questions are as follows:
What type of threading model should I go with? Do I:
a) Go with a single thread to consume from the queue and put tasks on a thread pool? If so, how do I represent those tasks?
b) Create multiple threads to host their own consumers and have them handle the task synchronously?
c) Create multiple threads to host their own consumers and have them all register a delegate to handle the tasks as they come in?
In the case of (a) or (c), what's the best way to have the spawned thread communicate back with the main thread? I need to ack the message that came off the queue. Do I raise an event from the spawned thread that the main thread listens to?
Is there a guideline as to how many threads I should run, given x cores? Is it x, 2*x? There are other services running on this system too.
You should generally* avoid direct thread programming in favor of the Task Parallel Library and concurrent collections built into .NET 4.0 and higher. Fortunately, the producer/consumer problem you described is common and Microsoft has a general-purpose tool for this: the BlockingCollection. This article has a good summary of its features. You may also refer to this white paper for performance analysis of the BlockingCollection<T> (among other things).
However, before pursuing BlockingCollection<T> or an equivalent, given the scenario you described, why not go for the simple solution of just using Tasks? The TPL gives you asynchronous execution of tasks with a lot of extras, like cancellation and continuations. If, however, you need more advanced lifecycle management, then go for something like a BlockingCollection<T>.
* By "generally", I'm insinuating that the generic solution will not necessarily perform the best for your specific case as it's almost certain that a properly designed custom solution will be better. As with every decision, perform the cost/benefit analysis.
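For option (a), a single consumer thread feeding a pool of workers, a BlockingCollection<T> sketch could look like the following. The message type (plain strings here) and the commented-out Ack call are placeholders for your real AMQP client:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Program
{
    static void Main()
    {
        // Bounded to apply back-pressure if workers fall behind bursts.
        var work = new BlockingCollection<string>(boundedCapacity: 1000);

        // Worker tasks: GetConsumingEnumerable blocks until an item is
        // available and ends when CompleteAdding is called.
        var workers = new Task[4];
        for (int i = 0; i < workers.Length; i++)
        {
            workers[i] = Task.Factory.StartNew(() =>
            {
                foreach (var msg in work.GetConsumingEnumerable())
                {
                    Console.WriteLine("handled: " + msg);
                    // Ack(msg); // ack back to the broker once handled
                }
            }, TaskCreationOptions.LongRunning);
        }

        // Producer: stands in for the single thread consuming from AMQP.
        for (int i = 0; i < 10; i++)
            work.Add("message " + i);
        work.CompleteAdding();

        Task.WaitAll(workers);
    }
}
```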
I'm really loving the TPL. Simply calling Task.Factory.StartNew() and not worrying about anything is quite amazing.
But, is it possible to have multiple Factories running on the same thread?
Basically, I would like to have two different queues, executing different types of tasks.
One queue handles tasks of type A while the second queue handles tasks of type B.
If queue A has nothing to do, it should ignore tasks in queue B and vice versa.
Is this possible to do, without making my own queues, or running multiple threads for the factories?
To clarify what I want to do.
I read data from a network device. I want to do two things with this data, totally independent from each other.
I want to log to a database.
I want to send to another device over network.
Sometimes the database log will take a while, and I don't want the network send to be delayed because of this.
If you use .NET 4.0:
LimitedConcurrencyLevelTaskScheduler (with concurrency level of 1; see here)
If you use .NET 4.5:
ConcurrentExclusiveSchedulerPair (take only the exclusive scheduler out of the pair; see here)
Create two schedulers and pass them to the appropriate StartNew calls. Or create two TaskFactories with these schedulers and use them to create and start the tasks.
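A small .NET 4.5 sketch of the ConcurrentExclusiveSchedulerPair approach, with one serial queue for the database log and one for the network send (the work items here are placeholders):

```csharp
using System;
using System.Threading.Tasks;

// Two independent serial queues on the shared thread pool: each
// exclusive scheduler runs at most one task at a time, but the two
// queues never block each other.
class Program
{
    static void Main()
    {
        var dbQueue  = new ConcurrentExclusiveSchedulerPair().ExclusiveScheduler;
        var netQueue = new ConcurrentExclusiveSchedulerPair().ExclusiveScheduler;

        var dbFactory  = new TaskFactory(dbQueue);
        var netFactory = new TaskFactory(netQueue);

        // A slow database write does not delay the network send.
        var t1 = dbFactory.StartNew(() => Console.WriteLine("log to database"));
        var t2 = netFactory.StartNew(() => Console.WriteLine("send to device"));

        Task.WaitAll(t1, t2);
    }
}
```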
You can build your own thread pool using a queue of threads.
I would like to use an MSMQ queue to handle a lot of operations on XML data files. If I understand the technology properly, tasks are passed to the queue, where they are picked up by a handler. If there are many such tasks, the handler takes them one by one, so some pending tasks just lie in the queue waiting for the handler.
I also need to show the progress of handling the uploaded XML files on the website, as a percentage.
The question is how I can show such progress for pending tasks that haven't actually started being handled.
POST EDIT
The regular way of reflecting the progress of some task is for the client to ask the service for a percentage of completeness using a token the client generated earlier, and then simply render it on the site.
You can open a queue as bi-directional and let the handler pass an answer back to the sender.
MSMQ is meant to be used by a different process, which can even run on a different computer.
It is a way to offload long-running jobs from the current process to, for example, a service.
If that service is down, your client will not know about it; it shouldn't even care, as MSMQ "guarantees" the job will be done. Consider how useful tracking progress is in that case (besides observing that the service could be dead).
If you just want to do some simple async work, I suggest looking at the Task class and leaving MSMQ aside.
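If you do want progress reporting over MSMQ, one possible sketch is to attach a response queue to each task message and have the handler post percentage updates to it. The queue paths and names here are purely illustrative, and the receiver's MessageReadPropertyFilter must include the ResponseQueue property:

```csharp
using System;
using System.Messaging;

public static class ProgressExample
{
    // Sender: tag each task message with a queue for progress replies.
    public static void SendTask(string jobId)
    {
        using (var tasks = new MessageQueue(@".\private$\tasks"))
        using (var progress = new MessageQueue(@".\private$\progress"))
        {
            var msg = new Message(jobId)
            {
                Label = jobId,          // correlate replies with the job
                ResponseQueue = progress
            };
            tasks.Send(msg);
        }
    }

    // Handler: called periodically while processing the XML file.
    public static void ReportProgress(Message taskMsg, int percent)
    {
        if (taskMsg.ResponseQueue != null)
            taskMsg.ResponseQueue.Send(percent, taskMsg.Label);
    }
}
```

The website can then peek the progress queue (filtering by label) to display the latest percentage for a given job.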
Sometimes there is a lot that needs to be done when a given action is called, often more than what is needed to generate the next HTML for the user. To give the user a faster experience, I want to do only what is required to produce their next view and send it off, but still do the remaining work afterwards. How can I do this, multi-threading? Would I then need to worry about making sure different threads don't step on each other's feet? Is there any built-in functionality for this type of thing in ASP.NET MVC?
As others have mentioned, you can use a spawned thread to do this. I would take care to consider the 'criticality' of several edge cases:
If your background task encounters an error and fails to do what the user expected, do you have a mechanism for reporting this failure to the user?
Depending on how business-critical the various tasks are, using a robust/resilient message queue to store the background tasks to be processed will help protect against the scenario where the user requests some action, the responsible server crashes or is taken offline, or the IIS service is restarted, and the background thread never completes.
Just food for thought on other issues you might need to address.
How can I do this, multi-threading?
Yes!
Would I then need to worry about making sure different threads don't step on each others feet?
This is something you need to take care of anyway, since two different ASP.NET requests could arrive at the same time (from different clients) and be handled in two different worker threads simultaneously. So any code accessing shared data needs to be written in a thread-safe way anyway, even without your new feature.
Is there any built in functionality for this type of thing in ASP.NET MVC?
The standard .NET multi-threading techniques should work just fine here (manually starting threads, using the Task features, or using the Async CTP, ...).
It depends on what you want to do and how reliable you need it to be. If it is OK for the operations pending after the response was sent to be lost, then .NET async calls, the ThreadPool, or a new Thread will all work just fine. If the process crashes, the pending work is lost, but you already accepted that this can happen.
If the work requires any reliability guarantee, for instance if it performs updates to the site database, then you cannot use in-process .NET threading; you need to persist the request to do the work and then process that work even after a process restart (an app-pool recycle, as IIS so kindly calls it).
One way to do this is to use MSMQ. Another way is to use a database table as a queue. The most reliable way is to use the database activation mechanisms, as described in Asynchronous procedure execution.
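As a sketch of the "database table as a queue" option (the WorkQueue table, its columns, and the connection string are hypothetical; the READPAST hint lets concurrent workers skip rows another worker has locked, and OUTPUT returns the dequeued row atomically):

```csharp
using System;
using System.Data.SqlClient;

public static class TableQueue
{
    const string DequeueSql =
        @"DELETE TOP (1) FROM dbo.WorkQueue WITH (ROWLOCK, READPAST)
          OUTPUT DELETED.Id, DELETED.Payload;";

    // Returns (id, payload) of the next work item, or null if the
    // queue is empty. The delete and the read happen in one statement,
    // so a crashed worker never loses a row it hasn't dequeued.
    public static Tuple<int, string> TryDequeue(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(DequeueSql, conn))
        {
            conn.Open();
            using (var reader = cmd.ExecuteReader())
            {
                if (!reader.Read()) return null; // queue is empty
                return Tuple.Create(reader.GetInt32(0), reader.GetString(1));
            }
        }
    }
}
```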
You can start a background task and then return from the action. This example uses the Task Parallel Library, found in .NET 4.0:
public ActionResult DoSomething()
{
    Task t = new Task(() => DoSomethingAsynchronously());
    t.Start();
    return View();
}
I would use MSMQ for this kind of work. Rather than spawning threads in an ASP.NET application, I'd use an asynchronous, out-of-process approach. It's very simple and very clean.
In fact, I've been using MSMQ in ASP.NET applications for a very long time and have never had any issues with this approach. Furthermore, having a different process (that is, an executable in a different app domain) do the long-running work is ideal, since your web application is not tied up doing it. IIS, the thread pool, and your web application can continue to do what they need to while other processes handle the long-running tasks.
Maybe you should give it a try: Using an Asynchronous Controller in ASP.NET MVC