I would like to use an MSMQ queue to handle a large number of operations on XML data files. If I understand the technology correctly, tasks are passed to a queue, where a handler picks them up. If there are many tasks, the handler takes them one by one, so pending tasks just sit in the queue waiting for the handler.
I also need to show the progress of handling the uploaded XML files on a website, as a percentage.
The question is: how can I show the progress of pending tasks that haven't actually started being handled yet?
POST EDIT
The usual way of reflecting the progress of a task is to poll the service for a percentage of completeness using a token the client generated earlier, and then display it on the site.
You can open the queue as bi-directional and let the handler pass an answer back to the sender.
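A minimal sketch of that pattern with System.Messaging, assuming hypothetical queue paths (`.\private$\tasks` and `.\private$\progress`) and using the message label as the client's token:

```csharp
using System;
using System.Messaging;

// Sender: attach a response queue so the handler can report back.
// Queue paths are placeholders; adjust to your environment.
var taskQueue = new MessageQueue(@".\private$\tasks");
var progressQueue = new MessageQueue(@".\private$\progress");

var msg = new Message("job-payload")
{
    Label = Guid.NewGuid().ToString(),  // token the client keeps
    ResponseQueue = progressQueue       // where the handler should reply
};
taskQueue.Send(msg);

// Handler side: receive the task, then report progress (or the result)
// back through the queue the sender nominated.
Message received = taskQueue.Receive();
received.ResponseQueue?.Send(new Message("50")  // e.g. percent complete
{
    Label = received.Label  // same token, so the client can match it up
});
```

The client then reads the progress queue (filtering by its token) and writes the percentage to the page.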
MSMQ is meant to be used by a different process, which can even run on a different computer.
It is a way to offload long-running jobs from the current process, for example into a service.
If that service is down, your client will not know about it; it shouldn't even care, as MSMQ "guarantees" the job will be done. Consider how much use tracking progress is in that case (besides observing that the service may be dead).
If you just want to do some simple async work, I suggest looking at the Task class and leaving MSMQ out.
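If in-process async is enough, a minimal sketch with Task and IProgress&lt;T&gt; (the class and file count are illustrative):

```csharp
using System;
using System.Threading.Tasks;

class XmlImporter
{
    // Processes 'fileCount' files and reports percent complete
    // through the supplied progress callback.
    public static Task ProcessAsync(int fileCount, IProgress<int> progress)
    {
        return Task.Run(() =>
        {
            for (int i = 1; i <= fileCount; i++)
            {
                // ... handle one XML file here ...
                progress.Report(i * 100 / fileCount);
            }
        });
    }
}

// Usage: the callback fires as work advances; show the value on the page.
// await XmlImporter.ProcessAsync(20, new Progress<int>(p => Console.WriteLine($"{p}%")));
```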
Related
I am developing a TCP server that processes commands.
When a command arrives, I use Task.Run to handle the processing on a different thread so the requesting side isn't blocked. The problem is that the requesting side can make a large number of requests, causing the processing side to create lots of threads, which, based on my observation, can cripple it and leave it unable to process new requests.
Essentially, what I need is for a single requesting side to have a defined maximum of requests being processed, with all other requests queued for later execution.
So my question is: should I be looking at a custom TaskScheduler?
There is a Microsoft example at https://msdn.microsoft.com/en-us/library/ee789351(v=vs.100).aspx
I haven't started digging deeper, as I wanted to confirm that this is actually the road I need to go down.
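Before writing a custom scheduler, one simpler option worth considering is capping concurrency with SemaphoreSlim; a sketch (the limit of 4 and the method names are arbitrary):

```csharp
using System.Threading;
using System.Threading.Tasks;

class CommandProcessor
{
    // At most 4 commands run at once; excess requests wait on the
    // semaphore instead of each spawning busy work immediately.
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(4);

    public async Task HandleAsync(byte[] command)
    {
        await _gate.WaitAsync();
        try
        {
            await Task.Run(() => Process(command));
        }
        finally
        {
            _gate.Release();  // free a slot for the next queued request
        }
    }

    private void Process(byte[] command) { /* parse and execute */ }
}
```

The LimitedConcurrencyLevelTaskScheduler from the linked MSDN example achieves the same cap at the scheduler level instead.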
I found this related question, but my situation is a little different.
I have an ASP.NET application that produces long-running tasks which should be processed by a number of background processes (Windows Services). Most tasks are similar and can be handled by most task runners. Due to different versions of the client application (where the tasks are generated by users), some tasks can only be processed by task runners of a specific version. The web server has no knowledge of the kind of task; it just sends all tasks to the same queue using MSMQ.
When a task enters the queue, the next free task runner should receive it, decide whether it can handle this kind of task, remove it from the queue, and run it.
If the runner that received the message is not able to process this kind of task, it should put the message back in the queue so that another runner can have a look at it.
I tried to implement a conditional receive using a transaction that I can abort if the task has the wrong format:
transaction.Begin();
var msg = queue.Receive(TimeSpan.FromSeconds(1000), transaction);
if (CanHandle(msg))
{
    transaction.Commit();
    // handle
}
else
{
    // aborting returns the message to the queue
    transaction.Abort();
}
It seems to work, but I don't know if this is the preferable way to go.
Another problem with this solution: if there is no other free runner that can handle the message, I will receive it again and again.
Is there a way I can solve this problem using only MSMQ? The whole task data is already stored in a SQL database; the task runners access it over an HTTP API (that's why I rule out solutions like SQL Server Service Broker). The data sent to the message queue is only metadata used to identify the job.
If plain MSMQ is not the right tool, can I solve the problem using, for example, MassTransit (I don't like the fact that I'd have to install and run the additional MassTransit RuntimeServices + SQL database for it)? Other suggestions?
The way you are utilizing MSMQ really circumvents some of the fundamental features of the technology. If a queue message cannot be handled by every reader, you incur a sizable system performance penalty: many of your task-processing services will come back empty-handed when they ask for tasks. In an extreme scenario, imagine there were only one service that could perform task type "A". If that service went down while the first task in the queue was of type "A", your entire system would shut down.
I would suggest one of two approaches:
Utilize multiple queues, one per task version. Hide task retrieval behind an API or some other service. A task runner can request a task of one or more specific types, or even accept anything. The API is then in charge of figuring out which queue to pull from (i.e., map to a specific task type, pick one at random, do some round-robining, etc.).
Opt for a different storage technology than queueing. If you write good enough SQL, a relational database would be more than up to the task. You just have to take a lot of care not to incur deadlocks.
Can you create another queue? If yes, I would create multiple queues: a GenericTaskQ, which holds all incoming tasks, plus an xTaskQ and a yTaskQ. Your xTaskRunner picks tasks from the generic queue and, if it cannot process one, puts it in the yTaskQ (or whatever queue is appropriate). The same goes for yTaskRunner: if it can't handle the message, it puts it in the xTaskQ. Each runner should always look in its own queue first and, if nothing is there, go look in the generic queue.
If you cannot create multiple queues, use message (task) labels (which should be unique; we normally use a GUID) to remember which tasks have already been seen by a task runner and cannot be processed, and use Peek to check whether a message has already been addressed before actually receiving it.
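A sketch of the Peek-before-Receive idea, assuming the queue path is a placeholder and that labels carry a task-type tag (here "x-task") the runner understands:

```csharp
using System;
using System.Messaging;

var queue = new MessageQueue(@".\private$\tasks");  // path is a placeholder

// Peek inspects the front message without removing it, so a runner
// can skip work it cannot handle without disturbing the queue.
Message head = queue.Peek(TimeSpan.FromSeconds(5));
if (head.Label.StartsWith("x-task"))  // this runner handles "x-task"
{
    Message msg = queue.Receive();    // now actually take it off the queue
    // ... run the task ...
}
```

Note that Peek only sees the message at the front, so a non-matching head still blocks this runner until another one takes it.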
We wrote a service that uses ~200 threads.
The 200 threads must:
1- Download from the internet
2- Parse the raw data (HTML, XML, JSON, ...)
3- Store the newly created data in the db
With ~10 threads, the elapsed time for the second operation (parsing) is 50 ms (per thread).
With ~50 threads, the elapsed time for the second operation (parsing) is 80-18000 ms (per thread).
So we have an idea!
We can keep the downloads multithreaded, but use MSMQ to send the raw data to another process (a consumer), which implements the second part (parsing) single-threaded.
You might ask: why not use the C# Queue class in the same process? Because we could not protect our "precious parsing thread" from thread context switches. With 200 threads in the same process, it would be a context-switch victim.
Is using MSMQ for this requirement normal?
Yes, this is an excellent example of where MSMQ makes a lot of sense. You can offload your heavy work to a different process without affecting the performance of your current process, which clearly doesn't care about the results. Not only that, but if your worker process goes down, the queue preserves state, and messages (other than perhaps the one being worked on when it went down) will not be lost.
Depending on your needs and goals, I'd consider offloading the download to the other process as well, for example by passing the URLs to work on into the queue. Then scaling up your system is as easy as dialing up the number of queue receivers, since queue messages are received in a thread-safe manner when implemented correctly.
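A sketch of the split, assuming System.Messaging and a placeholder queue path; the downloader process enqueues raw documents, and a separate process parses them single-threaded:

```csharp
using System.Messaging;

// Producer process: many download threads push raw documents.
var producer = new MessageQueue(@".\private$\rawdata");  // placeholder path
producer.Send(new Message("<html>...</html>") { Label = "raw" });

// Consumer process: a single thread drains the queue, so parsing
// never competes with ~200 download threads for the CPU.
var consumer = new MessageQueue(@".\private$\rawdata")
{
    Formatter = new XmlMessageFormatter(new[] { typeof(string) })
};
while (true)
{
    Message msg = consumer.Receive();  // blocks until a message arrives
    string raw = (string)msg.Body;
    // ... parse and store the result in the database ...
}
```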
Yes, it is normal. And there are frameworks/libraries that help you build this kind of solution, providing more than just transport.
NServiceBus and MassTransit are examples (both can sit on top of MSMQ).
Does an HttpHandler listen for a disconnect from the browser?
My guess is "no", since it seems to be mostly/only used for dynamic file creation, so why would it?
But I can't find an answer in the docs or on Google.
Many thanks in advance!
Background
I'd like to "abort" an HttpHandler because I currently allow huge Excel exports (~150k SQL rows, so ~600k HTML lines). For reasons almost as ridiculous as the code, a query fires for every SQL row the user tries to export. As you can imagine, this takes a very long time.
I think I'm getting backed up with worker processes because users probably get frustrated with the lag and try again with a smaller result set. I currently flush the worker processes automatically every 30 minutes, but I'd rather clean up more quickly.
I don't have time to clean up the SQL right now, so I'd like to just listen for an "abort" from the client and kill the handler if it fires.
What you're hoping to accomplish by listening for a client connection drop won't really help solve your problem. The core of your problem is a long-running task being kicked off directly in an HttpHandler.
In this case, even if you could listen for a client disconnect, it would never be acted upon, because your code is too busy executing to listen for it.
The only way to properly report progress and perform actions during a long-running process like this is to make your code multi-threaded. The problem with doing this in ASP.NET for long-running processes is that they suck up threads from the thread pool needed to serve your pages. This can leave your website hanging or responding very slowly, as you've been experiencing.
I would recommend writing a Windows Service to handle these long-running jobs and having it drop the results into a staging directory. I would then use MSMQ or similar to hand the request off to the service for processing.
Ultimately, you want to get this long-running work outside of ASP.NET, where you can take advantage of the benefits multi-threading offers, such as the ability to report progress and to abort when needed.
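A sketch of that hand-off, with a hypothetical queue path and job token; the handler only enqueues the request and returns immediately:

```csharp
using System;
using System.Messaging;
using System.Web;

public class ExportHandler : IHttpHandler
{
    public bool IsReusable => true;

    public void ProcessRequest(HttpContext context)
    {
        // Enqueue the export request instead of running it inline;
        // a Windows Service drains the queue, runs the slow export,
        // and writes the finished file to a staging directory.
        string jobId = Guid.NewGuid().ToString();
        using (var queue = new MessageQueue(@".\private$\exports"))
        {
            queue.Send(new Message(context.Request.QueryString["query"])
            {
                Label = jobId
            });
        }
        // Return a token the client can poll for status / download.
        context.Response.Write(jobId);
    }
}
```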
Working on a C# project at the minute; the general idea of what I'm trying to do:
A user has one or more portfolios with symbols they are interested in.
Each portfolio downloads data on the symbols to CSV, parses it, then runs a set of rules on the data.
Alerts are generated based on the outcomes of these rules.
The portfolios should download the data etc. whenever a set interval has passed.
I was planning on having each portfolio run on its own thread; that way, when the interval has elapsed, the portfolios can download the data, parse it, and run the rules concurrently rather than one by one. The alerts should then be pushed onto another thread (!) that holds a queue of alerts; as it receives alerts, it sends them off to a client.
A bit much, but what would be the best way to go about this in C#: raw threads, or something like BackgroundWorker with the alert queue running on a separate thread?
Any advice is much appreciated; I'm new to some of this stuff, so feel free to tell me if I'm completely wrong :)
You can use a BlockingCollection. This was introduced in .NET 4.0 to support the producer-consumer scenario.
It also supports one consumer and multiple producers.
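A minimal producer-consumer sketch with BlockingCollection&lt;T&gt; (the alert text and the Console write standing in for the client push are placeholders):

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

var alerts = new BlockingCollection<string>();

// Each portfolio acts as a producer, adding alerts as rules fire.
Task.Run(() => alerts.Add("AAPL crossed its 200-day average"));
Task.Run(() => alerts.Add("MSFT volume spike"));

// A single consumer drains the collection and forwards each alert.
var consumer = Task.Run(() =>
{
    foreach (string alert in alerts.GetConsumingEnumerable())
        Console.WriteLine($"Sending alert: {alert}");  // stand-in for client push
});

// When no more alerts will ever be produced, unblock the consumer:
// alerts.CompleteAdding(); consumer.Wait();
```

GetConsumingEnumerable blocks while the collection is empty, so the consumer thread sleeps instead of spinning.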
If this is entirely interval-driven and you can upload alerts to the client at the end of each interval, you can use a simplified approach.
When your interval timer fires:
Use the TPL to launch simultaneous tasks to parse the data, giving each task a callback method to pass alerts back.
The callback writes alerts to a List protected by a lock statement.
Wait for all tasks to complete, then execute a continuation routine to forward the alerts to the client.
Restart the interval timer.
If you need to start processing the next interval before the upload is finished, you can use a double-buffered approach: write to one List, give it to the upload method, then start writing alerts to a new List, switching back and forth between the lists as you collect and upload alerts.
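The per-interval steps above can be sketched like this (portfolio names, the rules, and the Console write are placeholders):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var alerts = new List<string>();
var gate = new object();

// Fired by the interval timer: parse all portfolios concurrently.
var portfolios = new[] { "Tech", "Energy" };
var tasks = portfolios.Select(p => Task.Run(() =>
{
    // ... download data and run the rules for portfolio p ...
    lock (gate) { alerts.Add($"alert from {p}"); }  // callback under lock
}));

await Task.WhenAll(tasks);

// Continuation: forward the collected alerts, then restart the timer.
lock (gate)
{
    foreach (var a in alerts) Console.WriteLine(a);  // stand-in for upload
    alerts.Clear();
}
```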