I'm working on a C# project at the moment. Roughly, what I'm trying to do is:
- A user has one or more portfolios containing symbols they are interested in.
- Each portfolio downloads data on its symbols to CSV, parses it, then runs a set of rules on the data.
- Alerts are generated based on the outcomes of these rules.
- The portfolios should download the data etc. whenever a set interval has passed.
I was planning on having each portfolio run on its own thread, so that when the interval has elapsed, each portfolio can download, parse and run the rules on its data concurrently rather than one by one. The alerts would then be pushed onto another thread (!) that holds a queue of alerts; as it receives alerts, it sends them off to a client.
It's a bit much, but what would be the best way to go about this in C#: using threads, or something like BackgroundWorker, with just the alert queue running on a separate thread?
Any advice is much appreciated. I'm new to some of this stuff, so feel free to tell me if I'm completely wrong :)
You can use BlockingCollection<T>. It was added in .NET Framework 4.0 to support producer/consumer scenarios.
It also supports multiple producers feeding a single consumer.
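A minimal sketch of that producer/consumer setup, written as a top-level C# program (.NET 6 or later assumed): one task per portfolio adds alerts to a shared BlockingCollection<string>, and a single consumer task sends them on. The portfolio names and alert strings are placeholders, and adding to a List stands in for sending to the client.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical portfolio names; in the real program each producer would
// download, parse and run the rules before adding alerts.
string[] portfolios = { "Tech", "Energy", "Banks" };

// Shared buffer: many producers add, one consumer takes.
using var alerts = new BlockingCollection<string>(boundedCapacity: 100);
var sent = new List<string>(); // touched only by the consumer until it finishes

// Single consumer: blocks while the collection is empty and ends once
// CompleteAdding has been called and every queued alert has been taken.
Task consumer = Task.Run(() =>
{
    foreach (string alert in alerts.GetConsumingEnumerable())
        sent.Add(alert); // stand-in for "send to the client"
});

// One producer task per portfolio.
Task[] producers = portfolios
    .Select(name => Task.Run(() =>
    {
        for (int i = 0; i < 3; i++)
            alerts.Add($"{name}: alert {i}");
    }))
    .ToArray();

Task.WaitAll(producers);
alerts.CompleteAdding(); // no more alerts: lets the consumer loop finish
consumer.Wait();

Console.WriteLine($"Sent {sent.Count} alerts");
```

CompleteAdding is what lets GetConsumingEnumerable finish; without it the consumer would block forever waiting for more alerts.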
If this is entirely interval driven and you can upload alerts to the client at the end of each interval, then you can use a simplified approach:
When your interval timer fires:
Use the TPL to launch simultaneous tasks to parse the data, giving each task a callback method to pass alerts back to the main thread.
The callback writes alerts to a List protected by a lock statement.
Wait for all tasks to complete. Then execute a continuation routine to forward the alerts to the client.
Restart the interval timer.
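The interval-driven steps above could look roughly like this. ParsePortfolio, the portfolio names and the five-minute interval are hypothetical placeholders. The timer is created as one-shot (period Timeout.InfiniteTimeSpan) and re-armed only after an interval's work and upload have finished, so intervals never overlap.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var alerts = new List<string>();
var alertsLock = new object();
var uploaded = new List<string>();   // stand-in for what was sent to the client

// Callback handed to every task; the lock makes the shared List<T> safe.
void ReportAlert(string alert)
{
    lock (alertsLock) alerts.Add(alert);
}

// Hypothetical per-portfolio work: download, parse, run rules, report alerts.
void ParsePortfolio(string portfolio, Action<string> report)
{
    report($"{portfolio}: price above threshold");
}

Task RunIntervalAsync(string[] names)
{
    Task[] tasks = names
        .Select(p => Task.Run(() => ParsePortfolio(p, ReportAlert)))
        .ToArray();

    // The continuation runs once every parsing task has completed.
    return Task.WhenAll(tasks).ContinueWith(_ =>
    {
        lock (alertsLock)
        {
            uploaded.AddRange(alerts);   // stand-in for forwarding to the client
            alerts.Clear();
        }
    });
}

string[] portfolios = { "Tech", "Energy" };

// One-shot timer (period = InfiniteTimeSpan), re-armed only after an interval
// has finished, so a slow interval never overlaps the next one.
var interval = TimeSpan.FromMinutes(5);
Timer? timer = null;
timer = new Timer(async _ =>
{
    try { await RunIntervalAsync(portfolios); }
    finally { timer!.Change(interval, Timeout.InfiniteTimeSpan); }
}, null, interval, Timeout.InfiniteTimeSpan);

// Run one interval directly so the flow is visible without waiting 5 minutes.
await RunIntervalAsync(portfolios);
Console.WriteLine(string.Join(Environment.NewLine, uploaded));
```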
If you need to start processing the next interval before the upload is finished, you can use a double-buffered approach: write alerts to one List, hand that List to the upload method, then start writing alerts to the other List, switching back and forth between the two as you collect and upload alerts.
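A sketch of the double-buffered variant, assuming one EndIntervalAsync call per interval (calls never overlap) and a hypothetical UploadAsync that simulates the network call with Task.Delay. Before switching back to a list, it waits for that list's previous upload to finish, so a list is never written to and uploaded at the same time.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Two alert lists: producers write to buffers[active]; the other one is being
// uploaded (or is empty and waiting to become the write buffer again).
var buffers = new[] { new List<string>(), new List<string>() };
int active = 0;
var bufferLock = new object();
var uploaded = new List<string>();   // stand-in for what the client has received
Task pendingUpload = Task.CompletedTask;

void ReportAlert(string alert)
{
    lock (bufferLock) buffers[active].Add(alert);
}

async Task UploadAsync(List<string> batch)
{
    await Task.Delay(50);        // simulated network call (hypothetical)
    uploaded.AddRange(batch);    // safe: at most one upload runs at a time
    batch.Clear();               // empty again, ready to become the write buffer
}

// Called once when an interval ends.
async Task EndIntervalAsync()
{
    await pendingUpload;         // the other list must be uploaded before reuse
    List<string> full;
    lock (bufferLock)
    {
        full = buffers[active];
        active = 1 - active;     // new alerts now go to the other list
    }
    pendingUpload = UploadAsync(full);
}

ReportAlert("a1");
ReportAlert("a2");
await EndIntervalAsync();
ReportAlert("b1");               // collected while "a1", "a2" may still be uploading
await EndIntervalAsync();
await pendingUpload;
Console.WriteLine(string.Join(", ", uploaded)); // a1, a2, b1
```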
We wrote a service that uses ~200 threads.
Each of the 200 threads must:
1- Download from the internet
2- Parse the raw data (HTML, XML, JSON...)
3- Store the newly created data in the DB
With ~10 threads, the elapsed time for the second operation (parsing) is 50 ms per thread.
With ~50 threads, the elapsed time for the second operation (parsing) is 80-18000 ms per thread.
So we had an idea!
We can keep downloading documents on multiple threads, but use MSMQ to send the raw data to another process (the consumer), which implements the second part (parsing) on a single thread.
You might ask why we don't use the C# Queue class in the same process. The reason is that we couldn't protect our "precious" parsing thread from context switches: with 200 threads in the same process, it becomes a context-switch victim.
Is using MSMQ normal for this requirement?
Yes, this is an excellent example of where MSMQ makes a lot of sense. You can offload your difficult work to a different process to handle without affecting the performance of your current process which clearly doesn't care about the results. Not only that, but if your new worker process goes down, the queue will preserve state and messages (other than maybe the one being worked on when it went down) will not be lost.
Depending on your needs and goals, I'd consider offloading the download to the other process as well, for example by passing the URLs to work on through the queue. Then scaling up your system is as easy as dialing up the queue receivers, since queue messages are received in a thread-safe manner when implemented correctly.
Yes, it is normal. And there are frameworks/libraries that help you build these kinds of solutions, providing more than just the transport.
NServiceBus and MassTransit are examples (both can sit on top of MSMQ).
I would like to use an MSMQ queue to handle a lot of operations on XML data files. If I understand the technology correctly, tasks are put on the queue, where a handler picks them up. If there are a lot of tasks, the handler takes them one by one, so some tasks just sit in the queue waiting for the handler.
I also need to show the progress of processing the uploaded XML files on the website, as a percentage.
The question is: how can I show progress for pending tasks that haven't actually started being processed?
POST EDIT
The usual way to reflect the progress of a task is for the client to ask the service for the percentage complete, using a token generated beforehand, and then just display it on the site.
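One way to make the token idea cover jobs that are still waiting in the queue, sketched in memory: the website records each token when it submits a job, the handler marks it as started and reports percentages, and a pending job reports how many jobs are ahead of it instead of a percentage. The status store, method names and message texts are all hypothetical. With MSMQ the handler and the website are separate processes, so in practice this state would have to live somewhere both can reach, such as a database table.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical status store shared by the queue handler and the website.
var gate = new object();
var waiting = new List<Guid>();                 // tokens not yet picked up, in queue order
var percentDone = new Dictionary<Guid, int>();  // tokens the handler has started

// Website: enqueue a job and give the token back to the browser.
Guid Submit()
{
    var token = Guid.NewGuid();
    lock (gate) waiting.Add(token);
    return token;
}

// Handler: called when it dequeues a job and as it makes progress.
void Start(Guid token)
{
    lock (gate) { waiting.Remove(token); percentDone[token] = 0; }
}

void Report(Guid token, int percent)
{
    lock (gate) percentDone[token] = percent;
}

// Website: polled by the browser with its token.
string GetProgress(Guid token)
{
    lock (gate)
    {
        int ahead = waiting.IndexOf(token);
        if (ahead >= 0) return $"waiting, {ahead} job(s) ahead";
        return percentDone.TryGetValue(token, out int p) ? $"{p}% done" : "unknown token";
    }
}

Guid a = Submit(), b = Submit(), c = Submit();
Start(a);
Report(a, 40);
Console.WriteLine(GetProgress(a)); // 40% done
Console.WriteLine(GetProgress(c)); // waiting, 1 job(s) ahead
```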
You can open a queue as bi-directional and let the handler pass an answer back to the sender.
MSMQ is meant to be used by a different process, which can even run on a different computer.
It is a way to offload long-running jobs from the current process to, for example, a service.
If that service is down, your client will not know about it, and it shouldn't even care, since MSMQ "guarantees" the job will be done. So how useful is tracking progress in that case (besides noticing that the service might be dead)?
If you just want to do some simple async work, I suggest looking at the Task class and leaving MSMQ out of it.
My Silverlight application fetches file sets from a web service (asynchronously). The web service method accepts an array of file names and returns the set of files (also as an array). The Silverlight client makes several such requests for file sets.
The client issues many requests to the web service at once. I need a BackgroundWorker thread on the client to process the received file sets one after the other.
How can I collect the file sets as they are received and hand them to the BackgroundWorker thread one at a time?
EDIT:
I can't run multiple BackgroundWorkers because the file set processing module is not thread-safe.
Use a BlockingCollection / ConcurrentQueue to hold the information about the file sets to be processed. In the BackgroundWorker you just have a while loop that dequeues the next file set and processes it. These collections are thread-safe and fast; ConcurrentQueue in particular is implemented largely lock-free.
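A sketch of that suggestion, assuming a runtime that includes System.Collections.Concurrent (.NET Framework 4 or later, or .NET Core / .NET 5+). The web service completions are simulated with Task.Run, and the BackgroundWorker's DoWork loop is the only code that touches the file sets, so the non-thread-safe processing always runs on one thread. The file names are placeholders.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.ComponentModel;
using System.Linq;
using System.Threading.Tasks;

var fileSets = new BlockingCollection<string[]>();
var processed = new List<string>();   // touched only by the worker thread
var done = new TaskCompletionSource<bool>();

var worker = new BackgroundWorker();
worker.DoWork += (_, _) =>
{
    // The only thread that calls the non-thread-safe processing code.
    foreach (string[] set in fileSets.GetConsumingEnumerable())
        processed.AddRange(set);      // stand-in for the real file-set processing
};
worker.RunWorkerCompleted += (_, e) => done.SetResult(e.Error == null);
worker.RunWorkerAsync();

// Simulated web service completions arriving concurrently, in any order.
Task[] requests = Enumerable.Range(1, 4)
    .Select(i => Task.Run(() => fileSets.Add(new[] { $"set{i}/a.txt", $"set{i}/b.txt" })))
    .ToArray();

await Task.WhenAll(requests);
fileSets.CompleteAdding();            // lets the worker's loop finish
bool ok = await done.Task;
Console.WriteLine($"Processed {processed.Count} files, success = {ok}");
```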
BackgroundWorker has no built-in listen mechanism. It is meant to perform one long action and then terminate.
One solution could be to start one BackgroundWorker per file set.
If the file sets must be processed one at a time, you could instead push each request into a queue (basically an array; make sure you synchronize access to it). Whenever your BackgroundWorker finishes processing a file set, it reports to the main thread (via the ProgressChanged event, IIRC) and loops over any further requests in the array. When the array is empty, the worker exits.
Pay attention, though: if the worker exits while a new request is being queued, you'll have a problem. That's why a plain thread may prove more robust than a BackgroundWorker, especially if you can't know whether there will be more file sets to process. It all depends on your workflow.
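The race described above goes away if the worker's "is the queue empty?" check and the enqueuer's "is a worker running?" check happen under the same lock. A sketch of that idea, using Task.Run in place of a BackgroundWorker and a plain Queue<string> of hypothetical file-set names:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var gate = new object();
var queue = new Queue<string>();      // pending file sets, guarded by gate
bool workerRunning = false;
Task workerTask = Task.CompletedTask;
var processed = new List<string>();   // only ever touched by the single active worker

// Called whenever a file set arrives (from any thread).
void Enqueue(string fileSet)
{
    lock (gate)
    {
        queue.Enqueue(fileSet);
        if (workerRunning) return;    // the running worker will pick it up
        workerRunning = true;
        workerTask = Task.Run(Drain); // start exactly one worker
    }
}

void Drain()
{
    while (true)
    {
        string next;
        lock (gate)
        {
            if (queue.Count == 0)
            {
                // Checked and cleared under the same lock Enqueue uses, so an item
                // can never be added after this check without a new worker starting.
                workerRunning = false;
                return;
            }
            next = queue.Dequeue();
        }
        processed.Add(next);          // stand-in for the non-thread-safe processing
    }
}

Parallel.For(0, 10, i => Enqueue($"set{i}"));

Task last;
lock (gate) last = workerTask;        // the most recently started worker
await last;
Console.WriteLine($"Processed {processed.Count} file sets");
```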
I have a program that does the following:
1- Call a web service (there are many calls to the same web service)
2- Process the result of 1
3- Insert the result of 2 into a DB
So I think it would be better to use some multithreading. I think I can do it like this:
one thread is the master (let's call it A)
it creates some threads that call the web service (let's call them W)
when W has some results, it sends them to A (or A detects that W has some)
A sends the results to some computing threads (let's call them C)
when C has some results, it sends them to A (or A detects that C has some)
A sends the results to some database threads (let's call them D)
So sometimes C or D will wait for work to do.
With this technique I'll be able to set the thread number for each task.
Can you tell me how I can do that, and whether there is a pattern for it?
EDIT: I wrote "some" instead of "a" because I'll create many threads for the time-consuming steps, and maybe only one for the fastest.
It sounds to me like you could use the producer/consumer pattern. With .NET 4 this has become really simple to implement. Start a number of Tasks and use the BlockingCollection<T> as a buffer between the tasks. Check out this post for details.
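A sketch of that producer/consumer pipeline for the W -> C -> D stages in the question: each stage is a configurable number of tasks reading from one BlockingCollection and writing to the next, so no master thread A is needed. The payload format, the "processing" (parsing a number and multiplying it) and the ConcurrentBag standing in for the database are all hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

// Buffers between the stages (W -> C -> D in the question's terms).
var requests = new BlockingCollection<int>();
var downloaded = new BlockingCollection<string>(boundedCapacity: 50);
var computed = new BlockingCollection<int>(boundedCapacity: 50);
var saved = new ConcurrentBag<int>();   // stand-in for the database

// Start `count` workers draining `input` into `output`; once all of them
// have finished, mark `output` complete so the next stage can end too.
Task Stage<TIn, TOut>(int count, BlockingCollection<TIn> input,
                      BlockingCollection<TOut> output, Func<TIn, TOut> work)
{
    Task[] workers = Enumerable.Range(0, count)
        .Select(_ => Task.Run(() =>
        {
            foreach (TIn item in input.GetConsumingEnumerable())
                output.Add(work(item));
        }))
        .ToArray();
    return Task.WhenAll(workers).ContinueWith(_ => output.CompleteAdding());
}

for (int i = 1; i <= 20; i++) requests.Add(i);
requests.CompleteAdding();

// 4 "web service" workers, 2 "compute" workers, 1 "database" worker.
Task w = Stage(4, requests, downloaded, id => $"payload-{id}");
Task c = Stage(2, downloaded, computed, p => int.Parse(p.Substring("payload-".Length)) * 10);
Task d = Task.Run(() =>
{
    foreach (int row in computed.GetConsumingEnumerable()) saved.Add(row);
});

await Task.WhenAll(w, c, d);
Console.WriteLine($"Saved {saved.Count} rows");
```

Bounding the collections makes a fast stage block instead of piling up unlimited work in front of a slow one, and the per-stage worker count is the "set the thread number for each task" knob the question asks for.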
In .NET you have a thread pool.
When a work item queued to the pool (for example with ThreadPool.QueueUserWorkItem or Task.Run) finishes, its thread is not destroyed; it goes back into the pool and is reused for the next work item. Threads you create yourself with new Thread(...) do not come from the pool.
If pool threads sit idle for a long time, the pool retires them.
I would start two timers, which will fire their event handlers on separate ThreadPool threads. The event handler for the first timer will check the web service for data, write it to a Queue<T> if it finds some, and then go back to sleep until the next interval.
The event handler for the second timer reads the queue and updates the database, then sleeps until its next interval. Both event handlers should wrap access to the queue in a lock to manage concurrency.
Separate timers with independent intervals will let you decouple when data is available from how long it takes to insert it into the database, with the queue acting as a buffer. Since generic queues can grow dynamically, you get some breathing room even if the database is unavailable for a time. The second event handler could even spawn multiple threads to insert multiple records simultaneously or to mirrored databases. The event handlers can also post updates to a log file or user interface to help you monitor activity.
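A sketch of the two-timer design with System.Threading.Timer and a lock-protected Queue<string>. The web service call and the database are hypothetical stand-ins, and the intervals (100 ms and 250 ms) are shortened so the demo finishes in about a second.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var pending = new Queue<string>();
var queueLock = new object();
var database = new List<string>();   // stand-in for the real database
int polls = 0;

// Timer 1: poll the web service and buffer whatever comes back.
void PollWebService(object? state)
{
    int n = Interlocked.Increment(ref polls);
    string result = $"record {n}";   // hypothetical web service result
    lock (queueLock) pending.Enqueue(result);
}

// Timer 2: drain the buffer into the database on its own schedule.
void FlushToDatabase(object? state)
{
    List<string> batch;
    lock (queueLock)
    {
        batch = new List<string>(pending);
        pending.Clear();
    }
    // Insert outside the queue lock so polling is never blocked by a slow database.
    lock (database) database.AddRange(batch);
}

var pollTimer = new Timer(PollWebService, null, TimeSpan.Zero, TimeSpan.FromMilliseconds(100));
var flushTimer = new Timer(FlushToDatabase, null, TimeSpan.FromMilliseconds(250), TimeSpan.FromMilliseconds(250));
await Task.Delay(1000);

// Stop both timers (DisposeAsync waits for running callbacks), then flush the rest.
await pollTimer.DisposeAsync();
await flushTimer.DisposeAsync();
FlushToDatabase(null);
Console.WriteLine($"{database.Count} records stored after {polls} polls");
```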
In my web application there is a process that queries data from all over the web, filters it, and saves it to the database. As you can imagine, this process takes some time. My current solution is to increase the page timeout and show the user an AJAX progress bar while it loads. This is a problem for two reasons: 1) it still takes too long and the user must wait, and 2) it sometimes still times out.
I've dabbled in threading the process and have read that I should post it asynchronously to a web service ("fire and forget").
Some references I've read:
- MSDN
- Fire and Forget
So my question is - what is the best method?
UPDATE: After the user inputs their data I would like to redirect them to the results page that incrementally updates as the process is running in the background.
To avoid excessive architecture astronomy, I often use a hidden iframe to call the long running process and stream back progress information. Coupled with something like jsProgressBarHandler, you can pretty easily create great out-of-band progress indication for longer tasks where a generic progress animation doesn't cut it.
In your specific situation, you may want to use one LongRunningProcess.aspx call per task, to avoid those page timeouts.
For example, call LongRunningProcess.aspx?taskID=1 to kick it off, and then at the end of that task, emit a script that sets document.location = "LongRunningProcess.aspx?taskID=2".
And so on, ad nauseam.
We had a similar issue and solved it by starting the work via an asynchronous web service call (which meant that the user did not have to wait for the work to finish). The web service then started a SQL job, which performed the work and periodically updated a table with the status of the work. We provided a UI that allowed the user to query the table.
I ran into this exact problem at my last job. The best way I found was to fire off an asynchronous process and notify the user when it's done (by email or some other means). Making them wait that long is going to be problematic because of timeouts and wasted productivity. Having them watch a progress bar can also give them a false sense that closing the browser will cancel the process, which may not be the case depending on how you set up the system.
How are you querying the remote data?
How often does it change?
Are the results something that could be cached for a period of time?
How long a period of time are we actually talking about here?
The 'best method' is likely to depend in some way on the answers to these questions...
You can create another thread and store a reference to it in session or application state, depending on whether the thread should run only once per website or once per user session.
You can then redirect the user to a page where they can monitor the thread's progress. You can set the page to refresh automatically, or display a refresh button.
Upon completion of the thread, you can send an email to the user.
My solution to this has been an out-of-band service that runs these queries and caches the results in the DB.
When someone asks for something for the first time, they have to wait a bit before it shows up, but if they refresh, it's immediate. And because it's now in the DB, it becomes part of the hourly update for the next 24 hours after the last request.
Add the job, with its relevant parameters, to a job queue table. Then, write a windows service that will pick up these jobs and process them, save the results to an appropriate location, and email the requester with a link to the results. It is also a nice touch to give some sort of a UI so the user can check the status of their job(s).
This approach is much better than launching a separate thread or increasing the timeout, especially if your application is large and needs to scale, since you can simply add more servers to process jobs if necessary.
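A sketch of the job-queue pattern, with an in-memory dictionary standing in for the job table (hypothetical; a real service would claim rows in the database with an atomic UPDATE or equivalent) and a List<string> standing in for the emails. Two workers drain the same table to show that claiming a job and marking it Running under one lock keeps two servers from processing the same job.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical in-memory stand-in for the job queue table.
var tableLock = new object();
var jobs = new Dictionary<int, (string Parameters, string Status, string Result)>();
var emails = new List<string>();   // stand-in for emailing the requester
int nextId = 0;

// Web app: record the request and return immediately.
int AddJob(string parameters)
{
    lock (tableLock)
    {
        int id = ++nextId;
        jobs[id] = (parameters, "Pending", "");
        return id;
    }
}

// Worker: atomically flip the oldest Pending job to Running so that two
// workers (or two servers) can never pick up the same job.
int? ClaimNextJob()
{
    lock (tableLock)
    {
        var pending = jobs.Where(j => j.Value.Status == "Pending").Select(j => j.Key).ToList();
        if (pending.Count == 0) return null;
        int id = pending.Min();
        jobs[id] = (jobs[id].Parameters, "Running", "");
        return id;
    }
}

// The service's main loop: keep claiming jobs until none are left.
void RunWorker(string workerName)
{
    while (ClaimNextJob() is int id)
    {
        string parameters;
        lock (tableLock) parameters = jobs[id].Parameters;
        string result = $"results for {parameters}";   // hypothetical long-running work
        lock (tableLock)
        {
            jobs[id] = (parameters, "Done", result);
            emails.Add($"{workerName} -> job {id}: {result}");
        }
    }
}

// Status page: what the user sees while waiting.
string GetStatus(int id)
{
    lock (tableLock) return jobs[id].Status;
}

for (int i = 0; i < 10; i++) AddJob($"query {i}");
await Task.WhenAll(Task.Run(() => RunWorker("server-1")), Task.Run(() => RunWorker("server-2")));
Console.WriteLine(string.Join(", ", jobs.Keys.Select(GetStatus).Distinct())); // Done
```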