I want to build a windows service that will use a remote encoding service (like encoding.com, zencoder, etc.) to upload video files for encoding, download them after the encoding process is complete, and process them.
In order to do that, I was thinking about having different queues: one for files currently waiting, one for files being uploaded, one for files waiting for encoding to complete, and one more for downloading them. Each queue has a limit; for example, only 5 files can be uploaded for encoding at any one time. The queues have to be visible and able to recover from a crash. Currently we do that by writing the queue to an SQL table and managing the number of items in a separate table.
I also want the queues to run in the background, independent of each other, but able to transfer files from one queue to another as the process goes on.
My biggest question is how to build and manage the queues, rather than how to limit the number of items in each one.
I am not sure what is the right approach for this and would really appreciate any help.
Thanks!
You probably don't need to separate the work into separate queues, as long as they are logically separated in some way (tagged with different "job types" or such).
As I see it, the challenge is to not pick up and process more than a given limited number of jobs from the queue, based on the type of job. I had a somewhat similar issue a while ago which led to a question here on SO, and a subsequent blog post with my solution, both of which might give you some ideas.
In short, my solution was to keep a list of "tokens". Whenever I want to perform a job that has some sort of limitation, I first pick up a token. If no tokens are available, I need to wait for one to become available. You can then use whatever queueing mechanism is suitable to handle the queue itself.
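The token idea can be sketched with a counting semaphore. This is only a minimal illustration, not the code from the blog post mentioned above: the TokenPool class and its members are hypothetical names, and it assumes .NET 4.5 or later for SemaphoreSlim.WaitAsync.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: hands out a fixed number of "tokens" for one job type.
public sealed class TokenPool
{
    private readonly SemaphoreSlim _tokens;

    public TokenPool(int maxConcurrent)
    {
        // Start with every token available.
        _tokens = new SemaphoreSlim(maxConcurrent, maxConcurrent);
    }

    // Number of tokens that are currently free.
    public int Available
    {
        get { return _tokens.CurrentCount; }
    }

    // Waits for a free token, runs the job, and always hands the token back.
    public async Task RunAsync(Func<Task> job)
    {
        await _tokens.WaitAsync();
        try
        {
            await job();
        }
        finally
        {
            _tokens.Release();
        }
    }
}
```

With one pool per job type (for example, a pool of 5 for uploads), any number of jobs can be submitted through RunAsync while at most 5 uploads run at the same time.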
There are various ways to approach this, and which one suits your case depends on the reliability and resilience you need versus the development and maintenance cost. You need to answer questions such as: if the server crashes, is it important to carry on with what you were doing?
The queues can be implemented in MSMQ, in SQL Server, or simply in code with all queues held in memory. For the workflow you can use Windows Workflow Foundation, or implement it yourself, which would probably be easier at first but harder to change later.
So if you give a few more hints, I should be able to help you better.
Related
I've read a good bit about threading with C#, but to be upfront I haven't done anything in production using it.
I have an application that has to process a bunch of documents and then send the documents via email. This may take 60 seconds to accomplish. I don't want the user of my web application to have to wait for these things to process to move on to other parts of the site.
On a button click the SendEmail function is called. What can I do to this code to make it so that my users can continue browsing the site without discontinuing the processing I need to do within the EmailPDFs function?
[Authorize]
public ActionResult SendEmail(decimal? id, decimal? id2)
{
EmailPDFs(..., ..., ...);
}
Thanks so much!
This is really the kind of thing that message queues are designed to handle. Fire off a message, and a process on a potentially separate server picks it up and processes it. When it's done, it sends a message back to a queue on your server, where a process on your server picks it up and notifies you that it's complete. You then notify your user that the work is finished.
Modern message queue systems can be backed by databases (such as Mongo, MySql, or SQL Server), and are extremely robust. The great thing about them is that they allow you to move long-running or CPU-intensive processes off onto other servers so that your web site remains nice and snappy.
You could try to add multi-threading and parallelism to your web application, by using TaskFactory and all that other stuff (for many folks, this is the route they take), but it doesn't make it very easy to separate your application if you need to, and break those big, resource-hogging pieces off if it becomes necessary.
I urge you to consider a queue-based solution.
Update:
For samples and information on how to implement this type of solution, see the following:
Reliable Messaging with MSMQ and .NET on MSDN
C#: A Message Queuing Service Application on MSDN
Also, consider glancing at this StackOverflow question for a quick crash course on the bare minimum amount of code required.
A final note: MSMQ is built into certain flavors of Windows, and can be added to it through the Add/Remove Programs feature of the Control Panel. However, how you install it will depend on your specific flavor and version of Windows. A simple Google search will help you to find the appropriate instructions.
Good luck!
I need to download certain files using FTP. It is already implemented without threads, but it takes too much time to download all the files.
So I need to use some threads to speed up the process.
My code is like this:
foreach (string str1 in files)
{
    download_FTP(str1);
}
I referred to this, but I don't want every file to be queued at once; say, for example, 5 files at a time.
If the process is too slow, it means most likely that the network/Internet connection is the bottleneck. In that case, downloading the files in parallel won't significantly increase the performance.
It might be another story though if you are downloading from different servers. We may then imagine that some of the servers are slower than others. In that case, parallel downloads would increase the overall performance since the program would download files from other servers while being busy with slow downloads.
EDIT: OK, we have more info from you: Single server, many small files.
Downloading multiple files involves some overhead. You can decrease this overhead by somehow grouping the files (tar, zip, whatever) on server-side. Of course, this may not be possible. If your app would talk to a web server, I'd advise to create a zip file on the fly server-side according to the list of files transmitted in the request. But you are on an FTP server so I'll assume you have nearly no flexibility server-side.
Downloading several files in parallel will probably increase the throughput in your case. Be very careful, though, about restrictions set by the server, such as the maximum number of simultaneous connections. Also, keep in mind that if you have many simultaneous users, you'll end up with a large number of connections on the server: users x threads. That may prove counter-productive, depending on the scalability of the server.
A commonly accepted rule of good behaviour is to limit yourself to at most 2 simultaneous connections per user. YMMV.
Okay, as you're not using .NET 4 that makes it slightly harder - the Task Parallel Library would make it really easy to create five threads reading from a producer/consumer queue. However, it still won't be too hard.
Create a Queue<string> with all the files you want to download
Create 5 threads, each of which has a reference to the queue
Make each thread loop, taking an item off the queue and downloading it, or finishing if the queue is empty
Note that as Queue<T> isn't thread-safe, you'll need to lock to make sure that only one thread tries to fetch an item from the queue at a time:
string fileToDownload = null;
lock (padlock)
{
    if (queue.Count == 0)
    {
        return; // Done
    }
    fileToDownload = queue.Dequeue();
}
As noted elsewhere, threading may not speed things up at all - it depends where the bottleneck is. If the bottleneck is the user's network connection, you won't be able to get more data down the same size of pipe just by using multi-threading. On the other hand, if you have a lot of small files to download from different hosts, then it may be latency rather than bandwidth which is the problem, in which case threading will help.
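Putting the three steps and the locking snippet together, a complete worker pool might look like the sketch below. It avoids the Task Parallel Library, as the answer assumes. The ParallelDownloader class, the DownloadAll method and the download callback are hypothetical names standing in for your own code (for example, a wrapper around download_FTP).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class ParallelDownloader
{
    // Runs `download` once for every file, using `workerCount` threads.
    // Blocks until the queue is empty and every worker has finished.
    public static void DownloadAll(IEnumerable<string> files, int workerCount, Action<string> download)
    {
        Queue<string> queue = new Queue<string>(files);
        object padlock = new object();
        List<Thread> workers = new List<Thread>();

        for (int i = 0; i < workerCount; i++)
        {
            Thread worker = new Thread(() =>
            {
                while (true)
                {
                    string fileToDownload;
                    lock (padlock)
                    {
                        if (queue.Count == 0)
                        {
                            return; // Done
                        }
                        fileToDownload = queue.Dequeue();
                    }
                    // Download outside the lock so the workers overlap.
                    download(fileToDownload);
                }
            });
            workers.Add(worker);
            worker.Start();
        }

        // Wait for all workers before returning to the caller.
        foreach (Thread worker in workers)
        {
            worker.Join();
        }
    }
}
```

A call such as ParallelDownloader.DownloadAll(files, 5, download_FTP) would then keep at most five downloads in flight. Note that an exception thrown by the callback is not handled in this sketch, so real code should catch and record failures inside the loop.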
Look up ParameterizedThreadStart:
List<System.Threading.Thread> threadsToUse = new List<System.Threading.Thread>();
foreach (string str1 in files)
{
    // download_FTP must accept a single object parameter to match ParameterizedThreadStart.
    System.Threading.Thread aThread = new System.Threading.Thread(
        new System.Threading.ParameterizedThreadStart(download_FTP));
    threadsToUse.Add(aThread);
    // Start runs download_FTP(str1) on the new thread; calling Invoke on the
    // delegate would run it synchronously on the current thread instead.
    aThread.Start(str1);
}
I remember something about Thread.Join as well: it blocks the calling thread until the given thread has finished, so you can call it on each thread to wait for all the downloads to complete.
There is also something else you might want to look up, which I'm still trying to fully grasp: AsyncThreads. With these you will know when the file has been downloaded. With a normal thread you will have to find another way to flag that it has finished.
This may or may not help your speed. On the one hand, if your line speed is low, it won't help you much. On the other hand, some servers cap the speed of each connection, in which case opening multiple connections to the server should in theory give a slight increase in speed. How much of an increase, though, I cannot say.
Hope this helps in some way
I can add some experience to the comments already posted. In an app some years ago I had to generate a treeview of files on an FTP server. Listing files does not normally require actual downloading, but some of the files were zipped folders and I had to download these and unzip them (sometimes recursively) to display the files/folders inside. For a multithreaded solution, this required a 'FolderClass' for each folder that could keep state and so handle both unzipped and zipped folders. To start the operation off, one of these was set up with the root folder and submitted to a P-C queue and a pool of threads. As the folder was LISTed and iterated, more FolderClass instances were submitted to the queue for each subfolder. When a FolderClass instance reached the end of its LIST, it PostMessaged itself (it was not C#, for which you would need BeginInvoke or the like) to the UI thread, where its info was added to the listview.
This activity was characterised by a lot of latency-sensitive TCP connect/disconnect with occasional download/unzip.
A pool of, IIRC, 4-6 threads (as already suggested by other posters) provided the best performance on the single-core system I had at the time and, in this particular case, was much faster than a single-threaded solution. I can't remember the figures exactly, but no stopwatch was needed to detect the performance boost - something like 3-4 times faster. On a modern box with multiple cores, where LISTs and unzips could occur concurrently, I would expect even more improvement.
There were some problems - the visual ListView component could not keep up with the incoming messages (because of the multiple threads, data arrived for apparently 'random' positions on the treeview and so required continual tree navigation for display), and so the UI tended to freeze during the operation. Another problem was detecting when the operation had actually finished. These snags are probably not relevant to your download-many-small-files app.
Conclusion - I expect that downloading a lot of small files is going to be faster if multithreaded with multiple connections, if only from mitigating the connect/disconnect latency which can be larger than the actual data download time. In the extreme case of a satellite connection with high speed but very high latency, a large thread pool would provide a massive speedup.
Note the valid caveats from the other posters - if the server, (or its admin), disallows or gets annoyed at the multiple connections, you may get no boost, limited bandwidth or a nasty email from the admin!
Rgds,
Martin
As part of my constant learning curve into what you can do to make apps scale better, I am currently trying to get a direction to go with queuing, i.e. job queuing or workload processing whichever phrase you like.
In the distant past I used IBM MQ/Series - it worked for a financial app but quite heavy if I remember.
I know of MSMQ, and I have also heard of quite a few others.
But first, here is my context
I have a C#/.NET back-end web app which serves data etc to a Javascript (mostly jQuery etc) front-end via AJAX calls etc. I have a situation where a certain action involves uploading some files, setting up a few record entries in the database, emailing some users etc. So of course I don't want to make this process "online"/"real-time" due to the possible time delay and I am sure the overheads on the webserver/database etc.
So given the type of "messages" that I need to queue and process, what would be (I shouldn't just say easy here I guess!) a good starting point? Should I run with MSMQ and/or the SQL 2008 Service Broker stuff, or something like ZeroMQ - or should I simply create my own lightweight workload queue service?
I realise again without seeing the full picture it is hard to make full recommendations, however any start points gratefully received!
David
Don't try to make your own, please! There are so many things to take into account that you will spend more time on it than the rest of your project most probably.
I'd say go for MSMQ, it's very easy to use with WCF, the queues are transactional, have a retry mechanism, etc, and you benefit from the MSMQ UI to see the messages, move them and so on.
In an application I'm creating, I've got two components that I want the user to be able to pause/resume. I'm wondering what standard patterns might exist to support pausing and resuming, if any? Both components do a lot of network I/O. It seems like, at a high level, I have to persist the current queue of work that each component has - but persisting it is where I'm looking for these standard patterns? Do I serialize the component itself? Do I serialize just the work? What format do I serialize to (xml, database, etc...)? What does .NET have built in that might help? Are there any libraries to help with this? Are there any differences to consider if the user just pauses/resumes within the same app session or if they pause/resume after opening, closing and then opening the application again? What about persisting this information across different computers?
Any suggestions from past experience or patterns that come to mind? I hope this turns into more of discussion of the various ways of doing this and the pros/cons of each. Thanks.
By message queue I meant MSMQ or one of its brethren. All messages would be persisted in some sort of database and therefore still available when the app restarts. The primary purpose of such queues is to ensure that messages get delivered even when communication is intermittent and/or unreliable.
It sounds like you could have your communication components take work from MSMQ instead of your current queues pretty easily.
If that doesn't fit your application, it is probably as simple as serializing the objects in your existing queues on termination, and de-serializing them again at application start up. If surviving unexpected termination is important you should always serialize an object as it is added to the work queue, but at that point you may want to look again at an existing message queue system.
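As a rough illustration of serializing work as it is added, here is a minimal file-backed queue. It is a sketch under several assumptions: PersistentWorkQueue is a hypothetical class, each work item is a single-line string, and the whole file is rewritten on every change, which is simple but neither fast nor atomic.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical work queue that saves its contents to a file on every change,
// so pending work survives both a normal restart and an unexpected termination.
public sealed class PersistentWorkQueue
{
    private readonly string _path;
    private readonly Queue<string> _items;
    private readonly object _sync = new object();

    public PersistentWorkQueue(string path)
    {
        _path = path;
        // Reload whatever was still pending when the application last stopped.
        _items = File.Exists(path)
            ? new Queue<string>(File.ReadAllLines(path))
            : new Queue<string>();
    }

    public int Count
    {
        get { lock (_sync) { return _items.Count; } }
    }

    public void Enqueue(string workItem)
    {
        lock (_sync)
        {
            _items.Enqueue(workItem);
            Save();
        }
    }

    public bool TryDequeue(out string workItem)
    {
        lock (_sync)
        {
            if (_items.Count == 0)
            {
                workItem = null;
                return false;
            }
            workItem = _items.Dequeue();
            Save();
            return true;
        }
    }

    // Assumes one work item per line, i.e. items contain no newlines.
    private void Save()
    {
        File.WriteAllLines(_path, _items.ToArray());
    }
}
```

An item is removed from the file as soon as it is dequeued, so a crash in the middle of processing loses that item. Handling that case properly (acknowledgements, retries) is the point at which an existing message queue system becomes attractive.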
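You could implement threading and simply call the Suspend() and Resume() methods on the thread accordingly. Be aware that both methods are marked obsolete in .NET, because suspending a thread at an arbitrary point can leave locks held.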
I am creating a mass mailer application, where a web application sets up an email template and then queues a bunch of email addresses for sending. The other side will be a Windows service (or exe) that will poll this queue, picking up the messages for sending.
My question is, what would the advantage be of using SQL Service Broker (or MSMQ) over just creating my own custom queue table?
Everything I'm reading is suggesting I use Service Broker, but I really don't see what the huge advantage over a flat table (that would be a lot simpler to work with for me). For reference the application will be used to send 50,000-100,000 emails almost daily.
Do you know how to implement a queue over a flat table? This is not a silly question; implementing a queue over a table correctly is much harder than it sounds. Queue-like tables are notoriously deadlock prone, and you need to carefully consider the table design and the enqueue and dequeue operations. Also, do you know how to scale your polling of the table? And how are you going to handle retries and timeouts (i.e. what timers are used for)?
I'm not saying you should use SSB. The learning curve is very steep, and it is primarily a distributed application platform, not a local queueing product, so some features, like dialogs, will actually be obstacles for you rather than advantages. I'm just saying that you must also consider the difficulties of flat-table queues. If you have never implemented a flat-table queue then be warned: there are many dragons under that bridge.
50k-100k messages per day is nothing; it is only about one message per second. If you want 100k per minute, then we have something to talk about.
If you ever need to port to another vendor's database, you will have fewer problems if you used normal tables.
As you seem to have only one reader and one writer for your queue, I would tend to use a standard table until you hit a problem. However, if you start to feel the need to use “locking hints” etc., that is the time to switch to Service Broker queues.
I would not use MSMQ if both the sender and the reader need a database connection to work. MSMQ would be good if the sender did not talk to the database at all, as it lets the sender keep working when the database is down. However, having to set up and maintain both MSMQ and the database is likely to be more work than it is worth for most systems.
For advantages of Service Broker see this link:
http://msdn.microsoft.com/en-us/library/ms166063.aspx
In general we try to use a tool or standard functionality rather than building things ourselves. This lowers the cost and can make upgrading easier.
I know this is an old question, but it is sufficiently abstract to stay relevant for a long time.
After using both paradigms I would suggest a flat table. It is surprisingly scalable and nifty. The correct hints need to be used.
Once the application goes distributed, or starts using multiple Always On groups with different RW and RO servers, Service Broker (or some other method of distributed communication) becomes a necessity.
Flat table
needs only a few hints (highly dependent on isolation level) to work scalably and reliably in the consumer (READPAST, UPDLOCK, ROWLOCK)
the order of message processing is not set in stone
the consumer must make sure that the message stays in the queue if the processing fails
needs some polling mechanism (job, CDC (here lies madness :)), external application...)
turn off maintenance jobs and automatic statistics for the table
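For reference, a consumer built on the hints listed above might look like the following sketch. The table dbo.EmailQueue, its Id and Payload columns, and the FlatTableQueue class are all hypothetical; the code assumes SQL Server, an already open connection, and the default READ COMMITTED isolation level.

```csharp
using System;
using System.Data;

public static class FlatTableQueue
{
    // Deletes the oldest row that no other consumer has locked and returns it.
    // READPAST skips rows locked by other consumers instead of blocking on them.
    public const string DequeueSql = @"
WITH next AS (
    SELECT TOP (1) Id, Payload
    FROM dbo.EmailQueue WITH (READPAST, UPDLOCK, ROWLOCK)
    ORDER BY Id
)
DELETE FROM next
OUTPUT deleted.Id, deleted.Payload;";

    // Returns false when no unlocked message is available.
    // If `handle` throws, the transaction is rolled back and the row stays in the queue.
    public static bool TryProcessNext(IDbConnection connection, Action<string> handle)
    {
        using (IDbTransaction tx = connection.BeginTransaction())
        using (IDbCommand cmd = connection.CreateCommand())
        {
            cmd.Transaction = tx;
            cmd.CommandText = DequeueSql;

            string payload;
            using (IDataReader reader = cmd.ExecuteReader())
            {
                if (!reader.Read())
                {
                    return false; // queue is empty, or every row is locked
                }
                payload = reader.GetString(1);
            }

            handle(payload);
            tx.Commit(); // only now is the message really removed
            return true;
        }
    }
}
```

Because rows locked by other consumers are skipped, several consumers can run side by side, which is also why the order of message processing is not guaranteed. The polling loop around TryProcessNext is left out here.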
Service broker
needs extremely overblown "infrastructure" (message types, contracts, services, queues, activation procedures, must be enabled after each server restart, conversations need to be correctly created and dropped...)
is extremely opaque - we have spent ages trying to make it run after it mysteriously stopped working
there is a predefined order of message processing
the tables it uses can cause deadlocks themselves if SB is overused
is the only way (except for linked servers...) to send messages directly from database on RW server of one HA group to a database that is RO in this HA group (without any external app)
is the only way to send messages between different servers (linked servers are a big NONO (unless they become an YESYES - you know the drill - it depends)) (without any external app)