Imagine I want to have a small network of worker drones, possibly on separate threads, possibly in separate processes, or even on different PCs. The work items are created by a central program.
I'm looking for an existing product or service that will do this all for me. I know that there is MSMQ and also MQSeries. MQSeries is too expensive. MSMQ is notoriously unreliable. A database backed system would be fine, but I don't want to own/manage/write it. I want to use someone else's work queue system.
Related Articles:
Here is a similar question, but it's advocating building a custom queue mechanism.
The queue that I like a lot is this one from Google App Engine.
http://www.codeproject.com/KB/library/DotNetMQ.aspx
If you follow some guidelines you can use a database as a queue store with good success, see Using tables as Queues.
SQL Server comes with its own built-in message queuing, namely Service Broker. It allows you to avoid many of the MSMQ pitfalls when it comes to scalability, reliability and high availability and disaster recovery scenarios.
Service Broker is fully integrated in the database (no external store, one consistent backup/restore, one unit of failover, no need for expensive two-phase-commit DTC between message store and database, one single T-SQL API to access and program both the messages and your data) and also has some nice unique features such as transactional messaging with guaranteed Exactly-Once-In-Order delivery, correlated message locking, internal activation etc.
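For a feel of the programming model, here is a minimal sketch of sending and receiving a Service Broker message from .NET. The service, contract, queue, and message type names are all hypothetical, and the broker objects are assumed to already exist in the database.

```csharp
// Minimal Service Broker send/receive from .NET; all broker object names
// are hypothetical and assumed to already exist in the database.
using System.Data.SqlClient;

class BrokerQueue
{
    public static void Send(string connectionString, string payloadXml)
    {
        const string sendSql = @"
DECLARE @h UNIQUEIDENTIFIER;
BEGIN DIALOG CONVERSATION @h
    FROM SERVICE [//Work/Initiator]
    TO SERVICE   '//Work/Target'
    ON CONTRACT  [//Work/Contract]
    WITH ENCRYPTION = OFF;
SEND ON CONVERSATION @h
    MESSAGE TYPE [//Work/Item] (@payload);";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(sendSql, conn))
        {
            cmd.Parameters.AddWithValue("@payload", payloadXml);
            conn.Open();
            cmd.ExecuteNonQuery(); // the send commits or rolls back with your data
        }
    }

    public static string Receive(string connectionString)
    {
        const string receiveSql = @"
DECLARE @msg NVARCHAR(MAX);
WAITFOR (
    RECEIVE TOP (1) @msg = CAST(message_body AS NVARCHAR(MAX))
    FROM dbo.TargetQueue
), TIMEOUT 5000;
SELECT @msg;";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(receiveSql, conn))
        {
            conn.Open();
            return cmd.ExecuteScalar() as string; // null on timeout
        }
    }
}
```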
I have used RabbitMQ in the past for a pet project; you could add that to your list of queue systems.
As for a framework to wrap the queues, you could take a look at http://www.nservicebus.com/; we have done a couple of basic projects here at work with it. And here's a quick example to get started: http://meisinger2.wordpress.com/2009/11/09/nservicebus-fifteen-minutes/
I have successfully used MassTransit in the past. It supports using MSMQ as well as RabbitMQ.
I have a scenario where about 10 different messages will need to be enqueued and then dequeued/processed. One subscriber will need all 10 messages, but another will only need 8 of the 10. I am trying to understand the best way to set up this type of architecture. Do you create a queue for each message type so the subscribers can just subscribe to the relevant queues, or do you dump them all into the same queue and have each subscriber ignore the messages that are not relevant to it? I want to ensure the solution is flexible/scalable, etc.
Process:
10 different XML messages will be enqueued to an IBM WebSphere MQ server.
We will use .NET (most likely WCF, since WebSphere MQ 7.1 has added WCF support).
We will dequeue the messages and load them into another backend DB (most likely SQL Server).
The solution needs to scale well because we will be processing a very large number of messages, and this could grow (probably 40-50,000/hr). At least a large amount for us.
As always greatly appreciate the info.
--S
Creating queues is relatively 'cheap' from a resource perspective, and it's better to use a queue for each specific purpose, so in this case it's probably better to separate them by target client if possible. Pulling messages from a queue selectively based on some criterion (correlation ID or some other thing) is usually a bad idea. The best-performing scenario in messaging is the most straightforward one: simply pull messages from the queue as they arrive, rather than peeking and receiving selectively.
As to scaling, I can't speak for Websphere MQ or other IBM products, but 40-50K messages per hour isn't particularly hard for MSMQ on Windows Server to handle, so I'd assume IBM can do that as well. Usually the bottleneck isn't the queuing platform itself but rather the process of dequeuing and processing individual messages.
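To illustrate the straightforward pattern (a dedicated queue per consumer, plain blocking receive, no selection), here is a minimal sketch using System.Messaging/MSMQ, since that maps directly to the numbers above; the queue path and payload type are hypothetical, and the same shape applies to WebSphere MQ.

```csharp
// One dedicated queue per consumer: just block on Receive as messages arrive.
// Queue path and payload type are hypothetical.
using System;
using System.Messaging;

class ConsumerA
{
    static void Main()
    {
        using (var queue = new MessageQueue(@".\private$\consumer-a"))
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
            while (true)
            {
                Message msg = queue.Receive();      // blocks until a message arrives
                string xml = (string)msg.Body;      // no peeking, no selectors --
                Process(xml);                       // routing already happened upstream
            }
        }
    }

    static void Process(string xml) => Console.WriteLine("Processing: " + xml);
}
```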
OK, based on the comments, here's a suggestion that will scale and doesn't require much change to the apps.
On the producer side, I'd copy the message selection criteria into a message property and then publish the message to a topic. The only change required to the app here is the message property. If for some reason you don't want to make it publish using the native functionality, you can define an alias over a topic. The app thinks it is sending messages, but they are really publications.
On the consumer side you have a couple of choices. One is to create administrative subscriptions for each app and use a selector in the subscription. The messages are then funneled to a dedicated queue per consumer, based on the selection criteria. The apps think that they are simply consuming messages.
Alternatively the app can simply subscribe to the topic. This gives you the option of a dynamic subscription that doesn't receive messages when the app is disconnected (if in fact you wanted that) or a durable subscription that is functionally equivalent to the administrative subscription.
This solution will easily scale to the volumes you cited. Another option is that the producer doesn't use properties. Here, the consumer application consumes all messages, breaks open the message payload on each and decides whether to process or ignore the message. In this solution the producer is still publishing to a topic. Any solution involving straight queueing forces the producer to know all the destinations. Add another consumer, change the producer. Also, there's a PUT for each destination.
The worst case is a producer putting multiple messages and a consumer having to read each one to decide if it's going to be ignored. That option might have problems scaling, depending on how deep in the payload the selection criteria field lies. Really long XPath expression = poor performance and no way to tune WMQ to make up for it since the latency is all in the application at that point.
Best case, producer sets a message property and publishes. Consumers select on property in their subscription or an administrative subscription does this for them. Whether this solution uses application subscriptions or administrative subscriptions doesn't make any difference as far as scalability is concerned.
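As a sketch of that best case (property set by the producer, selector on the consumer's subscription), here is roughly what it looks like with IBM's XMS .NET API. The topic string, property name, and connection details are hypothetical, and the exact XMS constants and method signatures should be checked against the IBM documentation.

```csharp
// Producer sets a message property; consumers subscribe with a selector.
// Topic, property, and connection details are hypothetical.
using IBM.XMS;

class PropertyPubSub
{
    static void Main()
    {
        XMSFactoryFactory ff = XMSFactoryFactory.GetInstance(XMSC.CT_WMQ);
        IConnectionFactory cf = ff.CreateConnectionFactory();
        cf.SetStringProperty(XMSC.WMQ_HOST_NAME, "mqhost");
        cf.SetStringProperty(XMSC.WMQ_QUEUE_MANAGER, "QM1");

        IConnection conn = cf.CreateConnection();
        ISession session = conn.CreateSession(false, AcknowledgeMode.AutoAcknowledge);
        IDestination topic = session.CreateTopic("topic://business/events");

        // Producer side: one PUT, regardless of how many consumers exist.
        IMessageProducer producer = session.CreateProducer(topic);
        ITextMessage msg = session.CreateTextMessage("<order>...</order>");
        msg.SetStringProperty("MsgType", "Order");  // the selection criterion
        producer.Send(msg);

        // Consumer side: the subscriber that wants only 8 of the 10 types
        // filters with a selector; the other subscriber omits the selector.
        IMessageConsumer consumer =
            session.CreateConsumer(topic, "MsgType IN ('Order', 'Invoice')");
        conn.Start();
        IMessage received = consumer.Receive(5000);

        conn.Close();
    }
}
```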
I'm trying to design a system which reports activity events to a database via a web service. The web service and database have already been built (COTS software) - all I have to do is provide the event source.
The catch, though, is that the event source needs to be fault tolerant. We have multiple replicated databases that I can talk to, so if the web service or database I'm talking to goes down, the software can quickly switch to another one that's up.
What I need help with though is the case when all the databases are down. I've already designed a queue that will hold on to the events as they pile in (and burst them out once the connection is restored), but the queue is an in-memory structure: if my app crashes in this state, or if power is lost, etc., then all the events in the queue are lost. This is unacceptable. What I need is a way to persist the events so that when a database comes back online I can send a burst of queued-up events, even in the event of power loss or crash.
I know that I don't want to re-implement the queue itself to use the file system as a backing store. This would work (and I've tried it) - but that method slows the system down dramatically as the hard drive becomes a bottleneck. Aside from this though, I can't think of a single way to design this system such that all the events are safely stored on the hard drive only when access to the database isn't available.
Does anyone have any ideas? =)
When I need messaging with fault tolerance (and/or guaranteed delivery, which based on your description I am guessing you also need), I usually turn to MSMQ. It provides both fault tolerance (messages are stored on disk in case of machine restart) and guaranteed delivery (messages will automatically and continually be resent until they are received), as well as transactional sends and receives, message journaling, poison message handling, and other features.
I have been able to achieve a throughput of several thousand messages per second using MSMQ. Frankly, I am not sure that you will get too much better than that while still being fault tolerant.
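For reference, here is a minimal sketch of a transactional MSMQ send and receive with System.Messaging; the queue path is hypothetical. An aborted transaction puts the message back on the queue, which is what gives you the retry behavior described above.

```csharp
// Durable, transactional enqueue/dequeue with MSMQ. Queue path is hypothetical.
using System;
using System.Messaging;

class DurableEventQueue
{
    const string Path = @".\private$\activity-events";

    public static void Enqueue(string eventXml)
    {
        if (!MessageQueue.Exists(Path))
            MessageQueue.Create(Path, transactional: true);

        using (var queue = new MessageQueue(Path))
            // Transactional sends are written to disk, surviving crash or power loss.
            queue.Send(eventXml, MessageQueueTransactionType.Single);
    }

    public static string Dequeue(Action<string> forwardToWebService)
    {
        using (var queue = new MessageQueue(Path))
        using (var tx = new MessageQueueTransaction())
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });
            tx.Begin();
            try
            {
                var msg = queue.Receive(TimeSpan.FromSeconds(5), tx);
                var body = (string)msg.Body;
                forwardToWebService(body); // if this throws, Abort() re-queues the message
                tx.Commit();
                return body;
            }
            catch
            {
                tx.Abort();
                throw;
            }
        }
    }
}
```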
MSMQ. I think you could also take a look at the notion of a Job object.
I would agree with the guys that it's better to use an out-of-the-box system like MSMQ with a set of messaging patterns in hand.
Anyway, if you have to do it yourself, you can use an in-memory database instead of serializing the data yourself; I believe it should be fast enough.
I am creating a mass mailer application, where a web application sets up an email template and then queues a bunch of email addresses for sending. The other side will be a Windows service (or exe) that will poll this queue, picking up the messages for sending.
My question is, what would the advantage be of using SQL Service Broker (or MSMQ) over just creating my own custom queue table?
Everything I'm reading suggests I use Service Broker, but I really don't see a huge advantage over a flat table (which would be a lot simpler for me to work with). For reference, the application will be used to send 50,000-100,000 emails almost daily.
Do you know how to implement a queue over a flat table? This is not a silly question; implementing a queue over a table correctly is much harder than it sounds. Queue-like tables are notoriously deadlock prone, and you need to carefully consider the table design and the enqueue and dequeue operations. Also, do you know how to scale your polling of the table? And how are you going to handle retries and timeouts (i.e., what timers are used for)?
I'm not saying you should use SSB. The learning curve is very steep, and it is primarily a distributed application platform, not a local queueing product, so some features, like dialogs, will actually be obstacles for you rather than advantages. I'm just saying that you must also consider the difficulties of flat-table queues. If you've never implemented a flat-table queue, be warned: there are many dragons under that bridge.
50k-100k messages per day is nothing; that's only about one message per second. If you want 100k per minute, then we have something to talk about.
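For what it's worth, here is one shape of a correct dequeue over a table: a single atomic DELETE with OUTPUT, using READPAST so competing readers skip locked rows instead of deadlocking. Table and column names are hypothetical, and a real design still needs a suitable clustered index and retry/timeout logic.

```csharp
// One correct shape for a table-backed dequeue: atomic DELETE ... OUTPUT with
// READPAST so concurrent readers skip locked rows. Names are hypothetical.
using System.Data.SqlClient;

class EmailQueueTable
{
    public static string TryDequeue(string connectionString)
    {
        const string dequeueSql = @"
DELETE TOP (1)
FROM dbo.EmailQueue WITH (ROWLOCK, READPAST)
OUTPUT DELETED.Payload;";

        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(dequeueSql, conn))
        {
            conn.Open();
            return cmd.ExecuteScalar() as string; // null when the queue is empty
        }
    }
}
```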
If you ever need to port to another vendor's database, you will have fewer problems if you used normal tables.
As you seem to only have one reader and one writer for your queue, I would tend to use a standard table until you hit problems. However, if you start to feel the need to use “locking hints” etc., that's the time to switch to Service Broker queues.
I would not use MSMQ if both the sender and the reader need a database connection to work. MSMQ would be good if the sender did not talk to the database at all, as it lets the sender keep working when the database is down. However, having to set up and maintain both MSMQ and the database is likely to be more work than it is worth for most systems.
For advantages of Service Broker see this link:
http://msdn.microsoft.com/en-us/library/ms166063.aspx
In general we try to use a tool or standard functionality rather than building things ourselves. This lowers the cost and can make upgrading easier.
I know this is an old question, but it is sufficiently abstract to stay relevant for a long time.
After using both paradigms, I would suggest the flat table. It is surprisingly scalable and nifty. Correct hints need to be used.
Once the application goes distributed, or starts using multiple AlwaysOn availability groups with different RW and RO servers, Service Broker (or any other method of distributed communication) becomes a necessity.
Flat table
needs only a few hints (highly dependent on isolation level) to work scalably and reliably in the consumer (READPAST, UPDLOCK, ROWLOCK); see the sketch after this list
the order of message processing is not set in stone
the consumer must make sure that the message stays in the queue if the processing fails
needs some polling mechanism (job, CDC (here lies madness :)), external application...)
turn off maintenance jobs and automatic statistics for the table
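A sketch of the consumer loop the list above describes, assuming a hypothetical dbo.WorkQueue table: lock one row with the listed hints inside a transaction, and delete it only after processing succeeds, so a failure rolls back and the message stays queued.

```csharp
// Consumer using the hints above: UPDLOCK/READPAST/ROWLOCK inside a transaction.
// On failure the transaction rolls back and the row stays in the queue.
// Table and schema are hypothetical.
using System;
using System.Data.SqlClient;

class FlatTableConsumer
{
    public static void ProcessOne(string connectionString, Action<string> process)
    {
        using (var conn = new SqlConnection(connectionString))
        {
            conn.Open();
            using (var tx = conn.BeginTransaction())
            {
                const string pickSql = @"
SELECT TOP (1) Id, Payload
FROM dbo.WorkQueue WITH (UPDLOCK, READPAST, ROWLOCK);";

                long id;
                string payload;
                using (var pick = new SqlCommand(pickSql, conn, tx))
                using (var reader = pick.ExecuteReader())
                {
                    if (!reader.Read()) { tx.Rollback(); return; } // queue empty
                    id = reader.GetInt64(0);
                    payload = reader.GetString(1);
                }

                process(payload); // a throw here rolls the transaction back,
                                  // leaving the message in the queue

                using (var del = new SqlCommand(
                    "DELETE FROM dbo.WorkQueue WHERE Id = @id;", conn, tx))
                {
                    del.Parameters.AddWithValue("@id", id);
                    del.ExecuteNonQuery();
                }
                tx.Commit();
            }
        }
    }
}
```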
Service broker
needs extremely overblown "infrastructure" (message types, contracts, services, queues, activation procedures, must be enabled after each server restart, conversations need to be correctly created and dropped...); a minimal setup is sketched after this list
is extremely opaque - we have spent ages trying to make it run after it mysteriously stopped working
there is a predefined order of message processing
the tables it uses can cause deadlocks themselves if SB is overused
is the only way (except for linked servers...) to send messages directly from a database on the RW server of one HA group to a database that is RO in that HA group (without any external app)
is the only way to send messages between different servers (linked servers are a big NONO (unless they become a YESYES - you know the drill - it depends)) (without any external app)
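To make the "infrastructure" point concrete, this is roughly the minimum object set one direction of Service Broker messaging requires; all names are hypothetical, and you might run this once per database from a deployment step. Compare it with the single CREATE TABLE a flat-table queue needs.

```csharp
// Roughly the minimum object set for one direction of Service Broker messaging.
// All names are hypothetical; run once per database from a deployment step.
using System.Data.SqlClient;

class BrokerSetup
{
    const string SetupSql = @"
CREATE MESSAGE TYPE [//App/Msg] VALIDATION = WELL_FORMED_XML;
CREATE CONTRACT [//App/Contract] ([//App/Msg] SENT BY INITIATOR);
CREATE QUEUE dbo.InitiatorQueue;
CREATE QUEUE dbo.TargetQueue;
CREATE SERVICE [//App/Initiator] ON QUEUE dbo.InitiatorQueue;
CREATE SERVICE [//App/Target]    ON QUEUE dbo.TargetQueue ([//App/Contract]);
-- still missing: an activation procedure, ENABLE_BROKER on the database,
-- and code that begins and ends conversations correctly.";

    public static void Deploy(string connectionString)
    {
        using (var conn = new SqlConnection(connectionString))
        using (var cmd = new SqlCommand(SetupSql, conn))
        {
            conn.Open();
            cmd.ExecuteNonQuery();
        }
    }
}
```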
Say I need to design an in-memory service because of a very high load read/write system. I want to dump the results of the objects every 2 minutes. How would I access the in-memory objects/data from within a web application?
(I was thinking a Windows service would be running in the background handling the in-memory service etc.)
I want the fastest possible solution, and I would guess most people would say use a web service? What other options would I have? I just don't understand how I could hook into the Windows service's objects etc.
(Please don't ask why I would want to do this, maybe you're right and it's a bad idea but I am also curious if this type of architecture is possible.)
Update
I was looking at the site swoopo.com, which I would think gets a lot of hits near the end of auctions. Since the auction keeps resetting, the hits to the database would be just crazy, so I was thinking that if they did it in memory and then dumped to the DB every x minutes...
What you're describing is called a cache, with a facade front-end.
You write a facade to which you commit your changes and acquire your datasets. The facade queues up reads and writes and commits when the queue is full or after a certain amount of time has passed. Your web application has a single point of access to the data (the facade), and the facade is structured in such a way to avoid writing and reading from storage too often.
Most relational database management systems do this for you. They do this kind of optimization and queuing internally so writing another layer on top of it would only slow things down. So don't write a cache if you're using an RDBMS.
Regarding the specifics of accessing such a facade, you can treat it as just an object, and implement it however you want (its own thread, a thread pool, a Web service, a Windows service, whatever).
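A minimal sketch of such a facade, assuming the queued writes can be batched into a single flush delegate (e.g. one bulk insert); all names here are hypothetical.

```csharp
// Write-behind facade: writes accumulate in memory and are flushed to storage
// when the queue is full or a timer fires. Names are hypothetical.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

class WriteBehindFacade : IDisposable
{
    private readonly ConcurrentQueue<string> _pending = new ConcurrentQueue<string>();
    private readonly Timer _timer;
    private readonly int _maxQueued;
    private readonly Action<IReadOnlyList<string>> _flush;

    public WriteBehindFacade(Action<IReadOnlyList<string>> flush,
                             TimeSpan interval, int maxQueued = 1000)
    {
        _flush = flush;
        _maxQueued = maxQueued;
        _timer = new Timer(_ => Flush(), null, interval, interval);
    }

    public void Write(string record)
    {
        _pending.Enqueue(record);
        if (_pending.Count >= _maxQueued) Flush();   // flush early when full
    }

    private void Flush()
    {
        var batch = new List<string>();
        while (batch.Count < _maxQueued && _pending.TryDequeue(out var item))
            batch.Add(item);
        if (batch.Count > 0) _flush(batch);          // e.g. one bulk insert
    }

    public void Dispose()
    {
        _timer.Dispose();
        Flush();                                     // drain on shutdown
    }
}
```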
Any remoting technology would work, such as sockets, pipes, and the like.
Check out: www.remobjects.com
You could use Windows Message Queues, a service bus, or even .NET Remoting.
See http://www.nservicebus.com/, or http://code.google.com/p/masstransit/.
You could hook into the Windows service's objects by using Remoting or WCF; both offer very fast interprocess communication. Sockets are fast too, but are more cumbersome to program compared to WCF. There is a ton of WCF documentation and support online.
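As a sketch of the WCF route: a named-pipe service hosted inside the Windows service, with the web application talking to it through a channel. The contract, addresses, and the in-memory store stand-in are all hypothetical.

```csharp
// WCF over named pipes between a Windows service and a web app.
// Contract, addresses, and the in-memory store are hypothetical.
using System;
using System.Collections.Concurrent;
using System.ServiceModel;

[ServiceContract]
public interface IAuctionState
{
    [OperationContract]
    decimal GetCurrentBid(int auctionId);
}

// Hosted inside the Windows service, next to the in-memory data.
public class AuctionStateService : IAuctionState
{
    // Stand-in for the service's real in-memory objects.
    private static readonly ConcurrentDictionary<int, decimal> Bids =
        new ConcurrentDictionary<int, decimal>();

    public decimal GetCurrentBid(int auctionId) => Bids.GetOrAdd(auctionId, 0m);
}

class Host
{
    static void Main()
    {
        using (var host = new ServiceHost(typeof(AuctionStateService),
                   new Uri("net.pipe://localhost/auctions")))
        {
            host.AddServiceEndpoint(typeof(IAuctionState),
                new NetNamedPipeBinding(), "state");
            host.Open();
            Console.ReadLine(); // keep the service alive
        }
    }
}

// In the web application: open a channel and call the service.
class Client
{
    public static decimal Query(int auctionId)
    {
        var factory = new ChannelFactory<IAuctionState>(
            new NetNamedPipeBinding(),
            new EndpointAddress("net.pipe://localhost/auctions/state"));
        IAuctionState proxy = factory.CreateChannel();
        return proxy.GetCurrentBid(auctionId);
    }
}
```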
Databases provide a level of caching for you. The advantage of an in-memory golden copy such as the one you propose is that it never has to read from disk when a request comes in, and if you host it on the same machine as your IIS (provided you have enough RAM for both) there is no extra network hop, making it much faster than querying a DB. However, the downside to this approach is that it does not scale as well if you need to add machines to load balance.
Third party messaging providers such as TIBCO are also worth looking at.
I have a CRUD WinForms app that uses merge replication to allow "disconnected" functionality. My question is: if I am doing all initializing and synchronizing programmatically with RMO (like HERE), does it matter if it is a push or a pull?
What would be a difference?
I understand the differences between the two (see HERE), but it seems that if I am only interacting through RMO, the differences become a little fuzzy. If I can, it seems that even though pull is favored for merge replication, I would want to use push so that the server bears the brunt and management is easier.
Also, due to our environment, I do not need "real-time" updates. Syncing, in either case, will be fired from a UI event.
Does anyone have any experience with this?
Thanks!
We use merge replication via RMO on 20+ client systems that are occasionally connected. As far as I know, you should go with pull subscriptions. I don't know if you could make it work with push subscriptions but I don't advise trying. As you say, the client system will be requesting the sync, which fits the definition of a pull subscription.
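A sketch of firing the sync from a UI event with RMO (Microsoft.SqlServer.Replication); the server, database, and publication names are hypothetical, and security settings are elided, so check the RMO docs for the details your environment needs.

```csharp
// Fire a merge pull sync from a UI event using RMO. Server, database, and
// publication names are hypothetical; security settings are elided.
using Microsoft.SqlServer.Management.Common;
using Microsoft.SqlServer.Replication;

class SyncButtonHandler
{
    public static void Synchronize()
    {
        var conn = new ServerConnection(@"CLIENT-PC\SQLEXPRESS"); // the subscriber
        conn.Connect();
        try
        {
            var sub = new MergePullSubscription
            {
                ConnectionContext = conn,
                DatabaseName      = "FieldDb",     // subscription database
                PublisherName     = "HQ-SQL",
                PublicationDBName = "FieldPubDb",
                PublicationName   = "FieldPub",
            };

            if (sub.LoadProperties())                    // subscription must already exist
                sub.SynchronizationAgent.Synchronize();  // runs the merge agent in-process
        }
        finally
        {
            conn.Disconnect();
        }
    }
}
```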
The "Use When" section in your second link is pretty clear in its recommendation for push in this case:
Data will typically be synchronized on demand or on a schedule rather than continuously.
The publication has a large number of Subscribers, and/or it would be too resource-intensive to run all the agents at the Distributor.
Subscribers are autonomous, disconnected, and/or mobile. Subscribers will determine when they will connect and synchronize changes.
Most often used with merge replication.