How to implement Message Queuing Solution

How to implement Message Queuing Solution - c#

I have a scenario where about 10 different messages will need to be enqueued and then dequeued / processed. One subscriber will need all 10 messages, but another will only need 8 of the 10 messages. I am trying to understand what the best way is to setup this type of architecture. Do you create a queue for each message type so the subscriber(s) can just subscribe to the relevant queues or do you dump them all to the same queue and ignore the messages that are not relevant to that subscriber? I want to ensure the solution is flexible / scalable, etc.
Process:
10 different xml messages will be enqueued to an IBM WebSphere MQ server.
We will use .Net (Most likely WCF since WebSphere MQ 7.1 has added in WCF support)
We will dequeue the messages and load them into another backend DB (Most likely SQL Server).
Solution needs to scale well because we will be processing a very large number of messages and this could grow (Probably 40-50,000 / hr). At least large amount for us.
As always greatly appreciate the info.
--S

Creating queues is relatively 'cheap' from a resource perspective, plus yes, it's better to use a queue for each specific purpose, so it's probably better in this case to separate them by target client if possible. Using a queue to pull messages selectively based on some criteria (correlation ID or some other thing) is usually a bad idea. The best performing scenario in messaging is the most straightforward one: simply pull messages from the queue as they arrive, rather than peeking and receiving selectively.
As to scaling, I can't speak for Websphere MQ or other IBM products, but 40-50K messages per hour isn't particularly hard for MSMQ on Windows Server to handle, so I'd assume IBM can do that as well. Usually the bottleneck isn't the queuing platform itself but rather the process of dequeuing and processing individual messages.

OK, based on the comments, here's a suggestion that will scale and doesn't require much change on the apps.
On the producer side, I'd copy the message selection criteria to a message property and then publish the message to a topic. The only change that is required here to the app is the message property. If for some reason you don't want to make it publish using the native functionality, you can define an alias over a topic. The app thinks it is sending messages but they are really publications.
On the consumer side you have a couple of choices. One is to create administrative subscriptions for each app and use a selector in the subscription. The messages are then funneled to a dedicated queue per consumer, based on the selection criteria. The apps think that they are simply consuming messages.
Alternatively the app can simply subscribe to the topic. This gives you the option of a dynamic subscription that doesn't receive messages when the app is disconnected (if in fact you wanted that) or a durable subscription that is functionally equivalent to the administrative subscription.
This solution will easily scale to the volumes you cited. Another option is that the producer doesn't use properties. Here, the consumer application consumes all messages, breaks open the message payload on each and decides whether to process or ignore the message. In this solution the producer is still publishing to a topic. Any solution involving straight queueing forces the producer to know all the destinations. Add another consumer, change the producer. Also, there's a PUT for each destination.
The worst case is a producer putting multiple messages and a consumer having to read each one to decide if it's going to be ignored. That option might have problems scaling, depending on how deep in the payload the selection criteria field lies. Really long XPath expression = poor performance and no way to tune WMQ to make up for it since the latency is all in the application at that point.
Best case, producer sets a message property and publishes. Consumers select on property in their subscription or an administrative subscription does this for them. Whether this solution uses application subscriptions or administrative subscriptions doesn't make any difference as far as scalability is concerned.

Related

JMS: updating message version / prevent certain message from being queued

I am trying to create a message based application based with ActiveMQ, using .NET Clients.
Client 1: A Web Service (producer)
Client 2: A Windows Service (consumer)
My question is: Is it possible to prevent messages of a certain type or content from being queued by a Client?
The reason why I want to do this is Version Updating.
I think there will be a time, when I need to extend or change the message type.
My plan is to do that update in the following order:
Prevent messages of the old version to be queued.
Wait until the consumer has processed all messages of the old version.
Update producer and consumer software.
I would like the Web Service to be still available during the update process to report back to the call. But it should not be able to queue new messages.
Of course if there is a better way of solving this problem altogether, please let me know.

As a general rule it is a good idea to only have one type of payload per queue. An easy way to do this is to use two different queues for the two different message versions. Something like:
mysystem.orders.1_0
mysystem.orders.1_1
The version should be the last part of the queue name, as it makes it easy to work with wildcards, which are used for a lot of the config options in ActiveMQ.
Splitting up different versions into different queues gets you around the problem of having to upgrade the producer and consumer at the same time, and also gives you some visibility as whether all of the 1_0 messages have been consumed.

Consuming SQL Server data events for messaging purposes

At our organization we have a SQL Server 2005 database and a fair number of database clients: web sites (php, zope, asp.net), rich clients (legacy fox pro). Now we need to pass certain events from the core database with other systems (MongoDb, LDAP and others). Messaging paradigm seems pretty capable of solving this kind of problem. So we decided to use RabbitMQ broker as a middleware.
The problem of consuming events from the database at first seemed to have only two possible solutions:
Poll the database for outgoing messages and pass them to a message broker.
Use triggers on certain tables to pass messages to a broker on the same machine.
I disliked the first idea due to latency issues which arise when periodical execution of sql is involved.
But event-based trigger approach has a problem which seems unsolvable to me at the moment. Consider this scenario:
A row is inserted into a table.
Trigger fires and sends a message (using a CLR Stored Procedure written in C#)
Everything is ok unless transaction which writes data is rolled back. In this case data will be consistent, but the message has already been sent and cannot be rolled back because trigger fires at the moment of writing to the database log, not at the time of transaction commit (which is a correct behaviour of a RDBMS).
I realize now that I'm asking too much of triggers and they are not suitable for tasks other than working with data.
So my questions are:
Has anyone managed to extract data events using triggers?
What other methods of consuming data events can you advise?
Is Query Notification (built on top of Service Broker) suitable in my situation?
Thanks in advance!

Lest first cut out of the of the equation the obvious misfit: Query Notification is not right technology for this, because is designed to address cache invalidation of relatively stable data. With QN you'll only know that table has changed, but you won't be able to know what had changed.
Kudos to you for figuring out why triggers invoking SQLCRL won't work: the consistency is broken on rollback.
So what does work? Consider this: BizTalk Server. In other words, there is an entire business built around this problem space, and solutions are far from trivial (otherwise nobody would buy such products).
You can get quite far though following a few principles:
decoupling. Event based triggers are OK, but do not send the message from the trigger. Aside from the consistency issue on rollback you also have the latency issue of having every DML operation now wait for an external API call (the RabbitMQ send) and the availability issue of the external API call failure (if RabbitMQ is unavailable, your DB is unavailable). The solution is to have the trigger use ordinary tables as queues, the trigger will enqueue a message in the local db queue (ie. will insert into this table) and and external process will service this queue by dequeueing the messages (ie. delete from the table) and forwarding them to RabbitMQ. This decouples the transaction from the RabbitMQ operation (the external process is able to see the message only if the original xact commits), but the cost is some obvious added latency (there is an extra hop involved, the local table acting as a queue).
idempotency. Since RabbitMQ cannot enroll in distributed transactions with the database you cannot guarantee atomicity of the DB operation (the dequeue from local table acting as queue) and the RabbitMQ operation (the send). Either one can succeed when the other failed, and there is simply no way around it w/o explicit distributed transaction enrollment support. Which implies that the application will send duplicate messages every once in a while (usually when things already go bad for some reason). And a quick heads up: enrolling into the act of explicit 'acknowledge' messages and send sequence numbers is a loosing battle as you'll quickly discover that you're reinventing TCP on top of messaging, that road is paved with bodies.
tolerance. For the same reasons as the item above every now in a while a message you believe was sent will never make it. Again, what damage this causes is entirely business specific. The issue is not how to prevent this situation (is almost impossible...) but how to detect this situation, and what to do about it. No silver bullet, I'm afraid.
You do mention in passing Service Broker (the fact that is powering Query Notification is the least interestign aspect of it...). As a messaging platform built into SQL Server which offers Exactly Once In Order delivery guarantees and is fully transacted it would solve all the above pain points (you can SEND from triggers withouth impunity, you can use Activation to solve the latency issue, you'll never see a duplicate or a missing message, there are clear error semantics) and some other pain points I did not mention before (consistency of backup/restore as the data and the messages are on the same unit of storage - the database, cosnsitnecy of HA/DR failover as SSB support both database mirroring and clustering etc). The draw back though is that SSB is only capable of talking to another SSB service, in other words it can only be used to exchange messages between two (or more) SQL Server instances. Any other use requires the parties to use a SQL Server to exchange messages. But if your endpoints are all SQL Server, then consider that there are some large scale deployments using Service Broker. Note that endpoints like php or asp.net can be considered SQL Server endpoints, they are just programming layers on top of the DB API, a different endpoint would, say, the need to send messages from handheld devices (phones) directly to the database (and eve those 99% of the time go through a web service, which means they can reach a SQL Server ultimately). Another consideration is that SSB is geared toward throughput and reliable delivery, not toward low latency. Is definitely not the technology to use to get back the response in a HTTP web request, for instance. IS the technology to use to submit for processing something triggered by a web request.

Remus's answer lays out some sound principals for generating and handling events. You can initiate the pushing of events from a trigger to achieve low latency.
You can achieve everything necessary from a trigger. We will still decouple this into two components: a trigger that generates the events and a local reader that reads the events.
The first component is the trigger.
Make a CLR trigger that prepares what needs to be done when the transaction commits.
Create a System.Transactions.IEnlistmentNotification that always agrees to be prepared, and whose void Commit(System.Transactions.Enlistment) method executes the prepared action.
In the trigger, call System.Transactions.Transaction.Current.EnlistVolatile(enlistmentNotification, System.Transactions.EnlistmentOptions.None)
You'll want your action to be short and sweet, like appending the data to a lockless queue in memory or updating some other state in memory. Don't try to communicate with other machines or processes. Don't write to a disk (if you wanted to write to a disk, just make an ordinary trigger that inserts into a queue table). You'll need to be careful to make sure your assembly is loaded only once so that any shared static state will be unique; this is easiest to do if your static state is in a top level assembly that isn't referenced by other assemblies, so no other assemblies will try to load it.
You will also need to either
initialize your state in such a way that it will be correct even if the system was restarted without sending all the previously queued messages (since a short, in memory queue will not be durable). This means you might be resending messages, so they will need to be idempotent. or
rely on the tolerance of another component to pick up on missed messages
The second component reads the state that is update by the trigger. Make a separate CLR component that reads from your queue or state, and does whatever you need done (like send an idempotent message to a messaging system, record that it was sent, whatever). If this component can fail (hint: it can), you will need some form of tolerance, which may belong in another system. You can achieve low latency by having the trigger signal the second component when new state is available.
One architectural possibility is to have the trigger put the event in memory on commit for another low-latency component to pick up and have the second component send a low-latency, low-reliability copy of an idempotent message. You can pair that with a more reliably or durable messaging system, such as SSB, that will reliably and durably, but with grater latency, send the same idempotent message later.

Message Passing Like erlang in C# With Very Large Data Stream

This is a question about message passing. This relates specifically to an in-house application written in C#. But it has a home grown "message passing" system resembling erlang.
Okay, we hope that it will be possible to learn from erlang folks or documentation to find an elegant solution to a couple of message passing challenges. But alas, after reading erlang documentation online and forums these topics don't seem to be addressed--that we can find.
So the question is: In erlang, when does the queue to send messages to a process get full? And does erlang handle the queue full situation? Or are the queues for message passing in erlang boundless--that is only limited by system memory?
Well in our system it involves processing a stream of financial data with potentially billions of tuples of information being read from disk, each tuple of financial information is called a "tick" in the financial world.
So it was necessary to set a limit to the queue size of each "process" in our system. We arbitrarily selected 1000 items max in each queue. Now those queues quickly get filled entirely by the tick messages.
The problem is that the processes also need to send other types of messages to each other besides just ticks but the ticks fill up the queues preventing any other types of message from getting passed.
As a "band aid" solution (which is messy) allow multiple queues per process for each message type. So a process will have a tick queue, and a command queue, and fill queue, and so on.
But erlang seems so much cleaner by having a single queue to each "process" that carries different message types. But again, how does it deal with the the queue getting hogged by a flood of only one of the message types?
So perhaps this is a question about the internals of erlang. Does erlang internally have separate limits on message types in a queue? Or does it internally have a separate queue per type of message?
In any case, how are sending processes aware when a queue is too full to receive certain types of message? Does the send fail? Does that mean error handling becomes necessary in erlang for inability to send?
In our system, it tracks when queues get full and then prevents any processes from running which will attempt to add to a full queue until that queue has more space. This avoids messy error handling logic since processes, once invoked, are guaranteed to have room to send one message.
But again, if we put multiple types of messages on that same queue. Other message types will be blocked that must get through.
It has become my perhaps mistaken impression that erlang wasn't designed to handle this situation so perhaps it doesn't address the problem of a queues getting filled with a flood of a single message type.
But we hope someone know how to answer this point to good reference information or book that covers this specific scenario.

Erlang sends all messages to a single queue with the system memory being the upper limit on the queue size. If you want to prioritize messages you have to scan the entire queue for the high priority messages before fetching a low priority one.
There are ways to get around this by spawning handler processes which throttle and prioritize traffic, but the erlang VM as such has no support for it.

Answer to the additional question in the comment:
Even at Safari books online, the main ones never say how messages are passed on erlang. It's clear they don't used "shared memory". So how do they communicate? is it via loopback tcp/ip when on the same machine?
Within one virtual machine, messages are simply copied (except for big enough binaries; for them, pointers are copied) between memory areas assigned to processes. If you start several Erlang VMs on the same machine, they can communicate over TCP/IP.

Persistent Work Queue in C#

Imagine I want to have a small network of worker drones possibly on separate threads and possibly on separate processes or even on different PCs. The work items are created by a central program.
I'm looking for an existing product or service that will do this all for me. I know that there is MSMQ and also MQSeries. MQSeries is too expensive. MSMQ is notoriously unreliable. A database backed system would be fine, but I don't want to own/manage/write it. I want to use someone else's work queue system.
Related Articles:
Here is a similar question, but it's advocating building a custom queue mechanism.
The queue that I like a lot is this one from Google App Engine.
http://www.codeproject.com/KB/library/DotNetMQ.aspx

If you follow some guidelines you can use a database as a queue store with good success, see Using tables as Queues.
SQL Server comes with its own built-in message queuing, namely Service Broker. It allows you to avoid many of the MSMQ pitfalls when it comes to scalability, reliability and high availability and disaster recovery scenarios.
Servcie Broker is fully integrated in the database (no external store, one consistent backup/restore, one unit of failover, no need for expensive two-phase-commit DTC between message store and database, one single T-SQL API to access and program both the messages and your data) and also has some nice unique features such as transactional messaging with guaranteed Exactly-Once-In-Order delivery, correlated message locking, internal activation etc.

I have used Rabbit MQ in the past for a pet project, you could add that to your list for Queue systems.
As far as a framework to wrap the Queue's, you could take a look at http://www.nservicebus.com/ we have done a couple of basic projects here at work with that. And here's a quick example to get started: http://meisinger2.wordpress.com/2009/11/09/nservicebus-fifteen-minutes/

I have successfully used MassTransit in the past. It supports using MSMQ as well as RabbitMQ.

SQL Service Broker vs Custom Queue

I am creating a mass mailer application, where a web application sets up a email template and then queues a bunch of email address for sending. The other side will be a Windows service (or exe) that will poll this queue, picking up the messages for sending.
My question is, what would the advantage be of using SQL Service Broker (or MSMQ) over just creating my own custom queue table?
Everything I'm reading is suggesting I use Service Broker, but I really don't see what the huge advantage over a flat table (that would be a lot simpler to work with for me). For reference the application will be used to send 50,000-100,000 emails almost daily.

Do you know how to implement a queue over a flat table? This is not a silly question, implementing a queue over a table correctly is much harder than it sounds. Queue-like-tables are notoriously deadlock prone and you need to carefully consider the table design and the enqueue and dequeue operations. Also, do you know how to scale your pooling of the table? And how are you goind to handle retries and timeouts (ie. what timers are used for)?
I'm not saying you should use SSB. The lerning curve is very steep and is primarily a distributed applicaiton platform, not a local queueing product so some features, like dialogs, will actually be obstacles for you rather than advantages. I'm just saying that you must consider also the difficulties of flat-table-queues. If you never implemented a flat-table-queue then be warned, there are many dragons under that bridge.
50k-100k messages per day is nothing, is only one message per second. If you want 100k per minute, then we have something to talk about.

If you every need to port to another vendor's database, you will have less problem if you used normal tables.
As you seem to only have one reader and one write from your queue, I would tend to use a standard table until you hit problem. However if you start to feel the need to use “locking hints” etc, that the time to switch to the Service Broker Queues.
I would not use MSMQ, if both the sender and the reader need a database connection to work. MSMQ would be good if the sender did not talk to the database at all, as it lets the sender keep working when the database is down. However having to setup and maintain both the MSMQ and the database is likely to be more work then it is worth for most systems.

For advantages of Service Broker see this link:
http://msdn.microsoft.com/en-us/library/ms166063.aspx
In general we try to use a tool or standard functionality rather than building things ourselves. This lowers the cost and can make upgrading easier.

I know this is old question, but is sufficiently abstract to be relevant for long enough time.
After using both paradigms I would suggest flat table. It is surprisingly scalable and nifty. Correct hints need to be used.
Once the application goes distributed, or starts using mutiple allways on groups with different RW and RO servers, the Service Broker (or any other method of distributed communication) becomes a neccessity.
Flat table
needs only few hints (higly dependent on isolation level) to work scalably and reliably in the consumer (READPAST, UPDLOCK, ROWLOCK)
the order of message processing is not set in stone
the consumer must make sure that the message stays in the queue if the processing fails
needs some polling mechanism (job, CDC (here lies madness :)), external application...)
turn of maintenance jobs and automatic statistics for the table
Service broker
needs extremely overblown "infrastructure" (message types, contracts, services, queues, activation procedures, must be enabled after each server restart, conversations need to be correctly created and dropped...)
is extremely opaque - we have spent ages trying to make it run after it mysteriously stopped working
there is a predefined order of message processing
the tables it uses can cause deadlocks themselfs if SB is overused
is the only way (except for linked servers...) to send messages directly from database on RW server of one HA group to a database that is RO in this HA group (without any external app)
is the only way to send messages between different servers (linked servers are a big NONO (unless they become an YESYES - you know the drill - it depends)) (without any external app)

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.