We are in the process of migrating a legacy system to NServiceBus 5.0. What is generally the best way to migrate our business data to saga data? For example, if we had an OrderCancellationPolicy saga that only allowed cancellation within 2 days, how would past orders from the legacy system create these new sagas in the correct state?
I see two options. The first is to write a SQL script to prepopulate the saga persistence tables (we are using NHibernate persistence). The other is to create some kind of special import message, such as MigrateOrderDataCmd, that contains the data from the legacy order. An import script could send out these messages, which the sagas could handle and set the saga data that way.
Any guidance in this area is appreciated.
Theoretically I'd go for option two, or a version of it. Imagine your saga being down for a day and messages are piling up in the queue. With messages from a day coming in, you'd want to verify when the message was sent, or add a custom DateTime to the message yourself.
When the saga finally picks up the message, it knows it should not set the timeout two days in the future, but rather two days after the time the message was originally sent (that is, two days minus the time the message spent waiting to be delivered). That way when you migrate your current state, all messages get a proper timeout set.
On the other hand, if you've got a really large set of running processes, you might want to investigate how to populate the tables directly. I have no hands-on experience adding records to those tables myself, just changing them. :)
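The timeout adjustment described for option two can be sketched as follows. This is a minimal, language-neutral illustration in Python rather than NServiceBus code; the function name `remaining_timeout` and the sample dates are hypothetical, and it assumes the import message carries the original order time in UTC.

```python
from datetime import datetime, timedelta, timezone

CANCELLATION_WINDOW = timedelta(days=2)  # policy from the question

def remaining_timeout(order_placed_utc, now_utc, window=CANCELLATION_WINDOW):
    """Time left before cancellation closes; zero if the window already passed."""
    elapsed = now_utc - order_placed_utc
    return max(window - elapsed, timedelta(0))

now = datetime(2015, 3, 10, 12, 0, tzinfo=timezone.utc)
# Order placed six hours ago: most of the window is still open.
fresh = remaining_timeout(datetime(2015, 3, 10, 6, 0, tzinfo=timezone.utc), now)
# Legacy order from nine days ago: the window is already closed.
stale = remaining_timeout(datetime(2015, 3, 1, 0, 0, tzinfo=timezone.utc), now)
print(fresh)  # 1 day, 18:00:00
print(stale)  # 0:00:00
```

A saga handling the import message would request its timeout with this remaining duration, and could mark itself as already expired when the result is zero.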
I am developing a program in WPF (.NET), and I need to know when somebody makes a change to any table of the database.
The idea is to receive an event from the database when it is changed. I have read a lot of articles but I can't find a method that solves my problem.
Kind Regards
The best solution is to use a message queue. After your app commits a change to the database, the app also publishes a message on the message queue. Other clients then just wait for notifications on that message queue.
There are a few other common solutions, but all of them have disadvantages.
Polling. If a client is interested in recent changes, they run a query searching for new data every N seconds.
The downside is you have to keep polling even during times when there are no changes. You might have to poll very frequently, depending on how promptly you need to notice the changes. This adds to database load just to support the polling queries.
Also it costs more if you have many clients all polling for queries. In one system I supported, the database was struggling to process 30,000 queries per second just for clients running polling.
Change Data Capture. Using the binary log as a de facto message queue, because it records all the changes. Use a client tool such as Debezium, or write your own binlog tail client (this is a lot of work).
The downside is the binlog records all changes, not just those you want to be notified about. You have to filter it somehow. Also you have to learn how to use Debezium or equivalent tool.
Triggers. Write a trigger on the table that invokes a UDF to post notification outside the database. This is a bad idea, because the trigger executes when your insert/update/delete executes, not when the transaction commits. Clients could be notified of changes before the changes are committed, so if they go query the database right after they get the notification, the change is not visible to them yet.
It is also a disadvantage that this requires you to install a UDF extension in MySQL Server. MySQL doesn't normally have any way of posting an external notification.
I'm not a C# developer so I can't suggest specific code. But the general methods above are similar regardless of which language the app is written in.
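As a language-neutral illustration of the polling option above, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for MySQL, and the `changes` table and `fetch_changes` function are hypothetical names. A real client would call the function in a loop every N seconds.

```python
import sqlite3

def fetch_changes(conn, last_seen_id):
    """Return rows added after last_seen_id, plus the new high-water mark."""
    rows = conn.execute(
        "SELECT id, payload FROM changes WHERE id > ? ORDER BY id",
        (last_seen_id,),
    ).fetchall()
    new_mark = rows[-1][0] if rows else last_seen_id
    return rows, new_mark

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE changes (id INTEGER PRIMARY KEY AUTOINCREMENT, payload TEXT)"
)
conn.execute("INSERT INTO changes (payload) VALUES ('order 1 created')")
conn.commit()

first_poll, mark = fetch_changes(conn, 0)
# Nothing changed in between, but the query still hits the database.
empty_poll, mark = fetch_changes(conn, mark)
conn.execute("INSERT INTO changes (payload) VALUES ('order 1 shipped')")
conn.commit()
third_poll, mark = fetch_changes(conn, mark)
```

The second call shows the downside mentioned above: the query runs and costs database work even when there is nothing new to return.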
I don't think this is possible with MySQL; databases like MongoDB have this sort of feature.
You may like to use the method described in this answer.
Essentially, have date/time fields on rows so that you can pull the data changed since a certain date/time. Or you could use a CQRS/event strategy and maybe use a message queue.
I have a question regarding the implementation of using a:
CosmosDB
Service Bus + DLQ
I have a Service Bus Trigger which triggers, processes the incoming data, and then stores it in CosmosDB with Upsert. If one message fails in the processing, I store it on a Dead Letter Queue (DLQ), from which it will be re-sent upon request at a later time. This may lead to the problem that I re-send a (much older) message from the DLQ, which will overwrite a "newer" object in the database. Today, I use a Table to store timestamps so that I know when an object was last updated. A better way would be to compare a timestamp on the stored document with the Enqueued time property of the incoming Service Bus message; however, this does not work for a non-persistent database.
Are there any "cleaner" ways to get around this issue?
Here are some aspects to help think about your solution:
Use a Service Bus transaction to ensure all the relevant work is done before completing the message. If the CosmosDB upsert fails, abandon the message for a retry.
Make sure that your design is idempotent. You can use the enqueued time and/or a correlation id to manage order to help implement the idempotence. I would add this as an array in the CosmosDB document and avoid using another data store to reduce possible points of failure.
Also make sure you have chosen the correct CosmosDB consistency level for your use case and if you are dealing with huge volumes, then you will need to think how important this feature is and possibly use your correlation ID in your partitioning strategy.
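The idempotence point can be sketched like this. It is a simplified illustration, not CosmosDB SDK code: a plain dict stands in for the container, `upsert_if_newer` is a hypothetical helper, and a single stored timestamp is used instead of the array suggested above.

```python
from datetime import datetime, timezone

def upsert_if_newer(store, doc_id, body, enqueued_utc):
    """Write the document unless the store already holds a newer (or equal) version."""
    current = store.get(doc_id)
    if current is not None and current["enqueued_utc"] >= enqueued_utc:
        return False  # stale or duplicate message: skipping keeps the handler idempotent
    store[doc_id] = {"body": body, "enqueued_utc": enqueued_utc}
    return True

store = {}  # stand-in for the document container
t_old = datetime(2020, 1, 1, 10, 0, tzinfo=timezone.utc)
t_new = datetime(2020, 1, 1, 11, 0, tzinfo=timezone.utc)

applied_new = upsert_if_newer(store, "device-7", {"temp": 21}, t_new)
# An older message replayed from the DLQ must not overwrite the newer document.
applied_old = upsert_if_newer(store, "device-7", {"temp": 19}, t_old)
```

Because the timestamp lives in the document itself, no second data store is needed. Against a real database the read and the write would need to be made atomic, for example with an optimistic concurrency check on the document's ETag.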
I am setting up a system where we will transport messages between several internal services on ServiceBus Topics. The messages will hold serialized objects. The model objects are defined as quite complex trees of classes. This means it is not practical to maintain duplicate versions of the model structures in the code.
We expect the model structure to change so I have exposed the model version as a property on the brokered message.
What is the best way to handle the transition when we need to upgrade the model version?
I don't think we will really need to support two parallel model versions. But I am concerned that we don't lose messages during the transition. I assume it is a good strategy to upgrade the sending services first and let all subscribers continue to process messages. When all messages of the previous version are processed, then it is time to upgrade the subscribing services.
What is the best mechanism for skipping messages with a new version that the listening service is currently not handling?
I know I could go back to the old school and define parallel model versions by using schemas for JSON or XML, thus making it possible for the listening service to handle parallel versions. But that would be cumbersome, so I really want to avoid that.
I noticed the BrokeredMessage has a Defer method. Would that be useful? It looked promising until I realized the messages will be "moved" from the live queue into a separate state where they need to be pulled by referencing them by key. Not practical.
Is it possible to postpone the message by modifying delivery time? A couple of minutes would be fine. If the same service is still running by that time it can be postponed once again. (A working code example would be appreciated!)
Do I need to create separate subscriptions based on model version? So far we allow different message types to travel on the same topic so that would call for some redesign.
As a rule of thumb: upgrades on a live system are difficult. The easiest option that minimises risks of system downtime is:
add next message version support to the current code base
run two message versions concurrently
ensure all versions are supported and system runs without a problem
remove previous version
I have been looking at something similar but am yet to implement it so can't provide full guidance, but to answer your question on #3... I have messages which have a flag to re-queue the message to run again, e.g. to get a process to run every 5 minutes.
So during the process I extract the object from the BrokeredMessage:
var myObject = receivedMessage.GetBody<MyModel>();
I then complete that message to remove it from the queue and create a new BrokeredMessage based on that object and you can then set the ScheduledEnqueueTimeUtc field to something in the future.
BrokeredMessage brokeredMsg = new BrokeredMessage(myObject);
brokeredMsg.ScheduledEnqueueTimeUtc = DateTime.UtcNow.AddMinutes(5);
Client.Send(brokeredMsg);
So if you only want to process one model version at a time, you could assign a version number to your model and code something into your processor to look for a certain version number. If the message's version is higher, then re-queue it for a future time (until you have updated your code). If it is lower (a missed message), then perhaps have some exception handling.
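The version check described above could look roughly like this. It is a sketch of the decision logic only, written in Python rather than against the Service Bus API; `route_by_version`, `SUPPORTED_VERSION`, and the action names are all hypothetical.

```python
from datetime import datetime, timedelta, timezone

SUPPORTED_VERSION = 2               # the model version this listener understands
RETRY_DELAY = timedelta(minutes=5)  # how far into the future to re-queue

def route_by_version(message_version, now_utc, supported=SUPPORTED_VERSION):
    """Decide what to do with a message based on its model version."""
    if message_version == supported:
        return ("process", None)
    if message_version > supported:
        # Listener not upgraded yet: re-send a copy scheduled for later.
        return ("requeue", now_utc + RETRY_DELAY)
    # Older than expected: a missed message that needs explicit handling.
    return ("reject", None)

now = datetime(2016, 5, 1, 9, 0, tzinfo=timezone.utc)
current = route_by_version(2, now)
newer = route_by_version(3, now)
older = route_by_version(1, now)
```

For the "requeue" outcome, the returned time is what you would assign to ScheduledEnqueueTimeUtc on the new message, after completing the original one.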
Use custom MessageProperty on message, say Version.
Under SB topic - create new subscription that will accept only messages with new version (using Rules), and modify existing subscription(s) to NOT accept new version messages.
Then you can upgrade senders - new messages will be stored only in new 'temporary' subscription.
After that, you upgrade listeners, change rules on subscriptions (remove version rule from 'main' subscription, disable receive on temporary subscription).
And now you have choice:
using any tool read messages from temporary subscription and write them back to topic - they will arrive to upgraded listeners.
temporarily start one more listener that will read temporary subscription and process all messages in it
other ways, specific to your architecture
At our organization we have a SQL Server 2005 database and a fair number of database clients: web sites (php, zope, asp.net), rich clients (legacy fox pro). Now we need to pass certain events from the core database to other systems (MongoDB, LDAP and others). The messaging paradigm seems pretty capable of solving this kind of problem. So we decided to use the RabbitMQ broker as middleware.
The problem of consuming events from the database at first seemed to have only two possible solutions:
Poll the database for outgoing messages and pass them to a message broker.
Use triggers on certain tables to pass messages to a broker on the same machine.
I disliked the first idea due to latency issues which arise when periodical execution of sql is involved.
But event-based trigger approach has a problem which seems unsolvable to me at the moment. Consider this scenario:
A row is inserted into a table.
Trigger fires and sends a message (using a CLR Stored Procedure written in C#)
Everything is ok unless transaction which writes data is rolled back. In this case data will be consistent, but the message has already been sent and cannot be rolled back because trigger fires at the moment of writing to the database log, not at the time of transaction commit (which is a correct behaviour of a RDBMS).
I realize now that I'm asking too much of triggers and they are not suitable for tasks other than working with data.
So my questions are:
Has anyone managed to extract data events using triggers?
What other methods of consuming data events can you advise?
Is Query Notification (built on top of Service Broker) suitable in my situation?
Thanks in advance!
Let's first cut the obvious misfit out of the equation: Query Notification is not the right technology for this, because it is designed to address cache invalidation of relatively stable data. With QN you'll only know that the table has changed, but you won't be able to know what has changed.
Kudos to you for figuring out why triggers invoking SQLCLR won't work: the consistency is broken on rollback.
So what does work? Consider this: BizTalk Server. In other words, there is an entire business built around this problem space, and solutions are far from trivial (otherwise nobody would buy such products).
You can get quite far though following a few principles:
decoupling. Event based triggers are OK, but do not send the message from the trigger. Aside from the consistency issue on rollback you also have the latency issue of having every DML operation now wait for an external API call (the RabbitMQ send) and the availability issue of the external API call failure (if RabbitMQ is unavailable, your DB is unavailable). The solution is to have the trigger use ordinary tables as queues: the trigger will enqueue a message in the local db queue (i.e. will insert into this table) and an external process will service this queue by dequeueing the messages (i.e. delete from the table) and forwarding them to RabbitMQ. This decouples the transaction from the RabbitMQ operation (the external process is able to see the message only if the original xact commits), but the cost is some obvious added latency (there is an extra hop involved, the local table acting as a queue).
idempotency. Since RabbitMQ cannot enroll in distributed transactions with the database, you cannot guarantee atomicity of the DB operation (the dequeue from the local table acting as queue) and the RabbitMQ operation (the send). Either one can succeed when the other failed, and there is simply no way around it w/o explicit distributed transaction enrollment support. This implies that the application will send duplicate messages every once in a while (usually when things already go bad for some reason), so consumers must tolerate duplicates. And a quick heads up: getting into explicit 'acknowledge' messages and send sequence numbers is a losing battle, as you'll quickly discover that you're reinventing TCP on top of messaging; that road is paved with bodies.
tolerance. For the same reasons as the item above, every once in a while a message you believe was sent will never make it. Again, what damage this causes is entirely business specific. The issue is not how to prevent this situation (it is almost impossible...) but how to detect this situation, and what to do about it. No silver bullet, I'm afraid.
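The decoupling principle can be illustrated with a small runnable sketch. It uses Python with an in-memory SQLite database as a stand-in for SQL Server, and a plain list as a stand-in for RabbitMQ; the table names `orders` and `outbox` and the `relay` function are hypothetical.

```python
import sqlite3

# Autocommit mode, so transactions are controlled with explicit BEGIN/COMMIT.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE outbox (id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT);
    -- The trigger only writes to a local table; it never calls the broker.
    CREATE TRIGGER orders_ai AFTER INSERT ON orders
    BEGIN
        INSERT INTO outbox (body) VALUES ('order ' || NEW.id || ' ' || NEW.status);
    END;
""")

# A rolled-back transaction leaves nothing in the outbox.
conn.execute("BEGIN")
conn.execute("INSERT INTO orders (id, status) VALUES (1, 'created')")
conn.execute("ROLLBACK")

# A committed transaction enqueues exactly one message.
conn.execute("BEGIN")
conn.execute("INSERT INTO orders (id, status) VALUES (2, 'created')")
conn.execute("COMMIT")

def relay(conn, send):
    """Forward committed outbox rows, deleting each only after send() returns."""
    forwarded = 0
    rows = conn.execute("SELECT id, body FROM outbox ORDER BY id").fetchall()
    for row_id, body in rows:
        send(body)  # a crash between this line and the DELETE yields a duplicate later
        conn.execute("DELETE FROM outbox WHERE id = ?", (row_id,))
        forwarded += 1
    return forwarded

sent = []  # stand-in for the broker
count = relay(conn, sent.append)
```

Only the committed insert reaches the broker. The comment inside `relay` marks the window where a failure produces a duplicate, which is why the idempotency and tolerance points still apply.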
You do mention in passing Service Broker (the fact that it powers Query Notification is the least interesting aspect of it...). As a messaging platform built into SQL Server which offers Exactly Once In Order delivery guarantees and is fully transacted, it would solve all the above pain points (you can SEND from triggers with impunity, you can use Activation to solve the latency issue, you'll never see a duplicate or a missing message, there are clear error semantics) and some other pain points I did not mention before (consistency of backup/restore, as the data and the messages are on the same unit of storage, the database; consistency of HA/DR failover, as SSB supports both database mirroring and clustering, etc). The drawback though is that SSB is only capable of talking to another SSB service; in other words, it can only be used to exchange messages between two (or more) SQL Server instances. Any other use requires the parties to use a SQL Server to exchange messages. But if your endpoints are all SQL Server, then consider that there are some large scale deployments using Service Broker. Note that endpoints like php or asp.net can be considered SQL Server endpoints, as they are just programming layers on top of the DB API; a different endpoint would be, say, the need to send messages from handheld devices (phones) directly to the database (and even those 99% of the time go through a web service, which means they can reach a SQL Server ultimately). Another consideration is that SSB is geared toward throughput and reliable delivery, not toward low latency. It is definitely not the technology to use to get back the response in an HTTP web request, for instance. It IS the technology to use to submit for processing something triggered by a web request.
Remus's answer lays out some sound principles for generating and handling events. You can initiate the pushing of events from a trigger to achieve low latency.
You can achieve everything necessary from a trigger. We will still decouple this into two components: a trigger that generates the events and a local reader that reads the events.
The first component is the trigger.
Make a CLR trigger that prepares what needs to be done when the transaction commits.
Create a System.Transactions.IEnlistmentNotification that always agrees to be prepared, and whose void Commit(System.Transactions.Enlistment) method executes the prepared action.
In the trigger, call System.Transactions.Transaction.Current.EnlistVolatile(enlistmentNotification, System.Transactions.EnlistmentOptions.None)
You'll want your action to be short and sweet, like appending the data to a lockless queue in memory or updating some other state in memory. Don't try to communicate with other machines or processes. Don't write to a disk (if you wanted to write to a disk, just make an ordinary trigger that inserts into a queue table). You'll need to be careful to make sure your assembly is loaded only once so that any shared static state will be unique; this is easiest to do if your static state is in a top level assembly that isn't referenced by other assemblies, so no other assemblies will try to load it.
You will also need to either
initialize your state in such a way that it will be correct even if the system was restarted without sending all the previously queued messages (since a short, in memory queue will not be durable). This means you might be resending messages, so they will need to be idempotent. or
rely on the tolerance of another component to pick up on missed messages
The second component reads the state that is updated by the trigger. Make a separate CLR component that reads from your queue or state, and does whatever you need done (like send an idempotent message to a messaging system, record that it was sent, whatever). If this component can fail (hint: it can), you will need some form of tolerance, which may belong in another system. You can achieve low latency by having the trigger signal the second component when new state is available.
One architectural possibility is to have the trigger put the event in memory on commit for another low-latency component to pick up and have the second component send a low-latency, low-reliability copy of an idempotent message. You can pair that with a more reliable or durable messaging system, such as SSB, that will reliably and durably, but with greater latency, send the same idempotent message later.
I am creating a mass mailer application, where a web application sets up a email template and then queues a bunch of email address for sending. The other side will be a Windows service (or exe) that will poll this queue, picking up the messages for sending.
My question is, what would the advantage be of using SQL Service Broker (or MSMQ) over just creating my own custom queue table?
Everything I'm reading is suggesting I use Service Broker, but I really don't see what the huge advantage over a flat table (that would be a lot simpler to work with for me). For reference the application will be used to send 50,000-100,000 emails almost daily.
Do you know how to implement a queue over a flat table? This is not a silly question; implementing a queue over a table correctly is much harder than it sounds. Queue-like tables are notoriously deadlock prone and you need to carefully consider the table design and the enqueue and dequeue operations. Also, do you know how to scale your polling of the table? And how are you going to handle retries and timeouts (i.e. what timers are used for)?
I'm not saying you should use SSB. The learning curve is very steep and it is primarily a distributed application platform, not a local queueing product, so some features, like dialogs, will actually be obstacles for you rather than advantages. I'm just saying that you must also consider the difficulties of flat-table queues. If you have never implemented a flat-table queue then be warned, there are many dragons under that bridge.
50k-100k messages per day is nothing; it is only about one message per second. If you want 100k per minute, then we have something to talk about.
If you ever need to port to another vendor's database, you will have fewer problems if you used normal tables.
As you seem to have only one reader and one writer for your queue, I would tend to use a standard table until you hit a problem. However, if you start to feel the need to use “locking hints” etc., that is the time to switch to Service Broker queues.
I would not use MSMQ if both the sender and the reader need a database connection to work. MSMQ would be good if the sender did not talk to the database at all, as it lets the sender keep working when the database is down. However, having to set up and maintain both MSMQ and the database is likely to be more work than it is worth for most systems.
For advantages of Service Broker see this link:
http://msdn.microsoft.com/en-us/library/ms166063.aspx
In general we try to use a tool or standard functionality rather than building things ourselves. This lowers the cost and can make upgrading easier.
I know this is an old question, but it is sufficiently abstract to stay relevant for a long time.
After using both paradigms I would suggest a flat table. It is surprisingly scalable and nifty. Correct hints need to be used.
Once the application goes distributed, or starts using multiple Always On groups with different RW and RO servers, the Service Broker (or any other method of distributed communication) becomes a necessity.
Flat table
needs only a few hints (highly dependent on isolation level) to work scalably and reliably in the consumer (READPAST, UPDLOCK, ROWLOCK)
the order of message processing is not set in stone
the consumer must make sure that the message stays in the queue if the processing fails
needs some polling mechanism (job, CDC (here lies madness :)), external application...)
turn off maintenance jobs and automatic statistics for the table
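The requirement that a message stays in the queue when processing fails can be sketched as below. This is a Python illustration on SQLite, which has no READPAST, UPDLOCK or ROWLOCK hints, so it shows only the transactional dequeue and not the SQL Server locking behaviour. The `queue` table, `process_next`, and `failing_handler` are hypothetical names.

```python
import sqlite3

def process_next(conn, handler):
    """Dequeue one message; roll back (keeping the row) if the handler raises."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT id, body FROM queue ORDER BY id LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return None
        conn.execute("DELETE FROM queue WHERE id = ?", (row[0],))
        handler(row[1])          # do the work inside the transaction
        conn.execute("COMMIT")   # the row disappears only on success
        return row[1]
    except Exception:
        conn.execute("ROLLBACK")  # the delete is undone, the message stays queued
        raise

conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT)")
conn.execute("INSERT INTO queue (body) VALUES ('mail to a@example.com')")

def failing_handler(body):
    raise RuntimeError("SMTP server unavailable")

try:
    process_next(conn, failing_handler)
except RuntimeError:
    pass
remaining_after_failure = conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0]

handled = []
process_next(conn, handled.append)
remaining_after_success = conn.execute("SELECT COUNT(*) FROM queue").fetchone()[0]
```

On SQL Server the select and delete would typically be a single statement carrying the hints listed above, so that concurrent consumers skip locked rows instead of blocking on them.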
Service broker
needs extremely overblown "infrastructure" (message types, contracts, services, queues, activation procedures, must be enabled after each server restart, conversations need to be correctly created and dropped...)
is extremely opaque - we have spent ages trying to make it run after it mysteriously stopped working
there is a predefined order of message processing
the tables it uses can cause deadlocks themselves if SB is overused
is the only way (except for linked servers...) to send messages directly from database on RW server of one HA group to a database that is RO in this HA group (without any external app)
is the only way to send messages between different servers (linked servers are a big NONO (unless they become an YESYES - you know the drill - it depends)) (without any external app)