I'm developing a system which will involve a lot of data synchronisation broken down into small tasks. I am adding each small task as a job/message on the Azure Service Bus queue.
I then have X Worker Roles checking the queues and processing the data.
I don't expect to have many messages in the queue, because the aim is to process a message, complete it and then re-add the same message, scheduled for X minutes' time. This gives me a loop that keeps processing those tasks.
The great thing about the Azure functionality is that they handle all of the server side stuff for you, but the downside is that it can sometimes be difficult to debug or manipulate the data.
What I want to do is present a list of the messages in the queue (which I have done using PeekBatch) in a web interface. I then want to be able to select some or all of the messages and delete them.
I might want to do this if there is a bug in the code and I want to stop messages of a certain type from being processed.
Following on from that I'll have functionality to re-add messages from the web page too. Perhaps I might want to up my worker roles and messages to perform a task at a faster rate (or slow them down), or re-add messages I have deleted.
So, the question is, how can I actually select a specific message from the queue and then delete it? From what I can see, there is no obvious way to do this and, if it's possible, it will require some kind of workaround. This sounds a bit bizarre to me.
Edit:
I've got something that works, but it really doesn't seem a great solution:
public void DeleteMessages(List<long> messageIds)
{
    foreach (var msg in Client.ReceiveBatch(100))
    {
        if (messageIds.Contains(msg.SequenceNumber))
            msg.Complete(); // Deletes the message
        else
            msg.Abandon(); // Puts it back in the queue
    }
}
This gets less efficient as the queue grows, but it does at least stop all activity while the delete call is running, and it deletes the specified messages.
It will also only delete messages that are ready to process. Messages scheduled for the future are ignored, so I've currently added the ability to add "Sleep" messages, which stop the queue from being processed until my messages are "ready" and I can delete them.
I've been informed by Microsoft that they are currently working on the API to delete specific messages which should be available in a few months. Until then, it's all about workarounds.
June Update:
Still no update from Microsoft on this issue and the above method was less than ideal. I have now modified my code so that:
The object I put into the message has a new property:
Guid? MessageId { get; set; }
Note that it is a nullable Guid, purely to stay backwards-compatible.
When I want to delete a message, I add its MessageId to a database table, "DeletedMessage".
When it comes to processing a message, I look in the DeletedMessage table for a matching Guid; if I find one, I simply Complete() the message without doing the normal processing.
This works well, but is a slight overhead. If you are not dealing with huge numbers of messages it's not a big problem.
Also note that I did this originally by using the SequenceNumber, but (bizarrely) the SequenceNumber changes between peeking and retrieving the message! This prevents the idea from working, unless you use your own Id as above.
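The "flag it as deleted, then skip it at processing time" approach above can be sketched roughly as follows. This is a minimal in-memory model in Python, not the Service Bus SDK: the Task class, the drain function and the deleted set (standing in for the DeletedMessage table) are all hypothetical names used for illustration.

```python
import uuid
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Task:
    payload: str
    # Our own id, assigned when the task is created. The broker's
    # SequenceNumber is deliberately not used as the key.
    message_id: uuid.UUID = field(default_factory=uuid.uuid4)


def drain(queue, deleted_ids, handler):
    """Process every queued task, skipping the ones flagged for deletion."""
    processed = []
    while queue:
        task = queue.popleft()          # stands in for Receive()
        if task.message_id in deleted_ids:
            continue                    # stands in for Complete() with no work done
        handler(task)                   # normal processing path
        processed.append(task.payload)
    return processed


tasks = [Task("sync-a"), Task("sync-b"), Task("sync-c")]
deleted = {tasks[1].message_id}         # stands in for a row in DeletedMessage
result = drain(deque(tasks), deleted, handler=lambda task: None)
print(result)
```

The lookup adds one check per message, which is the "slight overhead" mentioned above; in a real system that check would be a database query or a cached copy of the table.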
You can create your own queue-cleaner program with a message listener that receives each message in peek-lock mode and either abandons or completes it, depending on some criteria of the message. It could be an action in a controller if you foresee the need for cleansing in actual production environments as well.
From my experiments it is not possible to delete an Active message from a message queue.
Deferred messages can be removed by calling Receive(sequenceNumber) and then calling Complete().
Scheduled messages can be removed by calling CancelScheduledMessageAsync(sequenceNumber).
For active messages you'll have to make sure that they at some point are moved to the dead letter queue.
Messages in the dead letter queue can then later either be deleted or re-submitted.
Related
I have a worker attached to a subscription processing messages. Depending on the message payload, the worker could possibly take days to completely process the message.
I'm a bit confused by different properties that the client can set that control how the Pub/Sub client library automatically extends the deadline so that the worker can process the message without fearing that the message will be redelivered.
The properties listed in the documentation, which is not very clear, are:
MinimumAckDeadline, MaximumAckDeadline, DefaultAckDeadline, MinimumAckExtensionWindow, DefaultAckExtensionWindow, MinimumLeaseExtensionDelay, and DefaultMaxTotalAckExtension
I believe I want to set DefaultMaxTotalAckExtension to a large value (several days) to allow my subscriber to continue working on the message without getting some kind of time out.
But I think I also want to modify the AckDeadline so that Pub/Sub knows that the client is still alive. Not sure which one I would want to modify: MinimumAckDeadline, MaximumAckDeadline, DefaultAckDeadline.
Aside from those properties, I don't know if I need to set MinimumAckExtensionWindow, DefaultAckExtensionWindow, or MinimumLeaseExtensionDelay.
All of the properties you mentioned are default/limit properties of the SubscriberClient class itself. Note that they are static and only have getters, not setters. What you want to set is MaxTotalAckExtension, which controls the maximum amount of time a message's lease will be extended.
However, taking days to process a message is considered an anti-pattern for Cloud Pub/Sub and will very likely result in duplicate deliveries. If you are going to take that long to process a message, you probably need to look at other options like persisting it locally (in a file or in a database), acking it, and then processing it. At that point, you may consider just writing directly to a database instead of to Pub/Sub and scanning for rows that need to be processed by your subscribers.
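A rough sketch of the "persist it, ack it, then process it" idea, using an in-memory SQLite table in Python rather than the Pub/Sub client library. The jobs table, on_message, run_pending and the ack callback are all assumptions made for illustration; a real subscriber would call the library's own acknowledge mechanism at the point marked.

```python
import sqlite3


def make_store():
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE jobs ("
        "id INTEGER PRIMARY KEY, "
        "payload TEXT NOT NULL, "
        "done INTEGER NOT NULL DEFAULT 0)"
    )
    return db


def on_message(db, payload, ack):
    """Subscriber callback: persist first, then ack, so the lease is held only briefly."""
    with db:  # commits the insert on success
        db.execute("INSERT INTO jobs (payload) VALUES (?)", (payload,))
    ack()  # hypothetical stand-in for acknowledging the message


def run_pending(db, work):
    """Separate long-running loop: scan for unfinished rows and process them."""
    rows = db.execute(
        "SELECT id, payload FROM jobs WHERE done = 0 ORDER BY id"
    ).fetchall()
    for job_id, payload in rows:
        work(payload)  # can take as long as it needs; no ack deadline applies here
        with db:
            db.execute("UPDATE jobs SET done = 1 WHERE id = ?", (job_id,))
    return len(rows)


db = make_store()
acked = []
on_message(db, "rebuild-index", ack=lambda: acked.append("rebuild-index"))
finished = []
count = run_pending(db, finished.append)
```

Because the row is committed before the ack, a crash between the two steps leads to a redelivery (and possibly a duplicate row) rather than a lost message, so the worker should tolerate duplicates.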
I'm working on a project that uses NSB. I really like it, but it's my first NSB solution, so I'm a bit of a noob. We have a job that needs to run every day that processes members. It is not expected to take long as the work is simple, but it will potentially affect thousands of members and, in the future, perhaps tens or hundreds of thousands.
Having it all happen in a single handler in one go feels wrong, but having a handler discover affected members and then fire separate events for each one sounds a bit too much in the opposite direction. I can think of a few other methods of doing it, but was wondering if there is an idiomatic way of dealing with this in NSB?
Edit to clarify: I'm using Schedule to send a command at 3am, and the handler for that will query the SQL db for a list of members who need to be processed. Processing will involve updating/inserting one or two rows per member. My question is about how to process that potentially large list of members within NSB.
Edit part 2: the job now needs to run monthly, not daily.
I would not use a saga for this. Sagas should be lightweight and are designed for orchestration rather than performing work. They are started by messages rather than scheduled.
You can achieve your ends by using the built-in scheduler. I've not used it, but it looks simple enough.
You could do something like:
configure a command message (eg StartJob) to be sent every day at 0300.
StartJob handler will then query the DB to get the work.
Then, depending on your requirements:
If you need all the work done at once, create a single command with all the work in it, and send it to another endpoint for processing. If you use transactional MSMQ then this will succeed or fail as a unit.
If you don't care if only some work succeeds then create a command per unit of work, and dispatch to an endpoint for processing. This has the benefit that you can scale out using the distributor if you needed to.
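The two options above (one command carrying all the work, or one command per unit of work) are the two ends of a batch-size setting. Here is a small Python sketch of that splitting step only. It is not NServiceBus code, and the ProcessMembers command and split_into_commands helper are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessMembers:
    """Hypothetical command carrying one slice of the member ids to process."""
    member_ids: tuple


def split_into_commands(member_ids, batch_size):
    """Turn the query result into commands of at most batch_size members each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    ids = list(member_ids)
    return [
        ProcessMembers(tuple(ids[start:start + batch_size]))
        for start in range(0, len(ids), batch_size)
    ]


# What the StartJob handler might do with the ids returned by its query.
commands = split_into_commands(range(1, 11), batch_size=4)
for command in commands:
    print(command)
```

A batch size equal to the number of members gives the "all at once" option, a batch size of 1 gives a command per member, and anything in between trades message count against how much work is retried when one command fails.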
"I'm working on a project that is using NSB...We have a job that needs to run every day..."
Although you can use NSB for this kind of work, it's not really something I would do. There are many other approaches you could use. A SQL job or cron job would be the obvious one (and a hell of a lot quicker to develop, more performant, and simpler).
Even though it does support such use cases, NServiceBus is not really designed for scheduled batch processing. I would seriously question whether you should even use NSB for this task.
You mention a running process and that sounds like a job for a Saga (see https://docs.particular.net/nservicebus/sagas/). You can use saga data and persist checkpoints in different storage mediums (SQL, Mongo etc). But yes, having something long running then dispatch messages from the Saga to individual handlers is definitely something I would do also.
Something else to consider is message deferral (Timeout Managers). So, for example, let's say you process x number of users but want to run this again. NServiceBus allows you to defer messages for a defined period, and the message will sit in the queue waiting to be dispatched.
If you need any more info, just shout and I can update my answer.
A real NSB solution would be to get rid of the "batch" job that processes all those records in one run and find out what action(s) would cause each of these records to need processing after all.
When such an action is performed, you should publish an NSB event and refactor the batch job into an NSB handler that subscribes to these events, so it can do the processing the moment the action is performed, running in parallel with the rest of your process.
This way there would be no need anymore for a scheduled 'start' message at 3 am, because all the work would already have been done.
Here is how I might model this idiomatically with NServiceBus: there might be a saga called PointsExpirationPolicy, which would be initiated at the moment that any points are awarded to a user. The saga would store the user ID, and number of points awarded, and also calculate the date/time the points should expire. Then it would request a timeout callback message to be sent at the date/time these points should expire. When that callback arrives, the saga sends a command to expire that number of points from the user's account. This would also give you some flexibility around the logic of exactly when and how points expire, and would eliminate the whole batch process.
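To make the timeout-driven flow concrete, here is a minimal Python model of it. It is not NServiceBus code: for brevity a single object holds the pending timeouts for every award, whereas the answer describes one saga instance per award, and the method names and the "ExpirePoints" command tuple are assumptions for illustration.

```python
import heapq
from datetime import datetime, timedelta


class PointsExpirationPolicy:
    """Toy model: record each award, then emit an expire command when its timeout is due."""

    def __init__(self, lifetime):
        self.lifetime = lifetime
        self._timeouts = []  # min-heap of (expires_at, sequence, user_id, points)
        self._sequence = 0   # tie-breaker so equal expiry times never compare user ids

    def points_awarded(self, user_id, points, awarded_at):
        """Called when points are awarded; requests a timeout at the expiry time."""
        expires_at = awarded_at + self.lifetime
        heapq.heappush(self._timeouts, (expires_at, self._sequence, user_id, points))
        self._sequence += 1
        return expires_at

    def due_commands(self, now):
        """Return the expire commands for every timeout that has come due by now."""
        commands = []
        while self._timeouts and self._timeouts[0][0] <= now:
            _, _, user_id, points = heapq.heappop(self._timeouts)
            commands.append(("ExpirePoints", user_id, points))
        return commands


policy = PointsExpirationPolicy(lifetime=timedelta(days=365))
start = datetime(2024, 1, 1)
policy.points_awarded("alice", 50, awarded_at=start)
policy.points_awarded("bob", 20, awarded_at=start + timedelta(days=10))

early = policy.due_commands(start + timedelta(days=364))
due = policy.due_commands(start + timedelta(days=365))
```

Each award is expired on its own schedule, so there is never a point at which the whole member list has to be scanned.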
I added a new worker role to our project to handle making outgoing phone calls. It draws the calls to make off a service bus queue.
I made a little console app to throw things onto the queue so that I could check to make sure it was working. Unfortunately it seems like whatever is put onto the queue by my app is instantly (or almost instantly?) put onto the dead letter queue. I took a look at the queue's properties and I can see
MessageCount 5
ActiveMessageCount 0
DeadLetterMessageCount 5
TransferMessageCount 0
AvailabilityStatus Available
EnableDeadLetteringOnMessageExpiration False
The worker role tries to get stuff off the queue, but all it's pulling is nulls.
The queue settings are the same as our other queue that's working. The object that I'm putting onto the queue is made up of almost all primitives, marked [MessageContract] and [Serializable]. All the members are marked [MessageHeader].
I also tried using the object that we use for our other queue just to see what would happen and that one dead-lettered right away too.
I don't get it. The object is clearly making it to the queue as the queue size is growing. But it just dead letters right away and I don't know what could cause that to happen besides the thing timing out.
Further information: I took a look using Service Bus Explorer, and it seems that the messages were being dead-lettered with the reason MaxDeliveryCountExceeded. This seems to mean that if receiving a message fails more than 10 times, it is moved to the dead letter queue. So that part's solved, but I put a breakpoint in the worker role code and no errors ever occur in there. My
BrokeredMessage message = Client.Receive()
always returns null, so there's not even any time for it to do anything wrong. I suppose something is going wrong in the actual Receive() call?
Turns out that having a Uri in the MessageContract was enough to have it fail somewhere in the deserialization process. So Client.Receive(); tried to get it 10 times, always failing, and then it got deadlettered.
I thought that Uris were Serializable but there seems to be a problem with putting them through a service bus. In any case, in my situation it wasn't a big deal to change the Uri to a string and everything's fine now.
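The behaviour described above (every receive returns null, then the message turns up in the dead letter queue) can be reproduced with a small in-memory model. This Python sketch is not the Service Bus SDK; SimulatedQueue, Uri and strict_deserialize are hypothetical, and the assumption that each failed deserialization counts as one delivery attempt is taken from the explanation above rather than from documentation.

```python
from collections import deque


class Uri:
    """Hypothetical stand-in for a field type the serializer cannot handle."""

    def __init__(self, value):
        self.value = value


def strict_deserialize(raw_body):
    """Accept only primitive field values, roughly like the failing contract."""
    for key, value in raw_body.items():
        if not isinstance(value, (str, int, float, bool)):
            raise ValueError(f"cannot deserialize field {key!r}")
    return dict(raw_body)


class SimulatedQueue:
    def __init__(self, max_delivery_count=10):
        self.max_delivery_count = max_delivery_count
        self.active = deque()   # entries are [raw_body, delivery_count]
        self.dead_letter = []   # entries are (raw_body, reason)

    def send(self, raw_body):
        self.active.append([raw_body, 0])

    def receive(self, deserialize):
        """Return a deserialized message, or None if nothing could be delivered."""
        if not self.active:
            return None
        entry = self.active[0]
        entry[1] += 1           # count this delivery attempt
        try:
            message = deserialize(entry[0])
        except ValueError:
            if entry[1] >= self.max_delivery_count:
                self.active.popleft()
                self.dead_letter.append((entry[0], "MaxDeliveryCountExceeded"))
            return None         # the caller only ever sees "no message"
        self.active.popleft()
        return message


queue = SimulatedQueue()
queue.send({"callback": Uri("http://example.com/done")})
attempts = [queue.receive(strict_deserialize) for _ in range(10)]

queue.send({"callback": "http://example.com/done"})  # the fix: carry the Uri as a string
fixed = queue.receive(strict_deserialize)
```

From the worker's side nothing ever throws, which matches the symptom: the failure happens inside the receive call, before any handler code runs.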
I am trying to create a message-based application with ActiveMQ, using .NET clients.
Client 1: A Web Service (producer)
Client 2: A Windows Service (consumer)
My question is: Is it possible to prevent messages of a certain type or content from being queued by a Client?
The reason why I want to do this is Version Updating.
I think there will be a time, when I need to extend or change the message type.
My plan is to do that update in the following order:
Prevent messages of the old version to be queued.
Wait until the consumer has processed all messages of the old version.
Update producer and consumer software.
I would like the Web Service to be still available during the update process to report back to the call. But it should not be able to queue new messages.
Of course if there is a better way of solving this problem altogether, please let me know.
As a general rule it is a good idea to only have one type of payload per queue. An easy way to do this is to use two different queues for the two different message versions. Something like:
mysystem.orders.1_0
mysystem.orders.1_1
The version should be the last part of the queue name, as it makes it easy to work with wildcards, which are used for a lot of the config options in ActiveMQ.
Splitting up different versions into different queues gets you around the problem of having to upgrade the producer and consumer at the same time, and also gives you some visibility as whether all of the 1_0 messages have been consumed.
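To show why putting the version last plays well with wildcards, here is a simplified matcher in Python. It assumes the usual ActiveMQ destination wildcard conventions ("." separates name segments, "*" matches exactly one segment, ">" matches everything from that point on) and is an illustration only, not ActiveMQ's own implementation.

```python
def matches(pattern, destination):
    """Simplified destination wildcard match: '*' is one segment, '>' is the rest."""
    pattern_parts = pattern.split(".")
    destination_parts = destination.split(".")
    for index, token in enumerate(pattern_parts):
        if token == ">":
            return True                       # everything from here on matches
        if index >= len(destination_parts):
            return False                      # pattern is longer than the name
        if token != "*" and token != destination_parts[index]:
            return False
    return len(pattern_parts) == len(destination_parts)


queues = ["mysystem.orders.1_0", "mysystem.orders.1_1", "mysystem.invoices.1_0"]

# Every version of the orders queue, whatever the version number is.
all_order_versions = [name for name in queues if matches("mysystem.orders.*", name)]
print(all_order_versions)
```

Writing the version as 1_0 rather than 1.0 also keeps it inside a single segment, so one "*" covers the whole version.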
I have a scenario where about 10 different messages will need to be enqueued and then dequeued / processed. One subscriber will need all 10 messages, but another will only need 8 of the 10 messages. I am trying to understand what the best way is to setup this type of architecture. Do you create a queue for each message type so the subscriber(s) can just subscribe to the relevant queues or do you dump them all to the same queue and ignore the messages that are not relevant to that subscriber? I want to ensure the solution is flexible / scalable, etc.
Process:
10 different xml messages will be enqueued to an IBM WebSphere MQ server.
We will use .Net (Most likely WCF since WebSphere MQ 7.1 has added in WCF support)
We will dequeue the messages and load them into another backend DB (Most likely SQL Server).
The solution needs to scale well because we will be processing a very large number of messages, and this could grow (probably 40-50,000 / hr). At least, that is a large amount for us.
As always greatly appreciate the info.
Creating queues is relatively 'cheap' from a resource perspective, plus yes, it's better to use a queue for each specific purpose, so it's probably better in this case to separate them by target client if possible. Using a queue to pull messages selectively based on some criteria (correlation ID or some other thing) is usually a bad idea. The best performing scenario in messaging is the most straightforward one: simply pull messages from the queue as they arrive, rather than peeking and receiving selectively.
As to scaling, I can't speak for Websphere MQ or other IBM products, but 40-50K messages per hour isn't particularly hard for MSMQ on Windows Server to handle, so I'd assume IBM can do that as well. Usually the bottleneck isn't the queuing platform itself but rather the process of dequeuing and processing individual messages.
OK, based on the comments, here's a suggestion that will scale and doesn't require much change on the apps.
On the producer side, I'd copy the message selection criteria to a message property and then publish the message to a topic. The only change that is required here to the app is the message property. If for some reason you don't want to make it publish using the native functionality, you can define an alias over a topic. The app thinks it is sending messages but they are really publications.
On the consumer side you have a couple of choices. One is to create administrative subscriptions for each app and use a selector in the subscription. The messages are then funneled to a dedicated queue per consumer, based on the selection criteria. The apps think that they are simply consuming messages.
Alternatively the app can simply subscribe to the topic. This gives you the option of a dynamic subscription that doesn't receive messages when the app is disconnected (if in fact you wanted that) or a durable subscription that is functionally equivalent to the administrative subscription.
This solution will easily scale to the volumes you cited. Another option is that the producer doesn't use properties. Here, the consumer application consumes all messages, breaks open the message payload on each and decides whether to process or ignore the message. In this solution the producer is still publishing to a topic. Any solution involving straight queueing forces the producer to know all the destinations. Add another consumer, change the producer. Also, there's a PUT for each destination.
The worst case is a producer putting multiple messages and a consumer having to read each one to decide if it's going to be ignored. That option might have problems scaling, depending on how deep in the payload the selection criteria field lies. Really long XPath expression = poor performance and no way to tune WMQ to make up for it since the latency is all in the application at that point.
Best case, producer sets a message property and publishes. Consumers select on property in their subscription or an administrative subscription does this for them. Whether this solution uses application subscriptions or administrative subscriptions doesn't make any difference as far as scalability is concerned.
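The best case described above can be modelled in a few lines. This Python sketch is an in-memory illustration, not WebSphere MQ code: the Topic class, the "type" property name and the T1 to T10 message types are hypothetical, and each subscription's selector is just a function over the message properties.

```python
class Topic:
    """Toy topic: each subscription has a selector and its own dedicated queue."""

    def __init__(self):
        self._selectors = {}  # subscription name -> predicate over properties
        self.queues = {}      # subscription name -> list of delivered bodies

    def subscribe(self, name, selector):
        self._selectors[name] = selector
        self.queues[name] = []

    def publish(self, properties, body):
        # The producer publishes once; routing is decided per subscription
        # from the message properties, without opening the payload.
        for name, selector in self._selectors.items():
            if selector(properties):
                self.queues[name].append(body)


topic = Topic()
topic.subscribe("consumer_all", lambda props: True)
topic.subscribe("consumer_partial", lambda props: props["type"] not in {"T9", "T10"})

for number in range(1, 11):
    message_type = f"T{number}"
    topic.publish({"type": message_type}, body=f"<{message_type}/>")

print(len(topic.queues["consumer_all"]), len(topic.queues["consumer_partial"]))
```

Adding a third consumer here means adding a subscription, not changing the producer, which is the main advantage over having the producer put to one queue per destination.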