I have a scaled-out application where each instance connects to an Azure Service Bus subscription with the same name. The end result is that only a single instance gets to act on any given message, because they are all listening to the same subscription.
Occasionally the application needs to place an instance into an idle state (a Service Fabric ActiveSecondary replica). When this occurs, I need to close the subscription so that the instance no longer receives messages. If there were two instances originally, once one is placed into the idle state all messages should go to the remaining instance. This is important so that all messages are handled by a properly configured primary instance.
When the instance becomes idle, a cancellation token is cancelled. I have code listening for the cancellation that calls Close() on the SubscriptionClient created when I originally set up the subscription.
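The wiring looks roughly like this (a simplified sketch; the field names are placeholders for my actual members):

    // Simplified sketch: when the replica transitions to ActiveSecondary the token
    // is cancelled, and the registered callback closes the subscription client.
    private SubscriptionClient _subscriptionClient;                 // created at startup
    private readonly CancellationTokenSource _idleTokenSource = new CancellationTokenSource();

    private void RegisterIdleShutdown()
    {
        _idleTokenSource.Token.Register(() => _subscriptionClient.Close());
    }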
The issue is, even after I call Close() on one instance, messages are still being randomly split between it and the primary.
Is the way I'm doing this inherently wrong, or is something else in my code causing this behavior?
The Azure Service Bus track 0 and track 1 SDKs do not support CancellationTokens. If you close your client, any messages it doesn't process will be picked up by another competing instance once they become visible again. That's where MaxLockDuration and MaxDeliveryCount are important: they ensure messages get enough processing attempts to cover the situation you're describing without waiting too long.
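For illustration, a rough sketch of creating the shared subscription with those settings using the track 1 management API (the topic/subscription names and values are placeholders, not recommendations):

    // Requires Microsoft.ServiceBus / Microsoft.ServiceBus.Messaging (track 1).
    var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);
    var description = new SubscriptionDescription("my-topic", "shared-subscription")
    {
        LockDuration = TimeSpan.FromSeconds(30), // how long a delivery stays locked to one receiver
        MaxDeliveryCount = 10                    // attempts before the message is dead-lettered
    };
    if (!namespaceManager.SubscriptionExists(description.TopicPath, description.Name))
    {
        namespaceManager.CreateSubscription(description);
    }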
Disregard this post. It turns out I had the same subscription name used twice within a single instance, so the two clients were competing for the messages. The Close() function works as expected.
Related
I have a Windows service that spawns a set of child activities on separate threads and that should only terminate when all of those activities have successfully completed. I do not know in advance how long it might take an activity to terminate after a stop signal is received. During OnStop(), I wait in intervals for the activities to signal completion and keep requesting additional time for as long as the system is willing to grant it.
Here is the basic structure:
using System.ServiceProcess;
using System.Threading;

class MyService : ServiceBase
{
    private CancellationTokenSource stopAllActivities;
    private CountdownEvent runningActivities;

    protected override void OnStart(string[] args)
    {
        // ... start a set of activities that signal runningActivities
        //     when they stop
        // ... initialize runningActivities to the number of activities
    }

    protected override void OnStop()
    {
        stopAllActivities.Cancel();
        while (!runningActivities.Wait(10000))
        {
            RequestAdditionalTime(15000); // NOTE: 5000 added for overhead
        }
    }
}
Just how much "overhead" should I be adding in the RequestAdditionalTime call? I'm concerned that the requests are cumulative, instead of based on the point in time when each RequestAdditionalTime call is made. If that's the case, adding overhead could result in the system eventually denying the request because it's too far out in the future. But if I don't add any overhead then my service could be terminated before it has a chance to request the next block of additional time.
This post wasn't exactly encouraging:
The MSDN documentation doesn’t mention this but it appears that the value specified in RequestAdditionalTime is not actually ‘additional’ time. Instead, it replaces the value in ServicesPipeTimeout. Worse still, any value greater than two minutes (120000 milliseconds) is ignored, i.e. capped at two minutes.
I hope that's not the case, but I'm posting this as a worst-case answer.
UPDATE: The author of that post was kind enough to post a very detailed reply to my comment, which I've copied below.
Lars, the short answer is no.
What I would say is that I now realise that Windows Services ought to be designed to start and terminate processing quickly when requested to do so.
As developers, we tend to focus on the implementation of the processing and then package it up and deliver it as a Windows Service.
However, this really isn't the correct approach to designing Windows Services. Services must be able to respond quickly to requests to start and stop, not only when an administrator makes the request from the services console, but also when the operating system requests a start as part of its start-up processing or a stop because it is shutting down.
Consider what happens when Windows is configured to shut down when a UPS signals that the power has failed. It’s not appropriate for the service to respond with “I need a few more minutes…”.
It’s possible to write services that react quickly to stop requests even when they implement long running processing tasks. Usually a long running process will consist of batch processing of data and the processing should check if a stop has been requested at the level of the smallest unit of work that ensures data consistency.
As an example, the first service where I found the stop timeout to be a problem involved processing a notifications queue on a remote server. The processing retrieved a notification from the queue, called a web service to fetch data related to the subject of the notification, and then wrote a data file for processing by another application.
I implemented the processing as a timer driven call to a single method. Once the method is called it doesn’t return until all the notifications in the queue have been processed. I realised this was a mistake for a Windows Service because occasionally there might be tens of thousands of notifications in the queue and processing might take several minutes.
The method is capable of processing 50 notifications per second. So, what I should have done was implement a check to see if a stop had been requested before processing each notification. This would have allowed the method to return when it has completed the processing of a notification but before it has started to process the next notification. This would have ensured that the service responds quickly to a stop request and any pending notifications remained queued for processing when the service is restarted.
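To make that idea concrete, here is my own rough sketch (not the original author's code) of checking for a stop request between notifications; the queue and processing helpers are hypothetical:

    // Process one notification per iteration and stop between units of work
    // as soon as a stop has been requested.
    private void ProcessPendingNotifications(CancellationToken stopRequested)
    {
        while (!stopRequested.IsCancellationRequested)
        {
            var notification = TryDequeueNotification();  // hypothetical helper: null when queue is empty
            if (notification == null)
            {
                break; // queue drained; the timer will trigger the next run
            }

            ProcessSingleNotification(notification);      // hypothetical helper: one consistent unit of work
        }
    }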
I'm using RabbitMQ to deliver messages to worker processes (using the official C# client). I have been running simple tests during the implementation, and all has been going swimmingly until now.
I ran a test where I queued messages for a worker process that was not listening (no connection). Once I had queued several hundred messages, I started that process. It created its IModel, declared its queue (which already existed), and began consuming messages (with BasicConsume). This went great. This process, as it processed messages, created messages for other queues. There were processes already listening to these queues (with BasicConsume), and so the messages were immediately delivered to those clients (or so the server thought...). The messages were never processed.
The server definitely believes that the messages have been delivered (the messages are all in the "unacked" bucket, not the "ready" bucket), but IBasicConsumer.HandleBasicDeliver never gets called on the client. I have tried several different techniques (using a Subscription, using QueueingBasicConsumer, as well as my own custom consumer), and the outcome is exactly the same. I'm at a complete loss. If I close the connection (there is only one connection here), the messages immediately move from the "unacked" bucket back to the "ready" bucket.
Why doesn't the client get notified when messages are delivered?
Looking into the code, ModelBase.Close() calls ConsumerDispatcher.Shutdown() (ModelBase.cs line 301), and from there, it calls workService.StopWork() (ConcurrentConsumerDispatcher.cs line 27). It seems to me (by a cursory view of the code) that this stops ALL work in the connection's ConsumerWorkService. Instead, should ConcurrentConsumerDispatcher.Shutdown() be calling workService.StopWork(this) on line 27?
It's a bug in the RabbitMQ client, and a fix has already been merged in.
It should be available in the next nightly build, on 4/18/2015.
If your BasicConsume call sets noAck = false, then after you dequeue a message you need to acknowledge it with: channel.BasicAck(result.DeliveryTag, false);
If your BasicConsume call sets noAck = true, the message is removed from the server automatically as soon as it is delivered.
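For example, a minimal sketch using the older QueueingBasicConsumer API that the question mentions (host and queue name are placeholders):

    // Consume with manual acks (noAck = false) and acknowledge each delivery after handling it.
    var factory = new ConnectionFactory { HostName = "localhost" };
    using (var connection = factory.CreateConnection())
    using (var channel = connection.CreateModel())
    {
        var consumer = new QueueingBasicConsumer(channel);
        channel.BasicConsume("work-queue", false, consumer);     // noAck = false
        while (true)
        {
            var result = (BasicDeliverEventArgs)consumer.Queue.Dequeue();
            // ... handle result.Body ...
            channel.BasicAck(result.DeliveryTag, false);         // ack this delivery only
        }
    }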
I'm using an Azure Service Bus Queue with Session based messaging enabled. To consume from the queue I register an IMessageSessionAsyncHandler and then process the message in the OnMessageAsync method.
The issue I'm seeing is that if I abandon a message for whatever reason, rather than it being received again immediately, I receive the next message in the session, and only after processing that message do I receive the first message again (assuming only two messages in the session).
As an example, let's say I have a queue with two messages on it, both with the same SessionId and with sequence numbers 1 and 2 respectively. I start receiving and get the message with sequence number 1, as expected. If I then abandon this message using message.Abandon (the reason for abandoning is irrelevant), I immediately get the next message in the session (sequence number 2). Only after handling (or abandoning) this second message do I get the first message again.
This isn't what I'd expect from abandoning a message, and it isn't consistent with other ways of using the queue. I've tested the same example in the following scenarios:
without the use of an IMessageSessionAsyncHandler and instead just manually accepting a message session.
without the use of sessions and instead just having two independent messages on the queue.
In both scenarios, I see the expected behaviour: when I abandon a message, it is always guaranteed to be the next message received, unless the max delivery count is exceeded and it is dead-lettered.
My question is this: is the behaviour I'm seeing with the use of an IMessageSessionAsyncHandler expected, or is this a bug in the Service Bus library? If it is not a bug, can anyone give me an explanation of why this behaves differently from the other ways of receiving?
When you register a session handler on the QueueClient, prefetch is turned on internally to improve the latency and throughput of the receivers. Unfortunately, for the IMessageSessionAsyncHandler scenario this behaviour cannot be overridden. One option is to abandon the session itself when you encounter a message that needs to be abandoned; this will ensure that the messages are delivered in order.
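A rough sketch of that option, following the shape of the track 1 Microsoft.ServiceBus.Messaging handler (the processing helper is hypothetical, and releasing the session by closing it is my reading of "abandon the session", so treat this as a sketch rather than the definitive pattern):

    class OrderedSessionHandler : IMessageSessionAsyncHandler
    {
        public async Task OnMessageAsync(MessageSession session, BrokeredMessage message)
        {
            try
            {
                await ProcessAsync(message);   // hypothetical processing helper
                await message.CompleteAsync();
            }
            catch (Exception)
            {
                // Release the whole session instead of abandoning the single message,
                // so the session's messages are redelivered in order to the next receiver.
                await session.CloseAsync();
            }
        }

        public Task OnCloseSessionAsync(MessageSession session) { return Task.FromResult(0); }

        public Task OnSessionLostAsync(Exception exception) { return Task.FromResult(0); }

        private Task ProcessAsync(BrokeredMessage message) { /* ... */ return Task.FromResult(0); }
    }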
We have a pub/sub application that involves an external client subscribing to a Web Role publisher via an Azure Service Bus Topic. Our current billing cycle indicates we've sent/received >25K messages, while our dashboard indicates we've sent <100. We're investigating our implementation and checking our assumptions in order to understand the disparity.
As part of our investigation we've gathered Wireshark captures of client<=>service bus traffic on the client machine. We've noticed a regular pattern of communication that we haven't seen documented and would like to better understand. The following exchange occurs once every 50s when there is otherwise no activity on the bus:
The client pushes ~200B to the service bus.
10s later, the service bus pushes ~800B to the client. The client registers the receipt of an empty message (determined via breakpoint.)
The client immediately responds by pushing ~1000B to the service bus.
Some relevant information:
This occurs when our web role is not actively pushing data to the service bus.
After a legitimate message is received from the Web Role, the pattern described above does not occur again until a full 50s has passed.
Both client and server connect to sb://namespace.servicebus.windows.net via TCP.
Our application messages are <64 KB
Questions
What is responsible for the regular, 3-packet message exchange we're seeing? Is it some sort of keep-alive?
Do each of the 3 packets count as a separately billable message?
Is this behavior configurable or otherwise documented?
EDIT:
This is the code that receives the messages:
private void Listen()
{
    _subscriptionClient.ReceiveAsync().ContinueWith(MessageReceived);
}

private void MessageReceived(Task<BrokeredMessage> task)
{
    if (task.Status != TaskStatus.Faulted && task.Result != null)
    {
        task.Result.CompleteAsync();
        // Do some things...
    }
    Listen();
}
I think what you are seeing is the Receive call in the background. Behind the scenes, the Receive calls all use long polling: they call out to the Service Bus endpoint and ask for a message. The Service Bus service gets that request and, if it has a message, returns it immediately. If it doesn't have a message, it holds the connection open for a period of time in case a message arrives. If a message arrives within that time frame, it is returned to the client. If a message is not available by the end of the time frame, a response is sent to the client indicating that no message was there (aka your null BrokeredMessage). If you call Receive with no overloads (as you've done here), it will immediately make another request. This loop continues to happen until a message is received.
Thus, what you are seeing is the number of times the client requests a message when there isn't one there. The long polling makes this nicer than Windows Azure Storage Queues, which immediately return a null result if there is no message. For both technologies it is common to implement an exponential back-off for requests, and there are lots of examples out there of how to do this. This cuts back on how often you check the queue and can reduce your transaction count.
To answer your questions:
Yes, this is normal expected behaviour.
No, this counts as only one transaction. For Service Bus you are charged a transaction each time you put a message on a queue and each time a message is requested (which can be a little opaque, given that Receive makes multiple calls in the background). Note that the docs point out that you are charged for each idle transaction (meaning a null result from a Receive call).
Again, you can implement a back-off methodology so that you aren't hitting the queue so often. Another suggestion I've heard recently: if you have a queue that isn't seeing a lot of traffic, you could check the queue depth to see if it is > 0 before entering the processing loop, and if you get no messages back from a receive call, go back to watching the queue depth. I've not tried that, and I'd think it is possible you could get throttled if you did the queue depth check too often.
If these are your production numbers, then your subscription isn't really processing a lot of messages, and it would likely be a good idea to have a back-off policy tuned to how long a message can acceptably wait before it is processed. For example, if it is okay for a message to sit for more than 10 minutes, create a back-off approach that eventually checks for a message only every 10 minutes; when it gets one, process it and immediately check again.
Oh, there is a Receive overload that takes a timeout, but I'm not 100% sure whether that is a server-side timeout or a local one. If it is local, then the client could still be making calls to the service every X seconds. I think this is based on the OperationTimeout value set in the MessagingFactory settings when creating the SubscriptionClient. You'd have to test that.
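As a rough sketch of the back-off idea (the delays are arbitrary, and as noted above you'd want to verify how the TimeSpan passed to ReceiveAsync is honoured):

    // Receive with a wait, then back off exponentially while the queue is quiet.
    private async Task PollWithBackOffAsync(SubscriptionClient client, CancellationToken token)
    {
        var delay = TimeSpan.FromSeconds(1);
        var maxDelay = TimeSpan.FromMinutes(10);
        while (!token.IsCancellationRequested)
        {
            BrokeredMessage message = await client.ReceiveAsync(TimeSpan.FromSeconds(30));
            if (message != null)
            {
                await message.CompleteAsync();
                // ... do some things ...
                delay = TimeSpan.FromSeconds(1);                                    // reset after real work
            }
            else
            {
                await Task.Delay(delay, token);                                     // idle: wait before asking again
                delay = TimeSpan.FromTicks(Math.Min(delay.Ticks * 2, maxDelay.Ticks));
            }
        }
    }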
I have an NServiceBus application in which a given message may not be able to be processed because some external event has not yet taken place. Because this other event is not an NSB event, I can't implement sagas properly.
However, rather than just re-queuing the message (which would cause a loop until that external event has occurred), I'm wrapping the message in another message (DelayMessage) and queuing that instead. The DelayMessage is picked up by a different service and placed in a database until the retry interval expires. At which point, the delay service re-queues the message on the original queue so another attempt can be made.
However, this can happen more than once if that external event still hasn't taken place, and in the case where that event never happens, I want to limit the number of round trips the message takes. This is why the DelayMessage has a MaxRetries property, but that value is lost when the delay service re-queues the original message for the retry.
What other options am I missing? I'm happy to accept that there's a totally different solution to this problem.
Consider implementing a saga which stores that first message, holding on to it until the second message arrives. You might also want the saga to request a timeout so that your process won't wait indefinitely if the second message gets lost.
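A rough sketch of what that saga could look like, assuming the external event can be translated into a message (all type names here are hypothetical, and the API shown is the NServiceBus 5-era saga API):

    // using NServiceBus; using NServiceBus.Saga;
    public class OriginalMessage : ICommand { public string CorrelationId { get; set; } }
    public class ExternalEventOccurred : IEvent { public string CorrelationId { get; set; } }
    public class ExternalEventTimeout { }

    public class WaitForExternalEventSagaData : ContainSagaData
    {
        public virtual string CorrelationId { get; set; }
    }

    public class WaitForExternalEventSaga : Saga<WaitForExternalEventSagaData>,
        IAmStartedByMessages<OriginalMessage>,
        IHandleMessages<ExternalEventOccurred>,
        IHandleTimeouts<ExternalEventTimeout>
    {
        protected override void ConfigureHowToFindSaga(SagaPropertyMapper<WaitForExternalEventSagaData> mapper)
        {
            mapper.ConfigureMapping<OriginalMessage>(m => m.CorrelationId).ToSaga(s => s.CorrelationId);
            mapper.ConfigureMapping<ExternalEventOccurred>(m => m.CorrelationId).ToSaga(s => s.CorrelationId);
        }

        public void Handle(OriginalMessage message)
        {
            // Hold on to the work and cap how long we are willing to wait for the external event.
            Data.CorrelationId = message.CorrelationId;
            RequestTimeout(TimeSpan.FromHours(1), new ExternalEventTimeout());
        }

        public void Handle(ExternalEventOccurred message)
        {
            // The external condition is now satisfied: do the real work and finish.
            MarkAsComplete();
        }

        public void Timeout(ExternalEventTimeout state)
        {
            // The external event never happened within the limit: give up (or escalate).
            MarkAsComplete();
        }
    }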