I have a WebJob that consumes messages from an Azure Service Bus topic by registering an OnMessage callback. The message lock duration was set to 30 seconds and the lock renew timeout to 60 seconds, so jobs taking more than 30 seconds to process a Service Bus message were getting a lock-expired exception.
Now I have set the message lock duration to more than the lock renew timeout, but it still throws the same exception. I also restarted my WebJob, but still no luck.
I tried running the same WebJob consuming messages from a different topic with the newer settings, and it works fine. Is this behaviour expected, and how long does this settings change normally take to be reflected?
Any help would be great.
I have set the message lock duration to more than the lock renew timeout. But somehow it still throws the same exception.
The max value of the lock duration is 5 minutes. If you need less than 5 minutes to process the job, you can increase the lock duration of your message to meet your requirement.
If you need more than 5 minutes to process your job, you need to set the AutoRenewTimeout property of OnMessageOptions. It will renew the lock each time it would otherwise expire, up until AutoRenewTimeout is reached. For example, if you set the lock duration to 1 minute and AutoRenewTimeout to 5 minutes, the message will stay locked for up to 5 minutes if you don't release the lock.
Here is the sample code I used to test the lock duration and AutoRenewTimeout on my side. If the job takes longer than both the lock duration and AutoRenewTimeout, an exception is thrown when we complete the message (meaning the lock timed out). I also modified the lock duration in the portal, and the configuration was applied immediately the next time I received a message.
SubscriptionClient client = SubscriptionClient.CreateFromConnectionString(connectionString, "topic name", "subscription name");

// Configure the callback options.
OnMessageOptions options = new OnMessageOptions();
options.AutoComplete = false;
options.AutoRenewTimeout = TimeSpan.FromSeconds(60);

client.OnMessage((message) =>
{
    try
    {
        // Process the message here; the loop below simulates a long-running job.
        for (int i = 0; i < 30; i++)
        {
            Thread.Sleep(3000);
        }

        // Remove the message from the subscription.
        message.Complete();
    }
    catch (Exception ex)
    {
        // Indicates a problem; unlock the message in the subscription.
        message.Abandon();
    }
}, options);
For your issue, please check how much time your job takes and choose the right combination of lock duration and AutoRenewTimeout.
The settings should be reflected almost immediately. Also, the lock renewal timeout should be longer than the lock duration, or disabled.
The lock renewal feature is an ASB client-side feature and doesn't override the lock duration set on the entity. If you can reproduce this issue and can share the repro, raise a support issue with Microsoft.
Related
I have an application written in C# that long-polls an SQS queue with a ReceiveWaitTime of 20 seconds, and the max number of messages read is one.
This application uses AWSSDK.SQS (3.3.3.62).
I am running into a bit of an issue where the polling seems to just hang indefinitely and does not recover until the application is restarted (when the application is restarted, we re-create the message queue monitor and start polling from there).
Here is the bit of code that does the polling:
private async Task ReceiveInternalAsync(Func<IMessageReceipt, Task> onMessageReceived,
    bool processAsynchronously, TimeSpan? maximumWait = null, CancellationToken? cancellationToken = null, int maxNumberOfMessages = 1)
{
    var request = new ReceiveMessageRequest();
    var totalWait = maximumWait ?? TimeSpan.MaxValue;
    request.QueueUrl = Address;
    request.WaitTimeSeconds = GetSqsWaitTimeSeconds(totalWait, cancellationToken.HasValue);
    request.MaxNumberOfMessages = maxNumberOfMessages;
    var stopwatch = Stopwatch.StartNew();
    Amazon.SQS.Model.Message[] sqsMessages;
    while (true)
    {
        var stopwatch2 = Stopwatch.StartNew();
        var response = await _sqsClient.ReceiveMessageAsync(request).ConfigureAwait(false);
        stopwatch2.Stop();
        sqsMessages = response.Messages.Where(i => i != null).ToArray();
        _logger.LogDebug($"{request.QueueUrl} {sqsMessages.Length} messages received after {stopwatch2.ElapsedMilliseconds} ms");
        ...
    }
}
Where the parameters being sent to this method are:
onMessageReceived = a delegate to handle the received message
processAsynchronously = true
maximumWait = 20 seconds (new TimeSpan(0,0,20))
cancellationToken = null
maxNumberOfMessages = 1
I have omitted the rest of the while loop, as I don't believe it's looping indefinitely in there, but I am more than happy to share the rest of it if we think it could be the crux of the issue.
The reason I believe it's the SDK that is hanging is that I don't see the debug message:
{request.QueueUrl} {sqsMessages.Length} messages received after {stopwatch2.ElapsedMilliseconds} ms
appear, and I know the code has hit this method because the caller logs that it has called it (let me know if I should share the caller's code as well).
I looked up similar issues online and I found this:
https://github.com/aws/aws-sdk-net/issues/609
which seems similar to what I have.
The issue is that this seems to only happen in production, whereas locally I cannot replicate it to the full extent where it never polls again.
What I have done locally is:
Scenario 1: disconnect completely from the internet
Long Poll queue that has no messages in it
Disconnect from the Internet before 20 seconds are up
After about 1 minute and 40 seconds, the AWS SDK does not throw an error but continues on as if the queue had returned no messages
After about 2 minutes to 2 minutes 30 seconds, I get a DNS name resolution error
Scenario 2: disconnect from the internet for 1 minute and 40 seconds and reconnect
Based on my analysis from the above scenario, I wondered then what would happen if I were to reconnect after step 3) in scenario 1.
I found that the AWS SDK will wait for 20 seconds to retrieve any messages from the queue.
Theory as to what's happening
I suppose we could see indefinite polling if the client's network keeps dropping and recovering, such that it is never disconnected for a total of 1 minute and 40 seconds but keeps disconnecting and reconnecting before the 20 seconds are up?
Has anyone encountered this issue before? My temporary solution is to pass a cancellation token with a client-side timeout specified. I'm just wondering if there are any other reasons for the indefinite polling?
Thank you very much for the help!
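The client-side timeout workaround mentioned above could be sketched like this (a minimal sketch only; the queue URL and timeout values are placeholders, and it assumes the AWSSDK.SQS `ReceiveMessageAsync(request, cancellationToken)` overload):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Amazon.SQS;
using Amazon.SQS.Model;

class SqsPollerSketch
{
    private readonly IAmazonSQS _sqsClient = new AmazonSQSClient();

    public async Task<ReceiveMessageResponse> ReceiveWithTimeoutAsync(string queueUrl)
    {
        var request = new ReceiveMessageRequest
        {
            QueueUrl = queueUrl,   // placeholder queue URL
            WaitTimeSeconds = 20,  // server-side long-poll wait
            MaxNumberOfMessages = 1
        };

        // Client-side timeout slightly longer than the server-side wait,
        // so a hung HTTP connection cannot stall the poll loop forever.
        using (var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30)))
        {
            return await _sqsClient.ReceiveMessageAsync(request, cts.Token)
                                   .ConfigureAwait(false);
        }
    }
}
```

If the token fires, the call typically surfaces a TaskCanceledException, which the poll loop can catch and treat like an empty receive before retrying.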
I'm using Microsoft.Azure.ServiceBus. (doc)
I was getting an exception of:
The lock supplied is invalid. Either the lock expired, or the message
has already been removed from the queue.
With the help of these questions:
1, 2, 3,
I am able to avoid the exception by setting AutoComplete to false and by increasing the queue's lock duration in Azure to its max (from 30 seconds to 5 minutes).
_queueClient.RegisterMessageHandler(ProcessMessagesAsync,
    new MessageHandlerOptions(ExceptionReceivedHandler)
    {
        MaxConcurrentCalls = 1,
        MaxAutoRenewDuration = TimeSpan.FromSeconds(10),
        AutoComplete = false
    });
private async Task ProcessMessagesAsync(Message message, CancellationToken token)
{
    await ProccesMessage(message);
}

private async Task ProccesMessage(Message message)
{
    // The message is completed before the long-running process.
    await _queueClient.CompleteAsync(message.SystemProperties.LockToken);
    await DoFoo(message.Body); // some long-running process
}
My questions are:
This answer suggested that the exception was raised because the lock expired before the long-running process finished, but in my case I was marking the message as complete immediately (before the long-running process), so I'm not sure why changing the lock duration in Azure made any difference. When I change it back to 30 seconds, I see the exception again.
Not sure if it is related to the question, but what is the purpose of MaxAutoRenewDuration? The official docs say: "The maximum duration during which locks are automatically renewed." If in my case I have only one app receiver that dequeues from this queue, is it not needed, since I do not need to lock the message against another app? And why should this value be greater than the longest message lock duration?
There are a few things you need to consider.
Lock duration
Total time since a message was acquired from the broker
The lock duration is simple: for how long a single competing consumer can lease a message without having that message leased to any other competing consumer.
The total time is a bit trickier. Your callback ProcessMessagesAsync registered to receive messages is not the only thing involved. In the code sample you've provided, you're setting the concurrency to 1. If prefetch is configured (the client gets more than one message with each request), the lock duration clock on the server starts ticking for all of those messages at once. So even if each message is processed in slightly under MaxLockDuration, the last prefetched message may have waited too long to get processed; even though its own processing takes less than the lock duration, it can lose its lock, and the exception will be thrown when attempting to complete that message.
This is where MaxAutoRenewDuration comes into play. What it does is extend the message lease with the broker, "re-locking" it for the competing consumer that is currently handling the message. MaxAutoRenewDuration should be set to the maximum processing time for which a lease could possibly be required. In your sample it's set to TimeSpan.FromSeconds(10), which is extremely low. It needs to be at least longer than MaxLockDuration, and adjusted to the longest period of time ProccesMessage will need to run, taking prefetching into consideration.
To help to visualize it, think of the client-side having an in-memory queue where the messages can be stored while you perform the serial processing of the messages one by one in your handler. Lease starts the moment a message arrives from the broker to that in-memory queue. If the total time in the in-memory queue plus the processing exceeds the lock duration, the lease is lost. Your options are:
Enable concurrent processing by setting MaxConcurrentCalls > 1
Increase MaxLockDuration
Reduce message prefetch (if you use it)
Configure MaxAutoRenewDuration to renew the lock and overcome the MaxLockDuration constraint
Note about #4: it's not a guaranteed operation, so there's a chance a call to the broker will fail and the message lock will not be extended. I recommend designing your solution to work within the lock duration limit. Alternatively, persist the message information so that your processing doesn't have to be constrained by the messaging.
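Applied to the sample from the question, option #4 could be sketched like this (a sketch only; the 5-minute MaxLockDuration and roughly 10-minute worst-case processing time are assumptions, and the handler names mirror the question's code):

```csharp
using System;
using Microsoft.Azure.ServiceBus;

// ...

_queueClient.RegisterMessageHandler(ProcessMessagesAsync,
    new MessageHandlerOptions(ExceptionReceivedHandler)
    {
        MaxConcurrentCalls = 1,
        // Longer than the entity's MaxLockDuration (assumed 5 min here) and
        // longer than the worst-case time a message may spend waiting plus
        // being processed, so the client keeps renewing the lock.
        MaxAutoRenewDuration = TimeSpan.FromMinutes(10),
        AutoComplete = false
    });
```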
How to implement exponential backoff in Azure Functions?
I have a function that depends on external API. I would like to handle the unavailability of this service using the retry policy.
This function is triggered when a new message appears in the queue and in this case, this policy is turned on by default:
For most triggers, there is no built-in retry when errors occur during function execution. The two triggers that have retry support are Azure Queue storage and Azure Blob storage. By default, these triggers are retried up to five times. After the fifth retry, both triggers write a message to a special poison queue.
Unfortunately, the retry starts immediately after the exception (TimeSpan.Zero), which is pointless in this case because the service is most likely still unavailable.
Is there a way to dynamically modify the time the message is again available in the queue?
I know that I can set visibilityTimeout (host.json reference), but it's set for all queues and that is not what I want to achieve here.
I found one workaround, but it is far from an ideal solution. In case of an exception, we can add the message to the queue again and set a visibilityTimeout for this message:
[FunctionName("Test")]
public static async Task Run([QueueTrigger("queue-test")] string myQueueItem, TraceWriter log,
    ExecutionContext context, [Queue("queue-test")] CloudQueue outputQueue)
{
    if (true) // simulate a failure
    {
        log.Error("Error message");
        await outputQueue.AddMessageAsync(new CloudQueueMessage(myQueueItem),
            TimeSpan.FromDays(7),
            TimeSpan.FromMinutes(1), // <-- visibilityTimeout
            null, null).ConfigureAwait(false);
        return;
    }
}
Unfortunately, this solution is weak because it has no context: I do not know which attempt this is, so I cannot limit the number of calls or adjust the delay (exponential backoff).
An internal retry policy is also not welcome, because it can drastically increase costs (pricing models).
Microsoft added retry policies around November 2020 (preview), which support exponential backoff:
[FunctionName("Test")]
[ExponentialBackoffRetry(5, "00:00:04", "00:15:00")] // retries with delays increasing from 4 seconds to 15 minutes
public static async Task Run([QueueTrigger("queue-test")]string myQueueItem, TraceWriter log, ExecutionContext context)
{
// ...
}
I had a similar problem and ended up using Durable Functions, which have an automatic retry feature built in. It can be used when you wrap your external API call into an activity; when calling this activity you can configure retry behavior through an options object. You can set the following options:
Max number of attempts: The maximum number of retry attempts.
First retry interval: The amount of time to wait before the first retry attempt.
Backoff coefficient: The coefficient used to determine rate of increase of backoff. Defaults to 1.
Max retry interval: The maximum amount of time to wait in between retry attempts.
Retry timeout: The maximum amount of time to spend doing retries. The default behavior is to retry indefinitely.
Handle: A user-defined callback can be specified to determine whether a function should be retried.
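A minimal sketch of what that could look like (the activity name "CallExternalApi", its input, and the retry values are placeholders; this assumes the Durable Functions `CallActivityWithRetryAsync` and `RetryOptions` API):

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class OrchestratorSketch
{
    [FunctionName("Orchestrator")]
    public static async Task RunOrchestrator(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        var retryOptions = new RetryOptions(
            firstRetryInterval: TimeSpan.FromSeconds(5),
            maxNumberOfAttempts: 5)
        {
            BackoffCoefficient = 2.0,                     // exponential backoff
            MaxRetryInterval = TimeSpan.FromMinutes(15),  // cap between attempts
            RetryTimeout = TimeSpan.FromHours(1)          // total time budget
        };

        // "CallExternalApi" is a hypothetical activity wrapping the external API call.
        await context.CallActivityWithRetryAsync("CallExternalApi", retryOptions, "some input");
    }
}
```

The orchestrator replays deterministically, so the framework tracks the attempt count and delays for you, which addresses the missing-context problem of the re-enqueue workaround above.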
One option to consider is to have your Function invoke a Logic App that has a delay set to your desired amount of time and then, after the delay, invokes the function again. You could also add other retry logic (like a number-of-attempts limit) to the Logic App, using some persistent storage to tally your attempts. You would only invoke the Logic App if there was a connection issue.
Alternatively, you could shift your process's starting point to Logic Apps, as they can also be triggered by (i.e. bound to) queue messages. In either case, Logic Apps adds the ability to pause and re-invoke the Function and/or process.
If you are explicitly completing/dead-lettering messages ("autoComplete": false), here's a helper function that will exponentially delay and retry until the max delivery count is reached:
public static async Task ExceptionHandler(IMessageSession messageSession, string lockToken, int deliveryCount)
{
    if (deliveryCount < Globals.MaxDeliveryCount)
    {
        var delaySeconds = Math.Pow(Globals.ExponentialBackoff, deliveryCount);
        await Task.Delay(TimeSpan.FromSeconds(delaySeconds));
        await messageSession.AbandonAsync(lockToken);
    }
    else
    {
        await messageSession.DeadLetterAsync(lockToken);
    }
}
Since November 2022, function-level retries are no longer supported for the queue trigger (source).
Instead, you must use the binding extensions' retry options:
{
    "version": "2.0",
    "extensions": {
        "serviceBus": {
            "clientRetryOptions": {
                "mode": "exponential",
                "tryTimeout": "00:01:00",
                "delay": "00:00:00.80",
                "maxDelay": "00:01:00",
                "maxRetries": 3
            }
        }
    }
}
I have been using RabbitMQ successfully for the last month. Messages are read from the queue using RabbitMQ's BasicConsume feature, and a message published to a queue is immediately consumed by the corresponding consumer.
Now I have created a new queue, DelayedMsg. Messages published to this queue should be read only after a 5-minute delay. What should I do?
Add the current timestamp to the message while publishing it to the main queue from the publisher/sender. For example: 'published_on' => 1476424186.
On the consumer side, first check the difference between the current timestamp and published_on.
If the difference is less than 5 minutes, send the message to another queue (a DLX queue) with an expiration time set (use the 'expiration' property of the AMQP message).
The expiration value should be the remaining delay, i.e. 5 minutes minus (current timestamp - published_on), in milliseconds.
The message will then expire in the DLX queue once the full 5 minutes have elapsed.
Make sure 'x-dead-letter-exchange' on the DLX queue points to your main queue's exchange and is bound to it, so that when the message expires it is automatically queued back into the main queue. See Dead Letter Exchange for more details.
So the consumer now gets the message back after 5 minutes and processes it normally, since the difference between the current timestamp and published_on will then be greater than 5 minutes.
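The consumer-side step could be sketched like this with the RabbitMQ.Client library (a sketch only; the queue name "delay-queue" is a placeholder, and the delay queue is assumed to be declared with 'x-dead-letter-exchange' pointing back at the main exchange):

```csharp
using System;
using System.Collections.Generic;
using RabbitMQ.Client;

class DelayRepublishSketch
{
    private const int DelaySeconds = 300; // 5 minutes

    public static void RepublishWithRemainingDelay(IModel channel, byte[] body, long publishedOn)
    {
        long now = DateTimeOffset.UtcNow.ToUnixTimeSeconds();
        long elapsed = now - publishedOn;

        if (elapsed < DelaySeconds)
        {
            var props = channel.CreateBasicProperties();
            props.Headers = new Dictionary<string, object> { ["published_on"] = publishedOn };
            // Remaining delay in milliseconds; when the message expires it
            // dead-letters back to the main queue.
            props.Expiration = ((DelaySeconds - elapsed) * 1000).ToString();

            channel.BasicPublish(exchange: "", routingKey: "delay-queue",
                basicProperties: props, body: body);
        }
        // else: 5 minutes have passed; process the message normally.
    }
}
```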
Try to avoid using DLX to implement delayed messages; it was more of a workaround before the "RabbitMQ Delayed Message Plugin" existed.
Since we now have this plugin, we should try to use it instead:
https://www.rabbitmq.com/blog/2015/04/16/scheduling-messages-with-rabbitmq/
Create two queues:
1 is the work queue
2 is the delay queue
On the delay queue, set the x-dead-letter-exchange property to the work queue's exchange and the TTL to 5 minutes.
Send messages to the delay queue; it needs no consumer. After 5 minutes the broker dead-letters each message into the work queue, so you just consume the work queue and process it.
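The queue setup described above could be sketched like this with RabbitMQ.Client (queue names are placeholders; the default exchange is used, so x-dead-letter-routing-key selects the work queue):

```csharp
using System.Collections.Generic;
using RabbitMQ.Client;

class DelayQueueSetupSketch
{
    public static void DeclareQueues(IModel channel)
    {
        // The work queue that the consumer actually reads from.
        channel.QueueDeclare("work-queue", durable: true, exclusive: false, autoDelete: false);

        // The delay queue: no consumer; messages sit here until the TTL
        // expires, then they are dead-lettered into the work queue.
        channel.QueueDeclare("delay-queue", durable: true, exclusive: false, autoDelete: false,
            arguments: new Dictionary<string, object>
            {
                ["x-message-ttl"] = 300000,             // 5 minutes in ms
                ["x-dead-letter-exchange"] = "",        // default exchange
                ["x-dead-letter-routing-key"] = "work-queue"
            });
    }
}
```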
This is the setup for the queue:
The queue is public and marked as transactional.
A web service is responsible for inserting messages into the queue.
A Windows service is listening in on the queue
Given that, how can I only process messages that are at least 20 minutes old?
Given that a queue is first-in, first-out, you can assume that every message in the queue arrived at a time equal to or later than the first message in the queue.
You can use that to your advantage.
Basically, you would call the Peek method on your MessageQueue instance and look at the ArrivedTime property. If the time between now and the arrived time is greater than 20 minutes, you would process the message; otherwise, you would continue to wait (until the next time you process the messages on the queue).
Here is an example:
static Message GetQueueMessage(MessageQueue queue)
{
    // Store the current time.
    DateTime now = DateTime.Now;

    // Look at the message at the front of the queue without removing it.
    Message message = queue.Peek();

    // The required age of a message before it is processed.
    // You can refactor this to have it passed in.
    TimeSpan delay = TimeSpan.FromMinutes(20);

    // If the message has been in the queue for less than 20 minutes, get out.
    if (delay > (now - message.ArrivedTime))
    {
        // Return null.
        return null;
    }

    // Remove the message from the queue and return it.
    return queue.ReceiveById(message.Id);
}
You can then run this in a loop, waiting a bit in between (every five seconds, ten, whatever you feel is appropriate for the frequency of processing):
// Have some other reasonable check here.
while (true)
{
    // The message to process.
    Message message = GetQueueMessage(queue);

    // Continue to process while there is a message.
    while (message != null)
    {
        // Process the message.

        // Get the next message.
        message = GetQueueMessage(queue);
    }

    // Wait a little bit, five seconds for example.
    Thread.Sleep(5000);
}
Ideally, based on Charles Conway's comment, you would not process this in a loop, but rather, just call GetQueueMessage from an event handler on a timer which would fire on your designated interval.
// Have some other reasonable check here.
while (true)
{
...
}
Having "true" as the loop condition is a code smell. Have you considered using a timer that wakes every 5 seconds?
Timer timer = new Timer();
timer.Interval = 5000; // five seconds
timer.Elapsed += new ElapsedEventHandler(timer_Elapsed);
timer.Start();

static void timer_Elapsed(object sender, ElapsedEventArgs e)
{
    // The message to process.
    Message message = GetQueueMessage(queue);

    // Continue to process while there is a message.
    while (message != null)
    {
        // Process the message.

        // Get the next message.
        message = GetQueueMessage(queue);
    }
}
The caveat here is: at what point do you define a message as 20 minutes old?
Case 1: If it is defined by when the message was generated and sent, you're going to have to use Message.SentTime. You can then use Peek to look at the message and delay processing until the 20-minute threshold has been met. This assumes that messages arrive in a linear order, which is not guaranteed! (See below.)
Case 2: If it is defined by when the message arrives in the queue (i.e. the process is "listening" on the message queue), take note of Message.ArrivedTime and consume the message 20 minutes later.
But both cases have a problem: you can only peek at the first message. As such, you will have to consume the messages into an in-memory queue or other structure and process each one at its designated time.
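That buffering approach could be sketched as follows (a sketch only; it assumes System.Messaging, receives each message as soon as it arrives, and holds it with a due time in an in-memory queue):

```csharp
using System;
using System.Collections.Generic;
using System.Messaging;

class DelayedProcessorSketch
{
    private static readonly TimeSpan Delay = TimeSpan.FromMinutes(20);
    private readonly Queue<(Message Message, DateTime DueAt)> _buffer =
        new Queue<(Message, DateTime)>();

    // Called whenever a message is received from the MSMQ queue.
    public void OnReceived(Message message)
    {
        // FIFO order means due times are non-decreasing, so a plain queue works.
        _buffer.Enqueue((message, message.ArrivedTime + Delay));
    }

    // Called periodically (e.g. by a timer) to process anything that is due.
    public void ProcessDueMessages(Action<Message> process)
    {
        while (_buffer.Count > 0 && _buffer.Peek().DueAt <= DateTime.Now)
        {
            process(_buffer.Dequeue().Message);
        }
    }
}
```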