Azure Service Bus Queues AutoRenewTimeout produces a "SessionLockLostException" on '.Complete()' - c#

I have an Azure Service Bus Queue.
It's configured with:
Requires Duplicate Detection: true
Requires Session: true
Enable Partitions: false
Max Delivery Count: 10
Lock Duration: 1 minute
Batch Operations Enabled: true
Deadletter on Expiration Enabled: false
Enforce message ordering: true
When retrieving a message from the queue I use the following OnMessageOptions:
AutoComplete: false
AutoRenewTimeout: 12 minutes
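Roughly, the setup described above maps onto something like the sketch below with the older WindowsAzure.ServiceBus client; the handler, connection string and queue path are placeholders rather than the actual code, and a session-required queue would normally use RegisterSessionHandlerAsync with the equivalent SessionHandlerOptions instead.

// using Microsoft.ServiceBus.Messaging;
var client = QueueClient.CreateFromConnectionString(connectionString, queuePath);

var options = new OnMessageOptions
{
    AutoComplete = false,                        // Complete() is called explicitly
    AutoRenewTimeout = TimeSpan.FromMinutes(12)  // keep the lock renewed for up to 12 minutes
};

client.OnMessageAsync(async message =>
{
    await ProcessMessageAsync(message);  // placeholder for the actual work
    await message.CompleteAsync();       // this is where SessionLockLostException surfaces
}, options);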
Each message takes on average 2 minutes to complete.
Some of them succeed, others throw a "SessionLockLostException".
Why does the lock "AutoRenew" not keep the message lock renewed? It's supposed to keep doing its job for 12 minutes, yet we get that exception after 2.
How do you debug the cause of the exception? The exception tells me roughly what happened, but not why. I can't find any information about logging within the Service Bus Queue client.
Where is the documentation? The MSDN in this instance is awful! It lacks even basic information about how these classes are supposed to work.
EDIT: As MaDeRkAn helpfully mentioned in a comment, the documentation for "SessionLockLostException" does mention that Azure can move messages between partitions.
When I originally created a test application to see if this approach worked, I had the queue configured to use partitions. While figuring out the code needed to handle the exceptions that can occur in different situations, I read about that exception.
I have discounted this as being the problem for two reasons:
I've (literally) triple checked that Partitions are disabled. I also checked that the Queue we're using is the same Queue I'm looking at for the properties.
If Azure was causing failures this often (every 2-5 messages) then the service would be pretty much unusable! And while Azure has issues at times it's not normally totally broken like that.

Related

Azure ServiceBus - same message read multiple times

We have some issues with messages from Azure ServiceBus being read multiple times. Previously we had the same issue, which turned out to be due to lock timeout. Then, as the lock timed out, the messages were read again, and their deliveryCount increased by 1 each time the message was read. After this, we set the max delivery count to 1 to avoid resending of messages, and also increased the lock timeout to 5 minutes.
The current issue is a lot more strange.
First, messages are read at 10:45:34. Message locks are set to 10:50:34, and deliveryCount is 1. The read is logged as successful at 10:45:35.0. All good so far.
But then, at 10:45:35.8, the same messages are read again! And the delivery count is still 1. Both the sequence number and message id are the same in the two receive logs. This happens for a very small percentage of messages, something like 0.02% of the messages.
From what I understand, reading a message should either result in a success where the message should be removed, or an increase of deliveryCount, which in my case should send the message to DLQ. In these cases, neither happens.
I'm using ServiceBusTrigger, like this:
[FunctionName(nameof(ReceiveMessages))]
public async Task Run(
    [ServiceBusTrigger(queueName: "%QueueName%", Connection = "ServiceBusConnectionString")]
    string[] messages)   // further parameters and the method body are omitted in the original post
{ /* process the batch of messages */ }
This seems like a bug in either Service Bus or the library; any thoughts on what it could be?
That’s not the SDK but rather the specific entity. It sounds like the entity is corrupted. Delete and recreate it. If that doesn’t help, then open a support case.
On a different note, setting the max delivery count to 1 is usually an indicator that something is off. If you truly need an at-most-once delivery guarantee, use ReceiveAndDelete mode instead of PeekLock.
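As a rough illustration of that suggestion, here is a minimal sketch using the newer Azure.Messaging.ServiceBus client rather than the Functions trigger from the question; the connection string and queue name are placeholders.

// using Azure.Messaging.ServiceBus;
await using var client = new ServiceBusClient(connectionString);

// ReceiveAndDelete removes a message from the queue as soon as it is handed to
// the receiver, so it can never be redelivered; the trade-off is that a crash
// mid-processing loses the message (at-most-once).
var receiver = client.CreateReceiver(queueName, new ServiceBusReceiverOptions
{
    ReceiveMode = ServiceBusReceiveMode.ReceiveAndDelete
});

ServiceBusReceivedMessage message = await receiver.ReceiveMessageAsync();
if (message != null)
{
    // no CompleteMessageAsync call is needed (or possible) in this mode
    Console.WriteLine(message.Body.ToString());
}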

Queue-trigger Azure Function: Queue messages processed more than once

I have an HTTP-trigger Azure Function that adds a message to the queue: outputQueue.AddAsync(myMessage); Then the Queue-trigger Azure Function is triggered. It adds 100 messages to the same queue. Each one of those 100 messages is dequeued by this function and processed. This processing takes about 5-7 minutes. My functionTimeout is 10 minutes. Sometimes (in 10% of the calls) the same message is dequeued and processed twice or even more, even though the previous processing of this message was successful. I also noticed that each such redundant dequeue happens about 10 minutes after the previous dequeue of the same message (which seems to be related to my functionTimeout of 10 minutes). So it looks like after the processing is done the function is not ended, and hence the message is not deleted from the queue, which causes another instance to dequeue it.
When I look at the Failures section of Application Insights I see that approximately for 1K operations I have about 10 WebExceptions and 2 TimeoutExceptions.
WebException:
Message: The remote server returned an error: (409) Conflict.
Failed method:
Microsoft.WindowsAzure.Storage.Shared.Protocol.HttpResponseParsers.ProcessExpectedStatusCodeNoExceptiond
FormattedMessage: An unhandled exception has occurred. Host is shutting down.
TimeoutException:
Message: The client could not finish the operation within specified timeout. The client could not finish the operation within specified timeout.
Failed method: Microsoft.WindowsAzure.Storage.Core.Executor.Executor.EndExecuteAsync
FormattedMessage: An unhandled exception has occurred. Host is shutting down.
I have a try..catch in my Function entry point, but probably those 2 exceptions don't go to the catch block.
My host.json is as follows:
{
  "functionTimeout": "00:10:00",
  "version": "2.0",
  "extensions": {
    "queues": {
      "maxPollingInterval": 1000,
      "visibilityTimeout": "01:00:00",
      "batchSize": 8,
      "maxDequeueCount": 5,
      "newBatchThreshold": 4
    }
  }
}
When I set "batchSize": 2 and "newBatchThreshold": 1 I have less redundant dequeues, but more instances are created (I know this by logging the IP of the server of every Azure Function call). If I have more servers that process different messages then my static data is less re-used betwen instances.
Also note that I've set the "visibilityTimeout" to 1 hour (I tried 30 minutes as well), but looks like this value is completely ignored and the message becomes visible after 10 minutes.
Any idea how I can avoid duplicate processing of the same messages? I'm thinking about writing the message info to the DB after successful processing, and on every dequeue of a message checking whether it was already processed, say, within the last hour, and if so not processing it again (see the sketch below). Another option I'm thinking about is setting "maxDequeueCount" to 1 (I have a restore mechanism if some messages aren't processed at all due to a real failure).
BTW, those 10% of redundant processings don't cause functionality issues, but I still want to improve the performance.
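A rough sketch of that duplicate-check idea, assuming a hypothetical store behind WasProcessedRecentlyAsync/MarkProcessedAsync (for example a table keyed on the message id); none of these names come from the original code.

// using Microsoft.WindowsAzure.Storage.Queue;
public async Task ProcessQueueMessage(CloudQueueMessage message)
{
    // Skip messages that were already handled within the last hour
    // (hypothetical helper backed by shared storage).
    if (await WasProcessedRecentlyAsync(message.Id, TimeSpan.FromHours(1)))
    {
        return; // duplicate dequeue - ignore it
    }

    await DoActualWorkAsync(message);                            // the ~5-7 minute processing
    await MarkProcessedAsync(message.Id, DateTimeOffset.UtcNow); // record success for later checks
}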

NServiceBus - getting error when message spawns many child messages

I'm very new to NServiceBus and I have an NServiceBus application which processes a message containing a command that creates multiple child messages containing a different command.
For example, I put message BulkOrder01 on the queue, which gets picked up by my BulkOrder message handler. The payload of my BulkOrder01 contains a bulk order ID which, when looked up in a database, returns 4000 orders. For each of those child orders I send an Order message to the queue.
This seems to work fine in our production environment for less than 1000 child orders but once we get above 1000 we often see the parent message not being processed and therefore the child messages not being created.
When running it locally, I have found that it will send one or two of the child messages to the queue but then throws an NServiceBus.Unicast.Queuing.FailedToSendMessageException 'Failed to send message to address: Namespace.OrderService#MyComputername', and the inner exception is 'System.Messaging.MessageQueueException: Cannot enlist the transaction'.
I have found that if I set DoNotWrapHandlersExecutionInATransactionScope in the EndpointConfig then I do not get the exception until what appears to be the parent message times out. Which I can prevent by increasing the transaction timeout.
However, setting DoNotWrapHandlersExecutionInATransactionScope makes me nervous; I can't seem to find much information about what this actually does. Obviously it does not wrap handler execution in a transaction scope, but from the testing I have performed it still appears to behave transactionally: if the parent message fails then none of the child messages get sent. I remember reading that there are multiple layers of transaction scope - so does this just remove one of those layers?
Maybe this whole approach is the wrong way of going about it - I know about the existence of Sagas within NServiceBus but nothing really about them; maybe the process I have described should be done using a Saga...
Googling the exception brought up that it is a timeout issue; however, I have found that increasing the timeout on its own simply delays the exception by the timeout amount. It only works locally when I have DoNotWrapHandlersExecutionInATransactionScope set. In production it seems to work more reliably and only fails on large volumes of child messages.
Also, the child messages that are created seem to take a fairly long time to be added to the queue, around 50 milliseconds each, which when scaled up to 4000 messages is a total of about 3.3 minutes. This seems a long time to put a small message onto the queue - maybe something is not configured correctly?
I am using Entity Framework for access to the database and Unity for dependency injection within a C# 4.5 environment running NServiceBus 4.3.0.0
I am using IBus.SendLocal to send the messages and I'm configuring the timeout and setting as follows:
NServiceBus.Configure.Transactions.Advanced(x => x.DefaultTimeout(new TimeSpan(0, 5, 0)));
NServiceBus.Configure.Transactions.Advanced(x => x.DoNotWrapHandlersExecutionInATransactionScope());
Can anyone point me in the right direction as to whether I'm doing this correctly - and is the slow(ish) performance expected? Thanks.
We have a similar process, and the challenge is that all of that work is wrapped up in the same transaction, as you have found out. You will want to create another endpoint to handle your child messages.
I would recommend configuring the Distributor in NSB and having the Distributor delegate all the child processing to workers. You can then scale out the processing of child messages as needed.
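A minimal sketch of routing the child commands to a separate endpoint instead of SendLocal; the command types, the endpoint address and the lookup are placeholders, not the asker's real names.

// using NServiceBus;
// using System.Collections.Generic; using System.Linq;
public class BulkOrderCommand : ICommand { public int BulkOrderId { get; set; } }
public class ProcessOrderCommand : ICommand { public int OrderId { get; set; } }

public class BulkOrderHandler : IHandleMessages<BulkOrderCommand>
{
    public IBus Bus { get; set; } // property-injected by NServiceBus

    public void Handle(BulkOrderCommand message)
    {
        foreach (var orderId in LookUpChildOrderIds(message.BulkOrderId))
        {
            // Send to a dedicated child-order endpoint rather than SendLocal,
            // so that endpoint can be scaled out (e.g. behind the Distributor).
            Bus.Send("OrderService.ChildOrders", new ProcessOrderCommand { OrderId = orderId });
        }
    }

    private IEnumerable<int> LookUpChildOrderIds(int bulkOrderId)
    {
        return Enumerable.Empty<int>(); // stand-in for the Entity Framework lookup in the question
    }
}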

Calling Abandon on an Azure Service Bus re-queues the message at the back rather than the front of the queue

I'm using an Azure Service Bus Queue with Session based messaging enabled. To consume from the queue I register an IMessageSessionAsyncHandler and then process the message in the OnMessageAsync method.
The issue I'm seeing is that if I abandon a message for whatever reason, rather than it being received again immediately, I receive the next message in the session, and only after processing that message do I receive the first message again (assuming only two messages in the session).
As an example, let's say I have a queue with 2 messages on it, both with the same SessionId. The two messages have sequence numbers of 1 and 2 respectively. I start receiving and get the message with sequence 1, as expected. If I then abandon this message using message.Abandon (the reason for abandoning is irrelevant), I immediately get the next message in the session (sequence number 2). Only after handling (or abandoning) this second message do I get the first message again.
This behaviour isn't what I'd expect from abandoning a message and isn't consistent with other ways of using the queue. I've tested the same example in the following scenarios:
without the use of an IMessageSessionAsyncHandler and instead just manually accepting a message session.
without the use of sessions and instead just having two independent messages on the queue.
In both scenarios I see the expected behaviour: when I abandon a message it is always guaranteed to be the next message received, unless the max delivery count is exceeded and it is dead-lettered.
My question is this: Is the behaviour I'm seeing with the use of an IMessageSessionAsyncHandler expected, or is this a bug in the Service Bus library? If this is not a bug, can anyone give me an explanation for why this behaves differently to the other ways of receiving?
When you register a session handler on the QueueClient, prefetch is turned on internally to improve the latency and throughput of the receivers. Unfortunately, for the IMessageSessionAsyncHandler scenario this behavior cannot be overridden. One option is to abandon the session itself when you encounter a message in a session which needs to be abandoned; this will ensure that the messages are delivered in order.
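A hedged sketch of that option against the older SDK's IMessageSessionAsyncHandler; the processing step is a placeholder, and the exact clean-up your application needs may differ.

// using Microsoft.ServiceBus.Messaging;
public class OrderSessionHandler : IMessageSessionAsyncHandler
{
    public async Task OnMessageAsync(MessageSession session, BrokeredMessage message)
    {
        try
        {
            await ProcessAsync(message);   // placeholder for the real work
            await message.CompleteAsync();
        }
        catch (Exception)
        {
            // Abandon the message and give up the whole session, so the
            // prefetched messages behind it are redelivered in order instead
            // of being handed to us ahead of the abandoned one.
            await message.AbandonAsync();
            await session.CloseAsync();
        }
    }

    public Task OnCloseSessionAsync(MessageSession session) { return Task.FromResult(0); }

    public Task OnSessionLostAsync(Exception exception) { return Task.FromResult(0); }

    private Task ProcessAsync(BrokeredMessage message)
    {
        return Task.FromResult(0); // stand-in for the application's processing
    }
}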

Messages in Azure Service Bus Queues remain in queue after expire time

I'm working with Azure Service Bus Queues in a request/response pattern using two queues and in general it is working well. I'm using pretty simple code from some good examples I've found. My queues are between web and worker roles, using MVC4, Visual Studio 2012 and .NET 4.5.
During some stress testing, I end up overloading my system and some responses are not delivered before the client gives up (which I will fix, not the point of this question).
When this happens, I end up with many messages left in my response queue, all well beyond their ExpiresAtUtc time. My message TimeToLive is set for 5 minutes.
When I look at the properties for a message still in the queue, it is clearly set to expire in the past, with a TimeToLive of 5 minutes.
I create the queues if they don't exist with the following code:
namespaceManager.CreateQueue(
    new QueueDescription( RequestQueueName )
    {
        RequiresSession = true,
        DefaultMessageTimeToLive = TimeSpan.FromMinutes( 5 ) // messages expire if not handled within 5 minutes
    } );
What would cause a message to remain in a queue long after it is set to expire?
As I understand it, there is no background process cleaning these up; only the act of moving the queue cursor forward with a call to Receive will cause the server to skip past and dispose of expired messages, and actually return the first message that is not expired (or none if all are expired).
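To make that concrete, here is a rough sketch against the older WindowsAzure.ServiceBus SDK; since these queues require sessions, a session is accepted first, and the connection string and queue name are placeholders.

// using Microsoft.ServiceBus.Messaging;
var client = QueueClient.CreateFromConnectionString(connectionString, responseQueueName);

// AcceptMessageSession throws a TimeoutException if no session becomes
// available within the wait time.
MessageSession session = client.AcceptMessageSession(TimeSpan.FromSeconds(5));

BrokeredMessage message;
while ((message = session.Receive(TimeSpan.FromSeconds(1))) != null)
{
    // Expired messages are skipped and dropped by the service as this call
    // moves the cursor forward; anything returned here is still live.
    message.Complete();
}

session.Close();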
