MessageLockLostException after moving BrokeredMessage to dead letter in Azure Function v1 - c#

I have an Azure Function v1 triggered via a Service Bus topic. If any error occurs, I move the BrokeredMessage to the dead letter queue. That seems to work, but afterwards I see the following in the Function's log streaming:
2019-11-19T10:49:31.382 [Error] MessageReceiver error
(Action=Complete) :
Microsoft.ServiceBus.Messaging.MessageLockLostException: The lock
supplied is invalid. Either the lock expired, or the message has
already been removed from the queue.
Here is how I am moving the BrokeredMessage to the dead letter queue:
myBrokeredMessage.DeadLetter(deadLetterReason, exception.Message);
// after this I have tried the following, but none of it works:
// 1. do nothing
// 2. myBrokeredMessage.Complete();
// 3. myBrokeredMessage.Abandon();
My Function is running fine. But after it has run and executed the code above, that error appears in log streaming. Everything seems to be doing what I want (the BrokeredMessage ends up in the dead letter queue), but the error doesn't look nice and I want to fix it. I guess there is some kind of lock that I'm not handling correctly.
What should I do to fix that error?

What should I do to fix that error?
This is more of a warning than an error. The way Functions are designed, the runtime by default completes or dead-letters the message. If you take control of what happens to the incoming message, the Functions runtime doesn't know about it and still tries to apply its completion logic: from its perspective no exception was thrown from the user code, so the incoming message is considered successfully processed and should be completed.
With Functions 2.x there's a host setting you can turn on to allow manual completion and disable the automatic completion. That setting is not available in v1.0, so you'll either have to ignore the logged error or upgrade to 2.x.
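For reference, this is roughly what that setting looks like in a 2.x host.json (the same snippet appears in an answer further down this page); the key part is turning off autoComplete so your code owns the message's settlement:

"extensions": {
    "serviceBus": {
        "messageHandlerOptions": {
            "autoComplete": false
        }
    }
}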

Related

Send Message when NServiceBus Recoverability fails using Notifications

How can I send a message (or publish an event) when a message runs out of retries and moves to the error queue?
When a request comes into my system, I create a Saga to track it. The Saga sends commands to Handlers to do async work. If the handler fails, I want to both move that command to the error queue (the default behavior) and send a message to the Saga to alert the client that originally requested the work.
I have tried customizing the recoverability behavior to use the Saga as the error queue, which sends the command back but does not get it into the error queue:
recoverability.CustomPolicy((config, context) =>
{
    // invocation of default recoverability policy
    var action = DefaultRecoverabilityPolicy.Invoke(config, context);
    if (action is MoveToError)
    {
        return RecoverabilityAction.MoveToError("SagaEndpoint");
    }
    return action;
});
Another thing I tried was using a Behavior to hook into the pipeline, but there does not seem to be a way to override the "move to error queue" step. I can create a behavior on IIncomingLogicalMessageContext and try/catch around the await next();, but that triggers for each retry instead of just the final one. I also tried an IOutgoingLogicalMessageContext, but that does not get invoked when a message moves to the error queue. If I missed something, that could be a solution.
I also know I can use a timeout in the Saga to guess when the Handler fails. But I would rather not wait for a timeout if the failure is quick or risk timing out if the work takes longer than expected.
I found this older question that sounds like it's asking the same thing, but the answer is incomplete and uses the older EventHandler Notifications instead of the newer Task-based Notifications. If there is a way to access an IMessageSession or IEndpointInstance from the Notification callback, I think that would work for me as well.
There's not an "easy" way to do that, because at the moment recoverability is happening, any transaction related to the incoming message (this differs per transport) is in doubt, so you can't really do anything else within the scope of what's going on right at that moment.
Once you start your endpoint, you can cast the IEndpointInstance to an IMessageSession (the same thing minus things like the Stop method) and assign it somewhere your "error queue notifier" can find it. Any operation you do with that IMessageSession will then run in a separate context, disconnected from the processing of the incoming message.
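A minimal sketch of that wiring (ErrorQueueNotifier and WorkFailed are hypothetical names for illustration, not NServiceBus APIs):

static class ErrorQueueNotifier
{
    // IEndpointInstance implements IMessageSession, so a plain assignment works
    public static IMessageSession Session;
}

// at startup:
var endpointInstance = await Endpoint.Start(endpointConfiguration);
ErrorQueueNotifier.Session = endpointInstance;

// later, inside your "message sent to error queue" notification callback,
// running in a separate context from the failed message:
await ErrorQueueNotifier.Session.Send("SagaEndpoint", new WorkFailed());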
Just understand that if the message is failing processing because of an underlying problem with the queue itself, that notification isn't going to go out correctly either. That's why most people make some sort of call to a reporting/diagnostics service in those callbacks.

Exception in QueueTrigger Azure Function

I have an Azure Function which is triggered by adding a new message to a queue.
It should download a file from an FTP server, and the name of the file is part of the message that I push into the queue.
At some point, the server hosting the files might become inaccessible and I will get exceptions, of course.
I would like to know how the queue behaves in these cases. Does it pop the message and leave it? Or does it keep it and call the function again and again until the task completes without any exceptions?
From the docs:
The Functions runtime receives a message in PeekLock mode. It calls Complete on the message if the function finishes successfully, or calls Abandon if the function fails. If the function runs longer than the PeekLock timeout, the lock is automatically renewed as long as the function is running.
So, if the function fails, the message will be available again for the next run, up to a maximum of 10 retries. After 10 retries it goes to the dead-letter queue (source):
Service Bus Queues and Subscriptions each have a QueueDescription.MaxDeliveryCount and SubscriptionDescription.MaxDeliveryCount property respectively. The default value is 10. Whenever a message has been delivered under a lock (ReceiveMode.PeekLock), but has been either explicitly abandoned or the lock has expired, the message BrokeredMessage.DeliveryCount is incremented. When DeliveryCount exceeds MaxDeliveryCount, the message is moved to the DLQ, specifying the MaxDeliveryCountExceeded reason code.
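If the default threshold doesn't suit you, MaxDeliveryCount can be set when creating the queue. A minimal sketch using the older WindowsAzure.ServiceBus client (the connection string and queue name are assumptions):

var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);
var description = new QueueDescription("incoming-files")
{
    MaxDeliveryCount = 5 // dead-letter after 5 failed deliveries instead of the default 10
};
await namespaceManager.CreateQueueAsync(description);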

Azure ServiceBus queue not retrying messages when TaskCanceledException is thrown

I have an Azure WebJob function that listens to messages on an Azure ServiceBus queue. Usually when I encounter an exception in my code, the message is abandoned as per the Azure WebJobs SDK documentation:
The SDK receives a message in PeekLock mode and calls Complete on the message if the function finishes successfully, or calls Abandon if the function fails. If the function runs longer than the PeekLock timeout, the lock is automatically renewed.
According to the Azure ServiceBus documentation this should mean that the message becomes available again, and will be retried:
If the application is unable to process the message for some reason, it can call the AbandonAsync method on the received message (instead of CompleteAsync). This method enables Service Bus to unlock the message and make it available to be received again, either by the same consumer or by another competing consumer. Secondly, there is a timeout associated with the lock and if the application fails to process the message before the lock timeout expires (for example, if the application crashes), then Service Bus unlocks the message and makes it available to be received again (essentially performing an AbandonAsync operation by default).
The behavior described above is what usually happens, but I have found an exception to this rule. If my code throws a TaskCanceledException specifically, the message is not abandoned as it should be:
public void ProcessQueueMessage([ServiceBusTrigger("queue")] BrokeredMessage message, TextWriter log)
{
    throw new TaskCanceledException();
}
When running this function through a WebJob, I see the error message printed out clear as day, but the message is consumed without any retries and without entering the dead-letter queue. If I replace the TaskCanceledException above with InvalidOperationException, the message is abandoned and retried as it should be (I have verified this against an actual ServiceBus queue).
I have not been able to find any explanation for this behavior. Currently I am wrapping the TaskCanceledException in another exception to work around the issue.
The question
Is what I am experiencing a bug in the Azure WebJobs SDK? Is TaskCanceledException special in this regard, or do other types of exceptions have similar behavior?
I am using the following NuGet packages:
Microsoft.Azure.WebJobs 2.3.0
Microsoft.Azure.WebJobs.ServiceBus 2.3.0
Functions are supposed to abandon the message if execution was not successful. If the message was not abandoned and retried even though it should have been (assuming MaxDeliveryCount was set to larger than 1 and the receive mode was PeekLock), then it's likely an issue with the Functions runtime and not Azure Service Bus. You could verify that by running a console application that performs the same steps, and checking whether the message is completed and removed from the queue, or still on the queue and available for consumption.
Also, it looks like you're using the older version of WebJobs (and Azure Service Bus). When performing the verification, you'd need to use the older Azure Service Bus client (WindowsAzure.ServiceBus) and not the new one (Microsoft.Azure.ServiceBus).
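A minimal verification sketch along those lines, using the older client (the connection string and queue name are assumptions):

var client = QueueClient.CreateFromConnectionString(connectionString, "queue", ReceiveMode.PeekLock);
var message = await client.ReceiveAsync();
try
{
    throw new TaskCanceledException(); // simulate the failing handler
}
catch
{
    await message.AbandonAsync(); // the message should reappear with DeliveryCount incremented
}

If the message reappears here but is consumed silently under the WebJobs SDK, that points at the SDK's exception handling rather than at Service Bus.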

Azure functions with service bus: How to keep a message in the queue if something goes wrong with its processing?

I'm new to Service Bus and not able to figure this out.
Basically, I'm using an Azure Function app which is hooked onto a Service Bus queue. Let's say a trigger is fired from the Service Bus and I receive a message from the queue, and in the processing of that message something goes wrong in my code. In such cases, how do I make sure to put that message back in the queue again? Currently it's just disappearing into thin air, and when I restart my function app in VS, the next message from the queue is taken.
Ideally, only when all my data processing is done and I hit myMsg.Success() do I want it to be removed from the queue.
public static async Task RunAsync([ServiceBusTrigger("xx", "yy", AccessRights.Manage)] BrokeredMessage mySbMsg, TraceWriter log)
{
    try
    {
        // do something with mySbMsg
    }
    catch
    {
        // put that mySbMsg back in the queue so it doesn't disappear, and throw the exception
    }
}
I was reading up on mySbMsg.Abandon(), but it looks like that puts the message in the dead letter queue, and I am not sure how to access it. Is there a better way to handle errors?
Cloud queues are a bit different from in-memory queues, because they need to be robust against the possibility of the client crashing after it received the queue message but before it finished processing it.
When a queue message is received, it becomes "invisible" so that other clients can't pick it up. This gives the client a chance to process it, and the client must mark the message as completed when it is done (Azure Functions does this automatically when you return from the function). That way, if the client crashes in the middle of processing the message (we're on the cloud, so be robust against random machine crashes due to power loss, etc.), the server sees that the message was never completed, assumes the client crashed, and eventually resends it.
Practically, this means that if you receive a queue message and throw an exception (and thus don't mark the message as completed), it will be invisible for a few minutes, but then show up again so another client can attempt to handle it. Put another way: in Azure Functions, queue messages are automatically retried after exceptions, but the message is invisible for a few minutes in between retries.
If you want the message to remain on the queue to be retried, the function should not swallow the exception, but rather rethrow it. That way the Functions runtime will not auto-complete the message, and it will be retried.
Keep in mind that this causes the message to be retried, and eventually, if the exception persists, to be moved to the dead-letter queue.
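A minimal sketch of that pattern on the v1-style signature from the question (the log text is illustrative):

public static async Task RunAsync([ServiceBusTrigger("xx", "yy", AccessRights.Manage)] BrokeredMessage mySbMsg, TraceWriter log)
{
    try
    {
        // do something with mySbMsg
    }
    catch (Exception ex)
    {
        log.Error("Processing failed; rethrowing so the message is retried.", ex);
        throw; // the runtime will not auto-complete, so the message is abandoned and retried
    }
}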
As per my understanding, I think what you are after is: if there is an error while processing the message, the execution should be retried instead of the error being swallowed. If you are using Azure Functions v2.0, you define the message handler options in host.json:
"extensions": {
"serviceBus": {
"prefetchCount": 100,
"messageHandlerOptions": {
"autoComplete": false,
"maxConcurrentCalls": 1
}
}
}
prefetchCount - Gets or sets the number of messages that the message receiver can simultaneously request.
autoComplete - Whether the trigger should automatically call complete after successful processing, or whether the function code will manually call complete.
After the message has been retried n times (10 by default), it is moved to the DLQ.
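Note that with autoComplete set to false, your function code owns message settlement. A sketch of what that can look like in a v2 function, using the MessageReceiver binding and lock-token completion (the queue name and the work itself are placeholders):

public static async Task Run(
    [ServiceBusTrigger("myqueue")] Message message,
    MessageReceiver messageReceiver,
    ILogger log)
{
    // do the work...
    // then settle explicitly, since autoComplete is off:
    await messageReceiver.CompleteAsync(message.SystemProperties.LockToken);
}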

Forcing EventProcessorHost to re-deliver failed Azure Event Hub eventData's to IEventProcessor.ProcessEvents method

The application uses .NET 4.6.1 and the Microsoft.Azure.ServiceBus.EventProcessorHost NuGet package v2.0.2, along with its dependency, the WindowsAzure.ServiceBus package v3.0.1, to process Azure Event Hub messages.
The application has an implementation of IEventProcessor. When an unhandled exception is thrown from the ProcessEventsAsync method the EventProcessorHost never re-sends those messages to the running instance of IEventProcessor. (Anecdotally, it will re-send if the hosting application is stopped and restarted or if the lease is lost and re-obtained.)
Is there a way to force the event message that resulted in an exception to be re-sent by EventProcessorHost to the IEventProcessor implementation?
One possible solution is presented in this comment on a nearly identical question:
Redeliver unprocessed EventHub messages in IEventProcessor.ProcessEventsAsync
The comment suggests holding a copy of the last successfully processed event message and checkpointing explicitly using that message when an exception occurs in ProcessEventsAsync. However, after implementing and testing such a solution, the EventProcessorHost still does not re-send. The implementation is pretty simple:
private EventData _lastSuccessfulEvent;

public async Task ProcessEventsAsync(
    PartitionContext context,
    IEnumerable<EventData> messages)
{
    try
    {
        await ProcessEvents(context, messages); // does actual processing, may throw exception
        _lastSuccessfulEvent = messages
            .OrderByDescending(ed => ed.SequenceNumber)
            .First();
    }
    catch (Exception ex)
    {
        await context.CheckpointAsync(_lastSuccessfulEvent);
    }
}
An analysis of things in action:
A partial log sample is available here: https://gist.github.com/ttbjj/4781aa992941e00e4e15e0bf1c45f316#file-gistfile1-txt
TL;DR: The only reliable way to re-play a failed batch of events to IEventProcessor.ProcessEventsAsync is to shut down the EventProcessorHost (aka EPH) immediately, either by calling eph.UnregisterEventProcessorAsync() or by terminating the process, depending on the situation. This lets other EPH instances acquire the lease for the partition and start from the previous checkpoint.
Before explaining this, I want to call out that this is a great question and was indeed one of the toughest design choices we had to make for EPH. In my view, it was a trade-off between usability/supportability of the EPH framework and technical correctness.
The ideal situation would have been: when the user code in IEventProcessorImpl.ProcessEventsAsync throws an exception, the EPH library shouldn't catch it. It should let that exception crash the process, with the crash dump clearly showing the call stack responsible. I still believe this is the most technically correct solution.
The current situation: the contract between the IEventProcessorImpl.ProcessEventsAsync API and EPH is:
- As long as EventData can be received from the Event Hubs service, continue invoking the user callback (IEventProcessorImplementation.ProcessEventsAsync) with the EventData, and if the user callback throws while being invoked, notify EventProcessorOptions.ExceptionReceived.
- User code inside IEventProcessorImpl.ProcessEventsAsync should handle all errors and incorporate retries as necessary. EPH doesn't set any timeout on this callback, to give users full control over processing time.
- If a specific event is the cause of trouble, mark the EventData with a special property, for example type=poison-event, and re-send it to the same EventHub (include a pointer to the actual event: copy the EventData.Offset and SequenceNumber into the new EventData.ApplicationProperties), or forward it to a Service Bus queue, or store it elsewhere; basically, identify the poison event and defer processing it (see the sketch after this list).
- If you have handled all possible cases and are still running into exceptions, catch them and shut down EPH, or fail fast the process with that exception. When the EPH comes back up, it will start from where it left off.
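A minimal sketch of that poison-event deferral using the older client's EventData (eventHubClient, failedEvent, and the property names are illustrative assumptions):

var poisonEvent = new EventData(failedEvent.GetBytes());
poisonEvent.Properties["type"] = "poison-event";
// pointer back to the actual event so it can be correlated later
poisonEvent.Properties["originalOffset"] = failedEvent.Offset;
poisonEvent.Properties["originalSequenceNumber"] = failedEvent.SequenceNumber;
await eventHubClient.SendAsync(poisonEvent);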
Why does check-pointing 'the old event' NOT work (read this to understand EPH in general):
Behind the scenes, EPH runs a pump per Event Hub consumer group partition receiver, whose job is to start the receiver from a given checkpoint (if present), create a dedicated instance of the IEventProcessor implementation, receive from the designated EventHub partition starting at the offset specified in the checkpoint (if not present, EventProcessorOptions.InitialOffsetProvider), and eventually invoke IEventProcessorImpl.ProcessEventsAsync. The purpose of the checkpoint is to be able to reliably start processing messages when the EPH process shuts down and ownership of the partition moves to another EPH instance. So, the checkpoint is consumed only while starting the pump, and will NOT be read once the pump has started.
As I am writing this, EPH is at version 2.2.10.
More general reading on Event Hubs...
Simple Answer:
Have you tried EventProcessorHost.ResetConnection(string partitionId)?
Complex Answer:
It might be an architecture problem that needs to be addressed on your end: why did the processing fail? Was it a transient error? Is retrying the processing logic a possible scenario? And so on...
