FixedDelayRetry attribute on Azure Function doesn't work properly - C#

Since the release of the latest Microsoft.Azure.WebJobs assembly, it is now possible to set up the retry policy of an Azure Function through an attribute such as [FixedDelayRetry]. I was very interested in this feature since I have a Blob Trigger Function, which retries five times by default when it fails, and I did not want that default behavior. I wanted the function to retry only once when it fails, or not at all.
So I have set up this attribute in my function as you can see below, specifying a maximum of one retry if the function fails:
public class Function1
{
    private readonly IGremlinService _gremlinService;
    private readonly TelemetryClient _telemetryClient;

    public Function1(IGremlinService gremlinService, TelemetryConfiguration telemetryConfiguration)
    {
        this._gremlinService = gremlinService;
        this._telemetryClient = new TelemetryClient(telemetryConfiguration);
    }

    [FunctionName(nameof(Function1))]
    [FixedDelayRetry(1, "00:00:10")]
    public async Task Run([BlobTrigger("files/{directory}/{name}.pdf", Connection = "AzureWebJobsStorage")] Stream myBlob, string name, ILogger logger)
    {
        try
        {
            //my lengthy code not related to the issue
        }
        catch (Exception)
        {
            //exception handling elided; rethrown so the retry policy applies
            throw;
        }
    }
}
When I try this code in my testing environment, with the Storage Emulator and the Cosmos DB Emulator, it works perfectly fine: the function retries only once when it encounters an exception.
However, when I run my function on the Azure platform, instead of retrying only once when it fails, it retries... nine times. I then set the maximum number of retries to 5, and this time the function retried 25 times. I have the feeling that, instead of retrying the number of times specified in the attribute, the function multiplies that number by the default retry policy of 5. Below you can see my logs, where the function clearly retried 10 times:
What am I doing wrong?

The function app retry policy is independent of any retries or resiliency that the trigger provides; the function retry policy simply layers on top of any trigger-provided retries. You can read about retry behaviour here.
Since the Azure Blob trigger has maxDequeueCount set to 5 by default (read here), your message is retried multiple times.
To achieve the desired result, you can set the maxDequeueCount property in host.json to 1.
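For illustration, a minimal host.json sketch showing where the setting lives, assuming the Functions v2+ schema (the attribute in the question requires v3 or later):

```json
{
  "version": "2.0",
  "extensions": {
    "queues": {
      "maxDequeueCount": 1
    }
  }
}
```

With this in place, the trigger-level retries stop after the first dequeue, leaving only the retries declared by the [FixedDelayRetry] attribute.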

Related

Azure Function App delay retry for azure service bus

First let me explain what I have. I have an Azure Service Bus with an Azure Function App. The Service Bus is set up to use SQL filters to push specific message types into specific topics. My Azure Function App then picks up the newest message and processes it.
A basic example
1: I send a request to my EmailAPI
2: EmailAPI then pushes a new message onto the Service Bus with a type of "Email"
3: The SQL filter sees the type is "Email" and the message is placed into the email topic in the Service Bus
4: The EmailListener Azure Function monitors the Service bus and notices a new message
5: Gather the Service Bus message and process it (basically just send the email using the information provided)
Now let's say that for some reason the SMTP server connection is a little broken and we sometimes get a TimeoutException when attempting to send the email (EmailListener). When an exception is thrown, the EmailListener Function App attempts to send the message again instantly, with no wait between attempts. It will do this a total of 10 times and then instruct the Service Bus to place the message in the dead-letter queue.
What I am attempting to do is, when an exception is thrown (such as TimeoutException), wait X amount of time before attempting to process the same message again. I have looked at many different posts about host.json and tried those settings, but they have not worked. I have found a solution, however it requires you to create a clone of the message and push it back into the Service Bus with a delayed processing time. I would prefer not to implement my own manual delay system if Azure Service Bus / Function Apps can handle retries themselves.
The biggest issue I am having (which is probably down to my understanding) is knowing who is responsible: is it the Service Bus settings that handle the retry policy, or is it the Azure Function App that should retry after X time?
I have provided some code, but I feel code isn't really going to help explain my question.
// Pseudo code
public static class EmailListenerTrigger
{
    [FunctionName("EmailListenerTrigger")]
    public static void Run([ServiceBusTrigger("messages", "email", Connection = "ConnectionString")]string mySbMsg, TraceWriter log)
    {
        var emailLauncher = new EmailLauncher("SmtpAddress", "SmtpPort", "FromAddress");
        try
        {
            emailLauncher.SendServiceBusMessage(mySbMsg);
        }
        catch (Exception ex)
        {
            log.Info($"Audit Log: {mySbMsg}, Exception: {ex.Message}");
        }
    }
}
reference one: https://blog.kloud.com.au/2017/05/22/message-retry-patterns-in-azure-functions/ (Thread.Sleep doesn't seem like a good idea)
reference two: https://github.com/Azure/azure-functions-host/issues/2192 (Manually implemented retry)
reference three: https://www.feval.ca/posts/function-queue-retry/ (This seems to refer to queues when I am using topics)
reference four: Can the Azure Service Bus be delayed before retrying a message? (Talks about Defering the message, but then you need to manually get it back out the queue/topic.)
You might be able to solve your issue with Durable Functions. There is, for example, a built-in method CallActivityWithRetryAsync() that can retry when the activity function throws an exception.
https://learn.microsoft.com/en-us/sandbox/functions-recipes/durable-diagnostics#calling-activity-functions-with-retry
Your flow would probably look something like this:
1: A Service Bus triggered Function starts an Orchestrator Function
2: The orchestrator calls your Activity Function (using the aforementioned method)
3: Your email sending is implemented in the Activity Function and can throw exceptions as needed
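As a minimal sketch of that flow, assuming the Durable Functions extension with its v1-era API (the function names "EmailOrchestrator" and "SendEmailActivity" are hypothetical):

```csharp
// Sketch only: assumes the Durable Functions extension (v1-era API);
// "EmailOrchestrator" and "SendEmailActivity" are hypothetical names.
using System;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;

public static class EmailRetryOrchestration
{
    [FunctionName("EmailOrchestrator")]
    public static async Task RunOrchestrator(
        [OrchestrationTrigger] DurableOrchestrationContext context)
    {
        // Wait X amount of time between attempts, up to 5 attempts in total.
        var retryOptions = new RetryOptions(
            firstRetryInterval: TimeSpan.FromMinutes(1),
            maxNumberOfAttempts: 5);

        // Re-runs SendEmailActivity with the delay above whenever it throws.
        await context.CallActivityWithRetryAsync(
            "SendEmailActivity", retryOptions, context.GetInput<string>());
    }

    [FunctionName("SendEmailActivity")]
    public static void SendEmail([ActivityTrigger] string mySbMsg)
    {
        // Send the email here; a thrown TimeoutException triggers the retry policy.
    }
}
```

The Service Bus triggered function would only start this orchestrator and complete immediately, so the back-off happens inside the orchestration rather than against the Service Bus delivery count.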
While there is no native support for what you want to do, it is still doable without a lot of custom development. You can add a Service Bus output binding to your Azure Function that is connected to the same queue your function consumes messages from, then use a custom property to track the number of retries. The following is an example:
private static TimeSpan[] BackoffDurationsBetweenFailures = new TimeSpan[] { }; // add delays here

[FunctionName("retrying-poc")]
public async Task Run(
    [ServiceBusTrigger("myQueue")] Message rawRequest,
    IDictionary<string, object> userProperties,
    [ServiceBus("myQueue")] IAsyncCollector<Message> collector)
{
    var request = GetRequest(rawRequest);
    var retryCount = GetRetryCount(userProperties);
    var shouldRetry = false;

    try
    {
        await _unreliableService.Call(request);
    }
    catch (Exception ex)
    {
        // I don't retry if it is a timeout, but that's my own choice.
        shouldRetry = !(ex is TimeoutException) && retryCount < BackoffDurationsBetweenFailures.Length;
    }

    if (shouldRetry)
    {
        var retryMessage = new Message(rawRequest.Body);
        retryMessage.UserProperties.Add("RetryCount", retryCount + 1);
        retryMessage.ScheduledEnqueueTimeUtc = DateTime.UtcNow.Add(BackoffDurationsBetweenFailures[retryCount]);
        await collector.AddAsync(retryMessage);
    }
}

private MyBusinessObject GetRequest(Message rawRequest)
    => JsonConvert.DeserializeObject<MyBusinessObject>(Encoding.UTF8.GetString(rawRequest.Body));

private int GetRetryCount(IDictionary<string, object> properties)
    => properties.TryGetValue("RetryCount", out var value) && int.TryParse(value.ToString(), out var retryCount)
        ? retryCount
        : 0;

Unable to send message into Service Bus Queue using Azure function

I have an Azure Function like the one below:
[FunctionName("Demo")]
public static void Run([ServiceBusTrigger("%Demo-Queue%", Connection = "AzureWebJobsBPGAServiceBus")]string myQueueItem,
    [ServiceBus("%Update-Queue%", Connection = "AzureWebJobsBPGAServiceBus")] ICollector<BrokeredMessage> updateMessage,
    TraceWriter log)
{
    string query = "SELECT Id FROM MyTable";
    var data = dbs.GetData(query).GetAwaiter().GetResult();
    BrokeredMessage brokeredMessage;

    foreach (var item in data)
    {
        JObject jObject = new JObject(new JProperty("Id", item), new JProperty("MessageId", new Guid(item)));
        brokeredMessage = new BrokeredMessage(jObject.ToString());
        updateMessage.Add(brokeredMessage);
    }
}
But the message is going to the dead-letter queue. Why? The message format is also correct. Any clue?
If a message is moved to the dead-letter queue, the reason is recorded on the message itself: two custom properties (DeadLetterReason and DeadLetterErrorDescription) are added when it is moved to the dead-letter queue. Try reading those properties to find the reason.
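As a sketch of how to read those properties, assuming the classic WindowsAzure.ServiceBus SDK (the BrokeredMessage era shown in the question; "connectionString" is a placeholder):

```csharp
// Sketch only: uses the classic WindowsAzure.ServiceBus SDK;
// "connectionString" is a placeholder for your Service Bus connection string.
using System;
using Microsoft.ServiceBus.Messaging;

public static class DeadLetterInspector
{
    public static void PrintDeadLetterReason(string connectionString)
    {
        // The dead-letter queue is addressed via a sub-path of the main queue.
        var dlqPath = QueueClient.FormatDeadLetterPath("Update-Queue");
        var dlqClient = QueueClient.CreateFromConnectionString(connectionString, dlqPath);

        var message = dlqClient.Receive(TimeSpan.FromSeconds(10));
        if (message != null)
        {
            Console.WriteLine(message.Properties["DeadLetterReason"]);
            Console.WriteLine(message.Properties["DeadLetterErrorDescription"]);
            message.Complete();
        }
    }
}
```

You can also inspect these two properties without code, for example through the Service Bus Explorer tooling.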
Your Function is triggered by messages in your Demo-Queue, and the message(s) myQueueItem was/were dispatched to your Function Demo at least 10 times before being moved to the dead-letter queue. Being dead-lettered after 10 or more deliveries means that your Function execution did not succeed at least 10 times.
Kindly search your Function logs to see if there are any error messages indicating why your Function execution did not succeed, e.g. a timeout or errors in your Function code. If it's due to a timeout, you may change the autoRenewTimeout property in your host.json to see if that resolves the issue.
Note that if you are on the Consumption plan, the autoRenewTimeout needs to stay within the constraints of the max execution time of 5 minutes (default) or the configured functionTimeout property (honored up to a max of 10 minutes) in your host.json file.
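As a sketch, assuming the Functions v1 host.json schema that matches the BrokeredMessage/TraceWriter code above, the two settings would sit like this:

```json
{
  "functionTimeout": "00:10:00",
  "serviceBus": {
    "autoRenewTimeout": "00:05:00"
  }
}
```

The "00:05:00" value is only illustrative; pick something comfortably below your functionTimeout.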

How do DocumentDBAttribute bindings respond to throttling?

I have Azure Functions (C# v1 functions, non-scripted) that use DocumentDBAttribute bindings for both reading and writing documents. How do those bindings respond to throttling in the following situations?
Writing an item by adding it to an ICollector
Reading an item by providing an Id
This is for functions v1.
First case:
//input binding
[DocumentDB(ResourceNames.APCosmosDBName,
    ResourceNames.EpisodeOfCareCollectionName,
    ConnectionStringSetting = "APCosmosDB",
    CreateIfNotExists = true)] ICollector<EOC> eoc,
//...
eoc.Add(new EOC()); //what happens here if throttling is occurring?
Second case:
[DocumentDB(ResourceNames.ORHCasesDBName, ResourceNames.ORHCasesCollectionName, ConnectionStringSetting = "ORHCosmosDBCases", CreateIfNotExists = true, Id = "{id}")] string closedCaseStr,
Both input and output bindings use the Cosmos DB SDK, which has a retry mechanism in place.
By default, the SDK retries 9 times on a throttled result; after that, the exception bubbles up and your Function will error. Depending on the trigger type, it will fail the HTTP call, put the message back on the queue, etc.
The retries respect the timing recommendation returned by Cosmos DB:
When a client is sending requests faster than the allowed rate, the service will return HttpStatusCode 429 (Too Many Request) to rate limit the client. The current implementation in the SDK will then wait for the amount of time the service tells it to wait and retry after the time has elapsed.
At the moment, there is no way to configure the bindings with a policy other than default.

Event Hub input binding for Azure Functions

I have an Azure Function with an input binding to an Event Hub.
public static async Task Run(TraceWriter log, string eventHubMessage)
When the function is triggered, how many messages does it receive per execution by default?
Is it 1 execution = 1 message?
I have read the documentation and understand you can set these properties in the function's host.json file:
"eventHub": {
    // The maximum event count received per receive loop. The default is 64.
    "maxBatchSize": 64,
    // The default PrefetchCount that will be used by the underlying EventProcessorHost.
    "prefetchCount": 256
}
Does maxBatchSize mean I will receive 64 messages in 1 execution?
By default it's going to be one-by-one processing, but you can process in batches too. Change the signature of your function to
public static async Task Run(TraceWriter log, string[] eventHubMessages)
(if you change the name like I did, rename the binding parameter too)
Reference github issue.
@Mikhail is correct. I'd just like to add the following:
If you use the default EventHub-Trigger C# template, the Function created will process 1 message per execution.
If you need each execution to process in batches, change the following:
a. In function.json, add the property "cardinality":"many" as shown here.
b. In run.csx, modify Function signature and process messages in a loop, e.g.,
public static async Task Run(TraceWriter log, string[] eventHubMessages)
{
    foreach (string message in eventHubMessages)
    {
        // process messages
    }
}
The host.json configuration you specified in the question allows you to experiment with the correct batch size and prefetch buffer to meet the needs of your workflow.
Additional comments:
Under the Consumption plan, a Function is currently allowed a default maximum execution time of 5 minutes (configurable up to 10 minutes -- added on 11/30/2017). You should experiment with the maxBatchSize and prefetchCount settings to ensure that a typical execution of the batch completes within that timeframe.
The prefetchCount should be 3-4 times the maxBatchSize.
Each Function host instance is backed by a single EventProcessorHost (EPH). EPH uses a checkpointing mechanism to mark the last successfully processed message. A Function execution could terminate prematurely due to uncaught exceptions in the Function code, the host crashing, a timeout, or a partition lease being lost, resulting in an unsuccessful checkpoint. When the Function execution restarts, the batch retrieved will contain messages from the last known checkpoint onward. Setting a very high value for maxBatchSize therefore also means you may have to re-process a large batch. Event Hubs guarantees at-least-once delivery but not at-most-once delivery, and Azure Functions will not attempt to change that behavior. If having only unique messages is a priority, you will need to handle de-duplication in your downstream workflows.
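As an illustration of downstream de-duplication under at-least-once delivery, a minimal sketch using an in-memory guard keyed on a message's unique id (a production workflow would persist seen ids durably instead):

```csharp
// Sketch only: an illustrative in-memory de-duplication guard; the
// "Deduplicator" type and its members are hypothetical names.
using System;
using System.Collections.Concurrent;

public static class Deduplicator
{
    private static readonly ConcurrentDictionary<string, byte> SeenIds =
        new ConcurrentDictionary<string, byte>();

    public static void ProcessIfNew(string messageId, Action process)
    {
        // TryAdd returns false when the id was already recorded,
        // so a redelivered message is silently skipped.
        if (SeenIds.TryAdd(messageId, 0))
        {
            process();
        }
    }
}
```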

Azure triggered queue failing to delete

According to the documentation here and here, and numerous SO questions like this one, my understanding is that when a queued message fails the given number of times (in this case 5), it is moved from the current queue into the poison queue automatically.
Unfortunately, my limited experience has found this to be only partially true: when the job does fail the max dequeue count, the message is added to the poison queue automatically, but it is not removed from the original queue. It is then reprocessed an apparently unchangeable 10 minutes later, adding the same message to the poison queue again, creating duplicates, and still not being removed.
When I implemented my own IQueueProcessorFactory class and created a custom QueueProcessor overriding DeleteMessageAsync, I was able to confirm the method is being called when the exception is thrown 5 times and that the method finishes without exceptions, but the message in the queue remains. I have also tried deleting both the normal and poison queues.
The code I'm using:
public class Program
{
    private const string QUEUE_NAME = "some-queue";

    // Please set the following connection strings in app.config for this WebJob to run:
    // AzureWebJobsDashboard and AzureWebJobsStorage
    static void Main()
    {
        var config = new JobHostConfiguration();
        config.Queues.QueueProcessorFactory = new CustomFactory();
        var host = new JobHost(config);
        // The following code ensures that the WebJob will be running continuously
        host.RunAndBlock();
    }

    private class CustomFactory : IQueueProcessorFactory
    {
        public QueueProcessor Create(QueueProcessorFactoryContext context)
        {
            return new CustomQueueProcessor(context);
        }

        private class CustomQueueProcessor : QueueProcessor
        {
            public CustomQueueProcessor(QueueProcessorFactoryContext context) : base(context)
            {
            }

            protected override Task DeleteMessageAsync(CloudQueueMessage message, CancellationToken cancellationToken)
            {
                return base.DeleteMessageAsync(message, cancellationToken);
            }
        }
    }

    public static void QueueTrigger([QueueTrigger(QUEUE_NAME)] CloudQueueMessage message)
    {
        Console.WriteLine($"Processing message: {message.AsString}");
        throw new Exception("test exception");
    }
}
Everything works as expected except that the message remains in the original queue. I'm assuming (and hoping) the error is on my end, or that it's something simple I overlooked because I am new to queues, but after spending almost 2 days trawling the internet for information, I am officially at a loss as to what to do or try next.
Edit
While we did end up going with Service Bus, it is worth noting that we came up with an alternative which was to semi-manage the queue ourselves from within the queue trigger.
What this entailed was checking the dequeue count and, if it is above the max dequeue (retry) count, simply returning. This signals to the caller that the message was "successfully" processed, which removes it from the queue. The approach results in almost the expected behavior: the message gets added to the poison queue while being removed from the normal queue 10 minutes later.
It has the added benefit of continuing to work with future releases of the packages, or with updates to the queues themselves that fix the original problem, as the if-condition would then simply never be true.
public class Program
{
    private const int MAX_DEQUEUE_COUNT = 5;

    static void Main()
    {
        var config = new JobHostConfiguration();
        ...
        config.Queues.MaxDequeueCount = MAX_DEQUEUE_COUNT;
        ...
    }

    public static void QueueTrigger([QueueTrigger("some-queue")] CloudQueueMessage message)
    {
        if (message.DequeueCount > MAX_DEQUEUE_COUNT)
        {
            // prevents the message from indefinitely retrying every 10 minutes
            // and ultimately creating duplicates within the poison queue.
            return;
        }
        // do stuff
    }
}
I'm about to give you a non-referenced answer based on personal experience and what I've seen on Stack Overflow. You are not the first person to have issues with automatic dead-lettering and honoring the max dequeue count with WebJob QueueTriggerAttributes. My recommendation is to sidestep the flakiness of Storage Queues + QueueTriggers in favor of Service Bus Queues and Service Bus Triggers.
As a messaging technology, Service Bus Queues are much more full-featured and are cost-comparable. The only real reason I'd choose Storage Queues over Service Bus Queues is if you needed to store more than 80 GB of messages, which is the Service Bus Queue limit with partitioning.
I encountered the same behaviour, and it was caused by a bug in the interaction between the WebJobs SDK (v2) and the Storage client library v8.x.
This should be fixed as of 2.1.0-beta1-10851. The downside is that there is currently no stable release of 2.1.0 yet.
