Microsoft have recently released retry policies for Azure Functions (preview), which can be applied using the FixedDelayRetry and ExponentialBackoffRetry attributes. Do these retry policies hook into the Azure Functions runtime and operate at a level below the function invocations, or are they effectively the same as an await Task.Delay in user-code? Specifically, would the retry delays of these policies count towards the function execution time, and hence be billed and cause timeouts if they exceed the 10-minute maximum duration on consumption plan?
I can only find the following relevant method in the source code (simplified version below), which enforces retry delays using an await Task.Delay, but I might be missing something.
namespace Microsoft.Azure.WebJobs.Host.Executors
{
    internal static class FunctionExecutorExtensions
    {
        public static async Task<IDelayedException> TryExecuteAsync(this IFunctionExecutor executor, Func<IFunctionInstance> instanceFactory, ILoggerFactory loggerFactory, CancellationToken cancellationToken)
        {
            var attempt = 0;
            while (true)
            {
                var functionInstance = instanceFactory.Invoke();
                var functionException = await executor.TryExecuteAsync(functionInstance, cancellationToken);
                if (functionException == null)
                    return null; // function invocation succeeded
                if (functionInstance.FunctionDescriptor.RetryStrategy == null)
                    return functionException; // retry is not configured
                var retryStrategy = functionInstance.FunctionDescriptor.RetryStrategy;
                if (retryStrategy.MaxRetryCount != -1 && ++attempt > retryStrategy.MaxRetryCount)
                    return functionException; // retry count exceeded
                // construction of retryContext is omitted in this simplified version
                TimeSpan nextDelay = retryStrategy.GetNextDelay(retryContext);
                await Task.Delay(nextDelay); // the retry delay is awaited in-process
            }
        }
    }
}
There are many shortcomings in other aspects of these retry policies, so unless they offer some advantage through their integration with the functions runtime, I'd prefer to stick with a mature reusable library such as Polly.
The retry policies cannot be customized. ExponentialBackoffRetry uses a hardcoded factor of 2 that cannot be changed. FixedDelayRetry does not support random jitter, and ExponentialBackoffRetry has a hardcoded random jitter of ±20%.
Retry failures get logged as errors, needlessly cluttering the error logs. It doesn't seem possible to disable these retry errors without losing all other function errors too.
There is no way of getting the retry count. RetryContext still gives a binding error on the latest non-beta packages.
Technical documentation is sparse, with the above details needing to be inferred from the source code.
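For comparison, the delay calculation described above (a growth factor of 2 with ±20% jitter) is small enough to write by hand with both values configurable. The following is a minimal sketch, not the Functions runtime's implementation: the class name BackoffSketch, the method NextDelay and all of its parameters are hypothetical names chosen for illustration.

```csharp
using System;

public static class BackoffSketch
{
    // Delay before retry number `attempt` (1-based):
    // minInterval * factor^(attempt - 1), capped at maxInterval,
    // then scaled by a random value in [1 - jitter, 1 + jitter].
    public static TimeSpan NextDelay(
        int attempt,
        TimeSpan minInterval,
        TimeSpan maxInterval,
        double factor,
        double jitter,
        Random random)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));

        double baseMs = minInterval.TotalMilliseconds * Math.Pow(factor, attempt - 1);
        double cappedMs = Math.Min(baseMs, maxInterval.TotalMilliseconds);

        // random.NextDouble() is in [0, 1), so the scale is in [1 - jitter, 1 + jitter).
        double scale = 1.0 + jitter * (2.0 * random.NextDouble() - 1.0);
        return TimeSpan.FromMilliseconds(cappedMs * scale);
    }
}
```

Passing factor: 2.0 and jitter: 0.2 approximates the behaviour described for ExponentialBackoffRetry, while jitter: 0.0 gives a deterministic schedule that is easier to test.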
I have several GraphServiceClients and I'm using them to retrieve information from Microsoft Graph API. There's a throttle on the GraphServiceClient calls. As far as I understood from this documentation, you can't call APIs more than 10,000 times in a 10-minute time frame and you can only use 4 concurrent requests at the same time. What's a thread-safe and efficient way to check if I have reached the maximum limit?
My implementation
I came up with this but I'm not sure if it's actually how the Microsoft Graph is checking for the limits.
public class ThrottledClient
{
    private readonly TimeSpan _throttlePeriod;
    private readonly int _throttleLimit;
    public ThrottledClient(int throttleLimit, TimeSpan throttlePeriod)
    {
        _throttleLimit = throttleLimit;
        _throttlePeriod = throttlePeriod;
    }
    private readonly ConcurrentQueue<DateTime> _requestTimes = new();
    public required GraphServiceClient GraphClient { get; set; }
    public async Task CheckThrottleAsync(CancellationToken cancellationToken)
    {
        _requestTimes.Enqueue(DateTime.UtcNow);
        if(_requestTimes.Count > _throttleLimit)
        {
            Console.WriteLine($"Count limit, {DateTime.Now:HH:mm:ss}");
            _requestTimes.TryDequeue(out var oldestRequestTime);
            var timeRemaining = oldestRequestTime + _throttlePeriod - DateTime.UtcNow;
            if(timeRemaining > TimeSpan.Zero)
            {
                Console.WriteLine($"Sleeping for {timeRemaining}");
                await Task.Delay(timeRemaining, cancellationToken).ConfigureAwait(false);
                Console.WriteLine($"Woke up, {DateTime.Now:HH:mm:ss}");
            }
        }
    }
}
public class Engine
{
    public async Task RunAsync()
    {
        var client = GetClient();
        await client.CheckThrottleAsync(_cts.Token).ConfigureAwait(false);
        await DoSomethingAsync(client.GraphClient).ConfigureAwait(false);
    }
}
I can think of other approaches, such as using a lock or a Semaphore, but again, I'm not sure if I'm thinking about this correctly.
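One way to make the window check itself thread-safe is to keep the timestamps behind a single lock and only record a request when a slot is actually free. The sketch below is a client-side estimate only and makes no claim about how Microsoft Graph counts requests; SlidingWindowLimiter and TryAcquire are hypothetical names, and the current time is passed in so the logic can be tested without waiting.

```csharp
using System;
using System.Collections.Generic;

public sealed class SlidingWindowLimiter
{
    private readonly int _limit;
    private readonly TimeSpan _window;
    private readonly Queue<DateTime> _timestamps = new Queue<DateTime>();
    private readonly object _gate = new object();

    public SlidingWindowLimiter(int limit, TimeSpan window)
    {
        if (limit < 1) throw new ArgumentOutOfRangeException(nameof(limit));
        _limit = limit;
        _window = window;
    }

    // Returns TimeSpan.Zero and records the request if a slot is free at `now`.
    // Otherwise returns how long to wait before trying again; nothing is recorded.
    public TimeSpan TryAcquire(DateTime now)
    {
        lock (_gate)
        {
            // Drop timestamps that have aged out of the window.
            while (_timestamps.Count > 0 && now - _timestamps.Peek() >= _window)
                _timestamps.Dequeue();

            if (_timestamps.Count < _limit)
            {
                _timestamps.Enqueue(now);
                return TimeSpan.Zero;
            }

            // The oldest recorded request frees its slot when it leaves the window.
            return _timestamps.Peek() + _window - now;
        }
    }
}
```

A caller would loop on TryAcquire(DateTime.UtcNow), awaiting Task.Delay for the returned duration until it gets TimeSpan.Zero. A cap on concurrent requests is a separate concern and could be layered on top with a SemaphoreSlim.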
I believe you can use a Graph Developer proxy to test these.
Microsoft Graph Developer Proxy aims to provide a better way to test applications that use Microsoft Graph. Using the proxy to simulate errors, mock responses, and demonstrate behaviors like throttling, developers can identify and fix issues in their code early in the development cycle before they reach production.
More details can be found here https://github.com/microsoftgraph/msgraph-developer-proxy
We use Polly to automatically retry failed HTTP calls. It also supports exponential back-off.
So why not handle the error in a way that works for your application, instead of trying to work out beforehand what the limit is (and counting every call against it)? You can test those scenarios with the Graph Developer Proxy from the other answer.
We also use a circuit breaker to fail fast without an extra call to Graph, and retry later.
The 4 concurrent requests you’re mentioning are for Outlook resources (I’ve written many user voices, GitHub issues and escalations on it). Since Q4 2022, you can do 20 requests in a batch for all resources (in the same tenant). Batching reduces the HTTP overhead and might help you to overcome throttling limits, by combining requests in a smart way.
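To illustrate the "react to the error" approach, here is a minimal hand-rolled sketch that retries throttled responses and waits for the duration given in the Retry-After header when one is present. It stands in for a Polly policy rather than reproducing one: ThrottleRetrySketch, GetRetryDelay and SendWithRetryAsync are hypothetical names, and the exponential fallback used when the header is missing is an assumption.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleRetrySketch
{
    // Prefer the server's Retry-After (delta form); otherwise fall back to 2^attempt seconds.
    public static TimeSpan GetRetryDelay(HttpResponseMessage response, int attempt)
    {
        TimeSpan? delta = response.Headers.RetryAfter?.Delta;
        return delta ?? TimeSpan.FromSeconds(Math.Pow(2, attempt));
    }

    // Calls `send` until the response is not throttled or maxRetries is used up.
    public static async Task<HttpResponseMessage> SendWithRetryAsync(
        Func<CancellationToken, Task<HttpResponseMessage>> send,
        int maxRetries,
        CancellationToken cancellationToken = default)
    {
        for (int attempt = 0; ; attempt++)
        {
            HttpResponseMessage response = await send(cancellationToken).ConfigureAwait(false);

            bool throttled = (int)response.StatusCode == 429
                || response.StatusCode == HttpStatusCode.ServiceUnavailable;
            if (!throttled || attempt >= maxRetries)
                return response;

            TimeSpan delay = GetRetryDelay(response, attempt);
            response.Dispose();
            await Task.Delay(delay, cancellationToken).ConfigureAwait(false);
        }
    }
}
```

The send delegate must create a fresh request on every call, for example ct => httpClient.GetAsync(url, ct), because an HttpRequestMessage cannot be sent twice through HttpClient.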
During startup I basically add an HttpClient like this:
services.AddHttpClient<IHttpClient, MyHttpClient>().AddPolicyHandler(GetRetryPolicy());
public IAsyncPolicy<HttpResponseMessage> GetRetryPolicy()
{
    return HttpPolicyExtensions
        .HandleTransientHttpError()
        .OrResult(message => message.StatusCode == HttpStatusCode.NotFound)
        .WaitAndRetryAsync(GetBackOffDelay(options),
            onRetry: (result, timespan, retryAttempt, context) =>
            {
                // result.Result is null when the retry was triggered by an exception rather than a response
                context.GetLogger()?.LogWarning($"Failure with status code {result.Result?.StatusCode}. Retry attempt {retryAttempt}. Retrying in {timespan}.");
            });
}
How can I test that the retry policy works as expected? I've tried writing a test like this:
public async Task MyTest()
{
    var policy = GetMockRetryPolicy(); // returns the policy shown above.
    var content = HttpStatusCode.InternalServerError;
    var services = new ServiceCollection();
    services.AddHttpClient<IHttpClient, MyFakeClient>()
        .AddPolicyHandler(policy);
    var client = (MyFakeClient)services.BuildServiceProvider().GetRequiredService<IHttpClient>();
    await client.Post(new Uri("https://someurl.com"), content);
    // Some asserts that don't work right now
}
For reference here's the bulk of my Post method on MyFakeClient:
if(Enum.TryParse(content.ToString(), out HttpStatusCode statusCode))
{
    if(statusCode == HttpStatusCode.InternalServerError)
    {
        throw new HttpResponseException(new Exception("Internal Server Error"), (int)HttpStatusCode.InternalServerError);
    }
}
The MyFakeClient has a Post method that checks to see if the content is an HttpStatusCode and if it's an internal server error throws an HttpResponseException. At the moment, this creates the correct client and triggers the post fine. It throws the HttpResponseException but in doing so exits the test rather than using the retry policy.
How can I get it to use the retry policy in the test?
Update
I followed Peter's advice and went down the integration test route and managed to get this to work using hostbuilder and a stub delegating handler. In the stub handler I pass in a custom header to enable me to read the retry count back out of the response. The retry count is just a property on the handler that gets incremented every time it's called, so it's actually the first attempt, plus all following retries. That means if the retry count is 3, you should expect 4 as the value.
The thing is, you can't really unit test this, for two reasons:
The retry is registered on top of the HttpClient via DI. When you are unit testing, you are not relying on DI but on individual components. So integration testing might be more suitable for this. I've already detailed how you can do that via WireMock.Net. The basic idea is to create a local HTTP server (mocking the downstream system) with a predefined response sequence.
After you have defined the retry policy (with the retry count and time penalties), you cannot retrieve those settings easily. So, from a unit testing perspective, it is really hard to make sure that the policy has been defined correctly (for example, that the delay is specified in seconds, not in minutes). I've already created a GitHub issue for that, but unfortunately the development of the new version has stalled.
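The "predefined response sequence" idea can also be sketched in-process with a stub handler, similar to the approach described in the question's update. The following is a rough sketch under stated assumptions: SequenceStubHandler and SimpleRetryHandler are hypothetical names, and SimpleRetryHandler is only a stand-in for the Polly policy handler so that the example compiles without extra packages. In a real test the Polly handler would sit in its place.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Test double: replays a fixed sequence of status codes and counts its invocations.
public sealed class SequenceStubHandler : HttpMessageHandler
{
    private readonly Queue<HttpStatusCode> _responses;

    public int CallCount { get; private set; }

    public SequenceStubHandler(params HttpStatusCode[] responses)
    {
        _responses = new Queue<HttpStatusCode>(responses);
    }

    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        CallCount++;
        // Once the sequence is exhausted, keep answering 200 OK.
        HttpStatusCode status = _responses.Count > 0 ? _responses.Dequeue() : HttpStatusCode.OK;
        return Task.FromResult(new HttpResponseMessage(status) { RequestMessage = request });
    }
}

// Hypothetical stand-in for the policy handler: retries 5xx responses without any delay.
public sealed class SimpleRetryHandler : DelegatingHandler
{
    private readonly int _maxRetries;

    public SimpleRetryHandler(int maxRetries, HttpMessageHandler inner) : base(inner)
    {
        _maxRetries = maxRetries;
    }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        for (int attempt = 0; ; attempt++)
        {
            HttpResponseMessage response =
                await base.SendAsync(request, cancellationToken).ConfigureAwait(false);
            if ((int)response.StatusCode < 500 || attempt >= _maxRetries)
                return response;
            response.Dispose();
        }
    }
}
```

With new HttpClient(new SimpleRetryHandler(3, stub)) and a stub that fails on every call, stub.CallCount ends at 4: the first attempt plus three retries.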
The Durable Functions documentation specifies the following pattern to set up automatic handling of retries when an exception is raised within an activity function:
public static async Task Run(DurableOrchestrationContext context)
{
    var retryOptions = new RetryOptions(
        firstRetryInterval: TimeSpan.FromSeconds(5),
        maxNumberOfAttempts: 3);
    await context.CallActivityWithRetryAsync("FlakyFunction", retryOptions, "ABC");
    // ...
}
However I can't see a way to check which retry you're up to within the activity function:
[FunctionName("FlakyFunction")]
public static string[] MyFlakyFunction(
    [ActivityTrigger] string id,
    ILogger log)
{
    // Is there a built-in way to tell what retry attempt I'm up to here?
    var retry = ??
    DoFlakyStuffThatMayCauseException();
}
EDIT: I know it can probably be handled by mangling some sort of count into the RetryOptions.Handle delegate, but that's a horrible solution. It can be handled manually by maintaining an external state each time it's executed, but given that there's an internal count of retries I'm just wondering if there's any way to access that. Primary intended use is debugging and logging, but I can think of many other uses.
There does not seem to be a way to identify the retry. Activity functions are unaware of state and retries. When the CallActivityWithRetryAsync call is made the DurableOrchestrationContext calls the ScheduleWithRetry method of the OrchestrationContext class inside the DurableTask framework:
public virtual Task<T> ScheduleWithRetry<T>(string name, string version, RetryOptions retryOptions, params object[] parameters)
{
    Task<T> RetryCall() => ScheduleTask<T>(name, version, parameters);
    var retryInterceptor = new RetryInterceptor<T>(this, retryOptions, RetryCall);
    return retryInterceptor.Invoke();
}
There the Invoke method on the RetryInterceptor class is called and that does a foreach loop over the maximum number of retries. This class does not expose properties or methods to obtain the number of retries.
Another workaround to help with debugging could be logging statements inside the activity function. And when you're running it locally you can put in a breakpoint there to see how often it stops there. Note that there is already a feature request to handle better logging for retries. You could add your feedback there or raise a new issue if you feel that's more appropriate.
To be honest, I think it's good that an activity is unaware of state and retries. That should be the responsibility of the orchestrator. It would be useful however if you could get some statistics on retries to see if there is a trend in performance degradation.
I have the following policy in a PolicyRegistry to be reused globally:
var fallbackPolicy = Policy
    .Handle<DrmException>().OrInner<DrmException>()
    .Fallback(
        fallbackAction: () => { /* should commit or dispose the transaction here using a passed in Func or Action */ },
        onFallback: (exception) => { Log.Error().Exception(exception).Message($"Exception occurred, message: {exception.Message}.").Write(); }
    );
I have the following code which I want to implement the fallbackPolicy in:
if(Settings.DRM_ENABLED)
    drmManager.ExecuteAsync(new DeleteUser(123)).Wait();//HTTP Call, throws DrmException if unsuccessful
//in some cases, there is an if(transaction == null) here (if transaction was passed as a parameter and needs to be committed here)
transaction.Commit();//if not thrown - commits the transaction
I would like it to look something like this:
var fallbackPolicy = Policy
    .Handle<DrmException>().OrInner<DrmException>()
    .Fallback(
        fallbackAction: (transaction) => { transaction.Dispose(); },
        onFallback: (exception) => { Log.Error().Exception(exception).Message($"Exception occurred, message: {exception.Message}.").Write(); }
    );
fallbackPolicy.Execute(() => drmManager.ExecuteAsync(new DeleteUser(123)).Wait(), transaction);
As far as I understand the fallbackPolicy.Execute takes Action/Func to be carried out which either succeeds, in which case the fallbackPolicy is not hit, or fails, in which case the fallbackPolicy kicks in with some predefined fallbackAction.
What I would like to do is pass in two handlers (onFail(transaction), which disposes the transaction, and onSuccess(transaction), which commits it) when executing the policy. Is there an easier way of doing this than wrapping it or using Polly's Context?
Feels like there are a few separate questions here:
1. How can I make a centrally-defined FallbackPolicy do something dynamic?
2. How can I make one FallbackPolicy do two things?
3. With Polly in the mix, how can I do one thing on overall failure and another on overall success?
I'll answer these separately to give you a full toolkit to build your own solution - and for future readers - but you'll probably not need all three to achieve your goal. Cut to 3. if you just want a solution.
1. How can I make a centrally-defined FallbackPolicy do something dynamic?
For any policy defined centrally, yes Context is the way you can pass in something specific for that execution. References: discussion in a Polly issue; blog post.
Part of your question seems to be about making the FallbackPolicy both log and deal with the transaction. So ...
2. How can I make one FallbackPolicy do two things?
You can pass in something dynamic (per above). Another option is to use two different fallback policies. You can use the same kind of policy multiple times in a PolicyWrap. So you could define a centrally-stored FallbackPolicy to do just the logging, and keep it simple and non-dynamic:
var loggingFallbackPolicy = Policy
    .Handle<DrmException>().OrInner<DrmException>()
    .Fallback(fallbackAction: () => { /* maybe nothing, maybe rethrow - see discussion below */ },
        onFallback: (exception) => { /* logging; */ });
Then you can define another FallbackPolicy locally to roll back the transaction on failure. Since it's defined locally, you could likely just pass the transaction variable in to its fallbackAction: using a closure (in which case you don't have to use Context).
Note: If using two FallbackPolicys in a PolicyWrap, you'd need to make the inner FallbackPolicy rethrow (not swallow) the handled exception, so that the outer FallbackPolicy also handles it.
Re:
What I would like to do is to pass in two handlers (onFail(transaction) which
disposes the transaction and onSuccess(transaction) which commits the transaction)
There isn't any policy which offers special handling on success, but:
3. With Polly in the mix, how can I do one thing on overall failure and another on overall success?
Use .ExecuteAndCapture(...). This returns a PolicyResult whose .Outcome property is either OutcomeType.Successful or OutcomeType.Failure (plus other information: see the documentation).
So overall, something like:
var logAndRethrowFallbackPolicy = Policy
    .Handle<DrmException>().OrInner<DrmException>()
    .Fallback(fallbackAction: (exception, context, token) => {
            throw exception; // intentional rethrow so that the 'capture' of ExecuteAndCapture reacts. Use ExceptionDispatchInfo if you care about the original call stack.
        },
        onFallback: (exception, context) => { /* logging */ });
At execution site:
PolicyResult result = myPolicies.ExecuteAndCapture(() => ... ); // where myPolicies is some PolicyWrap with logAndRethrowFallbackPolicy outermost
if (result.Outcome == OutcomeType.Successful)
{ transaction.Commit(); }
else
{ transaction.Dispose(); }
I am working on a small app to translate and import a large amount of data from one database to another. To do this, I'm using Entity Framework and some custom extensions to commit a page of items at a time, in batches of 1000 or so. Since this can take a while, I was also interested in making sure the whole thing wouldn't grind to a halt if there is a hiccup in the connection while it's running.
I chose the Transient Fault Handling Application block, part of Enterprise Library 5.0, following this article (see Case 2: Retry Policy With Transaction Scope). Here is an example of my implementation in the form of an ObjectContext extension, which simply adds objects to the context and tries to save them, using a Retry Policy focused on Sql Azure stuff:
public static void AddObjectsAndSave<T>(this ObjectContext context, IEnumerable<T> objects)
    where T : EntityObject
{
    if(!objects.Any())
        return;
    var policy = new RetryPolicy<SqlAzureTransientErrorDetectionStrategy>
        (10, TimeSpan.FromSeconds(10));
    var tso = new TransactionOptions();
    tso.IsolationLevel = IsolationLevel.ReadCommitted;
    var name = context.GetTableName<T>();
    foreach(var item in objects)
        context.AddObject(name, item);
    policy.ExecuteAction(() =>
    {
        using(TransactionScope ts = new TransactionScope(TransactionScopeOption.Required, tso))
        {
            context.SaveChanges();
            ts.Complete();
        }
    });
}
The code works great, until I actually test the Retry Policy by pausing my local instance of Sql Server while it's running. It almost immediately poops, which is odd. You can see that I've got the policy configured to try again in ten second intervals; either it is ignoring the interval or failing to catch the error. I suspect the latter, but I'm new to this so I don't really know.
I suspect that the SqlAzureTransientErrorDetectionStrategy does not include the error you are simulating. This strategy only recognizes specific errors thrown by SQL Azure. Look at this code to find out which errors it covers: http://code.msdn.microsoft.com/Reliable-Retry-Aware-BCP-a5ae8e40/sourcecode?fileId=22213&pathId=1098196556
To handle the error you are trying to catch, you could implement your own strategy by implementing the ITransientErrorDetectionStrategy interface.
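A custom strategy only has to answer one question: is this exception worth retrying? The sketch below shows the shape of such a class. The interface is declared locally purely so the example compiles on its own; in a real project you would implement the one shipped with the Transient Fault Handling block instead. ConnectionLossDetectionStrategy is a hypothetical name, and treating TimeoutException and IOException as transient is an assumption: inspect the exception your paused server actually produces and match on that.

```csharp
using System;
using System.IO;

// Local stand-in so the sketch is self-contained; use the library's interface in real code.
public interface ITransientErrorDetectionStrategy
{
    bool IsTransient(Exception ex);
}

public sealed class ConnectionLossDetectionStrategy : ITransientErrorDetectionStrategy
{
    public bool IsTransient(Exception ex)
    {
        // Walk the inner-exception chain, because data-access layers often wrap the real cause.
        for (Exception current = ex; current != null; current = current.InnerException)
        {
            // Assumed transient types; replace with the errors you actually observe.
            if (current is TimeoutException || current is IOException)
                return true;
        }
        return false;
    }
}
```

The strategy would then replace SqlAzureTransientErrorDetectionStrategy as the type argument of RetryPolicy<T>.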