Using Azure WebJob Timeout on Continuous Job with TimerTrigger - c#

I have a continuous WebJob with a function that uses a TimerTrigger to run a process every 30 seconds. A particular call in the function occasionally, and seemingly randomly, hangs, causing the WebJob to wait indefinitely. My current workaround is to notice that the service has stopped, then log into the Azure Dashboard and abort the job manually.
Note that I know the correct course of action is to identify the root cause and fix it. Trust me, we're working on that. In the meantime, I want to treat the symptom, and need help doing so.
I'm attempting to have the WebJob detect this state using the Timeout attribute as described in this post on the Azure WebJobs SDK: https://github.com/Azure/azure-webjobs-sdk/issues/590. Implementing the suggestion, I can see that when the problematic call hangs, the timeout is detected, but the WebJob still doesn't die. What am I doing wrong here that prevents the function from being killed so that subsequent invocations can run?
Program.cs
static void Main()
{
var config = new JobHostConfiguration();
config.UseTimers();
config.FunctionTimeout = new TimeSpan(0, 15, 0);
var host = new JobHost(config);
Functions.Initialize();
host.RunAndBlock();
}
Functions.cs
[Singleton]
[Timeout("00:05:00")]
public async static Task PeriodicProcess([TimerTrigger("00:00:30", RunOnStartup = true)] TimerInfo timer, CancellationToken cancelToken, TextWriter log)
{
log.WriteLine("-- Processing Begin --");
List<Email> emails = GetEmailsAndWhatNot();
foreach (Email email in emails)
{
try
{
ProblematicFunction_SendEmail(email, log);
}
catch (Exception ex)
{
// do stuff
}
}
log.WriteLine("-- Processing End -- ");
}
public static void ProblematicFunction_SendEmail(Email e, TextWriter log)
{
// send email
}
WebJob Output During Issues
-- Processing Begin --
Timeout value of 00:05:00 exceeded by function 'Functions.PeriodicProcess' (Id: '0f7438bd-baad-451f-95a6-9461f35bfb2d'). Initiating cancellation.
Despite the WebJob initiating cancellation, the function doesn't die. Do I need to monitor the CancellationToken? How far down do I need to propagate asynchronous calling? What am I missing here that will actually abort the process?

As the TimerTrigger documentation states about TimerTrigger:
Singleton Locks
TimerTrigger uses the Singleton feature of the WebJobs SDK to ensure that only a single instance of your triggered function is running at any given time.
Scheduling
If your function execution takes longer than the timer interval, another execution won't be triggered until after the current invocation completes. The next execution is scheduled after the current execution completes.
Here is my test for this scenario; you can refer to it:
Use CancellationToken.None and never propagate the cancellation token
Note: The function PeriodicProcess times out after 30 seconds, but the time-consuming job keeps running, and after the long-running job completes, the "Processing End" log is still printed.
Propagate the cancellation token
Note: If we propagate the cancellation token, the time-consuming job is cancelled as soon as the timeout fires.
Code snippet
[Timeout("00:00:30")]
[Singleton]
public async static Task PeriodicProcess([TimerTrigger("00:00:10", RunOnStartup = true)] TimerInfo timer, CancellationToken cancelToken, TextWriter log)
{
log.WriteLine($"-- [{DateTime.Now.ToString()}] Processing Begin --");
try
{
await longRunningJob(log, cancelToken);
}
catch (Exception e)
{
// do stuff
}
log.WriteLine($"-- [{DateTime.Now.ToString()}] Processing End -- ");
}
private async static Task longRunningJob(TextWriter log, CancellationToken cancelToken)
{
log.WriteLine($"-- [{DateTime.Now.ToString()}] Begin Time-consuming jobs --");
await Task.Delay(TimeSpan.FromMinutes(1), cancelToken);
log.WriteLine($"-- [{DateTime.Now.ToString()}] Complete Time-consuming jobs --");
}
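Tying this back to the question's original function: the timeout can only interrupt work that actually observes the token, so cancelToken has to be propagated into the call that hangs and checked between iterations. A rough sketch of that approach (ProblematicFunction_SendEmailAsync is a hypothetical async variant of the question's method; whether cancellation actually interrupts the hang depends on the underlying I/O API accepting the token):
[Singleton]
[Timeout("00:05:00")]
public async static Task PeriodicProcess([TimerTrigger("00:00:30", RunOnStartup = true)] TimerInfo timer, CancellationToken cancelToken, TextWriter log)
{
log.WriteLine("-- Processing Begin --");
List<Email> emails = GetEmailsAndWhatNot();
foreach (Email email in emails)
{
// Bail out between items once the [Timeout] attribute signals cancellation.
cancelToken.ThrowIfCancellationRequested();
try
{
await ProblematicFunction_SendEmailAsync(email, log, cancelToken);
}
catch (OperationCanceledException)
{
throw; // let the host observe the cancellation
}
catch (Exception ex)
{
// log ex, etc.
}
}
log.WriteLine("-- Processing End -- ");
}
public static async Task ProblematicFunction_SendEmailAsync(Email email, TextWriter log, CancellationToken cancelToken)
{
// Pass cancelToken into whatever async call actually hangs
// (e.g. HttpClient.SendAsync(request, cancelToken)); synchronous calls
// that ignore the token cannot be interrupted this way.
}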

Related

Why the "runtimeStatus" in "statusQueryGetUri" not set immediately after timer is finished?

Why the "runtimeStatus" is set to "Completed" only after 52 seconds not 30 as I set in context.CreateTimer() function when checking it with statusQueryGetUri http request?
The documentation that I used
My Code
[FunctionName("H")]
public static async Task<HttpResponseMessage> Start([HttpTrigger(AuthorizationLevel.Anonymous, "get","post",Route = "route/{route}")] HttpRequestMessage req, [DurableClient] IDurableOrchestrationClient client, string route)
{
string id = await client.StartNewAsync("Or1");
return client.CreateCheckStatusResponse(req, id);
}
[FunctionName("Or1")]
public static async Task<string> Or1([OrchestrationTrigger] IDurableOrchestrationContext context, ILogger logger)
{
using (CancellationTokenSource cts = new CancellationTokenSource())
{
DateTime endTime = context.CurrentUtcDateTime.AddSeconds(30);
logger.LogInformation($"*********time now {context.CurrentUtcDateTime}");
logger.LogInformation($"*********end Time {endTime}");
await context.CreateTimer(endTime, cts.Token);
logger.LogInformation($"*********end Time finish {context.CurrentUtcDateTime}");
return "timer finished";
}
}
[FunctionName("Activity1")]
public static async Task A1([ActivityTrigger] IDurableActivityContext context)
{
//Do something
}
The Log
Functions:
H: [GET,POST] http://localhost:7071/api/route/{route}
Activity1: activityTrigger
Or1: orchestrationTrigger
For detailed output, run func with --verbose flag.
[2021-01-13T16:17:06.841Z] Host lock lease acquired by instance ID '000000000000000000000000EB8F9C93'.
[2021-01-13T16:17:24.767Z] Executing 'H' (Reason='This function was programmatically called via the host APIs.', Id=0aeee0e1-6148-4c21-9aa9-d17a43bce8d1)
[2021-01-13T16:17:24.925Z] Executed 'H' (Succeeded, Id=0aeee0e1-6148-4c21-9aa9-d17a43bce8d1, Duration=164ms)
[2021-01-13T16:17:24.995Z] Executing 'Or1' (Reason='(null)', Id=6aa97b04-d526-41b1-9532-afb21c088b18)
[2021-01-13T16:17:25.006Z] *********time now 1/13/2021 4:17:24 PM
[2021-01-13T16:17:25.007Z] *********endTime 1/13/2021 4:17:54 PM
[2021-01-13T16:17:25.017Z] Executed 'Or1' (Succeeded, Id=6aa97b04-d526-41b1-9532-afb21c088b18, Duration=23ms)
[2021-01-13T16:18:16.476Z] Executing 'Or1' (Reason='(null)', Id=9749d719-5789-419a-908f-6523cf497cca)
[2021-01-13T16:18:16.477Z] *********time now 1/13/2021 4:17:24 PM
[2021-01-13T16:18:16.478Z] *********endTime 1/13/2021 4:17:54 PM
[2021-01-13T16:18:16.481Z] *********endTime finish 1/13/2021 4:18:16 PM
[2021-01-13T16:18:16.485Z] Executed 'Or1' (Succeeded, Id=9749d719-5789-419a-908f-6523cf497cca, Duration=9ms)
The Azure orchestrator works on queue polling, which is implemented as a random exponential back-off algorithm to reduce the effect of idle-queue polling on storage transaction costs. When a message is found, the runtime immediately checks for another message; when no message is found, it waits for a period of time before trying again. After subsequent failed attempts to get a queue message, the wait time continues to increase until it reaches the maximum wait time, which defaults to 30 seconds.
If you look at your logs, you can see that the orchestrator started the timer at 16:17:24, and when it expired at 16:17:54 a message was added to the storage queue. As mentioned above, because of queue polling the message appears to have been picked up at 16:18:16 to resume the orchestration execution.
I believe that if you trigger the durable function multiple times, you will notice that the total time to finish the orchestration is different for each instance.
You can read about Azure Functions orchestration queue polling here.
You can also check the history table to understand when a message was queued and when it was picked up. Read about it here.
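If you prefer to inspect this from code rather than the History table or the statusQueryGetUri endpoint, the same [DurableClient] binding used above exposes GetStatusAsync with a showHistory flag. The sketch below is illustrative only; the exact shape of the returned history payload depends on your Durable Functions version:
[FunctionName("Status")]
public static async Task<HttpResponseMessage> Status([HttpTrigger(AuthorizationLevel.Anonymous, "get", Route = "status/{instanceId}")] HttpRequestMessage req, [DurableClient] IDurableOrchestrationClient client, string instanceId, ILogger logger)
{
// showHistory: true also returns the history events with their timestamps,
// which shows when the TimerFired message was enqueued and when it was picked up.
DurableOrchestrationStatus status = await client.GetStatusAsync(instanceId, showHistory: true);
logger.LogInformation($"{status.RuntimeStatus}: created {status.CreatedTime:o}, last updated {status.LastUpdatedTime:o}");
return new HttpResponseMessage(HttpStatusCode.OK)
{
Content = new StringContent(status.History?.ToString() ?? status.RuntimeStatus.ToString())
};
}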
To show how queuing works, you can stop the function as soon as the timer is triggered. Following is the output in my local environment's emulator queue, which shows that a message is queued when the timer is triggered.
Now, when the orchestrator function resumes, it polls the message from the queue and picks it up to continue processing.
Note: in my local environment I tried your code a couple of times and noticed that all instances finish in ~30 seconds.

Must StartNewAsync be awaited on the orchestration client?

I have an Azure orchestration where the orchestration client function, which triggers the orchestration, threw a timeout exception.
The orchestration client function only does two things: starting two orchestrations and awaiting each call, as most example code suggests.
await orchestrationClient.StartNewAsync("TableOrchestrator", updates);
await orchestrationClient.StartNewAsync("ClientOrchestrator", clientExport);
However, as I understand it, the orchestration client is not a special function like the orchestrator functions, so it can only run for a maximum of 10 minutes.
Obviously there is a high chance that the combined run time of my two orchestrations exceeds 10 minutes in total.
Questions:
Is the orchestration client state saved like the actual orchestration functions?
Do I need to await the orchestrations if they do not depend on previous orchestration results?
Update: I made a complete example of what my code does, with the runtimes shown below.
It seems that starting an orchestration will await it if there is code written after, but not if the orchestration is the last statement!
Updated Questions:
Will any code placed after the call to StartNewAsync() make the function wait until the orchestration really finishes, or will e.g. log statements not trigger this behaviour?
Is it the recommended code practice that StartNewAsync() should only be called after all other code has executed?
public static class testOrchestration
{
[FunctionName("Start")]
public static async Task Start([TimerTrigger("0 */30 * * * *", RunOnStartup = true, UseMonitor = false)]TimerInfo myStartTimer, [OrchestrationClient] DurableOrchestrationClient orchestrationClient, ILogger log)
{
var startTime = DateTime.Now;
log.LogInformation(new EventId(0, "Startup"), "Starting Orchestror 1 ***");
await orchestrationClient.StartNewAsync("Orchestrator", "ONE");
log.LogInformation($"Elapsed time, await ONE: {DateTime.Now - startTime}");
await Task.Delay(5000);
log.LogInformation($"Elapsed time, await Delay: {DateTime.Now - startTime}");
log.LogInformation(new EventId(0, "Startup"), "Starting Orchestror 2 ***");
await orchestrationClient.StartNewAsync("Orchestrator", "TWO");
log.LogInformation($"Elapsed time, await TWO: {DateTime.Now - startTime}");
}
[FunctionName("Orchestrator")]
public static async Task<string> TestOrchestrator([OrchestrationTrigger] DurableOrchestrationContextBase context, ILogger log)
{
var input = context.GetInput<string>();
log.LogInformation($"Running {input}");
await Task.Delay(5000);
return $"Done {input}";
}
}
Running this gives me the following output:
Starting Orchestror 1 ***
Elapsed time, await ONE: 00:00:08.5445755
Running ONE
Elapsed time, await Delay: 00:00:13.5541264
Starting Orchestror 2 ***
Elapsed time, await TWO: 00:00:13.6211995
Running TWO
StartNewAsync() just schedules the orchestrators to be started (immediately). Awaiting those calls does not mean that your initial function will really wait for the orchestrators to run, or even to actually start and finish their work.
The StartNewAsync (.NET) or startNew (JavaScript) method on the orchestration client binding starts a new instance. Internally, this method enqueues a message into the control queue, which then triggers the start of a function with the specified name that uses the orchestration trigger binding. This async operation completes when the orchestration process is successfully scheduled.
Source
This async operation completes when the orchestration process is successfully scheduled.
So yes: You should await those calls (can also be done in parallel as Miguel suggested). But it will not take longer than a few milliseconds.
If they don't depend on each other, you can run them in parallel using:
var t1 = orchestrationClient.StartNewAsync("TableOrchestrator", updates);
var t2 = orchestrationClient.StartNewAsync("ClientOrchestrator", clientExport);
await Task.WhenAll(t1, t2);
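If the client function really does need to block until the orchestrations finish (which awaiting StartNewAsync does not do), one option is to poll the instance status. Here is a rough sketch using the v1.x DurableOrchestrationClient from the question; the polling interval and timeout values are arbitrary:
private static async Task WaitForOrchestrationAsync(DurableOrchestrationClient client, string instanceId, TimeSpan timeout)
{
var deadline = DateTime.UtcNow + timeout;
while (DateTime.UtcNow < deadline)
{
var status = await client.GetStatusAsync(instanceId);
if (status?.RuntimeStatus == OrchestrationRuntimeStatus.Completed ||
    status?.RuntimeStatus == OrchestrationRuntimeStatus.Failed ||
    status?.RuntimeStatus == OrchestrationRuntimeStatus.Terminated)
{
return;
}
await Task.Delay(TimeSpan.FromSeconds(5)); // poll every few seconds
}
throw new TimeoutException($"Orchestration {instanceId} did not finish within {timeout}.");
}
Bear in mind that waiting like this inside a regular (client) function still counts against the function timeout, so for long orchestrations it is usually better not to wait for completion at all.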

Wait for RabbitMQ Threads to finish in Windows Service OnStop()

I am working on a windows service written in C# (.NET 4.5, VS2012), which uses RabbitMQ (receiving messages by subscription). There is a class which derives from DefaultBasicConsumer, and in this class are two actual consumers (so two channels). Because there are two channels, two threads handle incoming messages (from two different queues/routing keys) and both call the same HandleBasicDeliver(...) function.
Now, when the windows service OnStop() is called (when someone is stopping the service), I want to let both those threads finish handling their messages (if they are currently processing a message), sending the ack to the server, and then stop the service (abort the threads and so on).
I have thought of multiple solutions, but none of them seem to be really good. Here's what I tried:
using one mutex: each thread tries to take it when entering HandleBasicDeliver, then releases it afterwards. When OnStop() is called, the main thread tries to grab the same mutex, effectively preventing the RabbitMQ threads from processing any more messages. The disadvantage is that only one consumer thread can process a message at a time.
using two mutexes: each RabbitMQ thread uses a different mutex, so they won't block each other in HandleBasicDeliver() - I can differentiate which thread is actually handling the current message based on the routing key. Something like:
HandleBasicDeliver(...)
{
if(routingKey == firstConsumerRoutingKey)
{
// Try to grab the mutex of the first consumer
}
else
{
// Try to grab the mutex of the second consumer
}
}
When OnStop() is called, the main thread will try to grab both mutexes; once both mutexes are "in the hands" of the main thread, it can proceed with stopping the service. The problem: if another consumer were added to this class, I'd need to change a lot of code.
using a counter, or CountdownEvent. The counter starts at 0, and each time HandleBasicDeliver() is entered the counter is safely incremented using the Interlocked class. After the message is processed, the counter is decremented. When OnStop() is called, the main thread checks whether the counter is 0 and, if so, continues. However, after it checks that the counter is 0, some RabbitMQ thread might begin to process a message (a race-free variant of this idea is sketched after this question).
When OnStop() is called, closing the connection to RabbitMQ (to make sure no new messages will arrive), and then waiting a few seconds (in case any messages are still being processed) before closing the application. The problem is that the exact number of seconds I should wait before shutting down the application is unknown, so this isn't an elegant or exact solution.
I realize the design does not conform to the Single Responsibility Principle, and that may contribute to the lack of solutions. However, could there be a good solution to this problem without having to redesign the project?
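For reference, the counter/CountdownEvent idea above can be made race-free with a CountdownEvent plus a stop flag. This is a hedged sketch rather than the poster's actual code; the member names are hypothetical and the two members are assumed to be shared between the consumer class and the service class:
// shared state
private static readonly CountdownEvent _inFlight = new CountdownEvent(1); // held at 1 until shutdown
private static volatile bool _stopping;
// in the consumer class (derives from DefaultBasicConsumer)
public override void HandleBasicDeliver(string consumerTag, ulong deliveryTag, bool redelivered, string exchange, string routingKey, IBasicProperties properties, byte[] body)
{
// TryAddCount returns false once the count has reached zero (shutdown has completed),
// so late deliveries are neither processed nor acked and the broker will redeliver them.
if (_stopping || !_inFlight.TryAddCount())
return;
try
{
// process the message and send the ack here
}
finally
{
_inFlight.Signal();
}
}
// in the Windows service class
protected override void OnStop()
{
_stopping = true;
_inFlight.Signal(); // release the initial count held by the service
_inFlight.Wait();   // block until all in-flight handlers have signalled
}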
We do this in our application. The main idea is to use a CancellationTokenSource.
On your windows service add this:
private static readonly CancellationTokenSource CancellationTokenSource = new CancellationTokenSource();
Then in your rabbit consumers do this:
1. change from using Dequeue to DequeueNoWait
2. have your rabbit consumer check the cancellation token
Here is our code:
public async Task StartConsuming(IMessageBusConsumer consumer, MessageBusConsumerName fullConsumerName, CancellationToken cancellationToken)
{
var queueName = GetQueueName(consumer.MessageBusConsumerEnum);
using (var model = _rabbitConnection.CreateModel())
{
// Configure the quality of service for the model. Below is what each setting means.
// BasicQos(0="Don't send me a new message until I’ve finished", _fetchSize = "Send me N messages at a time", false ="Apply to this Model only")
model.BasicQos(0, consumer.FetchCount.Value, false);
var queueingConsumer = new QueueingBasicConsumer(model);
model.BasicConsume(queueName, false, fullConsumerName, queueingConsumer);
var queueEmpty = new BasicDeliverEventArgs(); //This is what gets returned if nothing in the queue is found.
while (!cancellationToken.IsCancellationRequested)
{
var deliverEventArgs = queueingConsumer.Queue.DequeueNoWait(queueEmpty);
if (deliverEventArgs == queueEmpty)
{
// This 100ms wait allows the processor to go do other work.
// No sense in going back to an empty queue immediately.
// CancellationToken intentionally not used!
// ReSharper disable once MethodSupportsCancellation
await Task.Delay(100);
continue;
}
//DO YOUR WORK HERE!
}
}
}
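To complete the picture, the service's OnStop() can then cancel the shared token and wait for the consumer tasks to drain before letting the process exit. A minimal sketch, assuming the Task returned by each StartConsuming call is kept in a list (the field name is hypothetical):
// populated during OnStart(), one entry per call to StartConsuming(...)
private static readonly List<Task> _consumerTasks = new List<Task>();
protected override void OnStop()
{
// signal all consumers to stop picking up new messages...
CancellationTokenSource.Cancel();
// ...then wait until every consumer has finished its current message
// and disposed its channel before the service shuts down.
Task.WaitAll(_consumerTasks.ToArray(), TimeSpan.FromSeconds(30));
}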
Usually, the way we ensure a Windows service does not stop before processing completes is to use some code like the following. Hope that helps.
protected override void OnStart(string[] args)
{
// start the worker thread
_workerThread = new Thread(WorkMethod)
{
// !!!set to foreground to block windows service be stopped
// until thread is exited when all pending tasks complete
IsBackground = false
};
_workerThread.Start();
}
protected override void OnStop()
{
// notify the worker thread to stop accepting new migration requests
// and exit when all tasks are completed
// some code to notify worker thread to stop accepting new tasks internally
// wait for worker thread to stop
_workerThread.Join();
}
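The placeholder comment above leaves out the actual stop signal. A minimal sketch of one way to wire it up, assuming a simple volatile flag and a WorkMethod loop (both names are hypothetical):
private Thread _workerThread;
private volatile bool _stopRequested;
private void WorkMethod()
{
while (!_stopRequested)
{
// pick up and process the next pending task here;
// in-progress work finishes before the loop re-checks the flag
}
// drain/cleanup before the foreground thread exits and the service stops
}
protected override void OnStop()
{
_stopRequested = true; // stop accepting new work
_workerThread.Join();  // wait for the worker to finish what it started
}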

How to cancel long-running but single-threaded operations

I see a lot of options for cancelling a long-running operation in C#, but each example seems to cover cancelling parallel (multithreaded) operations, is an overly simple example, or involves periodically polling for whether a cancellation request was submitted. I don't think that will work here.
I have a method BuildZipFile() which, for now, takes no arguments, but which I suspect might need a CancellationToken argument. Calling this method does the following. BuildZipFile() blocks; execution on the calling thread doesn't resume until it's done with its work.
Files are extracted and added to a zip file. This operation is so quick that I don't want it to be cancelable. If the user requests a cancel, it should ignore the request until the operation is complete, and then skip the rest of BuildZipFile() and return (or throw an exception; doesn't matter).
Files are processed using something called a "pipeline." This operation does take a long time and the user should be able to cancel it. To start this processing, BuildZipFile() calls a non-blocking method Start() on the pipeline. A pipeline raises Done when it's done with its work, so I use an AutoResetEvent to block the method until I hear that event, and then release the block.
Some more operations similar to item #1: quick-running operations that should not support cancelling.
Here's an overly-simplified implementation:
public void BuildZipFile()
{
// single-threaded operation that is quick and can't be canceled
DoQuickUncancelableThings();
// and now a long-running operation that the user SHOULD be able to cancel;
// it must be possible to interrupt the AutoResetEvent
var pipeline = GimmeAPipeline();
var reset = new AutoResetEvent(false);
// when the pipeline raises Done, stop blocking the method and resume execution
pipeline.Done += () => reset.Set();
// define the work to be done
ThreadPool.QueueUserWorkItem(state => pipeline.Start());
// call pipeline.Start() and block the thread until pipeline.Done is raised
reset.WaitOne();
// ...and more quick operations that can't be canceled
DoMoreQuickUncancelableThings();
}
Note that in reality, that middle block of code exists in another class which this one calls.
I can stop the pipeline in this method by calling pipeline.Stop() which will indirectly raise the Done event once the request to stop it was handled.
So, how can I modify BuildZipFile() to support user cancellation? My instinct is to add support for catching an OperationCanceledException, but that would allow those quick operations to be cancelled too, wouldn't it? And I can't poll for a cancellation request (unless I'm missing something), because I'm waiting for that Done event from the pipeline to be raised, and the last thing I want to do is poll using a timer to interrupt it.
I have no issues with modifying BuildZipFile() to become non-blocking, but the steps within it are very linear. Step #2 can't even start until step #1 is done; the process can't be made parallel. I cannot change how pipelines work; they must remain asynchronous and raise events when they're done.
I'm using .NET 4.5 in a Windows Forms application so I can use pretty much any framework feature I need.
I think you should use Tasks to do what you want.
Check this MSDN article; it is very useful:
http://msdn.microsoft.com/en-us/library/dd537609(v=vs.110).aspx
Here is a full example in a console application
using System;
using System.Threading;
using System.Threading.Tasks;
namespace ConsoleApplication4
{
class Program
{
static CancellationTokenSource tokenSource2;
static CancellationToken ct;
static void Main(string[] args)
{
tokenSource2 = new CancellationTokenSource();
ct = tokenSource2.Token;
Task myTask = new Task(BuildZipFile);
myTask.Start();
Console.WriteLine("Press enter to cancel");
Console.ReadLine();
tokenSource2.Cancel();
Console.ReadLine();
}
public static void BuildZipFile()
{
Task quick1 = new Task(DoQuickUncancelableThings);
// OnlyOnRanToCompletion skips the final quick step when the long-running step is cancelled.
quick1.ContinueWith(ant => DoLongRunningThings(), ct).ContinueWith(ant => DoMoreQuickUncancelableThings(), TaskContinuationOptions.OnlyOnRanToCompletion);
quick1.Start();
}
private static void DoMoreQuickUncancelableThings()
{
Console.WriteLine("Q2");
}
private static void DoLongRunningThings()
{
for (int i = 0; i < 10; i++)
{
System.Threading.Thread.Sleep(1000);
ct.ThrowIfCancellationRequested();
}
Console.WriteLine("Long ended");
}
private static void DoQuickUncancelableThings()
{
Console.WriteLine("Q1");
}
}
}
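For the specific pipeline scenario in the question, another option is to make the blocking wait itself cancellable by waiting on both the Done event and the token. A hedged sketch, assuming BuildZipFile gains a CancellationToken parameter and reusing the question's GimmeAPipeline/pipeline.Stop() members:
public void BuildZipFile(CancellationToken token)
{
DoQuickUncancelableThings(); // quick, intentionally not cancellable
var pipeline = GimmeAPipeline();
var reset = new AutoResetEvent(false);
pipeline.Done += () => reset.Set();
ThreadPool.QueueUserWorkItem(state => pipeline.Start());
// wait for either the Done event or a cancellation request, whichever comes first
int signalled = WaitHandle.WaitAny(new WaitHandle[] { reset, token.WaitHandle });
if (signalled == 1)
{
pipeline.Stop();   // ask the pipeline to stop; it raises Done when the stop is handled
reset.WaitOne();   // wait for that Done before leaving
token.ThrowIfCancellationRequested(); // skip the remaining steps
}
DoMoreQuickUncancelableThings(); // quick, intentionally not cancellable
}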

Polling the right way?

I am a software/hardware engineer with quite some experience in C and embedded technologies. Currently I am busy writing some applications in C# (.NET) that use hardware for data acquisition. Now the following question, which is a burning one for me:
For example: I have a machine that has an end switch for detecting the final position of an axis. I am using a USB data acquisition module to read the data. Currently I am using a Thread to continuously read the port status.
There is no interrupt functionality on this device.
My question: Is this the right way? Should I use timers, threads or Tasks? I know polling is something that most of you guys "hate", but any suggestion is welcome!
IMO, this heavily depends on your exact environment, but first off: you should not use raw Threads anymore in most cases. Tasks are the more convenient and more powerful solution for this.
Low polling frequency: Timer + polling in the Tick event:
A timer is easy to handle and stop. No need to worry about threads/tasks running in the background, but the handling happens in the main thread
Medium polling frequency: Task + await Task.Delay(delay):
await Task.Delay(delay) does not block a thread-pool thread, but because of the context switching the minimum delay is ~15ms
High polling frequency: Task + Thread.Sleep(delay)
usable at 1ms delays - we actually do this to poll our USB measurement device
This could be implemented as follows:
int delay = 1;
var cancellationTokenSource = new CancellationTokenSource();
var token = cancellationTokenSource.Token;
var listener = Task.Factory.StartNew(() =>
{
while (true)
{
// poll hardware
Thread.Sleep(delay);
if (token.IsCancellationRequested)
break;
}
// cleanup, e.g. close connection
}, token, TaskCreationOptions.LongRunning, TaskScheduler.Default);
In most cases you can just use Task.Run(() => DoWork(), token), but there is no overload to supply the TaskCreationOptions.LongRunning option which tells the task-scheduler to not use a normal thread-pool thread.
But as you can see, Tasks are easier to handle (and awaitable, though that does not apply here). In particular, "stopping" is just a matter of calling cancellationTokenSource.Cancel() from anywhere in the code.
You can even share this token between multiple actions and stop them all at once. Also, tasks that have not yet started are not started once the token is cancelled.
You can also attach another action to a task to run after one task:
listener.ContinueWith(t => ShutDown(t));
This is then executed after the listener completes and you can do cleanup (t.Exception contains the exception of the tasks action if it was not successful).
IMO polling cannot be avoided.
What you can do is create a module with its own independent thread/Task that polls the port regularly. Based on changes in the data, this module raises an event which is handled by the consuming application(s).
Maybe something like:
public async Task Poll(Func<bool> condition, TimeSpan timeout, string message = null)
{
// TimeoutTracker is the small timeout-tracking helper struct defined in the linked ReaderWriterLockSlim source:
// https://github.com/dotnet/corefx/blob/3b24c535852d19274362ad3dbc75e932b7d41766/src/Common/src/CoreLib/System/Threading/ReaderWriterLockSlim.cs#L233
var timeoutTracker = new TimeoutTracker(timeout);
while (!condition())
{
await Task.Yield();
if (timeoutTracker.IsExpired)
{
if (message != null) throw new TimeoutException(message);
else throw new TimeoutException();
}
}
}
Look into SpinWait or into the Task.Delay internals as well.
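If a busy-ish wait is acceptable, SpinWait.SpinUntil already combines a condition check with a timeout. A small self-contained sketch; replace the stopwatch-based condition with your real port-status check:
var sw = System.Diagnostics.Stopwatch.StartNew();
// spins/yields until the condition returns true or the timeout elapses
bool completed = SpinWait.SpinUntil(() => sw.ElapsedMilliseconds > 100, TimeSpan.FromSeconds(5));
if (!completed)
{
throw new TimeoutException("Condition was not met within 5 seconds.");
}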
I've been thinking about this, and what you could probably do is build an abstraction layer utilizing Tasks and Func/Action, with the polling service taking the Func/Action and the polling interval as arguments. This would keep the implementations separate while leaving them open to injection into the polling service.
So for example you'd have something like this serve as your polling class
public class PollingService
{
// PollingException is a custom exception type you would define yourself.
public async Task Poll(Func<bool> func, int interval, string exceptionMessage)
{
while (func.Invoke())
{
await Task.Delay(interval);
}
throw new PollingException(exceptionMessage);
}
public async Task Poll<T>(Func<T, bool> func, T arg, int interval, string exceptionMessage)
{
while (func.Invoke(arg))
{
await Task.Delay(interval);
}
throw new PollingException(exceptionMessage);
}
}
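As a rough usage sketch (PollingException comes from the class above; everything else is self-contained):
var poller = new PollingService();
int attempts = 0;
try
{
// keeps polling every 10 ms for as long as the lambda returns true
await poller.Poll(() => ++attempts < 5, 10, "Stopped after 5 attempts.");
}
catch (PollingException ex)
{
Console.WriteLine(ex.Message); // thrown once the lambda returns false
}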
