Event Hub input binding for Azure Functions - c#

I have an Azure Function with an input binding to an Event Hub.
public static async Task Run(TraceWriter log, string eventHubMessage)
When the function is triggered, how many messages does it receive per execution by default?
Is it 1 execution = 1 message?
I have read the documentation and understand you can set these properties in the function's host.json file:
"eventHub": {
// The maximum event count received per receive loop. The default is 64.
"maxBatchSize": 64,
// The default PrefetchCount that will be used by the underlying EventProcessorHost.
"prefetchCount": 256
}
Does maxBatchSize mean I will receive 64 messages in 1 execution?

By default it's going to be 1 by 1 processing, but you can do batches too. Change the signature of your function to
public static async Task Run(TraceWriter log, string[] eventHubMessages)
(if you change the name like I did, rename the binding parameter too)
Reference github issue.

@Mikhail is correct. I'd just like to add the following:
If you use the default EventHub-Trigger C# template, the Function created will process 1 message per execution.
If you need each execution to process in batches, change the following:
a. In function.json, add the property "cardinality":"many" as shown here.
b. In run.csx, modify Function signature and process messages in a loop, e.g.,
public static async Task Run(TraceWriter log, string[] eventHubMessages)
{
foreach(string message in eventHubMessages)
{
// process messages
}
}
The host.json configuration you specified in the question allows you to experiment with the correct batch size and prefetch buffer to meet the needs of your workflow.
Additional comments:
Under the Consumption Plan, a Function is currently allowed a default maximum execution time of 5 minutes (configurable up to 10 minutes; added on 11/30/2017). You should experiment with the maxBatchSize and prefetchCount settings to ensure that a typical batch completes within that timeframe.
The prefetchCount should be 3-4 times the maxBatchSize.
Each Function host instance is backed by a single EventProcessorHost (EPH). EPH uses a checkpointing mechanism to mark the last successfully processed message. A Function execution could terminate prematurely due to uncaught exceptions in the Function code, the host crashing, a timeout, or a lost partition lease, resulting in an unsuccessful checkpoint. When the Function execution restarts, the batch retrieved will contain messages from the last known checkpoint onward. Setting a very high value for maxBatchSize therefore also means that you may have to re-process a large batch. EventHub guarantees at-least-once delivery but not at-most-once delivery. Azure Functions will not attempt to change that behavior. If having only unique messages is a priority, you will need to handle de-duplication in your downstream workflows.
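Because delivery is at-least-once, a batch replayed from the last checkpoint can contain messages that were already handled. One way to tolerate that is to remember which messages have been seen. The following is only a minimal sketch, not part of the Functions runtime: SeenMessageTracker, BatchProcessor and the idea of keying on a message ID are assumptions made for illustration. An in-memory set only helps within a single host instance, so a real workflow would likely keep this state in a durable store.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: remembers message IDs that were already processed.
public class SeenMessageTracker
{
    private readonly HashSet<string> _seen = new HashSet<string>();

    // Returns true the first time an ID is offered, false for repeats.
    public bool TryMarkProcessed(string messageId) => _seen.Add(messageId);
}

public static class BatchProcessor
{
    // Processes each message at most once per tracker and returns how many were new.
    public static int ProcessBatch(IEnumerable<string> messageIds, SeenMessageTracker tracker)
    {
        int processed = 0;
        foreach (string id in messageIds)
        {
            if (!tracker.TryMarkProcessed(id))
            {
                continue; // duplicate from a replayed batch; skip it
            }
            // ... do the real work for this message here ...
            processed++;
        }
        return processed;
    }
}
```

If a batch of IDs a, b, c is followed by a replayed batch b, c, d, only d is processed the second time.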

Related

Deferring and re-receiving a deferred message in an IHostBuilder hosted service

If the processing of an Azure Service Bus message depends on another resource, e.g. an API or a database service, and this resource is not available, not calling CompleteMessageAsync() is not an option, because the message will be immediately received again until the Max Delivery Count is reached, and then put into the DLQ. If an API is down for maintenance, we want to wait a bit before retrying.
One of the answers to this question has the general steps for deferring and receiving deferred messages. This is a little better than Microsoft's documentation, but not enough for me to understand the intent of the API, and how it is to be implemented in a hosted service that basically sits in ServiceBusProcessor.StartProcessingAsync all day long.
This is the basic structure of my service:
public class ServiceBusWatcher : IHostedService, IDisposable
{
public Task StartAsync(CancellationToken stoppingToken)
{
ReceiveMessagesAsync();
return Task.CompletedTask;
}
private async void ReceiveMessagesAsync()
{
ServiceBusClient client = new ServiceBusClient(connectionString);
processor = client.CreateProcessor(queueName, new ServiceBusProcessorOptions());
processor.ProcessMessageAsync += MessageHandler;
await processor.StartProcessingAsync();
}
async Task MessageHandler(ProcessMessageEventArgs args)
{
// a dependency is not available that allows me to process a message. so:
await args.DeferMessageAsync(args.Message);
}
}
Once the message is deferred, it is my understanding that the processor will not get to it anymore (or will it?). Instead, I have to use ReceiveDeferredMessageAsync() to receive it, along with the sequence number of the originally received message.
In my case, it will make sense to wait minutes or hours before trying again.
This could be done with a separate service that uses a timer and an explicit call to ReceiveDeferredMessageAsync(), as opposed to using a ServiceBusProcessor. I also suppose that the deferred message sequence numbers will have to be persisted in non-volatile storage so that they don't get lost.
Does this sound like a viable approach? I don't like having to remember its sequence numbers so that I can get to a message later. It goes against everything that using a message queue brings to the table in the first place.
Or, instead of deferring, I could just post a new "internal" message with the sequence number and use the ScheduledEnqueueTimeUtc property to delay receiving it. Once I receive this message, I could call ReceiveDeferredMessageAsync() with that sequence number to get to the original message. This seems elegant at the surface, but messages could quickly multiply if there is a longer outage of a dependency.
Another idea that could work without another service: I could complete and repost the payload of the message and set ScheduledEnqueueTimeUtc to a time in the future, as described in another answer to the question I mentioned earlier. Assuming that this works (Microsoft's documentation does not mention what this property is for), it seems simple and clean, and I like simple.
How have you solved this? Is there a better/preferred way that balances low complexity with high robustness without requiring a large amount of code?
Deferring a message works when you know which message you want to retrieve later and your receiver has saved the message sequence number needed to retrieve the deferred message. If the receiver has no ability to save the message sequence number, then delaying the message is a better option. Delaying a message means copying the original message data into a newly scheduled message and completing the original one. That way the consumer has to neither hold on to the message sequence number nor initiate the retrieval of a specific message.
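The delaying approach could look roughly like the sketch below. It assumes the Azure.Messaging.ServiceBus package that the question's ServiceBusProcessor comes from. DelayingHandler, DependencyIsAvailableAsync and the fixed retry delay are hypothetical names introduced for illustration, and the sender is assumed to point at the same queue the processor reads from.

```csharp
using System;
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

public class DelayingHandler
{
    private readonly ServiceBusSender _sender; // assumed: sender for the same queue
    private readonly TimeSpan _retryDelay;     // hypothetical fixed delay

    public DelayingHandler(ServiceBusSender sender, TimeSpan retryDelay)
    {
        _sender = sender;
        _retryDelay = retryDelay;
    }

    public async Task MessageHandler(ProcessMessageEventArgs args)
    {
        if (!await DependencyIsAvailableAsync())
        {
            // Copy the body and properties of the received message into a new one.
            var copy = new ServiceBusMessage(args.Message);

            // Enqueue the copy so it only becomes available after the delay.
            await _sender.ScheduleMessageAsync(copy, DateTimeOffset.UtcNow.Add(_retryDelay));

            // Complete the original so it is not redelivered immediately.
            await args.CompleteMessageAsync(args.Message);
            return;
        }

        // ... normal processing goes here ...
        await args.CompleteMessageAsync(args.Message);
    }

    // Placeholder for a real availability check of the API or database.
    private Task<bool> DependencyIsAvailableAsync() => Task.FromResult(false);
}
```

Note that scheduling the copy and completing the original are two separate operations in this sketch. If the second one fails, the message can end up in the queue twice, so the consumer should tolerate duplicates.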

How do DocumentDBAttribute bindings respond to throttling?

I have azure functions (C# v1 functions--non scripted) that use DocumentDBAttribute bindings for both reading and writing documents. How do those bindings respond to throttling in the following situations?
Writing an item by adding it to an ICollector
Reading an item by providing an Id
This is for functions v1.
First case:
//output binding (ICollector)
[DocumentDB(ResourceNames.APCosmosDBName,
ResourceNames.EpisodeOfCareCollectionName,
ConnectionStringSetting = "APCosmosDB",
CreateIfNotExists = true)] ICollector<EOC> eoc,
//...
eoc.Add(new EOC()); //what happens here if throttling is occurring?
Second case:
[DocumentDB(ResourceNames.ORHCasesDBName, ResourceNames.ORHCasesCollectionName, ConnectionStringSetting = "ORHCosmosDBCases", CreateIfNotExists = true, Id = "{id}")] string closedCaseStr,
Both input and output bindings use the Cosmos DB SDK, which has a retry mechanism in place.
By default, the SDK retries 9 times on a throttled result. After that, the exception bubbles up and your Function fails. What happens next depends on the trigger type: an HTTP call fails, a queue message is put back on the queue, and so on.
The retries respect the timing recommendation returned by Cosmos DB:
When a client is sending requests faster than the allowed rate, the service will return HttpStatusCode 429 (Too Many Requests) to rate limit the client. The current implementation in the SDK will then wait for the amount of time the service tells it to wait and retry after the time has elapsed.
At the moment, there is no way to configure the bindings with a policy other than default.
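To picture the behaviour described above, here is a small self-contained model of "wait for the time the service asks for, then retry, and give up after 9 retries". It is not the SDK's implementation: ThrottledException and ThrottleRetry are hypothetical names, and only the retry count of 9 comes from the description above.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a 429 response that carries the service's wait hint.
public class ThrottledException : Exception
{
    public TimeSpan RetryAfter { get; }

    public ThrottledException(TimeSpan retryAfter) : base("429 Too Many Requests")
    {
        RetryAfter = retryAfter;
    }
}

public static class ThrottleRetry
{
    // Runs the operation. On a throttled result it waits for the hinted time and
    // retries, up to maxRetries times. After that the exception reaches the caller.
    public static async Task<T> RunAsync<T>(Func<Task<T>> operation, int maxRetries = 9)
    {
        for (int attempt = 0; ; attempt++)
        {
            try
            {
                return await operation();
            }
            catch (ThrottledException ex) when (attempt < maxRetries)
            {
                await Task.Delay(ex.RetryAfter);
            }
        }
    }
}
```

In this model an operation that is always throttled is called 10 times in total (the first attempt plus 9 retries) before the exception propagates, which corresponds to the point where the Function would fail.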

Azure queue handling via ReceiveAsync returns null right away

The expected behaviour for the code below is that ReceiveAsync waits on the Azure queue for up to 1 minute and then returns either a message, if one arrived, or null. The intended use is an IoT hub resource where multiple messages may be added to a queue, each intended for one of several DeviceClient objects. Each DeviceClient continuously polls this queue to receive the messages intended for it. Messages for other DeviceClients are thus left in the queue for those others.
The actual behaviour is that ReceiveAsync is immediately returning null each time it's called, with no delay. This is regardless of the value that is given with TimeSpan - or if no parameters are given (and the default time is used).
So, rather than seeing 1 log item per minute stating that a null message was received, I'm getting 2 log items per second (!). This behaviour is different from a few months ago, so I started some research, with little result so far.
using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.Devices;
using Microsoft.Azure.Devices.Client;
public static TimeSpan receiveMessageWaitTime = new TimeSpan(0, 1 , 0);
Microsoft.Azure.Devices.Client.Message receivedMessage = null;
deviceClient = DeviceClient.CreateFromConnectionString(Settings.lastKnownConnectionString, Microsoft.Azure.Devices.Client.TransportType.Amqp);
// This code is within an infinite loop/task/with try/except code
if(deviceClient != null)
{
receivedMessage = await deviceClient.ReceiveAsync(receiveMessageWaitTime);
if(receivedMessage != null)
{
string Json = Encoding.ASCII.GetString(receivedMessage.GetBytes());
// Handle the message
}
else
{
// Log the fact that we got a null message, and try again later
}
await Task.Delay(500); // Give the CPU some time, this is an infinite loop after all.
}
I looked at the Azure hub and noticed 8 messages in the queue. I then added 2 more; neither of the new messages was received, and the queue now holds 10 items.
I did notice this question: Azure ServiceBus: Client.Receive() returns null for messages > 64 KB
But I have no way to see whether there is indeed a message that big currently in the queue (since ReceiveAsync returns null...)
As such the questions:
Could you preview the messages in the queue?
Could you get a queue size, e.g. ask the number of messages in the queue before getting them?
Could you delete messages from the queue without getting them?
Could you create a callback based receive instead of an infinite loop? (I guess internally the code would just do a peek and the same as we are already doing)
Any help would be greatly appreciated.
If you use Azure Service Bus, I recommend Service Bus Explorer: it lets you preview the messages and get the number of messages in the queue. You can also delete messages without receiving them.

Do Webjobs automatically renew leases on Azure Queue messages?

When Webjobs get a message from a queue on Azure Storage via QueueTrigger, it leases the message (makes it invisible). If the triggering function (of webjob) takes a long time to process the message, is this lease automatically extended? Or should I handle that in the function?
On this link Windows Azure Queues: Improved Leases, Progress Tracking, and Scheduling of Future Work, the author states that "A lease on the message can be extended by the worker that did the original dequeue so that it can continue processing the message"
Note: I've tried a webjob (with a QueueTrigger) which waits for 20 minutes.
//Write Log
Thread.Sleep(1200000);
//Write Log
It completed successfully, and during this time no other webjob instance attempted to process the same queue item (it did not become visible). It therefore seems that an auto-renew mechanism for leases exists. Still, I am hoping for an answer from a Microsoft employee or an official link (msdn, azure, ...).
Yes, your lease is automatically extended. It is 10 minutes each time.
See this answer here [1] by a Microsoft employee referring to the docs and comments on azure.microsoft.com [2].
EDIT (long answer)
In addition, an examination of the source code, starting with the QueueListener class at https://github.com/Azure/azure-webjobs-sdk/blob/cfc875a7f00e595410c0603e6ca65537025490a9/src/Microsoft.Azure.WebJobs.Host/Queues/Listeners/QueueListener.cs indicates the same.
The code in QueueListener has the relevant parts on line 138, where the 10 minute visibilityTimeout variable is defined:
TimeSpan visibilityTimeout = TimeSpan.FromMinutes(10); // long enough to process the job
That variable is then passed along to ProcessMessageAsync, which starts a timer defined in the method CreateUpdateMessageVisibilityTimer with that same value. The 10 minute value also determines when the first and subsequent updates of the visibility timeout happen (it is halved and passed to an instance of the LinearSpeedupStrategy class).
Eventually, in the class UpdateQueueMessageVisibilityCommand [3], you will find that the UpdateMessageAsync method on the queue is called with that same 10 minute renewal.
The LinearSpeedupStrategy will renew again after 5 minutes, unless the renewal failed in which case it will try again after 1 minute (as defined in QueueListener).
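The timing described above can be modelled in a few lines. This is a simplified sketch of the description in this answer, not the SDK's own classes: LeaseRenewalModel and its members are hypothetical names, and the 10-minute, 5-minute and 1-minute values are the ones quoted above.

```csharp
using System;
using System.Collections.Generic;

// Simplified model of the renewal timing described above. Not the WebJobs SDK's code.
public static class LeaseRenewalModel
{
    public static readonly TimeSpan VisibilityTimeout = TimeSpan.FromMinutes(10);
    public static readonly TimeSpan RetryAfterFailure = TimeSpan.FromMinutes(1);

    // Delay before the next renewal attempt.
    public static TimeSpan NextDelay(bool lastRenewalSucceeded)
    {
        return lastRenewalSucceeded
            ? TimeSpan.FromTicks(VisibilityTimeout.Ticks / 2) // half the timeout
            : RetryAfterFailure;                              // retry sooner after a failure
    }

    // Renewal times (offsets from dequeue) for a job, assuming every renewal succeeds.
    public static List<TimeSpan> RenewalTimes(TimeSpan jobDuration)
    {
        var times = new List<TimeSpan>();
        TimeSpan interval = NextDelay(lastRenewalSucceeded: true);
        for (TimeSpan t = interval; t < jobDuration; t += interval)
        {
            times.Add(t);
        }
        return times;
    }
}
```

For the 20-minute job in the question, this model yields renewals at 5, 10 and 15 minutes, each one pushing the visibility timeout out by another 10 minutes, so the message never becomes visible while the function is still running.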
[1] Azure Storage Queue and multiple WebJobs instances: will QueueTrigger set the message lease time on triggered?
[2] https://azure.microsoft.com/en-us/documentation/articles/websites-dotnet-webjobs-sdk-get-started/
[3] https://github.com/Azure/azure-webjobs-sdk/blob/cfc875a7f00e595410c0603e6ca65537025490a9/src/Microsoft.Azure.WebJobs.Host/Queues/Listeners/UpdateQueueMessageVisibilityCommand.cs
In Java, you can call:
queue.retrieveMessage()
to get a message from a queue on Azure Storage. The message becomes visible again after 30 seconds by default.
If you want to extend the lease, you can use code below:
CloudQueueMessage updateMessage = queue.retrieveMessage();
EnumSet<MessageUpdateFields> updateFields = EnumSet.of(MessageUpdateFields.CONTENT, MessageUpdateFields.VISIBILITY);
queue.updateMessage(updateMessage, 60, updateFields, null, null);
This means your message will be able to be processed for another 60 seconds.

Odd Behavior of Azure Service Bus ReceiveBatch()

I'm currently working with an Azure Service Bus Topic and running into an issue receiving my messages with the ReceiveBatch method: the results I get are not the results I expect. Here is the basic code setup; the use cases are below:
SubscriptionClient client = SubscriptionClient.CreateFromConnectionString(connectionString, convoTopic, subName);
IEnumerable<BrokeredMessage> messageList = client.ReceiveBatch(100);
foreach (BrokeredMessage message in messageList)
{
try
{
Console.WriteLine(message.GetBody<string>() + message.MessageId);
message.Complete();
}
catch (Exception ex)
{
message.Abandon();
}
}
client.Close();
MessageBox.Show("Done");
1. Using the above code, if I send 4 messages and then poll, on the first run through I get the first message. On the second run through I get the other 3. I'm expecting to get all 4 at the same time. It seems to always return a single message on the first poll and the rest on subsequent polls (same result with 3 and 5 messages: I get 1 message on the first try and the remaining n-1 of the n messages sent on the second try).
2. If I have 0 messages to receive, the operation takes between ~30-60 seconds to get the messageList (which has a count of 0). I need this to return instantly.
3. If I change the code to IEnumerable<BrokeredMessage> messageList = client.ReceiveBatch(100, new TimeSpan(0,0,0)); then issue #2 goes away, but issue #1 still persists: I have to call the code twice to get all the messages.
I'm assuming that issue #2 is caused by a default timeout value, which I override in #3 (though I find it confusing that if a message is there it responds immediately without waiting the default time). I am not sure, however, why I never receive the full number of messages in a single ReceiveBatch.
The way I got ReceiveBatch() to work properly was to do two things.
Disable Partitioning in the Topic (I had to make a new topic for this because you can't toggle that after creation)
Enable Batching on each subscription created like so:
SubscriptionDescription sd = new SubscriptionDescription(topicName, orgSubName);
sd.EnableBatchedOperations = true;
After I did those two things, I was able to get the topics to work as intended using IEnumerable<BrokeredMessage> messageList = client.ReceiveBatch(100, new TimeSpan(0,0,0));
I'm having a similar problem with an ASB Queue. I discovered that I could mitigate it somewhat by increasing the PrefetchCount on the client prior to receiving the batch:
SubscriptionClient client = SubscriptionClient.CreateFromConnectionString(connectionString, convoTopic, subName);
client.PrefetchCount = 100;
IEnumerable<BrokeredMessage> messageList = client.ReceiveBatch(100);
From the Azure Service Bus Best Practices for Performance Improvements Using Service Bus Brokered Messaging:
Prefetching enables the queue or subscription client to load additional messages from the service when it performs a receive operation.
...
When using the default lock expiration of 60 seconds, a good value for SubscriptionClient.PrefetchCount is 20 times the maximum processing rates of all receivers of the factory. For example, a factory creates 3 receivers, and each receiver can process up to 10 messages per second. The prefetch count should not exceed 20*3*10 = 600.
...
Prefetching messages increases the overall throughput for a queue or subscription because it reduces the overall number of message operations, or round trips. Fetching the first message, however, will take longer (due to the increased message size). Receiving prefetched messages will be faster because these messages have already been downloaded by the client.
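As a quick check of the arithmetic in the quoted guidance, the rule of thumb can be written as a one-line calculation. PrefetchSizing and MaxPrefetchCount are hypothetical names used only for this sketch. The factor of 20 is the one quoted above for the default 60-second lock expiration.

```csharp
using System;

public static class PrefetchSizing
{
    // Upper bound from the quoted guidance:
    // 20 x number of receivers x messages per second each receiver can process.
    public static int MaxPrefetchCount(int receiverCount, int messagesPerSecondPerReceiver)
    {
        const int Factor = 20; // quoted for the default 60-second lock expiration
        return Factor * receiverCount * messagesPerSecondPerReceiver;
    }
}
```

With 3 receivers at 10 messages per second each, MaxPrefetchCount(3, 10) gives 600, matching the example in the quote.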
Just a few more pieces to the puzzle. I still couldn't get it to work even after enabling batching and disabling partitioning; I still had to do two ReceiveBatch calls. I did find, however:
Restarting the Service Bus services (I am using Service Bus for Windows Server) cleared up the issue for me.
Doing a single ReceiveBatch and taking no action (letting the message locks expire) and then doing another ReceiveBatch caused all of the messages to come through at the same time. (Doing an initial ReceiveBatch and calling Abandon on all of the messages didn't cause that behavior.)
So it appears to be some sort of corruption/bug in Service Bus's in-memory cache.
