ServiceBusProcessor cannot be used without specifying ProcessMessageAsync and ProcessErrorAsync. What the first handler does is very clear, but I'm not sure what to do in ProcessErrorAsync.
Are the following methods identical?
var client = new ServiceBusClient(connectionString);
var processor = client.CreateProcessor(queueName);
processor.ProcessMessageAsync += async arg =>
{
    try
    {
        //process message
    }
    catch (Exception ex)
    {
        Console.WriteLine(ex.Message);
    }
};
processor.ProcessErrorAsync += arg =>
{
    return Task.CompletedTask;
};
await processor.StartProcessingAsync(cancellationToken);
and
var client = new ServiceBusClient(connectionString);
var processor = client.CreateProcessor(queueName);
processor.ProcessMessageAsync += async arg =>
{
    //process message
};
processor.ProcessErrorAsync += arg =>
{
    Console.WriteLine(arg.Exception.Message);
    return Task.CompletedTask;
};
await processor.StartProcessingAsync(cancellationToken);
ProcessMessageAsync is a handler that will be called each time a message is read from your Service Bus instance and needs to be processed. This is where your business logic for handling messages should be.
ProcessErrorAsync is a handler that allows you to observe exceptions that occur during processor operation - both in your message processing code and within the processor infrastructure itself.
The processor is built to be resilient and does its best to recover from problems and continue processing. Because of this, it will shrug off most exceptions, surface them to the handler, and then continue moving forward. The error handler is how your application is notified of problems and can take the actions appropriate for it.
As for what you should do in the handler, much of that depends on your application and its needs. At minimum, most applications want to understand when errors occur and log them in case analysis is needed at a later point. You may also want to use this to detect poison messages or other processing problems and take the action that is appropriate for your application.
The processor has no knowledge of your application's design or needs, nor that of the environment in which it is hosted. That means that it cannot make intelligent decisions about when a normally transient issue should be fatal or when there's a bigger issue in your application ecosystem. It is important to remember that your application is responsible for understanding the pattern of errors and determining if the application or processor is not healthy in a non-obvious way.
For example, if the processor cannot reach your Service Bus instance, it will continue to retry forever. If your application sees these exceptions consistently for a period of time, it may be a sign of an unhealthy host network and your application may choose to stop processing and reset the host. Likewise, if your application is expecting a specific schema for incoming messages and those published to your Service Bus instance aren't correct, the processor will continue to try handling them, but your application should be better able to recognize the larger problem and take the appropriate action.
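One way to spot "these exceptions consistently for a period of time" is a small sliding-window counter that the error handler feeds. The sketch below is only an illustration: ErrorWindow, its window length and its threshold are hypothetical names and values, not part of the Service Bus SDK, and what to do once the threshold is reached remains an application decision.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of the SDK): counts errors inside a sliding time window.
public sealed class ErrorWindow
{
    private readonly Queue<DateTimeOffset> _errors = new Queue<DateTimeOffset>();
    private readonly TimeSpan _window;
    private readonly int _threshold;

    public ErrorWindow(TimeSpan window, int threshold)
    {
        _window = window;
        _threshold = threshold;
    }

    // Records one error at time `now` and returns true once the number of
    // errors still inside the window has reached the threshold.
    public bool Record(DateTimeOffset now)
    {
        lock (_errors)
        {
            _errors.Enqueue(now);

            // Drop errors that are older than the window.
            while (_errors.Count > 0 && now - _errors.Peek() > _window)
            {
                _errors.Dequeue();
            }

            return _errors.Count >= _threshold;
        }
    }
}

// Possible wiring inside the error handler (window and threshold are assumed values):
//
//   var window = new ErrorWindow(TimeSpan.FromMinutes(5), 20);
//   processor.ProcessErrorAsync += args =>
//   {
//       Console.WriteLine($"{args.ErrorSource} on {args.EntityPath}: {args.Exception}");
//       if (window.Record(DateTimeOffset.UtcNow))
//       {
//           // Application-specific: raise an alert, stop processing, recycle the host...
//       }
//       return Task.CompletedTask;
//   };
```

Keeping the counting logic outside the handler makes it easy to unit test without a Service Bus connection.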
Introduction
Hello all, we're currently working on a microservice platform that uses Azure EventHubs and events to send data between the services.
Let's just name these services: CustomerService, OrderService and MobileBFF.
The CustomerService mainly sends updates (with events) which will then be stored by the OrderService and MobileBFF to be able to respond to queries without having to call the CustomerService for this data.
All these 3 services + our developers on the DEV environment make use of the same ConsumerGroup to connect to these event hubs.
We currently make use of only 1 partition but plan to expand to multiple later. (You can see our code is already made to be able to read from multiple partitions)
Exception
Every now and then we're running into an exception though (if it starts it usually keeps throwing this error for an hour or something). For now we've only seen this error on DEV/TEST environments though.
The exception:
Azure.Messaging.EventHubs.EventHubsException(ConsumerDisconnected): At least one receiver for the endpoint is created with epoch of '0', and so non-epoch receiver is not allowed. Either reconnect with a higher epoch, or make sure all epoch receivers are closed or disconnected.
All consumers of the EventHub store their SequenceNumber in their own database. This allows each consumer to consume events separately and to keep the last processed SequenceNumber in its own SQL database. When the service (re)starts, it loads the SequenceNumber from the db and then requests events from there onwards until no more events can be found. It then sleeps for 100ms and retries. Here's the (somewhat simplified) code:
var consumerGroup = EventHubConsumerClient.DefaultConsumerGroupName;
string[] allPartitions = null;
await using (var consumer = new EventHubConsumerClient(consumerGroup, _inboxOptions.EventHubConnectionString, _inboxOptions.EventHubName))
{
    allPartitions = await consumer.GetPartitionIdsAsync(stoppingToken);
}
var allTasks = new List<Task>();
foreach (var partitionId in allPartitions)
{
    //This is required if you reuse variables inside a Task.Run();
    var partitionIdInternal = partitionId;
    allTasks.Add(Task.Run(async () =>
    {
        while (!stoppingToken.IsCancellationRequested)
        {
            try
            {
                await using (var consumer = new EventHubConsumerClient(consumerGroup, _inboxOptions.EventHubConnectionString, _inboxOptions.EventHubName))
                {
                    //The scope stays open while reading because messageProcessor is used in the loop below
                    using var testScope = _serviceProvider.CreateScope();
                    var messageProcessor = testScope.ServiceProvider.GetService<EventHubInboxManager<T, EH>>();
                    //Obtains starting position from the database or sets to "Earliest" or "Latest" based on configuration
                    EventPosition startingPosition = await messageProcessor.GetStartingPosition(_inboxOptions.InboxIdentifier, partitionIdInternal);
                    while (!stoppingToken.IsCancellationRequested)
                    {
                        bool processedSomething = false;
                        await foreach (PartitionEvent partitionEvent in consumer.ReadEventsFromPartitionAsync(partitionIdInternal, startingPosition, stoppingToken))
                        {
                            processedSomething = true;
                            startingPosition = await messageProcessor.Handle(partitionEvent);
                        }
                        if (processedSomething == false)
                        {
                            await Task.Delay(100, stoppingToken);
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                //Log error / delay / retry
            }
        }
    }));
}
The exception is thrown on the following line:
await using (var consumer = new EventHubConsumerClient(consumerGroup, _inboxOptions.EventHubConnectionString, _inboxOptions.EventHubName))
More investigation
The code described above is running in the MicroServices (which are hosted as AppServices in Azure)
Next to that we're also running 1 Azure Function that also reads events from the EventHub. (Probably uses the same consumer group).
According to the documentation here: https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-features#consumer-groups it should be possible to have 5 consumers per consumer group. It seems to be suggested to only have one, but it's not clear to us what could happen if we don't follow this guidance.
We did some tests with manually spawning multiple instances of our service that reads events, and when there were more than 5 this resulted in a different error which stated quite clearly that there could only be 5 consumers per partition per consumer group (or something similar).
Furthermore, it seems (we're not 100% sure) that this issue started happening when we rewrote the code (above) to be able to spawn one thread per partition, even though we only have 1 partition in the EventHub. Edit: we did some more log-digging and also found a few exceptions from before we merged in the code to spawn one thread per partition.
That exception indicates that there is another consumer configured to use the same consumer group and asserting exclusive access over the partition. Unless you're explicitly setting the OwnerLevel property in your client options, the likely candidate is that there is at least one EventProcessorClient running.
To remediate, you can:
Stop any event processors running against the same Event Hub and Consumer Group combination, and ensure that no other consumers are explicitly setting the OwnerLevel.
Run these consumers in a dedicated consumer group; this will allow them to co-exist with the exclusive consumer(s) and/or event processors.
Explicitly set the OwnerLevel to 1 or greater for these consumers; that will assert ownership and force any other consumers in the same consumer group to disconnect.
(note: depending on what the other consumer is, you may need to test different values here. The event processor types use 0, so anything above that will take precedence.)
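The last two options can be sketched with the consumer client from the question. This is a hedged illustration rather than a drop-in fix: the class and method names are made up, the consumer group passed in is assumed to already exist on the Event Hub, and either a dedicated consumer group or an OwnerLevel is normally enough on its own.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Azure.Messaging.EventHubs.Consumer;

public static class PartitionReader
{
    // consumerGroup: pass a dedicated group to co-exist with other readers.
    // ownerLevel: pass 1 or greater to assert exclusive ownership, or null to stay non-exclusive.
    public static async Task ReadAsync(
        string consumerGroup,
        string connectionString,
        string eventHubName,
        string partitionId,
        EventPosition startingPosition,
        long? ownerLevel,
        CancellationToken cancellationToken)
    {
        await using var consumer = new EventHubConsumerClient(consumerGroup, connectionString, eventHubName);

        // A reader with a higher OwnerLevel forces lower-level readers of the same
        // partition and consumer group to disconnect.
        var readOptions = new ReadEventOptions { OwnerLevel = ownerLevel };

        await foreach (PartitionEvent partitionEvent in consumer.ReadEventsFromPartitionAsync(
            partitionId, startingPosition, readOptions, cancellationToken))
        {
            Console.WriteLine($"Sequence number: {partitionEvent.Data.SequenceNumber}");
        }
    }
}
```

Note that asserting ownership only moves the ConsumerDisconnected exception to whichever reader loses, so it fits best when exactly one reader per partition is intended.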
To add to Jesse's answer, I think the exception message is part of the old SDK.
If you look into the docs, there are 3 types of receiving modes defined there:
Epoch
Epoch is a unique identifier (epoch value) that the service uses, to enforce partition/lease ownership.
The epoch feature provides users the ability to ensure that there is only one receiver on a consumer group at any point in time...
Non-epoch:
... There are some scenarios in stream processing where users would like to create multiple receivers on a single consumer group. To support such scenarios, we do have ability to create a receiver without epoch and in this case we allow upto 5 concurrent receivers on the consumer group.
Mixed:
... If there is a receiver already created with epoch e1 and is actively receiving events and a new receiver is created with no epoch, the creation of new receiver will fail. Epoch receivers always take precedence in the system.
I have some pretty naive code:
public async Task Produce(string topic, object message, MessageHeader messageHeaders)
{
    try
    {
        var producerClient = _EventHubProducerClientFactory.Get(topic);
        var eventData = CreateEventData(message, messageHeaders);
        messageHeaders.Times?.Add(DateTime.Now);
        await producerClient.SendAsync(new EventData[] { eventData });
        messageHeaders.Times?.Add(DateTime.Now);
        //.....
        Log.Info($"Milliseconds spent: {(messageHeaders.Times[1] - messageHeaders.Times[0]).TotalMilliseconds}");
    }
}
private EventData CreateEventData(object message, MessageHeader messageHeaders)
{
    var eventData = new EventData(Encoding.UTF8.GetBytes(JsonConvert.SerializeObject(message)));
    eventData.Properties.Add("CorrelationId", messageHeaders.CorrelationId);
    if (messageHeaders.DateTime != null)
        eventData.Properties.Add("DateTime", messageHeaders.DateTime?.ToString("s"));
    if (messageHeaders.Version != null)
        eventData.Properties.Add("Version", messageHeaders.Version);
    return eventData;
}
In the logs I see values of almost 1 second (~800 milliseconds).
What could be a reason for such long execution time?
The EventHubProducerClient opens connections to the Event Hubs service lazily, waiting until the first time an operation requires it. In your snippet, the call to SendAsync triggers an AMQP connection to be created, an AMQP link to be created, and authentication to be performed.
Unless the client is closed, most future calls won't incur that overhead as the connection and link are persistent. Most being an important distinction in that statement, as the client may need to reconnect in the face of a network error, when activity is low and the connection idles out, or if the Event Hubs service terminates the connection/link.
As Serkant mentions, if you're looking to understand timings, you'd probably be best served using a library like BenchmarkDotNet that works over a large number of iterations to derive statistically meaningful results.
You are measuring the first 'Send'. That will incur some overhead that other Sends won't. So always warm up first, for example by sending a single event, and then measure the following ones.
Another important thing: it is not right to measure just a single 'Send' call. Measure a bunch of calls instead and calculate latency percentiles. That should provide a better figure for your tests.
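As a rough sketch of that advice, the helper below performs one unmeasured warm-up call, times a number of further calls, and reports nearest-rank percentiles. LatencyProbe and its method names are hypothetical, and for serious measurements a dedicated benchmarking library is still the better tool.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class LatencyProbe
{
    // Runs `operation` once as an unmeasured warm-up, then times `iterations`
    // further calls and returns their durations in milliseconds, sorted ascending.
    public static async Task<double[]> MeasureAsync(Func<Task> operation, int iterations)
    {
        await operation(); // warm-up: pays for connection, link and authentication setup

        var samples = new double[iterations];
        for (var i = 0; i < iterations; i++)
        {
            var stopwatch = Stopwatch.StartNew();
            await operation();
            samples[i] = stopwatch.Elapsed.TotalMilliseconds;
        }

        Array.Sort(samples);
        return samples;
    }

    // Nearest-rank percentile over an ascending-sorted array; p is in the range (0, 100].
    public static double Percentile(double[] sorted, double p)
    {
        if (sorted.Length == 0)
        {
            throw new ArgumentException("At least one sample is required.", nameof(sorted));
        }

        var rank = (int)Math.Ceiling(p * sorted.Length / 100.0);
        return sorted[Math.Max(rank, 1) - 1];
    }
}
```

In the question's code, the delegate passed to MeasureAsync would wrap the SendAsync call, and Percentile(samples, 50) and Percentile(samples, 99) would then give the median and tail latency.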
First let me explain what I have. I have an Azure Service Bus with an Azure Function App. The Service Bus is set up to use SQL Filters to push specific message types into specific topics. The functions in my Azure Function App then pick up the newest message and process it.
A basic example
1: I send a request to my EmailAPI
2: EmailAPI then pushes a new message into the Service Bus with a type of "Email"
3: The SQL Filter then sees the type is "Email" and the message is placed into the email Topic in the Service Bus
4: The EmailListener Azure Function monitors the Service bus and notices a new message
5: Gather the Service Bus message and process it (basically just send the email using the information provided)
Now let's say for some reason the SMTP server connection is a little broken and sometimes we get a TimeOutException when attempting to send the email (EmailListener). When an exception is thrown, the Function App EmailListener attempts to send it again instantly, with no wait at all. It will do this for a total of 10 times and then inform the Service Bus to place the message in the Dead Letter queue.
What I am attempting to do is, when an exception is thrown (such as TimeOutException), wait X amount of time before attempting to process the same message again. I have looked around at many different posts talking about the host.json and attempting to set those settings, but these have not worked. I have found a solution, however it requires you to create a clone of the message, push it back into the Service Bus and give it a delayed process time. I would prefer not to implement my own manual delay system if Azure Service Bus / Function App can deal with retries itself.
The biggest issue I am having (which is probably down to my understanding) is who is at fault? Is it the Service Bus settings to handle the Retry Policy or is it the Azure Function App to deal with attempting to retry after X time.
I have provided some code, but I feel code isn't really going to help explain my question.
// Pseudo code
public static class EmailListenerTrigger
{
    [FunctionName("EmailListenerTrigger")]
    public static void Run([ServiceBusTrigger("messages", "email", Connection = "ConnectionString")]string mySbMsg, TraceWriter log)
    {
        var emailLauncher = new EmailLauncher("SmtpAddress", "SmtpPort", "FromAddress");
        try
        {
            emailLauncher.SendServiceBusMessage(mySbMsg);
        }
        catch(Exception ex)
        {
            log.Info($"Audit Log: {mySbMsg}, Exception: {ex.Message}");
        }
    }
}
reference one: https://blog.kloud.com.au/2017/05/22/message-retry-patterns-in-azure-functions/ (Thread.Sleep doesn't seem like a good idea)
reference two: https://github.com/Azure/azure-functions-host/issues/2192 (Manually implemented retry)
reference three: https://www.feval.ca/posts/function-queue-retry/ (This seems to refer to queues when I am using topics)
reference four: Can the Azure Service Bus be delayed before retrying a message? (Talks about deferring the message, but then you need to manually get it back out of the queue/topic.)
You might be able to solve your issue with the use of Durable Functions. There is, for example, a built-in method CallActivityWithRetryAsync() that can retry when the activity function throws an exception.
https://learn.microsoft.com/en-us/sandbox/functions-recipes/durable-diagnostics#calling-activity-functions-with-retry
Your flow would probably be something like this:
Service Bus triggered Function. This one starts an Orchestrator Function
The orchestrator calls your activity function (using the aforementioned method)
Your email sending is implemented in an Activity Function and can throw exceptions as needed
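The three steps above might look roughly like the following, assuming the Durable Functions 2.x extension for in-process C# functions. The function names, the 30 second first interval, the 5 attempts and the backoff coefficient are illustrative choices only, and the trigger binding is copied from the question.

```csharp
using System;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Azure.WebJobs.Extensions.DurableTask;

public static class EmailRetryFlow
{
    // 1. Service Bus triggered function: only starts the orchestration.
    [FunctionName("EmailListenerTrigger")]
    public static async Task Start(
        [ServiceBusTrigger("messages", "email", Connection = "ConnectionString")] string mySbMsg,
        [DurableClient] IDurableOrchestrationClient starter)
    {
        // A null instance id lets the runtime generate one.
        await starter.StartNewAsync("EmailOrchestrator", null, mySbMsg);
    }

    // 2. Orchestrator: calls the activity and retries it with a growing delay.
    [FunctionName("EmailOrchestrator")]
    public static async Task Orchestrate(
        [OrchestrationTrigger] IDurableOrchestrationContext context)
    {
        var message = context.GetInput<string>();

        // Assumed policy: first retry after 30 seconds, at most 5 attempts, doubling each time.
        var retryOptions = new RetryOptions(TimeSpan.FromSeconds(30), 5)
        {
            BackoffCoefficient = 2.0
        };

        await context.CallActivityWithRetryAsync("SendEmail", retryOptions, message);
    }

    // 3. Activity: sends the email; an exception thrown here triggers the retry policy.
    [FunctionName("SendEmail")]
    public static void SendEmail([ActivityTrigger] string message)
    {
        // Send the email here (for example with the EmailLauncher from the question)
        // and let exceptions propagate instead of catching them.
    }
}
```

With this shape the Service Bus message completes as soon as the orchestration has started, so the waiting between attempts is handled by the orchestration rather than by message redelivery.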
While there is no native support for what you want to do, it is still doable without having to do a lot of custom development. You can basically add a service bus output binding to your Azure function, that is connected to the same queue your function consumes messages from. Then, use a custom property to track the number of retries. The following is an example:
private static TimeSpan[] BackoffDurationsBetweenFailures = new TimeSpan[] { }; // add delays here
[FunctionName("retrying-poc")]
public async Task Run(
    [ServiceBusTrigger("myQueue")] Message rawRequest,
    IDictionary<string, object> userProperties,
    [ServiceBus("myQueue")] IAsyncCollector<Message> collector)
{
    var request = GetRequest(rawRequest);
    var retryCount = GetRetryCount(userProperties);
    var shouldRetry = false;
    try
    {
        await _unreliableService.Call(request);
    }
    catch (Exception ex)
    {
        // I don't retry if it is a timeout, but that's my own choice.
        shouldRetry = !(ex is TimeoutException) && retryCount < BackoffDurationsBetweenFailures.Length;
    }
    if (shouldRetry)
    {
        var retryMessage = new Message(rawRequest.Body);
        retryMessage.UserProperties.Add("RetryCount", retryCount + 1);
        retryMessage.ScheduledEnqueueTimeUtc = DateTime.UtcNow.Add(BackoffDurationsBetweenFailures[retryCount]);
        await collector.AddAsync(retryMessage);
    }
}
private MyBusinessObject GetRequest(Message rawRequest)
    => JsonConvert.DeserializeObject<MyBusinessObject>(Encoding.UTF8.GetString(rawRequest.Body));
private int GetRetryCount(IDictionary<string, object> properties)
    => properties.TryGetValue("RetryCount", out var value) && int.TryParse(value.ToString(), out var retryCount)
        ? retryCount
        : 0;
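The delays themselves are left open above. As a small sketch of how the schedule and the "should I retry" decision could be kept in one place, the hypothetical helper below returns the delay for a given retry count, or null once the schedule is exhausted. The three durations are example values only.

```csharp
using System;

// Hypothetical helper: maps the RetryCount property to the next delay.
public static class RetrySchedule
{
    // Example schedule only: 10 seconds, then 1 minute, then 5 minutes.
    public static readonly TimeSpan[] Delays =
    {
        TimeSpan.FromSeconds(10),
        TimeSpan.FromMinutes(1),
        TimeSpan.FromMinutes(5)
    };

    // Returns the delay to apply before the next attempt, or null when no
    // further retries should be scheduled.
    public static TimeSpan? NextDelay(int retryCount)
    {
        if (retryCount < 0 || retryCount >= Delays.Length)
        {
            return null;
        }

        return Delays[retryCount];
    }
}
```

A null result is the point at which the message would be left to fail or be dead-lettered instead of being rescheduled.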
I have a C# application that sets up numerous MQ listeners (multiple threads and potentially multiple servers each with their own listeners). There are some messages that will come off the queue that I will want to leave on the queue, move on to the next message on the MQ, but then under some circumstances I will want to go back to re-read those messages...
var connectionFactory = XMSFactoryFactory.GetInstance(XMSC.CT_WMQ).CreateConnectionFactory();
connectionFactory.SetStringProperty(XMSC.WMQ_HOST_NAME, origination.Server);
connectionFactory.SetIntProperty(XMSC.WMQ_PORT, int.Parse(origination.Port));
connectionFactory.SetStringProperty(XMSC.WMQ_QUEUE_MANAGER, origination.QueueManager);
connectionFactory.SetStringProperty(XMSC.WMQ_CHANNEL, origination.Channel);
var connection = connectionFactory.CreateConnection(null, null);
_connections.Add(connection);
var session = connection.CreateSession(false, AcknowledgeMode.ClientAcknowledge); //changed to use ClientAcknowledge so that we will leave the message on the MQ until we're sure we're processing it
_sessions.Add(session);
var destination = session.CreateQueue(origination.Queue);
_destinations.Add(destination);
var consumer = session.CreateConsumer(destination);
_consumers.Add(consumer);
Logging.LogDebugMessage(Constants.ListenerStart);
connection.Start();
ThreadPool.QueueUserWorkItem((o) => Receive(forOrigination, consumer));
Then I have...
if (OnMQMessageReceived != null)
{
    var message = consumer.Receive();
    var identifier = string.Empty;
    if (message is ITextMessage)
    {
        //do stuff with the message here
        //populates identifier from the message
    }
    else
    {
        //do stuff with the message here
        //populates identifier from the message
    }
    if (!string.IsNullOrWhiteSpace(identifier) && OnMQMessageReceived != null)
    {
        if( some check to see if we should process the message now)
        {
            //process message here
            message.Acknowledge(); //this really pulls it off of the MQ
            //here is where I want to trigger the next read to be from the beginning of the MQ
        }
        else
        {
            //We actually want to do nothing here. As in do not do Acknowledge
            //This leaves the message on the MQ and we'll pick it up again later
            //But we want to move on to the next message in the MQ
        }
    }
    else
    {
        message.Acknowledge(); //this really pulls it off of the MQ...its useless to us anyways
    }
}
else
{
    Thread.Sleep(0);
}
ThreadPool.QueueUserWorkItem((o) => Receive(forOrigination, consumer));
So a couple of questions:
If I do not acknowledge the message it stays on the MQ, right?
If the message is not acknowledged then by default when I read from the MQ again with the same listener it reads the next one and does not go to the beginning, right?
How do I change the listener so that the next time I read I start at the beginning of the queue?
Leaving messages on a queue is an anti-pattern. If you don't want to or cannot process the message at a certain point of your logic, then you have a number of choices:
Get it off the queue and put to another queue/topic for a delayed/different processing.
Get it off the queue and dump to a database, flat file - whatever, if you want to process it outside of messaging flow, or don't want to process at all.
If it is feasible, you may want to change the message producer so it doesn't mix the messages with different processing requirements in the same queue/topic.
In any case, do not leave a message on the queue, and always move forward to the next message. This will make the application way more predictable and easier to reason about. You will also avoid all kinds of performance problems. If your application is or may ever become sensitive to the sequence of message delivery, then manual acknowledgement of selected messages will be at odds with it too.
To your questions:
The JMS spec is vague regarding the behavior of unacknowledged messages - they may be delivered out of order, and it is undefined exactly when they will be delivered. Also, the acknowledge method call will acknowledge all previously received and unacknowledged messages - probably not what you had in mind.
If you leave messages behind, the listener may or may not go back immediately. If you restart it, it of course will start afresh, but while it is sitting there waiting for messages it is implementation dependent.
So if you try to make your design work, you may get it to kind of work under certain circumstances, but it will not be predictable or reliable.
We have a Rebus message handler that talks to a third party webservice. Due to reasons beyond our immediate control, this WCF service frequently throws an exception because it encountered a database deadlock in its own database. Rebus will then try to process this message five times, which in most cases means that one of those five times will be lucky and not get a deadlock. But it frequently happens that a message does get deadlock after deadlock and ends up in our error queue.
Besides fixing the source of the deadlocks, which would be a longterm goal, I can think of two options:
Keep trying with only this particular message type until it succeeds. Preferably I would be able to set a timeout, so "if five deadlocks then try again in 5 minutes" rather than choke the process up even more by trying continuously. I already do a Thread.Sleep(random) to spread the messages somewhat, but it will still give up after five tries.
Send this particular message type to a different queue that has only one worker that processes the message, so that this happens serially rather than in parallel. Our current configuration uses 8 worker threads, but this just makes the deadlock situation worse as the webservice now gets called concurrently and the messages get in each other's way.
Option #2 has my preference, but I'm not sure if this is possible. Our configuration on the receiving side currently looks like this:
var adapter = new Rebus.Ninject.NinjectContainerAdapter(this.Kernel);
var bus = Rebus.Configuration.Configure.With(adapter)
    .Logging(x => x.Log4Net())
    .Transport(t => t.UseMsmqAndGetInputQueueNameFromAppConfig())
    .MessageOwnership(d => d.FromRebusConfigurationSection())
    .CreateBus().Start();
And the .config for the receiving side:
<rebus inputQueue="app.msg.input" errorQueue="app.msg.error" workers="8">
<endpoints>
</endpoints>
</rebus>
From what I can tell from the config, it's only possible to set one input queue to 'listen' to. I can't really find a way to do this via the fluent mapping API either. That seems to take only one input- and error queue as well:
.Transport(t =>t.UseMsmq("input", "error"))
Basically, what I'm looking for is something along the lines of:
<rebus workers="8">
<input name="app.msg.input" error="app.msg.error" />
<input name="another.input.queue" error="app.msg.error" />
</rebus>
Any tips on how to handle my requirements?
I suggest you make use of a saga and Rebus' timeout service to implement a retry strategy that fits your needs. This way, in your Rebus-enabled web service facade, you could do something like this:
public void Handle(TryMakeWebServiceCall message)
{
    try
    {
        var result = client.MakeWebServiceCall(whatever);
        bus.Reply(new ResponseWithTheResult{ ... });
    }
    catch(Exception e)
    {
        Data.FailedAttempts++;
        if (Data.FailedAttempts < 10)
        {
            bus.Defer(TimeSpan.FromSeconds(1), message);
            return;
        }
        // oh no! we failed 10 times... this is probably where we'd
        // go and do something like this:
        emailService.NotifyAdministrator("Something went wrong!");
    }
}
where Data is the saga data that is made magically available to you and persisted between calls.
For inspiration on how to create a saga, check out the wiki page on coordinating stuff that happens over time where you can see an example on how a service might have some state (i.e. number of failed attempts in your case) stored locally that is made available between handling messages.
When the time comes to make bus.Defer work, you have two options: 1) use an external timeout service (I usually have one installed on each server), or 2) just use "yourself" as a timeout service.
At configuration time, you go
Configure.With(...)
    .(...)
    .Timeouts(t => // configure it here)
where you can either StoreInMemory, StoreInSqlServer, StoreInMongoDb, StoreInRavenDb, or UseExternalTimeoutManager.
If you choose (1), you need to check out the Rebus code and build Rebus.Timeout yourself - it's basically just a configurable, Topshelf-enabled console application that has a Rebus endpoint inside.
Please let me know if you need more help making this work - bus.Defer is where your system becomes awesome, and will be capable of overcoming all of the little glitches that make all others' go down :)