Confluent.Kafka client OutOfOrderSequenceException while a security patch was applied to the Kafka/MSK cluster (C#)

When sending data to Kafka using an async, idempotent producer, we received the error shown below. Restarting the producer applications resolved the issue. We opened a ticket with Confluent (Ticket Link). They said it is an existing bug in the Kafka client we use, and suggested that restarting the producers or resetting the transactional producer will work around the issue until they release a client update (a fix is not in their release pipeline yet).
We tried to reproduce the issue in our dev and QA environments under higher load, but could not. Does anyone have an idea how we could reproduce it? Also, are there any other suggestions for handling this issue in production?
The error we received in our producer application, most likely raised by the confluent-kafka-dotnet client we use:
"%3|1668238507.285|ERROR|<PUBLISHER_APP_NAME>#producer-1|: Fatal error: Broker: Broker received an out of order sequence number: ProduceRequest for <TOPIC_NAME> [38] with 1 message(s) failed due to sequence desynchronization with broker 1 (PID{Id:22181,Epoch:0}, base seq 0, idemp state change 8184098ms ago, last partition error NO_ERROR (actions , base seq 0..0, base msgid 0, -1ms ago)"
Our producer configuration:
config = new ProducerConfig
{
    BootstrapServers = appConfiguration.KafkaBootStrapServers,
    SecurityProtocol = SecurityProtocol.Ssl,
    Acks = Acks.All,
    EnableIdempotence = true,
    ClientId = appConfiguration.KafkaClientID
};
Our ASP.NET (C#) producer method call:
try
{
    Headers headers = new Headers();
    headers.Add(<CONSTANT>.ATT_TRANSACTION_ID, Encoding.ASCII.GetBytes(transactionId));
    DeliveryResult<string, string> response =
        await kafkaProducer.ProduceAsync(topic, new Message<string, string> { Key = key, Value = eventData, Headers = headers });
    return response; // logging
}
catch (ProduceException<string, string> ex)
{
    // catch exception, processing and logging
    return null;
}
catch (Exception ex)
{
    // catch exception and logging
    return null;
}
Environment:
- Kafka cluster (v2.8.0) with 3 brokers and in-sync replicas
- Confluent.Kafka NuGet library 1.7.0 with librdkafka.redist v1.7.0
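Since the fatal idempotence error leaves the producer instance unusable, and Confluent's suggested workaround is to restart it, one option is to rebuild the producer in-process instead of restarting the whole application. Confluent.Kafka reports errors through ProducerBuilder.SetErrorHandler, and Error.IsFatal marks the fatal ones. The sketch below keeps the Kafka calls out and shows only a thread-safe holder that disposes and recreates an instance on request; ResettableHolder<T> and ResetIfCurrent are hypothetical names. It is probably safer to trigger the reset from another thread (for example via Task.Run) than from inside the error callback itself, since disposing a client from its own callback thread may block. Messages that were in flight on the failed instance would still need to be retried by the application.

```csharp
using System;

// Hypothetical helper: owns one instance of T and replaces it on request.
// In the Kafka case, T would be IProducer<string, string> and the factory would call
// new ProducerBuilder<string, string>(config).SetErrorHandler(...).Build() (assumption).
public sealed class ResettableHolder<T> : IDisposable where T : class, IDisposable
{
    private readonly Func<T> _factory;
    private readonly object _gate = new object();
    private T _current;

    public ResettableHolder(Func<T> factory)
    {
        _factory = factory ?? throw new ArgumentNullException(nameof(factory));
        _current = _factory();
    }

    // Returns the live instance; callers should fetch it per operation, not cache it.
    public T Current
    {
        get { lock (_gate) { return _current; } }
    }

    // Call when the error handler reports error.IsFatal for `failed`.
    // Only the instance that actually failed is replaced, so several reports
    // about the same failure lead to a single rebuild.
    public bool ResetIfCurrent(T failed)
    {
        T old;
        lock (_gate)
        {
            if (!ReferenceEquals(_current, failed))
            {
                return false; // already replaced by an earlier report
            }
            old = _current;
            _current = _factory();
        }
        old.Dispose(); // dispose outside the lock
        return true;
    }

    public void Dispose()
    {
        lock (_gate) { _current.Dispose(); }
    }
}
```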

Related

Azure Service Bus: receive messages continuously whenever a new message is placed, in a web application [duplicate]

I am using the Azure.Messaging.ServiceBus NuGet package to work with Azure Service Bus. We have created a topic and a subscription. The subscription has 100+ messages. We want to read all the messages and continue to read messages as they arrive.
The Microsoft.Azure.ServiceBus package (now deprecated) provided RegisterMessageHandler, which was used to process every incoming message. I am not able to find a similar option in the Azure.Messaging.ServiceBus NuGet package.
I am able to read one message at a time, but I have to call await receiver.ReceiveMessageAsync(); manually every time.
To receive multiple messages (a batch), use ServiceBusReceiver.ReceiveMessagesAsync(maxMessages) (note the plural 'Messages', not the singular 'Message'). It returns up to maxMessages messages, possibly fewer, depending on what is available. To retrieve all 100+ messages, you'll need to loop until no messages are returned.
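The receive-until-empty loop can be written independently of the SDK by passing the batch receive in as a delegate. DrainAsync and its parameters are hypothetical names. An empty batch only means nothing arrived within the receiver's wait time, not that the subscription will stay empty.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BatchDrainer
{
    // Repeatedly requests batches of up to maxMessages and hands each item to `process`.
    // Stops at the first empty batch (or on cancellation) and returns how many items were processed.
    public static async Task<int> DrainAsync<T>(
        Func<int, CancellationToken, Task<IReadOnlyList<T>>> receiveBatch,
        Func<T, Task> process,
        int maxMessages = 100,
        CancellationToken token = default)
    {
        int total = 0;
        while (!token.IsCancellationRequested)
        {
            IReadOnlyList<T> batch = await receiveBatch(maxMessages, token);
            if (batch.Count == 0)
            {
                break; // nothing arrived within the receiver's wait time
            }
            foreach (T item in batch)
            {
                await process(item);
                total++;
            }
        }
        return total;
    }
}
```

Against the real SDK, the receive delegate could be (n, ct) => receiver.ReceiveMessagesAsync(n, TimeSpan.FromSeconds(5), ct), and in the default PeekLock mode the process delegate would also call receiver.CompleteMessageAsync(message).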
If you'd like to use a processor, that's also available in the new SDK. See my answer to a similar question here.
As suggested by @Gaurav Mantri, I used the ServiceBusProcessor class to implement an event-based model for processing messages:
public async Task ReceiveAll()
{
    string connectionString = "Endpoint=sb://sb-test-today.servicebus.windows.net/;SharedAccessKeyName=manage;SharedAccessKey=<shared-access-key>;";
    string topicName = "topicone";
    string subscriptionName = "subone";
    await using var client = new ServiceBusClient(connectionString, new ServiceBusClientOptions
    {
        TransportType = ServiceBusTransportType.AmqpWebSockets
    });
    var options = new ServiceBusProcessorOptions
    {
        // By default, or when AutoCompleteMessages is set to true, the processor completes the message after executing the message handler.
        // Set AutoCompleteMessages to false to [settle messages](https://learn.microsoft.com/en-us/azure/service-bus-messaging/message-transfers-locks-settlement#peeklock) on your own.
        // In both cases, if the message handler throws an exception without settling the message, the processor abandons the message.
        AutoCompleteMessages = false,
        // Raise this value to process several messages concurrently
        MaxConcurrentCalls = 1
    };
    await using ServiceBusProcessor processor = client.CreateProcessor(topicName, subscriptionName, options);
    processor.ProcessMessageAsync += MessageHandler;
    processor.ProcessErrorAsync += ErrorHandler;
    await processor.StartProcessingAsync();
    Console.ReadKey();
}
public async Task MessageHandler(ProcessMessageEventArgs args)
{
    string body = args.Message.Body.ToString();
    Console.WriteLine(body);
    // We can evaluate application logic and use it to decide how to settle the message.
    await args.CompleteMessageAsync(args.Message);
}
public Task ErrorHandler(ProcessErrorEventArgs args)
{
    // The error source tells us at what point in the processing the error occurred
    Console.WriteLine(args.ErrorSource);
    // The fully qualified namespace is available
    Console.WriteLine(args.FullyQualifiedNamespace);
    // as well as the entity path
    Console.WriteLine(args.EntityPath);
    Console.WriteLine(args.Exception.ToString());
    return Task.CompletedTask;
}
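In a web application, Console.ReadKey() is not a workable way to keep the processor alive. One common alternative, assuming the code runs inside a hosted service, is to start the processor, wait until the host's stopping token is cancelled, and then call processor.StopProcessingAsync(). The helper below shows only the waiting part with BCL types; ProcessorLifetime and WaitForCancellationAsync are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ProcessorLifetime
{
    // Completes (without throwing) once the token is cancelled.
    // Intended use, Azure SDK calls shown as comments only:
    //   await processor.StartProcessingAsync();
    //   await ProcessorLifetime.WaitForCancellationAsync(stoppingToken);
    //   await processor.StopProcessingAsync();
    public static async Task WaitForCancellationAsync(CancellationToken token)
    {
        try
        {
            await Task.Delay(Timeout.Infinite, token);
        }
        catch (OperationCanceledException)
        {
            // Expected: the host is shutting down.
        }
    }
}
```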

IBM XMS Receive method not returning messages immediately

I use IBM XMS to connect to a third party to send and receive messages.
UPDATE:
- Client: .NET Core 3.1
- IBM XMS library version from NuGet: tried 9.2.4 and 9.1.5, with the same results
- The same code worked fine a week ago, so something must have changed in the queue manager or somewhere in my infrastructure
- SSL and client certificates are used
I had been using a receive with a timeout for a while without problems, but since last week it stopped picking up any messages, even when they were on the queue. Once I switched to the receive method without a timeout, I started picking up messages again, but only every 5 minutes.
Looking at the XMS logs, I can see the messages are actually read almost immediately, with and without the timeout, but XMS seems to decide to wait those 5 minutes before returning the message...
I haven't changed anything on my side, and the third party assures me they haven't either.
My question is: given the receive code below, is there anything in it that may cause the 5-minute wait? Any ideas on things I can try? I can share the XMS logs too if that helps.
// This is used to set the default properties in the factory before calling the receive method
private void SetConnectionProperties(IConnectionFactory cf)
{
    cf.SetStringProperty(XMSC.WMQ_HOST_NAME, _mqConfiguration.Host);
    cf.SetIntProperty(XMSC.WMQ_PORT, _mqConfiguration.Port);
    cf.SetStringProperty(XMSC.WMQ_CHANNEL, _mqConfiguration.Channel);
    cf.SetStringProperty(XMSC.WMQ_QUEUE_MANAGER, _mqConfiguration.QueueManager);
    cf.SetStringProperty(XMSC.WMQ_SSL_CLIENT_CERT_LABEL, _mqConfiguration.CertificateLabel);
    cf.SetStringProperty(XMSC.WMQ_SSL_KEY_REPOSITORY, _mqConfiguration.KeyRepository);
    cf.SetStringProperty(XMSC.WMQ_SSL_CIPHER_SPEC, _mqConfiguration.CipherSuite);
    cf.SetIntProperty(XMSC.WMQ_CONNECTION_MODE, XMSC.WMQ_CM_CLIENT);
    cf.SetIntProperty(XMSC.WMQ_CLIENT_RECONNECT_OPTIONS, XMSC.WMQ_CLIENT_RECONNECT);
    cf.SetIntProperty(XMSC.WMQ_CLIENT_RECONNECT_TIMEOUT, XMSC.WMQ_CLIENT_RECONNECT_TIMEOUT_DEFAULT);
}
public IEnumerable<IMessage> ReceiveMessage()
{
    using var connection = _connectionFactory.CreateConnection();
    using var session = connection.CreateSession(false, AcknowledgeMode.AutoAcknowledge);
    using var destination = session.CreateQueue(_mqConfiguration.ReceiveQueue);
    using var consumer = session.CreateConsumer(destination);
    connection.Start();
    var result = new List<IMessage>();
    var keepRunning = true;
    while (keepRunning)
    {
        try
        {
            var sw = new Stopwatch();
            sw.Start();
            // ConsumerTimeoutMs == 0 means "block until a message arrives"
            var message = _mqConfiguration.ConsumerTimeoutMs == 0
                ? consumer.Receive()
                : consumer.Receive(_mqConfiguration.ConsumerTimeoutMs);
            if (message != null)
            {
                result.Add(message);
                _messageLogger.LogInMessage(message);
                var elapsedMillis = sw.ElapsedMilliseconds;
                if (_mqConfiguration.ConsumerTimeoutMs == 0)
                {
                    // Without a timeout, stop after the first message
                    keepRunning = false;
                }
            }
            else
            {
                // The timed receive returned no message within the timeout
                keepRunning = false;
            }
        }
        catch (Exception e)
        {
            // We log the exception
            keepRunning = false;
        }
    }
    consumer.Close();
    destination.Dispose();
    session.Dispose();
    connection.Close();
    return result;
}
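The loop above starts a Stopwatch but never uses the elapsed time. Recording how long each receive blocks makes a stall like the 5-minute delay visible in the application log, where it can be lined up against the XMS trace. The helper below wraps any blocking receive delegate; TimedReceiver, ReceiveTimed and the slow-threshold callback are hypothetical. With XMS, the delegate would presumably be () => consumer.Receive(timeoutMs).

```csharp
using System;
using System.Diagnostics;

public static class TimedReceiver
{
    // Calls receive(), measures how long it blocked, and reports calls slower than slowThreshold.
    // Returns whatever receive() returned (null when a timed receive found nothing).
    public static T ReceiveTimed<T>(Func<T> receive, TimeSpan slowThreshold, Action<T, TimeSpan> onSlow)
        where T : class
    {
        var sw = Stopwatch.StartNew();
        T message = receive();
        sw.Stop();
        if (sw.Elapsed >= slowThreshold)
        {
            onSlow(message, sw.Elapsed); // e.g. log message id and elapsed time
        }
        return message;
    }
}
```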
The symptoms look like a match for APAR IJ20591: "Managed .NET SSL application making MQGET calls unexpectedly receives MQRC_CONNECTION_BROKEN when running in .NET Core". This affects messages larger than 15 KB with the IBM MQ .NET Standard (.NET Core) libraries over TLS channels. See also this thread. The fix is targeted for 9.2.0.5; no CDS release is listed.
It states:
Setting the heartbeat interval to lower values may reduce the frequency of occurrence.
If your .NET application is not using a CCDT you can lower the heartbeat by having the SVRCONN channel's HBINT lowered and reconnecting your application.

Initialization of MQQueueManager hangs in .NET Core 2.2

We are having issues when calling the MQQueueManager constructor. Our service is a CronJob hosted on Kubernetes (AWS), scheduled to run every minute. The CronJob connects to the IBM MQ server and reads the messages by looping through a list of queues. We are using "IBM MQ Client for .NET Core (9.2.0)" to connect to the MQ server.
The service worked fine in production for a couple of months, but one day it stopped working. In the logs, we could see the service was stuck while initializing MQQueueManager, and the CronJob status in the pod still showed as running. Because the CronJob was still running, new jobs couldn't be created, and we finally had to kill the pod on Kubernetes to get the service up and running again.
We have tried to replicate the issue in our dev and test environments, but have not succeeded yet. We are not sure whether the problem is in the MQQueueManager constructor or whether the running thread is deadlocked. Below is the code we are using. Any help would be appreciated.
No exception is thrown.
private void InitialiseMQProperties()
{
    try
    {
        _mqProperties = new Hashtable();
        _mqProperties.Add(MQC.HOST_NAME_PROPERTY, _hostName);
        _mqProperties.Add(MQC.PORT_PROPERTY, _port);
        _mqProperties.Add(MQC.CHANNEL_PROPERTY, _channelName);
        _mqProperties.Add(MQC.TRANSPORT_PROPERTY, MQC.TRANSPORT_MQSERIES_MANAGED);
        _mqProperties.Add(MQC.USER_ID_PROPERTY, _userId);
    }
    catch (Exception ex)
    {
        Logger.Error("ConfigurationError: Initialising properties for Queue {Message}", ex.Message);
    }
}
This function is called by the CronJob every minute:
public async Task GetMessagesFromQueue()
{
    try
    {
        InitialiseMQProperties();
        var queTaskList = new List<Task>();
        var _queues = _queueList.Split(',');
        foreach (var queue in _queues)
        {
            queTaskList.Add(GetMessasgesByQueue(queue));
        }
        await Task.WhenAll(queTaskList).ConfigureAwait(false);
        Logger.Information("Messages successfully processed from the Queues");
    }
    catch (Exception ex)
    {
        Logger.Error("GeneralError: Reading messages from Queue error {Message}", ex.Message);
    }
}
private async Task GetMessasgesByQueue(string queueName)
{
    Logger.Information("Initialising queuemanager properties");
    var queueManager = new MQQueueManager("", _mqProperties);
    // From the logs, it gets stuck here
    MQQueue queue = null;
    MQMessage message;
    MQGetMessageOptions getMessageOptions;
    try
    {
        getMessageOptions = new MQGetMessageOptions();
        getMessageOptions.Options |= MQC.MQGMO_SYNCPOINT + MQC.MQGMO_FAIL_IF_QUIESCING;
        Logger.Information("Accessing queue {queueName}", queueName);
        queue = queueManager.AccessQueue(queueName, MQC.MQOO_INPUT_AS_Q_DEF + MQC.MQOO_FAIL_IF_QUIESCING);
        Logger.Information("Connection to queue succeeded");
        while (true)
        {
            var commitTrans = true;
            message = new MQMessage();
            queue.Get(message, getMessageOptions);
            var queMessage = message.ReadString(message.MessageLength);
            Logger.Debug("Message read from MQ {0}", queMessage); //sensitive
            if (queMessage.Length > 0)
            {
                var data = await _dataSerializer.DeserializePayload<Response>(queMessage);
                if (data != null)
                {
                    // Internal logic
                }
            }
        }
    }
    catch (MQException mqe)
    {
        switch (mqe.ReasonCode)
        {
            case MQC.MQRC_NO_MSG_AVAILABLE:
                Logger.Information("MQInfo: No message available in the queue {queueName}", queueName);
                CloseQueue(queue);
                break;
            case MQC.MQRC_Q_MGR_QUIESCING:
            case MQC.MQRC_Q_MGR_STOPPING:
                CloseQueue(queue);
                queueManager.Backout();
                Logger.Error("MQError: Queue Manager Stopping error {Message}", mqe.Message);
                break;
            case MQC.MQRC_Q_MGR_NOT_ACTIVE:
            case MQC.MQRC_Q_MGR_NOT_AVAILABLE:
                CloseQueue(queue);
                queueManager.Backout();
                Logger.Error("MQError: Queue Manager not available error {Message}", mqe.Message);
                break;
            default:
                Logger.Error("MQError: Error reading queue {queueName} error {Message}", queueName, mqe.Message);
                CloseQueue(queue);
                queueManager.Backout();
                break;
        }
    }
    Logger.Information("Closing queue {queueName}", queueName);
    CloseQueue(queue);
    CloseQueueManager(queueManager);
    Logger.Information("Closing Queue manager for queue {queueName}", queueName);
}
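Because the constructor blocks without throwing, the CronJob never finishes. One defensive option is to run the blocking call on a worker thread and stop waiting after a deadline, so the job can log, exit with an error, and let Kubernetes start a fresh pod. This does not cancel the stuck call; its thread stays blocked until the process exits. BlockingCallGuard.RunWithTimeoutAsync is a hypothetical helper. On the Kubernetes side, activeDeadlineSeconds in the CronJob's job template is a complementary safeguard.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class BlockingCallGuard
{
    // Runs a synchronous call that might hang (hypothetically: () => new MQQueueManager("", props))
    // on the thread pool and stops waiting after `timeout`.
    // The call itself is NOT aborted: its thread stays blocked until the call returns or the process exits.
    public static async Task<T> RunWithTimeoutAsync<T>(Func<T> blockingCall, TimeSpan timeout)
    {
        Task<T> work = Task.Run(blockingCall);
        using var delayCts = new CancellationTokenSource();
        Task delay = Task.Delay(timeout, delayCts.Token);

        Task finished = await Task.WhenAny(work, delay).ConfigureAwait(false);
        if (finished != work)
        {
            // Observe a late failure so it does not surface as an unobserved task exception.
            _ = work.ContinueWith(t => { _ = t.Exception; }, TaskContinuationOptions.OnlyOnFaulted);
            throw new TimeoutException($"Blocking call did not complete within {timeout}.");
        }

        delayCts.Cancel(); // release the pending delay timer
        return await work.ConfigureAwait(false); // rethrows the call's own exception, if any
    }
}
```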
I have a very similar setup: AKS and the XMS client for .NET Core 3.1, running the CronJob every second. I saw a couple of crashes in production and couldn't understand why, even after checking the pod logs. I talked to the MQ team, and their advice was to recycle the connection: it looks like the number of connections exhausts the communication and crashes the app, but we couldn't see that in dev or test because of the lower number of messages processed in those environments.
What I ended up doing was to spin up the container, create the connection to MQ, and leave it open for the life cycle of the pod instead of for each CronJob cycle, destroying it only if the pod crashes or stops.
Maybe something worth evaluating?
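Keeping one connection open for the pod's lifetime implies replacing the CronJob with a long-running process (for example a Deployment) that polls on its own schedule. A minimal sketch, assuming .NET 6 or later for PeriodicTimer; LongLivedPoller, connect and pollOnce are hypothetical, and in the MQ case connect would create the MQQueueManager or XMS connection while pollOnce would read the queues. Reconnect-on-failure handling is deliberately left out.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class LongLivedPoller
{
    // Opens one connection for the lifetime of the process and reuses it on every tick,
    // instead of connecting and disconnecting on each CronJob run.
    // Requires .NET 6 or later (PeriodicTimer). Returns the number of completed poll cycles.
    public static async Task<int> RunAsync<TConnection>(
        Func<TConnection> connect,
        Func<TConnection, Task> pollOnce,
        TimeSpan period,
        CancellationToken token)
        where TConnection : IDisposable
    {
        using TConnection connection = connect();
        using var timer = new PeriodicTimer(period);
        int cycles = 0;
        try
        {
            while (!token.IsCancellationRequested && await timer.WaitForNextTickAsync(token))
            {
                await pollOnce(connection);
                cycles++;
            }
        }
        catch (OperationCanceledException) when (token.IsCancellationRequested)
        {
            // Normal shutdown: the token was cancelled while waiting for the next tick.
        }
        return cycles; // the connection is disposed when the method exits
    }
}
```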

How to purge messages for Service Bus Topic Subscription

Just wondering what the best way is (whether via the Portal, PowerShell, or C#) to purge the messages from a Service Bus topic's subscription.
Imagine we have a topic with 4 subscriptions, and we only want to purge the messages from one of the subscriptions.
I have a feeling the only way may be to read the messages in a while loop, but hoping for something better.
UPDATE:
Apart from using code, you can use the Server Explorer as suggested in the answer: right-click the subscription and purge the messages.
You can certainly do it via code. If you're using the Service Bus SDK, you could do something like the following:
static void PurgeMessagesFromSubscription()
{
    var connectionString = "Endpoint=sb://account-name.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=access key";
    var topic = "topic-name";
    var subscription = "subscription-name";
    int batchSize = 100;
    var subscriptionClient = SubscriptionClient.CreateFromConnectionString(connectionString, topic, subscription, ReceiveMode.ReceiveAndDelete);
    do
    {
        var messages = subscriptionClient.ReceiveBatch(batchSize);
        if (messages.Count() == 0)
        {
            break;
        }
    }
    while (true);
}
What this code will do is fetch messages from the subscription (100 at a time) in Receive & Delete mode so that as soon as messages are fetched, they are deleted from the subscription automatically.
I believe Service Bus Explorer tool also has the capability to purge messages. You can use that as well instead of writing the code.
If you have a lot of messages and can tolerate a bit of downtime on the subscriber side, it might be faster to just drop the subscription and create a new one with the same name.
Thank you @Gaurav Mantri. I used slightly changed code, without the batch option, with version 5.2.0 of the Microsoft.Azure.ServiceBus NuGet package:
var connectionString = "Endpoint=sb://";
var topic = "topic";
var subscription = "subscription";
var subscriptionClient = new SubscriptionClient(connectionString, topic, subscription, ReceiveMode.ReceiveAndDelete);
subscriptionClient.RegisterMessageHandler(
    (message, token) =>
    {
        Console.WriteLine($"Received message: SequenceNumber: {message.SystemProperties.SequenceNumber}");
        return Task.CompletedTask;
    },
    (exceptionEvent) =>
    {
        Console.WriteLine("Exception = " + exceptionEvent.Exception);
        return Task.CompletedTask;
    });
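With RegisterMessageHandler, delivery is push-based, so the code above never signals that the subscription is empty; the program has to decide when the purge is finished. One heuristic is to record when the last message arrived and treat a quiet period as "drained". IdleTracker is a hypothetical helper: the handler would call MarkActivity(), and the main code would await WaitForIdleAsync and then call subscriptionClient.CloseAsync(). A quiet period does not cover messages that are sent later.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class IdleTracker
{
    // Environment.TickCount64 (milliseconds since system start) of the most recent activity.
    private long _lastActivityMs = Environment.TickCount64;

    // Call from the message handler each time a message is received.
    public void MarkActivity() => Interlocked.Exchange(ref _lastActivityMs, Environment.TickCount64);

    // Completes once no activity has been recorded for at least quietPeriod.
    public async Task WaitForIdleAsync(TimeSpan quietPeriod, TimeSpan pollInterval, CancellationToken token = default)
    {
        while (true)
        {
            long idleMs = Environment.TickCount64 - Interlocked.Read(ref _lastActivityMs);
            if (idleMs >= (long)quietPeriod.TotalMilliseconds)
            {
                return;
            }
            await Task.Delay(pollInterval, token).ConfigureAwait(false);
        }
    }
}
```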

Windows Non-WCF Service Moving Transacted MSMQ Message to Failed Queue

I have a legacy Windows service running on Server 2008 that reads messages from a Transactional MSMQ Queue. This is not configured as a WCF service.
We are wanting to improve the handling of failed and poison messages in code (C# 4.0) by catching custom exceptions and sending the related message to a separate 'failed' or 'poison' queue depending upon the type of exception thrown.
I can't get the catch code to send the message to the separate queue: it disappears from the original queue (as desired!) but doesn't show up in the 'failed' queue.
For testing all of the queues have no Authentication required and permissions are set to allow Everyone to do everything.
Clearly something is missing or wrong and I suspect it is transaction related, but I can't see it. Or perhaps this is not possible the way I am trying to do it?
Any guidance / suggestions appreciated!
Our simplified PeekCompleted Event code:
private void MessageReceived(object sender, PeekCompletedEventArgs e)
{
    using (TransactionScope txnScope = new TransactionScope())
    {
        MyMessageType currentMessage = null;
        MessageQueue q = (MessageQueue)sender;
        try
        {
            Message queueMessage = q.EndPeek(e.AsyncResult);
            currentMessage = (MyMessageType)queueMessage.Body;
            Processor distributor = new Processor();
            distributor.Process(currentMessage); // this will throw if need be
            q.ReceiveById(e.Message.Id);
            txnScope.Complete();
            q.BeginPeek();
        }
        catch (MyCustomException ex)
        {
            string qname = @".\private$\failed";
            if (!MessageQueue.Exists(qname))
            {
                MessageQueue.Create(qname, true);
            }
            MessageQueue fq = new MessageQueue(qname)
            {
                Formatter = new BinaryMessageFormatter()
            };
            System.Messaging.Message message2 = new System.Messaging.Message
            {
                Formatter = new BinaryMessageFormatter(),
                Body = currentMessage,
                Label = "My Failed Message"
            };
            fq.Send(message2); // send to failed queue
            q.ReceiveById(e.Message.Id); // off of original queue
            txnScope.Complete(); // complete the transaction
            _queue.BeginPeek(); // next or wait
        }
        // other catches handle any cases where we want to txnScope.Dispose()
    }
}
EDIT: October 8, 2013
Hugh's answer below got us on the right track. Inside the catch block, the failed queue was already created as transactional:
MessageQueue.Create(qname, true);
but the Send needed a MessageQueueTransactionType parameter:
fq.Send(message2, MessageQueueTransactionType.Single);
That did the trick. Thanks, Hugh!
If the message is disappearing from your original queue, your code is reaching the second txnScope.Complete() (in your catch block).
This means the problem is with your send to the error queue.
The failed queue needs to be created as transactional (MessageQueue.Create(qname, true)) because you are sending from within a transaction scope. Note that the second argument of the MessageQueue(string, bool) constructor is sharedModeDenyReceive, not a transactional flag:
MessageQueue fq = new MessageQueue(qname){
 Formatter = new BinaryMessageFormatter()
};
Then you need to do a transactional send that enlists in the ambient scope:
fq.Send(message2, MessageQueueTransactionType.Automatic);
See if this works.
