We are having issues initializing the MQQueueManager constructor. Our service is a CronJob hosted on Kubernetes (AWS), scheduled to run every minute. The CronJob connects to an IBM MQ server and reads messages by looping through a list of queues. We are using the IBM MQ Client for .NET Core (9.2.0) to connect to the MQ server.
The service worked fine in production for a couple of months, but one day it stopped: the logs showed the service was stuck initializing MQQueueManager while the CronJob status on the pod still showed Running. Because the job stayed in the Running state, no new jobs could be created, and we eventually had to kill the pod on K8s to get the service back up.
We have tried to reproduce the issue in our dev and test environments without success. We are not sure whether the problem is in the MQQueueManager constructor itself or whether the running thread is deadlocked. Below is the code we are using; any help would be appreciated.
Also, no exception is thrown.
private void InitialiseMQProperties()
{
    try
    {
        _mqProperties = new Hashtable();
        _mqProperties.Add(MQC.HOST_NAME_PROPERTY, _hostName);
        _mqProperties.Add(MQC.PORT_PROPERTY, _port);
        _mqProperties.Add(MQC.CHANNEL_PROPERTY, _channelName);
        _mqProperties.Add(MQC.TRANSPORT_PROPERTY, MQC.TRANSPORT_MQSERIES_MANAGED);
        _mqProperties.Add(MQC.USER_ID_PROPERTY, _userId);
    }
    catch (Exception ex)
    {
        Logger.Error("ConfigurationError: Initialising properties for Queue {Message}", ex.Message);
    }
}
**This function is called by the CronJob every minute:**
public async Task GetMessagesFromQueue()
{
    try
    {
        InitialiseMQProperties();
        var queueTaskList = new List<Task>();
        var queues = _queueList.Split(',');
        foreach (var queue in queues)
        {
            queueTaskList.Add(GetMessagesByQueue(queue));
        }
        await Task.WhenAll(queueTaskList).ConfigureAwait(false);
        Logger.Information("Messages successfully processed from the Queues");
    }
    catch (Exception ex)
    {
        Logger.Error("GeneralError: Reading messages from Queue error {Message}", ex.Message);
    }
}
private async Task GetMessagesByQueue(string queueName)
{
    Logger.Information("Initialising queuemanager properties");
    var queueManager = new MQQueueManager("", _mqProperties);
    // **From the logs, execution gets stuck here**
    MQQueue queue = null;
    MQMessage message;
    MQGetMessageOptions getMessageOptions;
    try
    {
        getMessageOptions = new MQGetMessageOptions();
        getMessageOptions.Options |= MQC.MQGMO_SYNCPOINT | MQC.MQGMO_FAIL_IF_QUIESCING;
        Logger.Information("Accessing queue {queueName}", queueName);
        queue = queueManager.AccessQueue(queueName, MQC.MQOO_INPUT_AS_Q_DEF | MQC.MQOO_FAIL_IF_QUIESCING);
        Logger.Information("Connection to queue succeeded");
        while (true)
        {
            var commitTrans = true;
            message = new MQMessage();
            queue.Get(message, getMessageOptions);
            var queMessage = message.ReadString(message.MessageLength);
            Logger.Debug("Message read from MQ {0}", queMessage); // sensitive
            if (queMessage.Length > 0)
            {
                var data = await _dataSerializer.DeserializePayload<Response>(queMessage);
                if (data != null)
                {
                    // **Internal logic**
                }
            }
        }
    }
    catch (MQException mqe)
    {
        switch (mqe.ReasonCode)
        {
            case MQC.MQRC_NO_MSG_AVAILABLE:
                Logger.Information("MQInfo: No message available in the queue {queueName}", queueName);
                CloseQueue(queue);
                break;
            case MQC.MQRC_Q_MGR_QUIESCING:
            case MQC.MQRC_Q_MGR_STOPPING:
                CloseQueue(queue);
                queueManager.Backout();
                Logger.Error("MQError: Queue Manager Stopping error {Message}", mqe.Message);
                break;
            case MQC.MQRC_Q_MGR_NOT_ACTIVE:
            case MQC.MQRC_Q_MGR_NOT_AVAILABLE:
                CloseQueue(queue);
                queueManager.Backout();
                Logger.Error("MQError: Queue Manager not available error {Message}", mqe.Message);
                break;
            default:
                Logger.Error("MQError: Error reading queue {queueName} error {Message}", queueName, mqe.Message);
                CloseQueue(queue);
                queueManager.Backout();
                break;
        }
    }
    Logger.Information("Closing queue {queueName}", queueName);
    CloseQueue(queue);
    CloseQueueManager(queueManager);
    Logger.Information("Closing Queue manager for queue {queueName}", queueName);
}
I have a very similar setup: AKS and the XMS client for .NET Core 3.1, running the CronJob every second. I saw a couple of crashes in prod without understanding why, even after checking the pod logs. I had a conversation with the MQ team and their advice was to recycle the connection; it looked like the number of connections was exhausting the communication and crashing the app, but we couldn't see that in dev or test because of the difference in message volume per environment.
What I ended up doing was to spin up the container, create the MQ connection once, and leave it open for the life cycle of the pod instead of per CronJob cycle, destroying it only when the pod crashes or stops (a rough sketch of the pattern follows below).
Something worth evaluating?
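Not from the original answer, just an illustration of the idea: a minimal sketch of a pod-lifetime connection holder, assuming the same IBM MQ .NET classes as the question (the MqConnectionHolder type and its members are hypothetical names; MQQueueManager.IsConnected is used to detect a dropped connection):
using System.Collections;
using IBM.WMQ;

public static class MqConnectionHolder
{
    private static readonly object _sync = new object();
    private static MQQueueManager _queueManager;

    // Returns a queue manager that lives for the pod's lifetime, reconnecting
    // only if the previous connection was lost or never created.
    public static MQQueueManager GetQueueManager(Hashtable properties)
    {
        lock (_sync)
        {
            if (_queueManager == null || !_queueManager.IsConnected)
            {
                _queueManager = new MQQueueManager("", properties);
            }
            return _queueManager;
        }
    }
}
Each CronJob cycle then calls GetQueueManager(_mqProperties) instead of constructing its own MQQueueManager, and the connection is torn down only when the pod itself stops.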
I am new to Azure Service Bus and appreciate any help I can get.
In my current project (C#, using Azure.Messaging.ServiceBus) we have an on-premise server running a "Task Engine" Windows service that listens to various queues (including MSMQ) to receive and process messages. We are now migrating to Azure Service Bus queues.
I implemented the ReceiveMessageAsync() method to read and process messages. The connection is persistent because the base class of the Task Engine service already runs in a loop. While the code below works fine from my local PC (connected to VPN), it fails with the following errors as soon as it is deployed to the on-premise server. The server also uses up all its memory and shuts down, causing the other queues to terminate.
Error messages:
Azure.Messaging.ServiceBus.ServiceBusException: Creation of ReceivingAmqpLink did not complete in 30000 milliseconds. (ServiceTimeout)
Azure.Messaging.ServiceBus.ServiceBusException: 'receiver31' is closed (GeneralError)
Notes:
A Private Endpoint is enabled on the Azure Service Bus, and we use a token with client credentials to connect to Azure.
All of the code below runs fine locally for more than 2 hours and processes messages as soon as they are manually sent to the queue via the Azure portal.
Code:
public override void StartUp(ContextBase context)
{
    // Save the thread context
    base.StartUp(context);

    // Get values from config
    _tenantId = System.Configuration.ConfigurationManager.AppSettings["tenant-id"];
    _clientId = System.Configuration.ConfigurationManager.AppSettings["client-id"];
    _clientSecret = System.Configuration.ConfigurationManager.AppSettings["client-secret"];
    _servicebusNamespace = System.Configuration.ConfigurationManager.AppSettings["servicebus-namespace"];
    _messageQueueName = System.Configuration.ConfigurationManager.AppSettings["servicebus-inbound-queue"];

    getAzureServiceBusAccess();

    // Set the running flag
    _isRunning = true;
}
// Called when the service is initialized, and again when a connection reset happens due to an error
private static void getAzureServiceBusAccess()
{
    var credential = new ClientSecretCredential(_tenantId, _clientId, _clientSecret);
    var clientOptions = new ServiceBusClientOptions()
    {
        TransportType = ServiceBusTransportType.AmqpWebSockets
    };
    _serviceBusClient = new ServiceBusClient(_servicebusNamespace, credential, clientOptions);
    _serviceBusReceiver = _serviceBusClient.CreateReceiver(_messageQueueName, new ServiceBusReceiverOptions());
}
public override void DoAction()
{
    // Make sure we haven't shut down
    if (_isRunning)
    {
        // Wait for the next message
        tryReceiveMessages();
    }
}
private async void tryReceiveMessages()
{
    try
    {
        ServiceBusReceivedMessage message = null;
        message = await _serviceBusReceiver.ReceiveMessageAsync();
        if (message != null && _isRunning)
        {
            try
            {
                string _messageBody = message.Body.ToString();
                // <<Send message body to Task Adapter that adds it to the database and processes the job>>
                await _serviceBusReceiver.CompleteMessageAsync(message);
            }
            catch (ServiceBusException s)
            {
                Tracer.RaiseError(Source.AzureSB, "Azure Service Bus Queue resulted in exception when processing message.", s);
                throw;
            }
            catch (Exception ex)
            {
                Tracer.RaiseError(Source.AzureSB, "Unexpected error occurred moving task from Azure Service Bus to database; attempting to re-queue message.", ex);
                if (message != null)
                    await _serviceBusReceiver.AbandonMessageAsync(message);
            }
        }
    }
    catch (ServiceBusException s)
    {
        tryResetConnections(s);
    }
    catch (Exception ex)
    {
        Tracer.RaiseError(Source.AzureSB, "Azure Service Bus Queue reset connection error.", ex);
        throw;
    }
}
private void tryResetConnections(Exception exception)
{
    if (DateTime.Now.Subtract(LastQueueReset).TotalSeconds > 1800)
    {
        LastQueueReset = DateTime.Now;
        getAzureServiceBusAccess();
    }
    else
    {
        // Send notification email to dev group
    }
}
private async void closeAndDisposeConnectionAsync()
{
    try
    {
        await _serviceBusReceiver.DisposeAsync();
    }
    catch
    {
        // Do not throw; the receiver may already have been disposed
    }
    try
    {
        await _serviceBusClient.DisposeAsync();
    }
    catch
    {
        // Do not throw; the client may already have been disposed
    }
}
We tried opening the Azure Service Bus network settings to public, but that didn't resolve the issue.
I have asked the DevOps team to open ports 443, 5671 and 5672 and am still waiting to hear back before testing. (With ServiceBusTransportType.AmqpWebSockets, traffic should flow over port 443; 5671/5672 are used by the plain AMQP transport.)
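Not part of the original question, but one detail stands out in the snippet above: DoAction fires tryReceiveMessages as async void, so each polling cycle can stack another pending receive on a dead link while earlier ones are still waiting, which would match the memory growth described. A small sketch of bounding the wait using the maxWaitTime overload of ServiceBusReceiver.ReceiveMessageAsync (the 30-second value is an arbitrary example):
// Bound each receive so a cycle cannot wait indefinitely on a dead AMQP link.
// ReceiveMessageAsync returns null when no message arrives within maxWaitTime.
ServiceBusReceivedMessage message = await _serviceBusReceiver.ReceiveMessageAsync(
    maxWaitTime: TimeSpan.FromSeconds(30));

if (message != null)
{
    // Process and complete the message as in the original code.
    await _serviceBusReceiver.CompleteMessageAsync(message);
}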
I am testing the .NET version of gRPC to understand how to handle network failures. I put the server on an external machine and am debugging the client. The server ticks with a message once a second and the client just shows it on the console. When I stop my local Wi-Fi connection for a few seconds, the gRPC engine automatically recovers and I even get the remaining values. However, if I disable Wi-Fi for a longer time, say a minute, it just gets stuck. I don't even get an exception that I could handle to recover manually. The scenario works fine when I close the server app manually; then an exception occurs on the client. This is what I have on the client:
static async Task Main(string[] args)
{
    try
    {
        await Subscribe();
    }
    catch (Exception)
    {
        Console.WriteLine("Fail");
        Thread.Sleep(1000);
        await Main(args);
    }
    Console.ReadLine();
}
private static async Task Subscribe()
{
    using var channel = GrpcChannel.ForAddress("http://x.x.x.x:5555");
    var client = new Greeter.GreeterClient(channel);
    var replies = client.GerReplies(new HelloRequest { Message = "Test" });
    while (await replies.ResponseStream.MoveNext(CancellationToken.None))
    {
        Console.WriteLine(replies.ResponseStream.Current.Message);
    }
    Console.WriteLine("Completed");
}
This works when the server app is stopped, but it doesn't work if I just disable the local Wi-Fi connection on the client side. How can I handle such cases and similar ones?
I've managed to solve it with the KeepAlivePingDelay setting:
var handler = new SocketsHttpHandler
{
    KeepAlivePingDelay = TimeSpan.FromSeconds(5),
    KeepAlivePingTimeout = TimeSpan.FromSeconds(5),
};

using var channel = GrpcChannel.ForAddress("http://x.x.x.x:5555", new GrpcChannelOptions
{
    HttpHandler = handler
});
This configuration forces gRPC to fail after about 10 seconds when the connection is lost (the 5-second ping delay plus the 5-second ping timeout).
I have a problem with MassTransit (MassTransit 3.5.7 via NuGet, RabbitMQ 3.6.10, Erlang 19.0).
It looks like MassTransit does not clean up RabbitMQ channels when the bus fails to start.
Here is my test program:
using System;
using System.Threading;
using MassTransit;

namespace TestSubscriber
{
    class Program
    {
        static void Main()
        {
            IBusControl busControl = null;
            var failCount = 0;
            var busNotInitialised = true;

            // Keep RabbitMQ switched off for a few iterations of this loop, then switch it on.
            while (busNotInitialised)
            {
                busControl = Bus.Factory.CreateUsingRabbitMq(sbc =>
                {
                    var host = sbc.Host(new Uri("rabbitmq://localhost/"), h =>
                    {
                        h.Username("guest");
                        h.Password("guest");
                    });
                    sbc.ReceiveEndpoint(host, "some_queue", endpoint =>
                    {
                        endpoint.Handler<string>(async context =>
                        {
                            await Console.Out.WriteLineAsync($"Received: {context.Message}");
                        });
                    });
                });

                try
                {
                    busControl.Start();
                    busNotInitialised = false;
                }
                catch (Exception)
                {
                    Console.WriteLine($"Attempt:{++failCount} failed.");
                    // wait some time
                    Thread.Sleep(5000);
                }
            }

            // At this point, using RabbitMQ's management web page, you will see failCount + 1 channels.
            busControl.Stop();
            // At this point, using RabbitMQ's management web page, you will see failCount channels.
            Console.ReadLine();
        }
    }
}
The program continuously tries to create a service bus that uses RabbitMQ and breaks out of the loop as soon as the bus is created successfully.
After running the program for a couple of minutes (with RabbitMQ stopped) and stopping on a breakpoint, I can see a lot of worker threads, one for each failed bus-creation attempt.
After starting RabbitMQ, all of those "dangling" connect threads connect to RabbitMQ.
If I then shut down the bus, only the latest connection (the one belonging to the successfully created bus) is closed. All of the other dangling connections stay connected to RabbitMQ.
The problem is that those dangling connections, once connected, read data from the queue, which leads to data loss.
Is there any way to get around this problem?
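Not from the original post, and worth verifying against your MassTransit version: one mitigation to try is tearing down a bus instance whose Start() threw before building a new one, so the failed attempt's connection is released instead of left dangling. A sketch of the loop's catch block modified under that assumption:
try
{
    busControl.Start();
    busNotInitialised = false;
}
catch (Exception)
{
    Console.WriteLine($"Attempt:{++failCount} failed.");
    // Assumption: stopping the failed bus releases its pending connection
    // attempt so it cannot connect later and consume messages unexpectedly.
    try { busControl.Stop(); } catch { /* ignore teardown errors */ }
    busControl = null;
    Thread.Sleep(5000);
}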
Please find the below code:
MQEnvironment.Hostname = HostName;
MQEnvironment.Channel = Channel;
if (!string.IsNullOrEmpty(SSLKeyRepository))
{
    MQEnvironment.SSLCipherSpec = SSLCipherSpec;
    MQEnvironment.SSLKeyRepository = SSLKeyRepository;
}
if (Port > 0)
    MQEnvironment.Port = Port;

try
{
    MQManager = new MQQueueManager(QueueManager);
    try
    {
        MQRequestQueue = MQManager.AccessQueue(QueueNameGet, MQC.MQOO_INPUT_AS_Q_DEF + MQC.MQOO_FAIL_IF_QUIESCING);
        MQResponseQueue = MQManager.AccessQueue(QueueNameGet, MQC.MQOO_OUTPUT + MQC.MQOO_FAIL_IF_QUIESCING);
        return true;
    }
    catch (IBM.WMQ.MQException exIBM)
    {
        CloseConnection();
        ErrorCode = exIBM.Reason;
        ErrorDescription = exIBM.Message;
    }
}
catch (IBM.WMQ.MQException exIBM)
{
    CloseConnection();
    ErrorCode = exIBM.Reason;
    ErrorDescription = exIBM.Message;
}
catch (Exception ex)
{
    CloseConnection();
    ErrorCode = Constants.SYSTEMEXCEPTION;
    ErrorDescription = ex.Message;
}
return false;
Issue: the problem does not occur when I run this once or two or three times, but it does when the code runs in a loop many times.
I have also run the same piece of code 10,000 times from the IIS server and it completed successfully.
I only get the issue when this code is part of an IIS web service and that web service is called multiple times.
IBM MQ client 7.5.0.0 is installed on the IIS server, and I am using the DLL of the same version.
UPDATE
Error description:
Error Message: The handle is invalid
StackTrace:
at System.Diagnostics.NtProcessManager.GetModuleInfos(Int32 processId, Boolean firstModuleOnly)
at System.Diagnostics.Process.get_Modules()
at IBM.WMQ.CommonServices.TraceEnvironment()
at IBM.WMQ.CommonServices.CreateCommonServices()
at IBM.WMQ.CommonServices.TraceEnabled()
at IBM.WMQ.MQBase..ctor()
at IBM.WMQ.MQManagedObject..ctor()
Thanks for providing the call stack. The issue you mention is very similar to one fixed in MQ version 7.5.0.2. As you are at MQ v7.5.0.0, I suggest you upgrade your MQ client to the latest level, v7.5.0.7, and try again.
I have said this many times here, and it applies to both Java and .NET: the MQEnvironment class is NOT thread safe. By using it, you are shooting yourself in the foot.
Put the values (channel, hostname and port number) into a Hashtable and pass the Hashtable to the MQQueueManager class:
Hashtable qMgrHT = new Hashtable();
qMgrHT.Add(MQC.TRANSPORT_PROPERTY, MQC.TRANSPORT_MQSERIES_MANAGED);
qMgrHT.Add(MQC.HOST_NAME_PROPERTY, "10.10.10.10");
qMgrHT.Add(MQC.CHANNEL_PROPERTY, "TEST.CHL");
qMgrHT.Add(MQC.PORT_PROPERTY, 1414);
qMgrHT.Add(MQC.USER_ID_PROPERTY, "myUserID");
qMgrHT.Add(MQC.PASSWORD_PROPERTY, "myPwd");
MQQueueManager qMgr = new MQQueueManager(qManager, qMgrHT);
Finally, write your code so that it maintains a connection rather than connecting and disconnecting over and over. Very, VERY bad form.
We have a service that receives messages from n message queues. However, if the Message Queuing service is restarted, the message retrieval service stops receiving messages even after the Message Queuing service has restarted successfully.
I have tried to specifically catch the MessageQueueException that is thrown in the message retrieval service and invoke the queue's BeginReceive method again. However, in the 2 seconds or so that it takes the Message Queuing service to restart, I get about 1875 instances of the exception and then the service stops functioning when another MessageQueueException is thrown in our StartListening method.
Is there an elegant way to recover from a Message Queuing service restart?
private void OnReceiveCompleted(object sender, ReceiveCompletedEventArgs e)
{
    MessageQueue queue = (MessageQueue)sender;
    try
    {
        Message message = queue.EndReceive(e.AsyncResult);
        this.StartListening(queue);
        if (this.MessageReceived != null)
            this.MessageReceived(this, new MessageReceivedEventArgs(message));
    }
    catch (MessageQueueException)
    {
        LogUtility.LogError(String.Format(CultureInfo.InvariantCulture, StringResource.LogMessage_QueueManager_MessageQueueException, queue.MachineName, queue.QueueName, queue.Path));
        this.StartListening(queue);
    }
}

public void StartListening(MessageQueue queue)
{
    queue.BeginReceive();
}
I need to deal with the infinite loop issue this causes and clean it up a bit, but you get the idea.
When the MessageQueueException occurs, invoke the RecoverQueue method:
private void RecoverQueue(MessageQueue queue)
{
    string queuePath = queue.Path;
    bool queueRecovered = false;
    while (!queueRecovered)
    {
        try
        {
            this.StopListening(queue);
            queue.Close();
            queue.Dispose();
            Thread.Sleep(2000);
            MessageQueue newQueue = this.CreateQueue(queuePath);
            newQueue.ReceiveCompleted += new ReceiveCompletedEventHandler(this.OnReceiveCompleted);
            this.StartListening(newQueue);
            LogUtility.LogInformation(String.Format(CultureInfo.InvariantCulture, "Message queue {0} recovered successfully.", newQueue.QueueName));
            queueRecovered = true;
        }
        catch (Exception ex)
        {
            LogUtility.LogError(String.Format(CultureInfo.InvariantCulture, "The following error occurred while trying to recover queue: {0} error: {1}", queue.QueueName, ex.Message));
        }
    }
}

public void StopListening(MessageQueue queue)
{
    queue.ReceiveCompleted -= new ReceiveCompletedEventHandler(this.OnReceiveCompleted);
}
Upon receiving the exception that results from the service restart, you have to release the old MessageQueue: unwire your ReceiveCompleted handler, dispose the MessageQueue, and so on. Then create a new MessageQueue instance and hook up the ReceiveCompleted event on the new instance.
Alternatively, you can use a polling method that creates a new instance on a certain interval and calls MessageQueue.Receive(TimeSpan), which waits for an incoming message or until the timeout elapses. You handle the message, destroy the MessageQueue instance, and start the iteration again.
By recreating the MessageQueue each time, you get built-in recovery. The overhead of creating a MessageQueue is minimal because the underlying queue handles are cached internally.
Pseudo-code:
bool done = false;
while (!done) // or drive this with a timer or periodic task of some sort...
{
    try
    {
        using (MessageQueue queue = new MessageQueue(queuePath))
        {
            Message message = queue.Receive(TimeSpan.FromMilliseconds(500));
            // process message
        }
    }
    catch (MessageQueueException ex)
    {
        // A receive timeout (IOTimeout) just means the queue was empty for this
        // interval; anything else is a real error worth handling or logging.
        if (ex.MessageQueueErrorCode != MessageQueueErrorCode.IOTimeout)
        {
            // handle exceptions
        }
    }
}