RabbitMQ channel leak when bus fails to start - c#

I have a problem with MassTransit (MassTransit 3.5.7 via NuGet, RabbitMQ 3.6.10, Erlang 19.0).
It looks like MassTransit does not clean up RabbitMQ channels when the bus fails to start.
Here is my test program:

using System;
using System.Threading;
using MassTransit;

namespace TestSubscriber
{
    class Program
    {
        static void Main()
        {
            IBusControl busControl = null;
            var failCount = 0;
            var busNotInitialised = true;
            // Keep RabbitMQ switched off for a few iterations of this loop, then switch it on.
            while (busNotInitialised)
            {
                busControl = Bus.Factory.CreateUsingRabbitMq(sbc =>
                {
                    var host = sbc.Host(new Uri("rabbitmq://localhost/"), h =>
                    {
                        h.Username("guest");
                        h.Password("guest");
                    });
                    sbc.ReceiveEndpoint(host, "some_queue", endpoint =>
                    {
                        endpoint.Handler<string>(async context =>
                        {
                            await Console.Out.WriteLineAsync($"Received: {context.Message}");
                        });
                    });
                });
                try
                {
                    busControl.Start();
                    busNotInitialised = false;
                }
                catch (Exception)
                {
                    Console.WriteLine($"Attempt:{++failCount} failed.");
                    // Wait some time
                    Thread.Sleep(5000);
                }
            }
            // At this point, using RabbitMQ's management web page, you will see failCount + 1 channels.
            busControl.Stop();
            // At this point, using RabbitMQ's management web page, you will see failCount channels.
            Console.ReadLine();
        }
    }
}
It continuously tries to create a service bus that uses RabbitMQ.
The program breaks out of the loop as soon as the service bus is created successfully.
After running the program for a couple of minutes (with RabbitMQ stopped) and pausing on a breakpoint, I can see a lot of worker threads (one for each failed bus creation attempt).
After RabbitMQ is started, all of those "dangling" connect threads connect to RabbitMQ.
If I then shut down the bus, only the latest connection (the one belonging to the bus that started successfully) is closed. All of the other dangling connections stay connected to RabbitMQ.
The problem is that those dangling connections, once established, read data from the queue, which leads to data loss.
Is there any way to get around this problem?
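One mitigation worth trying (a sketch only; whether Stop() actually tears down a half-started connection in MassTransit 3.5.7 is an assumption, not something the docs confirm) is to stop the failed bus instance inside the catch block, so each failed attempt cleans up before the next iteration builds a new bus:

catch (Exception)
{
    Console.WriteLine($"Attempt:{++failCount} failed.");
    try
    {
        // Assumption: Stop() disposes whatever connection state the
        // failed Start() left behind. If it throws because the bus
        // never started, swallow the exception and retry anyway.
        busControl.Stop();
    }
    catch (Exception) { }
    Thread.Sleep(5000);
}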

Related

Handle network issues when wi-fi is switched off

I am testing the .NET version of gRPC to understand how to handle network failures. I put the server on an external machine and am debugging the client. The server ticks with a message once a second and the client just shows it on the console. When I stop my local Wi-Fi connection for a few seconds, the gRPC engine automatically recovers and I even get the remaining values. However, if I disable Wi-Fi for a longer time, like a minute, it just gets stuck. I don't even get any exceptions that I could handle to recover manually. The scenario works fine when I close the server app manually; then an exception occurs on the client. This is what I have on the client:
static async Task Main(string[] args)
{
    try
    {
        await Subscribe();
    }
    catch (Exception)
    {
        Console.WriteLine("Fail");
        Thread.Sleep(1000);
        await Main(args);
    }
    Console.ReadLine();
}

private static async Task Subscribe()
{
    using var channel = GrpcChannel.ForAddress("http://x.x.x.x:5555");
    var client = new Greeter.GreeterClient(channel);
    var replies = client.GerReplies(new HelloRequest { Message = "Test" });
    while (await replies.ResponseStream.MoveNext(CancellationToken.None))
    {
        Console.WriteLine(replies.ResponseStream.Current.Message);
    }
    Console.WriteLine("Completed");
}
This works when the server app is stopped, but it doesn't work if I just disable the local Wi-Fi connection on the client side. How can I handle such a case and similar ones?
I've managed to solve it with the KeepAlivePingDelay setting:
var handler = new SocketsHttpHandler
{
    KeepAlivePingDelay = TimeSpan.FromSeconds(5),
    KeepAlivePingTimeout = TimeSpan.FromSeconds(5),
};
using var channel = GrpcChannel.ForAddress("http://x.x.x.x:5555", new GrpcChannelOptions
{
    HttpHandler = handler
});
This configuration forces gRPC to fail after about 10 seconds when there is no connection.
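With the keep-alive pings in place, a broken stream surfaces on the client as an RpcException from Grpc.Core, so the retry can be written as a plain loop instead of the recursive Main above (a sketch reusing the Subscribe method from the question):

// Retry the subscription in a loop once keep-alive makes
// network failures observable as exceptions.
while (true)
{
    try
    {
        await Subscribe();
        break; // the server completed the stream normally
    }
    catch (RpcException ex)
    {
        Console.WriteLine($"Stream failed: {ex.StatusCode}. Retrying...");
        await Task.Delay(TimeSpan.FromSeconds(1));
    }
}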

IBM XMS Receive method not returning messages immediately

I use IBM XMS to connect to a third party to send and receive messages.
UPDATE:
Client: .NET Core 3.1
IBM XMS library from NuGet; tried 9.2.4 and 9.1.5 with the same results
Same code used to work fine a week ago - so something must have changed in the MQ manager or somewhere in my infrastructure
SSL and client certificates
I had been using Receive with a timeout for a while without problems, but since last week I stopped seeing any messages to pick up, even when they were there. Once I changed to the Receive method without a timeout, I started picking up messages again, but only every 5 minutes.
Looking at the XMS logs, I can see the messages are actually read almost immediately, with and without a timeout, but XMS seems to wait those 5 minutes before returning the message...
I haven't changed anything in my side and the third party reassures they haven't either.
My question is: given the receive code below, is there anything that may be causing the 5-minute wait? Any ideas on things I can try? I can share the XMS logs too if that helps.
// This is used to set the default properties in the factory before calling the receive method
private void SetConnectionProperties(IConnectionFactory cf)
{
    cf.SetStringProperty(XMSC.WMQ_HOST_NAME, _mqConfiguration.Host);
    cf.SetIntProperty(XMSC.WMQ_PORT, _mqConfiguration.Port);
    cf.SetStringProperty(XMSC.WMQ_CHANNEL, _mqConfiguration.Channel);
    cf.SetStringProperty(XMSC.WMQ_QUEUE_MANAGER, _mqConfiguration.QueueManager);
    cf.SetStringProperty(XMSC.WMQ_SSL_CLIENT_CERT_LABEL, _mqConfiguration.CertificateLabel);
    cf.SetStringProperty(XMSC.WMQ_SSL_KEY_REPOSITORY, _mqConfiguration.KeyRepository);
    cf.SetStringProperty(XMSC.WMQ_SSL_CIPHER_SPEC, _mqConfiguration.CipherSuite);
    cf.SetIntProperty(XMSC.WMQ_CONNECTION_MODE, XMSC.WMQ_CM_CLIENT);
    cf.SetIntProperty(XMSC.WMQ_CLIENT_RECONNECT_OPTIONS, XMSC.WMQ_CLIENT_RECONNECT);
    cf.SetIntProperty(XMSC.WMQ_CLIENT_RECONNECT_TIMEOUT, XMSC.WMQ_CLIENT_RECONNECT_TIMEOUT_DEFAULT);
}

public IEnumerable<IMessage> ReceiveMessage()
{
    using var connection = _connectionFactory.CreateConnection();
    using var session = connection.CreateSession(false, AcknowledgeMode.AutoAcknowledge);
    using var destination = session.CreateQueue(_mqConfiguration.ReceiveQueue);
    using var consumer = session.CreateConsumer(destination);
    connection.Start();
    var result = new List<IMessage>();
    var keepRunning = true;
    while (keepRunning)
    {
        try
        {
            var sw = new Stopwatch();
            sw.Start();
            var message = _mqConfiguration.ConsumerTimeoutMs == 0
                ? consumer.Receive()
                : consumer.Receive(_mqConfiguration.ConsumerTimeoutMs);
            if (message != null)
            {
                result.Add(message);
                _messageLogger.LogInMessage(message);
                var elapsedMillis = sw.ElapsedMilliseconds;
                if (_mqConfiguration.ConsumerTimeoutMs == 0)
                {
                    keepRunning = false;
                }
            }
            else
            {
                keepRunning = false;
            }
        }
        catch (Exception e)
        {
            // We log the exception
            keepRunning = false;
        }
    }
    consumer.Close();
    destination.Dispose();
    session.Dispose();
    connection.Close();
    return result;
}
The symptoms look like a match for APAR IJ20591: Managed .NET SSL application making MQGET calls unexpectedly receives MQRC_CONNECTION_BROKEN when running in .NET Core. This impacts messages larger than 15 KB when the IBM MQ .NET Standard (Core) libraries use TLS channels. See also this thread. It will be fixed in 9.2.0.5; no CDS release is listed.
It states:
Setting the heartbeat interval to lower values may reduce the frequency of occurrence.
If your .NET application is not using a CCDT you can lower the heartbeat by having the SVRCONN channel's HBINT lowered and reconnecting your application.
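On the queue manager side, lowering HBINT is an MQSC change (the channel name below is illustrative; the new value only applies to channel instances started after the change):

ALTER CHANNEL('MY.SVRCONN') CHLTYPE(SVRCONN) HBINT(30)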

Initialization of MQQueueManager hangs in .NET core 2.2

We are having issues when initializing the constructor for MQQueueManager. Our service is a Cron Job hosted on Kubernetes (AWS) which is scheduled to run every minute. This Cron Job connects to an IBM MQ server and reads the messages by looping through a list of queues. We are using "IBM MQ Client for .NET Core (9.2.0)" for connecting to the MQ server.
The service was working fine for a couple of months in production, but one day it stopped working. Checking the logs, we could see the service was stuck while initializing MQQueueManager, while the Cron Job status in the pod still showed running. Since the Cron Job was in running status, new jobs couldn't be created, and finally we had to kill the pod on K8s to get the service up and running.
We have tried replicating the issue in our dev and test environments, but have not succeeded yet. We are not sure whether the issue is in the MQQueueManager constructor or whether the running thread is deadlocked. Below is the piece of code we are using; any help would be appreciated.
Also, no exception is thrown...
private void InitialiseMQProperties()
{
    try
    {
        _mqProperties = new Hashtable();
        _mqProperties.Add(MQC.HOST_NAME_PROPERTY, _hostName);
        _mqProperties.Add(MQC.PORT_PROPERTY, _port);
        _mqProperties.Add(MQC.CHANNEL_PROPERTY, _channelName);
        _mqProperties.Add(MQC.TRANSPORT_PROPERTY, MQC.TRANSPORT_MQSERIES_MANAGED);
        _mqProperties.Add(MQC.USER_ID_PROPERTY, _userId);
    }
    catch (Exception ex)
    {
        Logger.Error("ConfigurationError: Initialising properties for Queue {Message}", ex.Message);
    }
}
This function gets called by the Cron Job every minute:
public async Task GetMessagesFromQueue()
{
    try
    {
        InitialiseMQProperties();
        var queTaskList = new List<Task>();
        var _queues = _queueList.Split(',');
        foreach (var queue in _queues)
        {
            queTaskList.Add(GetMessasgesByQueue(queue));
        }
        await Task.WhenAll(queTaskList).ConfigureAwait(false);
        Logger.Information("Messages successfully processed from the Queues");
    }
    catch (Exception ex)
    {
        Logger.Error("GeneralError: Reading messages from Queue error {Message}", ex.Message);
    }
}
private async Task GetMessasgesByQueue(string queueName)
{
    Logger.Information("Initialising queuemanager properties");
    var queueManager = new MQQueueManager("", _mqProperties);
    // From the logs, it gets stuck on the line above
    MQQueue queue = null;
    MQMessage message;
    MQGetMessageOptions getMessageOptions;
    try
    {
        getMessageOptions = new MQGetMessageOptions();
        getMessageOptions.Options |= MQC.MQGMO_SYNCPOINT + MQC.MQGMO_FAIL_IF_QUIESCING;
        Logger.Information("Accessing queue {queueName}", queueName);
        queue = queueManager.AccessQueue(queueName, MQC.MQOO_INPUT_AS_Q_DEF + MQC.MQOO_FAIL_IF_QUIESCING);
        Logger.Information("Connection to queue succeeded");
        while (true)
        {
            var commitTrans = true;
            message = new MQMessage();
            queue.Get(message, getMessageOptions);
            var queMessage = message.ReadString(message.MessageLength);
            Logger.Debug("Message read from MQ {0}", queMessage); //sensitive
            if (queMessage.Length > 0)
            {
                var data = await _dataSerializer.DeserializePayload<Response>(queMessage);
                if (data != null)
                {
                    // Internal logic
                }
            }
        }
    }
    catch (MQException mqe)
    {
        switch (mqe.ReasonCode)
        {
            case MQC.MQRC_NO_MSG_AVAILABLE:
                Logger.Information("MQInfo: No message available in the queue {queueName}", queueName);
                CloseQueue(queue);
                break;
            case MQC.MQRC_Q_MGR_QUIESCING:
            case MQC.MQRC_Q_MGR_STOPPING:
                CloseQueue(queue);
                queueManager.Backout();
                Logger.Error("MQError: Queue Manager Stopping error {Message}", mqe.Message);
                break;
            case MQC.MQRC_Q_MGR_NOT_ACTIVE:
            case MQC.MQRC_Q_MGR_NOT_AVAILABLE:
                CloseQueue(queue);
                queueManager.Backout();
                Logger.Error("MQError: Queue Manager not available error {Message}", mqe.Message);
                break;
            default:
                Logger.Error("MQError: Error reading queue {queueName} error {Message}", queueName, mqe.Message);
                CloseQueue(queue);
                queueManager.Backout();
                break;
        }
    }
    Logger.Information("Closing queue {queueName}", queueName);
    CloseQueue(queue);
    CloseQueueManager(queueManager);
    Logger.Information("Closing Queue manager for queue {queueName}", queueName);
}
I have a very similar setup: AKS and the XMS client for .NET Core 3.1, running the CronJob every second. I saw a couple of crashes in prod without understanding why, even after checking the pod logs. I had a conversation with the MQ team, and their advice was to recycle the connection; it looks like the number of connections exhausts the communication and crashes the app. We couldn't see that in dev or test because of the difference in message volume per environment.
What I ended up doing was to spin up the container, create the MQ connection, and leave it open for the life cycle of the pod instead of for each CronJob cycle, destroying it only when the pod crashes or stops.
Something worth evaluating?
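A minimal sketch of that long-lived-connection approach, assuming the same _mqProperties hashtable as in the question (the helper name is made up; MQQueueManager.IsConnected is used to detect a broken connection):

// Reuse one MQQueueManager across CronJob cycles instead of creating a
// new connection per run; rebuild it only when it is no longer connected.
private static MQQueueManager _sharedQueueManager;
private static readonly object _qmLock = new object();

private MQQueueManager GetQueueManager()
{
    lock (_qmLock)
    {
        if (_sharedQueueManager == null || !_sharedQueueManager.IsConnected)
        {
            try { _sharedQueueManager?.Disconnect(); }
            catch (MQException) { /* connection already broken */ }
            _sharedQueueManager = new MQQueueManager("", _mqProperties);
        }
        return _sharedQueueManager;
    }
}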

Why does a simple configuration in MassTransit create 2 queues and 3 exchanges?

I created a MassTransit quickstart program to interact with my localhost RabbitMQ:
namespace ConsoleApp1
{
    public static class Program
    {
        public class YourMessage
        {
            public string Text { get; set; }
        }

        public static async Task Main(params string[] args)
        {
            var bus = Bus.Factory.CreateUsingRabbitMq(sbc =>
            {
                var host = sbc.Host(new Uri("rabbitmq://localhost"), h =>
                {
                    h.Username("guest");
                    h.Password("guest");
                });
                sbc.ReceiveEndpoint(host, "test_queue", ep =>
                {
                    ep.Handler<YourMessage>(async context => await Console.Out.WriteLineAsync($"Received: {context.Message.Text}"));
                });
            });
            await bus.StartAsync();
            await bus.Publish(new YourMessage { Text = "Hi" });
            Console.WriteLine("Press any key to exit");
            Console.ReadKey();
            await bus.StopAsync();
        }
    }
}
Everything looked fine until I actually checked the underlying RabbitMQ management page and found out that, just for this very simple program, MassTransit created 3 exchanges and 2 queues.
Exchanges, all fanouts:
ConsoleApp1:Program-YourMessage: Durable
VP0003748_dotnet_bus_6n9oyyfzxhyx9ybobdmpj8qeyt: Auto-delete and Durable?
test_queue: Durable
Queues:
VP0003748_dotnet_bus_6n9oyyfzxhyx9ybobdmpj8qeyt: x-expire 60000
test_queue: Durable
I would like to know why all of that is necessary, or is it just the default configuration? In particular, I don't really get the point of creating so "many".
It is all described in the documentation.
ConsoleApp1:Program-YourMessage is the message contract exchange; this is where messages are published.
test_queue is the endpoint exchange. It binds to the message exchange. This way, when you have multiple consumers for the same message type (pub-sub), they all get their own copy of the message.
test_queue is the queue, which binds to the endpoint exchange. Publish-subscribe in RMQ requires exchanges, and queues can bind to exchanges, so messages get properly delivered.
The non-durable queue and exchange with the weird names are the endpoint's temporary queue and exchange, which are used for request-response.
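Putting it together, the topology this quickstart creates looks like this (a simplified view based on the answer above):

ConsoleApp1:Program-YourMessage (message exchange, fanout)
    -> test_queue (endpoint exchange, fanout)
        -> test_queue (queue)
plus the VP0003748_dotnet_bus_... temporary exchange and queue for request-response.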

Error using Azure Service Bus Subscription OnMessageAsync event

We are using Azure Service Bus in our project and are reading messages from a service bus topic/subscription.
We are using the subscriptionClient.OnMessageAsync event in conjunction with onMessageOptions.ExceptionReceived.
Let me write down the steps we followed to reproduce the issue we are facing.
Create a service bus namespace with default config in the azure portal
Create a topic inside it with default config in the azure portal
Create a subscription inside it with default config in the azure portal
Create a console app and paste the code added below
Connect the service bus using Service Bus Explorer
Run the console app
Send a few test messages from service bus explorer & watch the console app window
Though the messages are processed successfully, the control goes inside the ExceptionReceived method every time.
Here's the code:
class Program
{
    static void Main()
    {
        var subscriptionClient = SubscriptionClient.CreateFromConnectionString
        (
            "servicebusendpointaddress",
            "topicname",
            "subscriptionname",
            ReceiveMode.PeekLock
        );
        var onMessageOptions = new OnMessageOptions();
        onMessageOptions.ExceptionReceived += OnMessageError;
        subscriptionClient.OnMessageAsync(OnMessageReceived, onMessageOptions);
        Console.ReadKey();
    }

    private static void OnMessageError(object sender, ExceptionReceivedEventArgs e)
    {
        if (e != null && e.Exception != null)
        {
            Console.WriteLine("Hey, there's an error!" + e.Exception.Message + "\r\n\r\n");
        }
    }

    private static async Task OnMessageReceived(BrokeredMessage arg)
    {
        await arg.CompleteAsync();
        Console.WriteLine("Message processing done!");
    }
}
Are we missing something here?
One more point to mention: if we enable 'AutoComplete' and remove the await arg.CompleteAsync();, then this does not happen.
var onMessageOptions = new OnMessageOptions() { AutoComplete = true };
In both cases the messages are processed successfully and removed from the subscription immediately.
You might be getting this because you are debugging and stepping through the code, i.e. the lock expires. The default LockDuration is 60 seconds.
You can try setting your OnMessageOptions() like this to test:
var onMessageOptions = new OnMessageOptions() { AutoRenewTimeout = TimeSpan.FromMinutes(1) };
