I am testing the .NET version of gRPC to understand how to handle network failures. I put the server on an external machine and am debugging the client. The server sends a message once a second and the client just prints it to the console. When I drop my local Wi-Fi connection for a few seconds, the gRPC engine recovers automatically and I even receive the values I missed. However, if I disable Wi-Fi for a longer time, say a minute, the client just gets stuck. I don't even get an exception that I could handle in order to recover manually. The scenario works fine when I close the server app manually: in that case an exception is raised on the client. This is what I have on the client:
static async Task Main(string[] args)
{
try
{
await Subscribe();
}
catch (Exception)
{
Console.WriteLine("Fail");
Thread.Sleep(1000);
await Main(args);
}
Console.ReadLine();
}
private static async Task Subscribe()
{
using var channel = GrpcChannel.ForAddress("http://x.x.x.x:5555");
var client = new Greeter.GreeterClient(channel);
var replies = client.GerReplies(new HelloRequest { Message = "Test" });
while (await replies.ResponseStream.MoveNext(CancellationToken.None))
{
Console.WriteLine(replies.ResponseStream.Current.Message);
}
Console.WriteLine("Completed");
}
This works when the server app is stopped, but it doesn't work if I just disable the local Wi-Fi connection on the client side. How can I handle this case and similar ones?
I've managed to solve it with the KeepAlivePingDelay setting:
var handler = new SocketsHttpHandler
{
KeepAlivePingDelay = TimeSpan.FromSeconds(5),
KeepAlivePingTimeout = TimeSpan.FromSeconds(5),
};
using var channel = GrpcChannel.ForAddress("http://x.x.x.x:5555", new GrpcChannelOptions
{
HttpHandler = handler
});
This configuration forces gRPC to fail after about 10 seconds when the connection is lost.
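With the keep-alive probes enabled, the dropped connection eventually surfaces as an RpcException on the streaming call, which a retry loop can then catch. Below is a minimal sketch, not a definitive implementation; it reuses the Greeter/GerReplies names from the question's generated client and assumes the usual Grpc.Net.Client and Grpc.Core usings:
// Sketch: keep-alive handler plus a simple reconnect loop.
static async Task RunClientAsync()
{
    while (true)
    {
        try
        {
            var handler = new SocketsHttpHandler
            {
                KeepAlivePingDelay = TimeSpan.FromSeconds(5),
                KeepAlivePingTimeout = TimeSpan.FromSeconds(5),
            };
            using var channel = GrpcChannel.ForAddress("http://x.x.x.x:5555",
                new GrpcChannelOptions { HttpHandler = handler });
            var client = new Greeter.GreeterClient(channel);
            using var replies = client.GerReplies(new HelloRequest { Message = "Test" });
            while (await replies.ResponseStream.MoveNext(CancellationToken.None))
            {
                Console.WriteLine(replies.ResponseStream.Current.Message);
            }
            return; // the server completed the stream normally
        }
        catch (RpcException ex)
        {
            // The keep-alive timeout surfaces here once the connection is considered dead.
            Console.WriteLine($"Stream failed ({ex.StatusCode}), retrying in 1 second...");
            await Task.Delay(TimeSpan.FromSeconds(1));
        }
    }
}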
I use IBM XMS to connect to a third party to send and receive messages.
UPDATE:
Client .Net Core 3.1
IBM XMS library version from Nuget. Tried 9.2.4 and 9.1.5 with same results
Same code used to work fine a week ago - so something must have changed in the MQ manager or somewhere in my infrastructure
SSL and client certificates
I have been using receive with a timeout for a while without problems, but since last week I stopped seeing any messages to pick up, even when they were there. Once I switched to the receive method without a timeout, I started picking up messages again, but only every 5 minutes.
Looking at the XMS logs I can see the messages are actually read almost immediately, with and without a timeout, but XMS seems to decide to wait those 5 minutes before returning the message.
I haven't changed anything on my side and the third party assures me they haven't either.
My question is: given the receive code below, is there anything there that could cause the 5-minute wait? Any ideas on things I can try? I can share the XMS logs too if that helps.
// This is used to set the default properties in the factory before calling the receive method
private void SetConnectionProperties(IConnectionFactory cf)
{
cf.SetStringProperty(XMSC.WMQ_HOST_NAME, _mqConfiguration.Host);
cf.SetIntProperty(XMSC.WMQ_PORT, _mqConfiguration.Port);
cf.SetStringProperty(XMSC.WMQ_CHANNEL, _mqConfiguration.Channel);
cf.SetStringProperty(XMSC.WMQ_QUEUE_MANAGER, _mqConfiguration.QueueManager);
cf.SetStringProperty(XMSC.WMQ_SSL_CLIENT_CERT_LABEL, _mqConfiguration.CertificateLabel);
cf.SetStringProperty(XMSC.WMQ_SSL_KEY_REPOSITORY, _mqConfiguration.KeyRepository);
cf.SetStringProperty(XMSC.WMQ_SSL_CIPHER_SPEC, _mqConfiguration.CipherSuite);
cf.SetIntProperty(XMSC.WMQ_CONNECTION_MODE, XMSC.WMQ_CM_CLIENT);
cf.SetIntProperty(XMSC.WMQ_CLIENT_RECONNECT_OPTIONS, XMSC.WMQ_CLIENT_RECONNECT);
cf.SetIntProperty(XMSC.WMQ_CLIENT_RECONNECT_TIMEOUT, XMSC.WMQ_CLIENT_RECONNECT_TIMEOUT_DEFAULT);
}
public IEnumerable<IMessage> ReceiveMessage()
{
using var connection = _connectionFactory.CreateConnection();
using var session = connection.CreateSession(false, AcknowledgeMode.AutoAcknowledge);
using var destination = session.CreateQueue(_mqConfiguration.ReceiveQueue);
using var consumer = session.CreateConsumer(destination);
connection.Start();
var result = new List<IMessage>();
var keepRunning = true;
while (keepRunning)
{
try
{
var sw = new Stopwatch();
sw.Start();
var message = _mqConfiguration.ConsumerTimeoutMs == 0 ? consumer.Receive()
: consumer.Receive(_mqConfiguration.ConsumerTimeoutMs);
if (message != null)
{
result.Add(message);
_messageLogger.LogInMessage(message);
var ellapsedMillis = sw.ElapsedMilliseconds;
if (_mqConfiguration.ConsumerTimeoutMs == 0)
{
keepRunning = false;
}
}
else
{
keepRunning = false;
}
}
catch (Exception e)
{
// We log the exception
keepRunning = false;
}
}
consumer.Close();
destination.Dispose();
session.Dispose();
connection.Close();
return result;
}
The symptoms look like a match for APAR IJ20591: "Managed .NET SSL application making MQGET calls unexpectedly receives MQRC_CONNECTION_BROKEN when running in .NET Core." It affects messages larger than 15 KB when using the IBM MQ .NET Standard (Core) libraries over TLS channels. See also this thread. It will be fixed in 9.2.0.5; no CDS release is listed.
It states:
Setting the heartbeat interval to lower values may reduce the frequency of occurrence.
If your .NET application is not using a CCDT you can lower the heartbeat by having the SVRCONN channel's HBINT lowered and reconnecting your application.
We are having issues when initializing the constructor for MQQueueManager. Our service is a Cron Job hosted on Kubernetes (AWS) that is scheduled to run every minute. The Cron Job connects to an IBM MQ server and reads messages by looping through a list of queues. We are using the IBM MQ client for .NET Core (9.2.0) to connect to the MQ server.
The service worked fine for a couple of months in production, but one day it stopped working. Checking the logs, we could see the service was stuck while initializing MQQueueManager, and the Cron Job status in the pod still showed running. Since the Cron Job was in a running state, new jobs couldn't be created, and we finally had to kill the pod on K8s to get the service up and running again.
We have tried replicating the issue in our dev and test environments but haven't succeeded yet. We're not sure whether the issue is in the MQQueueManager constructor itself or whether the running thread is deadlocked. Below is the piece of code we are using. Any help would be appreciated.
Also, no exception is thrown.
private void InitialiseMQProperties()
{
try
{
_mqProperties = new Hashtable();
_mqProperties.Add(MQC.HOST_NAME_PROPERTY, _hostName);
_mqProperties.Add(MQC.PORT_PROPERTY, _port);
_mqProperties.Add(MQC.CHANNEL_PROPERTY, _channelName);
_mqProperties.Add(MQC.TRANSPORT_PROPERTY, MQC.TRANSPORT_MQSERIES_MANAGED);
_mqProperties.Add(MQC.USER_ID_PROPERTY, _userId);
}
catch (Exception ex)
{
Logger.Error("ConfigurationError: Initialising properties for Queue {Message}", ex.Message);
}
}
**This function gets called by the Cron Job every minute**
public async Task GetMessagesFromQueue()
{
try
{
InitialiseMQProperties();
var queTaskList = new List<Task>();
var _queues = _queueList.Split(',');
foreach (var queue in _queues)
{
queTaskList.Add(GetMessasgesByQueue(queue));
}
await Task.WhenAll(queTaskList).ConfigureAwait(false);
Logger.Information("Messages successfully processed from the Queues");
}
catch (Exception ex)
{
Logger.Error("GeneralError: Reading messages from Queue error {Message}", ex.Message);
}
}
private async Task GetMessasgesByQueue(string queueName)
{
{
Logger.Information("Initialising queuemanager properties");
var queueManager = new MQQueueManager("", _mqProperties);
// From the logs it gets stuck here
MQQueue queue = null;
MQMessage message;
MQGetMessageOptions getMessageOptions;
try
{
getMessageOptions = new MQGetMessageOptions();
getMessageOptions.Options |= MQC.MQGMO_SYNCPOINT + MQC.MQGMO_FAIL_IF_QUIESCING;
Logger.Information("Accessing queue {queueName}", queueName);
queue = queueManager.AccessQueue(queueName, MQC.MQOO_INPUT_AS_Q_DEF + MQC.MQOO_FAIL_IF_QUIESCING);
Logger.Information("Connection to queue succeeded");
while (true)
{
var commitTrans = true;
message = new MQMessage();
queue.Get(message, getMessageOptions);
var queMessage = message.ReadString(message.MessageLength);
Logger.Debug("Message read from MQ {0}", queMessage); //sensitive
if (queMessage.Length > 0)
{
var data = await _dataSerializer.DeserializePayload<Response>(queMessage);
if (data != null)
{
// Internal logic goes here
}
}
}
}
catch (MQException mqe)
{
switch (mqe.ReasonCode)
{
case MQC.MQRC_NO_MSG_AVAILABLE:
Logger.Information("MQInfo: No message available in the queue {queueName}", queueName);
CloseQueue(queue);
break;
case MQC.MQRC_Q_MGR_QUIESCING:
case MQC.MQRC_Q_MGR_STOPPING:
CloseQueue(queue);
queueManager.Backout();
Logger.Error("MQError: Queue Manager Stopping error {Message}", mqe.Message);
break;
case MQC.MQRC_Q_MGR_NOT_ACTIVE:
case MQC.MQRC_Q_MGR_NOT_AVAILABLE:
CloseQueue(queue);
queueManager.Backout();
Logger.Error("MQError: Queue Manager not available error {Message}", mqe.Message);
break;
default:
Logger.Error("MQError: Error reading queue {queueName} error {Message}", queueName, mqe.Message);
CloseQueue(queue);
queueManager.Backout();
break;
}
}
Logger.Information("Closing queue {queueName}", queueName);
CloseQueue(queue);
CloseQueueManager(queueManager);
Logger.Information("Closing Queue manager for queue {queueName}", queueName);
}
I have a very similar setup: AKS and the XMS client for .NET Core 3.1, running the CronJob every second. I saw a couple of crashes in prod without understanding why, even after checking the pod logs. I had a conversation with the MQ team and their advice was to recycle the connection; it looks like the number of connections exhausts the communication and crashes the app, but we couldn't see that in dev or test because of the volume of messages processed per environment.
What I ended up doing was to spin up the container, create the connection to MQ, and leave it open for the life cycle of the pod instead of opening one per CronJob cycle, destroying it only when the pod crashes or stops.
Something worth evaluating?
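To illustrate that approach, here is a minimal sketch (not the poster's actual code) of a process-wide queue manager that is created once, reused by every job run, and recreated only when it is found disconnected. It assumes the same _mqProperties Hashtable built in the question; MqConnectionHolder is a hypothetical helper name:
using System.Collections;
using IBM.WMQ;

// Hypothetical helper: one MQQueueManager per process, recreated only when broken.
public class MqConnectionHolder
{
    private readonly Hashtable _mqProperties;
    private readonly object _lock = new object();
    private MQQueueManager _queueManager;

    public MqConnectionHolder(Hashtable mqProperties) => _mqProperties = mqProperties;

    public MQQueueManager GetQueueManager()
    {
        lock (_lock)
        {
            // Reuse the existing connection for every job run.
            if (_queueManager != null && _queueManager.IsConnected)
                return _queueManager;

            // First run, or the connection has gone bad: drop it and reconnect.
            try { _queueManager?.Disconnect(); } catch (MQException) { /* already broken */ }
            _queueManager = new MQQueueManager("", _mqProperties);
            return _queueManager;
        }
    }
}
Each GetMessasgesByQueue call would then ask the holder for the queue manager instead of constructing a new one, and the pod's shutdown path would disconnect it once.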
From various examples, I have set up my named pipe loop like so:
NamedPipeServerStream pipeServer;
private void initPipeServer(string pipeName)
{
Task.Factory.StartNew(async () =>
{
pipeServer = new NamedPipeServerStream(pipeName, PipeDirection.InOut);
var threadId = Thread.CurrentThread.ManagedThreadId;
pipeServer.WaitForConnection();
onNotify("Client connected on thread[{0}]", threadId);
try
{
var pipeReader = new StreamReader(pipeServer);
var pipeWriter = new StreamWriter(pipeServer);
// how do I call and get a response from outside this loop?
// var result = await pipeReader.ReadLineAsync();
// await pipeWriter.WriteLineAsync("stuff");
// LOOP HERE SOMEHOW WAITING FOR ASYNC MESSAGES AND RETURNING RESPONSE
// while (nobody quits)
// {
//     let communication happen; when a hotkey is pressed in Host, it
//     will use the named pipe to request and receive information from the client
// }
pipeServer.Disconnect();
}
catch (IOException ex)
{
onNotify(ex);
}
finally
{
pipeServer.Close();
initPipeServer(pipeName);
}
}, TaskCreationOptions.LongRunning);
}
but how would I send and receive a response from outside the loop instead of just having a predetermined exchange?
Update. I think I'm looking for something like Async two-way communication with Windows Named Pipes (.Net) except I'm not using WCF and I'm not on a server. This is a WPF application.
Update 2. Ideal flow:
Host sets up named pipe server
Client connects to pipe server
When hotkey is pressed in Host, request is sent to client for information
Client returns information to Host
Proceed until either Host or Client quits; if Client quits, Host resets the pipe server and waits for a new connection
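One possible shape for that flow, offered only as a sketch under the assumptions above (the PipeHost class, the RequestFromClientAsync method, and the line-based request/response protocol are inventions for illustration, not an established API):
using System.IO;
using System.IO.Pipes;
using System.Threading;
using System.Threading.Tasks;

public class PipeHost
{
    private readonly string _pipeName;
    private readonly SemaphoreSlim _gate = new SemaphoreSlim(1, 1); // serializes hotkey requests
    private StreamReader _reader;
    private StreamWriter _writer;

    public PipeHost(string pipeName) => _pipeName = pipeName;

    // Waits for a client and keeps the reader/writer around for later requests.
    public async Task WaitForClientAsync()
    {
        var server = new NamedPipeServerStream(_pipeName, PipeDirection.InOut, 1,
            PipeTransmissionMode.Byte, PipeOptions.Asynchronous);
        await server.WaitForConnectionAsync();
        _reader = new StreamReader(server);
        _writer = new StreamWriter(server) { AutoFlush = true };
    }

    // Called from the hotkey handler: one request line out, one response line back.
    public async Task<string> RequestFromClientAsync(string request)
    {
        await _gate.WaitAsync();
        try
        {
            await _writer.WriteLineAsync(request);
            return await _reader.ReadLineAsync(); // null means the client went away
        }
        finally
        {
            _gate.Release();
        }
    }
}
If ReadLineAsync returns null (the client quit), dispose the streams and call WaitForClientAsync again, which matches the last step of the flow above.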
For some reason this started happening; it didn't yesterday. My client hangs in channel.Dispose after channel.BasicPublish returns. My connection is not bad, i.e. it is in the open state, and the app is not being shut down as suggested in https://groups.google.com/forum/?fromgroups=#!topic/rabbitmq-discuss/5nzeEqI5qxw. Both ways behave the same:
using (var channel = _connection.CreateModel()) {
//use channel here
}
and
var channel = _connection.CreateModel()
//use channel here
channel.Dispose();
They have bug 25255 on this issue - link.
For now, try using a timeout around the dispose call:
private void DisconnectWithTimeout(IConnection connection, int timeoutMillis)
{
var task = Task.Run(() => connection.Dispose());
if (!task.Wait(timeoutMillis))
{
//timeout
throw new TimeoutException("Timeout on connection.Dispose()");
}
}
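Since the hang described in the question is in channel.Dispose rather than connection.Dispose, the same pattern can be applied to the channel as well; this is only a sketch with a hypothetical helper name:
// Hypothetical counterpart for the channel: run Dispose on the thread pool and
// surface the problem if it hasn't returned within the timeout.
private void DisposeChannelWithTimeout(IModel channel, int timeoutMillis)
{
    var task = Task.Run(() => channel.Dispose());
    if (!task.Wait(timeoutMillis))
    {
        throw new TimeoutException("Timeout on channel.Dispose()");
    }
}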
Do you have any pointers on how to determine when a subscription problem has occurred so I can reconnect?
My service uses RabbitMQ.Client.MessagePatterns.Subscription for its subscription. After some time, my client silently stops receiving messages. I suspect network issues, as our VPN connection is not the most reliable.
I've read through the docs for a while looking for a way to find out when this subscription might be broken due to a network issue, without much luck. I've tried checking that the connection and channel are still open, but they always report that they are open.
The messages it does process work quite well and are acknowledged back to the queue so I don't think it's an issue with the "ack".
I'm sure I must be just missing something simple, but I haven't yet found it.
public void Run(string brokerUri, Action<byte[]> handler)
{
log.Debug("Connecting to broker: {0}".Fill(brokerUri));
ConnectionFactory factory = new ConnectionFactory { Uri = brokerUri };
using (IConnection connection = factory.CreateConnection())
{
using (IModel channel = connection.CreateModel())
{
channel.QueueDeclare(queueName, true, false, false, null);
using (Subscription subscription = new Subscription(channel, queueName, false))
{
while (!Cancelled)
{
BasicDeliverEventArgs args;
if (!channel.IsOpen)
{
log.Error("The channel is no longer open, but we are still trying to process messages.");
throw new InvalidOperationException("Channel is closed.");
}
else if (!connection.IsOpen)
{
log.Error("The connection is no longer open, but we are still trying to process message.");
throw new InvalidOperationException("Connection is closed.");
}
bool gotMessage = subscription.Next(250, out args);
if (gotMessage)
{
log.Debug("Received message");
try
{
handler(args.Body);
}
catch (Exception e)
{
log.Debug("Exception caught while processing message. Will be bubbled up.", e);
throw;
}
log.Debug("Acknowledging message completion");
subscription.Ack(args);
}
}
}
}
}
}
UPDATE:
I simulated a network failure by running the server in a virtual machine, and I do get an exception (RabbitMQ.Client.Exceptions.OperationInterruptedException: The AMQP operation was interrupted) when I break the connection for long enough, so perhaps it isn't a network issue. Now I don't know what it could be, but it fails after just a couple of hours of running.
EDIT: Since I'm still getting upvotes on this, I should point out that the .NET RabbitMQ client now has this functionality built in: https://www.rabbitmq.com/dotnet-api-guide.html#connection-recovery
Ideally, you should be able to use this and avoid manually implementing reconnection logic.
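Enabling it is a matter of a couple of ConnectionFactory properties; shown here as a sketch, since the exact property types (for example whether Uri takes a string or a System.Uri) and defaults vary between client versions:
ConnectionFactory factory = new ConnectionFactory
{
    Uri = brokerUri,
    AutomaticRecoveryEnabled = true,                    // re-establish the connection after a drop
    TopologyRecoveryEnabled = true,                     // re-declare queues, bindings and consumers
    NetworkRecoveryInterval = TimeSpan.FromSeconds(10), // wait between recovery attempts
};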
I recently had to implement nearly the same thing. From what I can tell, most of the available information on RabbitMQ assumes that either your network is very reliable or that you run a RabbitMQ broker on the same machine as any client sending or receiving messages, allowing Rabbit to deal with any connection issues.
It's really not that hard to set up the Rabbit client to be robust against dropped connections, but there are a few idiosyncrasies that you need to deal with.
The first thing you need to do is turn on the heartbeat:
ConnectionFactory factory = new ConnectionFactory()
{
Uri = brokerUri,
RequestedHeartbeat = 30,
};
Setting the "RequestedHeartbeat" to 30 will make the client check every 30 seconds if the connection is still alive. Without this turned on, the message subscriber will sit there happily waiting for another message to come in without a clue that its connection has gone bad.
Turning the heartbeat on also makes the server check to see if the connection is still up, which can be very important. If a connection goes bad after a message has been picked up by the subscriber but before it's been acknowledged, the server just assumes that the client is taking a long time, and the message gets "stuck" on the dead connection until it gets closed. With the heartbeat turned on, the server will recognize when the connection goes bad and close it, putting the message back in the queue so another subscriber can handle it. Without the heartbeat, I've had to go in manually and close the connection in the Rabbit management UI so that the stuck message can get passed to a subscriber.
Second, you will need to handle OperationInterruptedException. As you noticed, this is usually the exception the Rabbit client will throw when it notices the connection has been interrupted. If IModel.QueueDeclare() is called when the connection has been interrupted, this is the exception you will get. Handle this exception by disposing of your subscription, channel, and connection and creating new ones.
Finally, you will have to handle what your consumer does when trying to consume messages from a closed connection. Unfortunately, each different way of consuming messages from a queue in the Rabbit client seems to react differently. QueueingBasicConsumer throws EndOfStreamException if you call QueueingBasicConsumer.Queue.Dequeue on a closed connection. EventingBasicConsumer does nothing, since it's just waiting for a message. From what I can tell from trying it, the Subscription class you're using seems to return true from a call to Subscription.Next, but the value of args is null. Once again, handle this by disposing of your connection, channel, and subscription and recreating them.
The value of connection.IsOpen will be updated to False when the connection fails with the heartbeat on, so you can check that if you would like. However, since the heartbeat runs on a separate thread, you will still need to handle the case where the connection is open when you check it, but closes before subscription.Next() is called.
One final thing to watch out for is IConnection.Dispose(). This call will throw an EndOfStreamException if you call Dispose after the connection has been closed. This seems like a bug to me, and I don't like not calling Dispose on an IDisposable object, so I call it and swallow the exception.
Putting that all together in a quick and dirty example:
public bool Cancelled { get; set; }
IConnection _connection = null;
IModel _channel = null;
Subscription _subscription = null;
public void Run(string brokerUri, string queueName, Action<byte[]> handler)
{
ConnectionFactory factory = new ConnectionFactory()
{
Uri = brokerUri,
RequestedHeartbeat = 30,
};
while (!Cancelled)
{
try
{
if(_subscription == null)
{
try
{
_connection = factory.CreateConnection();
}
catch(BrokerUnreachableException)
{
//You probably want to log the error and cancel after N tries,
//otherwise start the loop over to try to connect again after a second or so.
continue;
}
_channel = _connection.CreateModel();
_channel.QueueDeclare(queueName, true, false, false, null);
_subscription = new Subscription(_channel, queueName, false);
}
BasicDeliverEventArgs args;
bool gotMessage = _subscription.Next(250, out args);
if (gotMessage)
{
if(args == null)
{
//This means the connection is closed.
DisposeAllConnectionObjects();
continue;
}
handler(args.Body);
_subscription.Ack(args);
}
}
catch(OperationInterruptedException ex)
{
DisposeAllConnectionObjects();
}
}
DisposeAllConnectionObjects();
}
private void DisposeAllConnectionObjects()
{
if(_subscription != null)
{
//IDisposable is implemented explicitly for some reason.
((IDisposable)_subscription).Dispose();
_subscription = null;
}
if(_channel != null)
{
_channel.Dispose();
_channel = null;
}
if(_connection != null)
{
try
{
_connection.Dispose();
}
catch(EndOfStreamException)
{
}
_connection = null;
}
}