No error when acking a message fails - C#

Sometimes, for about 1 in 10,000 RabbitMQ acks, I have a problem acking properly.
In the failure scenario, when I ack message A I get no exception; it seems like everything went OK. But when I enqueue the next message B I get System.IO.EndOfStreamException: SharedQueue. The stack trace says the exception was thrown in RabbitMQ.Util.SharedQueue.EnsureIsOpen(), which is a private method.
The problem is that message A is not acked! I got no exception when acking, but it is still not acked, so there is a small chance that two applications can get the same message. Is there anything I can do about it?
I already tried to change some settings like:
AutomaticRecoveryEnabled = true;
RequestedHeartbeat = 60;
NetworkRecoveryInterval = TimeSpan.FromSeconds(10);
And I wrote a method to test the connection before every ack:
public bool HasFullConnection()
{
    if (!HasServerConnection(Config.ConnectionConfig.HostAddress)) return false;
    if (!RabbitMQConnection.IsOpen) return false;
    if (!RabbitMQClient.IsOpen) return false;
    return true;
}
Unfortunately, the problem still exists.

So, there are a couple of things going on here. First, as you've probably found out by now, the C# client library doesn't handle off-nominal situations very well. Second, and more importantly, your application design requires 100% reliability on acking, which it should not.
A failure rate of 1 in 10,000 (or 0.01%) is reasonably good. If you're only seeing ack failures on 0.01% of your messages, I would consider that to be an acceptable level of failure. Acknowledgements in RabbitMQ are delivered to the broker asynchronously, and are based on the channel and consumer that originally received the message. Many things can happen between the time a message was received and when it was actually acknowledged. Thus, if anything happens which disrupts the consumer, channel, broker, or connection, the message is assumed to have failed and is re-delivered.
This is known as at-least-once delivery. The alternative is at-most-once delivery, which does not use acknowledgements at all. So, assuming you're using at-least-once delivery, your system should be designed to handle redelivered messages, for example by making processing idempotent.
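To illustrate what handling redelivery can look like, here is a minimal sketch of an idempotent consumer. It assumes the publisher sets a unique MessageId, and ProcessMessage stands in for your own handler; neither comes from the question.
using System.Collections.Concurrent;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

// Rough sketch, not the asker's code: deduplicate on MessageId so that a
// redelivered message (e.g. after a lost ack) does no harm.
var processedIds = new ConcurrentDictionary<string, bool>();

var factory = new ConnectionFactory { HostName = "localhost" };
using var connection = factory.CreateConnection();
using var channel = connection.CreateModel();

var consumer = new EventingBasicConsumer(channel);
consumer.Received += (_, ea) =>
{
    var messageId = ea.BasicProperties?.MessageId;

    // Only do the real work the first time we see this MessageId.
    if (messageId == null || processedIds.TryAdd(messageId, true))
    {
        ProcessMessage(ea.Body.ToArray()); // hypothetical handler
    }

    // Ack after processing; if this ack is lost, the worst case is a redelivery
    // that the check above turns into a no-op.
    channel.BasicAck(ea.DeliveryTag, multiple: false);
};

channel.BasicConsume(queue: "my-queue", autoAck: false, consumer: consumer);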

Related

Azure Service Bus - MaxConcurrentCalls=1 - The lock supplied is invalid. Either the lock expired

I am using Azure Service Bus and I have the code below (C#, .NET Core 3.1). I am constantly getting the error "The lock supplied is invalid. Either the lock expired, or the message has already been removed from the queue, or was received by a different receiver instance." when I call CompleteAsync.
As you can see in the code, I have ReceiveMode.PeekLock, AutoComplete = false and MaxAutoRenewDuration set to 5 minutes. The code that handles the message completes in less than 1 second, and I still get that error every single time.
What drove me crazy is that after hours of reading posts, rewriting my code and a lot of trial and error, I decided to increase MaxConcurrentCalls from 1 to 2 and magically the error disappeared.
Does anybody know what is going on here?
public void OpenQueue(string queueName)
{
    var messageHandlerOptions = new MessageHandlerOptions(exceptionReceivedEventArgs =>
    {
        Log.Error($"Message handler encountered an exception {exceptionReceivedEventArgs.Exception}.");
        return Task.CompletedTask;
    });

    messageHandlerOptions.MaxConcurrentCalls = 1;
    messageHandlerOptions.AutoComplete = false;
    messageHandlerOptions.MaxAutoRenewDuration = TimeSpan.FromSeconds(300);

    messageReceiver = queueManagers.OpenReceiver(queueName, ReceiveMode.PeekLock);
    messageReceiver.RegisterMessageHandler(async (message, token) =>
    {
        if (await ProcessMessage(message)) // really quick operation, less than 1 second
        {
            await messageReceiver.CompleteAsync(message.SystemProperties.LockToken);
        }
        else
        {
            await messageReceiver.AbandonAsync(message.SystemProperties.LockToken);
        }
    }, messageHandlerOptions);
}
I decided to increase the MaxConcurrentCalls from 1 to 2 and magically the error disappeared.
Concurrency and lock duration are not the only variables in the equation. This sounds like a prefetch issue. If prefetch is enabled, more messages are fetched than are being processed, to save on latency and round trips. If the prefetch is too aggressive, prefetched messages sit waiting on the client while their locks tick down, and even though the processing itself is short, the combined time spent waiting plus processing can exceed the lock duration.
I would suggest that you:
Increase MaxLockDuration on the queue
Validate the prefetch count
Regarding MaxLockDuration vs MaxAutoRenewDuration, these two are tricky: the first is guaranteed by the broker, while the second is not; it is only a best effort by the client.
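To make those two suggestions concrete, here is a hedged sketch using the same Microsoft.Azure.ServiceBus package as in the question; the connection string, queue name, and the specific values are placeholders, not verified settings for your system.
using System;
using Microsoft.Azure.ServiceBus;
using Microsoft.Azure.ServiceBus.Core;
using Microsoft.Azure.ServiceBus.Management;

var connectionString = "<service-bus-connection-string>"; // placeholder
var queueName = "<queue-name>";                           // placeholder

// 1) Keep prefetch modest (or zero) so prefetched messages don't sit on the
//    client while their locks tick down.
var receiver = new MessageReceiver(connectionString, queueName, ReceiveMode.PeekLock)
{
    PrefetchCount = 0 // no prefetch: a message is locked only when it is actually received
};

// 2) Raise the lock duration on the queue itself; unlike MaxAutoRenewDuration,
//    this one is enforced by the broker.
var management = new ManagementClient(connectionString);
QueueDescription queue = await management.GetQueueAsync(queueName);
queue.LockDuration = TimeSpan.FromMinutes(5);
await management.UpdateQueueAsync(queue);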
I'm writing the solution for my problem as it may help others.
It turns out the root cause of the problem was a quite basic mistake, but the error got me really confused.
The method OpenQueue was called more than once on the same class instance (a multiple-queues scenario), which was a mistake. The behavior was quite weird: it looks like queueManagers registered all queues as expected, but the lock token got overwritten, causing it to always be invalid.
When I wrote:
I decided to increase the MaxConcurrentCalls from 1 to 2 and magically the error disappeared.
that statement later proved to be incorrect. When I enabled multiple queues, it failed miserably.
The block of code I posted here is actually working; what was around it was broken. I was trying to save some time and ended up writing bad code. I fixed my design to manage things properly and everything is now running smoothly.

After using the InternetConnect() API in wininet how can I tell if I'm still connected?

I use the InternetConnect() method from the WinINet APIs. I connect to my FTP server just fine with no issues. After I connect, I wait about 1 minute and the server disconnects me because of inactivity, as expected. I then try to send a file, but I'm no longer connected.
Is there a way to "check" the FTP connection to see if I'm still connected? Or is there some type of way for me to attach an event to tell me when I get disconnected?
I haven't used wininet for FTP, and I use the classes, not the global functions directly. But I suspect that CFtpConnection behaves the same way as CHttpConnection in this respect. Anyway, you might learn something from what I have discovered about the latter.
CHttpConnection seems to be a high, abstract level of connection. When I started out I expected its member functions to throw exceptions once the server closed the underlying socket (for timeout). I now know better, or at least believe otherwise: NORMAL closing of the socket does NOT cause exceptions to be thrown at this high level of classy wininet. You might suspect as much from inspecting the wininet error codes: there is no code corresponding to the server having closed the connection.
I experimented with this and found that the server closing the socket (for timeout) is considered normal and does not cause an exception to be thrown. You can go ahead and use CHttpConnection without worrying about this. It will simply reconnect if needed without alerting you. So once you have called GetHttpConnection and got your CHttpConnection object, it will normally last forever!
The exceptions that might be thrown, ERROR_INTERNET_CONNECTION_ABORTED and ERROR_INTERNET_CONNECTION_RESET, are caused by abnormal conditions, for example a proxy server crashing or somebody accidentally pulling the power plug on your modem. The server closing the socket for timeout is considered NORMAL and is transparent to the user of the wininet classes.
So the tentative conclusion is that you don't have to worry about the connection being closed by the server. If that happens, CHttpConnection will reconnect backstage and you won't be bothered. You can pretend that the connection always stays open; it seems that way to the user of the wininet classes.
Consider the following experiment: a connection is opened and then a request is sent once a minute. The function returns once an exception is thrown; otherwise it loops forever. I tried it on two different web sites, and an exception is NEVER thrown, despite a whole minute of inactivity between requests.
int httpclient::test(string host)
{
    int flags = INTERNET_FLAG_RELOAD;
    int port = INTERNET_DEFAULT_HTTP_PORT;
    CHttpConnection *connection = session.GetHttpConnection(host.c_str(), flags, port);
    int secs = 0;

    while (true)
    {
        CHttpFile *fil;

        // Open a HEAD request on the (possibly long-idle) connection.
        try
        {
            fil = connection->OpenRequest(CHttpConnection::HTTP_VERB_HEAD, "index.htm",
                                          0, 1, 0, "HTTP/1.1", flags);
        }
        catch (CInternetException *exc)
        {
            connection->Close();
            int feil = exc->m_dwError;
            exc->Delete();
            return -feil;
        }

        fil->AddRequestHeaders("Connection: Keep-Alive");

        // Send the request; if the connection had really died, this is where
        // an exception would be expected.
        try
        {
            fil->SendRequest();
        }
        catch (CInternetException *exc)
        {
            connection->Close();
            int feil = exc->m_dwError;
            exc->Delete();
            return feil;
        }

        fil->Close();
        delete fil;

        // Wait a full minute between requests so the server has time to drop us.
        Sleep(60 * 1000);
        secs += 60;
        printf("%u seconds passed\n", secs);
    }

    return 0; // never reached
}
Take all this with a grain of salt. wininet is poorly documented and all I know is what experiments have taught me.

Odd Behavior of Azure Service Bus ReceiveBatch()

I'm currently working with an Azure Service Bus topic and running into an issue receiving my messages using the ReceiveBatch method. The issue is that the results I actually get are not the expected ones. Here is the basic code setup; the use cases are below:
SubscriptionClient client = SubscriptionClient.CreateFromConnectionString(connectionString, convoTopic, subName);

IEnumerable<BrokeredMessage> messageList = client.ReceiveBatch(100);
foreach (BrokeredMessage message in messageList)
{
    try
    {
        Console.WriteLine(message.GetBody<string>() + message.MessageId);
        message.Complete();
    }
    catch (Exception ex)
    {
        message.Abandon();
    }
}

client.Close();
MessageBox.Show("Done");
1. Using the above code, if I send 4 messages and then poll, I get the first message on the first run through and the other 3 on the second. I'm expecting to get all 4 at the same time. It seems to always return a single message on the first poll and the rest on subsequent polls (same result with 3 and 5, where I get 1 message on the first try and the remaining n-1 of n messages on the second).
2. If I have 0 messages to receive, the operation takes roughly 30-60 seconds to return the messageList (with a count of 0). I need this to return instantly.
3. If I change the code to IEnumerable<BrokeredMessage> messageList = client.ReceiveBatch(100, new TimeSpan(0,0,0)); then issue #2 goes away, but issue #1 still persists and I have to call the code twice to get all the messages.
I'm assuming that issue #2 is because of a default timeout value, which I override in #3 (though I find it confusing that if a message is there it responds immediately without waiting the default time). I am not sure why I never receive the full number of messages in a single ReceiveBatch, however.
The way I got ReceiveBatch() to work properly was to do two things.
Disable Partitioning in the Topic (I had to make a new topic for this because you can't toggle that after creation)
Enable Batching on each subscription created like so:
SubscriptionDescription sd = new SubscriptionDescription(topicName, orgSubName);
sd.EnableBatchedOperations = true;
After I did those two things, I was able to get the topics to work as intended using IEnumerable<BrokeredMessage> messageList = client.ReceiveBatch(100, new TimeSpan(0,0,0));
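For completeness, here is a hedged sketch of how that description might actually be applied, assuming the classic WindowsAzure.ServiceBus SDK used elsewhere in this question; the connection string is a placeholder.
using Microsoft.ServiceBus;
using Microsoft.ServiceBus.Messaging;

// Create (or update) the subscription with batched operations enabled.
var namespaceManager = NamespaceManager.CreateFromConnectionString(connectionString);

var sd = new SubscriptionDescription(topicName, orgSubName)
{
    EnableBatchedOperations = true
};

if (!namespaceManager.SubscriptionExists(topicName, orgSubName))
    namespaceManager.CreateSubscription(sd);
else
    namespaceManager.UpdateSubscription(sd);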
I'm having a similar problem with an ASB Queue. I discovered that I could mitigate it somewhat by increasing the PrefetchCount on the client prior to receiving the batch:
SubscriptionClient client = SubscriptionClient.CreateFromConnectionString(connectionString, convoTopic, subName);
client.PrefetchCount = 100;
IEnumerable<BrokeredMessage> messageList = client.ReceiveBatch(100);
From the Azure Service Bus Best Practices for Performance Improvements Using Service Bus Brokered Messaging:
Prefetching enables the queue or subscription client to load additional messages from the service when it performs a receive operation.
...
When using the default lock expiration of 60 seconds, a good value for
SubscriptionClient.PrefetchCount is 20 times the maximum processing rates of all receivers of the factory. For example, a factory creates 3 receivers, and each receiver can process up to 10 messages per second. The prefetch count should not exceed 20*3*10 = 600.
...
Prefetching messages increases the overall throughput for a queue or subscription because it reduces the overall number of message operations, or round trips. Fetching the first message, however, will take longer (due to the increased message size). Receiving prefetched messages will be faster because these messages have already been downloaded by the client.
Just a few more pieces to the puzzle. I still couldn't get it to work even after enabling batching and disabling partitioning; I still had to do two ReceiveBatch calls. I did find, however:
Restarting the Service Bus services (I am using Service Bus for Windows Server) cleared up the issue for me.
Doing a single ReceiveBatch and taking no action (letting the message locks expire) and then doing another ReceiveBatch caused all of the messages to come through at the same time. (Doing an initial ReceiveBatch and calling Abandon on all of the messages didn't cause that behavior.)
So it appears to be some sort of corruption/bug in Service Bus's in-memory cache.

Handling poison messages in MSMQ

My current setup includes a Windows service which picks up a message from the local queue, extracts the information and puts it into my SQL database. According to my design:
The service picks up the message from the queue (I am using Peek() here).
It sends the message to the database.
If for some reason I get an exception while saving to the database, the message goes back into the queue, which to me is reliable.
I am logging the errors so that a user can know what the issue is and fix it.
Exception example: if the DB connection is lost while saving the messages to the database, the messages are not lost because they are still in the queue. I don't commit until I get an acknowledgement from the DB that the message was inserted, so a user can see the logs, restore the DB connection, and everything goes back to normal without losing any messages from the queue.
But looking at another scenario: the messages I get in the queue come from a 3rd party and follow a standard schema. The schema stays the same and there is no change in that. But I have seen cases where I get format exceptions, and since the message is not committed it goes back to the queue. At that point the message becomes a bottleneck, because the same message is picked up again and again: every time the service picks it up it gets the same exception. This loops infinitely unless that message is removed or put at the end of the queue.
Looking at removing the message: if I go only by the format exception, then I might be wrong, since I might encounter other exceptions in the future.
Is there a way I can put these messages back at the end of the queue instead of at the beginning?
I need some advice on how to proceed further.
Note: the queue is transactional.
As far as I'm aware, MSMQ doesn't automatically dump messages to fail queues. Either way you handle it, it's only a few lines of code (Bill, Michael, and I recommend a fail queue). As for the fail queue itself, you could simply create one named .\private$\queuename_fail.
Surviving poison messages in MSMQ is a decent article on this exact topic, with an example app and source code at the end.
private readonly MessageQueue _failQueue;
private readonly MessageQueue _messageQueue;

/* Other code here (cursor, peek action, run method, initialization etc.) */

private void dumpToFailQueue(Message message)
{
    var oldId = message.Id;
    _failQueue.Send(message, MessageQueueTransactionType.Single);

    // Remove the poisoned message from the original queue
    _messageQueue.ReceiveById(oldId);
}

private void moveToEnd(Message message)
{
    var oldId = message.Id;
    _messageQueue.Send(message, MessageQueueTransactionType.Single);

    // Remove the original copy, leaving the re-sent one at the back of the queue
    _messageQueue.ReceiveById(oldId);
}
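Here is a hedged sketch of how those helpers might be wired into the peek-based service loop described in the question; SaveToDatabase is a placeholder for the actual persistence code, not something from the original post.
// Peek first so the message stays in the queue until we know what to do with it.
var message = _messageQueue.Peek();

try
{
    SaveToDatabase(message); // hypothetical: insert into SQL and wait for the DB to confirm

    // Only now remove the message from the queue.
    _messageQueue.ReceiveById(message.Id, MessageQueueTransactionType.Single);
}
catch (FormatException)
{
    // The message can never be processed: get it out of the way.
    dumpToFailQueue(message);
}
catch (Exception)
{
    // Transient problem (e.g. the DB connection dropped): retry later by
    // pushing the message to the back of the queue.
    moveToEnd(message);
}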

Serial processing of a certain message type in Rebus

We have a Rebus message handler that talks to a third party webservice. Due to reasons beyond our immediate control, this WCF service frequently throws an exception because it encountered a database deadlock in its own database. Rebus will then try to process this message five times, which in most cases means that one of those five times will be lucky and not get a deadlock. But it frequently happens that a message does get deadlock after deadlock and ends up in our error queue.
Besides fixing the source of the deadlocks, which would be a long-term goal, I can think of two options:
Keep trying with only this particular message type until it succeeds. Preferably I would be able to set a timeout, so "if five deadlocks then try again in 5 minutes" rather than choke the process up even more by trying continuously. I already do a Thread.Sleep(random) to spread the messages somewhat, but it will still give up after five tries.
Send this particular message type to a different queue that has only one worker that processes the message, so that this happens serially rather than in parallel. Our current configuration uses 8 worker threads, but this just makes the deadlock situation worse as the webservice now gets called concurrently and the messages get in each other's way.
Option #2 has my preference, but I'm not sure if this is possible. Our configuration on the receiving side currently looks like this:
var adapter = new Rebus.Ninject.NinjectContainerAdapter(this.Kernel);
var bus = Rebus.Configuration.Configure.With(adapter)
    .Logging(x => x.Log4Net())
    .Transport(t => t.UseMsmqAndGetInputQueueNameFromAppConfig())
    .MessageOwnership(d => d.FromRebusConfigurationSection())
    .CreateBus().Start();
And the .config for the receiving side:
<rebus inputQueue="app.msg.input" errorQueue="app.msg.error" workers="8">
<endpoints>
</endpoints>
</rebus>
From what I can tell from the config, it's only possible to set one input queue to 'listen' to. I can't really find a way to do this via the fluent mapping API either; that seems to take only one input and one error queue as well:
.Transport(t =>t.UseMsmq("input", "error"))
Basically, what I'm looking for is something along the lines of:
<rebus workers="8">
<input name="app.msg.input" error="app.msg.error" />
<input name="another.input.queue" error="app.msg.error" />
</rebus>
Any tips on how to handle my requirements?
I suggest you make use of a saga and Rebus' timeout service to implement a retry strategy that fits your needs. This way, in your Rebus-enabled web service facade, you could do something like this:
public void Handle(TryMakeWebServiceCall message)
{
    try
    {
        var result = client.MakeWebServiceCall(whatever);
        bus.Reply(new ResponseWithTheResult { ... });
    }
    catch (Exception e)
    {
        Data.FailedAttempts++;

        if (Data.FailedAttempts < 10)
        {
            bus.Defer(TimeSpan.FromSeconds(1), message);
            return;
        }

        // oh no! we failed 10 times... this is probably where we'd
        // go and do something like this:
        emailService.NotifyAdministrator("Something went wrong!");
    }
}
where Data is the saga data that is made magically available to you and persisted between calls.
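For reference, a hedged sketch of what that saga data might look like; the class name is made up, and Id and Revision are the members Rebus' ISagaData requires.
using System;
using Rebus;

// Hypothetical saga data class; FailedAttempts is the state persisted between calls.
class WebServiceCallSagaData : ISagaData
{
    public Guid Id { get; set; }
    public int Revision { get; set; }

    public int FailedAttempts { get; set; }
}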
For inspiration on how to create a saga, check out the wiki page on coordinating stuff that happens over time where you can see an example on how a service might have some state (i.e. number of failed attempts in your case) stored locally that is made available between handling messages.
When the time comes to make bus.Defer work, you have two options: 1) use an external timeout service (I usually have one installed on each server), or 2) just use "yourself" as a timeout service.
At configuration time, you go
Configure.With(...)
.(...)
.Timeouts(t => // configure it here)
where you can either StoreInMemory, StoreInSqlServer, StoreInMongoDb, StoreInRavenDb, or UseExternalTimeoutManager.
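Put together with the receiving-side configuration from the question, option (2) might look something like this hedged sketch; the StoreInSqlServer arguments (connection string name and table name) are assumptions, not verified.
var adapter = new Rebus.Ninject.NinjectContainerAdapter(this.Kernel);
var bus = Rebus.Configuration.Configure.With(adapter)
    .Logging(x => x.Log4Net())
    .Transport(t => t.UseMsmqAndGetInputQueueNameFromAppConfig())
    .MessageOwnership(d => d.FromRebusConfigurationSection())
    // let this endpoint act as its own timeout manager, persisting due timeouts in SQL Server
    .Timeouts(t => t.StoreInSqlServer("connectionStringOrName", "timeouts"))
    .CreateBus().Start();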
If you choose (1), you need to check out the Rebus code and build Rebus.Timeout yourself - it's basically just a configurable, Topshelf-enabled console application that has a Rebus endpoint inside.
Please let me know if you need more help making this work - bus.Defer is where your system becomes awesome, and will be capable of overcoming all of the little glitches that make everyone else's go down :)
