I added a new worker role to our project to handle making outgoing phone calls. It draws the calls to make off a service bus queue.
I made a little console app to throw things onto the queue so that I could check it was working. Unfortunately, it seems like whatever my app puts onto the queue is instantly (or almost instantly?) moved to the dead-letter queue. I took a look at the queue's properties and I can see:
MessageCount: 5
ActiveMessageCount: 0
DeadLetterMessageCount: 5
TransferMessageCount: 0
AvailabilityStatus: Available
EnableDeadLetteringOnMessageExpiration: False
The worker role tries to get stuff off the queue, but all it's pulling is nulls.
The queue settings are the same as our other queue that's working. The object that I'm putting onto the queue is made up of almost all primitives, marked [MessageContract] and [Serializable]. All the members are marked [MessageHeader].
I also tried using the object that we use for our other queue just to see what would happen and that one dead-lettered right away too.
I don't get it. The object is clearly making it to the queue, as the queue size is growing. But it just dead-letters right away, and I don't know what could cause that besides the message timing out.
Further information: I took a look using Service Bus Explorer and it seems that the messages were being dead-lettered because of MaxDeliveryCountExceeded. This seems to mean that if receiving a message fails more than 10 times, it gets moved to the dead-letter queue. So that part's solved, but I put a debug point in the worker role code and there are never any errors happening in there. My
BrokeredMessage message = Client.Receive()
always returns null, so there's not even any time for it to do anything wrong. I suppose something is going wrong in the actual Receive() call?
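For anyone debugging something similar: you can also read the dead-letter reason in code rather than through Service Bus Explorer. A minimal sketch, assuming the Microsoft.ServiceBus.Messaging SDK that BrokeredMessage comes from (connection string and queue name are placeholders):

using System;
using Microsoft.ServiceBus.Messaging;

class DeadLetterInspector
{
    static void Main()
    {
        var connectionString = "Endpoint=sb://..."; // placeholder
        var queuePath = "outgoing-calls";           // placeholder

        // The dead letter queue is addressed as a sub-queue of the main queue.
        string dlqPath = QueueClient.FormatDeadLetterPath(queuePath);
        var dlqClient = QueueClient.CreateFromConnectionString(connectionString, dlqPath);

        BrokeredMessage message;
        while ((message = dlqClient.Receive(TimeSpan.FromSeconds(5))) != null)
        {
            // These properties record why the message was dead-lettered.
            Console.WriteLine("{0}: {1}",
                message.Properties["DeadLetterReason"],
                message.Properties["DeadLetterErrorDescription"]);
            message.Complete(); // removes it from the dead letter queue
        }
    }
}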
Turns out that having a Uri in the MessageContract was enough to make it fail somewhere in the deserialization process. So Client.Receive() tried to get it 10 times, always failing, and then it got dead-lettered.
I thought that Uris were serializable, but there seems to be a problem with putting them through a Service Bus. In any case, in my situation it wasn't a big deal to change the Uri to a string, and everything's fine now.
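For illustration, a minimal sketch of the kind of change that fixed it (the type and property names here are hypothetical):

using System;
using System.ServiceModel;

[MessageContract]
[Serializable]
public class OutgoingCall
{
    [MessageHeader]
    public string PhoneNumber { get; set; }

    // Was: public Uri CallbackUri { get; set; }
    // The Uri member failed during deserialization off the queue, so it is now a string.
    [MessageHeader]
    public string CallbackUri { get; set; }
}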
Related
I have a worker attached to a subscription processing messages. Depending on the message payload, the worker could possibly take days to completely process the message.
I'm a bit confused by the different properties the client can set that control how the Pub/Sub client library automatically extends the deadline, so that the worker can process the message without fearing that it will be redelivered.
Properties from the documentation, which is not very clear:
MinimumAckDeadline, MaximumAckDeadline, DefaultAckDeadline, MinimumAckExtensionWindow, DefaultAckExtensionWindow, MinimumLeaseExtensionDelay, and DefaultMaxTotalAckExtension
I believe I want to set DefaultMaxTotalAckExtension to a large value (several days) to allow my subscriber to continue working on the message without getting some kind of time out.
But I think I also want to modify the AckDeadline so that Pub/Sub knows the client is still alive. I'm not sure which one I would want to modify: MinimumAckDeadline, MaximumAckDeadline, or DefaultAckDeadline.
Aside from those properties, I don't know if I need to set MinimumAckExtensionWindow, DefaultAckExtensionWindow, or MinimumLeaseExtensionDelay.
All of the properties you mentioned are default/limit properties of the SubscriberClient class itself. Note that they are static and only have getters, not setters. What you want to set is MaxTotalAckExtension, which controls the maximum amount of time a message's lease will be extended.
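For example, a minimal sketch, assuming the Google.Cloud.PubSub.V1 .NET client (project and subscription names are placeholders):

using System;
using System.Threading.Tasks;
using Google.Cloud.PubSub.V1;

class Program
{
    static async Task Main()
    {
        var subscriptionName = SubscriptionName.FromProjectSubscription("my-project", "my-subscription");

        // MaxTotalAckExtension is an instance setting, unlike the static
        // Default*/Minimum*/Maximum* properties, which are read-only limits.
        var settings = new SubscriberClient.Settings
        {
            MaxTotalAckExtension = TimeSpan.FromHours(12),
        };

        var subscriber = await SubscriberClient.CreateAsync(subscriptionName, settings: settings);
        await subscriber.StartAsync((msg, ct) =>
        {
            // Long-running work happens here; the client keeps extending the
            // lease in the background until MaxTotalAckExtension is reached.
            return Task.FromResult(SubscriberClient.Reply.Ack);
        });
    }
}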
However, taking days to process a message is considered an anti-pattern for Cloud Pub/Sub and will very likely result in duplicate deliveries. If you are going to take that long to process a message, you probably need to look at other options like persisting it locally (in a file or in a database), acking it, and then processing it. At that point, you may consider just writing directly to a database instead of to Pub/Sub and scanning for rows that need to be processed by your subscribers.
Had a quick question regarding subscriber usage in ActiveMQ 5.15.3 using the Apache.NMS.ActiveMQ libraries for .NET. Looking at two different durable subscribers listening to the same topics that publish XML data, they more or less get and process all the same messages between the two, which is the expected behavior in this particular case. But occasionally, looking at internal logs, one consumer doesn't process one of the messages while the other consumer does, and vice versa.
Of note, both of them are using FailoverTransport connections over TCP.
We initially thought it to be a bug in the code of one of the consumers, but we see the behavior on both of them, so I'm more inclined to believe this is a configuration error. Also, there are no visible ERROR logs or dropped connectivity.
Could there be a prefetch or timeout problem here? Since I see no obvious errors (messages still get processed fine for the most part on both of them), I could use some pointers or material to scour through on what could be causing this particular behavior.
I tried looking at internal logs for my internal application, and I also tried debugging this internally through a small console app to no real avail, since the functionality worked as expected on a small sample size.
I expected that both subscribers would always process the same set of messages.
Most of the messages get consumed correctly, but at seemingly random points either subscriber will miss/skip a message.
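On the prefetch question raised above, one thing worth ruling out is the durable consumer's prefetch configuration. A minimal sketch, assuming the Apache.NMS.ActiveMQ client (broker URI, client IDs, and topic name are placeholders, and the PrefetchPolicy property may vary by client version):

using Apache.NMS;
using Apache.NMS.ActiveMQ;

class DurableSubscriberSketch
{
    static void Main()
    {
        var factory = new ConnectionFactory("failover:(tcp://broker-host:61616)");

        // A smaller prefetch keeps fewer messages buffered client-side,
        // which makes gaps easier to attribute to the broker vs. the consumer.
        factory.PrefetchPolicy.DurableTopicPrefetch = 10;

        using (IConnection connection = factory.CreateConnection())
        {
            connection.ClientId = "subscriber-a"; // must be unique per durable subscriber
            connection.Start();
            using (ISession session = connection.CreateSession(AcknowledgementMode.AutoAcknowledge))
            {
                ITopic topic = session.GetTopic("xml.data.topic");
                IMessageConsumer consumer = session.CreateDurableConsumer(topic, "subscriber-a-sub", null, false);
                consumer.Listener += message => { /* process and log the XML payload */ };
                System.Console.ReadLine(); // keep the process alive
            }
        }
    }
}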
I'm developing a system which will involve a lot of data synchronisation broken down into small tasks. I am adding each small task as a job/message on the Azure Service Bus queue.
I have X number of Worker Roles then checking the queues and processing the data.
I don't expect to have many messages in the queue, because the aim is to process a message, complete it, and then re-add the same message, scheduled for X minutes' time. This gives me a loop to keep processing those tasks.
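For context, the "re-add scheduled for X minutes' time" part relies on scheduled delivery. A minimal sketch of that loop, assuming the Microsoft.ServiceBus.Messaging SDK (names are placeholders):

using System;
using Microsoft.ServiceBus.Messaging;

static class TaskRequeuer
{
    public static void CompleteAndRequeue(QueueClient client, BrokeredMessage message, int delayMinutes)
    {
        // Clone the payload into a fresh message scheduled for later delivery,
        // then complete the original so it leaves the queue.
        BrokeredMessage next = message.Clone();
        next.ScheduledEnqueueTimeUtc = DateTime.UtcNow.AddMinutes(delayMinutes);
        client.Send(next);
        message.Complete();
    }
}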
The great thing about the Azure functionality is that they handle all of the server side stuff for you, but the downside is that it can sometimes be difficult to debug or manipulate the data.
What I want to be able to do is present a list of messages in the queue (which I have done using PeekBatch) in a web interface. I then want to be able to select some/all of the messages and delete them.
I might want to do this if there is a bug in the code and I want to stop messages of a certain type from being processed.
Following on from that, I'll have functionality to re-add messages from the web page too. Perhaps I might want to scale up my worker roles and add messages to perform a task at a faster rate (or slow them down), or re-add messages I have deleted.
So, the question is, how can I actually select a specific message from the queue and then delete it? From what I can see, there is no obvious way to do this and, if it's possible, it will require some kind of workaround. This sounds a bit bizarre to me.
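For reference, the peek side mentioned above looks roughly like this; a sketch using the same Client as the delete code below. PeekBatch neither locks nor removes messages, so the workers are unaffected:

public void ListMessages()
{
    foreach (var msg in Client.PeekBatch(100))
        Console.WriteLine("{0}: {1}", msg.SequenceNumber, msg.Label);
}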
Edit:
I've got something that works, but it really doesn't seem a great solution:
public void DeleteMessages(List<long> messageIds)
{
    foreach (var msg in Client.ReceiveBatch(100))
    {
        if (messageIds.Contains(msg.SequenceNumber))
            msg.Complete(); // Deletes the message
        else
            msg.Abandon();  // Puts it back in the queue
    }
}
This will get less and less efficient the larger the queue grows, but it at least stops all other activity while the delete call is going on, and it deletes the specified messages.
It will also only delete messages which are ready to process. Messages scheduled for the future will be ignored, so I've currently added the ability to add "Sleep" messages to stop processing the queue until my messages are "ready" and I can delete them.
I've been informed by Microsoft that they are currently working on the API to delete specific messages which should be available in a few months. Until then, it's all about workarounds.
June Update:
Still no update from Microsoft on this issue and the above method was less than ideal. I have now modified my code so that:
The object I put into the message has a new property:
Guid? MessageId { get; set; }
Note that it is a nullable Guid, just to be backwards-compatible.
When I want to delete a message, I add my MessageId into a database table "DeletedMessage".
When it comes to processing a message, I look in the DeletedMessage table for a matching Guid, and if one is found I simply Complete() the message without doing the normal processing.
This works well, but is a slight overhead. If you are not dealing with huge numbers of messages it's not a big problem.
Also note that I did this originally by using the SequenceNumber, but (bizarrely) the SequenceNumber changes between peeking and retrieving the message! This prevents the idea from working, unless you use your own Id as above.
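A minimal sketch of that processing-side check (the SyncTask envelope type and the IsDeleted lookup are hypothetical stand-ins for my actual code):

// Inside the worker role's processing loop.
BrokeredMessage message = Client.Receive();
if (message == null) return;

var task = message.GetBody<SyncTask>(); // SyncTask carries the Guid? MessageId
if (task.MessageId.HasValue && deletedMessages.IsDeleted(task.MessageId.Value))
{
    message.Complete(); // tombstoned: remove it without doing the normal processing
    return;
}

ProcessTask(task);
message.Complete();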
You can create your own queue cleaner program that has a message listener that receives and either abandons or completes the message, depending on some criteria of the message. It could be an action in a controller if you foresee the need for cleansing on actual production environments as well.
From my experiments it is not possible to delete an Active message from a message queue.
Deferred messages can be removed by calling Receive(sequenceNumber) and then calling Complete().
Scheduled messages can be removed by calling CancelScheduledMessageAsync(sequenceNumber).
For active messages, you'll have to make sure that at some point they are moved to the dead letter queue.
Messages in the dead letter queue can then later either be deleted or re-submitted.
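A minimal sketch of the two removable cases, assuming the Microsoft.ServiceBus.Messaging QueueClient API:

using System.Threading.Tasks;
using Microsoft.ServiceBus.Messaging;

static class QueueCleanup
{
    // Deferred messages: receive them explicitly by sequence number, then complete.
    public static void DeleteDeferred(QueueClient client, long sequenceNumber)
    {
        BrokeredMessage deferred = client.Receive(sequenceNumber);
        deferred.Complete();
    }

    // Scheduled messages: cancel them before they become active.
    public static Task DeleteScheduled(QueueClient client, long sequenceNumber)
    {
        return client.CancelScheduledMessageAsync(sequenceNumber);
    }
}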
I have a nice fast task scheduling component (a Windows service as it happens, but this is irrelevant); it subscribes to an in-memory queue of things to do.
The queue is populated really fast ... and when I say fast I mean fast ... so fast that I'm experiencing problems with one particular part.
Each item in the queue gets a "category" attached to it and is then passed to a WCF endpoint to be processed, then saved in a remote db.
This is presenting a bit of a problem.
The "queue" can be processed in the millions of items per minute whereas the WCF endpoint will only realistically handle about 1000 to 1200 items per second and many of those are "stacked" in order to wait for a slot to dump them to the db.
My WCF client is configured so that the call is fire-and-forget (deliberate); my problem is that occasionally a timeout occurs when the call is made, and that's when the headaches begin.
The thread just seems to stop after the timeout, not dropping into my catch block, nothing ... it just sits there. What's even more confusing is that this is intermittent: it only happens when the queue is dealing with extreme loads and the WCF endpoint is overtaxed, and even in that scenario it happens only about once a fortnight.
This code is constantly running on the server, round the clock 24/7.
So ... my question ...
How can I identify the edge case that is causing my problem so that I can resolve it?
Some extra info:
The client calling the WCF endpoint seems to throttle itself automatically, because I'm limiting the number of threads making calls and the code hangs about until a call is considered complete (I'm thinking this is an HTTP-level thing, as I'm not asking the service for the result of my method call).
The db is talked to with EF, which seems to never open more than a fixed number of connections to the db (quite a low number too, which is cool), and the WCF endpoint, from call reception back, seems super reliable.
The problem seems to be on the path from the queue processor to the WCF endpoint.
The queue processor has a single instance of my WCF endpoint client which it reuses for all calls ... (is it good practice to rebuild this client per call? Bear in mind the number of calls here.)
Final note:
It's a peculiar "module" of functionality, under heavy load for hours at a time it's stable, but for some reason this odd thing happens resulting in the whole lot just stopping and not recovering. The call is wrapped in a try catch, but seemingly even if the catch is hit (which isn't guaranteed) the code doesn't recover / drop out as expected ... it just hangs.
Any ideas?
Please let me know if there's anything else I can add to help resolve this.
Edit 1:
binding - basicHttpBinding
error handling - no code written other than wrapping the WCF call in a try catch.
My solution appears to be to increase the timeout settings in the client config to allow the server more time to respond.
The net result is that whilst the database is busy saving data (effectively the slowest part of this process), the calling client sits and waits (on all threads, but seemingly not for as long as I would have liked).
This issue seems to be the net result of a lot of multithreaded calls to the WCF endpoint without giving it enough time to respond.
The high load is not continuous; the service usage seems to spike then tail off, and adding to the expected response time allows spikes to be filtered through as they happen.
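For illustration, a minimal sketch of the same timeout change done in code rather than config (the five-minute value is an assumption, and ITaskService is a hypothetical contract; tune to your own spike profile):

using System;
using System.ServiceModel;

static class ClientFactory
{
    public static ITaskService Create()
    {
        var binding = new BasicHttpBinding
        {
            SendTimeout = TimeSpan.FromMinutes(5), // end-to-end time allowed per call
            OpenTimeout = TimeSpan.FromMinutes(1),
            CloseTimeout = TimeSpan.FromMinutes(1),
        };
        var address = new EndpointAddress("http://my-service-host/TaskService.svc"); // placeholder
        return new ChannelFactory<ITaskService>(binding, address).CreateChannel();
    }
}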
A key note:
Way too many calls will result in the server / service treating them as a DoS-type attack, in which case it may simply terminate the connection.
This isn't what I'm getting, but some fine tuning and time may result in this ...
Time for some bigger servers!!!
I have written the network code for my app using Sockets. I've tested the library on a Windows PC and it is very fast. Whether it's many small "packets" (by packets I mean send operations; I am using TCP, which is streaming) of data in a short time, or a few large ones, it works perfectly.
I moved the code into a test app for the iPhone. Ran the test, great speeds again: about 5 MB sent over WiFi between two phones in about 3 seconds.
I'm using synchronous Socket.Send() operations on a threadpool thread, and ReceiveAsync() for receiving. (I've also tried the BeginReceive() style, but it behaves the same.)
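For reference, the receive side follows the standard SocketAsyncEventArgs pattern; a minimal sketch of what I mean (buffer size is arbitrary):

using System.Net.Sockets;

static class Receiver
{
    public static void StartReceiving(Socket socket)
    {
        var args = new SocketAsyncEventArgs();
        args.SetBuffer(new byte[8192], 0, 8192);
        args.Completed += OnReceiveCompleted;

        // ReceiveAsync returns false when it completed synchronously,
        // in which case the Completed event will not fire.
        if (!socket.ReceiveAsync(args))
            OnReceiveCompleted(socket, args);
    }

    static void OnReceiveCompleted(object sender, SocketAsyncEventArgs args)
    {
        if (args.SocketError != SocketError.Success || args.BytesTransferred == 0)
            return; // connection closed or failed

        // ... consume args.Buffer[0 .. args.BytesTransferred) here ...

        var socket = (Socket)sender;
        if (!socket.ReceiveAsync(args))
            OnReceiveCompleted(socket, args); // post the next receive
    }
}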
However, once I move the code into my app, I start to encounter problems. The general problem is that the receiving app doesn't seem to receive consistently. I could send several "packets" of data from the host phone, and it can take anywhere from instantly to a few seconds to 10 minutes for them to appear on the receiving end. I've been on this problem for 2 weeks now (evenings/weekends) and I've spent days testing it to try to understand exactly what I can do to reproduce it, but it's never the same twice.
At the moment, I'm putting it down to threadpool threads being exhausted. I've used
ThreadPool.SetMaxThreads()
to increase the thread count drastically, but it doesn't make any difference. It's as if the completed callback in SocketAsyncEventArgs cannot get a thread to operate on, so it just sits there. I've gone through my code, refactored anything that was unnecessarily using threads with a loop performing periodic tasks, and changed them to timers, but the problem remains.
I have literally no idea where to turn with this one. I'm hoping it's maybe a bug in MonoTouch (not that I'm trying to blame those guys!).
I'm not sure what code to post, as the network code has been tested on its own and operates fine. I've tested it with 1,000,000 send/receives to check for some kind of leak but found no problems.
It seems like the data is getting to the recipient, but the callback is somehow getting severely delayed in getting called, sometimes by several minutes.
Can anyone point me in a direction of why this might be happening?
Thank you.
My problem with this was caused by having a GKSession also initialized. I hope this is a bug in MonoTouch/Mono that can be fixed, as I do need both network features enabled. As soon as I disabled the GKSession, the socket code flowed freely.