I am working on a command-processing application that uses an Azure Service Bus queue.
Commands are issued from a website and posted to the queue, and the queue messages are processed by a worker role. Processing involves fetching data from the DB and other sources based on the queue message values and sending it to different topics. The flow is:
Receive the message
Process the message
Mark the message as complete, or abandon it on a processing exception
The challenge I face here is the processing time. Sometimes it exceeds the maximum message lock period (configured to 5 minutes), so the lock expires and the message reappears for a worker role to pick up (consider multiple instances of the worker role). This causes the same message to be processed again.
What are the options I have to handle such a scenario?
I have thought about the following:
Receive the message, add it to a local variable, and mark the message complete. In case of an exception, send the message again to the queue, or to a separate queue (say, a failed-message queue). A second queue also means another worker role to process it.
The processing contains a foreach loop, so I thought of using Parallel.ForEach instead, but I'm not sure how much time it would save, and I have also read some posts about issues with using Parallel in Azure. A rough sketch of what I mean follows.
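Just as an illustration (ProcessItem and the item list are placeholders for the real per-item work):

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class ParallelProcessingSketch
{
    // Placeholder for the real per-item work (DB fetch, sending to topics).
    static void ProcessItem(string item) => Console.WriteLine("Processed " + item);

    static void Main()
    {
        var items = new List<string> { "a", "b", "c" };

        // Cap the degree of parallelism so one message doesn't saturate the instance.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
        Parallel.ForEach(items, options, ProcessItem);
    }
}
```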
Suggestions and fixes are welcome.
Aravind, you can absolutely use a Service Bus queue in this scenario. With the latest SDK you can renew the lock on your message for as long as you are continuing to process it. Details are at: http://msdn.microsoft.com/en-us/library/microsoft.servicebus.messaging.brokeredmessage.renewlock.aspx
This is similar to the Azure storage queue functionality of updating the visibility timeout: http://msdn.microsoft.com/en-us/library/windowsazure/microsoft.windowsazure.storage.queue.cloudqueue.updatemessage.aspx
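As a rough sketch of how RenewLock might be used mid-processing (the connection string, queue name, and processing steps below are placeholders, not your actual code):

```csharp
using System;
using Microsoft.ServiceBus.Messaging;

class LockRenewalSketch
{
    static void Main()
    {
        // Placeholder connection string and queue name.
        var client = QueueClient.CreateFromConnectionString(
            "Endpoint=sb://yournamespace.servicebus.windows.net/;SharedAccessKeyName=...;SharedAccessKey=...",
            "commands");

        BrokeredMessage message = client.Receive();
        try
        {
            DoFirstExpensiveStep(message);   // stand-in for the DB fetch
            message.RenewLock();             // extend the lock before the 5-minute window expires
            DoSecondExpensiveStep(message);  // stand-in for sending to the topics

            message.Complete();
        }
        catch (Exception)
        {
            message.Abandon();               // make the message available to another worker
            throw;
        }
    }

    static void DoFirstExpensiveStep(BrokeredMessage m) { /* ... */ }
    static void DoSecondExpensiveStep(BrokeredMessage m) { /* ... */ }
}
```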
You may want to consider using an Azure Queue: the maximum lease time for an Azure Queue message is 7 days, as opposed to the Azure Service Bus queue lease time of 5 minutes.
This MSDN article describes the differences between the two Azure queue types.
If the standard Azure Queue doesn't contain all the features you need, you might consider using both types of queue.
You can fire off a Task with a heartbeat operation that keeps renewing the lock for you while you're processing the message. This is exactly what I do. I described my approach at Creating a Task with a heartbeat.
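A stripped-down version of the idea might look like this (not my exact code; the 4-minute interval is an assumption based on a 5-minute lock, so adjust it to your configured lock duration):

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.ServiceBus.Messaging;

static class LockHeartbeat
{
    // Renews the message lock on an interval until the caller cancels the token.
    public static Task Start(BrokeredMessage message, CancellationToken token)
    {
        return Task.Run(async () =>
        {
            while (!token.IsCancellationRequested)
            {
                try { await Task.Delay(TimeSpan.FromMinutes(4), token); }
                catch (OperationCanceledException) { break; }
                message.RenewLock(); // keep the lock alive while processing continues
            }
        });
    }
}

// Usage:
//   var cts = new CancellationTokenSource();
//   var heartbeat = LockHeartbeat.Start(message, cts.Token);
//   ProcessMessage(message);  // the long-running work
//   cts.Cancel();             // stop renewing
//   message.Complete();
```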
I have a set of files in an S3 bucket and I'd like to have multiple clients process the files by downloading and deleting them so they can be processed locally.
How can I ensure that only one client can access any single file, so that exactly one worker downloads and processes it? I know I could introduce an additional queuing system or other external process to implement some kind of FIFO queue or locking mechanism, but I'm really hoping to minimize the number of components here, so it's simply
(file_generation -> S3 -> workers) without adding more systems to manage or things that might break.
So is there any way to obtain a lock on a file, or to somehow atomically tag it for a single worker so that other workers know to ignore it? Perhaps renaming the object's key with the worker's ID, so it's "claimed" and no one else will touch it?
Why are you using a filestore like a queue? Why not use a queue? (From your question, it sounds like you are being lazy!).
If you want to keep a similar workflow, create a file on S3 and post the URI of the file to the queue (this can be done automatically by AWS).
Queues can have multiple consumers, and (normally) there will never be any conflicts.
How can I ensure that only one client can access any single file, so that exactly one worker downloads and processes it?
By using a queue, such as an Amazon SQS queue:
Create an SQS queue
Configure the S3 bucket to automatically send a message to the queue when a new object is created
Configure your workers to poll the SQS queue for messages.
When they receive a message, the message is temporarily made 'invisible', but it is not removed from the queue
When a worker has completed its processing, it deletes the message from the SQS queue
It meets your requirements 100% and works out of the box. It is much more reliable than writing your own process.
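For illustration, a minimal worker loop with the AWS SDK for .NET might look like this (the queue URL and the processing step are placeholders):

```csharp
using System;
using System.Threading.Tasks;
using Amazon.SQS;
using Amazon.SQS.Model;

class SqsWorker
{
    // Placeholder queue URL.
    const string QueueUrl = "https://sqs.us-east-1.amazonaws.com/123456789012/file-events";

    static async Task Main()
    {
        var sqs = new AmazonSQSClient();

        while (true)
        {
            var response = await sqs.ReceiveMessageAsync(new ReceiveMessageRequest
            {
                QueueUrl = QueueUrl,
                MaxNumberOfMessages = 1,
                WaitTimeSeconds = 20 // long polling
            });

            foreach (var message in response.Messages)
            {
                // The S3 event notification in the body identifies the new object;
                // download and process it here. While this runs, the message is
                // invisible to the other workers.
                ProcessS3Event(message.Body);

                // Delete only after successful processing; on failure the message
                // reappears after the visibility timeout for another worker.
                await sqs.DeleteMessageAsync(QueueUrl, message.ReceiptHandle);
            }
        }
    }

    static void ProcessS3Event(string body) => Console.WriteLine(body);
}
```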
What if an entry in the queue gets lost and the file remains?
Amazon SQS makes a message invisible while it is being processed, without removing it from the queue. If a worker fails to delete the message after processing, the message reappears on the queue after a defined period, ready for another worker to process it.
Or the queue goes offline?
Amazon SQS is a regional service, which means that queues are replicated across multiple Availability Zones and served by parallel servers.
Or objects get renamed?
It is not possible to 'rename' objects in Amazon S3. An object would need to be copied and the original object deleted.
We are using RabbitMQ queues with the C# API to perform distributed work. We have several Windows applications running, all subscribed to one RabbitMQ queue, and this is working fine, but we have a situation where we need to perform some operation only when a message is the last one in the queue. Is there any way in the C# API to know whether the message being received is the last message in the queue? Something like: an application consumes a message from the queue, we learn that this is the last message, and we perform some operation.
You can check this: http://docs.celeryproject.org/en/latest/userguide/monitoring.html#inspecting-queues
It gives you a list of the messages in a queue, and you can easily find the last message.
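If you would rather stay inside the RabbitMQ .NET client than use Celery's tooling, one rough alternative is to check how many messages remain after you take one. Note that with multiple consumers and publishers this count is only a snapshot, so treat this as a sketch (host and queue name are placeholders):

```csharp
using System;
using RabbitMQ.Client;

class LastMessageCheck
{
    static void Main()
    {
        var factory = new ConnectionFactory { HostName = "localhost" }; // placeholder host
        using var connection = factory.CreateConnection();
        using var channel = connection.CreateModel();

        // Placeholder queue name; second argument false = manual ack.
        BasicGetResult result = channel.BasicGet("work-queue", false);
        if (result != null)
        {
            // MessageCount is the number of messages left in the queue after
            // this one was taken; 0 suggests this was the last message.
            if (result.MessageCount == 0)
            {
                Console.WriteLine("Last message; run the final operation.");
            }

            channel.BasicAck(result.DeliveryTag, false);
        }
    }
}
```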
I'm using RabbitMQ for the following scenario. When a user uses a premium search feature, I send a message via RabbitMQ to one of a few server instances. They run the same routine (DB queries and billing). I want to make sure I don't process the same message more than once.
I've come across this great tutorial, but the exchange type presented in it is "topic", which does not work for me, because it would cause the same message to be processed more than once.
How can I implement the request-response pattern with worker queues in RabbitMQ so that each message is handled only once and there's load balancing?
Anton Gogolev's comment above is correct: you cannot guarantee a message will be processed only once, for many reasons. But producing the desired result only once is often a requirement of such systems.
The way to do that is through idempotence - the idea that no matter how many times a given message is processed, it will only make the desired change once.
There are a lot of ways to do this. One simple example is to use a shared database that tracks which messages have been processed. When you receive a message, you check to see if it has been processed already. If not, you process it. If it has, you just ignore it and move on.
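As an illustrative sketch of that shared-database check (the table, its columns, and the message-id value are assumptions):

```csharp
using System.Data.SqlClient;

class IdempotentHandler
{
    // Tries to record the message as processed; returns false if it was seen before.
    // Assumes a ProcessedMessages table whose primary key is MessageId.
    static bool TryClaim(string messageId, SqlConnection conn)
    {
        using var cmd = new SqlCommand(
            "INSERT INTO ProcessedMessages (MessageId, ProcessedAt) VALUES (@id, SYSUTCDATETIME())",
            conn);
        cmd.Parameters.AddWithValue("@id", messageId);
        try
        {
            cmd.ExecuteNonQuery();
            return true;
        }
        catch (SqlException e) when (e.Number == 2627) // primary key violation
        {
            return false; // already processed by some consumer
        }
    }

    public static void Handle(string messageId, string body, SqlConnection conn)
    {
        if (!TryClaim(messageId, conn))
            return; // duplicate delivery; ignore it and move on

        // ... run the DB queries and billing exactly once ...
    }
}
```

A more careful version records the message inside the same transaction as the work itself, so a crash between claiming and processing can't strand an unprocessed message.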
In your case, since you are doing request/response and want load balancing, you probably want multiple consumers on the same queue. You could have 2 or 10 or 300 instances of your request handler listening to the same queue, and you won't have to worry much about duplicate processing.
RabbitMQ will send a given message to a single consumer. It will wait for that consumer to say it is done processing; if the consumer crashes or rejects the message, it will requeue the message for another consumer to try again.
In this way, you will generally have only one request handler per request. But it will always be possible for more than one to handle the same message, which is why idempotence is important.
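In the C# client, that competing-consumer setup might look roughly like this (host and queue name are placeholders); run as many copies of this worker as you need:

```csharp
using System;
using System.Text;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

class RequestWorker
{
    static void Main()
    {
        var factory = new ConnectionFactory { HostName = "localhost" }; // placeholder host
        using var connection = factory.CreateConnection();
        using var channel = connection.CreateModel();

        // Placeholder queue name; every worker declares and consumes the same queue.
        channel.QueueDeclare("premium-search", true, false, false);
        channel.BasicQos(0, 1, false); // at most one unacked message per worker

        var consumer = new EventingBasicConsumer(channel);
        consumer.Received += (sender, ea) =>
        {
            var request = Encoding.UTF8.GetString(ea.Body.ToArray());
            Console.WriteLine("Handling: " + request);

            // ... idempotent DB queries and billing here ...

            // Ack only after the work succeeds; if this worker dies first,
            // RabbitMQ redelivers the message to another consumer.
            channel.BasicAck(ea.DeliveryTag, false);
        };

        channel.BasicConsume("premium-search", false, consumer);
        Console.ReadLine(); // keep the worker alive
    }
}
```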
Regarding the use of a topic exchange versus any other type of exchange: it doesn't make much difference. There will always be the possibility of more than one queue receiving the message you are sending, because you can have multiple queues bound to the same exchange with the same binding keys.
I'm new to this; please help me. I want to design a system with a website and a service. The website accepts user requests and puts jobs into a queue. The service gets jobs from the queue and processes them. But how do I deal with the scenario where the service breaks down after it has already fetched a job from the queue? Is there a mechanism to detect that the service has crashed and to put the job back on the queue? Thanks in advance!
When using MSMQ, messages are by default wrapped in a transaction when they are put into the queues and when they are being processed (via a handler).
If the message was halfway through the handler and power was cut to the server, the transaction would fail and the message would be left at the top of the queue. When the server came back up, the top message would be pulled off the queue and processed (the message that was being processed during the previous failure).
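A bare-bones version of that transactional receive with System.Messaging (the queue path is a placeholder, and the queue must have been created as transactional):

```csharp
using System.Messaging;

class TransactionalWorker
{
    static void Main()
    {
        // Placeholder path; the queue must be a transactional queue.
        using (var queue = new MessageQueue(@".\private$\jobs"))
        using (var tx = new MessageQueueTransaction())
        {
            queue.Formatter = new XmlMessageFormatter(new[] { typeof(string) });

            tx.Begin();
            try
            {
                Message message = queue.Receive(tx);

                // ... process the job ...

                tx.Commit(); // only now is the message permanently removed
            }
            catch
            {
                tx.Abort(); // the message goes back on the queue for another attempt
                throw;
            }
        }
    }
}
```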
I am currently writing a TCP listener that many client applications send short messages to. The TCP listener is a C# WinForms application, and I need to process these logs in batches to avoid hitting the database on every message I receive. Currently, on every message the listener receives, I call Enqueue on a C# Queue instance.
A separate thread executes every 5 minutes to check this Queue and start processing it if there are any queued items. There seems to be a concurrency/race condition issue with this design: when the 5-minute thread kicks off, newly received messages can no longer access the Queue, since I hold a lock on it while dequeuing, so these new messages get lost. It seems to happen only when large numbers of messages are being sent to the TCP listener.
Does anyone think this is a flawed design on my part, or is there a much better solution for this? I am not allowed to use MSMQ or WCF, based on restrictions from the client applications that send the messages.
So you have a producer-consumer scenario with multiple producers and one (buffered) consumer. You may want to take a look at the Reactive Extensions (they have a version for .NET 3.5). At the very least, you could leverage their backport of BlockingCollection<T>.
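A rough shape of that, assuming .NET 4's BlockingCollection<T> (the Rx backport exposes an equivalent type for 3.5). The listener's receive path only ever calls Add, which is thread-safe, so it never contends with the batch writer and no messages are lost:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

class BatchingConsumer
{
    static readonly BlockingCollection<string> Messages = new BlockingCollection<string>();

    // Called by the TCP listener for each incoming message; Add is thread-safe
    // and does not block on the consumer, so messages are not lost during a batch.
    public static void OnMessageReceived(string message) => Messages.Add(message);

    static void Main()
    {
        var writer = new Thread(() =>
        {
            while (true)
            {
                Thread.Sleep(TimeSpan.FromMinutes(5)); // the batch interval from the question

                var batch = new List<string>();
                while (Messages.TryTake(out var item)) // drain whatever has queued up
                    batch.Add(item);

                if (batch.Count > 0)
                    WriteBatchToDatabase(batch);
            }
        });
        writer.IsBackground = true;
        writer.Start();

        // ... the TCP listener runs here, calling OnMessageReceived per message ...
        Thread.Sleep(Timeout.Infinite);
    }

    static void WriteBatchToDatabase(List<string> batch) =>
        Console.WriteLine("Wrote " + batch.Count + " messages");
}
```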