Should I call CreateIfNotExistsAsync() before every read/write on Azure queue?
I know it results in a REST call, but does it do any IO on the queue?
I am using the .NET library for Azure Queues (in case that's relevant).
All that method does is try to create the queue and catch the AlreadyExists error, which you could just as easily replicate yourself by catching the 404 when you try to access the queue. There is bound to be some performance impact.
More importantly, it increases your costs. From the archived post Understanding Windows Azure Storage Billing – Bandwidth, Transactions, and Capacity [MSDN]:
We have seen applications that perform a CreateIfNotExist [sic] on a Queue before every put message into that queue. This results in two separate requests to the storage system for every message they want to enqueue, with the create queue failing. Make sure you only create your Blob Containers, Tables and Queues at the start of their lifetime to avoid these extra transaction costs.
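In practice, that means creating the queue once at startup and keeping the per-message path down to a single request. A minimal sketch with the newer Azure.Storage.Queues QueueClient follows (the queue name, connection string, and class wiring are illustrative, not from the original post):

```csharp
// Minimal sketch (Azure.Storage.Queues). The queue name and connection
// string are placeholders. The queue is created once at startup; the
// hot path only sends messages and never checks existence again.
using System.Threading.Tasks;
using Azure.Storage.Queues;

public class OrderQueue
{
    private readonly QueueClient _queue;

    public OrderQueue(string connectionString)
    {
        _queue = new QueueClient(connectionString, "orders"); // placeholder queue name
    }

    // Call once when the application starts (or do it as a deployment step).
    public Task InitializeAsync() => _queue.CreateIfNotExistsAsync();

    // Hot path: one REST call per message, no CreateIfNotExistsAsync.
    public Task EnqueueAsync(string message) => _queue.SendMessageAsync(message);
}
```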
I have a set of files in an S3 bucket and I'd like to have multiple clients process the files by downloading and deleting them so they can be processed locally.
How can I ensure that only one client can access any single file, so that exactly one worker downloads and processes it? I know I can introduce an additional queuing system or other external process to implement some kind of FIFO queue or locking mechanism, but I'm really hoping to minimize the number of components here, so the pipeline stays simply
(file_generation -> S3 -> workers) without adding more systems to manage or things that might break.
So is there any way to obtain a lock on a file or somehow atomically tag it for a single worker such that other workers will know to ignore it? Perhaps renaming the object's key with the worker's ID so it's "claimed" and no one else will touch it?
Why are you using a file store as a queue? Why not use an actual queue? (From your question, it sounds like you are being lazy!)
If you want to keep a similar workflow, create a file on S3 and post the URI of the file to the queue (S3 can do this automatically via event notifications).
Queues can have multiple consumers and, under normal circumstances, there will be no conflicts.
How can I ensure that only one client can access any single file so that exactly one worker downloads and processes it?
By using a queue, such as an Amazon SQS queue:
Create an SQS queue
Configure the S3 bucket to automatically send a message to the queue when a new object is created
Configure your workers to poll the SQS queue for messages.
When they receive a message, the message is temporarily made 'invisible' but is not removed from the queue
When the worker has completed their process, they delete the message from the SQS queue
It meets your requirements 100% and works "out-of-the-box". Much more reliable than writing your own process.
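The worker side of that flow might look roughly like this with the AWS SDK for .NET (AWSSDK.SQS); the queue URL and the ProcessObjectAsync call are placeholders, and parsing of the S3 event JSON is omitted:

```csharp
// Rough worker-loop sketch (AWSSDK.SQS). The queue URL and ProcessObjectAsync
// are placeholders. A received message becomes invisible to other workers
// until it is deleted or its visibility timeout expires.
using System.Threading;
using System.Threading.Tasks;
using Amazon.SQS;
using Amazon.SQS.Model;

public class S3QueueWorker
{
    private const string QueueUrl =
        "https://sqs.us-east-1.amazonaws.com/123456789012/new-objects"; // placeholder
    private readonly IAmazonSQS _sqs = new AmazonSQSClient();

    public async Task RunAsync(CancellationToken ct)
    {
        while (!ct.IsCancellationRequested)
        {
            var response = await _sqs.ReceiveMessageAsync(new ReceiveMessageRequest
            {
                QueueUrl = QueueUrl,
                MaxNumberOfMessages = 1,
                WaitTimeSeconds = 20 // long polling
            }, ct);

            foreach (var message in response.Messages)
            {
                // message.Body carries the S3 event notification (JSON).
                await ProcessObjectAsync(message.Body);

                // Delete only after successful processing; otherwise the message
                // reappears for another worker once the visibility timeout expires.
                await _sqs.DeleteMessageAsync(QueueUrl, message.ReceiptHandle, ct);
            }
        }
    }

    private Task ProcessObjectAsync(string s3EventJson) => Task.CompletedTask; // placeholder
}
```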
What if an entry in the queue gets lost and the file remains?
Amazon SQS supports the concept of an invisibility period for messages that have been received but not yet fully processed. If a worker fails to delete the message after processing, the message will reappear on the queue after a defined period, ready for another worker to process it.
Or the queue goes offline?
Amazon SQS is a regional service, which means that queues are replicated across multiple Availability Zones and served by redundant servers.
Or objects get renamed?
It is not possible to 'rename' objects in Amazon S3. An object would need to be copied and the original object deleted.
I have multiple queues that multiple clients insert messages into.
On the server side, I have multiple micro-services that access the queues and handle those messages. I want to lock a queue whenever a service is working on it, so that other services won't be able to work on that queue.
Meaning that if service A is processing a message from queue X, no other service can process a message from that queue, until service A has finished processing the message. Other services can process messages from any queue other than X.
Does anyone have an idea how to lock a queue and prevent other services from accessing it? Preferably the other services would receive an exception or something similar so that they'll try again on a different queue.
UPDATE
Another approach could be to assign the queues to the services, so that whenever a service is working on a queue no other service is assigned to that queue until the work item has been processed. This also isn't easy to achieve.
There are several built-in ways of doing this. If you only have a single worker, you can set MessageOptions.MaxConcurrentCalls = 1.
If you have multiple, you can use the Singleton attribute. This gives you the option of setting it in Listener mode or Function mode. The former gives the behavior you're asking for, a serially-processed FIFO queue. The latter lets you lock more granularly, so you can specifically lock around critical sections, ensuring consistency while allowing greater throughput, but doesn't necessarily preserve order.
My guess is they've implemented the Singleton attribute similarly to your Redis approach, so performance should be equivalent. I've done no testing with that, though.
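If you're on the Azure WebJobs SDK, a serially-processed queue function might be sketched roughly like this; the queue name and the function body are made up, and Listener mode is the part that serializes processing across instances:

```csharp
// Sketch for the Azure WebJobs SDK. The queue name and function body are
// illustrative. Singleton in Listener mode means only one listener across
// all instances pulls from this queue, so messages are processed serially.
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;

public class Functions
{
    [Singleton(Mode = SingletonMode.Listener)]
    public static void ProcessQueueMessage(
        [QueueTrigger("queue-x")] string message, // placeholder queue name
        ILogger logger)
    {
        logger.LogInformation("Processing {Message}", message);
        // ... handle the message; the next one isn't dequeued until this returns
    }
}
```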
You can achieve this by using Azure Service Bus message sessions.
All messages in your queue must be tagged with the same SessionId. In that case, when a client receives a message, it locks not only that message but all messages with the same SessionId (effectively the whole queue).
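A rough sketch of that with the Azure.Messaging.ServiceBus client follows; the queue must be created with sessions enabled, and the queue name, session id, and payload below are placeholders:

```csharp
// Rough sketch (Azure.Messaging.ServiceBus). The queue must have sessions
// enabled; queue name, session id and payload are placeholders.
using System.Threading.Tasks;
using Azure.Messaging.ServiceBus;

public static class SessionExample
{
    public static async Task SendAsync(ServiceBusClient client)
    {
        ServiceBusSender sender = client.CreateSender("work-queue");
        await sender.SendMessageAsync(new ServiceBusMessage("payload")
        {
            SessionId = "queue-x" // all messages for this logical queue share the id
        });
    }

    public static async Task ReceiveAsync(ServiceBusClient client)
    {
        // Locks the whole session: other consumers can't receive messages with
        // this SessionId until the receiver is closed or the lock expires.
        ServiceBusSessionReceiver receiver =
            await client.AcceptNextSessionAsync("work-queue");

        ServiceBusReceivedMessage msg = await receiver.ReceiveMessageAsync();
        // ... process the message ...
        await receiver.CompleteMessageAsync(msg);
        await receiver.CloseAsync();
    }
}
```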
The solution was to use Azure Cache for Redis to store the locks in memory and have the microservices manage those locks through the Redis store.
The lock() and unlock() operations are atomic and the lock has a TTL, so that a queue won't be locked indefinitely.
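The idea can be sketched with StackExchange.Redis roughly as follows; the key naming, owner token, and 30-second TTL are illustrative choices, not the exact implementation described above:

```csharp
// Rough sketch of a per-queue lock with StackExchange.Redis. Key naming,
// owner token and the 30-second TTL are illustrative. LockTake is atomic
// (SET NX with expiry), and the TTL means a crashed service can't hold the
// lock forever.
using System;
using System.Threading.Tasks;
using StackExchange.Redis;

public class QueueLock
{
    private readonly IDatabase _db;

    public QueueLock(IConnectionMultiplexer redis) => _db = redis.GetDatabase();

    // Returns true if this service instance now owns the lock for the queue.
    public Task<bool> TryLockAsync(string queueName, string ownerId) =>
        _db.LockTakeAsync($"lock:{queueName}", ownerId, TimeSpan.FromSeconds(30));

    // Releases the lock only if ownerId still holds it.
    public Task<bool> UnlockAsync(string queueName, string ownerId) =>
        _db.LockReleaseAsync($"lock:{queueName}", ownerId);
}
```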
Azure Service Bus is a broker with competing consumers. You can't get the behavior you're asking for with a general queue that all instances of your service consume from.
Put the work items into a relational database. You can still use queues to push work to workers, but the queue items can now be empty. When a worker receives an item it knows to look in the database instead; the content of the message is disregarded.
That way messages are independent and idempotent. For queueing to work these two properties usually must hold.
That way you can more easily sequence actions that actually are sequential. You can use transactions as well.
Maybe you don't need queues at all. Maybe it is enough to have a fixed number of workers polling the database for work. You lose the auto-scaling you get with queues, though.
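If you go the database route, the "worker looks into the database" step can be made safe for multiple workers with an atomic claim query. Here is a sketch against SQL Server with made-up table and column names:

```csharp
// Sketch of an atomic "claim one pending work item" against SQL Server using
// Microsoft.Data.SqlClient. Table and column names are made up. The UPDATE
// with OUTPUT claims a row in a single statement, so two workers can't grab
// the same item; any queue message you still send can be empty, since the
// worker just checks the table.
using System.Threading.Tasks;
using Microsoft.Data.SqlClient;

public static class WorkItemStore
{
    private const string ClaimSql = @"
        UPDATE TOP (1) WorkItems
        SET    Status = 'InProgress', ClaimedAt = SYSUTCDATETIME()
        OUTPUT inserted.Id, inserted.Payload
        WHERE  Status = 'Pending';";

    public static async Task<(long Id, string Payload)?> ClaimNextAsync(string connectionString)
    {
        using var conn = new SqlConnection(connectionString);
        await conn.OpenAsync();
        using var cmd = new SqlCommand(ClaimSql, conn);
        using var reader = await cmd.ExecuteReaderAsync();
        if (await reader.ReadAsync())
            return (reader.GetInt64(0), reader.GetString(1));
        return null; // nothing pending right now
    }
}
```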
I have a WCF service (the fact that it's WCF shouldn't matter) and I'm not looking for message queuing, but instead for an asynchronous work queue in which to place tasks, once a request / message is received. Requirements:
Must support a persistent store that enables recovery of tasks in the case of server / service-process failure.
Supports re-running of failed jobs, up to a given limit (i.e. try re-running a job up to 5 times)
Able to record the failed job call along with its parameters, in an easily queried fashion. For example, I would query the store for failed jobs and receive a list of "job name, parameters".
Unfortunately cannot be a cloud-based / hosted solution.
Queues that I'm probably not looking for:
MSMQ (also RabbitMQ, AMQP). Low-level, and focused on message transport.
Quartz.NET. Has some of the above but its error-recording facilities are lacking. Geared more toward cron-like scheduling than async work and error reporting.
The default Task Scheduler of the .NET TPL. It has no persistence if the process owning it stops abruptly, and doesn't support re-running of tasks very well.
I think I'd be looking for something more along the lines of Celery, Resque, or even qless. I know Resque.NET exists (https://www.nuget.org/packages/Resque/), but not sure if there's something more mainstream, or if that could suffice.
What about Amazon SQS? You don't have to worry about infrastructure as you would with RabbitMQ/MSMQ. SQS is dirt cheap, too. Last time I checked, it was $0.01 per 10,000 messages. Why re-invent the wheel? Let Amazon (or other cloud providers with similar services, like Microsoft and Rackspace) do all the worrying.
I use Amazon SQS in production for all message-based services. Some of these messages act like cron jobs; an external process queues the message at a specific time. Some of them are acted upon immediately.
I have an ASP.NET web application running on IIS 7, set up in web-garden mode. I want to clear runtime cache items across all worker processes in a single step.

I can set up a database key-value, but that would mean a thread on each worker process, on each of my load-balanced web servers, polling for changes on that key-value and flushing the cache. That would be a very bad mechanism, as I flush cache items at most once per day. I also cannot implement a push notification using SqlCacheDependency with Service Broker notifications, since I have a MySql db.

Any thoughts? Is there any dirty work-around? One possible workaround: expose an aspx page and hit that page multiple times using the IP and port on which the site is hosted instead of the domain name - ex: http://ip.ip.ip.ip:82/CacheClear.aspx - so that a request for that page might be sent to all the worker processes within that web server, and on Page_Load, clear the cache items. But this is a really dirty hack and may not work in cases when all requests are sent to the same worker process.
You need to set up inter-process communication.
For caching there are two commonly used ways of doing this:
Set up a shared cache (memcached or the like).
Set up a message queue (e.g. MSMQ or RabbitMQ) and use it to broadcast state changes to the local caches.
A shared cache is the ultimate solution, as it means the whole cache is distributed, but it is also the most complex: it needs to be set up so the cache load is properly distributed between nodes, and you have to make sure it doesn't become a bottleneck.
The second option requires more code on your part but it is easier if you don't want to share the cache content (as in your case.)
The easiest is to setup a listener thread or task to handle the cache clear or individual entries invalidation messages. This thread will be dormant if there are no messages so the impact on performance is minimal.
You can also forgo the listener thread by handling messages as part of the usual IIS request pipeline, i.e. set up a filter/module that checks for messages in the queue and processes them before handling the request; but performance-wise the first option is (slightly) better.
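As a sketch of the second option with the RabbitMQ .NET client (pre-7.x API): each worker process binds its own exclusive, auto-named queue to a fanout exchange, so a single published message reaches every process. The exchange name, host, and the cache-clearing call below are placeholders:

```csharp
// Sketch with the RabbitMQ .NET client (pre-7.x API). Each worker process
// binds its own exclusive, auto-named queue to a fanout exchange, so one
// published "clear" message reaches every process. Exchange name, host and
// the cache-clearing call are placeholders.
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

public class CacheInvalidationListener
{
    public void Start()
    {
        var factory = new ConnectionFactory { HostName = "localhost" }; // placeholder
        IConnection connection = factory.CreateConnection();
        IModel channel = connection.CreateModel();

        channel.ExchangeDeclare("cache-invalidation", ExchangeType.Fanout);

        // Exclusive, auto-named queue: one per worker process.
        string queueName = channel.QueueDeclare().QueueName;
        channel.QueueBind(queueName, "cache-invalidation", "");

        var consumer = new EventingBasicConsumer(channel);
        consumer.Received += (_, ea) =>
        {
            // A message here means "flush"; the payload could carry specific keys.
            System.Web.HttpRuntime.Cache.Remove("my-cache-key"); // placeholder
        };
        channel.BasicConsume(queueName, autoAck: true, consumer: consumer);
    }
}
```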
We have an ASP.NET MVC 3.0 application that reads data from the db using Entity Framework (all on Azure). We have several long-running queries (optimization has been done) and we want to make sure that the solution is scalable and prevents thread starvation.
We looked at async controllers and using I/O completion ports to run the query (using BeginExecute instead of the usual EF). However, async is hard to debug and increases the complexity of the code.
The proposed solution is as follows:
The web server (web role) gets a request that involves a long running query (example customer segmentation)
It enters the request information into a table along with the relevant parameters and returns, thereby allowing the thread to process other requests.
We set a flag in the db that enables the UI to state that the query is in progress whenever a refresh to the page is done.
A worker role constantly queries this table and, as soon as it finds such an entry, processes the long-running query (customer segmentation) and updates the original customer table with the results.
In this case an immediate return of status to the users is not necessary. Users can check back within a couple of minutes to see if their request has been worked on. Instead of the table, we were planning to use Azure Queues (but I guess Azure queues cannot notify a worker role, so a db table will do just fine). Is this a workable solution? Are there any pitfalls to doing it this way?
While Windows Azure Storage queues don't give you a notification after a message has been processed, you could implement that yourself (perhaps with Windows Azure Storage tables). The nice part about queues: They handle concurrency and failed attempts.
For instance: If you have 2 worker instances processing messages off the same queue, every time a queue message is read, the message goes invisible in the queue, for an amount of time you specify. While invisible, only the worker instance that read the message has it. If that instance finishes processing, it can just delete the queue message (and update your notification table). If it fails (maybe due to the role instance crashing), the message re-appears on the queue after the invisibility timeout expires. Going one step further: Let's say it's simply a bad message that causes your code to crash every time. You can check the dequeue count before processing the message. If it's greater than, say, 2, simply store the message in a dead-letter table and inspect it manually.
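A sketch of that dequeue-count check with the current Azure.Storage.Queues client (the original answer predates this SDK, so treat it as an approximation; the threshold, queue wiring, and the processing/dead-letter calls are placeholders):

```csharp
// Sketch of the dequeue-count check (Azure.Storage.Queues). Threshold, queue
// wiring and the processing/dead-letter calls are placeholders.
using System;
using System.Threading.Tasks;
using Azure.Storage.Queues;
using Azure.Storage.Queues.Models;

public class Worker
{
    private readonly QueueClient _queue;

    public Worker(QueueClient queue) => _queue = queue;

    public async Task PollOnceAsync()
    {
        QueueMessage[] messages = await _queue.ReceiveMessagesAsync(
            maxMessages: 1, visibilityTimeout: TimeSpan.FromMinutes(5));

        foreach (QueueMessage msg in messages)
        {
            if (msg.DequeueCount > 2)
            {
                // Likely a poison message: park it for manual inspection.
                await StoreInDeadLetterTableAsync(msg); // placeholder
            }
            else
            {
                await ProcessAsync(msg.MessageText); // placeholder: the long-running work
            }

            // Delete only after handling; a crash before this line means the
            // message reappears once the visibility timeout expires.
            await _queue.DeleteMessageAsync(msg.MessageId, msg.PopReceipt);
        }
    }

    private Task ProcessAsync(string body) => Task.CompletedTask;
    private Task StoreInDeadLetterTableAsync(QueueMessage msg) => Task.CompletedTask;
}
```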
One caveat with queues: the queue messages need to represent idempotent operations (that is, a message may be delivered and processed more than once, and re-processing it must produce the same side-effects).
If you go with a table instead of a queue, you'll need to deal with scaling (multiple threads or role instances processing the table), and dead-letter handling.
This depends. If your worker role does nothing other than delegate the heavy work to a SQL database, it seems a waste of resources and money. Using a web role with async requests allows you to reduce the cost. If the heavy work needs to be done in the worker role itself, then it is a good approach.
You can also use AJAX or WebSockets. Start the database query and return the response immediately. The client can either poll the web role to see if the query has finished (if you use HTTP), or the web role can notify the client directly (if you use WebSockets).