Distributed locks with queue - C#

I have a single-instance application that needs to be changed so it can run on multiple servers for performance reasons.
When a specific operation is executed, the software needs to be sure that no other instance is working on the same operation.
I did some tests with Redis distributed locks but ran into a problem: even retrying every 50 ms for up to 10 seconds, the software is sometimes unlucky and cannot acquire the lock, because in the meantime other instances that started later managed to acquire it.
Are there any distributed lock services that can manage a queue? Basically, what I need is that when the software cannot acquire the lock because it is held by another instance, it keeps its place in line and is the first to be served when the lock is released.
Of course, my last resort is to write custom software that manages locks with a queue, but I'm trying to find out whether a solution already exists.

Do you need the hard canonical way or can you allow some simplification?
The hard canonical way is the one you study at university, and it requires a lot of custom software. I am a bit rusty, but in a distributed system you must use ordered, reliable message delivery just to be reasonably sure of preserving FIFO order. A service bus or a queue can guarantee in-order delivery, but to hold a distributed lock on top of that you need to implement consensus, which is extremely hard. After that, you must deal with all sorts of implementation flaws that are very common in software engineering.
A consensus algorithm, with or without a leader, involves a lot of communication to make sure all peers receive the same messages in the same order and can determine who currently owns the lock, without needing an explicit grant message after each release (only the release itself is explicit). Of course, the service bus must also deliver a node's own message back to it before that node considers the lock acquired.
Think of a distributed algorithm like this: every node listens on the bus. When the first lock request comes in, each node knows that its sender is the owner of the lock. Nodes keep listening for other lock requests and put them in a local queue. When the original holder sends a release message, every node knows the new owner is whoever sent the second lock request. To acquire the lock, a node broadcasts a request and waits until it receives its own message back to learn its position in the queue. When it has received n release messages, it knows it is its turn.
Make it practical
You could use a relational database. I don't know your scenario, but the single point of failure could be mitigated by some kind of clustering: master-slave, primary-secondary, etc.
SQL databases are f***ing great at handling concurrency. They ensure that transactions are executed in a consistent order. If your application uses a database table as the queue, you will find it very easy to run UPDATE LOCKS SET IS_LOCKED = 1 WHERE LOCK_ID = ? AND IS_LOCKED = 0 without clashes. To handle the queue, put the node name in another table and make sure you ORDER BY a SQL sequence.
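To make the table-as-queue idea a bit more concrete, here is a minimal C# sketch (not from the original answer; the LOCK_QUEUE table, its TICKET_ID identity column, and the SQL Server syntax are all assumptions for illustration). Each node inserts a ticket, waits until its ticket is the oldest one for that lock, does its work, and deletes the ticket to hand the lock to the next waiter.

using System;
using Microsoft.Data.SqlClient;

public sealed class DbLockQueue
{
    private readonly string _connectionString;

    public DbLockQueue(string connectionString) => _connectionString = connectionString;

    // Enqueue this node and return its ticket number (TICKET_ID is assumed to be
    // an identity/sequence column, so tickets are handed out in order).
    public long Enqueue(string lockId, string nodeName)
    {
        using var conn = new SqlConnection(_connectionString);
        conn.Open();
        using var cmd = new SqlCommand(
            "INSERT INTO LOCK_QUEUE (LOCK_ID, NODE_NAME) OUTPUT INSERTED.TICKET_ID VALUES (@lockId, @node)",
            conn);
        cmd.Parameters.AddWithValue("@lockId", lockId);
        cmd.Parameters.AddWithValue("@node", nodeName);
        return (long)cmd.ExecuteScalar();
    }

    // The lock is ours when our ticket is the smallest one still queued for this lock.
    public bool IsMyTurn(string lockId, long ticketId)
    {
        using var conn = new SqlConnection(_connectionString);
        conn.Open();
        using var cmd = new SqlCommand(
            "SELECT MIN(TICKET_ID) FROM LOCK_QUEUE WHERE LOCK_ID = @lockId", conn);
        cmd.Parameters.AddWithValue("@lockId", lockId);
        return Convert.ToInt64(cmd.ExecuteScalar()) == ticketId;
    }

    // Releasing the lock removes our ticket, letting the next waiter proceed.
    public void Release(long ticketId)
    {
        using var conn = new SqlConnection(_connectionString);
        conn.Open();
        using var cmd = new SqlCommand("DELETE FROM LOCK_QUEUE WHERE TICKET_ID = @ticket", conn);
        cmd.Parameters.AddWithValue("@ticket", ticketId);
        cmd.ExecuteNonQuery();
    }
}

A waiter would poll IsMyTurn (or be notified) until its ticket reaches the front, do its work, and then call Release; because tickets come from a sequence, waiters are served in FIFO order, which is exactly the priority behaviour the question asks for.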

Related

RabbitMQ EventingBasicConsumer concurrency consideration

I was reading through the .NET API Guide and it contains pretty good information, but I'm a bit confused about how RabbitMQ manages threads. Under the Concurrency Considerations section it states that every IConnection is backed by a single background thread. Then it continues:
The one place where the nature of the threading model is visible to the application is in any callback the application registers with the library. Such callbacks include:
any IBasicConsumer method
the BasicReturn event on IModel
any of the various shutdown events on IConnection, IModel etc.
I'm a bit confused by this. Do they mean that every time HandleBasicDeliver is called a new thread is created? In that case there would be as many threads as messages received, and concurrency would be controlled by the prefetch count along with the number of consumers?
So if we consider a case where I have one IConnection and two channels (IModel) with a prefetch count of one and one EventingBasicConsumer per channel, how many threads would we have running in the application?
I have done a considerable amount of research on this topic since I first asked the question, so I thought I would post it here in case someone finds this information useful.
Take this with a grain of salt. This is my understanding of how RabbitMQ (C#) works.
IConnection represents an actual socket connection. It has a single thread polling it. Based on suggestions I have read, use one connection per application unless you have a reason to use more.
Using more than one connection does not necessarily give you better fault tolerance, since if one connection fails there is usually an underlying problem that will cause all the connections to fail. Also, in many cases one connection is enough to handle the traffic coming from the network, and having more is simply unnecessary.
http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2011-July/013771.html
http://lists.rabbitmq.com/pipermail/rabbitmq-discuss/2012-May/019873.html
In C#, channels are not considered thread safe, so it is not a bad idea to have a channel per thread; otherwise you should make sure to use a locking mechanism.
https://www.rabbitmq.com/dotnet-api-guide.html
As far as I understand from reading the source code, HandleBasicDeliver (and I think all IModel callbacks) is run in a Task. With that in mind, having multiple consumers does increase the concurrency of your software, since if one consumer receives a message and is busy processing it, another consumer is free to pick up a message and process it on a different thread. Also, each consumer must be on its own channel to maximize concurrency; otherwise message order is preserved within the channel. However, if that kind of concurrency is not welcome, consider using a single channel to ensure the messages are processed in the order they arrived.
NOTE: Point 3 may have changed in that channel order is no longer preserved. I haven't had time to explore the new changes so read through this and draw your own conclusion:
https://github.com/rabbitmq/rabbitmq-dotnet-client/issues/251
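To make the setup from the question concrete, here is a rough sketch of one IConnection with two channels and one EventingBasicConsumer per channel, each with a prefetch count of one. It assumes the 6.x .NET client (where Body is a ReadOnlyMemory<byte>); the host name and queue name are illustrative.

using System;
using System.Text;
using RabbitMQ.Client;
using RabbitMQ.Client.Events;

class ConsumerExample
{
    static void Main()
    {
        var factory = new ConnectionFactory { HostName = "localhost" };
        using var connection = factory.CreateConnection();      // one socket, one I/O thread

        for (int i = 0; i < 2; i++)
        {
            var channel = connection.CreateModel();              // channels share the connection
            channel.QueueDeclare("work", durable: false, exclusive: false, autoDelete: false);
            channel.BasicQos(prefetchSize: 0, prefetchCount: 1, global: false);

            var consumer = new EventingBasicConsumer(channel);
            consumer.Received += (sender, ea) =>
            {
                // This callback runs on the library's worker, not on a thread you created.
                Console.WriteLine(Encoding.UTF8.GetString(ea.Body.ToArray()));
                channel.BasicAck(ea.DeliveryTag, multiple: false);
            };
            channel.BasicConsume(queue: "work", autoAck: false, consumer: consumer);
        }

        Console.ReadLine();                                      // keep the process alive
    }
}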
I have revised my answer, and the research you did looks good.
I created an application using RabbitMQ in which I had to share the same IConnection (connection) across a number of IModel (channel) instances. That is how it should be used: one connection is enough to serve multiple channels.
The problem I faced at one point was that when I created one connection per client, the number of connections kept growing, which caused the application to be terminated after a while.
So we should avoid having multiple connections unless they are needed. If possible, a single connection should be used for multiple channels.
Various shutdown events on IConnection, IModel:
Even if an IModel goes down, the IConnection is still there. But if the IConnection goes down, all the IModel instances under that connection will be shut down.

Executing interdependent Database ops on Multiple Threads

My server application, written in C#, starts a new thread every time it needs to insert or remove data from the database. The problem is that since the execution order of the threads is arbitrary, it is not guaranteed that a delete command is executed after the insertion of the same object if those events occur at almost the same time.
E.g.: the server receives the command to insert multiple objects. It takes about 5 seconds to insert all the objects. After 1 second of execution the server receives the command to delete all those objects from the database again. Since the removal could happen before all objects are completely stored, the outcome is unknown.
How can the order of execution of certain threads be managed?
You can use transactions for this and specify different levels for different operations.
For example, you can use the highest isolation level for writes/updates/deletes but a lower level for reads. You can also fine-tune this to lock only specific rows rather than entire tables. The exact terminology depends on the database and data access library you use.
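As a hedged illustration of that idea (the Objects table, column, and connection details are made up), the write path might use a serializable transaction so nothing else can interleave with it:

using System.Data;
using Microsoft.Data.SqlClient;

class TransactionalWrites
{
    public void InsertObjects(string connectionString)
    {
        using var conn = new SqlConnection(connectionString);
        conn.Open();

        // Highest isolation for the write path: nothing else can interleave
        // with these statements until the transaction commits.
        using var tx = conn.BeginTransaction(IsolationLevel.Serializable);
        using var cmd = new SqlCommand(
            "INSERT INTO Objects (Name) VALUES (@name)", conn, tx);
        cmd.Parameters.AddWithValue("@name", "example");
        cmd.ExecuteNonQuery();
        tx.Commit();
    }
}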
I would advise against using any ordering. Parallel and ordered just don't go well together. For example:
You need to scale servers horizontally; once you add a second server and a load balancer, a mutex-based solution will not work.
In a large, distributed system a message queue won't work either, because by the time one thread has completed a scan and decided that we are good to go, another thread can write a message that should have prevented the operation from executing. Moreover, under high load, scanning the same queue multiple times is inefficient.
If you know that you receive the insert before the delete, and the problem is just that you don't want the insertion to be interrupted, then you can simply use a lock around your insertion code.
static object m_Lock = new object();

public void Insert()
{
    lock (m_Lock)
    {
        InsertRecords();
    }
}

public void Remove()
{
    lock (m_Lock)
    {
        RemoveRecords();
    }
}
This way you are sure that the remove won't happen during the insert.
P.S. It seems strange, though, that you need to insert and then delete right away.
I think the simplest way is to queue all incoming requests to insert objects in one collection, and all incoming requests to delete objects in a second collection.
The server should have a basic loop that does the following (see the sketch after this list):
a. check if there are incoming insert requests; if so, perform all inserts.
b. check if there are incoming delete requests; if so, perform all delete requests.
c. sleep for X milliseconds.
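A minimal sketch of such a loop, using two ConcurrentQueue<T> instances (the item type and the DB helper methods are placeholders, not from the original answer), might look like this:

using System;
using System.Collections.Concurrent;
using System.Threading;

class DbWorkLoop
{
    private readonly ConcurrentQueue<object> _inserts = new ConcurrentQueue<object>();
    private readonly ConcurrentQueue<object> _deletes = new ConcurrentQueue<object>();

    public void EnqueueInsert(object item) => _inserts.Enqueue(item);
    public void EnqueueDelete(object item) => _deletes.Enqueue(item);

    public void Run(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            // a. drain all pending inserts first
            while (_inserts.TryDequeue(out var item))
                InsertRecord(item);

            // b. then drain all pending delete requests
            while (_deletes.TryDequeue(out var item))
                DeleteRecord(item);

            // c. sleep before the next round
            Thread.Sleep(100);
        }
    }

    private void InsertRecord(object item) { /* DB insert goes here */ }
    private void DeleteRecord(object item) { /* DB delete goes here */ }
}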
Now, if you have a delete request for an object that does not exist, you have two options:
a. ignore this request and discard it.
b. ignore this request for this round and keep it in the collection for the next N rounds before finally discarding it (assuming it is simply a bad request and not a race condition).
Use a Queue (with a single servicing thread) to enforce the ordering. You can also use Task Parallel Library to manage tasks with dependencies on other tasks, though that's very difficult with arbitrary DB operations.
I think you need to rethink how you manage the incoming operations, and whether or not their inter-dependencies are predictable enough that you can safely use multiple threads in this way. You may need to add some "depends on" information into incoming operations to achieve that goal.

Best way to run automated task every minute when site is on multiple servers

I need to set up an automated task that runs every minute and sends the emails in the queue. I'm using ASP.NET 4.5 and C#. Currently, I use a scheduler class that starts in global.asax and makes use of caching and a cache callback. I've read this leads to several problems.
The reason I did it that way is that this app runs on multiple load-balanced servers, and this approach keeps the execution in one place while the code still runs even if one or more servers are offline.
I'm looking for some direction to make this better. I've read about Quartz.NET but never used it. Does Quartz.NET call methods from the application? Or from a Windows service? Or from a web service?
I've also read about using a Windows service, but as far as I can tell, those are installed directly on a server. The thing is, I need the task to execute regardless of how many servers are online, and I don't want to duplicate it. For example, if I have a scheduled task set up on server 1 and server 2, they would both run it and therefore duplicate the requests. However, if server 1 is offline, I need server 2 to run the task.
Any advice on how to move forward here, or is the global.asax method the best way for a multi-server environment? BTW, the web servers are running Windows Server 2012 with IIS 8.
EDIT
In response to a request for more information: the queue is stored in a database. I should also mention that the database servers are separate from the web servers. There are two database servers, but only one runs at a time. They both read from central storage, so there is only one instance of the database. When one database server goes down, the other comes online.
That being said, would it make more sense to put a Windows service on both database servers? That would ensure only one runs at a time.
Also, what are your thoughts on running Quartz.NET from the application? As millimoose mentions, I don't necessarily need it running on the web front end; however, doing so means I don't have to deploy a Windows service to multiple machines, and I don't think there would be a performance difference either way. Thoughts?
Thanks everyone for the input so far. If any additional info is needed, please let me know.
I have had to tackle the exact problem you're facing now.
First, you have to realize that you absolutely cannot reliably run a long-running process inside ASP.NET. If you instantiate your scheduler class from global.asax, you have no control over the lifetime of that class.
In other words, IIS may decide to recycle the worker process that hosts your class at any time. At best, this means your class will be destroyed (and there's nothing you can do about it). At worst, your class will be killed in the middle of doing work. Oops.
The appropriate way to run a long-lived process is by installing a Windows Service on the machine. I'd install the service on each web box, not on the database.
The service instantiates the Quartz scheduler. This way, you know that your scheduler is guaranteed to keep running as long as the machine is up. When it's time for a job to run, Quartz simply calls a method on an IJob class that you specify.
class EmailSender : Quartz.IJob
{
    public void Execute(IJobExecutionContext context)
    {
        // send your emails here
    }
}
Keep in mind that Quartz calls the Execute method on a separate thread, so you must be careful to be thread-safe.
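For context, wiring the scheduler up inside the Windows service might look roughly like the sketch below. It assumes the synchronous Quartz.NET 2.x API, and the one-minute trigger interval is just an example:

using Quartz;
using Quartz.Impl;

class SchedulerBootstrap
{
    public static IScheduler Start()
    {
        IScheduler scheduler = new StdSchedulerFactory().GetScheduler();
        scheduler.Start();

        // Run the EmailSender job (defined above) every minute.
        IJobDetail job = JobBuilder.Create<EmailSender>().Build();
        ITrigger trigger = TriggerBuilder.Create()
            .StartNow()
            .WithSimpleSchedule(x => x.WithIntervalInMinutes(1).RepeatForever())
            .Build();

        scheduler.ScheduleJob(job, trigger);
        return scheduler;
    }
}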
Of course, you'll now have the same service running on multiple machines. While it sounds like you're concerned about this, you can actually leverage this into a positive thing!
What I did was add a "lock" column to my database. When a send job executes, it grabs a lock on specific emails in the queue by setting the lock column. For example, when the job executes, it generates a GUID and then runs:
UPDATE EmailQueue SET Lock=someGuid WHERE Lock IS NULL LIMIT 1;
SELECT * FROM EmailQueue WHERE Lock=someGuid;
In this way, you let the database server deal with the concurrency. The UPDATE query tells the DB to assign one email in the queue (one that is currently unassigned) to the current instance. You then SELECT the locked email and send it. Once sent, delete the email from the queue (or handle sent email however you like), and repeat the process until the queue is empty.
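A hedged sketch of that claim-and-send loop is shown below; it assumes the MySQL connector (the LIMIT syntax above is MySQL-flavored), and the Recipient/Body columns are invented for illustration:

using System;
using MySql.Data.MySqlClient;

class EmailQueueWorker
{
    private readonly string _connectionString;

    public EmailQueueWorker(string connectionString) => _connectionString = connectionString;

    public void ProcessQueue()
    {
        using var conn = new MySqlConnection(_connectionString);
        conn.Open();

        while (true)
        {
            var myLock = Guid.NewGuid().ToString();

            // Claim one unassigned email for this instance.
            using (var claim = new MySqlCommand(
                "UPDATE EmailQueue SET `Lock` = @lock WHERE `Lock` IS NULL LIMIT 1", conn))
            {
                claim.Parameters.AddWithValue("@lock", myLock);
                if (claim.ExecuteNonQuery() == 0)
                    break;                               // queue is empty
            }

            // Read back the email we just claimed (Recipient/Body are invented columns).
            string recipient = null, body = null;
            using (var read = new MySqlCommand(
                "SELECT Recipient, Body FROM EmailQueue WHERE `Lock` = @lock", conn))
            {
                read.Parameters.AddWithValue("@lock", myLock);
                using var reader = read.ExecuteReader();
                if (reader.Read())
                {
                    recipient = reader.GetString(0);
                    body = reader.GetString(1);
                }
            }

            if (recipient != null)
                Send(recipient, body);                   // actual SMTP send elided

            // Remove the sent email from the queue.
            using var del = new MySqlCommand(
                "DELETE FROM EmailQueue WHERE `Lock` = @lock", conn);
            del.Parameters.AddWithValue("@lock", myLock);
            del.ExecuteNonQuery();
        }
    }

    private void Send(string recipient, string body) { /* ... */ }
}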
Now you can scale in two directions:
By running the same job on multiple threads concurrently.
By virtue of the fact this is running on multiple machines, you're effectively load balancing your send work across all your servers.
Because of the locking mechanism, you can guarantee that each email in the queue gets sent only once, even though multiple threads on multiple machines are all running the same code.
In response to comments: there are a few differences in the implementation I ended up with.
First, my ASP.NET application can notify the service that there are new emails in the queue. This means that I don't even have to run on a schedule; I can simply tell the service when to start work. However, this kind of notification mechanism is very difficult to get right in a distributed environment, so simply checking the queue every minute or so should be fine.
The interval you go with really depends on the time sensitivity of your email delivery. If emails need to be delivered ASAP, you might need to trigger every 30 seconds or even less. If it's not so urgent, you can check every 5 minutes. Quartz limits the number of jobs executing at once (configurable), and you can configure what should happen if a trigger is missed, so you don't have to worry about having hundreds of jobs backing up.
Second, I actually grab a lock on 5 emails at a time to reduce the query load on the DB server. I deal with high volumes, so this helped efficiency (fewer network round trips between the service and the DB). The thing to watch out for here is what happens if a node goes down (for whatever reason, from an exception to the machine itself crashing) in the middle of sending a group of emails. You'll end up with "locked" rows in the DB and nothing servicing them. The larger the group, the bigger this risk. Also, an idle node obviously can't work on anything if all remaining emails are locked.
As far as thread safety, I mean it in the general sense. Quartz maintains a thread pool, so you don't have to worry about actually managing the threads themselves.
You do have to be careful about what the code in your job accesses. As a rule of thumb, local variables should be fine. However, if you access anything outside the scope of your function, thread safety is a real concern. For example:
class EmailSender : IJob
{
    static int counter = 0;

    public void Execute(IJobExecutionContext context)
    {
        counter++; // BAD!
    }
}
This code is not thread-safe because multiple threads may try to access counter at the same time.
Thread A            Thread B
Execute()
                    Execute()
Get counter (0)
                    Get counter (0)
Increment (1)
                    Increment (1)
Store value
                    Store value
counter = 1
counter should be 2, but instead we have an extremely hard to debug race condition. Next time this code runs, it might happen this way:
Thread A            Thread B
Execute()
                    Execute()
Get counter (0)
Increment (1)
Store value
                    Get counter (1)
                    Increment (2)
                    Store value
counter = 2
...and you're left scratching your head why it worked this time.
In your particular case, as long as you create a new database connection in each invocation of Execute and don't access any global data structures, you should be fine.
You'll have to be more specific about your architecture. Where is the email queue: in memory or in a database? If it's in a database, you could have a flag column named "processing"; when a task grabs an email from the queue, it only grabs emails that are not currently being processed and sets the processing flag to true for the emails it grabs. You then leave the concurrency woes to the database.

Load balancing with shared priority queues

I am trying to implement a load balancer at the moment and have hit a bit of a speed bump. The situation is as follows (simplified),
I have a queue of requests queue_a which are processed by worker_a
There is a second queue of requests queue_b which are processed by worker_b
And I have a third queue of requests queue_c that can go to either of the workers
The reason for this kind of setup is that each worker has unique requests that only it can process, but there are also general requests that anyone can process.
I was going to implement this basically using 3 instances of the C5 IntervalHeap. Each worker would have access to its local queue + the shared queues that it is a part of (e.g., worker_a could see queue_a & queue_c).
The problem with this idea is that if there is a request in the local queue and a request in the shared queue(s) with the same priority, it's impossible to know which one should be processed first (the IntervalHeap is normally first-come-first-served in this case).
EDIT: I have discovered that IntervalHeap appears not to be first-come-first-served for requests with the same priority!
I would like to minimise locking across the queues, as the system will have relatively high throughput and is time-sensitive. The only alternative I can think of at the moment involves a lot more complexity: the third queue is removed and shared requests are placed into both queue_a and queue_b. When such a request is picked up, the worker would know it is a shared request and would have to remove it from the other queue.
Hope that explains it clearly enough!
It seems that you'll simply end up pushing the bubble around - no matter how you arrange it, in the worst case you'll have three things of equal priority to execute and only two workers. What sort of tie-breaking criteria could you apply beyond priority in order to choose which queue to pull the next task from?
Here are two ideas:
Pick the queue at random. All priorities are equal so it shouldn't matter which one is chosen. On average in the worst case, all queues will be serviced at roughly the same rate.
Minimize queue length by taking from the queue that has the largest number of elements. This might cause some starvation of other queues if one queue's fill rate is consistently higher than others.
HTH
Your workers can share the same pool of resources as well as their own private queue. If there is one item available in worker 1's private queue and one item available in the shared queue, it would be a shame if worker 1 picked up the shared item first, since that limits parallelism. Rather, you want worker 1 to pick up the private item first. This, however, leads to new caveats, one being that while worker 1 and worker 2 are both busy handling private items, older shared items will not be picked up.
Finding a solution that addresses these problems while also keeping the complexity down is very difficult. A simple implementation is to handle shared items only when the private queue is empty. This does not address the case where priorities are not handled correctly in high-load scenarios (e.g., the shared queue never gets handled because the private queues are always full). To balance this, you might want to handle the private queue first only if the other worker's private queue is empty. This is still not a perfect solution, since it still prefers private items over shared items. Addressing that in turn can be achieved by setting up multiple strategies, but that brings even more complexity.
It all depends on your requirements.
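As a rough illustration of the "private items first, then fall back to shared" strategy, here is a minimal sketch using plain ConcurrentQueue<T> instead of C5's IntervalHeap, so priorities are omitted for brevity and all names are illustrative:

using System;
using System.Collections.Concurrent;
using System.Threading;

class Worker
{
    private readonly ConcurrentQueue<Action> _privateQueue;
    private readonly ConcurrentQueue<Action> _sharedQueue;

    public Worker(ConcurrentQueue<Action> privateQueue, ConcurrentQueue<Action> sharedQueue)
    {
        _privateQueue = privateQueue;
        _sharedQueue = sharedQueue;
    }

    public void Run(CancellationToken token)
    {
        while (!token.IsCancellationRequested)
        {
            // Prefer the private queue so the other worker stays free for shared work,
            // then fall back to the shared queue.
            if (_privateQueue.TryDequeue(out var request) || _sharedQueue.TryDequeue(out request))
            {
                request();
            }
            else
            {
                Thread.Sleep(10);    // nothing to do; back off briefly
            }
        }
    }
}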

Options to use multithreading to process a group of database records?

I have a database table that contains records to be processed. The table has a flag column representing the following status values: 1 - ready to be processed, 2 - successfully processed, 3 - processing failed.
The .NET code (a repeating process - console/service) will grab a list of records that are ready to be processed, loop through them, attempt to process each one (not very lengthy), and update the status based on success or failure.
To get better performance, I want to enable multithreading for this process. I'm thinking of spawning, say, 6 threads, each grabbing a subset.
Obviously I want to avoid having different threads process the same records. I don't want a "being processed" flag in the database, to avoid the case where a thread crashes and leaves the record hanging.
The only way I see of doing this is to grab the complete list of available records and assign a group (maybe of IDs) to each thread. If an individual thread fails, its unprocessed records will be picked up the next time the process runs.
Are there any other alternatives to dividing the records into groups before assigning them to threads?
The most straightforward way to implement this requirement is to use the Task Parallel Library's Parallel.ForEach (or Parallel.For), and allow it to manage the individual worker threads.
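A minimal sketch of that approach follows; GetReadyRecords, ProcessRecord, and MarkProcessed are placeholders standing in for your own data access code:

using System;
using System.Collections.Generic;
using System.Threading.Tasks;

class BatchProcessor
{
    public void Run()
    {
        var records = GetReadyRecords();                        // records with status 1 (ready)

        Parallel.ForEach(records,
            new ParallelOptions { MaxDegreeOfParallelism = 6 }, // roughly "6 threads"
            record =>
            {
                try
                {
                    ProcessRecord(record);
                    MarkProcessed(record, success: true);       // status = 2
                }
                catch (Exception)
                {
                    MarkProcessed(record, success: false);      // status = 3
                }
            });
    }

    private IEnumerable<Record> GetReadyRecords() { yield break; /* query the DB */ }
    private void ProcessRecord(Record record) { /* do the work */ }
    private void MarkProcessed(Record record, bool success) { /* update the status column */ }
}

class Record { public int Id { get; set; } }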
From experience, I would recommend the following:
Have an additional status "Processing"
Have a column in the database that indicates when a record was picked up for processing, and a cleanup task/process that runs periodically looking for records that have been "Processing" for far too long (and resets their status to "ready for processing").
Even though you don't want it, a "being processed" status will be essential for crash recovery scenarios (unless you can tolerate the same record being processed twice).
Alternatively
Consider using a transactional queue (MSMQ or RabbitMQ come to mind). They are optimized for this very problem.
That would be my clear choice, having done both at massive scale.
Optimizing
If it takes a non-trivial amount of time to retrieve data from the database, you can consider a Producer/Consumer pattern, which is quite straightforward to implement with a BlockingCollection. That pattern allows one thread (producer) to populate a queue with DB records to be processed, and multiple other threads (consumers) to process items off of that queue.
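A minimal sketch of that Producer/Consumer setup, with invented LoadReadyRecordIds/ProcessRecord helpers, might look like this:

using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class ProducerConsumerExample
{
    static void Main()
    {
        using var queue = new BlockingCollection<int>(boundedCapacity: 100);

        // Producer: load the IDs of records that are ready for processing.
        var producer = Task.Run(() =>
        {
            foreach (var id in LoadReadyRecordIds())   // e.g. SELECT Id ... WHERE Status = 1
                queue.Add(id);
            queue.CompleteAdding();                    // tell consumers there is no more work
        });

        // Consumers: several tasks drain the queue until it is completed and empty.
        var consumers = new Task[4];
        for (int i = 0; i < consumers.Length; i++)
        {
            consumers[i] = Task.Run(() =>
            {
                foreach (var id in queue.GetConsumingEnumerable())
                    ProcessRecord(id);
            });
        }

        producer.Wait();
        Task.WaitAll(consumers);
    }

    static IEnumerable<int> LoadReadyRecordIds() { yield break; /* query the DB */ }
    static void ProcessRecord(int id) { /* process and update the status */ }
}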
A New Alternative
Given that several processing steps touch the record before it is considered complete, have a look at Windows Workflow Foundation as a possible alternative.
I remember doing something like what you described... A thread checks from time to time whether there is something new in the database that needs to be processed. It loads only the new IDs, so if at time x the last ID read is 1000, at x+1 it will read from ID 1001.
Everything it reads goes into a thread-safe queue. When items are added to this queue, you notify the worker threads (maybe using AutoResetEvents, or by spawning threads here). Each thread reads from this thread-safe queue one item at a time, until the queue is empty.
You should not assign the work to each thread up front (unless you know that each item takes the same amount of time to process). If a thread finishes its work, it should take over some of the load left for the others. Using this thread-safe queue, you make sure of that.
Here is one approach that does not rely on an additional database column (but see #4) or mandate an in-process queue. The premise of this approach is to "shard" records across workers based on some consistent value, much like a distributed cache.
Here are my assumptions:
Re-processing does not cause unwanted side-effects; at most some work "is wasted".
The number of threads is fixed upon start-up. This is not a requirement, but it does simplify the implementation and allows me to skip transitory details in the simple description below.
There is only one "worker process" (but see #1) controlling the "worker threads". This simplifies dealing with how the records are split between workers.
There is some [immutable] "ID" column which is "well distributed". This is required so each worker gets about the same amount of work.
Work can be done "out of order" as long as it is "eventually done". Also, workers might not always run "at 100%" due to each one effectively working on a different queue.
Assign each thread a unique bucket value from [0, thread_count). If a thread dies or is restarted, it will take the same bucket it vacated.
Then, each time a thread needs a new record, it will fetch from the database:
SELECT *
FROM record
WHERE state = 'unprocessed'
AND (id % $thread_count) = $bucket
ORDER BY date
There could of course be variations, such as reading "this thread's tasks" in batches and storing them locally. A local queue, however, would be per thread (and thus re-loaded upon a new thread startup), so it would only contain records associated with the given bucket.
When the thread finishes processing a record, it should mark the record as processed using the appropriate isolation level and/or optimistic concurrency, and then proceed to the next record.
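Putting the sharded fetch into C# might look roughly like the sketch below (assuming SQL Server and the record table from the query above; everything else, including the class and parameter names, is illustrative):

using System;
using Microsoft.Data.SqlClient;

class ShardedWorker
{
    private readonly string _connectionString;
    private readonly int _threadCount;
    private readonly int _bucket;      // fixed per worker: a value in [0, threadCount)

    public ShardedWorker(string connectionString, int threadCount, int bucket)
    {
        _connectionString = connectionString;
        _threadCount = threadCount;
        _bucket = bucket;
    }

    // Fetch the next unprocessed record id that belongs to this worker's bucket.
    public int? FetchNextRecordId()
    {
        using var conn = new SqlConnection(_connectionString);
        conn.Open();
        using var cmd = new SqlCommand(
            @"SELECT TOP (1) id
              FROM record
              WHERE state = 'unprocessed'
                AND (id % @threadCount) = @bucket
              ORDER BY date", conn);
        cmd.Parameters.AddWithValue("@threadCount", _threadCount);
        cmd.Parameters.AddWithValue("@bucket", _bucket);
        var result = cmd.ExecuteScalar();
        return result == null || result is DBNull ? (int?)null : Convert.ToInt32(result);
    }
}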
