a pattern for packing incoming parallel requests into one


Suppose we have many threads arriving at random times and accessing the same resource in parallel. To access the resource, a thread needs to acquire a lock. If we could pack N incoming threads into one request, resource usage would be N times more efficient. We also need to answer each individual request as fast as possible. What is the best way or pattern to do that in C#?
Currently I have something like this:
//batches lock
var ilock = ModifyBatch.GetTableDeleteBatchLock(table_info.Name);
lock (ilock)
{
// put the request into requests batch
if (!ModifyBatch._delete_batch.ContainsKey(table_info.Name))
{
ModifyBatch._delete_batch[table_info.Name] = new DeleteData() { Callbacks = new List<Action<string>>(), ids = ids };
}
else
{
ModifyBatch._delete_batch[table_info.Name].ids.UnionWith(ids);
}
//this callback will get called once the job is done by a thread that will acquire resource lock
ModifyBatch._delete_batch[table_info.Name].Callbacks.Add(f =>
{
done = true;
error = f;
});
}
bool lockAcquired = false;
int maxWaitMs = 60000;
DeleteData _delete_data = null;
//resource lock
var _write_lock = GetTableWriteLock(typeof(T).Name);
try
{
DateTime start = DateTime.Now;
while (!done)
{
lockAcquired = Monitor.TryEnter(_write_lock, 100);
if (lockAcquired)
{
if (done) //some other thread did our job
{
Monitor.Exit(_write_lock);
lockAcquired = false;
break;
}
else
{
break;
}
}
Thread.Sleep(100);
if ((DateTime.Now - start).TotalMilliseconds > maxWaitMs)
{
throw new Exception("Waited too long to acquire write lock?");
}
}
if (done) //some other thread did our job
{
if (!string.IsNullOrEmpty(error))
{
throw new Exception(error);
}
else
{
return;
}
}
//not done, but have write lock for the table
lock (ilock)
{
_delete_data = ModifyBatch._delete_batch[table_info.Name];
var oval = new DeleteData();
ModifyBatch._delete_batch.TryRemove(table_info.Name, out oval);
}
if (_delete_data.ids.Any())
{
//doing the work with resource
}
foreach (var cb in _delete_data.Callbacks)
{
cb(null);
}
}
catch (Exception ex)
{
if (_delete_data != null)
{
foreach (var cb in _delete_data.Callbacks)
{
cb(ex.Message);
}
}
throw;
}
finally
{
if (lockAcquired)
{
Monitor.Exit(_write_lock);
}
}

If it is OK to process the task outside the scope of the current request, i.e. to queue it for later, then you can think of a sequence like this [1]:
1. Implement a resource lock (monitor) and a List of tasks.
2. For each request:
3. Lock the List, add the current task to the List, remember the number of tasks in the List, unlock the List.
4. Try to acquire the lock. If unsuccessful:
   If the number of tasks in the List < threshold X, then return.
   Else acquire the lock (will block).
5. Lock the List, move its contents to a temp list, unlock the List.
6. If the temp list is not empty:
   Execute the tasks in the temp list.
   Repeat from step 5.
7. Release the lock.
The first request will go through the whole sequence. Subsequent requests, if the first is still executing, will short-circuit at step 4.
Tune for the optimal threshold X (or change it to a time-based threshold).
[1] If you need to wait for the task in the scope of the request, then you need to extend the process slightly:
Add two fields to the Task class: a completion flag and an exception.
At step 4, before returning, wait for the task to complete (Monitor.Wait) until its completion flag becomes true. If the exception is not null, throw it.
At step 6, for each task, set the completion flag and optionally the exception, and then notify the waiters (Monitor.PulseAll).
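To make the waiting case concrete, here is a minimal sketch of a simplified variant, not the exact sequence above: every request blocks on the resource lock instead of returning early at step 4, so Monitor.Wait and the threshold X are not needed. A thread that gets the lock either finds its request already handled by an earlier batch or drains the list and executes everything queued so far. The names BatchRequest, RequestBatcher, and executeBatch are hypothetical and only serve the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

// Hypothetical request type. Done and Error are the two extra fields
// (completion flag and exception) mentioned in the footnote.
public sealed class BatchRequest
{
    public readonly int Id;
    public bool Done;
    public Exception Error;
    public BatchRequest(int id) { Id = id; }
}

public sealed class RequestBatcher
{
    private readonly object _listLock = new object();      // guards _pending
    private readonly object _resourceLock = new object();  // guards the shared resource
    private List<BatchRequest> _pending = new List<BatchRequest>();
    private readonly Action<IReadOnlyList<BatchRequest>> _executeBatch;

    public RequestBatcher(Action<IReadOnlyList<BatchRequest>> executeBatch)
    {
        _executeBatch = executeBatch;
    }

    // Blocks until the request has been processed, by this thread or by another one.
    public void Submit(BatchRequest request)
    {
        lock (_listLock) { _pending.Add(request); }

        lock (_resourceLock)
        {
            if (!request.Done)
            {
                // Nobody has handled our request yet, so it is still in the list:
                // take everything that is queued right now as one batch.
                List<BatchRequest> batch;
                lock (_listLock)
                {
                    batch = _pending;
                    _pending = new List<BatchRequest>();
                }

                Exception failure = null;
                try { _executeBatch(batch); }
                catch (Exception ex) { failure = ex; }

                // Mark every request of the batch while still holding the resource lock,
                // so waiting threads see the result as soon as they get in.
                foreach (BatchRequest r in batch)
                {
                    r.Error = failure;
                    r.Done = true;
                }
            }
        }

        if (request.Error != null)
            throw new InvalidOperationException("Batch failed", request.Error);
    }
}
```

Each caller simply runs batcher.Submit(new BatchRequest(id)). The trade-off of this variant is that waiting threads queue up on the resource lock rather than on a per-task monitor; the batching effect is the same, because only the first thread in after a batch finishes does any work.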

Related

How to use Consul to perform synchronize task on one machine at a time?

I have a system with 10 machines where I need to perform a certain task on each machine, one machine at a time. Basically, only one machine should be doing the task at any given moment. We already use Consul for other purposes, so I was wondering whether we can use Consul for this as well.
I read more about it, and it looks like we can use leader election with Consul: each machine tries to acquire a lock, does the work, and then releases the lock. Once the lock is released, another machine acquires it and does the same work. This way the work is serialized, one machine at a time.
I decided to use the C# PlayFab ConsulDotNet library, which appears to have this capability built in, but I am open to a better option if there is one. The Action method below is called on every machine at almost the same time through a watcher mechanism.
private void Action() {
// Try to acquire lock using Consul.
// If lock acquired then DoTheWork() otherwise keep waiting for it until lock is acquired.
// Once work is done, release the lock
// so that some other machine can acquire the lock and do the same work.
}
Inside that method I need to do the following:
Try to acquire the lock. If the lock cannot be acquired, wait for it, since another machine might have grabbed it first.
If the lock is acquired, then DoTheWork().
Once the work is done, release the lock so that another machine can acquire it and do the same work.
The idea is that all 10 machines should run DoTheWork() one at a time. Based on this blog and this blog I decided to modify their example to fit our needs.
Below is my LeaderElectionService class:
public class LeaderElectionService
{
public LeaderElectionService(string leadershipLockKey)
{
this.key = leadershipLockKey;
}
public event EventHandler<LeaderChangedEventArgs> LeaderChanged;
string key;
CancellationTokenSource cts = new CancellationTokenSource();
Timer timer;
bool lastIsHeld = false;
IDistributedLock distributedLock;
public void Start()
{
timer = new Timer(async (object state) => await TryAcquireLock((CancellationToken)state), cts.Token, 0, Timeout.Infinite);
}
private async Task TryAcquireLock(CancellationToken token)
{
if (token.IsCancellationRequested)
return;
try
{
if (distributedLock == null)
{
var clientConfig = new ConsulClientConfiguration { Address = new Uri("http://consul.host.domain.com") };
ConsulClient client = new ConsulClient(clientConfig);
distributedLock = await client.AcquireLock(new LockOptions(key) { LockTryOnce = true, LockWaitTime = TimeSpan.FromSeconds(3) }, token).ConfigureAwait(false);
}
else
{
if (!distributedLock.IsHeld)
{
await distributedLock.Acquire(token).ConfigureAwait(false);
}
}
}
catch (LockMaxAttemptsReachedException ex)
{
//this is expected if it couldn't acquire the lock within the first attempt.
Console.WriteLine(ex.StackTrace);
}
catch (Exception ex)
{
Console.WriteLine(ex.StackTrace);
}
finally
{
bool lockHeld = distributedLock?.IsHeld == true;
HandleLockStatusChange(lockHeld);
//Retrigger the timer after a 10 seconds delay (in this example). Delay for 7s if not held as the AcquireLock call will block for ~3s in every failed attempt.
timer.Change(lockHeld ? 10000 : 7000, Timeout.Infinite);
}
}
protected virtual void HandleLockStatusChange(bool isHeldNew)
{
// Is this the right way to check and do the work here?
// In general I want to call method "DoTheWork" in "Action" method itself
// And then release and destroy the session once work is done.
if (isHeldNew)
{
// DoTheWork();
Console.WriteLine("Hello");
// And then were should I release the lock so that other machine can try to grab it?
// distributedLock.Release();
// distributedLock.Destroy();
}
if (lastIsHeld == isHeldNew)
return;
else
{
lastIsHeld = isHeldNew;
}
if (LeaderChanged != null)
{
LeaderChangedEventArgs args = new LeaderChangedEventArgs(lastIsHeld);
foreach (EventHandler<LeaderChangedEventArgs> handler in LeaderChanged.GetInvocationList())
{
try
{
handler(this, args);
}
catch (Exception ex)
{
Console.WriteLine(ex.StackTrace);
}
}
}
}
}
And below is my LeaderChangedEventArgs class:
public class LeaderChangedEventArgs : EventArgs
{
private bool isLeader;
public LeaderChangedEventArgs(bool isHeld)
{
isLeader = isHeld;
}
public bool IsLeader { get { return isLeader; } }
}
The code above has a lot of pieces that might not be needed for my use case, but the idea is the same.
Problem Statement
In my Action method I would like to use the class above and perform the task as soon as the lock is acquired, otherwise keep waiting for the lock. Once the work is done, I want to release the lock and destroy the session so that another machine can grab it and do the work. I am somewhat confused about how to use the class properly in the method below.
private void Action() {
LeaderElectionService electionService = new LeaderElectionService("data/process");
// electionService.LeaderChanged += (source, arguments) => Console.WriteLine(arguments.IsLeader ? "Leader" : "Slave");
electionService.Start();
// now how do I wait for the lock to be acquired here indefinitely
// And once lock is acquired, do the work and then release and destroy the session
// so that other machine can grab the lock and do the work
}
I recently started working with C#, which is why I am somewhat confused about how to make this work efficiently in production using Consul and this library.
Update
I tried the code below as per your suggestion (I think I had tried this earlier as well), but as soon as execution reaches the line await distributedLock.Acquire(cancellationToken);, it returns to the Main method. It never reaches my "Doing Some Work!" printout. Does CreateLock actually work? I expect it to create data/lock in Consul (since it is not there yet), then try to acquire the lock on it and, if acquired, do the work and then release it for the other machines.
private static CancellationTokenSource cts = new CancellationTokenSource();
public static void Main(string[] args)
{
Action(cts.Token);
Console.WriteLine("Hello World");
}
private static async Task Action(CancellationToken cancellationToken)
{
const string keyName = "data/lock";
var clientConfig = new ConsulClientConfiguration { Address = new Uri("http://consul.test.host.com") };
ConsulClient client = new ConsulClient(clientConfig);
var distributedLock = client.CreateLock(keyName);
while (true)
{
try
{
// Try to acquire lock
// As soon as it comes to this line,
// it just goes back to main method automatically. not sure why
await distributedLock.Acquire(cancellationToken);
// Lock is acquired
// DoTheWork();
Console.WriteLine("Doing Some Work!");
// Work is done. Jump out of loop to release the lock
break;
}
catch (LockHeldException)
{
// Cannot acquire the lock. Wait a while then retry
await Task.Delay(TimeSpan.FromSeconds(10), cancellationToken);
}
catch (Exception)
{
// TODO: Handle exception thrown by DoTheWork method
// Here we jump out of the loop to release the lock
// But you can try to acquire the lock again based on your requirements
break;
}
}
// Release and destroy the lock
// So that other machine can grab the lock and do the work
await distributedLock.Release(cancellationToken);
await distributedLock.Destroy(cancellationToken);
}

IMO, the LeaderElectionService from those blogs is overkill in your case.
Update 1
There is no need for a while loop because:
ConsulClient is a local variable.
There is no need to check the IsHeld property.
Acquire will block indefinitely unless you either set LockTryOnce = true in LockOptions or pass a CancellationToken that has a timeout.
Side note: it is not necessary to invoke the Destroy method after you call Release on the distributed lock (reference).
private async Task Action(CancellationToken cancellationToken)
{
const string keyName = "YOUR_KEY";
var client = new ConsulClient();
var distributedLock = client.CreateLock(keyName);
try
{
// Try to acquire lock
// NOTE:
// Acquire method will block indefinitely unless
// 1. Set LockTryOnce = true in LockOptions
// 2. Pass a timeout to cancellation token
await distributedLock.Acquire(cancellationToken);
// Lock is acquired
DoTheWork();
}
catch (Exception)
{
// TODO: Handle exception thrown by DoTheWork method
}
// Release the lock (not necessary to invoke Destroy method),
// so that other machine can grab the lock and do the work
await distributedLock.Release(cancellationToken);
}
Update 2
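The reason the OP's code returns straight to the Main method is that the Action method is not awaited: the call returns an incomplete Task at the first await, and Main carries on without waiting for it. If you use C# 7.1 or later you can declare an async Main and await the Action method.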
public static async Task Main(string[] args)
{
await Action(cts.Token);
Console.WriteLine("Hello World");
}
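For the "pass a timeout to the cancellation token" option from Update 1, a minimal sketch of the general .NET mechanism might look like this. It deliberately does not call the Consul library: BlockUntilCancelledAsync is a hypothetical stand-in for a call that blocks until its token is cancelled, and how the Consul client surfaces a cancelled Acquire is not something this sketch asserts.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimeoutTokenDemo
{
    // Hypothetical stand-in for a call that blocks until its token is cancelled
    // (not the Consul API).
    public static Task BlockUntilCancelledAsync(CancellationToken token)
    {
        return Task.Delay(Timeout.Infinite, token);
    }

    // Returns true if the call finished, false if the timeout cancelled it first.
    public static async Task<bool> TryWithTimeoutAsync(
        Func<CancellationToken, Task> call, TimeSpan timeout)
    {
        // This CancellationTokenSource cancels itself once the timeout elapses.
        using (var cts = new CancellationTokenSource(timeout))
        {
            try
            {
                await call(cts.Token);
                return true;
            }
            catch (OperationCanceledException)
            {
                return false;
            }
        }
    }
}
```

With the answer's method this would be used roughly as await TimeoutTokenDemo.TryWithTimeoutAsync(Action, TimeSpan.FromSeconds(30)), assuming the lock call reports cancellation through an OperationCanceledException.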

Ensure a long running task is only fired once and subsequent request are queued but with only one entry in the queue

I have a compute intensive method Calculate that may run for a few seconds, requests come from multiple threads.
Only one Calculate should be executing at a time; a subsequent request should be queued until the initial request completes. If a request is already queued, then any further request can be discarded (as the queued request will be sufficient).
There seem to be lots of potential solutions, but I just need the simplest.
UPDATE: Here's my rudimentary attempt:
private int _queueStatus;
private readonly object _queueStatusSync = new Object();
public void Calculate()
{
lock(_queueStatusSync)
{
if(_queueStatus == 2) return;
_queueStatus++;
if(_queueStatus == 2) return;
}
for(;;)
{
CalculateImpl();
lock(_queueStatusSync)
if(--_queueStatus == 0) return;
}
}
private void CalculateImpl()
{
// long running process will take a few seconds...
}

The simplest, cleanest solution IMO is using TPL Dataflow (as always) with a BufferBlock acting as the queue. BufferBlock is thread-safe, supports async-await, and more important, has TryReceiveAll to get all the items at once. It also has OutputAvailableAsync so you can wait asynchronously for items to be posted to the buffer. When multiple requests are posted you simply take the last and forget about the rest:
var buffer = new BufferBlock<Request>();
var task = Task.Run(async () =>
{
while (await buffer.OutputAvailableAsync())
{
IList<Request> requests;
buffer.TryReceiveAll(out requests);
Calculate(requests.Last());
}
});
Usage:
buffer.Post(new Request());
buffer.Post(new Request());
Edit: If you don't have any input or output for the Calculate method you can simply use a boolean to act as a switch. If it's true you can turn it off and calculate, if it became true again while Calculate was running then calculate again:
public bool _shouldCalculate;
public void Producer()
{
_shouldCalculate = true;
}
public async Task Consumer()
{
while (true)
{
if (!_shouldCalculate)
{
await Task.Delay(1000);
}
else
{
_shouldCalculate = false;
Calculate();
}
}
}
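The switch idea can also be written without a polling loop. The following is a sketch under the assumption that Calculate takes no input and that a requesting thread is allowed to run the calculation itself; CoalescingRunner and its members are hypothetical names. One flag records that a calculation is running, a second one records that at least one more request arrived in the meantime.

```csharp
using System;

public sealed class CoalescingRunner
{
    private readonly object _sync = new object();
    private readonly Action _calculate;
    private bool _running;   // a calculation is in progress
    private bool _pending;   // at least one request arrived while it was running

    public CoalescingRunner(Action calculate) { _calculate = calculate; }

    public void Request()
    {
        lock (_sync)
        {
            if (_running)
            {
                _pending = true;   // remember one follow-up run, drop the rest
                return;
            }
            _running = true;
        }

        try
        {
            while (true)
            {
                _calculate();
                lock (_sync)
                {
                    if (!_pending)
                    {
                        _running = false;
                        return;
                    }
                    _pending = false;   // consume the queued request and run again
                }
            }
        }
        catch
        {
            // Do not leave the runner stuck in the "running" state after a failure.
            lock (_sync) { _running = false; _pending = false; }
            throw;
        }
    }
}
```

Callers just invoke runner.Request(). The thread that finds the runner idle executes the calculation, including the single queued follow-up, while all other callers return immediately.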

A BlockingCollection that only takes 1 at a time
The trick is to skip if there are any items in the collection
I would go with the answer from I3aron +1
This is (maybe) a BlockingCollection solution
public static void BC_AddTakeCompleteAdding()
{
using (BlockingCollection<int> bc = new BlockingCollection<int>(1))
{
// Spin up a Task to populate the BlockingCollection
using (Task t1 = Task.Factory.StartNew(() =>
{
for (int i = 0; i < 100; i++)
{
if (bc.TryAdd(i))
{
Debug.WriteLine(" add " + i.ToString());
}
else
{
Debug.WriteLine(" skip " + i.ToString());
}
Thread.Sleep(30);
}
bc.CompleteAdding();
}))
{
// Spin up a Task to consume the BlockingCollection
using (Task t2 = Task.Factory.StartNew(() =>
{
try
{
// Consume the BlockingCollection
while (true)
{
Debug.WriteLine("take " + bc.Take());
Thread.Sleep(100);
}
}
catch (InvalidOperationException)
{
// An InvalidOperationException means that Take() was called on a completed collection
Console.WriteLine("That's All!");
}
}))
Task.WaitAll(t1, t2);
}
}
}

It sounds like a classic producer-consumer problem. I'd recommend looking into BlockingCollection<T>. It is part of the System.Collections.Concurrent namespace. On top of that you can implement your queuing logic.
You can give a BlockingCollection<T> any IProducerConsumerCollection<T> as its internal store, such as a ConcurrentBag<T> or a ConcurrentQueue<T>. The latter is the default.

Make sure ProcessingQueue.Count correct in multiple threading application

I have a Windows service that processes XML files held in a linked-list queue. Files are added to the queue by a FileSystemWatcher event when they are created.
namespace XMLFTP
{
public class XML_Processor : ServiceBase
{
public string s_folder { get; set; }
public XML_Processor(string folder)
{
s_folder = folder;
}
Thread worker;
FileSystemWatcher watcher;
DirectoryInfo my_Folder;
public static AutoResetEvent ResetEvent { get; set; }
bool running;
public bool Start()
{
my_Folder = new DirectoryInfo(s_folder);
bool success = true;
running = true;
worker = new Thread(new ThreadStart(ServiceLoop));
worker.Start();
// add files to queue by FileSystemWatcher event
return (success);
}
public bool Stop()
{
try
{
running = false;
watcher.EnableRaisingEvents = false;
worker.Join(ServiceSettings.ThreadJoinTimeOut);
}
catch (Exception ex)
{
return (false);
}
return (true);
}
public void ServiceLoop()
{
string fileName;
while (running)
{
Thread.Sleep(2000);
if (ProcessingQueue.Count > 0)
{
// process file and write info to DB.
}
}
}
void watcher_Created(object sender, FileSystemEventArgs e)
{
switch (e.ChangeType)
{
case WatcherChangeTypes.Created:// add files to queue
}
}
}
}
There might be a thread-safety problem:
while (running)
{
Thread.Sleep(2000);
if (ProcessingQueue.Count > 0)
{
// process file and write info to DB.
}
}
Because access to ProcessingQueue.Count isn't protected by a lock, the count can change if a different thread alters the "queue" between the check and the processing. As a result, the file-processing part may fail. That is also the case if you implement the Count property as:
public static int Count
{
get { lock (syncRoot) return _files.Count; }
}
because the lock is released too early.
My two questions:
How can I make sure ProcessingQueue.Count is correct?
If I use the BlockingCollection class from .NET Framework 4.5, the sample code is:
class ConsumingEnumerableDemo
{
// Demonstrates:
// BlockingCollection<T>.Add()
// BlockingCollection<T>.CompleteAdding()
// BlockingCollection<T>.GetConsumingEnumerable()
public static void BC_GetConsumingEnumerable()
{
using (BlockingCollection<int> bc = new BlockingCollection<int>())
{
// Kick off a producer task
Task.Factory.StartNew(() =>
{
for (int i = 0; i < 10; i++)
{
bc.Add(i);
Thread.Sleep(100); // sleep 100 ms between adds
}
// Need to do this to keep foreach below from hanging
bc.CompleteAdding();
});
// Now consume the blocking collection with foreach.
// Use bc.GetConsumingEnumerable() instead of just bc because the
// former will block waiting for completion and the latter will
// simply take a snapshot of the current state of the underlying collection.
foreach (var item in bc.GetConsumingEnumerable())
{
Console.WriteLine(item);
}
}
}
}
The sample uses a constant 10 as the loop bound. How do I apply the dynamic count of my queue to it?

With BlockingCollection, you don't have to know the count. The consumer knows to keep processing items until the queue is empty and IsCompleted is true. So you could have this:
var producer = Task.Factory.StartNew(() =>
{
// Add 10 items to the queue
foreach (var i in Enumerable.Range(0, 10))
queue.Add(i);
// Wait one minute
Thread.Sleep(TimeSpan.FromMinutes(1.0));
// Add 10 more items to the queue
foreach (var i in Enumerable.Range(10, 10))
queue.Add(i);
// mark the queue as complete for adding
queue.CompleteAdding();
});
// consumer
foreach (var item in queue.GetConsumingEnumerable())
{
Console.WriteLine(item);
}
The consumer will output the first 10 items, which empties the queue. But because the producer hasn't called CompleteAdding, the consumer will continue to block on the queue. It will catch the next 10 items that the producer writes. Then, the queue is empty and IsCompleted == true, so the consumer ends (GetConsumingEnumerable gets to the end of the queue).
You can check Count at any time you like, but the value you get is just a snapshot. By the time you evaluate it, it's likely that either the producer or the consumer will have modified the queue and changed the count. But it shouldn't matter. As long as you don't call CompleteAdding, the consumer will continue to wait for an item.
The number of items that the producer writes doesn't have to be constant. For example in my Simple Multithreading blog post, I show a producer that reads a file and writes the items to a BlockingCollection that's serviced by a consumer. The producer and consumer run concurrently, and everything goes until the producer reaches the end of the file.
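Applied to the file-processing service from the question, the idea could be sketched as follows. FileQueueWorker, Enqueue, and processFile are hypothetical names; the assumption is that the FileSystemWatcher.Created handler calls Enqueue(e.FullPath) and that stopping the service disposes the worker.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class FileQueueWorker : IDisposable
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private readonly Thread _worker;

    public FileQueueWorker(Action<string> processFile)
    {
        _worker = new Thread(() =>
        {
            // Blocks while the queue is empty. The loop ends once CompleteAdding()
            // has been called and the remaining items have been consumed.
            foreach (string path in _queue.GetConsumingEnumerable())
            {
                processFile(path);
            }
        });
        _worker.Start();
    }

    // Intended to be called from the FileSystemWatcher.Created handler,
    // e.g. Enqueue(e.FullPath). Safe to call from any thread.
    public void Enqueue(string path)
    {
        _queue.Add(path);
    }

    // Stops accepting files and waits until the queued ones have been processed.
    public void Dispose()
    {
        _queue.CompleteAdding();
        _worker.Join();
        _queue.Dispose();
    }
}
```

No Count check and no Thread.Sleep polling are involved: the consumer wakes up when an item arrives, so the count never has to be "correct" at any particular moment.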

How to handle this (db) queue race condition?

Basically I have multiple threads that add data to a queue stored in SQLite. Another single thread pulls the entries and processes them one at a time (processing several at once takes too many resources). The processing thread does this:
pull data from DB
foreach { process }
if count == 0 { thread.suspend() } (woken by thread.resume())
repeat
My worker thread does:
Validates data
Inserts into DB
Calls Queue.Poke(QueueName)
When I poke it, if the thread is suspended I .resume() it.
What I am worried about is this: the processing thread sees count == 0, my worker inserts and pokes, and then the processing thread continues past the if and suspends. It won't realize there is something new in the DB.
How should I write this so that I won't have a race condition?

Processing thread:
event.Reset
pull data from DB
foreach { process }
if count == 0 then event.Wait
repeat
And the other thread:
Validates data
Inserts into DB
event.Set()
You'll have extra wakes (wake on an empty queue, nothing to process, go back to sleep) but you won't have missed inserts.
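In C# the two loops above could look roughly like this. It is a sketch with stated assumptions: a ConcurrentQueue<string> stands in for the SQLite table, ManualResetEventSlim plays the role of the event, and PokedQueue, Insert, and Run are hypothetical names. The important detail is the order on the processing side: the event is reset before the data is pulled, so an insert that happens after the pull leaves the event set and the following Wait returns immediately.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public sealed class PokedQueue
{
    // Stand-in for the SQLite table (an assumption of this sketch).
    private readonly ConcurrentQueue<string> _table = new ConcurrentQueue<string>();
    private readonly ManualResetEventSlim _poke = new ManualResetEventSlim(false);

    // Worker side: insert first, then signal.
    public void Insert(string row)
    {
        _table.Enqueue(row);
        _poke.Set();
    }

    // Processing side: runs until the token is cancelled.
    public void Run(Action<string> process, CancellationToken token)
    {
        try
        {
            while (!token.IsCancellationRequested)
            {
                _poke.Reset();                       // 1. reset BEFORE pulling

                var rows = new List<string>();       // 2. pull data
                string row;
                while (_table.TryDequeue(out row)) rows.Add(row);

                foreach (string r in rows) process(r);   // 3. process

                if (rows.Count == 0) _poke.Wait(token);  // 4. sleep only if nothing was found
            }
        }
        catch (OperationCanceledException)
        {
            // Normal shutdown while waiting for a poke.
        }
    }
}
```

As the answer says, the price is an occasional wake-up on an empty queue, which is harmless.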

I think this may be the structure you need.
private readonly Queue<object> _Queue = new Queue<object>();
private readonly object _Lock = new object();
void FillQueue()
{
while (true)
{
var dbData = new { Found = true, Data = new object() };
if (dbData.Found)
{
lock (_Lock)
{
_Queue.Enqueue(dbData.Data);
}
}
// If you have multiple threads filling the queue you
// probably want to throttle it a little bit so the thread
// processing the queue won't be throttled.
// If 1ms is too long consider using
// TimeSpan.FromTicks(1000).
Thread.Sleep(1);
}
}
void ProcessQueue()
{
object data = null;
while (true)
{
lock (_Lock)
{
data = _Queue.Count > 0 ? _Queue.Dequeue() : null;
}
if (data == null)
{
Thread.Sleep(1);
}
else
{
// Process
}
}
}

Producer Consumer With AutoResetEvent

I'm trying to use the producer-consumer pattern to process and save some data. I'm using AutoResetEvent for signalling between the two threads. Here is the code I have.
Here is the producer function
public Results[] Evaluate()
{
processingComplete = false;
resultQueue.Clear();
for (int i = 0; i < data.Length; ++i)
{
if (saveThread.ThreadState == ThreadState.Unstarted)
saveThread.Start();
//-....
//Process data
//
lock (lockobject)
{
resultQueue.Enqueue(result);
}
signal.Set();
}
processingComplete = true;
}
And here is the consumer function
private void SaveResults()
{
Model dataAccess = new Model();
while (!processingComplete || resultQueue.Count > 0)
{
if (resultQueue.Count == 0)
signal.WaitOne();
ModelResults result;
lock (lockobject)
{
result = resultQueue.Dequeue();
}
dataAccess.Save(result);
}
SaveCompleteSignal.Set();
}
My issue is that resultQueue.Dequeue() sometimes throws an InvalidOperationException because the queue is empty. I'm not sure what I'm doing wrong. Shouldn't the signal.WaitOne() above it block while the queue is empty?

You have synchronization issues due to a lack of proper locking.
You should lock all of the queue access, including the count check.
In addition, using Thread.ThreadState in this manner is a "bad idea". From the MSDN docs for ThreadState:
"Thread state is only of interest in debugging scenarios. Your code should never use thread state to synchronize the activities of threads."
You can't rely on this as a means of handling synchronization. You should redesign to make sure the thread will be started before it's used. If it's not started, just don't initialize it. (You can always use a null check - if the thread's null, create it and start it).

You check the Queue's Count outside of a synchronized context. Since Queue is not thread-safe, this can be a problem (for example, while an Enqueue is in progress, Count may already return 1 although no item can be dequeued yet), and it would go seriously wrong if you were to use more than one consumer anyway.
You may want to read the threading articles written by Joseph Albahari, he has also a good sample for your problem as well as a "better" solution without OS synchronization objects.

You have to put lock() around all references to the queue. You also have some issues around identifying processing complete (at the end of the queue you'll get a signal but the queue will be empty).
public Results[] Evaluate()
{
processingComplete = false;
lock(lockobject)
{
resultQueue.Clear();
}
for (int i = 0; i < data.Length; ++i)
{
if (saveThread.ThreadState == ThreadState.Unstarted)
saveThread.Start();
//-....
//Process data
//
lock (lockobject)
{
resultQueue.Enqueue(result);
}
signal.Set();
}
processingComplete = true;
}
private void SaveResults()
{
Model dataAccess = new Model();
while (true)
{
int count;
lock(lockobject)
{
count = resultQueue.Count;
}
if (count == 0)
signal.WaitOne();
lock(lockobject)
{
count = resultQueue.Count;
}
// we got a signal, but queue is empty, processing is complete
if (count == 0)
break;
ModelResults result;
lock (lockobject)
{
result = resultQueue.Dequeue();
}
dataAccess.Save(result);
}
SaveCompleteSignal.Set();
}
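One thing to keep in mind with AutoResetEvent is that the event stays signaled if Set is called while nobody is waiting, so a later WaitOne can return immediately even though the queue has already been emptied. An alternative that keeps the "is there an item?" check and the wait under the same lock is Monitor.Wait with Monitor.Pulse. The sketch below is one way to do that, not the code from any of the answers; ResultQueue, Complete, and TryDequeue are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public sealed class ResultQueue<T>
{
    private readonly Queue<T> _items = new Queue<T>();
    private readonly object _lock = new object();
    private bool _completed;

    // Producer: add an item and wake one waiting consumer.
    public void Enqueue(T item)
    {
        lock (_lock)
        {
            _items.Enqueue(item);
            Monitor.Pulse(_lock);
        }
    }

    // Producer: no more items will arrive; wake every waiting consumer.
    public void Complete()
    {
        lock (_lock)
        {
            _completed = true;
            Monitor.PulseAll(_lock);
        }
    }

    // Consumer: blocks until an item is available.
    // Returns false once the queue is empty and Complete() has been called.
    public bool TryDequeue(out T item)
    {
        lock (_lock)
        {
            // Monitor.Wait releases the lock while waiting and reacquires it
            // before returning, so the condition is always rechecked under the lock.
            while (_items.Count == 0 && !_completed)
                Monitor.Wait(_lock);

            if (_items.Count == 0)
            {
                item = default(T);
                return false;
            }

            item = _items.Dequeue();
            return true;
        }
    }
}
```

The consumer loop then becomes while (queue.TryDequeue(out result)) dataAccess.Save(result);, and the producer calls queue.Complete() after its last Enqueue. BlockingCollection<T> offers the same behavior ready-made.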
