I am implementing a flow control component that limits the maximum number of requests that can be sent. Every worker thread can send either a single request or a batch of requests, but at any time the total number of pending requests must not exceed a maximum.
I initially wanted to implement this with a SemaphoreSlim:
initialize the semaphore to the maximum request count; when a worker thread is about to call the service, it must acquire that many tokens. However, I found that SemaphoreSlim and Semaphore only allow a thread to decrease the semaphore count by 1, whereas in my case I want to decrease the count by the number of requests that the worker thread carries.
What synchronization primitive should I use here?
Just to clarify: the service supports batch processing, so one thread can send N requests in one service call, and accordingly it should be able to decrease the semaphore's current count by N.
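To illustrate, the call pattern I'm after looks roughly like this (illustrative names: throttle is the primitive I'm looking for, with the Wait(int)/Release(int) methods I wish SemaphoreSlim had, and SendBatch stands in for the service call):

throttle.Wait(requests.Count);        // atomically acquire one token per request, or block
try
{
    SendBatch(requests);              // one service call carrying all N requests
}
finally
{
    throttle.Release(requests.Count); // give the tokens back when the responses arrive
}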
Below is a custom SemaphoreManyFifo class that offers the methods Wait(int acquireCount) and Release(int releaseCount). Its behavior is strictly FIFO, and its performance is quite decent (~500,000 operations per second with 8 threads on my PC).
public class SemaphoreManyFifo : IDisposable
{
    private readonly object _locker = new object();
    private readonly Queue<(ManualResetEventSlim, int AcquireCount)> _queue;
    private readonly ThreadLocal<ManualResetEventSlim> _pool;
    private readonly int _maxCount;
    private int _currentCount;

    public int CurrentCount => Volatile.Read(ref _currentCount);

    public SemaphoreManyFifo(int initialCount, int maxCount)
    {
        // Proper arguments validation omitted
        Debug.Assert(initialCount >= 0);
        Debug.Assert(maxCount > 0 && maxCount >= initialCount);
        _queue = new Queue<(ManualResetEventSlim, int)>();
        _pool = new ThreadLocal<ManualResetEventSlim>(
            () => new ManualResetEventSlim(false), trackAllValues: true);
        _currentCount = initialCount;
        _maxCount = maxCount;
    }

    public SemaphoreManyFifo(int initialCount) : this(initialCount, Int32.MaxValue) { }

    public void Wait(int acquireCount)
    {
        Debug.Assert(acquireCount > 0 && acquireCount <= _maxCount);
        ManualResetEventSlim gate;
        lock (_locker)
        {
            Debug.Assert(_currentCount >= 0 && _currentCount <= _maxCount);
            if (acquireCount <= _currentCount && _queue.Count == 0)
            {
                _currentCount -= acquireCount; return; // Fast path
            }
            gate = _pool.Value;
            gate.Reset(); // Important, because the gate is reused
            _queue.Enqueue((gate, acquireCount));
        }
        gate.Wait();
    }

    public void Release(int releaseCount)
    {
        Debug.Assert(releaseCount > 0);
        lock (_locker)
        {
            Debug.Assert(_currentCount >= 0 && _currentCount <= _maxCount);
            if (releaseCount > _maxCount - _currentCount)
                throw new SemaphoreFullException();
            _currentCount += releaseCount;
            // Satisfy waiters in arrival order, for as long as the count suffices.
            while (_queue.Count > 0 && _queue.Peek().AcquireCount <= _currentCount)
            {
                var (gate, acquireCount) = _queue.Dequeue();
                _currentCount -= acquireCount;
                gate.Set();
            }
        }
    }

    public void Dispose()
    {
        foreach (var gate in _pool.Values) gate.Dispose();
        _pool.Dispose();
    }
}
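For illustration, hypothetical usage that caps the pending requests at 100 across all worker threads (SendBatch stands in for the actual service call):

var throttle = new SemaphoreManyFifo(initialCount: 100, maxCount: 100);

// In each worker thread, for a batch of n requests:
throttle.Wait(n);                // blocks until n permits are available, in FIFO order
try { SendBatch(batch); }
finally { throttle.Release(n); } // return the permits once the responses arrive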
Adding support for timeout and cancellation in the above implementation is not trivial. It would require a different (updatable) data structure instead of the Queue<T>, for example a LinkedList<T>, whose nodes can be removed in O(1) when a waiter times out or is canceled.
The original Wait+Pulse implementation can be found in the 1st revision of this answer. It is simple, but it lacks the desirable FIFO behavior.
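For reference, a minimal sketch of that Wait+Pulse approach (reconstructed here, not the exact revision): it is correct but not fair, because a waiter that needs many permits can be overtaken indefinitely by waiters that need few.

public class SemaphoreMany
{
    private readonly object _locker = new object();
    private int _currentCount;

    public SemaphoreMany(int initialCount) { _currentCount = initialCount; }

    public void Wait(int acquireCount)
    {
        lock (_locker)
        {
            // Not FIFO: whichever waiter re-checks the count first wins.
            while (_currentCount < acquireCount)
                Monitor.Wait(_locker);
            _currentCount -= acquireCount;
        }
    }

    public void Release(int releaseCount)
    {
        lock (_locker)
        {
            _currentCount += releaseCount;
            Monitor.PulseAll(_locker); // wake all waiters to re-check the count
        }
    }
}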
It looks to me like you want something like this:
using System;
using System.Collections.Generic;
using System.Threading;

namespace Sema
{
    class Program
    {
        // do a little bit of timing magic
        static ManualResetEvent go = new ManualResetEvent(false);

        static void Main()
        {
            // limit the resources
            var s = new SemaphoreSlim(30, 30);

            // start up some threads
            var threads = new List<Thread>();
            for (int i = 0; i < 20; i++)
            {
                var start = new ParameterizedThreadStart(dowork);
                Thread t = new Thread(start);
                threads.Add(t);
                t.Start(s);
            }
            go.Set();

            // Wait until all threads finished
            foreach (Thread thread in threads)
            {
                thread.Join();
            }
            Console.WriteLine();
        }

        private static void dowork(object obj)
        {
            go.WaitOne();
            var s = (SemaphoreSlim)obj;
            var batchsize = 3;

            // acquire tokens
            for (int i = 0; i < batchsize; i++)
            {
                s.Wait();
            }

            // send the request
            Console.WriteLine("Working on a batch of size " + batchsize);
            Thread.Sleep(200);
            s.Release(batchsize);
        }
    }
}
However, you'll soon figure out that this causes deadlocks: two threads can each acquire part of their tokens and then wait forever for the remainder. You'll additionally need some synchronization on the semaphore to guarantee that one thread either gets all of its tokens or none.
var trylater = true;
while (trylater)
{
    lock (s)
    {
        // Only start taking tokens once all of them are available,
        // so the batch acquisition is all-or-nothing.
        if (s.CurrentCount >= batchsize)
        {
            for (int i = 0; i < batchsize; i++)
            {
                s.Wait();
            }
            trylater = false;
        }
    }
    if (trylater)
    {
        Thread.Sleep(20);
    }
}
Now, that's potentially suffering from starvation: a huge batch (say, 29) may never get enough resources while hundreds of single requests come and go.
I am trying to poll an API as fast and as efficiently as possible to get market data. The API allows you to get market data from batchSize markets per request, and it allows you to have 3 concurrent requests but no more (otherwise it throws errors).
I may be requesting data from many more than batchSize different markets.
I continuously loop through all of the markets, requesting the data in batches, one batch per thread and 3 threads at any time.
The total number of markets (and hence batches) can change at any time.
I'm using the following code:
private static object lockObj = new object();

private void PollMarkets()
{
    const int NumberOfConcurrentRequests = 3;
    for (int i = 0; i < NumberOfConcurrentRequests; i++)
    {
        int batch = 0;
        Task.Factory.StartNew(async () =>
        {
            while (true)
            {
                if (markets.Count > 0)
                {
                    List<string> batchMarketIds;
                    lock (lockObj)
                    {
                        var numBatches = (int)Math.Ceiling((double)markets.Count / batchSize);
                        batchMarketIds = markets.Keys.Skip(batch * batchSize).Take(batchSize).ToList();
                        batch = (batch + 1) % numBatches;
                    }
                    var marketData = await GetMarketData(batchMarketIds);
                    // Do something with marketData
                }
                else
                {
                    await Task.Delay(1000); // wait for some markets to be added.
                }
            }
        });
    }
}
Even though there is a lock around the critical section, each thread starts with batch = 0, and each thread is often polling for duplicate data.
If I change batch to a private volatile field, the above code works as I want it to (volatile and lock).
So for some reason my lock doesn't work? I feel like I'm missing something obvious.
I believe that it is best here to use a lock instead of a volatile field, is this also correct?
Thanks
The issue was that you were defining the batch variable inside the for loop. That meant that the threads were using their own variable instead of sharing it.
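A sketch of the corrected method, assuming the markets, batchSize and GetMarketData members from the question: batch is declared once, before the loop, so all three tasks share the single counter that lockObj guards.

private static object lockObj = new object();

private void PollMarkets()
{
    const int NumberOfConcurrentRequests = 3;
    int batch = 0; // shared by all polling tasks, guarded by lockObj
    for (int i = 0; i < NumberOfConcurrentRequests; i++)
    {
        Task.Factory.StartNew(async () =>
        {
            while (true)
            {
                if (markets.Count > 0)
                {
                    List<string> batchMarketIds;
                    lock (lockObj)
                    {
                        var numBatches = (int)Math.Ceiling((double)markets.Count / batchSize);
                        batchMarketIds = markets.Keys.Skip(batch * batchSize).Take(batchSize).ToList();
                        batch = (batch + 1) % numBatches; // advances the shared counter
                    }
                    var marketData = await GetMarketData(batchMarketIds);
                    // Do something with marketData
                }
                else
                {
                    await Task.Delay(1000); // wait for some markets to be added.
                }
            }
        });
    }
}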
In my mind, you should use a Queue<T> to create a jobs pipeline.
Something like this:
private int batchSize = 10;
private Queue<int> queue = new Queue<int>();

private void AddMarket(params int[] marketIDs)
{
    lock (queue)
    {
        foreach (var marketID in marketIDs)
        {
            queue.Enqueue(marketID);
        }
        if (queue.Count >= batchSize)
        {
            Monitor.Pulse(queue);
        }
    }
}

private void Start()
{
    for (var tid = 0; tid < 3; tid++)
    {
        Task.Run(async () =>
        {
            while (true)
            {
                List<int> toProcess;
                lock (queue)
                {
                    if (queue.Count < batchSize)
                    {
                        Monitor.Wait(queue);
                        continue;
                    }
                    toProcess = new List<int>(batchSize);
                    for (var count = 0; count < batchSize; count++)
                    {
                        toProcess.Add(queue.Dequeue());
                    }
                    if (queue.Count >= batchSize)
                    {
                        Monitor.Pulse(queue);
                    }
                }
                var marketData = await GetMarketData(toProcess);
            }
        });
    }
}
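For illustration, a hypothetical way to drive this pipeline (requires System.Linq for the Enumerable helper): start the consumers first, then feed in market IDs; every full batch of ten wakes one consumer.

Start();                                      // three consumers begin waiting
AddMarket(Enumerable.Range(1, 25).ToArray()); // two full batches dispatched; five IDs remain queued
AddMarket(26, 27, 28, 29, 30);                // completes a third batch of ten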
I have the following code, which throws SemaphoreFullException, and I don't understand why.
If I change _semaphore = new SemaphoreSlim(0, 2) to
_semaphore = new SemaphoreSlim(0, int.MaxValue)
then everything works fine.
Can anyone find the fault in this code and explain it to me?
class BlockingQueue<T>
{
    private Queue<T> _queue = new Queue<T>();
    private SemaphoreSlim _semaphore = new SemaphoreSlim(0, 2);

    public void Enqueue(T data)
    {
        if (data == null) throw new ArgumentNullException("data");
        lock (_queue)
        {
            _queue.Enqueue(data);
        }
        _semaphore.Release();
    }

    public T Dequeue()
    {
        _semaphore.Wait();
        lock (_queue)
        {
            return _queue.Dequeue();
        }
    }
}
public class Test
{
    private static BlockingQueue<string> _bq = new BlockingQueue<string>();

    public static void Main()
    {
        for (int i = 0; i < 100; i++)
        {
            _bq.Enqueue("item-" + i);
        }
        for (int i = 0; i < 5; i++)
        {
            Thread t = new Thread(Produce);
            t.Start();
        }
        for (int i = 0; i < 100; i++)
        {
            Thread t = new Thread(Consume);
            t.Start();
        }
        Console.ReadLine();
    }

    private static Random _random = new Random();

    private static void Produce()
    {
        while (true)
        {
            _bq.Enqueue("item-" + _random.Next());
            Thread.Sleep(2000);
        }
    }

    private static void Consume()
    {
        while (true)
        {
            Console.WriteLine("Consumed-" + _bq.Dequeue());
            Thread.Sleep(1000);
        }
    }
}
If you want to use the semaphore to control the number of concurrent threads, you're using it wrong. You should acquire the semaphore when you dequeue an item, and release the semaphore when the thread is done processing that item.
What you have right now is a system that allows only two items to be in the queue at any one time. Your semaphore starts with a count of 0 and a maximum of 2. Each time you enqueue an item, Release increments the count. After two items the count is at its maximum of 2, and the next Release throws SemaphoreFullException.
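The failure is easy to reproduce in isolation:

var s = new SemaphoreSlim(0, 2); // current count 0, maximum 2
s.Release();                     // count becomes 1
s.Release();                     // count becomes 2, the maximum
s.Release();                     // throws SemaphoreFullException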
If you really want to do this with a semaphore, you need to remove the Release call from the Enqueue method and add a Release method to the BlockingQueue class. You would then write:
private static void Consume()
{
    while (true)
    {
        Console.WriteLine("Consumed-" + _bq.Dequeue());
        Thread.Sleep(1000);
        _bq.Release();
    }
}
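Putting both changes together, here is a sketch of the restructured class. One assumption: the semaphore now starts full, new SemaphoreSlim(2, 2), so it throttles concurrent consumers instead of counting queued items. Note that Dequeue no longer blocks on an empty queue, which is part of why the alternative below is better.

class BlockingQueue<T>
{
    private readonly Queue<T> _queue = new Queue<T>();
    private readonly SemaphoreSlim _semaphore = new SemaphoreSlim(2, 2);

    public void Enqueue(T data)
    {
        if (data == null) throw new ArgumentNullException("data");
        lock (_queue)
        {
            _queue.Enqueue(data); // no Release here any more
        }
    }

    public T Dequeue()
    {
        _semaphore.Wait(); // at most two consumers may be processing at once
        lock (_queue)
        {
            return _queue.Dequeue();
        }
    }

    public void Release()
    {
        _semaphore.Release(); // called by the consumer when it is done processing
    }
}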
That would make your code work, but it's not a very good solution. A much better solution would be to use BlockingCollection<T> and two persistent consumers. Something like:
private BlockingCollection<int> bq = new BlockingCollection<int>();

void Test()
{
    // create two consumers
    var c1 = new Thread(Consume);
    var c2 = new Thread(Consume);
    c1.Start();
    c2.Start();

    // produce
    for (var i = 0; i < 100; ++i)
    {
        bq.Add(i);
    }
    bq.CompleteAdding();

    c1.Join();
    c2.Join();
}

void Consume()
{
    foreach (var i in bq.GetConsumingEnumerable())
    {
        Console.WriteLine("Consumed-" + i);
        Thread.Sleep(1000);
    }
}
That gives you two persistent threads consuming the items. The benefit is that you avoid the cost of spinning up a new thread (or having the runtime assign a pool thread) for each item. Instead, the threads do non-busy waits on the queue. You also don't have to worry about explicit locking, etc. The code is simpler, more robust, and much less likely to contain a bug.
For various reasons I have to stick to .NET 3.5, and I need the functionality of the Barrier class from .NET 4. I have a bunch of threads that do some work, and I want them to wait for each other until all are done. When all are done, I want them to do the job again and again in a similar manner.
Encouraged by the thread Difference between Barrier in C# 4.0 and WaitHandle in C# 3.0?, I decided to implement the Barrier functionality with the AutoResetEvent and WaitHandle classes.
However, I encounter a problem with my code:
class Program
{
    const int numOfThreads = 3;
    static AutoResetEvent[] barrier = new AutoResetEvent[numOfThreads];
    static Random random = new Random(System.DateTime.Now.Millisecond);

    static void barriers2(object barrierObj)
    {
        AutoResetEvent[] barrierLocal = (AutoResetEvent[])barrierObj;
        string name = Thread.CurrentThread.Name;
        for (int i = 0; i < 10; i++)
        {
            int sleepTime = random.Next(2000, 10000);
            System.Console.Out.WriteLine("Thread {0} at the 'barrier' will sleep for {1}.", name, sleepTime);
            Thread.Sleep(sleepTime);
            System.Console.Out.WriteLine("Thread {0} at the 'barrier' with time {1}.", name, sleepTime);
            int currentId = Convert.ToInt32(name);
            //for (int z = 0; z < numOfThreads; z++)
            barrierLocal[currentId].Set();
            WaitHandle.WaitAll(barrier);
            /*
            for (int k = 0; k < numOfThreads; k++)
            {
                if (k == currentId)
                {
                    continue;
                }
                System.Console.Out.WriteLine("Thread {0} is about to wait for the signal from thread: {1}", name, k);
                barrierLocal[k].WaitOne();
                System.Console.Out.WriteLine("Thread {0} is about to wait for the signal from thread: {1}. done", name, k);
            }
            */
        }
    }

    static void Main(string[] args)
    {
        for (int i = 0; i < numOfThreads; i++)
        {
            barrier[i] = new AutoResetEvent(false);
        }
        for (int i = 0; i < numOfThreads; i++)
        {
            Thread t = new Thread(Program.barriers2);
            t.Name = Convert.ToString(i);
            t.Start(barrier);
        }
    }
}
The output I receive is as follows:
Thread 0 at the 'barrier' will sleep for 7564
Thread 1 at the 'barrier' will sleep for 5123
Thread 2 at the 'barrier' will sleep for 4237
Thread 2 at the 'barrier' with time 4237
Thread 1 at the 'barrier' with time 5123
Thread 0 at the 'barrier' with time 7564
Thread 0 at the 'barrier' will sleep for 8641
Thread 0 at the 'barrier' with time 8641
And that's it. After the last line there is no more output and the app does not terminate. It looks like there is some sort of deadlock, but I cannot find the issue. Any help welcome.
Thanks!
That's because you use AutoResetEvent. One thread's WaitAll() call is going to complete first, and completing a WaitAll() automatically resets all of the AREs, which prevents the other threads from ever completing their own WaitAll() calls.
A ManualResetEvent is required here.
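For illustration, a minimal sketch of a reusable .NET 3.5 barrier built from a counter and two ManualResetEvents; the two alternating phases avoid the reuse race that the AutoResetEvent version runs into. The class name is illustrative.

public sealed class SimpleBarrier
{
    private readonly int _participants;
    private readonly ManualResetEvent[] _phaseEvents =
        { new ManualResetEvent(false), new ManualResetEvent(false) };
    private int _remaining;
    private int _phase;

    public SimpleBarrier(int participants)
    {
        _participants = participants;
        _remaining = participants;
    }

    public void SignalAndWait()
    {
        int phase = _phase;
        if (Interlocked.Decrement(ref _remaining) == 0)
        {
            // Last thread to arrive: prepare the next phase, then release everyone.
            _remaining = _participants;
            _phaseEvents[1 - phase].Reset(); // re-arm the other phase for reuse
            _phase = 1 - phase;
            _phaseEvents[phase].Set();       // wake the waiters of this phase
        }
        else
        {
            _phaseEvents[phase].WaitOne();
        }
    }
}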
Download the Reactive Extensions backport for .NET 3.5. You will find the Barrier class along with the other useful concurrent data structures and synchronization mechanisms that were released in .NET 4.0.
Here is the implementation I use for my XNA game. Barrier was not available when I wrote this, and I am still stuck with .NET 3.5. It requires three sets of ManualResetEvents and a counter array to keep the phase.
using System;
using System.Threading;

namespace Colin.Threading
{
    /// <summary>
    /// Threading primitive for "barrier" sync, where N threads must stop at certain points
    /// and wait for all their brethren before continuing.
    /// </summary>
    public sealed class NThreadGate
    {
        public int mNumThreads;
        private ManualResetEvent[] mEventsA;
        private ManualResetEvent[] mEventsB;
        private ManualResetEvent[] mEventsC;
        private ManualResetEvent[] mEventsBootStrap;
        private Object mLockObject;
        private int[] mCounter;
        private int mCurrentThreadIndex = 0;

        public NThreadGate(int numThreads)
        {
            this.mNumThreads = numThreads;
            this.mEventsA = new ManualResetEvent[this.mNumThreads];
            this.mEventsB = new ManualResetEvent[this.mNumThreads];
            this.mEventsC = new ManualResetEvent[this.mNumThreads];
            this.mEventsBootStrap = new ManualResetEvent[this.mNumThreads];
            this.mCounter = new int[this.mNumThreads];
            this.mLockObject = new Object();

            for (int i = 0; i < this.mNumThreads; i++)
            {
                this.mEventsA[i] = new ManualResetEvent(false);
                this.mEventsB[i] = new ManualResetEvent(false);
                this.mEventsC[i] = new ManualResetEvent(false);
                this.mEventsBootStrap[i] = new ManualResetEvent(false);
                this.mCounter[i] = 0;
            }
        }

        /// <summary>
        /// Adds a new thread to the gate system.
        /// </summary>
        /// <returns>Returns a thread ID for this thread, to be used later when waiting.</returns>
        public int AddThread()
        {
            lock (this.mLockObject)
            {
                this.mEventsBootStrap[this.mCurrentThreadIndex].Set();
                this.mCurrentThreadIndex++;
                return this.mCurrentThreadIndex - 1;
            }
        }

        /// <summary>
        /// Stop here and wait for all the other threads in the NThreadGate. When all the threads have arrived at this call, they
        /// will unblock and continue.
        /// </summary>
        /// <param name="myThreadID">The thread ID of the caller</param>
        public void WaitForOtherThreads(int myThreadID)
        {
            // Make sure all the threads are ready.
            WaitHandle.WaitAll(this.mEventsBootStrap);

            // Rotate between three phases.
            int phase = this.mCounter[myThreadID];
            if (phase == 0) // Flip
            {
                this.mEventsA[myThreadID].Set();
                WaitHandle.WaitAll(this.mEventsA);
                this.mEventsC[myThreadID].Reset();
            }
            else if (phase == 1) // Flop
            {
                this.mEventsB[myThreadID].Set();
                WaitHandle.WaitAll(this.mEventsB);
                this.mEventsA[myThreadID].Reset();
            }
            else // Floop
            {
                this.mEventsC[myThreadID].Set();
                WaitHandle.WaitAll(this.mEventsC);
                this.mEventsB[myThreadID].Reset();
                this.mCounter[myThreadID] = 0;
                return;
            }
            this.mCounter[myThreadID]++;
        }
    }
}
Setting up the thread gate:
private void SetupThreads()
{
    // Make an NThreadGate for N threads.
    this.mMyThreadGate = new NThreadGate(Environment.ProcessorCount);

    // Make some threads...
    // e.g. new Thread(new ThreadStart(this.DoWork));
}
Thread worker method:
private void DoWork()
{
    int localThreadID = this.mMyThreadGate.AddThread();
    while (this.WeAreStillRunning)
    {
        // Signal this thread as waiting at the barrier
        this.mMyThreadGate.WaitForOtherThreads(localThreadID);
        // Synchronized work here...

        // Signal this thread as waiting at the barrier
        this.mMyThreadGate.WaitForOtherThreads(localThreadID);
        // Synchronized work here...

        // Signal this thread as waiting at the barrier
        this.mMyThreadGate.WaitForOtherThreads(localThreadID);
    }
}
The threading-queue example from the book "Accelerated C# 2008" (the CrudeThreadPool class) does not work correctly. If I put a long job in WorkFunction(), on a 2-processor machine the next task doesn't run before the first is over. How can I solve this problem? I want to load the processor to 100 percent.
public class CrudeThreadPool
{
    static readonly int MAX_WORK_THREADS = 4;
    static readonly int WAIT_TIMEOUT = 2000;

    public delegate void WorkDelegate();

    public CrudeThreadPool()
    {
        stop = 0;
        workLock = new Object();
        workQueue = new Queue();
        threads = new Thread[MAX_WORK_THREADS];

        for (int i = 0; i < MAX_WORK_THREADS; ++i)
        {
            threads[i] = new Thread(new ThreadStart(this.ThreadFunc));
            threads[i].Start();
        }
    }

    private void ThreadFunc()
    {
        lock (workLock)
        {
            int shouldStop = 0;
            do
            {
                shouldStop = Interlocked.Exchange(ref stop, stop);
                if (shouldStop == 0)
                {
                    WorkDelegate workItem = null;
                    if (Monitor.Wait(workLock, WAIT_TIMEOUT))
                    {
                        // Process the item on the front of the queue
                        lock (workQueue)
                        {
                            workItem = (WorkDelegate)workQueue.Dequeue();
                        }
                        workItem();
                    }
                }
            } while (shouldStop == 0);
        }
    }

    public void SubmitWorkItem(WorkDelegate item)
    {
        lock (workLock)
        {
            lock (workQueue)
            {
                workQueue.Enqueue(item);
            }
            Monitor.Pulse(workLock);
        }
    }

    public void Shutdown()
    {
        Interlocked.Exchange(ref stop, 1);
    }

    private Queue workQueue;
    private Object workLock;
    private Thread[] threads;
    private int stop;
}
public class EntryPoint
{
    static void WorkFunction()
    {
        Console.WriteLine("WorkFunction() called on Thread {0}", Thread.CurrentThread.GetHashCode());
        // some long job
        double s = 0;
        for (int i = 0; i < 100000000; i++)
            s += Math.Sin(i);
    }

    static void Main()
    {
        CrudeThreadPool pool = new CrudeThreadPool();
        for (int i = 0; i < 10; ++i)
        {
            pool.SubmitWorkItem(
                new CrudeThreadPool.WorkDelegate(EntryPoint.WorkFunction));
        }
        pool.Shutdown();
    }
}
I can see 2 problems:
Inside ThreadFunc() you take lock (workLock) for the duration of the method, meaning your thread pool is no longer asynchronous.
In the Main() method, you shut down the thread pool without waiting for it to finish. Oddly enough, that is why it works at all right now: it stops each ThreadFunc after one loop.
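For illustration, a sketch of ThreadFunc with the first problem fixed: the item is dequeued under the lock but executed with no locks held, so all four pool threads can run jobs concurrently. (The premature Shutdown in Main still needs a separate fix, e.g. joining the threads.)

private void ThreadFunc()
{
    while (Interlocked.CompareExchange(ref stop, 0, 0) == 0)
    {
        WorkDelegate workItem = null;
        lock (workLock)
        {
            // Wait for work only if the queue is empty; on timeout, re-check the stop flag.
            if (workQueue.Count == 0 && !Monitor.Wait(workLock, WAIT_TIMEOUT))
                continue;
            if (workQueue.Count > 0)
                workItem = (WorkDelegate)workQueue.Dequeue();
        }
        if (workItem != null)
            workItem(); // execute outside the lock
    }
}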
It looks to me like ThreadFunc is executing the work item while still holding workLock, which is basically going to serialize all the work.
If at all possible, I suggest you start using the Parallel Extensions framework in .NET 4, which has obviously had rather more time spent on it. Otherwise, there's the existing thread pool in the framework, and there are other implementations around if you're willing to look. I have one in MiscUtil, although I haven't looked at that code for quite a while; it's pretty primitive.