Producer Consumer in C# with multiple (parallel) consumers and no TPL Dataflow

Producer Consumer in C# with multiple (parallel) consumers and no TPL Dataflow - c#

I am trying to implement producer/consumer pattern with multiple or parallel consumers.
I did an implementation but I would like to know how good it is. Can somebody do better? Can any of you spot any errors?
Unfortunately I can not use TPL dataflow, because we are at the end of our project and to put in an extra library in our package would take to much paperwork and we do not have that time.
What I am trying to do is to speed up the following portion:
anIntermediaryList = StepOne(anInputList); // I will put StepOne as Producer :-) Step one is remote call.
aResultList = StepTwo(anIntermediaryList); // I will put StepTwo as Consumer, however he also produces result. Step two is also a remote call.
// StepOne is way faster than StepTwo.
For this I came up with the idea that I will chunk the input list (anInputList)
StepOne will be inside of a Producer and will put the intermediary chunks into a queue.
There will be multiple Producers and they will take the intermediary results and process it with StepTwo.
Here is a simplified version of of the implementation later:
Task.Run(() => {
aChunkinputList = Split(anInputList)
foreach(aChunk in aChunkinputList)
{
anIntermediaryResult = StepOne(aChunk)
intermediaryQueue.Add(anIntermediaryResult)
}
})
while(intermediaryQueue.HasItems)
{
anItermediaryResult = intermediaryQueue.Dequeue()
Task.Run(() => {
aResultList = StepTwo(anItermediaryResult);
resultQueue.Add(aResultList)
}
}
I also thought that the best number for the parallel running Consumers would be: "Environment.ProcessorCount / 2". I would like to know if this also is a good idea.
Now here is my mock implementation and the question is can somebody do better or spot any error?
class Example
{
protected static readonly int ParameterCount_ = 1000;
protected static readonly int ChunkSize_ = 100;
// This might be a good number for the parallel consumers.
protected static readonly int ConsumerCount_ = Environment.ProcessorCount / 2;
protected Semaphore mySemaphore_ = new Semaphore(Example.ConsumerCount_, Example.ConsumerCount_);
protected ConcurrentQueue<List<int>> myIntermediaryQueue_ = new ConcurrentQueue<List<int>>();
protected ConcurrentQueue<List<int>> myResultQueue_ = new ConcurrentQueue<List<int>>();
public void Main()
{
List<int> aListToProcess = new List<int>(Example.ParameterCount_ + 1);
aListToProcess.AddRange(Enumerable.Range(0, Example.ParameterCount_));
Task aProducerTask = Task.Run(() => Producer(aListToProcess));
List<Task> aTaskList = new List<Task>();
while(!aProducerTask.IsCompleted || myIntermediaryQueue_.Count > 0)
{
List<int> aChunkToProcess;
if (myIntermediaryQueue_.TryDequeue(out aChunkToProcess))
{
mySemaphore_.WaitOne();
aTaskList.Add(Task.Run(() => Consumer(aChunkToProcess)));
}
}
Task.WaitAll(aTaskList.ToArray());
List<int> aResultList = new List<int>();
foreach(List<int> aChunk in myResultQueue_)
{
aResultList.AddRange(aChunk);
}
aResultList.Sort();
if (aListToProcess.SequenceEqual(aResultList))
{
Console.WriteLine("All good!");
}
else
{
Console.WriteLine("Bad, very bad!");
}
}
protected void Producer(List<int> elements_in)
{
List<List<int>> aChunkList = Example.SplitList(elements_in, Example.ChunkSize_);
foreach(List<int> aChunk in aChunkList)
{
Console.WriteLine("Thread Id: {0} Producing from: ({1}-{2})",
Thread.CurrentThread.ManagedThreadId,
aChunk.First(),
aChunk.Last());
myIntermediaryQueue_.Enqueue(ProduceItemsRemoteCall(aChunk));
}
}
protected void Consumer(List<int> elements_in)
{
Console.WriteLine("Thread Id: {0} Consuming from: ({1}-{2})",
Thread.CurrentThread.ManagedThreadId,
Convert.ToInt32(Math.Sqrt(elements_in.First())),
Convert.ToInt32(Math.Sqrt(elements_in.Last())));
myResultQueue_.Enqueue(ConsumeItemsRemoteCall(elements_in));
mySemaphore_.Release();
}
// Dummy Remote Call
protected List<int> ProduceItemsRemoteCall(List<int> elements_in)
{
return elements_in.Select(x => x * x).ToList();
}
// Dummy Remote Call
protected List<int> ConsumeItemsRemoteCall(List<int> elements_in)
{
return elements_in.Select(x => Convert.ToInt32(Math.Sqrt(x))).ToList();
}
public static List<List<int>> SplitList(List<int> masterList_in, int chunkSize_in)
{
List<List<int>> aReturnList = new List<List<int>>();
for (int i = 0; i < masterList_in.Count; i += chunkSize_in)
{
aReturnList.Add(masterList_in.GetRange(i, Math.Min(chunkSize_in, masterList_in.Count - i)));
}
return aReturnList;
}
}
Main function:
class Program
{
static void Main(string[] args)
{
Example anExample = new Example();
anExample.Main();
}
}
Bye
Laszlo

Based on the comments I've posted a second and third version:
https://codereview.stackexchange.com/questions/71182/producer-consumer-in-c-with-multiple-parallel-consumers-and-no-tpl-dataflow/71233#71233

Related

Multi Thread & Semaphore & Events

I am trying to execute some commands that come from RabbitMQ. Its about 5 msgs/sec. So as are too many msg, I have to send to a thread to execute, but I dont have so many threads, so I put a limit of 10.
so the ideia was that the msgs would come to the worker, put in a queue and any of the 10 threads would peak and execute. All these using semaphore.
After some experiments, I don´t know why, but my thread only executes 3 or 4 items, after that it just stops with no error...
The problem I think is the logic when the event calls the method to execute, could not think in a better way...
Why just the first 4 msgs are processed??
What pattern or better way to do this?
Here are some parts of my code:
const int MaxThreads = 10;
private static Semaphore sem = new Semaphore(MaxThreads, MaxThreads);
private static Queue<BasicDeliverEventArgs> queue = new Queue<BasicDeliverEventArgs>();
static void Main(string[] args)
{
consumer.Received += (sender, ea) =>
{
var m = JsonConvert.DeserializeObject<Mail>(ea.Body.GetString());
Console.WriteLine($"Sub-> {m.Subject}");
queue.Enqueue(ea);
RUN();
};
channel.BasicConsume(queueName, false, consumer);
Console.Read();
}
private static void RUN()
{
while (queue.Count > 0)
{
sem.WaitOne();
var item = queue.Dequeue();
ThreadPool.QueueUserWorkItem(sendmail, item);
}
}
private static void sendmail(Object item)
{
//.....soem processing stuff....
//tell rabbitMq that everything was OK
channel.BasicAck(deliveryTag: x.DeliveryTag, multiple: true);
//release thread
sem.Release();
}

I think that you could use a blocking collection here. It will simplify the code.
So your email sender would look something like that:
public class ParallelEmailSender : IDisposable
{
private readonly BlockingCollection<string> blockingCollection;
public ParallelEmailSender(int threadsCount)
{
blockingCollection = new BlockingCollection<string>(new ConcurrentQueue<string>());
for (int i = 0; i < threadsCount; i++)
{
Task.Factory.StartNew(SendInternal);
}
}
public void Send(string message)
{
blockingCollection.Add(message);
}
private void SendInternal()
{
foreach (string message in blockingCollection.GetConsumingEnumerable())
{
// send method
}
}
public void Dispose()
{
blockingCollection.CompleteAdding();
}
}
Of course you will need to add error catching logic and you could also improve the app shutting down process by using cancellation tokens.
I strongly suggest to read the great e-book about multithreading programming written by Joseph Albahari.

Data Propagation in TPL Dataflow Pipeline with Batchblock.Triggerbatch()

In my Producer-Consumer scenario, I have multiple consumers, and each of the consumers send an action to external hardware, which may take some time. My Pipeline looks somewhat like this:
BatchBlock --> TransformBlock --> BufferBlock --> (Several) ActionBlocks
I have assigned BoundedCapacity of my ActionBlocks to 1.
What I want in theory is, I want to trigger the Batchblock to send a group of items to the Transformblock only when one of my Actionblocks are available for operation. Till then the Batchblock should just keep buffering elements and not pass them on to the Transformblock. My batch-sizes are variable. As Batchsize is mandatory, I do have a really high upper-limit for BatchBlock batch size, however I really don't wish to reach upto that limit, I would like to trigger my batches depending upon the availability of the Actionblocks permforming the said task.
I have achieved this with the help of the Triggerbatch() method. I am calling the Batchblock.Triggerbatch() as the last action in my ActionBlock.However interestingly after several days of working properly the pipeline has come to a hault. Upon checking I found out that sometimes the inputs to the batchblock come in after the ActionBlocks are done with their work. In this case the ActionBlocks do actually call Triggerbatch at the end of their work, however since at this point there is no input to the Batchblock at all, the call to TriggerBatch is fruitless. And after a while when inputs do flow in to the Batchblock, there is no one left to call TriggerBatch and restart the Pipeline. I was looking for something where I could just check if something is infact present in the inputbuffer of the Batchblock, however there is no such feature available, I could also not find a way to check if the TriggerBatch was fruitful.
Could anyone suggest a possible solution to my problem. Unfortunately using a Timer to triggerbatches is not an option for me. Except for the start of the Pipeline, the throttling should be governed only by the availability of one of the ActionBlocks.
The example code is here:
static BatchBlock<int> _groupReadTags;
static void Main(string[] args)
{
_groupReadTags = new BatchBlock<int>(1000);
var bufferOptions = new DataflowBlockOptions{BoundedCapacity = 2};
BufferBlock<int> _frameBuffer = new BufferBlock<int>(bufferOptions);
var consumerOptions = new ExecutionDataflowBlockOptions { BoundedCapacity = 1};
int batchNo = 1;
TransformBlock<int[], int> _workingBlock = new TransformBlock<int[], int>(list =>
{
Console.WriteLine("\n\nWorking on Batch Number {0}", batchNo);
//_groupReadTags.TriggerBatch();
int sum = 0;
foreach (int item in list)
{
Console.WriteLine("Elements in batch {0} :: {1}", batchNo, item);
sum += item;
}
batchNo++;
return sum;
});
ActionBlock<int> _worker1 = new ActionBlock<int>(async x =>
{
Console.WriteLine("Number from ONE :{0}",x);
await Task.Delay(500);
Console.WriteLine("BatchBlock Output Count : {0}", _groupReadTags.OutputCount);
_groupReadTags.TriggerBatch();
},consumerOptions);
ActionBlock<int> _worker2 = new ActionBlock<int>(async x =>
{
Console.WriteLine("Number from TWO :{0}", x);
await Task.Delay(2000);
_groupReadTags.TriggerBatch();
}, consumerOptions);
_groupReadTags.LinkTo(_workingBlock);
_workingBlock.LinkTo(_frameBuffer);
_frameBuffer.LinkTo(_worker1);
_frameBuffer.LinkTo(_worker2);
_groupReadTags.Post(10);
_groupReadTags.Post(20);
_groupReadTags.TriggerBatch();
Task postingTask = new Task(() => PostStuff());
postingTask.Start();
Console.ReadLine();
}
static void PostStuff()
{
for (int i = 0; i < 10; i++)
{
_groupReadTags.Post(i);
Thread.Sleep(100);
}
Parallel.Invoke(
() => _groupReadTags.Post(100),
() => _groupReadTags.Post(200),
() => _groupReadTags.Post(300),
() => _groupReadTags.Post(400),
() => _groupReadTags.Post(500),
() => _groupReadTags.Post(600),
() => _groupReadTags.Post(700),
() => _groupReadTags.Post(800)
);
}

Here is an alternative BatchBlock implementation with some extra features. It includes a TriggerBatch method with this signature:
public int TriggerBatch(int nextMinBatchSizeIfEmpty);
Invoking this method will either trigger a batch immediately if the input queue is not empty, otherwise it will set a temporary MinBatchSize that will affect only the next batch. You could invoke this method with a small value for nextMinBatchSizeIfEmpty to ensure that in case a batch cannot be currently produced, the next batch will occur sooner than the configured BatchSize at the block's constructor.
This method returns the size of the batch produced. It returns 0 in case that the input queue is empty, or the output queue is full, or the block has completed.
public class BatchBlockEx<T> : ITargetBlock<T>, ISourceBlock<T[]>
{
private readonly ITargetBlock<T> _input;
private readonly IPropagatorBlock<T[], T[]> _output;
private readonly Queue<T> _queue;
private readonly object _locker = new object();
private int _nextMinBatchSize = Int32.MaxValue;
public Task Completion { get; }
public int InputCount { get { lock (_locker) return _queue.Count; } }
public int OutputCount => ((BufferBlock<T[]>)_output).Count;
public int BatchSize { get; }
public BatchBlockEx(int batchSize, DataflowBlockOptions dataflowBlockOptions = null)
{
if (batchSize < 1) throw new ArgumentOutOfRangeException(nameof(batchSize));
dataflowBlockOptions = dataflowBlockOptions ?? new DataflowBlockOptions();
if (dataflowBlockOptions.BoundedCapacity != DataflowBlockOptions.Unbounded &&
dataflowBlockOptions.BoundedCapacity < batchSize)
throw new ArgumentOutOfRangeException(nameof(batchSize),
"Number must be no greater than the value specified in BoundedCapacity.");
this.BatchSize = batchSize;
_output = new BufferBlock<T[]>(dataflowBlockOptions);
_queue = new Queue<T>(batchSize);
_input = new ActionBlock<T>(async item =>
{
T[] batch = null;
lock (_locker)
{
_queue.Enqueue(item);
if (_queue.Count == batchSize || _queue.Count >= _nextMinBatchSize)
{
batch = _queue.ToArray(); _queue.Clear();
_nextMinBatchSize = Int32.MaxValue;
}
}
if (batch != null) await _output.SendAsync(batch).ConfigureAwait(false);
}, new ExecutionDataflowBlockOptions()
{
BoundedCapacity = 1,
CancellationToken = dataflowBlockOptions.CancellationToken
});
var inputContinuation = _input.Completion.ContinueWith(async t =>
{
try
{
T[] batch = null;
lock (_locker)
{
if (_queue.Count > 0)
{
batch = _queue.ToArray(); _queue.Clear();
}
}
if (batch != null) await _output.SendAsync(batch).ConfigureAwait(false);
}
finally
{
if (t.IsFaulted)
{
_output.Fault(t.Exception.InnerException);
}
else
{
_output.Complete();
}
}
}, TaskScheduler.Default).Unwrap();
this.Completion = Task.WhenAll(inputContinuation, _output.Completion);
}
public void Complete() => _input.Complete();
void IDataflowBlock.Fault(Exception ex) => _input.Fault(ex);
public int TriggerBatch(Func<T[], bool> condition, int nextMinBatchSizeIfEmpty)
{
if (nextMinBatchSizeIfEmpty < 1)
throw new ArgumentOutOfRangeException(nameof(nextMinBatchSizeIfEmpty));
int count = 0;
lock (_locker)
{
if (_queue.Count > 0)
{
T[] batch = _queue.ToArray();
if (condition == null || condition(batch))
{
bool accepted = _output.Post(batch);
if (accepted) { _queue.Clear(); count = batch.Length; }
}
_nextMinBatchSize = Int32.MaxValue;
}
else
{
_nextMinBatchSize = nextMinBatchSizeIfEmpty;
}
}
return count;
}
public int TriggerBatch(Func<T[], bool> condition)
=> TriggerBatch(condition, Int32.MaxValue);
public int TriggerBatch(int nextMinBatchSizeIfEmpty)
=> TriggerBatch(null, nextMinBatchSizeIfEmpty);
public int TriggerBatch() => TriggerBatch(null, Int32.MaxValue);
DataflowMessageStatus ITargetBlock<T>.OfferMessage(
DataflowMessageHeader messageHeader, T messageValue,
ISourceBlock<T> source, bool consumeToAccept)
{
return _input.OfferMessage(messageHeader, messageValue, source,
consumeToAccept);
}
T[] ISourceBlock<T[]>.ConsumeMessage(DataflowMessageHeader messageHeader,
ITargetBlock<T[]> target, out bool messageConsumed)
{
return _output.ConsumeMessage(messageHeader, target, out messageConsumed);
}
bool ISourceBlock<T[]>.ReserveMessage(DataflowMessageHeader messageHeader,
ITargetBlock<T[]> target)
{
return _output.ReserveMessage(messageHeader, target);
}
void ISourceBlock<T[]>.ReleaseReservation(DataflowMessageHeader messageHeader,
ITargetBlock<T[]> target)
{
_output.ReleaseReservation(messageHeader, target);
}
IDisposable ISourceBlock<T[]>.LinkTo(ITargetBlock<T[]> target,
DataflowLinkOptions linkOptions)
{
return _output.LinkTo(target, linkOptions);
}
}
Another overload of the TriggerBatch method allows to examine the batch that can be currently produced, and decide if it should be triggered or not:
public int TriggerBatch(Func<T[], bool> condition);
The BatchBlockEx class does not support the Greedy and MaxNumberOfGroups options of the built-in BatchBlock.

I have found that using TriggerBatch in this way is unreliable:
_groupReadTags.Post(10);
_groupReadTags.Post(20);
_groupReadTags.TriggerBatch();
Apparently TriggerBatch is intended to be used inside the block, not outside it like this. I have seen this result in odd timing issues, like items from next batch batch being included in the current batch, even though TriggerBatch was called first.
Please see my answer to this question for an alternative using DataflowBlock.Encapsulate: BatchBlock produces batch with elements sent after TriggerBatch()

Using threads to parse multiple Html pages faster

Here's what I'm trying to do:
Get one html page from url which contains multiple links inside
Visit each link
Extract some data from visited link and create object using it
So far All i did is just simple and slow way:
public List<Link> searchLinks(string name)
{
List<Link> foundLinks = new List<Link>();
// getHtmlDocument() just returns HtmlDocument using input url.
HtmlDocument doc = getHtmlDocument(AU_SEARCH_URL + fixSpaces(name));
var link_list = doc.DocumentNode.SelectNodes(#"/html/body/div[#id='parent-container']/div[#id='main-content']/ol[#id='searchresult']/li/h2/a");
foreach (var link in link_list)
{
// TODO Threads
// getObject() creates object using data gathered
foundLinks.Add(getObject(link.InnerText, link.Attributes["href"].Value, getLatestEpisode(link.Attributes["href"].Value)));
}
return foundLinks;
}
To make it faster/efficient I need to implement threads, but I'm not sure how i should approach it, because I can't just randomly start threads, I need to wait for them to finish, thread.Join() kind of solves 'wait for threads to finish' problem, but it becomes not fast anymore i think, because threads will be launched after earlier one is finished.

The simplest way to offload the work to multiple threads would be to use Parallel.ForEach() in place of your current loop. Something like this:
Parallel.ForEach(link_list, link =>
{
foundLinks.Add(getObject(link.InnerText, link.Attributes["href"].Value, getLatestEpisode(link.Attributes["href"].Value)));
});
I'm not sure if there are other threading concerns in your overall code. (Note, for example, that this would no longer guarantee that the data would be added to foundLinks in the same order.) But as long as there's nothing explicitly preventing concurrent work from taking place then this would take advantage of threading over multiple CPU cores to process the work.

Maybe you should use Thread pool :
Example from MSDN :
using System;
using System.Threading;
public class Fibonacci
{
private int _n;
private int _fibOfN;
private ManualResetEvent _doneEvent;
public int N { get { return _n; } }
public int FibOfN { get { return _fibOfN; } }
// Constructor.
public Fibonacci(int n, ManualResetEvent doneEvent)
{
_n = n;
_doneEvent = doneEvent;
}
// Wrapper method for use with thread pool.
public void ThreadPoolCallback(Object threadContext)
{
int threadIndex = (int)threadContext;
Console.WriteLine("thread {0} started...", threadIndex);
_fibOfN = Calculate(_n);
Console.WriteLine("thread {0} result calculated...", threadIndex);
_doneEvent.Set();
}
// Recursive method that calculates the Nth Fibonacci number.
public int Calculate(int n)
{
if (n <= 1)
{
return n;
}
return Calculate(n - 1) + Calculate(n - 2);
}
}
public class ThreadPoolExample
{
static void Main()
{
const int FibonacciCalculations = 10;
// One event is used for each Fibonacci object.
ManualResetEvent[] doneEvents = new ManualResetEvent[FibonacciCalculations];
Fibonacci[] fibArray = new Fibonacci[FibonacciCalculations];
Random r = new Random();
// Configure and start threads using ThreadPool.
Console.WriteLine("launching {0} tasks...", FibonacciCalculations);
for (int i = 0; i < FibonacciCalculations; i++)
{
doneEvents[i] = new ManualResetEvent(false);
Fibonacci f = new Fibonacci(r.Next(20, 40), doneEvents[i]);
fibArray[i] = f;
ThreadPool.QueueUserWorkItem(f.ThreadPoolCallback, i);
}
// Wait for all threads in pool to calculate.
WaitHandle.WaitAll(doneEvents);
Console.WriteLine("All calculations are complete.");
// Display the results.
for (int i= 0; i<FibonacciCalculations; i++)
{
Fibonacci f = fibArray[i];
Console.WriteLine("Fibonacci({0}) = {1}", f.N, f.FibOfN);
}
}
}

TPL Dataflow Blocks

Question: Why using a WriteOnceBlock (or BufferBlock) for getting back the answer (like sort of callback) from another BufferBlock<Action> (getting back the answer happens in that posted Action) causes a deadlock (in this code)?
I thought that methods in a class can be considered as messages that we are sending to the object (like the original point of view about OOP that was proposed by - I think - Alan Kay). So I wrote this generic Actor class that helps to convert and ordinary object to an Actor (Of-course there are lots of unseen loopholes here because of mutability and things, but that's not the main concern here).
So we have these definitions:
public class Actor<T>
{
private readonly T _processor;
private readonly BufferBlock<Action<T>> _messageBox = new BufferBlock<Action<T>>();
public Actor(T processor)
{
_processor = processor;
Run();
}
public event Action<T> Send
{
add { _messageBox.Post(value); }
remove { }
}
private async void Run()
{
while (true)
{
var action = await _messageBox.ReceiveAsync();
action(_processor);
}
}
}
public interface IIdGenerator
{
long Next();
}
Now; why this code works:
static void Main(string[] args)
{
var idGenerator1 = new IdInt64();
var idServer1 = new Actor<IIdGenerator>(idGenerator1);
const int n = 1000;
for (var i = 0; i < n; i++)
{
var t = new Task(() =>
{
var answer = new WriteOnceBlock<long>(null);
Action<IIdGenerator> action = x =>
{
var buffer = x.Next();
answer.Post(buffer);
};
idServer1.Send += action;
Trace.WriteLine(answer.Receive());
}, TaskCreationOptions.LongRunning); // Runs on a separate new thread
t.Start();
}
Console.WriteLine("press any key you like! :)");
Console.ReadKey();
Trace.Flush();
}
And this code does not work:
static void Main(string[] args)
{
var idGenerator1 = new IdInt64();
var idServer1 = new Actor<IIdGenerator>(idGenerator1);
const int n = 1000;
for (var i = 0; i < n; i++)
{
var t = new Task(() =>
{
var answer = new WriteOnceBlock<long>(null);
Action<IIdGenerator> action = x =>
{
var buffer = x.Next();
answer.Post(buffer);
};
idServer1.Send += action;
Trace.WriteLine(answer.Receive());
}, TaskCreationOptions.PreferFairness); // Runs and is managed by Task Scheduler
t.Start();
}
Console.WriteLine("press any key you like! :)");
Console.ReadKey();
Trace.Flush();
}
Different TaskCreationOptions used here to create Tasks. Maybe I am wrong about TPL Dataflow concepts here, just started to use it (A [ThreadStatic] hidden somewhere?).

The problematic issue with your code is this part: answer.Receive().
When you move it inside the action the deadlock doesn't happen:
var t = new Task(() =>
{
var answer = new WriteOnceBlock<long>(null);
Action<IIdGenerator> action = x =>
{
var buffer = x.Next();
answer.Post(buffer);
Trace.WriteLine(answer.Receive());
};
idServer1.Send += action;
});
t.Start();
So why is that? answer.Receive();, as opposed to await answer.ReceiveAsnyc(); blocks the thread until an answer is returned. When you use TaskCreationOptions.LongRunning each task gets its own thread, so there's no problem, but without it (the TaskCreationOptions.PreferFairness is irrelevant) all the thread pool threads are busy waiting and so everything is much slower. It doesn't actually deadlock, as you can see when you use 15 instead of 1000.
There are other solutions that help understand the problem:
Increasing the thread pool with ThreadPool.SetMinThreads(1000, 0); before the original code.
Using ReceiveAsnyc:
Task.Run(async () =>
{
var answer = new WriteOnceBlock<long>(null);
Action<IIdGenerator> action = x =>
{
var buffer = x.Next();
answer.Post(buffer);
};
idServer1.Send += action;
Trace.WriteLine(await answer.ReceiveAsync());
});

Why this C# code throws SemaphoreFullException?

I have following code which throws SemaphoreFullException, I don't understand why ?
If I change _semaphore = new SemaphoreSlim(0, 2) to
_semaphore = new SemaphoreSlim(0, int.MaxValue)
then all works fine.
Can anyone please find fault with this code and explain to me.
class BlockingQueue<T>
{
private Queue<T> _queue = new Queue<T>();
private SemaphoreSlim _semaphore = new SemaphoreSlim(0, 2);
public void Enqueue(T data)
{
if (data == null) throw new ArgumentNullException("data");
lock (_queue)
{
_queue.Enqueue(data);
}
_semaphore.Release();
}
public T Dequeue()
{
_semaphore.Wait();
lock (_queue)
{
return _queue.Dequeue();
}
}
}
public class Test
{
private static BlockingQueue<string> _bq = new BlockingQueue<string>();
public static void Main()
{
for (int i = 0; i < 100; i++)
{
_bq.Enqueue("item-" + i);
}
for (int i = 0; i < 5; i++)
{
Thread t = new Thread(Produce);
t.Start();
}
for (int i = 0; i < 100; i++)
{
Thread t = new Thread(Consume);
t.Start();
}
Console.ReadLine();
}
private static Random _random = new Random();
private static void Produce()
{
while (true)
{
_bq.Enqueue("item-" + _random.Next());
Thread.Sleep(2000);
}
}
private static void Consume()
{
while (true)
{
Console.WriteLine("Consumed-" + _bq.Dequeue());
Thread.Sleep(1000);
}
}
}

If you want to use the semaphore to control the number of concurrent threads, you're using it wrong. You should acquire the semaphore when you dequeue an item, and release the semaphore when the thread is done processing that item.
What you have right now is a system that allows only two items to be in the queue at any one time. Initially, your semaphore has a count of 2. Each time you enqueue an item, the count is reduced. After two items, the count is 0 and if you try to release again you're going to get a semaphore full exception.
If you really want to do this with a semaphore, you need to remove the Release call from the Enqueue method. And add a Release method to the BlockingQueue class. You then would write:
private static void Consume()
{
while (true)
{
Console.WriteLine("Consumed-" + _bq.Dequeue());
Thread.Sleep(1000);
bq.Release();
}
}
That would make your code work, but it's not a very good solution. A much better solution would be to use BlockingCollection<T> and two persistent consumers. Something like:
private BlockingCollection<int> bq = new BlockingCollection<int>();
void Test()
{
// create two consumers
var c1 = new Thread(Consume);
var c2 = new Thread(Consume);
c1.Start();
c2.Start();
// produce
for (var i = 0; i < 100; ++i)
{
bq.Add(i);
}
bq.CompleteAdding();
c1.Join();
c2.Join();
}
void Consume()
{
foreach (var i in bq.GetConsumingEnumerable())
{
Console.WriteLine("Consumed-" + i);
Thread.Sleep(1000);
}
}
That gives you two persistent threads consuming the items. The benefit is that you avoid the cost of spinning up a new thread (or having the RTL assign a pool thread) for each item. Instead, the threads do non-busy waits on the queue. You also don't have to worry about explicit locking, etc. The code is simpler, more robust, and much less likely to contain a bug.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Producer Consumer in C# with multiple (parallel) consumers and no TPL Dataflow - c#

Based on the comments I've posted a second and third version: https://codereview.stackexchange.com/questions/71182/producer-consumer-in-c-with-multiple-parallel-consumers-and-no-tpl-dataflow/71233#71233

Related

Multi Thread & Semaphore & Events

Data Propagation in TPL Dataflow Pipeline with Batchblock.Triggerbatch()

Using threads to parse multiple Html pages faster

TPL Dataflow Blocks

Why this C# code throws SemaphoreFullException?

Categories

Resources