How to correctly use BlockingCollection.GetConsumingEnumerable? - c#

I'm trying to implement a producer/consumer pattern using BlockingCollection<T> so I've written up a simple console application to test it.
public class Program
{
public static void Main(string[] args)
{
var workQueue = new WorkQueue();
workQueue.StartProducingItems();
workQueue.StartProcessingItems();
while (true)
{
}
}
}
public class WorkQueue
{
private BlockingCollection<int> _queue;
private static Random _random = new Random();
public WorkQueue()
{
_queue = new BlockingCollection<int>();
// Prefill some items.
for (int i = 0; i < 100; i++)
{
//_queue.Add(_random.Next());
}
}
public void StartProducingItems()
{
Task.Run(() =>
{
_queue.Add(_random.Next()); // Should be adding items to the queue constantly, but instead adds one and then nothing else.
});
}
public void StartProcessingItems()
{
Task.Run(() =>
{
foreach (var item in _queue.GetConsumingEnumerable())
{
Console.WriteLine("Worker 1: " + item);
}
});
Task.Run(() =>
{
foreach (var item in _queue.GetConsumingEnumerable())
{
Console.WriteLine("Worker 2: " + item);
}
});
}
}
However there are 3 problems with my design:
I don't know the correct way of blocking/waiting in my Main method. Spinning in an empty while loop seems terribly inefficient and wastes CPU simply for the sake of keeping the application from ending.
In this simple application the producer produces items indefinitely and should never stop. In a real-world setup I'd want it to end eventually (e.g. when it runs out of files to process). In that case, how should I wait for it to finish in the Main method? Make StartProducingItems async and then await it?
Either GetConsumingEnumerable or Add is not working as I expected. The producer should be adding items constantly, but it adds one item and then never adds any more. That one item is then processed by one of the consumers, and both consumers then block waiting for items that never come. I know of the Take method, but spinning on Take in a while loop seems just as wasteful and inefficient. There is a CompleteAdding method, but after calling it nothing else can ever be added (it throws an exception if you try), so that is not suitable.
I know for sure that both consumers are in fact blocking and waiting for new items, as I can switch between threads during debugging.
EDIT:
I've made the changes suggested in one of the comments, but the Task.WhenAll still returns right away.
public Task StartProcessingItems()
{
var consumers = new List<Task>();
for (int i = 0; i < 2; i++)
{
consumers.Add(Task.Run(() =>
{
foreach (var item in _queue.GetConsumingEnumerable())
{
Console.WriteLine($"Worker {i}: " + item);
}
}));
}
return Task.WhenAll(consumers.ToList());
}

GetConsumingEnumerable() is blocking. If you want to add to the queue constantly, you should put the call to _queue.Add in a loop:
public void StartProducingItems()
{
Task.Run(() =>
{
while (true)
_queue.Add(_random.Next());
});
}
Regarding the Main() method, you could call Console.ReadLine() to prevent the main thread from finishing before you have pressed a key:
public static void Main(string[] args)
{
var workQueue = new WorkQueue();
workQueue.StartProducingItems();
workQueue.StartProcessingItems();
Console.WriteLine("Press a key to terminate the application...");
Console.ReadLine();
}
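If the producer is finite (the "ran out of files" case from the question), one way to let Main wait for everything to drain is to have the producer call CompleteAdding() when it is done and then await the consumer tasks; GetConsumingEnumerable() ends once the collection is marked complete and empty. A minimal sketch, assuming the method signatures above may be changed (the workerId copy is only there to avoid capturing the loop variable):
public Task StartProducingItems()
{
    return Task.Run(() =>
    {
        for (int i = 0; i < 1000; i++)      // finite amount of work instead of an endless loop
            _queue.Add(_random.Next());
        _queue.CompleteAdding();            // lets GetConsumingEnumerable() finish
    });
}
public Task StartProcessingItems()
{
    var consumers = new List<Task>();
    for (int i = 0; i < 2; i++)
    {
        int workerId = i;                   // copy the loop variable before capturing it
        consumers.Add(Task.Run(() =>
        {
            foreach (var item in _queue.GetConsumingEnumerable())
                Console.WriteLine($"Worker {workerId}: " + item);
        }));
    }
    return Task.WhenAll(consumers);
}
With an async Main (C# 7.1 and later) you could then start both and wait for them, e.g. var producing = workQueue.StartProducingItems(); var consuming = workQueue.StartProcessingItems(); await Task.WhenAll(producing, consuming);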

Related

Thread safe with Linq and Tasks on a Collection

Given some code like so
public class CustomCollectionClass : Collection<CustomData> {}
public class CustomData
{
string name;
bool finished;
string result;
}
public async Task DoWorkInParallel(CustomCollectionClass collection)
{
// collection can be retrieved from a DB, may not exist.
if (collection == null)
{
collection = new CustomCollectionClass();
foreach (var data in myData)
{
collection.Add(new CustomData()
{
name = data.Name;
});
}
}
// This part doesn't feel safe. Not sure what to do here.
var processTasks = myData.Select(o =>
this.DoWorkOnItemInCollection(collection.Single(d => d.name = o.Name))).ToArray();
await Task.WhenAll(processTasks);
await SaveModifedCollection(collection);
}
public async Task DoWorkOnItemInCollection(CustomData data)
{
await DoABunchOfWorkElsewhere();
// This doesn't feel safe either. Lock here?
data.finished = true;
data.result = "Parallel";
}
As I noted in a couple comments inline, it doesn't feel safe for me to do the above, but I'm not sure. I do have a collection of elements that I'd like to assign a unique element to each parallel task and have those tasks be able to modify that single element of the collection based on what work is done. End result being, I wanted to save the collection after individual, different elements have been modified in parallel. If this isn't a safe way to do it, how best would I go about this?
Your code is the right way to do this, assuming starting DoABunchOfWorkElsewhere() multiple times is itself safe.
You don't need to worry about your LINQ query, because it doesn't actually run in parallel. All it does is invoke DoWorkOnItemInCollection() multiple times. Those invocations may run in parallel (or not, depending on your synchronization context and the implementation of DoABunchOfWorkElsewhere()), but the code you showed is safe.
Your code above should work without issue. You are passing off one item to each worker thread. I'm not so sure about the async keyword, though. You might just return a Task, and then in your method do:
public Task DoWorkOnItemInCollection(CustomData data)
{
return Task.Run(() => {
DoABunchOfWorkElsewhere().Wait();
data.finished = true;
data.result = "Parallel";
});
}
You might want to be careful: with a large number of items you could exceed your maximum thread count with background threads. In that case C# just deletes your threads, which can be difficult to debug later.
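One common way to cap how many items are processed at once is a SemaphoreSlim. This is only a sketch of that idea, not part of the original answer; the MaxParallel limit is an arbitrary assumption and DoWorkOnItemInCollection comes from the question's code:
const int MaxParallel = 8;                      // hypothetical limit
var throttle = new SemaphoreSlim(MaxParallel);
var tasks = collection.Select(async item =>
{
    await throttle.WaitAsync();                 // wait for a free slot
    try
    {
        await DoWorkOnItemInCollection(item);   // at most MaxParallel of these run concurrently
    }
    finally
    {
        throttle.Release();
    }
}).ToArray();
await Task.WhenAll(tasks);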
I have done this before. It might be easier, instead of handing the whole collection to some magic LINQ, to set it up as a classic consumer problem:
class ParallelWorker<T>
{
private Action<T> Action;
private Queue<T> Queue = new Queue<T>();
private object QueueLock = new object();
private void DoWork()
{
while(true)
{
T item;
lock(this.QueueLock)
{
if(this.Queue.Count == 0) return; //exit thread
item = this.Queue.Dequeue();
}
try { this.Action(item); }
catch { /*...*/ }
}
}
public void DoParallelWork(IEnumerable<T> items, int maxDegreesOfParallelism, Action<T> action)
{
this.Action = action;
this.Queue.Clear();
foreach(T item in items) this.Queue.Enqueue(item); // Queue<T> has no AddRange
List<Thread> threads = new List<Thread>();
for(int i = 0; i < maxDegreesOfParallelism; i++)
{
ThreadStart threadStart = new ThreadStart(DoWork); // DoWork takes no parameter, so ThreadStart is enough
Thread thread = new Thread(threadStart);
thread.Start();
threads.Add(thread);
}
foreach(Thread thread in threads)
{
thread.Join();
}
}
}
This was done IDE free, so there may be typos.
I'm going to make the suggestion that you use Microsoft's Reactive Framework (NuGet "Rx-Main") to do this task.
Here's the code:
public void DoWorkInParallel(CustomCollectionClass collection)
{
var query =
from x in collection.ToObservable()
from r in Observable.FromAsync(() => DoWorkOnItemInCollection(x))
select x;
query.Subscribe(x => { }, ex => { }, async () =>
{
await SaveModifedCollection(collection);
});
}
Done. That's it. Nothing more.
I have to say though, that when I tried to get your code to run it was full of bugs and issues. I suspect that the code you posted isn't your production code, but an example you wrote specifically for this question. I suggest that you try to make a running compilable example before posting.
Nevertheless, my suggestion should work for you with a little tweaking.
It is multi-threaded and thread-safe. And it cleanly saves the modified collection when done.

Ensure a long running task is only fired once and subsequent request are queued but with only one entry in the queue

I have a compute intensive method Calculate that may run for a few seconds, requests come from multiple threads.
Only one Calculate should be executing at a time; a subsequent request should be queued until the initial request completes. If there is already a request queued, the subsequent request can be discarded (as the queued request will be sufficient).
There seems to be lots of potential solutions but I just need the simplest.
UPDATE: Here's my rudimentary attempt:
private int _queueStatus;
private readonly object _queueStatusSync = new Object();
public void Calculate()
{
lock(_queueStatusSync)
{
if(_queueStatus == 2) return;
_queueStatus++;
if(_queueStatus == 2) return;
}
for(;;)
{
CalculateImpl();
lock(_queueStatusSync)
if(--_queueStatus == 0) return;
}
}
private void CalculateImpl()
{
// long running process will take a few seconds...
}
The simplest, cleanest solution IMO is using TPL Dataflow (as always) with a BufferBlock acting as the queue. BufferBlock is thread-safe, supports async-await, and more importantly, has TryReceiveAll to get all the items at once. It also has OutputAvailableAsync so you can wait asynchronously for items to be posted to the buffer. When multiple requests are posted you simply take the last one and forget about the rest:
var buffer = new BufferBlock<Request>();
var task = Task.Run(async () =>
{
while (await buffer.OutputAvailableAsync())
{
IList<Request> requests;
buffer.TryReceiveAll(out requests);
Calculate(requests.Last());
}
});
Usage:
buffer.Post(new Request());
buffer.Post(new Request());
Edit: If you don't have any input or output for the Calculate method, you can simply use a boolean as a switch. If it's true, turn it off and calculate; if it becomes true again while Calculate is running, calculate again:
public bool _shouldCalculate;
public void Producer()
{
_shouldCalculate = true;
}
public async Task Consumer()
{
while (true)
{
if (!_shouldCalculate)
{
await Task.Delay(1000);
}
else
{
_shouldCalculate = false;
Calculate();
}
}
}
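Note that a plain bool field gives no memory-visibility guarantees between the producer and the consumer. A slightly safer variant of the same switch idea, using Interlocked so the flag is read and cleared atomically (my own adjustment, not part of the original answer):
private int _shouldCalculate; // 0 = nothing pending, 1 = work pending
public void Producer()
{
    Interlocked.Exchange(ref _shouldCalculate, 1);
}
public async Task Consumer()
{
    while (true)
    {
        // Atomically read the flag and reset it in one step.
        if (Interlocked.Exchange(ref _shouldCalculate, 0) == 1)
        {
            Calculate();
        }
        else
        {
            await Task.Delay(1000);
        }
    }
}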
A BlockingCollection that only takes 1 at a time
The trick is to skip if there are any items in the collection
I would go with the answer from I3aron +1
This is (maybe) a BlockingCollection solution
public static void BC_AddTakeCompleteAdding()
{
using (BlockingCollection<int> bc = new BlockingCollection<int>(1))
{
// Spin up a Task to populate the BlockingCollection
using (Task t1 = Task.Factory.StartNew(() =>
{
for (int i = 0; i < 100; i++)
{
if (bc.TryAdd(i))
{
Debug.WriteLine(" add " + i.ToString());
}
else
{
Debug.WriteLine(" skip " + i.ToString());
}
Thread.Sleep(30);
}
bc.CompleteAdding();
}))
{
// Spin up a Task to consume the BlockingCollection
using (Task t2 = Task.Factory.StartNew(() =>
{
try
{
// Consume the BlockingCollection
while (true)
{
Debug.WriteLine("take " + bc.Take());
Thread.Sleep(100);
}
}
catch (InvalidOperationException)
{
// An InvalidOperationException means that Take() was called on a completed collection
Console.WriteLine("That's All!");
}
}))
Task.WaitAll(t1, t2);
}
}
}
It sounds like a classic producer-consumer. I'd recommend looking into BlockingCollection<T>. It is part of the System.Collections.Concurrent namespace. On top of that you can implement your queuing logic.
You may supply a BlockingCollection<T> with any underlying collection to hold its data, such as a ConcurrentBag<T> or ConcurrentQueue<T>; the latter is the default.
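For example (a minimal sketch), backing the collection with a ConcurrentStack<T> gives LIFO ordering instead of the default FIFO, and a bounded capacity makes Add block when the collection is full:
// FIFO (default): backed by a ConcurrentQueue<int>
var fifo = new BlockingCollection<int>();
// LIFO: back it with a ConcurrentStack<int> instead
var lifo = new BlockingCollection<int>(new ConcurrentStack<int>());
// Bounded: Add blocks once 10 items are waiting
var bounded = new BlockingCollection<int>(new ConcurrentQueue<int>(), 10);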

Single source producer to multiple consumers working in parallel

I want a kind of queue into which a single source inputs data, while on the other side consumers wait on it: when they detect that the queue is not empty they start processing items until they are halted. Importantly, if the queue is emptied they should keep watching it, so that when more data comes in they can consume it. What I have found covers multiple producers and multiple consumers, with the consumers nested inside the producers; in my case I can't do that, because I have a single source and consumers committed to the queue until I stop them. So they are not in series: the consumers and the producer execute in parallel.
I will be executing the consumers and the producer in parallel by
Parallel.Invoke(() => producer(), () => consumers());
The problem is how to process, in parallel, the contents of a queue that is sometimes empty.
You can solve this relatively easily using a BlockingCollection<T>.
You can use one as a queue, and pass a reference to it to the producer() and each of the consumers().
You'll be calling GetConsumingEnumerable() from each consumer thread, and using it with foreach.
The producer thread will add items to the collection, and will call CompleteAdding() when it has finished producing stuff. This will automatically make all the consumer threads exit their foreach loops.
Here's a basic example (with no error handling). The calls to Thread.Sleep() are to simulate load, and should not be used in real code.
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
namespace Demo
{
internal class Program
{
private static void Main(string[] args)
{
ThreadPool.SetMinThreads(10, 0); // To help the demo; not needed in real code.
var plant = new ProcessingPlant();
plant.Process();
Console.WriteLine("Work complete.");
}
}
public sealed class ProcessingPlant
{
private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
public void Process()
{
Parallel.Invoke(producer, consumers);
}
private void producer()
{
for (int i = 0; i < 100; ++i)
{
string item = i.ToString();
Console.WriteLine("Producer is queueing {0}", item);
_queue.Add(item); // <- Here's where we add an item to the queue.
Thread.Sleep(0);
}
_queue.CompleteAdding(); // <- Here's where we make all the consumers
} // exit their foreach loops.
private void consumers()
{
Parallel.Invoke(
() => consumer(1),
() => consumer(2),
() => consumer(3),
() => consumer(4),
() => consumer(5)
);
}
private void consumer(int id)
{
Console.WriteLine("Consumer {0} is starting.", id);
foreach (var item in _queue.GetConsumingEnumerable()) // <- Here's where we remove items.
{
Console.WriteLine("Consumer {0} read {1}", id, item);
Thread.Sleep(0);
}
Console.WriteLine("Consumer {0} is stopping.", id);
}
}
}
(I know this is using an extra thread just to start the consumers, but I did it this way to avoid obscuring the real point - which is to demonstrate the use of BlockingCollection.)

How should I write producer/consumer code in C#?

I have one thread streaming data and a second (the thread pool) processing the data. The data processing takes around 100 ms, so I use the second thread so as not to hold up the first.
While the second thread is processing data, the first thread adds the incoming data to a dictionary cache; when the second thread is finished, it processes the cached values.
My question is: is this how I should be doing producer/consumer code in C#?
public delegate void OnValue(ulong value);
public class Runner
{
public event OnValue OnValueEvent;
private readonly IDictionary<string, ulong> _cache = new Dictionary<string, ulong>(StringComparer.InvariantCultureIgnoreCase);
private readonly AutoResetEvent _cachePublisherWaitHandle = new AutoResetEvent(true);
public void Start()
{
for (ulong i = 0; i < 500; i++)
{
DataStreamHandler(i.ToString(), i);
}
}
private void DataStreamHandler(string id, ulong value)
{
_cache[id] = value;
if (_cachePublisherWaitHandle.WaitOne(1))
{
IList<ulong> tempValues = new List<ulong>(_cache.Values);
_cache.Clear();
_cachePublisherWaitHandle.Reset();
ThreadPool.UnsafeQueueUserWorkItem(delegate
{
try
{
foreach (ulong value1 in tempValues)
if (OnValueEvent != null)
OnValueEvent(value1);
}
finally
{
_cachePublisherWaitHandle.Set();
}
}, null);
}
else
{
Console.WriteLine(string.Format("Buffered value: {0}.", value));
}
}
}
class Program
{
static void Main(string[] args)
{
Stopwatch sw = Stopwatch.StartNew();
Runner r = new Runner();
r.OnValueEvent += delegate(ulong x)
{
Console.WriteLine(string.Format("Processed value: {0}.", x));
Thread.Sleep(100);
if(x == 499)
{
sw.Stop();
Console.WriteLine(string.Format("Time: {0}.", sw.ElapsedMilliseconds));
}
};
r.Start();
Console.WriteLine("Done");
Console.ReadLine();
}
}
The best practice for setting up the producer-consumer pattern is to use the BlockingCollection class, which is available in .NET 4.0 or as a separate download with the Reactive Extensions framework. The idea is that the producers enqueue using the Add method and the consumers dequeue using the Take method, which blocks if the queue is empty. As SwDevMan81 pointed out, the Albahari site has a really good writeup on how to make it work correctly if you want to go the manual route.
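A minimal sketch of the Add/Take usage described above (the item count mirrors the question's example; this is not production code):
var queue = new BlockingCollection<ulong>();
// Producer: enqueue values, then signal that no more will arrive.
var producer = Task.Run(() =>
{
    for (ulong i = 0; i < 500; i++)
        queue.Add(i);
    queue.CompleteAdding();
});
// Consumer: Take blocks while the queue is empty and throws
// InvalidOperationException once the collection is completed and drained.
var consumer = Task.Run(() =>
{
    try
    {
        while (true)
            Console.WriteLine("Processed value: " + queue.Take());
    }
    catch (InvalidOperationException) { /* all items consumed */ }
});
Task.WaitAll(producer, consumer);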
There is a good article on MSDN about Synchronizing the Producer and Consumer. There is also a good example on the Albahari site.

C# producer/consumer

I've recently come across a producer/consumer pattern C# implementation. It's very simple and (for me at least) very elegant.
It seems to have been devised around 2006, so I was wondering if this implementation is
- safe
- still applicable
Code is below (original code was referenced at http://bytes.com/topic/net/answers/575276-producer-consumer#post2251375)
using System;
using System.Collections;
using System.Threading;
public class Test
{
static ProducerConsumer queue;
static void Main()
{
queue = new ProducerConsumer();
new Thread(new ThreadStart(ConsumerJob)).Start();
Random rng = new Random(0);
for (int i=0; i < 10; i++)
{
Console.WriteLine ("Producing {0}", i);
queue.Produce(i);
Thread.Sleep(rng.Next(1000));
}
}
static void ConsumerJob()
{
// Make sure we get a different random seed from the
// first thread
Random rng = new Random(1);
// We happen to know we've only got 10
// items to receive
for (int i=0; i < 10; i++)
{
object o = queue.Consume();
Console.WriteLine ("\t\t\t\tConsuming {0}", o);
Thread.Sleep(rng.Next(1000));
}
}
}
public class ProducerConsumer
{
readonly object listLock = new object();
Queue queue = new Queue();
public void Produce(object o)
{
lock (listLock)
{
queue.Enqueue(o);
// We always need to pulse, even if the queue wasn't
// empty before. Otherwise, if we add several items
// in quick succession, we may only pulse once, waking
// a single thread up, even if there are multiple threads
// waiting for items.
Monitor.Pulse(listLock);
}
}
public object Consume()
{
lock (listLock)
{
// If the queue is empty, wait for an item to be added
// Note that this is a while loop, as we may be pulsed
// but not wake up before another thread has come in and
// consumed the newly added object. In that case, we'll
// have to wait for another pulse.
while (queue.Count==0)
{
// This releases listLock, only reacquiring it
// after being woken up by a call to Pulse
Monitor.Wait(listLock);
}
return queue.Dequeue();
}
}
}
The code is older than that - I wrote it some time before .NET 2.0 came out. The concept of a producer/consumer queue is way older than that though :)
Yes, that code is safe as far as I'm aware - but it has some deficiencies:
It's non-generic. A modern version would certainly be generic.
It has no way of stopping the queue. One simple way of stopping the queue (so that all the consumer threads retire) is to have a "stop work" token which can be put into the queue. You then add as many tokens as you have threads. Alternatively, you have a separate flag to indicate that you want to stop. (This allows the other threads to stop before finishing all the current work in the queue.)
If the jobs are very small, consuming a single job at a time may not be the most efficient thing to do.
The ideas behind the code are more important than the code itself, to be honest.
You could do something like the following code snippet. It's generic and has a method for enqueue-ing nulls (or whatever flag you'd like to use) to tell the worker threads to exit.
The code is taken from here: http://www.albahari.com/threading/part4.aspx#_Wait_and_Pulse
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
namespace ConsoleApplication1
{
public class TaskQueue<T> : IDisposable where T : class
{
object locker = new object();
Thread[] workers;
Queue<T> taskQ = new Queue<T>();
public TaskQueue(int workerCount)
{
workers = new Thread[workerCount];
// Create and start a separate thread for each worker
for (int i = 0; i < workerCount; i++)
(workers[i] = new Thread(Consume)).Start();
}
public void Dispose()
{
// Enqueue one null task per worker to make each exit.
foreach (Thread worker in workers) EnqueueTask(null);
foreach (Thread worker in workers) worker.Join();
}
public void EnqueueTask(T task)
{
lock (locker)
{
taskQ.Enqueue(task);
Monitor.PulseAll(locker);
}
}
void Consume()
{
while (true)
{
T task;
lock (locker)
{
while (taskQ.Count == 0) Monitor.Wait(locker);
task = taskQ.Dequeue();
}
if (task == null) return; // This signals our exit
Console.Write(task);
Thread.Sleep(1000); // Simulate time-consuming task
}
}
}
}
Back in the day I learned how Monitor.Wait/Pulse works (and a lot about threads in general) from the above piece of code and the article series it is from. So as Jon says, it has a lot of value to it and is indeed safe and applicable.
However, as of .NET 4, there is a producer-consumer queue implementation in the framework. I only just found it myself but up to this point it does everything I need.
These days a more modern option is available using the namespace System.Threading.Tasks.Dataflow. It's async/await friendly and much more versatile.
More info here How to: Implement a producer-consumer dataflow pattern
It's included starting from .NET Core; for older .NET Framework versions you may need to install a NuGet package with the same name as the namespace.
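A minimal producer/consumer sketch with an ActionBlock, loosely following the pattern from the linked article (the degree of parallelism and the item type are placeholders):
// requires the System.Threading.Tasks.Dataflow namespace
var worker = new ActionBlock<int>(
    item => Console.WriteLine($"Processing {item}"),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
// Producer side: Post (or await SendAsync) items into the block.
for (int i = 0; i < 100; i++)
    worker.Post(i);
worker.Complete();          // no more items will be posted
await worker.Completion;    // waits until all queued items have been processed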
I know the question is old, but it's the first match in Google for my request, so I decided to update the topic.
A modern and simple way to implement the producer/consumer pattern in C# is to use System.Threading.Channels. It's asynchronous and uses ValueTask to reduce memory allocations. Here is an example:
public class ProducerConsumer<T>
{
protected readonly Channel<T> JobChannel = Channel.CreateUnbounded<T>();
public IAsyncEnumerable<T> GetAllAsync()
{
return JobChannel.Reader.ReadAllAsync();
}
public async ValueTask AddAsync(T job)
{
await JobChannel.Writer.WriteAsync(job);
}
public async ValueTask AddAsync(IEnumerable<T> jobs)
{
foreach (var job in jobs)
{
await JobChannel.Writer.WriteAsync(job);
}
}
}
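A hypothetical usage sketch for the class above (the job strings and counts are mine; note that the class never completes the channel's writer, so the consuming loop keeps waiting for more items):
var pc = new ProducerConsumer<string>();
// Producer
var producing = Task.Run(async () =>
{
    for (int i = 0; i < 10; i++)
        await pc.AddAsync($"job {i}");
});
// Consumer: ReadAllAsync keeps yielding items as they are written.
var consuming = Task.Run(async () =>
{
    await foreach (var job in pc.GetAllAsync())
        Console.WriteLine($"Processing {job}");
});
await producing; // the consumer keeps running until the writer is completed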
Warning: If you read the comments, you'll understand my answer is wrong :)
There's a possible deadlock in your code.
Imagine the following case. For clarity I used a single-threaded approach, but it should be easy to convert to multi-threaded with sleeps:
// We create some actions...
object locker = new object();
Action action1 = () => {
lock (locker)
{
System.Threading.Monitor.Wait(locker);
Console.WriteLine("This is action1");
}
};
Action action2 = () => {
lock (locker)
{
System.Threading.Monitor.Wait(locker);
Console.WriteLine("This is action2");
}
};
// ... (stuff happens, etc.)
// Imagine both actions were running
// and there's 0 items in the queue
// And now the producer kicks in...
lock (locker)
{
// This would add a job to the queue
Console.WriteLine("Pulse now!");
System.Threading.Monitor.Pulse(locker);
}
// ... (more stuff)
// and the actions finish now!
Console.WriteLine("Consume action!");
action1(); // Oops... they're locked...
action2();
Please do let me know if this doesn't make any sense.
If this is confirmed, then the answer to your question is, "no, it isn't safe" ;)
I hope this helps.
public class ProducerConsumerProblem
{
private int n;
object obj = new object();
public ProducerConsumerProblem(int n)
{
this.n = n;
}
public void Producer()
{
for (int i = 0; i < n; i++)
{
lock (obj)
{
Console.Write("Producer =>");
System.Threading.Monitor.Pulse(obj);
System.Threading.Thread.Sleep(1);
System.Threading.Monitor.Wait(obj);
}
}
}
public void Consumer()
{
lock (obj)
{
for (int i = 0; i < n; i++)
{
System.Threading.Monitor.Wait(obj, 10);
Console.Write("<= Consumer");
System.Threading.Monitor.Pulse(obj);
Console.WriteLine();
}
}
}
}
public class Program
{
static void Main(string[] args)
{
ProducerConsumerProblem f = new ProducerConsumerProblem(10);
System.Threading.Thread t1 = new System.Threading.Thread(() => f.Producer());
System.Threading.Thread t2 = new System.Threading.Thread(() => f.Consumer());
t1.IsBackground = true;
t2.IsBackground = true;
t1.Start();
t2.Start();
Console.ReadLine();
}
}
Output:
Producer =><= Consumer
Producer =><= Consumer
Producer =><= Consumer
Producer =><= Consumer
Producer =><= Consumer
Producer =><= Consumer
Producer =><= Consumer
Producer =><= Consumer
Producer =><= Consumer
Producer =><= Consumer
