Background:
From the many sources I've read, I understand that BlockingCollection<T> is designed to remove the need to poll a shared collection for new data. When new data is inserted into the shared collection, the consumer thread wakes up immediately, so you don't have to check for new data at fixed intervals, typically in a while loop.
I have a similar requirement:
I have a blocking collection of size 1.
This collection will be populated from 3 places (3 producers).
Currently I am using a while loop to check whether the collection contains anything.
I want to execute the ProcessInbox() method as soon as the blocking collection receives a value, and then empty the collection, without polling for new data at fixed intervals in a while loop. How can this be achieved?
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

namespace ConsoleApp1
{
    class Program
    {
        private static BlockingCollection<int> _processingNotificationQueue = new(1);

        private static void GetDataFromQueue(CancellationToken cancellationToken)
        {
            Console.WriteLine("GDFQ called");
            int data;
            //while (!cancellationToken.IsCancellationRequested)
            while (!_processingNotificationQueue.IsCompleted)
            {
                try
                {
                    if (_processingNotificationQueue.TryTake(out data))
                    {
                        Console.WriteLine("Take");
                        ProcessInbox();
                    }
                }
                catch (Exception ex)
                {
                }
            }
        }

        private static void ProcessInbox()
        {
            Console.WriteLine("PI called");
        }

        private static void PostDataToQueue(object state)
        {
            Console.WriteLine("PDTQ called");
            _processingNotificationQueue.TryAdd(1);
        }

        private void MessageInsertedToTable()
        {
            PostDataToQueue(new CancellationToken());
        }

        private void FewMessagesAreNotProcessed()
        {
            PostDataToQueue(new CancellationToken());
        }

        static void Main(string[] args)
        {
            Console.WriteLine("Start");
            new Timer(PostDataToQueue, new CancellationToken(), TimeSpan.Zero,
                TimeSpan.FromMilliseconds(100));
            // new Thread(() => PostDataToQueue()).Start();
            new Thread(() => GetDataFromQueue(new CancellationToken())).Start();
            Console.WriteLine("End");
            Console.ReadKey();
        }
    }
}
Just foreach over it. It's blocking: as long as the collection is not marked as completed, your foreach will HANG while the collection is empty, and will wake up as soon as new items are added.
See the first ConsumingEnumerableDemo in https://learn.microsoft.com/en-us/dotnet/api/system.collections.concurrent.blockingcollection-1?view=net-6.0 and imagine that the consumer's foreach (var item in bc.GetConsumingEnumerable()) is in another thread. The producer there has a delay between new items, so you should be able to easily tinker with it and see how the consumer wakes up "relatively immediately". There's just one producer there, but I don't see a problem with multiple producers; Add is thread-safe.
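Applied to your code, the consumer could be as simple as this (a minimal sketch, reusing the _processingNotificationQueue field from the question):

private static void GetDataFromQueue(CancellationToken cancellationToken)
{
    Console.WriteLine("GDFQ called");
    // Blocks while the collection is empty and wakes up when an item is
    // added. The loop ends once CompleteAdding() has been called and the
    // collection is drained; cancelling the token throws
    // OperationCanceledException instead.
    foreach (var data in _processingNotificationQueue.GetConsumingEnumerable(cancellationToken))
    {
        Console.WriteLine("Take");
        ProcessInbox();
    }
}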
I can't guarantee that there's no significant delay between adding a new item and waking up a sleeping consumer, because 'significant' is a totally case-dependent word. There probably is some delay, at least for switching threads, but I doubt the collection does any additional throttling; I suppose it signals sleeping consumers before Add even returns in the producer. And for the purposes of inbox processing, which is probably a UI thing for humans, a delay on the order of 100 ms probably won't be noticeable, and I wouldn't expect the Add/wakeup latency to come anywhere near 100 ms. No guarantees though.
If the 'blocking foreach' part sounds evil to you for some reason, you'll probably have to switch to a different synchronization mechanism (**). This is a BlockingCollection, right? It's not an evented collection or something. The TryXXX methods are there for cases where you want to limit exceptions for some reason and can deal with scheduling updates yourself, like you do here (*).
(*) Well, almost. The code you posted is missing 2 important things. First, your while loop busy-spins at max speed when the collection is empty; that's usually a deadly no-no, especially for anything that runs on batteries. Consider adding some dead time, i.e. if (hasItems) doWork(); else sleep(someTime);. The other thing is the try-catch. The docs say that when the collection throws, it means the collection is "done": NO MORE ITEMS EVER. There's no point in looping over a dead collection, so the try-catch should not sit inside the loop; it should encompass the loop, so that looping stops when the collection is finished.
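If you want to keep the TryXXX style, a corrected sketch of the loop might look like this (assuming the same field as above; the TryTake overload with a timeout blocks instead of spinning, and simply returns false on timeout or once the collection is completed and empty):

private static void GetDataFromQueue(CancellationToken cancellationToken)
{
    try
    {
        while (!_processingNotificationQueue.IsCompleted)
        {
            // Blocks for up to 100 ms waiting for an item: no busy-spin.
            if (_processingNotificationQueue.TryTake(out int data, 100, cancellationToken))
            {
                ProcessInbox();
            }
        }
    }
    catch (OperationCanceledException)
    {
        // Cancelled via the token: stop consuming.
    }
}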
(**) I personally like the Rx extensions. Here it would be a simple subject and one observer. async/await and IAsyncEnumerable are also tempting and can avoid synchronizing by sleeping, but they can still end up as a busy-spinning loop if not done carefully. There are more choices, but the question was about BlockingCollection, so this is just FYI.
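For example, a minimal Rx sketch (an assumption on my part that the System.Reactive package is referenced; ProcessInbox is the method from the question):

using System;
using System.Reactive.Subjects;

var inboxSignal = new Subject<int>();

// One observer: runs whenever any producer signals; no polling loop.
// Note: by default the callback runs synchronously on the producer's thread.
IDisposable subscription = inboxSignal.Subscribe(_ => ProcessInbox());

// Any of the three producers:
inboxSignal.OnNext(1);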
Related
What is the best way to implement a queue for threads, so that I only ever have a maximum number of threads running, and if I already have that many, the code waits for a free slot before continuing?
Pseudo-code-ish example of what I mean; I'm sure this can be done in a better way...
(Please check the additional requirements below)
private int _MaxThreads = 10;
private int _CurrentThreads = 0;

public void main(string[] args)
{
    List<object> listWithLotsOfItems = FillWithManyThings();

    while (listWithLotsOfItems.Count > 0)
    {
        // get next item that needs to be worked on
        var item = listWithLotsOfItems[0];
        listWithLotsOfItems.RemoveAt(0);

        // IMPORTANT!, more items can be added as we go.
        listWithLotsOfItems.AddRange(AddMoreItemsToBeProcessed());

        // wait for free thread slot
        while (_CurrentThreads >= _MaxThreads)
            Thread.Sleep(100);

        Interlocked.Increment(ref _CurrentThreads); // risk of letting more than one thread through here...

        Thread t = new Thread(new ParameterizedThreadStart(WorkerThread));
        t.Start(item);
    }
}

public void WorkerThread(object bigheavyObject)
{
    // do heavy work here
    Interlocked.Decrement(ref _CurrentThreads);
}
I looked at Semaphore, but that seems to need to run inside the thread rather than outside, before the thread is created. In that example the semaphore is used inside the thread, after it has been created, to halt it; in my case there could be over 100k threads that need to run before the job is done, so I would rather not create each thread before a slot is available.
(link to semaphore example)
In the real application, data can be added to the list of items as the program progresses, so Parallel.ForEach won't really work either (I'm doing this in a script component in an SSIS package, to send data to a very slow WCF service).
SSIS has .NET 4.0.
So, let me first of all say that what you're trying to do will only give you a small performance improvement in a very specific arrangement. Tuning at the thread-allocation level can be a lot of work, so be sure you have a very good reason before proceeding.
Now, first of all, if you want to simply queue up the work, you can put it on the .NET thread pool. It will only allocate threads up to the maximum configured and any work that doesn't fit onto those (if all the threads are busy) will be queued up until a thread becomes available.
The simplest way to do this is to call:
Task.Factory.StartNew(() => { /* Your code */});
This creates a TPL task and schedules it to run on the default task scheduler, which should in turn allocate the task to the thread-pool.
If you need to wait for these tasks to complete before proceeding, you can add them to a collection and then use Task.WaitAll(...):
var tasks = new List<Task>();

tasks.Add(Task.Factory.StartNew(() => { /* Your code */ }));

// Before leaving the script.
Task.WaitAll(tasks.ToArray());
However, if you need to go deeper and control the scheduling of these tasks, you can look at creating a custom task scheduler that supports limited concurrency. This MSDN article goes into more details about it and suggests a possible implementation, but it isn't a trivial task.
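As a hedged sketch of how that would look (assuming you've copied the LimitedConcurrencyLevelTaskScheduler sample class from that article into your project; items and Process are placeholders for your data and work):

// Allow at most 5 tasks to execute concurrently on this scheduler.
var scheduler = new LimitedConcurrencyLevelTaskScheduler(5);
var factory = new TaskFactory(scheduler);

var tasks = new List<Task>();
foreach (var item in items)
{
    var captured = item; // capture for the closure
    tasks.Add(factory.StartNew(() => Process(captured)));
}
Task.WaitAll(tasks.ToArray());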
The easiest way to do this is with the overload of Parallel.ForEach() which allows you to select MaxDegreeOfParallelism.
Here's a sample program:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

namespace Demo
{
    public static class Program
    {
        private static void Main()
        {
            List<int> items = Enumerable.Range(1, 100).ToList();
            Parallel.ForEach(items, new ParallelOptions { MaxDegreeOfParallelism = 5 }, process);
        }

        private static void process(int item)
        {
            Console.WriteLine("Processing " + item);
            Thread.Sleep(2000);
        }
    }
}
If you run this, you'll see that it processes 5 elements very quickly and then there's a delay (caused by Thread.Sleep(2000)) before the next block of elements is processed. This is because in this sample code no more than 5 threads are allowed to execute at once.
Note that if MaxDegreeOfParallelism exceeds the threadpool's minimum thread value, then it may take a while for all the threads to be started.
The reason for this is that Parallel.ForEach() uses thread-pool threads, and the thread pool keeps a certain number of threads available by default. When creating threads beyond this limit, a delay is introduced between each new thread-pool thread creation.
You can set the minimum number of threadpool threads to a higher value using ThreadPool.SetMinThreads(), but I do NOT recommend this.
However, if you do want to do so, here's an example which sets the minimum thread count to 20:
int dummy, ioThreads;
ThreadPool.GetMinThreads(out dummy, out ioThreads);
ThreadPool.SetMinThreads(20, ioThreads);
If you do that and then run the previous code with MaxDegreeOfParallelism = 20 you'll see that there's no longer any delay when the initial threads are created.
Have you considered using a WaitHandle? See this
Also, you can use Parallel.ForEach to manage the thread creation for you.
Hope it helps ;)
I have a thread, call it the "Parsing thread".
Thread parsingThread = new Thread(myMethod);
I perform some computations on this thread, the last of which involves more parallel computation.
public void ReadCityFiles(BlockingCollection<GeonamesFileInfo> files)
{
    Parallel.ForEach<GeonamesFileInfo>(
        files.GetConsumingPartitioner<GeonamesFileInfo>(),
        new ParallelOptions { MaxDegreeOfParallelism = _maxParallelism },
        (inputFile, args) =>
        {
            RaiseFileParsing(inputFile);
            using (var input = new System.IO.StreamReader(inputFile.FullName))
            {
                while (!input.EndOfStream)
                {
                    RaiseEntryParsed(ParseCity(input.ReadLine()));
                    Interlocked.Increment(ref _parsedEntries);
                }
            }
            RaiseFileParsed(inputFile);
        });
    RaiseDirectoryParsed(Directory);
}
The problem is that when these very long and computationally expensive parallel foreach operations finish (~30 mins), the "Parsing Thread" doesn't resume. My GUI is still responsive, but the RaiseDirectoryParsed function that is supposed to continue to run on the "Parsing Thread" is never called. I have debugged the program up to this point and am pretty baffled as to what to do in this situation.
The point of BlockingCollection is that when an operation cannot be performed now, but might be in the future (e.g. Take() or Add() on a collection with bounded capacity), it will block. The same applies to GetConsumingEnumerable() and thus also to GetConsumingPartitioner(): if the collection is currently empty, the enumerable will block until you add more items to the collection.
But there is also a way to tell the collection that you're not going to add new items anymore and that it shouldn't block when empty from now on: the CompleteAdding() method. If you call this when you know you won't be adding any more new items to the collection, your Parallel.ForEach() won't block anymore and your thread will continue executing.
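A minimal sketch of the producer side (the file-enumeration helper is hypothetical; the point is the final CompleteAdding() call):

var files = new BlockingCollection<GeonamesFileInfo>();

// Producer: queue up the files, then signal that no more are coming.
foreach (var fileInfo in EnumerateGeonamesFiles()) // hypothetical helper
{
    files.Add(fileInfo);
}
files.CompleteAdding(); // consumers unblock once the collection is drained

// Meanwhile, on the "Parsing Thread":
ReadCityFiles(files); // now returns once every queued file is processed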
I can have a maximum of 5 threads running simultaneously at any one time, making use of 5 separate pieces of hardware to speed up the computation of some complex calculations and return the result. The API (which contains only one method) for each piece of hardware is not thread safe and can only run on a single thread at any point in time. Once a computation is completed, the same thread can be re-used to start another computation on either the same or a different piece of hardware, depending on availability. Each computation is stand-alone and does not depend on the results of the others; hence, up to 5 threads may complete their execution in any order.
What is the most efficient C# (using .NET Framework 2.0) coding solution for keeping track of which hardware is free/available and assigning a thread to the appropriate hardware API for performing the computation? Note that, other than the limitation of 5 concurrently running threads, I have no control over when or how the threads are fired.
Please correct me if I am wrong but a lock free solution is preferred as I believe it will result in increased efficiency and a more scalable solution.
Also note that this is not homework although it may sound like it...
.NET provides a thread pool that you can use. System.Threading.ThreadPool.QueueUserWorkItem() tells a thread in the pool to do some work for you.
Were I designing this, I'd not focus on mapping threads to your HW resources. Instead I'd expose a lockable object for each HW resource; this can simply be an array or queue of 5 objects. Then, for each bit of computation you have, call QueueUserWorkItem(). Inside the method you pass to QUWI, find the next available lockable object and lock it (i.e. dequeue it). Use the HW resource, then re-enqueue the object and exit the QUWI method.
It won't matter how many times you call QUWI; there can be at most 5 locks held, each lock guards access to one instance of your special hardware device.
The doc page for Monitor.Enter() shows how to create a safe (blocking) queue that can be accessed by multiple workers. In .NET 4.0 you would use the built-in BlockingCollection; it's the same thing.
That's basically what you want. Except don't call Thread.Create(). Use the thread pool.
cite: Advantage of using Thread.Start vs QueueUserWorkItem
// assume the SafeQueue class from the cited doc page.
SafeQueue<SpecialHardware> q = new SafeQueue<SpecialHardware>();

// set up the queue with objects protecting the 5 magic stones
private void Setup()
{
    for (int i = 0; i < 5; i++)
    {
        q.Enqueue(GetInstanceOfSpecialHardware(i));
    }
}

// something like this gets called many times, by QueueUserWorkItem()
public void DoWork(WorkDescription d)
{
    d.DoPrepWork();

    // gain access to one of the special hardware devices
    SpecialHardware shw = q.Dequeue();
    try
    {
        shw.DoTheMagicThing();
    }
    finally
    {
        // ensure no matter what happens the HW device is released
        q.Enqueue(shw);
        // at this point another worker can use it.
    }

    d.DoFollowupWork();
}
A lock free solution is only beneficial if the computation time is very small.
I would create a facade for each hardware thread where jobs are enqueued and a callback is invoked each time a job finishes.
Something like:
public class Job
{
    public string JobInfo { get; set; }
    public Action<Job> Callback { get; set; }
}

public class MyHardwareService
{
    Queue<Job> _jobs = new Queue<Job>();
    Thread _hardwareThread;
    ManualResetEvent _event = new ManualResetEvent(false);

    public MyHardwareService()
    {
        _hardwareThread = new Thread(WorkerFunc);
        _hardwareThread.Start();
    }

    public void Enqueue(Job job)
    {
        lock (_jobs)
            _jobs.Enqueue(job);
        _event.Set();
    }

    public void WorkerFunc()
    {
        while (true)
        {
            _event.WaitOne(Timeout.Infinite);
            Job currentJob = null;
            lock (_jobs)
            {
                if (_jobs.Count > 0)
                    currentJob = _jobs.Dequeue();
                if (_jobs.Count == 0)
                    _event.Reset();
            }
            if (currentJob == null)
                continue;

            // invoke hardware here.

            // trigger the callback on a thread-pool thread to be able
            // to continue with the next job ASAP
            ThreadPool.QueueUserWorkItem(delegate { currentJob.Callback(currentJob); });
        }
    }
}
Sounds like you need a thread pool with 5 threads where each one relinquishes the HW once it's done and adds it back to some queue. Would that work? If so, .Net makes thread pools very easy.
Sounds a lot like the Sleeping Barber problem. I believe the standard solution to that is to use semaphores.
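For what it's worth, a hedged sketch of that semaphore approach on .NET 2.0 (SpecialHardware, WorkItem and Run are illustrative names): the semaphore count mirrors the number of free devices, and a lock-protected queue hands out the devices themselves.

class HardwarePool
{
    private readonly Semaphore _available = new Semaphore(5, 5);
    private readonly Queue<SpecialHardware> _devices = new Queue<SpecialHardware>();

    // Called from any number of threads; at most 5 get through at once.
    public void Compute(WorkItem work)
    {
        _available.WaitOne(); // block until one of the 5 devices is free
        SpecialHardware device;
        lock (_devices) { device = _devices.Dequeue(); }
        try
        {
            device.Run(work); // the non-thread-safe API, one thread at a time
        }
        finally
        {
            lock (_devices) { _devices.Enqueue(device); }
            _available.Release(); // signal a free slot
        }
    }
}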
I'm kinda new to concurrent programming and trying to understand the benefits of using Monitor.Pulse and Monitor.Wait.
MSDN's example is the following:
class MonitorSample
{
    const int MAX_LOOP_TIME = 1000;
    Queue m_smplQueue;

    public MonitorSample()
    {
        m_smplQueue = new Queue();
    }

    public void FirstThread()
    {
        int counter = 0;
        lock (m_smplQueue)
        {
            while (counter < MAX_LOOP_TIME)
            {
                //Wait, if the queue is busy.
                Monitor.Wait(m_smplQueue);
                //Push one element.
                m_smplQueue.Enqueue(counter);
                //Release the waiting thread.
                Monitor.Pulse(m_smplQueue);
                counter++;
            }
        }
    }

    public void SecondThread()
    {
        lock (m_smplQueue)
        {
            //Release the waiting thread.
            Monitor.Pulse(m_smplQueue);
            //Wait in the loop, while the queue is busy.
            //Exit on the time-out when the first thread stops.
            while (Monitor.Wait(m_smplQueue, 1000))
            {
                //Pop the first element.
                int counter = (int)m_smplQueue.Dequeue();
                //Print the first element.
                Console.WriteLine(counter.ToString());
                //Release the waiting thread.
                Monitor.Pulse(m_smplQueue);
            }
        }
    }

    //Return the number of queue elements.
    public int GetQueueCount()
    {
        return m_smplQueue.Count;
    }

    static void Main(string[] args)
    {
        //Create the MonitorSample object.
        MonitorSample test = new MonitorSample();
        //Create the first thread.
        Thread tFirst = new Thread(new ThreadStart(test.FirstThread));
        //Create the second thread.
        Thread tSecond = new Thread(new ThreadStart(test.SecondThread));
        //Start threads.
        tFirst.Start();
        tSecond.Start();
        //Wait for the two threads to end.
        tFirst.Join();
        tSecond.Join();
        //Print the number of queue elements.
        Console.WriteLine("Queue Count = " + test.GetQueueCount().ToString());
    }
}
and I can't see the benefit of using Wait and Pulse instead of this:
public void FirstThreadTwo()
{
    int counter = 0;
    while (counter < MAX_LOOP_TIME)
    {
        lock (m_smplQueue)
        {
            m_smplQueue.Enqueue(counter);
            counter++;
        }
    }
}

public void SecondThreadTwo()
{
    while (true)
    {
        lock (m_smplQueue)
        {
            int counter = (int)m_smplQueue.Dequeue();
            Console.WriteLine(counter.ToString());
        }
    }
}
Any help is most appreciated.
Thanks
To describe "advantages", a key question is "over what?". If you mean "in preference to a hot-loop", well, CPU utilization is obvious. If you mean "in preference to a sleep/retry loop" - you can get much faster response (Pulse doesn't need to wait as long) and use lower CPU (you haven't woken up 2000 times unnecessarily).
Generally, though, people mean "in preference to Mutex etc".
I tend to use these extensively, even in preference to mutex, reset-events, etc; reasons:
they are simple, and cover most of the scenarios I need
they are relatively cheap, since they don't need to go all the way to OS handles (unlike Mutex etc, which is owned by the OS)
I'm generally already using lock to handle synchronization, so chances are good that I already have a lock when I need to wait for something
it achieves my normal aim - allowing 2 threads to signal completion to each-other in a managed way
I rarely need the other features of Mutex etc (such as being inter-process)
There is a serious flaw in your snippet: SecondThreadTwo() will fail badly when it tries to call Dequeue() on an empty queue. You probably got it to work by having FirstThreadTwo() execute a fraction of a second before the consumer thread, probably by starting it first. That's an accident, one that will stop working after running these threads for a while, or when starting them under a different machine load. This can accidentally work error-free for quite a while, making the occasional failure very hard to diagnose.
There is no way to write a locking algorithm that blocks the consumer until the queue becomes non-empty with just the lock statement. A busy loop that constantly enters and exits the lock works but is a very poor substitute.
Writing this kind of code is best left to the threading gurus, it is very hard to prove it works in all cases. Not just absence of failure modes like this one or threading races. But also general fitness of the algorithm that avoids deadlock, livelock and thread convoys. In the .NET world, the gurus are Jeffrey Richter and Joe Duffy. They eat locking designs for breakfast, both in their books and their blogs and magazine articles. Stealing their code is expected and accepted. And partly entered into the .NET framework with the additions in the System.Collections.Concurrent namespace.
It is a performance improvement to use Monitor.Pulse/Wait, as you have guessed: acquiring a lock is a relatively expensive operation, and by using Monitor.Wait your thread will sleep until some other thread wakes it up with Monitor.Pulse.
You'll see the difference in Task Manager, because with the lock-only version one processor core will be pegged even while nothing is in the queue.
The advantages of Pulse and Wait are that they can be used as building blocks for all other synchronization mechanisms including mutexes, events, barriers, etc. There are things that can be done with Pulse and Wait that cannot be done with any other synchronization mechanism in the BCL.
All of the interesting stuff happens inside the Wait method. Wait will exit the critical section and put the thread in the WaitSleepJoin state by placing it in the waiting queue. Once Pulse is called then the next thread in the waiting queue moves to the ready queue. Once the thread switches to the Running state it reenters the critical section. This is important to repeat another way. Wait will release the lock and reacquire it in an atomic fashion. No other synchronization mechanism has this feature.
The best way to envision this is to try to replicate the behavior with some other strategy and then see what can go wrong. Let us try this exercise with a ManualResetEvent, since the Set and WaitOne methods seem like they may be analogous. Our first attempt might look like this:
void FirstThread()
{
    lock (mre)
    {
        // Do stuff.
        mre.Set();
        // Do stuff.
    }
}

void SecondThread()
{
    lock (mre)
    {
        // Do stuff.
        while (!CheckSomeCondition())
        {
            mre.WaitOne();
        }
        // Do stuff.
    }
}
It should be easy to see that this code can deadlock. So what happens if we try this naive fix?
void FirstThread()
{
    lock (mre)
    {
        // Do stuff.
        mre.Set();
        // Do stuff.
    }
}

void SecondThread()
{
    lock (mre)
    {
        // Do stuff.
    }
    while (!CheckSomeCondition())
    {
        mre.WaitOne();
    }
    lock (mre)
    {
        // Do stuff.
    }
}
Can you see what can go wrong here? Since we did not atomically re-enter the lock after the wait condition was checked, another thread could get in and invalidate the condition. In other words, another thread could do something that causes CheckSomeCondition to start returning false again before the following lock is reacquired. That can definitely cause a lot of weird problems if your second block of code requires that the condition be true.
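For contrast, here is a sketch of the same consumer written with Monitor.Wait, which closes that window (CheckSomeCondition is the same hypothetical predicate; the producer would call Monitor.Pulse under the same lock after making the condition true):

void SecondThread()
{
    lock (someLock)
    {
        // Wait releases someLock and blocks, then reacquires the lock
        // before returning; the while loop re-checks the condition under
        // the lock, so the code below only runs while the condition is
        // true and the lock is held.
        while (!CheckSomeCondition())
        {
            Monitor.Wait(someLock);
        }
        // Do stuff, safe in the knowledge that the condition still holds.
    }
}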
I have an object that requires a lot of initialization (1-2 seconds on a beefy machine), though once it is initialized it only takes about 20 milliseconds to do a typical "job".
In order to prevent it from being re-initialized every time an app wants to use it (which could be 50 times a second, or not at all for minutes, in typical usage), I decided to give it a job queue and have it run on its own thread, checking to see if there is any work for it in the queue. However, I'm not entirely sure how to make a thread that runs indefinitely, with or without work.
Here's what I have so far; any critique is welcome:
private void DoWork()
{
    while (true)
    {
        if (JobQue.Count > 0)
        {
            // do work on JobQue.Dequeue()
        }
        else
        {
            System.Threading.Thread.Sleep(50);
        }
    }
}
Afterthought: I was thinking I may need to kill this thread gracefully instead of letting it run forever, so I think I will add a Job type that tells the thread to end. Any thoughts on how to end a thread like this are also appreciated.
You need to lock anyway, so you can Wait and Pulse:
while (true)
{
    SomeType item;
    lock (queue)
    {
        while (queue.Count == 0)
        {
            Monitor.Wait(queue); // releases lock, waits for a Pulse,
                                 // and re-acquires the lock
        }
        item = queue.Dequeue(); // we have the lock, and there's data
    }
    // process item **outside** of the lock
}
with add like:
lock (queue)
{
    queue.Enqueue(item);
    // if the queue was empty, the worker may be waiting - wake it up
    if (queue.Count == 1) { Monitor.PulseAll(queue); }
}
You might also want to look at this question, which limits the size of the queue (blocking if it is too full).
You need a synchronization primitive, like a WaitHandle (look at the static methods). This way you can 'signal' the worker thread that there is work. It checks the queue and keeps on working until the queue is empty, at which point it waits for the handle to be signaled again.
Make one of the job items a quit command too, so that you can signal the worker thread when it's time to exit.
In most cases I've done this quite similarly to how you've set it up, but not in the same language. I had the advantage of working with a data structure (in Python) that blocks the thread until an item is put into the queue, negating the need for the sleep call.
If .NET provides a class like that, I'd look into using it. A thread blocking is much better than a thread spinning on sleep calls.
The job you can pass could be as simple as a "null"; if the code receives a null, it knows it's time to break out of the while and go home.
If you don't really need the thread to exit (and just want to keep it from keeping your application alive), you can set Thread.IsBackground to true, and it will end when all non-background threads end. Will and Marc both have good solutions for handling the queue.
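For example (a trivial sketch of that flag, using the DoWork method from the question):

var worker = new Thread(DoWork) { IsBackground = true };
worker.Start(); // a background thread won't keep the process alive by itself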
Grab the Parallel Framework. It has a BlockingCollection<T> which you can use as a job queue. How you'd use it is:
Create the BlockingCollection<T> that will hold your tasks/jobs.
Create some threads which have a never-ending loop (while (true) { /* get job off the queue */ }).
Set the threads going.
Add jobs to the collection when they come available.
The threads will block until an item appears in the collection. Whoever's turn it is will get it (depends on the CPU). I'm using this now and it works great.
It also has the advantage of relying on MS to write that particularly nasty bit of code where multiple threads access the same resource. And whenever you can get somebody else to write that you should go for it. Assuming, of course, they have more technical/testing resources and combined experience than you.
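A minimal sketch of that pattern (the job type and worker count here are illustrative; Take() blocks while the collection is empty):

var jobs = new BlockingCollection<Action>();

// Workers: each loop blocks in Take() until a job arrives.
for (int i = 0; i < 3; i++)
{
    new Thread(() =>
    {
        while (true)
        {
            Action job = jobs.Take(); // sleeps until an item is added
            job();
        }
    }) { IsBackground = true }.Start();
}

// Producer: just add jobs when they come available.
jobs.Add(() => Console.WriteLine("job done"));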
I've implemented a background-task queue without using any kind of while loop, or pulsing, or waiting, or, indeed, touching Thread objects at all. And it seems to work. (By which I mean it's been in production environments handling thousands of tasks a day for the last 18 months without any unexpected behavior.) It's a class with two significant properties: a Queue<Task> and a BackgroundWorker. There are three significant methods, abbreviated here:
private void BackgroundWorker_DoWork(object sender, DoWorkEventArgs e)
{
    if (TaskQueue.Count > 0)
    {
        TaskQueue.Peek().Execute();
    }
}

private void BackgroundWorker_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
    lock (TaskQueue)
    {
        TaskQueue.Dequeue();
    }
    if (TaskQueue.Count > 0 && !BackgroundWorker.IsBusy)
    {
        BackgroundWorker.RunWorkerAsync();
    }
}

public void Enqueue(Task t)
{
    lock (TaskQueue)
    {
        TaskQueue.Enqueue(t);
    }
    if (!BackgroundWorker.IsBusy)
    {
        BackgroundWorker.RunWorkerAsync();
    }
}
It's not that there's no waiting and pulsing; it all happens inside the BackgroundWorker. This just wakes up whenever a task is dropped in the queue, runs until the queue is empty, and then goes back to sleep.
I am far from an expert on threading. Is there a reason to mess around with System.Threading for a problem like this if using a BackgroundWorker will do?