Beeing a threading noob, I'm trying to find a way w/o locking objects that allows me to enqueue a threadpool task, in such way that it has a max degree of parallelism = 1.
Will this code do what I think it does?
private int status;
private const int Idle = 0;
private const int Busy = 1;
private void Schedule()
{
// only schedule if we idle
// we become busy and get the old value to compare with
// in an atomic way (?)
if (Interlocked.Exchange(ref status, Busy) == Idle)
{
ThreadPool.QueueUserWorkItem(Run);
}
}
That is, in a threadsafe way enqueue the Run method if the status is Idle.
It seems to work fine in my tests, but since this is not my area, I'm not sure.
Yes, this will do what you want. It will never allow you to get a return value of Idle when in fact status is Busy, and it will set status to Busy in the same operation, with no chance of a conflict. So far so good.
However, if you're using a ConcurrentQueue<T> later on, why do you even do this? Why do you use a ThreadPool to enqueue Run over and over again, instead of just having a single thread continually take data from the concurrent queue using TryDequeue?
In fact, there is a producer-consumer collection that is specifically designed for this, the BlockingCollection<T>. Your consumer thread would just call Take (with a cancellation token if necessary - probably a good idea) and that either returns a value as in ConcurrentQueue<T>; or if no value is available, blocks the thread until there is something to take. When some other thread adds an item to the collection, it will notify as many consumers as it has data for (in your case, no need for anything complex, since you only have one consumer).
That means you only have to handle starting and stopping a single thread, which will run an "infinite" cycle, which will call col.Take, while the producers call col.Add.
Of course, this assumes you have .NET 4.0+ available, but then again, you probably do, since you're using ConcurrentQueue<T>.
Related
thanks for the assistance. I've got a triple-threaded process, linked by a concurrent queue. Thread one processes information, returns to the second thread, which places data into a concurrent queue. The third thread is just looping like so:
while (true) {
if(queue.TryDequeue(out info)) {
doStuff(info);
} else {
Thread.Sleep(1);
}
}
Is there a better way to handle it such that I'm not iterating over the loop so much? The application is extremely performance sensitive, and currently just the TryDequeue is taking ~8-9% of the application runtime. Looking to decrease that as much as possible, but not really sure what my options are.
You should consider using System.Collections.Concurrent.BlockingCollection and its Add() / Take() methods. With Take() your third thread will be just suspended while waiting for new item. Add() is thread safe and can be used by second thread.
With that approach you should be able to simplify your code into something like that:
while (true) {
var info = collection.Take();
doStuff(info);
}
You can increase the sleep time. I would also use await Task.Delay instead of sleep. This way you can wait longer without the extra cpu cycles that Thread.Sleep uses and still be able to cancel the delay by making use of the CancellationTokenSource.
On another note, there are better ways of queuing up jobs. Taking into consideration that it appears you want to run these jobs synchronously, an example would be to have a singleton class that takes your work items and queues them up. So if there are no items in the queue when you add one, it should detect that and then start your job process. At the end of your job process, check for more work, use recursion to do that work or if no more jobs then exit the job process, which will run again when you add an item to the empty queue. If my assumption is wrong and you can run these jobs in parallel, why use a queue?
You may like to use a thread safe implementation of ObservableCollection. Check out this SO question ObservableCollection and threading
I don't have a recommendation that avoids looping, however I would recommend you move away from
while (true)
and consider this instead:
MyThing thing;
while (queue.TryDequeue(out thing))
{
doWork(thing);
}
Put this in a method that gets called each time the queue is modified, this ensures it is running when needed, but ends when not needed.
Given a following code snippet(found in somewhere while learning threading).
public class BlockingQueue<T>
{
private readonly object sync = new object();
private readonly Queue<T> queue;
public BlockingQueue()
{
queue = new Queue<T>();
}
public void Enqueue(T item)
{
lock (sync)
{
queue.Enqueue(item);
Monitor.PulseAll(sync);
}
}
public T Dequeue()
{
lock (sync)
{
while (queue.Count == 0)
Monitor.Wait(sync);
return queue.Dequeue();
}
}
}
What I want to understand is ,
Why is there a while loop ?
while (queue.Count == 0)
Monitor.Wait(sync);
and what is wrong with the,
if(queue.Count == 0)
Monitor.Wait(sync);
In fact, all the time when I see the similar code I found using while loop, can anyone please help me understand the use of one above another.
Thank you.
You need to understand what Pulse, PulseAll, and Wait are doing. The Monitor maintains two queues: the waiting queue and the ready queue. When a thread calls Wait it is moved into the waiting queue. When a thread calls Pulse it moves one and only one thread from the waiting queue to the ready queue. When a thread calls PulseAll it moves all threads from the waiting queue to the ready queue. Threads in the ready queue are eligible to reacquire the lock at any moment, but only after the current holder releases it of course.
Based on this knowledge it is fairly easy to understand why you must recheck the queue count when using PulseAll. It is because all dequeueing threads will eventually wake and will want to attempt to extract an item from queue. But, what if there is only one item in the queue to begin with? Obviously, we must recheck the queue count to avoid dequeueing an empty queue.
So what would be the conclusion if you had used Pulse instead of PulseAll? There would still be a problem with the simple if check. The reason is because a thread from the ready queue is not necessarily going to be the next thread to acquire the lock. That is because the Monitor does not give preference to a Wait call above an Enter call.
The while loop is a fairly standard pattern when using Monitor.Wait. This is because pulsing a thread does not have semantic meaning by itself. It is only a signal that the lock state has changed. When threads wake up after blocking on Wait they should recheck the same condition that was originally used to block the thread to see if the thread can now proceed. Sometimes it cannot and so it should block some more.
The best rule of thumb here is that if there is doubt about whether to use an if check or a while check then always choose a while loop because it is safer. In fact, I would take this to the extreme and suggest to always use a while loop because there is no inherent advantage in using the simpler if check and because the if check is almost always the wrong choice anyway. A similar rule holds for choosing whether to use Pulse or PulseAll. If there is doubt about which one to use then always choose PulseAll.
you have to keep checking whether the queue is still empty or not. Using only if would only check it once, wait for a while, then a dequeue. What if at that time the queue is still empty? BANG! queue underflow error...
with if condition when something released the lock the queue.Count == 0 will not check again and maybe a queue underflow error so we have to check the condition every time because of concurrency and this is called Spinning
Why on Unix it could go wrong is because of the spurious wake up, possibility caused by OS signals. It is a side effect that is not guaranteed to never happen on windows as well. This is not a legacy, it is how OS works. If Monitors are implemented in terms of Condition Variable, that is.
def : a spurious wake up is a re-scheduling of a sleeping thread on a condition variable wait site, that was not triggered by an action coming from the current program threads (like Pulse()).
This inconvenience could be masked in managed languages by, e.g. the queues. So before going out of the Wait() function, the framework could check that this running thread is actually really being requested for scheduling, if it does not find itself in a run queue it can go back to sleep. Hiding the problem.
if (queue.Count == 0)
will do.
Using while loop pattern for "wait for and check condition" context is a legacy leftover, I think. Because non-Windows, non-.NET monitor variables can be triggered without actual Pulse.
In .NET, you private monitor variable cannot be triggered without Queue filling so you don't need to worry about queue underflow after monitor waiting. But, it is really not bad habit to use while loop for "wait for and check condition".
I have developed a generic producer-consumer queue which pulses by Monitor in the following way:
the enqueue :
public void EnqueueTask(T task)
{
_workerQueue.Enqueue(task);
Monitor.Pulse(_locker);
}
the dequeue:
private T Dequeue()
{
T dequeueItem;
if (_workerQueue.Count > 0)
{
_workerQueue.TryDequeue(out dequeueItem);
if(dequeueItem!=null)
return dequeueItem;
}
while (_workerQueue.Count == 0)
{
Monitor.Wait(_locker);
}
_workerQueue.TryDequeue(out dequeueItem);
return dequeueItem;
}
the wait section produces the following SynchronizationLockException :
"object synchronization method was called from an unsynchronized block of code"
do i need to synch it? why ? Is it better to use ManualResetEvents or the Slim version of .NET 4.0?
Yes, the current thread needs to "own" the monitor in order to call either Wait or Pulse, as documented. (So you'll need to lock for Pulse as well.) I don't know the details for why it's required, but it's the same in Java. I've usually found I'd want to do that anyway though, to make the calling code clean.
Note that Wait releases the monitor itself, then waits for the Pulse, then reacquires the monitor before returning.
As for using ManualResetEvent or AutoResetEvent instead - you could, but personally I prefer using the Monitor methods unless I need some of the other features of wait handles (such as atomically waiting for any/all of multiple handles).
From the MSDN description of Monitor.Wait():
Releases the lock on an object and blocks the current thread until it reacquires the lock.
The 'releases the lock' part is the problem, the object isn't locked. You are treating the _locker object as though it is a WaitHandle. Doing your own locking design that's provably correct is a form of black magic that's best left to our medicine man, Jeffrey Richter and Joe Duffy. But I'll give this one a shot:
public class BlockingQueue<T> {
private Queue<T> queue = new Queue<T>();
public void Enqueue(T obj) {
lock (queue) {
queue.Enqueue(obj);
Monitor.Pulse(queue);
}
}
public T Dequeue() {
T obj;
lock (queue) {
while (queue.Count == 0) {
Monitor.Wait(queue);
}
obj = queue.Dequeue();
}
return obj;
}
}
In most any practical producer/consumer scenario you will want to throttle the producer so it cannot fill the queue unbounded. Check Duffy's BoundedBuffer design for an example. If you can afford to move to .NET 4.0 then you definitely want to take advantage of its ConcurrentQueue class, it has lots more black magic with low-overhead locking and spin-waiting.
The proper way to view Monitor.Wait and Monitor.Pulse/PulseAll is not as providing a means of waiting, but rather (for Wait) as a means of letting the system know that the code is in a waiting loop which can't exit until something of interest changes, and (for Pulse/PulseAll) as a means of letting the system know that code has just changed something that might cause satisfy the exit condition some other thread's waiting loop. One should be able to replace all occurrences of Wait with Sleep(0) and still have code work correctly (even if much less efficiently, as a result of spending CPU time repeatedly testing conditions that haven't changed).
For this mechanism to work, it is necessary to avoid the possibility of the following sequence:
The code in the wait loop tests the condition when it isn't satisfied.
The code in another thread changes the condition so that it is satisfied.
The code in that other thread pulses the lock (which nobody is yet waiting on).
The code in the wait loop performs a Wait since its condition wasn't satisfied.
The Wait method requires that the waiting thread have a lock, since that's the only way it can be sure that the condition it's waiting upon won't change between the time it's tested and the time the code performs the Wait. The Pulse method requires a lock because that's the only way it can be sure that if another thread has "committed" itself to performing a Wait, the Pulse won't occur until after the other thread actually does so. Note that using Wait within a lock doesn't guarantee that it's being used correctly, but there's no way that using Wait outside a lock could possibly be correct.
The Wait/Pulse design actually works reasonably well if both sides cooperate. The biggest weaknesses of the design, IMHO, are (1) there's no mechanism for a thread to wait until any of a number of objects is pulsed; (2) even if one is "shutting down" an object such that all future wait loops should exit immediately (probably by checking an exit flag), the only way to ensure that any Wait to which a thread has committed itself will get a Pulse is to acquire the lock, possibly waiting indefinitely for it to become available.
Alright...I've given the site a fair search and have read over many posts about this topic. I found this question: Code for a simple thread pool in C# especially helpful.
However, as it always seems, what I need varies slightly.
I have looked over the MSDN example and adapted it to my needs somewhat. The example I refer to is here: http://msdn.microsoft.com/en-us/library/3dasc8as(VS.80,printer).aspx
My issue is this. I have a fairly simple set of code that loads a web page via the HttpWebRequest and WebResponse classes and reads the results via a Stream. I fire off this method in a thread as it will need to executed many times. The method itself is pretty short, but the number of times it needs to be fired (with varied data for each time) varies. It can be anywhere from 1 to 200.
Everything I've read seems to indicate the ThreadPool class being the prime candidate. Here is what things get tricky. I might need to fire off this thing say 100 times, but I can only have 3 threads at most running (for this particular task).
I've tried setting the MaxThreads on the ThreadPool via:
ThreadPool.SetMaxThreads(3, 3);
I'm not entirely convinced this approach is working. Furthermore, I don't want to clobber other web sites or programs running on the system this will be running on. So, by limiting the # of threads on the ThreadPool, can I be certain that this pertains to my code and my threads only?
The MSDN example uses the event drive approach and calls WaitHandle.WaitAll(doneEvents); which is how I'm doing this.
So the heart of my question is, how does one ensure or specify a maximum number of threads that can be run for their code, but have the code keep running more threads as the previous ones finish up until some arbitrary point? Am I tackling this the right way?
Sincerely,
Jason
Okay, I've added a semaphore approach and completely removed the ThreadPool code. It seems simple enough. I got my info from: http://www.albahari.com/threading/part2.aspx
It's this example that showed me how:
[text below here is a copy/paste from the site]
A Semaphore with a capacity of one is similar to a Mutex or lock, except that the Semaphore has no "owner" – it's thread-agnostic. Any thread can call Release on a Semaphore, while with Mutex and lock, only the thread that obtained the resource can release it.
In this following example, ten threads execute a loop with a Sleep statement in the middle. A Semaphore ensures that not more than three threads can execute that Sleep statement at once:
class SemaphoreTest
{
static Semaphore s = new Semaphore(3, 3); // Available=3; Capacity=3
static void Main()
{
for (int i = 0; i < 10; i++)
new Thread(Go).Start();
}
static void Go()
{
while (true)
{
s.WaitOne();
Thread.Sleep(100); // Only 3 threads can get here at once
s.Release();
}
}
}
Note: if you are limiting this to "3" just so you don't overwhelm the machine running your app, I'd make sure this is a problem first. The threadpool is supposed to manage this for you. On the other hand, if you don't want to overwhelm some other resource, then read on!
You can't manage the size of the threadpool (or really much of anything about it).
In this case, I'd use a semaphore to manage access to your resource. In your case, your resource is running the web scrape, or calculating some report, etc.
To do this, in your static class, create a semaphore object:
System.Threading.Semaphore S = new System.Threading.Semaphore(3, 3);
Then, in each thread, you do this:
System.Threading.Semaphore S = new System.Threading.Semaphore(3, 3);
try
{
// wait your turn (decrement)
S.WaitOne();
// do your thing
}
finally {
// release so others can go (increment)
S.Release();
}
Each thread will block on the S.WaitOne() until it is given the signal to proceed. Once S has been decremented 3 times, all threads will block until one of them increments the counter.
This solution isn't perfect.
If you want something a little cleaner, and more efficient, I'd recommend going with a BlockingQueue approach wherein you enqueue the work you want performed into a global Blocking Queue object.
Meanwhile, you have three threads (which you created--not in the threadpool), popping work out of the queue to perform. This isn't that tricky to setup and is very fast and simple.
Examples:
Best threading queue example / best practice
Best method to get objects from a BlockingQueue in a concurrent program?
It's a static class like any other, which means that anything you do with it affects every other thread in the current process. It doesn't affect other processes.
I consider this one of the larger design flaws in .NET, however. Who came up with the brilliant idea of making the thread pool static? As your example shows, we often want a thread pool dedicated to our task, without having it interfere with unrelated tasks elsewhere in the system.
Greetings.
I'm trying to implement some multithreaded code in an application. The purpose of this code is to validate items that the database gives it. Validation can take quite a while (a few hundred ms to a few seconds), so this process needs to be forked off into its own thread for each item.
The database may give it 20 or 30 items a second in the beginning, but that begins to decline rapidly, eventually reaching about 65K items over 24 hours, at which point the application exits.
I'd like it if anyone more knowledgeable could take a peek at my code and see if there's any obvious problems. No one I work with knows multithreading, so I'm really just on my own, on this one.
Here's the code. It's kinda long but should be pretty clear. Let me know if you have any feedback or advice. Thanks!
public class ItemValidationService
{
/// <summary>
/// The object to lock on in this class, for multithreading purposes.
/// </summary>
private static object locker = new object();
/// <summary>Items that have been validated.</summary>
private HashSet<int> validatedItems;
/// <summary>Items that are currently being validated.</summary>
private HashSet<int> validatingItems;
/// <summary>Remove an item from the index if its links are bad.</summary>
/// <param name="id">The ID of the item.</param>
public void ValidateItem(int id)
{
lock (locker)
{
if
(
!this.validatedItems.Contains(id) &&
!this.validatingItems.Contains(id)
){
ThreadPool.QueueUserWorkItem(sender =>
{
this.Validate(id);
});
}
}
} // method
private void Validate(int itemId)
{
lock (locker)
{
this.validatingItems.Add(itemId);
}
// *********************************************
// Time-consuming routine to validate an item...
// *********************************************
lock (locker)
{
this.validatingItems.Remove(itemId);
this.validatedItems.Add(itemId);
}
} // method
} // class
The thread pool is a convenient choice if you have light weight sporadic processing that isn't time sensitive. However, I recall reading on MSDN that it's not appropriate for large scale processing of this nature.
I used it for something quite similar to this and regret it. I took a worker-thread approach in subsequent apps and am much happier with the level of control I have.
My favorite pattern in the worker-thread model is to create a master thread which holds a queue of tasks items. Then fork a bunch of workers that pop items off that queue to process. I use a blocking queue so that when there are no items the process, the workers just block until something is pushed onto the queue. In this model, the master thread produces work items from some source (db, etc.) and the worker threads consume them.
I second the idea of using a blocking queue and worker threads. Here is a blocking queue implementation that I've used in the past with good results:
https://www.codeproject.com/Articles/8018/Bounded-Blocking-Queue-One-Lock
What's involved in your validation logic? If its mainly CPU bound then I would create no more than 1 worker thread per processor/core on the box. This will tell you the number of processors:
Environment.ProcessorCount
If your validation involves I/O such as File Access or database access then you could use a few more threads than the number of processors.
Be careful, QueueUserWorkItem might fail
There is a possible logic error in the code posted with the question, depending on where the item id in ValidateItem(int id) comes from. Why? Because although you correctly lock your validatingItems and validatedItems queues before queing a work item, you do not add the item to the validatingItems queue until the new thread spins up. That means there could be a time gap where another thread calls ValidateItem(id) with the same id (unless this is running on a single main thread).
I would add item to the validatingItems queue just before queuing the item, inside the lock.
Edit: also QueueUserWorkItem() returns a bool so you should use the return value to make sure the item was queued and THEN add it to the validatingItems queue.
ThreadPool may not be optimal for jamming so much at once into it. You may want to research the upper limits of its capabilities and/or roll your own.
Also, there is a race condition that exists in your code, if you expect no duplicate validations. The call to
this.validatingItems.Add(itemId);
needs to happen in the main thread (ValidateItem), not in the thread pool thread (Validate method). This call should occur a line before the queueing of the work item to the pool.
A worse bug is found by not checking the return of QueueUserWorkItem. Queueing can fail, and why it doesn't throw an exception is a mystery to us all. If it returns false, you need to remove the item that was added to the validatingItems list, and handle the error (throw exeception probably).
I would be concerned about performance here. You indicated that the database may give it 20-30 items per second and an item could take up to a few seconds to be validated. That could be quite a large number of threads -- using your metrics, worst case 60-90 threads! I think you need to reconsider the design here. Michael mentioned a nice pattern. The use of the queue really helps keep things under control and organized. A semaphore could also be employed to control number of threads created -- i.e. you could have a maximum number of threads allowed, but under smaller loads, you wouldn't necessarily have to create the maximum number if fewer ended up getting the job done -- i.e. your own pool size could be dynamic with a cap.
When using the thread-pool, I also find it more difficult to monitor the execution of threads from the pool in their performing the work. So, unless it's fire and forget, I am in favor of more controlled execution. I know you mentioned that your app exits after the 65K items are all completed. How are you monitoring you threads to determine if they have completed their work -- i.e. all queued workers are done. Are you monitoring the status of all items in the HashSets? I think by queuing your items up and having your own worker threads consume off that queue, you can gain more control. Albeit, this can come at the cost of more overhead in terms of signaling between threads to indicate when all items have been queued allowing them to exit.
You could also try using the CCR - Concurrency and Coordination Runtime. It's buried inside Microsoft Robotics Studio, but provides an excellent API for doing this sort of thing.
You'd just need to create a "Port" (essentially a queue), hook up a receiver (method that gets called when something is posted to it), and then post work items to it. The CCR handles the queue and the worker thread to run it on.
Here's a video on Channel9 about the CCR.
It's very high-performance and is even being used for non-Robotics stuff (Myspace.com uses it behind the scenese for their content-delivery network).
I would recommend looking into MSDN: Task Parallel Library - DataFlow. You can find examples of implementing Producer-Consumer in your case would be the database producing items to validate and the validation routine becomes the consumer.
Also recommend using ConcurrentDictionary<TKey, TValue> as a "Concurrent" hash set where you just populate the keys with no values :). You can potentially make your code lock-free.