I have a producer-consumer scenario in ASP.NET. I designed a Producer class, a Consumer class, and a class that holds the shared objects and is responsible for communication between Producer and Consumer; let's call it Mediator. Because I fork the execution path at start-up (in the parent object), one thread calls Producer.Start() and another thread calls Consumer.Start(), so I need to pass a reference to Mediator to both Producer and Consumer (via their constructors). Mediator is a smart class that will optimize many things, like the length of its inner queue, but for now consider it a circular blocking queue. Producer enqueues new objects into Mediator until the queue gets full, and then Producer blocks. Consumer dequeues objects from Mediator until there's nothing left in the queue. For signaling between threads, I implemented two methods in the Mediator class: Wait() and Pulse(). The code is something like this:
class Mediator
{
    private object _locker = new object();
    public void Wait()
    {
        lock (_locker)
            Monitor.Wait(_locker);   // releases _locker and blocks until pulsed
    }
    public void Pulse()
    {
        lock (_locker)
            Monitor.Pulse(_locker);  // wakes one waiting thread, if any
    }
}
// This is how the threads signal each other:
class Consumer
{
    void ConsumeNext()   // sketch of the consuming step; Mediator is the injected reference
    {
        object x;
        if (Mediator.TryDequeue(out x))
        {
            // Do something
        }
        else
        {
            Mediator.Wait();   // block until the producer pulses
        }
    }
}
Inside Mediator I use this.Pulse() every time something is enqueued or dequeued, so waiting threads are signaled and continue their work.
But I encounter deadlocks, and because I have never used this kind of design for signaling threads, I'm not sure whether something is wrong with the design or I'm doing something wrong elsewhere.
Thanks
There is not much code here to go on, but my best guess is that you have a lost-signal problem: if Mediator.Pulse is called before Mediator.Wait, then the signal gets lost and the waiter blocks forever, even though there is something in the queue. Here is the standard pattern for implementing a blocking queue.
public class BlockingQueue<T>
{
    private Queue<T> m_Queue = new Queue<T>();

    public void Enqueue(T item)
    {
        lock (m_Queue)
        {
            m_Queue.Enqueue(item);
            Monitor.Pulse(m_Queue);
        }
    }

    public T Dequeue()
    {
        lock (m_Queue)
        {
            while (m_Queue.Count == 0)
            {
                Monitor.Wait(m_Queue);
            }
            return m_Queue.Dequeue();
        }
    }
}
Notice how Monitor.Wait is only called when the queue is empty. Also notice how it is called in a while loop. This is because a Wait does not have priority over an Enter, so a new thread coming into Dequeue could take the last item even though a call to Wait is ready to return. Without the loop, a thread could attempt to remove an item from an empty queue.
If you can use .NET 4 your best bet would be to use BlockingCollection<T> (http://msdn.microsoft.com/en-us/library/dd267312.aspx) which handles queueing, dequeuing, and limits on queue length.
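A minimal sketch of that approach (the names are illustrative): a bounded BlockingCollection<T> blocks the producer when the queue is full and the consumer when it is empty:
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Demo
{
    static void Main()
    {
        var queue = new BlockingCollection<int>(boundedCapacity: 2);
        var consumer = Task.Factory.StartNew(() =>
        {
            // Blocks when empty; ends after CompleteAdding is called.
            foreach (int item in queue.GetConsumingEnumerable())
                Console.WriteLine("consumed " + item);
        });
        for (int i = 0; i < 5; i++)
            queue.Add(i);          // blocks while 2 items are already queued
        queue.CompleteAdding();    // lets the consumer finish
        consumer.Wait();
    }
}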
Nothing is wrong with the design.
The problem arises when you use Monitor.Wait() and Monitor.Pulse() and you don't know which thread is going to do its job first (producer or consumer). In that case, using an AutoResetEvent resolves the problem. Think of the consumer when it reaches the section where it should consume the data produced by the producer. Maybe it gets there before the producer pulses; then everything is OK. But what if the consumer gets there after the producer has already signaled? Then you encounter a deadlock, because the producer already called Monitor.Pulse() for that section and will not repeat it.
Using an AutoResetEvent, you can be sure the consumer waits there for the signal from the producer, and if the producer has already signaled before the consumer even reaches the section, the gate is open and the consumer continues.
It's still OK to use Monitor.Wait() and Monitor.Pulse() inside Mediator for signaling waiting threads.
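A minimal sketch of that handshake (names illustrative): unlike Monitor.Pulse, a Set that arrives before the WaitOne is remembered, so the consumer passes straight through:
using System;
using System.Threading;

class Handshake
{
    static readonly AutoResetEvent _signal = new AutoResetEvent(false);

    static void Main()
    {
        var consumer = new Thread(() =>
        {
            _signal.WaitOne();   // opens immediately if Set already happened
            Console.WriteLine("consumed");
        });
        _signal.Set();           // producer signals first...
        consumer.Start();        // ...and the consumer still gets through
        consumer.Join();
    }
}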
Is it possible that the deadlock is occurring because Pulse doesn't store any state? That means that if the Producer calls Pulse before the Consumer calls Wait, the Wait will block anyway. This is the note in the documentation for Monitor.Pulse.
Also, you should know that object x = new object(); would be extraneous here - an out call will initialize x, so the object created would simply fall out of scope with the TryDequeue call.
Difficult to tell with the code sample supplied.
Is the lock held elsewhere? Within Mediator?
Are the threads just parked on obtaining the lock and not on the actual Wait call?
Have you paused the threads in a debugger to see what the current state is?
Have you tried a simple test with just putting a simple single value on a queue and getting it to work? Or is Mediator pretty complex at this point?
Until a little more detail is available about the Mediator class and your producer class, this is wild guessing. It seems like some thread may be holding the lock when you don't expect it to. Once you pulse, you need to free the lock in whatever thread holds it by exiting the "lock" scope. So if somewhere in Mediator you hold the lock and then call Pulse, you need to exit the outermost scope where the lock is held, not just the one in Pulse.
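A small sketch of that point: Pulse only moves the waiter to the ready queue; the waiter cannot resume until the pulsing thread leaves the outermost lock scope:
lock (_locker)                  // outermost scope holding the lock
{
    // ... mutate shared state ...
    Monitor.Pulse(_locker);     // the waiter is only made ready here
}                               // the lock is released HERE; only now can the waiter resume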
Can you refactor to a normal producer/consumer queue? That could then handle enqueuing, dequeuing, and thread signaling in a single class, so there would be no need to pass around public locks. Dequeuing could then be handled via a delegate. I can post an example if you wish.
Related
I'm using a Pipelines-pattern implementation to decouple a message consumer from a producer, to avoid the slow-consumer issue.
If any exception occurs at the message-processing stage [1], the message is lost and never dispatched to the other service/layer [2]. How can I handle this in [3] so that the message is not lost and - just as important - so that the order of messages is not mixed up, i.e. the upper service/layer gets messages in the order they came in? I have an idea that involves another, intermediate queue, but it seems complex. Unfortunately, BlockingCollection<T> does not expose any analogue of the Queue.Peek() method, so I can't just read the next available message and only do a Dequeue() after successful processing.
private BlockingCollection<IMessage> messagesQueue;
// TPL Task does the following:
// listen for new messages and, as soon as any comes in, process it
foreach (var cachedMessage in
    messagesQueue.GetConsumingEnumerable(cancellation))
{
    const int maxRetries = 3;
    int retriesCounter = 0;
    bool isSent = false;
    // At this point the message has already been removed from messagesQueue
    while (!isSent && retriesCounter++ < maxRetries) // '<' (not '<='): exactly maxRetries attempts
    {
        try
        {
            // [1] Preprocess a message
            // [2] Dispatch to another service/layer
            clientProxyCallback.SendMessage(cachedMessage);
            isSent = true;
        }
        catch (Exception exception)
        {
            // [3]
            // logging
            if (!isSent && retriesCounter < maxRetries)
            {
                Thread.Sleep(NSeconds);
            }
        }
        if (!isSent && retriesCounter == maxRetries)
        {
            // just log, message is lost at this stage!
        }
    }
}
EDIT: Forgot to say this is an IIS-hosted WCF service which dispatches messages back to a Silverlight client WCF proxy via a client callback contract.
EDIT2: Below is how I would do this using Peek(). Am I missing something?
bool successfullySent = true;
try
{
    var item = queue.Peek();
    PreProcessItem(item);
    SendItem(item);
}
catch (Exception exception)
{
    successfullySent = false;
}
finally
{
    if (successfullySent)
    {
        // just remove already sent item from the queue
        queue.Dequeue();
    }
}
EDIT3: Surely I can use the old-style approach with a while loop, a bool flag, a Queue, and an AutoResetEvent, but I'm just wondering whether the same is possible using BlockingCollection and GetConsumingEnumerable(). A facility like Peek would be very helpful together with the consuming enumerable; without it, all the Pipeline pattern implementation examples built on the new stuff like BlockingCollection and GetConsumingEnumerable() look non-durable, and I would have to move back to the old approach.
You should consider an intermediate queue.
BlockingCollection<T> can't "peek" items because of its nature - there can be more than one consumer. One of them could peek an item, and another could take it - at that point the first one would try to take an item that has already been taken.
As Dennis says in his comment, BlockingCollection<T> provides a blocking wrapper to any implementor of the IProducerConsumerCollection<T> interface.
As you can see, IProducerConsumerCollection<T>, by design, does not define a Peek method or the other methods necessary to implement one. This means that BlockingCollection<T> cannot, as it stands, offer an analogue of Peek.
If you think about it, this greatly reduces the concurrency problems created by the questionable utility trade-off of a Peek implementation. How can you consume without consuming? To Peek concurrently you would have to lock the head of the collection until the Peek operation was completed, which I, and the designers of BlockingCollection<T>, view as sub-optimal. I think it would also be messy and difficult to implement, requiring some sort of disposable peek context.
If you consume a message and its consumption fails, you will have to deal with it. You could add it to a failures queue, re-add it to the normal processing queue for a future retry, just log its failure for posterity, or take some other action appropriate to your context.
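For example, a hedged sketch of the "failures queue" option, reusing the question's messagesQueue, cancellation, and clientProxyCallback (the rest is illustrative):
var failures = new ConcurrentQueue<IMessage>();
foreach (var msg in messagesQueue.GetConsumingEnumerable(cancellation))
{
    try
    {
        clientProxyCallback.SendMessage(msg);
    }
    catch (Exception)
    {
        failures.Enqueue(msg);   // consumed but not lost; inspect or retry later
    }
}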
If you don't want to consume the messages concurrently, then there is no need to use BlockingCollection<T>, since you don't need concurrent consumption. You could use ConcurrentQueue<T> directly; you'll still get synchronization of adds, and you can use TryPeek safely since you control a single consumer. If consumption fails, you could stop consumption with an infinite retry loop if you desire, although I suggest this requires some design thought.
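If you do go the single-consumer ConcurrentQueue<T> route, a minimal sketch might look like this (SendItem comes from EDIT2 above; the back-off values are illustrative, and this is only safe with exactly one consumer):
var queue = new ConcurrentQueue<IMessage>();
IMessage item;
while (true)
{
    if (!queue.TryPeek(out item))
    {
        Thread.Sleep(50);            // queue empty: back off (or wait on a signal)
        continue;
    }
    try
    {
        SendItem(item);              // may throw; the item is still in the queue
        queue.TryDequeue(out item);  // remove only after a successful send
    }
    catch (Exception)
    {
        Thread.Sleep(1000);          // transient failure: retry; order is preserved
    }
}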
BlockingCollection<T> is a wrapper around IProducerConsumerCollection<T>, which is more generic than e.g. ConcurrentQueue and gives the implementer the freedom of not having to implement a (Try)Peek method.
However, you can always call TryPeek on the underlying queue directly:
ConcurrentQueue<T> useOnlyForPeeking = new ConcurrentQueue<T>();
BlockingCollection<T> blockingCollection = new BlockingCollection<T>(useOnlyForPeeking);
...
useOnlyForPeeking.TryPeek(...)
Note, however, that you must not modify the queue via useOnlyForPeeking, otherwise blockingCollection will get confused and may throw InvalidOperationExceptions at you. But I'd be surprised if calling the non-modifying TryPeek on this concurrent data structure were an issue.
You could use ConcurrentQueue<T> instead; it has a TryDequeue() method.
ConcurrentQueue<T>.TryDequeue(out T result) tries to remove and return the object at the beginning of the concurrent queue; it returns true if an element was removed and returned from the beginning of the ConcurrentQueue successfully.
So there is no need to check with a Peek first.
TryDequeue() is thread safe:
ConcurrentQueue<T> handles all synchronization internally. If two threads call TryDequeue(T) at precisely the same moment, neither operation is blocked.
As far as I understand, it returns false only if the queue is empty:
If the queue was populated with code such as q.Enqueue("a"); q.Enqueue("b"); q.Enqueue("c"); and two threads concurrently try to dequeue an element, one thread will dequeue a and the other thread will dequeue b. Both calls to TryDequeue(T) will return true, because they were both able to dequeue an element. If each thread goes back to dequeue an additional element, one of the threads will dequeue c and return true, whereas the other thread will find the queue empty and will return false.
http://msdn.microsoft.com/en-us/library/dd287208%28v=vs.100%29.aspx
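A minimal sketch of that pattern (Process is a hypothetical handler standing in for the question's dispatch step):
var queue = new ConcurrentQueue<IMessage>();
IMessage message;
while (queue.TryDequeue(out message))   // returns false only when the queue is empty
{
    Process(message);                   // hypothetical processing step
}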
UPDATE
Perhaps the easiest option would be to use the TaskScheduler class. With it you can wrap all your processing work into the queue's items and simplify the implementation of synchronization.
I've reused the example producer/consumer queue from Albahari's C# in a Nutshell (http://www.albahari.com/threading/part5.aspx#_BlockingCollectionT), and a colleague remarked:
"Why isn't the Dispose called on the BlockingCollection in the Dispose of the collection?"
I couldn't find an answer, and the only reason I can come up with is that the remaining workload of the queue wouldn't be processed. However, when I'm disposing the queue, why wouldn't it stop processing?
Besides "Why shouldn't you Dispose the BlockingCollection?", I've also got a second question: "Does it do any harm if you don't dispose a BlockingCollection?" I suppose that when you are spawning/disposing a lot of producer/consumer queues it causes problems (not that I want that, just for the sake of knowing).
According to What does BlockingCollection.Dispose actually do?, BlockingCollection contains two wait handles, so not calling Dispose will give you some problems. Thanks ken2k for pointing this out.
The code I'm talking about:
public class PCQueue : IDisposable
{
    BlockingCollection<Action> _taskQ = new BlockingCollection<Action>();

    public PCQueue (int workerCount)
    {
        // Create and start a separate Task for each consumer:
        for (int i = 0; i < workerCount; i++)
            Task.Factory.StartNew (Consume);
    }

    public void Dispose() { _taskQ.CompleteAdding(); }

    public void EnqueueTask (Action action) { _taskQ.Add (action); }

    void Consume()
    {
        // This sequence that we’re enumerating will block when no elements
        // are available and will end when CompleteAdding is called.
        foreach (Action action in _taskQ.GetConsumingEnumerable())
            action(); // Perform task.
    }
}
Because that would be a bug. The collection cannot be disposed until all the consumer threads have exited. If that's not interlocked, those threads would bomb with an exception. The class has no awareness of which consumer threads might be pulling from the collection, so it cannot reasonably know when it is safe to dispose. All it can do is prevent any more objects from being added by the producer; that's reasonable.
This is a common problem with threads: safely disposing requires knowing when the thread is complete, which often defeats the point of using threads in the first place - you don't want to wait until a thread ends. This is most visible in the Thread class itself: it consumes five native operating system handles but doesn't have a Dispose() method. They have to be released by the finalizer. Same here.
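If a blocking Dispose is acceptable in your situation, one hedged sketch is to keep the consumer tasks around (an assumed _workers array filled in the constructor, which the original PCQueue does not have) and wait for them:
public void Dispose()
{
    _taskQ.CompleteAdding();   // stop accepting new work
    Task.WaitAll(_workers);    // consumers exit once the queue is drained
    _taskQ.Dispose();          // safe now: no thread is still using it
}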
I was reading the AutoResetEvent documentation on MSDN, and the following warning kind of bothers me:
"Important:
There is no guarantee that every call to the Set method will release a thread. If two calls are too close together, so that the second call occurs before a thread has been released, only one thread is released. It is as if the second call did not happen. Also, if Set is called when there are no threads waiting and the AutoResetEvent is already signaled, the call has no effect."
But this warning basically kills the very reason to have such a thread synchronization technique. For example, I have a list which will hold jobs, and there is only one producer which will add jobs to the list. I have consumers (more than one) waiting to get a job from the list... something like this:
Producer:
void AddJob(Job j)
{
    lock (qLock)
    {
        jobQ.Enqueue(j);
    }
    newJobEvent.Set(); // newJobEvent is an AutoResetEvent
}
Consumer:
void Run()
{
    while (canRun)
    {
        newJobEvent.WaitOne();
        IJob job = null;
        lock (qLock)
        {
            job = jobQ.Dequeue();
        }
        // process job
    }
}
If the above warning is true, then if I enqueue two jobs very quickly, won't only one thread pick up a job? I was under the assumption that Set is atomic, that is, it does the following:
Set the event
If threads are waiting, pick one thread to wake up
Reset the event
Run the selected thread
So I am basically confused about the warning on MSDN. Is it a valid warning?
Even if the warning isn't true and Set is atomic, why would you use an AutoResetEvent here? Let's say some producers queue up 3 jobs in a row and there's one consumer. After processing the 2nd job, the consumer blocks and never processes the third.
I would use a ReaderWriterLockSlim for this type of synchronization. Basically, you need multiple producers to be able to have write locks, but you don't want consumers to lock out producers for a long time while they are only reading the queue size.
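As a hedged aside (not from the answers above): a counting primitive avoids the coalescing problem altogether, since every Release is remembered rather than collapsed into one signaled state. A minimal sketch using SemaphoreSlim (.NET 4), reusing the question's names:
private readonly SemaphoreSlim jobsAvailable = new SemaphoreSlim(0);

void AddJob(Job j)
{
    lock (qLock)
    {
        jobQ.Enqueue(j);
    }
    jobsAvailable.Release();   // counted, never lost, unlike AutoResetEvent.Set
}

void Run()
{
    while (canRun)
    {
        jobsAvailable.Wait();  // consumes exactly one count; blocks if zero
        IJob job = null;
        lock (qLock)
        {
            job = jobQ.Dequeue();
        }
        // process job
    }
}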
The message on MSDN is indeed valid. What happens internally is something like this:
Thread A waits for the event
Thread B sets the event
[If thread A is in a spin-wait]
[yes] Thread A detects that the event is set, unsets it, and resumes its work
[no] The event will tell thread A to wake up; once woken, thread A will unset the event and resume its work.
Note that the internal logic is not synchronous, since Thread B doesn't wait for Thread A to continue its business. You can make this synchronous by introducing a temporary ManualResetEvent that Thread A has to signal once it continues its work and on which Thread B has to wait. This is not done by default due to the inner workings of the Windows threading model. I guess the documentation is misleading, but correct in saying that the Set method releases only one waiting thread.
Alternatively, I would suggest you look at the BlockingCollection class in the System.Collections.Concurrent namespace of the BCL, introduced in .NET 4.0, which does exactly what you are trying to do.
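A minimal sketch of that suggestion, reusing the question's names (CompleteAdding would be called wherever you currently clear canRun):
BlockingCollection<IJob> jobQ = new BlockingCollection<IJob>();

void AddJob(IJob j)
{
    jobQ.Add(j);   // no event bookkeeping needed
}

void Run()
{
    // Blocks when the queue is empty; ends after CompleteAdding is called.
    foreach (IJob job in jobQ.GetConsumingEnumerable())
    {
        // process job
    }
}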
Given the following code snippet (found somewhere while learning threading):
public class BlockingQueue<T>
{
    private readonly object sync = new object();
    private readonly Queue<T> queue;

    public BlockingQueue()
    {
        queue = new Queue<T>();
    }

    public void Enqueue(T item)
    {
        lock (sync)
        {
            queue.Enqueue(item);
            Monitor.PulseAll(sync);
        }
    }

    public T Dequeue()
    {
        lock (sync)
        {
            while (queue.Count == 0)
                Monitor.Wait(sync);
            return queue.Dequeue();
        }
    }
}
What I want to understand is:
Why is there a while loop?
while (queue.Count == 0)
    Monitor.Wait(sync);
and what is wrong with
if (queue.Count == 0)
    Monitor.Wait(sync);
In fact, whenever I see similar code it uses a while loop. Can anyone please help me understand the use of one over the other?
Thank you.
You need to understand what Pulse, PulseAll, and Wait are doing. The Monitor maintains two queues: the waiting queue and the ready queue. When a thread calls Wait it is moved into the waiting queue. When a thread calls Pulse it moves one and only one thread from the waiting queue to the ready queue. When a thread calls PulseAll it moves all threads from the waiting queue to the ready queue. Threads in the ready queue are eligible to reacquire the lock at any moment, but only after the current holder releases it of course.
Based on this knowledge it is fairly easy to understand why you must recheck the queue count when using PulseAll. It is because all dequeueing threads will eventually wake and will want to attempt to extract an item from queue. But, what if there is only one item in the queue to begin with? Obviously, we must recheck the queue count to avoid dequeueing an empty queue.
So what would be the conclusion if you had used Pulse instead of PulseAll? There would still be a problem with the simple if check. The reason is because a thread from the ready queue is not necessarily going to be the next thread to acquire the lock. That is because the Monitor does not give preference to a Wait call above an Enter call.
The while loop is a fairly standard pattern when using Monitor.Wait. This is because pulsing a thread does not have semantic meaning by itself. It is only a signal that the lock state has changed. When threads wake up after blocking on Wait they should recheck the same condition that was originally used to block the thread to see if the thread can now proceed. Sometimes it cannot and so it should block some more.
The best rule of thumb here is that if there is doubt about whether to use an if check or a while check then always choose a while loop because it is safer. In fact, I would take this to the extreme and suggest to always use a while loop because there is no inherent advantage in using the simpler if check and because the if check is almost always the wrong choice anyway. A similar rule holds for choosing whether to use Pulse or PulseAll. If there is doubt about which one to use then always choose PulseAll.
You have to keep checking whether the queue is still empty. Using only an if would check it once, wait for a while, then dequeue. What if at that time the queue is still empty? BANG! Queue underflow error...
With an if, the queue.Count == 0 condition is not checked again when something releases the lock, and you may get a queue underflow error. So, because of concurrency, you have to re-check the condition every time you wake up; re-checking in a loop like this is sometimes called spinning.
Why it could go wrong on Unix is because of spurious wake-ups, possibly caused by OS signals. This is a side effect that is not guaranteed never to happen on Windows either. It is not a legacy; it is how the OS works - if Monitors are implemented in terms of condition variables, that is.
Definition: a spurious wake-up is a re-scheduling of a sleeping thread at a condition-variable wait site that was not triggered by an action coming from the current program's threads (like Pulse()).
This inconvenience could be masked in managed languages, e.g. by the queues: before returning from the Wait() function, the framework could check that this running thread is actually being requested to run; if it does not find itself in a run queue, it can go back to sleep, hiding the problem.
if (queue.Count == 0)
will do.
Using the while-loop pattern in a "wait for and check condition" context is a legacy leftover, I think, because non-Windows, non-.NET monitor variables can be triggered without an actual Pulse.
In .NET, your private monitor variable cannot be triggered without the queue being filled, so you don't need to worry about queue underflow after the monitor wait. But it is really not a bad habit to use a while loop to wait for and check a condition.
I have developed a generic producer-consumer queue which pulses via Monitor in the following way:
the enqueue:
public void EnqueueTask(T task)
{
    _workerQueue.Enqueue(task);
    Monitor.Pulse(_locker);
}
the dequeue:
private T Dequeue()
{
    T dequeueItem;
    if (_workerQueue.Count > 0)
    {
        _workerQueue.TryDequeue(out dequeueItem);
        if (dequeueItem != null)
            return dequeueItem;
    }
    while (_workerQueue.Count == 0)
    {
        Monitor.Wait(_locker);
    }
    _workerQueue.TryDequeue(out dequeueItem);
    return dequeueItem;
}
The wait section produces the following SynchronizationLockException:
"object synchronization method was called from an unsynchronized block of code"
Do I need to synchronize it? Why? Is it better to use ManualResetEvent, or the Slim version from .NET 4.0?
Yes, the current thread needs to "own" the monitor in order to call either Wait or Pulse, as documented. (So you'll need to lock for Pulse as well.) I don't know the details for why it's required, but it's the same in Java. I've usually found I'd want to do that anyway though, to make the calling code clean.
Note that Wait releases the monitor itself, then waits for the Pulse, then reacquires the monitor before returning.
As for using ManualResetEvent or AutoResetEvent instead - you could, but personally I prefer using the Monitor methods unless I need some of the other features of wait handles (such as atomically waiting for any/all of multiple handles).
From the MSDN description of Monitor.Wait():
Releases the lock on an object and blocks the current thread until it reacquires the lock.
The 'releases the lock' part is the problem: the object isn't locked. You are treating the _locker object as though it were a WaitHandle. Doing your own locking design that's provably correct is a form of black magic best left to our medicine men, Jeffrey Richter and Joe Duffy. But I'll give this one a shot:
public class BlockingQueue<T> {
    private Queue<T> queue = new Queue<T>();
    public void Enqueue(T obj) {
        lock (queue) {
            queue.Enqueue(obj);
            Monitor.Pulse(queue);
        }
    }
    public T Dequeue() {
        T obj;
        lock (queue) {
            while (queue.Count == 0) {
                Monitor.Wait(queue);
            }
            obj = queue.Dequeue();
        }
        return obj;
    }
}
In almost any practical producer/consumer scenario you will want to throttle the producer so it cannot fill the queue without bound. Check Duffy's BoundedBuffer design for an example. If you can afford to move to .NET 4.0, then you definitely want to take advantage of its ConcurrentQueue class; it has lots more black magic with low-overhead locking and spin-waiting.
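A hedged sketch of such throttling, extending the queue above with an assumed capacity field: Enqueue now blocks while the queue is full, and PulseAll is used because producers and consumers wait on the same monitor:
public void Enqueue(T obj) {
    lock (queue) {
        while (queue.Count >= capacity)
            Monitor.Wait(queue);      // wait for a consumer to make room
        queue.Enqueue(obj);
        Monitor.PulseAll(queue);      // wake any waiting consumers
    }
}
public T Dequeue() {
    lock (queue) {
        while (queue.Count == 0)
            Monitor.Wait(queue);
        T obj = queue.Dequeue();
        Monitor.PulseAll(queue);      // wake any producers waiting for room
        return obj;
    }
}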
The proper way to view Monitor.Wait and Monitor.Pulse/PulseAll is not as providing a means of waiting, but rather (for Wait) as a means of letting the system know that the code is in a waiting loop which can't exit until something of interest changes, and (for Pulse/PulseAll) as a means of letting the system know that code has just changed something that might satisfy the exit condition of some other thread's waiting loop. One should be able to replace all occurrences of Wait with Sleep(0) and still have the code work correctly (even if much less efficiently, as a result of spending CPU time repeatedly testing conditions that haven't changed).
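To make that claim concrete, here is a conceptual sketch of a Wait loop spelled out as the (inefficient) polling loop it is equivalent to; ConditionSatisfied is a hypothetical predicate over the shared state:
Monitor.Enter(sync);
try
{
    while (!ConditionSatisfied())
    {
        Monitor.Exit(sync);    // release so another thread can change the state
        Thread.Sleep(0);       // Monitor.Wait would block efficiently here instead
        Monitor.Enter(sync);   // reacquire before retesting the condition
    }
    // the condition holds here, under the lock
}
finally
{
    Monitor.Exit(sync);
}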
For this mechanism to work, it is necessary to avoid the possibility of the following sequence:
The code in the wait loop tests the condition when it isn't satisfied.
The code in another thread changes the condition so that it is satisfied.
The code in that other thread pulses the lock (which nobody is yet waiting on).
The code in the wait loop performs a Wait since its condition wasn't satisfied.
The Wait method requires that the waiting thread have a lock, since that's the only way it can be sure that the condition it's waiting upon won't change between the time it's tested and the time the code performs the Wait. The Pulse method requires a lock because that's the only way it can be sure that if another thread has "committed" itself to performing a Wait, the Pulse won't occur until after the other thread actually does so. Note that using Wait within a lock doesn't guarantee that it's being used correctly, but there's no way that using Wait outside a lock could possibly be correct.
The Wait/Pulse design actually works reasonably well if both sides cooperate. The biggest weaknesses of the design, IMHO, are (1) there's no mechanism for a thread to wait until any of a number of objects is pulsed; (2) even if one is "shutting down" an object such that all future wait loops should exit immediately (probably by checking an exit flag), the only way to ensure that any Wait to which a thread has committed itself will get a Pulse is to acquire the lock, possibly waiting indefinitely for it to become available.