I have to develop a multithreaded application in which multiple threads each generate a custom event log that needs to be saved in a queue (not Microsoft MSMQ).
Another thread will read the log data from the queue, combine it with certain information, and save the log information to a file. Basically we are implementing the multiple-producer, single-consumer paradigm.
Can anybody provide suggestions on how to implement this in C++ or C#?
Thanks,
This kind of thing is very easy to do using the BlockingCollection<T> defined in System.Collections.Concurrent.
Basically, you create your queue so that all threads can access it:
BlockingCollection<LogRecord> LogQueue = new BlockingCollection<LogRecord>();
Each producer adds items to the queue:
while (!Shutdown)
{
    LogRecord rec = CreateLogRecord(); // however that's done
    LogQueue.Add(rec);
}
And the consumer does something similar:
while (!Shutdown)
{
    LogRecord rec = LogQueue.Take();
    // process the record
}
By default, BlockingCollection uses a ConcurrentQueue<T> as the backing store. The ConcurrentQueue takes care of thread synchronization, and the BlockingCollection does a non-busy wait when trying to take an item. That is, if the consumer calls Take when there are no items in the queue, it does a non-busy wait (no spinning or polling) until an item is available.
You can use a synchronized queue (if you have .NET 3.5 or older code) or, even better, the new ConcurrentQueue<T>!
What you are planning is a classic producer-consumer queue with a thread consuming the items on the queue to do some work. This can be wrapped into a higher-level construct called an "actor" or "active object".
Basically this wraps the queue and the thread that consumes the items into a single class. The other threads call asynchronous methods on this class which put the messages on the queue to be processed by the actor's thread. In your case the class could have a single method, writeData, which stores the data in the queue and triggers the condition variable to notify the actor thread that there is something in the queue. The actor thread checks whether there is any data in the queue and, if not, waits on the condition variable.
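In C#, a minimal sketch of such an actor could lean on the BlockingCollection<LogRecord> from the earlier answer instead of a hand-rolled condition variable; the LogRecord type and the WriteToFile helper are placeholders:

using System.Collections.Concurrent;
using System.Threading;

public class LogActor
{
    // The queue and the thread are private; callers only ever see WriteData().
    private readonly BlockingCollection<LogRecord> queue = new BlockingCollection<LogRecord>();
    private readonly Thread worker;

    public LogActor()
    {
        worker = new Thread(() =>
        {
            // GetConsumingEnumerable blocks while the queue is empty and
            // ends once CompleteAdding() has been called.
            foreach (var record in queue.GetConsumingEnumerable())
                WriteToFile(record); // placeholder for the file-writing logic
        });
        worker.IsBackground = true;
        worker.Start();
    }

    // Called by any producer thread; returns immediately.
    public void WriteData(LogRecord record)
    {
        queue.Add(record);
    }

    // Call on shutdown to let the worker drain the queue and exit.
    public void Shutdown()
    {
        queue.CompleteAdding();
        worker.Join();
    }

    private void WriteToFile(LogRecord record) { /* ... */ }
}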
Here is a good article on the concept:
http://www.drdobbs.com/go-parallel/article/showArticle.jhtml;jsessionid=UTEXJOTLP0YDNQE1GHPSKH4ATMY32JVN?articleID=225700095
I've read a few other similar-but-not-the-same links trying to find some answers:
How to consume a BlockingCollection<T> in batches
However, (in the above link) not using GetConsumingEnumerable seems fishy.
What is the correct method to effectively block producers while the consumer (should be singular) empties the collection?
[We want to do batch-processing because each batch does a web service call which would be a bottle neck if every single message/item needed its own call. Batching the messages/items is the solution for this bottleneck.]
Ideally:
1) Receive message
2) New producer task to push into collection
3) When the collection is 'full' (some arbitrary limit), block all producer(s), start a new consumer task to consume ALL of the collection, then unblock the producer(s).
In other words, I want (parallel producers) xor (single consumer) acting on the collection at any time.
Seems like this should have been done before, but I can't seem to find a code snippet that specifically acts this way.
Thanks for any help.
Using this model all of the work is entirely serialized, which is to say you never have more than one "thing" working at a time: either the producer is working, or the consumer is. Because of this, you don't really need a collection that is manipulated by both a producer and a consumer; instead you can have a producer that produces batches in a traditional collection that the consumer consumes when each batch is done. It could look something like this:
public Task<List<Thing>> Produce(Message message)
{
    //...
}

public Task Consume(List<Thing> data)
{
    //...
}

public async Task MessageReceived(Message message)
{
    while (HaveMoreBatches(message))
    {
        await Consume(await Produce(message));
    }
}
This lets you produce a batch, then consume it, then produce another batch, then consume it, etc. until there are no more batches to produce.
According to your somewhat vague description, I believe a double buffer is what you want.
Simply create two buffers. Producers write into one until it gets full or a timer ticks; then it is "swapped" for the second one and producers start writing into the new one. The consumer then starts reading the first, now-full buffer.
This allows the producers and the consumer to run at the same time, and it makes sure the consumer handles all previously created work as a batch before repeating the loop.
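A rough sketch of that swap in C#; the capacity trigger and the names are illustrative, and the timer-based swap mentioned above is left out for brevity:

using System.Collections.Generic;

public class DoubleBuffer<T>
{
    private readonly object gate = new object();
    private readonly int capacity;
    private List<T> front;   // producers append here

    public DoubleBuffer(int capacity)
    {
        this.capacity = capacity;
        front = new List<T>(capacity);
    }

    // Producers call this; returns true when the buffer has reached capacity,
    // which is the consumer's cue to swap.
    public bool Add(T item)
    {
        lock (gate)
        {
            front.Add(item);
            return front.Count >= capacity;
        }
    }

    // The consumer hands in an empty buffer and receives the full one to
    // process; producers immediately continue writing into the empty one.
    public List<T> Swap(List<T> empty)
    {
        lock (gate)
        {
            List<T> full = front;
            front = empty;
            return full;
        }
    }
}

The consumer keeps a spare list: when Add reports the buffer is full, it calls Swap(spare), processes the returned batch, clears it, and keeps it as the spare for the next round.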
I'm using a Pipeline-pattern implementation to decouple the message consumer from the producer and avoid the slow-consumer issue.
If any exception occurs during the message-processing stage [1], the message is lost and never dispatched to the other service/layer [2]. How can I handle such an issue in [3] so that the message is not lost and, just as important, the order of messages is not mixed up, so the upper service/layer gets messages in the order they came in? I have an idea that involves another intermediate queue, but that seems complex. Unfortunately BlockingCollection<T> does not expose any analogue of the Queue.Peek() method, otherwise I could just read the next available message and, only after successful processing, Dequeue() it.
private BlockingCollection<IMessage> messagesQueue;

// TPL Task does the following:
// listen for new messages and, as soon as any comes in, process it
foreach (var cachedMessage in
         messagesQueue.GetConsumingEnumerable(cancellation))
{
    const int maxRetries = 3;
    int retriesCounter = 0;
    bool isSent = false;

    // At this point the message has already been removed from messagesQueue
    while (!isSent && retriesCounter++ <= maxRetries)
    {
        try
        {
            // [1] Preprocess the message
            // [2] Dispatch to the other service/layer
            clientProxyCallback.SendMessage(cachedMessage);
            isSent = true;
        }
        catch (Exception exception)
        {
            // [3]
            // logging
            if (!isSent && retriesCounter < maxRetries)
            {
                Thread.Sleep(NSeconds);
            }
        }

        if (!isSent && retriesCounter == maxRetries)
        {
            // just log; the message is lost at this stage!
        }
    }
}
EDIT: Forgot to say this is an IIS-hosted WCF service which dispatches messages back to a Silverlight client WCF proxy via a client callback contract.
EDIT2: Below is how I would do this using Peek(). Am I missing something?
bool successfullySent = true;
try
{
    var item = queue.Peek();
    PreProcessItem(item);
    SendItem(item);
}
catch (Exception exception)
{
    successfullySent = false;
}
finally
{
    if (successfullySent)
    {
        // just remove the already sent item from the queue
        queue.Dequeue();
    }
}
EDIT3: Surely I could use the old-style approach with a while loop, a bool flag, a Queue and an AutoResetEvent, but I am just wondering whether the same is possible using BlockingCollection and GetConsumingEnumerable(). I think a facility like Peek would be
very helpful when used together with a consuming enumerable; otherwise all the Pipeline-pattern implementation examples built on the new stuff like BlockingCollection and GetConsumingEnumerable() look not durable, and I would have to move back to the old approach.
You should consider an intermediate queue.
BlockingCollection<T> can't "peek" items because of its nature - there can be more than one consumer. One of them could peek an item, and another one could take it - hence, the first one would then try to take an item that has already been taken.
As Dennis says in his comment, BlockingCollection<T> provides a blocking wrapper to any implementor of the IProducerConsumerCollection<T> interface.
As you can see, IProducerConsumerCollection<T>, by design, does not define a Peek method or the other methods necessary to implement one. This means that BlockingCollection<T> cannot, as it stands, offer an analogue of Peek.
If you think about it, this greatly reduces the concurrency problems created by the trade-off a Peek implementation would require. How can you consume without consuming? To Peek concurrently you would have to lock the head of the collection until the Peek operation completed, which I, and the designers of BlockingCollection<T>, view as sub-optimal. I think it would also be messy and difficult to implement, requiring some sort of disposable peek context.
If you consume a message and its consumption fails, you will have to handle it. You could add it to a failures queue, re-add it to the normal processing queue for a future retry, log its failure for posterity, or take some other action appropriate to your context.
If you don't want to consume the messages concurrently then there is no need to use BlockingCollection<T>, since you don't need concurrent consumption. You could use ConcurrentQueue<T> directly; you'll still get synchronization of adds, and you can use TryPeek safely since you control a single consumer. If consumption fails, you could stop consumption with an infinite retry loop if you desire, although I suggest this requires some design thought.
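A sketch of what that single-consumer loop might look like, assuming messagesQueue is replaced by a ConcurrentQueue<IMessage> named queue and reusing PreProcessItem, clientProxyCallback and NSeconds from the question; the message is only dequeued after it has been dispatched successfully, so nothing is lost and order is preserved:

// Single consumer: TryPeek is safe because no one else dequeues.
IMessage message;
while (queue.TryPeek(out message))
{
    try
    {
        PreProcessItem(message);                  // [1]
        clientProxyCallback.SendMessage(message); // [2]
        queue.TryDequeue(out message);            // remove only after success
    }
    catch (Exception)
    {
        // [3] log and retry later; the message is still at the head of the
        // queue, so ordering is not disturbed
        Thread.Sleep(NSeconds);
    }
}

In a real service this loop would sit inside whatever wake-up mechanism you use when new messages arrive; it is only meant to show the peek, send, then dequeue ordering.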
BlockingCollection<T> is a wrapper around IProducerConsumerCollection<T>, which is more generic than e.g. ConcurrentQueue and gives the implementer the freedom of not having to implement a (Try)Peek method.
However, you can always call TryPeek on the underlying queue directly:
ConcurrentQueue<T> useOnlyForPeeking = new ConcurrentQueue<T>();
BlockingCollection<T> blockingCollection = new BlockingCollection<T>(useOnlyForPeeking);
...
useOnlyForPeeking.TryPeek(...)
Note however that you must not modify your queue via useOnlyForPeeking, otherwise blockingCollection will get confused and may throw InvalidOperationExceptions at you, but I'd be surprised if calling the non-modifying TryPeek on this concurrent data structure would be an issue.
You could use ConcurrentQueue<T> instead, it has TryDequeue() method.
ConcurrentQueue<T>.TryDequeue(out T result) tries to remove and return the object at the beginning of the concurrent queue; it returns true if an element was removed and returned from the beginning of the ConcurrentQueue successfully.
So, no need to check a Peek first.
TryDequeue() is thread safe:
ConcurrentQueue<T> handles all synchronization internally. If two threads call TryDequeue(T) at precisely the same moment, neither operation is blocked.
As far as I understand, it returns false only if the queue is empty:
If the queue was populated with code such as q.Enqueue("a"); q.Enqueue("b"); q.Enqueue("c"); and two threads concurrently try to dequeue an element, one thread will dequeue a and the other thread will dequeue b. Both calls to TryDequeue(T) will return true, because they were both able to dequeue an element. If each thread goes back to dequeue an additional element, one of the threads will dequeue c and return true, whereas the other thread will find the queue empty and will return false.
http://msdn.microsoft.com/en-us/library/dd287208%28v=vs.100%29.aspx
UPDATE
Perhaps the easiest option would be using the TaskScheduler class. With it you can wrap all your processing tasks into the queue's items and simplify the implementation of synchronisation.
I am working on a class library that logs audit details of a web application to several types of data sources (file, XML, database) based on policies defined in the web configuration file.
My Audit log method has a signature similar to this:
public static void LogInfo(User user, Module module, List<Object> lst);
The web application uses this method to log important details such as warnings, errors and even exception details.
Since there are more than 700 calls to these methods in a single workflow, I thought of making them asynchronous. I used the simple method from the ThreadPool class called QueueUserWorkItem:
ThreadPool.QueueUserWorkItem(o => LogInfo(user, module, lst));
but this does not ensure that the work items execute in the order they were queued. Even though all my information was logged, the ordering was messed up: in my text file the logs were not in the order in which they were called.
Is there a way I can control the ordering of the threads being called using QueueUserWorkItem?
I don't think you can specify ordering when using QueueUserWorkItem.
To run the logging in parallel (on some background thread), you could use ConcurrentQueue<T>. This is a thread-safe collection that can be accessed from multiple threads. You could create one work item (or a thread) that reads elements from the collection and writes them to a file. Your main application would add items to the collection. The fact that you're adding items to the collection from a single thread should guarantee that they will be read in the right order.
To keep things simple, you can store Action values in the queue:
ConcurrentQueue<Action> logOperations = new ConcurrentQueue<Action>();
// To add logging operation from main thread:
logOperations.Enqueue(() => LogInfo(user, module, lst));
The background task can just take Actions from the queue and run them:
// Start this when you create the `logOperations` collection
ThreadPool.QueueUserWorkItem(o => {
    Action op;
    // Repeatedly take log operations & run them
    while (true)
    {
        if (logOperations.TryDequeue(out op)) op();
        // Nothing queued right now; yield briefly before checking again
        else Thread.Sleep(10);
    }
});
If you need to stop the background processor (that writes data to the log), you can create a CancellationTokenSource and also end the while loop when the token is cancelled (by the main thread). This can be checked using the IsCancellationRequested property (see MSDN).
One way of solving this would be to put your data in a queue and then have a single task picking items off that queue and writing them in order. If you are using .NET 4.0 you could use ConcurrentQueue, which is thread-safe; otherwise a simple Queue with proper locks would work as well.
The thread consuming the queue could then periodically check for any elements inside the queue and log each one of them. This way the lengthy operation (logging) lives in its own thread, whereas in the main thread you only do simple adds.
I have a producer-consumer scenario in ASP.NET. I designed a Producer class, a Consumer class, and a class for holding the shared objects and responsible for communication between Producer and Consumer; let's call it Mediator. Because I fork the execution path at start-up (in the parent object), one thread calls Producer.Start() and another thread calls Consumer.Start(), so I need to pass a reference to Mediator to both Producer and Consumer (via their constructors).
Mediator is a smart class which will optimize many things, like the length of its inner queue, but for now consider it a circular blocking queue. Producer enqueues new objects into Mediator until the queue gets full, and then Producer blocks. Consumer dequeues objects from Mediator until there's nothing in the queue. For signaling between threads, I implemented two methods in the Mediator class: Wait() and Pulse(). The code is something like this:
class Mediator
{
    private object _locker = new object();

    public void Wait()
    {
        lock (_locker)
            Monitor.Wait(_locker);
    }

    public void Pulse()
    {
        lock (_locker)
            Monitor.Pulse(_locker);
    }
}

// This is how the threads signal each other (consumer side):
class Consumer
{
    void Consume() // illustrative wrapper; this runs in the consumer's loop
    {
        object x;
        if (Mediator.TryDequeue(out x))
        {
            // Do something
        }
        else
        {
            Mediator.Wait();
        }
    }
}
Inside Mediator I use this.Pulse() every time something is enqueued or dequeued, so waiting threads are signaled and continue their work.
But I encounter deadlocks, and because I have never used this kind of design for signaling threads, I'm not sure whether something is wrong with the design or I'm doing something wrong elsewhere.
Thanks
There is not much code here to go on, but my best guess is that you have a lost-wakeup problem: if Mediator.Pulse is called before Mediator.Wait, then the signal gets lost even though there is something in the queue, and the waiting thread never wakes up. Here is the standard pattern for implementing a blocking queue.
public class BlockingQueue<T>
{
    private Queue<T> m_Queue = new Queue<T>();

    public void Enqueue(T item)
    {
        lock (m_Queue)
        {
            m_Queue.Enqueue(item);
            Monitor.Pulse(m_Queue);
        }
    }

    public T Dequeue()
    {
        lock (m_Queue)
        {
            while (m_Queue.Count == 0)
            {
                Monitor.Wait(m_Queue);
            }
            return m_Queue.Dequeue();
        }
    }
}
Notice how Monitor.Wait is only called when the queue is empty. Also notice how it is called in a while loop. This is because a Wait does not have priority over an Enter, so a new thread coming into Dequeue could take the last item even though a call to Wait is ready to return. Without the loop, a thread could attempt to remove an item from an empty queue.
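A quick usage sketch against the question's scenario; WorkItem and Process are placeholders, and BlockingQueue<T> is the class above, so the producer and consumer share it directly instead of coordinating through Mediator.Wait()/Pulse():

using System.Threading;

var queue = new BlockingQueue<WorkItem>();

// Consumer thread: Dequeue blocks until a producer enqueues something.
var consumer = new Thread(() =>
{
    while (true)
    {
        WorkItem item = queue.Dequeue();
        Process(item); // placeholder for the consumer's work
    }
});
consumer.IsBackground = true;
consumer.Start();

// Any producer thread simply calls:
queue.Enqueue(new WorkItem());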
If you can use .NET 4 your best bet would be to use BlockingCollection<T> (http://msdn.microsoft.com/en-us/library/dd267312.aspx) which handles queueing, dequeuing, and limits on queue length.
Nothing is wrong with the design.
The problem arises when you use Monitor.Wait() and Monitor.Pulse() and you don't know which thread is going to do its job first (producer or consumer). In that case an AutoResetEvent resolves the problem. Think of the consumer when it reaches the section where it should consume the data produced by the producer. Maybe it reaches there before the producer pulses, and then everything is OK; but what if the consumer reaches there after the producer has already signaled? Then you encounter a deadlock, because the producer has already called Monitor.Pulse() for that section and will not repeat it.
With an AutoResetEvent you make sure the consumer waits there for the signal from the producer, and if the producer has already signaled before the consumer even reaches that section, the gate is open and the consumer continues.
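A sketch of what that could look like inside Mediator, replacing the Monitor-based Wait()/Pulse(); the key difference is that a Set() issued before anyone is waiting is remembered:

using System.Threading;

// Inside Mediator (sketch): the event latches a signal even when no thread
// is currently waiting, which is exactly what Monitor.Pulse() does not do.
private readonly AutoResetEvent itemAvailable = new AutoResetEvent(false);

public void Pulse()
{
    itemAvailable.Set();      // remembered even if the consumer isn't waiting yet
}

public void Wait()
{
    itemAvailable.WaitOne();  // returns immediately if Set() already happened
}

Note that several Set() calls made in quick succession collapse into one, so the consumer should still loop on TryDequeue after each wake-up.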
It's OK to use Monitor.Wait() and Monitor.Pulse() inside Mediator for signaling waiting threads.
Is it possible that the deadlock is occurring because Pulse doesn't store any state? That means that if the Producer calls Pulse before the Consumer calls Wait, then the Wait will block forever. This is noted in the documentation for Monitor.Pulse.
Also, you should know that writing object x = new object(); would be extraneous - an out call will initialize x, so any object created beforehand would just fall out of scope with the TryDequeue call.
Difficult to tell with the code sample supplied.
Is the lock held elsewhere? Within Mediator?
Are the threads just parked on obtaining the lock and not on the actual Wait call?
Have you paused the threads in a debugger to see what the current state is?
Have you tried a simple test with just putting a simple single value on a queue and getting it to work? Or is Mediator pretty complex at this point?
Until a little more detail is available in the Mediator class and your Producer class, it's some wild guessing. It seems like some thread may be holding the lock when you don't expect it to. Once you pulse, you need to release the lock in whatever thread holds it by exiting the "lock" scope. So, if somewhere in Mediator you take the lock and then call Pulse, you need to exit the outermost scope where the lock is held, not just the one in Pulse.
Can you refactor to a normal producer/consumer queue? That could then handle enqueuing, dequeuing and thread signalling in a single class, so there would be no need to pass around public locks. The dequeuing process could then be handled via a delegate. I can post an example if you wish.
Greetings.
I'm trying to implement some multithreaded code in an application. The purpose of this code is to validate items that the database gives it. Validation can take quite a while (a few hundred ms to a few seconds), so this process needs to be forked off into its own thread for each item.
The database may give it 20 or 30 items a second in the beginning, but that begins to decline rapidly, eventually reaching about 65K items over 24 hours, at which point the application exits.
I'd like it if anyone more knowledgeable could take a peek at my code and see if there's any obvious problems. No one I work with knows multithreading, so I'm really just on my own, on this one.
Here's the code. It's kinda long but should be pretty clear. Let me know if you have any feedback or advice. Thanks!
public class ItemValidationService
{
    /// <summary>
    /// The object to lock on in this class, for multithreading purposes.
    /// </summary>
    private static object locker = new object();

    /// <summary>Items that have been validated.</summary>
    private HashSet<int> validatedItems = new HashSet<int>();

    /// <summary>Items that are currently being validated.</summary>
    private HashSet<int> validatingItems = new HashSet<int>();

    /// <summary>Remove an item from the index if its links are bad.</summary>
    /// <param name="id">The ID of the item.</param>
    public void ValidateItem(int id)
    {
        lock (locker)
        {
            if (!this.validatedItems.Contains(id) &&
                !this.validatingItems.Contains(id))
            {
                ThreadPool.QueueUserWorkItem(sender =>
                {
                    this.Validate(id);
                });
            }
        }
    } // method

    private void Validate(int itemId)
    {
        lock (locker)
        {
            this.validatingItems.Add(itemId);
        }

        // *********************************************
        // Time-consuming routine to validate an item...
        // *********************************************

        lock (locker)
        {
            this.validatingItems.Remove(itemId);
            this.validatedItems.Add(itemId);
        }
    } // method
} // class
The thread pool is a convenient choice if you have lightweight, sporadic processing that isn't time-sensitive. However, I recall reading on MSDN that it's not appropriate for large-scale processing of this nature.
I used it for something quite similar to this and regret it. I took a worker-thread approach in subsequent apps and am much happier with the level of control I have.
My favorite pattern in the worker-thread model is to create a master thread which holds a queue of task items, and then fork a bunch of workers that pop items off that queue to process. I use a blocking queue so that when there are no items to process, the workers simply block until something is pushed onto the queue. In this model, the master thread produces work items from some source (db, etc.) and the worker threads consume them.
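If .NET 4 is available, BlockingCollection<T> can play the role of the blocking queue; a rough sketch of that master/worker layout, with WorkItem and the Process body as placeholders:

using System.Collections.Concurrent;
using System.Threading;

public class WorkerPool
{
    private readonly BlockingCollection<WorkItem> queue = new BlockingCollection<WorkItem>();

    public WorkerPool(int workerCount)
    {
        for (int i = 0; i < workerCount; i++)
        {
            var worker = new Thread(() =>
            {
                // Blocks while the queue is empty; exits once CompleteAdding() is called.
                foreach (var item in queue.GetConsumingEnumerable())
                    Process(item);
            });
            worker.IsBackground = true;
            worker.Start();
        }
    }

    // The master thread (reading from the database) pushes work here.
    public void Enqueue(WorkItem item) { queue.Add(item); }

    // Signal the workers that no more work is coming.
    public void Complete() { queue.CompleteAdding(); }

    private void Process(WorkItem item) { /* validation goes here */ }
}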
I second the idea of using a blocking queue and worker threads. Here is a blocking queue implementation that I've used in the past with good results:
https://www.codeproject.com/Articles/8018/Bounded-Blocking-Queue-One-Lock
What's involved in your validation logic? If it's mainly CPU-bound, then I would create no more than one worker thread per processor/core on the box. This will tell you the number of processors:
Environment.ProcessorCount
If your validation involves I/O such as file access or database access, then you could use a few more threads than the number of processors.
Be careful: QueueUserWorkItem might fail
There is a possible logic error in the code posted with the question, depending on where the item id in ValidateItem(int id) comes from. Why? Because although you correctly lock your validatingItems and validatedItems collections before queuing a work item, you do not add the item to the validatingItems collection until the new thread spins up. That means there could be a time gap in which another thread calls ValidateItem(id) with the same id (unless this is only ever called from a single main thread).
I would add the item to the validatingItems collection just before queuing the work item, inside the lock.
Edit: also, QueueUserWorkItem() returns a bool, so you should use the return value to make sure the item was queued and only THEN add it to the validatingItems collection.
ThreadPool may not be optimal for jamming so much at once into it. You may want to research the upper limits of its capabilities and/or roll your own.
Also, there is a race condition that exists in your code, if you expect no duplicate validations. The call to
this.validatingItems.Add(itemId);
needs to happen in the main thread (in ValidateItem), not in the thread-pool thread (the Validate method). This call should occur on the line just before the work item is queued to the pool.
A worse bug comes from not checking the return value of QueueUserWorkItem. Queuing can fail, and why it doesn't throw an exception is a mystery to us all. If it returns false, you need to remove the item that was added to the validatingItems list and handle the error (probably by throwing an exception).
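Putting those two points together, ValidateItem could reserve the id inside the lock and undo the reservation if queuing fails; a rough sketch against the question's fields (Validate would then no longer need to add to validatingItems itself):

public void ValidateItem(int id)
{
    lock (locker)
    {
        if (this.validatedItems.Contains(id) || this.validatingItems.Contains(id))
            return;

        // Reserve the id before queuing, so a second call with the same id
        // cannot slip in during the gap before the pool thread starts.
        this.validatingItems.Add(id);

        bool queued = ThreadPool.QueueUserWorkItem(sender => this.Validate(id));
        if (!queued)
        {
            // Undo the reservation and surface the failure.
            this.validatingItems.Remove(id);
            throw new InvalidOperationException("Could not queue validation for item " + id);
        }
    }
}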
I would be concerned about performance here. You indicated that the database may give it 20-30 items per second and an item could take up to a few seconds to validate. That could be quite a large number of threads -- using your metrics, worst case 60-90 threads! I think you need to reconsider the design here. Michael mentioned a nice pattern. The use of the queue really helps keep things under control and organized. A semaphore could also be employed to control the number of threads created -- i.e. you could have a maximum number of threads allowed, but under smaller loads you wouldn't necessarily have to create the maximum number if fewer end up getting the job done -- i.e. your own pool size could be dynamic with a cap.
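As a rough illustration of the semaphore idea (a hard cap on concurrent validations rather than a fully dynamic pool; the limit of 10 is arbitrary):

using System.Threading;

// At most 10 validations execute concurrently; additional callers block
// on WaitOne() until a slot is released.
private static readonly Semaphore validationSlots = new Semaphore(10, 10);

private void ValidateWithCap(int itemId)
{
    validationSlots.WaitOne();     // take a slot, blocking if none are free
    try
    {
        Validate(itemId);          // the question's time-consuming routine
    }
    finally
    {
        validationSlots.Release(); // always return the slot
    }
}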
When using the thread pool, I also find it more difficult to monitor the execution of the pool threads doing the work. So, unless it's fire-and-forget, I am in favor of more controlled execution. I know you mentioned that your app exits after all 65K items are completed, but how are you monitoring your threads to determine whether they have completed their work -- i.e. that all queued workers are done? Are you monitoring the status of all items in the HashSets? I think that by queuing your items up and having your own worker threads consume off that queue, you can gain more control. Albeit, this can come at the cost of more overhead in terms of signaling between threads to indicate when all items have been queued, allowing them to exit.
You could also try using the CCR - Concurrency and Coordination Runtime. It's buried inside Microsoft Robotics Studio, but provides an excellent API for doing this sort of thing.
You'd just need to create a "Port" (essentially a queue), hook up a receiver (method that gets called when something is posted to it), and then post work items to it. The CCR handles the queue and the worker thread to run it on.
Here's a video on Channel9 about the CCR.
It's very high-performance and is even used for non-Robotics stuff (Myspace.com uses it behind the scenes for their content-delivery network).
I would recommend looking into MSDN: Task Parallel Library - Dataflow. You can find examples of implementing producer-consumer there; in your case the database producing items to validate would be the producer, and the validation routine becomes the consumer.
Also recommend using ConcurrentDictionary<TKey, TValue> as a "Concurrent" hash set where you just populate the keys with no values :). You can potentially make your code lock-free.
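A small sketch of the ConcurrentDictionary-as-set idea applied to the question's class; TryAdd is atomic, so the lock around the check-then-queue step can go away (the byte value is just a throwaway):

private readonly ConcurrentDictionary<int, byte> seenItems =
    new ConcurrentDictionary<int, byte>();

public void ValidateItem(int id)
{
    // TryAdd returns false if the id is already being (or has been) handled,
    // so the Contains checks and the explicit lock are no longer needed here.
    if (seenItems.TryAdd(id, 0))
    {
        ThreadPool.QueueUserWorkItem(_ => Validate(id));
    }
}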