What is the best Queue data structure to use in C# when the Queue needs to be accessible for Enqueue() on multiple threads but only needs to Dequeue() on a single main thread? My thread structure looks like this:
Main Thread - Consumer
Sub Thread1 - Producer
Sub Thread2 - Producer
Sub Thread3 - Producer
I have a single Queue<T> queue that holds all items produced by the sub-threads and the Main Thread calls queue.Dequeue() until it is empty. I have the following function that is called on my Main Thread for this purpose.
public void ConsumeItems()
{
while (queue.Count > 0)
{
var item = queue.Dequeue();
...
}
}
The Main Thread calls this function once through each thread loop and I want to make sure I am accessing queue in a thread-safe manner, but I also want to avoid locking queue if possible for performance reasons.
The one you would want to use is a BlockingCollection<T>, which by default is backed by a ConcurrentQueue<T>. To get items out of the queue you would use .GetConsumingEnumerable() from inside a foreach:
public BlockingCollection<Item> queue = new BlockingCollection<Item>();
public void LoadItems()
{
foreach(var item in SomeDataSource())
{
queue.Add(item);
}
queue.CompleteAdding();
}
public void ConsumeItems()
{
foreach(var item in queue.GetConsumingEnumerable())
{
...
}
}
When the queue is empty, the foreach will block the thread and unblock as soon as an item becomes available. Once .CompleteAdding() has been called, the foreach will finish processing any items left in the queue and exit the foreach block once the queue is empty.
However, before you do this, I would recommend you look into TPL Dataflow. With it, you no longer need to manage the queues or the threads. It lets you build chains of logic, and each block in the chain can have a separate level of concurrency.
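If the main thread must never block (the question describes it polling once per loop iteration), one alternative is to use the ConcurrentQueue<T> directly and drain it with TryDequeue. The following is only a minimal sketch under that assumption; the class name, the int item type, and RunDemo are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class PollingConsumerSketch
{
    // Shared queue: Enqueue and TryDequeue are thread-safe without an explicit lock.
    private static readonly ConcurrentQueue<int> queue = new ConcurrentQueue<int>();

    // Called from any producer thread.
    public static void Produce(int item)
    {
        queue.Enqueue(item);
    }

    // Called once per main-loop iteration; returns immediately when the queue is empty.
    public static List<int> ConsumeItems()
    {
        var consumed = new List<int>();
        int item;
        // TryDequeue returns false when the queue is empty,
        // so no separate Count check is needed.
        while (queue.TryDequeue(out item))
        {
            consumed.Add(item);
        }
        return consumed;
    }

    // Hypothetical demo: three producers add 100 items each,
    // then the calling thread drains the queue in one pass.
    public static int RunDemo()
    {
        var producers = new Task[3];
        for (int p = 0; p < producers.Length; p++)
        {
            producers[p] = Task.Run(() =>
            {
                for (int i = 0; i < 100; i++) Produce(i);
            });
        }
        Task.WaitAll(producers);
        return ConsumeItems().Count;
    }
}
```

Unlike GetConsumingEnumerable(), this never parks the main thread, so it only fits a loop that already runs on its own schedule.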
public async Task ProcessDataAsync(IEnumerable<SomeInput> input)
{
using(var outfile = new StreamWriter(File.OpenWrite("outfile.txt")))
{
//Create a convert block that uses the number of processors on the machine to process items in parallel.
var convertBlock = new TransformBlock<SomeInput, string>(x => CpuIntensiveConversion(x), new ExecutionDataflowBlockOptions {MaxDegreeOfParallelism = Environment.ProcessorCount});
//Create a single threaded action that writes out to the textwriter.
var writeBlock = new ActionBlock<string>(x => outfile.WriteLine(x));
//Link the convert block to the write block.
convertBlock.LinkTo(writeBlock, new DataflowLinkOptions{PropagateCompletion = true});
//Add items to the convert block's queue.
foreach(var item in input)
{
await convertBlock.SendAsync(item);
}
//Tell the convert block we are done adding. This will tell the write block it is done processing once all items are processed.
convertBlock.Complete();
//Wait for the write to finish writing out to the file;
await writeBlock.Completion;
}
}
I have to write a program where I'm reading from a database the queues to process and all the queues are run in parallel and managed on the parent thread using a ConcurrentDictionary.
I have a class that represents the queue, which has a constructor that takes in the queue information and the parent instance handle. The queue class also has the method that processes the queue.
Here is the Queue Class:
class MyQueue {
protected ServiceExecution _parent;
protected string _queueID;
public MyQueue(ServiceExecution parentThread, string queueID)
{
_parent = parentThread;
_queueID = queueID;
}
public void Process()
{
try
{
//Do work to process
}
catch (Exception)
{
//exception handling
}
finally{
_parent.ThreadFinish(_queueID);
}
}
}
The parent thread loops through the dataset of queues and instantiates a new queue class. It spawns a new thread to execute the Process method of the Queue object asynchronously. This thread is added to the ConcurrentDictionary and then started as follows:
private ConcurrentDictionary<string, MyQueue> _runningQueues = new ConcurrentDictionary<string, MyQueue>();
foreach(DataRow dr in QueueDataset.Rows)
{
MyQueue queue = new MyQueue(this, dr["QueueID"].ToString());
Thread t = new Thread(()=>queue.Process());
if(_runningQueues.TryAdd(dr["QueueID"].ToString(), queue))
{
t.Start();
}
}
//Method that gets called by the queue thread when it finishes
public void ThreadFinish(string queueID)
{
MyQueue queue;
_runningQueues.TryRemove(queueID, out queue);
}
I have a feeling this is not the right approach to manage the asynchronous queue processing and I'm wondering if perhaps I can run into deadlocks with this design? Furthermore, I would like to use Tasks to run the queues asynchronously instead of the new Threads. I need to keep track of the queues because I will not spawn a new thread or task for the same queue if the previous run is not complete yet. What is the best way to handle this type of parallelism?
Thanks in advance!
About your current approach
Indeed, it is not the right approach. A large number of queues read from the database will spawn a large number of threads, which can hurt performance, and you create a new thread every time. It is better to create a fixed set of threads and re-use them. If you want tasks, create LongRunning tasks and re-use them.
Suggested Design
I'd suggest the following design:
Reserve only one task to read queues from the database and put those queues in a BlockingCollection;
Now start multiple LongRunning tasks to read a queue each from that BlockingCollection and process that queue;
When a task is done with processing the queue it took from the BlockingCollection, it will then take another queue from that BlockingCollection;
Optimize the number of these processing tasks so as to properly utilize the cores of your CPU. Usually, since DB interactions are slow, you can create about 3 times as many tasks as there are cores; however, YMMV.
Deadlock possibility
Deadlocks will at least not happen on the application side. However, since the queues consist of database transactions, a deadlock may happen on the database end. You may have to write some logic to make your task start a transaction again if the database rolled it back because of a deadlock.
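The retry logic mentioned above could be sketched roughly as follows. This is an illustration only: RunWithRetry, the isDeadlock predicate, and the attempt and delay defaults are all hypothetical. With SQL Server, for example, the predicate would typically check for a SqlException whose Number is 1205; that check is left to the caller so the sketch has no database dependency.

```csharp
using System;
using System.Threading;

public static class DeadlockRetrySketch
{
    // Runs 'work' and retries it when 'isDeadlock' reports that the failure
    // was a deadlock rollback. Returns the number of attempts that were made.
    // Any other exception, or a deadlock on the last attempt, propagates to the caller.
    public static int RunWithRetry(Action work, Func<Exception, bool> isDeadlock,
                                   int maxAttempts = 3, int delayMs = 100)
    {
        for (int attempt = 1; ; attempt++)
        {
            try
            {
                work(); // e.g. begin a transaction, process the queue, commit
                return attempt;
            }
            catch (Exception ex) when (attempt < maxAttempts && isDeadlock(ex))
            {
                // The transaction was rolled back; back off briefly, then start it again.
                Thread.Sleep(delayMs * attempt);
            }
        }
    }
}
```

A processing task would wrap the body of its queue handling in a call such as RunWithRetry(() => ProcessQueue(q), IsDeadlock), where both names are placeholders for your own code.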
Sample Code
private static void TaskDesignedRun()
{
var expectedParallelQueues = 1024; //Optimize it. I've chosen it randomly
var parallelProcessingTaskCount = 4 * Environment.ProcessorCount; //Optimize this too.
var baseProcessorTaskArray = new Task[parallelProcessingTaskCount];
var taskFactory = new TaskFactory(TaskCreationOptions.LongRunning, TaskContinuationOptions.None);
var itemsToProcess = new BlockingCollection<MyQueue>(expectedParallelQueues);
//Start a new task to populate the "itemsToProcess"
taskFactory.StartNew(() =>
{
// Add code to read queues and add them to itemsToProcess
Console.WriteLine("Done reading all the queues...");
// Finally signal that you are done by saying..
itemsToProcess.CompleteAdding();
});
//Initializing the base tasks
for (var index = 0; index < baseProcessorTaskArray.Length; index++)
{
baseProcessorTaskArray[index] = taskFactory.StartNew(() =>
{
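//IsCompleted becomes true only after CompleteAdding() has been called AND the collection is empty.
while (!itemsToProcess.IsCompleted) {
MyQueue q;
//Wait up to 100 ms for an item instead of spinning on an empty collection.
if (!itemsToProcess.TryTake(out q, 100)) continue;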
//Process your queue
}
});
}
//Now just wait till all queues in your database have been read and processed.
Task.WaitAll(baseProcessorTaskArray);
}
I have an application that works with a queue of strings (which correspond to different tasks the application needs to perform). The queue can be filled with strings at random moments (sometimes several times a minute, but it can also take a few hours).
Until now I have always had a timer that checked the queue every few seconds for items and removed them.
I think there must be a nicer solution than this way. Is there any way to get an event or so when an item is added to the queue?
Yes. Take a look at TPL Dataflow, in particular the BufferBlock<T>, which does more or less the same as BlockingCollection but, by leveraging async/await, without the nasty side-effect of jamming up your threads.
So you can:
void Main()
{
var b = new BufferBlock<string>();
AddToBlockAsync(b);
ReadFromBlockAsync(b);
}
public async Task AddToBlockAsync(BufferBlock<string> b)
{
while (true)
{
b.Post("hello");
await Task.Delay(1000);
}
}
public async Task ReadFromBlockAsync(BufferBlock<string> b)
{
await Task.Delay(10000); //let some messages buffer up...
while(true)
{
var msg = await b.ReceiveAsync();
Console.WriteLine(msg);
}
}
I'd take a look at BlockingCollection.GetConsumingEnumerable. The collection will be backed with a queue by default, and it is a nice way to automatically take values from the queue as they are added using a simple foreach loop.
There is also an overload that allows you to supply a CancellationToken meaning you can cleanly break out.
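As a rough sketch of the CancellationToken overload mentioned above: when the token is cancelled while the loop is waiting, the enumerator throws OperationCanceledException, which the consumer can catch to exit cleanly. The class and method names here are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class CancellableConsumerSketch
{
    // Consumes until the token is cancelled or CompleteAdding() is called.
    // Returns the number of items that were processed.
    public static int Consume(BlockingCollection<string> queue, CancellationToken token)
    {
        int processed = 0;
        try
        {
            foreach (var item in queue.GetConsumingEnumerable(token))
            {
                processed++; // handle the item here
            }
        }
        catch (OperationCanceledException)
        {
            // Thrown by the enumerator when the token is cancelled while it waits.
        }
        return processed;
    }

    // Hypothetical demo: two items are consumed, then the blocked consumer is cancelled.
    public static int RunDemo()
    {
        using (var queue = new BlockingCollection<string>())
        using (var cts = new CancellationTokenSource())
        {
            queue.Add("a");
            queue.Add("b");
            var consumer = Task.Run(() => Consume(queue, cts.Token));
            // Wait until both items have been taken before cancelling.
            SpinWait.SpinUntil(() => queue.Count == 0);
            cts.Cancel();
            return consumer.Result;
        }
    }
}
```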
Have you looked at BlockingCollection? The GetConsumingEnumerable() method lets the consumer run an indefinite loop to which new items are yielded as soon as they become available, with no need for timers or Thread.Sleep calls:
// Common:
BlockingCollection<string> _blockingCollection =
new BlockingCollection<string>();
// Producer
for (var i = 0; i < 100; i++)
{
_blockingCollection.Add(i.ToString());
Thread.Sleep(500); // So you can track the consumer synchronization. Remove.
}
// Consumer:
foreach (var item in _blockingCollection.GetConsumingEnumerable())
{
Debug.WriteLine(item);
}
I am looking for the best approach to reading from a data source such as Azure Table Storage, which is time-consuming, converting the data into JSON or CSV, and writing it to a local file whose name depends on the partition key.
One approach being considered is running the writing to file task on timer elapsed event trigger with fixed time interval.
For things that do not parallelize well (like I/O), the best thing to do is use the "Producer-Consumer model".
The way it works is that you have one thread handling the non-parallelizable task; all that task does is read into a buffer. Then you have a set of parallel tasks that all read from the buffer and process the data, putting the results into another buffer when they are done. If you then need to write out the result in a non-parallelizable way, you have another single task writing out the result.
public Stream ProcessData(string filePath)
{
using(var sourceCollection = new BlockingCollection<string>())
using(var destinationCollection = new BlockingCollection<SomeClass>())
{
//Create a new background task to start reading in the file
Task.Factory.StartNew(() => ReadInFile(filePath, sourceCollection), TaskCreationOptions.LongRunning);
//Create a new background task to process the read in lines as they come in
Task.Factory.StartNew(() => TransformToClass(sourceCollection, destinationCollection), TaskCreationOptions.LongRunning);
//Process the newly created objects as they are created on the same thread that we originally called the function with
return TrasformToStream(destinationCollection);
}
}
private static void ReadInFile(string filePath, BlockingCollection<string> collection)
{
foreach(var line in File.ReadLines(filePath))
{
collection.Add(line);
}
//This lets the consumer know that we will not be adding any more items to the collection.
collection.CompleteAdding();
}
private static void TransformToClass(BlockingCollection<string> source, BlockingCollection<SomeClass> dest)
{
//GetConsumingEnumerable() will take items out of the collection and block the thread if there are no items available and CompleteAdding() has not been called yet.
Parallel.ForEach(source.GetConsumingEnumerable(),
(line) => dest.Add(SomeClass.ExpensiveTransform(line)));
dest.CompleteAdding();
}
private static Stream TrasformToStream(BlockingCollection<SomeClass> source)
{
var stream = new MemoryStream();
foreach(var record in source.GetConsumingEnumerable())
{
record.Seralize(stream);
}
return stream;
}
I highly recommend you read the free book Patterns for Parallel Programming; it goes into some detail about this. There is an entire section explaining the Producer-Consumer model in detail.
UPDATE: For a small performance boost, use GetConsumingPartitioner() from Parallel Extension Extras instead of GetConsumingEnumerable() in the Parallel.ForEach loop. ForEach makes some assumptions about the IEnumerable being passed in that cause it to take out extra locks it does not need; when you pass a partitioner instead of an enumerable, it does not need to take those extra locks.
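If pulling in Parallel Extension Extras is not an option, a related built-in alternative (available from .NET 4.5) is to wrap the consuming enumerable in a partitioner created with EnumerablePartitionerOptions.NoBuffering. Note that this is not the GetConsumingPartitioner() described above: it addresses chunk buffering by Parallel.ForEach rather than the extra locking. The sketch below assumes an int workload; the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class NoBufferingSketch
{
    // Squares every number taken from 'source' in parallel and returns the sum.
    public static long SumOfSquares(BlockingCollection<int> source)
    {
        long sum = 0;
        // NoBuffering makes Parallel.ForEach take one item at a time
        // instead of buffering chunks from the blocking enumerable.
        var partitioner = Partitioner.Create(
            source.GetConsumingEnumerable(),
            EnumerablePartitionerOptions.NoBuffering);

        Parallel.ForEach(partitioner, n => Interlocked.Add(ref sum, (long)n * n));
        return sum;
    }

    // Hypothetical demo: adds 1..10, completes adding, then sums the squares.
    public static long RunDemo()
    {
        using (var source = new BlockingCollection<int>())
        {
            for (int i = 1; i <= 10; i++) source.Add(i);
            source.CompleteAdding();
            return SumOfSquares(source);
        }
    }
}
```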
I have an issue with an email-sending Windows service. The service runs after every three-minute delay, gets the messages that are to be sent from the DB, and starts sending them. Here is what the code looks like:
MessageFilesHandler MFHObj = new MessageFilesHandler();
List<Broadcostmsg> imidiateMsgs = Manager.GetImidiateBroadCastMsgs(conString);
if (imidiateMsgs.Count > 0)
{
// WriteToFileImi(strLog);
Thread imMsgThread = new Thread(new ParameterizedThreadStart(MFHObj.SendImidiatBroadcast));
imMsgThread.IsBackground = true;
imMsgThread.Start(imidiateMsgs);
}
This sends messages to large lists, and sending to a larger list takes a long time to complete. The problem occurs when one message is still being sent and the service gets a new message to send: the previous send is halted and sending of the new message starts, even though I am using threads. Each time the service gets a message to send, it starts a new thread.
Can you please help me find where I am making a mistake in the code?
I think you are running your code inside a loop which WAITS for new messages. Did you manage those waits? Let's see:
while(imidiateMsgs.Count == 0)
{
//Wait for new Message
}
//Now you have a new message Here
//Make a new thread to process message
There are different ways to do that wait; I suggest using a blocking queue (BlockingCollection<T>):
In public area:
BlockingCollection<Broadcostmsg> imidiateMsgs = new BlockingCollection<Broadcostmsg>();
In your consumer (the thread which processes messages):
SendImidiatBroadcast = imidiateMsgs.Take();//this will wait for new message
//Now you have a new message Here
//Make a new thread to process message
In your producer (the thread which generates messages):
imidiateMsgs.Add(SendImidiatBroadcast);
You should also use the thread pool to answer messages; don't initialize a new thread each time.
It looks like the requirement is to build a producer-consumer queue, in which the producer keeps adding messages to a list and the consumer picks items from that list and does some work with them.
My only worry is that you create a new Thread each time to send email rather than picking threads from the thread pool. If you keep creating more and more threads, the performance of your application will degrade due to the overhead of context switching.
If you are using .NET Framework 4.0, the solution becomes pretty easy. You could use System.Collections.Concurrent.ConcurrentQueue for enqueuing and dequeuing your items. It is thread-safe, so no lock objects are required. Use Tasks to process your messages.
BlockingCollection takes an IProducerConsumerCollection in its constructor, or it will use a ConcurrentQueue by default if you call its empty constructor.
So, to enqueue your messages:
//define a blocking collection
var blockingCollection = new BlockingCollection<string>();
int count = 0;
//Producer
Task.Factory.StartNew(() =>
{
while (true)
{
blockingCollection.Add("value" + count);
count++;
}
});
//consumer
//GetConsumingEnumerable would wait until it find some item for work
// its similar to while(true) loop that we put inside consumer queue
Task.Factory.StartNew(() =>
{
foreach (string value in blockingCollection.GetConsumingEnumerable())
{
Console.WriteLine("Worker 1: " + value);
}
});
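Because BlockingCollection accepts any IProducerConsumerCollection in its constructor, the backing store decides the order in which items come out. The following sketch contrasts the default ConcurrentQueue (FIFO) with a ConcurrentStack (LIFO); the class and method names are hypothetical.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;

public static class BackingStoreSketch
{
    // Adds 1, 2, 3 and returns them in the order the collection hands them out.
    public static int[] RunDemo(bool lifo)
    {
        var collection = lifo
            ? new BlockingCollection<int>(new ConcurrentStack<int>()) // last in, first out
            : new BlockingCollection<int>();                          // ConcurrentQueue by default

        using (collection)
        {
            collection.Add(1);
            collection.Add(2);
            collection.Add(3);
            // Without CompleteAdding() the enumeration below would block forever
            // once the collection is empty.
            collection.CompleteAdding();
            return collection.GetConsumingEnumerable().ToArray();
        }
    }
}
```

For a message-sending service the default FIFO order is normally what you want; the stack variant is shown only to make the effect of the constructor argument visible.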
UPDATE
Since you are using Framework 3.5, I suggest you have a look at Joseph Albahari's implementation of a Producer/Consumer Queue. It is one of the best you will find.
Taking the code directly from the above link:
public class PCQueue
{
readonly object _locker = new object();
Thread[] _workers;
Queue<Action> _itemQ = new Queue<Action>();
public PCQueue (int workerCount)
{
_workers = new Thread [workerCount];
// Create and start a separate thread for each worker
for (int i = 0; i < workerCount; i++)
(_workers [i] = new Thread (Consume)).Start();
}
public void Shutdown (bool waitForWorkers)
{
// Enqueue one null item per worker to make each exit.
foreach (Thread worker in _workers)
EnqueueItem (null);
// Wait for workers to finish
if (waitForWorkers)
foreach (Thread worker in _workers)
worker.Join();
}
public void EnqueueItem (Action item)
{
lock (_locker)
{
_itemQ.Enqueue (item); // We must pulse because we're
Monitor.Pulse (_locker); // changing a blocking condition.
}
}
void Consume()
{
while (true) // Keep consuming until
{ // told otherwise.
Action item;
lock (_locker)
{
while (_itemQ.Count == 0) Monitor.Wait (_locker);
item = _itemQ.Dequeue();
}
if (item == null) return; // This signals our exit.
item(); // Execute item.
}
}
}
The advantage of this approach is that you can control the number of threads you create for optimized performance. With the thread pool approach, although it is safe, you cannot control the number of threads that may be created simultaneously.
What would be the correct usage of either BlockingCollection or ConcurrentQueue so that you can freely dequeue items without burning half or more of your CPU on a thread?
I was running some tests using 2 threads, and unless I had a Thread.Sleep of at least 50~100ms, it would always hit at least 50% of my CPU.
Here is a fictional example:
private void _DequeueItem()
{
object o = null;
while(socket.Connected)
{
while (!listOfQueueItems.IsEmpty)
{
if (listOfQueueItems.TryDequeue(out o))
{
// use the data
}
}
}
}
With the above example I would have to add a Thread.Sleep so the CPU doesn't blow up.
Note: I have also tried it without the while for IsEmpty check, result was the same.
It is not because of the BlockingCollection or ConcurrentQueue, but the while loop:
while(socket.Connected)
{
while (!listOfQueueItems.IsEmpty)
{ /*code*/ }
}
Of course it will eat the CPU: if the queue is empty, the while loop is just like:
while (true) ;
which in turn will eat the cpu resources.
This is not a good way of using ConcurrentQueue. You should use an AutoResetEvent with it, so that you are notified whenever an item is added.
Example:
private ConcurrentQueue<Data> _queue = new ConcurrentQueue<Data>();
private AutoResetEvent _queueNotifier = new AutoResetEvent(false);
//at the producer:
_queue.Enqueue(new Data());
_queueNotifier.Set();
//at the consumer:
while (true)//or some condition
{
_queueNotifier.WaitOne();//here we will block until we receive a signal notification.
Data data;
//drain everything that is queued: several Set() calls may collapse into a single signal.
while (_queue.TryDequeue(out data))
{
//handle the data
}
}
For good usage of BlockingCollection, you should use GetConsumingEnumerable() to wait for items to be added, like:
//declare the buffer
private BlockingCollection<Data> _buffer = new BlockingCollection<Data>(new ConcurrentQueue<Data>());
//at the producer method:
_buffer.Add(new Data());
//at the consumer
foreach (Data data in _buffer.GetConsumingEnumerable())//it will block here automatically waiting from new items to be added and it will not take cpu down
{
//handle the data here.
}
You really want to be using the BlockingCollection class in this case. It is designed to block until an item appears in the queue. A collection of this nature is often referred to as a blocking queue. This particular implementation is safe for multiple producers and multiple consumers. That is something that is surprisingly difficult to get right if you tried implementing it yourself. Here is what your code would look like if you used BlockingCollection.
private void _DequeueItem()
{
while(socket.Connected)
{
object o = listOfQueueItems.Take();
// use the data
}
}
The Take method blocks automatically if the queue is empty. It blocks in a manner that puts the thread in the WaitSleepJoin state so that it will not consume CPU resources. The neat thing about BlockingCollection is that it also uses low-lock strategies to increase performance. What this means is that Take will check to see if there is an item in the queue and, if not, it will briefly perform a spin wait to prevent a context switch of the thread. If the queue is still empty, it will put the thread to sleep. This means that BlockingCollection will have some of the performance benefits that ConcurrentQueue provides in regards to concurrent execution.
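One caveat with a plain Take() inside while(socket.Connected): while the queue is empty the thread stays blocked, so the loop condition is not re-checked until another item arrives. A possible variation, sketched here under that assumption, is TryTake with a timeout. The keepRunning delegate is a hypothetical stand-in for socket.Connected, and the 250 ms timeout is arbitrary.

```csharp
using System;
using System.Collections.Concurrent;

public static class TimedTakeSketch
{
    // Takes items until 'keepRunning' returns false.
    // Returns the number of items that were handled.
    public static int Drain(BlockingCollection<object> queue, Func<bool> keepRunning)
    {
        int handled = 0;
        while (keepRunning())
        {
            object item;
            // Blocks without burning CPU for up to 250 ms, then returns false
            // so that the loop condition is evaluated again.
            if (queue.TryTake(out item, TimeSpan.FromMilliseconds(250)))
            {
                handled++; // use the data
            }
        }
        return handled;
    }
}
```

The overload of Take that accepts a CancellationToken is another way to get out of the wait, if something else can cancel the token when the socket closes.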
You can call Thread.Sleep() only when queue is empty:
private void DequeueItem()
{
object o = null;
while(socket.Connected)
{
if (listOfQueueItems.IsEmpty)
{
Thread.Sleep(50);
}
else if (listOfQueueItems.TryDequeue(out o))
{
// use the data
}
}
}
Otherwise you should consider using events.