The problem is the classic one of N producers (with N possibly large) and X consumers with limited resources (X is currently 4). The producers' messages come in (say, via MQTT) and get queued to be processed by the consumers on a FIFO basis. The important part of the processing is that each consumer may need to send one or more "replies" back to the producers, and such replies should be at least some time apart (the exact delay is not important). The classic solution, where one starts X tasks that wait on the message queue, process, and loop, is easy to implement using, for example, System.Threading.Channels:
while (!cancellationToken.IsCancellationRequested && await queue.Reader.WaitToReadAsync()) {
    while (queue.Reader.TryRead(out IncomingMessage item)) {
        // Do some processing.
        SendResponse(1);
        // Do some more processing.
        if (needsToSend2Response) {
            await Task.Delay(500);
            SendResponse(2);
        }
    }
}
This works, and works well, except that while a task is waiting out a delay it can't process any more messages, and that's obviously bad.
Possible solutions I thought of:
Use an outbound queue that processes the replies and makes sure there is at least a minimum delay between messages sent to the same producer (a rough sketch follows this list).
Don't use a queue. Just start a new Task every time a new message comes in and arbitrate the limited resources using a semaphore: it works, but I don't see how to guarantee the FIFO requirement (sometimes messages from the same producer are processed in the wrong order).
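For illustration, a minimal sketch of the outbound-queue idea, assuming each reply carries a producer id and a hypothetical SendToProducer publish helper. Note that this single-pump version stalls replies to other producers while it waits, so in practice one queue per producer (or a timer-based scheme) would be needed for full independence:

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

var outbound = Channel.CreateUnbounded<(string ProducerId, string Payload)>();
var lastSent = new Dictionary<string, DateTime>();

// Stub standing in for the real MQTT publish.
void SendToProducer(string producerId, string payload) { }

async Task PumpOutboundAsync(CancellationToken ct) {
    await foreach (var reply in outbound.Reader.ReadAllAsync(ct)) {
        // Enforce the minimum spacing between replies to the same producer.
        if (lastSent.TryGetValue(reply.ProducerId, out var last)) {
            var wait = TimeSpan.FromMilliseconds(500) - (DateTime.UtcNow - last);
            if (wait > TimeSpan.Zero)
                await Task.Delay(wait, ct);
        }
        SendToProducer(reply.ProducerId, reply.Payload);
        lastSent[reply.ProducerId] = DateTime.UtcNow;
    }
}

// Consumers write replies to outbound instead of sending directly;
// the pump runs until the channel is completed.
await PumpOutboundAsync(CancellationToken.None);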
Any other ideas?
Related
The main program starts a task which is the consumer task for the Channel. The consumer task code is:
while (await channelReader.WaitToReadAsync())
{
    while (channelReader.TryRead(out var item))
    {
        Console.WriteLine("item is " + item.clientId);
    }
}
There is a socket server. When a socket client connects to the server, the server starts a task for that client. Each client task is a producer for the channel.
The producer code is:
foreach (var item in itemList)
{
    item.clientId = clientId;
    await drawChannelWriter.WriteAsync(item);
}
What happens is: all elements from one client are received first, then all elements from the other task are received.
Why aren't elements from different tasks mixed together in the channel?
Channels is a high-performance async-ready producer/consumer queue. The "high performance" bit means it will complete synchronously whenever possible.
There is a socket server. When a socket client connects to the server, the server starts a task for that client. Each client task is a producer for the channel.
This is fine. The first thing I'd check is whether each client is running on a separate thread (e.g., Task.Run), or if they're just async tasks.
What happens is: all elements from one client are received first, then all elements from the other task are received. Why aren't elements from different tasks mixed together in the channel?
I would expect a WriteAsync on an unbounded channel to always complete synchronously. Internally, it's just adding the item to a queue, and since the queue always has room (in an unbounded channel), there's nothing AFAIK that would make WriteAsync complete asynchronously. So the entire foreach loop in your producers runs synchronously.
There's nothing wrong with this behavior. If your producers are all running on different threads, then they may be running their foreach loops concurrently, and the elements may be interleaved. But each foreach loop will run very quickly, so it's likely that one thread will add all its items before another thread gets a chance.
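A minimal repro sketch of that point: two producers started on separate thread-pool threads via Task.Run. Whether their items interleave depends entirely on thread scheduling, since every WriteAsync completes synchronously:

using System;
using System.Threading.Channels;
using System.Threading.Tasks;

var channel = Channel.CreateUnbounded<string>();

Task Produce(string clientId) => Task.Run(async () =>
{
    for (int i = 0; i < 5; i++)
        await channel.Writer.WriteAsync(clientId + ":" + i); // completes synchronously
});

await Task.WhenAll(Produce("A"), Produce("B"));
channel.Writer.Complete();

// Each producer's loop runs so fast that one thread often finishes
// before the other even starts, so interleaving is possible but unlikely.
await foreach (var item in channel.Reader.ReadAllAsync())
    Console.WriteLine(item);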
I am writing a real-time application which receives around 2,000 messages per second, which are pushed into a queue. I have written a background thread which processes the messages in the queue.
private void ProcessSocketMessage()
{
    while (!this.shouldStopProcessing)
    {
        while (this.messageQueue.Count > 0)
        {
            string message;
            bool result = this.messageQueue.TryDequeue(out message);
            if (result)
            {
                // Process the string and do some other stuff
                // Like updating the received message in a datagrid
            }
        }
    }
}
The problem with the above code is that it uses an insane amount of processing power: around 12% of CPU (2.40 GHz dual-core processor).
I have 4 blocks similar to the one above, which together take up literally 50% of the CPU's computing power.
Is there anything which can be optimized in the above code?
Adding a Thread.Sleep of 100 ms before the end of the second while loop does seem to improve performance by 50%. But am I doing something wrong?
This functionality is already provided in the Dataflow library's ActionBlock class. An ActionBlock has an input buffer that receives messages and processes them by calling an action for each one. By default, only one message is processed at a time. It doesn't use busy waiting.
void MyActualProcessingMethod(string it)
{
    // Process the string and do some other stuff
}

var myBlock = new ActionBlock<string>(someString => MyActualProcessingMethod(someString));

// Simulate a lot of messages
for (int i = 0; i < 100000; i++)
{
    myBlock.Post(someMessage);
}
When the messages finish and/or we don't want any more messages, we command it to complete, by refusing any new messages and processing anything left in the input buffer:
myBlock.Complete();
Before we finish, we need to actually await the block to finish processing the leftovers:
await myBlock.Completion;
All Dataflow blocks can accept messages from multiple clients.
Blocks can be combined as well. The output of one block can feed another. The TransformBlock accepts a function that transforms an input into an output.
Typically each block uses tasks from the thread pool. By default one block processes only one message at a time. Different blocks run on different tasks or even different TaskSchedulers. This way, you can have one block do some heavy processing and push a result to another block that updates the UI.
string MyActualProcessingMethod(string it)
{
    // Process the string and do some other stuff
    // and send a progress message downstream
    return SomeProgressMessage;
}

void UpdateTheUI(string msg)
{
    statusBar1.Text = msg;
}

var myProcessingBlock = new TransformBlock<string, string>(msg => MyActualProcessingMethod(msg));
The UI will be updated by another block that runs on the UI thread. This is expressed through the ExecutionDataflowBlockOptions:
var runOnUI = new ExecutionDataflowBlockOptions {
    TaskScheduler = TaskScheduler.FromCurrentSynchronizationContext()
};

var myUpdater = new ActionBlock<string>(msg => UpdateTheUI(msg), runOnUI);

// Pass progress messages from the processor to the updater
myProcessingBlock.LinkTo(myUpdater, new DataflowLinkOptions { PropagateCompletion = true });
The code that posts messages to the pipeline's first block doesn't change:
// Simulate a lot of messages
for (int i = 0; i < 100000; i++)
{
    myProcessingBlock.Post(someMessage);
}

// We are finished, tell the block to process any leftover messages
myProcessingBlock.Complete();
In this case, as soon as the processor completes, it will notify the next block in the pipeline to complete. We need to wait for that final block to complete as well:
// Wait for the block to finish
await myUpdater.Completion;
How about making the first block work in parallel? We can specify that up to e.g. 10 tasks will be used to process input messages, through its execution options:
var dopOptions = new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 };
var myProcessingBlock = new TransformBlock<string, string>(msg => MyActualProcessingMethod(msg), dopOptions);
The processor will process up to 10 messages in parallel but the updater will still process them one by one, in the UI thread.
Your best bet is to use a profiler to monitor the running application and determine for sure where the CPU is spending its time.
However, it looks like you have the possibility of a busy-wait loop if this.messageQueue.Count is 0. At a minimum, I would suggest adding a small pause if the queue is empty to allow a message to go onto the queue. Otherwise your CPU is just spending time checking the queue over and over and over.
If the time is spent dequeueing messages, you may want to consider handling multiple messages at once (if there are multiple messages available), assuming your queue allows you to pop multiple messages off the queue in a single call.
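For what it's worth, a minimal sketch of the "small pause" suggestion, assuming the same shouldStopProcessing flag and ConcurrentQueue<string> field as in the question:

private void ProcessSocketMessage()
{
    while (!this.shouldStopProcessing)
    {
        string message;
        if (this.messageQueue.TryDequeue(out message))
        {
            // Process the string and do some other stuff
        }
        else
        {
            Thread.Sleep(10); // queue is empty: yield the CPU instead of spinning
        }
    }
}

Wrapping the queue in a BlockingCollection<string> would avoid even this small poll, since Take() blocks until an item arrives.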
Here's my scenario. I am getting a large amount of data in chunks from an external data source, and I have to write it locally in two places. One of the destinations is very slow to write to, but the other one is super fast (but I cannot rely on reading the data back from it in order to write to the slow destination). To accomplish that, I am using a producer-consumer pattern (using BlockingCollection).
The issue I have right now is that I have to queue the data in two BlockingCollections, and that takes way too much memory. My code looks very similar to the example below, but I would really like to drive the two Tasks from a single queue. Does anybody know the proper way to do that? Any inefficiencies in the code below?
class Program
{
    const int MaxNumberOfWorkItems = 15;
    static BlockingCollection<int> slowBC = new BlockingCollection<int>(MaxNumberOfWorkItems);
    static BlockingCollection<int> fastBC = new BlockingCollection<int>(MaxNumberOfWorkItems);

    static void Main(string[] args)
    {
        Task slowTask = Task.Factory.StartNew(() =>
        {
            foreach (var item in slowBC.GetConsumingEnumerable())
            {
                Console.WriteLine("SLOW -> " + item);
                Thread.Sleep(25);
            }
        });

        Task fastTask = Task.Factory.StartNew(() =>
        {
            foreach (var item in fastBC.GetConsumingEnumerable())
            {
                Console.WriteLine("FAST -> " + item);
            }
        });

        // Populating two BlockingCollections with the same data. How can I have a single collection?
        for (int i = 0; i < 100; i++)
        {
            while (slowBC.TryAdd(i) == false)
            {
                Console.WriteLine("Wait for slowBC...");
            }
            while (fastBC.TryAdd(i) == false)
            {
                Console.WriteLine("Wait for fastBC...");
            }
        }

        slowBC.CompleteAdding();
        fastBC.CompleteAdding();

        Task.WaitAll(slowTask, fastTask);
        Console.ReadLine();
    }
}
Using a producer-consumer queue to transfer single ints is extremely inefficient. You are receiving the data in chunks, so why not type the queue as chunk references and send the whole chunk, immediately creating/de-pooling a new chunk at the same reference variable to receive the next lot of data? This is how P-C queues are normally used for non-trivial amounts of data: queue references/pointers, not the actual data. Threads share a memory space (which some developers seem to think just causes problems), so use it: queue pointers/refs and safely transfer megabytes of data as a single reference. As long as you, in the very next line of code, always create/de-pool a new chunk after queueing off the old one, the producer and consumer threads can never be operating on the same chunk.
Queueing chunk references is powers-of-10 more efficient for large chunks.
Send the chunk references to the fast link, then just 'forward' them to the slow link from there.
You may need flow control overall if the slow link is not to block up your system and cause eventual OOM errors. What I usually do is fix an 'overall' quota for the total buffer size and create a pool of chunks at startup (the pool is another BlockingCollection, populated with new chunks at startup). The producer thread de-pools chunks, fills them with data, and queues them to the FAST thread. The FAST thread processes received chunks and then queues the chunk references to the SLOW thread. The SLOW thread processes the same data and then re-pools the 'used' chunk for re-use by the producer thread. This forms a flow-controlled system: if the SLOW thread is too slow, the producer eventually tries to de-pool a chunk from an empty pool and blocks there until the SLOW thread re-pools some used chunks, which signals the producer thread to run again. You may need some policy in the slow thread to time out its operations and dump its chunk early, thereby dropping data; you must decide on a policy for that given your overall requirements. It is obviously impossible to keep queueing data to a fast and a slow consumer forever without memory overflow unless the slow consumer dumps some data.
Edit: Oh, and yes, using a pool eliminates GC on the used chunks, further increasing performance.
One overall flow policy would be to not dump any data in the slow thread. With continual high data flow, the chunks will all end up on the queue between the fast and slow threads, and the producer thread will indeed block on the empty pool. The network connection will then apply its own flow control to stop the peer sending any more data over TCP. This extends the flow control all the way from your slow thread to the peer.
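A minimal sketch of that pooled-chunk pipeline, under stated assumptions (the chunk size, pool size, item counts, and the Fill/Write stubs are all illustrative placeholders, not a real API):

using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

var pool = new BlockingCollection<Chunk>();
var toFast = new BlockingCollection<Chunk>();
var toSlow = new BlockingCollection<Chunk>();

// Fixed overall quota of buffers, created once at startup.
for (int i = 0; i < 16; i++) pool.Add(new Chunk());

// Stubs standing in for the real source and destinations.
int FillFromSource(byte[] buffer) => buffer.Length;
void WriteFast(Chunk c) { }
void WriteSlow(Chunk c) => Thread.Sleep(25);

var producer = Task.Run(() =>
{
    for (int n = 0; n < 100; n++)
    {
        var chunk = pool.Take(); // blocks here when the slow side lags
        chunk.Length = FillFromSource(chunk.Data);
        toFast.Add(chunk);
    }
    toFast.CompleteAdding();
});

var fast = Task.Run(() =>
{
    foreach (var chunk in toFast.GetConsumingEnumerable())
    {
        WriteFast(chunk);
        toSlow.Add(chunk); // forward the same reference, no copy
    }
    toSlow.CompleteAdding();
});

var slow = Task.Run(() =>
{
    foreach (var chunk in toSlow.GetConsumingEnumerable())
    {
        WriteSlow(chunk);
        pool.Add(chunk); // recycle the buffer: no GC pressure
    }
});

Task.WaitAll(producer, fast, slow);

class Chunk
{
    public byte[] Data = new byte[64 * 1024];
    public int Length;
}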
I have this problem where a system contains nodes (Windows services) that push messages to be processed and other nodes that pull messages and process them.
This has been designed in a way that the push nodes balance the load between queues by maintaining a round-robin list of queues and rotating queues after each send. Therefore message 1 will go to queue 1, message 2 to queue 2 etc. This part has been working great so far.
On the message-pull end we designed it such that messages are retrieved in a similar way: first from queue 1, then from queue 2, etc. In theory, each pull node sits on a different machine, and in practice, so far, each has listened on only a single queue. But a recent requirement made us have a pull node on a machine that listens to more than one queue: one that is typically extremely busy and filled with millions of messages, and one that generally contains only a handful of messages.
The problem we are facing is that, the way we originally architected it, the pull node goes from queue to queue until a message is found. If the request times out (say, after a second), it moves on to the next queue.
This doesn't work anymore, because Q1 (filled with millions of messages) will be delayed by approximately a second per message, since after each pull from Q1 we ask Q2 for a message (and if it doesn't contain any, we wait a second for the timeout).
So it goes like this:
Q1 contains 10 messages and Q2 contains none
Pull node asks for a message from Q1
Q1 returns message immediately
Pull node asks for a message from Q2
------------ Waiting for a second ------------- (Q2 is empty and request times out)
Pull node asks for a message from Q1
Q1 returns message immediately
Pull node asks for a message from Q2
------------ Waiting for a second ------------- (Q2 is empty and request times out)
etc.
So this is clearly wrong.
I guess I am looking for the best architectural solution here. Message processing does not need to be real-time, but it does need to be robust, and no message should ever be lost!
I would like to hear your views on this problem.
Thanks in advance
Yannis
Maybe you could use the ReceiveCompleted event in the MessageQueue class? No need to poll then.
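A minimal event-driven sketch along those lines (System.Messaging; the queue path is an illustrative assumption):

using System.Messaging;

var queue = new MessageQueue(@".\private$\myqueue");
queue.ReceiveCompleted += (sender, e) =>
{
    var mq = (MessageQueue)sender;
    Message message = mq.EndReceive(e.AsyncResult); // complete this receive
    // ... process message.Body ...
    mq.BeginReceive(); // re-arm for the next message
};
queue.BeginReceive(); // start the first asynchronous receive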
I ended up creating a set of threads, one for each MSMQ queue that needs to be processed. In the constructor I initialize those threads:
Storages.ForEach(queue =>
{
    Task task = Task.Factory.StartNew(() =>
    {
        LoggingManager.LogInfo("Starting a local thread to read in mime messages from queue " + queue.Name, this.GetType());
        while (true)
        {
            WorkItem mime = queue.WaitAndRetrieve();
            if (mime != null)
            {
                _Semaphore.WaitOne();
                _LocalStorage.Enqueue(mime);
                lock (_locker) Monitor.Pulse(_locker);
                LoggingManager.LogDebug("Adding no. " + _LocalStorage.Count + " item in queue", this.GetType());
            }
        }
    });
});
_LocalStorage is a thread-safe queue implementation (ConcurrentQueue, introduced in .NET 4.0).
The semaphore is a counting semaphore that controls inserts into _LocalStorage. _LocalStorage is basically a buffer of received messages, but we don't want it to get too large while the processing nodes are busy doing work. Otherwise we could end up pulling ALL of the MSMQ messages into _LocalStorage while only actually processing 5 or so of them. That is bad both in terms of resilience (if the program terminates unexpectedly, we lose all of those messages) and in terms of performance, as the memory consumed by holding all those items would be huge. So we need to control how many items we hold in the _LocalStorage buffer queue.
We pulse the threads waiting for work (see below) to tell them a new item was added to the queue, via a simple Monitor.Pulse.
The code that dequeues work items from the queue is as follows:
lock (_locker)
    if (_LocalStorage.Count == 0)
        Monitor.Wait(_locker);

WorkItem result;
if (_LocalStorage.TryDequeue(out result))
{
    _Semaphore.Release();
    return result;
}
return null;
I hope this can help someone to sort out a similar issue.
Sorry the title is a bit crappy, I couldn't quite word it properly.
Edit: I should note this is a C# console app.
I've prototyped out a system that works like so (this is rough pseudo-codeish):
var collection = grabfromdb();

foreach (item in collection) {
    SendAnEmail();
}

SendAnEmail:
    SmtpClient mailClient = new SmtpClient();
    mailClient.SendCompleted += new SendCompletedEventHandler(SendComplete);
    mailClient.SendAsync('the mail message');

SendComplete:
    if (anyErrors) {
        errorHandling();
    }
    else {
        HitDBAndMarkAsSendOK();
    }
Obviously this setup is not ideal. If the initial collection has, say, 10,000 records, then it's going to new up 10,000 instances of SmtpClient in fairly short order as quickly as it can step through the rows, and likely asplode in the process.
My ideal end game is to have something like 10 concurrent emails going out at once.
A hacky solution comes to mind: add a counter that increments when SendAnEmail() is called and decrements when SendComplete fires. Before SendAnEmail() is called in the initial loop, check the counter; if it's too high, sleep for a small period of time and then check it again.
I'm not sure that's such a great idea, and figure the SO hive mind would have a way to do this properly.
I have very little knowledge of threading and am not sure if it would be an appropriate use here. E.g., sending email in a background thread, first checking the number of child threads to ensure there aren't too many being used. Or whether there is some type of 'thread throttling' built in.
Update
Following in the advice of Steven A. Lowe, I now have:
A Dictionary holding my emails and a unique key (this is the email queue)
A FillQue method, which populates the dictionary
A ProcessQue method, which runs on a background thread. It checks the queue and SendAsyncs any email in the queue.
A SendCompleted delegate, which removes the email from the queue and calls FillQue again.
I have a few problems with this setup. I think I've missed the boat with the background thread; should I be spawning one of these for each item in the dictionary? And how can I get the thread to 'hang around', for lack of a better word? If the email queue empties, the thread ends.
final update
I've put a 'while(true) {}' in the background thread. If the queue is empty, it waits a few seconds and tries again. If the queue is repeatedly empty, I 'break' the while and the program ends... Works fine. I'm a bit worried about the 'while(true)' business, though.
Short Answer
Use a queue as a finite buffer, processed by its own thread.
Long Answer
Call a fill-queue method to create a queue of emails, limited to (say) 10. Fill it with the first 10 unsent emails. Launch a thread to process the queue - for each email in the queue, send it asynch. When the queue is empty sleep for a while and check again. Have the completion delegate remove the sent or errored email from the queue and update the database, then call the fill-queue method to read more unsent emails into the queue (back up to the limit).
You'll only need locks around the queue operations, and will only have to manage (directly) the one thread to process the queue. You will never have more than N+1 threads active at once, where N is the queue limit.
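A rough sketch of that shape, slightly simplified in that it dequeues before sending rather than removing entries in the completion delegate; TryGetUnsentEmail, SendEmailAsync, and the running flag are hypothetical stand-ins for the database read, the asynchronous send, and shutdown logic:

using System;
using System.Collections.Generic;
using System.Net.Mail;
using System.Threading;

class EmailQueueProcessor
{
    const int Limit = 10;
    readonly Queue<MailMessage> queue = new Queue<MailMessage>();
    readonly object queueLock = new object();
    volatile bool running = true;

    // Hypothetical stubs for the real DB read and async send; the send's
    // completion callback would update the database and call FillQueue().
    bool TryGetUnsentEmail(out MailMessage mail) { mail = null; return false; }
    void SendEmailAsync(MailMessage mail) { }

    public void FillQueue()
    {
        lock (queueLock)
            while (queue.Count < Limit && TryGetUnsentEmail(out MailMessage mail))
                queue.Enqueue(mail);
    }

    // Runs on its own dedicated thread.
    public void ProcessQueue()
    {
        while (running)
        {
            MailMessage next = null;
            lock (queueLock)
                if (queue.Count > 0)
                    next = queue.Dequeue();

            if (next != null)
                SendEmailAsync(next);
            else
                Thread.Sleep(250); // queue empty: sleep briefly, then re-check
        }
    }
}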
I believe your hacky solution actually would work. Just make sure you have a lock statement around the bits where you increment and decrement the counter:
class EmailSender
{
    object SimultaneousEmailsLock = new object();
    int SimultaneousEmails;

    public string[] Recipients;

    void SendAll()
    {
        foreach (string Recipient in Recipients)
        {
            while (SimultaneousEmails > 10) Thread.Sleep(10);
            SendAnEmail(Recipient);
        }
    }

    void SendAnEmail(string Recipient)
    {
        lock (SimultaneousEmailsLock)
        {
            SimultaneousEmails++;
        }
        // ... send it ...
    }

    void FinishedEmailCallback()
    {
        lock (SimultaneousEmailsLock)
        {
            SimultaneousEmails--;
        }
        // ... etc ...
    }
}
I would add all my messages to a Queue, and then spawn e.g. 10 threads which send emails until the Queue is empty. Pseudo-ish C# (probably won't compile):
class EmailSender
{
    Queue<Message> messages;
    List<Thread> threads;

    public void Send(IEnumerable<Message> messages, int threadCount)
    {
        this.messages = new Queue<Message>(messages);
        this.threads = new List<Thread>();

        while (threadCount-- > 0)
            this.threads.Add(new Thread(SendMessages));

        this.threads.ForEach(t => t.Start());

        while (this.threads.Any(t => t.IsAlive))
            Thread.Sleep(50);
    }

    private void SendMessages()
    {
        while (true)
        {
            Message m;
            lock (messages)
            {
                try
                {
                    m = messages.Dequeue();
                }
                catch (InvalidOperationException)
                {
                    // No more messages
                    return;
                }
            }
            // Send message in some way. Not in an async way,
            // since we are already kind of async.
            Thread.Sleep(10); // Perhaps take a quick rest
        }
    }
}
If the message is the same and just has many recipients, swap the Message for a Recipient and add a single Message parameter to the Send method.
You could use a .NET Timer to set up the schedule for sending messages. Whenever the timer fires, grab the next 10 messages and send them all, and repeat. Or if you want a general rate (10 messages per second), you could have the timer fire every 100 ms and send a single message each time.
If you need more advanced scheduling, you could look at a scheduling framework like Quartz.NET
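A minimal sketch of the timer idea, with a hypothetical pending queue and Send stub:

using System;
using System.Collections.Concurrent;
using System.Threading;

var pending = new ConcurrentQueue<string>(); // hypothetical outgoing messages
void Send(string msg) { } // stub for the real send

// Every second, send up to 10 queued messages.
var timer = new Timer(_ =>
{
    string msg;
    for (int i = 0; i < 10 && pending.TryDequeue(out msg); i++)
        Send(msg);
}, null, TimeSpan.Zero, TimeSpan.FromSeconds(1));

Console.ReadLine(); // keep the process alive; the timer fires on the thread pool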
Isn't this something that Thread.Sleep() can handle?
You are correct in thinking that background threading can serve a good purpose here. Basically what you want to do is create a background thread for this process, let it run its own way, delays and all, and then terminate the thread when the process is done, or leave it on indefinitely (turning it into a Windows Service or something similar will be a good idea).
A little intro on multi-threading can be read here (with Thread.Sleep included!).
A nice intro on Windows Services can be read here.