Sorry the title is a bit crappy, I couldn't quite word it properly.
Edit: I should note this is a console c# app
I've prototyped out a system that works like so (this is rough pseudo-codeish):
var collection = grabfromdb();
foreach (item in collection) {
SendAnEmail();
}
SendAnEmail:
SmtpClient mailClient = new SmtpClient;
mailClient.SendCompleted += new SendCompletedEventHandler(SendComplete);
mailClient.SendAsync('the mail message');
SendComplete:
if (anyErrors) {
errorHandling()
}
else {
HitDBAndMarkAsSendOK();
}
Obviously this setup is not ideal. If the initial collection has, say, 10,000 records, then it's going to new up 10,000 instances of SmtpClient in fairly short order, as quickly as it can step through the rows, and likely asplode in the process.
My ideal end game is to have something like 10 concurrent emails going out at once.
A hacky solution comes to mind: add a counter that increments when SendAnEmail() is called and decrements when SendComplete fires. Before SendAnEmail() is called in the initial loop, check the counter; if it's too high, sleep for a small period of time and then check it again.
I'm not sure that's such a great idea, and figure the SO hive mind would have a way to do this properly.
I have very little knowledge of threading and am not sure if it would be an appropriate use here. For example: send each email in a background thread, but first check the number of child threads to ensure there aren't too many in use. Or perhaps there is some type of 'thread throttling' built in.
Update
Following the advice of Steven A. Lowe, I now have:
A Dictionary holding my emails and a unique key (this is the email queue)
A FillQue method, which populates the dictionary
A ProcessQue method, which runs on a background thread. It checks the queue and calls SendAsync for any email in it.
A SendCompleted delegate, which removes the email from the queue and calls FillQue again.
I have a few problems with this setup. I think I've missed the boat with the background thread: should I be spawning one of these for each item in the dictionary? And how can I get the thread to 'hang around', for lack of a better word? If the email queue empties, the thread ends.
final update
I've put a 'while(true) {}' in the background thread. If the queue is empty, it waits a few seconds and tries again. If the queue is repeatedly empty, I 'break' out of the while and the program ends. Works fine. I'm a bit worried about the 'while(true)' business though.
Short Answer
Use a queue as a finite buffer, processed by its own thread.
Long Answer
Call a fill-queue method to create a queue of emails, limited to (say) 10. Fill it with the first 10 unsent emails. Launch a thread to process the queue: for each email in the queue, send it asynchronously. When the queue is empty, sleep for a while and check again. Have the completion delegate remove the sent or errored email from the queue and update the database, then call the fill-queue method to read more unsent emails into the queue (back up to the limit).
You'll only need locks around the queue operations, and will only have to manage (directly) the one thread to process the queue. You will never have more than N+1 threads active at once, where N is the queue limit.
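For what it's worth, on .NET 4.5 or later the same cap of N sends in flight can also be had without a hand-managed queue thread. The following is a minimal sketch, not the design described above: it swaps in SemaphoreSlim as the throttle. ThrottledSender and the sendAsync delegate are made-up names; sendAsync stands in for whatever actually sends one email.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class ThrottledSender
{
    // Sends every item, but never runs more than maxConcurrent sends at once.
    // sendAsync is a hypothetical stand-in for the real send call.
    public static async Task SendAllAsync<T>(
        IEnumerable<T> items, Func<T, Task> sendAsync, int maxConcurrent = 10)
    {
        using (var slots = new SemaphoreSlim(maxConcurrent))
        {
            var running = new List<Task>();
            foreach (var item in items)
            {
                await slots.WaitAsync();   // wait here until a slot is free
                running.Add(SendOneAsync(item, sendAsync, slots));
            }
            await Task.WhenAll(running);   // let the last few sends finish
        }
    }

    private static async Task SendOneAsync<T>(
        T item, Func<T, Task> sendAsync, SemaphoreSlim slots)
    {
        try { await sendAsync(item); }
        finally { slots.Release(); }       // free the slot on success or failure
    }
}
```

The database update for "sent OK" or "errored" would go inside the delegate passed as sendAsync, so it happens as each send completes.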
I believe your hacky solution actually would work. Just make sure you have a lock statement around the bits where you increment and decrement the counter:
class EmailSender
{
readonly object SimultaneousEmailsLock = new object(); // must be non-null, or lock() throws
int SimultaneousEmails;
public string[] Recipients;
void SendAll()
{
foreach(string Recipient in Recipients)
{
while (SimultaneousEmails >= 10) Thread.Sleep(10); // wait while 10 sends are already in flight
SendAnEmail(Recipient);
}
}
void SendAnEmail(string Recipient)
{
lock(SimultaneousEmailsLock)
{
SimultaneousEmails++;
}
... send it ...
}
void FinishedEmailCallback()
{
lock(SimultaneousEmailsLock)
{
SimultaneousEmails--;
}
... etc ...
}
}
I would add all my messages to a Queue, and then spawn, e.g., 10 threads which send emails until the Queue is empty. Pseudo-ish C# (untested):
class EmailSender
{
    Queue<Message> messages;
    List<Thread> threads;

    public void Send(IEnumerable<Message> messages, int threadCount)
    {
        this.messages = new Queue<Message>(messages);
        this.threads = new List<Thread>();
        while (threadCount-- > 0)
            threads.Add(new Thread(SendMessages));
        threads.ForEach(t => t.Start());
        // Block until every worker has drained the queue and exited
        threads.ForEach(t => t.Join());
    }

    private void SendMessages()
    {
        while (true)
        {
            Message m;
            lock (messages)
            {
                if (messages.Count == 0)
                    return; // No more messages
                m = messages.Dequeue();
            }
            // Send message m in some way. Not in an async way,
            // since we are already kind of async.
            Thread.Sleep(10); // Perhaps take a quick rest
        }
    }
}
If the message is the same and only the recipients differ, just swap the Message for a Recipient and add a single Message parameter to the Send method.
You could use a .NET Timer to set up the schedule for sending messages. Whenever the timer fires, grab the next 10 messages and send them all, and repeat. Or, if you want a general rate (10 messages per second), you could have the timer fire every 100 ms and send a single message each time.
If you need more advanced scheduling, you could look at a scheduling framework like Quartz.NET
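A rough sketch of the timer idea, assuming System.Threading.Timer and a ConcurrentQueue<string> of pending messages. TimedBatchSender and the send delegate are invented for illustration; send stands in for the real mail call.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

sealed class TimedBatchSender : IDisposable
{
    private readonly ConcurrentQueue<string> _pending;
    private readonly Action<string> _send;   // hypothetical stand-in for the real send
    private readonly int _batchSize;
    private readonly Timer _timer;

    public TimedBatchSender(ConcurrentQueue<string> pending, Action<string> send,
                            int batchSize, TimeSpan interval)
    {
        _pending = pending;
        _send = send;
        _batchSize = batchSize;
        // First tick after one interval, then one tick per interval.
        _timer = new Timer(_ => SendBatch(), null, interval, interval);
    }

    // Sends at most _batchSize messages and returns how many were sent.
    public int SendBatch()
    {
        int sent = 0;
        string message;
        while (sent < _batchSize && _pending.TryDequeue(out message))
        {
            _send(message);
            sent++;
        }
        return sent;
    }

    public void Dispose() { _timer.Dispose(); }
}
```

Timer callbacks run on thread-pool threads and can overlap if one batch takes longer than the interval, so either keep the interval generous or guard SendBatch with a lock.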
Isn't this something that Thread.Sleep() can handle?
You are correct in thinking that background threading can serve a good purpose here. Basically what you want to do is create a background thread for this process, let it run its own way, delays and all, and then terminate the thread when the process is done, or leave it on indefinitely (turning it into a Windows Service or something similar will be a good idea).
A little intro on multi-threading can be read here (with Thread.Sleep included!).
A nice intro on Windows Services can be read here.
Related
I have built a MQTT client that listens for certain status data. For each message I run a method which can take a while (up to 1 second). Since a lot of messages can arrive at once, I want to run the whole thing in parallel. My problem now is, when I receive a message belonging to topic A, I want to make sure that the previous task belonging to topic A has already finished before I start the new one. But I also need to be able to receive new messages during the time I am waiting for Task A to finish and add them to the queue if necessary. Of course, if the new message belongs to topic B, I don't care about the status of task A and I can run this method call in parallel.
In my mind, this is solved with a kind of dictionary that has different queues.
What about using a lock on an object related to the topic?
When a new item comes into the system, you could retrieve or create a lock object from a ConcurrentDictionary and then use this object to lock the execution.
Something like this:
static ConcurrentDictionary<string,object> _locksByCategory =
new ConcurrentDictionary<string,object>();
async void ProcessItem(ItemType item) {
// GetOrAdd returns the existing lock object for this category, or stores a new one
var lockObject = _locksByCategory.GetOrAdd(item.Category, _ => new object());
lock (lockObject) {
// your code
}
}
This isn't a production ready solution but could help to start with.
I don't know exactly how you would do it, but it goes along the lines of:
On startup, create a (static? singleton?) Dictionary<Topic, ConcurrentQueue> and for each topic create a thread that does the following:
Wrap the ConcurrentQueue in a BlockingCollection
infinitely loop with BlockingCollection.Take at the start of the loop. This should block until an item is ready, execute the rest of the loop and listen for more items afterwards.
Whenever a message comes in, add it to the corresponding ConcurrentQueue.
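The steps above could be sketched roughly as follows. This is only an illustration under a few assumptions: topics and messages are plain strings, queues are created lazily on first use rather than at startup, and PerTopicDispatcher and its handle delegate are invented names. The Lazy wrapper is there so that only one worker is started per topic even if two messages for a new topic arrive at the same moment.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

sealed class PerTopicDispatcher
{
    private readonly ConcurrentDictionary<string, Lazy<BlockingCollection<string>>> _queues =
        new ConcurrentDictionary<string, Lazy<BlockingCollection<string>>>();
    private readonly ConcurrentBag<Task> _workers = new ConcurrentBag<Task>();
    private readonly Action<string, string> _handle;   // (topic, message), hypothetical

    public PerTopicDispatcher(Action<string, string> handle) { _handle = handle; }

    // Never blocks on the handler: the message is only queued for its topic.
    public void Enqueue(string topic, string message)
    {
        var queue = _queues.GetOrAdd(topic,
            t => new Lazy<BlockingCollection<string>>(() => StartWorker(t))).Value;
        queue.Add(message);
    }

    private BlockingCollection<string> StartWorker(string topic)
    {
        var queue = new BlockingCollection<string>();
        _workers.Add(Task.Factory.StartNew(() =>
        {
            // Blocks while the queue is empty; handles one message at a time,
            // so messages of the same topic never overlap.
            foreach (var message in queue.GetConsumingEnumerable())
                _handle(topic, message);
        }, TaskCreationOptions.LongRunning));
        return queue;
    }

    // Stops accepting messages and waits until every queue has been drained.
    public void CompleteAndWait()
    {
        foreach (var lazy in _queues.Values) lazy.Value.CompleteAdding();
        Task.WaitAll(_workers.ToArray());
    }
}
```

Different topics run in parallel because each has its own worker, while the order within a topic is preserved by its queue.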
I am writing a real-time application which receives around 2000 messages per second, which are pushed into a queue. I have written a background thread which processes the messages in the queue.
private void ProcessSocketMessage()
{
while (!this.shouldStopProcessing)
{
while (this.messageQueue.Count > 0)
{
string message;
bool result = this.messageQueue.TryDequeue(out message);
if (result)
{
// Process the string and do some other stuff
// Like updating the received message in a datagrid
}
}
}
}
The problem with the above code is that it uses an insane amount of processing power: around 12% of the CPU (2.40 GHz dual-core processor).
I have 4 blocks similar to the one above, which together take up about 50% of the CPU.
Is there anything which can be optimized in the above code?
Adding a Thread.Sleep of 100 ms before the end of the second while loop does seem to improve performance by 50%. But am I doing something wrong?
This functionality is already provided in the Dataflow library's ActionBlock class. An ActionBlock has an input buffer that receives messages and processes them by calling an action for each one. By default, only one message is processed at a time. It doesn't use busy waiting.
void MyActualProcessingMethod(string it)
{
// Process the string and do some other stuff
}
var myBlock = new ActionBlock<string>( someString =>MyActualProcessingMethod(someString));
//Simulate a lot of messages
for(int i=0;i<100000;i++)
{
myBlock.Post(someMessage);
}
When the messages finish and/or we don't want any more messages, we command it to complete, by refusing any new messages and processing anything left in the input buffer:
myBlock.Complete();
Before we finish, we need to actually await for the block to finish processing the leftovers:
await myBlock.Completion;
All Dataflow blocks can accept messages from multiple clients.
Blocks can be combined as well. The output of one block can feed another. The TransformBlock accepts a function that transforms an input into an output.
Typically each block uses tasks from the thread pool. By default one block processes only one message at a time. Different blocks run on different tasks or even different TaskSchedulers. This way, you can have one block do some heavy processing and push a result to another block that updates the UI.
string MyActualProcessingMethod(string it)
{
// Process the string and do some other stuff
// and send a progress message downstream
return SomeProgressMessage;
}
void UpdateTheUI(string msg)
{
statusBar1.Text = msg;
}
var myProcessingBlock = new TransformBlock<string,string>(msg =>MyActualProcessingMethod(msg));
The UI will be updated by another block that runs on the UI thread. This is expressed through the ExecutionDataflowBlockOptions :
var runOnUI=new ExecutionDataflowBlockOptions {
TaskScheduler = TaskScheduler.FromCurrentSynchronizationContext()
};
var myUpdater = new ActionBlock<string>(msg => UpdateTheUI(msg),runOnUI);
//Pass progress messages from the processor to the updater
myProcessingBlock.LinkTo(myUpdater,new DataflowLinkOptions { PropagateCompletion = true });
The code that posts messages to the pipeline's first block doesn't change :
//Simulate a lot of messages
for(int i=0;i<100000;i++)
{
myProcessingBlock.Post(someMessage);
}
//We are finished, tell the block to process any leftover messages
myProcessingBlock.Complete();
In this case, as soon as the processor completes it will notify the next block in the pipeline to complete. We need to wait for that final block to complete as well:
//Wait for the block to finish
await myUpdater.Completion;
How about making the first block work in parallel? We can specify that up to eg 10 tasks will be used to process input messages through its execution options :
var dopOptions = new ExecutionDataflowBlockOptions {MaxDegreeOfParallelism = 10};
var myProcessingBlock = new TransformBlock<string,string>(msg =>MyActualProcessingMethod(msg),dopOptions);
The processor will process up to 10 messages in parallel but the updater will still process them one by one, in the UI thread.
Your best bet is to use a profiler to monitor the running application and determine for sure where the CPU is spending its time.
However, it looks like you have the possibility of a busy-wait loop if this.messageQueue.Count is 0. At minimum, I would suggest adding a small pause if the queue is empty to allow a message to go onto the queue. Otherwise your CPU is just spending time checking the queue over and over and over.
If the time is spent dequeueing messages, you may want to consider handling multiple messages at once (if there are multiple messages available), assuming your queue allows you to pop multiple messages off the queue in a single call.
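One way to get that pause without a fixed Thread.Sleep is to let the queue itself do the waiting. A minimal sketch, assuming the queue can be replaced by a BlockingCollection<string>; MessagePump and its members are illustrative names, not the asker's code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

sealed class MessagePump
{
    private readonly BlockingCollection<string> _queue = new BlockingCollection<string>();
    private volatile bool _shouldStop;

    public void Post(string message) { _queue.Add(message); }
    public void Stop() { _shouldStop = true; }

    // Runs on the background thread. TryTake blocks for up to 100 ms when the
    // queue is empty, so the loop does not spin, yet it still notices Stop().
    public void Run(Action<string> process)
    {
        while (!_shouldStop)
        {
            string message;
            if (_queue.TryTake(out message, 100))
                process(message);
        }
    }
}
```

Unlike a sleep, the wait ends as soon as a message arrives, so no latency is added while messages are flowing.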
I have a Windows service project that logs messages to a database (or some other place). The frequency of these messages can go up to ten per second. Since sending and processing the messages shouldn't delay the main process of the service, I start a new thread to process each message. This means that if the main process needs to send 100 log messages, 100 threads are started, one per message. I learned that when a thread is done, it is cleaned up, so I don't have to dispose of it. As long as I dispose all objects used in the thread, everything should work fine.
The service could hit an exception that leads to it shutting down. Before the service shuts down, it should wait for all threads that are still logging messages. To achieve this, it adds each thread to a list when the thread is started. When the wait-for-threads method is called, every thread in the list is checked to see whether it is still alive, and if so, Join is used to wait for it.
The code:
Creating the thread:
/// <summary>
/// Creates a new thread and sends the message
/// </summary>
/// <param name="logMessage"></param>
private static void ThreadSend(IMessage logMessage)
{
ParameterizedThreadStart threadStart = new ParameterizedThreadStart(MessageHandler.HandleMessage);
Thread messageThread = new Thread(threadStart);
messageThread.Name = "LogMessageThread";
messageThread.Start(logMessage);
threads.Add(messageThread);
}
The waiting for threads to end:
/// <summary>
/// Waits for threads that are still being processed
/// </summary>
public static void WaitForThreads()
{
int i = 0;
foreach (Thread thread in threads)
{
i++;
if (thread.IsAlive)
{
Debug.Print("waiting for {0} - {1} to end...", thread.Name, i);
thread.Join();
}
}
}
Now my main concern is that if this service runs for a month, it will still have all the threads (millions) in the list, most of them dead. This will eat memory, and I don't know how much. On the whole this doesn't seem like good practice to me. I want to clean up finished threads, but I can't find out how to do it. Does anyone have a good or best practice for this?
Remove the threads from the list if they are dead?
/// <summary>
/// Waits for threads that are still being processed
/// </summary>
public static void WaitForThreads()
{
List<Thread> toRemove = new List<Thread>();
int i = 0;
foreach (Thread thread in threads)
{
i++;
if (thread.IsAlive)
{
Debug.Print("waiting for {0} - {1} to end...", thread.Name, i);
thread.Join();
}
else
{
toRemove.Add(thread);
}
}
threads.RemoveAll(x => toRemove.Contains(x));
}
Have a look at Task Parallelism
First of all: Creating one thread per log message is not a good idea. Either use ThreadPool or create a limited number of worker threads which handle the log items from a common queue (producer/consumer).
Second: Of course you also need to remove the thread references from the list! Either the thread method removes itself when it ends, or you do it on a regular basis. For example, have a timer run every half an hour that checks the list for dead threads and removes them.
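The periodic clean-up could be as small as the sketch below. ThreadListCleaner is a made-up helper, and it assumes every piece of code that touches the list (including the code that adds threads) locks on the list object itself.

```csharp
using System.Collections.Generic;
using System.Threading;

static class ThreadListCleaner
{
    // Drops every thread that is not running and returns how many were dropped.
    // Threads that were never started also report IsAlive == false, so only
    // add a thread to the list once it has been started.
    public static int RemoveFinished(List<Thread> threads)
    {
        lock (threads)
        {
            return threads.RemoveAll(t => !t.IsAlive);
        }
    }
}
```

Calling this from a timer, or simply every Nth time a new thread is added, keeps the list bounded by the number of threads that are actually still running.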
If all you're doing in those threads is logging, you should probably have a single logging thread and a shared queue that the main thread puts messages on. The logging thread can then read the queue and log. This is incredibly easy with the BlockingCollection.
Create the queue in the service's main thread:
BlockingCollection<IMessage> LogMessageQueue = new BlockingCollection<IMessage>();
Your service's main thread creates a Logger (see below) instance, which starts a thread to process log messages. The main thread adds items to the LogMessageQueue. The logger thread reads them from the queue. When the main thread wants to shut down, it calls LogMessageQueue.CompleteAdding. The logger will empty the queue and exit.
Main thread would look like this:
// start the logger
Logger _loggingThread = new Logger(LogMessageQueue);
// to log a message:
LogMessageQueue.Add(logMessage);
// when the program needs to shut down:
LogMessageQueue.CompleteAdding();
And the logger class:
class Logger
{
BlockingCollection<IMessage> _queue;
Thread _loggingThread;
public Logger(BlockingCollection<IMessage> queue)
{
_queue = queue;
_loggingThread = new Thread(LoggingThreadProc);
_loggingThread.Start(); // without this the queue is never drained
}
private void LoggingThreadProc(object state)
{
IMessage msg;
while (_queue.TryTake(out msg, Timeout.Infinite))
{
// log the item
}
}
}
This way you have just one additional thread, messages are guaranteed to be processed in the order they're sent (not true of your current approach), and you don't have to worry about keeping track of thread shutdown, etc.
Update
If some of your log messages will take time to process (the email you described, for example), you can process them asynchronously. For example:
while (_queue.TryTake(out msg, Timeout.Infinite))
{
if (msg.Type == Email)
{
// start asynchronous task to send email
}
else
{
// write to log file
}
}
This way, only those messages that potentially take lots of time will run asynchronously. You can also have a secondary queue there if you want, for the email messages. That way you won't get bogged down with a bunch of email threads. Rather, you limit it to one or two, or perhaps a handful.
Note that you can also have multiple Logger instances if you want, all reading from the same message queue. Just make sure they're each writing to a different log file. The queue itself will support multiple consumers.
I think in general the approach to solve your issue is maybe not the best practice.
I mean, instead of creating 1000s of threads, you just want to store 1000s of messages in a database right? And it seems you want to do this asynchronously.
But creating a thread for each message is not really a good idea and actually does not solve that issue...
Instead I would try to implement something like message queues. You can have multiple queues and each queue has its own thread. If messages are coming in, you send them to one of the queues (alternating)...
The queue either waits for a certain number of messages, or always waits a certain amount of time (e.g. 1 second, depending on how long it takes to store, say, 100 messages in the database) before it tries to store the queued messages in the database.
This way you should actually always have a constant number of threads and you shouldn't see any performance issues...
Also it would enable you to batch insert data and not only one by one with the overhead of db connections etc...
Of course, if your database is slower than the rate at which messages arrive, more and more messages will be queued... But that's true for your current solution as well.
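To make the batching idea concrete, here is a small sketch of one such queue consumer. It assumes a BlockingCollection<string> as the queue; BatchingConsumer and the storeBatch delegate are invented names, with storeBatch standing in for a bulk insert into the database.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;

static class BatchingConsumer
{
    // Drains the queue in batches of up to maxBatch items. Each attempt to
    // fetch a further item waits at most maxWaitMs; on a timeout whatever has
    // been collected so far is flushed. Returns once the queue is completed
    // and empty.
    public static void Run(BlockingCollection<string> queue, int maxBatch, int maxWaitMs,
                           Action<IList<string>> storeBatch)
    {
        var batch = new List<string>(maxBatch);
        while (!queue.IsCompleted)
        {
            string item;
            if (queue.TryTake(out item, maxWaitMs))
            {
                batch.Add(item);
                if (batch.Count < maxBatch) continue;   // keep filling the batch
            }
            if (batch.Count > 0)
            {
                storeBatch(new List<string>(batch));    // hand over a copy
                batch.Clear();
            }
        }
        if (batch.Count > 0) storeBatch(batch);         // final partial batch
    }
}
```

Run is meant to be the body of the queue's own thread; the producer calls CompleteAdding on the queue when the service shuts down.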
Since multiple answers and comments led to my solution I will post the complete code here.
I used the ThreadPool to manage the threads and code from this page for the waiting function.
Creating the thread:
private static void ThreadSend(IMessage logMessage)
{
ThreadPool.QueueUserWorkItem(MessageHandler.HandleMessage, logMessage);
}
Waiting for the threads to finish:
public static bool WaitForThreads(int maxWaitingTime)
{
int maxThreads = 0;
int placeHolder = 0;
int availableThreads = 0;
while (maxWaitingTime > 0)
{
System.Threading.ThreadPool.GetMaxThreads(out maxThreads, out placeHolder);
System.Threading.ThreadPool.GetAvailableThreads(out availableThreads, out placeHolder);
//Stop if all threads are available
if (availableThreads == maxThreads)
{
return true;
}
System.Threading.Thread.Sleep(TimeSpan.FromMilliseconds(1000));
--maxWaitingTime;
}
return false;
}
Optionally you can add this somewhere outside these methods to limit the amount of threads in the pool.
System.Threading.ThreadPool.SetMaxThreads(MaxWorkerThreads, MaxCompletionPortThreads);
I am implementing a very basic thread in C#:
private Thread listenThread;
public void startParser()
{
this.listenThread = new Thread(new ThreadStart(checkingData));
this.listenThread.IsBackground = true;
this.listenThread.Start();
}
private void checkingData()
{
while (true)
{
}
}
Then I immediately get 100% CPU. I want to check whether sensor data has been read inside the while(true) loop. Why does this happen?
Thanks in advance.
while (true) is what is killing your CPU.
You can add Thread.Sleep(X) to your while loop to give the CPU some rest before checking again.
Also, seems like you actually need a Timer.
Look at one of the Timer classes here http://msdn.microsoft.com/en-us/library/system.threading.timer.aspx.
Use a Timer with as long a polling interval as you can afford, e.g. 1 second or half a second.
You need to tradeoff between CPU usage and the maximum delay you can afford between checks.
Let your loop sleep. It's running around and around and getting tired. At the very least, let it take a break eventually.
Because your function isn't doing anything inside the while block, it grabs the CPU and, for all practical purposes, never lets go of it to let other threads do their work:
private void checkingData()
{
while (true)
{
// executes, immediately
}
}
If you change it to the following, you should see more reasonable CPU consumption:
private void checkingData()
{
while (true)
{
// read your sensor data
Thread.Sleep(1000);
}
}
You can use a blocking queue. Taking an item from a blocking queue blocks the thread until an item is put into the queue, and that doesn't cost any CPU.
With .NET 4, you can use BlockingCollection http://msdn.microsoft.com/en-us/library/dd267312.aspx
Before version 4, there is no blocking queue in the .NET Framework.
You can find many implementations of a blocking queue if you google it.
Here is one implementation:
http://www.codeproject.com/KB/recipes/boundedblockingqueue.aspx
By the way, where does the data you are waiting for come from?
EDIT
If you want to check a file, you can use FileSystemWatcher, which lets the thread block until something changes.
If your data comes from an external API and the API doesn't block the thread, there is no way to block the thread other than Thread.Sleep.
If you're polling for a condition, definitely do as others suggested and put in a sleep. I'd also add that if you need maximum performance, you can use a statistical trick to avoid sleeping when sensor data has been read. When you detect sensor data is idle, say, 10 times in a row, then start to sleep on each iteration again.
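A sketch of that trick, with the idle threshold and the sleep length as tunable assumptions. AdaptivePoller is a made-up helper; the caller is expected to do the actual sensor read and report whether it found anything.

```csharp
using System.Threading;

sealed class AdaptivePoller
{
    private readonly int _idleThreshold;
    private readonly int _sleepMs;
    private int _idleStreak;

    public AdaptivePoller(int idleThreshold = 10, int sleepMs = 50)
    {
        _idleThreshold = idleThreshold;
        _sleepMs = sleepMs;
    }

    // Call once per loop iteration, passing whether the poll found data.
    // Sleeps only after idleThreshold empty polls in a row.
    // Returns true if this call slept.
    public bool AfterPoll(bool gotData)
    {
        if (gotData)
        {
            _idleStreak = 0;   // busy again: go back to polling without sleeping
            return false;
        }
        if (++_idleStreak < _idleThreshold) return false;
        Thread.Sleep(_sleepMs);
        return true;
    }
}
```

Inside the polling loop this would be used as `poller.AfterPoll(TryReadSensor())`, where TryReadSensor is whatever (hypothetical) method reads the sensor and returns true when data was present.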
I have an object that requires a lot of initialization (1-2 seconds on a beefy machine). Once it is initialized, though, it only takes about 20 milliseconds to do a typical "job".
In order to prevent it from being re-initialized every time an app wants to use it (which could be 50 times a second, or not at all for minutes, in typical usage), I decided to give it a job queue and have it run on its own thread, checking to see if there is any work for it in the queue. However, I'm not entirely sure how to make a thread that runs indefinitely with or without work.
Here's what I have so far; any critique is welcome.
private void DoWork()
{
while (true)
{
if (JobQue.Count > 0)
{
// do work on JobQue.Dequeue()
}
else
{
System.Threading.Thread.Sleep(50);
}
}
}
Afterthought: I was thinking I may need to kill this thread gracefully instead of letting it run forever, so I think I will add a Job type that tells the thread to end. Any thoughts on how to end a thread like this are also appreciated.
You need to lock anyway, so you can Wait and Pulse:
while(true) {
SomeType item;
lock(queue) {
while(queue.Count == 0) {
Monitor.Wait(queue); // releases lock, waits for a Pulse,
// and re-acquires the lock
}
item = queue.Dequeue(); // we have the lock, and there's data
}
// process item **outside** of the lock
}
with the add looking like:
lock(queue) {
queue.Enqueue(item);
// if the queue was empty, the worker may be waiting - wake it up
if(queue.Count == 1) { Monitor.PulseAll(queue); }
}
You might also want to look at this question, which limits the size of the queue (blocking if it is too full).
You need a synchronization primitive, like a WaitHandle (look at the static methods). This way you can 'signal' the worker thread that there is work. It checks the queue and keeps on working until the queue is empty, at which time it waits for the handle to be signalled again.
Make one of the job items a quit command too, so that you can signal the worker thread when it's time to exit.
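Roughly, that could look like the sketch below. It picks AutoResetEvent as the WaitHandle and treats a null job as the quit command; SignalledWorker and its members are illustrative names only.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

sealed class SignalledWorker
{
    private readonly Queue<Action> _jobs = new Queue<Action>();
    private readonly AutoResetEvent _workAvailable = new AutoResetEvent(false);
    private readonly Thread _thread;

    public SignalledWorker()
    {
        _thread = new Thread(Loop) { IsBackground = true };
        _thread.Start();
    }

    // A null job is the quit command.
    public void Enqueue(Action job)
    {
        lock (_jobs) { _jobs.Enqueue(job); }
        _workAvailable.Set();                 // wake the worker if it is waiting
    }

    public void StopAndWait()
    {
        Enqueue(null);
        _thread.Join();
    }

    private void Loop()
    {
        while (true)
        {
            _workAvailable.WaitOne();         // blocks without using CPU
            while (true)                      // drain everything that is queued
            {
                Action job;
                lock (_jobs)
                {
                    if (_jobs.Count == 0) break;   // empty: wait for the next signal
                    job = _jobs.Dequeue();
                }
                if (job == null) return;      // quit command
                job();                        // run the job outside the lock
            }
        }
    }
}
```

Because the worker drains the whole queue after each wake-up, it does not matter that several Set calls in quick succession collapse into a single signal.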
In most cases, I've done this quite similar to how you've set up -- but not in the same language. I had the advantage of working with a data structure (in Python) which will block the thread until an item is put into the queue, negating the need for the sleep call.
If .NET provides a class like that, I'd look into using it. A thread blocking is much better than a thread spinning on sleep calls.
The job you can pass could be as simple as a "null"; if the code receives a null, it knows it's time to break out of the while and go home.
If you don't really need to have the thread exit (and just want it to keep from keeping your application running) you can set Thread.IsBackground to true and it will end when all non background threads end. Will and Marc both have good solutions for handling the queue.
Grab the Parallel Framework. It has a BlockingCollection<T> which you can use as a job queue. How you'd use it is:
Create the BlockingCollection<T> that will hold your tasks/jobs.
Create some Threads which have a never-ending loop (while(true) { /* get job off the queue */ })
Set the threads going
Add jobs to the collection when they come available
The threads will be blocked until an item appears in the collection. Whoever's turn it is will get it (depends on the CPU). I'm using this now and it works great.
It also has the advantage of relying on MS to write that particularly nasty bit of code where multiple threads access the same resource. And whenever you can get somebody else to write that you should go for it. Assuming, of course, they have more technical/testing resources and combined experience than you.
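For illustration, the four steps might come out something like this. JobPool is a made-up wrapper, jobs are plain Action delegates, and GetConsumingEnumerable is used in place of a literal while(true) so that the threads can also be shut down cleanly.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

sealed class JobPool
{
    private readonly BlockingCollection<Action> _jobs = new BlockingCollection<Action>();
    private readonly Thread[] _threads;

    public JobPool(int threadCount)
    {
        _threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++)
        {
            _threads[i] = new Thread(Work) { IsBackground = true };
            _threads[i].Start();
        }
    }

    public void Add(Action job) { _jobs.Add(job); }

    // Stops accepting jobs, lets the threads finish what is queued, then returns.
    public void Shutdown()
    {
        _jobs.CompleteAdding();
        foreach (var thread in _threads) thread.Join();
    }

    private void Work()
    {
        // Blocks while the collection is empty; the loop ends once
        // CompleteAdding has been called and the collection is drained.
        foreach (var job in _jobs.GetConsumingEnumerable())
            job();
    }
}
```

For the single expensive object in the question, a pool with threadCount set to 1 gives the "one long-lived worker" behaviour.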
I've implemented a background-task queue without using any kind of while loop, or pulsing, or waiting, or, indeed, touching Thread objects at all. And it seems to work. (By which I mean it's been in production environments handling thousands of tasks a day for the last 18 months without any unexpected behavior.) It's a class with two significant properties, a task queue (TaskQueue, a List<Task> in the code below) and a BackgroundWorker. There are three significant methods, abbreviated here:
private void BackgroundWorker_DoWork(object sender, DoWorkEventArgs e)
{
if (TaskQueue.Count > 0)
{
TaskQueue[0].Execute();
}
}
private void BackgroundWorker_RunWorkerCompleted(object sender, RunWorkerCompletedEventArgs e)
{
Task t = TaskQueue[0];
lock (TaskQueue)
{
TaskQueue.Remove(t);
}
if (TaskQueue.Count > 0 && !BackgroundWorker.IsBusy)
{
BackgroundWorker.RunWorkerAsync();
}
}
public void Enqueue(Task t)
{
lock (TaskQueue)
{
TaskQueue.Add(t);
}
if (!BackgroundWorker.IsBusy)
{
BackgroundWorker.RunWorkerAsync();
}
}
It's not that there's no waiting and pulsing. But that all happens inside the BackgroundWorker. This just wakes up whenever a task is dropped in the queue, runs until the queue is empty, and then goes back to sleep.
I am far from an expert on threading. Is there a reason to mess around with System.Threading for a problem like this if using a BackgroundWorker will do?