.NET Framework 4.5.2. I'm trying to update some old code and trying to figure out how to manage the number of running threads for a specific ID. This is the code.
foreach (ThreadGroup thread_group in threadGroups)
{
if (ImportQueues.TryGetValue(thread_group.ThreadGroupID, out BlockingCollection<ImportFrequency> configuration))
{
if (configuration.Count > thread_group.ThreadCount)
{
// increase number of threads running for this group
}
else if (configuration.Count < thread_group.ThreadCount)
{
// decrease number of threads running for this group
}
}
else // Spin up the initial threads
{
ImportQueues.Add(thread_group.ThreadGroupID, new BlockingCollection<ImportFrequency>());
for (int x = 0; x < thread_group.ThreadCount; x++)
{
Thread import_thread = new Thread(new ParameterizedThreadStart(ProcessImportQueue)) { IsBackground = true };
import_thread.Start((ImportQueues[thread_group.ThreadGroupID]));
}
}
}
Essentially, we're running thread_group.ThreadCount number of threads for each group's blocking collection, which will be continually updated. Additionally, thread_group can be updated to change the ThreadCount. If the change increases the number of threads, spin up more to process the blocking collection. If it decreases, wait for the difference in number of threads to finish while the others continue running.
Is this possible with this paradigm or do I need to find a better way to manage threads?
Edit: One solution I tried with this is using CancellationTokens. When I start up a thread, I pass in a model that contains the context as well as a CancellationToken. That model is saved to a global variable. If we need to decrease the number of threads, I go through however many need to be stopped and cancel the token, which stops the infinite loop for that one thread and stops it.
As I understand the example code, it creates N queues, with M threads processing each queue.
There are a few potential problems with this. I say potential since there might be some special circumstances that motivates such a solution, but Sturgeon's law suggest it might just be a misguided attempt to achieve some unclear goal.
It is likely that N*M is larger than the amount of hardware threads available, either resulting in having idle threads that consume resources, or having unnecessary context switching if all threads are loaded.
It is unclear if the queues are in any way different. If they are identical, why not just use a single queue?
While adding threads is fairly simple, decreasing threads can be a bit complicated. Threads should only be canceled cooperatively, so you need some way to signal that the thread should be cancelled. But if the thread is blocked waiting for items in the queue you might run into situations where you have asked threads to stop, but they have not yet done so, and you are starting threads faster than threads are actually freed, eventually running out of threads. This can probably be solved, but it is unclear if the current solution does this.
'Import' sounds like something involving IO, and IO usually does not scale well with the number of threads. In the worst case, parallel IO can even harm performance.
It is unclear what the actual processing is doing. In my experience, a cpu core can do an incredible amount of work if properly utilized. When things are 'slow' it usually means that the code is doing some kind of unnecessary work. Some junior programmers approach performance problems by attempting multi-threading as the first and last step. When profiling the code and improving efficiency can often give much greater improvements.
For a simple case of a queue that needs to be processed in parallel I would consider using a parallel.Foreach loop over a blocking collection, using the parallelExtensionsExtras for a better partitioner. That should automatically try to balance the number of threads used to available hardware and CPU usage patterns for maximum throughput.
But I would not touch working code without a good understanding what it is trying to do, and why it should be changed to something better. Any preferably only after sufficient automated testing is in place to help avoid introducing new bugs. Changing code just because it is old is a terrible idea.
Related
In an attempt to speed up processing of physics objects in C# I decided to change a linear update algorithm into a parallel algorithm. I believed the best approach was to use the ThreadPool as it is built for completing a queue of jobs.
When I first implemented the parallel algorithm, I queued up a job for every physics object. Keep in mind, a single job completes fairly quickly (updates forces, velocity, position, checks for collision with the old state of any surrounding objects to make it thread safe, etc). I would then wait on all jobs to be finished using a single wait handle, with an interlocked integer that I decremented each time a physics object completed (upon hitting zero, I then set the wait handle). The wait was required as the next task I needed to do involved having the objects all be updated.
The first thing I noticed was that performance was crazy. When averaged, the thread pooling seemed to be going a bit faster, but had massive spikes in performance (on the order of 10 ms per update, with random jumps to 40-60ms). I attempted to profile this using ANTS, however I could not gain any insight into why the spikes were occurring.
My next approach was to still use the ThreadPool, however instead I split all the objects into groups. I initially started with only 8 groups, as that was how any cores my computer had. The performance was great. It far outperformed the single threaded approach, and had no spikes (about 6ms per update).
The only thing I thought about was that, if one job completed before the others, there would be an idle core. Therefore, I increased the number of jobs to about 20, and even up to 500. As I expected, it dropped to 5ms.
So my questions are as follows:
Why would spikes occur when I made the job sizes quick / many?
Is there any insight into how the ThreadPool is implemented that would help me to understand how best to use it?
Using threads has a price - you need context switching, you need locking (the job queue is most probably locked when a thread tries to fetch a new job) - it all comes at a price. This price is usually small compared to the actual work your thread is doing, but if the work ends quickly, the price becomes meaningful.
Your solution seems correct. A reasonable rule of thumb is to have twice as many threads as there are cores.
As you probably expect yourself, the spikes are likely caused by the code that manages the thread pools and distributes tasks to them.
For parallel programming, there are more sophisticated approaches than "manually" distributing work across different threads (even if using the threadpool).
See Parallel Programming in the .NET Framework for instance for an overview and different options. In your case, the "solution" may be as simple as this:
Parallel.ForEach(physicObjects, physicObject => Process(physicObject));
Here's my take on your two questions:
I'd like to start with question 2 (how the thread pool works) because it actually holds the key to answering question 1. The thread pool is implemented (without going into details) as a (thread-safe) work queue and a group of worker threads (which may shrink or enlarge as needed). As the user calls QueueUserWorkItem the task is put into the work queue. The workers keep polling the queue and taking work if they are idle. Once they manage to take a task, they execute it and then return to the queue for more work (this is very important!). So the work is done by the workers on-demand: as the workers become idle they take more pieces of work to do.
Having said the above, it's simple to see what is the answer to question 1 (why did you see a performance difference with more fine-grained tasks): it's because with fine-grain you get more load-balancing (a very desirable property), i.e. your workers do more or less the same amount of work and all cores are exploited uniformly. As you said, with a coarse-grain task distribution, there may be longer and shorter tasks, so one or more cores may be lagging behind, slowing down the overall computation, while other do nothing. With small tasks the problem goes away. Each worker thread takes one small task at a time and then goes back for more. If one thread picks up a shorter task it will go to the queue more often, If it takes a longer task it will go to the queue less often, so things are balanced.
Finally, when the jobs are too fine-grained, and considering that the pool may enlarge to over 1K threads, there is very high contention on the queue when all threads go back to take more work (which happens very often), which may account for the spikes you are seeing. If the underlying implementation uses a blocking lock to access the queue, then context switches are very frequent which hurts performance a lot and makes it seem rather random.
answer of question 1:
this is because of Thread switching , thread switching (or context switching in OS concepts) is CPU clocks that takes to switch between each thread , most of times multi-threading increases the speed of programs and process but when it's process is so small and quick size then context switching will take more time than thread's self process so the whole program throughput decreases, you can find more information about this in O.S concepts books .
answer of question 2:
actually i have a overall insight of ThreadPool , and i cant explain what is it's structure exactly.
to learn more about ThreadPool start here ThreadPool Class
each version of .NET Framework adds more and more capabilities utilizing ThreadPool indirectly. such as Parallel.ForEach Method mentioned before added in .NET 4 along with System.Threading.Tasks which makes code more readable and neat. You can learn more on this here Task Schedulers as well.
At very basic level what it does is: it creates let's say 20 threads and puts them into a lits. Each time it receives a delegate to execute async it takes idle thread from the list and executes delegate. if no available threads found it puts it into a queue. every time deletegate execution completes it will check if queue has any item and if so peeks one and executes in the same thread.
We have a situation where our application needs to process a series of files and rather than perform this function synchronously, we would like to employ multi-threading to have the workload split amongst different threads.
Each item of work is:
1. Open a file for read only
2. Process the data in the file
3. Write the processed data to a Dictionary
We would like to perform each file's work on a new thread?
Is this possible and should be we better to use the ThreadPool or spawn new threads keeping in mind that each item of "work" only takes 30ms however its possible that hundreds of files will need to be processed.
Any ideas to make this more efficient is appreciated.
EDIT: At the moment we are making use of the ThreadPool to handle this. If we have 500 files to process we cycle through the files and allocate each "unit of processing work" to the threadpool using QueueUserWorkItem.
Is it suitable to make use of the threadpool for this?
I would suggest you to use ThreadPool.QueueUserWorkItem(...), in this, threads are managed by the system and the .net framework. The chances of you meshing up with your own threadpool is much higher. So I would recommend you to use Threadpool provided by .net .
It's very easy to use,
ThreadPool.QueueUserWorkItem(new WaitCallback(YourMethod), ParameterToBeUsedByMethod);
YourMethod(object o){
Your Code here...
}
For more reading please follow the link http://msdn.microsoft.com/en-us/library/3dasc8as%28VS.80%29.aspx
Hope, this helps
I suggest you have a finite number of threads (say 4) and then have 4 pools of work. I.e. If you have 400 files to process have 100 files per thread split evenly. You then spawn the threads, and pass to each their work and let them run until they have finished their specific work.
You only have a certain amount of I/O bandwidth so having too many threads will not provide any benefits, also remember that creating a thread also takes a small amount of time.
Instead of having to deal with threads or manage thread pools directly I would suggest using a higher-level library like Parallel Extensions (PEX):
var filesContent = from file in enumerableOfFilesToProcess
select new
{
File=file,
Content=File.ReadAllText(file)
};
var processedContent = from content in filesContent
select new
{
content.File,
ProcessedContent = ProcessContent(content.Content)
};
var dictionary = processedContent
.AsParallel()
.ToDictionary(c => c.File);
PEX will handle thread management according to available cores and load while you get to concentrate about the business logic at hand (wow, that sounded like a commercial!)
PEX is part of the .Net Framework 4.0 but a back-port to 3.5 is also available as part of the Reactive Framework.
I suggest using the CCR (Concurrency and Coordination Runtime) it will handle the low-level threading details for you. As for your strategy, one thread per work item may not be the best approach depending on how you attempt to write to the dictionary, because you may create heavy contention since dictionaries aren't thread safe.
Here's some sample code using the CCR, an Interleave would work nicely here:
Arbiter.Activate(dispatcherQueue, Arbiter.Interleave(
new TeardownReceiverGroup(Arbiter.Receive<bool>(
false, mainPort, new Handler<bool>(Teardown))),
new ExclusiveReceiverGroup(Arbiter.Receive<object>(
true, mainPort, new Handler<object>(WriteData))),
new ConcurrentReceiverGroup(Arbiter.Receive<string>(
true, mainPort, new Handler<string>(ReadAndProcessData)))));
public void WriteData(object data)
{
// write data to the dictionary
// this code is never executed in parallel so no synchronization code needed
}
public void ReadAndProcessData(string s)
{
// this code gets scheduled to be executed in parallel
// CCR take care of the task scheduling for you
}
public void Teardown(bool b)
{
// clean up when all tasks are done
}
In the long run, I think you'll be happier if you manage your own threads. This will let you control how many are running and make it easy to report status.
Build a worker class that does the processing and give it a callback routine to return results and status.
For each file, create a worker instance and a thread to run it. Put the thread in a Queue.
Peel threads off of the queue up to the maximum you want to run simultaneously. As each thread completes go get another one. Adjust the maximum and measure throughput. I prefer to use a Dictionary to hold running threads, keyed by their ManagedThreadId.
To stop early, just clear the queue.
Use locking around your thread collections to preserve your sanity.
Use ThreadPool.QueueUserWorkItem to execute each independent task. Definitely don't create hundreds of threads. That is likely to cause major headaches.
The general rule for using the ThreadPool is if you don't want to worry about when the threads finish (or use Mutexes to track them), or worry about stopping the threads.
So do you need to worry about when the work is done? If not, the ThreadPool is the best option. If you want to track the overall progress, stop threads then your own collection of threads is best.
ThreadPool is generally more efficient if you are re-using threads. This question will give you a more detailed discussion.
Hth
Using the ThreadPool for each individual task is definitely a bad idea. From my experience this tends to hurt performance more than helping it. The first reason is that a considerable amount of overhead is required just to allocate a task for the ThreadPool to execute. By default, each application is assigned it's own ThreadPool that is initialized with ~100 thread capacity. When you are executing 400 operations in a parallel, it does not take long to fill the queue with requests and now you have ~100 threads all competing for CPU cycles. Yes the .NET framework does a great job with throttling and prioritizing the queue, however, I have found that the ThreadPool is best left for long-running operations that probably won't occur very often (loading a configuration file, or random web requests). Using the ThreadPool to fire off a few operations at random is much more efficient than using it to execute hundreds of requests at once. Given the current information, the best course of action would be something similar to this:
Create a System.Threading.Thread (or use a SINGLE ThreadPool thread) with a queue that the application can post requests to
Use the FileStream's BeginRead and BeginWrite methods to perform the IO operations. This will cause the .NET framework to use native API's to thread and execute the IO (IOCP).
This will give you 2 leverages, one is that your requests will still get processed in parallel while allowing the operating system to manage file system access and threading. The second is that because the bottleneck of the vast majority of systems will be the HDD, you can implement a custom priority sort and throttling to your request thread to give greater control over resource usage.
Currently I have been writing a similar application and using this method is both efficient and fast... Without any threading or throttling my application was only using 10-15% CPU, which can be acceptable for some operations depending on the processing involved, however, it made my PC as slow as if an application was using 80%+ of the CPU. This was the file system access. The ThreadPool and IOCP functions do not care if they are bogging the PC down, so don't get confused, they are optimized for performance, even if that performance means your HDD is squeeling like a pig.
The only problem I have had is memory usage ran a little high (50+ mb) during the testing phaze with approximately 35 streams open at once. I am currently working on a solution similar to the MSDN recommendation for SocketAsyncEventArgs, using a pool to allow x number of requests to be operating simultaneously, which ultimately led me to this forum post.
Hope this helps somebody with their decision making in the future :)
Scenario
I have a Windows Forms Application. Inside the main form there is a loop that iterates around 3000 times, Creating a new instance of a class on a new thread to perform some calculations. Bearing in mind that this setup uses a Thread Pool, the UI does stay responsive when there are only around 100 iterations of this loop (100 Assets to process). But as soon as this number begins to increase heavily, the UI locks up into eggtimer mode and the thus the log that is writing out to the listbox on the form becomes unreadable.
Question
Am I right in thinking that the best way around this is to use a Background Worker?
And is the UI locking up because even though I'm using lots of different threads (for speed), the UI itself is not on its own separate thread?
Suggested Implementations greatly appreciated.
EDIT!!
So lets say that instead of just firing off and queuing up 3000 assets to process, I decide to do them in batches of 100. How would I go about doing this efficiently? I made an attempt earlier at adding "Thread.Sleep(5000);" after every batch of 100 were fired off, but the whole thing seemed to crap out....
If you are creating 3000 separate threads, you are pushing a documented limitation of the ThreadPool class:
If an application is subject to bursts
of activity in which large numbers of
thread pool tasks are queued, use the
SetMinThreads method to increase the
minimum number of idle threads.
Otherwise, the built-in delay in
creating new idle threads could cause
a bottleneck.
See that MSDN topic for suggestions to configure the thread pool for your situation.
If your work is CPU intensive, having that many separate threads will cause more overhead than it's worth. However, if it's very IO intensive, having a large number of threads may help things somewhat.
.NET 4 introduces outstanding support for parallel programming. If that is an option for you, I suggest you have a look at that.
More threads does not equal top speed. In fact too many threads equals less speed. If your task is simply CPU related you should only be using as many threads as you have cores otherwise you're wasting resources.
With 3,000 iterations and your form thread attempting to create a thread each time what's probably happening is you are maxing out the thread pool and the form is hanging because it needs to wait for a prior thread to complete before it can allocate a new one.
Apparently ThreadPool doesn't work this way. I have never checked it with threads before so I am not sure. Another possibility is that the tasks begin flooding the UI thread with invocations at which point it will give up on the GUI.
It's difficult to tell without seeing code - but, based on what you're describing, there is one suspect.
You mentioned that you have this running on the ThreadPool now. Switching to a BackgroundWorker won't change anything, dramatically, since it also uses the ThreadPool to execute. (BackgroundWorker just simplifies the invoke calls...)
That being said, I suspect the problem is your notifications back to the UI thread for your ListBox. If you're invoking too frequently, your UI may become unresponsive while it tries to "catch up". This can happen if you're feeding too much status info back to the UI thread via Control.Invoke.
Otherwise, make sure that ALL of your work is being done on the ThreadPool, and you're not blocking on the UI thread, and it should work.
If every thread logs something to your ui, every written log line must invoke the main thread. Better to cache the log-output and update the gui only every 100 iterations or something like that.
Since I haven't seen your code so this is just a lot of conjecture with some highly hopefully educated guessing.
All a threadpool does is queue up your requests and then fire new threads off as others complete their work. Now 3000 threads doesn't sounds like a lot but if there's a ton of processing going on you could be destroying your CPU.
I'm not convinced a background worker would help out since you will end up re-creating a manager to handle all the pooling the threadpool gives you. I think more you issue is you've got too much data chunking going on. I think a good place to start would be to throttle the amount of threads you start and maintain. The threadpool manager easily allows you to do this. Find a balance that allows you to process data while still keeping the UI responsive.
I am working on a C# application that works with an array. It walks through it (meaning that at one time only a narrow part of the array is used). I am considering adding threads in it to make it perform faster (it runs on a dualcore computer). The problem is that I do not know if it would actually help, because threads cost something and this cost could easily be more than the parallel gain... So how do I determine if threading would help?
Try writing some benchmarks that mimic, as closely as possible, the real-world conditions in which your software will actually be used.
Test and time the single-threaded version. Test and time the multi-threaded version. Compare the two sets of results.
If your application is CPU bound (i.e. it isn't spending time trying to read files or waiting for data from a device) and there is little to no sharing of live data (data being altered, if its read only its fine) between the threads then you can pretty much increase the speed by 50->75% by adding another thread (as long as it still remains CPU bound of course).
The main overhead in multithreading comes from 2 places.
Creation & initialization of the thread. Creating a thread requires quite a few resources to be allocated and involves swaps between kernel and user mode, this is expensive though a once off per thread so you can pretty much ignore it if the thread is running for any reasonable amount of time. The best way to mitigate this problem is to use a thread pool as it will keep the thread on hand and not need to be recreated.
Handling synchronization of data. If one thread is reading from data that another is writing, bad things will generally happen (worse if both are changing it). This requires you to lock your data before altering it so that no thread reads a half written value. These locks are generally quite slow as well. To mitigate this problem, you need to design your data layout so that the threads don't need to read or write to the same data as much as possible. If you do need a lot of these locks it can then become slower than the single thread option.
In short, if you are doing something that requires the CPU's to share a lot of data, then multi-threading it will be slower and if the program isn't CPU bound there will be little or no difference (could be a lot slower depending on what it is bound to, e.g. a cd/hard drive). If your program matches these conditions, then it will PROBABLY be worthwhile to add another thread (though the only way to be certain would be profiling).
One more little note, you should only create as many CPU bound threads as you have physical cores (threads that idle most of the time, such as a GUI message pump thread, can be ignored for this condition).
P.S. You can reduce the cost of locking data by using a methodology called "lock-free programming", though this something that should really only be attempted by people with a lot of experience with multi-threading and a clear understanding of their target architecture (including how the cache is treated and the memory bus).
I agree with Luke's answer. Benchmark it, it's the only way to be sure.
I can also give a prediction of the results - the fastest version will be when the number of threads matches the number of cores, EXCEPT if the array is very small and each thread would have to process just a few items, the setup/teardown times might get larger than the processing itself. How few - that depends on what you do. Again - benchmark.
I'd advise to find out a "minimum number of items for a thread to be useful". Then, when you are deciding how many threads to spawn (or take from a pool), check how many cores the computer has and how many items there are. Spawn as many threads as possible, but no more than the computer has cores, and not so many that each thread would have less than the minimum number of items to process.
For example if the minimum number of items is, say, 1000; and the computer has 4 cores; and your list contains 2500 items, you would spawn just 2 threads, because more threads would be inefficient (each would process less than 1000 items).
Making a step by step list for Luke's idea:
Make a single threaded test app
Download Sysinternals Process Monitor and run it
Run your test app and find it on the process list (remember to run it as a release build outside of Visual Studio)
Double click the process and select the Performance Graph tab
Observe the CPU time used by your process
If the CPU time is sittling flat 50% for more than a few seconds, you can probably speed your overall process up using threads (assuming the bunch of stuff Mr Peters refered to holds true)
(However, the best you can do on a duel core machine is to halve the time it takes to run. If your process only take 4 seconds, it might not be worth getting it to run in 2 seconds)
Using the task parallel library / Rx provides a friendlier interface than System.Threading.ThreadPool, which might make your world a bit easier.
You miss imho one item, which is that it is not always about execution time. There is:
The problem to koop a UI operational during an operation. Even if the UI is "dormant", a nonresponsive message pump makes a worse impression.
The possibility to use a thread pool to actually not ahve to start / stop threads all the time. I use thread pools very extensively, and various parts of the applications keep them busy.
Anyhow, ignoring my point 1 - where you may go multi threaded without speeding things up in order to keep your UI responsive - I would say it is always then faster when you can actually either split up work (so you can keep more than one core busy) or offload it for othe reasons.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
When should I not use the ThreadPool in .Net?
It looks like the best option is to use a ThreadPool, in which case, why is it not the only option?
What are your experiences around this?
#Eric, I'm going to have to agree with Dean. Threads are expensive. You can't assume that your program is the only one running. When everyone is greedy with resources, the problem multiplies.
I prefer to create my threads manually and control them myself. It keeps the code very easy to understand.
That's fine when it's appropriate. If you need a bunch of worker threads, though, all you've done is make your code more complicated. Now you have to write code to manage them. If you just used a thread pool, you'd get all the thread management for free. And the thread pool provided by the language is very likely to be more robust, more efficient, and less buggy than whatever you roll for yourself.
Thread t = new Thread(new ThreadStart(DoSomething));
t.Start();
t.Join();
I hope that you would normally have some additional code in between Start() and Join(). Otherwise, the extra thread is useless, and you're wasting resources for no reason.
People are way too afraid of the resources used by threads. I've never seen creating and starting a thread to take more than a millisecond. There is no hard limit on the number of threads you can create. RAM usage is minimal. Once you have a few hundred threads, CPU becomes an issue because of context switches, so at that point you might want to get fancy with your design.
A millisecond is a long time on modern hardware. That's 3 million cycles on a 3GHz machine. And again, you aren't the only one creating threads. Your threads compete for the CPU along with every other program's threads. If you use not-quite-too-many threads, and so does another program, then together you've used too many threads.
Seriously, don't make life more complex than it needs to be. Don't use the thread pool unless you need something very specific that it offers.
Indeed. Don't make life more complex. If your program needs multiple worker threads, don't reinvent the wheel. Use the thread pool. That's why it's there. Would you roll your own string class?
The only reason why I wouldn't use the ThreadPool for cheap multithreading is if I need to…
interract with the method running (e.g., to kill it)
run code on a STA thread (this happened to me)
keep the thread alive after my application has died (ThreadPool threads are background threads)
in case I need to change the priority of the Thread. We can not change priority of threads in ThreadPool which is by default Normal.
P.S.: The MSDN article "The Managed Thread Pool" contains a section titled, "When Not to Use Thread Pool Threads", with a very similar but slightly more complete list of possible reasons for not using the thread pool.
There are lots of reasons why you would need to skip the ThreadPool, but if you don't know them then the ThreadPool should be good enough for you.
Alternatively, look at the new Parallel Extensions Framework, which has some neat stuff in there that may suit your needs without having to use the ThreadPool.
To quarrelsome's answer, I would add that it's best not to use a ThreadPool thread if you need to guarantee that your thread will begin work immediately. The maximum number of running thread-pooled threads is limited per appdomain, so your piece of work may have to wait if they're all busy. It's called "queue user work item", after all.
Two caveats, of course:
You can change the maximum number of thread-pooled threads in code, at runtime, so there's nothing to stop you checking the current vs maximum number and upping the maximum if required.
Spinning up a new thread comes with its own time penalty - whether it's worthwhile for you to take the hit depends on your circumstances.
Thread pools make sense whenever you have the concept of worker threads. Any time you can easily partition processing into smaller jobs, each of which can be processed independently, worker threads (and therefore a thread pool) make sense.
Thread pools do not make sense when you need thread which perform entirely dissimilar and unrelated actions, which cannot be considered "jobs"; e.g., One thread for GUI event handling, another for backend processing. Thread pools also don't make sense when processing forms a pipeline.
Basically, if you have threads which start, process a job, and quit, a thread pool is probably the way to go. Otherwise, the thread pool isn't really going to help.
I'm not speaking as someone with only
theoretical knowledge here. I write
and maintain high volume applications
that make heavy use of multithreading,
and I generally don't find the thread
pool to be the correct answer.
Ah, argument from authority - but always be on the look out for people who might be on the Windows kernel team.
Neither of us were arguing with the fact that if you have some specific requirements then the .NET ThreadPool might not be the right thing. What we're objecting to is the trivialisation of the costs to the machine of creating a thread.
The significant expense of creating a thread at the raison d'etre for the ThreadPool in the first place. I don't want my machines to be filled with code written by people who have been misinformed about the expense of creating a thread, and don't, for example, know that it causes a method to be called in every single DLL which is attached to the process (some of which will be created by 3rd parties), and which may well hot-up a load of code which need not be in RAM at all and almost certainly didn't need to be in L1.
The shape of the memory hierarchy in a modern machine means that 'distracting' a CPU is about the worst thing you can possibly do, and everybody who cares about their craft should work hard to avoid it.
When you're going to perform an operation that is going to take a long time, or perhaps a continuous background thread.
I guess you could always push the amount of threads available in the pool up but there would be little point in incurring the management costs of a thread that is never going to be given back to the pool.
Threadpool threads are appropriate for tasks that meet both of the following criteria:
The task will not have to spend any significant time waiting for something to happen
Anything that's waiting for the task to finish will likely be waiting for many tasks to finish, so its scheduling priority isn't apt to affect things much.
Using a threadpool thread instead of creating a new one will save a significant but bounded amount of time. If that time is significant compared with the time it will take to perform a task, a threadpool task is likely appropriate. The longer the time required to perform a task, however, the smaller the benefit of using the threadpool and the greater the likelihood of the task impeding threadpool efficiency.
MSDN has a list some reasons here:
http://msdn.microsoft.com/en-us/library/0ka9477y.aspx
There are several scenarios in which it is appropriate to create and
manage your own threads instead of using thread pool threads:
You require a foreground thread.
You require a thread to have a particular priority.
You have tasks that cause the thread to block for long periods of time. The thread pool has a maximum number of threads, so a large
number of blocked thread pool threads might prevent tasks from
starting.
You need to place threads into a single-threaded apartment. All ThreadPool threads are in the multithreaded apartment.
You need to have a stable identity associated with the thread, or to dedicate a thread to a task.
#Eric
#Derek, I don't exactly agree with the scenario you use as an example. If you don't know exactly what's running on your machine and exactly how many total threads, handles, CPU time, RAM, etc, that your app will use under a certain amount of load, you are in trouble.
Are you the only target customer for the programs you write? If not, you can't be certain about most of that. You generally have no idea when you write a program whether it will execute effectively solo, or if it will run on a webserver being hammered by a DDOS attack. You can't know how much CPU time you are going to have.
Assuming your program's behavior changes based on input, it's rare to even know exactly how much memory or CPU time your program will consume. Sure, you should have a pretty good idea about how your program is going to behave, but most programs are never analyzed to determine exactly how much memory, how many handles, etc. will be used, because a full analysis is expensive. If you aren't writing real-time software, the payoff isn't worth the effort.
In general, claiming to know exactly how your program will behave is far-fetched, and claiming to know everything about the machine approaches ludicrous.
And to be honest, if you don't know exactly what method you should use: manual threads, thread pool, delegates, and how to implement it to do just what your application needs, you are in trouble.
I don't fully disagree, but I don't really see how that's relevant. This site is here specifically because programmers don't always have all the answers.
If your application is complex enough to require throttling the number of threads that you use, aren't you almost always going to want more control than what the framework gives you?
No. If I need a thread pool, I will use the one that's provided, unless and until I find that it is not sufficient. I will not simply assume that the provided thread pool is insufficient for my needs without confirming that to be the case.
I'm not speaking as someone with only theoretical knowledge here. I write and maintain high volume applications that make heavy use of multithreading, and I generally don't find the thread pool to be the correct answer.
Most of my professional experience has been with multithreading and multiprocessing programs. I have often needed to roll my own solution as well. That doesn't mean that the thread pool isn't useful, or appropriate in many cases. The thread pool is built to handle worker threads. In cases where multiple worker threads are appropriate, the provided thread pool should should generally be the first approach.