Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
When should I not use the ThreadPool in .Net?
It looks like the best option is to use a ThreadPool, in which case, why is it not the only option?
What are your experiences around this?
#Eric, I'm going to have to agree with Dean. Threads are expensive. You can't assume that your program is the only one running. When everyone is greedy with resources, the problem multiplies.
I prefer to create my threads manually and control them myself. It keeps the code very easy to understand.
That's fine when it's appropriate. If you need a bunch of worker threads, though, all you've done is make your code more complicated. Now you have to write code to manage them. If you just used a thread pool, you'd get all the thread management for free. And the thread pool provided by the language is very likely to be more robust, more efficient, and less buggy than whatever you roll for yourself.
Thread t = new Thread(new ThreadStart(DoSomething));
t.Start();
t.Join();
I hope that you would normally have some additional code in between Start() and Join(). Otherwise, the extra thread is useless, and you're wasting resources for no reason.
People are way too afraid of the resources used by threads. I've never seen creating and starting a thread to take more than a millisecond. There is no hard limit on the number of threads you can create. RAM usage is minimal. Once you have a few hundred threads, CPU becomes an issue because of context switches, so at that point you might want to get fancy with your design.
A millisecond is a long time on modern hardware. That's 3 million cycles on a 3GHz machine. And again, you aren't the only one creating threads. Your threads compete for the CPU along with every other program's threads. If you use not-quite-too-many threads, and so does another program, then together you've used too many threads.
Seriously, don't make life more complex than it needs to be. Don't use the thread pool unless you need something very specific that it offers.
Indeed. Don't make life more complex. If your program needs multiple worker threads, don't reinvent the wheel. Use the thread pool. That's why it's there. Would you roll your own string class?
The only reason why I wouldn't use the ThreadPool for cheap multithreading is if I need to…
interract with the method running (e.g., to kill it)
run code on a STA thread (this happened to me)
keep the thread alive after my application has died (ThreadPool threads are background threads)
in case I need to change the priority of the Thread. We can not change priority of threads in ThreadPool which is by default Normal.
P.S.: The MSDN article "The Managed Thread Pool" contains a section titled, "When Not to Use Thread Pool Threads", with a very similar but slightly more complete list of possible reasons for not using the thread pool.
There are lots of reasons why you would need to skip the ThreadPool, but if you don't know them then the ThreadPool should be good enough for you.
Alternatively, look at the new Parallel Extensions Framework, which has some neat stuff in there that may suit your needs without having to use the ThreadPool.
To quarrelsome's answer, I would add that it's best not to use a ThreadPool thread if you need to guarantee that your thread will begin work immediately. The maximum number of running thread-pooled threads is limited per appdomain, so your piece of work may have to wait if they're all busy. It's called "queue user work item", after all.
Two caveats, of course:
You can change the maximum number of thread-pooled threads in code, at runtime, so there's nothing to stop you checking the current vs maximum number and upping the maximum if required.
Spinning up a new thread comes with its own time penalty - whether it's worthwhile for you to take the hit depends on your circumstances.
Thread pools make sense whenever you have the concept of worker threads. Any time you can easily partition processing into smaller jobs, each of which can be processed independently, worker threads (and therefore a thread pool) make sense.
Thread pools do not make sense when you need thread which perform entirely dissimilar and unrelated actions, which cannot be considered "jobs"; e.g., One thread for GUI event handling, another for backend processing. Thread pools also don't make sense when processing forms a pipeline.
Basically, if you have threads which start, process a job, and quit, a thread pool is probably the way to go. Otherwise, the thread pool isn't really going to help.
I'm not speaking as someone with only
theoretical knowledge here. I write
and maintain high volume applications
that make heavy use of multithreading,
and I generally don't find the thread
pool to be the correct answer.
Ah, argument from authority - but always be on the look out for people who might be on the Windows kernel team.
Neither of us were arguing with the fact that if you have some specific requirements then the .NET ThreadPool might not be the right thing. What we're objecting to is the trivialisation of the costs to the machine of creating a thread.
The significant expense of creating a thread at the raison d'etre for the ThreadPool in the first place. I don't want my machines to be filled with code written by people who have been misinformed about the expense of creating a thread, and don't, for example, know that it causes a method to be called in every single DLL which is attached to the process (some of which will be created by 3rd parties), and which may well hot-up a load of code which need not be in RAM at all and almost certainly didn't need to be in L1.
The shape of the memory hierarchy in a modern machine means that 'distracting' a CPU is about the worst thing you can possibly do, and everybody who cares about their craft should work hard to avoid it.
When you're going to perform an operation that is going to take a long time, or perhaps a continuous background thread.
I guess you could always push the amount of threads available in the pool up but there would be little point in incurring the management costs of a thread that is never going to be given back to the pool.
Threadpool threads are appropriate for tasks that meet both of the following criteria:
The task will not have to spend any significant time waiting for something to happen
Anything that's waiting for the task to finish will likely be waiting for many tasks to finish, so its scheduling priority isn't apt to affect things much.
Using a threadpool thread instead of creating a new one will save a significant but bounded amount of time. If that time is significant compared with the time it will take to perform a task, a threadpool task is likely appropriate. The longer the time required to perform a task, however, the smaller the benefit of using the threadpool and the greater the likelihood of the task impeding threadpool efficiency.
MSDN has a list some reasons here:
http://msdn.microsoft.com/en-us/library/0ka9477y.aspx
There are several scenarios in which it is appropriate to create and
manage your own threads instead of using thread pool threads:
You require a foreground thread.
You require a thread to have a particular priority.
You have tasks that cause the thread to block for long periods of time. The thread pool has a maximum number of threads, so a large
number of blocked thread pool threads might prevent tasks from
starting.
You need to place threads into a single-threaded apartment. All ThreadPool threads are in the multithreaded apartment.
You need to have a stable identity associated with the thread, or to dedicate a thread to a task.
#Eric
#Derek, I don't exactly agree with the scenario you use as an example. If you don't know exactly what's running on your machine and exactly how many total threads, handles, CPU time, RAM, etc, that your app will use under a certain amount of load, you are in trouble.
Are you the only target customer for the programs you write? If not, you can't be certain about most of that. You generally have no idea when you write a program whether it will execute effectively solo, or if it will run on a webserver being hammered by a DDOS attack. You can't know how much CPU time you are going to have.
Assuming your program's behavior changes based on input, it's rare to even know exactly how much memory or CPU time your program will consume. Sure, you should have a pretty good idea about how your program is going to behave, but most programs are never analyzed to determine exactly how much memory, how many handles, etc. will be used, because a full analysis is expensive. If you aren't writing real-time software, the payoff isn't worth the effort.
In general, claiming to know exactly how your program will behave is far-fetched, and claiming to know everything about the machine approaches ludicrous.
And to be honest, if you don't know exactly what method you should use: manual threads, thread pool, delegates, and how to implement it to do just what your application needs, you are in trouble.
I don't fully disagree, but I don't really see how that's relevant. This site is here specifically because programmers don't always have all the answers.
If your application is complex enough to require throttling the number of threads that you use, aren't you almost always going to want more control than what the framework gives you?
No. If I need a thread pool, I will use the one that's provided, unless and until I find that it is not sufficient. I will not simply assume that the provided thread pool is insufficient for my needs without confirming that to be the case.
I'm not speaking as someone with only theoretical knowledge here. I write and maintain high volume applications that make heavy use of multithreading, and I generally don't find the thread pool to be the correct answer.
Most of my professional experience has been with multithreading and multiprocessing programs. I have often needed to roll my own solution as well. That doesn't mean that the thread pool isn't useful, or appropriate in many cases. The thread pool is built to handle worker threads. In cases where multiple worker threads are appropriate, the provided thread pool should should generally be the first approach.
Related
I have been reading a bit about async/await vs ThreadPool vs Threads, and I have to admit, I'm not fully clear on the details. There is one specific question that I think I have the answer to, but I can't say for sure.
I am also aware of the many many questions on SO and elsewhere, where these questiosn are being discussed and explained. I have read a fair few here on SO, but I haven't found a clear answer, or at least as clear as I'd like.
What I gathered:
Using async/await in an I/O operation, will make that thread available to use for other things, while the I/O operation continues elsewhere
For a server application, this means higher throughput etc
However, a collegue said something like this:
that using a ThreadPool to execute the same I/O operation would behave almost the same as async/await
that when the ThreadPool thread "hands over" the I/O operation to the OS, it might wait for the answer, but that it won't matter much, since there is no work done on the CPU (the thread is waiting), and thus, the ThreadPool/OS/framework can use or spawn another thread to do some other work.
Since the threadpool has a limit of some 32000+ threads, it wouldn't matter much, since it could just use more threads. The cost for a thread is very small, just a few bytes of memory
So, what I am asking is:
Does the Thread from a ThreadPool block in I/O operations?
Does it matter if it does, since the OS/framework/Threadpool can just use another thread from the pool if needed for some other work?
Bottom line: I/O operations with async/await or ThreadPool; does it matter for efficiency and throughput?
Please note that I am not discussing the client-side of things, just from a server perspective.
And sorry in advance if I missed an exact answer to this, I have looked =)
The real issue is not "thread pool vs. async/await", it's synchronous I/O vs. asynchronous I/O. [1]
On Windows, threads are relatively expensive objects -- a thread has a lot of OS bookkeeping associated with it, not to mention 1 MB of preallocated stack space. This puts a fairly low cap on how many threads the system will even support, and even when you don't hit that cap, context switching between all those threads is not cheap either. That "32,000 threads" limit is a strictly theoretical one, and highly optimistic about how many threads you can have and still be responsive! [2]
Enter asynchronous I/O, which is optimized to use only as many threads as necessary (usually some conservative multiple of the number of physical processor cores in the system), ideally never even creating new threads beyond the initial batch. These threads are dedicated to handling completed I/O operations by removing them from a queue (known as a completion port). While an asynchronous operation is in progress, no thread is dedicated to it at all, not even as an item on a wait list (Stephen Cleary has a nice blog post about it that explains this in more detail). Not much imagination is needed to think about what's more efficient:
A few thousand individual threads that each wait on a particular operation, which have to be woken up and switched to (and between) depending on what operation(s) completed; or
A few dozen threads (if that), each of which can handle any completed operation, so that only as many ever need to run as necessary to be responsive.
As it turns out, the latter scales much better than the former; the "thread per request" model which is common in naive server code quickly shows its limits, even when you use a thread pool to reduce the creation of new threads. Note that this was an issue long before async/await was ever a thing, and so is the solution Windows went with; async/await is just a new way of writing code to use the existing mechanisms.
Are you likely to notice a difference with only a few requests in flight at a time? No. But since async/await essentially allows you to write code that looks synchronous but has the scalability of asynchronous I/O "for free", why would you not choose to use that in favor of synchronous I/O queued to the thread pool?
[1] Turns out Stephen Cleary already wrote most of what's in this answer a few years ago. I recommend you read that as well.
[2] Here's an older post by Mark Russinovich where he actually tries to squeeze as many threads out of the system as possible -- just for fun and profit. He "only" gets to 55K on a 64-bit machine before all resources are gone, and that's with adjusting the default stack size, and without doing any actually useful work. On a modern system you could probably get more, but the real question should not be "how many threads can I have" -- if it is, you're Doing it Wrong.
Okay, So I wanted to know what happens when I use TaskCreationOptions.LongRunning. By this answer, I came to know that for long running tasks, I should use this options because it creates a thread outside of threadpool.
Cool. But what advantage would I get when I create a thread outside threadpool? And when to do it and avoid it?
what advantage would I get when I create a thread outside threadpool?
The threadpool, as it name states, is a pool of threads which are allocated once and re-used throughout, in order to save the time and resources necessary to allocate a thread. The pool itself re-sizes on demand. If you queue more work than actual workers exist in the pool, it will allocate more threads in 500ms intervals, one at a time (this exists to avoid allocation of multiple threads simultaneously where existing threads may already finish executing and can serve requests). If many long running operations are performed on the thread-pool, it causes "thread starvation", meaning delegates will start getting queued and ran only once a thread frees up. That's why you'd want to avoid a large amount of threads doing lengthy work with thread-pool threads.
The Managed Thread-Pool docs also have a section on this question:
There are several scenarios in which it is appropriate to create and
manage your own threads instead of using thread pool threads:
You require a foreground thread.
You require a thread to have a particular priority.
You have tasks that cause the thread to block for long periods of time. The thread pool has a maximum number of threads, so a large
number of blocked thread pool threads might prevent tasks from
starting.
You need to place threads into a single-threaded apartment. All ThreadPool threads are in the multithreaded apartment.
You need to have a stable identity associated with the thread, or to dedicate a thread to a task.
For more, see:
Thread vs ThreadPool
When should I not use the ThreadPool in .Net?
Dedicated thread or thread-pool thread?
"Long running" can be quantified pretty well, a thread that takes more than half a second is running long. That's a mountain of processor instructions on a modern machine, you'd have to burn a fat five billion of them per second. Pretty hard to do in a constructive way unless you are calculating the value of Pi to thousands of decimals in the fraction.
Practical threads can only take that long when they are not burning core but are waiting a lot. Invariably on an I/O completion, like reading data from a disk, a network, a dbase server. And often the reason you'd start considering using a thread in the first place.
The threadpool has a "manager". It determines when a threadpool thread is allowed to start. It doesn't happen immediately when you start it in your program. The manager tries to limit the number of running threads to the number of CPU cores you have. It is much more efficient that way, context switching between too many active threads is expensive. And a good throttle, preventing your program from consuming too many resources in a burst.
But the threadpool manager has the very common problem with managers, it doesn't know enough about what is going on. Just like my manager doesn't know that I'm goofing off at Stackoverflow.com, the tp manager doesn't know that a thread is waiting for something and not actually performing useful work. Without that knowledge it cannot make good decisions. A thread that does a lot of waiting should be ignored and another one should be allowed to run in its place. Actually doing real work.
Just like you tell your manager that you go on vacation, so he can expect no work to get done, you tell the threadpool manager the same thing with LongRunning.
Do note that it isn't quite a bad as it, perhaps, sounds in this answer. Particularly .NET 4.0 hired a new manager that's a lot smarter at figuring out the optimum number of running threads. It does so with a feedback loop, collecting data to discover if active threads actually get work done. And adjusts the optimum accordingly. Only problem with this approach is the common one when you close a feedback loop, you have to make it slow so the loop cannot become unstable. In other words, it isn't particularly quick at driving up the number of active threads.
If you know ahead of time that the thread is pretty abysmal, running for many seconds with no real cpu load then always pick LongRunning. Otherwise it is a tuning job, observing the program when it is done and tinkering with it to make it more optimal.
In an attempt to speed up processing of physics objects in C# I decided to change a linear update algorithm into a parallel algorithm. I believed the best approach was to use the ThreadPool as it is built for completing a queue of jobs.
When I first implemented the parallel algorithm, I queued up a job for every physics object. Keep in mind, a single job completes fairly quickly (updates forces, velocity, position, checks for collision with the old state of any surrounding objects to make it thread safe, etc). I would then wait on all jobs to be finished using a single wait handle, with an interlocked integer that I decremented each time a physics object completed (upon hitting zero, I then set the wait handle). The wait was required as the next task I needed to do involved having the objects all be updated.
The first thing I noticed was that performance was crazy. When averaged, the thread pooling seemed to be going a bit faster, but had massive spikes in performance (on the order of 10 ms per update, with random jumps to 40-60ms). I attempted to profile this using ANTS, however I could not gain any insight into why the spikes were occurring.
My next approach was to still use the ThreadPool, however instead I split all the objects into groups. I initially started with only 8 groups, as that was how any cores my computer had. The performance was great. It far outperformed the single threaded approach, and had no spikes (about 6ms per update).
The only thing I thought about was that, if one job completed before the others, there would be an idle core. Therefore, I increased the number of jobs to about 20, and even up to 500. As I expected, it dropped to 5ms.
So my questions are as follows:
Why would spikes occur when I made the job sizes quick / many?
Is there any insight into how the ThreadPool is implemented that would help me to understand how best to use it?
Using threads has a price - you need context switching, you need locking (the job queue is most probably locked when a thread tries to fetch a new job) - it all comes at a price. This price is usually small compared to the actual work your thread is doing, but if the work ends quickly, the price becomes meaningful.
Your solution seems correct. A reasonable rule of thumb is to have twice as many threads as there are cores.
As you probably expect yourself, the spikes are likely caused by the code that manages the thread pools and distributes tasks to them.
For parallel programming, there are more sophisticated approaches than "manually" distributing work across different threads (even if using the threadpool).
See Parallel Programming in the .NET Framework for instance for an overview and different options. In your case, the "solution" may be as simple as this:
Parallel.ForEach(physicObjects, physicObject => Process(physicObject));
Here's my take on your two questions:
I'd like to start with question 2 (how the thread pool works) because it actually holds the key to answering question 1. The thread pool is implemented (without going into details) as a (thread-safe) work queue and a group of worker threads (which may shrink or enlarge as needed). As the user calls QueueUserWorkItem the task is put into the work queue. The workers keep polling the queue and taking work if they are idle. Once they manage to take a task, they execute it and then return to the queue for more work (this is very important!). So the work is done by the workers on-demand: as the workers become idle they take more pieces of work to do.
Having said the above, it's simple to see what is the answer to question 1 (why did you see a performance difference with more fine-grained tasks): it's because with fine-grain you get more load-balancing (a very desirable property), i.e. your workers do more or less the same amount of work and all cores are exploited uniformly. As you said, with a coarse-grain task distribution, there may be longer and shorter tasks, so one or more cores may be lagging behind, slowing down the overall computation, while other do nothing. With small tasks the problem goes away. Each worker thread takes one small task at a time and then goes back for more. If one thread picks up a shorter task it will go to the queue more often, If it takes a longer task it will go to the queue less often, so things are balanced.
Finally, when the jobs are too fine-grained, and considering that the pool may enlarge to over 1K threads, there is very high contention on the queue when all threads go back to take more work (which happens very often), which may account for the spikes you are seeing. If the underlying implementation uses a blocking lock to access the queue, then context switches are very frequent which hurts performance a lot and makes it seem rather random.
answer of question 1:
this is because of Thread switching , thread switching (or context switching in OS concepts) is CPU clocks that takes to switch between each thread , most of times multi-threading increases the speed of programs and process but when it's process is so small and quick size then context switching will take more time than thread's self process so the whole program throughput decreases, you can find more information about this in O.S concepts books .
answer of question 2:
actually i have a overall insight of ThreadPool , and i cant explain what is it's structure exactly.
to learn more about ThreadPool start here ThreadPool Class
each version of .NET Framework adds more and more capabilities utilizing ThreadPool indirectly. such as Parallel.ForEach Method mentioned before added in .NET 4 along with System.Threading.Tasks which makes code more readable and neat. You can learn more on this here Task Schedulers as well.
At very basic level what it does is: it creates let's say 20 threads and puts them into a lits. Each time it receives a delegate to execute async it takes idle thread from the list and executes delegate. if no available threads found it puts it into a queue. every time deletegate execution completes it will check if queue has any item and if so peeks one and executes in the same thread.
Scenario
I have a Windows Forms Application. Inside the main form there is a loop that iterates around 3000 times, Creating a new instance of a class on a new thread to perform some calculations. Bearing in mind that this setup uses a Thread Pool, the UI does stay responsive when there are only around 100 iterations of this loop (100 Assets to process). But as soon as this number begins to increase heavily, the UI locks up into eggtimer mode and the thus the log that is writing out to the listbox on the form becomes unreadable.
Question
Am I right in thinking that the best way around this is to use a Background Worker?
And is the UI locking up because even though I'm using lots of different threads (for speed), the UI itself is not on its own separate thread?
Suggested Implementations greatly appreciated.
EDIT!!
So lets say that instead of just firing off and queuing up 3000 assets to process, I decide to do them in batches of 100. How would I go about doing this efficiently? I made an attempt earlier at adding "Thread.Sleep(5000);" after every batch of 100 were fired off, but the whole thing seemed to crap out....
If you are creating 3000 separate threads, you are pushing a documented limitation of the ThreadPool class:
If an application is subject to bursts
of activity in which large numbers of
thread pool tasks are queued, use the
SetMinThreads method to increase the
minimum number of idle threads.
Otherwise, the built-in delay in
creating new idle threads could cause
a bottleneck.
See that MSDN topic for suggestions to configure the thread pool for your situation.
If your work is CPU intensive, having that many separate threads will cause more overhead than it's worth. However, if it's very IO intensive, having a large number of threads may help things somewhat.
.NET 4 introduces outstanding support for parallel programming. If that is an option for you, I suggest you have a look at that.
More threads does not equal top speed. In fact too many threads equals less speed. If your task is simply CPU related you should only be using as many threads as you have cores otherwise you're wasting resources.
With 3,000 iterations and your form thread attempting to create a thread each time what's probably happening is you are maxing out the thread pool and the form is hanging because it needs to wait for a prior thread to complete before it can allocate a new one.
Apparently ThreadPool doesn't work this way. I have never checked it with threads before so I am not sure. Another possibility is that the tasks begin flooding the UI thread with invocations at which point it will give up on the GUI.
It's difficult to tell without seeing code - but, based on what you're describing, there is one suspect.
You mentioned that you have this running on the ThreadPool now. Switching to a BackgroundWorker won't change anything, dramatically, since it also uses the ThreadPool to execute. (BackgroundWorker just simplifies the invoke calls...)
That being said, I suspect the problem is your notifications back to the UI thread for your ListBox. If you're invoking too frequently, your UI may become unresponsive while it tries to "catch up". This can happen if you're feeding too much status info back to the UI thread via Control.Invoke.
Otherwise, make sure that ALL of your work is being done on the ThreadPool, and you're not blocking on the UI thread, and it should work.
If every thread logs something to your ui, every written log line must invoke the main thread. Better to cache the log-output and update the gui only every 100 iterations or something like that.
Since I haven't seen your code so this is just a lot of conjecture with some highly hopefully educated guessing.
All a threadpool does is queue up your requests and then fire new threads off as others complete their work. Now 3000 threads doesn't sounds like a lot but if there's a ton of processing going on you could be destroying your CPU.
I'm not convinced a background worker would help out since you will end up re-creating a manager to handle all the pooling the threadpool gives you. I think more you issue is you've got too much data chunking going on. I think a good place to start would be to throttle the amount of threads you start and maintain. The threadpool manager easily allows you to do this. Find a balance that allows you to process data while still keeping the UI responsive.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I have been trying to learn multi-threaded programming in C# and I am confused about when it is best to use a thread pool vs. create my own threads. One book recommends using a thread pool for small tasks only (whatever that means), but I can't seem to find any real guidelines.
What are some pros and cons of thread pools vs creating my own threads? And what are some example use cases for each?
I would suggest you use a thread pool in C# for the same reasons as any other language.
When you want to limit the number of threads running or don't want the overhead of creating and destroying them, use a thread pool.
By small tasks, the book you read means tasks with a short lifetime. If it takes ten seconds to create a thread which only runs for one second, that's one place where you should be using pools (ignore my actual figures, it's the ratio that counts).
Otherwise you spend the bulk of your time creating and destroying threads rather than simply doing the work they're intended to do.
If you have lots of logical tasks that require constant processing and you want that to be done in parallel use the pool+scheduler.
If you need to make your IO related tasks concurrently such as downloading stuff from remote servers or disk access, but need to do this say once every few minutes, then make your own threads and kill them once you're finished.
Edit: About some considerations, I use thread pools for database access, physics/simulation, AI(games), and for scripted tasks ran on virtual machines that process lots of user defined tasks.
Normally a pool consists of 2 threads per processor (so likely 4 nowadays), however you can set up the amount of threads you want, if you know how many you need.
Edit: The reason to make your own threads is because of context changes, (thats when threads need to swap in and out of the process, along with their memory). Having useless context changes, say when you aren't using your threads, just leaving them sit around as one might say, can easily half the performance of your program (say you have 3 sleeping threads and 2 active threads). Thus if those downloading threads are just waiting they're eating up tons of CPU and cooling down the cache for your real application
Here's a nice summary of the thread pool in .Net: http://blogs.msdn.com/pedram/archive/2007/08/05/dedicated-thread-or-a-threadpool-thread.aspx
The post also has some points on when you should not use the thread pool and start your own thread instead.
I highly recommend reading the this free e-book:
Threading in C# by Joseph Albahari
At least read the "Getting Started" section. The e-book provides a great introduction and includes a wealth of advanced threading information as well.
Knowing whether or not to use the thread pool is just the beginning. Next you will need to determine which method of entering the thread pool best suits your needs:
Task Parallel Library (.NET Framework
4.0)
ThreadPool.QueueUserWorkItem
Asynchronous Delegates
BackgroundWorker
This e-book explains these all and advises when to use them vs. create your own thread.
The thread pool is designed to reduce context switching among your threads. Consider a process that has several components running. Each of those components could be creating worker threads. The more threads in your process, the more time is wasted on context switching.
Now, if each of those components were queuing items to the thread pool, you would have a lot less context switching overhead.
The thread pool is designed to maximize the work being done across your CPUs (or CPU cores). That is why, by default, the thread pool spins up multiple threads per processor.
There are some situations where you would not want to use the thread pool. If you are waiting on I/O, or waiting on an event, etc then you tie up that thread pool thread and it can't be used by anyone else. Same idea applies to long running tasks, though what constitutes a long running task is subjective.
Pax Diablo makes a good point as well. Spinning up threads is not free. It takes time and they consume additional memory for their stack space. The thread pool will re-use threads to amortize this cost.
Note: you asked about using a thread pool thread to download data or perform disk I/O. You should not use a thread pool thread for this (for the reasons I outlined above). Instead use asynchronous I/O (aka the BeginXX and EndXX methods). For a FileStream that would be BeginRead and EndRead. For an HttpWebRequest that would be BeginGetResponse and EndGetResponse. They are more complicated to use, but they are the proper way to perform multi-threaded I/O.
Beware of the .NET thread pool for operations that may block for any significant, variable or unknown part of their processing, as it is prone to thread starvation. Consider using the .NET parallel extensions, which provide a good number of logical abstractions over threaded operations. They also include a new scheduler, which should be an improvement on ThreadPool. See here
One reason to use the thread pool for small tasks only is that there are a limited number of thread pool threads. If one is used for a long time then it stops that thread from being used by other code. If this happens many times then the thread pool can become used up.
Using up the thread pool can have subtle effects - some .NET timers use thread pool threads and will not fire, for example.
If you have a background task that will live for a long time, like for the entire lifetime of your application, then creating your own thread is a reasonable thing. If you have short jobs that need to be done in a thread, then use thread pooling.
In an application where you are creating many threads, the overhead of creating the threads becomes substantial. Using the thread pool creates the threads once and reuses them, thus avoiding the thread creation overhead.
In an application that I worked on, changing from creating threads to using the thread pool for the short lived threads really helpped the through put of the application.
For the highest performance with concurrently executing units, write your own thread pool, where a pool of Thread objects are created at start up and go to blocking (formerly suspended), waiting on a context to run (an object with a standard interface implemented by your code).
So many articles about Tasks vs. Threads vs. the .NET ThreadPool fail to really give you what you need to make a decision for performance. But when you compare them, Threads win out and especially a pool of Threads. They are distributed the best across CPUs and they start up faster.
What should be discussed is the fact that the main execution unit of Windows (including Windows 10) is a thread, and OS context switching overhead is usually negligible. Simply put, I have not been able to find convincing evidence of many of these articles, whether the article claims higher performance by saving context switching or better CPU usage.
Now for a bit of realism:
Most of us won’t need our application to be deterministic, and most of us do not have a hard-knocks background with threads, which for instance often comes with developing an operating system. What I wrote above is not for a beginner.
So what may be most important is to discuss is what is easy to program.
If you create your own thread pool, you’ll have a bit of writing to do as you’ll need to be concerned with tracking execution status, how to simulate suspend and resume, and how to cancel execution – including in an application-wide shut down. You might also have to be concerned with whether you want to dynamically grow your pool and also what capacity limitation your pool will have. I can write such a framework in an hour but that is because I’ve done it so many times.
Perhaps the easiest way to write an execution unit is to use a Task. The beauty of a Task is that you can create one and kick it off in-line in your code (though caution may be warranted). You can pass a cancellation token to handle when you want to cancel the Task. Also, it uses the promise approach to chaining events, and you can have it return a specific type of value. Moreover, with async and await, more options exist and your code will be more portable.
In essence, it is important to understand the pros and cons with Tasks vs. Threads vs. the .NET ThreadPool. If I need high performance, I am going to use threads, and I prefer using my own pool.
An easy way to compare is start up 512 Threads, 512 Tasks, and 512 ThreadPool threads. You’ll find a delay in the beginning with Threads (hence, why write a thread pool), but all 512 Threads will be running in a few seconds while Tasks and .NET ThreadPool threads take up to a few minutes to all start.
Below are the results of such a test (i5 quad core with 16 GB of RAM), giving each 30 seconds to run. The code executed performs simple file I/O on an SSD drive.
Test Results
Thread pools are great when you have more tasks to process than available threads.
You can add all the tasks to a thread pool and specify the maximum number of threads that can run at a certain time.
Check out this page on MSDN:
http://msdn.microsoft.com/en-us/library/3dasc8as(VS.80).aspx
Always use a thread pool if you can, work at the highest level of abstraction possible. Thread pools hide creating and destroying threads for you, this is usually a good thing!
Most of the time you can use the pool as you avoid the expensive process of creating the thread.
However in some scenarios you may want to create a thread. For example if you are not the only one using the thread pool and the thread you create is long-lived (to avoid consuming shared resources) or for example if you want to control the stacksize of the thread.
Don't forget to investigate the Background worker.
I find for a lot of situations, it gives me just what i want without the heavy lifting.
Cheers.
I usually use the Threadpool whenever I need to just do something on another thread and don't really care when it runs or ends. Something like logging or maybe even background downloading a file (though there are better ways to do that async-style). I use my own thread when I need more control. Also what I've found is using a Threadsafe queue (hack your own) to store "command objects" is nice when I have multiple commands that I need to work on in >1 thread. So you'd may split up an Xml file and put each element in a queue and then have multiple threads working on doing some processing on these elements. I wrote such a queue way back in uni (VB.net!) that I've converted to C#. I've included it below for no particular reason (this code might contain some errors).
using System.Collections.Generic;
using System.Threading;
namespace ThreadSafeQueue {
public class ThreadSafeQueue<T> {
private Queue<T> _queue;
public ThreadSafeQueue() {
_queue = new Queue<T>();
}
public void EnqueueSafe(T item) {
lock ( this ) {
_queue.Enqueue(item);
if ( _queue.Count >= 1 )
Monitor.Pulse(this);
}
}
public T DequeueSafe() {
lock ( this ) {
while ( _queue.Count <= 0 )
Monitor.Wait(this);
return this.DeEnqueueUnblock();
}
}
private T DeEnqueueUnblock() {
return _queue.Dequeue();
}
}
}
I wanted a thread pool to distribute work across cores with as little latency as possible, and that didn't have to play well with other applications. I found that the .NET thread pool performance wasn't as good as it could be. I knew I wanted one thread per core, so I wrote my own thread pool substitute class. The code is provided as an answer to another StackOverflow question over here.
As to the original question, the thread pool is useful for breaking repetitive computations up into parts that can be executed in parallel (assuming they can be executed in parallel without changing the outcome). Manual thread management is useful for tasks like UI and IO.