Greetings Overflowers,
I'm working with C# .Net 4.
I want to give priority to specific parts of my code to utilize concurrency while other parts can also do the same only if there is room (free cores), otherwise they should switch to sequential execution (in the invoker thread) rather than just being temporarily blocked.
How can I do that ? Any good recent reading on that specific issue ?
Do Parallel.Invoke(...) execute stuff sequentially in the invoker task/thread if no cores are available?
Regards
There's no such thing as a 'free thread'. There's only an unused core. You can simply count them with Environment.ProcessorCount and base you threading strategy on that. Not leveraging the ThreadPool is usually a mistake, it does a good job distributing tp threads across cores. But it doesn't easily give you what you ask for, ThreadPool.GetAvailableThreads is a constantly changing number. Should be anyway.
I got the impression that the parallel framework handed out threads based on physical availability. http://msdn.microsoft.com/en-gb/concurrency/default.aspx
I might be thinking of the parallel extensions ( http://blogs.msdn.com/b/pfxteam/ ), or they might be the same thing or I might be talking rubbish :-)
In my experience with F#, it's possible to control/configure use of concurrency from outside of your core application code. I would expect C# Async CTP to offer similar functionality.
Free threads is a concept that exists only in the .Net thread pool. What you meant probably is free CPU resources!?
If you are declaring your own threads with new Thread() you are not bound by it.
However spawning a lot of them can slow down the process and even the OS.
That is why you should make your own thread manager to handle this.
You could check the CPU usage liek this:
PerformanceCounter cpuCounter = new PerformanceCounter();
cpuCounter.CategoryName = "Processor";
cpuCounter.CounterName = "% Processor Time";
cpuCounter.InstanceName = "_Total";
double cpuUsage = Convert.ToDouble(cpuCounter.NextValue());
And then in you code use the variable to do:
int threadId;
if (cpuUsage > threshold) {
DoWork();
}
else {
threadId = YourThreadManager.Queue(DoWork);
}
You problem is prioritization. You are saying that you have two types of tasks: important tasks (which must run), and leisure tasks (that can run only when there is a free CPU and no important tasks are queued).
The simple solution is to assign a very high priority for important tasks, and very low priority for leisure tasks. This way, you almost guarantee that most of your system's CPU resources go towards running important tasks. However, it does not guarantee that your leisure tasks always run after all the important tasks are run. The system may still schedule some leisure tasks onces in a while.
However, the Task Parallel Library does not allow scheduling tasks with different priorities. You can either use something like this or override the TPL's scheduler with a custom implementation of BlockingCollection that always dequeues the important tasks first.
One solution is to schedule your important tasks with the TPL, but manually schedule your leisure tasks on your own background thread (set to the lowest thread priority) -- and this thread runs the leisure tasks sequentially.
Related
What feedback does TPL use to dynamically adjust the number of worker threads?
My previous understanding was that it measures the rate of task completion to see if adding or removing threads is worth it. But then, why does this code keep increasing the number of threads, even though there is a bottleneck introduced by a semaphore?
Surely, there can be no more than 20 task completions per second, and more than 2 threads will not improve that.
var activeThreads = 0;
var semaphore = new SemaphoreSlim(2);
var res = Parallel.For(0, 1_000_000, i =>
{
Interlocked.Increment(ref activeThreads);
semaphore.Wait();
try
{
Thread.Sleep(100);
Console.WriteLine("Threads: " + activeThreads);
}
finally
{
Interlocked.Decrement(ref activeThreads);
semaphore.Release();
}
});
I believe the ParallelOptions is what you are looking for to specify the amount of parallelism.
Parallel.For(0, 1000, new ParallelOptions
{
MaxDegreeOfParallelism = 2
}, i => { Console.WriteLine(i); });
Personally, I think the TPL library will work in a lot of cases, but it isn't really smart about execution distribution (pardon my english). Whenever you have bottlenecks in the execution of your application, have a look at the pipeline pattern for example. Here is a link that describes different approaches to parallel execution very well imo: https://www.dotnetcurry.com/patterns-practices/1407/producer-consumer-pattern-dotnet-csharp
TL;DR: The thing that you are doing in your code that the TPL uses to justify creating a new thread is blocking. (Synchronizing or sleeping, or performing I/O would all count as blocking.)
A longer explanation...
When your task runs, it takes its thread hostage for 100 ms (because you Sleep(100)). While you are sleeping, that thread cannot be used to run other tasks because it would risk not being in a runnable state when the sleep time period expires. Typically we sleep rather than perform an asynchronous action because we need to keep our call stack intact. We are therefore relying on the stack to maintain our state. And the stack is a one-of-a-kind resource for the thread. (There's not actually a lot more to a thread than its stack.)
So the TPL (Thread pool, specifically) tries to keep occupancy high but the thread count low. One way it achieves this is by making sure that there are approximately just as many runnable threads in the system as there are virtual processors. Each time it needs to increase the thread count, it must create a relatively expensive stack for the thread, so it's best not to have so many. And a thread that is not runnable cannot be scheduled, so when the CPU becomes free, you need something to schedule to make use of the processing resources available. If the thread is sleeping, it cannot be scheduled to run. So instead, a thread will be added to the thread pool and the next task will be scheduled on it.
When you are writing parallel code like this (as in your parallel for loop) that can be partitioned and managed by the TPL you should be careful about putting your thread into a non-runnable state. Performing synchronous I/O, waiting for a synchronization object (e.g. semaphore, event or mutex etc.), or sleeping will put the thread into a state where nothing else can be done on the thread until the I/O completes, the sleep interval expires, or the synchronization object becomes signalled. The thread is no good to the TPL during this period.
In your case, you do several of these things: you wait on a semaphore, you sleep, and you perform I/O by writing to the console. The first thing is waiting on that semaphore. If it's not signalled, then you immediately have the situation where the thread is not runnable and the next task of your million-or-so tasks that need to be run must be scheduled on a different thread. If there isn't one, then the TPL can justify creating a new thread to get more tasks started. After-all, what if it's thread #987,321 that will actually wind up setting the semaphore to unblock task #1? The TPL doesn't know what your code does, so it can delay creating threads for a while in the spirit of efficiency, but for correctness, ultimately it will have to create more threads to start chipping away at the task list. There is a complex, implementation-specific heuristic that it applies to monitor, predict and otherwise get this efficiency guess right.
Now your specific question actually asked what feedback does it use to adjust the number of threads. Like I said, the actual implementation is complex and you should probably think of it as a black-box. But in a nutshell, if there are no runnable threads, it may create another thread to keep chipping away at the task list (or may wait a while before doing so, hoping that things will free up), and if there are too many idle threads, it will terminate the idle threads to reclaim their resources.
And to reiterate, as I said at the top, and to hopefully answer your question this time, the one thing you do that allows the TPL to justify creating a new thread is to block. ...even on that first semaphore.
Ran into an article analysing the thread injection algorithm in 2017. As of 2019-08-01, the hillclimbing.cpp file on GitHub hasn't really changed so the article should still be up to date.
Relevant details:
The .NET thread pool has two main mechanisms for injecting threads: a
starvation-avoidance mechanism that adds worker threads if it sees no
progress being made on queued items and a hill-climbing heuristic that
tries to maximize throughput while using as few threads as possible.
...
It calculates the desired number of threads based on the ‘current
throughput’, which is the ‘# of tasks completed’ (numCompletions)
during the current time-period (sampleDuration in seconds).
...
It also takes the current thread count (currentThreadCount) into
consideration.
...
The real .NET Thread Pool only increases the thread-count by one
thread every 500 milliseconds. It keeps doing this until the ‘# of
threads’ has reached the amount that the hill-climbing algorithm
suggests.
...
The [hill-climbing] algorithm only returns values that respect the limits
specified by ThreadPool.SetMinThreads(..) and
ThreadPool.SetMaxThreads(..)
...
In addition, [the hill-climbing algorithm] will only recommend
increasing the thread count if the CPU Utilization is below 95%
So it turns out the thread pool does have a feedback mechanism based on task completion rate. It also doesn't explicitly check whether its threads are blocked or running, but it does keep an eye on overall CPU utilization to detect blockages. All this also means it should be roughly aware of what the other threads and processes are doing.
On the other hand, it will always eagerly spawn at least as many threads as told by ThreadPool.SetMinThreads(), which defaults to the number of logical processors on the machine.
In conclusion, the test code in question was doing two things which make it keep piling up more threads:
there are lots of tasks queued up and sitting in the queue for ages, which indicates starvation
CPU utilization is negligible, which means that a new thread should be able to use it
Is there any way to limit the number of processors that the ThreadPool object will use? According to the docs, "You cannot set the number of worker threads or the number of I/O completion threads to a number smaller than the number of processors in the computer."
So how can I limit my program to not consume all the processors?
After some experiments, I think I have just the thing. I've noticed that the ThreadPool considers the number of processors in the system as the number of processors available to the current process. This can work to your advantage.
I have 4 cores in my CPU. Trying to call SetMaxThreads with 2:
ThreadPool.SetMaxThreads(2, 2);
fails since I have 4 cores and so the numbers remain at their initial values (1023 and 1000 for my system).
However, like I said initially, the ThreadPool only considers the number of processors available to the process, which I can manage using Process.ProcessorAffinity. Doing this:
Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(3);
limits the available processors to the first two cores (since 3 = 11 in binary). Calling SetMaxThreads again:
ThreadPool.SetMaxThreads(2, 2);
should work like a charm (at least it did for me). Just make sure to use the affinity setting right at program start-up!
Of course, I would not encourage this hack, since anyway your process will be stuck with a limited number of cores for the entire duration of its execution.
Thread pool was specifically designed to remove all the headache of manual managing of threads. OS task scheduler has been worked on for many years and is doing a very good job at scheduling tasks for execution. There are multiple options that the scheduler considers, such as thread priority, memory location and proximity to the processor etc. Unless you have deep understanding of these processes, you better off from setting thread affinity.
If I may quote docs
Thread affinity forces a thread to run on a specific subset of
processors. Setting thread affinity should generally be avoided,
because it can interfere with the scheduler's ability to schedule
threads effectively across processors. This can decrease the
performance gains produced by parallel processing. An appropriate use
of thread affinity is testing each processor.
It is possible to set a Thread affinity though (relevant post), for which you will need to create your own thread object (and not the one from thread pool). To my limited knowledge, you cannot set a thread affinity for a thread from a pool.
Since SetMaxThreads doesn't allow you to set no. of threads lower than no. of processors your net best bet is to write your own TaskScheduler in TPL that limits the concurrency by queuing the tasks. And then instead of adding items to threadpool, you can create tasks with that scheduler.
See
How to: Create a Task Scheduler That Limits the Degree of Concurrency
You don't typically limit the threadpool, you throttle the number of threads your program spawns at a time. The Task Parallel Library lets you set MaxDegreeOfParallelism, or you can use a Semaphore.
Maybe I did not understand it right ... all the Parallel class issue :(
But from what I am reading now, I understand that when I use the Parallel I actually mobilize all the threads that exists in the threadPool for some task/mission.
For example:
var arrayStrings = new string[1000];
Parallel.ForEach<string>(arrayStrings, someString =>
{
DoSomething(someString);
});
So the Parallel.ForEach in this case is mobilizing all the threads that exists in the threadPool for the 'DoSomething' task/mission.
But does the call Parallel.ForEach will create any new thread at all?
Its clear that there will be no 1000 new threads. But lets assume that there are 1000 new threads, some case that the threadPool release all the thread that it hold so, in this case ... the Parallel.ForEach will create any new thread?
Short answer: Parallel.ForEach() does not “mobilize all the threads”. And any operation that schedules some work on the ThreadPool (which Parallel.ForEach() does) can cause creation of new thread in the pool.
Long answer: To understand this properly, you need to know how three levels of abstraction work: Parallel.ForEach(), TaskScheduler and ThreadPool:
Parallel.ForEach() (and Parallel.For()) schedule their work on a TaskScheduler. If you don't specify a scheduler explicitly, the current one will be used.
Parallel.ForEach() splits the work between several Tasks. Each Task will process a part of the input sequence, and when it's done, it will request another part if one is available, and so on.
How many Tasks will Parallel.ForEach() create? As many as the TaskScheduler will let it run. The way this is done is that each Task first enqueues a copy of itself when it starts executing (unless doing so would violate MaxDegreeOfParallelism, if you set it). This way, the actual concurrency level is up to the TaskScheduler.
Also, the first Task will actually execute on the current thread, if the TaskScheduler supports it (this is done using RunSynchronously()).
The default TaskScheduler simply enqueues each Task to the ThreadPool queue. (Actually, it's more complicated if you start a Task from another Task, but that's not relevant here.) Other TaskSchedulers can do completely different things and some of them (like TaskScheduler.FromCurrentSynchronizationContext()) are completely unsuitable for use with Parallel.ForEach().
The ThreadPool uses quite a complex algorithm to decide exactly how many threads should be running at any given time. But the most important thing here is that scheduling new work item can cause the creating of a new thread (although not necessarily immediately). And because with Parallel.ForEach(), there is always some item queued to be executed, it's completely up to the internal algorithm of ThreadPool to decide the number of threads.
Put together, it's pretty much impossible to decide how many threads will be used by a Parallel.ForEach(), because it depends on many variables. Both extremes are possible: that the loop will run completely synchronously on the current thread and that each item will be run on its own, newly created thread.
But generally, is should be close to optimal efficiency and you probably don't have to worry about all those details.
Parallel.Foreach does not create new threads, nor does it "mobilize all the threads". It uses a limited number of threads from the threadpool and submits tasks to them for parallel execution. In the current implementation the default is to use one thread per core.
I think you have this the wrong way round.
From PATTERNS OF PARALLEL PROGRAMMING you'll see that Parallel.ForEach is just really syntactic sugar.
The Parallel.ForEach is largely boiled down to something like this,
for (int p = 0; p < arrayStrings.Count(); p++)
{
ThreadPool.QueueUserWorkItem(DoSomething(arrayStrings[p]);
}
The ThreadPool takes care of the scheduling. There are some excellent articles around how the ThreadPool's scheduler behaves to some degree if you're interested, but that's nothing to do with TPL.
Parallel does not deal with threads at all - it schedules TASKS to the task framework. THat then has a scheduler and the default scheduler goes to the threadpool. This one will try to find a goo number of threads (better in 4.5 than 4.0) and the Threadpool may slowly spin up new threads.
But that is not a functoin of parallel.foreach ;)
the Parallel.ForEach will create any new thread ???
It never will. As I said - it has 1000 foreach, then it queues 10.000 tasks, Point. THe Task factory scheduler will do what it is programmed to do ((you can replace it). Generally, default - yes, slowly new threads will spring up WITHIN REASON.
From what I understand about the difference between Task & Thread is that task happened in the thread-pool while the thread is something that I need to managed by myself .. ( and that task can be cancel and return to the thread-pool in the end of his mission )
But in some blog I read that if the operating system need to create task and create thread => it will be easier to create ( and destroy ) task.
Someone can explain please why creating task is simple that thread ?
( or maybe I missing something here ... )
I think that what you are talking about when you say Task is a System.Threading.Task. If that's the case then you can think about it this way:
A program can have many threads, but a processor core can only run one Thread at a time.
Threads are very expensive, and switching between the threads that are running is also very expensive.
So... Having thousands of threads doing stuff is inefficient. Imagine if your teacher gave you 10,000 tasks to do. You'd spend so much time cycling between them that you'd never get anything done. The same thing can happen to the CPU if you start too many threads.
To get around this, the .NET framework allows you to create Tasks. Tasks are a bit of work bundled up into an object, and they allow you to do interesting things like capture the output of that work and chain pieces of work together (first go to the store, then buy a magazine).
Tasks are scheduled on a pool of threads. The specific number of threads depends on the scheduler used, but the default scheduler tries to pick a number of threads that is optimal for the number of CPU cores that you have and how much time your tasks are spending actually using CPU time. If you want to, you can even write your own scheduler that does something specific like making sure that all Tasks for that scheduler always operate on a single thread.
So think of Tasks as items in your to-do list. You might be able to do 5 things at once, but if your boss gives you 10000, they will pile up in your inbox until the first 5 that you are doing get done. The difference between Tasks and the ThreadPool is that Tasks (as I mentioned earlier) give you better control over the relationship between different items of work (imagine to-do items with multiple instructions stapled together), whereas the ThreadPool just allows you to queue up a bunch of individual, single-stage items (Functions).
You are hearing two different notions of task. The first is the notion of a job, and the second is the notion of a process.
A long time ago (in computer terms), there were no threads. Each running instance of a program was called a process, since it simply performed one step after another after another until it exited. This matches the intuitive idea of a process as a series of steps, like that of a factory assembly line. The operating system manages the process abstraction.
Then, developers began to add multiple assembly lines to the factories. Now a program could do more than one thing at once, and either a library or (more commonly today) the operating system would manage the scheduling of the steps within each thread. A thread is kind of a lightweight process, but a thread belongs to a process, and all the threads in a process share memory. On the other hand, multiple processes can't mess with each others' memory. So, the multiple threads in your web server can each access the same information about the connection, but Word can't access Excel's in-memory data structures because Word and Excel are running as separate processes. The idea of a process as a series of steps doesn't really match the model of a process with threads, so some people took to calling the "abstraction formerly known as a process" a task. This is the second definition of task that you saw in the blog post. Note that plenty of people still use the word process to mean this thing.
Well, as threads became more commmon, developers added even more abstractions over top of them to make them easier to use. This led to the rise of the thread pool, which is a library-managed "pool" of threads. You pass the library a job, and the library picks a thread and runs the job on that thread. The .NET framework has a thread pool implementation, and the first time you heard about a "task" the documentation really meant a job that you pass to the thread pool.
So in a sense, both the documentation and the blog post are right. The overloading of the term task is the unfortunate source of confusion.
Threads have been a part of .Net from v1.0, Tasks were introduced in the Task Parallel Library TPL which was released in .Net 4.0.
You can consider a Task as a more sophisticated version of a Thread. They are very easy to use and have a lot of advantages over Threads as follows:
You can create return types to Tasks as if they are functions.
You can the "ContinueWith" method, which will wait for the previous task and then start the execution. (Abstracting wait)
Abstracts Locks which should be avoided as per guidlines of my company.
You can use Task.WaitAll and pass an array of tasks so you can wait till all tasks are complete.
You can attach task to the parent task, thus you can decide whether the parent or the child will exist first.
You can achieve data parallelism with LINQ queries.
You can create parallel for and foreach loops
Very easy to handle exceptions with tasks.
*Most important thing is if the same code is run on single core machine it will just act as a single process without any overhead of threads.
Disadvantage of tasks over threads:
You need .Net 4.0
Newcomers who have learned operating systems can understand threads better.
New to the framework so not much assistance available.
Some tips:-
Always use Task.Factory.StartNew method which is semantically perfect and standard.
Take a look at Task Parallel Libray for more information
http://msdn.microsoft.com/en-us/library/dd460717.aspx
Expanding on the comment by Eric Lippert:
Threads are a way that allows your application to do several things in parallel. For example, your application might have one thread that processes the events from the user, like button clicks, and another thread that performs some long computation. This way, you can do two different things “at the same time”. If you didn't do that, the user wouldn't be to click buttons until the computation finished. So, Thread is something that can execute some code you wrote.
Task, on the other hand represents an abstract notion of some job. That job can have a result, and you can wait until the job finishes (by calling Wait()) or say that you want to do something after the job finishes (by calling ContinueWith()).
The most common job that you want to represent is to perform some computation in parallel with the current code. And Task offers you a simple way to do that. How and when the code actually runs is defined by TaskScheduler. The default one uses a ThreadPool: a set of threads that can run any code. This is done because creating and switching threads in inefficient.
But Task doesn't have to be directly associated with some code. You can use TaskCompletionSource to create a Task and then set its result whenever you want. For example, you could create a Task and mark it as completed when the user clicks a button. Some other code could wait on that Task and while it's waiting, there is no code executing for that Task.
If you want to know when to use Task and when to use Thread: Task is simpler to use and more efficient that creating your own Threads. But sometimes, you need more control than what is offered by Task. In those cases, it makese sense to use Thread directly.
Tasks really are just a wrapper for the boilerplate code of spinning up threads manually. At the root, there is no difference. Tasks just make the management of threads easier, as well as they are generally more expressive due to the lessening of the boilerplate noise.
Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I have been trying to learn multi-threaded programming in C# and I am confused about when it is best to use a thread pool vs. create my own threads. One book recommends using a thread pool for small tasks only (whatever that means), but I can't seem to find any real guidelines.
What are some pros and cons of thread pools vs creating my own threads? And what are some example use cases for each?
I would suggest you use a thread pool in C# for the same reasons as any other language.
When you want to limit the number of threads running or don't want the overhead of creating and destroying them, use a thread pool.
By small tasks, the book you read means tasks with a short lifetime. If it takes ten seconds to create a thread which only runs for one second, that's one place where you should be using pools (ignore my actual figures, it's the ratio that counts).
Otherwise you spend the bulk of your time creating and destroying threads rather than simply doing the work they're intended to do.
If you have lots of logical tasks that require constant processing and you want that to be done in parallel use the pool+scheduler.
If you need to make your IO related tasks concurrently such as downloading stuff from remote servers or disk access, but need to do this say once every few minutes, then make your own threads and kill them once you're finished.
Edit: About some considerations, I use thread pools for database access, physics/simulation, AI(games), and for scripted tasks ran on virtual machines that process lots of user defined tasks.
Normally a pool consists of 2 threads per processor (so likely 4 nowadays), however you can set up the amount of threads you want, if you know how many you need.
Edit: The reason to make your own threads is because of context changes, (thats when threads need to swap in and out of the process, along with their memory). Having useless context changes, say when you aren't using your threads, just leaving them sit around as one might say, can easily half the performance of your program (say you have 3 sleeping threads and 2 active threads). Thus if those downloading threads are just waiting they're eating up tons of CPU and cooling down the cache for your real application
Here's a nice summary of the thread pool in .Net: http://blogs.msdn.com/pedram/archive/2007/08/05/dedicated-thread-or-a-threadpool-thread.aspx
The post also has some points on when you should not use the thread pool and start your own thread instead.
I highly recommend reading the this free e-book:
Threading in C# by Joseph Albahari
At least read the "Getting Started" section. The e-book provides a great introduction and includes a wealth of advanced threading information as well.
Knowing whether or not to use the thread pool is just the beginning. Next you will need to determine which method of entering the thread pool best suits your needs:
Task Parallel Library (.NET Framework
4.0)
ThreadPool.QueueUserWorkItem
Asynchronous Delegates
BackgroundWorker
This e-book explains these all and advises when to use them vs. create your own thread.
The thread pool is designed to reduce context switching among your threads. Consider a process that has several components running. Each of those components could be creating worker threads. The more threads in your process, the more time is wasted on context switching.
Now, if each of those components were queuing items to the thread pool, you would have a lot less context switching overhead.
The thread pool is designed to maximize the work being done across your CPUs (or CPU cores). That is why, by default, the thread pool spins up multiple threads per processor.
There are some situations where you would not want to use the thread pool. If you are waiting on I/O, or waiting on an event, etc then you tie up that thread pool thread and it can't be used by anyone else. Same idea applies to long running tasks, though what constitutes a long running task is subjective.
Pax Diablo makes a good point as well. Spinning up threads is not free. It takes time and they consume additional memory for their stack space. The thread pool will re-use threads to amortize this cost.
Note: you asked about using a thread pool thread to download data or perform disk I/O. You should not use a thread pool thread for this (for the reasons I outlined above). Instead use asynchronous I/O (aka the BeginXX and EndXX methods). For a FileStream that would be BeginRead and EndRead. For an HttpWebRequest that would be BeginGetResponse and EndGetResponse. They are more complicated to use, but they are the proper way to perform multi-threaded I/O.
Beware of the .NET thread pool for operations that may block for any significant, variable or unknown part of their processing, as it is prone to thread starvation. Consider using the .NET parallel extensions, which provide a good number of logical abstractions over threaded operations. They also include a new scheduler, which should be an improvement on ThreadPool. See here
One reason to use the thread pool for small tasks only is that there are a limited number of thread pool threads. If one is used for a long time then it stops that thread from being used by other code. If this happens many times then the thread pool can become used up.
Using up the thread pool can have subtle effects - some .NET timers use thread pool threads and will not fire, for example.
If you have a background task that will live for a long time, like for the entire lifetime of your application, then creating your own thread is a reasonable thing. If you have short jobs that need to be done in a thread, then use thread pooling.
In an application where you are creating many threads, the overhead of creating the threads becomes substantial. Using the thread pool creates the threads once and reuses them, thus avoiding the thread creation overhead.
In an application that I worked on, changing from creating threads to using the thread pool for the short lived threads really helpped the through put of the application.
For the highest performance with concurrently executing units, write your own thread pool, where a pool of Thread objects are created at start up and go to blocking (formerly suspended), waiting on a context to run (an object with a standard interface implemented by your code).
So many articles about Tasks vs. Threads vs. the .NET ThreadPool fail to really give you what you need to make a decision for performance. But when you compare them, Threads win out and especially a pool of Threads. They are distributed the best across CPUs and they start up faster.
What should be discussed is the fact that the main execution unit of Windows (including Windows 10) is a thread, and OS context switching overhead is usually negligible. Simply put, I have not been able to find convincing evidence of many of these articles, whether the article claims higher performance by saving context switching or better CPU usage.
Now for a bit of realism:
Most of us won’t need our application to be deterministic, and most of us do not have a hard-knocks background with threads, which for instance often comes with developing an operating system. What I wrote above is not for a beginner.
So what may be most important is to discuss is what is easy to program.
If you create your own thread pool, you’ll have a bit of writing to do as you’ll need to be concerned with tracking execution status, how to simulate suspend and resume, and how to cancel execution – including in an application-wide shut down. You might also have to be concerned with whether you want to dynamically grow your pool and also what capacity limitation your pool will have. I can write such a framework in an hour but that is because I’ve done it so many times.
Perhaps the easiest way to write an execution unit is to use a Task. The beauty of a Task is that you can create one and kick it off in-line in your code (though caution may be warranted). You can pass a cancellation token to handle when you want to cancel the Task. Also, it uses the promise approach to chaining events, and you can have it return a specific type of value. Moreover, with async and await, more options exist and your code will be more portable.
In essence, it is important to understand the pros and cons with Tasks vs. Threads vs. the .NET ThreadPool. If I need high performance, I am going to use threads, and I prefer using my own pool.
An easy way to compare is start up 512 Threads, 512 Tasks, and 512 ThreadPool threads. You’ll find a delay in the beginning with Threads (hence, why write a thread pool), but all 512 Threads will be running in a few seconds while Tasks and .NET ThreadPool threads take up to a few minutes to all start.
Below are the results of such a test (i5 quad core with 16 GB of RAM), giving each 30 seconds to run. The code executed performs simple file I/O on an SSD drive.
Test Results
Thread pools are great when you have more tasks to process than available threads.
You can add all the tasks to a thread pool and specify the maximum number of threads that can run at a certain time.
Check out this page on MSDN:
http://msdn.microsoft.com/en-us/library/3dasc8as(VS.80).aspx
Always use a thread pool if you can, work at the highest level of abstraction possible. Thread pools hide creating and destroying threads for you, this is usually a good thing!
Most of the time you can use the pool as you avoid the expensive process of creating the thread.
However in some scenarios you may want to create a thread. For example if you are not the only one using the thread pool and the thread you create is long-lived (to avoid consuming shared resources) or for example if you want to control the stacksize of the thread.
Don't forget to investigate the Background worker.
I find for a lot of situations, it gives me just what i want without the heavy lifting.
Cheers.
I usually use the Threadpool whenever I need to just do something on another thread and don't really care when it runs or ends. Something like logging or maybe even background downloading a file (though there are better ways to do that async-style). I use my own thread when I need more control. Also what I've found is using a Threadsafe queue (hack your own) to store "command objects" is nice when I have multiple commands that I need to work on in >1 thread. So you'd may split up an Xml file and put each element in a queue and then have multiple threads working on doing some processing on these elements. I wrote such a queue way back in uni (VB.net!) that I've converted to C#. I've included it below for no particular reason (this code might contain some errors).
using System.Collections.Generic;
using System.Threading;
namespace ThreadSafeQueue {
public class ThreadSafeQueue<T> {
private Queue<T> _queue;
public ThreadSafeQueue() {
_queue = new Queue<T>();
}
public void EnqueueSafe(T item) {
lock ( this ) {
_queue.Enqueue(item);
if ( _queue.Count >= 1 )
Monitor.Pulse(this);
}
}
public T DequeueSafe() {
lock ( this ) {
while ( _queue.Count <= 0 )
Monitor.Wait(this);
return this.DeEnqueueUnblock();
}
}
private T DeEnqueueUnblock() {
return _queue.Dequeue();
}
}
}
I wanted a thread pool to distribute work across cores with as little latency as possible, and that didn't have to play well with other applications. I found that the .NET thread pool performance wasn't as good as it could be. I knew I wanted one thread per core, so I wrote my own thread pool substitute class. The code is provided as an answer to another StackOverflow question over here.
As to the original question, the thread pool is useful for breaking repetitive computations up into parts that can be executed in parallel (assuming they can be executed in parallel without changing the outcome). Manual thread management is useful for tasks like UI and IO.