Threadpool, order of execution and long running operations

Threadpool, order of execution and long running operations - c#

I have a need to create multiple processing threads in a new application. Each thread has the possibility of being "long running". Can someone comment on the viability of the built in .net threadpool or some existing custom threadpool for use in my application?
Requirements :
Works well within a windows service. (queued work can be removed from the queue, currently running threads can be told to halt)
Ability to spin up multiple threads.
Work needs to be started in sequential order, but multiple threads can be processing in parallel.
Hung threads can be detected and killed.
EDIT:
Comments seem to be leading towards manual threading. Unfortunately I am held to 3.5 version of the framework. Threadpool was appealing because it would allow me to queue work up and threads created for me when resources were available. Is there a good 3.5 compatable pattern (producer/consumer perhaps) that would give me this aspect of threadpool without actually using the threadpool?

Your requirements essentially rule out the use of the .NET ThreadPool;
It generally should not be used for long-running threads, due to the danger of exhausting the pool.
It does work well in Windows services, though, and you can spin up multiple threads - limited automatically by the pool's limits.
You can not guarantee thread starting times with the thread pool; it may queue threads for execution when it has enough free ones, and it does not even guarantee they will be started in the sequence you submit them.
There are no easy ways to detect and kill running threads in the ThreadPool
So essentially, you will want to look outside the ThreadPool; I might recommend that perhaps you might need 'full' System.Threading.Thread instances just due to all of your requirements. As long as you handle concurrency issues (as you must with any threading mechanism), I don't find the Thread class to be all that difficult to manage myself, really.

Simple answer, but the Task class (Fx4) meets most of your requirements.
Cancellation is cooperative, ie your Task code has to check for it.
But detecting hung threads is difficult, that is a very high requirement anyway.
But I can also read your requirements as for a JobQueue, where the 'work' consists of mostly similar jobs. You could roll your own system that Consumes that queue and monitors execution on a few Threads.

I've done essentially the same thing with .Net 3.5 by creating my own thread manager:
Instantiate worker classes that know how long they've been running.
Create threads that run a worker method and add them to a Queue<Thread>.
A supervisor thread reads threads from the Queue and adds them to a Dictionary<int, Worker> as it launches them until it hits its maximum running threads. Add the thread as a property of the Worker instance.
As each worker finishes it invokes a callback method from the supervisor that passes back its ManagedThreadId.
The supervisor removes the thread from the Dictionary and launches another waiting thread.
Poll the Dictionary of running workers to see if any have timed out, or put timers in the workers that invoke a callback if they take too long.
Signal a long-running worker to quit, or abort its thread.
The supervisor invokes callbacks to your main thread to inform of progress, etc.

Related

What is the advantage of creating a thread outside threadpool?

Okay, So I wanted to know what happens when I use TaskCreationOptions.LongRunning. By this answer, I came to know that for long running tasks, I should use this options because it creates a thread outside of threadpool.
Cool. But what advantage would I get when I create a thread outside threadpool? And when to do it and avoid it?

what advantage would I get when I create a thread outside threadpool?
The threadpool, as it name states, is a pool of threads which are allocated once and re-used throughout, in order to save the time and resources necessary to allocate a thread. The pool itself re-sizes on demand. If you queue more work than actual workers exist in the pool, it will allocate more threads in 500ms intervals, one at a time (this exists to avoid allocation of multiple threads simultaneously where existing threads may already finish executing and can serve requests). If many long running operations are performed on the thread-pool, it causes "thread starvation", meaning delegates will start getting queued and ran only once a thread frees up. That's why you'd want to avoid a large amount of threads doing lengthy work with thread-pool threads.
The Managed Thread-Pool docs also have a section on this question:
There are several scenarios in which it is appropriate to create and
manage your own threads instead of using thread pool threads:
You require a foreground thread.
You require a thread to have a particular priority.
You have tasks that cause the thread to block for long periods of time. The thread pool has a maximum number of threads, so a large
number of blocked thread pool threads might prevent tasks from
starting.
You need to place threads into a single-threaded apartment. All ThreadPool threads are in the multithreaded apartment.
You need to have a stable identity associated with the thread, or to dedicate a thread to a task.
For more, see:
Thread vs ThreadPool
When should I not use the ThreadPool in .Net?
Dedicated thread or thread-pool thread?

"Long running" can be quantified pretty well, a thread that takes more than half a second is running long. That's a mountain of processor instructions on a modern machine, you'd have to burn a fat five billion of them per second. Pretty hard to do in a constructive way unless you are calculating the value of Pi to thousands of decimals in the fraction.
Practical threads can only take that long when they are not burning core but are waiting a lot. Invariably on an I/O completion, like reading data from a disk, a network, a dbase server. And often the reason you'd start considering using a thread in the first place.
The threadpool has a "manager". It determines when a threadpool thread is allowed to start. It doesn't happen immediately when you start it in your program. The manager tries to limit the number of running threads to the number of CPU cores you have. It is much more efficient that way, context switching between too many active threads is expensive. And a good throttle, preventing your program from consuming too many resources in a burst.
But the threadpool manager has the very common problem with managers, it doesn't know enough about what is going on. Just like my manager doesn't know that I'm goofing off at Stackoverflow.com, the tp manager doesn't know that a thread is waiting for something and not actually performing useful work. Without that knowledge it cannot make good decisions. A thread that does a lot of waiting should be ignored and another one should be allowed to run in its place. Actually doing real work.
Just like you tell your manager that you go on vacation, so he can expect no work to get done, you tell the threadpool manager the same thing with LongRunning.
Do note that it isn't quite a bad as it, perhaps, sounds in this answer. Particularly .NET 4.0 hired a new manager that's a lot smarter at figuring out the optimum number of running threads. It does so with a feedback loop, collecting data to discover if active threads actually get work done. And adjusts the optimum accordingly. Only problem with this approach is the common one when you close a feedback loop, you have to make it slow so the loop cannot become unstable. In other words, it isn't particularly quick at driving up the number of active threads.
If you know ahead of time that the thread is pretty abysmal, running for many seconds with no real cpu load then always pick LongRunning. Otherwise it is a tuning job, observing the program when it is done and tinkering with it to make it more optimal.

multithreading using threadPool or new thread() in server app

I have done a bunch of reading on the dynamics and implications of multi-threading in a server app (starving the clr thread pool, etc.) but let's say for sake of argument I have EXACTLY 4 async processes I need to accomplish per request of my (asp.net) page... Now let's say time is the more critical element and my site should not experience heavy traffic. In this scenario, is it preferable to spawn 4 threads using the new Thread() approach or the ThreadPool.QueueUserWorkItem method?
My concern (and my point) here is that using the ThreadPool method, it may create a thread pool that is too large than what I really want? When I only need 4 threads, can't I just spawn them myself to keep the number of allocated app domain, clr threads minimal?

Spawning a thread is a very costly and therefore high-latency operation. If you want to manage threads yourself, which would be reasonable but not required, you have to build a custom pool.
Using thread pool work items is not without danger because it does not guarantee you a concurrency level of 4. If you happen to get 2 or 3 you will have much more latency in your HTTP request.
I'd use the thread-pool and use SetMinThreads to ensure that threads are started without delay and that there are always enough.

I would definitely go for the ThreadPool approach. It's designed for exactly this kind of scenario. The thread pool will internally manage the number of threads required, making sure not to overburden the system. Quoting from MSDN:
The thread pool provides new worker threads or I/O completion threads
on demand until it reaches the minimum for each category. When a
minimum is reached, the thread pool can create additional threads in
that category or wait until some tasks complete. Beginning with the
.NET Framework 4, the thread pool creates and destroys worker threads
in order to optimize throughput, which is defined as the number of
tasks that complete per unit of time. Too few threads might not make
optimal use of available resources, whereas too many threads could
increase resource contention.
If you're really paranoid you can limit it manually with SetMaxThreads. Going for the manual threading management will only introduce potential bugs.
If you have access to .net 4.0 you can use the TPL Task class (it also uses the ThreadPool under the hood), as it has even more appealing features.

ThreadPool.QueueUserWorkItem uses ASP.Net

In Asp.Net for creating a huge pdf report iam using "ThreadPool.QueueUserWorkItem", My requirement is report has to be created asynchronously , and i do not want to wait for the Response. I plan to achieve it through below code
protected void Button1_Click(object sender, EventArgs e)
{
ThreadPool.QueueUserWorkItem(report => CreateReport());
}
public void CreateReport()
{
//This method will take 30 seconds to finish it work
}
My question is ThreadPool.QueueUserWorkItem will create a new thread from Asp.Net worker process or some system thread. Is this a good approach ?, I may have 100 of concurrent users accessing the web page.

The QueueUserWorkItem() method utilizes the process's ThreadPool which automatically manages a number of worker-threads. These threads are assigned a task, run them to completion, then are returned to the ThreadPool for reuse.
Since this is hosted in ASP.NET the ThreadPool will belong to the ASP.NET process.
The ThreadPool is a very good candidate for this type of work; as the alternative of spinning up a dedicated thread is relatively expensive. However, you should consider the following limitations of the ThreadPool as well:
The ThreadPool is used by other aspects of .NET, and provides a limited number of threads. If you overuse it there is the possibility your tasks will be blocked waiting for others to complete. This is especially a concern in terms of scalability--however it shouldn't dissuade you from using the ThreadPool unless you have reason to believe it will be a bottleneck.
The ThreadPool tasks must be carefully managed to ensure they are returned for reuse. Unhandled exceptions or returns from a background thread will essentially "leak" that thread and prevent it from being reused. In these scenarios the ThreadPool may effectively lose it's threads and cause a serious slowdown or halt of the process.
The tasks you assign to the ThreadPool should be short-lived. If your processing is intensive then it's a better idea to provide it with a dedicated thread.
All these topics relate to the simple concept that the ThreadPool is intended for small tasks, and for it's threads to provide a cost-saving to the consuming code by being reused. Your scenario sounds like a reasonable case for using the ThreadPool--however you will want to carefully code around it, and ensure you run realistic load-tests to determine if it is the best approach.

The thread pool will manage the number of active threads as needed. Once a thread is done with a task it continues on the next queued task. Using the thread pool is normally a good way to handle background processing.
When running in an ASP.NET application there are a couple of things to be aware of:
ASP.NET applications can be recycled for various reasons. When this happens all queued work items are lost.
There is no simple way to signal back to the client web browser that the operation completed.
A better approach in your case might be to have a WCF service with a REST/JSON binding that is called by AJAX code on the client web page for doing the heavy work. This would give you the possibility to report process and results back to the user.

In addition to what Anders Abel has already laid out, which I agree with entirely, you should consider that ASP.NET also uses the thread pool to respond to requests, so if you have long running work like this using up a thread pool thread, it is technically stealing from the resources which ASP.NET is able to use to fulfill other requests anyway.
If you were to ask me how best to architect it I would say you dispatch the work to a WCF service using one way messaging over the MSMQ transport. That way it is fast to dispatch, resilient to failure and processing of the requests on the WCF side can be more tightly controlled because the messages will just sit on the queue waiting to be processed. So if your server can only create 10 PDFs at a time you would just set the maxConcurrentCalls for the WCF service to 10 and it will only pull a maximum of 10 messages off the queue at once. Also, if your service shuts down, when it starts up it will just begin processing again.

How to set WCF threads to schedual differently

I'm running a winservice that has 2 main objectives.
Execute/Handle exposed webmethods.
Run Inner processes that consume allot of CPU.
The problem is that when I execute many inner processes |(as tasks) that are queued to the threadpool or taskpool, the execution of the webmethods takes much more time as WCF also queues its executions to the same threadpool. This even happens when setting the inner processes task priority to lowest and setting the webmethods thread priority to heights.
I hoped that Framework 4.0 would improve this, and they have, but still it takes quite allot of time for the system to handle the WCF queued tasks if the CPU is handling other inner tasks.
Is it possible to change the Threadpool that WCF uses to a different one?
Is it possible to manually change the task queue (global task queue, local task queue).
Is it possible to manually handle 2 task queues that behave differently ?
Any help in the subject would be appropriated.
Gilad.

Keep in mind that the ThreadPool harbors two distinct types of threads: worker threads and I/O completion threads. WCF requests will be serviced by I/O threads. Tasks that you run via ThreadPool.QueueUserWorkItem will run on worker threads. So in that respect the WCF requests and the other CPU tasks are working from different queues already.
Some of your performance issues may be caused by your ThreadPool settings. From MSDN:
The thread pool maintains a minimum number of idle threads. For worker threads, the default value of this minimum is the number of processors. The GetMinThreads method obtains the minimum numbers of idle worker and I/O completion threads. When all thread pool threads have been assigned to tasks, the thread pool does not immediately begin creating new idle threads. To avoid unnecessarily allocating stack space for threads, it creates new idle threads at intervals. The interval is currently half a second, although it could change in future versions of the .NET Framework. If an application is subject to bursts of activity in which large numbers of thread pool tasks are queued, use the SetMinThreads method to increase the minimum number of idle threads. Otherwise, the built-in delay in creating new idle threads could cause a bottleneck.
I have certainly experienced the above bottleneck in the past. There is a method called SetMinThreads that will allow you to change these settings. By the way, you mention setting thread priorites; however, I am not familiar with the mechanism for changing thread priorities of the ThreadPool. Could you please elaborate? Also, I've read that setting thread priorities can be fraught with danger.
Coding Horror : Thread Priorities are Evil
By the way, how many processors/cores is your machine running?

Since you are using .NET 4.0, you could run your long running processes through the TPL. By default, the TPL uses the .NET thread pool to execute tasks but you can also provide your own TaskScheduler implementation. Take a look at the example scheduler implementations in the samples for the TPL. I have not used it personally, but the QueuedTaskScheduler seems to assign tasks to a queue and uses its own thread pool to process the tasks. You can use this to define the maximum amount of threads you want to use for your long running tasks.

When to use thread pool in C#? [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 4 years ago.
Improve this question
I have been trying to learn multi-threaded programming in C# and I am confused about when it is best to use a thread pool vs. create my own threads. One book recommends using a thread pool for small tasks only (whatever that means), but I can't seem to find any real guidelines.
What are some pros and cons of thread pools vs creating my own threads? And what are some example use cases for each?

I would suggest you use a thread pool in C# for the same reasons as any other language.
When you want to limit the number of threads running or don't want the overhead of creating and destroying them, use a thread pool.
By small tasks, the book you read means tasks with a short lifetime. If it takes ten seconds to create a thread which only runs for one second, that's one place where you should be using pools (ignore my actual figures, it's the ratio that counts).
Otherwise you spend the bulk of your time creating and destroying threads rather than simply doing the work they're intended to do.

If you have lots of logical tasks that require constant processing and you want that to be done in parallel use the pool+scheduler.
If you need to make your IO related tasks concurrently such as downloading stuff from remote servers or disk access, but need to do this say once every few minutes, then make your own threads and kill them once you're finished.
Edit: About some considerations, I use thread pools for database access, physics/simulation, AI(games), and for scripted tasks ran on virtual machines that process lots of user defined tasks.
Normally a pool consists of 2 threads per processor (so likely 4 nowadays), however you can set up the amount of threads you want, if you know how many you need.
Edit: The reason to make your own threads is because of context changes, (thats when threads need to swap in and out of the process, along with their memory). Having useless context changes, say when you aren't using your threads, just leaving them sit around as one might say, can easily half the performance of your program (say you have 3 sleeping threads and 2 active threads). Thus if those downloading threads are just waiting they're eating up tons of CPU and cooling down the cache for your real application

Here's a nice summary of the thread pool in .Net: http://blogs.msdn.com/pedram/archive/2007/08/05/dedicated-thread-or-a-threadpool-thread.aspx
The post also has some points on when you should not use the thread pool and start your own thread instead.

I highly recommend reading the this free e-book:
Threading in C# by Joseph Albahari
At least read the "Getting Started" section. The e-book provides a great introduction and includes a wealth of advanced threading information as well.
Knowing whether or not to use the thread pool is just the beginning. Next you will need to determine which method of entering the thread pool best suits your needs:
Task Parallel Library (.NET Framework
4.0)
ThreadPool.QueueUserWorkItem
Asynchronous Delegates
BackgroundWorker
This e-book explains these all and advises when to use them vs. create your own thread.

The thread pool is designed to reduce context switching among your threads. Consider a process that has several components running. Each of those components could be creating worker threads. The more threads in your process, the more time is wasted on context switching.
Now, if each of those components were queuing items to the thread pool, you would have a lot less context switching overhead.
The thread pool is designed to maximize the work being done across your CPUs (or CPU cores). That is why, by default, the thread pool spins up multiple threads per processor.
There are some situations where you would not want to use the thread pool. If you are waiting on I/O, or waiting on an event, etc then you tie up that thread pool thread and it can't be used by anyone else. Same idea applies to long running tasks, though what constitutes a long running task is subjective.
Pax Diablo makes a good point as well. Spinning up threads is not free. It takes time and they consume additional memory for their stack space. The thread pool will re-use threads to amortize this cost.
Note: you asked about using a thread pool thread to download data or perform disk I/O. You should not use a thread pool thread for this (for the reasons I outlined above). Instead use asynchronous I/O (aka the BeginXX and EndXX methods). For a FileStream that would be BeginRead and EndRead. For an HttpWebRequest that would be BeginGetResponse and EndGetResponse. They are more complicated to use, but they are the proper way to perform multi-threaded I/O.

Beware of the .NET thread pool for operations that may block for any significant, variable or unknown part of their processing, as it is prone to thread starvation. Consider using the .NET parallel extensions, which provide a good number of logical abstractions over threaded operations. They also include a new scheduler, which should be an improvement on ThreadPool. See here

One reason to use the thread pool for small tasks only is that there are a limited number of thread pool threads. If one is used for a long time then it stops that thread from being used by other code. If this happens many times then the thread pool can become used up.
Using up the thread pool can have subtle effects - some .NET timers use thread pool threads and will not fire, for example.

If you have a background task that will live for a long time, like for the entire lifetime of your application, then creating your own thread is a reasonable thing. If you have short jobs that need to be done in a thread, then use thread pooling.
In an application where you are creating many threads, the overhead of creating the threads becomes substantial. Using the thread pool creates the threads once and reuses them, thus avoiding the thread creation overhead.
In an application that I worked on, changing from creating threads to using the thread pool for the short lived threads really helpped the through put of the application.

For the highest performance with concurrently executing units, write your own thread pool, where a pool of Thread objects are created at start up and go to blocking (formerly suspended), waiting on a context to run (an object with a standard interface implemented by your code).
So many articles about Tasks vs. Threads vs. the .NET ThreadPool fail to really give you what you need to make a decision for performance. But when you compare them, Threads win out and especially a pool of Threads. They are distributed the best across CPUs and they start up faster.
What should be discussed is the fact that the main execution unit of Windows (including Windows 10) is a thread, and OS context switching overhead is usually negligible. Simply put, I have not been able to find convincing evidence of many of these articles, whether the article claims higher performance by saving context switching or better CPU usage.
Now for a bit of realism:
Most of us won’t need our application to be deterministic, and most of us do not have a hard-knocks background with threads, which for instance often comes with developing an operating system. What I wrote above is not for a beginner.
So what may be most important is to discuss is what is easy to program.
If you create your own thread pool, you’ll have a bit of writing to do as you’ll need to be concerned with tracking execution status, how to simulate suspend and resume, and how to cancel execution – including in an application-wide shut down. You might also have to be concerned with whether you want to dynamically grow your pool and also what capacity limitation your pool will have. I can write such a framework in an hour but that is because I’ve done it so many times.
Perhaps the easiest way to write an execution unit is to use a Task. The beauty of a Task is that you can create one and kick it off in-line in your code (though caution may be warranted). You can pass a cancellation token to handle when you want to cancel the Task. Also, it uses the promise approach to chaining events, and you can have it return a specific type of value. Moreover, with async and await, more options exist and your code will be more portable.
In essence, it is important to understand the pros and cons with Tasks vs. Threads vs. the .NET ThreadPool. If I need high performance, I am going to use threads, and I prefer using my own pool.
An easy way to compare is start up 512 Threads, 512 Tasks, and 512 ThreadPool threads. You’ll find a delay in the beginning with Threads (hence, why write a thread pool), but all 512 Threads will be running in a few seconds while Tasks and .NET ThreadPool threads take up to a few minutes to all start.
Below are the results of such a test (i5 quad core with 16 GB of RAM), giving each 30 seconds to run. The code executed performs simple file I/O on an SSD drive.
Test Results

Thread pools are great when you have more tasks to process than available threads.
You can add all the tasks to a thread pool and specify the maximum number of threads that can run at a certain time.
Check out this page on MSDN:
http://msdn.microsoft.com/en-us/library/3dasc8as(VS.80).aspx

Always use a thread pool if you can, work at the highest level of abstraction possible. Thread pools hide creating and destroying threads for you, this is usually a good thing!

Most of the time you can use the pool as you avoid the expensive process of creating the thread.
However in some scenarios you may want to create a thread. For example if you are not the only one using the thread pool and the thread you create is long-lived (to avoid consuming shared resources) or for example if you want to control the stacksize of the thread.

Don't forget to investigate the Background worker.
I find for a lot of situations, it gives me just what i want without the heavy lifting.
Cheers.

I usually use the Threadpool whenever I need to just do something on another thread and don't really care when it runs or ends. Something like logging or maybe even background downloading a file (though there are better ways to do that async-style). I use my own thread when I need more control. Also what I've found is using a Threadsafe queue (hack your own) to store "command objects" is nice when I have multiple commands that I need to work on in >1 thread. So you'd may split up an Xml file and put each element in a queue and then have multiple threads working on doing some processing on these elements. I wrote such a queue way back in uni (VB.net!) that I've converted to C#. I've included it below for no particular reason (this code might contain some errors).
using System.Collections.Generic;
using System.Threading;
namespace ThreadSafeQueue {
public class ThreadSafeQueue<T> {
private Queue<T> _queue;
public ThreadSafeQueue() {
_queue = new Queue<T>();
}
public void EnqueueSafe(T item) {
lock ( this ) {
_queue.Enqueue(item);
if ( _queue.Count >= 1 )
Monitor.Pulse(this);
}
}
public T DequeueSafe() {
lock ( this ) {
while ( _queue.Count <= 0 )
Monitor.Wait(this);
return this.DeEnqueueUnblock();
}
}
private T DeEnqueueUnblock() {
return _queue.Dequeue();
}
}
}

I wanted a thread pool to distribute work across cores with as little latency as possible, and that didn't have to play well with other applications. I found that the .NET thread pool performance wasn't as good as it could be. I knew I wanted one thread per core, so I wrote my own thread pool substitute class. The code is provided as an answer to another StackOverflow question over here.
As to the original question, the thread pool is useful for breaking repetitive computations up into parts that can be executed in parallel (assuming they can be executed in parallel without changing the outcome). Manual thread management is useful for tasks like UI and IO.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.