As I understand it, the Parallel API uses the thread pool internally and queues up items for parallel processing. However, when I examined the execution of one such parallel loop with the SOS debugger, I found that if I have 10 tasks lined up, they might not all run in parallel: the CLR decides how many threads to dispatch for the given tasks, and it may be 4, 5, or 6 (the number varies between executions).
However, if my total number of tasks is not very high, say 10, and I want all of them to run in parallel because all of them are long-running, then it seems preferable to put them on traditional threads, which ensures one thread per task and that they all run in parallel.
If the number of tasks is sizeable, say 100, then Parallel or the thread pool is the practical solution, as we do not want to spawn 100 individual threads in one process.
Please share your views. I understand that the Parallel API makes parallel programming very easy to implement, but my aim here is different.
By default, the .NET thread pool initializes a number of worker threads that corresponds to the number of logical cores on your machine. It subsequently employs a hill-climbing heuristic that adjusts this number based on the current task workload, firing up new worker threads when a task takes too long to complete.
You are correct in wishing for your long-running tasks to be executed simultaneously through thread oversubscription (i.e. running multiple threads per logical core). In fact, the Task Parallel Library infrastructure (TPL) provides specifically for this scenario through the LongRunning option, which (under the current implementation) spawns a new dedicated thread for each task marked thusly.
Task.Factory.StartNew(myLongRunningAction, TaskCreationOptions.LongRunning);
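As a minimal sketch of the claim above (the 5-second Sleep stands in for real long-running work), each LongRunning task lands on its own dedicated thread, which IsThreadPoolThread confirms:

using System;
using System.Threading;
using System.Threading.Tasks;

class LongRunningDemo
{
    static void Main()
    {
        var tasks = new Task[10];
        for (int i = 0; i < tasks.Length; i++)
        {
            int id = i; // capture the loop variable for the closure
            tasks[id] = Task.Factory.StartNew(() =>
            {
                // IsThreadPoolThread should print False: LongRunning tasks
                // get dedicated threads under the current implementation.
                Console.WriteLine("Task " + id +
                    " on thread " + Thread.CurrentThread.ManagedThreadId +
                    ", pool thread: " + Thread.CurrentThread.IsThreadPoolThread);
                Thread.Sleep(5000); // stand-in for the long-running work
            }, TaskCreationOptions.LongRunning);
        }
        Task.WaitAll(tasks); // all 10 run simultaneously, one thread each
    }
}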
What feedback does TPL use to dynamically adjust the number of worker threads?
My previous understanding was that it measures the rate of task completion to see if adding or removing threads is worth it. But then, why does this code keep increasing the number of threads, even though there is a bottleneck introduced by a semaphore?
Surely, there can be no more than 20 task completions per second, and more than 2 threads will not improve that.
var activeThreads = 0;
var semaphore = new SemaphoreSlim(2);
var res = Parallel.For(0, 1_000_000, i =>
{
    Interlocked.Increment(ref activeThreads);
    semaphore.Wait(); // blocks: at most 2 iterations hold the semaphore at once
    try
    {
        Thread.Sleep(100); // simulated work while holding the semaphore
        Console.WriteLine("Threads: " + activeThreads);
    }
    finally
    {
        Interlocked.Decrement(ref activeThreads);
        semaphore.Release();
    }
});
I believe ParallelOptions is what you are looking for to specify the degree of parallelism.
Parallel.For(0, 1000, new ParallelOptions
{
    MaxDegreeOfParallelism = 2
}, i => { Console.WriteLine(i); });
Personally, I think the TPL will work in a lot of cases, but it isn't really smart about distributing execution. Whenever you have bottlenecks in the execution of your application, have a look at the pipeline pattern, for example; a sketch follows below. Here is a link that, in my opinion, describes different approaches to parallel execution very well: https://www.dotnetcurry.com/patterns-practices/1407/producer-consumer-pattern-dotnet-csharp
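As a minimal sketch of that producer-consumer pipeline idea, assuming a BlockingCollection<int> as the hand-off between the two stages (the item counts and capacity here are arbitrary):

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class PipelineDemo
{
    static void Main()
    {
        // Bounded queue: the producer blocks once 10 items are in flight,
        // keeping the two stages in step with each other.
        using (var queue = new BlockingCollection<int>(boundedCapacity: 10))
        {
            var producer = Task.Run(() =>
            {
                for (int i = 0; i < 100; i++)
                    queue.Add(i);
                queue.CompleteAdding(); // signal: no more items coming
            });

            var consumer = Task.Run(() =>
            {
                // Drains items until CompleteAdding is called and the queue empties.
                foreach (var item in queue.GetConsumingEnumerable())
                    Console.WriteLine("Processed " + item);
            });

            Task.WaitAll(producer, consumer);
        }
    }
}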
TL;DR: The thing that you are doing in your code that the TPL uses to justify creating a new thread is blocking. (Synchronizing or sleeping, or performing I/O would all count as blocking.)
A longer explanation...
When your task runs, it takes its thread hostage for 100 ms (because you Sleep(100)). While you are sleeping, that thread cannot be used to run other tasks because it would risk not being in a runnable state when the sleep time period expires. Typically we sleep rather than perform an asynchronous action because we need to keep our call stack intact. We are therefore relying on the stack to maintain our state. And the stack is a one-of-a-kind resource for the thread. (There's not actually a lot more to a thread than its stack.)
So the TPL (Thread pool, specifically) tries to keep occupancy high but the thread count low. One way it achieves this is by making sure that there are approximately just as many runnable threads in the system as there are virtual processors. Each time it needs to increase the thread count, it must create a relatively expensive stack for the thread, so it's best not to have so many. And a thread that is not runnable cannot be scheduled, so when the CPU becomes free, you need something to schedule to make use of the processing resources available. If the thread is sleeping, it cannot be scheduled to run. So instead, a thread will be added to the thread pool and the next task will be scheduled on it.
When you are writing parallel code like this (as in your parallel for loop) that can be partitioned and managed by the TPL you should be careful about putting your thread into a non-runnable state. Performing synchronous I/O, waiting for a synchronization object (e.g. semaphore, event or mutex etc.), or sleeping will put the thread into a state where nothing else can be done on the thread until the I/O completes, the sleep interval expires, or the synchronization object becomes signalled. The thread is no good to the TPL during this period.
In your case, you do several of these things: you wait on a semaphore, you sleep, and you perform I/O by writing to the console. The first is waiting on that semaphore. If it's not signalled, you immediately have the situation where the thread is not runnable, and the next of your million-or-so tasks must be scheduled on a different thread. If there isn't one, the TPL can justify creating a new thread to get more tasks started. After all, what if it's thread #987,321 that will actually wind up releasing the semaphore to unblock task #1? The TPL doesn't know what your code does, so it can delay creating threads for a while in the spirit of efficiency, but for correctness it ultimately has to create more threads to start chipping away at the task list. There is a complex, implementation-specific heuristic that it applies to monitor, predict and otherwise get this efficiency guess right.
Now, your specific question asked what feedback it uses to adjust the number of threads. As I said, the actual implementation is complex and you should probably think of it as a black box. But in a nutshell: if there are no runnable threads, it may create another thread to keep chipping away at the task list (or may wait a while before doing so, hoping that things will free up), and if there are too many idle threads, it will terminate them to reclaim their resources.
And to reiterate, as I said at the top, and to hopefully answer your question this time, the one thing you do that allows the TPL to justify creating a new thread is to block. ...even on that first semaphore.
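For contrast, here is a sketch of a non-blocking variant of the questioner's loop (assuming C# with async/await available): with asynchronous waits, the pool's threads are handed back while waiting, so the heuristic sees no blocked threads to compensate for.

using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class AsyncVariant
{
    static async Task Main()
    {
        var semaphore = new SemaphoreSlim(2);
        var tasks = Enumerable.Range(0, 1000).Select(async i =>
        {
            await semaphore.WaitAsync(); // the thread returns to the pool here
            try
            {
                await Task.Delay(100); // non-blocking stand-in for Thread.Sleep
                Console.WriteLine("Item " + i + " done");
            }
            finally
            {
                semaphore.Release();
            }
        });
        await Task.WhenAll(tasks); // throughput is still ~20/s, but with ~2 busy threads
    }
}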
I ran into an article from 2017 analysing the thread injection algorithm. As of 2019-08-01, the hillclimbing.cpp file on GitHub hasn't really changed, so the article should still be up to date.
Relevant details:
The .NET thread pool has two main mechanisms for injecting threads: a starvation-avoidance mechanism that adds worker threads if it sees no progress being made on queued items, and a hill-climbing heuristic that tries to maximize throughput while using as few threads as possible.
...
It calculates the desired number of threads based on the ‘current throughput’, which is the ‘# of tasks completed’ (numCompletions) during the current time-period (sampleDuration in seconds).
...
It also takes the current thread count (currentThreadCount) into consideration.
...
The real .NET Thread Pool only increases the thread-count by one thread every 500 milliseconds. It keeps doing this until the ‘# of threads’ has reached the amount that the hill-climbing algorithm suggests.
...
The [hill-climbing] algorithm only returns values that respect the limits specified by ThreadPool.SetMinThreads(..) and ThreadPool.SetMaxThreads(..)
...
In addition, [the hill-climbing algorithm] will only recommend increasing the thread count if the CPU Utilization is below 95%
So it turns out the thread pool does have a feedback mechanism based on task completion rate. It also doesn't explicitly check whether its threads are blocked or running, but it does keep an eye on overall CPU utilization to detect blockages. All this also means it should be roughly aware of what the other threads and processes are doing.
On the other hand, it will always eagerly spawn at least as many threads as told by ThreadPool.SetMinThreads(), which defaults to the number of logical processors on the machine.
In conclusion, the test code in question was doing two things which make it keep piling up more threads:
there are lots of tasks queued up and sitting in the queue for ages, which indicates starvation
CPU utilization is negligible, which means that a new thread should be able to use it
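Incidentally, the SetMinThreads knob mentioned above can be inspected and raised directly. A minimal sketch (doubling the worker minimum is an arbitrary choice for illustration):

using System;
using System.Threading;

class MinThreadsDemo
{
    static void Main()
    {
        int worker, io;
        ThreadPool.GetMinThreads(out worker, out io);
        Console.WriteLine("Defaults: " + worker + " worker, " + io + " I/O threads");

        // Raise the minimum so bursty workloads skip the 500 ms injection delay.
        // SetMinThreads returns false if the requested values are out of range.
        if (!ThreadPool.SetMinThreads(worker * 2, io))
            Console.WriteLine("Requested minimums were rejected.");
    }
}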
I'm reading Essential C# 5.0, which says:
The thread pool also assumes that all the work will be relatively short-running (that is, consuming milliseconds or seconds of processor time, not hours or days). By making this assumption, it can ensure that each processor is working full out on a task, and not inefficiently time slicing multiple tasks. The thread pool attempts to prevent excessive time slicing by ensuring that thread creation is "throttled" so that no one processor is "oversubscribed" with too many threads.
I've always thought one of the benefits of multithreading was time slicing.
If you have more than one processor, then you can run threads concurrently and achieve true multithreading. But unless that's the case, you'd have to resort to time slicing for applications with multiple threads, right?
So, if the ThreadPool in C# doesn't time slice, then,
a. Does it mean the ThreadPool is only used to get around the overhead in creating new threads?
b. Does it mean the ThreadPool can't run multiple threads simultaneously unless the processor has multiple cores, where each core can run a single process?
The .NET thread pool will create multiple threads per core, but has heuristics to keep the number of threads as low as possible while performing the maximum amount of work.
This means if your code is CPU-bound, you may end up with a single thread per core. If your code is I/O-bound and blocks, or queues up a huge amount of work items, you may end up with many threads per core.
It's not just thread creation that is expensive: context switching between hundreds of threads takes up a lot of time that you'd rather spend running your own code. More threads is almost never better.
The quote you mention refers to the rate at which the ThreadPool creates new threads when all of the threads it has already created are allocated to tasks. The rate of new thread creation is throttled (and new tasks are queued) to avoid creating many threads (and swamping the CPU) when a large burst of tasks is put on the ThreadPool.
The current algorithm does indeed create many threads per CPU core, but it creates them relatively slowly, in the hope that the current backlog of tasks will be quickly satisfied by the threads that it has already created, and adding threads will not be needed.
Q. Does it mean the ThreadPool is only used to get around the overhead in creating new threads?
A. No. Thread creation is usually not the problem. The goal is to make your CPU work at 100% while using as few concurrently executing threads as possible. A low number of threads avoids excessive context switching and improves overall execution time.
Q. Does it mean the ThreadPool can't run multiple threads simultaneously unless the processor has multiple cores, where each core can run a single process?
A. It does run multiple threads simultaneously. Ideally it runs exactly one CPU-intensive task per core, in which case your CPU works at 100%.
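A tiny experiment (a sketch) to illustrate that last point: saturate the pool with purely CPU-bound work and you should see roughly one busy worker per core, with no oversubscription.

using System;
using System.Threading;
using System.Threading.Tasks;

class CpuBoundDemo
{
    static void Main()
    {
        Console.WriteLine("Cores: " + Environment.ProcessorCount);
        Parallel.For(0, Environment.ProcessorCount * 4, i =>
        {
            // Busy-spin for ~1 second of CPU work; nothing blocks, so the
            // pool has no reason to inject extra threads per core.
            var end = DateTime.UtcNow.AddSeconds(1);
            while (DateTime.UtcNow < end) { }
            Console.WriteLine("Item " + i + " on thread " +
                Thread.CurrentThread.ManagedThreadId);
        });
    }
}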
After working with this for a while, I noticed that even if you spawn 1000 tasks, they don't all start immediately. So, basically, even if I start 1000 tasks, 100 of them are running and 900 are waiting to run.
So my question is: how do they begin to start?
How does .NET determine when to start running a task and when to leave it in the WaitingToRun state?
What methodology can I follow to start them immediately?
I want to have a certain number of tasks/threads running at all times.
If I use threads instead of tasks, will they start running immediately, or will .NET start them as it pleases, like tasks?
The question may not be very clear, so please ask me to clarify.
Basically, I am spawning 1000 tasks (keeping this number spawned: when one task completes, I start another), but only 125 of them are Running and 875 are WaitingToRun. :)
This is how I start a task:
Task.Factory.StartNew(() =>
{
    startCheckingProxies();
});
(C#, WPF, .NET 4.5)
If you are talking about Task objects, they run on top of the thread pool, so they will not all start immediately, each on a separate thread. Instead, a limited number of tasks will initially be started on threads from the pool, and those threads will then be reused to run the next tasks, and so on.
Of course, this is just a high-level description; the logic behind it is more complex and implements a lot of optimizations.
You can also start tasks with the overload of StartNew which lets you tweak options and scheduler settings. Note, however, that running on a large number of threads will likely result in worse performance. Thread creation and context switching have significant costs, and running thousands of threads will, IMO, backfire.
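For illustration, here is a sketch of that fuller StartNew overload, reusing the question's startCheckingProxies (its body here is just a placeholder); LongRunning asks the default scheduler for a dedicated thread, sidestepping the pool's slow thread injection:

using System;
using System.Threading;
using System.Threading.Tasks;

class ProxyCheckDemo
{
    static void startCheckingProxies()
    {
        // Placeholder for the question's method.
    }

    static void Main()
    {
        var task = Task.Factory.StartNew(
            () => startCheckingProxies(),
            CancellationToken.None,
            TaskCreationOptions.LongRunning, // hint: give this task its own thread
            TaskScheduler.Default);
        task.Wait();
    }
}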
Tasks ultimately run on threads under the hood.
There is a limit to how much benefit you can get by spawning new threads. Each thread has some overhead, so at some point, the overhead is going to exceed the benefit of spawning a new thread. If you leave the spawning of those tasks to the Framework, it is going to decide for itself how many threads it's going to run at once, and it's going to make that decision based on how much productivity it thinks it can get from those threads.
I'm pretty sure that optimal number is not going to be a thousand; I've written Windows Services where the optimal number of threads to run at the same time is the number of cores in the machine (in my case, it was 4).
Is there any way to limit the number of processors that the ThreadPool object will use? According to the docs, "You cannot set the number of worker threads or the number of I/O completion threads to a number smaller than the number of processors in the computer."
So how can I limit my program to not consume all the processors?
After some experiments, I think I have just the thing. I've noticed that the ThreadPool considers the number of processors in the system as the number of processors available to the current process. This can work to your advantage.
I have 4 cores in my CPU. Trying to call SetMaxThreads with 2:
ThreadPool.SetMaxThreads(2, 2);
fails since I have 4 cores and so the numbers remain at their initial values (1023 and 1000 for my system).
However, like I said initially, the ThreadPool only considers the number of processors available to the process, which I can manage using Process.ProcessorAffinity. Doing this:
Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(3);
limits the available processors to the first two cores (since 3 = 11 in binary). Calling SetMaxThreads again:
ThreadPool.SetMaxThreads(2, 2);
should work like a charm (at least it did for me). Just make sure to use the affinity setting right at program start-up!
Of course, I would not encourage this hack, since your process will then be stuck with a limited number of cores for its entire execution.
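Putting the two steps together, a sketch of the complete hack (Windows-only, and the affinity mask must be set before the pool sizes itself):

using System;
using System.Diagnostics;
using System.Threading;

class AffinityHack
{
    static void Main()
    {
        // Restrict the process to cores 0 and 1 first (mask 3 = binary 11)...
        Process.GetCurrentProcess().ProcessorAffinity = new IntPtr(3);

        // ...then the cap is accepted, because the pool now sees only 2 processors.
        bool ok = ThreadPool.SetMaxThreads(2, 2);
        Console.WriteLine("SetMaxThreads succeeded: " + ok);
    }
}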
The thread pool was specifically designed to remove all the headache of manually managing threads. The OS task scheduler has been refined for many years and does a very good job of scheduling tasks for execution. The scheduler weighs multiple factors, such as thread priority, memory location and proximity to the processor, etc. Unless you have a deep understanding of these processes, you are better off not setting thread affinity.
If I may quote the docs:
Thread affinity forces a thread to run on a specific subset of processors. Setting thread affinity should generally be avoided, because it can interfere with the scheduler's ability to schedule threads effectively across processors. This can decrease the performance gains produced by parallel processing. An appropriate use of thread affinity is testing each processor.
It is possible to set a thread's affinity, though (relevant post), for which you will need to create your own thread object (not one from the thread pool). To my limited knowledge, you cannot set the affinity of a thread from the pool.
Since SetMaxThreads doesn't allow you to set the number of threads lower than the number of processors, your next best bet is to write your own TaskScheduler in the TPL that limits concurrency by queuing the tasks. Then, instead of adding items to the thread pool, you can create tasks with that scheduler; a condensed sketch follows below.
See
How to: Create a Task Scheduler That Limits the Degree of Concurrency
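The linked article contains a complete implementation; below is a condensed sketch of the same idea (not the article's exact code): queue the tasks, and never let more than a fixed number run at once.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public class LimitedConcurrencyTaskScheduler : TaskScheduler
{
    private readonly LinkedList<Task> _tasks = new LinkedList<Task>(); // pending tasks
    private readonly int _maxDegreeOfParallelism;
    private int _workersRunning; // how many queue-draining workers are active

    public LimitedConcurrencyTaskScheduler(int maxDegreeOfParallelism)
    {
        if (maxDegreeOfParallelism < 1)
            throw new ArgumentOutOfRangeException("maxDegreeOfParallelism");
        _maxDegreeOfParallelism = maxDegreeOfParallelism;
    }

    protected override void QueueTask(Task task)
    {
        lock (_tasks)
        {
            _tasks.AddLast(task);
            // Spin up another worker only while under the concurrency cap.
            if (_workersRunning < _maxDegreeOfParallelism)
            {
                ++_workersRunning;
                ThreadPool.UnsafeQueueUserWorkItem(_ => DrainQueue(), null);
            }
        }
    }

    private void DrainQueue()
    {
        while (true)
        {
            Task item;
            lock (_tasks)
            {
                if (_tasks.Count == 0) { --_workersRunning; return; }
                item = _tasks.First.Value;
                _tasks.RemoveFirst();
            }
            TryExecuteTask(item); // run the task on this worker
        }
    }

    // Inline execution is simply disallowed in this sketch.
    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
    {
        return false;
    }

    protected override IEnumerable<Task> GetScheduledTasks()
    {
        lock (_tasks) return _tasks.ToArray();
    }

    public override int MaximumConcurrencyLevel
    {
        get { return _maxDegreeOfParallelism; }
    }
}

Usage is then, for example: new TaskFactory(new LimitedConcurrencyTaskScheduler(2)).StartNew(work);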
You don't typically limit the thread pool; you throttle the number of threads your program spawns at a time. The Task Parallel Library lets you set MaxDegreeOfParallelism, or you can use a Semaphore.
I'm running a winservice that has 2 main objectives.
Execute/Handle exposed webmethods.
Run inner processes that consume a lot of CPU.
The problem is that when I execute many inner processes (as tasks) queued to the thread pool or task pool, the execution of the webmethods takes much longer, as WCF also queues its executions on the same thread pool. This happens even when setting the inner processes' task priority to lowest and the webmethods' thread priority to highest.
I hoped that Framework 4.0 would improve this, and it has, but it still takes quite a lot of time for the system to handle the WCF-queued tasks while the CPU is handling other inner tasks.
Is it possible to change the Threadpool that WCF uses to a different one?
Is it possible to manually change the task queues (the global task queue, the local task queues)?
Is it possible to manually handle 2 task queues that behave differently ?
Any help on the subject would be appreciated.
Gilad.
Keep in mind that the ThreadPool harbors two distinct types of threads: worker threads and I/O completion threads. WCF requests will be serviced by I/O threads. Tasks that you run via ThreadPool.QueueUserWorkItem will run on worker threads. So in that respect the WCF requests and the other CPU tasks are working from different queues already.
Some of your performance issues may be caused by your ThreadPool settings. From MSDN:
The thread pool maintains a minimum number of idle threads. For worker threads, the default value of this minimum is the number of processors. The GetMinThreads method obtains the minimum numbers of idle worker and I/O completion threads. When all thread pool threads have been assigned to tasks, the thread pool does not immediately begin creating new idle threads. To avoid unnecessarily allocating stack space for threads, it creates new idle threads at intervals. The interval is currently half a second, although it could change in future versions of the .NET Framework. If an application is subject to bursts of activity in which large numbers of thread pool tasks are queued, use the SetMinThreads method to increase the minimum number of idle threads. Otherwise, the built-in delay in creating new idle threads could cause a bottleneck.
I have certainly experienced the above bottleneck in the past. There is a method called SetMinThreads that will allow you to change these settings. By the way, you mention setting thread priorities; however, I am not familiar with any mechanism for changing the thread priorities of the ThreadPool. Could you please elaborate? Also, I've read that setting thread priorities can be fraught with danger.
Coding Horror : Thread Priorities are Evil
By the way, how many processors/cores is your machine running?
Since you are using .NET 4.0, you could run your long running processes through the TPL. By default, the TPL uses the .NET thread pool to execute tasks but you can also provide your own TaskScheduler implementation. Take a look at the example scheduler implementations in the samples for the TPL. I have not used it personally, but the QueuedTaskScheduler seems to assign tasks to a queue and uses its own thread pool to process the tasks. You can use this to define the maximum amount of threads you want to use for your long running tasks.
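As a hypothetical wiring of that suggestion (a sketch, reusing the LimitedConcurrencyTaskScheduler sketched earlier on this page rather than the QueuedTaskScheduler from the samples): cap the CPU-heavy inner work at two threads so the pool's I/O threads servicing WCF stay responsive.

using System;
using System.Threading.Tasks;

class InnerWorkHost
{
    // All long-running inner work funnels through a scheduler capped at 2
    // concurrent threads; work beyond the cap queues instead of flooding the pool.
    private static readonly TaskFactory LongRunningFactory =
        new TaskFactory(new LimitedConcurrencyTaskScheduler(2));

    public static Task RunInnerProcess(Action work)
    {
        return LongRunningFactory.StartNew(work);
    }
}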