How does the Parallel class dynamically adjust the level of parallelism?

How does the Parallel class dynamically adjust the level of parallelism? - c#

What feedback does TPL use to dynamically adjust the number of worker threads?
My previous understanding was that it measures the rate of task completion to see if adding or removing threads is worth it. But then, why does this code keep increasing the number of threads, even though there is a bottleneck introduced by a semaphore?
Surely, there can be no more than 20 task completions per second, and more than 2 threads will not improve that.
var activeThreads = 0;
var semaphore = new SemaphoreSlim(2);
var res = Parallel.For(0, 1_000_000, i =>
{
Interlocked.Increment(ref activeThreads);
semaphore.Wait();
try
{
Thread.Sleep(100);
Console.WriteLine("Threads: " + activeThreads);
}
finally
{
Interlocked.Decrement(ref activeThreads);
semaphore.Release();
}
});

I believe the ParallelOptions is what you are looking for to specify the amount of parallelism.
Parallel.For(0, 1000, new ParallelOptions
{
MaxDegreeOfParallelism = 2
}, i => { Console.WriteLine(i); });
Personally, I think the TPL library will work in a lot of cases, but it isn't really smart about execution distribution (pardon my english). Whenever you have bottlenecks in the execution of your application, have a look at the pipeline pattern for example. Here is a link that describes different approaches to parallel execution very well imo: https://www.dotnetcurry.com/patterns-practices/1407/producer-consumer-pattern-dotnet-csharp

TL;DR: The thing that you are doing in your code that the TPL uses to justify creating a new thread is blocking. (Synchronizing or sleeping, or performing I/O would all count as blocking.)
A longer explanation...
When your task runs, it takes its thread hostage for 100 ms (because you Sleep(100)). While you are sleeping, that thread cannot be used to run other tasks because it would risk not being in a runnable state when the sleep time period expires. Typically we sleep rather than perform an asynchronous action because we need to keep our call stack intact. We are therefore relying on the stack to maintain our state. And the stack is a one-of-a-kind resource for the thread. (There's not actually a lot more to a thread than its stack.)
So the TPL (Thread pool, specifically) tries to keep occupancy high but the thread count low. One way it achieves this is by making sure that there are approximately just as many runnable threads in the system as there are virtual processors. Each time it needs to increase the thread count, it must create a relatively expensive stack for the thread, so it's best not to have so many. And a thread that is not runnable cannot be scheduled, so when the CPU becomes free, you need something to schedule to make use of the processing resources available. If the thread is sleeping, it cannot be scheduled to run. So instead, a thread will be added to the thread pool and the next task will be scheduled on it.
When you are writing parallel code like this (as in your parallel for loop) that can be partitioned and managed by the TPL you should be careful about putting your thread into a non-runnable state. Performing synchronous I/O, waiting for a synchronization object (e.g. semaphore, event or mutex etc.), or sleeping will put the thread into a state where nothing else can be done on the thread until the I/O completes, the sleep interval expires, or the synchronization object becomes signalled. The thread is no good to the TPL during this period.
In your case, you do several of these things: you wait on a semaphore, you sleep, and you perform I/O by writing to the console. The first thing is waiting on that semaphore. If it's not signalled, then you immediately have the situation where the thread is not runnable and the next task of your million-or-so tasks that need to be run must be scheduled on a different thread. If there isn't one, then the TPL can justify creating a new thread to get more tasks started. After-all, what if it's thread #987,321 that will actually wind up setting the semaphore to unblock task #1? The TPL doesn't know what your code does, so it can delay creating threads for a while in the spirit of efficiency, but for correctness, ultimately it will have to create more threads to start chipping away at the task list. There is a complex, implementation-specific heuristic that it applies to monitor, predict and otherwise get this efficiency guess right.
Now your specific question actually asked what feedback does it use to adjust the number of threads. Like I said, the actual implementation is complex and you should probably think of it as a black-box. But in a nutshell, if there are no runnable threads, it may create another thread to keep chipping away at the task list (or may wait a while before doing so, hoping that things will free up), and if there are too many idle threads, it will terminate the idle threads to reclaim their resources.
And to reiterate, as I said at the top, and to hopefully answer your question this time, the one thing you do that allows the TPL to justify creating a new thread is to block. ...even on that first semaphore.

Ran into an article analysing the thread injection algorithm in 2017. As of 2019-08-01, the hillclimbing.cpp file on GitHub hasn't really changed so the article should still be up to date.
Relevant details:
The .NET thread pool has two main mechanisms for injecting threads: a
starvation-avoidance mechanism that adds worker threads if it sees no
progress being made on queued items and a hill-climbing heuristic that
tries to maximize throughput while using as few threads as possible.
...
It calculates the desired number of threads based on the ‘current
throughput’, which is the ‘# of tasks completed’ (numCompletions)
during the current time-period (sampleDuration in seconds).
...
It also takes the current thread count (currentThreadCount) into
consideration.
...
The real .NET Thread Pool only increases the thread-count by one
thread every 500 milliseconds. It keeps doing this until the ‘# of
threads’ has reached the amount that the hill-climbing algorithm
suggests.
...
The [hill-climbing] algorithm only returns values that respect the limits
specified by ThreadPool.SetMinThreads(..) and
ThreadPool.SetMaxThreads(..)
...
In addition, [the hill-climbing algorithm] will only recommend
increasing the thread count if the CPU Utilization is below 95%
So it turns out the thread pool does have a feedback mechanism based on task completion rate. It also doesn't explicitly check whether its threads are blocked or running, but it does keep an eye on overall CPU utilization to detect blockages. All this also means it should be roughly aware of what the other threads and processes are doing.
On the other hand, it will always eagerly spawn at least as many threads as told by ThreadPool.SetMinThreads(), which defaults to the number of logical processors on the machine.
In conclusion, the test code in question was doing two things which make it keep piling up more threads:
there are lots of tasks queued up and sitting in the queue for ages, which indicates starvation
CPU utilization is negligible, which means that a new thread should be able to use it

Related

What is the advantage of creating a thread outside threadpool?

Okay, So I wanted to know what happens when I use TaskCreationOptions.LongRunning. By this answer, I came to know that for long running tasks, I should use this options because it creates a thread outside of threadpool.
Cool. But what advantage would I get when I create a thread outside threadpool? And when to do it and avoid it?

what advantage would I get when I create a thread outside threadpool?
The threadpool, as it name states, is a pool of threads which are allocated once and re-used throughout, in order to save the time and resources necessary to allocate a thread. The pool itself re-sizes on demand. If you queue more work than actual workers exist in the pool, it will allocate more threads in 500ms intervals, one at a time (this exists to avoid allocation of multiple threads simultaneously where existing threads may already finish executing and can serve requests). If many long running operations are performed on the thread-pool, it causes "thread starvation", meaning delegates will start getting queued and ran only once a thread frees up. That's why you'd want to avoid a large amount of threads doing lengthy work with thread-pool threads.
The Managed Thread-Pool docs also have a section on this question:
There are several scenarios in which it is appropriate to create and
manage your own threads instead of using thread pool threads:
You require a foreground thread.
You require a thread to have a particular priority.
You have tasks that cause the thread to block for long periods of time. The thread pool has a maximum number of threads, so a large
number of blocked thread pool threads might prevent tasks from
starting.
You need to place threads into a single-threaded apartment. All ThreadPool threads are in the multithreaded apartment.
You need to have a stable identity associated with the thread, or to dedicate a thread to a task.
For more, see:
Thread vs ThreadPool
When should I not use the ThreadPool in .Net?
Dedicated thread or thread-pool thread?

"Long running" can be quantified pretty well, a thread that takes more than half a second is running long. That's a mountain of processor instructions on a modern machine, you'd have to burn a fat five billion of them per second. Pretty hard to do in a constructive way unless you are calculating the value of Pi to thousands of decimals in the fraction.
Practical threads can only take that long when they are not burning core but are waiting a lot. Invariably on an I/O completion, like reading data from a disk, a network, a dbase server. And often the reason you'd start considering using a thread in the first place.
The threadpool has a "manager". It determines when a threadpool thread is allowed to start. It doesn't happen immediately when you start it in your program. The manager tries to limit the number of running threads to the number of CPU cores you have. It is much more efficient that way, context switching between too many active threads is expensive. And a good throttle, preventing your program from consuming too many resources in a burst.
But the threadpool manager has the very common problem with managers, it doesn't know enough about what is going on. Just like my manager doesn't know that I'm goofing off at Stackoverflow.com, the tp manager doesn't know that a thread is waiting for something and not actually performing useful work. Without that knowledge it cannot make good decisions. A thread that does a lot of waiting should be ignored and another one should be allowed to run in its place. Actually doing real work.
Just like you tell your manager that you go on vacation, so he can expect no work to get done, you tell the threadpool manager the same thing with LongRunning.
Do note that it isn't quite a bad as it, perhaps, sounds in this answer. Particularly .NET 4.0 hired a new manager that's a lot smarter at figuring out the optimum number of running threads. It does so with a feedback loop, collecting data to discover if active threads actually get work done. And adjusts the optimum accordingly. Only problem with this approach is the common one when you close a feedback loop, you have to make it slow so the loop cannot become unstable. In other words, it isn't particularly quick at driving up the number of active threads.
If you know ahead of time that the thread is pretty abysmal, running for many seconds with no real cpu load then always pick LongRunning. Otherwise it is a tuning job, observing the program when it is done and tinkering with it to make it more optimal.

Change thread priority from lowest to highest in .net

I am trying to speed things by having one thread write to a linked list and another thread process the linked list.
For some reason if the method that writes to the linked list I make it into a task and the method that reads from the linked list a low priority thread the program finishes as a whole much faster. In other words I experiense fastests results when doing:
Task.Factory.StartNew( AddItems );
new Thread( startProcessingItems ) { Priority = ThreadPriority.Lowest }.Start();
while(completed==false)
Thread.Sleep(0);
Maybe because the first task is doing so much extra work than the other thread that's why everything as a whole will finish faster if I set the second method a low priority.
Anyways now my question is The startProcessingItems runs with ThreadPriority = Lowest. How could I change it's priority to highest? If I create a new Task in that method will it be running with low priority? Basically the startProcessingItems ends with a list and once it has that list I will like to start executing with highest priority.

This is not a good approach. First off, LinkedList<T> is not thread safe, so writing to it and reading from it in two threads will cause race conditions.
A better approach would be to use BlockingCollection<T>. This allows you to add items (producer thread) and read items (consumer thread) without worrying about thread safety, as it's completely thread safe.
The reading thread can just call blockingCollection.GetConsumingEnumerable() in a foreach to get the elements, and the write thread just adds them. The reading thread will automatically block, so there's no need to mess with priorities.
When the writing thread is "done", you just call CompleteAdding, which will, in turn, allow the reading thread to finish automatically.

You can improve the performance of your program by changing the inherent design, rather than by changing thread/process priorities.
A big part of your problem is that you're doing a busywait:
while(completed==false)
Thread.Sleep(0);
This results in it consuming lots of CPU cycles for no productive work, and it's why lowering it's priority makes it execute quicker. If you aren't busywaiting then that won't be an issue any more.
As Reed has suggested, BlockingCollection is tailor made for this situation. You can have a producer thread adding items using Add, and a consumer thread using Take knowing that the method will simply block if there are no more items to be removed.
You can also store the Task you create and use Task.Result or Task.Wait to have the main thread wait for the other task to finish (without wasting CPU cycles). (If you are using threads directly you can use Join.)

In addition to what Reed and Servy have said:
Thread priority is relative to the processes priority.
The Windows scheduler takes into account all other threads when it schedules time for threads. Threads with higher priority take time away from other threads which could artificially slow down the rest of the system. It's not like the system has no reason not to give you thread more priority. The priority would only have an effect if something else took the CPU away from it--which happens for a reason. If nothing took the CPU away from the thread, it won't magically run faster with a priority of Highest.
Setting thread priority to Highest is almost always the wrong thing to do.
Likely the overhead of synchronization between the two threads will kill any performance gain you think you might get.
Also, Thread.Sleep(0) only relinquishes time to threads of equal priority and is ready to run--which could lead to thread starvation. http://msdn.microsoft.com/en-us/library/d00bd51t(v=vs.80).aspx

When using Task what happens if the ThreadPool is full/busy?

When I am using the .Net 4 Task class which uses the ThreadPool, what happens if all Threads are busy?
Does the TaskScheduler create a new thread and extend the ThreadPool maximum number of threads or does it sit and wait until a thread is available?

The maximum number of threads in the ThreadPool is set to around 1000 threads on a .NET4.0 32-bit system. It's less for older versions of .NET. If you have 1000 threads going, say they're blocking for some reason, when you queue the 1001st task, it won't ever execute.
You will never hit the max thread count in a 32 bit process. Keep in mind that each thread takes at least 1MB of memory (that's the size of the user-mode stack), plus any other overhead. You already lose a lot of memory from the CLR and native DLLs loaded, so you'll hit an OutOfMemoryException before you use that many threads.
You can change the number of threads the ThreadPool can use, by calling the ThreadPool.SetMaxThreads method. However, if you're expecting to use that many threads, you have a far larger problem with your code. I do NOT recommend you mess around with ThreadPool configurations like that. You'll most likely just get worse performance.
Keep in mind with Task and ThreadPool.QueueUserWorkItem, the threads are re-used when they're done. If you create a task or queue a threadpool thread, it may or may not create a new thread to execute your code. If there are available threads already in the pool, it will re-use one of those instead of creating (an expensive) new thread. Only if the methods you're executing in the tasks never return, should you worry about running out of threads, but like I said, that's an entirely different problem with your code.

By default, the MaxThreads of the ThreadPool is very high. Usually you'll never get there, your app will crash first.
So when all threads are busy the new tasks are queued and slowly, at most 1 per 500 ms, the TP will allocate new threads.

It won't increase MaxThreads. When there are more tasks than available worker threads, some tasks will be queued and wait until the thread pool provides an available thread. It does some pretty advanced stuff to scale with a large number of cores (work-stealing, thread injection, etc).

How to "Free" a thread

I have 20 threads running at a time in my program, (create 20 wait for them to finish, start another 20), after a while my program slows way down. Do I need to free the tasks or do anything special? If so how, if not is there a common reason why a program like this would slow down?

You might want to consider using the ThreadPool, either directly, or via the Task Parallel Library (my preferred option). This is likely a better, simpler, and cleaner design than spawning your own threads and blocking on them repeatedly.
That being said, if your program is getting progressively slower, this is something where a profiler can help dramatically. Without seeing code, it's very difficult to diagnose. For example, depending on the work that you're doing, you may be causing the GC to become less efficient over time, which could cause the % of time spent in GC to climb as the program continues its execution. Profiling should give you a good indication of what is taking time as your program executes.

Reed's answer is probably the best way to deal with your issue; however, if you do want to manage the threads yourself, and not use the ThreadPool or TPL, I'd have to ask why you would let 20 threads die and create 20 more. Creating threads is an expensive process, which is why the thread pool exists. If you continually have the same number of parallel tasks, or a maximum number, they should be created once and reused. You can use locking constructs such as semaphore and mutex and have the threads wait when they are done, and just give them new data to work with and release them to proceed again. Waiting on a lock is a very inexpensive operation -- orders of magnitude cheaper than recreating a thread.
So for example, a thread might look like this (pseudocode):
while (program_not_ending)
{
wait_for_new_data_release; // Wait on thread's personal mutex
process_new_data;
resignal_my_mutex; // Cause the beginning of loop to wait again
release_semaphore_saying_I_am_done; // Increment parent semaphore count
}
The parent would then wait for its semaphore to fill up that 20 threads completed, reset the data buckets, and clear all of the thread mutexes.

Can you have too many Delegate.BeginInvoke calls at once?

I am cleaning up some old code converting it to work asynchronously.
psDelegate.GetStops decStops = psLoadRetrieve.GetLoadStopsByLoadID;
var arStops = decStops.BeginInvoke(loadID, null, null);
WaitHandle.WaitAll(new WaitHandle[] { arStops.AsyncWaitHandle });
var stops = decStops.EndInvoke(arStops);
Above is a single example of what I am doing for asynchronous work. My plan is to have close to 20 different delegates running. All will call BeginInvoke and wait until they are all complete before calling EndInvoke.
My question is will having so many delegates running cause problems? I understand that BeginInvoke uses the ThreadPool to do work and that has a limit of 25 threads. 20 is under that limit but it is very likely that other parts of the system could be using any number of threads from the ThreadPool as well.
Thanks!

No, the ThreadPool manager was designed to deal with this situation. It won't let all thread pool threads run at the same time. It starts off allowing as many threads to run as you have CPU cores. As soon as one completes, it allows another one to run.
Every half second, it steps in if the active threads are not completing. It assumes they are stuck and allows another one to run. On a 2 core CPU, you'd now have 3 threads running.
Getting to the maximum, 500 threads on a 2 core CPU, would take quite a while. You would have to have threads that don't complete for over 4 minutes. If your threads behave that way then you don't want to use threadpool threads.

The current default MaxThreads is 250 per processor, so effectively there should be no limit unless your application is just spewing out calls to BeginInvoke. See http://msdn.microsoft.com/en-us/library/system.threading.threadpool.aspx. Also, the thread pool will try to reuse existing threads before creating new ones in order to reduce the overhead of creating new threads. If you invokes are all fast you will probably not see the thread pool create a lot of threads.
For long running tasks or blocking tasks it is usually better to avoid the thread pool and managed the threads yourself.
However, trying to schedule that many threads will probably not yield the best results on most current machines.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.