How is Task.Run limited by CPU cores?

Why is it that the following program will only run a limited number of blocked tasks? The limiting number seems to be the number of cores on the machine.
When I wrote this, I initially expected to see the following:
Job complete output for Jobs 1 - 24
A 2 second gap
Output for Jobs 25 - 48
However, the output was:
Job complete output for Jobs 1 - 4
Then jobs completing at random intervals every few hundred milliseconds.
When running on a server with 32 cores, the program ran as I had expected.
class Program
{
    private static object _lock = new object();

    static void Main(string[] args)
    {
        int completeJobs = 1;
        var limiter = new MyThreadLimiter();
        for (int iii = 1; iii < 100000000; iii++)
        {
            var jobId = iii;
            limiter.Schedule()
                .ContinueWith(t =>
                {
                    lock (_lock)
                    {
                        completeJobs++;
                        Console.WriteLine("Job: " + completeJobs + " scheduled");
                    }
                });
        }
        Console.ReadLine();
    }
}
class MyThreadLimiter
{
    readonly SemaphoreSlim _semaphore = new SemaphoreSlim(24);

    public async Task Schedule()
    {
        await _semaphore.WaitAsync();
        Task.Run(() => Thread.Sleep(2000))
            .ContinueWith(t => _semaphore.Release());
    }
}
However, replacing the Thread.Sleep with Task.Delay gives my expected results:
public async Task Schedule()
{
    await _semaphore.WaitAsync();
    Task.Delay(2000)
        .ContinueWith(t => _semaphore.Release());
}
And using a dedicated Thread also gives my expected results:
public async Task Schedule()
{
    await _semaphore.WaitAsync();
    var thread = new Thread(() =>
    {
        Thread.Sleep(2000);
        _semaphore.Release();
    });
    thread.Start();
}
How does Task.Run() work? Is it limited to the number of cores?

Task.Run schedules the work to run on the thread pool. The thread pool is given wide latitude to schedule the work as best it can in order to maximize throughput. It will create additional threads when it thinks they will be helpful, and remove threads from the pool when it doesn't think there will be enough work for them.
Creating more threads than your processor can run at the same time isn't productive when you have CPU-bound work. Adding more threads just results in dramatically more context switches, increasing overhead and reducing throughput.
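A minimal sketch of that behavior (assumptions: a plain console app; the 4-core figure and thread IDs are illustrative, not from the question's machine). Blocking pool threads with Thread.Sleep forces the pool to inject extra threads only gradually, which matches the staggered completions the question describes:
using System;
using System.Threading;
using System.Threading.Tasks;

class ThreadPoolRampUp
{
    static void Main()
    {
        for (int i = 1; i <= 24; i++)
        {
            int jobId = i;
            Task.Run(() =>
            {
                Console.WriteLine($"Job {jobId} started on thread {Thread.CurrentThread.ManagedThreadId}");
                Thread.Sleep(2000); // blocks a pool thread, as in the question
            });
        }
        // On a 4-core machine roughly the first 4 jobs start immediately;
        // the rest trickle in as the pool slowly injects more threads.
        Console.ReadLine();
    }
}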

Yes, for compute-bound operations Task.Run() internally uses the CLR's thread pool, which throttles the number of new threads to avoid CPU over-subscription. Initially it runs a number of threads equal to the number of CPU cores concurrently. It then continually optimises the thread count using a hill-climbing algorithm, based on factors like the number of requests the thread pool receives and overall machine resources, to decide whether to create more threads or fewer.
In fact, this is one of the main benefits of using pooled threads over raw threads (e.g. new Thread(() => {}).Start()): the pool not only recycles threads but also tunes performance internally for you. As mentioned in the other answer, it's generally a bad idea to block pooled threads because it "misleads" the thread pool's optimisation; similarly, using many pooled threads for very long-running computation can lead the pool to create more threads, increasing context-switch overhead, and later having to destroy the extra threads.
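As a hedged illustration of that advice: for work that will occupy a thread for a long time, TaskCreationOptions.LongRunning hints the default scheduler, which typically then uses a dedicated thread instead of tying up a pool thread. A sketch, with the 10-second sleep standing in for real long-running work:
// Long blocking work: hint the scheduler so a pool thread isn't consumed.
var longJob = Task.Factory.StartNew(
    () => Thread.Sleep(10000),
    TaskCreationOptions.LongRunning);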

Task.Run() runs on the CLR thread pool.
There is a concept called 'oversubscription', which means there are more active threads than CPU cores, so they must be time-sliced. As the number of threads that must be scheduled on the CPU cores increases, context switching rises and, as a result, performance suffers. The CLR, which manages the thread pool, avoids oversubscription by queuing and throttling thread startup, always trying to balance the workload.
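A small sketch (assuming any .NET console app) to see the pool's starting point, which defaults to the core count:
Console.WriteLine("Processor count: " + Environment.ProcessorCount);
ThreadPool.GetMinThreads(out int minWorker, out int minIo);
Console.WriteLine("Min worker threads (defaults to the core count): " + minWorker);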

Related

How to scale an application with 50000 Simultaneous Tasks

I am working on a project which needs to be able to run (for example) 50,000 tasks simultaneously. Each task will run at some frequency (say 5 minutes) and will be either a URL ping or an HTTP GET request. My initial plan was to create a thread for each task. I ran a basic test to see if this was possible given available system resources. I ran the following code as a console app:
public class Program
{
    public static void Test1()
    {
        Thread.Sleep(1000000);
    }

    public static void Main(string[] args)
    {
        for (int i = 0; i < 50000; i++)
        {
            Thread t = new Thread(new ThreadStart(Test1));
            t.Start();
            Console.WriteLine(i);
        }
    }
}
Unfortunately, though it started very fast, performance degraded greatly at the 2,000-thread mark. By 5,000, I could count faster than the program could create threads. This makes getting to 50,000 seem infeasible. Am I on the right track, or should I try something else? Thanks
Many people have the idea that you need to spawn n threads if you want to handle n tasks in parallel. But most of the time a computer spends waiting, it is waiting on I/O: network traffic, disk access, memory transfers for GPU compute, a hardware device completing an operation, and so on.
Given this insight, we can see that a viable solution to handling as many tasks in parallel as possible for a given hardware platform is to pipeline work: place work in a queue and process it using as many threads as possible. Usually, this means 1-2 threads per virtual processor.
In C# we can accomplish this with the Task Parallel Library (TPL):
using System;
using System.Linq;
using System.Threading.Tasks;

class Program
{
    static Task RunAsync(int x)
    {
        // Stands in for I/O-bound work (a ping or HTTP GET); no thread is blocked.
        return Task.Delay(10000);
    }

    static async Task Main(string[] args)
    {
        var tasks = Enumerable.Range(0, 50000).Select(x => RunAsync(x));
        Console.WriteLine("Waiting for tasks to complete...");
        await Task.WhenAll(tasks);
        Console.WriteLine("Done");
    }
}
This queues 50000 work items and waits until all 50000 tasks are complete. These tasks execute on only as many threads as are needed. Behind the scenes, the task scheduler examines the pool of work and lets threads steal work from the queue when they need a task to execute.
Additional Considerations
With a large upper bound (n=50000) you should be cognizant of memory pressure, garbage collector activity, and other task-related overhead. You should consider the following:
Consider using ValueTask<T> to minimize allocations, especially for synchronous operations
Use ConfigureAwait(false) where possible to reduce context switching
Use CancellationTokenSource and CancellationToken to cancel requests early (e.g. on timeout; see the sketch after this list)
Follow best practices
Avoid awaiting inside of a loop where possible
Avoid querying tasks too frequently for completion
Avoid accessing Task<T>.Result before a task is complete to prevent blocking
Avoid deadlocks by using synchronization primitives (mutex, semaphore, condition signal, synclock, etc) as appropriate
Avoid frequent use of Task.Run to create tasks to avoid exhausting the thread pool available to the default task scheduler (this method is usually reserved for compute-bound tasks)
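A hedged sketch of the cancellation point above, using Task.Delay as a stand-in for the real request (assumptions: C# 8+ for the using declaration; RunAllWithTimeoutAsync is a hypothetical name):
static async Task RunAllWithTimeoutAsync()
{
    // Cancel everything that is still pending after 30 seconds.
    using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
    try
    {
        var tasks = Enumerable.Range(0, 50000)
            .Select(x => Task.Delay(10000, cts.Token)); // stand-in for the real request
        await Task.WhenAll(tasks);
    }
    catch (OperationCanceledException)
    {
        Console.WriteLine("Timed out; remaining tasks were cancelled.");
    }
}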

Thread pool - 10 threads created?

I am creating an app that processes a huge amount of data. I want to use threading in C# to make the processing faster. Please see the example code below.
private static void MyProcess(Object someData)
{
    // Do some data processing
}

static void Main(string[] args)
{
    for (int task = 1; task < 10; task++)
    {
        ThreadPool.QueueUserWorkItem(new WaitCallback(MyProcess), task);
    }
}
Does this mean that a new thread will be created every loop passing the task to the "MyProcess" method (10 threads total)? Also, are the threads going to process concurrently?
The number of threads a thread pool will start depends on multiple factors; see The managed thread pool.
Basically, you are queueing 9 work items here, which are likely to start threads immediately.
The threads will most likely run concurrently, depending on the machine and the number of processors.
If you queue a large number of work items, they will end up in a queue and start running as soon as a thread becomes available.
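A hedged sketch of that queueing behavior (assumptions: plain console app; the 500 ms sleep just makes the queueing visible):
using System;
using System.Threading;

class QueueDemo
{
    static void Main()
    {
        using var done = new CountdownEvent(20);
        for (int task = 1; task <= 20; task++)
        {
            int id = task;
            ThreadPool.QueueUserWorkItem(_ =>
            {
                Console.WriteLine($"Item {id} on thread {Thread.CurrentThread.ManagedThreadId}");
                Thread.Sleep(500); // hold the thread briefly so later items queue up
                done.Signal();
            });
        }
        done.Wait(); // keep Main alive until all 20 items have run
    }
}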
The calls will be scheduled on the thread pool. This does not guarantee that 10 threads will be created, nor that all 10 tasks will execute concurrently. The number of threads in the thread pool depends on the hardware and is chosen automatically to provide the best performance.
These articles contain good explanations of how it works:
https://owlcation.com/stem/C-ThreadPool-and-its-Task-Queue-Example
https://learn.microsoft.com/en-us/dotnet/api/system.threading.threadpool?redirectedfrom=MSDN&view=netframework-4.8
https://www.c-sharpcorner.com/article/thread-pool-in-net-core-and-c-sharp/
This Stackoverflow question explains the difference between ThreadPool and Thread:
Thread vs ThreadPool
Your method will be queued 9 times (you start at 1, not 0) and will execute when a thread pool thread becomes available.

Effects of non-awaited Task

I have a Task which I do not await because I want it to continue its own logic in the background. Part of that logic is to delay 60 seconds and check back in to see if some minute work is to be done. The abbreviated code looks something like this:
public Dictionary<string, Task> taskQueue = new Dictionary<string, Task>();

// Entry point
public void DoMainWork(string workId, XmlDocument workInstructions)
{
    // A work task (i.e. "workInstructions") is actually a plugin which might use
    // its own tasks internally or any other logic it sees fit.
    var workTask = Task.Factory.StartNew(() =>
    {
        // Main work code that interprets workInstructions
        // .........
        // .........
        // etc.
    }, TaskCreationOptions.LongRunning);

    // Add the work task to the queue of currently running tasks
    taskQueue.Add(workId, workTask);

    // Delay a period of time and then see if we need to extend our timeout
    // for doing main work code
    this.QueueCheckinOnWorkTask(workId); // Note the non-awaited task
}

private async Task QueueCheckinOnWorkTask(string workId)
{
    DateTime startTime = DateTime.Now;

    // Delay 60 seconds
    await Task.Delay(60 * 1000).ConfigureAwait(false);

    // Find out how long Task.Delay delayed for.
    TimeSpan duration = DateTime.Now - startTime; // THIS SOMETIMES DENOTES TIMES MUCH LARGER THAN EXPECTED, I.E. 80+ SECONDS VS. 60

    if (!taskQueue.ContainsKey(workId))
    {
        // Do something based on work being complete
    }
    else
    {
        // Work is not complete, inform outside source we're still working
        QueueCheckinOnWorkTask(workId); // Note the non-awaited task
    }
}
Keep in mind, this is example code intended only to show an extremely minimal version of what is going on in my actual program.
My problem is that Task.Delay() is delaying for longer than the time specified. Something is blocking this from continuing in a reasonable timeframe.
Unfortunately I haven't been able to replicate the issue on my development machine and it only happens on the server every couple of days. Lastly, it seems related to the number of work tasks we have running at a time.
What would cause this to delay longer than expected? Additionally, how might one go about debugging this type of situation?
This is a follow up to my other question which did not receive an answer: await Task.Delay() delaying for longer that expected
Most often that happens because of thread pool saturation. You can clearly see its effect with this simple console application (I measure time the same way you are doing; it doesn't matter in this case whether we use a stopwatch or not):
public class Program
{
    public static void Main()
    {
        for (int j = 0; j < 10; j++)
            for (int i = 1; i < 10; i++)
            {
                TestDelay(i * 1000);
            }
        Console.ReadKey();
    }

    static async Task TestDelay(int expected)
    {
        var startTime = DateTime.Now;
        await Task.Delay(expected).ConfigureAwait(false);
        var actual = (int)(DateTime.Now - startTime).TotalMilliseconds;
        ThreadPool.GetAvailableThreads(out int aw, out _);
        ThreadPool.GetMaxThreads(out int mw, out _);
        Console.WriteLine("Thread: {3}, Total threads in pool: {4}, Expected: {0}, Actual: {1}, Diff: {2}",
            expected, actual, actual - expected, Thread.CurrentThread.ManagedThreadId, mw - aw);
        Thread.Sleep(5000); // simulate work on the continuation's pool thread
    }
}
This program starts 90 tasks, each of which awaits Task.Delay for 1 to 9 seconds and then uses Thread.Sleep for 5 seconds to simulate work on the thread on which the continuation runs (a thread pool thread). It also outputs the total number of threads in the thread pool, so you can see how it increases over time.
If you run it, you will see that in almost all cases (except the first 8) the actual time after the delay is much longer than expected, in some cases 5 times longer (you delayed for 3 seconds but 15 seconds passed).
That's not because Task.Delay is imprecise. The reason is that the continuation after the await must execute on a thread pool thread, and the thread pool will not always give you a thread the moment you request one. It can decide that, instead of creating a new thread, it's better to wait for one of the currently busy threads to finish its work. It waits for a certain time, and if no thread has become free, it creates a new thread anyway. If you request 10 thread pool threads at once and none are free, it will wait for X ms and create a new one. Now you have 9 requests in the queue. It will again wait for X ms and create another one. Now you have 8 in the queue, and so on. This waiting for a thread pool thread to become free is what causes the increased delay in this console application (and most likely in your real program): we keep thread pool threads busy with long Thread.Sleep calls, and the thread pool is saturated.
Some parameters of the heuristics used by the thread pool are available for you to control. The most influential one is the "minimum" number of threads in the pool. The thread pool is expected to always create a new thread without delay until the total number of threads reaches the configurable "minimum". After that, if you request a thread, it might either still create a new one or wait for an existing one to become free.
So the most straightforward way to remove this delay is to increase the minimum number of threads in the pool. For example, if you do this:
ThreadPool.GetMinThreads(out int wt, out int ct);
ThreadPool.SetMinThreads(100, ct); // increase min worker threads to 100
all tasks in the example above will complete at the expected time with no additional delay.
This is usually not the recommended way to solve the problem, though. It's better to avoid performing long-running, heavy operations on thread pool threads, because the thread pool is a global resource and doing so affects your whole application. For example, if we remove the Thread.Sleep(5000) in the example above, all tasks delay for the expected amount of time, because all that keeps a thread pool thread busy now is the Console.WriteLine statement, which completes in no time, making the thread available for other work.
So, to sum up: identify the places where you perform heavy work on thread pool threads and avoid doing that (perform heavy work on separate, non-thread-pool threads instead). Alternatively, you might consider increasing the minimum number of threads in the pool to a reasonable amount.
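A hedged sketch of that first option, moving the blocking work from the example onto a dedicated thread so pool threads stay free for await continuations:
var worker = new Thread(() =>
{
    Thread.Sleep(5000); // the heavy/blocking work from the example
})
{
    IsBackground = true // don't keep the process alive for this work
};
worker.Start();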

List of tasks starts them synchronously - I would like them to start all at once

private async Task MainTask(CancellationToken token)
{
    List<Task> tasks = new List<Task>();
    do
    {
        var data = StaticVariables.AllData;
        foreach (var dataPiece in data)
        {
            tasks.Add(new Task(() => DoSomething(dataPiece)));
        }
        Parallel.ForEach(tasks, task => task.Start());
        await Task.WhenAll(tasks);
        tasks.Clear();
        await Task.Delay(2000);
    } while (!token.IsCancellationRequested);
}
The above function is supposed to start a number of DoSomething(dataPiece) calls and run them at the same time. DoSomething has a timeout of 2 seconds before it returns false. After some testing, it seems that the part between
await Task.WhenAll(tasks);
and
tasks.Clear()
is taking roughly 2 seconds multiplied by the number of tasks. So it would seem they run like this:
Start task
do it or abort after 2 sec
start next task
...
How could I do it so that they all start at the same time and perform their operations simultaneously?
EDIT
Doing it like so:
await Task.WhenAll(data.Select(dataPiece => Task.Run(() => DoSomething(dataPiece))));
results in horrible performance (around 25 sec to complete with the old code, 115 sec with this)
The issue you are seeing here is due to the fact that the thread pool maintains a minimum number of threads ready to run. If the thread pool needs to create more threads than that minimum, it introduces a deliberate 1 second delay between creating each new thread.
This is done to prevent things like "thread stampedes" from swamping the system with many simultaneous thread creations.
You can change the minimum thread limit using the ThreadPool.SetMinThreads() method. However, it is not recommended to do this, since it is subverting the expected thread pool operation and may cause other processes to slow down.
If you really must do it though, here's an example console application:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApp3
{
    class Program
    {
        static Stopwatch sw = Stopwatch.StartNew();

        static void Main()
        {
            runTasks();
            setMinThreadPoolThreads(30);
            runTasks();
        }

        static void setMinThreadPoolThreads(int count)
        {
            Console.WriteLine("\nSetting min thread pool threads to {0}.\n", count);
            int workerThreads, completionPortThreads;
            ThreadPool.GetMinThreads(out workerThreads, out completionPortThreads);
            ThreadPool.SetMinThreads(count, completionPortThreads);
        }

        static void runTasks()
        {
            var sw = Stopwatch.StartNew();
            Console.WriteLine("\nStarting tasks.");
            var task = test(20);
            Console.WriteLine("Waiting for tasks to finish.");
            task.Wait();
            Console.WriteLine("Finished after " + sw.Elapsed);
        }

        static async Task test(int n)
        {
            var tasks = new List<Task>();
            for (int i = 0; i < n; ++i)
                tasks.Add(Task.Run(new Action(task)));
            await Task.WhenAll(tasks);
        }

        static void task()
        {
            Console.WriteLine("Task starting at time " + sw.Elapsed);
            Thread.Sleep(5000);
            Console.WriteLine("Task stopping at time " + sw.Elapsed);
        }
    }
}
If you run it, you'll see from the output that, before setting the minimum thread pool size, running test() takes around 10 seconds (and the delay between task start times increases after the first few tasks).
After setting the minimum thread pool threads to 30, the delay between new tasks starting is much shorter, and the overall time to run test() drops to around 5 seconds (on my PC - yours may be different!).
However, I just want to reiterate that setting the minimum thread pool size is not a normal thing to do, and should be approached with caution. As the Microsoft documentation says:
By default, the minimum number of threads is set to the number of processors on a system. You can use the SetMinThreads method to increase the minimum number of threads. However, unnecessarily increasing these values can cause performance problems. If too many tasks start at the same time, all of them might appear to be slow. In most cases, the thread pool will perform better with its own algorithm for allocating threads. Reducing the minimum to less than the number of processors can also hurt performance.
First of all, you should use Task.Run instead of creating and starting tasks in separate steps.
You can do so inside the loop or LINQ-style. If you use LINQ, just ensure that you are not stuck with deferred execution, where the tasks would only be started as the sequence is enumerated. Create a list, an array, or some other persistent collection of your selected tasks:
await Task.WhenAll(data.Select(dataPiece => Task.Run(() => DoSomething(dataPiece))).ToList());
The other problem is the content of DoSomething. As long as it is a synchronous method, it will block its executing thread until it is done. For an inherently asynchronous operation (like pinging some network address), redesigning the method as truly asynchronous avoids blocking a thread at all.
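A hedged sketch of such a redesign (assumptions: DoSomethingAsync and the Data record are hypothetical stand-ins for the real method and data type):
using System.Net.NetworkInformation;
using System.Threading.Tasks;

record Data(string Address);

static class Worker
{
    public static async Task<bool> DoSomethingAsync(Data dataPiece)
    {
        // Awaiting the ping releases the thread instead of blocking it for up to 2 s.
        using var ping = new Ping();
        PingReply reply = await ping.SendPingAsync(dataPiece.Address, 2000); // 2 s timeout
        return reply.Status == IPStatus.Success;
    }
}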
Another option, as answered by Matthew Watson, is to increase the number of available threads so each task can run on its own thread. This is not the best option, but if you have many tasks that block for long periods without doing actual work, more threads will help get the work done.
More threads will not help if the tasks are actually using the available physical resources, i.e. CPU- or IO-bound work.

Streaming Data BlockingCollection

On page 88 of Stephen Toub's book
http://www.microsoft.com/download/en/details.aspx?id=19222
There is the code
private BlockingCollection<T> _streamingData = new BlockingCollection<T>();

// Parallel.ForEach
Parallel.ForEach(_streamingData.GetConsumingEnumerable(),
    item => Process(item));

// PLINQ
var q = from item in _streamingData.GetConsumingEnumerable().AsParallel()
        ...
        select item;
Stephen then mentions:
"when passing the result of calling GetConsumingEnumerable as the data source to Parallel.ForEach, the threads used by the loop have the potential to block when the collection becomes empty. And a blocked thread may not be released by Parallel.ForEach back to the ThreadPool for retirement or other uses. As such, with the code as shown above, if there are any periods of time where the collection is empty, the thread count in the process may steadily grow;"
I do not understand why the thread count would grow. If the collection is empty, wouldn't the BlockingCollection simply not request any further threads? In that case you wouldn't need WithDegreeOfParallelism to limit the number of threads used on the BlockingCollection.
The thread pool has a hill-climbing algorithm that it uses to estimate the appropriate number of threads. As long as adding threads increases throughput, the thread pool will create more. It assumes that some blocking or IO happens and tries to saturate the CPU by going over the number of processors in the system.
That is why doing IO and blocking work on thread pool threads can be dangerous.
Here is a fully working example of said behavior:
BlockingCollection<string> _streamingData = new BlockingCollection<string>();

// Producer: adds an item every 100 ms, so consumers regularly find the collection empty.
Task.Factory.StartNew(() =>
{
    for (int i = 0; i < 100; i++)
    {
        _streamingData.Add(i.ToString());
        Thread.Sleep(100);
    }
});

// Monitor: prints the process thread count once per second.
new Thread(() =>
{
    while (true)
    {
        Thread.Sleep(1000);
        Console.WriteLine("Thread count: " + Process.GetCurrentProcess().Threads.Count);
    }
}).Start();

// Consumer: loop threads block whenever the collection is empty.
Parallel.ForEach(_streamingData.GetConsumingEnumerable(), item =>
{
});
I do not know why the thread count keeps climbing even though it does not increase throughput. According to the model I explained, it would not grow, but I do not know whether that model is actually correct.
Maybe the thread pool has an additional heuristic that makes it spawn threads when it sees no progress at all (measured in tasks completed per second). That would make sense, because it would likely prevent many deadlocks in applications. Deadlocks can happen when important tasks cannot run because they are waiting for existing tasks to exit and make threads available. This is a well-known problem with the thread pool.
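A hedged sketch of the usual mitigation (reusing _streamingData from the example above): cap the loop's parallelism so blocked consumers cannot drive unbounded thread injection.
var options = new ParallelOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount // cap the loop's threads
};
Parallel.ForEach(_streamingData.GetConsumingEnumerable(), options, item =>
{
    // Process(item);
});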
