How TPL works in a multicore processor

How TPL works in a multicore processor - c#

I am new to parallel programming in C# 4.0. I understand that Parallel Programming and Multithreading are two different things. Now in TPL if I create a task like below:
Task<int> task1 = new Task<int>(() => {
for (int i = 0; i < 100; i++) {
sum += DoSomeHeavyCalculation(i);
}
return sum;
});
// start the task
task1.Start();
How will this work in the core 2 duo processor. I am actually trying to get my concepts clear.

The calculation for task1 will be executed on single thread, different* from the one you're currently on. What actually happens depends on the code below the one you posted.
If there's nothing there and it's in the main method, the task will probably stop in the middle.
If there's task1.Wait() or something using task1.Result, the current thread will wait until the task is finished and you won't get any performance benefits from using TPL.
If there's some other heavy calculation and then something from the previous point, those two computations will run in parallel.
If you want to run a for loop in parallel, using all your available cores, you should use Parallel.For or PLINQ:
ParallelEnumerable.Range(0, 100).Select(DoSomeHeavyCalculation).Sum()
* In fact, the task can run on the same actual thread, under some circumstances, but that's not relevant here.

Related

Using Task.Wait on a parallel task vs asynchronous task

In chapter 4.4 Dynamic Parallelism, in Stephen Cleary's book Concurrency in C# Cookbook, it says the following:
Parallel tasks may use blocking members, such as Task.Wait,
Task.Result, Task.WaitAll, and Task.WaitAny. In contrast, asynchronous
tasks should avoid blocking members, and prefer await, Task.WhenAll,
and Task.WhenAny.
I was always told that Task.Wait etc are bad because they block the current thread, and that it's much better to use await instead, so that the calling thread is not blocked.
Why is it ok to use Task.Wait etc for a parallel (which I think means CPU bound) Task?
Example:
In the example below, isn't Test1() better because the thread that calls Test1() is able to continue doing something else while it waits for the for loop to complete?
Whereas the thread that calls Test() is stuck waiting for the for loop to complete.
private static void Test()
{
Task.Run(() =>
{
for (int i = 0; i < 100; i++)
{
//do something.
}
}).Wait();
}
private static async Task Test1()
{
await Task.Run(() =>
{
for (int i = 0; i < 100; i++)
{
//do something.
}
});
}
EDIT:
This is the rest of the paragraph which I'm adding based on Peter Csala's comment:
Parallel tasks also commonly use AttachedToParent to create parent/child relationships between tasks. Parallel tasks should be created with Task.Run or Task.Factory.StartNew.

You've already got some great answers here, but just to chime in (sorry if this is repetitive at all):
Task was introduced in the TPL before async/await existed. When async came along, the Task type was reused instead of creating a separate "Promise" type.
In the TPL, pretty much all tasks were Delegate Tasks - i.e., they wrap a delegate (code) which is executed on a TaskScheduler. It was also possible - though rare - to have Promise Tasks in the TPL, which were created by TaskCompletionSource<T>.
The higher-level TPL APIs (Parallel and PLINQ) hide the Delegate Tasks from you; they are higher-level abstractions that create multiple Delegate Tasks and execute them on multiple threads, complete with all the complexity of partitioning and work queue stealing and all that stuff.
However, the one drawback to the higher-level APIs is that you need to know how much work you are going to do before you start. It's not possible for, e.g., the processing of one data item to add another data item(s) back at the beginning of the parallel work. That's where Dynamic Parallelism comes in.
Dynamic Parallelism uses the Task type directly. There are many APIs on the Task type that were designed for Dynamic Parallelism and should be avoided in async code unless you really know what you're doing (i.e., either your name is Stephen Toub or you're writing a high-performance .NET runtime). These APIs include StartNew, ContinueWith, Wait, Result, WaitAll, WaitAny, Id, CurrentId, RunSynchronously, and parent/child tasks. And then there's the Task constructor itself and Start which should never be used in any code at all.
In the particular case of Wait, yes, it does block the thread. And that is not ideal (even in parallel programming), because it blocks a literal thread. However, the alternative may be worse.
Consider the case where task A reaches a point where it has to be sure task B completes before it continues. This is the general Dynamic Parallelism case, so assume no parent/child relationship.
The old-school way to avoid this kind of blocking is to split method A up into a continuation and use ContinueWith. That works fine, but it does complicate the code - rather considerably in the case of loops. You end up writing a state machine, essentially what async does for you. In modern code, you may be able to use await, but then that has its own dangers: parallel code does not work out of the box with async, and combining the two can be tricky.
So it really comes down to a tradeoff between code complexity vs runtime efficiency. And when you consider the following points, you'll see why blocking was common:
Parallelism is normally done on Desktop applications; it's not common (or recommended) for web servers.
Desktop machines tend to have plenty of threads to spare. I remember Mark Russinovich (long before he joined Microsoft) demoing how showing a File Open dialog on Windows spawned some crazy number of threads (over 20, IIRC). And yet the user wouldn't even notice 20 threads being spawned (and presumably blocked).
Parallel code is difficult to maintain in the first place; Dynamic Parallelism using continuations is exceptionally difficult to maintain.
Given these points, it's pretty easy to see why a lot of parallel code blocks thread pool threads: the user experience is degraded by an unnoticeable amount, but the developer experience is enhanced significantly.

The thing is if you are using tasks to parallelize CPU-bound work - your method is likely not asynchronous, because the main benefit of async is asynchronous IO, and you have no IO in this case. Since your method is synchronous - you can't await anything, including tasks you use to parallelize computation, nor do you need to.
The valid concern you mentioned is you would waste current thread if you just block it waiting for parallel tasks to complete. However you should not waste it like this - it can be used as one participant in parallel computation. Say you want to perform parallel computation on 4 threads. Use current thread + 3 other threads, instead of using just 4 other threads and waste current one blocked waiting for them.
That's what for example Parallel LINQ does - it uses current thread together with thread pool threads. Note also its methods are not async (and should not be), but they do use Tasks internally and do block waiting on them.
Update: about your examples.
This one:
private static void Test()
{
Task.Run(() =>
{
for (int i = 0; i < 100; i++)
{
//do something.
}
}).Wait();
}
Is always useless - you offset some computation to separate thread while current thread is blocked waiting, so in result one thread is just wasted for nothing useful. Instead you should just do:
private static void Test()
{
for (int i = 0; i < 100; i++)
{
//do something.
}
}
This one:
private static async Task Test1()
{
await Task.Run(() =>
{
for (int i = 0; i < 100; i++)
{
//do something.
}
});
}
Is useful sometimes - when for some reason you need to perform computation but don't want to block current thread. For example, if current thread is UI thread and you don't want user interface to be freezed while computation is performed. However, if you are not in such environment, for example you are writing general purpose library - then it's useless too and you should stick to synchronous version above. If user of your library happen to be on UI thread - he can wrap the call in Task.Run himself. I would say that even if you are not writing a library but UI application - you should move all such logic (for loop in this case) into separate synchronous method and then wrap call to that method in Task.Run if necessary. So like this:
private static async Task Test2()
{
// we are on UI thread here, don't want to block it
await Task.Run(() => {
OurSynchronousVersionAbove();
});
// back on UI thread
// do something else
}
Now say you have that synchronous method and want to parallelize the computation. You may try something like this:
static void Test1() {
var task1 = Task.Run(() => {
for (int i = 0; i < 50;i++) {
// do something
}
});
var task2 = Task.Run(() => {
for (int i = 50; i < 100;i++) {
// do something
}
});
Task.WaitAll(task1, task2);
}
That will work but it wastes current thread blocked for no reason, waiting for two tasks to complete. Instead, you should do it like this:
static void Test1() {
var task = Task.Run(() => {
for (int i = 0; i < 50; i++) {
// do something
}
});
for (int i = 50; i < 100; i++) {
// do something
}
task.Wait();
}
Now you perform computation in parallel using 2 threads - one thread pool thread (from Task.Run) and current thread. And here is your legitimate use of task.Wait(). Of course usually you should stick to existing solutions like parallel LINQ, which does the same for you but better.

One of the risks of Task.Wait is deadlocks. If you call .Wait on the UI thread, you will deadlock if the task needs the main thread to complete. If you call an async method on the UI thread such deadlocks are very likely.
If you are 100% sure the task is running on a background thread, is guaranteed to complete no matter what, and that this will never change, it is fine to wait on it.
Since this if fairly difficult to guarantee it is usually a good idea to try to avoid waiting on tasks at all.

I believe that point in this passage is to not use blocking operations like Task.Wait in asynchronous code.
The main point isn't that Task.Wait is preferred in parallel code; it just says that you can get away with it, while in asynchronous code it can have a really serious effect.
This is because the success of async code depends on the tasks 'letting go' (with await) so that the thread(s) can do other work. In explicitly parallel code a blocking Wait may be OK because the other streams of work will continue going because they have a dedicated thread(s).

As I mentioned in the comments section if you look at the Receipt as a whole it might make more sense. Let me quote here the relevant part as well.
The Task type serves two purposes in concurrent programming: it can be a parallel task or an asynchronous task. Parallel tasks may use blocking members, such as Task.Wait, Task.Result, Task.WaitAll, and Task.WaitAny. In contrast, asynchronous tasks should avoid blocking members, and prefer await, Task.WhenAll, and Task.WhenAny. Parallel tasks also commonly use AttachedToParent to create parent/child relationships between tasks. Parallel tasks should be created with Task.Run or Task.Factory.StartNew.
In contrast, asynchronous tasks should avoid blocking members and prefer await, Task.WhenAll, and Task.WhenAny. Asynchronous tasks do not use AttachedToParent, but they can inform an implicit kind of parent/child relationship by awaiting an other task.
IMHO, it clearly articulates that a Task (or future) can represent a job, which can take advantage of the async I/O. OR it can represent a CPU bound job which could run in parallel with other CPU bound jobs.
Awaiting the former is the suggested way because otherwise you can't really take advantage of the underlying I/O driver's async capability. The latter does not require awaiting since it is not an async I/O job.
UPDATE Provide an example
As Theodor Zoulias asked in a comment section here is a made up example for parallel tasks where Task.WaitAll is being used.
Let's suppose we have this naive Is Prime Number implementation. It is not efficient, but it demonstrates that you perform something which is computationally can be considered as heavy. (Please also bear in mind for the sake of simplicity I did not add any error handling logic.)
static (int, bool) NaiveIsPrime(int number)
{
int numberOfDividers = 0;
for (int divider = 1; divider <= number; divider++)
{
if (number % divider == 0)
{
numberOfDividers++;
}
}
return (number, numberOfDividers == 2);
}
And here is a sample use case which run a couple of is prime calculation in parallel and waits for the results in a blocking way.
List<Task<(int, bool)>> jobs = new();
for (int number = 1_010; number < 1_020; number++)
{
var x = number;
jobs.Add(Task.Run(() => NaiveIsPrime(x)));
}
Task.WaitAll(jobs.ToArray());
foreach (var job in jobs)
{
(int number, bool isPrime) = job.Result;
var isPrimeInText = isPrime ? "a prime" : "not a prime";
Console.WriteLine($"{number} is {isPrimeInText}");
}
As you can see I haven't used any await keyword anywhere.
Here is a dotnet fiddle link
and here is a link for the prime numbers under 10 000.

I recommended using await instead of Task.Wait() for asynchronous methods/tasks, because this way the thread can be used for something else while the task is running.
However, for parallel tasks that are CPU -bound, most of the available CPU should be used. It makes sense use Task.Wait() to block the current thread until the task is complete. This way, the CPU -bound task can make full use of the CPU resources.
Update with supplementary statement.
Parallel tasks can use blocking members such as Task.Wait(), Task.Result, Task.WaitAll, and Task.WaitAny as they should consume all available CPU resources. When working with parallel tasks, it can be beneficial to block the current thread until the task is complete, since the thread is not being used for anything else. This way, the software can fully utilize all available CPU resources instead of wasting resources by keeping the thread running while it is blocked.

Task.Factory.StartNew isn't running how I expected

I've been trying to determine the best way of running code concurrently/in parallel with the rest of my code, probably using a thread. From what I've read, using Thread type is a no-no in modern C#. Initially I thought Parallel.Invoke(), but that turns out to be a blocking call till all the inner work is complete.
In my application, I don't need to wait for anything to complete, I don't care about getting a result, I need code that is completely independent of the current thread. Basically a "fire and forget" idea.
From what I thought I understand, Task.Factory.StartNew() is the correct way of running a piece of code concurrently/in parallel with the currently running code.
Based on that, I thought the following code would randomly print out "ABABBABABAA".
void Main()
{
Task.Factory.StartNew(() =>
{
for (int i = 0; i < 10; i++)
{
Console.Write("A");
}
});
for (int i = 0; i < 10; i++)
{
Console.Write("B");
}
}
However, it:
Prints out "BBBBBBBBBBAAAAAAAAAA"
If I swap the Task.Factory.StartNew with the for and vice-versa the same sequence is printed out, which seems bizarre.
So this leads me to think that Task.Factory.StartNew() is never actually scheduling work out to another thread, almost as if calling StartNew is a blocking call.
Given the requirement that I don't need to get any results or wait/await, would it be easier for me to simply create a new Thread and run my code there? The only problem that I have with this is that using a Thread seems to be the opposite of modern best practices and the book "Concurrency in C# 6.0" states:
As soon as you type new Thread(), it’s over; your project already has legacy code

Actually Task.Factory.StartNew does run on a seperate thread the only reason it loses the race everytime is because Task creation time. Here is code to prove it
static void Main(string[] args)
{
Task.Factory.StartNew(() =>
{
for (int i = 0; i < 10; i++)
{
Console.Write("A");
Thread.Sleep(1);
}
});
for (int i = 0; i < 10; i++)
{
Console.Write("B");
Thread.Sleep(1);
}
Console.ReadKey();
}

In my application, I don't need to wait for anything to complete, I don't care about getting a result, I need code that is completely independent of the current thread. Basically a "fire and forget" idea.
Are you sure? Fire-and-forget really means "I'm perfectly OK with exceptions being ignored." Since this is not what most people want, one common approach is to queue the work and save a Task representing that work, and join with it at the "end" of whatever your program does, just to make sure the not-quite-fire-and-forget work does complete successfully.
I've been trying to determine the best way of running code concurrently/in parallel with the rest of my code, probably using a thread. From what I've read, using Thread type is a no-no in modern C#. Initially I thought Parallel.Invoke(), but that turns out to be a blocking call till all the inner work is complete.
True, parallel code will block the calling thread. This can be avoided by wrapping the parallel call in a Task.Run (illustrated by Recipe 7.4 in my book).
In your particular case (i.e., in a fire-and-forget scenario), you can just drop the await and have a bare Task.Run. Though as I mention above, it's probably better to just stash the Task away someplace and await it later.
From what I thought I understand, Task.Factory.StartNew() is the correct way of running a piece of code concurrently/in parallel with the currently running code.
No, StartNew is dangerous and should only be used as a last resort. The proper technique is Task.Run, and if you have truly parallel work to do (i.e., many chunks of CPU-bound code), then a Task.Run wrapper around Parallel/PLINQ would be best.
Based on that, I thought the following code would randomly print out "ABABBABABAA".
As others have noted, this is just a race condition. It takes time to queue work to the thread pool, and computers can count to 10 really fast. (Writing output is a lot slower, but it's still too fast here). The same problem would occur with Task.Run (or a manual Thread).

Controlling number of tasks being executed at an instance at runtime [duplicate]

I have the following code:
var factory = new TaskFactory();
for (int i = 0; i < 100; i++)
{
var i1 = i;
factory.StartNew(() => foo(i1));
}
static void foo(int i)
{
Thread.Sleep(1000);
Console.WriteLine($"foo{i} - on thread {Thread.CurrentThread.ManagedThreadId}");
}
I can see it only does 4 threads at a time (based on observation). My questions:
What determines the number of threads used at a time?
How can I retrieve this number?
How can I change this number?
P.S. My box has 4 cores.
P.P.S. I needed to have a specific number of tasks (and no more) that are concurrently processed by the TPL and ended up with the following code:
private static int count = 0; // keep track of how many concurrent tasks are running
private static void SemaphoreImplementation()
{
var s = new Semaphore(20, 20); // allow 20 tasks at a time
for (int i = 0; i < 1000; i++)
{
var i1 = i;
Task.Factory.StartNew(() =>
{
try
{
s.WaitOne();
Interlocked.Increment(ref count);
foo(i1);
}
finally
{
s.Release();
Interlocked.Decrement(ref count);
}
}, TaskCreationOptions.LongRunning);
}
}
static void foo(int i)
{
Thread.Sleep(100);
Console.WriteLine($"foo{i:00} - on thread " +
$"{Thread.CurrentThread.ManagedThreadId:00}. Executing concurently: {count}");
}

When you are using a Task in .NET, you are telling the TPL to schedule a piece of work (via TaskScheduler) to be executed on the ThreadPool. Note that the work will be scheduled at its earliest opportunity and however the scheduler sees fit. This means that the TaskScheduler will decide how many threads will be used to run n number of tasks and which task is executed on which thread.
The TPL is very well tuned and continues to adjust its algorithm as it executes your tasks. So, in most cases, it tries to minimize contention. What this means is if you are running 100 tasks and only have 4 cores (which you can get using Environment.ProcessorCount), it would not make sense to execute more than 4 threads at any given time, as otherwise it would need to do more context switching. Now there are times where you want to explicitly override this behaviour. Let's say in the case where you need to wait for some sort of IO to finish, which is a whole different story.
In summary, trust the TPL. But if you are adamant to spawn a thread per task (not always a good idea!), you can use:
Task.Factory.StartNew(
() => /* your piece of work */,
TaskCreationOptions.LongRunning);
This tells the DefaultTaskscheduler to explicitly spawn a new thread for that piece of work.
You can also use your own Scheduler and pass it in to the TaskFactory. You can find a whole bunch of Schedulers HERE.
Note another alternative would be to use PLINQ which again by default analyses your query and decides whether parallelizing it would yield any benefit or not, again in the case of a blocking IO where you are certain starting multiple threads will result in a better execution you can force the parallelism by using WithExecutionMode(ParallelExecutionMode.ForceParallelism) you then can use WithDegreeOfParallelism, to give hints on how many threads to use but remember there is no guarantee you would get that many threads, as MSDN says:
Sets the degree of parallelism to use in a query. Degree of
parallelism is the maximum number of concurrently executing tasks that
will be used to process the query.
Finally, I highly recommend having a read of THIS great series of articles on Threading and TPL.

If you increase the number of tasks to for example 1000000 you will see a lot more threads spawned over time. The TPL tends to inject one every 500ms.
The TPL threadpool does not understand IO-bound workloads (sleep is IO). It's not a good idea to rely on the TPL for picking the right degree of parallelism in these cases. The TPL is completely clueless and injects more threads based on vague guesses about throughput. Also to avoid deadlocks.
Here, the TPL policy clearly is not useful because the more threads you add the more throughput you get. Each thread can process one item per second in this contrived case. The TPL has no idea about that. It makes no sense to limit the thread count to the number of cores.
What determines the number of threads used at a time?
Barely documented TPL heuristics. They frequently go wrong. In particular they will spawn an unlimited number of threads over time in this case. Use task manager to see for yourself. Let this run for an hour and you'll have 1000s of threads.
How can I retrieve this number? How can I change this number?
You can retrieve some of these numbers but that's not the right way to go. If you need a guaranteed DOP you can use AsParallel().WithDegreeOfParallelism(...) or a custom task scheduler. You also can manually start LongRunning tasks. Do not mess with process global settings.

I would suggest using SemaphoreSlim because it doesn't use Windows kernel (so it can be used in Linux C# microservices) and also has a property SemaphoreSlim.CurrentCount that tells how many remaining threads are left so you don't need the Interlocked.Increment or Interlocked.Decrement. I also removed i1 because i is value type and it won't be changed by the call of foo method passing the i argument so it's no need to copy it into i1 to ensure it never changes (if that was the reasoning for adding i1):
private static void SemaphoreImplementation()
{
var maxThreadsCount = 20; // allow 20 tasks at a time
var semaphoreSlim = new SemaphoreSlim(maxTasksCount, maxTasksCount);
var taskFactory = new TaskFactory();
for (int i = 0; i < 1000; i++)
{
taskFactory.StartNew(async () =>
{
try
{
await semaphoreSlim.WaitAsync();
var count = maxTasksCount-semaphoreSlim.CurrentCount; //SemaphoreSlim.CurrentCount tells how many threads are remaining
await foo(i, count);
}
finally
{
semaphoreSlim.Release();
}
}, TaskCreationOptions.LongRunning);
}
}
static async void foo(int i, int count)
{
await Task.Wait(100);
Console.WriteLine($"foo{i:00} - on thread " +
$"{Thread.CurrentThread.ManagedThreadId:00}. Executing concurently: {count}");
}

Task parallel library time slicing

I have a used TaskParallel library in couple of places in my WCF application.
At one place I am using it like:
Place 1
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount * 100 };
Parallel.ForEach(objList, options, recurringOrder =>
{
Task.Factory.StartNew(() => ProcessSingleRequestForDebitOrder(recurringOrder));
//var th = new Thread(() => ProcessSingleRequestForDebitOrder(recurringOrder)) { Priority = ThreadPriority.Normal };
//th.Start();
//ProcessSingleRequestForDebitOrder( recurringOrder);
});
And in of another method I have used it like:
Place 2
System.Threading.Tasks.Task.Factory.StartNew(() => ProcessTransaction(objInput.Clone()));
Problem is time slicing between the two places. That is if I have called the the method where parallel loop is processing hundreds of records at Place 2 my thread at Place 1 is waiting till all the records have processed. Could some how I can time slice the processing?
I am using task parallel library for .net 3.5 from;
https://www.nuget.org/packages/TaskParallelLibrary/

The problem is that you have spawned a lot of tasks in place 1 and place 2 is now queued. The Parallel loop in place 1 does nothing because the body only starts a task which is done very quickly.
Probably, you should remove the StartNew thing from place 1 so that the degree of parallelism is lower. I'm not sure this will completely remove any problems because the Parallel loop might still fully utilize all available pool threads.
Doing IO with Parallel is an anti pattern anyway because the system-chosen DOP almost always is a bad choice. The TPL has no idea how to efficiently schedule IO.
You can make place 2 a LongRunning task so that it does not depend on the thread pool and is guaranteed to run.
You also can investigate using async IO so that you do not depend on the thread pool anymore.

WhenAll vs WaitAll in parallel

I'm trying to understand how WaitAll and WhenAll works and have following problem. There are two possible ways to get a result from a method:
return Task.WhenAll(tasks).Result.SelectMany(r=> r);
return tasks.Select(t => t.Result).SelectMany(r => r).ToArray();
If I understand correctly, the second case is like calling WaitAll on tasks and fetching the results after that.
It looks like the second case has much better performance. I know that the proper usage of WhenAll is with await keyword, but still, i'm wondering why there is so big difference in performance for these lines.
After analyzing the flow of the system I think I've figured out how to model the problem in a simple test application (test code is based on I3arnon answer):
public static void Test()
{
var tasks = Enumerable.Range(1, 1000).Select(n => Task.Run(() => Compute(n)));
var baseTasks = new Task[100];
var stopwatch = Stopwatch.StartNew();
for (int i = 0; i < 100; i++)
{
baseTasks[i] = Task.Run(() =>
{
tasks.Select(t => t.Result).SelectMany(r => r).ToList();
});
}
Task.WaitAll(baseTasks);
Console.WriteLine("Select - {0}", stopwatch.Elapsed);
baseTasks = new Task[100];
stopwatch.Restart();
for (int i = 0; i < 100; i++)
{
baseTasks[i] = Task.Run(() =>
{
Task.WhenAll(tasks).Result.SelectMany(result => result).ToList();
});
}
Task.WaitAll(baseTasks);
Console.WriteLine("Task.WhenAll - {0}", stopwatch.Elapsed);
}
It looks like the problem is in starting tasks from other tasks (or in Parallel loop). In that case WhenAll results in much worse performance of the program. Why is that?

You are starting tasks inside a Parallel.ForEach loop which you should avoid. The whole point of Paralle.ForEach is to parallelize many small but intensive computations across the available CPU cores and starting a task is not an intensive computation. Rather it creates a task object and stores it on a queue if the task pool is saturated which it quickly will be with 1000 tasks being starteed. So now Parallel.ForEach competes with the task pool for compute resources.
In the first loop that is quite slow it seems that the scheduling is suboptimal and very little CPU is used probably because of Task.WhenAll inside the Parallel.ForEach. If you change the Parallel.ForEach to a normal for loop you will see a speedup.
But if you code really is as simple as a Compute function without any state carried forward between iterations you can get rid of the tasks and simply use Parallel.ForEach to maximize performance:
Parallel.For(0, 100, (i, s) =>
{
Enumerable.Range(1, 1000).Select(n => Compute(n)).SelectMany(r => r).ToList();
});
As to why Task.WhenAll performs much worse you should realize that this code
tasks.Select(t => t.Result).SelectMany(r => r).ToList();
will not run the tasks in parallel. The ToList basically wraps the iteration in a foreach loop and the body of the loop creates a task and then waits for the task to complete because you retrieve the Task.Result property. So each iteration of the loop will create a task and then wait for it to complete. The 1000 tasks are executed one after the other and there is very little overhead in handling the tasks. This means that you do not need the tasks which is also what I have suggested above.
On the other hand, the code
Task.WhenAll(tasks).Result.SelectMany(result => result).ToList();
will start all the tasks and try to execute them concurrently and because the task pool is unable to execute 1000 tasks in parallel most of these tasks are queued before they are executed. This creates a big management and task switch overhead which explains the bad performance.
With regard to the final question you added: If the only purpose of the outer task is to start the inner tasks then the outer task has no useful purpose but if the outer tasks are there to perform some kind of coordination of the inner tasks then it might make sense (perhaps you want to combine Task.WhenAny with Task.WhenAll). Without more context it is hard to answer. However, your question seems to be about performance and starting 100,000 tasks may add considerable overhead.
Parallel.ForEach is a good choice if you want to perform 100,000 independent computations like you do in your example. Tasks are very good for executing concurrent activities involving "slow" calls to other systems where you want to wait for and combine results and also handle errors. For massive parallelism they are probably not the best choice.

Your test is way too complicated so I've made my own. Here's a simple test that incorporates your Consume method:
public static void Test()
{
var tasks = Enumerable.Repeat(int.MaxValue, 10000).Select(n => Task.Run(() => Compute(n)));
var stopwatch = Stopwatch.StartNew();
Task.WhenAll(tasks).Result.SelectMany(result => result).ToList();
Console.WriteLine("Task.WhenAll - {0}", stopwatch.Elapsed);
stopwatch.Restart();
tasks.Select(t => t.Result).SelectMany(r => r).ToList();
Console.WriteLine("Select - {0}", stopwatch.Elapsed);
}
private static List<int> Compute(int seed)
{
var results = new List<int>();
for (int i = 0; i < 5000; i++)
{
results.Add(seed * i);
}
return results;
}
Output:
Task.WhenAll - 00:00:01.2894227
Select - 00:00:01.7114142
However if I use Enumerable.Repeat(int.MaxValue, 100) the output is:
Task.WhenAll - 00:00:00.0205375
Select - 00:00:00.0178089
Basically the difference between the options is if you're blocking once or blocking for each element. Blocking once is better when there are many elements, but for few blocking for each one could be better.
Since there ins't really a big difference and, you care about performance only when you're dealing with many items and logically you want to proceed when all the tasks completed I recommend using Task.WhenAll.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.