C# What is the best approach to parallel one long computing process?

I made a fake test resembling my real computing task. My current code is:
static void Main()
{
List<ulong> list = new List<ulong>();
Action action = () =>
{
Random rng = new Random(Guid.NewGuid().GetHashCode());
ulong i = 0;
do
{
i++;
if (rng.Next(100000000) == 1000)
{
lock (list) list.Add(i);
Console.WriteLine("ThreadId {0}, step {1}: match is found",
Thread.CurrentThread.ManagedThreadId, i);
}
} while (list.Count < 100);
};
int length = Environment.ProcessorCount;
Action[] actions = new Action[length];
for (int i = 0; i < length; i++)
actions[i] = action;
Parallel.Invoke(actions);
Console.WriteLine("The process is completed. {0} matches are found. Press any key...",
list.Count);
Console.ReadKey();
}
Is there any better approach to optimize the number of parallel tasks for one long computing process?

I'm not sure I understood the question correctly. The code you've shared runs several instances of the same action in parallel. If you want to speed up one long-running computation, you should divide it into smaller units of work that can run independently. If you are iterating over a collection or a range, you can use Parallel.For or Parallel.ForEach from the TPL (Task Parallel Library), which decides how many threads to use based on factors such as the number of cores and the current CPU load.
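As a rough illustration of the Parallel.For route, here is a minimal sketch. IsMatch is a hypothetical, deterministic stand-in for one unit of the real computation, and the range size is an assumption chosen for the demo. Each worker keeps its own counter and the counters are merged once per worker at the end:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for one unit of the long computation.
static bool IsMatch(long n) => n % 1_000_003 == 0;

const long Total = 10_000_000;   // assumed amount of work for this sketch
long matches = 0;

// The TPL picks the actual thread count; this only caps it at the core count.
var options = new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount };

Parallel.For(0L, Total, options,
    () => 0L,                                              // per-worker counter
    (n, state, local) => IsMatch(n) ? local + 1 : local,   // no shared state touched here
    local => Interlocked.Add(ref matches, local));         // merge once per worker

Console.WriteLine($"{matches} matches found");
```

Because the shared counter is only touched in the final merge step, the loop body needs no lock at all.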

Related

Is parallel code supposed to run slower than sequential code, after a certain dataset size?

I'm fairly new to C# and programming in general and I was trying out parallel programming.
I have written this example code that computes the sum of an array first, using multiple threads, and then, using one thread (the main thread).
I've timed both cases.
static long Sum(int[] numbers, int start, int end)
{
long sum = 0;
for (int i = start; i < end; i++)
{
sum += numbers[i];
}
return sum;
}
static async Task Main()
{
// Arrange data.
const int COUNT = 100_000_000;
int[] numbers = new int[COUNT];
Random random = new();
for (int i = 0; i < numbers.Length; i++)
{
numbers[i] = random.Next(100);
}
// Split task into multiple parts.
int threadCount = Environment.ProcessorCount;
int taskCount = threadCount - 1;
int taskSize = numbers.Length / taskCount;
var start = DateTime.Now;
// Run individual parts in separate threads.
List<Task<long>> tasks = new();
for (int i = 0; i < taskCount; i++)
{
int begin = i * taskSize;
int end = (i == taskCount - 1) ? numbers.Length : (i + 1) * taskSize;
tasks.Add(Task.Run(() => Sum(numbers, begin, end)));
}
// Wait for all threads to finish, as we need the result.
var partialSums = await Task.WhenAll(tasks);
long sumAsync = partialSums.Sum();
var durationAsync = (DateTime.Now - start).TotalMilliseconds;
Console.WriteLine($"Async sum: {sumAsync}");
Console.WriteLine($"Async duration: {durationAsync} milliseconds");
// Sequential
start = DateTime.Now;
long sumSync = Sum(numbers, 0, numbers.Length);
var durationSync = (DateTime.Now - start).TotalMilliseconds;
Console.WriteLine($"Sync sum: {sumSync}");
Console.WriteLine($"Sync duration: {durationSync} milliseconds");
var factor = durationSync / durationAsync;
Console.WriteLine($"Factor: {factor:0.00}x");
}
When the array size is 100 million, the parallel sum is computed 2x faster. (on average).
But when the array size is 1 billion, it's significantly slower than the sequential sum.
Why is it running slower?
Hardware Information
Environment.ProcessorCount = 4
GC.GetGCMemoryInfo().TotalAvailableMemoryBytes = 8468377600
Timing:
When array size is 100,000,000
When array size is 1,000,000,000
New Test:
This time, instead of separate threads (3 in my case) working on different parts of a single array of 1,000,000,000 integers, I physically divided the dataset into 3 separate arrays of 333,333,333 integers each (one third of the size). Although I'm still adding up a billion integers on the same machine, my parallel code now runs faster (as expected).
private static void InitArray(int[] numbers)
{
Random random = new();
for (int i = 0; i < numbers.Length; i++)
{
numbers[i] = (int)random.Next(100);
}
}
public static async Task Main()
{
Stopwatch stopwatch = new();
const int SIZE = 333_333_333; // one third of a billion
List<int[]> listOfArrays = new();
for (int i = 0; i < Environment.ProcessorCount - 1; i++)
{
int[] numbers = new int[SIZE];
InitArray(numbers);
listOfArrays.Add(numbers);
}
// Sequential.
stopwatch.Start();
long syncSum = 0;
foreach (var array in listOfArrays)
{
syncSum += Sum(array, 0, array.Length);
}
stopwatch.Stop();
var sequentialDuration = stopwatch.Elapsed.TotalMilliseconds;
Console.WriteLine($"Sequential sum: {syncSum}");
Console.WriteLine($"Sequential duration: {sequentialDuration} ms");
// Parallel.
stopwatch.Restart();
List<Task<long>> tasks = new();
foreach (var array in listOfArrays)
{
tasks.Add(Task.Run(() => Sum(array, 0, array.Length)));
}
var partialSums = await Task.WhenAll(tasks);
long parallelSum = partialSums.Sum();
stopwatch.Stop();
var parallelDuration = stopwatch.Elapsed.TotalMilliseconds;
Console.WriteLine($"Parallel sum: {parallelSum}");
Console.WriteLine($"Parallel duration: {parallelDuration} ms");
Console.WriteLine($"Factor: {sequentialDuration / parallelDuration:0.00}x");
}
Timing
I don't know if it helps figure out what went wrong in the first approach.

Asynchronous code is not the same thing as parallel code. The main reason for writing asynchronous code is better resource utilization while the computer is waiting for some kind of I/O device. Your code would be better described as parallel computing or concurrent computing.
While your example should work fine, it is neither the easiest nor the optimal way to do it. The easiest option would probably be Parallel LINQ: numbers.AsParallel().Sum(x => (long)x); (the cast to long matters because summing this many values as int overflows and throws an OverflowException). There is also a Parallel.For method that should be better suited, including an overload that maintains thread-local state. Note that while Parallel.For will attempt to optimize its partitioning, you probably want to process a chunk of data in each iteration to reduce overhead. I would try around 1-10k values per chunk.
We can only guess at the reason your parallel method is slower. Summing numbers is a really fast operation, so the computation may be limited by memory bandwidth or cache usage. And while you want your work partitions to be fairly large, partitions that are too large may reduce overall parallelism if a thread gets suspended for any reason. You may also want partitions of certain sizes to work well with the caching system; see cache associativity. It is also possible that you are measuring things you did not intend to measure, like JIT compilation time or garbage collections. See BenchmarkDotNet, which takes care of many of the edge cases when measuring performance.
Also, never use DateTime for measuring performance. Stopwatch is both much easier to use and much more accurate.
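To make the chunking and timing advice concrete, here is a small sketch rather than a tuned implementation. The ChunkSize value and the deterministic filler data are assumptions for the demo. It uses the Partitioner.Create overload that takes an explicit range size, keeps a per-thread partial sum, and times the work with Stopwatch:

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

int[] numbers = new int[10_000_000];
for (int i = 0; i < numbers.Length; i++)
    numbers[i] = i % 100;              // deterministic filler data for the sketch

const int ChunkSize = 10_000;          // assumed value; tune it by measuring

var stopwatch = Stopwatch.StartNew();
long total = 0;
Parallel.ForEach(
    Partitioner.Create(0, numbers.Length, ChunkSize),   // ranges of ChunkSize indices
    () => 0L,                                           // per-thread partial sum
    (range, state, local) =>
    {
        for (int i = range.Item1; i < range.Item2; i++)
            local += numbers[i];
        return local;
    },
    local => Interlocked.Add(ref total, local));        // merge once per thread
stopwatch.Stop();

Console.WriteLine($"Sum: {total}, elapsed: {stopwatch.Elapsed.TotalMilliseconds} ms");
```

A single timed run like this still includes JIT warm-up, so treat the number as a rough indication only.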

My machine has 4GB RAM, so initializing an int[1_000_000_000] results in memory paging. Going from int[100_000_000] to int[1_000_000_000] results in non-linear performance degradation (100x instead of 10x). Essentially a CPU-bound operation becomes I/O-bound. Instead of adding numbers, the program spends most of its time reading segments of the array from the disk. In these conditions using multiple threads can be detrimental for the overall performance, because the pattern of accessing the storage device becomes more erratic and less streamlined.
Maybe something similar happens on your 8GB RAM machine too, but I can't say for sure.

Parallel.ForEach search doesn't find the correct value

This is my first attempt at parallel programming.
I'm writing a test console app before using this in my real app and I can't seem to get it right. When I run this, the parallel search is always faster than the sequential one, but the parallel search never finds the correct value. What am I doing wrong?
I tried it without using a partitioner (just Parallel.For); it was slower than the sequential loop and gave the wrong number. I saw a Microsoft doc that said for simple computations, using Partitioner.Create can speed things up. So I tried that but still got the wrong values. Then I saw Interlocked, but I think I'm using it wrong.
Any help would be greatly appreciated
Random r = new Random();
Stopwatch timer = new Stopwatch();
do {
// Make and populate a list
List<short> test = new List<short>();
for (int x = 0; x <= 10000000; x++)
{
test.Add((short)(r.Next(short.MaxValue) * r.NextDouble()));
}
// Initialize result variables
short rMin = short.MaxValue;
short rMax = 0;
// Do min/max normal search
timer.Start();
foreach (var amp in test)
{
rMin = Math.Min(rMin, amp);
rMax = Math.Max(rMax, amp);
}
timer.Stop();
// Display results
Console.WriteLine($"rMin: {rMin} rMax: {rMax} Time: {timer.ElapsedMilliseconds}");
// Initialize parallel result variables
short pMin = short.MaxValue;
short pMax = 0;
// Create list partitioner
var rangePortioner = Partitioner.Create(0, test.Count);
// Do min/max parallel search
timer.Restart();
Parallel.ForEach(rangePortioner, (range, loop) =>
{
short min = short.MaxValue;
short max = 0;
for (int i = range.Item1; i < range.Item2; i++)
{
min = Math.Min(min, test[i]);
max = Math.Max(max, test[i]);
}
_ = Interlocked.Exchange(ref Unsafe.As<short, int>(ref pMin), Math.Min(pMin, min));
_ = Interlocked.Exchange(ref Unsafe.As<short, int>(ref pMax), Math.Max(pMax, max));
});
timer.Stop();
// Display results
Console.WriteLine($"pMin: {pMin} pMax: {pMax} Time: {timer.ElapsedMilliseconds}");
Console.WriteLine("Press enter to run again; any other key to quit");
} while (Console.ReadKey().Key == ConsoleKey.Enter);
Sample output:
rMin: 0 rMax: 32746 Time: 106
pMin: 0 pMax: 32679 Time: 66
Press enter to run again; any other key to quit

The correct way to do a parallel search like this is to compute local values for each thread used, and then merge the values at the end. This ensures that synchronization is only needed at the final phase:
var items = Enumerable.Range(0, 10000).ToList();
int globalMin = int.MaxValue;
int globalMax = int.MinValue;
Parallel.ForEach<int, (int Min, int Max)>(
items,
() => (int.MaxValue, int.MinValue), // Create new min/max values for each thread used
(item, state, localMinMax) =>
{
var localMin = Math.Min(item, localMinMax.Min);
var localMax = Math.Max(item, localMinMax.Max);
return (localMin, localMax); // return the new min/max values for this thread
},
localMinMax => // called one last time for each thread used
{
lock(items) // Since this may run concurrently, synchronization is needed
{
globalMin = Math.Min(globalMin, localMinMax.Min);
globalMax = Math.Max(globalMax, localMinMax.Max);
}
});
As you can see, this is quite a bit more complex than a regular loop, and it is not even doing anything fancy like partitioning. An optimized solution would work over larger blocks to reduce overhead, but this is omitted for simplicity, and it looks like the OP is already aware of such issues.
Be aware that multithreaded programming is difficult. While it is a great idea to try out such techniques in a playground rather than in a real program, I would still suggest that you start by studying the potential dangers around thread safety; it is fairly easy to find good resources on this.
Not all problems will be as obviously wrong as this one. It is quite easy to introduce bugs that show up once in a million runs, or only when the CPU load is high, or only on single-CPU systems, or that are only detected long after the code is put into production. It is good practice to be paranoid whenever multiple threads may read and write the same memory concurrently.
I would also recommend learning about immutable data types and pure functions, since these are much safer and easier to reason about once multiple threads are involved.

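Interlocked.Exchange makes only the exchange itself atomic. Reading pMin, computing Math.Min and writing the result back are still separate steps, so another thread can update pMin in between and its update is then lost. In addition, Unsafe.As<short, int> makes Interlocked.Exchange write four bytes into a two-byte variable, which can overwrite whatever is stored next to it. You should compute min/max for every batch separately and then join the results.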
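If you do want to stay lock-free, one common pattern is a compare-and-swap retry loop on an int variable. The sketch below is only an illustration under assumptions: InterlockedMin and InterlockedMax are hypothetical helper names, not framework methods, and the test data is made up so the result is predictable:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: atomically lowers 'target' to 'candidate' if it is smaller.
static void InterlockedMin(ref int target, int candidate)
{
    int seen = Volatile.Read(ref target);
    while (candidate < seen)
    {
        int previous = Interlocked.CompareExchange(ref target, candidate, seen);
        if (previous == seen) return;   // our write won
        seen = previous;                // another thread changed it; re-check and retry
    }
}

// Hypothetical helper: atomically raises 'target' to 'candidate' if it is larger.
static void InterlockedMax(ref int target, int candidate)
{
    int seen = Volatile.Read(ref target);
    while (candidate > seen)
    {
        int previous = Interlocked.CompareExchange(ref target, candidate, seen);
        if (previous == seen) return;
        seen = previous;
    }
}

int[] data = new int[1_000_000];
for (int i = 0; i < data.Length; i++)
    data[i] = (int)((i * 7919L) % 32749);   // made-up data covering 0..32748

int sharedMin = int.MaxValue;
int sharedMax = int.MinValue;

Parallel.ForEach(Partitioner.Create(0, data.Length), range =>
{
    // Work on locals inside the batch; touch shared state once per batch.
    int localMin = int.MaxValue, localMax = int.MinValue;
    for (int i = range.Item1; i < range.Item2; i++)
    {
        localMin = Math.Min(localMin, data[i]);
        localMax = Math.Max(localMax, data[i]);
    }
    InterlockedMin(ref sharedMin, localMin);
    InterlockedMax(ref sharedMax, localMax);
});

Console.WriteLine($"min: {sharedMin} max: {sharedMax}");
```

The shared variables are int rather than short because Interlocked.CompareExchange has an overload for int, so no reinterpretation of the variable is needed.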

Using low-lock techniques like the Interlocked class is tricky and advanced. Since your experience with multithreading is still limited, I would go with a simple and trusty lock:
object locker = new object();
//...
lock (locker)
{
pMin = Math.Min(pMin, min);
pMax = Math.Max(pMax, max);
}

Summing every element in a byte array

I'm new to threading and to async/sync programming in general. While practicing, I saw this problem on YouTube, on the channel called Jamie King: sum every element of a byte array. He did it with threads, and I decided to do it with tasks. I made it asynchronous, and it was slower than the synchronous version. The difference between the two was 360 milliseconds! I wonder if any of you could do it faster in an asynchronous way. If so, please post it!
Here's mine:
static Random Random = new Random(999);
static byte[] byteArr = new byte[100_000_000];
static byte TaskCount = (byte)Environment.ProcessorCount;
static int readingLength;
static void Main(string[] args)
{
for (int i = 0; i < byteArr.Length; i++)
{
byteArr[i] = (byte)Random.Next(11);
}
SumAsync(byteArr);
}
static async void SumAsync(byte[] bytes)
{
readingLength = bytes.Length / TaskCount;
int sum = 0;
Console.WriteLine("Running...");
Stopwatch watch = new Stopwatch();
watch.Start();
for (int i = 0; i < TaskCount; i++)
{
Task<int> task = SumPortion(bytes.SubArray(i * readingLength, readingLength));
int result = await task;
sum += result;
}
watch.Stop();
Console.WriteLine("Done! Time took: {0}, Result: {1}", watch.ElapsedMilliseconds, sum);
}
static async Task<int> SumPortion(byte[] bytes)
{
Task<int> task = Task.Run(() =>
{
int sum = 0;
foreach (byte b in bytes)
{
sum += b;
}
return sum;
});
int result = await task;
return result;
}
Note that bytes.SubArray is an extension method. I have one question. Is asynchronous programming slower than synchronous programming?
Please point out my mistakes.
Thanks for your time!

You need to use WhenAll() and await all of the tasks at the end:
static async void SumAsync(byte[] bytes)
{
readingLength = bytes.Length / TaskCount;
int sum = 0;
Console.WriteLine("Running...");
Stopwatch watch = new Stopwatch();
watch.Start();
var results = new Task<int>[TaskCount]; // Task<int>[] so that WhenAll yields an int[]
for (int i = 0; i < TaskCount; i++)
{
Task<int> task = SumPortion(bytes.SubArray(i * readingLength, readingLength));
results[i] = task;
}
int[] result = await Task.WhenAll(results);
watch.Stop();
Console.WriteLine("Done! Time took: {0}, Result: {1}", watch.ElapsedMilliseconds, result.Sum());
}
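When you await each task inside the loop, the next task is not started until the previous one has finished, so the portions are summed one after another. Starting all of the tasks first and then awaiting them together with WhenAll() lets them run in parallel, saving you a lot of unnecessary waiting.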
You can read more about it in learn.microsoft.com.

Asynchronous code is not inherently slower. It lets an operation proceed in the background (for example, waiting for a connection to a website to be established) so that the main thread is not blocked while it waits for something to happen.

The fastest way to do this is probably going to be to hand-roll a Parallel.ForEach() loop.
Plinq may not even give you a speedup in comparison to a single-threaded approach, and it certainly won't be as fast as Parallel.ForEach().
Here's some sample timing code. When you try this, make sure it's a RELEASE build and that you don't run it under the debugger (which will turn off the JIT optimiser, even if it's a RELEASE build):
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
namespace Demo
{
static class Program
{
static void Main()
{
// Create some random bytes (using a seed to ensure it's the same bytes each time).
var rng = new Random(12345);
byte[] byteArr = new byte[500_000_000];
rng.NextBytes(byteArr);
// Time single-threaded Linq.
var sw = Stopwatch.StartNew();
long sum = byteArr.Sum(x => (long)x);
Console.WriteLine($"Single-threaded Linq took {sw.Elapsed} to calculate sum as {sum}");
// Time single-threaded loop;
sw.Restart();
sum = 0;
foreach (var n in byteArr)
sum += n;
Console.WriteLine($"Single-threaded took {sw.Elapsed} to calculate sum as {sum}");
// Time Plinq
sw.Restart();
sum = byteArr.AsParallel().Sum(x => (long)x);
Console.WriteLine($"Plinq took {sw.Elapsed} to calculate sum as {sum}");
// Time Parallel.ForEach() with partitioner.
sw.Restart();
sum = 0;
Parallel.ForEach
(
Partitioner.Create(0, byteArr.Length),
() => 0L,
(subRange, loopState, threadLocalState) =>
{
for (int i = subRange.Item1; i < subRange.Item2; i++)
threadLocalState += byteArr[i];
return threadLocalState;
},
finalThreadLocalState =>
{
Interlocked.Add(ref sum, finalThreadLocalState);
}
);
Console.WriteLine($"Parallel.ForEach with partioner took {sw.Elapsed} to calculate sum as {sum}");
}
}
}
The results I get with an x64 build on my octo-core PC are:
Single-threaded Linq took 00:00:03.1160235 to calculate sum as 63748717461
Single-threaded took 00:00:00.7596687 to calculate sum as 63748717461
Plinq took 00:00:01.0305913 to calculate sum as 63748717461
Parallel.ForEach with partioner took 00:00:00.0839141 to calculate sum as 63748717461
The results I get with an x86 build are:
Single-threaded Linq took 00:00:02.6964067 to calculate sum as 63748717461
Single-threaded took 00:00:00.8200462 to calculate sum as 63748717461
Plinq took 00:00:01.1251899 to calculate sum as 63748717461
Parallel.ForEach with partioner took 00:00:00.1084805 to calculate sum as 63748717461
As you can see, the Parallel.ForEach() with the x64 build is fastest (probably because it's calculating a long total, rather than because of the larger address space).
The Plinq is around three times faster than the Linq non-threaded solution.
The Parallel.ForEach() with a partitioner is more than 30 times faster.
But notably, the non-linq single-threaded code is faster than the Plinq code. In this case, using Plinq is pointless; it makes things slower!
This tells us that the speedup isn't just from multithreading - it's also related to the overhead of Linq and Plinq in comparison to hand-rolling the loop.
Generally speaking, you should only use Plinq when the processing of each element takes a relatively long time (and adding a byte to a running total takes a very short time).
The advantage of Plinq over Parallel.ForEach() with a partitioner is that it is much simpler to write - however, if it winds up being slower than a simple foreach loop then its utility is questionable. So timing things before choosing a solution is very important!

Getting Min, Max, Sum with a single parallel for loop

I am trying to get the minimum, maximum and sum (for the average) of a large array. I would love to replace my regular for loop with Parallel.For.
UInt16 tempMin = (UInt16)(Math.Pow(2,mfvm.cameras[openCamIndex].bitDepth) - 1);
UInt16 tempMax = 0;
UInt64 tempSum = 0;
for (int i = 0; i < acquisition.frameDataShorts.Length; i++)
{
if (acquisition.frameDataShorts[i] < tempMin)
tempMin = acquisition.frameDataShorts[i];
if (acquisition.frameDataShorts[i] > tempMax)
tempMax = acquisition.frameDataShorts[i];
tempSum += acquisition.frameDataShorts[i];
}
I know how to solve this using Tasks by cutting up the array myself. However, I would love to learn how to use Parallel.For for this, since, as I understand it, it should be able to do this very elegantly.
I found this tutorial from MSDN for calculating the sum, but I have no idea how to extend it to do all three things (min, max and sum) in a single pass.
Results:
OK, I tried the PLINQ solution and I have seen some serious improvements.
On my i7 (2x4 cores), the 3 passes (Min, Max, Sum) are 4x faster than the sequential approach. However, I tried the same code on a Xeon (2x8 cores) and the results are completely different. The parallel version (again 3 passes) is actually twice as slow as the sequential approach (which is about 5x faster than on my i7).
In the end I split the array myself with Task.Factory, and I get slightly better results on all computers.

I assume that the main issue here is that three different values have to be carried along in each iteration. You can use a Tuple for this purpose:
var lockObject = new object();
var arr = Enumerable.Range(0, 1000000).ToArray();
long total = 0;
var min = arr[0];
var max = arr[0];
Parallel.For(0, arr.Length,
() => new Tuple<long, int, int>(0, arr[0], arr[0]),
(i, loop, temp) => new Tuple<long, int, int>(temp.Item1 + arr[i], Math.Min(temp.Item2, arr[i]),
Math.Max(temp.Item3, arr[i])),
x =>
{
lock (lockObject)
{
total += x.Item1;
min = Math.Min(min, x.Item2);
max = Math.Max(max, x.Item3);
}
}
);
I must warn you, though, that this implementation runs about 10x slower (on my machine) than the simple for loop approach you demonstrated in your question, so proceed with caution.
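Much of that slowdown likely comes from invoking a delegate and allocating a new Tuple for every single element. A possible variation, sketched below under assumptions (the frame array is a made-up stand-in for frameDataShorts), hands each worker a whole index range through Partitioner.Create and keeps min, max and sum in a value tuple, so the per-element work is a plain loop:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

ushort[] frame = new ushort[5_000_000];        // made-up stand-in for frameDataShorts
for (int i = 0; i < frame.Length; i++)
    frame[i] = (ushort)(100 + i % 4000);

object gate = new object();
ushort min = ushort.MaxValue, max = ushort.MinValue;
ulong sum = 0;

Parallel.ForEach(
    Partitioner.Create(0, frame.Length),       // ranges of indices, not single items
    () => (Min: ushort.MaxValue, Max: ushort.MinValue, Sum: 0UL),
    (range, state, local) =>
    {
        // One pass over the range updates all three values.
        for (int i = range.Item1; i < range.Item2; i++)
        {
            ushort v = frame[i];
            if (v < local.Min) local.Min = v;
            if (v > local.Max) local.Max = v;
            local.Sum += v;
        }
        return local;
    },
    local =>
    {
        lock (gate)                            // runs once per worker, so contention is low
        {
            if (local.Min < min) min = local.Min;
            if (local.Max > max) max = local.Max;
            sum += local.Sum;
        }
    });

Console.WriteLine($"min: {min} max: {max} sum: {sum}");
```

Whether this beats the plain for loop depends on the array size and the machine, so it is worth timing both.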

I don't think Parallel.For is a good fit here, but try this out:
public class MyArrayHandler {
public async Task GetMinMaxSum() {
var myArray = Enumerable.Range(0, 1000);
var maxTask = Task.Run(() => myArray.Max());
var minTask = Task.Run(() => myArray.Min());
var sumTask = Task.Run(() => myArray.Sum());
var results = await Task.WhenAll(maxTask,
minTask,
sumTask);
var max = results[0];
var min = results[1];
var sum = results[2];
}
}
Edit
Just for fun due to the comments regarding performance I took a couple measurements. Also, found this Fastest way to find sum.
#10,000,000 values
GetMinMax: 218ms
GetMinMaxAsync: 308ms
public class MinMaxSumTests {
[Test]
public async Task GetMinMaxSumAsync() {
var myArray = Enumerable.Range(0, 10000000).Select(x => (long)x).ToArray();
var sw = new Stopwatch();
sw.Start();
var maxTask = Task.Run(() => myArray.Max());
var minTask = Task.Run(() => myArray.Min());
var sumTask = Task.Run(() => myArray.Sum());
var results = await Task.WhenAll(maxTask,
minTask,
sumTask);
var max = results[0];
var min = results[1];
var sum = results[2];
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
}
[Test]
public void GetMinMaxSum() {
var myArray = Enumerable.Range(0, 10000000).Select(x => (long)x).ToArray();
var sw = new Stopwatch();
sw.Start();
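long tempMin = long.MaxValue; // start high so the first element replaces it
long tempMax = long.MinValue; // start low so the first element replaces it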
long tempSum = 0;
for (int i = 0; i < myArray.Length; i++) {
if (myArray[i] < tempMin)
tempMin = myArray[i];
if (myArray[i] > tempMax)
tempMax = myArray[i];
tempSum += myArray[i];
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
}
}

Do not reinvent the wheel. Min, Max, Sum and similar operations are aggregations. Since .NET 3.5 there are handy LINQ extension methods that already provide the solution:
using System.Linq;
var sequence = Enumerable.Range(0, 10).Select(s => (uint)s).ToList();
Console.WriteLine(sequence.Sum(s => (double)s));
Console.WriteLine(sequence.Max());
Console.WriteLine(sequence.Min());
Though they are declared as extensions for IEnumerable, they have some internal optimizations for IList and array types, so you should measure how your code performs on those types as well as on plain IEnumerable sequences.
In your case this isn't enough, as you clearly do not want to iterate over the array more than once, so this is where the magic comes in: PLINQ (a.k.a. Parallel LINQ). You only need to add one method call to aggregate your array in parallel:
var sequence = Enumerable.Range(0, 10000000).Select(s => (uint)s).AsParallel();
Console.WriteLine(sequence.Sum(s => (double)s));
Console.WriteLine(sequence.Max());
Console.WriteLine(sequence.Min());
This option adds some overhead for synchronizing the items, but it scales well, giving similar timings for both small and big sequences. From MSDN:
PLINQ is usually the recommended approach whenever you need to apply the parallel aggregation pattern to .NET applications. Its declarative nature makes it less prone to error than other approaches, and its performance on multicore computers is competitive with them.
Implementing parallel aggregation with PLINQ doesn't require adding locks in your code. Instead, all the synchronization occurs internally, within PLINQ.
However, if you still want to investigate the performance of different kinds of operations, you can use the Parallel.For and Parallel.ForEach overloads that support aggregation, something like this:
double[] sequence = ...
object lockObject = new object();
double sum = 0.0d;
Parallel.ForEach(
// The values to be aggregated
sequence,
// The local initial partial result
() => 0.0d,
// The loop body
(x, loopState, partialResult) =>
{
return Normalize(x) + partialResult;
},
// The final step of each local context
(localPartialSum) =>
{
// Enforce serial access to single, shared result
lock (lockObject)
{
sum += localPartialSum;
}
}
);
return sum;
If you need additional partition for your data, you can use a Partitioner for the methods:
var rangePartitioner = Partitioner.Create(0, sequence.Length);
Parallel.ForEach(
// The input intervals
rangePartitioner,
// same code here);
Also Aggregate method can be used for the PLINQ, with some merge logic
(illustration from MSDN again):
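The original illustration is not reproduced here. As a rough sketch of the idea, assuming a made-up input array, the ParallelEnumerable.Aggregate overload that takes a seed factory, an update function, a combine function and a result selector can compute min, max and sum in one parallel pass:

```csharp
using System;
using System.Linq;

int[] values = Enumerable.Range(1, 1_000_000).ToArray();   // made-up input

var stats = values.AsParallel().Aggregate(
    // Seed factory: a fresh accumulator for each partition.
    () => (Min: int.MaxValue, Max: int.MinValue, Sum: 0L),
    // Fold one element into the partition's accumulator.
    (acc, x) => (Math.Min(acc.Min, x), Math.Max(acc.Max, x), acc.Sum + x),
    // Merge logic: combine the accumulators of two partitions.
    (a, b) => (Math.Min(a.Min, b.Min), Math.Max(a.Max, b.Max), a.Sum + b.Sum),
    // Final projection of the merged accumulator.
    acc => acc);

Console.WriteLine($"min: {stats.Min} max: {stats.Max} sum: {stats.Sum}");
```

No lock is needed here because each partition works on its own accumulator and PLINQ performs the merge itself.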
Useful links:
Parallel Aggregation
Enumerable.Min<TSource>(IEnumerable<TSource>) method
Enumerable.Sum method
Enumerable.Max<TSource> (IEnumerable<TSource>) method

Several Tasks manipulating on same Object

So was I just doing some experiments with Task class in c# and the following thing happens.
Here is the method I call
static async Task<List<int>> GenerateList(long size, int numOfTasks)
{
var nums = new List<int>();
Task[] tasks = new Task[numOfTasks];
for (int i = 0; i < numOfTasks; i++)
{
tasks[i] = Task.Run(() => nums.Add(Rand.Nex())); // Rand is a ThreadLocal<Random>
}
for (long i = 0; i < size; i += numOfTasks)
{
await Task.WhenAll(tasks);
}
return nums;
}
I call this method like this
var nums = GenerateList(100000000, 10).Result;
Before I used tasks, generation took about 4-5 seconds. After implementing the method this way, if I pass 10-20 as the number of tasks, the generation time drops to 1.8-2.2 seconds, but the list returned by the method contains only numOfTasks elements, so in this case a list of ten numbers is returned. Maybe I'm writing something wrong. What can be the problem here? Or maybe there is another solution. All I want is for many tasks to add numbers to the same list so that generation is at least twice as fast. Thanks in advance.

WhenAll does not run the tasks; it just (asynchronously) waits for them to complete. Your code only creates 10 tasks, which is why you only get 10 numbers. Also, as @Mauro pointed out, List<T>.Add is not thread-safe.
If you want to do parallel computation, then use Parallel or Parallel LINQ, not async:
static List<int> GenerateList(int size, int numOfTasks)
{
return Enumerable.Range(0, size)
.AsParallel()
.WithDegreeOfParallelism(numOfTasks)
.Select(_ => Rand.Value.Next())
.ToList();
}

As explained by Stephen, you are only creating 10 tasks.
Also, I believe the Add operation on the generic list is not thread-safe. You should use a locking mechanism or, if you are targeting .NET Framework 4 or newer, use the thread-safe collections.

You are adding to the list in the following loop, which runs only 10 times:
for (int i = 0; i < numOfTasks; i++)
{
tasks[i] = Task.Run(() => nums.Add(Rand.Nex())); // Rand is a ThreadLocal<Random>
}
you can instead do
for (int i = 0; i < numOfTasks; i++)
{
tasks[i] = new Task(() => nums.Add(Rand.Nex()));
}
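Keep in mind that a task created with the Task constructor does not run until Start() is called on it, and that either way each task still adds a single number. One alternative, sketched here under assumptions (GenerateArray is a hypothetical name, and the result is an array rather than a List<int>), is to allocate the result up front and let each worker fill its own index range, so no two threads ever write to the same slot:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper: fills a pre-sized array in parallel, one index range per worker.
static int[] GenerateArray(int size)
{
    var result = new int[size];
    if (size == 0) return result;              // Partitioner.Create rejects an empty range

    Parallel.ForEach(
        Partitioner.Create(0, size),
        // One Random per worker, since a shared Random instance is not thread-safe.
        () => new Random(Guid.NewGuid().GetHashCode()),
        (range, state, rng) =>
        {
            for (int i = range.Item1; i < range.Item2; i++)
                result[i] = rng.Next();        // each index is written by exactly one worker
            return rng;
        },
        _ => { });                             // nothing to merge at the end
    return result;
}

int[] nums = GenerateArray(1_000_000);
Console.WriteLine(nums.Length);
```

If a List<int> is really required, it can be built from the finished array afterwards with new List<int>(nums).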
