Why is await async so slow?

Why is await async so slow? - c#

I finally got VS2012 and got a simple demo up and working to check out the potential performance boost of async and await, but to my dismay it is slower! Its possible I'm doing something wrong, but maybe you can help me out. (I also added a simple Threaded solution, and that runs faster as expected)
My code uses a class to sum an array, based on the number of cores on your system (-1) Mine had 4 cores, so I saw about a 2x speed up (2.5 threads) for threading, but a 2x slow down for the same thing but with async/await.
Code: (Note you will need to added the reference to System.Management to get the core detector working)
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Threading;
using System.Management;
using System.Diagnostics;
namespace AsyncSum
{
class Program
{
static string Results = "";
static void Main(string[] args)
{
Task t = Run();
t.Wait();
Console.WriteLine(Results);
Console.ReadKey();
}
static async Task Run()
{
Random random = new Random();
int[] huge = new int[1000000];
for (int i = 0; i < huge.Length; i++)
{
huge[i] = random.Next(2);
}
ArraySum summer = new ArraySum(huge);
Stopwatch sw = new Stopwatch();
sw.Restart();
long tSum = summer.Sum();
for (int i = 0; i < 100; i++)
{
tSum = summer.Sum();
}
long tticks = sw.ElapsedTicks / 100;
long aSum = await summer.SumAsync();
sw.Restart();
for (int i = 0; i < 100; i++)
{
aSum = await summer.SumAsync();
}
long aticks = sw.ElapsedTicks / 100;
long dSum = summer.SumThreaded();
sw.Restart();
for (int i = 0; i < 100; i++)
{
dSum = summer.SumThreaded();
}
long dticks = sw.ElapsedTicks / 100;
long pSum = summer.SumParallel();
sw.Restart();
for (int i = 0; i < 100; i++)
{
pSum = summer.SumParallel();
}
long pticks = sw.ElapsedTicks / 100;
Program.Results += String.Format("Regular Sum: {0} in {1} ticks\n", tSum, tticks);
Program.Results += String.Format("Async Sum: {0} in {1} ticks\n", aSum, aticks);
Program.Results += String.Format("Threaded Sum: {0} in {1} ticks\n", dSum, dticks);
Program.Results += String.Format("Parallel Sum: {0} in {1} ticks\n", pSum, pticks);
}
}
class ArraySum
{
int[] Data;
int ChunkSize = 1000;
int cores = 1;
public ArraySum(int[] data)
{
Data = data;
cores = 0;
foreach (var item in new System.Management.ManagementObjectSearcher("Select * from Win32_Processor").Get())
{
cores += int.Parse(item["NumberOfCores"].ToString());
}
cores--;
if (cores < 1) cores = 1;
ChunkSize = Data.Length / cores + 1;
}
public long Sum()
{
long sum = 0;
for (int i = 0; i < Data.Length; i++)
{
sum += Data[i];
}
return sum;
}
public async Task<long> SumAsync()
{
Task<long>[] psums = new Task<long>[cores];
for (int i = 0; i < psums.Length; i++)
{
int start = i * ChunkSize;
int end = start + ChunkSize;
psums[i] = Task.Run<long>(() =>
{
long asum = 0;
for (int a = start; a < end && a < Data.Length; a++)
{
asum += Data[a];
}
return asum;
});
}
long sum = 0;
for (int i = 0; i < psums.Length; i++)
{
sum += await psums[i];
}
return sum;
}
public long SumThreaded()
{
long sum = 0;
Thread[] threads = new Thread[cores];
long[] buckets = new long[cores];
for (int i = 0; i < cores; i++)
{
int start = i * ChunkSize;
int end = start + ChunkSize;
int bucket = i;
threads[i] = new Thread(new ThreadStart(() =>
{
long asum = 0;
for (int a = start; a < end && a < Data.Length; a++)
{
asum += Data[a];
}
buckets[bucket] = asum;
}));
threads[i].Start();
}
for (int i = 0; i < cores; i++)
{
threads[i].Join();
sum += buckets[i];
}
return sum;
}
public long SumParallel()
{
long sum = 0;
long[] buckets = new long[cores];
ParallelLoopResult lr = Parallel.For(0, cores, new Action<int>((i) =>
{
int start = i * ChunkSize;
int end = start + ChunkSize;
int bucket = i;
long asum = 0;
for (int a = start; a < end && a < Data.Length; a++)
{
asum += Data[a];
}
buckets[bucket] = asum;
}));
for (int i = 0; i < cores; i++)
{
sum += buckets[i];
}
return sum;
}
}
}
Any thoughts? Am I doing async/await wrong? I'll be happy to try any suggestions.

It's important to separate "asynchrony" from "parallelization". await is there to help make writing asynchronous code easier. Code that runs in parallel may (or may not) involve asynchrony, and code that is asynchronous may or may not run in parallel.
Nothing about await is designed to make parallel code faster. The purpose of await is to make writing asynchronous code easier, while minimizing the negative performance implications. Using await won't ever be faster than correctly written non-await asynchronous code (although because writing correct code with await is easier, it will sometimes be faster because the programmer isn't capable of writing that asynchronous code correctly without await, or isn't willing to put the time in to do so. If the non-async code is written well it will perform about as well, if not a tad better, than the await code.
C# does have support specifically for parallelization, it's just not specifically though await. The Task Parallel Library (TPL) as well as Parallel LINQ (PLINQ) have several very effective means of parallelizing code that is generally more efficient than naive threaded implementations.
In your case, an effective implementation using PLINQ might be something like this:
public static int Sum(int[] array)
{
return array.AsParallel().Sum();
}
Note that this will take care of efficiently partitioning the input sequence into chunks that will be run in parallel; it will take care of determining the appropriate size of chunks, and the number of concurrent workers, and it will appropriately aggregate the results of those workers in a manor that is both properly synchronized to ensure a correct result (unlike your threaded example) and efficient (meaning that it won't completely serialize all aggregation).

async isn't intended for heavy-duty parallel computation. You can do basic parallel work using Task.Run with Task.WhenAll, but any serious parallel work should be done using the task parallel library (e.g., Parallel). Asynchronous code on the client side is about responsiveness, not parallel processing.
A common approach is to use Parallel for the parallel work, and then wrap it in a Task.Run and use await on it to keep the UI responsive.

Your benchmark has a couple of flaws:
You are timing the first run which includes initialization time (loading class Task, JIT-compilation etc.)
You are using DateTime.Now, which is too inaccurate for timings in the millisecond range. You'll need to use StopWatch
With these two issues fixed; I get the following benchmark results:
Regular Sum: 499946 in 00:00:00.0047378
Async Sum: 499946 in 00:00:00.0016994
Threaded Sum: 499946 in 00:00:00.0026898
Async now comes out as the fastest solution, taking less than 2ms.
This is the next problem: timing something as fast as 2ms is extremely unreliable; your thread can get paused for longer than that if some other process is using the CPU in the background. You should average the results over several thousands of benchmark runs.
Also, what's going on with your number of core detection? My quad-core is using a chunk size of 333334 which allows only 3 threads to run.

On a quick look, the results are expected: your async sum is using just one thread, while you asynchronously wait for it to finish, so it's slower than the multi-threaded sum.
You'd use async in case you have something else to finish while it's doing its job. So, this wouldn't be the right test for any speed/response improvements.

Related

Factorial using Task

I am looking for a way to use task to compute factorial of a number. My purpose is to compare the result with factorial using a sequential loop. For example
16! task1 = 16*15*14*13*12*11 and task2 = 10*9*8*7*6 and task3 = 5*4*3*2*1
I have searched online but I cannot find a solution that matched my need. Thanks in anticipation.
static long factorialmethod(int number)
{
long factorial;
factorial = number;
if (number <= 1) { return 1; }
else
for (int i = number-1; i >= number; i--)
{
factorial *= i;
}
return factorial;
}
static void Main(string[] args)
{
int number;
Console.WriteLine("Please input your whole number");
number = int.Parse(Console.ReadLine());
Console.WriteLine("\nFactorial of the number is {0}",factorialmethod(number));
Console.ReadKey()
}

An easiest way to try parallel when computing factorial is PLinq (Parallel Linq):
using System.Linq;
...
static long factorialmethod(int number) {
if (number <= 1)
return 1; // strictly speaking, factorial on negative (-N)! = infinity
return Enumerable
.Range(1, number)
.AsParallel() // comment it out if you want sequential version
.Aggregate(1L, (s, a) => s * a);
}
Use Stopwatch to benchmark; comment out .AsParallel(): do you really want parallel implementation (let Task alone)?

You can use #DmitryBychenko answer if you want to go PLINQ way. But as an alternative, you can directly use the Parallel library too. If you are literally trying to solve your stated problem using Task library, I think PLINQ is a cleaner solution, but if you are using the posted question as a reduced version of some similar problem, where you need more control, then the below solution might help
static long Factorial(int number) {
if (number < 1)
return 1;
var results = new ConcurrentBag<long>();
Parallel.ForEach(Partitioner.Create(1, number + 1, 5 /*You can select your range size here. You can also derive it based on Environment.ProcessorCount, if you so wish */),
(range, loopState) => {
long product = 1;
for (int i = range.Item1; i < range.Item2; i++) {
product *= i;
}
results.Add(product);
});
long factorial = 1;
foreach (var item in results)
factorial *= item;
return factorial;
}

Matrix Multiplication with specific number of threads [duplicate]

This question already has answers here:
Thread parameters being changed
(2 answers)
Closed 6 years ago.
I am completely new in the field of multithreading. At the moment I am trying to implement a commandline program which is able to multiply two matrices of equal size. The main goal is that the user can enter a specific number of threads as a commandline argument and that the multiplication task is solved using exactly this number of threads.
My approach is based on the following java implementation which tries to solve a similar task: Java Implementation Matrix Multiplication Multi-Threading
My current state is the following one:
using System;
using System.Threading;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading.Tasks;
class Program
{
static int rows = 16;
static int columns = 16;
static int[] temp = new int[rows*columns];
static int[,] matrixA = new int[rows, columns];
static int[,] matrixB = new int[rows, columns];
static int[,] result = new int[rows, columns];
static Thread[] threadPool;
static void runMultiplication(int index){
for(int i = 0; i < rows; i++){
for(int j = 0; j < columns; j++){
Console.WriteLine();
result[index, i] += matrixA[index, j] * matrixB[j, i];
}
}
}
static void fillMatrix(){
for (int i = 0; i < matrixA.GetLength(0); i++) {
for (int j = 0; j < matrixA.GetLength(1); j++) {
matrixA[i, j] = 1;
matrixB[i, j] = 2;
}
}
}
static void multiplyMatrices(){
threadPool = new Thread[rows];
for(int i = 0; i < rows; i++){
threadPool[i] = new Thread(() => runMultiplication(i));
threadPool[i].Start();
}
for(int i = 0; i < rows; i++){
try{
threadPool[i].Join();
}catch (Exception e){
Console.WriteLine(e.Message);
}
}
}
static void printMatrix(int[,] matrix) {
for (int i = 0; i < rows; i++)
{
for (int j = 0; j < columns; j++)
{
Console.Write(string.Format("{0} ", matrix[i, j]));
}
Console.Write(Environment.NewLine + Environment.NewLine);
}
}
static void Main(String[] args){
fillMatrix();
multiplyMatrices();
printMatrix(result);
}
}
At the moment I have two problems:
My result matrix contains values which are far from a valid result
I do not know if I am on the right way according to my goal that a user can specify how many threads should be used.
I would be very grateful if anyone could guide me to a solution.
PS: I know there are existing posts which are similar to mine, but the main challenge in my post is to allow the user to set the number of threads which will then solve the matrix multiplication.

Linear algebra is a difficult place to begin with threads if you're new. I'd recommend learning about map/reduce first and implementing that in C#.
Imagine if you have just one core and you wanted to perform a long calculation. Multiple threads are scheduled by the operating system so that one does some work, then the next is giving a turn, etc. It's easy to do the thought experiment and figure out that context switching will make the problem go slower than the single threaded version. There's no true parallelization there.
The problem is that most linear algebra operations are not easily parallelizable. They aren't independent of each other. More threads than cores will not improve the situation and may make performance worse.
The best you can do is one thread per core and partitioning the matrix like this.
Here's a thought: Before you worry about multithreading, take your Matrix class and make sure that every single operation works properly with a single thread. There's no sense in worrying about multithreading if your code doesn't produce the right answers for a single thread. Get that working, then figure out how to partition the problem among multiple threads.

Weird behavior of multithread random numbers generator

Please check below code, this code try to compute birthday conflict possibility. To my surprise, if i execute those code with sequence, the result is expected around 0.44; but if try on PLinq, the result is 0.99.
Anyone can explain the result?
public static void BirthdayConflict(int num = 5, int people = 300) {
int N = 100000;
int act = 0;
Random r = new Random();
Action<int> action = (a) => {
List<int> p = new List<int>();
for (int i = 0; i < people; i++)
{
p.Add(r.Next(364) + 1);
}
p.Sort();
bool b = false;
for (int i = 0; i < 300; i++)
{
if (i + num -1 >= people) break;
if (p[i] == p[i + num -1])
b = true;
}
if (b)
Interlocked.Increment(ref act);
// act++;
};
// Result is around 0.99 - which is not OK
// Parallel.For( 0, N, action);
//Result is around 0.44 - which is OK
for (int i = 0; i < N; i++)
{
action(0);
}
Console.WriteLine(act / 100000.0);
Console.ReadLine();
}

You're using a shared (between threads) instance System.Random. It's not thread-safe then you're getting wrong results (well actually it just doesn't work and it'll return 0). From MSDN:
If your app calls Random methods from multiple threads, you must use a synchronization object to ensure that only one thread can access the random number generator at a time. If you don't ensure that the Random object is accessed in a thread-safe way, calls to methods that return random numbers return 0.
Simple (but not so efficient for parallel execution) solution is to use a lock:
lock (r)
{
for (int i = 0; i < people; i++)
{
p.Add(r.Next(364) + 1);
}
}
To improve performance (but you should measure) you may use multiple instances of System.Random, be careful to initialize each one with a different seed.

I find a useful explanation why random does not work under multi-thread, although it was original for Java, still can be benefitical.

Precise alternative to Thread.Sleep

I have a method Limit() which counts a bandwidth passed thought some channel in certain time and limits by using Thread.Sleep() it (if bandwidth limit is reached).
Method itself produces proper ( in my opinion results ) but Thread.Sleep doesn't ( due to multithreaded CPU usage ) because i have proper "millisecondsToWait" but speed check afterwards is far from limitation i've passed.
Is there a way to make limitation more precise ?
Limiter Class
private readonly int m_maxSpeedInKbps;
public Limiter(int maxSpeedInKbps)
{
m_maxSpeedInKbps = maxSpeedInKbps;
}
public int Limit(DateTime startOfCycleDateTime, long writtenInBytes)
{
if (m_maxSpeedInKbps > 0)
{
double totalMilliseconds = DateTime.Now.Subtract(startOfCycleDateTime).TotalMilliseconds;
int currentSpeedInKbps = (int)((writtenInBytes / totalMilliseconds));
if (currentSpeedInKbps - m_maxSpeedInKbps > 0)
{
double delta = (double)currentSpeedInKbps / m_maxSpeedInKbps;
int millisecondsToWait = (int)((totalMilliseconds * delta) - totalMilliseconds);
if (millisecondsToWait > 0)
{
Thread.Sleep(millisecondsToWait);
return millisecondsToWait;
}
}
}
return 0;
}
Test Class which always fails in large delta
[TestMethod]
public void ATest()
{
List<File> files = new List<File>();
for (int i = 0; i < 1; i++)
{
files.Add(new File(i + 1, 100));
}
const int maxSpeedInKbps = 1024; // 1MBps
Limiter limiter = new Limiter(maxSpeedInKbps);
DateTime startDateTime = DateTime.Now;
Parallel.ForEach(files, new ParallelOptions {MaxDegreeOfParallelism = 5}, file =>
{
DateTime currentFileStartTime = DateTime.Now;
Thread.Sleep(5);
limiter.Limit(currentFileStartTime, file.Blocks * Block.Size);
});
long roundOfWriteInKB = (files.Sum(i => i.Blocks.Count) * Block.Size) / 1024;
int currentSpeedInKbps = (int) (roundOfWriteInKB/DateTime.Now.Subtract(startDateTime).TotalMilliseconds*1000);
Assert.AreEqual(maxSpeedInKbps, currentSpeedInKbps, string.Format("maxSpeedInKbps {0} currentSpeedInKbps {1}", maxSpeedInKbps, currentSpeedInKbps));
}

I used to use Thread.Sleep a lot until I discovered waithandles. Using waithandles you can suspend threads, which will come alive again when the waithandle is triggered from elsewhere, or when a time threshold is reached. Perhaps it's possible to re-engineer your limit methodology to use waithandles in some way, because in a lot of situations they are indeed much more precise than Thread.Sleep?

You can do it fairly accurately using a busy wait, but I wouldn't recommend it. You should use one of the multimedia timers to wait instead.
However, this method will wait fairly accurately:
void accurateWait(int millisecs)
{
var sw = Stopwatch.StartNew();
if (millisecs >= 100)
Thread.Sleep(millisecs - 50);
while (sw.ElapsedMilliseconds < millisecs)
;
}
But it is a busy wait and will consume CPU cycles terribly. Also it could be affected by garbage collections or task rescheduling.
Here's the test program:
using System;
using System.Diagnostics;
using System.Collections.Generic;
using System.Threading;
namespace Demo
{
class Program
{
void run()
{
for (int i = 1; i < 10; ++i)
test(i);
for (int i = 10; i < 100; i += 5)
test(i);
for (int i = 100; i < 200; i += 10)
test(i);
for (int i = 200; i < 500; i += 20)
test(i);
}
void test(int millisecs)
{
var sw = Stopwatch.StartNew();
accurateWait(millisecs);
Console.WriteLine("Requested wait = " + millisecs + ", actual wait = " + sw.ElapsedMilliseconds);
}
void accurateWait(int millisecs)
{
var sw = Stopwatch.StartNew();
if (millisecs >= 100)
Thread.Sleep(millisecs - 50);
while (sw.ElapsedMilliseconds < millisecs)
;
}
static void Main()
{
new Program().run();
}
}
}

Why do these tasks execute sequentially?

Why does the following code execute sequentially?
List<Task> tasks = new List<Task>();
for (int i = 0; i <= max; i += block)
{
if (i + block >= max)
tasks.Add(Task.Factory.StartNew(() => Count(ref counter, block)));
else
block = max - i;
}
Task.WaitAll(tasks.ToArray());
I have also tested a version of this using Parallel.Invoke; it, too, fails to execute in parallel. There's bound to be something I'm not understanding, but when I try Googling this, I mostly get instructions on how to force sequential execution.
As a response to one of the caveats given in an answer below, I have included the following method for reference:
static void Count(ref int counter, int num)
{
int localCounter = 0;
for (int i = 0; i < num; i++)
if (Coin()) localCounter++;
System.Threading.Interlocked.Add(ref counter, localCounter);
}
Edited again: Thank you all!

Just replace tasks.Add(Task.Factory.StartNew(() => Count(ref counter, block))); with Console.WriteLine and debug your code.
You never create more than one task.
for (int i = 0; i <= max; i += block)
{
if (i + block >= max)
Console.WriteLine(i);
else
block = max - i;
}

why does the following code execute sequentially?
It doesn't, unless you have something within the Count method that's synchronizing access to a single resource. If it can be parallelized, then this will run in parallel.
If Count executes very quickly then you'll find that the tasks are finished faster than new tasks get scheduled, which is why they may all execute in order.

Seems to me there is something wrong with your loop/if statement:
for (int i = 0; i <= max; i += block)
{
if (i + block >= max)
tasks.Add(Task.Factory.StartNew(() => Count(ref counter, block)));
else
block = max - i;
}
If I am reading it right, you will only add a task if i + block >= max and you will only loop if i + block <= max (showing both the counter increase and the condition check). As such you will only add a task once.
In addition, you are changing block when you are not adding a task. I expect you want something more like the following, though I can not be sure without more code:
for (int i = 0; i <= max; i += block)
{
tasks.Add(Task.Factory.StartNew(() => Count(ref counter, block)));
if (i + block >= max) { block = max - i; }
}

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Why is await async so slow? - c#

Related

Factorial using Task

Matrix Multiplication with specific number of threads [duplicate]

Weird behavior of multithread random numbers generator

Precise alternative to Thread.Sleep

Why do these tasks execute sequentially?

Categories

Resources