I have a method Limit() that measures the bandwidth passed through a channel over a given time and throttles it with Thread.Sleep() when the bandwidth limit is reached.
The method itself produces correct results (in my opinion), but Thread.Sleep does not (due to multithreaded CPU usage): millisecondsToWait is computed correctly, yet the speed measured afterwards is far from the limit I passed in.
Is there a way to make the limiting more precise?
Limiter Class
private readonly int m_maxSpeedInKbps;
public Limiter(int maxSpeedInKbps)
{
m_maxSpeedInKbps = maxSpeedInKbps;
}
public int Limit(DateTime startOfCycleDateTime, long writtenInBytes)
{
if (m_maxSpeedInKbps > 0)
{
double totalMilliseconds = DateTime.Now.Subtract(startOfCycleDateTime).TotalMilliseconds;
int currentSpeedInKbps = (int)((writtenInBytes / totalMilliseconds));
if (currentSpeedInKbps - m_maxSpeedInKbps > 0)
{
double delta = (double)currentSpeedInKbps / m_maxSpeedInKbps;
int millisecondsToWait = (int)((totalMilliseconds * delta) - totalMilliseconds);
if (millisecondsToWait > 0)
{
Thread.Sleep(millisecondsToWait);
return millisecondsToWait;
}
}
}
return 0;
}
Test class, which always fails by a large delta:
[TestMethod]
public void ATest()
{
List<File> files = new List<File>();
for (int i = 0; i < 1; i++)
{
files.Add(new File(i + 1, 100));
}
const int maxSpeedInKbps = 1024; // 1MBps
Limiter limiter = new Limiter(maxSpeedInKbps);
DateTime startDateTime = DateTime.Now;
Parallel.ForEach(files, new ParallelOptions {MaxDegreeOfParallelism = 5}, file =>
{
DateTime currentFileStartTime = DateTime.Now;
Thread.Sleep(5);
limiter.Limit(currentFileStartTime, file.Blocks * Block.Size);
});
long roundOfWriteInKB = (files.Sum(i => i.Blocks.Count) * Block.Size) / 1024;
int currentSpeedInKbps = (int) (roundOfWriteInKB/DateTime.Now.Subtract(startDateTime).TotalMilliseconds*1000);
Assert.AreEqual(maxSpeedInKbps, currentSpeedInKbps, string.Format("maxSpeedInKbps {0} currentSpeedInKbps {1}", maxSpeedInKbps, currentSpeedInKbps));
}
I used to use Thread.Sleep a lot until I discovered wait handles. With a wait handle you can suspend a thread until the handle is signaled from elsewhere or a timeout elapses. Perhaps you could re-engineer your limiting approach to use wait handles, because in a lot of situations they are indeed much more precise than Thread.Sleep.
You can do it fairly accurately using a busy wait, but I wouldn't recommend it. You should use one of the multimedia timers to wait instead.
However, this method will wait fairly accurately:
void accurateWait(int millisecs)
{
var sw = Stopwatch.StartNew();
if (millisecs >= 100)
Thread.Sleep(millisecs - 50);
while (sw.ElapsedMilliseconds < millisecs)
;
}
But it is a busy wait and will consume CPU cycles terribly. Also it could be affected by garbage collections or task rescheduling.
Here's the test program:
using System;
using System.Diagnostics;
using System.Collections.Generic;
using System.Threading;
namespace Demo
{
class Program
{
void run()
{
for (int i = 1; i < 10; ++i)
test(i);
for (int i = 10; i < 100; i += 5)
test(i);
for (int i = 100; i < 200; i += 10)
test(i);
for (int i = 200; i < 500; i += 20)
test(i);
}
void test(int millisecs)
{
var sw = Stopwatch.StartNew();
accurateWait(millisecs);
Console.WriteLine("Requested wait = " + millisecs + ", actual wait = " + sw.ElapsedMilliseconds);
}
void accurateWait(int millisecs)
{
var sw = Stopwatch.StartNew();
if (millisecs >= 100)
Thread.Sleep(millisecs - 50);
while (sw.ElapsedMilliseconds < millisecs)
;
}
static void Main()
{
new Program().run();
}
}
}
I'm trying to write code to find prime numbers within a given range. Unfortunately, I'm running into problems with too many repetitions, which gives me a StackOverflowException after prime number 30000. I have tried using a foreach and also not using a list (processing each number as it comes), but nothing seems to solve the problem at hand.
How can I make this program run forever without causing a stack overflow?
class Program
{
static void Main(string[] args)
{
Stopwatch stopwatch = new Stopwatch();
stopwatch.Start();
List<double> Primes = new List<double>();
const double Start = 0;
const double End = 100000;
double counter = 0;
int lastInt = 0;
for (int i = 0; i < End; i++)
Primes.Add(i);
for (int i =0;i< Primes.Count;i++)
{
lastInt = (int)Primes[i] - RoundOff((int)Primes[i]);
Primes[i] = (int)CheckForPrime(Primes[i], Math.Round(Primes[i] / 2));
if (Primes[i] != 0)
{
Console.Write(", {0}", Primes[i]);
counter++;
}
}
stopwatch.Stop();
Console.WriteLine("\n\nNumber of prime-numbers between {0} and {1} is: {2}, time it took to calc this: {3} (millisecounds).\n\n" +
" The End\n", Start, End, counter, stopwatch.ElapsedMilliseconds);
}
public static double CheckForPrime(double Prim, double Devider)
{
if (Prim / Devider == Math.Round(Prim / Devider))
return 0;
else if (Devider > 2)
return CheckForPrime(Prim, Devider - 1);
else
return Prim;
}
public static int RoundOff(int i)
{
return ((int)Math.Floor(i / 10.0)) * 10;
}
}
I need to calculate Pi with the Monte Carlo method using the Task Parallel Library, but my parallel program takes much longer to calculate Pi than its non-parallel counterpart. How do I fix it? The parallel calculating class and its non-parallel counterpart are below:
class CalcPiTPL
{
Object randLock = new object();
int n;
int N_0;
double aPi;
public StringBuilder Msg; // diagnostic message
double x, y;
Stopwatch stopWatch = new Stopwatch();
public void Init(int aN)
{
stopWatch.Start();
n = aN; // save total calculate-iterations amount
aPi = -1; // flag, if no any calculate-iteration has been completed
Msg = new StringBuilder("No any calculate-iteration has been completed");
}
public void Run()
{
if (n < 1)
{
Msg = new StringBuilder("Invalid N-value");
return;
}
Random rnd = new Random(); // to create randomizer
Task[] tasks = new Task[4];
tasks[0] = Task.Factory.StartNew(() => PointGenerator(n, rnd));
tasks[1] = Task.Factory.StartNew(() => PointGenerator(n, rnd));
tasks[2] = Task.Factory.StartNew(() => PointGenerator(n, rnd));
tasks[3] = Task.Factory.StartNew(() => PointGenerator(n, rnd));
Task.WaitAll(tasks[0], tasks[1], tasks[2], tasks[3]);
aPi = 4.0 * ((double)N_0 / (double)n); // to calculate approximate Pi - value
stopWatch.Stop();
TimeSpan ts = stopWatch.Elapsed;
string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}",
ts.Hours, ts.Minutes, ts.Seconds,
ts.Milliseconds / 10);
Console.WriteLine("RunTime " + elapsedTime);
}
public double Done()
{
if (aPi > 0)
{
Msg = new StringBuilder("Calculates has been completed successful");
return aPi; // return gotten value
}
else
{
return 0; // no result
}
}
public void PointGenerator(int n, Random rnd)
{
for (int i = 1; i <= n / 4; i++)
{
lock (randLock)
{
x = rnd.NextDouble(); // to generate coordinates
y = rnd.NextDouble(); //
if (((x - 0.5) * (x - 0.5) + (y - 0.5) * (y - 0.5)) < 0.25)
{
//Interlocked.Increment(ref N_0);
N_0++; // coordinate in a circle! mark it by incrementing N_0
}
}
}
}
}
Non-parallel counterpart:
class TCalcPi//unparallel calculating method
{
int N;
int N_0;
double aPi;
public StringBuilder Msg; // diagnostic message
double x, y;
Stopwatch stopWatch = new Stopwatch();
public void Init(int aN)
{
stopWatch.Start();
N = aN; // save total calculate-iterations amount
aPi = -1; // flag, if no any calculate-iteration has been completed
Msg = new StringBuilder("No any calculate-iteration has been completed");
}
public void Run()
{
if (N < 1)
{
Msg = new StringBuilder("Invalid N - value");
return;
}
int i;
Random rnd = new Random(); // to create randomizer
for (i = 1; i <= N; i++)
{
x = rnd.NextDouble(); // to generate coordinates
y = rnd.NextDouble(); //
if (((x - 0.5) * (x - 0.5) + (y - 0.5) * (y - 0.5)) < 0.25)
{
N_0++; // coordinate in a circle! mark it by incrementing N_0
}
}
aPi = 4.0 * ((double)N_0 / (double)N); // to calculate approximate Pi - value
stopWatch.Stop();
TimeSpan ts = stopWatch.Elapsed;
string elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}",
ts.Hours, ts.Minutes, ts.Seconds,
ts.Milliseconds / 10);
Console.WriteLine("RunTime " + elapsedTime);
}
public double Done()
{
if (aPi > 0)
{
Msg = new StringBuilder("Calculates has been completed successful");
return aPi; // return gotten value
}
else
{
return 0; // no result
}
}
}
You have written PointGenerator in a way that can barely benefit from being executed in parallel:
The lock means it will have basically single-threaded performance, with additional threading overhead.
The global state N_0 means you have to synchronize access. Granted, since it's just an int, you can use the Interlocked class to increment it efficiently.
What I would do is let each PointGenerator have its own Random object and its own counter. Then there won't be any shared mutable state that could cause problems. Be careful, though: the default constructor of Random seeds from the system tick count, so creating several objects in quick succession might give you generators with the same seed.
Once all the PointGenerator calls finish, you combine the results.
This is very similar to what some of the TPL overloads of Parallel.For and Parallel.ForEach do.
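A minimal sketch of that idea, written in Java purely for illustration. The names MonteCarloPi, countHits and estimate are hypothetical, and the fixed per-worker seeds are an assumption made so that runs are repeatable. Each worker owns its random generator and its counter, so no lock is needed, and the partial counts are summed once every worker has finished.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SplittableRandom;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class MonteCarloPi {
    // Counts random points that fall inside the circle of radius 0.5 centred at (0.5, 0.5).
    static long countHits(long samples, long seed) {
        SplittableRandom rnd = new SplittableRandom(seed); // private to this worker
        long hits = 0;                                      // private counter, no lock needed
        for (long i = 0; i < samples; i++) {
            double x = rnd.nextDouble() - 0.5;
            double y = rnd.nextDouble() - 0.5;
            if (x * x + y * y < 0.25) {
                hits++;
            }
        }
        return hits;
    }

    // Splits the samples across the workers and combines the partial counts at the end.
    static double estimate(long totalSamples, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            final long perWorker = totalSamples / workers;
            List<Future<Long>> parts = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final long seed = 12345L + w; // assumption: distinct fixed seed per worker
                Callable<Long> task = () -> countHits(perWorker, seed);
                parts.add(pool.submit(task));
            }
            long hits = 0;
            for (Future<Long> part : parts) {
                hits += part.get(); // blocks until that worker is done
            }
            return 4.0 * hits / (perWorker * workers);
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(estimate(4_000_000, 4));
    }
}
```

The same shape carries over to C#: give each task its own Random with a distinct seed, have it return its local count, and add the counts together after the tasks complete.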
I know this post is old, but it still shows up when searching for how to compute pi in parallel in C#. I have modified the code to use the system's processor count for the number of workers. The lock is not needed if the workers return their result, some of the other variables move into the worker function, and yet another task puts everything together at the end. This version uses long to allow a larger iteration count. The instances of Random are created with the thread ID as the seed, which I hope makes them produce different sequences of random numbers. I removed the Init method and put the initialization in the Run method instead. There are now two ways of using this, blocking and non-blocking. But first, here is the class:
public class CalcPiTPL
{
private long n;
private double pi;
private Stopwatch stopWatch = new Stopwatch();
private Task<int>[]? tasks = null;
private Task? taskOrchestrator = null;
private ManualResetEvent rst = new ManualResetEvent(false);
private bool isDone = false;
public string elapsedTime = string.Empty;
public double Pi { get { return pi; } }
public void Run(long n)
{
if (n < 1 || taskOrchestrator!=null) return;
isDone = false;
rst.Reset();
stopWatch.Start();
this.n = n; // save total calculate-iterations amount
pi = -1; // flag, if no any calculate-iteration has been completed
tasks = new Task<int>[Environment.ProcessorCount];
for(int i = 0; i < Environment.ProcessorCount; i++)
{
tasks[i] = Task.Factory.StartNew(() => PointGenerator(n));
}
taskOrchestrator = Task.Factory.StartNew(() => Orchestrator());
}
private void Orchestrator()
{
Task.WaitAll(tasks);
long N_0 = 0;
foreach (var task in tasks)
{
N_0 += task.GetAwaiter().GetResult();
}
pi = 4.0 * ((double)N_0 / (double)n); // to calculate approximate Pi - value
stopWatch.Stop();
TimeSpan ts = stopWatch.Elapsed;
elapsedTime = String.Format("{0:00}:{1:00}:{2:00}.{3:00}", ts.Hours, ts.Minutes, ts.Seconds, ts.Milliseconds / 10);
tasks = null;
taskOrchestrator = null;
isDone = true;
rst.Set();
}
public double Wait()
{
rst.WaitOne();
return pi;
}
public bool IsDone()
{
return isDone;
}
private int PointGenerator(long n)
{
int N_0 = 0;
Random rnd = new Random(Thread.CurrentThread.ManagedThreadId);
for (int i = 1; i <= n / Environment.ProcessorCount; i++)
{
double x = rnd.NextDouble(); // to generate coordinates
double y = rnd.NextDouble(); //
if (((x - 0.5) * (x - 0.5) + (y - 0.5) * (y - 0.5)) < 0.25)
{
N_0++;
}
}
return N_0;
}
}
Blocking call:
CalcPiTPL pi = new CalcPiTPL();
pi.Run(1000000000);
Console.WriteLine(pi.Wait());
Non-blocking call:
CalcPiTPL pi = new CalcPiTPL();
pi.Run(100000000);
while (pi.IsDone()==false)
{
Thread.Sleep(100);
// Do something else
}
Console.WriteLine(pi.Pi);
Adding an event would probably be nice if someone wants to use this in a GUI application. Maybe I will do that later.
Feel free to correct me if I messed something up.
When your whole parallel part is inside a lock scope, nothing is actually parallel. Only a single thread can be inside a lock scope at any given moment.
You can simply use different Random instances instead of a single one.
I am trying to compare performance between parallel streams in Java 8 and PLINQ (C#/.Net 4.5.1).
Here are the results I get on my machine (System Manufacturer: Dell Inc.; System Model: Precision M4700; Processor: Intel(R) Core(TM) i7-3740QM CPU @ 2.70GHz, 2701 MHz, 4 Core(s), 8 Logical Processor(s); Installed Physical Memory (RAM): 16.0 GB; OS Name: Microsoft Windows 7 Enterprise; Version: 6.1.7601 Service Pack 1 Build 7601):
C# .Net 4.5.1 (X64-release)
Serial:
470.7784, 491.4226, 502.4643, 481.7507, 464.1156, 463.0088, 546.149, 481.2942, 502.414, 483.1166
Average: 490.6373
Parallel:
158.6935, 133.4113, 217.4304, 182.3404, 184.188, 128.5767, 160.352, 277.2829, 127.6818, 213.6832
Average: 180.5496
Java 8 (X64)
Serial:
471.911822, 333.843924, 324.914299, 325.215631, 325.208402, 324.872828, 324.888046, 325.53066, 325.765791, 325.935861
Average:326.241715
Parallel:
212.09323, 73.969783, 68.015431, 66.246628, 66.15912, 66.185373, 80.120837, 75.813539, 70.085948, 66.360769
Average:70.3286
It looks like PLINQ does not scale across the CPU cores. I am wondering if I am missing something.
Here is the code for C#:
class Program
{
static void Main(string[] args)
{
var NUMBER_OF_RUNS = 10;
var size = 10000000;
var vals = new double[size];
var rnd = new Random();
for (int i = 0; i < size; i++)
{
vals[i] = rnd.NextDouble();
}
var avg = 0.0;
Console.WriteLine("Serial:");
for (int i = 0; i < NUMBER_OF_RUNS; i++)
{
var watch = Stopwatch.StartNew();
var res = vals.Select(v => Math.Sin(v)).ToArray();
var elapsed = watch.Elapsed.TotalMilliseconds;
Console.Write(elapsed + ", ");
if (i > 0)
avg += elapsed;
}
Console.Write("\nAverage: " + (avg / (NUMBER_OF_RUNS - 1)));
avg = 0.0;
Console.WriteLine("\n\nParallel:");
for (int i = 0; i < NUMBER_OF_RUNS; i++)
{
var watch = Stopwatch.StartNew();
var res = vals.AsParallel().Select(v => Math.Sin(v)).ToArray();
var elapsed = watch.Elapsed.TotalMilliseconds;
Console.Write(elapsed + ", ");
if (i > 0)
avg += elapsed;
}
Console.Write("\nAverage: " + (avg / (NUMBER_OF_RUNS - 1)));
}
}
Here is the code for Java:
import java.util.Arrays;
import java.util.Random;
import java.util.stream.DoubleStream;
public class Main {
private static final Random rand = new Random();
private static final int MIN = 1;
private static final int MAX = 140;
private static final int POPULATION_SIZE = 10_000_000;
public static final int NUMBER_OF_RUNS = 10;
public static void main(String[] args) throws InterruptedException {
Random rnd = new Random();
double[] vals1 = DoubleStream.generate(rnd::nextDouble).limit(POPULATION_SIZE).toArray();
double avg = 0.0;
System.out.println("Serial:");
for (int i = 0; i < NUMBER_OF_RUNS; i++)
{
long start = System.nanoTime();
double[] res = Arrays.stream(vals1).map(Math::sin).toArray();
double duration = (System.nanoTime() - start) / 1_000_000.0;
System.out.print(duration + ", " );
if (i > 0)
avg += duration;
}
System.out.println("\nAverage:" + (avg / (NUMBER_OF_RUNS - 1)));
avg = 0.0;
System.out.println("\n\nParallel:");
for (int i = 0; i < NUMBER_OF_RUNS; i++)
{
long start = System.nanoTime();
double[] res = Arrays.stream(vals1).parallel().map(Math::sin).toArray();
double duration = (System.nanoTime() - start) / 1_000_000.0;
System.out.print(duration + ", " );
if (i > 0)
avg += duration;
}
System.out.println("\nAverage:" + (avg / (NUMBER_OF_RUNS - 1)));
}
}
Both runtimes make a decision about how many threads to use in order to complete the parallel operation. That is a non-trivial task that can take many factors into account, including the degree to which the task is CPU bound, the estimated time to complete the task, etc.
Each runtime makes different decisions about how many threads to use to service the request. Neither decision is obviously right or wrong in terms of system-wide scheduling, but the Java strategy performs better on this benchmark (and leaves fewer CPU resources available for other tasks on the system).
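To see one input to that decision on the Java side, the sketch below prints the number of logical processors and the parallelism of the common fork/join pool, which parallel streams use by default in the standard JDK implementation. The class name ParallelismInfo is hypothetical. On the .NET side, PLINQ offers WithDegreeOfParallelism for overriding its own choice.

```java
import java.util.concurrent.ForkJoinPool;

final class ParallelismInfo {
    // Logical processors visible to the JVM.
    static int logicalProcessors() {
        return Runtime.getRuntime().availableProcessors();
    }

    // Target number of worker threads in the common pool; the thread that
    // starts a parallel stream also takes part in the work.
    static int commonPoolParallelism() {
        return ForkJoinPool.commonPool().getParallelism();
    }

    public static void main(String[] args) {
        System.out.println("Logical processors:      " + logicalProcessors());
        System.out.println("Common pool parallelism: " + commonPoolParallelism());
    }
}
```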
I finally got VS2012 and put together a simple demo to check out the potential performance boost of async and await, but to my dismay it is slower! It's possible I'm doing something wrong, but maybe you can help me out. (I also added a simple threaded solution, and that runs faster, as expected.)
My code uses a class to sum an array, splitting the work based on the number of cores on your system (minus one). My machine has 4 cores, so I saw about a 2x speedup (2.5 threads) for threading, but a 2x slowdown for the same thing with async/await.
Code (note that you will need to add a reference to System.Management to get the core detector working):
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using System.Threading;
using System.Management;
using System.Diagnostics;
namespace AsyncSum
{
class Program
{
static string Results = "";
static void Main(string[] args)
{
Task t = Run();
t.Wait();
Console.WriteLine(Results);
Console.ReadKey();
}
static async Task Run()
{
Random random = new Random();
int[] huge = new int[1000000];
for (int i = 0; i < huge.Length; i++)
{
huge[i] = random.Next(2);
}
ArraySum summer = new ArraySum(huge);
Stopwatch sw = new Stopwatch();
sw.Restart();
long tSum = summer.Sum();
for (int i = 0; i < 100; i++)
{
tSum = summer.Sum();
}
long tticks = sw.ElapsedTicks / 100;
long aSum = await summer.SumAsync();
sw.Restart();
for (int i = 0; i < 100; i++)
{
aSum = await summer.SumAsync();
}
long aticks = sw.ElapsedTicks / 100;
long dSum = summer.SumThreaded();
sw.Restart();
for (int i = 0; i < 100; i++)
{
dSum = summer.SumThreaded();
}
long dticks = sw.ElapsedTicks / 100;
long pSum = summer.SumParallel();
sw.Restart();
for (int i = 0; i < 100; i++)
{
pSum = summer.SumParallel();
}
long pticks = sw.ElapsedTicks / 100;
Program.Results += String.Format("Regular Sum: {0} in {1} ticks\n", tSum, tticks);
Program.Results += String.Format("Async Sum: {0} in {1} ticks\n", aSum, aticks);
Program.Results += String.Format("Threaded Sum: {0} in {1} ticks\n", dSum, dticks);
Program.Results += String.Format("Parallel Sum: {0} in {1} ticks\n", pSum, pticks);
}
}
class ArraySum
{
int[] Data;
int ChunkSize = 1000;
int cores = 1;
public ArraySum(int[] data)
{
Data = data;
cores = 0;
foreach (var item in new System.Management.ManagementObjectSearcher("Select * from Win32_Processor").Get())
{
cores += int.Parse(item["NumberOfCores"].ToString());
}
cores--;
if (cores < 1) cores = 1;
ChunkSize = Data.Length / cores + 1;
}
public long Sum()
{
long sum = 0;
for (int i = 0; i < Data.Length; i++)
{
sum += Data[i];
}
return sum;
}
public async Task<long> SumAsync()
{
Task<long>[] psums = new Task<long>[cores];
for (int i = 0; i < psums.Length; i++)
{
int start = i * ChunkSize;
int end = start + ChunkSize;
psums[i] = Task.Run<long>(() =>
{
long asum = 0;
for (int a = start; a < end && a < Data.Length; a++)
{
asum += Data[a];
}
return asum;
});
}
long sum = 0;
for (int i = 0; i < psums.Length; i++)
{
sum += await psums[i];
}
return sum;
}
public long SumThreaded()
{
long sum = 0;
Thread[] threads = new Thread[cores];
long[] buckets = new long[cores];
for (int i = 0; i < cores; i++)
{
int start = i * ChunkSize;
int end = start + ChunkSize;
int bucket = i;
threads[i] = new Thread(new ThreadStart(() =>
{
long asum = 0;
for (int a = start; a < end && a < Data.Length; a++)
{
asum += Data[a];
}
buckets[bucket] = asum;
}));
threads[i].Start();
}
for (int i = 0; i < cores; i++)
{
threads[i].Join();
sum += buckets[i];
}
return sum;
}
public long SumParallel()
{
long sum = 0;
long[] buckets = new long[cores];
ParallelLoopResult lr = Parallel.For(0, cores, new Action<int>((i) =>
{
int start = i * ChunkSize;
int end = start + ChunkSize;
int bucket = i;
long asum = 0;
for (int a = start; a < end && a < Data.Length; a++)
{
asum += Data[a];
}
buckets[bucket] = asum;
}));
for (int i = 0; i < cores; i++)
{
sum += buckets[i];
}
return sum;
}
}
}
Any thoughts? Am I doing async/await wrong? I'll be happy to try any suggestions.
It's important to separate "asynchrony" from "parallelization". await is there to help make writing asynchronous code easier. Code that runs in parallel may (or may not) involve asynchrony, and code that is asynchronous may or may not run in parallel.
Nothing about await is designed to make parallel code faster. The purpose of await is to make writing asynchronous code easier while minimizing the negative performance implications. Using await won't ever be faster than correctly written non-await asynchronous code (although, because writing correct code with await is easier, it will sometimes be faster in practice because the programmer isn't capable of writing that asynchronous code correctly without await, or isn't willing to put in the time to do so). If the non-async code is written well, it will perform about as well as the await code, if not a tad better.
C# does have support specifically for parallelization; it's just not through await. The Task Parallel Library (TPL) and Parallel LINQ (PLINQ) offer several very effective means of parallelizing code that are generally more efficient than naive threaded implementations.
In your case, an effective implementation using PLINQ might be something like this:
public static int Sum(int[] array)
{
return array.AsParallel().Sum();
}
Note that this takes care of efficiently partitioning the input sequence into chunks that are run in parallel. It determines the appropriate chunk size and the number of concurrent workers, and it aggregates the workers' results in a manner that is both properly synchronized to ensure a correct result (unlike your threaded example) and efficient (meaning that it won't completely serialize all aggregation).
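For comparison, here is a sketch of the same one-liner approach using Java parallel streams, where the library likewise handles partitioning and aggregation. The class name ParallelSum is hypothetical, and widening to long is an assumption made to avoid overflow on large inputs.

```java
import java.util.Arrays;

final class ParallelSum {
    // The stream library splits the array, sums the pieces on worker threads,
    // and merges the partial sums; no manual locking or chunking is needed.
    static long sum(int[] data) {
        return Arrays.stream(data).parallel().asLongStream().sum();
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        Arrays.fill(data, 1);
        System.out.println(sum(data)); // prints 1000000
    }
}
```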
async isn't intended for heavy-duty parallel computation. You can do basic parallel work using Task.Run with Task.WhenAll, but any serious parallel work should be done using the task parallel library (e.g., Parallel). Asynchronous code on the client side is about responsiveness, not parallel processing.
A common approach is to use Parallel for the parallel work, and then wrap it in a Task.Run and use await on it to keep the UI responsive.
Your benchmark has a couple of flaws:
You are timing the first run, which includes initialization time (loading the Task class, JIT compilation, etc.).
You are using DateTime.Now, which is too inaccurate for timings in the millisecond range. You'll need to use Stopwatch.
With these two issues fixed, I get the following benchmark results:
Regular Sum: 499946 in 00:00:00.0047378
Async Sum: 499946 in 00:00:00.0016994
Threaded Sum: 499946 in 00:00:00.0026898
Async now comes out as the fastest solution, taking less than 2ms.
This is the next problem: timing something as fast as 2ms is extremely unreliable; your thread can get paused for longer than that if some other process is using the CPU in the background. You should average the results over several thousands of benchmark runs.
Also, what's going on with your number of core detection? My quad-core is using a chunk size of 333334 which allows only 3 threads to run.
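The advice above about discarding the first runs and averaging over many runs can be sketched as follows. This is a Java illustration under stated assumptions: MicroBench and averageNanos are hypothetical names, and the warm-up and run counts are arbitrary. The same pattern applies with Stopwatch in C#.

```java
import java.util.Arrays;
import java.util.function.LongSupplier;

final class MicroBench {
    // Returns the mean time per call in nanoseconds, measured after a warm-up phase.
    static double averageNanos(LongSupplier work, int warmupRuns, int timedRuns) {
        long sink = 0; // accumulate results so the JIT cannot discard the work
        for (int i = 0; i < warmupRuns; i++) {
            sink += work.getAsLong(); // untimed: class loading and JIT compilation happen here
        }
        long start = System.nanoTime();
        for (int i = 0; i < timedRuns; i++) {
            sink += work.getAsLong();
        }
        long elapsed = System.nanoTime() - start;
        if (sink == Long.MIN_VALUE) {
            System.out.println(sink); // practically never taken; keeps sink observable
        }
        return (double) elapsed / timedRuns;
    }

    public static void main(String[] args) {
        int[] data = new int[1_000_000];
        Arrays.fill(data, 1);
        LongSupplier sum = () -> {
            long s = 0;
            for (int v : data) {
                s += v;
            }
            return s;
        };
        System.out.printf("%.0f ns per run%n", averageNanos(sum, 200, 2_000));
    }
}
```

Timing the whole batch with a single pair of clock reads keeps the measurement overhead out of each run, and a pause of a few milliseconds on one run is spread across thousands of runs.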
On a quick look, the results are expected: your async sum is using just one thread, while you asynchronously wait for it to finish, so it's slower than the multi-threaded sum.
You'd use async in case you have something else to finish while it's doing its job. So, this wouldn't be the right test for any speed/response improvements.
I'm working on a simple benchmark testing out both Mono's ParallelFX against Java on several Linux boxes. The test for .NET runs great on Windows and Linux alike, but I'm having some kind of snag with the Java version...
I can see the specified number of threads starting up, but they run in a strange fashion. It acts like they start up, but they finish very slowly. They continue to start, but take forever to finish. It seems like it should be exceeding the limit of the thread pool, and my CPU usage looks to me like it's only using one or two cores (I've got an i7 processor so something like 8 should try to be used).
Yes, I know I am not being "thread safe" with my integers and probably other stuff too. I don't really care right now. Something larger is an issue here.
C# Version
public class Program
{
static void Main(string[] args)
{
const int numberOfCycles = 1000;
const int numbersPerCycle = 1000000;
Stopwatch swG = Stopwatch.StartNew();
int threadCount = 0;
int completeCount = 0;
Parallel.For(0, numberOfCycles, x =>
{
Console.WriteLine(string.Format("Starting cycle {0}. Thread count at {1}", x, threadCount++));
Random r = new Random();
Stopwatch sw = Stopwatch.StartNew();
List<double> numbers = new List<double>();
for (int i = 0; i < numbersPerCycle; i++)
{
numbers.Add(r.NextDouble() * 1000);
}
numbers.Sort();
double min = numbers.Min();
double max = numbers.Max();
completeCount++;
Console.WriteLine(string.Format("{0} cycles complete: {1:#,##0.0} ms. Min: {2:0.###} Max: {3:0.###}", completeCount, sw.ElapsedMilliseconds, min, max));
threadCount--;
});
Console.WriteLine(string.Format("All {0} cycles complete. Took {1:#,##0.0} ms.", numberOfCycles, swG.ElapsedMilliseconds));
Console.WriteLine("Press any key to continue.");
Console.ReadKey();
}
}
Java Version
P.S. I am lazy and stole the Stopwatch class from here: Is there a stopwatch in Java?
public class JavaMonoTest {
static int threadCount = 0;
static int completeCount = 0;
static String CLRF = "\r\n";
public static void main(String[] args) throws IOException, InterruptedException {
final int numberOfCycles = 1000;
final int numbersPerCycle = 1000000;
final int NUM_CORES = Runtime.getRuntime().availableProcessors();
//Setup the running array
List<Integer> cyclesList = new LinkedList<Integer>();
for(int i = 0; i < numberOfCycles; i++){
cyclesList.add(i);
}
Stopwatch swG = new Stopwatch();
swG.start();
ExecutorService exec = Executors.newFixedThreadPool(NUM_CORES);
try {
for (final Integer x : cyclesList) {
exec.submit(new Runnable() {
@Override
public void run() {
System.out.printf("Starting cycle %s. Thread count at %s %s", x, threadCount++, CLRF);
Random r = new Random();
Stopwatch sw = new Stopwatch();
sw.start();
List<Double> numbers = new LinkedList<Double>();
for (int i = 0; i < numbersPerCycle; i++)
{
numbers.add(r.nextDouble() * 1000);
}
Collections.sort(numbers);
double min = Collections.min(numbers);
double max = Collections.max(numbers);
completeCount++;
System.out.printf("%s cycles complete: %.2f ms. Min: %.2f Max: %.2f %s", completeCount, sw.getElapsedTime(), min, max, CLRF);
threadCount--;
}
});
}
} finally {
exec.shutdown();
}
exec.awaitTermination(1, TimeUnit.DAYS);
System.out.printf("All %s cycles complete. Took %.2f ms. %s", numberOfCycles, swG.getElapsedTime(), CLRF);
System.out.println("Press any key to continue.");
System.in.read();
}
}
Updated C# Version to Match Java Version In Answer
public class Program
{
static void Main(string[] args)
{
const int numberOfCycles = 1000;
const int numbersPerCycle = 1000000;
Stopwatch swG = Stopwatch.StartNew();
int threadCount = 0;
int completeCount = 0;
Parallel.For(0, numberOfCycles, x =>
{
Console.WriteLine(string.Format("Starting cycle {0}. Thread count at {1}", x, Interlocked.Increment(ref threadCount)));
Random r = new Random();
Stopwatch sw = Stopwatch.StartNew();
double[] numbers = new double[numbersPerCycle];
for (int i = 0; i < numbersPerCycle; i++)
{
numbers[i] = r.NextDouble() * 1000;
}
Array.Sort(numbers);
double min = numbers[0];
double max = numbers[numbers.Length - 1];
Interlocked.Increment(ref completeCount);
Console.WriteLine(string.Format("{0} cycles complete: {1:#,##0.0} ms. Min: {2:0.###} Max: {3:0.###}", completeCount, sw.ElapsedMilliseconds, min, max));
Interlocked.Decrement(ref threadCount);
});
Console.WriteLine(string.Format("All {0} cycles complete. Took {1:#,##0.0} ms.", numberOfCycles, swG.ElapsedMilliseconds));
Console.WriteLine("Press any key to continue.");
Console.ReadKey();
}
}
Running the program, I see that it's using 97%-98% of eight CPUs, but it is also creating an insane amount of garbage. If I make the program more efficient, it runs to completion much faster.
import java.util.*;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
public class JavaMonoTest {
static final AtomicInteger threadCount = new AtomicInteger();
static final AtomicInteger completeCount = new AtomicInteger();
public static void main(String[] args) throws InterruptedException {
final int numberOfCycles = 1000;
final int numbersPerCycle = 1000000;
final int NUM_CORES = Runtime.getRuntime().availableProcessors();
long swG = System.nanoTime();
ExecutorService exec = Executors.newFixedThreadPool(NUM_CORES);
try {
for (int i = 0; i < numberOfCycles; i++) {
final int x = i;
exec.submit(new Runnable() {
@Override
public void run() {
try {
System.out.printf("Starting cycle %s. Thread count at %s %n", x, threadCount.getAndIncrement());
Random r = new Random();
long sw = System.nanoTime();
double[] numbers = new double[numbersPerCycle];
for (int i = 0; i < numbersPerCycle; i++) {
numbers[i] = r.nextDouble() * 1000;
}
Arrays.sort(numbers);
double min = numbers[0];
double max = numbers[numbers.length - 1];
completeCount.getAndIncrement();
System.out.printf("%s cycles complete: %.2f ms. Min: %.2f Max: %.2f %n",
completeCount, (System.nanoTime() - sw) / 1e6, min, max);
threadCount.getAndDecrement();
} catch (Throwable t) {
t.printStackTrace();
}
}
});
}
} finally {
exec.shutdown();
}
exec.awaitTermination(1, TimeUnit.DAYS);
System.out.printf("All %s cycles complete. Took %.2f ms. %n",
numberOfCycles, (System.nanoTime() - swG) / 1e6);
}
}
prints
Starting cycle 0. Thread count at 0
Starting cycle 7. Thread count at 7
Starting cycle 6. Thread count at 6
... deleted ...
999 cycles complete: 139.28 ms. Min: 0.00 Max: 1000.00
1000 cycles complete: 139.05 ms. Min: 0.00 Max: 1000.00
All 1000 cycles complete. Took 19431.14 ms.
In place of:
ExecutorService exec = Executors.newFixedThreadPool(NUM_CORES);
try {
for (final Integer x : cyclesList) {
exec.submit(new Runnable() {
try:
ExecutorService exec = Executors.newFixedThreadPool(NUM_CORES);
try {
for (final Integer x : cyclesList) {
exec.execute( new Runnable() { // No Future< T > needed
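The difference between the two calls can be sketched as follows (the class and method names are hypothetical). execute simply runs the task, while submit also returns a Future; an exception thrown by a submitted task is stored in that Future and only surfaces when get() is called.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class SubmitVsExecute {
    // Fire-and-forget: no Future objects are created for the tasks.
    static int runWithExecute(int tasks) {
        AtomicInteger completed = new AtomicInteger();
        ExecutorService exec = Executors.newFixedThreadPool(4);
        for (int i = 0; i < tasks; i++) {
            exec.execute(() -> completed.incrementAndGet());
        }
        exec.shutdown();
        try {
            exec.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    // With submit, a failure stays inside the Future until get() is called.
    static boolean submitCapturesFailure() {
        ExecutorService exec = Executors.newSingleThreadExecutor();
        try {
            Runnable failing = () -> { throw new IllegalStateException("boom"); };
            Future<?> future = exec.submit(failing);
            future.get(); // rethrows the failure wrapped in ExecutionException
            return false;
        } catch (ExecutionException e) {
            return e.getCause() instanceof IllegalStateException;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            exec.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithExecute(100));     // prints 100
        System.out.println(submitCapturesFailure()); // prints true
    }
}
```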