Measure Execution time of Parallel.For - c#

I am using a Parallel.For loop to increase execution speed of a computation.
I would like to measure the approximate time left for the computation. Normally one simply has to measure the time it takes for each step and estimate the total time by multiplying the step time by the total number of steps.
e.g., If there are 100 steps and some step takes 5 seconds then one could except that the total time would be about 500 seconds. (one could average over several steps and continuously report to the user which is what I want to do).
The only way I can think to do this is by using an outer for loop that essentially resorts back to the original way by splitting up the parallel.for interval and measuring each one.
for(i;n;i += step)
Time(Parallel.For(i, i + step - 1, ...))
This isn't a very good way in general because either a few number of very long steps or a large number of short steps cause problems with timing.
Anyone have any ideas?
(Please realize I need a real time estimation of the time it is taking the parallel.for to complete and NOT the total time. I want to let the user know how much time is left in execution).

This method seems to be pretty effective. We can "linearize" the parallel for loop by simply having each parallel loop increment a counter:
Parallel.For(0, n, (i) => { Thread.Sleep(1000); Interlocked.Increment(ref cnt); });
(Note, thanks to Niclas, that ++ is not atomic and one must use lock or Interlocked.Increment)
Each loop, running in parallel, will increment cnt. The effect is that cnt is monotonically increasing to n, and cnt/n is the percentage of how much the for is complete. Since there is no contention for cnt, there are no concurrency issues and it is very fast and very perfectly accurate.
We can measure the percentage of completion of the parallel For loop at any time during the execution by simply computing cnt/n
The total computation time can be easily estimated by dividing the elapsed time since the start of the loop with the percentage the loop is at. These two quantities should have approximately the same rates of change when each loop takes approximately the same amount of time is relatively well behaved (can average out small fluctuation too).
Obviously the more unpredictable each task is, the more inaccurate the remaining computation time will be. This is to be expected and in general, there is no solution (which is why it's called an approximation). We can still get the elapsed computation time or percentage with complete accuracy.
The underlying assumption of any estimation of "time left" algorithms is each sub task takes approximately the same computation time (assuming one wants a linear result). For example, if we have a parallel approach where 99 tasks are very quick and 1 task is very slow, our estimation will be grossly inaccurate. Our counter will zip up to 99 pretty quick then sit on the last percentage until the slow task completes. We could linearly interpolate and do further estimation to get a smoother countdown but ultimately there is a breaking point.
The following code demonstrates how to measure the parallel for efficiently. Note the time at 100% is the true total execution time and can be used as a reference.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
using System.Diagnostics;
namespace ParallelForTiming
{
class Program
{
static void Main(string[] args)
{
var sw = new Stopwatch();
var pct = 0.000001;
var iter = 20;
var time = 20 * 1000 / iter;
var p = new ParallelOptions(); p.MaxDegreeOfParallelism = 4;
var Done = false;
Parallel.Invoke(() =>
{
sw.Start();
Parallel.For(0, iter, p, (i) => { Thread.Sleep(time); lock(p) { pct += 1 / (double)iter; }});
sw.Stop();
Done = true;
}, () =>
{
while (!Done)
{
Console.WriteLine(Math.Round(pct*100,2) + " : " + ((pct < 0.1) ? "oo" : (sw.ElapsedMilliseconds / pct /1000.0).ToString()));
Thread.Sleep(2000);
}
}
);
Console.WriteLine(Math.Round(pct * 100, 2) + " : " + sw.ElapsedMilliseconds / pct / 1000.0);
Console.ReadKey();
}
}
}

This is almost impossible to answer.
First of all, it's not clear what all the steps do. Some steps may be I/O-intensive, or computationally intensive.
Furthermore, Parallel.For is a request -- you are not sure that your code will actually run in parallel. It depends on circumstances (availability of threads and memory) whether the code will actually run in parallel. Then if you have parallel code that relies on I/O, one thread will block the others while waiting for the I/O to complete. And you don't know what other processes are doing either.
This is what makes predicting how long something will take extremely error-prone and, actually, an exercise in futility.

This problem is a tough one to answer. The problems with timing that you refer to using very long steps or a large number of very short steps are likley related to that your loop will be working at the edges of what the parallel partitioner can handle.
Since the default partitioner is very dynamic and we know nothing about your actual problem there is no good answer that allows you to solve the problem at hand while still reaping the benefits of parallel execution with dynamic load balancing.
If it is very important to achive a reliable estimation of projected runtime perhaps you could set up a custom partitioner and then leverage your knowledge about the partioning to extrapolate timings from a few chunks on one thread.

Here's a possible solution to measure the average of all previously finished tasks. After each task finishes, an Action<T> is called where you could summarize all times and divide it by the total tasks finished. This is however just the current state and has no way to predict any future tasks / averages. (As others mentioned, this is quite difficult)
However: You'll have to measure if it fits for your problem because there is a possibility for lock contention on both the method level declared variables.
static void ComputeParallelForWithTLS()
{
var collection = new List<int>() { 1000, 2000, 3000, 4000 }; // values used as sleep parameter
var sync = new object();
TimeSpan averageTime = new TimeSpan();
int amountOfItemsDone = 0; // referenced by the TPL, increment it with lock / interlocked.increment
Parallel.For(0, collection.Count,
() => new TimeSpan(),
(i, loopState, tlData) =>
{
var sw = Stopwatch.StartNew();
DoWork(collection, i);
sw.Stop();
return sw.Elapsed;
},
threadLocalData => // Called each time a task finishes
{
lock (sync)
{
averageTime += threadLocalData; // add time used for this task to the total.
}
Interlocked.Increment(ref amountOfItemsDone); // increment the tasks done
Console.WriteLine(averageTime.TotalMilliseconds / amountOfItemsDone + ms.");
/*print out the average for all done tasks so far. For an estimation,
multiply with the remaining items.*/
});
}
static void DoWork(List<int> items, int current)
{
System.Threading.Thread.Sleep(items[current]);
}

I would propose having the method being executed at each step report when it is done. This is slightly tricky with thread safety of course, so that is something to remember when implementing. This will let you keep track of number of finished tasks out of the total, and also makes it (sort of) easy to know the time spent on each individual step, which is useful to remove outliers etc.
EDIT: Some code to demonstrate the idea
Parallel.For(startIdx, endIdx, idx => {
var sw = Stopwatch.StartNew();
DoCalculation(idx);
sw.Stop();
var dur = sw.Elapsed;
ReportFinished(idx, dur);
});
The key here is that ReportFinished will give you continuous information about number of finished tasks, and the duration of each of them. This enables you to do some better guesses about how long time remains by doing statistics on this data.

Here i wrote class that mesures time and speed
public static class Counter
{
private static long _seriesProcessedItems = 0;
private static long _totalProcessedItems = 0;
private static TimeSpan _totalTime = TimeSpan.Zero;
private static DateTime _operationStartTime;
private static object _lock = new object();
private static int _numberOfCurrentOperations = 0;
public static void StartAsyncOperation()
{
lock (_lock)
{
if (_numberOfCurrentOperations == 0)
{
_operationStartTime = DateTime.Now;
}
_numberOfCurrentOperations++;
}
}
public static void EndAsyncOperation(int itemsProcessed)
{
lock (_lock)
{
_numberOfCurrentOperations--;
if (_numberOfCurrentOperations < 0)
throw new InvalidOperationException("EndAsyncOperation without StartAsyncOperation");
_seriesProcessedItems +=itemsProcessed;
if (_numberOfCurrentOperations == 0)
{
_totalProcessedItems += _seriesProcessedItems;
_totalTime += DateTime.Now - _operationStartTime;
_seriesProcessedItems = 0;
}
}
}
public static double GetAvgSpeed()
{
if (_totalProcessedItems == 0) throw new InvalidOperationException("_totalProcessedItems is zero");
if (_totalProcessedItems == 0) throw new InvalidOperationException("_totalTime is zero");
return _totalProcessedItems / (double)_totalTime.TotalMilliseconds;
}
public static void Reset()
{
_totalProcessedItems = 0;
_totalTime = TimeSpan.Zero;
}
}
Example of usage and test:
static void Main(string[] args)
{
var st = Stopwatch.StartNew();
Parallel.For(0, 100, _ =>
{
Counter.StartAsyncOperation();
Thread.Sleep(100);
Counter.EndAsyncOperation(1);
});
st.Stop();
Console.WriteLine("Speed correct {0}", 100 / (double)st.ElapsedMilliseconds);
Console.WriteLine("Speed to test {0}", Counter.GetAvgSpeed());
}

Related

Parallel.ForEach blocked on long iteration

I've been using Parallel.ForEach to do some time-consuming processing on collections of items. The processing is actually handled by an external command line tool and I cannot change that. However, it seems that the Parallel.ForEach will get "stuck" on a long running item from the collection. I've distilled the problem down and can show that Parallel.ForEach is, in fact, waiting for this long one to finish and not allowing any others through. I've written a console app to demonstrate the problem:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
namespace testParallel
{
class Program
{
static int inloop = 0;
static int completed = 0;
static void Main(string[] args)
{
// initialize an array integers to hold the wait duration (in milliseconds)
var items = Enumerable.Repeat(10, 1000).ToArray();
// set one of the items to 10 seconds
items[50] = 10000;
// Initialize our line for reporting status
Console.Write(0.ToString("000") + " Threads, " + 0.ToString("000") + " completed");
// Start the loop in a task (to avoid SO answers having to do with the Parallel.ForEach call, itself, not being parallel)
var t = Task.Factory.StartNew(() => Process(items));
// Wait for the operations to compelte
t.Wait();
// Report finished
Console.WriteLine("\nDone!");
}
static void Process(int[] items)
{
// SpinWait (not sleep or yield or anything) for the specified duration
Parallel.ForEach(items, (msToWait) =>
{
// increment the counter for how many threads are in the loop right now
System.Threading.Interlocked.Increment(ref inloop);
// determine at what time we shoule stop spinning
var e = DateTime.Now + new TimeSpan(0, 0, 0, 0, msToWait);
// spin until the target time
while (DateTime.Now < e) /* no body -- just a hard loop */;
// count another completed
System.Threading.Interlocked.Increment(ref completed);
// we're done with this iteration
System.Threading.Interlocked.Decrement(ref inloop);
// report status
Console.Write("\r" + inloop.ToString("000") + " Threads, " + completed.ToString("000") + " completed");
});
}
}
}
Basically, I make an array of int to store the number of milliseconds a given operation takes. I set them all to 10 except for one, which I set to 10000 (so, 10 seconds). I kick off the Parallel.ForEach in a task and process each integer in a hard spin wait (so it shouldn't be yielding or sleeping or anything).
On each iteration, I report how many iterations are in the body of the loop right now, and how many iterations we have completed. Mostly, it goes along fine. However, toward the end (time-wise), it reports "001 Threads, 987 Completed".
My question is why doesn't it use 7 of the other cores to work on the remaining 13 "jobs"? This one long-running iteration should not keep it from processing other elements in the collection, right?
This example happens to be a fixed collection, but it could easily be set to be an enumerable. We wouldn't want to stop fetching the next item in the enumerable just because one was taking a long time.
I found the answer (or at least, an answer). It has to do with the chunk partitioning. The SO answer here got it for me. So basically, at the top of my "Process" function, if I change from this:
static void Process(int[] items)
{
Parallel.ForEach(items, (msToWait) => { ... });
}
to this
static void Process(int[] items)
{
var partitioner = Partitioner.Create(items, EnumerablePartitionerOptions.NoBuffering);
Parallel.ForEach(partitioner, (msToWait) => { ... });
}
it grabs the work one at a time. For the more typical case of a parallel for each, where the body doesn't take more than a second, I can certainly see chunking the sets of work. In my use case, however, each body part can take anywhere from half a second to 5 hours. I certainly would not want a bunch of the 10-second variety elements to be blocked by one 5 hour element. So, in this case, the overhead of "one-at-a-time" is well worth it.

Are these 2 methods started almost at the same time?

When I run below code, Output is this:
When I run till 300, output is this:
When I run till 100, output is this:
Does this mean that both methods started almost at the same time?
If this is true, why do we need Parallel library if we can achieve parallelism by async-await?
using System;
using System.Threading.Tasks;
class Program
{
public static void PrintX()
{
for (int i = 0; i < 500; i++) { Console.Write("x"); }
}
public static void PrintY()
{
for (int i = 0; i < 500; i++) { Console.Write("y"); }
}
public async Task RunAsync()
{
var t1 = Task.Run(() => PrintY());
var t2 = Task.Run(() => PrintX());
await t1;
await t2;
}
static void Main(string[] args)
{
Task t = new Program().RunAsync();
t.Wait();
}
}
Ultimately you're at the mercy of the thread pool here. You have enqueued two items (Task.Run), and they will be picked up and serviced at some future time. When they start is non-deterministic, and will depend on how many available threads there are, and other factors.
They will start approximately at the same time, with no guarantees of anything (perhaps not even the order in which they start). The await will be triggered against their completion - so when you call await (or even whether you call await) won't impact them in any way. They might run in parallel, but most likely they individually run fast enough that whichever one gets started first will have completed before it tries starting the second. They might even end up running consecutively on the same thread (outputting the managed thread id would be a way to see this).
As for why we need Parallel: firstly, it pre-dates async/await by a long time; secondly it does a lot of things to allow larger scale parallelization - things like running a large sequence with concurrent processing including fixed maximum parallelization.
Just to show that it can be concurrent, here's the output from a real run where I added the Environment.CurrentManagedThreadId into the output:
main: 1
y: 3
x: 4
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
xxxxxxxxxxxxxxxxxxxyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy
definitely concurrent, but: other runs can show very different outputs

C# Timing Loop Rate Inaccuracies

I'm trying to create a method in C# whereby I can repeatedly perform an action (in my particular application it's sending a UDP packet) at a targeted rate. I know that timer inaccuracy in C# will prevent the output from being precisely at the target rate, but best effort is good enough. However, at higher rates, the timing seems to be completely off.
while (!token.IsCancellationRequested)
{
stopwatch.Restart();
Thread.Sleep(5); // Simulate process
int processTime = (int)stopwatch.ElapsedMilliseconds;
int waitTime = taskInterval - processTime;
Task.Delay(Math.Max(0, waitTime)).Wait();
}
See here for the full example console app.
When run, the FPS output of this test app shows around 44-46 Hz for a target of 60 Hz. However at lower rates (say 20 Hz), the output rate is much closer to the target. I can't understand why this would be the case. What is the problem with my code?
The problem is that Thread.Sleep (or Task.Delay) is not very accurate. Take a look at this: Accuracy of Task.Delay
One way to fix this is to a start the timer once, and then have a loop where you delay some ~15 ms in each iteration. Inside each iteration, you calculate how much times the operation should have been executed so far and you compare it with how many times you have run it so far. And you then run the operation enough times to catch up.
Here is some code sample:
private static void timerTask(CancellationToken token)
{
const int taskRateHz = 60;
var stopwatch = new Stopwatch();
stopwatch.Start();
int ran_so_far = 0;
while (!token.IsCancellationRequested)
{
Thread.Sleep(15);
int should_have_run =
stopwatch.ElapsedMilliseconds * taskRateHz / 1000;
int need_to_run_now = should_have_run - ran_so_far;
if(need_to_run_now > 0)
{
for(int i = 0; i < need_to_run_now; i++)
{
ExecuteTheOperationHere();
}
ran_so_far += need_to_run_now;
}
}
}
Please note that you want to use longs instead of ints if the process is to remain alive for a very long time.
If you replace this:
Task.Delay(Math.Max(0, waitTime)).Wait();
with this:
Thread.Sleep(Math.Max(0, waitTime));
You should get closer values on higher rates (why would you use Task.Delay(..).Wait() anyway?).

Is there anyway to know if thread has released CPU?

I have accidentally found a peculiar behaviour for which I am not getting any reason. In the program below there are 2 sections. First section is commented which creates 2 threads and does some work, in second section I added some code for getting primes which I tried for checking AsParallel Performance. AsParallel really decreased the time for program. But the thing which struck me most was when I commented the above section I got a improvement in time.
So my question is did the first section ,which I have commented, had kept the CPU busy enough. Or was there any other reason.
Please see time elapsed for
1) When first section is not commented : Elapsed: 4260619 (Ticks)
2) When first section is commented : Elapsed: 2700445 (Ticks)
class Program
{
[ThreadStatic]
static int thStaticInt = 0;
static void Main(string[] args)
{
//new Thread(() =>
//{
// for (int i = 0; i < 10; i++)
// {
// thStaticInt++;
// Console.WriteLine("from first {0}", thStaticInt);
// }
//}
//).Start();
//new Thread(() =>
//{
// for (int i = 0; i < 10; i++)
// {
// thStaticInt++;
// Console.WriteLine("from second {0}", thStaticInt);
// }
//}
//).Start();
//Console.WriteLine("Press any key");
//Console.ReadLine();
//Another section starts here
IEnumerable<int> numbers = Enumerable.Range(3, 1000000);
Stopwatch watch = new Stopwatch();
watch.Start();
var primes = from n in numbers.AsParallel()
where Enumerable.Range(2, (int)Math.Sqrt(n)).All(i => n % i != 0)
select n;
IEnumerable<int> primeNumbers = primes.ToArray();
watch.Stop();
TimeSpan ts = watch.Elapsed;
Console.WriteLine("Time Elapsed {0}", ts.Ticks);
Console.ReadLine();
}
}
You're not grabbing the complete picture. Your profiling leads you to misinterpretation.
When you start a thread you request the operating system to start a thread, that doesn't mean it starts immediately, it's just a request. The OS decides when it runs your thread and for how long. That being said, in your example, your threads in the commented section could be run before, during or even after your second section.
The work in your threads is also a bit dubious. A counter to ten is very minimalistic. Also keep in mind that what you do, writing to the console, from a thread is only possible because the Console class does the thread synchronization for you. So, how does this fit in the timing? I have no idea.
Then there might be other processes with other threads running that you don't know of.
And on top of all of that you probably have a multicore processor, which might or might not affect everything.
Profiling is not an easy task. You should at least repeat your tests multiple times, corelate the results and have a solid understanding of everything involved.

How do I calculate how many hashes I generate per second?

I have a function which generates hashes from a string:
string GenerateHash(string plainText);
I generate as many hashes as possible with 4 threads.
How do I calculate how many hashes (or megahashes) I generate per second?
Your problem breaks down nicely into 3 separate tasks
Sharing a single count variable across threads
Benchmarking thread completion time
Calculating hashes p/sec
Sharing a single count variable across threads
public static class GlobalCounter
{
public static int Value { get; private set; }
public static void Increment()
{
Value = GetNextValue(Value);
}
private static int GetNextValue(int curValue)
{
return Interlocked.Increment(ref curValue);
}
public static void Reset()
{
Value = 0;
}
}
Before you spin off the threads call GlobalCounter.Reset and then in each thread (after each successful hash) you would call GlobalCounter.Increment - using Interlocked.X performs atomic operations of Value in a thread-safe manner, it's also much faster than lock.
Benchmarking thread completion time
var sw = Stopwatch.StartNew();
Parallel.ForEach(someCollection, someValue =>
{
// generate hash
GlobalCounter.Increment();
});
sw.Stop();
Parallel.ForEach will block until all threads have finished
Calculating hashes per second
...
sw.Stop();
var hashesPerSecond = GlobalCounter.Value / sw.Elapsed.Seconds;
Use a counter variable to count the number of hashes generated and divide it accordingly all x seconds using a timer.
Its probably most performant if you have a counter for each thread and the timer just reads all the counters. Then you do not have as much locking.
You can use the Stopwatch class to more accurately measure time passing. You can start your stopwatch, generate your hashes, then stop the stopwatch. Then you should have a count of the hashes and a total time it took to generate, which you need to calculate the per-second rate.

Categories

Resources