Measure code speed in .net in milliseconds - c#

I want to get the maximum count I have to execute a loop for it to take x milliseconds to finish.
For eg.
int GetIterationsForExecutionTime(int ms)
{
    int count = 0;
    /* pseudocode
    do
        some code here
        count++;
    until executionTime > ms
    */
    return count;
}
How do I accomplish something like this?

I want to get the maximum count I have to execute a loop for it to take x milliseconds to finish.
First off, simply do not do that. If you need to wait a certain number of milliseconds do not busy-wait in a loop. Rather, start a timer and return. When the timer ticks, have it call a method that resumes where you left off. The Task.Delay method might be a good one to use; it takes care of the timer details for you.
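A minimal sketch of that approach (the method names here are illustrative, and async Main needs C# 7.1 or later):
using System;
using System.Threading.Tasks;

class DelayExample
{
    static async Task Main()
    {
        await WaitThenContinueAsync(500);
    }

    // Yields the thread instead of burning CPU in a busy-wait loop.
    static async Task WaitThenContinueAsync(int ms)
    {
        await Task.Delay(ms);
        ContinueWork();                     // hypothetical "resume where you left off" method
    }

    static void ContinueWork() => Console.WriteLine("resumed");
}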
If your question is actually about how to time the amount of time that some code takes then you need much more than simply a good timer. There is a lot of art and science to getting accurate timings.
First you should always use Stopwatch and never use DateTime.Now for these timings. Stopwatch is designed to be a high-precision timer for telling you how much time elapsed. DateTime.Now is a low-precision timer for telling you if it is time to watch Doctor Who yet. You wouldn't use a wall clock to time an Olympic race; you'd use the highest precision stopwatch you could get your hands on. So use the one provided for you.
Second, you need to remember that C# code is compiled Just In Time. The first time you go through a loop can therefore be hundreds or thousands of times more expensive than every subsequent time due to the cost of the jitter analyzing the code that the loop calls. If you are intending on measuring the "warm" cost of a loop then you need to run the loop once before you start timing it. If you are intending on measuring the average cost including the jit time then you need to decide how many times makes up a reasonable number of trials, so that the average works out correctly.
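A rough sketch of that "warm" measurement; MeasuredWork and the trial count are placeholders, not anything from the original answer:
using System;
using System.Diagnostics;

class WarmTiming
{
    // Placeholder for whatever is actually being measured.
    static long MeasuredWork()
    {
        long sum = 0;
        for (int i = 0; i < 1000000; i++) sum += i;
        return sum;
    }

    static void Main()
    {
        MeasuredWork();                            // cold run: pays the JIT cost, not timed

        const int trials = 100;                    // illustrative trial count
        long sink = 0;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < trials; i++)
            sink += MeasuredWork();                // warm runs: this is what gets timed
        sw.Stop();

        Console.WriteLine("average ms per trial: {0}", sw.ElapsedMilliseconds / (double)trials);
        Console.WriteLine(sink);                   // keep the result observable
    }
}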
Third, you need to make sure that you are not wearing any lead weights when you are running. Never make performance measurements while debugging. It is astonishing the number of people who do this. If you are in the debugger then the runtime may be talking back and forth with the debugger to make sure that you are getting the debugging experience you want, and that chatter takes time. The jitter is generating worse code than it normally would, so that your debugging experience is more consistent. The garbage collector is collecting less aggressively. And so on. Always run your performance measurements outside the debugger, and with optimizations turned on.
Fourth, remember that virtual memory systems impose costs similar to those of jitters. If you are already running a managed program, or have recently run one, then the pages of the CLR that you need are likely "hot" -- already in RAM -- where they are fast. If not, then the pages might be cold, on disk, and need to be page faulted in. That can change timings enormously.
Fifth, remember that the jitter can make optimizations that you do not expect. If you try to time:
// Let's time addition!
for (int i = 0; i < 1000000; ++i) { int j = i + 1; }
the jitter is entirely within its rights to remove the entire loop. It can realize that the loop computes no value that is used anywhere else in the program and remove it entirely, giving it a time of zero. Does it do so? Maybe. Maybe not. That's up to the jitter. You should measure the performance of realistic code, where the values computed are actually used somehow; the jitter will then know that it cannot optimize them away.
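For example, a variant of the loop above whose result is actually consumed, so the jitter cannot legally discard it (a sketch; whether a given jitter would have removed the original is still up to the jitter):
// Make the loop's result observable so the loop cannot be removed.
long sum = 0;
for (int i = 0; i < 1000000; ++i)
{
    sum += i + 1;
}
Console.WriteLine(sum);   // the computed value is actually used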
Sixth, timings of tests which create lots of garbage can be thrown off by the garbage collector. Suppose you have two tests, one that makes a lot of garbage and one that makes a little bit. The cost of the collection of the garbage produced by the first test can be "charged" to the time taken to run the second test if by luck the first test manages to run without a collection but the second test triggers one. If your tests produce a lot of garbage then consider (1) is my test realistic to begin with? It doesn't make any sense to do a performance measurement of an unrealistic program because you cannot make good inferences to how your real program will behave. And (2) should I be charging the cost of garbage collection to the test that produced the garbage? If so, then make sure that you force a full collection before the timing of the test is done.
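The usual way to force that full collection looks roughly like this (a sketch; place it before reading the elapsed time of the garbage-producing test, or between tests, depending on which test you want to charge for the garbage):
// Force a full collection so the next test isn't charged for this test's garbage.
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();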
Seventh, you are running your code in a multithreaded, multiprocessor environment where threads can be switched at will, and where the thread quantum (the amount of time the operating system will give another thread until yours might get a chance to run again) is about 16 milliseconds. 16 milliseconds is about fifty million processor cycles. Coming up with accurate timings of sub-millisecond operations can be quite difficult if the thread switch happens within one of the several million processor cycles that you are trying to measure. Take that into consideration.

var sw = Stopwatch.StartNew();
...
long elapsedMilliseconds = sw.ElapsedMilliseconds;

You could also use the Stopwatch class:
int GetIterationsForExecutionTime(int ms)
{
    int count = 0;
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();
    do
    {
        // some code here
        count++;
    } while (stopwatch.ElapsedMilliseconds < ms);
    stopwatch.Stop();
    return count;
}
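A hypothetical call site, just to show how the return value might be used:
int iterations = GetIterationsForExecutionTime(500);   // roughly half a second of the loop body
Console.WriteLine(iterations);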

Good points from Eric Lippert.
I'd been benchmarking and unit testing for a while, and I'd advise you to discard every first pass over your code because of JIT compilation.
So in benchmarking code that uses a loop and a Stopwatch, remember to put this at the end of the loop:
// The first pass includes JIT compilation cost.
if (i == 0)
{
    // Discard every result you've collected so far
    // and restart the timer.
    stopwatch.Restart();
}
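In context, the pattern might sit inside a timing loop like this (MeasuredWork and the trial count are placeholders, not from the original answer):
var stopwatch = Stopwatch.StartNew();
int iterations = 100;                    // illustrative trial count
for (int i = 0; i < iterations; i++)
{
    MeasuredWork();                      // hypothetical code under test

    // Throw away the first pass, which includes the JIT cost.
    if (i == 0)
    {
        stopwatch.Restart();
    }
}
stopwatch.Stop();
Console.WriteLine(stopwatch.ElapsedMilliseconds / (double)(iterations - 1));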

Related

Why does Parallel.For lose so badly? [duplicate]

Here is the code:
using (var context = new AventureWorksDataContext())
{
    IEnumerable<Customer> _customerQuery = from c in context.Customers
                                           where c.FirstName.StartsWith("A")
                                           select c;
    var watch = new Stopwatch();
    watch.Start();
    var result = Parallel.ForEach(_customerQuery, c => Console.WriteLine(c.FirstName));
    watch.Stop();
    Debug.WriteLine(watch.ElapsedMilliseconds);

    watch = new Stopwatch();
    watch.Start();
    foreach (var customer in _customerQuery)
    {
        Console.WriteLine(customer.FirstName);
    }
    watch.Stop();
    Debug.WriteLine(watch.ElapsedMilliseconds);
}
The problem is, Parallel.ForEach takes about 400ms vs a regular foreach, which takes about 40ms. What exactly am I doing wrong and why doesn't this work as I expect it to?
Suppose you have a task to perform. Let's say you're a math teacher and you have twenty papers to grade. It takes you two minutes to grade a paper, so it's going to take you about forty minutes.
Now let's suppose that you decide to hire some assistants to help you grade papers. It takes you an hour to locate four assistants. You each take four papers and you are all done in eight minutes. You've traded 40 minutes of work for 68 total minutes of work including the extra hour to find the assistants, so this isn't a savings. The overhead of finding the assistants is larger than the cost of doing the work yourself.
Now suppose you have twenty thousand papers to grade, so it is going to take you about 40000 minutes. Now if you spend an hour finding assistants, that's a win. You each take 4000 papers and are done in a total of 8060 minutes instead of 40000 minutes, a savings of almost a factor of 5. The overhead of finding the assistants is basically irrelevant.
Parallelization is not free. The cost of splitting up work amongst different threads needs to be tiny compared to the amount of work done per thread.
Further reading:
Amdahl's law
Gives the theoretical speedup in latency of the execution of a task at fixed workload that can be expected of a system whose resources are improved.
Gustafson's law
Gives the theoretical speedup in latency of the execution of a task at fixed execution time that can be expected of a system whose resources are improved.
The first thing you should realize is that not all parallelism is beneficial. There is an amount of overhead to parallelism, and this overhead may or may not be significant depending on the complexity of what is being parallelized. Since the work in your parallel function is very small, the overhead of the management the parallelism has to do becomes significant, thus slowing down the overall work.
The additional overhead of creating all the threads for your enumerable versus just executing the enumerable is more than likely the cause of the slowdown. Parallel.ForEach is not a blanket performance-increasing move; you need to weigh whether or not the operation to be completed for each element is likely to block.
For example, if you were to make a web request or something instead of simply writing to the console, the parallel version might be faster. As it is, simply writing to the console is a very fast operation, so the overhead of creating the threads and starting them is going to be slower.
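A rough sketch of that trade-off, with Thread.SpinWait standing in for heavier per-item work; the item count and spin amounts are arbitrary, and the exact numbers will vary by machine:
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class OverheadDemo
{
    static void Main()
    {
        var items = Enumerable.Range(0, 1000).ToArray();

        // Tiny per-item body: the parallel overhead tends to dominate.
        Measure("parallel, tiny body ", () => Parallel.ForEach(items, i => { i.ToString(); }));
        Measure("serial,   tiny body ", () => { foreach (var i in items) { i.ToString(); } });

        // Heavier per-item body: parallel usually wins.
        Measure("parallel, heavy body", () => Parallel.ForEach(items, i => Thread.SpinWait(200000)));
        Measure("serial,   heavy body", () => { foreach (var i in items) Thread.SpinWait(200000); });
    }

    static void Measure(string label, Action action)
    {
        var sw = Stopwatch.StartNew();
        action();
        Console.WriteLine("{0}: {1} ms", label, sw.ElapsedMilliseconds);
    }
}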
As a previous writer has said, there is some overhead associated with Parallel.ForEach, but that is not why you can't see your performance improvement. Console.WriteLine is a synchronous operation, so only one thread is working at a time. Try changing the body to something non-blocking and you will see a performance increase (as long as the amount of work in the body is big enough to outweigh the overhead).
I like salomon's answer and would like to add that you also have the additional overhead of
Allocating delegates.
Calling through them.

C# performance counter Timer100Ns. How to?

I have an application which monitors a particular event and then starts to calculate things once it happens. Events are irregular and can come in any pattern, from bunches in a second to none for a long time.
I want to measure the percentage of time the application is busy (similar to CPU % usage).
I want to use a Timer100Ns counter.
Two questions:
Do I increment it by hardware ticks or by DateTime ticks (e.g. if I use Stopwatch, do I use sw.ElapsedTicks or sw.Elapsed.Ticks)?
Do I need a base counter for it?
so I am about to write something like this:
Stopwatch sw = new Stopwatch();
sw.Start();
// Do some operation which is irregular by nature
sw.Stop();
// Measure utilization of the application
myCounterOfTypeTimer100Ns.IncrementBy(sw.Elapsed.Ticks);
Will it do?
EDIT: I experimented with it a bit and now it's even more confusing. It actually shows the values I increment it by, not a percentage.
The mystery unravelled. It appears that I didn't use it in the way it was supposed to be used (or rather I didn't read TFM properly). If the sampling interval is 1 s (as in the perfmon live window) and your intervals are more than 1 s, then it shows you a nonsense number. To achieve smoothness, the activity you are trying to measure must really be fractions of 1 s; otherwise this counter is not a good idea.
The answer for this kind of problem (although it's not obvious, and it is still disturbing that nobody suggested it in a week) is actually SampleCounter.
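On the first of the two questions above, a small sketch of the distinction between the two tick properties: sw.Elapsed.Ticks is in 100 ns units, which is what a 100-ns-based counter expects, while sw.ElapsedTicks is in Stopwatch's own frequency-dependent units.
var sw = Stopwatch.StartNew();
// ... the irregular work ...
sw.Stop();

long rawTicks       = sw.ElapsedTicks;    // raw Stopwatch units: Stopwatch.Frequency per second
long hundredNsTicks = sw.Elapsed.Ticks;   // TimeSpan ticks: always 100 ns each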

Why is the first iteration always faster than the next in a loop?

I would like to understand why the first iteration in the loop executes quicker than the rest.
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 10; i++)
{
    System.Threading.Thread.Sleep(100);
    Console.WriteLine("Finished at : {0}", ((double)sw.ElapsedTicks / Stopwatch.Frequency) * 1e3);
}
When I execute the code I get the following:
Initially I thought it could be due to the accuracy of the Stopwatch class, but then why does it apply only to the first iteration? Correct me if I'm missing something.
This is a very flawed benchmark. For one, Thread.Sleep does not guarantee that you'll sleep for exactly 100 ms. Try much longer sleeps and you'll see more consistent results.
So it might even just be scheduling: the later iterations are always just doing sleep after sleep. Since Sleep works off the system interrupt clock, the sleeps after the first should take a similar amount of time, while the first has to "sync up" with the clock first.
If you add another sleep before the cycle (and before starting the stopwatch), you'll likely get closer times for each of the iterations.
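That suggestion would look roughly like this: one throwaway sleep so the loop starts aligned with the interrupt clock.
System.Threading.Thread.Sleep(100);   // discarded sleep: lines up with the timer tick

Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 10; i++)
{
    System.Threading.Thread.Sleep(100);
    Console.WriteLine("Finished at : {0}", ((double)sw.ElapsedTicks / Stopwatch.Frequency) * 1e3);
}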
Or even better, don't use sleeps. If you use some actual CPU work instead, you'll avoid thread switches (provided you've got enough CPU to do that) and many other costs not associated with the cycle itself. For example,
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 10; i++)
{
    Thread.SpinWait(10000000);
    Console.WriteLine("Finished at : {0}", ((double)sw.ElapsedTicks / Stopwatch.Frequency) * 1e3);
}
This will give you much more consistent results, because it doesn't depend on the clock at all.
There are many other things that can complicate a benchmark like this, which is why benchmarks simply aren't done this way. There will always be deviations, and they can get rather big, especially on a system with a lot of work.
In other words, if you're getting differences in CPU work execution time on the scale of milliseconds, someone is stealing your work. There's nothing in a modern CPU that would account for such a huge difference just based on e.g. i++ being there or not.
I could describe a lot more issues with your code, but it probably isn't worth it. Just google for some best practices on CPU work benchmarking in C#, and you'll get much more out of it.
Oh, and just to help hammer the point home more: on my computer, the first iteration tends to come in anywhere from 99 up to 100 ms. That would be highly unusual, since the default timer resolution is 15.6 ms rather than 1 ms, but the culprit is easily found: Chrome sets it to 1 ms. Ouch.
What you're outputting for times is the total time elapsed since the start, so the time increasing by about 100 ms is exactly what you should be expecting.
But when you use Thread.Sleep you're giving up control of the thread, possibly for something close to the time you've specified. That time will be in multiples of the system quantum, so what you specify cannot possibly be exact. If other threads of higher priority are doing work, it's less likely that your thread will be given processor time at a granularity close to the time you've suggested.

.NET's Multi-threading vs Multi-processing: Awful Parallel.ForEach Performance

I have coded a very simple "Word Count" program that reads a file and counts each word's occurrence in the file. Here is a part of the code:
class Alaki
{
    private static List<string> input = new List<string>();

    private static void exec(int threadcount)
    {
        ParallelOptions options = new ParallelOptions();
        options.MaxDegreeOfParallelism = threadcount;
        Parallel.ForEach(Partitioner.Create(0, input.Count), options, (range) =>
        {
            var dic = new Dictionary<string, List<int>>();
            for (int i = range.Item1; i < range.Item2; i++)
            {
                // make some delay!
                //for (int x = 0; x < 400000; x++) ;

                var tokens = input[i].Split();
                foreach (var token in tokens)
                {
                    if (!dic.ContainsKey(token))
                        dic[token] = new List<int>();
                    dic[token].Add(1);
                }
            }
        });
    }

    public static void Main(String[] args)
    {
        StreamReader reader = new StreamReader(@"c:\txt-set\agg.txt");
        while (true)
        {
            var line = reader.ReadLine();
            if (line == null)
                break;
            input.Add(line);
        }

        DateTime t0 = DateTime.Now;
        exec(Environment.ProcessorCount);
        Console.WriteLine("Parallel: " + (DateTime.Now - t0));

        t0 = DateTime.Now;
        exec(1);
        Console.WriteLine("Serial: " + (DateTime.Now - t0));
    }
}
It is simple and straightforward. I use a dictionary to count each word's occurrences. The style is roughly based on the MapReduce programming model. As you can see, each task uses its own private dictionary. So, there are NO shared variables; just a bunch of tasks that count words by themselves. Here is the output when the code is run on a quad-core i7 CPU:
Parallel: 00:00:01.6220927
Serial: 00:00:02.0471171
The speedup is about 1.25, which is a tragedy! But when I add some delay when processing each line, I can reach speedup values of about 4.
In the original parallel execution with no delay, CPU utilization hardly reaches 30%, and therefore the speedup is not promising. But when we add some delay, CPU utilization reaches 97%.
At first I thought the cause was the IO-bound nature of the program (but I think inserting into a dictionary is to some extent CPU intensive), and it seemed logical because all of the threads are reading data from a shared memory bus. However, the surprising point is that when I run 4 instances of the serial program (with no delays) simultaneously, CPU utilization rises and all four instances finish in about 2.3 seconds!
This means that when the code is run in a multiprocessing configuration, it reaches a speedup of about 3.5, but when it is run in a multithreading configuration, the speedup is about 1.25.
What is your idea?
Is there anything wrong with my code? I think there is no shared data at all, so the code should not experience any contention.
Is there a flaw in .NET's run-time?
Thanks in advance.
Parallel.For doesn't divide the input into n pieces (where n is the MaxDegreeOfParallelism); instead it creates many small batches and makes sure that at most n are being processed concurrently. (This is so that if one batch takes a very long time to process, Parallel.For can still be running work on other threads. See Parallelism in .NET - Part 5, Partitioning of Work for more details.)
Due to this design, your code is creating and throwing away dozens of Dictionary objects, hundreds of List objects, and thousands of String objects. This is putting enormous pressure on the garbage collector.
Running PerfMonitor on my computer reports that 43% of the total run time is spent in GC. If you rewrite your code to use fewer temporary objects, you should see the desired 4x speedup. Some excerpts from the PerfMonitor report follow:
Over 10% of the total CPU time was spent in the garbage collector.
Most well tuned applications are in the 0-10% range. This is typically
caused by an allocation pattern that allows objects to live just long
enough to require an expensive Gen 2 collection.
This program had a peak GC heap allocation rate of over 10 MB/sec.
This is quite high. It is not uncommon that this is simply a
performance bug.
Edit: As per your comment, I will attempt to explain the timings you reported. On my computer, with PerfMonitor, I measured between 43% and 52% of time spent in GC. For simplicity, let's assume that 50% of the CPU time is work, and 50% is GC. Thus, if we make the work 4× faster (through multi-threading) but keep the amount of GC the same (this will happen because the number of batches being processed happened to be the same in the parallel and serial configurations), the best improvement we could get is 62.5% of the original time, or 1.6×.
However, we only see a 1.25× speedup because GC isn't multithreaded by default (in workstation GC). As per Fundamentals of Garbage Collection, all managed threads are paused during a Gen 0 or Gen 1 collection. (Concurrent and background GC, in .NET 4 and .NET 4.5, can collect Gen 2 on a background thread.) Your program experiences only a 1.25× speedup (and you see 30% CPU usage overall) because the threads spend most of their time being paused for GC (because the memory allocation pattern of this test program is very poor).
If you enable server GC, it will perform garbage collection on multiple threads. If I do this, the program runs 2× faster (with almost 100% CPU usage).
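In the .NET Framework, server GC is switched on through configuration (a gcServer element with enabled="true" under runtime in app.config); a quick way to confirm the running process actually picked it up:
// Prints True when the process is running with the server GC.
Console.WriteLine(System.Runtime.GCSettings.IsServerGC);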
When you run four instances of the program simultaneously, each has its own managed heap, and the garbage collection for the four processes can execute in parallel. This is why you see 100% CPU usage (each process is using 100% of one CPU). The slightly longer overall time (2.3s for all vs 2.05s for one) is possibly due to inaccuracies in measurement, contention for the disk, time taken to load the file, having to initialise the threadpool, overhead of context switching, or some other environment factor.
An attempt to explain the results:
a quick run in the VS profiler shows it's barely reaching 40% CPU utilization.
String.Split is the main hotspot.
so a shared something must be blocking the CPU.
that something is most likely memory allocation. Your bottlenecks are
var dic = new Dictionary<string, List<int>>();
...
dic[token].Add(1);
I replaced this with
var dic = new Dictionary<string, int>();
...
... else dic[token] += 1;
and the result is closer to a 2x speedup.
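Spelled out, the inner loop with that substitution would look roughly like this (my reconstruction, not the answerer's exact code):
var dic = new Dictionary<string, int>();
for (int i = range.Item1; i < range.Item2; i++)
{
    foreach (var token in input[i].Split())
    {
        if (!dic.ContainsKey(token))
            dic[token] = 1;        // first occurrence of the word
        else
            dic[token] += 1;       // every subsequent occurrence
    }
}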
But my counter question would be: does it matter? Your code is very artificial and incomplete. The parallel version ends up creating multiple dictionaries without merging them. This is not even close to a real situation. And as you can see, little details do matter.
Your sample code is too complex to make broad statements about Parallel.ForEach().
It is too simple to solve/analyze a real problem.
Just for fun, here is a shorter PLINQ version:
File.ReadAllText("big.txt").Split().AsParallel().GroupBy(t => t)
.ToDictionary(g => g.Key, g => g.Count());

Does looping occur at the same speed on all systems?

Does looping in C# occur at the same speed on all systems? If not, how can I control the looping speed to make the experience consistent on all platforms?
You can set a minimum time for the time taken to go around a loop, like this:
for (int i = 0; i < 10; i++)
{
    System.Threading.Thread.Sleep(100);
    // ... rest of your code ...
}
The sleep call will take a minimum of 100 ms (you cannot say what the maximum will be), so your loop will take at least 1 second to run 10 iterations.
Bear in mind that it's counter to the normal way of Windows programming to sleep on your user-interface thread, but this might be useful to you for a quick hack.
You can never depend on the speed of a loop. Although all existing compilers strive to make loops as efficient as possible, and so they probably produce very similar results (given enough development time), the compilers are not the only thing influencing this.
And even leaving everything else aside, different machines have different performance. No two machines will yield the exact same speed for a loop. In fact, even starting the program twice on the same machine will yield slightly different performances. It depends on what other programs are running, how the CPU is feeling today and whether or not the moon is shining.
No, loops do not run at the same speed on all systems. There are so many factors to this question that it cannot be appreciably answered without code.
This is a simple loop:
int j = 0;
for (int i = 0; i < 100; i++)
{
    j = j + i;
}
This loop is too simple; it's merely a pair of load, add, store operations, with a jump and a compare. This will be only a few microops and will be really fast. However, the speed of those microops depends on the processor. If the processor can do one microop in a billionth of a second (roughly one gigahertz), then the loop will take approximately 6 * 100 microops (this is all rough estimation; there are so many factors involved that I'm only going for an approximation), or 6 * 100 billionths of a second, which is slightly less than one millionth of a second for the entire loop. You can barely measure this with most operating system functions.
I wanted to demonstrate the speed of the looping. I referenced above a processor of 1 billion microops per second. Now consider a processor that can do 4 billion microops per second. That processor would be four times faster (roughly) than the first processor. And we didn't change the code.
Does this answer the question?
For those who want to mention that the compiler might loop unroll this, ignore that for the sake of the learning.
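A small sketch of just how little there is to measure; j is printed so the loop can't be optimized away, and the numbers will vary wildly from machine to machine:
var sw = Stopwatch.StartNew();
int j = 0;
for (int i = 0; i < 100; i++)
{
    j = j + i;
}
sw.Stop();
Console.WriteLine("{0} Stopwatch ticks ({1:F0} ns per tick), j = {2}",
    sw.ElapsedTicks, 1e9 / Stopwatch.Frequency, j);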
One way of controlling this is by using the Stopwatch to control when you do your logic. See this example code:
int noofrunspersecond = 30;
long ticks1 = 0;
long ticks2 = 0;
double interval = (double)Stopwatch.Frequency / noofrunspersecond;
while (true)
{
    ticks2 = Stopwatch.GetTimestamp();
    if (ticks2 >= ticks1 + interval)
    {
        ticks1 = Stopwatch.GetTimestamp();
        // perform your logic here
    }
    Thread.Sleep(1);
}
This will make sure that the logic is performed at the given interval, as long as the system can keep up. If you try to execute it 100 times per second, then depending on the logic performed, the system might not manage to run it 100 times a second; in other cases this should work just fine.
This kind of logic is good for getting smooth animations that will not speed up or slow down on different systems for example.
