Using QueryPerformanceCounter in C# but not getting precise results

I am using QueryPerformanceCounter to get precise timing in my C# program. But if I measure the timing of Thread.Sleep, it does not give the expected results.
I followed the standard example given on the MSDN site:
http://msdn.microsoft.com/en-us/library/ff650674.aspx
myTimer.Start();
for (int i = 0; i < iterations; i++)
{
// Method to time
System.Threading.Thread.Sleep(1000);
}
myTimer.Stop();
Results are:
Iterations: 5
Average time per iteration: 0.208957983452184 seconds
DateTime.Now gives:
Duration of test run: 5 seconds (5000 milliseconds)
Can you please suggest what the error could be?
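Without the timer class's Duration code it is hard to say where the discrepancy comes from. One quick cross-check is to time the same loop with System.Diagnostics.Stopwatch, which uses QueryPerformanceCounter on Windows when a high-resolution counter is available. The SleepTiming class below is a hypothetical sketch, not part of the MSDN sample; with 5 iterations of Sleep(1000) it should report roughly 1 second per iteration (usually slightly more, since Sleep waits at least about the requested time).

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class SleepTiming
{
    // Times `iterations` calls to Thread.Sleep(sleepMs) and returns the
    // average number of seconds per iteration.
    public static double AverageSecondsPerIteration(int iterations, int sleepMs)
    {
        if (iterations <= 0)
            throw new ArgumentOutOfRangeException(nameof(iterations));

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            Thread.Sleep(sleepMs); // method to time
        }
        sw.Stop();

        // Divide the total elapsed time by the iteration count exactly once.
        return sw.Elapsed.TotalSeconds / iterations;
    }
}
```

If this reports about 1.0 while the QueryPerfCounter-based code reports about 0.2, the difference most likely lies in how that code converts or divides its counter values rather than in the counter itself.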

Related

C# How to determine the time required to perform the push/pop or queue/dequeue operations on stacks and queues

Part of my task is to get the time required to push 10,000,000 values onto each stack and then the time required to pop 10,000,000 values from each stack. My stack code so far is below; I used 1,000,000 because 10,000,000 took too long.
Console.WriteLine ("The StackO stack : " + '\t' + " ");
for (int n = 1; n <= 10; n++)
{
DateTime stackOstartTime = DateTime.Now;
Stack StackO = new Stack ();
for (int i = 1; i <= 1000000*n; i++)
{
StackO.Push (i);
}
DateTime stackOendTime = DateTime.Now;
TimeSpan stackOtimeDifference = stackOendTime - stackOstartTime;
Console.WriteLine ("It took the number: " + n + " " + stackOtimeDifference + " to push.");
}
The part that I am stuck on is how to pop these values off my stack and get the time required to do that as well. I have tried a similar implementation, replacing StackO.Push(i) with StackO.Pop(), and I do not get anything. I have tried many different things and I am stuck. Any help would be appreciated, and if my stack push implementation is incorrect, any suggestions would be appreciated.
Here is my output:
The StackO stack :
It took the number: 1 00:00:00.1358860 to push.
It took the number: 2 00:00:00.2481400 to push.
It took the number: 3 00:00:00.4524940 to push.
It took the number: 4 00:00:00.5205040 to push.
It took the number: 5 00:00:00.7325030 to push.
It took the number: 6 00:00:00.6901880 to push.
It took the number: 7 00:00:00.9433310 to push.
It took the number: 8 00:00:01.1270300 to push.
It took the number: 9 00:00:01.0711620 to push.
It took the number: 10 00:00:01.3230030 to push.
The resolution of the clock that drives System.DateTime is not particularly good. As a couple of people have suggested, you need to use System.Diagnostics.Stopwatch to get better timing information:
var sw = new System.Diagnostics.Stopwatch();
int numItems = 50000000;
var stack = new Stack<int>();
sw.Restart();
for (int i = 0; i < numItems; ++i)
stack.Push(i);
sw.Stop();
var totalTime = sw.Elapsed.TotalSeconds;
Console.WriteLine("Push time: {0:#,#0.00}ms", 1000 * totalTime);
Console.WriteLine("Time per item: {0:#,#0.00}ns", (1000000000.0 * totalTime) / numItems);
Running this on my dev machine (it's a virtual machine, so stats are a bit hazy :P) gives the following output:
Push time: 1,278.07ms
Time per item: 25.56ns
That's 50 million items pushed in just over a second, with an average of around 25 nanoseconds per item inserted. Over several runs I get a fairly wide variance, but that's a fairly typical number.
For the pop measures, directly after the above:
int val;
sw.Restart();
while (stack.Any())
val = stack.Pop();
sw.Stop();
totalTime = sw.Elapsed.TotalSeconds;
Console.WriteLine("Pop time: {0:#,#0.00}ms", 1000 * totalTime);
Console.WriteLine("Time per item: {0:#,#0.00}ns", (1e+9 * totalTime) / numItems);
Output times for this were much longer:
Pop time: 5,067.17ms
Time per item: 101.34ns
About 4 times as long to pop as to push, which is interesting. Subsequent tests had the ratio varying between 3 and 5 times.
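One thing that may contribute to the gap (a hypothesis; these measurements do not isolate it): stack.Any() is LINQ's Enumerable.Any, and depending on the runtime version it may create an enumerator on every loop iteration instead of reading a count. Stack<T> exposes Count directly, so a variant of the pop loop can test that instead. StackTiming below is a hypothetical sketch for comparing the two:

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

public static class StackTiming
{
    // Pushes 0..numItems-1 onto a Stack<int>, then pops everything, timing
    // each phase separately. The pop loop tests stack.Count, not LINQ's Any().
    public static (TimeSpan Push, TimeSpan Pop, long Checksum) PushThenPop(int numItems)
    {
        var stack = new Stack<int>();
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < numItems; ++i)
            stack.Push(i);
        sw.Stop();
        TimeSpan push = sw.Elapsed;

        long checksum = 0; // use every popped value so the loop does observable work
        sw.Restart();
        while (stack.Count > 0)
            checksum += stack.Pop();
        sw.Stop();

        return (push, sw.Elapsed, checksum);
    }
}
```

Running PushThenPop(50000000) and comparing its Pop time with the Any()-based loop above would show how much of the difference, if any, comes from the loop condition.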
Now here's what you get from measuring 1000 items:
Push time: 0.06ms
Time per item: 64.60ns
Pop time: 0.40ms
Time per item: 402.70ns
At this scale the timer's resolution is a much larger fraction of each measurement, so the figures are noisier, but you still get real results. I added some DateTime.Now recording and ran the thing again, and the results were so far off the real times that they were utterly worthless: 2ms instead of 0.06ms, then 1ms instead of 0.4ms.
So if you are measuring very small time spans use a Stopwatch. If the times are long (at least a second) and you just want a rough figure then use DateTime. But since Stopwatch is so simple to use it's generally going to be the best option.
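To see how fine each clock actually is on a particular machine, a small probe like the hypothetical ClockResolution class below can help. It assumes DateTime.UtcNow is representative of DateTime.Now (Now additionally converts to local time), and what it reports depends on the OS and .NET version, so treat the output as a local observation rather than a fixed property:

```csharp
using System;
using System.Diagnostics;

public static class ClockResolution
{
    // Length of one Stopwatch tick in nanoseconds (1e9 / Stopwatch.Frequency).
    public static double StopwatchTickNs() => 1_000_000_000.0 / Stopwatch.Frequency;

    // Spins until DateTime.UtcNow reports a different value and returns the
    // size of that step: a rough empirical look at DateTime's update granularity.
    public static TimeSpan ObservedDateTimeStep()
    {
        DateTime start = DateTime.UtcNow;
        DateTime next;
        do
        {
            next = DateTime.UtcNow;
        } while (next == start);
        return next - start;
    }
}
```

Stopwatch.IsHighResolution additionally tells you whether Stopwatch is backed by a high-resolution performance counter on that machine.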

<= does work slower than <

I found a few questions on SO regarding the performance of < versus <= (this one was heavily downvoted), and they always gave the same answer: there is no performance difference between the two.
I wrote a program for the comparison (the fiddle doesn't really work; copy the code to your machine to run it) in which I created two loops, for (int i = 0; i <= 1000000000; i++ ) and for (int i = 0; i < 1000000001; i++ ), in two different methods.
I ran each method 100 times, took the average elapsed time, and found that the loop with the <= operator ran slower than the one with the < operator.
I ran the program multiple times and <= always took more time to complete.
My results (in ms; <= loop first, < loop second) were:
3018.73, 2778.22
2816.87, 2760.62
2859.02, 2797.05
My question is: If neither one is faster, why do I see differences in the results? Is there anything wrong with my program?
Benchmarking is a fine art. What you describe is not physically possible: the <= and < operators just generate different processor instructions that execute at exactly the same speed. I altered your program slightly, running DoIt ten times and dropping two zeros from the for() loop so I wouldn't have to wait forever:
x86 jitter:
Less Than Equal To Method Time Elapsed: 0.5
Less Than Method Time Elapsed: 0.42
Less Than Equal To Method Time Elapsed: 0.36
Less Than Method Time Elapsed: 0.46
Less Than Equal To Method Time Elapsed: 0.4
Less Than Method Time Elapsed: 0.34
Less Than Equal To Method Time Elapsed: 0.33
Less Than Method Time Elapsed: 0.35
Less Than Equal To Method Time Elapsed: 0.35
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.34
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.31
Less Than Equal To Method Time Elapsed: 0.34
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.31
Less Than Method Time Elapsed: 0.32
x64 jitter:
Less Than Equal To Method Time Elapsed: 0.44
Less Than Method Time Elapsed: 0.4
Less Than Equal To Method Time Elapsed: 0.44
Less Than Method Time Elapsed: 0.45
Less Than Equal To Method Time Elapsed: 0.36
Less Than Method Time Elapsed: 0.35
Less Than Equal To Method Time Elapsed: 0.38
Less Than Method Time Elapsed: 0.34
Less Than Equal To Method Time Elapsed: 0.33
Less Than Method Time Elapsed: 0.34
Less Than Equal To Method Time Elapsed: 0.34
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.35
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.42
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.31
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.35
The only real signal you get from this is the slow execution of the first DoIt(), also visible in your test results; that's jitter overhead. And the most important signal: it is noisy. The median value for both loops is about equal, and the standard deviation is rather large.
Otherwise, this is the kind of signal you always get when you micro-optimize: execution of code is not very deterministic. Apart from .NET runtime overhead, which is usually easy to eliminate, your program is not the only one that runs on your machine. It has to share the processor; even the WriteLine() call has an effect. The output is handled by the conhost.exe process, which runs concurrently with your test after your test code has entered the next for() loop. And everything else that happens on your machine, kernel code and interrupt handlers, also gets its turn.
And codegen can play a role; one thing you should do, for example, is simply swap the two calls. The processor itself generally executes code very non-deterministically. The state of the processor caches and how much historical data the branch prediction logic has gathered matter a great deal.
When I benchmark, I consider differences of 15% or less not statistically significant. Hunting down differences smaller than that is quite difficult; you have to study the machine code very carefully. Silly things like a misaligned branch target or a variable not getting stored in a processor register can have big effects on execution time. That is not something you can ever fix; the jitter does not have enough knobs to tweak.
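The median/standard-deviation view and the 15% rule of thumb described above can be sketched roughly as follows. BenchStats and its members are hypothetical helpers, not part of .NET; the timed work returns an int that is folded into a static field, on the assumption that this is enough to keep the JIT from discarding it.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class BenchStats
{
    // Results of the timed work are folded in here so they stay observable.
    public static int Sink;

    // One untimed warm-up call (so JIT compilation is not sampled), then
    // `rounds` separately timed calls; returns one sample in ms per call.
    public static List<double> Sample(Func<int> work, int rounds)
    {
        Sink ^= work();
        var samples = new List<double>(rounds);
        for (int r = 0; r < rounds; r++)
        {
            Stopwatch sw = Stopwatch.StartNew();
            Sink ^= work();
            sw.Stop();
            samples.Add(sw.Elapsed.TotalMilliseconds);
        }
        return samples;
    }

    // Middle value of the sorted samples (mean of the two middle values if even).
    public static double Median(IEnumerable<double> samples)
    {
        double[] s = samples.OrderBy(x => x).ToArray();
        int mid = s.Length / 2;
        return s.Length % 2 == 1 ? s[mid] : (s[mid - 1] + s[mid]) / 2.0;
    }

    // Population standard deviation.
    public static double StdDev(IEnumerable<double> samples)
    {
        double[] s = samples.ToArray();
        double mean = s.Average();
        return Math.Sqrt(s.Sum(x => (x - mean) * (x - mean)) / s.Length);
    }

    // |a - b| relative to the smaller value: 0.15 means a 15% difference.
    public static double RelativeDifference(double a, double b)
        => Math.Abs(a - b) / Math.Min(a, b);
}
```

For example, each loop could be wrapped in a method that returns its final counter, sampled 50 times with Sample, and compared by median; under the 15% rule of thumb, a RelativeDifference below 0.15 would not count as significant.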
First of all, there are many, many reasons to see variations in benchmarks, even when they're done right. Here are a few that come to mind:
Your computer is running a lot of other processes at the same time, switching things in and out of context, and so on. The operating system is constantly receiving and handling interrupts from various I/O devices, etc. All of these things can cause the computer to pause for periods of time that dwarf the running time for the actual code you're testing.
The JIT process can detect when a function has run a certain number of times, and apply additional optimizations to it based on that information. Things like loop unrolling can drastically reduce the number of jumps that the program has to make, which are significantly more expensive than typical CPU operations. Re-optimizing the instructions takes time when it first happens, and then speeds things up after that point.
Your hardware is trying to make additional optimizations, like branch prediction, to ensure that its pipeline is being used as efficiently as possible. (If it guesses correctly, it can basically pretend that it's going to do the i++ while it waits to see whether the < or <= comparison finishes, and then discard the result if it finds out it was wrong.) The impact of these optimizations depends on a lot of factors, and is not really easy to predict.
Secondly, it's actually really difficult to do benchmarking well. Here's a benchmark template that I've been using for a while now. It's not perfect, but it's pretty good at ensuring that any emerging patterns are unlikely to be based on order of execution or random chance:
/* This is a benchmarking template I use in LINQPad when I want to do a
* quick performance test. Just give it a couple of actions to test and
* it will give you a pretty good idea of how long they take compared
* to one another. It's not perfect: You can expect a 3% error margin
* under ideal circumstances. But if you're not going to improve
* performance by more than 3%, you probably don't care anyway.*/
void Main()
{
// Enter setup code here
var actions = new[]
{
new TimedAction("control", () =>
{
int i = 0;
}),
new TimedAction("<", () =>
{
for (int i = 0; i < 1000001; i++)
{}
}),
new TimedAction("<=", () =>
{
for (int i = 0; i <= 1000000; i++)
{}
}),
new TimedAction(">", () =>
{
for (int i = 1000001; i > 0; i--)
{}
}),
new TimedAction(">=", () =>
{
for (int i = 1000000; i >= 0; i--)
{}
})
};
const int TimesToRun = 10000; // Tweak this as necessary
TimeActions(TimesToRun, actions);
}
#region timer helper methods
// Define other methods and classes here
public void TimeActions(int iterations, params TimedAction[] actions)
{
Stopwatch s = new Stopwatch();
int length = actions.Length;
var results = new ActionResult[actions.Length];
// Perform the actions in their initial order.
for(int i = 0; i < length; i++)
{
var action = actions[i];
var result = results[i] = new ActionResult{Message = action.Message};
// Do a dry run to get things ramped up/cached
result.DryRun1 = s.Time(action.Action, 10);
result.FullRun1 = s.Time(action.Action, iterations);
}
// Perform the actions in reverse order.
for(int i = length - 1; i >= 0; i--)
{
var action = actions[i];
var result = results[i];
// Do a dry run to get things ramped up/cached
result.DryRun2 = s.Time(action.Action, 10);
result.FullRun2 = s.Time(action.Action, iterations);
}
results.Dump();
}
public class ActionResult
{
public string Message {get;set;}
public double DryRun1 {get;set;}
public double DryRun2 {get;set;}
public double FullRun1 {get;set;}
public double FullRun2 {get;set;}
}
public class TimedAction
{
public TimedAction(string message, Action action)
{
Message = message;
Action = action;
}
public string Message {get;private set;}
public Action Action {get;private set;}
}
public static class StopwatchExtensions
{
public static double Time(this Stopwatch sw, Action action, int iterations)
{
sw.Restart();
for (int i = 0; i < iterations; i++)
{
action();
}
sw.Stop();
return sw.Elapsed.TotalMilliseconds;
}
}
#endregion
Here's the result I get when running this in LINQPad: [Image: LINQPad results table for the benchmark above]
So you'll notice that there is some variation, particularly early on, but after running everything backwards and forwards enough times, there isn't a clear pattern emerging to show that one way is much faster or slower than another.

Time elapsed between two functions

I need to find the time elapsed by two functions that do the same operation but are written with different algorithms, in order to find the faster of the two.
Here is my code snippet:
Stopwatch sw = new Stopwatch();
sw.Start();
Console.WriteLine(sample.palindrome()); // algorithm 1
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);//tried sw.elapsed and sw.elapsedticks
sw.Reset(); //tried with and without reset
sw.Start();
Console.WriteLine(sample.isPalindrome()); //algorithm 2
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
Technically this should give the time taken by the two algorithms. It shows that algorithm 2 is faster. But it gives different times if I swap the order of the two calls: if I call algorithm 2 first and algorithm 1 second, it says algorithm 1 is faster.
I don't know what I am doing wrong.
I assume your palindrome methods run extremely fast in this example, so to get a real result you will need to run them many times and then decide which is faster.
Something like this:
int numberOfIterations = 1000; // you decide on a reasonable threshold.
sample.palindrome(); // Call this the first time and avoid measuring the JIT compile time
Stopwatch sw = new Stopwatch();
sw.Start();
for(int i = 0 ; i < numberOfIterations ; i++)
{
sample.palindrome(); // why console write?
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds); // or sw.ElapsedMilliseconds/numberOfIterations
Now do the same for the second method and you will get more realistic results.
What you must do is execute both methods once before the actual timed tests, so that the compiled code gets JIT'd. Then test with multiple tries. Here is a code mockup.
Code compiled to CIL is JIT-compiled (translated into machine code) upon first execution, so timing the first call is inaccurate. Let the code be JIT'd before actually testing it.
sample.palindrome();
sample.isPalindrome();
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 1000; i++)
{
sample.palindrome();
Console.WriteLine("palindrome test #{0} result: {1}", i, sw.ElapsedMilliseconds);
}
sw.Stop();
Console.WriteLine("palindrome test Final result: {0}", sw.ElapsedMilliseconds);
sw.Restart();
for (int i = 0; i < 1000; i++)
{
sample.isPalindrome();
Console.WriteLine("isPalindrome test #{0} result: {1}", i, sw.ElapsedMilliseconds);
}
sw.Stop();
Console.WriteLine("isPalindrome test Final result: {0}", sw.ElapsedMilliseconds);
Read more about CIL and JIT
Unless you provide the code of palindrome and isPalindrome function along with the sample class, I can't do much except speculate.
The most likely reason, I guess, is that both your functions use the same class variables and other data. So when you call the first function, it has to allocate memory for the variables, whereas by the time you call the other function, those one-time costs have already been paid. If not variables, it could be something else along the same lines.
I suggest that you call both functions twice and record the duration only on the second call, so that any resources they need have already been allocated and there is less chance of something behind the scenes skewing the result.
Let me know if this works. This is mere speculation on my part, and I may be wrong.
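The suggestions in these answers (warm up both methods, repeat many times) can be combined with alternating the call order, so that whichever method runs first does not systematically pay a penalty. AlgorithmComparison is a hypothetical sketch, and it assumes both palindrome methods return bool; adapt the delegate type if they do not.

```csharp
using System;
using System.Diagnostics;

public static class AlgorithmComparison
{
    // Results are folded in here so the calls cannot be optimized away.
    public static int Sink;

    // Warms up both methods, then times `batch` calls of each per round,
    // alternating which one goes first so order effects hit both equally.
    // Returns the total elapsed Stopwatch ticks for a and for b.
    public static (long TicksA, long TicksB) Compare(Func<bool> a, Func<bool> b, int rounds, int batch)
    {
        a(); // untimed warm-up: JIT-compile both methods first
        b();
        long ticksA = 0, ticksB = 0;
        var sw = new Stopwatch();
        for (int r = 0; r < rounds; r++)
        {
            if (r % 2 == 0)
            {
                ticksA += TimeBatch(sw, a, batch);
                ticksB += TimeBatch(sw, b, batch);
            }
            else
            {
                ticksB += TimeBatch(sw, b, batch);
                ticksA += TimeBatch(sw, a, batch);
            }
        }
        return (ticksA, ticksB);
    }

    private static long TimeBatch(Stopwatch sw, Func<bool> f, int batch)
    {
        int trues = 0;
        sw.Restart();
        for (int i = 0; i < batch; i++)
        {
            if (f()) trues++;
        }
        sw.Stop();
        Sink += trues;
        return sw.ElapsedTicks;
    }
}
```

A call such as Compare(() => sample.palindrome(), () => sample.isPalindrome(), 20, 1000) would then compare the two totals; swapping the argument order should no longer flip the verdict if the difference is real.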

Performance profiling in .NET

I wrote a class which uses Stopwatch to profile methods and for/foreach loops. For for and foreach loops, it compares a standard loop against a Parallel.For or Parallel.ForEach implementation.
You would write performance tests like so:
Method:
PerformanceResult result = Profiler.Execute(() => { FooBar(); });
For loop:
SerialParallelPerformanceResult result = Profiler.For(0, 100, x => { FooBar(x); });
ForEach loop:
SerialParallelPerformanceResult result = Profiler.ForEach(list, item => { FooBar(item); });
Whenever I run the tests (one of .Execute, .For or .ForEach) I put them in a loop so I can see how the performance changes over time.
Example of performance might be:
Method execution 1 = 200ms
Method execution 2 = 12ms
Method execution 3 = 0ms
For execution 1 = 300ms (Serial), 100ms (Parallel)
For execution 2 = 20ms (Serial), 75ms (Parallel)
For execution 3 = 2ms (Serial), 50ms (Parallel)
ForEach execution 1 = 350ms (Serial), 300ms (Parallel)
ForEach execution 2 = 24ms (Serial), 89ms (Parallel)
ForEach execution 3 = 1ms (Serial), 21ms (Parallel)
My questions are:
Why does performance change over time, what is .NET doing in the background to facilitate this?
How/why is a serial operation faster than a parallel one? I have made sure the operations are complex enough to see the difference properly... in most cases serial operations seem faster!?
NOTE: For parallel processing I am testing on an 8 core machine.
After some more exploration into performance profiling, I have discovered that using a Stopwatch is not an accurate way to measure the performance of a particular task.
(Thanks hatchet and Loren for your comments on this!)
Reasons a Stopwatch is not accurate:
Measurements are elapsed wall-clock time (typically read in milliseconds), not CPU time.
Measurements can be influenced by background "noise" and thread intensive processes.
Measurements do not take into account JIT compilation and overhead.
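The wall-clock versus CPU-time distinction in the first point can be observed directly. The hypothetical CpuVsWall helper below reads Stopwatch for wall-clock time and Process.TotalProcessorTime for CPU time; note that the latter covers the whole process (all threads, including runtime threads), not just the code being measured.

```csharp
using System;
using System.Diagnostics;

public static class CpuVsWall
{
    // Wall-clock time from Stopwatch, and CPU time used by the whole process
    // (all threads) from Process.TotalProcessorTime, around one call.
    public static (TimeSpan Wall, TimeSpan Cpu) Measure(Action action)
    {
        using (Process proc = Process.GetCurrentProcess())
        {
            TimeSpan cpuBefore = proc.TotalProcessorTime;
            Stopwatch sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            proc.Refresh(); // discard any cached process information before reading again
            TimeSpan cpuAfter = proc.TotalProcessorTime;
            return (sw.Elapsed, cpuAfter - cpuBefore);
        }
    }
}
```

Measuring something like Thread.Sleep(50) this way typically shows about 50 ms of wall-clock time but very little CPU time. The resolution of TotalProcessorTime can be coarse, so it is better suited to longer-running work.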
That being said, using a stopwatch is OK for casual exploration of performance. With that in mind, I have improved my profiling algorithm somewhat.
Where before it simply executed the expression passed to it, it can now run the expression several times and build an average execution time. The first run can be omitted, since this is where JIT compilation kicks in and major overhead may occur. Understandably, this will never be as sophisticated as a professional profiling tool like Redgate's ANTS profiler, but it's OK for simpler tasks!
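The poster's Profiler class is not shown, so the following is only a hypothetical sketch of the idea described above; SimpleProfiler and Execute are illustrative names, not the poster's API. It reports the cold first run separately and averages only the warm runs.

```csharp
using System;
using System.Diagnostics;

public static class SimpleProfiler
{
    // Runs `action` `iterations` times. The first (cold) run is reported on its
    // own because it includes JIT compilation; the average covers the warm runs.
    public static (double FirstRunMs, double WarmAverageMs) Execute(Action action, int iterations)
    {
        if (iterations < 2)
            throw new ArgumentOutOfRangeException(nameof(iterations), "Need at least two runs.");

        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        double firstRunMs = sw.Elapsed.TotalMilliseconds;

        sw.Restart();
        for (int i = 1; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return (firstRunMs, sw.Elapsed.TotalMilliseconds / (iterations - 1));
    }
}
```

Timing the whole warm batch with one Stopwatch, rather than summing many tiny per-call readings, keeps the timer's own resolution from dominating when a single call is very fast.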
As per my comment above: I did some simple tests on my own and found no difference over time. Can you share your code? I'll put mine in an answer as it doesn't fit here.
This is my sample code.
(I also tried with both static and instance methods with no difference)
using System;
using System.Diagnostics;
class Program
{
static void Main(string[] args)
{
int to = 50000000;
OtherStuff os = new OtherStuff();
Console.WriteLine(Profile(() => os.CountTo(to)));
Console.WriteLine(Profile(() => os.CountTo(to)));
Console.WriteLine(Profile(() => os.CountTo(to)));
}
static long Profile(Action method)
{
Stopwatch st = Stopwatch.StartNew();
method();
st.Stop();
return st.ElapsedMilliseconds;
}
}
class OtherStuff
{
public void CountTo(int to)
{
for (int i = 0; i < to; i++)
{
// some work...
i++;
i--;
}
}
}
A sample output would be:
331
331
334
Consider executing this method instead:
using System;
using System.Linq;
using System.Security.Cryptography;
class OtherStuff
{
public string CountTo(Guid id)
{
using(SHA256 sha = SHA256.Create())
{
int x = default(int);
for (int index = 0; index < 16; index++)
{
x = id.ToByteArray()[index] >> 32 << 16;
}
RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
byte[] y = new byte[1024];
rng.GetBytes(y);
y = y.Concat(BitConverter.GetBytes(x)).ToArray();
return BitConverter.ToString(sha.ComputeHash(BitConverter.GetBytes(x).Where(o => o >> 2 < 0).ToArray()));
}
}
}
Sample output:
11
0
0

Problem in calculating time taken to execute a function

I am trying to find the time taken to run a function. I am doing it this way:
SomeFunc(input) {
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
//some operation on input
stopWatch.Stop();
long timeTaken = stopWatch.ElapsedMilliseconds;
}
Now, the "some operation on input" mentioned in the comment takes a significant amount of time, depending on the input to SomeFunc.
The problem is that when I call SomeFunc multiple times from main, timeTaken is correct only the first time; every other time it is 0. Is there a problem with the above code?
EDIT:
There is a UI with multiple text fields, and when a button is clicked, it is delegated to the SomeFunc. The SomeFunc makes some calculations based on the input (from the text fields) and displays the result on the UI. I am not allowed to share the code in "some operation on input" since I have signed an NDA. I can however answer your questions as to what I am trying to achieve there. Please help.
EDIT 2:
Since I seem to get an odd value the first time the function is called, and as @Mike Bantegui mentioned there must be JIT optimization going on, the only solution I can think of now (to avoid getting zero as the execution time) is to display the time in nanoseconds. How can I display the time in nanoseconds in C#?
Well, you aren't outputting that data anywhere. Ideally you would do something more like this:
void SomeFunc(input)
{
// do stuff
}
void Main()
{
List<long> results = new List<long>();
Stopwatch sw = new Stopwatch();
for(int i = 0; i < MAX_TRIES; i++)
{
sw.Start();
SomeFunc(arg);
sw.Stop();
results.Add(sw.ElapsedMilliseconds);
sw.Reset();
}
//Perform analyses and results
}
In fact, you are getting the wrong time on the first call and the correct time on the remaining ones; you can't rely on just the first call to measure the time. However, it seems the operation is too fast, so you get 0 as the result. To measure it correctly, call the function many times (1000, for example) and look at the average cost:
Stopwatch watch = Stopwatch.StartNew();
for (int index = 0; index < 1000; index++)
{
SomeFunc(input);
}
watch.Stop();
Console.WriteLine(watch.ElapsedMilliseconds);
Edit:
How is it possible to display the time in nano seconds
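You can get watch.ElapsedTicks and convert it to nanoseconds with floating-point arithmetic: watch.ElapsedTicks * 1000000000.0 / Stopwatch.Frequency. (Writing (watch.ElapsedTicks / Stopwatch.Frequency) * 1000000000 divides two longs first, which truncates any interval shorter than one second to 0.)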
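The tick-to-nanosecond conversion can be wrapped in a small helper. StopwatchUnits is a hypothetical name; the arithmetic uses only Stopwatch.ElapsedTicks and Stopwatch.Frequency.

```csharp
using System;
using System.Diagnostics;

public static class StopwatchUnits
{
    // Converts Stopwatch ticks to nanoseconds. Multiplying by a double first
    // avoids the integer division ticks / frequency, which truncates every
    // interval shorter than one second to 0.
    public static double TicksToNanoseconds(long ticks, long frequency)
        => ticks * 1_000_000_000.0 / frequency;

    public static double ElapsedNanoseconds(Stopwatch sw)
        => TicksToNanoseconds(sw.ElapsedTicks, Stopwatch.Frequency);
}
```

Note that the result is never finer than one tick, i.e. 1e9 / Stopwatch.Frequency nanoseconds, so printing nanoseconds does not add precision beyond what the underlying counter provides.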
As a simple (contrived) example, consider the following:
double Mean(List<double> items)
{
double mu = 0;
foreach (double val in items)
mu += val;
return mu / items.Count; // List<T> has Count, not Length
}
We can time it like so:
void DoTimings(int n)
{
Stopwatch sw = new Stopwatch();
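long time = 0; // total milliseconds across all runs (ElapsedMilliseconds is a long)
double dummy = 0;
for (int i = 0; i < n; i++)
{
List<double> items = new List<double>();
// populate items with random numbers, excluded for brevity
sw.Restart(); // reset first so each run is timed on its own, not cumulatively
dummy += Mean(items);
sw.Stop();
time += sw.ElapsedMilliseconds;
}
Console.WriteLine(dummy);
Console.WriteLine((double)time / n); // average per run, without integer truncation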
}
This works if the list of items is actually very large. But if it's too small, we'll have to do multiple runs under one timing:
void DoTimings(int n)
{
Stopwatch sw = new Stopwatch();
double dummy = 0;
List<double> items = new List<double>(); // Reuse same list
// populate items with random numbers, excluded for brevity
sw.Start();
for (int i = 0; i < n; i++)
{
dummy += Mean(items);
}
sw.Stop(); // one measurement covering all n calls
Console.WriteLine(dummy);
Console.WriteLine(sw.Elapsed.TotalMilliseconds / n); // average per call
}
In the second example, even if the list is small, we can get an accurate idea of how long each call takes simply by running this for a large enough n. Each approach has its advantages and flaws, though.
However, before doing either of these I would do a "warm-up" calculation beforehand:
// Or something smaller, just enough to let the compiler JIT
double dummy = 0;
for (int i = 0; i < 10000; i++)
dummy += Mean(data);
Console.WriteLine(dummy);
// Now do the actual timing
An alternative to both would be to do what @Rig did in his answer and build up a list of results to compute statistics on. In the first case, you'd simply build up a list of each individual time. In the second case, you would build up a list of the average time of multiple runs, since the time for a single calculation could be smaller than the finest-grained time your Stopwatch can measure.
With all that said, I would say there is one very large caveat in all of this: Calculating the time it takes for something to run is very hard to do properly. It's admirable to want to do profiling, but you should do some research on SO and see what other people have done to do this properly. It's very easy to write a routine that times something badly, but very hard to do it right.
