I found a few questions on SO comparing the performance of < and <= (this one was heavily downvoted), and they all gave the same answer: there is no performance difference between the two.
I wrote a program to compare them (the fiddle doesn't quite run online... copy it to your machine to run it) in which I created two loops, for (int i = 0; i <= 1000000000; i++ ) and for (int i = 0; i < 1000000001; i++ ), in two different methods.
I ran each method 100 times, took the average of the elapsed times, and found that the loop with the <= operator ran slower than the one with the < operator.
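The two methods look roughly like this (a sketch reconstructed from the description above, since the fiddle itself isn't included here; the method names are illustrative):
// requires using System.Diagnostics;
static double LessThanEqualToMethod()
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i <= 1000000000; i++) { }
    sw.Stop();
    return sw.Elapsed.TotalMilliseconds;
}

static double LessThanMethod()
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < 1000000001; i++) { }
    sw.Stop();
    return sw.Elapsed.TotalMilliseconds;
}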
I ran the program multiple times and <= always took more time to complete.
My results (in ms) were:
3018.73, 2778.22
2816.87, 2760.62
2859.02, 2797.05
My question is: If neither one is faster, why do I see differences in the results? Is there anything wrong with my program?
Benchmarking is a fine art. What you describe is not physically possible; the <= and < operators simply generate different processor instructions that execute at exactly the same speed. I altered your program slightly, running DoIt ten times and dropping two zeros from the for() loop so I wouldn't have to wait forever:
x86 jitter:
Less Than Equal To Method Time Elapsed: 0.5
Less Than Method Time Elapsed: 0.42
Less Than Equal To Method Time Elapsed: 0.36
Less Than Method Time Elapsed: 0.46
Less Than Equal To Method Time Elapsed: 0.4
Less Than Method Time Elapsed: 0.34
Less Than Equal To Method Time Elapsed: 0.33
Less Than Method Time Elapsed: 0.35
Less Than Equal To Method Time Elapsed: 0.35
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.34
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.31
Less Than Equal To Method Time Elapsed: 0.34
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.31
Less Than Method Time Elapsed: 0.32
x64 jitter:
Less Than Equal To Method Time Elapsed: 0.44
Less Than Method Time Elapsed: 0.4
Less Than Equal To Method Time Elapsed: 0.44
Less Than Method Time Elapsed: 0.45
Less Than Equal To Method Time Elapsed: 0.36
Less Than Method Time Elapsed: 0.35
Less Than Equal To Method Time Elapsed: 0.38
Less Than Method Time Elapsed: 0.34
Less Than Equal To Method Time Elapsed: 0.33
Less Than Method Time Elapsed: 0.34
Less Than Equal To Method Time Elapsed: 0.34
Less Than Method Time Elapsed: 0.32
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.35
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.42
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.31
Less Than Equal To Method Time Elapsed: 0.32
Less Than Method Time Elapsed: 0.35
The only real signal you get from this is the slow execution of the first DoIt(), also visible in your test results; that's jitter overhead. And the most important signal: it is noisy. The median value for both loops is about equal, and the standard deviation is rather large.
Otherwise it's the kind of signal you always get when you micro-optimize: code execution is not very deterministic. Apart from .NET runtime overhead, which is usually easy to eliminate, your program is not the only one running on your machine. It has to share the processor; just the WriteLine() call already has an effect. It is executed by the conhost.exe process and runs concurrently with your test while your test code has entered the next for() loop. And everything else that happens on your machine gets its turn too, kernel code and interrupt handlers included.
And codegen can play a role; one thing you should do, for example, is simply swap the two calls. The processor itself executes code quite non-deterministically in general. The state of the processor caches and how much historical data has been gathered by the branch prediction logic matter a great deal.
When I benchmark, I consider differences of 15% or less not statistically significant. Hunting down differences smaller than that is quite difficult; you have to study the machine code very carefully. Silly things like a branch target being misaligned or a variable not getting stored in a processor register can have big effects on execution time. And they are not something you can ever fix; the jitter does not have enough knobs to tweak.
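As a rough illustration of that rule of thumb, a small helper might look like this (a sketch; the 15% threshold and the use of the median follow the text above, everything else is illustrative):
// requires using System; using System.Linq;
// Compare two sets of timings: anything within 15% is treated as noise.
// The median is used so a slow first (jitted) run doesn't skew the comparison.
static bool IsSignificant(double[] timingsA, double[] timingsB)
{
    double medianA = Median(timingsA);
    double medianB = Median(timingsB);
    double relativeDifference = Math.Abs(medianA - medianB) / Math.Min(medianA, medianB);
    return relativeDifference > 0.15;
}

static double Median(double[] values)
{
    var sorted = values.OrderBy(v => v).ToArray();
    int mid = sorted.Length / 2;
    return sorted.Length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
}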
First of all, there are many, many reasons to see variations in benchmarks, even when they're done right. Here are a few that come to mind:
Your computer is running a lot of other processes at the same time, switching things in and out of context, and so on. The operating system is constantly receiving and handling interrupts from various I/O devices, etc. All of these things can cause the computer to pause for periods of time that dwarf the running time for the actual code you're testing.
The JIT process can detect when a function has run a certain number of times and apply additional optimizations to it based on that information. Things like loop unrolling can drastically reduce the number of jumps that the program has to make, which are significantly more expensive than typical CPU operations (a rolled vs. manually unrolled loop is sketched after this list). Re-optimizing the instructions takes time when it first happens, and then speeds things up after that point.
Your hardware is trying to make additional optimizations, like branch prediction, to ensure that its pipeline is being used as efficiently as possible. (If it guesses correctly, it can basically pretend that it's going to do the i++ while it waits to see whether the < or <= comparison finishes, and then discard the result if it finds out it was wrong.) The impact of these optimizations depends on a lot of factors, and is not really easy to predict.
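To make the loop-unrolling point above concrete, here is a hand-written sketch of a rolled loop versus one unrolled by hand; it only illustrates the idea and is not what the JIT actually emits:
// Rolled: one compare-and-branch per element.
static int SumRolled(int[] data)
{
    int sum = 0;
    for (int i = 0; i < data.Length; i++)
    {
        sum += data[i];
    }
    return sum;
}

// Unrolled by 4: one compare-and-branch per four elements, plus a small tail loop.
static int SumUnrolled(int[] data)
{
    int sum = 0;
    int i = 0;
    for (; i + 4 <= data.Length; i += 4)
    {
        sum += data[i] + data[i + 1] + data[i + 2] + data[i + 3];
    }
    for (; i < data.Length; i++) // handle the leftover elements
    {
        sum += data[i];
    }
    return sum;
}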
Secondly, it's actually really difficult to do benchmarking well. Here's a benchmark template that I've been using for a while now. It's not perfect, but it's pretty good at ensuring that any emerging patterns are unlikely to be based on order of execution or random chance:
/* This is a benchmarking template I use in LINQPad when I want to do a
* quick performance test. Just give it a couple of actions to test and
* it will give you a pretty good idea of how long they take compared
* to one another. It's not perfect: You can expect a 3% error margin
* under ideal circumstances. But if you're not going to improve
* performance by more than 3%, you probably don't care anyway.*/
void Main()
{
    // Enter setup code here
    var actions = new[]
    {
        new TimedAction("control", () =>
        {
            int i = 0;
        }),
        new TimedAction("<", () =>
        {
            for (int i = 0; i < 1000001; i++)
            {}
        }),
        new TimedAction("<=", () =>
        {
            for (int i = 0; i <= 1000000; i++)
            {}
        }),
        new TimedAction(">", () =>
        {
            for (int i = 1000001; i > 0; i--)
            {}
        }),
        new TimedAction(">=", () =>
        {
            for (int i = 1000000; i >= 0; i--)
            {}
        })
    };
    const int TimesToRun = 10000; // Tweak this as necessary
    TimeActions(TimesToRun, actions);
}
#region timer helper methods
// Define other methods and classes here
public void TimeActions(int iterations, params TimedAction[] actions)
{
    Stopwatch s = new Stopwatch();
    int length = actions.Length;
    var results = new ActionResult[actions.Length];
    // Perform the actions in their initial order.
    for (int i = 0; i < length; i++)
    {
        var action = actions[i];
        var result = results[i] = new ActionResult { Message = action.Message };
        // Do a dry run to get things ramped up/cached
        result.DryRun1 = s.Time(action.Action, 10);
        result.FullRun1 = s.Time(action.Action, iterations);
    }
    // Perform the actions in reverse order.
    for (int i = length - 1; i >= 0; i--)
    {
        var action = actions[i];
        var result = results[i];
        // Do a dry run to get things ramped up/cached
        result.DryRun2 = s.Time(action.Action, 10);
        result.FullRun2 = s.Time(action.Action, iterations);
    }
    results.Dump();
}

public class ActionResult
{
    public string Message { get; set; }
    public double DryRun1 { get; set; }
    public double DryRun2 { get; set; }
    public double FullRun1 { get; set; }
    public double FullRun2 { get; set; }
}

public class TimedAction
{
    public TimedAction(string message, Action action)
    {
        Message = message;
        Action = action;
    }
    public string Message { get; private set; }
    public Action Action { get; private set; }
}

public static class StopwatchExtensions
{
    public static double Time(this Stopwatch sw, Action action, int iterations)
    {
        sw.Restart();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds;
    }
}
#endregion
Here's the result I get when running this in LINQPad:
So you'll notice that there is some variation, particularly early on, but after running everything backwards and forwards enough times, there isn't a clear pattern emerging to show that one way is much faster or slower than another.
Related
I've been trying to grab the CPU usage of a process to verify it isn't hanging. The method NextValue() seems to be the recommended approach, but it seems to only get the CPU usage at the time of checking, and not the total CPU used in between checks (experimentation code and results below)
I ran the following code to see if I could figure out the relationship between the two values, but I can't seem to tell if they actually relate to each other directly. I can tell that 'foo.TotalProcessorTime' does seem to line up with counter.RawValue exactly.
var foo = Process.Start("MyProces.exe");
PerformanceCounter counter = new PerformanceCounter("Process", "% Processor Time", foo.ProcessName, true);
decimal totalValue = 0;
while (true)
{
    var newTotal = counter.RawValue;
    var increment = counter.NextValue();
    var theoryTotal = totalValue + (decimal)increment * 10000; // tried 10,000 and 100,000
    logger.Info($"New Total: {newTotal} increment {increment} oldTotal + increment {theoryTotal} difference {newTotal - theoryTotal}");
    totalValue = newTotal;
    await Task.Delay(100); // different increments don't seem to affect the magnitude of NextValue
}
The following is my experimentation with NextValue(); the logs on the right are for the 1s interval, the ones on the left for a 10s interval, with me doing the same things to the process in question.
counter.NextValue(); // initialize
while (true)
{
    await Task.Delay(10000); // tried 1s and 10s intervals, greater values on the 1s interval
    logger.Info($"Percentage: {counter.NextValue()}");
}
TL;DR: How is NextValue() derived from RawValue? Or are they metrics that measure the same thing in completely unconnected ways? And if NextValue() really measures the value between calls to NextValue(), why am I getting 0s when I use a long interval between measurements and leave the process alone at the time I call NextValue()?
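If RawValue is the accumulated CPU time in 100 ns units (an assumption on my part, though it does line up with foo.TotalProcessorTime), then NextValue() should roughly be the change in RawValue divided by the elapsed wall-clock time, expressed as a percentage, which would also explain the 0s after a long idle interval. A sketch of that check against my counter instance:
// Assumption: RawValue for "% Processor Time" is accumulated CPU time in 100 ns ticks,
// and NextValue() is roughly 100 * (delta CPU time / delta wall-clock time) between samples.
long previousRaw = counter.RawValue;
long previousTime = Stopwatch.GetTimestamp();

await Task.Delay(1000);

long currentRaw = counter.RawValue;
long currentTime = Stopwatch.GetTimestamp();

double cpuSeconds = (currentRaw - previousRaw) / 1e7;                       // 100 ns ticks -> seconds
double wallSeconds = (currentTime - previousTime) / (double)Stopwatch.Frequency;
double manualPercent = 100.0 * cpuSeconds / wallSeconds;                     // compare with counter.NextValue()
Console.WriteLine($"manual: {manualPercent}, NextValue: {counter.NextValue()}");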
I need to find the time elapsed for two functions that do the same operation but are written with different algorithms. I need to find out which of the two is faster.
Here is my code snippet
Stopwatch sw = new Stopwatch();
sw.Start();
Console.WriteLine(sample.palindrome()); // algorithm 1
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);//tried sw.elapsed and sw.elapsedticks
sw.Reset(); //tried with and without reset
sw.Start();
Console.WriteLine(sample.isPalindrome()); //algorithm 2
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
Technically this should give the time taken by the two algorithms. It shows that algorithm 2 is faster. But it gives different results if I interchange the calls to the two functions: if I call algorithm 2 first and algorithm 1 second, it says algorithm 1 is faster.
I don't know what I am doing wrong.
I assume your palindrome methods run extremely fast in this example, so in order to get a real result you will need to run them many times and then decide which is faster.
Something like this:
int numberOfIterations = 1000; // you decide on a reasonable threshold.
sample.palindrome(); // Call this the first time and avoid measuring the JIT compile time

Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < numberOfIterations; i++)
{
    sample.palindrome(); // why console write?
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds); // or sw.ElapsedMilliseconds / numberOfIterations
Now do the same for the second method and you will get more realistic results.
What you must do is execute both methods before the actual measured tests, so that the compiled code gets JIT'd. Then test with multiple tries. Here is a code mockup.
The compiled code in CIL format is JIT'd upon first execution, i.e., translated into machine code, so timing the very first run is inaccurate. Let the code be JIT'd before actually testing it.
sample.palindrome();
sample.isPalindrome();

Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 1000; i++)
{
    sample.palindrome();
    Console.WriteLine("palindrome test #{0} result: {1}", i, sw.ElapsedMilliseconds);
}
sw.Stop();
Console.WriteLine("palindrome test Final result: {0}", sw.ElapsedMilliseconds);

sw.Restart();
for (int i = 0; i < 1000; i++)
{
    sample.isPalindrome();
    Console.WriteLine("isPalindrome test #{0} result: {1}", i, sw.ElapsedMilliseconds);
}
sw.Stop();
Console.WriteLine("isPalindrome test Final result: {0}", sw.ElapsedMilliseconds);
Read more about CIL and JIT
Unless you provide the code of the palindrome and isPalindrome functions along with the sample class, I can't do much except speculate.
My best guess is that both your functions use the same class variables and other data. When you call the first function, it has to allocate memory for those variables, whereas by the time you call the other function, those one-time expenses have already been paid. If it's not the variables, it could be something else along the same lines.
I suggest that you call both functions twice and note the duration only the second time each function is called, so that any resources they need have already been allocated and there's less chance of something behind the scenes skewing the result.
Let me know if this works. This is mere speculation on my part, and I may be wrong.
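A minimal sketch of that suggestion (sample, palindrome, and isPalindrome come from the question, so only their usage is assumed here):
// Warm-up calls: let JIT compilation and any one-time allocations happen first.
sample.palindrome();
sample.isPalindrome();

// Only the second calls are timed.
Stopwatch sw = Stopwatch.StartNew();
sample.palindrome();
sw.Stop();
Console.WriteLine("palindrome:   {0} ticks", sw.ElapsedTicks);

sw.Restart();
sample.isPalindrome();
sw.Stop();
Console.WriteLine("isPalindrome: {0} ticks", sw.ElapsedTicks);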
I need to measure the communication latency between two processes on the same machine. The best way I have come up with is serializing DateTime.UtcNow (DateTime.Now seems to be so utterly slow that it badly distorts my measurements) into the message and comparing it with DateTime.UtcNow in the other process. Is this as good as it gets? Or is there a better way?
If your goal is to measure and compare exact times between processes, you should use the Windows API function QueryPerformanceCounter(). The values that it returns are synchronized between processes because it returns an internal processor value.
Stopwatch also uses QueryPerformanceCounter() in its implementation, but it doesn't expose the absolute values that are returned so you can't use it.
You will have to use P/Invoke to call QueryPerformanceCounter() but it's pretty easy.
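A minimal sketch of the P/Invoke declarations and their use (QueryPerformanceFrequency() is the companion API that converts counter ticks into seconds; the wrapper class and its names are just for illustration):
using System;
using System.Runtime.InteropServices;

static class HighResClock
{
    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool QueryPerformanceCounter(out long lpPerformanceCount);

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool QueryPerformanceFrequency(out long lpFrequency);

    // Absolute counter value; comparable across processes on the same machine.
    public static long Now()
    {
        long value;
        QueryPerformanceCounter(out value);
        return value;
    }

    // Convert a difference between two counter values into seconds.
    public static double TicksToSeconds(long ticks)
    {
        long frequency;
        QueryPerformanceFrequency(out frequency);
        return ticks / (double)frequency;
    }
}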
The overhead of using P/Invoke is small. From the MSDN documentation:
PInvoke has an overhead of between 10 and 30 x86 instructions per call. In addition to this fixed cost, marshaling creates additional overhead. There is no marshaling cost between blittable types that have the same representation in managed and unmanaged code. For example, there is no cost to translate between int and Int32.
Since the value returned from QueryPerformanceCounter() is a long, there will be no additional marshaling cost from it, so you're left with an overhead of 10-30 instructions.
Also see this MSDN blog where it is stated that the resolution of UtcNow is around 10ms - which is pretty huge compared to the resolution of the performance counter. (Although I actually don't believe this is true for Windows 8; my measurements seem to show that UtcNow has a millisecond resolution).
Anyway, it is easy to demonstrate that P/Invoking QueryPerformanceCounter() has a higher resolution than using DateTime.UtcNow.
If you run a release build of the following code (run from OUTSIDE a debugger), you'll see that almost all the DateTime.UtcNow elapsed times are 0, whereas all the QueryPerformanceCounter() ones are nonzero.
This is because the resolution of DateTime.UtcNow is not high enough to measure the elapsed time of calling Thread.Sleep(0), whereas QueryPerformanceCounter() is.
using System;
using System.Runtime.InteropServices;
using System.Threading;

namespace ConsoleApplication1
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            for (int i = 0; i < 100; ++i)
            {
                var t1 = DateTime.UtcNow;
                Thread.Sleep(0);
                var t2 = DateTime.UtcNow;
                Console.WriteLine("UtcNow elapsed = " + (t2 - t1).Ticks);
            }

            for (int i = 0; i < 100; ++i)
            {
                long q1, q2;
                QueryPerformanceCounter(out q1);
                Thread.Sleep(0);
                QueryPerformanceCounter(out q2);
                Console.WriteLine("QPC elapsed = " + (q2 - q1));
            }
        }

        [DllImport("kernel32.dll", SetLastError = true)]
        static extern bool QueryPerformanceCounter(out long lpPerformanceCount);
    }
}
Now I realise that it could be that the overhead of calling QueryPerformanceCounter() is so high that it is measuring how long it is taking to call, rather than how long Thread.Sleep(0) takes. We can eliminate that in two ways:
Firstly, we can modify the first loop as follows:
for (int i = 0; i < 100; ++i)
{
    var t1 = DateTime.UtcNow;
    long dummy;
    QueryPerformanceCounter(out dummy);
    Thread.Sleep(0);
    QueryPerformanceCounter(out dummy);
    var t2 = DateTime.UtcNow;
    Console.WriteLine("UtcNow elapsed = " + (t2 - t1).Ticks);
}
Now the UtcNow measurement should be timing Thread.Sleep(0) plus two calls to QueryPerformanceCounter(). But if you run it, you'll still see almost all the elapsed times come out as zero.
Secondly, we can time how long it takes to call QueryPerformanceCounter() a million times:
var t1 = DateTime.UtcNow;
for (int i = 0; i < 1000000; ++i)
{
    long dummy;
    QueryPerformanceCounter(out dummy);
}
var t2 = DateTime.UtcNow;
Console.WriteLine("Elapsed = " + (t2 - t1).TotalMilliseconds);
On my system it takes around 32ms to call QueryPerformanceCounter() one million times.
Finally, we can time how long it takes to call DateTime.UtcNow one million times:
var t1 = DateTime.UtcNow;
for (int i = 0; i < 1000000; ++i)
{
    var dummy = DateTime.UtcNow;
}
var t2 = DateTime.UtcNow;
Console.WriteLine("Elapsed = " + (t2 - t1).TotalMilliseconds);
On my system that takes around 10ms, which is around 3 times faster than calling QueryPerformanceCounter().
In Summary
So DateTime.UtcNow has lower overhead than P/Invoking QueryPerformanceCounter(), but it has much lower resolution.
So you pays your money and you takes your choice!
I wrote a class which uses Stopwatch to profile methods and for/foreach loops. With for and foreach loops it tests a standard loop against a Parallel.For or Parallel.ForEach implementation.
You would write performance tests like so:
Method:
PerformanceResult result = Profiler.Execute(() => { FooBar(); });
For loop:
SerialParallelPerformanceResult result = Profiler.For(0, 100, x => { FooBar(x); });
ForEach loop:
SerialParallelPerformanceResult result = Profiler.ForEach(list, item => { FooBar(item); });
Whenever I run the tests (one of .Execute, .For or .ForEach) I put them in a loop so I can see how the performance changes over time.
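The loop looks roughly like this (a sketch; FooBar stands in for the real work being measured and the result is just printed as-is):
// Run the same profiled method several times in a row to watch how the timing evolves.
for (int run = 1; run <= 5; run++)
{
    PerformanceResult result = Profiler.Execute(() => { FooBar(); });
    Console.WriteLine("Method execution {0} = {1}", run, result);
}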
Example of performance might be:
Method execution 1 = 200ms
Method execution 2 = 12ms
Method execution 3 = 0ms
For execution 1 = 300ms (Serial), 100ms (Parallel)
For execution 2 = 20ms (Serial), 75ms (Parallel)
For execution 3 = 2ms (Serial), 50ms (Parallel)
ForEach execution 1 = 350ms (Serial), 300ms (Parallel)
ForEach execution 2 = 24ms (Serial), 89ms (Parallel)
ForEach execution 3 = 1ms (Serial), 21ms (Parallel)
My questions are:
Why does performance change over time, what is .NET doing in the background to facilitate this?
How/why can a serial operation be faster than a parallel one? I have made sure the operations are complex enough to show the difference properly... yet in most cases the serial operations seem faster!?
NOTE: For parallel processing I am testing on an 8 core machine.
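For illustration, one common cause of this pattern (not necessarily what happens in my code above) is that Parallel.For pays a per-iteration cost for delegate invocation and work partitioning: when each iteration is cheap the serial loop wins, and the parallel version only pays off when each iteration does substantial work. A standalone sketch:
using System;
using System.Diagnostics;
using System.Threading.Tasks;

class ParallelOverheadDemo
{
    static double[] data = new double[10000000];

    static void Main()
    {
        // Cheap body: delegate calls and scheduling overhead dominate,
        // so the plain serial loop is usually faster here.
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < data.Length; i++) data[i] = i * 0.5;
        sw.Stop();
        Console.WriteLine("Serial, cheap body:       {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        Parallel.For(0, data.Length, i => { data[i] = i * 0.5; });
        sw.Stop();
        Console.WriteLine("Parallel, cheap body:     {0} ms", sw.ElapsedMilliseconds);

        // Expensive body: each iteration does enough work that spreading it
        // across cores outweighs the overhead.
        sw.Restart();
        for (int i = 0; i < 10000; i++) data[i] = ExpensiveWork(i);
        sw.Stop();
        Console.WriteLine("Serial, expensive body:   {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        Parallel.For(0, 10000, i => { data[i] = ExpensiveWork(i); });
        sw.Stop();
        Console.WriteLine("Parallel, expensive body: {0} ms", sw.ElapsedMilliseconds);
    }

    static double ExpensiveWork(int seed)
    {
        double x = seed;
        for (int k = 1; k <= 5000; k++) x = Math.Sqrt(x + k);
        return x;
    }
}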
After some more exploration into performance profiling, I have discovered that using a Stopwatch is not an accurate way to measure the performance of a particular task.
(Thanks hatchet and Loren for your comments on this!)
Reasons a stopwatch is not accurate:
Measurements are calculated in elapsed time in milliseconds, not CPU time.
Measurements can be influenced by background "noise" and thread intensive processes.
Measurements do not take into account JIT compilation and overhead.
That being said, using a stopwatch is OK for casual exploration of performance. With that in mind, I have improved my profiling algorithm somewhat.
Where before it simply executed the expression that was passed to it, it now has the facility to iterate over the expression several times, building an average execution time. The first run can be omitted since this is where JIT kicks in, and some major overhead may occur. Understandably, this will never be as sophisticated as using a professional profiling tool like Redgate's ANTS profiler, but it's OK for simpler tasks!
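A minimal sketch of that improved approach (the method and parameter names are illustrative, not the actual profiler class described above):
// requires using System; using System.Collections.Generic;
// requires using System.Diagnostics; using System.Linq;
// Run the action several times, discard the first (JIT-affected) run,
// and average the remaining timings.
static double ProfileAverage(Action action, int iterations = 10)
{
    var timings = new List<double>();
    var sw = new Stopwatch();

    for (int i = 0; i < iterations; i++)
    {
        sw.Restart();
        action();
        sw.Stop();
        timings.Add(sw.Elapsed.TotalMilliseconds);
    }

    return timings.Skip(1).Average(); // omit the first run
}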
As per my comment above: I did some simple tests on my own and found no difference over time. Can you share your code? I'll put mine in an answer as it doesn't fit here.
This is my sample code.
(I also tried with both static and instance methods with no difference)
class Program
{
    static void Main(string[] args)
    {
        int to = 50000000;
        OtherStuff os = new OtherStuff();
        Console.WriteLine(Profile(() => os.CountTo(to)));
        Console.WriteLine(Profile(() => os.CountTo(to)));
        Console.WriteLine(Profile(() => os.CountTo(to)));
    }

    static long Profile(Action method)
    {
        Stopwatch st = Stopwatch.StartNew();
        method();
        st.Stop();
        return st.ElapsedMilliseconds;
    }
}

class OtherStuff
{
    public void CountTo(int to)
    {
        for (int i = 0; i < to; i++)
        {
            // some work...
            i++;
            i--;
        }
    }
}
A sample output would be:
331
331
334
Consider executing this method instead:
class OtherStuff
{
    public string CountTo(Guid id)
    {
        using (SHA256 sha = SHA256.Create())
        {
            int x = default(int);
            for (int index = 0; index < 16; index++)
            {
                x = id.ToByteArray()[index] >> 32 << 16;
            }
            RNGCryptoServiceProvider rng = new RNGCryptoServiceProvider();
            byte[] y = new byte[1024];
            rng.GetBytes(y);
            y = y.Concat(BitConverter.GetBytes(x)).ToArray();
            return BitConverter.ToString(sha.ComputeHash(BitConverter.GetBytes(x).Where(o => o >> 2 < 0).ToArray()));
        }
    }
}
Sample output:
11
0
0
I am using QueryPerformanceCounter to get precise timing behaviour in my C# program. But when I measure the timing using Thread.Sleep, it does not give the expected results.
I followed the standard example given on the MSDN site:
http://msdn.microsoft.com/en-us/library/ff650674.aspx
myTimer.Start();
for (int i = 0; i < iterations; i++)
{
    // Method to time
    System.Threading.Thread.Sleep(1000);
}
myTimer.Stop();
Results are:
Iterations: 5
Average time per iteration:
0.208957983452184 seconds
DateTime.Now gives:
Duration of test run:
5 seconds
5000 milliseconds
Can you please suggest what the possible error could be?