While testing the performance of floats in .NET, I stumbled onto a weird case: for certain values, multiplication seems way slower than normal. Here is the test case:
using System;
using System.Diagnostics;

namespace NumericPerfTestCSharp {
    class Program {
        static void Main() {
            Benchmark(() => float32Multiply(0.1f), "\nfloat32Multiply(0.1f)");
            Benchmark(() => float32Multiply(0.9f), "\nfloat32Multiply(0.9f)");
            Benchmark(() => float32Multiply(0.99f), "\nfloat32Multiply(0.99f)");
            Benchmark(() => float32Multiply(0.999f), "\nfloat32Multiply(0.999f)");
            Benchmark(() => float32Multiply(1f), "\nfloat32Multiply(1f)");
        }

        static void float32Multiply(float param) {
            float n = 1000f;
            for (int i = 0; i < 1000000; ++i) {
                n = n * param;
            }
            // Write result to prevent the compiler from optimizing the entire method away
            Console.Write(n);
        }

        static void Benchmark(Action func, string message) {
            // warm-up call
            func();

            var sw = Stopwatch.StartNew();
            for (int i = 0; i < 5; ++i) {
                func();
            }
            Console.WriteLine(message + " : {0} ms", sw.ElapsedMilliseconds);
        }
    }
}
Results:
float32Multiply(0.1f) : 7 ms
float32Multiply(0.9f) : 946 ms
float32Multiply(0.99f) : 8 ms
float32Multiply(0.999f) : 7 ms
float32Multiply(1f) : 7 ms
Why are the results so different for param = 0.9f?
Test parameters: .NET 4.5, Release build, code optimizations ON, x86, no debugger attached.
As others have mentioned, various processors do not support normal-speed calculations when subnormal floating-point values are involved. This is either a design defect (if the behavior impairs your application or is otherwise troublesome) or a feature (if you prefer the cheaper processor or alternative use of silicon that was enabled by not using gates for this work).
It is illuminating to understand why there is a transition at .5:
Suppose you are multiplying by p. Eventually, the value becomes so small that the result is some subnormal value (below 2^-126 in 32-bit IEEE binary floating point). Then multiplication becomes slow. As you continue multiplying, the value continues decreasing, and it reaches 2^-149, which is the smallest positive number that can be represented. Now, when you multiply by p, the exact result is of course 2^-149·p, which is between 0 and 2^-149, which are the two nearest representable values. The machine must round the result and return one of these two values.
Which one? If p is less than ½, then 2^-149·p is closer to 0 than to 2^-149, so the machine returns 0. Then you are not working with subnormal values anymore, and multiplication is fast again. If p is greater than ½, then 2^-149·p is closer to 2^-149 than to 0, so the machine returns 2^-149, and you continue working with subnormal values, and multiplication remains slow. If p is exactly ½, the rounding rules say to use the value that has zero in the low bit of its significand (fraction portion), which is zero (2^-149 has 1 in its low bit).
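A small C# illustration of that rounding (my addition, not from the answer; float.Epsilon is C#'s name for 2^-149):

float tiny = float.Epsilon;                       // 2^-149, the smallest positive subnormal
// The explicit (float) casts force the results to single precision even on
// hardware that evaluates intermediates with extra precision.
Console.WriteLine((float)(tiny * 0.4f) == 0f);    // True: below half of 2^-149, rounds down to 0
Console.WriteLine((float)(tiny * 0.9f) == tiny);  // True: above half of 2^-149, rounds back up to 2^-149
Console.WriteLine((float)(tiny * 0.5f) == 0f);    // True: an exact tie, round-to-even picks 0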
You report that .99f appears fast. This should end with the slow behavior. Perhaps the code you posted is not exactly the code for which you measured fast performance with .99f? Perhaps the starting value or the number of iterations were changed?
There are ways to work around this problem. One is to use the hardware mode settings that replace any subnormal inputs or results with zero, called "denormals are zero" and "flush to zero" modes. I do not use .NET and cannot advise you about how to set these modes in .NET.
Another approach is to add a tiny value each time, such as
n = (n+e) * param;
where e is at least 2^-126/param. Note that 2^-126/param should be calculated rounded upward, unless you can guarantee that n is large enough that (n+e) * param does not produce a subnormal value. This also presumes n is not negative. The effect of this is to make sure the calculated value is always large enough to be in the normal range, never subnormal.
Adding e in this way of course changes the results. However, if you are, for example, processing audio with some echo effect (or other filter), then the value of e is too small to cause any effects observable by humans listening to the audio. It is likely too small to cause any change in the hardware behavior when producing the audio.
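A minimal sketch of that workaround applied to the benchmark above (my own code, not from the answer; it assumes param is positive and no greater than 1, so the quotient stays in the normal range):

static void float32MultiplyNoSubnormals(float param) {
    const float smallestNormal = 1.17549435e-38f;  // approximately 2^-126, the smallest normal float
    float e = smallestNormal / param;              // roughly 2^-126 / param (the answer notes this should round upward)
    float n = 1000f;
    for (int i = 0; i < 1000000; ++i) {
        n = (n + e) * param;                       // adding e keeps the product out of the subnormal range
    }
    // Write result to prevent the compiler from optimizing the entire method away
    Console.Write(n);
}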
I suspect this has something to do with denormal values (fp values smaller than ~ 1e-38) and the cost associated with processing them.
If you test for denormal values and remove them, sanity is restored.
static void float32Multiply(float param) {
    float n = 1000f;
    for (int i = 0; i < 1000000; ++i) {
        n = n * param;
        if (n < 1e-38) n = 0;  // flush denormals to zero by hand
    }
    // Write result to prevent the compiler from optimizing the entire method away
    Console.Write(n);
}
While writing a program, I came across the need to find the cube root of a number in one of my functions.
When I used the code below, I got an incorrect value for the cube root (1 was printed for n = 64).
public static void cubicPairs(double n)
{
    double root = (System.Math.Pow(n, (1/3)));
    Console.WriteLine(root);
}
Now after I changed the code slightly to this,
public static void cubicPairs(double n)
{
    double root = (System.Math.Pow(n, (1.0/3.0))); // Changed how the second parameter is passed
    Console.WriteLine(root);
}
I got root = 3.9999999999999996 (while debugging) but the method was printing 4 (which is correct).
Why is there a difference between the two values, and since this has to do with the second parameter to the System.Math.Pow() method (i.e., 1.0/3.0, which is a recurring, non-terminating value), what should I use to find the cube root so that I get 4 (while debugging) rather than 3.9999999999999996?
This is a standard trap in the { curly brace languages }, C# included: a division with integral operands is performed as an integer division, not a floating-point division. It always yields an integer result; 1 / 3 produces 0, and raising any number to the power of 0 produces 1.0.
You force a floating point division by converting one of the operands to double. Like 1.0 / 3 or (double)integerVariable / 3.
A similar problem exists with multiplication, though it is usually less of a trap: integral operands produce an integral result that risks overflow. This otherwise reflects the way the processor works; it has distinct instructions for these operations, IMUL vs FMUL and IDIV vs FDIV, the latter being rather famous for a bug in the Pentium processor :)
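A quick illustration of the trap (my own example):

Console.WriteLine(1 / 3);                // 0   (integer division)
Console.WriteLine(1.0 / 3);              // 0.333... (floating-point division)
Console.WriteLine(Math.Pow(64, 1 / 3));  // 1   (64 raised to the power 0)
double root = Math.Pow(64, 1.0 / 3);     // 3.9999999999999996 in the debugger; WriteLine rounds it to 4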
You can try running this code for cube root functionality.
textBox2.Text = Math.Pow(Convert.ToDouble(textBox1.Text), 0.3333333333333333).ToString();
The error (which, by the way, is just 4E-16, or 400 quintillionths) is caused by floating-point rounding.
You could circumvent this by rounding the number if it is within a certain threshold:
public static double cubicPairs(double n)
{
    double root = System.Math.Pow(n, (1.0 / 3.0));
    double roundedRoot = Math.Round(root);

    if (Math.Abs(roundedRoot - root) < VERY_SMALL_NUMBER)
        return roundedRoot;
    else
        return root;
}
Where VERY_SMALL_NUMBER can be, say, 1e-10.
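For example, with the threshold declared as const double VERY_SMALL_NUMBER = 1e-10; in the same class, a quick usage sketch (my addition):

Console.WriteLine(cubicPairs(64)); // prints 4: the rounded root is within the threshold, so it is returned exactly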
public static void Main()
{
    int a = int.Parse(Console.ReadLine());
    for (int i = 0; i <= a; i++)
    {
        int sum = 0;
        for (int j = 0; j < i; j++)
        {
            sum += i;   // after the inner loop, sum == i * i
        }
        Console.WriteLine("Number is : {0} and cube of the {0} is :{1} \n", i, sum * i);
    }
}
Try it
Math.Ceiling(Math.Pow(n, (double)1 / 3));
DISCLAIMER
I do not want to know when or if to use shift operators in my code, I am interested in why multiplication is faster than shifting bits to the left whereas division is not.
As I was just wandering around SO I came across this question regarding the efficiency and speed of division and bit shifting. It basically states that although one might save a little time by using bit shifts for powers of 2, it is not a difference worth worrying about.
Intrigued by this I decided to check how much faster bit shifting in C# actually is and realised something strange:
Bit shifting instead of dividing is faster, as I expected, but the "normal" multiplication method is faster than bit shifting.
My question is simple: Why is the multiplication of two numbers faster than bit shifting, although bit shifting is a primitive operation for the processor?
Here are the results for my test case:
            Division: | Multiplication:
Bit shift:  315 ms    | 315 ms
normal:     406 ms    | 261 ms
The times are the averages of 100 cases, each case consisting of 10 operations per number on 10000000 random positive numbers ranging from 1 to int.MaxValue. The operations ranged from dividing/multiplying by 2 to 1024 (in powers of 2) and bit shifting by 1 to 10 bits.
EDIT
#usr: I am using .NET version 4.5.1
I updated my results because I realised I only computed a tenth of the numbers I stated... facepalm
My Main:
static void Main(string[] args)
{
    Fill(); // fills the array with random numbers

    Profile("division shift:", 100, BitShiftDiv);
    Profile("division:", 100, Div);
    Profile("multiplication shift:", 100, BitShiftMul);
    Profile("multiplication:", 100, Mul);

    Console.ReadKey();
}
This is my profiling method:
static void Profile(string description, int iterations, Action func)
{
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    func(); // warm-up call

    Stopwatch stopWatch = Stopwatch.StartNew();
    for (int i = 0; i < iterations; i++)
    {
        func();
    }
    stopWatch.Stop();

    Console.WriteLine(description);
    Console.WriteLine("total: {0}ms", stopWatch.Elapsed.TotalMilliseconds);
    Console.WriteLine(" avg: {0}ms", stopWatch.Elapsed.TotalMilliseconds / (double)iterations);
}
The Actions containing the operations are structured like this:
static void <Name>()
{
    for (int s = 1; s <= 10; s++)      /* for shifts */
    for (int s = 2; s <= 1024; s++)    /* for others */
    {
        for (int i = 0; i < nums.Length; i++)
        {
            var useless = nums[i] <shift> s;      /* shifting */
            var useless = nums[i] <operator> s;   /* otherwise */
        }
    }
}
nums is a public array containing 10000000 ints, which is filled by the Fill() method.
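For reference, here is a sketch of what Fill() and one concrete action might look like (my own reconstruction, not the posted code; I loop s over powers of two only, matching the description above rather than the literal s++ in the template):

static int[] nums = new int[10000000];

static void Fill()
{
    var rnd = new Random();
    for (int i = 0; i < nums.Length; i++)
    {
        nums[i] = rnd.Next(1, int.MaxValue); // random positive ints (the upper bound of Next is exclusive)
    }
}

static void Mul()
{
    for (int s = 2; s <= 1024; s *= 2)       // multiply by powers of two, 2 through 1024
    {
        for (int i = 0; i < nums.Length; i++)
        {
            var useless = nums[i] * s;
        }
    }
}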
To sum up the answers already mentioned in the comments:
Multiplication, as well as bit shifting, is faster because it is a native operation for the CPU too. Multiplication takes about one cycle while bit shifting takes about four, which is why multiplication is faster here. Division takes somewhere between 11 and 18 cycles.
Using C# I cannot get close enough to the CPU to get diagnostically conclusive results because many optimizations take place between my code and the CPU.
Also, microbenchmarking is hard and can produce erroneous results, partly for the reason mentioned above.
If I forgot anything, please comment and tell me!
I have a method that converts value to a newBase number of length length.
The logic in English is: if we calculated every possible combination of numbers from 0 to (c-1), with a length of x, what set would occur at point i?
While the method below does work perfectly, because very large numbers are used, it can take a long time to complete:
For example, value = ((65536^480000)-1)/2, newBase = 65536, length = 480000 takes about an hour to complete on a 64-bit, quad-core PC.
private int[] GetValues(BigInteger value, int newBase, int length)
{
    Stack<int> result = new Stack<int>();

    while (value > 0)
    {
        result.Push((int)(value % newBase));
        if (value < newBase)
            value = 0;
        else
            value = value / newBase;
    }

    for (var i = result.Count; i < length; i++)
    {
        result.Push(0);
    }

    return result.ToArray();
}
My question is, how can I change this method into something that will allow multiple threads to work out part of the number?
I am working C#, but if you're not familiar with that then pseudocode is fine too.
Note: The method is from this question: Cartesian product subset returning set of mostly 0
If that GetValues method is really the bottleneck, there are several things you can do to speed it up.
First, you're dividing by newBase every time through the loop. Since newBase is an int, and the BigInteger divide method divides by a BigInteger, you're potentially incurring the cost of an int-to-BigInteger conversion on every iteration. You might consider:
BigInteger bigNewBase = newBase;
Also, you can cut the number of divides in half by calling DivRem:
while (value > 0)
{
    BigInteger rem;
    value = BigInteger.DivRem(value, bigNewBase, out rem);
    result.Push((int)rem);
}
One other optimization, as somebody mentioned in comments, would be to store the digits in a pre-allocated array. You'll have to call Array.Reverse to get them in the proper order, but that takes approximately no time.
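Putting those suggestions together, a sketch might look like this (my own combination, assuming a using System.Numerics; directive and that value fits in length digits, as in the question; not the asker's final code):

private static int[] GetValuesFast(BigInteger value, int newBase, int length)
{
    var digits = new int[length];     // pre-allocated and already zero-filled
    BigInteger bigNewBase = newBase;  // convert to BigInteger once, not on every iteration

    int i = 0;
    while (value > 0 && i < length)
    {
        BigInteger rem;
        value = BigInteger.DivRem(value, bigNewBase, out rem);
        digits[i++] = (int)rem;
    }

    // Digits were produced least-significant first; reverse so the array is
    // most-significant first with leading zeros, matching the original method.
    Array.Reverse(digits);
    return digits;
}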
That method, by the way, doesn't lend itself to parallelizing because computing each digit depends on the computation of the previous digit.
I am trying to calculate average for an array of floats. I need to use indices because this is inside a binary search so the top and bottom will move. (Big picture we are trying to optimize a half range estimation so we don't have to re-create the array each pass).
Anyway I wrote a custom average loop and I'm getting 2 places less accuracy than the c# Average() method
float test = input.Average();

int count = (top - bottom) + 1; // number of elements in this iteration
int pos = bottom;
float average = 0f;             // working average

while (pos <= top)
{
    average += input[pos];
    pos++;
}
average = average / count;
example:
0.0371166766 - c#
0.03711666 - my loop
125090.148 - c#
125090.281 - my loop
http://pastebin.com/qRE3VrCt
I'm getting 2 places less accuracy than the c# Average()
No, you are only losing 1 significant digit. The float type can only store 7 significant digits, the rest are just random noise. Inevitably in a calculation like this, you can accumulate round-off error and thus lose precision. Getting the round-off errors to balance out requires luck.
The only way to avoid it is to use a floating point type that has more precision to accumulate the result. Not an issue, you have double available. Which is why the Linq Average method looks like this:
public static float Average(this IEnumerable<float> source) {
    if (source == null) throw Error.ArgumentNull("source");
    double sum = 0;   // <=== NOTE: double
    long count = 0;
    checked {
        foreach (float v in source) {
            sum += v;
            count++;
        }
    }
    if (count > 0) return (float)(sum / count);
    throw Error.NoElements();
}
Use double to reproduce the Linq result with a comparable number of significant digits in the result.
I'd rewrite this as:
int count = (top - bottom) + 1; // number of elements in this iteration
double sum = 0;

for (int i = bottom; i <= top; i++)
{
    sum += input[i];
}

float average = (float)(sum / count);
That way you're using a high precision accumulator, which helps reduce rounding errors.
btw. if performance isn't that important, you can still use LINQ to calculate the average of an array slice:
input.Skip(bottom).Take(top - bottom + 1).Average()
I'm not entirely sure if that fits your problem, but if you need to calculate the average of many subarrays, it can be useful to create a persistent sum array, so calculating an average simply becomes two table lookups and a division.
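A minimal sketch of that persistent-sum (prefix sum) idea, using illustrative names of my own:

// Build once: prefix[i] holds the sum of input[0 .. i-1].
double[] prefix = new double[input.Length + 1];
for (int i = 0; i < input.Length; i++)
{
    prefix[i + 1] = prefix[i] + input[i];
}

// The average of the inclusive slice [bottom, top] is then two lookups and a division.
float sliceAverage = (float)((prefix[top + 1] - prefix[bottom]) / (top - bottom + 1));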
Just to add to the conversation, be careful when using floating-point primitives.
What Every Computer Scientist Should Know About Floating-Point Arithmetic
During arithmetic, the hardware carries extra least-significant bits (guard bits or guard digits) that are not reflected in the stored or displayed value; they are used while performing mathematical operations, and the extra precision of intermediate results can make equality checks behave unexpectedly. One common consequence is that a value that displays as 0 is not necessarily exactly zero. When accumulating floating-point values this can also lead to precision errors.
Use Decimal for your accumulator:
It will not have rounding errors due to guard digits.
It is a 128-bit data type, so it is less likely to exceed its maximum value in your accumulator.
For more info:
What is the difference between Decimal, Float and Double in C#?
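For illustration only (my sketch, not part of the linked answer), a Decimal accumulator over the same slice would look like this:

decimal sum = 0m;
for (int i = bottom; i <= top; i++)
{
    sum += (decimal)input[i];   // convert each float before accumulating
}
float average = (float)(sum / (top - bottom + 1));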
I have a simple routine which calculates the aspect ratio from a floating point value. So for the value 1.77777779, the routine returns the string "16:9". I have tested this on my machine and it works fine.
The routine is given as :
public string AspectRatioAsString(float f)
{
    bool carryon = true;
    int index = 0;
    double roundedUpValue = 0;

    while (carryon)
    {
        index++;
        float upper = index * f;

        roundedUpValue = Math.Ceiling(upper);

        if (roundedUpValue - upper <= (double)0.1 || index > 20)
        {
            carryon = false;
        }
    }

    return roundedUpValue + ":" + index;
}
Now on another machine, I get completely different results. So on my machine, 1.77777779 gives "16:9" but on another machine I get "38:21".
Here's an interesting bit of the C# specification, from section 4.1.6:
Floating-point operations may be performed with higher precision than the result type of the operation. For example, some hardware architectures support an "extended" or "long double" floating-point type with greater range and precision than the double type, and implicitly perform all floating-point operations using this higher precision type. Only at excessive cost in performance can such hardware architectures be made to perform floating-point operations with less precision, and rather than require an implementation to forfeit both performance and precision, C# allows a higher precision type to be used for all floating-point operations. Other than delivering more precise results, this rarely has any measurable effects.
It is possible that this is one of the "measurable effects" thanks to that call to Ceiling. Taking the ceiling of a floating point number, as others have noted, magnifies a difference of 0.000000002 by nine orders of magnitude because it turns 15.99999999 into 16 and 16.00000001 into 17. Two numbers that differ slightly before the operation differ massively afterwards; the tiny difference might be accounted for by the fact that different machines can have more or less "extra precision" in their floating point operations.
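To see that magnification concretely (my own two-line illustration):

Console.WriteLine(Math.Ceiling(15.99999999)); // 16
Console.WriteLine(Math.Ceiling(16.00000001)); // 17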
Some related issues:
C# XNA Visual Studio: Difference between "release" and "debug" modes?
CLR JIT optimizations violates causality?
To address your specific problem of how to compute an aspect ratio from a float: I'd possibly solve this a completely different way. I'd make a table like this:
struct Ratio
{
    public int X { get; private set; }
    public int Y { get; private set; }

    public Ratio (int x, int y) : this()
    {
        this.X = x;
        this.Y = y;
    }

    public double AsDouble() { return (double)X / (double)Y; }
}

Ratio[] commonRatios = {
    new Ratio(16, 9),
    new Ratio(4, 3),
    // ... and so on, maybe the few hundred most common ratios here.
    // since you are pinning results to be less than 20, there cannot possibly
    // be more than a few hundred.
};
and now your implementation is
public string AspectRatioAsString(double ratio)
{
    var results = from commonRatio in commonRatios
                  select new {
                      Ratio = commonRatio,
                      Diff = Math.Abs(ratio - commonRatio.AsDouble())
                  };

    // order by difference and take the closest common ratio
    var smallestResult = results.OrderBy(x => x.Diff).First();

    return String.Format("{0}:{1}", smallestResult.Ratio.X, smallestResult.Ratio.Y);
}
Notice how the code now reads very much like the operation you are trying to perform: from this list of common ratios, choose the one where the difference between the given ratio and the common ratio is minimized.
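A quick usage sketch (my addition):

Console.WriteLine(AspectRatioAsString(1.77777779)); // "16:9", the closest entry in the commonRatios table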
I wouldn't use floating point numbers unless I really had to. They're too prone to this sort of thing due to rounding errors.
Can you change the code to work in double precision? (decimal would be overkill). If you do this, does it give more consistent results?
As to why it's different on different machines, what are the differences between the two machines?
32 bit vs 64 bit?
Windows 7 vs Vista vs XP?
Intel vs AMD processor? (thanks Oded)
Something like this might be the cause.
Try Math.Round instead of Math.Ceiling. If you end up with 16.0000001 and round up you'll incorrectly discard that answer.
Miscellaneous other suggestions:
Doubles are better than floats.
The (double)0.1 cast is unnecessary.
Might want to throw an exception if you can't figure out what the aspect ratio is.
If you return immediately upon finding the answer you can ditch the carryon variable.
A perhaps more accurate check would be to calculate the aspect ratio for each guess and compare it to the input.
Revised (untested):
public string AspectRatioAsString(double ratio)
{
for (int height = 1; height <= 20; ++height)
{
int width = (int) Math.Round(height * ratio);
double guess = (double) width / height;
if (Math.Abs(guess - ratio) <= 0.01)
{
return width + ":" + height;
}
}
throw ArgumentException("Invalid aspect ratio", "ratio");
}
When index is 9, you would expect to get something like upper = 16.0000001 or upper = 15.9999999. Which one you get will depend on rounding error, which may differ on different machines. When it's 15.999999, roundedUpValue - upper <= 0.1 is true, and the loop ends. When it's 16.0000001, roundedUpValue - upper <= 0.1 is false and the loop keeps going until you get to index > 20.
Instead maybe you should try rounding upper to the nearest integer and checking if the absolute value of its difference from that integer is small. In other words, use something like if (Math.Abs(Math.Round(upper) - upper) <= (double)0.0001 || index > 20)
We had printf() statements with floating-point values that gave different roundings on computer 1 versus computer 2, even though both computers had the same Visual Studio 2019 version and build.
The difference turned out, however, to be a slightly older Windows 10 SDK on one machine versus the newest version on the other. Strange as it may seem, after fixing that the differences were gone.