I know this sounds like a homework assignment, but it isn't. Lately I've been interested in algorithms used to perform certain mathematical operations, such as sine, square root, etc. At the moment, I'm trying to write the Babylonian method of computing square roots in C#.
So far, I have this:
public static double SquareRoot(double x) {
if (x == 0) return 0;
double r = x / 2; // this is inefficient, but I can't find a better way
// to get a close estimate for the starting value of r
double last = 0;
int maxIters = 100;
for (int i = 0; i < maxIters; i++) {
r = (r + x / r) / 2;
if (r == last)
break;
last = r;
}
return r;
}
It works just fine and produces the exact same answer as the .NET Framework's Math.Sqrt() method every time. As you can probably guess, though, it's slower than the native method (by around 800 ticks). I know this particular method will never be faster than the native method, but I'm just wondering if there are any optimizations I can make.
The only optimization I saw immediately was the fact that the calculation would run 100 times, even after the answer had already been determined (at which point, r would always be the same value). So, I added a quick check to see if the newly calculated value is the same as the previously calculated value and break out of the loop. Unfortunately, it didn't make much of a difference in speed, but just seemed like the right thing to do.
And before you say "Why not just use Math.Sqrt() instead?"... I'm doing this as a learning exercise and do not intend to actually use this method in any production code.
First, instead of checking for equality (r == last), you should be checking for convergence, wherein r is close to last, where close is defined by an arbitrary epsilon:
eps = 1e-10 // pick any small number
if (Math.Abs(r-last) < eps) break;
As the wikipedia article you linked to mentions - you don't efficiently calculate square roots with Newton's method - instead, you use logarithms.
float InvSqrt (float x){
float xhalf = 0.5f*x;
int i = *(int*)&x;
i = 0x5f3759df - (i>>1);
x = *(float*)&i;
x = x*(1.5f - xhalf*x*x);
return x;}
This is my favorite fast square root. Actually it's the inverse of the square root, but you can invert it after if you want....I can't say if it's faster if you want the square root and not the inverse square root, but it's freaken cool just the same.
http://www.beyond3d.com/content/articles/8/
What you are doing here is you execute Newton's method of finding a root. So you could just use some more efficient root-finding algorithm. You can start searching for it here.
Replacing the division by 2 with a bit shift is unlikely to make that big a difference; given that the division is by a constant I'd hope the compiler is smart enough to do that for you, but you may as well try it to see.
You're much more likely to get an improvement by exiting from the loop early, so either store new r in a variable and compare with old r, or store x/r in a variable and compare that against r before doing the addition and division.
Instead of breaking the loop and then returning r, you could just return r. May not provide any noticable increase in performance.
With your method, each iteration doubles the number of correct bits.
Using a table to obtain the initial 4 bits (for example), you will have 8 bits after the 1st iteration, then 16 bits after the second, and all the bits you need after the fourth iteration (since a double stores 52+1 bits of mantissa).
For a table lookup, you can extract the mantissa in [0.5,1[ and exponent from the input (using a function like frexp), then normalize the mantissa in [64,256[ using multiplication by a suitable power of 2.
mantissa *= 2^K
exponent -= K
After this, your input number is still mantissa*2^exponent. K must be 7 or 8, to obtain an even exponent. You can obtain the initial value for the iterations from a table containing all the square roots of the integral part of mantissa. Perform 4 iterations to get the square root r of mantissa. The result is r*2^(exponent/2), constructed using a function like ldexp.
EDIT. I put some C++ code below to illustrate this. The OP's function sr1 with improved test takes 2.78s to compute 2^24 square roots; my function sr2 takes 1.42s, and the hardware sqrt takes 0.12s.
#include <math.h>
#include <stdio.h>
double sr1(double x)
{
double last = 0;
double r = x * 0.5;
int maxIters = 100;
for (int i = 0; i < maxIters; i++) {
r = (r + x / r) / 2;
if ( fabs(r - last) < 1.0e-10 )
break;
last = r;
}
return r;
}
double sr2(double x)
{
// Square roots of values in 0..256 (rounded to nearest integer)
static const int ROOTS256[] = {
0,1,1,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,9,9,9,9,9,9,9,9,9,9,9,9,9,
9,9,9,9,9,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,
11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,12,12,12,12,
12,12,12,12,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,
13,13,13,13,13,13,13,13,13,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,
14,14,14,14,14,14,14,14,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,
15,15,15,15,15,15,15,15,15,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16 };
// Normalize input
int exponent;
double mantissa = frexp(x,&exponent); // MANTISSA in [0.5,1[ unless X is 0
if (mantissa == 0) return 0; // X is 0
if (exponent & 1) { mantissa *= 128; exponent -= 7; } // odd exponent
else { mantissa *= 256; exponent -= 8; } // even exponent
// Here MANTISSA is in [64,256[
// Initial value on 4 bits
double root = ROOTS256[(int)floor(mantissa)];
// Iterate
for (int it=0;it<4;it++)
{
root = 0.5 * (root + mantissa / root);
}
// Restore exponent in result
return ldexp(root,exponent>>1);
}
int main()
{
// Used to generate the table
// for (int i=0;i<=256;i++) printf(",%.0f",sqrt(i));
double s = 0;
int mx = 1<<24;
// for (int i=0;i<mx;i++) s += sqrt(i); // 0.120s
// for (int i=0;i<mx;i++) s += sr1(i); // 2.780s
for (int i=0;i<mx;i++) s += sr2(i); // 1.420s
}
Define a tolerance and return early when subsequent iterations fall within that tolerance.
Since you said the code below was not fast enough, try this:
static double guess(double n)
{
return Math.Pow(10, Math.Log10(n) / 2);
}
It should be very accurate and hopefully fast.
Here is code for the initial estimate described here. It appears to be pretty good. Use this code, and then you should also iterate until the values converge within an epsilon of difference.
public static double digits(double x)
{
double n = Math.Floor(x);
double d;
if (d >= 1.0)
{
for (d = 1; n >= 1.0; ++d)
{
n = n / 10;
}
}
else
{
for (d = 1; n < 1.0; ++d)
{
n = n * 10;
}
}
return d;
}
public static double guess(double x)
{
double output;
double d = Program.digits(x);
if (d % 2 == 0)
{
output = 6*Math.Pow(10, (d - 2) / 2);
}
else
{
output = 2*Math.Pow(10, (d - 1) / 2);
}
return output;
}
I have been looking at this as well for learning purposes. You may be interested in two modifications I tried.
The first was to use a first order taylor series approximation in x0:
Func<double, double> fNewton = (b) =>
{
// Use first order taylor expansion for initial guess
// http://www27.wolframalpha.com/input/?i=series+expansion+x^.5
double x0 = 1 + (b - 1) / 2;
double xn = x0;
do
{
x0 = xn;
xn = (x0 + b / x0) / 2;
} while (Math.Abs(xn - x0) > Double.Epsilon);
return xn;
};
The second was to try a third order (more expensive), iterate
Func<double, double> fNewtonThird = (b) =>
{
double x0 = b/2;
double xn = x0;
do
{
x0 = xn;
xn = (x0*(x0*x0+3*b))/(3*x0*x0+b);
} while (Math.Abs(xn - x0) > Double.Epsilon);
return xn;
};
I created a helper method to time the functions
public static class Helper
{
public static long Time(
this Func<double, double> f,
double testValue)
{
int imax = 120000;
double avg = 0.0;
Stopwatch st = new Stopwatch();
for (int i = 0; i < imax; i++)
{
// note the timing is strictly on the function
st.Start();
var t = f(testValue);
st.Stop();
avg = (avg * i + t) / (i + 1);
}
Console.WriteLine("Average Val: {0}",avg);
return st.ElapsedTicks/imax;
}
}
The original method was faster, but again, might be interesting :)
Replacing "/ 2" by "* 0.5" makes this ~1.5 times faster on my machine, but of course not nearly as fast as the native implementation.
Well, the native Sqrt() function probably isn't implemented in C#, it'll most likely be done in a low-level language, and it'll certainly be using a more efficient algorithm. So trying to match its speed is probably futile.
However, in regard to just trying to optimize your function for the heckuvit, the Wikipedia page you linked recommends the "starting guess" to be 2^floor(D/2), where D represents the number of binary digits in the number. You could give that an attempt, I don't see much else that could be optimized significantly in your code.
You can try
r = x >> 1;
instead of / 2 (also in the other place you device by 2).
It might give you a slight edge.
I would also move the 100 into the loop. Probably nothing, but we are talking about ticks in here.
just checking it now.
EDIT:
Fixed the > into >>, but it doesn't work for doubles, so nevermind.
the inlining of the 100 gave me no speed increase.