I have a large, unsorted array or list of doubles and I want to calculate the min, max, mean, median and standard deviation as efficiently as possible. Of course I could simply use LINQ to calculate each one separately, but I think one can go faster. Sample code:
var list = new List<double>(){1.0, 2.5, 0.11, 0.7, 8.2, 3.4, 1.0};
var (min, max, mean, median, std) = CalculateMetrics(list);
private (double, double, double, double, double) CalculateMetrics(List<double> list) {
// TODO
}
So what is the most efficient way? Using libraries is also fine for me.
All the descriptive stats except median you want can be computed in one pass through your list. The trick to getting the standard deviation is accumulating both the sum and the sum-of-squares of your samples. Here's an example of that.
int count = 0;
double sum = 0.0;
double sumsq = 0.0;
double max = double.MinValue;
double min = double.MaxValue;
foreach (double sample in list)
{
count++;
sum += sample;
sumsq += sample * sample;
if (sample > max) max = sample;
if (sample < min) min = sample;
}
double mean = sum / count;
double stdev = Math.Sqrt((sumsq / count) - (mean * mean));
Because this makes only one pass through the list, it works with any IEnumerable collection of samples, and is compatible with LINQ.
Obviously this is quick-n-dirty example code. I leave it to you to build it into a useful function.
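On an empty list the division yields NaN rather than throwing (floating-point division by zero does not throw in C#), so check count first. And, if you have very large numbers or very long lists, that subtraction in the computation of stdev may lose precision and give you back a useless number, or even a slightly negative value whose square root is NaN.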
But it works well for most applications.
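If the precision loss mentioned above is a concern, one common alternative is Welford's running update, which tracks the mean and the sum of squared deviations instead of the raw sum of squares. The following is a minimal sketch of that swapped-in technique, not the code from this answer; the class name RunningStats and the method name Compute are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class RunningStats
{
    // Hypothetical helper: one pass over the samples using Welford's update,
    // which avoids subtracting two large, nearly equal numbers.
    public static (double Min, double Max, double Mean, double StdDev) Compute(IEnumerable<double> samples)
    {
        long count = 0;
        double mean = 0.0;
        double m2 = 0.0; // running sum of squared deviations from the current mean
        double min = double.PositiveInfinity;
        double max = double.NegativeInfinity;

        foreach (double x in samples)
        {
            count++;
            double delta = x - mean;
            mean += delta / count;
            m2 += delta * (x - mean);
            if (x < min) min = x;
            if (x > max) max = x;
        }

        if (count == 0)
            throw new ArgumentException("At least one sample is required.", nameof(samples));

        // Population standard deviation (divide by n), as in the snippet above.
        return (min, max, mean, Math.Sqrt(m2 / count));
    }
}
```

It still makes a single pass, so it works with any IEnumerable of samples; the median has to be computed separately.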
Because the median is required and the standard deviation depends on the mean, this is hard to do in a single O(n) pass.
Here's my best attempt:
private (double min, double max, double mean, double median, double std) CalculateMetrics(List<double> list)
{
var mean = list.Average();
var std = Math.Sqrt(list.Aggregate(0.0, (a, x) => a + (x - mean) * (x - mean)) / list.Count());
var sorted = list.OrderBy(x => x).ToList();
var median = sorted.Count % 2 == 0 ? (sorted[sorted.Count / 2 - 1] + sorted[sorted.Count / 2]) / 2 : sorted[sorted.Count / 2];
return (sorted.First(), sorted.Last(), mean, median, std);
}
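The sort above costs O(n log n) just to read off the median. A selection algorithm such as quickselect finds the middle element in expected O(n) time without fully sorting. The sketch below is one way to do that under some assumptions: the names MedianSketch, Select and Median are hypothetical, the input contains no NaN values, and a copy of the list is reordered so the caller's data stays untouched.

```csharp
using System;
using System.Collections.Generic;

public static class MedianSketch
{
    // Returns the k-th smallest element (0-based), partially reordering 'a' in place.
    private static double Select(double[] a, int k)
    {
        int lo = 0, hi = a.Length - 1;
        var rng = new Random(12345); // fixed seed only to keep runs repeatable
        while (lo < hi)
        {
            double pivot = a[rng.Next(lo, hi + 1)];
            int i = lo, j = hi;
            // Hoare-style partition around the pivot value.
            while (i <= j)
            {
                while (a[i] < pivot) i++;
                while (a[j] > pivot) j--;
                if (i <= j)
                {
                    (a[i], a[j]) = (a[j], a[i]);
                    i++;
                    j--;
                }
            }
            if (k <= j) hi = j;       // target is in the left part
            else if (k >= i) lo = i;  // target is in the right part
            else return a[k];         // target sits between the parts and equals the pivot
        }
        return a[k];
    }

    public static double Median(List<double> list)
    {
        if (list.Count == 0)
            throw new ArgumentException("At least one value is required.", nameof(list));

        double[] a = list.ToArray(); // work on a copy
        int mid = a.Length / 2;
        double upper = Select(a, mid);
        if (a.Length % 2 == 1) return upper;

        // After Select, everything left of 'mid' is <= a[mid], so the lower
        // middle element is the maximum of that left part.
        double lower = a[0];
        for (int i = 1; i < mid; i++)
            if (a[i] > lower) lower = a[i];
        return (lower + upper) / 2;
    }
}
```

Min, max, mean and standard deviation can then come from one or two plain passes, leaving the whole computation at expected O(n).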
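Two-pass solution for min, max, mean and standard deviation. The median needs sorted data, so it is taken from a sorted copy, which makes the total cost O(n log n):
private static (double, double, double, double, double) CalculateMetrics(double[] list)
{
if (list.Length < 1)
{
throw new ArgumentException("The array must contain at least one element.", nameof(list));
}
double min = list[0];
double max = list[0];
// The input is unsorted, so the middle element is only the median after sorting a copy.
double[] sorted = (double[])list.Clone();
Array.Sort(sorted);
int mid = sorted.Length / 2;
double median = sorted.Length % 2 == 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];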
double sum = 0;
foreach (double el in list)
{
if (el > max)
{
max = el;
}
if (el < min)
{
min = el;
}
sum += el;
}
double mean = sum / list.Length;
double sumStd = 0;
foreach (var el in list)
{
sumStd += Math.Pow(el - mean, 2) / list.Length;
}
double stdDev = Math.Sqrt(sumStd);
return (min, max, mean, median, stdDev);
}
In one part of my code I convert from decimal coordinates to degrees/minutes/seconds and I use this:
double coord = 59.345235;
int sec = (int)Math.Round(coord * 3600);
int deg = sec / 3600;
sec = Math.Abs(sec % 3600);
int min = sec / 60;
sec %= 60;
How would I convert back from degrees/minutes/seconds to decimal coordinates?
Try this:
public double ConvertDegreeAngleToDouble( double degrees, double minutes, double seconds )
{
//Decimal degrees =
// whole number of degrees,
// plus minutes divided by 60,
// plus seconds divided by 3600
return degrees + (minutes/60) + (seconds/3600);
}
Just to save others time, I wanted to add on to Byron's answer. If you have the point in string form (e.g. "17.21.18S"), you can use this method:
public double ConvertDegreeAngleToDouble(string point)
{
//Example: 17.21.18S
var multiplier = (point.Contains("S") || point.Contains("W")) ? -1 : 1; //handle south and west
point = Regex.Replace(point, "[^0-9.]", ""); //remove the characters
var pointArray = point.Split('.'); //split the string.
//Decimal degrees =
// whole number of degrees,
// plus minutes divided by 60,
// plus seconds divided by 3600
var degrees = Double.Parse(pointArray[0]);
var minutes = Double.Parse(pointArray[1]) / 60;
var seconds = Double.Parse(pointArray[2]) / 3600;
return (degrees + minutes + seconds) * multiplier;
}
Each degree counts as 1, each minute as 1/60 of a degree, and each second as 1/3600 of a degree, so you should be able to put them back together with:
double newCoord = deg + min / 60.0 + sec / 3600.0; // 60.0 and 3600.0 avoid integer division
Beware that it won't be exactly the same as the original, though: the forward conversion rounds to whole seconds, and floating-point rounding adds a small error on top. For a negative deg, subtract the minutes and seconds instead of adding them.
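To make the round trip concrete, here is a small sketch that converts a decimal coordinate to degrees/minutes/seconds and back. The names DmsSketch, ToDms and ToDecimal are hypothetical, and carrying the sign separately is a design choice made here (it keeps coordinates between -1 and 0 degrees negative), not something taken from the question.

```csharp
using System;

public static class DmsSketch
{
    // Splits a decimal coordinate into a sign plus non-negative degrees, minutes, whole seconds.
    public static (int Sign, int Deg, int Min, int Sec) ToDms(double coord)
    {
        int sign = coord < 0 ? -1 : 1;
        int total = (int)Math.Round(Math.Abs(coord) * 3600); // total seconds, rounded
        return (sign, total / 3600, (total % 3600) / 60, total % 60);
    }

    // Recombines the parts; 60.0 and 3600.0 force floating-point division.
    public static double ToDecimal(int sign, int deg, int min, int sec)
    {
        return sign * (deg + min / 60.0 + sec / 3600.0);
    }
}
```

Because the seconds are rounded to whole numbers, the recovered value can differ from the original by up to half a second of arc (1/7200 of a degree).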
Often the western and southern hemispheres are expressed as negative degrees, and seconds contain decimals for accuracy: -86:44:52.892. Remember that longitude is the X-coordinate and latitude is the Y-coordinate. This often gets mixed up because people refer to them as lat/lon and X/Y. I modified the code below for the above format.
private double ConvertDegreesToDecimal(string coordinate)
{
string[] coordinateArray = coordinate.Split(':');
if (3 != coordinateArray.Length)
{
throw new FormatException("Expected degrees:minutes:seconds.");
}
double degrees = Double.Parse(coordinateArray[0]);
double minutes = Double.Parse(coordinateArray[1]) / 60;
double seconds = Double.Parse(coordinateArray[2]) / 3600;
// Take the sign from the text so that "-0:30:00" stays negative
// and "0:30:00" stays positive.
bool negative = coordinateArray[0].TrimStart().StartsWith("-");
double magnitude = Math.Abs(degrees) + minutes + seconds;
return negative ? -magnitude : magnitude;
}
CoordinateSharp is available as a NuGet package and can handle coordinate conversions for you. It even does UTM/MGRS conversion and provides solar/lunar times relative to the input location. It's really easy to use!
Coordinate c = new Coordinate(40.465, -75.089);
//Display DMS Format
c.FormatOptions.Format = CoordinateFormatType.Degree_Minutes_Seconds;
c.ToString();//N 40º 27' 54" W 75º 5' 20.4"
c.Latitude.ToString();//N 40º 27' 54"
c.Latitude.ToDouble();//40.465
Coordinate properties are iObservable as well. So if you change a latitude minute value, for example, everything else will update.
The accepted answer to date is inaccurate and doesn't take into account what happens when you add negative numbers to positive numbers. The below code addresses the issue and will correctly convert.
public double ConvertDegreeAngleToDouble(double degrees, double minutes, double seconds)
{
var multiplier = (degrees < 0 ? -1 : 1);
var _deg = (double)Math.Abs(degrees);
var result = _deg + (minutes / 60) + (seconds / 3600);
return result * multiplier;
}
For those who prefer regular expressions and need to handle a format like DDMMSS.dddS:
This function could easily be updated to handle other formats.
C#
Regex reg = new Regex(@"^((?<D>\d{1,2}(\.\d+)?)(?<W>[SN])|(?<D>\d{2})(?<M>\d{2}(\.\d+)?)(?<W>[SN])|(?<D>\d{2})(?<M>\d{2})(?<S>\d{2}(\.\d+)?)(?<W>[SN])|(?<D>\d{1,3}(\.\d+)?)(?<W>[WE])|(?<D>\d{3})(?<M>\d{2}(\.\d+)?)(?<W>[WE])|(?<D>\d{3})(?<M>\d{2})(?<S>\d{2}(\.\d+)?)(?<W>[WE]))$");
private double DMS2Decimal(string dms)
{
double result = double.NaN;
var match = reg.Match(dms);
if (match.Success)
{
var degrees = double.Parse("0" + match.Groups["D"]);
var minutes = double.Parse("0" + match.Groups["M"]);
var seconds = double.Parse("0" + match.Groups["S"]);
var direction = match.Groups["W"].ToString();
var dec = (Math.Abs(degrees) + minutes / 60d + seconds / 3600d) * (direction == "S" || direction == "W" ? -1 : 1);
var absDec = Math.Abs(dec);
if ((((direction == "W" || direction == "E") && degrees <= 180 && absDec <= 180) || (degrees <= 90 && absDec <= 90)) && minutes < 60 && seconds < 60)
{
result = dec;
}
}
return result;
}
I know this sounds like a homework assignment, but it isn't. Lately I've been interested in algorithms used to perform certain mathematical operations, such as sine, square root, etc. At the moment, I'm trying to write the Babylonian method of computing square roots in C#.
So far, I have this:
public static double SquareRoot(double x) {
if (x == 0) return 0;
double r = x / 2; // this is inefficient, but I can't find a better way
// to get a close estimate for the starting value of r
double last = 0;
int maxIters = 100;
for (int i = 0; i < maxIters; i++) {
r = (r + x / r) / 2;
if (r == last)
break;
last = r;
}
return r;
}
It works just fine and produces the exact same answer as the .NET Framework's Math.Sqrt() method every time. As you can probably guess, though, it's slower than the native method (by around 800 ticks). I know this particular method will never be faster than the native method, but I'm just wondering if there are any optimizations I can make.
The only optimization I saw immediately was the fact that the calculation would run 100 times, even after the answer had already been determined (at which point, r would always be the same value). So, I added a quick check to see if the newly calculated value is the same as the previously calculated value and break out of the loop. Unfortunately, it didn't make much of a difference in speed, but just seemed like the right thing to do.
And before you say "Why not just use Math.Sqrt() instead?"... I'm doing this as a learning exercise and do not intend to actually use this method in any production code.
First, instead of checking for equality (r == last), you should be checking for convergence, wherein r is close to last, where close is defined by an arbitrary epsilon:
double eps = 1e-10; // pick any small number
if (Math.Abs(r-last) < eps) break;
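A fixed epsilon such as 1e-10 is an absolute tolerance, so it is too strict for very large inputs and too loose for very small ones. One option is to compare the step against the size of the estimate instead. The following is a rough sketch of the question's loop with that change; the class name BabylonianSketch, the parameter relTol and the starting value (1 + x) / 2 are illustrative choices, not part of the original code.

```csharp
using System;

public static class BabylonianSketch
{
    public static double SquareRoot(double x, double relTol = 1e-15, int maxIters = 100)
    {
        if (x < 0 || double.IsNaN(x)) return double.NaN;
        if (x == 0 || double.IsPositiveInfinity(x)) return x;

        // Starting value: never zero and never below the true root.
        // For extremely large or small x, 100 iterations may not be enough.
        double r = (1.0 + x) / 2;

        for (int i = 0; i < maxIters; i++)
        {
            double next = (r + x / r) / 2;
            // Stop when the step is tiny relative to the estimate itself.
            if (Math.Abs(next - r) <= relTol * next) return next;
            r = next;
        }
        return r;
    }
}
```

With a relative test the loop stops after roughly the same number of iterations whether x is 1e-10 or 1e10.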
As the Wikipedia article you linked to mentions, you don't efficiently calculate square roots with Newton's method; instead, you use logarithms.
float InvSqrt (float x){
float xhalf = 0.5f*x;
int i = *(int*)&x;
i = 0x5f3759df - (i>>1);
x = *(float*)&i;
x = x*(1.5f - xhalf*x*x);
return x;
}
This is my favorite fast square root. Actually it's the inverse of the square root, but you can invert it afterwards if you want. I can't say whether it's faster if you want the square root and not the inverse square root, but it's freaking cool just the same.
http://www.beyond3d.com/content/articles/8/
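The snippet above is C and relies on pointer casts. Since the question is about C#, here is a sketch of the same trick without unsafe code, assuming a runtime that provides BitConverter.SingleToInt32Bits and BitConverter.Int32BitsToSingle (.NET Core 2.0 or later). The class name FastMath is hypothetical.

```csharp
using System;

public static class FastMath
{
    // Approximate 1 / sqrt(x) for positive, finite x.
    public static float InvSqrt(float x)
    {
        float xhalf = 0.5f * x;
        int i = BitConverter.SingleToInt32Bits(x);   // reinterpret the float's bits as an int
        i = 0x5f3759df - (i >> 1);                   // magic-constant first guess
        float y = BitConverter.Int32BitsToSingle(i); // reinterpret the bits back as a float
        y = y * (1.5f - xhalf * y * y);              // one Newton-Raphson refinement step
        return y;
    }
}
```

With a single refinement step the result is only an approximation (within a fraction of a percent), so it suits cases where speed matters more than exactness.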
What you are doing here is applying Newton's method to find a root. So you could just use a more efficient root-finding algorithm. You can start searching for it here.
Replacing the division by 2 with a bit shift is unlikely to make that big a difference; given that the division is by a constant I'd hope the compiler is smart enough to do that for you, but you may as well try it to see.
You're much more likely to get an improvement by exiting from the loop early, so either store new r in a variable and compare with old r, or store x/r in a variable and compare that against r before doing the addition and division.
Instead of breaking out of the loop and then returning r, you could just return r. It may not provide any noticeable increase in performance.
With your method, each iteration doubles the number of correct bits.
Using a table to obtain the initial 4 bits (for example), you will have 8 bits after the 1st iteration, then 16 bits after the second, and all the bits you need after the fourth iteration (since a double stores 52+1 bits of mantissa).
For a table lookup, you can extract the mantissa in [0.5,1[ and exponent from the input (using a function like frexp), then normalize the mantissa in [64,256[ using multiplication by a suitable power of 2.
mantissa *= 2^K
exponent -= K
After this, your input number is still mantissa*2^exponent. K must be 7 or 8, to obtain an even exponent. You can obtain the initial value for the iterations from a table containing all the square roots of the integral part of mantissa. Perform 4 iterations to get the square root r of mantissa. The result is r*2^(exponent/2), constructed using a function like ldexp.
EDIT. I put some C++ code below to illustrate this. The OP's function sr1 with improved test takes 2.78s to compute 2^24 square roots; my function sr2 takes 1.42s, and the hardware sqrt takes 0.12s.
#include <math.h>
#include <stdio.h>
double sr1(double x)
{
double last = 0;
double r = x * 0.5;
int maxIters = 100;
for (int i = 0; i < maxIters; i++) {
r = (r + x / r) / 2;
if ( fabs(r - last) < 1.0e-10 )
break;
last = r;
}
return r;
}
double sr2(double x)
{
// Square roots of values in 0..256 (rounded to nearest integer)
static const int ROOTS256[] = {
0,1,1,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,
7,7,7,7,7,7,7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,9,9,9,9,9,9,9,9,9,9,9,9,9,
9,9,9,9,9,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,
11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,12,12,12,12,
12,12,12,12,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,
13,13,13,13,13,13,13,13,13,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,
14,14,14,14,14,14,14,14,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,
15,15,15,15,15,15,15,15,15,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16 };
// Normalize input
int exponent;
double mantissa = frexp(x,&exponent); // MANTISSA in [0.5,1[ unless X is 0
if (mantissa == 0) return 0; // X is 0
if (exponent & 1) { mantissa *= 128; exponent -= 7; } // odd exponent
else { mantissa *= 256; exponent -= 8; } // even exponent
// Here MANTISSA is in [64,256[
// Initial value on 4 bits
double root = ROOTS256[(int)floor(mantissa)];
// Iterate
for (int it=0;it<4;it++)
{
root = 0.5 * (root + mantissa / root);
}
// Restore exponent in result
return ldexp(root,exponent>>1);
}
int main()
{
// Used to generate the table
// for (int i=0;i<=256;i++) printf(",%.0f",sqrt(i));
double s = 0;
int mx = 1<<24;
// for (int i=0;i<mx;i++) s += sqrt(i); // 0.120s
// for (int i=0;i<mx;i++) s += sr1(i); // 2.780s
for (int i=0;i<mx;i++) s += sr2(i); // 1.420s
}
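For readers following along in C#, the same normalize-then-iterate idea can be sketched without frexp and ldexp, assuming Math.ILogB and Math.ScaleB are available (.NET Core 3.0 or later). Note the differences from the C++ above: this sketch replaces the lookup table with a crude linear seed, so it uses five iterations instead of four, and the class name ScaledSqrt is made up for illustration.

```csharp
using System;

public static class ScaledSqrt
{
    public static double Sqrt(double x)
    {
        if (x < 0 || double.IsNaN(x)) return double.NaN;
        if (x == 0 || double.IsPositiveInfinity(x)) return x;

        // Write x = m * 2^e with an even exponent e, so that m lies in [1, 4).
        int e = Math.ILogB(x);
        if ((e & 1) != 0) e--;
        double m = Math.ScaleB(x, -e);

        // Crude linear seed (not a table): at most 25% above the true root on [1, 4).
        double r = 0.5 * (1.0 + m);

        // Each step roughly doubles the number of correct bits.
        for (int it = 0; it < 5; it++)
        {
            r = 0.5 * (r + m / r);
        }

        // sqrt(m * 2^e) = sqrt(m) * 2^(e/2); e is even, so e / 2 is exact.
        return Math.ScaleB(r, e / 2);
    }
}
```

Because the mantissa is confined to a fixed range, the iteration count no longer depends on the magnitude of x.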
Define a tolerance and return early when subsequent iterations fall within that tolerance.
Since you said the code below was not fast enough, try this:
static double guess(double n)
{
return Math.Pow(10, Math.Log10(n) / 2);
}
It should be very accurate and hopefully fast.
Here is code for the initial estimate described here. It appears to be pretty good. Use this code, and then you should also iterate until the values converge within an epsilon of difference.
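public static double digits(double x)
{
// Number of decimal digits before the decimal point (zero or negative for x < 1).
if (!(x > 0) || double.IsInfinity(x))
{
throw new ArgumentOutOfRangeException(nameof(x), "x must be positive and finite.");
}
double n = x;
double d;
if (n >= 1.0)
{
for (d = 0; n >= 1.0; ++d)
{
n = n / 10;
}
}
else
{
for (d = 1; n < 1.0; --d)
{
n = n * 10;
}
}
return d;
}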
public static double guess(double x)
{
double output;
double d = Program.digits(x);
if (d % 2 == 0)
{
output = 6*Math.Pow(10, (d - 2) / 2);
}
else
{
output = 2*Math.Pow(10, (d - 1) / 2);
}
return output;
}
I have been looking at this as well for learning purposes. You may be interested in two modifications I tried.
The first was to use a first-order Taylor series approximation for x0:
Func<double, double> fNewton = (b) =>
{
// Use first order taylor expansion for initial guess
// http://www27.wolframalpha.com/input/?i=series+expansion+x^.5
double x0 = 1 + (b - 1) / 2;
double xn = x0;
do
{
x0 = xn;
xn = (x0 + b / x0) / 2;
} while (Math.Abs(xn - x0) > 1e-15 * Math.Abs(xn)); // relative tolerance; Double.Epsilon (about 4.9e-324) would demand exact equality
return xn;
};
The second was to try a third-order (more expensive) iteration, Halley's method:
Func<double, double> fNewtonThird = (b) =>
{
double x0 = b/2;
double xn = x0;
do
{
x0 = xn;
xn = (x0*(x0*x0+3*b))/(3*x0*x0+b);
} while (Math.Abs(xn - x0) > 1e-15 * Math.Abs(xn)); // relative tolerance; Double.Epsilon (about 4.9e-324) would demand exact equality
return xn;
};
I created a helper method to time the functions
public static class Helper
{
public static long Time(
this Func<double, double> f,
double testValue)
{
int imax = 120000;
double avg = 0.0;
Stopwatch st = new Stopwatch();
for (int i = 0; i < imax; i++)
{
// note the timing is strictly on the function
st.Start();
var t = f(testValue);
st.Stop();
avg = (avg * i + t) / (i + 1);
}
Console.WriteLine("Average Val: {0}",avg);
return st.ElapsedTicks/imax;
}
}
The original method was faster, but again, might be interesting :)
Replacing "/ 2" by "* 0.5" makes this ~1.5 times faster on my machine, but of course not nearly as fast as the native implementation.
Well, the native Sqrt() function probably isn't implemented in C#, it'll most likely be done in a low-level language, and it'll certainly be using a more efficient algorithm. So trying to match its speed is probably futile.
However, in regard to just trying to optimize your function for the heckuvit, the Wikipedia page you linked recommends the "starting guess" to be 2^floor(D/2), where D represents the number of binary digits in the number. You could give that an attempt, I don't see much else that could be optimized significantly in your code.
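As a rough illustration of that starting guess, the number of binary digits can be read from the floating-point exponent. This sketch assumes Math.ILogB and Math.ScaleB (.NET Core 3.0 or later) and only handles x >= 1; the names SeedSketch and InitialGuess are hypothetical.

```csharp
using System;

public static class SeedSketch
{
    // Starting guess 2^floor(D/2), where D is the number of binary digits
    // in the integer part of x. Restricted to finite x >= 1.
    public static double InitialGuess(double x)
    {
        if (!(x >= 1) || double.IsInfinity(x))
            throw new ArgumentOutOfRangeException(nameof(x), "x must be finite and at least 1.");

        int d = Math.ILogB(x) + 1;      // e.g. 100 lies in [64, 128), so D = 7
        return Math.ScaleB(1.0, d / 2); // 2^floor(D/2); d is positive, so integer division floors
    }
}
```

For x = 100 this gives 8, which is much closer to the true root of 10 than the question's starting value of x / 2 = 50.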
You can try
r = x >> 1;
instead of / 2 (also in the other place where you divide by 2).
It might give you a slight edge.
I would also move the 100 into the loop. Probably nothing, but we are talking about ticks in here.
Just checking it now.
EDIT:
Fixed the > into >>, but it doesn't work for doubles, so never mind.
Inlining the 100 gave me no speed increase.