Which way is more accurate? - c#
I need to divide a numeric range into segments of equal length, but I can't decide which way is more accurate. For example:
double r1 = 100.0, r2 = 1000.0, r = r2 - r1;
int n = 30;
double[] position = new double[n];
for (int i = 0; i < n; i++)
{
    position[i] = r1 + (double)i / n * r;
    // position[i] = r1 + i * r / n;
}
In short: which is more accurate, (double)int1 / int2 * double or int1 * double / int2, and which should I use?
Update
The following code will show the difference:
double r1 = 1000.0, r2 = 100000.0, r = r2 - r1;
int n = 300;
double[] position = new double[n];
for (int i = 0; i < n; i++)
{
    double v1 = r1 + (double)i / n * r;
    double v2 = position[i] = r1 + i * r / n;
    if (v1 != v2)
    {
        Console.WriteLine(v2 - v1);
    }
}
Disclaimer: the numbers I give as examples are not exact; they only show the principle of what is happening behind the scenes.
Let's examine two cases:
(1) int1 = 1000, int2 = 3, double = 3.0
The first method will give you: (1000.0 / 3) * 3 == 333.33333 * 3.0 == 999.999...
While the second will give (1000 * 3.0) / 3 == 3000 / 3 == 1000
In this scenario - the second method is more accurate.
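A quick C# check of this case (a sketch; per the disclaimer above, the exact output depends on how each intermediate result rounds, and for some operand pairs the final rounding cancels the error):

// Divide-first rounds 1000/3 before multiplying; multiply-first stays exact throughout.
double divideFirst = (1000.0 / 3) * 3;
double multiplyFirst = (1000 * 3.0) / 3;           // 3000.0 / 3 == 1000 exactly
Console.WriteLine(divideFirst - multiplyFirst);     // 0 only if the final rounding cancels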
(2) int1 = 2, int2 = 2, double = Double.MAX_VALUE
The first will yield (2.0 / 2) * Double.MAX_VALUE == 1 * Double.MAX_VALUE == Double.MAX_VALUE
While the second will give (2 * Double.MAX_VALUE) / 2, which (in Java) evaluates to Infinity. Under IEEE 754's default rounding, overflow always produces infinity, so the intermediate product cannot be recovered by the later division - it is definitely an issue.
So, in this case - the first method is more accurate.
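This one is easy to reproduce in C#, since the overflow behavior is guaranteed:

// Case (2): the intermediate product overflows before the division can undo it.
double big = double.MaxValue;
Console.WriteLine((2.0 / 2) * big); // prints double.MaxValue unchanged
Console.WriteLine(2 * big / 2);     // prints Infinity: 2 * double.MaxValue overflows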
Things get more complicated if the integers are longs or the double is a float, since some long values cannot be represented exactly by doubles; loss of accuracy can then occur for large values, and in any case large double values are less precise in absolute terms.
Conclusion: which is better is domain specific. In some cases the first method is better, in some the second. It really depends on the values of int1, int2, and double.
However, AFAIK the general rule of thumb with double-precision ops is to keep intermediate values as small as possible (don't create huge numbers and then shrink them back; keep them small as long as you can). The underlying issue is known as loss of significance.
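A small demonstration of loss of significance - the same terms give different answers depending on whether the intermediate value is huge or small:

// At 1e16 the gap between adjacent doubles is 2, so adding 1.0 is lost to rounding.
double huge = 1e16;
Console.WriteLine((huge + 1.0) - huge); // 0: the 1.0 was absorbed into the huge sum
Console.WriteLine(1.0 + (huge - huge)); // 1: reordered so intermediates stay small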
Neither is particularly faster, since the compiler or the JIT may reorder the operations for efficiency anyway.
Maybe I misunderstand your requirement, but why do any division/multiplication inside the loop at all? Maybe this would get the same results:

decimal r1 = 100.0m, r2 = 1000.0m, r = r2 - r1;
int n = 30;
decimal[] position = new decimal[n];
decimal diff = r / n;
decimal current = r1;
for (int i = 0; i < n; i++)
{
    position[i] = current;
    current += diff;
}
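Since decimal arithmetic is exact for these particular values (900m / 30 is exactly 30m), a quick sanity check after the loop confirms the running sum lands exactly on r2. Note that for ranges where r / n does not divide evenly, the decimal division still rounds, so the accumulated current could drift slightly:

// After the loop: r1 + n * diff == r2 exactly for these inputs.
Console.WriteLine(current == r2); // True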
Related
How to divide a decimal number into rounded parts that add up to the original number?
All decimal numbers are rounded to 2 digits when saved into the application. I'm given a number totalAmount and asked to divide it into n equal parts (or as close to equal as possible).

Example:
Given: totalAmount = 421.9720; count = 2 (the totalAmount saved into the application is 421.97)
Expected: 210.99, 210.98 => sum = 421.97
Actual (with a plain divide): 210.9860 (210.99), 210.9860 (210.99) => sum = 421.98

My approach:

var totalAmount = 421.972m;
var count = 2;
var individualCharge = Math.Floor(totalAmount / count);
var leftOverAmount = totalAmount - (individualCharge * count);
for (var i = 0; i < count; i++)
{
    Console.WriteLine(individualCharge + leftOverAmount);
    leftOverAmount = 0;
}

This gives (211.972, 210).
public IEnumerable<decimal> GetDividedAmounts(decimal amount, int count)
{
    var pennies = (int)(amount * 100) % count;
    var baseAmount = Math.Floor((amount / count) * 100) / 100;
    foreach (var _ in Enumerable.Range(1, count))
    {
        var offset = pennies-- > 0 ? 0.01m : 0m;
        yield return baseAmount + offset;
    }
}

Feel free to alter this if you want to get an array or an IEnumerable which is not deferred. I updated it to get the baseAmount as the floor value so it isn't recalculated within the loop. Basically you need to find the base amount and the total of all the leftover pennies, then simply add the pennies back one by one until you run out. Because the pennies are based on the modulus operator, they'll always be in the range [0, count - 1], so you'll never have a final leftover penny.
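For example, with the numbers from the question (my usage sketch): pennies = 42197 % 2 = 1 and baseAmount = 210.98, so the first part gets the extra penny:

foreach (var part in GetDividedAmounts(421.97m, 2))
{
    Console.WriteLine(part); // 210.99, then 210.98 - the parts sum to 421.97
}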
You're introducing a few rounding errors here, then compounding them. This is a common problem with financial data, especially when you have to constrain your algorithm to produce outputs with only 2 decimal places. It's worse when dealing with actual money in countries where 1 cent/penny/whatever coins are no longer legal tender. At least when working with electronic money the rounding isn't as big an issue.

The naive approach of dividing the total by the count and rounding the results is, as you've already discovered, not going to work. What you need is some way to spread out the errors while varying the output amounts by no more than $0.01: no output value can be more than $0.01 from any other output value, and the total must be the truncated total value. In other words, you need a way to distribute the error across the output values with the smallest possible variation between them.

The trick is to track your error and adjust the output up once the accumulated error is high enough. (This is basically how the Bresenham line-drawing algorithm figures out when to increase the y value, if that helps.) Here's the generalized form, which is pretty quick:

public IEnumerable<decimal> RoundedDivide(decimal amount, int count)
{
    int totalCents = (int)Math.Floor(100 * amount);
    // work out the true division, integer portion and error values
    float div = totalCents / (float)count;
    int portion = (int)Math.Floor(div);
    float stepError = div - portion;
    float error = 0;
    for (int i = 0; i < count; i++)
    {
        int value = portion;
        // add in the step error and see if we need to add 1 to the output
        error += stepError;
        if (error > 0.5)
        {
            value++;
            error -= 1;
        }
        // convert back to dollars and cents for output
        yield return value / 100M;
    }
}

I've tested it with count values from 1 through 100; all outputs sum to match the (floored) input value exactly.
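Using the question's numbers as a quick check (my example): the error accumulator reaches 0.5 after the first part (no bump) and 1.0 after the second, which triggers the +1:

foreach (var part in RoundedDivide(421.97m, 2))
{
    Console.WriteLine(part); // 210.98, then 210.99 - the parts sum to 421.97
}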
Try to break it down into steps:

int decimals = 2;
int factor = (int)Math.Pow(10, decimals);
int count = 2;
decimal totalAmount = 421.97232m;
totalAmount = Math.Floor(totalAmount * factor) / factor; // 421.97 - you may want Round here, depending on your requirement
int baseAmount = (int)(totalAmount * factor / count);    // 42197 / 2 = 21098
int left = (int)(totalAmount * factor) % count;          // 1

// Add the remainder back for the first `left` parts
for (int i = 0; i < left; i++)
{
    Console.WriteLine((decimal)(baseAmount + 1) / factor); // (21098 + 1) / 100 = 210.99
}

// The rest do not need adjustment
for (int i = 0; i < count - left; i++)
{
    Console.WriteLine((decimal)baseAmount / factor); // 21098 / 100 = 210.98
}
Linear regression gradient descent using C#
I'm taking the Coursera machine learning course right now and I can't get my gradient descent linear regression function to minimize. I use one dependent variable, an intercept, and four values of x and y, so the equations are fairly simple. The final value of the gradient descent equation varies wildly depending on the initial values of alpha and beta and I can't figure out why. I've only been coding for about two weeks, so my knowledge is limited to say the least; please keep this in mind if you take the time to help.

using System;

namespace LinearRegression
{
    class Program
    {
        static void Main(string[] args)
        {
            Random rnd = new Random();
            const int N = 4;

            // We randomize the initial values of alpha and beta
            double theta1 = rnd.Next(0, 100);
            double theta2 = rnd.Next(0, 100);

            // Values of x, i.e. the independent variable
            double[] x = new double[N] { 1, 2, 3, 4 };

            // Values of y, i.e. the dependent variable
            double[] y = new double[N] { 5, 7, 9, 12 };

            double sumOfSquares1;
            double sumOfSquares2;
            double temp1;
            double temp2;
            double sum;
            double learningRate = 0.001;
            int count = 0;

            do
            {
                // We reset the generalized cost function, called sum of squares
                // since I originally used SS to determine if the function was minimized
                sumOfSquares1 = 0;
                sumOfSquares2 = 0;

                // Add 1 to the counter for each iteration to keep track of how
                // many iterations are completed thus far
                count += 1;

                // First we calculate the generalized cost function, which is to be minimized
                sum = 0;
                for (int i = 0; i < (N - 1); i++)
                {
                    sum += Math.Pow((theta1 + theta2 * x[i] - y[i]), 2);
                }
                // Since we have 4 values of x and y we have 1/(2*N) = 1/8 = 0.125
                sumOfSquares1 = 0.125 * sum;

                // Then we calculate the new alpha value, using the derivative of the cost function.
                sum = 0;
                for (int i = 0; i < (N - 1); i++)
                {
                    sum += theta1 + theta2 * x[i] - y[i];
                }
                // Since we have 4 values of x and y we have 1/N = 1/4 = 0.25
                temp1 = theta1 - learningRate * 0.25 * sum;

                // Same for the beta value; it has a different derivative
                sum = 0;
                for (int i = 0; i < (N - 1); i++)
                {
                    sum += (theta1 + theta2 * x[i]) * x[i] - y[i];
                }
                temp2 = theta2 - learningRate * 0.25 * sum;

                // We change the values of alpha and beta at the same time,
                // otherwise the function won't work
                theta1 = temp1;
                theta2 = temp2;

                // We then calculate the cost function again, with the new alpha and beta values
                sum = 0;
                for (int i = 0; i < (N - 1); i++)
                {
                    sum += Math.Pow((theta1 + theta2 * x[i] - y[i]), 2);
                }
                sumOfSquares2 = 0.125 * sum;

                Console.WriteLine("Alpha: {0:N}", theta1);
                Console.WriteLine("Beta: {0:N}", theta2);
                Console.WriteLine("GCF Before: {0:N}", sumOfSquares1);
                Console.WriteLine("GCF After: {0:N}", sumOfSquares2);
                Console.WriteLine("Iterations: {0}", count);
                Console.WriteLine(" ");
            } while (sumOfSquares2 <= sumOfSquares1 && count < 5000);
            // We end the iteration cycle once the generalized cost function
            // cannot be reduced any further, or after 5000 iterations

            Console.ReadLine();
        }
    }
}
There are two bugs in the code.

First, I assume that you would like to iterate through all the elements in the array, so rework the for loops like this:

for (int i = 0; i < N; i++)

Second, when updating the theta2 value the summation is not calculated correctly. According to the update function it should look like this:

sum += (theta1 + theta2 * x[i] - y[i]) * x[i];

Why do the final values depend on the initial values? Because every gradient descent update step is calculated from them: if the initial values (the starting point) are too big or too small, the search starts too far away from the final values. You can mitigate this by:

Increasing the number of iterations (e.g. 5000 to 50000), so the gradient descent algorithm has more time to converge.
Increasing the learning rate (e.g. 0.001 to 0.01), so the update steps are bigger and it converges faster. Note: if the learning rate is too big, the algorithm can step over the minimum and fail to converge.

The slope (theta2) is around 2.5 and the intercept (theta1) is around 2.3 for the given data. I have created a GitHub project to fix your code, and I have also added a shorter solution using LINQ; it is 5 lines of code. If you are curious, check it out here.
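Putting both fixes together, the two update sums might look like this (a sketch using the question's variable names; 0.25 is 1/N for N = 4):

// Corrected gradient sums: loop over all N samples and multiply the
// residual (not just the prediction) by x[i] for the slope term.
double sum1 = 0, sum2 = 0;
for (int i = 0; i < N; i++)
{
    double residual = theta1 + theta2 * x[i] - y[i];
    sum1 += residual;        // partial derivative w.r.t. the intercept
    sum2 += residual * x[i]; // partial derivative w.r.t. the slope
}
temp1 = theta1 - learningRate * 0.25 * sum1;
temp2 = theta2 - learningRate * 0.25 * sum2;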
C# math statement not working while doing it in parts works
I am wondering if this could be some kind of associativity problem, because when I do the problem on paper I get the correct answer, but when I run the code I keep getting 4 over and over. Here is the code. Why aren't these equal? What am I missing?

The whole problem (returns 4 on every iteration):

for (int x = 1; x <= stackCount; x++)
{
    temp = ((x - 1) / stackCount * uBound) + lBound + 1;
    Base[x] = Top[x] = Convert.ToInt32(Math.Floor(temp));
}

Broken into pieces (runs correctly):

double temp, temp1, temp2, temp3, temp4;
for (int x = 1; x <= stackCount; x++)
{
    temp1 = (x - 1);
    temp2 = temp1 / stackCount;
    temp3 = temp2 * uBound;
    temp4 = temp3 + lBound + 1;
    Base[x] = Top[x] = Convert.ToInt32(Math.Floor(temp4));
}

Added: Yes, I am sorry, I forgot about the declarations:

// the main memory for the application
private string[] Memory;
// arrays to keep track of the bottom and top of stacks
private int[] Base;
private int[] Top;
// keep track of the upper and lower bounds and usable size
private int LowerBound;
private int UpperBound;
private int usableSize;

I also think I had that backwards: I thought that if a double was involved anywhere in a division the result would be a double, but it appears that is not the case when both operands are integers. That makes sense! Thank you all!
Speculation: stackCount, uBound, and lBound are all integers or longs.
Result: the entire expression is computed as though you're doing integer arithmetic.
Solution:

temp = ((double)(x - 1) / stackCount * uBound) + lBound + 1;
You haven't given us the full code. In particular, the declarations for stackCount, uBound, lBound, and temp have all been omitted, along with the values of the first three. If, as seems likely, everything involved in your expression

((x - 1) / stackCount * uBound) + lBound + 1;

is an integral type, the result will also be an integral type, because integer division is performed. For example:

int x = 9;
int y = 4;
double z = x / y;

yields 2.0, not 2.25: x / y is evaluated in integer arithmetic before the assignment converts it to double. By the same logic, ((5 - 1) / 9 * 11) + 3 + 1 evaluates to 4, since (5 - 1) / 9 truncates to 0. The particular integral type the expression resolves to depends on the various types involved, whether or not they are signed, and whether or not they are all compatible.
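A runnable illustration of why the question's loop prints 4 every time (the actual values of stackCount, uBound, and lBound were omitted, so these are hypothetical):

// Hypothetical values; the real declarations were not shown in the question.
int stackCount = 5, uBound = 14, lBound = 3;
for (int x = 1; x <= stackCount; x++)
{
    int truncated = (x - 1) / stackCount; // integer division: 0 whenever x - 1 < stackCount
    Console.WriteLine(truncated * uBound + lBound + 1); // prints 4 on every iteration
}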
.NET math calculation performance
I asked a question about porting Excel's BetaInv function to .NET: BetaInv function in SQL Server. I have now managed to write that function in pure, dependency-less C# code, and I get the same results as MS Excel up to 6 or 7 digits after the decimal point; the results are fine for us. The problem is that the code is embedded in a SQL CLR function and gets called thousands of times from a stored procedure, which makes the whole procedure about 50% slower - execution goes from 30 seconds up to a minute depending on whether I use that function or not.

Here is some of the code. I am not asking for a deep analysis, but does anybody see any major performance issue in the way I am doing these calculations? For example, should I use other data types instead of doubles?

private static double betacf(double a, double b, double x)
{
    int m, m2;
    double aa, c, d, del, h, qab, qam, qap;

    qab = a + b;
    qap = a + 1.0;
    qam = a - 1.0;
    c = 1.0;

    // First step of Lentz's method.
    d = 1.0 - qab * x / qap;
    if (System.Math.Abs(d) < FPMIN)
    {
        d = FPMIN;
    }
    d = 1.0 / d;
    h = d;

    for (m = 1; m <= MAXIT; ++m)
    {
        m2 = 2 * m;
        aa = m * (b - m) * x / ((qam + m2) * (a + m2));

        // One step (the even one) of the recurrence.
        d = 1.0 + aa * d;
        if (System.Math.Abs(d) < FPMIN)
        {
            d = FPMIN;
        }
        c = 1.0 + aa / c;
        if (System.Math.Abs(c) < FPMIN)
        {
            c = FPMIN;
        }
        d = 1.0 / d;
        h *= d * c;

        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2));

        // Next step of the recurrence (the odd one).
        d = 1.0 + aa * d;
        if (System.Math.Abs(d) < FPMIN)
        {
            d = FPMIN;
        }
        c = 1.0 + aa / c;
        if (System.Math.Abs(c) < FPMIN)
        {
            c = FPMIN;
        }
        d = 1.0 / d;
        del = d * c;
        h *= del;

        // Are we done?
        if (System.Math.Abs(del - 1.0) < EPS)
        {
            break;
        }
    }

    if (m > MAXIT)
    {
        return 0;
    }
    else
    {
        return h;
    }
}

private static double gammln(double xx)
{
    double x, y, tmp, ser;
    double[] cof = new double[]
    {
        76.180091729471457, -86.505320329416776, 24.014098240830911,
        -1.231739572450155, 0.001208650973866179, -0.000005395239384953
    };

    y = xx;
    x = xx;
    tmp = x + 5.5;
    tmp -= (x + 0.5) * System.Math.Log(tmp);
    ser = 1.0000000001900149;
    for (int j = 0; j <= 5; ++j)
    {
        y += 1;
        ser += cof[j] / y;
    }
    return -tmp + System.Math.Log(2.5066282746310007 * ser / x);
}
The only thing that stands out for me, and is usually a performance hit, is the memory allocation. I don't know how often gammln is called, but you might want to move the double[] cof = new double[] { ... } to a static, one-time allocation.
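A minimal sketch of that change (the field name is my choice), hoisting the array into a static readonly field so it is allocated once instead of once per call:

// Allocated once; gammln then reads Cof[j] instead of building a new array each call.
private static readonly double[] Cof = new double[]
{
    76.180091729471457, -86.505320329416776, 24.014098240830911,
    -1.231739572450155, 0.001208650973866179, -0.000005395239384953
};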
double is usually the best type, especially since the functions in Math take doubles. Unfortunately I see no obvious improvements to make to your code. It might be possible to use lookup tables to get a better first estimate to iterate from, but since I don't know the math behind what you're doing I don't know whether that's possible in this specific case. Obviously larger epsilons will improve performance, so choose EPS as large as possible while still fulfilling your accuracy demands. If the function gets called repeatedly with the same parameters, you might be able to cache results. One thing that looks odd is the way you force small values for c, d, ... up to FPMIN; my instinct is that this might lead to suboptimal step sizes.
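If the same (a, b, x) triples really do recur, a simple memoization wrapper could look like this (a sketch: the wrapper name is mine, there is no eviction policy, and it assumes System.Collections.Generic plus C# 7 value tuples):

// Cache betacf results for repeated inputs; unbounded, so only suitable
// when the set of distinct inputs is known to be small.
private static readonly Dictionary<(double, double, double), double> BetacfCache
    = new Dictionary<(double, double, double), double>();

private static double BetacfCached(double a, double b, double x)
{
    var key = (a, b, x);
    if (!BetacfCache.TryGetValue(key, out double result))
    {
        result = betacf(a, b, x);
        BetacfCache[key] = result;
    }
    return result;
}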
All I've got is unrolling the j loop in gammln, but it'll make at most a tiny difference. A more radical thought would be to rewrite the whole thing in pure T-SQL, since it has everything you use: +, -, *, /, ABS, and LOG are all available.
How can I improve this square root method?
I know this sounds like a homework assignment, but it isn't. Lately I've been interested in the algorithms used to perform certain mathematical operations, such as sine, square root, etc. At the moment I'm trying to write the Babylonian method of computing square roots in C#. So far I have this:

public static double SquareRoot(double x)
{
    if (x == 0) return 0;

    // This is inefficient, but I can't find a better way
    // to get a close estimate for the starting value of r.
    double r = x / 2;

    double last = 0;
    int maxIters = 100;
    for (int i = 0; i < maxIters; i++)
    {
        r = (r + x / r) / 2;
        if (r == last)
            break;
        last = r;
    }

    return r;
}

It works just fine and produces the exact same answer as the .NET Framework's Math.Sqrt() method every time. As you can probably guess, though, it's slower than the native method (by around 800 ticks). I know this particular method will never be faster than the native one, but I'm just wondering if there are any optimizations I can make.

The only optimization I saw immediately was that the calculation would run 100 times even after the answer had already been determined (at which point r would always be the same value), so I added a quick check to see whether the newly calculated value is the same as the previously calculated value and break out of the loop. Unfortunately it didn't make much of a difference in speed, but it just seemed like the right thing to do.

And before you say "Why not just use Math.Sqrt() instead?" - I'm doing this as a learning exercise and do not intend to actually use this method in any production code.
First, instead of checking for equality (r == last), you should check for convergence - that is, whether r is close to last, where "close" is defined by an arbitrary epsilon:

double eps = 1e-10; // pick any small number
if (Math.Abs(r - last) < eps)
    break;

As the Wikipedia article you linked to mentions, you don't efficiently calculate square roots with Newton's method - instead, you use logarithms.
float InvSqrt(float x)
{
    float xhalf = 0.5f * x;
    int i = *(int*)&x;
    i = 0x5f3759df - (i >> 1);
    x = *(float*)&i;
    x = x * (1.5f - xhalf * x * x);
    return x;
}

This is my favorite fast square root. Actually it's the inverse of the square root, but you can invert it afterwards if you want. I can't say whether it's faster if you want the square root and not the inverse square root, but it's freaken cool just the same. http://www.beyond3d.com/content/articles/8/
What you are doing here is executing Newton's method of finding a root, so you could just use a more efficient root-finding algorithm. You can start searching for one here.
Replacing the division by 2 with a bit shift is unlikely to make that big a difference; given that the division is by a constant, I'd hope the compiler is smart enough to do that for you, but you may as well try it and see. You're much more likely to get an improvement by exiting the loop early, so either store the new r in a variable and compare it with the old r, or store x / r in a variable and compare that against r before doing the addition and division.
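A sketch of the second suggestion (the epsilon is arbitrary): since x / r equals r exactly when r is the square root, you can test convergence on the quotient you were about to use anyway:

double q = x / r;
if (Math.Abs(q - r) < 1e-12)
    break;           // r is already (nearly) the root
r = (r + q) / 2;     // reuse the quotient instead of dividing again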
Instead of breaking out of the loop and then returning r, you could just return r directly. It may not provide any noticeable increase in performance, though.
With your method, each iteration doubles the number of correct bits. Using a table to obtain the initial 4 bits (for example), you will have 8 bits after the 1st iteration, 16 bits after the second, and all the bits you need after the fourth iteration (since a double stores 52+1 bits of mantissa).

For the table lookup, you can extract the mantissa in [0.5,1[ and the exponent from the input (using a function like frexp), then normalize the mantissa into [64,256[ by multiplying by a suitable power of 2:

mantissa *= 2^K
exponent -= K

After this, your input number is still mantissa * 2^exponent. K must be 7 or 8 to obtain an even exponent. The initial value for the iterations comes from a table containing the square roots of the integral parts of all mantissa values. Perform 4 iterations to get the square root r of the mantissa; the result is r * 2^(exponent/2), constructed using a function like ldexp.

EDIT: I put some C++ code below to illustrate this. The OP's function sr1 with the improved convergence test takes 2.78 s to compute 2^24 square roots; my function sr2 takes 1.42 s, and the hardware sqrt takes 0.12 s.

#include <math.h>
#include <stdio.h>

double sr1(double x)
{
    double last = 0;
    double r = x * 0.5;
    int maxIters = 100;
    for (int i = 0; i < maxIters; i++)
    {
        r = (r + x / r) / 2;
        if (fabs(r - last) < 1.0e-10)
            break;
        last = r;
    }
    return r;
}

double sr2(double x)
{
    // Square roots of values in 0..256 (rounded to nearest integer)
    static const int ROOTS256[] = {
        0,1,1,2,2,2,2,3,3,3,3,3,3,4,4,4,4,4,4,4,4,5,5,5,5,5,5,5,5,5,5,6,6,6,6,6,6,6,6,6,6,6,6,
        7,7,7,7,7,7,7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,9,9,9,9,9,9,9,9,9,9,9,9,9,
        9,9,9,9,9,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,10,11,11,11,11,11,
        11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,11,12,12,12,12,12,12,12,12,12,12,12,12,
        12,12,12,12,12,12,12,12,12,12,12,12,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,13,
        13,13,13,13,13,13,13,13,13,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,14,
        14,14,14,14,14,14,14,14,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,15,
        15,15,15,15,15,15,15,15,15,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16,16 };

    // Normalize input
    int exponent;
    double mantissa = frexp(x, &exponent); // MANTISSA in [0.5,1[ unless X is 0
    if (mantissa == 0) return 0;           // X is 0

    if (exponent & 1) { mantissa *= 128; exponent -= 7; } // odd exponent
    else              { mantissa *= 256; exponent -= 8; } // even exponent
    // Here MANTISSA is in [64,256[

    // Initial value on 4 bits
    double root = ROOTS256[(int)floor(mantissa)];

    // Iterate
    for (int it = 0; it < 4; it++)
    {
        root = 0.5 * (root + mantissa / root);
    }

    // Restore exponent in result
    return ldexp(root, exponent >> 1);
}

int main()
{
    // Used to generate the table:
    // for (int i=0;i<=256;i++) printf(",%.0f",sqrt(i));
    double s = 0;
    int mx = 1 << 24;
    // for (int i=0;i<mx;i++) s += sqrt(i); // 0.120s
    // for (int i=0;i<mx;i++) s += sr1(i);  // 2.780s
    for (int i = 0; i < mx; i++) s += sr2(i); // 1.420s
}
Define a tolerance and return early when subsequent iterations fall within that tolerance.
Since you said the code below was not fast enough, try this:

static double guess(double n)
{
    return Math.Pow(10, Math.Log10(n) / 2);
}

It should be very accurate and hopefully fast.

Here is code for the initial estimate described here. It appears to be pretty good. Use this code, and then you should also iterate until the values converge within an epsilon of difference:

public static double digits(double x)
{
    double n = Math.Floor(x);
    double d;
    if (n >= 1.0)
    {
        for (d = 1; n >= 1.0; ++d)
        {
            n = n / 10;
        }
    }
    else
    {
        for (d = 1; n < 1.0; ++d)
        {
            n = n * 10;
        }
    }
    return d;
}

public static double guess(double x)
{
    double output;
    double d = Program.digits(x);
    if (d % 2 == 0)
    {
        output = 6 * Math.Pow(10, (d - 2) / 2);
    }
    else
    {
        output = 2 * Math.Pow(10, (d - 1) / 2);
    }
    return output;
}
I have been looking at this as well for learning purposes. You may be interested in two modifications I tried.

The first was to use a first-order Taylor series approximation for the initial guess:

Func<double, double> fNewton = (b) =>
{
    // Use first-order Taylor expansion for the initial guess
    // http://www27.wolframalpha.com/input/?i=series+expansion+x^.5
    double x0 = 1 + (b - 1) / 2;
    double xn = x0;
    do
    {
        x0 = xn;
        xn = (x0 + b / x0) / 2;
    } while (Math.Abs(xn - x0) > Double.Epsilon);
    return xn;
};

The second was to try a third-order iteration (more expensive per step):

Func<double, double> fNewtonThird = (b) =>
{
    double x0 = b / 2;
    double xn = x0;
    do
    {
        x0 = xn;
        xn = (x0 * (x0 * x0 + 3 * b)) / (3 * x0 * x0 + b);
    } while (Math.Abs(xn - x0) > Double.Epsilon);
    return xn;
};

I created a helper method to time the functions:

using System.Diagnostics; // for Stopwatch

public static class Helper
{
    public static long Time(this Func<double, double> f, double testValue)
    {
        int imax = 120000;
        double avg = 0.0;
        Stopwatch st = new Stopwatch();
        for (int i = 0; i < imax; i++)
        {
            // note the timing is strictly on the function
            st.Start();
            var t = f(testValue);
            st.Stop();
            avg = (avg * i + t) / (i + 1);
        }
        Console.WriteLine("Average Val: {0}", avg);
        return st.ElapsedTicks / imax;
    }
}

The original method was faster, but again, this might be interesting :)
Replacing "/ 2" by "* 0.5" makes this ~1.5 times faster on my machine, but of course not nearly as fast as the native implementation.
Well, the native Sqrt() function probably isn't implemented in C#; it'll most likely be written in a low-level language, and it'll certainly be using a more efficient algorithm, so trying to match its speed is probably futile. However, if you just want to optimize your function for the heck of it, the Wikipedia page you linked recommends the starting guess 2^floor(D/2), where D is the number of binary digits in the number. You could give that a try; I don't see much else that could be optimized significantly in your code.
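One way to compute that starting guess (a sketch; Math.Log2 requires .NET Core 3.0 or later, and older frameworks can use Math.Log(x, 2) instead):

// D = number of binary digits of x (for x >= 1); start the iteration at 2^floor(D/2).
int D = (int)Math.Floor(Math.Log2(x)) + 1;
double r = Math.Pow(2, D / 2); // integer division gives floor(D/2)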
You can try r = x >> 1; instead of / 2 (also in the other place where you divide by 2). It might give you a slight edge. I would also move the 100 into the loop; probably nothing, but we are talking about ticks here. Just checking it now. EDIT: Fixed the > into >>, but it doesn't work for doubles, so never mind. Inlining the 100 gave me no speed increase.