c# float [] average loses accuracy - c#

I am trying to calculate average for an array of floats. I need to use indices because this is inside a binary search so the top and bottom will move. (Big picture we are trying to optimize a half range estimation so we don't have to re-create the array each pass).
Anyway I wrote a custom average loop and I'm getting 2 places less accuracy than the c# Average() method
float test = input.Average();
int count = (top - bottom) + 1;//number of elements in this iteration
int pos = bottom;
float average = 0f;//working average
while (pos <= top)
{
average += input[pos];
pos++;
}
average = average / count;
example:
0.0371166766 - c#
0.03711666 - my loop
125090.148 - c#
125090.281 - my loop
http://pastebin.com/qRE3VrCt

I'm getting 2 places less accuracy than the c# Average()
No, you are only losing 1 significant digit. The float type can only store 7 significant digits, the rest are just random noise. Inevitably in a calculation like this, you can accumulate round-off error and thus lose precision. Getting the round-off errors to balance out requires luck.
The only way to avoid it is to use a floating point type that has more precision to accumulate the result. Not an issue, you have double available. Which is why the Linq Average method looks like this:
public static float Average(this IEnumerable<float> source) {
if (source == null) throw Error.ArgumentNull("source");
double sum = 0; // <=== NOTE: double
long count = 0;
checked {
foreach (float v in source) {
sum += v;
count++;
}
}
if (count > 0) return (float)(sum / count);
throw Error.NoElements();
}
Use double to reproduce the Linq result with a comparable number of significant digits in the result.

I'd rewrite this as:
int count = (top - bottom) + 1;//number of elements in this iteration
double sum = 0;
for(int i = bottom; i <= top; i++)
{
sum += input[i];
}
float average = (float)(sum/count);
That way you're using a high precision accumulator, which helps reduce rounding errors.
btw. if performance isn't that important, you can still use LINQ to calculate the average of an array slice:
input.Skip(bottom).Take(top - bottom + 1).Average()
I'm not entirely sure if that fits your problem, but if you need to calculate the average of many subarrays, it can be useful to create a persistent sum array, so calculating an average simply becomes two table lookups and a division.

Just to add to the conversation, be careful when using Floating point primitives.
What Every Computer Scientist Should Know About Floating-Point Arithmetic
Internally floating point numbers store additional least significant bits that are not reflected in the displayed value (aka: Guard Bits or Guard Digits). They are, however, utilized when performing mathematical operations and equality checks. One common result is that a variable containing 0f is not always zero. When accumulating floating point values this can also lead to precision errors.
Use Decimal for your accumulator:
Will not have rounding errors due to Guard Digits
Is a 128bit data type (less likely to exceed Max Value in your accumulator).
For more info:
What is the difference between Decimal, Float and Double in C#?

Related

How to distribute a decimal by .5 as evenly as possible with no rounding

I have an input number that is guaranteed to be specified as n.0 or n.5
Example inputs (only a single input is provided):
0
0.5
13.0
13.5
13337.5
28091.0
I need to distribute this number by n values so that all values total the input exactly, but all values must be n.0 or n.5 with NO rounding allowed.
For example, if 13.5 is input and needs to be distributed between 4, the resulting values would be:
3.5
3.5
3.5
3.0
Another example, if 1.0 is specified and needs to be distributed between 4, the resulting values would be:
0.5
0.5
0.0
0.0
I'm thinking the function signature would look similar to:
private List<decimal> DistributeEvenly(decimal amount, int numDistributions)
{
...
}
I do find examples of how to distribute numbers evenly, but not with the n.0 and n.5 requirement and also not without rounding. How can this be accomplished?
I would just divide the input amount by the number of distributions, ignoring any remainder and use that as the base value. Then take the remainder and subtract 0.5 for each distribution until it is gone. Something like this:
private List<decimal> DistributeEvenly(decimal amount, int numDistributions)
{
var result = new List<decimal>();
//Determines the whole number part of each distribution
decimal baseValue = (int)(amount / numDistributions);
//Fill the list with the baseValue, numDistributions times
result.AddRange(Enumerable.Repeat(baseValue, numDistributions));
//The remainder will be distributed in amounts of 0.5 until it runs out
decimal remainder = amount - (baseValue * numDistributions);
int index = 0;
while (remainder > 0)
{
result[index] += 0.5m;
index++;
if (index >= numDistributions)
{
index = 0;
}
remainder -= 0.5m;
}
return result;
}
If you find the total of the numbers you're given and divide by the number of outputs required, you get the mean that's needed. If you divide this mean value by one-half (equivalently: multiply it by 2), take the floor of the result, and then multiply the result of that by one-half, you get the smaller of the two distinct numbers that need to be in your output list. It is simple from there to work out how many of the output numbers need to be increased by one-half.
Just multiply every number by two, and then distribute them evenly by increments of 1 in a for loop until your total is 0. Divide the result by 2. (Or just do it with floats, but you might get rounding errors)

Double every time brings different values

It's my generating algorithm it's generating random double elements for the array which sum must be 1
public static double [] GenerateWithSumOfElementsIsOne(int elements)
{
double sum = 1;
double [] arr = new double [elements];
for (int i = 0; i < elements - 1; i++)
{
arr[i] = RandomHelper.GetRandomNumber(0, sum);
sum -= arr[i];
}
arr[elements - 1] = sum;
return arr;
}
And the method helper
public static double GetRandomNumber(double minimum, double maximum)
{
Random random = new Random();
return random.NextDouble() * (maximum - minimum) + minimum;
}
My test cases are:
[Test]
[TestCase(7)]
[TestCase(5)]
[TestCase(4)]
[TestCase(8)]
[TestCase(10)]
[TestCase(50)]
public void GenerateWithSumOfElementsIsOne(int num)
{
Assert.AreEqual(1, RandomArray.GenerateWithSumOfElementsIsOne(num).Sum());
}
And the thing is - when I'm testing it returns every time different value like this cases :
Expected: 1
But was: 0.99999999999999967d
Expected: 1
But was: 0.99999999999999989d
But in the next test, it passes sometimes all of them, sometimes not.
I know that troubles with rounding and ask for some help, dear experts :)
https://en.wikipedia.org/wiki/Floating-point_arithmetic
In computing, floating-point arithmetic is arithmetic using formulaic
representation of real numbers as an approximation so as to support a
trade-off between range and precision. For this reason, floating-point
computation is often found in systems which include very small and
very large real numbers, which require fast processing times. A number
is, in general, represented approximately to a fixed number of
significant digits (the significand) and scaled using an exponent in
some fixed base; the base for the scaling is normally two, ten, or
sixteen.
In short, this is what floats do, they dont hold every single value and do approximate. If you would like more precision try using a Decimal instead, or adding tolerance by an epsilon (an upper bound on the relative error due to rounding in floating point arithmetic)
var ratio = a / b;
var diff = Math.Abs(ratio - 1);
return diff <= epsilon;
Round up errors are frequent in case of floating point types (like Single and Double), e.g. let's compute an easy sum:
// 0.1 + 0.1 + ... + 0.1 = ? (100 times). Is it 0.1 * 100 == 10? No!
Console.WriteLine((Enumerable.Range(1, 100).Sum(i => 0.1)).ToString("R"));
Outcome:
9.99999999999998
That's why when comparing floatinfg point values with == or != add tolerance:
// We have at least 8 correct digits
// i.e. the asbolute value of the (round up) error is less than tolerance
Assert.IsTrue(Math.Abs(RandomArray.GenerateWithSumOfElementsIsOne(num).Sum() - 1.0) < 1e-8);

Parallelizing very large array base conversion

I have a method that converts value to a newBase number of length length.
The logic in english is:
If we calculated every possible combination of numbers from 0 to (c-1)
with a length of x
what set would occur at point i
While the method below does work perfectly, because very large numbers are used, it can take a long time to complete:
For example, value=(((65536^480000)-1)/2), newbase=(65536), length=(480000) takes about an hour to complete on a 64 bit architecture, quad core PC).
private int[] GetValues(BigInteger value, int newBase, int length)
{
Stack<int> result = new Stack<int>();
while (value > 0)
{
result.Push((int)(value % newBase));
if (value < newBase)
value = 0;
else
value = value / newBase;
}
for (var i = result.Count; i < length; i++)
{
result.Push(0);
}
return result.ToArray();
}
My question is, how can I change this method into something that will allow multiple threads to work out part of the number?
I am working C#, but if you're not familiar with that then pseudocode is fine too.
Note: The method is from this question: Cartesian product subset returning set of mostly 0
If that GetValues method is really the bottleneck, there are several things you can do to speed it up.
First, you're dividing by newBase every time through the loop. Since newBase is an int, and the BigInteger divide method divides by a BigInteger, you're potentially incurring the cost of an int-to-BigInteger conversion on every iteration. You might consider:
BigInteger bigNewBase = newBase;
Also, you can cut the number of divides in half by calling DivRem:
while (value > 0)
{
BigInteger rem;
value = BigInteger.DivRem(value, bigNewBase, out rem);
result.Push((int)rem);
}
One other optimization, as somebody mentioned in comments, would be to store the digits in a pre-allocated array. You'll have to call Array.Reverse to get them in the proper order, but that takes approximately no time.
That method, by the way, doesn't lend itself to parallelizing because computing each digit depends on the computation of the previous digit.

Performing Math operations on decimal datatype in C#?

I was wondering if the above was at all possible. For example:
Math.Sqrt(myVariableHere);
When looking at the overload, it requires a double parameter, so I'm not sure if there is another way to replicate this with decimal datatypes.
I don't understand why all the answers to that question are the same.
There are several ways to calculate the square root from a number. One of them was proposed by Isaac Newton. I'll only write one of the simplest implementations of this method. I use it to improve the accuracy of double's square root.
// x - a number, from which we need to calculate the square root
// epsilon - an accuracy of calculation of the root from our number.
// The result of the calculations will differ from an actual value
// of the root on less than epslion.
public static decimal Sqrt(decimal x, decimal epsilon = 0.0M)
{
if (x < 0) throw new OverflowException("Cannot calculate square root from a negative number");
decimal current = (decimal)Math.Sqrt((double)x), previous;
do
{
previous = current;
if (previous == 0.0M) return 0;
current = (previous + x / previous) / 2;
}
while (Math.Abs(previous - current) > epsilon);
return current;
}
About speed: in the worst case (epsilon = 0 and number is decimal.MaxValue) the loop repeats less than a three times.
If you want to know more, read this (Hacker's Delight by Henry S. Warren, Jr.)
I just came across this question, and I'd suggest a different algorithm than the one SLenik proposed. This is based on the Babylonian Method.
public static decimal Sqrt(decimal x, decimal? guess = null)
{
var ourGuess = guess.GetValueOrDefault(x / 2m);
var result = x / ourGuess;
var average = (ourGuess + result) / 2m;
if (average == ourGuess) // This checks for the maximum precision possible with a decimal.
return average;
else
return Sqrt(x, average);
}
It doesn't require using the existing Sqrt function, and thus avoids converting to double and back, with the accompanying loss of precision.
In most cases involving a decimal (currency etc), it isn't useful to take a root; and the root won't have anything like the expected precision that you might expect a decimal to have. You can of course force it by casting (assuming we aren't dealing with extreme ends of the decimal range):
decimal root = (decimal)Math.Sqrt((double)myVariableHere);
which forces you to at least acknowledge the inherent rounding issues.
Simple: Cast your decimal to a double and call the function, get the result and cast that back to a decimal. That will probably be faster than any sqrt function you could make on your own, and save a lot of effort.
Math.Sqrt((double)myVariableHere);
Will give you back a double that's the square root of your decimal myVariableHere.

Comparing double values in C#

I've a double variable called x.
In the code, x gets assigned a value of 0.1 and I check it in an 'if' statement comparing x and 0.1
if (x==0.1)
{
----
}
Unfortunately it does not enter the if statement
Should I use Double or double?
What's the reason behind this? Can you suggest a solution for this?
It's a standard problem due to how the computer stores floating point values. Search here for "floating point problem" and you'll find tons of information.
In short – a float/double can't store 0.1 precisely. It will always be a little off.
You can try using the decimal type which stores numbers in decimal notation. Thus 0.1 will be representable precisely.
You wanted to know the reason:
Float/double are stored as binary fractions, not decimal fractions. To illustrate:
12.34 in decimal notation (what we use) means
1 * 101 + 2 * 100 + 3 * 10-1 + 4 * 10-2
The computer stores floating point numbers in the same way, except it uses base 2: 10.01 means
1 * 21 + 0 * 20 + 0 * 2-1 + 1 * 2-2
Now, you probably know that there are some numbers that cannot be represented fully with our decimal notation. For example, 1/3 in decimal notation is 0.3333333…. The same thing happens in binary notation, except that the numbers that cannot be represented precisely are different. Among them is the number 1/10. In binary notation that is 0.000110011001100….
Since the binary notation cannot store it precisely, it is stored in a rounded-off way. Hence your problem.
double and Double are the same (double is an alias for Double) and can be used interchangeably.
The problem with comparing a double with another value is that doubles are approximate values, not exact values. So when you set x to 0.1 it may in reality be stored as 0.100000001 or something like that.
Instead of checking for equality, you should check that the difference is less than a defined minimum difference (tolerance). Something like:
if (Math.Abs(x - 0.1) < 0.0000001)
{
...
}
You need a combination of Math.Abs on X-Y and a value to compare with.
You can use following Extension method approach
public static class DoubleExtensions
{
const double _3 = 0.001;
const double _4 = 0.0001;
const double _5 = 0.00001;
const double _6 = 0.000001;
const double _7 = 0.0000001;
public static bool Equals3DigitPrecision(this double left, double right)
{
return Math.Abs(left - right) < _3;
}
public static bool Equals4DigitPrecision(this double left, double right)
{
return Math.Abs(left - right) < _4;
}
...
Since you rarely call methods on double except ToString I believe its pretty safe extension.
Then you can compare x and y like
if(x.Equals4DigitPrecision(y))
Comparing floating point number can't always be done precisely because of rounding. To compare
(x == .1)
the computer really compares
(x - .1) vs 0
Result of sybtraction can not always be represeted precisely because of how floating point number are represented on the machine. Therefore you get some nonzero value and the condition evaluates to false.
To overcome this compare
Math.Abs(x- .1) vs some very small threshold ( like 1E-9)
From the documentation:
Precision in Comparisons
The Equals method should be used with caution, because two apparently equivalent values can be unequal due to the differing precision of the two values. The following example reports that the Double value .3333 and the Double returned by dividing 1 by 3 are unequal.
...
Rather than comparing for equality, one recommended technique involves defining an acceptable margin of difference between two values (such as .01% of one of the values). If the absolute value of the difference between the two values is less than or equal to that margin, the difference is likely to be due to differences in precision and, therefore, the values are likely to be equal. The following example uses this technique to compare .33333 and 1/3, the two Double values that the previous code example found to be unequal.
So if you really need a double, you should use the techique described on the documentation.
If you can, change it to a decimal. It' will be slower, but you won't have this type of problem.
Use decimal. It doesn't have this "problem".
Exact comparison of floating point values is know to not always work due to the rounding and internal representation issue.
Try imprecise comparison:
if (x >= 0.099 && x <= 0.101)
{
}
The other alternative is to use the decimal data type.
double (lowercase) is just an alias for System.Double, so they are identical.
For the reason, see Binary floating point and .NET.
In short: a double is not an exact type and a minute difference between "x" and "0.1" will throw it off.
Double (called float in some languages) is fraut with problems due to rounding issues, it's good only if you need approximate values.
The Decimal data type does what you want.
For reference decimal and Decimal are the same in .NET C#, as are the double and Double types, they both refer to the same type (decimal and double are very different though, as you've seen).
Beware that the Decimal data type has some costs associated with it, so use it with caution if you're looking at loops etc.
Official MS help, especially interested "Precision in Comparisons" part in context of the question.
https://learn.microsoft.com/en-us/dotnet/api/system.double.equals
// Initialize two doubles with apparently identical values
double double1 = .333333;
double double2 = (double) 1/3;
// Define the tolerance for variation in their values
double difference = Math.Abs(double1 * .00001);
// Compare the values
// The output to the console indicates that the two values are equal
if (Math.Abs(double1 - double2) <= difference)
Console.WriteLine("double1 and double2 are equal.");
else
Console.WriteLine("double1 and double2 are unequal.");
1) Should i use Double or double???
Double and double is the same thing. double is just a C# keyword working as alias for the class System.Double
The most common thing is to use the aliases! The same for string (System.String), int(System.Int32)
Also see Built-In Types Table (C# Reference)
Taking a tip from the Java code base, try using .CompareTo and test for the zero comparison. This assumes the .CompareTo function takes in to account floating point equality in an accurate manner. For instance,
System.Math.PI.CompareTo(System.Math.PI) == 0
This predicate should return true.
// number of digits to be compared
public int n = 12
// n+1 because b/a tends to 1 with n leading digits
public double MyEpsilon { get; } = Math.Pow(10, -(n+1));
public bool IsEqual(double a, double b)
{
// Avoiding division by zero
if (Math.Abs(a)<= double.Epsilon || Math.Abs(b) <= double.Epsilon)
return Math.Abs(a - b) <= double.Epsilon;
// Comparison
return Math.Abs(1.0 - a / b) <= MyEpsilon;
}
Explanation
The main comparison function done using division a/b which should go toward 1. But why division? it simply puts one number as reference defines the second one. For example
a = 0.00000012345
b = 0.00000012346
a/b = 0.999919002
b/a = 1.000081004
(a/b)-1 = 8.099789405475458e-5‬
1-(b/a) = 8.100445524503848e-5‬
or
a=12345*10^8
b=12346*10^8
a/b = 0.999919002
b/a = 1.000081004
(a/b)-1 = 8.099789405475458e-5‬
1-(b/a) = 8.100445524503848e-5‬
by division we get rid of trailing or leading zeros (or relatively small numbers) that pollute our judgement of number precision. In the example, the comparison is of order 10^-5, and we have 4 number accuracy, because of that in the beginning code I wrote comparison with 10^(n+1) where n is number accuracy.
Adding onto Valentin Kuzub's answer above:
we could use a single method that supports providing nth precision number:
public static bool EqualsNthDigitPrecision(this double value, double compareTo, int precisionPoint) =>
Math.Abs(value - compareTo) < Math.Pow(10, -Math.Abs(precisionPoint));
Note: This method is built for simplicity without added bulk and not with performance in mind.
As a general rule:
Double representation is good enough in most cases but can miserably fail in some situations. Use decimal values if you need complete precision (as in financial applications).
Most problems with doubles doesn't come from direct comparison, it use to be a result of the accumulation of several math operations which exponentially disturb the value due to rounding and fractional errors (especially with multiplications and divisions).
Check your logic, if the code is:
x = 0.1
if (x == 0.1)
it should not fail, it's to simple to fail, if X value is calculated by more complex means or operations it's quite possible the ToString method used by the debugger is using an smart rounding, maybe you can do the same (if that's too risky go back to using decimal):
if (x.ToString() == "0.1")
Floating point number representations are notoriously inaccurate because of the way floats are stored internally. E.g. x may actually be 0.0999999999 or 0.100000001 and your condition will fail. If you want to determine if floats are equal you need to specify whether they're equal to within a certain tolerance.
I.e.:
if(Math.Abs(x - 0.1) < tol) {
// Do something
}
My extensions method for double comparison:
public static bool IsEqual(this double value1, double value2, int precision = 2)
{
var dif = Math.Abs(Math.Round(value1, precision) - Math.Round(value2, precision));
while (precision > 0)
{
dif *= 10;
precision--;
}
return dif < 1;
}
To compare floating point, double or float types, use the specific method of CSharp:
if (double1.CompareTo(double2) > 0)
{
// double1 is greater than double2
}
if (double1.CompareTo(double2) < 0)
{
// double1 is less than double2
}
if (double1.CompareTo(double2) == 0)
{
// double1 equals double2
}
https://learn.microsoft.com/en-us/dotnet/api/system.double.compareto?view=netcore-3.1

Categories

Resources