In a very high-performance app, we find the CPU can do long arithmetic significantly faster than double arithmetic. However, in our system we determined that we never need more than 9 decimal places of precision. So we use longs for all fractional arithmetic, with 9 decimal places of precision understood.
However, in certain parts of the system it is more convenient, for readability, to work with doubles. So we have to convert the long value, with its assumed 9 decimal places, into a double.
We find that simply taking the long and dividing by 10^9, or multiplying by 1/10^9, gives imprecise representations in a double.
To solve that, we use Math.Round(value, 9) to get the precise values.
However, Math.Round() is horrifically slow.
So our current idea is to construct the mantissa and exponent of the double's binary format directly; that way, there will be no need for rounding.
We have learned online how to examine the bits of a double to get the mantissa and exponent, but it's confusing to figure out how to reverse that: take a mantissa and exponent and fabricate a double by setting its bits.
Any suggestions?
[Test]
public unsafe void ChangeBitsInDouble()
{
    var original = 1.0D;
    long bits;
    double* dptr = &original;
    //bits = *(long*) dptr;    // pointer route, equivalent to the line below
    bits = BitConverter.DoubleToInt64Bits(original);
    var negative = (bits < 0);                     // sign is the top bit
    var exponent = (int)((bits >> 52) & 0x7ffL);   // 11-bit biased exponent
    var mantissa = bits & 0xfffffffffffffL;        // low 52 bits
    if (exponent == 0)
    {
        exponent++;                                // subnormal: no implicit leading 1
    }
    else
    {
        mantissa = mantissa | (1L << 52);          // restore the implicit leading 1
    }
    exponent -= 1075;                              // remove the bias (1023) and the 52 fraction bits
    if (mantissa == 0)
    {
        return;
    }
    while ((mantissa & 1) == 0)                    // normalise: shift out trailing zero bits
    {
        mantissa >>= 1;
        exponent++;
    }
    Console.WriteLine("Mantissa " + mantissa + ", exponent " + exponent);
}
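For reference, the reverse direction can be sketched like this (a hypothetical helper, not code from the question): pack the sign, biased exponent and 52-bit mantissa back into the 64-bit pattern and reinterpret it with BitConverter.Int64BitsToDouble. It assumes a normal, finite value, i.e. the mantissa already fits in 52 bits with the implicit leading 1 stripped and the exponent is already biased:

public static double FabricateDouble(bool negative, int biasedExponent, long mantissa)
{
    // Sign in bit 63, biased exponent in bits 62-52, mantissa in bits 51-0.
    // Assumes mantissa < 2^52 and 0 < biasedExponent < 2047.
    long bits = (negative ? 1L << 63 : 0L)
              | ((long)(biasedExponent & 0x7FF) << 52)
              | (mantissa & 0xfffffffffffffL);
    return BitConverter.Int64BitsToDouble(bits);
}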
You shouldn't use a scale factor of 10^9; you should use 2^30 instead.
As you've already realised as per the other answer, doubles work by floating-point binary rather than floating-point decimal, and therefore the initial approach doesn't work.
It's also not clear whether a deliberately simplified formula could work, because it's not clear what maximum range you need; beyond some range, rounding becomes inevitable.
The problem of doing so quickly but precisely is well-studied and often supported by CPU instructions. Your only chance of beating the built-in conversions is either:
1. You hit a mathematical breakthrough that's worthy of some serious papers being written about it.
2. You exclude enough cases that won't occur in your own usage, so that while the built-ins are better in general, yours is optimised for your own use.
Unless the range of values you use is very limited, the potential for short-cutting on conversion between double-precision IEEE 754 and long integer becomes smaller and smaller.
If you're at the point where you have to cover most of the cases IEEE 754 covers, or even a sizable proportion of them, then you'll end up making things slower.
I'd recommend either staying with what you have, reworking the cases where double is more convenient to stick with long anyway despite the inconvenience, or, if necessary, using decimal. You can easily create a decimal from a long with:
private static decimal DivideByBillion(long l)
{
    // decimal(lo, mid, hi, isNegative, scale): a scale of 9 shifts the
    // decimal point 9 places, i.e. divides the underlying integer by 10^9.
    if (l >= 0)
        return new decimal((int)(l & 0xFFFFFFFF), (int)(uint)(l >> 32), 0, false, 9);
    l = -l;
    return new decimal((int)(l & 0xFFFFFFFF), (int)(uint)(l >> 32), 0, true, 9);
}
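For example (value chosen arbitrarily), a raw long of 1234567890 with the implied scale of 9 comes out as:

decimal d = DivideByBillion(1234567890L);
Console.WriteLine(d); // 1.234567890 (decimal preserves the scale, hence the trailing zero)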
Now, decimal is orders of magnitude slower to use in arithmetic than double (precisely because it implements an approach similar to yours in the opening question, but with a varying exponent and a larger mantissa). But if you just need a convenient way to obtain a value for display or rendering to string, then hand-hacking the conversion to decimal has advantages over hand-hacking the conversion to double, so it could be worth looking at.
Related
In How does DoubleUtil.DoubleToInt(double val) work? we learn that the .NET Framework has a special way of rounding floating point values:
public static int DoubleToInt(double val)
{
return (0 < val) ? (int)(val + 0.5) : (int)(val - 0.5);
}
Why are they not just using (int)Math.Round(val)?
Or: Why is Math.Round not defined this way if this is superior? There must be some trade-off.
Math.Round would result in the creation of a double with the exact value needed, which would then need to be converted to an int. The code here avoids creating that double. It also allows eliding the error handling and the code for other rounding modes and digit counts.
They have different behaviour for values with a fractional part of 1/2. According to the documentation for Math.Round:
If the fractional component of a is halfway between two integers, one of which is even and the other odd, then the even number is returned.
So if val == 0.5, then Math.Round(val) == 0.0, whereas this DoubleToInt would give (int)(0.5 + 0.5) == 1. In other words, DoubleToInt rounds 1/2 away from zero (like the standard C round function).
There is also potential here for less desirable behaviour: if val is actually the double before 0.5 (i.e. 0.49999999999999994) then, depending on how C# handles intermediate precision, it may in fact give 1 (as val + 0.5 isn't representable by a double, and could be rounded to 1). This was in fact an infamous specification bug in Java 6 (and earlier).
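A small demonstration of the difference, using a copy of the quoted DoubleToInt (output comments assume standard IEEE 754 double arithmetic):

static int DoubleToInt(double val)
{
    return (0 < val) ? (int)(val + 0.5) : (int)(val - 0.5);
}

static void Demo()
{
    Console.WriteLine(Math.Round(0.5));                  // 0: banker's rounding to even
    Console.WriteLine(DoubleToInt(0.5));                 // 1: midpoint rounds away from zero
    Console.WriteLine(DoubleToInt(0.49999999999999994)); // 1: val + 0.5 rounds up to exactly 1.0
}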
I could see this being an optimization, since to get the same behavior from Round you need to use the MidpointRounding.AwayFromZero option. From the reference source, this is implemented via:
private static unsafe double InternalRound(double value, int digits, MidpointRounding mode) {
    if (Abs(value) < doubleRoundLimit) {
        Double power10 = roundPower10Double[digits];
        value *= power10;
        if (mode == MidpointRounding.AwayFromZero) {
            double fraction = SplitFractionDouble(&value);
            if (Abs(fraction) >= 0.5d) {
                value += Sign(fraction);
            }
        }
        else {
            // On X86 this can be inlined to just a few instructions
            value = Round(value);
        }
        value /= power10;
    }
    return value;
}
I can only guess that the author of the utility method did some performance comparison.
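A rough sketch of the kind of comparison they may have done (Stopwatch-based, loop bounds and values purely illustrative, not a rigorous benchmark):

var sw = System.Diagnostics.Stopwatch.StartNew();
long sum1 = 0;
for (int i = 0; i < 10_000_000; i++)
    sum1 += (long)Math.Round(i * 0.1, MidpointRounding.AwayFromZero);
Console.WriteLine($"Math.Round:  {sw.ElapsedMilliseconds} ms (checksum {sum1})");

sw.Restart();
long sum2 = 0;
for (int i = 0; i < 10_000_000; i++)
{
    double val = i * 0.1;
    sum2 += (0 < val) ? (int)(val + 0.5) : (int)(val - 0.5);
}
Console.WriteLine($"DoubleToInt: {sw.ElapsedMilliseconds} ms (checksum {sum2})");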
For a little personal research project I want to generate a string list of all possible values a double precision floating point number can have.
I've found the "r" formatting option, which guarantees that the string can be parsed back into the exact same bit representation:
string s = myDouble.ToString("r");
But how to generate all possible bit combinations? Preferably ordered by value.
Maybe using the unchecked keyword somehow?
unchecked
{
    //for all long values
    myDouble[i] = myLong++;
}
Disclaimer: It's more a theoretical question, I am not going to read all the numbers... :)
using unsafe code:

ulong i = 0; // ulong is 64 bits wide, like double
unsafe
{
    double* d = (double*)&i; // reinterpret i's bits as a double
    for (; i < ulong.MaxValue; i++)
        Console.WriteLine(*d);
    // note: the final pattern, ulong.MaxValue itself (a NaN), is never printed
}
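If the ordering-by-value preference matters, here is a safe sketch that walks just the finite non-negative doubles in increasing order; for non-negative doubles, the raw IEEE 754 bit patterns sort in the same order as the values (using the "r" format from the question):

for (long bits = 0; ; bits++)
{
    double d = BitConverter.Int64BitsToDouble(bits);
    if (double.IsInfinity(d) || double.IsNaN(d))
        break; // patterns 0x7FF0000000000000 and above are infinities/NaNs
    Console.WriteLine(d.ToString("r"));
}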
You can start with all possible values 0 <= x < 1. You can create those by having zero for exponent and use different values for the mantissa.
The mantissa is stored in 52 bits of the 64 bits that make a double precision number, so that makes for 2 ^ 52 = 4503599627370496 different numbers between 0 and 1.
From the description of the double format you can figure out how the bit pattern (eight bytes) should look for those numbers; then you can use the BitConverter.ToDouble method to do the conversion.
Then you can set the first bit to make the negative version of all those numbers.
All those numbers are unique; beyond that, you will start getting duplicate values, because there are several ways to express the same value once the exponent is non-zero. For each new non-zero exponent you would get the values that were not possible to express with the previously used exponents.
The values between 0 and 1 will, however, keep you busy for the foreseeable future, so you can just start with those.
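A sketch of that approach; for brevity it uses BitConverter.Int64BitsToDouble rather than the byte-array route described above (these exponent-field-zero patterns are the subnormal values):

for (long mantissa = 1; mantissa < (1L << 52); mantissa++)
{
    double positive = BitConverter.Int64BitsToDouble(mantissa);              // exponent field 0
    double negative = BitConverter.Int64BitsToDouble(mantissa | (1L << 63)); // same value, sign bit set
    Console.WriteLine(positive.ToString("r"));
    Console.WriteLine(negative.ToString("r"));
}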
This should be doable in safe code: Create a bit string. Convert that to a double. Output. Increment. Repeat.... A LOT.
string bstr = "01010101010101010101010101010101"; // 32 bits shown; a double needs 64, adjust as needed
long v = 0;
for (int i = bstr.Length - 1; i >= 0; i--)
    v = (v << 1) + (bstr[i] - '0'); // assemble the bit pattern from the characters
double d = BitConverter.ToDouble(BitConverter.GetBytes(v), 0);
// increment bstr and loop
Is it possible to convert a 24-bit integer value into a float and then back to a 24-bit integer without losing data?
For example, let's consider an 8-bit int (an sbyte in C#), with range [-127..127] (we drop -128).
public static float ToFloatSample(sbyte x) { return x / 127f; }
So if x == -127, the result will be -1; if x == 127, the result will be 1; if x == 64, the result will be ~0.5.
public static int ToIntSample (float x) { return (int) (x * 127f); }
So now:
int x = some_number;
float f = ToFloatSample (x);
int y = ToIntSample (f);
Will x == y always hold? With an 8-bit int, yes, but what if I use 24 bits?
Having thought about your question, I now understand what you're asking.
I understand you have 24-bits which represent a real number n such that -1 <= n <= +1 and you want to load this into an instance of System.Single, and back again.
In C/C++ this is actually quite easy with the frexp and ldexp functions, documented here ( how can I extract the mantissa of a double ), but in .NET it's a more involved process.
The C# language specification (and thus .NET) states it uses the IEEE 754 format, which means you'll need to dump the bits into an integer type so you can perform the bitwise logic to extract the components. This question has already been asked here on SO, except for System.Double instead of System.Single, but converting the answer to work with Single is a trivial exercise for the reader ( extracting mantissa and exponent from double in c# ).
In your case, you'd want to store your 24-bit mantissa value in the low-24 bits of an Int32 and then use the code in that linked question to load and extract it from a Single instance.
Every integer in the range [-16777216, 16777216] is exactly representable as an IEEE 754 32-bit binary floating point number. That includes both the unsigned and 2's complement 24 bit integer ranges. Simple casting will do the job.
The range is wider than you would expect because there is an extra significand bit that is not stored - it is a binary digit that is known not to be zero.
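Given the small range (about 33.5 million values), that claim can be checked by brute force; a quick sketch:

// Exhaustively verify that every integer with |i| <= 2^24 survives the
// int -> float -> int round trip unchanged.
for (int i = -16777216; i <= 16777216; i++)
{
    float f = i;        // exact: the 24-bit significand can hold |i| <= 2^24
    if ((int)f != i)
        throw new Exception("round-trip failed at " + i);
}
Console.WriteLine("all values in [-16777216, 16777216] round-trip exactly");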
Given 2 values like so:
decimal a = 0.15m;
decimal b = 0.85m;
Where a + b will always be 1.0m, both values are specified to only 2 decimal places, and both values are >= 0.0m and <= 1.0m.
Is it guaranteed that x == total will always be true, for all possible Decimal values of x, a and b? Using the calculation below:
decimal x = 105.99m;
decimal total = (x * a) + (x * b);
Or are there cases where x == total only to 2 decimal places, but not beyond that?
Would it make any difference if a and b could be specified to unlimited decimal places (as much as Decimal allows), but as long as a + b = 1.0m still holds?
Decimal is stored as a sign, a 96-bit integer, and an integer exponent for the power of 10 that gives the position of the decimal point. So long as the integral portion of the number (e.g. 105 in 105.99) is not too large, a + b will always equal one, and the outcome of your equation (x * a) + (x * b) will always have the correct value to four decimal places.
Unlike float and double, precision is not lost up to the size of the data type (128 bits).
From MSDN:
The Decimal value type represents decimal numbers ranging from positive 79,228,162,514,264,337,593,543,950,335 to negative 79,228,162,514,264,337,593,543,950,335. The Decimal value type is appropriate for financial calculations requiring large numbers of significant integral and fractional digits and no round-off errors. The Decimal type does not eliminate the need for rounding. Rather, it minimizes errors due to rounding. For example, the following code produces a result of 0.9999999999999999999999999999 rather than 1:
decimal dividend = Decimal.One;
decimal divisor = 3;
// The following displays 0.9999999999999999999999999999 to the console
Console.WriteLine(dividend/divisor * divisor);
The maximum precision of decimal in the CLR is 29 significant digits. When you're using that kind of precision, you're really talking about approximation, especially if you do multiplication, because that requires intermediate results that the CLR must be able to process (see also http://msdn.microsoft.com/en-us/library/364x0z75.aspx).
If you have x with 2 significant digits and, say, a with 20 significant digits, then x * a will already have a minimum precision of 22 digits, and possibly more may be needed for intermediate results.
If x always has only 2 significant digits and you can keep the number of significant digits in a and b low enough (say, 22 digits -- pretty good and probably far enough away from 27 to deal with rounding errors), then I suppose (x * a) + (x * b) should be a pretty precise calculation always.
Finally, whether a + b always adds up to 1.0m has no bearing on a and b's individual precisions.
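A quick demonstration with the question's own values; note that decimal equality compares numeric values, not scales, so 105.99m == 105.9900m holds:

decimal a = 0.15m, b = 0.85m, x = 105.99m;
decimal total = (x * a) + (x * b);  // 15.8985m + 90.0915m
Console.WriteLine(total);           // 105.9900 (scale 4 from the multiplications)
Console.WriteLine(x == total);      // True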
I have a double variable called x.
In the code, x gets assigned a value of 0.1, and I check it in an if statement comparing x and 0.1:
if (x==0.1)
{
----
}
Unfortunately, it does not enter the if statement.
Should I use Double or double?
What's the reason behind this? Can you suggest a solution for this?
It's a standard problem due to how the computer stores floating point values. Search here for "floating point problem" and you'll find tons of information.
In short – a float/double can't store 0.1 precisely. It will always be a little off.
You can try using the decimal type which stores numbers in decimal notation. Thus 0.1 will be representable precisely.
You wanted to know the reason:
Float/double are stored as binary fractions, not decimal fractions. To illustrate:
12.34 in decimal notation (what we use) means:
1 * 10^1 + 2 * 10^0 + 3 * 10^-1 + 4 * 10^-2
The computer stores floating-point numbers in the same way, except it uses base 2: 10.01 means:
1 * 2^1 + 0 * 2^0 + 0 * 2^-1 + 1 * 2^-2
Now, you probably know that there are some numbers that cannot be represented fully with our decimal notation. For example, 1/3 in decimal notation is 0.3333333…. The same thing happens in binary notation, except that the numbers that cannot be represented precisely are different. Among them is the number 1/10. In binary notation that is 0.000110011001100….
Since the binary notation cannot store it precisely, it is stored in a rounded-off way. Hence your problem.
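You can see the rounded-off storage directly; the "G17" format shows enough digits to expose it (output assumes standard IEEE 754 doubles):

double x = 0.1;
Console.WriteLine(x.ToString("G17"));           // 0.10000000000000001
Console.WriteLine((0.1 + 0.2).ToString("G17")); // 0.30000000000000004
Console.WriteLine(0.1 + 0.2 == 0.3);            // False: the stored values differ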
double and Double are the same (double is an alias for Double) and can be used interchangeably.
The problem with comparing a double with another value is that doubles are approximate values, not exact values. So when you set x to 0.1 it may in reality be stored as 0.100000001 or something like that.
Instead of checking for equality, you should check that the difference is less than a defined minimum difference (tolerance). Something like:
if (Math.Abs(x - 0.1) < 0.0000001)
{
...
}
You need a combination of Math.Abs on x - y and a tolerance value to compare with.
You can use the following extension-method approach:
public static class DoubleExtensions
{
    const double _3 = 0.001;
    const double _4 = 0.0001;
    const double _5 = 0.00001;
    const double _6 = 0.000001;
    const double _7 = 0.0000001;

    public static bool Equals3DigitPrecision(this double left, double right)
    {
        return Math.Abs(left - right) < _3;
    }

    public static bool Equals4DigitPrecision(this double left, double right)
    {
        return Math.Abs(left - right) < _4;
    }

    ...
Since you rarely call methods on double except ToString, I believe it's a pretty safe extension.
Then you can compare x and y like:
if(x.Equals4DigitPrecision(y))
Comparing floating-point numbers can't always be done precisely because of rounding. To compare
(x == .1)
the computer really compares
(x - .1) vs 0
The result of the subtraction cannot always be represented precisely because of how floating-point numbers are represented on the machine. Therefore you get some nonzero value and the condition evaluates to false.
To overcome this, compare
Math.Abs(x - .1) vs some very small threshold (like 1E-9)
From the documentation:
Precision in Comparisons
The Equals method should be used with caution, because two apparently equivalent values can be unequal due to the differing precision of the two values. The following example reports that the Double value .3333 and the Double returned by dividing 1 by 3 are unequal.
...
Rather than comparing for equality, one recommended technique involves defining an acceptable margin of difference between two values (such as .01% of one of the values). If the absolute value of the difference between the two values is less than or equal to that margin, the difference is likely to be due to differences in precision and, therefore, the values are likely to be equal. The following example uses this technique to compare .33333 and 1/3, the two Double values that the previous code example found to be unequal.
So if you really need a double, you should use the technique described in the documentation.
If you can, change it to a decimal. It will be slower, but you won't have this type of problem.
Use decimal. It doesn't have this "problem".
Exact comparison of floating-point values is known to not always work, due to rounding and internal representation issues.
Try imprecise comparison:
if (x >= 0.099 && x <= 0.101)
{
}
The other alternative is to use the decimal data type.
double (lowercase) is just an alias for System.Double, so they are identical.
For the reason, see Binary floating point and .NET.
In short: a double is not an exact type and a minute difference between "x" and "0.1" will throw it off.
Double (called float in some languages) is fraught with problems due to rounding issues; it's good only if you need approximate values.
The Decimal data type does what you want.
For reference, decimal and Decimal are the same in .NET C#, as are double and Double; each pair refers to the same type (decimal and double are very different from each other, though, as you've seen).
Beware that the Decimal data type has some costs associated with it, so use it with caution if you're looking at loops etc.
Official MS help; the "Precision in Comparisons" part is especially relevant in the context of the question:
https://learn.microsoft.com/en-us/dotnet/api/system.double.equals
// Initialize two doubles with apparently identical values
double double1 = .333333;
double double2 = (double) 1/3;
// Define the tolerance for variation in their values
double difference = Math.Abs(double1 * .00001);
// Compare the values
// The output to the console indicates that the two values are equal
if (Math.Abs(double1 - double2) <= difference)
    Console.WriteLine("double1 and double2 are equal.");
else
    Console.WriteLine("double1 and double2 are unequal.");
1) Should I use Double or double?
Double and double are the same thing; double is just a C# keyword acting as an alias for the struct System.Double.
The most common convention is to use the aliases. The same goes for string (System.String) and int (System.Int32).
Also see Built-In Types Table (C# Reference)
Taking a tip from the Java code base, try using .CompareTo and test for the zero comparison. This assumes the .CompareTo function takes into account floating-point equality in an accurate manner. For instance,
System.Math.PI.CompareTo(System.Math.PI) == 0
This predicate should return true.
// number of digits to be compared
public const int n = 12;
// n+1 because b/a tends to 1 with n leading digits
public static readonly double MyEpsilon = Math.Pow(10, -(n + 1));

public static bool IsEqual(double a, double b)
{
    // Avoiding division by zero
    if (Math.Abs(a) <= double.Epsilon || Math.Abs(b) <= double.Epsilon)
        return Math.Abs(a - b) <= double.Epsilon;
    // Comparison
    return Math.Abs(1.0 - a / b) <= MyEpsilon;
}
Explanation
The main comparison is done using the division a/b, which should tend toward 1. But why division? It simply takes one number as a reference that defines the other. For example:
a = 0.00000012345
b = 0.00000012346
a/b = 0.999919002
b/a = 1.000081004
(a/b)-1 = 8.099789405475458e-5
1-(b/a) = 8.100445524503848e-5
or
a=12345*10^8
b=12346*10^8
a/b = 0.999919002
b/a = 1.000081004
(a/b)-1 = 8.099789405475458e-5
1-(b/a) = 8.100445524503848e-5
By division we get rid of trailing or leading zeros (or relatively small magnitudes) that pollute our judgement of a number's precision. In the example, the comparison is of order 10^-5 and we have 4 digits of accuracy; because of that, in the code at the beginning I wrote the comparison against 10^-(n+1), where n is the digit accuracy.
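A usage sketch with the numbers from the example (n = 12, so MyEpsilon is 10^-13):

Console.WriteLine(IsEqual(0.00000012345, 0.00000012346)); // False: |1 - a/b| is ~8.1e-5, far above 1e-13
Console.WriteLine(IsEqual(12345e8, 12345e8));             // True: a/b is exactly 1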
Adding onto Valentin Kuzub's answer above, we could use a single method that supports an nth-digit precision (wrapped here in a static class so it compiles as an extension method):
public static class DoubleComparisonExtensions
{
    public static bool EqualsNthDigitPrecision(this double value, double compareTo, int precisionPoint) =>
        Math.Abs(value - compareTo) < Math.Pow(10, -Math.Abs(precisionPoint));
}
Note: This method is built for simplicity, without added bulk, and not with performance in mind.
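A usage sketch (values arbitrary):

Console.WriteLine(0.1230.EqualsNthDigitPrecision(0.1235, 3)); // True:  |diff| = 0.0005 < 0.001
Console.WriteLine(0.1230.EqualsNthDigitPrecision(0.1235, 4)); // False: 0.0005 >= 0.0001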
As a general rule:
Double representation is good enough in most cases but can miserably fail in some situations. Use decimal values if you need complete precision (as in financial applications).
Most problems with doubles don't come from direct comparison; they tend to be the result of the accumulation of several math operations, which progressively disturb the value due to rounding and fractional errors (especially multiplications and divisions).
Check your logic, if the code is:
x = 0.1
if (x == 0.1)
it should not fail; it's too simple to fail. If the value of x is calculated by more complex means or operations, it's quite possible the ToString method used by the debugger applies some smart rounding; maybe you can do the same (if that's too risky, go back to using decimal):
if (x.ToString() == "0.1")
Floating point number representations are notoriously inaccurate because of the way floats are stored internally. E.g. x may actually be 0.0999999999 or 0.100000001 and your condition will fail. If you want to determine if floats are equal you need to specify whether they're equal to within a certain tolerance.
I.e.:
if (Math.Abs(x - 0.1) < tol) // tol is your chosen tolerance, e.g. 1e-9
{
    // Do something
}
My extension method for double comparison:
public static bool IsEqual(this double value1, double value2, int precision = 2)
{
    var dif = Math.Abs(Math.Round(value1, precision) - Math.Round(value2, precision));
    while (precision > 0)
    {
        dif *= 10;
        precision--;
    }
    return dif < 1;
}
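Usage sketch:

Console.WriteLine(0.104.IsEqual(0.1));    // True:  both round to 0.10 at the default 2-digit precision
Console.WriteLine(0.104.IsEqual(0.1, 3)); // False: 0.104 vs 0.100 differ at the 3rd digit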
To compare floating-point double or float values, use the CompareTo method that C# provides:
if (double1.CompareTo(double2) > 0)
{
// double1 is greater than double2
}
if (double1.CompareTo(double2) < 0)
{
// double1 is less than double2
}
if (double1.CompareTo(double2) == 0)
{
// double1 equals double2
}
https://learn.microsoft.com/en-us/dotnet/api/system.double.compareto?view=netcore-3.1