The Double data type cannot correctly represent some base 10 values. This is because of how floating point numbers represent real numbers. What this means is that when representing monetary values, one should use the decimal value type to prevent errors. (feel free to correct errors in this preamble)
What I want to know is what are the values which present such a problem under the Double data-type under a 64 bit architecture in the standard .Net framework (C# if that makes a difference) ?
I expect the answer to be a formula or rule to find such values, but I would also like some example values.
Any number which cannot be written as a finite sum of positive and negative powers of 2 cannot be exactly represented as a binary floating-point number.
The common IEEE formats for 32- and 64-bit representations of floating-point numbers impose further constraints; they limit the number of binary digits in both the significand and the exponent. So there are maximum and minimum representable numbers (approximately +/- 1.8 x 10^308 for the 64-bit format) and limits to the precision of a number that can be represented. For 64-bit numbers the significand holds 53 binary digits, so the difference between the exponents of the largest and the smallest power of 2 in a number is limited to 52: if your number includes a term in 2^52 it can't also include a term in 2^-1.
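The 53-digit window described above can be checked directly. This is a minimal sketch in Python, whose float is the same IEEE 754 64-bit format as the .NET Double; the variable names are illustrative only.

```python
import sys

# Largest finite double, roughly 1.8 * 10^308.
largest = sys.float_info.max

# 2^52 and 2^-1 together span 54 binary digits, one more than the
# significand holds, so the 0.5 is rounded away.
lost = 2.0**52 + 0.5 == 2.0**52

# 2^51 and 2^-1 span exactly 53 binary digits, so the sum is stored exactly.
kept = 2.0**51 + 0.5 != 2.0**51

print(largest)
print(lost)  # True
print(kept)  # True
```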
Simple examples of numbers which cannot be exactly represented in binary floating-point numbers include 1/3, 2/3, 1/5.
Since the set of floating-point numbers (in any representation) is finite, and the set of real numbers is infinite, one algorithm to find a real number which is not exactly representable as a floating-point number is to select a real number at random. The probability that the real number is exactly representable as a floating-point number is 0.
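As a rough way to test individual values against the rule above, the following Python sketch compares the exact rational value of a literal with the double it is rounded to. Python's float is an IEEE 754 64-bit double, so the outcome should carry over to .NET; the function names are made up for this illustration.

```python
from fractions import Fraction

def is_exact_double(text):
    """True if the number written in `text` converts to a double unchanged."""
    exact = Fraction(text)                  # exact rational value of the literal
    return Fraction(float(exact)) == exact  # Fraction(float) is exact too

def has_power_of_two_denominator(text):
    """Necessary condition: in lowest terms the denominator is a power of 2."""
    q = Fraction(text).denominator
    return q & (q - 1) == 0

for literal in ["0.5", "5.625", "0.1", "0.2", "1/3"]:
    print(literal, is_exact_double(literal), has_power_of_two_denominator(literal))
```

A power-of-two denominator is necessary but not sufficient: the numerator must also fit in 53 bits and the exponent must be in range, which is why the first function does the actual round trip.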
You generally need to be prepared for the possibility that any value you store in a double has some small amount of error. Unless you're storing a constant value, chances are it could be something with at least some error. If it's imperative that there never be any error, and the values aren't constant, you probably shouldn't be using a floating point type.
What you probably should be asking in many cases is, "How do I deal with the minor floating point errors?" You'll want to know what types of operations can result in a lot of error, and what types don't. You'll want to ensure that comparing two values for "equality" actually just ensures they are "close enough" rather than exactly equal, etc.
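To give one example of an operation that accumulates error, here is a small Python sketch (Python floats are IEEE 754 doubles, like .NET Double). It adds 0.1 ten times in a plain loop and compares the result with math.fsum, which compensates for the rounding of the intermediate sums.

```python
import math

values = [0.1] * 10

# Plain running total: every addition rounds to the nearest double.
total = 0.0
for v in values:
    total += v

# math.fsum tracks the lost low-order bits and rounds only once at the end.
careful = math.fsum(values)

print(total == 1.0)              # False
print(careful == 1.0)            # True
print(math.isclose(total, 1.0))  # True: "close enough" rather than equal
```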
This question actually goes beyond any single programming language or platform. The inaccuracy is actually inherent in binary data.
Consider that with a double, each binary digit N to the left of the binary point (at 0-based index I, counting away from the point) represents the value N * 2^I, and each digit to the right of the point (at 1-based index I) represents the value N * 2^(-I).
As an example, 5.625 (base 10) would be 101.101 (base 2).
Given this calculation, any decimal value whose fractional part can't be written as a finite sum of 2^(-I) terms for different values of I would be stored as an approximate value in a double.
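The positional rule can be written out as a short Python sketch. The helper name is hypothetical, and it only handles plain binary strings such as the 101.101 example.

```python
def binary_to_value(text):
    """Evaluate a binary string such as '101.101' digit by digit."""
    whole, _, frac = text.partition(".")
    value = 0.0
    # A digit at 0-based index i to the left of the point is worth 2^i.
    for i, digit in enumerate(reversed(whole)):
        value += int(digit) * 2.0**i
    # A digit at 1-based index i to the right of the point is worth 2^-i.
    for i, digit in enumerate(frac, start=1):
        value += int(digit) * 2.0**-i
    return value

print(binary_to_value("101.101"))  # 5.625
```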
A float is represented as s, e and m in the following formula
s * m * 2^e
This means that any number that cannot be represented using the given expression (and in the respective domains of s, e and m) cannot be represented exactly.
Basically, you can represent every integer between 0 and 2^53 - 1, multiplied by a certain power of two (possibly a negative power), as long as the exponent stays within the range of the format.
As an example, every integer between 0 and 2^53 - 1 can be represented, multiplied by 2^0 = 1. You can also represent each of those integers divided by 2 (giving values with a .5 fraction), and so on.
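To make the s * m * 2^e form concrete, this Python sketch splits a double into a sign, an integer significand below 2^53, and a power-of-two exponent. It assumes a nonzero finite input, and the function name is invented for the example.

```python
import math

def decompose(x):
    """Split a nonzero finite double into (sign, integer significand, exponent)."""
    sign = -1 if x < 0 else 1
    frac, exp = math.frexp(abs(x))  # abs(x) == frac * 2**exp with 0.5 <= frac < 1
    m = int(frac * 2**53)           # scale the fraction up to a 53-bit integer
    return sign, m, exp - 53

s, m, e = decompose(5.625)
print(s, m, e)
print(s * m * 2.0**e == 5.625)  # True: the three parts rebuild the value exactly
```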
This answer does not fully cover the topic, but I hope it helps.
Related
Why does the following program print what it prints?
using System;

class Program
{
    static void Main(string[] args)
    {
        float f1 = 0.09f * 100f;
        float f2 = 0.09f * 99.999999f;
        Console.WriteLine(f1 > f2);
    }
}
Output is
False
Floating point only has so many digits of precision. If you're seeing f1 == f2, it is because any difference requires more precision than a 32-bit float can represent.
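The effect can be reproduced outside .NET. The sketch below uses Python's struct module to round values to 32-bit floats; to_float32 is a helper written for this example, not a library function. It suggests that the literal 99.999999 is already rounded to exactly 100 before any multiplication happens.

```python
import struct

def to_float32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]

factor = to_float32(99.999999)
print(factor == 100.0)  # True: no 32-bit float lies closer to 99.999999

f1 = to_float32(to_float32(0.09) * to_float32(100.0))
f2 = to_float32(to_float32(0.09) * factor)
print(f1 > f2)  # False
```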
I recommend reading What Every Computer Scientist Should Know About Floating-Point Arithmetic
The main thing is this isn't just .Net: it's a limitation of the underlying system most every language will use to represent a float in memory. The precision only goes so far.
You can also have some fun with relatively simple numbers, when you take into account that it's not even base ten. 0.1 (1/10th), for example, is a repeating fraction when represented in binary, just as 1/3rd is when represented in decimal.
In this particular case, it’s because .09 and .999999 cannot be represented exactly in binary (similarly, 1/3 cannot be represented exactly in decimal). For example, 0.111111111111111111101111 base 2 is 0.999998986721038818359375 base 10. Adding 1 to the last binary digit gives 0.11111111111111111111 base 2, which is 0.99999904632568359375 base 10. There isn’t a binary value for exactly 0.999999 in between. Floating point precision is also limited by the space allocated for storing the exponent and the fractional part of the mantissa. Also, like integer types, floating point can overflow its range, although its range is larger than integer ranges.
Running this bit of C++ code in the Xcode debugger,
float myFloat = 0.1;
shows that myFloat gets the value 0.100000001. It is off by 0.000000001. Not a lot, but if the computation has several arithmetic operations, the imprecision can be compounded.
imho a very good explanation of floating point is in Chapter 14 of Introduction to Computer Organization with x86-64 Assembly Language & GNU/Linux by Bob Plantz of California State University at Sonoma (retired) http://bob.cs.sonoma.edu/getting_book.html. The following is based on that chapter.
Floating point is like scientific notation, where a value is stored as a mixed number greater than or equal to 1.0 and less than 2.0 (the mantissa), times another number to some power (the exponent). Floating point uses base 2 rather than base 10, but in the simple model Plantz gives, he uses base 10 for clarity’s sake. Imagine a system where two positions of storage are used for the mantissa, one position is used for the sign of the exponent* (0 representing + and 1 representing -), and one position is used for the exponent. Now add 0.93 and 0.91. The answer is 1.8, not 1.84.
9311 represents 0.93, or 9.3 times 10 to the -1.
9111 represents 0.91, or 9.1 times 10 to the -1.
The exact answer is 1.84, or 1.84 times 10 to the 0, which would be 18400 if we had 5 positions, but, having only four positions, the answer is 1800, or 1.8 times 10 to the zero, or 1.8. Of course, floating point data types can use more than four positions of storage, but the number of positions is still limited.
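The toy format can be imitated with Python's decimal module by limiting the context to two significant digits. This is only a sketch: the rounding mode is an assumption, since the toy model in the text does not name one.

```python
from decimal import Context, Decimal, ROUND_HALF_EVEN

# Two significant digits, mirroring the two mantissa positions of the toy format.
toy = Context(prec=2, rounding=ROUND_HALF_EVEN)

total = toy.add(Decimal("0.93"), Decimal("0.91"))
print(total)  # 1.8, although the exact sum is 1.84
```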
Not only is precision limited by space, but “an exact representation of fractional values in binary is limited to sums of inverse powers of two.” (Plantz, op. cit.).
0.11100110 (binary) = 0.89843750 (decimal)
0.11100111 (binary) = 0.90234375 (decimal)
There is no exact representation of 0.9 decimal in binary. Even carrying the fraction out more places doesn’t work, as you get into repeating 1100 forever on the right.
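The repeating pattern is easy to generate with exact rational arithmetic. This Python sketch doubles the value repeatedly and records each carry past the point; the function name is illustrative.

```python
from fractions import Fraction

def binary_fraction_digits(value, count):
    """Return the first `count` binary digits after the point of 0 <= value < 1."""
    digits = []
    for _ in range(count):
        value *= 2
        bit = int(value)  # 1 if doubling carried past the point, else 0
        digits.append(str(bit))
        value -= bit
    return "".join(digits)

print(binary_fraction_digits(Fraction(9, 10), 16))  # 1110011001100110
```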
Beginning programmers often see floating point arithmetic as more accurate than integer. It is true that even adding two very large integers can cause overflow. Multiplication makes it even more likely that the result will be very large and, thus, overflow. And when used with two integers, the / operator in C/C++ causes the fractional part to be lost. However, ... floating point representations have their own set of inaccuracies. (Plantz, op. cit.)
*In floating point, both the sign of the number and the sign of the exponent are represented.
I have read in various posts on stackoverflow and in the C# documentation that converting long (or any other data type representing a number) to double loses precision. This is quite obvious due to the representation of floating point numbers.
My question is, how big is the loss of precision if I convert a larger number to double? Do I have to expect differences larger than +/- X ?
The reason I would like to know this, is that I have to deal with a continuous counter which is a long. This value is read by my application as string, needs to be cast and has to be divided by e.g. 10 or some other small number and is then processed further.
Would decimal be more appropriate for this task?
converting long (or any other data type representing a number) to double loses precision. This is quite obvious due to the representation of floating point numbers.
This is less obvious than it seems, because precision loss depends on the value of the long. A double has a 53-bit significand (52 stored bits plus an implicit leading 1), so for values between -2^53 and 2^53 there is no precision loss at all.
How big is the loss of precision if I convert a larger number to double? Do I have to expect differences larger than +/- X
For numbers with magnitude above 2^53 you will experience some precision loss, depending on how far above the 53-bit limit you go. If the absolute value of your long needs, say, 58 bits, then the lowest 58-53=5 bits are rounded away: neighbouring doubles are 32 apart, so the converted value can be off by up to +/-16.
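The size of the conversion error can be measured rather than estimated. A minimal Python sketch follows; Python integers are unbounded and its float is an IEEE 754 double, so int(float(n)) shows what a long-to-double conversion would keep. The function name is hypothetical.

```python
def conversion_error(n):
    """Difference between the nearest double and the original integer."""
    return int(float(n)) - n

print(conversion_error(2**53))      # 0
print(conversion_error(2**53 + 1))  # -1

# Worst case over one stretch of 58-bit values (those from 2^57 upward).
worst = max(abs(conversion_error(n)) for n in range(2**57, 2**57 + 64))
print(worst)  # 16
```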
Would decimal be more appropriate for this task?
decimal has a different representation than double, and it uses a different base. Since you are planning to divide your number by "small numbers", different representations would give you different errors on division. Specifically, double will be better at handling division by powers of two (2, 4, 8, 16, etc.) because such division can be accomplished by subtracting from exponent, without touching the mantissa. Similarly, large decimals would suffer no loss of significant digits when divided by ten, hundred, etc.
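For the counter scenario, a sketch along these lines shows the difference. It uses Python's decimal module, whose default precision of 28 digits is used here as a stand-in for the .NET decimal type (the two are not identical), and the counter value is made up.

```python
from decimal import Decimal

counter = 1234567890123456789  # hypothetical 19-digit counter value

# Dividing by ten only shifts the decimal point, so no digits are lost.
as_decimal = Decimal(counter) / 10
print(as_decimal)                  # 123456789012345678.9
print(as_decimal * 10 == counter)  # True

# The same value does not even survive the conversion to a double.
print(int(float(counter)) == counter)  # False
```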
long
long is a 64-bit integer type and can hold values from –9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (max. 19 digits).
double
double is a 64-bit floating-point type that has a precision of 15 to 16 decimal digits. So data can be lost once your numbers are greater than 2^53 = 9,007,199,254,740,992 (about 9 x 10^15).
decimal
decimal is a 128-bit decimal type and can hold up to 28-29 digits. So it's always safe to cast long to decimal.
Recommendation
I would advise that you find out the exact expectations about the numbers you will be working with. Then you can make an informed decision in choosing the appropriate data type. Since you are reading your numbers from a string, isn't it possible that they will be even greater than 28 digits? In that case, none of the types listed will work for you, and instead you'll have to use some sort of a BigInt implementation.
Is the equality comparison for C# decimal types any more likely to work as we would intuitively expect than other floating point types?
I guess that depends on your intuition. I would assume that some people would think of the result of dividing 1 by 3 as the fraction 1/3, and others would think more along the lines of "Oh, 1 divided by 3 can't be represented as a decimal number, we'll have to decide how many digits to keep, let's go with 0.333".
If you think in the former way, Decimal won't help you much, but if you think in the latter way, and are explicit about rounding when needed, it is more likely that operations that are "intuitively" not subject to rounding errors in decimal, e.g. dividing by 10, will behave as you expect. This is more intuitive to most people than the behavior of a binary floating-point type, where powers of 2 behave nicely, but powers of 10 do not.
Basically, no. The Decimal type simply represents a specialised sort of floating-point number that is designed to reduce rounding error specifically in the base 10 system. That is, the internal representation of a Decimal is in fact in base 10 (denary) and not the usual binary. Hence, it is a rather more appropriate type for monetary calculations -- though not of course limited to such applications.
From the MSDN page for the structure:
The Decimal value type represents decimal numbers ranging from positive 79,228,162,514,264,337,593,543,950,335 to negative 79,228,162,514,264,337,593,543,950,335. The Decimal value type is appropriate for financial calculations requiring large numbers of significant integral and fractional digits and no round-off errors. The Decimal type does not eliminate the need for rounding. Rather, it minimizes errors due to rounding. For example, the following code produces a result of 0.9999999999999999999999999999 rather than 1.
A decimal number is a floating-point value that consists of a sign, a numeric value where each digit in the value ranges from 0 to 9, and a scaling factor that indicates the position of a floating decimal point that separates the integral and fractional parts of the numeric value.
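The 0.9999999999999999999999999999 result mentioned in the quote can be imitated in Python. This is a sketch using Python's decimal module with its precision set to 28 digits, as an assumption to mimic the .NET type; it is not the MSDN example itself.

```python
from decimal import Decimal, getcontext

getcontext().prec = 28  # assumption: roughly the digit count of System.Decimal

third = Decimal(1) / Decimal(3)  # 0.333... cut off after 28 digits
print(third * 3)                 # 0.9999999999999999999999999999
print(third * 3 == Decimal(1))   # False
```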
I came across following issue while developing some engineering rule value engine using eval(...) implementation.
Dim first As Double = 1.1
Dim second As Double = 2.2
Dim sum As Double = first + second
If (sum = 3.3) Then
Console.WriteLine("Matched")
Else
Console.WriteLine("Not Matched")
End If
'Above condition returns false because sum's value is 3.3000000000000003 instead of 3.3
It looks like the 15th digit is round-tripped. Could someone give a better explanation of this, please?
Is Math.Round(...) the only solution available, or is there something else I can attempt?
You are not adding decimals - you are adding up doubles.
Not all decimal values can be represented exactly as doubles, hence the error. I suggest reading this article for background (What Every Computer Scientist Should Know About Floating-Point Arithmetic).
Use the Decimal type instead; it stores decimal fractions such as 1.1 and 2.2 exactly, so this comparison behaves as expected.
Dim first As Decimal = 1.1D
Dim second As Decimal = 2.2D
Dim sum As Decimal = first + second
If (sum = 3.3D) Then
    Console.WriteLine("Matched")
Else
    Console.WriteLine("Not Matched")
End If
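The same contrast shows up in any language with IEEE 754 doubles and a decimal type. Here is a small Python sketch, where Decimal comes from Python's decimal module rather than .NET:

```python
from decimal import Decimal

print(1.1 + 2.2)         # 3.3000000000000003
print(1.1 + 2.2 == 3.3)  # False

# Decimal values built from strings keep the digits exactly as written.
print(Decimal("1.1") + Decimal("2.2") == Decimal("3.3"))  # True
```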
That's how double numbers work in a PC.
The best way to compare them is to use a construction such as
if (Math.Abs(second - first) <= 1E-9)
    Console.WriteLine("Matched");
Instead of 1E-9 you can use another number that represents the acceptable error in the comparison.
Equality comparisons on the results of floating point operations are unreliable because of how fractional values are represented within the machine. You should have some sort of epsilon value to compare against. Here is an article that describes it much more thoroughly:
http://www.cygnus-software.com/papers/comparingfloats/Comparing%20floating%20point%20numbers.htm
Edit: Math.Round will not be an ideal choice because of the error generated with it for certain comparisons. You are better off determining an epsilon value that can be used to limit the amount of error in the comparison (basically determining the level of accuracy).
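A tolerance-based comparison might look like the following Python sketch. The tolerances are arbitrary picks for illustration, and math.isclose is Python's standard helper, not a .NET API.

```python
import math

total = 1.1 + 2.2

# Absolute tolerance, in the spirit of the 1E-9 suggestion.
print(abs(total - 3.3) <= 1e-9)  # True

# Relative tolerance scales with the magnitude of the operands.
print(math.isclose(total, 3.3, rel_tol=1e-9))  # True

# A fixed absolute epsilon stops being useful for large values:
# neighbouring doubles near 1e20 are 16384 apart.
big = 1e20
print(abs((big + 16384.0) - big) <= 1e-9)  # False
```

Which kind of tolerance fits depends on the range of values in the application, which is why picking the epsilon is left to the caller.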
A double uses floating-point arithmetic, which is approximate but more efficient. If you need to compare against exact values, use the decimal data type instead.
In C#, Java, Python, and many other languages, decimals/floats are not perfect. Because of the way they are represented (using multipliers and exponents), they often have inaccuracies. See http://www.yoda.arachsys.com/csharp/decimal.html for more info.
From the documentation:
http://msdn.microsoft.com/en-us/library/system.double.aspx
Floating-Point Values and Loss of Precision
Remember that a floating-point number can only approximate a decimal number, and that the precision of a floating-point number determines how accurately that number approximates a decimal number. By default, a Double value contains 15 decimal digits of precision, although a maximum of 17 digits is maintained internally. The precision of a floating-point number has several consequences:
Two floating-point numbers that appear equal for a particular precision might not compare equal because their least significant digits are different.
A mathematical or comparison operation that uses a floating-point number might not yield the same result if a decimal number is used because the floating-point number might not exactly approximate the decimal number.
A value might not roundtrip if a floating-point number is involved. A value is said to roundtrip if an operation converts an original floating-point number to another form, an inverse operation transforms the converted form back to a floating-point number, and the final floating-point number is equal to the original floating-point number. The roundtrip might fail because one or more least significant digits are lost or changed in a conversion.
In addition, the result of arithmetic and assignment operations with Double values may differ slightly by platform because of the loss of precision of the Double type. For example, the result of assigning a literal Double value may differ in the 32-bit and 64-bit versions of the .NET Framework. The following example illustrates this difference when the literal value -4.42330604244772E-305 and a variable whose value is -4.42330604244772E-305 are assigned to a Double variable. Note that the result of the Parse(String) method in this case does not suffer from a loss of precision.
This is a well-known problem with floating-point arithmetic. Look into binary coding for further details.
Use the type "decimal" if that will fit your needs.
But in general, you should never compare floating point values to constant floating point values with the equality sign.
Failing that, compare to the number of places that you want to compare to (e.g. if it is 4, you would check that sum > 3.2999 and sum < 3.3001).
All the methods in System.Math take double as parameters and return double values. The constants are also of type double. I checked out MathNet.Numerics, and the same seems to be the case there.
Why is this? Especially for constants. Isn't decimal supposed to be more exact? Wouldn't that often be kind of useful when doing calculations?
This is a classic speed-versus-accuracy trade off.
However, keep in mind that for PI, for example, the most digits you will ever need is 41.
The largest number of digits of pi that you will ever need is 41. To compute the circumference of the universe with an error less than the diameter of a proton, you need 41 digits of pi †. It seems safe to conclude that 41 digits is sufficient accuracy in pi for any circle measurement problem you're likely to encounter. Thus, in the over one trillion digits of pi computed in 2002, all digits beyond the 41st have no practical value.
In addition, decimal and double have a slightly different internal storage structure. Decimals are designed to store base 10 data, whereas doubles (and floats) are made to hold binary data. On a binary machine (like every computer in existence) a double will have fewer wasted bits when storing any number within its range.
Also consider:
Type            Size      Approximate range        Precision
System.Double   8 bytes   ±5.0e-324 to ±1.7e308    15 or 16 significant figures
System.Decimal  16 bytes  ±1.0e-28 to ±7.9e28      28 or 29 significant figures
As you can see, decimal has a smaller range, but a higher precision.
No, decimals are no more "exact" than doubles, or for that matter, any type. The concept of "exactness" (when speaking about numerical representations in a computer) is what is wrong. Any type is absolutely 100% exact at representing some numbers. Unsigned bytes are 100% exact at representing the whole numbers from 0 to 255, but they're no good for fractions or for negatives or integers outside the range.
Decimals are 100% exact at representing a certain set of base 10 values. Doubles (since they store their value using binary IEEE exponential representation) are exact at representing a set of binary numbers.
Neither is any more exact than the other in general; they are simply for different purposes.
To elaborate a bit further, since I seem to not be clear enough for some readers...
If you take every number which is representable as a decimal, and mark every one of them on a number line, between every adjacent pair of them there is an additional infinity of real numbers which are not representable as a decimal. The exact same statement can be made about the numbers which can be represented as a double. If you marked every decimal on the number line in blue, and every double in red, except for the integers, there would be very few places where the same value was marked in both colors.
In general, for 99.99999 % of the marks, (please don't nitpick my percentage) the blue set (decimals) is a completely different set of numbers from the red set (the doubles).
This is because, by definition, the blue set is a base 10 mantissa/exponent representation, and a double is a base 2 mantissa/exponent representation. For a value represented as a base 2 mantissa and exponent, 1.00110101001 x 2 ^ (-11101001101001) means: take the mantissa value (1.00110101001) and multiply it by 2 raised to the power of the exponent (when the exponent is negative this is equivalent to dividing by 2 to the power of the absolute value of the exponent). This means that where the exponent is negative (or where any portion of the mantissa is a fractional binary), the number usually cannot be represented within the limited digits of a decimal mantissa and exponent, and vice versa: most decimal fractions have no finite binary representation.
For any arbitrary real number, that falls randomly on the real number line, it will either be closer to one of the blue decimals, or to one of the red doubles.
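To look at one "red" mark up close, the Python sketch below prints the exact decimal expansion of the double nearest to 0.1. Decimal here is Python's arbitrary-precision type, used only to display the digits; it is not the .NET decimal.

```python
from decimal import Decimal

exact = Decimal(0.1)  # exact value of the double nearest to 0.1
print(exact)
print(exact == Decimal("0.1"))  # False: the two marks do not coincide

# Count the significant decimal digits needed to write this double exactly.
digits_needed = len(exact.as_tuple().digits)
print(digits_needed > 29)  # True: more than a 28-29 digit decimal can hold

# 0.5 is one of the few values marked in both colours.
print(Decimal(0.5) == Decimal("0.5"))  # True
```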
Decimal is more precise but has less of a range. You would generally use Double for physics and mathematical calculations but you would use Decimal for financial and monetary calculations.
See the following articles on msdn for details.
Double
http://msdn.microsoft.com/en-us/library/678hzkk9.aspx
Decimal
http://msdn.microsoft.com/en-us/library/364x0z75.aspx
Seems like most of the arguments here to "It does not do what I want" are "but it's faster", well so is ANSI C+Gmp library, but nobody is advocating that right?
If you particularly want to control accuracy, then there are other languages which have taken the time to implement exact precision, in a user controllable way:
http://www.doughellmann.com/PyMOTW/decimal/
If precision is really important to you, then you are probably better off using languages that mathematicians would use. If you do not like Fortran then Python is a modern alternative.
Whatever language you are working in, remember the golden rule:
Avoid mixing types...
So do convert a and b to the same type before you attempt a operator b
If I were to hazard a guess, I'd say those functions leverage low-level math functionality (perhaps in C) that does not use decimals internally, and so returning a decimal would require a cast from double to decimal anyway. Besides, the purpose of the decimal value type is to ensure accuracy; these functions do not and cannot return 100% accurate results without infinite precision (e.g., irrational numbers).
Neither Decimal nor float or double are good enough if you require something to be precise. Furthermore, Decimal is so expensive and overused out there it is becoming a regular joke.
If you work in fractions and require ultimate precision, use fractions. It's same old rule, convert once and only when necessary. Your rounding rules too will vary per app, domain and so on, but sure you can find an odd example or two where it is suitable. But again, if you want fractions and ultimate precision, the answer is not to use anything but fractions. Consider you might want a feature of arbitrary precision as well.
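As an illustration of working in fractions, Python ships a rational type in its standard library. The sketch below uses it as an example of the idea and says nothing about what is available on the CLR.

```python
from fractions import Fraction

third = Fraction(1, 3)
print(third * 3 == 1)  # True: no rounding ever takes place

print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))  # True
print(0.1 + 0.2 == 0.3)                                      # False with doubles
```

The price is speed and memory: numerators and denominators can grow without bound, so conversion to a fixed-size type is best done once, at the end.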
The actual problem with the CLR in general is that it is awkward, even plain broken, to implement a library that deals with numerics in a generic fashion, largely due to bad primitive design and shortcomings of the most popular compiler for the platform. It's almost the same as with the Java fiasco.
double just turns out to be the best compromise covering most domains, and it works well, despite the fact MS JIT is still incapable of utilising a CPU tech that is about 15 years old now.
[piece to users of MSDN slowdown compilers]
Double is a built-in type. It is supported by the FPU/SSE unit (formerly known as the "math coprocessor"), which is why it is blazingly fast, especially at multiplication and scientific functions.
Decimal is actually a complex structure, consisting of several integers.