My team is working with financial software that exposes monetary values as C# floating point doubles. Occasionally, we need to compare these values to see if they equal zero, or fall under a particular limit. When I noticed unexpected behavior in this logic, I quickly learned about the rounding errors inherent in floating point doubles (e.g. 1.1 + 2.2 = 3.3000000000000003). Up until this point, I have primarily used C# decimals to represent monetary values.
My team decided to resolve this issue by using the epsilon value approach. Essentially, when you are compare two numbers, if the difference between those two numbers is less than epsilon, they are considered equal. We implemented this approach in a similar way as described in the article below:
https://www.codeproject.com/Articles/383871/Demystify-Csharp-floating-point-equality-and-relat
Our challenge has been determining an appropriate value for epsilon. Our monetary values can have up to 3 digits to the right of the decimal point (scale = 3). This means that the largest epsilon we could use is .0001 (anything larger and the 3rd digit gets ignored). Since epsilon values are supposed to be small, we decided to move it out one more decimal point to .00001 (just to be safe, you could say). C# doubles have a precision of at least 15 digits, so I believe this value of epsilon should work if the number to the left of the decimal point is less or equal to 10 digits (15 - 5 = 10, where 5 is the number of digits epsilon is to the right of the decimal point). With 10 digits, we can represent values into the billions, up to 9,999,999,999.999. It's possible that we may have numbers in the hundreds of millions, but we don't expect to go into the billions, so this limit should suffice.
Is my rationale for choosing this value of epsilon correct? I found a lot of resources that discuss this approach, but I couldn’t find many resources that provide guidance on choosing epsilon.
Your reasoning seems sound, but as you have already discovered it is a complicated issue. You might want to read What Every Computer Scientist Should Know About Floating-Point Arithmetic. You do have a minimum of 15 digits of precision using 64 bit doubles. However, you will also want to validate your inputs as floats can contain Nan, +/- Infinity, negative zero and a considerably larger "range" than 15 decimal digits. If someone hands your library a value like 1.2E102, should you process it or consider it out of range? Ditto with very small values. Garbage In, Garbage out, but it might be nice if you code detected the "smell" of garbage and at very least logged it.
You might also want to consider providing a property for setting precision as well as different forms of rounding. That depends largely on the specifications you are working with. You might also want to determine if these values can represent currencies other than dollars (1 dollar is currently >112 yen).
Long and the short of it choosing your epsilon a digit below your needs (so four digits to the right of the decimal) is sound and gives you a digit to use for consistent rounding. Otherwise $10.0129 and $10.0121 would be equal but their sum would be $20.025 rather than $20.024 ... accountants like things that "foot".
Related
This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 7 years ago.
If I execute the following expression in C#:
double i = 10*0.69;
i is: 6.8999999999999995. Why?
I understand numbers such as 1/3 can be hard to represent in binary as it has infinite recurring decimal places but this is not the case for 0.69. And 0.69 can easily be represented in binary, one binary number for 69 and another to denote the position of the decimal place.
How do I work around this? Use the decimal type?
Because you've misunderstood floating point arithmetic and how data is stored.
In fact, your code isn't actually performing any arithmetic at execution time in this particular case - the compiler will have done it, then saved a constant in the generated executable. However, it can't store an exact value of 6.9, because that value cannot be precisely represented in floating point point format, just like 1/3 can't be precisely stored in a finite decimal representation.
See if this article helps you.
why doesn't the framework work around this and hide this problem from me and give me the
right answer,0.69!!!
Stop behaving like a dilbert manager, and accept that computers, though cool and awesome, have limits. In your specific case, it doesn't just "hide" the problem, because you have specifically told it not to. The language (the computer) provides alternatives to the format, that you didn't choose. You chose double, which has certain advantages over decimal, and certain downsides. Now, knowing the answer, you're upset that the downsides don't magically disappear.
As a programmer, you are responsible for hiding this downside from managers, and there are many ways to do that. However, the makers of C# have a responsibility to make floating point work correctly, and correct floating point will occasionally result in incorrect math.
So will every other number storage method, as we do not have infinite bits. Our job as programmers is to work with limited resources to make cool things happen. They got you 90% of the way there, just get the torch home.
And 0.69 can easily be represented in
binary, one binary number for 69 and
another to denote the position of the
decimal place.
I think this is a common mistake - you're thinking of floating point numbers as if they are base-10 (i.e decimal - hence my emphasis).
So - you're thinking that there are two whole-number parts to this double: 69 and divide by 100 to get the decimal place to move - which could also be expressed as:
69 x 10 to the power of -2.
However floats store the 'position of the point' as base-2.
Your float actually gets stored as:
68999999999999995 x 2 to the power of some big negative number
This isn't as much of a problem once you're used to it - most people know and expect that 1/3 can't be expressed accurately as a decimal or percentage. It's just that the fractions that can't be expressed in base-2 are different.
but why doesn't the framework work around this and hide this problem from me and give me the right answer,0.69!!!
Because you told it to use binary floating point, and the solution is to use decimal floating point, so you are suggesting that the framework should disregard the type you specified and use decimal instead, which is very much slower because it is not directly implemented in hardware.
A more efficient solution is to not output the full value of the representation and explicitly specify the accuracy required by your output. If you format the output to two decimal places, you will see the result you expect. However if this is a financial application decimal is precisely what you should use - you've seen Superman III (and Office Space) haven't you ;)
Note that it is all a finite approximation of an infinite range, it is merely that decimal and double use a different set of approximations. The advantage of decimal is it produces the same approximations that you would if you were performing the calculation yourself. For example if you calculated 1/3, you would eventually stop writing 3's when it was 'good enough'.
For the same reason that 1 / 3 in a decimal systems comes out as 0.3333333333333333333333333333333333333333333 and not the exact fraction, which is infinitely long.
To work around it (e.g. to display on screen) try this:
double i = (double) Decimal.Multiply(10, (Decimal) 0.69);
Everyone seems to have answered your first question, but ignored the second part.
The Double data type cannot correctly represent some base 10 values. This is because of how floating point numbers represent real numbers. What this means is that when representing monetary values, one should use the decimal value type to prevent errors. (feel free to correct errors in this preamble)
What I want to know is what are the values which present such a problem under the Double data-type under a 64 bit architecture in the standard .Net framework (C# if that makes a difference) ?
I expect the answer the be a formula or rule to find such values but I would also like some example values.
Any number which cannot be written as the sum of positive and negative powers of 2 cannot be exactly represented as a binary floating-point number.
The common IEEE formats for 32- and 64-bit representations of floating-point numbers impose further constraints; they limit the number of binary digits in both the significand and the exponent. So there are maximum and minimum representable numbers (approximately +/- 10^308 (base-10) if memory serves) and limits to the precision of a number that can be represented. This limit on the precision means that, for 64-bit numbers, the difference between the exponent of the largest power of 2 and the smallest power in a number is limited to 52, so if your number includes a term in 2^52 it can't also include a term in 2^-1.
Simple examples of numbers which cannot be exactly represented in binary floating-point numbers include 1/3, 2/3, 1/5.
Since the set of floating-point numbers (in any representation) is finite, and the set of real numbers is infinite, one algorithm to find a real number which is not exactly representable as a floating-point number is to select a real number at random. The probability that the real number is exactly representable as a floating-point number is 0.
You generally need to be prepared for the possibility that any value you store in a double has some small amount of error. Unless you're storing a constant value, chances are it could be something with at least some error. If it's imperative that there never be any error, and the values aren't constant, you probably shouldn't be using a floating point type.
What you probably should be asking in many cases is, "How do I deal with the minor floating point errors?" You'll want to know what types of operations can result in a lot of error, and what types don't. You'll want to ensure that comparing two values for "equality" actually just ensures they are "close enough" rather than exactly equal, etc.
This question actually goes beyond any single programming language or platform. The inaccuracy is actually inherent in binary data.
Consider that with a double, each number N to the left (at 0-based index I) of the decimal point represents the value N * 2^I and every digit to the right of the decimal point represents the value N * 2^(-I).
As an example, 5.625 (base 10) would be 101.101 (base 2).
Given this calculation, and decimal value that can't be calculated as a sum of 2^(-I) for different values of I would have an incorrect value as a double.
A float is represented as s, e and m in the following formula
s * m * 2^e
This means that any number that cannot be represented using the given expression (and in the respective domains of s, e and m) cannot be represented exactly.
Basically, you can represent all numbers between 0 and 2^53 - 1 multiplied by a certain power of two (possibly a negative power).
As an example, all numbers between 0 and 2^53 - 1 can be represented multiplied with 2^0 = 1. And you can also represent all those numbers by dividing them by 2 (with a .5 fraction). And so on.
This answer does not fully cover the topic, but I hope it helps.
Dividing an int by zero, will throw an exception, but a float won't - at least in Java. Why does a float have additional NaN info, while an int type doesn't?
The representation of a float has been designed such that there are some special combination of bits reserved to store special values such as NaN, infinity, etc.
There are no unused representations for an int type - every bit pattern corresponds to an integer. This has many advantages:
The range of an integer type is as large as possible - no bit patterns are wasted.
The representation of an integer is easy to understand because there are no special cases.
Integer arithmetic can be done at extremely high speed even on very simple processors.
A clear Explanation about float arithmetic is given here
http://www.artima.com/underthehood/floatingP.html
I think the real reason, the root of this, is the well known fact: computers store everything in zeroes and ones.
What does it have to do with integers, floats and zero division? It's pretty simple. If you have only zeroes and ones, it is pretty easy to combine them into integer numbers, like you do with decimal digits. So "10" becomes two, "11" becomes three and so on. This kind of integer representation is so natural that no one would think of inventing anything else for integers, it would just make CPUs more complicated and things more confusing. The only "invention" that was required is to figure out how to store negative numbers, but that's also very natural and simple if you start from the point that x+(-x) should always be equal to zero, without using any special kind of addition here. That's why 11111111 is -1 for 8-bit integers, because if you add 1 to it, it becomes 100000000, then 8th bit is truncated due to overflow and you get your zero. But this natural format has no place for infinities and NaNs, and nobody wanted to invent a non-natural representation just for that. Well, I won't be surprised if someone actually did that, but there is no way such format would become well-known and widely used.
Now, for floating-point numbers, there is no natural representation. Even if we translate 0.5 to binary, it would still be something like 0.1 only now we have "binary point" instead of decimal point. But CPUs can't naturally represent a "point", only 1 and 0. So some kind of special format was needed. There was simply no other way to go. And then someone probably suggested, "Hey guys, while we are at it, why not to include special representation for infinity and other numeric nonsense?" and so it was done.
This is the reason why these formats are so different. How to handle divisions by zero, it's up to language designers, but for floating-points they have the choice between inf/NaN and exceptions, while for integers they don't naturally have such kind of thing.
Basically, it's a purely arbitrary decision.
The traditional int tries to use all the bits for representing possible numbers, whereas IEEE 754 standard reserves a special value for NaN.
The standard could be changed for ints to include special values, at a cost of less efficient operations. The developers usually expect int operations to be very efficient, whereas the operations with floating point numbers are (purely psychologically) more allowed to be slower.
Ints and floats are represented differently inside the machine. Integers usually use a signed, two's complement representation that is (essentially) the number written out in base two. Floats, on the other hand, use a more complex representation that can hold much larger and much smaller values. However, the machine reserves several special bit patterns for floats to mean things other than numbers. There's values for NaN, and for positive or negative infinity, for example. This means that if you divide a float by zero, there is a series of bits that the computer can use to encode that you divided by zero. For ints, all bit patterns are used to encode numbers, so there's no meaningful series of bits the computer could use to represent the error.
This isn't an essential property of ints, though. One could, in theory, make an integer representation that handles division by zero by returning some NaN variant. It's just not what's done in practice.
Java reflects the way most CPUs are implemented. Integer divide by zero causes an interrupt on x86/x64 and Floating point divide by zero results in Infinity, Negative infinity or NaN. Note: with floating point you can also divide by negative zero. :P
All the methods in System.Math takes double as parameters and returns parameters. The constants are also of type double. I checked out MathNet.Numerics, and the same seems to be the case there.
Why is this? Especially for constants. Isn't decimal supposed to be more exact? Wouldn't that often be kind of useful when doing calculations?
This is a classic speed-versus-accuracy trade off.
However, keep in mind that for PI, for example, the most digits you will ever need is 41.
The largest number of digits of pi
that you will ever need is 41. To
compute the circumference of the
universe with an error less than the
diameter of a proton, you need 41
digits of pi †. It seems safe to
conclude that 41 digits is sufficient
accuracy in pi for any circle
measurement problem you're likely to
encounter. Thus, in the over one
trillion digits of pi computed in
2002, all digits beyond the 41st have
no practical value.
In addition, decimal and double have a slightly different internal storage structure. Decimals are designed to store base 10 data, where as doubles (and floats), are made to hold binary data. On a binary machine (like every computer in existence) a double will have fewer wasted bits when storing any number within its range.
Also consider:
System.Double 8 bytes Approximately ±5.0e-324 to ±1.7e308 with 15 or 16 significant figures
System.Decimal 12 bytes Approximately ±1.0e-28 to ±7.9e28 with 28 or 29 significant figures
As you can see, decimal has a smaller range, but a higher precision.
No, - decimals are no more "exact" than doubles, or for that matter, any type. The concept of "exactness", (when speaking about numerical representations in a compuiter), is what is wrong. Any type is absolutely 100% exact at representing some numbers. unsigned bytes are 100% exact at representing the whole numbers from 0 to 255. but they're no good for fractions or for negatives or integers outside the range.
Decimals are 100% exact at representing a certain set of base 10 values. doubles (since they store their value using binary IEEE exponential representation) are exact at representing a set of binary numbers.
Neither is any more exact than than the other in general, they are simply for different purposes.
To elaborate a bit furthur, since I seem to not be clear enough for some readers...
If you take every number which is representable as a decimal, and mark every one of them on a number line, between every adjacent pair of them there is an additional infinity of real numbers which are not representable as a decimal. The exact same statement can be made about the numbers which can be represented as a double. If you marked every decimal on the number line in blue, and every double in red, except for the integers, there would be very few places where the same value was marked in both colors.
In general, for 99.99999 % of the marks, (please don't nitpick my percentage) the blue set (decimals) is a completely different set of numbers from the red set (the doubles).
This is because by our very definition for the blue set is that it is a base 10 mantissa/exponent representation, and a double is a base 2 mantissa/exponent representation. Any value represented as base 2 mantissa and exponent, (1.00110101001 x 2 ^ (-11101001101001) means take the mantissa value (1.00110101001) and multiply it by 2 raised to the power of the exponent (when exponent is negative this is equivilent to dividing by 2 to the power of the absolute value of the exponent). This means that where the exponent is negative, (or where any portion of the mantissa is a fractional binary) the number cannot be represented as a decimal mantissa and exponent, and vice versa.
For any arbitrary real number, that falls randomly on the real number line, it will either be closer to one of the blue decimals, or to one of the red doubles.
Decimal is more precise but has less of a range. You would generally use Double for physics and mathematical calculations but you would use Decimal for financial and monetary calculations.
See the following articles on msdn for details.
Double
http://msdn.microsoft.com/en-us/library/678hzkk9.aspx
Decimal
http://msdn.microsoft.com/en-us/library/364x0z75.aspx
Seems like most of the arguments here to "It does not do what I want" are "but it's faster", well so is ANSI C+Gmp library, but nobody is advocating that right?
If you particularly want to control accuracy, then there are other languages which have taken the time to implement exact precision, in a user controllable way:
http://www.doughellmann.com/PyMOTW/decimal/
If precision is really important to you, then you are probably better off using languages that mathematicians would use. If you do not like Fortran then Python is a modern alternative.
Whatever language you are working in, remember the golden rule:
Avoid mixing types...
So do convert a and b to be the same before you attempt a operator b
If I were to hazard a guess, I'd say those functions leverage low-level math functionality (perhaps in C) that does not use decimals internally, and so returning a decimal would require a cast from double to decimal anyway. Besides, the purpose of the decimal value type is to ensure accuracy; these functions do not and cannot return 100% accurate results without infinite precision (e.g., irrational numbers).
Neither Decimal nor float or double are good enough if you require something to be precise. Furthermore, Decimal is so expensive and overused out there it is becoming a regular joke.
If you work in fractions and require ultimate precision, use fractions. It's same old rule, convert once and only when necessary. Your rounding rules too will vary per app, domain and so on, but sure you can find an odd example or two where it is suitable. But again, if you want fractions and ultimate precision, the answer is not to use anything but fractions. Consider you might want a feature of arbitrary precision as well.
The actual problem with CLR in general is that it is so odd and plain broken to implement a library that deals with numerics in generic fashion largely due to bad primitive design and shortcoming of the most popular compiler for the platform. It's almost the same as with Java fiasco.
double just turns out to be the best compromise covering most domains, and it works well, despite the fact MS JIT is still incapable of utilising a CPU tech that is about 15 years old now.
[piece to users of MSDN slowdown compilers]
Double is a built-in type. Is is supported by FPU/SSE core (formerly known as "Math coprocessor"), that's why it is blazingly fast. Especially at multiplication and scientific functions.
Decimal is actually a complex structure, consisting of several integers.
So, we know that fractions such as 0.1, cannot be accurately represented in binary base, which cause precise problems (such as mentioned here: Formatting doubles for output in C#).
And we know we have the decimal type for a decimal representation of numbers... but the problem is, a lot of Math methods, do not supporting decimal type, so we have convert them to double, which ruins the number again.
so what should we do?
Oh, what should we do about the fact that most decimal fractions cannot be represented in binary? or for that matter, that binary fractions cannot be represented in Decimal ?
or, even, that an infinity (in fact, a non-countable infinity) of real numbers in all bases cannot be accurately represented in any computerized system??
nothing! To recall an old cliche, You can get close enough for government work... In fact, you can get close enough for any work... There is no limit to the degree of accuracy the computer can generate, it just cannot be infinite, (which is what would be required for a number representation scheme to be able to represent every possible real number)
You see, for every number representation scheme you can design, in any computer, it can only represent a finite number of distinct different real numbers with 100.00 % accuracy. And between each adjacent pair of those numbers (those that can be represented with 100% accuracy), there will always be an infinity of other numbers that it cannot represent with 100% accuracy.
so what should we do?
We just keep on breathing. It really isn't a structural problem. We have a limited precision but usually more than enough. You just have to remember to format/round when presenting the numbers.
The problem in the following snippet is with the WriteLine(), not in the calculation(s):
double x = 6.9 - 10 * 0.69;
Console.WriteLine("x = {0}", x);
If you have a specific problem, th post it. There usually are ways to prevent loss of precision. If you really need >= 30 decimal digits, you need a special library.
Keep in mind that the precision you need, and the rounding rules required, will depend on your problem domain.
If you are writing software to control a nuclear reactor, or to model the first billionth of a second of the universe after the big bang (my friend actually did that), you will need much higher precision than if you are calculating sales tax (something I do for a living).
In the finance world, for example, there will be specific requirements on precision either implicitly or explicitly. Some US taxing jurisdictions specify tax rates to 5 digits after the decimal place. Your rounding scheme needs to allow for that much precision. When much of Western Europe converted to the Euro, there was a very specific approach to rounding that was written into law. During that transition period, it was essential to round exactly as required.
Know the rules of your domain, and test that your rounding scheme satisfies those rules.
I think everyone implying:
Inverting a sparse matrix? "There's an app for that", etc, etc
Numerical computation is one well-flogged horse. If you have a problem, it was probably put to pasture before 1970 or even much earlier, carried forward library by library or snippet by snippet into the future.
you could shift the decimal point so that the numbers are whole, then do 64 bit integer arithmetic, then shift it back. Then you would only have to worry about overflow problems.
And we know we have the decimal type
for a decimal representation of
numbers... but the problem is, a lot
of Math methods, do not supporting
decimal type, so we have convert them
to double, which ruins the number
again.
Several of the Math methods do support decimal: Abs, Ceiling, Floor, Max, Min, Round, Sign, and Truncate. What these functions have in common is that they return exact results. This is consistent with the purpose of decimal: To do exact arithmetic with base-10 numbers.
The trig and Exp/Log/Pow functions return approximate answers, so what would be the point of having overloads for an "exact" arithmetic type?