Floating point operations ambiguity [duplicate] - c#

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Why is floating point arithmetic in C# imprecise?
Why is there a bias in floating point ops? Any specific reason?
Output:
160
139
static void Main()
{
float x = (float) 1.6;
int y = (int)(x * 100);
float a = (float) 1.4;
int b = (int)(a * 100);
Console.WriteLine(y);
Console.WriteLine(b);
Console.ReadKey();
}

Any rational number that has a denominator that is not a power of 2 will lead to an infinite number of digits when represented as a binary. Here you have 8/5 and 7/5. Therefore there is no exact binary representation as a floating-point number (unless you have infinite memory).
The exact binary representation of 1.6 is 110011001100110011001100110011001100...
The exact binary representation of 1.4 is 101100110011001100110011001100110011...
Both values have an infinite number of digits (1100 is repeated endlessly).
float values have a precision of 24 bits. So the binary representation of any value will be rounded to 24 bits. If you round the given values to 24 bits you get:
1.6: 110011001100110011001101 (decimal 13421773) - rounded up
1.4: 101100110011001100110011 (decimal 11744051) - rounded down
Both values have an exponent of 0 (the first bit is 2^0 = 1, the second is 2^-1 = 0.5 etc.).
Since the first bit in a 24 bit value is 2^23 you can calculate the exact decimal values by dividing the 24 bit values (13421773 and 11744051) by two 23 times.
The values are: 1.60000002384185791015625 and 1.39999997615814208984375.
When using floating-point types you always have to consider that their precision is finite. Values that can be written exact as decimal values might be rounded up or down when represented as binaries. Casting to int does not respect that because it truncates the given values. You should always use something like Math.Round.
If you really need an exact representation of rational numbers you need a completely different approach. Since rational numbers are fractions you can use integers to represent them. Here is an example of how you can achieve that.
However, you can not write Rational x = (Rational)1.6 then. You have to write something like Rational x = new Rational(8, 5) (or new Rational(16, 10) etc.).

This is due to the fact that floating point arithmetic is not precise. When you set a to 1.4, internally it may not be exactly 1.4, just as close as can be made with machine precision. If it is fractionally less than 1.4, then multiplying by 100 and casting to integer will take only the integer portion which in this case would be 139. You will get far more technically precise answers but essentially this is what is happening.
In the case of your output for the 1.6 case, the floating point representation may actually be minutely larger than 1.6 and so when you multiply by 100, the total is slightly larger than 160 and so the integer cast gives you what you expect. The fact is that there is simply not enough precision available in a computer to store every real number exactly.
See this link for details of the conversion from floating point to integer types http://msdn.microsoft.com/en-us/library/aa691289%28v=vs.71%29.aspx - it has its own section.

The floating point types float (32 bit) and double (64 bit) have a limited precision and more over the value is represented as a binary value internally. Just as you cannot represent 1/7 precisely in a decimal system (~ 0.1428571428571428...), 1/10 cannot be represented precisely in a binary system.
You can however use the decimal type. It still has a limited (however high) precision, but the numbers a represented in a decimal way internally. Therefore a value like 1/10 is represented exactly like 0.1000000000000000000000000000 internally. 1/7 is still a problem for decimal. But at least you don't get a loss of precision by converting to binary and then back to decimal.
Consider using decimal.

Related

Understanding C# floating point numbers precision. How does C# stores float numbers? [duplicate]

This question already has answers here:
How to calculate float type precision and does it make sense?
(4 answers)
Closed 1 year ago.
I have some doubts about what "precision" actually means in C# when working with floating numbers. I apologize in advance if a logic is weak and for the long explanation.
I know float number (e.g. 10.01F) has a precision of 6 to 9 digits. So, let's say we have the next code:
float myFloat = 1.000001F;
Console.WriteLine(myFloat);
I get the exact number in console. Now, let's use the next code:
myFloat = 1.00000006F;
Console.WriteLine(myFloat);
A different number is printed: 1.0000001, even thought the number has 9 digits, which is the limit.
This is my first doubt. Does precision depends of the number itself or the computer's architecture?
Furthermore, data is store as bits in the computer, bearing that in mid, I remember that converting the decimal part of a number to bits can lead to a different number when transforming the number back to decimal. For example:
(Decimal) 1.0001 -> (Binary) 1.00000000000001101001
(Binary) 1.00000000000001101001 -> (Decimal) 1.00010013580322265625 (It's not the same)
My logic after this is: maybe a float number doesn't lose information when stored, maybe such information is lost when the number is converted back to decimal to show it to the user.
E.g.
float myFloat = 999999.11F + 1.11F;
The result of the above should be: 1000000.22. However, since this number exceeds the precision of a float, I should see a different number, which indeed happens: 1000000.25
There is a 0.03 difference. In order to see if the actual result is 1000000.22 I did the next condition:
if (myFloat == 1000000.22F) {
Console.WriteLine("Real result = 100000.22");
}
And it actually prints it: Real result = 100000.22.
So... the information loss occurs when converting the bits back to decimal? or it also happens in the lower levels of computing and my example was just a coincidence?
1.000001F in source code is converted to the float value 8,388,616•2−23, which is 1.00000095367431640625.
1.00000006F in source code is converted to the float value 8,388,609•2−23, which is 1.00000011920928955078125.
Console.WriteLine shows only some of the value of these; it rounds its display to a limited number of digits, by default.
999999.11F is converted to 15,999,986•2−4 which is 999,999.125. 1.11F is converted to 9,311,355•2−23, which is 1.11000001430511474609375. When these are added using real-number mathematics, the result is 8,388,609,971,323•2−23. That cannot be represented in a float, because the fraction portion of a float (called the significand) can only have 24 bits, so its maximum value as an integer is 16,777,215. If we divide that significand by 219 to reduce it to that limit, we get approximately 8,388,609,971,323/219 • 2−23•219 = 16,000,003.76•2−4. Rounding that significand to an integer produces 16,000,004•2−4. So, when those two numbers are added, float arithmetic rounds the result and produces 16,000,004•2−4, which is 1,000,000.25.
So... the information loss occurs when converting the bits back to decimal? or it also happens in the lower levels of computing and my example was just a coincidence?
Converting a decimal numeral to floating-point generally introduces a rounding error.
Adding floating-point numbers generally introduces a rounding error.
Converting a floating-point number to a decimal numeral with limited precision generally introduces a rounding error.
The rounding occurs both when you write 1000000.22F in your code (the compiler must find the exponent and mantissa that give a result closest to the decimal number to typed), and again when converting to decimal to display.
There isn't any decimal/binary type of rounding in the actual arithmetic operations, although arithmetic operations do have rounding error related to the limited number of mantissa bits.

Integer multiplied by float giving strange result [duplicate]

Why does the following program print what it prints?
class Program
{
static void Main(string[] args)
{
float f1 = 0.09f*100f;
float f2 = 0.09f*99.999999f;
Console.WriteLine(f1 > f2);
}
}
Output is
false
Floating point only has so many digits of precision. If you're seeing f1 == f2, it is because any difference requires more precision than a 32-bit float can represent.
I recommend reading What Every Computer Scientist Should Read About Floating Point
The main thing is this isn't just .Net: it's a limitation of the underlying system most every language will use to represent a float in memory. The precision only goes so far.
You can also have some fun with relatively simple numbers, when you take into account that it's not even base ten. 0.1 (1/10th), for example, is a repeating decimal when represented in binary, just as 1/3rd is when represented in decimal.
In this particular case, it’s because .09 and .999999 cannot be represented with exact precision in binary (similarly, 1/3 cannot be represented with exact precision in decimal). For example, 0.111111111111111111101111 base 2 is 0.999998986721038818359375 base 10. Adding 1 to the previous binary value, 0.11111111111111111111 base 2 is 0.99999904632568359375 base 10. There isn’t a binary value for exactly 0.999999. Floating point precision is also limited by the space allocated for storing the exponent and the fractional part of the mantissa. Also, like integer types, floating point can overflow its range, although its range is larger than integer ranges.
Running this bit of C++ code in the Xcode debugger,
float myFloat = 0.1;
shows that myFloat gets the value 0.100000001. It is off by 0.000000001. Not a lot, but if the computation has several arithmetic operations, the imprecision can be compounded.
imho a very good explanation of floating point is in Chapter 14 of Introduction to Computer Organization with x86-64 Assembly Language & GNU/Linux by Bob Plantz of California State University at Sonoma (retired) http://bob.cs.sonoma.edu/getting_book.html. The following is based on that chapter.
Floating point is like scientific notation, where a value is stored as a mixed number greater than or equal to 1.0 and less than 2.0 (the mantissa), times another number to some power (the exponent). Floating point uses base 2 rather than base 10, but in the simple model Plantz gives, he uses base 10 for clarity’s sake. Imagine a system where two positions of storage are used for the mantissa, one position is used for the sign of the exponent* (0 representing + and 1 representing -), and one position is used for the exponent. Now add 0.93 and 0.91. The answer is 1.8, not 1.84.
9311 represents 0.93, or 9.3 times 10 to the -1.
9111 represents 0.91, or 9.1 times 10 to the -1.
The exact answer is 1.84, or 1.84 times 10 to the 0, which would be 18400 if we had 5 positions, but, having only four positions, the answer is 1800, or 1.8 times 10 to the zero, or 1.8. Of course, floating point data types can use more than four positions of storage, but the number of positions is still limited.
Not only is precision limited by space, but “an exact representation of fractional values in binary is limited to sums of inverse powers of two.” (Plantz, op. cit.).
0.11100110 (binary) = 0.89843750 (decimal)
0.11100111 (binary) = 0.90234375 (decimal)
There is no exact representation of 0.9 decimal in binary. Even carrying the fraction out more places doesn’t work, as you get into repeating 1100 forever on the right.
Beginning programmers often see floating point arithmetic as more
accurate than integer. It is true that even adding two very large
integers can cause overflow. Multiplication makes it even more likely
that the result will be very large and, thus, overflow. And when used
with two integers, the / operator in C/C++ causes the fractional part
to be lost. However, ... floating point representations have their own
set of inaccuracies. (Plantz, op. cit.)
*In floating point, both the sign of the number and the sign of the exponent are represented.

Fix for numbers close to 2^5 and 2^7 imprecision problems [duplicate]

Why does the following program print what it prints?
class Program
{
static void Main(string[] args)
{
float f1 = 0.09f*100f;
float f2 = 0.09f*99.999999f;
Console.WriteLine(f1 > f2);
}
}
Output is
false
Floating point only has so many digits of precision. If you're seeing f1 == f2, it is because any difference requires more precision than a 32-bit float can represent.
I recommend reading What Every Computer Scientist Should Read About Floating Point
The main thing is this isn't just .Net: it's a limitation of the underlying system most every language will use to represent a float in memory. The precision only goes so far.
You can also have some fun with relatively simple numbers, when you take into account that it's not even base ten. 0.1 (1/10th), for example, is a repeating decimal when represented in binary, just as 1/3rd is when represented in decimal.
In this particular case, it’s because .09 and .999999 cannot be represented with exact precision in binary (similarly, 1/3 cannot be represented with exact precision in decimal). For example, 0.111111111111111111101111 base 2 is 0.999998986721038818359375 base 10. Adding 1 to the previous binary value, 0.11111111111111111111 base 2 is 0.99999904632568359375 base 10. There isn’t a binary value for exactly 0.999999. Floating point precision is also limited by the space allocated for storing the exponent and the fractional part of the mantissa. Also, like integer types, floating point can overflow its range, although its range is larger than integer ranges.
Running this bit of C++ code in the Xcode debugger,
float myFloat = 0.1;
shows that myFloat gets the value 0.100000001. It is off by 0.000000001. Not a lot, but if the computation has several arithmetic operations, the imprecision can be compounded.
imho a very good explanation of floating point is in Chapter 14 of Introduction to Computer Organization with x86-64 Assembly Language & GNU/Linux by Bob Plantz of California State University at Sonoma (retired) http://bob.cs.sonoma.edu/getting_book.html. The following is based on that chapter.
Floating point is like scientific notation, where a value is stored as a mixed number greater than or equal to 1.0 and less than 2.0 (the mantissa), times another number to some power (the exponent). Floating point uses base 2 rather than base 10, but in the simple model Plantz gives, he uses base 10 for clarity’s sake. Imagine a system where two positions of storage are used for the mantissa, one position is used for the sign of the exponent* (0 representing + and 1 representing -), and one position is used for the exponent. Now add 0.93 and 0.91. The answer is 1.8, not 1.84.
9311 represents 0.93, or 9.3 times 10 to the -1.
9111 represents 0.91, or 9.1 times 10 to the -1.
The exact answer is 1.84, or 1.84 times 10 to the 0, which would be 18400 if we had 5 positions, but, having only four positions, the answer is 1800, or 1.8 times 10 to the zero, or 1.8. Of course, floating point data types can use more than four positions of storage, but the number of positions is still limited.
Not only is precision limited by space, but “an exact representation of fractional values in binary is limited to sums of inverse powers of two.” (Plantz, op. cit.).
0.11100110 (binary) = 0.89843750 (decimal)
0.11100111 (binary) = 0.90234375 (decimal)
There is no exact representation of 0.9 decimal in binary. Even carrying the fraction out more places doesn’t work, as you get into repeating 1100 forever on the right.
Beginning programmers often see floating point arithmetic as more
accurate than integer. It is true that even adding two very large
integers can cause overflow. Multiplication makes it even more likely
that the result will be very large and, thus, overflow. And when used
with two integers, the / operator in C/C++ causes the fractional part
to be lost. However, ... floating point representations have their own
set of inaccuracies. (Plantz, op. cit.)
*In floating point, both the sign of the number and the sign of the exponent are represented.

C# Converting Double to Float Loses So Much Precision Why?

When I'm debugging my program using Visual Studio;
When I convert a double variable "91.151497095188446" to a float by using
(float)double_variable
I see resulting float variable is "91.1515".
Why I'm losing so much precision? Do I need to use another data type?
C# and many other languages use IEEE 754 as a specification for their floating-point data types. Floating-point numbers are expressed as a significand and an exponent, similar to how a decimal number, in scientific notation, is expressed as
1.234567890 x 10^12
^ ^
mantissa exponent
I won't go into the details (the Wikipedia article goes into that better than I can), but IEEE 754 specifies that:
for a 32-bit floating point number, such as the C# float data type, has 24 bits of precision for the significand, and 8 bits for the exponent.
for a 64-bit floating point number, such as the C# double data type, has 53 bits of precision for the significand, and 11 bits for the exponent.
Because a float only has 24 bits of precision, it can only express 7-8 digits of precision. Conversely, a double has 53 bits of precision so has about 15-16 digits of precision.
As has been said in the comments, if you don't want to lose precision, don't go from a double (64 bits in total) to a float (32 bits in total). Depending on your application, you could perhaps use the decimal data type which has 28-29 decimal digits of precision - but will come with penalties because (a) calculations involving it are slower than for double or float, and (b) it's typically far less supported by external libraries.
Note that you're talking about 91.15149709518846 which will actually be interpreted as 91.1514970951884 by the compiler - see, for example, this:
double value = 91.151497095188446;
Console.WriteLine(value);
// prints 91.1514970951884
You will find detailed explanation here: https://stackoverflow.com/a/2386882/10863059
Basically, float takes smaller memory space (4 bytes) when compared to double (8 bytes), but that's just one part of the story.
Floating-point types store fractional parts as inverse powers of two. For this reason, they
can only represent exact values such as 10, 10.25, 10.5, and so on. Other numbers,
such as 1.23 or 19.99, cannot be represented exactly and are only an approximation.
Even if double has 15 decimal digits of precision, as compared to only 7 for float,
precision loss starts to accumulate when performing repeated calculations.
This makes double and float difficult or even inappropriate to use in certain types of
applications, such as financial applications, where precision is key. For this purpose, the
decimal type is provided
Reference : Learn C# Programming, A guide to building a solid foundation in C# language
for writing efficient programs By Marius Bancila,
Raffaele Rialdi,
Ankit Sharma

Why is floating point arithmetic in C# imprecise?

Why does the following program print what it prints?
class Program
{
static void Main(string[] args)
{
float f1 = 0.09f*100f;
float f2 = 0.09f*99.999999f;
Console.WriteLine(f1 > f2);
}
}
Output is
false
Floating point only has so many digits of precision. If you're seeing f1 == f2, it is because any difference requires more precision than a 32-bit float can represent.
I recommend reading What Every Computer Scientist Should Read About Floating Point
The main thing is this isn't just .Net: it's a limitation of the underlying system most every language will use to represent a float in memory. The precision only goes so far.
You can also have some fun with relatively simple numbers, when you take into account that it's not even base ten. 0.1 (1/10th), for example, is a repeating decimal when represented in binary, just as 1/3rd is when represented in decimal.
In this particular case, it’s because .09 and .999999 cannot be represented with exact precision in binary (similarly, 1/3 cannot be represented with exact precision in decimal). For example, 0.111111111111111111101111 base 2 is 0.999998986721038818359375 base 10. Adding 1 to the previous binary value, 0.11111111111111111111 base 2 is 0.99999904632568359375 base 10. There isn’t a binary value for exactly 0.999999. Floating point precision is also limited by the space allocated for storing the exponent and the fractional part of the mantissa. Also, like integer types, floating point can overflow its range, although its range is larger than integer ranges.
Running this bit of C++ code in the Xcode debugger,
float myFloat = 0.1;
shows that myFloat gets the value 0.100000001. It is off by 0.000000001. Not a lot, but if the computation has several arithmetic operations, the imprecision can be compounded.
imho a very good explanation of floating point is in Chapter 14 of Introduction to Computer Organization with x86-64 Assembly Language & GNU/Linux by Bob Plantz of California State University at Sonoma (retired) http://bob.cs.sonoma.edu/getting_book.html. The following is based on that chapter.
Floating point is like scientific notation, where a value is stored as a mixed number greater than or equal to 1.0 and less than 2.0 (the mantissa), times another number to some power (the exponent). Floating point uses base 2 rather than base 10, but in the simple model Plantz gives, he uses base 10 for clarity’s sake. Imagine a system where two positions of storage are used for the mantissa, one position is used for the sign of the exponent* (0 representing + and 1 representing -), and one position is used for the exponent. Now add 0.93 and 0.91. The answer is 1.8, not 1.84.
9311 represents 0.93, or 9.3 times 10 to the -1.
9111 represents 0.91, or 9.1 times 10 to the -1.
The exact answer is 1.84, or 1.84 times 10 to the 0, which would be 18400 if we had 5 positions, but, having only four positions, the answer is 1800, or 1.8 times 10 to the zero, or 1.8. Of course, floating point data types can use more than four positions of storage, but the number of positions is still limited.
Not only is precision limited by space, but “an exact representation of fractional values in binary is limited to sums of inverse powers of two.” (Plantz, op. cit.).
0.11100110 (binary) = 0.89843750 (decimal)
0.11100111 (binary) = 0.90234375 (decimal)
There is no exact representation of 0.9 decimal in binary. Even carrying the fraction out more places doesn’t work, as you get into repeating 1100 forever on the right.
Beginning programmers often see floating point arithmetic as more
accurate than integer. It is true that even adding two very large
integers can cause overflow. Multiplication makes it even more likely
that the result will be very large and, thus, overflow. And when used
with two integers, the / operator in C/C++ causes the fractional part
to be lost. However, ... floating point representations have their own
set of inaccuracies. (Plantz, op. cit.)
*In floating point, both the sign of the number and the sign of the exponent are represented.

Categories

Resources