Parsing floats with Single.Parse() - c#

Here comes a silly question. I'm playing with the parse function of System.Single and it behaves unexpected which might be because I don't really understand floating-point numbers. The MSDN page of System.Single.MaxValue states that the max value is 3.402823e38, in standard form that is
340282300000000000000000000000000000000
If I use this string as an argument for the Parse() method, it will succeed without error, if I change any of the zeros to an arbitrary digit it will still succeed without error (although it seems to ignore them looking at the result). In my understanding, that exceeds the limit, so What am I missing?

It may be easier to think about this by looking at some lower numbers. All (positive) integers up to 16777216 can be exactly represented in a float. After that point, only every other integer can be represented (up to the next time we hit a limit, at which point it's only every 4th integer that can be represented).
So what has to happen then is the 16777218 has to stand for 16777218∓1, 16777220 has to stand for 16777220∓1, etc. As you move up into even larger numbers, the range of integers that each value has to "represent" grows wider and wider - until the point where 340282300000000000000000000000000000000 represents all numbers in the range 340282300000000000000000000000000000000∓100000000000000000000000000000000, approximately (I've not actually worked out what the right ∓ value is here, but hopefully you get the point)
Number Significand Exponent
16777215 = 1 11111111111111111111111 2^0 = 111111111111111111111111
16777216 = 1 00000000000000000000000 2^1 = 1000000000000000000000000
16777218 = 1 00000000000000000000001 2^1 = 1000000000000000000000010
^
|
Implicit leading bit

That's actually not true - change the first 0 to 9 and you will see an exception. Actually change it to anything 6 and up and it blows up.
Any other number is just rounded down as float is not an 100% accurate representation of a decimal with 38+1 positions that's fine.

A floating point number is not like a decimal. It comprises a mantissa that carries the significant digits and an exponent that effectively says how far left or right of the decimal point to place the mantissa. A System.Single can only handle seven significant digits in the mantissa. If you replace any of your trailing zeroes with an arbitrary digit it is being lost when your decimal is converted into the mantissa and exponent form.

Good question. That is happening because the fact you can save a number with that range doesn't mean this type'll have enough precision to hold it. You can only store ~6-7 leading digits for floats and add an exponent to describe decimal point position.
0.012345 and 1234500 hold the same amount of informations - same mantissa, different exponents. The MSDN states only that value AFTRER EXPONENTIATION cannot be bigger, than MaxValue.

Related

Fix for numbers close to 2^5 and 2^7 imprecision problems [duplicate]

Why does the following program print what it prints?
class Program
{
static void Main(string[] args)
{
float f1 = 0.09f*100f;
float f2 = 0.09f*99.999999f;
Console.WriteLine(f1 > f2);
}
}
Output is
false
Floating point only has so many digits of precision. If you're seeing f1 == f2, it is because any difference requires more precision than a 32-bit float can represent.
I recommend reading What Every Computer Scientist Should Read About Floating Point
The main thing is this isn't just .Net: it's a limitation of the underlying system most every language will use to represent a float in memory. The precision only goes so far.
You can also have some fun with relatively simple numbers, when you take into account that it's not even base ten. 0.1 (1/10th), for example, is a repeating decimal when represented in binary, just as 1/3rd is when represented in decimal.
In this particular case, it’s because .09 and .999999 cannot be represented with exact precision in binary (similarly, 1/3 cannot be represented with exact precision in decimal). For example, 0.111111111111111111101111 base 2 is 0.999998986721038818359375 base 10. Adding 1 to the previous binary value, 0.11111111111111111111 base 2 is 0.99999904632568359375 base 10. There isn’t a binary value for exactly 0.999999. Floating point precision is also limited by the space allocated for storing the exponent and the fractional part of the mantissa. Also, like integer types, floating point can overflow its range, although its range is larger than integer ranges.
Running this bit of C++ code in the Xcode debugger,
float myFloat = 0.1;
shows that myFloat gets the value 0.100000001. It is off by 0.000000001. Not a lot, but if the computation has several arithmetic operations, the imprecision can be compounded.
imho a very good explanation of floating point is in Chapter 14 of Introduction to Computer Organization with x86-64 Assembly Language & GNU/Linux by Bob Plantz of California State University at Sonoma (retired) http://bob.cs.sonoma.edu/getting_book.html. The following is based on that chapter.
Floating point is like scientific notation, where a value is stored as a mixed number greater than or equal to 1.0 and less than 2.0 (the mantissa), times another number to some power (the exponent). Floating point uses base 2 rather than base 10, but in the simple model Plantz gives, he uses base 10 for clarity’s sake. Imagine a system where two positions of storage are used for the mantissa, one position is used for the sign of the exponent* (0 representing + and 1 representing -), and one position is used for the exponent. Now add 0.93 and 0.91. The answer is 1.8, not 1.84.
9311 represents 0.93, or 9.3 times 10 to the -1.
9111 represents 0.91, or 9.1 times 10 to the -1.
The exact answer is 1.84, or 1.84 times 10 to the 0, which would be 18400 if we had 5 positions, but, having only four positions, the answer is 1800, or 1.8 times 10 to the zero, or 1.8. Of course, floating point data types can use more than four positions of storage, but the number of positions is still limited.
Not only is precision limited by space, but “an exact representation of fractional values in binary is limited to sums of inverse powers of two.” (Plantz, op. cit.).
0.11100110 (binary) = 0.89843750 (decimal)
0.11100111 (binary) = 0.90234375 (decimal)
There is no exact representation of 0.9 decimal in binary. Even carrying the fraction out more places doesn’t work, as you get into repeating 1100 forever on the right.
Beginning programmers often see floating point arithmetic as more
accurate than integer. It is true that even adding two very large
integers can cause overflow. Multiplication makes it even more likely
that the result will be very large and, thus, overflow. And when used
with two integers, the / operator in C/C++ causes the fractional part
to be lost. However, ... floating point representations have their own
set of inaccuracies. (Plantz, op. cit.)
*In floating point, both the sign of the number and the sign of the exponent are represented.

Finding minimum precision

I'm trying to write an algorithm to find the minimum number of precision digits to show at least one significant figure using .Net string formatter.
eg.
Value Precision wanted:
----- -----------------
10 0
1 0
0.1 1
0.99 1
0.01 2
0.009 3
(don't care about further digits, only the first, hence 0.99 only need a precision of 1.)
The best I can come up with is:
int precision = (int)Math.Abs(Math.Min(0, Math.Floor(Math.Log10(value))));
This works fine but I can't help thinking there's a more elegant solution. Can any maths gurus help me out?
Slightly shorter:
int precision = (int)Math.Max(0, -Math.Floor(Math.Log10(value)));
A float is a binary representation of a fractional value. If your initial value is a single precision floating point number, then the number is multiplied by an exponent to get the value,
((f & 0x7f800000) >> 23)-127 .
If the exponent is non-negative, based on what you have said you would get back zero (you have numbers before the decimal point).
If the exponent is negative, well, that is annoying since binary digits don't line up well with decimal. It should be doable with a look up table, though. Check out https://math.stackexchange.com/questions/3987/how-to-calculate-the-number-of-decimal-digits-for-a-binary-number#4080.
Edit: you should read about the way floating point numbers are stored.

Precision of double after decimal point

In the lunch break we started debating about the precision of the double value type.
My colleague thinks, it always has 15 places after the decimal point.
In my opinion one can't tell, because IEEE 754 does not make assumptions
about this and it depends on where the first 1 is in the binary
representation. (i.e. the size of the number before the decimal point counts, too)
How can one make a more qualified statement?
As stated by the C# reference, the precision is from 15 to 16 digits (depending on the decimal values represented) before or after the decimal point.
In short, you are right, it depends on the values before and after the decimal point.
For example:
12345678.1234567D //Next digit to the right will get rounded up
1234567.12345678D //Next digit to the right will get rounded up
Full sample at: http://ideone.com/eXvz3
Also, trying to think about double value as fixed decimal values is not a good idea.
You're both wrong. A normal double has 53 bits of precision. That's roughly equivalent to 16 decimal digits, but thinking of double values as though they were decimals leads to no end of confusion, and is best avoided.
That said, you are much closer to correct than your colleague--the precision is relative to the value being represented; sufficiently large doubles have no fractional digits of precision.
For example, the next double larger than 4503599627370496.0 is 4503599627370497.0.
C# doubles are represented according to IEEE 754 with a 53 bit significand p (or mantissa) and a 11 bit exponent e, which has a range between -1022 and 1023. Their value is therefore
p * 2^e
The significand always has one digit before the decimal point, so the precision of its fractional part is fixed. On the other hand the number of digits after the decimal point in a double depends also on its exponent; numbers whose exponent exceeds the number of digits in the fractional part of the significand do not have a fractional part themselves.
What Every Computer Scientist Should Know About Floating-Point Arithmetic is probably the most widely recognized publication on this subject.
Since this is the only question on SO that I could find on this topic, I would like to make an addition to jorgebg's answer.
According to this, precision is actually 15-17 digits. An example of a double with 17 digits of precision would be 0.92107099070578813 (don't ask me how I got that number :P)

Which values cannot be represented correctly by a double

The Double data type cannot correctly represent some base 10 values. This is because of how floating point numbers represent real numbers. What this means is that when representing monetary values, one should use the decimal value type to prevent errors. (feel free to correct errors in this preamble)
What I want to know is what are the values which present such a problem under the Double data-type under a 64 bit architecture in the standard .Net framework (C# if that makes a difference) ?
I expect the answer the be a formula or rule to find such values but I would also like some example values.
Any number which cannot be written as the sum of positive and negative powers of 2 cannot be exactly represented as a binary floating-point number.
The common IEEE formats for 32- and 64-bit representations of floating-point numbers impose further constraints; they limit the number of binary digits in both the significand and the exponent. So there are maximum and minimum representable numbers (approximately +/- 10^308 (base-10) if memory serves) and limits to the precision of a number that can be represented. This limit on the precision means that, for 64-bit numbers, the difference between the exponent of the largest power of 2 and the smallest power in a number is limited to 52, so if your number includes a term in 2^52 it can't also include a term in 2^-1.
Simple examples of numbers which cannot be exactly represented in binary floating-point numbers include 1/3, 2/3, 1/5.
Since the set of floating-point numbers (in any representation) is finite, and the set of real numbers is infinite, one algorithm to find a real number which is not exactly representable as a floating-point number is to select a real number at random. The probability that the real number is exactly representable as a floating-point number is 0.
You generally need to be prepared for the possibility that any value you store in a double has some small amount of error. Unless you're storing a constant value, chances are it could be something with at least some error. If it's imperative that there never be any error, and the values aren't constant, you probably shouldn't be using a floating point type.
What you probably should be asking in many cases is, "How do I deal with the minor floating point errors?" You'll want to know what types of operations can result in a lot of error, and what types don't. You'll want to ensure that comparing two values for "equality" actually just ensures they are "close enough" rather than exactly equal, etc.
This question actually goes beyond any single programming language or platform. The inaccuracy is actually inherent in binary data.
Consider that with a double, each number N to the left (at 0-based index I) of the decimal point represents the value N * 2^I and every digit to the right of the decimal point represents the value N * 2^(-I).
As an example, 5.625 (base 10) would be 101.101 (base 2).
Given this calculation, and decimal value that can't be calculated as a sum of 2^(-I) for different values of I would have an incorrect value as a double.
A float is represented as s, e and m in the following formula
s * m * 2^e
This means that any number that cannot be represented using the given expression (and in the respective domains of s, e and m) cannot be represented exactly.
Basically, you can represent all numbers between 0 and 2^53 - 1 multiplied by a certain power of two (possibly a negative power).
As an example, all numbers between 0 and 2^53 - 1 can be represented multiplied with 2^0 = 1. And you can also represent all those numbers by dividing them by 2 (with a .5 fraction). And so on.
This answer does not fully cover the topic, but I hope it helps.

C# float infinite loop

The following code in C# (.Net 3.5 SP1) is an infinite loop on my machine:
for (float i = 0; i < float.MaxValue; i++) ;
It reached the number 16777216.0 and 16777216.0 + 1 is evaluates to 16777216.0. Yet at this point: i + 1 != i.
This is some craziness.
I realize there is some inaccuracy in how floating point numbers are stored. And I've read that whole numbers greater 2^24 than cannot be properly stored as a float.
Still the code above, should be valid in C# even if the number cannot be properly represented.
Why does it not work?
You can get the same to happen for double but it takes a very long time. 9007199254740992.0 is the limit for double.
Right, so the issue is that in order to add one to the float, it would have to become
16777217.0
It just so happens that this is at a boundary for the radix and cannot be represented exactly as a float. (The next highest value available is 16777218.0)
So, it rounds to the nearest representable float
16777216.0
Let me put it this way:
Since you have a floating amount of precision, you have to increment up by a higher-and-higher number.
EDIT:
Ok, this is a little bit difficult to explain, but try this:
float f = float.MaxValue;
f -= 1.0f;
Debug.Assert(f == float.MaxValue);
This will run just fine, because at that value, in order to represent a difference of 1.0f, you would need over 128 bits of precision. A float has only 32 bits.
EDIT2
By my calculations, at least 128 binary digits unsigned would be necessary.
log(3.40282347E+38) * log(10) / log(2) = 128
As a solution to your problem, you could loop through two 128 bit numbers. However, this will take at least a decade to complete.
Imagine for example that a floating point number is represented by up to 2 significant decimal digits, plus an exponent: in that case, you could count from 0 to 99 exactly. The next would be 100, but because you can only have 2 significant digits that would be stored as "1.0 times 10 to the power of 2". Adding one to that would be ... what?
At best, it would be 101 as an intermediate result, which would actually be stored (via a rounding error which discards the insignificant 3rd digit) as "1.0 times 10 to the power of 2" again.
To understand what's going wrong you're going to have to read the IEEE standard on floating point
Let's examine the structure of a floating point number for a second:
A floating point number is broken into two parts (ok 3, but ignore the sign bit for a second).
You have a exponent and a mantissa. Like so:
smmmmmmmmeeeeeee
Note: that is not acurate to the number of bits, but it gives you a general idea of what's happening.
To figure out what number you have we do the following calculation:
mmmmmm * 2^(eeeeee) * (-1)^s
So what is float.MaxValue going to be? Well you're going to have the largest possible mantissa and the largest possible exponent. Let's pretend this looks something like:
01111111111111111
in actuality we define NAN and +-INF and a couple other conventions, but ignore them for a second because they're not relevant to your question.
So, what happens when you have 9.9999*2^99 + 1? Well, you do not have enough significant figures to add 1. As a result it gets rounded down to the same number. In the case of single floating point precision the point at which +1 starts to get rounded down happens to be 16777216.0
It has nothing to do with overflow, or being near the max value. The float value for 16777216.0 has a binary representation of 16777216. You then increment it by 1, so it should be 16777217.0, except that the binary representation of 16777217.0 is 16777216!!! So it doesn't actually get incremented or at least the increment doesn't do what you expect.
Here is a class written by Jon Skeet that illustrates this:
DoubleConverter.cs
Try this code with it:
double d1 = 16777217.0;
Console.WriteLine(DoubleConverter.ToExactString(d1));
float f1 = 16777216.0f;
Console.WriteLine(DoubleConverter.ToExactString(f1));
float f2 = 16777217.0f;
Console.WriteLine(DoubleConverter.ToExactString(f2));
Notice how the internal representation of 16777216.0 is the same 16777217.0!!
The iteration when i approaches float.MaxValue has i just below this value. The next iteration adds to i, but it can't hold a number bigger than float.MaxValue. Thus it holds a value much smaller, and begins the loop again.

Categories

Resources