Excerpt from a book:
A float value consists of a 24-bit signed mantissa and an 8-bit signed exponent. The precision is approximately seven decimal digits. Values range from -3.402823 × 10^38 to 3.402823 × 10^38
How to calculate this range? Can someone explain the binary arithmetic?
You need to read "What Every Computer Scientist Should Know About Floating-Point Arithmetic", which explains how floating-point numbers are stored and will also answer your question.
I would definitely read the article to which Richard points. But if you need a simpler explanation, I hope this helps:
Basically, as you said, there is 1 sign bit, 8 bits for exponent, and 23 for fraction.
Then, using this equation (from Wikipedia)
N = (1 - 2s) * 2^(x-127) * (1 + m*2^-23)
where s is the sign bit, x is the stored exponent (from which the 127 bias is subtracted), and m is the fraction field treated as a whole number (the 2^-23 factor in the equation turns that whole number into the appropriate fractional value).
Note that the exponent value 0xFF is reserved for infinities and NaNs, so the largest exponent of a finite value is 0xFE.
You can see that the maximum value is
N = (1 - 2*0) * 2^(254-127) * (1 + (2^23 - 1) * 2^-23)
N = 1 * 2^127 * (2 - 2^-23)
N ≈ 3.402823 × 10^38
The minimum value is the same but with the sign bit set, which simply negates it, giving -3.402823 × 10^38.
Q.E.D.
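To sanity-check the arithmetic, you can plug the same numbers into the formula and compare against float.MaxValue (a small sketch, not part of the original answer):
double max = (1 - 2 * 0)                                      // sign bit s = 0
           * Math.Pow(2, 254 - 127)                           // exponent field x = 0xFE = 254, bias 127
           * (1 + (Math.Pow(2, 23) - 1) * Math.Pow(2, -23));  // all 23 fraction bits set
Console.WriteLine(max);            // ≈ 3.402823E+38
Console.WriteLine(float.MaxValue); // the same value, printed as a float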
If you evaluate
(Math.PI).ToString("G51")
the result is
"3,141592653589793115997963468544185161590576171875"
Math.PI is correct up to the 15th decimal, which is in accordance with the precision of a double. "G17" is used for round-tripping, so I expected ToString to stop producing digits beyond "G17", but instead it continued to produce digits all the way up to "G51".
Q: Since the digits after the 15th decimal are beyond the precision afforded by the 53-bit mantissa, how are the remaining digits calculated?
A double has a 53-bit mantissa, which corresponds to roughly 15 decimal digits.
Math.PI is π truncated at 53 bits in binary, which is not precisely equal to 3.1415926535897, but also not quite equal to 3.14159265358979, etc.
Just like 1/3 is not quite 0.3, or 0.33, etc.
If you continue the decimal expansion long enough, it will eventually terminate or start repeating, since Math.PI, unlike π, is a rational number.
And as you can verify with a BigRational type (not built into .NET; available from various libraries):
using System.Numerics;                // BigRational here is assumed to come from such a library, not the built-in types
var r = new BigRational(Math.PI);     // exact rational value of the double Math.PI
BigRational den;
BigRational.NumDen(r, out den);       // extract the denominator
var num = r * den;                    // numerator = value * denominator
Console.WriteLine(num);
Console.WriteLine("-----------------");
Console.WriteLine(den);
it's
884279719003555
---------------
281474976710656
And since the denominator is a power of 2 (which it has to be, since Math.PI has a binary mantissa and hence a terminating representation in base 2), the value also has a terminating representation in base 10, which is exactly what (Math.PI).ToString("G51") gave:
3.141592653589793115997963468544185161590576171875
as
Console.WriteLine(r == BigRational.Parse("3.141592653589793115997963468544185161590576171875"));
outputs
True
The .ToString("G51") method uses 'school math'. First it takes the integer part, then it multiplies the Remainder by 10, use the integer part of that calculation and multiplies the Remainder part by 10.
It will keep doing that as long as there is a remainder and up to 51 decimal digits.
Since double has an exponent, it will keep having a remainder with about 18 decimal digits, which produces the next digit in ToString method.
However, as you have seen, the Remainder part is not accurate but simply a result of a math function.
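To see that digit-by-digit process in action, here is a small sketch (my own, using BigInteger rather than whatever ToString does internally) that expands the exact fraction 884279719003555 / 2^48 shown above:
using System.Numerics;

BigInteger num = 884279719003555;       // exact numerator of Math.PI (from the BigRational output above)
BigInteger den = BigInteger.One << 48;  // 281474976710656 = 2^48

var digits = new System.Text.StringBuilder("3.");
BigInteger rem = num - 3 * den;         // remainder after taking the integer part

// Multiply the remainder by 10, take the integer part as the next digit, keep the new remainder, repeat.
while (rem != 0 && digits.Length < 53)  // "3." plus at most 51 digits
{
    rem *= 10;
    digits.Append((char)('0' + (int)(rem / den)));
    rem %= den;
}
Console.WriteLine(digits); // 3.141592653589793115997963468544185161590576171875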
The documentation of Random.NextDouble():
Returns a random floating-point number that is greater than or equal to 0.0, and less than 1.0.
So, it can be exactly 0. But what are the chances for that?
var random = new Random();
var x = random.NextDouble();
if (x == 0)
{
    // probability for this?
}
It would be easy to calculate the probability for Random.Next() being 0, but I have no idea how to do it in this case...
As mentioned in the comments, it depends on the internal implementation of NextDouble. In the "old" .NET Framework, and in modern .NET up to version 5, it looks like this:
protected virtual double Sample() {
    return (InternalSample()*(1.0/MBIG));
}
InternalSample returns an integer in the 0 to Int32.MaxValue range, 0 included, Int32.MaxValue excluded. We can assume that the distribution of InternalSample is uniform (the docs for the Next method, which just calls InternalSample, suggest that it is, and there is no reason to use a non-uniform distribution in a general-purpose RNG for integers). That means every number is equally likely. That leaves 2,147,483,647 equally likely values, so the probability of drawing 0 is 1 / 2,147,483,647.
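A one-line sketch of that probability:
double pOld = 1.0 / int.MaxValue;  // one outcome out of 2,147,483,647 equally likely values
Console.WriteLine(pOld);           // ≈ 4.66E-10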
In modern .NET 6+ there are two implementations. The first is used when you provide an explicit seed value to the Random constructor. This implementation is the same as above and is kept for compatibility reasons, so that old code relying on the seed value to produce deterministic results will not break when moving to the new .NET version.
The second implementation is new and is used when you do NOT pass a seed into the Random constructor. Source code:
public override double NextDouble() =>
// As described in http://prng.di.unimi.it/:
// "A standard double (64-bit) floating-point number in IEEE floating point format has 52 bits of significand,
// plus an implicit bit at the left of the significand. Thus, the representation can actually store numbers with
// 53 significant binary digits. Because of this fact, in C99 a 64-bit unsigned integer x should be converted to
// a 64-bit double using the expression
// (x >> 11) * 0x1.0p-53"
(NextUInt64() >> 11) * (1.0 / (1ul << 53));
We first obtain a random 64-bit unsigned integer. Now, we could multiply it by 1 / 2^64 to obtain a double in the 0..1 range, but that would make the resulting distribution biased. A double is represented by a 53-bit mantissa (52 bits are explicit and one is implicit), an exponent, and a sign. The mantissa gives us 53 bits to represent integer values exactly, but we have a 64-bit integer here. This means integer values up to 2^53 can be represented exactly by a double, but bigger integers cannot. For example:
ulong l1 = 1ul << 53;
ulong l2 = l1 + 1;
double d1 = l1;
double d2 = l2;
Console.WriteLine(d1 == d2);
Prints "true", so two different integers map to the same double value. That means if we just multiply our 64-bit integer by 1 / 2^64 - we'll get a biased non-uniform distribution, because many integers bigger than 2^53-1 will map to the same values.
So instead, we throw away 11 bits, and multiply the result by 1 / 2^53 to get uniform distibution in 0..1 range. The probability to get 0 is then 1 / 2^53 (1 / 9,007,199,254,740,992). This implementation is better than the old one, because is provides much more different doubles in 0 .. 1 range (2^53 compared to 2^32 in old one).
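As a one-line sketch of that probability: the result is k * 2^-53 for a uniformly distributed 53-bit integer k, and 0.0 corresponds to exactly one of those outcomes.
double pNew = 1.0 / (1ul << 53);   // one outcome out of 2^53 equally likely 53-bit integers
Console.WriteLine(pNew);           // ≈ 1.11E-16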
You also asked in comments:
If one knows how many numbers there are between 0 inclusive and 1 exclusive (according to IEEE 754), it would be possible to answer the 'probability' question, because 0 is one of all of them
That's not so. There are actually many more than 2^53 representable numbers between 0 and 1 in IEEE 754. We have 52 bits of mantissa and 11 bits of exponent, about half of which encode negative exponents. Almost every negative exponent (roughly half of that 11-bit range), combined with any mantissa, gives a distinct value in the 0..1 range.
Why can't we use all the doubles IEEE allows in the 0..1 range to generate the random number? Because those values are not uniformly spaced (just as doubles over their full range are not uniformly spaced). For example, there are more representable numbers in the 0..0.5 range than in the 0.5..1 range.
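You can observe the non-uniform spacing directly with Math.BitIncrement (available since .NET Core 3.0), which returns the next representable double; the gap near 0.25 is half the gap near 0.75:
Console.WriteLine(Math.BitIncrement(0.25) - 0.25); // 2^-54 ≈ 5.55E-17
Console.WriteLine(Math.BitIncrement(0.75) - 0.75); // 2^-53 ≈ 1.11E-16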
This is from a strictly academic perspective.
From Double Struct:
All floating-point numbers also have a limited number of significant
digits, which also determines how accurately a floating-point value
approximates a real number. A Double value has up to 15 decimal digits
of precision, although a maximum of 17 digits is maintained
internally. This means that some floating-point operations may lack
the precision to change a floating point value.
If only 15 decimal digits are significant, then your possible return values are:
0.000000000000000
To:
0.999999999999999
Said differently, you have 10^15 possible distinct values (see Permutations in the first answer):
10^15 = 1,000,000,000,000,000
Zero is just ONE of those possibilities:
1 / 1,000,000,000,000,000 = 0.000000000000001
Stated as a percentage:
0.0000000000001% chance of zero being randomly selected?
I think this is the closest "correct" answer you're going to get...
...whether it performs this way in practice is possibly a different story.
Just create a simple program, and let it run until you are satisfied with the number of tries done. (see: https://onlinegdb.com/ij1M50gRQ)
Random r = new Random();
double d;
int attempts = 0;
int attempts0 = 0;
while (true)
{
    // Note: rounding to 3 decimals means every draw below 0.0005 is counted as 0 here.
    d = Math.Round(r.NextDouble(), 3);
    if (d == 0) attempts0++;
    attempts++;
    if (attempts % 1000000 == 0)
        Console.WriteLine($"Attempts: {attempts}, with {attempts0} times a 0 value, this is {Math.Round(100.0 * attempts0 / attempts, 3)} %");
}
example output:
...
Attempts: 208000000, with 103831 times a 0 value, this is 0.05 %
Attempts: 209000000, with 104315 times a 0 value, this is 0.05 %
Attempts: 210000000, with 104787 times a 0 value, this is 0.05 %
Attempts: 211000000, with 105305 times a 0 value, this is 0.05 %
Attempts: 212000000, with 105853 times a 0 value, this is 0.05 %
Attempts: 213000000, with 106349 times a 0 value, this is 0.05 %
Attempts: 214000000, with 106839 times a 0 value, this is 0.05 %
...
Rounding d to 2 decimals instead will return about 0.5%.
Why does:
double dividend = 1.0;
double divisor = 3.0;
Console.WriteLine(dividend / divisor * divisor);
output 1.0,
but:
decimal dividend = 1;
decimal divisor = 3;
Console.WriteLine(dividend / divisor * divisor);
outputs 0.9999999999999999999999999999
?
I understand that 1/3 can't be computed exactly, so there must be some rounding.
But why does Double round the answer to 1.0, but Decimal does not?
Also, why does double compute 1.0/3.0 to be 0.33333333333333331?
If rounding is used, then wouldn't the last 3 get rounded to 0? Why 1?
Why 1/3 as a double is 0.33333333333333331
The closest way to represent 1/3 in binary is like this:
0.0101010101...
That's the same as the series 1/4 + (1/4)^2 + (1/4)^3 + (1/4)^4...
Of course, this is limited by the number of bits you can store in a double. A double is 64 bits, but one of those is the sign bit and another 11 represent the exponent (think of it like scientific notation, but in binary). So the rest, which is called the mantissa or significand, is 52 bits. The leading 1 of the normalized value is implicit and accounts for the first 1/4 term; each further power of 1/4 takes two of the 52 explicit bits, giving 26 more terms. That means you can store:
1/4 + (1/4)^2 + ... + (1/4)^27
which is 0.33333333333333331
Why multiplying by 3 rounds this to 1
So 1/3 represented in binary and limited by the size of a double is:
0.010101010101010101010101010101010101010101010101010101
I'm not saying that's how it's stored. Like I said, you store the bits starting after the 1, and you use separate bits for the exponent and the sign. But I think it's useful to consider how you'd actually write it in base 2.
Let's stick with this "mathematician's binary" representation and ignore the size limits of a double. You don't have to do it this way, but I find it convenient. If we want to take this approximation for 1/3 and multiply by 3, that's the same as bit shifting to multiply by 2 and then adding what you started with. This gives us 1/3 * 3 = 0.111111111111111111111111111111111111111111111111111111
But can a double store that? No, remember, you can only have 52 bits of mantissa after the first 1, and that number has 54 ones. So we know that it'll be rounded, in this case rounded up to exactly 1.
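You can check both claims directly (a small sketch; "G17" forces the round-trip representation):
double third = 1.0 / 3.0;
Console.WriteLine(third.ToString("G17")); // 0.33333333333333331
Console.WriteLine(third * 3.0 == 1.0);    // True: the 54 ones round up to exactly 1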
Why for decimal you get 0.9999999999999999999999999999
With decimal, you get 96 bits to represent an integer, with additional bits representing the exponent up to 28 powers of 10. So even though ultimately it's all stored as binary, here we're working with powers of 10 so it makes sense to think of the number in base 10. 96 bits lets us express up to 79,228,162,514,264,337,593,543,950,335, but to represent 1/3 we're going to go with all 3's, up to the 28 of them that we can shift to the right of the decimal point: 0.3333333333333333333333333333.
Multiplying this approximation for 1/3 by 3 gives us a number we can represent exactly. It's just 28 9's, all shifted to the right of the decimal point: 0.9999999999999999999999999999. So, unlike with double, there is no second rounding step at this point.
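The same check with decimal (a small sketch): the product is exactly representable, so nothing rounds it back up to 1:
decimal third = 1m / 3m;
Console.WriteLine(third);            // 0.3333333333333333333333333333
Console.WriteLine(third * 3m);       // 0.9999999999999999999999999999
Console.WriteLine(third * 3m == 1m); // False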
This is by design of the decimal type, which is optimized for accuracy, unlike the double type, which trades accuracy for range and performance.
The Decimal value type represents decimal numbers ranging from positive 79,228,162,514,264,337,593,543,950,335 to negative 79,228,162,514,264,337,593,543,950,335.
The Decimal value type is appropriate for financial calculations requiring large numbers of significant integral and fractional digits and no round-off errors. The Decimal type does not eliminate the need for rounding. Rather, it minimizes errors due to rounding. Thus your code produces a result of 0.9999999999999999999999999999 rather than 1.
One reason that infinite decimals are a necessary extension of finite decimals is to represent fractions. Using long division, a simple division of integers like 1⁄9 becomes a recurring decimal, 0.111…, in which the digits repeat without end. This decimal yields a quick proof for 0.999… = 1. Multiplication of 9 times 1 produces 9 in each digit, so 9 × 0.111… equals 0.999… and 9 × 1⁄9 equals 1, so 0.999… = 1.
Possible Duplicate:
Why is floating point arithmetic in C# imprecise?
Why is there a bias in floating point ops? Any specific reason?
Output:
160
139
static void Main()
{
    float x = (float) 1.6;
    int y = (int)(x * 100);
    float a = (float) 1.4;
    int b = (int)(a * 100);
    Console.WriteLine(y);
    Console.WriteLine(b);
    Console.ReadKey();
}
Any rational number whose denominator (in lowest terms) is not a power of 2 leads to an infinite number of digits when represented in binary. Here you have 8/5 and 7/5, so there is no exact binary representation as a floating-point number (unless you have infinite memory).
The exact binary representation of 1.6 is 110011001100110011001100110011001100...
The exact binary representation of 1.4 is 101100110011001100110011001100110011...
Both values have an infinite number of digits (1100 is repeated endlessly).
float values have a precision of 24 bits. So the binary representation of any value will be rounded to 24 bits. If you round the given values to 24 bits you get:
1.6: 110011001100110011001101 (decimal 13421773) - rounded up
1.4: 101100110011001100110011 (decimal 11744051) - rounded down
Both values have an exponent of 0 (the first bit is 2^0 = 1, the second is 2^-1 = 0.5 etc.).
Since the first bit in a 24-bit integer has the value 2^23, you can calculate the exact decimal values by dividing the 24-bit values (13421773 and 11744051) by 2^23, i.e. by two, 23 times.
The values are: 1.60000002384185791015625 and 1.39999997615814208984375.
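A small sketch (my own) reproducing those values; the number of digits printed depends on the runtime's default double formatting:
// The rounded 24-bit significands divided by 2^23 are the values the floats actually hold:
Console.WriteLine(13421773.0 / (1 << 23)); // ≈ 1.6000000238418579 (rounded up from 1.6)
Console.WriteLine(11744051.0 / (1 << 23)); // ≈ 1.3999999761581421 (rounded down from 1.4)
// The same values, obtained by widening the floats back to double:
Console.WriteLine((double)(float)1.6);     // ≈ 1.6000000238418579
Console.WriteLine((double)(float)1.4);     // ≈ 1.3999999761581421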
When using floating-point types you always have to consider that their precision is finite. Values that can be written exactly as decimal values might be rounded up or down when represented in binary. Casting to int does not compensate for that, because it truncates the given value toward zero. You should always use something like Math.Round.
If you really need an exact representation of rational numbers you need a completely different approach. Since rational numbers are fractions you can use integers to represent them. Here is an example of how you can achieve that.
However, you cannot write Rational x = (Rational)1.6 then; you have to write something like Rational x = new Rational(8, 5) (or new Rational(16, 10), etc.).
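The example behind that link is not reproduced here; as a rough stand-in, a minimal hypothetical Rational type along those lines might look like this (a sketch, not the linked code):
readonly struct Rational
{
    public long Numerator { get; }
    public long Denominator { get; }

    public Rational(long numerator, long denominator)
    {
        if (denominator == 0) throw new DivideByZeroException();
        long g = Gcd(Math.Abs(numerator), Math.Abs(denominator));   // reduce to lowest terms
        Numerator = numerator / g * Math.Sign(denominator);         // keep the sign on the numerator
        Denominator = Math.Abs(denominator) / g;
    }

    static long Gcd(long a, long b) => b == 0 ? a : Gcd(b, a % b);

    public static Rational operator *(Rational x, Rational y)
        => new Rational(x.Numerator * y.Numerator, x.Denominator * y.Denominator);

    public override string ToString() => $"{Numerator}/{Denominator}";
}

var x = new Rational(8, 5);                // 1.6 as an exact fraction
Console.WriteLine(x * new Rational(7, 5)); // 56/25, with no rounding anywhere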
This is due to the fact that floating-point arithmetic is not precise. When you set a to 1.4, internally it may not be exactly 1.4, just as close as machine precision allows. If it is fractionally less than 1.4, then multiplying by 100 and casting to integer will take only the integer portion, which in this case is 139. You will get far more technically precise answers, but essentially this is what is happening.
In the 1.6 case, the floating-point representation may actually be minutely larger than 1.6, so when you multiply by 100 the result is slightly larger than 160 and the integer cast gives you what you expect. The fact is that there is simply not enough precision available in a computer to store every real number exactly.
See this link for details of the conversion from floating point to integer types http://msdn.microsoft.com/en-us/library/aa691289%28v=vs.71%29.aspx - it has its own section.
The floating-point types float (32 bit) and double (64 bit) have a limited precision, and moreover the value is represented in binary internally. Just as you cannot represent 1/7 precisely in a decimal system (~ 0.1428571428571428...), 1/10 cannot be represented precisely in a binary system.
You can, however, use the decimal type. It still has a limited (though high) precision, but the numbers are represented in a decimal way internally. Therefore a value like 1/10 is represented internally exactly as 0.1000000000000000000000000000. 1/7 is still a problem for decimal, but at least you don't get a loss of precision by converting to binary and then back to decimal.
Consider using decimal.
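For the example above, a decimal version (a small sketch) avoids the binary rounding entirely:
decimal x = 1.6m;
decimal a = 1.4m;
Console.WriteLine((int)(x * 100)); // 160
Console.WriteLine((int)(a * 100)); // 140 -- no binary rounding, so no surprising 139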