With single precision (32 bits), the bit layout goes like this:
So we have 23 bits of mantissa/significand.
So we can represent 2^23 values with those 23 bits, which is 8388608 -- a 7-digit number.
BUT
I was reading that the mantissa is normalized (the leading digit of the mantissa is always a 1), so the pattern is actually 1.mmm and only the mmm part is stored in the mantissa.
For example:
0.75 is what gets stored, but the value it represents is actually 1.75.
Question #1
So basically it adds one more digit of precision, no?
If so, then we have 8 significant digits!
So why does MSDN say 7?
Question #2
In double there are 52 bits for the mantissa (bits 0..51).
If I add 1 for the normalized mantissa, that's 2^53 possibilities, which is 9007199254740992 (16 digits).
And MSDN does say 15-16:
Why this inconsistency? Am I missing something?
It doesn't add one more decimal digit - just a single binary digit. So instead of 23 bits, you have 24 bits. This is handy, because the only number you can't represent as starting with a one is zero, and that's a special value.
In short, you're not looking at 2^24 (a count of values, which you then read off as a base-10 figure) - you're looking at 2^(-24), the relative precision. That's the most important difference between float/double and decimal. decimal is what you imagine floats to be, i.e. a simple exponent-shifted, base-10 number. float and double aren't that.
Now, decimal digits versus binary digits is a tricky matter. You're mistaken in your understanding that the precision has anything to do with the 2 ^ 24 figure - that would only be true if you were talking about e.g. the decimal type, which actually stores decimal values as decimal point offsets of a normal (huge-ass) integer.
Just like 1 / 3 cannot be written in decimal (0.333333...), many simple decimal numbers can't be represented in a float precisely (0.2 is the typical example). decimal doesn't have a problem with that - it's just 2 shifted one digit to the right, easy peasy. For floats, however, you have to represent this value as a sum of negative powers of two - 0.5, 0.25, 0.125 ... The same would apply in the opposite direction if 2 wasn't a factor of 10 - every finite binary "decimal" can be represented with finite precision in decimal.
Now, in fact, float can easily represent a number with 24 decimal digits - it just has to be 2^(-24) - a number you're not going to encounter in your usual day job, and a weird number in decimal. So where does the 7 (actually more like 7.22...) come from? Simple: take the base-10 logarithm of 2^24.
The fact that it seems that 0.2 can be represented "exactly" in a float is simply because every time you e.g. convert it to a string, you're rounding. So, even though the number isn't exactly 0.2, it ends up that way when you convert it to a decimal string.
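A quick way to see this (a minimal sketch in C#; the exact digits printed can vary slightly with the runtime's formatting):

using System;

// The float nearest to 0.2 is not exactly 0.2; the default ToString simply
// rounds it back to "0.2", hiding the error.
float f = 0.2f;
Console.WriteLine(f);                  // "0.2" - rounded for display
Console.WriteLine(f.ToString("G9"));   // about "0.200000003" - extra digits expose the stored value
Console.WriteLine((double)f);          // widening to double shows even more of the stored value

// The "7.22..." figure comes from 24 * log10(2):
Console.WriteLine(24 * Math.Log10(2)); // ~7.22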
All this means that when you need decimal precision, you want to use decimal, as simple as that. This is not because it's a better base for calculations, it's simply because humans use it, and they will not be happy if your application gives different results from what they calculate on a piece of paper - especially when dealing with money. Accountants are very focused on having everything correct to the least significant digit.
Floats are used where it's not about decimal precision, but rather about generally having some sort of precision - this makes them well suited for physics calculations and similar, because you don't actually care about having the number come up the same in decimal - you're working with a given precision, and you're going to get that - 24 significant binary "decimals".
The implied leading 1 adds one more binary digit of precision, not decimal.
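As a rough illustration of that ~7-digit limit (a sketch; exact output can differ slightly between runtimes):

using System;

// Only about 7 significant decimal digits survive a round trip through float.
float f = 123456789f;                // 9 significant digits requested
Console.WriteLine(f.ToString("G9")); // prints 123456792 - only the leading ~7 digits are preserved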
Related
I've been doing research on floating-point doubles in .NET lately. While reading Jon Skeet's article Binary floating points and .NET, I had a question.
Let's start with the example of 46.428292315077 in the article.
Represented as a 64 bit double, this equates to the following bits:
Sign Exponent Mantissa
0 10000000100 0111001101101101001001001000010101110011000100100011
One bit is used to represent the sign, 11 bits are used to represent the exponent, and 52 bits are used to represent the mantissa. Note the bias of 1023 for doubles (which I assume is to allow for negative exponents - more on this later).
My confusion is with the 11 bits which represent the exponent, and their use (or lack thereof) for large numbers, specifically double.MaxValue (1.7976931348623157E+308).
For the exponent, there are a few special values as cited in the article which help determine a number's value. All zeroes represent 0; all ones represent NaN and positive/negative infinity. There are 11 bits to work with: the first bit of the exponent is bias, so we can disregard that. This gives us 10 bits which control the actual size of the exponent.
The exponent on double.MaxValue is 308, which can be represented with 9 bits (100110100, or with bias: 10100110100). The smallest fractional value is double.Epsilon (4.94065645841247E-324), and its exponent can still be represented in 9 bits (101000100, or with bias: 00101000100).
You might notice that the first bit after the bias always seems to be wasted. Are my assumptions about negative exponents correct? If so, why is the second bit after the bias wasted? Regardless, it seems like the actual largest number we could represent (while respecting the special values and a possible sign bit after the bias) is 111111111 (or 511 in base 10).
If the bit after the bias is actually wasted, why can't we represent numbers with exponents larger than 324? What am I misunderstanding about this?
There are no wasted bits in a double.
Let's sort out your confusion. How do we turn a double from bits into a mathematical value? Let's assume the double is not zero, infinity, negative infinity, NaN or a denormal, because those all have special rules.
The crux of your confusion is mixing up decimal quantities with binary quantities. For this answer I'll put all binary quantities in this formatting and decimal quantities in regular formatting.
We take the 52 bits of the mantissa and we put them after 1. So in your example, that would be
1.0111001101101101001001001000010101110011000100100011
That's a binary number. So 1 + 0/2 + 1/4 + 1/8 + 1/16 + 0/32 ...
Then we take the 11 bits of the exponent, treat that as an 11 bit unsigned integer, and subtract 1023 from that value. So in your example we have 10000000100 which is the unsigned integer 1028. Subtract 1023, and we get 5.
Now we shift the "decimal place" (ha ha) by 5 places:
101110.01101101101001001001000010101110011000100100011
Note that this is equivalent to multiplying by 2^5. It is not multiplying by 10^5!
And now we multiply the whole thing by 1 if the sign bit is 0, and -1 if the sign bit is 1. So the final answer is
101110.01101101101001001001000010101110011000100100011
Let's see an example with a negative exponent.
Suppose the exponent had been 01111111100. That's 1020 as an unsigned integer. Subtract 1023. We get -3, so we would shift three places to the left, and get:
0.0010111001101101101001001001000010101110011000100100011
Let's see an example with a large exponent. What if the exponent had been 11111111100 ?
Work it out. That's 2044 in decimal. Subtract 1023. That's 1021. So this number would be the extremely large number that you get when multiplying 1.0111001101101101001001001000010101110011000100100011 by 2^1021.
So the value of that double is exactly equal to
32603055608669827528875188998863283395233949199438288081243712122350844851941321466156747022359800582932574058697506453751658312301708309704448596122037141141297743099124156580613023692715652869864010740666615694378079258090383719888417882332809291228958035810952632190230935024250237637887765563383983636480
Which is approximately 3.26030556 × 10^307.
Is that now clear?
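If you want to poke at this yourself, here is a rough sketch (not the code from the links below) that pulls the three fields out of a double with BitConverter and rebuilds the value, for normal doubles only:

using System;

double d = 46.428292315077;
long bits = BitConverter.DoubleToInt64Bits(d);

int sign      = (int)((bits >> 63) & 1);     // 1 sign bit
int exponent  = (int)((bits >> 52) & 0x7FF); // 11 exponent bits, biased by 1023
long mantissa = bits & 0xFFFFFFFFFFFFF;      // low 52 mantissa bits

Console.WriteLine($"sign     = {sign}");
Console.WriteLine($"exponent = {Convert.ToString(exponent, 2).PadLeft(11, '0')} ({exponent}, unbiased {exponent - 1023})");
Console.WriteLine($"mantissa = {Convert.ToString(mantissa, 2).PadLeft(52, '0')}");

// Reconstruct the value: (-1)^sign * 1.mantissa * 2^(exponent - 1023).
// (Normal doubles only; zero, denormals, infinities and NaN follow the special rules above.)
double rebuilt = (sign == 0 ? 1 : -1)
               * (1 + mantissa / Math.Pow(2, 52))
               * Math.Pow(2, exponent - 1023);
Console.WriteLine(rebuilt); // 46.428292315077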
If this subject interests you, here's some further reading:
Code to decode a double into its parts:
https://ericlippert.com/2015/11/30/the-dedoublifier-part-one/
A simple arbitrary-precision rational:
https://ericlippert.com/2015/12/03/the-dedoublifier-part-two/
Code to turn a double into its exact rational:
https://ericlippert.com/2015/12/07/the-dedoublifier-part-three/
Representation of floats:
https://blogs.msdn.microsoft.com/ericlippert/2005/01/10/floating-point-arithmetic-part-one/
How Benford's Law is used to minimize representation errors:
https://blogs.msdn.microsoft.com/ericlippert/2005/01/13/floating-point-and-benfords-law-part-two/
What algorithm do we use to display floats as decimal quantities?
https://blogs.msdn.microsoft.com/ericlippert/2005/01/17/fun-with-floating-point-arithmetic-part-three/
What happens when you try to compare for equality floats of different precision levels?
https://blogs.msdn.microsoft.com/ericlippert/2005/01/18/fun-with-floating-point-arithmetic-part-four/
What properties of standard arithmetic fail to hold in floating point?
https://blogs.msdn.microsoft.com/ericlippert/2005/01/20/fun-with-floating-point-arithmetic-part-five/
How are infinities and divisions by zero represented?
https://blogs.msdn.microsoft.com/ericlippert/2009/10/15/as-timeless-as-infinity/
I'm attempting to parse a string with 2 decimal places as a float.
The problem is, the resultant object has an incorrect mantissa.
As it's quite a bit off what I'd expect, I struggle to believe it's a rounding issue.
However double seems to work.
This value does seem to be within the range of a float (-3.4 × 10^38 to +3.4 × 10^38) so I don't see why it doesn't parse it as I'd expect.
I tried a few more tests, but they don't make what's happening any clearer to me.
From the documentation for System.Single:
All floating-point numbers have a limited number of significant digits, which also determines how accurately a floating-point value approximates a real number. A Single value has up to 7 decimal digits of precision, although a maximum of 9 digits is maintained internally.
It's not a matter of the range of float - it's the precision.
The closest exact value to 650512.56 (for example) is 650512.5625... which is then being shown as 650512.5625 in the watch window.
To be honest, if you're parsing a decimal number, you should probably use decimal to represent it. That way, assuming it's in range and doesn't have more than the required number of decimal digits, you'll have the exact numeric representation of the original string. While you could use double and be fine for 9 significant digits, you still wouldn't be storing the exact value you parsed - for example, "0.1" can't be exactly represented as a double.
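A small sketch of that difference, using the 650512.56 value mentioned above (the question's original test strings aren't shown here):

using System;
using System.Globalization;

string s = "650512.56";

float f   = float.Parse(s, CultureInfo.InvariantCulture);
double d  = double.Parse(s, CultureInfo.InvariantCulture);
decimal m = decimal.Parse(s, CultureInfo.InvariantCulture);

Console.WriteLine((double)f);         // 650512.5625 - the nearest float, off in the 8th significant digit
Console.WriteLine(d.ToString("G17")); // shows the double isn't exactly 650512.56 either, just much closer
Console.WriteLine(m);                 // 650512.56 - decimal stores the parsed value exactly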
The mantissa of a float in C# has 23 bits, which means it can hold 6-7 significant digits. In your example 650512.59 you have 8, and it is just that last digit which comes out 'wrong'. Double has a 52-bit mantissa (15-16 digits), so of course it will show all your 8 or 9 significant digits correctly.
See here for more: Type float in C#
So, I've got this floating point number:
(float)-123456.668915
It's a number chosen at random because I'm doing some unit testing for a chunk of BCD code I'm writing. Whenever I go to compare the number above with a string ("-123456.668915" to be clear), I'm getting an issue with how C# rounded the number. It rounds it to -123456.7. This has been checked in NUnit and with straight console output.
Why is it rounding like this? According to MSDN, the range of float is approximately -3.4 * 10^38 to +3.4 * 10^38 with 7 digits of precision. The above number, unless I'm completely missing something, is well within that range, and only has 6 digits after the decimal point.
Thanks for the help!
According to MSDN, the range of float is approximately -3.4 * 10^38 to +3.4 * 10^38 with 7 digits of precision. The above number, unless I'm completely missing something, is well within that range, and only has 6 digits after the decimal point.
"6 digits after the decimal point" isn't the same as "6 digits of precision". The number of digits of precision is the number of significant digits which can be reliably held. Your number has 12 significant digits, so it's not at all surprising that it can't be represented exactly by float.
Note that the number it's (supposedly) rounding to, -123456.7, does have 7 significant digits. In fact, that's not the value of your float either. I strongly suspect the exact value is -123456.671875, as that's the closest float to -123456.668915. However, when you convert the exact value to a string representation, the result is only 7 digits, partly because beyond that point the digits aren't really meaningful anyway.
You should probably read my article about binary floating point in .NET for more details.
The float type has a precision of 24 significant bits (except for denormals), which is equivalent to 24 log10 2 ≈ 7.225 significant decimal digits. The number -123456.668915 has 12 significant digits, so it can't be represented accurately.
The actual binary value, rounded to 24 significant bits, is -11110001001000000.1010110. This is equivalent to the fraction -7901227/64 = -123456.671875. Rounding to 7 significant digits gives the -123456.7 you see.
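You can see the stored value directly, for example (a sketch; the default formatting differs between runtimes):

using System;

float f = (float)-123456.668915;

Console.WriteLine(f);         // "-123456.7" under the classic 7-digit formatting (newer runtimes print more digits)
Console.WriteLine((double)f); // -123456.671875 - the exact value the float actually holds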
I use the decimal type for high precise calculation (monetary).
But I came across this simple division today:
1 / (1 / 37) which should result in 37 again
http://www.wolframalpha.com/input/?i=1%2F+%281%2F37%29
But C# gives me:
37.000000000000000000000000037M
I tried both these:
1m/(1m/37m);
and
Decimal.Divide(1, Decimal.Divide(1, 37))
but both yield the same results. How is the behaviour explainable?
Decimal stores the value as decimal floating point with only limited precision. The result of 1 / 37 is not precisely stored; it's stored as 0.027027027027027027027027027M. The true number has the group 027 repeating indefinitely in its decimal representation. For that reason, you cannot get a precise decimal representation for every possible number.
If you use Double in the same calculation, the end result is correct in this case (but it does not mean it will always be better).
A good answer on that topic is here: Difference between decimal, float and double in .NET?
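To see the intermediate value that causes this (a quick sketch):

using System;

decimal dm = 1m / 37m;
Console.WriteLine(dm);       // 0.0270270270270270270270270270 - the repeating 027 is cut off
Console.WriteLine(1m / dm);  // 37.000000000000000000000000037

double dd = 1.0 / 37.0;
Console.WriteLine(1.0 / dd); // 37 - here the double's rounding error happens to cancel, but it won't always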
The decimal data type has a precision of 28-29 significant digits.
So what you have to understand is that even with 28-29 significant digits you are still not exact.
So when you compute a decimal value for 1/37, note that you only get 28-29 significant digits of it. For example, 1/37 truncated to two decimal places is 0.02 and to three decimal places is 0.027. Now divide 1 by each of those: you get 50 in the first case and about 37.04 in the second. Carrying the 28-29 digits that decimal keeps gets you to 37.000000000000000000000000037. To get an exact 37 you would simply need more significant digits than decimal offers.
Always do the computations at full precision and round only your final answer with Math.Round to the precision you need.
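For example (a sketch):

using System;

decimal result = 1m / (1m / 37m);          // 37.000000000000000000000000037
Console.WriteLine(Math.Round(result, 10)); // prints an exact 37 (shown with decimal's trailing zeros)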
Why does:
double dividend = 1.0;
double divisor = 3.0;
Console.WriteLine(dividend / divisor * divisor);
output 1.0,
but:
decimal dividend = 1;
decimal divisor = 3;
Console.WriteLine(dividend / divisor * divisor);
outputs 0.9999999999999999999999999999
?
I understand that 1/3 can't be computed exactly, so there must be some rounding.
But why does Double round the answer to 1.0, but Decimal does not?
Also, why does double compute 1.0/3.0 to be 0.33333333333333331?
If rounding is used, then wouldn't the last 3 get rounded to 0, why 1?
Why 1/3 as a double is 0.33333333333333331
The closest way to represent 1/3 in binary is like this:
0.0101010101...
That's the same as the series 1/4 + (1/4)^2 + (1/4)^3 + (1/4)^4...
Of course, this is limited by the number of bits you can store in a double. A double is 64 bits, but one of those is the sign bit and another 11 represent the exponent (think of it like scientific notation, but in binary). The rest, called the mantissa or significand, is 52 bits. Assume a 1 to start and then use two bits for each subsequent power of 1/4. That means you can store:
1/4 + 1/4^2 + ... + 1/4 ^ 27
which is 0.33333333333333331
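You can see this approximation directly; "G17" prints enough digits to round-trip a double:

using System;

double third = 1.0 / 3.0;
Console.WriteLine(third.ToString("G17")); // 0.33333333333333331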
Why multiplying by 3 rounds this to 1
So 1/3 represented in binary and limited by the size of a double is:
0.010101010101010101010101010101010101010101010101010101
I'm not saying that's how it's stored. Like I said, you store the bits starting after the 1, and you use separate bits for the exponent and the sign. But I think it's useful to consider how you'd actually write it in base 2.
Let's stick with this "mathematician's binary" representation and ignore the size limits of a double. You don't have to do it this way, but I find it convenient. If we want to take this approximation for 1/3 and multiply by 3, that's the same as bit shifting to multiply by 2 and then adding what you started with. This gives us 1/3 * 3 = 0.111111111111111111111111111111111111111111111111111111
But can a double store that? No, remember, you can only have 52 bits of mantissa after the first 1, and that number has 54 ones. So we know that it'll be rounded, in this case rounded up to exactly 1.
Why for decimal you get 0.9999999999999999999999999999
With decimal, you get 96 bits to represent an integer, with additional bits representing the exponent up to 28 powers of 10. So even though ultimately it's all stored as binary, here we're working with powers of 10 so it makes sense to think of the number in base 10. 96 bits lets us express up to 79,228,162,514,264,337,593,543,950,335, but to represent 1/3 we're going to go with all 3's, up to the 28 of them that we can shift to the right of the decimal point: 0.3333333333333333333333333333.
Multiplying this approximation for 1/3 by 3 gives us a number we can represent exactly. It's just 28 9's, all shifted to the right of the decimal point: 0.9999999999999999999999999999. So unlike with doubles, there's no second round of rounding at this point.
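Printing the results side by side shows the difference (a small sketch):

using System;

Console.WriteLine((1.0 / 3.0) * 3.0); // 1 - the double product rounds back up to exactly 1.0
Console.WriteLine(1m / 3m);           // 0.3333333333333333333333333333 (28 threes)
Console.WriteLine((1m / 3m) * 3m);    // 0.9999999999999999999999999999 - exactly representable, so no further rounding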
This is by design of the decimal type, which is optimized for accuracy, unlike the double type, which trades accuracy for performance.
The Decimal value type represents decimal numbers ranging from positive 79,228,162,514,264,337,593,543,950,335 to negative 79,228,162,514,264,337,593,543,950,335.
The Decimal value type is appropriate for financial calculations requiring large numbers of significant integral and fractional digits and no round-off errors. The Decimal type does not eliminate the need for rounding. Rather, it minimizes errors due to rounding. Thus your code produces a result of 0.9999999999999999999999999999 rather than 1.
One reason that infinite decimals are a necessary extension of finite decimals is to represent fractions. Using long division, a simple division of integers like 1⁄9 becomes a recurring decimal, 0.111…, in which the digits repeat without end. This decimal yields a quick proof for 0.999… = 1. Multiplying by 9 turns each digit 1 into a 9, so 9 × 0.111… equals 0.999…, and 9 × 1⁄9 equals 1, so 0.999… = 1.