I was going through the documentation for "Floating-point numeric types (C# reference)" at MSDN, https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/floating-point-numeric-types.
It has a table, "Characteristics of the floating-point types," that lists the approximate ranges of the floating-point types in C#. What I do not understand is why both the MIN and the MAX in the "Approximate range" column are given as both positive and negative. To save you a click, here is the table:
C# type/keyword | Approximate range | Precision | Size | .NET type
float | ±1.5 × 10^−45 to ±3.4 × 10^38 | ~6-9 digits | 4 bytes | System.Single
double | ±5.0 × 10^−324 to ±1.7 × 10^308 | ~15-17 digits | 8 bytes | System.Double
decimal | ±1.0 × 10^−28 to ±7.9228 × 10^28 | 28-29 digits | 16 bytes | System.Decimal
Why does the approximate range have a ± on both the MIN and the MAX? Shouldn't the MIN be negative (-) and the MAX positive (+), as they are for the integral types here https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/builtin-types/integral-numeric-types? Maybe I misunderstood something about floating point.
Thank you.
Perhaps it could be made clearer, but the column gives the smallest nonzero absolute value and the largest absolute value that the data type can represent. For example, double cannot represent 3e-324: it is rounded to the nearest representable value, about 4.94e-324 (shown as 5.0e-324 in the table), which is double.Epsilon (https://learn.microsoft.com/en-us/dotnet/api/system.double.epsilon?view=net-7.0).
These values work for both positive and negative values, hence the use of ±.
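A quick way to see that each bound applies to both signs is to look at the constants behind the double row. This is a minimal sketch written as C# top-level statements; it assumes .NET Core 3.0 or later, whose double.Parse rounds to the nearest representable value as IEEE 754 requires.

```csharp
using System;
using System.Globalization;

// The ± applies to magnitudes: each bound exists for positive and negative values alike.
double smallest = double.Epsilon;  // smallest positive double, about 4.94e-324 (a subnormal)
double largest = double.MaxValue;  // largest finite double, about 1.80e308

Console.WriteLine($"{smallest} and {-smallest}");   // both signs are representable
Console.WriteLine($"{largest} and {double.MinValue}");
Console.WriteLine(double.MinValue == -largest);     // True: the range is symmetric around zero

// Magnitudes below double.Epsilon round to the nearest representable double.
double belowMin = double.Parse("3e-324", CultureInfo.InvariantCulture);
double wayBelow = double.Parse("1e-324", CultureInfo.InvariantCulture);
Console.WriteLine(belowMin == double.Epsilon);  // True: rounded up to double.Epsilon
Console.WriteLine(wayBelow == 0.0);             // True: rounded down to zero
```

Note that C#'s double.Epsilon is this smallest positive value, not the machine epsilon (about 2.2e-16) that the word means in many other languages.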
The important thing here is the sign of the exponent.
The double and float types are meant for approximations and should be avoided for exact values.
Some programmers make the mistake of using them anyway because they are faster than the decimal type for complex calculations.
Explaining what I think is confusing you
I'll lower the exponent to make it easier to understand
± is a replacement for "more or less", since the float and double types are recommended for approximations.
The first part is for values that are less than one (fractions).
±1.5 x 10^−5 = "more or less" 0.000015
The second part is for large values.
±3.4 x 10^5 = "more or less" 340,000
Related
While debugging my program in Visual Studio, I convert a double variable holding 91.151497095188446 to a float using
(float)double_variable
and the resulting float variable shows as 91.1515.
Why am I losing so much precision? Do I need to use another data type?
C# and many other languages use IEEE 754 as a specification for their floating-point data types. Floating-point numbers are expressed as a significand and an exponent, similar to how a decimal number, in scientific notation, is expressed as
1.234567890 x 10^12
^                ^
mantissa         exponent
I won't go into the details (the Wikipedia article goes into that better than I can), but IEEE 754 specifies that:
a 32-bit floating-point number, such as the C# float type, has 24 bits of precision for the significand (23 stored bits plus an implicit leading 1) and 8 bits for the exponent.
a 64-bit floating-point number, such as the C# double type, has 53 bits of precision for the significand (52 stored bits plus an implicit leading 1) and 11 bits for the exponent.
Because a float has only 24 bits of precision, it can express only about 7-8 decimal digits (24 × log10(2) ≈ 7.2). By comparison, a double has 53 bits of precision, so it has about 15-16 digits (53 × log10(2) ≈ 15.95).
As has been said in the comments, if you don't want to lose precision, don't go from a double (64 bits in total) to a float (32 bits in total). Depending on your application, you could perhaps use the decimal data type which has 28-29 decimal digits of precision - but will come with penalties because (a) calculations involving it are slower than for double or float, and (b) it's typically far less supported by external libraries.
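The narrowing cast can be reproduced in a few lines. This sketch (C# top-level statements) prints the float with the "G9" format, which gives enough digits to round-trip any float, so the display does not hide digits; the exact strings printed for the float and double depend on the runtime's formatting, so only the comparisons carry expected results.

```csharp
using System;
using System.Globalization;

double original = 91.151497095188446;
float narrowed = (float)original;          // rounds to the nearest float (24-bit significand)
decimal asDecimal = 91.151497095188446m;   // a decimal literal keeps all 17 significant digits

// "G9" round-trips any float; "R" does the same for double.
Console.WriteLine(narrowed.ToString("G9", CultureInfo.InvariantCulture));
Console.WriteLine(original.ToString("R", CultureInfo.InvariantCulture));
Console.WriteLine(asDecimal.ToString(CultureInfo.InvariantCulture));  // 91.151497095188446

// Widening the float back to double does not recover the lost bits.
double widenedBack = narrowed;
Console.WriteLine(widenedBack == original);                  // False
Console.WriteLine(Math.Abs(widenedBack - original) < 1e-5);  // True: at most half a float ulp (about 3.8e-6 near 91)
```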
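Note that 91.151497095188446 has more significant digits than the default double formatting shows. The compiler stores the nearest double, and on .NET Framework the default ToString() displays only 15 significant digits, so it appears as 91.1514970951884 - see, for example, this:
double value = 91.151497095188446;
Console.WriteLine(value);
// prints 91.1514970951884 on .NET Framework; .NET Core 3.0 and later print the shortest string that round-trips instead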
You will find a detailed explanation here: https://stackoverflow.com/a/2386882/10863059
Basically, float takes less memory (4 bytes) than double (8 bytes), but that's only part of the story.
Floating-point types store fractional parts as inverse powers of two. For this reason, they can only represent exact values such as 10, 10.25, 10.5, and so on. Other numbers, such as 1.23 or 19.99, cannot be represented exactly and are only an approximation. Even if double has 15 decimal digits of precision, as compared to only 7 for float, precision loss starts to accumulate when performing repeated calculations.
This makes double and float difficult or even inappropriate to use in certain types of applications, such as financial applications, where precision is key. For this purpose, the decimal type is provided.
Reference: Learn C# Programming: A guide to building a solid foundation in C# language for writing efficient programs, by Marius Bancila, Raffaele Rialdi, and Ankit Sharma
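The accumulation effect the quote describes shows up with a simple loop. This is a minimal sketch in C# top-level statements: it adds one tenth ten times in double and in decimal, and checks that a value built from powers of two, like 10.25, stays exact.

```csharp
using System;

double doubleTotal = 0.0;
decimal decimalTotal = 0.0m;

// Add one tenth ten times; 0.1 has no exact binary representation.
for (int i = 0; i < 10; i++)
{
    doubleTotal += 0.1;
    decimalTotal += 0.1m;
}

Console.WriteLine(doubleTotal == 1.0);     // False: the rounding errors accumulated
Console.WriteLine(decimalTotal == 1.0m);   // True: 0.1m is stored exactly
Console.WriteLine(10.25 == 10 + 1.0 / 4);  // True: 10.25 is a sum of powers of two
```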
This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Why is floating point arithmetic in C# imprecise?
Why is there a bias in floating point ops? Any specific reason?
using System;

class Program
{
    static void Main()
    {
        float x = (float) 1.6;
        int y = (int)(x * 100);
        float a = (float) 1.4;
        int b = (int)(a * 100);
        Console.WriteLine(y);
        Console.WriteLine(b);
        Console.ReadKey();
    }
}
Output:
160
139
Any rational number whose denominator (in lowest terms) is not a power of 2 has an infinitely repeating binary expansion. Here you have 8/5 and 7/5, so neither has an exact binary floating-point representation (that would require infinite memory).
The exact binary representation of 1.6 is 110011001100110011001100110011001100...
The exact binary representation of 1.4 is 101100110011001100110011001100110011...
Both values have an infinite number of digits (1100 is repeated endlessly).
float values have a precision of 24 bits. So the binary representation of any value will be rounded to 24 bits. If you round the given values to 24 bits you get:
1.6: 110011001100110011001101 (decimal 13421773) - rounded up
1.4: 101100110011001100110011 (decimal 11744051) - rounded down
Both values have an exponent of 0 (the first bit is 2^0 = 1, the second is 2^-1 = 0.5 etc.).
Since the first bit of a 24-bit value is worth 2^23, you can calculate the exact decimal values by dividing the 24-bit values (13421773 and 11744051) by 2^23 = 8388608.
The values are: 1.60000002384185791015625 and 1.39999997615814208984375.
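The 24-bit values above can be read straight out of the float's bit pattern. This sketch uses a hypothetical helper, Decompose, written as a C# local function; it assumes a positive normal float and BitConverter.SingleToInt32Bits (available in .NET Core 2.0 and later).

```csharp
using System;

// Hypothetical helper: returns the 24-bit significand (with the implicit leading 1
// restored) and the unbiased exponent of a positive, normal float.
(int Significand, int Exponent) Decompose(float value)
{
    int bits = BitConverter.SingleToInt32Bits(value);
    int significand = (bits & 0x7FFFFF) | 0x800000;  // 23 stored bits + implicit leading 1
    int exponent = ((bits >> 23) & 0xFF) - 127;       // remove the exponent bias
    return (significand, exponent);
}

var (s16, e16) = Decompose(1.6f);
var (s14, e14) = Decompose(1.4f);
Console.WriteLine($"1.6f: significand {s16}, exponent {e16}");  // 1.6f: significand 13421773, exponent 0
Console.WriteLine($"1.4f: significand {s14}, exponent {e14}");  // 1.4f: significand 11744051, exponent 0

// The exact value is significand / 2^23; decimal can hold these quotients exactly.
decimal exact16 = s16 / 8388608m;  // equals 1.60000002384185791015625
decimal exact14 = s14 / 8388608m;  // equals 1.39999997615814208984375
Console.WriteLine(exact16);
Console.WriteLine(exact14);
```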
When using floating-point types you always have to consider that their precision is finite. Values that can be written exactly as decimal values might be rounded up or down when represented in binary. Casting to int does not account for that, because it truncates the value toward zero. You should always use something like Math.Round.
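Here is a minimal sketch of truncation versus Math.Round, in C# top-level statements. Whether (int)(a * 100) itself prints 139 can depend on the precision the runtime uses for the intermediate product (computed strictly in float, 1.4f * 100 rounds to exactly 140), so the sketch widens to double first to make the truncation deterministic.

```csharp
using System;

float a = (float)1.4;

// Widen to double so the product is exact. A float-only multiply may round the
// result back to exactly 140, depending on the precision the runtime uses.
double product = (double)a * 100;        // exactly 139.999997615814208984375

int truncated = (int)product;            // the cast truncates toward zero
int rounded = (int)Math.Round(product);  // rounds to the nearest integer

Console.WriteLine(truncated);  // 139
Console.WriteLine(rounded);    // 140
```

Math.Round(double) uses banker's rounding (MidpointRounding.ToEven) by default, which only matters for exact .5 ties.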
If you really need an exact representation of rational numbers you need a completely different approach. Since rational numbers are fractions you can use integers to represent them. Here is an example of how you can achieve that.
However, you can not write Rational x = (Rational)1.6 then. You have to write something like Rational x = new Rational(8, 5) (or new Rational(16, 10) etc.).
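The linked example is not reproduced here. As a minimal sketch of the same idea (not the answerer's code), a rational can be kept as a numerator/denominator pair of integers reduced by their greatest common divisor. Rational, Multiply and Gcd are hypothetical local functions returning C# tuples, and long overflow is not handled.

```csharp
using System;

// Hypothetical helpers: a rational number is kept as a (numerator, denominator)
// pair of longs, reduced to lowest terms with a positive denominator.
// Overflow of the long values is not handled in this sketch.
(long Num, long Den) Rational(long num, long den)
{
    if (den == 0) throw new DivideByZeroException();
    if (den < 0) { num = -num; den = -den; }
    long g = Gcd(Math.Abs(num), den);
    return (num / g, den / g);
}

long Gcd(long a, long b)
{
    while (b != 0) (a, b) = (b, a % b);
    return a;
}

(long Num, long Den) Multiply((long Num, long Den) left, (long Num, long Den) right) =>
    Rational(left.Num * right.Num, left.Den * right.Den);

var eightFifths = Rational(16, 10);  // same value as 8/5
var sevenFifths = Rational(7, 5);
var product16 = Multiply(eightFifths, Rational(100, 1));
var product14 = Multiply(sevenFifths, Rational(100, 1));

// Integer division of an exact rational truncates without any representation error.
Console.WriteLine(product16.Num / product16.Den);  // 160
Console.WriteLine(product14.Num / product14.Den);  // 140
```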
This is due to the fact that floating point arithmetic is not precise. When you set a to 1.4, internally it may not be exactly 1.4, just as close as can be made with machine precision. If it is fractionally less than 1.4, then multiplying by 100 and casting to integer will take only the integer portion which in this case would be 139. You will get far more technically precise answers but essentially this is what is happening.
In the case of your output for the 1.6 case, the floating point representation may actually be minutely larger than 1.6 and so when you multiply by 100, the total is slightly larger than 160 and so the integer cast gives you what you expect. The fact is that there is simply not enough precision available in a computer to store every real number exactly.
See this link for details of the conversion from floating point to integer types http://msdn.microsoft.com/en-us/library/aa691289%28v=vs.71%29.aspx - it has its own section.
The floating-point types float (32 bit) and double (64 bit) have limited precision, and moreover the value is represented internally in binary. Just as you cannot represent 1/7 precisely in the decimal system (~ 0.1428571428571428...), you cannot represent 1/10 precisely in binary.
You can, however, use the decimal type. It still has limited (though high) precision, but numbers are represented internally in decimal. Therefore a value like 1/10 is represented exactly, as 0.1000000000000000000000000000. 1/7 is still a problem for decimal, but at least you don't lose precision by converting to binary and then back to decimal.
Consider using decimal.
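Both halves of that argument fit in a few lines of C# top-level statements: 1/10 is exact in decimal but not in double, while 1/7 has to be rounded in decimal too (to 28 decimal places), which shows up after multiplying back by 7.

```csharp
using System;

// 1/10 is exact in decimal but not in double.
Console.WriteLine(0.1 + 0.2 == 0.3);     // False
Console.WriteLine(0.1m + 0.2m == 0.3m);  // True

// 1/7 has no finite decimal expansion, so decimal must round it too.
decimal seventh = 1m / 7m;               // 0.1428571428571428571428571429
Console.WriteLine(seventh * 7m == 1m);   // False: the product is 1.0000000000000000000000000003
```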
During our lunch break we started debating the precision of the double value type.
My colleague thinks it always has 15 places after the decimal point.
In my opinion one can't tell, because IEEE 754 does not make assumptions about this, and it depends on where the first 1 is in the binary representation (i.e. the size of the number before the decimal point counts, too).
How can one make a more qualified statement?
As stated by the C# reference, the precision is from 15 to 16 digits (depending on the decimal values represented) before or after the decimal point.
In short, you are right, it depends on the values before and after the decimal point.
For example:
12345678.1234567D //Next digit to the right will get rounded up
1234567.12345678D //Next digit to the right will get rounded up
Full sample at: http://ideone.com/eXvz3
Also, thinking of double values as fixed-point decimal values is not a good idea.
You're both wrong. A normal double has 53 bits of precision. That's roughly equivalent to 16 decimal digits, but thinking of double values as though they were decimals leads to no end of confusion, and is best avoided.
That said, you are much closer to correct than your colleague: the precision is relative to the value being represented, and sufficiently large doubles have no fractional digits of precision at all.
For example, the next double larger than 4503599627370496.0 is 4503599627370497.0.
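The gap between neighboring doubles can be measured directly. This sketch defines a hypothetical local function, Ulp, on top of Math.BitIncrement (which requires .NET Core 3.0 or later); the gap grows with the magnitude, reaching 1 at 2^52 = 4503599627370496 and 2 at 2^53.

```csharp
using System;

// Gap between a double and the next larger double (one unit in the last place).
// Math.BitIncrement requires .NET Core 3.0 or later.
double Ulp(double value) => Math.BitIncrement(value) - value;

foreach (double v in new[] { 1.0, 1000.0, 1e6, 4503599627370496.0, 9007199254740992.0 })
{
    Console.WriteLine($"{v:R}: next double is {Ulp(v):R} larger");
}
```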
C# doubles are represented according to IEEE 754 with a 53-bit significand p (or mantissa; 52 bits are stored and the leading 1 is implicit) and an 11-bit exponent e, which for normal numbers ranges from -1022 to 1023. Their value is therefore
p * 2^e
The significand always has one digit before the binary point, so the precision of its fractional part is fixed at 52 bits. The number of digits after the binary point in the double itself, however, also depends on its exponent: numbers whose exponent is 52 or more have no fractional part at all.
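To make p * 2^e concrete, this sketch splits a double into its bits with a hypothetical local function, Split. It treats p as a 53-bit integer, so the value is p * 2^(e - 52), and counts how many positions of the significand fall after the binary point. It assumes positive normal doubles and uses Math.ScaleB (.NET Core 3.0 or later) to rebuild the value.

```csharp
using System;

// Hypothetical helper: split a positive normal double into its 53-bit integer
// significand and unbiased exponent, so value == significand * 2^(exponent - 52).
(long Significand, int Exponent) Split(double value)
{
    long bits = BitConverter.DoubleToInt64Bits(value);
    long significand = (bits & 0xF_FFFF_FFFF_FFFFL) | (1L << 52);  // restore the implicit leading 1
    int exponent = (int)((bits >> 52) & 0x7FF) - 1023;              // remove the exponent bias
    return (significand, exponent);
}

foreach (double v in new[] { 0.1, 5.625, 4503599627370496.0 })
{
    var (p, e) = Split(v);
    // The lowest significand bit is worth 2^(e - 52), so this many positions lie after the binary point.
    int fractionPositions = Math.Max(0, 52 - e);
    bool rebuilt = Math.ScaleB(p, e - 52) == v;  // p * 2^(e - 52) reproduces the value exactly
    Console.WriteLine($"{v:R}: significand {p}, exponent {e}, fraction positions {fractionPositions}, rebuilt {rebuilt}");
}
```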
What Every Computer Scientist Should Know About Floating-Point Arithmetic is probably the most widely recognized publication on this subject.
Since this is the only question on SO that I could find on this topic, I would like to make an addition to jorgebg's answer.
According to this, precision is actually 15-17 digits. An example of a double with 17 digits of precision would be 0.92107099070578813 (don't ask me how I got that number :P)
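The 17-digit upper bound is the number of significant digits needed to guarantee that any double survives a trip through text. A minimal C# sketch comparing the "G15" and "G17" formats on 0.1 + 0.2:

```csharp
using System;
using System.Globalization;

double sum = 0.1 + 0.2;
string g15 = sum.ToString("G15", CultureInfo.InvariantCulture);  // "0.3"
string g17 = sum.ToString("G17", CultureInfo.InvariantCulture);  // "0.30000000000000004"

// 15 significant digits are not always enough to get the same double back; 17 always are.
Console.WriteLine(double.Parse(g15, CultureInfo.InvariantCulture) == sum);  // False
Console.WriteLine(double.Parse(g17, CultureInfo.InvariantCulture) == sum);  // True
```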
The Double data type cannot correctly represent some base 10 values. This is because of how floating point numbers represent real numbers. What this means is that when representing monetary values, one should use the decimal value type to prevent errors. (feel free to correct errors in this preamble)
What I want to know is what are the values which present such a problem under the Double data-type under a 64 bit architecture in the standard .Net framework (C# if that makes a difference) ?
I expect the answer to be a formula or rule for finding such values, but I would also like some example values.
Any number which cannot be written as a finite sum of powers of 2 (with positive or negative exponents) cannot be represented exactly as a binary floating-point number.
The common IEEE formats for 32- and 64-bit representations of floating-point numbers impose further constraints; they limit the number of binary digits in both the significand and the exponent. So there are maximum and minimum representable numbers (approximately +/- 10^308 (base-10) if memory serves) and limits to the precision of a number that can be represented. This limit on the precision means that, for 64-bit numbers, the difference between the exponent of the largest power of 2 and the smallest power in a number is limited to 52, so if your number includes a term in 2^52 it can't also include a term in 2^-1.
Simple examples of numbers which cannot be exactly represented in binary floating-point numbers include 1/3, 2/3, 1/5.
Since the set of floating-point numbers (in any representation) is finite, and the set of real numbers is infinite, one algorithm to find a real number which is not exactly representable as a floating-point number is to select a real number at random. The probability that the real number is exactly representable as a floating-point number is 0.
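The rule asked for in the question can be stated for fractions: num/den is exactly representable as a double when, in lowest terms, the denominator is a power of two and the odd part of the numerator fits in the 53-bit significand. This sketch encodes that rule in a hypothetical local function, IsExactDouble; it only considers 64-bit integer inputs, for which the exponent range never becomes the limiting factor.

```csharp
using System;

// Hypothetical helper: is num/den exactly representable as a double?
// With 64-bit integer inputs the exponent range is never the limiting factor.
// (long.MinValue is not handled in this sketch.)
bool IsExactDouble(long num, long den)
{
    if (den <= 0) throw new ArgumentOutOfRangeException(nameof(den));
    long g = Gcd(Math.Abs(num), den);
    long n = Math.Abs(num) / g;
    long d = den / g;
    if ((d & (d - 1)) != 0) return false;    // denominator is not a power of two
    while (n != 0 && (n & 1) == 0) n >>= 1;  // strip factors of two from the numerator
    return n < (1L << 53);                   // the odd part must fit in 53 bits
}

long Gcd(long a, long b)
{
    while (b != 0) (a, b) = (b, a % b);
    return a;
}

Console.WriteLine(IsExactDouble(5625, 1000));         // True: 5.625 = 45/8
Console.WriteLine(IsExactDouble(1, 10));              // False: 1/10 keeps a factor of 5 in the denominator
Console.WriteLine(IsExactDouble(1, 3));               // False
Console.WriteLine(IsExactDouble((1L << 53) + 1, 1));  // False: needs 54 significant bits
```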
You generally need to be prepared for the possibility that any value you store in a double has some small amount of error. Unless you're storing a constant value, chances are it could be something with at least some error. If it's imperative that there never be any error, and the values aren't constant, you probably shouldn't be using a floating point type.
What you probably should be asking in many cases is, "How do I deal with the minor floating point errors?" You'll want to know what types of operations can result in a lot of error, and what types don't. You'll want to ensure that comparing two values for "equality" actually just ensures they are "close enough" rather than exactly equal, etc.
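A "close enough" comparison can be sketched as a combined absolute and relative tolerance check. NearlyEqual is a hypothetical local function, and its default tolerances are illustrative assumptions that should be chosen per application.

```csharp
using System;

// A sketch of a "close enough" comparison: two values count as equal when they
// differ by at most an absolute tolerance (useful near zero) or a relative one.
// The default tolerances are illustrative assumptions, not universal constants.
bool NearlyEqual(double a, double b, double absTol = 1e-12, double relTol = 1e-9)
{
    if (a == b) return true;  // exact matches, including equal infinities
    if (double.IsInfinity(a) || double.IsInfinity(b)) return false;
    double diff = Math.Abs(a - b);  // NaN inputs make both comparisons below false
    return diff <= absTol || diff <= relTol * Math.Max(Math.Abs(a), Math.Abs(b));
}

double sum = 0.1 + 0.2;
Console.WriteLine(sum == 0.3);             // False
Console.WriteLine(NearlyEqual(sum, 0.3));  // True
```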
This question actually goes beyond any single programming language or platform. The inaccuracy is actually inherent in binary data.
Consider that in a double, each binary digit N at 0-based index I to the left of the binary point represents the value N * 2^I, and each digit N at 1-based index I to the right of the binary point represents the value N * 2^(-I).
As an example, 5.625 (base 10) would be 101.101 (base 2).
Given this calculation, any decimal value whose fractional part can't be written as a finite sum of terms 2^(-I) for different values of I will be stored inexactly as a double.
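The digit-by-digit view above can be computed by repeated doubling: each doubling moves the next binary digit in front of the point. This sketch uses a hypothetical local function, FractionToBinary, and does the arithmetic in decimal so that inputs like 0.1m stay exact and the repeating pattern is visible.

```csharp
using System;
using System.Text;

// Hypothetical helper: expands a fraction in [0, 1) in base 2 by repeated doubling.
// decimal arithmetic keeps inputs like 0.1m exact, so the repeating pattern is visible.
string FractionToBinary(decimal fraction, int maxBits)
{
    var sb = new StringBuilder("0.");
    for (int i = 0; i < maxBits && fraction != 0m; i++)
    {
        fraction *= 2;
        if (fraction >= 1m)
        {
            sb.Append('1');
            fraction -= 1m;
        }
        else
        {
            sb.Append('0');
        }
    }
    if (fraction != 0m) sb.Append("...");  // did not terminate within maxBits digits
    return sb.ToString();
}

Console.WriteLine(FractionToBinary(0.625m, 20));  // 0.101 (so 5.625 is 101.101 in binary)
Console.WriteLine(FractionToBinary(0.1m, 20));    // 0.00011001100110011001...
```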
A floating-point number is represented by s, e and m in the following formula
s * m * 2^e
This means that any number that cannot be represented using the given expression (and in the respective domains of s, e and m) cannot be represented exactly.
Basically, for a double you can represent every integer between 0 and 2^53 - 1 multiplied by a certain power of two (possibly a negative power).
As an example, all integers between 0 and 2^53 - 1 can be represented multiplied by 2^0 = 1. You can also represent all of them divided by 2 (the odd ones then get a .5 fraction), and so on.
This answer does not fully cover the topic, but I hope it helps.
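The 2^53 boundary is easy to check with integer round trips. This is a minimal C# sketch; the conversion of 2^53 + 1 to double rounds to the nearest representable value, which here is 2^53.

```csharp
using System;

long limit = 1L << 53;  // 9007199254740992

// Integers up to 2^53 survive a round trip through double unchanged...
Console.WriteLine((long)(double)(limit - 1) == limit - 1);  // True
Console.WriteLine((long)(double)limit == limit);            // True

// ...but 2^53 + 1 would need a 54-bit significand, so it rounds (here down to 2^53).
Console.WriteLine((double)(limit + 1) == (double)limit);    // True

// Dividing by a power of two stays exact: (2^53 - 1) / 2 keeps its .5 fraction.
double half = (limit - 1) / 2.0;
Console.WriteLine(half == 4503599627370495.5);              // True
```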
This question already has answers here:
When should I use double instead of decimal?
(12 answers)
Closed 9 years ago.
I keep seeing people using doubles in C#. I know I read somewhere that doubles sometimes lose precision.
My question is: when should I use a double, and when should I use a decimal type?
Which type is suitable for money computations (i.e. greater than $100 million)?
For money, always decimal. It's why it was created.
If numbers must add up correctly or balance, use decimal. This includes any financial storage or calculations, scores, or other numbers that people might do by hand.
If the exact value of numbers is not important, use double for speed. This includes graphics, physics or other physical sciences computations where there is already a "number of significant digits".
My question is when should I use a double and when should I use a decimal type?
decimal for when you work with values in the range of 10^(+/-28) and where you have expectations about the behaviour based on base 10 representations - basically money.
double for when you need relative accuracy (i.e. losing precision in the trailing digits on large values is not a problem) across wildly different magnitudes - double covers more than 10^(+/-300). Scientific calculations are the best example here.
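The two ranges also fail differently at their edges. This C# sketch shows double overflowing to infinity, decimal throwing an OverflowException, and decimal rounding results below its smallest positive value (1e-28) to zero.

```csharp
using System;

Console.WriteLine(decimal.MaxValue);  // 79228162514264337593543950335, about 7.9e28
Console.WriteLine(double.MaxValue);   // about 1.8e308

// Past its range, double silently becomes infinity...
double bigDouble = double.MaxValue;
bigDouble *= 2;
Console.WriteLine(double.IsPositiveInfinity(bigDouble));  // True

// ...while decimal throws an OverflowException.
bool decimalOverflowed = false;
try
{
    decimal bigDecimal = decimal.MaxValue;
    bigDecimal *= 2;
}
catch (OverflowException)
{
    decimalOverflowed = true;
}
Console.WriteLine(decimalOverflowed);  // True

// Below 1e-28 decimal has no digits left, so the result rounds to zero.
decimal tiny = 0.0000000000000000000000000001m;  // 1e-28, the smallest positive decimal
Console.WriteLine(tiny / 10 == 0m);              // True
```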
which type is suitable for money computations?
decimal, decimal, decimal
Accept no substitutes.
The most important factor is that double, being implemented as a binary fraction, cannot accurately represent many decimal fractions (like 0.1) at all and its overall number of digits is smaller since it is 64-bit wide vs. 128-bit for decimal. Finally, financial applications often have to follow specific rounding modes (sometimes mandated by law). decimal supports these; double does not.
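Because decimal stores 2.345 exactly, a midpoint really is a midpoint, and the rounding mode decides the result. A minimal C# sketch using Math.Round with the MidpointRounding enumeration:

```csharp
using System;

decimal amount = 2.345m;  // an exact tie between 2.34 and 2.35 in decimal

// Math.Round(decimal, int) defaults to banker's rounding (MidpointRounding.ToEven).
Console.WriteLine(Math.Round(amount, 2) == 2.34m);                                 // True
Console.WriteLine(Math.Round(amount, 2, MidpointRounding.ToEven) == 2.34m);        // True
Console.WriteLine(Math.Round(amount, 2, MidpointRounding.AwayFromZero) == 2.35m);  // True
```

With double, 2.345 is not stored exactly, so whether a value counts as a tie depends on its binary representation rather than on the decimal digits you wrote.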
According to Characteristics of the floating-point types:
.NET Type | C# Keyword | Precision
System.Single | float | ~6-9 digits
System.Double | double | ~15-17 digits
System.Decimal | decimal | 28-29 digits
The way I've been stung by using the wrong type (a good few years ago) is with large amounts:
£520,532.52 - 8 digits
£1,323,523.12 - 9 digits
You run out at 1 million for a float.
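Widening a float to double (which is exact) shows what the float actually stored for those two amounts. A minimal C# sketch; the spacing between neighboring floats is 1/32 just below 2^19 and 1/8 just above 2^20.

```csharp
using System;
using System.Globalization;

float smaller = 520532.52f;
float larger = 1323523.12f;

// Widening to double shows exactly what the float holds; both widenings are exact.
Console.WriteLine(((double)smaller).ToString(CultureInfo.InvariantCulture));  // 520532.53125 (float spacing here is 1/32)
Console.WriteLine(((double)larger).ToString(CultureInfo.InvariantCulture));   // 1323523.125 (float spacing here is 1/8)
```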
A 15 digit monetary value:
£1,234,567,890,123.45
You run out at about 9 trillion with a double. But with division and comparisons it's more complicated (I'm definitely no expert in floating point and irrational numbers; see Marc's point). Mixing decimals and doubles causes issues:
A mathematical or comparison operation that uses a floating-point number might not yield the same result if a decimal number is used because the floating-point number might not exactly approximate the decimal number.
"When should I use double instead of decimal?" has some similar and more in-depth answers.
Using double instead of decimal for monetary applications is a micro-optimization - that's the simplest way I look at it.
Decimal is for exact values. Double is for approximate values.
USD: $12,345.67 (Decimal)
CAD: $13,617.27 (Decimal)
Exchange Rate: 1.102932 (Double)
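In C#, mixing the two types this way needs an explicit conversion, because decimal * double does not compile. This sketch converts the rate once and rounds the result to whole cents; the amount of 100.00 and the choice of MidpointRounding.AwayFromZero are illustrative assumptions.

```csharp
using System;

decimal usd = 100.00m;   // money: decimal (illustrative amount)
double rate = 1.102932;  // exchange rate as an approximate double

// C# has no implicit conversion between double and decimal, so usd * rate does not
// compile; convert the rate once, then round the result to whole cents.
decimal cad = Math.Round(usd * (decimal)rate, 2, MidpointRounding.AwayFromZero);
Console.WriteLine(cad);
```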
For money: decimal. It costs a little more memory, but doesn't have rounding troubles like double sometimes has.
Definitely use integer types for your money computations.
This cannot be emphasized enough since at first glance it might seem that a floating point type is adequate.
Here is an example in Python:
>>> amount = float(100.00) # one hundred dollars
>>> print(amount)
100.0
>>> new_amount = amount + 1
>>> print(new_amount)
101.0
>>> print(new_amount - amount)
1.0
Looks pretty normal.
Now try this again with 10^20 Zimbabwe dollars:
>>> amount = float(1e20)
>>> print(amount)
1e+20
>>> new_amount = amount + 1
>>> print(new_amount)
1e+20
>>> print(new_amount - amount)
0.0
As you can see, the dollar disappeared.
If you use the integer type, it works fine:
>>> amount = int(1e20)
>>> print(amount)
100000000000000000000
>>> new_amount = amount + 1
>>> print(new_amount)
100000000000000000001
>>> print(new_amount - amount)
1
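The same experiment in C# needs a type choice, since long tops out near 9.2e18 and cannot hold 1e20. This sketch compares double with decimal and with System.Numerics.BigInteger, which plays the role of Python's unbounded int here.

```csharp
using System;
using System.Numerics;

double amountDouble = 1e20;
decimal amountDecimal = 100_000_000_000_000_000_000m;  // 1e20 fits in decimal's 96-bit coefficient
BigInteger amountInteger = BigInteger.Pow(10, 20);     // long (max about 9.2e18) is too small here

double doublePlusOne = amountDouble + 1;               // the nearest double to 1e20 + 1 is 1e20 itself
Console.WriteLine(doublePlusOne - amountDouble);       // 0: the dollar disappeared
Console.WriteLine(amountDecimal + 1 - amountDecimal);  // 1
Console.WriteLine(amountInteger + 1 - amountInteger);  // 1
```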
I think that the main difference, besides bit width, is that decimal has an exponent base of 10 and double has a base of 2.
http://software-product-development.blogspot.com/2008/07/net-double-vs-decimal.html
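The base-10 exponent can be inspected with decimal.GetBits, which returns the 96-bit integer coefficient as three ints followed by a flags word whose bits 16-23 hold the power-of-ten scale. A minimal C# sketch, with the double counterpart written as a power of two:

```csharp
using System;

// decimal stores an integer coefficient and a power-of-ten scale:
// value = coefficient / 10^scale. decimal.GetBits returns the low, middle and high
// 32 bits of the 96-bit coefficient, then a flags word (scale in bits 16-23, sign in bit 31).
int[] parts = decimal.GetBits(1.25m);
int coefficientLow = parts[0];
int scale = (parts[3] >> 16) & 0xFF;
Console.WriteLine($"1.25m = {coefficientLow} / 10^{scale}");  // 1.25m = 125 / 10^2

// double uses a power-of-two exponent instead: 1.25 = 5 * 2^-2.
Console.WriteLine(Math.ScaleB(5, -2) == 1.25);  // True
```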