There is a similar question here. Sometimes that solution throws exceptions because the numbers might be too large.
I think that if there were a way of looking at the bytes of a decimal number, it would be more efficient. A decimal number has to be represented by some number of bytes; an Int32, for example, is represented by 32 bits, and all the numbers whose first bit is 1 are negative. Maybe there is some similar relationship for decimal numbers. How can you look at the bytes of a decimal number, or the bytes of an integer?
If you are really talking about decimal numbers (as opposed to floating-point numbers), then Decimal.GetBits will let you look at the individual bits of a decimal. The MSDN page also contains a description of the meaning of the bits.
On the other hand, if you just want to check whether a number has a fractional part or not, doing a simple
var hasFractionalPart = (myValue - Math.Round(myValue) != 0);
is much easier than decoding the binary structure. This should work for decimals as well as classic floating-point data types such as float or double. In the latter case, due to floating-point rounding error, it might make sense to check for Math.Abs(myValue - Math.Round(myValue)) < someThreshold instead of comparing to 0.
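For illustration, a minimal sketch of both checks (the threshold value is an arbitrary choice you would tune per use case):
decimal d = 4.75m;
bool decimalHasFraction = d - Math.Round(d) != 0;   // exact: decimal arithmetic has no binary rounding error

double x = 4.75;
const double threshold = 1e-9;                      // assumption: pick a tolerance that fits your data
bool doubleHasFraction = Math.Abs(x - Math.Round(x)) > threshold;

Console.WriteLine(decimalHasFraction);  // True
Console.WriteLine(doubleHasFraction);   // True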
If you want a reasonably efficient way of getting the 'decimal' value of a decimal type you can just mod it by one.
decimal number = 4.75M;
decimal fractionalPart = number % 1;
Console.WriteLine(fractionalPart); //will print 0.75
While it may not be the theoretically optimal solution, it'll be quite fast, and almost certainly fast enough for your purposes (far better than string manipulation and parsing, which is a common naive approach).
You can use Decimal.GetBits in order to retrieve the bits from a decimal structure.
The MSDN page linked above details how they are laid out in memory:
The binary representation of a Decimal number consists of a 1-bit sign, a 96-bit integer number, and a scaling factor used to divide the integer number and specify what portion of it is a decimal fraction. The scaling factor is implicitly the number 10, raised to an exponent ranging from 0 to 28.
The return value is a four-element array of 32-bit signed integers.
The first, second, and third elements of the returned array contain the low, middle, and high 32 bits of the 96-bit integer number.
The fourth element of the returned array contains the scale factor and sign. It consists of the following parts:
Bits 0 to 15, the lower word, are unused and must be zero.
Bits 16 to 23 must contain an exponent between 0 and 28, which indicates the power of 10 to divide the integer number.
Bits 24 to 30 are unused and must be zero.
Bit 31 contains the sign; 0 meaning positive, and 1 meaning negative.
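To make the layout concrete, here is a small sketch that decodes those four elements back into the parts described above (variable names are mine, not from the API):
decimal value = -123.4500m;
int[] bits = Decimal.GetBits(value);

// Reassemble the 96-bit integer from the low, middle and high 32-bit parts.
var integer = (new System.Numerics.BigInteger((uint)bits[2]) << 64)
            | (new System.Numerics.BigInteger((uint)bits[1]) << 32)
            | (uint)bits[0];

int scale = (bits[3] >> 16) & 0xFF;   // bits 16 to 23: the power of 10 to divide by
bool isNegative = bits[3] < 0;        // bit 31: the sign

Console.WriteLine(integer);     // 1234500
Console.WriteLine(scale);       // 4 (note that trailing zeros are preserved)
Console.WriteLine(isNegative);  // True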
Going with Oded's detailed info to use GetBits, I came up with this
const int EXP_MASK = 0x00FF0000;
bool hasDecimal = (Decimal.GetBits(value)[3] & EXP_MASK) != 0x0;
I've been doing research on floating-point doubles in .NET lately. While reading Jon Skeet's article Binary floating points and .NET, I had a question.
Let's start with the example of 46.428292315077 in the article.
Represented as a 64 bit double, this equates to the following bits:
Sign  Exponent     Mantissa
0     10000000100  0111001101101101001001001000010101110011000100100011
One bit is used to represent the sign, 11 bits are used to represent the exponent, and 52 bits are used to represent the mantissa. Note the bias of 1023 for doubles (which I assume is to allow for negative exponents - more on this later).
My confusion is with the 11 bits which represent the exponent, and their use (or lack thereof) for large numbers, specifically double.MaxValue (1.7976931348623157E+308).
For the exponent, there are a few special values as cited in the article which help determine a number's value. All zeroes represent 0; all ones represent NaN and positive/negative infinity. There are 11 bits to work with: the first bit of the exponent is bias, so we can disregard that. This gives us 10 bits which control the actual size of the exponent.
The exponent on double.MaxValue is 308, which can be represented with 9 bits (100110100, or with bias: 10100110100). The smallest fractional value is double.Epsilon (4.94065645841247E-324), and its exponent can still be represented in 9 bits (101000100, or with bias: 00101000100).
You might notice that the first bit after the bias always seems to be wasted. Are my assumptions about negative exponents correct? If so, why is the second bit after the bias wasted? Regardless, it seems like the actual largest number we could represent (while respecting the special values and a possible sign bit after the bias) is 111111111 (or 511 in base 10).
If the bit after the bias is actually wasted, why can't we represent numbers with exponents larger than 324? What am I misunderstanding about this?
There are no wasted bits in a double.
Let's sort out your confusion. How do we turn a double from bits into a mathematical value? Let's assume the double is not zero, infinity, negative infinity, NaN or a denormal, because those all have special rules.
The crux of your confusion is mixing up decimal quantities with binary quantities. For this answer I'll put all binary quantities in this formatting and decimal quantities in regular formatting.
We take the 52 bits of the mantissa and we put them after 1. So in your example, that would be
1.0111001101101101001001001000010101110011000100100011
That's a binary number. So 1 + 0/2 + 1/4 + 1/8 + 1/16 + 0/32 ...
Then we take the 11 bits of the exponent, treat that as an 11 bit unsigned integer, and subtract 1023 from that value. So in your example we have 10000000100 which is the unsigned integer 1028. Subtract 1023, and we get 5.
Now we shift the "decimal place" (ha ha) by 5 places:
101110.01101101101001001001000010101110011000100100011
Note that this is equivalent to multiplying by 2^5. It is not multiplying by 10^5!
And now we multiply the whole thing by 1 if the sign bit is 0, and -1 if the sign bit is 1. So the final answer is
101110.01101101101001001001000010101110011000100100011
Let's see an example with a negative exponent.
Suppose the exponent had been 01111111100. That's 1020 as an unsigned integer. Subtract 1023. We get -3, so we would shift three places to the left, and get:
0.0010111001101101101001001001000010101110011000100100011
Let's see an example with a large exponent. What if the exponent had been 11111111100 ?
Work it out. That's 2044 in decimal. Subtract 1023. That's 1021. So this number would be the extremely large number that you get when multiplying 1.0111001101101101001001001000010101110011000100100011 by 2^1021.
So the value of that double is exactly equal to
32603055608669827528875188998863283395233949199438288081243712122350844851941321466156747022359800582932574058697506453751658312301708309704448596122037141141297743099124156580613023692715652869864010740666615694378079258090383719888417882332809291228958035810952632190230935024250237637887765563383983636480
Which is approximately 3.26030556 × 10^307.
Is that now clear?
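If you want to check this yourself, here is a sketch (my own, not the code linked below) that extracts the three fields from a double with BitConverter:
double d = 46.428292315077;
long bits = BitConverter.DoubleToInt64Bits(d);

int sign = (int)((bits >> 63) & 1);             // 1 bit
int rawExponent = (int)((bits >> 52) & 0x7FF);  // 11 bits, biased by 1023
long mantissa = bits & 0xFFFFFFFFFFFFFL;        // 52 stored bits, implicit leading 1

Console.WriteLine(sign);                // 0
Console.WriteLine(rawExponent - 1023);  // 5
Console.WriteLine(Convert.ToString(mantissa, 2).PadLeft(52, '0'));
// 0111001101101101001001001000010101110011000100100011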
If this subject interests you, here's some further reading:
Code to decode a double into its parts:
https://ericlippert.com/2015/11/30/the-dedoublifier-part-one/
A simple arbitrary-precision rational:
https://ericlippert.com/2015/12/03/the-dedoublifier-part-two/
Code to turn a double into its exact rational:
https://ericlippert.com/2015/12/07/the-dedoublifier-part-three/
Representation of floats:
https://blogs.msdn.microsoft.com/ericlippert/2005/01/10/floating-point-arithmetic-part-one/
How Benford's Law is used to minimize representation errors:
https://blogs.msdn.microsoft.com/ericlippert/2005/01/13/floating-point-and-benfords-law-part-two/
What algorithm do we use to display floats as decimal quantities?
https://blogs.msdn.microsoft.com/ericlippert/2005/01/17/fun-with-floating-point-arithmetic-part-three/
What happens when you try to compare for equality floats of different precision levels?
https://blogs.msdn.microsoft.com/ericlippert/2005/01/18/fun-with-floating-point-arithmetic-part-four/
What properties of standard arithmetic fail to hold in floating point?
https://blogs.msdn.microsoft.com/ericlippert/2005/01/20/fun-with-floating-point-arithmetic-part-five/
How are infinities and divisions by zero represented?
https://blogs.msdn.microsoft.com/ericlippert/2009/10/15/as-timeless-as-infinity/
I am wondering how you take a number (for example 9), convert it to a 32 int (00000000000000000000000000001001), then invert or flip every bit (11111111111111111111111111110110) so that the zeroes become ones and the ones become zeroes.
I know how to do that by replacing the numbers in a string, but I need to know how to do that with binary operators on a binary number.
I think you have to use this operator, "~", but it just gives me a negative number when I use it on a value.
That is actually the correct behavior. The int data type in C# is a signed integer, so 11111111111111111111111111110110 is in fact a negative number.
As Marc pointed out, if you want to use unsigned values declare your number as a uint.
If you look at the decimal version of your number, then it's a negative number.
If you declare it as an unsigned int, then it's a positive one.
But this doesn't matter: in binary it will always be 11111111111111111111111111110110.
Try this:
int number = 9;
Console.WriteLine(Convert.ToString(number, 2)); //Gives you 1001
number = ~number; //Invert all bits
Console.WriteLine(Convert.ToString(number, 2));
//Gives you your wanted result: 11111111111111111111111111110110
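For comparison, a sketch of the unsigned variant (the cast to long is only there because Convert.ToString has no uint overload):
uint number = 9;
uint inverted = ~number;   // 4294967286: positive, because uint has no sign bit
Console.WriteLine(inverted);
Console.WriteLine(Convert.ToString((long)inverted, 2));
// Same bit pattern: 11111111111111111111111111110110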
Here comes a silly question. I'm playing with the Parse function of System.Single, and it behaves unexpectedly, which might be because I don't really understand floating-point numbers. The MSDN page for System.Single.MaxValue states that the max value is 3.402823e38, which in standard form is
340282300000000000000000000000000000000
If I use this string as an argument for the Parse() method, it succeeds without error. If I change any of the zeros to an arbitrary digit, it still succeeds (although, looking at the result, it seems to ignore them). In my understanding, that exceeds the limit, so what am I missing?
It may be easier to think about this by looking at some lower numbers. All (positive) integers up to 16777216 can be exactly represented in a float. After that point, only every other integer can be represented (up to the next time we hit a limit, at which point it's only every 4th integer that can be represented).
So what has to happen then is that 16777218 has to stand for 16777218 ± 1, 16777220 has to stand for 16777220 ± 1, and so on. As you move up into even larger numbers, the range of integers that each value has to "represent" grows wider and wider, up to the point where 340282300000000000000000000000000000000 represents all numbers in the range 340282300000000000000000000000000000000 ± 100000000000000000000000000000000, approximately (I've not actually worked out what the right ± value is here, but hopefully you get the point).
Number       Significand                Exponent
16777215 = 1 11111111111111111111111 × 2^0 = 111111111111111111111111
16777216 = 1 00000000000000000000000 × 2^1 = 1000000000000000000000000
16777218 = 1 00000000000000000000001 × 2^1 = 1000000000000000000000010
           ^
           |
           Implicit leading bit
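A quick sketch to observe the gap in code:
float f = 16777216f;            // 2^24, the last point where every integer fits
Console.WriteLine(f + 1 == f);  // True: 16777217 rounds back to 16777216
Console.WriteLine((float)16777219 == (float)16777220);
// True: only every other integer is representable at this magnitude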
That's actually not true: change the first 0 to 9 and you will see an exception. In fact, change it to anything 6 and up and it blows up.
Any other digit is simply rounded away; since a float is not a 100% accurate representation of a decimal number with 38+1 digits, that's fine.
A floating point number is not like a decimal. It comprises a mantissa that carries the significant digits and an exponent that effectively says how far left or right of the decimal point to place the mantissa. A System.Single can only handle about seven significant digits in the mantissa. If you replace any of your trailing zeroes with an arbitrary digit, it is lost when your decimal number is converted into mantissa-and-exponent form.
Good question. That is happening because the fact that you can store a number in that range doesn't mean the type has enough precision to hold it exactly. A float can only store roughly 6-7 significant decimal digits, plus an exponent to describe the position of the decimal point.
0.012345 and 1234500 hold the same amount of information: same mantissa, different exponents. MSDN states only that the value AFTER exponentiation cannot be bigger than MaxValue.
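A sketch that demonstrates this: both strings below parse successfully into the same float, because the altered digit sits far below the roughly 7 significant decimal digits a float can carry.
float a = float.Parse("340282300000000000000000000000000000000");
float b = float.Parse("340282300000000000000000005000000000000");
Console.WriteLine(a == b);  // True: the extra 5 is lost in rounding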
Possible Duplicate:
Why is floating point arithmetic in C# imprecise?
Why is there a bias in floating point ops? Any specific reason?
Output:
160
139
static void Main()
{
float x = (float) 1.6;
int y = (int)(x * 100);
float a = (float) 1.4;
int b = (int)(a * 100);
Console.WriteLine(y);
Console.WriteLine(b);
Console.ReadKey();
}
Any rational number whose denominator is not a power of 2 leads to an infinite number of digits when represented in binary. Here you have 8/5 and 7/5, so there is no exact binary representation as a floating-point number (unless you have infinite memory).
The exact binary representation of 1.6 is 1.100110011001100110011001100110011...
The exact binary representation of 1.4 is 1.011001100110011001100110011001100...
Both values have an infinite number of binary digits (the four-bit pattern 1100 repeats endlessly).
float values have a precision of 24 bits. So the binary representation of any value will be rounded to 24 bits. If you round the given values to 24 bits you get:
1.6: 110011001100110011001101 (decimal 13421773) - rounded up
1.4: 101100110011001100110011 (decimal 11744051) - rounded down
Both values have an exponent of 0 (the first bit is 2^0 = 1, the second is 2^-1 = 0.5 etc.).
Since the first bit in a 24-bit value is 2^23, you can calculate the exact decimal values by dividing the 24-bit values (13421773 and 11744051) by 2^23.
The values are: 1.60000002384185791015625 and 1.39999997615814208984375.
When using floating-point types you always have to consider that their precision is finite. Values that can be written exact as decimal values might be rounded up or down when represented as binaries. Casting to int does not respect that because it truncates the given values. You should always use something like Math.Round.
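As an assumed example (4.35 is another value with no exact binary representation), a sketch of truncation versus rounding:
double value = 4.35;                              // actually stored as 4.34999999999999964...
Console.WriteLine((int)(value * 100));            // 434: the cast truncates the error away
Console.WriteLine((int)Math.Round(value * 100));  // 435: rounding absorbs the error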
If you really need an exact representation of rational numbers you need a completely different approach. Since rational numbers are fractions you can use integers to represent them. Here is an example of how you can achieve that.
However, you cannot write Rational x = (Rational)1.6 then. You have to write something like Rational x = new Rational(8, 5) (or new Rational(16, 10), etc.).
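A minimal sketch of the idea (this is not the implementation the answer refers to; it omits normalization, overflow checks, and the rest):
var x = new Rational(8, 5);                   // exactly 1.6
Console.WriteLine(x * new Rational(100, 1));  // prints 800/5, i.e. exactly 160

struct Rational
{
    public readonly long Numerator;
    public readonly long Denominator;

    public Rational(long numerator, long denominator)
    {
        Numerator = numerator;
        Denominator = denominator;
    }

    // Multiplying fractions never introduces rounding error.
    public static Rational operator *(Rational a, Rational b) =>
        new Rational(a.Numerator * b.Numerator, a.Denominator * b.Denominator);

    public override string ToString() => $"{Numerator}/{Denominator}";
}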
This is because floating-point arithmetic is not precise. When you set a to 1.4, internally it may not be exactly 1.4, just as close as machine precision allows. If it is fractionally less than 1.4, then multiplying by 100 and casting to integer will take only the integer portion, which in this case would be 139. You will find far more technically precise answers, but essentially this is what is happening.
In the case of your output for the 1.6 case, the floating point representation may actually be minutely larger than 1.6 and so when you multiply by 100, the total is slightly larger than 160 and so the integer cast gives you what you expect. The fact is that there is simply not enough precision available in a computer to store every real number exactly.
See this link for details of the conversion from floating point to integer types http://msdn.microsoft.com/en-us/library/aa691289%28v=vs.71%29.aspx - it has its own section.
The floating point types float (32 bit) and double (64 bit) have a limited precision and more over the value is represented as a binary value internally. Just as you cannot represent 1/7 precisely in a decimal system (~ 0.1428571428571428...), 1/10 cannot be represented precisely in a binary system.
You can however use the decimal type. It still has a limited (though high) precision, but the numbers are represented in a decimal way internally. Therefore a value like 1/10 is represented internally exactly as 0.1000000000000000000000000000. 1/7 is still a problem for decimal. But at least you don't get a loss of precision by converting to binary and then back to decimal.
Consider using decimal.
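A sketch of the difference (the exact output of the double line depends on the runtime's formatting; modern .NET prints the shortest round-trippable string):
double d = 0.0;
decimal m = 0.0m;
for (int i = 0; i < 10; i++)
{
    d += 0.1;    // 0.1 has no exact binary representation
    m += 0.1m;   // 0.1m is stored exactly
}
Console.WriteLine(d);  // 0.9999999999999999 on modern .NET
Console.WriteLine(m);  // 1.0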
The Double data type cannot correctly represent some base 10 values. This is because of how floating point numbers represent real numbers. What this means is that when representing monetary values, one should use the decimal value type to prevent errors. (feel free to correct errors in this preamble)
What I want to know is: what are the values that present such a problem under the Double data type on a 64-bit architecture in the standard .NET Framework (C#, if that makes a difference)?
I expect the answer to be a formula or rule for finding such values, but I would also like some example values.
Any number which cannot be written as a finite sum of powers of 2 (with positive and negative exponents) cannot be exactly represented as a binary floating-point number.
The common IEEE formats for 32- and 64-bit representations of floating-point numbers impose further constraints; they limit the number of binary digits in both the significand and the exponent. So there are maximum and minimum representable numbers (approximately +/- 10^308 (base-10) if memory serves) and limits to the precision of a number that can be represented. This limit on the precision means that, for 64-bit numbers, the difference between the exponent of the largest power of 2 and the smallest power in a number is limited to 52, so if your number includes a term in 2^52 it can't also include a term in 2^-1.
Simple examples of numbers which cannot be exactly represented in binary floating-point numbers include 1/3, 2/3, 1/5.
Since the set of floating-point numbers (in any representation) is finite, and the set of real numbers is infinite, one algorithm to find a real number which is not exactly representable as a floating-point number is to select a real number at random. The probability that the real number is exactly representable as a floating-point number is 0.
You generally need to be prepared for the possibility that any value you store in a double has some small amount of error. Unless you're storing a constant value, chances are it could be something with at least some error. If it's imperative that there never be any error, and the values aren't constant, you probably shouldn't be using a floating point type.
What you probably should be asking in many cases is, "How do I deal with the minor floating point errors?" You'll want to know what types of operations can result in a lot of error, and what types don't. You'll want to ensure that comparing two values for "equality" actually just ensures they are "close enough" rather than exactly equal, etc.
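For the "close enough" comparison, a sketch (the helper name and the tolerance are my own choices, not a standard API):
Console.WriteLine(0.1 + 0.2 == 0.3);             // False: the sum is 0.30000000000000004
Console.WriteLine(NearlyEqual(0.1 + 0.2, 0.3));  // True

static bool NearlyEqual(double a, double b, double tolerance = 1e-9) =>
    Math.Abs(a - b) < tolerance;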
This question actually goes beyond any single programming language or platform. The inaccuracy is actually inherent in binary data.
Consider that with a double, each digit N at 0-based index I to the left of the binary point represents the value N * 2^I, and each digit at 1-based index I to the right of the point represents the value N * 2^(-I).
As an example, 5.625 (base 10) would be 101.101 (base 2).
Given this, any decimal value that can't be written as a sum of terms 2^(-I) for different values of I will have an inexact value as a double.
A float is represented as s, e and m in the following formula
s * m * 2^e
This means that any number that cannot be represented using the given expression (and in the respective domains of s, e and m) cannot be represented exactly.
Basically, you can represent all numbers between 0 and 2^53 - 1 multiplied by a certain power of two (possibly a negative power).
As an example, all numbers between 0 and 2^53 - 1 can be represented multiplied with 2^0 = 1. And you can also represent all those numbers by dividing them by 2 (with a .5 fraction). And so on.
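You can watch the integers start to collapse right at 2^53 (a quick sketch):
Console.WriteLine((double)9007199254740992L == (double)9007199254740993L);
// True: 2^53 + 1 is the first integer with no exact double representation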
This answer does not fully cover the topic, but I hope it helps.