Floating point numbers are tricky, since many natural arithmetic properties don't hold.
I suppose that this particular property holds nevertheless, but I prefer to ask rather than be hit by hard-to-detect errors.
Assume that d is an arbitrary variable of type double. May I assume that after the following operation:
d *= 0;
The following check will always return true?
d == 0
I suppose that this will not hold if d is positive / negative infinity or NaN. However, are there any other problems I need to be aware of?
I hear that in floating point there are actually two zeroes, namely +0 and -0. If d is negative in the beginning, will d *= 0 return -0 instead? If so, will it be equal to 0?
I hear that floating point operations are subject to inaccuracy. Thus, is it possible that multiplying by 0 will instead return something like 0.0000001 or -0.0000001, which will not be equal to 0? My assumption is that this is likely impossible, but I don't have enough knowledge to back this assumption up, so I prefer to ask.
Any other problems I didn't foresee?
The short answer is yes, given your specific example, d will be 0. This is because 0 has an exact representation in every FP model, so when you multiply a double by the value '0' (or '0.0'), the result is not subject to rounding/truncation errors.
The issues you mentioned come into play during FP arithmetic as a result of the inherent approximations that occur when you have a finite resolution.
For example, the most accurate 64-bit representation of the value 0.002 in the IEEE-754 floating point standard is 0x3F60624D_D2F1A9FC which amounts to 2.00000000000000004163336342344E-3 (source: http://www.binaryconvert.com/convert_double.html)
The value 0 does not suffer from this loss of precision. However, obviously if you were to do something like this, you would run into problems:
double d = 0.009;
d -= (0.001 * 9);
if (d == 0)
{
//This check will fail
}
This is why using exact equality when comparing floating point values is almost always advised against.
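To illustrate the specific cases asked about above (a negative operand, signed zero, infinity and NaN), here is a small sketch I'm adding; the comments describe IEEE-754 double behaviour, though the exact text printed for -0 and the infinities varies between runtimes:
using System;

class MultiplyByZero
{
    static void Main()
    {
        double d = -3.5;
        d *= 0;
        Console.WriteLine(d);      // negative zero (newer runtimes print -0, older ones 0)
        Console.WriteLine(d == 0); // True: -0 compares equal to 0

        Console.WriteLine(double.PositiveInfinity * 0); // NaN
        Console.WriteLine(double.NaN * 0);              // NaN
        Console.WriteLine(double.NaN == 0);             // False
    }
}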
Related
My question is not about floating point precision. It is about why Equals() is different from ==.
I understand why .1f + .2f == .3f is false (while .1m + .2m == .3m is true).
I get that == is reference and .Equals() is value comparison. (Edit: I know there is more to this.)
But why is (.1f + .2f).Equals(.3f) true, while (.1d+.2d).Equals(.3d) is still false?
.1f + .2f == .3f; // false
(.1f + .2f).Equals(.3f); // true
(.1d + .2d).Equals(.3d); // false
The question is confusingly worded. Let's break it down into many smaller questions:
Why is it that one tenth plus two tenths does not always equal three tenths in floating point arithmetic?
Let me give you an analogy. Suppose we have a math system where all numbers are rounded off to exactly five decimal places. Suppose you say:
x = 1.00000 / 3.00000;
You would expect x to be 0.33333, right? Because that is the closest number in our system to the real answer. Now suppose you said
y = 2.00000 / 3.00000;
You'd expect y to be 0.66667, right? Because again, that is the closest number in our system to the real answer. 0.66666 is farther from two thirds than 0.66667 is.
Notice that in the first case we rounded down and in the second case we rounded up.
Now when we say
q = x + x + x + x;
r = y + x + x;
s = y + y;
what do we get? If we did exact arithmetic then each of these would obviously be four thirds and they would all be equal. But they are not equal. Even though 1.33333 is the closest number in our system to four thirds, only r has that value.
q is 1.33332 -- because x was a little bit small, every addition accumulated that error and the end result is quite a bit too small. Similarly, s is too big; it is 1.33334, because y was a little bit too big. r gets the right answer because the too-big-ness of y is cancelled out by the too-small-ness of x and the result ends up correct.
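If you want to play with this analogy in code, here is a minimal sketch I'm adding that rounds every intermediate result to five decimal places; it only simulates the analogy, it is not real IEEE arithmetic:
using System;

class FiveDigitAnalogy
{
    // Round every intermediate result to five decimal places, as in the analogy.
    static double R(double v) => Math.Round(v, 5);

    static void Main()
    {
        double x = R(1.00000 / 3.00000); // 0.33333 (rounded down)
        double y = R(2.00000 / 3.00000); // 0.66667 (rounded up)

        double q = R(R(R(x + x) + x) + x); // 1.33332 -- the too-small errors accumulate
        double r = R(R(y + x) + x);        // 1.33333 -- the errors cancel out
        double s = R(y + y);               // 1.33334 -- the too-big errors accumulate

        Console.WriteLine($"{q} {r} {s}");
    }
}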
Does the number of places of precision have an effect on the magnitude and direction of the error?
Yes; more precision makes the magnitude of the error smaller, but can change whether a calculation accrues a loss or a gain due to the error. For example:
b = 4.00000 / 7.00000;
b would be 0.57143, which rounds up from the true value of 0.571428571... Had we gone to eight places that would be 0.57142857, which has far, far smaller magnitude of error but in the opposite direction; it rounded down.
Because changing the precision can change whether an error is a gain or a loss in each individual calculation, this can change whether a given aggregate calculation's errors reinforce each other or cancel each other out. The net result is that sometimes a lower-precision computation is closer to the "true" result than a higher-precision computation because in the lower-precision computation you get lucky and the errors are in different directions.
We would expect that doing a calculation in higher precision always gives an answer closer to the true answer, but this argument shows otherwise. This explains why sometimes a computation in floats gives the "right" answer but a computation in doubles -- which have twice the precision -- gives the "wrong" answer, correct?
Yes, this is exactly what is happening in your examples, except that instead of five digits of decimal precision we have a certain number of digits of binary precision. Just as one-third cannot be accurately represented in five -- or any finite number -- of decimal digits, 0.1, 0.2 and 0.3 cannot be accurately represented in any finite number of binary digits. Some of those will be rounded up, some of them will be rounded down, and whether or not additions of them increase the error or cancel out the error depends on the specific details of how many binary digits are in each system. That is, changes in precision can change the answer for better or worse. Generally the higher the precision, the closer the answer is to the true answer, but not always.
How can I get accurate decimal arithmetic computations then, if float and double use binary digits?
If you require accurate decimal math then use the decimal type; it uses decimal fractions, not binary fractions. The price you pay is that it is considerably larger and slower. And of course as we've already seen, fractions like one third or four sevenths are not going to be represented accurately. Any fraction that is actually a decimal fraction however will be represented with zero error, up to about 29 significant digits.
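For illustration (a quick sketch I'm adding, not part of the original answer), the difference looks like this; the third line shows that non-decimal fractions still round:
using System;

class DecimalVsDouble
{
    static void Main()
    {
        Console.WriteLine(0.1m + 0.2m == 0.3m); // True  -- exact decimal fractions
        Console.WriteLine(0.1d + 0.2d == 0.3d); // False -- binary approximations
        Console.WriteLine(1m / 3m * 3m);        // 0.9999999999999999999999999999 -- one third still rounds
    }
}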
OK, I accept that all floating point schemes introduce inaccuracies due to representation error, and that those inaccuracies can sometimes accumulate or cancel each other out based on the number of bits of precision used in the calculation. Do we at least have the guarantee that those inaccuracies will be consistent?
No, you have no such guarantee for floats or doubles. The compiler and the runtime are both permitted to perform floating point calculations in higher precision than is required by the specification. In particular, the compiler and the runtime are permitted to do single-precision (32 bit) arithmetic in 64 bit or 80 bit or 128 bit or whatever bitness greater than 32 they like.
The compiler and the runtime are permitted to do so however they feel like it at the time. They need not be consistent from machine to machine, from run to run, and so on. Since this can only make calculations more accurate this is not considered a bug. It's a feature. A feature that makes it incredibly difficult to write programs that behave predictably, but a feature nevertheless.
So that means that calculations performed at compile time, like the literals 0.1 + 0.2, can give different results than the same calculation performed at runtime with variables?
Yep.
What about comparing the results of 0.1 + 0.2 == 0.3 to (0.1 + 0.2).Equals(0.3)?
Since the first one is computed by the compiler and the second one is computed by the runtime, and I just said that they are permitted to arbitrarily use more precision than required by the specification at their whim, yes, those can give different results. Maybe one of them chooses to do the calculation only in 64 bit precision whereas the other picks 80 bit or 128 bit precision for part or all of the calculation and gets a different answer.
So hold up a minute here. You're saying not only that 0.1 + 0.2 == 0.3 can be different than (0.1 + 0.2).Equals(0.3). You're saying that 0.1 + 0.2 == 0.3 can be computed to be true or false entirely at the whim of the compiler. It could produce true on Tuesdays and false on Thursdays, it could produce true on one machine and false on another, it could produce both true and false if the expression appeared twice in the same program. This expression can have either value for any reason whatsoever; the compiler is permitted to be completely unreliable here.
Correct.
The way this is usually reported to the C# compiler team is that someone has some expression that produces true when they compile in debug and false when they compile in release mode. That's the most common situation in which this crops up because the debug and release code generation changes register allocation schemes. But the compiler is permitted to do anything it likes with this expression, so long as it chooses true or false. (It cannot, say, produce a compile-time error.)
This is craziness.
Correct.
Who should I blame for this mess?
Not me, that's for darn sure.
Intel decided to make a floating point math chip in which it was far, far more expensive to make consistent results. Small choices in the compiler about what operations to enregister vs what operations to keep on the stack can add up to big differences in results.
How do I ensure consistent results?
Use the decimal type, as I said before. Or do all your math in integers.
I have to use doubles or floats; can I do anything to encourage consistent results?
Yes. If you store any result into any static field, any instance field of a class or array element of type float or double then it is guaranteed to be truncated back to 32 or 64 bit precision. (This guarantee is expressly not made for stores to locals or formal parameters.) Also if you do a runtime cast to (float) or (double) on an expression that is already of that type then the compiler will emit special code that forces the result to truncate as though it had been assigned to a field or array element. (Casts which execute at compile time -- that is, casts on constant expressions -- are not guaranteed to do so.)
To clarify that last point: does the C# language specification make those guarantees?
No. The runtime guarantees that stores into an array or field truncate. The C# specification does not guarantee that an identity cast truncates but the Microsoft implementation has regression tests that ensure that every new version of the compiler has this behaviour.
All the language spec has to say on the subject is that floating point operations may be performed in higher precision at the discretion of the implementation.
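As an illustration of those idioms (my own sketch; whether you actually observe any extra precision being dropped depends on the JIT and the hardware), the pattern looks like this:
using System;

class TruncationIdioms
{
    static double _field; // stores into fields and array elements are truncated to 64-bit precision

    static void Main()
    {
        double a = 0.1, b = 0.2;

        _field = a + b;                   // guaranteed to be a true 64-bit double afterwards
        double viaCast = (double)(a + b); // a runtime cast to the same type also forces truncation

        // Truncation buys consistency, not exactness: both of these still print False,
        // because 0.1 + 0.2 is not 0.3 even in exact 64-bit arithmetic.
        Console.WriteLine(_field == 0.3);
        Console.WriteLine(viaCast == 0.3);
    }
}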
When you write
double a = 0.1d;
double b = 0.2d;
double c = 0.3d;
Actually, these are not exactly 0.1, 0.2 and 0.3. From the IL code:
IL_0001: ldc.r8 0.10000000000000001
IL_000a: stloc.0
IL_000b: ldc.r8 0.20000000000000001
IL_0014: stloc.1
IL_0015: ldc.r8 0.29999999999999999
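You can see the same values without dropping to IL by printing with the round-trip "G17" format (a quick check I'm adding for illustration):
Console.WriteLine(0.1d.ToString("G17")); // 0.10000000000000001
Console.WriteLine(0.2d.ToString("G17")); // 0.20000000000000001
Console.WriteLine(0.3d.ToString("G17")); // 0.29999999999999999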
There are a lot of questions on SO pointing out this issue, like Difference between decimal, float and double in .NET? and Dealing with floating point errors in .NET, but I suggest you read the excellent article called:
What Every Computer Scientist Should Know About Floating-Point Arithmetic
Well, what leppie said is more logical. The real situation here totally depends on the compiler, the computer, or the CPU.
Based on leppie's code, this code gives True/False on my Visual Studio 2010 and LINQPad, but when I tried it on ideone.com the result was True/True.
Check the DEMO.
Tip: When I wrote Console.WriteLine(.1f + .2f == .3f); ReSharper warned me:
Comparison of floating point numbers with equality operator. Possible loss of precision while rounding values.
As said in the comments, this is due to the compiler doing constant propagation and performing the calculation at a higher precision (I believe this is CPU dependent).
var f1 = .1f + .2f;
var f2 = .3f;
Console.WriteLine(f1 == f2); // prints true (same as Equals)
Console.WriteLine(.1f+.2f==.3f); // prints false (acts the same as double)
@Caramiriel also points out that .1f + .2f == .3f is emitted as false in the IL, hence the compiler did the calculation at compile time.
To confirm the constant folding/propagation compiler optimization
const float f1 = .1f + .2f;
const float f2 = .3f;
Console.WriteLine(f1 == f2); // prints false
FWIW, the following test passes:
float x = 0.1f + 0.2f;
float result = 0.3f;
bool isTrue = x.Equals(result);
bool isTrue2 = x == result;
Assert.IsTrue(isTrue);
Assert.IsTrue(isTrue2);
So the problem is actually with this line:
0.1f + 0.2f == 0.3f
which, as stated, is probably compiler/PC specific.
Most people have been approaching this question from the wrong angle, I think.
UPDATE:
Another curious test, I think:
const float f1 = .1f + .2f;
const float f2 = .3f;
Assert.AreEqual(f1, f2); // passes
Assert.IsTrue(f1 == f2); // doesn't pass
Single equality implementation:
public bool Equals(float obj)
{
return ((obj == this) || (IsNaN(obj) && IsNaN(this)));
}
== is about comparing exact float values.
Equals is a method that may return true or false; the specific implementation may vary. As the implementation above shows, float.Equals also treats two NaNs as equal, which == does not.
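One observable consequence of that implementation (a quick check I'm adding, not from the original answer):
Console.WriteLine(float.NaN == float.NaN);      // False -- IEEE comparison
Console.WriteLine(float.NaN.Equals(float.NaN)); // True  -- Equals special-cases NaN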
I don't know why, but at the time of writing some of my results are different from yours. Note that the third and fourth tests happen to be contrary to the problem, so parts of your explanations might be wrong now.
using System;
class Test
{
static void Main()
{
float a = .1f + .2f;
float b = .3f;
Console.WriteLine(a == b); // true
Console.WriteLine(a.Equals(b)); // true
Console.WriteLine(.1f + .2f == .3f); // true
Console.WriteLine((1f + .2f).Equals(.3f)); //false
Console.WriteLine(.1d + .2d == .3d); //false
Console.WriteLine((1d + .2d).Equals(.3d)); //false
}
}
I'm just curious: why does IEEE-754 define division of any non-zero floating point number by zero to result in an infinite value? It's nonsense from the mathematical perspective, so I think the correct result for this operation should be NaN.
The function f(x) = 1/x is not defined at x = 0 when x is a real number. For comparison, the sqrt function is not defined for any negative number, and sqrt(-1.0f) in IEEE-754 produces a NaN value. Yet 1.0f/0 is Inf.
But for some reason IEEE-754 does not treat division by zero that way. There must be a reason for this, maybe some optimization or compatibility reasons.
So what's the point?
It's nonsense from the mathematical perspective.
Yes. No. Sort of.
The thing is: Floating-point numbers are approximations. You want to use a wide range of exponents and a limited number of digits and get results which are not completely wrong. :)
The idea behind IEEE-754 is that every operation could trigger "traps" which indicate possible problems. They are
Illegal (senseless operation like sqrt of negative number)
Overflow (too big)
Underflow (too small)
Division by zero (The thing you do not like)
Inexact (This operation may give you wrong results because you are losing precision)
Now many people like scientists and engineers do not want to be bothered with writing trap routines. So Kahan, the inventor of IEEE-754, decided that every operation should also return a sensible default value if no trap routines exist.
They are
NaN for illegal values
signed infinities for Overflow
signed zeroes for Underflow
NaN for indeterminate results (0/0) and infinities for x/0 where x != 0
normal operation result for Inexact
The thing is that in 99% of all cases zeroes are caused by underflow, and therefore in 99% of all cases Infinity is "correct" even if it is wrong from a mathematical perspective.
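Here is a quick C# sketch I'm adding of those default results; the exact text printed for infinities and NaN depends on the runtime and culture:
using System;

class Ieee754Defaults
{
    static void Main()
    {
        Console.WriteLine(Math.Sqrt(-1.0)); // NaN        (illegal/invalid operation)
        Console.WriteLine(1e308 * 10.0);    // Infinity   (overflow)
        Console.WriteLine(1e-308 / 1e300);  // 0          (underflow)
        Console.WriteLine(1.0 / 0.0);       // Infinity   (x/0, x != 0)
        Console.WriteLine(-1.0 / 0.0);      // -Infinity
        Console.WriteLine(0.0 / 0.0);       // NaN        (indeterminate)
    }
}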
I'm not sure why you would believe this to be nonsense.
The simplistic definition of a / b, at least for non-zero b, is the unique number of bs that has to be subtracted from a before you get to zero.
Expanding that to the case where b can be zero, the number that has to be subtracted from any non-zero number to get to zero is indeed infinite, because you'll never get to zero.
Another way to look at it is to talk in terms of limits. As a positive number n approaches zero, the expression 1 / n approaches "infinity". You'll notice I've quoted that word because I'm a firm believer in not propagating the delusion that infinity is actually a concrete number :-)
NaN is reserved for situations where the number cannot be represented (even approximately) by any other value (including the infinities), it is considered distinct from all those other values.
For example, 0 / 0 (using our simplistic definition above) can have any amount of bs subtracted from a to reach 0. Hence the result is indeterminate - it could be 1, 7, 42, 3.14159 or any other value.
Similarly things like the square root of a negative number, which has no value in the real plane used by IEEE754 (you have to go to the complex plane for that), cannot be represented.
In mathematics, division by zero is undefined because zero has no sign; two results are equally possible, and mutually exclusive: negative infinity or positive infinity (but not both).
In (most) computing, 0.0 has a sign. Therefore we know what direction we are approaching from, and what sign the infinity should have. This is especially true when 0.0 represents a non-zero value too small to be expressed by the system, as is frequently the case.
The only time NaN would be appropriate is if the system knows with certainty that the denominator is truly, exactly zero. And it can't unless there is a special way to designate that, which would add overhead.
NOTE:
I re-wrote this following a valuable comment from @Cubic.
I think the correct answer to this has to come from calculus and the notion of limits. Consider the limit of f(x)/g(x) as x->0 under the assumption that g(0) == 0. There are two broad cases that are interesting here:
If f(0) != 0, then the limit as x->0 is either plus or minus infinity, or it's undefined. If g(x) takes both signs in the neighborhood of x==0, then the limit is undefined (left and right limits don't agree). If g(x) has only one sign near 0, however, the limit will be defined and be either positive or negative infinity. More on this later.
If f(0) == 0 as well, then the limit can be anything, including positive infinity, negative infinity, a finite number, or undefined.
In the second case, generally speaking, you cannot say anything at all. Arguably, in the second case NaN is the only viable answer.
Now in the first case, why choose one particular sign when either is possible or it might be undefined? As a practical matter, it gives you more flexibility in cases where you do know something about the sign of the denominator, at relatively little cost in the cases where you don't. You may have a formula, for example, where you know analytically that g(x) >= 0 for all x, say, for example, g(x) = x*x. In that case the limit is defined and it's infinity with sign equal to the sign of f(0). You might want to take advantage of that as a convenience in your code. In other cases, where you don't know anything about the sign of g, you cannot generally take advantage of it, but the cost here is just that you need to trap for a few extra cases - positive and negative infinity - in addition to NaN if you want to fully error check your code. There is some price there, but it's not large compared to the flexibility gained in other cases.
Why worry about general functions when the question was about "simple division"? One common reason is that if you're computing your numerator and denominator through other arithmetic operations, you accumulate round-off errors. The presence of those errors can be abstracted into the general formula format shown above. For example f(x) = x + e, where x is the analytically correct, exact answer, e represents the error from round-off, and f(x) is the floating point number that you actually have on the machine at execution.
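Here is a small numeric sketch of that point (my own illustration; the values are chosen so the denominator underflows to a signed zero):
using System;

class SignedInfinityDemo
{
    static void Main()
    {
        double tiny = 1e-200;

        // g(x) = x*x: the denominator is +0 regardless of the sign of x,
        // so the signed infinity you get back is meaningful.
        Console.WriteLine(1.0 / (tiny * tiny));   // Infinity
        Console.WriteLine(1.0 / (-tiny * -tiny)); // Infinity

        // Plain 1/x: the sign of the result depends on which side of zero you came from.
        Console.WriteLine(1.0 / (tiny * 0.0));    // Infinity   (+0 denominator)
        Console.WriteLine(1.0 / (-tiny * 0.0));   // -Infinity  (-0 denominator)
    }
}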
When I run the following code, I get 0 printed on both lines:
Double a = 9.88131291682493E-324;
Double b = a*0.1D;
Console.WriteLine(b);
Console.WriteLine(BitConverter.DoubleToInt64Bits(b));
I would expect to get Double.NaN if an operation result gets out of range. Instead I get 0. It looks that to be able to detect when this happens I have to check:
Before the operation check if any of the operands is zero
After the operation, if neither of operands were zero, check if the result is zero. If not let it run. If it is zero, assign Double.NaN to it instead to indicate that it's not really a zero, it's just a result that can't be represented within this variable.
That's rather unwieldy. Is there a better way? What is Double.NaN designed for? I'm assuming some operations must return it; surely the designers did not put it there just in case? Is it possible that this is a bug in the BCL? (I know that's unlikely, but that's why I'd like to understand how Double.NaN is supposed to work.)
Update
By the way, this problem is not specific for double. decimal exposes it all the same:
Decimal a = 0.0000000000000000000000000001m;
Decimal b = a* 0.1m;
Console.WriteLine(b);
That also gives zero.
In my case I need double, because I need the range they provide (I'm working on probabilistic calculations) and I'm not that worried about precision.
What I need, though, is to be able to detect when my results stop meaning anything, that is, when calculations drop the value so low that it can no longer be represented by a double.
Is there a practical way of detecting this?
Double works exactly according to the floating point specification, IEEE 754. So no, it's not an error in the BCL - it's just the way IEEE 754 floating point works.
The reason, of course, is that this is not what floats are designed for at all. Instead, you might want to use decimal, which is a precise decimal number, unlike float/double.
There's a few special values in floating point numbers, with different meanings:
Infinity - e.g. 1f / 0f.
-Infinity - e.g. -1f / 0f.
NaN - e.g. 0f / 0f or Math.Sqrt(-1)
However, as the commenters below noted, while decimal does in fact check for overflows, coming too close to zero is not considered an overflow, just like with floating point numbers. So if you really need to check for this, you will have to make your own * and / methods. With decimal numbers, you shouldn't really care, though.
If you need this kind of precision for multiplication and division (that is, you want your divisions to be reversible by multiplication), you should probably use rational numbers instead - two integers (big integers if necessary). And use a checked context - that will produce an exception on overflow.
IEEE 754 in fact does handle underflow. There's two problems:
The return value is 0 (or -0 for negative underflow). The exception flag for underflow is set, but there's no way to read it in .NET.
This only occurs for the loss of precision when you get too close to zero. But you lost most of your precision way long before that. Whatever "precise" number you had is long gone - the operations are not reversible, and they are not precise.
So if you really do care about reversibility etc., stick to rational numbers. Neither decimal nor double will work, C# or not. If you're not that precise, you shouldn't care about underflows anyway - just pick the lowest reasonable number, and declare anything under that as "invalid"; make sure you're far away from the actual maximum precision - double.Epsilon will not help, obviously.
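As a sketch of the "make your own * method" idea mentioned above (MulOrNaN is a hypothetical helper name, and this only catches the case where the product rounds all the way to zero):
static double MulOrNaN(double x, double y)
{
    double result = x * y;
    if (result == 0.0 && x != 0.0 && y != 0.0)
        return double.NaN; // the true product was too small to be represented
    return result;
}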
All you need is epsilon.
This is a "small number" which is small enough so you're no longer interested in.
You could use:
double epsilon = 1E-50;
and whenever one of your factors gets smaller than epsilon you take action (for example, treat it like 0.0).
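One possible reading of that advice (a sketch; the threshold is arbitrary and should be chosen for your own problem):
const double Epsilon = 1e-50;

double p = 1e-30 * 1e-25;  // 1e-55, smaller than our threshold
if (Math.Abs(p) < Epsilon)
    p = 0.0;               // treat it as zero from here on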
I have a class like such:
public class Test
{
public void CheckValue(float val)
{
if (val == 0f)
{
//Do something
}
}
}
//Somewhere else..
void SomeMethod()
{
Test t = new Test();
t.CheckValue(0f);
}
Is it guaranteed that checking if (val == 0f) will return true? Or will floating point numbers only lose precision when you perform operations with them?
They don't lose precision on their own. The inaccuracies come from the fact that some decimal values can't be described exactly in binary floating point; when doing operations on them, those inaccuracies can add up or cancel each other out.
In your case, 0 can be described completely accurately both in decimal and as a binary floating point, so it should work. Other values might not be and you should (as others point out) not rely on == for floating point values.
See more information here: http://floating-point-gui.de/
What it comes down to is that you can't perfectly represent every real number (in all their infinite variety) in the 32 bits worth of memory that holds a float. A number that can't be represented perfectly will only have a very small error value, but it will be there.
For example, 1/3 is (roughly) 0.333333343267 in the nearest float representation. You've got 7 decimal digits of accuracy, then it hits a bit of a snag. So let's say you divide 1 by 3, then later multiply by 3. The result isn't 1, it's 1.0000000298 and change. So (1/3)*3 =/= 1.
These representational errors are why directly comparing floating point values is considered poor practice.
In situations where you need to minimize the impact of these little errors you should do whatever you can to stop them from accumulating. Rotating a set of coordinates a little at a time for instance will give you imperfect results, since the errors introduced by each rotation operation (using sin and cos values that are unlikely to have perfect float representations) will accumulate over time. Better to track the total angular change and rotate the original vector each time to get smaller error values.
Another place this can bite you is in a for loop. If you want to do 10,000 steps at 1/30000 per step, use an int for the loop variable and calculate the current position as a division. Slower, yes, but far more accurate. Using floats your error accumulates until you have only about 3 reliable decimal digits of significance, instead of over 7 if you divide each time.
Speed is nice, but precision is nicer in some circumstances.
And yes, it seems silly, but these errors can really mount up. Not so important in some places, rampant error factories in others.
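Here is a rough sketch of the loop advice from above (my own example; the exact printed values will vary, but the accumulated sum typically drifts away from the recomputed one):
using System;

class AccumulationDemo
{
    static void Main()
    {
        const int steps = 10000;
        float step = 1f / 30000f;

        float accumulated = 0f;
        for (int i = 0; i < steps; i++)
            accumulated += step;              // a little rounding error on every addition

        float recomputed = (float)steps / 30000f; // one division, one rounding

        Console.WriteLine(accumulated);       // drifts away from 1/3 as errors pile up
        Console.WriteLine(recomputed);        // 0.33333334, the nearest float to 1/3
    }
}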
You should never compare floating-point numbers with "==". Instead you can use a tolerance (a "border"). For example, instead of comparing == 0 you can do something like:
float border = 0.00000001f; // note the f suffix: a plain double literal won't compile here
if ((val - border) <= 0 && (val + border) >= 0)
I'm messing around with Fourier transformations. Now I've created a class that does an implementation of the DFT (not doing anything like FFT atm). This is the implementation I've used:
public static Complex[] Dft(double[] data)
{
int length = data.Length;
Complex[] result = new Complex[length];
for (int k = 1; k <= length; k++)
{
Complex c = Complex.Zero;
for (int n = 1; n <= length; n++)
{
c += Complex.FromPolarCoordinates(data[n-1], (-2 * Math.PI * n * k) / length);
}
result[k-1] = 1 / Math.Sqrt(length) * c;
}
return result;
}
And these are the results I get from Dft({2,3,4})
Well it seems pretty okay, since those are the values I expect. There is only one thing I find confusing. And it all has to do with the rounding of doubles.
First of all, why are the first two numbers not exactly the same (0.8660...4438 vs 0.8660...443)? And why can't it calculate a zero where you'd expect one? I know 2.8E-15 is pretty close to zero, but it's not zero.
Does anyone know how these marginal errors occur, and whether I can (and should) do something about them?
It might seem that there's no real problem, because these are just small errors. However, how do you deal with these rounding errors if you're, for example, comparing two values?
5.2 + 0i != 5.1961524 + 2.828107E-15 i
Cheers
I think you've already explained it to yourself - limited precision means limited precision. End of story.
If you want to clean up the results, you can do some rounding of your own to a more reasonable number of significant digits - then your zeros will show up where you want them.
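For example, something along these lines (Complex lives in System.Numerics; the exact string format of Complex varies between .NET versions):
Complex c = new Complex(5.196152422706632, 2.828107e-15);
Complex cleaned = new Complex(Math.Round(c.Real, 9), Math.Round(c.Imaginary, 9));
Console.WriteLine(cleaned); // the imaginary part is now exactly 0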
To answer the question raised by your comment, don't try to compare floating point numbers directly - use a range:
if (Math.Abs(float1 - float2) < 0.001) {
// they're the same!
}
The comp.lang.c FAQ has a lot of questions & answers about floating point, which you might be interested in reading.
From http://support.microsoft.com/kb/125056
Emphasis mine.
There are many situations in which precision, rounding, and accuracy in floating-point calculations can work to generate results that are surprising to the programmer. There are four general rules that should be followed:
In a calculation involving both single and double precision, the result will not usually be any more accurate than single precision. If double precision is required, be certain all terms in the calculation, including constants, are specified in double precision.
Never assume that a simple numeric value is accurately represented in the computer. Most floating-point values can't be precisely represented as a finite binary value. For example .1 is .0001100110011... in binary (it repeats forever), so it can't be represented with complete accuracy on a computer using binary arithmetic, which includes all PCs.
Never assume that the result is accurate to the last decimal place. There are always small differences between the "true" answer and what can be calculated with the finite precision of any floating point processing unit.
Never compare two floating-point values to see if they are equal or not equal. This is a corollary to rule 3. There are almost always going to be small differences between numbers that "should" be equal. Instead, always check to see if the numbers are nearly equal. In other words, check to see if the difference between them is very small or insignificant.
Note that although I referenced a microsoft document, this is not a windows problem. It's a problem with using binary and is in the CPU itself.
And, as a second side note, I tend to use the Decimal datatype instead of double: See this related SO question: decimal vs double! - Which one should I use and when?
In C# you'll want to use the 'decimal' type, not double for accuracy with decimal points.
As to the 'why'... representing fractions in different base systems gives different answers. For example, 1/3 in a base 10 system is 0.33333 recurring, but in a base 3 system is 0.1.
The double is a binary value, at base 2. When converting to base 10 decimal you can expect to have these rounding errors.