This surprised me - the same arithmetic gives different results depending on how it's executed:
> 0.1f+0.2f==0.3f
False
> var z = 0.3f;
> 0.1f+0.2f==z
True
> 0.1f+0.2f==(dynamic)0.3f
True
(Tested in Linqpad)
What's going on?
Edit: I understand why floating point arithmetic is imprecise, but not why it would be inconsistent.
The venerable C reliably confirms that 0.1 + 0.2 == 0.3 holds for single-precision floats, but not for double-precision floating-point numbers.
I strongly suspect you may find that you get different results running this code with and without the debugger, and in release configuration vs in debug configuration.
In the first version, you're comparing two expressions. The C# language allows those expressions to be evaluated in higher precision arithmetic than the source types.
In the second version, you're assigning the addition result to a local variable. In some scenarios, that will force the result to be truncated down to 32 bits - leading to a different result. In other scenarios, the CLR or C# compiler will realize that it can optimize away the local variable.
From section 4.1.6 of the C# 4 spec:
Floating point operations may be performed with higher precision than the result type of the operation. For example, some hardware architectures support an "extended" or "long double" floating point type with greater range and precision than the double type, and implicitly perform all floating point operations with the higher precision type. Only at excessive cost in performance can such hardware architectures be made to perform floating point operations with less precision. Rather than require an implementation to forfeit both performance and precision, C# allows a higher precision type to be used for all floating point operations. Other than delivering more precise results, this rarely has any measurable effects.
EDIT: I haven't tried compiling this, but in the comments, Chris says the first form isn't being evaluated at execution time at all. The above can still apply (I've tweaked my wording slightly) - it's just shifted the evaluation time of a constant from execution time to compile-time. So long as it behaves the same way as a valid evaluation, that seems okay to me - so the compiler's own constant expression evaluation can use higher-precision arithmetic too.
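If you want consistent results regardless of where the evaluation happens, an explicit cast back to float should force the excess precision to be discarded before the comparison. A minimal sketch (the uncast form may still come out either way depending on compiler, JIT and debug/release configuration):

float x = 0.1f;
float y = 0.2f;

// May be evaluated with extra precision, so the result can differ between
// compile-time constant folding and run-time evaluation.
Console.WriteLine(x + y == 0.3f);

// The explicit cast should round the sum to 32 bits before comparing.
Console.WriteLine((float)(x + y) == 0.3f);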
I am writing a library for multiprecision arithmetic based on a paper I am reading. It is very important that I am able to guarantee the properties of floating point numbers I use. In particular, that they adhere to the IEEE 754 standard for double precision floating point numbers. Clearly I cannot guarantee the behavior of my code on an unexpected platform, but for x86 and x64 chipsets, which I am writing for, I am concerned about a particular hazard. Apparently, some or all x86 / x64 chipsets may make use of extended precision floating point numbers in their FPU registers, with 80 bits of precision. I cannot tolerate my arithmetic being handled in extended precision FPUs without being rounded to double precision after every operation because the proofs of correctness for the algorithms I am using rely on rounding to occur. I can easily identify cases in which extended precision could break these algorithms.
I am writing my code in C#. How can I guarantee certain values are rounded? In C, I would declare variables as volatile, forcing them to be written back to RAM. This is slow and I'd rather keep the numbers in registers as 64-bit floats, but correctness in these algorithms is the whole point, not speed. In any case, I need a solution for C#. If this seems infeasible I will approach the problem in a different language.
The C# spec has this to say on the topic:
Only at excessive cost in performance can such hardware architectures be made to perform floating-point operations with less precision, and rather than require an implementation to forfeit both performance and precision, C# allows a higher precision type to be used for all floating-point operations. Other than delivering more precise results, this rarely has any measurable effects.
As a result, third-party libraries are required to simulate the behavior of an IEEE 754-compliant FPU. One such library is SoftFloat, which defines a SoftFloat type that uses operator overloading to simulate standard double behavior.
An obvious problem with 80-bit intermediate values is that it is very much up to the compiler and optimizer to decide when a value is truncated back to 64 bits, so different compilers may end up producing different results for the same sequence of floating point operations. An example is an operation like a*b*c*d. Depending on the availability of 80-bit floating point registers, the compiler might round a*b to 64-bit and leave c*d at 80-bit. I guess this is the root of your question, where you need to eliminate this uncertainty.
I think your options are pretty limited in managed code. You could use a third-party software emulation like the other answer suggested. Or maybe you could try coercing the double to long and back. I have no way of checking whether this actually works right now, but you could try something like this between operations:
public static double Truncate64(double val)
{
    unsafe
    {
        // Round-trip the bits through a long so the value has to leave any
        // extended-precision register and land in a real 64-bit slot.
        long l = *((long*) &val);
        return *((double*) &l);
    }
}
The same round trip can also be written without unsafe code, using BitConverter:
public static double Truncate64(double val)
{
    // DoubleToInt64Bits/Int64BitsToDouble copy the raw bits, so the value is
    // reinterpreted rather than converted.
    long bits = BitConverter.DoubleToInt64Bits(val);
    return BitConverter.Int64BitsToDouble(bits);
}
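As a rough sketch of how such a helper could be threaded through an error-free transformation of the kind these correctness proofs rely on - this is Knuth's TwoSum with Truncate64 applied to every intermediate; whether that is actually sufficient still depends on what the JIT does:

// TwoSum (Knuth): Sum + Err == a + b exactly, provided every operation
// is rounded to 64-bit double. Truncate64 is the helper defined above.
static (double Sum, double Err) TwoSum(double a, double b)
{
    double s = Truncate64(a + b);
    double bv = Truncate64(s - a);
    double av = Truncate64(s - bv);
    double err = Truncate64(Truncate64(a - av) + Truncate64(b - bv));
    return (s, err);
}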
Hope that helps.
When I run the following code, I get 0 printed on both lines:
Double a = 9.88131291682493E-324;
Double b = a*0.1D;
Console.WriteLine(b);
Console.WriteLine(BitConverter.DoubleToInt64Bits(b));
I would expect to get Double.NaN if an operation result gets out of range. Instead I get 0. It seems that, to detect when this happens, I have to:
Before the operation, check if either of the operands is zero
After the operation, if neither of the operands was zero, check if the result is zero. If not, let it run. If it is zero, assign Double.NaN to it instead, to indicate that it's not really a zero, just a result that can't be represented within this variable.
That's rather unwieldy. Is there a better way? What is Double.NaN designed for? I'm assuming some operations must return it; surely the designers did not put it there just in case? Is it possible that this is a bug in the BCL? (I know it's unlikely, but that's why I'd like to understand how Double.NaN is supposed to work.)
Update
By the way, this problem is not specific for double. decimal exposes it all the same:
Decimal a = 0.0000000000000000000000000001m;
Decimal b = a * 0.1m;
Console.WriteLine(b);
That also gives zero.
In my case I need double, because I need the range they provide (I'm working on probabilistic calculations) and I'm not that worried about precision.
What I need, though, is to be able to detect when my results stop meaning anything, that is, when a calculation drops the value so low that it can no longer be represented by a double.
Is there a practical way of detecting this?
Double works exactly according to the floating point numbers specification, IEEE 754. So no, it's not an error in BCL - it's just the way IEEE 754 floating points work.
The reason, of course, is that it's not what floats are designed for at all. Instead, you might want to use decimal, which is a precise decimal number, unlike float/double.
There are a few special values in floating point numbers, with different meanings:
Infinity - e.g. 1f / 0f.
-Infinity - e.g. -1f / 0f.
NaN - e.g. 0f / 0f or Math.Sqrt(-1)
However, as the commenters below noted, while decimal does in fact check for overflows, coming too close to zero is not considered an overflow, just like with floating point numbers. So if you really need to check for this, you will have to make your own * and / methods. With decimal numbers, you shouldn't really care, though.
If you need this kind of precision for multiplication and division (that is, you want your divisions to be reversible by multiplication), you should probably use rational numbers instead - two integers (big integers if necessary). And use a checked context - that will produce an exception on overflow.
IEEE 754 does in fact handle underflow. There are two problems:
The return value is 0 (or -0 for negative underflow). The exception flag for underflow is set, but there's no way to get at it in .NET.
This only occurs when you lose precision by getting too close to zero. But you lost most of your precision long before that. Whatever "precise" number you had is long gone - the operations are not reversible, and they are not precise.
So if you really do care about reversibility etc., stick to rational numbers. Neither decimal nor double will work, C# or not. If you're not that precise, you shouldn't care about underflows anyway - just pick the lowest reasonable number and declare anything under that as "invalid"; make sure you're far away from the actual maximum precision - double.Epsilon will not help, obviously.
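For what it's worth, here is a minimal sketch of that exact-rational idea - a hypothetical Rational type built on System.Numerics.BigInteger (not a BCL type). Because numerator and denominator are exact integers, multiplication and division never lose information:

using System.Numerics;

// Minimal sketch of an exact rational number. Not normalized and not
// production-ready; it only illustrates that * and / stay reversible.
readonly struct Rational
{
    public readonly BigInteger Num;
    public readonly BigInteger Den;

    public Rational(BigInteger num, BigInteger den)
    {
        Num = num;
        Den = den;
    }

    public static Rational operator *(Rational a, Rational b)
        => new Rational(a.Num * b.Num, a.Den * b.Den);

    public static Rational operator /(Rational a, Rational b)
        => new Rational(a.Num * b.Den, a.Den * b.Num);

    public override string ToString() => $"{Num}/{Den}";
}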
All you need is an epsilon.
This is a "small number" below which you are no longer interested in the value.
You could use:
double epsilon = 1E-50;
and whenever one of your factors gets smaller than epsilon, you take action (for example, treat it like 0.0).
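A minimal sketch combining that idea with the check described in the question (the helper name and the threshold are assumptions you would adapt; double.NaN is used purely as an "underflowed" marker):

const double Epsilon = 1E-50; // pick a bound suited to your problem domain

static double MultiplyChecked(double x, double y)
{
    double result = x * y;

    // Neither operand was zero, yet the product dropped below the threshold:
    // flag it instead of silently continuing with a meaningless value.
    if (x != 0.0 && y != 0.0 && Math.Abs(result) < Epsilon)
        return double.NaN;

    return result;
}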
I'm experiencing a strange issue when casting decimal to double.
The following code returns true:
Math.Round(0.010000000312312m, 2) == 0.01m //true
However, when I cast this to double it returns false:
(double)Math.Round(0.010000000312312m, 2) == (double)0.01m //false
I've experienced this problem when I wanted to use Math.Pow and was forced to cast decimal to double since there is no Math.Pow overload for decimal.
Is this documented behavior? How can I avoid it when I'm forced to cast decimal to double?
Screenshot from Visual Studio:
Casting the result of Math.Round to double gives me the following result:
(double)Math.Round(0.010000000312312m, 2) 0.0099999997764825821 double
(double)0.01m 0.01 double
UPDATE
Ok, I'm reproducing the issue as follows:
When I run the WPF application and check the output in the watch window just after it has started, I get true, as in the empty project.
There is a part of the application that sends values from the slider to the calculation algorithm. I get a wrong result and put a breakpoint on the calculation method. Now, when I check the value in the watch window, I get false (without any modifications, I just refresh the watch window).
As soon as I reproduce the issue in some smaller project I will post it here.
UPDATE2
Unfortunately, I cannot reproduce the issue in smaller project. I think that Eric's answer explains why.
People are reporting in the comments here that sometimes the result of the comparison is true and sometimes it is false.
Unfortunately, this is to be expected. The C# compiler, the jitter and the CPU are all permitted to perform arithmetic on doubles in more than 64 bit double precision, as they see fit. This means that sometimes the results of what looks like "the same" computation can be done in 64 bit precision in one calculation, 80 or 128 bit precision in another calculation, and the two results might differ in their last bit.
Let me make sure that you understand what I mean by "as they see fit". You can get different results for any reason whatsoever. You can get different results in debug and retail. You can get different results if you make the compiler do the computation in constants and if you make the runtime do the computation at runtime. You can get different results when the debugger is running. You can get different results in the runtime and the debugger's expression evaluator. Any reason whatsoever. Double arithmetic is inherently unreliable. This is due to the design of the floating point chip; double arithmetic on these chips cannot be made more repeatable without a considerable performance penalty.
For this and other reasons you should almost never compare two doubles for exact equality. Rather, subtract the doubles, and see if the absolute value of the difference is smaller than a reasonable bound.
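For example, a minimal sketch of such a comparison (the tolerance is a placeholder; you have to pick a bound that makes sense for the scale of your values):

// Compare two doubles for "close enough" equality instead of exact equality.
static bool AboutEqual(double x, double y, double tolerance)
{
    return Math.Abs(x - y) < tolerance;
}

// Usage with the values from the question:
bool same = AboutEqual((double)Math.Round(0.010000000312312m, 2), (double)0.01m, 1e-9);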
Moreover, it is important that you understand why rounding a double to two decimal places is a difficult thing to do. A non-zero, finite double is a number of the form (1 + f) × 2^e, where f is a fraction with a denominator that is a power of two, and e is an exponent. Clearly it is not possible to represent 0.01 in that form, because there is no way to get a denominator equal to a power of ten out of a denominator equal to a power of two.
The double 0.01 is actually the binary number 1.0100011110101110000101000111101011100001010001111011 × 2^-7, which in decimal is 0.01000000000000000020816681711721685132943093776702880859375. That is the closest you can possibly get to 0.01 in a double. If you need to represent exactly that value then use decimal. That's why it's called decimal.
Incidentally, I have answered variations on this question many times on StackOverflow. For example:
Why differs floating-point precision in C# when separated by parantheses and when separated by statements?
Also, if you need to "take apart" a double to see what its bits are, this handy code that I whipped up a while back is quite useful. It requires that you install Solver Foundation, but that's a free download.
http://ericlippert.com/2011/02/17/looking-inside-a-double/
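If you only need the raw bit fields and don't want to install anything, here is a quick sketch using BitConverter (the field names are mine):

// Dump the sign, exponent and fraction fields of a 64-bit double.
static void DumpDouble(double d)
{
    long bits = BitConverter.DoubleToInt64Bits(d);
    long sign = (bits >> 63) & 0x1;
    long exponent = (bits >> 52) & 0x7FF;     // biased by 1023
    long fraction = bits & 0xFFFFFFFFFFFFFL;  // 52 bits
    Console.WriteLine($"sign={sign} exponent={exponent} fraction=0x{fraction:X13}");
}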
This is documented behavior. The decimal data type is more precise than the double type. So when you convert from decimal to double there is the possibility of data loss. This is why you are required to do an explicit conversion of the type.
See the following MSDN C# references for more information:
decimal data type: http://msdn.microsoft.com/en-us/library/364x0z75(v=vs.110).aspx
double data type: http://msdn.microsoft.com/en-us/library/678hzkk9(v=vs.110).aspx
casting and type conversion: http://msdn.microsoft.com/en-us/library/ms173105.aspx
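A tiny sketch of the data loss that explicit conversion warns about (the exact printed digits may vary by runtime):

decimal m = 0.1111111111111111111111111111m; // 28 significant digits
double d = (double)m;                        // explicit cast required
Console.WriteLine(m);                        // all 28 digits survive
Console.WriteLine(d.ToString("R"));          // only ~15-17 significant digits survive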
After further investigation, it all boils down to this:
(decimal)((object)my_4_decimal_place_double_value_20.9032)
after casting twice, it becomes 20.903199999999998
I have a double value which is rounded to just 4 decimal places via Math.Round(...); the value is 20.9032.
In my dev environment, it is displayed as is.
But in released environment, it is displayed as 20.903199999999998
There was no operation after Math.Round(...), but the value has been copied around and assigned.
How can this happen?
Updates:
Data is not loaded from a DB.
The returned value from Math.Round() is assigned to the original double variable.
Release and dev are the same architecture, if this information helps.
According to the CLR ECMA specification:
Storage locations for floating-point numbers (statics, array elements, and fields of classes) are of fixed size. The supported storage sizes are float32 and float64. Everywhere else (on the evaluation stack, as arguments, as return types, and as local variables) floating-point numbers are represented using an internal floating-point type. In each such instance, the nominal type of the variable or expression is either R4 or R8, but its value can be represented internally with additional range and/or precision. The size of the internal floating-point representation is implementation-dependent, can vary, and shall have precision at least as great as that of the variable or expression being represented. An implicit widening conversion to the internal representation from float32 or float64 is performed when those types are loaded from storage. The internal representation is typically the native size for the hardware, or as required for efficient implementation of an operation.
To translate: the IL generated will be the same (except that debug mode inserts nops in places to ensure a breakpoint is possible; it may also deliberately maintain a temporary variable that release mode deems unnecessary), but the JITter is less aggressive when dealing with an assembly marked as debug. Release builds tend to move more floating-point values into 80-bit registers; debug builds tend to read directly from 64-bit memory storage.
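A small sketch of what that rule implies in practice: pushing a value through a real float64 storage location should shed any extra internal precision (Volatile.Read/Write are used here so the store and reload are not optimized away). This is my illustration of the quoted rule, not a recipe the specification itself gives:

using System.Threading;

static double _spill; // a static field is a fixed-size float64 storage location

static double RoundToStorage(double value)
{
    Volatile.Write(ref _spill, value); // the store narrows to 64 bits
    return Volatile.Read(ref _spill);  // reload the at-most-64-bit value
}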
If you want "precise" printing of a floating-point number, use string.Substring(...) on its string representation instead of Math.Round.
An IEEE 754 double-precision floating point number cannot represent 20.9032.
The most accurate representation is 2.09031999999999982264853315428E1, and that is what you see in your output.
Do not format numbers by rounding; instead use the string formatting of the double.ToString(string format) method.
See the MSDN documentation of the Double.ToString(String) method.
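A minimal sketch of that approach (the comments describe what I would expect on a typical .NET runtime, not guaranteed output):

double value = 20.903199999999998;

// Formatting only changes the text produced; the stored double is untouched.
Console.WriteLine(value.ToString("F4"));   // "20.9032"

// Math.Round still returns a double, which cannot hold 20.9032 exactly,
// so extra digits can reappear depending on how it is later displayed.
Console.WriteLine(Math.Round(value, 4).ToString("R"));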
The difference between Release and Debug builds may be some optimization that gets done for the release build, but that is too deep in the details for me.
In my opinion, the core issue is that you are trying to format text output with a mathematical operation. I'm sorry, but I don't know in detail what creates the different behavior.
What is the most recommended data type to use in scientific calculation in .Net? Is it float, double or something else?
Scientific values tend to be "natural" values (length, mass, time etc) where there's a natural degree of imprecision to start with - but where you may well want very, very large or very, very small numbers. For these values, double is generally a good idea. It's fast (with hardware support almost everywhere), scales up and down to huge/tiny values, and generally works fine if you're not concerned with exact decimal values.
decimal is a good type for "artificial" numbers where there's an exact value, almost always represented naturally as a decimal - the canonical example for this is currency. However, it's twice as expensive as double in terms of storage (16 bytes per value instead of 8), has a smaller range (due to a more limited exponent range) and is significantly slower due to a lack of hardware support.
I'd personally only use float if storage was an issue - it's amazing how quickly the inaccuracies can build up when you only have around 7 significant decimal places.
Ultimately, as the comment from "bears will eat you" suggests, it depends on what values you're talking about - and of course what you plan to do with them. Without any further information I suspect that double is a good starting point - but you should really make the decision based on the individual situation.
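As a rough illustration of how quickly single-precision error builds up (exact outputs depend on the runtime, but the gap is striking):

// Add 0.1 one million times in float and in double.
float fSum = 0f;
double dSum = 0d;
for (int i = 0; i < 1_000_000; i++)
{
    fSum += 0.1f;
    dSum += 0.1;
}
Console.WriteLine(fSum); // noticeably far from 100000
Console.WriteLine(dSum); // very close to 100000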
Well, of course the term “scientific calculation” is a bit vague, but in general, it’s double.
float is largely for compatibility with libraries that expect 32-bit floating-point numbers. The performance of float and double operations (like addition) is exactly the same, so new code should always use double because it has greater precision.
However, the x86 JITter will never inline functions that take or return a float, so using float in methods could actually be slower. Once again, this is for compatibility: if such a function were inlined, the execution engine would skip a conversion step that reduces the precision, and the JITter could thus inadvertently change the result of some calculations.
Finally, there's also decimal. Use this whenever it is important to have a certain number of decimal places. The stereotypical use case is currency operations, but of course it supports more than 2 decimal places; it's actually a 128-bit piece of data.
If even the accuracy of a 64-bit double is not enough, consider using an external library for arbitrary-precision numbers, but of course you will only need that if your scientific use case specifically calls for it.
Double seems to be the most reliable data type for such operations. Even WPF uses it extensively.
Be aware that decimals are much more expensive to use than floats/doubles (in addition to what Jon Skeet and Timwi wrote).
I'd recommend double unless you need the value to be exact; decimal is for financial calculations that need this exactitude. Scientific calculations tolerate small errors because you can't exactly measure 1 meter anyway. Float only helps if storage is a problem (i.e. huge matrices).