C# double value displayed as .9999998?

After further investigation, it all boils down to this:
(decimal)((object)my_4_decimal_place_double_value_20.9032)
after casting twice, it becomes 20.903199999999998
I have a double value which is rounded to 4 decimal places via Math.Round(...); the value is 20.9032.
In my dev environment, it is displayed as is.
But in released environment, it is displayed as 20.903199999999998
There were no operations after Math.Round(...), but the value has been copied around and assigned.
How can this happen?
Updates:
Data is not loaded from a DB.
The value returned from Math.Round() is assigned back to the original double variable.
Release and dev are the same architecture, if this information helps.

According to the CLR ECMA specification:
Storage locations for floating-point numbers (statics, array elements, and fields of classes) are of fixed size. The supported storage sizes are float32 and float64. Everywhere else (on the evaluation stack, as arguments, as return types, and as local variables) floating-point numbers are represented using an internal floating-point type. In each such instance, the nominal type of the variable or expression is either R4 or R8, but its value can be represented internally with additional range and/or precision. The size of the internal floating-point representation is implementation-dependent, can vary, and shall have precision at least as great as that of the variable or expression being represented. An implicit widening conversion to the internal representation from float32 or float64 is performed when those types are loaded from storage. The internal representation is typically the native size for the hardware, or as required for efficient implementation of an operation.
To translate: the IL generated will be the same (except that debug mode inserts nops in places to ensure a breakpoint is possible; it may also deliberately keep a temporary variable that release mode deems unnecessary), but the JITter is less aggressive when dealing with an assembly marked as debug. Release builds tend to keep more floating-point values in 80-bit registers; debug builds tend to read them directly from 64-bit memory storage.
If you want "precise" printing of a floating-point number, work with its string representation (e.g. string.Substring(...)) instead of using Math.Round.
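As a quick way to see what the variable really holds, regardless of debugger or build configuration, you can print the exact digits and do the display formatting only at output time. A small sketch of my own (not part of the original answer); the printed digits assume a reasonably recent .NET runtime:
using System;

double raw = 20.903199999999998;            // whatever the earlier computation produced
double rounded = Math.Round(raw, 4);        // still a binary double; 20.9032 has no exact representation

Console.WriteLine(rounded.ToString("G17")); // 20.903199999999998 - the digits of the nearest representable double
Console.WriteLine(rounded.ToString("F4"));  // 20.9032 - formatting at display time gives stable output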

An IEEE 754 double-precision floating-point number cannot represent 20.9032 exactly.
The most accurate representation is 2.09031999999999982264853315428E1, and that is what you see in your output.
Do not format numbers by rounding; instead use the double.ToString(string format) method with a format string.
See the MSDN documentation for Double.ToString(String).
The difference between release and debug builds may come from some optimization done for the release build, but that is going too deep into the details in my opinion.
In my opinion the core issue is that you are trying to format text output with a mathematical operation. I'm sorry, but I don't know in detail what creates the different behavior.

Related

Float epsilon is different in c++ than c#

So, small question: I've been looking into moving part of my C# code to C++ for performance reasons.
Now when I look at float.Epsilon in C#, its value is different from the C++ value.
In C#, the value, as described by Microsoft, is 1.401298E-45.
In C++, the value, as described by cppreference, is 1.19209e-07.
How can it be that the smallest possible value for a float/single differs between these languages?
If I'm correct, the binary values should be equal in terms of number of bytes and maybe even their binary values. Or am I looking at this the wrong way?
Hope someone can help me, thanks!
The second value you quoted is the machine epsilon for IEEE binary32 values.
The first value you quoted is NOT the machine epsilon. From the documentation you linked:
The value of the Epsilon property is not equivalent to machine epsilon, which represents the upper bound of the relative error due to rounding in floating-point arithmetic.
From the wiki Variant Definitions section for machine epsilon:
The IEEE standard does not define the terms machine epsilon and unit roundoff, so differing definitions of these terms are in use, which can cause some confusion.
...
The following different definition is much more widespread outside academia: Machine epsilon is defined as the difference between 1 and the next larger floating point number.
The C# documentation is using that variant definition.
So the answer is that you are comparing two different types of Epsilon.
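To make the two definitions concrete in C#, here is a small sketch of my own (the exact printed digits may vary between runtimes):
using System;

// .NET's float.Epsilon: the smallest positive float greater than zero (2^-149, a subnormal), about 1.4E-45
Console.WriteLine(float.Epsilon);

// Machine epsilon in the C++ std::numeric_limits<float>::epsilon() sense:
// the gap between 1.0f and the next representable float, i.e. 2^-23
float machineEpsilon = (float)Math.Pow(2, -23);
Console.WriteLine(machineEpsilon);   // ~1.1920929E-07, the value cppreference documents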
C++
Returns the machine epsilon, that is, the difference between 1.0 and the next value representable by the floating-point type T.
https://en.cppreference.com/w/cpp/types/numeric_limits/epsilon
C#
Represents the smallest positive Single value that is greater than zero. This field is constant.
https://learn.microsoft.com/en-us/dotnet/api/system.single.epsilon?view=net-5.0
Conclusion
C#'s Epsilon is the next value after 0; C++'s epsilon is the next value after 1. Two completely different things.
Edit: The other answer is probably more correct
From the link you referenced, you should use the value FLT_TRUE_MIN ("minimum positive value of float") if you want something similar to .NET Single.Epsilon ("smallest positive single value that is greater than zero").
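To see "next after 0" versus "next after 1" side by side, a sketch of my own (MathF.BitIncrement requires .NET Core 3.0 or later):
using System;

float nextAfterZero = MathF.BitIncrement(0f);    // smallest positive float
float nextAfterOne  = MathF.BitIncrement(1f);    // 1 plus one unit in the last place

Console.WriteLine(nextAfterZero == float.Epsilon);   // True: C#'s Epsilon is the step above zero
Console.WriteLine(nextAfterOne - 1f);                // ~1.19E-07: the step above one, i.e. the C++ epsilon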

Why does the compiler optimize ldc.i8 and not ldc.r8?

I'm wondering why this C# code
long b = 20;
is compiled to
ldc.i4.s 0x14
conv.i8
(Because it takes 3 bytes instead of the 9 required by ldc.i8 20. See this for more information.)
while this code
double a = 20;
is compiled to the 9-byte instruction
ldc.r8 20
instead of this 3-byte sequence
ldc.i4.s 0x14
conv.r8
(Using mono 4.8.)
Is this a missed opportunity, or does the cost of the conversion instruction outweigh the gain in code size?
Because float is not a smaller double, and integer is not a float (or vice versa).
All int values have a 1:1 mapping on a long value. The same simply isn't true for float and double - floating point operations are tricky that way. Not to mention that int-float conversions aren't free - unlike pushing a 1 byte value on the stack / in a register; look at the x86-64 code produced by both approaches, not just the IL code. Size of the IL code is not the only factor to consider in optimisation.
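To illustrate the point about lossy int-to-float conversions with a concrete number (my example, not the answer's):
using System;

int i = 16777217;                      // 2^24 + 1
long asLong = i;                       // every int maps exactly onto a long
float asFloat = i;                     // nearest float is 16777216: float has only a 24-bit significand

Console.WriteLine((int)asLong == i);   // True
Console.WriteLine((int)asFloat == i);  // False - the conversion silently lost the last bit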
This is in contrast to decimal, which is actually a base-10 number, rather than a base-2 (binary) floating-point number. There, 20M maps perfectly onto 20 and vice versa, so the compiler is free to emit this:
IL_0000: ldc.i4.s 0x14
IL_0002: newobj System.Decimal..ctor
The same approach simply isn't safe (or cheap!) for binary floating point numbers.
You might think that the two approaches are equally safe, because it doesn't really matter whether we do the conversion from an integer literal ("a string") to a double value at compile time or in IL. But this simply isn't the case, as a bit of specification diving reveals:
ECMA CLR spec, III.1.1.1:
Storage locations for floating-point numbers (statics, array elements, and fields of classes) are of fixed size. The supported storage sizes are float32 and float64.
Everywhere else (on the evaluation stack, as arguments, as return types, and as local variables) floating-point numbers are represented using an internal floating-point type. In each such instance, the nominal type of the variable or expression is either float32 or float64, but its value might be represented internally with additional range and/or precision.
To keep things short, let's pretend float64 actually uses 4 binary digits, while the implementation-defined floating-point type (F) uses 5 binary digits. We want to convert an integer literal whose binary representation has more than four digits. Now compare how it's going to behave:
ldc.r8 0.1011E2 ; expanded to 0.10110E2
ldc.r8 0.1E2
mul ; 0.10110E2 * 0.10000E2 == 0.10110E3
conv.r8 converts to F, not to float64. So we actually get:
ldc.i4.s theSameLiteral
conv.r8 ; converted to 0.10111E2
mul ; 0.10111E2 * 0.10000E2 == 0.10111E3
Oops :)
Now, I'm pretty sure this isn't going to happen with an integer in the range of 0-255 on any reasonable platform. But since we're coding against the CLR specification, we can't make that assumption. The JIT compiler can, but that's too late. The language compiler may define the two to be equivalent, but the C# specification doesn't - a double local is considered a float64, not F. You can make your own language, if you so desire.
In any case, IL generators don't really optimise much. That's left to JIT compilation for the most part. If you want an optimised C#-IL compiler, write one - I doubt there's enough benefit to warrant the effort, especially if your only goal is to make the IL code smaller. Most IL binaries are already quite a bit smaller than the equivalent native code.
As for the actual code that runs, on my machine, both approaches result in exactly the same x86-64 assembly - load a double precision value from the data segment. The JIT can easily make this optimisation, since it knows what architecture the code is actually running on.
I doubt you will get a more satisfactory answer than "because no one thought it necessary to implement it."
The fact is, they could have made it this way, but as Eric Lippert has stated many times, features are chosen to be implemented rather than chosen not to be implemented. In this particular case the feature's gain didn't outweigh the costs, e.g. additional testing and a non-trivial conversion between int and float, whereas in the ldc.i4.s/conv.i8 case it isn't much trouble. It's also better not to bloat the jitter with more optimization rules.
As shown by the Roslyn source code, the conversion is done only for long. All in all, it's entirely possible to add this feature for float or double as well, but it wouldn't be very useful except for producing shorter CIL code (useful when inlining is needed), and when you want to use a float constant, you usually actually want a floating-point number (i.e. not an integer).
First, let's consider correctness. The ldc.i4.s instruction can handle integers between -128 and 127, all of which can be exactly represented in float32. However, the CIL uses an internal floating-point type called F for some storage locations. The ECMA-335 standard says in III.1.1.1:
...the nominal type of the variable or expression is either float32 or float64... The internal representation shall have the following characteristics:
The internal representation shall have precision and range greater than or equal to the nominal type.
Conversions to and from the internal representation shall preserve value.
This all means that any float32 value is guaranteed to be safely represented in F no matter what F is.
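As a quick sanity check of that claim (my own sketch, not from the answer), every value the short form ldc.i4.s can load survives a round trip through float32:
using System;

for (int n = -128; n <= 127; n++)
{
    float f = n;                       // the int -> float32 conversion a conv instruction would perform
    if ((int)f != n)
        Console.WriteLine($"Not exact: {n}");   // never printed: all 256 values are exactly representable
}
Console.WriteLine("All short-form constants round-trip exactly.");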
We conclude that the alternative sequence of instructions that you have proposed is correct. Now the question is: is it better in terms of performance?
To answer this question, let's see what the JIT compiler does when it sees both code sequences. For ldc.r8 20, the answer given in the link you referenced nicely explains the ramifications of using the long instruction.
Let's consider the 3-byte sequence:
ldc.i4.s 0x14
conv.r8
We can make an assumption here that is reasonable for any optimizing JIT compiler: that the JIT recognizes such a sequence of instructions and compiles the two instructions together. The compiler is given the value 0x14 in two's-complement format and has to convert it to the float32 format (which is always safe, as discussed above). On relatively modern architectures this can be done extremely efficiently. This tiny overhead is part of the JIT time and is therefore incurred only once. The quality of the generated native code is the same for both IL sequences.
So the 9-byte sequence has a size issue, which could incur anywhere from no overhead to a noticeable amount (assuming we use it everywhere), and the 3-byte sequence has a tiny one-time conversion overhead. Which one is better? Well, somebody would have to do some scientifically sound experimentation to measure the difference in performance to answer that question. I would like to stress that you should not care about this unless you are an engineer or researcher in compiler optimizations; otherwise, you should be optimizing your code at a higher level (the source-code level).

On narrowing casts in C# between signed and unsigned integral types

I seek to get a few things confirmed regarding the way narrowing casts work with integral types in C# (5.0, .NET Framework 4.0/4.5). Bottom line: can I be sure that the underlying bytes of integral types stay the same, both in order and in value, when casting between signed and unsigned?
Let's say that I do the following:
short shortVal = -20000;
ushort ushortVal = (ushort)shortVal;
Now, the experiments I've done so far show me that the bytes in the following two byte arrays:
byte[] shortBytes = BitConverter.GetBytes(shortVal);
byte[] ushortBytes = BitConverter.GetBytes(ushortVal);
do NOT differ. I have done this exact experiment with an explicit narrowing cast from short to ushort, with the value of shortVal ranging from Int16.MinValue to Int16.MaxValue. All 2^16 cases check out fine. The interpreted value, though, is naturally reinterpreted, since the bytes stay the same. I assume the signed integral types use two's complement to represent signed values (is this true?)
I need to know if I can count on these conversions always being "byte-safe" - as in not changing the underlying bytes or their order. This also goes for conversions the other way, from unsigned to signed. Are these conversions exact reverses of each other? I am focused mostly on short/ushort and int/uint, but all integral types are of interest.
These details are likely up to the implementation of the technology behind C# and the CLR. Here I am strictly focused on the CLR for Windows 32/64-bit.
This can be quite tricky.
The CLR does use two's complement for signed integral number representation on x86/x64 architectures (as you observed in your test), and that is unlikely to change in the near future, because the architecture itself has good support for it. It's safe to assume it will stay like this for a while.
On the other hand, I haven't found any mention of this in either the CLI or the C# specification, so you can't count on it in general, especially in the face of other architectures and/or CLI implementations.
So it depends on what you want to use this on. I would stay away from depending on implementation details like this if possible, and use higher level serialization tools to convert to/from any binary representation.
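For what it's worth, here is a compact version of the experiment the question describes, with the caveat spelled out in code: the byte output below assumes a little-endian, two's-complement x86/x64 CLR and is not guaranteed by the specifications, as noted above.
using System;

short shortVal = -20000;
ushort ushortVal = unchecked((ushort)shortVal);       // 45536, i.e. 65536 - 20000

byte[] shortBytes  = BitConverter.GetBytes(shortVal);
byte[] ushortBytes = BitConverter.GetBytes(ushortVal);

// On a little-endian, two's-complement platform both lines print "E0-B1":
// the cast reinterprets the same bits, it does not change them.
Console.WriteLine(BitConverter.ToString(shortBytes));
Console.WriteLine(BitConverter.ToString(ushortBytes));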

Floating point inconsistency between expression and assigned object

This surprised me - the same arithmetic gives different results depending on how it's executed:
> 0.1f+0.2f==0.3f
False
> var z = 0.3f;
> 0.1f+0.2f==z
True
> 0.1f+0.2f==(dynamic)0.3f
True
(Tested in Linqpad)
What's going on?
Edit: I understand why floating point arithmetic is imprecise, but not why it would be inconsistent.
The venerable C reliably confirms that 0.1 + 0.2 == 0.3 holds for single-precision floats, but not for double-precision floating-point values.
I strongly suspect you may find that you get different results running this code with and without the debugger, and in release configuration vs in debug configuration.
In the first version, you're comparing two expressions. The C# language allows those expressions to be evaluated in higher precision arithmetic than the source types.
In the second version, you're assigning the addition result to a local variable. In some scenarios, that will force the result to be truncated down to 32 bits - leading to a different result. In other scenarios, the CLR or C# compiler will realize that it can optimize away the local variable.
From section 4.1.6 of the C# 4 spec:
Floating point operations may be performed with higher precision than the result type of the operation. For example, some hardware architectures support an "extended" or "long double" floating point type with greater range and precision than the double type, and implicitly perform all floating point operations with the higher precision type. Only at excessive cost in performance can such hardware architectures be made to perform floating point operations with less precision. Rather than require an implementation to forfeit both performance and precision, C# allows a higher precision type to be used for all floating point operations. Other than delivering more precise results, this rarely has any measurable effects.
EDIT: I haven't tried compiling this, but in the comments, Chris says the first form isn't being evaluated at execution time at all. The above can still apply (I've tweaked my wording slightly) - it's just shifted the evaluation time of a constant from execution time to compile-time. So long as it behaves the same way as a valid evaluation, that seems okay to me - so the compiler's own constant expression evaluation can use higher-precision arithmetic too.
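If consistency matters more to you than which result you get, an explicit cast to float asks for the value to be rounded to 32 bits before the comparison (it compiles down to a conv.r4). A sketch of my own illustrating the idea:
using System;

float x = 0.1f, y = 0.2f, z = 0.3f;

// May be evaluated in higher precision; the result can differ between build configurations:
Console.WriteLine(x + y == z);

// The explicit cast forces the sum down to a genuine 32-bit float before comparing,
// so both sides are real floats - I'd expect this to print True:
Console.WriteLine((float)(x + y) == z);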

Is this a bug? Float operation being treated as integer

This operation returns a 0:
string value = "0.01";
float convertedValue = float.Parse(value);
return (int)(convertedValue * 100.0f);
But this operation returns a 1:
string value = "0.01";
float convertedValue = float.Parse(value) * 100.0f;
return (int)(convertedValue);
Since convertedValue is a float, and the multiplication by 100.0f is in parentheses, shouldn't it still be treated as a float operation?
The difference between the two lies in the way the compiler optimizes floating point operations. Let me explain.
string value = "0.01";
float convertedValue = float.Parse(value);
return (int)(convertedValue * 100.0f);
In this example, the value is parsed into an 80-bit floating-point number for use in the inner floating-point dungeons of the computer. Then this is converted to a 32-bit float for storage in the convertedValue variable. This causes the value to be rounded to, seemingly, a number slightly less than 0.01. Then it is converted back to an 80-bit float and multiplied by 100, increasing the rounding error roughly 100-fold. Then it is converted to a 32-bit int. This truncates the float, and since it is actually slightly less than 1, the int conversion returns 0.
string value = "0.01";
float convertedValue = float.Parse(value) * 100.0f;
return (int)(convertedValue);
In this example, the value is parsed into an 80-bit floating point number again. It is then multiplied by 100, before it is converted to a 32-bit float. This means that the rounding error is so small that when it is converted to a 32-bit float for storage in convertedValue, it rounds to exactly 1. Then when it is converted to an int, you get 1.
The main idea is that the computer uses high-precision floats for calculations, and then rounds the values whenever they are stored in a variable. The more assignments you have with floats, the more the rounding errors accumulate.
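If the underlying goal is simply to turn a string like "0.01" into a whole number of hundredths, a decimal-based sketch (my suggestion, not part of this answer) avoids binary rounding altogether:
using System;
using System.Globalization;

string value = "0.01";
decimal converted = decimal.Parse(value, CultureInfo.InvariantCulture);
int hundredths = (int)(converted * 100m);   // decimal represents 0.01 exactly, so this is reliably 1
Console.WriteLine(hundredths);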
Please read an introduction to floating point. This is a typical floating-point problem. Binary floating-point numbers can't represent 0.01 exactly.
0.01 * 100 is approximately 1.
If it happens to be rounded to 0.999... you get 0, and if it gets rounded to 1.000... you get 1. Which one of those you get is undefined.
The JIT compiler is not required to round the same way every time it encounters a similar expression (or even the same expression in different contexts). In particular, it can use higher precision whenever it wants to, but it can also downgrade to 32-bit floats if it thinks that's a good idea.
One interesting point is an explicit cast to float (even if you already have an expression of type float). This forces the JITer to reduce the precision to 32-bit floats at that point. The exact rounding is still undefined, though.
Since the rounding is undefined, it can vary between .net versions, debug/release builds, the presence of debuggers (and possibly the phase of the moon :P).
Storage locations for floating-point numbers (statics, array elements, and fields of classes) are of fixed size. The supported storage sizes are float32 and float64. Everywhere else (on the evaluation stack, as arguments, as return types, and as local variables) floating-point numbers are represented using an internal floating-point type.
When a floating-point value whose internal representation has greater range and/or precision than its nominal type is put in a storage location, it is automatically coerced to the type of the storage location. This can involve a loss of precision or the creation of an out-of-range value (NaN, +infinity, or -infinity). However, the value might be retained in the internal representation for future use, if it is reloaded from the storage location without having been modified. It is the responsibility of the compiler to ensure that the retained value is still valid at the time of a subsequent load, taking into account the effects of aliasing and other execution threads (see memory model (§12.6)). This freedom to carry extra precision is not permitted, however, following the execution of an explicit conversion (conv.r4 or conv.r8), at which time the internal representation must be exactly representable in the associated type.
Your specific problem can be solved by using Decimal, but similar problems with 3*(1/3f) won't be solved by this, since Decimal can't represent one third exactly either.
In this line:
(int)(convertedValue * 100.0f)
The intermediate value is actually of higher precision, not simply a float. To obtain identical results to the second one, you'd have to do:
(int)((float)(convertedValue * 100.0f))
On the IL level, the difference looks like:
mul
conv.i4
versus your second version:
mul
stloc.3
ldloc.3
conv.i4
Note that the second one stores/restores the value in a float32 variable, which forces it to be rounded to float precision. (Note that, as per CodeInChaos' comment, this is not guaranteed by the spec.)
(For completeness the explicit cast looks like:)
mul
conv.r4
conv.i4
I know this issue and work with it all the time.
As our friend CodeInChaos answered, the floating-point value will not be represented in memory as-is.
But I want to add that there is a reason for the different results beyond the JIT being free to use whatever precision it wants.
The reason is that in your first snippet you convert the string and store the result in memory, so it is not stored as exactly 0.01 but as something slightly smaller, like 0.0099999....
In your second snippet you perform the multiplication before the value is stored in memory, so you get the correct result without being exposed to the reduced precision of the stored float.
