"Converting" float to double in C# - c#

I have a property for the life of a particle (I'm simulating particle systems) which is a float value, because I also use it for transparency (alpha is a float). I have read questions about converting float to double and realized that it is quite a problem, so I probably can't convert float to double. The problem is that I want to calculate the path of the particle from the life variable, like this:
particle.x += particle.xi;
particle.xi = Math.Sin(life);
Note: the life value starts at 1.0f and decreases to 0 (if it is < 0.0f, I reinitialize the particle and set life back to 1.0f).
But Sin wants a double value ... and we are back at the beginning.
So one solution would be to change the type of the life property to double, and when I use it for transparency I would just convert this double to float, which shouldn't be a big problem (I guess).
But my question is whether there is any other way to do it, because double also costs more memory (I don't know what "more" means in this case; I guess two times more. Let's say I'll have 500 particles and each will have this property). I just need to somehow calculate the sin value from this float property.
So is it possible? Are my concerns about memory important?

This should work:
particle.x += particle.xi;
particle.xi = (float) Math.Sin(life);
It doesn't "use more memory", it simply converts the value to a double temporarily while it's recalculating it, then converts it back to a float when it goes to store the value.
To go into a bit more detail: Math.Sin requires a double, but a float can be converted to this higher precision without loss, so it just "magically works" (an implicit cast from float to double). However, converting the resulting double back to a float reduces the precision of the number, so the compiler (which doesn't know whether this is acceptable to you) won't do it unless you force it to (by using (float), which is an explicit cast from double to float).
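A minimal sketch of the above, assuming a hypothetical Particle class (the class, its property names, and the Step method are made up for illustration; only the cast pattern comes from the answer). It also prints the storage sizes, which suggests the memory concern is small: switching one field to double would cost 4 extra bytes per particle.

```csharp
using System;

// Hypothetical particle type, for illustration only.
public class Particle
{
    public float X { get; set; }
    public float Xi { get; set; }
    public float Life { get; set; } = 1.0f;   // also used as the alpha value

    public void Step()
    {
        X += Xi;
        // Life is widened to double implicitly when passed to Math.Sin;
        // the (float) cast narrows the double result back for storage.
        Xi = (float)Math.Sin(Life);
    }
}

public static class ParticleDemo
{
    public static void Main()
    {
        var p = new Particle();
        p.Step();
        p.Step();
        Console.WriteLine(p.X);

        Console.WriteLine(sizeof(float));                           // 4
        Console.WriteLine(sizeof(double));                          // 8
        Console.WriteLine(500 * (sizeof(double) - sizeof(float)));  // 2000 extra bytes for 500 particles
    }
}
```

On newer runtimes (.NET Core 2.0 and later) there is also MathF.Sin, which takes and returns float, so no cast is needed there.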

Related

C# explanation of the "f" keyword for a float vs implicit and explicit conversions

So I don't really get the conversion from a float to a double e.g:
float wow = 1.123562641346;
float wow = 1.123562641346f;
The first one is a double that gets saved in a variable that has memory reserved for a float, and it can't be implicitly converted to a float, so it gives an error.
The second one has on the right side a float being saved inside a float variable. What I don't get is that the second one gives exactly the same result as this:
float wow = (float) 1.123562641346;
I mean, is the second one exactly the same as (float)? Does the "f" just stand for explicitly converting to a float?
If it doesn't mean "explicitly" convert to float, then I don't know why it doesn't give an error, since there isn't an implicit conversion for it.
I really can't find any resources that seem to explain this in any way, the only answers I can find is that the "f" just means that it is a float data type, but that still doesn't explain why I can give it something with 13 decimals and it converts it to the 7 decimals expected, while the first solution doesn't work and it doesn't automatically convert it.
Implicitly converting a double to a float is potentially a data-losing operation, so the compiler makes it an error.
Explicitly converting it means that the programmer has taken control, so it does not provoke an error.
Note that in your example, the double value does indeed lose data when it's converted to float as the following demonstrates:
double d = 1.123562641346;
Console.WriteLine(d.ToString("f16")); // 1.1235626413460000
float f1 = (float)1.123562641346;
Console.WriteLine(f1.ToString("f16")); // 1.1235626935958862
float f2 = 1.123562641346f;
Console.WriteLine(f2.ToString("f16")); // 1.1235626935958862
The compiler is trying to prevent the programmer from writing code that causes accidental data loss.
Note that it does NOT warn that float f2 = 1.123562641346f; is trying to initialise f2 with a value that it cannot actually represent. The same thing can happen with double initialisations - the compiler won't warn about assigning a double that can't actually be represented exactly.
The numeric value on the right of the "=" when initialising a floating point number is known as a "real-literal".
The C# Standard says this about converting the value of a real-literal to a floating point value:
The value of a real literal of type float or double is determined by
using the IEEE “round to nearest” mode.
This rounding is performed without provoking a compile error.
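To make the comparison concrete, here is a small sketch using the literal from the question (the class and variable names are just for illustration). It checks that the cast and the f suffix end up with the same float, and that this float no longer equals the original double.

```csharp
using System;

public static class LiteralDemo
{
    public static void Main()
    {
        // float bad = 1.123562641346;          // does not compile (CS0664): a double literal needs a cast or an f suffix
        float viaCast = (float)1.123562641346;  // double literal, explicitly narrowed to float
        float viaSuffix = 1.123562641346f;      // float literal, rounded to the nearest float

        Console.WriteLine(viaCast == viaSuffix);                 // True
        Console.WriteLine((double)viaSuffix == 1.123562641346);  // False: digits were lost in the rounding
    }
}
```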
Your understanding is correct, the f at the end of the number indicates that it's a float, so it will be considered as a float and when you assign this float value to a float variable, you will not get conversion errors.
If there is no f at the end of the number having decimals, then, by default, the value is handled as a double and once you assign this double value to a float variable, you get an error because of potential data loss.
Read more here: https://answers.unity.com/questions/423675/why-is-there-sometimes-an-f-behinf-a-number.html

How do i convert Vector3 into float? [duplicate]

Example:
float timeRemaining = 0.58f;
Why is the f required at the end of this number?
Your declaration of a float contains two parts:
1. It declares that the variable timeRemaining is of type float.
2. It assigns the value 0.58 to this variable.
The problem occurs in part 2.
The right-hand side is evaluated on its own. According to the C# specification, a number containing a decimal point that doesn't have a suffix is interpreted as a double.
So we now have a double value that we want to assign to a variable of type float. In order to do this, there must be an implicit conversion from double to float. There is no such conversion, because you may (and in this case do) lose information in the conversion.
The reason is that the value used by the compiler isn't really 0.58, but the floating-point value closest to 0.58, which is approximately 0.57999999999999996 for double and exactly 0.579999983310699462890625 for float.
Strictly speaking, the f is not required. You can avoid having to use the f suffix by casting the value to a float:
float timeRemaining = (float)0.58;
Because there are several numeric types that the compiler can use to represent the value 0.58: float, double and decimal. Unless you are OK with the compiler picking one for you, you have to disambiguate.
The documentation for double states that if you do not specify the type yourself the compiler always picks double as the type of any real numeric literal:
By default, a real numeric literal on the right side of the assignment
operator is treated as double. However, if you want an integer number
to be treated as double, use the suffix d or D.
Appending the suffix f creates a float; the suffix d creates a double; the suffix m creates a decimal. All of these also work in uppercase.
However, this is still not enough to explain why this does not compile:
float timeRemaining = 0.58;
The missing half of the answer is that the conversion from the double 0.58 to the float timeRemaining potentially loses information, so the compiler refuses to apply it implicitly. If you add an explicit cast the conversion is performed; if you add the f suffix then no conversion will be needed. In both cases the code would then compile.
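A short sketch of the suffix rules described above (names are illustrative). It prints the runtime type of each literal form and shows that the suffix and the cast produce the same float.

```csharp
using System;

public static class SuffixDemo
{
    public static void Main()
    {
        Console.WriteLine((0.58).GetType());    // System.Double
        Console.WriteLine((0.58f).GetType());   // System.Single
        Console.WriteLine((0.58m).GetType());   // System.Decimal

        // float timeRemaining = 0.58;          // does not compile: no implicit double-to-float conversion
        float withSuffix = 0.58f;               // no conversion needed
        float withCast = (float)0.58;           // explicit conversion from double

        Console.WriteLine(withSuffix == withCast);       // True
        Console.WriteLine((double)withSuffix == 0.58);   // False: the float is not the double closest to 0.58
    }
}
```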
The problem is that .NET, in order to allow some types of implicit operations to be carried out involving float and double, needed to either explicitly specify what should happen in all scenarios involving mixed operands or else allow implicit conversions between the types to be performed in one direction only; Microsoft chose to follow the lead of Java in allowing the direction which occasionally favors precision, but frequently sacrifices correctness and generally creates hassle.
In almost all cases, taking the double value which is closest to a particular numeric quantity and assigning it to a float will yield the float value which is closest to that same quantity. There are a few corner cases, such as the value 9,007,199,791,611,905; the best float representation would be 9,007,200,328,482,816 (which is off by 536,870,911), but casting the best double representation (i.e. 9,007,199,791,611,904) to float yields 9,007,199,254,740,992 (which is off by 536,870,913). In general, though, converting the best double representation of some quantity to float will either yield the best possible float representation, or one of two representations that are essentially equally good.
Note that this desirable behavior applies even at the extremes; for example, the best float representation for the quantity 10^308 matches the float representation achieved by converting the best double representation of that quantity. Likewise, the best float representation of 10^309 matches the float representation achieved by converting the best double representation of that quantity.
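The corner case above can be checked with integer arithmetic. This is only a sketch: the class and member names are invented, and the nearest-double and nearest-float values are taken from the paragraph above rather than computed.

```csharp
using System;

public static class DoubleRoundingDemo
{
    // The quantity from the text; it fits in a long exactly.
    public const long Quantity = 9007199791611905L;

    // The float nearest to Quantity, as stated in the text.
    public const long NearestFloat = 9007200328482816L;

    // Go through double first, then narrow to float.
    public static long ViaDouble()
    {
        double bestDouble = 9007199791611904d;  // nearest double to Quantity, per the text
        float narrowed = (float)bestDouble;     // exact tie between two floats, rounds to even
        return (long)narrowed;
    }

    public static void Main()
    {
        Console.WriteLine(NearestFloat - Quantity);   // 536870911
        Console.WriteLine(Quantity - ViaDouble());    // 536870913
    }
}
```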
Unfortunately, conversions in the direction that doesn't require an explicit cast are seldom anywhere near as accurate. Converting the best float representation of a value to double will seldom yield anything particularly close to the best double representation of that value, and in some cases the result may be off by hundreds of orders of magnitude (e.g. converting the best float representation of 10^40 to double will yield a value that compares greater than the best double representation of 10^300).
Alas, the conversion rules are what they are, so one has to live with using silly typecasts and suffixes when converting values in the "safe" direction, and be careful of implicit typecasts in the dangerous direction which will frequently yield bogus results.
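A sketch of the two points about the implicit float-to-double direction (names are illustrative). The first check shows that widening the float closest to 0.1 does not give the double closest to 0.1; the second reproduces the 10^40 example, where the nearest float is infinity.

```csharp
using System;

public static class WideningDemo
{
    public static void Main()
    {
        float tenth = 0.1f;
        double widened = tenth;                        // implicit, compiles without a cast
        Console.WriteLine(widened == 0.1);             // False

        double big = 1e40;                             // larger than float.MaxValue
        float asFloat = (float)big;                    // becomes positive infinity
        double back = asFloat;                         // implicit again
        Console.WriteLine(float.IsPositiveInfinity(asFloat));  // True
        Console.WriteLine(back > 1e300);               // True
    }
}
```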

C# "Integral constant is too large" - Integer variable too large for Int32 type

I'm building an ASP.NET Core API that returns the periodic table and the planets in the solar system. Mainly to play with API calls, as well as data types in .NET. Building this in Visual Studio 2017.
I'm having problems with a Mass property for planets, in kilograms. Obviously a large integer, so I've tried declaring it as long, ulong, Int64, and even UInt64. However, when I try to enter a new planet model and put in the mass, I get the following error:
struct System.Int32
Represents a 32 bit signed integer.
Integral constant is too large.
Here is my PlanetModel.cs where I'm describing the property:
...
public UInt64 Volume { get; set; } //In Kilometers
public Int64 Mass { get; set; } //In Kilograms
public float Gravity { get; set; } //In m/s^2
...
Yet when I move over to my PlanetDataStore.cs, here is where I'm trying to build the object with data.
new PlanetModel()
{
Position = 1,
Name = "Mercury",
Distance = 57909050, //In Kilometers
Orbit = 0.240846F, //In years
SolarDay = 0.5F, //In Days
Radius = 2439, //In Kilometers
Volume = 60830000000, // In Kilometers
Mass = 330110000000000000000000, //In Kilograms
Gravity = 3.7F, //In m/s^2
},
I get a red error squiggly over the first '3' in my Mass, with the error above. Yet when I mouse over Mass, it reads: "long PlanetModel.Mass { get; set; }"
When I mouse over the equals, it reads: "struct System.Int64"
Where is the miscommunication? Why is my property declared as a 64-bit integer, but the value stuck in 32-bit?
"Obviously a large integer"
That is by no means obvious. The mass of the earth is not an integral number of kilograms. You shouldn't be using any integral type for this application. Use integers for things that are genuinely integers, like the number of elements in a sequence.
Use double for physical quantities that are accurately measured to ten-ish decimal places. Mass, volume, length, force, and so on, should always be doubles.
This will also let you get rid of those hard-to-read numbers. Doubles let you use scientific notation because doubles are for science:
Mass = 3.3011E23,
Also, while we're looking at your solution, I note that for spherical planets, the volume can be computed from the radius. You might not want to store both; instead, just store one and calculate the other when you need it.
I note also that you have a single "distance" which seems to be the semi-major axis in the case of Mercury. Why are you storing only the semi-major axis, and why not call it what it is?
I note also that you have mixed up your units all over the place -- meters and kilometers, days and years, and so on. Why not keep everything in standard units? Meters for length, seconds for time, kilograms for mass. You'll find that you make fewer silly arithmetic mistakes when you use standard units consistently.
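A rough sketch of what these suggestions could look like. It assumes a trimmed-down PlanetModel with only three properties, SI units throughout, and a perfectly spherical planet; the radius is the question's 2439 km converted to meters.

```csharp
using System;

// Trimmed-down, hypothetical version of the question's model.
public class PlanetModel
{
    public string Name { get; set; }
    public double Radius { get; set; }   // in meters
    public double Mass { get; set; }     // in kilograms

    // Volume in cubic meters, computed from the radius instead of stored.
    public double Volume => 4.0 / 3.0 * Math.PI * Math.Pow(Radius, 3);
}

public static class PlanetDemo
{
    public static void Main()
    {
        var mercury = new PlanetModel
        {
            Name = "Mercury",
            Radius = 2439e3,     // 2439 km expressed in meters
            Mass = 3.3011E23,    // scientific notation is allowed for double literals
        };
        Console.WriteLine(mercury.Mass.ToString("E4"));
        Console.WriteLine(mercury.Volume.ToString("E3"));
    }
}
```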
The value you have is too large for an Int64; long is also 64 bits. The largest number you can store in an Int64 is 2^63 - 1 = 9,223,372,036,854,775,807
You can store this number as floating point. If you want to store the number as a floating point you have to tell the compiler it's a floating point number.
Look up how floating point numbers are stored to get a better understanding. If you want to store it in a float you can do this 330110000000000000000000.0f
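Integer literals can be suffixed with L in C# to force the long type (an unsuffixed literal already takes the first of int, uint, long, ulong that can hold it):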
Mass = 123456789012345L
Though that clearly won't enable you to store a number larger than 64 bits in a long :)
Why not use a floating point type?
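For reference, a small sketch of how the compiler types unsuffixed integer literals, which is likely where the "Integral constant is too large" message comes from: the literal is typed on its own, before the type of the Mass property is considered, and this one does not fit even in ulong. The class name is illustrative.

```csharp
using System;

public static class IntegerLiteralDemo
{
    public static void Main()
    {
        // An unsuffixed integer literal gets the first of int, uint, long, ulong that can hold it.
        Console.WriteLine((2147483647).GetType());           // System.Int32
        Console.WriteLine((2147483648).GetType());           // System.UInt32
        Console.WriteLine((4294967296).GetType());           // System.Int64
        Console.WriteLine((9223372036854775808).GetType());  // System.UInt64

        // long mass = 330110000000000000000000;   // does not compile (CS1021): larger than ulong.MaxValue
        double mass = 330110000000000000000000D;    // compiles: the D suffix makes it a double literal
        Console.WriteLine(mass > ulong.MaxValue);   // True
    }
}
```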
As others pointed out, your number is too large to fit in the UInt64 structure.
So you'll need BigInteger, or a numeric type with a wider range, such as double. Declare the Mass property in your model like this:
public double Mass { get; set; } //In Kilograms
Next, suffix your numeric literal with D (writing it with a decimal point or an exponent also works), like so:
Mass = 330110000000000000000000D, //In Kilograms
You format the number for display as usual:
mercury.Mass.ToString("N0");
You have a slight problem here: the max values of Int64 and UInt64 are way less than the number you are trying to assign.
To put it in perspective:
330,110,000,000,000,000,000,000 //Your number
9,223,372,036,854,775,807 //Int64 Max Value
18,446,744,073,709,551,615 //UInt64 Max Value
Your number is simply too large for that data type.
Although I have zero experience with it, you might be able to use BigInteger if you are using .NET 4.0 or above.
I'm not sure where you're getting the structs and type info, but even at Int64 that value is simply too large. Int64.MaxValue is:
9223372036854775807
And you have:
330110000000000000000000
Even unsigned, that's orders of magnitude too large. You might use something like BigInteger instead.
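If an exact integer really is wanted, a BigInteger version might look like the sketch below. It assumes a reference to System.Numerics (available since .NET 4.0); the variable and class names are made up.

```csharp
using System;
using System.Numerics;

public static class BigIntegerDemo
{
    public static void Main()
    {
        // Parse from a string, because no integer literal this large exists in C#.
        BigInteger mass = BigInteger.Parse("330110000000000000000000");

        Console.WriteLine(mass > long.MaxValue);    // True
        Console.WriteLine(mass > ulong.MaxValue);   // True
        Console.WriteLine(mass.ToString());         // 330110000000000000000000

        // An explicit conversion to double is available when exactness is not needed.
        double approx = (double)mass;
        Console.WriteLine(approx > 3.3e23);         // True
    }
}
```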

Is this a bug? Float operation being treated as integer

This operation returns a 0:
string value = "0.01";
float convertedValue = float.Parse(value);
return (int)(convertedValue * 100.0f);
But this operation returns a 1:
string value = "0.01";
float convertedValue = float.Parse(value) * 100.0f;
return (int)(convertedValue);
Because convertedValue is a float, and the multiplication by 100.0f is inside the parentheses, shouldn't it still be treated as a float operation?
The difference between the two lies in the way the compiler optimizes floating point operations. Let me explain.
string value = "0.01";
float convertedValue = float.Parse(value);
return (int)(convertedValue * 100.0f);
In this example, the value is parsed into an 80-bit floating point number for use in the inner floating point dungeons of the computer. Then this is converted to a 32-bit float for storage in the convertedValue variable. This causes the value to be rounded to, seemingly, a number slightly less than 0.01. Then it is converted back to an 80-bit float and multiplied by 100, increasing the rounding error 100-fold. Then it is converted to a 32-bit int. This causes the float to be truncated, and since it is actually slightly less than 1, the int conversion returns 0.
string value = "0.01";
float convertedValue = float.Parse(value) * 100.0f;
return (int)(convertedValue);
In this example, the value is parsed into an 80-bit floating point number again. It is then multiplied by 100, before it is converted to a 32-bit float. This means that the rounding error is so small that when it is converted to a 32-bit float for storage in convertedValue, it rounds to exactly 1. Then when it is converted to an int, you get 1.
The main idea is that the computer uses high-precision floats for calculations, and then rounds the values whenever they are stored in a variable. The more assignments you have with floats, the more the rounding errors accumulate.
Please read an introduction to floating point. This is a typical floating point problem. Binary floating point can't represent 0.01 exactly.
0.01 * 100 is approximately 1.
If it happens to be rounded to 0.999... you get 0, and if it gets rounded to 1.000... you get 1. Which one of those you get is undefined.
The JIT compiler is not required to round the same way every time it encounters a similar expression (or even the same expression in different contexts). In particular it can use higher precision whenever it wants to, but can downgrade to 32-bit floats if it thinks that's a good idea.
One interesting point is an explicit cast to float (even if you already have an expression of type float). This forces the JITer to reduce the precision to 32 bit floats at that point. The exact rounding is still undefined though.
Since the rounding is undefined, it can vary between .net versions, debug/release builds, the presence of debuggers (and possibly the phase of the moon :P).
Storage locations for floating-point numbers (statics, array elements, and fields of classes) are of fixed size. The supported storage sizes are float32 and float64. Everywhere else (on the evaluation stack, as arguments, as return types, and as local variables) floating-point numbers are represented using an internal floating-point type.
When a floating-point value whose internal representation has greater range and/or precision than its nominal type is put in a storage location, it is automatically coerced to the type of the storage location. This can involve a loss of precision or the creation of an out-of-range value (NaN, +infinity, or -infinity). However, the value might be retained in the internal representation for future use, if it is reloaded from the storage location without having been modified. It is the responsibility of the compiler to ensure that the retained value is still valid at the time of a subsequent load, taking into account the effects of aliasing and other execution threads (see memory model (§12.6)). This freedom to carry extra precision is not permitted, however, following the execution of an explicit conversion (conv.r4 or conv.r8), at which time the internal representation must be exactly representable in the associated type.
Your specific problem can be solved by using Decimal, but similar problems with 3*(1/3f) won't be solved by this, since Decimal can't represent one third exactly either.
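A sketch of both halves of that statement (names are illustrative; the invariant culture is used so that "0.01" parses the same way everywhere). Decimal stores 0.01 exactly, so the truncation gives 1, but one third is still only approximated.

```csharp
using System;
using System.Globalization;

public static class DecimalDemo
{
    public static void Main()
    {
        decimal parsed = decimal.Parse("0.01", CultureInfo.InvariantCulture);
        Console.WriteLine((int)(parsed * 100m));   // 1: 0.01 is exact in decimal

        decimal third = 1m / 3m;                   // 0.3333333333333333333333333333
        Console.WriteLine(3m * third == 1m);       // False
        Console.WriteLine((int)(3m * third));      // 0: the product is just below 1
    }
}
```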
In this line:
(int)(convertedValue * 100.0f)
The intermediate value is actually of higher precision, not simply a float. To obtain identical results to the second one, you'd have to do:
(int)((float)(convertedValue * 100.0f))
On the IL level, the difference looks like:
mul
conv.i4
versus your second version:
mul
stloc.3
ldloc.3
conv.i4
Note that the second one stores the value in a float32 variable and reloads it, which forces it to float precision. (Note that, as per CodeInChaos' comment, this is not guaranteed by the spec.)
(For completeness the explicit cast looks like:)
mul
conv.r4
conv.i4
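Put together as a runnable sketch (the class and method names are invented for illustration). The result of the first method depends on the runtime and JIT, as discussed above, so no output is claimed for it; the second uses the explicit (float) cast to force float precision before truncating.

```csharp
using System;
using System.Globalization;

public static class TruncationDemo
{
    // May give 0 or 1, depending on whether the product is kept at higher precision.
    public static int WithoutCast(string value)
    {
        float convertedValue = float.Parse(value, CultureInfo.InvariantCulture);
        return (int)(convertedValue * 100.0f);
    }

    // The explicit cast rounds the product to float precision before truncating.
    public static int WithCast(string value)
    {
        float convertedValue = float.Parse(value, CultureInfo.InvariantCulture);
        return (int)(float)(convertedValue * 100.0f);
    }

    public static void Main()
    {
        Console.WriteLine(WithoutCast("0.01"));
        Console.WriteLine(WithCast("0.01"));   // 1
    }
}
```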
I know this issue and work with it all the time.
As CodeInChaos answered, a floating-point value is not stored in memory exactly as written.
But I want to add that there is a concrete reason for the different results here, beyond the JIT being free to use whatever precision it wants.
In your first snippet you convert the string and store the result in a float variable, so what gets stored is not exactly 0.01 but something like 0.0099999.
In your second snippet you do the multiplication before the value is stored in the variable, so the rounding to float happens only after the multiplication and you get the result you expect.

Avoiding rounding in Convert.ToSingle method

Consider:
object a = 1.123456;
float f = Convert.ToSingle(a);
But when I print the value of f, I get 1.123455.
It is getting rounded off.
The problem is also that I can't change the float data type in the code.
Please help.
This is done because of the way the floating-point type works.
If you want better precision (at the cost of some performance), use the Double or Decimal type instead.
For more information about why floating-point loses precision, read:
http://msdn.microsoft.com/en-us/library/c151dt3s%28VS.80%29.aspx
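A small sketch of what happens to the value from the question (names are illustrative). Convert.ToSingle on a boxed double gives the same result as a plain (float) cast, and that float no longer equals the original double; the printed digits depend on the runtime's formatting, so none are claimed here.

```csharp
using System;
using System.Globalization;

public static class ToSingleDemo
{
    public static void Main()
    {
        object a = 1.123456;                 // boxed double
        float f = Convert.ToSingle(a);       // narrows to the nearest float

        Console.WriteLine(f == (float)1.123456);    // True: same as an explicit cast
        Console.WriteLine((double)f == 1.123456);   // False: float keeps only about 7 significant digits

        // Round-trip formatting shows the value the float actually holds.
        Console.WriteLine(f.ToString("R", CultureInfo.InvariantCulture));
    }
}
```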
