No, this is not another "Why is (1/3.0)*3 != 1" question.
I've been reading about floating-points a lot lately; specifically, how the same calculation might give different results on different architectures or optimization settings.
This is a problem for video games that store replays or are peer-to-peer networked (as opposed to server-client), both of which rely on all clients generating exactly the same results every time they run the program - a small discrepancy in one floating-point calculation can lead to a drastically different game state on different machines (or even on the same machine!)
This happens even among processors that "follow" IEEE-754, primarily because some processors (namely x86) use double extended precision. That is, they use 80-bit registers for all the calculations, then round to 64 or 32 bits when storing, leading to different rounding results than machines that use 64 or 32 bits for the calculations themselves.
I've seen several solutions to this problem online, but all for C++, not C#:
Disable double extended-precision mode (so that all double calculations use IEEE-754 64-bits) using _controlfp_s (Windows), _FPU_SETCW (Linux?), or fpsetprec (BSD).
Always run the same compiler with the same optimization settings, and require all users to have the same CPU architecture (no cross-platform play). Because my "compiler" is actually the JIT, which may optimize differently every time the program is run, I don't think this is possible.
Use fixed-point arithmetic, and avoid float and double altogether. decimal would work for this purpose, but would be much slower, and none of the System.Math library functions support it.
So, is this even a problem in C#? What if I only intend to support Windows (not Mono)?
If it is, is there any way to force my program to run at normal double-precision?
If not, are there any libraries that would help keep floating-point calculations consistent?
I know of no way to make normal floating point deterministic in .NET. The JITter is allowed to create code that behaves differently on different platforms (or between different versions of .NET). So using normal floats in deterministic .NET code is not possible.
The workarounds I considered:
Implement FixedPoint32 in C#. While this is not too hard (I have a half-finished implementation; a minimal sketch follows this list), the very small range of values makes it annoying to use. You have to be careful at all times so you neither overflow nor lose too much precision. In the end I found this no easier than using integers directly.
Implement FixedPoint64 in C#. I found this rather hard to do: for some operations, 128-bit intermediate integers would be useful, but .NET doesn't offer such a type.
Implement a custom 32-bit floating point type. The lack of a BitScanReverse intrinsic causes a few annoyances when implementing this, but currently I think this is the most promising path.
Use native code for the math operations. This incurs the overhead of a delegate call on every math operation.
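For illustration, a minimal sketch of what such a FixedPoint32 could look like (a Q16.16 layout; the name and design are my own assumptions, not the half-finished implementation mentioned above, and overflow checking is omitted):

// Q16.16 fixed point: the raw int is the real value scaled by 2^16.
public struct FixedPoint32
{
    public const int FractionBits = 16;
    public readonly int Raw;

    private FixedPoint32(int raw) { Raw = raw; }

    public static FixedPoint32 FromInt(int value)
    {
        return new FixedPoint32(value << FractionBits);
    }

    public static FixedPoint32 operator +(FixedPoint32 a, FixedPoint32 b)
    {
        return new FixedPoint32(a.Raw + b.Raw);          // plain integer add
    }

    public static FixedPoint32 operator *(FixedPoint32 a, FixedPoint32 b)
    {
        return new FixedPoint32((int)(((long)a.Raw * b.Raw) >> FractionBits));
    }

    public static FixedPoint32 operator /(FixedPoint32 a, FixedPoint32 b)
    {
        return new FixedPoint32((int)(((long)a.Raw << FractionBits) / b.Raw));
    }
}

The small range is visible right away: with 16 fractional bits the integer part only covers about ±32768, which is exactly the awkwardness described above.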
I've just started a software implementation of 32-bit floating point math. It can do about 70 million additions/multiplications per second on my 2.66 GHz i3.
https://github.com/CodesInChaos/SoftFloat. Obviously it's still very incomplete and buggy.
The C# specification (§4.1.6 Floating point types) specifically allows floating-point computations to be done at a precision higher than that of the result. So, no, I don't think you can make those calculations deterministic directly in .NET. Others have suggested various workarounds, so you could try them.
The following page may be useful in the case where you need absolute portability of such operations. It discusses software for testing implementations of the IEEE 754 standard, including software for emulating floating point operations. Most information is probably specific to C or C++, however.
http://www.math.utah.edu/~beebe/software/ieee/
A note on fixed point
Binary fixed point numbers can also work well as a substitute for floating point, as is evident from the four basic arithmetic operations:
Addition and subtraction are trivial. They work the same way as integers. Just add or subtract!
To multiply two fixed point numbers, multiply the two raw values, then shift the result right by the number of fractional bits.
To divide two fixed point numbers, shift the dividend left by the number of fractional bits, then divide by the divisor.
Chapter four of Hattangady (2007) has additional guidance on implementing binary fixed point numbers (S.K. Hattangady, "Development of a Block Floating Point Interval ALU for DSP and Control Applications", Master's thesis, North Carolina State University, 2007).
Binary fixed point numbers can be implemented on any integer data type such as int, long, and BigInteger, and the non-CLS-compliant types uint and ulong.
As suggested in another answer, you can use lookup tables, where each element in the table is a binary fixed point number, to help implement complex functions such as sine, cosine, square root, and so on. If the lookup table is less granular than the fixed point number, it is suggested to round the input by adding one half of the granularity of the lookup table to the input:
// Assume each number has a 12-bit fractional part (1/4096), while each entry
// in the lookup table corresponds to a fixed point number with an 8-bit
// fractional part (1/256).
input += (1 << 3);  // add 2^3 (half of the table's granularity) for rounding
input >>= 4;        // shift right by 4 to get an 8-bit fractional part
// --- clamp or restrict input to the table bounds here ---
// Look up the value.
return lookupTable[input];
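As a concrete example of building such a table for sine (the table size and names are illustrative; to be fully deterministic the generated constants should be baked in rather than recomputed on every machine):

// Sketch: a sine lookup table whose entries are fixed point numbers with an
// 8-bit fractional part. Generating it with Math.Sin at startup is shown only
// for brevity; shipping the table as constants avoids any per-machine drift.
const int TableSize = 1024;                     // resolution of one full turn
static readonly int[] SineTable = BuildSineTable();

static int[] BuildSineTable()
{
    var table = new int[TableSize];
    for (int i = 0; i < TableSize; i++)
    {
        double angle = (2.0 * Math.PI * i) / TableSize;
        table[i] = (int)Math.Round(Math.Sin(angle) * 256.0);  // raw value with 8 fractional bits
    }
    return table;
}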
Is this a problem for C#?
Yes. Different architectures are the least of your worries; different framerates and the like can lead to deviations due to inaccuracies in float representations - even if they are the same inaccuracies (e.g. the same architecture, except for a slower GPU on one machine).
Can I use System.Decimal?
There is no reason you can't; however, it's dog slow.
Is there a way to force my program to run in double precision?
Yes. Host the CLR runtime yourself, and compile all the necessary calls/flags (the ones that change the behaviour of floating-point arithmetic) into the C++ host application before calling CorBindToRuntimeEx.
Are there any libraries that would help keep floating point calculations consistent?
Not that I know of.
Is there another way to solve this?
I have tackled this problem before; the idea is to use QNumbers. They are a fixed-point representation of reals - but not fixed point in base 10 (decimal), rather base 2 (binary). Because of this, the mathematical primitives on them (add, sub, mul, div) are much faster than naive base-10 fixed point, especially if n (the number of fractional bits) is the same for both values (which in your case it would be). Furthermore, because they are integral, they have well-defined results on every platform.
Keep in mind that framerate can still affect these, but it is not as bad and is easily rectified using synchronisation points.
Can I use more mathematical functions with QNumbers?
Yes - round-trip through decimal to do this. Furthermore, you should really be using lookup tables for the trig functions (sin, cos), as those can give genuinely different results on different platforms - and if you code them correctly they can consume QNumbers directly.
According to this slightly old MSDN blog entry, the JIT will not use SSE/SSE2 for floating point; it's all x87. Because of that, as you mentioned, you have to worry about modes and flags, and in C# that's not possible to control. So using normal floating-point operations will not guarantee the exact same result on every machine for your program.
To get precise reproducibility of double precision you are going to have to do software floating point (or fixed point) emulation. I don't know of C# libraries to do this.
Depending on the operations you need, you might be able to get away with single precision. Here's the idea (a minimal sketch follows the list):
store all values you care about in single precision
to perform an operation:
expand inputs to double precision
do operation in double precision
convert result back to single precision
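As a sketch, assuming only the basic operations (the method names are mine; the explicit casts are what force the result back down to single precision):

// Store floats; widen to double for the operation, then explicitly narrow
// the result back to single precision with a cast.
static float Add(float a, float b)
{
    double result = (double)a + (double)b;
    return (float)result;               // the explicit cast forces the narrowing
}

static float Div(float a, float b)
{
    return (float)((double)a / (double)b);
}

The same pattern applies to subtraction, multiplication and Math.Sqrt; as noted below, it does not help for sin, exp and friends.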
The big issue with x87 is that calculations might be done in 53-bit or 64-bit accuracy depending on the precision flag and whether the register spilled to memory. But for many operations, performing the operation in high precision and rounding back to lower precision will guarantee the correct answer, which implies that the answer will be guaranteed to be the same on all systems. Whether you get the extra precision won't matter, since you have enough precision to guarantee the right answer in either case.
Operations that should work in this scheme: addition, subtraction, multiplication, division, and sqrt. Things like sin, exp, etc. won't work (results will usually match, but there is no guarantee). See "When is double rounding innocuous?" (ACM reference; paid registration required).
Hope this helps!
As already stated by other answers:
Yes, this is a problem in C# - even when staying pure Windows.
As for a solution:
You can reduce - and, with some effort/performance hit, avoid completely - the problem if you use the built-in BigInteger class and scale all calculations to a defined precision by using a common denominator for any calculation/storage of such numbers (a rough sketch follows).
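As a rough sketch of that idea (the scale factor and helper names are my own choices, not a recommendation):

// Scaled-integer arithmetic with BigInteger: every value is stored as
// (value * Scale), so all numbers share a common denominator and all
// arithmetic stays in integers.
using System.Numerics;

static readonly BigInteger Scale = BigInteger.Pow(10, 18);  // 18 fractional digits

static BigInteger FromLong(long value)
{
    return value * Scale;
}

static BigInteger Mul(BigInteger a, BigInteger b)
{
    return (a * b) / Scale;     // rescale after multiplying
}

static BigInteger Div(BigInteger a, BigInteger b)
{
    return (a * Scale) / b;     // pre-scale the dividend
}

// Addition and subtraction work directly on the scaled values: a + b, a - b.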
As requested by OP - regarding performance:
System.Decimal represents a number with 1 sign bit, a 96-bit integer, and a "scale" (representing where the decimal point is). For every calculation you make, it must operate on this data structure and can't use any floating-point instructions built into the CPU.
The BigInteger "solution" does something similar - except that you can define how many digits you need/want... perhaps you want only 80 bits or 240 bits of precision.
The slowness always comes from having to simulate all operations on these numbers via integer-only instructions, without using the CPU/FPU built-in instructions, which in turn leads to many more instructions per mathematical operation.
To lessen the performance hit there are several strategies, like QNumbers (see the answer from Jonathan Dickinson above) and/or caching (e.g. of trig calculations), etc.
Well, here's how I would first attempt to do this:
Create an ATL.dll project that has a simple object in it to be used for your critical floating-point operations. Make sure to compile it with flags that disable the use of any non-x87 hardware for floating point.
Create functions that call the floating-point operations and return the results; start simple, and then, if it's working for you, you can always increase the complexity to meet your performance needs later if necessary.
Put the _controlfp calls around the actual math to ensure that it's done the same way on all machines.
Reference your new library and test to make sure it works as expected.
(I believe you can just compile to a 32-bit .dll and then use it with either x86 or AnyCPU [or likely only targeting x86 on a 64-bit system; see comment below].)
Then, assuming it works, should you want to use Mono, I imagine you should be able to replicate the library on other x86 platforms in a similar manner (not COM of course; although, perhaps, with Wine? A little out of my area once we go there, though...).
Assuming you can make it work, you should be able to set up custom functions that do multiple operations at once to address any performance issues, and you'll have floating-point math with consistent results across platforms, with a minimal amount of code written in C++, leaving the rest of your code in C#.
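The answer above describes a COM/ATL object; purely as an illustration of the same "do the math in native code" idea, the managed side of a plain P/Invoke variant might look something like this (the DLL name and exported functions are hypothetical):

// Hypothetical P/Invoke wrapper around a native DLL that sets the x87 control
// word (e.g. via _controlfp_s) before doing each operation. The names below
// are placeholders, not a real library.
using System.Runtime.InteropServices;

static class DeterministicMath
{
    [DllImport("DeterministicFp.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern double Multiply(double a, double b);

    [DllImport("DeterministicFp.dll", CallingConvention = CallingConvention.Cdecl)]
    public static extern double Sqrt(double value);
}

// Usage: double area = DeterministicMath.Multiply(width, height);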
I'm not a game developer, though I do have a lot of experience with computationally difficult problems ... so, I'll do my best.
The strategy I would adopt is essentially this:
Use a slower (if necessary; if there's a faster way, great!), but predictable method to get reproducible results
Use double for everything else (eg, rendering)
The short and long of this is: you need to find a balance. If you're spending 30 ms rendering (~33 fps) and only 1 ms doing collision detection (or insert some other highly sensitive operation), then even if you triple the time it takes to do the critical arithmetic, the impact on your framerate is a drop from roughly 32 fps (31 ms/frame) to roughly 30 fps (33 ms/frame).
I suggest you profile everything, account for how much time is spent doing each of the noticeably expensive calculations, then repeat the measurements with 1 or more methods of resolving this problem and see what the impact is.
Checking the links in the other answers makes it clear you'll never have a guarantee of whether floating point is "correctly" implemented or whether you'll always receive a certain precision for a given calculation, but perhaps you could make a best effort by (1) truncating all calculations to a common minimum (e.g. if different implementations will give you 32 to 80 bits of precision, always truncating every operation to 30 or 31 bits), and (2) having a table of a few test cases at startup (borderline cases of add, subtract, multiply, divide, sqrt, cosine, etc.) and, if the implementation calculates values matching the table, not bothering to make any adjustments (a sketch of such a startup check follows).
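A small sketch of the startup check in (2), as one possible approach (the choice of test operations and how the reference values are recorded are up to you; only the mechanics are shown):

// Compare the exact bit patterns of a few floating-point operations against
// values recorded on a reference machine. If anything differs, fall back to
// the slower/adjusted code path.
using System;
using System.Collections.Generic;

static bool FloatingPointMatchesReference(IDictionary<string, long> reference)
{
    var results = new Dictionary<string, long>();
    results["sqrt2"] = BitConverter.DoubleToInt64Bits(Math.Sqrt(2.0));
    results["third"] = BitConverter.DoubleToInt64Bits(1.0 / 3.0);
    results["cos"]   = BitConverter.DoubleToInt64Bits(Math.Cos(0.1));

    foreach (KeyValuePair<string, long> pair in results)
    {
        long expected;
        if (!reference.TryGetValue(pair.Key, out expected) || expected != pair.Value)
        {
            return false;   // this machine disagrees with the reference values
        }
    }
    return true;
}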
Your question is quite difficult and technical O_o. However, I may have an idea.
As you surely know, the CPU makes some rounding adjustment after every floating-point operation.
CPUs also offer several different instructions that round in different ways.
So for a given expression, your compiler will choose a set of instructions that leads you to a result. But any other instruction sequence, even one intended to compute the same expression, can produce a different result.
The 'mistakes' made by a rounding adjustment will grow with each further instruction.
As an example, at the assembly level a * b * c is not equivalent to a * c * b.
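The non-associativity is easiest to demonstrate with addition rather than multiplication; a small illustration (the explicit casts force each intermediate result back down to double, as discussed elsewhere in this thread - without them, extended intermediate precision may or may not hide the difference, which is precisely the reproducibility problem):

double big = 1e16, small = 1.0;
double left  = (double)((double)(big + small) + small);   // the +1 is lost twice -> 1e16
double right = (double)(big + (double)(small + small));   // 1e16 + 2
Console.WriteLine(left == right);                          // False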
I'm not entirely sure of all the details above; you would need to ask someone who knows CPU architecture a lot better than I do :p
However, to answer your question: in C or C++ you can solve your problem, because you have some control over the machine code generated by your compiler, whereas in .NET you don't have any. So as long as the machine code can be different, you'll never be sure about the exact result.
I'm curious how this can be a problem, because the variation seems very minimal, but if you need really accurate operations the only solution I can think of is to increase the size of your floating-point registers. Use double precision, or even long double if you can (not sure that's possible using the CLI).
I hope I've been clear enough; my English isn't perfect (...at all :s)
Related
I've been developing a game in C# which currently uses floating point for some calculations and arithmetic. This game will feature networking functionality and a basic replay system which keeps track of inputs and basic player actions over time. I think these features require every important core mechanic to be deterministic. Due to the supposedly non-deterministic behaviour of floating-point numbers, I have gone through some resources about fixed-point numbers in order to provide myself with an alternative to floating point.
I understand many of the core concepts of fixed point thanks to a variety of very well documented online resources on the matter. However, I'm unsure whether I should use a 32-bit type (int) or a 64-bit type (long) for the raw value of the fixed-point class.
I would like to have the following basic features for my class:
Pass a float, double or int and convert it to a fixed-point value.
Addition, Subtraction, Division, Multiplication against values of types fixed-point, int, float and double.
My assumption is that it would be best to use a long, as it will give me more decimal accuracy, but I am worried about potential roadblocks that may come along the way. For example, would using a long cause issues when targeting 32-bit or running on 32-bit machines? Are ints ultimately more compatible than longs across potential hardware configurations? Because games are performance-heavy, is there a large performance loss when switching from float to long-based fixed-point numbers?
It seems like a silly question, but I guess I'm wondering whether I should choose types based on the lowest common denominator of CPU architecture I expect my program to run on, or whether these concerns are typically handled by the compiler/runtime. Will Linux or Mac OS X handle long calculations differently than a Windows machine?
The type you use is irrelevant with regard to the platform, as types are types are types in C#. In other words, a long is always 64 bits no matter what platform you're on; it's a guarantee of the language.
However, the real problem is going to be precision and scale. When doing fixed-point math, you're going to have to pick the precision you want to use. That's an easy problem. What's not easy is scaling: if you have numbers that will exceed the maximum value of your chosen type (don't forget to include the decimals in this consideration), then you're broken out of the gate.
Have you looked into the decimal type?
decimal is still a floating-point type, but it is decimal floating point rather than IEEE 754 binary floating point, and is thus capable of representing any base-10 number you throw at it, so long as it fits within the scale and precision of the decimal type.
See these links for information on the decimal type:
C# in Depth: Decimal
MSDN C# decimal reference
decimal comes with some performance considerations, however, and may not be the best choice for a game, if performance is critical. For a simple 2D scroller, you'd be fine, but it's probably not ideal for anything beyond that.
The results will be exactly the same across all targets. Integer math is nice like that, you can really rely on it to be the same all the same time.
Using a long will be slower, though, particularly when targeting 32-bit machines but also on 64-bit ones. While adding ints and adding longs takes the same time on a 64-bit machine (if it's actually 64-bit code), doing a fixed-point multiplication requires some extra code if you don't have a type that's twice as big (see the sketch at the end of this answer). 64-bit division has the same problem. Of course, on 32-bit machines, 64-bit operations are emulated using multiple 32-bit operations, so they are inherently slightly slower there.
On the other hand, you may need the extra precision sometimes. In the end you may need both a 32-bit fixed-point type and a 64-bit one, or even several of them with differently positioned radix points.
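To make that "extra code" concrete, here is one way to multiply two Q32.32 raw values without a 128-bit type, by splitting each operand into 32-bit halves (a sketch that ignores overflow of the final result and rounding of the dropped low bits):

// Multiply two Q32.32 fixed-point numbers (raw 64-bit values) without a
// 128-bit intermediate: split into high/low 32-bit halves and recombine.
static long MulQ32_32(long a, long b)
{
    long  aHi = a >> 32;                    // signed high halves
    long  bHi = b >> 32;
    ulong aLo = (ulong)a & 0xFFFFFFFFUL;    // unsigned low halves
    ulong bLo = (ulong)b & 0xFFFFFFFFUL;

    long  hiHi = aHi * bHi;                 // sits at bit 64 of the full product
    long  hiLo = aHi * (long)bLo;           // sits at bit 32
    long  loHi = (long)aLo * bHi;           // sits at bit 32
    ulong loLo = aLo * bLo;                 // sits at bit 0

    // The Q32.32 result is the 128-bit product shifted right by 32.
    return (hiHi << 32) + hiLo + loHi + (long)(loLo >> 32);
}

With a 32-bit Q16.16 type the same operation is a one-liner, because a 64-bit intermediate exists: (int)(((long)a * b) >> 16).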
I've been reading a lot about floating-point determinism in .NET, i.e. ensuring that the same code with the same inputs will give the same results across different machines. Since .NET lacks options like Java's strictfp and MSVC's /fp:strict, the consensus seems to be that there is no way around this issue using pure managed code. The C# game AI Wars has settled on using fixed-point math instead, but this is a cumbersome solution.
The main issue appears to be that the CLR allows intermediate results to live in FPU registers that have higher precision than the type's native precision, leading to unpredictably higher-precision results. An MSDN article by CLR engineer David Notario explains the following:
Note that with current spec, it’s still a language choice to give ‘predictability’. The language may insert conv.r4 or conv.r8 instructions after every FP operation to get a ‘predictable’ behavior. Obviously, this is really expensive, and different languages have different compromises. C#, for example, does nothing, if you want narrowing, you will have to insert (float) and (double) casts by hand.
This suggests that one may achieve floating-point determinism simply by inserting explicit casts for every expression and sub-expression that evaluates to float. One might write a wrapper type around float to automate this task (a sketch follows). This would be a simple and ideal solution!
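A minimal sketch of such a wrapper, assuming only a couple of operators (the name and design are mine; whether the casts are actually sufficient is exactly what the rest of this question is about):

// Wrapper that inserts an explicit (float) cast after every operation, in the
// hope that this forces the narrowing described in the quote above.
public struct StrictFloat
{
    public readonly float Value;

    public StrictFloat(float value)
    {
        Value = value;
    }

    public static implicit operator StrictFloat(float value)
    {
        return new StrictFloat(value);
    }

    public static StrictFloat operator +(StrictFloat a, StrictFloat b)
    {
        return new StrictFloat((float)(a.Value + b.Value));
    }

    public static StrictFloat operator *(StrictFloat a, StrictFloat b)
    {
        return new StrictFloat((float)(a.Value * b.Value));
    }
}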
Other comments however suggest that it isn't so simple. Eric Lippert recently stated (emphasis mine):
in some version of the runtime, casting to float explicitly gives a different result than not doing so. When you explicitly cast to float, the C# compiler gives a hint to the runtime to say "take this thing out of extra high precision mode if you happen to be using this optimization".
Just what is this "hint" to the runtime? Does the C# spec stipulate that an explicit cast to float causes the insertion of a conv.r4 in the IL? Does the CLR spec stipulate that a conv.r4 instruction causes a value to be narrowed down to its native size? Only if both of these are true can we rely on explicit casts to provide floating point "predictability" as explained by David Notario.
Finally, even if we can indeed coerce all intermediate results to the type's native size, is this enough to guarantee reproducibility across machines, or are there other factors like FPU/SSE run-time settings?
Just what is this "hint" to the runtime?
As you conjecture, the compiler tracks whether a conversion to double or float was actually present in the source code, and if it was, it always inserts the appropriate conv opcode.
Does the C# spec stipulate that an explicit cast to float causes the insertion of a conv.r4 in the IL?
No, but I assure you that there are unit tests in the compiler test cases that ensure that it does. Though the specification does not demand it, you can rely on this behaviour.
The specification's only comment is that any floating point operation may be done in a higher precision than required at the whim of the runtime, and that this can make your results unexpectedly more accurate. See section 4.1.6.
Does the CLR spec stipulate that a conv.r4 instruction causes a value to be narrowed down to its native size?
Yes, in Partition I, section 12.1.3, which I note you could have looked up yourself rather than asking the internet to do it for you. These specifications are free on the web.
A question you didn't ask but probably should have:
Is there any operation other than casting that truncates floats out of high precision mode?
Yes. Assigning to a static field, instance field or element of a double[] or float[] array truncates.
Is consistent truncation enough to guarantee reproducibility across machines?
No. I encourage you to read section 12.1.3, which has much interesting to say on the subject of denormals and NaNs.
And finally, another question you did not ask but probably should have:
How can I guarantee reproducible arithmetic?
Use integers.
The 8087 Floating Point Unit chip design was Intel's billion-dollar mistake. The idea looked good on paper: give it a stack of 8 registers that store values in extended precision, 80 bits, so that you can write calculations whose intermediate values are less likely to lose significant digits.
The beast is, however, impossible to optimize for. Storing a value from the FPU stack back to memory is expensive, so keeping values inside the FPU is a strong optimization goal. Inevitably, having only 8 registers is going to require a write-back if the calculation is deep enough. It is also implemented as a stack, not freely addressable registers, so that requires gymnastics as well, which may produce a write-back. Inevitably, a write-back will truncate the value from 80 bits back to 64 bits, losing precision.
So the consequence is that non-optimized code does not produce the same result as optimized code. And small changes to the calculation can have big effects on the result when an intermediate value ends up needing to be written back. The /fp:strict option is a hack around that: it forces the code generator to emit a write-back to keep the values consistent, but with an inevitable and considerable loss of performance.
This is a complete rock and a hard place. For the x86 jitter they just didn't try to address the problem.
Intel didn't make the same mistake when they designed the SSE instruction set. The XMM registers are freely addressable and don't store extra bits. If you want consistent results then compiling with the AnyCPU target, and a 64-bit operating system, is the quick solution. The x64 jitter uses SSE instead of FPU instructions for floating point math. Albeit that this added a third way that a calculation can produce a different result. If the calculation is wrong because it loses too many significant digits then it will be consistently wrong. Which is a bit of a bromide, really, but typically only as far as a programmer looks.
Is using float type slower than using double type?
I heard that modern Intel and AMD CPUs can do calculations with doubles faster than with floats.
What about standard math functions (sqrt, pow, log, sin, cos, etc.)? Computing them in single precision should be considerably faster because it should require fewer floating-point operations. For example, a single-precision sqrt can use a simpler approximation than a double-precision sqrt. Also, I heard that standard math functions are faster in 64-bit mode (when compiled and run on a 64-bit OS). What is the definitive answer on this?
The classic x86 architecture uses the floating-point unit (FPU) to perform floating-point calculations. The FPU performs all calculations in its internal registers, which each have 80-bit precision. Every time you attempt to work with a float or a double, the variable is first loaded from memory into an internal register of the FPU. This means that there is absolutely no difference in the speed of the actual calculations, since in any case the calculations are carried out with full 80-bit precision. The only thing that might be different is the speed of loading the value from memory and storing the result back to memory. Naturally, on a 32-bit platform it might take longer to load/store a double compared to a float. On a 64-bit platform there shouldn't be any difference.
Modern x86 architectures support extended instruction sets (SSE/SSE2) with new instructions that can perform the very same floating-point calculations without involving the "old" FPU instructions. However, again, I wouldn't expect to see any difference in calculation speed for float and double. And since these modern platforms are 64-bit ones, the load/store speed is supposed to be the same as well.
On a different hardware platform the situation could be different. But normally a smaller floating-point type should not provide any performance benefits. The main purpose of smaller floating-point types is to save memory, not to improve performance.
Edit (to address @MSalters' comment):
What I said above applies to fundamental arithmetical operations. When it comes to library functions, the answer will depend on several implementation details. If the platform's floating-point instruction set contains an instruction that implements the functionality of the given library function, then what I said above will normally apply to that function as well (that would normally include functions like sin, cos, sqrt). For other functions, whose functionality is not immediately supported in the FP instruction set, the situation might prove to be significantly different. It is quite possible that float versions of such functions can be implemented more efficiently than their double versions.
Your first question has already been answered here on SO.
Your second question is entirely dependent on the "size" of the data you are working with. It all boils down to the low-level architecture of the system and how it handles large values. 64 bits of data on a 32-bit system would require 2 cycles to access 2 registers. The same data on a 64-bit system should only take 1 cycle to access 1 register.
Everything always depends on what you're doing. I find there are no hard and fast rules, so you need to analyze the current task and choose what works best for your needs for that specific task.
From some research and empirical measurements I have made in Java:
basic arithmetic operations on doubles and floats essentially perform identically on Intel hardware, with the exception of division;
on the other hand, on the Cortex-A8 as used in the iPhone 4 and iPad, even "basic" arithmetic on doubles takes around twice as long as on floats (a register FP addition on a float taking around 4 ns vs a register FP addition on a double taking around 9 ns);
I've made some timings of methods on java.lang.Math (trigonometric functions etc.) which may be of interest - in principle, some of these may well be faster on floats, as fewer terms would be required to calculate to the precision of a float; on the other hand, many of them end up being "not as bad as you'd think";
It is also true that there may be special circumstances in which e.g. memory bandwidth issues outweigh "raw" calculation times.
While on most systems double will be the same speed as float for individual values, you're right that computing functions like sqrt, sin, etc. in single-precision should be a lot faster than computing them to double-precision. In C99, you can use the sqrtf, sinf, etc. functions even if your variables are double, and get the benefit.
Another issue I've seen mentioned is memory (and likewise storage device) bandwidth. If you have millions or billions of values to deal with, float will almost certainly be twice as fast as double since everything will be memory-bound or io-bound. This is a good reason to use float as the type in an array or on-disk storage in some cases, but I would not consider it a good reason to use float for the variables you do your computations with.
The "native" internal floating point representation in the x86 FPU is 80 bits wide. This is different from both float (32 bits) and double (64 bits). Every time a value moves in or out of the FPU, a conversion is performed. There is only one FPU instruction that performs a sin operation, and it works on the internal 80 bit representation.
Whether this conversion is faster for float or for double depends on many factors, and must be measured for a given application.
It depends on the processor. If the processor has native double-precision instructions, it'll usually be faster to just do double-precision arithmetic than to be given a float, convert it to a double, do the double-precision arithmetic, then convert it back to a float.
I am working on an app that will need to handle very large numbers.
I checked out a few available LargeNumber classes and have found a few that I am happy with. I have a class for large integers and for large floating point numbers.
Since some of the numbers will be small and some large, the question is whether it is worth checking the size of each number and, if it is small, using a regular C# int or double and, if it is large, using the other classes I have - or whether, since I am already using the Large Integer and Large Float classes, I should just stick with them even for smaller numbers.
My consideration is purely performance. Will I save enough time on the math for the smaller numbers that it would be worthwhile to check each number after it is put in?
Really hard to tell - it depends on your 3rd-party libraries :)
Best bet would be to use the System.Diagnostics.Stopwatch class, do a gazillion different calculations, time them, and compare the results, I guess...
[EDIT] - About the benchmarks: I'd run one series using your large-integer type to do the calculations on regular 32/64-bit numbers, and another series checking whether the numbers fit in the regular Int32/Int64 types (which they should), "downcasting" them to those types, and then running the same calculations with them. From your question, this sounds like what you'll be doing if the built-in types are faster.
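A rough sketch of the kind of timing loop suggested above, using BigInteger as a stand-in for the large-number type (the iteration count and the operation are arbitrary placeholders):

// Crude timing comparison: the same additions done with long and BigInteger.
using System;
using System.Diagnostics;
using System.Numerics;

static void Main()
{
    const int Iterations = 10000000;

    var sw = Stopwatch.StartNew();
    long smallSum = 0;
    for (int i = 0; i < Iterations; i++)
    {
        smallSum += i;
    }
    sw.Stop();
    Console.WriteLine("long:       {0} ms (sum {1})", sw.ElapsedMilliseconds, smallSum);

    sw.Restart();
    BigInteger bigSum = 0;
    for (int i = 0; i < Iterations; i++)
    {
        bigSum += i;
    }
    sw.Stop();
    Console.WriteLine("BigInteger: {0} ms (sum {1})", sw.ElapsedMilliseconds, bigSum);
}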
If your application is targeted at more people than yourself, try to run the benchmarks on different machines (single-core, multi-core, 32-bit, 64-bit platforms), and if the platform seems to have a large impact on the time the calculations take, use some sort of strategy pattern to do the calculations differently on different machines.
Good luck :)
I would expect that a decent large-numbers library would be able to do this optimization on its own...
I would say yes, the check will more than pay for itself, as long as you have enough values within the regular range.
The logic is simple: an integer addition is one assembly instruction. Combined with a comparison, that's three or four instructions. Any software implementation of such an operation will most probably be much slower.
Optimally, this check should be done in the LargeNumber libraries themselves. If they don't do it, you may need a wrapper to avoid having checks all over the place. But then you need to think of the additional cost of the wrapper as well.
I worked on a project where the same fields needed to handle very large numbers and, at the same time, maintain precision for very small numbers.
We ended up storing two fields (mantissa and exponent) for every number of that kind.
We made a class for mantissa/exponent calculations and it performed well.