How to combine float representation with a discontinuous function? - c#

I have read tons of things about floating error, and floating approximation, and all that.
The thing is: I never read an answer to a real-world problem. And today, I came across a real-world problem. And this is really bad, and I really don't know how to escape it.
Take a look at this example:
[TestMethod]
public void TestMethod1()
{
    float t1 = 8460.32F;
    float t2 = 5990;
    var x = t1 - t2;
    var y = F(x);
    Assert.AreEqual(x, y);
}

float F(float x)
{
    if (x <= 2470.32F) { return x; }
    else { return -x; }
}
x is supposed to be 2470.32. But in fact, due to rounding error, its value is 2470.32031.
Most of the time, this is not a problem. Functions are continuous, and all is good; the result is just off by a tiny value.
But here, we have a discontinuous function, and the error is really, really big. The test fails exactly at the discontinuity.
How could I handle the rounding error with discontinuous functions?

The key problem here is:
The function has a large (and significant) change in output value in certain cases when there is a small change in input value.
You are passing an incorrect input value to the function.
As you write, “due to rounding error, [x’s value] is 2470.32031”. Suppose you could write any code you desire—simply describe the function to be performed, and a team of expert programmers will provide complete, bug-free source code within seconds. What would you tell them?
The problem you are posing is, “I am going to pass a wrong value, 2470.32031, to this function. I want it to know that the correct value is something else and to provide the result for the correct value, which I did not pass, instead of the incorrect value, which I did pass.”
In general, that problem is impossible to solve, because it is impossible to distinguish when 2470.32031 is passed to the function but 2470.32 is intended from when 2470.32031 is passed to the function and 2470.32031 is intended. You cannot expect a computer to read your mind. When you pass incorrect input, you cannot expect correct output.
What this tells us is that no solution inside of the function F is possible. Therefore, we must zoom out and look at the larger problem. You must examine whether the value passed to F can be improved (calculated in a better way or with higher precision or with supplementary information) or whether the nature of the problem is such that, when 2470.32031 is passed, 2470.32 is always intended, so that this knowledge can be incorporated into F.
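The rounding described above can be reproduced outside C#. A minimal sketch in Python, whose float is the same IEEE 754 double as C#'s double; C#'s float (IEEE single) is emulated here by a struct round-trip. The helper name f32 is mine, not from the question:

```python
import struct

def f32(x):
    """Round a Python float (IEEE double) to the nearest IEEE single, like C#'s float."""
    return struct.unpack('f', struct.pack('f', x))[0]

t1 = f32(8460.32)          # actually stored as 8460.3203125
t2 = f32(5990.0)           # 5990 is exactly representable
x = t1 - t2                # 2470.3203125, not 2470.32
threshold = f32(2470.32)   # actually stored as 2470.320068359375

print(x)                   # 2470.3203125
print(threshold)           # 2470.320068359375
print(x <= threshold)      # False -> F returns -x and the test fails
```

Both 8460.32 and 2470.32 round up when stored as singles, but by different amounts, so the subtraction result lands just above the branch threshold.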

NOTE: this answer is essentially the same as Eric's.
It just highlights the testing point of view, since a test is a form of specification.
The problem here is that testMethod1 does not test F.
It rather tests that the conversion of the decimal quantity 8460.32 to float, and float subtraction, are inexact.
But is it the intention of the test?
All you can say is that in certain bad conditions (near discontinuity), a small error on input will result in a large error on output, so the test could express that it is an expected result.
Note that function F is almost perfect, except maybe for the float value 2470.32F itself.
Indeed, the floating-point approximation rounds the decimal up: 2470.32F is 2470.320068359375, exactly 7/102400 above the decimal 2470.32.
So the assertion should be:
Assert.AreEqual(F(2470.32F), -2470.32F); /* because 2470.32F exceeds the decimal 2470.32 */
If you want to test such low level requirements, you'll need a library with high (arbitrary/infinite) precision to perform the tests.
If you can't afford such imprecision in function F, then float is a mismatch, and you'll have to find another implementation with increased, arbitrary, or infinite precision.
It's up to you to specify your needs, and testMethod1 should express this specification better than it does right now.

If you need the 8460.32 number to be exactly that without rounding error, you could look at the .NET Decimal type which was created explicitly to represent base 10 fractional numbers without rounding error. How they perform that magic is beyond me.
Now, I realize this may be impractical for you because the float presumably comes from somewhere, and refactoring it to the Decimal type could be way too much work. But if you need that much precision for the discontinuous function that relies on that value, you'll either need a more precise type or some mathematical trickery. Perhaps there is some way to ensure that a float is always created with a rounding error that leaves it less than the actual number? I'm not sure such a thing exists, but it would also solve your issue.

You have three numbers represented in your application, and you have accepted imprecision in each of them by representing them as floats.
So I think you can reasonably claim that your program is working correctly when
(oneNumber ± some imprecision) - (anotherNumber ± some imprecision)
is not quite bigger than (yetAnotherNumber ± some imprecision).
Viewed in decimal representation on paper it looks wrong, but that's not what you've implemented. What's the origin of the data? How precisely was 8460.32 known? Had it been 8460.31999, what should have happened? 8460.32001? Was the original value known to such precision?
In the end if you want to model more accuracy use a different data type, as suggested elsewhere.

I always just assume that when comparing floating point values a small margin of error is needed because of rounding issues. In your case, this would most likely mean choosing values in your test method that aren't quite so stringent--e.g., define a very small error constant and subtract that value from x. Here's a SO question that relates to this.
Edit to better address the concluding question: Presumably it doesn't matter what the function outputs on the discontinuity exactly, so test just slightly on either side of it. If it does matter, then really about the best you can do is allow either of two outputs from the function at that point.
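The "test slightly on either side" idea can be sketched as follows, using Python as a stand-in for the C# test (the f32 helper emulating C#'s float, and the eps value, are my own assumptions, not from the question):

```python
import struct

def f32(x):
    # nearest IEEE single to x, emulating C#'s float
    return struct.unpack('f', struct.pack('f', x))[0]

THRESHOLD = f32(2470.32)

def F(x):
    return x if x <= THRESHOLD else -x

# Test a little on either side of the discontinuity instead of exactly on it:
eps = 1e-3                     # margin chosen for inputs of this magnitude
below, above = THRESHOLD - eps, THRESHOLD + eps
print(F(below) == below)       # True
print(F(above) == -above)      # True
```

Exactly at the threshold, either output could be acceptable, which is the "allow either of two outputs" case mentioned above.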

Related

Value of a double variable not exact after multiplying with 100 [duplicate]

This question already has answers here:
Is floating point math broken?
(31 answers)
Closed 7 years ago.
If I execute the following expression in C#:
double i = 10*0.69;
i is: 6.8999999999999995. Why?
I understand numbers such as 1/3 can be hard to represent in binary as it has infinite recurring decimal places but this is not the case for 0.69. And 0.69 can easily be represented in binary, one binary number for 69 and another to denote the position of the decimal place.
How do I work around this? Use the decimal type?
Because you've misunderstood floating point arithmetic and how data is stored.
In fact, your code isn't actually performing any arithmetic at execution time in this particular case - the compiler will have done it, then saved a constant in the generated executable. However, it can't store an exact value of 6.9, because that value cannot be precisely represented in floating-point format, just like 1/3 can't be precisely stored in a finite decimal representation.
See if this article helps you.
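The effect is easy to inspect directly. A sketch in Python, whose float is the same IEEE 754 double as C#'s, and whose Decimal(float) conversion reveals the exact stored value:

```python
from decimal import Decimal

# Decimal(float) reveals the exact value the double actually stores:
print(Decimal(0.69))     # 0.68999999999999994670929...: slightly below 0.69
print(10 * 0.69)         # 6.8999999999999995
print(10 * 0.69 == 6.9)  # False
```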
why doesn't the framework work around this and hide this problem from me and give me the right answer, 0.69!!!
Stop behaving like a Dilbert manager, and accept that computers, though cool and awesome, have limits. In your specific case, it doesn't just "hide" the problem, because you have specifically told it not to. The language (the computer) provides alternatives to the format that you didn't choose. You chose double, which has certain advantages over decimal, and certain downsides. Now, knowing the answer, you're upset that the downsides don't magically disappear.
As a programmer, you are responsible for hiding this downside from managers, and there are many ways to do that. However, the makers of C# have a responsibility to make floating point work correctly, and correct floating point will occasionally result in incorrect math.
So will every other number storage method, as we do not have infinite bits. Our job as programmers is to work with limited resources to make cool things happen. They got you 90% of the way there, just get the torch home.
And 0.69 can easily be represented in
binary, one binary number for 69 and
another to denote the position of the
decimal place.
I think this is a common mistake - you're thinking of floating point numbers as if they are base-10 (i.e decimal - hence my emphasis).
So - you're thinking that there are two whole-number parts to this double: 69 and divide by 100 to get the decimal place to move - which could also be expressed as:
69 x 10 to the power of -2.
However floats store the 'position of the point' as base-2.
Your double actually gets stored as an integer significand times a power of two:
6214967485771284 x 2 to the power of -53, which is slightly less than 0.69
This isn't as much of a problem once you're used to it - most people know and expect that 1/3 can't be expressed accurately as a decimal or percentage. It's just that the fractions that can't be expressed in base-2 are different.
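The "different fractions" point can be checked exactly. A sketch in Python, whose Fraction(float) recovers the exact value a double stores:

```python
from fractions import Fraction

# Fractions whose denominator is a power of two are stored exactly:
print(Fraction(0.75) == Fraction(3, 4))     # True
print(Fraction(0.5) == Fraction(1, 2))      # True

# Everything else is only approximated:
print(Fraction(0.1) == Fraction(1, 10))     # False
print(Fraction(0.69) == Fraction(69, 100))  # False
```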
but why doesn't the framework work around this and hide this problem from me and give me the right answer, 0.69!!!
Because you told it to use binary floating point, and the solution is to use decimal floating point, so you are suggesting that the framework should disregard the type you specified and use decimal instead, which is very much slower because it is not directly implemented in hardware.
A more efficient solution is to not output the full value of the representation and explicitly specify the accuracy required by your output. If you format the output to two decimal places, you will see the result you expect. However, if this is a financial application, decimal is precisely what you should use - you've seen Superman III (and Office Space), haven't you? ;)
Note that it is all a finite approximation of an infinite range, it is merely that decimal and double use a different set of approximations. The advantage of decimal is it produces the same approximations that you would if you were performing the calculation yourself. For example if you calculated 1/3, you would eventually stop writing 3's when it was 'good enough'.
For the same reason that 1 / 3 in a decimal systems comes out as 0.3333333333333333333333333333333333333333333 and not the exact fraction, which is infinitely long.
To work around it (e.g. to display on screen) try this:
double i = (double) Decimal.Multiply(10, (Decimal) 0.69);
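Both workarounds (doing the arithmetic in decimal, or just formatting the output) can be sketched with Python's decimal module, the analogue of .NET's Decimal:

```python
from decimal import Decimal

# Do the arithmetic in decimal, as Decimal.Multiply does in the C# snippet:
exact = Decimal('0.69') * 10
print(exact)               # 6.90
print(float(exact))        # 6.9

# Alternatively, keep the double and just format the output:
print(f"{10 * 0.69:.2f}")  # 6.90
```

Note the decimal value is constructed from the string '0.69', not from the double, so no binary rounding ever happens.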
Everyone seems to have answered your first question, but ignored the second part.

Why does a parsed double not equal an initialized double supposedly of the same value?

When I execute this line:
double dParsed = double.Parse("0.00000002036");
dParsed actually gets the value: 0.000000020360000000000002
Compared to this line,
double dInitialized = 0.00000002036;
in which case the value of dInitialized is exactly 0.00000002036
This inconsistency is a trifle annoying, because I want to run tests along the lines of:
[Subject("parsing doubles")]
public class when_parsing_crazy_doubles
{
    static double dInitialized = 0.00000002036;
    static double dParsed;

    Because of = () => dParsed = double.Parse("0.00000002036");

    It should_match = () => dParsed.ShouldBeLike(dInitialized);
}
This of course fails with:
Machine.Specifications.SpecificationException
"":
Expected: [2.036E-08]
But was: [2.036E-08]
In my production code, the 'parsed' doubles are read from a data file whereas the comparison values are hard coded as object initializers. Over many hundreds of records, 4 or 5 of them don't match. The original data appears in the text file like this:
0.00000002036 0.90908165072 6256.77753019160
So the values being parsed have only 11 decimal places. Any ideas for working around this inconsistency?
While I accept that comparing doubles for equality is risky, I'm surprised that the compiler can get an exact representation when the text is used as an object initializer, but that double.Parse can't get an exact representation when parsing exactly the same text. How can I limit the parsed doubles to 11 decimal places?
Compared to this line,
double dInitialized = 0.00000002036;
in which case the value of dInitialized is exactly 0.00000002036
If you have anything remotely resembling a commodity computer, dInitialized is not initialized as exactly 0.00000002036. It can't be because the base 10 number 0.00000002036 does not have a finite representation in base 2.
Your mistake is expecting two doubles to compare equal. That's usually not a good idea. Unless you have very good reasons and know what you are doing, it is best to not compare two doubles for equality or inequality. Instead test whether the difference between the two lies within some small epsilon of zero.
Getting the size of that epsilon right is a bit tricky. If your two numbers are both small, (less than one, for example), an epsilon of 1e-15 might well be appropriate. If the numbers are large (larger than ten, for example), that small of an epsilon value is equivalent to testing for equality.
Edit: I didn't answer the question.
How can I limit the parsed doubles to 11 decimal places?
If you don't have to worry about very small values,
static double epsilon = 1e-11;

if (Math.Abs(dParsed - dInitialized) > epsilon * Math.Abs(dInitialized)) {
    noteTestAsFailed();
}
You should be able to safely change that epsilon to 4e-16.
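The same relative-tolerance comparison exists ready-made in Python as math.isclose, whose rel_tol parameter plays the role of epsilon in the C# snippet above (note Python's parser is correctly rounded, so the two values happen to agree here, unlike in the question's .NET runtime):

```python
import math

d_initialized = 0.00000002036
d_parsed = float("0.00000002036")   # correctly rounded in Python,
                                    # so these two agree

rel_tol = 1e-11                     # plays the role of epsilon
print(math.isclose(d_parsed, d_initialized, rel_tol=rel_tol))  # True
```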
Edit #2: Why is it that the compiler and double.Parse produce different internal representations for the same text?
That's kind of obvious, isn't it? The compiler and double.Parse use different algorithms. The number in question 0.00000002036 is very close to being on the cusp of whether rounding up or rounding down should be used to yield a representable value that is within half an ULP of the desired value (0.00000002036). The "right" value is the one that is within a half an ULP of the desired value. In this case, the compiler makes the right decision of picking the rounded-down value while the parser makes the wrong decision of picking the rounded-up value.
The value 0.00000002036 is a nasty corner case. It is not an exactly representable value. The two closest values that can be represented exactly as IEEE doubles are 6153432421838462/2^78 and 6153432421838463/2^78. The value halfway between these two is 12306864843676925/2^79, which is very, very close to 0.00000002036. That's what makes this a corner case. I suspect all of the values you found where the compiled value is not identically equal to the value from double.Parse are corner cases, cases where the desired value is almost halfway between the two closest exactly representable values.
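The near-midpoint claim above can be verified with exact rational arithmetic. A sketch in Python using fractions.Fraction (which recovers a float's exact value):

```python
from fractions import Fraction

target = Fraction(2036, 10**11)           # the exact decimal 0.00000002036
lo = Fraction(6153432421838462, 2**78)    # the two neighbouring doubles
hi = Fraction(6153432421838463, 2**78)
mid = (lo + hi) / 2

print(lo < target < hi)               # True
print(target < mid)                   # True: correct rounding picks lo
print(Fraction(0.00000002036) == lo)  # True: Python's parser rounds down
```

The exact decimal sits a hair below the midpoint, so the correctly rounded result is the lower neighbour, matching the compiler's choice described above.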
Edit #3:
Here are a number of different ways to interpret 0.00000002036:
2/1e8 + 3/1e10 + 6/1e11
2*1e-8 + 3*1e-10 + 6*1e-11
2.036 * 1e-8
2.036 / 1e8
2036 * 1e-11
2036 / 1e11
On an ideal computer all of these will be the same. Don't count on that being the case on a computer that uses finite precision arithmetic.

Strange behavior when casting decimal to double

I'm experiencing strange issue when casting decimal to double.
Following code returns true:
Math.Round(0.010000000312312m, 2) == 0.01m //true
However, when I cast this to double it returns false:
(double)Math.Round(0.010000000312312m, 2) == (double)0.01m //false
I've experienced this problem when I wanted to use Math.Pow and was forced to cast decimal to double since there is no Math.Pow overload for decimal.
Is this documented behavior? How can I avoid it when I'm forced to cast decimal to double?
Screenshot from Visual Studio:
Casting Math.Round to double gives me the following result:
(double)Math.Round(0.010000000312312m, 2) 0.0099999997764825821 double
(double)0.01m 0.01 double
UPDATE
Ok, I'm reproducing the issue as follows:
When I run WPF application and check the output in watch just after it started I get true like on empty project.
There is a part of application that sends values from the slider to the calculation algorithm. I get wrong result and I put breakpoint on the calculation method. Now, when I check the value in watch window I get false (without any modifications, I just refresh watch window).
As soon as I reproduce the issue in some smaller project I will post it here.
UPDATE2
Unfortunately, I cannot reproduce the issue in smaller project. I think that Eric's answer explains why.
People are reporting in the comments here that sometimes the result of the comparison is true and sometimes it is false.
Unfortunately, this is to be expected. The C# compiler, the jitter and the CPU are all permitted to perform arithmetic on doubles in more than 64 bit double precision, as they see fit. This means that sometimes the results of what looks like "the same" computation can be done in 64 bit precision in one calculation, 80 or 128 bit precision in another calculation, and the two results might differ in their last bit.
Let me make sure that you understand what I mean by "as they see fit". You can get different results for any reason whatsoever. You can get different results in debug and retail. You can get different results if you make the compiler do the computation in constants and if you make the runtime do the computation at runtime. You can get different results when the debugger is running. You can get different results in the runtime and the debugger's expression evaluator. Any reason whatsoever. Double arithmetic is inherently unreliable. This is due to the design of the floating point chip; double arithmetic on these chips cannot be made more repeatable without a considerable performance penalty.
For this and other reasons you should almost never compare two doubles for exact equality. Rather, subtract the doubles, and see if the absolute value of the difference is smaller than a reasonable bound.
Moreover, it is important that you understand why rounding a double to two decimal places is a difficult thing to do. A non-zero, finite double is a number of the form (1 + f) × 2^e, where f is a fraction with a denominator that is a power of two, and e is an exponent. Clearly it is not possible to represent 0.01 in that form, because there is no way to get a denominator equal to a power of ten out of a denominator equal to a power of two.
The double 0.01 is actually the binary number 1.0100011110101110000101000111101011100001010001111011 × 2^-7, which in decimal is 0.01000000000000000020816681711721685132943093776702880859375. That is the closest you can possibly get to 0.01 in a double. If you need to represent exactly that value then use decimal. That's why it's called decimal.
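Both forms can be inspected directly without any extra tooling; a sketch in Python, whose float is the same IEEE 754 double:

```python
from decimal import Decimal

# float.hex shows the binary significand and exponent; the hex digits
# 47ae147ae147b spell exactly the bit pattern quoted above:
print((0.01).hex())   # 0x1.47ae147ae147bp-7

# Decimal(float) shows the exact decimal value of the double nearest 0.01:
print(Decimal(0.01))
# 0.01000000000000000020816681711721685132943093776702880859375
```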
Incidentally, I have answered variations on this question many times on StackOverflow. For example:
Why differs floating-point precision in C# when separated by parantheses and when separated by statements?
Also, if you need to "take apart" a double to see what its bits are, this handy code that I whipped up a while back is quite useful. It requires that you install Solver Foundation, but that's a free download.
http://ericlippert.com/2011/02/17/looking-inside-a-double/
This is documented behavior. The decimal data type has more precision (though a smaller range) than the double type. So when you convert from decimal to double there is the possibility of data loss. This is why you are required to do an explicit conversion of the type.
See the following MSDN C# references for more information:
decimal data type: http://msdn.microsoft.com/en-us/library/364x0z75(v=vs.110).aspx
double data type: http://msdn.microsoft.com/en-us/library/678hzkk9(v=vs.110).aspx
casting and type conversion: http://msdn.microsoft.com/en-us/library/ms173105.aspx

'Beautify' number by rounding erroneous digits appropriately

I want to have my cake and eat it too. I want to beautify (round) numbers to the largest extent possible without compromising accuracy for other calculations. I'm using doubles in C# (with some string-conversion manipulation too).
Here's the issue. I understand the inherent limitations in double number representation (so please don't explain that). HOWEVER, I want to round the number in some way to appear aesthetically pleasing to the end user (I am making a calculator). The problem is rounding by X significant digits works in one case, but not in the other, whilst rounding by decimal place works in the other, but not the first case.
Observe:
CASE A: Math.Sin(Math.PI) = 0.000000000000000122460635382238
CASE B: 0.000000000000001/3 = 0.000000000000000333333333333333
For the first case, I want to round by DECIMAL PLACES. That would give me the nice neat zero I'm looking for. Rounding by Sig digits would mean I would keep the erroneous digits too.
However for the second case, I want to round by SIGNIFICANT DIGITS, as I would lose tons of accuracy if I rounded merely by decimal places.
Is there a general way I can cater to both types of calculation?
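The dilemma is easy to demonstrate. A sketch in Python (the choice of 12 decimal places and 3 significant digits is mine, purely for illustration):

```python
import math

residue = math.sin(math.pi)   # mathematically zero, numerically ~1.22e-16
tiny = 1e-15 / 3              # a genuinely meaningful tiny value

# Rounding by decimal places cleans up the residue but also destroys tiny:
print(round(residue, 12))     # 0.0  (good)
print(round(tiny, 12))        # 0.0  (bad: real information lost)

# Rounding to significant digits preserves tiny but keeps the residue:
print(f"{tiny:.3g}")          # 3.33e-16  (good)
print(f"{residue:.3g}")       # (bad: the noise survives)
```

No single rule distinguishes the two cases from the value alone, which is why the answers below push the decision elsewhere (to the user, or to exact arithmetic).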
I don't think it's feasible to do that to the result itself, and precision has nothing to do with it.
Consider this input: (1+3)/2^3 . You can "beautify" it by showing the result as sin(30) or cos(60) or 1/2 and a whole lot of other interpretations. Choosing the wrong "beautification" can mislead your user, making them think their function has something to do with sin(x).
If your calculator keeps all the initial input as variables, you could keep all the operations postponed until you need the result, and then simplify the result until it matches your needs. And you'll need to consider using rational numbers; e, Pi, and other irrational numbers may not be as easy to deal with.
The best solution to this is to keep every bit you can get during calculations, and leave the display format up to the end user. The user should have some idea how many significant digits make sense in their situation, given both the nature of the calculations and the use of the result.
Default to a reasonable number of significant digits for a few calculations in the floating point format you are using internally - about 12 if you are using double. If the user changes the format, immediately redisplay in the new format.
The best solution is to use arbitrary-precision and/or symbolic arithmetic, although these result in much more complex code and slower speed. But since performance isn't important for a calculator (at least for a button calculator, as opposed to one where you enter whole expressions to evaluate), you can use them without issue.
Anyway, there's a good trade-off, which is to use decimal floating point. You'll need to limit the input/output precision but use a higher precision for the internal representation, so that you can discard values very close to zero, like the sin case above. For better results you could detect some edge cases, such as sines/cosines of multiples of 45 degrees, and directly return the exact result.
Edit: I just found a good solution, but haven't had an opportunity to try it.
Here’s something I bet you never think about, and for good reason: how are floating-point numbers rendered as text strings? This is a surprisingly tough problem, but it’s been regarded as essentially solved since about 1990.
Prior to Steele and White’s "How to print floating-point numbers accurately", implementations of printf and similar rendering functions did their best to render floating point numbers, but there was wide variation in how well they behaved. A number such as 1.3 might be rendered as 1.29999999, for instance, or if a number was put through a feedback loop of being written out and its written representation read back, each successive result could drift further and further away from the original.
...
In 2010, Florian Loitsch published a wonderful paper in PLDI, "Printing floating-point numbers quickly and accurately with integers", which represents the biggest step in this field in 20 years: he mostly figured out how to use machine integers to perform accurate rendering! Why do I say "mostly"? Because although Loitsch's "Grisu3" algorithm is very fast, it gives up on about 0.5% of numbers, in which case you have to fall back to Dragon4 or a derivative.
Here be dragons: advances in problems you didn’t even know you had

Why is the division result between two integers truncated?

All experienced programmers in C# (I think this comes from C) are used to casting one of the integers in a division to get the decimal / double / float result instead of the int (the real result, truncated).
I'd like to know why it is implemented like this. Is there ANY good reason to truncate the result if both numbers are integers?
C# traces its heritage to C, so the answer to "why is it like this in C#?" is a combination of "why is it like this in C?" and "was there no good reason to change?"
The approach of C is to have a fairly close correspondence between the high-level language and low-level operations. Processors generally implement integer division as returning a quotient and a remainder, both of which are of the same type as the operands.
(So my question would be, "why doesn't integer division in C-like languages return two integers", not "why doesn't it return a floating point value?")
The solution was to provide separate operations for division and remainder, each of which returns an integer. In the context of C, it's not surprising that the result of each of these operations is an integer. This is frequently more accurate than floating-point arithmetic. Consider the example from your comment of 7 / 3. This value cannot be represented by a finite binary number nor by a finite decimal number. In other words, on today's computers, we cannot accurately represent 7 / 3 unless we use integers! The most accurate representation of this fraction is "quotient 2, remainder 1".
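The quotient-and-remainder pair is available as a single operation in many languages; a sketch in Python, with one porting caveat worth knowing (Python's integer division floors, while C#'s truncates toward zero):

```python
# divmod returns the quotient-remainder pair in one step:
q, r = divmod(7, 3)
print(q, r)              # 2 1
print(q * 3 + r == 7)    # True: the representation is exact

# Caveat: Python's // floors while C#'s / truncates toward zero,
# so the two disagree on negative operands:
print(-7 // 3)           # -3 in Python; C# gives -2
print(int(-7 / 3))       # -2: truncation, matching C#
```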
So, was there no good reason to change? I can't think of any, and I can think of a few good reasons not to change. None of the other answers has mentioned Visual Basic which (at least through version 6) has two operators for dividing integers: / converts the integers to double, and returns a double, while \ performs normal integer arithmetic.
I learned about the \ operator after struggling to implement a binary search algorithm using floating-point division. It was really painful, and integer division came in like a breath of fresh air. Without it, there was lots of special handling to cover edge cases and off-by-one errors in the first draft of the procedure.
From that experience, I draw the conclusion that having different operators for dividing integers is confusing.
Another alternative would be to have only one division operator, which always returns a double, and to require programmers to truncate it. This means you have to perform two int->double conversions, a truncation and a double->int conversion every time you want integer division. And how many programmers would mistakenly round or floor the result instead of truncating it? It's a more complicated system, at least as prone to programmer error, and slower.
Finally, in addition to binary search, there are many standard algorithms that employ integer arithmetic. One example is dividing collections of objects into sub-collections of similar size. Another is converting between indices in a 1-d array and coordinates in a 2-d matrix.
As far as I can see, no alternative to "int / int yields int" survives a cost-benefit analysis in terms of language usability, so there's no reason to change the behavior inherited from C.
In conclusion:
Integer division is frequently useful in many standard algorithms.
When the floating-point division of integers is needed, it may be invoked explicitly with a simple, short, and clear cast: (double)a / b rather than a / b
Other alternatives introduce more complication for the programmer and more clock cycles for the processor.
Is there ANY good reason to truncate the result if both numbers are integer?
Of course; I can think of a dozen such scenarios easily. For example: you have a large image, and a thumbnail version of the image which is 10 times smaller in both dimensions. When the user clicks on a point in the large image, you wish to identify the corresponding pixel in the scaled-down image. Clearly to do so, you divide both the x and y coordinates by 10. Why would you want to get a result in decimal? The corresponding coordinates are going to be integer coordinates in the thumbnail bitmap.
Doubles are great for physics calculations and decimals are great for financial calculations, but almost all the work I do with computers that does any math at all does it entirely in integers. I don't want to be constantly having to convert doubles or decimals back to integers just because I did some division. If you are solving physics or financial problems then why are you using integers in the first place? Use nothing but doubles or decimals. Use integers to solve finite mathematics problems.
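The thumbnail example above can be sketched in a couple of lines (the scale factor and function name are mine, purely for illustration):

```python
SCALE = 10   # the thumbnail is 10 times smaller in each dimension

def to_thumbnail(x, y):
    # integer division maps a full-size pixel to its thumbnail pixel
    return x // SCALE, y // SCALE

print(to_thumbnail(137, 42))   # (13, 4)
```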
Calculating on integers is faster (usually) than on floating point values. Besides, all other integer/integer operations (+, -, *) return an integer.
EDIT:
As per the request of the OP, here's some addition:
The OP's problem is that they think of / as division in the mathematical sense, while the / operator in the language performs some other operation (which is not mathematical division). By this logic they should question the validity of all other operations (+, -, *) as well, since those have special overflow rules, which are not the same as would be expected from their math counterparts. If this is bothersome for someone, they should find another language where the operations behave as they expect.
As for the claim on performance difference in favor of integer values: when I wrote the answer I only had "folk" knowledge and "intuition" to back up the claim (hence my "usually" disclaimer). Indeed, as Gabe pointed out, there are platforms where this does not hold. On the other hand, I found this link (point 12) that shows mixed performance on an Intel platform (the language used is Java, though).
The takeaway should be that with performance many claims and intuition are unsubstantiated until measured and found true.
Yes, if the end result needs to be a whole number. It would depend on the requirements.
If these are indeed your requirements, then you would not want to store a decimal and then truncate it. You would be wasting memory and processing time to accomplish something that is already built-in functionality.
The operator is designed to return the same type as its inputs.
Edit (comment response):
Why? I don't design languages, but I would assume most of the time you will be sticking with the data types you started with, and in the remaining instances, what criteria would you use to automatically decide which type the user wants? Would you automatically expect a string when you need it? (sincerity intended)
If you add an int to an int, you expect to get an int. If you subtract an int from an int, you expect to get an int. If you multiply an int by an int, you expect to get an int. So why would you not expect an int result if you divide an int by an int? And if you expect an int, then you will have to truncate.
If you don't want that, then you need to cast your ints to something else first.
Edit: I'd also note that if you really want to understand why this is, then you should start looking into how binary math works and how it is implemented in an electronic circuit. It's certainly not necessary to understand it in detail, but having a quick overview of it would really help you understand how the low-level details of the hardware filter through to the details of high-level languages.
