Situation:
I have a system that reads a continuous stream of numbers (integers only).
The numbers can be both positive and negative.
Some of the numbers are the results of overflowed arithmetic operations, so they show up as negative values in the stream.
Problem:
How can I differentiate between overflowed negative numbers and legitimate negative numbers in the stream? Is there a way to find and discard the overflowed values? I'm developing in C# and don't have control over the stream source, so I can't change its code or add any checks there.
There is no way to achieve what you want in the general case. An int (or any other built-in value type) carries no information about how it was generated; there is no difference between the binary pattern of a value obtained by a regular operation and one produced by overflow.
Your options:
some cases may be detected by range checking - if normal values are relatively small, unusually large values are likely errors
capture more information when constructing the stream of values - include an "overflow" flag paired with each value (a minimal sketch follows this list)
use a wider type for the values (long or BigInteger)
Note that overflow can also produce small or positive numbers, so there is no chance of filtering out all invalid values unless you know more about how the computation is performed.
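A minimal sketch of that flag-pairing option, assuming you could change the producer (the question says you can't, so this is for completeness only); the method name and tuple shape here are illustrative:

static (int Value, bool Overflowed) Multiply(int a, int b)
{
    try
    {
        // checked arithmetic throws on overflow instead of silently wrapping
        return (checked(a * b), false);
    }
    catch (OverflowException)
    {
        // recompute with wrapping semantics and flag the result as overflowed
        return (unchecked(a * b), true);
    }
}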
You can write defensive code to guard against overflow.
All of the built-in numeric types have MinValue and MaxValue properties. (Note that if number is already an Int32, the check below is trivially true; the range check is only meaningful when the value is held in a wider type such as long before narrowing.)
if (number >= Int32.MinValue && number <= Int32.MaxValue)
{
    // logic
}
Related
When adding two positive Int32 values with a theoretical result greater than Int32.MaxValue, can I count on the overflowed value always being negative?
I mean to use this as a way to check for overflow without using a checked context and exception handling (as proposed here: http://sandbox.mc.edu/~bennet/cs110/tc/orules.html), but is this behaviour guaranteed?
From what I've read so far, signed integer overflow is defined behaviour in C# (Is C#/.NET signed integer overflow behavior defined?), in contrast to C/C++, and Int32 is two's complement, so I would be thankful if someone with a better understanding of this subject than me could verify this.
Update
Quote from link 1:
The rules for detecting overflow in a two's complement sum are simple:
If the sum of two positive numbers yields a negative result, the sum has overflowed.
If the sum of two negative numbers yields a positive result, the sum has overflowed.
Otherwise, the sum has not overflowed.
Rule #2 from http://sandbox.mc.edu/~bennet/cs110/tc/orules.html is incorrect:
If the sum of two negative numbers yields a positive result, the sum has overflowed.
Counterexample:
int a = int.MinValue;
int b = int.MinValue;
unchecked {
    // 0
    Console.Write(a + b);
}
However, the rule can easily be amended:
If the sum of two negative numbers yields a non-negative result, the sum has overflowed.
As for Rule #1
If the sum of two positive numbers yields a negative result, the sum has overflowed.
it is correct.
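For completeness, here is a minimal sketch of an overflow test built from the two (amended) rules; the helper name is mine, not from the linked page:

static bool AdditionOverflows(int a, int b)
{
    unchecked
    {
        int sum = a + b;
        // Rule #1: positive + positive yielding a negative result
        // Rule #2 (amended): negative + negative yielding a non-negative result
        return (a > 0 && b > 0 && sum < 0) || (a < 0 && b < 0 && sum >= 0);
    }
}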
No, you cannot.
You already talk about checked context, where you know that overflow causes an exception to be thrown. However, you seem to assume that the lack of a checked keyword means you're in an unchecked context. That isn't necessarily the case. The default context, when neither checked nor unchecked is specified, is configurable: it can be different in multiple projects that share the same source files, and can even differ between configurations of the same project.
If you want integer overflow to wrap, be explicit: use the unchecked keyword.
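A small demonstration of being explicit about both contexts (note that a constant expression like checked(int.MaxValue + 1) is rejected at compile time, so variables are used here):

int max = int.MaxValue;
int wrapped = unchecked(max + 1);   // -2147483648: wraps silently
try
{
    int boom = checked(max + 1);    // throws at runtime
}
catch (OverflowException)
{
    Console.WriteLine("overflow caught");
}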
I recently came across the definition of denormalized numbers, and I understand that there are some numbers that cannot be represented in normalized form because they are too small to fit into their corresponding type, according to IEEE 754.
So what I am trying to do is catch when a denormalized number is passed as a parameter, to avoid calculations with these numbers. If I understand correctly, I just need to look for numbers within the denormalized range:
private bool IsDenormalizedNumber(float number)
{
    // smallest positive denormal: 2^-149; upper bound: (2 - 2^-23) * 2^-127
    double min = Math.Pow(2, -149);
    double max = (2 - Math.Pow(2, -23)) * Math.Pow(2, -127);
    return (min <= number && number <= max) ||
           (-max <= number && number <= -min);
}
Is my interpretation correct?
I think a better approach would be to inspect the bits. Normalized or denormalized is a characteristic of the binary representation, not of the value itself. Therefore, you will be able to detect it more reliably this way, and you can do so without any potentially dangerous floating point comparisons.
I put together some runnable code for you, so that you can see it work. I adapted it from a similar question regarding doubles. Detecting the denormal is much simpler than fully excising the exponent and significand, so I was able to simplify the code greatly.
As for why it works: the exponent is stored in offset notation. The 8 bits of the exponent can take the values 1 to 254 (0 and 255 are reserved for special cases); they are then offset-adjusted by -127, yielding the normalized range of -126 (1-127) to 127 (254-127). The exponent field is 0 in the denormal case, and the implicit leading bit of the significand is then treated as 0 instead of 1, which is what allows values below the normalized range to be represented.
In any case, the actual code is quite simple. All that is required is to excise the 8 bits storing the exponent and test for 0. There is a special case around 0, which is handled below.
NOTE: Per the comment discussion, this code relies on platform-specific implementation details (x86_64 in this test case). As @ChiuneSugihara pointed out, the CLI does not guarantee this behavior, and it may differ on other platforms, such as ARM.
using System;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("-120, denormal? " + IsDenormal((float)Math.Pow(2, -120)));
            Console.WriteLine("-126, denormal? " + IsDenormal((float)Math.Pow(2, -126)));
            Console.WriteLine("-127, denormal? " + IsDenormal((float)Math.Pow(2, -127)));
            Console.WriteLine("-149, denormal? " + IsDenormal((float)Math.Pow(2, -149)));
            Console.ReadKey();
        }

        public static bool IsDenormal(float f)
        {
            // when f is 0, the exponent field is also 0 and would break
            // the rest of this algorithm, so check for that first
            if (f == 0f)
            {
                return false;
            }

            // get the bits
            byte[] buffer = BitConverter.GetBytes(f);
            int bits = BitConverter.ToInt32(buffer, 0);

            // extract the exponent: 8 bits in the upper registers,
            // above the 23-bit significand
            int exponent = (bits >> 23) & 0xff;

            // a zero exponent field marks a denormal
            return exponent == 0;
        }
    }
}
The output is:
-120, denormal? False
-126, denormal? False
-127, denormal? True
-149, denormal? True
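As an aside, newer runtimes (.NET Core 3.0 and later, if I remember the version correctly) expose this check directly as float.IsSubnormal, which avoids the manual bit inspection entirely:

bool isSub = float.IsSubnormal((float)Math.Pow(2, -149));  // true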
Sources:
extracting mantissa and exponent from double in c#
https://en.wikipedia.org/wiki/IEEE_floating_point
https://en.wikipedia.org/wiki/Denormal_number
http://csharpindepth.com/Articles/General/FloatingPoint.aspx
Code adapted from:
extracting mantissa and exponent from double in c#
From my understanding, denormalized numbers exist to help with underflow in some cases (see the answer to Denormalized Numbers - IEEE 754 Floating Point).
So to get a denormalized number you would need to explicitly create one, or else cause an underflow. In the first case it seems unlikely that a literal denormalized number would be specified in code, and even if someone tried it, I am not sure that .NET would allow it. ~~In the second case, as long as you are in a checked context, you should get an OverflowException thrown for any overflow or underflow in an arithmetic computation, which would guard against the possibility of getting a denormalized number.~~ In an unchecked context, I am not sure whether an underflow will take you to a denormalized number, but you can try it and see, if you want to run calculations in unchecked.
Long story short: ~~you need not worry about it if you are running in checked;~~ try an underflow and see, if you want to run in unchecked.
EDIT
I wanted to update my answer since a comment didn't feel substantial enough. First off, I have struck out the comment I made about the checked context, since that only applies to non-floating-point calculations (like int) and not to float or double. That was my mistake.
The issue with denormalized numbers is that they are not handled consistently across the CLI. Notice that I am saying "CLI" and not "C#", because we need to go lower than C# to understand the issue. The Common Language Infrastructure Annotated Standard, Partition I, Section 12.1.3, second note (page 125 of the book), states:
This standard does not specify the behavior of arithmetic operations on denormalized floating point numbers, nor does it specify when or whether such representations should be created. This is in keeping with IEC 60559:1989. In addition, this standard does not specify how to access the exact bit pattern of NaNs that are created, nor the behavior when converting a NaN between 32-bit and 64-bit representation. All of this behavior is deliberately left implementation specific.
So at the CLI level the handling of denormalized numbers is deliberately left implementation-specific. Furthermore, if you look at the documentation for float.Epsilon (found here), which is the smallest positive number representable by a float, you will see that on most machines it is a denormalized number matching what the documentation lists (approximately 1.4e-45). This is most likely what @Kevin Burdett was seeing in his answer. That said, if you scroll down further on the page, you will see the following quote under "Platform Notes":
On ARM systems, the value of the Epsilon constant is too small to be detected, so it equates to zero. You can define an alternative epsilon value that equals 1.175494351E-38 instead.
So there are portability issues that can come into play when you manually handle denormalized numbers, even just within the .NET CLR (which is one implementation of the CLI). In fact, this ARM-specific value is kind of interesting, since it appears to be a normalized number (I used the function from @Kevin Burdett with IsDenormal(1.175494351E-38f) and it returned false). In the CLI proper the concerns are more severe, since there is by design no standardization of their handling, according to the annotation in the CLI standard. That leaves open questions about what would happen with the same code on Mono or Xamarin, for instance, which are different implementations of the CLI than the .NET CLR.
In the end I am right back at my previous advice: just don't worry about denormalized numbers. They are there to silently help you, and it is hard to imagine why you would need to single them out specifically. Also, as @HansPassant mentioned, you most likely won't even encounter one anyway. It is hard to imagine how you would go below the smallest positive normalized double, which is absurdly small.
I'm just curious: why, in IEEE-754, does any non-zero float number divided by zero result in an infinite value? It's nonsense from the mathematical perspective, so I would think the correct result for this operation is NaN.
The function f(x) = 1/x is not defined at x=0, if x is a real number. For example, the function sqrt is not defined for any negative number, and sqrt(-1.0f) in IEEE-754 produces a NaN value. But 1.0f/0 is Inf.
Yet for some reason this is not the case in IEEE-754. There must be a reason for this, maybe some optimization or compatibility concern.
So what's the point?
It's nonsense from the mathematical perspective.
Yes. No. Sort of.
The thing is: floating-point numbers are approximations. You want to use a wide range of exponents and a limited number of digits and get results that are not completely wrong. :)
The idea behind IEEE-754 is that every operation can trigger "traps" which indicate possible problems. They are:
Invalid operation (a senseless operation, such as sqrt of a negative number)
Overflow (result too big)
Underflow (result too small)
Division by zero (the thing you do not like)
Inexact (this operation may give you wrong results because you are losing precision)
Now many people, like scientists and engineers, do not want to be bothered with writing trap routines. So Kahan, the father of IEEE-754, decided that every operation should also return a sensible default value if no trap routine exists.
They are
NaN for invalid operations
signed infinities for overflow
signed zeroes for underflow
NaN for indeterminate results (0/0), and infinities for x/0 with x != 0
the normal (rounded) result for inexact
The thing is that in 99% of all cases zeroes are caused by underflow, and therefore in 99% of all cases Infinity is "correct", even if it is wrong from a mathematical perspective.
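You can see those defaults directly from C#; a small demonstration:

double inf = 1.0 / 0.0;          // PositiveInfinity: x/0 with x != 0
double nan = 0.0 / 0.0;          // NaN: indeterminate result
double bad = Math.Sqrt(-1.0);    // NaN: invalid operation
double tiny = 1e-200 * 1e-200;   // 0.0: underflow produces a (signed) zero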
I'm not sure why you would believe this to be nonsense.
The simplistic definition of a / b, at least for non-zero b, is the unique number of bs that has to be subtracted from a before you get to zero.
Expanding that to the case where b can be zero: the number of zeros that have to be subtracted from any non-zero number to get to zero is indeed infinite, because you'll never get to zero.
Another way to look at it is to talk in terms of limits. As a positive number n approaches zero, the expression 1 / n approaches "infinity". You'll notice I've quoted that word because I'm a firm believer in not propagating the delusion that infinity is actually a concrete number :-)
NaN is reserved for situations where the number cannot be represented (even approximately) by any other value (including the infinities); it is considered distinct from all those other values.
For example, 0 / 0 (using our simplistic definition above) can have any number of bs subtracted from a and still reach 0. Hence the result is indeterminate - it could be 1, 7, 42, 3.14159 or any other value.
Similarly, things like the square root of a negative number, which has no value on the real line used by IEEE 754 (you have to go to the complex plane for that), cannot be represented.
In mathematics, division by zero is undefined because zero has no sign; therefore two results are equally possible, and mutually exclusive: negative infinity or positive infinity (but not both).
In (most) computing, 0.0 has a sign. Therefore we know what direction we are approaching from, and what sign the infinity should have. This is especially true when 0.0 represents a non-zero value too small to be expressed by the system, as is frequently the case.
The only time NaN would be appropriate is if the system knew with certainty that the denominator was truly, exactly zero. And it can't, unless there is a special way to designate that, which would add overhead.
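A short illustration of that signed-zero behavior in C# (the exact printed text varies by runtime and culture):

double posInf = 1.0 / 0.0;    // PositiveInfinity
double negInf = 1.0 / -0.0;   // NegativeInfinity: the zero's sign picks the direction
Console.WriteLine(posInf);    // "Infinity" or "∞", depending on runtime
Console.WriteLine(negInf);    // "-Infinity" or "-∞"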
NOTE:
I rewrote this following a valuable comment from @Cubic.
I think the correct answer to this has to come from calculus and the notion of limits. Consider the limit of f(x)/g(x) as x->0, under the assumption that g(0) == 0. There are two broad cases that are interesting here:
If f(0) != 0, then the limit as x->0 is either plus or minus infinity, or it's undefined. If g(x) takes both signs in the neighborhood of x==0, then the limit is undefined (left and right limits don't agree). If g(x) has only one sign near 0, however, the limit will be defined and be either positive or negative infinity. More on this later.
If f(0) == 0 as well, then the limit can be anything, including positive infinity, negative infinity, a finite number, or undefined.
In the second case, generally speaking, you cannot say anything at all; arguably, NaN is the only viable answer there.
Now in the first case, why choose one particular sign when either is possible or it might be undefined? As a practical matter, it gives you more flexibility in cases where you do know something about the sign of the denominator, at relatively little cost in the cases where you don't. You may have a formula, for example, where you know analytically that g(x) >= 0 for all x, say, for example, g(x) = x*x. In that case the limit is defined and it's infinity with sign equal to the sign of f(0). You might want to take advantage of that as a convenience in your code. In other cases, where you don't know anything about the sign of g, you cannot generally take advantage of it, but the cost here is just that you need to trap for a few extra cases - positive and negative infinity - in addition to NaN if you want to fully error check your code. There is some price there, but it's not large compared to the flexibility gained in other cases.
Why worry about general functions when the question was about "simple division"? One common reason is that if you're computing your numerator and denominator through other arithmetic operations, you accumulate round-off errors. The presence of those errors can be abstracted into the general formula format shown above. For example f(x) = x + e, where x is the analytically correct, exact answer, e represents the error from round-off, and f(x) is the floating point number that you actually have on the machine at execution.
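To make the g(x) = x*x case concrete in C#: the denominator can underflow to +0.0 even though the analytic value is merely tiny, and the signed-infinity convention still returns a usable, correctly signed result:

double x = 1e-200;
double g = x * x;       // analytically 1e-400, underflows to +0.0
double r = 1.0 / g;     // PositiveInfinity: the sign agrees with f(0) = 1 > 0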
I have a database table that needs to be converted into its current form. This table has three columns of type Double (it's Pervasive.SQL, if anyone cares).
My problem is that this table has been around for a long time, and it has been acted upon by code going back some 15 years or more.
Historically, we have always used Double.MinValue (or whatever language equivalent at the time) to represent "blank" values provided by the user. The absence of a value, in other words, is actually stored as a value that we can recognize later and react to intelligently.
So, today my problem is that I need to loop through these records and insert them into a newly created table (this is the "conversion" I spoke of). However, I am not seeing consistent values in the tables I am converting. Here are the ones I know of for sure:
2.2250738585072014E-308
3.99285938963E-313
3.99099435427E-313
1.1125369292536007E-308
-5.389000690742776E279
2.104687961E-314
Now, I recognize that there are other ways that Double.MinValue might exist, or at least be represented. Having done some Google searches, I found that the first one is another representation of Double.MinValue (actually DBL_MIN, referenced here: http://msdn.microsoft.com/en-us/library/6bs3y5ya(v=vs.100).aspx).
I don't want to get too long-winded, so I'll solicit questions if this is not enough information to help me. Suffice it to say, I need a reliable way of spotting all of the previous "minimum" values and replacing them with the C# Double.MinValue constant as I loop through these data rows.
If it proves to be dataRow["Value"] < someConstant, then so be it. But I'll let the math theorists help me out with that determination.
Thank you for the time.
EDIT:
Here's what I am doing with these values as I find them. It's part of a generic method that assembles values to be written to the database:
else if (c.DataType == typeof(System.Double))
{
    if (inRow[c] == DBNull.Value)
        retString += @"NULL";
    else
    {
        Double d;
        if (Double.TryParse(inRow[c].ToString(), out d))
            retString += d.ToStringFull();  // ToStringFull: a custom extension, presumably full-precision formatting
    }
}
Until now, this code simply accepted those values. And that's bad, because when the application finds them they look like acceptable data rather than Double.MinValue, and so they are not seen as blanks. But that's what they are.
This is utter craziness. Let's look at some of those numbers in detail. These are all tiny numbers just barely larger than zero:
2.2250738585072014E-308
This is 1 / 2^1022 -- it is a normal double. This is one of the two "special" numbers in your set; it is the smallest normal double that is larger than zero. The rest of the small doubles on your list are subnormal doubles.
1.1125369292536007E-308
This is 1 / 2^1023 -- it is a subnormal double. This is also a special number; it is half the smallest normal double larger than zero. (I originally said that it was the largest subnormal double, but of course that is not true; see the comments.)
3.99285938963E-313
This isn't anything special. It's a subnormal double equal to a fraction where the numerator is 154145 and the denominator is a rather large power of two.
3.99099435427E-313
This isn't anything special either. This time the numerator is 154073.
2.104687961E-314
This isn't anything special either. The numerator is 2129967929 and the denominator is an even larger power of two.
All the numbers so far have been very close to zero and positive. This number is very far from zero and negative, and therefore stands out:
-5.389000690742776E279
But again it is nothing special; it is nowhere even close to the negative double with the largest absolute value, which is about -1.79E308, more than 10^28 times larger in magnitude.
This is a complete mess.
My advice is to stop this madness immediately. It makes absolutely no sense to use values that are incredibly close to zero to represent "blank" values; values that are incredibly close to zero should be rounded to zero, not treated as blanks!
Double already has a representation for "blank" values, namely Double.NaN -- Not A Number. It is bizarre to use a valid value to represent an invalid value when the domain already includes a specific "invalid" value. (Remember that there are actually a large number of distinct NaN bit patterns; use IsNaN to determine whether a double is a NaN.)
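A quick sketch of why IsNaN is the right test: NaN compares unequal to everything, including itself, so == can never match it.

double blank = double.NaN;
Console.WriteLine(blank == double.NaN);   // False: == never matches NaN
Console.WriteLine(double.IsNaN(blank));   // True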
So my advice is:
Examine individually every number in the database that is a subnormal or very small normal double. Some of those probably ought to be zero and ended up as tiny values due to rounding errors; replace them with zero. The ones that ought to be blank, replace with database null (best practice) or double NaN (acceptable, but not as good as database null).
Write a program to find every number in the database that is impossibly large in absolute value and replace it with database null or double NaN.
Update all clients so that they understand the convention you're using to represent blank values.
You seem to want to check whether a double is really small and positive, or really big, finite, and negative. (Others have detailed some problems with your approach in the comments; I'm not going to go into that here.) A test like this:
if (d == d && (d > 0 && d < 1e-290 || d < -1e270 && d + d != d))
might do roughly what you want. You'll probably need to tweak the numbers above. The d == d test checks for NaN (it is false only when d is NaN), while the d + d != d test rules out the infinities.
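Plugged into the questioner's conversion branch, it might look like this (ToStringFull and retString come from the question's code; the thresholds are the rough ones above and may need tweaking):

if (Double.TryParse(inRow[c].ToString(), out d))
{
    if (d == d && (d > 0 && d < 1e-290 || d < -1e270 && d + d != d))
        retString += @"NULL";           // treat the legacy sentinel as blank
    else
        retString += d.ToStringFull();
}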
Dividing an int by zero will throw an exception, but a float won't - at least in Java. Why does a float have additional NaN info, while an int type doesn't?
The representation of a float has been designed so that some special combinations of bits are reserved to store special values such as NaN, infinity, etc.
There are no unused representations for an int type - every bit pattern corresponds to an integer. This has many advantages:
The range of an integer type is as large as possible - no bit patterns are wasted.
The representation of an integer is easy to understand because there are no special cases.
Integer arithmetic can be done at extremely high speed even on very simple processors.
A clear explanation of float arithmetic is given here: http://www.artima.com/underthehood/floatingP.html
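To see the reserved patterns concretely (in C#, since that is the language used elsewhere on this page; BitConverter.SingleToInt32Bits is available on .NET Core 2.0 and later, and the exact NaN payload shown is just what .NET happens to produce):

int infBits = BitConverter.SingleToInt32Bits(float.PositiveInfinity); // all-ones exponent, zero significand
int nanBits = BitConverter.SingleToInt32Bits(float.NaN);              // all-ones exponent, non-zero significand
Console.WriteLine(infBits.ToString("X8"));  // 7F800000
Console.WriteLine(nanBits.ToString("X8"));  // FFC00000 on .NET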
I think the real reason, the root of this, is the well-known fact: computers store everything in zeroes and ones.
What does this have to do with integers, floats and zero division? It's pretty simple. If you have only zeroes and ones, it is easy to combine them into integer numbers, just as you do with decimal digits. So "10" becomes two, "11" becomes three, and so on. This kind of integer representation is so natural that no one would think of inventing anything else for integers; it would just make CPUs more complicated and things more confusing. The only "invention" that was required was figuring out how to store negative numbers, but that's also very natural and simple if you start from the premise that x+(-x) should always equal zero, without using any special kind of addition. That's why 11111111 is -1 for 8-bit integers: if you add 1 to it, it becomes 100000000, the carry out of the eighth bit is truncated due to overflow, and you get your zero. But this natural format has no place for infinities and NaNs, and nobody wanted to invent a non-natural representation just for that. Well, I won't be surprised if someone actually did, but there is no way such a format would become well-known and widely used.
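The 8-bit wraparound described above, spelled out in C# for concreteness:

byte allOnes = 0xFF;                           // 11111111
sbyte asSigned = unchecked((sbyte)allOnes);    // -1 in two's complement
byte plusOne = unchecked((byte)(allOnes + 1)); // the carry is truncated: 0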
Now, for floating-point numbers, there is no natural representation. Even if we translate 0.5 to binary, it would still be something like 0.1, only now we have a "binary point" instead of a decimal point. But CPUs can't naturally represent a "point", only 1 and 0. So some kind of special format was needed; there was simply no other way to go. And then someone probably suggested, "Hey guys, while we are at it, why not include a special representation for infinity and other numeric nonsense?", and so it was done.
This is the reason these formats are so different. How to handle division by zero is up to the language designers, but for floating point they have the choice between inf/NaN and exceptions, while for integers no such special values naturally exist.
Basically, it's a purely arbitrary decision.
The traditional int tries to use all the bits for representing possible numbers, whereas the IEEE 754 standard reserves special values such as NaN.
The standard could be changed for ints to include special values, at the cost of less efficient operations. However, developers usually expect int operations to be very efficient, whereas (purely psychologically) slower floating point operations are more readily accepted.
Ints and floats are represented differently inside the machine. Integers usually use a signed two's complement representation that is (essentially) the number written out in base two. Floats, on the other hand, use a more complex representation that can hold much larger and much smaller values. However, the machine reserves several special bit patterns for floats to mean things other than numbers - there are values for NaN, and for positive or negative infinity, for example. This means that if you divide a float by zero, there is a series of bits the computer can use to encode that you divided by zero. For ints, all bit patterns are used to encode numbers, so there's no meaningful series of bits the computer could use to represent the error.
This isn't an essential property of ints, though. One could, in theory, make an integer representation that handles division by zero by returning some NaN variant. It's just not what's done in practice.
Java reflects the way most CPUs are implemented. Integer divide by zero causes an interrupt on x86/x64, while floating point divide by zero results in Infinity, Negative Infinity or NaN. Note: with floating point you can also divide by negative zero. :P