Is it more efficient to use hexadecimal instead of decimal? - c#

Hello, I am using Visual Studio 2015 with .NET Framework 4.5 (if it matters), and ReSharper keeps suggesting that I switch from decimal numbers to hex. Why is that? Is there any performance benefit to using hex?

There is absolutely no performance difference between formats of numeric literals in a source language, because the conversion is done by the compiler. The only reason to switch from one representation to another is the readability of your code.
Two common cases for using hexadecimal literals are representing colors and bit masks. Since color representation is often split at byte boundaries, reading the number 0xFF00FF is much easier than 16711935: the hex format tells you that the red and blue components are maxed out while the green component is zero, whereas the decimal format requires you to perform the conversion yourself.
Bit masks are similar: when you use hex or octal representation, it is very easy to see which bits are one and which are zero. All you need to learn is a short table of sixteen bit patterns corresponding to the hex digits 0 through F. You can immediately tell that 0xFF00 has the upper eight bits set to 1 and the lower eight bits set to 0; doing the same with 65280 is much harder for most programmers.
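For example, in C# (the constant names below are made up for illustration):

    class ColorAndMaskExample
    {
        // red = 0xFF, green = 0x00, blue = 0xFF, readable at a glance
        const int Magenta = 0xFF00FF;
        // same value as 65280, but the bit layout is obvious
        const int UpperByteMask = 0xFF00;

        static void Main()
        {
            int red   = (Magenta >> 16) & 0xFF; // 255
            int green = (Magenta >> 8)  & 0xFF; // 0
            int blue  =  Magenta        & 0xFF; // 255
            System.Console.WriteLine($"{red}, {green}, {blue}, mask = 0x{UpperByteMask:X4}");
        }
    }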

There is absolutely no performance difference when writing constants in your code in decimal vs. hex. Both will be translated to the exact same IL and ultimately JITted to the same machine code.
Use whichever representation makes more sense for the work you are doing and is clearer and easier to understand in the context of the problem your code solves.

Related

Why does C# System.Decimal (decimal) "waste" bits?

As written in the official docs, the 128 bits of System.Decimal are laid out like this:
The return value is a four-element array of 32-bit signed integers. The first, second, and third elements of the returned array contain the low, middle, and high 32 bits of the 96-bit integer number. The fourth element of the returned array contains the scale factor and sign. It consists of the following parts:
Bits 0 to 15, the lower word, are unused and must be zero.
Bits 16 to 23 must contain an exponent between 0 and 28, which indicates the power of 10 to divide the integer number by.
Bits 24 to 30 are unused and must be zero.
Bit 31 contains the sign: 0 means positive, and 1 means negative.
With that in mind one can see that some bits are "wasted" or unused.
Why not, for example, 120 bits of integer, 7 bits of exponent, and 1 bit of sign?
Probably there is a good reason for a decimal being the way it is. This question would like to know the reasoning behind that decision.
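For reference, that layout can be inspected directly with decimal.GetBits; a minimal sketch (the values in the comments are what I would expect for -123.45m):

    using System;

    class GetBitsDemo
    {
        static void Main()
        {
            // -123.45m is stored as the 96-bit integer 12345 with scale 2 and the sign bit set.
            int[] bits = decimal.GetBits(-123.45m);

            // bits[0..2] are the low, middle and high 32 bits of the 96-bit integer;
            // bits[3] packs the scale into bits 16-23 and the sign into bit 31.
            Console.WriteLine($"lo={bits[0]} mid={bits[1]} hi={bits[2]} flags=0x{bits[3]:X8}");

            int scale = (bits[3] >> 16) & 0xFF;                          // 2
            bool negative = (bits[3] & unchecked((int)0x80000000)) != 0; // true
            Console.WriteLine($"scale={scale}, negative={negative}");
        }
    }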
Based on Kevin Gosse's comment:
For what it's worth, the decimal type seems to predate .NET. The .NET Framework CLR delegates the computations to the oleaut32 lib, and I could find traces of the DECIMAL type as far back as Windows 95.
I searched further and found a likely user of the DECIMAL code in oleaut32 on Windows 95.
The old Visual Basic (non .NET based) and VBA have a sort-of-dynamic type called 'Variant'. In there (and only in there) you could save something nearly identical to our current System.Decimal.
Variant is always 128 bits with the first 16 bits reserved for an enum value of which data type is inside the Variant.
The separation of the remaining 112 bits could be based on common CPU architectures in the early '90s, or on ease of use for the Windows programmer. It sounds sensible not to pack the exponent and sign into one byte just to have one more byte available for the integer.
When .NET was built, the existing (low-level) code for this type and its operations was reused for System.Decimal.
None of this is 100% verified, and I would have liked the answer to contain more historical evidence, but that's what I could piece together.
Here is the C# source of Decimal. Note the FCallAddSub-style methods. These call out to (unavailable) fast C++ implementations of those methods.
I suspect the implementation is like this because it means that operations on the 'numbers' in the first 96 bits can be simple and fast, as CPUs operate on 32-bit words. If 120 bits were used, CPU operations would be slower and trickier and require a lot of bitmasks to get the interesting extra 24 bits, which would then be difficult to work with. Additionally, this would then 'pollute' the highest 32-bit flags, and make certain optimizations impossible.
If you look at the code, you can see that this simple bit layout is useful everywhere. It is no doubt especially useful in the underlying C++ (and probably assembler).

Why does C#'s decimal use binary integer significand?

The new IEEE 128-bit decimal floating-point type (https://en.wikipedia.org/wiki/Decimal128_floating-point_format) specifies that the significand (mantissa) can be represented in one of two ways: either as a simple binary integer, or in densely packed decimal (in which case every ten bits represent three decimal digits).
C#'s decimal type predates this standard, but has the same idea. It went with binary integer significand.
On the face of it, this seems inefficient; for addition and subtraction, to line up the significands, you have to divide one of them by a power of ten; and division is the most expensive of all the arithmetic operators.
What was the reason for the choice? What corresponding advantage was considered worth that penalty?
Choosing one representation over another is almost always about trade-offs.
From here
A binary encoding is inherently less efficient for conversions to or from decimal-encoded data, such as strings (ASCII, Unicode, etc.) and BCD. A binary encoding is therefore best chosen only when the data are binary rather than decimal. IBM has published some unverified performance data.
Here you can find more about the relative performance.
Basically, it confirms your thinking: a decimal significand is generally faster for those conversions, but most operations show similar performance, with binary even winning at division. Also keep in mind that since Intel mostly seems to rely on binary significands (I couldn't find hints about other manufacturers), binary is more likely to get hardware support and might then beat decimal by a good margin.
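To make the alignment cost concrete, here is a toy model of a binary-integer-significand type (my sketch, not the real System.Decimal internals): a value is significand × 10^-scale, and adding two values first requires rescaling one significand by a power of ten.

    using System;
    using System.Numerics;

    struct ScaledValue
    {
        public BigInteger Significand; // stored as a plain binary integer
        public int Scale;              // value = Significand * 10^-Scale

        public static ScaledValue Add(ScaledValue a, ScaledValue b)
        {
            // Align the scales: the operand with the smaller scale is multiplied by a
            // power of ten (a real fixed-width significand would instead have to divide
            // and round whenever this multiplication would overflow).
            int scale = Math.Max(a.Scale, b.Scale);
            BigInteger sa = a.Significand * BigInteger.Pow(10, scale - a.Scale);
            BigInteger sb = b.Significand * BigInteger.Pow(10, scale - b.Scale);
            return new ScaledValue { Significand = sa + sb, Scale = scale };
        }
    }

With a densely packed decimal significand the same alignment would be a digit-wise shift rather than an integer multiply or divide.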

'Beautify' number by rounding erroneous digits appropriately

I want to have my cake and eat it too. I want to beautify (round) numbers to the largest extent possible without compromising accuracy for other calculations. I'm using doubles in C# (with some string-conversion manipulation too).
Here's the issue. I understand the inherent limitations of double representation (so please don't explain that). HOWEVER, I want to round the number in some way to appear aesthetically pleasing to the end user (I am making a calculator). The problem is that rounding by X significant digits works in one case but not the other, whilst rounding by decimal places works in the other case but not the first.
Observe:
CASE A: Math.Sin(Math.PI) = 0.000000000000000122460635382238
CASE B: 0.000000000000001/3 = 0.000000000000000333333333333333
For the first case, I want to round by DECIMAL PLACES. That would give me the nice neat zero I'm looking for. Rounding by significant digits would mean I keep the erroneous digits too.
However for the second case, I want to round by SIGNIFICANT DIGITS, as I would lose tons of accuracy if I rounded merely by decimal places.
Is there a general way I can cater to both types of calculation?
I don't think it's feasible to do that to the result itself, and precision has nothing to do with it.
Consider this input: (1+3)/2^3. You can "beautify" it by showing the result as sin(30) or cos(60) or 1/2 and a whole lot of other interpretations. Choosing the wrong "beautification" can mislead your users, making them think their function has something to do with sin(x).
If your calculator keeps all the initial input as variables, you could postpone all the operations until you need the result and then simplify the result until it matches your needs. You'll also need to consider using rational numbers; e, pi, and other irrational numbers may not be as easy to deal with.
The best solution to this is to keep every bit you can get during calculations, and leave the display format up to the end user. The user should have some idea how many significant digits make sense in their situation, given both the nature of the calculations and the use of the result.
Default to a reasonable number of significant digits for a few calculations in the floating point format you are using internally - about 12 if you are using double. If the user changes the format, immediately redisplay in the new format.
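A minimal sketch of that suggestion, using the standard "G" numeric format string with 12 significant digits (the 12-digit default is this answer's suggestion, not a framework constant):

    using System;

    class DisplayDemo
    {
        static void Main()
        {
            int sigDigits = 12; // user-adjustable display setting

            double caseA = Math.Sin(Math.PI);        // ~1.2246e-16
            double caseB = 0.000000000000001 / 3.0;  // ~3.3333e-16

            // "G12" rounds to 12 significant digits for display only;
            // the full double values are kept for further calculations.
            Console.WriteLine(caseA.ToString("G" + sigDigits));
            Console.WriteLine(caseB.ToString("G" + sigDigits));
        }
    }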
The best solution is to use arbitrary-precision and/or symbolic arithmetic, although these result in much more complex code and slower speed. But since performance isn't important for a calculator (in the case of a button calculator, as opposed to one where you enter whole expressions to evaluate), you can use them without issue.
Anyway, a good trade-off is to use decimal floating point. You'll need to limit the input/output precision but use a higher precision for the internal representation, so that you can discard values very close to zero like the sin case above. For better results you could detect some edge cases, such as sines/cosines of multiples of 45 degrees, and directly return the exact result.
Edit: I just found a good solution, but haven't had an opportunity to try it.
Here’s something I bet you never think about, and for good reason: how are floating-point numbers rendered as text strings? This is a surprisingly tough problem, but it’s been regarded as essentially solved since about 1990.
Prior to Steele and White’s "How to print floating-point numbers accurately", implementations of printf and similar rendering functions did their best to render floating point numbers, but there was wide variation in how well they behaved. A number such as 1.3 might be rendered as 1.29999999, for instance, or if a number was put through a feedback loop of being written out and its written representation read back, each successive result could drift further and further away from the original.
...
In 2010, Florian Loitsch published a wonderful paper in PLDI, "Printing floating-point numbers quickly and accurately with integers", which represents the biggest step in this field in 20 years: he mostly figured out how to use machine integers to perform accurate rendering! Why do I say "mostly"? Because although Loitsch's "Grisu3" algorithm is very fast, it gives up on about 0.5% of numbers, in which case you have to fall back to Dragon4 or a derivative
Here be dragons: advances in problems you didn’t even know you had

Division by zero: int vs. float

Dividing an int by zero will throw an exception, but dividing a float won't, at least in Java. Why does a float have additional NaN info, while an int type doesn't?
The representation of a float has been designed such that some special combinations of bits are reserved to store special values such as NaN, infinity, etc.
There are no unused representations for an int type - every bit pattern corresponds to an integer. This has many advantages:
The range of an integer type is as large as possible - no bit patterns are wasted.
The representation of an integer is easy to understand because there are no special cases.
Integer arithmetic can be done at extremely high speed even on very simple processors.
A clear explanation of floating-point arithmetic is given here:
http://www.artima.com/underthehood/floatingP.html
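A quick C# demonstration of the difference (the question mentions Java, but the behaviour is the same in C#):

    using System;

    class DivideByZeroDemo
    {
        static void Main()
        {
            double x = 1.0, zero = 0.0;
            Console.WriteLine(x / zero);    // positive infinity, a reserved bit pattern
            Console.WriteLine(zero / zero); // NaN, another reserved pattern

            int i = 1, zeroInt = 0;
            try
            {
                Console.WriteLine(i / zeroInt); // no spare bit pattern to return...
            }
            catch (DivideByZeroException)
            {
                Console.WriteLine("threw DivideByZeroException"); // ...so an exception is thrown
            }
        }
    }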
I think the real reason, the root of this, is the well-known fact that computers store everything in zeroes and ones.
What does that have to do with integers, floats, and zero division? It's pretty simple. If you have only zeroes and ones, it is pretty easy to combine them into integer numbers, just as you do with decimal digits. So "10" becomes two, "11" becomes three, and so on. This kind of integer representation is so natural that no one would think of inventing anything else for integers; it would just make CPUs more complicated and things more confusing. The only "invention" that was required is figuring out how to store negative numbers, but that's also very natural and simple if you start from the requirement that x + (-x) should always equal zero, without using any special kind of addition. That's why 11111111 is -1 for 8-bit integers: if you add 1 to it, it becomes 100000000, the ninth bit is truncated due to overflow, and you get your zero.
But this natural format has no place for infinities and NaNs, and nobody wanted to invent a non-natural representation just for that. Well, I won't be surprised if someone actually did, but there is no way such a format would become well known and widely used.
Now, for floating-point numbers, there is no natural representation. Even if we translate 0.5 to binary, it would still be something like 0.1, only now we have a "binary point" instead of a decimal point. But CPUs can't naturally represent a "point", only 1 and 0. So some kind of special format was needed; there was simply no other way to go. And then someone probably suggested, "Hey guys, while we are at it, why not include a special representation for infinity and other numeric nonsense?", and so it was done.
This is why these formats are so different. How to handle division by zero is up to the language designers, but for floating point they have a choice between Inf/NaN and exceptions, while for integers they don't naturally have anything like that.
Basically, it's a purely arbitrary decision.
The traditional int tries to use all the bits for representing possible numbers, whereas IEEE 754 standard reserves a special value for NaN.
The standard could be changed for ints to include special values, at the cost of less efficient operations. Developers usually expect int operations to be very efficient, whereas operations on floating-point numbers are (purely psychologically) more readily allowed to be slower.
Ints and floats are represented differently inside the machine. Integers usually use a signed, two's complement representation that is (essentially) the number written out in base two. Floats, on the other hand, use a more complex representation that can hold much larger and much smaller values. However, the machine reserves several special bit patterns for floats to mean things other than numbers. There's values for NaN, and for positive or negative infinity, for example. This means that if you divide a float by zero, there is a series of bits that the computer can use to encode that you divided by zero. For ints, all bit patterns are used to encode numbers, so there's no meaningful series of bits the computer could use to represent the error.
This isn't an essential property of ints, though. One could, in theory, make an integer representation that handles division by zero by returning some NaN variant. It's just not what's done in practice.
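As a toy illustration of that "in theory" point (my sketch, not a real library type), one could reserve an extra state for "not a number", at the cost of a larger, slower representation:

    // An integer wrapper that spends extra space on a "not a number" flag
    // and an extra check on every operation.
    struct NanInt
    {
        private readonly int value;
        private readonly bool isNaN;

        private NanInt(int value, bool isNaN) { this.value = value; this.isNaN = isNaN; }

        public static readonly NanInt NaN = new NanInt(0, true);
        public static implicit operator NanInt(int v) { return new NanInt(v, false); }

        public static NanInt operator /(NanInt a, NanInt b)
        {
            if (a.isNaN || b.isNaN || b.value == 0) return NaN;
            return new NanInt(a.value / b.value, false);
        }

        public override string ToString() { return isNaN ? "NaN" : value.ToString(); }
    }

With this, ((NanInt)6 / 0).ToString() yields "NaN" instead of throwing, but every value is bigger than a plain int and every operation pays for the extra branch, which is exactly the efficiency cost the answers above describe.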
Java reflects the way most CPUs are implemented. Integer division by zero causes an interrupt on x86/x64, and floating-point division by zero results in Infinity, negative Infinity, or NaN. Note: with floating point you can also divide by negative zero. :P

Why is System.Math and for example MathNet.Numerics based on double?

All the methods in System.Math take double parameters and return double values. The constants are also of type double. I checked out MathNet.Numerics, and the same seems to be the case there.
Why is this? Especially for constants. Isn't decimal supposed to be more exact? Wouldn't that often be kind of useful when doing calculations?
This is a classic speed-versus-accuracy trade-off.
However, keep in mind that for pi, for example, the most digits you will ever need is 41.
The largest number of digits of pi that you will ever need is 41. To compute the circumference of the universe with an error less than the diameter of a proton, you need 41 digits of pi. It seems safe to conclude that 41 digits is sufficient accuracy in pi for any circle measurement problem you're likely to encounter. Thus, in the over one trillion digits of pi computed in 2002, all digits beyond the 41st have no practical value.
In addition, decimal and double have slightly different internal storage structures. Decimals are designed to store base-10 data, whereas doubles (and floats) are made to hold binary data. On a binary machine (like every computer in existence) a double will have fewer wasted bits when storing any number within its range.
Also consider:
System.Double: 8 bytes, approximately ±5.0e-324 to ±1.7e+308, with 15 or 16 significant figures
System.Decimal: 16 bytes, approximately ±1.0e-28 to ±7.9e+28, with 28 or 29 significant figures
As you can see, decimal has a smaller range, but a higher precision.
No: decimals are no more "exact" than doubles or, for that matter, any other type. The concept of "exactness" (when speaking about numerical representations in a computer) is what is wrong. Any type is absolutely 100% exact at representing some numbers. Unsigned bytes are 100% exact at representing the whole numbers from 0 to 255, but they're no good for fractions, for negatives, or for integers outside that range.
Decimals are 100% exact at representing a certain set of base-10 values. Doubles (since they store their value using binary IEEE exponential representation) are exact at representing a set of binary numbers.
Neither is any more exact than the other in general; they are simply suited to different purposes.
To elaborate a bit further, since I seem not to have been clear enough for some readers...
If you take every number which is representable as a decimal, and mark every one of them on a number line, between every adjacent pair of them there is an additional infinity of real numbers which are not representable as a decimal. The exact same statement can be made about the numbers which can be represented as a double. If you marked every decimal on the number line in blue, and every double in red, except for the integers, there would be very few places where the same value was marked in both colors.
In general, for 99.99999% of the marks (please don't nitpick my percentage), the blue set (the decimals) is a completely different set of numbers from the red set (the doubles).
This is because the very definition of the blue set is that it is a base-10 mantissa/exponent representation, while a double is a base-2 mantissa/exponent representation. Any value represented as a base-2 mantissa and exponent (e.g. 1.00110101001 x 2^(-11101001101001)) means: take the mantissa value (1.00110101001) and multiply it by 2 raised to the power of the exponent (when the exponent is negative, this is equivalent to dividing by 2 to the power of the absolute value of the exponent). This means that where the exponent is negative (or where any portion of the mantissa is a fractional binary), the number cannot be represented as a decimal mantissa and exponent, and vice versa.
Any arbitrary real number that falls randomly on the real number line will be closer either to one of the blue decimals or to one of the red doubles.
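A quick sketch of that point about the two sets:

    using System;

    class ExactnessDemo
    {
        static void Main()
        {
            // 0.1 has a finite base-10 expansion but an infinite base-2 one,
            // so it is a "blue" (decimal) value but not a "red" (double) one:
            Console.WriteLine(0.1m + 0.2m == 0.3m); // True
            Console.WriteLine(0.1 + 0.2 == 0.3);    // False

            // 0.75 is 3 * 2^-2, exactly representable in both sets:
            Console.WriteLine(0.5m + 0.25m == 0.75m); // True
            Console.WriteLine(0.5 + 0.25 == 0.75);    // True
        }
    }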
Decimal is more precise but has less of a range. You would generally use Double for physics and mathematical calculations but you would use Decimal for financial and monetary calculations.
See the following articles on msdn for details.
Double
http://msdn.microsoft.com/en-us/library/678hzkk9.aspx
Decimal
http://msdn.microsoft.com/en-us/library/364x0z75.aspx
It seems like most of the responses here to "it does not do what I want" boil down to "but it's faster". Well, so is ANSI C with the GMP library, but nobody is advocating that, right?
If you particularly want to control accuracy, then there are other languages which have taken the time to implement exact precision, in a user controllable way:
http://www.doughellmann.com/PyMOTW/decimal/
If precision is really important to you, then you are probably better off using languages that mathematicians would use. If you do not like Fortran then Python is a modern alternative.
Whatever language you are working in, remember the golden rule:
Avoid mixing types...
So do convert a and b to the same type before you attempt a operator b.
If I were to hazard a guess, I'd say those functions leverage low-level math functionality (perhaps in C) that does not use decimals internally, and so returning a decimal would require a cast from double to decimal anyway. Besides, the purpose of the decimal value type is to ensure accuracy; these functions do not and cannot return 100% accurate results without infinite precision (e.g., irrational numbers).
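A small sketch of what calling the double-based Math functions from decimal code looks like; the round trip through double means the result is only double-accurate regardless:

    using System;

    class SqrtCastDemo
    {
        static void Main()
        {
            decimal amount = 2m;

            // System.Math.Sqrt only accepts and returns double, so decimal callers
            // must cast both ways; the extra decimal digits buy nothing here.
            decimal root = (decimal)Math.Sqrt((double)amount);
            Console.WriteLine(root);
        }
    }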
Neither decimal nor float nor double is good enough if you require something to be truly precise. Furthermore, decimal is so expensive and overused out there that it is becoming a regular joke.
If you work in fractions and require ultimate precision, use fractions. It's the same old rule: convert once and only when necessary. Your rounding rules will also vary per app, domain, and so on, though sure, you can find an odd example or two where decimal is suitable. But again, if you want fractions and ultimate precision, the answer is not to use anything but fractions. You might also want a feature of arbitrary precision.
The actual problem with the CLR in general is that it is awkward, and plain broken, to implement a library that deals with numerics in a generic fashion, largely due to bad primitive design and a shortcoming of the most popular compiler for the platform. It's almost the same as the Java fiasco.
double just turns out to be the best compromise covering most domains, and it works well, despite the fact that the MS JIT is still incapable of utilising CPU technology that is about 15 years old now.
Double is a built-in type. It is supported by the FPU/SSE core (formerly known as the "math coprocessor"), which is why it is blazingly fast, especially at multiplication and scientific functions.
Decimal is actually a complex structure, consisting of several integers.
