Why does C# System.Decimal (decimal) "waste" bits?

As written in the official docs the 128 bits of System.Decimal are filled like this:
The return value is a four-element array of 32-bit signed integers.
The first, second, and third elements of the returned array contain
the low, middle, and high 32 bits of the 96-bit integer number.
The fourth element of the returned array contains the scale factor and
sign. It consists of the following parts:
Bits 0 to 15, the lower word, are unused and must be zero.
Bits 16 to 23 must contain an exponent between 0 and 28, which
indicates the power of 10 to divide the integer number.
Bits 24 to 30 are unused and must be zero.
Bit 31 contains the sign: 0 means positive, and 1 means negative.
With that in mind one can see that some bits are "wasted" or unused.
Why not, for example, 120 bits of integer, 7 bits of exponent, and 1 bit of sign?
Probably there is a good reason for a decimal being the way it is. This question would like to know the reasoning behind that decision.
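For reference, the layout described above can be inspected directly with decimal.GetBits. A minimal sketch (the value 1.5m is just an arbitrary example):

int[] parts = decimal.GetBits(1.5m); // 1.5 = 15 / 10^1

Console.WriteLine($"low:    0x{parts[0]:X8}"); // 0x0000000F (the 96-bit integer is 15)
Console.WriteLine($"middle: 0x{parts[1]:X8}"); // 0x00000000
Console.WriteLine($"high:   0x{parts[2]:X8}"); // 0x00000000
Console.WriteLine($"flags:  0x{parts[3]:X8}"); // 0x00010000 (scale = 1 in bits 16-23, sign bit clear)

int scale = (parts[3] >> 16) & 0xFF;                          // exponent from bits 16-23
bool negative = (parts[3] & unchecked((int)0x80000000)) != 0; // sign from bit 31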

Based on Kevin Gosse's comment
For what it's worth, the decimal type seems to predate .net. The .net
framework CLR delegates the computations to the oleaut32 lib, and I
could find traces of the DECIMAL type as far back as Windows 95
I searched further and found a likely user of the DECIMAL code in oleaut32 in Windows 95.
The old Visual Basic (non .NET based) and VBA have a sort-of-dynamic type called 'Variant'. In there (and only in there) you could save something nearly identical to our current System.Decimal.
A Variant is always 128 bits, with the first 16 bits reserved for an enum value indicating which data type is inside the Variant.
The separation of the remaining 112 bits could be based on common CPU architectures in the early '90s, or on ease of use for the Windows programmer. It sounds sensible not to pack exponent and sign into one byte just to have one more byte available for the integer.
When .NET was built, the existing (low-level) code for this type and its operations was reused for System.Decimal.
None of this is 100% verified, and I would have liked the answer to contain more historical evidence, but that's what I could puzzle together.

Here is the C# source of Decimal. Note the FCallAddSub-style methods. These call out to fast C++ implementations whose source is not available.
I suspect the implementation is like this because it means that operations on the 'numbers' in the first 96 bits can be simple and fast, since CPUs operate on 32-bit words. If 120 bits were used, CPU operations would be slower and trickier, requiring a lot of bit masks to extract the interesting extra 24 bits, which would then be difficult to work with. Additionally, this would 'pollute' the flags in the highest 32 bits and make certain optimizations impossible.
If you look at the code, you can see that this simple bit layout is useful everywhere. It is no doubt especially useful in the underlying C++ (and probably assembler).

Related

Is it more efficient to use hexadecimal instead of decimal?

Hello, I am using Visual Studio 2015 with .NET Framework 4.5 (if it matters), and ReSharper keeps suggesting that I switch from decimal numbers to hex. Why is that? Is there any performance bonus if I'm using hex?
There is absolutely no performance difference between the format of numeric literals in a source language, because the conversion is done by the compiler. The only reason to switch from one representation to another is readability of your code.
Two common cases for using hexadecimal literals are representing colors and bit masks. Since color representation is often split at byte boundaries, parsing a number 0xFF00FF is much easier than 16711935: hex format tells you that the red and blue components are maxed out, while the green component is zero. Decimal format, on the other hand, requires you to perform the conversion.
Bit masks are similar: when you use hex or octal representation, it is very easy to see what bits are ones and what bits are zero. All you need to learn is a short table of sixteen bit patterns corresponding to hex digits 0 through F. You can immediately tell that 0xFF00 has the upper eight bits set to 1, and the lower eight bits set to 0. Doing the same with 65280 is much harder for most programmers.
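As an illustration (the constant names here are made up for the example), the same values written both ways:

const int MagentaDecimal = 16711935;     // what color is this? you have to convert to find out
const int MagentaHex     = 0xFF00FF;     // red = 0xFF, green = 0x00, blue = 0xFF at a glance

const int UpperByteMaskDecimal = 65280;  // the same mask, much harder to read
const int UpperByteMaskHex     = 0xFF00; // upper eight bits set, lower eight clear

Both members of each pair compile to exactly the same IL; only the readability differs.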
There is absolutely no performance difference when writing constants in your code in decimal vs. hex. Both will be translated to the exact same IL and ultimately JITted to the same machine code.
Use whichever representation makes more sense for the work you are doing and is clearer and easier to understand in the context of the problem your code solves.

HMAC Licensing Example Does Not Make Sense

I am researching licensing solutions for a project of mine, one article has the following text:
"The expiration date is represented as days (not seconds) since 1/1/1970. This way it only takes two bytes to represent the date" - [http://www.drdobbs.com/licensing-using-symmetric-and-asymmetric/184401687?pgno=1][1] (under the heading "HMAC Licensing System" about half way down)
How can this be correct? If the day count is returned as a 32-bit integer, how can the author fit that info into 2 bytes?
You can simply truncate a 32-bit integer to 16 bits. An unsigned 16-bit integer has a maximum of 65535, which, if expressing a number of days, is over 179 years.
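A minimal sketch of the idea in C# (the dates and names here are illustrative, not taken from the article):

DateTime epoch = new DateTime(1970, 1, 1);
DateTime expiry = new DateTime(2030, 6, 1);

int days = (int)(expiry - epoch).TotalDays; // 32-bit count of days since 1/1/1970
ushort packed = (ushort)days;               // truncated to 2 bytes; safe until ~2149

DateTime roundTrip = epoch.AddDays(packed); // recovers the expiration date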

Most efficient way to store a 40 cards deck

I'm building a simulator for a 40-card deck game. The deck is divided into 4 suits, each one with 10 cards. Since only 1 suit is different from the others (let's say, hearts), I've thought of a quite convenient way to store a set of 4 cards with the same value in 3 bits: the first two indicate how many cards of a given value are left, and the last one is a marker that tells whether the heart card of that value is still in the deck.
So,
{7h 7c 7s} = 101
That allows me to store the whole deck in 30 bits of memory instead of 40. Now, when I was programming in C, I'd have allocated 4 chars (1 byte each = 32 bits) and played with the values using bit operations.
In C# I can't do that, since chars are 2 bytes each and playing with bits is much more of a pain. So the question is: what's the smallest amount of memory I'll have to use to store the data required?
PS: Keep in mind that I may have to allocate 100k+ of those decks in system memory, so saving 10 bits is quite a lot.
in C, I'd have allocated 3 chars ( 1 byte each = 32 bits)
3 bytes gives you 24 bits, not 32... you need 4 bytes to get 32 bits. (Okay, some platforms have non-8-bit bytes, but they're pretty rare these days.)
In C# I can't do that, since chars are 2 bytes each
Yes, so you use byte instead of char. You shouldn't be using char for non-textual information.
and playing with bits is much more of a pain
In what way?
But if you need to store 30 bits, just use an int or a uint. Or, better, create your own custom value type which backs the data with an int, but exposes appropriate properties and constructors to make it better to work with (see the sketch at the end of this answer).
PS: Keep in mind that i may have to allocate 100k+ of those decks in system's memory, so saving 10 bits is quite a lot
Is it a significant amount though? If it turned out you needed to store 8 bytes per deck instead of 4, that means 800KB instead of 400KB for 100,000 of them - well under a megabyte of memory. That's not that much...
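A minimal sketch of such a wrapper, assuming the 3-bits-per-rank encoding from the question (all names here are illustrative):

struct Deck
{
    private uint bits; // 10 ranks x 3 bits = 30 bits, backed by a single uint

    private static int Shift(int rank) => rank * 3; // rank 0..9

    // Two low bits of each group: how many non-heart cards of this rank remain.
    public int NonHeartCount(int rank) => (int)((bits >> Shift(rank)) & 0b11);

    // Third bit: is the heart card of this rank still in the deck?
    public bool HasHeart(int rank) => ((bits >> Shift(rank)) & 0b100) != 0;

    public void Set(int rank, int nonHeartCount, bool hasHeart)
    {
        uint value = (uint)(nonHeartCount & 0b11) | (hasHeart ? 0b100u : 0u);
        bits = (bits & ~(0b111u << Shift(rank))) | (value << Shift(rank));
    }
}

// {7h 7c 7s} = 101: two non-hearts left, heart still in the deck.
// var deck = new Deck();
// deck.Set(rank: 6, nonHeartCount: 2, hasHeart: true);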
In C#, unlike in C/C++, the concept of a byte is not overloaded with the concept of a character.
Check out the byte datatype, in particular a byte[], which many of the APIs in the .Net Framework have special support for.
C# (and modern versions of C) has a type that's exactly 8 bits: byte (or uint8_t in C), so you should use that. A C char is usually 8 bits, but that's not guaranteed, so you shouldn't rely on it.
In C#, you should use char and string only when dealing with actual characters and strings of characters, don't treat them as numbers.

Why is System.Math and for example MathNet.Numerics based on double?

All the methods in System.Math take double parameters and return double values. The constants are also of type double. I checked out MathNet.Numerics, and the same seems to be the case there.
Why is this? Especially for constants. Isn't decimal supposed to be more exact? Wouldn't that often be kind of useful when doing calculations?
This is a classic speed-versus-accuracy trade off.
However, keep in mind that for pi, for example, the most digits you will ever need is 41:
The largest number of digits of pi that you will ever need is 41. To compute the circumference of the universe with an error less than the diameter of a proton, you need 41 digits of pi. It seems safe to conclude that 41 digits is sufficient accuracy in pi for any circle measurement problem you're likely to encounter. Thus, in the over one trillion digits of pi computed in 2002, all digits beyond the 41st have no practical value.
In addition, decimal and double have slightly different internal storage structures. Decimals are designed to store base-10 data, whereas doubles (and floats) are made to hold binary data. On a binary machine (like every computer in existence) a double will have fewer wasted bits when storing any number within its range.
Also consider:
System.Double: 8 bytes, approximately ±5.0e-324 to ±1.7e308, with 15 or 16 significant figures
System.Decimal: 16 bytes, approximately ±1.0e-28 to ±7.9e28, with 28 or 29 significant figures
As you can see, decimal has a smaller range, but a higher precision.
No - decimals are no more "exact" than doubles or, for that matter, than any other type. The concept of "exactness" (when speaking about numerical representations in a computer) is what is wrong. Any type is absolutely 100% exact at representing some numbers. Unsigned bytes are 100% exact at representing the whole numbers from 0 to 255, but they're no good for fractions, for negatives, or for integers outside that range.
Decimals are 100% exact at representing a certain set of base-10 values. Doubles (since they store their value using binary IEEE exponential representation) are exact at representing a set of binary numbers.
Neither is any more exact than the other in general; they are simply for different purposes.
To elaborate a bit further, since I seem to not have been clear enough for some readers...
If you take every number which is representable as a decimal, and mark every one of them on a number line, between every adjacent pair of them there is an additional infinity of real numbers which are not representable as a decimal. The exact same statement can be made about the numbers which can be represented as a double. If you marked every decimal on the number line in blue, and every double in red, except for the integers, there would be very few places where the same value was marked in both colors.
In general, for 99.99999 % of the marks, (please don't nitpick my percentage) the blue set (decimals) is a completely different set of numbers from the red set (the doubles).
This is because, by our very definition, the blue set is a base-10 mantissa/exponent representation, and a double is a base-2 mantissa/exponent representation. A value represented as a base-2 mantissa and exponent (e.g., 1.00110101001 x 2^(-11101001101001)) means: take the mantissa value (1.00110101001) and multiply it by 2 raised to the power of the exponent (when the exponent is negative, this is equivalent to dividing by 2 to the power of the absolute value of the exponent). This means that where the exponent is negative (or where any portion of the mantissa is a fractional binary), the number cannot be represented as a decimal mantissa and exponent, and vice versa.
For any arbitrary real number, that falls randomly on the real number line, it will either be closer to one of the blue decimals, or to one of the red doubles.
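A small illustration of the two sets diverging, using 0.1 (representable exactly as a decimal, but not as a double); the output comments assume a typical .NET runtime:

double d = 0.1;   // the nearest double, slightly above 0.1
decimal m = 0.1m; // stored exactly as 1 / 10^1

double dSum = 0; decimal mSum = 0;
for (int i = 0; i < 10; i++) { dSum += d; mSum += m; }

Console.WriteLine(dSum == 1.0);  // False: the binary rounding error accumulates
Console.WriteLine(mSum == 1.0m); // True: every step was exact in base 10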
Decimal is more precise but has less of a range. You would generally use Double for physics and mathematical calculations but you would use Decimal for financial and monetary calculations.
See the following articles on msdn for details.
Double
http://msdn.microsoft.com/en-us/library/678hzkk9.aspx
Decimal
http://msdn.microsoft.com/en-us/library/364x0z75.aspx
It seems like most of the arguments here against "it does not do what I want" are "but it's faster" - well, so is ANSI C plus the GMP library, but nobody is advocating that, right?
If you particularly want to control accuracy, then there are other languages which have taken the time to implement exact precision, in a user controllable way:
http://www.doughellmann.com/PyMOTW/decimal/
If precision is really important to you, then you are probably better off using languages that mathematicians would use. If you do not like Fortran then Python is a modern alternative.
Whatever language you are working in, remember the golden rule:
Avoid mixing types...
So convert a and b to the same type before you attempt a operator b.
If I were to hazard a guess, I'd say those functions leverage low-level math functionality (perhaps in C) that does not use decimals internally, and so returning a decimal would require a cast from double to decimal anyway. Besides, the purpose of the decimal value type is to ensure accuracy; these functions do not and cannot return 100% accurate results without infinite precision (e.g., irrational numbers).
Neither decimal nor float nor double is good enough if you require something to be precise. Furthermore, decimal is so expensive and overused out there that it is becoming a regular joke.
If you work in fractions and require ultimate precision, use fractions. It's the same old rule: convert once, and only when necessary. Your rounding rules will also vary per app, domain and so on, but surely you can find an odd example or two where decimal is suitable. But again, if you want fractions and ultimate precision, the answer is not to use anything but fractions. Consider that you might want a feature of arbitrary precision as well.
The actual problem with the CLR in general is that it is so odd, and plain broken, to implement a library that deals with numerics in a generic fashion, largely due to bad primitive design and the shortcomings of the most popular compiler for the platform. It's almost the same as with the Java fiasco.
double just turns out to be the best compromise covering most domains, and it works well, despite the fact that the MS JIT is still incapable of utilising CPU technology that is about 15 years old now.
Double is a built-in type. It is supported by the FPU/SSE core (formerly known as the "math coprocessor"), which is why it is blazingly fast, especially at multiplication and scientific functions.
Decimal is actually a complex structure, consisting of several integers.

Read/Write compressed binary data

I read all over the place about people compressing objects on a bit-by-bit scale. Things like "The first three bits represent such and such, then the next two represent this and twelve bits for that"
I understand why it would be desirable to minimize memory usage, but I cannot think of a good way to implement this. I know I would pack it into one or more integers (or longs, whatever), but I cannot envision an easy way to work with it. It would be pretty cool if there were a class where I could get/set arbitrary bits from an arbitrary length binary field, and it would take care of things for me, and I wouldn't have to go mucking about with &'s and |'s and masks and such.
Is there a standard pattern for this kind of thing?
From MSDN:
BitArray Class
Manages a compact array of bit values, which are represented as Booleans, where true indicates that the bit is on (1) and false indicates the bit is off (0).
Example:
BitArray myBitArray = new BitArray(5);
myBitArray[3] = true; // set bit at offset 3 to 1
BitArray allows you to set only individual bits, though. If you want to encode values with more bits, there's probably no way around mucking about with &'s and |'s and masks and stuff :-)
You might want to check out the BitVector32 structure in the .NET Framework. It lets you define "sections" which are ranges of bits within an int, then read and write values to those sections.
The main limitation is that it's limited to a single 32-bit integer; this may or may not be a problem depending on what you're trying to do. As dtb mentioned, BitArray can handle bit fields of any size, but you can only get and set a single bit at a time--there is no support for sections as in BitVector32.
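For example (the section sizes here are chosen arbitrarily; BitVector32 lives in System.Collections.Specialized):

// Three sections packed into one 32-bit integer: 2 + 4 + 6 bits.
BitVector32.Section first  = BitVector32.CreateSection(3);          // 2 bits (max value 3)
BitVector32.Section second = BitVector32.CreateSection(15, first);  // 4 bits
BitVector32.Section third  = BitVector32.CreateSection(63, second); // 6 bits

BitVector32 packed = new BitVector32(0);
packed[first] = 2;
packed[second] = 9;
packed[third] = 40;

Console.WriteLine(packed[second]); // 9
Console.WriteLine(packed.Data);    // the raw packed integer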
What you're looking for are called bitwise operations.
For example, let's say we're going to represent an RGB value in the least significant 24 bits of an integer, with R being bits 23-16, G being bits 15-8, and B being bits 7-0.
You can set R to any value between 0 and 255 without affecting the other bits, like this:
void setR(ref int RGBValue, int newR)
{
    int newRValue = newR << 16;        // shift left 16 bits so the 8 low bits land in positions 23-16
    RGBValue = RGBValue & ~0x00FF0000; // AND with the inverted mask so the old R bits (23-16) are cleared
    RGBValue = RGBValue | newRValue;   // now OR in the new R value
}
By using bitwise ANDs and ORs (and occasionally more exotic operations) you can easily set and clear any individual bit of a larger value.
Rather than using toolkit- or platform-specific wrapper classes, I think you are better off biting the bullet and learning your &s and |s and 0x04s and how all the bitwise operators work. By and large, that's how it's done for most projects, and the operations are extremely fast. The operations are pretty much identical in most languages, so you won't be stuck dependent on some specific toolkit.
