Most efficient way to store a 40-card deck - C#

I'm building a simulator for a 40-card deck game. The deck is divided into 4 suits, each with 10 cards. Since only 1 suit is different from the others (let's say hearts), I've thought of a fairly convenient way to store a set of up to 4 cards of the same value in 3 bits: the first two bits count how many of the non-heart cards of that value are left, and the last bit marks whether the heart of that value is still in the deck.
So, for example:
{7h 7c 7s} = 101
That allows me to store the whole deck in 30 bits of memory instead of 40. Now, back when I was programming in C, I'd have allocated 4 chars (1 byte each = 32 bits) and played with the values using bit operations.
In C# I can't do that, since chars are 2 bytes each and playing with bits is much more of a pain. So the question is: what's the smallest amount of memory I'll have to use to store the data required?
PS: Keep in mind that I may have to allocate 100k+ of those decks in memory, so saving 10 bits per deck is quite a lot.

in C, I'd have allocated 3 chars ( 1 byte each = 32 bits)
3 bytes gives you 24 bits, not 32... you need 4 bytes to get 32 bits. (Okay, some platforms have non-8-bit bytes, but they're pretty rare these days.)
In C# I can't do that, since chars are 2 bytes each
Yes, so you use byte instead of char. You shouldn't be using char for non-textual information.
and playing with bits is much more of a pain
In what way?
But if you need to store 30 bits, just use an int or a uint. Or, better, create your own custom value type which backs the data with an int, but exposes appropriate properties and constructors to make it better to work with.
PS: Keep in mind that i may have to allocate 100k+ of those decks in system's memory, so saving 10 bits is quite a lot
Is it a significant amount though? If it turned out you needed to store 8 bytes per deck instead of 4, that's 800 KB instead of 400 KB for 100,000 of them, well under a megabyte. Even at 100 million decks it would be 800 MB instead of 400 MB, still less than a gig of memory. That's not that much...
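As a rough sketch of that custom value type idea (purely illustrative; the names and exact bit positions are my own, just following the 3-bits-per-value scheme from the question):

using System;

// 10 values x 3 bits = 30 bits, backed by a single uint.
// Per value: low 2 bits = count of non-heart cards left, third bit = heart still present.
struct Deck
{
    private uint _bits;

    // A full deck: every value starts with 3 non-heart cards (binary 11) and the heart bit set.
    public static Deck Full()
    {
        var d = new Deck();
        d._bits = 0x3FFFFFFF; // 30 bits set
        return d;
    }

    public int NonHeartsLeft(int value) => (int)((_bits >> (value * 3)) & 0b11);

    public bool HeartLeft(int value) => ((_bits >> (value * 3 + 2)) & 1) != 0;

    public void RemoveHeart(int value) => _bits &= ~(1u << (value * 3 + 2));

    public void RemoveNonHeart(int value)
    {
        int count = NonHeartsLeft(value);
        if (count == 0) throw new InvalidOperationException("No non-heart cards of this value left.");
        _bits = (_bits & ~(0b11u << (value * 3))) | ((uint)(count - 1) << (value * 3));
    }
}

Packed like this, an array of 100,000 Deck values is about 400 KB.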

In C#, unlike in C/C++, the concept of a byte is not overloaded with the concept of a character.
Check out the byte datatype, in particular byte[], which many APIs in the .NET Framework have special support for.

C# (and modern versions of C) has a type that's exactly 8 bits: byte (or uint8_t in C), so you should use that. A C char is usually 8 bits, but that's not guaranteed, so you shouldn't rely on it.
In C#, you should use char and string only when dealing with actual characters and strings of characters; don't treat them as numbers.
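One small pain point worth knowing about (an illustrative aside, not from the answers above): bitwise operators on byte operands are defined on int in C#, so the result has to be cast back to byte when you store it:

byte flags = 0b0010_1101;

// Set bit 4: the expression `flags | (1 << 4)` has type int,
// so a cast back to byte is required.
flags = (byte)(flags | (1 << 4));

// Clear bit 0.
flags = (byte)(flags & ~1);

// Test bit 2.
bool bit2 = (flags & (1 << 2)) != 0;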

Related

Why does C# System.Decimal (decimal) "waste" bits?

As described in the official docs, the 128 bits of System.Decimal are filled like this:
The return value is a four-element array of 32-bit signed integers.
The first, second, and third elements of the returned array contain
the low, middle, and high 32 bits of the 96-bit integer number.
The fourth element of the returned array contains the scale factor and
sign. It consists of the following parts:
Bits 0 to 15, the lower word, are unused and must be zero.
Bits 16 to 23 must contain an exponent between 0 and 28, which
indicates the power of 10 to divide the integer number.
Bits 24 to 30 are unused and must be zero.
Bit 31 contains the sign: 0 means positive, and 1 means negative.
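You can inspect that layout directly with decimal.GetBits; a quick sketch:

using System;

class DecimalBitsDemo
{
    static void Main()
    {
        // 1.5m is stored as the 96-bit integer 15 with a scale factor of 1 (divide by 10^1).
        int[] parts = decimal.GetBits(1.5m);

        int low   = parts[0];                        // low 32 bits of the 96-bit integer  -> 15
        int mid   = parts[1];                        // middle 32 bits                     -> 0
        int high  = parts[2];                        // high 32 bits                       -> 0
        int flags = parts[3];                        // scale factor and sign

        int scale = (flags >> 16) & 0xFF;            // bits 16-23: power of 10 to divide by -> 1
        bool negative = (flags & int.MinValue) != 0; // bit 31: sign -> false

        Console.WriteLine($"{low} {mid} {high} scale={scale} negative={negative}");
    }
}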
With that in mind one can see that some bits are "wasted" or unused.
Why not, for example, 120 bits of integer, 7 bits of exponent, and 1 bit of sign?
Probably there is a good reason for a decimal being the way it is. This question would like to know the reasoning behind that decision.
Based on Kevin Gosse's comment
For what it's worth, the decimal type seems to predate .net. The .net
framework CLR delegates the computations to the oleaut32 lib, and I
could find traces of the DECIMAL type as far back as Windows 95
I searched further and found a likely user of the DECIMAL code in the oleaut32 of Windows 95.
The old Visual Basic (non .NET based) and VBA have a sort-of-dynamic type called 'Variant'. In there (and only in there) you could save something nearly identical to our current System.Decimal.
Variant is always 128 bits, with the first 16 bits reserved for an enum value indicating which data type is inside the Variant.
The separation of the remaining 112 bits could be based on common CPU architectures in the early '90s, or on ease of use for the Windows programmer. It sounds sensible not to pack the exponent and sign into one byte just to have one more byte available for the integer.
When .NET was built, the existing (low-level) code for this type and its operations was reused for System.Decimal.
None of this is 100% verified, and I would have liked the answer to contain more historical evidence, but that's what I could puzzle together.
Here is the C# source of Decimal. Note the FCallAddSub-style methods. These call out to (unavailable) fast C++ implementations of those methods.
I suspect the implementation is like this because it means that operations on the 'numbers' in the first 96 bits can be simple and fast, as CPUs operate on 32-bit words. If 120 bits were used, CPU operations would be slower and trickier and require a lot of bitmasks to get the interesting extra 24 bits, which would then be difficult to work with. Additionally, this would then 'pollute' the highest 32-bit flags, and make certain optimizations impossible.
If you look at the code, you can see that this simple bit layout is useful everywhere. It is no doubt especially useful in the underlying C++ (and probably assembler).

Is it bad practice to work with datatypes that consist of fewer bits than the CPU's register size (expressiveness of the code put aside)?

So I'm working with serialization a lot, where I compress loads of longs and ints down to their percentage of the max value and store them as a byte on disk.
Now I was wondering whether I could reduce the RAM impact by using more bytes and shorts in my running code. To my knowledge this would have a negative impact on my ops/s, as the CPU widens all values smaller than its register size to match it.
Is the same true for RAM? I.e., are slot sizes fixed, so a byte would take up just as much space as an int? Or is there something like labeled slots that signify the datatype, so that you could fill a slot with 2 bytes, one in the first 8 bits and another in the following 8, thus storing 2 values in one slot?
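For the struct-packing part of this, a rough way to see the difference (this is only a sketch; Marshal.SizeOf reports the unmanaged marshalled size, which approximates but does not guarantee the CLR's managed layout):

using System;
using System.Runtime.InteropServices;

// Two hypothetical layouts to compare.
struct FourBytes { public byte A, B, C, D; }
struct FourInts  { public int  A, B, C, D; }

class LayoutCheck
{
    static void Main()
    {
        // Narrower fields pack tighter inside a struct or array,
        // even though single locals may be widened to register size.
        Console.WriteLine(Marshal.SizeOf(typeof(FourBytes))); // 4
        Console.WriteLine(Marshal.SizeOf(typeof(FourInts)));  // 16
    }
}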

from C++ to C# // Structure Memory Optimization

@Zbyl I have seen your answer in this thread (Bit fields in C#)
and I really like the BitVector32 method, but for the purpose of optimization: what if I have plenty of structures of a size of 8 bits / 12 bits (less than 32 bits)? Is there any way to do it with something smaller than BitVector32? Otherwise that would be a lot of allocated memory that is never used: I will only use the first 8 bits of a BitVector32.
Here is an example of a structure that I want to recreate in C#:
struct iec_qualif
{
    unsigned char var : 2;
    unsigned char res : 2;
    unsigned char bl : 1;  // blocked/not blocked
    unsigned char sb : 1;  // substituted/not substituted
    unsigned char nt : 1;  // not topical/topical
    unsigned char iv : 1;  // valid/invalid
};
At first, I would consider the number of structs you are actually dealing with. Today's GB-memory-equipped hardware should easily be capable of coping with several thousand of your structs, even if you waste three bytes for each (1,000,000 of your structs would then occupy 4 MB out of potentially more than 1000 MB available). With this in mind, you could even live without the bit field entirely and just use a normal byte for each field (your setters should then check the range, however), resulting in 6 bytes instead of 4 (and possibly 2 more for alignment), but giving you faster access to the values (getters), since no bit fiddling is necessary.
On the other hand: bit fields are, in the end, nothing more than a convenient way to let the compiler write the code that you would otherwise have to write yourself. But with a little practice, it is not too difficult a task to handle the bits yourself; see this answer in the topic you referred to (the 'handmade accessors' part): store all the data internally in a variable of type byte and access it via bit shifting and masking.
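A minimal sketch of such handmade accessors for the structure above (the type name and bit positions here are just an illustration, not a fixed layout):

struct IecQualif
{
    private byte _data;  // all six fields packed into a single byte

    // Helpers: read/write a bit field of the given width (mask) at the given offset (shift).
    private int Get(int shift, int mask) => (_data >> shift) & mask;
    private void Set(int shift, int mask, int value) =>
        _data = (byte)((_data & ~(mask << shift)) | ((value & mask) << shift));

    public int Var { get => Get(0, 0b11); set => Set(0, 0b11, value); }               // bits 0-1
    public int Res { get => Get(2, 0b11); set => Set(2, 0b11, value); }               // bits 2-3
    public bool Bl { get => Get(4, 1) != 0; set => Set(4, 1, value ? 1 : 0); }        // blocked/not blocked
    public bool Sb { get => Get(5, 1) != 0; set => Set(5, 1, value ? 1 : 0); }        // substituted/not substituted
    public bool Nt { get => Get(6, 1) != 0; set => Set(6, 1, value ? 1 : 0); }        // not topical/topical
    public bool Iv { get => Get(7, 1) != 0; set => Set(7, 1, value ? 1 : 0); }        // valid/invalid
}

This packs all six fields into one byte, at the cost of a shift and a mask on every access.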

Wasted bytes in protocol buffer arrays?

I have a protocol buffer setup like this:
[ProtoContract]
class Foo
{
    [ProtoMember(1)]
    public Bar[] Bars;
}
A single Bar gets encoded to a 67 byte protocol buffer. This sounds about right because I know that a Bar is pretty much just a 64 byte array, and then there are 3 bytes overhead for length prefixing.
However, when I encode a Foo with an array of 20 Bars it takes 1362 bytes. 20 * 67 is 1340, so there are 22 bytes of overhead just for encoding an array!
Why does this take up so much space? And is there anything I can do to reduce it?
This overhead is quite simply the information needed to know where each of the 20 objects starts and ends. There is nothing I can do differently here without breaking the format (i.e. doing something contrary to the spec).
If you really want the gory details:
An array or list is (if we exclude "packed", which doesn't apply here) simply a repeated block of sub-messages. There are two layouts available for sub-messages; strings and groups. With a string, the layout is:
[header][length][data]
where header is the varint-encoded mash of the wire-type and field-number (hex 0A in this case, with field 1), length is the varint-encoded size of data, and data is the sub-object itself. For small objects (data less than 128 bytes) this often means 2 bytes of overhead per object, depending on a: the field number (fields above 15 take more space), and b: the size of the data.
With a group, the layout is:
[header][data][footer]
where header is the varint-encoded mash of the wire-type and field-number (hex 0B in this case with field 1), data is the sub-object, and footer is another varint mash to indicate the end of the object (hex 0C in this case with field 1).
Groups are less favored generally, but they have the advantage that they don't incur any overhead as data grows in size. For small field-numbers (less than 16) again the overhead is 2 bytes per object. Of course, you pay double for large field-numbers, instead.
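For reference, those header values fall straight out of the tag formula (field_number << 3) | wire_type; a small sketch, with the wire-type constants taken from the encoding documentation linked below:

static class ProtoTags
{
    // Wire types as defined in the protobuf encoding documentation.
    public const int LengthDelimited = 2; // the "string" sub-message layout
    public const int StartGroup      = 3;
    public const int EndGroup        = 4;

    // The header ("tag") for a field is (field_number << 3) | wire_type, varint-encoded.
    public static int Tag(int fieldNumber, int wireType) => (fieldNumber << 3) | wireType;
}

// ProtoTags.Tag(1, ProtoTags.LengthDelimited) == 0x0A  -> [header][length][data] for field 1
// ProtoTags.Tag(1, ProtoTags.StartGroup)      == 0x0B  -> group header for field 1
// ProtoTags.Tag(1, ProtoTags.EndGroup)        == 0x0C  -> group footer for field 1
// ProtoTags.Tag(16, ProtoTags.LengthDelimited) == 130, which takes two bytes as a varint,
// which is why fields above 15 cost an extra byte of overhead per occurrence.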
By default, arrays aren't actually passed as arrays, but as repeated members, which have a little more overhead.
So I'd guess you actually have 1 byte of overhead for each repeated array element, plus 2 extra bytes overhead on top.
You can lose the overhead by using a "packed" array. protobuf-net supports this: http://code.google.com/p/protobuf-net/
The documentation for the binary format is here: http://code.google.com/apis/protocolbuffers/docs/encoding.html
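If you want to reproduce measurements like the 67 and 1362 bytes yourself, one rough way (assuming protobuf-net and the attributed Foo/Bar types above) is to serialize into a MemoryStream and check its length:

using System.IO;
using ProtoBuf;

static class SizeProbe
{
    public static long MeasureSerializedSize<T>(T value)
    {
        using (var ms = new MemoryStream())
        {
            Serializer.Serialize(ms, value); // protobuf-net's standard serialization entry point
            return ms.Length;                // encoded size in bytes
        }
    }
}

// e.g. SizeProbe.MeasureSerializedSize(new Foo { Bars = bars }) for some hypothetical `bars` array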

Read/Write compressed binary data

I read all over the place people talking about compressing objects on a bit-by-bit scale. Things like "the first three bits represent such and such, then the next two represent this, and twelve bits for that".
I understand why it would be desirable to minimize memory usage, but I cannot think of a good way to implement this. I know I would pack it into one or more integers (or longs, whatever), but I cannot envision an easy way to work with it. It would be pretty cool if there were a class where I could get/set arbitrary bits from an arbitrary-length binary field, and it would take care of things for me, so I wouldn't have to go mucking about with &'s and |'s and masks and such.
Is there a standard pattern for this kind of thing?
From MSDN:
BitArray Class
Manages a compact array of bit values, which are represented as Booleans, where true indicates that the bit is on (1) and false indicates the bit is off (0).
Example:
BitArray myBitArray = new BitArray(5);
myBitArray[3] = true; // set bit at offset 3 to 1
BitArray allows you to set only individual bits, though. If you want to encode values with more bits, there's probably no way around mucking about with &'s and |'s and masks and stuff :-)
You might want to check out the BitVector32 structure in the .NET Framework. It lets you define "sections" which are ranges of bits within an int, then read and write values to those sections.
The main limitation is that it's limited to a single 32-bit integer; this may or may not be a problem depending on what you're trying to do. As dtb mentioned, BitArray can handle bit fields of any size, but you can only get and set a single bit at a time--there is no support for sections as in BitVector32.
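As a quick sketch of how those sections work (the widths below just mirror the "three bits, two bits, twelve bits" example from the question):

using System;
using System.Collections.Specialized;

class BitVectorDemo
{
    static void Main()
    {
        // Define three sections, packed back to back inside one 32-bit value.
        BitVector32.Section first  = BitVector32.CreateSection(7);            // values 0-7    (3 bits)
        BitVector32.Section second = BitVector32.CreateSection(3, first);     // values 0-3    (2 bits)
        BitVector32.Section third  = BitVector32.CreateSection(4095, second); // values 0-4095 (12 bits)

        BitVector32 bits = new BitVector32(0);
        bits[first]  = 5;
        bits[second] = 2;
        bits[third]  = 1000;

        Console.WriteLine(bits[third]); // 1000 -- sections read back the values you stored
    }
}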
What you're looking for are called bitwise operations.
For example, let's say we're going to represent an RGB value in the least significant 24 bits of an integer, with R being bits 23-16, G being bits 15-8, and B being bits 7-0.
You can set R to any value between 0 and 255 without affecting the other bits, like this:
void setR(ref int RGBValue, int newR)
{
    int newRValue = newR << 16;          // shift it left 16 bits so that the 8 low bits are now in positions 23-16
    RGBValue = RGBValue & ~0x00FF0000;   // AND it with the inverted R mask so that only bits 23-16 are cleared (G and B stay intact)
    RGBValue = RGBValue | newRValue;     // now OR it with the shifted newR value so that the new value is set
}
By using bitwise ANDs and ORs (and occasionally more exotic operations) you can easily set and clear any individual bit of a larger value.
Rather than using toolkit- or platform-specific wrapper classes, I think you are better off biting the bullet and learning your &s and |s and 0x04s and how all the bitwise operators work. By and large that's how it's done in most projects, and the operations are extremely fast. The operations are pretty much identical in most languages, so you won't be stuck depending on some specific toolkit.
