C# - Bits and Bytes

I'm trying to store some information in two bytes (byte[2]).
In the first four bits of the first byte I want to store a "type-info" encoded as a value from 0-9, and in the last four bits plus the second byte I want to store a size-info, so the maximum size-info is 4095 (0xFFF).
Let's look at some examples to explain what I mean.
When the type-info is 5 and the size is 963, the result should look like 53-C3 as a hex string.
53-C3 => the 5 is the type-info and the 3C3 is the 963.
30-00 => type-info 3 and size 0.
30-01 => type-info 3 and size 1.
But I have no idea how to do this in C# and need some community help:
byte type = 5; // hex 5
short size = 963; // hex 3C3
byte[] bytes = ???
string result = BitConverter.ToString(bytes);
// here result should be 53-C3

It should look like this:
bytes = new byte[2];
bytes[0] = (byte)(type << 4 | size >> 8);
bytes[1] = (byte)(size & 0xff);
Note: initially my numbers were wrong, I had written type << 8 | size >> 16 while it should have been type << 4 | size >> 8 as Aleksey showed in his answer.
Comments moved into the answer for posterity:
By shifting your type bits to the left by 4 before storing them in bytes[0] you ensure that they occupy the top 4 bits of bytes[0]. By shifting your size bits to the right by 8 you ensure that the low 8 bits of size are dropped out, and only the top 4 bits remain, and these top 4 bits are going to be stored into the low 4 bits of bytes[0]. It helps to draw a diagram:
  bytes[0]                    bytes[1]
 +------------------------+  +------------------------+
 | 7  6  5  4  3  2  1  0 |  | 7  6  5  4  3  2  1  0 |
 +------------------------+  +------------------------+

  type << 4
 +-----------+
 | 3  2  1  0|                                            <-- type
 +-----------+
             +------------+  +------------------------+
             |11 10  9  8 |  | 7  6  5  4  3  2  1  0 |   <-- size
             +------------+  +------------------------+
               size >> 8        size & 0xff
size is a 12-bit quantity. The bits are in positions 11 through 0. By shifting it right by 8 you are dropping the rightmost 8 bits and you are left with the top 4 bits only, at positions 3-0. These 4 bits are then stored in the low 4 bits of bytes[0].

Try this:
byte[] bytes = new byte[2];
bytes[0] = (byte) (type << 4 | size >> 8);
bytes[1] = (byte) (size & 0xff);
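For completeness, here is a minimal round-trip sketch of the same layout (variable names beyond the question's own are mine), packing the two values and then unpacking them again:
byte type = 5;     // 0x5
short size = 963;  // 0x3C3

// Pack: top 4 bits of bytes[0] = type, remaining 12 bits = size
byte[] bytes = new byte[2];
bytes[0] = (byte)(type << 4 | size >> 8);
bytes[1] = (byte)(size & 0xFF);

Console.WriteLine(BitConverter.ToString(bytes)); // 53-C3

// Unpack: reverse the shifts and masks
byte unpackedType = (byte)(bytes[0] >> 4);                       // 5
short unpackedSize = (short)((bytes[0] & 0x0F) << 8 | bytes[1]); // 963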

Micro-optimising the memory profile of your types is not something you usually need to bother with in .NET. If you were writing native C++ I could understand it to some degree, but I would still advise against it. It is a lot of work for limited benefit at best.
But in .NET you just make a class or struct with an enumeration (Int8) "Type" and an Int16 SizeInfo, say "good enough" and call it a day. Spend your resources on something better than shaving one byte of memory off it, when 64 bits is the native integer size of most computers nowadays.
BitArray is about the closest you can get to addressing specific bits of a byte in .NET, and its documentation has some information on similar types.
If you want to do the math the hard way, modulo is a good place to start.
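As a sketch of that "good enough" approach (the type and member names here are just placeholders, not anything from the question):
// A plain struct: clearer than bit packing, at the cost of a couple of bytes
enum PayloadType : byte { TypeA = 0, TypeB = 1 /* ... values 0-9 */ }

struct Header
{
    public PayloadType Type;  // 0-9
    public ushort SizeInfo;   // 0-4095
}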

Related

Packing and unpacking a 32-bit number?

I have a project I'm working on where the file format stores the locations of various parts of the file in offsets. So, for example, the file will hold information about 8 different layers. There will be an offset in bytes to the data for each layer.
I'm having trouble calculating what that offset is as the way it is stored is confusing to me. I do have enough documentation to do it by hand but I don't know how to do it in code.
The docs say:
A packed offset is 32bits. The unpacked offset is also a 32 bit number to be used as a byte count. An offset is packed in memory as two words, or 4 bytes.
So, for example,
byte0 = aaaaaaaa
byte1 = bbbbbbbb
byte3 = cccccccc
byte4 = ddddeeee
The hi nibble of the low byte is appended to byte 0 and byte 2 as follows:
dddd aaaaaaaa cccccccc
Four 0 are added to the lo part (enforcing 16 byte chunkiness)
dddd aaaaaaaa cccccccc 0000
For completeness we specify that the high 8 bits of a 32 bit offset are 0.
The final unpacked offset looks like this:
00000000 ddddaaaa aaaacccc cccc0000
I can follow those instructions manually and come up with the correct number, but I don't know how to do it in code. I was copying code from another person who was working with the same file type, and they used:
offset = (val1 << 12) + (val2 << 4) + (val3 <<4) + (val4 >> 4)
val1, val2, val3, and val4 are just the 4 individual bytes. This worked fine for smaller numbers, but as soon as they got over a certain value, it no longer worked.
Can anyone help in getting this to work in C#?
Judging by your description, it looks like you need the following
offset = val1 << 12 | val3 << 4 | (val4 & 0xF0) << 16;
In this case, val1 means aaaaaaaa, val3 means cccccccc and val4 means dddddddd. val2 appears to be ignored.
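Wrapped up as a small helper (a sketch; the method name is mine), with the val2 byte accepted but unused to mirror the packed layout:
// Unpacks aaaaaaaa bbbbbbbb cccccccc ddddeeee into 00000000 ddddaaaa aaaacccc cccc0000
static uint UnpackOffset(byte val1, byte val2, byte val3, byte val4)
{
    return (uint)(val1 << 12 | val3 << 4 | (val4 & 0xF0) << 16);
}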

Storage overhead and wasted space for value types in C# and the CLR

I am just a second-year computer science student who is trying to learn more. I was reading a C# book, "C# in a Nutshell", and I encountered this paragraph regarding storage overhead in C#.
Value-type instances occupy precisely the memory required to store their fields. In
this example, Point takes eight bytes of memory:
struct Point
{
int x; // 4 bytes
int y; // 4 bytes
}
Technically, the CLR positions fields within the type at an
address that’s a multiple of the fields’ size (up to a maximum
of eight bytes). Thus, the following actually consumes 16 bytes
of memory (with the seven bytes following the first field “wasted”):
struct A { byte b; long l; }
You can override this behavior with the StructLayout
attribute
My first question is: why 16 bytes? Why not 8, 32, or some other multiple of 8?
My second question is: why is that space wasted?
Computer architectures define the natural unit the processor works with as a "word". For example, on a 32-bit architecture the word is 32 bits, or 4 bytes, and it is twice that on a 64-bit architecture. The processor's operations work on words, not bytes.
So, imagine struct MyStruct { byte a; long b; }: on a 32-bit architecture this takes 3 words (12 bytes), and on a 64-bit architecture it takes 16 bytes.
// 8-bit word size (9 one-byte words, 9 bytes) - the most compact layout possible, but we don't use 8-bit processors
|1|1|1|1|1|1|1|1|1|
|a|b|b|b|b|b|b|b|b|
// 32-bit word size (3 four-byte words, 12 bytes)
|1234|1234|1234|
|a---|bbbb|bbbb|
// 64-bit word size (2 eight-byte words, 16 bytes)
|12345678|12345678|
|a-------|bbbbbbbb|
In C#, int is an alias for System.Int32, which is a 32-bit signed integer, i.e. 4 bytes.
When aligning to machine-word boundary, it is often more efficient for a processor to access memory at 64-bit boundaries, so the compiler might align members of the structure with that in mind, and actually leave empty space in the physical structure to make it faster to access them.
struct A { byte b; long l; }
The memory layout of the above struct might look something like the following:
0 1 2 3 4 5 6 7 8 9 a b c d e f
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| b | | | | | | | | l | l | l | l | l | l | l | l |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
byte b is placed at the beginning of the struct (starting at index 0). It occupies one byte, so the next field will have a starting index of 1 (or greater). long l occupies 8 bytes, so it has to start at an index that is a multiple of 8. The runtime will place the field at the next possible such index, in order not to make the struct any larger than it needs to be. That's why it is placed at starting offset 8.
Bytes 1 through 7 thus end up unoccupied by any field of struct A. They are not used by any field of A, and since .NET objects do not overlay one another in memory, nothing is going to make use of, or access, these 7 bytes. They are reserved exclusively for an instance of A but not used for anything, so their space is effectively wasted.
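As an illustration (a small sketch; exact sizes depend on the runtime and platform, and Marshal.SizeOf reports the marshalled rather than the managed size), you can compare the default layout against an explicitly packed one:
using System;
using System.Runtime.InteropServices;

struct A { public byte b; public long l; }       // default layout: padded

[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct PackedA { public byte b; public long l; } // no padding between fields

class Program
{
    static void Main()
    {
        Console.WriteLine(Marshal.SizeOf<A>());       // typically 16
        Console.WriteLine(Marshal.SizeOf<PackedA>()); // 9
    }
}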

How to generate bit-shift equations?

Preferably for this to be done in C#.
Suppose I have the integer 1024.
I want to be able to generate equations like these:
4096 >> 2 = 1024
65536 >> 6 = 1024
64 << 4 = 1024
and so on...
Any clues or tips or guides or ideas?
Edit: Ok, in simple terms, what I want is, for example...Hey, I'm giving you an integer of 1024, now give me a list of possible bit-shift equations that will always return the value of 1024.
Ok, scratch that. It seems my question wasn't very concise and clear. I'll try again.
What I want, is to generate a list of possible bit-shift equations based on a numerical value. For example, if I have a value of 1024, how would I generate a list of possible equations that would always return the value of 1024?
Sample Equations:
4096 >> 2 = 1024
65536 >> 6 = 1024
64 << 4 = 1024
In a similar way, if I asked you to give me some additional equations that would give me 5, you would respond:
3 + 2 = 5
10 - 5 = 5
4 + 1 = 5
Am I still too vague? I apologize for that.
You may reverse each equation and thus "generate" possible equations:
1024 >> 4 == 64
and therefore
64 << 4 == 1024
Thus generate all right/left shifts of 1024 without losing bits due to overflow or underflow of your variable, and then invert the corresponding equation.
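A minimal sketch of this reverse-the-equation idea (the method name is mine): shift the target left and right, stop as soon as a shift loses bits, and print the inverted equation:
using System;

class ShiftEquations
{
    static void Main()
    {
        PrintShiftEquations(1024);
        // 512 << 1 = 1024, 256 << 2 = 1024, ..., 2048 >> 1 = 1024, 4096 >> 2 = 1024, ...
    }

    static void PrintShiftEquations(uint target)
    {
        // target >> n == x implies x << n == target, as long as no low bits were dropped
        for (int n = 1; n < 32 && (target >> n) << n == target; n++)
            Console.WriteLine($"{target >> n} << {n} = {target}");

        // target << n == x implies x >> n == target, as long as no high bits overflowed
        for (int n = 1; n < 32 && (target << n) >> n == target; n++)
            Console.WriteLine($"{target << n} >> {n} = {target}");
    }
}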
Just add an extra '>' or '<':
uint value1 = 4096 >> 2;
uint value2 = 65536 >> 6;
uint value3 = 64 << 4;
http://www.blackwasp.co.uk/CSharpShiftOperators.aspx
Are you asking why these relationships exist? Shifting bits left by 1 bit is equivalent to multiplying by 2. So 512 << 1 = 512 * 2 = 1024. Shifting right by 1 is dividing by 2. Shifting by 2 is multiplying/dividing by 4, by n is 2^n. So 1 << 10 = 1 * 2^10 = 1024. To see why, write the number out in binary: let's take 7 as an example:
7 -> 0000 0111b
7 << 1 -> 0000 1110b = 14
7 << 3 -> 0011 1000b = 56
If you already knew all this, I apologize, but you might want to make the question less vague.

Ignoring Leftmost Bit in Four Bytes

First an explanation of why:
I have a list of links to a variety of MP3 files and I'm trying to read the ID3 information for these files quickly. I'm only downloading the first 1500 or so bytes and trying to analyze the data within this chunk. I came across ID3Lib, but I could only get it to work on completely downloaded files and didn't notice any support for Streams. (If I'm wrong in this, feel free to point that out.)
So basically, I'm left trying to parse the ID3 tag by myself. The size of the tag can be determined from four bytes near the start of the file. From the ID3 site:
The ID3v2 tag size is encoded with four bytes where the most
significant bit (bit 7) is set to zero in every byte, making a total
of 28 bits. The zeroed bits are ignored, so a 257 bytes long tag is
represented as $00 00 02 01.
So basically:
00000000 00000000 00000010 00000001
becomes
0000 00000000 00000001 00000001
I'm not too familiar with bit-level operations and was wondering if someone could shed some insight on an elegant solution to ignore the leftmost bit of each of these four bytes? I'm ultimately trying to pull a base-10 integer out of it, so an answer that gets me that works as well.
If you've got the four individual bytes, you'd want:
int value = ((byte1 & 0x7f) << 21) |
((byte2 & 0x7f) << 14) |
((byte3 & 0x7f) << 7) |
((byte4 & 0x7f) << 0);
If you've got it in a single int already (call it rawValue):
int value = ((rawValue & 0x7f000000) >> 3) |
            ((rawValue & 0x7f0000) >> 2) |
            ((rawValue & 0x7f00) >> 1) |
            (rawValue & 0x7f);
To clear the most significant bit, AND with 127 (0x7F); this keeps all bits apart from the MSB.
int tag1 = tag1Byte & 0x7F; // this is the first one read from the file
int tag2 = tag2Byte & 0x7F;
int tag3 = tag3Byte & 0x7F;
int tag4 = tag4Byte & 0x7F; // this is the last one
To convert this into a single number, realize that each tag value is a base-128 digit. So the least significant is multiplied by 128^0 (1), the next by 128^1 (128), the third by 128^2, and so on.
int tagLength = tag4 + (tag3 << 7) + (tag2 << 14) + (tag1 << 21);
You mention you want to convert this to base 10. You can then convert it to base 10, say for printing, using an int-to-string conversion:
string base10 = tagLength.ToString();
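Putting it together, a small sketch (the method name is mine; it assumes the standard ID3v2 header layout, where the file starts with "ID3" followed by version and flag bytes, with the syncsafe size in bytes 6-9):
using System;
using System.IO;

static int ReadId3v2TagSize(Stream stream)
{
    byte[] header = new byte[10];
    if (stream.Read(header, 0, 10) != 10 ||
        header[0] != 'I' || header[1] != 'D' || header[2] != '3')
        return 0; // no ID3v2 tag at the start of the stream

    // Bytes 6-9 hold the syncsafe size: 7 bits per byte, the MSB of each byte is always zero
    return (header[6] & 0x7F) << 21 |
           (header[7] & 0x7F) << 14 |
           (header[8] & 0x7F) << 7 |
           (header[9] & 0x7F);
}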

Why AND two numbers to get a Boolean?

I am working on a little Hardware interface project based on the Velleman k8055 board.
The example code comes in VB.Net and I'm rewriting this into C#, mostly to have a chance to step through the code and make sense of it all.
One thing has me baffled though:
At one stage they read all digital inputs and then set a checkbox based on the answer to the read digital inputs (which come back in an Integer) and then they AND this with a number:
i = ReadAllDigital
cbi(1).Checked = (i And 1)
cbi(2).Checked = (i And 2) \ 2
cbi(3).Checked = (i And 4) \ 4
cbi(4).Checked = (i And 8) \ 8
cbi(5).Checked = (i And 16) \ 16
I have not done Digital systems in a while and I understand what they are trying to do but what effect would it have to AND two numbers? Doesn't everything above 0 equate to true?
How would you translate this to C#?
This is doing a bitwise AND, not a logical AND.
Each of those basically determines whether a single bit in i is set, for instance:
5 AND 4 = 4
5 AND 2 = 0
5 AND 1 = 1
(Because 5 = binary 101, and 4, 2 and 1 are the decimal values of binary 100, 010 and 001 respectively.)
I think you'll have to translate it to this (note the parentheses: == binds tighter than & in C#):
(i & 1) == 1
(i & 2) == 2
(i & 4) == 4
etc...
This is using the bitwise AND operator.
When you use the bitwise AND operator, it compares the binary representations of the two given values and returns a value in which only those bits are set that are set in both operands.
For instance, when you do this:
2 & 2
It will do this:
0010 & 0010
And this will result in:
0010
0010
&----
0010
Then if you compare this result with 2 (0010), it will of course return true.
Just to add:
It's called bitmasking
http://en.wikipedia.org/wiki/Mask_(computing)
A boolean only requires 1 bit. In most programming language implementations, however, a boolean takes up more than a single bit. On a PC this isn't a big waste, but embedded systems usually have very limited memory, so the waste is significant. To save space, booleans are packed together; this way each boolean variable only takes up 1 bit.
You can think of it as doing something like an array indexing operation, with a byte (= 8 bits) becoming like an array of 8 boolean variables, so maybe that's your answer: use an array of booleans.
Think of this in binary e.g.
10101010
AND
00000010
yields 00000010
i.e. not zero. Now if the first value was
10101000
you'd get
00000000
i.e. zero.
Note the further division to reduce everything to 1 or 0.
(i and 16) / 16 extracts the value (1 or 0) of the 5th bit.
1xxxx and 16 = 16 / 16 = 1
0xxxx and 16 = 0 / 16 = 0
The VB And operator performs "...bitwise conjunction on two numeric expressions", which maps to '&' in C#. The '\' is integer division in VB; the equivalent in C# is '/', provided that both operands are integer types.
The constant numbers are masks (think of them in binary). So what the code does is apply the bitwise AND operator on the byte and the mask and divide by the number, in order to get the bit.
For example:
xxxxxxxx & 00000100 = 00000x00
if x == 1
    00000x00 / 00000100 = 00000001
else if x == 0
    00000x00 / 00000100 = 00000000
In C# use the BitArray class to directly index individual bits.
To set an individual bit i is straightforward:
b |= 1 << i;
To reset an individual bit i is a little more awkward:
b &= ~(1 << i);
Be aware that both the bitwise operators and the shift operators tend to promote everything to int which may unexpectedly require casting.
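For example, a small sketch of the BitArray approach (assuming i holds the value returned by ReadAllDigital):
using System.Collections;

var bits = new BitArray(new[] { i });  // wraps the 32 bits of the int
bool firstInput = bits[0];             // true if bit 0 of i is set
bool fifthInput = bits[4];             // true if bit 4 of i is set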
As said, this is a bitwise AND, not a logical AND. I do see that this has been said quite a few times before me, but IMO the explanations are not so easy to understand.
I like to think of it like this:
Write up the binary numbers under each other (here I'm doing 5 and 1):
101
001
Now we need to turn this into a binary number where a 1 is kept only in the positions where both numbers have a 1, which in this case gives:
001
In this case we see it gives the same number as the 2nd number, in which case this operation (in VB) returns true. Let's look at the other examples (using 5 as i):
(5 and 2)
101
010
----
000
(false)
(5 and 4)
101
100
---
100
(true)
(5 and 8)
0101
1000
----
0000
(false)
(5 and 16)
00101
10000
-----
00000
(false)
EDIT: and obviously I missed the entire point of the question - here's the translation to C#:
cbi[1].Checked = (i & 1) == 1;
cbi[2].Checked = (i & 2) == 2;
cbi[3].Checked = (i & 4) == 4;
cbi[4].Checked = (i & 8) == 8;
cbi[5].Checked = (i & 16) == 16;
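Pulling the whole translation together, a compact sketch (ReadAllDigital and cbi stand in for the OP's Velleman API and checkbox array) that loops over the five bits instead of writing each mask out by hand:
int i = ReadAllDigital();  // placeholder for the board's read call

for (int bit = 0; bit < 5; bit++)
{
    // Mask out bit 'bit' (mask values 1, 2, 4, 8, 16) and turn it into a bool
    cbi[bit + 1].Checked = (i & (1 << bit)) != 0;
}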
I prefer to use hexadecimal notation when bit twiddling (e.g. 0x10 instead of 16). It makes more sense as the bit depth increases, since 0x20000 is clearer than 131072.
