When I run this command in PHP, I get:
Code: 2269495617392648 >> 24
Result: 32
When I run it in C#.net or vb.net, I get:
Code: 2269495617392648 >> 24
Result: 135272480
PHP is correct.
The interesting thing is that when I try to shift any number greater than int32 in .NET, it yields bad results.
Every number under int32 (2147483647) yields the same results in PHP and in C#.net or vb.net.
Is there a workaround for this in .net?
Strictly speaking, PHP is wrong.
The entire bit pattern of the number 2269495617392648 is:
1000 0001 0000 0001 1000 0010 0000 0001 1000 0001 0000 0000 1000 (2269495617392648)
Right shifting this 24 times gets you:
0000 0000 0000 0000 0000 0000 1000 0001 0000 0001 1000 0010 0000 (135272480)
This is the bit pattern for 135272480, not 32.
What's going on in PHP, apparently, is that the number 2269495617392648 is being truncated to 538447880 by preserving only the lower 32 bits. Note that the number 2269495617392648 is much too big to fit in a 32-bit integer, signed or unsigned.
Right shifting the truncated bits 24 times gives us 32.
Before truncation to 32-bits:
1000 0001 0000 0001 1000 0010 0000 0001 1000 0001 0000 0000 1000 (2269495617392648)
After truncation to 32-bits:
0010 0000 0001 1000 0001 0000 0000 1000 (538447880)
Right shifting the truncated bits by 24 bits:
0000 0000 0000 0000 0000 0000 0010 0000 (32)
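The walkthrough above can be checked numerically. Here is a small sketch in Python (whose integers are arbitrary-precision, so the full 64-bit shift and the 32-bit truncation can be shown side by side):

```python
n = 2269495617392648

# Full-width shift, as .NET does with a 64-bit long:
full = n >> 24               # 135272480

# PHP-style truncation to the lower 32 bits, then the shift:
truncated = n & 0xFFFFFFFF   # 538447880
php_style = truncated >> 24  # 32

print(full, truncated, php_style)
```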
You have alluded to this problem when you said:
The interesting thing is that when I try to shift any number greater than int32 in .net it yields bad results.
It gives you bad results because some bits are being chopped off in order to fit in 32 bits.
If you're porting from PHP to C# and need to preserve this behavior, you need to manually truncate to 32 bits by using 2269495617392648 & 0xffffffff instead of just 2269495617392648 (see jcomeau_ictx's answer). But be aware that bit truncation is going on in your PHP code; I'm not certain whether it's intentional or not.
bitwise-AND your number with 0xffffffff before shifting.
in Python:
>>> 2269495617392648 >> 24
135272480L
>>> (2269495617392648 & 0xffffffff) >> 24
32L
I rarely use .net, but I'm fairly sure the syntax will be very similar.
Related
I have this in hex: 08, which is this in binary: 0000 1000 (bit positions: 7,6,5,4,3,2,1,0).
Now I would like to make a bitmask in C#, so I can get bit position 3.
In the example, that's the bit shown in quotes: 0000 "1"000.
How do I show only bit 3 in TextBox.Text?
Thanks,
Mano.
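A minimal sketch of the usual approach, illustrated in Python (the same shift-and-mask expressions work unchanged in C#; the TextBox assignment in the comment is a hypothetical usage, not from the question):

```python
value = 0x08  # 0000 1000 in binary

# Isolate bit 3 by shifting it down to position 0 and masking with 1:
bit3 = (value >> 3) & 1

# Alternatively, test it with a mask that has only bit 3 set:
mask = 1 << 3
is_set = (value & mask) != 0

print(bit3, is_set)  # in C# you might then do: textBox.Text = bit3.ToString();
```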
I'm writing some performance-sensitive C# code that deals with character comparisons. I recently discovered a trick where you can tell if a char is equal to one or more others without branching, if the difference between them is a power of 2.
For example, say you want to check if a char is U+0020 (space) or U+00A0 (non-breaking space). Since the difference between the two is 0x80, you can do this:
public static bool Is20OrA0(char c) => (c | 0x80) == 0xA0;
as opposed to this naive implementation, which would add an additional branch if the character was not a space:
public static bool Is20OrA0(char c) => c == 0x20 || c == 0xA0;
How the first one works: since the difference between the two chars is a power of 2, the difference has exactly one bit set. That means when you OR that bit into the character, exactly 2^1 = 2 different characters could have led to any given result.
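The claim that exactly two characters can produce the matching result is easy to verify exhaustively; a quick check in Python over the full 16-bit char range, mirroring the C# expression:

```python
# (c | 0x80) == 0xA0 forces bit 7 on, so U+0020 and U+00A0
# collapse to the same value; no other code point does.
matches = {c for c in range(0x10000) if (c | 0x80) == 0xA0}
print(sorted(matches))  # [32, 160] -> U+0020 and U+00A0
```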
Anyway, my question is, can this trick somehow be extended to characters with differences that aren't powers of 2? For example, if I had the characters # and 0 (which have a difference of 13, by the way), is there any sort of bit-twiddling hack I could use to check if a char was equal to either of them, without branching?
Thanks for your help.
edit: For reference, here is where I first stumbled across this trick in the .NET Framework source code, in char.IsLetter. They take advantage of the fact that a - A == 97 - 65 == 32, and simply OR it with 0x20 to lowercase the char (as opposed to calling ToLower).
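The IsLetter case is the same single-bit idea: since 'a' - 'A' == 32 == 0x20 has exactly one bit set, ORing with 0x20 maps each ASCII letter to its lowercase form without a branch. A quick Python illustration (the helper name is mine, not from the framework source):

```python
# OR with 0x20 sets bit 5, which is exactly the case bit for ASCII letters:
assert chr(ord('A') | 0x20) == 'a'
assert chr(ord('a') | 0x20) == 'a'   # already-lowercase letters are unchanged

# So a branch-free fold covers both cases with one range check:
def is_ascii_letter(c):
    folded = ord(c) | 0x20
    return ord('a') <= folded <= ord('z')

print(is_ascii_letter('Q'), is_ascii_letter('7'))  # True False
```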
If you can tolerate a multiply instead of a branch, you can multiply the value by a constant chosen to force the two values a power of 2 apart. This assumes the values you are testing against occupy only the lower bits of the data type you are using, so they won't overflow when multiplied by a smallish constant; if that's an issue, consider casting to a larger data type and using a correspondingly larger mask value.
For example, in the case of # and 0 (decimal values 35 and 48), the values are 13 apart. Rounding down, the nearest power of 2 to 13 is 8, which is 0.615384615 of 13. Multiplying this by 256 and rounding up, to give an 8.8 fixed point value gives 158.
Here are the binary values for 35 and 48, multiplied by 158, and their neighbours:
34 * 158 = 5372 = 0001 0100 1111 1100
35 * 158 = 5530 = 0001 0101 1001 1010
36 * 158 = 5688 = 0001 0110 0011 1000
47 * 158 = 7426 = 0001 1101 0000 0010
48 * 158 = 7584 = 0001 1101 1010 0000
49 * 158 = 7742 = 0001 1110 0011 1110
The lower 7 bits can be ignored because they aren't needed to separate any of the neighbouring values from each other. Apart from that, the values 5530 and 7584 differ only in bit 11, so you can use the same mask-and-compare technique, but with an AND instead of an OR. The mask value in binary is 1111 0111 1000 0000 (63360) and the compare value is 0001 0101 1000 0000 (5504), so you can use this code:
public static bool Is23Or30(char c) => ((c * 158) & 63360) == 5504;
I haven't profiled this, so I can't promise it's faster than a simple compare.
If you do implement something like this, be sure to write some test code that loops through every possible value that can be passed to the function, to verify that it works as expected.
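Following that advice, here is an exhaustive check of the multiply-and-mask version over the byte range, sketched in Python (the arithmetic matches the C# int math for these values). The caveat about testing every input is real: for char values far above ASCII, the multiply wraps around within the masked bits and extra values can match, so always loop over the full range you intend to support.

```python
def is_23_or_30(c):
    # Multiply by 158 to push 0x23 (35) and 0x30 (48) a power of 2 apart,
    # then mask-and-compare, using AND as described above.
    return (c * 158) & 63360 == 5504

matches = {c for c in range(256) if is_23_or_30(c)}
print(sorted(matches))  # [35, 48] -> '#' and '0'
```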
You can use the same trick to compare against a set of 2^N values, provided they have all other bits equal except N bits. E.g. if the set of values is 0x01, 0x03, 0x81, 0x83, then N=2 and you can use (c | 0x82) == 0x83. Note that the values in the set differ only in bits 1 and/or 7; all other bits are equal. There are not many cases where this kind of optimization can be applied, but when it can, and every little bit of extra speed counts, it's a good optimization.
This is the same way boolean expressions are optimized (e.g. when compiling VHDL). You may also want to look up Karnaugh maps.
That being said, it is really bad practice to do this kind of comparison on character values, especially with Unicode, unless you know what you are doing and are doing really low-level stuff (such as drivers, kernel code, etc.). Comparing characters (as opposed to bytes) has to take into account linguistic features (such as uppercase/lowercase, ligatures, accents, composited characters, etc.).
On the other hand if all you need is binary comparison (or classification) you can use lookup tables. With single byte character sets these can be reasonably small and really fast.
If not having branches is really your main concern, you can do something like this:
if ( (x-c0|c0-x) & (x-c1|c1-x) & ... & (x-cn|cn-x) & 0x80) {
    // x is not equal to any ci
}
If x is not equal to a specific c, either x-c or c-x will be negative, so x-c|c-x will have bit 7 set. This works for signed and unsigned chars alike. If you AND this together for all the c's, the result will have bit 7 set only if it's set for every c (i.e. x is not equal to any of them).
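A sketch of this in Python (Python ints behave like arbitrarily wide two's-complement values, so the sign test via bit 7 works the same way for byte-sized inputs; the function name is illustrative):

```python
def differs_from_all(x, cs):
    # For each c, (x - c) | (c - x) has bit 7 set unless x == c
    # (one of the two differences is negative whenever they differ).
    acc = 0x80
    for c in cs:
        acc &= (x - c) | (c - x)
    return bool(acc & 0x80)

# x equal to one of the constants -> False; otherwise -> True
print(differs_from_all(0x20, [0x20, 0xA0]))  # False
print(differs_from_all(0x21, [0x20, 0xA0]))  # True
```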
I want to create a byte array in C# where the first and second bytes are 70 and 75 respectively, so I did something like the following:
List<byte> retval = new List<byte>();
retval.Add(Convert.ToByte(70));
retval.Add(Convert.ToByte(75));
I thought the function would convert the numbers into bytes, and that if I put a watch on the list at runtime it would look a little different, but it did not change. I was expecting to see the values in a 0x00-style hex format, but they still look like raw integers.
Am I missing something?
Right-click your Watch window or Immediate window and check the Hexadecimal Display option. It is unchecked by default, because the Visual Studio IDE assumes you would prefer to read int values instead of hex.
A byte is an integral value and, along with all the other integral types, doesn't look like anything until you convert it into a representation. All they are are electrical charges on a silicon chip.
byte, ushort, uint and ulong are all unsigned integral types of varying lengths (1, 2, 4 and 8 octets respectively). They are stored as binary (base two) numbers, with each bit representing a power of two: the low-order (logically, the right-most) bit represents 2^0 = 1, and the high-order bit represents the highest power of two possible for that size (respectively 2^7, 2^15, 2^31 and 2^63).
sbyte, short, int and long are signed integral types of varying lengths (1, 2, 4 and 8 octets respectively). Internally, however, they are stored in what's called two's-complement notation:
http://en.wikipedia.org/wiki/Two's_complement
http://www.cs.cornell.edu/~tomf/notes/cps104/twoscomp.html
That means the high-order bit is the sign bit (0 for non-negative, 1 for negative). Non-negative values (x >= 0) are represented as above, except the largest power of two that can be represented is, respectively, 2^6, 2^14, 2^30 and 2^62, since the high-order bit is reserved for the sign.
Negative numbers are stored as the two's complement of the absolute value. That allows subtraction to be performed by addition circuits, subtraction being the addition of the negative. To get the two's complement of a number, the following process is performed:
Invert the bits
Add 1, propagating any carries in the usual way, to obtain the negative form of the integer.
So the negative form of 1 is
binary representation of +1: 0000 0000 0000 0001
invert the bits (complement): 1111 1111 1111 1110
add 1: 1111 1111 1111 1111
If you add the positive and negative forms, the result is zero:
0000 0000 0000 0001
1111 1111 1111 1111
-------------------
0000 0000 0000 0000
And the CPU's carry (integer overflow) flag will be set.
Zero remains zero, since zero really doesn't have a sign:
binary zero: 0000 0000 0000 0000
inverted (complement): 1111 1111 1111 1111
add 1: 0000 0000 0000 0000
And, if you change the sign of the smallest negative number that can be represented:
smallest negative value: 1000 0000 0000 0000
inverted (complement): 0111 1111 1111 1111
add 1: 1000 0000 0000 0000
you get the same value back, since its absolute value is 1 larger than the largest positive number that can be represented. Adding it to itself yields the expected zero, as the sign bit is carried out to the left, setting the CPU's carry flag.
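The invert-and-add-one process above is easy to sketch in code; the Python below masks to 16 bits to mimic a short (the function name is illustrative):

```python
def twos_complement(value, bits=16):
    # Invert the bits, add 1, and keep only the low `bits` bits.
    return (~value + 1) & ((1 << bits) - 1)

assert twos_complement(1) == 0xFFFF       # -1 as a 16-bit pattern
assert twos_complement(0) == 0x0000       # zero stays zero
assert twos_complement(0x8000) == 0x8000  # smallest negative negates to itself

# Adding +1 and -1 as 16-bit patterns carries out of the high bit, leaving zero:
assert (0x0001 + 0xFFFF) & 0xFFFF == 0
```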
I understand that the single ampersand operator is normally used for a 'bitwise AND' operation. However, can anyone help explain the interesting results you get when you use it for comparison between two numbers?
For example;
(6 & 2) = 2
(10 & 5) = 0
(20 & 25) = 16
(123 & 20) = 16
I'm not seeing any logical link between these results and I can only find information on comparing booleans or single bits.
Compare the binary representations of each of those.
110 & 010 = 010
1010 & 0101 = 0000
10100 & 11001 = 10000
1111011 & 0010100 = 0010000
In each case, a digit is 1 in the result only when it is 1 on both the left AND right side of the input.
You need to convert your numbers to binary representation; then you will see the link between the results. For example, 6 & 2 = 2 is actually 110 & 010 = 010, and 10 & 5 is 1010 & 0101 = 0000.
The binary and operation is performed on the integers, represented in binary. For example
110 (6)
010 (2)
--------
010 (2)
The bitwise AND does exactly that: it performs an AND operation on the bits.
So to anticipate the result you need to look at the bits, not the numbers.
AND gives you 1, only if there's 1 in both number in the same position:
6(110) & 2(010) = 2(010)
10(1010) & 5(0101) = 0(0000)
A bitwise OR will give you 1 if there's 1 in either numbers in the same position:
6(110) | 2(010) = 6(110)
10(1010) | 5(0101) = 15(1111)
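The AND/OR pairs above can be reproduced directly; a small Python check (format specs like 07b show the bit patterns):

```python
pairs = [(6, 2), (10, 5), (20, 25), (123, 20)]
for a, b in pairs:
    # AND keeps only bits set in both numbers; OR keeps bits set in either.
    print(f"{a:07b} & {b:07b} = {a & b:07b} ({a & b})")

# The four results from the question: 2, 0, 16, 16
```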
6 = 0110
2 = 0010
6 & 2 = 0010
20 = 10100
25 = 11001
20 & 25 = 10000
(10000 in binary is 16 in decimal, matching your result)
Etc...
Internally, integers are stored in binary format. I strongly suggest you read about that; knowing about the bitwise representation of numbers is very important.
That being said, the bitwise comparison compares the bits of the parameters:
Decimal: 6 & 2 = 2
Binary: 0110 & 0010 = 0010
Bitwise AND matches the bits in binary notation one by one; the result is the bits that are common between the two numbers.
To convert a number to binary you need to understand the binary system.
For example
6 = 110 binary
The 110 represents 1×4 + 1×2 + 0×1 = 6.
2 is then
0×4 + 1×2 + 0×1 = 2.
Bitwise AND only retains the positions where both numbers have the bit set; in this case that's the bit worth 2, so the result is 2.
Every extra bit is worth double the last, so a 4-bit number uses the multipliers 8, 4, 2, 1 and can therefore represent all numbers from 0 to 15 (the sum of the multipliers).
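The positional weights described above can be written out directly; a short Python sketch (the helper name is mine):

```python
def from_bits(bits):
    # Each position is worth double the one to its right: ..., 8, 4, 2, 1
    total = 0
    for bit in bits:
        total = total * 2 + bit
    return total

assert from_bits([1, 1, 0]) == 6      # 1*4 + 1*2 + 0*1
assert from_bits([0, 1, 0]) == 2
assert from_bits([1, 1, 1, 1]) == 15  # a 4-bit number tops out at 8+4+2+1
```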
I have one byte in which I need to replace the last (least significant) four bits.
Example below.
Original byte: xxxx0110
Replacement byte: 1111
What I want to get: xxxx1111
Original byte: xxxx1111
Replacement byte: 0000
What I want to get: xxxx0000
Original byte: xxxx0000
Replacement byte: 1111
What I want to get: xxxx1111
Original byte: xxxx1010
Replacement byte: 1111
What I want to get: xxxx1111
Original byte: xxxx0101
Replacement byte: 0111
What I want to get: xxxx0111
value = (byte)( (value & ~15) | newByte);
The ~15 creates a mask of everything except the last 4 bits; value & {that mask} takes the last 4 bits away, then | newByte puts the bits from the new data in their place.
This can be done with a combination of bitwise AND to clear the bits and bitwise OR to set the bits.
To clear the lowest four bits, you can AND with a value that is 1 everywhere except at those bits, where it's zero. One value like this would be ~0xF, which is the complement of 0xF, which is four ones: 0b1111.
To set the bits, you can then use bitwise OR with the bits to set. Since 0 OR x = x, this works as you'd intend it.
The net result would be
(x & ~0xF) | bits
EDIT: As per Eamon Nerbonne's comment, you should then cast back to a byte:
(byte)((x & ~0xF) | bits)
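A quick check of the mask-and-OR expression against all of the question's examples, sketched in Python (the final `& 0xFF` stands in for the C# cast back to byte):

```python
def replace_low_nibble(value, replacement):
    # Clear the low 4 bits, then OR in the replacement; mask back to one byte.
    return ((value & ~0xF) | replacement) & 0xFF

# The question's examples, with the x bits chosen as 1010:
assert replace_low_nibble(0b1010_0110, 0b1111) == 0b1010_1111
assert replace_low_nibble(0b1010_1111, 0b0000) == 0b1010_0000
assert replace_low_nibble(0b1010_1010, 0b1111) == 0b1010_1111
assert replace_low_nibble(0b1010_0101, 0b0111) == 0b1010_0111
```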
If my understanding is right, you can also do this with shifts instead of a mask: right shift the original byte 4 times (dropping the old low bits), left shift it back 4 times, then OR in the replacement byte.
For example: a = 1001 1101
Replacement byte: 0000 1011
Right shift a 4 times: 0000 1001
Left shift back 4 times: 1001 0000
OR with replacement: 1001 1011 (end result).
Maybe this link is helpful: http://www.codeproject.com/KB/cs/leftrightshift.aspx
Trim the last 4 bits and append the new ones.