Number of unset bits left of the most significant set bit?

Assuming the 64bit integer 0x000000000000FFFF which would be represented as
00000000 00000000 00000000 00000000
00000000 00000000 >11111111 11111111
How do I find the number of unset bits to the left of the most significant set bit (the one marked with >)?

In straight C (long long is 64 bits on my setup), adapted from similar Java implementations (updated after a little more reading on Hamming weight):
A little more explanation: the top part sets all bits to the right of the most significant 1 and then inverts the result, so all the 0s to the left of the most significant 1 become 1s and everything else becomes 0.
A Hamming weight implementation then counts those bits.
unsigned long long i = 0x0000000000000000LLU;
i |= i >> 1;
i |= i >> 2;
i |= i >> 4;
i |= i >> 8;
i |= i >> 16;
i |= i >> 32;
// Highest bit in input and all lower bits are now set. Invert to set the bits to count.
i=~i;
i -= (i >> 1) & 0x5555555555555555LLU; // each 2 bits now contains a count
i = (i & 0x3333333333333333LLU) + ((i >> 2) & 0x3333333333333333LLU); // each 4 bits now contains a count
i = (i + (i >> 4)) & 0x0f0f0f0f0f0f0f0fLLU; // each 8 bits now contains a count
i *= 0x0101010101010101LLU; // add each byte to all the bytes above it
i >>= 56; // the number of bits
printf("Leading 0's = %llu\n", i);
I'd be curious to see how this performs. I tested it with several values, though, and it seems to work.
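To make the snippet easier to try with different inputs, the same steps could be wrapped in a function. This is only a sketch: the name leading_zeros_smear is made up here, and uint64_t is assumed in place of unsigned long long.

```c
#include <stdint.h>

/* Hypothetical helper: counts the zero bits above the most significant set bit. */
static int leading_zeros_smear(uint64_t v)
{
    /* Copy the highest set bit into every lower position. */
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;
    v |= v >> 32;
    /* Invert so that only the former leading zeros are set. */
    v = ~v;
    /* Parallel population count (Hamming weight). */
    v -= (v >> 1) & UINT64_C(0x5555555555555555);
    v = (v & UINT64_C(0x3333333333333333)) + ((v >> 2) & UINT64_C(0x3333333333333333));
    v = (v + (v >> 4)) & UINT64_C(0x0f0f0f0f0f0f0f0f);
    return (int)((v * UINT64_C(0x0101010101010101)) >> 56);
}
```

For the value in the question, leading_zeros_smear(0xFFFF) gives 48, and an input of 0 gives 64.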

Based on: http://www.hackersdelight.org/HDcode/nlz.c.txt
template<typename T> int clz(T v) {int n=sizeof(T)*8;int c=n;while (n){n>>=1;if (v>>n) c-=n,v>>=n;}return c-v;}
If you'd like a version that allows you to keep your lunch down, here you go:
int clz(uint64_t v) {
    int n = 64, c = 64;
    while (n) {
        n >>= 1;
        if (v >> n) c -= n, v >>= n;
    }
    return c - v;
}
As you'll see, you can save cycles on this by careful analysis of the assembler, but the strategy here is not a terrible one. The while loop performs log2(64) = 6 halving steps (the final pass, with n = 0, changes nothing); each step converts the problem into one of counting the leading zero bits of an integer of half the size.
The if statement inside the while loop asks the question: "can I represent this integer in half as many bits?", or analogously, "if I cut this in half, have I lost it?". After the if() payload completes, our number will always be in the lowest n bits.
At the final stage, v is either 0 or 1, and this completes the calculation correctly.

If you are dealing with unsigned integers, you could do this:
#include <math.h>
#include <stdint.h>
int numunset(uint64_t number)
{
    int nbits = sizeof(uint64_t) * 8;
    if (number == 0)
        return nbits;
    int first_set = floor(log2(number));
    return nbits - first_set - 1;
}
I don't know how it will compare in performance to the loop and count methods that have already been offered because log2() could be expensive.
Edit:
This could cause problems with large values: log2() converts its argument to double, which cannot represent every 64-bit integer exactly, so a value just below a power of two can round up and give a result that is off by one. You could use the log2l() function, which works with long double. A better solution would be to use an integer log2() function as in this question.
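As an illustration of the integer-only route, here is a minimal sketch. The names ilog2_u64 and numunset_int are invented for this example and are not taken from the linked question. It avoids the rounding issue entirely: UINT64_MAX, for instance, converts to the double 2^64, whose log2 is exactly 64, so the floating-point version would return -1 for it.

```c
#include <stdint.h>

/* Floor of log2(v) using only integer operations; v must be non-zero. */
static int ilog2_u64(uint64_t v)
{
    int r = 0;
    while (v >>= 1)
        ++r;
    return r;
}

/* Same contract as numunset(), without floating point. */
static int numunset_int(uint64_t number)
{
    if (number == 0)
        return 64;
    return 64 - ilog2_u64(number) - 1;
}
```

The loop is linear in the bit position, so it trades speed for exactness; a table-based or binary-search integer log2 would be faster.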

// clear all bits except the lowest set bit
x &= -x;
// if x==0, add 0, otherwise add x - 1.
// This sets all bits below the one set above to 1.
x += (-(x != 0)) & (x - 1);
return 64 - count_bits_set(x);
Where count_bits_set is the fastest version of counting bits you can find. See https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel for various bit counting techniques. Note that this isolates the lowest set bit, so the result is the number of bits above the lowest set bit; it equals the number of leading zeros only when exactly one bit is set.

I'm not sure I understood the problem correctly. I think you have a 64bit value and want to find the number of leading zeros in it.
One way would be to find the most significant bit and simply subtract its position from 63 (assuming lowest bit is bit 0). You can find out the most significant bit by testing whether a bit is set from within a loop over all 64 bits.
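Another way might be to use the (non-standard) __builtin_clzll in gcc. It is the 64-bit variant of __builtin_clz, which takes an unsigned int, and its result is undefined for an input of 0.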
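Both suggestions could look roughly like the following sketch. The function names are made up for this example, and the builtin path assumes a GCC-compatible compiler (it falls back to the loop otherwise).

```c
#include <stdint.h>

/* Portable version: test bits from 63 down to 0 until a set bit is found. */
static int leading_zeros_loop(uint64_t v)
{
    for (int bit = 63; bit >= 0; --bit) {
        if (v & (UINT64_C(1) << bit))
            return 63 - bit;
    }
    return 64; /* no bit set */
}

/* GCC/Clang version; the builtin is undefined for 0, so guard that case. */
static int leading_zeros_builtin(uint64_t v)
{
#if defined(__GNUC__)
    return v ? __builtin_clzll(v) : 64;
#else
    return leading_zeros_loop(v);
#endif
}
```

On most targets the builtin compiles down to a single instruction, which is why it tends to be the first choice when portability is not a concern.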

I agree with the binary search idea. However, two points are important here:
1. The range of valid answers to your question is from 0 to 64 inclusive. In other words, there are 65 different possible answers. I think (I'm almost sure) everyone who posted a "binary search" solution missed this point, so they'll get the wrong answer either for zero or for a number with the MSB set.
2. If speed is critical, you may want to avoid the loop. There's an elegant way to achieve this using templates.
The following template code correctly finds the MSB of a variable of any unsigned type. It returns the 1-based position of the highest set bit (0 when no bit is set), so the number of leading zeros is sizeof(T) * 8 - FindMsb(x).
// helper
template <int bits, typename T>
bool IsBitReached(T x)
{
    const T cmp = T(1) << (bits ? (bits-1) : 0);
    return (x >= cmp);
}
template <int bits, typename T>
int FindMsbInternal(T x)
{
    if (!bits)
        return 0;
    int ret;
    if (IsBitReached<bits>(x))
    {
        ret = bits;
        x >>= bits;
    } else
        ret = 0;
    return ret + FindMsbInternal<bits/2, T>(x);
}
// Main routine
template <typename T>
int FindMsb(T x)
{
    const int bits = sizeof(T) * 8;
    if (IsBitReached<bits>(x))
        return bits;
    return FindMsbInternal<bits/2>(x);
}

Here you go, pretty trivial to update as you need for other sizes...
int bits_left(unsigned long long value)
{
    static unsigned long long mask = 0x8000000000000000;
    int c = 64;
    // doh
    if (value == 0)
        return c;
    // check byte by byte to see what has been set
    if (value & 0xFF00000000000000)
        c = 0;
    else if (value & 0x00FF000000000000)
        c = 8;
    else if (value & 0x0000FF0000000000)
        c = 16;
    else if (value & 0x000000FF00000000)
        c = 24;
    else if (value & 0x00000000FF000000)
        c = 32;
    else if (value & 0x0000000000FF0000)
        c = 40;
    else if (value & 0x000000000000FF00)
        c = 48;
    else if (value & 0x00000000000000FF)
        c = 56;
    // skip
    value <<= c;
    while(!(value & mask))
    {
        value <<= 1;
        c++;
    }
    return c;
}

Same idea as user470379's, but counting down ...
Assume all 64 bits are unset. While value is larger than 0 keep shifting the value right and decrementing number of unset bits:
/* untested */
int countunsetbits(uint64_t val) {
    int x = 64;
    while (val) { x--; val >>= 1; }
    return x;
}

Try
#include <limits.h>
#include <stdint.h>
int countBits(uint64_t value)
{
    int result = sizeof(value) * CHAR_BIT; // 64 for uint64_t
    while (value != 0)
    {
        --result;
        value = value >> 1; // Remove bottom bits until all 1s are gone.
    }
    return result;
}

Use log base 2 to get the position of the most significant digit that is 1.
log2(2) = 1, meaning 0b10 -> 1
log2(4) = 2, 5-7 => 2.xx, or 0b100 -> 2
log2(8) = 3, 9-15 => 3.xx, 0b1000 -> 3
log2(16) = 4 you get the idea
and so on...
The numbers in between become fractions of the log result. So typecasting the value to an int gives you the position of the most significant digit.
Once you get this number, say b (counting from 0), 63 - b is the answer for a non-zero 64-bit value.
int get_pos_msd(uint64_t n) {
    return (int)log2(n);
}
leading_zeros = 63 - get_pos_msd(n);

Related

Non looping way to check if every Nth bit is set, with or without an offset?

For example, is every 4th bit set.
1000.1000 true
1010.1000 true
0010.1000 false
with offset of 1
0100.0100 true
0101.0100 true
0001.0100 false
Currently I am doing this by looping through every 4 bits
int num = 170; //1010.1010
int N = 4;
int offset = 0; //[0, N-1]
int intervals = 8 / N; // number of N-bit blocks; assumes the 8-bit value shown here
bool everyNth = true;
for (int i = 0; i < intervals; i++){
    if(((num >> (N*i)) & ((1 << (N - 1)) >> offset)) == 0){
        everyNth = false;
        break;
    }
}
return everyNth;
EXPLANATION OF CODE:
num = 1010.1010
The loop makes it so I look at each 4 bits as a block by right shifting * 4.
num >> 4 = 0000.1010
Then an & for a specific bit that can be offset.
And to only look at a specific bit of the chunk, a mask is created by ((1 << (N - 1)) >> offset)
0000.1010
1000 (mask >> offset0)
OR 0100 (mask >> offset1)
OR 0010 (mask >> offset2)
OR 0001 (mask >> offset3)
Is there a purely computational way to do this? Like how you can XOR your way through to figure out parity. I am working with 64 bit integers for my case, but I am wondering this in a more general case.
Additionally, I am under the assumption that bit operators are one of the fastest methods for calculations or math in general. If this is not true, please feel free to correct me on what the time and place is for bit operators.

If we had a mask M in which every Nth bit is set, then testing whether every Nth bit in a given integer x is set could be calculated as (x & M) == M. Or with offset, you could use ((x << offset) & M) == M. Shifting M right is fine too.
If N is constant, that's all there is to it, just use the right M.
If N is variable, the question becomes, how do we get a mask in which every Nth bit is set.
Here is a simple way to do that:
1. Start by setting the Nth bit
2. "Double" the mask until done
For example,
ulong M = 1UL << (N - 1);
do
{
    M |= M << N;
    N += N;
} while (N < 64);
That is clearly still a loop. But it's not a bit-by-bit loop, it makes only a logarithmic number of iterations.
You could precompute the masks and store them in a small array, the range of N is necessarily small.
There may also be a way based on ulong.MaxValue / ((1UL << N) - 1) but that needs something more to "align" the mask and 64-bit division is not so great anyway. Perhaps there is a smarter way to get the mask.
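The doubling approach and the (x & M) == M test can be sketched in C as follows. The function names are hypothetical, n is assumed to be between 1 and 64, and offset between 0 and n - 1. Unlike C#, C leaves a shift by 64 or more undefined, so the loop is written so that it never shifts that far.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with bit (n - 1) of every n-bit block set; assumes 1 <= n <= 64. */
static uint64_t every_nth_mask(unsigned n)
{
    uint64_t m = UINT64_C(1) << (n - 1);
    /* Each pass doubles the number of mask bits already in place. */
    for (unsigned step = n; step < 64; step *= 2)
        m |= m << step;
    return m;
}

/* True when every n-th bit of x is set; offset in [0, n - 1] moves the tested bit down. */
static bool every_nth_set(uint64_t x, unsigned n, unsigned offset)
{
    uint64_t m = every_nth_mask(n) >> offset;
    return (x & m) == m;
}
```

Because the mask covers all 64 bits, the 8-bit patterns from the question have to be repeated across the whole word for the test to pass, for example 0xA8A8A8A8A8A8A8A8 for 1010.1000.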
"I am under the assumption that bit operators are one of the fastest methods for calculations or math in general"
Bitwise operations are some of the fastest operations, but addition is equally fast, and multiplication is not that far behind (and a multiplication can do a lot more work at once, compared to how much more it costs).

Need help writing a binary reader extension method for a specific value format: 6 bit then 7 bits structure

Alright so here goes.
I currently need to write an extension method for the System.IO.BinaryReader class that is capable of reading a specific format.
I have no idea what this format is called, but I do know exactly how it works, so I will describe it below.
Each byte that makes up the value is flagged to indicate how the reader will need to behave next.
The first byte has 2 flags, and any subsequent bytes for the value have only 1 flag.
First byte:
01000111
^^^^^^^^
|||____|_ 6 bit value
||_______ flag: next byte required
|________ flag: signed value
Next bytes:
00000011
^^^^^^^^
||_____|_ 7 bit value
|________ flag: next byte required
The first byte in the value has 2 flags, the first bit is if the value is positive or negative.
The second bit is if another byte needs to be read.
The 6 remaining bits is the value so far which will need to be kept for later.
If no more bytes need to be read then you just return the 6 bit value with the right sign as dictated by the first bit flag.
If another byte needs to be read then you read the first bit of that byte, and that will indicate if another byte needs to be read.
The remaining 7 bits are the value here.
That value will need to be joined with the 6 bit value from the first byte.
So in the case of the example above:
The first value was this: 01000111.
Which means it is positive, another byte needs to be read, and the value so far is 000111.
Another byte is read and it is this: 00000011
Therefore no new bytes need to be read and value here is this: 0000011
That is joined onto the front of the value so far like so: 0000011000111
That is therefore the final value: 0000011000111 or 199
0100011100000011 turns into this: 0000011000111
Here is another example:
011001111000110100000001
^^^^^^^^^^^^^^^^^^^^^^^^
|      ||      ||______|_____ Third Byte (1 flag)
|      ||______|_____________ Second Byte (1 flag)
|______|_____________________ First Byte (2 flags)
First Byte:
0 - Positive
1 - Next Needed
100111 - Value
Second Byte:
1 - Next Needed
0001101 - Value
Third Byte:
0 - Next Not Needed
0000001 - Value
Value:
00000010001101100111 = 9063
Hopefully my explanation was clear :)
Now I need to write a clear, simple and, most importantly, fast extension method for System.IO.BinaryReader to read such a value from a stream.
My attempts so far are kind of bad and unnecessarily complicated, involving boolean arrays and BitArrays.
So I could really do with somebody helping me write such a method; that would be really appreciated!
Thanks for reading.

Based on the description in the comments I came up with this, unusually reading in signed bytes since it makes the continue flag slightly easier to check (not tested):
static int ReadVLQInt32(this BinaryReader r)
{
    sbyte b0 = r.ReadSByte();
    // the first byte has 6 bits of the raw value
    int shift = 6;
    int raw = b0 & 0x3F;
    // first continue flag is the second bit from the top, shift it into the sign
    sbyte cont = (sbyte)(b0 << 1);
    while (cont < 0)
    {
        sbyte b = r.ReadSByte();
        // these bytes have 7 bits of the raw value
        raw |= (b & 0x7F) << shift;
        shift += 7;
        // continue flag is already in the sign
        cont = b;
    }
    return b0 < 0 ? -raw : raw;
}
It can easily be extended to read a long too, just make sure to use b & 0x7FL otherwise that value is shifted as an int and bits would get dropped.
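For checking the two worked examples from the question, the same decoding steps can be sketched in C over a plain byte buffer. This is an illustration only: decode_vlq and its used parameter are made up here, and there is no bounds or overflow checking, so the buffer is assumed to hold one complete, reasonably short value.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one value from buf; *used receives the number of bytes consumed. */
static int32_t decode_vlq(const uint8_t *buf, size_t *used)
{
    size_t i = 0;
    uint8_t b = buf[i++];
    int negative = (b & 0x80) != 0;   /* first byte, top bit: sign */
    int more = (b & 0x40) != 0;       /* first byte, second bit: continue */
    uint32_t raw = b & 0x3F;          /* first byte carries 6 value bits */
    int shift = 6;

    while (more) {
        b = buf[i++];
        more = (b & 0x80) != 0;                /* later bytes, top bit: continue */
        raw |= (uint32_t)(b & 0x7F) << shift;  /* 7 value bits per byte */
        shift += 7;
    }
    *used = i;
    return negative ? -(int32_t)raw : (int32_t)raw;
}
```

The bytes 0x47 0x03 decode to 199 and 0x67 0x8D 0x01 decode to 9063, matching the examples in the question.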

A version that checks for illegal values (an overlong sequence of 0xFF, 0xFF, ... for example) and works with the checked arithmetic of C# (the C# compiler has an option to use checked arithmetic to detect overflows):
public static int ReadVlqInt32(this BinaryReader r)
{
    byte b = r.ReadByte();
    // the first byte has 6 bits of the raw value
    uint raw = (uint)(b & 0x3F);
    bool negative = (b & 0x80) != 0;
    // first continue flag is the second bit from the top
    bool cont = (b & 0x40) != 0;
    if (cont)
    {
        int shift = 6;
        while (true)
        {
            b = r.ReadByte();
            cont = (b & 0x80) != 0;
            b &= 0x7F;
            if (shift == 27)
            {
                if (negative)
                {
                    // minimum value abs(int.MinValue)
                    if (b > 0x10 || (b == 0x10 && raw != 0))
                    {
                        throw new Exception();
                    }
                }
                else
                {
                    // maximum value int.MaxValue
                    if (b > 0xF)
                    {
                        throw new Exception();
                    }
                }
            }
            // these bytes have 7 bits of the raw value
            raw |= ((uint)b) << shift;
            if (!cont)
            {
                break;
            }
            if (shift == 27)
            {
                throw new Exception();
            }
            shift += 7;
        }
    }
    // We use unchecked here to handle int.MinValue
    return negative ? unchecked(-(int)raw) : (int)raw;
}

Get Sign of a Number without Logical Statement in C#

I want to get the sign of a number without a logical statement. A predefined method, Math.Sign(), is already available, but I need to implement it in my own style.
The C# code I tried:
public int GetSign(int value)
{
    int bitFlag = 1;
    var m = Convert.ToString(value, 2);
    int length = m.Length;
    if (m[length - 1] == '1')
    {
        bitFlag = -1;
    }
    return bitFlag;
}
Condition:
If the Last bit is 1 then return -1
If the Last bit is 0 then return 1
Kindly assist me with removing the above if statement.

Interesting thing about bit shifting:
If you right shift a signed value, the leading (sign) bit is propagated to the right.
Example signed byte : 10000000
Example signed byte >> 1 : 11000000
Integers take 32 bits to represent. So what happens if we shift the bits by 31 places? The leading bit will always be propagated, meaning all non-negative numbers (including 0) will become 0 and all negative numbers will become -1.
Therefore :
public static int signOfInt(int input)
{
    return (input >> 31);
}
will return 0 for non-negative numbers and -1 for negative numbers.
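The same idea can be sketched in C, with one caveat: right-shifting a negative signed value is implementation-defined in C, so this sketch shifts the unsigned representation and negates the result instead. The function names are invented for the example, and sign_of is an additional variant that also distinguishes zero from positive values, in the way Math.Sign does.

```c
#include <stdint.h>

/* -1 for negative input, 0 otherwise: move the sign bit to bit 0, then negate. */
static int32_t negative_flag(int32_t x)
{
    return -(int32_t)((uint32_t)x >> 31);
}

/* Three-way sign (-1, 0, 1) built from two comparisons, with no if statement. */
static int32_t sign_of(int32_t x)
{
    return (x > 0) - (x < 0);
}
```

Whether the comparisons in sign_of count as "logical statements" depends on how strictly the requirement is read; they are expressions rather than branches in the source.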

I think this will do it
public int GetSign(int value)
{
    return -(((value & 1) << 1) - 1);
}

Setting all low order bits to 0 until two 1s remain (for a number stored as a byte array)

I need to set all the low order bits of a given BigInteger to 0 until only two 1 bits are left. In other words leave the highest and second-highest bits set while unsetting all others.
The number could be any combination of bits. It may even be all 1s or all 0s. Example:
MSB 0000 0000
1101 1010
0010 0111
...
...
...
LSB 0100 1010
We can easily take out corner cases such as 0, 1, PowerOf2, etc. Not sure how to apply popular bit manipulation algorithms on an array of bytes representing one number.
I have already looked at bithacks but have the following constraints. The BigInteger structure only exposes underlying data through the ToByteArray method which itself is expensive and unnecessary. Since there is no way around this, I don't want to slow things down further by implementing a bit counting algorithm optimized for 32/64 bit integers (which most are).
In short, I have a byte [] representing an arbitrarily large number. Speed is the key factor here.
NOTE: In case it helps, the numbers I am dealing with have around 5,000,000 bits. They keep on decreasing with each iteration of the algorithm so I could probably switch techniques as the magnitude of the number decreases.
Why I need to do this: I am working with a 2D graph and am particularly interested in coordinates whose x and y values are powers of 2. So (x+y) will always have two bits set and (x-y) will always have consecutive bits set. Given an arbitrary coordinate (x, y), I need to transform an intersection by getting values with all bits unset except the first two MSB.

Try the following (not sure if it's actually valid C#, but it should be close enough):
// find the next non-zero byte (I'm assuming little endian) or return -1
int find_next_byte(byte[] data, int i) {
    while (i >= 0 && data[i] == 0) --i;
    return i;
}
// find a bit mask of the next non-zero bit or return 0
int find_next_bit(int value, int b) {
    while (b > 0 && ((value & b) == 0)) b >>= 1;
    return b;
}
byte[] data;
int i = find_next_byte(data, data.Length - 1);
if (i >= 0) {
    // find the first 1 bit
    int b = find_next_bit(data[i], 1 << 7);
    // try to find the second 1 bit
    b = find_next_bit(data[i], b >> 1);
    if (b > 0) {
        // found 2 bits, removing the rest
        if (b > 1) data[i] = (byte)(data[i] & ~(b - 1));
    } else {
        // we only found 1 bit, find the next non-zero byte
        i = find_next_byte(data, i - 1);
        if (i >= 0) {
            b = find_next_bit(data[i], 1 << 7);
            if (b > 1) data[i] = (byte)(data[i] & ~(b - 1));
        }
    }
    // remove the rest (a memset would be even better here,
    // but that would probably require unmanaged code)
    for (--i; i >= 0; --i) data[i] = 0;
}
Untested.
Probably this would be a bit more performant if compiled as unmanaged code or even with a C or C++ compiler.
As harold noted correctly, if you have no a priori knowledge about your number, this O(n) method is the best you can do. If you can, you should keep the position of the highest two non-zero bytes, which would drastically reduce the time needed to perform your transformation.
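Since the answer mentions compiling this with a C compiler, here is a rough C sketch of the same O(n) scan. It is an assumption-laden illustration, not the code from the answer: keep_top_two_bits is a made-up name, and the array is assumed to be little endian (most significant byte last), as with ToByteArray.

```c
#include <stddef.h>
#include <stdint.h>

/* Keep only the two most significant set bits of a little-endian byte array. */
static void keep_top_two_bits(uint8_t *data, size_t len)
{
    int kept = 0;       /* set bits kept so far */
    size_t i = len;

    /* Walk down from the most significant byte until two bits are found. */
    while (i > 0 && kept < 2) {
        --i;
        uint8_t out = 0;
        for (uint8_t bit = 0x80; bit != 0 && kept < 2; bit >>= 1) {
            if (data[i] & bit) {
                out |= bit;
                ++kept;
            }
        }
        data[i] = out;
    }
    /* Everything below the byte holding the second bit is cleared. */
    while (i > 0)
        data[--i] = 0;
}
```

With the bytes 0x4A, 0x27, 0xDA, 0x00 (least significant first), only 0xC0 survives in the third byte, because 0xDA is 1101 1010 and its two highest set bits are the top two.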

I'm not sure if this is getting optimised out or not, but this code appears to be 16x faster than ToByteArray. It also avoids the memory copy, and you get the result as uint instead of byte, so there should be further improvements there. Note that it reads a private field, which is an implementation detail of BigInteger and may change between framework versions.
//create delegate to get private _bits field
var par = Expression.Parameter(typeof(BigInteger));
var bits = Expression.Field(par, "_bits");
var lambda = Expression.Lambda(bits, par);
var func = (Func<BigInteger, uint[]>)lambda.Compile();
//test call our delegate
var bigint = BigInteger.Parse("3498574578238348969856895698745697868975687978");
int time = Environment.TickCount;
for (int y = 0; y < 10000000; y++)
{
    var x = func(bigint);
}
Console.WriteLine(Environment.TickCount - time);
//compare time to ToByteArray
time = Environment.TickCount;
for (int y = 0; y < 10000000; y++)
{
    var x = bigint.ToByteArray();
}
Console.WriteLine(Environment.TickCount - time);
From there, finding the top 2 bits should be pretty easy. The first bit will be in the first int, I presume; then it is just a matter of searching for the second topmost bit. If it is in the same integer, just set the first bit to zero and find the topmost bit; otherwise search for the next non-zero int and find its topmost bit.
EDIT: to make things simple just copy/paste this class into your project. This creates extension methods that means you can just call mybigint.GetUnderlyingBitsArray(). I added a method to get the Sign also and, to make it more generic, have created a function that will allow accessing any private field of any object. I found this to be slower than my original code in debug mode but the same speed in release mode. I would advise performance testing this yourself.
static class BigIntegerEx
{
    private static Func<BigInteger, uint[]> getUnderlyingBitsArray;
    private static Func<BigInteger, int> getUnderlyingSign;
    static BigIntegerEx()
    {
        getUnderlyingBitsArray = CompileFuncToGetPrivateField<BigInteger, uint[]>("_bits");
        getUnderlyingSign = CompileFuncToGetPrivateField<BigInteger, int>("_sign");
    }
    private static Func<TObject, TField> CompileFuncToGetPrivateField<TObject, TField>(string fieldName)
    {
        var par = Expression.Parameter(typeof(TObject));
        var field = Expression.Field(par, fieldName);
        var lambda = Expression.Lambda(field, par);
        return (Func<TObject, TField>)lambda.Compile();
    }
    public static uint[] GetUnderlyingBitsArray(this BigInteger source)
    {
        return getUnderlyingBitsArray(source);
    }
    public static int GetUnderlyingSign(this BigInteger source)
    {
        return getUnderlyingSign(source);
    }
}

Checking bits in a byte using for loop

I was recently studying C# when I came across the following for loop:
// Display the bits within a byte.
using System;
class ShowBits {
    static void Main() {
        int t;
        byte val;
        val = 123;
        for(t=128; t > 0; t = t/2) {
            if((val & t) != 0)
                Console.Write("1 ");
            if((val & t) == 0)
                Console.Write("0 ");
        }
    }
}
I am not able to understand why t = t/2 is used in the increment/decrement section of the for loop. Please explain.

Decimal 128 is binary 10000000 - i.e. a mask for just the most significant bit of the byte. When you divide it by two, you get 01000000, i.e. the second most significant bit, etc.
Using & between the original value and the mask and just comparing with 0 indicates whether that bit is set in the original value.
Another alternative would be to shift the original value instead:
for (int i = 7; i >= 0; i--)
{
    int shifted = val >> i;
    // Take the bottom-most bit of the shifted value
    Console.Write("{0} ", shifted & 1);
}

It's looping in decreasing powers of two and using that value in a mask.
(base 10): 128, 64, 32, 16, 8, 4, 2, 1
(base 2): 10000000, 01000000, 00100000, 00010000, 00001000, 00000100, 00000010, 00000001
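The mask sequence listed above can be seen at work in a small C sketch. The helper name byte_to_bits is made up for this example; it builds a string instead of printing so the result is easy to inspect.

```c
#include <stdint.h>

/* Write the 8 bits of val, most significant first, into out (9 chars with terminator). */
static void byte_to_bits(uint8_t val, char out[9])
{
    int pos = 0;
    /* The mask t visits 128, 64, 32, 16, 8, 4, 2, 1 and then becomes 0. */
    for (int t = 128; t > 0; t = t / 2)
        out[pos++] = (val & t) ? '1' : '0';
    out[pos] = '\0';
}
```

For val = 123, the value used in the question, the buffer ends up holding 01111011.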

128 is written as 10000000 in binary, so we check if the highest bit in a byte is on. Then we do t=t/2, which is t=128/2=64, which is written as 01000000 in binary, and so on. Each division by two shifts the single set bit one place to the right.

The t is used as a mask for the bits in val.
So it starts at 128, 10000000 in binary.
When it is divided by 2, it becomes 64 - or 01000000.
This goes until it reaches 0.
Then in each iteration, the '&' is used to mask the bits in val with the current bit in t.
