Looking for a more efficient pop count given a restriction - c#

The popcount function returns the number of 1's in an input. 0010 1101 has a popcount of 4.
Currently, I am using this algorithm to get the popcount:
private int PopCount(int x)
{
x = x - ((x >> 1) & 0x55555555);
x = (x & 0x33333333) + ((x >> 2) & 0x33333333);
return (((x + (x >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}
This works fine and the only reason I ask for more is because this operation is run awfully often and I am looking for additional performance gains.
I'm looking for a way to simplify the algorithm based on the fact that my 1's will always be right aligned. That is, the input will be something like 00000 11111 (returns 5) or 00000 11111 11111 (returns 10).
Is there a way to make a more efficient popcount based on this constraint? If the input was 01011 11101 10011, it would just return 2 because it only cares about the right-most ones. It seems any kind of looping is slower than the existing solution.

Here's a C# implementation that performs "find highest set" (binary logarithm). It may or may not be faster than your current PopCount, but it is surely slower than using the real clz and/or popcnt CPU instructions:
static int FindMSB( uint input )
{
if (input == 0) return 0;
return (int)(BitConverter.DoubleToInt64Bits(input) >> 52) - 1022;
}
Test: http://rextester.com/AOXD85351
And a slight variation without a conditional branch:
/* precondition: ones are right-justified, e.g. 00000111 or 00111111 */
static int FindMSB( uint input )
{
return (int)(input & (int)(BitConverter.DoubleToInt64Bits(input) >> 52) - 1022);
}
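To illustrate why the branch-free variation works, here is a minimal sketch of the same exponent trick in C (the bit operations carry over to C# unchanged). It assumes IEEE 754 doubles and right-justified ones; the name count_right_aligned_ones is made up for this example. For an input with n right-justified ones the exponent field yields n, and since n is always smaller than 2^n it fits inside the low n bits of the input, so the AND leaves n intact while turning the bogus value produced for an input of 0 into 0.

```c
#include <stdint.h>
#include <string.h>

/* Assumes IEEE 754 doubles and right-justified ones, e.g. 00000111. */
static int count_right_aligned_ones(uint32_t x)
{
    double d = (double)x;            /* exact: every uint32_t fits in a double */
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);  /* reinterpret the double as raw bits */

    /* Biased exponent minus 1022 is the 1-based index of the highest set bit.
       For x == 0 this is -1022, which the AND below turns into 0. */
    int msb_index = (int)(bits >> 52) - 1022;
    return (int)(x & (uint32_t)msb_index);
}
```

Whether this beats the shift-and-mask PopCount would have to be measured; the int-to-double conversion is not free.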

Related

Non looping way to check if every Nth bit is set, with or without an offset?

For example, is every 4th bit set.
1000.1000 true
1010.1000 true
0010.1000 false
with offset of 1
0100.0100 true
0101.0100 true
0001.0100 false
Currently I am doing this by looping through every 4 bits
int num = 170; //1010.1010
int N = 4;
int offset = 0; //[0, N-1]
int intervals = 8 / N; // number of N-bit blocks in this 8-bit example
bool everyNth = true;
for (int i = 0; i < intervals; i++){
    if(((num >> (N*i)) & ((1 << (N - 1)) >> offset)) == 0){
        everyNth = false;
        break;
    }
}
return everyNth;
EXPLANATION OF CODE:
num = 1010.1010
The loop looks at each group of 4 bits as a block by right shifting by a multiple of 4.
num >> 4 = 0000.1010
Then an & for a specific bit that can be offset.
And to only look at a specific bit of the chunk, a mask is created by ((1 << (N - 1)) >> offset)
0000.1010
1000 (mask >> offset0)
OR 0100 (mask >> offset1)
OR 0010 (mask >> offset2)
OR 0001 (mask >> offset3)
Is there a purely computational way to do this? Like how you can XOR your way through to figure out parity. I am working with 64 bit integers for my case, but I am wondering this in a more general case.
Additionally, I am under the assumption that bit operators are one of the fastest methods for calculations or math in general. If this is not true, please feel free to correct me on what the time and place is for bit operators.
If we had a mask M in which every Nth bit is set, then testing whether every Nth bit in a given integer x is set could be calculated as (x & M) == M. Or with offset, you could use ((x << offset) & M) == M. Shifting M right is fine too.
If N is constant, that's all there is to it, just use the right M.
If N is variable, the question becomes, how do we get a mask in which every Nth bit is set.
Here is a simple way to do that:
Start by setting the Nth bit
"Double" the mask until done
For example,
ulong M = 1UL << (N - 1);
do
{
M |= M << N;
N += N;
} while (N < 64);
That is clearly still a loop. But it's not a bit-by-bit loop, it makes only a logarithmic number of iterations.
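The doubling idea and the (x & M) == M test can be put together as in the following minimal sketch in C (the same operations work on a C# ulong). The function names are hypothetical, and the offset is applied by shifting M right, as the answer permits. Unlike C#, C does not mask shift counts, so the loop tests n before shifting.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask with bit (n-1) of every n-bit block set; expects 1 <= n <= 64. */
static uint64_t every_nth_mask(unsigned n)
{
    uint64_t m = (uint64_t)1 << (n - 1);
    while (n < 64) {       /* guard first, so the shift count stays below 64 */
        m |= m << n;       /* double the number of blocks covered */
        n += n;
    }
    return m;
}

/* offset in [0, n-1] moves the tested bit toward the low end of each block. */
static bool every_nth_set(uint64_t x, unsigned n, unsigned offset)
{
    uint64_t m = every_nth_mask(n) >> offset;
    return (x & m) == m;
}
```

For N = 4 the mask comes out as 0x8888888888888888, so a 64-bit value passes only if all sixteen nibbles have the tested bit set.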
You could precompute the masks and store them in a small array, the range of N is necessarily small.
There may also be a way based on ulong.MaxValue / ((1UL << N) - 1) but that needs something more to "align" the mask and 64-bit division is not so great anyway. Perhaps there is a smarter way to get the mask.
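For the special case where N divides 64 evenly, the division idea can be sketched in C as below; the function name is hypothetical and the restriction on n is an assumption of this sketch, not something the answer states. The quotient has bit 0 of every block set, and the "alignment" step is a left shift by N - 1. For other values of N the blocks do not tile the word and the quotient is misaligned, which is the extra work the answer alludes to.

```c
#include <stdint.h>

/* Assumes n is one of 1, 2, 4, 8, 16, 32, so the blocks tile 64 bits exactly. */
static uint64_t every_nth_mask_div(unsigned n)
{
    /* All-ones divided by an n-bit all-ones value sets bit 0 of each block. */
    uint64_t low_bits = UINT64_MAX / (((uint64_t)1 << n) - 1);
    return low_bits << (n - 1);   /* move the set bit to position n-1 */
}
```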
"I am under the assumption that bit operators are one of the fastest methods for calculations or math in general"
Bitwise operations are some of the fastest operations, but addition is equally fast, and multiplication is not that far behind (and a multiplication can do a lot more work at once, compared to how much more it costs).

Bit manipulation on large integers out of 'int' range

Ok, so let's start with a 32 bit integer:
int big = 536855551; // 00011111111111111100001111111111
Now, I want to set the last 10 bits within this integer to:
int little = 69; // 0001000101
So, my approach was this:
big = (big & 4294966272) & (little)
where 4294966272 is the first 22 bits, or 11111111111111111111110000000000.
But of course this isn't supported because 4294966272 is outside of the int range of 0x7FFFFFFF. Also, this isn't going to be my only operation. I also need to be able to set bits 11 through 14. My approach for that (with the same problem) was:
big = (big & 4294951935) | (little << 10)
So with the explanation out of the way, here is what I'm doing as alternatives for the above:
1: ((big >> 10) << 10) | (little)
2: (big & 1023) | ((big >> 14) << 14) | (little << 10)
I don't feel like my alternatives are the best or most efficient way to go. Are there any better ways to do this?
Sidenote: If C# supported binary literals, '0b', this would be a lot prettier.
Thanks.
4294966272 should actually be -1024, which is represented as 11111111111111111111110000000000.
For example:
int big = 536855551;
int little = 69;
var thing = Convert.ToInt32("11111111111111111111110000000000", 2);
var res = (big & thing) & (little);
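Though the result will always be 0, because no bit is set in all three operands (to set the bits, the last operator should be | rather than &):
00011111111111111100001111111111
&
00000000000000000000000001000101
&
11111111111111111111110000000000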
Bit shift is usually faster compared to bit-shift + mask (that is, &). I have a test case for it.
You should go with your first alternative.
1: ((big >> 10) << 10) | (little)
Just beware of the difference between signed and unsigned int when it comes to bit-shifting: >> on an int is an arithmetic shift that copies the sign bit, while >> on a uint shifts in zeros.
Alternatively, you could define big and little as unsigned. Use uint instead of int.
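With unsigned values the mask-based version from the question becomes straightforward. Here is a minimal sketch in C (the C# equivalent would use uint); set_bits and its parameters are hypothetical names chosen for this example, and the mask is computed rather than written as a large literal.

```c
#include <stdint.h>

/* Replace `width` bits of `value`, starting at bit `shift`, with `field`.
   Expects 1 <= width <= 31 and shift + width <= 32. */
static uint32_t set_bits(uint32_t value, unsigned shift, unsigned width, uint32_t field)
{
    uint32_t mask = (((uint32_t)1 << width) - 1) << shift;  /* ones over the target bits */
    return (value & ~mask) | ((field << shift) & mask);     /* clear, then insert */
}
```

set_bits(big, 0, 10, little) gives the same result as alternative 1, and set_bits(big, 10, 4, little) the same as alternative 2, provided little fits in the field.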

Optimisation of threshold computation

I'm trying to optimise the following C# code, which sets bytes to 0x00 or 0xFF based on a threshold.
for (int i = 0; i < veryLargeNumber; i++)
{
data[i] = (byte)(data[i] < threshold ? 0 : 255);
}
Visual Studio's performance profiler shows that the above code is rather expensive, taking nearly 8 seconds to compute - 98% of my total processing expense. I'm processing just under a thousand items, so that adds up to over two hours.
I think the issue is to do with the ternary conditional operator, since it causes a branch. I'd imagine a pure-math operation of some sort could be significantly faster, since it's CPU-cache friendly.
Is there a way to optimise this? It's possible for me to fix the threshold value, if that helps. I'd consider anything above a ~7% performance increase a win, since that's a whole 10 minutes shaved off the total processing time.
If you are using .NET Framework 4.0, you could make use of the Parallel Library described at the following link:
http://msdn.microsoft.com/en-us/library/dd460717
In your case you have to check every value against the threshold anyway, and that takes time, so spread the work across threads, for example with a parallel loop and a lambda expression.
As a further suggestion, use bitwise operators for this purpose because they are faster, together with the parallel approach.
0x00 = 0000 0000
0xFF = 1111 1111
Try the OR operator (i.e. 0 | 1 = 1, where | stands for the OR operator).
EDIT:
This is how you could compare which number is bigger:
let a,b be numbers:
int temp= a ^ b;
temp|= temp>> 1;
temp|= temp>> 2;
temp|= temp>> 4;
temp|= temp>> 8;
temp|= temp>> 16;
temp&= ~(temp>> 1) | 0x80000000;
temp&= (a ^ 0x80000000) & (b ^ 0x7fffffff);
If you want a bit-wise solution -
int intSize = sizeof(int) * 8 - 1;
byte t = (byte)(threshold - 1);
for (....)
{
data[i] = (byte)(255 + 1 ^ ((t - data[i]) >> intSize));
}
Note: this won't work for the corner case of threshold == 0, because threshold - 1 then wraps around to 255. Sorry about that.
Also, try using an int array instead of byte and see if it is faster.
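For comparison, here is a minimal sketch in C of a branch-free variant that also handles a threshold of 0. It swaps in a different technique from the sign-shift above: the comparison result (0 or 1) is negated and truncated to a byte. The function name is hypothetical, and whether the compiler really emits branch-free (or vectorised) code depends on the compiler and its flags, so treat any speed-up as something to measure.

```c
#include <stddef.h>
#include <stdint.h>

static void apply_threshold(uint8_t *data, size_t len, uint8_t threshold)
{
    for (size_t i = 0; i < len; i++) {
        /* (data[i] >= threshold) is 0 or 1; negating gives 0 or -1,
           and the cast keeps the low byte: 0x00 or 0xFF. */
        data[i] = (uint8_t)-(data[i] >= threshold);
    }
}
```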

Number of unset bit left of most significant set bit?

Assuming the 64bit integer 0x000000000000FFFF which would be represented as
00000000 00000000 00000000 00000000
00000000 00000000 >11111111 11111111
How do I find the amount of unset bits to the left of the most significant set bit (the one marked with >) ?
In straight C (long long are 64 bit on my setup), taken from similar Java implementations: (updated after a little more reading on Hamming weight)
A little more explanation: The top part just sets all bits to the right of the most significant 1, and then inverts the result (i.e. all the 0's to the 'left' of the most significant 1 are now 1's and everything else is 0).
Then I used a Hamming Weight implementation to count the bits.
unsigned long long i = 0x0000000000000000LLU;
i |= i >> 1;
i |= i >> 2;
i |= i >> 4;
i |= i >> 8;
i |= i >> 16;
i |= i >> 32;
// Highest bit in input and all lower bits are now set. Invert to set the bits to count.
i=~i;
i -= (i >> 1) & 0x5555555555555555LLU; // each 2 bits now contains a count
i = (i & 0x3333333333333333LLU) + ((i >> 2) & 0x3333333333333333LLU); // each 4 bits now contains a count
i = (i + (i >> 4)) & 0x0f0f0f0f0f0f0f0fLLU; // each 8 bits now contains a count
i *= 0x0101010101010101LLU; // add each byte to all the bytes above it
i >>= 56; // the number of bits
printf("Leading 0's = %llu\n", i);
I'd be curious to see how this was efficiency wise. Tested it with several values though and it seems to work.
Based on: http://www.hackersdelight.org/HDcode/nlz.c.txt
template<typename T> int clz(T v) {int n=sizeof(T)*8;int c=n;while (n){n>>=1;if (v>>n) c-=n,v>>=n;}return c-v;}
If you'd like a version that allows you to keep your lunch down, here you go:
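#include <stdint.h>

int clz(uint64_t v) {
    int n = 64, c = 64;
    while (n) {
        n >>= 1;
        /* if the upper half is non-zero, the top bit lies there: drop the lower n bits */
        if (v >> n) c -= n, v >>= n;
    }
    return c - (int)v;  /* v is now 0 or 1 */
}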
As you'll see, you can save cycles on this by careful analysis of the assembler, but the strategy here is not a terrible one. The while loop will operate Lg[64]=6 times; each time it will convert the problem into one of counting the number of leading bits on an integer of half the size.
The if statement inside the while loop asks the question: "can I represent this integer in half as many bits", or analogously, "if I cut this in half, have I lost it?". After the if() payload completes, our number will always be in the lowest n bits.
At the final stage, v is either 0 or 1, and this completes the calculation correctly.
If you are dealing with unsigned integers, you could do this:
#include <math.h>
#include <stdint.h>
int numunset(uint64_t number)
{
    int nbits = sizeof(uint64_t)*8;
    if(number == 0)
        return nbits;
    int first_set = floor(log2(number));
    return nbits - first_set - 1;
}
I don't know how it will compare in performance to the loop and count methods that have already been offered because log2() could be expensive.
Edit:
This could cause some problems with high-valued integers, since log2() takes a double and a double cannot represent every 64-bit integer exactly, so a value just below a power of two may be rounded up during the conversion. You could use the log2l() function, which works with long double. A better solution would be to use an integer log2() function as in this question.
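An integer log2 avoids the rounding issue entirely. The following is a minimal sketch in C, not the function from the linked question; both function names are hypothetical. It simply counts how many times the value can be shifted right before it reaches zero.

```c
#include <stdint.h>

/* floor(log2(x)) for x > 0; returns -1 for x == 0. */
static int floor_log2_u64(uint64_t x)
{
    int r = -1;
    while (x) {
        x >>= 1;
        r++;
    }
    return r;
}

/* Number of unset bits above the most significant set bit. */
static int leading_zeros_u64(uint64_t x)
{
    return 63 - floor_log2_u64(x);   /* yields 64 when x == 0 */
}
```

Because everything stays in integer arithmetic, the all-ones value correctly gives 0 leading zeros, which is exactly the case where the double conversion can round up.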
// clear all bits except the lowest set bit
x &= -x;
// if x==0, add 0, otherwise add x - 1.
// This sets all bits below the one set above to 1.
x += (-(x != 0)) & (x - 1);
return 64 - count_bits_set(x);
Where count_bits_set is the fastest version of counting bits you can find. See https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel for various bit counting techniques.
I'm not sure I understood the problem correctly. I think you have a 64bit value and want to find the number of leading zeros in it.
One way would be to find the most significant bit and simply subtract its position from 63 (assuming lowest bit is bit 0). You can find out the most significant bit by testing whether a bit is set from within a loop over all 64 bits.
Another way might be to use the (non-standard) __builtin_clz in gcc (__builtin_clzll for 64-bit values; the result is undefined for an input of 0).
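A minimal sketch of the builtin approach in C, assuming GCC or Clang and a 64-bit unsigned long long; the wrapper name is hypothetical. The guard is there because the builtin's result is undefined for an input of 0.

```c
#include <stdint.h>

/* Requires GCC or Clang. __builtin_clzll is undefined for 0, hence the guard. */
static int leading_zeros_builtin(uint64_t x)
{
    return x ? __builtin_clzll(x) : 64;
}
```

On targets with a count-leading-zeros instruction the compiler can typically lower this to that instruction.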
I agree with the binary search idea. However two points are important here:
The range of valid answers to your question is from 0 to 64 inclusive. In other words - there may be 65 different answers to the question. I think (almost sure) all who posted the "binary search" solution missed this point, hence they'll get wrong answer for either zero or a number with the MSB bit on.
If speed is critical - you may want to avoid the loop. There's an elegant way to achieve this using templates.
The following template stuff finds the MSB correctly of any unsigned type variable.
// helper
template <int bits, typename T>
bool IsBitReached(T x)
{
const T cmp = T(1) << (bits ? (bits-1) : 0);
return (x >= cmp);
}
template <int bits, typename T>
int FindMsbInternal(T x)
{
if (!bits)
return 0;
int ret;
if (IsBitReached<bits>(x))
{
ret = bits;
x >>= bits;
} else
ret = 0;
return ret + FindMsbInternal<bits/2, T>(x);
}
// Main routine
template <typename T>
int FindMsb(T x)
{
const int bits = sizeof(T) * 8;
if (IsBitReached<bits>(x))
return bits;
return FindMsbInternal<bits/2>(x);
}
Here you go, pretty trivial to update as you need for other sizes...
int bits_left(unsigned long long value)
{
static unsigned long long mask = 0x8000000000000000;
int c = 64;
// doh
if (value == 0)
return c;
// check byte by byte to see what has been set
if (value & 0xFF00000000000000)
c = 0;
else if (value & 0x00FF000000000000)
c = 8;
else if (value & 0x0000FF0000000000)
c = 16;
else if (value & 0x000000FF00000000)
c = 24;
else if (value & 0x00000000FF000000)
c = 32;
else if (value & 0x0000000000FF0000)
c = 40;
else if (value & 0x000000000000FF00)
c = 48;
else if (value & 0x00000000000000FF)
c = 56;
// skip
value <<= c;
while(!(value & mask))
{
value <<= 1;
c++;
}
return c;
}
Same idea as user470379's, but counting down ...
Assume all 64 bits are unset. While value is larger than 0 keep shifting the value right and decrementing number of unset bits:
/* untested */
int countunsetbits(uint64_t val) {
int x = 64;
while (val) { x--; val >>= 1; }
return x;
}
Try
#include <limits.h>
#include <stdint.h>
int countBits(uint64_t value)
{
    int result = sizeof(value) * CHAR_BIT; // should be 64
    while(value != 0)
    {
        --result;
        value = value >> 1; // Remove bottom bits until all 1 are gone.
    }
    return result;
}
Use log base 2 to get you the most significant digit which is 1.
log(2) = 1, meaning 0b10 -> 1
log(4) = 2, 5-7 => 2.xx, or 0b100 -> 2
log(8) = 3, 9-15 => 3.xx, 0b1000 -> 3
log(16) = 4 you get the idea
and so on...
The numbers in between become fractions of the log result. So typecasting the value to an int gives you the most significant digit.
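Once you get this number, say b, the answer is simply 63 - b (for a nonzero 64-bit value).
#include <math.h>
#include <stdint.h>
/* 0-based position of the most significant set bit; n must be nonzero */
int get_pos_msd(uint64_t n) {
    return (int)log2((double)n);
}
int last_zero = 63 - get_pos_msd(n);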

Does MSIL have ROL and ROR instructions?

I wrote an Int128 type and it works great. I thought I could improve on its performance with a simple idea: Improve the shift operations which are a bit clumsy.
Because they are heavily used in multiplication and division, an improvement would have a ripple effect. So I began creating a dynamic method (to shift low and rotate high), only to discover that there are no OpCodes.Rol or OpCodes.Ror instructions.
Is this possible in IL?
No.
You need to implement it with bit shifts. For example, a 128-bit left shift built from two 64-bit halves:
UInt64 highBits = 0;
UInt64 lowBits = 1;
Int32 n = 63; // shift count, 0 <= n < 128
UInt64 highResult, lowResult;
if (n == 0) { highResult = highBits; lowResult = lowBits; }
else if (n < 64)
{
    // The top n bits of the low word carry into the high word.
    highResult = (highBits << n) | (lowBits >> (64 - n));
    lowResult = lowBits << n;
}
else
{
    // The whole low word moves into the high word.
    highResult = lowBits << (n - 64);
    lowResult = 0;
}
To partially answer this question 7 years later, in case someone should need it.
You can use ROR/ROL in .NET.
MSIL doesn't directly contain ROR or ROL operations, but there are patterns that will make the JIT compiler generate ROR and ROL. RyuJIT (.NET and .NET Core) supports this.
The details of improving .NET Core to use this pattern were discussed here, and a month later the .NET Core code was updated to use it.
Looking at the implementation of SHA512 we find examples of ROR:
public static UInt64 RotateRight(UInt64 x, int n) {
return (((x) >> (n)) | ((x) << (64-(n))));
}
And extending by same pattern to ROL:
public static UInt64 RotateLeft(UInt64 x, int n) {
return (((x) << (n)) | ((x) >> (64-(n))));
}
To do this on a 128-bit integer you can process it as two 64-bit halves, then AND to extract the "carry", AND to clear the destination and OR to apply. This has to be mirrored in both directions (low->high and high->low). I'm not going to bother with an example since this question is a bit old.
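For readers who do want an example, here is a minimal sketch in C of a 128-bit rotate-left over two 64-bit halves (the same expressions work on UInt64 in C#). The u128 struct and rotl128 are hypothetical names for this illustration. Instead of explicit AND masks it relies on unsigned shifts discarding the bits that fall off, with the carry moved across by a complementary right shift in each direction.

```c
#include <stdint.h>

typedef struct { uint64_t hi, lo; } u128;   /* hypothetical two-word layout */

static u128 rotl128(u128 v, unsigned n)
{
    n &= 127;                     /* rotation is periodic in 128 */
    if (n >= 64) {                /* rotating by 64 just swaps the halves */
        uint64_t t = v.hi;
        v.hi = v.lo;
        v.lo = t;
        n -= 64;
    }
    if (n == 0) return v;         /* avoids shifting by 64 below */
    u128 r;
    r.hi = (v.hi << n) | (v.lo >> (64 - n));  /* carry from low into high */
    r.lo = (v.lo << n) | (v.hi >> (64 - n));  /* carry from high wraps into low */
    return r;
}
```

A rotate-right can be obtained by calling rotl128 with 128 - n.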
