My apologies if this has been asked/answered before but I'm honestly not even sure how to word this as a question properly. I have the following bit pattern:
0110110110110110110110110110110110110110110110110110110110110110
I'm trying to perform a shift that'll preserve my underlying pattern; my first instinct was to use right rotation ((x >> count) | (x << (-count & 63))) but the asymmetry in my bit pattern results in:
0011011011011011011011011011011011011011011011011011011011011011 <--- wrong
The problem is that the most significant (far left) bit ends up being 0 instead of the desired 1:
1011011011011011011011011011011011011011011011011011011011011011 <--- right
Is there a colloquial name for this function I'm looking for? If not, how could I go about implementing this idea?
Additional Information:
While the question is language agnostic I'm currently trying to solve this using C#.
The bit patterns I'm using are entirely predictable and always have the same structure; the pattern starts with a single zero followed by n - 1 ones (where n is an odd number) and then repeats infinitely.
I'd like to accomplish this without conditional operations since they'd defeat the purpose of using bitwise manipulation in the first place but maybe I have no choice...
You've got a number structured like this:
B16 B15 B14 B13 B12 B11 B10 B09 B08 B07 B06 B05 B04 B03 B02 B01 B00
? 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0
The ? needs to appear in the MSB (B16, or B63, or whatever) after the shift. Where does it come from? Well, the closest copy is found n places to the right:
B13 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0
^--------------/
If your word has width w, this is 1 << (w-n)
*
So you can do:
var selector = 1 << (w-n);
var rotated = (val >> 1) | ((val & selector) << (n-1));
But you may want a multiple shift. Then we need to build a wider mask:
? 0 1 1 0 1 1 0 1 1 0 1 1 0 1 1 0
* * * * *
Here I've chosen to pretend n = 6; it just needs to be a multiple of the basic n and larger than the shift. Now:
var selector = ((1UL << shift) - 1) << (w - n);
var rotated = (val >> shift) | ((val & selector) << (n - shift));
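Putting those two lines together, here is a minimal C sketch (C rather than C#, purely for illustration), hard-coding a 64-bit word and the basic period n = 3 from the question:

```c
#include <stdint.h>

/* Pattern-preserving right shift for a 64-bit word.
   n is (a multiple of) the pattern period, and shift < n.
   The selector mask grabs the bits that fall off the right end
   and feeds them back in at the top, n places to the left of
   where the nearest copy sits. */
uint64_t pattern_shift(uint64_t val, unsigned shift, unsigned n) {
    uint64_t selector = ((1ULL << shift) - 1) << (64 - n);
    return (val >> shift) | ((val & selector) << (n - shift));
}
```

With the question's pattern 0x6DB6DB6DB6DB6DB6, `pattern_shift(x, 1, 3)` yields 0xB6DB6DB6DB6DB6DB, the "right" result from the question, and `pattern_shift(x, 2, 3)` yields 0xDB6DB6DB6DB6DB6D, matching the period-3 output listing below.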
Working demonstration using your pattern: http://rextester.com/UWYSW47054
It's easy to see that the output has period 3, as required:
1:B6DB6DB6DB6DB6DB
2:DB6DB6DB6DB6DB6D
3:6DB6DB6DB6DB6DB6
4:B6DB6DB6DB6DB6DB
5:DB6DB6DB6DB6DB6D
6:6DB6DB6DB6DB6DB6
7:B6DB6DB6DB6DB6DB
8:DB6DB6DB6DB6DB6D
9:6DB6DB6DB6DB6DB6
10:B6DB6DB6DB6DB6DB
11:DB6DB6DB6DB6DB6D
Instead of storing a lot of repetitions of a pattern, just store a single period and apply modulo operations to the indexes:
byte[] pattern = new byte[] { 0, 1, 1 };
// Get a "bit" at index "i", shifted right by "shift"
byte bit = pattern[(i - shift + 1000000 * pattern.Length) % pattern.Length];
The + 1000000 * pattern.Length term must be greater than the greatest expected shift and ensures that we get a positive sum.
This allows you to store patterns of virtually any length.
An optimization would be to store a mirrored version of the pattern. You could then shift left instead of right, which simplifies the index calculation:
byte bit = pattern[(i + shift) % pattern.Length];
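The same wraparound indexing can be sketched in C; here, instead of adding a large constant, the index is normalized with a double modulo (an equivalent way to guarantee a non-negative result):

```c
/* Look up a "bit" of the infinite repeating pattern {0, 1, 1},
   shifted right by `shift`. The ((x % len) + len) % len idiom
   normalizes C's possibly-negative remainder into [0, len). */
static const unsigned char pattern[] = {0, 1, 1};

unsigned char pattern_bit(int i, int shift) {
    int len = (int)(sizeof pattern / sizeof pattern[0]);
    return pattern[(((i - shift) % len) + len) % len];
}
```

For example, with shift = 1, indices 0, 1, 2 read back 1, 0, 1: the pattern moved one place to the right.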
Branchless answer, after a poke by @BenVoigt:
Get the last bit b by doing (n & 1);
Return n >> 1 | b << (8 * sizeof(n) - 1).
Original Answer:
Get the last bit b by doing (n & 1);
If b is 1, right shift the number by 1 bit and bitwise-OR it with 1 << (8 * sizeof(n) - 1);
If b is 0, just right shift the number by 1 bit.
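The branchless version of those steps is an ordinary rotate-right-by-one; a small C sketch for a 32-bit word (note that sizeof counts bytes, hence the factor of 8):

```c
#include <stdint.h>

/* Branchless right-rotate by one bit of a 32-bit word. */
uint32_t rotr1(uint32_t n) {
    uint32_t b = n & 1;                          /* grab the last bit      */
    return (n >> 1) | (b << (8 * sizeof n - 1)); /* feed it back in on top */
}
```

For instance, `rotr1(0x6DB6DB6Du)` gives 0xB6DB6DB6: the low 1 reappears as the new MSB. (As the question notes, a plain rotate only preserves the pattern when the word length is a multiple of the pattern period.)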
The problem was changed a bit through the comments.
For all reasonable n, the following problem can be solved efficiently after minimal pre-computation:
Given an offset k, get 64 bits starting at that position in the stream of bits that follows the pattern of (zero, n-1 ones) repeating.
Clearly the pattern repeats with a period of n, so only n different ulongs have to be produced for every given value of n. That could either be done explicitly, constructing all of them in pre-processing (they could be constructed in any obvious way; it doesn't really matter since that only happens once), or left more implicit by storing only two ulongs per value of n (this works under the assumption that n < 64, see below) and then extracting a range from them with some shifting/ORing.

Either way, use offset % n to compute which pattern to retrieve (since the offset is increasing in a predictable manner, no actual modulo operation is required[1]).
Even with the first method, memory consumption will be reasonable since this optimization is only an optimization for low n: in particular for n > 64 there will be fewer than 1 zero per word on average, so the "old fashioned way" of visiting every multiple of n and resetting that bit starts to skip work while the above trick would still visit every word and would not be able anymore to reset multiple bits at once.
[1]: if there are multiple n's in play at the same time, a possible strategy is keeping an array offsets where offsets[n] = offset % n, which could be updated according to: (not tested)
int next = offsets[n] + _64modn[n]; // 64 % n precomputed
offsets[n] = next - (((n - next - 1) >> 31) & n);
The idea being that n is subtracted whenever next >= n. Only one subtraction is needed since the offset and thing added to the offset are already reduced modulo n.
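A small C sketch of that update rule (the same trick carries over to C#, where >> on int is also an arithmetic shift):

```c
/* Branchless update of offset = (offset + step) % n, assuming both the
   stored offset and the precomputed step (e.g. 64 % n) are already < n. */
int advance_offset(int offset, int step_mod_n, int n) {
    int next = offset + step_mod_n;             /* next is in [0, 2n)       */
    return next - (((n - next - 1) >> 31) & n); /* subtract n iff next >= n */
}
```

When next >= n, the expression n - next - 1 is negative, so the arithmetic shift produces all-ones and the mask selects n; otherwise it selects 0 and nothing is subtracted.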
This offset-increment can be done with System.Numerics.Vectors, which is very feature-poor compared to actual hardware but is just about able to do this. It can't do the shift (yes, it's weird) but it can implement a comparison in a branchless way.
Doing one pass per value of n is easier, but touches a lot of memory in a cache-unfriendly manner. Doing lots of different n at the same time may not be great either. I guess you'd just have to benchmark that.
Also you could consider hard-coding it for some low numbers, something like offset % 3 is fairly efficient (unlike offset % variable). This does take manual loop-unrolling which is a bit annoying, but it's actually simpler, just big in terms of lines of code.
In counting the number of bits in a word, a brute force would be something like this:
int CountNumSetBits(unsigned long n)
{
unsigned short num_setbits = 0;
while (n)
{
num_setbits += n & 1;
n >>= 1;
}
return num_setbits;
}
The big O speed would be O(n) where n is the number of bits in the Word.
I thought of another way of writing the algorithm, taking advantage of the fact that we can obtain the first occurrence of a set bit using y = x & ~(x - 1):
int CountNumSetBitsMethod2(unsigned long n)
{
unsigned short num_setbits = 0;
int y = 0;
while (n)
{
y = n & ~(n - 1); // get first occurrence of '1'
if (y)            // if we have a set bit, increment our counter
++num_setbits;
n ^= y;           // erase the first occurrence of '1'
}
return num_setbits;
}
If we assume that our inputs are 50% 1's and 50% 0's, it appears that the second algorithm could be twice as fast. However, the actual complexity is greater:
In method one we do the following for each bit:
1 add
1 and
1 shift
In method two we do the following for each set bit:
1 and
1 complement
1 subtraction (the result of the subtraction has to be copied to another reg)
1 compare
1 increment (if compare is true)
1 XOR
Now, in practice one can determine which algorithm is faster by performing some profiling; that is, using a stopwatch mechanism and some test data, and calling each algorithm, say, a million times.
What I want to do first, however, is see how well I can estimate the speed difference by eyeballing the code (given same number of set and unset bits).
If we assume that the subtraction takes approximately the same number of cycles as the add, and all the other operations are equal cycle-wise, can one conclude that each algorithm takes about the same amount of time?
Note: I am assuming here we cannot use lookup tables.
The second algorithm can be greatly simplified:
int CountNumSetBitsMethod2(unsigned long n) {
unsigned short num_setbits = 0;
while (n) {
num_setbits++;
n &= n - 1;
}
return num_setbits;
}
There are many more ways to compute the number of bits set in a word:
Using lookup tables for multiple bits at a time
Using 64-bit multiplications
Using parallel addition
Using extra tricks to shave a few cycles.
Trying to determine empirically which is faster by counting cycles is not so easy because even looking at the assembly output, it is difficult to assess the impact of instruction parallelisation, pipelining, branch prediction, register renaming and contention... Modern CPUs are very sophisticated! Furthermore, the actual code generated depends on the compiler version and configuration and the timings depend on the CPU type and release... Not to mention the variability linked to the particular sets of values used (for algorithms with variable numbers of instructions).
Benchmarking is a necessary tool, but even careful benchmarking may fail to model the actual usage correctly.
Here is a great site for this kind of bit twiddling games:
http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive
I suggest you implement the different versions and perform comparative benchmarks on your system. There is no definite answer, only local optima for specific sets of conditions.
Some amazing finds:
// option 3, for at most 32-bit values in v:
c = ((v & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += (((v & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += ((v >> 24) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
A more classic one, usually considered the best method for counting bits in a 32-bit integer v:
v = v - ((v >> 1) & 0x55555555); // reuse input as temporary
v = (v & 0x33333333) + ((v >> 2) & 0x33333333); // temp
c = ((v + (v >> 4) & 0xF0F0F0F) * 0x1010101) >> 24; // count
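Wrapped as a function (in C, with the implicit precedence made explicit), the classic parallel-addition method looks like this:

```c
#include <stdint.h>

/* The classic parallel-addition popcount: sum bits in pairs,
   then nibbles, then use a multiply to sum the four byte counts. */
unsigned popcount32(uint32_t v) {
    v = v - ((v >> 1) & 0x55555555);                /* 2-bit sums  */
    v = (v & 0x33333333) + ((v >> 2) & 0x33333333); /* 4-bit sums  */
    return (((v + (v >> 4)) & 0x0F0F0F0F) * 0x01010101) >> 24;
}
```

For example, popcount32(0x12345678) returns 13, and popcount32(0xFFFFFFFF) returns 32.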
First, the only way to know how fast things are is to measure them.
Second, to find the number of set bits in some bytes, build a lookup table for the number of set bits in a byte:
0->0
1->1
2->1
3->2
4->1
etc.
This is a common method and very fast. You can code the table by hand or create it at startup.
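A C sketch of the create-at-startup variant; the table is filled using the recurrence that the count for i is the count for i/2 plus the lowest bit of i:

```c
#include <stdint.h>

/* Byte-wide popcount lookup table, built once at startup. */
static unsigned char bits_set[256];

void init_bits_set(void) {
    for (int i = 1; i < 256; i++)
        bits_set[i] = bits_set[i >> 1] + (unsigned char)(i & 1);
}

/* Count set bits in a 32-bit word, one byte at a time. */
unsigned popcount_lut(uint32_t v) {
    return bits_set[v & 0xFF] + bits_set[(v >> 8) & 0xFF]
         + bits_set[(v >> 16) & 0xFF] + bits_set[v >> 24];
}
```

After calling init_bits_set() once, popcount_lut(0xFF) returns 8 and popcount_lut(0x80000001) returns 2.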
I was trying to recreate my C++ factor program from a few years ago in my new language C#. All I could remember is that it possibly involved a modulo, and possibly didn't. I knew that it involved at least one for and if statement. However, when I started trying to recreate it I kept getting nothing near what should be. I thought it had something to do with me not understanding loops, but it turns out I understand loops just fine. What I don't understand is how to use the modulo when performing math operations.
for instance what am I doing when I say something like:
(ignore that it might not actually work, it's just an example)
if(12 % 2 == 0)
{
Console.WriteLine("I don't understand.");
}
This kind of thing I don't quite have a grasp of yet. I realize that it is taking the remainder, and that's all I can grasp, not how it's actually used in real programming. I managed to get my factor program to work in C# after a bit of thinking and tinkering, it again doesn't mean I understand this operator or its uses. I no longer have access to the old C++ file.
The % (modulo) operator yields the remainder from the division. In your example the remainder is equal to 0 and the if evaluates to true (0 == 0). A classic example is when it's used to see if a number is even or not.
if (number % 2 == 0) {
// even
} else {
// odd
}
Think of modulo like a circle with a pointer (a spinner); the easiest example is a clock face. Notice how at the top it is zero.
The modulo function maps any value to one of those values on the spinner, think of the value to the left of the % as the number of steps around the spinner, and the second value as the number of total steps in the spinner, so we have the following.
0 % 12 = 0
1 % 12 = 1
12 % 12 = 0
13 % 12 = 1
We always start at 0.
So if we go 0 steps around a 12 step spinner we are still at 0, if we go 1 step from zero we are on 1, if we go 12 steps we are back at 0. If we go 13 we go all the way around and end at 1 again.
I hope this helps you visualize it.
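The spinner idea can be written as a tiny C helper; the double modulo here is an extra safeguard so that negative step counts (going backwards around the clock) also land on a valid position:

```c
/* Map any number of steps around a 12-position spinner to a position.
   ((x % 12) + 12) % 12 also handles negative step counts. */
int clock_pos(int steps) {
    return ((steps % 12) + 12) % 12;
}
```

So clock_pos(12) is 0, clock_pos(13) is 1, and clock_pos(-1) wraps backwards to 11.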
It helps when you are using structures like an array and you want to cycle through them. Imagine you have an array of the days of the week: 7 elements (Monday through Sunday). You want to always display the day 3 days from the current day. Say today is Tuesday, so the array element is days[1]; to get the day 3 days from now we do days[1 + 3]. That's fine, but what if we are at Saturday (days[5]) and want to get 3 days from there? Then days[5 + 3] is an index-out-of-bounds error, since our array has only 7 elements (max index of 6) and we tried to access the 8th element.
However, knowing what you know about modulos and spinners now you can do the following:
string threeDaysFromNow = days[(currentDay + 3)%7]; When it goes over the bounds of the array, it wraps around and starts at the beginning again. There are many applications for this. Just remember the visualization of spinners, that is when it clicked in my head.
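The same wraparound in C (the day names here are just illustrative placeholders):

```c
/* Days of the week, Monday first. */
static const char *days[7] =
    {"Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"};

/* Wrap around the end of the week instead of running off the array. */
const char *n_days_from(int currentDay, int n) {
    return days[(currentDay + n) % 7];
}
```

Three days after Saturday (index 5) gives index (5 + 3) % 7 = 1, i.e. Tuesday, instead of an out-of-bounds access.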
The modulo operator % returns the remainder of a division operation. For example, where 13 / 5 = 2, 13 % 5 = 3 (using integer math).
It's a common tactic to check a value against % 2 to see if it is even. If it is even, the remainder will be 0, otherwise it will be 1.
As for your specific use of it, you are doing 12 % 2 which is not only 0, but will always be 0. That will always make the if condition 12 % 2 == 0 true, which makes the if rather redundant.
As mentioned, it's commonly used for checking even/odd, but you can also use it to run loop bodies at intervals or to split files into fixed-size chunks. I personally use mod for clock-face type problems, as my data often navigates a circle.
Registers work the same way: an 8-bit register rolls over at 2^8, so you can force a value to fit a register-sized variable with var = var % 256.
The last thing I know mod is used for is checksums and random number generation, but I haven't gone into the why for those at all.
An example where you could use this is in indexing arrays in certain for loops. For example, take the simple equation that defines the new pixel value of a resampled image using bicubic interpolation:

p(x, y) = sum over i = 0..3 and j = 0..3 of a_ij * x^i * y^j

Don't worry what bicubic interpolation exactly is for the moment; we're just concerned with executing what seem to be two simple for loops: one for index i and one for index j. Note that the vector 'a' is 16 numbers long.
A simple for loop someone would try could be:
int n= 0;
for(int i = 0; i < 4; ++i)
{
for(int j = 0; j < 4; ++j)
{
pxy += a[n] * pow(x,i) * pow(y,j); // p(x,y)
n++; // n = 15 when finished
}
}
Or you could do it in one for loop:
for(int i = 0; i < 16; ++i)
{
int i_new = floor(i / 4.0); // i_new provides indices 0-3 incrementing every 4 iterations of loop
int j_new = i % 4; // j_new is reset to 0 when i is a multiple of 4
pxy += a[i] * pow(x,i_new) * pow(y,j_new); // p(x,y)
}
Printing i_new and j_new in the loop:
i_new j_new
0 0
0 1
0 2
0 3
1 0
1 1
1 2
1 3
2 0
2 1
2 2
2 3
3 0
3 1
3 2
3 3
As you can see, % can be very useful.
My question is: is there a way in C#, given a starting bit location, to find the next binary digit within a byte that has a specified value of 0 or 1, without iteration? (I'm looking for the highest-performance option.)
As an example, if you had 10011 and started at the first bit (far right) and searched for the first 0, it would be the 3rd place going right to left. If you then started at the 3rd place and wanted to find the next 1, it would be at the 5th place (far left).
Thanks for any help and feel free to let me know if I need to provide anything further.
Edit: Here is my current code.
private int GetBinarySegment(uint uiValue, int iStart, int iMaxBits, byte bValue)
{
int r = 0; uiValue >>= iStart;
if (uiValue == 0) return iMaxBits - iStart;
while ((uiValue & 1) == bValue) { uiValue >>= 1; r++; }
return r;
}
There are ways, but they're ugly because there's no _BitScanForward or equivalent intrinsic. Still, you can actually compute this thing efficiently without needing a huge table.
First step: make a number that has a 1 at the position you're searching for and 0 everywhere else.
If searching for a 1, that means x & -x. If searching for a 0, use ~x & (x + 1).
Then, use one of the many ways to emulate either bitscan (there is only one set bit now, so it doesn't matter which side you search from). Some ways to do that are detailed here (not in C#, but you can convert them).
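Combining both steps in C: isolate the sought bit, then locate it with a De Bruijn multiply, one of the branchless bit-scan emulations referred to above (the constant 0x077CB531 and its table are the standard 32-bit De Bruijn pair). Note this assumes the sought bit actually exists in the word:

```c
#include <stdint.h>

/* Position table for the De Bruijn constant 0x077CB531. */
static const int debruijn_pos[32] = {
     0,  1, 28,  2, 29, 14, 24,  3, 30, 22, 20, 15, 25, 17,  4,  8,
    31, 27, 13, 23, 21, 19, 16,  7, 26, 12, 18,  6, 11,  5, 10,  9
};

/* Return the position of the lowest 1-bit (want_one != 0)
   or the lowest 0-bit (want_one == 0) of x. */
int next_bit(uint32_t x, int want_one) {
    uint32_t iso = want_one ? (x & (0u - x))   /* isolate lowest 1-bit */
                            : (~x & (x + 1));  /* isolate lowest 0-bit */
    return debruijn_pos[(iso * 0x077CB531u) >> 27];
}
```

For the question's example, x = 0b10011 = 19: next_bit(19, 0) is 2 (the first 0, counting from bit 0), and to resume the search above bit k you can shift first and add k back, e.g. next_bit(19 >> 3, 1) + 3 gives 4.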
Use a lookup table. That is, precalculate a 2D array indexed by byte value and current position. You can do a separate table for zeros and ones, or you can combine it.
So for your example, you start at bit 0 of the number 19. That happens to be a 1. So if you lookup nextBit[19][0] it should return 1, and so on. Here's what a combined lookup table might look like. It shows the next bit for both 0s and 1s:
nextBit[19][0] = 1 // 1
nextBit[19][1] = 4 // 1
nextBit[19][2] = 3 // 0
nextBit[19][3] = 4 // 0
nextBit[19][4] = 0 // 1
nextBit[19][5] = 6 // 0
nextBit[19][6] = 7 // 0
Obviously there is no 'next' for bit 7, and if 'next' returns 0, there are no more of that particular bit.
I may have interpreted your question incorrectly, but this technique can be modified to suit your purposes. I initially thought you wanted to navigate through all 1-bits or 0-bits. If instead you want to skip over consecutive 1-bits, then you just arrange your table in that way. Or indeed, you can have a 'next' for both 0 and 1 at each position.
I'm working on a data structure which subdivides items into quadrants, and one of the bottlenecks I've identified is my method to select the quadrant of the point. Admittedly, it's fairly simple, but it's called so many times that it adds up. I imagine there's got to be an efficient way to bit twiddle this into what I want, but I can't think of it.
private int Quadrant(Point p)
{
if (p.X >= Center.X)
return p.Y >= Center.Y ? 0 : 3;
return p.Y >= Center.Y ? 1 : 2;
}
Center is of type Point, coordinates are ints. Yes, I've run a code profile, and no, this isn't premature optimization.
Because this is only used internally, I suppose my quadrants don't have to be in Cartesian order, as long as they range from 0-3.
In C/C++ the fastest way would be:
(((unsigned int)x >> 30) & 2) | ((unsigned int)y >> 31)
(30/31 or 62/63, depending on size of int).
This will give the quadrants in order 0, 2, 3, 1.
Edit for LBushkin:
(((unsigned int)(x - center.x) >> 30) & 2) | ((unsigned int)(y-center.y) >> 31)
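As a self-contained C function (for 32-bit int coordinates; the unsigned casts make the shifts pull out the sign bits):

```c
/* Sign-bit quadrant selection relative to a center point.
   Returns 0, 2, 3, 1 for Cartesian quadrants I, II, III, IV,
   with points on the axes counted as non-negative. */
int quadrant(int x, int y, int cx, int cy) {
    return (int)((((unsigned)(x - cx) >> 30) & 2u)
               | ((unsigned)(y - cy) >> 31));
}
```

Shifting the x difference right by 30 (not 31) parks its sign bit in bit 1, so the two sign bits assemble directly into a 2-bit quadrant index.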
I don't know that you can make this code dramatically faster in C#. What you may be able to do, however, is look at how you're processing points and see if you can avoid making unnecessary calls to this method. Perhaps you could create a QuadPoint structure that stores which quadrant a point is in (after you compute it once), so that you don't have to do so again.
But, admittedly, this depends on what your algorithm is doing, and whether it's possible to store/memoize the quadrant information. If every point is completely unique, this obviously won't help.
I've just been told about the solution which produces 0,1,2,3 quadrant results ordered correctly:
#define LONG_LONG_SIGN (sizeof(long long) * 8 - 1)
double dx = point.x - center.x;
double dy = point.y - center.y;
long long *pdx = (void *)&dx;
long long *pdy = (void *)&dy;
int quadrant = ((*pdy >> LONG_LONG_SIGN) & 3) ^ ((*pdx >> LONG_LONG_SIGN) & 1);
This solution is for x,y coordinates of double type.
I've done some performance testing of this method against the branching method from the original question: my results are that the branching method is always a bit faster (I currently see a stable 160/180 timing ratio), so I prefer the branching method over the bitwise one.
UPDATE
If someone is interested, all three algorithms were merged into EKAlgorithms C/Objective-C repository as "Cartesian quadrant selection" algorithms:
Original branching algorithm
Bitwise algorithm by @ruslik from the accepted answer.
Alternative bitwise algorithm promoted by one of my colleagues, which is a bit slower than the second algorithm but returns quadrants in the correct order.
All algorithms there are optimized to work with double-typed points.
Performance testing showed us that in general the first, branching algorithm is the winner on Mac OS X, though on a Linux machine we did see the second algorithm performing slightly faster than the branching one.
So, general conclusion is to stick with branching algorithm because bitwise versions do not give any performance gain.
My first try would be to get rid of the nested conditional.
int xi = p.X >= Center.X ? 1 : 0;
int yi = p.Y >= Center.Y ? 2 : 0;
int quadrants[4] = { ... };
return quadrants[xi+yi];
The array lookup in quadrants is optional if the quadrants are allowed to be renumbered. My code still needs two comparisons but they can be done in parallel.
I apologise in advance for any C# errors as I usually code C++.
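Since the answer above is written half in C++, here is the same idea as compilable C, with the lookup table filled in so that it reproduces the asker's original 0..3 numbering (the table contents are derived from the original method's cases, not given in the answer):

```c
/* Two independent comparisons feed a 4-entry table that maps
   (xi, yi) back to the asker's original quadrant numbering:
   (x>=cx, y>=cy) -> 0, (x<cx, y>=cy) -> 1,
   (x<cx, y<cy)   -> 2, (x>=cx, y<cy) -> 3. */
int quadrant_lut(int px, int py, int cx, int cy) {
    static const int quadrants[4] = {2, 3, 1, 0};
    int xi = (px >= cx) ? 1 : 0;
    int yi = (py >= cy) ? 2 : 0;
    return quadrants[xi + yi];
}
```

The two comparisons have no data dependence on each other, which is what lets a modern CPU evaluate them in parallel.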
Perhaps something more efficient is possible when two unsigned 31 bit coordinates are stored in a 64 bit unsigned long variable.
// The following two lines are unnecessary
// if you store your coordinated in unsigned longs right away
unsigned long Pxy = (((unsigned long)P.x) << 32) + P.y;
unsigned long Centerxy = (((unsigned long)Center.x) << 32) + Center.y;
// This is the actual calculation, only 1 subtraction is needed.
// The or-ing with ones has only to be done once for a repeated use of Centerxy.
unsigned long diff = (Centerxy | (1UL << 63) | (1UL << 31)) - Pxy;
int quadrant = ((diff >> 62)&2) | ((diff >> 31)&1);
Taking a step back, a different solution is possible: do not arrange your data structure split into quadrants right away, but split alternately in each direction. This is also done in the related k-d tree.
I am working on a little Hardware interface project based on the Velleman k8055 board.
The example code comes in VB.Net and I'm rewriting this into C#, mostly to have a chance to step through the code and make sense of it all.
One thing has me baffled though:
At one stage they read all digital inputs and then set a checkbox based on the answer to the read digital inputs (which come back in an Integer) and then they AND this with a number:
i = ReadAllDigital
cbi(1).Checked = (i And 1)
cbi(2).Checked = (i And 2) \ 2
cbi(3).Checked = (i And 4) \ 4
cbi(4).Checked = (i And 8) \ 8
cbi(5).Checked = (i And 16) \ 16
I have not done Digital systems in a while and I understand what they are trying to do but what effect would it have to AND two numbers? Doesn't everything above 0 equate to true?
How would you translate this to C#?
This is doing a bitwise AND, not a logical AND.
Each of those basically determines whether a single bit in i is set, for instance:
5 AND 4 = 4
5 AND 2 = 0
5 AND 1 = 1
(Because 5 = binary 101, and 4, 2 and 1 are the decimal values of binary 100, 010 and 001 respectively.)
I think you'll have to translate it to this (note the parentheses: in C#, == binds tighter than &):
(i & 1) == 1
(i & 2) == 2
(i & 4) == 4
etc...
This is using the bitwise AND operator.
When you use the bitwise AND operator, this operator will compare the binary representation of the two given values, and return a binary value where only those bits are set, that are also set in the two operands.
For instance, when you do this:
2 & 2
It will do this:
0010 & 0010
And this will result in:
0010
0010
&----
0010
Then if you compare this result with 2 (0010), it will of course return true.
Just to add:
It's called bitmasking
http://en.wikipedia.org/wiki/Mask_(computing)
A boolean only requires 1 bit, but in most programming languages' implementations a boolean takes more than a single bit. On a PC this isn't a big waste, but embedded systems usually have very limited memory, so the waste is really significant. To save space, booleans are packed together; this way a boolean variable takes up only 1 bit.
You can think of it as doing something like an array indexing operation, with a byte (= 8 bits) becoming like an array of 8 boolean variables, so maybe that's your answer: use an array of booleans.
Think of this in binary e.g.
10101010
AND
00000010
yields 00000010
i.e. not zero. Now if the first value was
10101000
you'd get
00000000
i.e. zero.
Note the further division to reduce everything to 1 or 0.
(i And 16) \ 16 extracts the value (1 or 0) of the 5th bit:
1xxxx And 16 = 16, and 16 \ 16 = 1
0xxxx And 16 = 0, and 0 \ 16 = 0
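The same mask-and-divide trick written out in C (it works for any single-bit mask, because the AND leaves either 0 or exactly the mask value):

```c
/* Extract the 0-or-1 value of the bit selected by a single-bit mask:
   (i & mask) is either 0 or mask, so dividing by mask gives 0 or 1. */
int bit_from_mask(unsigned i, unsigned mask) {
    return (int)((i & mask) / mask);
}
```

For example, bit_from_mask(21, 16) is 1 and bit_from_mask(5, 2) is 0. In practice a shift, (i >> k) & 1, does the same job without the division.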
The And operator performs a "...bitwise conjunction on two numeric expressions", which maps to '&' in C#. The '\' is integer division; the equivalent in C# is '/', provided that both operands are integer types.
The constant numbers are masks (think of them in binary). So what the code does is apply the bitwise AND operator on the byte and the mask and divide by the number, in order to get the bit.
For example:
xxxxxxxx & 00000100 = 00000x00
if x == 1
00000x00 / 00000100 = 00000001
else if x == 0
00000x00 / 00000100 = 00000000
In C# use the BitArray class to directly index individual bits.
To set an individual bit i is straightforward:
b |= 1 << i;
To reset an individual bit i is a little more awkward:
b &= ~(1 << i);
Be aware that both the bitwise operators and the shift operators tend to promote everything to int which may unexpectedly require casting.
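The set/reset idioms above, collected into small C helpers (in C the promotion issue shows up as well, which is why the literal is written 1u):

```c
/* Set, clear and test an individual bit i of word b. */
unsigned set_bit(unsigned b, int i)   { return b |  (1u << i); }
unsigned clear_bit(unsigned b, int i) { return b & ~(1u << i); }
int      test_bit(unsigned b, int i)  { return (b >> i) & 1; }
```

So set_bit(0, 3) gives 8, clear_bit(0xFF, 0) gives 0xFE, and test_bit(5, 2) gives 1.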
As said this is a bitwise AND, not a logical AND. I do see that this has been said quite a few times before me, but IMO the explanations are not so easy to understand.
I like to think of it like this:
Write up the binary numbers under each other (here I'm doing 5 and 1):
101
001
Now we need to turn this into a binary number, where all the 1's from the 1st number, that is also in the second one gets transfered, that is - in this case:
001
In this case we see it gives the same number as the 2nd number, in which this operation (in VB) returns true. Let's look at the other examples (using 5 as i):
(5 and 2)
101
010
----
000
(false)
(5 and 4)
101
100
---
100
(true)
(5 and 8)
0101
1000
----
0000
(false)
(5 and 16)
00101
10000
-----
00000
(false)
EDIT: and obviously I missed the entire point of the question - here's the translation to C# (with parentheses, since == binds tighter than &):
cbi[1].Checked = (i & 1) == 1;
cbi[2].Checked = (i & 2) == 2;
cbi[3].Checked = (i & 4) == 4;
cbi[4].Checked = (i & 8) == 8;
cbi[5].Checked = (i & 16) == 16;
I prefer to use hexadecimal notation when bit twiddling (e.g. 0x10 instead of 16). It makes more sense as you increase your bit depths as 0x20000 is better than 131072.