C# possibilities tree from integers array


How can I build a possibilities tree from an integer array in C#? I need to generate all possible variants of the array, where every step deletes one element from the array.
For example, if we have the array of three integers [1,2,3], then the tree should look like this: (tree view image)

I would approach this as a binary arithmetic problem:
using System;

class Program
{
    static void Main(string[] args)
    {
        int[] arr = { 1, 2, 3 };
        PickElements(0, arr);
    }

    static void PickElements<T>(int depth, T[] arr, int mask = -1)
    {
        int bits = Math.Min(32, arr.Length);
        // keep just the bits from mask that are represented in arr
        // (C# masks shift counts to 5 bits, so -1 << 32 would be -1; 32 is handled separately)
        mask &= bits == 32 ? -1 : ~(-1 << bits);
        if (mask == 0) return;

        // UI: write the options
        for (int i = 0; i < depth; i++)
            Console.Write('>'); // indent to depth
        for (int i = 0; i < bits; i++)
        {
            if ((mask & (1 << i)) != 0)
            {
                Console.Write(' ');
                Console.Write(arr[i]);
            }
        }
        Console.WriteLine();

        // recurse, taking away one bit (naive and basic bit sweep)
        for (int i = 0; i < bits; i++)
        {
            // try and subtract each bit separately; if it
            // is different, recurse
            var childMask = mask & ~(1 << i);
            if (childMask != mask) PickElements(depth + 1, arr, childMask);
        }
    }
}
For a TreeView, simply replace the Console.Write etc with node creation, presumably passing the parent node in (and down) as part of the recursion (in place of depth, perhaps).
To see what this is doing, consider the binary representation; -1 is:
11111111111111...111111111111111
We then look at bits, which we derive from the array length and which is 3 in this example. We only need to look at 3 bits, then; the line:
~(-1 << bits)
computes a mask for this, because:
-1 = 1111111....1111111111111
(-1 << 3) = 1111111....1111111111000 (left-shift back-fills with 0)
~(-1 << 3) = 0000000....0000000000111 (binary inverse)
We then apply this to our input mask via mask &= ..., so we're only ever looking at the least significant 3 bits. If that turns out to be zero, we've run out of things to do, so we stop recursing.
The UI update is simple enough; we just scan over the 3 bits that we care about, checking whether the current bit is "on" for our mask; 1 << i creates a mask with just the "i-th set bit"; the & and != 0 checks whether that bit is set. If it is, we include the element in the output.
Finally, we need to start taking away bits, to look at the sub-tree; we could probably be more sophisticated about this, but I chose just to scan all the bits and try them - worst case this is 32 bit tests per level, which is nothing. As before, 1 << i creates a mask of just the "i-th set bit". This time we want to disable that bit, so we "negate" and "and" via mask & ~(...). It is possible that this bit was already disabled, so the childMask != mask check ensures we only actually recurse when we have disabled a bit that was previously enabled.
The end result is that we end up with the masks being successively:
11..1111111111111111 (special case for first call; all set)
110 (first bit disabled)
100 (first and second bits disabled)
010 (first and third bits disabled)
101 (second bit disabled)
100 (second and first bits disabled)
001 (second and third bits disabled)
011 (third bit disabled)
010 (third and first bits disabled)
001 (third and second bits disabled)
Note that for a simpler combination example, it would be possible to just iterate in a single for loop, using the bits to pick elements; however, I've done it recursively because we need to build a tree of successive subtractions, rather than just flat possibilities in no particular order.
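For comparison, here is a C port of the same recursion (my own sketch, not the answer's code). It uses an unsigned mask, limits the array to at most 31 elements so that 1u << n stays well defined, and returns the number of nodes it printed; that counter and the name pick_elements are additions for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Print one line per node of the "delete one element per step" tree for
   arr[0..n-1], indented with '>' by depth; return how many nodes were printed. */
static int pick_elements(const int *arr, int n, int depth, unsigned mask)
{
    assert(n >= 0 && n < 32);          /* keeps 1u << n well defined */
    mask &= (1u << n) - 1u;            /* keep only bits that map to elements */
    if (mask == 0)
        return 0;                      /* nothing left: stop recursing */

    for (int i = 0; i < depth; i++)
        putchar('>');                  /* indent to depth */
    for (int i = 0; i < n; i++)
        if (mask & (1u << i))
            printf(" %d", arr[i]);     /* element i is still present */
    putchar('\n');

    int nodes = 1;
    for (int i = 0; i < n; i++) {
        unsigned child = mask & ~(1u << i);   /* try removing element i */
        if (child != mask)                    /* only if it was present */
            nodes += pick_elements(arr, n, depth + 1, child);
    }
    return nodes;
}
```

With int arr[] = {1, 2, 3}, calling pick_elements(arr, 3, 0, ~0u) prints " 1 2 3", "> 2 3", ">> 3", ">> 2", "> 1 3", and so on, visiting the ten non-empty masks listed above and returning 10.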

Related

What does the << mean?

Thanks for taking a look at this question.
I saw the following piece of code inside a traditional for block, but was not sure what its significance was inside its context.
index <<= 1;
For further context, here is the full block of code.
ulong index = 1;
int distance = 0;
for (int i = 0; i < 64; i++)
{
    if ((hash1 & index) != (hash2 & index))
    {
        distance++;
    }
    index <<= 1;
}
Is it simply making sure that index is still 1 and, if it isn't, returning its value to 1?
Secondly, what is this called, so I can read up on it some more?
Finally, thank you for your time and consideration.

The code in question is spinning through a pair of 64-bit hashes (probably as ulongs, like the index), and checking how many bits differ between them. I'm going to use 4-bit values for example purposes, but the principle is the same.
if ((hash1 & index) != (hash2 & index))
The & operator is doing a bitwise-AND operation. When the hash is ANDed with the index value, you get either 0 or the index value back, depending on whether that specific bit was 0 or 1. (1010 & 0010 == 0010 and 1010 & 0100 == 0000).
If both ANDs produce a 0, or both produce the index value, then the two bits of the hash match. Otherwise, they don't, and we distance++; to indicate that they are off by one more bit than we knew of before.
index <<= 1;
This line merely bumps the index digit to the next bit. It does this by taking the old index (which starts as 1, equal to 0001), and left shifting by one place (<< 1), then setting that back into the index variable (<<= instead of <<). So after the first loop, index will be 0010, then 0100, and so on.
This has the effect of multiplying by 2, but that's not its intended use here.
So overall, you'd get a distance of 2 by running 0011 and 1111 through this algorithm, because two bits are different.
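The loop from the question, wrapped as a self-contained C function so the 4-bit example can be checked with full 64-bit values. The name hamming_distance is mine, and the hashes are assumed to be unsigned 64-bit integers.

```c
#include <stdint.h>

/* Count the bit positions at which two 64-bit hashes differ, using the
   question's walking single-bit mask. */
static int hamming_distance(uint64_t hash1, uint64_t hash2)
{
    uint64_t index = 1;   /* 0...0001, then 0...0010, 0...0100, ... */
    int distance = 0;
    for (int i = 0; i < 64; i++) {
        if ((hash1 & index) != (hash2 & index))
            distance++;   /* this bit differs between the two hashes */
        index <<= 1;      /* move the test bit one place to the left */
    }
    return distance;
}
```

Since index is unsigned, the final shift (which moves the bit past position 63) simply produces 0, and the loop ends right after it.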

The code
index <<= 1;
is a left shift by one bit. In this case it has the same effect as multiplying by two. But see comments for cautions.

Setting all low order bits to 0 until two 1s remain (for a number stored as a byte array)

I need to set all the low order bits of a given BigInteger to 0 until only two 1 bits are left. In other words leave the highest and second-highest bits set while unsetting all others.
The number could be any combination of bits. It may even be all 1s or all 0s. Example:
MSB 0000 0000
1101 1010
0010 0111
...
...
...
LSB 0100 1010
We can easily take out corner cases such as 0, 1, powers of 2, etc. I'm not sure how to apply popular bit manipulation algorithms to an array of bytes representing one number.
I have already looked at bithacks but have the following constraints: the BigInteger structure only exposes its underlying data through the ToByteArray method, which is itself expensive and unnecessary. Since there is no way around this, I don't want to slow things down further by implementing a bit counting algorithm optimized for 32/64 bit integers (which most are).
In short, I have a byte[] representing an arbitrarily large number. Speed is the key factor here.
NOTE: In case it helps, the numbers I am dealing with have around 5,000,000 bits. They keep on decreasing with each iteration of the algorithm so I could probably switch techniques as the magnitude of the number decreases.
Why I need to do this: I am working with a 2D graph and am particularly interested in coordinates whose x and y values are powers of 2. So (x+y) will always have two bits set and (x-y) will always have consecutive bits set. Given an arbitrary coordinate (x, y), I need to transform an intersection by getting values with all bits unset except the first two MSB.

Try the following (not sure if it's actually valid C#, but it should be close enough):
// find the next non-zero byte at or below index i (I'm assuming little endian) or return -1
static int find_next_byte(byte[] data, int i) {
    while (i >= 0 && data[i] == 0) --i;
    return i;
}
// find a bit mask of the next non-zero bit at or below mask b, or return 0
static int find_next_bit(int value, int b) {
    while (b > 0 && ((value & b) == 0)) b >>= 1;
    return b;
}

byte[] data = bigint.ToByteArray(); // little endian: data[0] is the least significant byte
int i = find_next_byte(data, data.Length - 1);
if (i >= 0) { // i < 0 means the number is zero: nothing to do
    // find the first 1 bit
    int b = find_next_bit(data[i], 1 << 7);
    // try to find the second 1 bit
    b = find_next_bit(data[i], b >> 1);
    if (b > 0) {
        // found 2 bits, removing the rest
        if (b > 1) data[i] = (byte)(data[i] & ~(b - 1));
    } else {
        // we only found 1 bit, find the next non-zero byte
        i = find_next_byte(data, i - 1);
        if (i >= 0) {
            b = find_next_bit(data[i], 1 << 7);
            if (b > 1) data[i] = (byte)(data[i] & ~(b - 1));
        }
    }
    // remove the rest (for i >= 0, Array.Clear(data, 0, i) does the same in one call)
    for (--i; i >= 0; --i) data[i] = 0;
}
Untested.
Probably this would be a bit more performant if compiled as unmanaged code or even with a C or C++ compiler.
As harold noted correctly, if you have no a priori knowledge about your number, this O(n) method is the best you can do. If you can, you should keep the position of the highest two non-zero bytes, which would drastically reduce the time needed to perform your transformation.
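A C sketch of the same O(n) scan, assuming the little-endian byte order that BigInteger.ToByteArray produces (data[0] is the least significant byte). The name keep_top_two_bits is hypothetical, and the sketch walks bit by bit instead of using the answer's mask helpers; once two bits are kept, the remaining lower bytes are cleared in one memset.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Keep only the two most significant set bits of a little-endian byte array
   and clear every other bit in place. */
static void keep_top_two_bits(uint8_t *data, size_t len)
{
    int kept = 0;                         /* set bits kept so far (0, 1 or 2) */
    size_t i = len;
    while (i > 0 && kept < 2) {
        i--;                              /* next byte, most significant first */
        uint8_t out = 0;
        for (int bit = 7; bit >= 0 && kept < 2; bit--) {
            if (data[i] & (1u << bit)) {
                out |= (uint8_t)(1u << bit);
                kept++;
            }
        }
        data[i] = out;                    /* this byte keeps at most its top bits */
    }
    memset(data, 0, i);                   /* every byte below index i is cleared */
}
```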

I'm not sure whether this is getting optimised out, but this code appears to be 16x faster than ToByteArray. It also avoids the memory copy, and you get the results as uint instead of byte, so you should see further improvements there.
using System;
using System.Linq.Expressions;
using System.Numerics;

// create a delegate to read the private _bits field
// (an implementation detail of BigInteger; it is null for small values, which are stored in _sign)
var par = Expression.Parameter(typeof(BigInteger));
var bits = Expression.Field(par, "_bits");
var lambda = Expression.Lambda(bits, par);
var func = (Func<BigInteger, uint[]>)lambda.Compile();
// test call our delegate
var bigint = BigInteger.Parse("3498574578238348969856895698745697868975687978");
int time = Environment.TickCount;
for (int y = 0; y < 10000000; y++)
{
    var x = func(bigint);
}
Console.WriteLine(Environment.TickCount - time);
// compare time to ToByteArray
time = Environment.TickCount;
for (int y = 0; y < 10000000; y++)
{
    var x = bigint.ToByteArray();
}
Console.WriteLine(Environment.TickCount - time);
From there, finding the top 2 bits should be pretty easy. The array is stored least significant uint first, so the highest set bit is in the last element; then it is just a matter of searching for the second-highest bit. If it is in the same uint, just set the highest bit to zero and find the topmost bit again; otherwise, search downward for the next non-zero uint and find its topmost bit.
EDIT: to make things simple, just copy/paste this class into your project. It defines extension methods, so you can just call mybigint.GetUnderlyingBitsArray(). I also added a method to get the sign and, to make it more generic, created a function that allows accessing any private field of any object. I found this to be slower than my original code in debug mode but the same speed in release mode. I would advise performance testing this yourself.
using System;
using System.Linq.Expressions;
using System.Numerics;

static class BigIntegerEx
{
    private static Func<BigInteger, uint[]> getUnderlyingBitsArray;
    private static Func<BigInteger, int> getUnderlyingSign;

    static BigIntegerEx()
    {
        getUnderlyingBitsArray = CompileFuncToGetPrivateField<BigInteger, uint[]>("_bits");
        getUnderlyingSign = CompileFuncToGetPrivateField<BigInteger, int>("_sign");
    }

    private static Func<TObject, TField> CompileFuncToGetPrivateField<TObject, TField>(string fieldName)
    {
        var par = Expression.Parameter(typeof(TObject));
        var field = Expression.Field(par, fieldName);
        var lambda = Expression.Lambda(field, par);
        return (Func<TObject, TField>)lambda.Compile();
    }

    public static uint[] GetUnderlyingBitsArray(this BigInteger source)
    {
        return getUnderlyingBitsArray(source);
    }

    public static int GetUnderlyingSign(this BigInteger source)
    {
        return getUnderlyingSign(source);
    }
}

Checking bits in a byte using for loop

I was recently studying C# when I came across the following for loop:
// Display the bits within a byte.
using System;

class ShowBits {
    static void Main() {
        int t;
        byte val;
        val = 123;
        for(t=128; t > 0; t = t/2) {
            if((val & t) != 0)
                Console.Write("1 ");
            if((val & t) == 0)
                Console.Write("0 ");
        }
    }
}
I don't understand why the loop does t = t/2 in its increment/decrement section. Please explain.

Decimal 128 is binary 10000000 - i.e. a mask for just the most significant bit of the byte. When you divide it by two, you get 01000000, i.e. the second most significant bit, etc.
Using & between the original value and the mask and just comparing with 0 indicates whether that bit is set in the original value.
Another alternative would be to shift the original value instead:
for (int i = 7; i >= 0; i--)
{
int shifted = val >> i;
// Take the bottom-most bit of the shifted value
Console.Write("{0} ", shifted & 1);
}

It's looping in decreasing powers of two and using that value in a mask.
(base 10): 128, 64, 32, 16, 8, 4, 2, 1
(base 2): 10000000, 01000000, 00100000, 00010000, 00001000, 00000100, 00000010, 00000001

128 is written as 10000000 in binary, so we first check whether the highest bit of the byte is on. Then we do t = t/2, i.e. t = 128/2 = 64, which is written as 01000000 in binary, and so on. Each division by 2 shifts the single set bit one place to the right.

The t is used as a mask for the bits in val.
So it starts at 128, 10000000 in binary.
When it is divided by 2, it becomes 64 - or 01000000.
This goes until it reaches 0.
Then in each iteration, the '&' is used to mask the bits in val with the current bit in t.
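The same mask-halving loop in C, writing the digits into a string instead of the console (byte_to_bits is a made-up name). Each pass tests one bit, from the most significant down.

```c
#include <stdint.h>

/* Write the 8 bits of val into out as '0'/'1' characters, most significant first.
   out must have room for 9 chars (8 digits plus the terminating '\0'). */
static void byte_to_bits(uint8_t val, char out[9])
{
    int pos = 0;
    for (int t = 128; t > 0; t = t / 2)    /* masks 10000000, 01000000, ..., 00000001 */
        out[pos++] = (val & t) != 0 ? '1' : '0';
    out[pos] = '\0';
}
```

byte_to_bits(123, buf) leaves "01111011" in buf, the same digits the original program prints.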

Number of unset bits left of the most significant set bit?

Assuming the 64bit integer 0x000000000000FFFF which would be represented as
00000000 00000000 00000000 00000000
00000000 00000000 >11111111 11111111
How do I find the amount of unset bits to the left of the most significant set bit (the one marked with >) ?

In straight C (long long are 64 bit on my setup), taken from similar Java implementations: (updated after a little more reading on Hamming weight)
A little more explanation: the top part just sets all bits to the right of the most significant 1, and then negates the result (i.e. all the 0s to the 'left' of the most significant 1 are now 1s and everything else is 0).
Then I used a Hamming Weight implementation to count the bits.
#include <stdio.h>

int main(void)
{
    unsigned long long i = 0x000000000000FFFFULL; // example input from the question
    i |= i >> 1;
    i |= i >> 2;
    i |= i >> 4;
    i |= i >> 8;
    i |= i >> 16;
    i |= i >> 32;
    // Highest bit in input and all lower bits are now set. Invert to set the bits to count.
    i = ~i;
    i -= (i >> 1) & 0x5555555555555555LLU; // each 2 bits now contains a count
    i = (i & 0x3333333333333333LLU) + ((i >> 2) & 0x3333333333333333LLU); // each 4 bits now contains a count
    i = (i + (i >> 4)) & 0x0f0f0f0f0f0f0f0fLLU; // each 8 bits now contains a count
    i *= 0x0101010101010101LLU; // add each byte to all the bytes above it
    i >>= 56; // the number of bits
    printf("Leading 0's = %llu\n", i); // prints 48 for this input
    return 0;
}
I'd be curious to see how this was efficiency wise. Tested it with several values though and it seems to work.

Based on: http://www.hackersdelight.org/HDcode/nlz.c.txt
template<typename T> int clz(T v) {int n=sizeof(T)*8;int c=n;while (n){n>>=1;if (v>>n) c-=n,v>>=n;}return c-v;}
If you'd like a version that allows you to keep your lunch down, here you go:
#include <stdint.h>
int clz(uint64_t v) {
int n=64,c=64;
while (n) {
n>>=1;
if (v>>n) c-=n,v>>=n;
}
return c-v;
}
As you'll see, you can save cycles on this by careful analysis of the assembler, but the strategy here is not a terrible one. The while loop runs lg(64) + 1 = 7 times, and the last pass (with n == 0) changes nothing; each earlier pass converts the problem into one of counting the leading zeros of an integer of half the size.
The if statement inside the while loop asks the question: "can I represent this integer in half as many bits?", or analogously, "if I cut this in half, have I lost it?". After the if() payload completes, our number will always be in the lowest n bits.
At the final stage, v is either 0 or 1, and this completes the calculation correctly.

If you are dealing with unsigned integers, you could do this:
#include <math.h>
#include <stdint.h>
int numunset(uint64_t number)
{
int nbits = sizeof(uint64_t)*8;
if(number == 0)
return nbits;
int first_set = floor(log2(number));
return nbits - first_set - 1;
}
I don't know how it will compare in performance to the loop and count methods that have already been offered because log2() could be expensive.
Edit:
This could cause problems with large integers, since log2() converts its argument to double and numerical issues may arise (for example, values just below 2^64 round up to exactly 2^64). You could use the log2l() function, which works with long double. A better solution would be to use an integer log2() function as in this question.
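One way to follow that last suggestion: an integer floor(log2) built from shifts, used inside the same numunset function so no value ever passes through double. ilog2_u64 is an illustrative name, not a standard function.

```c
#include <assert.h>
#include <stdint.h>

/* floor(log2(number)) using integer shifts only, so nothing is rounded
   through double. Binary search: try to shift by 32, 16, 8, 4, 2, 1. */
static int ilog2_u64(uint64_t number)
{
    assert(number != 0);          /* log2(0) is undefined */
    int pos = 0;
    for (int step = 32; step > 0; step >>= 1) {
        if (number >> step) {     /* highest set bit is at or above 'step' */
            number >>= step;
            pos += step;
        }
    }
    return pos;
}

/* Same contract as numunset above: unset bits left of the highest set bit. */
static int numunset(uint64_t number)
{
    int nbits = sizeof(uint64_t) * 8;
    if (number == 0)
        return nbits;
    return nbits - ilog2_u64(number) - 1;
}
```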

// x is a uint64_t. Set every bit below the most significant set bit,
// e.g. 0x00F0 -> 0x00FF (zero stays zero).
x |= x >> 1;
x |= x >> 2;
x |= x >> 4;
x |= x >> 8;
x |= x >> 16;
x |= x >> 32;
// x now has exactly (64 - number of leading zeros) bits set
return 64 - count_bits_set(x);
Where count_bits_set is the fastest version of counting bits you can find. See https://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel for various bit counting techniques.

I'm not sure I understood the problem correctly. I think you have a 64bit value and want to find the number of leading zeros in it.
One way would be to find the most significant bit and simply subtract its position from 63 (assuming lowest bit is bit 0). You can find out the most significant bit by testing whether a bit is set from within a loop over all 64 bits.
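Another way might be to use the (non-standard) GCC builtin __builtin_clzll, the 64-bit variant of __builtin_clz. Its result is undefined for an input of 0, so handle that case separately.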
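A sketch of both suggestions in C: the loop that tests bits from the top down, and the builtin route, which assumes GCC or Clang (__builtin_clzll is a compiler extension, not standard C). The function names are illustrative.

```c
#include <stdint.h>

/* Portable version: test each bit from the top (bit 63) down. */
static int leading_zeros_loop(uint64_t x)
{
    for (int pos = 63; pos >= 0; pos--)
        if (x & ((uint64_t)1 << pos))
            return 63 - pos;      /* every bit above pos is zero */
    return 64;                    /* no bit set at all */
}

/* GCC/Clang only: __builtin_clzll(0) is undefined, so zero is handled first. */
static int leading_zeros_builtin(uint64_t x)
{
    return x == 0 ? 64 : __builtin_clzll((unsigned long long)x);
}
```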

I agree with the binary search idea. However two points are important here:
The range of valid answers to your question is from 0 to 64 inclusive. In other words, there may be 65 different answers to the question. I'm almost sure everyone who posted a "binary search" solution missed this point, so they'll get the wrong answer either for zero or for a number with the MSB on.
If speed is critical, you may want to avoid the loop; there's an elegant way to achieve this using templates.
The following template code correctly finds the MSB of a variable of any unsigned type. FindMsb returns the 1-based position of the most significant set bit (0 when x == 0), so the number of unset bits to its left is sizeof(T) * 8 - FindMsb(x).
// helper
template <int bits, typename T>
bool IsBitReached(T x)
{
const T cmp = T(1) << (bits ? (bits-1) : 0);
return (x >= cmp);
}
template <int bits, typename T>
int FindMsbInternal(T x)
{
if (!bits)
return 0;
int ret;
if (IsBitReached<bits>(x))
{
ret = bits;
x >>= bits;
} else
ret = 0;
return ret + FindMsbInternal<bits/2, T>(x);
}
// Main routine
template <typename T>
int FindMsb(T x)
{
const int bits = sizeof(T) * 8;
if (IsBitReached<bits>(x))
return bits;
return FindMsbInternal<bits/2>(x);
}

Here you go, pretty trivial to update as you need for other sizes...
int bits_left(unsigned long long value)
{
static unsigned long long mask = 0x8000000000000000;
int c = 64;
// doh
if (value == 0)
return c;
// check byte by byte to see what has been set
if (value & 0xFF00000000000000)
c = 0;
else if (value & 0x00FF000000000000)
c = 8;
else if (value & 0x0000FF0000000000)
c = 16;
else if (value & 0x000000FF00000000)
c = 24;
else if (value & 0x00000000FF000000)
c = 32;
else if (value & 0x0000000000FF0000)
c = 40;
else if (value & 0x000000000000FF00)
c = 48;
else if (value & 0x00000000000000FF)
c = 56;
// skip
value <<= c;
while(!(value & mask))
{
value <<= 1;
c++;
}
return c;
}

Same idea as user470379's, but counting down ...
Assume all 64 bits are unset. While value is larger than 0 keep shifting the value right and decrementing number of unset bits:
/* untested */
int countunsetbits(uint64_t val) {
int x = 64;
while (val) { x--; val >>= 1; }
return x;
}

Try
#include <limits.h>
#include <stdint.h>

int countBits(uint64_t value)
{
    int result = sizeof(value) * CHAR_BIT; // 64 for uint64_t
    while(value != 0)
    {
        --result;
        value = value >> 1; // Remove bottom bits until all 1s are gone.
    }
    return result;
}

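Use log base 2 to get the position of the most significant digit that is 1.
log2(2) = 1, meaning 0b10 -> 1
log2(4) = 2, log2(5..7) = 2.xx, so 0b100 -> 2
log2(8) = 3, log2(9..15) = 3.xx, so 0b1000 -> 3
log2(16) = 4, you get the idea
and so on...
The numbers in between produce fractional results, so truncating the result to an int gives you the position of the most significant 1 bit (counting from 0).
Once you have this number, say b, the answer is 63 - b.
#include <math.h>
#include <stdint.h>
/* 0-based position of the most significant set bit; n must be nonzero */
int get_pos_msd(uint64_t n) { return (int)log2((double)n); }
/* number of unset bits to the left of the most significant set bit */
int last_zero(uint64_t n) { return 63 - get_pos_msd(n); }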

Need a way to pick a common bit in two bitmasks at random

Imagine two bitmasks, I'll just use 8 bits for simplicity:
01101010
10111011
The 2nd, 4th, and 6th bits (counting from the right) are 1 in both. I want to pick one of those common "on" bits at random. But I want to do this in O(1).
The only way I've found to do this so far is pick a random "on" bit in one, then check the other to see if it's also on, then repeat until I find a match. This is still O(n), and in my case the majority of the bits are off in both masks. I do of course & them together to initially check if there's any common bits at all.
Is there a way to do this? If so, I can increase the speed of my function by around 6%. I'm using C# if that matters. Thanks!
Mike

If you are willing to accept an O(lg n) solution, at the cost of a possibly nonuniform probability, split recursively in half: AND with a mask that has the top half of the bits set and with one that has the bottom half set. If both results are nonzero, choose one at random; otherwise choose the nonzero one. Then split what remains in half, and so on. This will take 10 comparisons for a 32-bit number, maybe not as few as you would like, but better than 32.
You can save a few ands by choosing to and with the high half or low half at random, and if there are no hits taking the other half, and if there are hits taking the half tested.
The random number only needs to be generated once, as you are only using one bit at each test, just shift the used bit out when you are done with it.
If you have lots of bits, this will be more efficient. I do not see how you can get this down to O(1) though.
For example, if you have a 32-bit number, first AND the combination with either 0xffff0000 or 0x0000ffff; if the result is nonzero (say you ANDed with 0xffff0000), continue with 0xff000000 or 0x00ff0000, and so on until you get to one bit. This ends up being a lot of tedious code: 32 bits takes 5 layers of code.
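The halving scheme can be written as a loop in C so the five layers do not have to be spelled out by hand. pick_bit_by_halving is a hypothetical name; as the answer warns, the result is not uniform (a lone set bit in one half is chosen as often as all the bits of the other half together). Following the tip about generating the random number once, a single rand() call supplies all five coin flips; rand() is guaranteed to yield at least 15 random bits.

```c
#include <stdint.h>
#include <stdlib.h>

/* Return one set bit of w (as a single-bit mask), or 0 if w == 0,
   by repeatedly keeping one non-empty half of the remaining window. */
static uint32_t pick_bit_by_halving(uint32_t w)
{
    if (w == 0)
        return 0;
    unsigned coins = (unsigned)rand();     /* one random number, one bit per level */
    int shift = 0;                          /* bit position of the window's low end */
    for (int width = 16; width > 0; width >>= 1) {
        uint32_t half = (1u << width) - 1u; /* mask for one half of the window */
        uint32_t low = w & half;            /* lower half of the window */
        uint32_t high = (w >> width) & half;/* upper half, moved down */
        if (high != 0 && (low == 0 || (coins & 1u))) {
            w = high;                       /* continue in the upper half */
            shift += width;
        } else {
            w = low;                        /* continue in the lower half */
        }
        coins >>= 1;                        /* shift the used bit out */
    }
    return (uint32_t)1 << shift;            /* w is now exactly 1 */
}
```

For the two masks in the question, call pick_bit_by_halving(0x6A & 0xBB).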

Do you want a uniform random distribution? If so, I don't see any good way around counting the bits and then selecting one at random, or selecting random bits until you hit one that is set.
If you don't care about uniform, you can select a set bit out of a word randomly with:
unsigned int pick_random(unsigned int w, int size) {
int bitpos = rng() % size;
unsigned int mask = ~((1U << bitpos) - 1);
if (mask & w)
w &= mask;
return w - (w & (w-1));
}
where rng() is your random number generator, w is the word you want to pick from, and size is the relevant size of the word in bits (which may be the machine word size, or may be less as long as you don't set the upper bits of the word). Then, for your example, you use pick_random(0x6a & 0xbb, 8) or whatever values you like.

This function selects, uniformly at random, one bit which is high in both masks. If there are no possible bits to pick, zero is returned instead. The running time is O(n), where n is the number of high bits in the ANDed masks. So if your masks have few high bits, this function could be faster, even though the worst case, when all the bits are high, is proportional to the word size. The implementation in C is as follows:
#include <stdlib.h>

unsigned int randomMasksBit(unsigned a, unsigned b){
    unsigned int i = a & b; // Calculate the bits which are high in both masks.
    unsigned int count = 0;
    unsigned int randomBit = 0;
    while (i){ // Loop through all high bits.
        count++;
        // Randomly pick one bit from the bit stream uniformly (reservoir sampling), by selecting
        // a random floating point number in [0, 1) and checking if it
        // is less than the probability needed for random selection (1/count).
        if ((rand() / ((double)RAND_MAX + 1.0)) < (1.0 / count)) randomBit = i & -i;
        i &= i - 1; // Move on to the next high bit.
    }
    return randomBit;
}

O(1) with uniform distribution (or as uniform as the random generator offers) can be done, depending on whether you count certain mathematical operations as O(1). As a rule we would, though in the case of bit-tweaking one might make a case that they are not.
The trick is that while it's easy enough to get the lowest set bit and to get the highest set bit, in order to have uniform distribution we need to randomly pick a partitioning point, and then randomly pick whether we'll go for the highest bit below it or the lowest bit above (trying the other approach if that returns zero).
I've broken this down a bit more than might be usual to allow the steps to be more easily followed. The only question on constant timing I can see is whether Math.Pow and Math.Log should be considered O(1).
Hence:
using System;

public static uint FindRandomSharedBit(uint x, uint y)
{ // AND the two numbers together to find the shared bits.
    return FindRandomBit(x & y);
}

// One shared instance: Random objects created in quick succession can end up
// with the same time-based seed on .NET Framework.
private static readonly Random rnd = new Random();

public static uint FindRandomBit(uint val)
{ // if there's none, we can escape out quickly.
    if (val == 0)
        return 0;
    // pick a partition point. Note that Random.Next(1, 32) is in range 1 to 31
    int maskPoint = rnd.Next(1, 32);
    // pick which to try first.
    bool tryLowFirst = rnd.Next(0, 2) == 1;
    // will turn off all bits at and above our partition point.
    uint lowerMask = Convert.ToUInt32(Math.Pow(2, maskPoint) - 1);
    // will turn off all bits below our partition point
    uint higherMask = ~lowerMask;
    if (tryLowFirst)
    {
        uint lowRes = FindLowestBit(val & higherMask);
        return lowRes != 0 ? lowRes : FindHighestBit(val & lowerMask);
    }
    uint hiRes = FindHighestBit(val & lowerMask);
    return hiRes != 0 ? hiRes : FindLowestBit(val & higherMask);
}

public static uint FindLowestBit(uint masked)
{ // e.g. 00100100
    uint minusOne = masked - 1;    // e.g. 00100011
    uint xord = masked ^ minusOne; // e.g. 00000111 (lowest set bit and everything below it)
    return xord & masked;          // e.g. 00000100 (also correct for bit 31 and for 0)
}

public static uint FindHighestBit(uint masked)
{
    return (uint)Math.Pow(2, Math.Floor(Math.Log(masked, 2)));
}

I believe that, if you want uniform, then the answer will have to be Theta(n) in terms of the number of bits, if it has to work for all possible combinations.
The following C++ snippet (stolen) should be able to check if any given num is a power of 2.
if (!var || (var & (var - 1))) {
printf("%u is not power of 2\n", var);
}
else {
printf("%u is power of 2\n", var);
}

If you have few enough bits to worry about, you can get O(1) using a lookup table:
// lookup8bits[v] lists the positions of the set bits of v
var lookup8bits = new int[256][] {
    new int[0],
    new [] {0},
    new [] {1},
    new [] {0, 1},
    ...
    new [] {0, 1, 2, 3, 4, 5, 6, 7}
};
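Rather than typing all 256 rows, the table can be filled in once at startup. This C sketch stores, for every byte value, the positions of its set bits, then picks a common bit of two 8-bit masks with one table read and one random index. The names build_lookup and pick_common_bit are illustrative, and rand() % count has a slight modulo bias.

```c
#include <stdint.h>
#include <stdlib.h>

/* positions[v][0..counts[v]-1] are the indices of the set bits of byte v. */
static uint8_t positions[256][8];
static uint8_t counts[256];

/* Fill the table once instead of writing all 256 rows by hand. */
static void build_lookup(void)
{
    for (int v = 0; v < 256; v++) {
        counts[v] = 0;
        for (int bit = 0; bit < 8; bit++)
            if (v & (1 << bit))
                positions[v][counts[v]++] = (uint8_t)bit;
    }
}

/* Pick a common set bit of two 8-bit masks with one table lookup and one
   random index; returns the bit position, or -1 if the masks share no bit.
   build_lookup() must have run first. */
static int pick_common_bit(uint8_t a, uint8_t b)
{
    uint8_t common = (uint8_t)(a & b);
    if (counts[common] == 0)
        return -1;
    return positions[common][rand() % counts[common]];
}
```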
Failing that, you can find the least significant set bit of a number x with (x & -x), assuming two's complement. For example, if x = 46 = 101110b, then -x = 111...111010010b, hence x & -x = 10b (that is, 2).
You can use this technique to enumerate the set bits of x in O(n) time, where n is the number of set bits in x.
Note that computing a pseudo random number is going to take you a lot longer than enumerating the set bits in x!
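Putting the x & -x trick to work: this C sketch enumerates the set bits of the ANDed masks in O(number of set bits), then picks one with a single random index, so each common bit is equally likely apart from the small modulo bias of rand() % n. The name pick_uniform_set_bit is illustrative.

```c
#include <stdint.h>
#include <stdlib.h>

/* Pick one set bit of x uniformly at random (up to rand()'s modulo bias).
   Enumerating with x & -x costs O(number of set bits). Returns 0 if x == 0. */
static uint32_t pick_uniform_set_bit(uint32_t x)
{
    uint32_t bits[32];
    int n = 0;
    while (x != 0) {
        uint32_t lowest = x & (0u - x);  /* isolate the lowest set bit */
        bits[n++] = lowest;
        x ^= lowest;                      /* clear it and continue */
    }
    if (n == 0)
        return 0;
    return bits[rand() % n];
}
```

For the question's masks, pass the ANDed value, e.g. pick_uniform_set_bit(0x6A & 0xBB), which returns 2, 8 or 32.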

This can't be done in O(1), and any solution for a fixed number of N bits (unless it's totally really ridiculously stupid) will have a constant upper bound, for that N.
