I'm trying to implement an algorithm to Howellize a matrix, in the way explained on page 5 of this paper (google docs link) (link to the pdf).
Most of it is fairly clear to me, but I'm not sure about line 16: does >> mean a right shift there? If it does, how does it work? Surely bits are being chopped off? As far as I know, nothing guarantees at that point that the shift amount preserves all the information in the number being shifted.
And if it doesn't mean a right shift, what does it mean?
If anyone can spare the time, I'd also like to have a test case (I don't trust myself to come up with one, I don't understand it well enough).
I've implemented it like this, is that correct? (I don't have a test case, so how can I find out?)
int j = 0;
for (int i = 0; i < 2 * k + 1; i++)
{
    var R = (from row in rows
             where leading_index(row) == i
             orderby rank(row[i]) ascending
             select row).ToList();
    if (R.Count > 0)
    {
        uint[] r = R[0];
        int p = rank(r[i]); // rank counts the trailing zeroes
        uint u = r[i] >> p;
        invert(r, u); // multiplies each element of r by the
                      // multiplicative inverse of u
        for (int s = 1; s < R.Count; s++)
        {
            int t = rank(R[s][i]);
            uint v = R[s][i] >> t;
            if (subtract(R[s], r, v << (t - p)) == 0)
            {
                // subtracts (v<<(t-p)) * r from R[s],
                // removes if all elements are zero
            }
        }
        swap(rows, rows.IndexOf(r), j);
        for (int h = 0; h < j - 1; h++)
        {
            uint d = rows[h][i] >> p;
            subtract(rows[h], r, d);
        }
        if (r[i] != 1)
        {
            // shifted returns r left-shifted by 32-p
            rows.Add(shifted(r, 32 - p));
        }
    }
}
For test case, this may help you (page no #2). Also try this.
I think that you are right about the right shift. To get the Howell form, they want the values other than leading value in a column to be smaller than the leading value. Right shifting seems fruitful for that.
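Line 16 says:
Pick d so that 0 <= G(h,i) - d * ri < ri
As the comment on line 8 says, ri = 2^p, so the condition becomes
0 <= G(h,i) - d * (2^p) < 2^p
which holds exactly when d = floor(G(h,i) / (2^p)).
Right shifting G(h,i) by p positions is the quickest way to compute that value of d.
The low p bits that the shift chops off are the remainder G(h,i) - d * ri, which is exactly the part that is supposed to stay in G(h,i) after the subtraction, so no information is lost.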
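To make the shift argument concrete, here is a minimal C# sketch (top-level statements, C# 9 or later). The values of g and p are hypothetical and chosen only for illustration; g stands in for G(h,i) and the pivot is assumed to be exactly 2^p.

```csharp
using System;

// Hypothetical values: g plays the role of G(h,i), p is the rank of the pivot.
uint g = 0b1011_0110;   // 182
int p = 4;              // pivot r_i = 2^p = 16

uint d = g >> p;                 // floor(g / 2^p)
uint remainder = g - (d << p);   // the low p bits that the shift dropped

// The remainder satisfies 0 <= remainder < 2^p, which is the condition on line 16.
Console.WriteLine($"d = {d}, remainder = {remainder}");   // d = 11, remainder = 6
```

The dropped bits are not lost: they are what remains in the entry after subtracting d times the pivot row.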
Problem statement:
Given an array of non-negative integers, count the number of unordered pairs of array elements, such that their bitwise AND is a power of 2.
arr = [10, 7, 2, 8, 3]
Answer: 6 (10&7, 10&2, 10&8, 10&3, 7&2, 2&3)
1 <= arr.Count <= 2*10^5
0 <= arr[i] <= 2^12
Here's my brute-force solution that I've come up with:
private static Dictionary<int, bool> _dictionary = new Dictionary<int, bool>();

public static long CountPairs(List<int> arr)
{
    long result = 0;
    for (var i = 0; i < arr.Count - 1; ++i)
        for (var j = i + 1; j < arr.Count; ++j)
            if (IsPowerOfTwo(arr[i] & arr[j])) ++result;
    return result;
}

public static bool IsPowerOfTwo(int number)
{
    if (_dictionary.TryGetValue(number, out bool value)) return value;
    var result = (number != 0) && ((number & (number - 1)) == 0);
    _dictionary[number] = result;
    return result;
}
For small inputs this works fine, but for large inputs it is slow.
My question is: what is the optimal (or at least more optimal) solution for the problem? Please provide a graceful solution in C#. 😊
One way to accelerate your approach is to compute the histogram of your data values before counting.
This will reduce the number of computations for long arrays because there are far fewer possible values (4097, since 0 <= arr[i] <= 2^12) than array elements (up to 200000).
Be careful when counting bins that are powers of 2 to make sure you do not overcount the number of pairs by including cases when you are comparing a number with itself.
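The histogram idea could be sketched in C# roughly as follows (top-level statements, C# 9 or later). The function name CountPairs and the constant MaxValue are choices made for this sketch; MaxValue is taken from the stated constraint 0 <= arr[i] <= 2^12.

```csharp
using System;
using System.Collections.Generic;

static bool IsPowerOfTwo(int n) => n != 0 && (n & (n - 1)) == 0;

// Counts unordered pairs via a histogram of values instead of a loop over all index pairs.
static long CountPairs(IReadOnlyList<int> arr)
{
    const int MaxValue = 1 << 12;             // assumed upper bound on the values
    var counts = new long[MaxValue + 1];
    foreach (int v in arr) counts[v]++;

    long result = 0;
    for (int x = 0; x <= MaxValue; x++)
    {
        if (counts[x] == 0) continue;
        // Equal values: x & x == x, so they pair up only if x itself is a power of two.
        // Using c * (c - 1) / 2 avoids pairing an element with itself.
        if (IsPowerOfTwo(x)) result += counts[x] * (counts[x] - 1) / 2;
        for (int y = x + 1; y <= MaxValue; y++)
        {
            if (counts[y] != 0 && IsPowerOfTwo(x & y)) result += counts[x] * counts[y];
        }
    }
    return result;
}

Console.WriteLine(CountPairs(new[] { 10, 7, 2, 8, 3 })); // 6
```

The double loop runs over at most about 8.4 million value pairs regardless of the array length, compared with roughly 2 * 10^10 index pairs for the brute-force version at the maximum input size.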
We can adapt the bit-subset dynamic programming idea to have a solution with O(2^N * N^2 + n * N) complexity, where N is the number of bits in the range, and n is the number of elements in the list. (So if the integers were restricted to [1, 4096] or 2^12, with n at 100,000, we would have on the order of 2^12 * 12^2 + 100000*12 = 1,789,824 iterations.)
The idea is that we want to count instances for which we have overlapping bit subsets, with the twist of adding a fixed set bit. Given Ai -- for simplicity, take 6 = b110 -- if we were to find all partners that AND to zero, we'd take Ai's negation,
110 -> ~110 -> 001
Now we can build a dynamic program that takes a diminishing mask, starting with the full number and diminishing the mask towards the left.
Each set bit on the negation of Ai represents a zero, which can be ANDed with either 1 or 0 to the same effect. Each unset bit on the negation of Ai represents a set bit in Ai, which we'd like to pair only with zeros, except for a single set bit.
We construct this set bit by examining each possibility separately. So where to count pairs that would AND with Ai to zero, we'd do something like
001 ->
we now want to enumerate
011 ->
101 ->
fixing a single bit each time.
We can achieve this by adding a dimension to the inner iteration. When the mask does have a set bit at the end, we "fix" the relevant bit by counting only the result for the previous DP cell that would have the bit set, and not the usual union of subsets that could either have that bit set or not.
Here is some JavaScript code (sorry, I do not know C#) to demonstrate with testing at the end comparing to the brute-force solution.
var debug = 0;

function bruteForce(a){
  let answer = 0;
  for (let i = 0; i < a.length; i++) {
    for (let j = i + 1; j < a.length; j++) {
      let and = a[i] & a[j];
      if ((and & (and - 1)) == 0 && and != 0){
        answer++;
        if (debug)
          console.log(a[i], a[j], a[i].toString(2), a[j].toString(2))
      }
    }
  }
  return answer;
}

function f(A, N){
  const n = A.length;
  const hash = {};
  const dp = new Array(1 << N);

  for (let i=0; i<1<<N; i++){
    dp[i] = new Array(N + 1);
    for (let j=0; j<N+1; j++)
      dp[i][j] = new Array(N + 1).fill(0);
  }

  for (let i=0; i<n; i++){
    if (hash.hasOwnProperty(A[i]))
      hash[A[i]] = hash[A[i]] + 1;
    else
      hash[A[i]] = 1;
  }

  for (let mask=0; mask<1<<N; mask++){
    // j is an index where we fix a 1
    for (let j=0; j<=N; j++){
      if (mask & 1){
        if (j == 0)
          dp[mask][j][0] = hash[mask] || 0;
        else
          dp[mask][j][0] = (hash[mask] || 0) + (hash[mask ^ 1] || 0);
      } else {
        dp[mask][j][0] = hash[mask] || 0;
      }

      for (let i=1; i<=N; i++){
        if (mask & (1 << i)){
          if (j == i)
            dp[mask][j][i] = dp[mask][j][i-1];
          else
            dp[mask][j][i] = dp[mask][j][i-1] + dp[mask ^ (1 << i)][j][i - 1];
        } else {
          dp[mask][j][i] = dp[mask][j][i-1];
        }
      }
    }
  }

  let answer = 0;

  for (let i=0; i<n; i++){
    for (let j=0; j<N; j++)
      if (A[i] & (1 << j))
        answer += dp[((1 << N) - 1) ^ A[i] | (1 << j)][j][N];
  }

  // remove the cases where a power of two was paired with itself
  for (let i=0; i<N + 1; i++)
    if (hash[1 << i])
      answer = answer - hash[1 << i];

  return answer / 2;
}

var As = [
  [10, 7, 2, 8, 3] // 6
];

for (let A of As){
  console.log(`DP, brute force: ${ f(A, 4) }, ${ bruteForce(A) }`);
}

var numTests = 1000;

for (let i=0; i<numTests; i++){
  const N = 6;
  const A = [];
  const n = 10;
  for (let j=0; j<n; j++){
    const num = Math.floor(Math.random() * (1 << N));
    A.push(num);
  }

  const fA = f(A, N);
  const brute = bruteForce(A);

  if (fA != brute){
    console.log(fA, brute);
  }
}

console.log("Done testing.");
int[] numbers = new[] { 10, 7, 2, 8, 3 };

static bool IsPowerOfTwo(int n) => (n != 0) && ((n & (n - 1)) == 0);

long result = numbers.AsParallel()
    .Select((a, i) => numbers
        .Skip(i + 1)
        .Select(b => a & b)
        .Count(IsPowerOfTwo))
    .Sum();
If I understand the problem correctly, this should work and should be faster.
First, for each number in the array we grab all elements in the array after it to get a collection of numbers to pair with.
Then we combine each pair with a bitwise AND and count the results that satisfy our IsPowerOfTwo predicate (implementation here).
Finally we simply get the sum of all the counts - our output from this case is 6.
I think this should be more performant than your dictionary based solution - it avoids having to perform a lookup each time you wish to check power of 2.
I think also given the numerical constraints of your inputs it is fine to use int data types.
Consider a simple equation: 5 = 2 * a + 4 * b - 3 * c
Is there a better way to loop through the variables than multiple for loops?
The equation has multiple solutions, and in order to find them I'm using multiple for loops like
for(int a = 1; a < 50; a++) {
    for(int b = 1; b < 50; b++) {
        for(int c = 1; c < 50; c++) {
            // test the equation with a, b and c here
        }
    }
}
For this example it would not take much time. But if this were going through a dataset of thousands of entries, and the goal of the for loops were to find an optimized set of variables, it would take some time. There may also be more than 3 variables; the equation above is just an example.
Is there a better alternative way to do this? A code pattern, maybe? I'm also interested in how I can clean this up, as there is a lot of nesting.
My validation logic is already thrown inside a BackgroundWorker and I limit the count so I can utilize 100% of the CPU, so I'm mainly looking into not doing for-loop nesting if possible.
The nested loop is the most efficient way to do it, and you can parallelize it pretty easily by using Parallel.For for the outer loop.
int solutionsCount = 0;
Parallel.For(1, 50, a =>
{
    for (int b = 1; b < 50; b++)
        for (int c = 1; c < 50; c++)
            if (2 * a + 4 * b - 3 * c == 0) Interlocked.Increment(ref solutionsCount);
});
If you want to get fancy you can create a custom iterator that will produce all the permutations:
private static IEnumerable<(int a, int b, int c)> Loop(int to1, int to2, int to3)
{
    for (int a = 1; a < to1; a++)
        for (int b = 1; b < to2; b++)
            for (int c = 1; c < to3; c++)
                yield return (a, b, c); // this is a ValueTuple<int, int, int>
}
And use it like this:
foreach (var p in Loop(50, 50, 50))
{
    // Do something with p.a, p.b and p.c
}
You can even use LINQ to get the solutions directly:
var solutions = Loop(50, 50, 50)
.Where(p => 2 * p.a + 4 * p.b - 3 * p.c == 0);
Console.WriteLine($"Solutions: {String.Join(", ", solutions)}");
...but it is 10 times slower.
You could even go pure LINQ like this:
var solutions = Enumerable.Range(1, 50 - 1)
.SelectMany(a => Enumerable.Range(1, 50 - 1)
.SelectMany(b => Enumerable.Range(1, 50 - 1)
.Where(c => 2 * a + 4 * b - 3 * c == 0)));
...which has about the same performance as the previous one. It is also parallelizable by chaining AsParallel() in the query (after the first Enumerable.Range).
public static int n;
public static int w;
public static int[] s;
public static int[] p;

static void Main(string[] args)
{
    n = 5;
    w = 5;
    s = new int[n + 1];
    p = new int[n + 1];
    Random rnd = new Random();
    for (int i = 1; i <= n; i++)
    {
        s[i] = rnd.Next(1, 10);
        p[i] = rnd.Next(1, 10);
    }
    Console.WriteLine(F_recursion(n, w));
    Console.WriteLine(DP(n, w));
}

// recursive approach
public static int F_recursion(int n, int w)
{
    if (n == 0 || w == 0)
        return 0;
    else if (s[n] > w)
        return F_recursion(n - 1, w);
    else
        return Math.Max(F_recursion(n - 1, w), (p[n] + F_recursion(n - 1, w - s[n])));
}

// iterative approach
public static int DP(int n, int w)
{
    int result = 0;
    for (int i = 1; i <= n; i++)
    {
        if (s[i] > w)
            continue;
        result += p[i];
        w = w - s[i];
    }
    return result;
}
I need to convert the F_recursion function to an iterative one. I have written the function DP, which sometimes works but not always. I found that the problem is in F_recursion(n - 1, w - s[n]): I have no idea how to make w - s[n] work correctly in the iterative solution. If I change w - s[n] and w - s[i] to just w, the program always works.
In Console:
s[i] = 2 p[i] = 3
s[i] = 3 p[i] = 4
s[i] = 5 p[i] = 3
s[i] = 3 p[i] = 8
s[i] = 6 p[i] = 6
but sometimes it works
s[i] = 5 p[i] = 6
s[i] = 8 p[i] = 1
s[i] = 3 p[i] = 5
s[i] = 3 p[i] = 1
s[i] = 7 p[i] = 7
The following approach might be useful when bigger numbers are involved (especially for s), so that a two-dimensional array would be unnecessarily big and only a few w values would actually be used in computing the result.
The idea: precompute possible w values, by starting at w and for each i in [n, n-1, ..., 1] determine the values w_[i], where w_[i+1] >= s[i] without duplicates.
Then iterate i_n over n and compute sub-results only for valid w_[i] values.
I chose an array of Dictionary as datastructure, since it's relatively easy to design sparse data this way.
public static int DP(int n, int w)
{
    // compute possible w values for each iteration from 0 to n
    Stack<HashSet<int>> validW = new Stack<HashSet<int>>();
    validW.Push(new HashSet<int>() { w });
    for (int i = n; i > 0; i--)
    {
        HashSet<int> validW_i = new HashSet<int>();
        foreach (var prevValid in validW.Peek())
        {
            validW_i.Add(prevValid); // item i skipped: w stays unchanged
            if (prevValid >= s[i])
                validW_i.Add(prevValid - s[i]);
        }
        validW.Push(validW_i);
    }

    // compute sub-results for all possible n,w values.
    Dictionary<int, int>[] value = new Dictionary<int,int>[n + 1];
    for (int n_i = 0; n_i <= n; n_i++)
    {
        value[n_i] = new Dictionary<int, int>();
        HashSet<int> validSubtractW_i = validW.Pop();
        foreach (var w_j in validSubtractW_i)
        {
            if (n_i == 0 || w_j == 0)
                value[n_i][w_j] = 0;
            else if (s[n_i] > w_j)
                value[n_i][w_j] = value[n_i - 1][w_j];
            else
                value[n_i][w_j] = Math.Max(value[n_i - 1][w_j], (p[n_i] + value[n_i - 1][w_j - s[n_i]]));
        }
    }
    return value[n][w];
}
It's important to understand that some space and computation is "wasted" in order to precompute the possible w values and to support the sparse data structures. So this approach might perform badly for large data sets with small values in s, where most w values will be possible sub-results.
After some more thought I realized, if space is a concern, you can actually throw away the sub-results of everything except the previous outer loop iteration, since the recursion in this algorithm follows a strict n-1 pattern. However, I'm not including this into my code for now.
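The space-saving idea from the last paragraph could look roughly like this in C# (top-level statements, C# 9 or later). This is a dense one-row variant, not the sparse dictionary version above; the name KnapsackRolling is made up for the sketch, and the sample data is the first example from the question.

```csharp
using System;

// 0/1 knapsack that keeps only one row of sub-results.
// sizes and values are 1-based like s and p in the question (index 0 is unused).
static int KnapsackRolling(int[] sizes, int[] values, int n, int w)
{
    var best = new int[w + 1];   // best[j] = best total value for capacity j with the items seen so far
    for (int i = 1; i <= n; i++)
    {
        // Iterate downwards so best[j - sizes[i]] still belongs to the previous item row.
        for (int j = w; j >= sizes[i]; j--)
        {
            best[j] = Math.Max(best[j], values[i] + best[j - sizes[i]]);
        }
    }
    return best[w];
}

int[] exampleSizes = { 0, 2, 3, 5, 3, 6 };
int[] exampleValues = { 0, 3, 4, 3, 8, 6 };
Console.WriteLine(KnapsackRolling(exampleSizes, exampleValues, 5, 5)); // 11
```

The downward inner loop is what makes a single array sufficient: each item is read from the state before it was added, so it cannot be taken twice.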
Your approach does not work because your dynamic programming state space (which apparently is only one variable) does not match the signature of the recursive method. The goal of the dynamic programming approach should be to define and fill a state space such that all results are available when they are needed. On inspection of the recursive method, notice that the recursive calls of F_recursion may change both arguments, n and w. This is an indication that a two-dimensional state space should be used.
The first argument (which apparently limits the range of items) can range from 0 to n, while the second argument (which apparently is some bound for the total of an item property) can range from 0 to w.
You should define a two-dimensional state space
int[,] value = new int[n + 1, w + 1];
to accommodate the values; the extra row and column hold the cases n == 0 and w == 0. These base cases must be 0, as in the recursive method, and a freshly allocated int array in C# is already zero-filled, so no further initialization is needed.
Next, the iterative version of the algorithm should use two loops which iterate in a forward manner, unlike the recursion, which decreases the arguments.
for (int i = 1; i <= n; i++)
{
    for (int j = 1; j <= w; j++)
    {
        // logic for the recurrence relation goes here
    }
}
In the innermost block you can use a modified version of the recurrence relation. Instead of using recursive calls, you access values which are stored in value; instead of returning values, you write the values to value.
Semantically this is the same as memoization, but instead of using actual recursive calls, the order of evaluation ensures that the necessary values always exist, making additional logic unnecessary.
Once the state space is filled, the maximal value for the entire input is in its last state, value[n, w].
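Put together, the two-dimensional table for the question's recurrence might be filled as in the following C# sketch (top-level statements, C# 9 or later). The name KnapsackTable is invented for the sketch, the table is sized (n + 1) by (w + 1) so that both arguments can take the value 0, and the sample data is the second example from the question.

```csharp
using System;

// Bottom-up table for the question's recurrence.
// sizes and values are 1-based like s and p in the question (index 0 is unused).
static int KnapsackTable(int[] sizes, int[] values, int n, int w)
{
    // Row 0 and column 0 stay 0; they are the base cases n == 0 and w == 0.
    var value = new int[n + 1, w + 1];
    for (int i = 1; i <= n; i++)
    {
        for (int j = 1; j <= w; j++)
        {
            if (sizes[i] > j)
                value[i, j] = value[i - 1, j];                   // item i does not fit
            else
                value[i, j] = Math.Max(value[i - 1, j],          // skip item i
                    values[i] + value[i - 1, j - sizes[i]]);     // take item i
        }
    }
    return value[n, w];
}

int[] tableSizes = { 0, 5, 8, 3, 3, 7 };
int[] tableValues = { 0, 6, 1, 5, 1, 7 };
Console.WriteLine(KnapsackTable(tableSizes, tableValues, 5, 5)); // 6
```

Each cell only reads from row i - 1, which the outer loop has already completed, so every value is available when it is needed.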
I'm stuck on a college project and I wonder if you can give me a hint on how to do this. I have to do it in C#.
Given an 80x80 matrix, I have to traverse it moving only right and down, and find the path from the top left corner to the bottom right corner whose values add up to the lowest sum.
As an example, in this case the numbers that should be picked are:
131,201,96,342,746,422,121,37,331 = 2427 the lowest number
It does not matter how many times you move right or down; what matters is getting the lowest sum.
This is an interesting project in that it illustrates an important technique called dynamic programming: a solution to the entire problem can be constructed from a solution to a smaller sub-problem with a simple computation step.
Start with a recursive solution that wouldn't work for a large matrix:
// m is the matrix
// R (uppercase) is the number of rows; C is the number of columns
// r (lowercase) and c are starting row/column
int minSum(int[,] m, int R, int C, int r, int c) {
    int res;
    if (r == R-1 && c == C-1) {
        // Bottom-right corner - one answer
        res = m[r,c];
    } else if (r == R-1) {
        // Bottom row - go right
        res = m[r,c] + minSum(m, R, C, r, c+1);
    } else if (c == C-1) {
        // Rightmost column - go down
        res = m[r,c] + minSum(m, R, C, r+1, c);
    } else {
        // In the middle - try going right, then try going down
        int goRight = m[r,c] + minSum(m, R, C, r, c+1);
        int goDown = m[r,c] + minSum(m, R, C, r+1, c);
        res = Math.Min(goRight, goDown);
    }
    return res;
}
This will work for a 10×10 matrix, but it would take too long for a 80×80 matrix. However, it provides a template for a working solution: if you add a separate matrix of results you obtained at earlier steps, you would transform it into a faster solution:
// m is the matrix
// R (uppercase) is the number of rows; C is the number of columns
// known is the matrix of solutions you already know
// r (lowercase) and c are starting row/column
int minSum(int[,] m, int R, int C, int?[,] known, int r, int c) {
    if (known[r,c].HasValue) {
        return known[r,c].Value;
    }
    int res;
    ... // Computation of the result goes here
    known[r,c] = res;
    return res;
}
This particular technique of implementing dynamic programming solutions is called memoization.
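For reference, a complete memoized version could look like the following C# sketch (top-level statements, C# 9 or later). The function name MinSum and the sample grid are assumptions: the grid is a 5x5 example chosen to be consistent with the path quoted in the question (131, 201, 96, 342, 746, 422, 121, 37, 331), since the question's own diagram is not reproduced here.

```csharp
using System;

// Memoized recursion: known[r, c] caches the minimal sum from (r, c) to the bottom-right corner.
static int MinSum(int[,] m, int?[,] known, int r, int c)
{
    if (known[r, c].HasValue) return known[r, c].Value;

    int lastRow = m.GetLength(0) - 1, lastCol = m.GetLength(1) - 1;
    int res;
    if (r == lastRow && c == lastCol) res = m[r, c];                              // destination
    else if (r == lastRow) res = m[r, c] + MinSum(m, known, r, c + 1);            // can only go right
    else if (c == lastCol) res = m[r, c] + MinSum(m, known, r + 1, c);            // can only go down
    else res = m[r, c] + Math.Min(MinSum(m, known, r, c + 1), MinSum(m, known, r + 1, c));

    known[r, c] = res;
    return res;
}

// Assumed sample data, consistent with the path listed in the question.
int[,] grid =
{
    { 131, 673, 234, 103,  18 },
    { 201,  96, 342, 965, 150 },
    { 630, 803, 746, 422, 111 },
    { 537, 699, 497, 121, 956 },
    { 805, 732, 524,  37, 331 },
};
Console.WriteLine(MinSum(grid, new int?[5, 5], 0, 0)); // 2427
```

Because each cell is computed once and then served from the cache, the work is proportional to the number of cells, so an 80x80 matrix needs only 6400 cell evaluations.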
First step is always analysis, in particular to try to figure out the scale of the problem.
OK, assuming you can only ever step down or to the right, you will have 79 steps down and 79 steps to the right: 158 steps in total, of the form 011100101001... (1 = move right, 0 = move down). Note that the solution space is smaller than 2^158, since not all binary numbers are possible: you must have exactly 79 downs and 79 rights. From combinatorics, this limits the number of possible paths to 158!/(79! * 79!), which is still a very large number, something like 10^46.
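The size of that number can be checked with a few lines of C# (top-level statements, C# 9 or later). The helper name Binomial is made up for this sketch; it uses System.Numerics.BigInteger so that the exact value fits.

```csharp
using System;
using System.Numerics;

// Exact binomial coefficient C(n, k) using arbitrary-precision integers.
static BigInteger Binomial(int n, int k)
{
    BigInteger result = BigInteger.One;
    for (int i = 1; i <= k; i++)
    {
        // After this step result equals C(n - k + i, i), so the division is always exact.
        result = result * (n - k + i) / i;
    }
    return result;
}

// Paths through an 80x80 grid: choose which 79 of the 158 steps go right.
BigInteger paths = Binomial(158, 79);
Console.WriteLine($"{paths} ({paths.ToString().Length} digits)");
```

The printed digit count gives the order of magnitude directly, which makes it easy to confirm that enumerating every path is out of the question.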
You should realize that this is far too large to brute-force. Brute force should otherwise definitely be a consideration if the project does not specifically rule it out, since it invariably makes the algorithm simpler (e.g. by simply iterating over all the possible solutions). I imagine the question has been designed this way in order to require an algorithm that does not brute-force the answer.
The way to solve this problem without iterating the whole solution space is to realize that the best path to the lower right corner is the better of the two best paths to the squares immediately to the left of, and above, the lower right corner, and the best path to those is the best path to the next diagonal (numbers 524, 121, 111 in your diagram), and so on.
What you need to do is treat each cell as a node in a graph and implement a shortest path algorithm.
Dijkstra's algorithm is one of them. You can find more information here: https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm
It is really simple, because you can divide the problem into a solved and an unsolved part and move items from the unsolved part into the solved one, one by one. Start at the top left and move through all "/" diagonals towards the bottom right.
int size = 5;
int[,] matrix = new int[,] {
//Random rand = new Random();
//for (int y = 0; y < size; ++y)
// for (int x = 0; x < size; ++x)
// {
// matrix[y, x] = rand.Next(10);
// }
int[,] distance = new int[size, size];
distance[0, 0] = matrix[0, 0];
for (int i = 1; i < size * 2 - 1; ++i)
{
    // walk one "/" diagonal from its bottom-left end to its top-right end
    int y = Math.Min(i, size - 1);
    int x = i - y;
    while (x < size && y >= 0)
    {
        distance[y, x] = Math.Min(
            x > 0 ? distance[y, x - 1] + matrix[y, x] : int.MaxValue,
            y > 0 ? distance[y - 1, x] + matrix[y, x] : int.MaxValue);
        ++x;
        --y;
    }
}
for (int y = 0; y < size; ++y)
{
    for (int x = 0; x < size; ++x)
        Console.Write(matrix[y, x].ToString().PadLeft(5, ' '));
    Console.WriteLine();
}
Console.WriteLine();
for (int y = 0; y < size; ++y)
{
    for (int x = 0; x < size; ++x)
        Console.Write(distance[y, x].ToString().PadLeft(5, ' '));
    Console.WriteLine();
}
I asked a question about having the Excel's BetaInv function ported to .NET: BetaInv function in SQL Server
I have now managed to write that function in pure, dependency-free C# code, and I get the same results as MS Excel up to 6 or 7 digits after the decimal point, which is fine for us. The problem is that the code is embedded in a SQL CLR function that gets called thousands of times from a stored procedure, and it makes the whole procedure about 50% slower: execution goes from 30 seconds without the function up to a minute with it.
Here is some of the code. I am not asking for a deep analysis, but does anybody see any major performance issue in the way I am doing these calculations? For example, should I use other data types instead of doubles?
private static double betacf(double a, double b, double x)
{
    int m, m2;
    double aa, c, d, del, h, qab, qam, qap;

    qab = a + b;
    qap = a + 1.0;
    qam = a - 1.0;
    c = 1.0; // First step of Lentz’s method.
    d = 1.0 - qab * x / qap;
    if (System.Math.Abs(d) < FPMIN)
        d = FPMIN;
    d = 1.0 / d;
    h = d;
    for (m = 1; m <= MAXIT; ++m)
    {
        m2 = 2 * m;
        aa = m * (b - m) * x / ((qam + m2) * (a + m2));
        d = 1.0 + aa * d; //One step (the even one) of the recurrence.
        if (System.Math.Abs(d) < FPMIN)
            d = FPMIN;
        c = 1.0 + aa / c;
        if (System.Math.Abs(c) < FPMIN)
            c = FPMIN;
        d = 1.0 / d;
        h *= d * c;
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2));
        d = 1.0 + aa * d; // Next step of the recurrence (the odd one).
        if (System.Math.Abs(d) < FPMIN)
            d = FPMIN;
        c = 1.0 + aa / c;
        if (System.Math.Abs(c) < FPMIN)
            c = FPMIN;
        d = 1.0 / d;
        del = d * c;
        h *= del;
        if (System.Math.Abs(del - 1.0) < EPS)
            break; // Are we done?
    }
    if (m > MAXIT)
        return 0;
    return h;
}
private static double gammln(double xx)
{
    double x, y, tmp, ser;
    double[] cof = new double[] { 76.180091729471457, -86.505320329416776, 24.014098240830911, -1.231739572450155, 0.001208650973866179, -0.000005395239384953 };

    y = xx;
    x = xx;
    tmp = x + 5.5;
    tmp -= (x + 0.5) * System.Math.Log(tmp);
    ser = 1.0000000001900149;
    for (int j = 0; j <= 5; ++j)
    {
        y += 1;
        ser += cof[j] / y;
    }
    return -tmp + System.Math.Log(2.5066282746310007 * ser / x);
}
The only thing that stands out for me, and is usually a performance hit, is memory allocation. I don't know how often gammln is called but you might want to move the double[] cof = new double[] {} to a static one time only allocation.
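As a rough illustration of hoisting the allocation, here is a C# sketch written as top-level statements (C# 9 or later), so the coefficient array is simply created once before the function instead of on every call. In the asker's CLR class the equivalent would presumably be a private static readonly field; the name GammLn is a renaming for this sketch and the arithmetic is copied from the question.

```csharp
using System;

// Allocated once; in a class this would be a private static readonly field.
double[] cof =
{
    76.180091729471457, -86.505320329416776, 24.014098240830911,
    -1.231739572450155, 0.001208650973866179, -0.000005395239384953
};

// Same computation as gammln in the question, without the per-call array allocation.
double GammLn(double xx)
{
    double x = xx, y = xx;
    double tmp = x + 5.5;
    tmp -= (x + 0.5) * Math.Log(tmp);
    double ser = 1.0000000001900149;
    for (int j = 0; j <= 5; ++j)
    {
        y += 1;
        ser += cof[j] / y;
    }
    return -tmp + Math.Log(2.5066282746310007 * ser / x);
}

// Sanity check against a known value: Gamma(5) = 4! = 24.
Console.WriteLine($"{GammLn(5.0)} vs {Math.Log(24.0)}");
```

Whether this makes a measurable difference depends on how often gammln is called per BetaInv evaluation, so it is worth profiling before and after.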
double is usually the best type. Especially since the functions in Math take doubles. Unfortunately I see no obvious improvements to make on your code.
It might be possible to use look up tables to get a better first estimate on which you iterate, but since I don't know the Math behind what you're doing I don't know if that's possible in this specific case.
Obviously larger epsilons will improve performance. So choose it as large as possible while fulfilling your accuracy demands.
If the function gets called repeatedly with the same parameters you might be able to cache results.
One thing that looks odd is the way you force small values for c, d,... to FPMIN. My instinct is that this might lead to suboptimal step sizes.
All I've got is unrolling the j loop in gammln, but it'll make at most a tiny difference.
A more radical thought would be to rewrite in pure T-SQL, since it has everything you use: + - * / abs log are all available.