Implementation of extremely sparse arrays

Implementation of extremely sparse arrays - c#

I have an extremely sparse static array with 4 dimensions of 8192 each that I want to do lookups from (C#). Only 68796 of these 4.5 * 10^15 values are non-zero. What is the fastest way to do this, with speed and low memory usage being vital?
Thanks

First, I would argue that plain arrays are quite clearly the wrong kind of data structure for your problem.
How about using a dictionary where you use a 4-tuple as index?
var lookup = new Dictionary<Tuple<int,int,int,int>, int>();
I've never done that myself, but it should work fine. If you don't have Tuple ready because you're working with a version of the .NET Framework preceding .NET 4, you could provide your own index type:
struct LookupKey
{
public readonly int First;
public readonly int Second;
public readonly int Third;
public readonly int Fourth;
…
}
var lookup = new Dictionary<LookupKey, int>();

You could use a plain Dictionary or create a similar map suited for your needs (it will be an array in which you place elements according to an hashvalue you calculate on your 4 values) but you'll need to care about collisions.
Also a binary seach tree can make the trick if you accept a logarithmic complexity for lookup..

Use hashtable (generic Dictionary is already implemented as Hashtable). As key use vector of 4 dimension index. As value store what you want.

What I'd do is use hash lists instead of "normal" arrays for this, then (pseudo-code):
// first, check bounds:
if(x < 0 || y < 0 || z < 0 || w < 0
|| x > xsize || y > ysize || z > zsize || w > wsize)
throw new Whatever(...);
// now return value if != 0
if(x in arr && y in arr[x] && z in arr[x][y] && w in arr[x][y][z])
return arr[x][y][z][w];
else
return 0;

I think the best way is to use a hash-table (Dictionary<T, int>), indexed with a custom struct containing the 4 indexes. Don't forgot to override object.Equals() and object.GetHashCode() on that struct.

Related

.NET BitArray cardinality [duplicate]

I am implementing a library where I am extensively using the .Net BitArray class and need an equivalent to the Java BitSet.Cardinality() method, i.e. a method which returns the number of bits set. I was thinking of implementing it as an extension method for the BitArray class. The trivial implementation is to iterate and count the bits set (like below), but I wanted a faster implementation as I would be performing thousands of set operations and counting the answer. Is there a faster way than the example below?
count = 0;
for (int i = 0; i < mybitarray.Length; i++)
{
if (mybitarray [i])
count++;
}

This is my solution based on the "best bit counting method" from http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel
public static Int32 GetCardinality(BitArray bitArray)
{
Int32[] ints = new Int32[(bitArray.Count >> 5) + 1];
bitArray.CopyTo(ints, 0);
Int32 count = 0;
// fix for not truncated bits in last integer that may have been set to true with SetAll()
ints[ints.Length - 1] &= ~(-1 << (bitArray.Count % 32));
for (Int32 i = 0; i < ints.Length; i++)
{
Int32 c = ints[i];
// magic (http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetParallel)
unchecked
{
c = c - ((c >> 1) & 0x55555555);
c = (c & 0x33333333) + ((c >> 2) & 0x33333333);
c = ((c + (c >> 4) & 0xF0F0F0F) * 0x1010101) >> 24;
}
count += c;
}
return count;
}
According to my tests, this is around 60 times faster than the simple foreach loop and still 30 times faster than the Kernighan approach with around 50% bits set to true in a BitArray with 1000 bits. I also have a VB version of this if needed.

you can accomplish this pretty easily with Linq
BitArray ba = new BitArray(new[] { true, false, true, false, false });
var numOnes = (from bool m in ba
where m
select m).Count();

BitArray myBitArray = new BitArray(...
int
bits = myBitArray.Count,
size = ((bits - 1) >> 3) + 1,
counter = 0,
x,
c;
byte[] buffer = new byte[size];
myBitArray.CopyTo(buffer, 0);
for (x = 0; x < size; x++)
for (c = 0; buffer[x] > 0; buffer[x] >>= 1)
counter += buffer[x] & 1;
Taken from "Counting bits set, Brian Kernighan's way" and adapted for bytes. I'm using it for bit arrays of 1 000 000+ bits and it's superb.
If your bits are not n*8 then you can count the mod byte manually.

I had the same issue, but had more than just the one Cardinality method to convert. So, I opted to port the entire BitSet class. Fortunately it was self-contained.
Here is the Gist of the C# port.
I would appreciate if people would report any bugs that are found - I am not a Java developer, and have limited experience with bit logic, so I might have translated some of it incorrectly.

Faster and simpler version than the accepted answer thanks to the use of System.Numerics.BitOperations.PopCount
C#
Int32[] ints = new Int32[(bitArray.Count >> 5) + 1];
bitArray.CopyTo(ints, 0);
Int32 count = 0;
for (Int32 i = 0; i < ints.Length; i++) {
count += BitOperations.PopCount(ints[i]);
}
Console.WriteLine(count);
F#
let ints = Array.create ((bitArray.Count >>> 5) + 1) 0u
bitArray.CopyTo(ints, 0)
ints
|> Array.sumBy BitOperations.PopCount
|> printfn "%d"
See more details in Is BitOperations.PopCount the best way to compute the BitArray cardinality in .NET?

You could use Linq, but it would be useless and slower:
var sum = mybitarray.OfType<bool>().Count(p => p);

There is no faster way with using BitArray - What it comes down to is you will have to count them - you could use LINQ to do that or do your own loop, but there is no method offered by BitArray and the underlying data structure is an int[] array (as seen with Reflector) - so this will always be O(n), n being the number of bits in the array.
The only way I could think of making it faster is using reflection to get a hold of the underlying m_array field, then you can get around the boundary checks that Get() uses on every call (see below) - but this is kinda dirty, and might only be worth it on very large arrays since reflection is expensive.
public bool Get(int index)
{
if ((index < 0) || (index >= this.Length))
{
throw new ArgumentOutOfRangeException("index", Environment.GetResourceString("ArgumentOutOfRange_Index"));
}
return ((this.m_array[index / 0x20] & (((int) 1) << (index % 0x20))) != 0);
}
If this optimization is really important to you, you should create your own class for bit manipulation, that internally could use BitArray, but keeps track of the number of bits set and offers the appropriate methods (mostly delegate to BitArray but add methods to get number of bits currently set) - then of course this would be O(1).

If you really want to maximize the speed, you could pre-compute a lookup table where given a byte-value you have the cardinality, but BitArray is not the most ideal structure for this, since you'd need to use reflection to pull the underlying storage out of it and operate on the integral types - see this question for a better explanation of that technique.
Another, perhaps more useful technique, is to use something like the Kernighan trick, which is O(m) for an n-bit value of cardinality m.
static readonly ZERO = new BitArray (0);
static readonly NOT_ONE = new BitArray (1).Not ();
public static int GetCardinality (this BitArray bits)
{
int c = 0;
var tmp = new BitArray (myBitArray);
for (c; tmp != ZERO; c++)
tmp = tmp.And (tmp.And (NOT_ONE));
return c;
}
This too is a bit more cumbersome than it would be in say C, because there are no operations defined between integer types and BitArrays, (tmp &= tmp - 1, for example, to clear the least significant set bit, has been translated to tmp &= (tmp & ~0x1).
I have no idea if this ends up being any faster than naively iterating for the case of the BCL BitArray, but algorithmically speaking it should be superior.
EDIT: cited where I discovered the Kernighan trick, with a more in-depth explanation

If you don't mind to copy the code of System.Collections.BitArray to your project and Edit it,you can write as fellow:
(I think it's the fastest. And I've tried use BitVector32[] to implement my BitArray, but it's still so slow.)
public void Set(int index, bool value)
{
if ((index < 0) || (index >= this.m_length))
{
throw new ArgumentOutOfRangeException("index", "Index Out Of Range");
}
SetWithOutAuth(index,value);
}
//When in batch setting values,we need one method that won't auth the index range
private void SetWithOutAuth(int index, bool value)
{
int v = ((int)1) << (index % 0x20);
index = index / 0x20;
bool NotSet = (this.m_array[index] & v) == 0;
if (value && NotSet)
{
CountOfTrue++;//Count the True values
this.m_array[index] |= v;
}
else if (!value && !NotSet)
{
CountOfTrue--;//Count the True values
this.m_array[index] &= ~v;
}
else
return;
this._version++;
}
public int CountOfTrue { get; internal set; }
public void BatchSet(int start, int length, bool value)
{
if (start < 0 || start >= this.m_length || length <= 0)
return;
for (int i = start; i < length && i < this.m_length; i++)
{
SetWithOutAuth(i,value);
}
}

I wrote my version of after not finding one that uses a look-up table:
private int[] _bitCountLookup;
private void InitLookupTable()
{
_bitCountLookup = new int[256];
for (var byteValue = 0; byteValue < 256; byteValue++)
{
var count = 0;
for (var bitIndex = 0; bitIndex < 8; bitIndex++)
{
count += (byteValue >> bitIndex) & 1;
}
_bitCountLookup[byteValue] = count;
}
}
private int CountSetBits(BitArray bitArray)
{
var result = 0;
var numberOfFullBytes = bitArray.Length / 8;
var numberOfTailBits = bitArray.Length % 8;
var tailByte = numberOfTailBits > 0 ? 1 : 0;
var bitArrayInBytes = new byte[numberOfFullBytes + tailByte];
bitArray.CopyTo(bitArrayInBytes, 0);
for (var i = 0; i < numberOfFullBytes; i++)
{
result += _bitCountLookup[bitArrayInBytes[i]];
}
for (var i = (numberOfFullBytes * 8); i < bitArray.Length; i++)
{
if (bitArray[i])
{
result++;
}
}
return result;
}

The problem is naturally O(n), as a result your solution is probably the most efficient.
Since you are trying to count an arbitrary subset of bits you cannot count the bits when they are set (would would provide a speed boost if you are not setting the bits too often).
You could check to see if the processor you are using has a command which will return the number of set bits. For example a processor with SSE4 could use the POPCNT according to this post. This would probably not work for you since .Net does not allow assembly (because it is platform independent). Also, ARM processors probably do not have an equivalent.
Probably the best solution would be a look up table (or switch if you could guarantee the switch will compiled to a single jump to currentLocation + byteValue). This would give you the count for the whole byte. Of course BitArray does not give access to the underlying data type so you would have to make your own BitArray. You would also have to guarantee that all the bits in the byte will always be part of the intersection which does not sound likely.
Another option would be to use an array of booleans instead of a BitArray. This has the advantage not needing to extract the bit from the others in the byte. The disadvantage is the array will take up 8x as much space in memory meaning not only wasted space, but also more data push as you iterate through the array to perform your count.
The difference between a standard array look up and a BitArray look up is as follows:
Array:
offset = index * indexSize
Get memory at location + offset and save to value
BitArray:
index = index/indexSize
offset = index * indexSize
Get memory at location + offset and save to value
position = index%indexSize
Shift value position bits
value = value and 1
With the exception of #2 for Arrays and #3 most of these commands take 1 processor cycle to complete. Some of the commands can be combined into 1 command using x86/x64 processors, though probably not with ARM since it uses a reduced set of instructions.
Which of the two (array or BitArray) perform better will be specific to your platform (processor speed, processor instructions, processor cache sizes, processor cache speed, amount of system memory (Ram), speed of system memory (CAS), speed of connection between processor and RAM) as well as the spread of indexes you want to count (are the intersections most often clustered or are they randomly distributed).
To summarize: you could probably find a way to make it faster, but your solution is the fastest you will get for your data set using a bit per boolean model in .NET.
Edit: make sure you are accessing the indexes you want to count in order. If you access indexes 200, 5, 150, 151, 311, 6 in that order then you will increase the amount of cache misses resulting in more time spent waiting for values to be retrieved from RAM.

Does C# have a builtinway of converting binary number to array of bit powers ( e.g. 0b1101 -> {0, 2, 3} )?

(Edit - preface: I am implementing an iterable through all subsets of given size.
For getting the next combination, I am using Gosper's hack to quickly get the 0/1 vector of lexicographically next combination. Now I need to quickly map the vector of combination to the array of elements from my set. Luckily, the elements are the very same as the powers of individual bits, and I am wondering if C# doesn't have fast shortcut for that.)
If I get K-th subset (in lexicographical order) of numbers 0 - (N-1), the bits in binary representation of K are telling me which elements should I choose. What is the most elegant way of checking which bits are set and making a subset (array) based on these?
Something like:
var BitPowers = new List<int>();
for(int i = 0; i<N; ++i)
{
if((K & (1<<i)) != 0)
{
BitPowers.Add(i);
}
}
return BitPowers.ToArray();
would probably suffice, but is it the best way? I guess bit operations are quick, but as the number of possible sets is exponential, optimizing this function as much as I can would be ideal.

There is no .NET builtin API to do such thing, as I know.
The Linq magic
You can write this compact code in one assignment, but less optimized for speed:
int value = 0b10110010;
var BitPowers = Convert.ToString(value, 2)
.Reverse()
.Select((bit, index) => new { index, bit })
.Where(v => v.bit == '1').Select(v => v.index);
foreach ( int index in BitPowers )
Console.WriteLine(index);
It converts the integer to a binary representation of a string, inverted to have good indexes from left to right, then it selects a pair of (bit, index), then it filters on those that are defined, then it selects the indexes for create the enumerable list.
Output
1
4
5
7
Compromise between elegance and speed
You can simplify your loop using a BitArray instance.
Perhaps it is the closest "builtin" way you ask for:
var bits = new BitArray(BitConverter.GetBytes(value));
for ( int index = 0; index < bits.Length; index++ )
if ( bits[index] )
BitPowers.Add(index);

Cartesian product subset returning set of mostly 0

I'm trying to calculate
If we calculated every possible combination of numbers from 0 to (c-1)
with a length of x
what set would occur at point i
For example:
c = 4
x = 4
i = 3
Would yield:
[0000]
[0001]
[0002]
[0003] <- i
[0010]
....
[3333]
This is very nearly the same problem as in the related question Logic to select a specific set from Cartesian set. However, because x and i are large enough to require the use of BigInteger objects, the code has to be changed to return a List, and take an int, instead of a string array:
int PossibleNumbers;
public List<int> Get(BigInteger Address)
{
List<int> values = new List<int>();
BigInteger sizes = new BigInteger(1);
for (int j = 0; j < PixelArrayLength; j++)
{
BigInteger index = BigInteger.Divide(Address, sizes);
index = (index % PossibleNumbers);
values.Add((int)index);
sizes *= PossibleNumbers;
}
return values;
}
This seems to behave as I'd expect, however, when I start using values like this:
c = 66000
x = 950000
i = (66000^950000)/2
So here, I'm looking for the ith value in the cartesian set of 0 to (c-1) of length 950000, or put another way, the halfway point.
At this point, I just get a list of zeroes returned. How can I solve this problem?
Notes: It's quite a specific problem, and I apologise for the wall-of-text, I do hope it's not too much, I was just hoping to properly explain what I meant. Thanks to you all!
Edit: Here are some more examples: http://pastebin.com/zmSDQEGC

Here is a generic base converter... it takes a decimal for the base10 value to convert into your newBase and returns an array of int's. If you need a BigInteger this method works perfectly well with just changing the base10Value to BigInteger.
EDIT: Converted method to BigInteger since that's what you need.
EDIT 2: Thanks phoog for pointing out BigInteger is base2 so changing the method signature.
public static int[] ConvertToBase(BigInteger value, int newBase, int length)
{
var result = new Stack<int>();
while (value > 0)
{
result.Push((int)(value % newBase));
if (value < newBase)
value = 0;
else
value = value / newBase;
}
for (var i = result.Count; i < length; i++)
result.Push(0);
return result.ToArray();
}
usage...
int[] a = ConvertToBase(13, 4, 4) = [0,0,3,1]
int[] b = ConvertToBase(0, 4, 4) = [0,0,3,1]
int[] c = ConvertToBase(1234, 12, 4) = [0,8,6,10]
However the probelm you specifically state is a bit large to test it on. :)
Just calculating 66000 ^ 950000 / 2 is a good bit of work as Phoog mentioned. Unless of course you meant ^ to be the XOR operator. In which case it's quite fast.
EDIT: From the comments... The largest base10 number that can be represented given a particular newBase and length is...
var largestBase10 = BigInteger.Pow(newBase, length)-1;

The first expression of the problem boils down to "write 3 as a 4-digit base-4 number". So, if the problem is "write i as an x-digit base-c number", or, in this case, "write (66000^950000)/2 as a 950000-digit base 66000 number", then does that make it easier?
If you're specifically looking for the halfway point of the cartesian product, it's not so hard. If you assume that c is even, then the most significant digit is c / 2, and the rest of the digits are zero. If your return value is all zeros, then you may have an off-by-one error, or the like, since actually only one digit is incorrect.

Find intersection of two multi-dimensional Arrays in C# 4.0

Trying to find a solution to my ranking problem.
Basically I have two multi-dimensional double[,] arrays. Both containing rankings for certain scenarios, so [rank number, scenario number]. More than one scenario can have the same rank.
I want to generate a third multi-dimensional array, taking the intersections of the previous two multi-dimensional arrays to provide a joint ranking.
Does anyone have an idea how I can do this in C#?
Many thanks for any advice or help you can provide!
Edit:
Thank you for all the responses, sorry I should have included an example.
Here it is:
Array One:
[{0,4},{1,0},{1,2},{2,1},{3,5},{4,3}]
Array Two:
[{0,1},{0,4},{1,0},{1,2},{3,5},{4,3}]
Required Result:
[{0,4},{1,0},{1,2},{1,1},{2,5},{3,3}]

Here's some sample code that makes a bunch of assumptions but might be something like what you are looking for. I've added a few comments as well:
static double[,] Intersect(double[,] a1, double[,] a2)
{
// Assumptions:
// a1 and a2 are two-dimensional arrays of the same size
// An element in the array matches if and only if its value is found in the same location in both arrays
// result will contain not-a-number (NaN) for non-matches
double[,] result = new double[a1.GetLength(0), a1.GetLength(1)];
for (int i = 0; i < a1.GetLength(0); i++)
{
for (int j = 0; j < a1.GetLength(1); j++)
{
if (a1[i, j] == a2[i, j])
{
result[i, j] = a1[i, j];
}
else
{
result[i, j] = double.NaN;
}
}
}
return result;
}
For the most part, finding the intersection of multiple dimensional arrays will involve iterating over the elements in each of the dimensions in the arrays. If the indices of the array are not part of the match criteria (my second assumption in my code is removed), you would have to walk each dimension in each array - which increases the run-time of the algorithm (in this case, from O(n^2) to O(n^4).
If you care enough about run-time, I believe array matching is one of the typical examples of dynamic programming (DP) optimization; which you can read up on at your leisure.
I'm not sure how you wanted your results...you could probably return a flat collection of results that can be indexed by a pair, which would potentially save a lot of space if the expected result set is typically small. I went with a third fixed-sized array because it was the easiest thing to do.
Lastly, I'll mention that I don't see a keen C# way of doing this using IEnumerable, LINQ, or something like that. Someone more C# knowledgeable than I can chime in anytime now....

Given the additional information, I'd argue that you aren't actually working with multidimensional arrays, but instead are working with a collection of pairs. The pair is a pair of doubles. I think the following should work nicely:
public class Pair : IEquatable<Pair>
{
public double Rank;
public double Scenario;
public bool Equals(Pair p)
{
return Rank == p.Rank && Scenario == p.Scenario;
}
public override int GetHashCode()
{
int hashRank= Rank.GetHashCode();
int hashScenario = Scenario.GetHashCode();
return hashRank ^ hashScenario;
}
}
You can then use the Intersect operator on IEnumerable:
List<Pair> one = new List<Pair>();
List<Pair> two = new List<Pair>();
// ... populate the lists
List<Pair> result = one.Intersect(two).ToList();
Check out the following msdn article on Enumerable.Intersect() for more information:
http://msdn.microsoft.com/en-us/library/bb910215%28v=vs.90%29.aspx

How to safely check arrays bounds

I am making game of life in 2D array. I need to determine when all adjacent cells are empty so I just test all of them. It works well unless the cell being checked is the boundary. Then of course testing X+1 throws an exception as the index is out of the array boundaries. Can I handle this somehow instead of handling the exception?
Thanks

use GetLength(0) and GetLength(1) to get the width and height of the array.
There is also a neat performance trick the runtime uses for its own bounds checks: casting to unsigned int to reduce the two checks into one. But since this costs readability you use it if the check is really performance critical.
(i >= 0) && (i < length)
becomes
(uint)i < length

If you want speed you'll have to treat the edge-cells differently and process the rest with a
for(int x = 1; x < Size-1; x++) // these all have x-1 and x+1 neighbours

I created an extension method for a 2D array to check if in bounds:
public static class ExtensionMethods
{
public static bool In2DArrayBounds(this object[,] array, int x, int y)
{
if (x < array.GetLowerBound(0) ||
x > array.GetUpperBound(0) ||
y < array.GetLowerBound(1) ||
y > array.GetUpperBound(1)) return false;
return true;
}
}

Yes, before accessing the element at index i + 1, check whether i + 1 is strictly inferior to array.Length.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.