Optimize the rearranging of bits - c#

I have a core C# function that I am trying to speed up. Suggestions involving safe or unsafe code are equally welcome. Here is the method:
public byte[] Interleave(uint[] vector)
{
var byteVector = new byte[BytesNeeded + 1]; // Extra byte needed when creating a BigInteger, for sign bit.
foreach (var idx in PrecomputedIndices)
{
var bit = (byte)(((vector[idx.iFromUintVector] >> idx.iFromUintBit) & 1U) << idx.iToByteBit);
byteVector[idx.iToByteVector] |= bit;
}
return byteVector;
}
PrecomputedIndices is an array of the following class:
class Indices
{
public readonly int iFromUintVector;
public readonly int iFromUintBit;
public readonly int iToByteVector;
public readonly int iToByteBit;
public Indices(int fromUintVector, int fromUintBit, int toByteVector, int toByteBit)
{
iFromUintVector = fromUintVector;
iFromUintBit = fromUintBit;
iToByteVector = toByteVector;
iToByteBit = toByteBit;
}
}
The purpose of the Interleave method is to copy bits from an array of uints to an array of bytes. I have pre-computed the source and target array index and the source and target bit number and stored them in the Indices objects. No two adjacent bits in the source will be adjacent in the target, so that rules out certain optimizations.
To give you an idea of scale, the problem I am working on has about 4,200 dimensions, so "vector" has 4,200 elements. The values in vector range from zero to twelve, so I only need to use four bits to store their values in the byte array, thus I need 4,200 x 4 = 16,800 bits of data, or 2,100 bytes of output per vector. This method will be called millions of times. It consumes approximately a third of the time in the larger procedure I need to optimize.
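For illustration only, here is one hypothetical way such an index table might be precomputed, assuming a bit-plane ordering (bit 0 of every dimension first, then bit 1, and so on). The names Dimensions and BitsPerValue and the ordering itself are assumptions for this sketch, not the actual layout used above:
// Hypothetical sketch: fill PrecomputedIndices for a bit-plane interleave.
// The real ordering used in the original code may differ.
const int Dimensions = 4200;   // number of uints in "vector" (assumed)
const int BitsPerValue = 4;    // values 0..12 fit in 4 bits
var PrecomputedIndices = new Indices[Dimensions * BitsPerValue];
int targetBit = 0;
for (int bit = 0; bit < BitsPerValue; bit++)
{
    for (int dim = 0; dim < Dimensions; dim++)
    {
        // Source: bit "bit" of vector[dim]; target: bit (targetBit & 7) of byteVector[targetBit >> 3].
        PrecomputedIndices[targetBit] = new Indices(dim, bit, targetBit >> 3, targetBit & 7);
        targetBit++;
    }
}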
UPDATE 1: Changing "Indices" to a struct and shrinking a few of the datatypes so that the object was just eight bytes (an int, a short, and two bytes) reduced the percentage of execution time from 35% to 30%.

These are the crucial parts of my revised implementation, with ideas drawn from the commenters:
1. Convert the object to a struct, shrink the data types to smaller integers, and rearrange the fields so that the struct fits into a 64-bit value, which is better on a 64-bit machine:
struct Indices
{
/// <summary>
/// Index into source vector of source uint to read.
/// </summary>
public readonly int iFromUintVector;
/// <summary>
/// Index into target vector of target byte to write.
/// </summary>
public readonly short iToByteVector;
/// <summary>
/// Index into source uint of source bit to read.
/// </summary>
public readonly byte iFromUintBit;
/// <summary>
/// Index into target byte of target bit to write.
/// </summary>
public readonly byte iToByteBit;
public Indices(int fromUintVector, byte fromUintBit, short toByteVector, byte toByteBit)
{
iFromUintVector = fromUintVector;
iFromUintBit = fromUintBit;
iToByteVector = toByteVector;
iToByteBit = toByteBit;
}
}
2. Sort the PrecomputedIndices so that each target byte and bit is written in ascending order, which improves memory cache access:
Comparison<Indices> sortByTargetByteAndBit = (a, b) =>
{
if (a.iToByteVector < b.iToByteVector) return -1;
if (a.iToByteVector > b.iToByteVector) return 1;
if (a.iToByteBit < b.iToByteBit) return -1;
if (a.iToByteBit > b.iToByteBit) return 1;
return 0;
};
Array.Sort(PrecomputedIndices, sortByTargetByteAndBit);
3. Unroll the loop so that a whole target byte is assembled at once, reducing the number of times the target array is accessed:
public byte[] Interleave(uint[] vector)
{
var byteVector = new byte[BytesNeeded + 1]; // An extra byte is needed to hold the extra bits and a sign bit for the BigInteger.
var extraBits = Bits - (BytesNeeded << 3);
int iIndex = 0;
var iByte = 0;
for (; iByte < BytesNeeded; iByte++)
{
// Unroll the loop so we compute the bits for a whole byte at a time.
uint bits = 0;
var idx0 = PrecomputedIndices[iIndex];
var idx1 = PrecomputedIndices[iIndex + 1];
var idx2 = PrecomputedIndices[iIndex + 2];
var idx3 = PrecomputedIndices[iIndex + 3];
var idx4 = PrecomputedIndices[iIndex + 4];
var idx5 = PrecomputedIndices[iIndex + 5];
var idx6 = PrecomputedIndices[iIndex + 6];
var idx7 = PrecomputedIndices[iIndex + 7];
bits = (((vector[idx0.iFromUintVector] >> idx0.iFromUintBit) & 1U))
| (((vector[idx1.iFromUintVector] >> idx1.iFromUintBit) & 1U) << 1)
| (((vector[idx2.iFromUintVector] >> idx2.iFromUintBit) & 1U) << 2)
| (((vector[idx3.iFromUintVector] >> idx3.iFromUintBit) & 1U) << 3)
| (((vector[idx4.iFromUintVector] >> idx4.iFromUintBit) & 1U) << 4)
| (((vector[idx5.iFromUintVector] >> idx5.iFromUintBit) & 1U) << 5)
| (((vector[idx6.iFromUintVector] >> idx6.iFromUintBit) & 1U) << 6)
| (((vector[idx7.iFromUintVector] >> idx7.iFromUintBit) & 1U) << 7);
byteVector[iByte] = (byte)bits;
iIndex += 8;
}
for (; iIndex < PrecomputedIndices.Length; iIndex++)
{
var idx = PrecomputedIndices[iIndex];
var bit = (byte)(((vector[idx.iFromUintVector] >> idx.iFromUintBit) & 1U) << idx.iToByteBit);
byteVector[idx.iToByteVector] |= bit;
}
return byteVector;
}
#1 cuts the function from taking up 35% of the execution time to 30% of the execution time (14% savings).
#2 did not speed the function up, but made #3 possible.
#3 cuts the function from 30% of exec time to 19.6%, another 33% in savings.
Total savings: 44%!!!
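As a side note on that "extra byte": System.Numerics.BigInteger interprets a byte[] as a little-endian two's-complement value, so the trailing zero byte keeps the high bit clear and the result non-negative. A minimal usage sketch, assuming the output is meant to feed a BigInteger as the comments suggest:
// The trailing zero byte keeps the most significant bit clear,
// so the little-endian two's-complement value is non-negative.
byte[] byteVector = Interleave(vector);
var asBigInteger = new System.Numerics.BigInteger(byteVector);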

Related

Which one is faster? Regex or EndsWith?

What would be faster?
public String Roll()
{
Random rnd = new Random();
int roll = rnd.Next(1, 100000);
if (Regex.IsMatch(roll.ToString(), @"(.)\1{1,}$"))
{
return "doubles";
}
return "none";
}
Or
public String Roll()
{
Random rnd = new Random();
int roll = rnd.Next(1, 100000);
if (roll.ToString().EndsWith("11") || roll.ToString().EndsWith("22") || roll.ToString().EndsWith("33") || roll.ToString().EndsWith("44") || roll.ToString().EndsWith("55") || roll.ToString().EndsWith("66") || roll.ToString().EndsWith("77") || roll.ToString().EndsWith("88") || roll.ToString().EndsWith("99") || roll.ToString().EndsWith("00"))
{
return "doubles";
}
return "none";
}
I'm currently using a really long list of if-statements full of regexes to check whether an int ends with doubles, triples, quads, quints, etc., so I would like to know which one is faster before I change all of it.
In your particular case, Regex is actually faster... but that is likely because you use EndsWith with many ORs and redundant ToString() calls. If you simplify the logic, a simple string operation will likely be faster.
Here is the performance summary for the text processing, from the fastest to the slowest (10 million loops, [Prefer/Non-Prefer 32-bit]; the rank is ordered based on the faster of the two):
Large Lookup Fast Random UInt (not counted for bounty): 219/273 ms - Mine, improved from Evk's
Large Lookup Optimized Random: 228/273 ms - Ivan Stoev's Alternate Solution
Large Lookup Fast Random: 238/294 ms - Evk's Alternative Solution
Large Lookup Parameterless Random: 248/287 ms - Thomas Ayoub
There are a few notes I want to make on this solution (based on the comments below it):
This solution introduces a 0.0039% bias towards small numbers (< 100000) (ref: Eric Lippert's blog post, linked by Lucas Trzesniewski).
It does not generate the same number sequence as the others while being tested (ref: Michael Liu's comment), since it changes how Random is used (from Random.Next(int) to Random.Next()), and Random drives the test itself.
While the testing cannot be performed with the exact same number sequence for this method as for the rest (as mentioned by Phil1970), I have two points to make:
Some might be interested to look at the implementation of Random.Next() vs Random.Next(int) to understand why this solution would still be faster even if the same sequence of numbers were used.
The use of Random in a real case will (most of the time) not assume the number sequence to be the same (or predictable); it is only for testing our methods that we want the Random sequence to be identical (for fair unit-testing purposes). The expected faster result for this method cannot be fully derived from the testing result alone, but also by looking at the Next() vs Next(int) implementations.
Large Look-up: 320/284 ms - Evk
Fastest Optimized Random Modded: 286/333 ms - Ivan Stoev
Lookup Optimized Modded: 315/329 ms - Corak
Optimized Modded: 471/330 ms - Stian Standahl
Optimized Modded + Constant: 472/337 ms - Gjermund Grøneng
Fastest Optimized Modded: 345/340 ms - Gjermund Grøneng
Modded: 496/370 ms - Corak + possibly Alexei Levenkov
Numbers: 537/408 ms - Alois Kraus
Simple: 1668/1176 ms - Mine
HashSet Contains: 2138/1609 ms - Dandré
List Contains: 3013/2465 ms - Another Mine
Compiled Regex: 8956/7675 ms - Radin Gospodinov
Regex: 15032/16640 ms - OP's Solution 1
EndsWith: 24763/20702 ms - OP's Solution 2
Here are my simple test cases:
Random rnd = new Random(10000);
FastRandom fastRnd = new FastRandom(10000);
//OP's Solution 1 (Regex)
public String RollRegex() {
int roll = rnd.Next(1, 100000);
if (Regex.IsMatch(roll.ToString(), @"(.)\1{1,}$")) {
return "doubles";
} else {
return "none";
}
}
//Radin Gospodinov's Solution
Regex optionRegex = new Regex(@"(.)\1{1,}$", RegexOptions.Compiled);
public String RollOptionRegex() {
int roll = rnd.Next(1, 100000);
string rollString = roll.ToString();
if (optionRegex.IsMatch(rollString)) {
return "doubles";
} else {
return "none";
}
}
//OP's Solution 2 (EndsWith)
public String RollEndsWith() {
int roll = rnd.Next(1, 100000);
if (roll.ToString().EndsWith("11") || roll.ToString().EndsWith("22") || roll.ToString().EndsWith("33") || roll.ToString().EndsWith("44") || roll.ToString().EndsWith("55") || roll.ToString().EndsWith("66") || roll.ToString().EndsWith("77") || roll.ToString().EndsWith("88") || roll.ToString().EndsWith("99") || roll.ToString().EndsWith("00")) {
return "doubles";
} else {
return "none";
}
}
//My Solution
public String RollSimple() {
int roll = rnd.Next(1, 100000);
string rollString = roll.ToString();
return roll > 10 && rollString[rollString.Length - 1] == rollString[rollString.Length - 2] ?
"doubles" : "none";
}
//My Other Solution
List<string> doubles = new List<string>() { "00", "11", "22", "33", "44", "55", "66", "77", "88", "99" };
public String RollAnotherSimple() {
int roll = rnd.Next(1, 100000);
string rollString = roll.ToString();
return rollString.Length > 1 && doubles.Contains(rollString.Substring(rollString.Length - 2)) ?
"doubles" : "none";
}
//Dandré's Solution
HashSet<string> doublesHashset = new HashSet<string>() { "00", "11", "22", "33", "44", "55", "66", "77", "88", "99" };
public String RollSimpleHashSet() {
int roll = rnd.Next(1, 100000);
string rollString = roll.ToString();
return rollString.Length > 1 && doublesHashset.Contains(rollString.Substring(rollString.Length - 2)) ?
"doubles" : "none";
}
//Corak's Solution - hinted by Alexei Levenkov too
public string RollModded() { int roll = rnd.Next(1, 100000); return roll % 100 % 11 == 0 ? "doubles" : "none"; }
//Stian Standahl optimizes modded solution
public string RollOptimizedModded() { return rnd.Next(1, 100000) % 100 % 11 == 0 ? "doubles" : "none"; }
//Gjermund Grøneng's method with constant addition
private const string CONST_DOUBLES = "doubles";
private const string CONST_NONE = "none";
public string RollOptimizedModdedConst() { return rnd.Next(1, 100000) % 100 % 11 == 0 ? CONST_DOUBLES : CONST_NONE; }
//Gjermund Grøneng's method after optimizing the Random (The fastest!)
public string FastestOptimizedModded() { return (rnd.Next(99999) + 1) % 100 % 11 == 0 ? CONST_DOUBLES : CONST_NONE; }
//Corak's Solution, added on Gjermund Grøneng's
private readonly string[] Lookup = { "doubles", "none", "none", "none", "none", "none", "none", "none", "none", "none", "none" };
public string RollLookupOptimizedModded() { return Lookup[(rnd.Next(99999) + 1) % 100 % 11]; }
//Evk's Solution, large Lookup
private string[] LargeLookup;
private void InitLargeLookup() {
LargeLookup = new string[100000];
for (int i = 0; i < 100000; i++) {
LargeLookup[i] = i % 100 % 11 == 0 ? "doubles" : "none";
}
}
public string RollLargeLookup() { return LargeLookup[rnd.Next(99999) + 1]; }
//Thomas Ayoub's Solution
public string RollLargeLookupParameterlessRandom() { return LargeLookup[rnd.Next() % 100000]; }
//Alois Kraus's Solution
public string RollNumbers() {
int roll = rnd.Next(1, 100000);
int lastDigit = roll % 10;
int secondLastDigit = (roll / 10) % 10;
if (lastDigit == secondLastDigit) {
return "doubles";
} else {
return "none";
}
}
//Ivan Stoev's Solution
public string FastestOptimizedRandomModded() {
return ((int)(rnd.Next() * (99999.0 / int.MaxValue)) + 1) % 100 % 11 == 0 ? CONST_DOUBLES : CONST_NONE;
}
//Ivan Stoev's Alternate Solution
public string RollLargeLookupOptimizedRandom() {
return LargeLookup[(int)(rnd.Next() * (99999.0 / int.MaxValue))];
}
//Evk's Solution using FastRandom
public string RollLargeLookupFastRandom() {
return LargeLookup[fastRnd.Next(99999) + 1];
}
//My Own Test, using FastRandom + NextUInt
public string RollLargeLookupFastRandomUInt() {
return LargeLookup[fastRnd.NextUInt() % 99999 + 1];
}
The additional FastRandom class:
//FastRandom's part used for the testing
public class FastRandom {
// The +1 ensures NextDouble doesn't generate 1.0
const double REAL_UNIT_INT = 1.0 / ((double)int.MaxValue + 1.0);
const double REAL_UNIT_UINT = 1.0 / ((double)uint.MaxValue + 1.0);
const uint Y = 842502087, Z = 3579807591, W = 273326509;
uint x, y, z, w;
#region Constructors
/// <summary>
/// Initialises a new instance using time dependent seed.
/// </summary>
public FastRandom() {
// Initialise using the system tick count.
Reinitialise((int)Environment.TickCount);
}
/// <summary>
/// Initialises a new instance using an int value as seed.
/// This constructor signature is provided to maintain compatibility with
/// System.Random
/// </summary>
public FastRandom(int seed) {
Reinitialise(seed);
}
#endregion
#region Public Methods [Reinitialisation]
/// <summary>
/// Reinitialises using an int value as a seed.
/// </summary>
/// <param name="seed"></param>
public void Reinitialise(int seed) {
// The only stipulation stated for the xorshift RNG is that at least one of
// the seeds x,y,z,w is non-zero. We fulfill that requirement by only allowing
// resetting of the x seed
x = (uint)seed;
y = Y;
z = Z;
w = W;
}
#endregion
#region Public Methods [System.Random functionally equivalent methods]
/// <summary>
/// Generates a random int over the range 0 to int.MaxValue-1.
/// MaxValue is not generated in order to remain functionally equivalent to System.Random.Next().
/// This does slightly eat into some of the performance gain over System.Random, but not much.
/// For better performance see:
///
/// Call NextInt() for an int over the range 0 to int.MaxValue.
///
/// Call NextUInt() and cast the result to an int to generate an int over the full Int32 value range
/// including negative values.
/// </summary>
/// <returns></returns>
public int Next() {
uint t = (x ^ (x << 11));
x = y; y = z; z = w;
w = (w ^ (w >> 19)) ^ (t ^ (t >> 8));
// Handle the special case where the value int.MaxValue is generated. This is outside of
// the range of permitted values, so we therefore call Next() to try again.
uint rtn = w & 0x7FFFFFFF;
if (rtn == 0x7FFFFFFF)
return Next();
return (int)rtn;
}
/// <summary>
/// Generates a random int over the range 0 to upperBound-1, and not including upperBound.
/// </summary>
/// <param name="upperBound"></param>
/// <returns></returns>
public int Next(int upperBound) {
if (upperBound < 0)
throw new ArgumentOutOfRangeException("upperBound", upperBound, "upperBound must be >=0");
uint t = (x ^ (x << 11));
x = y; y = z; z = w;
// The explicit int cast before the first multiplication gives better performance.
// See comments in NextDouble.
return (int)((REAL_UNIT_INT * (int)(0x7FFFFFFF & (w = (w ^ (w >> 19)) ^ (t ^ (t >> 8))))) * upperBound);
}
/// <summary>
/// Generates a random int over the range lowerBound to upperBound-1, and not including upperBound.
/// upperBound must be >= lowerBound. lowerBound may be negative.
/// </summary>
/// <param name="lowerBound"></param>
/// <param name="upperBound"></param>
/// <returns></returns>
public int Next(int lowerBound, int upperBound) {
if (lowerBound > upperBound)
throw new ArgumentOutOfRangeException("upperBound", upperBound, "upperBound must be >=lowerBound");
uint t = (x ^ (x << 11));
x = y; y = z; z = w;
// The explicit int cast before the first multiplication gives better performance.
// See comments in NextDouble.
int range = upperBound - lowerBound;
if (range < 0) { // If range is < 0 then an overflow has occurred and we must resort to using long integer arithmetic instead (slower).
// We also must use all 32 bits of precision, instead of the normal 31, which again is slower.
return lowerBound + (int)((REAL_UNIT_UINT * (double)(w = (w ^ (w >> 19)) ^ (t ^ (t >> 8)))) * (double)((long)upperBound - (long)lowerBound));
}
// 31 bits of precision will suffice if range<=int.MaxValue. This allows us to cast to an int and gain
// a little more performance.
return lowerBound + (int)((REAL_UNIT_INT * (double)(int)(0x7FFFFFFF & (w = (w ^ (w >> 19)) ^ (t ^ (t >> 8))))) * (double)range);
}
/// <summary>
/// Generates a random double. Values returned are from 0.0 up to but not including 1.0.
/// </summary>
/// <returns></returns>
public double NextDouble() {
uint t = (x ^ (x << 11));
x = y; y = z; z = w;
// Here we can gain a 2x speed improvement by generating a value that can be cast to
// an int instead of the more easily available uint. If we then explicitly cast to an
// int the compiler will then cast the int to a double to perform the multiplication,
// this final cast is a lot faster than casting from a uint to a double. The extra cast
// to an int is very fast (the allocated bits remain the same) and so the overall effect
// of the extra cast is a significant performance improvement.
//
// Also note that the loss of one bit of precision is equivalent to what occurs within
// System.Random.
return (REAL_UNIT_INT * (int)(0x7FFFFFFF & (w = (w ^ (w >> 19)) ^ (t ^ (t >> 8)))));
}
/// <summary>
/// Fills the provided byte array with random bytes.
/// This method is functionally equivalent to System.Random.NextBytes().
/// </summary>
/// <param name="buffer"></param>
public void NextBytes(byte[] buffer) {
// Fill up the bulk of the buffer in chunks of 4 bytes at a time.
uint x = this.x, y = this.y, z = this.z, w = this.w;
int i = 0;
uint t;
for (int bound = buffer.Length - 3; i < bound; ) {
// Generate 4 bytes.
// Increased performance is achieved by generating 4 random bytes per loop.
// Also note that no mask needs to be applied to zero out the higher order bytes before
// casting because the cast ignores those bytes. Thanks to Stefan Troschütz for pointing this out.
t = (x ^ (x << 11));
x = y; y = z; z = w;
w = (w ^ (w >> 19)) ^ (t ^ (t >> 8));
buffer[i++] = (byte)w;
buffer[i++] = (byte)(w >> 8);
buffer[i++] = (byte)(w >> 16);
buffer[i++] = (byte)(w >> 24);
}
// Fill up any remaining bytes in the buffer.
if (i < buffer.Length) {
// Generate 4 bytes.
t = (x ^ (x << 11));
x = y; y = z; z = w;
w = (w ^ (w >> 19)) ^ (t ^ (t >> 8));
buffer[i++] = (byte)w;
if (i < buffer.Length) {
buffer[i++] = (byte)(w >> 8);
if (i < buffer.Length) {
buffer[i++] = (byte)(w >> 16);
if (i < buffer.Length) {
buffer[i] = (byte)(w >> 24);
}
}
}
}
this.x = x; this.y = y; this.z = z; this.w = w;
}
// /// <summary>
// /// A version of NextBytes that uses a pointer to set 4 bytes of the byte buffer in one operation
// /// thus providing a nice speedup. The loop is also partially unrolled to allow out-of-order-execution,
// /// this results in about a x2 speedup on an AMD Athlon. Thus performance may vary wildly on different CPUs
// /// depending on the number of execution units available.
// ///
// /// Another significant speedup is obtained by setting the 4 bytes by indexing pDWord (e.g. pDWord[i++]=w)
// /// instead of adjusting it dereferencing it (e.g. *pDWord++=w).
// ///
// /// Note that this routine requires the unsafe compilation flag to be specified and so is commented out by default.
// /// </summary>
// /// <param name="buffer"></param>
// public unsafe void NextBytesUnsafe(byte[] buffer)
// {
// if(buffer.Length % 8 != 0)
// throw new ArgumentException("Buffer length must be divisible by 8", "buffer");
//
// uint x=this.x, y=this.y, z=this.z, w=this.w;
//
// fixed(byte* pByte0 = buffer)
// {
// uint* pDWord = (uint*)pByte0;
// for(int i=0, len=buffer.Length>>2; i < len; i+=2)
// {
// uint t=(x^(x<<11));
// x=y; y=z; z=w;
// pDWord[i] = w = (w^(w>>19))^(t^(t>>8));
//
// t=(x^(x<<11));
// x=y; y=z; z=w;
// pDWord[i+1] = w = (w^(w>>19))^(t^(t>>8));
// }
// }
//
// this.x=x; this.y=y; this.z=z; this.w=w;
// }
#endregion
#region Public Methods [Methods not present on System.Random]
/// <summary>
/// Generates a uint. Values returned are over the full range of a uint,
/// uint.MinValue to uint.MaxValue, inclusive.
///
/// This is the fastest method for generating a single random number because the underlying
/// random number generator algorithm generates 32 random bits that can be cast directly to
/// a uint.
/// </summary>
/// <returns></returns>
public uint NextUInt() {
uint t = (x ^ (x << 11));
x = y; y = z; z = w;
return (w = (w ^ (w >> 19)) ^ (t ^ (t >> 8)));
}
/// <summary>
/// Generates a random int over the range 0 to int.MaxValue, inclusive.
/// This method differs from Next() only in that the range is 0 to int.MaxValue
/// and not 0 to int.MaxValue-1.
///
/// The slight difference in range means this method is slightly faster than Next()
/// but is not functionally equivalent to System.Random.Next().
/// </summary>
/// <returns></returns>
public int NextInt() {
uint t = (x ^ (x << 11));
x = y; y = z; z = w;
return (int)(0x7FFFFFFF & (w = (w ^ (w >> 19)) ^ (t ^ (t >> 8))));
}
// Buffer 32 bits in bitBuffer, return 1 at a time, keep track of how many have been returned
// with bitBufferIdx.
uint bitBuffer;
uint bitMask = 1;
/// <summary>
/// Generates a single random bit.
/// This method's performance is improved by generating 32 bits in one operation and storing them
/// ready for future calls.
/// </summary>
/// <returns></returns>
public bool NextBool() {
if (bitMask == 1) {
// Generate 32 more bits.
uint t = (x ^ (x << 11));
x = y; y = z; z = w;
bitBuffer = w = (w ^ (w >> 19)) ^ (t ^ (t >> 8));
// Reset the bitMask that tells us which bit to read next.
bitMask = 0x80000000;
return (bitBuffer & bitMask) == 0;
}
return (bitBuffer & (bitMask >>= 1)) == 0;
}
#endregion
}
The test scenario:
public delegate string RollDelegate();
private void Test() {
List<string> rollMethodNames = new List<string>(){
"Large Lookup Fast Random UInt",
"Large Lookup Fast Random",
"Large Lookup Optimized Random",
"Fastest Optimized Random Modded",
"Numbers",
"Large Lookup Parameterless Random",
"Large Lookup",
"Lookup Optimized Modded",
"Fastest Optimized Modded",
"Optimized Modded Const",
"Optimized Modded",
"Modded",
"Simple",
"Another simple with HashSet",
"Another Simple",
"Option (Compiled) Regex",
"Regex",
"EndsWith",
};
List<RollDelegate> rollMethods = new List<RollDelegate>{
RollLargeLookupFastRandomUInt,
RollLargeLookupFastRandom,
RollLargeLookupOptimizedRandom,
FastestOptimizedRandomModded,
RollNumbers,
RollLargeLookupParameterlessRandom,
RollLargeLookup,
RollLookupOptimizedModded,
FastestOptimizedModded,
RollOptimizedModdedConst,
RollOptimizedModded,
RollModded,
RollSimple,
RollSimpleHashSet,
RollAnotherSimple,
RollOptionRegex,
RollRegex,
RollEndsWith
};
int trial = 10000000;
InitLargeLookup();
for (int k = 0; k < rollMethods.Count; ++k) {
rnd = new Random(10000);
fastRnd = new FastRandom(10000);
logBox.GetTimeLapse();
for (int i = 0; i < trial; ++i)
rollMethods[k]();
logBox.WriteTimedLogLine(rollMethodNames[k] + ": " + logBox.GetTimeLapse());
}
}
The result (Prefer 32-Bit):
[2016-05-30 08:20:54.056 UTC] Large Lookup Fast Random UInt: 219 ms
[2016-05-30 08:20:54.296 UTC] Large Lookup Fast Random: 238 ms
[2016-05-30 08:20:54.524 UTC] Large Lookup Optimized Random: 228 ms
[2016-05-30 08:20:54.810 UTC] Fastest Optimized Random Modded: 286 ms
[2016-05-30 08:20:55.347 UTC] Numbers: 537 ms
[2016-05-30 08:20:55.596 UTC] Large Lookup Parameterless Random: 248 ms
[2016-05-30 08:20:55.916 UTC] Large Lookup: 320 ms
[2016-05-30 08:20:56.231 UTC] Lookup Optimized Modded: 315 ms
[2016-05-30 08:20:56.577 UTC] Fastest Optimized Modded: 345 ms
[2016-05-30 08:20:57.049 UTC] Optimized Modded Const: 472 ms
[2016-05-30 08:20:57.521 UTC] Optimized Modded: 471 ms
[2016-05-30 08:20:58.017 UTC] Modded: 496 ms
[2016-05-30 08:20:59.685 UTC] Simple: 1668 ms
[2016-05-30 08:21:01.824 UTC] Another simple with HashSet: 2138 ms
[2016-05-30 08:21:04.837 UTC] Another Simple: 3013 ms
[2016-05-30 08:21:13.794 UTC] Option (Compiled) Regex: 8956 ms
[2016-05-30 08:21:28.827 UTC] Regex: 15032 ms
[2016-05-30 08:21:53.589 UTC] EndsWith: 24763 ms
The result (Non Prefer 32-Bit):
[2016-05-30 08:16:00.934 UTC] Large Lookup Fast Random UInt: 273 ms
[2016-05-30 08:16:01.230 UTC] Large Lookup Fast Random: 294 ms
[2016-05-30 08:16:01.503 UTC] Large Lookup Optimized Random: 273 ms
[2016-05-30 08:16:01.837 UTC] Fastest Optimized Random Modded: 333 ms
[2016-05-30 08:16:02.245 UTC] Numbers: 408 ms
[2016-05-30 08:16:02.532 UTC] Large Lookup Parameterless Random: 287 ms
[2016-05-30 08:16:02.816 UTC] Large Lookup: 284 ms
[2016-05-30 08:16:03.145 UTC] Lookup Optimized Modded: 329 ms
[2016-05-30 08:16:03.486 UTC] Fastest Optimized Modded: 340 ms
[2016-05-30 08:16:03.824 UTC] Optimized Modded Const: 337 ms
[2016-05-30 08:16:04.154 UTC] Optimized Modded: 330 ms
[2016-05-30 08:16:04.524 UTC] Modded: 370 ms
[2016-05-30 08:16:05.700 UTC] Simple: 1176 ms
[2016-05-30 08:16:07.309 UTC] Another simple with HashSet: 1609 ms
[2016-05-30 08:16:09.774 UTC] Another Simple: 2465 ms
[2016-05-30 08:16:17.450 UTC] Option (Compiled) Regex: 7675 ms
[2016-05-30 08:16:34.090 UTC] Regex: 16640 ms
[2016-05-30 08:16:54.793 UTC] EndsWith: 20702 ms
@Stian Standahl's friend here. This solution is the fastest! It is the same as the previous fastest example in @Ian's answer, but the random generator is optimized here.
private const string CONST_DOUBLES = "doubles";
private const string CONST_NONE = "none";
public string FastestOptimizedModded()
{
return (rnd.Next(99999)+1) % 100 % 11 == 0 ? CONST_DOUBLES : CONST_NONE;
}
As for the performance numbers, I believe @Ian already covered those quite nicely. All credit goes to him.
One thing that isn't answered to my satisfaction in the Q/A is why Regex'es outperform EndsWith in this case. I felt the need to explain the difference so people realize which solution will probably work better in which scenario.
EndsWith
The EndsWith functionality is basically a 'compare' on part of the string in sequential order. Something like this:
bool EndsWith(string haystack, string needle)
{
bool equal = haystack.Length >= needle.Length;
for (int i=0; i<needle.Length && equal; ++i)
{
equal = haystack[haystack.Length - needle.Length + i] == needle[i];
}
return equal;
}
The code is pretty straightforward; we simply compare the needle character by character against the tail of the haystack until we hit the end of the string.
Regex
Regex'es work differently. Consider looking for the needle "foofoo" in a very large haystack. The obvious implementation is to start at the first character, check if it's an 'f', move to the next character, and so on until we hit the end of the string. However, we can do much better:
Look closely at the task. If we first look at character 5 of the string and notice that it's not an 'o' (the last character of the needle), we can immediately skip ahead to character 11 and again check whether it's an 'o'. That way we get a nice improvement over our original code of a factor of 6 in the best case, and the same performance in the worst case.
Also note that regexes can become more complex with 'or's, 'and's, etc. Doing forward scans no longer makes a lot of sense if we only need to look at the trailing characters.
This is why Regex'es usually work with NFA's that are compiled to DFA's. There's a great online tool here: http://hackingoff.com/compilers/regular-expression-to-nfa-dfa that shows what this looks like (for simple regex'es).
Internally, you can ask .NET to compile a Regex using Reflection.Emit and when you use a regex, you actually evaluate this optimized, compiled state machine (RegexOptions.Compiled).
What you will probably end up with is something that works like this:
bool Matches(string haystack)
{
char _1;
int index = 0;
// match (.)
state0:
if (index >= haystack.Length)
{
goto stateFail;
}
_1 = haystack[index];
++index;
goto state1;
// match \1{1}
state1:
if (index >= haystack.Length)
{
goto stateFail;
}
if (_1 == haystack[index])
{
++index;
goto state2;
}
goto stateFail;
// match \1{2,*}$ -- usually optimized away because it always succeeds
state2:
if (index >= haystack.Length)
{
goto stateSuccess;
}
if (_1 == haystack[index])
{
++index;
goto state2;
}
goto stateSuccess;
stateSuccess:
return true;
stateFail:
return false;
}
So what's faster?
Well, that depends. There's overhead in determining the NFA/DFA from the expression, compiling the program and for each call looking up the program and evaluating it. For very simple cases, an EndsWith beats the Regex. In this case it's the 'OR's in the EndsWith that make it slower than the Regex.
On the other hand, a Regex is usually something you use multiple times, which means that you only have to compile it once, and simply look it up for each call.
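To make that concrete (a sketch of my own, not one of the benchmarked methods; the names are made up), the difference is between constructing the Regex on every call and caching one compiled instance, which is exactly what RollOptionRegex above does:
// Pays pattern parsing on every call:
static bool HasDoublesSlow(int n) { return new Regex(@"(.)\1{1,}$").IsMatch(n.ToString()); }
// Parses and compiles once; each call only runs the compiled state machine:
static readonly Regex Trailing = new Regex(@"(.)\1{1,}$", RegexOptions.Compiled);
static bool HasDoublesFast(int n) { return Trailing.IsMatch(n.ToString()); }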
Since the discussion has by now moved to Random micro-optimizations, I'll concentrate on the LargeLookup implementations.
First off, in addition to the bias, the RollLargeLookupParameterlessRandom solution has another issue. All the other implementations check random numbers in the range [1, 99999] inclusive, i.e. 99999 numbers in total, while % 100000 generates the range [0, 99999] inclusive, i.e. 100000 numbers in total.
So let's correct that and, at the same time, optimize the RollLargeLookup implementation a bit by removing the add operation:
private string[] LargeLookup;
private void InitLargeLookup()
{
LargeLookup = new string[99999];
for (int i = 0; i < LargeLookup.Length; i++)
{
LargeLookup[i] = (i + 1) % 100 % 11 == 0 ? "doubles" : "none";
}
}
public string RollLargeLookup()
{
return LargeLookup[rnd.Next(99999)];
}
public string RollLargeLookupParameterlessRandom()
{
return LargeLookup[rnd.Next() % 99999];
}
Now, can we optimize the RollLargeLookupParameterlessRandom implementation further and at the same time remove the aforementioned bias issue and make it compatible with the other implementations? It turns out that we can. To do that, we again need to know the Random.Next(maxValue) implementation, which is something like this:
return (int)((Next() * (1.0 / int.MaxValue)) * maxValue);
Note that 1.0 / int.MaxValue is a constant evaluated at compile time. The idea is to fold maxValue (also a constant, 99999 in our case) into it, using 99999.0 / int.MaxValue instead, thus eliminating one multiplication at run time. So the resulting function is:
public string RollLargeLookupOptimizedRandom()
{
return LargeLookup[(int)(rnd.Next() * (99999.0 / int.MaxValue))];
}
Interestingly, this not only fixes the RollLargeLookupParameterlessRandom issues but also is a little bit faster.
Actually, this optimization can be applied to any of the other solutions, so the fastest non-lookup implementation would be:
public string FastestOptimizedRandomModded()
{
return ((int)(rnd.Next() * (99999.0 / int.MaxValue)) + 1) % 100 % 11 == 0 ? CONST_DOUBLES : CONST_NONE;
}
But before showing the performance tests, let's prove that the result is compatible with the Random.Next(maxValue) implementation:
for (int n = 0; n < int.MaxValue; n++)
{
var n1 = (int)((n * (1.0 / int.MaxValue)) * 99999);
var n2 = (int)(n * (99999.0 / int.MaxValue));
Debug.Assert(n1 == n2);
}
Finally, my benchmarks:
64 OS, Release build, Prefer 32 bit = True
Large Lookup Optimized Random: 149 ms
Large Lookup Parameterless Random: 159 ms
Large Lookup: 179 ms
Lookup Optimized Modded: 231 ms
Fastest Optimized Random Modded: 219 ms
Fastest Optimized Modded: 251 ms
Optimized Modded Const: 412 ms
Optimized Modded: 416 ms
Modded: 419 ms
Simple: 1343 ms
Another simple with HashSet: 1805 ms
Another Simple: 2690 ms
Option (Compiled) Regex: 8538 ms
Regex: 14861 ms
EndsWith: 39117 ms
64 OS, Release build, Prefer 32 bit = False
Large Lookup Optimized Random: 121 ms
Large Lookup Parameterless Random: 126 ms
Large Lookup: 156 ms
Lookup Optimized Modded: 168 ms
Fastest Optimized Random Modded: 154 ms
Fastest Optimized Modded: 186 ms
Optimized Modded Const: 178 ms
Optimized Modded: 180 ms
Modded: 202 ms
Simple: 795 ms
Another simple with HashSet: 1287 ms
Another Simple: 2178 ms
Option (Compiled) Regex: 7246 ms
Regex: 17090 ms
EndsWith: 36554 ms
A bit more performance could be squeezed out by pregenerating the whole lookup table for all possible values. This avoids the two modulo operations in the fastest method and so is a bit faster:
private string[] LargeLookup;
private void Init() {
LargeLookup = new string[100000];
for (int i = 0; i < 100000; i++) {
LargeLookup[i] = i%100%11 == 0 ? "doubles" : "none";
}
}
And the method itself is then just:
public string RollLargeLookup() {
return LargeLookup[rnd.Next(99999) + 1];
}
While it looks somewhat contrived, such methods are often used. For example, the fastest known poker hand evaluator pregenerates a huge array with hundreds of thousands of entries (using very clever tricks) and then just makes several simple lookups on this array to evaluate the strength of one poker hand against another in no time.
You can make it even faster still by using an alternative random number generator. For example, if you replace System.Random with this FastRandom class implementation (based on the xorshift algorithm), it will be twice as fast.
If you implement both the large lookup table and FastRandom, my computer shows 100 ms vs the 220 ms of RollLookupOptimizedModded.
Here is the source code of FastRandom class mentioned in my link above:
public class FastRandom
{
// The +1 ensures NextDouble doesn't generate 1.0
const double REAL_UNIT_INT = 1.0 / ((double)int.MaxValue + 1.0);
const double REAL_UNIT_UINT = 1.0 / ((double)uint.MaxValue + 1.0);
const uint Y = 842502087, Z = 3579807591, W = 273326509;
uint x, y, z, w;
#region Constructors
/// <summary>
/// Initialises a new instance using time dependent seed.
/// </summary>
public FastRandom()
{
// Initialise using the system tick count.
Reinitialise((int)Environment.TickCount);
}
/// <summary>
/// Initialises a new instance using an int value as seed.
/// This constructor signature is provided to maintain compatibility with
/// System.Random
/// </summary>
public FastRandom(int seed)
{
Reinitialise(seed);
}
#endregion
#region Public Methods [Reinitialisation]
/// <summary>
/// Reinitialises using an int value as a seed.
/// </summary>
/// <param name="seed"></param>
public void Reinitialise(int seed)
{
// The only stipulation stated for the xorshift RNG is that at least one of
// the seeds x,y,z,w is non-zero. We fulfill that requirement by only allowing
// resetting of the x seed
x = (uint)seed;
y = Y;
z = Z;
w = W;
}
#endregion
#region Public Methods [System.Random functionally equivalent methods]
/// <summary>
/// Generates a random int over the range 0 to int.MaxValue-1.
/// MaxValue is not generated in order to remain functionally equivalent to System.Random.Next().
/// This does slightly eat into some of the performance gain over System.Random, but not much.
/// For better performance see:
///
/// Call NextInt() for an int over the range 0 to int.MaxValue.
///
/// Call NextUInt() and cast the result to an int to generate an int over the full Int32 value range
/// including negative values.
/// </summary>
/// <returns></returns>
public int Next()
{
uint t = (x ^ (x << 11));
x = y; y = z; z = w;
w = (w ^ (w >> 19)) ^ (t ^ (t >> 8));
// Handle the special case where the value int.MaxValue is generated. This is outside of
// the range of permitted values, so we therefore call Next() to try again.
uint rtn = w & 0x7FFFFFFF;
if (rtn == 0x7FFFFFFF)
return Next();
return (int)rtn;
}
/// <summary>
/// Generates a random int over the range 0 to upperBound-1, and not including upperBound.
/// </summary>
/// <param name="upperBound"></param>
/// <returns></returns>
public int Next(int upperBound)
{
if (upperBound < 0)
throw new ArgumentOutOfRangeException("upperBound", upperBound, "upperBound must be >=0");
uint t = (x ^ (x << 11));
x = y; y = z; z = w;
// The explicit int cast before the first multiplication gives better performance.
// See comments in NextDouble.
return (int)((REAL_UNIT_INT * (int)(0x7FFFFFFF & (w = (w ^ (w >> 19)) ^ (t ^ (t >> 8))))) * upperBound);
}
/// <summary>
/// Generates a random int over the range lowerBound to upperBound-1, and not including upperBound.
/// upperBound must be >= lowerBound. lowerBound may be negative.
/// </summary>
/// <param name="lowerBound"></param>
/// <param name="upperBound"></param>
/// <returns></returns>
public int Next(int lowerBound, int upperBound)
{
if (lowerBound > upperBound)
throw new ArgumentOutOfRangeException("upperBound", upperBound, "upperBound must be >=lowerBound");
uint t = (x ^ (x << 11));
x = y; y = z; z = w;
// The explicit int cast before the first multiplication gives better performance.
// See comments in NextDouble.
int range = upperBound - lowerBound;
if (range < 0)
{ // If range is < 0 then an overflow has occurred and we must resort to using long integer arithmetic instead (slower).
// We also must use all 32 bits of precision, instead of the normal 31, which again is slower.
return lowerBound + (int)((REAL_UNIT_UINT * (double)(w = (w ^ (w >> 19)) ^ (t ^ (t >> 8)))) * (double)((long)upperBound - (long)lowerBound));
}
// 31 bits of precision will suffice if range<=int.MaxValue. This allows us to cast to an int and gain
// a little more performance.
return lowerBound + (int)((REAL_UNIT_INT * (double)(int)(0x7FFFFFFF & (w = (w ^ (w >> 19)) ^ (t ^ (t >> 8))))) * (double)range);
}
/// <summary>
/// Generates a random double. Values returned are from 0.0 up to but not including 1.0.
/// </summary>
/// <returns></returns>
public double NextDouble()
{
uint t = (x ^ (x << 11));
x = y; y = z; z = w;
// Here we can gain a 2x speed improvement by generating a value that can be cast to
// an int instead of the more easily available uint. If we then explicitly cast to an
// int the compiler will then cast the int to a double to perform the multiplication,
// this final cast is a lot faster than casting from a uint to a double. The extra cast
// to an int is very fast (the allocated bits remain the same) and so the overall effect
// of the extra cast is a significant performance improvement.
//
// Also note that the loss of one bit of precision is equivalent to what occurs within
// System.Random.
return (REAL_UNIT_INT * (int)(0x7FFFFFFF & (w = (w ^ (w >> 19)) ^ (t ^ (t >> 8)))));
}
/// <summary>
/// Fills the provided byte array with random bytes.
/// This method is functionally equivalent to System.Random.NextBytes().
/// </summary>
/// <param name="buffer"></param>
public void NextBytes(byte[] buffer)
{
// Fill up the bulk of the buffer in chunks of 4 bytes at a time.
uint x = this.x, y = this.y, z = this.z, w = this.w;
int i = 0;
uint t;
for (int bound = buffer.Length - 3; i < bound;)
{
// Generate 4 bytes.
// Increased performance is achieved by generating 4 random bytes per loop.
// Also note that no mask needs to be applied to zero out the higher order bytes before
// casting because the cast ignores those bytes. Thanks to Stefan Troschütz for pointing this out.
t = (x ^ (x << 11));
x = y; y = z; z = w;
w = (w ^ (w >> 19)) ^ (t ^ (t >> 8));
buffer[i++] = (byte)w;
buffer[i++] = (byte)(w >> 8);
buffer[i++] = (byte)(w >> 16);
buffer[i++] = (byte)(w >> 24);
}
// Fill up any remaining bytes in the buffer.
if (i < buffer.Length)
{
// Generate 4 bytes.
t = (x ^ (x << 11));
x = y; y = z; z = w;
w = (w ^ (w >> 19)) ^ (t ^ (t >> 8));
buffer[i++] = (byte)w;
if (i < buffer.Length)
{
buffer[i++] = (byte)(w >> 8);
if (i < buffer.Length)
{
buffer[i++] = (byte)(w >> 16);
if (i < buffer.Length)
{
buffer[i] = (byte)(w >> 24);
}
}
}
}
this.x = x; this.y = y; this.z = z; this.w = w;
}
// /// <summary>
// /// A version of NextBytes that uses a pointer to set 4 bytes of the byte buffer in one operation
// /// thus providing a nice speedup. The loop is also partially unrolled to allow out-of-order-execution,
// /// this results in about a x2 speedup on an AMD Athlon. Thus performance may vary wildly on different CPUs
// /// depending on the number of execution units available.
// ///
// /// Another significant speedup is obtained by setting the 4 bytes by indexing pDWord (e.g. pDWord[i++]=w)
// /// instead of adjusting it dereferencing it (e.g. *pDWord++=w).
// ///
// /// Note that this routine requires the unsafe compilation flag to be specified and so is commented out by default.
// /// </summary>
// /// <param name="buffer"></param>
// public unsafe void NextBytesUnsafe(byte[] buffer)
// {
// if(buffer.Length % 8 != 0)
// throw new ArgumentException("Buffer length must be divisible by 8", "buffer");
//
// uint x=this.x, y=this.y, z=this.z, w=this.w;
//
// fixed(byte* pByte0 = buffer)
// {
// uint* pDWord = (uint*)pByte0;
// for(int i=0, len=buffer.Length>>2; i < len; i+=2)
// {
// uint t=(x^(x<<11));
// x=y; y=z; z=w;
// pDWord[i] = w = (w^(w>>19))^(t^(t>>8));
//
// t=(x^(x<<11));
// x=y; y=z; z=w;
// pDWord[i+1] = w = (w^(w>>19))^(t^(t>>8));
// }
// }
//
// this.x=x; this.y=y; this.z=z; this.w=w;
// }
#endregion
#region Public Methods [Methods not present on System.Random]
/// <summary>
/// Generates a uint. Values returned are over the full range of a uint,
/// uint.MinValue to uint.MaxValue, inclusive.
///
/// This is the fastest method for generating a single random number because the underlying
/// random number generator algorithm generates 32 random bits that can be cast directly to
/// a uint.
/// </summary>
/// <returns></returns>
public uint NextUInt()
{
uint t = (x ^ (x << 11));
x = y; y = z; z = w;
return (w = (w ^ (w >> 19)) ^ (t ^ (t >> 8)));
}
/// <summary>
/// Generates a random int over the range 0 to int.MaxValue, inclusive.
/// This method differs from Next() only in that the range is 0 to int.MaxValue
/// and not 0 to int.MaxValue-1.
///
/// The slight difference in range means this method is slightly faster than Next()
/// but is not functionally equivalent to System.Random.Next().
/// </summary>
/// <returns></returns>
public int NextInt()
{
uint t = (x ^ (x << 11));
x = y; y = z; z = w;
return (int)(0x7FFFFFFF & (w = (w ^ (w >> 19)) ^ (t ^ (t >> 8))));
}
// Buffer 32 bits in bitBuffer, return 1 at a time, keep track of how many have been returned
// with bitBufferIdx.
uint bitBuffer;
uint bitMask = 1;
/// <summary>
/// Generates a single random bit.
/// This method's performance is improved by generating 32 bits in one operation and storing them
/// ready for future calls.
/// </summary>
/// <returns></returns>
public bool NextBool()
{
if (bitMask == 1)
{
// Generate 32 more bits.
uint t = (x ^ (x << 11));
x = y; y = z; z = w;
bitBuffer = w = (w ^ (w >> 19)) ^ (t ^ (t >> 8));
// Reset the bitMask that tells us which bit to read next.
bitMask = 0x80000000;
return (bitBuffer & bitMask) == 0;
}
return (bitBuffer & (bitMask >>= 1)) == 0;
}
#endregion
}
Then you need to initialize it together with your Random:
Random rnd = new Random(10000);
FastRandom fastRnd = new FastRandom(10000);
And method becomes:
public string RollLargeLookup() {
return LargeLookup[fastRnd.Next(99999) + 1];
}
As several others have already pointed out, string comparisons for numbers are not efficient.
public static String RollNumbers()
{
int roll = rnd.Next(1, 100000);
int lastDigit = roll % 10;
int secondLastDigit = (roll / 10) % 10;
if( lastDigit == secondLastDigit )
{
return "doubles";
}
else
{
return "none";
}
}
That runs on my machine in 50 ms vs the 1200 ms of the original approach. Most time is spent allocating many small temporary objects. If you can, you should get rid of strings in the first place. If this is your hot code path, it can help to convert your data structures into something that is more expensive to create but very cheap to query. The lookup tables that have been shown here are a good start.
If you look closely at the LargeLookup implementation, you will find that most of its good performance comes from a cheat: it does not use a string as the key, but uses the initial random number (with some calculation) as the index.
If you try my solution it will most likely run faster, because lookup tables tend to have bad cache locality, which makes memory accesses more expensive.
The fastest that I was able to achieve is optimizing the use of Random with the large lookup method:
return LargeLookup[rnd.Next() % 100000];
And it runs 20% faster than the original, since it avoids the extra scaling work in the ranged overload (look at the Next() code vs the Next(int maxValue) code).
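Roughly speaking (paraphrasing the .NET Framework reference source, not quoting it; this sketch is mine, not part of the original answer), the difference per call looks like this:
int viaRanged = rnd.Next(100000);    // internally roughly: (int)(InternalSample() * (1.0 / int.MaxValue) * 100000)
int viaModulo = rnd.Next() % 100000; // skips the double-precision scaling, at the cost of a small modulo bias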
Looking for real fairness: IMHO, I changed the way the methods were tested a bit.
TL;DR: here's the dashboard:
|-----------------Name---------------|--Min--|--Max--|---Avg---|
|------------------------------------|-------|-------|---------|
|RollLargeLookup | 108| 122| 110,2|
|RollLookupOptimizedModded | 141| 156| 145,5|
|RollOptimizedModdedConst | 156| 159| 156,7|
|RollOptimizedModded | 158| 163| 159,8|
|RollNumbers | 197| 214| 200,9|
|RollSimple | 1 242| 1 304| 1 260,8|
|RollSimpleHashSet | 1 635| 1 774| 1 664,6|
|RollAnotherSimple | 2 544| 2 732| 2 603,2|
|RollOptionRegex | 9 137| 9 605| 9 300,6|
|RollRegex | 17 510| 18 873| 17 959 |
|RollEndsWith | 20 725| 22 001| 21 196,1|
I changed a few points:
Pre-computed the numbers to test so each method was tested with the same set of numbers (taking the random-generation war, and the bias I introduced, out of the picture);
Ran each method 10 times in a random order;
Introduced a parameter in each function;
Removed the dupes.
I created a class MethodToTest:
public class MethodToTest
{
public delegate string RollDelegate(int number);
public RollDelegate MethodDelegate { get; set; }
public List<long> timeSpent { get; set; }
public MethodToTest()
{
timeSpent = new List<long>();
}
public string TimeStats()
{
return string.Format("Min: {0}ms, Max: {1}ms, Avg: {2}ms", timeSpent.Min(), timeSpent.Max(),
timeSpent.Average());
}
}
Here's the main content:
private static void Test()
{
List<MethodToTest> methodList = new List<MethodToTest>
{
new MethodToTest{ MethodDelegate = RollNumbers},
new MethodToTest{ MethodDelegate = RollLargeLookup},
new MethodToTest{ MethodDelegate = RollLookupOptimizedModded},
new MethodToTest{ MethodDelegate = RollOptimizedModdedConst},
new MethodToTest{ MethodDelegate = RollOptimizedModded},
new MethodToTest{ MethodDelegate = RollSimple},
new MethodToTest{ MethodDelegate = RollSimpleHashSet},
new MethodToTest{ MethodDelegate = RollAnotherSimple},
new MethodToTest{ MethodDelegate = RollOptionRegex},
new MethodToTest{ MethodDelegate = RollRegex},
new MethodToTest{ MethodDelegate = RollEndsWith},
};
InitLargeLookup();
Stopwatch s = new Stopwatch();
Random rnd = new Random();
List<int> Randoms = new List<int>();
const int trial = 10000000;
const int numberOfLoop = 10;
for (int j = 0; j < numberOfLoop; j++)
{
Console.Out.WriteLine("Loop: " + j);
Randoms.Clear();
for (int i = 0; i < trial; ++i)
Randoms.Add(rnd.Next(1, 100000));
// Shuffle order
foreach (MethodToTest method in methodList.OrderBy(m => new Random().Next()))
{
s.Restart();
for (int i = 0; i < trial; ++i)
method.MethodDelegate(Randoms[i]);
method.timeSpent.Add(s.ElapsedMilliseconds);
Console.Out.WriteLine("\tMethod: " +method.MethodDelegate.Method.Name);
}
}
File.WriteAllLines(@"C:\Users\me\Desktop\out.txt", methodList.OrderBy(m => m.timeSpent.Average()).Select(method => method.MethodDelegate.Method.Name + ": " + method.TimeStats()));
}
And here are the functions:
//OP's Solution 1 (Regex)
public static String RollRegex(int number)
{
return Regex.IsMatch(number.ToString(), @"(.)\1{1,}$") ? "doubles" : "none";
}
//Radin Gospodinov's Solution
static readonly Regex OptionRegex = new Regex(@"(.)\1{1,}$", RegexOptions.Compiled);
public static String RollOptionRegex(int number)
{
return OptionRegex.IsMatch(number.ToString()) ? "doubles" : "none";
}
//OP's Solution 2 (EndsWith)
public static String RollEndsWith(int number)
{
if (number.ToString().EndsWith("11") || number.ToString().EndsWith("22") || number.ToString().EndsWith("33") ||
number.ToString().EndsWith("44") || number.ToString().EndsWith("55") || number.ToString().EndsWith("66") ||
number.ToString().EndsWith("77") || number.ToString().EndsWith("88") || number.ToString().EndsWith("99") ||
number.ToString().EndsWith("00"))
{
return "doubles";
}
return "none";
}
//Ian's Solution
public static String RollSimple(int number)
{
string rollString = number.ToString();
return number > 10 && rollString[rollString.Length - 1] == rollString[rollString.Length - 2] ?
"doubles" : "none";
}
//Ian's Other Solution
static List<string> doubles = new List<string>() { "00", "11", "22", "33", "44", "55", "66", "77", "88", "99" };
public static String RollAnotherSimple(int number)
{
string rollString = number.ToString();
return rollString.Length > 1 && doubles.Contains(rollString.Substring(rollString.Length - 2)) ?
"doubles" : "none";
}
//Dandré's Solution
static HashSet<string> doublesHashset = new HashSet<string>() { "00", "11", "22", "33", "44", "55", "66", "77", "88", "99" };
public static String RollSimpleHashSet(int number)
{
string rollString = number.ToString();
return rollString.Length > 1 && doublesHashset.Contains(rollString.Substring(rollString.Length - 2)) ?
"doubles" : "none";
}
//Stian Standahl optimizes modded solution
public static string RollOptimizedModded(int number) { return number % 100 % 11 == 0 ? "doubles" : "none"; }
//Gjermund Grøneng's method with constant addition
private const string CONST_DOUBLES = "doubles";
private const string CONST_NONE = "none";
public static string RollOptimizedModdedConst(int number) { return number % 100 % 11 == 0 ? CONST_DOUBLES : CONST_NONE; }
//Corak's Solution, added on Gjermund Grøneng's
private static readonly string[] Lookup = { "doubles", "none", "none", "none", "none", "none", "none", "none", "none", "none", "none" };
public static string RollLookupOptimizedModded(int number) { return Lookup[number % 100 % 11]; }
//Evk's Solution, large Lookup
private static string[] LargeLookup;
private static void InitLargeLookup()
{
LargeLookup = new string[100000];
for (int i = 0; i < 100000; i++)
{
LargeLookup[i] = i % 100 % 11 == 0 ? "doubles" : "none";
}
}
public static string RollLargeLookup(int number) { return LargeLookup[number]; }
//Alois Kraus's Solution
public static string RollNumbers(int number)
{
int lastDigit = number % 10;
int secondLastDigit = (number / 10) % 10;
return lastDigit == secondLastDigit ? "doubles" : "none";
}

Comparing bits efficiently ( overlap set of x )

I want to compare a stream of bits of arbitrary length to a mask in c# and return a ratio of how many bits were the same.
The mask to check against is anywhere from 2 bits to 8k bits long (with 90% of the masks being 5 bits long); the input can be anywhere from 2 bits up to ~500k bits, with an average input of 12k bits (but yes, most of the time it will be comparing 5 bits with the first 5 bits of that 12k).
Now my naive implementation would be something like this:
bool[] mask = new[] { true, true, false, true };
float dendrite(bool[] input) {
int correct = 0;
for ( int i = 0; i < mask.Length; i++ ) {
if ( input[i] == mask[i] )
correct++;
}
return (float)correct / (float)mask.Length;
}
but I expect this is better handled (more efficient) with some kind of binary operator magic?
Anyone got any pointers?
EDIT: the datatype is not fixed at this point in my design, so if ints or bytearrays work better, I'd also be a happy camper, trying to optimize for efficiency here, the faster the computation, the better.
eg if you can make it work like this:
int[] mask = new[] { 1, 1, 0, 1 };
float dendrite(int[] input) {
int correct = 0;
for ( int i = 0; i < mask.Length; i++ ) {
if ( input[i] == mask[i] )
correct++;
}
return (float)correct / (float)mask.Length;
}
or this:
int mask = 13; //1101
float dendrite(int input) {
return // your magic here;
}   // would return 0.75 for an input of 101 (decimal), i.e. 1100101 in binary:
    // its low 4 bits match 3 bits of the 4-bit mask, and 3/4 == 0.75
ANSWER:
I ran each proposed answer against the others; Fredou's and Marten's solutions ran neck and neck, but Fredou submitted the fastest, leanest implementation in the end. Of course, since the average result varies quite wildly between implementations, I might have to revisit this post later on. :) But that's probably just me messing up my test script. (I hope; too late now, going to bed =)
sparse1.Cyclone
1317ms 3467107ticks 10000iterations
result: 0,7851563
sparse1.Marten
288ms 759362ticks 10000iterations
result: 0,05066964
sparse1.Fredou
216ms 568747ticks 10000iterations
result: 0,8925781
sparse1.Marten
296ms 778862ticks 10000iterations
result: 0,05066964
sparse1.Fredou
216ms 568601ticks 10000iterations
result: 0,8925781
sparse1.Marten
300ms 789901ticks 10000iterations
result: 0,05066964
sparse1.Cyclone
1314ms 3457988ticks 10000iterations
result: 0,7851563
sparse1.Fredou
207ms 546606ticks 10000iterations
result: 0,8925781
sparse1.Marten
298ms 786352ticks 10000iterations
result: 0,05066964
sparse1.Cyclone
1301ms 3422611ticks 10000iterations
result: 0,7851563
sparse1.Marten
292ms 769850ticks 10000iterations
result: 0,05066964
sparse1.Cyclone
1305ms 3433320ticks 10000iterations
result: 0,7851563
sparse1.Fredou
209ms 551178ticks 10000iterations
result: 0,8925781
(Test script copied here; if I destroyed yours while modifying it, let me know: https://dotnetfiddle.net/h9nFSa )
how about this one - dotnetfiddle example
using System;
namespace ConsoleApplication1
{
public class Program
{
public static void Main(string[] args)
{
int a = Convert.ToInt32("0001101", 2);
int b = Convert.ToInt32("1100101", 2);
Console.WriteLine(dendrite(a, 4, b));
}
private static float dendrite(int mask, int len, int input)
{
return 1 - getBitCount(mask ^ (input & (int.MaxValue >> 32 - len))) / (float)len;
}
private static int getBitCount(int bits)
{
bits = bits - ((bits >> 1) & 0x55555555);
bits = (bits & 0x33333333) + ((bits >> 2) & 0x33333333);
return ((bits + (bits >> 4) & 0xf0f0f0f) * 0x1010101) >> 24;
}
}
}
A 64-bit one here - dotnetfiddle example
using System;
namespace ConsoleApplication1
{
public class Program
{
public static void Main(string[] args)
{
// 1
ulong a = Convert.ToUInt64("0000000000000000000000000000000000000000000000000000000000001101", 2);
ulong b = Convert.ToUInt64("1110010101100101011001010110110101100101011001010110010101100101", 2);
Console.WriteLine(dendrite(a, 4, b));
}
private static float dendrite(ulong mask, int len, ulong input)
{
return 1 - getBitCount(mask ^ (input & (ulong.MaxValue >> (64 - len)))) / (float)len;
}
private static ulong getBitCount(ulong bits)
{
bits = bits - ((bits >> 1) & 0x5555555555555555UL);
bits = (bits & 0x3333333333333333UL) + ((bits >> 2) & 0x3333333333333333UL);
return unchecked(((bits + (bits >> 4)) & 0xF0F0F0F0F0F0F0FUL) * 0x101010101010101UL) >> 56;
}
}
}
I came up with this code:
static float dendrite(ulong input, ulong mask)
{
// get bits that are same (0 or 1) in input and mask
ulong samebits = mask & ~(input ^ mask);
// count number of same bits
int correct = cardinality(samebits);
// count number of bits in mask
int inmask = cardinality(mask);
// compute fraction (0.0 to 1.0)
return inmask == 0 ? 0f : correct / (float)inmask;
}
// this is a little hack to count the number of bits set to one in a 64-bit word
static int cardinality(ulong word)
{
const ulong mult = 0x0101010101010101;
const ulong mask1h = (~0UL) / 3 << 1;
const ulong mask2l = (~0UL) / 5;
const ulong mask4l = (~0UL) / 17;
word -= (mask1h & word) >> 1;
word = (word & mask2l) + ((word >> 2) & mask2l);
word += word >> 4;
word &= mask4l;
return (int)((word * mult) >> 56);
}
This will check 64-bits at a time. If you need more than that you can just split the input data into 64-bit words and compare them one by one and compute the average result.
Here's a .NET fiddle with the code and a working test case:
https://dotnetfiddle.net/5hYFtE
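A hedged sketch of that splitting idea, reusing the cardinality helper above; the method name and the assumption that mask and input arrive as equal-length ulong[] arrays (with unused high bits of the last mask word left at zero) are mine:
static float dendriteWords(ulong[] input, ulong[] mask)
{
    // Aggregate the bit counts across all 64-bit words and divide once at the end;
    // summing counts (rather than averaging per-word ratios) keeps the weighting correct.
    int correct = 0;
    int inMask = 0;
    for (int i = 0; i < mask.Length; i++)
    {
        ulong sameBits = mask[i] & ~(input[i] ^ mask[i]);
        correct += cardinality(sameBits);
        inMask += cardinality(mask[i]);
    }
    return inMask == 0 ? 0f : correct / (float)inMask;
}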
I would change the code to something along these lines:
// hardcoded bitmask
byte mask = 255;
float dendrite(byte input) {
int correct = 0;
// store the xor:ed result
byte xored = (byte)(input ^ mask);
// loop through each bit
for(int i = 0; i < 8; i++) {
// if the bit is 0 then it was correct
if((xored & (1 << i)) == 0)
correct++;
}
return (float)correct / 8; // 8 bits were compared
}
The above uses a mask and input of 8 bits, but of course you could modify this to use a 4 byte integer and so on.
Not sure if this will work as expected, but it might give you some clues on how to proceed.
For example if you only would like to check the first 4 bits you could change the code to something like:
float dendrite(byte input) {
// hardcoded bitmask i.e 1101
byte mask = 13;
// number of bits to check
byte bits = 4;
int correct = 0;
// store the xor:ed result (cast needed, ^ on bytes yields int)
byte xored = (byte)(input ^ mask);
// loop through each bit, notice that we are only checking the first 4 bits
for(int i = 0; i < bits; i++) {
// if the bit is 0 then it was correct
if((xored & (1 << i)) == 0)
correct++;
}
return (float)correct/(float)bits;
}
Of course it might be faster to actually use an int instead of a byte.

Optimize bit reader for ReadInt on datastream

Could anyone help me optimize this piece of code? It's currently a large bottleneck as it gets called very often. Even a 25% speed improvement would be significant.
public int ReadInt(int length)
{
if (Position + length > Length)
throw new BitBufferException("Not enough bits remaining.");
int result = 0;
while (length > 0)
{
int off = Position & 7;
int count = 8 - off;
if (count > length)
count = length;
int mask = (1 << count) - 1;
int bits = (Data[Position >> 3] >> off);
result |= (bits & mask) << (length - count);
length -= count;
Position += count;
}
return result;
}
Best answer would go to fastest solution. Benchmarks done with dottrace. Currently this block of code takes up about 15% of the total cpu time. Lowest number wins best answer.
EDIT: Sample usage:
public class Auth : Packet
{
int Field0;
int ProtocolHash;
int Field1;
public override void Parse(BitBuffer buffer) // buffer type assumed
{
Field0 = buffer.ReadInt(9);
ProtocolHash = buffer.ReadInt(32);
Field1 = buffer.ReadInt(8);
}
}
Size of Data is variable but in most cases 512 bytes;
How about using pointers and an unsafe context? You didn't say anything about your input data, method context, etc., so I tried to deduce all of these by myself.
public class BitTest
{
private int[] _data;
public BitTest(int[] data)
{
Length = data.Length * 4 * 8;
// +2, because we use byte* and long* later
// and don't want to read outside the array memory
_data = new int[data.Length + 2];
Array.Copy(data, _data, data.Length);
}
public int Position { get; private set; }
public int Length { get; private set; }
and ReadInt method. Hope comments give a little light on the solution:
public unsafe int ReadInt(int length)
{
if (Position + length > Length)
throw new ArgumentException("Not enough bits remaining.");
// method returns int, so getting more then 32 bits is pointless
if (length > 4 * 8)
throw new ArgumentException();
//
int bytePosition = Position / 8;
int bitPosition = Position % 8;
Position += length;
// get int* on array to start with
fixed (int* array = _data)
{
// change pointer to byte*
byte* bt = (byte*)array;
// skip already read bytes and change pointer type to long*
long* ptr = (long*)(bt + bytePosition);
// read value from current pointer position
long value = *ptr;
// take only necessary bits
value &= (1L << (length + bitPosition)) - 1;
value >>= bitPosition;
// cast value to int before returning
return (int)value;
}
}
}
I didn't test the method, but would bet it's much faster than your approach.
My simple test code:
var data = new[] { 1 | (1 << 8 + 1) | (1 << 16 + 2) | (1 << 24 + 3) };
var test = new BitTest(data);
var bytes = Enumerable.Range(0, 4)
.Select(x => test.ReadInt(8))
.ToArray();
bytes contains { 1, 2, 4, 8}, as expected.
I don't know if this gives you a significant improvement, but it should give you some numbers.
Instead of creating new int variables inside the loop (this takes time), reserve those variables before entering the loop.
public int ReadInt(int length)
{
if (Position + length > Length)
throw new BitBufferException("Not enough bits remaining.");
int result = 0;
int off = 0;
int count = 0;
int mask = 0;
int bits = 0;
while (length > 0)
{
off = Position & 7;
count = 8 - off;
if (count > length)
count = length;
mask = (1 << count) - 1;
bits = (Data[Position >> 3] >> off);
result |= (bits & mask) << (length - count);
length -= count;
Position += count;
}
return result;
}
Hope this increases your performance, even a bit.

Read bit range from byte array

I am looking for a method that will enable me to get a range of bits. For example if I have the binary data
0 1 0 1 1 0 1 1 1 1 0 1 0 1 1 1 (2 bytes)
I might need to get the data from bit 3 to bit 9. In other words I would be interested in:
0 1 0 1 1 0 [1 1 1 1 0 1 0] 1 1 1
so in short I will like to construct the method:
byte[] Read(byte[] data, int left, int right){
// implementation
}
so that if I pass the data new byte[]{91,215}, 3, 9 I will get byte[]{122} (note bytes 91 and 215 = 0 1 0 1 1 0 1 1 1 1 0 1 0 1 1 1, and byte 122 = 1 1 1 1 0 1 0, the same binary data as in the example).
It would be nice if I could use the << operator on byte arrays such as doing something like:
byte[] myArray = new byte[] { 1, 2, 3 };
var shift = myArray << 2;
If you are interested to know why I need this functionality:
I am creating a project on a board and often need to read and write values to the memory. The cdf, sfr, or ddf (referred to as the chip definition file) contains information about a particular chip. That file may look like:
; Name Zone Address Bytesize Displaybase Bitrange
; ---- ---- ------- -------- ----------- --------
sfr = "SPI0_CONTROL" , "Memory", 0x40001000, 4, base=16
sfr = "SPI0_CONTROL.SPH" , "Memory", 0x40001000, 4, base=16, bitRange=25-25
sfr = "SPI0_CONTROL.SPO" , "Memory", 0x40001000, 4, base=16, bitRange=24-24
sfr = "SPI0_CONTROL.TXRXDFCOUNT" , "Memory", 0x40001000, 4, base=16, bitRange=8-23
sfr = "SPI0_CONTROL.INT_TXUR" , "Memory", 0x40001000, 4, base=16, bitRange=7-7
sfr = "SPI0_CONTROL.INT_RXOVF" , "Memory", 0x40001000, 4, base=16, bitRange=6-6
Since I am reading a lot of variables (sometimes 80 times per second) I would like an efficient algorithm. I guess my approach would be: if the byte size is 8, I will create a long from those 8 bytes and then use the << and >> operators to get what I need; if the byte size is 4, I will create an int and use the << and >> operators. But how will I do it if I need to read 16 bytes? I guess my question should have been how to implement the << and >> operators on custom struct types.
You need the BitArray class from System.Collections.
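For instance, a hedged sketch using BitArray — not necessarily the fastest option, and it assumes LSB-0 bit numbering within each byte (BitArray's own convention):
using System.Collections;

// Copies bits [left, right] (inclusive, LSB-0 numbering) into a new byte array.
static byte[] Read(byte[] data, int left, int right)
{
    var src = new BitArray(data);
    int length = right - left + 1;
    var dst = new BitArray(length);
    for (int i = 0; i < length; i++)
        dst[i] = src[left + i];

    var result = new byte[(length + 7) / 8];
    dst.CopyTo(result, 0);   // BitArray.CopyTo supports byte[] targets
    return result;
}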
Looks like you could use a "bit stream". There is an implementation of such a concept here. Take a look, perhaps it fits your needs.
The BigInteger class in .NET 4+ takes a Byte[] in its constructor and has left and right shift operators.
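A hedged sketch along those lines (the ReadBits name is mine; note BigInteger treats the byte array as little-endian, and an extra zero byte keeps the value non-negative):
using System;
using System.Numerics;

// Extracts 'length' bits starting at bit 'start' (LSB-0, little-endian bytes).
static BigInteger ReadBits(byte[] data, int start, int length)
{
    var bytes = new byte[data.Length + 1];      // extra zero byte -> non-negative value
    Array.Copy(data, bytes, data.Length);
    var value = new BigInteger(bytes);          // little-endian interpretation

    var mask = (BigInteger.One << length) - 1;
    return (value >> start) & mask;
}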
It's been 10 years since this question and I could not yet find a simple C# implementation that extracts a range of bits from a byte array using bitwise operations.
Here's how you can do it using simple bitwise operations:
public static class ByteExtensions
{
public const int BitsPerByte = 8;
/// <summary>
/// Extracts a range of bits from a byte array into a new byte array.
/// </summary>
/// <param name="bytes">The byte array to extract the range from.</param>
/// <param name="start">The 0-based start index of the bit.</param>
/// <param name="length">The number of bits to extract.</param>
/// <returns>A new <see cref="byte"/> array with the extracted bits.</returns>
/// <exception cref="ArgumentOutOfRangeException">Thrown if <paramref name="start"/> or <paramref name="length"/> are out of range.</exception>
public static byte[] GetBitRange(this byte[] bytes, int start, int length)
{
// Calculate the length of the input byte array in bits
int maxLength = bytes.Length * BitsPerByte;
int end = start + length;
// Range validations
if (start >= maxLength || start < 0)
{
throw new ArgumentOutOfRangeException(nameof(start), start, $"Start must be non-negative and less than {maxLength}");
}
if (length < 0)
{
throw new ArgumentOutOfRangeException(nameof(length), length, $"Length must be non-negative");
}
if (end > maxLength)
{
throw new ArgumentOutOfRangeException(nameof(length), length, $"Range end must be less than or equal to {maxLength}");
}
// Calculate the length of the new byte array and allocate
var (byteLength, remainderLength) = Math.DivRem(length, BitsPerByte);
byte[] result = new byte[byteLength + (remainderLength == 0 ? 0 : 1)];
// Iterate through each byte in the new byte array
for (int i = 0; i < result.Length; i++)
{
// Compute each of the 8 bits of the ith byte
// Stop if start index >= end index (rest of the bits in the current byte will be 0 by default)
for (int j = 0; j < BitsPerByte && start < end; j++)
{
var (byteIndex, bitIndex) = Math.DivRem(start++, BitsPerByte); // Note the increment(++) in start
int currentBitIndex = j;
result[i] |= (byte)(((bytes[byteIndex] >> bitIndex) & 1) << currentBitIndex);
}
}
return result;
}
}
Explanation
1. Allocate a new byte[] for the range
The GetBitRange(..) method above first (after validation) calculates the length of the new byte array (length in bytes) using the length parameter (length in bits) and allocates this array (result).
The outer loop iterates through each byte in result and the inner loop iterates through each of the 8 bits in the ith byte.
2. Extracting a bit from a byte
In the inner loop, the method calculates the index of the byte in the input array bytes which contains the bit indexed by start. It is the bitIndexth bit in the byteIndexth byte. To extract this bit, you perform the following operations:
int nextBit = (bytes[byteIndex] >> bitIndex) & 1;
Shift the bitIndexth bit in bytes[byteIndex] to the rightmost position so it is the least significant bit(LSB). And then perform a bitwise AND with 1. (A bitwise AND with 1 extracts only the LSB and makes the rest of the bits 0.)
3. Setting a bit in a byte
Now, nextBit is the next bit I need to add to my output byte array (result). Since I'm currently working on the jth bit of the ith byte of result in my inner loop, I need to set that bit to nextBit. This is done as:
int currentBitIndex = j;
result[i] |= (byte) (nextBit << currentBitIndex);
Shift nextBit j times to the left (since I want to set the jth bit). Now to set the bit, I perform a bitwise OR between the shifted bit and result[i]. This sets the jth bit in result[i].
The steps 2 & 3 described above are implemented in the method as a single step:
result[i] |= (byte)(((bytes[byteIndex] >> bitIndex) & 1) << currentBitIndex);
Two important things to consider here:
Byte Endianness
Bit Numbering
Byte Endianness
The above implementation indexes into the input byte array in big endian order. So, as per the example in the question,
new byte[]{ 91 , 215 }.GetBitRange(3, 8)
does not return 122 but returns 235 instead. This is because the example in the question expects the answer in little-endian format.
To use little-endian format, a simple reversal of the output array does the job. Even better is to change byteIndex in the implementation:
byteIndex = bytes.Length - 1 - byteIndex;
Bit Numbering
Bit numbering determines if the least significant bit is the first bit in the byte (LSB 0) or the most significant bit is the first bit (MSB 0).
The implementation above assumes LSB0 bit numbering.
To use MSB0, change the bit indices as follows:
currentBitIndex = BitsPerByte - 1 - currentBitIndex;
bitIndex = BitsPerByte - 1 - bitIndex;
Extending for endianness and bit numbering
Here is the full method supporting both types of endianness as well as bit numbering:
public enum Endianness
{
BigEndian,
LittleEndian
}
public enum BitNumbering
{
Lsb0,
Msb0
}
public static class ByteExtensions
{
public const int BitsPerByte = 8;
public static byte[] GetBitRange(
this byte[] bytes,
int start,
int length,
Endianness endianness,
BitNumbering bitNumbering)
{
// Calculate the length of the input byte array in bits
int maxLength = bytes.Length * BitsPerByte;
int end = start + length;
// Range validations
if (start >= maxLength || start < 0)
{
throw new ArgumentOutOfRangeException(nameof(start), start, $"Start must be non-negative and less than {maxLength}");
}
if (length < 0)
{
throw new ArgumentOutOfRangeException(nameof(length), length, $"Length must be non-negative");
}
if (end > maxLength)
{
throw new ArgumentOutOfRangeException(nameof(length), length, $"Range end must be less than or equal to {maxLength}");
}
// Calculate the length of the new byte array and allocate
var (byteLength, remainderLength) = Math.DivRem(length, BitsPerByte);
byte[] result = new byte[byteLength + (remainderLength == 0 ? 0 : 1)];
// Iterate through each byte in the new byte array
for (int i = 0; i < result.Length; i++)
{
// Compute each of the 8 bits of the ith byte
// Stop if start index >= end index (rest of the bits in the current byte will be 0 by default)
for (int j = 0; j < BitsPerByte && start < end; j++)
{
var (byteIndex, bitIndex) = Math.DivRem(start++, BitsPerByte); // Note the increment(++) in start
int currentBitIndex = j;
// Adjust for MSB 0
if (bitNumbering is BitNumbering.Msb0)
{
currentBitIndex = 7 - currentBitIndex;
bitIndex = 7 - bitIndex;
}
// Adjust for little-endian
if (endianness is Endianness.LittleEndian)
{
byteIndex = bytes.Length - 1 - byteIndex;
}
result[i] |= (byte)(((bytes[byteIndex] >> bitIndex) & 1) << currentBitIndex);
}
}
return result;
}
}

Counting common bits in a sequence of unsigned longs

I am looking for a faster algorithm than the below for the following. Given a sequence of 64-bit unsigned integers, return a count of the number of times each of the sixty-four bits is set in the sequence.
Example:
4608 = 0000000000000000000000000000000000000000000000000001001000000000
4097 = 0000000000000000000000000000000000000000000000000001000000000001
2048 = 0000000000000000000000000000000000000000000000000000100000000000
counts 0000000000000000000000000000000000000000000000000002101000000001
Example:
2560 = 0000000000000000000000000000000000000000000000000000101000000000
530 = 0000000000000000000000000000000000000000000000000000001000010010
512 = 0000000000000000000000000000000000000000000000000000001000000000
counts 0000000000000000000000000000000000000000000000000000103000010010
Currently I am using a rather obvious and naive approach:
static int bits = sizeof(ulong) * 8;
public static int[] CommonBits(params ulong[] values) {
int[] counts = new int[bits];
int length = values.Length;
for (int i = 0; i < length; i++) {
ulong value = values[i];
for (int j = 0; j < bits && value != 0; j++, value = value >> 1) {
counts[j] += (int)(value & 1UL);
}
}
return counts;
}
A small speed improvement might be achieved by first OR'ing the integers together, then using the result to determine which bits you need to check. You would still have to iterate over each bit, but only once over bits where there are no 1s, rather than values.Length times.
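A hedged sketch of that idea (the structure is mine; it skips bit positions that are zero across all values):
public static int[] CommonBits(params ulong[] values)
{
    int[] counts = new int[64];

    // OR everything together: a 0 bit here means no value has that bit set.
    ulong any = 0;
    foreach (ulong v in values)
        any |= v;

    for (int j = 0; j < 64; j++)
    {
        if (((any >> j) & 1UL) == 0)
            continue;                       // skip bits that never occur
        ulong bit = 1UL << j;
        foreach (ulong v in values)
            if ((v & bit) != 0)
                counts[j]++;
    }
    return counts;
}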
I'll direct you to the classic Bit Twiddling Hacks. Your goal seems slightly different from typical bit counting (i.e. your 'counts' variable is in a really weird format), but maybe it'll be useful.
The best I can do here is just get silly with it and unroll the inner loop... that seems to have cut the time in half (roughly 4 seconds as opposed to the 8 in yours to process 100 ulongs 100,000 times). I used a quick command-line app to generate the following code:
for (int i = 0; i < length; i++)
{
ulong value = values[i];
if (0ul != (value & 1ul)) counts[0]++;
if (0ul != (value & 2ul)) counts[1]++;
if (0ul != (value & 4ul)) counts[2]++;
//etc...
if (0ul != (value & 4611686018427387904ul)) counts[62]++;
if (0ul != (value & 9223372036854775808ul)) counts[63]++;
}
That was the best I could do... As per my comment, you'll waste some amount (I know not how much) running this in a 32-bit environment. If you're that concerned over performance it may benefit you to first convert the data to uint.
Tough problem... it may even benefit you to marshal it into C++, but that entirely depends on your application. Sorry I couldn't be more help; maybe someone else will see something I missed.
Update: a few more profiler sessions show a steady 36% improvement. Shrug, I tried.
Ok, let me try again :D
Turn each byte of the 64-bit integer into a 64-bit integer by shifting each of its bits left by n*8.
For instance
10110101 -> 0000000100000000000000010000000100000000000000010000000000000001
(use a lookup table for that translation)
Then just sum everything together in the right way and you get an array of unsigned chars with the counts.
You have to do 8*(number of 64-bit integers) summations.
Code in C:
#include <stdint.h>
#include <stdlib.h>
// LOOKUPTABLE is external: a uint64_t[256] that spreads each bit of a byte
// into its own byte of a 64-bit value.
extern uint64_t LOOKUPTABLE[256];
unsigned char* bitcounts(const uint64_t* int64array, int len)
{
uint64_t* array64;
uint64_t tmp;
array64 = (uint64_t*)malloc(8 * sizeof(uint64_t));
for(int i = 0; i < 8; i++) array64[i] = 0; // set to 0
for(int j = 0; j < len; j++)
{
tmp = int64array[j];
for(int i = 7; tmp; i--)
{
array64[i] += LOOKUPTABLE[tmp & 0xFF];
tmp = tmp >> 8;
}
}
return (unsigned char*)array64;
}
This reduces the work compared to the naive implementation by a factor of 8, because it counts 8 bits at a time.
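For the C# question this was asked about, here is a hedged sketch of the same lookup-table idea (the table construction and names are mine); it accumulates up to 255 values before the byte-wide counters could overflow:
// Spread[b] holds byte b with each of its 8 bits moved into its own byte of a ulong,
// so adding Spread values accumulates eight per-bit counters in parallel.
static readonly ulong[] Spread = BuildSpreadTable();

static ulong[] BuildSpreadTable()
{
    var table = new ulong[256];
    for (int b = 0; b < 256; b++)
        for (int bit = 0; bit < 8; bit++)
            table[b] |= (ulong)((b >> bit) & 1) << (bit * 8);
    return table;
}

static int[] CommonBits(ulong[] values)   // assumes values.Length <= 255
{
    var lanes = new ulong[8];             // 8 byte-wide counters per input byte position
    foreach (ulong v in values)
        for (int i = 0; i < 8; i++)
            lanes[i] += Spread[(byte)(v >> (i * 8))];

    var counts = new int[64];
    for (int i = 0; i < 8; i++)
        for (int bit = 0; bit < 8; bit++)
            counts[i * 8 + bit] = (int)((lanes[i] >> (bit * 8)) & 0xFF);
    return counts;
}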
EDIT:
I fixed the code to break out of the loop earlier on smaller integers, but I am still unsure about endianness.
This also works only for up to 256 inputs, because it uses unsigned char to store the counts. If you have a longer input sequence, you can change the code to hold up to 2^16 bit counts, at roughly half the speed.
const unsigned int BYTESPERVALUE = 64 / 8;
unsigned int bcount[BYTESPERVALUE][256];
memset(bcount, 0, sizeof bcount);
for (int i = values_length; --i >= 0; ) // values_length: number of 64-bit values
for (int j = BYTESPERVALUE ; --j >= 0; ) {
const unsigned int jth_byte = (values[i] >> (j * 8)) & 0xff;
bcount[j][jth_byte]++; // count byte value (0..255) instances
}
unsigned int count[64];
memset(count, 0, sizeof count);
for (int i = BYTESPERVALUE; --i >= 0; )
for (int j = 256; --j >= 0; ) // check each byte value instance
for (int k = 8; --k >= 0; ) // for each bit in a given byte
if (j & (1 << k)) // if bit was set, then add its count
count[i * 8 + k] += bcount[i][j];
Another approach that might be profitable would be to build an array of 256 elements which encodes the actions you need to take to increment the count array.
Here is a sample for a 4-element table, which does 2 bits instead of 8 bits.
int bitToSubscript[4][3] =
{
{0}, // No Bits set
{1,0}, // Bit 0 set
{1,1}, // Bit 1 set
{2,0,1} // Bit 0 and bit 1 set.
};
The algorithm then degenerates to:
Pick the 2 right-hand bits off of the number.
Use that as a small integer to index into the bitToSubscript array.
In that array, pull off the first integer. That is the number of elements in the count array that you need to increment.
Based on that count, iterate through the remainder of the row, incrementing count based on the subscripts you pull out of the bitToSubscript array.
Once that loop is done, shift your original number two bits to the right... rinse and repeat as needed.
Now there is one issue I ignored in that description: the actual subscripts are relative, so you need to keep track of where you are in the count array. Every time you loop, you add two to an offset; to that offset, you add the relative subscript from the bitToSubscript array.
It should be possible to scale up to the size you want based on this small example. Another program could be used to generate the source code for the bitToSubscript array, so that it can simply be hard-coded in your program.
There are other variations on this scheme, but I would expect it to run faster on average than anything that does it one bit at a time; a sketch of the 2-bit version is shown below.
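A minimal C# sketch of that 2-bit scheme, assuming a 64-element counts array (the table contents mirror the sample above; the method name and loop structure are my own):
// Each row is: { how many counts to bump, then the relative bit subscripts }.
static readonly int[][] BitToSubscript =
{
    new[] { 0 },        // 00: no bits set
    new[] { 1, 0 },     // 01: bit 0 set
    new[] { 1, 1 },     // 10: bit 1 set
    new[] { 2, 0, 1 },  // 11: bits 0 and 1 set
};

static void CountBits(ulong value, int[] counts)   // counts must have 64 elements
{
    int offset = 0;                                    // absolute position of the chunk's bit 0
    while (value != 0)
    {
        int[] row = BitToSubscript[(int)(value & 3)];  // pick off the two right-hand bits
        for (int i = 1; i <= row[0]; i++)
            counts[offset + row[i]]++;                 // relative subscript + running offset
        value >>= 2;
        offset += 2;                                   // advance two bit positions
    }
}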
Good Hunting.
Evil.
I believe this should give a nice speed improvement:
const ulong mask = 0x1111111111111111;
public static int[] CommonBits(params ulong[] values)
{
int[] counts = new int[64];
ulong accum0 = 0, accum1 = 0, accum2 = 0, accum3 = 0;
int i = 0;
foreach( ulong v in values ) {
if (i == 15) {
for( int j = 0; j < 64; j += 4 ) {
counts[j] += ((int)accum0) & 15;
counts[j+1] += ((int)accum1) & 15;
counts[j+2] += ((int)accum2) & 15;
counts[j+3] += ((int)accum3) & 15;
accum0 >>= 4;
accum1 >>= 4;
accum2 >>= 4;
accum3 >>= 4;
}
i = 0;
}
accum0 += (v) & mask;
accum1 += (v >> 1) & mask;
accum2 += (v >> 2) & mask;
accum3 += (v >> 3) & mask;
i++;
}
for( int j = 0; j < 64; j += 4 ) {
counts[j] += ((int)accum0) & 15;
counts[j+1] += ((int)accum1) & 15;
counts[j+2] += ((int)accum2) & 15;
counts[j+3] += ((int)accum3) & 15;
accum0 >>= 4;
accum1 >>= 4;
accum2 >>= 4;
accum3 >>= 4;
}
return counts;
}
Demo: http://ideone.com/eNn4O (needs more test cases)
http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetNaive
One of them:
unsigned int v; // count the number of bits set in v
unsigned int c; // c accumulates the total bits set in v
for (c = 0; v; c++)
{
v &= v - 1; // clear the least significant bit set
}
Keep in mind that the complexity of this method is approximately O(log2(n)), where n is the number whose bits are being counted, so for 10 (binary 1010) it needs only 2 loops.
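The same clear-the-lowest-set-bit loop written for a 64-bit value in C# (a direct, hedged transliteration of the snippet above):
static int PopCountSparse(ulong v)
{
    int c = 0;
    while (v != 0)
    {
        v &= v - 1;   // clears the least significant set bit
        c++;
    }
    return c;
}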
You could take the method for counting 32 bits with 64-bit arithmetic and apply it to each half of the word, which would take about 2*15 + 4 instructions:
// option 3, for at most 32-bit values in v:
c = ((v & 0xfff) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += (((v & 0xfff000) >> 12) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
c += ((v >> 24) * 0x1001001001001ULL & 0x84210842108421ULL) % 0x1f;
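A hedged C# sketch of that suggestion — the 32-bit "option 3" count above applied to each half of a 64-bit word (the method names are mine):
static int PopCount32(uint v)
{
    ulong c;
    c  = ((v & 0xfffU) * 0x1001001001001UL & 0x84210842108421UL) % 0x1f;
    c += (((v & 0xfff000U) >> 12) * 0x1001001001001UL & 0x84210842108421UL) % 0x1f;
    c += ((v >> 24) * 0x1001001001001UL & 0x84210842108421UL) % 0x1f;
    return (int)c;
}

static int PopCount64(ulong v)
{
    // Count each 32-bit half separately and add the results.
    return PopCount32((uint)v) + PopCount32((uint)(v >> 32));
}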
If you have an SSE4-capable processor you can use the POPCNT instruction.
http://en.wikipedia.org/wiki/SSE4
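From managed code, a hedged note: on .NET Core 3.0 and later, System.Numerics.BitOperations.PopCount is JIT-compiled to the POPCNT instruction when the hardware supports it (with a software fallback otherwise):
using System;
using System.Numerics;

class PopCountDemo
{
    static void Main()
    {
        ulong v = 0b1010_0110UL;
        // Uses the hardware POPCNT instruction where available.
        Console.WriteLine(BitOperations.PopCount(v));   // prints 4
    }
}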
