Find bit position without using Log() - c#

I have an integer input that is a power of 2 (1, 2, 4, 8, etc.). I want the function to return the bit position without using log(). For example, the inputs above would return {0, 1, 2, 3} respectively. This is for C#. Plus, can this be done in SQL as well?
Thanks!

I don't have VS on my Mac to test this out, but did you want something like this?
public static int Foo(int n)
{
if (n <= 0)
{
return -1; // A weird value is better than an infinite loop.
}
int i = 0;
while (n % 2 == 0)
{
n /= 2;
i++;
}
return i;
}

The fastest code I found to do this is from the Bit Twiddling Hacks site. Specifically, the lookup based on the DeBruijn sequence. See http://graphics.stanford.edu/~seander/bithacks.html#IntegerLogDeBruijn
I tested a naive method, a switch-based method, and two of the Bit Twiddling Hacks methods: the DeBruijn sequence, and the other that says, "if you know your value is a power of two."
I ran all of these against an array of 32 million powers of two. That is, integers of the form 2^N, where N is in the range 0..30. A value of 2^31 is a negative number, which causes the naive method to go into an infinite loop.
I compiled the code with Visual Studio 2010 in release mode and ran it without the debugger (i.e. Ctrl+F5). On my system, the averages over several dozen runs are:
Naive method: 950 ms
Switch method: 660 ms
Bithack method 1: 1,154 ms
DeBruijn: 105 ms
It's clear that the DeBruijn sequence method is much faster than any of the others. The other Bithack method is inferior here because the conversion from C to C# results in some inefficiencies. For example, the C statement int r = (v & b[0]) != 0; ends up requiring an if or a ternary operator (i.e. ?:) in C#.
Here's the code.
class Program
{
const int Million = 1000 * 1000;
static readonly int NumValues = 32 * Million;
static void Main(string[] args)
{
// Construct a table of integers.
// These are random powers of two.
// That is 2^N, where N is in the range 0..30.
Console.WriteLine("Constructing table");
int[] values = new int[NumValues];
Random rnd = new Random();
for (int i = 0; i < NumValues; ++i)
{
int pow = rnd.Next(31);
values[i] = 1 << pow;
}
// Run each one once to make sure it's JITted
GetLog2_Bithack(values[0]);
GetLog2_DeBruijn(values[0]);
GetLog2_Switch(values[0]);
GetLog2_Naive(values[0]);
Stopwatch sw = new Stopwatch();
Console.Write("GetLog2_Naive ... ");
sw.Restart();
for (int i = 0; i < NumValues; ++i)
{
GetLog2_Naive(values[i]);
}
sw.Stop();
Console.WriteLine("{0:N0} ms", sw.ElapsedMilliseconds);
Console.Write("GetLog2_Switch ... ");
sw.Restart();
for (int i = 0; i < NumValues; ++i)
{
GetLog2_Switch(values[i]);
}
sw.Stop();
Console.WriteLine("{0:N0} ms", sw.ElapsedMilliseconds);
Console.Write("GetLog2_Bithack ... ");
sw.Restart();
for (int i = 0; i < NumValues; ++i)
{
GetLog2_Bithack(values[i]);
}
Console.WriteLine("{0:N0} ms", sw.ElapsedMilliseconds);
Console.Write("GetLog2_DeBruijn ... ");
sw.Restart();
for (int i = 0; i < NumValues; ++i)
{
GetLog2_DeBruijn(values[i]);
}
sw.Stop();
Console.WriteLine("{0:N0} ms", sw.ElapsedMilliseconds);
Console.ReadLine();
}
static int GetLog2_Naive(int v)
{
int r = 0;
while ((v = v >> 1) != 0)
{
++r;
}
return r;
}
static readonly int[] MultiplyDeBruijnBitPosition2 = new int[32]
{
0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};
static int GetLog2_DeBruijn(int v)
{
return MultiplyDeBruijnBitPosition2[(uint)(v * 0x077CB531U) >> 27];
}
static readonly uint[] b = new uint[] { 0xAAAAAAAA, 0xCCCCCCCC, 0xF0F0F0F0, 0xFF00FF00, 0xFFFF0000};
static int GetLog2_Bithack(int v)
{
int r = (v & b[0]) == 0 ? 0 : 1;
int x = 1 << 4;
for (int i = 4; i > 0; i--) // unroll for speed...
{
if ((v & b[i]) != 0)
r |= x;
x >>= 1;
}
return r;
}
static int GetLog2_Switch(int v)
{
switch (v)
{
case 0x00000001: return 0;
case 0x00000002: return 1;
case 0x00000004: return 2;
case 0x00000008: return 3;
case 0x00000010: return 4;
case 0x00000020: return 5;
case 0x00000040: return 6;
case 0x00000080: return 7;
case 0x00000100: return 8;
case 0x00000200: return 9;
case 0x00000400: return 10;
case 0x00000800: return 11;
case 0x00001000: return 12;
case 0x00002000: return 13;
case 0x00004000: return 14;
case 0x00008000: return 15;
case 0x00010000: return 16;
case 0x00020000: return 17;
case 0x00040000: return 18;
case 0x00080000: return 19;
case 0x00100000: return 20;
case 0x00200000: return 21;
case 0x00400000: return 22;
case 0x00800000: return 23;
case 0x01000000: return 24;
case 0x02000000: return 25;
case 0x04000000: return 26;
case 0x08000000: return 27;
case 0x10000000: return 28;
case 0x20000000: return 29;
case 0x40000000: return 30;
case int.MinValue: return 31;
default:
return -1;
}
}
}
If I optimize the Bithack code by unrolling the loop and using constants instead of array lookups, its time is the same as the time for the switch statement method.
static int GetLog2_Bithack(int v)
{
int r = ((v & 0xAAAAAAAA) != 0) ? 1 : 0;
if ((v & 0xFFFF0000) != 0) r |= (1 << 4);
if ((v & 0xFF00FF00) != 0) r |= (1 << 3);
if ((v & 0xF0F0F0F0) != 0) r |= (1 << 2);
if ((v & 0xCCCCCCCC) != 0) r |= (1 << 1);
return r;
}

0] if number is zero or negative, return/throw error
1] In your language, find the construct that converts a number to base 2.
2] convert the base-2 value to string
3] return the length of the string minus 1.
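A minimal C# sketch of that recipe (my code, assuming the input really is a positive power of two; Convert.ToString(value, 2) handles the base-2 conversion):
static int BitPositionViaString(int value)
{
    if (value <= 0)
        throw new ArgumentOutOfRangeException(nameof(value)); // step 0
    string binary = Convert.ToString(value, 2); // steps 1-2: base-2 string, e.g. 8 -> "1000"
    return binary.Length - 1;                   // step 3: length minus one is the bit position
}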

Verbose code, but probably the fastest:
if (x < 1)
throw SomeException();
if (x < 2)
return 0;
if (x < 4)
return 1;
if (x < 8)
return 2;
//.... etc.
This involves no division and no conversion to or from double. It requires only comparisons, which are very fast. See Code Complete, 2nd edition, page 633, for a discussion.
If you know that the input will always be a power of two, you might get better performance from a switch block:
switch (input)
{
case 1:
return 0;
case 2:
return 1;
case 4:
return 2;
case 8:
return 3;
//...
default:
throw SomeException();
}
I tested the performance on 10 million random ints, and on 10 million randomly-selected powers of two. The results:
Bithacks 1: 1360 milliseconds
Bithacks 2: 1103 milliseconds
If: 1320 milliseconds
Bithacks 1 (powers of 2): 1335 milliseconds
Bithacks 2 (powers of 2): 1060 milliseconds
Bithacks 3 (powers of 2): 1286 milliseconds
If (powers of 2): 1026 milliseconds
Switch (powers of 2): 896 milliseconds
I increased the number of iterations by ten times, and got these results:
Bithacks 1: 13347 milliseconds
Bithacks 2: 10370 milliseconds
If: 12918 milliseconds
Bithacks 1 (powers of 2): 12528 milliseconds
Bithacks 2 (powers of 2): 10150 milliseconds
Bithacks 3 (powers of 2): 12384 milliseconds
If (powers of 2): 9969 milliseconds
Switch (powers of 2): 8937 milliseconds
Now I didn't do any profiling to see if I did something stupid in translating the bit hacks to C# code, nor to see how much of the execution time is spent in the function that computes the log. So this is just a back-of-the-envelope kind of calculation, but it does suggest that the if approach is about the same as the bit hacks algorithms, and switch is a bit faster. Additionally, the if and switch approaches are far easier to understand and maintain.

This is a CPU friendly way to do it:
int bitpos=-1;
while(num>0) {
num = num>>1;
bitpos++;
}
return bitpos;
For SQL, use CASE. You can do a binary search using nested IF...ELSE blocks if performance is a concern, but with just 32 possible values the overhead of implementing it could easily exceed that of a simple sequential search.

find bit position in a power-of-two byte (uint8 Flag with only one bit set) coded in C (don't know about C#)
This returns the bit position 1..8 - you may need to decrement the result for indexing 0..7.
int pos(int a){
int p = (a & 3) | ((a >> 1) & 6) | ((a >> 2) & 5) | ((a >> 3) & 4) | ((a >> 4) & 15) | ((a >> 5) & 2) | ((a >> 6) & 1);
return p;
}
For info on the "evolution" of the code snippet see my blogpost here http://blog.edutoolbox.de/?p=263
EDIT:
deBruijn style is about 100x faster ...

Optimal solution for "Bitwise AND" problem in C#

Problem statement:
Given an array of non-negative integers, count the number of unordered pairs of array elements, such that their bitwise AND is a power of 2.
Example:
arr = [10, 7, 2, 8, 3]
Answer: 6 (10&7, 10&2, 10&8, 10&3, 7&2, 2&3)
Constraints:
1 <= arr.Count <= 2*10^5
0 <= arr[i] <= 2^12
Here's my brute-force solution that I've come up with:
private static Dictionary<int, bool> _dictionary = new Dictionary<int, bool>();
public static long CountPairs(List<int> arr)
{
long result = 0;
for (var i = 0; i < arr.Count - 1; ++i)
{
for (var j = i + 1; j < arr.Count; ++j)
{
if (IsPowerOfTwo(arr[i] & arr[j])) ++result;
}
}
return result;
}
public static bool IsPowerOfTwo(int number)
{
if (_dictionary.TryGetValue(number, out bool value)) return value;
var result = (number != 0) && ((number & (number - 1)) == 0);
_dictionary[number] = result;
return result;
}
For small inputs this works fine, but for big inputs this works slow.
My question is: what is the optimal (or at least more optimal) solution for the problem? Please provide a graceful solution in C#. 😊
One way to accelerate your approach is to compute the histogram of your data values before counting.
This will reduce the number of computations for long arrays because there are fewer options for value (4096) than the length of your array (200000).
Be careful when counting bins that are powers of 2 to make sure you do not overcount the number of pairs by including cases when you are comparing a number with itself.
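A rough sketch of that histogram idea (my code, not the answerer's; it assumes the stated constraint that values fit in 0..4095):
static long CountPairsByHistogram(List<int> arr)
{
    const int MaxValue = 1 << 12; // 4096 possible values per the constraints
    var counts = new long[MaxValue];
    foreach (var v in arr) counts[v]++;
    bool IsPowerOfTwo(int n) => n != 0 && (n & (n - 1)) == 0;
    long result = 0;
    for (int a = 0; a < MaxValue; ++a)
    {
        if (counts[a] == 0) continue;
        // pairs of two equal values: choose 2 out of counts[a]; a & a == a, so only count if a is a power of two
        if (IsPowerOfTwo(a))
            result += counts[a] * (counts[a] - 1) / 2;
        for (int b = a + 1; b < MaxValue; ++b)
            if (counts[b] != 0 && IsPowerOfTwo(a & b))
                result += counts[a] * counts[b];
    }
    return result;
}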
We can adapt the bit-subset dynamic programming idea to have a solution with O(2^N * N^2 + n * N) complexity, where N is the number of bits in the range, and n is the number of elements in the list. (So if the integers were restricted to [1, 4096] or 2^12, with n at 100,000, we would have on the order of 2^12 * 12^2 + 100000*12 = 1,789,824 iterations.)
The idea is that we want to count instances for which we have overlapping bit subsets, with the twist of adding a fixed set bit. Given Ai -- for simplicity, take 6 = b110 -- if we were to find all partners that AND to zero, we'd take Ai's negation,
110 -> ~110 -> 001
Now we can build a dynamic program that takes a diminishing mask, starting with the full number and diminishing the mask towards the left
001
^^^
001
^^
001
^
Each set bit on the negation of Ai represents a zero, which can be ANDed with either 1 or 0 to the same effect. Each unset bit on the negation of Ai represents a set bit in Ai, which we'd like to pair only with zeros, except for a single set bit.
We construct this set bit by examining each possibility separately. So where to count pairs that would AND with Ai to zero, we'd do something like
001 ->
001
000
we now want to enumerate
011 ->
011
010
101 ->
101
100
fixing a single bit each time.
We can achieve this by adding a dimension to the inner iteration. When the mask does have a set bit at the end, we "fix" the relevant bit by counting only the result for the previous DP cell that would have the bit set, and not the usual union of subsets that could either have that bit set or not.
Here is some JavaScript code (sorry, I do not know C#) to demonstrate with testing at the end comparing to the brute-force solution.
var debug = 0;
function bruteForce(a){
let answer = 0;
for (let i = 0; i < a.length; i++) {
for (let j = i + 1; j < a.length; j++) {
let and = a[i] & a[j];
if ((and & (and - 1)) == 0 && and != 0){
answer++;
if (debug)
console.log(a[i], a[j], a[i].toString(2), a[j].toString(2))
}
}
}
return answer;
}
function f(A, N){
const n = A.length;
const hash = {};
const dp = new Array(1 << N);
for (let i=0; i<1<<N; i++){
dp[i] = new Array(N + 1);
for (let j=0; j<N+1; j++)
dp[i][j] = new Array(N + 1).fill(0);
}
for (let i=0; i<n; i++){
if (hash.hasOwnProperty(A[i]))
hash[A[i]] = hash[A[i]] + 1;
else
hash[A[i]] = 1;
}
for (let mask=0; mask<1<<N; mask++){
// j is an index where we fix a 1
for (let j=0; j<=N; j++){
if (mask & 1){
if (j == 0)
dp[mask][j][0] = hash[mask] || 0;
else
dp[mask][j][0] = (hash[mask] || 0) + (hash[mask ^ 1] || 0);
} else {
dp[mask][j][0] = hash[mask] || 0;
}
for (let i=1; i<=N; i++){
if (mask & (1 << i)){
if (j == i)
dp[mask][j][i] = dp[mask][j][i-1];
else
dp[mask][j][i] = dp[mask][j][i-1] + dp[mask ^ (1 << i)][j][i - 1];
} else {
dp[mask][j][i] = dp[mask][j][i-1];
}
}
}
}
let answer = 0;
for (let i=0; i<n; i++){
for (let j=0; j<N; j++)
if (A[i] & (1 << j))
answer += dp[((1 << N) - 1) ^ A[i] | (1 << j)][j][N];
}
for (let i=0; i<N + 1; i++)
if (hash[1 << i])
answer = answer - hash[1 << i];
return answer / 2;
}
var As = [
[10, 7, 2, 8, 3] // 6
];
for (let A of As){
console.log(JSON.stringify(A));
console.log(`DP, brute force: ${ f(A, 4) }, ${ bruteForce(A) }`);
console.log('');
}
var numTests = 1000;
for (let i=0; i<numTests; i++){
const N = 6;
const A = [];
const n = 10;
for (let j=0; j<n; j++){
const num = Math.floor(Math.random() * (1 << N));
A.push(num);
}
const fA = f(A, N);
const brute = bruteForce(A);
if (fA != brute){
console.log('Mismatch:');
console.log(A);
console.log(fA, brute);
console.log('');
}
}
console.log("Done testing.");
int[] numbers = new[] { 10, 7, 2, 8, 3 };
static bool IsPowerOfTwo(int n) => (n != 0) && ((n & (n - 1)) == 0);
long result = numbers.AsParallel()
.Select((a, i) => numbers
.Skip(i + 1)
.Select(b => a & b)
.Count(IsPowerOfTwo))
.Sum();
If I understand the problem correctly, this should work and should be faster.
First, for each number in the array we grab all elements in the array after it to get a collection of numbers to pair with.
Then we transform each pair with a bitwise AND and count the results that satisfy our 'IsPowerOfTwo' predicate (implementation above).
Finally we simply get the sum of all the counts - our output from this case is 6.
I think this should be more performant than your dictionary based solution - it avoids having to perform a lookup each time you wish to check power of 2.
I think also given the numerical constraints of your inputs it is fine to use int data types.

prime number summing still slow after using sieve

I had a go at a Project Euler coding challenge below. The answer given by the code is correct, but I do not understand why it's taking nearly a minute to run. It was finishing with similar times prior to using a sieve. Other users are reporting times as low as milliseconds.
I assume I am making a basic error somewhere...
// The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.
// Find the sum of all the primes below two million.
public static long Ex010()
{
var sum = 0L;
var sieve = new bool[2000000];
var primes = new List<int>(10000);
for (int i = 2; i < sieve.Length; i++)
{
if (sieve[i-1])
continue;
var isPrime = true;
foreach (var prime in primes)
{
if (i % prime == 0) {
isPrime = false;
break;
}
}
if (isPrime) {
primes.Add(i);
sum += i;
for (var x = i * 2; x < sieve.Length; x += i) {
sieve[x-1] = true;
}
}
}
return sum;
}
EDIT:
The only thing that seemed missing was this optimization :
if (prime > Math.Sqrt(i))
break;
It brings the time down to 160 ms.
EDIT 2:
Finally clicked: I took out the foreach as was suggested many times. It's now 12 ms. Final solution:
public static long Ex010()
{
var sum = 0L;
var sieve = new bool[2000000];
for (int i = 2; i < sieve.Length; i++)
{
if (sieve[i-1])
continue;
sum += i;
for (var x = i * 2; x < sieve.Length; x += i) {
sieve[x-1] = true;
}
}
return sum;
}
You are doing trial division in addition to a sieve.
The boolean array will already tell you if a number is prime, so you don't need the List of primes at all.
You can also speed it up by only sieving up to the square root of the limit.
If you want to save some memory also, you can use a BitArray instead of a boolean array.
public static long Ex010()
{
const int Limit = 2000000;
int sqrt = (int)Math.Sqrt(Limit);
var sum = 0L;
var isComposite = new bool[Limit];
for (int i = 2; i <= sqrt; i++) {
if (isComposite[i - 2])
continue;//This number is not prime, skip
sum += i;
for (var x = i * i; x < isComposite.Length; x += i) {
isComposite[x - 2] = true;
}
}
//Add the remaining prime numbers
for (int i = sqrt + 1; i < Limit; i++) {
if (!isComposite[i - 2]) {
sum += i;
}
}
return sum;
}
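To illustrate the BitArray suggestion above, here is a rough sketch along the same lines (my variant; it indexes the array directly by the number rather than with an offset):
static long Ex010_BitArray()
{
    const int Limit = 2000000;
    int sqrt = (int)Math.Sqrt(Limit);
    var sum = 0L;
    var isComposite = new System.Collections.BitArray(Limit); // one bit per number, index i represents i
    for (int i = 2; i <= sqrt; i++) {
        if (isComposite[i])
            continue; // not prime, skip
        sum += i;
        for (var x = i * i; x < Limit; x += i)
            isComposite[x] = true;
    }
    // add the remaining prime numbers
    for (int i = sqrt + 1; i < Limit; i++) {
        if (!isComposite[i])
            sum += i;
    }
    return sum;
}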
(tl;dr: 2 million in 0.8 ms, 2 billion in 1.25 s; segmented odds-only SoE, presieving, wheeled striding)
As always, the limit of Euler task #10 seems designed to pose a mild challenge on a ZX81, Apple ][ or C64 but on modern hardware you generally have to multiply the limits by 1000 to make things even remotely interesting. Or set a time limit like 5 seconds and try to see by how many orders of magnitude the Euler limit can be exceeded...
Dennis_E's solution is simple and efficient, but I'd recommend applying two small improvements that give a marked performance boost without any effort at all.
Represent only odd numbers in the sieve
All even numbers except for the number 2 are composite. If you pull the number 2 out of thin air when needed then you can drop all even numbers from the sieve. This halves workload and memory footprint, for a doubling of performance at the marginal cost of writing << or >> in a few places (to convert between the realm of numbers and the realm of bit indices). This is usually known as an 'odds-only sieve' or 'wheeled representation mod 2'; it has the added advantage that it largely removes the need for guarding against index overflow.
Skip a few extra small primes during decoding
Skipping a few small primes ('applying a wheel') is much easier when going through a range of numbers incrementally, compared to hopping wildly about with different strides as during sieving. This skipping only involves applying a cyclic sequence of differences between consecutive numbers that are not multiples of the primes in question, like 4,2,4,2... for skipping multiples of 2 and 3 (the mod 6 wheel) or 6,4,2,4,2,4,6,2... for skipping multiples of 2, 3 and 5.
The mod 6 wheel sequence alternates between only two numbers, which can easily be achieved by XORing with a suitable value. On top of the odds-only sieve the distances are halved, so that the sequence becomes 2,1,2,1... This skipping reduces the work during decoding by 1/3rd (for stepping mod 3), and the skipped primes can also be ignored during the sieving. The latter can have a marked effect on sieve times, since the smallest primes do the greatest number of crossings-off during the sieving.
Here's a simple Sieve of Eratosthenes, with both suggestions applied. Note: here and in the following I generally go with the flow of C#/.Net and use signed integers where I would normally use unsigned integers in any sane language. That's because I don't have the time to vet the code for the performance implications (penalties) resulting from the use of unsigned types, like that the compiler suddenly forgets how to replace division by a constant with multiplication of the inverse and so on.
static long sum_small_primes_up_to (int n)
{
if (n < 7)
return (0xAA55200 >> (n << 2)) & 0xF;
int sqrt_n_halved = (int)Math.Sqrt(n) >> 1;
int max_bit = (int)(n - 1) >> 1;
var odd_composite = new bool[max_bit + 1];
for (int i = 5 >> 1; i <= sqrt_n_halved; ++i)
if (!odd_composite[i])
for (int p = (i << 1) + 1, j = p * p >> 1; j <= max_bit; j += p)
odd_composite[j] = true;
long sum = 2 + 3;
for (int i = 5 >> 1, d = 1; i <= max_bit; i += d, d ^= 3)
if (!odd_composite[i])
sum += (i << 1) + 1;
return sum;
}
The first if statement handles the small fry (n in 0..6) by returning a suitable element of a precomputed list of numbers, and it serves to get all the special cases out of the way in one fell swoop. All other occurrences of shift operators are for converting between the realm of numbers and the realm of indices into the odds-only sieve.
This is pretty much the same code that I normally use for sieving small primes, up to 64K or so (the potential least factors for numbers up to 32 bits). It does Euler's piddling 2 million in 4.5 milliseconds but throwing bigger numbers at it shows its Achilles heel: it does a lot of striding over large distances, which interacts badly with modern memory subsystems where decent access speed can only be got from caches. The performance drops markedly when the capacity of the level 1 cache (typically 32 KiByte) is exceeded significantly, and it goes down even further when exceeding the L2 and L3 capacities (typically several megabytes). How sharp the drop is depends on the quality (price tag) of the computer, of course...
Here are some timings taken on my laptop:
# benchmark: small_v0 ...
sum up to 2 * 10^4: 21171191 in 0,03 ms
sum up to 2 * 10^5: 1709600813 in 0,35 ms // 11,0 times
sum up to 2 * 10^6: 142913828922 in 4,11 ms // 11,7 times
sum up to 2 * 10^7: 12272577818052 in 59,36 ms // 14,4 times
sum up to 2 * 10^8: 1075207199997334 in 1.225,19 ms // 20,6 times
sum up to 2 * 10^9: 95673602693282040 in 14.381,29 ms // 11,7 times
In the middlish ranges there are time increases that go well beyond the expected factor of about 11, and then things stabilise again.
And here comes how to speed up the beast an order of magnitude...
Process the range in cache-sized segments
The remedy is easy enough: instead of striding each prime all the way from one end of the range to the other - and hence all across the memory space - we sieve the range in cache-sized strips, memorising the final positions for each prime so that the next round can continue right where the previous round left off. If we don't need a big bad sieve full of bits at the end then we can process a strip (extract its primes) after it has been sieved and then discard its data, reusing the sieve buffer for the next strip. Both are variations on the theme of segmented sieving but the offsets are treated differently during the processing; when the distinction matters then the first approach (big bad sieve for the whole range) is usually called a segmented sieve and the latter an iterated sieve. The terms 'moving' or 'sliding' sieve might fit the latter somewhat but should be avoided because they normally refer to a totally different class of sieves (also known as deque sieves) that are deceptively simple but whose performance is worse by at least an order of magnitude.
Here's an example of an iterated sieve, a slightly modified version of a function that I normally use for sieving primes in given ranges [m, n], just like in SPOJ's PRIMES1 and PRINT. Here the parameter m is implicitly 0, so it doesn't need to be passed.
Normally the function takes an interface that is responsible for processing the raw sieve (and any loose primes it may get passed), and which can be queried for the number of primes that the processor skips ('decoder order') so that the sieve can ignore those during the sieving. For this exposition I've replaced this with a delegate for simplicity.
The factor primes get sieved by a stock function that may look somewhat familiar, and I've changed the logic of the sieve from 'is_composite' to 'not_composite' (and to a base type that can participate in arithmetic) for reasons that will be explained later. decoder_order is the number of additional primes skipped by the decoder (which would be 1 for the function shown earlier, because it skips multiples of the prime 3 during the prime extraction/summing, over and above the wheel prime 2).
const int SIEVE_BITS = 1 << 15; // L1 cache size, 1 usable bit per byte
delegate long sieve_sum_func (byte[] sieve, int window_base, int window_bits);
static long sum_primes_up_to (int n, sieve_sum_func sum_func, int decoder_order)
{
if (n < 7)
return 0xF & (0xAA55200 >> (n << 2));
n -= ~n & 1; // make odd (n can't be 0 here)
int sqrt_n = (int)Math.Sqrt(n);
var factor_primes = small_primes_up_to(sqrt_n).ToArray();
int first_sieve_prime_index = 1 + decoder_order; // skip wheel primes + decoder primes
int m = 7; // this would normally be factor_primes[first_sieve_prime_index] + 2
int bits_to_sieve = ((n - m) >> 1) + 1;
int sieve_bits = Math.Min(bits_to_sieve, SIEVE_BITS);
var sieve = new byte[sieve_bits];
var offsets = new int[factor_primes.Length];
int sieve_primes_end = first_sieve_prime_index;
long sum = 2 + 3 + 5; // wheel primes + decoder primes
for (int window_base = m; ; )
{
int window_bits = Math.Min(bits_to_sieve, sieve_bits);
int last_number_in_window = window_base - 1 + (window_bits << 1);
while (sieve_primes_end < factor_primes.Length)
{
int prime = factor_primes[sieve_primes_end];
int start = prime * prime, stride = prime << 1;
if (start > last_number_in_window)
break;
if (start < window_base)
start = (stride - 1) - (window_base - start - 1) % stride;
else
start -= window_base;
offsets[sieve_primes_end++] = start >> 1;
}
fill(sieve, window_bits, (byte)1);
for (int i = first_sieve_prime_index; i < sieve_primes_end; ++i)
{
int prime = factor_primes[i], j = offsets[i];
for ( ; j < window_bits; j += prime)
sieve[j] = 0;
offsets[i] = j - window_bits;
}
sum += sum_func(sieve, window_base, window_bits);
if ((bits_to_sieve -= window_bits) == 0)
break;
window_base += window_bits << 1;
}
return sum;
}
static List<int> small_primes_up_to (int n)
{
int upper_bound_on_pi = 32 + (n < 137 ? 0 : (int)(n / (Math.Log(n) - 1.083513)));
var result = new List<int>(upper_bound_on_pi);
if (n < 2)
return result;
result.Add(2); // needs to be pulled out of thin air because of the mod 2 wheel
if (n < 3)
return result;
result.Add(3); // needs to be pulled out of thin air because of the mod 3 decoder
int sqrt_n_halved = (int)Math.Sqrt(n) >> 1;
int max_bit = (n - 1) >> 1;
var odd_composite = new bool[max_bit + 1];
for (int i = 5 >> 1; i <= sqrt_n_halved; ++i)
if (!odd_composite[i])
for (int p = (i << 1) + 1, j = p * p >> 1; j <= max_bit; j += p)
odd_composite[j] = true;
for (int i = 5 >> 1, d = 1; i <= max_bit; i += d, d ^= 3)
if (!odd_composite[i])
result.Add((i << 1) + 1);
return result;
}
static void fill<T> (T[] array, int count, T value, int threshold = 16)
{
Trace.Assert(count <= array.Length);
int current_size = Math.Min(threshold, count);
for (int i = 0; i < current_size; ++i)
array[i] = value;
for (int half = count >> 1; current_size <= half; current_size <<= 1)
Buffer.BlockCopy(array, 0, array, current_size, current_size);
Buffer.BlockCopy(array, 0, array, current_size, count - current_size);
}
Here's sieve processor that is equivalent to the logic used in the function shown at the beginning, and a dummy function that can be used to measure the sieve time sans any decoding, for comparison:
static long prime_sum_null (byte[] sieve, int window_base, int window_bits)
{
return 0;
}
static long prime_sum_v0 (byte[] sieve, int window_base, int window_bits)
{
long sum = 0;
int i = window_base % 3 == 0 ? 1 : 0;
int d = 3 - (window_base + 2 * i) % 3;
for ( ; i < window_bits; i += d, d ^= 3)
if (sieve[i] == 1)
sum += window_base + (i << 1);
return sum;
}
This function needs to perform a bit of modulo magic to synchronise itself with the mod 3 sequence over the mod 2 sieve; the earlier function did not need to do this because its starting point was fixed, not a parameter. Here are the timings:
# benchmark: iter_v0 ...
sum up to 2 * 10^4: 21171191 in 0,04 ms
sum up to 2 * 10^5: 1709600813 in 0,28 ms // 7,0 times
sum up to 2 * 10^6: 142913828922 in 2,42 ms // 8,7 times
sum up to 2 * 10^7: 12272577818052 in 22,11 ms // 9,1 times
sum up to 2 * 10^8: 1075207199997334 in 223,67 ms // 10,1 times
sum up to 2 * 10^9: 95673602693282040 in 2.408,06 ms // 10,8 times
Quite a difference, n'est-ce pas? But we're not done yet.
Replace conditional branching with arithmetic
Modern processors like things to be simple and predictable; if branches are not predicted correctly then the CPU levies a heavy fine in extra cycles for flushing and refilling the instruction pipeline. Unfortunately, the decoding loop isn't very predictable because primes are fairly dense in the low number ranges we're talking about here:
if (!odd_composite[i])
++count;
If the average number of non-primes between primes times the cost of an addition is less than the penalty for a mispredicted branch then the following statement should be faster:
count += sieve[i];
This explains why I inverted the logic of the sieve compared to normal, because with 'is_composite' semantics I'd have to do
count += 1 ^ odd_composite[i];
And the rule is to pull everything out of inner loops that can be pulled out, so that I simply applied 1 ^ x to the whole array before even starting.
However, Euler wants us to sum the primes instead of counting them. This can be done in a similar fashion, by turning the value 1 into a mask of all 1 bits (which preserves everything when ANDing) and 0 zeroises any value. This is similar to the CMOV instruction, except that it works even on the oldest of CPUs and does not require a reasonably decent compiler:
static long prime_sum_v1 (byte[] sieve, int window_base, int window_bits)
{
long sum = 0;
int i = window_base % 3 == 0 ? 1 : 0;
int d = 3 - (window_base + 2 * i) % 3;
for ( ; i < window_bits; i += d, d ^= 3)
sum += (0 - sieve[i]) & (window_base + (i << 1));
return sum;
}
Result:
# benchmark: iter_v1 ...
sum up to 2 * 10^4: 21171191 in 0,10 ms
sum up to 2 * 10^5: 1709600813 in 0,36 ms // 3,6 times
sum up to 2 * 10^6: 142913828922 in 1,88 ms // 5,3 times
sum up to 2 * 10^7: 12272577818052 in 13,80 ms // 7,3 times
sum up to 2 * 10^8: 1075207199997334 in 157,39 ms // 11,4 times
sum up to 2 * 10^9: 95673602693282040 in 1.819,05 ms // 11,6 times
Unrolling and strength reduction
Now, a bit of overkill: a decoder with a fully unrolled wheel mod 15 (the unrolling can unlock some reserves of instruction-level parallelism).
static long prime_sum_v5 (byte[] sieve, int window_base, int window_bits)
{
Trace.Assert(window_base % 2 == 1);
int count = 0, sum = 0;
int residue = window_base % 30;
int phase = UpperIndex[residue];
int i = (SpokeValue[phase] - residue) >> 1;
// get into phase for the unrolled code (which is based on phase 0)
for ( ; phase != 0 && i < window_bits; i += DeltaDiv2[phase], phase = (phase + 1) & 7)
{
int b = sieve[i]; count += b; sum += (0 - b) & i;
}
// process full revolutions of the wheel (anchored at phase 0 == residue 1)
for (int e = window_bits - (29 >> 1); i < e; i += (30 >> 1))
{
int i0 = i + ( 1 >> 1), b0 = sieve[i0]; count += b0; sum += (0 - b0) & i0;
int i1 = i + ( 7 >> 1), b1 = sieve[i1]; count += b1; sum += (0 - b1) & i1;
int i2 = i + (11 >> 1), b2 = sieve[i2]; count += b2; sum += (0 - b2) & i2;
int i3 = i + (13 >> 1), b3 = sieve[i3]; count += b3; sum += (0 - b3) & i3;
int i4 = i + (17 >> 1), b4 = sieve[i4]; count += b4; sum += (0 - b4) & i4;
int i5 = i + (19 >> 1), b5 = sieve[i5]; count += b5; sum += (0 - b5) & i5;
int i6 = i + (23 >> 1), b6 = sieve[i6]; count += b6; sum += (0 - b6) & i6;
int i7 = i + (29 >> 1), b7 = sieve[i7]; count += b7; sum += (0 - b7) & i7;
}
// clean up leftovers
for ( ; i < window_bits; i += DeltaDiv2[phase], phase = (phase + 1) & 7)
{
int b = sieve[i]; count += b; sum += (0 - b) & i;
}
return (long)window_base * count + ((long)sum << 1);
}
As you can see, I performed a bit of strength reduction in order to make things easier for the compiler. Instead of summing window_base + (i << 1), I sum i and 1 separately and perform the rest of the calculation only once, at the end of the function.
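The lookup tables that this decoder references (UpperIndex, SpokeValue, DeltaDiv2) are not included in the excerpt. A plausible reconstruction, inferred from how the loops use them rather than taken from the original, would be:
static readonly int[] SpokeValue = { 1, 7, 11, 13, 17, 19, 23, 29 }; // residues mod 30 coprime to 2, 3, 5
static readonly int[] DeltaDiv2  = { 3, 2, 1, 2, 1, 2, 3, 1 };       // halved gaps to the next spoke
static readonly int[] UpperIndex =                                   // smallest spoke index whose value >= residue (residue 0..29)
{
    0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 4, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7
};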
Timings:
# benchmark: iter_v5(1) ...
sum up to 2 * 10^4: 21171191 in 0,01 ms
sum up to 2 * 10^5: 1709600813 in 0,11 ms // 9,0 times
sum up to 2 * 10^6: 142913828922 in 1,01 ms // 9,2 times
sum up to 2 * 10^7: 12272577818052 in 11,52 ms // 11,4 times
sum up to 2 * 10^8: 1075207199997334 in 130,43 ms // 11,3 times
sum up to 2 * 10^9: 95673602693282040 in 1.563,10 ms // 12,0 times
# benchmark: iter_v5(2) ...
sum up to 2 * 10^4: 21171191 in 0,01 ms
sum up to 2 * 10^5: 1709600813 in 0,09 ms // 8,7 times
sum up to 2 * 10^6: 142913828922 in 1,03 ms // 11,3 times
sum up to 2 * 10^7: 12272577818052 in 10,34 ms // 10,0 times
sum up to 2 * 10^8: 1075207199997334 in 121,08 ms // 11,7 times
sum up to 2 * 10^9: 95673602693282040 in 1.468,28 ms // 12,1 times
The first set of timings is for decoder_order == 1 (i.e. not telling the sieve about the extra skipped prime), for direct comparison to the other decoder versions. The second set is for decoder_order == 2, which means the sieve could skip the crossings-off for the prime 5 as well. Here are the null timings (essentially the sieve time without the decode time), to put things a bit into perspective:
# benchmark: iter_null(1) ...
sum up to 2 * 10^8: 10 in 94,74 ms // 11,4 times
sum up to 2 * 10^9: 10 in 1.194,18 ms // 12,6 times
# benchmark: iter_null(2) ...
sum up to 2 * 10^8: 10 in 86,05 ms // 11,9 times
sum up to 2 * 10^9: 10 in 1.109,32 ms // 12,9 times
This shows that the work on the decoder has decreased decode time for 2 billion from 1.21 s to 0.35 s, which is nothing to sneeze at. Similar speedups can be realised for the sieving as well, but that is nowhere near as easy as it was for the decoding.
Low-hanging fruit: presieving
Lastly, a technique that can sometimes offer dramatic speedups (especially for packed bitmaps and/or higher-order wheels) is blasting a canned bit pattern over the sieve before commencing a round of sieving, such that the sieve looks as if it had already been sieved by a handful of small primes. This is usually known as presieving. In the current case the speedup is marginal (not even 20%) but I'm showing it because it is a useful technique to have in one's toolchest.
Note: I've ripped the presieving logic from another Euler project, so it doesn't fit organically into the code I wrote for this article. But it should demonstrate the technique well enough.
const byte CROSSED_OFF = 0; // i.e. composite
const byte NOT_CROSSED = 1 ^ CROSSED_OFF; // i.e. not composite
const int SIEVE_BYTES = SIEVE_BITS; // i.e. 1 usable bit per byte
internal readonly static byte[] TinyPrimes = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31 };
internal readonly static int m_wheel_order = 3; // == number of wheel primes
internal static int m_presieve_level = 0; // == number of presieve primes
internal static int m_presieve_modulus = 0;
internal static byte[] m_presieve_pattern;
internal static void set_presieve_level (int presieve_primes)
{
m_presieve_level = Math.Max(0, presieve_primes);
m_presieve_modulus = 1;
for (int i = m_wheel_order; i < m_wheel_order + presieve_primes; ++i)
m_presieve_modulus *= TinyPrimes[i];
// the pattern needs to provide SIEVE_BYTES bytes for every residue of the modulus
m_presieve_pattern = new byte[m_presieve_modulus + SIEVE_BYTES - 1];
var pattern = m_presieve_pattern;
int current_size = 1;
pattern[0] = NOT_CROSSED;
for (int i = m_wheel_order; i < m_wheel_order + presieve_primes; ++i)
{
int current_prime = TinyPrimes[i];
int new_size = current_size * current_prime;
// keep doubling while possible
for ( ; current_size * 2 <= new_size; current_size *= 2)
Buffer.BlockCopy(pattern, 0, pattern, current_size, current_size);
// copy rest, if any
Buffer.BlockCopy(pattern, 0, pattern, current_size, new_size - current_size);
current_size = new_size;
// mark multiples of the current prime
for (int j = current_prime >> 1; j < current_size; j += current_prime)
pattern[j] = CROSSED_OFF;
}
for (current_size = m_presieve_modulus; current_size * 2 <= pattern.Length; current_size *= 2)
Buffer.BlockCopy(pattern, 0, pattern, current_size, current_size);
Buffer.BlockCopy(pattern, 0, pattern, current_size, pattern.Length - current_size);
}
For a quick test you can hack the presieving into the sieve function as follows:
- int first_sieve_prime_index = 1 + decoder_order; // skip wheel primes + decoder primes
+ int first_sieve_prime_index = 1 + decoder_order + m_presieve_level; // skip wheel primes + decoder primes
plus
- long sum = 2 + 3 + 5; // wheel primes + decoder primes
+ long sum = 2 + 3 + 5; // wheel primes + decoder primes
+
+ for (int i = 0; i < m_presieve_level; ++i)
+ sum += TinyPrimes[m_wheel_order + i];
plus
- fill(sieve, window_bits, (byte)1);
+ if (m_presieve_level == 0)
+ fill(sieve, window_bits, (byte)1);
+ else
+ Buffer.BlockCopy(m_presieve_pattern, (window_base >> 1) % m_presieve_modulus, sieve, 0, window_bits);
and
set_presieve_level(4) // 4 and 5 work well
in the static constructor or Main().
This way you can use m_presieve_level for turning presieving on and off. The BlockCopy also works correctly after calling set_presieve_level(0), though, because then the modulus is 1. m_wheel_order should reflect the actual wheel order (= 1) plus the decoder order; it's currently set to 3, so it'll work only with the v5 decoder at level 2.
Timings:
# benchmark: iter_v5(2) pre(7) ...
sum up to 2 * 10^4: 21171191 in 0,02 ms
sum up to 2 * 10^5: 1709600813 in 0,08 ms // 4,0 times
sum up to 2 * 10^6: 142913828922 in 0,78 ms // 9,6 times
sum up to 2 * 10^7: 12272577818052 in 8,78 ms // 11,2 times
sum up to 2 * 10^8: 1075207199997334 in 98,89 ms // 11,3 times
sum up to 2 * 10^9: 95673602693282040 in 1.245,19 ms // 12,6 times
sum up to 2^31 - 1: 109930816131860852 in 1.351,97 ms

find set bits positions in a byte

I was trying to create a sparse octree implementation like the people at nVidia ("Efficient Sparse Voxel Octrees") were doing for their voxel things when I came across this issue:
I have a bitfield of type byte (so 8 bits only) that tells me where the leafs of the octree are (1 says leaf, 0 means no leaf, 8 nodes attached --> 8 bit). What I want to do now is returning an array of the leaf positions. My current implementation is using a while loop to find out if the LSB is set. The input is shifted by 1 afterwards. So here's how I do that:
int leafposition = _leafmask & _validmask;
int[] result = new int[8];
int arrayPosition = 0;
int iteration = 0;
while ( leafposition > 0 )
{
iteration++; //nodes are not zero-indexed ... ?
if ( (leafposition & 1) == 1 ) // LSB set?
{
result.SetValue( iteration, arrayPosition );
arrayPosition++;
};
leafposition = leafposition >> 1;
}
return result;
This is not looking elegant and has two things that are disturbing:
This while loop mimics a for loop
the result array will most likely be smaller than 8 values, but resizing is costly
I expect the result to be like [2,4,6] for 42 (0010 1010).
Can anyone provide a more elegant solution that is still readable?
Result
I am using a function for the octree leaf count I implemented earlier to set the array to an appropriate size.
If you're going after code conciseness, I would use this:
int[] result = new int[8];
byte leafposition = 42;
int arrayPosition = 0;
for (int iteration = 0; iteration < 8; ++iteration)
if ((leafposition & (1 << iteration)) != 0)
result[arrayPosition++] = iteration + 1; // one-indexed
If you're going after performance, I would use a pre-populated array (of 256 entries). You can either generate this statically (at compile-time) or lazily (before calling your method the first time).
int[][] leaves =
{
/* 00000000 */ new int[] { },
/* 00000001 */ new int[] { 1 },
/* 00000010 */ new int[] { 2 },
/* 00000011 */ new int[] { 1, 2 },
/* 00000100 */ new int[] { 3 },
/* 00000101 */ new int[] { 1, 3 },
/* 00000110 */ new int[] { 2, 3 },
/* 00000111 */ new int[] { 1, 2, 3 },
/* 00001000 */ new int[] { 4 },
/* 00001001 */ new int[] { 1, 4 },
/* ... */
};
byte leafposition = 42;
int[] result = leaves[leafposition];
Edit: If you're using the lookup table and can afford a one-time initialization (that will be amortized through many subsequent uses), I would suggest creating it dynamically (rather than bloating your source code). Here's some sample code in LINQ; you can use the loop version instead.
int[][] leaves = new int[256][];
for (int i = 0; i < 256; ++i)
leaves[i] = Enumerable.Range(0, 8)
.Where(b => (i & (1 << b)) != 0)
.Select(b => b + 1)
.ToArray();
Here's a C-style solution that uses __builtin_ffs
int arrayPosition = 0;
unsigned int tmp_bitmap = original_bitfield;
while (tmp_bitmap > 0) {
int leafposition = __builtin_ffs(tmp_bitmap) - 1;
tmp_bitmap &= (tmp_bitmap-1);
result[arrayPosition++] = leafposition;
}
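For what it's worth, a rough C# analogue of that loop (my sketch, assuming .NET Core 3.0 or later, where System.Numerics.BitOperations.TrailingZeroCount plays the role of __builtin_ffs minus one):
static IEnumerable<int> SetBitPositions(byte bitfield)
{
    uint tmp = bitfield;
    while (tmp != 0)
    {
        int leafposition = System.Numerics.BitOperations.TrailingZeroCount(tmp); // index of lowest set bit
        tmp &= tmp - 1;                                                          // clear that bit
        yield return leafposition;                                               // zero-based, like the C version
    }
}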
how about,
public static IEnumerable<int> LeafPositions(byte octet)
{
for (var i = 1; octet > 0; octet >>= 1, i++)
{
if ((octet & 1) == 1)
{
yield return i;
}
}
}
or, easier to read in my opinion.
IEnumerable<int> LeafPositions(byte octet)
{
if ((octet & 1) == 1) yield return 1;
if ((octet & 2) == 2) yield return 2;
if ((octet & 4) == 4) yield return 3;
if ((octet & 8) == 8) yield return 4;
if ((octet & 16) == 16) yield return 5;
if ((octet & 32) == 32) yield return 6;
if ((octet & 64) == 64) yield return 7;
if ((octet & 128) == 128) yield return 8;
}
Or, going to extremes
IEnumerable<int> LeafPositions(byte octet)
{
switch (octet)
{
case 1:
yield return 1;
break;
case 2:
yield return 2;
break;
case 3:
yield return 1;
yield return 2;
break;
...
case 255:
yield return 1;
yield return 2;
yield return 3;
yield return 4;
yield return 5;
yield return 6;
yield return 7;
yield return 8;
break;
}
yield break;
}

Project Euler: Problem 1 (Possible refactorings and run time optimizations)

I have been hearing a lot about Project Euler so I thought I solve one of the problems in C#. The problem as stated on the website is as follows:
If we list all the natural numbers
below 10 that are multiples of 3 or 5,
we get 3, 5, 6 and 9. The sum of these
multiples is 23.
Find the sum of all the multiples of 3
or 5 below 1000.
I wrote my code as follows:
class EulerProblem1
{
public static void Main()
{
var totalNum = 1000;
var counter = 1;
var sum = 0;
while (counter < totalNum)
{
if (DivisibleByThreeOrFive(counter))
sum += counter;
counter++;
}
Console.WriteLine("Total Sum: {0}", sum);
Console.ReadKey();
}
private static bool DivisibleByThreeOrFive(int counter)
{
return ((counter % 3 == 0) || (counter % 5 == 0));
}
}
It will be great to get some ideas on alternate implementations with less verbosity/cleaner syntax and better optimizations. The ideas may vary from quick and dirty to bringing out the cannon to annihilate the mosquito. The purpose is to explore the depths of computer science while trying to improve this particularly trivial code snippet.
Thanks
Updated to not double count numbers that are multiples of both 3 and 5:
int EulerProblem(int totalNum)
{
int a = (totalNum-1)/3;
int b = (totalNum-1)/5;
int c = (totalNum-1)/15;
int d = a*(a+1)/2;
int e = b*(b+1)/2;
int f = c*(c+1)/2;
return 3*d + 5*e - 15*f;
}
With LINQ (updated as suggested in comments)
static void Main(string[] args)
{
var total = Enumerable.Range(0,1000)
.Where(counter => (counter%3 == 0) || (counter%5 == 0))
.Sum();
Console.WriteLine(total);
Console.ReadKey();
}
Here's a transliteration of my original F# solution into C#. Edited: It's basically mbeckish's solution as a loop rather than a function (and I remove the double count). I like mbeckish's better.
static int Euler1 ()
{
int sum = 0;
for (int i=3; i<1000; i+=3) sum+=i;
for (int i=5; i<1000; i+=5) sum+=i;
for (int i=15; i<1000; i+=15) sum-=i;
return sum;
}
Here's the original:
let euler1 d0 d1 n =
(seq {d0..d0..n} |> Seq.sum) +
(seq {d1..d1..n} |> Seq.sum) -
(seq {d0*d1..d0*d1..n} |> Seq.sum)
let result = euler1 3 5 (1000-1)
I haven't written any Java in a while, but this should solve it in constant time with little overhead:
public class EulerProblem1
{
private static final int EULER1 = 233168;
// Equal to the sum of all natural numbers less than 1000
// which are multiples of 3 or 5, inclusive.
public static void main(String[] args)
{
System.out.println(EULER1);
}
}
EDIT: Here's a C implementation, if every instruction counts:
#define STDOUT 1
#define OUT_LENGTH 8
int main (int argc, char **argv)
{
const char out[OUT_LENGTH] = "233168\n";
write(STDOUT, out, OUT_LENGTH);
}
Notes:
There's no error handling on the call to write. If true robustness is needed, a more sophisticated error handling strategy must be employed. Whether the added complexity is worth greater reliability depends on the needs of the user.
If you have memory constraints, you may be able to save a byte by using a straight char array rather than a string terminated by a superfluous null character. In practice, however, out would almost certainly be padded to 8 bytes anyway.
Although the declaration of the out variable could be avoided by placing the string inline in the write call, any real compiler will optimize away the declaration.
The write syscall is used in preference to puts or similar to avoid the additional overhead. Theoretically, you could invoke the system call directly, perhaps saving a few cycles, but this would raise significant portability issues. Your mileage may vary regarding whether this is an acceptable tradeoff.
Refactoring #mbeckish's very clever solution:
public int eulerProblem(int max) {
int t1 = f(max, 3);
int t2 = f(max, 5);
int t3 = f(max, 3 * 5);
return t1 + t2 - t3;
}
private int f(int max, int n) {
int a = (max - 1) / n;
return n * a * (a + 1) / 2;
}
That's basically the same way I did that problem. I know there were other solutions (probably more efficient ones too) on the Project Euler forums.
Once you input your answer, going back to the question gives you the option to go to the forum for that problem. You may want to look there!
The code in DivisibleByThreeOrFive would be slightly faster if you would state it as follows:
return ((counter % 3 == 0) || (counter % 5 == 0));
And if you do not want to rely on the compiler to inline the function call, you could do this yourself by putting this code into the Main routine.
You can come up with a closed form solution for this. The trick is to look for patterns. Try listing out the terms in the sum up to say ten, or twenty and then using algebra to group them. By making appropriate substitutions you can generalize that to numbers other than ten. Just be careful about edge cases.
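For instance, grouping the multiples of 3 below N as 3 + 6 + ... + 3m = 3(1 + 2 + ... + m) with m = (N-1)/3 gives 3·m(m+1)/2; doing the same for 5 and subtracting the doubly counted multiples of 15 is the closed form mbeckish shows above. A small sketch of that pattern (names are mine, for illustration):
// Closed form obtained from the grouping pattern (same idea as mbeckish's answer above).
static int SumMultiplesBelow(int n, int k)
{
    int m = (n - 1) / k;        // how many multiples of k lie below n
    return k * m * (m + 1) / 2; // k * (1 + 2 + ... + m)
}
static int SumOfMultiplesOf3Or5Below(int n) =>
    SumMultiplesBelow(n, 3) + SumMultiplesBelow(n, 5) - SumMultiplesBelow(n, 15);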
Try this, in C. It's constant time, and there's only one division (two if the compiler doesn't optimize the div/mod, which it should). I'm sure it's possible to make it a bit more obvious, but this should work.
It basically divides the sum into two parts. The greater part (for N >= 15) is a simple quadratic function that divides N into exact blocks of 15. The lesser part is the last bit that doesn't fit into a block. The latter bit is messier, but there are only a few possibilities, so a LUT will solve it in no time.
const unsigned long N = 1000 - 1;
const unsigned long q = N / 15;
const unsigned long r = N % 15;
const unsigned long rc = N - r;
unsigned long sum = ((q * 105 + 15) * q) >> 1;
switch (r) {
case 3 : sum += 3 + 1*rc ; break;
case 4 : sum += 3 + 1*rc ; break;
case 5 : sum += 8 + 2*rc ; break;
case 6 : sum += 14 + 3*rc ; break;
case 7 : sum += 14 + 3*rc ; break;
case 8 : sum += 14 + 3*rc ; break;
case 9 : sum += 23 + 4*rc ; break;
case 10 : sum += 33 + 5*rc ; break;
case 11 : sum += 33 + 5*rc ; break;
case 12 : sum += 45 + 6*rc ; break;
case 13 : sum += 45 + 6*rc ; break;
case 14 : sum += 45 + 6*rc ; break;
}
You can do something like this:
Func<int,int> Euler = total=>
new List<int>() {3,5}
.Select(m => ((int) (total-1) / m) * m * (((int) (total-1) / m) + 1) / 2)
.Aggregate( (T, m) => T+=m);
You still have the double counting problem. I'll think about this a little more.
Edit:
Here is a working (if slightly inelegant) solution in LINQ:
var li = new List<int>() { 3, 5 };
Func<int, int, int> Summation = (total, m) =>
((int) (total-1) / m) * m * (((int) (total-1) / m) + 1) / 2;
Func<int,int> Euler = total=>
li
.Select(m => Summation(total, m))
.Aggregate((T, m) => T+=m)
- Summation(total, li.Aggregate((T, m) => T*=m));
Can any of you guys improve on this?
Explanation:
Remember that the sum of the integers 1..n is n(n+1)/2. In the first case, where you have the multiples of 3 and 5 below 10, you want (3+6+9) + (5). Setting total=10, you make a sequence of the integers 1 .. (int) (total-1)/3, sum the sequence and multiply by 3. You can easily see that we're just setting n = (int) (total-1)/3, then using the summation formula and multiplying by 3. A little algebra gives us the formula for the Summation functor.
I like technielogys idea, here's my idea of a modification
static int Euler1 ()
{
int sum = 0;
for (int i=3; i<1000; i+=3)
{
if (i % 5 == 0) continue;
sum+=i;
}
for (int i=5; i<1000; i+=5) sum+=i;
return sum;
}
A minor heuristic also comes to mind; does this make any improvement?
static int Euler1 ()
{
int sum = 0;
for (int i=3; i<1000; i+=3)
{
if (i % 5 == 0) continue;
sum+=i;
}
for (int i=5; i<250; i+=5)
{
sum+=i;
}
for (int i=250; i<500; i+=5)
{
sum+=i;
sum+=i*2;
sum+=(i*2)+5;
}
return sum;
}
Your approach is brute force. The time complexity of the following approach is O(1). Here we divide the given (number-1) by 3, 5 and 15, and store the results in countNumOf3, countNumOf5 and countNumOf15.
Now the multiples of 3 form an arithmetic progression within the range of the given (number-1), with a common difference of 3, and likewise for 5 and 15.
Suppose the given number is 16. Then:
3 => 3, 6, 9, 12, 15 => sum1 = 45
5 => 5, 10, 15 => sum2 = 30
15 => 15 => sum3 = 15
Add sum1 and sum2 to get sum. Since 15 is a multiple of both 3 and 5, remove sum3 from sum, and that is your answer: sum = sum - sum3. Please check my solution at http://ideone.com/beXsam
import java.util.*;
class Multiplesof3And5 {
public static void main(String [] args){
Scanner scan=new Scanner(System.in);
int num=scan.nextInt();
System.out.println(getSum(num));
}
public static long getSum(int n){
int countNumOf3=(n-1)/3;//
int countNumOf5=(n-1)/5;
int countNumOf15=(n-1)/15;
long sum=0;
sum=sumOfAP(3,countNumOf3,3)+sumOfAP(5,countNumOf5,5)-sumOfAP(15,countNumOf15,15);
return sum;
}
public static int sumOfAP(int a, int n, int d){
return (n*(2*a +(n -1)*d))/2;
}
}
new List<int>{3,5}.SelectMany(n =>Enumerable.Range(1,999/n).Select(i=>i*n))
.Distinct()
.Sum()
[Update] (In response to the comment asking to explain this algorithm)
This builds a flattened list of multiples for each base value (3 and 5 in this case), then removes duplicates (e.g. where a multiple is divisible, in this case, by 3*5 = 15) and then sums the remaining values. (Also, this is easily generalisable to more than two base values, IMHO, compared to any of the other solutions I have seen here.)
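A possible generalisation along those lines (my sketch, with illustrative names; it needs System.Linq):
static int SumOfMultiples(int below, params int[] bases) =>
    bases.SelectMany(n => Enumerable.Range(1, (below - 1) / n).Select(i => i * n))
         .Distinct()
         .Sum();
// e.g. SumOfMultiples(1000, 3, 5) == 233168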

How can you get the first digit in an int (C#)?

In C#, what's the best way to get the 1st digit in an int? The method I came up with is to turn the int into a string, find the 1st char of the string, then turn it back to an int.
int start = Convert.ToInt32(curr.ToString().Substring(0, 1));
While this does the job, it feels like there is probably a good, simple, math-based solution to such a problem. String manipulation feels clunky.
Edit: irrespective of speed differences, mystring[0] instead of Substring() is still just string manipulation
Benchmarks
Firstly, you must decide what you mean by the "best" solution; of course, that takes into account the efficiency of the algorithm, its readability/maintainability, and the likelihood of bugs creeping in later. Careful unit tests can generally avoid those problems, however.
I ran each of these examples 10 million times, and the results value is the number of ElapsedTicks that have passed.
Without further ado, from slowest to quickest, the algorithms are:
Converting to a string, take first character
int firstDigit = (int)(Value.ToString()[0]) - 48;
Results:
12,552,893 ticks
Using a logarithm
int firstDigit = (int)(Value / Math.Pow(10, (int)Math.Floor(Math.Log10(Value))));
Results:
9,165,089 ticks
Looping
while (number >= 10)
number /= 10;
Results:
6,001,570 ticks
Conditionals
int firstdigit;
if (Value < 10)
firstdigit = Value;
else if (Value < 100)
firstdigit = Value / 10;
else if (Value < 1000)
firstdigit = Value / 100;
else if (Value < 10000)
firstdigit = Value / 1000;
else if (Value < 100000)
firstdigit = Value / 10000;
else if (Value < 1000000)
firstdigit = Value / 100000;
else if (Value < 10000000)
firstdigit = Value / 1000000;
else if (Value < 100000000)
firstdigit = Value / 10000000;
else if (Value < 1000000000)
firstdigit = Value / 100000000;
else
firstdigit = Value / 1000000000;
Results:
1,421,659 ticks
Unrolled & optimized loop
if (i >= 100000000) i /= 100000000;
if (i >= 10000) i /= 10000;
if (i >= 100) i /= 100;
if (i >= 10) i /= 10;
Results:
1,399,788 ticks
Note:
each test calls Random.Next() to get the next int
Here's how
int i = Math.Abs(386792);
while(i >= 10)
i /= 10;
and i will contain what you need
Try this
public int GetFirstDigit(int number) {
if ( number < 10 ) {
return number;
}
return GetFirstDigit ( (number - (number % 10)) / 10);
}
EDIT
Several people have requested the loop version
public static int GetFirstDigitLoop(int number)
{
while (number >= 10)
{
number = (number - (number % 10)) / 10;
}
return number;
}
The best I can come up with is:
int numberOfDigits = Convert.ToInt32(Math.Floor(Math.Log10(value)));
int firstDigit = (int)(value / Math.Pow(10, numberOfDigits));
variation on Anton's answer:
// cut down the number of divisions (assuming i is positive & 32 bits)
if (i >= 100000000) i /= 100000000;
if (i >= 10000) i /= 10000;
if (i >= 100) i /= 100;
if (i >= 10) i /= 10;
int myNumber = 8383;
char firstDigit = myNumber.ToString()[0];
// char = '8'
Had the same idea as Lennaert
int start = number == 0 ? 0 : number / (int) Math.Pow(10,Math.Floor(Math.Log10(Math.Abs(number))));
This also works with negative numbers.
If you think Keltex's answer is ugly, try this one, it's REALLY ugly, and even faster.
It does unrolled binary search to determine the length.
... leading code along the same lines
/* i<10000 */
if (i >= 100){
if (i >= 1000){
return i/1000;
}
else /* i<1000 */{
return i/100;
}
}
else /* i<100*/ {
if (i >= 10){
return i/10;
}
else /* i<10 */{
return i;
}
}
P.S. MartinStettner had the same idea.
Very simple (and probably quite fast because it only involves comparisons and one division):
if(i<10)
firstdigit = i;
else if (i<100)
firstdigit = i/10;
else if (i<1000)
firstdigit = i/100;
else if (i<10000)
firstdigit = i/1000;
else if (i<100000)
firstdigit = i/10000;
else (etc... all the way up to 1000000000)
An obvious, but slow, mathematical approach is:
int firstDigit = (int)(i / Math.Pow(10, (int)Math.Log10(i)));
int temp = i;
while (temp >= 10)
{
temp /= 10;
}
Result in temp
I know it's not C#, but it's curious that in Python the "get the first char of the string representation of the number" approach is the fastest!
EDIT: no, I made a mistake, I forgot to construct the int again, sorry. The unrolled version is the fastest.
$ cat first_digit.py
def loop(n):
while n >= 10:
n /= 10
return n
def unrolled(n):
while n >= 100000000: # yea... unlimited size int supported :)
n /= 100000000
if n >= 10000:
n /= 10000
if n >= 100:
n /= 100
if n >= 10:
n /= 10
return n
def string(n):
return int(str(n)[0])
$ python -mtimeit -s 'from first_digit import loop as test' \
'for n in xrange(0, 100000000, 1000): test(n)'
10 loops, best of 3: 275 msec per loop
$ python -mtimeit -s 'from first_digit import unrolled as test' \
'for n in xrange(0, 100000000, 1000): test(n)'
10 loops, best of 3: 149 msec per loop
$ python -mtimeit -s 'from first_digit import string as test' \
'for n in xrange(0, 100000000, 1000): test(n)'
10 loops, best of 3: 284 msec per loop
$
I just stumbled upon this old question and felt inclined to propose another suggestion since none of the other answers so far returns the correct result for all possible input values and it can still be made faster:
public static int GetFirstDigit( int i )
{
if( i < 0 && ( i = -i ) < 0 ) return 2;
return ( i < 100 ) ? ( i < 1 ) ? 0 : ( i < 10 )
? i : i / 10 : ( i < 1000000 ) ? ( i < 10000 )
? ( i < 1000 ) ? i / 100 : i / 1000 : ( i < 100000 )
? i / 10000 : i / 100000 : ( i < 100000000 )
? ( i < 10000000 ) ? i / 1000000 : i / 10000000
: ( i < 1000000000 ) ? i / 100000000 : i / 1000000000;
}
This works for all signed integer values, including -2147483648, which is the smallest signed integer and doesn't have a positive counterpart. Math.Abs( -2147483648 ) triggers a System.OverflowException and - -2147483648 computes to -2147483648.
The implementation can be seen as a combination of the advantages of the two fastest implementations so far. It uses a binary search and avoids superfluous divisions. A quick benchmark with the index of a loop with 100,000,000 iterations shows that it is twice as fast as the currently fastest implementation.
It finishes after 2,829,581 ticks.
For comparison I also measured a corrected variant of the currently fastest implementation which took 5,664,627 ticks.
public static int GetFirstDigitX( int i )
{
if( i < 0 && ( i = -i ) < 0 ) return 2;
if( i >= 100000000 ) i /= 100000000;
if( i >= 10000 ) i /= 10000;
if( i >= 100 ) i /= 100;
if( i >= 10 ) i /= 10;
return i;
}
The accepted answer with the same correction needed 16,561,929 ticks for this test on my computer.
public static int GetFirstDigitY( int i )
{
if( i < 0 && ( i = -i ) < 0 ) return 2;
while( i >= 10 )
i /= 10;
return i;
}
Simple functions like these can easily be proven correct, since iterating all possible integer values takes not much more than a few seconds on current hardware. This means that it is less important to implement them in an exceptionally readable fashion, as there simply won't ever be the need to fix a bug inside them later on.
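A minimal exhaustive check along those lines (my sketch; it compares GetFirstDigit against the loop-based GetFirstDigitY above and may take a little while to run):
static void VerifyAllInts()
{
    // iterate every possible int value, using a long loop counter to avoid overflow at int.MaxValue
    for (long v = int.MinValue; v <= int.MaxValue; ++v)
    {
        int i = (int)v;
        if (GetFirstDigit(i) != GetFirstDigitY(i))
            throw new Exception("Mismatch at " + i);
    }
}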
Did some tests with one of my co-workers here, and found out most of the solutions don't work for negative numbers.
public int GetFirstDigit(int number)
{
number = Math.Abs(number); // makes sure you really get the digit!
if (number < 10)
{
return number;
}
return GetFirstDigit((number - (number % 10)) / 10);
}
Using all the examples below to get this code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Diagnostics;
namespace Benfords
{
class Program
{
static int FirstDigit1(int value)
{
return Convert.ToInt32(value.ToString().Substring(0, 1));
}
static int FirstDigit2(int value)
{
while (value >= 10) value /= 10;
return value;
}
static int FirstDigit3(int value)
{
return (int)(value.ToString()[0]) - 48;
}
static int FirstDigit4(int value)
{
return (int)(value / Math.Pow(10, (int)Math.Floor(Math.Log10(value))));
}
static int FirstDigit5(int value)
{
if (value < 10) return value;
if (value < 100) return value / 10;
if (value < 1000) return value / 100;
if (value < 10000) return value / 1000;
if (value < 100000) return value / 10000;
if (value < 1000000) return value / 100000;
if (value < 10000000) return value / 1000000;
if (value < 100000000) return value / 10000000;
if (value < 1000000000) return value / 100000000;
return value / 1000000000;
}
static int FirstDigit6(int value)
{
if (value >= 100000000) value /= 100000000;
if (value >= 10000) value /= 10000;
if (value >= 100) value /= 100;
if (value >= 10) value /= 10;
return value;
}
const int mcTests = 1000000;
static void Main(string[] args)
{
Stopwatch lswWatch = new Stopwatch();
Random lrRandom = new Random();
int liCounter;
lswWatch.Start();
for (liCounter = 0; liCounter < mcTests; liCounter++)
FirstDigit1(lrRandom.Next());
lswWatch.Stop();
Console.WriteLine("Test {0} = {1} ticks", 1, lswWatch.ElapsedTicks);
lswWatch.Reset();
lswWatch.Start();
for (liCounter = 0; liCounter < mcTests; liCounter++)
FirstDigit2(lrRandom.Next());
lswWatch.Stop();
Console.WriteLine("Test {0} = {1} ticks", 2, lswWatch.ElapsedTicks);
lswWatch.Reset();
lswWatch.Start();
for (liCounter = 0; liCounter < mcTests; liCounter++)
FirstDigit3(lrRandom.Next());
lswWatch.Stop();
Console.WriteLine("Test {0} = {1} ticks", 3, lswWatch.ElapsedTicks);
lswWatch.Reset();
lswWatch.Start();
for (liCounter = 0; liCounter < mcTests; liCounter++)
FirstDigit4(lrRandom.Next());
lswWatch.Stop();
Console.WriteLine("Test {0} = {1} ticks", 4, lswWatch.ElapsedTicks);
lswWatch.Reset();
lswWatch.Start();
for (liCounter = 0; liCounter < mcTests; liCounter++)
FirstDigit5(lrRandom.Next());
lswWatch.Stop();
Console.WriteLine("Test {0} = {1} ticks", 5, lswWatch.ElapsedTicks);
lswWatch.Reset();
lswWatch.Start();
for (liCounter = 0; liCounter < mcTests; liCounter++)
FirstDigit6(lrRandom.Next());
lswWatch.Stop();
Console.WriteLine("Test {0} = {1} ticks", 6, lswWatch.ElapsedTicks);
Console.ReadLine();
}
}
}
I get these results on an AMD Ahtlon 64 X2 Dual Core 4200+ (2.2 GHz):
Test 1 = 2352048 ticks
Test 2 = 614550 ticks
Test 3 = 1354784 ticks
Test 4 = 844519 ticks
Test 5 = 150021 ticks
Test 6 = 192303 ticks
But get these on a AMD FX 8350 Eight Core (4.00 GHz)
Test 1 = 3917354 ticks
Test 2 = 811727 ticks
Test 3 = 2187388 ticks
Test 4 = 1790292 ticks
Test 5 = 241150 ticks
Test 6 = 227738 ticks
So whether method 5 or 6 is faster depends on the CPU. I can only surmise this is because the branch prediction in the newer processor is smarter, but I'm not really sure.
I don't have any Intel CPUs; maybe someone could test it for us?
Check this one too:
int get1digit(Int64 myVal)
{
string q12 = myVal.ToString()[0].ToString();
int i = int.Parse(q12);
return i;
}
Also good if you want multiple numbers:
int get3digit(Int64 myVal) //Int64 or whatever numerical data you have
{
char mg1 = myVal.ToString()[0];
char mg2 = myVal.ToString()[1];
char mg3 = myVal.ToString()[2];
char[] chars = { mg1, mg2, mg3 };
string q12= new string(chars);
int i = int.Parse(q12);
return i;
}
while (i >= 10)
{
i = (Int32)Math.Floor((Decimal)i / 10);
}
// i is now the first int
Non iterative formula:
public static int GetHighestDigit(int num)
{
if (num <= 0)
return 0;
return (int)((double)num / Math.Pow(10f, Math.Floor(Math.Log10(num))));
}
Just to give you an alternative, you could repeatedly divide the integer by 10, and then rollback one value once you reach zero. Since string operations are generally slow, this may be faster than string manipulation, but is by no means elegant.
Something like this:
while(curr>=10)
curr /= 10;
start = getFirstDigit(start);
public int getFirstDigit(final int start){
int number = Math.abs(start);
while(number >= 10){
number /= 10;
}
return number;
}
or
public int getFirstDigit(final int start){
return getFirstDigit(Math.abs(start), true);
}
private int getFirstDigit(final int start, final boolean recurse){
if(start < 10){
return start;
}
return getFirstDigit(start / 10, recurse);
}
int start = curr;
while (start >= 10)
start /= 10;
This is more efficient than a ToString() approach which internally must implement a similar loop and has to construct (and parse) a string object on the way ...
Very easy method to get the Last digit:
int myInt = 1821;
int lastDigit = myInt - ((myInt/10)*10); // 1821 - 1820 = 1
int i = 4567789;
int digit1 = int.Parse(i.ToString()[0].ToString());
This is what I usually do; please refer to my function below.
This function extracts the first number occurrence from any string; you can modify and use it according to your needs:
public static int GetFirstNumber(this string strInsput)
{
int number = 0;
string strNumber = "";
bool bIsContNo = true;
bool bNoOccued = false;
try
{
var arry = strInsput.ToCharArray();
foreach (char item in arry)
{
if (char.IsNumber(item))
{
strNumber = strNumber + item.ToString();
bIsContNo = true;
bNoOccued = true;
}
else
{
bIsContNo = false;
}
if (bNoOccued && !bIsContNo)
{
break;
}
}
number = Convert.ToInt32(strNumber);
}
catch (Exception ex)
{
return 0;
}
return number;
}
public static int GetFirstDigit(int n, bool removeSign = true)
{
if (removeSign)
return n <= -10 || n >= 10 ? Math.Abs(n) % 10 : Math.Abs(n);
else
return n <= -10 || n >= 10 ? n % 10 : n;
}
//Your code goes here
int[] test = new int[] { -1574, -221, 1246, -4, 8, 38546};
foreach(int n in test)
Console.WriteLine(string.Format("{0} : {1}", n, GetFirstDigit(n)));
Output:
-1574 : 4
-221 : 1
1246 : 6
-4 : 4
8 : 8
38546 : 6
Here is a simpler way that does not involve looping:
int number = 1234;
int firstDigit = (int)Math.Floor(number / Math.Pow(10, number.ToString().Length - 1));
That would give us 1234/Math.Pow(10, 4 - 1) = 1234/1000 = 1
