prime number summing still slow after using sieve - c#
I had a go at the Project Euler coding challenge below. The answer given by the code is correct, but I do not understand why it's taking nearly a minute to run. It was finishing with similar times prior to using a sieve, and other users are reporting times as low as milliseconds.
I assume I am making a basic error somewhere...
// The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.
// Find the sum of all the primes below two million.
public static long Ex010()
{
var sum = 0L;
var sieve = new bool[2000000];
var primes = new List<int>(10000);
for (int i = 2; i < sieve.Length; i++)
{
if (sieve[i-1])
continue;
var isPrime = true;
foreach (var prime in primes)
{
if (i % prime == 0) {
isPrime = false;
break;
}
}
if (isPrime) {
primes.Add(i);
sum += i;
for (var x = i * 2; x < sieve.Length; x += i) {
sieve[x-1] = true;
}
}
}
return sum;
}
EDIT:
The only thing that seemed to be missing was this optimization:
if (prime > Math.Sqrt(i))
break;
It brings the time down to 160 ms.
EDIT 2:
Finally clicked: I took out the foreach, as was suggested many times. It's now 12 ms. Final solution:
public static long Ex010()
{
var sum = 0L;
var sieve = new bool[2000000];
for (int i = 2; i < sieve.Length; i++)
{
if (sieve[i-1])
continue;
sum += i;
for (var x = i * 2; x < sieve.Length; x += i) {
sieve[x-1] = true;
}
}
return sum;
}
You are doing trial division in addition to a sieve.
The boolean array will already tell you if a number is prime, so you don't need the List of primes at all.
You can also speed it up by only sieving up to the square root of the limit.
If you want to save some memory also, you can use a BitArray instead of a boolean array.
public static long Ex010()
{
const int Limit = 2000000;
int sqrt = (int)Math.Sqrt(Limit);
var sum = 0L;
var isComposite = new bool[Limit];
for (int i = 2; i <= sqrt; i++) {
if (isComposite[i - 2])
continue; // this number is not prime, skip
sum += i;
for (var x = i * i; x < isComposite.Length; x += i) {
isComposite[x - 2] = true;
}
}
// Add the remaining prime numbers above the square root
for (int i = sqrt + 1; i < Limit; i++) {
if (!isComposite[i - 2]) {
sum += i;
}
}
return sum;
}
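If memory is a concern, the same sieve fits in a BitArray at one bit per number instead of one byte. A minimal sketch of that variant (my illustration, with made-up names; this is not code from the answer above):

```csharp
using System;
using System.Collections;

class BitSieveDemo
{
    public static long SumPrimesBelow(int limit)
    {
        int sqrt = (int)Math.Sqrt(limit);
        var isComposite = new BitArray(limit); // bit i stands for the number i
        long sum = 0;
        for (int i = 2; i <= sqrt; i++)
        {
            if (isComposite[i])
                continue;
            sum += i;
            for (int x = i * i; x < limit; x += i)
                isComposite[x] = true;
        }
        // numbers above the square root that were never crossed off are prime
        for (int i = sqrt + 1; i < limit; i++)
            if (!isComposite[i])
                sum += i;
        return sum;
    }

    static void Main()
    {
        Console.WriteLine(SumPrimesBelow(10));      // 17
        Console.WriteLine(SumPrimesBelow(2000000)); // 142913828922
    }
}
```

The BitArray trades a few cycles of bit twiddling per access for an 8x smaller footprint, which can start to pay off once the sieve no longer fits in cache.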
(tl;dr: 2 million in 0.8 ms, 2 billion in 1.25 s; segmented odds-only SoE, presieving, wheeled striding)
As always, the limit of Euler task #10 seems designed to pose a mild challenge on a ZX81, Apple ][ or C64 but on modern hardware you generally have to multiply the limits by 1000 to make things even remotely interesting. Or set a time limit like 5 seconds and try to see by how many orders of magnitude the Euler limit can be exceeded...
Dennis_E's solution is simple and efficient, but I'd recommend applying two small improvements that give a marked performance boost without any effort at all.
Represent only odd numbers in the sieve
All even numbers except for the number 2 are composite. If you pull the number 2 out of thin air when needed then you can drop all even numbers from the sieve. This halves workload and memory footprint, for a doubling of performance at the marginal cost of writing << or >> in a few places (to convert between the realm of numbers and the realm of bit indices). This is usually known as an 'odds-only sieve' or 'wheeled representation mod 2'; it has the added advantage that it largely removes the need for guarding against index overflow.
Skip a few extra small primes during decoding
Skipping a few small primes ('applying a wheel') is much easier when going through a range of numbers incrementally, compared to hopping wildly about with different strides as during sieving. This skipping only involves applying a cyclic sequence of differences between consecutive numbers that are not multiples of the primes in question, like 4,2,4,2... for skipping multiples of 2 and 3 (the mod 6 wheel) or 6,4,2,4,2,4,6,2... for skipping multiples of 2, 3 and 5.
The mod 6 wheel sequence alternates between only two numbers, which can easily be achieved by XORing with a suitable value. On top of the odds-only sieve the distances are halved, so that the sequence becomes 2,1,2,1... This skipping reduces the work during decoding by 1/3rd (for stepping mod 3), and the skipped primes can also be ignored during the sieving. The latter can have a marked effect on sieve times, since the smallest primes do the greatest number of crossings-off during the sieving.
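To see the XOR trick in isolation, here is a toy loop of mine (not from the answer) that enumerates the numbers coprime to 2 and 3: in the realm of numbers the gaps 2 and 4 toggle into each other via d ^= 6, just as the halved gaps 1 and 2 toggle via d ^= 3 on the odds-only sieve.

```csharp
using System;
using System.Collections.Generic;

class WheelDemo
{
    // numbers >= 5 that are not multiples of 2 or 3, below the given limit
    public static List<int> Spokes(int limit)
    {
        var result = new List<int>();
        for (int n = 5, d = 2; n < limit; n += d, d ^= 6) // gaps: 2,4,2,4,...
            result.Add(n);
        return result;
    }

    static void Main()
    {
        Console.WriteLine(string.Join(" ", Spokes(50)));
        // 5 7 11 13 17 19 23 25 29 31 35 37 41 43 47 49
    }
}
```

Note that the wheel keeps composites like 25 and 49; it only removes multiples of the wheel primes, the rest is the sieve's job.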
Here's a simple Sieve of Eratosthenes, with both suggestions applied. Note: here and in the following I generally go with the flow of C#/.Net and use signed integers where I would normally use unsigned integers in any sane language. That's because I don't have the time to vet the code for the performance implications (penalties) resulting from the use of unsigned types, like that the compiler suddenly forgets how to replace division by a constant with multiplication of the inverse and so on.
static long sum_small_primes_up_to (int n)
{
if (n < 7)
return (0xAA55200 >> (n << 2)) & 0xF;
int sqrt_n_halved = (int)Math.Sqrt(n) >> 1;
int max_bit = (int)(n - 1) >> 1;
var odd_composite = new bool[max_bit + 1];
for (int i = 5 >> 1; i <= sqrt_n_halved; ++i)
if (!odd_composite[i])
for (int p = (i << 1) + 1, j = p * p >> 1; j <= max_bit; j += p)
odd_composite[j] = true;
long sum = 2 + 3;
for (int i = 5 >> 1, d = 1; i <= max_bit; i += d, d ^= 3)
if (!odd_composite[i])
sum += (i << 1) + 1;
return sum;
}
The first if statement handles the small fry (n in 0..6) by returning a suitable element of a precomputed list of numbers, and it serves to get all the special cases out of the way in one fell swoop. All other occurrences of shift operators are for converting between the realm of numbers and the realm of indices into the odds-only sieve.
This is pretty much the same code that I normally use for sieving small primes, up to 64K or so (the potential least factors for numbers up to 32 bits). It does Euler's piddling 2 million in 4.5 milliseconds but throwing bigger numbers at it shows its Achilles heel: it does a lot of striding over large distances, which interacts badly with modern memory subsystems where decent access speed can only be got from caches. The performance drops markedly when the capacity of the level 1 cache (typically 32 KiByte) is exceeded significantly, and it goes down even further when exceeding the L2 and L3 capacities (typically several megabytes). How sharp the drop is depends on the quality (price tag) of the computer, of course...
Here are some timings taken on my laptop:
# benchmark: small_v0 ...
sum up to 2 * 10^4: 21171191 in 0,03 ms
sum up to 2 * 10^5: 1709600813 in 0,35 ms // 11,0 times
sum up to 2 * 10^6: 142913828922 in 4,11 ms // 11,7 times
sum up to 2 * 10^7: 12272577818052 in 59,36 ms // 14,4 times
sum up to 2 * 10^8: 1075207199997334 in 1.225,19 ms // 20,6 times
sum up to 2 * 10^9: 95673602693282040 in 14.381,29 ms // 11,7 times
In the middlish ranges there are time increases that go well beyond the expected factor of about 11, and then things stabilise again.
And here comes how to speed up the beast an order of magnitude...
Process the range in cache-sized segments
The remedy is easy enough: instead of striding each prime all the way from one end of the range to the other - and hence all across the memory space - we sieve the range in cache-sized strips, memorising the final positions for each prime so that the next round can continue right where the previous round left off. If we don't need a big bad sieve full of bits at the end then we can process a strip (extract its primes) after it has been sieved and then discard its data, reusing the sieve buffer for the next strip. Both are variations on the theme of segmented sieving but the offsets are treated differently during the processing; when the distinction matters then the first approach (big bad sieve for the whole range) is usually called a segmented sieve and the latter an iterated sieve. The terms 'moving' or 'sliding' sieve might fit the latter somewhat but should be avoided because they normally refer to a totally different class of sieves (also known as deque sieves) that are deceptively simple but whose performance is worse by at least an order of magnitude.
Here's an example of an iterated sieve, a slightly modified version of a function that I normally use for sieving primes in given ranges [m, n], just like in SPOJ's PRIMES1 and PRINT. Here the parameter m is implicitly 0, so it doesn't need to be passed.
Normally the function takes an interface that is responsible for processing the raw sieve (and any loose primes it may get passed), and which can be queried for the number of primes that the processor skips (the 'decoder order') so that the sieve can ignore those during the sieving. For this exposition I've replaced the interface with a delegate, for simplicity.
The factor primes get sieved by a stock function that may look somewhat familiar, and I've changed the logic of the sieve from 'is_composite' to 'not_composite' (and to a base type that can participate in arithmetic) for reasons that will be explained later. decoder_order is the number of additional primes skipped by the decoder (which would be 1 for the function shown earlier, because it skips multiples of the prime 3 during the prime extraction/summing, over and above the wheel prime 2).
const int SIEVE_BITS = 1 << 15; // L1 cache size, 1 usable bit per byte
delegate long sieve_sum_func (byte[] sieve, int window_base, int window_bits);
static long sum_primes_up_to (int n, sieve_sum_func sum_func, int decoder_order)
{
if (n < 7)
return 0xF & (0xAA55200 >> (n << 2));
n -= ~n & 1; // make odd (n can't be 0 here)
int sqrt_n = (int)Math.Sqrt(n);
var factor_primes = small_primes_up_to(sqrt_n).ToArray();
int first_sieve_prime_index = 1 + decoder_order; // skip wheel primes + decoder primes
int m = 7; // this would normally be factor_primes[first_sieve_prime_index] + 2
int bits_to_sieve = ((n - m) >> 1) + 1;
int sieve_bits = Math.Min(bits_to_sieve, SIEVE_BITS);
var sieve = new byte[sieve_bits];
var offsets = new int[factor_primes.Length];
int sieve_primes_end = first_sieve_prime_index;
long sum = 2 + 3 + 5; // wheel primes + decoder primes
for (int window_base = m; ; )
{
int window_bits = Math.Min(bits_to_sieve, sieve_bits);
int last_number_in_window = window_base - 1 + (window_bits << 1);
while (sieve_primes_end < factor_primes.Length)
{
int prime = factor_primes[sieve_primes_end];
int start = prime * prime, stride = prime << 1;
if (start > last_number_in_window)
break;
if (start < window_base)
start = (stride - 1) - (window_base - start - 1) % stride;
else
start -= window_base;
offsets[sieve_primes_end++] = start >> 1;
}
fill(sieve, window_bits, (byte)1);
for (int i = first_sieve_prime_index; i < sieve_primes_end; ++i)
{
int prime = factor_primes[i], j = offsets[i];
for ( ; j < window_bits; j += prime)
sieve[j] = 0;
offsets[i] = j - window_bits;
}
sum += sum_func(sieve, window_base, window_bits);
if ((bits_to_sieve -= window_bits) == 0)
break;
window_base += window_bits << 1;
}
return sum;
}
static List<int> small_primes_up_to (int n)
{
int upper_bound_on_pi = 32 + (n < 137 ? 0 : (int)(n / (Math.Log(n) - 1.083513)));
var result = new List<int>(upper_bound_on_pi);
if (n < 2)
return result;
result.Add(2); // needs to be pulled out of thin air because of the mod 2 wheel
if (n < 3)
return result;
result.Add(3); // needs to be pulled out of thin air because of the mod 3 decoder
int sqrt_n_halved = (int)Math.Sqrt(n) >> 1;
int max_bit = (n - 1) >> 1;
var odd_composite = new bool[max_bit + 1];
for (int i = 5 >> 1; i <= sqrt_n_halved; ++i)
if (!odd_composite[i])
for (int p = (i << 1) + 1, j = p * p >> 1; j <= max_bit; j += p)
odd_composite[j] = true;
for (int i = 5 >> 1, d = 1; i <= max_bit; i += d, d ^= 3)
if (!odd_composite[i])
result.Add((i << 1) + 1);
return result;
}
static void fill<T> (T[] array, int count, T value, int threshold = 16)
{
Trace.Assert(count <= array.Length);
int current_size = Math.Min(threshold, count);
for (int i = 0; i < current_size; ++i)
array[i] = value;
for (int half = count >> 1; current_size <= half; current_size <<= 1)
Buffer.BlockCopy(array, 0, array, current_size, current_size);
Buffer.BlockCopy(array, 0, array, current_size, count - current_size);
}
Here's a sieve processor that is equivalent to the logic used in the function shown at the beginning, plus a dummy function that can be used to measure the sieve time sans any decoding, for comparison:
static long prime_sum_null (byte[] sieve, int window_base, int window_bits)
{
return 0;
}
static long prime_sum_v0 (byte[] sieve, int window_base, int window_bits)
{
long sum = 0;
int i = window_base % 3 == 0 ? 1 : 0;
int d = 3 - (window_base + 2 * i) % 3;
for ( ; i < window_bits; i += d, d ^= 3)
if (sieve[i] == 1)
sum += window_base + (i << 1);
return sum;
}
This function needs to perform a bit of modulo magic to synchronise itself with the mod 3 sequence over the mod 2 sieve; the earlier function did not need to do this because its starting point was fixed, not a parameter. Here are the timings:
# benchmark: iter_v0 ...
sum up to 2 * 10^4: 21171191 in 0,04 ms
sum up to 2 * 10^5: 1709600813 in 0,28 ms // 7,0 times
sum up to 2 * 10^6: 142913828922 in 2,42 ms // 8,7 times
sum up to 2 * 10^7: 12272577818052 in 22,11 ms // 9,1 times
sum up to 2 * 10^8: 1075207199997334 in 223,67 ms // 10,1 times
sum up to 2 * 10^9: 95673602693282040 in 2.408,06 ms // 10,8 times
Quite a difference, n'est-ce pas? But we're not done yet.
Replace conditional branching with arithmetic
Modern processors like things to be simple and predictable; if branches are not predicted correctly then the CPU levies a heavy fine in extra cycles for flushing and refilling the instruction pipeline. Unfortunately, the decoding loop isn't very predictable because primes are fairly dense in the low number ranges we're talking about here:
if (!odd_composite[i])
++count;
If the average number of non-primes between primes times the cost of an addition is less than the penalty for a mispredicted branch then the following statement should be faster:
count += sieve[i];
This explains why I inverted the logic of the sieve compared to normal, because with 'is_composite' semantics I'd have to do
count += 1 ^ odd_composite[i];
And the rule is to pull everything out of inner loops that can be pulled out, so that I simply applied 1 ^ x to the whole array before even starting.
However, Euler wants us to sum the primes instead of counting them. This can be done in a similar fashion: turn the value 1 into a mask of all 1 bits, which preserves any value it is ANDed with, while 0 zeroises it. This is similar to the CMOV instruction, except that it works even on the oldest of CPUs and does not require a reasonably decent compiler:
static long prime_sum_v1 (byte[] sieve, int window_base, int window_bits)
{
long sum = 0;
int i = window_base % 3 == 0 ? 1 : 0;
int d = 3 - (window_base + 2 * i) % 3;
for ( ; i < window_bits; i += d, d ^= 3)
sum += (0 - sieve[i]) & (window_base + (i << 1));
return sum;
}
Result:
# benchmark: iter_v1 ...
sum up to 2 * 10^4: 21171191 in 0,10 ms
sum up to 2 * 10^5: 1709600813 in 0,36 ms // 3,6 times
sum up to 2 * 10^6: 142913828922 in 1,88 ms // 5,3 times
sum up to 2 * 10^7: 12272577818052 in 13,80 ms // 7,3 times
sum up to 2 * 10^8: 1075207199997334 in 157,39 ms // 11,4 times
sum up to 2 * 10^9: 95673602693282040 in 1.819,05 ms // 11,6 times
Unrolling and strength reduction
Now, a bit of overkill: a decoder with a fully unrolled wheel mod 15 (the unrolling can unlock some reserves of instruction-level parallelism).
static long prime_sum_v5 (byte[] sieve, int window_base, int window_bits)
{
Trace.Assert(window_base % 2 == 1);
int count = 0, sum = 0;
int residue = window_base % 30;
int phase = UpperIndex[residue];
int i = (SpokeValue[phase] - residue) >> 1;
// get into phase for the unrolled code (which is based on phase 0)
for ( ; phase != 0 && i < window_bits; i += DeltaDiv2[phase], phase = (phase + 1) & 7)
{
int b = sieve[i]; count += b; sum += (0 - b) & i;
}
// process full revolutions of the wheel (anchored at phase 0 == residue 1)
for (int e = window_bits - (29 >> 1); i < e; i += (30 >> 1))
{
int i0 = i + ( 1 >> 1), b0 = sieve[i0]; count += b0; sum += (0 - b0) & i0;
int i1 = i + ( 7 >> 1), b1 = sieve[i1]; count += b1; sum += (0 - b1) & i1;
int i2 = i + (11 >> 1), b2 = sieve[i2]; count += b2; sum += (0 - b2) & i2;
int i3 = i + (13 >> 1), b3 = sieve[i3]; count += b3; sum += (0 - b3) & i3;
int i4 = i + (17 >> 1), b4 = sieve[i4]; count += b4; sum += (0 - b4) & i4;
int i5 = i + (19 >> 1), b5 = sieve[i5]; count += b5; sum += (0 - b5) & i5;
int i6 = i + (23 >> 1), b6 = sieve[i6]; count += b6; sum += (0 - b6) & i6;
int i7 = i + (29 >> 1), b7 = sieve[i7]; count += b7; sum += (0 - b7) & i7;
}
// clean up leftovers
for ( ; i < window_bits; i += DeltaDiv2[phase], phase = (phase + 1) & 7)
{
int b = sieve[i]; count += b; sum += (0 - b) & i;
}
return (long)window_base * count + ((long)sum << 1);
}
As you can see, I performed a bit of strength reduction in order to make things easier for the compiler. Instead of summing window_base + (i << 1), I sum i and 1 separately and perform the rest of the calculation only once, at the end of the function.
Timings:
# benchmark: iter_v5(1) ...
sum up to 2 * 10^4: 21171191 in 0,01 ms
sum up to 2 * 10^5: 1709600813 in 0,11 ms // 9,0 times
sum up to 2 * 10^6: 142913828922 in 1,01 ms // 9,2 times
sum up to 2 * 10^7: 12272577818052 in 11,52 ms // 11,4 times
sum up to 2 * 10^8: 1075207199997334 in 130,43 ms // 11,3 times
sum up to 2 * 10^9: 95673602693282040 in 1.563,10 ms // 12,0 times
# benchmark: iter_v5(2) ...
sum up to 2 * 10^4: 21171191 in 0,01 ms
sum up to 2 * 10^5: 1709600813 in 0,09 ms // 8,7 times
sum up to 2 * 10^6: 142913828922 in 1,03 ms // 11,3 times
sum up to 2 * 10^7: 12272577818052 in 10,34 ms // 10,0 times
sum up to 2 * 10^8: 1075207199997334 in 121,08 ms // 11,7 times
sum up to 2 * 10^9: 95673602693282040 in 1.468,28 ms // 12,1 times
The first set of timings is for decoder_order == 1 (i.e. not telling the sieve about the extra skipped prime), for direct comparison to the other decoder versions. The second set is for decoder_order == 2, which means the sieve could skip the crossings-off for the prime 5 as well. Here are the null timings (essentially the sieve time without the decode time), to put things a bit into perspective:
# benchmark: iter_null(1) ...
sum up to 2 * 10^8: 10 in 94,74 ms // 11,4 times
sum up to 2 * 10^9: 10 in 1.194,18 ms // 12,6 times
# benchmark: iter_null(2) ...
sum up to 2 * 10^8: 10 in 86,05 ms // 11,9 times
sum up to 2 * 10^9: 10 in 1.109,32 ms // 12,9 times
This shows that the work on the decoder has decreased decode time for 2 billion from 1.21 s to 0.35 s, which is nothing to sneeze at. Similar speedups can be realised for the sieving as well, but that is nowhere near as easy as it was for the decoding.
Low-hanging fruit: presieving
Lastly, a technique that can sometimes offer dramatic speedups (especially for packed bitmaps and/or higher-order wheels) is blasting a canned bit pattern over the sieve before commencing a round of sieving, such that the sieve looks as if it had already been sieved by a handful of small primes. This is usually known as presieving. In the current case the speedup is marginal (not even 20%), but I'm showing it because it is a useful technique to have in one's toolchest.
Note: I've ripped the presieving logic from another Euler project, so it doesn't fit organically into the code I wrote for this article. But it should demonstrate the technique well enough.
const byte CROSSED_OFF = 0; // i.e. composite
const byte NOT_CROSSED = 1 ^ CROSSED_OFF; // i.e. not composite
const int SIEVE_BYTES = SIEVE_BITS; // i.e. 1 usable bit per byte
internal readonly static byte[] TinyPrimes = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31 };
internal readonly static int m_wheel_order = 3; // == number of wheel primes
internal static int m_presieve_level = 0; // == number of presieve primes
internal static int m_presieve_modulus = 0;
internal static byte[] m_presieve_pattern;
internal static void set_presieve_level (int presieve_primes)
{
m_presieve_level = Math.Max(0, presieve_primes);
m_presieve_modulus = 1;
for (int i = m_wheel_order; i < m_wheel_order + presieve_primes; ++i)
m_presieve_modulus *= TinyPrimes[i];
// the pattern needs to provide SIEVE_BYTES bytes for every residue of the modulus
m_presieve_pattern = new byte[m_presieve_modulus + SIEVE_BYTES - 1];
var pattern = m_presieve_pattern;
int current_size = 1;
pattern[0] = NOT_CROSSED;
for (int i = m_wheel_order; i < m_wheel_order + presieve_primes; ++i)
{
int current_prime = TinyPrimes[i];
int new_size = current_size * current_prime;
// keep doubling while possible
for ( ; current_size * 2 <= new_size; current_size *= 2)
Buffer.BlockCopy(pattern, 0, pattern, current_size, current_size);
// copy rest, if any
Buffer.BlockCopy(pattern, 0, pattern, current_size, new_size - current_size);
current_size = new_size;
// mark multiples of the current prime
for (int j = current_prime >> 1; j < current_size; j += current_prime)
pattern[j] = CROSSED_OFF;
}
for (current_size = m_presieve_modulus; current_size * 2 <= pattern.Length; current_size *= 2)
Buffer.BlockCopy(pattern, 0, pattern, current_size, current_size);
Buffer.BlockCopy(pattern, 0, pattern, current_size, pattern.Length - current_size);
}
For a quick test you can hack the presieving into the sieve function as follows:
- int first_sieve_prime_index = 1 + decoder_order; // skip wheel primes + decoder primes
+ int first_sieve_prime_index = 1 + decoder_order + m_presieve_level; // skip wheel primes + decoder primes
plus
- long sum = 2 + 3 + 5; // wheel primes + decoder primes
+ long sum = 2 + 3 + 5; // wheel primes + decoder primes
+
+ for (int i = 0; i < m_presieve_level; ++i)
+ sum += TinyPrimes[m_wheel_order + i];
plus
- fill(sieve, window_bits, (byte)1);
+ if (m_presieve_level == 0)
+ fill(sieve, window_bits, (byte)1);
+ else
+ Buffer.BlockCopy(m_presieve_pattern, (window_base >> 1) % m_presieve_modulus, sieve, 0, window_bits);
and
set_presieve_level(4) // 4 and 5 work well
in the static constructor or Main().
This way you can use m_presieve_level for turning presieving on and off. The BlockCopy also works correctly after calling set_presieve_level(0), though, because then the modulus is 1. m_wheel_order should reflect the actual wheel order (= 1) plus the decoder order; it's currently set to 3, so it'll work only with the v5 decoder at level 2.
Timings:
# benchmark: iter_v5(2) pre(7) ...
sum up to 2 * 10^4: 21171191 in 0,02 ms
sum up to 2 * 10^5: 1709600813 in 0,08 ms // 4,0 times
sum up to 2 * 10^6: 142913828922 in 0,78 ms // 9,6 times
sum up to 2 * 10^7: 12272577818052 in 8,78 ms // 11,2 times
sum up to 2 * 10^8: 1075207199997334 in 98,89 ms // 11,3 times
sum up to 2 * 10^9: 95673602693282040 in 1.245,19 ms // 12,6 times
sum up to 2^31 - 1: 109930816131860852 in 1.351,97 ms
Related
Get sum of each digit below n as long
This is the code I have, but it's too slow; is there any way to do it faster? The number range I'm hitting is 123456789, but I can't get it below 15 seconds and I need it below 5 seconds.

long num = 0;
for (long i = 0; i <= n; i++)
{
num = num + GetSumOfDigits(i);
}

static long GetSumOfDigits(long n)
{
long num2 = 0;
long num3 = n;
long r = 0;
while (num3 != 0)
{
r = num3 % 10;
num3 = num3 / 10;
num2 = num2 + r;
}
return num2;
}

sum = (n(n+1))/2 is not giving me the results I need; it's not calculating properly. For N = 12 the sum is 1+2+3+4+5+6+7+8+9+(1+0)+(1+1)+(1+2) = 51. I need to do this with a formula instead of a loop. I've got about 15 tests to run, each under 6 seconds. With parallel I got one test from 15 seconds down to 4-8 seconds. Just to show you, this is the hard test:

[Test]
public void When123456789_Then4366712385()
{
Assert.AreEqual(4366712385, TwistedSum.Solution(123456789));
}

On my computer I can run all the tests under 5 seconds. With DineMartine's answer I got these results (shown as a screenshot in the original question).
Your algorithm's complexity is N log(N). I have found a better algorithm with complexity log(N). The idea is to iterate on the number of digits, which is log10(n) = ln(n)/ln(10) = O(log(n)). The demonstration of this algorithm involves a lot of combinatorial calculus, so I chose not to write it here. Here is the code:

public static long Resolve(long input)
{
var n = (long)Math.Log10(input);
var tenPow = (long)Math.Pow(10, n);
var rest = input;
var result = 0L;
for (; n > 0; n--)
{
var dn = rest / tenPow;
rest = rest - dn * tenPow;
tenPow = tenPow / 10;
result += dn * (rest + 1) + dn * 45 * n * tenPow + dn * (dn - 1) * tenPow * 5;
}
result += rest * (rest + 1) / 2;
return result;
}

Now you can solve the problem in a fraction of a second. The idea is to write the input as a list of digits; assuming the solution is given by a function f, we look for g, a recursive expression of f over n, and for a helper h from which g follows. (The derivation was given as equation images in the original answer; if you find h, the problem is practically solved.)
A little bit convoluted, but it gets the time down to practically zero:

private static long getSumOfSumOfDigitsBelow(long num)
{
if (num == 0)
return 0;
// 1 -> 1; 12 -> 10; 123 -> 100; 321 -> 100, ...
int pow10 = (int)Math.Pow(10, Math.Floor(Math.Log10(num)));
long firstDigit = num / pow10;
long sum = 0;
var sum999 = getSumOfSumOfDigitsBelow(pow10 - 1);
var sumRest = getSumOfSumOfDigitsBelow(num % pow10);
sum += (firstDigit - 1) * (firstDigit - 0) / 2 * pow10 + firstDigit * sum999;
sum += firstDigit * (num % pow10 + 1) + sumRest;
return sum;
}

getSumOfSumOfDigitsBelow(123456789) -> 4366712385 (80 us)
getSumOfSumOfDigitsBelow(9223372036854775807) -> 6885105964130132360 (500 us - unverified)

The trick is to avoid computing the same answer again and again. E.g. for 33:

your approach:
sum = 1+2+3+4+5+6+7+8+9+(1+0)+(1+1)+(1+2)+ ... +(3+2)+(3+3)

my approach:
sum = (10*0 + (1+2+3+4+5+6+7+8+9))
    + (10*1 + (1+2+3+4+5+6+7+8+9))
    + (10*2 + (1+2+3+4+5+6+7+8+9))
    + (4*3 + (1+2+3))

The (1+2+3+4+5+6+7+8+9) part has to be calculated only once, and the loop over 0..firstDigit-1 can be avoided by the n(n-1)/2 trick. I hope this makes sense. The complexity is O(2^N), with N counting the number of digits. This looks very bad but is fast enough for your threshold of 5 s, even for long-max. It may be possible to transform this algorithm into something running in O(N) by calling getSumOfSumOfDigitsBelow() only once, but it would look much more complex. First step of optimization: look at your algorithm ;)

Coming back to this problem after the answer of DineMartine: to further optimize the algorithm, the sum999 part can be replaced by an explicit formula.
Let's take some number 9999...9 = 10^k - 1 into the code and replace accordingly:

sum(10^k-1) = (9 - 1)*(9 - 0)/2*pow10 + 9*sum999 + 9*(num%pow10 + 1) + sumRest
sum(10^k-1) = 36*pow10 + 9*sum999 + 9*(num%pow10 + 1) + sumRest

sum999 and sumRest are the same for numbers of this type:

sum(10^k-1) = 36*pow10 + 10*sum(10^(k-1)-1) + 9*(num%pow10 + 1)
sum(10^k-1) = 36*pow10 + 10*sum(10^(k-1)-1) + 9*((10^k-1)%pow10 + 1)
sum(10^k-1) = 36*pow10 + 10*sum(10^(k-1)-1) + 9*pow10
sum(10^k-1) = 45*pow10 + 10*sum(10^(k-1)-1)

With this definition of sum(10^k-1) and knowing sum(9) = 45, we get the closed form:

sum(10^k-1) = 45 * k * 10^(k-1)

The updated code:

private static long getSumOfSumOfDigitsBelow(long num)
{
if (num == 0)
return 0;
long N = (long)Math.Floor(Math.Log10(num));
int pow10 = (int)Math.Pow(10, N);
long firstDigit = num / pow10;
long sum = (firstDigit - 1) * firstDigit / 2 * pow10
+ firstDigit * 45 * N * pow10 / 10
+ firstDigit * (num % pow10 + 1)
+ getSumOfSumOfDigitsBelow(num % pow10);
return sum;
}

This is the same algorithm as the one from DineMartine, but expressed in a recursive fashion (I've compared both implementations, and yes, I'm sure it is ;) ). The runtime goes down to practically zero, and the time complexity is O(N) counting the number of digits, or O(log(N)) taking the value.
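The 10^k - 1 closed form is easy to sanity-check numerically; a quick brute-force comparison of mine (not part of the original answer):

```csharp
using System;

class ClosedFormCheck
{
    // digit sum of a single number
    public static long DigitSum(long n)
    {
        long s = 0;
        for (; n != 0; n /= 10)
            s += n % 10;
        return s;
    }

    // brute-force total of the digit sums of 0 .. 10^k - 1
    public static long BruteTotal(int k)
    {
        long pow10 = (long)Math.Pow(10, k), total = 0;
        for (long m = 0; m < pow10; m++)
            total += DigitSum(m);
        return total;
    }

    static void Main()
    {
        for (int k = 1; k <= 4; k++)
        {
            long formula = 45L * k * (long)Math.Pow(10, k - 1); // 45 * k * 10^(k-1)
            Console.WriteLine($"k={k}: brute {BruteTotal(k)} vs formula {formula}");
        }
    }
}
```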
If you have multiple processors (or cores) in your system, you can speed it up quite a lot by doing the calculations in parallel. The following code demonstrates (it's a compilable console app). The output when I try it on my system (4 cores with hyperthreading) is as follows for a release build:

x86 version:
Serial took: 00:00:14.6890714
Parallel took: 00:00:03.5324480
Linq took: 00:00:04.4480217
Fast Parallel took: 00:00:01.6371894

x64 version:
Serial took: 00:00:05.1424354
Parallel took: 00:00:00.9860272
Linq took: 00:00:02.6912356
Fast Parallel took: 00:00:00.4154711

Note that the parallel version is around 4 times faster, and that the x64 version is MUCH faster (due to the use of long in the calculations). The code uses Parallel.ForEach along with a Partitioner to split the range of numbers into sensible regions for the number of processors available. It also uses Interlocked.Add() to add the per-range subtotals with efficient locking.

I've also added another method where you pre-calculate the sums for the numbers between 0 and 1000; you should only need to pre-calculate them once per run of the program. See FastGetSumOfDigits(). Using FastGetSumOfDigits() more than doubles the previous fastest time on my PC. You can increase the value of SUMS_SIZE to a larger multiple of 10 to increase the speed still further, at the expense of space; increasing it to 10000 on my PC decreased the time to ~0.3 s. (The sums array only needs to be a short array, to save space; it doesn't need a larger type.)
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
using System.Linq;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

namespace Demo
{
internal class Program
{
public static void Main()
{
long n = 123456789;
Stopwatch sw = Stopwatch.StartNew();
long num = 0;
for (long i = 0; i <= n; i++)
num = num + GetSumOfDigits(i);
Console.WriteLine("Serial took: " + sw.Elapsed);
Console.WriteLine(num);

sw.Restart();
num = 0;
var rangePartitioner = Partitioner.Create(0, n + 1);
Parallel.ForEach(rangePartitioner, (range, loopState) =>
{
long subtotal = 0;
for (long i = range.Item1; i < range.Item2; i++)
subtotal += GetSumOfDigits(i);
Interlocked.Add(ref num, subtotal);
});
Console.WriteLine("Parallel took: " + sw.Elapsed);
Console.WriteLine(num);

sw.Restart();
num = Enumerable.Range(1, 123456789).AsParallel().Select(i => GetSumOfDigits(i)).Sum();
Console.WriteLine("Linq took: " + sw.Elapsed);
Console.WriteLine(num);

sw.Restart();
initSums();
num = 0;
Parallel.ForEach(rangePartitioner, (range, loopState) =>
{
long subtotal = 0;
for (long i = range.Item1; i < range.Item2; i++)
subtotal += FastGetSumOfDigits(i);
Interlocked.Add(ref num, subtotal);
});
Console.WriteLine("Fast Parallel took: " + sw.Elapsed);
Console.WriteLine(num);
}

private static void initSums()
{
for (int i = 0; i < SUMS_SIZE; ++i)
sums[i] = (short)GetSumOfDigits(i);
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static long GetSumOfDigits(long n)
{
long sum = 0;
while (n != 0)
{
sum += n % 10;
n /= 10;
}
return sum;
}

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private static long FastGetSumOfDigits(long n)
{
long sum = 0;
while (n != 0)
{
sum += sums[n % SUMS_SIZE];
n /= SUMS_SIZE;
}
return sum;
}

static short[] sums = new short[SUMS_SIZE];
private const int SUMS_SIZE = 1000;
}
}
To increase performance you could calculate the sum starting from the highest number. Let r = n%10 + 1 and calculate the sum for the last r numbers directly. Then note that if n ends with a 9, the total sum can be calculated as 10 * sum(n/10) + (n+1)/10 * 45: the first term is the sum of all digits but the last, and the second term is the sum of the last digits. The function to calculate the total sum becomes:

static long GetSumDigitFrom1toN(long n)
{
long num2 = 0;
long i;
long r = n % 10 + 1;
if (n <= 0)
{
return 0;
}
for (i = 0; i < r; i++)
{
num2 += GetSumOfDigits(n - i);
}
// The magic number 45 is the sum of 1 to 9.
return num2 + 10 * GetSumDigitFrom1toN(n/10 - 1) + (n/10) * 45;
}

Test run:
GetSumDigitFrom1toN(12L): 51
GetSumDigitFrom1toN(123456789L): 4366712385

The time complexity is O(log n).
Sum of digits for 0..99999999 is 10000000 * 8 * (0 + 1 + 2 + ... + 9). Then calculating the rest (100000000..123456789) using the loop might be fast enough.

For N = 12: sum of digits for 0..9 is 1 * 1 * 45; then use your loop for 10, 11, 12.

For N = 123: sum of digits for 0..99 is 10 * 2 * 45; then use your loop for 100..123.

Do you see the pattern?
A different approach you can try: convert the number to a string, then to a char array, and sum the ASCII codes for all the chars, minus the code for '0'. Example code:

long num = 123456789;
var numChars = num.ToString().ToCharArray();
var zeroCode = Convert.ToByte('0');
var sum = numChars.Sum(ch => Convert.ToByte(ch) - zeroCode);
How to use the bitmask operator
I'm learning bit masks. I found an example but couldn't make it work. I'm trying to calculate all sum combinations from one array. The result should be 0 - 1 - 2 - 3 - 3 - 4 - 5 - 6. My problem is that (i & mask) should only result in {0,1} and isn't. Instead it is producing 0 - 1 - 4 - 5 - 12 - 13 - 16 - 17.

int[] elem = new int[] { 1, 2, 3 };
double maxElem = Math.Pow(2, elem.Length);

for (int i = 0; i < maxElem; i++)
{
    int mask = 1, sum = 0;
    for (int run = 0; run < elem.Length; run++)
    {
        sum += elem[run] * (i & mask);
        mask <<= 1;
    }
    Debug.Write(sum + " - ");
}
"(i & mask) should only result in {0,1} and isn't"

(i & mask) returns a result in {0,1} only when mask is 1, that is, on the initial iteration. As soon as mask is shifted by the mask <<= 1 operation, the result of the next AND will be in {0,2}. As the mask keeps shifting, the possible results become {0,4}, {0,8}, {0,16} and so on, because the single bit set to 1 in the mask moves to the left.

The reason the << operator doubles a number is the same as the reason writing a zero after a decimal number multiplies it by ten: appending a zero to a number in any base is the same as multiplying that number by the value of the base.
Ok, I solved it by adding an IF:

int[] elem = new int[] { 1, 2, 3 };
double maxElem = Math.Pow(2, elem.Length);

for (int i = 0; i < maxElem; i++)
{
    int mask = 1, sum = 0;
    for (int run = 0; run < elem.Length; run++)
    {
        if ((i & mask) > 0) // ADD THIS LINE
        {
            sum += elem[run];
        }
        mask <<= 1;
    }
    Debug.Write(sum + " - ");
}
How to make this small C# program which calculates prime numbers using the Sieve of Eratosthenes as efficient and economical as possible?
I've made a small C# program which calculates prime numbers using the Sieve of Eratosthenes.

long n = 100000;

bool[] p = new bool[n+1];

for (long i = 2; i <= n; i++)
{
    p[i] = true;
}

for (long i = 2; i <= X; i++)
{
    for (long j = Y; j <= Z; j++)
    {
        p[i*j] = false;
    }
}

for (long i = 0; i <= n; i++)
{
    if (p[i])
    {
        Console.Write(" " + i);
    }
}

Console.ReadKey(true);

My question is: which X, Y and Z should I choose to make my program as efficient and economical as possible?

Of course we can just take:

X = n
Y = 2
Z = n

But then the program won't be very efficient. It seems we can take:

X = Math.Sqrt(n)
Y = i
Z = n/i

And apparently the first 100 primes that the program gives are all correct.
There are several optimisations that can be applied without making the program overly complicated.

- You can start the crossing out at j = i (effectively i * i instead of 2 * i), since all lower multiples of i have already been crossed out.
- You can save some work by leaving all even numbers out of the array (remembering to produce the prime 2 out of thin air when needed); array cell k then represents the odd integer 2 * k + 1.
- You can make things faster by turning repeated multiplication (i * j) into iterated addition (k += i); instead of looping over j in the inner loop you loop (k = i * i; k <= N; k += i).
- In some cases it can be advantageous to initialise the array with 0 (false) and set cells to 1 (true) for composites; its meaning is thus 'is_composite' instead of 'is_prime'.

Harvesting all the low-hanging fruit, the loops thus become (in C++, but C# should be sort of similar):

uint32_t max_factor_bit = uint32_t(sqrt(double(n))) >> 1;
uint32_t max_bit = n >> 1;

for (uint32_t i = 3 >> 1; i <= max_factor_bit; ++i)
{
    if (composite[i]) continue;

    uint32_t n = (i << 1) + 1;
    uint32_t k = (n * n) >> 1;

    for ( ; k <= max_bit; k += n)
    {
        composite[k] = true;
    }
}

Regarding the computation of max_factor there are some caveats where the compiler can bite you, for larger values of n; there's a topic for that on Code Review.

A further, easy optimisation is to represent the bitmap as an array of bytes, with each byte standing for eight odd integers. For setting bit k in byte array a you would do a[k / CHAR_BIT] |= (1 << (k % CHAR_BIT)), where CHAR_BIT is the number of bits in a byte. However, such bit trickery is normally wrapped into an inline function to keep the code clean. E.g.
in C++ I tell the compiler how to generate such functions using a template like this:

template<typename word_t>
inline void set_bit (word_t *p, uint32_t index)
{
    enum { BITS_PER_WORD = sizeof(word_t) * CHAR_BIT };

    // we can trust the compiler to use masking and shifting instead of division;
    // we cannot do that ourselves without having the log2, which cannot easily
    // be computed as a constexpr
    p[index / BITS_PER_WORD] |= word_t(1) << (index % BITS_PER_WORD);
}

This allows me to say set_bit(a, k) for any type of array - byte, integer, whatever - without having to write special code or use invocations; it's basically a type-safe equivalent to the old C-style macros. I'm not certain whether something similar is possible in C#. There is, however, the C# type BitArray, where all that stuff is already done for you under the hood.

On pastebin there's a small demo .cpp for the segmented Sieve of Eratosthenes, where two further optimisations are applied: presieving by small integers, and sieving in small, cache-friendly blocks, so that the full range of 32-bit integers can be sieved in 2 seconds flat. This could give you some inspiration...

When doing the Sieve of Eratosthenes, memory savings easily translate to speed gains, because the algorithm is memory-intensive and it tends to stride all over the memory instead of accessing it locally. That's why space savings due to compact representation (only odd integers, packed bits - i.e. BitArray) and localisation of access (by sieving in small blocks instead of the whole array in one go) can speed up the code by one or more orders of magnitude, without making the code significantly more complicated.

It is possible to go far beyond the easy optimisations mentioned here, but that tends to make the code increasingly complicated. One word that often occurs in this context is the 'wheel', which can save a further 50% of memory space.
The wiki has an explanation of wheels here, and in a sense the odds-only sieve is already using a 'modulo 2 wheel'. Conversely, a wheel is the extension of the odds-only idea to dropping further small primes from the array, like 3 and 5 in the famous 'mod 30' wheel with modulus 2 * 3 * 5. That wheel effectively stuffs 30 integers into one 8-bit byte.

Here's a runnable rendition of the above code in C#:

static uint max_factor32 (double n)
{
    double r = System.Math.Sqrt(n);

    if (r < uint.MaxValue)
    {
        uint r32 = (uint)r;
        return r32 - ((ulong)r32 * r32 > n ? 1u : 0u);
    }
    return uint.MaxValue;
}

static void sieve32 (System.Collections.BitArray odd_composites)
{
    uint max_bit = (uint)odd_composites.Length - 1;
    uint max_factor_bit = max_factor32((max_bit << 1) + 1) >> 1;

    for (uint i = 3 >> 1; i <= max_factor_bit; ++i)
    {
        if (odd_composites[(int)i]) continue;

        uint p = (i << 1) + 1;  // the prime represented by bit i
        uint k = (p * p) >> 1;  // starting point for striding through the array

        for ( ; k <= max_bit; k += p)
        {
            odd_composites[(int)k] = true;
        }
    }
}

static int Main (string[] args)
{
    int n = 100000000;

    System.Console.WriteLine("Hello, Eratosthenes! Sieving up to {0}...", n);

    System.Collections.BitArray odd_composites = new System.Collections.BitArray(n >> 1);

    sieve32(odd_composites);

    uint cnt = 1;
    ulong sum = 2;

    for (int i = 1; i < odd_composites.Length; ++i)
    {
        if (odd_composites[i]) continue;

        uint prime = ((uint)i << 1) + 1;

        cnt += 1;
        sum += prime;
    }

    System.Console.WriteLine("\n{0} primes, sum {1}", cnt, sum);

    return 0;
}

This does 10^8 in about a second, but for higher values of n it gets slow. If you want to go faster then you have to employ sieving in small, cache-sized blocks.
Explain the Peak and Flag Algorithm
EDIT: It was just pointed out that the requirements state peaks cannot be the ends of the array.

So I ran across this site http://codility.com/ which gives you programming problems and gives you certificates if you can solve them in 2 hours. The very first question is one I have seen before, typically called the Peaks and Flags question. If you are not familiar:

A non-empty zero-indexed array A consisting of N integers is given. A peak is an array element which is larger than its neighbours. More precisely, it is an index P such that 0 < P < N − 1 and A[P − 1] < A[P] > A[P + 1].

For example, the following array A:

A[0] = 1
A[1] = 5
A[2] = 3
A[3] = 4
A[4] = 3
A[5] = 4
A[6] = 1
A[7] = 2
A[8] = 3
A[9] = 4
A[10] = 6
A[11] = 2

has exactly four peaks: elements 1, 3, 5 and 10.

You are going on a trip to a range of mountains whose relative heights are represented by array A. You have to choose how many flags you should take with you. The goal is to set the maximum number of flags on the peaks, according to certain rules. Flags can only be set on peaks. What's more, if you take K flags, then the distance between any two flags should be greater than or equal to K. The distance between indices P and Q is the absolute value |P − Q|.

For example, given the mountain range represented by array A, above, with N = 12, if you take:

- two flags, you can set them on peaks 1 and 5;
- three flags, you can set them on peaks 1, 5 and 10;
- four flags, you can set only three flags, on peaks 1, 5 and 10.

You can therefore set a maximum of three flags in this case.

Write a function that, given a non-empty zero-indexed array A of N integers, returns the maximum number of flags that can be set on the peaks of the array. For example, given the array above the function should return 3, as explained above.

Assume that:

- N is an integer within the range [1..100,000];
- each element of array A is an integer within the range [0..1,000,000,000].
Complexity:

- expected worst-case time complexity is O(N);
- expected worst-case space complexity is O(N), beyond input storage (not counting the storage required for input arguments).

Elements of input arrays can be modified.

So this makes sense, but I failed it using this code:

public int GetFlags(int[] A)
{
    List<int> peakList = new List<int>();

    for (int i = 0; i <= A.Length - 1; i++)
    {
        if ((A[i] > A[i + 1] && A[i] > A[i - 1]))
        {
            peakList.Add(i);
        }
    }

    List<int> flagList = new List<int>();
    int distance = peakList.Count;
    flagList.Add(peakList[0]);

    for (int i = 1, j = 0, max = peakList.Count; i < max; i++)
    {
        if (Math.Abs(Convert.ToDecimal(peakList[j]) - Convert.ToDecimal(peakList[i])) >= distance)
        {
            flagList.Add(peakList[i]);
            j = i;
        }
    }

    return flagList.Count;
}

EDIT:

int[] A = new int[] { 7, 10, 4, 5, 7, 4, 6, 1, 4, 3, 3, 7 };

The correct answer is 3, but my application says 2. This I do not get, since there are 4 peaks (indices 1, 4, 6, 8) and from that, you should be able to place a flag at 2 of the peaks (1 and 6). Am I missing something here? Obviously my assumption is that the beginning or end of an array can be a peak; is this not the case?

If this needs to go in Stack Exchange Programmers, I will move it, but thought dialog here would be helpful.
"Obviously my assumption is that the beginning or end of an Array can be a peak, is this not the case?"

Your assumption is wrong, since a peak is defined as an index P with 0 < P < N − 1.

When it comes to your second example, you can set 3 flags: on peaks 1, 4 and 8.
Here is a hint: If it is possible to set m flags, then there must be at least m * (m - 1) + 1 array elements. Given that N < 100,000, turning the above around should give you confidence that the problem can be efficiently brute-forced.
"Here is a hint: If it is possible to set m flags, then there must be at least m * (m - 1) + 1 array elements. Given that N < 100,000, turning the above around should give you confidence that the problem can be efficiently brute-forced."

No, that is wrong. Codility puts custom solutions through a series of tests, and brute forcing can easily fail on time.
I give here my solution to the task, which makes a 100% score (correctness and performance) in codility, implemented in C++.

To understand the solution you must realize that for a given span of indexes (for example, when the first peak is at index 2 and the last peak at index 58, the distance is 56) containing n peaks, there is an upper limit on the maximal number of peaks that can hold flags according to the condition described in the task.

#include <vector>
#include <math.h>

typedef unsigned int uint;

void flagPeaks(const std::vector<uint> & peaks,
               std::vector<uint> & flaggedPeaks,
               const uint & minDist)
{
    flaggedPeaks.clear();

    uint dist = peaks[peaks.size() - 1] - peaks[0];
    if (minDist > dist / 2)
        return;

    flaggedPeaks.push_back(peaks[0]);
    for (uint i = 0; i < peaks.size(); )
    {
        uint j = i + 1;
        while (j < peaks.size() && ((peaks[j] - peaks[i]) < minDist))
            ++j;

        if (j < peaks.size() && ((peaks[j] - peaks[i]) >= minDist))
            flaggedPeaks.push_back(peaks[j]);

        i = j;
    }
}

int solution(std::vector<int> & A)
{
    std::vector<uint> peaks;
    uint min = A.size();

    for (uint i = 1; i < A.size() - 1; i++)
    {
        if ((A[i] > A[i - 1]) && (A[i] > A[i + 1]))
        {
            peaks.push_back(i);
            if (peaks.size() > 1)
            {
                if (peaks[peaks.size() - 1] - peaks[peaks.size() - 2] < min)
                    min = peaks[peaks.size() - 1] - peaks[peaks.size() - 2];
            }
        }
    }

    // minimal distance between 2 peaks is 2,
    // so when we have less than 3 peaks we are done
    if (peaks.size() < 3 || min >= peaks.size())
        return peaks.size();

    const uint distance = peaks[peaks.size() - 1] - peaks[0];

    // parts are the number of pieces between peaks;
    // given n + 1 peaks we always have n parts
    uint parts = peaks.size() - 1;

    // calculate the maximal possible number of parts
    // for the given distance and number of peaks
    double avgOptimal = static_cast<double>(distance) / static_cast<double>(parts);
    while (parts > 1 && avgOptimal < static_cast<double>(parts + 1))
    {
        parts--;
        avgOptimal = static_cast<double>(distance) / static_cast<double>(parts);
    }

    std::vector<uint> flaggedPeaks;

    // check how many peaks we can flag for the
    // minimal possible distance between two flags
    flagPeaks(peaks, flaggedPeaks, parts + 1);
    uint flags = flaggedPeaks.size();
    if (flags >= parts + 1)
        return parts + 1;

    // reduce the minimal distance between flags
    // until the condition is fulfilled
    while ((parts > 0) && (flags < parts + 1))
    {
        --parts;
        flagPeaks(peaks, flaggedPeaks, parts + 1);
        flags = flaggedPeaks.size();
    }

    // return the maximal possible number of flags
    return parts + 1;
}
Project Euler Question 3 Help
I'm trying to work through Project Euler and I'm hitting a barrier on problem 3. I have an algorithm that works for smaller numbers, but problem 3 uses a very, very large number.

Problem 3: The prime factors of 13195 are 5, 7, 13 and 29. What is the largest prime factor of the number 600851475143?

Here is my solution in C#, and it's been running for, I think, close to an hour. I'm not looking for an answer, because I do actually want to solve this myself; mainly just looking for some help.

static void Main(string[] args)
{
    const long n = 600851475143;
    //const long n = 13195;

    long count, half, largestPrime = 0;
    bool IsAPrime;

    half = n / 2;

    for (long i = half; i > 1 && largestPrime == 0; i--)
    {
        if (n % i == 0)
        {
            // these are factors of n
            count = 1;
            IsAPrime = true;

            while (++count < i && IsAPrime)
            {
                if (i % count == 0)
                {
                    // does a factor of n have a factor? (not prime)
                    IsAPrime = false;
                }
            }

            if (IsAPrime)
            {
                largestPrime = i;
            }
        }
    }

    Console.WriteLine("The largest prime factor is " + largestPrime.ToString() + ".");
    Console.ReadLine();
}
For starters, instead of beginning your search at n / 2, start it at the square root of n. You'll get half of the factors, the other half being their complements.

e.g.:

n = 27
start at floor(sqrt(27)) = 5
is 5 a factor? no
is 4 a factor? no
is 3 a factor? yes. 27 / 3 = 9. 9 is also a factor.
is 2 a factor? no.
factors are 3 and 9.
Actually, for this case you don't need to check for primality, just remove the factors you find. Start with n == 2 and scan upwards. When evil-big-number % n == 0, divide evil-big-number by n and continue with smaller-evil-number. Stop when n >= sqrt(big-evil-number). Should not take more than a few seconds on any modern machine.
Although the question asks for the largest prime factor, it doesn't necessarily mean you have to find that one first...
long n = 600851475143L; // not even, so 2 won't be a factor
int factor = 3;

while (n > 1)
{
    if (n % factor == 0)
    {
        n /= factor;
    }
    else
        factor += 2; // skip even numbers
}

Console.WriteLine(factor);

This should be quick enough... Notice, there's no need to check for primality...
You need to reduce the amount of checking you are doing ... think about which numbers you actually need to test. For a better approach read up on the Sieve of Eratosthenes ... it should get you pointed in the right direction.
As for the reason for accepting nicf's answer: it is OK for the problem at Euler, but does not make this an efficient solution in the general case.

Why would you try even numbers for factors? If n is even, shift right (divide by 2) until it is not any more. If it is one then, 2 is the largest prime factor. If n is not even, you do not have to test even numbers.

It is true that you can stop at sqrt(n). You only have to test primes for factors. It might be faster to test whether k divides n and then test it for primality, though. You can optimize the upper limit on the fly when you find a factor.

This would lead to some code like this:

n = abs(number);
result = 1;
if (n mod 2 = 0) {
    result = 2;
    while (n mod 2 = 0) n /= 2;
}
for (i = 3; i < sqrt(n); i += 2) {
    if (n mod i = 0) {
        result = i;
        while (n mod i = 0) n /= i;
    }
}
return max(n, result)

There are some modulo tests that are superfluous, as n can never be divided by 6 if all factors 2 and 3 have been removed. You could allow only primes for i.

Just as an example let's look at the result for 21: 21 is not even, so we go into the for loop with upper limit sqrt(21) (~4.6). We can then divide 21 by 3, therefore result = 3 and n = 21/3 = 7. We now only have to test up to sqrt(7), which is smaller than 3, so we are done with the for loop. We return the max of n and result, which is n = 7.
The way I did it was to search for primes (p), starting at 2, using the Sieve of Eratosthenes. This algorithm can find all the primes under 10 million in less than 2 s on a decently fast machine.

For every prime you find, test-divide it into the number you are testing against until you can't do integer division anymore (i.e. check n % p == 0 and, if true, divide). Once n = 1, you're done: the last value of n that successfully divided is your answer. On a side note, you've also found all the prime factors of n on the way.

PS: As has been noted before, you only need to search for primes 2 <= p <= sqrt(n). This makes the Sieve of Eratosthenes a very fast and easy-to-implement algorithm for our purposes.
Once you find the answer, enter the following in your browser ;)

http://www.wolframalpha.com/input/?i=FactorInteger(600851475143)

Wolfram Alpha is your friend
Using a recursive algorithm in Java, this runs in less than a second ... think your algorithm through a bit, as it includes some brute-forcing that can be eliminated. Also look at how your solution space can be reduced by intermediate calculations.
Easy peasy in C++:

#include <iostream>
using namespace std;

int main()
{
    unsigned long long int largefactor = 600851475143;

    for (int i = 2;;)
    {
        if (largefactor <= i)
            break;

        if (largefactor % i == 0)
        {
            largefactor = largefactor / i;
        }
        else
            i++;
    }

    cout << largefactor << endl;
    cin.get();
    return 0;
}
This C++ solution took 3.7 ms on my Intel Quad Core i5 iMac (3.1 GHz):

#include <iostream>
#include <cmath>
#include <ctime>

using std::sqrt;
using std::cin;
using std::cout;
using std::endl;

long lpf(long n)
{
    long start = (sqrt(n) + 2 % 2);
    if (start % 2 == 0) start++;

    for (long i = start; i != 2; i -= 2)
    {
        if (n % i == 0) // then i is a factor of n
        {
            long j = 2L;
            do {
                ++j;
            } while (i % j != 0 && j <= i);

            if (j == i) // then i is a prime number
                return i;
        }
    }
}

int main()
{
    long n, ans;
    cout << "Please enter your number: ";
    cin >> n; // 600851475143L

    time_t start, end;
    time(&start);

    int i;
    for (i = 0; i != 3000; ++i)
        ans = lpf(n);

    time(&end);

    cout << "The largest prime factor of your number is: " << ans << endl;
    cout << "Running time: " << 1000 * difftime(end, start) / i << " ms." << endl;

    return 0;
}
All Project Euler problems should take less than a minute; even an unoptimized recursive implementation in Python takes less than a second [0.09 s (CPU 4.3 GHz)].

from math import sqrt

def largest_primefactor(number):
    for divisor in range(2, int(sqrt(number) + 1.5)): # divisor <= sqrt(n)
        q, r = divmod(number, divisor)
        if r == 0:
            #assert(isprime(divisor))
            # recursion depth == number of prime factors,
            # e.g. 4 has two prime factors: {2,2}
            return largest_primefactor(q)

    return number # number is a prime itself
you might want to see this: Is there a simple algorithm that can determine if X is prime, and not confuse a mere mortal programmer?

and I like lill mud's solution:

require "mathn.rb"
puts 600851475143.prime_division.last.first

I checked it here
Try using the Miller-Rabin Primality Test to test for a number being prime. That should speed things up considerably.
Another approach is to get all primes up to n/2 first and then to check if the modulus is 0. An algorithm I use to get all the primes up to n can be found here.
Maybe it is considered cheating, but one possibility in Haskell is to write (for the record, I wrote the lines myself and haven't checked eulerproject threads):

import Data.Numbers.Primes
last (primeFactors 600851475143)