I had a go at a Project Euler coding challenge below. The answer given by the code is correct, but I do not understand why it's taking nearly a minute to run; it was finishing with similar times prior to using a sieve. Other users are reporting times as low as milliseconds.
I assume I am making a basic error somewhere...
// The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17.
// Find the sum of all the primes below two million.
public static long Ex010()
{
var sum = 0L;
var sieve = new bool[2000000];
var primes = new List<int>(10000);
for (int i = 2; i < sieve.Length; i++)
{
if (sieve[i-1])
continue;
var isPrime = true;
foreach (var prime in primes)
{
if (i % prime == 0) {
isPrime = false;
break;
}
}
if (isPrime) {
primes.Add(i);
sum += i;
for (var x = i * 2; x < sieve.Length; x += i) {
sieve[x-1] = true;
}
}
}
return sum;
}
EDIT:
The only thing that seemed missing was this optimization:
if (prime > Math.Sqrt(i))
break;
It brings the time down to 160 ms.
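(Side note: the same cutoff can be had without calling Math.Sqrt on every iteration; a sketch of the idea, not part of the measured runs:)
if ((long)prime * prime > i)
break;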
EDIT 2:
EDIT 2:
It finally clicked; I took out the foreach, as was suggested many times. It's now 12 ms. Final solution:
public static long Ex010()
{
var sum = 0L;
var sieve = new bool[2000000];
for (int i = 2; i < sieve.Length; i++)
{
if (sieve[i-1])
continue;
sum += i;
for (var x = i * 2; x < sieve.Length; x += i) {
sieve[x-1] = true;
}
}
return sum;
}
You are doing trial division in addition to a sieve.
The boolean array will already tell you if a number is prime, so you don't need the List of primes at all.
You can also speed it up by only sieving up to the square root of the limit.
If you also want to save some memory, you can use a BitArray instead of a boolean array; see the sketch after the code below.
public static long Ex010()
{
const int Limit = 2000000;
int sqrt = (int)Math.Sqrt(Limit);
var sum = 0L;
var isComposite = new bool[Limit];
for (int i = 2; i <= sqrt; i++) {
if (isComposite[i - 2])
continue; // this number is not prime, skip it
sum += i;
for (var x = i * i; x < isComposite.Length; x += i) {
isComposite[x - 2] = true;
}
}
//Add the remaining prime numbers
for (int i = sqrt + 1; i < Limit; i++) {
if (!isComposite[i - 2]) {
sum += i;
}
}
return sum;
}
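For reference, the BitArray variant might look like this (a sketch, not benchmarked; the method name Ex010BitArray is made up for illustration):
using System.Collections;

public static long Ex010BitArray()
{
    const int Limit = 2000000;
    int sqrt = (int)Math.Sqrt(Limit);
    var isComposite = new BitArray(Limit); // all bits start out false
    var sum = 0L;
    for (int i = 2; i <= sqrt; i++) {
        if (isComposite[i])
            continue; // composite, skip
        sum += i;
        for (int x = i * i; x < Limit; x += i)
            isComposite[x] = true;
    }
    for (int i = sqrt + 1; i < Limit; i++) {
        if (!isComposite[i])
            sum += i;
    }
    return sum;
}
BitArray packs eight flags per byte, so it uses an eighth of the memory of a bool[] at the cost of a little bit-twiddling per access.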
(tl;dr: 2 million in 0.8 ms, 2 billion in 1.25 s; segmented odds-only SoE, presieving, wheeled striding)
As always, the limit of Euler task #10 seems designed to pose a mild challenge on a ZX81, Apple ][ or C64 but on modern hardware you generally have to multiply the limits by 1000 to make things even remotely interesting. Or set a time limit like 5 seconds and try to see by how many orders of magnitude the Euler limit can be exceeded...
Dennis_E's solution is simple and efficient, but I'd recommend applying two small improvements that give a marked performance boost without any effort at all.
Represent only odd numbers in the sieve
All even numbers except for the number 2 are composite. If you pull the number 2 out of thin air when needed then you can drop all even numbers from the sieve. This halves workload and memory footprint, for a doubling of performance at the marginal cost of writing << or >> in a few places (to convert between the realm of numbers and the realm of bit indices). This is usually known as an 'odds-only sieve' or 'wheeled representation mod 2'; it has the added advantage that it largely removes the need for guarding against index overflow.
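In code, the conversion is just a one-bit shift each way. A small illustration (my own, assuming the convention used in the code below, where bit i stands for the odd number 2 * i + 1):
int bit_for (int odd_number) { return odd_number >> 1; } // 5 -> 2, 7 -> 3, 9 -> 4
int number_at (int bit) { return (bit << 1) + 1; } // 2 -> 5, 3 -> 7, 4 -> 9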
Skip a few extra small primes during decoding
Skipping a few small primes ('applying a wheel') is much easier when going through a range of numbers incrementally, compared to hopping wildly about with different strides as during sieving. This skipping only involves applying a cyclic sequence of differences between consecutive numbers that are not multiples of the primes in question, like 4,2,4,2... for skipping multiples of 2 and 3 (the mod 6 wheel) or 6,4,2,4,2,4,6,2... for skipping multiples of 2, 3 and 5.
The mod 6 wheel sequence alternates between only two numbers, which can easily be achieved by XORing with a suitable value. On top of the odds-only sieve the distances are halved, so that the sequence becomes 2,1,2,1... This skipping reduces the work during decoding by 1/3rd (for stepping mod 3), and the skipped primes can also be ignored during the sieving. The latter can have a marked effect on sieve times, since the smallest primes do the greatest number of crossings-off during the sieving.
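To illustrate the XOR trick (a sketch of my own, not from the measured code): d ^= 3 toggles d between 1 and 2, because 1 ^ 3 == 2 and 2 ^ 3 == 1, which yields exactly the halved mod 6 wheel distances:
for (int i = 5 >> 1, d = 1; i < 20; i += d, d ^= 3)
    Console.Write(((i << 1) + 1) + " "); // 5 7 11 13 17 19 23 25 29 31 35 37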
Here's a simple Sieve of Eratosthenes, with both suggestions applied. Note: here and in the following I generally go with the flow of C#/.Net and use signed integers where I would normally use unsigned integers in any sane language. That's because I don't have the time to vet the code for the performance implications (penalties) resulting from the use of unsigned types, like that the compiler suddenly forgets how to replace division by a constant with multiplication of the inverse and so on.
static long sum_small_primes_up_to (int n)
{
if (n < 7)
return (0xAA55200 >> (n << 2)) & 0xF;
int sqrt_n_halved = (int)Math.Sqrt(n) >> 1;
int max_bit = (int)(n - 1) >> 1;
var odd_composite = new bool[max_bit + 1];
for (int i = 5 >> 1; i <= sqrt_n_halved; ++i)
if (!odd_composite[i])
for (int p = (i << 1) + 1, j = p * p >> 1; j <= max_bit; j += p)
odd_composite[j] = true;
long sum = 2 + 3;
for (int i = 5 >> 1, d = 1; i <= max_bit; i += d, d ^= 3)
if (!odd_composite[i])
sum += (i << 1) + 1;
return sum;
}
The first if statement handles the small fry (n in 0..6) by returning a suitable element of a precomputed list of numbers, and it serves to get all the special cases out of the way in one fell swoop. All other occurrences of shift operators are for converting between the realm of numbers and the realm of indices into the odds-only sieve.
This is pretty much the same code that I normally use for sieving small primes, up to 64K or so (the potential least factors for numbers up to 32 bits). It does Euler's piddling 2 million in 4.5 milliseconds but throwing bigger numbers at it shows its Achilles heel: it does a lot of striding over large distances, which interacts badly with modern memory subsystems where decent access speed can only be got from caches. The performance drops markedly when the capacity of the level 1 cache (typically 32 KiByte) is exceeded significantly, and it goes down even further when exceeding the L2 and L3 capacities (typically several megabytes). How sharp the drop is depends on the quality (price tag) of the computer, of course...
Here are some timings taken on my laptop:
# benchmark: small_v0 ...
sum up to 2 * 10^4: 21171191 in 0,03 ms
sum up to 2 * 10^5: 1709600813 in 0,35 ms // 11,0 times
sum up to 2 * 10^6: 142913828922 in 4,11 ms // 11,7 times
sum up to 2 * 10^7: 12272577818052 in 59,36 ms // 14,4 times
sum up to 2 * 10^8: 1075207199997334 in 1.225,19 ms // 20,6 times
sum up to 2 * 10^9: 95673602693282040 in 14.381,29 ms // 11,7 times
In the middlish ranges there are time increases that go well beyond the expected factor of about 11, and then things stabilise again.
And here comes how to speed up the beast an order of magnitude...
Process the range in cache-sized segments
The remedy is easy enough: instead of striding each prime all the way from one end of the range to the other - and hence all across the memory space - we sieve the range in cache-sized strips, memorising the final positions for each prime so that the next round can continue right where the previous round left off. If we don't need a big bad sieve full of bits at the end then we can process a strip (extract its primes) after it has been sieved and then discard its data, reusing the sieve buffer for the next strip. Both are variations on the theme of segmented sieving but the offsets are treated differently during the processing; when the distinction matters then the first approach (big bad sieve for the whole range) is usually called a segmented sieve and the latter an iterated sieve. The terms 'moving' or 'sliding' sieve might fit the latter somewhat but should be avoided because they normally refer to a totally different class of sieves (also known as deque sieves) that are deceptively simple but whose performance is worse by at least an order of magnitude.
Here's an example of an iterated sieve, a slightly modified version of a function that I normally use for sieving primes in given ranges [m, n], just like in SPOJ's PRIMES1 and PRINT. Here the parameter m is implicitly 0, so it doesn't need to be passed.
Normally the function takes an interface that is responsible for processing the raw sieve (and any loose primes it may get passed), and which can be queried for the number of primes that the processor skips ('decoder order') so that the sieve can ignore those during the sieving. For this exposition I've replaced it with a delegate, for simplicity.
The factor primes get sieved by a stock function that may look somewhat familiar, and I've changed the logic of the sieve from 'is_composite' to 'not_composite' (and to a base type that can participate in arithmetic) for reasons that will be explained later. decoder_order is the number of additional primes skipped by the decoder (which would be 1 for the function shown earlier, because it skips multiples of the prime 3 during the prime extraction/summing, over and above the wheel prime 2).
const int SIEVE_BITS = 1 << 15; // L1 cache size, 1 usable bit per byte
delegate long sieve_sum_func (byte[] sieve, int window_base, int window_bits);
static long sum_primes_up_to (int n, sieve_sum_func sum_func, int decoder_order)
{
if (n < 7)
return 0xF & (0xAA55200 >> (n << 2));
n -= ~n & 1; // make odd (n can't be 0 here)
int sqrt_n = (int)Math.Sqrt(n);
var factor_primes = small_primes_up_to(sqrt_n).ToArray();
int first_sieve_prime_index = 1 + decoder_order; // skip wheel primes + decoder primes
int m = 7; // this would normally be factor_primes[first_sieve_prime_index] + 2
int bits_to_sieve = ((n - m) >> 1) + 1;
int sieve_bits = Math.Min(bits_to_sieve, SIEVE_BITS);
var sieve = new byte[sieve_bits];
var offsets = new int[factor_primes.Length];
int sieve_primes_end = first_sieve_prime_index;
long sum = 2 + 3 + 5; // wheel primes + decoder primes
for (int window_base = m; ; )
{
int window_bits = Math.Min(bits_to_sieve, sieve_bits);
int last_number_in_window = window_base - 1 + (window_bits << 1);
while (sieve_primes_end < factor_primes.Length)
{
int prime = factor_primes[sieve_primes_end];
int start = prime * prime, stride = prime << 1;
if (start > last_number_in_window)
break;
if (start < window_base)
start = (stride - 1) - (window_base - start - 1) % stride;
else
start -= window_base;
offsets[sieve_primes_end++] = start >> 1;
}
fill(sieve, window_bits, (byte)1);
for (int i = first_sieve_prime_index; i < sieve_primes_end; ++i)
{
int prime = factor_primes[i], j = offsets[i];
for ( ; j < window_bits; j += prime)
sieve[j] = 0;
offsets[i] = j - window_bits;
}
sum += sum_func(sieve, window_base, window_bits);
if ((bits_to_sieve -= window_bits) == 0)
break;
window_base += window_bits << 1;
}
return sum;
}
static List<int> small_primes_up_to (int n)
{
int upper_bound_on_pi = 32 + (n < 137 ? 0 : (int)(n / (Math.Log(n) - 1.083513)));
var result = new List<int>(upper_bound_on_pi);
if (n < 2)
return result;
result.Add(2); // needs to be pulled out of thin air because of the mod 2 wheel
if (n < 3)
return result;
result.Add(3); // needs to be pulled out of thin air because of the mod 3 decoder
int sqrt_n_halved = (int)Math.Sqrt(n) >> 1;
int max_bit = (n - 1) >> 1;
var odd_composite = new bool[max_bit + 1];
for (int i = 5 >> 1; i <= sqrt_n_halved; ++i)
if (!odd_composite[i])
for (int p = (i << 1) + 1, j = p * p >> 1; j <= max_bit; j += p)
odd_composite[j] = true;
for (int i = 5 >> 1, d = 1; i <= max_bit; i += d, d ^= 3)
if (!odd_composite[i])
result.Add((i << 1) + 1);
return result;
}
static void fill<T> (T[] array, int count, T value, int threshold = 16)
{
Trace.Assert(count <= array.Length);
if (count <= 0)
return;
// seed the first few elements individually...
int current_size = Math.Min(threshold, count);
for (int i = 0; i < current_size; ++i)
array[i] = value;
// ...then keep doubling the filled region, finishing with one partial copy
// (note: Buffer.BlockCopy counts bytes, so this is only correct for
// single-byte element types like the byte[] sieve buffer used here)
for (int half = count >> 1; current_size <= half; current_size <<= 1)
Buffer.BlockCopy(array, 0, array, current_size, current_size);
Buffer.BlockCopy(array, 0, array, current_size, count - current_size);
}
Here's a sieve processor that is equivalent to the logic used in the function shown at the beginning, plus a dummy function that can be used to measure the sieve time sans any decoding, for comparison:
static long prime_sum_null (byte[] sieve, int window_base, int window_bits)
{
return 0;
}
static long prime_sum_v0 (byte[] sieve, int window_base, int window_bits)
{
long sum = 0;
int i = window_base % 3 == 0 ? 1 : 0;
int d = 3 - (window_base + 2 * i) % 3;
for ( ; i < window_bits; i += d, d ^= 3)
if (sieve[i] == 1)
sum += window_base + (i << 1);
return sum;
}
This function needs to perform a bit of modulo magic to synchronise itself with the mod 3 sequence over the mod 2 sieve; the earlier function did not need to do this because its starting point was fixed, not a parameter. Here are the timings:
# benchmark: iter_v0 ...
sum up to 2 * 10^4: 21171191 in 0,04 ms
sum up to 2 * 10^5: 1709600813 in 0,28 ms // 7,0 times
sum up to 2 * 10^6: 142913828922 in 2,42 ms // 8,7 times
sum up to 2 * 10^7: 12272577818052 in 22,11 ms // 9,1 times
sum up to 2 * 10^8: 1075207199997334 in 223,67 ms // 10,1 times
sum up to 2 * 10^9: 95673602693282040 in 2.408,06 ms // 10,8 times
Quite a difference, n'est-ce pas? But we're not done yet.
Replace conditional branching with arithmetic
Modern processors like things to be simple and predictable; if branches are not predicted correctly then the CPU levies a heavy fine in extra cycles for flushing and refilling the instruction pipeline. Unfortunately, the decoding loop isn't very predictable because primes are fairly dense in the low number ranges we're talking about here:
if (!odd_composite[i])
++count;
If the average number of non-primes between primes times the cost of an addition is less than the penalty for a mispredicted branch then the following statement should be faster:
count += sieve[i];
This explains why I inverted the logic of the sieve compared to normal, because with 'is_composite' semantics I'd have to do
count += 1 ^ odd_composite[i];
And since the rule is to pull everything out of inner loops that can be pulled out, I simply applied the 1 ^ x to the whole array before even starting (hence the sieve is filled with 1s and crossed off with 0s).
However, Euler wants us to sum the primes instead of counting them. This can be done in a similar fashion: turn the value 1 into a mask of all 1 bits, which preserves any value it is ANDed with, while 0 zeroises it. This is similar to the CMOV instruction, except that it works even on the oldest of CPUs and does not require a reasonably decent compiler:
static long prime_sum_v1 (byte[] sieve, int window_base, int window_bits)
{
long sum = 0;
int i = window_base % 3 == 0 ? 1 : 0;
int d = 3 - (window_base + 2 * i) % 3;
for ( ; i < window_bits; i += d, d ^= 3)
sum += (0 - sieve[i]) & (window_base + (i << 1));
return sum;
}
Result:
# benchmark: iter_v1 ...
sum up to 2 * 10^4: 21171191 in 0,10 ms
sum up to 2 * 10^5: 1709600813 in 0,36 ms // 3,6 times
sum up to 2 * 10^6: 142913828922 in 1,88 ms // 5,3 times
sum up to 2 * 10^7: 12272577818052 in 13,80 ms // 7,3 times
sum up to 2 * 10^8: 1075207199997334 in 157,39 ms // 11,4 times
sum up to 2 * 10^9: 95673602693282040 in 1.819,05 ms // 11,6 times
Unrolling and strength reduction
Now, a bit of overkill: a decoder with a fully unrolled wheel mod 15 (the unrolling can unlock some reserves of instruction-level parallelism). The lookup tables it uses are reconstructed after the code.
static long prime_sum_v5 (byte[] sieve, int window_base, int window_bits)
{
Trace.Assert(window_base % 2 == 1);
int count = 0, sum = 0;
int residue = window_base % 30;
int phase = UpperIndex[residue];
int i = (SpokeValue[phase] - residue) >> 1;
// get into phase for the unrolled code (which is based on phase 0)
for ( ; phase != 0 && i < window_bits; i += DeltaDiv2[phase], phase = (phase + 1) & 7)
{
int b = sieve[i]; count += b; sum += (0 - b) & i;
}
// process full revolutions of the wheel (anchored at phase 0 == residue 1)
for (int e = window_bits - (29 >> 1); i < e; i += (30 >> 1))
{
int i0 = i + ( 1 >> 1), b0 = sieve[i0]; count += b0; sum += (0 - b0) & i0;
int i1 = i + ( 7 >> 1), b1 = sieve[i1]; count += b1; sum += (0 - b1) & i1;
int i2 = i + (11 >> 1), b2 = sieve[i2]; count += b2; sum += (0 - b2) & i2;
int i3 = i + (13 >> 1), b3 = sieve[i3]; count += b3; sum += (0 - b3) & i3;
int i4 = i + (17 >> 1), b4 = sieve[i4]; count += b4; sum += (0 - b4) & i4;
int i5 = i + (19 >> 1), b5 = sieve[i5]; count += b5; sum += (0 - b5) & i5;
int i6 = i + (23 >> 1), b6 = sieve[i6]; count += b6; sum += (0 - b6) & i6;
int i7 = i + (29 >> 1), b7 = sieve[i7]; count += b7; sum += (0 - b7) & i7;
}
// clean up leftovers
for ( ; i < window_bits; i += DeltaDiv2[phase], phase = (phase + 1) & 7)
{
int b = sieve[i]; count += b; sum += (0 - b) & i;
}
return (long)window_base * count + ((long)sum << 1);
}
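The three lookup tables are not shown in the excerpt above; a reconstruction consistent with how the code uses them (a mod 30 wheel whose spokes are the eight residues coprime to 30) would be:
// the residues mod 30 that are coprime to 30 (the spokes of the wheel)
static readonly int[] SpokeValue = { 1, 7, 11, 13, 17, 19, 23, 29 };
// halved distance from each spoke to the next (wrapping from 29 to 31)
static readonly int[] DeltaDiv2 = { 3, 2, 1, 2, 1, 2, 3, 1 };
// for each residue mod 30, the index of the first spoke >= that residue
// (only odd residues are ever looked up, since window_base is odd)
static readonly int[] UpperIndex =
{
    0, 0, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4,
    4, 4, 4, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 7, 7
};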
As you can see, I performed a bit of strength reduction in order to make things easier for the compiler. Instead of summing window_base + (i << 1) for each prime, I count the primes and sum their indices i separately, and perform the rest of the calculation only once, at the end of the function.
Timings:
# benchmark: iter_v5(1) ...
sum up to 2 * 10^4: 21171191 in 0,01 ms
sum up to 2 * 10^5: 1709600813 in 0,11 ms // 9,0 times
sum up to 2 * 10^6: 142913828922 in 1,01 ms // 9,2 times
sum up to 2 * 10^7: 12272577818052 in 11,52 ms // 11,4 times
sum up to 2 * 10^8: 1075207199997334 in 130,43 ms // 11,3 times
sum up to 2 * 10^9: 95673602693282040 in 1.563,10 ms // 12,0 times
# benchmark: iter_v5(2) ...
sum up to 2 * 10^4: 21171191 in 0,01 ms
sum up to 2 * 10^5: 1709600813 in 0,09 ms // 8,7 times
sum up to 2 * 10^6: 142913828922 in 1,03 ms // 11,3 times
sum up to 2 * 10^7: 12272577818052 in 10,34 ms // 10,0 times
sum up to 2 * 10^8: 1075207199997334 in 121,08 ms // 11,7 times
sum up to 2 * 10^9: 95673602693282040 in 1.468,28 ms // 12,1 times
The first set of timings is for decoder_order == 1 (i.e. not telling the sieve about the extra skipped prime), for direct comparison to the other decoder versions. The second set is for decoder_order == 2, which means the sieve could skip the crossings-off for the prime 5 as well. Here are the null timings (essentially the sieve time without the decode time), to put things a bit into perspective:
# benchmark: iter_null(1) ...
sum up to 2 * 10^8: 10 in 94,74 ms // 11,4 times
sum up to 2 * 10^9: 10 in 1.194,18 ms // 12,6 times
# benchmark: iter_null(2) ...
sum up to 2 * 10^8: 10 in 86,05 ms // 11,9 times
sum up to 2 * 10^9: 10 in 1.109,32 ms // 12,9 times
This shows that the work on the decoder has decreased decode time for 2 billion from 1.21 s to 0.35 s, which is nothing to sneeze at. Similar speedups can be realised for the sieving as well, but that is nowhere near as easy as it was for the decoding.
Low-hanging fruit: presieving
Lastly, a technique that can sometimes offer dramatic speedups (especially for packed bitmaps and/or higher-order wheels) is blasting a canned bit pattern over the sieve before commencing a round of sieving, such that the sieve looks as if it had already been sieved by a handful of small primes. This is usually known as presieving. In the current case the speedup is marginal (not even 20%) but I'm showing it because it is a useful technique to have in one's toolchest.
Note: I've ripped the presieving logic from another Euler project, so it doesn't fit organically into the code I wrote for this article. But it should demonstrate the technique well enough.
const byte CROSSED_OFF = 0; // i.e. composite
const byte NOT_CROSSED = 1 ^ CROSSED_OFF; // i.e. not composite
const int SIEVE_BYTES = SIEVE_BITS; // i.e. 1 usable bit per byte
internal readonly static byte[] TinyPrimes = { 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31 };
internal readonly static int m_wheel_order = 3; // == number of wheel primes
internal static int m_presieve_level = 0; // == number of presieve primes
internal static int m_presieve_modulus = 0;
internal static byte[] m_presieve_pattern;
internal static void set_presieve_level (int presieve_primes)
{
m_presieve_level = Math.Max(0, presieve_primes);
m_presieve_modulus = 1;
for (int i = m_wheel_order; i < m_wheel_order + presieve_primes; ++i)
m_presieve_modulus *= TinyPrimes[i];
// the pattern needs to provide SIEVE_BYTES bytes for every residue of the modulus
m_presieve_pattern = new byte[m_presieve_modulus + SIEVE_BYTES - 1];
var pattern = m_presieve_pattern;
int current_size = 1;
pattern[0] = NOT_CROSSED;
for (int i = m_wheel_order; i < m_wheel_order + presieve_primes; ++i)
{
int current_prime = TinyPrimes[i];
int new_size = current_size * current_prime;
// keep doubling while possible
for ( ; current_size * 2 <= new_size; current_size *= 2)
Buffer.BlockCopy(pattern, 0, pattern, current_size, current_size);
// copy rest, if any
Buffer.BlockCopy(pattern, 0, pattern, current_size, new_size - current_size);
current_size = new_size;
// mark multiples of the current prime
for (int j = current_prime >> 1; j < current_size; j += current_prime)
pattern[j] = CROSSED_OFF;
}
for (current_size = m_presieve_modulus; current_size * 2 <= pattern.Length; current_size *= 2)
Buffer.BlockCopy(pattern, 0, pattern, current_size, current_size);
Buffer.BlockCopy(pattern, 0, pattern, current_size, pattern.Length - current_size);
}
For a quick test you can hack the presieving into the sieve function as follows:
- int first_sieve_prime_index = 1 + decoder_order; // skip wheel primes + decoder primes
+ int first_sieve_prime_index = 1 + decoder_order + m_presieve_level; // skip wheel primes + decoder primes
plus
- long sum = 2 + 3 + 5; // wheel primes + decoder primes
+ long sum = 2 + 3 + 5; // wheel primes + decoder primes
+
+ for (int i = 0; i < m_presieve_level; ++i)
+ sum += TinyPrimes[m_wheel_order + i];
plus
- fill(sieve, window_bits, (byte)1);
+ if (m_presieve_level == 0)
+ fill(sieve, window_bits, (byte)1);
+ else
+ Buffer.BlockCopy(m_presieve_pattern, (window_base >> 1) % m_presieve_modulus, sieve, 0, window_bits);
and
set_presieve_level(4); // 4 and 5 work well
in the static constructor or Main().
This way you can use m_presieve_level for turning presieving on and off. The BlockCopy also works correctly after calling set_presieve_level(0), though, because then the modulus is 1. m_wheel_order should reflect the actual wheel order (= 1) plus the decoder order; it's currently set to 3, so it'll work only with the v5 decoder at level 2.
Timings:
# benchmark: iter_v5(2) pre(7) ...
sum up to 2 * 10^4: 21171191 in 0,02 ms
sum up to 2 * 10^5: 1709600813 in 0,08 ms // 4,0 times
sum up to 2 * 10^6: 142913828922 in 0,78 ms // 9,6 times
sum up to 2 * 10^7: 12272577818052 in 8,78 ms // 11,2 times
sum up to 2 * 10^8: 1075207199997334 in 98,89 ms // 11,3 times
sum up to 2 * 10^9: 95673602693282040 in 1.245,19 ms // 12,6 times
sum up to 2^31 - 1: 109930816131860852 in 1.351,97 ms
I'm doing an operation on a very large set of numbers (13 digits in length each).
I need to validate each number with a checksum. The checksum digit tells how far the checksum is from the next multiple of ten, i.e.:
checksum    checksum digit
20          0
21          9
22          8
33          7
34          6
35          5
36          4
37          3
208         2
9           1
The numbers are in EAN-13 format, so the maximum digit sum is 217 (999999999999: no checksum validation).
By far the fastest way I have thought of is preloading the data into an int array and retrieving it by index.
Is this the fastest way possible?
Or does it not even matter at this point, since it will be executed fast enough even for a lot of numbers?
UPDATE:
Preloading the checksum-digit values for each checksum into an array, as I mentioned:
for (int i = 0; i < 220; i += 10)
{
matchValues[i] = 0;
matchValues[i + 1] = 9;
matchValues[i + 2] = 8;
matchValues[i + 3] = 7;
matchValues[i + 4] = 6;
matchValues[i + 5] = 5;
matchValues[i + 6] = 4;
matchValues[i + 7] = 3;
matchValues[i + 8] = 2;
matchValues[i + 9] = 1;
}
With this I can cover all checksums, with the matching checksum digit at matchValues[sum].
So:
matchValues[208] = 2;
matchValues[9] = 1;
etc.
You can use the modulo operator to get the distance from the next ten. You would still need to iterate over each number, obviously.
int modulo = i % 10;
int distanceFromTen = modulo == 0 ? 0 : 10 - modulo;
Another solution would be int distanceFromTen = (int)(Math.Ceiling(i / 10d) * 10 - i);.
I've run benchmarks for both approaches:
private static void Main(string[] args)
{
//Console.WriteLine("Checking {0}", i);
int loops = 10;
long averageModulo = 0;
long averageCeiling = 0;
for (int l = 0; l < loops; l++)
{
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 10000000; i++)
{
int modulus = i % 10;
int distanceFromTen = modulus == 0 ? 0 : 10 - modulus;
}
sw.Stop();
Stopwatch swTwo = new Stopwatch();
swTwo.Start();
for (int i = 0; i < 10000000; i++)
{
int distanceFromTenTwo = (int)(Math.Ceiling(i / 10d) * 10 - i);
}
swTwo.Stop();
Console.WriteLine("Modulo: {0} ({1}ms)", sw.ElapsedTicks, sw.ElapsedMilliseconds);
averageModulo += sw.ElapsedTicks;
Console.WriteLine("Math.Ceiling: {0} ({1}ms)", swTwo.ElapsedTicks, swTwo.ElapsedMilliseconds);
averageCeiling += swTwo.ElapsedTicks;
Console.WriteLine("");
}
Console.WriteLine("Average modulo: {0}", averageModulo / loops);
Console.WriteLine("Average ceiling: {0}", averageCeiling / loops);
Console.ReadLine();
}
The modulo operation is always faster than the ceiling (most likely because of the int/double conversions and the floating-point math; there is no boxing involved). That being said, both operations are very fast.
With the new edit I think this is now aimed at generating valid EANs in as little time as possible. Here is some code that will generate 100000000 EAN-13 checksums (as documented on the Wikipedia page) in roughly 3.5 seconds.
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;
internal class Program
{
private static void Main(string[] args)
{
Console.WriteLine("");
long start = 0;
long end = 99999999;
long count = end - start + 1;
long[] eans = new long[count];
Stopwatch sw = new Stopwatch();
sw.Start();
Parallel.For(start, end + 1, i => {
eans[i] = GenerateEAN13(i);
});
sw.Stop();
Console.WriteLine("Generation of {0} EAN-13s took {1} ticks ({2} ms)", count, sw.ElapsedTicks, sw.ElapsedMilliseconds);
Console.ReadLine();
}
[MethodImpl(MethodImplOptions.AggressiveInlining)]
public static long GenerateEAN13(long number)
{
long checksum = 0;
long tmp = number;
// weights alternate 3, 1, 3, 1, ... starting at the rightmost data digit,
// which ends up in position 12 of the 13-digit code and carries weight 3
for (int weight = 3; tmp > 0; weight = 4 - weight)
{
checksum += (tmp % 10) * weight;
tmp /= 10;
}
long modulus = checksum % 10;
checksum = modulus == 0 ? 0 : 10 - modulus;
return number * 10 + checksum;
}
}
Definitely use the modulus for this. Your "cache" of values won't help much: the cost saved on the subtraction is incurred by the memory lookup instead, and if you use a Dictionary rather than a flat array, the lookup by key will also call GetHashCode() and add some overhead:
int distanceFromNextTen = (10 - input % 10) % 10;
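For completeness, a validation routine built on the same formula might look like this (a sketch; IsValidEan13 is a hypothetical name, and the weights follow the standard EAN-13 scheme):
static bool IsValidEan13(long ean)
{
    long checkDigit = ean % 10;
    long body = ean / 10; // the 12 data digits
    long sum = 0;
    // weights alternate 3, 1, 3, 1, ... from the rightmost data digit
    for (int weight = 3; body > 0; weight = 4 - weight)
    {
        sum += (body % 10) * weight;
        body /= 10;
    }
    return (10 - sum % 10) % 10 == checkDigit;
}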
Project Euler Question 2.
Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:
1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
By considering the terms in the Fibonacci sequence whose values do not exceed four million, find the sum of the even-valued terms.
My solution:
int firstNum = 1;
int secondNum = 2;
int resultNum = firstNum + secondNum;
int sum = 0;
for (int i = 0; i < 4000000; i++)
{
firstNum = i;
secondNum = i;
if(resultNum == firstNum + secondNum)
{
sum += resultNum;
Console.WriteLine(sum);
}
}
Why is this not correct, and can you guide me toward the right way of thinking?
Try this:
int n1, n2, fib;
//n1 = 0;
//n2 = 1;
n1 = 1;
n2 = 1;
fib = n1 + n2;
while (fib < 4000000)
{
n2 = n1;
n1 = fib;
fib = n1 + n2;
}
Then find the even Fibonacci numbers and sum them.
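For example, the summing could be folded into the loop (a sketch building on the snippet above):
int n1 = 1, n2 = 1, fib = n1 + n2;
int sum = 0;
while (fib < 4000000)
{
    if (fib % 2 == 0)
        sum += fib;
    n2 = n1;
    n1 = fib;
    fib = n1 + n2;
}
Console.WriteLine(sum); // 4613732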
For a more modular approach (mixed with LINQ):
IEnumerable<Int32> Fibonacci(Int32 limit = 4000000)
{
for (Int32 previous = 0, current = 1, next = 0;
current <= limit; current = next)
{
next = previous + current;
previous = current;
if (next > limit)
yield break; // don't yield a term that exceeds the limit
yield return next;
}
}
Then:
var allNumbers = Fibonacci(4000000); // 1,2,3,5,8,13,21
var evenNumbers = allNumbers.Where(x => x % 2 == 0); // 2,8,34,144,610,2584
var sum = evenNumbers.Sum(); // 4613732
The Fibonacci series is defined as
A(0) = 1, A(1) = 1
A(n) = A(n-1) + A(n-2)
You are aiming at producing the pattern
1 2 3 5 8 13 etc
While iterating, you are going to want to adjust the terms like a sliding window and then check whether you have come across a value that belongs in the sum (i.e. < 4M and even):
int first = 1, second = 2;
int sum = 0;
int max = 4000000;
for( int n = 0; n < max ; n++ )
{
//only sum the even numbers
if( second % 2 == 0 ) sum += second;
//adjust
int result = first + second;
first = second;
second = result;
//test for numbers greater than max
if( result > max ) break;
}
//output
Console.WriteLine(sum); // the sum of all even A(n) values
After looking at this hopefully you can see some of the issues you came across.
You are setting your variables to the iterator i, which is not going to produce A(n) as defined but instead something entirely different.
firstNum = i;
secondNum = i;
Further, you only calculate the result once; this needs to be done inside the loop. Calculating it only once means you compare against the same fixed value the entire time.
int resultNum = firstNum + secondNum;
The conditional statement should be testing for an even number in order to properly add to the sum, but this code only tests against that fixed value of resultNum:
if(resultNum == firstNum + secondNum)
Also, there needs to be a check on the current term in order to break out once the max is exceeded; the loop has to be bounded by the size of the terms, not by 4M iterations.
There is even more optimization that can occur here, though. Looking at the for loop, it is clear that the iterator, while not used yet, can be a powerful tool: the Fibonacci numbers have a closed form based on the golden ratio (Binet's formula).
By making the simple observation that the Fibonacci series hits an even number every 3 iterations, the iterator can be used to skip through the series.
double p = (1 + Math.Pow(5,.5)) / 2;
for( int n = 3, sum = 0;;n+=3)
{
double f = ( Math.Pow(p,n) - Math.Pow( 1 - p , n ) ) / Math.Pow(5,.5);
if( f > 4000000 ){
Console.WriteLine(sum);
break;
}
sum += (int)Math.Round(f);
}
Your code does not produce a Fibonacci sequence and does not check for even-valued terms.
Try this instead
int firstNum = 1;
int secondNum = 2;
int sum = 0;
while (secondNum <= 4000000)
{
if (secondNum % 2 == 0)
sum += secondNum;
int resultNum = firstNum + secondNum;
firstNum = secondNum;
secondNum = resultNum;
}
Console.WriteLine(sum);