Compare 2 byte arrays - c#

I have 2 int arrays.
int[] data1;
int[] data2;
I want to create a 3rd array, int[] data3, which holds the differences between the two other arrays.
Let us take the 1st value in data1.
The value is 15 (e.g.).
Now let us take the 1st value in data2.
The value is 3 (e.g.).
The 1st value in data3 would be 12.
BUT, if the 1st values were the other way round i.e.
data1[0] = 3
data2[0] = 15
then the difference would be -12. Yet I want it to be just 12.
At the moment I have a for loop and I do the computation stuff there to get that type of result.
Is there a way to do data1 - data2 = data3 without enumerating through a loop?
If so, can I just get the differences without any minus numbers?
Thanks
N.B.
In response to the 'closers': I agree with you up to a point. What I need to add to this question is this:
I am looking for the most efficient way to do this (quickest first, with low memory as a 2nd priority). Using LINQ (as I understand it) can be the slowest approach?

You are looking for the Zip method:
var data3 = data1.Zip(data2, (d1,d2) => Math.Abs(d1 - d2)).ToArray();
Enumerable.Zip<TFirst, TSecond, TResult> Method
Applies a specified function to the corresponding elements of two sequences, producing a sequence of the results.
So it simply takes each corresponding element, e.g data1[0] and data2[0], then data1[1] and data2[1] so on.. then applies the function Math.Abs(d1-d2) which simply substracts two numbers and gets the absolute value of the result. Then returns a sequence which contains the result of each operation.
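For example, with the question's own sample values (my own snippet, not part of the original answer):
int[] data1 = { 15, 3, 8 };
int[] data2 = { 3, 15, 8 };
// pairs by index: (15,3) -> 12, (3,15) -> 12, (8,8) -> 0
var data3 = data1.Zip(data2, (d1, d2) => Math.Abs(d1 - d2)).ToArray();
// data3 is now { 12, 12, 0 }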

"Is there a way to do data1-data2 = data3 without enumerating through a loop?" No. It is technically impossible.
At best, or rather worst, you can call a function that will do the enumeration for you. But it will be slow. In the case of LINQ, ungodly slow.
On the machine I am currently working on, the results from the other answers are as follows for a 4 KB table (1024 integers):
23560 ticks - Giannis Paraskevopoulos. Array-Enumerable-Array conversions are not too fast; copying an array via the ToList().ToArray() chain is roughly 25 times slower than Array.Copy().
10198 ticks - Selman22. 2 times faster, but still slow. Lambdas are eye candy to make creating events prettier, not faster. You end up with an anonymous method lying around whose call-return overhead can eat more CPU time than its actual operation (remember that the math we do here the CPU can do in just a few cycles).
566 ticks - Tim Schmelter's GetDifference() function. (The main culprit here is the JIT; in native code and/or with more frequent usage the difference would be negligible.)
27 ticks - Just a loop. 400 times faster than Zip, over 800 times faster than converting the array to a list and back.
Loop code:
for (int i = 0; i < data3.Length; i++)
{
data3[i] = Math.Abs(data1[i] - data2[i]);
}
Such basic memory operations can be directly translated to machine code without the horrible performance and humongous memory footprint of LINQ.
The moral of the story is: LINQ is for readability (which in this case is arguable), not for performance (which in this case is noticeable).
Optimization time! Let's abuse our CPU a tiny bit.
Unroll the loop. Or do not. Your experience may vary. Even in assembler itself, the performance gain or loss from loop unrolling varies greatly within the same family of processors. New CPUs and compilers are aware of the old tricks and simply implement them on their own. On the i3-3220 I tested the code on, unrolling to 4 lines resulted in faster execution in 32-bit code, but in 64-bit code it was a bit slower, while unrolling to 8 was the opposite.
Compile for x64. As we are working on 32-bit data here, we won't make use of 64-bit registers... or will we? On x86, less than half of the registers are truly available to generated code (in assembly written by hand you can always squeeze out more); on x64, however, you get eight bonus registers which are free to use. The more you can do without accessing memory, the faster your code. In this case the speed gain is about 20%.
Close Visual Studio. Do not speed-test 64-bit code in the 32-bit IDE (there is no 64-bit version as of now, and probably won't be for a long time). It will make x64 code roughly two times slower due to the architecture mismatch. (Well... you should never speed-test code under a debugger anyway...)
Do not use built-in functions too much. In this case Math.Abs has overhead hidden inside. For some reason (which would need analysis of the IL to pin down), checking for negative values was faster with ?: than with if-else. Such a check saved a lot of time.
UPDATE: ?: is faster than if-else due to differences in the resulting machine code... at least for just comparing two values. Its machine code is much less weird than if-else's (which does not look like what you would write "by hand"). Apparently it is not just a different way of writing an if-else statement but a fully separate instruction sequence optimized for simple conditional assignment.
The resulting code was roughly 8 times faster than the simple loop with Math.Abs(). Remember you can unroll the loop only to divisors of your dataset size. You wrote that your dataset size is 25920, so 8 is fine (the largest power of two that divides it is 64, but I doubt it would make any sense to go that high). I suggest hiding this code in some function, as it is fugly.
int[] data3 = new int[data1.Length];
for (int i = 0; i < data1.Length; i += 8)
{
int b;
b = (data1[i + 0] - data2[i + 0]);
data3[i + 0] = b < 0 ? -b : b;
b = (data1[i + 1] - data2[i + 1]);
data3[i + 1] = b < 0 ? -b : b;
b = (data1[i + 2] - data2[i + 2]);
data3[i + 2] = b < 0 ? -b : b;
b = (data1[i + 3] - data2[i + 3]);
data3[i + 3] = b < 0 ? -b : b;
b = (data1[i + 4] - data2[i + 4]);
data3[i + 4] = b < 0 ? -b : b;
b = (data1[i + 5] - data2[i + 5]);
data3[i + 5] = b < 0 ? -b : b;
b = (data1[i + 6] - data2[i + 6]);
data3[i + 6] = b < 0 ? -b : b;
b = (data1[i + 7] - data2[i + 7]);
data3[i + 7] = b < 0 ? -b : b;
}
This is not even its final form. I will try to do some more heretic tricks on it.
BitHacks, low-level cheats!
As I mentioned, there was still room for improvement.
After cutting out LINQ, the main ticks munchkin was Abs(). When it was removed from the code we were left with a contest between if-else and the shorthand ?: operator.
Both are branching operators, which once upon a time were widely known to be slower than linear code. Currently ease of use/writing tends to be picked over performance (sometimes correctly, sometimes incorrectly).
So let's make our branching condition linear. It is possible by abusing the fact that the branching in this code contains math operating on just a single variable.
Now, do you remember how to negate a two's complement number? Negate all bits and add one. Let's do it in one line without conditions, then!
It is the bitwise operators' time to shine. OR and AND are boring; real men use XOR. What's so cool about XOR? Aside from its usual behavior, you can also turn it into NOT (negation) and NOP (no-operation).
1 XOR 1 = 0
0 XOR 1 = 1
so XOR'ing by a value filled with only 1's gives you a NOT operation.
1 XOR 0 = 1
0 XOR 0 = 0
so XOR'ing by a value filled with only 0's does nothing at all.
We can obtain the sign from our number. For a 32-bit integer it is as simple as x>>31. It moves the sign bit down to the lowest bit. As even the wiki will tell you, bits inserted from the left will be zeros, so the result of x>>31 will be 1 for a negative number (x<0) and 0 for a non-negative one (x>=0), right?
Nope. For signed values an arithmetic shift is used instead of a plain bit-shift, so we get -1 or 0 depending on the sign... which means that x>>31 gives 111...111 for negative and 000...000 for non-negative. If you XOR the original x by the result of such a shift, you perform a NOT or a NOP depending on the value's sign. Another useful thing is that 0 results in a NOP for addition/subtraction, so we can subtract -1 or 0 depending on the value's sign.
So x ^ (x >> 31) flips the bits of a negative number while making no change to a non-negative one, and x - (x >> 31) adds 1 to a negative x (a negated negative value gives a positive) and makes no change to a non-negative value.
When combined you get (x ^ (x >> 31)) - (x >> 31)... which can be translated to:
IF X<0
X=!X+1
and it is just
IF X<0
X=-X
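A quick sanity check of that identity (my own snippet, not part of the original answer):
foreach (int x in new[] { -12, 12, 0 })
{
    int abs = (x ^ (x >> 31)) - (x >> 31); // branch-free |x|
    Console.WriteLine(abs);                // prints 12, 12, 0
}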
How does it affect performance?
Our XorAbs() requires just four basic integer operations plus one load and one store. A branching operator by itself takes about as many CPU ticks. And while modern CPUs are great at branch prediction, they are still faster when simply not branching at all, i.e. when fed sequential code.
And what's the score?
Roughly four times faster than the built-in Abs().
About twice as fast as the previous code (the versions without unrolling).
Depending on the CPU, it can get better results without loop unrolling: due to the elimination of branching, the CPU can "unroll" the loop on its own. (Haswells are weird with unrolling.)
Resulting code:
for (int i = 0; i < data1.Length; i++)
{
int x = data1[i] - data2[i];
data3[i] = (x ^ (x >> 31)) - (x >> 31);
}
Parallelism and Cache usage
CPUs have super fast cache memory; when processing an array sequentially, the CPU will copy whole chunks of it to cache.
But if you write crappy code you will get cache misses. You can easily fall into this trap by screwing up the order of nested loops.
Parallelism (multiple threads, same data) must work on sequential chunks in order to make good use of the CPU cache.
Writing threads by hand would allow you to pick chunks for the threads manually, but that is a bothersome way.
Since 4.0, .NET comes with helpers for that; however, the default Parallel.For makes a mess of the cache.
So this code is actually slower than its single-threaded version due to cache misses:
Parallel.For(0, data1.Length,
fn =>
{
int x = data1[fn] - data2[fn];
data3[fn] = (x ^ (x >> 31)) - (x >> 31);
});
It is possible to make manual use of the cached data by performing sequential operations within it. For example, you can unroll the loop, but it's a dirty hack, and unrolling has its own performance issues (it depends on the CPU model):
Parallel.For(0, data1.Length >> 3,
n =>
{
int i = n << 3; // each task works on its own sequential block of 8 elements
int b;
b = (data1[i + 0] - data2[i + 0]);
data3[i + 0] = b < 0 ? (b ^ -1) + 1 : b;
b = (data1[i + 1] - data2[i + 1]);
data3[i + 1] = b < 0 ? (b ^ -1) + 1 : b;
b = (data1[i + 2] - data2[i + 2]);
data3[i + 2] = b < 0 ? (b ^ -1) + 1 : b;
b = (data1[i + 3] - data2[i + 3]);
data3[i + 3] = b < 0 ? (b ^ -1) + 1 : b;
b = (data1[i + 4] - data2[i + 4]);
data3[i + 4] = b < 0 ? (b ^ -1) + 1 : b;
b = (data1[i + 5] - data2[i + 5]);
data3[i + 5] = b < 0 ? (b ^ -1) + 1 : b;
b = (data1[i + 6] - data2[i + 6]);
data3[i + 6] = b < 0 ? (b ^ -1) + 1 : b;
b = (data1[i + 7] - data2[i + 7]);
data3[i + 7] = b < 0 ? (b ^ -1) + 1 : b;
});
However, .NET also has Parallel.ForEach and load-balancing partitioners.
By using both of them you get the best of all worlds:
dataset size independent code
short, neat code
multithreading
good cache usage
So final code would be:
var rangePartitioner = Partitioner.Create(0, data1.Length);
Parallel.ForEach(rangePartitioner, (range, loopState)
=>
{
for (int i = range.Item1; i < range.Item2; i++)
{
int x = data1[i] - data2[i];
data3[i] = (x ^ (x >> 31)) - (x >> 31);
}
});
It is far from maximum CPU usage (which is more complicated than just maxing its clock; there are multiple cache levels, several pipelines and much more), but it is readable, fast and platform independent (except for integer size, but C#'s int is an alias for System.Int32, so we are safe here).
Here I think we will stop with the optimization.
It came out as an article rather than an answer; I hope no one will purge me for it.

Here is another (less readable but maybe a little more efficient) approach that does not need LINQ:
public static int[] GetDifference(int[] first, int[] second)
{
int commonLength = Math.Min(first.Length, second.Length);
int[] diff = new int[commonLength];
for (int i = 0; i < commonLength; i++)
diff[i] = Math.Abs(first[i] - second[i]);
return diff;
}
Why a little more efficient? Because ToArray has to keep resizing its buffer until it knows the final size.

var data3 = data1.Select((x,i)=>new {x,i})
.Join
(
data2.Select((x,i)=>new {x,i}),
x=>x.i,
x=>x.i,
(d1,d2)=>Math.Abs(d1.x-d2.x)
)
.ToArray();

Related

C# post increment and pre increment

I get very confused at times with the shorthand increment operations.
Ever since I was little, programming in BASIC, I somehow got stuck with a = a + 1, which is the long, painful way of saying 'get a's current value, add 1 to it, and then store the new value back in a'.
1] a = a +1 ;
2] a++ ;
3] ++a;
4] a +=1;
[1] and [4] are similar in functionality, different in notation, right?
[2] and [3] work differently simply because the increment sign ++ comes before or after. Right?
Am I safe to assume the below?
int f(int x){ return x * x;}
y = f(x++) -> for x = 2:
f(x) ======> y = 2^2 = 4
x = x + 1 ======> x = 2 + 1 = 3
y = f(++x) -> for x = 2:
x = x + 1 ======> x = 2 + 1 = 3
f(x) ======> y = 3^2 = 9
Difference is, what the operator returns:
The post-increment operator "a plus plus" adds one, and returns the old value:
int a = 1;
int b = a++;
// now a is 2, b is 1
The pre-increment operator "plus plus a" adds one, and returns the new value:
a = 1;
b = ++a;
// now a is 2 and b is 2
First off, you should read this answer very carefully:
What is the difference between i++ and ++i?
And read this blog post very carefully:
http://blogs.msdn.com/b/ericlippert/archive/2011/03/29/compound-assignment-part-one.aspx
Note that part two of that post was an April Fool's joke, so don't believe anything it says. Part one is serious.
Those should answer your questions.
When you have just an ordinary local variable, the statements x++; ++x; x = x + 1; x += 1; are all basically the same thing. But as soon as you stray from ordinary local variables, things get more complicated. Those operations have subtleties to them.
1], 3] and 4] are functionally identical - a is incremented by one, and the value of the whole expression is the new value of a.
2] is different from the others. It also increments a, but the value of the expression is the previous value of a.
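A small runnable check of the above, reusing the asker's squaring function (my own snippet):
static int Square(int n) { return n * n; }

int x = 2;
int y = Square(x++); // Square receives 2, then x becomes 3: y == 4, x == 3

x = 2;
y = Square(++x);     // x becomes 3 first, then Square receives 3: y == 9, x == 3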

Optimisation of threshold computation

I'm trying to optimise the following C# code, which sets bytes to 0x00 or 0xFF based on a threshold.
for (int i = 0; i < veryLargeNumber; i++)
{
data[i] = (byte)(data[i] < threshold ? 0 : 255);
}
Visual Studio's performance profiler shows that the above code is rather expensive, taking nearly 8 seconds to compute - 98% of my total processing expense. I'm processing just under a thousand items, so that adds up to over two hours.
I think the issue is to do with the ternary conditional operator, since it causes a branch. I'd imagine a branch-free, pure-math operation of some sort could be significantly faster, since it's friendlier to the CPU.
Is there a way to optimise this? It's possible for me to fix the threshold value, if that helps. I'd consider anything above a ~7% performance increase a win, since that's a whole 10 minutes shaved off the total processing time.
If you are using the .NET 4.0 Framework, you could make use of the Task Parallel Library described here:
http://msdn.microsoft.com/en-us/library/dd460717
In your case you still have to test every value against the threshold, and that will take time either way, so spread the work across threads.
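A minimal sketch of that suggestion, assuming data and threshold are as in the question:
Parallel.For(0, data.Length, i =>
{
    data[i] = (byte)(data[i] < threshold ? 0 : 255);
});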
Just to suggest: use bitwise operators for this purpose, because they are faster, together with the parallel approach.
0x00 = 0000 0000
0xFF = 1111 1111
Try the OR operator (i.e. 0 | 1 = 1, where | stands for OR).
EDIT:
This is how you could compare which number is bigger,
letting a, b be the numbers:
int temp = a ^ b;
temp |= temp >> 1;
temp |= temp >> 2;
temp |= temp >> 4;
temp |= temp >> 8;
temp |= temp >> 16;
temp &= ~(temp >> 1) | 0x80000000;
temp &= (a ^ 0x80000000) & (b ^ 0x7fffffff);
If you want a bit-wise solution -
int intSize = sizeof(int) * 8 - 1; // 31 for int: the shift that smears the sign bit
byte t = (byte)(threshold - 1);
for (....)
{
// (t - data[i]) >> intSize is -1 (all ones) when data[i] >= threshold, else 0;
// 256 ^ -1 truncates to 255 as a byte, while 256 ^ 0 truncates to 0
data[i] = (byte)(255 + 1 ^ ((t - data[i]) >> intSize));
}
Note: won't work for the corner case of threshold == 0. Sorry 'bout that.
Also, try using an int array instead of byte and see if it is faster

Fibonacci LFSRs calculation optimisation

The Fibonacci LFSR is described on the wiki; it's pretty simple.
I'd like to calculate the period of some Fibonacci LFSRs and use the generated sequences for ciphering later.
Let's take an example from the wiki:
x^16 + x^14 + x^13 + x^11 + 1
//code from wiki:
#include <stdint.h>
uint16_t lfsr = 0xACE1u;
unsigned bit;
unsigned period = 0;
do {
/* taps: 16 14 13 11; characteristic polynomial: x^16 + x^14 + x^13 + x^11 + 1 */
bit = ((lfsr >> 0) ^ (lfsr >> 2) ^ (lfsr >> 3) ^ (lfsr >> 5) ) & 1;
lfsr = (lfsr >> 1) | (bit << 15);
++period;
} while(lfsr != 0xACE1u);
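For reference, a direct C# port of the loop above (a sketch; ushort stands in for uint16_t):
ushort lfsr = 0xACE1;
int period = 0;
do
{
    // taps: 16 14 13 11
    int bit = ((lfsr >> 0) ^ (lfsr >> 2) ^ (lfsr >> 3) ^ (lfsr >> 5)) & 1;
    lfsr = (ushort)((lfsr >> 1) | (bit << 15));
    period++;
} while (lfsr != 0xACE1);
// period == 65535 == 2^16 - 1 for these taps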
My weak attempt so far, in PHP:
function getPeriod(){
$polynoms = array(16, 14, 13, 11);
$input = $polynoms[0] - 1;
$n = sizeof($polynoms);
for ($i = 1; $i < $n; $i++)
$polynoms[$i] = $polynoms[0] - $polynoms[$i];
$polynoms[0] = 0;
//reversed polynoms == array(0, 2, 3, 5);
$lfsr = 0x1; //begining state
$period = 0;
//gmp -- php library for long numbers;
$lfsr = gmp_init($lfsr, 16);
do {
$bit = $lfsr; //bit = x^16 >> 0;
for($i = 1; $i < $n; $i++) {
//bit ^= lfsr >> 2 ^ lfst >> 3 ^ lfst >> 5;
$bit = gmp_xor($bit, ( gmp_div_q($lfsr, gmp_pow(2, $polynoms[$i])) ));
}
//bit &= 1;
$bit = gmp_and($bit, 1);
//lfsr = ($lfsr >> 1) | ($bit << (16 - 1));
$lfsr = gmp_or( (gmp_div_q($lfsr, 2)), (gmp_mul($bit, gmp_pow(2, $input))) );
$period++;
} while (gmp_cmp($lfsr, 0x1) != 0);
echo '<br />period = '.$period;
//period == 65535 == 2^16 - 1; -- and that's correct;
// I hope, at least;
return $period;
}
Problem:
If I try to simulate the work of, e.g.,
x^321 + x^14 + x^13 + x^11 + 1,
I get an error: "Fatal error: Maximum execution time of 30 seconds exceeded in /var/www/Dx02/test.php".
Can I somehow optimize (accelerate :) ) the calculation?
Any help is appreciated. Thank you and excuse me for my English.
You simply can't do it this way with a polynomial like x^321 + ...
If the polynomial is chosen well, you get a period length of 2^321 - 1,
and this is approximately 4.27 * 10^96. If I'm not mistaken, this number is
believed to exceed the number of atoms in the universe...
(Strictly speaking, I'm referring to the posted C-code since I do not know php, but that certainly makes no difference.)
However, there is a mathematical method to calculate the length of the period without doing a brute-force attack. Unfortunately, this can't be explained in a few lines. If you have a solid background in math (especially calculations in finite fields), I'll be glad to look for a helpful reference for you.
EDIT:
The first step in calculating the period of the LFSR obtained by using a polynomial p(x) is to obtain a factorization of p(x) mod 2, i.e. in GF(2). To do this, I recommend using software like Mathematica or Maple if available. You could also try the freely available Sage, see e.g. http://www.sagemath.org/doc/constructions/polynomials.html for usage details.
The period of p(x) is given by its order e, that is, the smallest number such that p(x) divides x^e + 1. Unfortunately, I can't provide more information at the moment; it will take me several days to look for the lecture notes of a course I took several years ago...
A small example: p(x) = (x^5+x^4+1) = (x^3+x+1)*(x^2+x+1), the individual periods are 2^3-1=7 and 2^2-1=3, and since all polynomial factors are different, the period of p(x) is 3*7=21, which I also verified in C++.
To optimize this a bit we need to remember that PHP has great overhead on parsing code, as it is not compiled, so we need to do as much of that work for it as we can. You should always profile your CPU/memory-sensitive code with xdebug+KCachegrind (for example) to see where PHP wastes most of its time. With your code, only 12% is spent on gmp_* calculations; most of the time is spent on code parsing.
On my notebook (it is rather slow) my code runs in 2.4 sec instead of 3.5 sec for yours, but for greater degrees the difference should be more noticeable (for example, degree 19 gives 19 vs 28 sec). It is not much, but it is something.
I left comments inside code, but if you have some questions - feel free to ask. I used function creation to replace that 'for($i = 1; $i < $n; $i++)' loop inside your main loop.
Also, I think you should change the type of your $period variable to GMP (and $period++ to a gmp_* function), as it can exceed the maximum integer on your system.
function getPeriod() {
$polynoms = array(16, 14, 13, 11);
$highest = $polynoms[0];
$input = $highest - 1;
//Delete first element of array - we don't need it anyway
array_shift($polynoms);
$polynoms_count = count($polynoms);
//You always repeat gmp_pow(2, $input) and it's result is constant,
//so better precalculate it once.
$input_pow = gmp_pow(2, $input);
//Start function creation.
//If you don't use PHP accelerators, then shorter variable names
//work slightly faster, so I replaced some of names
//$perion->$r,$bit -> $b, $lfsr -> $l, $polynoms -> $p
$function_str = '$r=0;';
$function_str .= 'do{';
//Now we need to get rid of your loop inside loop, we can generate
//static functions chain to replace it.
//Also, PHP parses all PHP tokens, even ';' and it takes some time,
//So, we should write as much one-liners as we can.
$function_str .= '$b=gmp_xor($b=$l';
foreach ($polynoms AS $id => &$polynom) {
//You always repeat gmp_pow(2, $polynoms[$i]) and it's result is constant,
//so better precalculate it once.
$polynom = gmp_pow(2, $highest - $polynom);
//We create our functions chain here
if ($id < $polynoms_count - 1) {
$function_str.=',gmp_xor(gmp_div_q($l, $p[' . $id . '])';
} else {
$function_str.=',gmp_div_q($l, $p[' . $id . '])';
}
}
//Close all brackets
$function_str.=str_repeat(')', $polynoms_count);
//I don't know how to optimize the following, so I left it without change
$function_str.=';';
$function_str.='$l = gmp_or((gmp_div_q($l, 2)), (gmp_mul(gmp_and($b, 1), $i_p)));';
$function_str.='$r++;';
$function_str.='} while (gmp_cmp($l, 0x1));';
$function_str.='return $r;';
//Now, create our function
$function = create_function('$l,$p,$i_p', $function_str);
//Set beginning states
$lfsr = 0x1;
$lfsr = gmp_init($lfsr, 16);
//Run function
$period = $function($lfsr, $polynoms, $input_pow);
//Use result
echo '<br />period = ' . $period;
return $period;
}

Fast way to manually mod a number

I need to be able to calculate (a^b) % c for very large values of a and b (which individually push the limit of ulong and cause overflow errors when you try to calculate a^b). For small enough numbers, using the identity (a^b)%c = ((a%c)^b)%c works, but if c is too large this doesn't really help. I wrote a loop to do the mod operation manually, one multiplication by a at a time:
private static ulong no_Overflow_Mod(ulong num_base, ulong num_exponent, ulong mod)
{
ulong answer = 1;
for (ulong x = 0; x < num_exponent; x++)
{
answer = (answer * num_base) % mod;
}
return answer;
}
but this takes a very long time. Is there any simple and fast way to do this operation without actually having to take a to the power of b AND without using time-consuming loops? If all else fails, I can make a bool array to represent a huge data type and figure out how to do this with bitwise operators, but there has to be a better way.
I guess you are looking for: http://en.wikipedia.org/wiki/Montgomery_reduction
or the simpler way based on modular exponentiation (from Wikipedia):
Bignum modpow(Bignum base, Bignum exponent, Bignum modulus) {
Bignum result = 1;
while (exponent > 0) {
if ((exponent & 1) == 1) {
// multiply in this bit's contribution while using modulus to keep result small
result = (result * base) % modulus;
}
// move to the next bit of the exponent, square (and mod) the base accordingly
exponent >>= 1;
base = (base * base) % modulus;
}
return result;
}
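The same square-and-multiply loop sketched in C# with System.Numerics.BigInteger (which, from .NET 4 on, also ships this ready-made as BigInteger.ModPow):
using System.Numerics;

static BigInteger ModPow(BigInteger b, BigInteger e, BigInteger m)
{
    BigInteger result = 1;
    b %= m; // keep the base small from the start
    while (e > 0)
    {
        if ((e & 1) == 1)
            result = (result * b) % m; // multiply in this bit's contribution
        e >>= 1;
        b = (b * b) % m; // square (and mod) the base for the next bit
    }
    return result;
}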
Fast Modular Exponentiation (I think that's what it's called) might work.
Given a, b, c, and wanting a^b (mod c):
1. Write b as a sum of powers of 2. (If b = 72, this is 2^6 + 2^3.)
2. Square repeatedly, reducing mod c each time; write a(k) for a^(2^k) mod c:
(1) a^2 (mod c) = a(1)
(2) a(1)^2 (mod c) = a(2)
(3) a(2)^2 (mod c) = a(3)
...
(n) a(n-1)^2 (mod c) = a(n)
3. Using the a(k) from above, multiply together the a(k) for the powers of 2 you identified in step 1. For example:
b = 72, so use a(3) and a(6):
a(3) x a(6) (mod c)
4. Do the previous step one multiplication at a time, and at the end you'll have a^b % c.
Now, how you're going to do that with data types, I don't know. As long as your datatype can hold c^2, I think you'll be fine.
If using strings, just create string versions of add, subtract, and multiply (not too hard). This method should be quick enough doing that. (And you can start step 1 with a mod c, so that a is never greater than c.)
EDIT: Oh look, a wiki page on Modular Exponentiation.
Here's an example of fast modular exponentiation (suggested in one of the earlier answers) in Java. It shouldn't be too hard to convert it to C#.
http://www.math.umn.edu/~garrett/crypto/a01/FastPow.html
and the source...
http://www.math.umn.edu/~garrett/crypto/a01/FastPow.java
Python has pow(a,b,c) which returns (a**b)%c (only faster), so there must be some clever way to do this. Maybe they just do the identity you mentioned.
I'd recommend checking over the Decimal documentation and seeing if it meets your requirements, since it is a built-in type and can use the mod operator. If not, then you're going to need an arbitrary-precision library like Java's BigInteger.
You can try factoring 'a' into sufficiently small numbers.
If the factors of 'a' are 'x', 'y', and 'z', then
a^b = (x^b)(y^b)(z^b).
Then you can use your identity: (a^b)%c = (a%c)^b%c
It seems to me like there's some kind of relation between power and mod. Power is just repeated multiplication and mod is related to division. We know that multiplication and division are inverses, so through that connection I would assume there's a correlation between power and mod.
For example, take powers of 5:
5 % 4 = 1
25 % 4 = 1
125 % 4 = 1
625 % 4 = 1
...
The pattern is clear that 5 ^ b % 4 = 1 for all values of b.
It's less clear in this situation:
5 % 3 = 2
25 % 3 = 1
125 % 3 = 2
625 % 3 = 1
3125 % 3 = 2
15625 % 3 = 1
78125 % 3 = 2
...
But there's still a pattern.
If you could work out the math behind the patterns, I wouldn't be surprised if you could figure out the value of the mod without doing the actual power.
You could try this:
C#: Doing a modulus (mod) operation on a very large number (> Int64.MaxValue)
http://www.del337ed.com/blog/index.php/2009/02/04/c-doing-a-modulus-mod-operation-on-a-very-large-number-int64maxvalue/
Short of writing your own fast modular exponentiation, the simplest idea I can come up with is to use the F# BigInt type, Microsoft.FSharp.Math.Types.BigInt, which supports operations of arbitrarily large scale - including exponentiation and modular arithmetic.
It's a built-in type that will be part of the full .NET Framework with the next release. You don't need to use F# to use BigInt - you can make use of it directly in C#.
Can you factor a, b, or c? Does C have a known range?
These are 32-bit integers! Go check this site.
For instance, here is how you get n % d where d is a power of two (d = 1 << s: 1, 2, 4, 8, ...):
int n = 137; // numerator
int d = 32; // denom d will be one of: 1, 2, 4, 8, 16, 32, ...
int m; // m will be n % d
m = n & (d - 1);
There is also code there for n % d where d is (1 << s) - 1 (1, 3, 7, 15, 31, ...):
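A sketch of that trick ported to C#, based on the well-known bit-twiddling collection (s and d are compile-time constants here):
uint n = 137;                 // numerator
const int s = 5;
const uint d = (1u << s) - 1; // 31; one of 1, 3, 7, 15, 31, ...
uint m;                       // m will be n % d
// sum the base-(d+1) "digits" of n and fold; works because 2^s == 1 (mod d)
for (m = n; n > d; n = m)
    for (m = 0; n != 0; n >>= s)
        m += n & d;
m = (m == d) ? 0 : m;         // a digit sum of exactly d means remainder 0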
This is only going to really help if c is small though, like you said.
Looks like homework in cryptography.
Hint: check out Fermat's little theorem.

Project Euler #16 - C# 2.0

I've been wrestling with Project Euler Problem #16 in C# 2.0. The crux of the question is that you have to calculate and then iterate through each digit in a number that is 302 digits long (or thereabouts). You then add up these digits to produce the answer.
This presents a problem: C# 2.0 doesn't have a built-in datatype that can handle this sort of calculation precision. I could use a 3rd party library, but that would defeat the purpose of attempting to solve it programmatically without external libraries. I can solve it in Perl; but I'm trying to solve it in C# 2.0 (I'll attempt to use C# 3.0 in my next run-through of the Project Euler questions).
Question
What suggestions (not answers!) do you have for solving project Euler #16 in C# 2.0? What methods would work?
NB: If you decide to post an answer, please prefix your attempt with a blockquote that has ###Spoiler written before it.
A number is a series of digits. A 32-bit unsigned int is 32 binary digits. The string "12345" is a series of 5 digits. Digits can be stored in many ways: as bits, characters, array elements and so on. The largest "native" datatype in C# with complete precision is probably the decimal type (128 bits, 28-29 digits). Just choose your own method of storing digits that allows you to store much bigger numbers.
As for the rest, this will give you a clue:
2^1 = 2
2^2 = 2^1 + 2^1
2^3 = 2^2 + 2^2
Example:
The sum of digits of 2^100000 is 135178
Ran in 4875 ms
The sum of digits of 2^10000 is 13561
Ran in 51 ms
The sum of digits of 2^1000 is 1366
Ran in 2 ms
SPOILER ALERT: Algorithm and solution in C# follows.
Basically, as alluded to a number is nothing more than an array of digits. This can be represented easily in two ways:
As a string;
As an array of characters or digits.
As others have mentioned, storing the digits in reverse order is actually advisable. It makes the calculations much easier. I tried both of the above methods. I found strings and the character arithmetic irritating (it's easier in C/C++; the syntax is just plain annoying in C#).
The first thing to note is that you can do this with one array. You don't need to allocate more storage at each iteration. As mentioned, you can find a power of 2 by doubling the previous power of 2, so you can find 2^1000 by doubling 1 one thousand times. The doubling can be done in place with the general algorithm:
carry = 0
foreach digit in array
sum = digit + digit + carry
if sum >= 10 then
carry = 1
sum -= 10
else
carry = 0
end if
digit = sum
end foreach
This algorithm is basically the same for using a string or an array. At the end you just add up the digits. A naive implementation might add the results into a new array or string with each iteration. Bad idea. Really slows it down. As mentioned, it can be done in place.
But how large should the array be? Well that's easy too. Mathematically you can convert 2^a to 10^f(a) where f(a) is a simple logarithmic conversion and the number of digits you need is the next higher integer from that power of 10. For simplicity, you can just use:
digits required = ceil(power of 2 / 3)
which is a close approximation and sufficient.
Where you can really optimise this is by using larger digits. A 32-bit signed int can store a number between +/- 2 billion (approximately). Since 9 decimal digits fit within a billion, you can use a 32-bit int (signed or unsigned) as, essentially, a base-one-billion "digit". You can work out how many ints you need, create that array, and that's all the storage you need to run the entire algorithm (130-ish bytes), with everything done in place.
Solution follows (in fairly rough C#):
static void problem16a()
{
const int limit = 1000;
int ints = limit / 29; // about limit * log10(2) / 9 base-billion digits; +1 below for headroom
int[] number = new int[ints + 1];
number[0] = 2;
for (int i = 2; i <= limit; i++)
{
doubleNumber(number);
}
String text = NumberToString(number);
Console.WriteLine(text);
Console.WriteLine("The sum of digits of 2^" + limit + " is " + sumDigits(text));
}
static void doubleNumber(int[] n)
{
int carry = 0;
for (int i = 0; i < n.Length; i++)
{
n[i] <<= 1;
n[i] += carry;
if (n[i] >= 1000000000)
{
carry = 1;
n[i] -= 1000000000;
}
else
{
carry = 0;
}
}
}
static String NumberToString(int[] n)
{
int i = n.Length;
while (i > 0 && n[--i] == 0)
;
String ret = "" + n[i--];
while (i >= 0)
{
ret += String.Format("{0:000000000}", n[i--]);
}
return ret;
}
I solved this one using C# also, much to my dismay when I discovered that Python can do this in one simple operation.
Your goal is to create an adding machine using arrays of int values.
Spoiler follows
I ended up using an array of int values to simulate an adding machine, but I represented the number backwards - which you can do because the problem only asks for the sum of the digits; this means order is irrelevant.
What you're essentially doing is doubling the value 1000 times, so you can double the value 1 stored in the 1st element of the array, and then continue looping until your value is over 10. This is where you will have to keep track of a carry value. The first power of 2 that is over 10 is 16, so the elements in the array after the 5th iteration are 6 and 1.
Now when you loop through the array starting at the 1st value (6), it becomes 12 (so you keep the last digit and set a carry bit on the next index of the array) - and when that value is doubled you get 2... plus the 1 for the carry bit, which equals 3. Now you have 2 and 3 in your array, which represents 32.
Continue this process 1000 times and you'll have an array with roughly 300 elements that you can easily add up.
I have solved this one before, and now I re-solved it using C# 3.0. :)
I just wrote a Multiply extension method that takes an IEnumerable<int> and a multiplier, and returns an IEnumerable<int>. (Each int represents a digit, and the first one is the least significant digit.) Then I just created a list with the single item { 1 } and multiplied it by 2 a thousand times. Adding the items in the list is simple with the Sum extension method.
19 lines of code, which runs in 13 ms. on my laptop. :)
Pretend you are very young, with square paper. To me, that is like a list of numbers. Then to double it you double each number and handle any "carries" by subtracting the 10s and adding 1 to the next index. So if the answer is 1366... something like this (completely unoptimized, rot13):
hfvat Flfgrz;
hfvat Flfgrz.Pbyyrpgvbaf.Trarevp;
pynff Cebtenz {
fgngvp ibvq Pneel(Yvfg<vag> yvfg, vag vaqrk) {
juvyr (yvfg[vaqrk] > 9) {
yvfg[vaqrk] -= 10;
vs (vaqrk == yvfg.Pbhag - 1) yvfg.Nqq(1);
ryfr yvfg[vaqrk + 1]++;
}
}
fgngvp ibvq Znva() {
ine qvtvgf = arj Yvfg<vag> { 1 }; // 2^0
sbe (vag cbjre = 1; cbjre <= 1000; cbjre++) {
sbe (vag qvtvg = 0; qvtvg < qvtvgf.Pbhag; qvtvg++) {
qvtvgf[qvtvg] *= 2;
}
sbe (vag qvtvg = 0; qvtvg < qvtvgf.Pbhag; qvtvg++) {
Pneel(qvtvgf, qvtvg);
}
}
qvtvgf.Erirefr();
sbernpu (vag v va qvtvgf) {
Pbafbyr.Jevgr(v);
}
Pbafbyr.JevgrYvar();
vag fhz = 0;
sbernpu (vag v va qvtvgf) fhz += v;
Pbafbyr.Jevgr("fhz: ");
Pbafbyr.JevgrYvar(fhz);
}
}
If you wish to do the primary calculation in C#, you will need some sort of big-integer implementation (much like GMP for C/C++). Programming is about using the right tool for the right job. If you cannot find a good big-integer library for C#, it's not against the rules to calculate the number in a language like Python, which already has the ability to calculate large numbers. You could then put this number into your C# program via your method of choice, and iterate over each character in the number (you will have to store it as a string). For each character, convert it to an integer and add it to your total until you reach the end of the number. If you would like the big integer, I calculated it with Python below. The answer is further down.
Partial Spoiler
10715086071862673209484250490600018105614048117055336074437503883703510511249361
22493198378815695858127594672917553146825187145285692314043598457757469857480393
45677748242309854210746050623711418779541821530464749835819412673987675591655439
46077062914571196477686542167660429831652624386837205668069376
Spoiler Below!
>>> val = str(2**1000)
>>> total = 0
>>> for i in range(0,len(val)): total += int(val[i])
>>> print total
1366
If you've got ruby, you can easily calculate "2**1000" and get it as a string. Should be an easy cut/paste into a string in C#.
Spoiler
In Ruby: (2**1000).to_s.split(//).inject(0){|x,y| x+y.to_i}
spoiler
If you want to see a solution, check out my other answer. It is in Java, but it's very easy to port to C#.
Here's a clue:
Represent each number with a list. That way you can do basic sums like:
[1,2,3,4,5,6]
+ [4,5]
_____________
[1,2,3,5,0,1]
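A sketch of such a list-based sum (my own illustration; here the digits are stored least-significant-first, the reversed order other answers recommend):
static List<int> AddDigits(List<int> a, List<int> b)
{
    var result = new List<int>();
    int carry = 0;
    for (int i = 0; i < Math.Max(a.Count, b.Count) || carry != 0; i++)
    {
        int sum = carry + (i < a.Count ? a[i] : 0) + (i < b.Count ? b[i] : 0);
        result.Add(sum % 10); // keep one digit
        carry = sum / 10;     // carry the rest to the next position
    }
    return result;
}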
One alternative to representing the digits as a sequence of integers is to represent the number base 2^32, as a list of 32-bit integers, which is what many big-integer libraries do. You then have to convert the number to base 10 for output. This doesn't gain you very much for this particular problem - you can write 2^1000 straight away and then have to divide by 10 many times instead of multiplying 2 by itself 1000 times. (Or, since 1000 is 0b1111101000, you could calculate the product of 2^8, 2^32, 2^64, 2^128, 2^256 and 2^512 using repeated squaring, e.g. 2^8 = (((2^2)^2)^2), which requires more space and a multiplication method but far fewer operations.) It is, however, closer to normal big-integer use, so you may find it more useful in later problems. (If you try to calculate the last ten digits of 28433×2^7830457+1 using the digit-per-int method and repeated addition, it may take some time - though in that case you could use modular arithmetic rather than adding strings of millions of digits.)
A working solution, which I have posted here as well: http://www.mycoding.net/2012/01/solution-to-project-euler-problem-16/
The code:
import java.math.BigInteger;
public class Euler16 {
public static void main(String[] args) {
int power = 1;
BigInteger expo = new BigInteger("2");
BigInteger num = new BigInteger("2");
while(power < 1000){
expo = expo.multiply(num);
power++;
}
System.out.println(expo); //Printing the value of 2^1000
int sum = 0;
char[] expoarr = expo.toString().toCharArray();
int max_count = expoarr.length;
int count = 0;
while(count<max_count){ //While loop to calculate the sum of digits
sum = sum + (expoarr[count]-48);
count++;
}
System.out.println(sum);
}
}
Euler problem #16 has been discussed many times here, but I could not find an answer that gives a good overview of possible solution approaches, the lay of the land as it were. Here's my attempt at rectifying that.
This overview is intended for people who have already found a solution and want to get a more complete picture. It is basically language-agnostic even though the sample code is C#. There are some usages of features that are not available in C# 2.0 but they are not essential - their purpose is only to get boring stuff out of the way with a minimum of fuss.
Apart from using a ready-made BigInteger library (which doesn't count), straightforward solutions for Euler #16 fall into two fundamental categories: performing calculations natively - i.e. in a base that is a power of two - and converting to decimal in order to get at the digits, or performing the computations directly in a decimal base so that the digits are available without any conversion.
For the latter there are two reasonably simple options:
repeated doubling
powering by repeated squaring
Native Computation + Radix Conversion
This approach is the simplest and its performance exceeds that of naive solutions using .Net's builtin BigInteger type.
The actual computation is trivially achieved: just perform the moral equivalent of 1 << 1000, by storing 1000 binary zeroes and appending a single lone binary 1.
The conversion is also quite simple and can be done by coding the pencil-and-paper division method, with a suitably large choice of 'digit' for efficiency. Variables for intermediate results need to be able to hold two 'digits'; dividing the number of decimal digits that fit in a long by 2 gives 9 decimal digits for the maximum meta-digit (or 'limb', as it is usually called in bignum lore).
class E16_RadixConversion
{
const int BITS_PER_WORD = sizeof(uint) * 8;
const uint RADIX = 1000000000; // == 10^9
public static int digit_sum_for_power_of_2 (int exponent)
{
var dec = new List<int>();
var bin = new uint[(exponent + BITS_PER_WORD) / BITS_PER_WORD];
int top = bin.Length - 1;
bin[top] = 1u << (exponent % BITS_PER_WORD);
while (top >= 0)
{
ulong rest = 0;
for (int i = top; i >= 0; --i)
{
ulong temp = (rest << BITS_PER_WORD) | bin[i];
ulong quot = temp / RADIX; // x64 uses MUL (sometimes), x86 calls a helper function
rest = temp - quot * RADIX;
bin[i] = (uint)quot;
}
dec.Add((int)rest);
if (bin[top] == 0)
--top;
}
return E16_Common.digit_sum(dec);
}
}
I wrote (rest << BITS_PER_WORD) | bin[i] instead of using operator + because that is precisely what is needed here; no 64-bit addition with carry propagation needs to take place. This means that the two operands could be written directly to their separate registers in a register pair, or to fields in an equivalent struct like LARGE_INTEGER.
On 32-bit systems the 64-bit division cannot be inlined as a few CPU instructions, because the compiler cannot know that the algorithm guarantees quotient and remainder to fit into 32-bit registers. Hence the compiler calls a helper function that can handle all eventualities.
These systems may profit from using a smaller limb, i.e. RADIX = 10000 and uint instead of ulong for holding intermediate (double-limb) results. An alternative for languages like C/C++ would be to call a suitable compiler intrinsic that wraps the raw 32-bit by 32-bit to 64-bit multiply (assuming that division by the constant radix is to be implemented by multiplication with the inverse). Conversely, on 64-bit systems the limb size can be increased to 19 digits if the compiler offers a suitable 64-by-64-to-128 bit multiply primitive or allows inline assembler.
Decimal Doubling
Repeated doubling seems to be everyone's favourite, so let's do that next. Variables for intermediate results need to hold one 'digit' plus one carry bit, which gives 18 digits per limb for long. Going to ulong cannot improve things (there's 0.04 bit missing to 19 digits plus carry), and so we might as well stick with long.
On a binary computer, decimal limbs do not coincide with computer word boundaries. That makes it necessary to perform a modulo operation on the limbs during each step of the calculation. Here, this modulo op can be reduced to a subtraction of the modulus in the event of carry, which is faster than performing a division. The branching in the inner loop can be eliminated by bit twiddling but that would be needlessly obscure for a demonstration of the basic algorithm.
class E16_DecimalDoubling
{
const int DIGITS_PER_LIMB = 18; // == floor(log10(2) * (63 - 1)), b/o carry
const long LIMB_MODULUS = 1000000000000000000L; // == 10^18
public static int digit_sum_for_power_of_2 (int power_of_2)
{
Trace.Assert(power_of_2 > 0);
int total_digits = (int)Math.Ceiling(Math.Log10(2) * power_of_2);
int total_limbs = (total_digits + DIGITS_PER_LIMB - 1) / DIGITS_PER_LIMB;
var a = new long[total_limbs];
int limbs = 1;
a[0] = 2;
for (int i = 1; i < power_of_2; ++i)
{
int carry = 0;
for (int j = 0; j < limbs; ++j)
{
long new_limb = (a[j] << 1) | carry;
carry = 0;
if (new_limb >= LIMB_MODULUS)
{
new_limb -= LIMB_MODULUS;
carry = 1;
}
a[j] = new_limb;
}
if (carry != 0)
{
a[limbs++] = carry;
}
}
return E16_Common.digit_sum(a);
}
}
This is just as simple as radix conversion, but except for very small exponents it does not perform anywhere near as well (despite its huge meta-digits of 18 decimal places). The reason is that the code must perform (exponent - 1) doublings, and the work done in each pass corresponds to about half the total number of digits (limbs).
Repeated Squaring
The idea behind powering by repeated squaring is to replace a large number of doublings with a small number of multiplications.
1000 = 2^3 + 2^5 + 2^6 + 2^7 + 2^8 + 2^9
x^1000 = x^(2^3 + 2^5 + 2^6 + 2^7 + 2^8 + 2^9)
x^1000 = x^2^3 * x^2^5 * x^2^6 * x^2^7 * x^2^8 * x^2^9
x^2^3 can be obtained by squaring x three times, x^2^5 by squaring five times, and so on. On a binary computer the decomposition of the exponent into powers of two is readily available because it is the bit pattern representing that number. However, even non-binary computers should be able to test whether a number is odd or even, or to divide a number by two.
The multiplication can be done by coding the pencil-and-paper method; here I'm using a helper function that computes one row of a product and adds it into the result at a suitably shifted position, so that the rows of partial products do not need to be stored for a separate addition step later. Intermediate values during computation can be up to two 'digits' in size, so that the limbs can be only half as wide as for repeated doubling (where only one extra bit had to fit in addition to a 'digit').
Note: the radix of the computations is not a power of 2, and so the squarings of 2 cannot be computed by simple shifting here. On the positive side, the code can be used for computing powers of bases other than 2.
class E16_DecimalSquaring
{
const int DIGITS_PER_LIMB = 9; // language limit 18, half needed for holding the carry
const int LIMB_MODULUS = 1000000000;
public static int digit_sum_for_power_of_2 (int e)
{
Trace.Assert(e > 0);
int total_digits = (int)Math.Ceiling(Math.Log10(2) * e);
int total_limbs = (total_digits + DIGITS_PER_LIMB - 1) / DIGITS_PER_LIMB;
var squared_power = new List<int>(total_limbs) { 2 };
var result = new List<int>(total_limbs);
result.Add((e & 1) == 0 ? 1 : 2);
while ((e >>= 1) != 0)
{
squared_power = multiply(squared_power, squared_power);
if ((e & 1) == 1)
result = multiply(result, squared_power);
}
return E16_Common.digit_sum(result);
}
static List<int> multiply (List<int> lhs, List<int> rhs)
{
var result = new List<int>(lhs.Count + rhs.Count);
resize_to_capacity(result);
for (int i = 0; i < rhs.Count; ++i)
addmul_1(result, i, lhs, rhs[i]);
trim_leading_zero_limbs(result);
return result;
}
static void addmul_1 (List<int> result, int offset, List<int> multiplicand, int multiplier)
{
// it is assumed that the caller has sized `result` appropriately before calling this primitive
Trace.Assert(result.Count >= offset + multiplicand.Count + 1);
long carry = 0;
foreach (long limb in multiplicand)
{
long temp = result[offset] + limb * multiplier + carry;
carry = temp / LIMB_MODULUS;
result[offset++] = (int)(temp - carry * LIMB_MODULUS);
}
while (carry != 0)
{
long final_temp = result[offset] + carry;
carry = final_temp / LIMB_MODULUS;
result[offset++] = (int)(final_temp - carry * LIMB_MODULUS);
}
}
static void resize_to_capacity (List<int> operand)
{
operand.AddRange(Enumerable.Repeat(0, operand.Capacity - operand.Count));
}
static void trim_leading_zero_limbs (List<int> operand)
{
int i = operand.Count;
while (i > 1 && operand[i - 1] == 0)
--i;
operand.RemoveRange(i, operand.Count - i);
}
}
The efficiency of this approach is roughly on par with radix conversion, but there are specific improvements that apply here. The efficiency of the squaring can be doubled by writing a special squaring routine that utilises the fact that a[i]*b[j] == a[j]*b[i] if a == b, which cuts the number of multiplications in half.
Also, there are methods for computing addition chains that involve fewer operations overall than using the exponent bits for determining the squaring/multiplication schedule.
Helper Code and Benchmarks
The helper code for summing decimal digits in the meta-digits (decimal limbs) produced by the sample code is trivial, but I'm posting it here anyway for your convenience:
internal class E16_Common
{
internal static int digit_sum (int limb)
{
int sum = 0;
for ( ; limb > 0; limb /= 10)
sum += limb % 10;
return sum;
}
internal static int digit_sum (long limb)
{
const int M1E9 = 1000000000;
return digit_sum((int)(limb / M1E9)) + digit_sum((int)(limb % M1E9));
}
internal static int digit_sum (IEnumerable<int> limbs)
{
return limbs.Aggregate(0, (sum, limb) => sum + digit_sum(limb));
}
internal static int digit_sum (IEnumerable<long> limbs)
{
return limbs.Select((limb) => digit_sum(limb)).Sum();
}
}
This can be made more efficient in various ways but overall it is not critical.
All three solutions take O(n^2) time, where n is the exponent. In other words, they will take a hundred times as long when the exponent grows by a factor of ten. Radix conversion and repeated squaring can both be improved to roughly O(n log n) by employing divide-and-conquer strategies; I doubt whether the doubling scheme can be improved in a similar fashion, but then it was never competitive to begin with.
All three solutions presented here can be used to print the actual results, by stringifying the meta-digits with suitable padding and concatenating them. I've coded the functions as returning the digit sum instead of the arrays/lists with decimal limbs only in order to keep the sample code simple and to ensure that all functions have the same signature, for benchmarking.
In these benchmarks, the .Net BigInteger type was wrapped like this:
static int digit_sum_via_BigInteger (int power_of_2)
{
return System.Numerics.BigInteger.Pow(2, power_of_2)
.ToString()
.ToCharArray()
.Select((c) => (int)c - '0')
.Sum();
}
Finally, the benchmarks for the C# code:
# testing decimal doubling ...
1000: 1366 in 0,052 ms
10000: 13561 in 3,485 ms
100000: 135178 in 339,530 ms
1000000: 1351546 in 33.505,348 ms
# testing decimal squaring ...
1000: 1366 in 0,023 ms
10000: 13561 in 0,299 ms
100000: 135178 in 24,610 ms
1000000: 1351546 in 2.612,480 ms
# testing radix conversion ...
1000: 1366 in 0,018 ms
10000: 13561 in 0,619 ms
100000: 135178 in 60,618 ms
1000000: 1351546 in 5.944,242 ms
# testing BigInteger + LINQ ...
1000: 1366 in 0,021 ms
10000: 13561 in 0,737 ms
100000: 135178 in 69,331 ms
1000000: 1351546 in 6.723,880 ms
As you can see, the radix conversion is almost as slow as the solution using the builtin BigInteger class. The reason is that the runtime is of the newer type that performs certain standard optimisations only for signed integer types but not for unsigned ones (here: implementing division by a constant as multiplication with the inverse).
I haven't found an easy means of inspecting the native code for existing .Net assemblies, so I decided on a different path of investigation: I coded a variant of E16_RadixConversion for comparison where ulong and uint were replaced by long and int respectively, and BITS_PER_WORD decreased by 1 accordingly. Here are the timings:
# testing radix conv Int63 ...
1000: 1366 in 0,004 ms
10000: 13561 in 0,202 ms
100000: 135178 in 18,414 ms
1000000: 1351546 in 1.834,305 ms
More than three times as fast as the version that uses unsigned types! Clear evidence of numbskullery in the compiler...
In order to showcase the effect of different limb sizes I templated the solutions in C++ on the unsigned integer types used as limbs. The timings are prefixed with the byte size of a limb and the number of decimal digits in a limb, separated by a colon. There is no timing for the often-seen case of manipulating digit characters in strings, but it is safe to say that such code will take at least twice as long as the code that uses double digits in byte-sized limbs.
# E16_DecimalDoubling
[1:02] e = 1000 -> 1366 0.308 ms
[2:04] e = 1000 -> 1366 0.152 ms
[4:09] e = 1000 -> 1366 0.070 ms
[8:18] e = 1000 -> 1366 0.071 ms
[1:02] e = 10000 -> 13561 30.533 ms
[2:04] e = 10000 -> 13561 13.791 ms
[4:09] e = 10000 -> 13561 6.436 ms
[8:18] e = 10000 -> 13561 2.996 ms
[1:02] e = 100000 -> 135178 2719.600 ms
[2:04] e = 100000 -> 135178 1340.050 ms
[4:09] e = 100000 -> 135178 588.878 ms
[8:18] e = 100000 -> 135178 290.721 ms
[8:18] e = 1000000 -> 1351546 28823.330 ms
For the exponent of 10^6 there is only the timing with 64-bit limbs, since I didn't have the patience to wait many minutes for full results. The picture is similar for radix conversion, except that there is no row for 64-bit limbs because my compiler does not have a native 128-bit integral type.
# E16_RadixConversion
[1:02] e = 1000 -> 1366 0.080 ms
[2:04] e = 1000 -> 1366 0.026 ms
[4:09] e = 1000 -> 1366 0.048 ms
[1:02] e = 10000 -> 13561 4.537 ms
[2:04] e = 10000 -> 13561 0.746 ms
[4:09] e = 10000 -> 13561 0.243 ms
[1:02] e = 100000 -> 135178 445.092 ms
[2:04] e = 100000 -> 135178 68.600 ms
[4:09] e = 100000 -> 135178 19.344 ms
[4:09] e = 1000000 -> 1351546 1925.564 ms
The interesting thing is that simply compiling the code as C++ doesn't make it any faster - i.e., the optimiser couldn't find any low-hanging fruit that the C# jitter missed, apart from not toeing the line with regard to penalising unsigned integers. That's the reason why I like prototyping in C# - performance in the same ballpark as (unoptimised) C++ and none of the hassle.
Here's the meat of the C++ version (sans reams of boring stuff like helper templates and so on) so that you can see that I didn't cheat to make C# look better:
template<typename W>
struct E16_RadixConversion
{
typedef W limb_t;
typedef typename detail::E16_traits<W>::long_t long_t;
static unsigned const BITS_PER_WORD = sizeof(limb_t) * CHAR_BIT;
static unsigned const RADIX_DIGITS = std::numeric_limits<limb_t>::digits10;
static limb_t const RADIX = detail::pow10_t<limb_t, RADIX_DIGITS>::RESULT;
static unsigned digit_sum_for_power_of_2 (unsigned e)
{
std::vector<limb_t> digits;
compute_digits_for_power_of_2(e, digits);
return digit_sum(digits);
}
static void compute_digits_for_power_of_2 (unsigned e, std::vector<limb_t> &result)
{
assert(e > 0);
unsigned total_digits = unsigned(std::ceil(std::log10(2) * e));
unsigned total_limbs = (total_digits + RADIX_DIGITS - 1) / RADIX_DIGITS;
result.resize(0);
result.reserve(total_limbs);
std::vector<limb_t> bin((e + BITS_PER_WORD) / BITS_PER_WORD);
bin.back() = limb_t(limb_t(1) << (e % BITS_PER_WORD));
while (!bin.empty())
{
long_t rest = 0;
for (std::size_t i = bin.size(); i-- > 0; )
{
long_t temp = (rest << BITS_PER_WORD) | bin[i];
long_t quot = temp / RADIX;
rest = temp - quot * RADIX;
bin[i] = limb_t(quot);
}
result.push_back(limb_t(rest));
if (bin.back() == 0)
bin.pop_back();
}
}
};
Conclusion
These benchmarks also show that this Euler task - like many others - seems designed to be solved on a ZX81 or an Apple ][, not on our modern toys that are a million times as powerful. There's no challenge involved here unless the limits are increased drastically (an exponent of 10^5 or 10^6 would be much more adequate).
A good overview of the practical state of the art can be got from GMP's overview of algorithms. Another excellent overview of the algorithms is chapter 1 of "Modern Computer Arithmetic" by Richard Brent and Paul Zimmermann. It contains exactly what one needs to know for coding challenges and competitions, but unfortunately the depth is not equal to that of Donald Knuth's treatment in "The Art of Computer Programming".
The radix conversion solution adds a useful technique to one's code-challenge toolchest, since the given code can be trivially extended to convert any old big integer instead of only the bit pattern 1 << exponent. The repeated squaring solution can be similarly useful, since changing the sample code to power something other than 2 is again trivial.
The approach of performing computations directly in powers of 10 can be useful for challenges where decimal results are required, because performance is in the same ballpark as native computation but there is no need for a separate conversion step (which can require similar amounts of time as the actual computation).
