I have a large binary file that is around 70 MB in size. In my program, I have a method that looks up byte[] array patterns against the file, to see whether they exist within the file or not. I have around 1 to 10 million patterns to run against the file. The options I see are the following:
Read the file into memory by doing byte[] file = File.ReadAllBytes(path) then perform byte[] lookup of byte[] pattern(s) against the file bytes. I have used multiple methods for doing that from different topics on SO such as:
byte[] array pattern search
Find an array (byte[]) inside another array?
Best way to check if a byte[] is contained in another byte[] Though, byte[] versus byte[] lookups are extremely slow when the source is large in size. It would take weeks to run 1 million patterns on normal computers.
Convert both the file and patterns into hex strings, then do the comparisons using the Contains() method to perform the lookup. This one is faster than byte[] lookups, but converting bytes to hex results in the file being larger in memory, which results in more processing time.
Convert both the file and patterns into strings using Encoding.GetEncoding(1252).GetString() and perform the lookups. Then, compensate for the limitation of binary-to-string conversion (I know they're incompatible) by running the matches of Contains() through another method which performs byte[] lookups (the first option). This one is the fastest option for me.
Using the third approach, which is the fastest, 1 million patterns would take two-thirds of a day to a day, depending on the CPU. I need information on how to speed up the lookups.
Thank you.
Edit: Thanks to @MySkullCaveIsADarkPlace, I now have a fourth approach which is faster than the three above. I was using suboptimal byte[] lookup algorithms; I am now using the MemoryExtensions.IndexOf() byte[] lookup method, which is slightly faster than the three approaches above. Even so, the lookups are still slow: it takes 1 minute for 1,000 pattern lookups.
The patterns are 12-20 bytes each.
I assume that you are looking up one pattern after the other. I.e., you are doing 1 to 10 million pattern searches at every position in the file!
Consider doing it the other way round. Loop once through your file bytes and determine if the current position is the start of a pattern.
To do this efficiently, I suggest organizing the patterns in an array of lists of patterns. Each pattern is stored in a list at array index 256 * byte[0] + byte[1].
With 10 million patterns you will have an average of 152 patterns in the lists at each array position. This allows a fast lookup.
You could also use the first 3 bytes (256 * (256 * byte[0] + byte[1]) + byte[2]), resulting in an array of length 256^3 ≈ 16 million (I have worked with longer arrays; no problem for C#). Then you would have less than one pattern per array position on average. This would result in a nearly linear search time O(n) with respect to the file length. A huge improvement compared to the quadratic O(num_of_patterns * file_length) for a straightforward algorithm.
We can use a simple byte-by-byte comparison to compare the patterns, since we can compare starting at a known position. (Boyer-Moore is of no use here.)
2 bytes index (patterns must be at least 2 bytes long)
byte[] file = { 23, 36, 43, 76, 125, 56, 34, 234, 12, 3, 5, 76, 8, 0, 6, 125, 234, 56, 211, 122, 22, 4, 7, 89, 76, 64, 12, 3, 5, 76, 8, 0, 6, 125 };
byte[][] patterns = {
    new byte[]{ 12, 3, 5, 76, 8, 0, 6, 125, 11 },
    new byte[]{ 211, 122, 22, 4 },
    new byte[]{ 17, 211, 5, 8 },
    new byte[]{ 22, 4, 7, 89, 76, 64 },
};

var patternMatrix = new List<byte[]>[256 * 256];

// Add patterns to matrix.
// We assume pattern.Length >= 2.
foreach (byte[] pattern in patterns) {
    int index = 256 * pattern[0] + pattern[1];
    patternMatrix[index] ??= new List<byte[]>(); // Ensure we have a list.
    patternMatrix[index].Add(pattern);
}

// The search. Loop through the file.
for (int fileIndex = 0; fileIndex < file.Length - 1; fileIndex++) { // Length - 1 because we need 2 bytes.
    int patternIndex = 256 * file[fileIndex] + file[fileIndex + 1];
    List<byte[]> candidatePatterns = patternMatrix[patternIndex];
    if (candidatePatterns != null) {
        foreach (byte[] candidate in candidatePatterns) {
            if (fileIndex + candidate.Length <= file.Length) {
                bool found = true;
                // We know that the first 2 bytes are matching,
                // so let's start at the 3rd.
                for (int i = 2; i < candidate.Length; i++) {
                    if (candidate[i] != file[fileIndex + i]) {
                        found = false;
                        break;
                    }
                }
                if (found) {
                    Console.WriteLine($"pattern {{{candidate[0]}, {candidate[1]} ..}} found at file index {fileIndex}");
                }
            }
        }
    }
}
Same algorithm with 3 bytes (even faster!)
3 bytes index (patterns must be at least 3 bytes long)
var patternMatrix = new List<byte[]>[256 * 256 * 256];

// Add patterns to matrix.
// We assume pattern.Length >= 3.
foreach (byte[] pattern in patterns) {
    int index = 256 * 256 * pattern[0] + 256 * pattern[1] + pattern[2];
    patternMatrix[index] ??= new List<byte[]>(); // Ensure we have a list.
    patternMatrix[index].Add(pattern);
}

// The search. Loop through the file.
for (int fileIndex = 0; fileIndex < file.Length - 2; fileIndex++) { // Length - 2 because we need 3 bytes.
    int patternIndex = 256 * 256 * file[fileIndex] + 256 * file[fileIndex + 1] + file[fileIndex + 2];
    List<byte[]> candidatePatterns = patternMatrix[patternIndex];
    if (candidatePatterns != null) {
        foreach (byte[] candidate in candidatePatterns) {
            if (fileIndex + candidate.Length <= file.Length) {
                bool found = true;
                // We know that the first 3 bytes are matching,
                // so let's start at the 4th.
                for (int i = 3; i < candidate.Length; i++) {
                    if (candidate[i] != file[fileIndex + i]) {
                        found = false;
                        break;
                    }
                }
                if (found) {
                    Console.WriteLine($"pattern {{{candidate[0]}, {candidate[1]} ..}} found at file index {fileIndex}");
                }
            }
        }
    }
}
Why is it faster?
A simple nested-loops algorithm performs up to ~ 70·10^6 * 10^7 = 7·10^14 (700 trillion) pattern comparisons! 70·10^6 is the length of the file. 10^7 is the number of patterns.

My algorithm with a 2-byte index makes ~ 70·10^6 * 152 ≈ 10^10 pattern comparisons. The number 152 comes from the fact that there are on average 152 patterns for a given 2-byte index, ~ 10^7/(256 * 256). This is 65,536 times faster.

With 3 bytes you get down to about 70·10^6 pattern comparisons. This is more than 10 million times faster. This is the case because we store all the patterns in an array whose length (16 million) is greater than the number of patterns (10 million or less). Therefore, at any byte position plus the 2 following positions within the file, we can pick up only the patterns starting with the same 3 bytes. And this is on average less than one pattern. Sometimes there may be 0 or 1, sometimes 2 or 3, but rarely more patterns at any array position.

Try it. The shift is from O(n^2) to near O(n). The initialization time is O(n). The assumption is that the first 2 or 3 bytes of the patterns are more or less randomly distributed. If this is not the case, my algorithm will degrade to O(n^2) in the worst case.

Okay, that's the theory. Since the 3-byte index version is slower at initialization, it may only have an advantage with huge data sets. Other improvements could be made by using Span<byte>.
See: Big O notation - Wikipedia.
One idea is to group the patterns by their length, put each group in a HashSet<byte[]> for searching with O(1) complexity, and then scan the source byte[] index by index for all groups. Since the number of groups in your case is small (only 9 groups), this optimization should yield significant performance improvements. Here is an implementation:
IEnumerable<byte[]> FindMatches(byte[] source, byte[][] patterns)
{
    Dictionary<int, HashSet<ArraySegment<byte>>> buckets = new();
    ArraySegmentComparer comparer = new();
    foreach (byte[] pattern in patterns)
    {
        HashSet<ArraySegment<byte>> bucket;
        if (!buckets.TryGetValue(pattern.Length, out bucket))
        {
            bucket = new(comparer);
            buckets.Add(pattern.Length, bucket);
        }
        bucket.Add(pattern); // Implicit cast byte[] => ArraySegment<byte>
    }
    for (int i = 0; i < source.Length; i++)
    {
        foreach (var (length, bucket) in buckets)
        {
            if (i + length > source.Length) continue;
            ArraySegment<byte> slice = new(source, i, length);
            if (bucket.TryGetValue(slice, out var pattern))
            {
                yield return pattern.Array;
                bucket.Remove(slice);
            }
        }
    }
}
Currently (.NET 6) there is no equality comparer for sequences available in the standard libraries, so you'll have to provide a custom one:
class ArraySegmentComparer : IEqualityComparer<ArraySegment<byte>>
{
    public bool Equals(ArraySegment<byte> x, ArraySegment<byte> y)
    {
        return x.AsSpan().SequenceEqual(y);
    }

    public int GetHashCode(ArraySegment<byte> obj)
    {
        HashCode hashcode = new();
        hashcode.AddBytes(obj);
        return hashcode.ToHashCode();
    }
}
This algorithm assumes that there are no duplicates in the patterns. If that's not the case, only one of the duplicates will be emitted.

On my (not very speedy) PC this algorithm takes around 10 seconds to create the buckets dictionary (for 10,000,000 patterns of size 12-20), and then an additional 5-6 minutes to scan a source byte[] of size 70,000,000 (around 200,000 bytes per second). The number of patterns does not affect the scanning phase (as long as the number of groups does not increase).
Parallelizing this algorithm is not trivial, because the buckets are mutated during the scan.
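For comparison, the same bucketing idea sketched in Java (class and method names are mine); java.nio.ByteBuffer compares by content, so buffer views over the source can serve as hash keys without copying:

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class BucketScan {
    // Group patterns by length; at each file offset, probe one HashSet per length.
    static List<byte[]> findMatches(byte[] source, byte[][] patterns) {
        Map<Integer, Set<ByteBuffer>> buckets = new HashMap<>();
        for (byte[] p : patterns) {
            buckets.computeIfAbsent(p.length, k -> new HashSet<>()).add(ByteBuffer.wrap(p));
        }
        List<byte[]> found = new ArrayList<>();
        for (int i = 0; i < source.length; i++) {
            for (Map.Entry<Integer, Set<ByteBuffer>> entry : buckets.entrySet()) {
                int len = entry.getKey();
                if (i + len > source.length) continue;
                // A view over source[i, i+len); equals/hashCode use only the remaining bytes.
                ByteBuffer slice = ByteBuffer.wrap(source, i, len);
                if (entry.getValue().contains(slice)) {
                    found.add(Arrays.copyOfRange(source, i, i + len));
                }
            }
        }
        return found;
    }
}
```

This keeps the same O(groups) work per offset as the C# version; only the content-based key type differs.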
I am trying to move a trained model into a production environment and have encountered an issue trying to replicate the behavior of the Keras hashing_trick() function in C#. When I go to encode the sentence my output is different in C# than it is in python:
Text: "Information - The configuration processing is completed."
Python: [ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 217 142 262 113 319 413]
C#: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 433, 426, 425, 461, 336, 146, 52]
(copied from debugger, both sequences have length 30)
What I've tried:
changing the encoding of the text bytes in C# to match the python string.encode() function default (UTF8)
Changing capitalization of letters to lowercase and upper case
Tried using Convert.ToUInt32 instead of BitConverter (resulted in overflow error)
My code (below) is my implementation of the Keras hashing_trick function. A single input sentence is given and then the function will return the corresponding encoded sequence.
public uint[] HashingTrick(string data)
{
    const int VOCAB_SIZE = 534; // Determined through python debugging of model
    var filters = "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n".ToCharArray().ToList();
    filters.ForEach(x =>
    {
        data = data.Replace(x, '\0');
    });
    string[] parts = data.Split(' ');
    var encoded = new List<uint>();
    parts.ToList().ForEach(x =>
    {
        using (System.Security.Cryptography.MD5 md5 = System.Security.Cryptography.MD5.Create())
        {
            byte[] inputBytes = System.Text.Encoding.UTF8.GetBytes(x);
            byte[] hashBytes = md5.ComputeHash(inputBytes);
            uint val = BitConverter.ToUInt32(hashBytes, 0);
            encoded.Add(val % (VOCAB_SIZE - 1) + 1);
        }
    });
    return PadSequence(encoded, 30);
}
private uint[] PadSequence(List<uint> seq, int maxLen)
{
    if (seq.Count < maxLen)
    {
        while (seq.Count < maxLen)
        {
            seq.Insert(0, 0);
        }
        return seq.ToArray();
    }
    else if (seq.Count > maxLen)
    {
        return seq.GetRange(seq.Count - maxLen - 1, maxLen).ToArray();
    }
    else
    {
        return seq.ToArray();
    }
}
The keras implementation of the hashing trick can be found here
If it helps, I am using an ASP.NET Web API as my solution type.
The biggest problem with your code is that it fails to account for the fact that Python's int is an arbitrary precision integer, while C#'s uint has only 32 bits. This means that Python is calculating the modulo over all 128 bits of the hash, while C# is not (and BitConverter.ToUInt32 is the wrong thing to do in any case, as the endianness is wrong). The other problem that trips you up is that \0 does not terminate strings in C#, and \0 can't just be added to an MD5 hash without changing the outcome.
Translated in as straightforward a manner as possible:
int[] hashingTrick(string text, int n, string filters, bool lower, string split) {
    var splitWords = String.Join("", text.Where(c => !filters.Contains(c)))
        .Split(new[] { split }, StringSplitOptions.RemoveEmptyEntries);
    return (
        from word in splitWords
        let bytes = Encoding.UTF8.GetBytes(lower ? word.ToLower() : word)
        let hash = MD5.Create().ComputeHash(bytes)
        // add a 0 byte to force a non-negative result, per the BigInteger docs
        let w = new BigInteger(hash.Reverse().Concat(new byte[] { 0 }).ToArray())
        select (int) (w % (n - 1) + 1)
    ).ToArray();
}
Sample use:
const int vocabSize = 534;
Console.WriteLine(String.Join(" ",
    hashingTrick(
        text: "Information - The configuration processing is completed.",
        n: vocabSize,
        filters: "!\"#$%&()*+,-./:;<=>?@[\\]^_`{|}~\t\n",
        lower: true,
        split: " "
    ).Select(i => i.ToString())
));
217 142 262 113 319 413
This code has various inefficiencies: filtering characters with LINQ is very inefficient compared to using a StringBuilder, and we don't really need BigInteger here since MD5 is always exactly 128 bits. Optimizing (if necessary) is left as an exercise for the reader, as is padding the outcome (which you already have a function for).
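For instance, since the digest is a fixed 16 bytes, the modulo can be taken a byte at a time instead of via BigInteger; a sketch of that reduction (in Java, with a made-up helper name):

```java
class Md5Mod {
    // Reduce a big-endian byte string modulo m one byte at a time:
    // treat the bytes as base-256 digits, carrying the running remainder.
    // Works because (r * 256 + b) mod m depends only on r mod m.
    static int mod(byte[] bigEndianBytes, int m) {
        long r = 0;
        for (byte b : bigEndianBytes) {
            r = ((r << 8) | (b & 0xFF)) % m; // r < m <= 2^31, so r << 8 cannot overflow a long
        }
        return (int) r;
    }
}
```

Feeding the 16 MD5 bytes through this gives the same result as BigInteger's mod, without any heap allocation.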
Instead of fighting with C# to get the hashing right, I took a different approach to the problem. When making my data set to train the model (this is a machine learning project, after all), I decided to use @Jeroen Mostert's implementation of the hashing function to pre-hash the data set before feeding it into the model.

This solution was much easier to implement and ended up working just as well as the original text hashing. Word of advice for those attempting to do cross-language hashing like me: don't do it, it's a lot of headache! Use one language for hashing your text data and find a way to create a valid data set with all of the information required.
I have a maths issue within my program. I think the problem is simple but I'm not sure what terms to use, hence my own searches returned nothing useful.
I receive some values in a method; the only thing I know (in terms of logic) is that the numbers are predictable. In other words, the numbers I could receive will be one of the following:
1
2
4
16
256
65536
etc
I need to know at what index they appear. In other words, 1 is always at index 0, 2 at index 1, 4 at index 2, 16 at index 3, 256 at index 4, etc.

I know I could write a big switch statement, but I was hoping a formula would be tidier. Do you know if one exists, or any clues as to the names of the maths formulas I'm using?
The numbers you listed are powers of two. The inverse function of raising a number to a power is the logarithm, so that's what you use to go backwards from (using your terminology here) a number to an index.
var num = 256;
var ind = Math.Log(num, 2);
Above, ind is the base-2 logarithm of num. This code will work for any base; just substitute that base for 2. If you are only going to be working with powers of 2 then you can use a special-case solution that is faster based on the bitwise representation of your input; see What's the quickest way to compute log2 of an integer in C#?
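For example, when the input is known to be an exact power of two, the bit-twiddling route mentioned above boils down to locating the single set bit (sketched in Java; recent .NET versions expose a similar BitOperations.Log2):

```java
class Log2 {
    // For an exact power of two, the base-2 logarithm is the position of
    // the single set bit, i.e. the number of trailing zero bits.
    static int log2(long powerOfTwo) {
        return Long.numberOfTrailingZeros(powerOfTwo);
    }
}
```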
Try
Math.Log(num, base)
where base is 2
MSDN: http://msdn.microsoft.com/en-us/library/hd50b6h5.aspx
The logarithm will give you the power of the base for your number. But that's only the case if your number really is a power of 2; otherwise you have to understand exactly what you have and what you need.

It also looks like the numbers were raised to the power of 2 twice, so try this:
private static int getIndexOfSeries(UInt64 aNum)
{
    if (aNum == 1)
        return 0;
    else if (aNum == 2)
        return 1;
    else
    {
        int lNum = (int)Math.Log(aNum, 2);
        return 1 + (int)Math.Log(lNum, 2);
    }
}
Result for UInt64[] Arr = new UInt64[] { 1, 2, 4, 16, 256, 65536, 4294967296 } is:
Num[0] = 1
Num[1] = 2
Num[2] = 4
Num[3] = 16
Num[4] = 256
Num[5] = 65536
Num[6] = 4294967296 //65536*65536
where [i] - index
You should calculate the base-2 logarithm of the number.

Hint: For the results:

x = 0 → 2
x = 1 → 4
x = 2 → 16
x = 3 → 256
x = 4 → 65536
x = 5 → 4294967296

etc.
The formula is, for a given integer x:

Math.Pow(2, Math.Pow(2, x));

that is, 2 to the power (2 to the power (x)).
Once the formula is known, one could solve it for x (I won't go through that since you already got an answer).
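For completeness, the inversion can also be done with integer bit tricks rather than floating-point logs; a sketch in Java (names are mine):

```java
class SeriesIndex {
    // The series is 1, 2, 4, 16, 256, 65536, ... i.e. value(x) = 2^(2^(x-1)) for x >= 1.
    // Inverting: one log2 recovers the inner exponent, a second log2 (plus 1) the index.
    static int indexOf(long num) {
        if (num == 1) return 0;
        int exponent = 63 - Long.numberOfLeadingZeros(num);     // log2(num)
        if (exponent == 1) return 1;                            // num == 2
        return 1 + (63 - Long.numberOfLeadingZeros(exponent));  // 1 + log2(exponent)
    }
}
```

Using bit counts avoids any rounding worries that Math.Log can introduce for large inputs.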
I have a task where I need to look up sequences in a file. When doing a test application, I read the file as a string (File.ReadAllText) and used string.IndexOf to look up a sequence. When I tried to implement the same algorithm with bytes (reading the file as a byte array and looking up a byte array in a byte array), I noticed that looking for byte[] in byte[] is about 3 times as slow as looking for string in string. I made sure to check it thoroughly, and exactly the same code, one using byte[] and the other using string, takes 3 times as long to execute - around 16 s for bytes vs ~5 s for strings.
For looking up byte arrays, I used ways described here byte[] array pattern search. For looking up strings, I used built-in IndexOf function of string class. Here is one of the implementations of IndexOf for byte[] I tried:
public int IndexOf(byte[] source, byte[] pattern, int startpos = 0)
{
    int search_limit = source.Length - pattern.Length;
    for (int i = startpos; i < search_limit; i++)
    {
        if (source[i] == pattern[0])
        {
            bool found = true;
            for (int j = 1; j < pattern.Length; j++)
            {
                if (source[i + j] != pattern[j])
                {
                    found = false;
                    break;
                }
            }
            if (found)
                return i;
        }
    }
    return -1;
}
Basically, looking up the next match for a sequence of bytes in a byte array takes three times as long as looking up the next match for a sequence of chars in a string. My question is - WHY?
Does anyone know how .Net handles looking up chars in string, what kind of optimisation it does, what algorithm it uses? Is there a faster algorithm than the one I'm using here? Maybe someone has an idea what I'm doing wrong here so that it takes more time than it should? I really do not understand how looking up string in string can be 3 times as fast as looking up byte[] in byte[]...
UPDATE: I have tried the unsafe algorithm as suggested. It was as follows:
public static unsafe long IndexOfFast(byte[] Haystack, byte[] Needle, long startpos = 0)
{
    long i = startpos;
    fixed (byte* H = Haystack) fixed (byte* N = Needle)
    {
        for (byte* hNext = H + startpos, hEnd = H + Haystack.LongLength; hNext < hEnd; i++, hNext++)
        {
            bool Found = true;
            for (byte* hInc = hNext, nInc = N, nEnd = N + Needle.LongLength; Found && nInc < nEnd; Found = *nInc == *hInc, nInc++, hInc++) ;
            if (Found) return i;
        }
        return -1;
    }
}
The strange thing is, it actually proved to be twice as slow! I changed it to add my personal tweak (checking the first letter before attempting to iterate through the needle) and it looks like this now:
public static unsafe long IndexOfFast(byte[] Haystack, byte[] Needle, long startpos = 0)
{
    long i = startpos;
    fixed (byte* H = Haystack) fixed (byte* N = Needle)
    {
        for (byte* hNext = H + startpos, hEnd = H + Haystack.LongLength; hNext < hEnd; i++, hNext++)
        {
            if (*hNext == *N)
            {
                bool Found = true;
                for (byte* hInc = hNext + 1, nInc = N + 1, nEnd = N + Needle.LongLength; Found && nInc < nEnd; Found = *nInc == *hInc, nInc++, hInc++) ;
                if (Found) return i;
            }
        }
        return -1;
    }
}
Now, it takes exactly the same time to execute as the safe one. My question is again - any ideas why? Shouldn't it be faster than the safe version, since it's unsafe and operates with pointers?

Basically, looking up the next match for a sequence of bytes in a byte array takes three times as long as looking up the next match for a sequence of chars in a string. My question is - WHY?
Because the string search algorithm has been heavily optimized; it is written in tight unmanaged code which does not spend time checking array bounds. If you were to optimize your byte search algorithm in the same way, it would be just as fast; the string search algorithm uses the same naive algorithm that you are using.
Your algorithm is fine -- this is the standard "naive" search, and Kevin's claims notwithstanding, the naive algorithm is in practice almost always the fastest on typical data. Blazing through an array looking for a byte is incredibly fast on modern hardware. It depends on the size of your problem; if you're looking for a long DNA string in the middle of the human genome then Boyer-Moore is totally worth the expense of preprocessing. If you're looking for 0xDEADBEEF in a twenty KB file then you're not going to beat the naive algorithm if it is efficiently implemented.
For maximum speed what you should do here is turn off the safety system and write the code using unsafe pointer arithmetic.
Your search algorithm for bytes is extremely inefficient!
The baseline algorithm which all other string searches are compared to is Boyer-Moore. I would bet .NET string searches use it or a variation of it. There are others also but implementing Boyer-Moore for bytes will give you much better performance.
Edit: SO to the rescue! Here is a simple C# implementation of Boyer-Moore for byte arrays
Edit with timing numbers:
Eric's comments got me very interested, because I've always heard that .NET string searches use Boyer-Moore. I really appreciated someone who actually knew telling me otherwise; it makes perfect sense after thinking about it. I decided to time Boyer-Moore vs the naive byte search and found out Eric is absolutely correct: for a small needle and small haystack, the naive search is over 13x faster. What I was surprised by, though, was that the "break even" point was much lower than I expected. Boyer-Moore improves significantly with pattern size (or needle size in my timings), so the larger the pattern you are looking for, the faster it searches. For an 8-byte needle, naive search vs Boyer-Moore were tied searching through a 500-600 byte haystack. For a larger haystack Boyer-Moore gets significantly better, especially with a larger needle. For a 20 KB haystack and a 64-byte needle, Boyer-Moore was 10x faster.
Full timing numbers are below for anyone interested.
All tests used the simple Boyer-Moore implementation linked above and the OP's naive byte search algorithm as posted, doing 1M search iterations.
1000000 iterations, haystack size = 20 bytes, needle size = 8 bytes
20ms total : Naive Search
268ms total : Boyer-Moore Search
1000000 iterations, haystack size = 600 bytes, needle size = 8 bytes
608ms total : Naive Search
582ms total : Boyer-Moore Search
1000000 iterations, haystack size = 2000 bytes, needle size = 8 bytes
2011ms total : Naive Search
1339ms total : Boyer-Moore Search
1000000 iterations, haystack size = 2000 bytes, needle size = 32 bytes
1865ms total : Naive Search
563ms total : Boyer-Moore Search
1000000 iterations, haystack size = 2000 bytes, needle size = 64 bytes
1883ms total : Naive Search
466ms total : Boyer-Moore Search
1000000 iterations, haystack size = 20000 bytes, needle size = 8 bytes
18899ms total : Naive Search
10753ms total : Boyer-Moore Search
1000000 iterations, haystack size = 20000 bytes, needle size = 32 bytes
18639ms total : Naive Search
3114ms total : Boyer-Moore Search
1000000 iterations, haystack size = 20000 bytes, needle size = 64 bytes
18866ms total : Naive Search
1807ms total : Boyer-Moore Search
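For reference, the Boyer-Moore-Horspool simplification commonly used in such timings fits in a few lines; a sketch for byte arrays (Java shown, not the exact implementation linked above):

```java
import java.util.Arrays;

class Horspool {
    // Boyer-Moore-Horspool: precompute, for each byte value, how far the
    // search window can shift when that byte sits under the window's last position.
    static int indexOf(byte[] haystack, byte[] needle) {
        if (needle.length == 0) return 0;
        int[] shift = new int[256];
        Arrays.fill(shift, needle.length);
        for (int i = 0; i < needle.length - 1; i++) {
            shift[needle[i] & 0xFF] = needle.length - 1 - i;
        }
        int pos = 0;
        while (pos <= haystack.length - needle.length) {
            int j = needle.length - 1;
            while (j >= 0 && haystack[pos + j] == needle[j]) j--;
            if (j < 0) return pos; // full match
            pos += shift[haystack[pos + needle.length - 1] & 0xFF];
        }
        return -1;
    }
}
```

The longer the needle, the larger the typical shift, which is exactly the behavior the timings above show.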
I'm looking to use a rolling hash function so I can take hashes of n-grams of a very large string.
For example:
"stackoverflow", broken up into 5 grams would be:
"stack", "tacko", "ackov", "ckove",
"kover", "overf", "verfl", "erflo", "rflow"
This is ideal for a rolling hash function because after I calculate the first n-gram hash, the following ones are relatively cheap to calculate because I simply have to drop the first letter of the first hash and add the new last letter of the second hash.
I know that in general this hash function is generated as:
H = c1*a^(k-1) + c2*a^(k-2) + c3*a^(k-3) + ... + ck*a^0, where a is a constant and c1, ..., ck are the input characters.

If you follow this link on the Rabin-Karp string search algorithm, it states that "a" is usually some large prime.
I want my hashes to be stored in 32 bit integers, so how large of a prime should "a" be, such that I don't overflow my integer?
Does there exist an existing implementation of this hash function somewhere that I could already use?
Here is an implementation I created:
public class hash2
{
    public int prime = 101;

    public int hash(String text)
    {
        int hash = 0;
        for (int i = 0; i < text.length(); i++)
        {
            char c = text.charAt(i);
            hash += c * (int) (Math.pow(prime, text.length() - 1 - i));
        }
        return hash;
    }

    public int rollHash(int previousHash, String previousText, String currentText)
    {
        char firstChar = previousText.charAt(0);
        char lastChar = currentText.charAt(currentText.length() - 1);
        int firstCharHash = firstChar * (int) (Math.pow(prime, previousText.length() - 1));
        int hash = (previousHash - firstCharHash) * prime + lastChar;
        return hash;
    }

    public static void main(String[] args)
    {
        hash2 hashify = new hash2();
        int firstHash = hashify.hash("mydog");
        System.out.println(firstHash);
        System.out.println(hashify.hash("ydogr"));
        System.out.println(hashify.rollHash(firstHash, "mydog", "ydogr"));
    }
}
I'm using 101 as my prime. Does it matter if my hashes will overflow? I think this is desirable but I'm not sure.
Does this seem like the right way to go about this?
I remember a slightly different implementation which seems to be from one of Sedgewick's algorithms books (they also contain example code - try to look it up). Here's a summary adjusted to 32-bit integers:

You use modulo arithmetic to prevent your integer from overflowing after each operation.

Initially set:

c = text ("stackoverflow")
M = length of the n-grams
d = size of your alphabet (256)
q = a large prime so that (d+1)*q doesn't overflow (8355967 might be a good choice)
dM = d^(M-1) mod q
first calculate the hash value of the first n-gram:
h = 0
for i from 1 to M:
    h = (h*d + c[i]) mod q
and for every following n-gram:

for i from 1 to length(c)-M:
    // first subtract the oldest character
    h = (h + d*q - c[i]*dM) mod q
    // then add the next character
    h = (h*d + c[i+M]) mod q
The reason why you have to add d*q before subtracting the oldest character is that you might otherwise run into negative values, because the previous modulo operation keeps the intermediate values small.

Errors included, but I think you should get the idea. Try to find one of Sedgewick's algorithms books for details, fewer errors and a better description. :)
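A runnable version of that pseudocode, sketched in Java with long arithmetic so the intermediate products cannot overflow (constants follow the description above):

```java
class RollHash {
    static final int D = 256;      // alphabet size
    static final long Q = 8355967; // prime such that (D+1)*Q fits in 32 bits

    // Direct hash of the M bytes starting at 'start'.
    static long hash(byte[] c, int start, int M) {
        long h = 0;
        for (int i = 0; i < M; i++) h = (h * D + (c[start + i] & 0xFF)) % Q;
        return h;
    }

    // Hashes of every M-gram, each derived from the previous one in O(1).
    static long[] rollAll(byte[] c, int M) {
        long dM = 1;
        for (int i = 0; i < M - 1; i++) dM = (dM * D) % Q; // D^(M-1) mod Q
        long[] out = new long[c.length - M + 1];
        long h = hash(c, 0, M);
        out[0] = h;
        for (int i = 0; i < c.length - M; i++) {
            h = (h + D * Q - (c[i] & 0xFF) * dM) % Q; // drop oldest byte; D*Q keeps it non-negative
            h = (h * D + (c[i + M] & 0xFF)) % Q;      // bring in the next byte
            out[i + 1] = h;
        }
        return out;
    }
}
```

Each rolled hash should agree with recomputing the window from scratch, which makes the sketch easy to sanity-check.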
As I understand it, it's a function minimization for:

2^31 - sum(maxchar) * A^k

where maxchar = 62 (for A-Za-z0-9). I've just calculated it in Excel (OO Calc, exactly) :) and the max A it found is 76, or 73 for a prime number.
Not sure what your aim is here, but if you are trying to improve performance, using Math.pow will cost you far more than you save by calculating a rolling hash value.

I suggest you start by keeping it simple and efficient; you will very likely find it is fast enough.
I've been wrestling with Project Euler Problem #16 in C# 2.0. The crux of the question is that you have to calculate and then iterate through each digit in a number that is around 302 digits long (or thereabouts). You then add up these digits to produce the answer.
This presents a problem: C# 2.0 doesn't have a built-in datatype that can handle this sort of calculation precision. I could use a 3rd party library, but that would defeat the purpose of attempting to solve it programmatically without external libraries. I can solve it in Perl; but I'm trying to solve it in C# 2.0 (I'll attempt to use C# 3.0 in my next run-through of the Project Euler questions).
Question
What suggestions (not answers!) do you have for solving project Euler #16 in C# 2.0? What methods would work?
NB: If you decide to post an answer, please prefix your attempt with a blockquote that has ###Spoiler written before it.
A number is a series of digits. A 32-bit unsigned int is 32 binary digits. The string "12345" is a series of 5 digits. Digits can be stored in many ways: as bits, characters, array elements and so on. The largest "native" datatype in C# with complete precision is probably the decimal type (128 bits, 28-29 digits). Just choose your own method of storing digits that allows you to store much bigger numbers.
As for the rest, this will give you a clue:
2^1 = 2
2^2 = 2^1 + 2^1
2^3 = 2^2 + 2^2
Example:
The sum of digits of 2^100000 is 135178
Ran in 4875 ms
The sum of digits of 2^10000 is 13561
Ran in 51 ms
The sum of digits of 2^1000 is 1366
Ran in 2 ms
SPOILER ALERT: Algorithm and solution in C# follows.
Basically, as alluded to, a number is nothing more than an array of digits. This can be represented easily in two ways:
As a string;
As an array of characters or digits.
As others have mentioned, storing the digits in reverse order is actually advisable. It makes the calculations much easier. I tried both of the above methods. I found strings and the character arithmetic irritating (it's easier in C/C++; the syntax is just plain annoying in C#).
The first thing to note is that you can do this with one array. You don't need to allocate more storage at each iteration. As mentioned, you can find a power of 2 by doubling the previous power of 2, so you can find 2^1000 by doubling 1 one thousand times. The doubling can be done in place with the general algorithm:
carry = 0
foreach digit in array
    sum = digit + digit + carry
    if sum >= 10 then
        carry = 1
        sum -= 10
    else
        carry = 0
    end if
    digit = sum
end foreach
This algorithm is basically the same for using a string or an array. At the end you just add up the digits. A naive implementation might add the results into a new array or string with each iteration. Bad idea. Really slows it down. As mentioned, it can be done in place.
But how large should the array be? Well that's easy too. Mathematically you can convert 2^a to 10^f(a) where f(a) is a simple logarithmic conversion and the number of digits you need is the next higher integer from that power of 10. For simplicity, you can just use:
digits required = ceil(power of 2 / 3)
which is a close approximation and sufficient.
Where you can really optimise this is by using larger digits. A 32-bit signed int can store a number between approximately ±2 billion. Since 9 decimal digits fit within a billion, you can use a 32-bit int (signed or unsigned) as a base-one-billion "digit". You can work out how many ints you need, create that array, and that's all the storage you need to run the entire algorithm (130-ish bytes) with everything done in place.
Solution follows (in fairly rough C#):
static void problem16a()
{
    const int limit = 1000;
    int ints = limit / 29;
    int[] number = new int[ints + 1];
    number[0] = 2;
    for (int i = 2; i <= limit; i++)
    {
        doubleNumber(number);
    }
    String text = NumberToString(number);
    Console.WriteLine(text);
    Console.WriteLine("The sum of digits of 2^" + limit + " is " + sumDigits(text));
}

static void doubleNumber(int[] n)
{
    int carry = 0;
    for (int i = 0; i < n.Length; i++)
    {
        n[i] <<= 1;
        n[i] += carry;
        if (n[i] >= 1000000000)
        {
            carry = 1;
            n[i] -= 1000000000;
        }
        else
        {
            carry = 0;
        }
    }
}

static String NumberToString(int[] n)
{
    int i = n.Length;
    while (i > 0 && n[--i] == 0)
        ;
    String ret = "" + n[i--];
    while (i >= 0)
    {
        ret += String.Format("{0:000000000}", n[i--]);
    }
    return ret;
}
I solved this one using C# also, much to my dismay when I discovered that Python can do this in one simple operation.
Your goal is to create an adding machine using arrays of int values.
Spoiler follows
I ended up using an array of int values to simulate an adding machine, but I represented the number backwards - which you can do because the problem only asks for the sum of the digits, so the order of the digits is irrelevant.

What you're essentially doing is doubling the value 1000 times, so you can start with the value 1 stored in the 1st element of the array and keep doubling until a value goes over 10. This is where you will have to keep track of a carry value. The first power of 2 that is over 10 is 16, so after the fourth doubling the elements in the array are 6 and 1.

Now when you loop through the array starting at the 1st value (6), it becomes 12 (so you keep the last digit, and set a carry bit on the next index of the array) - and when that next value is doubled you get 2, plus the 1 for the carry bit, which equals 3. Now you have 2 and 3 in your array, which represents 32.

Continue this process 1000 times and you'll have an array with roughly 300 elements that you can easily add up.
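The doubling-with-carry process described above can be sketched like this (a Python rendition for brevity - the answers themselves use C#; digits are stored least significant first):

```python
def sum_digits_of_2_to_1000():
    # Least-significant digit first, so index 0 is the ones place.
    digits = [1]
    for _ in range(1000):
        carry = 0
        for i in range(len(digits)):
            doubled = digits[i] * 2 + carry
            digits[i] = doubled % 10   # keep the last digit
            carry = doubled // 10      # carry at most 1 to the next place
        if carry:
            digits.append(carry)
    return sum(digits)

print(sum_digits_of_2_to_1000())  # 1366
```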
I have solved this one before, and now I re-solved it using C# 3.0. :)
I just wrote a Multiply extension method that takes an IEnumerable<int> and a multiplier and returns an IEnumerable<int>. (Each int represents a digit, and the first one is the least significant digit.) Then I just created a list with the single item { 1 } and multiplied it by 2 a thousand times. Adding the items in the list is simple with the Sum extension method.
19 lines of code, which runs in 13 ms on my laptop. :)
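The described Multiply method streams digits with a running carry; a rough Python equivalent of that idea (the generator below is a sketch, not the author's 19 lines) might be:

```python
def multiply(digits, multiplier):
    # digits: iterable of decimal digits, least significant first
    carry = 0
    for d in digits:
        carry += d * multiplier
        yield carry % 10   # emit one digit
        carry //= 10       # keep the rest as carry
    while carry:           # flush any remaining high digits
        yield carry % 10
        carry //= 10

number = [1]
for _ in range(1000):
    number = list(multiply(number, 2))
print(sum(number))  # 1366
```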
Pretend you are very young, with square paper. To me, that is like a list of numbers. Then to double it you double each number, then handle any "carries", by subtracting the 10s and adding 1 to the next index. So if the answer is 1366... something like (completely unoptimized):
using System;
using System.Collections.Generic;

class Program {
    static void Carry(List<int> list, int index) {
        while (list[index] > 9) {
            list[index] -= 10;
            if (index == list.Count - 1) list.Add(1);
            else list[index + 1]++;
        }
    }
    static void Main() {
        var digits = new List<int> { 1 }; // 2^0
        for (int power = 1; power <= 1000; power++) {
            for (int digit = 0; digit < digits.Count; digit++) {
                digits[digit] *= 2;
            }
            for (int digit = 0; digit < digits.Count; digit++) {
                Carry(digits, digit);
            }
        }
        digits.Reverse();
        foreach (int i in digits) {
            Console.Write(i);
        }
        Console.WriteLine();
        int sum = 0;
        foreach (int i in digits) sum += i;
        Console.Write("sum: ");
        Console.WriteLine(sum);
    }
}
If you wish to do the primary calculation in C#, you will need some sort of big integer implementation (much like GMP for C/C++). Programming is about using the right tool for the right job. If you cannot find a good big integer library for C#, it's not against the rules to calculate the number in a language like Python, which already has the ability to calculate large numbers. You could then put this number into your C# program via your method of choice and iterate over each character in the number (you will have to store it as a string). For each character, convert it to an integer and add it to your total until you reach the end of the number. If you would like the big integer, I calculated it with Python below. The answer is further down.
Partial Spoiler
10715086071862673209484250490600018105614048117055336074437503883703510511249361
22493198378815695858127594672917553146825187145285692314043598457757469857480393
45677748242309854210746050623711418779541821530464749835819412673987675591655439
46077062914571196477686542167660429831652624386837205668069376
Spoiler Below!
>>> val = str(2**1000)
>>> total = 0
>>> for i in range(0,len(val)): total += int(val[i])
>>> print total
1366
If you've got ruby, you can easily calculate "2**1000" and get it as a string. Should be an easy cut/paste into a string in C#.
Spoiler
In Ruby: (2**1000).to_s.split(//).inject(0){|x,y| x+y.to_i}
spoiler
If you want to see a solution check out my other answer. This is in Java but it's very easy to port to C#.
Here's a clue:
Represent each number with a list. That way you can do basic sums like:
[1,2,3,4,5,6]
+ [4,5]
_____________
[1,2,3,5,0,1]
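That list-based addition might be sketched like so (Python; the function name is mine, and the most-significant-first digit order follows the example above):

```python
def add_digit_lists(a, b):
    # a, b: lists of decimal digits, most significant first, as in the example
    result = []
    carry = 0
    a, b = a[::-1], b[::-1]  # work least-significant first
    for i in range(max(len(a), len(b))):
        s = carry + (a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
        result.append(s % 10)
        carry = s // 10
    if carry:
        result.append(carry)
    return result[::-1]      # back to most-significant first

print(add_digit_lists([1, 2, 3, 4, 5, 6], [4, 5]))  # [1, 2, 3, 5, 0, 1]
```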
One alternative to representing the digits as a sequence of integers is to represent the number in base 2^32 as a list of 32-bit integers, which is what many big integer libraries do. You then have to convert the number to base 10 for output. This doesn't gain you very much for this particular problem - you can write 2^1000 straight away, then have to divide by 10 many times instead of multiplying 2 by itself 1000 times. (Alternatively, since 1000 is 0b1111101000, you could calculate the product of 2^8, 2^32, 2^64, 2^128, 2^256 and 2^512 by repeated squaring, e.g. 2^8 = (((2^2)^2)^2), which requires more space and a multiplication method but far fewer operations.) It is, however, closer to normal big integer use, so you may find it more useful in later problems. For example, if you try to calculate the last ten digits of 28433×2^7830457 + 1 using the digit-per-int method and repeated addition, it may take some time - though in that case you could use modular arithmetic rather than adding strings of millions of digits.
Working solution that I have posted it here as well: http://www.mycoding.net/2012/01/solution-to-project-euler-problem-16/
The code:
import java.math.BigInteger;

public class Euler16 {
    public static void main(String[] args) {
        int power = 1;
        BigInteger expo = new BigInteger("2");
        BigInteger num = new BigInteger("2");
        while (power < 1000) {
            expo = expo.multiply(num);
            power++;
        }
        System.out.println(expo); // Printing the value of 2^1000
        int sum = 0;
        char[] expoarr = expo.toString().toCharArray();
        int max_count = expoarr.length;
        int count = 0;
        while (count < max_count) { // While loop to calculate the sum of digits
            sum = sum + (expoarr[count] - 48);
            count++;
        }
        System.out.println(sum);
    }
}
Euler problem #16 has been discussed many times here, but I could not find an answer that gives a good overview of possible solution approaches, the lay of the land as it were. Here's my attempt at rectifying that.
This overview is intended for people who have already found a solution and want to get a more complete picture. It is basically language-agnostic even though the sample code is C#. There are some usages of features that are not available in C# 2.0 but they are not essential - their purpose is only to get boring stuff out of the way with a minimum of fuss.
Apart from using a ready-made BigInteger library (which doesn't count), straightforward solutions for Euler #16 fall into two fundamental categories: performing calculations natively - i.e. in a base that is a power of two - and converting to decimal in order to get at the digits, or performing the computations directly in a decimal base so that the digits are available without any conversion.
For the latter there are two reasonably simple options:
repeated doubling
powering by repeated squaring
Native Computation + Radix Conversion
This approach is the simplest and its performance exceeds that of naive solutions using .Net's builtin BigInteger type.
The actual computation is trivially achieved: just perform the moral equivalent of 1 << 1000, by storing 1000 binary zeroes and appending a single lone binary 1.
The conversion is also quite simple and can be done by coding the pencil-and-paper division method, with a suitably large choice of 'digit' for efficiency. Variables for intermediate results need to be able to hold two 'digits'; dividing the number of decimal digits that fit in a long by 2 gives 9 decimal digits for the maximum meta-digit (or 'limb', as it is usually called in bignum lore).
class E16_RadixConversion
{
    const int BITS_PER_WORD = sizeof(uint) * 8;
    const uint RADIX = 1000000000; // == 10^9

    public static int digit_sum_for_power_of_2 (int exponent)
    {
        var dec = new List<int>();
        var bin = new uint[(exponent + BITS_PER_WORD) / BITS_PER_WORD];
        int top = bin.Length - 1;

        bin[top] = 1u << (exponent % BITS_PER_WORD);

        while (top >= 0)
        {
            ulong rest = 0;
            for (int i = top; i >= 0; --i)
            {
                ulong temp = (rest << BITS_PER_WORD) | bin[i];
                ulong quot = temp / RADIX; // x64 uses MUL (sometimes), x86 calls a helper function
                rest = temp - quot * RADIX;
                bin[i] = (uint)quot;
            }
            dec.Add((int)rest);
            if (bin[top] == 0)
                --top;
        }

        return E16_Common.digit_sum(dec);
    }
}
I wrote (rest << BITS_PER_WORD) | bin[i] instead of using operator + because that is precisely what is needed here; no 64-bit addition with carry propagation needs to take place. This means that the two operands could be written directly to their separate registers in a register pair, or to fields in an equivalent struct like LARGE_INTEGER.
On 32-bit systems the 64-bit division cannot be inlined as a few CPU instructions, because the compiler cannot know that the algorithm guarantees quotient and remainder to fit into 32-bit registers. Hence the compiler calls a helper function that can handle all eventualities.
These systems may profit from using a smaller limb, i.e. RADIX = 10000 and uint instead of ulong for holding intermediate (double-limb) results. An alternative for languages like C/C++ would be to call a suitable compiler intrinsic that wraps the raw 32-bit by 32-bit to 64-bit multiply (assuming that division by the constant radix is to be implemented by multiplication with the inverse). Conversely, on 64-bit systems the limb size can be increased to 19 digits if the compiler offers a suitable 64-by-64-to-128 bit multiply primitive or allows inline assembler.
Decimal Doubling
Repeated doubling seems to be everyone's favourite, so let's do that next. Variables for intermediate results need to hold one 'digit' plus one carry bit, which gives 18 digits per limb for long. Going to ulong cannot improve things (there's 0.04 bit missing to 19 digits plus carry), and so we might as well stick with long.
On a binary computer, decimal limbs do not coincide with computer word boundaries. That makes it necessary to perform a modulo operation on the limbs during each step of the calculation. Here, this modulo op can be reduced to a subtraction of the modulus in the event of carry, which is faster than performing a division. The branching in the inner loop can be eliminated by bit twiddling but that would be needlessly obscure for a demonstration of the basic algorithm.
class E16_DecimalDoubling
{
    const int DIGITS_PER_LIMB = 18; // == floor(log10(2) * (63 - 1)), b/o carry
    const long LIMB_MODULUS = 1000000000000000000L; // == 10^18

    public static int digit_sum_for_power_of_2 (int power_of_2)
    {
        Trace.Assert(power_of_2 > 0);

        int total_digits = (int)Math.Ceiling(Math.Log10(2) * power_of_2);
        int total_limbs = (total_digits + DIGITS_PER_LIMB - 1) / DIGITS_PER_LIMB;
        var a = new long[total_limbs];
        int limbs = 1;

        a[0] = 2;

        for (int i = 1; i < power_of_2; ++i)
        {
            int carry = 0;
            for (int j = 0; j < limbs; ++j)
            {
                long new_limb = (a[j] << 1) | carry;
                carry = 0;
                if (new_limb >= LIMB_MODULUS)
                {
                    new_limb -= LIMB_MODULUS;
                    carry = 1;
                }
                a[j] = new_limb;
            }
            if (carry != 0)
            {
                a[limbs++] = carry;
            }
        }

        return E16_Common.digit_sum(a);
    }
}
This is just as simple as radix conversion, but except for very small exponents it does not perform anywhere near as well (despite its huge meta-digits of 18 decimal places). The reason is that the code must perform (exponent - 1) doublings, and the work done in each pass corresponds to about half the total number of digits (limbs).
Repeated Squaring
The idea behind powering by repeated squaring is to replace a large number of doublings with a small number of multiplications.
1000 = 2^3 + 2^5 + 2^6 + 2^7 + 2^8 + 2^9
x^1000 = x^(2^3 + 2^5 + 2^6 + 2^7 + 2^8 + 2^9)
x^1000 = x^2^3 * x^2^5 * x^2^6 * x^2^7 * x^2^8 * x^2^9
x^2^3 can be obtained by squaring x three times, x^2^5 by squaring five times, and so on. On a binary computer the decomposition of the exponent into powers of two is readily available because it is the bit pattern representing that number. However, even non-binary computers should be able to test whether a number is odd or even, or to divide a number by two.
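The decomposition can be read straight off the exponent's bit pattern; a quick sanity check in Python:

```python
import math

e = 1000  # == 0b1111101000
# collect 2^bit for every set bit of the exponent
powers_of_two = [1 << bit for bit in range(e.bit_length()) if (e >> bit) & 1]
print(powers_of_two)  # [8, 32, 64, 128, 256, 512]

# x^1000 is the product of x^(2^k) over the set bits; check it for x = 2
assert math.prod(2 ** p for p in powers_of_two) == 2 ** 1000
```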
The multiplication can be done by coding the pencil-and-paper method; here I'm using a helper function that computes one row of a product and adds it into the result at a suitably shifted position, so that the rows of partial products do not need to be stored for a separate addition step later. Intermediate values during computation can be up to two 'digits' in size, so that the limbs can be only half as wide as for repeated doubling (where only one extra bit had to fit in addition to a 'digit').
Note: the radix of the computations is not a power of 2, and so the squarings of 2 cannot be computed by simple shifting here. On the positive side, the code can be used for computing powers of bases other than 2.
class E16_DecimalSquaring
{
    const int DIGITS_PER_LIMB = 9; // language limit 18, half needed for holding the carry
    const int LIMB_MODULUS = 1000000000;

    public static int digit_sum_for_power_of_2 (int e)
    {
        Trace.Assert(e > 0);

        int total_digits = (int)Math.Ceiling(Math.Log10(2) * e);
        int total_limbs = (total_digits + DIGITS_PER_LIMB - 1) / DIGITS_PER_LIMB;
        var squared_power = new List<int>(total_limbs) { 2 };
        var result = new List<int>(total_limbs);

        result.Add((e & 1) == 0 ? 1 : 2);

        while ((e >>= 1) != 0)
        {
            squared_power = multiply(squared_power, squared_power);
            if ((e & 1) == 1)
                result = multiply(result, squared_power);
        }

        return E16_Common.digit_sum(result);
    }

    static List<int> multiply (List<int> lhs, List<int> rhs)
    {
        var result = new List<int>(lhs.Count + rhs.Count);
        resize_to_capacity(result);

        for (int i = 0; i < rhs.Count; ++i)
            addmul_1(result, i, lhs, rhs[i]);

        trim_leading_zero_limbs(result);
        return result;
    }

    static void addmul_1 (List<int> result, int offset, List<int> multiplicand, int multiplier)
    {
        // it is assumed that the caller has sized `result` appropriately before calling this primitive
        Trace.Assert(result.Count >= offset + multiplicand.Count + 1);

        long carry = 0;
        foreach (long limb in multiplicand)
        {
            long temp = result[offset] + limb * multiplier + carry;
            carry = temp / LIMB_MODULUS;
            result[offset++] = (int)(temp - carry * LIMB_MODULUS);
        }
        while (carry != 0)
        {
            long final_temp = result[offset] + carry;
            carry = final_temp / LIMB_MODULUS;
            result[offset++] = (int)(final_temp - carry * LIMB_MODULUS);
        }
    }

    static void resize_to_capacity (List<int> operand)
    {
        operand.AddRange(Enumerable.Repeat(0, operand.Capacity - operand.Count));
    }

    static void trim_leading_zero_limbs (List<int> operand)
    {
        int i = operand.Count;
        while (i > 1 && operand[i - 1] == 0)
            --i;
        operand.RemoveRange(i, operand.Count - i);
    }
}
The efficiency of this approach is roughly on par with radix conversion, but there are specific improvements that apply here. The efficiency of the squaring can be doubled by writing a special squaring routine that exploits the fact that a[i]*b[j] == a[j]*b[i] when a == b, which cuts the number of multiplications in half.
Also, there are methods for computing addition chains that involve fewer operations overall than using the exponent bits for determining the squaring/multiplication schedule.
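A squaring routine exploiting the a[i]*b[j] == a[j]*b[i] symmetry might look like this Python sketch (base-10^9 limbs, least significant first, mirroring the sample code's radix; this is an illustration, not the benchmarked code):

```python
LIMB_MODULUS = 10 ** 9  # base-10^9 limbs, as in the C# sample code

def square(limbs):
    # limbs: base-10^9 digits, least significant first
    out = [0] * (2 * len(limbs))
    for i in range(len(limbs)):
        out[2 * i] += limbs[i] * limbs[i]           # diagonal term appears once
        for j in range(i + 1, len(limbs)):
            out[i + j] += 2 * limbs[i] * limbs[j]   # off-diagonal products: compute once, add twice
    carry = 0
    for k in range(len(out)):                       # propagate carries, normalise limbs
        carry, out[k] = divmod(out[k] + carry, LIMB_MODULUS)
    while len(out) > 1 and out[-1] == 0:            # drop leading zero limbs
        out.pop()
    return out

def to_int(limbs):
    # helper for checking: reassemble the represented integer
    return sum(d * LIMB_MODULUS ** i for i, d in enumerate(limbs))

n = [123456789, 42]  # == 42 * 10^9 + 123456789
assert to_int(square(n)) == to_int(n) ** 2
```

The inner loop runs only for j > i, so roughly half the limb products of a general multiply are computed.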
Helper Code and Benchmarks
The helper code for summing decimal digits in the meta-digits (decimal limbs) produced by the sample code is trivial, but I'm posting it here anyway for your convenience:
internal class E16_Common
{
    internal static int digit_sum (int limb)
    {
        int sum = 0;
        for ( ; limb > 0; limb /= 10)
            sum += limb % 10;
        return sum;
    }

    internal static int digit_sum (long limb)
    {
        const int M1E9 = 1000000000;
        return digit_sum((int)(limb / M1E9)) + digit_sum((int)(limb % M1E9));
    }

    internal static int digit_sum (IEnumerable<int> limbs)
    {
        return limbs.Aggregate(0, (sum, limb) => sum + digit_sum(limb));
    }

    internal static int digit_sum (IEnumerable<long> limbs)
    {
        return limbs.Select((limb) => digit_sum(limb)).Sum();
    }
}
This can be made more efficient in various ways but overall it is not critical.
All three solutions take O(n^2) time where n is the exponent. In other words, they will take a hundred times as long when the exponent grows by a factor of ten. Radix conversion and repeated squaring can both be improved to roughly O(n log n) by employing divide-and-conquer strategies; I doubt whether the doubling scheme can be improved in a similar fashion, but then it was never competitive to begin with.
All three solutions presented here can be used to print the actual results, by stringifying the meta-digits with suitable padding and concatenating them. I've coded the functions as returning the digit sum instead of the arrays/lists with decimal limbs only in order to keep the sample code simple and to ensure that all functions have the same signature, for benchmarking.
In these benchmarks, the .Net BigInteger type was wrapped like this:
static int digit_sum_via_BigInteger (int power_of_2)
{
    return System.Numerics.BigInteger.Pow(2, power_of_2)
        .ToString()
        .ToCharArray()
        .Select((c) => (int)c - '0')
        .Sum();
}
Finally, the benchmarks for the C# code:
# testing decimal doubling ...
1000: 1366 in 0,052 ms
10000: 13561 in 3,485 ms
100000: 135178 in 339,530 ms
1000000: 1351546 in 33.505,348 ms
# testing decimal squaring ...
1000: 1366 in 0,023 ms
10000: 13561 in 0,299 ms
100000: 135178 in 24,610 ms
1000000: 1351546 in 2.612,480 ms
# testing radix conversion ...
1000: 1366 in 0,018 ms
10000: 13561 in 0,619 ms
100000: 135178 in 60,618 ms
1000000: 1351546 in 5.944,242 ms
# testing BigInteger + LINQ ...
1000: 1366 in 0,021 ms
10000: 13561 in 0,737 ms
100000: 135178 in 69,331 ms
1000000: 1351546 in 6.723,880 ms
As you can see, the radix conversion is almost as slow as the solution using the builtin BigInteger class. The reason is that the runtime is of the newer type that performs certain standard optimisations only for signed integer types but not for unsigned ones (here: implementing division by a constant as multiplication with the inverse).
I haven't found an easy means of inspecting the native code for existing .Net assemblies, so I decided on a different path of investigation: I coded a variant of E16_RadixConversion for comparison where ulong and uint were replaced by long and int respectively, and BITS_PER_WORD decreased by 1 accordingly. Here are the timings:
# testing radix conv Int63 ...
1000: 1366 in 0,004 ms
10000: 13561 in 0,202 ms
100000: 135178 in 18,414 ms
1000000: 1351546 in 1.834,305 ms
More than three times as fast as the version that uses unsigned types! Clear evidence of numbskullery in the compiler...
In order to showcase the effect of different limb sizes I templated the solutions in C++ on the unsigned integer types used as limbs. The timings are prefixed with the byte size of a limb and the number of decimal digits in a limb, separated by a colon. There is no timing for the often-seen case of manipulating digit characters in strings, but it is safe to say that such code will take at least twice as long as the code that uses double digits in byte-sized limbs.
# E16_DecimalDoubling
[1:02] e = 1000 -> 1366 0.308 ms
[2:04] e = 1000 -> 1366 0.152 ms
[4:09] e = 1000 -> 1366 0.070 ms
[8:18] e = 1000 -> 1366 0.071 ms
[1:02] e = 10000 -> 13561 30.533 ms
[2:04] e = 10000 -> 13561 13.791 ms
[4:09] e = 10000 -> 13561 6.436 ms
[8:18] e = 10000 -> 13561 2.996 ms
[1:02] e = 100000 -> 135178 2719.600 ms
[2:04] e = 100000 -> 135178 1340.050 ms
[4:09] e = 100000 -> 135178 588.878 ms
[8:18] e = 100000 -> 135178 290.721 ms
[8:18] e = 1000000 -> 1351546 28823.330 ms
For the exponent of 10^6 there is only the timing with 64-bit limbs, since I didn't have the patience to wait many minutes for full results. The picture is similar for radix conversion, except that there is no row for 64-bit limbs because my compiler does not have a native 128-bit integral type.
# E16_RadixConversion
[1:02] e = 1000 -> 1366 0.080 ms
[2:04] e = 1000 -> 1366 0.026 ms
[4:09] e = 1000 -> 1366 0.048 ms
[1:02] e = 10000 -> 13561 4.537 ms
[2:04] e = 10000 -> 13561 0.746 ms
[4:09] e = 10000 -> 13561 0.243 ms
[1:02] e = 100000 -> 135178 445.092 ms
[2:04] e = 100000 -> 135178 68.600 ms
[4:09] e = 100000 -> 135178 19.344 ms
[4:09] e = 1000000 -> 1351546 1925.564 ms
The interesting thing is that simply compiling the code as C++ doesn't make it any faster - i.e., the optimiser couldn't find any low-hanging fruit that the C# jitter missed, apart from not toeing the line with regard to penalising unsigned integers. That's the reason why I like prototyping in C# - performance in the same ballpark as (unoptimised) C++ and none of the hassle.
Here's the meat of the C++ version (sans reams of boring stuff like helper templates and so on) so that you can see that I didn't cheat to make C# look better:
template<typename W>
struct E16_RadixConversion
{
    typedef W limb_t;
    typedef typename detail::E16_traits<W>::long_t long_t;

    static unsigned const BITS_PER_WORD = sizeof(limb_t) * CHAR_BIT;
    static unsigned const RADIX_DIGITS = std::numeric_limits<limb_t>::digits10;
    static limb_t const RADIX = detail::pow10_t<limb_t, RADIX_DIGITS>::RESULT;

    static unsigned digit_sum_for_power_of_2 (unsigned e)
    {
        std::vector<limb_t> digits;
        compute_digits_for_power_of_2(e, digits);
        return digit_sum(digits);
    }

    static void compute_digits_for_power_of_2 (unsigned e, std::vector<limb_t> &result)
    {
        assert(e > 0);

        unsigned total_digits = unsigned(std::ceil(std::log10(2) * e));
        unsigned total_limbs = (total_digits + RADIX_DIGITS - 1) / RADIX_DIGITS;
        result.resize(0);
        result.reserve(total_limbs);

        std::vector<limb_t> bin((e + BITS_PER_WORD) / BITS_PER_WORD);
        bin.back() = limb_t(limb_t(1) << (e % BITS_PER_WORD));

        while (!bin.empty())
        {
            long_t rest = 0;
            for (std::size_t i = bin.size(); i-- > 0; )
            {
                long_t temp = (rest << BITS_PER_WORD) | bin[i];
                long_t quot = temp / RADIX;
                rest = temp - quot * RADIX;
                bin[i] = limb_t(quot);
            }
            result.push_back(limb_t(rest));
            if (bin.back() == 0)
                bin.pop_back();
        }
    }
};
Conclusion
These benchmarks also show that this Euler task - like many others - seems designed to be solved on a ZX81 or an Apple ][, not on our modern toys that are a million times as powerful. There's no challenge involved here unless the limits are increased drastically (an exponent of 10^5 or 10^6 would be much more adequate).
A good overview of the practical state of the art can be found in GMP's overview of algorithms. Another excellent overview of the algorithms is chapter 1 of "Modern Computer Arithmetic" by Richard Brent and Paul Zimmermann. It contains exactly what one needs to know for coding challenges and competitions, but unfortunately the depth is not equal to that of Donald Knuth's treatment in "The Art of Computer Programming".
The radix conversion solution adds a useful technique to one's code challenge toolchest, since the given code can be trivially extended for converting any old big integer instead of only the bit pattern 1 << exponent. The repeated squaring solution can be similarly useful, since changing the sample code to power something other than 2 is again trivial.
The approach of performing computations directly in powers of 10 can be useful for challenges where decimal results are required, because performance is in the same ballpark as native computation but there is no need for a separate conversion step (which can require similar amounts of time as the actual computation).