Do long string keys in a Dictionary cause performance problems? - c#

I'm thinking about using a Dictionary<string, object> to look up values by a string key. As far as I know, the longer the key, the longer a lookup takes. My keys can be pretty long, like /page-1/page-2/page-3/page-4 ... and so on, where each name can be pretty long by itself.
What performance hit can I expect when using long string keys in a Dictionary? What mechanism causes these costs?

Each time you access a key in that dictionary, the string you pass in must be hashed. .NET does not cache string hash codes. Hashing is a linear operation in the length of the input string: 10 times the length means roughly 10 times the hashing cost.
The same goes for equality comparisons. When the dictionary finds that two hash codes are equal (this happens on every successful lookup and on each key collision), it must compare the strings. This is again a linear operation, but a very fast one.
Those are pretty much the only costs that long keys cause.
I can't tell you whether this is fast enough or not for your use case. You'll have to measure. The answer depends on the key length and how often you access the dictionary.
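If you want a rough feel for the numbers before deciding, a minimal benchmark sketch (the key contents are invented for illustration) that isolates the hashing cost looks like this:

using System;
using System.Diagnostics;
using System.Linq;

class HashingCost
{
    static void Main()
    {
        string shortKey = "/page-1";
        string longKey = string.Concat(Enumerable.Repeat("/page-with-a-long-name", 50));

        Measure(shortKey);
        Measure(longKey);
    }

    static void Measure(string key)
    {
        const int Iterations = 10_000_000;
        var sw = Stopwatch.StartNew();
        int h = 0;
        for (int i = 0; i < Iterations; i++)
            h ^= key.GetHashCode();  // the per-lookup hashing work described above
        sw.Stop();
        Console.WriteLine($"length {key.Length,5}: {sw.ElapsedMilliseconds} ms (ignore {h})");
    }
}

You should see the long key take roughly proportionally longer, matching the linear cost described above.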

This is how the hash code is computed for a string:
public override unsafe int GetHashCode()
{
    if (HashHelpers.s_UseRandomizedStringHashing)
        return string.InternalMarvin32HashString(this, this.Length, 0L);
    fixed (char* chPtr = this)
    {
        int num1 = 352654597;
        int num2 = num1;
        int* numPtr = (int*) chPtr;
        int length = this.Length;
        while (length > 2)
        {
            num1 = (num1 << 5) + num1 + (num1 >> 27) ^ *numPtr;
            num2 = (num2 << 5) + num2 + (num2 >> 27) ^ numPtr[1];
            numPtr += 2;
            length -= 4;
        }
        if (length > 0)
            num1 = (num1 << 5) + num1 + (num1 >> 27) ^ *numPtr;
        return num1 + num2 * 1566083941;
    }
}
So, as we can see, the hash code computation cost depends directly on the length of the string.

Related

Hash tables with long (100+ character) key names

I am working on a data structure for a utility of mine, and I am TEMPTED to do a hash table in which the key is a very long string, specifically a file path. There are a number of reasons why this makes sense from a data standpoint, mainly the fact that the path is guaranteed unique. That said, every single example I have seen of a hash table has very short keys and potentially long values. So, I am wondering if that is just a function of easy examples? Or is there a performance or technical reason not to use long keys?
I will be using $variable = New-Object Collections.Specialized.OrderedDictionary for version agnostic ordering, if that makes any difference.
I think you are fine using long strings as keys.
Under the hood, the key lookup in OrderedDictionary does this:
if (objectsTable.Contains(key)) {
where objectsTable is of type Hashtable.
If you follow the chain of getting the hash in the Hashtable class, you'll get to this:
https://referencesource.microsoft.com/#mscorlib/system/collections/hashtable.cs,4f6addb8551463cf
// Internal method to get the hash code for an Object. This will call
// GetHashCode() on each object if you haven't provided an IHashCodeProvider
// instance. Otherwise, it calls hcp.GetHashCode(obj).
protected virtual int GetHash(Object key)
{
    if (_keycomparer != null)
        return _keycomparer.GetHashCode(key);
    return key.GetHashCode();
}
So, the question becomes, what's the cost of getting a HashCode on a string?
https://referencesource.microsoft.com/#mscorlib/system/string.cs
The GetHashCode function, you'll see, is a loop, but it's only an O(n) function, as it grows only with the string length. You'll notice the computation is a bit different on 32-bit machines than on others, but O(n) is the worst case for the algorithm.
There are other parts of the function, but I think this is the key part, as it's the part that can grow (src is the char*, i.e. a pointer to the characters in the string).
#if WIN32
    // 32 bit machines.
    int* pint = (int *)src;
    int len = this.Length;
    while (len > 2)
    {
        hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0];
        hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ pint[1];
        pint += 2;
        len -= 4;
    }
    if (len > 0)
    {
        hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0];
    }
#else
    int c;
    char *s = src;
    while ((c = s[0]) != 0) {
        hash1 = ((hash1 << 5) + hash1) ^ c;
        c = s[1];
        if (c == 0)
            break;
        hash2 = ((hash2 << 5) + hash2) ^ c;
        s += 2;
    }
#endif
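To make that concrete, a minimal sketch of the question's scenario (the paths are made up); each lookup just pays the O(n) hash plus an O(n) equality check on a hit:

using System;
using System.Collections.Specialized;

class LongKeyDemo
{
    static void Main()
    {
        var table = new OrderedDictionary();

        // Long, guaranteed-unique file paths as keys.
        table["C:/projects/utility/data/reports/2024/quarterly/q1-summary-final.txt"] = 1;
        table["C:/projects/utility/data/reports/2024/quarterly/q2-summary-final.txt"] = 2;

        Console.WriteLine(table.Contains(
            "C:/projects/utility/data/reports/2024/quarterly/q1-summary-final.txt")); // True
    }
}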

C# Random(Long)

I'm trying to generate a number based on a seed in C#. The only problem is that the seed is too big to be an int32. Is there a way I can use a long as the seed?
And yes, the seed MUST be a long.
Here's a C# version of java.util.Random that I ported from the Java Specification.
The best thing to do is to write a Java program to generate a load of numbers and check that this C# version generates the same numbers.
public sealed class JavaRng
{
    public JavaRng(long seed)
    {
        _seed = (seed ^ LARGE_PRIME) & ((1L << 48) - 1);
    }

    public int NextInt(int n)
    {
        if (n <= 0)
            throw new ArgumentOutOfRangeException("n", n, "n must be positive");

        if ((n & -n) == n) // i.e., n is a power of 2
            return (int)((n * (long)next(31)) >> 31);

        int bits, val;
        do
        {
            bits = next(31);
            val = bits % n;
        } while (bits - val + (n - 1) < 0);
        return val;
    }

    private int next(int bits)
    {
        _seed = (_seed * LARGE_PRIME + SMALL_PRIME) & ((1L << 48) - 1);
        return (int)((ulong)_seed >> (48 - bits)); // unsigned shift over the full 48-bit state, like Java's >>>
    }

    private long _seed;
    private const long LARGE_PRIME = 0x5DEECE66DL;
    private const long SMALL_PRIME = 0xBL;
}
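For example, a quick spot-check (seed chosen arbitrarily) that can be compared against the Java original:

var rng = new JavaRng(1234567890123L);
for (int i = 0; i < 5; i++)
    Console.WriteLine(rng.NextInt(100)); // should match new java.util.Random(1234567890123L).nextInt(100)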
For anyone seeing this question today, .NET 6 and upwards provides Random.NextInt64, which has the following overloads:
NextInt64()
Returns a non-negative random integer.
NextInt64(Int64)
Returns a non-negative random integer that is less than the specified maximum.
NextInt64(Int64, Int64)
Returns a random integer that is within a specified range.
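A short usage sketch of those overloads; note that the Random constructors still take only an int seed, so this addresses the long-output half of the question rather than the long-seed half:

var rng = new Random();

long any     = rng.NextInt64();          // 0 <= any < Int64.MaxValue
long below   = rng.NextInt64(1_000_000); // 0 <= below < 1,000,000
long inRange = rng.NextInt64(-500, 500); // -500 <= inRange < 500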
I'd go for the answer provided here by @Dyppl: Random number in long range, is this the way?
Put this function where it's accessible to the code that needs to generate the random number:
long LongRandom(long min, long max, Random rand)
{
    byte[] buf = new byte[8];
    rand.NextBytes(buf);
    long longRand = BitConverter.ToInt64(buf, 0);
    return (Math.Abs(longRand % (max - min)) + min);
}
Then call the function like this:
long r = LongRandom(100000000000000000, 100000000000000050, new Random());

Does String.GetHashCode consider the full string or only part of it?

I'm just curious because I guess it will have an impact on performance. Does it consider the full string? If yes, it will be slow on long strings. If it only considers part of the string, it will have bad performance in some cases (e.g. if it only considers the beginning of the string, it will have bad performance if a HashSet contains mostly strings with the same beginning).
Be sure to obtain the Reference Source code when you have questions like this; there's a lot more to it than what you can see from a decompiler. Pick the version that matches your preferred .NET target, as the method has changed a great deal between versions. I'll just reproduce the .NET 4.5 version of it here, retrieved from Source.NET 4.5\4.6.0.0\net\clr\src\BCL\System\String.cs\604718\String.cs
public override int GetHashCode() {
#if FEATURE_RANDOMIZED_STRING_HASHING
    if(HashHelpers.s_UseRandomizedStringHashing)
    {
        return InternalMarvin32HashString(this, this.Length, 0);
    }
#endif // FEATURE_RANDOMIZED_STRING_HASHING

    unsafe {
        fixed (char *src = this) {
            Contract.Assert(src[this.Length] == '\0', "src[this.Length] == '\\0'");
            Contract.Assert( ((int)src)%4 == 0, "Managed string should start at 4 bytes boundary");

#if WIN32
            int hash1 = (5381<<16) + 5381;
#else
            int hash1 = 5381;
#endif
            int hash2 = hash1;

#if WIN32
            // 32 bit machines.
            int* pint = (int *)src;
            int len = this.Length;
            while (len > 2)
            {
                hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0];
                hash2 = ((hash2 << 5) + hash2 + (hash2 >> 27)) ^ pint[1];
                pint += 2;
                len -= 4;
            }
            if (len > 0)
            {
                hash1 = ((hash1 << 5) + hash1 + (hash1 >> 27)) ^ pint[0];
            }
#else
            int c;
            char *s = src;
            while ((c = s[0]) != 0) {
                hash1 = ((hash1 << 5) + hash1) ^ c;
                c = s[1];
                if (c == 0)
                    break;
                hash2 = ((hash2 << 5) + hash2) ^ c;
                s += 2;
            }
#endif
#if DEBUG
            // We want to ensure we can change our hash function daily.
            // This is perfectly fine as long as you don't persist the
            // value from GetHashCode to disk or count on String A
            // hashing before string B. Those are bugs in your code.
            hash1 ^= ThisAssembly.DailyBuildNumber;
#endif
            return hash1 + (hash2 * 1566083941);
        }
    }
}
This is possibly more than you bargained for, so I'll annotate the code a bit:
The #if conditional compilation directives adapt this code to different .NET targets. The FEATURE_XX identifiers are defined elsewhere and turn features off wholesale throughout the .NET source code. WIN32 is defined when the target is the 32-bit version of the framework; the 64-bit version of mscorlib.dll is built separately and stored in a different subdirectory of the GAC.
The s_UseRandomizedStringHashing variable enables a secure version of the hashing algorithm, designed to keep out of trouble programmers who do something unwise, like using GetHashCode() to generate hashes for things like passwords or encryption. It is enabled by an entry in the app.exe.config file.
The fixed statement keeps indexing the string cheap; it avoids the bounds checking done by the regular indexer.
The first Assert ensures that the string is zero-terminated as it should be, required to allow the optimization in the loop
The second Assert ensures that the string is aligned to an address that's a multiple of 4 as it should be, required to keep the loop performant
The loop is unrolled by hand, consuming 4 characters per iteration in the 32-bit version. The cast to int* is a trick to read 2 characters (2 x 16 bits) as one int (32 bits). The extra statements after the loop deal with a string whose length is not a multiple of 4. Note that the zero terminator may or may not be included in the hash; it won't be if the length is even. It looks at all the characters in the string, answering your question.
The 64-bit version of the loop is done differently, hand-unrolled by 2. Note that it terminates early on an embedded zero, so it doesn't look at all the characters; embedded zeros are otherwise very uncommon. That's pretty odd; I can only guess that this has something to do with strings potentially being very large, but I can't think of a practical example.
The debug code at the end ensures that no code in the framework ever takes a dependency on the hash code being reproducible between runs.
The hash algorithm is pretty standard. The value 1566083941 is a magic number, a prime that is common in a Mersenne twister.
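To see the randomization from the outside, here is a tiny sketch; on a runtime where randomized hashing is enabled (it is always on in .NET Core and later), running it twice prints two different values.

using System;

class Program
{
    static void Main()
    {
        // With randomized hashing, this prints a different value in each process run.
        // That's why persisting string.GetHashCode() output is a bug.
        Console.WriteLine("test".GetHashCode());
    }
}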
Examining the source code (courtesy of ILSpy), we can see that it does iterate over the length of the string.
// string
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail), SecuritySafeCritical]
public unsafe override int GetHashCode()
{
    IntPtr arg_0F_0;
    IntPtr expr_06 = arg_0F_0 = this;
    if (expr_06 != 0)
    {
        arg_0F_0 = (IntPtr)((int)expr_06 + RuntimeHelpers.OffsetToStringData);
    }
    char* ptr = arg_0F_0;
    int num = 352654597;
    int num2 = num;
    int* ptr2 = (int*)ptr;
    for (int i = this.Length; i > 0; i -= 4)
    {
        num = ((num << 5) + num + (num >> 27) ^ *ptr2);
        if (i <= 2)
        {
            break;
        }
        num2 = ((num2 << 5) + num2 + (num2 >> 27) ^ ptr2[(IntPtr)4 / 4]);
        ptr2 += (IntPtr)8 / 4;
    }
    return num + num2 * 1566083941;
}

string.GetHashCode() returns different values in debug vs release, how do I avoid this?

To my surprise, the following method produces a different result in debug vs release:
int result = "test".GetHashCode();
Is there any way to avoid this?
I need a reliable way to hash a string and I need the value to be consistent in debug and release mode. I would like to avoid writing my own hashing function if possible.
Why does this happen?
FYI, reflector gives me:
[ReliabilityContract(Consistency.WillNotCorruptState, Cer.MayFail), SecuritySafeCritical]
public override unsafe int GetHashCode()
{
    fixed (char* str = ((char*) this))
    {
        char* chPtr = str;
        int num = 0x15051505;
        int num2 = num;
        int* numPtr = (int*) chPtr;
        for (int i = this.Length; i > 0; i -= 4)
        {
            num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
            if (i <= 2)
            {
                break;
            }
            num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1];
            numPtr += 2;
        }
        return (num + (num2 * 0x5d588b65));
    }
}
GetHashCode() is not what you should be using to hash a string, almost 100% of the time. Without knowing what you're doing, I recommend that you use an actual hash algorithm, like SHA-1:
using (var sha1 = new System.Security.Cryptography.SHA1Managed())
{
    // Pick the encoding that matches your data (ASCII, UTF8, Unicode, UTF32, ...)
    // and hash the resulting bytes:
    byte[] hash = sha1.ComputeHash(System.Text.Encoding.UTF8.GetBytes(theString));
}
Update: For something a little bit faster, there's also SHA1Cng, which is significantly faster than SHA1Managed.
Here's a better approach that is much faster than SHA, and you can replace the modified GetHashCode with it: C# fast hash murmur2
There are several implementations with different levels of "unmanaged" code, so if you need fully managed it's there and if you can use unsafe it's there too.
/// <summary>
/// Default implementation of string.GetHashCode is not consistent on different
/// platforms (x32/x64, which is our case) and frameworks.
/// FNV-1a (Fowler/Noll/Vo) is a fast, consistent, non-cryptographic hash algorithm
/// with good dispersion. (See http://isthe.com/chongo/tech/comp/fnv/#FNV-1a)
/// </summary>
private static int GetFNV1aHashCode(string str)
{
    if (str == null)
        return 0;

    var length = str.Length;
    // original FNV-1a has 32 bit offset_basis = 2166136261 but length gives a bit
    // better dispersion (2%) for our case where all the strings are equal length,
    // for example: "3EC0FFFF01ECD9C4001B01E2A707"
    int hash = length;
    for (int i = 0; i != length; ++i)
        hash = (hash ^ str[i]) * 16777619;
    return hash;
}
I guess this implementation is slower than the unsafe one posted here, but it's much simpler and safe. It works well when super speed is not needed.
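Because the value does not depend on the process, platform, or build configuration, it is safe for the debug-vs-release scenario in the question:

int h = GetFNV1aHashCode("test");
// Same value in debug and release builds, on x86 and x64, and from run to run.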

Faster String GetHashCode (e.g. using Multicore or GPU)

According to http://www.codeguru.com/forum/showthread.php?t=463663 , C#'s GetHashCode function for strings in .NET 3.5 is implemented as:
public override unsafe int GetHashCode()
{
    fixed (char* str = ((char*) this))
    {
        char* chPtr = str;
        int num = 0x15051505;
        int num2 = num;
        int* numPtr = (int*) chPtr;
        for (int i = this.Length; i > 0; i -= 4)
        {
            num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
            if (i <= 2)
            {
                break;
            }
            num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1];
            numPtr += 2;
        }
        return (num + (num2 * 0x5d588b65));
    }
}
I am curious if anyone can come up with a function which returns the same results, but is faster. It is OK to increase the overall starting and resource overhead of the main application. Requiring a one-time initialization (per application execution, not per call or per string) is OK.
Note that unlike Microsoft, considerations like, "doing it this way will make everything else slower and has costs that make this method stupid!" can be ignored, so it is possible that even assuming Microsoft's is perfect, it can be beaten by doing something "stupid."
This purely an exercise in my own curiosity and will not be used in real code.
Examples of ideas I've thought of:
Using multiple cores (calculating num2 and num independently)
Using the gpu
One way to make a function go faster is to take special cases into account. A function with variable-size inputs has special cases based on size.

Going parallel only makes sense when the cost of going parallel is smaller than the gain, and for this kind of computation it is likely that the string would have to be fairly large to overcome the cost of forking a parallel thread. But implementing that isn't hard; basically you need a test for this.Length exceeding an empirically determined threshold, and then forking multiple threads to compute hashes on substrings, with a final step composing the subhashes into a final hash. Implementation left for the reader (though the sketch below gives the flavor).
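For what it's worth, a minimal sketch of that threshold-and-substrings idea; the threshold value, the FNV-style subhash, and the combine step are all my own assumptions, and the result intentionally differs from string.GetHashCode:

using System;
using System.Threading.Tasks;

static class ParallelStringHash
{
    const int Threshold = 4096; // hypothetical cut-over; determine empirically

    public static int Hash(string s)
    {
        if (s.Length < Threshold)
            return Fnv1a(s, 0, s.Length);        // small strings: stay sequential

        int half = s.Length / 2;
        int h1 = 0, h2 = 0;
        Parallel.Invoke(                         // hash the halves concurrently
            () => h1 = Fnv1a(s, 0, half),
            () => h2 = Fnv1a(s, half, s.Length - half));
        return unchecked(h1 ^ (h2 * 16777619));  // compose the subhashes
    }

    // Simple sequential FNV-1a over a substring.
    static int Fnv1a(string s, int start, int count)
    {
        unchecked
        {
            int hash = (int)2166136261;
            for (int i = start; i < start + count; i++)
                hash = (hash ^ s[i]) * 16777619;
            return hash;
        }
    }
}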
Modern processors also have SIMD instructions, which can process up to 32 (or 64) bytes in a single instruction. This would allow you to process the string in chunks of 16 two-byte characters, with one or two SIMD instructions per chunk, and then fold the 64-byte result into a single hashcode at the end. This is likely to be extremely fast for strings of any reasonable size. The implementation of this from C# is harder, because one doesn't expect a virtual machine to provide easy (or portable) access to the SIMD instructions that you need. Implementation also left for the reader.

EDIT: Another answer suggests that the Mono system does provide SIMD instruction access.
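On that note, modern .NET exposes portable SIMD through System.Numerics.Vector<T>. A toy sketch of the chunk-then-fold idea (the folding scheme is my own invention and does not reproduce string.GetHashCode's value):

using System;
using System.Numerics;

static class SimdStringHash
{
    public static int Hash(string s)
    {
        int lanes = Vector<ushort>.Count;       // e.g. 16 chars per 256-bit register
        var acc = Vector<ushort>.Zero;
        var buf = new ushort[lanes];

        int i = 0;
        for (; i + lanes <= s.Length; i += lanes)
        {
            for (int j = 0; j < lanes; j++)
                buf[j] = s[i + j];              // stage one chunk of characters
            acc ^= new Vector<ushort>(buf);     // fold the whole chunk in one SIMD op
        }

        unchecked
        {
            int hash = 17;
            for (int j = 0; j < lanes; j++)     // collapse the vector lanes...
                hash = hash * 31 + acc[j];
            for (; i < s.Length; i++)           // ...then the leftover tail characters
                hash = hash * 31 + s[i];
            return hash;
        }
    }
}

A real implementation would avoid the per-chunk copy (for example by reinterpreting the string's memory as a span of vectors), but that is beside the point of the sketch.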
Having said that, the particular implementation exhibited is pretty stupid. The key observation is that the loop checks the limit twice on every iteration. One can solve that problem by checking the end condition cases in advance, and executing a loop that does the correct number of iterations. One can do better than that by using Duff's device to jump into an unrolled loop of N iterations. This gets rid of the loop-limit checking overhead for N-1 of every N iterations. That modification would be very easy and surely be worth the effort to implement.

EDIT: You can also combine the SIMD idea and the loop unrolling idea to enable processing many chunks of 8/16 characters in a few SIMD instructions.
For languages that can't jump into loops, one can do the equivalent of Duff's device by simply peeling off the initial cases. A shot at how to recode the original code using the loop peeling approach is the following:
public override unsafe int GetHashCode()
{
    fixed (char* str = ((char*) this))
    {
        const int N=3; // a power of two controlling number of loop iterations
        char* chPtr = str;
        int num = 0x15051505;
        int num2 = num;
        int* numPtr = (int*) chPtr;
        count = this.length;
        unrolled_iterations = count >> (N+1); // could be 0 and that's OK
        for (int i = unrolled_iterations; i > 0; i--)
        {
            // repeat 2**N times
            { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
              num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1]; }
            { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[2];
              num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[3]; }
            { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[4];
              num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[5]; }
            { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[6];
              num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[7]; }
            { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[8];
              num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[9]; }
            { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[10];
              num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[11]; }
            { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[12];
              num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[13]; }
            { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[14];
              num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[15]; }
            numPtr += 16;
        }
        if (count & ((1<<N)-1))
        {
            { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
              num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1]; }
            { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[2];
              num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[3]; }
            { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[4];
              num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[5]; }
            { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[6];
              num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[7]; }
            numPtr += 8;
        }
        if (count & ((1<<(N-1))-1))
        {
            { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
              num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1]; }
            { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[2];
              num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[3]; }
            numPtr += 4;
        }
        if (count & ((1<<(N-2)-1))
        {
            { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
              num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1]; }
            numPtr += 2;
        }
        // repeat N times and finally:
        if { count & (1) }
        {
            { num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
              // numPtr += 1;
            }
            return (num + (num2 * 0x5d588b65));
        }
    }
}
I haven't compiled or tested this code, but the idea is right. It depends on the compiler doing reasonable constant folding and address arithmetic.

I tried to code this to preserve the exact hash value of the original, but IMHO that isn't really a requirement. It would be even simpler and a tiny bit faster if it didn't use the num/num2 stunt, but simply updated num for each character.
Corrected version (by Brian) as a static function:
public static unsafe int GetHashCodeIra(string x)
{
    fixed (char* str = x.ToCharArray())
    {
        const int N = 2; // a power of two controlling number of loop iterations
        char* chPtr = str;
        int num = 0x15051505;
        int num2 = num;
        int* numPtr = (int*)chPtr;
        int count = (x.Length + 1) / 2;
        int unrolled_iterations = count >> (N + 1); // could be 0 and that's OK
        for (int i = unrolled_iterations; i > 0; i--)
        {
            // repeat 2**N times
            {
                num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
                num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1];
            }
            {
                num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[2];
                num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[3];
            }
            {
                num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[4];
                num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[5];
            }
            {
                num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[6];
                num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[7];
            }
            numPtr += 8;
        }
        if (0 != (count & ((1 << N))))
        {
            {
                num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
                num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1];
            }
            {
                num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[2];
                num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[3];
            }
            numPtr += 4;
        }
        if (0 != (count & ((1 << (N - 1)))))
        {
            {
                num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
                num2 = (((num2 << 5) + num2) + (num2 >> 0x1b)) ^ numPtr[1];
            }
            numPtr += 2;
        }
        // repeat N times and finally:
        if (1 == (count & 1))
        {
            num = (((num << 5) + num) + (num >> 0x1b)) ^ numPtr[0];
            // numPtr += 1;
        }
        return (num + (num2 * 0x5d588b65));
    }
}
Threads and GPU will most certainly introduce overhead greater than any possible performance boost. The approach that could be justified is using SIMD instruction sets, such as SSE. However, it would require testing whether this particular instruction set is available, which may itself cost. It will also bring a boost on long strings only.
If you want to try it, consider testing Mono support for SIMD before diving into C or assembly. Read here about development possibilities and gotchas.
You could parallelize this; however, the problem you will run into is that threads, CUDA, etc. have overheads associated with them. Even if you use a thread pool, if your strings are not very large (say a typical string is 128-256 characters, probably less), you will probably still end up making each call to this function take longer than it did originally.
Now, if you were dealing with very large strings, then yes, it would improve your time. The simple algorithm is "embarrassingly parallel."
I think all of your suggested approaches are very inefficient compared to the current implementation.
Using GPU:
The string data needs to be transferred to the GPU and the result back, which takes a lot of time. GPUs are very fast, but only for floating-point calculations, which aren't used here. All operations are on integers, for which x86 CPU power is decent.
Using Another CPU Core:
This would involve creating a separate thread, locking down memory and synchronizing the thread requesting the Hash Code. The incurred overhead simply outweighs the benefits of parallel processing.
If you would want to calculate Hash values of thousands of strings in one go, things might look a little different, but I can't imagine a scenario where this would justify implementing a faster GetHashCode().
Each step in the computation builds on the result of the previous step. If iterations of the loop run out of order, you will get a different result (the value of num from the previous iteration serves as input to the next iteration).
For that reason, any approach (multithreading, massively parallel execution on a GPU) that runs steps in parallel will generally skew the result.
Also, I would be surprised if the previously discussed loop unrolling is not already being done internally by the compiler to the extent that it actually makes a difference in execution time (compilers tend to be smarter than the average programmer these days, and loop unrolling has been around for a really long time as a compiler optimization technique).
Given that strings are immutable, the first thing that I would consider is caching the return result.
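A minimal sketch of that caching idea (the wrapper type and its name are my own): compute the hash once at construction and reuse it on every lookup.

// Wraps a string and caches its hash code; useful as a dictionary key when
// the same long keys are hashed over and over.
sealed class CachedHashString
{
    public readonly string Value;
    private readonly int _hash;

    public CachedHashString(string value)
    {
        Value = value;
        _hash = value.GetHashCode(); // paid once, here
    }

    public override int GetHashCode() => _hash;

    public override bool Equals(object obj) =>
        obj is CachedHashString other && Value == other.Value;
}

Equality on a hit still costs O(length), but the hashing half of the work disappears after the first computation.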
