Parallelizing very large array base conversion - c#

I have a method that converts value to a newBase number of length length.
The logic in English is:
If we calculated every possible combination of numbers from 0 to (c-1)
with a length of x,
what set would occur at point i?
While the method below does work perfectly, because very large numbers are used it can take a long time to complete.
For example, value = ((65536^480000) - 1) / 2, newBase = 65536, length = 480000 takes about an hour to complete on a 64-bit, quad-core PC.
private int[] GetValues(BigInteger value, int newBase, int length)
{
    Stack<int> result = new Stack<int>();
    while (value > 0)
    {
        result.Push((int)(value % newBase));
        if (value < newBase)
            value = 0;
        else
            value = value / newBase;
    }

    for (var i = result.Count; i < length; i++)
    {
        result.Push(0);
    }

    return result.ToArray();
}
My question is: how can I change this method into something that will allow multiple threads to work out part of the number?
I am working in C#, but if you're not familiar with that, pseudocode is fine too.
Note: The method is from this question: Cartesian product subset returning set of mostly 0

If that GetValues method is really the bottleneck, there are several things you can do to speed it up.
First, you're dividing by newBase every time through the loop. Since newBase is an int, and the BigInteger divide method divides by a BigInteger, you're potentially incurring the cost of an int-to-BigInteger conversion on every iteration. You might consider:
BigInteger bigNewBase = newBase;
Also, you can cut the number of divides in half by calling DivRem:
while (value > 0)
{
    BigInteger rem;
    value = BigInteger.DivRem(value, bigNewBase, out rem);
    result.Push((int)rem);
}
One other optimization, as somebody mentioned in comments, would be to store the digits in a pre-allocated array. You'll have to call Array.Reverse to get them in the proper order, but that takes approximately no time.
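Putting those pieces together, here is a sketch of what the tightened-up method might look like (not from the original answer; untested, and it assumes value is non-negative and has at most length digits in base newBase):

private int[] GetValues(BigInteger value, int newBase, int length)
{
    BigInteger bigNewBase = newBase;  // convert once instead of on every iteration
    int[] digits = new int[length];   // pre-allocated; unfilled slots stay 0
    int i = 0;
    while (value > 0)
    {
        BigInteger rem;
        value = BigInteger.DivRem(value, bigNewBase, out rem);
        digits[i++] = (int)rem;
    }
    Array.Reverse(digits);  // digits were produced least-significant first
    return digits;
}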
That method, by the way, doesn't lend itself to parallelizing because computing each digit depends on the computation of the previous digit.

BitArray change bit within range

How can I ensure that, when changing a bit in a BitArray, the BitArray's value remains in a given range?
Example:
Given the range [-5.12, 5.12] and
a = 0100000000000000011000100100110111010010111100011010100111111100 ( = 2.048)
By changing a bit at a random position, I need to ensure that the new value remains in the given range.
I'm not 100% sure what you are doing, and this answer assumes you are currently storing a as a 64-bit value (long). The following code may help point you in the right direction.

const double minValue = -5.12;
const double maxValue = 5.12;

var initialValue = Convert.ToInt64("100000000000000011000100100110111010010111100011010100111111100", 2);
var changedValue = ChangeRandomBit(initialValue); // However you're doing this
var changedValueAsDouble = BitConverter.Int64BitsToDouble(changedValue);

if ((changedValueAsDouble < minValue) || (changedValueAsDouble > maxValue))
{
    // Do something
}
It looks like a double (64 bits, and the result has a decimal point).
As you may know, a double has a sign bit, an exponent and a fraction, so you cannot change a random bit and still keep the value in the range, with some exceptions:
The sign bit can be changed without problems if your range is [-x, +x] (same x).
Changing an exponent or fraction bit will require checking the new value against the range, but changing an exponent or fraction bit from 1 to 0 will make |a| smaller.
I don't know what you are trying to achieve, care to share? Perhaps you are trying to validate or correct something, then you may have a look at this.
Here's an extension method that undoes the set bit if the new value of the float is outside the given range (this is an example only; it relies on the BitArray holding a float with no checks, which is pretty horrible, so just hack a solution out of this, including changing it to double):

static class Extension
{
    public static void SetFloat(this BitArray array, int index, bool value, float min, float max)
    {
        bool old = array.Get(index);
        array.Set(index, value);
        byte[] bytes = new byte[4];
        array.CopyTo(bytes, 0);
        float f = BitConverter.ToSingle(bytes, 0);
        if (f < min || f > max)
            array.Set(index, old);
    }
}
Example use:
static void Main(string[] args)
{
    float f = 2.1f;
    byte[] bytes = System.BitConverter.GetBytes(f);
    BitArray array = new BitArray(bytes);
    array.SetFloat(20, true, -5.12f, 5.12f);
}
If you can actually limit your precision, then this would be a lot easier. For example given the range:
[-5.12, 5.12]
If I multiply 5.12 by 100, I get
[-512, 512]
And the integer 512 in binary is, of course:
1000000000
So now you know you can set any of the first 9 bits and you'll be < 512 if the 10th bit is 0. If you set the 10th bit, you will have to set all the other bits to 0. With a little extra effort, this can be extended to deal with two's complement negative values too (although I might be inclined just to convert them to positive values).
Now if you actually need to accommodate the 3 d.p. of 2.048, then you'll need to multiply all your values by 1000 instead, and it will be a little more difficult because 5120 in binary is 1010000000000.
You know you can do anything you want with everything except the most significant bit (MSB) if the MSB is 0. In this case, if the MSB is 1 but the next 2 bits are 0, you can do anything you want with the remaining bits.
The logic involved in dealing directly with the number in IEEE-754 floating point format is probably going to be torturous.
Or you could just go with the "mutate the value and then test it" approach: if it's out of range, go back and try again. That might be suitable in practice, but it isn't guaranteed to exit.
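A minimal sketch of that mutate-and-test approach (illustrative only, not from the original answers; it flips one random bit of the double's bit pattern and rejects bad results):

static double MutateWithinRange(double value, double min, double max, Random rng)
{
    while (true)
    {
        long bits = BitConverter.DoubleToInt64Bits(value);
        long mutated = bits ^ (1L << rng.Next(64));  // flip one random bit
        double candidate = BitConverter.Int64BitsToDouble(mutated);
        // Reject NaN/Infinity and out-of-range results. As noted above,
        // this loop is not guaranteed to exit quickly.
        if (!double.IsNaN(candidate) && !double.IsInfinity(candidate)
            && candidate >= min && candidate <= max)
            return candidate;
    }
}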
A final thought: depending on exactly what you are doing, you might also want to look at Gray Codes. The idea of a Gray Code is that each successive value is only 1 bit flip apart. With naturally encoded binary, a flip of the MSB has orders of magnitude more impact on the final value than a flip of the LSB.
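For completeness, a minimal sketch of the standard reflected binary Gray code conversions (not from the original answer):

static ulong BinaryToGray(ulong n)
{
    // consecutive integers map to codewords that differ in exactly one bit
    return n ^ (n >> 1);
}

static ulong GrayToBinary(ulong g)
{
    // undo the XOR fold by propagating high bits downward
    g ^= g >> 32;
    g ^= g >> 16;
    g ^= g >> 8;
    g ^= g >> 4;
    g ^= g >> 2;
    g ^= g >> 1;
    return g;
}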

c# float [] average loses accuracy

I am trying to calculate the average of an array of floats. I need to use indices because this is inside a binary search, so the top and bottom will move. (Big picture: we are trying to optimize a half-range estimation so we don't have to re-create the array each pass.)
Anyway, I wrote a custom average loop and I'm getting 2 places less accuracy than the C# Average() method:
float test = input.Average();

int count = (top - bottom) + 1; // number of elements in this iteration
int pos = bottom;
float average = 0f; // working average
while (pos <= top)
{
    average += input[pos];
    pos++;
}
average = average / count;
example:
0.0371166766 - c#
0.03711666 - my loop
125090.148 - c#
125090.281 - my loop
http://pastebin.com/qRE3VrCt
I'm getting 2 places less accuracy than the c# Average()
No, you are only losing 1 significant digit. The float type can only store 7 significant digits, the rest are just random noise. Inevitably in a calculation like this, you can accumulate round-off error and thus lose precision. Getting the round-off errors to balance out requires luck.
The only way to avoid it is to use a floating point type that has more precision to accumulate the result. Not an issue, you have double available. Which is why the Linq Average method looks like this:
public static float Average(this IEnumerable<float> source) {
    if (source == null) throw Error.ArgumentNull("source");
    double sum = 0; // <=== NOTE: double
    long count = 0;
    checked {
        foreach (float v in source) {
            sum += v;
            count++;
        }
    }
    if (count > 0) return (float)(sum / count);
    throw Error.NoElements();
}
Use double to reproduce the Linq result with a comparable number of significant digits in the result.
I'd rewrite this as:
int count = (top - bottom) + 1; // number of elements in this iteration
double sum = 0;
for (int i = bottom; i <= top; i++)
{
    sum += input[i];
}
float average = (float)(sum / count);
That way you're using a high precision accumulator, which helps reduce rounding errors.
btw. if performance isn't that important, you can still use LINQ to calculate the average of an array slice:
input.Skip(bottom).Take(top - bottom + 1).Average()
I'm not entirely sure if that fits your problem, but if you need to calculate the average of many subarrays, it can be useful to create a persistent sum array, so calculating an average simply becomes two table lookups and a division.
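For example, a minimal sketch of that persistent-sum idea (an illustration, assuming the whole input array is available up front; sums[i] holds the total of the first i elements):

double[] sums = new double[input.Length + 1];
for (int i = 0; i < input.Length; i++)
    sums[i + 1] = sums[i] + input[i];

// average of input[bottom..top], inclusive: two lookups and a division
float average = (float)((sums[top + 1] - sums[bottom]) / (top - bottom + 1));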
Just to add to the conversation: be careful when using floating point primitives.
What Every Computer Scientist Should Know About Floating-Point Arithmetic
Internally, floating point numbers store additional least significant bits that are not reflected in the displayed value (aka guard bits or guard digits). They are, however, used when performing mathematical operations and equality checks. One common result is that a variable containing 0f is not always zero. When accumulating floating point values, this can also lead to precision errors.
Use Decimal for your accumulator:
It will not have rounding errors due to guard digits.
It is a 128-bit data type (less likely to exceed the maximum value in your accumulator).
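A quick sketch of that suggestion applied to the averaging loop from the question (note that decimal arithmetic is much slower than double, so measure before committing to it):

decimal sum = 0m;
for (int i = bottom; i <= top; i++)
{
    sum += (decimal)input[i];  // 128-bit decimal accumulator (28-29 significant digits)
}
float average = (float)(sum / (top - bottom + 1));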
For more info:
What is the difference between Decimal, Float and Double in C#?

Inconsistent multiplication performance with floats

While testing the performance of floats in .NET, I stumbled upon a weird case: for certain values, multiplication seems way slower than normal. Here is the test case:
using System;
using System.Diagnostics;

namespace NumericPerfTestCSharp {
    class Program {
        static void Main() {
            Benchmark(() => float32Multiply(0.1f), "\nfloat32Multiply(0.1f)");
            Benchmark(() => float32Multiply(0.9f), "\nfloat32Multiply(0.9f)");
            Benchmark(() => float32Multiply(0.99f), "\nfloat32Multiply(0.99f)");
            Benchmark(() => float32Multiply(0.999f), "\nfloat32Multiply(0.999f)");
            Benchmark(() => float32Multiply(1f), "\nfloat32Multiply(1f)");
        }

        static void float32Multiply(float param) {
            float n = 1000f;
            for (int i = 0; i < 1000000; ++i) {
                n = n * param;
            }
            // Write result to prevent the compiler from optimizing the entire method away
            Console.Write(n);
        }

        static void Benchmark(Action func, string message) {
            // warm-up call
            func();
            var sw = Stopwatch.StartNew();
            for (int i = 0; i < 5; ++i) {
                func();
            }
            Console.WriteLine(message + " : {0} ms", sw.ElapsedMilliseconds);
        }
    }
}
Results:
float32Multiply(0.1f) : 7 ms
float32Multiply(0.9f) : 946 ms
float32Multiply(0.99f) : 8 ms
float32Multiply(0.999f) : 7 ms
float32Multiply(1f) : 7 ms
Why are the results so different for param = 0.9f?
Test parameters: .NET 4.5, Release build, code optimizations ON, x86, no debugger attached.
As others have mentioned, various processors do not support normal-speed calculations when subnormal floating-point values are involved. This is either a design defect (if the behavior impairs your application or is otherwise troublesome) or a feature (if you prefer the cheaper processor or alternative use of silicon that was enabled by not using gates for this work).
It is illuminating to understand why there is a transition at .5:
Suppose you are multiplying by p. Eventually, the value becomes so small that the result is some subnormal value (below 2^-126 in 32-bit IEEE binary floating point). Then multiplication becomes slow. As you continue multiplying, the value continues decreasing, and it reaches 2^-149, which is the smallest positive number that can be represented. Now, when you multiply by p, the exact result is of course 2^-149 * p, which is between 0 and 2^-149, and those are the two nearest representable values. The machine must round the result and return one of these two values.
Which one? If p is less than ½, then 2^-149 * p is closer to 0 than to 2^-149, so the machine returns 0. Then you are not working with subnormal values anymore, and multiplication is fast again. If p is greater than ½, then 2^-149 * p is closer to 2^-149 than to 0, so the machine returns 2^-149, and you continue working with subnormal values, and multiplication remains slow. If p is exactly ½, the rounding rules say to use the value that has zero in the low bit of its significand (the fraction portion), which is zero (2^-149 has 1 in its low bit).
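You can see that rounding behavior directly in C# (a small illustration, not part of the original answer; in .NET, float.Epsilon is the smallest positive subnormal float, 2^-149):

float smallest = float.Epsilon;      // 2^-149, about 1.4E-45
Console.WriteLine(smallest * 0.4f);  // below half of 2^-149: rounds to 0
Console.WriteLine(smallest * 0.9f);  // above half: rounds back up to 2^-149
Console.WriteLine(smallest * 0.5f);  // exactly half: round-to-even gives 0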
You report that .99f appears fast. This should end with the slow behavior. Perhaps the code you posted is not exactly the code for which you measured fast performance with .99f? Perhaps the starting value or the number of iterations were changed?
There are ways to work around this problem. One is that the hardware has mode settings that specify to change any subnormal values used or obtained to zero, called “denormals as zero” or “flush to zero” modes. I do not use .NET and cannot advise you about how to set these modes in .NET.
Another approach is to add a tiny value each time, such as
n = (n+e) * param;
where e is at least 2^-126/param. Note that 2^-126/param should be calculated rounded upward, unless you can guarantee that n is large enough that (n+e) * param does not produce a subnormal value. This also presumes n is not negative. The effect of this is to make sure the calculated value is always large enough to be in the normal range, never subnormal.
Adding e in this way of course changes the results. However, if you are, for example, processing audio with some echo effect (or other filter), then the value of e is too small to cause any effects observable by humans listening to the audio. It is likely too small to cause any change in the hardware behavior when producing the audio.
I suspect this has something to do with denormal values (fp values smaller than ~ 1e-38) and the cost associated with processing them.
If you test for denormal values and remove them, sanity is restored.
static void float32Multiply(float param) {
    float n = 1000f;
    for (int i = 0; i < 1000000; ++i) {
        n = n * param;
        if (n < 1e-38) n = 0;  // flush anything about to go subnormal to zero
    }
    // Write result to prevent the compiler from optimizing the entire method away
    Console.Write(n);
}

Generating uniform random integers with a certain maximum

I want to generate uniform integers that satisfy 0 <= result <= maxValue.
I already have a generator that returns uniform values in the full range of the built in unsigned integer types. Let's call the methods for this byte Byte(), ushort UInt16(), uint UInt32() and ulong UInt64(). Assume that the result of these methods is perfectly uniform.
The signature of the methods I want are uint UniformUInt(uint maxValue) and ulong UniformUInt(ulong maxValue).
What I'm looking for:
Correctness
I'd prefer the return values to be uniformly distributed in the given interval.
But a very small bias is acceptable if it increases performance significantly; by that I mean a bias small enough that a distinguisher given 2^64 values would still only succeed with probability 2/3.
It must work correctly for any maxValue.
Performance
The method should be fast.
Efficiency
The method should consume little raw randomness, since depending on the underlying generator, generating the raw bytes may be costly. Wasting a few bits is fine, but consuming, say, 128 bits to generate a single number is probably excessive.
It's also possible to cache some left over randomness from the previous call in some member variables.
Be careful with int overflows, and wrapping behavior.
I already have a solution (I'll post it as an answer), but it's a bit ugly for my tastes, so I'd like to get ideas for better solutions.
Suggestions on how to unit test with large maxValues would be nice too, since I can't generate a histogram with 2^64 buckets and 2^74 random values. Another complication is that with certain bugs, only some maxValue distributions are biased a lot, and others only very slightly.
How about something like this as a general-purpose solution? The algorithm is based on that used by Java's nextInt method, rejecting any values that would cause a non-uniform distribution. So long as the output of your UInt32 method is perfectly uniform then this should be too.
uint UniformUInt(uint inclusiveMaxValue)
{
    unchecked
    {
        uint exclusiveMaxValue = inclusiveMaxValue + 1;
        // if exclusiveMaxValue is a power of two then we can just use a mask
        // also handles the edge case where inclusiveMaxValue is uint.MaxValue
        if ((exclusiveMaxValue & (~exclusiveMaxValue + 1)) == exclusiveMaxValue)
            return UInt32() & inclusiveMaxValue;

        uint bits, val;
        do
        {
            bits = UInt32();
            val = bits % exclusiveMaxValue;
            // if (bits - val + inclusiveMaxValue) overflows then val has been
            // taken from an incomplete chunk at the end of the range of bits
            // in that case we reject it and loop again
        } while (bits - val + inclusiveMaxValue < inclusiveMaxValue);
        return val;
    }
}
The rejection process could, theoretically, keep looping forever; in practice the performance should be pretty good. It's difficult to suggest any generally applicable optimisations without knowing (a) the expected usage patterns, and (b) the performance characteristics of your underlying RNG.
For example, if most callers will be specifying a max value <= 255 then it might not make sense to ask for four bytes of randomness every time. On the other hand, the performance benefit of requesting fewer bytes might be outweighed by the additional cost of always checking how many you actually need. (And, of course, once you do have specific information then you can keep optimising and testing until your results are good enough.)
I am not sure that this is an answer. It definitely needs more space than a comment, so I have to write it here, but I am willing to delete it if others think it is stupid.
From the OQ I gather that:
Entropy bits are very expensive.
Everything else should be considered expensive, but less so than entropy.
My idea is to use binary digits to halve, quarter, ... the maxValue space, until it is reduced to a single number. Something like this:
I'll use maxValue = 333 (decimal) as an example and assume a function getBit() that randomly returns 0 or 1.
offset := 0
space := maxValue
while (space > 0)
{
    // Right-shift the value, keeping the rightmost bit. This should be
    // efficient on x86 and x64, if coded in real code, not pseudocode.
    remains := space & 1
    part := floor(space / 2)
    space := part
    // In the 333 example, part is now 166, but 2*166 = 332. If we were to simply
    // choose one half of the space, we would be heavily biased towards the upper
    // half, so if there is a remainder, we consume a bit of entropy to decide
    // which half is bigger.
    if (remains)
        if (getBit())
            part++
    // Now we decide which half to choose, consuming a bit of entropy.
    if (getBit())
        offset += part
    // Exit condition: the remaining number space = 0 is guaranteed to be met.
    // In the 333 example, offset will be 0, 166 or 167, and the remaining
    // space will be 166.
}
randomResult := offset
getBit() can either come directly from your entropy source, if it is bit-based, or be implemented by consuming n bits of entropy at once on the first call (obviously with n being the optimum for your entropy source) and shifting them out until empty.
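A rough C# transcription of the pseudocode above (illustrative only; the getBit delegate is an assumed stand-in for your bit-level entropy source, and the uniformity of the result is exactly as debatable as in the pseudocode):

uint UniformViaHalving(uint maxValue, Func<bool> getBit)
{
    uint offset = 0;
    uint space = maxValue;
    while (space > 0)
    {
        uint remains = space & 1;  // rightmost bit before the shift
        uint part = space >> 1;    // floor(space / 2)
        space = part;
        // If the space was odd, spend a bit to decide which half is bigger.
        if (remains != 0 && getBit())
            part++;
        // Spend a bit to decide whether to take the upper half.
        if (getBit())
            offset += part;
    }
    return offset;
}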
My current solution. A bit ugly for my tastes. It also has two divisions per generated number, which might negatively impact performance (I haven't profiled this part yet).
uint UniformUInt(uint maxResult)
{
    uint rand;
    uint count = maxResult + 1;
    if (maxResult < 0x100)
    {
        uint usefulCount = (0x100 / count) * count;
        do
        {
            rand = Byte();
        } while (rand >= usefulCount);
        return rand % count;
    }
    else if (maxResult < 0x10000)
    {
        uint usefulCount = (0x10000 / count) * count;
        do
        {
            rand = UInt16();
        } while (rand >= usefulCount);
        return rand % count;
    }
    else if (maxResult != uint.MaxValue)
    {
        uint usefulCount = (uint.MaxValue / count) * count; // reduces upper bound by 1, to avoid long division
        do
        {
            rand = UInt32();
        } while (rand >= usefulCount);
        return rand % count;
    }
    else
    {
        return UInt32();
    }
}
ulong UniformUInt(ulong maxResult)
{
    if (maxResult < 0x100000000)
        return UniformUInt((uint)maxResult);
    else if (maxResult < ulong.MaxValue)
    {
        ulong rand;
        ulong count = maxResult + 1;
        ulong usefulCount = (ulong.MaxValue / count) * count; // reduces upper bound by 1, since ulong can't represent any more
        do
        {
            rand = UInt64();
        } while (rand >= usefulCount);
        return rand % count;
    }
    else
        return UInt64();
}

How can I perform division in a program, digit by digit?

I'm messing around with writing a class similar to mpz (C) or BigInteger (Java). This is just for fun, so please don't go on about how I shouldn't be writing my own.
I have a class similar to:
public class HugeInt
{
    public List<Integer> digits;

    public HugeInt(String value)
    {
        // convert string value into its separate digits.
        // store them in the instance variable above
    }
}
Now, writing the add() and subtract() methods of this class is pretty simple. Here is an example:

private List<Integer> add(List<Integer> a, List<Integer> b)
{
    List<Integer> smallerDigits = (compareDigits(a, b) < 0) ? a : b;
    List<Integer> largerDigits = (compareDigits(a, b) >= 0) ? a : b;
    List<Integer> result = new ArrayList<Integer>();

    int carry = 0;
    for (int i = 0; i < largerDigits.size(); i++)
    {
        int num1 = largerDigits.get(i);
        int num2 = (i < smallerDigits.size()) ? smallerDigits.get(i) : 0;
        result.add((num1 + num2 + carry) % 10);
        carry = ((num1 + num2 + carry) / 10);
    }
    if (carry != 0) result.add(carry);
    return result;
}
Similarly, doing the multiply wasn't that hard either.
I see on wikipedia there is a page on Division Algorithms, but I'm not sure which one is appropriate for what I'm trying to do.
Because these positive integers (represented as digits) can be arbitrarily long, I want to make sure I don't attempt to do any operations on anything other than a digit-by-digit basis.
However, can anyone point me in the right direction for doing a division of two numbers that are represented as List<Integer>'s? Also, I can ignore the remainder as this is integer division.
You could just do long division, but this certainly isn't the optimal way to do it (edit: although it seems that something like this is a good way to do it). You could look at other implementations of big integer libraries, and a bit of Googling turns up a fair bit of useful information.
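For the special case where the divisor fits in a single int, schoolbook long division is straightforward. Here is a rough sketch (in C#, though the idea is language-agnostic; note it assumes digits are stored most-significant first, which is the opposite of the add() code above, and that a full big-by-big division additionally needs quotient-digit estimation, e.g. Knuth's Algorithm D):

static List<int> DivideByInt(List<int> digits, int divisor, out int remainder)
{
    var quotient = new List<int>();
    remainder = 0;
    foreach (int digit in digits)
    {
        int current = remainder * 10 + digit;  // "bring down" the next digit
        quotient.Add(current / divisor);       // next quotient digit
        remainder = current % divisor;
    }
    // strip leading zeros, keeping at least one digit
    while (quotient.Count > 1 && quotient[0] == 0)
        quotient.RemoveAt(0);
    return quotient;
}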
This may be a slight overkill, but if this is the kind of things you do for fun, you'll enjoy reading this:
http://www.fizyka.umk.pl/nrbook/c20-6.pdf
(that's "Arithmetic at Arbitrary Precision" from "Numerical recipes in C").
Pretty fascinating, as is most of this book, with good explanations and lots of code.
Since I assume you're just dealing with integer division, it's not very hard. Multiplication is repeated addition, and division is the opposite: repeated subtraction. So what you'll do is check how many times you can subtract the divisor from the dividend. For example, 3 can be subtracted from 10 three times without going below 0, so the integer division quotient is 3. (Be aware that naive repeated subtraction takes time proportional to the quotient, so for arbitrarily long numbers you'll want to subtract shifted multiples of the divisor, as in long division.)
This article, A Larger Integer, does not show how to implement digit-by-digit operations for "larger integers", but it does show how to implement an (apparently fully functional) 128-bit integer in terms of two Int64 types. I would imagine that it would not be too hard to extend the approach to use an array of Int64 types to yield an arbitrary-length integer. I just spent a few minutes looking back over the article, and the implementation of multiply looks like it could get pretty involved for arbitrary length.
The article shows how to implement division (quotient and remainder) using binary division.
