I am developing a gaming platform that is subject to heavy regulatory scrutiny. I chose Math.NET because it seemed like a good fit. However, I have just received the comment below from our auditors.
Could you comment on whether it is accurate, and on how it can be resolved?
In RandomSource(), Next(int, int) is defined as follows:
public override sealed int Next(int minValue, int maxValue)
{
    if (minValue > maxValue)
    {
        throw new ArgumentException(Resources.ArgumentMinValueGreaterThanMaxValue);
    }

    if (_threadSafe)
    {
        lock (_lock)
        {
            return (int)(DoSample()*(maxValue - minValue)) + minValue;
        }
    }

    return (int)(DoSample()*(maxValue - minValue)) + minValue;
}
This creates a bias in the same way as before: an unscaled value from the RNG is multiplied by the range without first eliminating the bias. Unless the range is a power of 2, there will be a bias.
Update: The implementation of Next(minInclusive, maxExclusive) has been changed in Math.NET Numerics v3.13 following this discussion. Since v3.13 it no longer involves floating-point numbers; instead it samples integers with as many bits as needed to cover the requested range (rounded up to a power of two) and rejects samples that fall outside the actual range. This way it avoids adding any bias on top of the byte sampling itself (as provided, e.g., by the crypto RNG).
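For illustration, here is a minimal sketch of that kind of bitmask-plus-rejection scheme. This is not the actual Math.NET source; nextUInt32 stands in for whatever raw uniform bit source is used underneath:

static int NextViaRejection(Func<uint> nextUInt32, int minInclusive, int maxExclusive)
{
    unchecked
    {
        // Unsigned size of the range; the wrap-around is intentional and correct.
        uint range = (uint)(maxExclusive - minInclusive);
        if (range == 0) return minInclusive; // degenerate empty range

        // Smallest all-ones mask covering the range (next power of two, minus one).
        uint mask = range - 1;
        mask |= mask >> 1;
        mask |= mask >> 2;
        mask |= mask >> 4;
        mask |= mask >> 8;
        mask |= mask >> 16;

        uint candidate;
        do
        {
            candidate = nextUInt32() & mask; // uniform on [0, mask]
        } while (candidate >= range);        // reject the overhang; no bias remains

        return (int)candidate + minInclusive;
    }
}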
Assumption: DoSample() returns a uniformly distributed sample in the range [0,1) (double precision floating point number).
Multiplying it by the range R = max - min will result in a uniformly distributed sample in the range [0, R). Casting this to an integer, which is essentially a floor, will result in a uniformly distributed discrete sample from 0, 1, 2, ..., R-1. I don't see how the fact that R is even, odd, or a power of two would affect the bias in this step.
A few runs to compute 100'000'000 samples also do not indicate obvious bias, but of course this is no proof:
var r = new CryptoRandomSource();
long[] h = new long[8];
for (int i = 0; i < 100000000; i++)
{
    h[r.Next(2,7)]++;
}
0
0
19996313
20001286
19998092
19998328
20005981
0

0
0
20000288
20002035
20006269
19994927
19996481
0

0
0
19998296
19997777
20001463
20002759
19999705
0
I've come up with this solution for a value between 0 and max inclusive. I'm no maths expert, so comments are welcome.
It seems to satisfy the regulatory spec I have, which says:
"2b) If a particular random number selected is outside the range of equal distribution of re-scaling values, it is permissible to discard that random number and select the next in sequence for the purpose of re-scaling."
private readonly CryptoRandomSource _random = new CryptoRandomSource();

private int GetRandomNumber(int max)
{
    int number;
    // Smallest power of two that can hold the max + 1 values 0..max inclusive.
    var nextPowerOfTwo = (int)Math.Pow(2, Math.Ceiling(Math.Log(max + 1) / Math.Log(2)));
    do
    {
        // Note: the 2nd param of Next is an *exclusive* bound, so this samples
        // the power-of-two range [0, nextPowerOfTwo); rejection handles the rest.
        number = _random.Next(0, nextPowerOfTwo);
    } while (number > max);
    return number;
}
Related
This is my generating algorithm; it generates random double elements for an array whose sum must be 1:
public static double[] GenerateWithSumOfElementsIsOne(int elements)
{
    double sum = 1;
    double[] arr = new double[elements];
    for (int i = 0; i < elements - 1; i++)
    {
        arr[i] = RandomHelper.GetRandomNumber(0, sum);
        sum -= arr[i];
    }
    arr[elements - 1] = sum;
    return arr;
}
And the helper method:
private static readonly Random random = new Random();

public static double GetRandomNumber(double minimum, double maximum)
{
    // Reuse one Random instance; constructing a new one per call can yield
    // repeated values because instances are seeded from the clock.
    return random.NextDouble() * (maximum - minimum) + minimum;
}
My test cases are:
[Test]
[TestCase(7)]
[TestCase(5)]
[TestCase(4)]
[TestCase(8)]
[TestCase(10)]
[TestCase(50)]
public void GenerateWithSumOfElementsIsOne(int num)
{
    Assert.AreEqual(1, RandomArray.GenerateWithSumOfElementsIsOne(num).Sum());
}
And the thing is, when I test it, it returns a slightly different value each time, as in these cases:
Expected: 1
But was: 0.99999999999999967d
Expected: 1
But was: 0.99999999999999989d
But on the next run it sometimes passes all of them, sometimes not.
I know this is trouble with rounding, and I'm asking for some help, dear experts :)
https://en.wikipedia.org/wiki/Floating-point_arithmetic
In computing, floating-point arithmetic is arithmetic using formulaic
representation of real numbers as an approximation so as to support a
trade-off between range and precision. For this reason, floating-point
computation is often found in systems which include very small and
very large real numbers, which require fast processing times. A number
is, in general, represented approximately to a fixed number of
significant digits (the significand) and scaled using an exponent in
some fixed base; the base for the scaling is normally two, ten, or
sixteen.
In short, this is what floats do: they don't hold every single value, and they approximate. If you would like more precision, try using a decimal instead, or add a tolerance via an epsilon (an upper bound on the relative error due to rounding in floating-point arithmetic):
static bool NearlyEqual(double a, double b, double epsilon)
{
    var ratio = a / b;
    var diff = Math.Abs(ratio - 1);
    return diff <= epsilon;
}
Round-off errors are frequent with floating-point types (like Single and Double). For example, let's compute an easy sum:
// 0.1 + 0.1 + ... + 0.1 = ? (100 times). Is it 0.1 * 100 == 10? No!
Console.WriteLine((Enumerable.Range(1, 100).Sum(i => 0.1)).ToString("R"));
Outcome:
9.99999999999998
That's why, when comparing floating-point values with == or !=, you should add a tolerance:
// We have at least 8 correct digits,
// i.e. the absolute value of the (round-off) error is less than the tolerance.
Assert.IsTrue(Math.Abs(RandomArray.GenerateWithSumOfElementsIsOne(num).Sum() - 1.0) < 1e-8);
As I'm new to C#, I don't want to mess around with Random for a long time trying to get what I want, and I also want to know that it does what I need without spending a really long time testing it. How can I get a 0.5% chance of something (1/200) with Random? Would this code work?
My question is not "how random is Random" (so please don't go posting duplicates); it's whether this code is the best way to do the job, and whether it will achieve what I am trying to achieve.
var random = new Random();
var randomNumber = random.Next(1, 200);
if (randomNumber == 87)
{
    // I can put any number between 1 and 200 here; will this work?
    // If we reach this if statement, do we have a 0.5% chance?
    // (Careful: Next(1, 200) has an *exclusive* upper bound, so it produces
    // 199 distinct values, making this a 1/199 chance rather than 1/200.)
}
Firstly you should convert your chance into a normalized value between 0.0 and 1.0. This is the mathematical notion of a probability.
For your case, this would give you double probability = 0.005;.
Then you can do the following:
if (rng.NextDouble() < probability)
{
    ...
}
This works because Random.NextDouble() returns a random number evenly distributed within the half-open interval [0.0, 1.0) (i.e. up to but not including 1.0.)
So if your probability is 0.0 the body of the if will never be executed, and if your probability is 1.0 then it will always be executed.
The advantage of using a normalised probability is that it works with any probability, and not just with integral probabilities.
If you do happen to have a percentage probability, you convert it to a normalised one very simply - by dividing it by 100.0.
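Putting it together (a minimal sketch; note that the Random instance should be created once and reused, not newed up per call):

var rng = new Random();
double probability = 0.005; // 0.5%, i.e. 0.5 / 100.0

if (rng.NextDouble() < probability)
{
    // Taken on average once in every 200 trials.
}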
Addendum:
There's little advantage to using Random.Next(int min, int max) instead, because that only works for integral probabilities. And behind the scenes, Random.Next(int min, int max) is implemented like this:
public virtual int Next(int minValue, int maxValue) {
    if (minValue > maxValue) {
        throw new ArgumentOutOfRangeException("minValue", Environment.GetResourceString("Argument_MinMaxValue", "minValue", "maxValue"));
    }
    Contract.EndContractBlock();

    long range = (long)maxValue - minValue;
    if (range <= (long)Int32.MaxValue) {
        return ((int)(Sample() * range) + minValue);
    }
    else {
        return (int)((long)(GetSampleForLargeRange() * range) + minValue);
    }
}
And NextDouble() is implemented as:
public virtual double NextDouble() {
    return Sample();
}
Note that both these implementations call Sample().
Finally I just want to note that the built-in Random class isn't particularly great - it doesn't have a very long period. I use a RNG based on a 128-bit XOR-Shift which is very fast and generates very "good" random numbers.
(I use one based on this XORSHIFT+ generator.)
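For reference, here is a minimal sketch of the xorshift128+ step (the shift constants 23/18/5 are one published parameter set; seeding the two state words, which must not both be zero, is up to you):

ulong _s0 = 0x0123456789ABCDEFUL; // seed: any values, not both zero
ulong _s1 = 0xFEDCBA9876543210UL;

ulong NextUInt64()
{
    ulong s1 = _s0;
    ulong s0 = _s1;
    ulong result = s0 + s1;
    _s0 = s0;
    s1 ^= s1 << 23;
    _s1 = s1 ^ s0 ^ (s1 >> 18) ^ (s0 >> 5);
    return result;
}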
If I wanted to generate a random number covering all possible values an Int32 can contain, would the following code be a reasonable way of doing so? Is there any reason why it may not be a good idea (i.e. is the distribution at least as uniform as that of Random.Next() itself)?
public static int NextInt(Random Rnd) // -2,147,483,648 to 2,147,483,647
{
    int AnInt;
    AnInt = Rnd.Next(System.Int32.MinValue, System.Int32.MaxValue);
    AnInt += Rnd.Next(2);
    return AnInt;
}
You could use Random.NextBytes to obtain 4 bytes, then use BitConverter.ToInt32 to convert those to an int.
Something like:
byte[] buf = new byte[4];
Rnd.NextBytes(buf);
int i = BitConverter.ToInt32(buf,0);
Your proposed solution will slightly skew the distribution: minValue and maxValue will occur less frequently than the interior values. As an example, assume that int has a MinValue of -2 and a MaxValue of 1. Here are the possible initial values, each followed by the resulting values after adding Next(2):
-2: -2 -1
-1: -1 0
0: 0 1
Half of the -2 values will get modified up to -1, but only half of the 0 values will get modified up to 1. So the values -2 and 1 will occur less frequently than -1 and 0.
Damien's solution is good. Another choice would be:

if (Random(2) == 0) {
    return Random(int.MinValue, 0);
} else {
    return 1 + Random(-1, int.MaxValue);
}
Another solution, similar to Damien's approach but faster than the previous one, would be:
int i = r.Next(ushort.MinValue, ushort.MaxValue + 1) << 16;
i |= r.Next(ushort.MinValue, ushort.MaxValue + 1);
A uniform distribution does not mean you get each number exactly once. For that you need a permutation.
Now, if you need a random permutation of all 4-billion numbers you're a bit stuck. .NET does not allow objects to be larger than 2GBs. You can work around that, but I assume that's not really what you need.
If you need fewer numbers (say, 100, or 5 million; fewer than a few billion) without repetitions, you should do this:
Maintain a set of integers, starting empty. Choose a random number. If it's already in the set, choose another random number. If it's not in the set, add it and return it.
That way you guarantee each number will be returned only once.
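A minimal sketch of that set-based approach (assuming duplicates should never be returned across calls):

var seen = new HashSet<int>();
var rng = new Random();

int NextUnique()
{
    while (true)
    {
        // Note: Next's upper bound is exclusive, so int.MaxValue itself is never produced.
        int candidate = rng.Next(int.MinValue, int.MaxValue);
        if (seen.Add(candidate)) // Add returns false if the value was already present
            return candidate;
    }
}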
I have a class where I read random bytes into an 8KB buffer and hand out numbers by converting them from the random bytes. This gives you the full int distribution. The 8KB buffer is used so you do not need to call NextBytes for every new random byte[].
// Get 4 bytes from the random buffer and cast them to int (all numbers are equally likely this way).
public int GetRandomInt()
{
    CheckBuf(sizeof(int));
    return BitConverter.ToInt32(_buf, _idx);
}

// Get bytes for your buffer. Both the Random class and the crypto API support this.
protected override void GetNewBuf(byte[] buf)
{
    _rnd.NextBytes(buf);
}

// The crypto API produces better random numbers but is slower.
public StrongRandomNumberGenerator()
{
    _rnd = new RNGCryptoServiceProvider();
}
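The buffer plumbing (_buf, _idx, CheckBuf) is omitted above; here is a minimal sketch of one way to fill that gap (the field names, buffer size, and refill policy are assumptions, not the original code):

private readonly byte[] _buf = new byte[8192];
private int _idx;
private int _pending = 8192; // forces a refill on the very first call

private void CheckBuf(int bytesNeeded)
{
    _idx += _pending; // step past the bytes the previous call handed out
    if (_idx + bytesNeeded > _buf.Length)
    {
        GetNewBuf(_buf); // refill the whole 8KB buffer in one call
        _idx = 0;
    }
    _pending = bytesNeeded; // the caller reads these bytes at _buf[_idx]
}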
I want to generate uniform integers that satisfy 0 <= result <= maxValue.
I already have a generator that returns uniform values in the full range of the built in unsigned integer types. Let's call the methods for this byte Byte(), ushort UInt16(), uint UInt32() and ulong UInt64(). Assume that the result of these methods is perfectly uniform.
The signature of the methods I want are uint UniformUInt(uint maxValue) and ulong UniformUInt(ulong maxValue).
What I'm looking for:
Correctness
I'd prefer the return values to be uniformly distributed over the given interval.
But a very small bias is acceptable if it increases performance significantly. By that I mean a bias of an order that would allow a distinguisher to succeed with probability 2/3 given 2^64 values.
It must work correctly for any maxValue.
Performance
The method should be fast.
Efficiency
The method should consume little raw randomness, since, depending on the underlying generator, generating the raw bytes might be costly. Wasting a few bits is fine, but consuming, say, 128 bits to generate a single number is probably excessive.
It's also possible to cache some left over randomness from the previous call in some member variables.
Be careful with int overflows, and wrapping behavior.
I already have a solution (I'll post it as an answer), but it's a bit ugly for my tastes, so I'd like to get ideas for better solutions.
Suggestions on how to unit test with large maxValues would be nice too, since I can't generate a histogram with 2^64 buckets and 2^74 random values. Another complication is that with certain bugs, only some maxValue distributions are biased a lot, and others only very slightly.
How about something like this as a general-purpose solution? The algorithm is based on that used by Java's nextInt method, rejecting any values that would cause a non-uniform distribution. So long as the output of your UInt32 method is perfectly uniform then this should be too.
uint UniformUInt(uint inclusiveMaxValue)
{
    unchecked
    {
        uint exclusiveMaxValue = inclusiveMaxValue + 1;
        // If exclusiveMaxValue is a power of two then we can just use a mask;
        // this also handles the edge case where inclusiveMaxValue is uint.MaxValue.
        if ((exclusiveMaxValue & (~exclusiveMaxValue + 1)) == exclusiveMaxValue)
            return UInt32() & inclusiveMaxValue;

        uint bits, val;
        do
        {
            bits = UInt32();
            val = bits % exclusiveMaxValue;
            // If (bits - val + inclusiveMaxValue) overflows then val has been
            // taken from an incomplete chunk at the end of the range of bits;
            // in that case we reject it and loop again.
        } while (bits - val + inclusiveMaxValue < inclusiveMaxValue);
        return val;
    }
}
The rejection process could, theoretically, keep looping forever; in practice the performance should be pretty good. It's difficult to suggest any generally applicable optimisations without knowing (a) the expected usage patterns, and (b) the performance characteristics of your underlying RNG.
For example, if most callers will be specifying a max value <= 255 then it might not make sense to ask for four bytes of randomness every time. On the other hand, the performance benefit of requesting fewer bytes might be outweighed by the additional cost of always checking how many you actually need. (And, of course, once you do have specific information then you can keep optimising and testing until your results are good enough.)
I am not sure that this is an answer. It definitely needs more space than a comment, so I have to write it here, but I am willing to delete it if others think it is stupid.
From the original question I gather that:
Entropy bits are very expensive
Everything else should be considered expensive, but less so than entropy.
My idea is to use binary digits to halve, quarter, ... the maxValue space until it is reduced to a single number. Something like this:
I'll use maxValue = 333 (decimal) as an example, and assume a function getBit() that randomly returns 0 or 1.
offset := 0
space := maxValue
while (space > 0)
{
    // Right-shift the value, keeping the rightmost bit; this should be
    // efficient on x86 and x64 if coded in real code, not pseudocode.
    remains := space & 1
    part := floor(space / 2)
    space := part
    // In the 333 example, part is now 166, but 2*166 = 332. If we were to simply choose one
    // half of the space, we would be heavily biased towards the upper half, so if
    // we have a remainder, we consume a bit of entropy to decide which half is bigger.
    if (remains)
        if (getBit())
            part++
    // Now we decide which half to choose, consuming a bit of entropy.
    if (getBit())
        offset += part
    // Exit condition: the remaining number space = 0 is guaranteed to be reached.
    // In the 333 example, offset will be 0, 166 or 167, and the remaining space will be 166.
}
randomResult := offset
getBit() can either come from your entropy source, if it is bit-based, or by consuming n bits of entropy at once on first call (obviously with n being the optimum for your entropy source), and shifting this until empty.
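For concreteness, here is a direct C# transcription of the pseudocode above (getBit is assumed to return uniform random bits; whether the overall result is unbiased is exactly the question being posed here):

static int HalvingSample(int maxValue, Func<bool> getBit)
{
    int offset = 0;
    int space = maxValue;
    while (space > 0)
    {
        int remains = space & 1;      // 1 if the space splits unevenly
        int part = space >> 1;        // floor(space / 2)
        space = part;
        if (remains == 1 && getBit()) // decide which half gets the extra value
            part++;
        if (getBit())                 // choose the upper half
            offset += part;
    }
    return offset;
}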
My current solution. A bit ugly for my tastes. It also has two divisions per generated number, which might negatively impact performance (I haven't profiled this part yet).
uint UniformUInt(uint maxResult)
{
    uint rand;
    uint count = maxResult + 1;
    if (maxResult < 0x100)
    {
        uint usefulCount = (0x100 / count) * count;
        do
        {
            rand = Byte();
        } while (rand >= usefulCount);
        return rand % count;
    }
    else if (maxResult < 0x10000)
    {
        uint usefulCount = (0x10000 / count) * count;
        do
        {
            rand = UInt16();
        } while (rand >= usefulCount);
        return rand % count;
    }
    else if (maxResult != uint.MaxValue)
    {
        uint usefulCount = (uint.MaxValue / count) * count; // reduces the upper bound by 1, to avoid long division
        do
        {
            rand = UInt32();
        } while (rand >= usefulCount);
        return rand % count;
    }
    else
    {
        return UInt32();
    }
}
ulong UniformUInt(ulong maxResult)
{
    if (maxResult < 0x100000000)
        return UniformUInt((uint)maxResult);
    else if (maxResult < ulong.MaxValue)
    {
        ulong rand;
        ulong count = maxResult + 1;
        ulong usefulCount = (ulong.MaxValue / count) * count; // reduces the upper bound by 1, since ulong can't represent any more
        do
        {
            rand = UInt64();
        } while (rand >= usefulCount);
        return rand % count;
    }
    else
        return UInt64();
}
(Note: For brevity's sake, the following will not distinguish between randomness and pseudo-randomness. Also, in this context, constrained means between given min and max values.)
The System.Random class provides random generation of integers, doubles and byte arrays.
Using Random.Next, one can easily generate random constrained values of type Boolean, Char, (S)Byte, (U)Int16, (U)Int32. Using Random.NextDouble(), one can similarly generate constrained values of types Double and Single (as far as my understanding of this type goes). Random string generation (of a given length and alphabet) has also been tackled before.
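For example, a few of those one-liners might look like this (a sketch, using the inclusive-min/exclusive-max convention of Random.Next):

var rng = new Random();
bool nextBool = rng.Next(2) == 1;                                       // Boolean
char nextChar = (char)rng.Next('a', 'z' + 1);                           // Char in a chosen range
short nextInt16 = (short)rng.Next(short.MinValue, short.MaxValue + 1);  // Int16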
Consider the remaining primitive data types (excluding Object): Decimal and (U)Int64. Their random generation has been tackled as well (Decimal, (U)Int64 using Random.NextBytes()), but not when constrained. Rejection sampling (i.e. looping until the generated value is the desired range) could theoretically be used, but it is obviously not a practical solution. Normalizing NextDouble() won't work because it doesn't have enough significant digits.
In short, I am asking for the proper implementation of the following functions:
long NextLong(long min, long max)
decimal NextDecimal(decimal min, decimal max)
Note that, since System.DateTime is based on a ulong, the first function would allow for random constrained generation of such structs as well (similar to here, only in ticks instead of minutes).
This should do it. For decimal I used Jon Skeet's initial approach to generating random decimals (no constraints). For long I provided a method that produces random non-negative longs, which is then used to create a value in the requested range.
Note that for decimal the resulting distribution is not a uniform distribution on [minValue, maxValue]. It merely is uniform on all the bit representations of decimals that fall in the range [minValue, maxValue]. I do not see an easy way around this without using rejection sampling.
For long the resulting distribution is uniform on [minValue, maxValue).
static class RandomExtensions {
    static int NextInt32(this Random rg) {
        unchecked {
            int firstBits = rg.Next(0, 1 << 4) << 28;
            int lastBits = rg.Next(0, 1 << 28);
            return firstBits | lastBits;
        }
    }

    public static decimal NextDecimal(this Random rg) {
        bool sign = rg.Next(2) == 1;
        return rg.NextDecimal(sign);
    }

    static decimal NextDecimal(this Random rg, bool sign) {
        byte scale = (byte)rg.Next(29);
        return new decimal(rg.NextInt32(),
                           rg.NextInt32(),
                           rg.NextInt32(),
                           sign,
                           scale);
    }

    static decimal NextNonNegativeDecimal(this Random rg) {
        return rg.NextDecimal(false);
    }

    public static decimal NextDecimal(this Random rg, decimal maxValue) {
        return (rg.NextNonNegativeDecimal() / Decimal.MaxValue) * maxValue;
    }

    public static decimal NextDecimal(this Random rg, decimal minValue, decimal maxValue) {
        if (minValue >= maxValue) {
            throw new InvalidOperationException();
        }
        decimal range = maxValue - minValue;
        return rg.NextDecimal(range) + minValue;
    }

    static long NextNonNegativeLong(this Random rg) {
        byte[] bytes = new byte[sizeof(long)];
        rg.NextBytes(bytes);
        // strip out the sign bit
        bytes[7] = (byte)(bytes[7] & 0x7f);
        return BitConverter.ToInt64(bytes, 0);
    }

    public static long NextLong(this Random rg, long maxValue) {
        return (long)((rg.NextNonNegativeLong() / (double)Int64.MaxValue) * maxValue);
    }

    public static long NextLong(this Random rg, long minValue, long maxValue) {
        if (minValue >= maxValue) {
            throw new InvalidOperationException();
        }
        long range = maxValue - minValue;
        return rg.NextLong(range) + minValue;
    }
}
Let's assume you know how to generate N random bits. This is pretty easily done either using NextBytes or repeated calls to Random.Next with appropriate limits.
To generate a long/ulong in the right range, work out how large the range is, and how many bits it takes to represent it. You can then use rejection sampling which will at worst reject half the generated values (e.g. if you want a value in the range [0, 128], which means you'll generate [0, 255] multiple times). If you want a non-zero based range, just work out the size of the range, generate a random value in [0, size) and then add the base.
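A minimal sketch of that rejection-sampling approach for long (NextUInt64 is an assumed helper built on NextBytes, not part of the framework):

public static long NextLong(Random rng, long minInclusive, long maxInclusive)
{
    unchecked
    {
        // Unsigned size of the range; wraps correctly even for [long.MinValue, long.MaxValue].
        ulong range = (ulong)(maxInclusive - minInclusive) + 1;
        if (range == 0) return (long)NextUInt64(rng); // full 64-bit range

        // Smallest all-ones mask covering the range, then reject the overhang.
        ulong mask = range - 1;
        mask |= mask >> 1; mask |= mask >> 2; mask |= mask >> 4;
        mask |= mask >> 8; mask |= mask >> 16; mask |= mask >> 32;

        ulong sample;
        do
        {
            sample = NextUInt64(rng) & mask; // at worst this rejects half the draws
        } while (sample >= range);

        return (long)((ulong)minInclusive + sample);
    }
}

static ulong NextUInt64(Random rng)
{
    var buf = new byte[8];
    rng.NextBytes(buf);
    return BitConverter.ToUInt64(buf, 0);
}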
Generating a random decimal is significantly harder, I believe; aside from anything else, you'd have to specify the distribution you wanted.
I came here looking for a way to generate 64 bit values within an arbitrary range. The other answers failed to produce a random number when given certain ranges (e.g. long.MinValue to long.MaxValue). Here's my version that seems to solve the problem:
public static long NextInt64(this Random random, long minValue, long maxValue)
{
    Contract.Requires(random != null);
    Contract.Requires(minValue <= maxValue);
    Contract.Ensures(Contract.Result<long>() >= minValue &&
                     Contract.Result<long>() < maxValue);

    return (long)(minValue + (random.NextUInt64() % ((decimal)maxValue - minValue)));
}
It uses the following extension methods:
public static ulong NextUInt64(this Random random)
{
    Contract.Requires(random != null);

    return BitConverter.ToUInt64(random.NextBytes(8), 0);
}
public static byte[] NextBytes(this Random random, int byteCount)
{
    Contract.Requires(random != null);
    Contract.Requires(byteCount > 0);
    Contract.Ensures(Contract.Result<byte[]>() != null &&
                     Contract.Result<byte[]>().Length == byteCount);

    var buffer = new byte[byteCount];
    random.NextBytes(buffer);
    return buffer;
}
The distribution is not perfectly even when the size of the requested range is not a clean divisor of 2^64, but it at least produces a random number within the requested range for any given range.
Based upon Jon Skeet's method, here's my stab at it:
public static long NextLong(this Random rnd, long min, long max)
{
    if (max <= min)
    {
        throw new Exception("Min must be less than max.");
    }

    long dif = max - min;
    var bytes = new byte[8];
    rnd.NextBytes(bytes);
    bytes[7] &= 0x7f; // strip sign bit
    long posNum = BitConverter.ToInt64(bytes, 0);
    while (posNum > dif)
    {
        posNum >>= 1;
    }
    return min + posNum;
}
Let me know if you see any errors.
Use this instead of NextBytes:

long posNum = BitConverter.ToInt64(Guid.NewGuid().ToByteArray(), 0);