Random number in range with equal probability - C#

This might be more math-related than C#, but I need a C# solution, so I'm putting it here.
My question is about the probability of random number generators, more specifically whether each possible value is returned with equal probability.
I know there is the Random.Next(int, int) method, which returns a number between the first integer and the last (with the last being exclusive).
Random.Next() [without overloads] will return a value between 0 and Int32.MaxValue - 1 (Int32.MaxValue is 2147483647, so the largest possible return value is 2147483646).
If I want a value between 1 and 10, I could call Random.Next(1, 11) to do this, but does every value between 1 and 10 have an equal probability of occurring?
For example, the range is 10, and the 2147483647 possible return values (0 through 2147483646) are not perfectly divisible by 10, so the values 1-7 would have a slightly higher probability of occurring (because 2147483647 % 10 = 7). This is of course assuming that Random.Next() [without overloads] returns every value between 0 and 2147483646 with equal probability.
How would one ensure that every number within a range has an equal probability of occurring? Take, say, a lottery-type system, where it would be unfair for some people to have a higher probability than others. I'm not saying I would use the C# built-in RNG for this; I'm just using it as an example.

I note that no one actually answered the meaty question in your post:
For example, the range is 10, and the 2147483647 possible return values (0 through 2147483646) are not perfectly divisible by 10, so the values 1-7 would have a slightly higher probability of occurring (because 2147483647 % 10 = 7). This is of course assuming that Random.Next() [without overloads] returns every value between 0 and 2147483646 with equal probability.
How would one ensure that every number within a range has an equal probability of occurring?
Right, so you just throw out the values that cause the imbalance. For example, let's say you had an RNG that could produce a uniform distribution over { 0, 1, 2, 3, 4 }, and you wanted to use it to produce a uniform distribution over { 0, 1 }. The naive implementation is: draw from { 0, 1, 2, 3, 4 } and then return the value % 2; this, however, would obviously produce a biased sample. This happens because, as you note, 5 (the number of items) is not evenly divisible by 2. So, instead, throw out any draws that produce the value 4. Thus, the algorithm would be:
draw from { 0, 1, 2, 3, 4 }
if the value is 4, throw it out
otherwise, return the value % 2
You can use this basic idea to solve the general problem; a C# sketch follows.
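Here is a minimal C# sketch of that idea (the method name NextUnbiased is my own illustration, not the poster's code). It throws out raw draws from the top sliver of Next()'s range so the remaining draws divide evenly among the outputs:

// Returns a uniformly distributed value in [minValue, maxValue).
// Raw draws in the "leftover" tail are rejected, so every residue
// class is hit by exactly the same number of raw values.
static int NextUnbiased(Random rng, int minValue, int maxValue)
{
    int range = maxValue - minValue;          // size of the target set
    const int states = int.MaxValue;          // Next() yields 0 .. int.MaxValue - 1
    int limit = states - (states % range);    // largest usable multiple of range
    int draw;
    do
    {
        draw = rng.Next();                    // uniform over [0, states)
    } while (draw >= limit);                  // throw out the biased tail
    return minValue + (draw % range);
}

For example, NextUnbiased(new Random(), 1, 11) draws uniformly from 1 to 10.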
but does every value between 1 and 10 have an equal probability of occurring?
Yes, it does. From MSDN:
Pseudo-random numbers are chosen with equal probability from a finite set of numbers.
Edit: Apparently the documentation is NOT consistent with the current implementation in .NET. The documentation states the draws are uniform, but the code suggests that it is not. However, that does NOT negate the fact that this is a soluble problem, and my approach is one way to solve it.

The C# built-in RNG is, as you expect, uniformly distributed: every number has an equal likelihood of occurring within the range you specify for Next(min, max).
You can test this yourself (I have) by taking, say, 1M samples and storing how many times each number actually appears. You'll get an almost flat-line curve if you graph it.
Also note that each number having an equal likelihood doesn't mean that each number will occur the same number of times. If you're looking at random numbers from 1 to 10 over 100 iterations, it won't be an exactly even distribution of 10 occurrences per number. Some numbers may occur 8 times, and others 12 or 13 times. However, with more iterations, this tends to even out.
Also, since it's mentioned in the comments, I'll add: if you want something stronger, look up cryptographically secure PRNGs (in .NET, System.Security.Cryptography's RandomNumberGenerator). As a general-purpose generator, Mersenne Twister is particularly good from what I've seen (fast, cheap to compute, huge period) and has open-source implementations in C#; note, though, that it is not cryptographically secure.
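For what it's worth, if you are on .NET Core 3.0 or later (an assumption about your runtime), the cryptographic generator already exposes an unbiased ranged draw, so you don't need to write the rejection loop yourself:

using System.Security.Cryptography;

// GetInt32 draws from the OS cryptographic RNG and performs
// rejection sampling internally, so the result is uniform over 1..10.
int value = RandomNumberGenerator.GetInt32(1, 11);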

Test program:
var a = new int[10];
var r = new Random();

// Draw one million samples from [1, 10] and tally how often each value appears.
for (int i = 0; i < 1000000; i++) a[r.Next(1, 11) - 1]++;

// Print each value alongside its observed count.
for (int i = 0; i < a.Length; i++) Console.WriteLine("{0,2}{1,10}", i + 1, a[i]);
Output:
 1     99924
 2    100199
 3    100568
 4    100406
 5    100114
 6     99418
 7     99759
 8     99573
 9    100121
10     99918
Conclusion:
Each value is returned with equal probability, as far as this simple test can measure.

Ashes and dtb are incorrect: You are right to suspect that some numbers would have a greater chance of occurring than others.
When you call .Next(x, y), there are y - x possible return values. The .NET 4.0 Random class calculates a return value based on the return value of NextDouble() (this is a slightly simplified description).
Obviously, the set of possible double values is finite, and, as you note, it may not be a multiple of the size of the set of possible return values of .Next(x, y). Therefore, assuming that the set of input values is uniformly distributed, some output values will have a slightly greater probability of occurring.
I don't know offhand how many numeric double values there are (i.e., excluding infinity and NaN values), but it is certainly larger than 2^32. In your case, if we assume 2^32 values for the sake of argument, then we have to map 4294967296 inputs to 10 outputs. Some values would occur with a probability greater by a factor of 429496730 / 429496729, i.e., about 0.000000233 percent greater. In fact, since the number of input states is greater than 2^32, the difference in probability would be even smaller.
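To make that arithmetic concrete, here is a small sketch (my own illustration, not part of the original answer) that maps 2^32 equally likely input states onto 10 outputs and computes the resulting bias:

long states = 1L << 32;           // 4294967296 equally likely inputs
int outputs = 10;
long small = states / outputs;    // 429496729 inputs map to each of 4 outputs
long large = small + 1;           // 429496730 inputs map to each of the other 6
double biasPercent = ((double)large / small - 1) * 100;
Console.WriteLine(biasPercent);   // about 2.33e-7 percent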

Related

calculate sum of numbers closest to a given number

I want to find out the best way to do this in C#:
I have an array of, let's say, 20 numbers, and then one more additional variable.
I want to get the sum of the numbers which is closest to the given variable.
Let's say I have 1.1, 1.5, 1.7, 1.9, 2.2, 3.1, 3.2, 1.5, 4.5, 4.1, and the additional variable has a value of 5.
I want to get the sum of some numbers in the array which will be closest to the given number, and once I get that sum, remove those numbers from the list and add them to a new array.
Every comment is welcomed.
Thanks
You are describing the optimization version of the Subset Sum problem.
The problem is NP-complete, so there is no known polynomial-time solution to it.
However, since the input is fairly small scale, an exponential solution of checking all subsets is feasible: there are only 2^20 ~= 1,000,000 subsets (a bit more, actually, but close enough for estimating run time).
Pseudocode should be something like:
getClosestSum(list, sum, number):
    if list is empty:
        return sum
    candidate1 <- getClosestSum(list[1:], sum, number)
    candidate2 <- getClosestSum(list[1:], sum + list[0], number)
    if abs(number - candidate1) < abs(number - candidate2):
        return candidate1
    else:
        return candidate2
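A direct C# translation of that pseudocode might look like this (a sketch; it explores all 2^n subsets, so it is only practical for small arrays like the 20-element one described):

// Returns the subset sum closest to target, trying every subset.
// sum carries the running total; index walks through the array.
static double GetClosestSum(double[] items, int index, double sum, double target)
{
    if (index == items.Length)
        return sum;

    double skipped = GetClosestSum(items, index + 1, sum, target);              // leave items[index] out
    double taken = GetClosestSum(items, index + 1, sum + items[index], target); // include items[index]

    return Math.Abs(target - skipped) < Math.Abs(target - taken) ? skipped : taken;
}

Calling GetClosestSum(values, 0, 0.0, 5.0) on the example array returns the achievable sum closest to 5.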

find a unique output based on two inputs? [duplicate]

This question already has answers here:
Mapping two integers to one, in a unique and deterministic way
(19 answers)
Closed 7 years ago.
I need to find a way such that the user inputs 2 numbers (int) and for every different pair of values a single output (int preferably!) is returned.
Say the user enters 6, 8: it returns k. When the user enters anything else, like 6, 7 or 9, 8, or any other input m, n except 6, 8 (even if only one input is changed), a completely different output is produced. But the thing is, it should be unique to that m, n, so I can't use something like m*n, because 6 x 4 = 24 but also 12 x 2 = 24, so the output is not unique. I need to find a way where for every different input there is a totally different output that is not repeated for any other input.
EDIT: In response to Nicolas: the input range can be anything but will be less than 1000 (but more than 1, of course!)
EDIT 2: In response to Rawling: I can use long (Int64), but preferably not float or double, because this output will be used in a for loop, and float and double are terrible as for-loop counters; you can check it here
Since your two numbers are less than 1000, you can do k = (1000 * x1) + x2 to get a unique answer. The maximum value would be 999999, which is well within the range of a 32-bit int.
You can always return a long: from two integers a and b, return 2^|INT_SIZE| * a + b (i.e., 2^32 * a + b for 32-bit ints).
It is easy to see from the pigeonhole principle that, given two ints, one cannot return a unique int for every different input. Explanation: if you have 2 numbers, each containing n bits, then there are 2^n possibilities for each number, and thus (2^n)^2 possible pairs, so by the pigeonhole principle you need at least lg_2((2^n)^2) = 2n bits to represent them all.
EDIT: Your edit mentions the range of your numbers is [1,1000] - thus the same idea can be applied: 1000*a + b will generate a unique int for each pairs.
Note that for the same reasons, the range of the resulting integer must be [1,1000000] - or you will get clashes.
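A sketch of both variants in C# (my own illustration of the two answers above; the method names are hypothetical):

// Inputs known to be in [1, 1000): pack both into one int.
static int PairSmall(int a, int b) => 1000 * a + b;

// Arbitrary 32-bit inputs: pack into a long, with a in the high
// 32 bits and b in the low 32 bits (equivalent to 2^32 * a + b).
static long PairWide(int a, int b) => ((long)a << 32) | (uint)b;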
Because I don't have 50 reputation to comment, I must say it here: there are functions called pairing functions.
Pairing functions such as Cantor's pairing function (shown at the previous link) and Szudzik's pairing function allow the inputs to be unbounded and still provide a unique and deterministic output.
Here is another similar question on Stack Overflow (great, I need 10 reputation to post more than two links..):
https://stackoverflow.com/questions/919612/mapping-two-integers-to-one-in-a-unique-and-deterministic-way
EDIT: I'm late.
If you didn't have a hard upper bound, you could do the following:
int Unique(int x, int y)
{
    // Cantor pairing: t is the triangular number n(n+1)/2, computed by
    // halving whichever factor is even so the integer division is exact.
    int n = x + y;
    int t = (n % 2 == 0) ? ((n / 2) * (n + 1)) : (n * ((n + 1) / 2));
    return t + x;
}
Mathematically speaking, this will return a unique non-negative integer for each pair of non-negative integers, with no upper bound.
Programmatically speaking, it will run into overflow problems, which could be overcome by using long instead of int for everything except the input variables.
The canonical mathematical solution is to use prime powers. As every number can be decomposed uniquely into its prime factors, returning 2^n * 3^m will give you different results for every n and m.
This can be extended to 2^n * 3^m * 5^a * 7^b *11^c and so on; you only need to check that you do not run out of 32-bit integers. If there is a risk of overflowing, you can take the remainder after dividing by a prime larger than your input range, and you will still have uniqueness.
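A sketch of the prime-power encoding using BigInteger to sidestep the 32-bit overflow concern (my own illustration; the method name is hypothetical):

using System.Numerics;

// Encodes (n, m) as 2^n * 3^m; uniqueness follows from the
// fundamental theorem of arithmetic (unique prime factorization).
static BigInteger Encode(int n, int m) =>
    BigInteger.Pow(2, n) * BigInteger.Pow(3, m);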

Unexpected Behavior of Math.Floor(double) and Math.Ceiling(double)

This question is about the threshold at which Math.Floor(double) and Math.Ceiling(double) decide to give you the previous or next integer value. I was disturbed to find that the threshold seems to have nothing to do with Double.Epsilon, which is the smallest positive value that can be represented with a double. For example:
double x = 3.0;
Console.WriteLine( Math.Floor( x - Double.Epsilon ) ); // expected 2, got 3
Console.WriteLine( Math.Ceiling( x + Double.Epsilon) ); // expected 4, got 3
Even multiplying Double.Epsilon by a fair bit didn't do the trick:
Console.WriteLine( Math.Floor( x - Double.Epsilon*1000 ) ); // expected 2, got 3
Console.WriteLine( Math.Ceiling( x + Double.Epsilon*1000) ); // expected 4, got 3
With some experimentation, I was able to determine that the threshold is somewhere around 2.2E-16, which is very small, but VASTLY bigger than Double.Epsilon.
The reason this question came up is that I was trying to calculate the number of digits in a number with the formula var digits = Math.Floor( Math.Log( n, 10 ) ) + 1. This formula doesn't work for n=1000 (which I stumbled on completely by accident) because Math.Log( 1000, 10 ) returns a number that's 4.44E-16 off its actual value. (I later found that the built-in Math.Log10(double) provides much more accurate results.)
Shouldn't the threshold be tied to Double.Epsilon or, if not, shouldn't the threshold be documented? (I couldn't find any mention of this in the official MSDN documentation.)
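A quick sketch reproducing the issue from the question (the exact low-order digits may vary by runtime):

double viaLog = Math.Floor(Math.Log(1000, 10)) + 1;    // Math.Log(1000, 10) can return 2.9999999999999996
double viaLog10 = Math.Floor(Math.Log10(1000)) + 1;    // Math.Log10(1000) returns exactly 3
Console.WriteLine(viaLog);    // 3 (wrong: 1000 has 4 digits)
Console.WriteLine(viaLog10);  // 4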
Shouldn't the threshold should be tied to Double.Epsilon
No.
The representable doubles are not uniformly distributed over the real numbers. Close to zero there are many representable values. But the further from zero you get, the further apart representable doubles are. For very large numbers even adding 1 to a double will not give you a new value.
Therefore the threshold you are looking for depends on how large your number is. It is not a constant.
The value of Double.Epsilon is 4.94065645841247e-324. Adding or subtracting this value to 3 results in 3, due to the way floating-point works.
A double has 53 bits of mantissa, so the smallest value you can add that will have any impact will be approximately 2^53 times smaller than your variable. So something around 1e-16 sounds about right (order of magnitude).
So to answer your question: there is no "threshold"; floor and ceil simply act on their argument in exactly the way you would expect.
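To see this concretely, you can step a double by exactly one representable unit; Math.BitDecrement exists in .NET Core 3.0 and later (an assumption about your runtime):

double x = 3.0;
Console.WriteLine(x - Double.Epsilon == x);             // True: epsilon vanishes at this magnitude
Console.WriteLine(Math.BitDecrement(x));                // 2.9999999999999996, one representable step below 3
Console.WriteLine(Math.Floor(Math.BitDecrement(x)));    // 2: Floor responds as soon as the value actually changes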
This is going to be hand-waving rather than references to specifications, but I hope my "intuitive explanation" suits you well.
Epsilon represents the smallest positive magnitude that can be represented that is different from zero. Considering the mantissa and exponent of a double, that's going to be extremely tiny -- think 10^-324. There are over three hundred zeros between the decimal point and the first non-zero digit.
However, a double represents roughly 14-15 digits of precision. That still leaves about 310 digits of zeros between Epsilon and integers.
Doubles are fixed to a certain bit length. If you really want arbitrary-precision calculations, you should use an arbitrary-precision library instead. And be prepared for it to be significantly slower: representing all 325 digits that would be necessary to store a number such as 2 + epsilon would require roughly 75 times more storage per number. That storage isn't free, and calculating with it certainly cannot go at full CPU speed.

Centering Divisions Around Zero

I'm trying to create something that sort of resembles a histogram: I'm trying to create buckets from an array.
Suppose I have a random array of doubles between -10 and 10; this is very simplified. I then want to specify a center point, in this case 0, and the number of buckets.
If I want 4 buckets, the division is -10 to -5, -5 to 0, 0 to 5 and 5 to 10. Not that complicated, right? Now if I change the min and max to -12 and 9, with the same 4 divisions, it's more complicated. I either want a division at -3 and 3 (so it is centered around 0), or one at -6 to 0 and 0 to 6.
It's not that hard to find the division size:
divisionSize = Math.Ceiling((Abs(Max) + Abs(Min)) / Divisions)
Then you would basically have an if statement to determine whether you want it centered on 0 or on an edge, and you iterate out from either 0 or DivisionSize/2, depending on the situation. You may not ALWAYS end up with the specified number of divisions, but it will be close. Then you iterate through the array and increment the bin counts.
Does this seem like a good way to go about this? This method would surely work but it does not seem to be the most elegant. I'm curious as to whether the creation of the bins and the counting from the list could be done in a clever class with linq in a more elegant way?
Something like creating the bins and then having each bin be a property {get;} that returns list.Count(x=> x >= Lower && x < Upper).
To me it seems simpler: you need to find the lower bound and the size of each "division".
Since you want it to be symmetrical around 0, depending on the number of divisions you either get one bucket that includes 0 for odd counts (-3,3), or buckets that meet at 0 for even counts (-3,0)(0,3).
lowerBound = -Max(Abs(from), Abs(to))
bucketSize = 2 * Abs(lowerBound) / divisions
(throw in Ceiling and update bucketSize and lowerBound if needed)
Then use .Aggregate to update an array of buckets (the position would be (value - lowerBound) / bucketSize, with additional range checks if needed); a sketch follows below.
Note: do not implement get the way you suggested - it is not expected for getters to perform non-trivial work like walking a large array.
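Here is a sketch of the counting step with .Aggregate (my own illustration; the input names from, to, divisions and values are hypothetical):

using System;
using System.Linq;

double from = -12, to = 9;
int divisions = 4;
double[] values = { -11.5, -3.2, 0.1, 4.4, 8.9 };

double lowerBound = -Math.Max(Math.Abs(from), Math.Abs(to));
double bucketSize = 2 * Math.Abs(lowerBound) / divisions;

// Fold the values into a fixed-size array of bucket counts.
int[] counts = values.Aggregate(new int[divisions], (acc, v) =>
{
    int bucket = (int)((v - lowerBound) / bucketSize);
    bucket = Math.Min(Math.Max(bucket, 0), divisions - 1); // clamp values on the top edge
    acc[bucket]++;
    return acc;
});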

Finding 2 or more numbers having the given number as GCF

I don't want to find the GCF of given numbers; I use the Euclidean algorithm for that. I want to generate a series of numbers having a given GCF. For example, if I choose 4, I should get something like 100, 72 or 4, 8, etc.
Any pointers would be appreciated.
A series of pairs of numbers having N as a GCF is {N,N}, {N,2N}, {N,3N}, ....
In fact, any set consisting of N and 1 or more multiples of N has N as its GCF.
1. Maybe this question can be better answered at http://math.stackexchange.com
2. Just construct the numbers you are interested in by multiplying the GCD by numbers that the GCD does not divide. For your example of a given GCD = 4, that means:
k1 = 4, the GCD itself
k2 = 4 * 2, since 4 does not divide 2
k3 = 4 * 3, since 4 does not divide 3
not k4 = 4 * 4, since 4 divides 4; but
k4 = 4 * 5, since 4 does not divide 5, etc.
If 4 is the input, you want a list of numbers whose greatest common factor is 4. You can ensure this by making 4 the only factor in the entire series. Therefore, you multiply the number (4) by all primes to ensure that.
prime-list = 3, 5, 7, 11, 13, 17
gcf-list for 4 -> (3*4)12, (4*5)20, (4*7)28, (4*11)44, (4*13)52, (4*17)68, ...
This will give you a list such that the GCF of any two numbers is 4
Choose a set of numbers that are pairwise coprime (that is, gcd(x,y) = 1 for every x <> y in the set). Multiply each number by your target GCD.
I realize that this is an old question, but I am going to provide my own answer along with an explanation of how I got there. First, let's call the GCF n.
Initially I would have suggested picking random integers and multiplying them each by n to get the set of numbers. This would of course give you numbers evenly divisible by n, but not necessarily numbers with a GCF of n: if the integers happened to all have a GCF other than 1, then the GCF of the resulting set would actually be n times that number, not n. That being said, multiplying n by a set of integers seems the best way of ensuring that each number in the set is at least divisible by n.
One option would be to make one of those numbers 1, but that would reduce the randomness of the set, as n would always be in the resulting set.
Next you could use some prime numbers and multiply them by n, but that would also reduce randomness, as there would be fewer possible numbers; besides, the numbers don't actually need to be prime, just coprime as a set (GCF = 1 for the entire set).
You could also pick a set of numbers where each pair of numbers is coprime, but again, only the entire set needs to be coprime, not coprime pairwise (which would be pretty processor-intensive with larger sets).
So if you are going for fairly random numbers, I would start by determining how many numbers you want in the set (whether that is randomly determined or predetermined) and then generating one less than that number completely 'randomly'. I would then compute the common prime factors of those numbers and pick a final random number that does not have any of those prime factors. Merely ensuring it does not have the same GCF is not sufficient, as the GCF could share factors with the final number. It only takes one number in a set that shares no prime factors with the others to make the GCF of that set 1. I would then take that set of numbers and multiply each by n to get the set you want. A sketch of this approach appears below.
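Here is a sketch of that approach in C# (my own illustration; it assumes count >= 2, and the 1..999 multiplier range is arbitrary):

using System;
using System.Linq;

// Generates `count` numbers whose GCF is exactly n: draw count-1 random
// multipliers, then pick a final multiplier sharing no prime factor with
// their GCD, which forces the multipliers' overall GCD to 1 and hence
// the products' GCF to n.
static int[] NumbersWithGcf(int n, int count, Random rng)
{
    int[] multipliers = new int[count];
    for (int i = 0; i < count - 1; i++)
        multipliers[i] = rng.Next(1, 1000);

    // GCD of the multipliers chosen so far.
    int g = multipliers.Take(count - 1).Aggregate(Gcd);

    // Pick a last multiplier coprime to g; coprime to g means it shares
    // none of the common prime factors of the earlier numbers.
    int last;
    do { last = rng.Next(1, 1000); } while (Gcd(last, g) != 1);
    multipliers[count - 1] = last;

    return multipliers.Select(m => m * n).ToArray();
}

static int Gcd(int a, int b) => b == 0 ? a : Gcd(b, a % b);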
