Random Probability Selection - c#

Say I have 10 prizes to give to 100 people. Each person gets a shot, one at a time. So if the first person fails to win a prize, the probability goes up to 10 in 99, and so on. Also, all 10 prizes MUST go.
What would be the best way to write this so that, by the end, if there is still a prize left, the last person would have a 1 in 1 chance of getting it?
What I was thinking of is something like this:
var random = new Random();
int playersLeft = 100;
int winners = 0;
while (winners < 10)
    winners += (random.Next(playersLeft--) < (10 - winners)) ? 1 : 0;
I wanted to know if there was a better or more straightforward way to do it. I know it seems simple, but this simple task is part of a very important aspect of the app and it must be right.
To clarify why I want to do something like this: in reality there is an unlimited number of players, each with an X in Y probability to win, say 10/100 = 10%. However, if I leave it to the random number generator, there is a chance that out of 100 players only 9 would win, or worse, 11. In my app, I must ensure that no more and no less than 10 players out of every 100 win.

Should every person have an equal chance of winning? In that case, why not just select 10 distinct random numbers from 0-99 and then pretend to do it in order?
var random = new Random();
var winners = new HashSet<int>();
while (winners.Count < 10)
{
    // HashSet.Add ignores duplicates, so keep drawing until 10 distinct numbers are in
    winners.Add(random.Next(100));
}
for (int i = 0; i < 100; i++)
{
    if (winners.Contains(i)) Console.WriteLine("{0} won!!!", i);
    else Console.WriteLine("{0} didn't win, sorry...", i);
}

I have thought about this some more and have come up with the following. We can give the first person a fair shot at winning; then, if the remaining prizes are distributed fairly among the rest of the people (whether he wins or loses), the whole thing will be fair. Of course that's far from a formal proof, so feel free to correct me. The following should give a fair system:
var random = new Random();
int prizes = 10;
// i is the number of people still waiting, including the current one
for (int i = 100; i >= 1 && prizes > 0; i--)
{
    // random.Next(i) is uniform on [0, i), so this wins with probability prizes / i
    if (random.Next(i) < prizes)
    {
        Console.WriteLine("{0} won", i);
        prizes--;
    }
}
Edit: proof that this works:
The first person trivially has an n/k chance of winning (n being the number of prizes, k the number of people).
Let's assume we distribute the remaining prizes fairly among the rest of the people. Then, with probability n/k, they share n-1 prizes, and with probability (k-n)/k, they share n prizes. On average that adds up to (n*(n-1))/k + (n*(k-n))/k = n*(k-1)/k prizes, which is exactly their fair share (n/k each for k-1 people).
We use the same method to distribute either n-1 or n prizes among the k-1 people, so by induction everyone gets an n/k chance. Q.E.D.
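The induction argument can also be checked empirically. The following is a minimal C# sketch, assuming the sequential rule described above (a person who sees `peopleLeft` people still waiting, themselves included, wins with probability `prizesLeft / peopleLeft`); `SequentialDraw`, `Draw` and `WinFrequencies` are hypothetical names. It runs the draw many times with a fixed seed and reports how often each position won. A simulation is evidence rather than proof, but every position should come out close to n/k = 10/100.

```csharp
using System;

public static class SequentialDraw
{
    // One run of the draw: won[i] is true if the person at position i got a prize.
    public static bool[] Draw(int people, int prizes, Random random)
    {
        var won = new bool[people];
        int prizesLeft = prizes;
        for (int i = 0; i < people && prizesLeft > 0; i++)
        {
            int peopleLeft = people - i; // includes the current person
            // random.Next(peopleLeft) is uniform on [0, peopleLeft),
            // so this succeeds with probability prizesLeft / peopleLeft.
            if (random.Next(peopleLeft) < prizesLeft)
            {
                won[i] = true;
                prizesLeft--;
            }
        }
        return won;
    }

    // Fraction of trials in which each position won a prize.
    public static double[] WinFrequencies(int people, int prizes, int trials, int seed)
    {
        var random = new Random(seed);
        var counts = new int[people];
        for (int t = 0; t < trials; t++)
        {
            bool[] won = Draw(people, prizes, random);
            for (int i = 0; i < people; i++)
                if (won[i]) counts[i]++;
        }
        var freq = new double[people];
        for (int i = 0; i < people; i++)
            freq[i] = (double)counts[i] / trials;
        return freq;
    }
}
```

With 100,000 trials, each of the 100 frequencies should land within about a percentage point of 0.10. Every single run hands out exactly 10 prizes, because once `prizesLeft == peopleLeft` the comparison always succeeds.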

This will give you the behavior of forcing the probability of a winner to go to 1.0 as the number of people shrinks. However, as @obrok pointed out, the probability of a person winning a prize, given what happened to the people before them, depends on their rank in the list of 100 people.
This is actually the same algorithm that is used for "N choose K" subset selection. http://mcherm.com/permalinks/1/a-random-selection-algorithm
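var random = new Random();
int prizes = 10;
int people = 100;
while (prizes > 0)
{
    double probOfWin = (double)prizes / people;
    // NextDouble() is in [0, 1), so "<" gives exactly probability probOfWin (1.0 when prizes == people)
    if (random.NextDouble() < probOfWin)
    {
        prizes--; // person number (101 - people) wins
    }
    people--;
}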

The perfectly fair way to do it is to generate a random number from 1 to 100! / (90! * 10!) (since this is the number of possible combinations of prizewinners) and use that to award the prizes.
However, it's easier to use some multiple of that number, such as the number of permutations of prizewinners, which is 100! / 90!. One way of doing this is to populate an array of 100 integers and remove each winning integer from the array as it is drawn (swapping it with the last non-winning integer is the easiest way to achieve this).
Your algorithm effectively requires randomness of 100! so it is much less efficient, although I believe it is still perfectly fair.
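A minimal C# sketch of the array-and-swap approach, assuming 0-based player numbers; `PrizeDraw.PickWinners` is a hypothetical name. Each draw picks uniformly among the candidates still in the pool, then overwrites the winner with the last remaining candidate and shrinks the pool, so only `count` calls to the random number generator are needed (this is a partial Fisher-Yates shuffle).

```csharp
using System;

public static class PrizeDraw
{
    // Picks `count` distinct players from 0..people-1; every subset is equally likely.
    public static int[] PickWinners(int people, int count, Random random)
    {
        if (count < 0 || count > people)
            throw new ArgumentOutOfRangeException(nameof(count));

        var pool = new int[people];
        for (int i = 0; i < people; i++) pool[i] = i;

        int remaining = people;
        var winners = new int[count];
        for (int w = 0; w < count; w++)
        {
            int j = random.Next(remaining);   // uniform over the candidates still in the pool
            winners[w] = pool[j];
            pool[j] = pool[remaining - 1];    // overwrite the winner with the last candidate
            remaining--;                      // and shrink the pool by one
        }
        return winners;
    }
}
```

Because the winners are chosen up front, you can still announce the results person by person in order afterwards.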

Related

Stat issue - comparing list against itself

This one is for you CompSci or stats people. Can you please tell me, if theList contains 72,786 "things", what the value of compareCount will be at the end of the loops? I'm thinking it's 72,786^2 - 1, but it's been so long since this old brain worked like that. Much obliged for your time and assistance!
List<thing> theList = new List<thing>(); // list contains 72,786 "things"
private void compare()
{
    long compareCount = 0; // long: the final count (about 2.6 billion) overflows an int
    for (int i = 0; i < theList.Count - 1; i++)
    {
        for (int comp = i + 1; comp < theList.Count; comp++)
        {
            compare(theList[i], theList[comp]);
            compareCount++;
        }
    }
}
The compareCount in your code will have the value (72786^2 - 72786) / 2 = 2,648,864,505. I've confirmed that by running it. Note that this is larger than int.MaxValue (2,147,483,647), so compareCount needs to be a long. As it is written now, there is no need to have the call compare(theList[i], theList[comp]) in the inner loop (it doesn't influence the count in any way).
Here's how I remember the (n^2 - n)/2 formula: a round-robin tournament with n players, each of them meeting every other player exactly once.
The match plan is a square with n rows and n columns (n * n = n^2 combinations). Since a player doesn't play against himself, the n matches on the diagonal from upper left to lower right must be subtracted (n^2 - n matches are left now). Pairings of player A against player B in the triangle above the diagonal are the same as pairings of B against A in the triangle below, so there are (n^2 - n)/2 such duplicate pairings. Subtracting this number from n^2 - n gives the final result of (n^2 - n)/2 possible matches.
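The closed form can be checked against the same nested loops for small n. This is a sketch with hypothetical names (`PairCount.Formula`, `PairCount.ByLoops`); it uses `long` because 2,648,864,505 is larger than `int.MaxValue`.

```csharp
using System;

public static class PairCount
{
    // Closed form: number of unordered pairs (i, j) with i < j among n items.
    public static long Formula(long n) => n * (n - 1) / 2;

    // Brute force: the same loop structure as in the question, counting only.
    public static long ByLoops(int n)
    {
        long count = 0;
        for (int i = 0; i < n - 1; i++)
            for (int j = i + 1; j < n; j++)
                count++;
        return count;
    }
}
```

`Formula(72786)` returns 2,648,864,505 instantly, whereas running `ByLoops(72786)` would take about 2.6 billion iterations.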

True random is not true random I am really confused

Alright, I tested it the way shown below.
I generated x random numbers between 0 and x and then checked which numbers were never generated.
I assumed coverage would be very close to 100%, meaning that all numbers between 0 and x would be generated.
But the results are shocking: about 36% of the numbers are missing.
Is my random function not really random?
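Here is my random class:
using System;
using System.Threading;

public static class GenerateRandomValue
{
    private static Random seedGenerator = new Random();
    private static ThreadLocal<Random> random =
        new ThreadLocal<Random>(SeededRandomFactory);

    // Each thread gets its own Random, seeded from a shared generator under a lock.
    private static Random SeededRandomFactory()
    {
        lock (seedGenerator)
            return new Random(seedGenerator.Next());
    }

    // Returns a value in [irMinValue, irMinValue + irRandValRange); the upper bound is exclusive.
    public static int GenerateRandomValueMin(int irRandValRange, int irMinValue)
    {
        return random.Value.Next(irMinValue, irMinValue + irRandValRange);
    }
}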
Here are the results:
Between 0-10, missing numbers count: 4, percent: 40%
Between 0-100, missing numbers count: 36, percent: 36%
Between 0-1000, missing numbers count: 369, percent: 36.9%
Between 0-10000, missing numbers count: 3674, percent: 36.74%
Between 0-100000, missing numbers count: 36583, percent: 36.58%
Between 0-1000000, missing numbers count: 367900, percent: 36.79%
Between 0-10000000, missing numbers count: 3678122, percent: 36.78%
Between 0-100000000, missing numbers count: 36797477, percent: 36.8%
Here is the code I use to check:
File.WriteAllText("results.txt", "");
int irFirst = 10;
for (int i = 0; i < 8; i++)
{
HashSet<int> hsGenerated = new HashSet<int>();
for (int k = 0; k < irFirst; k++)
{
hsGenerated.Add(GenerateRandomValue.GenerateRandomValueMin(irFirst, 0));
}
int irNotFound = 0;
for (int k = 0; k < irFirst; k++)
{
if (hsGenerated.Contains(k) == false)
irNotFound++;
}
string srSonuc =
string.Format(
"Between 0-{0}, missing numbers count: {1}, percent: {2}%",
irFirst, irNotFound,
Math.Round((Convert.ToDouble(irNotFound)/Convert.ToDouble(irFirst))*100.0, 2).ToString()
);
using (StreamWriter w = File.AppendText("results.txt")) // same file that was cleared at the start
{
w.WriteLine(srSonuc);
}
irFirst = irFirst * 10;
}
As mentioned in the comments, your testing method is off.
You draw x times a number between 0 and x. The probability that a specific number is not drawn is:
$$p = \left(1 - \frac{1}{x}\right)^x$$
As x approaches infinity, p goes towards 1/e (approx. 36.7879441%), and this is the number you are seeing in your results.
Also, as x grows, the fraction of missing numbers you observe in your sample approaches this probability (law of large numbers).
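To see the limit numerically, this small C# sketch evaluates p = (1 - 1/x)^x with `Math.Pow`; the class name `MissingProbability` is hypothetical. For x = 10 it gives about 0.349, for x = 100 about 0.366, and for large x it approaches 1/e, matching the percentages in the question.

```csharp
using System;

public static class MissingProbability
{
    // Chance that one particular value in [0, x) is never produced by x uniform draws:
    // each draw misses it with probability 1 - 1/x, and the draws are independent.
    public static double ForDraws(long x) => Math.Pow(1.0 - 1.0 / x, x);

    // Expected number of values never produced: x times the per-value probability.
    public static double ExpectedMissing(long x) => x * ForDraws(x);
}
```

For example, `ExpectedMissing(100)` is about 36.6, close to the 36 missing numbers reported for the 0-100 run.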
This has to do with probability. When you have a bowl with a red and a white marble, and you take one, put it back and take another one, you cannot guarantee that you see both. You could take the red one twice. You are doing the same thing with more objects.
To elaborate on true randomness:
"I would expect close to 99% instead of 64%. Or at least 90%+. So you say this isn't possible with current technology?"
That is simple. Thanks to modern math, technology and my super powers I can tell you how to do that: you need more draws than numbers to choose from. With x numbers to choose from, the number of draws becomes:
$$\text{draws} = -x \ln n$$
where n is your desired fraction of missing numbers. For example, if you are willing to accept 5% of the numbers missing, you must draw about three times as many random numbers (-ln 0.05 ≈ 3.0). For 1%, you need to draw about 4.6 times the maximum number (-ln 0.01 ≈ 4.6).
This math assumes perfectly uniform random number generation.
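The same reasoning can be turned into a small planning helper plus a simulation to check it. This is a sketch, assuming uniform draws with replacement and that c·x draws over x values leave about e^(-c) of them unseen; `DrawPlanner` and its methods are hypothetical names.

```csharp
using System;

public static class DrawPlanner
{
    // Draws per value (c) needed so that about `missingFraction` of the values are never drawn:
    // missing ≈ e^(-c), so c = -ln(missingFraction).
    public static double RequiredMultiplier(double missingFraction) => -Math.Log(missingFraction);

    // Simulation: draw `draws` values from [0, x) with replacement, return the fraction never seen.
    public static double MissingFraction(int x, long draws, Random random)
    {
        var seen = new bool[x];
        int distinct = 0;
        for (long d = 0; d < draws; d++)
        {
            int v = random.Next(x);
            if (!seen[v]) { seen[v] = true; distinct++; }
        }
        return (double)(x - distinct) / x;
    }
}
```

For example, `RequiredMultiplier(0.05)` is about 3.0 and `RequiredMultiplier(0.01)` about 4.6, and drawing 3x values from [0, x) with a seeded `Random` leaves close to e^-3 (about 5%) unseen.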
Your results are exactly what is to be expected from a uniform distribution where you sample with replacement.
Consider the simplest possible example. You have a coin and toss it twice. So we assume that we are sampling from a uniform discrete distribution.
The possible outcomes, which occur with equal probability of 0.25 are:
TT
TH
HT
HH
As you can see, only two of the four outcomes have both heads and tails.
This is known as sampling with replacement. So, once we have sampled a tails, then we "put it back in the bag", and it could come out again on the next sample.
Now suppose we sample without replacement. In that case there are two possible outcomes:
TH
HT
And as you see, each possible value appears exactly once.
Essentially, your expectation for the results is not correct. As another example, suppose you toss a coin and it comes down tails. What do you expect will happen on the next toss? You are arguing that the coin must now come down heads. But that is clearly nonsense.
If you did want to sample without replacement, and it's not clear that's really what you want, then you do so with the Fisher-Yates shuffle.
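If you want 100 million unique random numbers, you could do something like this, using the Fisher-Yates shuffle algorithm (the "inside-out" variant, which builds the shuffled list as it goes):
var random = new Random();
List<int> numbers = new List<int>(100000000);
for (int i = 0; i < numbers.Capacity; i++)
{
    // Pick a slot in [0, i]; at this point numbers.Count == i
    int rnd = random.Next(numbers.Count + 1);
    if (rnd == numbers.Count)
        numbers.Add(i);
    else
    {
        // Move the old occupant of slot rnd to the end and put i in its place
        numbers.Add(numbers[rnd]);
        numbers[rnd] = i;
    }
}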
By the way you could calculate irNotFound much faster:
int irNotFound = irFirst - hsGenerated.Count;
Good luck with your quest.

Efficient algorithm to get primes between two large numbers

I'm a beginner in C#. I'm trying to write an application to get the primes between two numbers entered by the user. The problem is that for large numbers (valid numbers are in the range from 1 to 1000000000) getting the primes takes a long time, and according to the problem I'm solving, the whole operation must be carried out within a short time limit. This is the problem link for more explanation:
SPOJ-Prime
And here's the part of my code that's responsible for getting primes:
public void GetPrime()
{
    int L1 = int.Parse(Limits[0]);
    int L2 = int.Parse(Limits[1]);
    if (L1 == 1)
    {
        L1++; // 1 is not prime
    }
    for (int i = L1; i <= L2; i++)
    {
        // Naive trial division: try every candidate divisor from 2 to i - 1
        bool flag = true;
        for (int k = 2; k < i; k++)
        {
            if (i % k == 0)
            {
                flag = false;
                break;
            }
        }
        if (flag)
        {
            Console.WriteLine(i);
        }
    }
}
Is there any faster algorithm?
Thanks in advance.
I remember solving the problem like this:
Use the Sieve of Eratosthenes to generate all primes below sqrt(1000000000) ≈ 31,623 in an array primes.
For each number x between m and n, test whether it's prime only by checking divisibility by the numbers <= sqrt(x) in the array primes. So for x = 29 you will only test whether it's divisible by 2, 3 and 5.
There's no point in checking for divisibility by non-primes, since if x is divisible by a non-prime y, then there exists a prime p < y such that x is divisible by p, because we can write y as a product of primes. For example, 12 is divisible by 6, but 6 = 2 * 3, which means that 12 is also divisible by 2 and 3. By generating all the needed primes in advance (there are very few in this case), you significantly reduce the time needed for the actual primality testing.
This will get accepted and doesn't require any optimization or modification to the sieve, and it's a pretty clean implementation.
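The two steps above can be sketched in C# as follows. This is an illustration, not the answerer's original code: `PrimeRange`, `SmallPrimes` and `PrimesBetween` are hypothetical names, and the sketch assumes the list of small primes reaches at least sqrt(n), e.g. `SmallPrimes(31623)` for n up to 1,000,000,000. Candidates are held in a `long` so the loop cannot overflow near `int.MaxValue`.

```csharp
using System;
using System.Collections.Generic;

public static class PrimeRange
{
    // Step 1: plain Sieve of Eratosthenes for all primes <= limit.
    public static List<int> SmallPrimes(int limit)
    {
        var composite = new bool[limit + 1];
        var primes = new List<int>();
        for (int i = 2; i <= limit; i++)
        {
            if (composite[i]) continue;
            primes.Add(i);
            for (long j = (long)i * i; j <= limit; j += i)
                composite[(int)j] = true;
        }
        return primes;
    }

    // Step 2: primes in [m, n], dividing each candidate only by small primes p with p*p <= x.
    // Assumes smallPrimes contains every prime <= sqrt(n).
    public static List<int> PrimesBetween(int m, int n, List<int> smallPrimes)
    {
        var result = new List<int>();
        for (long x = Math.Max(m, 2); x <= n; x++)
        {
            bool isPrime = true;
            foreach (int p in smallPrimes)
            {
                if ((long)p * p > x) break;          // no divisor found up to sqrt(x)
                if (x % p == 0) { isPrime = false; break; }
            }
            if (isPrime) result.Add((int)x);
        }
        return result;
    }
}
```

Computing `SmallPrimes(31623)` once and reusing it for every test case keeps the per-case work to the trial divisions themselves.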
You can do it faster by generalising the sieve to generate primes in an interval [left, right], not [2, right] as it's usually presented in tutorials and textbooks. This can get pretty ugly, however, and it's not needed. But if anyone is interested, see:
http://pastie.org/9199654 and this linked answer.
You are doing a lot of extra divisions that are not needed - if you know a number is not divisible by 3, there is no point in checking if it is divisible by 9, 27, etc. You should try to divide only by the potential prime factors of the number. Cache the set of primes you are generating and only check division by the previous primes. Note that you do need to generate the initial set of primes below L1.
Remember that a composite number always has a prime factor no greater than its square root, so you can stop your divisions at that point. For instance, you can stop checking potential factors of the number 29 after 5.
You can also increment by 2, so you can skip checking even numbers altogether (special-casing the number 2, of course).
I used to ask this question during interviews - as a test I compared an implementation similar to yours with the algorithm I described. With the optimized algorithm, I could generate hundreds of thousands of primes very fast - I never bothered waiting around for the slow, straightforward implementation.
You could try the Sieve of Eratosthenes. The basic difference would be that you start at L1 instead of starting at 2.
Let's change the question a bit: How quickly can you generate the primes between m and n and simply write them to memory? (Or, possibly, to a RAM disk.) On the other hand, remember the range of parameters as described on the problem page: m and n can be as high as a billion, while n-m is at most a million.
IVlad and Brian have most of a competitive solution, even if it is true that a slower solution could be good enough. First generate, or even precompute, the prime numbers less than sqrt(billion); there aren't very many of them. Then do a truncated Sieve of Eratosthenes: make an array of length n-m+1 to keep track of the status of every number in the range [m,n], with every such number initially marked as prime (1). Then for each precomputed prime p, do a loop that looks like this:
for (long k = Math.Max((long)p * p, (m + p - 1) / p * p); k <= n; k += p) status[k - m] = 0;
This loop marks as composite (0) all of the numbers in the range m <= x <= n that are multiples of p, except p itself: marking starts at p*p at the earliest, and (m + p - 1) / p * p is the integer-arithmetic form of ceil(m/p)*p. If this is what IVlad meant by "pretty ugly", I don't agree; I don't think it's so bad.
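Putting the pieces of this answer together, a C# sketch of the truncated (segmented) sieve might look like this. `SegmentedSieve`, `BasePrimes` and `PrimesBetween` are hypothetical names; the sketch assumes `basePrimes` contains every prime up to sqrt(n) and that n - m + 1 fits in an array (at most about a million for this problem).

```csharp
using System;
using System.Collections.Generic;

public static class SegmentedSieve
{
    // Plain Sieve of Eratosthenes for the small "base" primes <= limit.
    public static List<int> BasePrimes(int limit)
    {
        var composite = new bool[limit + 1];
        var primes = new List<int>();
        for (int i = 2; i <= limit; i++)
        {
            if (composite[i]) continue;
            primes.Add(i);
            for (long j = (long)i * i; j <= limit; j += i)
                composite[(int)j] = true;
        }
        return primes;
    }

    // Primes in [m, n]; basePrimes must contain every prime <= sqrt(n).
    public static List<long> PrimesBetween(long m, long n, List<int> basePrimes)
    {
        var result = new List<long>();
        if (m < 2) m = 2;
        if (m > n) return result;

        // status[k - m] == true means "k is still considered prime"
        var status = new bool[(int)(n - m + 1)];
        for (int i = 0; i < status.Length; i++) status[i] = true;

        foreach (int p in basePrimes)
        {
            long pp = (long)p * p;
            if (pp > n) break;
            // First multiple of p that is >= m, but never p itself.
            long start = Math.Max(pp, (m + p - 1) / p * p);
            for (long k = start; k <= n; k += p)
                status[(int)(k - m)] = false;
        }

        for (long k = m; k <= n; k++)
            if (status[(int)(k - m)]) result.Add(k);
        return result;
    }
}
```

The cost per range is roughly the number of multiples crossed off, which for a window of a million numbers is far smaller than testing each number separately by trial division.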
In fact, almost 40% of this work is just for the primes 2, 3, and 5. There is a trick to combine the sieve for a few primes with initialization of the status array. Namely, the pattern of divisibility by 2, 3, and 5 repeats mod 30. Instead of initializing the array to all 1s, you can initialize it to a repeating pattern of 010000010001010001010001000001. If you want to be even more cutting edge, you can advance k by 30*p instead of by p, and only mark off the multiples in the same pattern.
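The mod-30 pattern itself can be derived rather than typed by hand. This C# sketch (the class `Wheel30` is a hypothetical name) builds it from the rule "position r is 1 if r is not divisible by 2, 3 or 5" and uses it to initialise a status array for [m, m + length). It is a straightforward version that looks up the pattern element by element rather than copying whole words, and it marks 2, 3 and 5 themselves as 0, so those need special-casing if they fall inside the range.

```csharp
using System;
using System.Text;

public static class Wheel30
{
    // '1' at position r (0..29) if r is coprime to 30, i.e. not divisible by 2, 3 or 5.
    public static string Pattern()
    {
        var sb = new StringBuilder(30);
        for (int r = 0; r < 30; r++)
            sb.Append(r % 2 != 0 && r % 3 != 0 && r % 5 != 0 ? '1' : '0');
        return sb.ToString();
    }

    // Initial sieve status for the numbers m .. m + length - 1 (assumes m >= 0):
    // true where the number is not divisible by 2, 3 or 5.
    public static bool[] InitialStatus(long m, int length)
    {
        string pattern = Pattern();
        var status = new bool[length];
        for (int i = 0; i < length; i++)
            status[i] = pattern[(int)((m + i) % 30)] == '1';
        return status;
    }
}
```

`Pattern()` reproduces the string 010000010001010001010001000001 quoted above; the sieve loop can then skip the primes 2, 3 and 5.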
After this, realistic performance gains would involve steps like using a bit vector rather than a char array, to keep the sieve data in on-chip cache, and initializing the bit vector word by word rather than bit by bit. This does get messy, and is also hypothetical, since you can get to the point of generating primes faster than you can use them. The basic sieve is already very fast and not very complicated.
One thing no one has mentioned is that it's rather quick to test a single number for primality. Thus, if the range involved is small but the numbers are large (e.g. generate all primes between 1,000,000,000 and 1,000,100,000), it could be faster to just check every number for primality individually.
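The answer above does not name a specific test. As one illustration, a deterministic Miller-Rabin test is a common choice for 32-bit values: checking the bases 2, 7 and 61 is known to be sufficient for every n < 4,759,123,141, which covers the whole 1 to 1,000,000,000 range of this problem. This is a sketch under that assumption; the class name `MillerRabin` is hypothetical.

```csharp
using System;

public static class MillerRabin
{
    // (a * b) % m; safe because a, b < m < 2^32, so the product fits in 64 bits.
    private static ulong MulMod(ulong a, ulong b, ulong m) => a * b % m;

    // b^e mod m by repeated squaring.
    private static ulong PowMod(ulong b, ulong e, ulong m)
    {
        ulong result = 1;
        b %= m;
        while (e > 0)
        {
            if ((e & 1) == 1) result = MulMod(result, b, m);
            b = MulMod(b, b, m);
            e >>= 1;
        }
        return result;
    }

    // Deterministic for all n < 4,759,123,141 using bases 2, 7 and 61.
    public static bool IsPrime(uint n)
    {
        if (n < 2) return false;
        // Small primes first: also guarantees no base is a multiple of n below.
        foreach (uint p in new uint[] { 2, 3, 5, 7, 61 })
            if (n % p == 0) return n == p;

        // Write n - 1 = d * 2^s with d odd.
        ulong d = n - 1;
        int s = 0;
        while ((d & 1) == 0) { d >>= 1; s++; }

        foreach (ulong a in new ulong[] { 2, 7, 61 })
        {
            ulong x = PowMod(a, d, n);
            if (x == 1 || x == n - 1) continue;
            bool composite = true;
            for (int r = 1; r < s; r++)
            {
                x = MulMod(x, x, n);
                if (x == n - 1) { composite = false; break; }
            }
            if (composite) return false; // a is a witness that n is composite
        }
        return true;
    }
}
```

Each call costs a few dozen modular multiplications, so testing 100,000 candidates this way is quick; for a dense range, a sieve is still usually the simpler choice.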
There are many algorithms for finding prime numbers. Some are faster, others are easier.
You can start by making some of the easiest optimizations. For example:
Why are you checking whether every number is prime? In other words, are you sure that, given a range of 411 to 418, there is a need to check whether 412, 414, 416 and 418 are prime? Numbers divisible by 2 or 3 can be skipped with very simple code modifications.
If the number is not 5 but ends in the digit '5' (1405, 335), it is not prime. (Retracted: bad idea, it will make the search slower.)
What about caching the results? You can then divide by primes rather than by every number. Moreover, only primes less than or equal to the square root of the number being tested need to be considered.
If you need something really fast and optimized, taking an existing algorithm instead of reinventing the wheel can be an alternative. You can also try to find some scientific papers explaining how to do it fast, but it can be difficult to understand and to translate to code.
using System;
using System.Collections;

int ceilingNumber = 1000000;
int myPrimes = 0;
// One bit per number; start by assuming every number is prime
BitArray myNumbers = new BitArray(ceilingNumber, true);
for (int x = 2; x < ceilingNumber; x++)
    if (myNumbers[x])
    {
        // x is prime: cross off all of its multiples
        for (int y = x * 2; y < ceilingNumber; y += x)
            myNumbers[y] = false;
    }
for (int x = 2; x < ceilingNumber; x++)
    if (myNumbers[x])
    {
        myPrimes++;
        Console.Out.WriteLine(x);
    }
Console.Out.WriteLine("======================================================");
Console.Out.WriteLine("There is/are {0} primes between 0 and {1} ", myPrimes, ceilingNumber);
Console.In.ReadLine();
I think I have a very fast and efficient algorithm for getting prime numbers (it can generate all primes, even when using the BigInteger type). It is much faster and simpler than any other one, and I use it to solve almost every problem related to prime numbers in Project Euler, with just a few seconds for a complete (brute-force) solution.
Here is the Java code:
public boolean checkprime(int value){ // Call this in a for loop if you need to generate primes
    if (value < 2) return false; // 0, 1 and negative numbers are not prime
    /*if(value >100)limit = value/10; // if 1 number is not prime it will generate
    if(value >10000)limit = value/100; //at least 2 factors (not 1 or itself)
    if(value >90000)limit = value/300; // 1 greater than average 1 lower than average
    if(value >1000000)limit = value/1000; //ex: 9997 =13*769 (average ~ sqrt(9997) is 100)
    if(value >4000000)limit = value/2000; //so we just want to check divisor up to 100
    if(value >9000000)limit = value/3000; // for prime ~10000
    */
    // General case: a composite number always has a divisor <= sqrt(value)
    int limit = (int)Math.sqrt(value);
    for (int n = 2; n <= limit; n++){
        if (value % n == 0){
            return false;
        }
    }
    return true;
}
import java.util.Scanner;
class Test{
    public static void main(String args[]){
        Test tt = new Test();
        Scanner obj = new Scanner(System.in);
        int m, n;
        System.out.println("Enter the lower and upper bounds:");
        m = obj.nextInt();
        n = obj.nextInt();
        tt.IsPrime(n, m); // IsPrime(upper bound, lower bound)
    }
public void IsPrime(int num,int k)
{
boolean[] isPrime = new boolean[num+1];
// initially assume all integers are prime
for (int i = 2; i <= num; i++) {
isPrime[i] = true;
}
// mark non-primes <= N using Sieve of Eratosthenes
for (int i = 2; i*i <= num; i++) {
// if i is prime, then mark multiples of i as nonprime
// suffices to consider multiples i*i, i*(i+1), ..., i*(N/i)
if (isPrime[i]) {
for (int j = i; i*j <=num; j++) {
isPrime[i*j] = false;
}
}
}
for (int i =k; i <= num; i++) {
if (isPrime[i])
{
System.out.println(i);
}
}
}
}
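List<int> prime(int x, int y)
{
    // Primes in [x, y] by trial division up to m / 2
    List<int> a = new List<int>();
    for (int m = Math.Max(x, 2); m <= y; m++) // 0 and 1 are not prime
    {
        bool isComposite = false;
        for (int i = 2; i <= m / 2; i++)
        {
            if (m % i == 0)
            {
                isComposite = true;
                break;
            }
        }
        if (!isComposite) a.Add(m);
    }
    return a;
}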

Algorithm for creating unique bingo faces

Does anyone know of an algorithm that can generate unique bingo card faces? I'm looking to implement this algorithm in C#.
Thanks,
To generate a card:
1) Take 5 sets of 15 numbers each (1-15 for set 1, 16-30 for set 2, and so on).
2) Select 5 different numbers from each of sets 1, 2, 4 and 5.
3) Select 4 different numbers from set 3 (the centre square is free).
To check whether that card already exists:
1) Compare each existing card with the new card, starting at the top-left square.
2) If both numbers are equal, move on to the next square.
3) If all 24 numbers are the same at the same places, the two cards are equal and the new card must be rejected.
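The steps above can be sketched in C#. This is an illustration under assumptions: `BingoCards` and its methods are hypothetical names, each column is represented as an array (4 entries for the middle column because of the free centre square), and instead of comparing every existing card square by square, each card is turned into a position-by-position string key stored in a `HashSet<string>`, which gives the same "all 24 numbers in the same place" test with one hash lookup per new card.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BingoCards
{
    // One card as 5 columns; column c holds numbers from 15*c+1 .. 15*c+15.
    public static int[][] NewCard(Random random)
    {
        var card = new int[5][];
        for (int c = 0; c < 5; c++)
        {
            int count = c == 2 ? 4 : 5;                  // free centre square in column 3
            int[] pool = Enumerable.Range(15 * c + 1, 15).ToArray();
            // Partial Fisher-Yates: the first `count` slots become a random selection.
            for (int i = 0; i < count; i++)
            {
                int j = random.Next(i, pool.Length);
                (pool[i], pool[j]) = (pool[j], pool[i]);
            }
            card[c] = pool.Take(count).ToArray();
        }
        return card;
    }

    // Position-by-position key: two cards get the same key iff all 24 numbers match in place.
    public static string Key(int[][] card) =>
        string.Join("|", card.Select(col => string.Join(",", col)));

    // Generate `howMany` distinct cards, rejecting any duplicate.
    public static List<int[][]> UniqueCards(int howMany, Random random)
    {
        var seen = new HashSet<string>();
        var cards = new List<int[][]>();
        while (cards.Count < howMany)
        {
            var card = NewCard(random);
            if (seen.Add(Key(card))) cards.Add(card);
        }
        return cards;
    }
}
```

If cards that differ only in the order of numbers within a column should count as the same, sorting each column before building the key would implement that instead.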
This is an interesting problem, but as Michael Madsen reported, given the number of possibilities you would probably be better off generating cards randomly and then checking for duplicates. (Unless you want to generate all 111 quadrillion possibilities, which I hope you have the data storage space for!)
Here's a function for generating a random subset of integers from a given range which you might find useful:
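private static readonly Random random = new Random();

// Selection sampling: returns `count` distinct values from [min, max] in increasing order
private static IEnumerable<int> RandomSubsetOfRange(int min, int max, int count)
{
    int size = max - min + 1;
    for (int i = 0; i < size && count > 0; i++)
    {
        // Take min + i with probability (values still needed) / (values still available)
        if (random.NextDouble() < (double)count / (size - i))
        {
            yield return min + i;
            count--;
        }
    }
}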

Building a non sequential list of numbers (From a large range)

I need to create a non-sequential list of numbers that fit within a range. For instance, I need to generate a list of numbers from 1 to 1 million and make sure that none of the numbers are in sequential order, i.e. that they are completely shuffled. I guess my first question is: are there any good algorithms out there that could help, and how best to implement this?
I am currently not sure of the best way to implement this, either via a C# console app that will spit out the numbers into an XML file or in a database that will spit out the numbers into a table or a set of tables, but that is really secondary to actually working out the best way of "shuffling" the set of numbers.
Any advice guys?
Rob
First off, if none of the numbers are in sequential order then every number in the sequence must be less than its predecessor. A sequence which has that property is sorted from biggest to smallest! Clearly that is not what you want. (Or perhaps you simply do not want any subsequence of the form 5, 6, 7? But 6, 8, 20 would be OK?)
To answer your question properly we need to know more information about the problem space. Things I would want to know:
1) Is the size of the range equal to, larger than, or smaller than the size of the sequence? That is, are you going to ask for ten numbers between 1 and 10, five numbers between 1 and 10 or fifty numbers between 1 and 10?
2) Is it acceptable for the sequence to contain duplicates? (If the number of items in the sequence is larger than the range, then clearly yes.)
3) What is the randomness being used for? Most random number generators are only pseudo-random; a clever attacker can deduce the next "random" number by knowing the previous ones. If for example you are generating a series of five cards out of a deck of 52 to make a poker hand, you want really strong randomness; you don't want players to be able to deduce what their opponents have in their hands.
How "non-sequential" do you want it?
You could easily generate a list of random numbers from a range with the Random class:
Random rnd1 = new Random();
List<int> largeList = new List<int>();
for (int i = 0; i < largeNumber; i++)
{
    largeList.Add(rnd1.Next(1, 1000001)); // 1..1,000,000 inclusive; duplicates are possible
}
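Edit to add: admittedly, the Durstenfeld algorithm (apparently the modern version of the Fisher–Yates shuffle) is much faster:
var rnd = new Random();
int upperBound = 1000000; // produces 0 .. upperBound - 1; add 1 to each value for 1 .. upperBound
var fisherYates = new List<int>(upperBound);
for (int i = 0; i < upperBound; i++)
{
    fisherYates.Add(i);
}
int n = upperBound;
while (n > 1)
{
    n--;
    // Swap element n with a uniformly chosen element in [0, n]
    int k = rnd.Next(n + 1);
    int temp = fisherYates[k];
    fisherYates[k] = fisherYates[n];
    fisherYates[n] = temp;
}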
For the range 1 to 10000, doing a brute-force "find a random number I've not yet used" takes around 4-5 seconds, while this takes around 0.001 seconds.
Props to Greg Hewgill for the links.
I understand that you want to get a random array of length 1 million containing all the numbers from 1 to 1 million, with no duplicates. Is that right?
You should build up an array with your numbers ranging from 1 to 1 million, then start shuffling. But it can happen (that is true randomness) that two or even more numbers end up in sequence.
Have a look here
Here's a C# function to get you started:
public IEnumerable<int> GetRandomSequence(int max)
{
    var r = new Random();
    while (true)
    {
        yield return r.Next(max); // 0 .. max - 1 (the upper bound is exclusive)
    }
}
Call it like this to get a million numbers in the range 0-9999999 (Take requires System.Linq):
var numbers = GetRandomSequence(10000000).Take(1000000);
As for sorting, or if you don't want to allow repeats, look at Enumerable.Range() (which will give you a consecutive, ordered sequence) and use a Fisher-Yates (or Knuth) shuffle algorithm (which you can find all over the place).
"Completely shuffled" is a very misunderstood term. One trick fraud experts use when examining what should be "random" data is to watch for cases where there are no repeated adjacent values (like the 88 in 374388123), because in a truly random sequence the chance of not having any such pair is very low. Exactly what are you trying to do? What exactly do you mean by "completely shuffled"? If all you mean is a random sequence of digits, then just use the Random class in the CLR to generate random numbers between 0 and 1M, as many as you need.
Well, you could go with something like this (assuming that you want every number exactly once):
DECLARE @intFrom int
DECLARE @intTo int
DECLARE @tblList table (_id uniqueidentifier, _number int)
SET @intFrom = 0
SET @intTo = 1000000
WHILE (@intFrom < @intTo)
BEGIN
INSERT INTO @tblList
SELECT NewID(), @intFrom
SET @intFrom = @intFrom + 1
END
-- Ordering by the random GUIDs shuffles the numbers
SELECT *
FROM @tblList
ORDER BY _id
DISCLAIMER: I didn't test this, since I don't have an SQL Server at my disposal at the moment.
This may get you what you need:
1) Populate a list of numbers in order. If your range is 1 - x, it'll look like this:
[1, 2, 3, 4, 5, 6, 7, 8, 9, ... , x]
2) Loop x times, each time choosing a random index between 0 and the current length of your list - 1.
3) Use this chosen index to select the corresponding element from your list, and add that element to your output list.
4) Delete the element you just selected from your list. Rinse and repeat.
This will work for any range of numbers, not just lists that start with 1 or 0. In Python it looks like this:
import random

nums = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
shuffled_nums = []
for i in range(0, len(nums)):
    random_index = random.randrange(len(nums))  # 0 .. len(nums) - 1
    shuffled_nums.append(nums[random_index])
    del nums[random_index]
