Convert percentage to nearest fraction - C#

We have a very data-intensive system. It stores raw data, then computes percentages as correct responses / total trials.
Recently we have had customers who want to import old data into our system.
I need a way to convert a percentage to the nearest fraction.
Examples:
33% needs to give me 2/6, even though 1/3 is 0.33333333.
67% needs to give me 4/6, even though 4/6 is 0.6666667.
I realize I could just compute that to be 67/100, but that means I'd have to add 100 data points to the system when 6 would suffice.
Does anyone have any ideas?
EDIT
Denominator could be anything. They are giving me a raw, rounded percentage and I'm trying to get as close to it with raw data as possible.

Your requirements are contradictory: on the one hand, you want to "convert a percentage to the nearest fraction" (*), but on the other hand, you want fractions with small(est) numbers. You need to find some compromise on when/how to drop precision in favor of smaller numbers. As stated, your problem is not solvable.
(*) The nearest fraction f for any given (integer) percentage n is n/100, by definition.
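One way to make that compromise concrete is to cap the denominator and take the closest fraction under that cap. A minimal sketch of the idea (the names BestFraction and maxDenominator are illustrative, not from the question):

public static (int Numerator, int Denominator) BestFraction(double value, int maxDenominator)
{
    // Return the fraction closest to `value` among all fractions
    // with denominator <= maxDenominator.
    int bestN = 0, bestD = 1;
    double bestErr = double.MaxValue;
    for (int d = 1; d <= maxDenominator; d++)
    {
        int n = (int)Math.Round(value * d); // nearest numerator for this denominator
        double err = Math.Abs(value - (double)n / d);
        if (err < bestErr) { bestErr = err; bestN = n; bestD = d; }
    }
    return (bestN, bestD);
}

// BestFraction(0.33, 6) -> (1, 3), the same value as 2/6
// BestFraction(0.67, 6) -> (2, 3), the same value as 4/6

Since 1/3 and 2/6 have the same value, scale the reduced fraction up to the actual trial count (here, 2 correct out of 6) when storing the raw data.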

I have tried to satisfy your requirement by using continued fractions. By limiting the depth to three I got a reasonable approximation.
I failed to come up with an iterative (or recursive) approach in reasonable time. Nevertheless, I have cleaned it up a little. (I know that three-letter variable names are not good, but I can't think of better names for them :-/ )
The code gives you the best rational approximation it can find within the specified tolerance. The resulting fraction is reduced and is the best approximation among all fractions with the same or a lower denominator.
public partial class Form1 : Form
{
    Random rand = new Random();

    public Form1()
    {
        InitializeComponent();
    }

    private void button1_Click(object sender, EventArgs e)
    {
        for (int i = 0; i < 10; i++)
        {
            double value = rand.NextDouble();
            var fraction = getFraction(value);
            var numerator = fraction.Key;
            var denominator = fraction.Value;
            System.Console.WriteLine(string.Format("Value {0:0.0000} approximated by {1}/{2} = {3:0.0000}", value, numerator, denominator, (double)numerator / denominator));
        }
        /*
        Output (collected from two runs):
        Value 0,4691 approximated by 8/17 = 0,4706
        Value 0,0740 approximated by 1/14 = 0,0714
        Value 0,7690 approximated by 3/4 = 0,7500
        Value 0,7450 approximated by 3/4 = 0,7500
        Value 0,3748 approximated by 3/8 = 0,3750
        Value 0,7324 approximated by 3/4 = 0,7500
        Value 0,5975 approximated by 3/5 = 0,6000
        Value 0,7544 approximated by 3/4 = 0,7500
        Value 0,7212 approximated by 5/7 = 0,7143
        Value 0,0469 approximated by 1/21 = 0,0476
        Value 0,2755 approximated by 2/7 = 0,2857
        Value 0,8763 approximated by 7/8 = 0,8750
        Value 0,8255 approximated by 5/6 = 0,8333
        Value 0,6170 approximated by 3/5 = 0,6000
        Value 0,3692 approximated by 3/8 = 0,3750
        Value 0,8057 approximated by 4/5 = 0,8000
        Value 0,3928 approximated by 2/5 = 0,4000
        Value 0,0235 approximated by 1/43 = 0,0233
        Value 0,8528 approximated by 6/7 = 0,8571
        Value 0,4536 approximated by 5/11 = 0,4545
        */
    }

    // Approximates value by a continued fraction truncated after three terms:
    // value ~ 1/(a + 1/(b + 1/c)).
    // Caveat: assumes 0 < value < 1; an exact reciprocal (e.g. 0.5) makes f1 infinite,
    // although such values are caught by the depth-1 check below.
    private KeyValuePair<int, int> getFraction(double value, double tolerance = 0.02)
    {
        double f0 = 1 / value;
        double f1 = 1 / (f0 - Math.Truncate(f0));
        int a_t = (int)Math.Truncate(f0);
        int a_r = (int)Math.Round(f0);
        int b_t = (int)Math.Truncate(f1);
        int b_r = (int)Math.Round(f1);
        int c = (int)Math.Round(1 / (f1 - Math.Truncate(f1)));
        if (Math.Abs(1.0 / a_r - value) <= tolerance)                    // depth 1: 1/a
            return new KeyValuePair<int, int>(1, a_r);
        else if (Math.Abs(b_r / (a_t * b_r + 1.0) - value) <= tolerance) // depth 2: b/(a*b + 1)
            return new KeyValuePair<int, int>(b_r, a_t * b_r + 1);
        else                                                             // depth 3: (b*c + 1)/(a*b*c + a + c)
            return new KeyValuePair<int, int>(c * b_t + 1, c * a_t * b_t + a_t + c);
    }
}

Would it have to return 2/6 rather than 1/3? If it's always in sixths, then
Math.Round(33 * 6 / 100.0) = 2
(Note the 100.0: with integer division, (33 * 6) / 100 would truncate to 1 before rounding.)
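For instance, a minimal sketch of that idea with the denominator fixed at 6:

int percent = 67;
int numerator = (int)Math.Round(percent * 6 / 100.0); // 4
Console.WriteLine($"{numerator}/6");                  // prints "4/6"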

Answering my own question here. Would this work?
public static Fraction Convert(decimal value) {
    for (decimal numerator = 1; numerator <= 10; numerator++) {
        for (decimal denominator = 1; denominator < 10; denominator++) {
            var result = numerator / denominator;
            if (Math.Abs(value - result) < .01m)
                return new Fraction() { Numerator = numerator, Denominator = denominator };
        }
    }
    throw new Exception("No fraction found within tolerance.");
}
This will keep my denominator below 10.
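One caveat: the loop returns the first pair within tolerance rather than the closest one, and some values (0.155m, for example) fall outside every 0.01 window and would throw. A sketch of a variant that always returns the best candidate (ConvertClosest is a made-up name; Fraction as in your code):

public static Fraction ConvertClosest(decimal value) {
    Fraction best = null;
    decimal bestError = decimal.MaxValue;
    for (decimal denominator = 1; denominator < 10; denominator++) {
        // Nearest numerator for this denominator.
        decimal numerator = Math.Round(value * denominator);
        decimal error = Math.Abs(value - numerator / denominator);
        if (error < bestError) {
            bestError = error;
            best = new Fraction() { Numerator = numerator, Denominator = denominator };
        }
    }
    return best;
}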

Related

Using Math.NET Numerics is it possible to generate a normally distributed sample with upper and lower bounds?

It's very easy to generate normally distributed data with a desired mean and standard deviation:
IEnumerable<double> sample = MathNet.Numerics.Distributions.Normal.Samples(mean, sd).Take(n);
However, with a sufficiently large value for n, you will get values miles away from the mean. To put it into context, I have a real-world data set with mean = 15.93 and sd = 6.84. For this data set it is impossible to have a value over 30 or under 0, but I cannot see a way to add upper and lower bounds to the data that is generated.
I can remove data that falls outside of this range as below, but this results in the mean and SD for the generated sample differing significantly (in my opinion, probably not statistically) from the values I requested.
Normal.Samples(mean, sd).Where(x => x is >= 0 and <= 30).Take(n);
Is there any way to ensure that the values generated fall within a specified range without affecting the mean and SD of the generated data?
The following proposed solution relies on a specific formula for calculating the standard deviation relative to the bounds: the standard deviation has to be a third of the difference between the mean and the required minimum or maximum.
This first code block is the TruncatedNormalDistribution class, which encapsulates MathNet's Normal class. The main technique for making a truncated normal distribution is in the constructor. Note the resulting workaround that is required in the Sample method:
using MathNet.Numerics.Distributions;

public class TruncatedNormalDistribution {

    public TruncatedNormalDistribution(double xMin, double xMax) {
        XMin = xMin;
        XMax = xMax;
        double mean = XMin + (XMax - XMin) / 2; // Halfway between minimum and maximum.
        // If the standard deviation is a third of the difference between the mean and
        // the required minimum or maximum of a normal distribution, 99.7% of samples should
        // be in the required range.
        double standardDeviation = (mean - XMin) / 3;
        Distribution = new Normal(mean, standardDeviation);
    }

    private Normal Distribution { get; }
    private double XMin { get; }
    private double XMax { get; }

    public double CumulativeDistribution(double x) {
        return Distribution.CumulativeDistribution(x);
    }

    public double Density(double x) {
        return Distribution.Density(x);
    }

    public double Sample() {
        // Constrain results lower than XMin or higher than XMax
        // to those bounds.
        return Math.Clamp(Distribution.Sample(), XMin, XMax);
    }
}
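Clamping piles every out-of-range draw onto the bounds themselves, which slightly distorts the shape of the distribution. An alternative sketch (not part of the answer above) is rejection sampling: redraw until the sample lands inside the range. This keeps the bell shape at the cost of an unbounded (in practice tiny) number of retries. A method you could add to the class above:

public double SampleByRejection() {
    // Redraw until the value falls inside [XMin, XMax]. With the 3-sigma
    // setup in the constructor, ~99.7% of draws are accepted on the first try.
    double x;
    do {
        x = Distribution.Sample();
    } while (x < XMin || x > XMax);
    return x;
}

Either way, the mean and SD of the truncated sample will still differ slightly from the parameters of the underlying normal; truncation necessarily shrinks the spread.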
And here is a usage example. For a visual representation of the results, open each of the two output CSV files in a spreadsheet, such as Excel, and map its data to a line chart:
// Put the path of the folder where the CSVs will be saved here
const string chartFolderPath = @"C:\Insert\chart\folder\path\here";
const double xMin = 0;
const double xMax = 100;
var distribution = new TruncatedNormalDistribution(xMin, xMax);

// Densities
var dictionary = new Dictionary<double, double>();
for (double x = xMin; x <= xMax; x += 1) {
    dictionary.Add(x, distribution.Density(x));
}
string csvPath = Path.Combine(
    chartFolderPath,
    $"Truncated Normal Densities, Range {xMin} to {xMax}.csv");
using var writer = new StreamWriter(csvPath);
foreach ((double key, double value) in dictionary) {
    writer.WriteLine($"{key},{value}");
}

// Cumulative Distributions
dictionary.Clear();
for (double x = xMin; x <= xMax; x += 1) {
    dictionary.Add(x, distribution.CumulativeDistribution(x));
}
csvPath = Path.Combine(
    chartFolderPath,
    $"Truncated Normal Cumulative Distributions, Range {xMin} to {xMax}.csv");
using var writer2 = new StreamWriter(csvPath);
foreach ((double key, double value) in dictionary) {
    writer2.WriteLine($"{key},{value}");
}

How to use the Nelder-Mead simplex algorithm in mathdotnet for function maximization

In my C# program I have a dataset where each data point consists of:
a stimulus intensity (intensity) as x-coordinate
the percentage of correct response (percentageCorrect) to stimulus as y-coordinate
When the intensity is low, percentageCorrect is low. When the intensity is high, percentageCorrect is high. The graph of the function is an S-shaped curve, as percentageCorrect reaches an asymptote at the low and high ends.
I am trying to find the threshold intensity where percentageCorrect is halfway between the asymptotes at either end (the center of the S-shaped curve).
I understand this to be a function maximization problem that can be solved by the Nelder-Mead simplex algorithm.
I am trying to solve my problem using the Nelder-Mead simplex algorithm in mathdotnet and its IObjectiveFunction parameter.
However, I am having trouble understanding the API of the NelderMeadSimplex class's FindMinimum method and the IObjectiveFunction EvaluateAt method.
I am new to the numerical analysis that is a prerequisite for this question.
Specific questions are:
For the NelderMeadSimplex class's FindMinimum method, what are the initialGuess and initialPertubation parameters?
For the IObjectiveFunction EvaluateAt method, what is the point parameter? I vaguely understand that the point parameter is a datum in the data set being minimized.
How can I map my data set to this API and solve my problem?
Thanks for any guidance on this.
The initial guess is a guess at the model parameters.
I've always used the forms that don't require an entry of the initialPertubation parameter, so I can't help you there.
The objective function is what you are trying to minimize. For example, for a least-squares fit, it would calculate the sum of squared errors at the point given in the argument. Something like this:
private double SumSqError(Vector<double> v)
{
    // x and y are class-level arrays holding the data set (see below).
    double err = 0;
    for (int i = 0; i < 100; i++)
    {
        double y_val = v[0] + v[1] * Math.Exp(v[2] * x[i]);
        err += Math.Pow(y_val - y[i], 2);
    }
    return err;
}
You don't have to supply the point. The algorithm does that over and over while searching for the minimum. Note that the subroutine has access to the arrays x and y.
Here is the code for a test program fitting a function to random data:
private void btnMinFit_Click(object sender, EventArgs e)
{
    Random RanGen = new Random();
    x = new double[100];
    y = new double[100];
    // fit exponential expression with three parameters
    double a = 5.0;
    double b = 0.5;
    double c = 0.05;
    // create data set: x values span 10 to 100
    for (int i = 0; i < 100; i++) x[i] = 10 + Convert.ToDouble(i) * 90.0 / 99.0;
    for (int i = 0; i < 100; i++)
    {
        double y_val = a + b * Math.Exp(c * x[i]);
        y[i] = y_val + 0.1 * RanGen.NextDouble() * y_val; // add error term scaled to y-value
    }
    var f1 = new Func<Vector<double>, double>(v => SumSqError(v)); // the objective defined above
    var obj = ObjectiveFunction.Value(f1);
    var solver = new NelderMeadSimplex(1e-5, maximumIterations: 10000);
    var initialGuess = new DenseVector(new[] { 3.0, 6.0, 0.6 });
    var result = solver.FindMinimum(obj, initialGuess);
    Console.WriteLine(result.MinimizingPoint.ToString());
}
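To map the original data set onto this pattern (a sketch, not from the answer; the data values and initial guesses below are made up): fit a logistic curve to the (intensity, percentageCorrect) pairs and read the threshold directly out of the fitted parameters, since the logistic midpoint is exactly the halfway point between the asymptotes.

// v[0] = threshold (midpoint), v[1] = slope; assumes percentageCorrect is in [0, 1]
double[] intensity = { 1, 2, 3, 4, 5, 6 };                           // hypothetical data
double[] percentageCorrect = { 0.02, 0.10, 0.35, 0.70, 0.92, 0.99 }; // hypothetical data

double SumSq(Vector<double> v)
{
    double err = 0;
    for (int i = 0; i < intensity.Length; i++)
    {
        double model = 1.0 / (1.0 + Math.Exp(-v[1] * (intensity[i] - v[0])));
        err += Math.Pow(model - percentageCorrect[i], 2);
    }
    return err;
}

var objective = ObjectiveFunction.Value(SumSq);
var simplex = new NelderMeadSimplex(1e-6, maximumIterations: 10000);
var fit = simplex.FindMinimum(objective, new DenseVector(new[] { 3.5, 1.0 })); // initial guesses
double threshold = fit.MinimizingPoint[0];

If the asymptotes are not at 0 and 1 (for example, forced-choice data with a guessing floor), add them to the model as two extra parameters.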

Random number with Probabilities in C#

I have converted this Java program into a C# program.
using System;
using System.Collections.Generic;

namespace RandomNumberWith_Distribution__Test
{
    public class DistributedRandomNumberGenerator
    {
        private Dictionary<Int32, Double> distribution;
        private double distSum;

        public DistributedRandomNumberGenerator()
        {
            distribution = new Dictionary<Int32, Double>();
        }

        public void addNumber(int val, double dist)
        {
            distribution.Add(val, dist); // are these two
            distSum += dist;             // lines correctly translated?
        }

        public int getDistributedRandomNumber()
        {
            double rand = new Random().NextDouble(); // generate a double random number
            double ratio = 1.0f / distSum;           // why is ratio needed?
            double tempDist = 0;
            foreach (Int32 i in distribution.Keys)
            {
                tempDist += distribution[i];
                if (rand / ratio <= tempDist) // what does "rand/ratio" signify? What does this comparison achieve?
                {
                    return i;
                }
            }
            return 0;
        }
    }

    public class MainClass
    {
        public static void Main(String[] args)
        {
            DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
            drng.addNumber(1, 0.2d);
            drng.addNumber(2, 0.3d);
            drng.addNumber(3, 0.5d);

            //=================
            // start simulation
            int testCount = 1000000;
            Dictionary<Int32, Double> test = new Dictionary<Int32, Double>();
            for (int i = 0; i < testCount; i++)
            {
                int random = drng.getDistributedRandomNumber();
                if (test.ContainsKey(random))
                {
                    double prob = test[random];    // are these
                    prob = prob + 1.0 / testCount; // three lines
                    test[random] = prob;           // correctly translated?
                }
                else
                {
                    test.Add(random, 1.0 / testCount); // is this line correctly translated?
                }
            }

            foreach (var item in test.Keys)
            {
                Console.WriteLine($"{item}, {test[item]}");
            }
            Console.ReadLine();
        }
    }
}
I have several questions:
Can you check if the marked-by-comment lines are correct or need explanation?
Why doesn't getDistributedRandomNumber() check whether the sum of the distribution is 1 before proceeding to further calculations?
The method
public void addNumber(int val, double dist)
is not correctly translated, since you are missing the following lines:
if (this.distribution.get(value) != null) {
    distSum -= this.distribution.get(value);
}
Those lines should cover the case when you call the following (based on your example code):
DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
drng.addNumber(1, 0.2d);
drng.addNumber(1, 0.5d);
So when you call the method addNumber twice with the same first argument, the missing code part checks whether that argument is already present in the dictionary and, if so, removes the "old" value from the dictionary before inserting the new one.
Your method should look like this:
public void addNumber(int val, double dist)
{
    if (distribution.TryGetValue(val, out var oldDist)) // get the old "dist" value, based on the "val"
    {
        distribution.Remove(val); // remove the old entry
        distSum -= oldDist;       // subtract the old "dist" value from "distSum"
    }
    distribution.Add(val, dist); // add the "val" with the current "dist" value to the dictionary
    distSum += dist;             // add the current "dist" value to "distSum"
}
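With that fix, calling addNumber twice with the same value replaces the entry instead of double-counting it. A quick check, using the values from the example above:

var drng = new DistributedRandomNumberGenerator();
drng.addNumber(1, 0.2d);
drng.addNumber(1, 0.5d); // replaces the 0.2 entry; distSum is now 0.5, not 0.7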
Now to your second method
public int getDistributedRandomNumber()
Instead of initializing a new instance of Random every time this method is called, you should initialize it only once, so change the line
double rand = new Random().NextDouble();
to this
double rand = _random.NextDouble();
and initialize the field _random outside of any method, inside the class declaration, like this:
public class DistributedRandomNumberGenerator
{
    private Dictionary<Int32, Double> distribution;
    private double distSum;
    private Random _random = new Random();

    // ... rest of your code
}
This will prevent new Random().NextDouble() from producing the same number over and over again if called in a loop.
You can read about this problem here: Random number generator only generating one random number
As a side note, private fields in C# are conventionally named with an underscore prefix. You should consider renaming distribution to _distribution; the same applies to distSum.
Next:
double ratio = 1.0f / distSum;//why is ratio needed?
Ratio is needed because the method tries its best to do its job with the information you have provided. Imagine you only call this:
DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
drng.addNumber(1, 0.2d);
int random = drng.getDistributedRandomNumber();
With those lines you told the class you want to have the number 1 in 20% of the cases, but what about the other 80%?
And that's where the ratio variable comes into play: it calculates a comparable value based on the sum of probabilities you have given.
eg.
double ratio = 1.0f / distSum;
As in the latest example, after drng.addNumber(1, 0.2d), distSum will be 0.2, which translates to a probability of 20%.
double ratio = 1.0f / 0.2;
The ratio is 5.0: with a total probability of 20%, the ratio is 5 because 100% / 5 = 20%.
Now let's have a look at how the code reacts when the ratio is 5
double tempDist = 0;
foreach (Int32 i in distribution.Keys)
{
    tempDist += distribution[i];
    if (rand / ratio <= tempDist)
    {
        return i;
    }
}
rand will at any given time be a value that is greater than or equal to 0.0 and less than 1.0 (that's how NextDouble works), so let's assume 0.254557522132321 as rand.
Now let's look what happens step by step
double tempDist = 0; // initialize with 0
foreach (Int32 i in distribution.Keys) // step through the added probabilities
{
    tempDist += distribution[i]; // get the probability and add it to a temporary sum
    // as a reminder:
    // rand = 0.254557522132321
    // ratio = 5
    // rand / ratio = 0.0509115044264642
    // tempDist = 0.2
    // => the if will evaluate to true
    if (rand / ratio <= tempDist)
    {
        return i;
    }
}
If we didn't apply the ratio, the if would be false; but that would be wrong, since we only have a single value inside our dictionary, so no matter what the rand value might be, the if statement should return true. That is the nature of rand / ratio: it "scales" the randomly generated number to the sum of the probabilities we added. The rand / ratio division only matters if you didn't provide probabilities that sum up to exactly 1 = 100%.
E.g., if your example were this:
DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
drng.addNumber(1, 0.2d);
drng.addNumber(2, 0.3d);
drng.addNumber(3, 0.5d);
You can see that the provided probabilities sum to 1 (0.2 + 0.3 + 0.5). In this case the line
if (rand / ratio <= tempDist)
would look like this:
if (rand / 1 <= tempDist)
Dividing by 1 never changes the value (rand / 1 = rand), so the only use case for this division is when you didn't provide probabilities summing to a perfect 100%; the total could be either more or less.
As a side note, I would suggest changing your code to this
// call the dictionary "distributions" (notice the plural)
// don't use .Keys
// the loop variable "distribution" will be a KeyValuePair
foreach (var distribution in distributions)
{
    // access the .Value member of the KeyValuePair
    tempDist += distribution.Value;
    if (rand / ratio <= tempDist)
    {
        return distribution.Key; // the loop variable is a pair now, so return its key
    }
}
Your test routine seems to be correctly translated.

How can I return a number based on a skewed normal distribution?

If I want to get e^x (as in the exponential graph the question originally linked), I can just call Math.Exp(x).
What I want is to make a function that returns y for my own graph: a normal distribution, skewed left or right or not skewed at all. It will have some standard deviation and some maximum height.
I've been googling and thinking about how to do it for a while but my math skills just aren't good enough. I was hoping I could get some help with this.
First, a skewed normal random point (x) will be created; then the probability density function (PDF) at that point will be found.
The math used here follows Ermak and Nasstrom's 1995 study. If you like, take a look at the sample Fortran 77 code in that publication. (Most of the variable names used in the code do not fit C# naming conventions, simply because I wanted the reader to be able to relate them to the original paper.)
private static double GetSkewedRandomNumber(double standardDeviation = 1, double skewness = 0, int dbIteration = 10)
{
    // Note: if you call this in a loop, create the Random once outside the method
    // (see "Random number generator only generating one random number").
    var random = new Random();
    var variance = Math.Pow(standardDeviation, 2);
    const double a = 2.236067977; // square root of 5
    const double b = 0.222222222;
    const double c = 243.0 / 32;  // note the 243.0 - the integer division 243 / 32 would truncate to 7
    double finalrun, sumdbran = 0;
    double dbmom3, dbmean1, dbmean2, dbprob1, dbdelta1, dbdelta2, randomNumber1, randomNumber2, dbran, terma, termb;
    dbmom3 = Math.Sqrt(dbIteration) * skewness * Math.Pow(variance, 1.5);
    terma = b / variance;
    termb = Math.Sqrt(Math.Pow(dbmom3, 2) + c * Math.Pow(variance, 3));
    dbmean1 = terma * (dbmom3 - termb);
    dbmean2 = terma * (dbmom3 + termb);
    dbprob1 = dbmean2 / (2 * a * dbmean1 * (dbmean1 - dbmean2));
    dbdelta1 = -a * dbmean1;
    dbdelta2 = a * dbmean2;
    // Loop for summation of double-block random numbers in each final random number
    for (int i = 0; i < dbIteration; i++)
    {
        randomNumber1 = random.NextDouble();
        randomNumber2 = random.NextDouble();
        if (randomNumber1 < (2 * dbdelta1 * dbprob1))
            dbran = dbmean1 + (2 * dbdelta1 * (randomNumber2 - 0.5));
        else
            dbran = dbmean2 + (2 * dbdelta2 * (randomNumber2 - 0.5));
        // sumdbran is the sum of double-block random numbers created by the iteration
        sumdbran = sumdbran + dbran;
    }
    // Calculate final skewed normal random number
    finalrun = sumdbran / Math.Sqrt(dbIteration);
    return finalrun;
}
And the code below is for the PDF:
public static double GetPDF(double mean, double standardDeviation, double skewness, double x)
{
    // Note: mean and standardDeviation are not used below; this evaluates the
    // standard (location 0, scale 1) skew-normal density 2 * pdf(x) * cdf(skewness * x).
    var variable = skewness * x;
    var normalPDF = NormalDistribution.GetPDF(0, 1, x);
    var normalCDF = NormalDistribution.GetCDF(0, 1, variable);
    var pdf = 2 * normalPDF * normalCDF;
    return pdf;
}
You can implement the NormalDistribution.GetPDF and NormalDistribution.GetCDF methods yourself (mentioned in the GetPDF method). They simply calculate the probability density function and cumulative distribution function of a normal distribution. To keep it simple and focused on the question, I preferred not to add that code. For those who also want to check the CDF calculation of the skewed normal distribution, please check here.
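If you would rather not pull in a library, here is a minimal sketch of those two helpers (the class and method names simply mirror the calls above; the CDF uses the Abramowitz and Stegun 7.1.26 approximation of erf, accurate to roughly 1e-7):

public static class NormalDistribution
{
    // Density of N(mean, standardDeviation^2) at x.
    public static double GetPDF(double mean, double standardDeviation, double x)
    {
        double z = (x - mean) / standardDeviation;
        return Math.Exp(-0.5 * z * z) / (standardDeviation * Math.Sqrt(2 * Math.PI));
    }

    // Cumulative distribution of N(mean, standardDeviation^2) at x.
    public static double GetCDF(double mean, double standardDeviation, double x)
    {
        double z = (x - mean) / (standardDeviation * Math.Sqrt(2));
        double sign = z < 0 ? -1 : 1;
        z = Math.Abs(z);
        double t = 1 / (1 + 0.3275911 * z);
        double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                        - 0.284496736) * t + 0.254829592) * t;
        double erf = 1 - poly * Math.Exp(-z * z);
        return 0.5 * (1 + sign * erf);
    }
}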
Here is an example of the output: the code above produces the expected PDF and CDF shapes for positively (+0.75) and negatively (-0.75) skewed normal distributions (graphs omitted).

selection based on percentage weighting

I have a set of values, and an associated percentage for each:
a: 70% chance
b: 20% chance
c: 10% chance
I want to select a value (a, b, c) based on the percentage chance given.
How do I approach this?
My attempt so far looks like this:
r = random.random()
if r <= .7:
    return a
elif r <= .9:
    return b
else:
    return c
I'm stuck coming up with an algorithm to handle this. How should I approach it so it can handle larger sets of values without just chaining together if-else flows?
(any explanation or answers in pseudo-code are fine. a python or C# implementation would be especially helpful)
Here is a complete solution in C#:
public class ProportionValue<T>
{
    public double Proportion { get; set; }
    public T Value { get; set; }
}

public static class ProportionValue
{
    public static ProportionValue<T> Create<T>(double proportion, T value)
    {
        return new ProportionValue<T> { Proportion = proportion, Value = value };
    }

    static Random random = new Random();

    public static T ChooseByRandom<T>(
        this IEnumerable<ProportionValue<T>> collection)
    {
        var rnd = random.NextDouble();
        foreach (var item in collection)
        {
            if (rnd < item.Proportion)
                return item.Value;
            rnd -= item.Proportion;
        }
        throw new InvalidOperationException(
            "The proportions in the collection do not add up to 1.");
    }
}
Usage:
var list = new[] {
    ProportionValue.Create(0.7, "a"),
    ProportionValue.Create(0.2, "b"),
    ProportionValue.Create(0.1, "c")
};
// Outputs "a" with probability 0.7, etc.
Console.WriteLine(list.ChooseByRandom());
For Python:
>>> import random
>>> dst = 70, 20, 10
>>> vls = 'a', 'b', 'c'
>>> picks = [v for v, d in zip(vls, dst) for _ in range(d)]
>>> for _ in range(12): print random.choice(picks),
...
a c c b a a a a a a a a
>>> for _ in range(12): print random.choice(picks),
...
a c a c a b b b a a a a
>>> for _ in range(12): print random.choice(picks),
...
a a a a c c a c a a c a
>>>
General idea: make a list where each item is repeated a number of times proportional to the probability it should have; use random.choice to pick one at random (uniformly), this will match your required probability distribution. Can be a bit wasteful of memory if your probabilities are expressed in peculiar ways (e.g., 70, 20, 10 makes a 100-items list where 7, 2, 1 would make a list of just 10 items with exactly the same behavior), but you could divide all the counts in the probabilities list by their greatest common factor if you think that's likely to be a big deal in your specific application scenario.
Apart from memory consumption issues, this should be the fastest solution -- just one random number generation per required output result, and the fastest possible lookup from that random number, no comparisons &c. If your likely probabilities are very weird (e.g., floating point numbers that need to be matched to many, many significant digits), other approaches may be preferable;-).
Knuth references Walker's method of aliases. Searching on this, I find http://code.activestate.com/recipes/576564-walkers-alias-method-for-random-objects-with-diffe/ and http://prxq.wordpress.com/2006/04/17/the-alias-method/. This gives the exact probabilities required in constant time per number generated with linear time for setup (curiously, n log n time for setup if you use exactly the method Knuth describes, which does a preparatory sort you can avoid).
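For reference, a compact C# sketch of the alias method (Vose's variant; the class name is made up, and the constructor assumes the weights sum to 1):

public class AliasSampler
{
    private readonly double[] prob; // probability of keeping column i
    private readonly int[] alias;   // fallback column for i
    private readonly Random random = new Random();

    public AliasSampler(double[] weights) // weights must sum to 1
    {
        int n = weights.Length;
        prob = new double[n];
        alias = new int[n];
        var scaled = new double[n];
        var small = new Stack<int>();
        var large = new Stack<int>();
        for (int i = 0; i < n; i++)
        {
            scaled[i] = weights[i] * n;
            (scaled[i] < 1.0 ? small : large).Push(i);
        }
        while (small.Count > 0 && large.Count > 0)
        {
            int s = small.Pop(), l = large.Pop();
            prob[s] = scaled[s];
            alias[s] = l;
            scaled[l] += scaled[s] - 1.0; // the large column donates the deficit
            (scaled[l] < 1.0 ? small : large).Push(l);
        }
        while (large.Count > 0) prob[large.Pop()] = 1.0; // leftovers are exactly
        while (small.Count > 0) prob[small.Pop()] = 1.0; // full, up to rounding
    }

    public int Next() // O(1): one uniform index, one biased coin flip
    {
        int i = random.Next(prob.Length);
        return random.NextDouble() < prob[i] ? i : alias[i];
    }
}

For example, new AliasSampler(new[] { 0.7, 0.2, 0.1 }).Next() returns 0, 1 or 2 with those probabilities, in constant time per draw.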
Take the list of weights and find the cumulative totals: 70, 70+20, 70+20+10. Pick a random number greater than or equal to zero and less than the total. Iterate over the items and return the first value for which the cumulative sum of the weights is greater than this random number:
def select(values):
    variate = random.random() * sum(values.values())
    cumulative = 0.0
    for item, weight in values.items():
        cumulative += weight
        if variate < cumulative:
            return item
    return item  # Shouldn't get here, but just in case of rounding...

print select({"a": 70, "b": 20, "c": 10})
This solution, as implemented, should also be able to handle fractional weights and weights that add up to any number so long as they're all non-negative.
Let T = the sum of all item weights
Let R = a random number between 0 and T
Iterate the item list subtracting each item weight from R and return the item that causes the result to become <= 0.
def weighted_choice(probabilities):
    random_position = random.random() * sum(probabilities)
    current_position = 0.0
    for i, p in enumerate(probabilities):
        current_position += p
        if random_position < current_position:
            return i
    return None
Because random.random will always return < 1.0, the final return should never be reached.
import random

def selector(weights):
    i = random.random() * sum(x for x, y in weights)
    for w, v in weights:
        if w >= i:
            break
        i -= w
    return v

weights = ((70, 'a'), (20, 'b'), (10, 'c'))
print [selector(weights) for x in range(10)]
it works equally well for fractional weights
weights = ((0.7,'a'),(0.2,'b'),(0.1,'c'))
print [selector(weights) for x in range(10)]
If you have a lot of weights, you can use bisect to reduce the number of iterations required
import random
import bisect

def make_acc_weights(weights):
    acc = 0
    acc_weights = []
    for w, v in weights:
        acc += w
        acc_weights.append((acc, v))
    return acc_weights

def selector(acc_weights):
    # the total weight is the last accumulated value
    # (the original version mistakenly re-summed the global `weights` here)
    i = random.random() * acc_weights[-1][0]
    return acc_weights[bisect.bisect(acc_weights, (i,))][1]

weights = ((70, 'a'), (20, 'b'), (10, 'c'))
acc_weights = make_acc_weights(weights)
print [selector(acc_weights) for x in range(100)]
Also works fine for fractional weights
weights = ((0.7,'a'),(0.2,'b'),(0.1,'c'))
acc_weights = make_acc_weights(weights)
print [selector(acc_weights) for x in range(100)]
The Python documentation has since been updated with an example of making a weighted random.choice():
If the weights are small integer ratios, a simple technique is to build a sample population with repeats:
>>> weighted_choices = [('Red', 3), ('Blue', 2), ('Yellow', 1), ('Green', 4)]
>>> population = [val for val, cnt in weighted_choices for i in range(cnt)]
>>> random.choice(population)
'Green'
A more general approach is to arrange the weights in a cumulative distribution with itertools.accumulate(), and then locate the random value with bisect.bisect():
>>> choices, weights = zip(*weighted_choices)
>>> cumdist = list(itertools.accumulate(weights))
>>> x = random.random() * cumdist[-1]
>>> choices[bisect.bisect(cumdist, x)]
'Blue'
One note: itertools.accumulate() needs Python 3.2+; on older versions, define it using the equivalent code given in its documentation.
I think you can use an array of small objects (I implemented it in Java; I know a little C# but was afraid of writing wrong code, so you may need to port it yourself). The C# code would be much smaller with struct and var, but I hope you get the idea:
class PercentString {
    double percent;
    String value;
    // Constructor for 2 values
}

ArrayList<PercentString> list = new ArrayList<PercentString>();
list.add(new PercentString(70, "a"));
list.add(new PercentString(20, "b"));
list.add(new PercentString(10, "c"));

double random = new Random().nextDouble() * 100; // uniform in [0, 100), since the weights sum to 100
double percent = 0;
for (int i = 0; i < list.size(); i++) {
    PercentString p = list.get(i);
    percent += p.percent;
    if (random < percent) {
        return p.value;
    }
}
If you really are after speed and want to generate the random values quickly, the Walker's algorithm mcdowella mentioned in https://stackoverflow.com/a/3655773/1212517 is pretty much the best way to go (O(1) time for random(), and O(N) time for preprocess()).
For anyone who is interested, here is my own PHP implementation of the algorithm:
/**
 * Pre-process the samples (Walker's alias method).
 * @param array $weights key represents the sample, value is the weight
 */
protected function preprocess($weights){
    $N = count($weights);
    $sum = array_sum($weights);
    $avg = $sum / (double)$N;

    // divide the array of weights into values smaller than and greater-or-equal to sum/N
    $smaller = array_filter($weights, function($itm) use ($avg){ return $avg > $itm; });
    $sN = count($smaller);
    $greater_eq = array_filter($weights, function($itm) use ($avg){ return $avg <= $itm; });
    $gN = count($greater_eq);

    $bin = array(); // bins
    // we want to fill N bins
    for($i = 0; $i < $N; $i++){
        // At first, decide on a first value for this bin.
        // If there are small intervals left, we choose one
        if($sN > 0){
            $choice1 = each($smaller);
            unset($smaller[$choice1['key']]);
            $sN--;
        } else { // otherwise, we split a large interval
            $choice1 = each($greater_eq);
            unset($greater_eq[$choice1['key']]);
        }
        // splitting happens here - the unused part of the interval is thrown back into the array
        if($choice1['value'] >= $avg){
            if($choice1['value'] - $avg >= $avg){
                $greater_eq[$choice1['key']] = $choice1['value'] - $avg;
            }else if($choice1['value'] - $avg > 0){
                $smaller[$choice1['key']] = $choice1['value'] - $avg;
                $sN++;
            }
            // this bin comprises only one value
            $bin[] = array(1=>$choice1['key'], 2=>null, 'p1'=>1, 'p2'=>0);
        }else{
            // make the second choice for the current bin
            $choice2 = each($greater_eq);
            unset($greater_eq[$choice2['key']]);
            // splitting on the second interval
            if($choice2['value'] - $avg + $choice1['value'] >= $avg){
                $greater_eq[$choice2['key']] = $choice2['value'] - $avg + $choice1['value'];
            }else{
                $smaller[$choice2['key']] = $choice2['value'] - $avg + $choice1['value'];
                $sN++;
            }
            // this bin comprises two values
            $choice2['value'] = $avg - $choice1['value'];
            $bin[] = array(1=>$choice1['key'], 2=>$choice2['key'],
                'p1'=>$choice1['value'] / $avg,
                'p2'=>$choice2['value'] / $avg);
        }
    }
    $this->bins = $bin;
}

/**
 * Choose a random sample according to the weights.
 */
public function random(){
    $bin = $this->bins[array_rand($this->bins)];
    $randValue = (lcg_value() < $bin['p1']) ? $bin[1] : $bin[2];
    return $randValue; // the original computed $randValue but never returned it
}
Here is my version, which can be applied to any IList and normalizes the weights. It is based on Timwi's solution above (selection based on percentage weighting).
/// <summary>
/// Return a random element of the list, or default if the list is empty.
/// </summary>
/// <param name="e"></param>
/// <param name="weightSelector">
/// Returns the chance for an element to be picked. A weight of 0 or less means no chance to be picked.
/// If all elements have a weight of 0 or less, they all have equal chances to be picked.
/// </param>
/// <returns></returns>
public static T AnyOrDefault<T>(this IList<T> e, Func<T, double> weightSelector)
{
    if (e.Count < 1)
        return default(T);
    if (e.Count == 1)
        return e[0];
    var weights = e.Select(o => Math.Max(weightSelector(o), 0)).ToArray();
    var sum = weights.Sum(d => d);
    // Note: prefer a single shared Random instance; new Random() per call
    // can yield repeated values in tight loops (see the discussion above).
    var rnd = new Random().NextDouble();
    for (int i = 0; i < weights.Length; i++)
    {
        // Normalize weight
        var w = sum == 0
            ? 1 / (double)e.Count
            : weights[i] / sum;
        if (rnd < w)
            return e[i];
        rnd -= w;
    }
    throw new Exception("Should not happen");
}
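Hypothetical usage (the element type and weights are made up; any IList works):

var animals = new List<(string Name, double Weight)> {
    ("cat", 70), ("dog", 20), ("fish", 10)
};
// Picks "cat" ~70% of the time; the weights need not sum to 1 or 100.
var pick = animals.AnyOrDefault(a => a.Weight).Name;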
I have my own solution for this:
public class Randomizator3000
{
    public class Item<T>
    {
        public T value;
        public float weight;

        public static float GetTotalWeight(Item<T>[] p_itens)
        {
            float __toReturn = 0;
            foreach (var item in p_itens)
            {
                __toReturn += item.weight;
            }
            return __toReturn;
        }
    }

    private static System.Random _randHolder;
    private static System.Random _random
    {
        get
        {
            if (_randHolder == null)
                _randHolder = new System.Random();
            return _randHolder;
        }
    }

    public static T PickOne<T>(Item<T>[] p_itens)
    {
        if (p_itens == null || p_itens.Length == 0)
        {
            return default(T);
        }
        float __randomizedValue = (float)_random.NextDouble() * (Item<T>.GetTotalWeight(p_itens));
        float __adding = 0;
        for (int i = 0; i < p_itens.Length; i++)
        {
            float __cacheValue = p_itens[i].weight + __adding;
            if (__randomizedValue <= __cacheValue)
            {
                return p_itens[i].value;
            }
            __adding = __cacheValue;
        }
        return p_itens[p_itens.Length - 1].value;
    }
}
And using it should look something like this (this is in Unity3D):
using UnityEngine;
using System.Collections;

public class teste : MonoBehaviour
{
    Randomizator3000.Item<string>[] lista;

    void Start()
    {
        // Ten items ("a" through "j"), all with weight 10.
        string[] values = { "a", "b", "c", "d", "e", "f", "g", "h", "i", "j" };
        lista = new Randomizator3000.Item<string>[values.Length];
        for (int i = 0; i < values.Length; i++)
        {
            lista[i] = new Randomizator3000.Item<string>();
            lista[i].weight = 10;
            lista[i].value = values[i];
        }
    }

    void Update()
    {
        Debug.Log(Randomizator3000.PickOne<string>(lista));
    }
}
In this example each value has a 10% chance to be displayed as a debug =3
Based loosely on python's numpy.random.choice(a=items, p=probs), which takes an array and a probability array of the same size.
public T RandomChoice<T>(IEnumerable<T> a, IEnumerable<double> p)
{
    IEnumerator<T> ae = a.GetEnumerator();
    Random random = new Random(); // prefer a shared instance if calling this in a loop
    double target = random.NextDouble();
    double accumulator = 0;
    foreach (var prob in p)
    {
        ae.MoveNext();
        accumulator += prob;
        if (accumulator > target)
        {
            break;
        }
    }
    return ae.Current;
}
The probability array p must sum to (approx.) 1. This is to keep it consistent with the numpy interface (and mathematics), but you could easily change that if you wanted.
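Hypothetical usage, assuming the method is in scope:

string s = RandomChoice(new[] { "a", "b", "c" }, new[] { 0.7, 0.2, 0.1 });
// s is "a" with probability 0.7, "b" with 0.2, "c" with 0.1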
