I have converted this Java program into a C# program.
using System;
using System.Collections.Generic;
namespace RandomNumberWith_Distribution__Test
{
public class DistributedRandomNumberGenerator
{
private Dictionary<Int32, Double> distribution;
private double distSum;
public DistributedRandomNumberGenerator()
{
distribution = new Dictionary<Int32, Double>();
}
public void addNumber(int val, double dist)
{
distribution.Add(val, dist);// are these two
distSum += dist; // lines correctly translated?
}
public int getDistributedRandomNumber()
{
double rand = new Random().NextDouble();//generate a double random number
double ratio = 1.0f / distSum;//why is ratio needed?
double tempDist = 0;
foreach (Int32 i in distribution.Keys)
{
tempDist += distribution[i];
if (rand / ratio <= tempDist)//what does "rand/ratio" signify? What does this comparison achieve?
{
return i;
}
}
return 0;
}
}
public class MainClass
{
public static void Main(String[] args)
{
DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
drng.addNumber(1, 0.2d);
drng.addNumber(2, 0.3d);
drng.addNumber(3, 0.5d);
//=================
// start simulation
int testCount = 1000000;
Dictionary<Int32, Double> test = new Dictionary<Int32, Double>();
for (int i = 0; i < testCount; i++)
{
int random = drng.getDistributedRandomNumber();
if (test.ContainsKey(random))
{
double prob = test[random]; // are these
prob = prob + 1.0 / testCount;// three lines
test[random] = prob; // correctly translated?
}
else
{
test.Add(random, 1.0 / testCount);// is this line correctly translated?
}
}
foreach (var item in test.Keys)
{
Console.WriteLine($"{item}, {test[item]}");
}
Console.ReadLine();
}
}
}
I have several questions:
Can you check if the marked-by-comment lines are correct or need explanation?
Why doesn't getDistributedRandomNumber() check if the sum of the distribution 1 before proceeding to further calculations?
The method
public void addNumber(int val, double dist)
Is not correctly translated, since you are missing the following lines:
if (this.distribution.get(value) != null) {
distSum -= this.distribution.get(value);
}
Those lines should cover the case when you call the following (based on your example code):
DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
drng.addNumber(1, 0.2d);
drng.addNumber(1, 0.5d);
So calling the method addNumber twice with the same first argument, the missing code part looks if the first argument is already present in the dictionary and if so it will remove the "old" value from the dictionary to insert the new value.
Your method should look like this:
public void addNumber(int val, double dist)
{
if (distribution.TryGetValue(val, out var oldDist)) //get the old "dist" value, based on the "val"
{
distribution.Remove(val); //remove the old entry
distSum -= oldDist; //substract "distSum" with the old "dist" value
}
distribution.Add(val, dist); //add the "val" with the current "dist" value to the dictionary
distSum += dist; //add the current "dist" value to "distSum"
}
Now to your second method
public int getDistributedRandomNumber()
Instead of calling initializing a new instance of Random every time this method is called you should only initialize it once, so change the line
double rand = new Random().NextDouble();
to this
double rand = _random.NextDouble();
and initialize the field _random outside of a method inside the class declaration like this
public class DistributedRandomNumberGenerator
{
private Dictionary<Int32, Double> distribution;
private double distSum;
private Random _random = new Random();
... rest of your code
}
This will prevent new Random().NextDouble() from producing the same number over and over again if called in a loop.
You can read about this problem here: Random number generator only generating one random number
As I side note, fields in c# are named with a prefix underscore. You should consider renaming distribution to _distribution, same applies for distSum.
Next:
double ratio = 1.0f / distSum;//why is ratio needed?
Ratio is need because the method tries its best to do its job with the information you have provided, imagine you only call this:
DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
drng.addNumber(1, 0.2d);
int random = drng.getDistributedRandomNumber();
With those lines you told the class you want to have the number 1 in 20% of the cases, but what about the other 80%?
And that's where the ratio variable comes in place, it calculates a comparable value based on the sum of probabilities you have given.
eg.
double ratio = 1.0f / distSum;
As with the latest example drng.addNumber(1, 0.2d); distSum will be 0.2, which translates to a probability of 20%.
double ratio = 1.0f / 0.2;
The ratio is 5.0, with a probability of 20% the ratio is 5 because 100% / 5 = 20%.
Now let's have a look at how the code reacts when the ratio is 5
double tempDist = 0;
foreach (Int32 i in distribution.Keys)
{
tempDist += distribution[i];
if (rand / ratio <= tempDist)
{
return i;
}
}
rand will be to any given time a value that is greater than or equal to 0.0, and less than 1.0., that's how NextDouble works, so let's assume the following 0.254557522132321 as rand.
Now let's look what happens step by step
double tempDist = 0; //initialize with 0
foreach (Int32 i in distribution.Keys) //step through the added probabilities
{
tempDist += distribution[i]; //get the probabilities and add it to a temporary probability sum
//as a reminder
//rand = 0.254557522132321
//ratio = 5
//rand / ratio = 0,0509115044264642
//tempDist = 0,2
// if will result in true
if (rand / ratio <= tempDist)
{
return i;
}
}
If we didn't apply the ratio the if would be false, but that would be wrong, since we only have a single value inside our dictionary, so no matter what the rand value might be the if statement should return true and that's the natur of rand / ratio.
To "fix" the randomly generated number based on the sum of probabilities we added. The rand / ratio will only be usefull if you didn't provide probabilites that perfectly sum up to 1 = 100%.
eg. if your example would be this
DistributedRandomNumberGenerator drng = new DistributedRandomNumberGenerator();
drng.addNumber(1, 0.2d);
drng.addNumber(2, 0.3d);
drng.addNumber(3, 0.5d);
You can see that the provided probabilities equal to 1 => 0.2 + 0.3 + 0.5, in this case the line
if (rand / ratio <= tempDist)
Would look like this
if (rand / 1 <= tempDist)
Divding by 1 will never change the value and rand / 1 = rand, so the only use case for this devision are cases where you didn't provided a perfect 100% probability, could be either more or less.
As a side note, I would suggest changing your code to this
//call the dictionary distributions (notice the plural)
//dont use .Keys
//var distribution will be a KeyValuePair
foreach (var distribution in distributions)
{
//access the .Value member of the KeyValuePair
tempDist += distribution.Value;
if (rand / ratio <= tempDist)
{
return i;
}
}
Your test routine seems to be correctly translated.
Suppose I have number 87.6 of type double here I want to round it, so I have applied C# build in method of round to get the output something like this
double test2 = 87.6;
Console.WriteLine(Math.Round(test2, 0));
this will generate 88 which is fine. However, I wanted to be round back to 87 my logic would be on 0.8 and not on 0.5. So for instance if my input is 87.8 then I want to get 88 and if my input is 88.7 then I want to round it to 87.
I've got the answer from the comment section here is the logic for this
double test2 = 87.6;
test2 -= 0.3;
Console.WriteLine(Math.Round(test2, 0));
This 0.3 will make the difference
I think this would work:
public static class RoundingExtensions {
public static int RoundWithBreak(this valueToRound, double breakValue = .5) {
if (breakValue <= 0 || breakValue >= 1) { throw new Exception("Must be between 0 and 1") }
var difference = breakValue - .5;
var min = Math.Floor(breakValue);
var toReturn = Math.Round(breakValue - difference, 0);
return toReturn < min ? min : toReturn;
}
}
Consumed:
var test = 8.7;
var result = test.RoundWithBreak(.8);
OK the title is ugly but the problem is quite straightforward:
I have a WPF control where I want to display plot lines. My "viewport" has its limits, and these limits (for example, bottom and top value in object coordinates) are doubles.
So I would like to draw lines at every multiple of, say, 5. If my viewport goes from -8.3 to 22.8, I would get [-5, 0, 5, 10, 15, 20].
I would like to use LINQ, it seems the natural candidate, but cannot find a way...
I imagine something along these lines:
int nlines = (int)((upper_value - lower_value)/step);
var seq = Enumerable.Range((int)(Math.Ceiling(magic_number)), nlines).Select(what_else);
Given values are (double)lower_value, (double)upper_value and (int)step.
Enumerable.Range should do the trick:
Enumerable.Range(lower_value, upper_value - lower_value)
.Where(x => x % step == 0);
Try this code:
double lower_value = -8.3;
double upper_value = 22.8;
int step = 5;
int low = (int)lower_value / step;
int up = (int)upper_value / step;
var tt = Enumerable.Range(low, up - low + 1).Select(i => i * step);
EDIT
This code is intended for all negative values of the lower_value and for positive values which are divisible by the step. To make it work for all other positive values as well, the following correction should be applied:
if (lower_value > step * low)
low++;
The first problem is to determine the nearest factor of your step value from your starting point. Some simple arithmetic can deduce this value:
public static double RoundToMultiple(double value, double multiple)
{
return value - value % multiple;
}
To then create a sequence of all factors of a given value between a range an iterator block is well suited:
public static IEnumerable<double> FactorsInRange(
double start, double end, double factor)
{
var current = RoundToMultiple(start, factor);
while (start < end)
{
yield return start;
current = current + factor;
}
}
If you have the Generate method from MoreLinq, then you could write this without an explicit iterator block:
public static IEnumerable<double> FactorsInRange(
double start, double end, double factor)
{
return Generate(RoundToMultiple(start, factor),
current => current + factor)
.TakeWhile(current => current < end);
}
To avoid having to enumerate every number, you'll have to go outside of LINQ:
List<int> steps;
int currentStep = (lower_value / step) * step; //This takes advantage of integer division to "floor" the factor
steps.Add(currentStep);
while (currentStep < upper_value)
{
currentStep += step;
steps.Add(currentStep);
}
I made some adjustments to the code.
private List<int> getMultiples(double lower_value, double upper_value, int step) {
List<int> steps = new List<int>();
int currentStep = (int)(lower_value / step) * step; //This takes advantage of integer division to "floor" the factor
steps.Add(currentStep);
while (currentStep <= upper_value) {
steps.Add(currentStep);
currentStep += step;
}
return steps;
}
I've started playing around with an attempt to create the following:
public static IEnumerable<List<T>> OptimizedBatches<T>(this IEnumerable<T> items)
Then the client of this extension method would use it like this:
foreach (var list in extracter.EnumerateAll().OptimizedBatches())
{
// at some unknown batch size, process time starts to
// increase at an exponential rate
}
Here's an example:
batch length time
1 100ms
2 102ms
4 110ms
8 111ms
16 118ms
32 119ms
64 134ms
128 500ms <-- doubled length but time it took more than doubled
256 1100ms <-- oh no!!
From the above, the best batch length is 64 because 64/134 is the best ratio of length/time.
So the question is what algorithm to use to automatically pick the optimal batch length based on the successive times between iterator steps?
Here's what I have so far - it's not done yet...
class LengthOptimizer
{
private Stopwatch sw;
private int length = 1;
private List<RateRecord> rateRecords = new List<RateRecord>();
public int Length
{
get
{
if (sw == null)
{
length = 1;
sw = new Stopwatch();
}
else
{
sw.Stop();
rateRecords.Add(new RateRecord { Length = length, ElapsedMilliseconds = sw.ElapsedMilliseconds });
length = rateRecords.OrderByDescending(c => c.Rate).First().Length;
}
sw.Start();
return length;
}
}
}
struct RateRecord
{
public int Length { get; set; }
public long ElapsedMilliseconds { get; set; }
public float Rate { get { return ((float)Length) / ElapsedMilliseconds; } }
}
The main problem I see here is creating the "optimity scale", that is, why do you consider that 32 -> 119ms is acceptable and 256 -> 1100ms is not; or why certain configuration is better than other one.
Once this is done, the algorithm will be straightforward: just returning the ranking values for each input conditions and making decisions based on "which one gets a higher value".
The first step for creating this scale is finding out the variable which better describes the ideal behaviour you are looking for. A simple first approach: length/time. That is, from your inputs:
batch length time ratio1
1 100ms 0.01
2 102ms 0.019
4 110ms 0.036
8 111ms 0.072
16 118ms 0.136
32 119ms 0.269
64 134ms 0.478
128 500ms 0.256
256 1100ms 0.233
The bigger is ratio1, the better. Logically, it is not the same having 0.269 with 32 length than 0.256 with 128 and thus more information has to be accounted for.
You might create a more complex ranking ratio weighting the two involved variables better (e.g., trying different exponents). But I think that the best approach for this problem is creating a system of "zones" and calculating a generic ranking from it. Example:
Zone 1 -> length from 1 to 8; ideal ratio for this zone = 0.1
Zone 2 -> length from 9 to 32; ideal ratio for this zone = 0.3
Zone 3 -> length from 33 to 64; ideal ratio for this zone = 0.45
Zone 4 -> length from 65 to 256; ideal ratio for this zone = 0.35
The ranking associated to each configuration will be the result of putting the given ratio1 with respect to the ideal value for the given zone.
2 102ms 0.019 -> (zone 1) 0.019/0.1 = 0.19 (or 1.9 in a 0-10 scale)
16 118ms 0.136 -> (zone 2) 0.136/0.3 = 0.45 (or 4.5 in a 0-10 scale)
etc.
These values might be compared and thus you would automatically know that the second case is much better than the first one.
This is just a simple example but I guess that provides a good enough insight into what is the real problem here: setting up an accurate ranking allowing to perfectly identify which configuration is better.
I would go with a ranking approach like varocarbas suggested.
Here is an initial implementation to get you started:
public sealed class DataFlowOptimizer<T>
{
private readonly IEnumerable<T> _collection;
private RateRecord bestRate = RateRecord.Default;
private uint batchLength = 1;
private struct RateRecord
{
public static RateRecord Default = new RateRecord { Length = 1, ElapsedTicks = 0 };
private float _rate;
public int Length { get; set; }
public long ElapsedTicks { get; set; }
public float Rate
{
get
{
if(_rate == default(float) && ElapsedTicks > 0)
{
_rate = ((float)Length) / ElapsedTicks;
}
return _rate;
}
}
}
public DataFlowOptimizer(IEnumerable<T> collection)
{
_collection = collection;
}
public int BatchLength { get { return (int)batchLength; } }
public float Rate { get { return bestRate.Rate; } }
public IEnumerable<IList<T>> GetBatch()
{
var stopwatch = new Stopwatch();
var batch = new List<T>();
var benchmarks = new List<RateRecord>(5);
IEnumerator<T> enumerator = null;
try
{
enumerator = _collection.GetEnumerator();
uint count = 0;
stopwatch.Start();
while(enumerator.MoveNext())
{
if(count == batchLength)
{
benchmarks.Add(new RateRecord { Length = BatchLength, ElapsedTicks = stopwatch.ElapsedTicks });
var currentBatch = batch.ToList();
batch.Clear();
if(benchmarks.Count == 10)
{
var currentRate = benchmarks.Average(x => x.Rate);
if(currentRate > bestRate.Rate)
{
bestRate = new RateRecord { Length = BatchLength, ElapsedTicks = (long)benchmarks.Average(x => x.ElapsedTicks) };
batchLength = NextPowerOf2(batchLength);
}
// Set margin of error at 10%
else if((bestRate.Rate * .9) > currentRate)
{
// Shift the current length and make sure it's >= 1
var currentPowOf2 = ((batchLength >> 1) | 1);
batchLength = PreviousPowerOf2(currentPowOf2);
}
benchmarks.Clear();
}
count = 0;
stopwatch.Restart();
yield return currentBatch;
}
batch.Add(enumerator.Current);
count++;
}
}
finally
{
if(enumerator != null)
enumerator.Dispose();
}
stopwatch.Stop();
}
uint PreviousPowerOf2(uint x)
{
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
return x - (x >> 1);
}
uint NextPowerOf2(uint x)
{
x |= (x >> 1);
x |= (x >> 2);
x |= (x >> 4);
x |= (x >> 8);
x |= (x >> 16);
return (x+1);
}
}
Sample program in LinqPad:
public IEnumerable<int> GetData()
{
return Enumerable.Range(0, 100000000);
}
void Main()
{
var optimizer = new DataFlowOptimizer<int>(GetData());
foreach(var batch in optimizer.GetBatch())
{
string.Format("Length: {0} Rate {1}", optimizer.BatchLength, optimizer.Rate).Dump();
}
}
Describe an objective function f that maps a batch size s and runtime t(s) to a score f(s,t(s))
Try lots of s values and evaluate f(s,t(s)) for each one
Choose the s value that maximizes f
I have a set of values, and an associated percentage for each:
a: 70% chance
b: 20% chance
c: 10% chance
I want to select a value (a, b, c) based on the percentage chance given.
how do I approach this?
my attempt so far looks like this:
r = random.random()
if r <= .7:
return a
elif r <= .9:
return b
else:
return c
I'm stuck coming up with an algorithm to handle this. How should I approach this so it can handle larger sets of values without just chaining together if-else flows.
(any explanation or answers in pseudo-code are fine. a python or C# implementation would be especially helpful)
Here is a complete solution in C#:
public class ProportionValue<T>
{
public double Proportion { get; set; }
public T Value { get; set; }
}
public static class ProportionValue
{
public static ProportionValue<T> Create<T>(double proportion, T value)
{
return new ProportionValue<T> { Proportion = proportion, Value = value };
}
static Random random = new Random();
public static T ChooseByRandom<T>(
this IEnumerable<ProportionValue<T>> collection)
{
var rnd = random.NextDouble();
foreach (var item in collection)
{
if (rnd < item.Proportion)
return item.Value;
rnd -= item.Proportion;
}
throw new InvalidOperationException(
"The proportions in the collection do not add up to 1.");
}
}
Usage:
var list = new[] {
ProportionValue.Create(0.7, "a"),
ProportionValue.Create(0.2, "b"),
ProportionValue.Create(0.1, "c")
};
// Outputs "a" with probability 0.7, etc.
Console.WriteLine(list.ChooseByRandom());
For Python:
>>> import random
>>> dst = 70, 20, 10
>>> vls = 'a', 'b', 'c'
>>> picks = [v for v, d in zip(vls, dst) for _ in range(d)]
>>> for _ in range(12): print random.choice(picks),
...
a c c b a a a a a a a a
>>> for _ in range(12): print random.choice(picks),
...
a c a c a b b b a a a a
>>> for _ in range(12): print random.choice(picks),
...
a a a a c c a c a a c a
>>>
General idea: make a list where each item is repeated a number of times proportional to the probability it should have; use random.choice to pick one at random (uniformly), this will match your required probability distribution. Can be a bit wasteful of memory if your probabilities are expressed in peculiar ways (e.g., 70, 20, 10 makes a 100-items list where 7, 2, 1 would make a list of just 10 items with exactly the same behavior), but you could divide all the counts in the probabilities list by their greatest common factor if you think that's likely to be a big deal in your specific application scenario.
Apart from memory consumption issues, this should be the fastest solution -- just one random number generation per required output result, and the fastest possible lookup from that random number, no comparisons &c. If your likely probabilities are very weird (e.g., floating point numbers that need to be matched to many, many significant digits), other approaches may be preferable;-).
Knuth references Walker's method of aliases. Searching on this, I find http://code.activestate.com/recipes/576564-walkers-alias-method-for-random-objects-with-diffe/ and http://prxq.wordpress.com/2006/04/17/the-alias-method/. This gives the exact probabilities required in constant time per number generated with linear time for setup (curiously, n log n time for setup if you use exactly the method Knuth describes, which does a preparatory sort you can avoid).
Take the list of and find the cumulative total of the weights: 70, 70+20, 70+20+10. Pick a random number greater than or equal to zero and less than the total. Iterate over the items and return the first value for which the cumulative sum of the weights is greater than this random number:
def select( values ):
variate = random.random() * sum( values.values() )
cumulative = 0.0
for item, weight in values.items():
cumulative += weight
if variate < cumulative:
return item
return item # Shouldn't get here, but just in case of rounding...
print select( { "a": 70, "b": 20, "c": 10 } )
This solution, as implemented, should also be able to handle fractional weights and weights that add up to any number so long as they're all non-negative.
Let T = the sum of all item weights
Let R = a random number between 0 and T
Iterate the item list subtracting each item weight from R and return the item that causes the result to become <= 0.
def weighted_choice(probabilities):
random_position = random.random() * sum(probabilities)
current_position = 0.0
for i, p in enumerate(probabilities):
current_position += p
if random_position < current_position:
return i
return None
Because random.random will always return < 1.0, the final return should never be reached.
import random
def selector(weights):
i=random.random()*sum(x for x,y in weights)
for w,v in weights:
if w>=i:
break
i-=w
return v
weights = ((70,'a'),(20,'b'),(10,'c'))
print [selector(weights) for x in range(10)]
it works equally well for fractional weights
weights = ((0.7,'a'),(0.2,'b'),(0.1,'c'))
print [selector(weights) for x in range(10)]
If you have a lot of weights, you can use bisect to reduce the number of iterations required
import random
import bisect
def make_acc_weights(weights):
acc=0
acc_weights = []
for w,v in weights:
acc+=w
acc_weights.append((acc,v))
return acc_weights
def selector(acc_weights):
i=random.random()*sum(x for x,y in weights)
return weights[bisect.bisect(acc_weights, (i,))][1]
weights = ((70,'a'),(20,'b'),(10,'c'))
acc_weights = make_acc_weights(weights)
print [selector(acc_weights) for x in range(100)]
Also works fine for fractional weights
weights = ((0.7,'a'),(0.2,'b'),(0.1,'c'))
acc_weights = make_acc_weights(weights)
print [selector(acc_weights) for x in range(100)]
today, the update of python document give an example to make a random.choice() with weighted probabilities:
If the weights are small integer ratios, a simple technique is to build a sample population with repeats:
>>> weighted_choices = [('Red', 3), ('Blue', 2), ('Yellow', 1), ('Green', 4)]
>>> population = [val for val, cnt in weighted_choices for i in range(cnt)]
>>> random.choice(population)
'Green'
A more general approach is to arrange the weights in a cumulative distribution with itertools.accumulate(), and then locate the random value with bisect.bisect():
>>> choices, weights = zip(*weighted_choices)
>>> cumdist = list(itertools.accumulate(weights))
>>> x = random.random() * cumdist[-1]
>>> choices[bisect.bisect(cumdist, x)]
'Blue'
one note: itertools.accumulate() needs python 3.2 or define it with the Equivalent.
I think you can have an array of small objects (I implemented in Java although I know a little bit C# but I am afraid can write wrong code), so you may need to port it yourself. The code in C# will be much smaller with struct, var but I hope you get the idea
class PercentString {
double percent;
String value;
// Constructor for 2 values
}
ArrayList<PercentString> list = new ArrayList<PercentString();
list.add(new PercentString(70, "a");
list.add(new PercentString(20, "b");
list.add(new PercentString(10, "c");
double percent = 0;
for (int i = 0; i < list.size(); i++) {
PercentString p = list.get(i);
percent += p.percent;
if (random < percent) {
return p.value;
}
}
If you are really up to speed and want to generate the random values quickly, the Walker's algorithm mcdowella mentioned in https://stackoverflow.com/a/3655773/1212517 is pretty much the best way to go (O(1) time for random(), and O(N) time for preprocess()).
For anyone who is interested, here is my own PHP implementation of the algorithm:
/**
* Pre-process the samples (Walker's alias method).
* #param array key represents the sample, value is the weight
*/
protected function preprocess($weights){
$N = count($weights);
$sum = array_sum($weights);
$avg = $sum / (double)$N;
//divide the array of weights to values smaller and geq than sum/N
$smaller = array_filter($weights, function($itm) use ($avg){ return $avg > $itm;}); $sN = count($smaller);
$greater_eq = array_filter($weights, function($itm) use ($avg){ return $avg <= $itm;}); $gN = count($greater_eq);
$bin = array(); //bins
//we want to fill N bins
for($i = 0;$i<$N;$i++){
//At first, decide for a first value in this bin
//if there are small intervals left, we choose one
if($sN > 0){
$choice1 = each($smaller);
unset($smaller[$choice1['key']]);
$sN--;
} else{ //otherwise, we split a large interval
$choice1 = each($greater_eq);
unset($greater_eq[$choice1['key']]);
}
//splitting happens here - the unused part of interval is thrown back to the array
if($choice1['value'] >= $avg){
if($choice1['value'] - $avg >= $avg){
$greater_eq[$choice1['key']] = $choice1['value'] - $avg;
}else if($choice1['value'] - $avg > 0){
$smaller[$choice1['key']] = $choice1['value'] - $avg;
$sN++;
}
//this bin comprises of only one value
$bin[] = array(1=>$choice1['key'], 2=>null, 'p1'=>1, 'p2'=>0);
}else{
//make the second choice for the current bin
$choice2 = each($greater_eq);
unset($greater_eq[$choice2['key']]);
//splitting on the second interval
if($choice2['value'] - $avg + $choice1['value'] >= $avg){
$greater_eq[$choice2['key']] = $choice2['value'] - $avg + $choice1['value'];
}else{
$smaller[$choice2['key']] = $choice2['value'] - $avg + $choice1['value'];
$sN++;
}
//this bin comprises of two values
$choice2['value'] = $avg - $choice1['value'];
$bin[] = array(1=>$choice1['key'], 2=>$choice2['key'],
'p1'=>$choice1['value'] / $avg,
'p2'=>$choice2['value'] / $avg);
}
}
$this->bins = $bin;
}
/**
* Choose a random sample according to the weights.
*/
public function random(){
$bin = $this->bins[array_rand($this->bins)];
$randValue = (lcg_value() < $bin['p1'])?$bin[1]:$bin[2];
}
Here is my version that can apply to any IList and normalize the weight. It is based on Timwi's solution : selection based on percentage weighting
/// <summary>
/// return a random element of the list or default if list is empty
/// </summary>
/// <param name="e"></param>
/// <param name="weightSelector">
/// return chances to be picked for the element. A weigh of 0 or less means 0 chance to be picked.
/// If all elements have weight of 0 or less they all have equal chances to be picked.
/// </param>
/// <returns></returns>
public static T AnyOrDefault<T>(this IList<T> e, Func<T, double> weightSelector)
{
if (e.Count < 1)
return default(T);
if (e.Count == 1)
return e[0];
var weights = e.Select(o => Math.Max(weightSelector(o), 0)).ToArray();
var sum = weights.Sum(d => d);
var rnd = new Random().NextDouble();
for (int i = 0; i < weights.Length; i++)
{
//Normalize weight
var w = sum == 0
? 1 / (double)e.Count
: weights[i] / sum;
if (rnd < w)
return e[i];
rnd -= w;
}
throw new Exception("Should not happen");
}
I've my own solution for this:
public class Randomizator3000
{
public class Item<T>
{
public T value;
public float weight;
public static float GetTotalWeight<T>(Item<T>[] p_itens)
{
float __toReturn = 0;
foreach(var item in p_itens)
{
__toReturn += item.weight;
}
return __toReturn;
}
}
private static System.Random _randHolder;
private static System.Random _random
{
get
{
if(_randHolder == null)
_randHolder = new System.Random();
return _randHolder;
}
}
public static T PickOne<T>(Item<T>[] p_itens)
{
if(p_itens == null || p_itens.Length == 0)
{
return default(T);
}
float __randomizedValue = (float)_random.NextDouble() * (Item<T>.GetTotalWeight(p_itens));
float __adding = 0;
for(int i = 0; i < p_itens.Length; i ++)
{
float __cacheValue = p_itens[i].weight + __adding;
if(__randomizedValue <= __cacheValue)
{
return p_itens[i].value;
}
__adding = __cacheValue;
}
return p_itens[p_itens.Length - 1].value;
}
}
And using it should be something like that (thats in Unity3d)
using UnityEngine;
using System.Collections;
public class teste : MonoBehaviour
{
Randomizator3000.Item<string>[] lista;
void Start()
{
lista = new Randomizator3000.Item<string>[10];
lista[0] = new Randomizator3000.Item<string>();
lista[0].weight = 10;
lista[0].value = "a";
lista[1] = new Randomizator3000.Item<string>();
lista[1].weight = 10;
lista[1].value = "b";
lista[2] = new Randomizator3000.Item<string>();
lista[2].weight = 10;
lista[2].value = "c";
lista[3] = new Randomizator3000.Item<string>();
lista[3].weight = 10;
lista[3].value = "d";
lista[4] = new Randomizator3000.Item<string>();
lista[4].weight = 10;
lista[4].value = "e";
lista[5] = new Randomizator3000.Item<string>();
lista[5].weight = 10;
lista[5].value = "f";
lista[6] = new Randomizator3000.Item<string>();
lista[6].weight = 10;
lista[6].value = "g";
lista[7] = new Randomizator3000.Item<string>();
lista[7].weight = 10;
lista[7].value = "h";
lista[8] = new Randomizator3000.Item<string>();
lista[8].weight = 10;
lista[8].value = "i";
lista[9] = new Randomizator3000.Item<string>();
lista[9].weight = 10;
lista[9].value = "j";
}
void Update ()
{
Debug.Log(Randomizator3000.PickOne<string>(lista));
}
}
In this example each value has a 10% chance do be displayed as a debug =3
Based loosely on python's numpy.random.choice(a=items, p=probs), which takes an array and a probability array of the same size.
public T RandomChoice<T>(IEnumerable<T> a, IEnumerable<double> p)
{
IEnumerator<T> ae = a.GetEnumerator();
Random random = new Random();
double target = random.NextDouble();
double accumulator = 0;
foreach (var prob in p)
{
ae.MoveNext();
accumulator += prob;
if (accumulator > target)
{
break;
}
}
return ae.Current;
}
The probability array p must sum to (approx.) 1. This is to keep it consistent with the numpy interface (and mathematics), but you could easily change that if you wanted.