How to correctly calculate Fisher Transform indicator - c#

I'm writing a small technical analysis library that consists of items that are not available in TA-Lib. I've started with an example I found on cTrader and matched it against the code found in the TradingView version.
Here's the Pine Script code from TradingView:
len = input(9, minval=1, title="Length")
high_ = highest(hl2, len)
low_ = lowest(hl2, len)
round_(val) => val > .99 ? .999 : val < -.99 ? -.999 : val
value = 0.0
value := round_(.66 * ((hl2 - low_) / max(high_ - low_, .001) - .5) + .67 * nz(value[1]))
fish1 = 0.0
fish1 := .5 * log((1 + value) / max(1 - value, .001)) + .5 * nz(fish1[1])
fish2 = fish1[1]
Here's my attempt to implement the indicator:
public class FisherTransform : IndicatorBase
{
    public int Length = 9;
    public decimal[] Fish { get; set; }
    public decimal[] Trigger { get; set; }

    decimal _maxHigh;
    decimal _minLow;
    private decimal _value1;
    private decimal _lastValue1;

    public FisherTransform(IEnumerable<Candle> candles, int length)
        : base(candles)
    {
        Length = length;
        RequiredCount = Length;
        _lastValue1 = 1;
    }

    protected override void Initialize()
    {
        Fish = new decimal[Series.Length];
        Trigger = new decimal[Series.Length];
    }

    public override void Compute(int startIndex = 0, int? endIndex = null)
    {
        if (endIndex == null)
            endIndex = Series.Length;

        for (int index = 0; index < endIndex; index++)
        {
            if (index == 1)
            {
                Fish[index - 1] = 1;
            }
            _minLow = Series.Average.Lowest(Length, index);
            _maxHigh = Series.Average.Highest(Length, index);
            _value1 = Maths.Normalize(0.66m * ((Maths.Divide(Series.Average[index] - _minLow, Math.Max(_maxHigh - _minLow, 0.001m)) - 0.5m) + 0.67m * _lastValue1));
            _lastValue1 = _value1;
            Fish[index] = 0.5m * Maths.Log(Maths.Divide(1 + _value1, Math.Max(1 - _value1, .001m))) + 0.5m * Fish[index - 1];
            Trigger[index] = Fish[index - 1];
        }
    }
}
IndicatorBase class and CandleSeries class
Math Helpers
The problem
The output values appear to be within the expected range; however, my Fisher Transform crossovers do not match up with what I am seeing on TradingView's version of the indicator.
Question
How do I properly implement the Fisher Transform indicator in C#? I'd like this to match TradingView's Fisher Transform output.
What I Know
I've checked my data against other indicators that I have personally written and indicators from TA-Lib, and those indicators pass my unit tests. I've also checked my data against the TradingView data candle by candle and found that my data matches as expected. So I don't suspect my data is the issue.
Specifics
CSV Data - NFLX 5 min agg
Pictured below is the above-shown Fisher Transform code applied to a TradingView chart (Fisher in cyan, Trigger in magenta). My goal is to match this output as closely as possible.
Expected Outputs:
Crossover completed at 15:30 ET
Approx Fisher Value is 2.86
Approx Trigger Value is 1.79
Crossover completed at 10:45 ET
Approx Fisher Value is -3.67
Approx Trigger Value is -3.10
My Actual Outputs:
Crossover completed at 15:30 ET
My Fisher Value is 1.64
My Trigger Value is 1.99
Crossover completed at 10:45 ET
My Fisher Value is -1.63
My Trigger Value is -2.00
Bounty
To make your life easier I'm including a small console application
complete with passing and failing unit tests. All unit tests are
conducted against the same data set. The passing unit tests are from a
tested working Simple Moving Average indicator. The failing unit
tests are against the Fisher Transform indicator in question.
Project Files (updated 5/14)
Help get my FisherTransform tests to pass and I'll award the bounty.
Just comment if you need any additional resources or information.
Alternative Answers that I'll consider
Submit your own working FisherTransform in C#
Explain why my FisherTransform is actually working as expected

The code has two errors.
1) Misplaced extra parentheses. The correct line is:
_value1 = Maths.Normalize(0.66m * (Maths.Divide(Series.Average[index] - _minLow, Math.Max(_maxHigh - _minLow, 0.001m)) - 0.5m) + 0.67m * _lastValue1);
2) The Highest and Lowest functions must be:
public static decimal Highest(this decimal[] series, int length, int index)
{
    var maxVal = series[index]; // <----- HERE WAS AN ERROR!
    var lookback = Math.Max(index - length, 0);
    for (int i = index; i-- > lookback;)
        maxVal = Math.Max(series[i], maxVal);
    return maxVal;
}

public static decimal Lowest(this decimal[] series, int length, int index)
{
    var minVal = series[index]; // <----- HERE WAS AN ERROR!
    var lookback = Math.Max(index - length, 0);
    for (int i = index; i-- > lookback;)
    {
        //if (series[i] != 0) // <----- HERE WAS AN ERROR!
        minVal = Math.Min(series[i], minVal);
    }
    return minVal;
}
3) Confusing test parameters. Please recheck your unit test values. AFTER THE UPDATE THE TESTS ARE STILL NOT FIXED. For example, the first FisherTransforms_ValuesAreReasonablyClose_First() has its expected values swapped:
var fish = result.Fish.Last();    // is equal to -3.1113144510775780365063063706
var trig = result.Trigger.Last(); // is equal to -3.6057793808025449204415435710

// TradingView values for the NFLX 5m chart at 10:45 ET
var fisherValue = -3.67m;
var triggerValue = -3.10m;
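Putting fixes 1) and 2) together, here is a minimal sketch of what the corrected Compute loop could look like. It keeps the question's Series/Maths helpers (assumptions here, since only fragments of them were posted) and guards the first bar with 0 the way Pine's nz() does; the question seeds Fish[0] and _lastValue1 with 1 instead, and which seeding matches TradingView best is worth testing:

public override void Compute(int startIndex = 0, int? endIndex = null)
{
    if (endIndex == null)
        endIndex = Series.Length;

    for (int index = startIndex; index < endIndex; index++)
    {
        // nz(fish1[1]) in Pine: previous Fisher value, or 0 on the first bar
        decimal prevFish = index > 0 ? Fish[index - 1] : 0m;

        _minLow = Series.Average.Lowest(Length, index);
        _maxHigh = Series.Average.Highest(Length, index);

        // Fix 1: 0.67m * _lastValue1 is added OUTSIDE the 0.66m term
        // (note: Pine's nz(value[1]) also starts at 0, so _lastValue1
        // may need to start at 0 rather than 1)
        _value1 = Maths.Normalize(
            0.66m * (Maths.Divide(Series.Average[index] - _minLow,
                                  Math.Max(_maxHigh - _minLow, 0.001m)) - 0.5m)
            + 0.67m * _lastValue1);
        _lastValue1 = _value1;

        Fish[index] = 0.5m * Maths.Log(Maths.Divide(1 + _value1, Math.Max(1 - _value1, 0.001m)))
                      + 0.5m * prevFish;
        Trigger[index] = prevFish;
    }
}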

Related

Nearest value from user input in an array C#

In my application I read some files and ask the user for a number. The files contain a lot of numbers, and I am trying to find the nearest value when the number the user enters is not in the file. So far I have the following:
static int nearest(int close_num, int[] a)
{
    foreach (int bob in a)
    {
        if ((close_num -= bob) <= 0)
            return bob;
    }
    return -1;
}

Console.WriteLine("Enter a number to find out if is in the selected Net File: ");
int i3 = Convert.ToInt32(Console.ReadLine());
bool checker = false;
//Single nearest = 0;
//linear search#1
for (int i = 0; i < a.Length; i++) //looping through array
{
    if (a[i] == i3) //checking to see the value is found in the array
    {
        Console.WriteLine("Value found and the position of it in the descending value of the selected Net File is: " + a[i]);
        checker = true;
    }
    else
    {
        int found = nearest(i3, a);
        Console.WriteLine("Cannot find this number in the Net File however here the closest number to that: " + found);
        //Console.WriteLine("Cannot find this number in the Net File however here the closest number to that : " + nearest);
    }
}
When a value that is in the file is entered, the output is fine, but I cannot figure out a way to get the nearest value. I can't use something like BinarySearchArray for this. a is the array, while i3 is the value the user has entered. Would a binary search algorithm just be simpler for this?
Any help would be appreciated.
You need to make a pass over all the elements of the array, comparing each one in turn to find the smallest difference. At the same time, keep a note of the current nearest value.
There are many ways to do this; here's a fairly simple one:
static int nearest(int close_num, int[] a)
{
    int result = -1;
    long smallestDelta = long.MaxValue;
    foreach (int bob in a)
    {
        long delta = (bob > close_num) ? (bob - close_num) : (close_num - bob);
        if (delta < smallestDelta)
        {
            smallestDelta = delta;
            result = bob;
        }
    }
    return result;
}
Note that delta is calculated so that it is the absolute value of the difference.
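A quick hypothetical check of this function:

int[] a = { 3, 7, 12, 40 };
Console.WriteLine(nearest(10, a)); // prints 12: |12 - 10| = 2 beats |7 - 10| = 3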
Well, first we should define what nearest means. Assuming that
the int nearest for a given int number is the item of int[] a such that Math.Abs(nearest - number) is the smallest possible value,
we can put it as
static int nearest(int number, int[] a)
{
    long diff = -1;
    int result = 0;
    foreach (int item in a)
    {
        // actual = Math.Abs((long)item - number);
        long actual = (long)item - number;
        if (actual < 0)
            actual = -actual;
        // if item is the very first value or better than result
        if (diff < 0 || actual < diff)
        {
            result = item;
            diff = actual;
        }
    }
    return result;
}
The only tricky part is the long for diff: it may happen that item - number exceeds the int range (and will either throw an OverflowException in a checked context or silently produce an invalid answer), e.g.
int[] a = new int[] {int.MaxValue, int.MaxValue - 1};
Console.Write(nearest(int.MinValue, a));
Note that the expected result is 2147483646, not 2147483647.
What about LINQ?
var nearestNumber = a.OrderBy(x => Math.Abs(x - i3)).First();
Just iterate through the array and find the minimal delta between close_num and the array members:
static int nearest(int close_num, int[] a)
{
    // initialize with the largest possible delta
    int min_delta = int.MaxValue;
    int result = -1;
    foreach (int bob in a)
    {
        int delta = Math.Abs(bob - close_num);
        if (delta <= min_delta)
        {
            min_delta = delta; // store the absolute delta, not the signed one
            result = bob;
        }
    }
    return result;
}

NAudio FFT returns small and equal magnitude values for all frequencies

I'm working on a project with NAudio 1.9 and I want to compute an FFT for an entire song, i.e. split the song into chunks of equal size and compute the FFT for each chunk. The problem is that the NAudio FFT function returns really small and equal values for every frequency in the spectrum.
I searched for previous related posts but none seemed to help me.
The code that computes FFT using NAudio:
public IList<FrequencySpectrum> Fft(uint windowSize) {
    IList<Complex[]> timeDomainChunks = this.SplitInChunks(this.audioContent, windowSize);
    return timeDomainChunks.Select(this.ToFrequencySpectrum).ToList();
}

private IList<Complex[]> SplitInChunks(float[] audioContent, uint chunkSize) {
    IList<Complex[]> splittedContent = new List<Complex[]>();
    for (uint k = 0; k < audioContent.Length; k += chunkSize) {
        long size = k + chunkSize < audioContent.Length ? chunkSize : audioContent.Length - k;
        Complex[] chunk = new Complex[size];
        for (int i = 0; i < chunk.Length; i++) {
            //i've tried windowing here but didn't seem to help me
            chunk[i].X = audioContent[k + i];
            chunk[i].Y = 0;
        }
        splittedContent.Add(chunk);
    }
    return splittedContent;
}

private FrequencySpectrum ToFrequencySpectrum(Complex[] timeDomain) {
    int m = (int) Math.Log(timeDomain.Length, 2);
    //true = forward fft
    FastFourierTransform.FFT(true, m, timeDomain);
    return new FrequencySpectrum(timeDomain, 44100);
}
The FrequencySpectrum:
public struct FrequencySpectrum {
    private readonly Complex[] frequencyDomain;
    private readonly uint samplingFrequency;

    public FrequencySpectrum(Complex[] frequencyDomain, uint samplingFrequency) {
        if (frequencyDomain.Length == 0) {
            throw new ArgumentException("Argument value must be greater than 0", nameof(frequencyDomain));
        }
        if (samplingFrequency == 0) {
            throw new ArgumentException("Argument value must be greater than 0", nameof(samplingFrequency));
        }
        this.frequencyDomain = frequencyDomain;
        this.samplingFrequency = samplingFrequency;
    }

    //returns magnitude for freq
    public float this[uint freq] {
        get {
            if (freq >= this.samplingFrequency) {
                throw new IndexOutOfRangeException();
            }
            //find corresponding bin
            float k = freq / ((float) this.samplingFrequency / this.FftWindowSize);
            Complex c = this.frequencyDomain[checked((uint) k)];
            return (float) Math.Sqrt(c.X * c.X + c.Y * c.Y);
        }
    }
}
for a file that contains a sine wave of 440Hz
expected output: values like 0.5 for freq=440 and 0 for the others
actual output: values like 0.000168153987f for any freq in the spectrum
It seems that I made 4 mistakes:
1) Here I'm assuming that the sampling frequency is 44100. This was not the reason my code wasn't working, though:
return new FrequencySpectrum(timeDomain, 44100);
2) Always make a visual representation of your output data! I must learn this lesson... It seems that for a file containing a 440Hz sine wave I'm getting the right result but...
3) The frequency spectrum is a little shifted from what I was expecting because of this:
int m = (int) Math.Log(timeDomain.Length, 2);
FastFourierTransform.FFT(true, m, timeDomain);
timeDomain is an array of size 44100 because that's the value of windowSize (I called the method with windowSize = 44100), but the FFT method expects a window size that is a power of 2. I'm effectively saying: "Here, NAudio, compute me the FFT of this array that has 44100 elements, but take into account only the first 32768". I didn't realize that this was going to have serious implications on the result:
float k = freq / ((float) this.samplingFrequency / this.FftWindowSize);
Here this.FftWindowSize is a property based on the size of the array, not on m. So, after visualizing the result I found out that the magnitude of the 440Hz frequency actually corresponded to the call:
spectrum[371]
instead of
spectrum[440]
So, my mistake was that the window size of fft (m) was not corresponding to the actual length of the array (FrequencySpectrum.FftWindowSize).
4) The small values that I was receiving for the magnitudes came from the fact that the audio file on which I was testing my code wasn't recorded with enough gain.
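For mistake 3), one way to keep m and the data length consistent is to truncate each chunk to the largest power of two it contains before calling NAudio's FFT. A minimal sketch (my own illustration, with a hypothetical helper name, not code from the original post):

using System;
using NAudio.Dsp;

static Complex[] FftPowerOfTwo(float[] samples)
{
    // largest power of two that fits within the chunk
    int m = (int)Math.Log(samples.Length, 2);
    int size = 1 << m;
    var buffer = new Complex[size];
    for (int i = 0; i < size; i++)
    {
        buffer[i].X = samples[i]; // real part
        buffer[i].Y = 0;          // imaginary part
    }
    FastFourierTransform.FFT(true, m, buffer);
    // bin k now corresponds to k * sampleRate / size Hz
    return buffer;
}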

Split a list into n different sublists matching condition while remaining as close as possible

My question has a lot in common with this one:
Split a list of numbers into n chunks such that the chunks have (close to) equal sums and keep the original order
The main difference is that I have a slightly different metric to figure out which split is "best", and I have an arbitrary condition to respect while doing so.
Every item in my list has two components. Weight and Volume. I have to split them into n different subgroups, while having the total weights of every subgroup as close as possible. The way to test that is simply to get the difference between the heaviest and the lightest subgroup. The smaller this difference is, the better. This means that subgroups [15][15][15][10] are worth the same in final score as subgroups [15][13][11][10].
Then there is the part I can't figure out how to add to the algorithms proposed as answers to the linked question: I have a hard condition that has to be respected. There is a maximum volume [v] for each subgroup, and none of them can go above it. Going above does not reduce the score; it invalidates the entire answer.
How could the algorithms (and code snippets) used as answers to the previous question be adapted to take into account the volume condition and the slightly different scoring method?
I am looking for code, pseudo-code or a written (detailed) explanation of how this could be done. The question is tagged C# because that's what I'm using, but I am confident that I can translate from any non-esoteric language, so feel free to go with whatever you like if you answer with code.
As mentioned in the other question, this problem is very complex and finding the best solution might not be feasible in reasonable computing time, therefore I am looking for an answer that gives a "good enough" solution, even if it might not be the best.
I've formulated a deterministic solution for the given problem using dynamic programming; I'm sharing the code for the same at https://ideone.com/pkfyxg
#include <iostream>
#include <vector>
#include <climits>
#include <cstring>
#include <algorithm>
using namespace std;

// Basic structure for the given problem
struct Item {
    float weight;
    float volume;
    Item(float weight, float volume) {
        this->weight = weight;
        this->volume = volume;
    }
    bool operator<(const Item &other) const {
        if (weight == other.weight) {
            return volume < other.volume;
        }
        return weight < other.weight;
    }
};

// Some constant values
const static int INF = INT_MAX / 100;
const static int MAX_NUM_OF_ITEMS = 1000;
const static int MAX_N = 1000;

// Parameters that we define in main()
float MAX_VOLUME;
vector<Item> items;

// DP lookup tables
int till[MAX_NUM_OF_ITEMS];
float dp[MAX_NUM_OF_ITEMS][MAX_N];

/**
 * curIndex: the starting index from where we aim to formulate a new group
 * left: number of groups still left to be formed
 */
float solve(int curIndex, int left) {
    // Base cases
    if (curIndex >= items.size() && left == 0) {
        return 0;
    }
    if (curIndex >= items.size() && left != 0) {
        return INF;
    }
    // If we have no more groups to be formed, but there still are items left,
    // then invalidate the solution by returning INF
    if (left <= 0 && curIndex < items.size()) {
        return INF;
    }
    // Our lookup dp table
    if (dp[curIndex][left] >= 0) {
        return dp[curIndex][left];
    }
    // minVal is the metric to optimize, which is the `sum of the differences
    // for each group`; we initialize it as INF
    float minVal = INF;
    // The volume of the items we're going to pick for this group
    float curVolume = 0;
    // Let's try to see how large this group can be by trying to expand it
    // one item at a time
    for (int i = curIndex; i < items.size(); i++) {
        // Verify whether we can put item i in this group or not
        if (curVolume + items[i].volume > MAX_VOLUME) {
            break;
        }
        curVolume += items[i].volume;
        // Okay, let's see if it's possible for this group to exist
        float val = (items[i].weight - items[curIndex].weight) + solve(i + 1, left - 1);
        if (minVal >= val) {
            minVal = val;
            // The lookup table till tells us that the group starting at index
            // curIndex expands till i, i.e. [curIndex, i] is our valid group
            till[curIndex] = i + 1;
        }
    }
    // Store the result in dp for memoization and return the value
    return dp[curIndex][left] = minVal;
}

int main() {
    // The maximum value for Volume
    MAX_VOLUME = 6;
    // The number of groups we need
    int NUM_OF_GROUPS = 5;
    items = vector<Item>({
        // Item(weight, volume)
        Item(5, 2),
        Item(2, 1),
        Item(10, 3),
        Item(7, 2),
        Item(3, 1),
        Item(5, 3),
        Item(4, 3),
        Item(3, 2),
        Item(10, 1),
        Item(11, 3),
        Item(19, 1),
        Item(21, 2)
    });
    // Initialize the dp with -1 as the default value for unexplored states
    memset(dp, -1, sizeof dp);
    // Sort the items based on weights first
    sort(items.begin(), items.end());
    // Solve the given problem
    int val = solve(0, NUM_OF_GROUPS);
    // If the return value is INF, it means we couldn't distribute the items
    // into n groups due to the constraint on volume, or maybe the number of
    // groups was greater than the number of items we had ^_^
    if (val >= INF) {
        cout << "Not possible to distribute in " << NUM_OF_GROUPS;
        return 0;
    }
    // If a solution exists, use the lookup till array to find which items
    // belong to which set
    int curIndex = 0, group = 1;
    while (curIndex < items.size()) {
        cout << "Group #" << group << ": ";
        for (int i = curIndex; i < till[curIndex]; i++)
            cout << "(" << items[i].weight << ", " << items[i].volume << ") ";
        cout << '\n';
        group++;
        curIndex = till[curIndex];
    }
}
I've added comments to the code to help you understand how it works. The overall runtime complexity is O(num_of_groups * num_of_items^2). Let me know if you need more explanation around the same ^^

C# Simple Constrained Weighted Average Algorithm with/without Solver

I'm at a loss as to why I can't get this seemingly simple problem solved using Microsoft Solver Foundation.
All I need is to modify the weights (numbers) of certain observations to ensure that no single observation's weight AS A PERCENTAGE exceeds 25%. This is for the purposes of later calculating a constrained weighted average with the results of this algorithm.
For example, given the 5 weights of { 45, 100, 33, 500, 28 }, I would expect the result of this algorithm to be { 45, 53, 33, 53, 28 }, where 2 of the numbers had to be reduced such that they're within the 25% threshold of the new total (212 = 45+53+33+53+28) while the others remained untouched. Note that even though initially, the 2nd weight of 100 was only 14% of the total (706), as a result of decreasing the 4th weight of 500, it subsequently pushed up the % of the other observations and therein lies the only challenge with this.
I tried to recreate this using Solver, only for it to tell me that the solution is "Infeasible" and it just returns all 1s. Update: the solution need not use Solver; any alternative is welcome so long as it is fast when dealing with a decent number of weights.
var solver = SolverContext.GetContext();
var model = solver.CreateModel();

var decisionList = new List<Decision>();
decisionList.Add(new Decision(Domain.IntegerRange(1, 45), "Dec1"));
decisionList.Add(new Decision(Domain.IntegerRange(1, 100), "Dec2"));
decisionList.Add(new Decision(Domain.IntegerRange(1, 33), "Dec3"));
decisionList.Add(new Decision(Domain.IntegerRange(1, 500), "Dec4"));
decisionList.Add(new Decision(Domain.IntegerRange(1, 28), "Dec5"));
model.AddDecisions(decisionList.ToArray());

int weightLimit = 25;
foreach (var decision in model.Decisions)
{
    model.AddConstraint(decision.Name + "weightLimit", 100 * (decision / Model.Sum(model.Decisions.ToArray())) <= weightLimit);
}
model.AddGoal("calcGoal", GoalKind.Maximize, Model.Sum(model.Decisions.ToArray()));

var solution = solver.Solve();
foreach (var decision in model.Decisions)
{
    Debug.Print(decision.GetDouble().ToString());
}
Debug.Print("Solution Quality: " + solution.Quality.ToString());
Any help with this would be very much appreciated, thanks in advance.
I ditched Solver because it didn't live up to its name IMO (or I didn't live up to its standards :)). Below is where I landed. Because this function gets used many times and on large lists of input weights, efficiency and performance are key, so this function attempts to do the fewest iterations possible (let me know if anyone has suggested improvements, though). The results get used for a weighted average, so I use AttributeWeightPair to store the value (attribute) and its weight, and the function below is what modifies the weights to be within the constraint when given a list of these AWPs. The function assumes that weightLimit is passed in as a %, e.g. 25% gets passed in as 25, not 0.25. OK, I'll stop stating what will be obvious from the code; here it is:
public static List<AttributeWeightPair<decimal>> WeightLimiter(List<AttributeWeightPair<decimal>> source, decimal weightLimit)
{
    weightLimit /= 100; //convert to percentage

    var zeroWeights = source.Where(w => w.Weight == 0).ToList();
    var nonZeroWeights = source.Where(w => w.Weight > 0).ToList();
    if (nonZeroWeights.Count == 0)
        return source;

    //return equal weights if given infeasible constraint
    if ((1m / nonZeroWeights.Count()) > weightLimit)
    {
        nonZeroWeights.ForEach(w => w.Weight = 1);
        return nonZeroWeights.Concat(zeroWeights).ToList();
    }

    //return original list if weight-limiting is unnecessary
    if ((nonZeroWeights.Max(w => w.Weight) / nonZeroWeights.Sum(w => w.Weight)) <= weightLimit)
    {
        return source;
    }

    //sort (ascending) and store original weights
    nonZeroWeights = nonZeroWeights.OrderBy(w => w.Weight).ToList();
    var originalWeights = nonZeroWeights.Select(w => w.Weight).ToList();

    //set starting point and determine direction from there
    var initialSumWeights = nonZeroWeights.Sum(w => w.Weight);
    var initialLimit = weightLimit * initialSumWeights;
    var initialSuspects = nonZeroWeights.Where(w => w.Weight > initialLimit).ToList();
    var initialTarget = weightLimit * (initialSumWeights - (initialSuspects.Sum(w => w.Weight) - initialLimit * initialSuspects.Count()));
    var antepenultimateIndex = Math.Max(nonZeroWeights.FindLastIndex(w => w.Weight <= initialTarget), 1); //needs to be at least 1
    for (int i = antepenultimateIndex; i < nonZeroWeights.Count(); i++)
    {
        nonZeroWeights[i].Weight = originalWeights[antepenultimateIndex - 1]; //set cap equal to the preceding weight
    }
    bool goingUp = (nonZeroWeights[antepenultimateIndex].Weight / nonZeroWeights.Sum(w => w.Weight)) > weightLimit ? false : true;

    //Procedure 1 - find the weight # at which a cap would result in a weight % just UNDER the weight limit
    int penultimateIndex = antepenultimateIndex;
    bool justUnderTarget = false;
    while (!justUnderTarget)
    {
        for (int i = penultimateIndex; i < nonZeroWeights.Count(); i++)
        {
            nonZeroWeights[i].Weight = originalWeights[penultimateIndex - 1]; //set cap equal to the preceding weight
        }
        var currentMaxPcntWeight = nonZeroWeights[penultimateIndex].Weight / nonZeroWeights.Sum(w => w.Weight);
        if (currentMaxPcntWeight == weightLimit)
        {
            return nonZeroWeights.Concat(zeroWeights).ToList();
        }
        else if (goingUp && currentMaxPcntWeight < weightLimit)
        {
            nonZeroWeights[penultimateIndex].Weight = originalWeights[penultimateIndex]; //reset
            if (penultimateIndex < nonZeroWeights.Count() - 1)
                penultimateIndex++; //move up
            else break;
        }
        else if (!goingUp && currentMaxPcntWeight > weightLimit)
        {
            if (penultimateIndex > 1)
                penultimateIndex--; //move down
            else break;
        }
        else
        {
            justUnderTarget = true;
        }
    }
    if (goingUp) //then need to back up a step
    {
        penultimateIndex = (penultimateIndex > 1 ? penultimateIndex - 1 : 1);
        for (int i = penultimateIndex; i < nonZeroWeights.Count(); i++)
        {
            nonZeroWeights[i].Weight = originalWeights[penultimateIndex - 1];
        }
    }

    //Procedure 2 - increment the modified weights (subject to a cap equal to their original values) until the weight limit is hit (allowing a very slight overage for the last term in some cases)
    int ultimateIndex = penultimateIndex;
    var sumWeights = nonZeroWeights.Sum(w => w.Weight); //use this counter instead of summing every time for condition check within loop
    bool justOverTarget = false;
    while (!justOverTarget)
    {
        for (int i = ultimateIndex; i < nonZeroWeights.Count(); i++)
        {
            if (nonZeroWeights[i].Weight + 1 > originalWeights[i])
            {
                if (ultimateIndex < nonZeroWeights.Count() - 1)
                    ultimateIndex++;
                else justOverTarget = true;
            }
            else
            {
                nonZeroWeights[i].Weight++;
                sumWeights++;
            }
        }
        if ((nonZeroWeights.Last().Weight / sumWeights) >= weightLimit)
        {
            justOverTarget = true;
        }
    }
    return nonZeroWeights.Concat(zeroWeights).ToList();
}

public class AttributeWeightPair<T>
{
    public T Attribute { get; set; }
    public decimal? Weight { get; set; }

    public AttributeWeightPair(T attribute, decimal? count)
    {
        this.Attribute = attribute;
        this.Weight = count;
    }
}
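As a cross-check on the expected behavior, a much simpler fixed-point sketch (mine, not part of the answer above, and assuming integer weights) reproduces the question's { 45, 100, 33, 500, 28 } -> { 45, 53, 33, 53, 28 } example: repeatedly cap any weight above 25% of the current total until the cap holds.

static int[] CapWeights(int[] weights, double limit = 0.25)
{
    var w = (int[])weights.Clone();
    while (true)
    {
        int total = w.Sum();            // requires System.Linq
        int cap = (int)(limit * total); // integer cap for this pass
        bool changed = false;
        for (int i = 0; i < w.Length; i++)
        {
            if (w[i] > cap) { w[i] = cap; changed = true; }
        }
        if (!changed) return w; // no weight exceeds the cap anymore
    }
}

// CapWeights(new[] { 45, 100, 33, 500, 28 }) returns { 45, 53, 33, 53, 28 }

Every pass that changes anything strictly lowers the (positive integer) total, so the loop terminates; it may be slower than the answer above for very large lists, but it is trivially easy to verify.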

selection based on percentage weighting

I have a set of values, and an associated percentage for each:
a: 70% chance
b: 20% chance
c: 10% chance
I want to select a value (a, b, c) based on the percentage chance given.
how do I approach this?
My attempt so far looks like this:
r = random.random()
if r <= .7:
    return a
elif r <= .9:
    return b
else:
    return c
I'm stuck coming up with an algorithm to handle this. How should I approach this so it can handle larger sets of values without just chaining together if-else flows?
(any explanation or answers in pseudo-code are fine. a python or C# implementation would be especially helpful)
Here is a complete solution in C#:
public class ProportionValue<T>
{
    public double Proportion { get; set; }
    public T Value { get; set; }
}

public static class ProportionValue
{
    public static ProportionValue<T> Create<T>(double proportion, T value)
    {
        return new ProportionValue<T> { Proportion = proportion, Value = value };
    }

    static Random random = new Random();

    public static T ChooseByRandom<T>(
        this IEnumerable<ProportionValue<T>> collection)
    {
        var rnd = random.NextDouble();
        foreach (var item in collection)
        {
            if (rnd < item.Proportion)
                return item.Value;
            rnd -= item.Proportion;
        }
        throw new InvalidOperationException(
            "The proportions in the collection do not add up to 1.");
    }
}
Usage:
var list = new[] {
    ProportionValue.Create(0.7, "a"),
    ProportionValue.Create(0.2, "b"),
    ProportionValue.Create(0.1, "c")
};
// Outputs "a" with probability 0.7, etc.
Console.WriteLine(list.ChooseByRandom());
For Python:
>>> import random
>>> dst = 70, 20, 10
>>> vls = 'a', 'b', 'c'
>>> picks = [v for v, d in zip(vls, dst) for _ in range(d)]
>>> for _ in range(12): print random.choice(picks),
...
a c c b a a a a a a a a
>>> for _ in range(12): print random.choice(picks),
...
a c a c a b b b a a a a
>>> for _ in range(12): print random.choice(picks),
...
a a a a c c a c a a c a
>>>
General idea: make a list where each item is repeated a number of times proportional to the probability it should have; use random.choice to pick one at random (uniformly), this will match your required probability distribution. Can be a bit wasteful of memory if your probabilities are expressed in peculiar ways (e.g., 70, 20, 10 makes a 100-items list where 7, 2, 1 would make a list of just 10 items with exactly the same behavior), but you could divide all the counts in the probabilities list by their greatest common factor if you think that's likely to be a big deal in your specific application scenario.
Apart from memory consumption issues, this should be the fastest solution -- just one random number generation per required output result, and the fastest possible lookup from that random number, no comparisons &c. If your likely probabilities are very weird (e.g., floating point numbers that need to be matched to many, many significant digits), other approaches may be preferable;-).
Knuth references Walker's method of aliases. Searching on this, I find http://code.activestate.com/recipes/576564-walkers-alias-method-for-random-objects-with-diffe/ and http://prxq.wordpress.com/2006/04/17/the-alias-method/. This gives the exact probabilities required in constant time per number generated with linear time for setup (curiously, n log n time for setup if you use exactly the method Knuth describes, which does a preparatory sort you can avoid).
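Since those links describe the method without inline code, here is a hedged C# sketch of Walker's alias method, my own illustration rather than the recipes' code: O(n) setup, O(1) per draw, for a probability array that sums to 1.

using System;
using System.Collections.Generic;

class AliasSampler
{
    private readonly double[] prob; // acceptance threshold per column
    private readonly int[] alias;   // fallback sample per column
    private readonly Random rng = new Random();

    public AliasSampler(double[] p)
    {
        int n = p.Length;
        prob = new double[n];
        alias = new int[n];
        var scaled = new double[n];
        var small = new Stack<int>();
        var large = new Stack<int>();
        for (int i = 0; i < n; i++)
        {
            scaled[i] = p[i] * n;
            if (scaled[i] < 1.0) small.Push(i); else large.Push(i);
        }
        while (small.Count > 0 && large.Count > 0)
        {
            int s = small.Pop(), l = large.Pop();
            prob[s] = scaled[s];
            alias[s] = l;
            scaled[l] += scaled[s] - 1.0; // the large column donates its excess
            if (scaled[l] < 1.0) small.Push(l); else large.Push(l);
        }
        // whatever remains is 1.0 up to rounding error
        while (large.Count > 0) prob[large.Pop()] = 1.0;
        while (small.Count > 0) prob[small.Pop()] = 1.0;
    }

    public int Next()
    {
        int column = rng.Next(prob.Length);
        return rng.NextDouble() < prob[column] ? column : alias[column];
    }
}

// var sampler = new AliasSampler(new[] { 0.7, 0.2, 0.1 });
// sampler.Next(); // 0 with probability 0.7, 1 with 0.2, 2 with 0.1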
Take the list of weights and find the cumulative totals: 70, 70+20, 70+20+10. Pick a random number greater than or equal to zero and less than the total. Iterate over the items and return the first value for which the cumulative sum of the weights is greater than this random number:
def select(values):
    variate = random.random() * sum(values.values())
    cumulative = 0.0
    for item, weight in values.items():
        cumulative += weight
        if variate < cumulative:
            return item
    return item  # Shouldn't get here, but just in case of rounding...

print select({"a": 70, "b": 20, "c": 10})
This solution, as implemented, should also be able to handle fractional weights and weights that add up to any number so long as they're all non-negative.
Let T = the sum of all item weights
Let R = a random number between 0 and T
Iterate the item list subtracting each item weight from R and return the item that causes the result to become <= 0.
def weighted_choice(probabilities):
    random_position = random.random() * sum(probabilities)
    current_position = 0.0
    for i, p in enumerate(probabilities):
        current_position += p
        if random_position < current_position:
            return i
    return None
Because random.random will always return < 1.0, the final return should never be reached.
import random

def selector(weights):
    i = random.random() * sum(x for x, y in weights)
    for w, v in weights:
        if w >= i:
            break
        i -= w
    return v

weights = ((70,'a'),(20,'b'),(10,'c'))
print [selector(weights) for x in range(10)]
It works equally well for fractional weights:
weights = ((0.7,'a'),(0.2,'b'),(0.1,'c'))
print [selector(weights) for x in range(10)]
If you have a lot of weights, you can use bisect to reduce the number of iterations required:
import random
import bisect

def make_acc_weights(weights):
    acc = 0
    acc_weights = []
    for w, v in weights:
        acc += w
        acc_weights.append((acc, v))
    return acc_weights

def selector(acc_weights):
    # the last accumulated value is the total of all the weights
    i = random.random() * acc_weights[-1][0]
    return acc_weights[bisect.bisect(acc_weights, (i,))][1]

weights = ((70,'a'),(20,'b'),(10,'c'))
acc_weights = make_acc_weights(weights)
print [selector(acc_weights) for x in range(100)]
It also works fine for fractional weights:
weights = ((0.7,'a'),(0.2,'b'),(0.1,'c'))
acc_weights = make_acc_weights(weights)
print [selector(acc_weights) for x in range(100)]
Today's update of the Python documentation gives an example of making a random.choice() with weighted probabilities:
If the weights are small integer ratios, a simple technique is to build a sample population with repeats:
>>> weighted_choices = [('Red', 3), ('Blue', 2), ('Yellow', 1), ('Green', 4)]
>>> population = [val for val, cnt in weighted_choices for i in range(cnt)]
>>> random.choice(population)
'Green'
A more general approach is to arrange the weights in a cumulative distribution with itertools.accumulate(), and then locate the random value with bisect.bisect():
>>> choices, weights = zip(*weighted_choices)
>>> cumdist = list(itertools.accumulate(weights))
>>> x = random.random() * cumdist[-1]
>>> choices[bisect.bisect(cumdist, x)]
'Blue'
One note: itertools.accumulate() needs Python 3.2, or you can define it yourself using the documented equivalent.
I think you can have an array of small objects. I implemented it in Java (although I know a little bit of C#, I am afraid I would write wrong code), so you may need to port it yourself. The code in C# will be much smaller with struct and var, but I hope you get the idea:
class PercentString {
    double percent;
    String value;

    PercentString(double percent, String value) {
        this.percent = percent;
        this.value = value;
    }
}

List<PercentString> list = new ArrayList<PercentString>();
list.add(new PercentString(70, "a"));
list.add(new PercentString(20, "b"));
list.add(new PercentString(10, "c"));

double random = Math.random() * 100; // random point within the total percentage (the original snippet left this undefined)
double percent = 0;
for (int i = 0; i < list.size(); i++) {
    PercentString p = list.get(i);
    percent += p.percent;
    if (random < percent) {
        return p.value;
    }
}
If you are really up to speed and want to generate the random values quickly, the Walker's algorithm mcdowella mentioned in https://stackoverflow.com/a/3655773/1212517 is pretty much the best way to go (O(1) time for random(), and O(N) time for preprocess()).
For anyone who is interested, here is my own PHP implementation of the algorithm:
/**
 * Pre-process the samples (Walker's alias method).
 * @param array $weights key represents the sample, value is the weight
 */
protected function preprocess($weights){
    $N = count($weights);
    $sum = array_sum($weights);
    $avg = $sum / (double)$N;

    //divide the array of weights into values smaller and geq than sum/N
    $smaller = array_filter($weights, function($itm) use ($avg){ return $avg > $itm; });
    $sN = count($smaller);
    $greater_eq = array_filter($weights, function($itm) use ($avg){ return $avg <= $itm; });
    $gN = count($greater_eq);

    $bin = array(); //bins

    //we want to fill N bins
    for($i = 0; $i < $N; $i++){
        //At first, decide on a first value for this bin
        //if there are small intervals left, we choose one
        if($sN > 0){
            $choice1 = each($smaller);
            unset($smaller[$choice1['key']]);
            $sN--;
        } else { //otherwise, we split a large interval
            $choice1 = each($greater_eq);
            unset($greater_eq[$choice1['key']]);
        }
        //splitting happens here - the unused part of the interval is thrown back into the array
        if($choice1['value'] >= $avg){
            if($choice1['value'] - $avg >= $avg){
                $greater_eq[$choice1['key']] = $choice1['value'] - $avg;
            }else if($choice1['value'] - $avg > 0){
                $smaller[$choice1['key']] = $choice1['value'] - $avg;
                $sN++;
            }
            //this bin comprises only one value
            $bin[] = array(1=>$choice1['key'], 2=>null, 'p1'=>1, 'p2'=>0);
        }else{
            //make the second choice for the current bin
            $choice2 = each($greater_eq);
            unset($greater_eq[$choice2['key']]);
            //splitting on the second interval
            if($choice2['value'] - $avg + $choice1['value'] >= $avg){
                $greater_eq[$choice2['key']] = $choice2['value'] - $avg + $choice1['value'];
            }else{
                $smaller[$choice2['key']] = $choice2['value'] - $avg + $choice1['value'];
                $sN++;
            }
            //this bin comprises two values
            $choice2['value'] = $avg - $choice1['value'];
            $bin[] = array(1=>$choice1['key'], 2=>$choice2['key'],
                'p1'=>$choice1['value'] / $avg,
                'p2'=>$choice2['value'] / $avg);
        }
    }
    $this->bins = $bin;
}

/**
 * Choose a random sample according to the weights.
 */
public function random(){
    $bin = $this->bins[array_rand($this->bins)];
    return (lcg_value() < $bin['p1']) ? $bin[1] : $bin[2];
}
Here is my version that can apply to any IList and normalizes the weights. It is based on Timwi's solution: selection based on percentage weighting.
/// <summary>
/// Returns a random element of the list, or default if the list is empty.
/// </summary>
/// <param name="e"></param>
/// <param name="weightSelector">
/// Returns the chances of the element being picked. A weight of 0 or less means 0 chance to be picked.
/// If all elements have a weight of 0 or less, they all have equal chances to be picked.
/// </param>
/// <returns></returns>
public static T AnyOrDefault<T>(this IList<T> e, Func<T, double> weightSelector)
{
    if (e.Count < 1)
        return default(T);
    if (e.Count == 1)
        return e[0];
    var weights = e.Select(o => Math.Max(weightSelector(o), 0)).ToArray();
    var sum = weights.Sum(d => d);
    var rnd = new Random().NextDouble();
    for (int i = 0; i < weights.Length; i++)
    {
        //Normalize weight
        var w = sum == 0
            ? 1 / (double)e.Count
            : weights[i] / sum;
        if (rnd < w)
            return e[i];
        rnd -= w;
    }
    throw new Exception("Should not happen");
}
I have my own solution for this:
public class Randomizator3000
{
    public class Item<T>
    {
        public T value;
        public float weight;

        public static float GetTotalWeight<T>(Item<T>[] p_itens)
        {
            float __toReturn = 0;
            foreach (var item in p_itens)
            {
                __toReturn += item.weight;
            }
            return __toReturn;
        }
    }

    private static System.Random _randHolder;
    private static System.Random _random
    {
        get
        {
            if (_randHolder == null)
                _randHolder = new System.Random();
            return _randHolder;
        }
    }

    public static T PickOne<T>(Item<T>[] p_itens)
    {
        if (p_itens == null || p_itens.Length == 0)
        {
            return default(T);
        }

        float __randomizedValue = (float)_random.NextDouble() * (Item<T>.GetTotalWeight(p_itens));
        float __adding = 0;
        for (int i = 0; i < p_itens.Length; i++)
        {
            float __cacheValue = p_itens[i].weight + __adding;
            if (__randomizedValue <= __cacheValue)
            {
                return p_itens[i].value;
            }
            __adding = __cacheValue;
        }

        return p_itens[p_itens.Length - 1].value;
    }
}
And using it should be something like this (that's in Unity3D):
using UnityEngine;
using System.Collections;

public class teste : MonoBehaviour
{
    Randomizator3000.Item<string>[] lista;

    void Start()
    {
        lista = new Randomizator3000.Item<string>[10];
        lista[0] = new Randomizator3000.Item<string>();
        lista[0].weight = 10;
        lista[0].value = "a";
        lista[1] = new Randomizator3000.Item<string>();
        lista[1].weight = 10;
        lista[1].value = "b";
        lista[2] = new Randomizator3000.Item<string>();
        lista[2].weight = 10;
        lista[2].value = "c";
        lista[3] = new Randomizator3000.Item<string>();
        lista[3].weight = 10;
        lista[3].value = "d";
        lista[4] = new Randomizator3000.Item<string>();
        lista[4].weight = 10;
        lista[4].value = "e";
        lista[5] = new Randomizator3000.Item<string>();
        lista[5].weight = 10;
        lista[5].value = "f";
        lista[6] = new Randomizator3000.Item<string>();
        lista[6].weight = 10;
        lista[6].value = "g";
        lista[7] = new Randomizator3000.Item<string>();
        lista[7].weight = 10;
        lista[7].value = "h";
        lista[8] = new Randomizator3000.Item<string>();
        lista[8].weight = 10;
        lista[8].value = "i";
        lista[9] = new Randomizator3000.Item<string>();
        lista[9].weight = 10;
        lista[9].value = "j";
    }

    void Update()
    {
        Debug.Log(Randomizator3000.PickOne<string>(lista));
    }
}
In this example each value has a 10% chance of being displayed as a debug message =3
Based loosely on Python's numpy.random.choice(a=items, p=probs), which takes an array and a probability array of the same size.
public T RandomChoice<T>(IEnumerable<T> a, IEnumerable<double> p)
{
    IEnumerator<T> ae = a.GetEnumerator();
    Random random = new Random();
    double target = random.NextDouble();

    double accumulator = 0;
    foreach (var prob in p)
    {
        ae.MoveNext();
        accumulator += prob;
        if (accumulator > target)
        {
            break;
        }
    }
    return ae.Current;
}
The probability array p must sum to (approx.) 1. This is to keep it consistent with the numpy interface (and mathematics), but you could easily change that if you wanted.
