Generating random int with specific condition C#

Consider some special condition, where we want to generate location data with some random speed.
public class Location
{
public double Lat { get; set; }
public double Lng { get; set; }
public int Speed { get; set; }
public DateTime Date { get; set; }
}
The speed can be randomly generated using the Random.Next() method.
Now consider that speed is limited to the range 1 to 200, and we want most of the Random.Next(1, 200) results to fall in the range 1 to 120 (for example, if we have 160 locations, about 60% to 80% of the location speeds should be around 1 to 120, and the rest should be roughly in the range 120 to 200).
I know some bad and ugly ways where you could divide your locations randomly into two lists and then generate speeds separately for each list, but I'm looking for a better and more efficient way.
Thanks!
Edit:
I should mention that there is a property called
Date, of type DateTime, which defines the time of a location's occurrence.
The list of locations being generated will form a path, so the generated speeds should be consistent with the locations and times in order to look right (for example, two consecutive locations can't have unrelated speeds, like 80 km/h at the first location and 140 km/h at the second within a short time span of 30 seconds). The speed, date/time, and location together should look like a plausible, normal path.

Can't you just use two random numbers? The first to determine which range, and the second to choose a number in the appropriate range?
Random rng = new Random(); // This should only be created once, somewhere.
double proportionInLowerRange = 0.7;
int speed;
if (rng.NextDouble() <= proportionInLowerRange)
    speed = rng.Next(1, 121);
else
    speed = rng.Next(120, 201);
Note: The probabilities are linear across both ranges, so if you wanted a normal distribution this wouldn't work.
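For the path requirement in the edit (consecutive speeds staying related over short time spans), here is a different sketch, not part of the answer above: generate each speed as a small random step from the previous one, with an extra downward pull above 120 so most samples stay in the lower band. The step sizes, starting range, and 30-second spacing are all assumptions to tune:
Random rng = new Random();
var locations = new List<Location>();
DateTime time = DateTime.UtcNow;
int speed = rng.Next(60, 121); // hypothetical starting speed
for (int i = 0; i < 160; i++)
{
    // Small step per 30-second tick, biased downward above 120 km/h, so
    // neighbouring speeds stay plausible and most samples fall in 1 to 120.
    int step = speed > 120 ? rng.Next(-15, 6) : rng.Next(-10, 11);
    speed = Math.Clamp(speed + step, 1, 200);
    locations.Add(new Location { Speed = speed, Date = time }); // Lat/Lng omitted here
    time = time.AddSeconds(30);
}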

Related

Optimization Algorithm in C#

I have an optimization issue that I'm not sure how to approach. I have a program that tries to find the combination of inputs that returns the highest predicted R-squared value. The problem is that I have 21 total inputs (List) and I need them in sets of 15 inputs. The formula for the total number of combinations is:
n! / r!(n - r)! = 21! / 15!(21 - 15)! = 54,264 possible combinations
So obviously running through each combination and calculating the predicted R-squared is not an ideal solution, so is there a better way/algorithm/method I can use to skip or narrow down the bad combinations, so that I'm only processing the fewest possible combinations? Here is my current pseudocode for this issue:
public BestCombo GetBestCombo(List<List<MultipleRegressionInfo>> combosList)
{
BestCombo bestCombo = new BestCombo();
foreach (var combo in combosList)
{
var predRsquared = CalculatePredictedRSquared(combo);
if (predRsquared > bestCombo.predRSquared)
{
bestCombo.predRSquared = predRsquared;
bestCombo.BestRSquaredCombo = combo;
}
}
return bestCombo;
}
public class BestCombo
{
public double predRSquared { get; set; }
public IEnumerable<MultipleRegressionInfo> BestRSquaredCombo { get; set; }
}
public class MultipleRegressionInfo
{
public List<double> input { get; set; }
public List<double> output { get; set; }
}
public double CalculatePredictedRSquared(List<MultipleRegressionInfo> combo)
{
    Matrix<double> matrix = BuildMatrix(combo.Select(i => i.input).ToArray());
    Vector<double> vector = BuildVector(combo.ElementAt(0).output);
    var coefficients = CalculateWithQR(matrix, vector);
    // input and output below are pseudocode placeholders for the combo's data
    var y = CalculateYIntercept(coefficients, input, output);
    var estimateList = CalculateEstimates(coefficients, y, input, output);
    return GetPredRsquared(estimateList, output);
}
54,264 is not enormous for a computer - it might be worth timing a few calls to compute R^2 and multiplying up to see just how long this would take.
There is a branch and bound algorithm for this sort of problem, which relies on the fact that R^2(A,B,C) >= R^2(A,B) - that the R^2 can only decrease when you drop a variable. Recursively search the space of all sets of variables of size at least 15. After computing the R^2 for a set of variables, make recursive calls with sets produced by dropping a single variable from the set, where any such drop must be to the right of any existing gap (so A.CDE produces A..DE, A.C.E, and A.CD. but not ..CDE, which will be produced by .BCDE). You can terminate the recursion when you get down to the desired size of set, or when you find an R^2 that is no better than the best answer so far.
If it happens that you often find R^2 values no better than the best answer so far, this will save time, but that is not guaranteed. You can attempt to improve the efficiency by choosing to investigate the sets with the highest R^2 first, hoping that you find a new best answer good enough to rule out their siblings by the time you come to them, and by using a procedure to calculate R^2 for A.CDE that reuses the calculations you have already done for ABCDE.
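A rough sketch of that recursion (not from the answer): Evaluate stands in for a function like CalculatePredictedRSquared above, variables holds indices into the 21 inputs, and the pruning relies on the monotonicity property just described:
double bestScore = double.NegativeInfinity;
List<int> bestSet = null;

void Search(List<int> variables, int minDrop, int targetSize)
{
    double score = Evaluate(variables);   // R^2 for this set of variables
    if (score <= bestScore) return;       // bound: dropping variables can only lower R^2
    if (variables.Count == targetSize)
    {
        bestScore = score;                // new best combination of the desired size
        bestSet = new List<int>(variables);
        return;
    }
    // Only drop at or to the right of the last gap, so each subset is visited once.
    for (int i = minDrop; i < variables.Count; i++)
    {
        var child = new List<int>(variables);
        child.RemoveAt(i);
        Search(child, i, targetSize);
    }
}

// e.g. Search(Enumerable.Range(0, 21).ToList(), 0, 15);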

Output of code using System.Random does not approach theoretical limit as iterations increase

I'm not good at stats, so I tried to solve a simple problem in C#. The problem: "A given team has a 65% chance to win a single game against another team. What is the probability that they will win a best-of-5 set?"
I wanted to look at the relationship between that probability and the number of games in the set. How does a Bo3 compare to a Bo5, and so on?
I did this by creating Set and Game objects and running iterations. The win decision is done with this code:
Won = rnd.Next(1, 100) <= winChance;
rnd is, as you might expect, a static System.Random object.
Here's my Set object code:
public class Set
{
public int NumberOfGames { get; private set; }
public List<Game> Games { get; private set; }
public Set(int numberOfGames, int winChancePct)
{
NumberOfGames = numberOfGames;
GamesNeededToWin = Convert.ToInt32(Math.Ceiling(NumberOfGames / 2m));
Games = Enumerable.Range(1, numberOfGames)
.Select(i => new Game(winChancePct))
.ToList();
}
public int GamesNeededToWin { get; private set; }
public bool WonSet => Games.Count(g => g.Won) >= GamesNeededToWin;
}
My issue is that the results I get aren't quite what they should be. Someone who sucks less at stats did the math for me, and it seems my code is always overestimating the chance of winning the set, and the number of iterations doesn't improve the accuracy.
The results I get (% set win by games per set) are below. The first column is the games per set, the next is the statistical win rate (which my results should be approaching), and the remaining columns are my results based on the number of iterations. As you can see, more iterations don't seem to be making the numbers more accurate.
Games Per Set | Expected Set Win Rate | 10K | 100K | 1M | 10M
1 | 65.0% | 66.0% | 65.6% | 65.7% | 65.7%
3 | 71.8% | 72.5% | 72.7% | 72.7% | 72.7%
5 | 76.5% | 78.6% | 77.4% | 77.5% | 77.5%
7 | 80.0% | 80.7% | 81.2% | 81.0% | 81.1%
9 | 82.8% | 84.1% | 83.9% | 83.9% | 83.9%
The entire project is posted on github here if you want to look.
Any insight into why this isn't producing accurate results would be greatly appreciated.
The answer of Darren Sisson is correct; your computation is off by approximately 1%, and so all your results are as well.
My recommendation is that you solve the problem by encapsulating your desired semantics into an object which you can then test independently:
interface IDistribution<T>
{
    T Sample();
}
static class Extensions
{
    public static IEnumerable<T> Samples<T>(this IDistribution<T> d)
    {
        while (true) yield return d.Sample();
    }
}
class Bernoulli : IDistribution<bool>
{
    // Note that we could also make it IDistribution<int> and return
    // 0 and 1 instead of false and true; that would be the more
    // "classic" approach to a Bernoulli distribution. Your choice.
    private double d;
    private Random random = new Random();
    private Bernoulli(double d) { this.d = d; }
    public static Bernoulli Make(double d) => new Bernoulli(d);
    public bool Sample() => random.NextDouble() < d;
}
And now you have a biased coin flipper which you can test independently. You can now write code like:
int flips = 1000;
int heads = Bernoulli
.Make(0.65)
.Samples()
.Take(flips)
.Where(x => x)
.Count();
to do 1000 coin flips with a 65% chance of heads.
Notice that what we are doing here is constructing a probability distribution monad and then using the tools of LINQ to express a conditional probability. This is a powerful technique; your application barely scratches the surface of what we can do with it.
Exercise: construct extension methods Where, Select and SelectMany which take not IEnumerable<T> but rather IDistribution<T>; can you express the semantics of the distribution in terms of the distribution type itself, rather than making a transformation from the distribution monad to the sequence monad? Can you do the same for zip joins?
Exercise: construct other implementations of IDistribution<T>. Can you construct, say, a Cauchy distribution of doubles? What about a normal distribution? What about a dice-rolling distribution on a fair die of n sides? Now, can you put this all together? What is the distribution which is: flip a coin; if heads, roll four dice and add them together, otherwise roll two dice and discard all the doubles, and multiply the results.
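One possible shape for the first exercise (an assumption on my part, not the answerer's implementation): Select maps each sample, and SelectMany samples the outer distribution and then samples the distribution produced from that value, which is exactly monadic bind:
static class DistributionExtensions
{
    public static IDistribution<R> Select<A, R>(
        this IDistribution<A> d, Func<A, R> projection) =>
        new Projected<A, R>(d, projection);

    public static IDistribution<R> SelectMany<A, R>(
        this IDistribution<A> d, Func<A, IDistribution<R>> f) =>
        new Bound<A, R>(d, f);
}

sealed class Projected<A, R> : IDistribution<R>
{
    private readonly IDistribution<A> underlying;
    private readonly Func<A, R> projection;
    public Projected(IDistribution<A> underlying, Func<A, R> projection)
    {
        this.underlying = underlying;
        this.projection = projection;
    }
    // Sampling the projected distribution samples the underlying one and maps it.
    public R Sample() => projection(underlying.Sample());
}

sealed class Bound<A, R> : IDistribution<R>
{
    private readonly IDistribution<A> prior;
    private readonly Func<A, IDistribution<R>> next;
    public Bound(IDistribution<A> prior, Func<A, IDistribution<R>> next)
    {
        this.prior = prior;
        this.next = next;
    }
    // Sample the prior, build the dependent distribution, sample that.
    public R Sample() => next(prior.Sample()).Sample();
}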
Quick look: Random.Next's upper bound is exclusive, so it would need to be set to 101.
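That is, assuming winChance is a whole-number percentage, the check would become something like:
Won = rnd.Next(1, 101) <= winChance;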

Setting random variables C# that equal add up to a total

I'm creating a game in which someone opens a chest and the chest gives them a random prize. The maximum I can give out is 85,000,000 across 10,000 chests, which is 8,500 on average; however, I want some chests to be below this value and some above, with a minimum loss of 2,500 and a maximum win of 250,000, while still reaching the total value of 85,000,000.
I'm really struggling to come up with an algorithm for this using my C# knowledge.
Here goes some OOP. You have a Player class, which stores some info: the amount of gold the player has, the chests left to open, and the total amount of gold in the chests they will find.
public class Player
{
    private int gold = 0;
    private int goldLeftInChests = 85000000;
    private int chestsToOpen = 10000;
    private Random random = new Random();

    public void OpenChest()
    {
        if (chestsToOpen == 0)
            return; // or whatever you want after 10000 chests.
        int goldInChest = CalculateGoldInNextChest();
        goldLeftInChests -= goldInChest;
        chestsToOpen--;
        gold += goldInChest;
    }

    private int CalculateGoldInNextChest()
    {
        if (chestsToOpen == 1)
            return goldLeftInChests;
        var average = goldLeftInChests / chestsToOpen;
        return random.Next(1, average + 1); // a value between 1 and the current average
    }
}
When the next chest is opened, the gold in the chest is calculated and the player data adjusted: we add some gold to the player and reduce the total amount of gold in chests and the number of chests left to open.
Calculating the gold in a chest is very simple. We take the average amount left and pick a number between 1 and that average. The first time, this value will always be below 8,500. But the next time the average will be a little bit bigger, so the player will have a chance to find more than 8,500. If the player is unlucky again, the average will grow; it will shrink if the player gets a lot of gold.
UPDATE: As @Hans pointed out, I didn't account for the min and max restrictions on gold in chests. There is also a problem in @Hans's solution: you have to move gold between the 10,000 chests a lot of times to get some chests close to the 250,000 value, and you have to fill and keep all 10,000 values. The next problem I thought about was the distribution of random numbers in .NET: values have equal probability across the whole interval we are using. So if we generate values from 2,500 to 250,000, the chance of getting a value around the 8,500 average is roughly 12,000 (8,500±6,000) vs 235,500 (250,000-12,000-2,500). That means generating default random numbers over the given range will give you a lot of big numbers in the beginning, and then you will get stuck near the lowest boundary (2,500). So you need random numbers with a different distribution: Gaussian variables. We still want 8,500 gold to have the highest probability and 250,000 the lowest.
And last part - calculation. We need to update only one method :)
private int CalculateGoldInNextChest()
{
    const int mean = 8500;
    // Range and NextGaussian are helpers: a simple (min, max) range with a
    // Contains check, and an extension producing normally distributed values.
    var goldPerChestRange = new Range(2500, 250000);
    var averageRange = new Range(mean - 2500, mean + 2500);
    if (chestsToOpen == 1)
        return goldLeftInChests;
    do
    {
        int goldInChest = (int)random.NextGaussian(mu: mean, sigma: 50000);
        int averageLeft = (goldLeftInChests - goldInChest) / (chestsToOpen - 1);
        // Accept only in-bounds values that keep the remaining average near the mean.
        if (goldPerChestRange.Contains(goldInChest) && averageRange.Contains(averageLeft))
            return goldInChest;
    }
    while (true);
}
Note: I used Range to make the code more readable. Running the tests several times produces nice top values of more than 200,000.
Pseudocode algorithm:
use an array of chests
the index of the array is the chest number; the length of the array is the number of chests
the value in the array is the amount in the chest at that index
the initial value is the total amount divided by the number of chests
now repeat a number of times (say: 10 times the number of chests):
get two random chests
work out the maximum amount you can transfer from chest 1 to chest 2, so that chest 1 doesn't go below the minimum and chest 2 doesn't go above the maximum
get a random value below that maximum and transfer it
Now try to implement this in C#; a rough sketch follows.
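One way that sketch might look (my code, not the answerer's; the bounds follow the question's figures, treating the minimum "lose" as -2,500):
Random rng = new Random();
int chestCount = 10000;
int minChest = -2500, maxChest = 250000;
long totalValue = 85000000;

int[] chests = new int[chestCount];
int initial = (int)(totalValue / chestCount); // 8,500 in every chest to start
for (int i = 0; i < chestCount; i++) chests[i] = initial;

for (int iter = 0; iter < 10 * chestCount; iter++)
{
    int a = rng.Next(chestCount);
    int b = rng.Next(chestCount);
    if (a == b) continue;
    // Most we can move from a to b without breaking either bound:
    int maxTransfer = Math.Min(chests[a] - minChest, maxChest - chests[b]);
    if (maxTransfer <= 0) continue;
    int amount = rng.Next(maxTransfer + 1);
    chests[a] -= amount;
    chests[b] += amount;
}
// Transfers preserve the sum, so the chests still total 85,000,000.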
This should be a good starting point. Each chest gets filled randomly with the limits adapting to make sure the remaining chests can also get valid values.
Random rand = new Random();
int[] chests = new int[numOffChests];
int remaining = TotalValue;
for (int i = 0; i < numOffChests; i++)
{
    int chestsLeft = numOffChests - i - 1;
    // Lower bound: don't leave more than the remaining chests can absorb at their max.
    int minB = Math.Max(minChestValue, remaining - chestsLeft * maxChestValue);
    // Upper bound: leave at least the minimum for every remaining chest.
    int maxB = Math.Min(maxChestValue, remaining - chestsLeft * minChestValue);
    int val = rand.Next(minB, maxB + 1);
    remaining -= val;
    chests[i] = val;
}
The distribution has to be heavily skewed to get that range of values with that mean. Try an exponential formula, X=exp(a*U+b)+c where U is uniform on [0,1]. Then the conditions are
-2,500 = exp(b)+c
250,000 = exp(a+b)+c
8,500 = integral(exp(a*u+b), u=0..1)
= exp(b)/a*(exp(a)-1)+c
= 252,500/a+c
which gives the two equations
250,000+2,500*exp(a) = c*(1-exp(a))
8,500 = 252,500/a+c
A bit of graphical and numerical solution gives the "magic" numbers
a = 22.954545,
b = -10.515379,
c = -2500.00002711621
Now fill 10,000 chests according to that formula, compute the sum over the chest prices, and distribute the excess (which is small with high probability) in any pattern you like.
If you want to hit the upper and lower bounds more regularly, increase the bounds at the base of the computation and clamp the computed value to the original bounds if it falls outside.
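As a sketch (not from the answer), filling the chests with that formula and the fitted constants might look like:
double a = 22.954545, b = -10.515379, c = -2500.00002711621;
var rng = new Random();
int[] chests = new int[10000];
long sum = 0;
for (int i = 0; i < chests.Length; i++)
{
    double u = rng.NextDouble(); // U uniform on [0,1]
    chests[i] = (int)Math.Round(Math.Exp(a * u + b) + c);
    sum += chests[i];
}
// sum should land close to 85,000,000; spread the small excess or deficit as you like.
long excess = sum - 85000000;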
I assume that a probabilistic function gives the chance of a win/lose value V occurring. Let's say that the probability of V is proportional to (250000-V)**2, giving lower chances for high prizes.
To simplify some rounding issues, let's also assume that win/lose amounts are multiples of 100. You may then make the following (untested) computations:
int minwin = -2500;
int maxwin = 250000;
int chestcount = 10000;
int maxamount = 85000000;
// ----------- get probabilities for each win/lose amount to occur in all chests ----------
double probtotal = 0;
List<double> prob = new List<double>();
for (int i = minwin; i <= maxwin; i += 100)
{
    double ii = (double)(maxwin - i) * (maxwin - i);
    prob.Add(ii);
    probtotal += ii;
}
for (int i = 0; i < prob.Count; i++) prob[i] = prob[i] / probtotal;
for (int i = 0; i < prob.Count; i++)
    Console.WriteLine("Win/lose amount " + (minwin + i * 100) + " probability=" + (prob[i] * 100).ToString("0.000"));
// Transform "prob" so that it holds the fractional count of chests for each win/lose amount
for (int i = 0; i < prob.Count; i++) prob[i] = prob[i] * chestcount;
// ---------- Set the 10000 chest values starting from the lowest win -------------
List<int> chestvalues = new List<int>();
double remainder = 0;
long totalwin = 0;
for (int i = 0; i < prob.Count; i++)
{
    int n = (int)(prob[i] + remainder); // only the integer part of the float
    remainder = prob[i] + remainder - n;
    // set the win/lose amount on n chests
    int curwin = minwin + i * 100;
    for (int j = 0; j < n && chestvalues.Count < chestcount; j++)
    {
        chestvalues.Add(curwin);
        totalwin += curwin;
    }
}
// if chestvalues.Count is lower than chestcount, create the missing chest values
for (int i = chestvalues.Count; i < chestcount; i++) chestvalues.Add(0);
// --------------- due to float computations, we perhaps want to make some adjustments --------------
// i.e. if totalwin > maxamount (not sure if it may happen), decrease some chest values
...
// --------------- We now have a list of 10000 chest values to be randomly shuffled --------------
Random rnd = new Random();
// shuffle by ordering on a random key (a SortedList keyed on random ints can collide)
List<int> randomchestvalues = chestvalues.OrderBy(v => rnd.Next()).ToList();
// display the first chest amounts
for (int i = 0; i < 234 && i < chestcount; i++)
    Console.WriteLine(i + ":" + randomchestvalues[i]);

Determining % of time above a certain value in a dataset

I have a dataset of voltages (Sampled every 500ms). Lets say it looks something like this (In an array):
0ms -> 1.4v
500ms -> 1.3v
1000ms -> 1.2v
1500ms -> 1.5v
2000ms -> 1.3v
2500ms -> 1.3v
3000ms -> 1.2v
3500ms -> 1.3v
Assuming the transition between readings is linear (IE: 250ms = 1.35v), how would I go about calculating the total % of time that the reading is above or equal to 1.3v?
I was initially going to just take the % of values that are >= 1.3v (i.e. 6/8 in the sample array), however this only works if the angle between points is 45 degrees. I am assuming I have to do something like create a line from point 1 to point 2 and find its intercept with the baseline (1.3v), then do the same for point 2 and point 3 and find the distance between the two intercepts (say 700ms), then repeat for all points and take it as a % of the total sample time.
EDIT
Maybe I wasn't clear when I initially asked. I need help with identifying how I can perform these calculations, IE: objects/classes that I can use to help me virtually graph these lines and perform these calculations or any 3rd party math packages that might offer these capabilities.
The important part is not to think in data points, but in intervals. Every interval (e.g. 0-500, 500-1000, ...) is one of three cases (starting with float variables above and below, both 0):
Trivial: both start and end point are below your threshold - below += 1
Trivial: both start and end point are above your threshold - above += 1
Interesting: one point is below and one above your threshold. Let's call the smaller value min and the higher value max. Now we do above += (max-threshold)/(max-min) and below += (threshold-min)/(max-min), so we linearly distribute this interval between both states.
Finally, normalize the results by dividing both above and below by the number of intervals. This will give you a pair of numbers that represent the fractions, i.e. they add up to 1 modulo rounding errors. Of course, multiplying by 100 gives you the percentages.
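A minimal sketch of that tally (my code, not the answerer's), assuming the readings are ordered and equally spaced:
static double FractionAtOrAbove(double[] readings, double threshold)
{
    double above = 0, below = 0;
    for (int i = 0; i < readings.Length - 1; i++)
    {
        double min = Math.Min(readings[i], readings[i + 1]);
        double max = Math.Max(readings[i], readings[i + 1]);
        if (min >= threshold) above += 1;     // trivial: whole interval above
        else if (max < threshold) below += 1; // trivial: whole interval below
        else
        {
            // interesting: split the interval linearly at the crossing
            above += (max - threshold) / (max - min);
            below += (threshold - min) / (max - min);
        }
    }
    return above / (above + below); // multiply by 100 for a percentage
}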
EDIT
@phoog pointed out in the comments that I did not mention an "equal" case. This is by design, as your question already contains that: you chose >= as the comparison, so I definitely meant to use the same comparison here.
If I've understood the problem correctly, you can use a class like this to hold each entry:
public class DataEntry
{
public DataEntry(int time, double reading)
{
Time = time;
Reading = reading;
}
public int Time { get; set; }
public double Reading { get; set; }
}
And then the following LINQ statement to get segments above 1.3:
var entries = new List<DataEntry>()
{
new DataEntry(0, 1.4),
new DataEntry(500, 1.3),
new DataEntry(1000, 1.2),
new DataEntry(1500, 1.5),
new DataEntry(2000, 1.3),
new DataEntry(2500, 1.3),
new DataEntry(3000, 1.2),
new DataEntry(3500, 1.3)
};
double totalTime = entries
    .OrderBy(e => e.Time)
    .Take(entries.Count - 1)
    .Where((t, i) => t.Reading >= 1.3 && entries[i + 1].Reading >= 1.3)
    .Sum(t => 500); // each qualifying segment contributes its full 500ms
var perct = totalTime / entries.Max(e => e.Time);
This should give you the 500ms segments that remained above 1.3.

Standard deviation of generic list? [duplicate]

This question already has answers here:
How do I determine the standard deviation (stddev) of a set of values?
(12 answers)
Standard Deviation in LINQ
(8 answers)
Closed 9 years ago.
I need to calculate the standard deviation of a generic list. I will try to include my code. It's a generic list with data in it. The data is mostly floats and ints. Here is my code that relates to it, without getting into too much detail:
namespace ValveTesterInterface
{
public class ValveDataResults
{
private List<ValveData> m_ValveResults;
public ValveDataResults()
{
if (m_ValveResults == null)
{
m_ValveResults = new List<ValveData>();
}
}
public void AddValveData(ValveData valve)
{
m_ValveResults.Add(valve);
}
Here is the function where the standard deviation needs to be calculated:
public float LatchStdev()
{
float sumOfSqrs = 0;
float meanValue = 0;
foreach (ValveData value in m_ValveResults)
{
meanValue += value.LatchTime;
}
meanValue = (meanValue / m_ValveResults.Count) * 0.02f;
for (int i = 0; i <= m_ValveResults.Count; i++)
{
sumOfSqrs += Math.Pow((m_ValveResults - meanValue), 2);
}
return Math.Sqrt(sumOfSqrs /(m_ValveResults.Count - 1));
}
}
}
Ignore what's inside the LatchStdev() function because I'm sure it's not right. It's just my poor attempt to calculate the standard deviation. I know how to do it for a list of doubles, but not for a generic data list. If someone has experience with this, please help.
The example above is slightly incorrect and could have a divide-by-zero error if your population set has a single element. The following code is somewhat simpler and gives the "population standard deviation" result (http://en.wikipedia.org/wiki/Standard_deviation).
using System;
using System.Linq;
using System.Collections.Generic;
public static class Extend
{
public static double StandardDeviation(this IEnumerable<double> values)
{
double avg = values.Average();
return Math.Sqrt(values.Average(v=>Math.Pow(v-avg,2)));
}
}
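Applied to the asker's class, that might look like this (a sketch, assuming the valve list and LatchTime property from the question):
float stdev = (float)m_ValveResults
    .Select(v => (double)v.LatchTime)
    .StandardDeviation();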
This article should help you. It creates a function that computes the deviation of a sequence of double values. All you have to do is supply a sequence of appropriate data elements.
The resulting function is:
private double CalculateStandardDeviation(IEnumerable<double> values)
{
double standardDeviation = 0;
if (values.Any())
{
// Compute the average.
double avg = values.Average();
// Perform the Sum of (value-avg)^2.
double sum = values.Sum(d => Math.Pow(d - avg, 2));
// Put it all together.
standardDeviation = Math.Sqrt((sum) / (values.Count()-1));
}
return standardDeviation;
}
This is easy enough to adapt for any generic type, so long as we provide a selector for the value being computed. LINQ is great for that: the Select function allows you to project, from your generic list of custom types, a sequence of numeric values for which to compute the standard deviation:
List<ValveData> list = ...
var result = CalculateStandardDeviation(list.Select(v => (double)v.SomeField));
Even though the accepted answer seems mathematically correct, it is wrong from the programming perspective: it enumerates the same sequence four times. This might be OK if the underlying object is a list or an array, but if the input is a filtered/aggregated/etc. LINQ expression, or if the data is coming directly from a database or network stream, this would cause much lower performance.
I would highly recommend not to reinvent the wheel and use one of the better open source math libraries Math.NET. We have been using that lib in our company and are very happy with the performance.
PM> Install-Package MathNet.Numerics
var populationStdDev = new List<double> { 1d, 2d, 3d, 4d, 5d }.PopulationStandardDeviation();
var sampleStdDev = new List<double> { 2d, 3d, 4d }.StandardDeviation();
See http://numerics.mathdotnet.com/docs/DescriptiveStatistics.html for more information.
Lastly, for those who want the fastest possible result and can sacrifice some precision, read about the "one-pass" algorithm: https://en.wikipedia.org/wiki/Standard_deviation#Rapid_calculation_methods
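For reference, a sketch of that one-pass method (Welford's recurrence, my sketch rather than the article's code) for the sample standard deviation:
public static double OnePassStdDev(IEnumerable<double> values)
{
    long n = 0;
    double mean = 0, m2 = 0;
    foreach (double x in values)
    {
        n++;
        double delta = x - mean;
        mean += delta / n;
        m2 += delta * (x - mean); // uses the already-updated mean
    }
    return n > 1 ? Math.Sqrt(m2 / (n - 1)) : 0.0;
}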
I see what you're doing, and I use something similar. It seems to me you're not going far enough. I tend to encapsulate all data processing into a single class, that way I can cache the values that are calculated until the list changes.
for instance:
public class StatProcessor
{
    private List<double> _data;  // this holds the current data
    private double _avg;         // we cache the average here
    private bool _avgValid;      // a flag to say whether we need to recalculate the average or not

    private void _calcAvg()
    {
        // calculate the average of the list, cache it in _avg, and set _avgValid
    }

    public double Average
    {
        get
        {
            if (!_avgValid)   // if we don't HAVE to calculate the average, skip it
                _calcAvg();   // if we do, go ahead, cache it, then set the flag
            return _avg;      // now _avg is guaranteed to be good, so return it
        }
    }

    // ...more stuff

    public void Add(double value)
    {
        // add the value to the list here, and reset the flag
    }
}
You'll notice that using this method, only the first request for the average actually computes it. After that, as long as we don't add anything to the list (or remove, or modify at all, but those aren't shown), we can get the average for basically nothing.
Additionally, since the average is used in the algorithm for the standard deviation, computing the standard deviation first will give us the average for free, and computing the average first will give us a little performance boost in the standard deviation calculation, assuming we remember to check the flag.
Furthermore, places like the average function, where you're looping through every value already, are a great time to cache things like the minimum and maximum values. Of course, requests for this information need to first check whether it has been cached, and that can cause a relative slowdown compared to just finding the max from the list, since it does the extra work of setting up all the concerned caches, not just the one you're accessing.
