I have implemented the formula from Wikipedia in C# code. I do get a nice normal curve, but is it reasonable to get values that exceed 1? Isn't it supposed to be a distribution function?
This is the C# implementation:
double up = Math.Exp(-Math.Pow(x , 2) / ( 2 * s * s ));
double down = ( s * Math.Sqrt(2 * Math.PI) );
return up / down;
I double-checked it several times and it seems fine to me, so what's wrong: my implementation or my understanding?
For example, if we define x = 0 and s = 0.1, this implementation returns 3.989...
A probability density function (pdf) has the property that its values are >= 0 and that its integral over -inf to +inf is 1. But the integrand, that is the pdf itself, can take any value >= 0, including values greater than 1.
In other words, there is no reason, a priori, to believe that a pdf value > 1 indicates a problem.
You can think about this for the normal curve by considering what reducing the variance means. Smaller variances concentrate the probability mass in the centre. Given that the total mass is always one, as the mass concentrates in the centre the peak value must increase: the peak of a normal pdf is 1/(s*sqrt(2*pi)), which exceeds 1 as soon as s < 1/sqrt(2*pi) ≈ 0.399. You can see that trend in the graph that you link to.
What you should do is compare the output of your code with known good implementations. For instance, Wolfram Alpha gives the same value as you quote: http://www.wolframalpha.com/input/?i=normal+distribution+pdf+mean%3D0+standard+deviation%3D0.1+x%3D0&x=6&y=7
Do a little more testing of this nature, captured in a unit test, and you will be able to rely on your code with confidence.
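For instance, a minimal self-check along those lines (a plain Debug.Assert sketch rather than any particular test framework; the NormalPdf helper simply mirrors the code in the question) might be:

using System;
using System.Diagnostics;

static class NormalPdfChecks
{
    // Mirrors the questioner's implementation.
    static double NormalPdf(double x, double s)
    {
        return Math.Exp(-x * x / (2 * s * s)) / (s * Math.Sqrt(2 * Math.PI));
    }

    static void Main()
    {
        // Peak of N(0, 0.1) is 1 / (0.1 * sqrt(2 * pi)) ≈ 3.98942 -- matches the Wolfram Alpha value.
        Debug.Assert(Math.Abs(NormalPdf(0.0, 0.1) - 3.98942) < 1e-4);

        // Peak of the standard normal N(0, 1) is ≈ 0.39894, comfortably below 1.
        Debug.Assert(Math.Abs(NormalPdf(0.0, 1.0) - 0.39894) < 1e-4);
    }
}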
Wouldn't you want something more like this?
public static double NormalDistribution(double value)
{
    return (1 / Math.Sqrt(2 * Math.PI)) * Math.Exp(-Math.Pow(value, 2) / 2);
}
Yes, it's totally OK; the distribution itself (the PDF) can take any value from 0 to +infinity. The thing that must be in the range [0..1] is the corresponding integral (e.g. the CDF).
You can convince yourself by looking at the case of a non-random value: if the value is not random at all and can take only one constant value, the distribution degenerates (standard deviation zero, mean equal to the value) into the Dirac delta function: a peak of infinite height but zero width; the integral (CDF) from -infinity to +infinity, however, is still 1.
// If you have special functions implemented (i.e. Erf)
// outcome is in the [0..inf) range
public static Double NormalPDF(Double value, Double mean, Double sigma) {
    Double v = (value - mean) / sigma;
    return Math.Exp(-v * v / 2.0) / (sigma * Math.Sqrt(Math.PI * 2));
}

// outcome is in the [0..1] range
public static Double NormalCDF(Double value, Double mean, Double sigma, Boolean isTwoTail) {
    if (isTwoTail)
        value = 1.0 - (1.0 - value) / 2.0;

    // TODO: You should have Erf implemented
    return 0.5 + Erf((value - mean) / (Math.Sqrt(2) * sigma)) / 2.0;
}
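Since Erf is not part of System.Math, here is one possible sketch based on the Abramowitz & Stegun approximation 7.1.26 (maximum absolute error around 1.5e-7); the constants come from that reference, not from the answer above:

// Sketch of Erf via the Abramowitz & Stegun formula 7.1.26 (|error| < 1.5e-7).
public static double Erf(double x)
{
    // erf is odd: erf(-x) = -erf(x)
    double sign = x < 0 ? -1.0 : 1.0;
    x = Math.Abs(x);

    const double a1 = 0.254829592, a2 = -0.284496736, a3 = 1.421413741,
                 a4 = -1.453152027, a5 = 1.061405429, p = 0.3275911;

    double t = 1.0 / (1.0 + p * x);
    double y = 1.0 - (((((a5 * t + a4) * t) + a3) * t + a2) * t + a1) * t * Math.Exp(-x * x);
    return sign * y;
}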
I'm trying to create a random float generator (range of 0.0-1.0), where I can supply a single target value, and a strength value that increases or decreases the chance that this target will be hit. For example, if my target is 0.7, and I have a high strength value, I would expect the function to return mostly values around 0.7.
Put another way, I want a function that, when run a lot of times, would produce a distribution graph something like this:
Histogram
Something like a bell curve, yes, but with a strict range limit (instead of the -inf/+inf range of a normal distribution). Clamping a normal distribution is not ideal; I want the distribution to end naturally at the range limits.
The approach I've been attempting is to come up with a formula to transform a value from uniform distribution to the mythical distribution I'm envisioning. Something like an inverse sine:
Inverse Sine
with the ability to widen out that middle point, via the strength value:
Widened Midpoint
and also the ability to move that midpoint up and down, via the target value:
Target changed to 0.7 (courtesy of MS Paint because I couldn't figure this part out mathematically)
The range of this theoretical "strength value" is up for debate. I could imagine either a limited value, say between 0 and 1, where 0 means it's uniform distribution and 1 means it's a 100% chance of hitting the target; or, I could imagine a value that approaches a 100% chance the higher it gets, without ever reaching it. Something along either line would work.
I'm working in C# but this can be language-agnostic. Any help pointing me in the right direction is appreciated. Also happy to clarify further.
I'm not a mathematician but I took a look and I feel like I got something that might work for you.
All I did was take the normal distribution formula, f(x) = (1 / (alpha * sqrt(2 * pi))) * exp(-(x - mu)^2 / (2 * alpha^2)),
and use 0.7 as mu to shift the distribution towards 0.7. I added a leading coefficient of 0.623 to scale the values to be between 0 and 1 (with alpha = 0.25 the peak of the pdf is 1 / (0.25 * sqrt(2 * pi)) ≈ 1.596, and 0.623 * 1.596 ≈ 1) and migrated it from the formula to C#; this can be found below.
Usage:
DistributedRandom random = new DistributedRandom();
// roll for the chance to hit
double roll = random.NextDouble();
// add a strength modifier to lower or strengthen the roll based on level or something
double actualRoll = 0.7d * roll;
Definition
public class DistributedRandom : Random
{
    public double Mean { get; set; } = 0.7d;
    private const double limit = 0.623d;
    private const double alpha = 0.25d;
    private readonly double sqrtOf2Pi;
    private readonly double leadingCoefficient;

    public DistributedRandom()
    {
        sqrtOf2Pi = Math.Sqrt(2 * Math.PI);
        leadingCoefficient = 1d / (alpha * sqrtOf2Pi);
        leadingCoefficient *= limit;
    }

    public override double NextDouble()
    {
        double x = base.NextDouble();
        double exponent = -0.5d * Math.Pow((x - Mean) / alpha, 2d);
        double result = leadingCoefficient * Math.Pow(Math.E, exponent);
        return result;
    }
}
Edit:
In case you're not looking for output similar to the distribution histogram that you provided and instead want something more similar to the sigmoid function you drew, I have created an alternate version.
Thanks to Ruzihm for pointing this out.
I went ahead and used the CDF for the normal distribution, CDF(x) = 0.5 * (1 + erf((x - mu) / (alpha * sqrt(2)))), where erf is the error function. I added a coefficient of 1.77 to scale the output to keep it within 0d - 1d.
It should produce numbers similar to this:
Here you can find the alternate class:
public class DistributedRandom : Random
{
    public double Mean { get; set; } = 0.7d;
    private const double xOffset = 1d;
    private const double yOffset = 0.88d;
    private const double alpha = 0.25d;
    private readonly double sqrtOf2Pi = Math.Sqrt(2 * Math.PI);
    private readonly double leadingCoefficient;
    private const double cdfLimit = 1.77d;
    private readonly double sqrt2 = Math.Sqrt(2);
    private readonly double sqrtPi = Math.Sqrt(Math.PI);
    private readonly double errorFunctionCoefficient;
    private readonly double cdfDivisor;

    public DistributedRandom()
    {
        leadingCoefficient = 1d / (alpha * sqrtOf2Pi);
        errorFunctionCoefficient = 2d / sqrtPi;
        cdfDivisor = alpha * sqrt2;
    }

    public override double NextDouble()
    {
        double x = base.NextDouble();
        return CDF(x) - yOffset;
    }

    private double DistributionFunction(double x)
    {
        double exponent = -0.5d * Math.Pow((x - Mean) / alpha, 2d);
        double result = leadingCoefficient * Math.Pow(Math.E, exponent);
        return result;
    }

    private double ErrorFunction(double x)
    {
        return errorFunctionCoefficient * Math.Pow(Math.E, -Math.Pow(x, 2));
    }

    private double CDF(double x)
    {
        x = DistributionFunction(x + xOffset) / cdfDivisor;
        double result = 0.5d * (1 + ErrorFunction(x));
        return cdfLimit * result;
    }
}
I came up with a workable solution. This isn't quite as elegant as I was aiming for because it requires 2 random numbers per result, but it definitely fulfills the requirement. Basically it takes one random number, uses another random number that's exponentially curved towards 1, and lerps towards the target.
I wrote it out in Python because it was easier for me to visualize its histogram:
import math
import random

# Linearly interpolate between a and b by t.
def lerp(a, b, t):
    return ((1.0 - t) * a) + (t * b)

# What we want the median value to be.
target = 0.7
# How often we will hit that median value. (0 = uniform distribution, higher = greater chance of hitting median)
strength = 1.0

values = []
for i in range(0, 1000):
    # Start with a base float between 0 and 1.
    base = random.random()
    # Get another float between 0 and 1, that trends towards 1 with a higher strength value.
    adjust = random.random()
    adjust = 1.0 - math.pow(1.0 - adjust, strength)
    # Lerp the base float towards the target by the adjust amount.
    value = lerp(base, target, adjust)
    values.append(value)

# Graph histogram
import matplotlib.pyplot as plt
import scipy.special as sps
count, bins, ignored = plt.hist(values, 50, density=True)
plt.show()
Target = 0.7, Strength = 1
Target = 0.2, Strength = 1
Target = 0.7, Strength = 3
Target = 0.7, Strength = 0
(This is meant to be uniform distribution - it might look kinda jagged, but I tested and that's just python's random number generator.)
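For anyone who wants it in C#, a rough translation of the Python sketch above (class and method names are mine, not from the original post) could look like this:

using System;

public class TargetBiasedRandom
{
    private readonly Random _random = new Random();

    // Returns a value in [0, 1] biased towards 'target'; strength 0 = uniform.
    public double Next(double target, double strength)
    {
        double baseValue = _random.NextDouble();
        // Second roll, curved towards 1 as strength grows.
        double adjust = 1.0 - Math.Pow(1.0 - _random.NextDouble(), strength);
        // Lerp the base value towards the target by the adjust amount.
        return (1.0 - adjust) * baseValue + adjust * target;
    }
}

// Usage (hypothetical):
// var rng = new TargetBiasedRandom();
// double v = rng.Next(0.7, 3.0); // biased towards 0.7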
I'm making a function that calculates the angle between 2 given vectors for my Unity game, using the dot product formula:
vector(a)*vector(b)=|vector(a)|*|vector(b)|*cos(the angle)
so I figured that the angle would equal
acos((vector(a)*vector(b))/(|vector(a)|*|vector(b)|))
Anyway here's my code:
float rotateAngle(Vector2 a, Vector2 b)
{
    return Mathf.Acos((a.x * b.x + a.y * b.y) /
        ((Mathf.Sqrt(a.x * a.x + a.y * a.y)) * (Mathf.Sqrt(b.x * b.x + b.y * b.y)))) * (180 / Mathf.PI);
}
But when I ran it, the console showed NaN. I've reviewed the code and the formula but came up empty-handed.
Can someone help me? Thank you in advance!
float.NaN is the result of mathematical operations that are undefined for real numbers, such as 0 / 0 (note from the docs that x / 0 where x != 0 rather returns positive or negative infinity) or the square root of a negative value. As soon as one operand in an operation is already NaN, the entire operation returns NaN as well.
The second case (square root of a negative value) cannot happen here since you are taking the square roots of squared values, so most probably your vectors have a magnitude of 0.
If you look at the Vector2 source code you will find their implementation of Vector2.Angle and Vector2.SignedAngle (which you should rather use, by the way, as they are tested and way more efficient).
public static float Angle(Vector2 from, Vector2 to)
{
    // sqrt(a) * sqrt(b) = sqrt(a * b) -- valid for real numbers
    float denominator = (float)Math.Sqrt(from.sqrMagnitude * to.sqrMagnitude);
    if (denominator < kEpsilonNormalSqrt)
        return 0F;

    float dot = Mathf.Clamp(Dot(from, to) / denominator, -1F, 1F);
    return (float)Math.Acos(dot) * Mathf.Rad2Deg;
}

// Returns the signed angle in degrees between /from/ and /to/. Always returns the smallest possible angle
public static float SignedAngle(Vector2 from, Vector2 to)
{
    float unsigned_angle = Angle(from, to);
    float sign = Mathf.Sign(from.x * to.y - from.y * to.x);
    return unsigned_angle * sign;
}
There you will find that the first thing they check is
float denominator = (float)Math.Sqrt(from.sqrMagnitude * to.sqrMagnitude);
if (denominator < kEpsilonNormalSqrt)
return 0F;
which basically makes sure that both given vectors have a "big enough" magnitude, in particular one that is not 0 ;)
Long story short: don't reinvent the wheel; use the already built-in Vector2.Angle or Vector2.SignedAngle.
NaN values are typically the result of invalid mathematical operations on floating point numbers. A common source is division by zero, so my guess would be that one of the vectors is (0, 0).
I would also recommend using the built-in functions for computing the normalization, length/magnitude, dot product etc.; that will make the code much easier to read, and the compiler should be fairly good at optimizing that kind of code. If you need to do any additional optimization, only do so after you have made some measurements.
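If you do want to keep a helper of your own, a defensive sketch along the same lines as the built-in implementation (assumes using UnityEngine; it guards against zero-length vectors and clamps the cosine into Acos' domain) might look like this:

static float RotateAngle(Vector2 a, Vector2 b)
{
    float denominator = Mathf.Sqrt(a.sqrMagnitude * b.sqrMagnitude);
    if (denominator < 1e-15f)
        return 0f; // at least one vector is (0, 0): the angle is undefined

    // Clamp so floating point noise can never push the cosine outside Acos' domain.
    float cos = Mathf.Clamp(Vector2.Dot(a, b) / denominator, -1f, 1f);
    return Mathf.Acos(cos) * Mathf.Rad2Deg;
}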
I am currently in the process of turning a rather lofty Excel sheet that is used for calculating scientific values into a C# application. However, I am hitting some problems in regards to the rounding.
All of my values are stored as doubles, and when a small number of operations is performed on them they match the Excel sheet within acceptable accuracy (5 or 6 decimal places). When they are put through rather long chains of division, multiplication and square roots, they start to drift off by quite a large margin. At one point I switched the entire code base to decimals to test whether that resolved the issue; it lessened the gap, but the issue still remained.
I am aware this is due to the way decimal numbers are represented in software, but it's imperative that I match Excel's rounding as closely as possible. Research on this topic points me towards the standards that Excel uses to round, and it seems C# by default uses a slightly different one. Despite learning this, I am still unsure how to proceed with replicating Excel's rounding. I'm wondering if anyone has any advice or previous experience on this topic?
Any help would be greatly appreciated.
EDIT: I would just like to clarify that I am not rounding my numbers at all; the rounding in both the sheet and my code is applied implicitly. I have tested the same formulas inside a totally different software package (a form builder called K2). The resulting numbers match my C# application, so it seems Excel's implicit rounding differs in some way.
One of the offending formulas:
(8.04 * Math.Pow(10, -5)) *
(Math.Pow(preTestTestingDetails.PitotCp, 2)) * (DeltaH) *
(tempDGMAverage + 273.0) /
(StackTemp + 273) *
((preTestTestingDetails.BarometricPressure / 0.133322 +
((preTestTestingDetails.StackStaticPressure / 9.80665) / 13.6)) /
(preTestTestingDetails.BarometricPressure / 0.133322)) *
(preTestTestingDetails.EstimatedMolWeight /
((preTestTestingDetails.EstimatedMolWeight * (1 - (EstimatedMoisture / 100))) +
(18 * (EstimatedMoisture / 100)))) *
Math.Pow((1 - (EstimatedMoisture / 100)), 2) *
(Math.Pow(preTestTestingDetails.NozzleMean, 4));
In C# the result of
int x = 5;
var result = x / 2; // result is 2 and of type int
... because an integer division is performed. So if integers are involved (not a double that merely has no decimals, but a value of type int or long), make sure to convert to double before dividing. Note that typing the result as double is not enough:
int x = 5;
double result = x / 2; // result is 2.0 because the conversion to double happens after the integer division
This works:
int x = 5;
var result = (double)x / 2; // result is 2.5 and of type double
int x = 5;
var result = x / 2.0; // result is 2.5 and of type double
int x = 5;
var result = 0.5 * x; // result is 2.5 and of type double
The only place in your formula where this could happen is EstimatedMoisture / 100, in case EstimatedMoisture is of type int. If this is the case, fix it with EstimatedMoisture / 100.0.
Instead of 8.04 * Math.Pow(10, -5), you can write 8.04e-5. This avoids rounding effects of Math.Pow!
I don't know how Math.Pow(a, b) works, but the general formula is a^b = exp(b * ln(a)). So instead of writing Math.Pow(something, 2), write something * something. This is both faster and more accurate.
Using constants for magic numbers adds clarity. Using temps for common sub-expressions makes the formula more readable.
const double mmHg_to_kPa = 0.133322;
const double g0 = 9.80665;
var p = preTestTestingDetails;
double moisture = EstimatedMoisture / 100.0;
double dryness = 1.0 - moisture;
double pressure_mmHg = p.BarometricPressure / mmHg_to_kPa;
double nozzleMean2 = p.NozzleMean * p.NozzleMean;
double nozzleMean4 = nozzleMean2 * nozzleMean2;
double result = 8.04E-05 *
p.PitotCp * p.PitotCp * DeltaH * (tempDGMAverage + 273.0) / (StackTemp + 273.0) *
((pressure_mmHg + p.StackStaticPressure / g0 / 13.6) / pressure_mmHg) *
(p.EstimatedMolWeight / (p.EstimatedMolWeight * dryness + 18.0 * moisture)) *
dryness * dryness * nozzleMean4;
Why not use 273.15 instead of 273.0 if precision is a concern?
I have a numerical computation method in my .NET code that will be called more than 1000 times.
private double CalculatePressureLossThroughPipe(double length, double flow, double diameter)
{
    double costA = 0, costB = 0;
    double frictionFactor = 0;
    double pressure = 0;

    double velocity = flow / CalculatePipeArea(diameter);

    // calculate Reynolds number
    double reynoldsNo = ((this.mDensity * velocity * diameter) / this.mViscosity);

    // calculate frictionFactor
    costA = Math.Pow((2.457 * Math.Log(1 / (Math.Pow((7 / reynoldsNo), 0.9) + (0.27 * 0.000015) / diameter))), 16);
    costB = Math.Pow((37530 / reynoldsNo), 16);
    frictionFactor = (2 * Math.Pow(((Math.Pow((8 / reynoldsNo), 12)) + (1 / Math.Pow((costA + costB), 1.5))), 0.083333));

    // calculate pressure
    pressure = (DesignConstants.PRESSSURE_CONSTANT * 2 * frictionFactor * length * Math.Pow(velocity, 2) * this.mDensity / diameter);
    return pressure;
}
This function will be called in a loop, with a different set of input parameters each time. The loop itself is quite intensive and calls the above-mentioned function (with unique parameters) on every iteration. The function, although it looks small, is quite resource intensive. Is there an alternate way to process the method calls without using the standard members from System.Math?
It looks like the constant factor (0.27 * 0.000015) in the expression Math.Pow((7 / reynoldsNo), 0.9) + (0.27 * 0.000015) / diameter can be precalculated, since it doesn't depend on any of your inputs. In any case, when you say this method "is quite resource intensive", presumably you mean it takes a long time - have you benchmarked it? What would an acceptable time be? These are the things you need to find out before trying to optimise anything.
You could try to improve the performance using multiple threads (using Tasks / Threads) and vectorization.
Using System.Numerics you may be able to leverage the power of SIMD, possibly increasing performance 4 times.
First of all you should analyze all the mathematical expressions and precalculate the ones that do not depend on your inputs, e.g.:
(0.27*0.000015)
Also try to use multiplication instead of Math.Pow where possible: velocity * velocity is faster than Math.Pow(velocity, 2) (see the sketch below for the integer powers in the friction-factor line).
If precision allows, you can also try Pow approximation algorithms - they are faster but not as precise. Look for more information in this article: http://martin.ankerl.com/2007/10/04/optimized-pow-approximation-for-java-and-c-c/
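To illustrate the Math.Pow point for the 16th powers in the friction-factor line, a hedged sketch computing the same value by repeated squaring:

// Raise x to the 16th power by squaring four times instead of calling Math.Pow(x, 16).
static double Pow16(double x)
{
    double x2 = x * x;   // x^2
    double x4 = x2 * x2; // x^4
    double x8 = x4 * x4; // x^8
    return x8 * x8;      // x^16
}

// Example: costB = Math.Pow(37530 / reynoldsNo, 16) becomes
// double costB = Pow16(37530 / reynoldsNo);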
Are you using the Parallel class for your loop, to utilize the multiple cores/processors of your PC? https://msdn.microsoft.com/library/dd537608(v=vs.110).aspx
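A minimal sketch of that idea (PipeInput and GetInputs are hypothetical placeholders for however the loop's parameters are collected):

using System.Threading.Tasks;

// PipeInput and GetInputs are hypothetical; the calculation method itself is unchanged.
PipeInput[] inputs = GetInputs();
double[] results = new double[inputs.Length];

Parallel.For(0, inputs.Length, i =>
{
    results[i] = CalculatePressureLossThroughPipe(
        inputs[i].Length, inputs[i].Flow, inputs[i].Diameter);
});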
In my code I have to do a lot of distance calculation between pairs of lat/long values.
The code looks like this:
double result = Math.Acos(Math.Sin(lat2rad) * Math.Sin(lat1rad)
+ Math.Cos(lat2rad) * Math.Cos(lat1rad) * Math.Cos(lon2rad - lon1rad));
(lat2rad e.g. is latitude converted to radians).
I have identified this function as the performance bottleneck of my application. Is there any way to improve this?
(I cannot use look-up tables since the coordinates are varying). I have also looked at this question where a lookup scheme like a grid is suggested, which might be a possibility.
Thanks for your time! ;-)
If your goal is to rank (compare) distances, then approximations (sin and cos table lookups) could drastically reduce the amount of computation required (implement a quick reject).
The idea is to only proceed with the actual trigonometric computation if the difference between the approximated distances (to be ranked or compared) falls below a certain threshold.
E.g. using lookup tables with 1000 samples (i.e. sin and cos sampled every 2*pi/1000), the lookup uncertainty is at most 0.006284. Using uncertainty calculation for the parameter to ACos, the cumulated uncertainty, also be the threshold uncertainty, will be at most 0.018731.
So, if evaluating Math.Sin(lat2rad) * Math.Sin(lat1rad) + Math.Cos(lat2rad) * Math.Cos(lat1rad) * Math.Cos(lon2rad - lon1rad) using the sin and cos lookup tables for two coordinate-set pairs (distances) yields a certain ranking (one distance appears greater than the other based on the approximation), and the difference's modulus is greater than the threshold above, then the approximation is valid. Otherwise, proceed with the actual trigonometric calculation.
Would the CORDIC algorithm work for you (in regards to speed/accuracy)?
Using inspiration from @Brann, I think you can reduce the calculation a bit (warning: it's a long time since I did any of this and it will need to be verified). Some sort of lookup of precalculated values is probably the fastest, though.
You have:
1: ACOS( SIN(latA) SIN(latB) + COS(latA) COS(latB) COS(lonA - lonB) )
But by the identity 2: COS(latA - latB) = SIN(latA) SIN(latB) + COS(latA) COS(latB)
which can be rewritten as 3: SIN(latA) SIN(latB) = COS(latA - latB) - COS(latA) COS(latB)
Replace SIN(latA) SIN(latB) in 1 and you have:
4: ACOS( COS(latA - latB) - COS(latA) COS(latB) + COS(latA) COS(latB) COS(lonA - lonB) )
You pre-calculate X = COS(latA - latB), Y = COS(latA) COS(latB) and Z = COS(lonA - lonB) and put the values into 4
to give:
ACOS( X - Y + Y*Z )
5 trig calls instead of 6!
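Expressed as code, a sketch of the reduced formula (reusing the variable names from the question) would be:

double x = Math.Cos(lat1rad - lat2rad);           // X = COS(latA - latB)
double y = Math.Cos(lat1rad) * Math.Cos(lat2rad); // Y = COS(latA) * COS(latB)
double z = Math.Cos(lon2rad - lon1rad);           // Z = COS(lonA - lonB)
double result = Math.Acos(x - y + y * z);         // same value as the original six-call formula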
Change the way you store long/lat:
struct LongLat
{
    // lon/lat plus the equivalent position on the unit sphere
    // ("long" is a C# keyword, so the field is named lon)
    public float lon, lat, x, y, z;
}
When creating a long/lat, also compute the (x,y,z) 3D point that represents the equivalent position on a unit sphere centred at the origin.
Now, to determine if point B is nearer to point A than point C, do the following:
// is B nearer to A than C? (a larger dot product means a smaller angle, i.e. nearer)
bool IsNearer(LongLat A, LongLat B, LongLat C)
{
    return (A.x * B.x + A.y * B.y + A.z * B.z) > (A.x * C.x + A.y * C.y + A.z * C.z);
}
and to get the distance between two points:
float Distance(LongLat A, LongLat B)
{
    // radius is the size of the sphere your long/lats are mapped onto
    return radius * (float)Math.Acos(A.x * B.x + A.y * B.y + A.z * B.z);
}
You could remove the 'radius' term, effectively normalising the distances.
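The conversion itself isn't shown above; one possible sketch (assuming the lat/long values are already in radians, and using the field layout from the struct above) is:

// Sketch: project a lat/long pair (in radians) onto the unit sphere.
static LongLat FromRadians(float lat, float lon)
{
    var p = new LongLat { lat = lat, lon = lon };
    p.x = (float)(Math.Cos(lat) * Math.Cos(lon));
    p.y = (float)(Math.Cos(lat) * Math.Sin(lon));
    p.z = (float)Math.Sin(lat);
    return p;
}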
Switch to lookup tables for sin/cos/acos. It will be faster, and there are a lot of C/C++ fixed point libraries that also include those.
Here is code from someone else on memoization, which might work if the actual values used are more clustered.
Here is an SO question on Fixed Point.
What is the bottleneck? Is it the sine/cosine function calls or the arccosine call?
If your sine/cosine calls are slow, you could use the following identity to avoid some of them:
1 = sin(x)^2 + cos(x)^2
cos(x) = sqrt(1 - sin(x)^2) (taking the positive root, which is safe here because latitudes lie in [-90°, 90°], where cos is non-negative)
But I like the mapping idea so that you don't have to recompute values you've already computed. Although be careful as the map could get very large very quickly.
How exact do you need the values to be?
If you round your values a bit, you could store the result of all lookups and check whether they have already been computed before each calculation.
Well, since lat and lon are guaranteed to be within a certain range, you could try using some form of lookup table for your Math.* method calls. Say, a Dictionary<double, double>.
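A very rough sketch of that caching idea (the key is the argument rounded to a fixed number of decimals, so both the hit rate and the accuracy depend on how coarsely you round):

using System;
using System.Collections.Generic;

// Rough memoised cosine: keys are the input rounded to 4 decimal places.
class CachedTrig
{
    private readonly Dictionary<double, double> _cos = new Dictionary<double, double>();

    public double Cos(double x)
    {
        double key = Math.Round(x, 4);
        if (!_cos.TryGetValue(key, out double value))
        {
            value = Math.Cos(key);
            _cos[key] = value;
        }
        return value;
    }
}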
I would argue that you may want to re-examine how you found that function to be the bottleneck (i.e. did you profile the application?).
The equation seems very lightweight to me and shouldn't cause any trouble.
Granted, I don't know your application and you say you do a lot of these calculations.
Nevertheless it is something to consider.
As someone else pointed out, are you sure this is your bottleneck?
I've done some performance testing of a similar application I'm building, where I call a simple method to return a distance between two points using standard trig. 20,000 calls to it shove it right to the top of the profiling output, yet there's no way I can make it faster... it's just the sheer number of calls.
In this case, I need to reduce the number of calls to it... it's not that the function itself is the bottleneck.
I use a different algorithm for calculating the distance between two lat/long positions; it could be lighter than yours since it only does one Cos call and one Sqrt call.
public static double GetDistanceBetweenTwoPos(double lat1, double long1, double lat2, double long2)
{
    // Flat-earth (equirectangular) approximation: 69.1 is roughly the number of miles
    // per degree of latitude, and 57.3 is roughly the number of degrees per radian.
    double x = 69.1 * (lat1 - lat2);
    double y = 69.1 * (long1 - long2) * System.Math.Cos(lat2 / 57.3);

    // calculation base: miles
    double distance = System.Math.Sqrt(x * x + y * y);

    // distance converted to kilometres
    return distance * 1.609;
}
Someone has already mentioned memoisation, and this is a bit similar: if you are comparing the same point to many other points, it is better to precalculate parts of that equation.
Instead of
double result = Math.Acos(Math.Sin(lat2rad) * Math.Sin(lat1rad)
+ Math.Cos(lat2rad) * Math.Cos(lat1rad) * Math.Cos(lon2rad - lon1rad));
have:
double result = Math.Acos(lat2rad.sin * lat1rad.sin
+ lat2rad.cos * lat1rad.cos * (lon2rad.cos * lon1rad.cos + lon1rad.sin * lon2rad.sin));
And I think that's the same formula as someone else has posted, because part of the equation will disappear when you expand the brackets :)
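A sketch of the kind of precomputation that notation implies (type and member names are mine):

// Sketch: precompute sin/cos once per coordinate so repeated comparisons reuse them.
struct PreparedAngle
{
    public readonly double Sin, Cos;

    public PreparedAngle(double radians)
    {
        Sin = Math.Sin(radians);
        Cos = Math.Cos(radians);
    }
}

// The lat1rad.sin / lat1rad.cos pseudocode above then becomes:
static double Distance(PreparedAngle lat1, PreparedAngle lat2, PreparedAngle lon1, PreparedAngle lon2)
{
    return Math.Acos(lat2.Sin * lat1.Sin
        + lat2.Cos * lat1.Cos * (lon2.Cos * lon1.Cos + lon1.Sin * lon2.Sin));
}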