Cross correlation using Math.NET - C#

I have recently started using the Math.NET Numerics statistics package to do data analysis in C#.
I am looking for the cross-correlation function. Does Math.NET have an API for this?
Previously I have been using MATLAB's xcorr or Python's numpy.correlate, so I am looking for a C# equivalent of these.
I have looked through their documentation, but it isn't very straightforward.
https://numerics.mathdotnet.com/api/

Correlation can be calculated by any of the methods from MathNet.Numerics.Statistics.Correlation, like Pearson or Spearman. But if you're looking for results like the ones provided by MATLAB's xcorr or autocorr, then you have to manually calculate the correlation with those methods for each lag/delay value between your input samples. Note that this example includes both cross- and auto-correlation.
double fs = 50; //sampling rate, Hz
double te = 1; //end time, seconds
int size = (int)(fs * te); //sample size
var t = Enumerable.Range(0, size).Select(p => p / fs).ToArray();
var y1 = t.Select(p => p < te / 2 ? 1.0 : 0).ToArray();
var y2 = t.Select(p => p < te / 2 ? 1.0 - 2*p : 0).ToArray();
var r12 = StatsHelper.CrossCorrelation(y1, y2); // Y1 * Y2
var r21 = StatsHelper.CrossCorrelation(y2, y1); // Y2 * Y1
var r11 = StatsHelper.CrossCorrelation(y1, y1); // Y1 * Y1 autocorrelation
StatsHelper:
public static class StatsHelper
{
    // requires: using MathNet.Numerics.Statistics;
    public static LagCorr CrossCorrelation(double[] x1, double[] x2)
    {
        if (x1.Length != x2.Length)
            throw new ArgumentException("Samples must have the same size.");

        var len = x1.Length;
        var len2 = 2 * len;
        var len3 = 3 * len;
        var s1 = new double[len3];
        var s2 = new double[len3];
        var cor = new double[len2];
        var lag = new double[len2];

        // Zero-pad: x1 sits in the middle third of s1, x2 starts at the left edge of s2.
        Array.Copy(x1, 0, s1, len, len);
        Array.Copy(x2, 0, s2, 0, len);

        // Correlate at each lag, then slide s2 one sample to the right.
        for (int i = 0; i < len2; i++)
        {
            cor[i] = Correlation.Pearson(s1, s2);
            lag[i] = i - len;
            Array.Copy(s2, 0, s2, 1, s2.Length - 1);
            s2[0] = 0;
        }
        return new LagCorr { Corr = cor, Lag = lag };
    }
}
LagCorr:
public class LagCorr
{
    public double[] Lag { get; set; }
    public double[] Corr { get; set; }
}
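To extract the delay between two signals from the result, take the lag where the correlation peaks. A minimal usage sketch (my own addition, reusing y1 and y2 from above):

var r = StatsHelper.CrossCorrelation(y1, y2);
int iMax = 0;
for (int i = 1; i < r.Corr.Length; i++)
    if (r.Corr[i] > r.Corr[iMax]) iMax = i; // index of the peak correlation
Console.WriteLine($"peak correlation {r.Corr[iMax]:F3} at lag {r.Lag[iMax]}");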
EDIT: Adding Matlab comparison results:
clear;
step=0.02;
t=[0:step:1-step];
y1=ones(1,50);
y1(26:50)=0;
y2=[1-2*t];
y2(26:50)=0;
[cor12,lags12]=xcorr(y1,y2);
[cor21,lags21]=xcorr(y2,y1);
[cor11,lags11]=xcorr(y1,y1);
[cor22,lags22]=xcorr(y2,y2);
subplot(2,3,1);
plot(t,y1);
title('Y1');
axis([0 1 -0.5 1.5]);
subplot(2,3,2);
plot(lags12,cor12);
title('Y1*Y2');
axis([-30 30 0 15]);
subplot(2,3,3);
plot(lags11,cor11);
title('Y1*Y1');
axis([-30 30 0 30]);
subplot(2,3,4);
plot(t,y2);
title('Y2');
axis([0 1 -0.5 1.5]);
subplot(2,3,5);
plot(lags21,cor21);
title('Y2*Y1');
axis([-30 30 0 15]);
subplot(2,3,6);
plot(lags22,cor22);
title('Y2*Y2');
axis([-30 30 0 10]);

I have tried the above solution with a sine wave that was shifted backwards by 20 time units with respect to a first sine wave. It gave me the correct result that the maximum of the correlation is at -20 (see below). One could discuss whether it's appropriate to apply zero padding, since the zeros are not usually part of the signal. The MATLAB cross-correlation is not normalized the same way; it is not a "Pearson correlation" as in the example above.
The definition of the MATLAB cross-correlation is different: for scaling option "none" it is a convolution with the time-reversed signal, i.e. r(m) = sum over n of x(n+m)*y(n). There are also various scaling options ("biased", "unbiased", "coeff"), but none of them gives the same result as the Pearson correlation: see the MATLAB definition of xcorr.
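For reference, here is a minimal C# sketch of that raw ("none"-scaled) definition: a plain sliding dot product with no normalization (the method name and tuple return are my own):

// Raw (unscaled) cross-correlation, as MATLAB's xcorr(x, y, 'none') computes it:
// r[m] = sum over n of x[n + m] * y[n], for lags m = -(N-1) .. (N-1).
public static (double[] Lags, double[] Corr) RawCrossCorrelation(double[] x, double[] y)
{
    int n = x.Length; // assumes equal-length inputs, as xcorr pads internally
    var lags = new double[2 * n - 1];
    var corr = new double[2 * n - 1];
    for (int m = -(n - 1); m <= n - 1; m++)
    {
        double sum = 0;
        for (int i = 0; i < n; i++)
        {
            int j = i + m; // index into x shifted by the lag
            if (j >= 0 && j < n)
                sum += x[j] * y[i];
        }
        lags[m + n - 1] = m;
        corr[m + n - 1] = sum;
    }
    return (lags, corr);
}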
My result: cross correlation of sin(n*0.1) with sin(n*0.1 - 20*0.1) using the example above:


How to use the Nelder-Mead Simplex algorithm in Math.NET for function maximization

In my C# program I have a dataset where each data point consists of:
a stimulus intensity (intensity) as x-coordinate
the percentage of correct response (percentageCorrect) to stimulus as y-coordinate
When the intensity is low, percentageCorrect is low. When the intensity is high, percentageCorrect is high. The function graph is an S-shaped curve, as percentageCorrect reaches an asymptote at the low and high ends.
I am trying to find the threshold intensity where percentageCorrect is halfway between the asymptotes at either end (the center of the S-shaped curve).
I understand this to be a function maximization problem that can be solved by the Nelder-Mead Simplex algorithm.
I am trying to solve my problem using the Nelder-Mead Simplex algorithm in Math.NET and its IObjectiveFunction parameter.
However, I am having trouble understanding the API of the NelderMeadSimplex class's FindMinimum method and the IObjectiveFunction's EvaluateAt method.
I am new to the numerical analysis that is a prerequisite for this question.
Specific questions are:
For the NelderMeadSimplex class's FindMinimum method, what are the initialGuess and initialPertubation parameters?
For the IObjectiveFunction's EvaluateAt method, what is the point parameter? I vaguely understand that the point parameter is a datum in the dataset being minimized.
How can I map my data set to this API and solve my problem?
Thanks for any guidance on this.
The initial guess is a guess at the model parameters.
I've always used the forms that don't require an entry of the initialPertubation parameter, so I can't help you there.
The objective function is what you are trying to minimize. For example, for a least-squares fit, it would calculate the sum of squared errors at the point given in the argument. Something like this:
private double SumSqError(Vector<double> v)
{
    double err = 0;
    for (int i = 0; i < 100; i++)
    {
        // v holds the model parameters; x and y are class-level data arrays
        double y_val = v[0] + v[1] * Math.Exp(v[2] * x[i]);
        err += Math.Pow(y_val - y[i], 2);
    }
    return err;
}
You don't have to supply the point. The algorithm does that over and over while searching for the minimum. Note that the subroutine has access to the arrays x and y.
Here is the code for a test program fitting a function to random data:
private void btnMinFit_Click(object sender, EventArgs e)
{
    Random RanGen = new Random();
    x = new double[100];
    y = new double[100];
    // fit an exponential expression with three parameters
    double a = 5.0;
    double b = 0.5;
    double c = 0.05;
    // create the data set
    for (int i = 0; i < 100; i++) x[i] = 10 + Convert.ToDouble(i) * 90.0 / 99.0; // values span 10 to 100
    for (int i = 0; i < 100; i++)
    {
        double y_val = a + b * Math.Exp(c * x[i]);
        y[i] = y_val + 0.1 * RanGen.NextDouble() * y_val; // add an error term scaled to the y-value
    }
    var f1 = new Func<Vector<double>, double>(x => LogEval(x)); // LogEval: the objective function (not shown)
    var obj = ObjectiveFunction.Value(f1);
    var solver = new NelderMeadSimplex(1e-5, maximumIterations: 10000);
    var initialGuess = new DenseVector(new[] { 3.0, 6.0, 0.6 });
    var result = solver.FindMinimum(obj, initialGuess);
    Console.WriteLine(result.MinimizingPoint.ToString());
}
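The listing above calls a LogEval helper that isn't shown. A minimal stand-in, assuming it plays the same role as SumSqError above (the "Log" in the original name suggests a log-scale variant, but that detail isn't shown):

// Hypothetical stand-in for the missing LogEval: the same sum-of-squared-errors
// objective as SumSqError, evaluated against the class-level x and y arrays.
private double LogEval(Vector<double> v)
{
    double err = 0;
    for (int i = 0; i < 100; i++)
    {
        double y_val = v[0] + v[1] * Math.Exp(v[2] * x[i]);
        err += Math.Pow(y_val - y[i], 2);
    }
    return err;
}

For the original psychometric-function question, the same pattern applies: replace the exponential model with a sigmoid such as v[0] + v[1] / (1 + Math.Exp(-(x[i] - v[2]) / v[3])), where v[2] is the threshold intensity being sought.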

How to correctly calculate Fisher Transform indicator

I'm writing a small technical analysis library for items that are not available in TA-Lib. I've started with an example I found on cTrader and matched it against the code found in the TradingView version.
Here's the Pine Script code from TradingView:
len = input(9, minval=1, title="Length")
high_ = highest(hl2, len)
low_ = lowest(hl2, len)
round_(val) => val > .99 ? .999 : val < -.99 ? -.999 : val
value = 0.0
value := round_(.66 * ((hl2 - low_) / max(high_ - low_, .001) - .5) + .67 * nz(value[1]))
fish1 = 0.0
fish1 := .5 * log((1 + value) / max(1 - value, .001)) + .5 * nz(fish1[1])
fish2 = fish1[1]
Here's my attempt to implement the indicator:
public class FisherTransform : IndicatorBase
{
    public int Length = 9;
    public decimal[] Fish { get; set; }
    public decimal[] Trigger { get; set; }
    decimal _maxHigh;
    decimal _minLow;
    private decimal _value1;
    private decimal _lastValue1;

    public FisherTransform(IEnumerable<Candle> candles, int length)
        : base(candles)
    {
        Length = length;
        RequiredCount = Length;
        _lastValue1 = 1;
    }

    protected override void Initialize()
    {
        Fish = new decimal[Series.Length];
        Trigger = new decimal[Series.Length];
    }

    public override void Compute(int startIndex = 0, int? endIndex = null)
    {
        if (endIndex == null)
            endIndex = Series.Length;
        for (int index = 0; index < endIndex; index++)
        {
            if (index == 1)
            {
                Fish[index - 1] = 1;
            }
            _minLow = Series.Average.Lowest(Length, index);
            _maxHigh = Series.Average.Highest(Length, index);
            _value1 = Maths.Normalize(0.66m * ((Maths.Divide(Series.Average[index] - _minLow, Math.Max(_maxHigh - _minLow, 0.001m)) - 0.5m) + 0.67m * _lastValue1));
            _lastValue1 = _value1;
            Fish[index] = 0.5m * Maths.Log(Maths.Divide(1 + _value1, Math.Max(1 - _value1, .001m))) + 0.5m * Fish[index - 1];
            Trigger[index] = Fish[index - 1];
        }
    }
}
IndicatorBase class and CandleSeries class
Math Helpers
The problem
The output values appear to be within the expected range; however, my Fisher Transform cross-overs do not match up with what I am seeing on TradingView's version of the indicator.
Question
How do I properly implement the Fisher Transform indicator in C#? I'd like this to match TradingView's Fisher Transform output.
What I Know
I've checked my data against other indicators that I have personally written, and against indicators from TA-Lib, and those indicators pass my unit tests. I've also checked my data against the TradingView data candle by candle and found that my data matches as expected, so I don't suspect my data is the issue.
Specifics
CSV Data - NFLX 5 min agg
Pictured below is the above-shown Fisher Transform code applied to a TradingView chart (Fisher in cyan, Trigger in magenta). My goal is to match this output as closely as possible.
Expected Outputs:
Crossover completed at 15:30 ET
Approx Fisher Value is 2.86
Approx Trigger Value is 1.79
Crossover completed at 10:45 ET
Approx Fisher Value is -3.67
Approx Trigger Value is -3.10
My Actual Outputs:
Crossover completed at 15:30 ET
My Fisher Value is 1.64
My Trigger Value is 1.99
Crossover completed at 10:45 ET
My Fisher Value is -1.63
My Trigger Value is -2.00
Bounty
To make your life easier I'm including a small console application
complete with passing and failing unit tests. All unit tests are
conducted against the same data set. The passing unit tests are from a
tested working Simple Moving Average indicator. The failing unit
tests are against the Fisher Transform indicator in question.
Project Files (updated 5/14)
Help get my FisherTransform tests to pass and I'll award the bounty.
Just comment if you need any additional resources or information.
Alternative Answers that I'll consider
Submit your own working FisherTransform in C#
Explain why my FisherTransform is actually working as expected
The code has two errors, plus an issue with the test values.
1) Wrong extra brackets. The correct line is:
_value1 = Maths.Normalize(0.66m * (Maths.Divide(Series.Average[index] - _minLow, Math.Max(_maxHigh - _minLow, 0.001m)) - 0.5m) + 0.67m * _lastValue1);
2) The min and max functions must be:
public static decimal Highest(this decimal[] series, int length, int index)
{
    var maxVal = series[index]; // <----- HERE WAS AN ERROR!
    var lookback = Math.Max(index - length, 0);
    for (int i = index; i-- > lookback;)
        maxVal = Math.Max(series[i], maxVal);
    return maxVal;
}

public static decimal Lowest(this decimal[] series, int length, int index)
{
    var minVal = series[index]; // <----- HERE WAS AN ERROR!
    var lookback = Math.Max(index - length, 0);
    for (int i = index; i-- > lookback;)
    {
        //if (series[i] != 0) // <----- HERE WAS AN ERROR!
        minVal = Math.Min(series[i], minVal);
    }
    return minVal;
}
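A quick sanity check of the fixed helpers (values made up for illustration):

var series = new decimal[] { 3m, 1m, 4m, 1m, 5m };
Console.WriteLine(series.Highest(3, 4)); // 5: max over the lookback window ending at index 4
Console.WriteLine(series.Lowest(3, 4));  // 1: min over the lookback window ending at index 4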
3) Confusing test params. Please recheck your unit test values. AFTER THE UPDATE THE TESTS ARE STILL NOT FIXED. For example, the first FisherTransforms_ValuesAreReasonablyClose_First() has mixed-up values:
var fish = result.Fish.Last(); //is equal to -3.1113144510775780365063063706
var trig = result.Trigger.Last(); //is equal to -3.6057793808025449204415435710
// TradingView Values for NFLX 5m chart at 10:45 ET
var fisherValue = -3.67m;
var triggerValue = -3.10m;

NAudio FFT returns small and equal magnitude values for all frequencies

I'm working on a project with NAudio 1.9 and I want to compute an FFT for an entire song, i.e. split the song into chunks of equal size and compute the FFT of each chunk. The problem is that the NAudio FFT function returns really small and nearly equal values for every frequency in the spectrum.
I searched for previous related posts but none seemed to help me.
The code that computes FFT using NAudio:
public IList<FrequencySpectrum> Fft(uint windowSize) {
    IList<Complex[]> timeDomainChunks = this.SplitInChunks(this.audioContent, windowSize);
    return timeDomainChunks.Select(this.ToFrequencySpectrum).ToList();
}

private IList<Complex[]> SplitInChunks(float[] audioContent, uint chunkSize) {
    IList<Complex[]> splittedContent = new List<Complex[]>();
    for (uint k = 0; k < audioContent.Length; k += chunkSize) {
        long size = k + chunkSize < audioContent.Length ? chunkSize : audioContent.Length - k;
        Complex[] chunk = new Complex[size];
        for (int i = 0; i < chunk.Length; i++) {
            // I've tried windowing here but it didn't seem to help me
            chunk[i].X = audioContent[k + i];
            chunk[i].Y = 0;
        }
        splittedContent.Add(chunk);
    }
    return splittedContent;
}

private FrequencySpectrum ToFrequencySpectrum(Complex[] timeDomain) {
    int m = (int) Math.Log(timeDomain.Length, 2);
    // true = forward fft
    FastFourierTransform.FFT(true, m, timeDomain);
    return new FrequencySpectrum(timeDomain, 44100);
}
The FrequencySpectrum:
public struct FrequencySpectrum {
    private readonly Complex[] frequencyDomain;
    private readonly uint samplingFrequency;

    public FrequencySpectrum(Complex[] frequencyDomain, uint samplingFrequency) {
        if (frequencyDomain.Length == 0) {
            throw new ArgumentException("Argument value must be greater than 0", nameof(frequencyDomain));
        }
        if (samplingFrequency == 0) {
            throw new ArgumentException("Argument value must be greater than 0", nameof(samplingFrequency));
        }
        this.frequencyDomain = frequencyDomain;
        this.samplingFrequency = samplingFrequency;
    }

    // returns magnitude for freq
    public float this[uint freq] {
        get {
            if (freq >= this.samplingFrequency) {
                throw new IndexOutOfRangeException();
            }
            // find corresponding bin
            float k = freq / ((float) this.samplingFrequency / this.FftWindowSize);
            Complex c = this.frequencyDomain[checked((uint) k)];
            return (float) Math.Sqrt(c.X * c.X + c.Y * c.Y);
        }
    }
}
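For reference, a usage sketch of the indexer's bin mapping (assuming a spectrum built from a 4096-sample chunk at 44100 Hz, and that FftWindowSize is the length of the array, as described in the answer below):

var spectra = Fft(4096);          // split and transform, as defined above
float magAt440 = spectra[0][440]; // 440 Hz maps to bin 440 / (44100.0 / 4096), i.e. bin 40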
For a file that contains a 440 Hz sine wave:
expected output: values like 0.5 for freq = 440 and 0 for the others
actual output: values like 0.000168153987f for every freq in the spectrum
It seems that I made 4 mistakes:
1) Here I'm assuming that the sampling freq is 44100. This was not the reason my code wasn't working, though:
return new FrequencySpectrum(timeDomain, 44100);
2) Always make a visual representation of your output data! I must learn this lesson... It seems that for a file containing a 440Hz sine wave I'm getting the right result but...
3) The frequency spectrum is a little shifted from what I was expecting because of this:
int m = (int) Math.Log(timeDomain.Length, 2);
FastFourierTransform.FFT(true, m, timeDomain);
timeDomain is an array of size 44100 because that's the value of windowSize (I called the method with windowSize = 44100), but the FFT method expects a window size that is a power of 2. I'm effectively saying: "Here, NAudio, compute me the FFT of this array that has 44100 elements, but take into account only the first 32768." I didn't realize that this was going to have serious implications on the result:
float k = freq / ((float) this.samplingFrequency / this.FftWindowSize);
Here this.FftWindowSize is a property based on the size of the array, not on m. So, after visualizing the result, I found out that the magnitude of the 440 Hz freq actually corresponded to the call:
spectrum[371]
instead of
spectrum[440]
So, my mistake was that the FFT window size (m) did not correspond to the actual length of the array (FrequencySpectrum.FftWindowSize). A fix is sketched after this list.
4) The small values that I was receiving for the magnitudes came from the fact that the audio file on which I was testing my code wasn't recorded with enough gain.
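One way to fix mistake 3 is to pad each chunk with zeros up to the next power of two before transforming, so that m = log2(length) describes the whole array. A minimal sketch, assuming NAudio's NAudio.Dsp.Complex and FastFourierTransform:

// Pad a chunk with zeros up to the next power of two, so the FFT order m
// matches the actual array length. Zero padding does not shift frequencies;
// it only interpolates the spectrum, and the bin width fs / FftWindowSize
// stays consistent with the padded length.
private static Complex[] PadToPowerOfTwo(Complex[] chunk)
{
    int padded = 1;
    while (padded < chunk.Length)
        padded <<= 1;
    if (padded == chunk.Length)
        return chunk;
    var result = new Complex[padded]; // extra entries default to X = Y = 0
    Array.Copy(chunk, result, chunk.Length);
    return result;
}

// usage inside ToFrequencySpectrum:
// timeDomain = PadToPowerOfTwo(timeDomain);
// int m = (int) Math.Log(timeDomain.Length, 2); // now exact
// FastFourierTransform.FFT(true, m, timeDomain);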

Alglib Data fitting with minlmoptimize does not minimize the results. Full c# included

I'm having trouble implementing the LM optimizer in the alglib library. I'm not sure why the parameters are hardly changing at all while I still receive an exit code of 4. I have been unable to determine what I am doing wrong from the alglib documentation. Below is the full source I am running:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Threading.Tasks;
namespace FBkineticsFitter
{
class Program
{
public static int Main(string[] args)
{
/*
* This code finds the parameters ka, kd, and Bmax from the minimization of the residuals using "V" mode of the Levenberg-Marquardt optimizer (alglib library).
* This optimizer is used because the equation is non-linear and this particular version of the optimizer does not require the ab initio calculation of partial
* derivatives, a Jacobian matrix, or other parameter-space definitions, so its implementation is simple.
*
* The equations being solved represent a model of a protein-protein interaction where protein in solution is interacting with immobilized protein on a sensor
* in a 1:1 stoichiometry. Mass transport limit is not taken into account. The details of this equation are described in:
* R.B.M. Schasfoort and Anna J. Tudos Handbook of Surface Plasmon Resonance, 2008, Chapter 5, ISBN: 978-0-85404-267-8
*
* Y=((ka*Cpro*Bmax)/(ka*Cpro+kd))*(1-exp(-1*X*(ka*Cpro+kd))) ; this equation describes the association phase
*
* Y=Req*exp(-1*X*kd) ; this equation describes the dissociation phase
*
* The data are fit globally such that Bmax and Req parameters are linked and kd parameters are linked during simultaneous optimization for the most robust fit
*
* Y= signal
* X= time
* ka= association constant
* kd= dissociation constant
* Bmax= maximum binding capacity at equilibrium
* Req=(Cpro/(Cpro+kobs))*Bmax :. in this case Req=Bmax because Cpro=0 during the dissociation step
* Cpro= concentration of protein in solution
*
* additional calculations:
* kobs=ka*Cpro
* kD=kd/ka
*/
GetRawDataXY(@"C:\Results.txt");
double epsg = .0000001;
double epsf = 0;
double epsx = 0;
int maxits = 0;
alglib.minlmstate state;
alglib.minlmreport rep;
alglib.minlmcreatev(2, GlobalVariables.param, 0.0001, out state);
alglib.minlmsetcond(state, epsg, epsf, epsx, maxits);
alglib.minlmoptimize(state, Calc_residuals, null, null);
alglib.minlmresults(state, out GlobalVariables.param, out rep);
System.Console.WriteLine("{0}", rep.terminationtype); ////1=relative function improvement is no more than EpsF. 2=relative step is no more than EpsX. 4=gradient norm is no more than EpsG. 5=MaxIts steps was taken. 7=stopping conditions are too stringent,further improvement is impossible, we return best X found so far. 8= terminated by user
System.Console.WriteLine("{0}", alglib.ap.format(GlobalVariables.param, 20));
System.Console.ReadLine();
return 0;
}
public static void Calc_residuals(double[] param, double[] fi, object obj)
{
/*calculate the difference of the model and the raw data at each X (I.E. residuals)
* the sum of the square of the residuals is returned to the optimized function to be minimized*/
fi[0] = 0;
fi[1] = 0;
for (int i = 0; i < GlobalVariables.rawXYdata[0].Count();i++ )
{
if (GlobalVariables.rawXYdata[1][i] <= GlobalVariables.breakpoint)
{
fi[0] += System.Math.Pow((kaEQN(GlobalVariables.rawXYdata[0][i]) - GlobalVariables.rawXYdata[1][i]), 2);
}
else
{
fi[1] += System.Math.Pow((kdEQN(GlobalVariables.rawXYdata[0][i]) - GlobalVariables.rawXYdata[1][i]), 2);
}
}
}
public static double kdEQN(double x)
{
/*Calculate kd Y value based on the incremented parameters*/
return GlobalVariables.param[2] * Math.Exp(-1 * x * GlobalVariables.param[1]);
}
public static double kaEQN(double x)
{
/*Calculate ka Y value based on the incremented parameters*/
return ((GlobalVariables.param[0] * GlobalVariables.Cpro * GlobalVariables.param[2]) / (GlobalVariables.param[0] * GlobalVariables.Cpro + GlobalVariables.param[1])) * (1 - Math.Exp(-1 * x * (GlobalVariables.param[0] * GlobalVariables.Cpro + GlobalVariables.param[1])));
}
public static void GetRawDataXY(string filename)
{
/*Read in Raw data From tab delim txt*/
string[] elements = { "x", "y" };
int count = 0;
GlobalVariables.rawXYdata[0] = new double[1798];
GlobalVariables.rawXYdata[1] = new double[1798];
using (StreamReader sr = new StreamReader(filename))
{
while (sr.Peek() >= 0)
{
elements = sr.ReadLine().Split('\t');
GlobalVariables.rawXYdata[0][count] = Convert.ToDouble(elements[0]);
GlobalVariables.rawXYdata[1][count] = Convert.ToDouble(elements[1]);
count++;
}
}
}
public class GlobalVariables
{
public static double[] param = new double[] { 1, .02, 0.13 }; ////ka,kd,Bmax these are initial guesses for the algorithm
public static double[][] rawXYdata = new double[2][];
public static double Cpro = 100E-9;
public static double kD = 0;
public static double breakpoint = 180;
}
}
}
According to Sergey Bochkanov, the issue is the following:
"You should use param[] array which is provided to you by optimizer. It creates its internal copy of your param, and updates this copy - not your param array.
From the optimizer point of view, it has function which never changes when it changes its internal copy of param. So, it terminates right after first iteration."
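In miniature, the fix looks like this (a sketch; the full corrected program follows):

// WRONG: reads the global array, which the optimizer never updates, so the
// objective looks constant and minlm stops after the first iteration.
public static double kdEQN(double x)
{
    return GlobalVariables.param[2] * Math.Exp(-1 * x * GlobalVariables.param[1]);
}

// RIGHT: read the param[] array the optimizer passes into the callback;
// that is the internal copy it actually perturbs on each iteration.
public static void Calc_residuals(double[] param, double[] fi, object obj)
{
    double kd = param[1], rmax = param[2];
    // ...evaluate the model with kd and rmax, and accumulate residuals into fi...
}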
Here is the updated and working example code:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Threading.Tasks;
namespace FBkineticsFitter
{
class Program
{
public static int Main(string[] args)
{
/*
* This code finds the parameters ka, kd, and Bmax from the minimization of the residuals using "V" mode of the Levenberg-Marquardt optimizer (alglib library).
* This optimizer is used because the equation is non-linear and this particular version of the optimizer does not require the ab initio calculation of partial
* derivatives, a Jacobian matrix, or other parameter-space definitions, so its implementation is simple.
*
* The equations being solved represent a model of a protein-protein interaction where protein in solution is interacting with immobilized protein on a sensor
* in a 1:1 stoichiometry. Mass transport limit is not taken into account. The details of this equation are described in:
* R.B.M. Schasfoort and Anna J. Tudos Handbook of Surface Plasmon Resonance, 2008, Chapter 5, ISBN: 978-0-85404-267-8
*
* Y=((Cpro*Rmax)/(Cpro+kd))*(1-exp(-1*X*(ka*Cpro+kd))) ; this equation describes the association phase
*
* Y=Req*exp(-1*X*kd)+NS ; this equation describes the dissociation phase
*
* According to ForteBio's Application Notes #14 the amplitudes of the data can be correctly accounted for by modifying the above equations as follows:
*
* Y=(Rmax*(1/(1+(kd/(ka*Cpro))))*(1-exp(((-1*Cpro)+kd)*X)) ; this equation describes the association phase
*
* Y=Y0*(exp(-1*kd*(X-X0))) ; this equation describes the dissociation phase
*
*
*
* The data are fit simultaneously such that all fitting parameters are linked during optimization for the most robust fit
*
* Y= signal
* X= time
* ka= association constant [fitting parameter 0]
* kd= dissociation constant [fitting parameter 1]
* Rmax= maximum binding capacity at equilibrium [fitting parameter 2]
* KD=kd/ka
* kobs=ka*Cpro+kd
* Req=(Cpro/(Cpro+KD))*Rmax
* Cpro= concentration of protein in solution
* NS= non-specific binding at time=infinity (constant correction for end point of fit) [this is taken into account in the amplitude corrected formula: Y0=Ylast]
* Y0= the initial value of Y for the first point of the dissociation curve (I.E. the last point of the association phase)
* X0= the initial value of X for the first point of the dissociation phase
*
*/
GetRawDataXY(@"C:\Results.txt");
double epsg = .00001;
double epsf = 0;
double epsx = 0;
int maxits = 10000;
alglib.minlmstate state;
alglib.minlmreport rep;
double[] param = new double[] { 1000000, .0100, 0.20};////ka,kd,Rmax these are initial guesses for the algorithm and should be mid range for the expected data., The last parameter Rmax should be guessed as the maximum Y-value of Ka
double[] scaling= new double[] { 1E6,1,1};
alglib.minlmcreatev(2, param, 0.001, out state);
alglib.minlmsetcond(state, epsg, epsf, epsx, maxits);
alglib.minlmsetgradientcheck(state, 1);
alglib.minlmsetscale(state, scaling);
alglib.minlmoptimize(state, Calc_residuals, null, V.rawXYdata);
alglib.minlmresults(state, out param, out rep);
System.Console.WriteLine("{0}", rep.terminationtype); ////1=relative function improvement is no more than EpsF. 2=relative step is no more than EpsX. 4=gradient norm is no more than EpsG. 5=MaxIts steps was taken. 7=stopping conditions are too stringent,further improvement is impossible, we return best X found so far. 8= terminated by user
System.Console.WriteLine("{0}", alglib.ap.format(param, 25));
System.Console.ReadLine();
return 0;
}
public static void Calc_residuals(double[] param, double[] fi, object obj)
{
/*calculate the difference of the model and the raw data at each X (I.E. residuals)
* the sum of the square of the residuals is returned to the optimized function to be minimized*/
CalcVariables(param);
fi[0] = 0;
fi[1] = 0;
for (int i = 0; i < V.rawXYdata[0].Count(); i++)
{
if (V.rawXYdata[0][i] <= V.breakpoint)
{
fi[0] += System.Math.Pow((kaEQN(V.rawXYdata[0][i], param) - V.rawXYdata[1][i]), 2);
}
else
{
if (!V.breakpointreached)
{
V.breakpointreached = true;
V.X_0 = V.rawXYdata[0][i];
V.Y_0 = V.rawXYdata[1][i];
}
fi[1] += System.Math.Pow((kdEQN(V.rawXYdata[0][i], param) - V.rawXYdata[1][i]), 2);
}
}
if (param[0] <= 0 || param[1] <= 0 || param[2] <= 0) ////Squares the error if the parameters go negative, to favor positive non-zero values
{
fi[0] = Math.Pow(fi[0], 2);
fi[1] = Math.Pow(fi[1], 2);
}
System.Console.WriteLine("{0}"+" "+V.Cpro+" -->"+fi[0], alglib.ap.format(param, 5));
Console.WriteLine((kdEQN(V.rawXYdata[0][114], param)));
}
public static double kdEQN(double X, double[] param)
{
/*Calculate kd Y value based on the incremented parameters*/
return (V.Rmax * (1 / (1 + (V.kd / (V.ka * V.Cpro)))) * (1 - Math.Exp((-1 * V.ka * V.Cpro) * V.X_0))) * Math.Exp((-1 * V.kd) * (X - V.X_0));
}
public static double kaEQN(double X, double[] param)
{
/*Calculate ka Y value based on the incremented parameters*/
return ((V.Cpro * V.Rmax) / (V.Cpro + V.kd)) * (1 - Math.Exp(-1 * X * ((V.ka * V.Cpro) + V.kd)));
}
public static void GetRawDataXY(string filename)
{
/*Read in Raw data From tab delim txt*/
string[] elements = { "x", "y" };
int count = 0;
V.rawXYdata[0] = new double[226];
V.rawXYdata[1] = new double[226];
using (StreamReader sr = new StreamReader(filename))
{
while (sr.Peek() >= 0)
{
elements = sr.ReadLine().Split('\t');
V.rawXYdata[0][count] = Convert.ToDouble(elements[0]);
V.rawXYdata[1][count] = Convert.ToDouble(elements[1]);
count++;
}
}
}
public class V
{
/*Global Variables*/
public static double[][] rawXYdata = new double[2][];
public static double Cpro = 100E-9;
public static bool breakpointreached = false;
public static double X_0 = 0;
public static double Y_0 = 0;
public static double ka = 0;
public static double kd = 0;
public static double Rmax = 0;
public static double KD = 0;
public static double Kobs = 0;
public static double Req = 0;
public static double breakpoint = 180;
}
public static void CalcVariables(double[] param)
{
V.ka = param[0];
V.kd = param[1];
V.Rmax = param[2];
V.KD = param[1] / param[0];
V.Kobs = param[0] * V.Cpro + param[1];
V.Req = (V.Cpro / (V.Cpro + param[0] * V.Cpro + param[1])) * param[2];
}
}
}

How can I return a number based on a skewed normal distribution?

If I want to get e^x, as you can see in this figure, I can just call Math.Exp(x);
What I want to do is make a function that returns y for my own graph (like this), which is a normal distribution skewed left or right or not skewed at all. It will have some standard deviation and some maximum height.
I've been googling and thinking about how to do it for a while, but my math skills just aren't good enough. I was hoping I could get some help with this.
First, a skewed normal random point (x) will be created; then the probability density function (PDF) at that point will be found.
The math used here depends on Ermak and Nasstrom's 1995 study. If you like, take a look at the sample Fortran 77 code in that publication (most of the variable names used in the code do not fit C# naming conventions, simply because I wanted the reader to relate them to the original paper).
private static readonly Random random = new Random(); // one shared instance; newing up Random per call can repeat seeds

private static double GetSkewedRandomNumber(double standardDeviation = 1, double skewness = 0, int dbIteration = 10)
{
    var variance = Math.Pow(standardDeviation, 2);
    const double a = 2.236067977; // --> square root of 5
    const double b = 0.222222222;
    const double c = 243.0 / 32;  // note: the original 243 / 32 truncates to 7 in integer arithmetic
    double finalrun, sumdbran = 0;
    double dbmom3, dbmean1, dbmean2, dbprob1, dbdelta1, dbdelta2, randomNumber1, randomNumber2, dbran, terma, termb;
    dbmom3 = Math.Sqrt(dbIteration) * skewness * Math.Pow(variance, 1.5);
    terma = b / variance;
    termb = Math.Sqrt(Math.Pow(dbmom3, 2) + c * Math.Pow(variance, 3));
    dbmean1 = terma * (dbmom3 - termb);
    dbmean2 = terma * (dbmom3 + termb);
    dbprob1 = dbmean2 / (2 * a * dbmean1 * (dbmean1 - dbmean2));
    dbdelta1 = -a * dbmean1;
    dbdelta2 = a * dbmean2;
    // Loop for summation of double-block random numbers in each final random number
    for (int i = 0; i < dbIteration; i++)
    {
        randomNumber1 = random.NextDouble();
        randomNumber2 = random.NextDouble();
        if (randomNumber1 < (2 * dbdelta1 * dbprob1))
            dbran = dbmean1 + (2 * dbdelta1 * (randomNumber2 - 0.5));
        else
            dbran = dbmean2 + (2 * dbdelta2 * (randomNumber2 - 0.5));
        // sumdbran is the sum of the double-block random numbers created by the iteration
        sumdbran = sumdbran + dbran;
    }
    // Calculate the final skewed normal random number
    finalrun = sumdbran / Math.Sqrt(dbIteration);
    return finalrun;
}
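A quick usage sketch, drawing a few samples from a left-skewed distribution (parameter values are arbitrary):

for (int i = 0; i < 5; i++)
    Console.WriteLine(GetSkewedRandomNumber(standardDeviation: 1, skewness: -0.75));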
And the code below is for the PDF:
public static double GetPDF(double mean, double standardDeviation, double skewness, double x)
{
    // Standard skew-normal density: 2 * phi(x) * Phi(skewness * x).
    // Note that mean and standardDeviation are not used here; this is the
    // standard form with location 0 and scale 1.
    var variable = skewness * x;
    var normalPDF = NormalDistribution.GetPDF(0, 1, x);
    var normalCDF = NormalDistribution.GetCDF(0, 1, variable);
    return 2 * normalPDF * normalCDF;
}
You can implement the NormalDistribution.GetPDF and NormalDistribution.GetCDF methods yourself (mentioned in the GetPDF method). They simply calculate the probability density function and cumulative distribution function of a normal distribution. To keep it simple and focus on the question, I preferred not to add the code. For those who also want to check the CDF calculation of the skewed normal distribution, please check here.
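Since those two helpers are left to the reader, here is a minimal sketch of what they could look like: the standard normal PDF formula, and a CDF built on the classic Abramowitz & Stegun erf approximation (maximum absolute error about 1.5e-7). The class and method names match the ones referenced in GetPDF above; the Erf helper is my own addition.

public static class NormalDistribution
{
    // Normal probability density: exp(-(x - mean)^2 / (2 sd^2)) / (sd * sqrt(2 pi))
    public static double GetPDF(double mean, double standardDeviation, double x)
    {
        double z = (x - mean) / standardDeviation;
        return Math.Exp(-0.5 * z * z) / (standardDeviation * Math.Sqrt(2 * Math.PI));
    }

    // Normal cumulative distribution, via the error function
    public static double GetCDF(double mean, double standardDeviation, double x)
    {
        return 0.5 * (1 + Erf((x - mean) / (standardDeviation * Math.Sqrt(2))));
    }

    // Abramowitz & Stegun formula 7.1.26
    private static double Erf(double x)
    {
        int sign = x < 0 ? -1 : 1;
        x = Math.Abs(x);
        const double a1 = 0.254829592, a2 = -0.284496736, a3 = 1.421413741,
                     a4 = -1.453152027, a5 = 1.061405429, p = 0.3275911;
        double t = 1.0 / (1.0 + p * x);
        double y = 1.0 - ((((a5 * t + a4) * t + a3) * t + a2) * t + a1) * t * Math.Exp(-x * x);
        return sign * y;
    }
}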
Here is an example of a skewed distribution derived from the above code, together with its PDF and CDF graphs:
pdf graph of positive (+0.75) skewed normal distribution
pdf graph of negative (-0.75) skewed normal distribution
cdf graph of positive (0.75) skewed normal distribution
cdf graph of negative (-0.75) skewed normal distribution
