I am trying to rewrite the R function acf that computes Auto-Correlation into C#:
class AC
{
static void Main(string[] args)
{
double[] y = new double[] { 772.9, 909.4, 1080.3, 1276.2, 1380.6, 1354.8, 1096.9, 1066.7, 1108.7, 1109, 1203.7, 1328.2, 1380, 1435.3, 1416.2, 1494.9, 1525.6, 1551.1, 1539.2, 1629.1, 1665.3, 1708.7, 1799.4, 1873.3, 1973.3, 2087.6, 2208.3, 2271.4, 2365.6, 2423.3, 2416.2, 2484.8, 2608.5, 2744.1, 2729.3, 2695, 2826.7, 2958.6, 3115.2, 3192.4, 3187.1, 3248.8, 3166, 3279.1, 3489.9, 3585.2, 3676.5 };
Console.WriteLine(String.Join("\n", acf(y, 17)));
Console.Read();
}
public static double[] acf(double[] series, int maxlag)
{
List<double> acf_values = new List<double>();
float flen = (float)series.Length;
float xbar = ((float)series.Sum()) / flen;
int N = series.Length;
double variance = 0.0;
for (int j = 0; j < N; j++)
{
variance += (series[j] - xbar)*(series[j] - xbar);
}
variance = variance / N;
for (int lag = 0; lag < maxlag + 1; lag++)
{
if (lag == 0)
{
acf_values.Add(1.0);
continue;
}
double autocv = 0.0;
for (int k = 0; k < N - lag; k++)
{
autocv += (series[k] - xbar) * (series[lag + k] - xbar);
}
autocv = autocv / (N - lag);
acf_values.Add(autocv / variance);
}
return acf_values.ToArray();
}
}
I have two problems with this code:
For large arrays (length = 25000), this code takes about 1-2 seconds whereas R's acf function returns in less than 200 ms.
The output does not match R's output exactly.
Any suggestions on where I messed up or any optimizations to the code?
C# R
1 1 1
2 0.945805846 0.925682317
3 0.89060465 0.85270658
4 0.840762283 0.787096604
5 0.806487301 0.737850083
6 0.780259665 0.697253317
7 0.7433111 0.648420319
8 0.690344341 0.587527097
9 0.625632533 0.519141887
10 0.556860982 0.450228026
11 0.488922355 0.38489632
12 0.425406196 0.325843042
13 0.367735169 0.273845337
14 0.299647764 0.216766466
15 0.22344712 0.156888402
16 0.14575994 0.099240809
17 0.072389526 0.047746281
18 -0.003238526 -0.002067146
You might try changing this line:
autocv = autocv / (N - lag);
to this:
autocv = autocv / N;
Either of these is an acceptable divisor for the expected value, and R is clearly using the second one.
To see this without having access to a C# compiler, we can read in the table that you have, and adjust the values by dividing each value in the C# column by N/(N - lag), and see that they agree with the values from R.
N is 47 here, and lag ranges from 0 to 17, so N - lag is 47:30.
After copying the table above into my local clipboard:
cr <- read.table(file='clipboard', comment='', check.names=FALSE)
cr$adj <- cr[[1]]/47*(47:30)
max(abs(cr$R - cr$adj))
## [1] 2.2766e-09
A much closer approximation.
You might do better still if you define flen and xbar as type double, since a float carries only about 7 significant decimal digits, not the 9 shown in the table.
The reason that R is so much faster is that acf is implemented as native and non-managed code (either C or FORTRAN).
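To make the two suggestions above concrete, here is a minimal sketch of the function with the divisor changed to N and every intermediate kept in double. The class name AcfSketch and the tiny five-point series in Main are my own illustration, not part of the original code, and the sketch assumes maxLag is smaller than the series length.

```csharp
using System;
using System.Linq;

public static class AcfSketch
{
    // Sample autocorrelation for lags 0..maxLag. Every autocovariance sum
    // is divided by N (not N - lag) and all arithmetic is done in double.
    public static double[] Acf(double[] series, int maxLag)
    {
        int n = series.Length;
        double mean = series.Average();

        // Lag-0 autocovariance, i.e. the variance with divisor N.
        double c0 = 0.0;
        for (int j = 0; j < n; j++)
        {
            double d = series[j] - mean;
            c0 += d * d;
        }
        c0 /= n;

        var result = new double[maxLag + 1];
        result[0] = 1.0;
        for (int lag = 1; lag <= maxLag; lag++)
        {
            double sum = 0.0;
            for (int k = 0; k < n - lag; k++)
            {
                sum += (series[k] - mean) * (series[k + lag] - mean);
            }
            result[lag] = (sum / n) / c0;
        }
        return result;
    }

    public static void Main()
    {
        // Hypothetical toy input, small enough to check by hand.
        double[] r = Acf(new double[] { 1, 2, 3, 4, 5 }, 2);
        Console.WriteLine(string.Join(" ", r));
    }
}
```

For the toy series the deviations from the mean are -2, -1, 0, 1, 2, so lag 1 works out to 4/10 = 0.4 and lag 2 to -1/10 = -0.1, which is an easy way to check the divisor by hand.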
I am trying to turn the BBP formula (Bailey-Borwein-Plouffe) into C# code. It is a digit-extraction (spigot) algorithm for pi in base 16: you give the index of the digit you want and get that single digit. Let's say I want the digit at index 40000 (in base 16) without having to calculate pi to 40000 places, because I don't care about the other digits.
Anyhow, here is the math formula (it doesn't look like it should be too much code?):
\pi = \sum_{n=0}^{\infty} \frac{1}{16^n} \left( \frac{4}{8n+1} - \frac{2}{8n+4} - \frac{1}{8n+5} - \frac{1}{8n+6} \right)
I can't say I understand 100% what the formula means (if I did, I would probably be able to turn it into code), but this is my understanding from looking at it.
Is this correct?
pseudo code
Pi = SUM = (for int n = 0; n < infinity;n++) { SUM += ((4/((8*n)+1))
- (2/((8*n)+4)) - (1/((8*n)+5)) - (1/((8*n)+6))*((1/16)^n)) }
Capital sigma basically is like a "for loop" to sum sequences together?
example
and in C# code:
static int CapSigma(int _start, int _end)
{
int sum = 0;
for(int n = _start; n <= _end; n++)
{
sum += n;
}
return (sum);
}
Code so far (not working):
static int BBPpi(int _precision)
{
int pi = 0;
for(int n = 0; n < _precision; n++)
{
pi += ((16 ^ -n) * (4 / (8 * n + 1) - 2 / (8 * n + 4) - 1 / (8 * n + 5) - 1 / (8 * n + 6)));
}
return (pi);
}
I'm not sure how to turn this into actual code, or whether my pseudo code math is even correct.
How do I sum from 0 to infinity? I can't do that in a for loop. Also, where in the formula is the part (the "input") that specifies which digit (index) you want to get out? Is it the starting n (n = 0)? So to get digit 40000, would it be n = 40000?
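You need to do the arithmetic in double. Also note that ^ is the XOR operator in C#, not exponentiation, so use Math.Pow for the 16^-n factor, and start the sum at n = 0:
using System;

class Program
{
    static void Main(string[] args)
    {
        double sum = 0;
        // The series starts at n = 0; the running total approaches pi
        for (int i = 0; i < 100; i++)
        {
            sum += BBPpi(i);
            Console.WriteLine(sum.ToString());
        }
        Console.ReadLine();
    }
    // Returns the n-th term of the BBP series
    static double BBPpi(int n)
    {
        double pi = Math.Pow(16.0, -n) * (4.0 / (8.0 * n + 1.0) - 2.0 / (8.0 * n + 4.0) - 1.0 / (8.0 * n + 5.0) - 1.0 / (8.0 * n + 6.0));
        return pi;
    }
}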
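The question also asks where the digit index enters the formula. Summing the series only gives pi to double precision. To extract a single hexadecimal digit, the usual trick is to multiply the series by 16^n and keep only fractional parts, using modular exponentiation for the terms with k <= n. The following is a rough sketch of that idea, not a tuned implementation: the names BbpSketch, ModPow, Series and HexDigit are mine, positions are zero-based after the hexadecimal point, and because it relies on double arithmetic it should only be trusted for moderate n.

```csharp
using System;

public static class BbpSketch
{
    // (b^e) mod m by repeated squaring. long is wide enough as long as
    // m * m fits in 63 bits, which holds for the moderate n assumed here.
    static long ModPow(long b, long e, long m)
    {
        long result = 1 % m;
        b %= m;
        while (e > 0)
        {
            if ((e & 1) == 1) result = result * b % m;
            b = b * b % m;
            e >>= 1;
        }
        return result;
    }

    // Fractional part of the sum over k of 16^(n-k) / (8k + j).
    static double Series(int j, int n)
    {
        double sum = 0.0;
        for (int k = 0; k <= n; k++)
        {
            long denom = 8L * k + j;
            sum += (double)ModPow(16, n - k, denom) / denom;
            sum -= Math.Floor(sum);
        }
        // A few tail terms with negative exponents; each is 16x smaller.
        for (int k = n + 1; k <= n + 16; k++)
        {
            sum += Math.Pow(16.0, n - k) / (8.0 * k + j);
        }
        return sum - Math.Floor(sum);
    }

    // Hex digit of pi at zero-based position n after the point.
    public static int HexDigit(int n)
    {
        double x = 4.0 * Series(1, n) - 2.0 * Series(4, n)
                 - Series(5, n) - Series(6, n);
        x -= Math.Floor(x);
        return (int)(x * 16.0);
    }

    public static void Main()
    {
        for (int n = 0; n < 8; n++)
        {
            Console.Write(HexDigit(n).ToString("X"));
        }
        Console.WriteLine();
    }
}
```

Pi in hexadecimal begins 3.243F6A88, so the first few calls are easy to compare against that expansion. So n is not the loop start: the loop still runs from k = 0, and n only decides which power of 16 the series is shifted by.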
I'm taking the Coursera machine learning course right now and I can't get my gradient descent linear regression function to minimize. I use one independent variable, an intercept, and four values of x and y, so the equations are fairly simple. The final value of the gradient descent equation varies wildly depending on the initial values of alpha and beta and I can't figure out why.
I've only been coding for about two weeks, so my knowledge is limited to say the least, please keep this in mind if you take the time to help.
using System;
namespace LinearRegression
{
class Program
{
static void Main(string[] args)
{
Random rnd = new Random();
const int N = 4;
//We randomize the inital values of alpha and beta
double theta1 = rnd.Next(0, 100);
double theta2 = rnd.Next(0, 100);
//Values of x, i.e the independent variable
double[] x = new double[N] { 1, 2, 3, 4 };
//VAlues of y, i.e the dependent variable
double[] y = new double[N] { 5, 7, 9, 12 };
double sumOfSquares1;
double sumOfSquares2;
double temp1;
double temp2;
double sum;
double learningRate = 0.001;
int count = 0;
do
{
//We reset the Generalized cost function, called sum of squares
//since I originally used SS to
//determine if the function was minimized
sumOfSquares1 = 0;
sumOfSquares2 = 0;
//Adding 1 to counter for each iteration to keep track of how
//many iterations are completed thus far
count += 1;
//First we calculate the Generalized cost function, which is
//to be minimized
sum = 0;
for (int i = 0; i < (N - 1); i++)
{
sum += Math.Pow((theta1 + theta2 * x[i] - y[i]), 2);
}
//Since we have 4 values of x and y we have 1/(2*N) = 1 /8 = 0.125
sumOfSquares1 = 0.125 * sum;
//Then we calcualte the new alpha value, using the derivative of
//the cost function.
sum = 0;
for (int i = 0; i < (N - 1); i++)
{
sum += theta1 + theta2 * x[i] - y[i];
}
//Since we have 4 values of x and y we have 1/(N) = 1 /4 = 0.25
temp1 = theta1 - learningRate * 0.25 * sum;
//Same for the beta value, it has a different derivative
sum = 0;
for (int i = 0; i < (N - 1); i++)
{
sum += (theta1 + theta2 * x[i]) * x[i] - y[i];
}
temp2 = theta2 - learningRate * 0.25 * sum;
//WE change the values of alpha an beta at the same time, otherwise the
//function wont work
theta1 = temp1;
theta2 = temp2;
//We then calculate the cost function again, with new alpha and beta values
sum = 0;
for (int i = 0; i < (N - 1); i++)
{
sum += Math.Pow((theta1 + theta2 * x[i] - y[i]), 2);
}
sumOfSquares2 = 0.125 * sum;
Console.WriteLine("Alpha: {0:N}", theta1);
Console.WriteLine("Beta: {0:N}", theta2);
Console.WriteLine("GCF Before: {0:N}", sumOfSquares1);
Console.WriteLine("GCF After: {0:N}", sumOfSquares2);
Console.WriteLine("Iterations: {0}", count);
Console.WriteLine(" ");
} while (sumOfSquares2 <= sumOfSquares1 && count < 5000);
//we end the iteration cycle once the generalized cost function
//cannot be reduced any further or after 5000 iterations
Console.ReadLine();
}
}
}
There are two bugs in the code.
First, I assume that you would like to iterate through all the elements in the array. With i < (N - 1) the last data point is skipped, so rework each for loop like this: for (int i = 0; i < N; i++)
Second, when updating the theta2 value the summation is not calculated correctly. According to the update rule the whole residual is multiplied by x[i], so it should look like this: sum += (theta1 + theta2 * x[i] - y[i]) * x[i];
Why do the final values depend on the initial values?
Because the gradient descent update step is calculated from these values. If the initial values (the starting point) are far away from the final values, the algorithm may not get there within the iteration limit. You could solve this problem by:
Increasing the number of iterations (e.g. 5000 to 50000): the gradient descent algorithm has more time to converge.
Increasing the learning rate (e.g. 0.001 to 0.01): the gradient descent update steps are bigger, therefore it converges faster. Note: if the learning rate is too large, then it is possible to step over the minimum.
The slope (theta2) is around 2.3 and the intercept (theta1) is around 2.5 for the given data. I have created a github project to fix your code and I have also added a shorter solution using LINQ. It is 5 lines of code. If you are curious check it out here.
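For reference, here is a compact sketch of the loop with both fixes applied. It is not the code from the linked project: the class name GradientDescentSketch, the fixed starting point of (0, 0), the learning rate of 0.01 and the 50000 iterations are assumptions chosen for illustration.

```csharp
using System;

public static class GradientDescentSketch
{
    // Batch gradient descent for the model y = theta1 + theta2 * x.
    public static (double Intercept, double Slope) Fit(
        double[] x, double[] y, double learningRate, int iterations)
    {
        int n = x.Length;
        double theta1 = 0.0, theta2 = 0.0;
        for (int iter = 0; iter < iterations; iter++)
        {
            double grad1 = 0.0, grad2 = 0.0;
            // Loop over ALL n points (i < n, not i < n - 1).
            for (int i = 0; i < n; i++)
            {
                double residual = theta1 + theta2 * x[i] - y[i];
                grad1 += residual;
                grad2 += residual * x[i]; // whole residual times x[i]
            }
            // Both gradients were computed from the old thetas, so this
            // is a simultaneous update.
            theta1 -= learningRate * grad1 / n;
            theta2 -= learningRate * grad2 / n;
        }
        return (theta1, theta2);
    }

    public static void Main()
    {
        double[] x = { 1, 2, 3, 4 };
        double[] y = { 5, 7, 9, 12 };
        var fit = Fit(x, y, 0.01, 50000);
        Console.WriteLine("Intercept: {0:F4}", fit.Intercept);
        Console.WriteLine("Slope: {0:F4}", fit.Slope);
    }
}
```

The exact least-squares solution for this data is slope 11.5 / 5 = 2.3 and intercept 8.25 - 2.3 * 2.5 = 2.5, so the printed values can be checked against those.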
I have a list of probabilities like
0.0442857142857143
0.664642857142857
0.291071428571429
I want to convert them to the nearest percentages so that the sum of percentages adds up to 100
so something like this
0.0442857142857143 - 4 %
0.664642857142857 - 67 %
0.291071428571429 - 29 %
I cannot rely on Math.Round to always give me results which will add up to 1. What would be the best way to do this?
This is a method that could do the job.
public int[] Round(params decimal[] values)
{
    decimal total = values.Sum();
    // Round each share to a whole percent
    int[] percents = values.Select(x => (int)Math.Round(x / total * 100)).ToArray();
    int totalPercent = percents.Sum();
    // Push any leftover onto the last entry so the total is exactly 100
    int diff = 100 - totalPercent;
    percents[percents.Length - 1] += diff;
    return percents;
}
Interesting collection of answers.
The problem here is that you are getting a cumulative error in your rounding operations. In some cases the accumulated error cancels out - some values round up, others down, cancelling the total error. In other cases such as the one you have here, the rounding errors are all negative, giving an accumulated total error of (approximately) -1.
The only way to work around this in the general case is to keep track of the total accumulated error and add/subtract when that error gets large enough. It's tedious, but the only real way to get this right:
static int[] ToIntPercents(double[] values)
{
int[] results = new int[values.Length];
double error = 0;
for (int i = 0; i < values.Length; i++)
{
double val = values[i] * 100;
int percent = (int)Math.Round(val + error);
error += val - percent;
if (Math.Abs(error) >= 0.5)
{
int sign = Math.Sign(error);
percent += sign;
error -= sign;
}
results[i] = percent;
}
return results;
}
This code produces reasonable results for an array of any size whose sum is approximately +1.0000. The array can contain negative and positive values, as long as the sum is close enough to +1.0000 to introduce no gross errors.
The code accumulates the rounding errors, and when the total error leaves the acceptable range of -0.5 < error < +0.5 it adjusts the output. Using this method the output array for your numbers would be: [4, 67, 29]. You could change the acceptable error range to be 0 <= error < 1, giving the output [4, 66, 30], but this causes odd results when the array contains negative numbers. If that's your preference, change the if statement in the middle of the method to read:
if (error < 0 || error >= 1)
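Another common way to make rounded percentages total exactly 100 is the largest remainder method: floor every value, then hand the missing units to the entries with the largest fractional parts. This is a different technique from the error-accumulation code above, sketched here under the assumption that all inputs are non-negative and their sum is positive. The names PercentSketch and ToPercents are mine.

```csharp
using System;
using System.Linq;

public static class PercentSketch
{
    // Largest remainder method: floor each share, then give the missing
    // units to the entries with the biggest fractional parts.
    public static int[] ToPercents(double[] values)
    {
        double total = values.Sum();
        double[] scaled = values.Select(v => v / total * 100.0).ToArray();
        int[] result = scaled.Select(s => (int)Math.Floor(s)).ToArray();

        int leftover = 100 - result.Sum();

        // Indices ordered by descending fractional part.
        int[] order = Enumerable.Range(0, values.Length)
            .OrderByDescending(i => scaled[i] - Math.Floor(scaled[i]))
            .ToArray();

        for (int k = 0; k < leftover; k++)
        {
            result[order[k]] += 1;
        }
        return result;
    }

    public static void Main()
    {
        double[] probs = { 0.0442857142857143, 0.664642857142857, 0.291071428571429 };
        Console.WriteLine(string.Join(", ", ToPercents(probs)));
    }
}
```

For the three probabilities in the question the floors are 4, 66 and 29, which total 99. The one missing unit goes to the middle entry because 0.4643 is the largest fractional part, giving 4, 67, 29.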
You could just multiply the number by 100 (if you have the decimal number)
0.0442857142857143 * 100 = 4 %
0.664642857142857 * 100 = 66 %
0.291071428571429 * 100 = 29 %
E: correct, 0.291071428571429 wouldn't add up to 30%...
Since you don't seem to care which number is bumped, I'll use the last. The algorithm is pretty simple, and works both for the .4 edge case, where you must add 1, and for the .5 edge case, where you must remove 1:
1) Round each number but the last one
2) Subtract the sum of the rounded numbers from 100
3) Assign the remainder to the last number
As an extension method, it looks like this :
public static int[] SplitIntoPercentage(this double[] input)
{
int[] results = new int[input.Length];
for (int i = 0; i < input.Length - 1; i++)
{
results[i] = (int)Math.Round(input[i] * 100, MidpointRounding.AwayFromZero);
}
results[input.Length - 1] = 100 - results.Sum();
return results;
}
And here's the associated unit tests :
[TestMethod]
public void IfSumIsUnder100ItShouldBeBumpedToIt()
{
double[] input = new []
{
0.044,
0.664,
0.294
};
var result = input.SplitIntoPercentage();
Assert.AreEqual(100, result.Sum());
Assert.AreEqual(4, result[0]);
Assert.AreEqual(66, result[1]);
Assert.AreEqual(30, result[2]);
}
[TestMethod]
public void IfSumIsOver100ItShouldBeReducedToIt()
{
double[] input = new[]
{
0.045,
0.665,
0.295
};
var result = input.SplitIntoPercentage();
Assert.AreEqual(100, result.Sum());
Assert.AreEqual(5, result[0]);
Assert.AreEqual(67, result[1]);
Assert.AreEqual(28, result[2]);
}
Once refactored a little bit, the result looks like this :
public static int[] SplitIntoPercentage(this double[] input)
{
int[] results = RoundEachValueButTheLast(input);
results = SetTheLastValueAsTheRemainder(input, results);
return results;
}
private static int[] RoundEachValueButTheLast(double[] input)
{
int[] results = new int[input.Length];
for (int i = 0; i < input.Length - 1; i++)
{
results[i] = (int)Math.Round(input[i]*100, MidpointRounding.AwayFromZero);
}
return results;
}
private static int[] SetTheLastValueAsTheRemainder(double[] input, int[] results)
{
results[input.Length - 1] = 100 - results.Sum();
return results;
}
The logic is: first round the value to one decimal place, then round that result to a whole number.
static long PercentageOut(double value)
{
value = value * 100;
value = Math.Round(value, 1, MidpointRounding.AwayFromZero); // Round to one decimal place, midpoints away from zero
value = Math.Round(value, 0, MidpointRounding.AwayFromZero); // Then round to a whole number, midpoints away from zero
return Convert.ToInt64(value);
}
static void Main(string[] args)
{
double d1 = 0.0442857142857143;
double d2 = 0.664642857142857;
double d3 = 0.291071428571429;
long l1 = PercentageOut(d1);
long l2 = PercentageOut(d2);
long l3 = PercentageOut(d3);
Console.WriteLine(l1);
Console.WriteLine(l2);
Console.WriteLine(l3);
}
Output
4
67
29
---
sum is 100 %
Can anyone provide an explanation of the difference between using Math.Pow() and Math.Exp() in C# and .net ?
Is Exp() just taking a number to the power using itself as the exponent?
Math.Pow computes x^y for some x and y.
Math.Exp computes e^x for some x, where e is Euler's number.
Note that while Math.Pow(Math.E, d) produces the same result as Math.Exp(d), a quick benchmark comparison shows that Math.Exp actually executes about twice as fast as Math.Pow:
Trial Operations Pow Exp
1 1000 0.0002037 0.0001344 (seconds)
2 100000 0.0106623 0.0046347
3 10000000 1.0892492 0.4677785
Math.Pow(Math.E,n) = Math.Exp(n) //of course this is not actual code, just a human equation.
More info: Math.Pow and Math.Exp
Math.Exp(x) is e^x. (See http://en.wikipedia.org/wiki/E_(mathematical_constant).)
Math.Pow(a, b) is a^b.
Math.Pow(Math.E, x) and Math.Exp(x) are the same, though the second one is the idiomatic one to use if you are using e as the base.
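A small sketch can make the equivalences concrete. Because Math.E is itself a rounded double, the two routes may differ in the last few bits, so the comparison below uses a relative difference instead of ==. The helper RelativeDifference and the sample value 2.5 are my own choices for illustration.

```csharp
using System;

public static class ExpPowSketch
{
    // Relative difference between two non-zero doubles.
    public static double RelativeDifference(double a, double b)
    {
        return Math.Abs(a - b) / Math.Max(Math.Abs(a), Math.Abs(b));
    }

    public static void Main()
    {
        double x = 2.5;

        // e^x computed two ways.
        double viaExp = Math.Exp(x);
        double viaPow = Math.Pow(Math.E, x);

        // 10^x computed directly and as e^(x * ln 10).
        double tenViaPow = Math.Pow(10.0, x);
        double tenViaExp = Math.Exp(x * Math.Log(10.0));

        Console.WriteLine(RelativeDifference(viaExp, viaPow));
        Console.WriteLine(RelativeDifference(tenViaPow, tenViaExp));
    }
}
```

Both printed differences should be tiny compared with the values themselves, which is why the choice between the two forms is about readability and speed, not the result.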
Just a quick extension to the Benchmark contribution from p.s.w.g -
I wanted to see one more comparison, for equivalent of 10^x ==> e^(x * ln(10)), or {double ln10 = Math.Log(10.0); y = Math.Exp(x * ln10);}
Here's what I've got:
Operation Time
Math.Exp(x) 180 ns (nanoseconds)
Math.Pow(y, x) 440 ns
Math.Exp(x*ln10) 160 ns
Times are per 10x calls to Math functions.
What I don't understand is why the time for including a multiply in the loop, before entry to Exp(), consistently produces shorter times, unless there's a bug in this code, or the algorithm is value dependent?
The program follows.
namespace _10X {
public partial class Form1 : Form {
int nLoops = 1000000;
int ix;
// Values - Just to not always use the same number, and to confirm values.
double[] x = { 0.5, 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5 };
public Form1() {
InitializeComponent();
Proc();
}
void Proc() {
double y;
long t0;
double t1, t2, t3;
t0 = DateTime.Now.Ticks;
for (int i = 0; i < nLoops; i++) {
for (ix = 0; ix < x.Length; ix++)
y = Math.Exp(x[ix]);
}
t1 = (double)(DateTime.Now.Ticks - t0) * 1e-7 / (double)nLoops;
t0 = DateTime.Now.Ticks;
for (int i = 0; i < nLoops; i++) {
for (ix = 0; ix < x.Length; ix++)
y = Math.Pow(10.0, x[ix]);
}
t2 = (double)(DateTime.Now.Ticks - t0) * 1e-7 / (double)nLoops;
double ln10 = Math.Log(10.0);
t0 = DateTime.Now.Ticks;
for (int i = 0; i < nLoops; i++) {
for (ix = 0; ix < x.Length; ix++)
y = Math.Exp(x[ix] * ln10);
}
t3 = (double)(DateTime.Now.Ticks - t0) * 1e-7 / (double)nLoops;
textBox1.Text = "t1 = " + t1.ToString("F8") + "\r\nt2 = " + t2.ToString("F8")
+ "\r\nt3 = " + t3.ToString("F8");
}
private void btnGo_Click(object sender, EventArgs e) {
textBox1.Clear();
Proc();
}
}
}
So I think I'm going with Math.Exp(x * ln10) until someone finds the bug...
In c# how do I evenly divide 100 into 7?
So the result would be
16
14
14
14
14
14
14
The code below is incorrect as all 7 values are set to 15 (totalling 105).
double [] vals = new double[7];
for (int i = 0; i < vals.Length; i++)
{
vals[i] = Math.Ceiling(100d / vals.Length);
}
Is there an easy way to do this in c#?
Thanks
To get my suggested result of 15, 15, 14, 14, 14, 14, 14:
// This doesn't try to cope with negative numbers :)
public static IEnumerable<int> DivideEvenly(int numerator, int denominator)
{
int rem;
int div = Math.DivRem(numerator, denominator, out rem);
for (int i=0; i < denominator; i++)
{
yield return i < rem ? div+1 : div;
}
}
Test:
foreach (int i in DivideEvenly(100, 7))
{
Console.WriteLine(i);
}
Here you go:
Func<int, int, IEnumerable<int>> f = (a, b) =>
Enumerable.Range(0, b).Select(n => a / b + ((a % b) <= n ? 0 : 1));
Good luck explaining it in class though :)
Since this seems to be homework, here is a hint and not the full code.
You are doing Math.Ceiling and it converts 14.28 into 15.
The algorithm is this:
1) Divide 100 by 7, put the result in X
2) Get the highest whole number at or below X and put this in Y.
3) Multiply Y by 7 and put the answer in Z.
4) Take Z away from 100.
5) The answer is then 6 lots of Y, plus one lot of Y increased by the result of step 4.
This algorithm may only work for this specific instance.
I'm sure you can write that in C#
Not sure if this is exactly what you are after, but I would think that if you use Math.Ceiling you will always end up with too big a total. Math.Floor would underestimate and leave you with a difference that can be added to one of your pieces as you see fit.
For example, by this method you might end up with 7 lots of 14, giving you a remainder of 2. You can then either put this 2 into one of your pieces, giving you the answer you suggested, or you could split it out evenly and get two pieces of 15 (as suggested in one of the comments).
Not sure why you are working with doubles but wanting integer division semantics.
double input = 100;
const int Buckets = 7;
double[] vals = new double[Buckets];
for (int i = 0; i < vals.Length; i++)
{
vals[i] = Math.Floor(input / Buckets);
}
double remainder = input % Buckets;
// give all of the remainder to the first value
vals[0] += remainder;
An example for ints, with more flexibility:
int input = 100;
const int Buckets = 7;
int [] vals = new int[Buckets];
for (int i = 0; i < vals.Length; i++)
{
vals[i] = input / Buckets;
}
int remainder = input % Buckets;
// Option 1: give all of the remainder to the first value
// vals[0] += remainder;
// Option 2: distribute the remainder one unit at a time,
// priority to first. Use one option or the other, not both,
// otherwise the remainder is added twice.
for (int r = 0; r < remainder; r++)
{
    vals[r % Buckets] += 1;
}
It is worth pointing out that the double example may not be numerically stable in that certain input values and bucket sizes could result in leaking fractional values.