What is the C# equivalent to LINEST from Excel? [duplicate] - c#

This question already has an answer here:
Interop Excel method LinEst failing with DISP_E_TYPEMISMATCH
(1 answer)
Closed 3 years ago.
Is there a built-in function, or do we need to write our own?
In the latter case, could you please give me a link to an existing implementation?
And how does it work?
Thanks

There's no built-in functionality in C# to calculate the best fit line using the least squares method. I wouldn't expect there to be one either since Excel is used for data manipulation/statistics and C# is a general purpose programming language.
There are plenty of people that have posted implementations to various sites though. I'd suggest checking them out and learning the algorithm behind their calculations.
Here's a link to one implementation:
Maths algorithms in C#: Linear least squares fit

There is pretty extensive documentation in the Online Help. And no, this is not available in C# by default. Both C#/.NET and Excel have quite differing uses, hence the different feature set.

Having attempted to solve this problem using this question and other questions which are similar/the same, I couldn't get a good example of how to accomplish this. However, pooling many posts (and Office Help's description of what LINEST actually does) I thought I would post my solution code.
/// <summary>
/// Finds the Gradient using the Least Squares Method
/// </summary>
/// <returns>The gradient (slope) of a trendline of best fit through the data X and Y</returns>
public decimal LeastSquaresGradient()
{
//The DataSetsMatch method ensures that X and Y
//(both List<decimal> in this situation) have the same number of elements
if (!DataSetsMatch())
{
throw new ArgumentException("X and Y must contain the same number of elements");
}
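//These variables are used to store the deviation of each point from its associated mean
//The means are computed once, outside the loops, rather than once per element
decimal averageX = AverageX();
decimal averageY = AverageY();
List<decimal> varX = new List<decimal>();
List<decimal> varY = new List<decimal>();
foreach (decimal x in X)
{
varX.Add(x - averageX);
}
foreach (decimal y in Y)
{
varY.Add(y - averageY);
}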
decimal topLine = 0;
decimal bottomLine = 0;
for (int i = 0; i < X.Count; i++)
{
topLine += (varX[i] * varY[i]);
bottomLine += (varX[i] * varX[i]);
}
if (bottomLine != 0)
{
return topLine / bottomLine;
}
else
{
return 0;
}
}
/// <summary>
/// Finds the Y Intercept using the Least Squares Method
/// </summary>
/// <returns>The y intercept of a trendline of best fit through the data X and Y</returns>
public decimal LeastSquaresYIntercept()
{
return AverageY() - (LeastSquaresGradient() * AverageX());
}
/// <summary>
/// Averages the X.
/// </summary>
/// <returns>The average of the List X</returns>
public decimal AverageX()
{
decimal temp = 0;
foreach (decimal t in X)
{
temp += t;
}
if (X.Count == 0)
{
return 0;
}
return temp / X.Count;
}
/// <summary>
/// Averages the Y.
/// </summary>
/// <returns>The average of the List Y</returns>
public decimal AverageY()
{
decimal temp = 0;
foreach (decimal t in Y)
{
temp += t;
}
if (Y.Count == 0)
{
return 0;
}
return temp / Y.Count;
}

Here's an implementation of Excel's LINEST() function in C#. It returns the slope for a given set of data, normalized using the same "least squares" method that LINEST() uses:
public static double CalculateLinest(double[] y, double[] x)
{
    double linest = 0;
    if (y.Length == x.Length)
    {
        double avgY = y.Average();
        double avgX = x.Average();
        double[] dividend = new double[y.Length];
        double[] divisor = new double[y.Length];
        for (int i = 0; i < y.Length; i++)
        {
            dividend[i] = (x[i] - avgX) * (y[i] - avgY);
            divisor[i] = Math.Pow((x[i] - avgX), 2);
        }
        linest = dividend.Sum() / divisor.Sum();
    }
    return linest;
}
Also, here's a method I wrote to get the "b" (y-intercept) value that Excel's LINEST function generates.
private double CalculateYIntercept(double[] x, double[] y, double linest)
{
    return (y.Average() - linest * x.Average());
}
Since these methods only work for one set of data, I would recommend calling them inside of a loop if you wish to produce multiple sets of linear regression data.
This link helped me find my answer: https://agrawalreetesh.blogspot.com/2011/11/how-to-calculate-linest-of-given.html
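The two methods above can also be folded into a single pass. The following is only a sketch under some assumptions: the class name `LinestSketch` and the method `Fit` are made up for illustration, the argument order is (x, y) rather than Excel's LINEST(known_y's, known_x's), and it throws on bad input instead of returning 0. It needs C# 7 or later for the tuple return type.

```csharp
using System;
using System.Linq;

public static class LinestSketch
{
    // Hypothetical helper: returns (slope, intercept) of the least-squares
    // line y = slope * x + intercept for a single data set.
    public static (double Slope, double Intercept) Fit(double[] x, double[] y)
    {
        if (x.Length != y.Length)
            throw new ArgumentException("x and y must contain the same number of elements");
        if (x.Length < 2)
            throw new ArgumentException("At least two points are required");

        double avgX = x.Average();
        double avgY = y.Average();

        double sxy = 0.0; // sum of (x - avgX) * (y - avgY)
        double sxx = 0.0; // sum of (x - avgX)^2
        for (int i = 0; i < x.Length; i++)
        {
            double dx = x[i] - avgX;
            sxy += dx * (y[i] - avgY);
            sxx += dx * dx;
        }

        if (sxx == 0.0)
            throw new ArgumentException("All x values are identical; the slope is undefined");

        double slope = sxy / sxx;
        return (slope, avgY - slope * avgX);
    }
}
```

For example, `LinestSketch.Fit(new double[] { 1, 2, 3, 4 }, new double[] { 3, 5, 7, 9 })` fits points that lie exactly on y = 2x + 1, so it should give a slope of 2 and an intercept of 1.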

Related

What is wrong with this implementation of the arcsine approximation in C#?

This is a formula to approximate arcsine(x) using a Taylor series, taken from this blog.
This is my implementation in C#. I don't know where the mistake is, but the code gives the wrong result when run:
When i = 0, the division will be 1/x. So I assign temp = 1/x at startup. For each iteration, I change "temp" after "i".
I loop until two successive values are very close together. When the difference between two successive values is very small, I return the value.
My test case:
The input is x = 1, so the expected result is arcsin(1) = PI/2 = 1.57079633 rad.
class Arc{
static double abs(double x)
{
return x >= 0 ? x : -x;
}
static double pow(double mu, long n)
{
double kq = mu;
for(long i = 2; i<= n; i++)
{
kq *= mu;
}
return kq;
}
static long fact(long n)
{
long gt = 1;
for (long i = 2; i <= n; i++) {
gt *= i;
}
return gt;
}
#region arcsin
static double arcsinX(double x) {
int i = 0;
double temp = 0;
while (true)
{
//i++;
var iFactSquare = fact(i) * fact(i);
var tempNew = (double)fact(2 * i) / (pow(4, i) * iFactSquare * (2*i+1)) * pow(x, 2 * i + 1) ;
if (abs(tempNew - temp) < 0.00000001)
{
return tempNew;
}
temp = tempNew;
i++;
}
}
public static void Main(){
Console.WriteLine(arcsin());
Console.ReadLine();
}
}
In many series evaluations, it is often convenient to use the quotient between terms to update the term. The quotient here is
a[n]/a[n-1] = [(2n)! * x^(2n+1) / (4^n * (n!)^2 * (2n+1))] * [4^(n-1) * ((n-1)!)^2 * (2n-1) / ((2n-2)! * x^(2n-1))]
            = (2n(2n-1)²x²)/(4n²(2n+1))
            = ((2n-1)²x²)/(2n(2n+1))
Thus a loop to compute the series value is
sum = 1;
term = 1;
n = 1;
while (1 != 1 + term) {
    term *= (n - 0.5) * (n - 0.5) * x * x / (n * (n + 0.5));
    sum += term;
    n += 1;
}
return x * sum;
The convergence is only guaranteed for abs(x)<1, for the evaluation at x=1 you have to employ angle halving, which in general is a good idea to speed up convergence.
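The angle-halving remark can be sketched as follows. This is one possible reading, not necessarily what the answerer had in mind: it uses the identity arcsin(x) = 2 * arcsin(x / sqrt(2 * (1 + sqrt(1 - x²)))), which maps any |x| <= 1 to an argument of magnitude at most about 0.7071, where the term-ratio loop converges quickly. The names `ArcsinSketch`, `Series` and `Arcsin` are made up for illustration.

```csharp
using System;

public static class ArcsinSketch
{
    // Taylor series of arcsin, using the term ratio
    // a[n]/a[n-1] = (n - 0.5)^2 * x^2 / (n * (n + 0.5)).
    // Intended for |x| well below 1.
    public static double Series(double x)
    {
        double sum = 1.0;
        double term = 1.0;
        double n = 1.0;
        // Stop once the term no longer changes 1.0 when added to it.
        while (1.0 != 1.0 + term)
        {
            term *= (n - 0.5) * (n - 0.5) * x * x / (n * (n + 0.5));
            sum += term;
            n += 1.0;
        }
        return x * sum;
    }

    // One angle-halving step: sin(theta / 2) = sin(theta) / sqrt(2 * (1 + cos(theta))).
    public static double Arcsin(double x)
    {
        if (x < -1.0 || x > 1.0)
            throw new ArgumentOutOfRangeException(nameof(x), "arcsin is only defined on [-1, 1]");

        double half = x / Math.Sqrt(2.0 * (1.0 + Math.Sqrt(1.0 - x * x)));
        return 2.0 * Series(half);
    }
}
```

With this step, `ArcsinSketch.Arcsin(1)` evaluates the series at 1/sqrt(2) and doubles the result, which should agree with PI/2 to within floating-point rounding.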
You are saving two different temp values (temp and tempNew) to check whether or not continuing computation is irrelevant. This is good, except that you are not saving the sum of these two values.
This is a summation. You need to add every new calculated value to the total. You are only keeping track of the most recently calculated value. You can only ever return the last calculated value of the series. So you will always get an extremely small number as your result. Turn this into a summation and the problem should go away.
NOTE: I've made this a community wiki answer because I was hardly the first person to think of this (just the first to put it down in a comment). If you feel that more needs to be added to make the answer complete, just edit it in!
The general suspicion is that this is down to integer overflow, namely that one of your values (probably the return value of fact() or the iFactSquare variable) is getting too big for the type you have chosen. It goes negative because you are using signed types: when it reaches too large a positive number, it wraps around into the negative range.
Try tracking how large n gets during your calculation, and figure out how big a number it would give you if you ran that number through your fact, pow and iFactSquare functions. If it's bigger than the Maximum long value in 64-bit like we think (assuming you're using 64-bit, it'll be a lot smaller for 32-bit), then try using a double instead.
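The overflow suspicion is easy to test. The sketch below repeats the multiplication loop from fact() inside a `checked` block, so that overflow throws an `OverflowException` instead of wrapping silently. The names `FactorialOverflowSketch`, `CheckedFact` and `Overflows` are made up for illustration.

```csharp
using System;

public static class FactorialOverflowSketch
{
    // Same loop as fact() in the question, but with overflow checking
    // enabled so that a too-large result throws instead of wrapping.
    public static long CheckedFact(long n)
    {
        long result = 1;
        checked
        {
            for (long i = 2; i <= n; i++)
            {
                result *= i;
            }
        }
        return result;
    }

    // True when n! does not fit in a signed 64-bit integer.
    public static bool Overflows(long n)
    {
        try
        {
            CheckedFact(n);
            return false;
        }
        catch (OverflowException)
        {
            return true;
        }
    }
}
```

20! still fits in a long, but 21! does not. Since the question calls fact(2 * i), the wrap-around would start at i = 11, long before the loop's stopping test is met for x = 1.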

Can't get cost function for logistic regression to work

I'm trying to implement logistic regression by myself writing code in C#. I found a library (Accord.NET) that I use to minimize the cost function. However I'm always getting different minimums. Therefore I think something may be wrong in the cost function that I wrote.
static double costfunction(double[] thetas)
{
int i = 0;
double sum = 0;
double[][] theta_matrix_transposed = MatrixCreate(1, thetas.Length);
while(i!=thetas.Length) { theta_matrix_transposed[0][i] = thetas[i]; i++;}
i = 0;
while (i != m) // m is the number of examples
{
int z = 0;
double[][] x_matrix = MatrixCreate(thetas.Length, 1);
while (z != thetas.Length) { x_matrix[z][0] = x[z][i]; z++; } //Put values from the training set to the matrix
double p = MatrixProduct(theta_matrix_transposed, x_matrix)[0][0];
sum += y[i] * Math.Log(sigmoid(p)) + (1 - y[i]) * Math.Log(1 - sigmoid(p));
i++;
}
double value = (-1 / m) * sum;
return value;
}
static double sigmoid(double z)
{
return 1 / (1 + Math.Exp(-z));
}
x is a list of lists that represent the training set, one list for each feature. What's wrong with the code? Why am I getting different results every time I run the L-BFGS? Thank you for your patience, I'm just getting started with machine learning!
That is very common with these optimization algorithms: the minimum you arrive at depends on your weight initialization. The fact that you are getting different minima doesn't necessarily mean something is wrong with your implementation. Instead, check that your gradients are correct using the finite differences method, and also look at your train/validation/test accuracy to see whether it is acceptable.
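The finite-differences check mentioned above could look roughly like this. It is a generic sketch and does not use any Accord.NET API: `GradientCheckSketch`, `NumericalGradient` and `MaxGradientError` are hypothetical names, and the step size 1e-6 is an arbitrary choice. The idea is to compare the gradient you hand to the optimizer against a central-difference estimate computed from the cost function alone.

```csharp
using System;

public static class GradientCheckSketch
{
    // Central-difference approximation of the gradient of f at theta.
    public static double[] NumericalGradient(Func<double[], double> f, double[] theta, double eps = 1e-6)
    {
        var grad = new double[theta.Length];
        var probe = (double[])theta.Clone();
        for (int j = 0; j < theta.Length; j++)
        {
            double original = probe[j];
            probe[j] = original + eps;
            double plus = f(probe);
            probe[j] = original - eps;
            double minus = f(probe);
            probe[j] = original; // restore before moving to the next coordinate
            grad[j] = (plus - minus) / (2.0 * eps);
        }
        return grad;
    }

    // Largest absolute difference between the analytic and numerical gradients.
    public static double MaxGradientError(
        Func<double[], double> f,
        Func<double[], double[]> analyticGradient,
        double[] theta)
    {
        double[] numeric = NumericalGradient(f, theta);
        double[] analytic = analyticGradient(theta);
        double worst = 0.0;
        for (int j = 0; j < theta.Length; j++)
        {
            worst = Math.Max(worst, Math.Abs(numeric[j] - analytic[j]));
        }
        return worst;
    }
}
```

A large value from `MaxGradientError(costfunction, yourGradient, thetas)` would point at the gradient or the cost function rather than at the optimizer. Separately, the declaration of m is not shown in the question, but if it is an int then `(-1 / m)` is an integer division and evaluates to 0 for any m > 1, so writing `-1.0 / m` may be worth trying.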

Grid Computing API

I want to write a distributed software system (a system that can execute programs faster than a single PC) that can run different kinds of programs. (As it is a school project, I'll probably run programs like a prime finder and a pi calculator on it.)
My preferences are that it should be written for C# with .NET, have good documentation, be simple to use (I'm not new to C# with .NET, but I'm not a professional), and make it easy to write tasks for the grid and/or to load programs onto the network directly from an .exe.
I've looked a little at the:
MPAPI
Utilify(from the makers of Alchemy)
NGrid (Outdated?)
Which one is the best for my case? Do you have any experience with them?
P.S. I'm aware of many similar questions here, but they were either outdated, lacked proper answers, or didn't answer my question, so I chose to ask again.
I just contacted the founder of Utilify (Krishna Nadiminti) and while active development has paused for now, he has kindly released all the source code here on Bitbucket.
I think it is worth continuing this project, as there is literally no comparable alternative as of now (even commercial). I may start working on it, but don't wait for me :).
I had the same problem. I tried NGrid, Alchemi and MS PI.net.
In the end I decided to start my own open source project to play around with; check here: http://lucygrid.codeplex.com/.
UPDATE:
Here is how the PI example looks:
The function passed to AsParallelGrid will be executed by the grid nodes.
You can play with it running the DEMO project.
/// <summary>
/// Distributes simple const processing
/// </summary>
class PICalculation : AbstractDemo
{
public int Steps = 100000;
public int ChunkSize = 50;
public PICalculation()
{
}
public override string Info()
{
return "Calculates PI over the grid.";
}
public override string Run(bool enableLocalProcessing)
{
double sum = 0.0;
double step = 1.0 / (double)Steps;
/* ORIGINAL VERSION
object obj = new object();
Parallel.ForEach(
Partitioner.Create(0, Steps),
() => 0.0,
(range, state, partial) =>
{
for (long i = range.Item1; i < range.Item2; i++)
{
double x = (i - 0.5) * step;
partial += 4.0 / (1.0 + x * x);
}
return partial;
},
partial => { lock (obj) sum += partial; });
*/
sum = Enumerable
.Range(0, Steps)
// Create bucket
.GroupBy(s => s / 50)
// Local variable initialization is not distributed over the grid
.Select(i => new
{
Item1 = i.First(),
Item2 = i.Last() + 1, // Inclusive
Step = step
})
.AsParallelGrid(data =>
{
double partial = 0;
for (var i = data.Item1; i != data.Item2 ; ++i)
{
double x = (i - 0.5) * data.Step;
partial += (double)(4.0 / (1.0 + x * x));
}
return partial;
}, new GridSettings()
{
EnableLocalProcessing = enableLocalProcessing
})
.Sum() * step;
return sum.ToString();
}
}

C# to Lambda - count decimal places / first significant decimal

Out of curiosity, would there be an equivalent Lambda expression for the following?
... just started using lambda so not familiar yet with methods like zip ...
//Pass in a double and return the number of decimal places
//ie. 0.00009 should result in 5
//EDIT: Number of decimal places is good.
//However, what I really want is the position of the first non-zero digit
//after the decimal place.
int count=0;
while ((int)double_in % 10 ==0)
{
double_in*=10;
count++;
}
double1.ToString().SkipWhile(c => c!='.').Skip(1).Count()
For example:
double double1 = 1.06696;
int count = double1.ToString().SkipWhile(c => c!='.').Skip(1).Count(); // count = 5;
double double2 = 16696;
int count2 = double2.ToString().SkipWhile(c => c!='.').Skip(1).Count(); // count = 0;
Math.Ceiling(-Math.Log(double_in, 10))
I'd write an InfiniteSequence function like
/// <summary>
/// Returns an infinite sequence of integers starting with 1
/// </summary>
public static IEnumerable<int> InfiniteSequence() {
int value = 0;
while (true) {
yield return ++value;
}
}
(This kind of infinite enumeration is missing anyway in .NET :) ...)
And then use it like
var count = InfiniteSequence().Select(i => (int)(double_in * Math.Pow(10,i))).TakeWhile(v=>v%10==0).Count();
That would be a direct translation (except for the way the powers of 10 are calculated) of the original code.
I thought this would more likely answer your question, and it is culture invariant.
Math.Max(0, num.ToString().Length - Math.Truncate(num).ToString().Length - 1)
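For the edited requirement (position of the first non-zero digit after the decimal point), the logarithm one-liner from the earlier answer can be wrapped up as below. This is a sketch that assumes the input lies strictly between 0 and 1 in absolute value; `FirstSignificantDecimalSketch` and `Position` are made-up names. Inputs that are exact powers of ten, such as 0.001, depend on how the logarithm rounds, so treat those as an edge case to verify.

```csharp
using System;

public static class FirstSignificantDecimalSketch
{
    // Position of the first non-zero digit after the decimal point,
    // for inputs with 0 < |value| < 1. For example 0.00009 gives 5.
    public static int Position(double value)
    {
        double fraction = Math.Abs(value);
        if (fraction <= 0.0 || fraction >= 1.0)
            throw new ArgumentOutOfRangeException(nameof(value), "Expected 0 < |value| < 1");

        // log10(0.00009) is about -4.05, so the ceiling of its negation is 5.
        return (int)Math.Ceiling(-Math.Log10(fraction));
    }
}
```

As a lambda, the same thing is `Func<double, int> firstDigit = d => (int)Math.Ceiling(-Math.Log10(Math.Abs(d)));`, without the range check.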

C# search with resemblance / affinity

Suppose we have an IEnumerable collection with 20,000 Person objects.
Then suppose we have created another Person object.
We want to list all Persons that resemble this Person.
That means, for instance, that if the Surname affinity is more than 90%, we add that Person to the list.
e.g. ("Andrew" vs "Andrw")
What is the most effective / quickest way of doing this?
Iterating through the collection and comparing char by char to determine the affinity? Or something else?
Any ideas?
Thank you!
You may be interested in:
Levenshtein Distance Algorithm
Peter Norvig - How to Write a Spelling Corrector
(you'll be interested in the part where he compares a word against a collection of existing words)
Depending on how often you'll need to do this search, the brute force iterate and compare method might be fast enough. Twenty thousand records really isn't all that much and unless the number of requests is large your performance may be acceptable.
That said, you'll have to implement the comparison logic yourself, and if you want a large degree of flexibility (or if you find you need to work on performance) you might want to look at something like Lucene.Net. Most of the text search engines I've seen and worked with have been more file-based, but I think you can index in-memory objects as well (however, I'm not sure about that).
Good luck!
I'm not sure if you're asking for help writing the search given your existing affinity function, or if you're asking for help writing the affinity function. So for the moment I'll assume you're completely lost.
Given that assumption, you'll notice that I divided the problem into two pieces, and that's what you need to do as well. You need to write a function that takes two string inputs and returns a boolean value indicating whether or not the inputs are sufficiently similar. Then you need a separate search that accepts a delegate matching any function with that kind of signature.
The basic signature for your affinity function might look like this:
bool IsAffinityMatch(string p1, string p2)
And then your search would look like this:
MyPersonCollection.Where(p => IsAffinityMatch(p.Surname, OtherPerson.Surname));
I provide the source code of that Affinity method:
/// <summary>
/// Compute Levenshtein distance according to the Levenshtein Distance Algorithm
/// </summary>
/// <param name="s">String 1</param>
/// <param name="t">String 2</param>
/// <returns>Distance between the two strings.
/// The larger the number, the bigger the difference.
/// </returns>
private static int Compare(string s, string t)
{
/* if both string are not set, its uncomparable. But others fields can still match! */
if (string.IsNullOrEmpty(s) && string.IsNullOrEmpty(t)) return 0;
/* if one string has value and the other one hasn't, it's definitely not match */
if (string.IsNullOrEmpty(s) || string.IsNullOrEmpty(t)) return -1;
s = s.ToUpper().Trim();
t = t.ToUpper().Trim();
int n = s.Length;
int m = t.Length;
int[,] d = new int[n + 1, m + 1];
int cost;
if (n == 0) return m;
if (m == 0) return n;
for (int i = 0; i <= n; d[i, 0] = i++) ;
for (int j = 0; j <= m; d[0, j] = j++) ;
for (int i = 1; i <= n; i++)
{
for (int j = 1; j <= m; j++)
{
cost = (t.Substring(j - 1, 1) == s.Substring(i - 1, 1) ? 0 : 1);
d[i, j] = System.Math.Min(System.Math.Min(d[i - 1, j] + 1, d[i, j - 1] + 1),
d[i - 1, j - 1] + cost);
}
}
return d[n, m];
}
That means that if 0 is returned, the two strings are identical.
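To connect a distance like the one above with the "more than 90%" idea from the question, the distance has to be turned into a ratio. One common choice, and it is only an assumption here, is 1 - distance / length of the longer string. The sketch below uses its own compact Levenshtein routine so that it is self-contained; `AffinitySketch`, `Distance`, `Similarity` and `Matches` are made-up names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AffinitySketch
{
    // Levenshtein distance (case-insensitive, trimmed), keeping only two rows.
    public static int Distance(string s, string t)
    {
        s = (s ?? string.Empty).Trim().ToUpperInvariant();
        t = (t ?? string.Empty).Trim().ToUpperInvariant();

        var previous = new int[t.Length + 1];
        var current = new int[t.Length + 1];
        for (int j = 0; j <= t.Length; j++) previous[j] = j;

        for (int i = 1; i <= s.Length; i++)
        {
            current[0] = i;
            for (int j = 1; j <= t.Length; j++)
            {
                int cost = s[i - 1] == t[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(previous[j] + 1, current[j - 1] + 1),
                    previous[j - 1] + cost);
            }
            var swap = previous;
            previous = current;
            current = swap;
        }
        return previous[t.Length];
    }

    // Similarity in [0, 1]: 1 means identical after trimming and upper-casing.
    public static double Similarity(string s, string t)
    {
        int longest = Math.Max((s ?? string.Empty).Trim().Length, (t ?? string.Empty).Trim().Length);
        if (longest == 0) return 1.0;
        return 1.0 - (double)Distance(s, t) / longest;
    }

    // Keeps only the surnames whose similarity to the target reaches the threshold.
    public static IEnumerable<string> Matches(IEnumerable<string> surnames, string target, double threshold)
    {
        return surnames.Where(name => Similarity(name, target) >= threshold);
    }
}
```

Under this normalization "Andrew" vs "Andrw" has distance 1 and a similarity of about 0.83, so a strict 0.9 threshold would reject the question's own example; the threshold probably needs tuning against real data.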
