I want to write a distributed software system (a system that can execute programs faster than a single PC) that can run different kinds of programs. (As it is a school project, I'll probably execute programs like a prime finder and a pi calculator on it.)
My preference is that it should be written in C# with .NET, have good documentation, be simple to write (I'm not new to C# with .NET, but I'm not a professional either), and make it easy to write tasks for the grid and/or load programs onto the network directly from an .exe.
I've looked a little at:
MPAPI
Utilify(from the makers of Alchemy)
NGrid (Outdated?)
Which one is the best for my case? Do you have any experience with them?
P.S. I'm aware of many similar questions here, but they were either outdated, lacked proper answers, or didn't answer my question, so I chose to ask again.
I just contacted the founder of Utilify (Krishna Nadiminti) and while active development has paused for now, he has kindly released all the source code here on Bitbucket.
I think it is worth continuing this project, as there is literally no comparable alternative as of now (even commercial). I may start working on it, but don't wait for me :).
I had the same problem. I tried NGrid, Alchemi and MPI.NET.
In the end I decided to start my own open source project to play around with; check here: http://lucygrid.codeplex.com/.
UPDATE:
Here is how the PI example looks:
The function passed to AsParallelGrid will be executed by the grid nodes.
You can play with it by running the DEMO project.
/// <summary>
/// Distributes simple const processing
/// </summary>
class PICalculation : AbstractDemo
{
public int Steps = 100000;
public int ChunkSize = 50;
public PICalculation()
{
}
public override string Info()
{
return "Calculates PI over the grid.";
}
public override string Run(bool enableLocalProcessing)
{
double sum = 0.0;
double step = 1.0 / (double)Steps;
/* ORIGINAL VERSION
object obj = new object();
Parallel.ForEach(
Partitioner.Create(0, Steps),
() => 0.0,
(range, state, partial) =>
{
for (long i = range.Item1; i < range.Item2; i++)
{
double x = (i - 0.5) * step;
partial += 4.0 / (1.0 + x * x);
}
return partial;
},
partial => { lock (obj) sum += partial; });
*/
sum = Enumerable
.Range(0, Steps)
// Create buckets of ChunkSize consecutive indices
.GroupBy(s => s / ChunkSize)
// Local variable initialization is not distributed over the grid
.Select(i => new
{
Item1 = i.First(),
Item2 = i.Last() + 1, // Exclusive upper bound, so the loop below still covers i.Last()
Step = step
})
.AsParallelGrid(data =>
{
double partial = 0;
for (var i = data.Item1; i != data.Item2; ++i)
{
double x = (i - 0.5) * data.Step;
partial += (double)(4.0 / (1.0 + x * x));
}
return partial;
}, new GridSettings()
{
EnableLocalProcessing = enableLocalProcessing
})
.Sum() * step;
return sum.ToString();
}
}
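For reference, the quantity each grid node accumulates is a slice of the midpoint-rule approximation pi = integral from 0 to 1 of 4/(1 + x^2) dx ≈ Σ 4/(1 + x_i^2) · step, where x_i = (i - 0.5) · step and step = 1/Steps; this is exactly what partial += 4.0 / (1.0 + x * x) computes before the final multiplication by step.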
I am trying to use the L-BFGS solver in Accord.net maths package in C#.
However, I cannot find how to define the starting value of the optimization.
How can we define it?
According to official examples, the following syntax defines the initial value of x in the optimization process. However it does not work properly in the following example - as if another starting point was used by the algorithm.
//Target function to minimize
public double f(double[] x) {
double z = Math.Cos(x[0])-0.2*x[0] + x[1] * x[1]; //Function with multiple local minima : x ~ { (2n+1)pi , 0 }
return z;
}
//Gradient
public double[] g(double[] x) {
double[] grad = {-Math.Sin(x[0])-0.2 , 2 * x[1]};
return grad;
}
double[] x = {3*3.141592,0}; // Starting value (local minimum, -2.88)
var lbfgs = new BroydenFletcherGoldfarbShanno(numberOfVariables: 2, function: f, gradient: g);
bool success = lbfgs.Minimize();
double minValue = lbfgs.Value;
double[] solution = lbfgs.Solution; // {3.34,0} This solution is a local min that has a higher value (-1.65) than the local min next to which we started !!
The syntax is simply:
lbfgs.Minimize(x);
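For completeness, here is the corrected call sequence (a minimal sketch, assuming Accord.NET's Minimize(double[]) overload, which uses its argument as the starting point; exact values may differ slightly by library version):
double[] x = { 3 * Math.PI, 0 }; // start near the local minimum at (3*pi, 0)
var lbfgs = new BroydenFletcherGoldfarbShanno(numberOfVariables: 2, function: f, gradient: g);
bool success = lbfgs.Minimize(x);   // starting value passed here
double minValue = lbfgs.Value;      // now ~ -2.88, the nearby local minimum
double[] solution = lbfgs.Solution; // ~ { 3*pi, 0 }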
Thank you "500 - Internal Server Error" !
I'm trying to implement logistic regression by myself, writing code in C#. I found a library (Accord.NET) that I use to minimize the cost function. However, I always get different minima. Therefore I think something may be wrong in the cost function that I wrote.
static double costfunction(double[] thetas)
{
int i = 0;
double sum = 0;
double[][] theta_matrix_transposed = MatrixCreate(1, thetas.Length);
while (i != thetas.Length) { theta_matrix_transposed[0][i] = thetas[i]; i++; }
i = 0;
while (i != m) // m is the number of examples
{
int z = 0;
double[][] x_matrix = MatrixCreate(thetas.Length, 1);
while (z != thetas.Length) { x_matrix[z][0] = x[z][i]; z++; } //Put values from the training set to the matrix
double p = MatrixProduct(theta_matrix_transposed, x_matrix)[0][0];
sum += y[i] * Math.Log(sigmoid(p)) + (1 - y[i]) * Math.Log(1 - sigmoid(p));
i++;
}
double value = (-1 / m) * sum;
return value;
}
static double sigmoid(double z)
{
return 1 / (1 + Math.Exp(-z));
}
x is a list of lists that represent the training set, one list for each feature. What's wrong with the code? Why am I getting different results every time I run the L-BFGS? Thank you for your patience, I'm just getting started with machine learning!
That is very common with these optimization algorithms - the minimum you arrive at depends on your weight initialization. The fact that you are getting different minima doesn't necessarily mean something is wrong with your implementation. Instead, check your gradients to make sure they are correct using the finite-differences method, and also look at your train/validation/test accuracy to see whether it is acceptable.
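As an illustration of the finite-differences check (a generic sketch, not tied to Accord.NET; the cost parameter stands in for your own cost function):
// Central-difference numerical gradient: perturb each theta by +/- eps
// and compare the result against your analytic gradient.
static double[] NumericalGradient(Func<double[], double> cost, double[] theta, double eps = 1e-5)
{
    double[] numGrad = new double[theta.Length];
    for (int j = 0; j < theta.Length; j++)
    {
        double original = theta[j];
        theta[j] = original + eps;
        double up = cost(theta);
        theta[j] = original - eps;
        double down = cost(theta);
        theta[j] = original; // restore before moving on
        numGrad[j] = (up - down) / (2 * eps);
    }
    return numGrad;
}
// If the relative error between numGrad and your analytic gradient is
// larger than about 1e-6, the analytic gradient is probably wrong.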
I'm writing a C# (.NET 4.5) application that is used to aggregate time-based events for reporting purposes. To make my query logic reusable for both real-time and historical data, I make use of the Reactive Extensions (2.0) and their IScheduler infrastructure (HistoricalScheduler and friends).
For example, assume we create a list of events (sorted chronologically, but they may coincide!) whose only payload is their timestamp, and want to know their distribution across buffers of a fixed duration:
const int num = 100000;
const int dist = 10;
var events = new List<DateTimeOffset>();
var curr = DateTimeOffset.Now;
var gap = new Random();
var time = new HistoricalScheduler(curr);
for (int i = 0; i < num; i++)
{
events.Add(curr);
curr += TimeSpan.FromMilliseconds(gap.Next(dist));
}
var stream = Observable.Generate<int, DateTimeOffset>(
0,
s => s < events.Count,
s => s + 1,
s => events[s],
s => events[s],
time);
stream.Buffer(TimeSpan.FromMilliseconds(num), time)
.Subscribe(l => Console.WriteLine(time.Now + ": " + l.Count));
time.AdvanceBy(TimeSpan.FromMilliseconds(num * dist));
Running this code results in a System.StackOverflowException with the following stack trace (it's the last 3 lines repeating all the way down):
mscorlib.dll!System.Threading.Interlocked.Exchange<System.IDisposable>(ref System.IDisposable location1, System.IDisposable value) + 0x3d bytes
System.Reactive.Core.dll!System.Reactive.Disposables.SingleAssignmentDisposable.Dispose() + 0x37 bytes
System.Reactive.Core.dll!System.Reactive.Concurrency.ScheduledItem<System.DateTimeOffset>.Cancel() + 0x23 bytes
...
System.Reactive.Core.dll!System.Reactive.Disposables.AnonymousDisposable.Dispose() + 0x4d bytes
System.Reactive.Core.dll!System.Reactive.Disposables.SingleAssignmentDisposable.Dispose() + 0x4f bytes
System.Reactive.Core.dll!System.Reactive.Concurrency.ScheduledItem<System.DateTimeOffset>.Cancel() + 0x23 bytes
...
Ok, the problem seems to come from my use of Observable.Generate(), depending on the list size (num) and regardless of the choice of scheduler.
What am I doing wrong? Or more generally, what would be the preferred way to create an IObservable from an IEnumerable of events that provide their own timestamps?
(update - realized I didn't provide an alternative: see the bottom of this answer)
The problem is in how Observable.Generate works - it's used to unfold a corecursive (think recursion turned inside out) generator based on the arguments; if those arguments end up generating a very nested corecursive generator, you'll blow your stack.
From this point on, I'm speculating a lot (don't have the Rx source in front of me) (see below), but I'm willing to bet your definition ends up expanding into something like:
initial_state =>
generate_next(initial_state) =>
generate_next(generate_next(initial_state)) =>
generate_next(generate_next(generate_next(initial_state))) =>
generate_next(generate_next(generate_next(generate_next(initial_state)))) => ...
And on and on until your call stack gets big enough to overflow. At, say, a method signature + your int counter, that'd be something like 8-16 bytes per recursive call (more depending on how the state machine generator is implemented), so 60,000 sounds about right (1M / 16 ~ 62500 maximum depth)
EDIT: Pulled up the source - confirmed: the "Run" method of Generate looks like this - take note of the nested calls to Generate:
protected override IDisposable Run(
IObserver<TResult> observer,
IDisposable cancel,
Action<IDisposable> setSink)
{
if (this._timeSelectorA != null)
{
Generate<TState, TResult>.α α =
new Generate<TState, TResult>.α(
(Generate<TState, TResult>) this,
observer,
cancel);
setSink(α);
return α.Run();
}
if (this._timeSelectorR != null)
{
Generate<TState, TResult>.δ δ =
new Generate<TState, TResult>.δ(
(Generate<TState, TResult>) this,
observer,
cancel);
setSink(δ);
return δ.Run();
}
Generate<TState, TResult>._ _ =
new Generate<TState, TResult>._(
(Generate<TState, TResult>) this,
observer,
cancel);
setSink(_);
return _.Run();
}
EDIT: Derp, didn't offer any alternatives... here's one that might work:
(EDIT: fixed Enumerable.Range so the stream size won't be multiplied by chunkSize)
const int num = 160000;
const int dist = 10;
var events = new List<DateTimeOffset>();
var curr = DateTimeOffset.Now;
var gap = new Random();
var time = new HistoricalScheduler(curr);
for (int i = 0; i < num; i++)
{
events.Add(curr);
curr += TimeSpan.FromMilliseconds(gap.Next(dist));
}
// Size too big? Fine, we'll chunk it up!
const int chunkSize = 10000;
var numberOfChunks = (int)Math.Ceiling((double)events.Count / chunkSize);
// Generate a whole mess of streams based on start/end indices
var streams =
from chunkIndex in Enumerable.Range(0, numberOfChunks)
let startIdx = chunkIndex * chunkSize
let endIdx = Math.Min(events.Count, startIdx + chunkSize)
select Observable.Generate<int, DateTimeOffset>(
startIdx,
s => s < endIdx,
s => s + 1,
s => events[s],
s => events[s],
time);
// E pluribus streamum
var stream = Observable.Concat(streams);
stream.Buffer(TimeSpan.FromMilliseconds(num), time)
.Subscribe(l => Console.WriteLine(time.Now + ": " + l.Count));
time.AdvanceBy(TimeSpan.FromMilliseconds(num * dist));
OK, I've switched to a different factory method that doesn't require lambda expressions as state transitions, and now I don't see any stack overflows anymore. I'm not yet sure whether this qualifies as a correct answer to my question, but it works and I thought I'd share it here:
var stream = Observable.Create<DateTimeOffset>(o =>
{
foreach (var e in events)
{
time.Schedule(e, () => o.OnNext(e));
}
time.Schedule(events[events.Count - 1], () => o.OnCompleted());
return Disposable.Empty;
});
Manually scheduling the events before (!) returning the subscription seems awkward to me, but in this case it can be done inside the lambda expression.
If there is anything wrong with this approach, please correct me. Also, I'd still be happy to hear which implicit assumptions of System.Reactive I violated with my original code.
(Oh my, I should have checked that earlier: with Rx v1.0, the original Observable.Generate() does in fact seem to work!)
Suppose I have written a class like this (the number of functions doesn't really matter, but in reality there will be about 3 or 4):
private class ReallyWeird
{
int y;
Func<double, double> f1;
Func<double, double> f2;
Func<double, double> f3;
public ReallyWeird()
{
this.y = 10;
this.f1 = (x => 25 * x + y);
this.f2 = (x => f1(x) + y * f1(x));
this.f3 = (x => Math.Log(f2(x) + f1(x)));
}
public double CalculusMaster(double x)
{
return f3(x) + f2(x);
}
}
I wonder if the C# compiler can optimize such code so that it won't go through numerous stack calls.
Is it able to inline delegates at compile-time at all? If yes, on which conditions and to which limits? If no, is there an answer why?
Another question, maybe even more important: will it be significantly slower than if I had declared f1, f2 and f3 as methods?
I ask this because I want to keep my code as DRY as possible, so I want to implement a static class which extends the basic random number generator (RNG) functionality: its methods accept one delegate (e.g. from the NextInt() method of the RNG) and return another Func delegate (e.g. for generating ulongs) built on top of the former. And since there are many different RNGs that can generate ints, I prefer not to implement all the same extended functionality ten times in different places.
So, this operation may be performed several times (i.e. the initial method of the class may be 'wrapped' by a delegate twice or even three times). I wonder what the performance overhead will be like.
Thank you!
If you use Expression Trees instead of complete Func<> delegates, the compiler will be able to optimize the expressions.
Edit: To clarify, note that I'm not saying the runtime would optimize the expression tree itself (it shouldn't), but rather that since the resulting Expression<> tree is .Compile()d in one step, the JIT engine will simply see the repeated subexpressions and be able to optimize, consolidate, substitute, shortcut and whatever else it normally does.
(I'm not absolutely sure that it does on all platforms, but at least it should be able to fully leverage the JIT engine)
Comment response
First, expression trees have potentially the same execution speed as Func<> (though a Func<> will not have the same runtime cost: its JITting probably takes place while jitting the enclosing scope, and in the case of ngen it will even be AOT-compiled, as opposed to an expression tree).
Second: I agree that Expression Trees can be hard to use. See here for a famous simple example of how to compose expressions. However, more complicated examples are pretty hard to come by. If I've got the time I'll see whether I can come up with a PoC and see what MS.Net and MONO actually generate in MSIL for these cases.
Third: don't forget Henk Holterman is probably right in saying this is premature optimization (although composing Expression<> instead of Func<> ahead of time does add flexibility).
Lastly, when you really think about driving this very far, you might consider using Compiler As A Service (which Mono already has, I believe it is still upcoming for Microsoft?).
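To make the composition idea concrete, here is a minimal sketch of my own (the names and structure are illustrative, not taken from the linked post) that builds f2 on top of f1 as expression trees and compiles the combined tree in one step:
using System;
using System.Linq.Expressions;

class ExpressionComposition
{
    static void Main()
    {
        int y = 10;
        Expression<Func<double, double>> f1 = x => 25 * x + y;

        // Build f2(x) = f1(x) + y * f1(x) as a tree that embeds f1 directly.
        ParameterExpression p = Expression.Parameter(typeof(double), "x");
        InvocationExpression f1Body = Expression.Invoke(f1, p);
        var f2 = Expression.Lambda<Func<double, double>>(
            Expression.Add(f1Body, Expression.Multiply(Expression.Constant((double)y), f1Body)),
            p);

        // Compiling once hands the JIT the whole expression at the same time.
        Func<double, double> compiled = f2.Compile();
        Console.WriteLine(compiled(2.0)); // f1(2) = 60, so 60 + 10 * 60 = 660
    }
}
Whether the expression compiler actually inlines the Invoke node is an implementation detail, so measure before relying on it.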
I would not expect a compiler to optimize this. The complications (because of the delegates) would be huge.
And I would not worry about a few stack-frames here either. With 25 * x + y the stack+call overhead could be significant but call a few other methods (PRNG) and the part you are focusing on here becomes very marginal.
I compiled a quick test application where I compared the delegate approach to an approach where I defined each calculation as a function.
When doing 10,000,000 calculations for each version I got the following results:
Running using delegates: 920 ms average
Running using regular method calls: 730 ms average
So while there is a difference it is not very large and probably negligible.
Now, there may be an error in my calculations so I am adding the entire code below. I compiled it in release mode in Visual Studio 2010:
class Program
{
const int num = 10000000;
static void Main(string[] args)
{
for (int run = 1; run <= 5; run++)
{
Console.WriteLine("Run " + run);
RunTest1();
RunTest2();
}
Console.ReadLine();
}
static void RunTest1()
{
Console.WriteLine("Test1");
var t = new Test1();
var sw = Stopwatch.StartNew();
double x = 0;
for (var i = 0; i < num; i++)
{
t.CalculusMaster(x);
x += 1.0;
}
sw.Stop();
Console.WriteLine("Total time for " + num + " iterations: " + sw.ElapsedMilliseconds + " ms");
}
static void RunTest2()
{
Console.WriteLine("Test2");
var t = new Test2();
var sw = Stopwatch.StartNew();
double x = 0;
for (var i = 0; i < num; i++)
{
t.CalculusMaster(x);
x += 1.0;
}
sw.Stop();
Console.WriteLine("Total time for " + num + " iterations: " + sw.ElapsedMilliseconds + " ms");
}
}
class Test1
{
int y;
Func<double, double> f1;
Func<double, double> f2;
Func<double, double> f3;
public Test1()
{
this.y = 10;
this.f1 = (x => 25 * x + y);
this.f2 = (x => f1(x) + y * f1(x));
this.f3 = (x => Math.Log(f2(x) + f1(x)));
}
public double CalculusMaster(double x)
{
return f3(x) + f2(x);
}
}
class Test2
{
int y;
public Test2()
{
this.y = 10;
}
private double f1(double x)
{
return 25 * x + y;
}
private double f2(double x)
{
return f1(x) + y * f1(x);
}
private double f3(double x)
{
return Math.Log(f2(x) + f1(x));
}
public double CalculusMaster(double x)
{
return f3(x) + f2(x);
}
}
Is there any built-in function to calculate a least-squares fit (like Excel's LINEST), or do we need to write our own?
In the latter case, could you please give me a link to where it has been implemented?
And how does it work?
Thanks
There's no built-in functionality in C# to calculate the best fit line using the least squares method. I wouldn't expect there to be one either since Excel is used for data manipulation/statistics and C# is a general purpose programming language.
There are plenty of people that have posted implementations to various sites though. I'd suggest checking them out and learning the algorithm behind their calculations.
Here's a link to one implementation:
Maths algorithms in C#: Linear least squares fit
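For reference, the algorithm behind these implementations is the standard simple-regression formula: the slope is m = Σ(x_i - x̄)(y_i - ȳ) / Σ(x_i - x̄)², and the intercept is b = ȳ - m·x̄, where x̄ and ȳ are the means of the two series.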
There is pretty extensive documentation in the Online Help. And no, this is not available in C# by default. Both C#/.NET and Excel have quite differing uses, hence the different feature set.
Having attempted to solve this problem using this question and other similar questions, I couldn't find a good example of how to accomplish it. However, by pooling many posts (and Office Help's description of what LINEST actually does), I thought I would post my solution code.
/// <summary>
/// Finds the Gradient using the Least Squares Method
/// </summary>
/// <returns>The gradient (slope) of a trendline of best fit through the data X and Y</returns>
public decimal LeastSquaresGradient()
{
//The DataSetsMatch method ensures that X and Y
//(both List<decimal> in this situation) have the same number of elements
if (!DataSetsMatch())
{
throw new ArgumentException("X and Y must contain the same number of elements");
}
//These lists store the deviation of each point from its associated mean
List<decimal> varX = new List<decimal>();
List<decimal> varY = new List<decimal>();
//Compute each mean once, rather than once per element
decimal avgX = AverageX();
decimal avgY = AverageY();
foreach (decimal x in X)
{
    varX.Add(x - avgX);
}
foreach (decimal y in Y)
{
    varY.Add(y - avgY);
}
decimal topLine = 0;
decimal bottomLine = 0;
for (int i = 0; i < X.Count; i++)
{
topLine += (varX[i] * varY[i]);
bottomLine += (varX[i] * varX[i]);
}
if (bottomLine != 0)
{
return topLine / bottomLine;
}
else
{
return 0;
}
}
/// <summary>
/// Finds the Y Intercept using the Least Squares Method
/// </summary>
/// <returns>The y intercept of a trendline of best fit through the data X and Y</returns>
public decimal LeastSquaresYIntercept()
{
return AverageY() - (LeastSquaresGradient() * AverageX());
}
/// <summary>
/// Averages the X.
/// </summary>
/// <returns>The average of the List X</returns>
public decimal AverageX()
{
decimal temp = 0;
foreach (decimal t in X)
{
temp += t;
}
if (X.Count == 0)
{
return 0;
}
return temp / X.Count;
}
/// <summary>
/// Averages the Y.
/// </summary>
/// <returns>The average of the List Y</returns>
public decimal AverageY()
{
decimal temp = 0;
foreach (decimal t in Y)
{
temp += t;
}
if (Y.Count == 0)
{
return 0;
}
return temp / Y.Count;
}
Here's an implementation of Excel's LINEST() function in C#. It returns the slope for a given set of data, calculated using the same "least squares" method that LINEST() uses:
public static double CalculateLinest(double[] y, double[] x)
{
double linest = 0;
if (y.Length == x.Length)
{
double avgY = y.Average();
double avgX = x.Average();
double[] dividend = new double[y.Length];
double[] divisor = new double[y.Length];
for (int i = 0; i < y.Length; i++)
{
dividend[i] = (x[i] - avgX) * (y[i] - avgY);
divisor[i] = Math.Pow((x[i] - avgX), 2);
}
linest = dividend.Sum() / divisor.Sum();
}
return linest;
}
Also, here's a method I wrote to get the "b" (y-intercept) value that Excel's LINEST function generates.
private double CalculateYIntercept(double[] x, double[] y, double linest)
{
return (y.Average() - linest * x.Average());
}
Since these methods only work for one set of data, I would recommend calling them inside of a loop if you wish to produce multiple sets of linear regression data.
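For example, a single call might look like this (hypothetical sample data, assuming the two methods above are in scope):
// Sample points that lie roughly on y = 2x; the slope should come out near 2.
double[] x = { 1, 2, 3, 4, 5 };
double[] y = { 2.1, 3.9, 6.2, 7.8, 10.1 };

double slope = CalculateLinest(y, x);                // "m" in y = m*x + b
double intercept = CalculateYIntercept(x, y, slope); // "b" in y = m*x + b
Console.WriteLine("y = " + slope + "x + " + intercept);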
This link helped me find my answer: https://agrawalreetesh.blogspot.com/2011/11/how-to-calculate-linest-of-given.html