Delegate stack efficiency - C#

Suppose I have written a class like this (the number of functions doesn't really matter, but in practice there will be about 3 or 4):
private class ReallyWeird
{
    int y;

    Func<double, double> f1;
    Func<double, double> f2;
    Func<double, double> f3;

    public ReallyWeird()
    {
        this.y = 10;

        this.f1 = (x => 25 * x + y);
        this.f2 = (x => f1(x) + y * f1(x));
        this.f3 = (x => Math.Log(f2(x) + f1(x)));
    }

    public double CalculusMaster(double x)
    {
        return f3(x) + f2(x);
    }
}
I wonder whether the C# compiler can optimize such code so that it doesn't go through numerous stack calls.
Is it able to inline delegates at compile time at all? If yes, under which conditions and to what limits? If not, is there a reason why?
Another question, maybe even more important: will it be significantly slower than if I had declared f1, f2 and f3 as methods?
I ask because I want to keep my code as DRY as possible, so I want to implement a static class which extends the basic random number generator (RNG) functionality: its methods accept one delegate (e.g. the NextInt() method of an RNG) and return another Func delegate (e.g. for generating ulongs) built on top of the former. Since there are many different RNGs that can generate ints, I'd rather not implement the same extended functionality ten times in different places.
So this operation may be performed several times (i.e. the initial method of the class may be 'wrapped' by a delegate twice or even three times). I wonder what the performance overhead will be like.
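For illustration, here is a minimal sketch of the kind of wrapper I mean (the class and method names are hypothetical):
static class RngCombinators
{
    // Builds a ulong generator on top of any int generator by
    // combining two 32-bit draws into one 64-bit value.
    public static Func<ulong> ToULongGenerator(Func<int> nextInt)
    {
        return () => ((ulong)(uint)nextInt() << 32) | (uint)nextInt();
    }
}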
Thank you!

If you use Expression Trees instead of complete Func<> delegates, the compiler will be able to optimize the expressions.
Edit: to clarify, note that I'm not saying the runtime would optimize the expression tree itself (it shouldn't), but rather that since the resulting Expression<> tree is .Compile()d in one step, the JIT engine will simply see the repeated subexpressions and be able to optimize, consolidate, substitute, shortcut and whatever else it normally does.
(I'm not absolutely sure that it does so on all platforms, but at least it should be able to fully leverage the JIT engine.)
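For example, here is a minimal sketch (reusing the f1/f2 shapes from the question) of composing trees so that only a single Compile() happens at the end; Expression.Invoke over a literal lambda can be inlined by the expression compiler, and a parameter-substituting ExpressionVisitor would achieve the same without the invocation nodes:
using System;
using System.Linq.Expressions;

class ExpressionComposition
{
    static void Main()
    {
        int y = 10;
        Expression<Func<double, double>> f1 = x => 25 * x + y;

        // Build f2 = x => f1(x) + y * f1(x) as one tree that references f1's tree.
        ParameterExpression p = Expression.Parameter(typeof(double), "x");
        Expression f1OfX = Expression.Invoke(f1, p);
        Expression<Func<double, double>> f2 = Expression.Lambda<Func<double, double>>(
            Expression.Add(f1OfX,
                Expression.Multiply(Expression.Constant((double)y), f1OfX)),
            p);

        Func<double, double> compiled = f2.Compile(); // a single JIT step for the composed tree
        Console.WriteLine(compiled(2.0)); // (25*2+10) + 10*(25*2+10) = 660
    }
}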
Comment response
First: expression trees can potentially reach the same execution speed as a Func<> (however, a Func<> will not have the same runtime cost: its JITting probably takes place while JITting the enclosing scope, and in the case of ngen it will even be AOT-compiled, as opposed to an expression tree).
Second: I agree that Expression Trees can be hard to use. See here for a famous, simple example of how to compose expressions. However, more complicated examples are pretty hard to come by. If I've got the time I'll see whether I can come up with a PoC and see what MS.NET and Mono actually generate in MSIL for these cases.
Third: don't forget that Henk Holterman is probably right in saying this is premature optimization (although composing Expression<> instead of Func<> ahead of time simply adds flexibility).
Lastly, if you really consider driving this very far, you might look into Compiler as a Service (which Mono already has; I believe it is still upcoming for Microsoft).

I would not expect a compiler to optimize this. The complications (because of the delegates) would be huge.
And I would not worry about a few stack frames here either. With 25 * x + y the stack/call overhead might be significant, but add a few calls to other methods (the PRNG itself, say) and the part you are focusing on here becomes very marginal.

I compiled a quick test application where I compared the delegate approach to an approach where I defined each calculation as a function.
When doing 10,000,000 calculations for each version I got the following results:
Running using delegates: 920 ms average
Running using regular method calls: 730 ms average
So while there is a difference it is not very large and probably negligible.
Now, there may be an error in my measurements, so I am adding the entire code below. I compiled it in Release mode in Visual Studio 2010:
using System;
using System.Diagnostics;

class Program
{
    const int num = 10000000;

    static void Main(string[] args)
    {
        for (int run = 1; run <= 5; run++)
        {
            Console.WriteLine("Run " + run);
            RunTest1();
            RunTest2();
        }
        Console.ReadLine();
    }

    static void RunTest1()
    {
        Console.WriteLine("Test1");
        var t = new Test1();
        var sw = Stopwatch.StartNew();
        double x = 0;
        for (var i = 0; i < num; i++)
        {
            t.CalculusMaster(x);
            x += 1.0;
        }
        sw.Stop();
        Console.WriteLine("Total time for " + num + " iterations: " + sw.ElapsedMilliseconds + " ms");
    }

    static void RunTest2()
    {
        Console.WriteLine("Test2");
        var t = new Test2();
        var sw = Stopwatch.StartNew();
        double x = 0;
        for (var i = 0; i < num; i++)
        {
            t.CalculusMaster(x);
            x += 1.0;
        }
        sw.Stop();
        Console.WriteLine("Total time for " + num + " iterations: " + sw.ElapsedMilliseconds + " ms");
    }
}

class Test1
{
    int y;
    Func<double, double> f1;
    Func<double, double> f2;
    Func<double, double> f3;

    public Test1()
    {
        this.y = 10;
        this.f1 = (x => 25 * x + y);
        this.f2 = (x => f1(x) + y * f1(x));
        this.f3 = (x => Math.Log(f2(x) + f1(x)));
    }

    public double CalculusMaster(double x)
    {
        return f3(x) + f2(x);
    }
}

class Test2
{
    int y;

    public Test2()
    {
        this.y = 10;
    }

    private double f1(double x)
    {
        return 25 * x + y;
    }

    private double f2(double x)
    {
        return f1(x) + y * f1(x);
    }

    private double f3(double x)
    {
        return Math.Log(f2(x) + f1(x));
    }

    public double CalculusMaster(double x)
    {
        return f3(x) + f2(x);
    }
}

Related

Difference in syntax and time complexity for C# for loop

I am trying to figure out what the difference between the following for loops is.
The first is code that I wrote while practicing algorithms on codewars.com. It times out when attempting the larger test cases.
The second is one of the top solutions. It seems functionally similar (obviously it's more concise) but runs much faster and does not time out. Can anyone explain to me what the difference is? Also, the return statement in the second snippet is confusing to me. What exactly does this syntax mean? Maybe this is where it is more efficient.
public static long findNb(long m)
{
    int sum = 0;
    int x = new int();
    for (int n = 0; sum < m; n++)
    {
        sum += n * n * n;
        x = n;
        System.Console.WriteLine(x);
    }
    if (sum == m)
    {
        return x;
    }
    return -1;
}
vs
public static long findNb(long m) // seems similar but doesn't time out
{
    long total = 1, i = 2;
    for (; total < m; i++) total += i * i * i;
    return total == m ? i - 1 : -1;
}
The second approach uses long for the total value. Chances are that you're using an m value that's high enough to exceed the number of values representable by int. So your math overflows, sum becomes a negative number, and you get caught in an infinite loop in which sum can never get as big as m.
And, as everyone else says, get rid of the WriteLine.
Also, the return statement in the second snippet is confusing to me. What exactly does this syntax mean?
It's a ternary conditional operator.
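In other words, condition ? a : b evaluates to a when the condition is true and to b otherwise, so the return line is shorthand for:
if (total == m)
    return i - 1;
else
    return -1;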
Both approaches are roughly the same, except for the unwanted System.Console.WriteLine(x);, which spoils the fun: printing to the console (UI!) is a slow operation.
If you are looking for a fast solution (especially for large m and a long loop) you can just precompute all (77,936) values:
public class Solver {
    static Dictionary<long, long> s_Sums = new Dictionary<long, long>();

    private static void Build() {
        long total = 0;
        for (long i = 0; i <= 77936; ++i) {
            total += i * i * i;
            s_Sums.Add(total, i);
        }
    }

    static Solver() {
        Build();
    }

    public static long findNb(long m) {
        return s_Sums.TryGetValue(m, out long result)
            ? result
            : -1;
    }
}
When I run into micro-optimisation challenges like this, I always use BenchmarkDotNet. It's the tool to use to get all the insights into performance, memory allocations, deviations between .NET Framework versions, 64-bit vs 32-bit, etc.
But as others write, remember to remove the WriteLine() statement :)
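For reference, a minimal BenchmarkDotNet sketch of how such a comparison might be set up (the class and method names here are made up for illustration):
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

public class FindNbBenchmark
{
    // 1071225 = (45 * 46 / 2)^2 = 1^3 + 2^3 + ... + 45^3, so a solution exists.
    [Params(1071225L)]
    public long M;

    [Benchmark]
    public long LongAccumulator()
    {
        long total = 1, i = 2;
        for (; total < M; i++) total += i * i * i;
        return total == M ? i - 1 : -1;
    }
}

class Program
{
    static void Main() => BenchmarkRunner.Run<FindNbBenchmark>();
}
BenchmarkDotNet takes care of warm-up, iteration counts and statistics, so the numbers are far more trustworthy than a hand-rolled Stopwatch loop.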

C# Compiler Optimization

What does the compiler do to optimize my code?
I have 2 functions:
public void x1() {
    x++;
    x++;
}
public void x2() {
    x += 2;
}
public void x3() {
    x = x + 2;
}
public void y3() {
    x = x * x + x * x;
}
And that is what I can see with ILSpy after compiling in Release mode:
// test1.Something
public void x1()
{
    this.x++;
    this.x++;
}

// test1.Something
public void x2()
{
    this.x += 2;
}

// test1.Something
public void x3()
{
    this.x += 2;
}

// test1.Something
public void y3()
{
    this.x = this.x * this.x + this.x * this.x;
}
x2 and x3 might be OK. But why is x1 not optimized to the same result? There is no reason to keep it a two-step increment, is there?
And why is y3 not x = 2 * (x * x)? Shouldn't that be faster than x * x + x * x?
That leads to the question: what kind of optimization does the C# compiler do, if not such simple things?
When you read articles about writing code, you often hear: write it readable and the compiler will do the rest. But in this case the compiler does nearly nothing.
Adding one more example:
public void x1() {
    int a = 1;
    int b = 1;
    int c = 1;
    x = a + b + c;
}
and using ILSpy:
// test1.Something
public void x1()
{
    int a = 1;
    int b = 1;
    int c = 1;
    this.x = a + b + c;
}
Why is it not this.x = 3?
The compiler cannot perform this optimization without making an assumption that variable x is not accessed concurrently with your running method. Otherwise it risks changing the behavior of your method in a detectable way.
Consider a situation in which the object referenced by this is accessed concurrently from two threads: thread A repeatedly sets x to zero; thread B repeatedly calls x1().
If the compiler optimizes x1 to be an equivalent of x2, the two observable states for x after your experiment would be 0 and 2:
If A finishes before B, you get 2
If B finishes before A, you get 0
If A pre-empts B in the middle, you would still get a 2.
However, the original version of x1 allows for three outcomes: x can end up being 0, 1, or 2.
If A finishes before B, you get 2
If B finishes before A, you get 0
If B gets pre-empted after the first increment, then A finishes, and then B runs to completion, you get 1.
x1 and x2 are NOT the same:
if x were a public field and was accessed in a multi-threaded environment, it's entirely possible that a second thread mutates x between the two increments, which would not be possible with the code in x2.
For y3, if + and/or * were overloaded for the type of x, then x * x + x * x could be different from 2 * x * x.
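A contrived sketch of that last point (the Noisy struct is made up for illustration): with user-defined operators, each * is an observable call, so x * x + x * x (two multiplications) and 2 * (x * x) (one multiplication plus a scaling) are not interchangeable:
using System;

struct Noisy
{
    public int Value;
    public Noisy(int v) { Value = v; }

    public static Noisy operator *(Noisy a, Noisy b)
    {
        Console.WriteLine("multiply called"); // observable side effect
        return new Noisy(a.Value * b.Value);
    }

    public static Noisy operator +(Noisy a, Noisy b)
    {
        return new Noisy(a.Value + b.Value);
    }
}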
The compiler will optimize things like (not an exhaustive list by any means):
removing local variables that are not used (freeing up registers)
removing code that does not affect the logic flow or the output.
inlining calls to simple methods
Compiler optimizations should NOT change the behavior of the program (although it does happen). So reordering/combining math operations is out of scope for optimization.
write it readable and the compiler will do the rest.
Well, the compiler may do some optimization, but there is still a LOT that can be done to improve performance at design time. Yes, readable code is definitely valuable, but the compiler's job is to generate working IL that corresponds to your source code, not to change your source code to be faster.

Can't get cost function for logistic regression to work

I'm trying to implement logistic regression by myself, writing the code in C#. I found a library (Accord.NET) that I use to minimize the cost function. However, I always get different minima. Therefore I think something may be wrong in the cost function that I wrote.
static double costfunction(double[] thetas)
{
    int i = 0;
    double sum = 0;
    double[][] theta_matrix_transposed = MatrixCreate(1, thetas.Length);
    while (i != thetas.Length) { theta_matrix_transposed[0][i] = thetas[i]; i++; }
    i = 0;
    while (i != m) // m is the number of examples
    {
        int z = 0;
        double[][] x_matrix = MatrixCreate(thetas.Length, 1);
        while (z != thetas.Length) { x_matrix[z][0] = x[z][i]; z++; } // put values from the training set into the matrix
        double p = MatrixProduct(theta_matrix_transposed, x_matrix)[0][0];
        sum += y[i] * Math.Log(sigmoid(p)) + (1 - y[i]) * Math.Log(1 - sigmoid(p));
        i++;
    }
    double value = (-1 / m) * sum; // note: if m is an int greater than 1, -1 / m is integer division and evaluates to 0
    return value;
}

static double sigmoid(double z)
{
    return 1 / (1 + Math.Exp(-z));
}
x is a list of lists that represents the training set, one list per feature. What's wrong with the code? Why am I getting different results every time I run the L-BFGS? Thank you for your patience, I'm just getting started with machine learning!
That is very common with these optimization algorithms: the minimum you arrive at depends on your weight initialization. The fact that you are getting different minima doesn't necessarily mean something is wrong with your implementation. Instead, check your gradients to make sure they are correct using the finite-differences method, and also look at your train/validation/test accuracy to see whether it is acceptable.
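A sketch of that finite-difference check (assuming you also have, or can derive, an analytic gradient to compare against):
// Central-difference approximation of d(cost)/d(theta_j), compared against
// an analytic gradient; large relative differences point to a bug.
static void CheckGradient(Func<double[], double> cost,
                          Func<double[], double[]> analyticGrad,
                          double[] thetas)
{
    const double eps = 1e-5;
    double[] expected = analyticGrad(thetas);
    for (int j = 0; j < thetas.Length; j++)
    {
        double saved = thetas[j];
        thetas[j] = saved + eps;
        double up = cost(thetas);
        thetas[j] = saved - eps;
        double down = cost(thetas);
        thetas[j] = saved; // restore
        double numeric = (up - down) / (2 * eps);
        Console.WriteLine("theta[" + j + "]: analytic=" + expected[j] + " numeric=" + numeric);
    }
}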

C# regarding a parameter and running program

I hope somebody can be of assistance; thank you in advance.
I am using C# to write some simulation models for evaluating the mean stationary time of a request in a system and the degree of usage of the serving station of the system.
I am using a function to generate the required numbers:
public double genTP(double miu)
{
    Random random = new Random();
    double u, x;
    u = (double)random.NextDouble();
    x = (-1 / miu) * Math.Log(1 - u); // inverse-transform sampling: exponential with rate miu
    return x;
}
This is the main:
Program p1 = new Program();
double NS = 1000000;
double lambda = 4;
double miu = 10;
double STP = 0;
double STS = 0;
double STL = 0;
double i = 1;
double Ta = 0;
double Tp = 0;
double Dis = 0;

do
{
    Tp = p1.genTP(miu);
    STP += Tp;
    STS += Ta + Tp;
    Dis = p1.genDIS(lambda);
    if (Dis < Ta + Tp)
    {
        Ta = Ta + Tp - Dis;
    }
    else
    {
        STL += Dis - (Ta + Tp);
        Ta = 0;
    }
    i++;
} while (i <= NS);

Console.WriteLine(STS / NS);
Console.WriteLine((STP / (STP + STL)) * 100);
1) The mean stationary time (which is r) returned is wrong: I get values like 0.09..., but I should get something like ~0.1665. The algorithm is OK, I am 100% sure of that; I tried the same thing in Matlab and it was fine. Also, the degree of usage (the last line) returned is OK (around ~39.89); only r is wrong. Could it be a problem with the function, especially the random function that should generate a number?
2) Regarding my function genTP: if I change the parameter from double to int, it returns 0 at the end. I used the debugger to check why, and I saw that when the method calculates the value of x, the subexpression (-1 / miu) evaluates to 0 automatically. I tried to cast to double but with no result. I was thinking this could be a source of the problem.
You're creating a new instance of Random each time you call genTP. If this is called multiple times in quick succession (as it is) then it will use the same seed each time. See my article on random numbers for more information.
Create a single instance of Random and pass it into the method:
private static double GenerateTP(Random random, double miu)
{
    double u = random.NextDouble();
    return (-1 / miu) * Math.Log(1 - u);
}
And...
Random random = new Random();
do
{
    double tp = GenerateTP(random, miu);
    ...
}
A few more suggestions:
Declare your variables at the point of first use, with minimal scope
Follow .NET naming conventions
Don't make methods instance methods if they don't use any state
I prefer to create a static random field in the calculation class:
static Random random = new Random();
Now I can use it without worrying about quick succession, and I don't need to pass it as a function parameter (I'm not trying to say it works faster, just that it reads more like mathematical notation).
Regarding your second question: it happens because the compiler performs an integer division, since both operands are integers.
int / int = int (and the result is truncated).
If either of the arguments is a floating-point type, the operation is promoted to a floating-point division. If you change your argument to an int, you should either use -1d (a double with value -1), or cast miu to double before use:
x = (-1d / miu) ...
or
x = (-1 / (double)miu) ...
Or, of course, use them both as doubles. I prefer the first way.
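A two-line demonstration of the difference:
int miu = 10;
Console.WriteLine(-1 / miu);  // prints 0    (integer division truncates)
Console.WriteLine(-1d / miu); // prints -0.1 (floating-point division)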

Can I improve the "double.IsNaN( x )" function call on embedded C#?

I have a line of code that is called millions of times inside a for loop, checking whether a passed argument is double.NaN. I've profiled my application and one of the bottlenecks is this simple function:
public void DoSomething(double[] args)
{
    for (int i = 0; i < args.Length; i++)
    {
        if (double.IsNaN(args[i]))
        {
            // Do something
        }
    }
}
Can I optimize it even if I can't change the code inside the if?
If you have really optimized other parts of your code, you can let this function become a little bit cryptic and utilize the definition of Not a Number (NaN):
"The predicate x != y is True but all others, x < y, x <= y, x == y, x >= y and x > y, are False whenever x or y or both are NaN." (IEEE Standard 754 for Binary Floating-Point Arithmetic)
Translating that to your code you would get:
public void DoSomething(double[] args)
{
    for (int i = 0; i < args.Length; i++)
    {
        double value = args[i];
        if (value != value) // true only for NaN, per IEEE 754
        {
            // Do something
        }
    }
}
On an ARM device running Windows CE + .NET Compact Framework 3.5, with around a 50% probability of getting a NaN, value != value was twice as fast as double.IsNaN(value).
Just be sure to measure your application execution after!
I find it hard (but not impossible) to believe that any other check on args[i] would be faster than double.IsNaN().
One possibility is if IsNaN ends up as an actual function call. There is an overhead to calling functions, sometimes substantial, especially if the function itself is relatively small.
You could take advantage of the fact that the bit patterns of IEEE 754 NaNs are well known and just do some bit checks (without calling a function to do it); this would remove that overhead. In C, I'd try that with a macro. Where the exponent bits are all 1 and the mantissa bits are not all 0, that's a NaN (signalling vs. quiet is decided by the most significant mantissa bit, but you're probably not concerned with that). In addition, NaNs are never equal to one another, so you could test args[i] for equality with itself: false means it's a NaN.
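A sketch of what such a bit-level check could look like in C# (semantically the same test double.IsNaN performs; whether it is actually faster on your platform needs measuring):
static bool IsNaNBits(double d)
{
    long bits = BitConverter.DoubleToInt64Bits(d);
    // NaN: exponent bits (62..52) all ones, mantissa bits (51..0) non-zero.
    return (bits & 0x7FF0000000000000L) == 0x7FF0000000000000L
        && (bits & 0x000FFFFFFFFFFFFFL) != 0L;
}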
Another possibility may be workable if the array is used more often than it's changed. Maintain another array of booleans which indicate whether or not the associated double is a NaN. Then, whenever one of the doubles changes, compute the associated boolean.
Then your function becomes:
public void DoSomething(double[] args, bool[] nan) {
    for (int i = 0; i < args.Length; i++) {
        if (nan[i]) {
            // Do something
        }
    }
}
This is the same sort of "trick" used in databases where you pre-compute values only when the data changes rather than every time you read it out. If you're in a situation where the data is being used a lot more than being changed, it's a good optimisation to look into (most algorithms can trade off space for time).
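A sketch of keeping such a cache in sync (the SampleBuffer wrapper is hypothetical): pay the NaN check once per write instead of once per read:
class SampleBuffer
{
    private readonly double[] values;
    private readonly bool[] nan; // precomputed double.IsNaN results

    public SampleBuffer(int length)
    {
        values = new double[length];
        nan = new bool[length];
    }

    public void Set(int i, double value)
    {
        values[i] = value;
        nan[i] = double.IsNaN(value); // computed on write, read for free later
    }

    public double Get(int i) { return values[i]; }
    public bool IsNaNAt(int i) { return nan[i]; }
}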
But remember the optimisation mantra: Measure, don't guess!
Just to further reiterate how important performance testing is, I ran the following test on my Core i5-750, in 64-bit and 32-bit mode, on Windows 7, compiled with VS 2010 targeting .NET 4.0, and got the following results:
public static bool DoSomething(double[] args) {
    bool ret = false;
    for (int i = 0; i < args.Length; i++) {
        if (double.IsNaN(args[i])) {
            ret = !ret;
        }
    }
    return ret;
}

public static bool DoSomething2(double[] args) {
    bool ret = false;
    for (int i = 0; i < args.Length; i++) {
        if (args[i] != args[i]) {
            ret = !ret;
        }
    }
    return ret;
}

public static IEnumerable<R> Generate<R>(Func<R> func, int num) {
    for (int i = 0; i < num; i++) {
        yield return func();
    }
}

static void Main(string[] args) {
    Random r = new Random();
    double[] data = Generate(() => {
        var res = r.NextDouble();
        return res < 0.5 ? res : Double.NaN;
    }, 1000000).ToArray();

    Stopwatch sw = new Stopwatch();
    sw.Start();
    DoSomething(data);
    Console.WriteLine(sw.ElapsedTicks);

    sw.Reset();
    sw.Start();
    DoSomething2(data);
    Console.WriteLine(sw.ElapsedTicks);

    Console.ReadKey();
}
In x86 mode (Release, naturally):
DoSomething() = 139544
DoSomething2() = 137924
In x64 mode:
DoSomething() = 19417
DoSomething2() = 17448
However, something interesting happens if our distribution of NaNs is sparser. If we change our 0.5 constant to 0.9 (only 10% NaNs) we get:
x86:
DoSomething() = 31483
DoSomething2() = 31731
x64:
DoSomething() = 31432
DoSomething2() = 31513
Reordering the calls shows the same trend as well. Food for thought.
