I'm working on an image processing library which extends OpenCV, HALCON, ... . The library must work with .NET Framework 3.5, and since my experience with .NET is limited, I would like to ask some questions regarding performance.
I have encountered a few specific things which I cannot properly explain to myself, and I would like to ask you a) why they happen and b) what the best practice is for dealing with them.
My first question is about Math.Pow. I already found some answers here on StackOverflow which explain it quite well (a), but not what to do about it (b). My benchmark program looks like this:
Stopwatch watch = new Stopwatch(); // from System.Diagnostics
watch.Start();
for (int i = 0; i < 1000000; i++)
{
    double result = Math.Pow(4, 7); // the function call
}
watch.Stop();
The result was not very nice (~300 ms on my computer). I ran the test 10 times and calculated the average value.
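A slightly more defensive version of such a micro-benchmark might look like the sketch below. It is only an illustration, not the asker's actual harness: the class and method names are hypothetical, and it assumes that one warm-up call is enough to get the method JIT-compiled before timing starts. The results are accumulated into a checksum so that the loop body cannot be discarded as dead code.

```csharp
using System;
using System.Diagnostics;

// Hypothetical benchmark helper; names are not from the original post.
public static class PowBenchmarkSketch
{
    // Times 'iterations' calls to Math.Pow(4, 7) and returns elapsed milliseconds.
    // The checksum is returned so the computed values are actually used.
    public static long MeasureMathPow(int iterations, out double checksum)
    {
        Math.Pow(4, 7); // warm-up call before the timed region (assumption: one call suffices)

        double sum = 0.0;
        Stopwatch watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sum += Math.Pow(4, 7);
        }
        watch.Stop();

        checksum = sum;
        return watch.ElapsedMilliseconds;
    }
}
```

When comparing against a literal such as 4*4*4*4*4*4*4, keep in mind that an expression made only of constants is a compile-time constant in C#, so timing it measures little more than the loop itself.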
My first idea was to check whether this is because it is a static function. So I implemented my own class:
class MyMath
{
public static double Pow (double x, double y) //Using some expensive functions to calculate the power
{
return Math.Exp(Math.Log(x) * y);
}
public static double PowLoop (double x, int y) // Using Loop
{
double res = x;
for(int i = 1; i < y; i++)
res *= x;
return res;
}
public static double Pow7 (double x) // Using inline calls
{
return x * x * x * x * x * x * x;
}
}
The third thing I checked was replacing Math.Pow(4,7) directly with 4*4*4*4*4*4*4.
The results are (the average out of 10 test runs)
300 ms Math.Pow(4,7)
356 ms MyMath.Pow(4,7) //gives wrong rounded results
264 ms MyMath.PowLoop(4,7)
92 ms MyMath.Pow7(4)
16 ms 4*4*4*4*4*4*4
My conclusion is basically this: don't use Math for Pow. My only problem is... do I really have to implement my own Math class now? It seems somehow inefficient to implement my own class just for the power function. (By the way, PowLoop and Pow7 are even faster in the Release build by ~25%, while Math.Pow is not.)
So my final questions are
a) Am I wrong to avoid Math.Pow altogether, except perhaps for fractional exponents? (That would make me somewhat sad.)
b) if you have code to optimize, are you really writing all such mathematical operations directly?
c) is there maybe already a faster (open-source^^) library for mathematical operations
d) The source of my question is basically this: I had assumed that the .NET Framework itself already provides very optimized code / compile results for such basic operations, be it the Math class or array handling, and I was a little surprised how much I would gain by writing my own code. Are there other general "fields" or anything else to look out for in C# where I cannot trust C# directly?
Three things to bear in mind:
You probably don't need to optimise this bit of code. You've just done a million calls to the function in less than a second. Is this really going to cause big problems in your program?
Math.Pow is probably fairly optimal anyway. At a guess, it will be calling a proper numerics library written in a lower level language, which means you shouldn't expect orders of magnitude increases.
Numerical programming is harder than you think. Even the algorithms that you think you know how to calculate, aren't calculated that way. For example, when you calculate the mean, you shouldn't just add up the numbers and divide by how many numbers you have. (Modern numerics libraries use a two pass routine to correct for floating point errors.)
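To make the remark about the mean concrete, here is a minimal sketch of one common two-pass scheme: the first pass computes the naive mean, and the second pass sums the residuals against that estimate and uses them as a correction. The class and method names are hypothetical, and this is not claimed to be what any particular numerics library does internally.

```csharp
using System;

// Hypothetical helper; illustrates a corrected two-pass mean.
public static class MeanSketch
{
    public static double TwoPassMean(double[] values)
    {
        if (values == null || values.Length == 0)
            throw new ArgumentException("values must contain at least one element");

        // Pass 1: naive estimate of the mean.
        double sum = 0.0;
        foreach (double v in values) sum += v;
        double mean = sum / values.Length;

        // Pass 2: the residuals are small numbers, so summing them loses
        // less precision; their average corrects the first estimate.
        double residual = 0.0;
        foreach (double v in values) residual += v - mean;

        return mean + residual / values.Length;
    }
}
```

For well-behaved input the correction term is zero or tiny; it only matters when the naive sum has accumulated rounding error.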
That said, if you decide that you definitely do need to optimise, then consider using integers rather than floating point values, or outsourcing this to another numerics library.
Firstly, integer operations are much faster than floating point. If you don't need floating point values, don't use the floating point data type. This is generally true for any programming language.
Secondly, as you have stated yourself, Math.Pow can handle reals. It makes use of a much more intricate algorithm than a simple loop. No wonder it is slower than simply looping. If you get rid of the loop and just do n multiplications, you are also cutting off the overhead of setting up the loop - thus making it faster. But if you don't use a loop, you have to know the value of the exponent beforehand - it can't be supplied at runtime.
I am not really sure why Math.Exp and Math.Log is faster. But if you use Math.Log, you can't find the power of negative values.
Basically, ints are faster, and avoiding loops avoids extra overhead, but you trade off some flexibility when you go for those. It is generally a good idea to avoid reals when all you need are integers, but in this case coding up a custom function when one already exists seems a little too much.
The question you have to ask yourself is whether this is worth it. Is Math.Pow actually slowing your program down? And in any case, the Math.Pow already bundled with your language is often the fastest or very close to that. If you really wanted to make an alternate implementation that is really general purpose (i.e. not limited to only integers, positive values, etc.), you will probably end up using the same algorithm used in the default implementation anyway.
When you are talking about making a million iterations of a line of code then obviously every little detail will make a difference.
Math.Pow() is a function call which will be substantially slower than your manual 4*4...*4 example.
Don't write your own class, as it's doubtful you'll be able to write anything more optimised than the standard Math class.
Related
I need to operate on variables that can be either positive or negative, in such a way that a method can add to or subtract from a value's absolute value without changing its sign.
I have the following code to do something like that:
public static class FloatExtensions
{
public static float sumToItsAbsolute(this float currentValue, float summedValue)
{
float currentAbsoluteValue = Math.Abs(currentValue);
return (currentValue / currentAbsoluteValue) * (currentAbsoluteValue + summedValue);
}
}
I can work with it, but I would like to know if there is a better approach.
EDIT: By "better", I mean more performant, and free of potential bugs I have not seen yet.
Thanks in advance.
If you spend a little time looking around the documentation of the Math class (which you are already using, so it is known to you already), you will notice that it offers many handy functions. Among them is a function to get the sign of a value.
Thus, instead of doing v / Math.Abs(v) to get the sign of v as values -1 or 1 (functional and creative, but cumbersome, and perhaps even prone to precision errors IEEE floating point operations might suffer from), you could just do Math.Sign(v) to get the sign.
With regards to bugs: I'd like to throw the ball back into your court and like to encourage you to think a little bit about the behavior of your code for yourself first. Note how your code is doing a division and a multiplication of a sum. Can you find values for currentValue where either the division or the multiplication might fail or yield the wrong result?
Performance-wise, I really doubt this function is what would make your program slower than you desire. If your program is behaving slower than expected in some aspects, use a profiler to figure out the actual part(s) of your program that account for most of the performance cost instead of eyeballing and blind-guessing.
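For readers who want to see the Math.Sign suggestion spelled out, a minimal sketch follows. The class name and the treatment of zero are assumptions: zero has no sign to preserve, so this version arbitrarily treats it as positive rather than dividing by zero. It does not guard against a negative summedValue that is larger than the absolute value, which would still flip the sign.

```csharp
using System;

// Hypothetical variant of the asker's extension method, using Math.Sign.
public static class FloatExtensionsSketch
{
    public static float SumToItsAbsolute(this float currentValue, float summedValue)
    {
        // Math.Sign returns -1, 0 or 1 (and throws ArithmeticException for NaN).
        int sign = Math.Sign(currentValue);

        // Assumption: treat zero as positive, since it has no sign of its own.
        if (sign == 0) sign = 1;

        return sign * (Math.Abs(currentValue) + summedValue);
    }
}
```

With this version, (-2f).SumToItsAbsolute(3f) moves the value further away from zero on the negative side, and a currentValue of zero no longer produces NaN.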
I've encountered non-optimal code in several open source projects, when programmers do not think about what they are using.
There is up to a 10 times performance difference between the two cases, because Math.Pow uses the Exp and Ln functions internally, as explained in this answer.
Plain multiplication is better than Math.Pow in most cases (with small powers), but the best, of course, is the exponentiation by squaring algorithm.
Thus, I think that the compiler or JITter must perform such optimization with powers and other functions. Why is it still not introduced? Am I right?
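For reference, exponentiation by squaring for a non-negative integer exponent can be sketched as below. The class and method names are hypothetical. The loop needs a number of multiplications that grows with the logarithm of the exponent, not with the exponent itself.

```csharp
using System;

// Hypothetical helper; illustrates exponentiation by squaring.
public static class PowSketch
{
    public static double PowInt(double x, int exponent)
    {
        if (exponent < 0)
            throw new ArgumentOutOfRangeException("exponent", "must be non-negative");

        double result = 1.0;
        double factor = x; // holds x^(2^k) in the k-th iteration
        while (exponent > 0)
        {
            // Multiply in the current power of x if this bit of the exponent is set.
            if ((exponent & 1) == 1) result *= factor;
            factor *= factor;
            exponent >>= 1;
        }
        return result;
    }
}
```

For inputs whose powers are not exactly representable, the result may differ in the last bits from what Math.Pow returns, because the rounding happens at different points.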
Read the answer you've referenced again: it clearly states that the CRT uses a pow() function which Microsoft bought from Intel. The example you see using Math.Log and Math.Exp is one the writer of the article found in a programming book.
The "problem" with general exponentiation methods is that they are built to produce the most accurate results for all cases. This often results in sub-optimal performance for certain cases. To increase the performance of those cases, conditional logic must be added, which results in a performance loss for all cases. Because squaring or cubing a value is so simple to write without Math.Pow, there is no need to optimize these cases at the cost of all the others.
I would say that would be a bad idea, because the two methods do NOT return the same results every time.
Here is a small test script:
var r = new Random();
// Random is not thread-safe, so the test runs sequentially (no AsParallel)
var allIdentical = Enumerable.Range(0, 1000).All(p =>
{
var d = r.NextDouble();
var pow = Math.Pow(d, 2.0);
var sqr = d * d;
var identical = pow == sqr;
if (!identical)
MessageBox.Show(d.ToString());
return identical;
});
The two implementations have different accuracies. If a calculation is to be reliable, it should be reproducible. If, for example, the square optimization were used only in the release build, then the debug and release versions would return different results. That can be quite a mess when debugging errors...
I've been using System.Math quite a lot lately and the other day I was wondering, how Microsoft would have implemented the Sqrt method in the library. So I popped open my best mate Reflector and tried to Disassemble the method in the library, but it showed:
[MethodImpl(MethodImplOptions.InternalCall),ReliabilityContract(Consistency.WillNotCorruptState, Cer.Success)]
public static extern double Sqrt(double d);
That day for the first time ever, I realized how dependent my kids are on the framework, to eat.
Jokes apart, I was wondering what sort of algorithm MS would have used to implement this method, or in other words, how you would write your own implementation of Math.Sqrt in C# if you had no library support.
Any of the methods you find back with Reflector or the Reference Source that have the MethodImplOptions.InternalCall attribute are actually implemented in C++ inside the CLR. You can get the source code for these from the SSCLI20 distribution. The relevant file is clr/src/vm/ecall.cpp, it contains a table of method names with function pointers, used by the JIT compiler to directly embed the call address into the generated machine code. The relevant table section is
FCIntrinsic("Cos", COMDouble::Cos, CORINFO_INTRINSIC_Cos)
FCIntrinsic("Sqrt", COMDouble::Sqrt, CORINFO_INTRINSIC_Sqrt)
FCIntrinsic("Round", COMDouble::Round, CORINFO_INTRINSIC_Round)
...
Which takes you to clr/src/classlibnative/float/comfloat.cpp
FCIMPL1_V(double, COMDouble::Sqrt, double d)
WRAPPER_CONTRACT;
STATIC_CONTRACT_SO_TOLERANT;
return (double) sqrt(d);
FCIMPLEND
It just calls the CRT function. But that's not what happens in the x86 jitter, note the 'intrinsic' in the table declaration. You won't find that in the SSCLI20 version of the jitter, it is a simple one unencumbered by patents. The shipping one however does turn it into an intrinsic:
double d = 2.0;
Console.WriteLine(Math.Sqrt(d));
translates to
00000008 fld dword ptr ds:[0072156Ch]
0000000e fsqrt
..etc
In other words, Math.Sqrt() translates to a single floating point machine code instruction. Check this answer for details on how that beats native code handily.
The function will be translated into assembler instructions. Such as the fsqrt instruction of the x87.
You could implement floating point numbers in software, but that will most likely be much slower. I think an iterative algorithm is the typical implementation for Sqrt.
Google.com will give you more answers than StackOverflow.com
Have a look at this page:
http://en.wikipedia.org/wiki/Methods_of_computing_square_roots
One algorithm can be found under the title " Binary numeral system (base 2)" in the above wiki page.
But software implementations will NOT be efficient. Modern CPUs have hardware implementations of math functions in the FPU. You just need to invoke the correct processor instructions (in assembly or machine language).
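As an illustration of the base-2 digit-by-digit method from that page, here is a minimal sketch for unsigned 32-bit integers. It returns the integer part of the square root only; the class and method names are hypothetical, and it is meant to show the idea, not to compete with the hardware instruction.

```csharp
// Hypothetical helper; digit-by-digit (base 2) integer square root.
public static class IntegerSqrtSketch
{
    // Returns floor(sqrt(n)).
    public static uint IntegerSqrt(uint n)
    {
        uint result = 0;
        uint bit = 1u << 30; // largest power of four that fits in 32 bits

        // Start at the largest power of four not exceeding n.
        while (bit > n) bit >>= 2;

        while (bit != 0)
        {
            if (n >= result + bit)
            {
                // This binary digit of the root is 1.
                n -= result + bit;
                result = (result >> 1) + bit;
            }
            else
            {
                // This binary digit of the root is 0.
                result >>= 1;
            }
            bit >>= 2;
        }
        return result;
    }
}
```

The loop determines one binary digit of the root per iteration using only shifts, additions and comparisons.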
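public double Sqrt(int number)
{
    // Guard: 0 would cause 0/0 below; negative input has no real square root
    if (number <= 0) return number == 0 ? 0 : double.NaN;
    double x = number / 2.0; // floating point division, so Sqrt(1) does not start from 0
    for (int i = 0; i < 100; i++) x = (x + number / x) / 2d; // Newton's iteration
    return x;
}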
Very crude method but if I used something more elaborate such as log method, you could ask "and how can I implement the log method?"
To recreate the System.Math.Sqrt function, just do this:
public static double Sqrt(double n) => Math.Pow(n, 0.5); // 1 / 2 would be integer division, i.e. 0
Yes, I am using a profiler (ANTS). But at the micro-level it cannot tell you how to fix your problem. And I'm at a microoptimization stage right now. For example, I was profiling this:
for (int x = 0; x < Width; x++)
{
for (int y = 0; y < Height; y++)
{
packedCells.Add(Data[x, y].HasCar);
packedCells.Add(Data[x, y].RoadState);
packedCells.Add(Data[x, y].Population);
}
}
ANTS showed that the y-loop-line was taking a lot of time. I thought it was because it has to constantly call the Height getter. So I created a local int height = Height; before the loops, and made the inner loop check for y < height. That actually made the performance worse! ANTS now told me the x-loop-line was a problem. Huh? That's supposed to be insignificant, it's the outer loop!
Eventually I had a revelation - maybe using a property for the outer-loop-bound and a local for the inner-loop-bound made CLR jump often between a "locals" cache and a "this-pointer" cache (I'm used to thinking in terms of CPU cache). So I made a local for Width as well, and that fixed it.
From there, it was clear that I should make a local for Data as well - even though Data was not even a property (it was a field). And indeed that bought me some more performance.
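The end state described above, with Width, Height and Data all copied into locals before the loops, might look roughly like this sketch. The cell type, its field types and the class name are assumptions, since the original post does not show them. Whether this actually helps has to be measured on the target runtime.

```csharp
using System.Collections.Generic;

// Hypothetical cell type; the original post does not show its definition.
public struct SketchCell
{
    public int HasCar;
    public int RoadState;
    public int Population;
}

// Hypothetical container standing in for the asker's class.
public class GridSketch
{
    public SketchCell[,] Data;
    public int Width { get { return Data.GetLength(0); } }
    public int Height { get { return Data.GetLength(1); } }

    public List<int> Pack()
    {
        // Read the members once so the loops only touch locals.
        int width = Width;
        int height = Height;
        SketchCell[,] data = Data;

        List<int> packedCells = new List<int>(width * height * 3);
        for (int x = 0; x < width; x++)
        {
            for (int y = 0; y < height; y++)
            {
                SketchCell cell = data[x, y]; // one array access per cell
                packedCells.Add(cell.HasCar);
                packedCells.Add(cell.RoadState);
                packedCells.Add(cell.Population);
            }
        }
        return packedCells;
    }
}
```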
Bafflingly, though, reordering the x and y loops (to improve cache usage) made zero difference, even though the array is huge (3000x3000).
Now, I want to learn why the stuff I did improved the performance. What book do you suggest I read?
CLR via C# by Jeffrey Richter.
It is such a great book that someone stole it from my library, together with C# in Depth.
The CLR is not involved at all here, this should all be translated to straight machine code without calls into the CLR. The JIT compiler is responsible for generating that machine code, it has an optimizer that tries to come up with the most efficient code. It has limitations, it cannot spend a large amount of time on it.
One of the important things it does is figuring out what local variables should be stored in the CPU registers. That's something that changed when you put the Height property in a local variable. It possibly decided to store that variable in a register. But now there's one less available to store another variable. Like the x or y variable, one that's critical for speed. Yes, that will slow it down.
You got a bad diagnostic about the outer loop. That could possibly be caused by the JIT optimizer re-arranging the loop code, giving the profiler a harder time mapping the machine code back to the corresponding C# statement.
Similarly, the optimizer might have decided that you were using the array inefficiently and switched the indexing order back. Not so sure it actually does that, but not impossible.
Anyhoo, the only way you can get some insight here is by looking at the generated machine code. There are many decent books about x86 assembly code, although they might be a bit hard to find these days. Your starting point is Debug + Windows + Disassembly.
Keep in mind however that even the machine code is not a very good predictor of how efficient code is going to run. Modern CPU cores are enormously complicated and the machine code is no longer representative for what actually happens inside the core. The only tried and true way is what you've already been doing: trial and error.
Albin - no. Honestly I didn't think that running outside a profiler would change the performance difference, so I didn't bother. You think I should have? Has that been a problem for you before? (I am compiling with optimizations on though)
Running under a debugger changes the performance: when it's being run under a debugger, the just-in-time compiler automatically disables optimizations (to make it easier to debug)!
If you must, use the debugger to attach to an already-running already-JITted process.
One thing you should know about working with Arrays is that the CLR will always make sure that array-indices are not out-of-bounds. It has an optimization for 1-dimensional arrays but not for 2+ dimensions.
Knowing this, you may want to benchmark MyCell Data[][] instead of MyCell Data[,]
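A minimal sketch of the two layouts, for benchmarking purposes, could look like this. It uses int elements in place of MyCell to stay self-contained, and all names are hypothetical. The jagged version loops over row.Length of a local row array, a pattern for which the JIT can commonly skip the per-element bounds check; whether it is faster in a given program has to be measured.

```csharp
// Hypothetical helpers comparing rectangular and jagged array traversal.
public static class ArrayLayoutSketch
{
    public static long SumRectangular(int[,] data)
    {
        long sum = 0;
        int rows = data.GetLength(0);
        int cols = data.GetLength(1);
        for (int x = 0; x < rows; x++)
            for (int y = 0; y < cols; y++)
                sum += data[x, y];
        return sum;
    }

    // Copies a rectangular array into an array of row arrays.
    public static int[][] ToJagged(int[,] source)
    {
        int rows = source.GetLength(0);
        int cols = source.GetLength(1);
        int[][] result = new int[rows][];
        for (int x = 0; x < rows; x++)
        {
            result[x] = new int[cols];
            for (int y = 0; y < cols; y++)
                result[x][y] = source[x, y];
        }
        return result;
    }

    public static long SumJagged(int[][] data)
    {
        long sum = 0;
        for (int x = 0; x < data.Length; x++)
        {
            int[] row = data[x]; // each row is an ordinary 1-dimensional array
            for (int y = 0; y < row.Length; y++)
                sum += row[y];
        }
        return sum;
    }
}
```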
Hm, I don't think that the loop unrolling is the real problem.
1. I'd try to avoid accessing the array Data three times per inner loop.
2. I'd also recommend rethinking the three Add statements: you are apparently accessing a collection three times to add some trivial data. Make it only one access per iteration and add a data type containing three entries:
for (int y = 0; ... {
tTemp = Data[x, y];
packedCells.Add(new {
tTemp.HasCar, tTemp.RoadState, tTemp.Population
});
}
Another look reveals, that you are basically vectorizing a matrix by copying it into an array (or some other sequential collection)... Is that necessary at all? Why don't you just define a specialized indexer which simulates that linear access? Even better, if you only need to enumerate the entries (in that example you do, no random access required), why don't you use an adequate LINQ expression?
Point 1) Educated guesses are not the way to do performance tuning. In this case I can guess about as well as most, but guessing is the wrong way to do it.
Point 2) Profilers need to be well understood before you know what they're actually telling you. Here's a discussion of the issues. For example, what many profilers do is tell you "where the program spends its time", i.e. where the program counter spends its time, so they are almost absolutely blind to time requested by function calls, which is what your inner loop seems to consist of.
I do a lot of performance tuning, and here is what I do. I cycle between two activities:
Overall time measurement. This doesn't require special tools. I'm not trying to measure individual routines.
"Bottleneck" location. This does not require running the code at any kind of speed, because I'm not measuring. What I'm doing is locating lines of code that are responsible for a significant percent of time. I know which lines they are because they are on the stack for that percent, and stack samples easily find them.
Once I find a "bottleneck" and fix it, I go back to the first step, measure what percent of time I saved, and do it all again on the next "bottleneck", typically from 2 to 6 times. I am helped by the "magnification effect", in which a fixed problem magnifies the percentage used by remaining problems. It works for both macro and micro optimization.
(Sorry if I can't write "bottleneck" without quotes, because I don't think I've ever found a performance problem that resembled the neck of a bottle. Rather they were all simply doing things that didn't really need to be done.)
Since the comment might be overlooked, I repeat myself: it is quite cumbersome to optimize code which is superfluous per se. You do not really need to explicitly linearize your matrix at all, see the comment above: define a linearizing adapter which implements IEnumerable<MyCell> and feed it into the formatter.
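Such a linearizing adapter could be sketched as follows. The class name is hypothetical, and it is written generically so that it does not depend on the definition of MyCell. It enumerates the matrix row by row without copying it; how it is then handed to a formatter is outside the scope of this sketch.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical adapter exposing a 2-D array as a flat sequence.
public class LinearView<T> : IEnumerable<T>
{
    private readonly T[,] _matrix;

    public LinearView(T[,] matrix)
    {
        if (matrix == null) throw new ArgumentNullException("matrix");
        _matrix = matrix;
    }

    // Iterator block: elements are produced lazily, nothing is copied.
    public IEnumerator<T> GetEnumerator()
    {
        int rows = _matrix.GetLength(0);
        int cols = _matrix.GetLength(1);
        for (int i = 0; i < rows; i++)
            for (int j = 0; j < cols; j++)
                yield return _matrix[i, j];
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}
```

A consumer can then write foreach (MyCell cell in new LinearView<MyCell>(Data)) without building an intermediate collection.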
I am getting a warning when I try to add another answer, so I am going to recycle this one.. :) After reading Steve's comments and thinking about it for a while, I suggest the following:
If serializing a multi-dimensional array is too slow (I haven't tried, I just believe you...), don't use it at all! It appears that your matrix is not sparse and has fixed dimensions. So define the structure holding your cells as a simple linear array with an indexer:
[Serializable()]
class CellMatrix {
Cell [] mCells;
public int Rows { get; }
public int Columns { get; }
public Cell this[int i, int j] {
get {
return mCells[i + Rows * j];
}
// setter...
}
// constructor taking rows/cols...
}
A thing like this should serialize as fast as a native array does... I don't recommend hard-coding the layout of Cell in order to save a few bytes there...
Cheers,
Paul
Even experienced programmers write C# code like this sometimes:
double x = 2.5;
double y = 3;
if (x + 0.5 == 3) {
// this will never be executed
}
Basically, it's common knowledge that two doubles (or floats) can never be precisely equal to each other, because of the way the computer handles floating point arithmetic.
The problem is, everyone sort-of knows this, but code like this is still all over the place. It's just so easy to overlook.
Questions for you:
How have you dealt with this in your development organization?
Is this so common that the compiler should be checking for it? Should we all be screaming really loud for VS2010 to include a compile-time warning when someone compares two doubles/floats?
UPDATE: Folks, thanks for the comments. I want to clarify that I most certainly understand that the code above is incorrect. Yes, you never want to == compare doubles and floats. Instead, you should use epsilon-based comparison. That's obvious. The real question here is "how do you pinpoint the problem", not "how do you solve the technical issue".
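For completeness, an epsilon-based comparison of the kind mentioned in the update might be sketched as below. The class name, method name and the choice of combining an absolute and a relative tolerance are assumptions for illustration; suitable tolerance values depend entirely on the application.

```csharp
using System;

// Hypothetical helper; tolerance-based comparison of two doubles.
public static class DoubleComparisonSketch
{
    public static bool NearlyEqual(double a, double b, double absoluteTolerance, double relativeTolerance)
    {
        if (a == b) return true; // exact match, including equal infinities
        if (double.IsNaN(a) || double.IsNaN(b)) return false;
        if (double.IsInfinity(a) || double.IsInfinity(b)) return false;

        double difference = Math.Abs(a - b);
        double scale = Math.Max(Math.Abs(a), Math.Abs(b));

        // The absolute tolerance covers values near zero,
        // the relative tolerance covers large magnitudes.
        return difference <= Math.Max(absoluteTolerance, relativeTolerance * scale);
    }
}
```

For example, NearlyEqual(0.1 + 0.2, 0.3, 1e-12, 1e-9) accepts the two values as equal even though == does not.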
Floating point values certainly can be equal to each other, and in the case you've given they always will be equal. You should almost never compare for equality using equals, but you do need to understand why - and why the example you've shown isn't appropriate.
I don't think it's something the compiler should necessarily warn about, but you may want to see whether it's something FxCop can pick up on. I can't see it in the warning list, but it may be there somewhere...
Personally I'm reasonably confident that competent developers would be able to spot this in code review, but that does rely on you having a code review in place to start with. It also relies on your developers knowing when to use double and when to use decimal, which is something I've found often isn't the case...
static int _yes = 0;
static int _no = 0;
static void Main(string[] args)
{
for (int i = 0; i < 1000000; i++)
{
double x = 1;
double y = 2;
if (y - 1 == x)
{
_yes++;
}
else
{
_no++;
}
}
Console.WriteLine("Yes: " + _yes);
Console.WriteLine("No: " + _no);
Console.Read();
}
Output
Yes: 1000000
No: 0
In our organization we have a lot of financial calculations and we don't use float and double for such tasks. We use Decimal in .NET, BigDecimal in Java and Numeric in MSSQL to escape round-off errors.
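The difference can be seen with a classic decimal-fraction example. The sketch below uses hypothetical names and only contrasts the two types on one sum; it is not a statement about how any particular financial system is built.

```csharp
// Hypothetical demo contrasting double and decimal on a decimal fraction.
public static class RoundOffSketch
{
    // 0.1 and 0.2 have no exact binary representation,
    // so the double sum is not exactly 0.3.
    public static bool DoubleSumIsExact()
    {
        double a = 0.1;
        double b = 0.2;
        return a + b == 0.3;
    }

    // decimal stores base-10 digits, so 0.1 + 0.2 is exactly 0.3.
    public static bool DecimalSumIsExact()
    {
        decimal a = 0.1m;
        decimal b = 0.2m;
        return a + b == 0.3m;
    }
}
```

DoubleSumIsExact() returns false and DecimalSumIsExact() returns true, which is why decimal is the usual choice for monetary amounts.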
This article describes the problem: What Every Computer Scientist Should Know About Floating-Point Arithmetic
If FxCop or similar (as Jon suggests) doesn't work out for you a more heavy handed approach might be to take a copy of the code - replace all instances of float or double with a class you've written that's somewhat similar to System.Double, except that you overload the == operator to generate a warning!
I don't know if this is feasible in practice as I've not tried it - but let us know if you do try :-)
Mono's Gendarme is an FxCop-like tool. It has a rule called AvoidFloatingPointEqualityRule under the Correctness category. You could try it to find instances of this error in your code. I haven't used it, but it should analyse regular .NET DLLs. The FxCop rule with the same name was removed long ago.