I have been reading up on virtual methods and how they are called. As discussed here and here, I have reached the conclusion that they should not really be that different.
The C# compiler emits IL that calls static methods with the call instruction and virtual/non-virtual instance members with callvirt. It seems it is the job of the JIT to actually figure out whether the object the method is being called on is null or not. So the check is the same for both.
Also, as discussed in the first article, it seems that vtables or tables that hold metadata on method definitions, are flattened at compile time. In other words, the tables contain exactly which method the object should call without a need for a recursive search up the inheritance chain.
With all the above, why are virtual methods considered slower? Is one level of indirection (if any) really that big of a deal? Please explain...
You're looking at the difference between a function call instruction with direct vs indirect addressing. But most of the "cost" of an indirect function call is not the call itself, but the lost opportunity to perform optimizations which require static knowledge of the target. Inlining, cross-procedure aliasing analysis, and so on.
Figuring out which actual method implementation to execute is going to have some cost, as opposed to just knowing. That cost can be very small, and it is quite likely that the cost is entirely negligible for any particular context because it really doesn't take that long. But the cost is non-zero, so in particularly performance sensitive applications it will make some difference.
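To make the inlining point concrete, here is a minimal sketch (the Shape/Square types are invented for illustration): when the static type of the receiver pins down the target, the JIT is free to inline the one-line body; through a base-typed reference it generally has to dispatch indirectly unless it can devirtualize the call.

```csharp
using System;

class Shape
{
    // Virtual: through a Shape-typed reference the JIT usually cannot inline
    // this, because the actual target depends on the runtime type.
    public virtual double Area() => 0.0;
}

sealed class Square : Shape
{
    private readonly double _side;
    public Square(double side) { _side = side; }

    // Sealed class: a call through a Square-typed variable has a statically
    // known target, so the JIT can inline the body (a single multiply).
    public override double Area() => _side * _side;
}

static class Program
{
    static void Main()
    {
        Square square = new Square(3.0);   // static type known: direct call
        Shape shape = square;              // static type is Shape: virtual dispatch

        Console.WriteLine(square.Area());  // candidate for inlining
        Console.WriteLine(shape.Area());   // indirect call through the vtable slot
    }
}
```

Both calls return the same value, of course; the difference is only in what the JIT is allowed to do with them.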
Related
I'm working on a hands-off logging mechanism for my C# application.
Here's what I'd like it to look like:
function a(arg1, arg2, arg3...) calls function b(arg4, arg5, arg6...), which in turn calls log(), which is then able to detect the stack trace (this can be done via Environment.StackTrace) and the values with which each function (e.g. a and b) in the stack trace was called.
I want it to work in debug and release mode (or, at least, in debug mode).
Is this possible to do in .net?
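As the answers explain, the argument values cannot be recovered from the stack at runtime. The closest workable sketch is to pass the values explicitly at each call site (the Logger type and method names here are made up for illustration):

```csharp
using System;

public static class Logger
{
    // The caller hands its own arguments over explicitly, since there is
    // no reliable way to dig them out of the stack afterwards.
    public static void Log(string methodName, params object[] args)
    {
        Console.WriteLine("{0}({1})", methodName, string.Join(", ", args));
        Console.WriteLine(Environment.StackTrace); // works in debug and release
    }
}

public static class Demo
{
    public static int B(int arg4, int arg5)
    {
        Logger.Log("B", arg4, arg5);
        return arg4 + arg5;
    }

    public static int A(int arg1, int arg2)
    {
        Logger.Log("A", arg1, arg2);
        return B(arg1 * 2, arg2 * 2);
    }
}
```

This is manual rather than hands-off, which is exactly why the answers point at the profiling API and AOP frameworks instead.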
Provably not possible:
By the time b is called, the space in the stack used by a's arg1 (the IL stack, so possibly it was never even put on a stack, but had been enregistered for the call) is not guaranteed to still be used by arg1.
By extension, if arg1 is a reference-type, the object it referred to is not guaranteed to not have been garbage collected, if it isn't used after the call to b.
Edit:
A bit more detail, since your comment suggests you're not grokking this and still think it should be possible.
The calling conventions used by the jitter are not specified in the specs for any of the relevant standards, which gives implementers freedom to make improvements. They do indeed differ between 32-bit and 64-bit versions, and different releases.
However, articles from MS people suggest that the convention used is akin to the __fastcall convention. In your call to a, arg1 would be put into the ECX register*, and arg2 into the EDX register (I'm simplifying by assuming 32-bit x86, with amd64 even more arguments are enregistered) of the core the code is running on. arg3 would be pushed on the stack and would indeed exist in memory.
Note that at this point, there is no memory location in which arg1 and arg2 exist, they're only in a CPU register.
In the course of executing the method itself, the registers and memory are used as necessary. And then b is called.
Now, if a is going to need arg1 or arg2 later, it'll have to push that before it calls b. But if it doesn't, then it won't - and things might even be re-ordered to reduce this need. Conversely, those registers may have been used for something else by this point - the jitter isn't stupid, so if it needs a register or a slot on the stack and there's one going unused for the rest of the method, it's going to reuse that space. (For that matter, at the level above this, the C# compiler will reuse slots in the virtual stack that the IL uses.)
So, when b is called, arg4 is placed in register ECX, arg5 in EDX, and arg6 is pushed on the stack. At this point, arg1 and arg2 don't exist, and you can no more find out what they were than you can read a book after it has been recycled and turned into toilet paper.
(An interesting note: it's very common for a method to call another with the same arguments in the same positions, in which case ECX and EDX can be just left alone.)
Then, b returns, putting its return value in the EAX register, or EDX:EAX pair or in memory with EAX pointing to it, depending on size, a does some more work before putting its return in that register, and so on.
Now, this is assuming no optimisations have been done. It's possible that in fact b wasn't called at all, but rather that its code was inlined. In this case, whether the values were in registers or on the stack - and, in the latter case, where they were on the stack - no longer has anything to do with b's signature, and everything to do with where the relevant values are during a's execution. It would be different for another "call" to b, or even for another "call" to b from a, since the entire call of a, including its call to b, could have been inlined in one case, not inlined in another, and inlined differently in yet another. If, for example, arg4 came straight from a value returned by another call, it could be in the EAX register at this point, while arg5 was in ECX because it was the same as arg1, and arg6 was somewhere half-way in the middle of the stack space being used by a.
Another possibility is that the call to b was a tail-call that was eliminated: Because the call to b was going to have its return value immediately returned too by a (or some other possibilities), then rather than pushing to the stack, the values being used by a are replaced in-place, and the return address changed so that the return from b jumps back to the method that called a, skipping some of the work (and reducing memory use to the extent that some functional style approaches that would overflow the stack instead work and indeed work well). In this case, during the call to b, the parameters to a are likely completely gone, even those that had been on the stack.
It's highly debatable whether this last case should even be considered an optimisation at all; some languages heavily depend upon it being done as with it they give good performance and without they give horrible performance if they even work at all (instead of overflowing the stack).
There can be all manner of other optimisations. There should be all manner of other optimisations - if the .NET team or the Mono team do something that makes my code faster or use less memory but otherwise behave the same, without my having to do anything, I for one won't be complaining!
And that's assuming that the person writing the C# in the first place never changed the value of a parameter, which certainly isn't going to be true. Consider this code:
IEnumerable<T> RepeatedlyInvoke<T>(Func<T> factory, int count)
{
    if (count < 0)
        throw new ArgumentOutOfRangeException();
    while (count-- != 0)
        yield return factory();
}
Even if the C# compiler and the jitter had been designed in such a wasteful way that you could guarantee parameters weren't changed in the ways described above, how could you know what count had already been from within the invocation of factory? Even on the first call it's different, and it's not like the above is strange code.
So, in summary:
Jitter: Parameters are often enregistered. You can expect x86 to put 2 pointer, reference or integer parameters in registers, and amd64 to put 4 pointer, reference or integer parameters and 4 floating-point parameters into registers. There is no memory location to read them from.
Jitter: Parameters on the stack are often over-written.
Jitter: There may not be a real call at all, so there's no place to look for parameters as they could be anywhere.
Jitter: The "call" may be re-using the same frame as the last one.
Compiler: The IL may re-use slots for locals.
Human: The programmer may change parameter values.
From all of that, how on earth is it going to be possible to know what arg1 was?
Now, add in the existence of garbage collection. Imagine we could magically know what arg1 was anyway, despite all of this. If it was a reference to an object on the heap, it might still do us no good, because if all of the above meant that there were no more references active on the stack - and it should be clear that this quite definitely does happen - and the GC kicks in, then the object could have been collected. So all we can magically get hold of is a reference to something that no longer exists - indeed quite possibly to an area of the heap now being used for something else - and bang goes the type safety of the entire framework!
It's not in the slightest bit comparable to reflection obtaining the IL, because:
The IL is static, rather than just a state at a given point in time. Likewise, we can get a copy of our favourite books from a library a lot more easily than we can get back our reaction the first time we read them.
The IL doesn't reflect the impact of inlining etc. anyway. If a call was inlined every time it was actually used, and we then used reflection to get a MethodBody of that method, the fact that it's normally inlined is irrelevant.
The suggestions in other answers about profiling, AOP, and interception are as close as you're going to get.
*Actually, for instance members ECX holds the real first parameter (the this reference). Let's pretend everything is static so we don't have to keep pointing this out.
It's impossible in .NET. At runtime the JITter may decide to use CPU registers instead of the stack to store method parameters, or even overwrite the initial (passed) values on the stack. So it would be very costly for .NET to allow logging parameters at any point in the source code.
As far as I know, the only way to do it in general is to use the .NET CLR profiling API. (The Typemock framework, for example, is able to do such things, and it uses the CLR profiling API.)
If you only need to intercept calls to virtual methods/properties (including interface methods/properties), you can use any interception framework (Unity or Castle, for example).
There is some information about the .NET profiling API here:
MSDN Magazine
MSDN Blogs
Brian Long's blog
This is not possible in C#; you should use an AOP approach and perform method argument logging when each method is called. This way you can centralize your logging code and make it reusable, and then you would just need to mark which methods require argument logging.
I believe this could be easily achievable using an AOP framework like PostSharp.
Possibly not going to happen without type-mocking or some ICorDebug magic.
Even the StackFrame class only lists members which allow you to get information about the source, and not parameters.
The functionality you are after, however, exists as IntelliTrace with method logging. You can filter what you need for review.
Since properties are just methods under the hood, and the logic they might perform may or may not benefit from being inlined, it's understandable why the JIT needs to check whether methods are worth inlining.
Automatic properties however (as far as I understand) cannot have any logic, and simply return or set the value of the underlying field. As far as I know, automatic properties are treated by the Compiler and the JIT just like any other methods.
(Everything below will rely on the assumption that the above paragraph is correct.)
Value-type properties show different behavior than the underlying variable itself, but reference-type properties supposedly should have exactly the same behavior as direct access to the underlying variable.
// Automatic Properties Example
public Object MyObj { get; private set; }
Is there any case where automatic properties to Reference Types could show a performance hit by being inlined?
If not, what prevents either the Compiler or the JIT from automatically inlining them?
Note: I understand that the performance gain would probably be insignificant, especially when the JIT is likely to inline them anyway if used enough times - but small as the gain may be, it seems logical that such a seemingly simple optimization would be introduced regardless.
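As a quick check of the assumption above - that auto properties are ordinary methods as far as the runtime is concerned - reflection can find the compiler-generated get_MyObj accessor. A small sketch (the Holder class is invented here):

```csharp
using System;
using System.Reflection;

public class Holder
{
    // Compiled to get_MyObj/set_MyObj accessor methods plus a hidden,
    // compiler-generated backing field.
    public object MyObj { get; private set; }
    public Holder(object value) { MyObj = value; }
}

static class Program
{
    static void Main()
    {
        // The getter is a real, discoverable method.
        MethodInfo getter = typeof(Holder).GetMethod("get_MyObj");
        Console.WriteLine(getter != null);        // True
        Console.WriteLine(getter.IsSpecialName);  // True: flagged as a property accessor
    }
}
```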
EDIT: The JIT compiler doesn't work in the way you think it does, which I guess is why you're probably not completely understanding what I was trying to convey above. I've quoted your comment below:
That is a different matter, but as far as I understand, methods are only checked for being inline-worthy if they are called enough times. Not to mention that the checking itself is a performance hit. (Let the size of the performance hit be irrelevant for now.)
First, most, if not all, methods are checked to see if they can be inlined. Second, keep in mind that methods are only ever JITed once and it is during that one time that the JITer will determine if any methods called inside of it will be inlined. This can happen before any code is executed at all by your program. What makes a called method a good candidate for inlining?
The x86 JIT compiler (x64 and ia64 don't necessarily use the same optimization techniques) checks a few things to determine if a method is a good candidate for inlining - definitely not just the number of times it is called. The article lists things like whether inlining will make the code smaller, whether the call site will be executed many times (i.e., in a loop), and others. Each method is optimized on its own, so a method may be inlined in one calling method but not in another, as in the loop example. These optimization heuristics are only available to the JIT; the C# compiler just doesn't know: it's producing IL, not native code. There's a huge difference between them; native and IL code size can be quite different.
To summarize, the C# compiler doesn't inline properties for performance reasons.
The jit compiler inlines most simple properties, including automatic properties. You can read more about how the JIT decides to inline method calls at this interesting blog post.
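A related note: since .NET 4.5 you can also nudge the JIT yourself with MethodImplOptions.AggressiveInlining - a hint, not a guarantee, much like C++'s inline keyword discussed below. A minimal sketch (the Point type is invented for illustration):

```csharp
using System;
using System.Runtime.CompilerServices;

public class Point
{
    private readonly int _x;
    public Point(int x) { _x = x; }

    // A trivial getter like this is normally inlined by the JIT on its own.
    public int X => _x;

    // Explicit hint to the JIT; it may still decline if its heuristics say no.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public int DoubleX() => _x * 2;
}
```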
Well, the C# compiler doesn't inline any methods at all. I assume this is the case because of the way the CLR is designed. Each assembly is designed to be portable from machine to machine. A lot of times, you can change the internal behavior of a .NET assembly without having to recompile all the code, it can just be a drop in replacement (at least when types haven't changed). If the code were inlined, it breaks that (great, imo) design and you lose that luster.
Let's talk about inlining in C++ first. (Full disclosure, I haven't used C++ full time in a while, so I may be vague, my explanations rusty, or completely incorrect! I'm counting on my fellow SOers to correct and scold me)
The C++ inline keyword is like telling the compiler, "Hey man, I'd like you to inline this function, because I think it will improve performance". Unfortunately, it is only telling the compiler you'd prefer it inlined; it is not telling it that it must.
Perhaps at an earlier date, when compilers were less sophisticated than they are now, the compiler would more often than not compile that function inlined. However, as time went on and compilers grew smarter, compiler writers discovered that in most cases they were better at determining when a function should be inlined than the developer was. For those few cases where they weren't, developers could use the seriouslybro_inlineme keyword (officially called __forceinline in VC++).
Now, why would the compiler writers do this? Well, inlining a function doesn't always mean increased performance. While it certainly can, it can also devastate your program's performance if used incorrectly. For example, we all know one side effect of inlining code is increased code size, or "fat code syndrome" (disclaimer: not a real term). Why is "fat code syndrome" a problem? If you take a look at the article I linked above, it explains, among other things, that memory is slow, and the bigger your code, the less likely it is to fit in the fastest CPU cache (L1). Eventually it may only fit in main memory, and then inlining has gained nothing. However, compilers know when these situations can happen, and do their best to prevent them.
Putting that together with your question, let's look at it this way: the C# compiler is like a developer writing code for the JIT compiler: the JIT is just smarter (but not a genius). It often knows when inlining will benefit or harm execution speed. "Senior developer" C# compiler doesn't have any idea how inlining a method call could benefit the runtime execution of your code, so it doesn't. I guess that actually means the C# compiler is smart, because it leaves the job of optimization to those who are better than it, in this case, the JIT compiler.
Automatic properties however (as far as I understand) cannot have any logic, and simply return or set the value of the underlying field. As far as I know, automatic properties are treated by the Compiler and the JIT just like any other methods.
That automatic properties cannot have any logic is an implementation detail; no special knowledge of that fact is required for compilation. In fact, as you say, auto properties are compiled down to method calls.
Suppose auto props were inlined and the class and property were defined in a different assembly. This would mean that if the property implementation changed, you would have to recompile the application to see that change. That defeats using properties in the first place, which should allow you to change the internal implementation without having to recompile the consuming application.
Automatic properties are just that - property get/set methods generated automatically. As a result there is nothing special in the IL for them. The C# compiler by itself performs only a very small number of optimizations.
As for reasons not to inline - imagine your type is in a separate assembly, and hence you are free to change the source of that assembly to have an insanely complicated get/set for the property. As a result, the compiler can't reason about the complexity of the get/set code when it first sees your automatic property while compiling a new assembly that depends on your type.
As you've already noted in your question - "especially when the JIT is likely to inline them anyway" - these property methods will likely be inlined at JIT time.
I have two questions, stemming from observed behavior of C# static methods (which I may be misinterpretting):
First:
Would a recursive static method be tail call optimized in a sense by the way the static method is implemented under the covers?
Second:
Would it be equivalent to functional programming to write an entire application with static methods and no variables beyond local scope? I am wondering because I still haven't wrapped my head around the "no side effects" term I keep hearing about functional programming.
Edit:
Let me mention, I do use and understand why and when to use static methods in the normal C# OO methodology, and I do understand that tail call optimization will not be explicitly done to a recursive static method. That said, I understand tail call optimization to be an attempt at stopping the creation of a new stack frame with each pass, and I had at a couple of points observed what appeared to be a static method executing within the frame of its calling method, though I may have misinterpreted my observation.
Would a recursive static method be tail call optimized in a sense by the way the static method is implemented under the covers?
Static methods have nothing to do with tail recursion optimization. All the rules apply equally to instance and static methods, but personally I would never rely on the JIT optimizing away my tail calls. Moreover, the C# compiler doesn't emit the tail call instruction, yet the optimization is sometimes performed anyway. In short, you never know.
The F# compiler supports tail recursion optimization and, when possible, compiles recursion into loops.
See more details on C# vs F# behavior in this question.
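To illustrate the point with a sketch: a tail-recursive C# method compiles to a plain call (no .tail prefix), so the safe translation is the explicit loop the F# compiler would generate. Both forms below are illustrative, not production code:

```csharp
using System;

static class Factorial
{
    // Tail-recursive form: the recursive call is the last operation.
    // The C# compiler still emits an ordinary call, so deep recursion can
    // overflow the stack; the 64-bit JIT *may* turn it into a jump, but
    // that is not guaranteed.
    public static long Recursive(long n, long acc = 1) =>
        n <= 1 ? acc : Recursive(n - 1, acc * n);

    // The equivalent loop - what a tail-call-optimizing compiler would
    // effectively produce - uses constant stack space.
    public static long Iterative(long n)
    {
        long acc = 1;
        while (n > 1) { acc *= n; n--; }
        return acc;
    }
}
```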
Would it be equivalent to functional programming to write an entire application with static methods and no variables beyond local scope?
It's both no and yes.
Technically, nothing prevents you from calling Console.WriteLine from a static method (which is a static method itself!) which obviously has side-effects. Nothing also prevents you from writing a class (with instance methods) that does not change any state (i.e. instance methods don't access instance fields). However from the design point of view, such methods don't really make sense as instance methods, right?
If you Add an item to .NET Framework List<T> (which has side effects), you will modify its state.
If you append an item to an F# list, you will get another list, and the original will not be modified.
Note that append is indeed a static method on the List module. Writing "transformation" methods in separate modules encourages side-effect-free design, as no internal storage is available by definition, even if the language allows it (F# does, LISP doesn't). However, nothing really prevents you from writing a side-effect-free non-static method.
Finally, if you want to grok functional language concepts, use one! It's so much more natural to write F# modules that operate immutable F# data structures than imitate the same in C# with or without static methods.
The CLR does do some tail call optimisations but only in 64-bit CLR processes. See the following for where it is done: David Broman's CLR Profiling API Blog: Tail call JIT conditions.
As for building software with just static methods and no variables beyond local scope, I've done this a lot and it's actually fine. It's just another way of doing things that is as valid as OO is. In fact, because there is no state outside the function/closure, it's safer and easier to test.
I read the entire SICP book from cover to cover first however: http://mitpress.mit.edu/sicp/
No side effects simply means that the function can be called with the same arguments as many times as you like and always return the same value. That simply defines that the result of the function is always consistent therefore does not depend on any external state. Due to this, it's trivial to parallelize the function, cache it, test it, modify it, decorate it etc.
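A tiny sketch of that distinction (both types invented for illustration): the pure function's result depends only on its arguments, while the instance method reads and mutates hidden state, so identical calls give different answers:

```csharp
public static class Pure
{
    // Pure: same arguments always give the same result, which makes it
    // trivially cacheable, parallelizable, and testable.
    public static int Add(int a, int b) => a + b;
}

public class Counter
{
    private int _total;

    // Impure: the result depends on (and mutates) instance state,
    // so two identical calls return different values.
    public int AddToTotal(int amount) => _total += amount;
}
```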
However, a system without side effects is typically useless, so things that do IO will always have side effects. It allows you to neatly encapsulate everything else though which is the point.
Objects are not always the best way, despite what people say. In fact, if you've ever used a LISP variant, you will no doubt determine that typical OO does sometimes get in the way.
There's a pretty good book written on this subject, http://www.amazon.com/Real-World-Functional-Programming-Examples/dp/1933988924.
And in the real world using F# unfortunately isn't always an option due to team skills or existing codebases, which is another reason I love this book, as it shows many ways to implement F# features in the code you use day to day. And to me at least, the vast reduction in state bugs, which take far longer to debug than simple logic errors, is worth the slight reduction in OOP orthodoxy.
For the most part, having no static state and operating in a static method only on the parameters given will eliminate side effects, as you're limiting yourself to pure functions. One point to watch out for, though, is retrieving data to be acted on, or saving data to a database, inside such a function. Combining OOP and static methods can help here, by having your static methods delegate state-manipulating commands to lower-level objects.
Also a great help in enforcing function purity is to keep objects immutable whenever possible. Any object acted on should return a new modified instance, and the original copy discarded.
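For example, a minimal immutable type along those lines (Money is an invented illustration): each operation returns a fresh instance and leaves the original untouched, so instances can be shared freely without state bugs.

```csharp
public sealed class Money
{
    public decimal Amount { get; }
    public Money(decimal amount) { Amount = amount; }

    // Instead of mutating this instance, return a new one.
    // The original is discarded by the caller if no longer needed.
    public Money Add(decimal delta) => new Money(Amount + delta);
}
```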
Regarding the second question: I believe you mean "side effects" of mutable data structures, and obviously this is not a problem for (I believe) most functional languages. For instance, Haskell mostly (or even entirely!?) uses immutable data structures. So there is nothing special about "static" behaviour here.
I assume that public or private static targets must have reduced memory usage, due to the fact that there is only one copy of the static target in memory.
It seems like because a method is static that might make the method a potential point for further optimization by the CLR compiler beyond what is possible with a non-static function. Just a flimsy theory though, so I've come to ask you all.
Do static public or private methods provide any increased performance benefit beyond reduced memory usage?
(Note: I'm not interested in responses that talk on the problems of premature optimization. Certainly that's sound advice I follow everyday, but that does not mean optimization is not necessary at times. (double negative!). Allow me to indulge my curiosity, at the least)
From Static Classes and Static Class Members (C# Programming Guide)
A call to a static method generates a call instruction in Microsoft intermediate language (MSIL), whereas a call to an instance method generates a callvirt instruction, which also checks for a null object reference. However, most of the time the performance difference between the two is not significant.
Aside from what astander said, your question suggests a misunderstanding of what instance methods do. Regardless of whether the function is static or not, there is only one copy of the function code in memory. A non-static method has to be called through an object, but the object does not carry its own private copy of the method. So the memory usage of static and non-static methods is in fact identical, and as others have pointed out, the performance characteristics are nearly identical.
Non-static member variables, however, do exist separately for every object that you create. But it is nearly always a waste of time to worry about that memory usage, unless you actually have a memory-related problem in your program.
This is a little bit off-topic, but none the less important.
The choice of making methods static or instance should not be based on execution time (which anyway seems not to matter). It should be based on whether the method operates on an object. For instance, all the Math.* methods are static, while e.g. (most) String.* methods are instance methods, since they operate on a String instance. My personal philosophy: a good design should make up for the few cycles that may be saved elsewhere.
Another view on the subject: I recently worked with a guy who had been told that static methods are evil because they take us back to the dark age of procedural programming and thus shall be avoided at all costs. This resulted in weird examples of classes that required instances for access to methods that had absolutely no interest in the internals of the object.
Phew, it felt good to get that off my chest.
Good answers - basically it doesn't matter, which is the answer to nearly every question of this sort. Even if it did make a difference - If execution time of your program cost a dollar, this sort of issue is likely to cost a fraction of a cent, and it is very likely that there are other things costing a great deal more.
MeasureIt to be certain, but you'll find that unless you're creating a globe-spanning ultra-high-volume transaction processing supercomputing cluster, it's not going to have an appreciable difference.
I need to get three objects out of a function, my instinct is to create a new type to return the three refs. Or if the refs were the same type I could use an array. However pass-by-ref is easier:
private void Mutate_AddNode_GetGenes(ref NeuronGene newNeuronGene, ref ConnectionGene newConnectionGene1, ref ConnectionGene newConnectionGene2)
{
}
There's obviously nothing wrong with this, but I hesitate to use this approach, mostly I think for reasons of aesthetics and psychological bias. Are there actually any good reasons to use one of these approaches over the others? Perhaps a performance issue with creating extra wrapper objects, or with pushing parameters onto the stack. Note that in my particular case this is CPU-intensive code. CPU cycles matter.
Is there a more elegant C#2 or C#3 approach?
Thanks.
For almost all computing problems, you will not notice the CPU difference. Since your sample code has the word "Gene" in it, you may actually fall into the rare category of code that would notice.
Creating and destroying objects just to wrap other objects would cost a bit of performance (they need to be created and garbage collected after all).
Aesthetically I would not create an object just to group unrelated objects, but if they logically belong together it is perfectly fine to define a containing object.
If you're worrying about the performance of a wrapping type (which is a lot cleaner, IMHO), you should use a struct. Current 32-bit implementations of .NET (and the upcoming 64-bit 4.0) support inlining / optimizing away of structs in many cases, so you'd probably see no performance difference whatsoever between a struct and ref arguments.
Worrying about the relative execution speed of those two options is probably a premature optimization. Focus on getting the algorithm correct first, and having clean, maintainable code. When that's done, you can run a profiler on it and optimize the 20% of the code that takes 80% of the CPU time. Even if this method ends up being in that 20%, the difference between the two calling styles is probably to small to register.
So, performance issues aside, I'd probably use a container class. Since this method takes only those three parameters, and (presumably) modifies each one, it sounds like it would make sense to have it as a method of the container class, with three member variables instead of ref parameters.
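A sketch combining both suggestions - a small struct (per the earlier answer) used as the container type. The int fields merely stand in for the question's NeuronGene/ConnectionGene types, and the populated values are placeholders:

```csharp
// A value-type container avoids a heap allocation entirely while still
// giving the three results one well-named home.
public struct MutationResult
{
    public int NewNeuronGene;        // placeholder field types for illustration
    public int NewConnectionGene1;
    public int NewConnectionGene2;
}

public static class Mutator
{
    public static MutationResult Mutate_AddNode_GetGenes()
    {
        // Populate and return by value; for a small struct the JIT can
        // often optimize the copy away or keep it in registers.
        return new MutationResult
        {
            NewNeuronGene = 1,
            NewConnectionGene1 = 2,
            NewConnectionGene2 = 3,
        };
    }
}
```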