Do I need to worry about inlining in Unity/C#? [duplicate] - c#

This question already has an answer here:
AggressiveInlining doesn't exist
(1 answer)
Closed 5 years ago.
For code clarity I sometimes create a function that should very obviously be inlined, be it either a wrapper, or a function that is only called in a single point, or a short function that is supposed to be called frequently and be fast.
In C I would inline it without a second thought, but in Unity/C# there's no way to do that AFAIK (this appears to be only available at .NET 4.5).
Can I trust the compiler to be smart enough to actually inline smartly, or I'd better sometimes sacrifice code clarity for performance, mistrusting the compiler?
Sure it depends case by case, premature optimization is evil, and you should profile instead of guessing. However a general overview of this subject might still be useful as a guideline, to improve upon.

Manually forcing in-lining in C# at compile time doesn't make much sense. When the code is run the just-in-time compiler can decide to in-line the code based on these heuristics:
http://blogs.msdn.com/b/ericgu/archive/2004/01/29/64717.aspx
Methods that are greater than 32 bytes of IL will not be inlined.
Virtual functions are not inlined.
Methods that have complex flow control will not be in-lined. Complex flow control is any flow control other than if/then/else; in this case, switch or while.
Methods that contain exception-handling blocks are not inlined, though methods that throw exceptions are still candidates for inlining.
If any of the method's formal arguments are structs, the method will not be inlined.
If you're absolutely sure that the method has to be in-lined you can use these above heurstics to make the method more appealing to in-line.
MethodImplOptions.AggressiveInlining is mostly useful for inlining across assembly boundaries, something I do not believe the just-in-time compiler can do (but I'd have to check that).

Related

What means "Non-inline to improve code quality as uncommon path"

I was looking at source code of List and LargeArrayBuilder classes in .net core 3.0 and saw that method AddWithBufferAllocation marked by comment "Non-inline to improve code quality as uncommon path". Also, method has attribute MethodImpl with value MethodImplOptions.NoInlining. How does noinlining improve code quality?
I believe this degrades performance. I could be wrong, but the call method is the overhead of physically calling a method (call instruction) and passing parameters. Especially, if we use to value type.
sources
It improves performance in the general case when that if test is false and it doesn't make the method call at all.
It means that AddRange, that might make the method call (but rarely) is smaller and can be better optimised than the same method with, potentially, all of AddWithBufferAllocation inlined inside it.

Why does tail call optimization need an op code?

So I've read many times before that technically .NET does support tail call optimization (TCO) because it has the opcode for it, and just C# doesn't generate it.
I'm not exactly sure why TCO needs an opcode or what it would do. As far as I know, the requirement for being able to do TCO is that the results of a recursive call are not combined with any variables in the current function scope. If you don't have that, then I don't see how an opcode prevents you from having to keep a stack frame open. If you do have that, then can't the compiler always easily compile it to something iterative?
So what is the point of an opcode? Obviously there's something I'm missing. In cases where TCO is possible at all, can't it always be handled at the compiler level than at the opcode level? What's an example of where it can't?
Following the links you already provided, this is the part which seems to me, answers your question pretty closely..
Source
The CLR and tail calls
When you're dealing with languages managed by the CLR, there are two kinds of compilers in play. There's the compiler that goes from your language's source code down to IL (C# developers know this as csc.exe), and then there's the compiler that goes from IL to native code (the JIT 32/64 bit compilers that are invoked at run time or NGEN time). Both the source->IL and IL->native compilers understand the tail call optimization. But the IL->native compiler--which I'll just refer to as JIT--has the final say on whether the tail call optimization will ultimately be used. The source->IL compiler can help to generate IL that is conducive to making tail calls, including the use of the "tail." IL prefix (more on that later). In this way, the source->IL compiler can structure the IL it generates to persuade the JIT into making a tail call. But the JIT always has the option to do whatever it wants.
When does the JIT make tail calls?
I asked Fei Chen and Grant Richins, neighbors down the hall from me who happen to work on the JIT, under what conditions the various JITs will employ the tail call optimization. The full answer is rather detailed. The quick summary is that the JITs try to use the tail call optimization whenever they can, but there are lots of reasons why the tail call optimization can't be used. Some reasons why tail calling is a non-option:
Caller doesn't return immediately after the call (duh :-))
Stack arguments between caller and callee are incompatible in a way that would require shifting things around in the caller's frame before the callee could execute
Caller and callee return different types
We inline the call instead (inlining is way better than tail calling, and opens the door to many more optimizations)
Security gets in the way
The debugger / profiler turned off JIT optimizations
The most interesting part in context of your question, which makes it super clear in my opinion, among many scenarios, is example of security mentioned above...
Security in .NET in many cases depends on the stack being accurate... at runtime.. Which is why, as highlighted above, the burden is shared by both the source to CIL compiler, and (runtime) CIL-to-native JIT compilers, with the final say being with the latter.
Guess: In a simple language like x86 assembler where you manage the stack "manually", you don't need an opcode - you can just set up the call stack appropriately.
But in something higher-level like .NET CIL, the stack is partially managed for you, and the whole act of invoking a function is a single opcode (e.g. call). So you need a different opcode to implement TCO - one that does "pass control flow to this function, but without creating a new stack frame".

Why aren't Automatic Properties inlined by default?

Being that properties are just methods under the hood, it's understandable that the performance of any logic they might perform may or may not improve performance - so it's understandable why the JIT needs to check if methods are worth inlining.
Automatic properties however (as far as I understand) cannot have any logic, and simply return or set the value of the underlying field. As far as I know, automatic properties are treated by the Compiler and the JIT just like any other methods.
(Everything below will rely on the assumption that the above paragraph is correct.)
Value Type properties show different behavior than the variable itself, but Reference Type properties supposedly should have the exact same behavior as direct access to the underlying variable.
// Automatic Properties Example
public Object MyObj { get; private set; }
Is there any case where automatic properties to Reference Types could show a performance hit by being inlined?
If not, what prevents either the Compiler or the JIT from automatically inlining them?
Note: I understand that the performance gain would probably be insignificant, especially when the JIT is likely to inline them anyway if used enough times - but small as the gain may be, it seems logical that such a seemingly simple optimization would be introduced regardless.
EDIT: The JIT compiler doesn't work in the way you think it does, which I guess is why you're probably not completely understanding what I was trying to convey above. I've quoted your comment below:
That is a different matter, but as far as I understand methods are only checked for being inline-worthy if they are called enough times. Not the mention that the checking itself is a performance hit. (Let the size of the performance hit be irrelevant for now.)
First, most, if not all, methods are checked to see if they can be inlined. Second, keep in mind that methods are only ever JITed once and it is during that one time that the JITer will determine if any methods called inside of it will be inlined. This can happen before any code is executed at all by your program. What makes a called method a good candidate for inlining?
The x86 JIT compiler (x64 and ia64 don't necessarily use the same optimization techniques) checks a few things to determine if a method is a good candidate for inlining, definitely not just the number of times it is called. The article lists things like if inlining will make the code smaller, if the call site will be executed a lot of times (ie in a loop), and others. Each method is optimized on its own, so the method may be inlined in one calling method but not in another, as in the example of a loop. These optimization heuristics are only available to JIT, the C# compiler just doesn't know: it's producing IL, not native code. There's a huge difference between them; native vs IL code size can be quite different.
To summarize, the C# compiler doesn't inline properties for performance reasons.
The jit compiler inlines most simple properties, including automatic properties. You can read more about how the JIT decides to inline method calls at this interesting blog post.
Well, the C# compiler doesn't inline any methods at all. I assume this is the case because of the way the CLR is designed. Each assembly is designed to be portable from machine to machine. A lot of times, you can change the internal behavior of a .NET assembly without having to recompile all the code, it can just be a drop in replacement (at least when types haven't changed). If the code were inlined, it breaks that (great, imo) design and you lose that luster.
Let's talk about inlining in C++ first. (Full disclosure, I haven't used C++ full time in a while, so I may be vague, my explanations rusty, or completely incorrect! I'm counting on my fellow SOers to correct and scold me)
The C++ inline keyword is like telling the compiler, "Hey man, I'd like you to inline this function, because I think it will improve performance". Unfortunately, it is only telling the compiler you'd prefer it inlined; it is not telling it that it must.
Perhaps at an earlier date, when compilers were less optimized than they are now, the compiler would more often than not compile that function inlined. However, as time went on and compilers grew smarter, the compiler writers discovered that in most cases, they were better at determining when a function should be inlined that the developer was. For those few cases where it wasn't, developers could use the seriouslybro_inlineme keyword (officially called __forceinline in VC++).
Now, why would the compiler writers do this? Well, inlining a function doesn't always mean increased performance. While it certainly can, it can also devastate your programs performance, if used incorrectly. For example, we all know one side effect of inlining code is increased code size, or "fat code syndrome" (disclaimer: not a real term). Why is "fat code syndrome" a problem? If you take a look at the article I linked above, it explains, among other things, memory is slow, and the bigger your code, the less likely it will fit in the fastest CPU cache (L1). Eventually it can only fit in memory, and then, inlining has done nothing. However, compilers know when these situations can happen, and do their best to prevent it.
Putting that together with your question, let's look at it this way: the C# compiler is like a developer writing code for the JIT compiler: the JIT is just smarter (but not a genius). It often knows when inlining will benefit or harm execution speed. "Senior developer" C# compiler doesn't have any idea how inlining a method call could benefit the runtime execution of your code, so it doesn't. I guess that actually means the C# compiler is smart, because it leaves the job of optimization to those who are better than it, in this case, the JIT compiler.
Automatic properties however (as far as I understand) cannot have any
logic, and simply return or set the value of the underlying field. As
far as I know, automatic properties are treated by the Compiler and
the JIT just like any other methods.
That automatic properties cannot have any logic is an implementation detail, there is not any special knowledge of that fact that is required for compilation. In fact, as you say auto properties are compiled down to method calls.
Suppose auto propes were inlined and the class and property are defined in a different assembly. This would mean that if the property implementation changes, you would have to recompile the application to see that change. That defeats using properties in the first place which should allow you to change the internal implementation without having to recompile the consuming application.
Automatic properties are just that - property get/set methods generated automatically. As result there is nothing special in IL for them. C# compiler by itself does very small number of optimizations.
As for reasons why not to inline - imagine your type is in a separate assembly hence you are free to change source of that assembly to have insanely complicated get/set for the property. As result compiler can't reason on complexity of the get/set code when it sees your automatic property first time while creating new assembly depending on your type.
As you've already noted in your question - "especially when the JIT is likely to inline them anyway" - this property methods will likely be inlined at JIT time.

Is writing only static methods equivalent to side-effect free programming in C#?

I have two questions, stemming from observed behavior of C# static methods (which I may be misinterpretting):
First:
Would a recursive static method be tail call optimized in a sense by the way the static method is implemented under the covers?
Second:
Would it be equivalent to functional programming to write an entire application with static methods and no variables beyond local scope? I am wondering because I still haven't wrapped my head around this "no side effects" term I keep hearing about functional programming..
Edit:
Let me mention, I do use and understand why and when to use static methods in the normal C# OO methodology, and I do understand tail call optimization will not be explicitly done to a recursive static method. That said, I understand tail call optimization to be an attempt at stopping the creation of a new stack frame with each pass, and I had at a couple points observed what appeared to be a static method executing within the frame of it's calling method, though I may have misinterpreted my observation.
Would a recursive static method be tail call optimized in a sense by the way the static method is implemented under the covers?
Static methods have nothing to do with tail recursion optimization. All the rules equally apply to instance and static methods, but personally I would never rely on JIT optimizing away my tail calls. Moreover, C# compiler doesn't emit tail call instruction but sometimes it is performed anyway. In short, you never know.
F# compiler supports tail recursion optimization and, when possible, compiles recursion to loops.
See more details on C# vs F# behavior in this question.
Would it be equivalent to functional programming to write an entire application with static methods and no variables beyond local scope?
It's both no and yes.
Technically, nothing prevents you from calling Console.WriteLine from a static method (which is a static method itself!) which obviously has side-effects. Nothing also prevents you from writing a class (with instance methods) that does not change any state (i.e. instance methods don't access instance fields). However from the design point of view, such methods don't really make sense as instance methods, right?
If you Add an item to .NET Framework List<T> (which has side effects), you will modify its state.
If you append an item to an F# list, you will get another list, and the original will not be modified.
Note that append indeed is a static method on List module. Writing “transformation” methods in separate modules encourages side-effect free design, as no internal storage is available by definition, even if the language allows it (F# does, LISP doesn't). However nothing really prevents you from writing a side-effect free non-static method.
Finally, if you want to grok functional language concepts, use one! It's so much more natural to write F# modules that operate immutable F# data structures than imitate the same in C# with or without static methods.
The CLR does do some tail call optimisations but only in 64-bit CLR processes. See the following for where it is done: David Broman's CLR Profiling API Blog: Tail call JIT conditions.
As for building software with just static variables and local scope, I've done this a lot and it's actually fine. It's just another way of doing things that is as valid as OO is. In fact because there is no state outside the function/closure, it's safer and easier to test.
I read the entire SICP book from cover to cover first however: http://mitpress.mit.edu/sicp/
No side effects simply means that the function can be called with the same arguments as many times as you like and always return the same value. That simply defines that the result of the function is always consistent therefore does not depend on any external state. Due to this, it's trivial to parallelize the function, cache it, test it, modify it, decorate it etc.
However, a system without side effects is typically useless, so things that do IO will always have side effects. It allows you to neatly encapsulate everything else though which is the point.
Objects are not always the best way, despite what people say. In fact, if you've ever used a LISP variant, you will no doubt determine that typical OO does sometimes get in the way.
There's a pretty good book written on this subject, http://www.amazon.com/Real-World-Functional-Programming-Examples/dp/1933988924.
And in the real world using F# unfortunately isn't an option due to team skills or existing codebases, which is another reason I do love this book, as it has shows many ways to implement F# features in the code you use day to day. And to me at least the vast reduction in state bugs, which take far longer to debug than simple logic errors, is worth the slight reduction in OOP orthodoxy.
For the most part having no static state and operating in a static method only on the parameters given will eliminate side-effects, as you're limiting yourself to pure functions. One point to watch out for though is retrieving data to be acted on or saving data to a database in such a function. Combining OOP and static methods, though, can help here, by having your static methods delegate to lower level objects commands to manipulate state.
Also a great help in enforcing function purity is to keep objects immutable whenever possible. Any object acted on should return a new modified instance, and the original copy discarded.
Regarding second question: I believe you mean "side effects" of mutable data structures, and obviously this is not a problem for (I believe) most functional languages. For instance, Haskel mostly (or even all!?) uses immutable data structures. So there is nothing about "static" behaviour.

Examples of CLR compiler optimizations

I'm doing a presentation in few months about .Net performance and optimization, I wanted to provide some samples of unnecessary optimization, things that will be done by the compiler anyways.
where can I find some explanation on what optimizations the compiler is actually capable of maybe some before and after code?
check out these links
C# Compiler Optimizations
compiler optimization
msdn
Also checkout this book on MSIL
1. Microsoft Intermediate Language: Comparison Between C# and VB.NET / Niranjan Kumar
What I think would be even better than examples of "things that will be done by the compiler anyways" would be examples of scenarios where the compiler doesn't perform "optimizations" that the developer assumes will yield a performance improvement but which, in fact, won't.
For example sometimes a developer will assume that caching a value locally will improve performance, when actually the savings of having one less value on the stack outweighs the miniscule cost of a field access that can be inlined.
Or the developer might assume that "force-inlining" a method call (essentially by stripping out the call itself and replacing with copied/pasted code) will be worthwhile, when in reality keeping the method call as-is would result in its getting inlined by the compiler only when it makes sense (when the benefit of inlining outweighs the growth in code size).
This is only a general idea, of course. I don't have concrete code samples that I can point to; but maybe you can scrounge some up if you look for them.

Categories

Resources