I have a function that I use to add vectors, like this:
public static Vector AddVector(Vector v1, Vector v2)
{
    return new Vector(
        v1.X + v2.X,
        v1.Y + v2.Y,
        v1.Z + v2.Z);
}
Not very interesting. However, I overload the '+' operator for vectors and in the overload I call the AddVector function to avoid code duplication. I was curious whether this would result in two method calls or if it would be optimized at compile or JIT time. I found out that it did result in two method calls because I managed to gain 10% in total performance by duplicating the code of the AddVector as well as the dot product method in the '+' and '*' operator overload methods. Of course, this is a niche case because they get called tens of thousands of times per second, but I didn't expect this. I guess I expected the method to be inlined in the other, or something. And I suppose it's not just the overhead of the method call, but also the copying of the method arguments into the other method (they're structs).
It's no big deal, I can just duplicate the code (or perhaps just remove the AddVector method since I never call it directly) but it will nag me a lot in the future when I decide to create a method for something, like splitting up a large method into several smaller ones.
If you compile in Debug mode or start the process with a debugger attached (though you can attach one later), then a large class of JIT optimisations, including inlining, won't happen.
Try re-running your tests by compiling in Release mode and then running without a debugger attached (Ctrl+F5 in VS), and see whether you get the optimisations you expected.
"And I suppose it's not just the overhead of the method call, but also the copying of the method arguments into the other method (they're structs)."
Why don't you test this out? Write a version of AddVector that takes a reference to two vector structs, instead of the structs themselves.
Don't assume that struct is the right choice for performance. The copying cost can be significant in some scenarios. Until you measure you don't know. Furthermore, structs have spooky behaviors, especially if they're mutable, but even if they're not.
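To make the suggested experiment concrete, here is a minimal sketch of the two variants side by side. The Vector struct is a hypothetical stand-in for the one in the question (its fields and constructor are assumptions), and AddVectorByRef is a name invented here for illustration.

```csharp
using System;

// Hypothetical stand-in for the Vector struct from the question.
public struct Vector
{
    public readonly double X, Y, Z;

    public Vector(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }
}

public static class VectorMath
{
    // By-value version: both arguments are copied into the callee.
    public static Vector AddVector(Vector v1, Vector v2)
    {
        return new Vector(v1.X + v2.X, v1.Y + v2.Y, v1.Z + v2.Z);
    }

    // By-reference version (name is an assumption): only the addresses
    // of the caller's two structs are passed, so no argument copies are made.
    public static Vector AddVectorByRef(ref Vector v1, ref Vector v2)
    {
        return new Vector(v1.X + v2.X, v1.Y + v2.Y, v1.Z + v2.Z);
    }
}
```

The call site becomes VectorMath.AddVectorByRef(ref a, ref b). Whether that is actually faster is exactly what needs measuring; it may well depend on the struct size and the JIT in use.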
In addition, what others have said is correct:
Running under a debugger will disable JIT optimizations, making your performance measurements invalid.
Compiling in Debug mode also makes performance measurements invalid.
I had VS in Release mode and I ran without debugging so that can't be to blame. Running the .exe in the Release folder yields the same result. I have .NET 3.5 SP1 installed.
And whether or not I use structs depends on how many I create of something and how large it is when copying versus referencing.
You say Vector is a struct. According to a blog post from 2004, value types are a reason for not inlining a method. I don't know whether the rules have changed about that in the meantime.
There's only one optimization I can think of: maybe you want to have a vOut parameter, so you avoid the call to new() and hence reduce garbage collection. Of course, this depends entirely on what you are doing with the returned vector, whether you need to persist it, and whether you're running into garbage collection problems.
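A rough sketch of what the vOut idea could look like, assuming a hypothetical Vector struct with settable fields (not necessarily how the asker's type is defined). One caveat worth stating: since Vector is a struct in the question, new Vector(...) does not allocate on the managed heap, so the garbage collection benefit would only apply if Vector were a class.

```csharp
using System;

// Hypothetical mutable stand-in for the Vector struct from the question.
public struct Vector
{
    public double X, Y, Z;
}

public static class VectorMath
{
    // Writes the sum straight into storage owned by the caller
    // instead of constructing and returning a new value.
    public static void AddVector(ref Vector v1, ref Vector v2, out Vector vOut)
    {
        vOut.X = v1.X + v2.X;
        vOut.Y = v1.Y + v2.Y;
        vOut.Z = v1.Z + v2.Z;
    }
}
```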
Please ignore code readability in this question.
In terms of performance, should the following code be written like this:
int maxResults = criteria.MaxResults;
if (maxResults > 0)
{
    while (accounts.Count > maxResults)
        accounts.RemoveAt(maxResults);
}
or like this:
if (criteria.MaxResults > 0)
{
    while (accounts.Count > criteria.MaxResults)
        accounts.RemoveAt(criteria.MaxResults);
}
?
Edit: criteria is a class, and MaxResults is a simple integer property (i.e., public int MaxResults { get { return _maxResults; } }).
Does the C# compiler treat MaxResults as a black box and evaluate it every time? Or is it smart enough to figure out that I've got 3 calls to the same property with no modification of that property between the calls? What if MaxResults was a field?
One of the laws of optimization is precalculation, so I instinctively wrote this code like the first listing, but I'm curious if this kind of thing is being done for me automatically (again, ignore code readability).
(Note: I'm not interested in hearing the 'micro-optimization' argument, which may be valid in the specific case I've posted. I'd just like some theory behind what's going on or not going on.)
First off, the only way to actually answer performance questions is to actually try it both ways and test the results in realistic conditions.
That said, the other answers which say that "the compiler" does not do this optimization because the property might have side effects are both right and wrong. The problem with the question (aside from the fundamental problem that it simply cannot be answered without actually trying it and measuring the result) is that "the compiler" is actually two compilers: the C# compiler, which compiles to MSIL, and the JIT compiler, which compiles IL to machine code.
The C# compiler never ever does this sort of optimization; as noted, doing so would require that the compiler peer into the code being called and verify that the result it computes does not change over the lifetime of the callee's code. The C# compiler does not do so.
The JIT compiler might. No reason why it couldn't. It has all the code sitting right there. It is completely free to inline the property getter, and if the jitter determines that the inlined property getter returns a value that can be cached in a register and re-used, then it is free to do so. (If you don't want it to do so because the value could be modified on another thread then you already have a race condition bug; fix the bug before you worry about performance.)
Whether the jitter actually does inline the property fetch and then enregister the value, I have no idea. I know practically nothing about the jitter. But it is allowed to do so if it sees fit. If you are curious about whether it does so or not, you can either (1) ask someone who is on the team that wrote the jitter, or (2) examine the jitted code in the debugger.
And finally, let me take this opportunity to note that computing results once, storing the result and re-using it is not always an optimization. This is a surprisingly complicated question. There are all kinds of things to optimize for:
execution time
executable code size -- this has a major effect on execution time because big code takes longer to load, increases the working set size, puts pressure on processor caches, RAM and the page file. Small slow code is often in the long run faster than big fast code in important metrics like startup time and cache locality.
register allocation -- this also has a major effect on execution time, particularly in architectures like x86 which have a small number of available registers. Enregistering a value for fast re-use can mean that there are fewer registers available for other operations that need optimization; perhaps optimizing those operations instead would be a net win.
and so on. It gets real complicated real fast.
In short, you cannot possibly know whether writing the code to cache the result rather than recomputing it is actually (1) faster, or (2) better performing. Better performance does not always mean making execution of a particular routine faster. Better performance is about figuring out what resources are important to the user -- execution time, memory, working set, startup time, and so on -- and optimizing for those things. You cannot do that without (1) talking to your customers to find out what they care about, and (2) actually measuring to see if your changes are having a measurable effect in the desired direction.
If MaxResults is a property then no, it will not optimize it, because the getter may have complex logic, say:
private int _maxResults;
public int MaxResults {
    get { return _maxResults++; }
    set { _maxResults = value; }
}
See how the behavior would change if it in-lines your code?
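To see the difference concretely, here is a small sketch built around that getter. The Criteria class and the two helper methods are hypothetical, written only to contrast reading the property repeatedly with reading it once into a local.

```csharp
using System;

// Hypothetical class using the side-effecting getter from the answer above.
public class Criteria
{
    private int _maxResults;

    public Criteria(int maxResults)
    {
        _maxResults = maxResults;
    }

    // Every read returns the current value and then increments it.
    public int MaxResults
    {
        get { return _maxResults++; }
        set { _maxResults = value; }
    }
}

public static class SideEffectDemo
{
    // Three separate property reads, like the second listing in the question.
    public static int SumOfThreeReads(Criteria c)
    {
        return c.MaxResults + c.MaxResults + c.MaxResults;
    }

    // One property read cached in a local, like the first listing.
    public static int SumOfCachedRead(Criteria c)
    {
        int cached = c.MaxResults;
        return cached + cached + cached;
    }
}
```

Starting from 5, the first method sees 5, 6 and 7 and returns 18, while the second sees 5 three times and returns 15. A compiler that silently turned one form into the other would change the program's result.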
If there's no logic...either method you wrote is fine, it's a very minute difference and all about how readable it is TO YOU (or your team)...you're the one looking at it.
Your two code samples are only guaranteed to have the same result in single-threaded environments, which .Net isn't, and if MaxResults is a field (not a property). The compiler can't assume, unless you use the synchronization features, that criteria.MaxResults won't change during the course of your loop. If it's a property, it can't assume that using the property doesn't have side effects.
Eric Lippert points out quite correctly that it depends a lot on what you mean by "the compiler". The C# -> IL compiler? Or the IL -> machine code (JIT) compiler? And he's right to point out that the JIT may well be able to optimize the property getter, since it has all of the information (whereas the C# -> IL compiler doesn't, necessarily). It won't change the situation with multiple threads, but it's a good point nonetheless.
It will be called and evaluated every time. The compiler has no way of determining if a method (or getter) is deterministic and pure (no side effects).
Note that actual evaluation of the property may be inlined by the JIT compiler, making it effectively as fast as a simple field.
It's good practice to make property evaluation an inexpensive operation. If you do some heavy calculation in the getter, consider caching the result manually, or changing it to a method.
Why not test it?
Just set up 2 console apps, make each one loop 10 million times, and compare the results. Remember to run them as properly released apps that have been installed properly, or else you cannot guarantee that you are not just running the MSIL.
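A minimal sketch of such a test, here as one class rather than two apps. Criteria, PropertyBenchmark and the loop bodies are assumptions made for illustration; the timing uses System.Diagnostics.Stopwatch. Build in Release and run without a debugger attached, otherwise the numbers say little.

```csharp
using System;
using System.Diagnostics;

// Hypothetical class with a trivial getter, as described in the question's edit.
public class Criteria
{
    private readonly int _maxResults;

    public Criteria(int maxResults)
    {
        _maxResults = maxResults;
    }

    public int MaxResults { get { return _maxResults; } }
}

public static class PropertyBenchmark
{
    // Reads the property on every iteration.
    public static long SumWithRepeatedReads(Criteria criteria, int iterations)
    {
        long sum = 0;
        for (int i = 0; i < iterations; i++)
            sum += criteria.MaxResults;
        return sum;
    }

    // Reads the property once and reuses the local.
    public static long SumWithCachedRead(Criteria criteria, int iterations)
    {
        int maxResults = criteria.MaxResults;
        long sum = 0;
        for (int i = 0; i < iterations; i++)
            sum += maxResults;
        return sum;
    }

    // Call this from Main; prints one timing line per variant.
    public static void Run()
    {
        Criteria criteria = new Criteria(10);
        const int iterations = 10000000;

        // Warm-up calls so both methods are JIT-compiled before timing starts.
        SumWithRepeatedReads(criteria, 1);
        SumWithCachedRead(criteria, 1);

        Stopwatch sw = Stopwatch.StartNew();
        long a = SumWithRepeatedReads(criteria, iterations);
        sw.Stop();
        Console.WriteLine("repeated reads: " + sw.ElapsedMilliseconds + " ms (sum " + a + ")");

        sw = Stopwatch.StartNew();
        long b = SumWithCachedRead(criteria, iterations);
        sw.Stop();
        Console.WriteLine("cached read: " + sw.ElapsedMilliseconds + " ms (sum " + b + ")");
    }
}
```

The sums are printed so the loops have an observable result; a single run like this is noisy, so repeat it a few times before drawing conclusions.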
Really you are probably going to get about 5 answers saying 'you shouldn't worry about optimisation'. They clearly do not write routines that need to be as fast as possible before being readable (e.g. games).
If this piece of code is part of a loop that is executed billions of times then this optimisation could be worthwhile. For instance max results could be an overridden method and so you may need to discuss virtual method calls.
Really the ONLY way to answer any of these questions is to figure out if this is a piece of code that will benefit from optimisation. Then you need to know the kinds of things that are increasing the time to execute. Really us mere mortals cannot do this a priori and so have to simply try 2-3 different versions of the code and then test it.
If criteria is a class type, I doubt it would be optimized, because another thread could always change that value in the meantime. For structs I'm not sure, but my gut feeling is that it won't be optimized, but I think it wouldn't make much difference in performance in that case anyhow.
I'm working on a hands-off log mechanism for my C# application.
Here's what I'd like it to look like:
function a(arg1, arg2, arg3, ...) calls function b(arg4, arg5, arg6, ...), which in turn calls log(), which is then able to detect the stack trace (this can be done via Environment.StackTrace) and the values with which each function (e.g. a and b) in the stack trace is called.
I want it to work in debug and release mode (or, at least, in debug mode).
Is this possible to do in .net?
Provably not possible:
By the time b is called, the space in the stack used by a's arg1 (the IL stack, so possibly it was never even put in a stack, but had been enregistered on the call) is not guaranteed to still be used by arg1.
By extension, if arg1 is a reference-type, the object it referred to is not guaranteed to not have been garbage collected, if it isn't used after the call to b.
Edit:
A bit more detail, since your comment suggests you're not grokking this and still think it should be possible.
The calling conventions used by the jitter are not specified in the specs for any of the relevant standards, which gives implementers freedom to make improvements. They do indeed differ between 32-bit and 64-bit versions, and different releases.
However, articles from MS people suggest that the convention used is akin to the __fastcall convention. In your call to a, arg1 would be put into the ECX register*, and arg2 into the EDX register (I'm simplifying by assuming 32-bit x86, with amd64 even more arguments are enregistered) of the core the code is running on. arg3 would be pushed on the stack and would indeed exist in memory.
Note that at this point, there is no memory location in which arg1 and arg2 exist, they're only in a CPU register.
In the course of executing the method itself, the registers and memory are used as necessary. And then b is called.
Now, if a is going to need arg1 or arg2 it'll have to push that before it calls b. But if it doesn't, then it won't - and things might even be re-ordered to reduce this need. Conversely, those registers may have already been used for something else already by this point - the jitter isn't stupid, so if it needs a register or a slot on the stack and there's one going unused for the rest of the method, it's going to reuse that space. (For that matter, at the level above this, the C# compiler will reuse slots in the virtual stack that the IL produced uses).
So, when b is called, arg4 is placed in register ECX, arg5 into EDX and arg6 pushed on the stack. At this point, arg1 and arg2 don't exist, and you can no more find out what they were than you can read a book after it has been recycled and turned into toilet paper.
(Interesting note is that it's very common for a method to call another with the same arguments in the same position, in which case ECX and EDX can be just left alone).
Then, b returns, putting its return value in the EAX register, or EDX:EAX pair or in memory with EAX pointing to it, depending on size, a does some more work before putting its return in that register, and so on.
Now, this is assuming there haven't been any optimisations done. It's possible that in fact, b wasn't called at all, but rather that its code was inlined. In this case, whether the values were in registers or on the stack (and in the latter case, where they were on the stack) no longer has anything to do with b's signature and everything to do with where the relevant values are during a's execution. It would be different in the case of another "call" to b, or even in the case of another "call" to b from a, since the entire call of a including its call to b could have been inlined in one case, not inlined in another, and inlined differently in yet another. If, for example, arg4 came straight from a value returned by another call, it could be in the EAX register at this point, while arg5 was in ECX as it was the same as arg1 and arg6 was somewhere half-way in the middle of the stack-space being used by a.
Another possibility is that the call to b was a tail-call that was eliminated: because the call to b was going to have its return value immediately returned by a as well (or some other possibilities), then rather than pushing to the stack, the values being used by a are replaced in-place, and the return address changed so that the return from b jumps back to the method that called a, skipping some of the work (and reducing memory use to the extent that some functional style approaches that would overflow the stack instead work and indeed work well). In this case, during the call to b, the parameters to a are likely completely gone, even those that had been on the stack.
It's highly debatable whether this last case should even be considered an optimisation at all; some languages heavily depend upon it being done as with it they give good performance and without they give horrible performance if they even work at all (instead of overflowing the stack).
There can be all manner of other optimisations. There should be all manner of other optimisations - if the .NET team or the Mono team do something that makes my code faster or use less memory but otherwise behave the same, without my having to do something, I for one won't be complaining!
And that's assuming that the person writing the C# in the first place never changed the value of a parameter, which certainly isn't going to be true. Consider this code:
IEnumerable<T> RepeatedlyInvoke<T>(Func<T> factory, int count)
{
    if (count < 0)
        throw new ArgumentOutOfRangeException();
    while (count-- != 0)
        yield return factory();
}
Even if the C# compiler and the jitter had been designed in such a wasteful way that you could guarantee parameters weren't changed in the ways described above, how could you know what count had already been from within the invocation of factory? Even on the first call it's different, and it's not like the above is strange code.
So, in summary:
Jitter: Parameters are often enregistered. You can expect x86 to put 2 pointer, reference or integer parameters in registers and amd64 to put 4 pointer, reference or integer parameters and 4 floating-point parameters into registers. They have no location to read them from.
Jitter: Parameters on the stack are often over-written.
Jitter: There may not be a real call at all, so there's no place to look for parameters as they could be anywhere.
Jitter: The "call" may be re-using the same frame as the last one.
Compiler: The IL may re-use slots for locals.
Human: The programmer may change parameter values.
From all of that, how on earth is it going to be possible to know what arg1 was?
Now, add in the existence of garbage collection. Imagine if we could magically know what arg1 was anyway, despite all of this. If it was a reference to an object on the heap, it might still do us no good, because if all of the above meant that there were no more references active on the stack - and it should be clear that this quite definitely does happen - and the GC kicks in, then the object could have been collected. So all we can magically get hold of is a reference to something that no longer exists - indeed quite possibly to an area in the heap now being used for something else, bang goes the entire type safety of the entire framework!
It's not in the slightest bit comparable to reflection obtaining the IL, because:
The IL is static, rather than just a state at a given point in time. Likewise, we can get a copy of our favourite books from a library a lot more easily than we can get back our reaction the first time we read them.
The IL doesn't reflect the impact of inlining etc. anyway. If a call was inlined every time it was actually used, and then we used reflection to get a MethodBody of that method, the fact that it's normally inlined is irrelevant.
The suggestions in other answers about profiling, AOP, and interception are as close as you're going to get.
*Actually, this is the real first parameter to instance members. Let's pretend everything is static so we don't have to keep pointing this out.
It's impossible in .NET. At runtime the JITter may decide to use CPU registers instead of the stack to store method parameters, or even overwrite the initial (passed) values on the stack. So it would be very costly in performance terms for .NET to allow logging parameters at any point in the source code.
As far as I know the only way you can do it in general is to use the .NET CLR profiling API. (The Typemock framework, for example, is able to do such things, and it uses the CLR profiling API.)
If you only need to intercept calls to virtual functions/properties (including interface methods/properties), you can use any interception framework (Unity or Castle, for example).
Here is some information about the .NET profiling API:
MSDN Magazine
MSDN Blogs
Brian Long's blog
This is not possible in C#, you should use an AOP approach and perform method argument logging when each method is called. This way you can centralize your logging code, make it reusable and then you would just need to mark which methods require argument logging.
I believe this could be easily achievable using an AOP framework like PostSharp.
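As a hand-written stand-in for what an AOP framework would weave in automatically, here is a minimal sketch where each method logs its own arguments on entry. CallLog, Enter and the Demo methods are hypothetical names invented for illustration; this is not PostSharp's API.

```csharp
using System;
using System.Collections.Generic;

public static class CallLog
{
    // In-memory sink so the sketch is self-contained; a real logger would go here.
    public static readonly List<string> Entries = new List<string>();

    // Hypothetical helper: the method being logged passes its own name
    // and argument values explicitly, while they are still in scope.
    public static void Enter(string method, params object[] args)
    {
        string[] parts = new string[args.Length];
        for (int i = 0; i < args.Length; i++)
            parts[i] = args[i] == null ? "null" : args[i].ToString();
        Entries.Add(method + "(" + string.Join(", ", parts) + ")");
    }
}

public static class Demo
{
    public static int A(int arg1, string arg2)
    {
        CallLog.Enter("A", arg1, arg2);
        return B(arg1 * 2);
    }

    public static int B(int arg4)
    {
        CallLog.Enter("B", arg4);
        return arg4 + 1;
    }
}
```

Calling Demo.A(3, "x") records "A(3, x)" followed by "B(6)". The values are captured at method entry, which sidesteps the problem of them no longer existing by the time something further down the stack wants to log them.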
Possibly not going to happen without type-mocking or some ICorDebug magic.
Even the StackFrame class only lists members which allow you to get information about the source, and not parameters.
The functionality you are after however exists as IntelliTrace with method logging. You can filter what you need for review.
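To illustrate the StackFrame point, here is a small sketch of what can be read from a frame: the method and, through reflection, its parameter names and types, but not the argument values. StackInfo, DescribeCaller and B are hypothetical names; the calls used (StackTrace.GetFrame, StackFrame.GetMethod, MethodBase.GetParameters) are standard.

```csharp
using System;
using System.Diagnostics;
using System.Reflection;
using System.Runtime.CompilerServices;
using System.Text;

public static class StackInfo
{
    // Describes the calling method's signature: names and types only.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static string DescribeCaller()
    {
        // Frame 0 is DescribeCaller itself; frame 1 is whoever called it.
        StackFrame frame = new StackTrace().GetFrame(1);
        MethodBase method = frame.GetMethod();

        StringBuilder sb = new StringBuilder(method.Name + "(");
        ParameterInfo[] parameters = method.GetParameters();
        for (int i = 0; i < parameters.Length; i++)
        {
            if (i > 0) sb.Append(", ");
            sb.Append(parameters[i].ParameterType.Name + " " + parameters[i].Name);
        }
        return sb.Append(")").ToString();
    }

    // Inlining and optimisation are switched off here so that B keeps
    // its own stack frame while DescribeCaller runs.
    [MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.NoOptimization)]
    public static string B(int arg4, string arg5)
    {
        return DescribeCaller();
    }
}
```

Note that the attributes are needed just to make the frame reliably visible, which is itself a hint at how fragile stack inspection is once the JIT is free to optimise.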
Since properties are just methods under the hood, it's understandable that inlining them may or may not improve performance, depending on the logic they perform, so it's understandable why the JIT needs to check whether methods are worth inlining.
Automatic properties however (as far as I understand) cannot have any logic, and simply return or set the value of the underlying field. As far as I know, automatic properties are treated by the Compiler and the JIT just like any other methods.
(Everything below will rely on the assumption that the above paragraph is correct.)
Value Type properties show different behavior than the variable itself, but Reference Type properties supposedly should have the exact same behavior as direct access to the underlying variable.
// Automatic Properties Example
public Object MyObj { get; private set; }
Is there any case where automatic properties to Reference Types could show a performance hit by being inlined?
If not, what prevents either the Compiler or the JIT from automatically inlining them?
Note: I understand that the performance gain would probably be insignificant, especially when the JIT is likely to inline them anyway if used enough times - but small as the gain may be, it seems logical that such a seemingly simple optimization would be introduced regardless.
EDIT: The JIT compiler doesn't work in the way you think it does, which I guess is why you're probably not completely understanding what I was trying to convey above. I've quoted your comment below:
That is a different matter, but as far as I understand methods are only checked for being inline-worthy if they are called enough times. Not to mention that the checking itself is a performance hit. (Let the size of the performance hit be irrelevant for now.)
First, most, if not all, methods are checked to see if they can be inlined. Second, keep in mind that methods are only ever JITed once and it is during that one time that the JITer will determine if any methods called inside of it will be inlined. This can happen before any code is executed at all by your program. What makes a called method a good candidate for inlining?
The x86 JIT compiler (x64 and ia64 don't necessarily use the same optimization techniques) checks a few things to determine if a method is a good candidate for inlining, definitely not just the number of times it is called. The article lists things like if inlining will make the code smaller, if the call site will be executed a lot of times (ie in a loop), and others. Each method is optimized on its own, so the method may be inlined in one calling method but not in another, as in the example of a loop. These optimization heuristics are only available to JIT, the C# compiler just doesn't know: it's producing IL, not native code. There's a huge difference between them; native vs IL code size can be quite different.
To summarize, the C# compiler doesn't inline properties for performance reasons.
The JIT compiler inlines most simple properties, including automatic properties. You can read more about how the JIT decides to inline method calls at this interesting blog post.
Well, the C# compiler doesn't inline any methods at all. I assume this is the case because of the way the CLR is designed. Each assembly is designed to be portable from machine to machine. A lot of times, you can change the internal behavior of a .NET assembly without having to recompile all the code, it can just be a drop in replacement (at least when types haven't changed). If the code were inlined, it breaks that (great, imo) design and you lose that luster.
Let's talk about inlining in C++ first. (Full disclosure, I haven't used C++ full time in a while, so I may be vague, my explanations rusty, or completely incorrect! I'm counting on my fellow SOers to correct and scold me)
The C++ inline keyword is like telling the compiler, "Hey man, I'd like you to inline this function, because I think it will improve performance". Unfortunately, it is only telling the compiler you'd prefer it inlined; it is not telling it that it must.
Perhaps at an earlier date, when compilers were less optimized than they are now, the compiler would more often than not compile that function inlined. However, as time went on and compilers grew smarter, the compiler writers discovered that in most cases, they were better at determining when a function should be inlined than the developer was. For those few cases where it wasn't, developers could use the seriouslybro_inlineme keyword (officially called __forceinline in VC++).
Now, why would the compiler writers do this? Well, inlining a function doesn't always mean increased performance. While it certainly can, it can also devastate your program's performance, if used incorrectly. For example, we all know one side effect of inlining code is increased code size, or "fat code syndrome" (disclaimer: not a real term). Why is "fat code syndrome" a problem? If you take a look at the article I linked above, it explains, among other things, that memory is slow, and the bigger your code, the less likely it will fit in the fastest CPU cache (L1). Eventually it can only fit in memory, and then, inlining has done nothing. However, compilers know when these situations can happen, and do their best to prevent it.
Putting that together with your question, let's look at it this way: the C# compiler is like a developer writing code for the JIT compiler: the JIT is just smarter (but not a genius). It often knows when inlining will benefit or harm execution speed. "Senior developer" C# compiler doesn't have any idea how inlining a method call could benefit the runtime execution of your code, so it doesn't. I guess that actually means the C# compiler is smart, because it leaves the job of optimization to those who are better than it, in this case, the JIT compiler.
"Automatic properties however (as far as I understand) cannot have any logic, and simply return or set the value of the underlying field. As far as I know, automatic properties are treated by the Compiler and the JIT just like any other methods."
That automatic properties cannot have any logic is an implementation detail, there is not any special knowledge of that fact that is required for compilation. In fact, as you say auto properties are compiled down to method calls.
Suppose auto properties were inlined and the class and property are defined in a different assembly. This would mean that if the property implementation changes, you would have to recompile the application to see that change. That defeats using properties in the first place, which should allow you to change the internal implementation without having to recompile the consuming application.
Automatic properties are just that - property get/set methods generated automatically. As a result there is nothing special in IL for them. The C# compiler by itself does a very small number of optimizations.
As for reasons why not to inline - imagine your type is in a separate assembly, hence you are free to change the source of that assembly to have an insanely complicated get/set for the property. As a result the compiler can't reason about the complexity of the get/set code when it first sees your automatic property while creating a new assembly that depends on your type.
As you've already noted in your question - "especially when the JIT is likely to inline them anyway" - these property methods will likely be inlined at JIT time.
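The point that an automatic property is just a pair of generated methods can be checked with reflection. A minimal sketch, where Holder and AutoPropertyInfo are hypothetical names made up for illustration:

```csharp
using System;
using System.Reflection;
using System.Runtime.CompilerServices;

public class Holder
{
    // Automatic property, as in the question.
    public object MyObj { get; private set; }
}

public static class AutoPropertyInfo
{
    // Returns the name of the method the compiler generated for the getter.
    public static string GetterName()
    {
        MethodInfo getter = typeof(Holder).GetProperty("MyObj").GetGetMethod();
        return getter.Name;
    }

    // Checks whether the getter carries the CompilerGenerated marker attribute.
    public static bool GetterIsCompilerGenerated()
    {
        MethodInfo getter = typeof(Holder).GetProperty("MyObj").GetGetMethod();
        return getter.IsDefined(typeof(CompilerGeneratedAttribute), false);
    }
}
```

GetterName() returns "get_MyObj", an ordinary method as far as the IL is concerned. Whether a call to it gets inlined is then decided per call site by the JIT.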
I'm building an application that is seriously slower than it should be (a process takes 4 seconds when it should take only .1 seconds, that's my goal at least).
I have a bunch of methods that pass an array from one to the other. This has kept my code nice and organized, but I'm worried that it's killing the efficiency of my code.
Can anyone confirm if this is the case?
Also, I have all of my code contained in a class separate from my UI. Is this going to make things run significantly slower than if I had my code contained in the Form1.cs file?
Edit: There are about 95000 points that need to be calculated, each point goes through 7 methods that do additional calculations.
Have you tried any profiling or performance tools to narrow down why the slowdown occurs?
It might show you ways that you could use to refactor your code and improve performance.
This question asked by another user has several options that you can choose from:
Good .Net Profilers
No. This is not what is killing your code speed, unless many methods means like a million or something. You probably have more things iterating through your array than you need or realize, and the array itself may have a larger memory footprint than you realize.
Perhaps you should look into a design where instead of passing the array to 7 methods, you iterate the array once, passing the members to 7 methods, this will minimize the number of times you're iterating through 95000 members.
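A sketch of the two designs, with three made-up steps standing in for the seven methods in the question (Scale, Offset and Square are purely illustrative, as is the PointPipeline class):

```csharp
using System;

public static class PointPipeline
{
    // Hypothetical per-point steps standing in for the question's seven methods.
    static double Scale(double p) { return p * 2.0; }
    static double Offset(double p) { return p + 1.0; }
    static double Square(double p) { return p * p; }

    // One loop per step: the whole array is traversed three times.
    public static void MultiPass(double[] points)
    {
        for (int i = 0; i < points.Length; i++) points[i] = Scale(points[i]);
        for (int i = 0; i < points.Length; i++) points[i] = Offset(points[i]);
        for (int i = 0; i < points.Length; i++) points[i] = Square(points[i]);
    }

    // One loop in total: each point goes through every step before moving on.
    public static void SinglePass(double[] points)
    {
        for (int i = 0; i < points.Length; i++)
            points[i] = Square(Offset(Scale(points[i])));
    }
}
```

Both produce the same values when each step depends only on the point itself. If a step needs results from other points (a running total, say), the passes cannot be merged this simply.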
In general, function calls are basic enough to be highly optimized by any interpreter (or compiler). Therefore they do not produce too much blow-up in run time. In fact, if you wrap your problem into, say, some fancy iterative solution, you save handling the stack, but instead have to handle some iteration variables, which will not be too hard.
I know, there have been programmers who wondered why their recursive algorithms have been so slow, until someone told them not to pass array entries by value.
You should provide some sample code. In general, you should look for other bottlenecks, or find another algorithm.
Just need to run it against a good profiling tool. I've got some stuff I wished only took 4 seconds - works with upwards of a hundred million records in a pass.
An Array is a reference type not a value type. Therefore you never pass the array. You are actually passing the pointer to the array in memory. So passing the array isn't your issue. Most likely you have an issue with what you do with your array. You need to do what Jamie Keeling said and run it through a profiler or even just debug it and see if you get stuck in some big loops.
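A quick way to convince yourself of this, using two hypothetical helper methods:

```csharp
using System;

public static class ArrayPassing
{
    // Writes through the reference: the caller's array is modified.
    public static void ZeroFirst(int[] data)
    {
        data[0] = 0;
    }

    // Reassigns only the callee's local copy of the reference:
    // the caller's array is left untouched.
    public static void Replace(int[] data)
    {
        data = new int[] { 9, 9, 9 };
    }
}
```

After ZeroFirst(a) the caller sees a[0] == 0, which shows no copy of the elements was made. After Replace(a) the caller's array is unchanged, because only the reference itself was passed by value.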
Why are you loading them all into an array and doing each method in turn rather than iterating through them as loaded?
If you can obtain them (from whatever input source), deal with them and output them (whether to screen, file or wherever), this will inevitably use less memory and reduce start-up time, at the very least.
If this answer is applicable to your situation, start by changing your methods to deal with enumerations rather than arrays (non-breaking change, since arrays are enumerations), then change your input method to yield return items as loaded rather than loading an entire array.
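A minimal sketch of that change, assuming the points are plain doubles and using an invented LoadPoints source and a single made-up processing step:

```csharp
using System;
using System.Collections.Generic;

public static class StreamingPoints
{
    // Hypothetical input source: yields points one at a time as they are "loaded".
    public static IEnumerable<double> LoadPoints(int count)
    {
        for (int i = 0; i < count; i++)
            yield return i;
    }

    // Takes any enumeration (arrays included) and yields processed points lazily.
    public static IEnumerable<double> Process(IEnumerable<double> points)
    {
        foreach (double p in points)
            yield return p * 2.0 + 1.0;
    }

    // Consumes the stream; nothing upstream runs until this loop pulls items.
    public static double Sum(IEnumerable<double> points)
    {
        double total = 0;
        foreach (double p in points)
            total += p;
        return total;
    }
}
```

Because Process accepts IEnumerable<double>, existing callers that pass a double[] keep working, and Sum(Process(LoadPoints(n))) never holds all n points in memory at once.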
Sorry for posting an old link (.NET 1.1) but it was contained in VS2010 article, so:
Here you can read about method costs. (Initial link)
Then, if you start your code from VS (no matter whether it's in Release mode), the VS debugger attaches to your code and slows it down.
I know that I will be downvoted for this advice, but... The maximum performance will be achieved with unsafe operations on arrays (yes, it's UNSAFE, but when performance is at stake, so...)
And lastly, refactor your code to use the minimum number of methods that work with your arrays. It will improve the performance.
I need to get three objects out of a function, my instinct is to create a new type to return the three refs. Or if the refs were the same type I could use an array. However pass-by-ref is easier:
private void Mutate_AddNode_GetGenes(ref NeuronGene newNeuronGene, ref ConnectionGene newConnectionGene1, ref ConnectionGene newConnectionGene2)
{
}
There's obviously nothing wrong with this but I hesitate to use this approach, mostly I think for reasons of aesthetics and psychological bias. Are there actually any good reasons to use one of these approaches over the others? Perhaps a performance issue with creating extra wrapper objects or pushing parameters onto the stack. Note that in my particular case this is CPU intensive code. CPU cycles matter.
Is there a more elegant C# 2 or C# 3 approach?
Thanks.
For almost all computing problems, you will not notice the CPU difference. Since your sample code has the word "Gene" in it, you may actually fall into the rare category of code that would notice.
Creating and destroying objects just to wrap other objects would cost a bit of performance (they need to be created and garbage collected after all).
Aesthetically I would not create an object just to group unrelated objects, but if they logically belong together it is perfectly fine to define a containing object.
If you're worrying about the performance of a wrapping type (which is a lot cleaner, IMHO), you should use a struct. Current 32-bit implementations of .NET (and the upcoming 64-bit 4.0) support inlining / optimizing away of structs in many cases, so you'd probably see no performance difference whatsoever between a struct and ref arguments.
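A sketch of what the struct wrapper could look like. NeuronGene and ConnectionGene are empty placeholders for the asker's real types, and AddNodeGenes and the Mutation class are names made up here.

```csharp
using System;

// Empty placeholders for the gene types in the question.
public class NeuronGene { }
public class ConnectionGene { }

// Hypothetical wrapper: a struct, so returning it allocates no extra heap object.
public struct AddNodeGenes
{
    public readonly NeuronGene Neuron;
    public readonly ConnectionGene Connection1;
    public readonly ConnectionGene Connection2;

    public AddNodeGenes(NeuronGene neuron, ConnectionGene connection1, ConnectionGene connection2)
    {
        Neuron = neuron;
        Connection1 = connection1;
        Connection2 = connection2;
    }
}

public static class Mutation
{
    // Returns all three results at once instead of filling three ref parameters.
    public static AddNodeGenes Mutate_AddNode_GetGenes()
    {
        return new AddNodeGenes(new NeuronGene(), new ConnectionGene(), new ConnectionGene());
    }
}
```

The caller then writes AddNodeGenes genes = Mutation.Mutate_AddNode_GetGenes(); and reads genes.Neuron and so on. Whether this matches the ref version in speed is something to confirm with a profiler rather than assume.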
Worrying about the relative execution speed of those two options is probably a premature optimization. Focus on getting the algorithm correct first, and having clean, maintainable code. When that's done, you can run a profiler on it and optimize the 20% of the code that takes 80% of the CPU time. Even if this method ends up being in that 20%, the difference between the two calling styles is probably too small to register.
So, performance issues aside, I'd probably use a container class. Since this method takes only those three parameters, and (presumably) modifies each one, it sounds like it would make sense to have it as a method of the container class, with three member variables instead of ref parameters.