List<int> list = ...
for (int i = 0; i < list.Count; ++i)
{
    ...
}
So, does the compiler know that list.Count does not have to be called on each iteration?
Are you sure about that?
List<int> list = new List<int> { 0 };
for (int i = 0; i < list.Count; ++i)
{
    if (i < 100)
    {
        list.Add(i + 1);
    }
}
If the compiler cached the Count property above, the contents of list would be 0 and 1. If it did not, the contents would be the integers from 0 to 100.
Now, that might seem like a contrived example to you; but what about this one?
List<int> list = new List<int>();
int i = 0;
while (list.Count <= 100)
{
    list.Add(i++);
}
It may seem as if these two code snippets are completely different, but that's only because of the way we tend to think about for loops versus while loops. In either case, the value of a variable is checked on every iteration. And in either case, that value very well could change.
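To make the equivalence concrete, here is roughly what the for loop from the question desugars to (a sketch; the exact scoping of i aside):

int i = 0;
while (i < list.Count)   // the condition, including the Count call, runs on every iteration
{
    // ... loop body ...
    ++i;
}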
Typically it's not safe to assume the compiler optimizes something when the behavior between "optimized" and "non-optimized" versions of the same code is actually different.
The C# compiler does not do any optimizations like this. The JIT compiler, however, does optimize this for arrays (whose length cannot change), I believe, but not for lists.
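For illustration, this is the kind of array loop I mean (a minimal sketch); because an array's Length can never change, the JIT is free to treat the bound as constant, and in this exact shape it can also elide the per-element bounds checks:

int[] arr = new int[100];
int sum = 0;
for (int i = 0; i < arr.Length; i++)
{
    sum += arr[i];   // Length is fixed, so the JIT need not re-read it each time
}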
A List's Count property can change within the loop body, so caching it would be an incorrect optimization.
It's worth noting, since nobody else has mentioned it, that there is no way of knowing, just from looking at a loop like this, what the Count property will actually do, or what side effects it may have.
Consider the following cases:
A third-party implementation of a property called Count could execute any code it wished to; it could return a random number, for all we know (see the sketch after this list). With List we can be a bit more confident about how it will operate, but how is the JIT to tell these implementations apart?
Any method call within the loop could potentially alter the return value of Count (not just a direct Add on the collection itself; a user method called in the loop might also modify the collection).
Any other thread that happens to be executing concurrently could also change the Count value.
The JIT just can't "know" that Count is constant.
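As a purely hypothetical sketch of the first case (WeirdCollection is invented here), nothing stops a Count property from doing this:

class WeirdCollection
{
    private readonly Random rnd = new Random();

    // A legal, if perverse, implementation: a different value on every call.
    public int Count
    {
        get { return rnd.Next(100); }
    }
}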
However, the JIT compiler can make the code run much more efficiently by inlining the implementation of the Count property (as long as it is a trivial implementation). In your example it may well be inlined down to a simple test of a variable value, avoiding the overhead of a function call on each iteration, and thus making the final code nice and fast. (Note: I don't know if the JIT will do this, just that it could. I don't really care - see the last sentence of my answer to find out why)
But even with inlining, the value may still be changed between iterations of the loop, so it would still need to be read from RAM for each comparison.

If you were to copy Count into a local variable, and the JIT could determine by looking at the code in the loop that the local variable will remain constant for the loop's lifetime, then it may be able to optimise further (e.g. by holding the constant value in a register rather than having to read it from RAM on each iteration). So if you (as a programmer) know that Count will be constant for the lifetime of the loop, you may be able to help the JIT by caching Count in a local variable. This gives the JIT the best chance of optimising the loop.

(But there are no guarantees that the JIT will actually apply this optimisation, so manually "optimising" this way may make no difference to the execution times. You also risk things going wrong if your assumption that Count is constant is incorrect. Or your code may break if another programmer edits the contents of the loop so that Count is no longer constant and doesn't spot your cleverness.)
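A minimal sketch of that manual caching (BuildList is a placeholder here; the comment is the entire caveat):

List<int> list = BuildList();
int count = list.Count;              // cached: only valid if the loop body never changes Count
for (int i = 0; i < count; ++i)
{
    Console.WriteLine(list[i]);      // must not Add/Remove here, or count goes stale
}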
So the moral of the story is: the JIT can make a pretty good stab at optimising this case by inlining. Even if it doesn't do this now, it may in a future version. You might not gain any advantage by manually "optimising" the code, and you risk changing its behaviour and thus breaking it, or at least making future maintenance of your code more risky, or possibly losing out on future JIT enhancements. So the best approach is to just write it the way you have, and optimise it when your profiler tells you that the loop is your performance bottleneck.
Hence, IMHO it's interesting to consider/understand cases like this, but ultimately you don't actually need to know. A little bit of knowledge can be a dangerous thing. Just let the JIT do its thing, and then profile the result to see if it needs improving.
If you take a look at the IL generated for Dan Tao's example you'll see a line like this at the condition of the loop:
callvirt instance int32 [mscorlib]System.Collections.Generic.List`1<int32>::get_Count()
This shows that Count (i.e. get_Count()) is called on every iteration of the loop, at least at the IL level; whether the JIT then inlines or hoists it is a separate question.
For all the other commenters who say that the 'Count' property could change in a loop body: JIT optimizations let you take advantage of the actual code that's running, not the worst-case of what might happen. In general, the Count could change. But it doesn't in all code.
So in the poster's example (where nothing in the loop changes Count), is it unreasonable for the JIT to detect that the code in the loop doesn't change whatever internal variable List uses to hold its length? If it detects that list.Count is constant, couldn't it lift that variable access out of the loop body?
I don't know if the JIT does this or not. But I am not so quick to brush this problem off as trivially "never."
No, it doesn't, because the condition is evaluated on each step. It can be more complex than just a comparison with Count; any boolean expression is allowed:
for (int i = 0; new Random().NextDouble() < .5d; i++)
    Console.WriteLine(i);
http://msdn.microsoft.com/en-us/library/aa664753(VS.71).aspx
It depends on the particular implementation of Count; I've never noticed any performance issues with using the Count property on a List, so I assume it's OK.
In this case you can save yourself some typing with a foreach.
List<int> list = new List<int> { 0 };
foreach (int item in list)
{
    // ...
}
Related
I keep seeing examples online, where there is a property of an element within a method that is copied to a local variable before use. For example, something like this (from Microsoft's StackPanel source code):
UIElementCollection children = arrangeElement.InternalChildren;
...
for (int i = 0, count = children.Count; i < count; ++i)
{
    UIElement child = (UIElement)children[i];
    if (child == null) { continue; }
    ...
}
Can anyone explain to me what the benefit of doing that is (if there is one), rather than accessing the property directly each time, like this?:
for (int i = 0, count = arrangeElement.InternalChildren.Count; i < count; ++i)
{
    UIElement child = (UIElement)arrangeElement.InternalChildren[i];
    if (child == null) { continue; }
    ...
}
Clearly, it saves a few characters on the screen, but that's not much of a reason to do this. Also, I understand why we might want to do this with a long running method, as a form of caching:
double value = GetValueFromLongRunningMethod();
...
for (int i = 0; i < someCollection.Count; i++) DoSomethingWith(value);
But I see this done with properties a lot and wonder why. Here's another commonly found example from the internet to do with virtualization:
IItemContainerGenerator generator = this.ItemContainerGenerator;
GeneratorPosition position = generator.GeneratorPositionFromIndex(firstVisibleItemIndex);
Why do that instead of this?:
GeneratorPosition position =
this.ItemContainerGenerator.GeneratorPositionFromIndex(firstVisibleItemIndex);
Finally, if this is done for the same reason that we might cache the result of a long running method, then how are we supposed to know which properties need to be accessed in this way?
Firstly, it avoids calling .InternalChildren lots of times. This could be a small but noticeable reduction in virtual calls (since it is used in a loop), but in some cases it might be much more significant: a property that returns a collection or array might allocate every time it is called. DataRow.ItemArray is a classic example of this, so it is actively harmful to call it each time.

An additional consideration is that even if a property returns the same array each time it is called, there is JIT magic that elides bounds checking, but it only works if the JIT can see that you are iterating a single array for the entire duration. If you stick a property accessor in the middle, this won't be obvious and the bounds-check removal won't happen. It also might not happen if you've manually hoisted the upper bound!
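To make the DataRow.ItemArray point concrete, a sketch (assume table is an existing System.Data.DataTable and Process is some method of yours):

DataRow row = table.Rows[0];
object[] items = row.ItemArray;      // allocates one new array, here, once
for (int i = 0; i < items.Length; i++)
{
    Process(items[i]);               // writing row.ItemArray[i] here instead would allocate on every iteration
}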
Side note: if it isn't an array, then foreach would usually be preferable, and there would not be any advantage to introducing a local, due to how foreach works internally.
Note: since you're using .Count vs .Length, this definitely isn't an array, and you should probably simplify to:
foreach (UIElement child in arrangeElement.InternalChildren) { ... }
or
foreach (var child in arrangeElement.InternalChildren) { ... }
Not only does this remove this question completely, but it means that the type's own iterator (which might be an optimized struct iterator, or might be a simple IEnumerable<T> class, such as a compiler-generated iterator block) can be used. This usually has more direct access to the internals, and thus bypasses a few indirections and API checks that indexers require.
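For example, foreach over a List<T> binds to its struct enumerator rather than going through IEnumerable<T>; this is roughly what the compiler emits (minus the try/finally around disposal):

List<int> numbers = new List<int> { 1, 2, 3 };
List<int>.Enumerator e = numbers.GetEnumerator();   // a struct: no heap allocation, no virtual dispatch
while (e.MoveNext())
{
    Console.WriteLine(e.Current);
}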
It might be fruitful in some cases, for example when you have to:
debug some piece of code and need to instantly see the value of the variable;
do a few operations at a time with an object that requires casting, so that you cast it only once;
and sometimes, when you work with value types, making a local copy like this means you can modify the copy without changing the value of the class's property.
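A sketch of that last point (shape and its struct-typed Location property are hypothetical here):

Point p = shape.Location;   // Location returns a struct, so p is a copy
p.X += 10;                  // modifies only the local copy
// shape.Location is unchanged; to change it you must assign the whole struct back:
shape.Location = p;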
Why do that instead of this?:
GeneratorPosition position =
this.ItemContainerGenerator.GeneratorPositionFromIndex(firstVisibleItemIndex);
Let's get very abstract about this:
We get a generator. That apparently is this.ItemContainerGenerator for now, but that could change.
We use it. Only once here, but usually in multiple statements.
When we later decide to get that generator elsewhere, the usage should stay the same.
The example is too small to make this convincing, but there is some kind of logic to be discerned here.
I was reading Improving .NET Application Performance and Scalability. The section titled Avoid Repetitive Field or Property Access contains a guideline:
If you use data that is static for the duration of the loop, obtain it
before the loop instead of repeatedly accessing a field or property.
The following code is given as an example of this:
for (int item = 0; item < Customer.Orders.Count; item++)
{
    CalculateTax(Customer.State, Customer.Zip, Customer.Orders[item]);
}
becomes
string state = Customer.State;
string zip = Customer.Zip;
int count = Customer.Orders.Count;
for (int item = 0; item < count; item++)
{
    CalculateTax(state, zip, Customer.Orders[item]);
}
The article states:
Note that if these are fields, it may be possible for the compiler to
do this optimization automatically. If they are properties, it is much
less likely. If the properties are virtual, it cannot be done
automatically.
Why is it "much less likely" for properties to be optimized by the compiler in this manner, and when can one expect for a particular property to be or not to be optimized? I would assume that properties where additional operations are performed in the accessors are harder for the compiler to optimize, and that those that only modify a backing field are more likely to be optimized, but would like some more concrete rules. Are auto-implemented properties always optimized?
It requires the jitter to apply two optimizations:
First the property getter method must be inlined so it turns into the equivalent of a field access. That tends to work when the getter is small and does not throw exceptions. This is necessary so the optimizer can be sure that the getter does not rely on state that can be affected by other code.
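For reference, this is the sort of trivial getter that can be inlined down to a plain field access (a generic sketch, not from the article):

private int _count;

public int Count
{
    get { return _count; }   // small, no exceptions: a candidate for inlining to a single field load
}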
Note how the hand-optimized code would be wrong if, say, the Customer.Orders[] indexer would alter the Customer.State property. Lazy code like this is pretty unlikely of course but it's not like this has never been done :) The optimizer has to be sure.
Secondly, the field access code has to be hoisted out of the loop body, an optimization called "loop-invariant code motion". It works on simple property getter code when the jitter can prove that the statements inside the loop body don't affect the value.
The jitter optimizer implements it, but it is not stellar at it. In this particular case it is pretty likely that it will give up when it cannot inline the CalculateTax() method. A native compiler optimizes this much more aggressively; it can afford to burn the memory and analysis time on it. The jitter optimizer must meet a pretty hard deadline to avoid pauses.
Do keep the constraints of the optimizer in mind when you do this yourself. It is a pretty darn ugly bug, of course, if these methods do have side effects that you did not count on. And only do this when the profiler tells you that this code is on the hot path, the typical ~10% of your code that actually affects the execution time. Low odds here; the database query to get customer/order data is going to be orders of magnitude more expensive than calculating tax. Luckily, code transforms like this also tend to make the code more readable, so you usually get it for free. YMMV.
A backgrounder on jitter optimizations is here.
Why is it "much less likely" for properties to be optimized by the compiler in this manner, and when can one expect for a particular property to be or not to be optimized?
Properties are not always just wrappers for a field. If there is any degree of logic in a property, it becomes significantly more difficult for a compiler to prove that it is correct to re-use the value it first got when the loop began.
As an extreme example, consider
private Random rnd = new Random();
public int MyProperty
{
    get { return rnd.Next(); }
}
I have code using a third-party tool that iterates over a collection of points.
for (int i = 0; i < pcoll.PointCount; i++) { /* ... */ }
When profiling via dotTrace, I noticed that the PointCount property is accessed on every iteration.
I expected the value of this property to be optimized away by the compiler, but obviously that doesn't happen. Maybe this is actually a problem within the COM-based third-party lib, or within dotTrace itself when collecting the information.
I'm not sure whether this topic wouldn't fit better on GIS.StackExchange. However, maybe someone has an idea under which circumstances the optimization won't take place, or how it might happen.
Simply put, how is the compiler to know whether pcoll.PointCount will change between invocations? It can't safely make the assumption that the value will remain unchanged, so it can't optimise this code by caching the value of the first call to pcoll.PointCount.
It may have changed in the meantime.
Indeed, one of the reasons to test i < pcoll.PointCount every iteration rather than just using foreach(var point in pcoll) is precisely because you think the collection might change in the meantime, and enumerators don't guarantee to cope with changes to the collection they enumerate.
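For instance, List<T>'s enumerator throws if the list changes mid-enumeration (a minimal sketch):

var points = new List<int> { 1, 2, 3 };
foreach (var p in points)
{
    points.Add(p);   // throws InvalidOperationException: collection was modified
}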
This differs from, for example, an array accessed through a local variable, because the only way the Length of an array accessed through a local variable can change, is if the change is made locally.
Even there though, it's worth remembering that the compiler often skips some obvious optimisations because it's known that the jitter makes the same optimisation too.
The expected optimization holds true for fields. But a property has a getter/setter (accessing a property is in fact calling them as methods), so the compiler will have a hard time trying to optimize it.
To fix this, make it a field, or read it once:
var max = pcoll.PointCount;
for (int i = 0; i < max; i++) { /* ... */ }
I use both Javascript and C# on a daily basis and I sometimes have to consider hoisting when using Javascript. However, C# doesn't seem to implement hoisting (that I know of) and I can't figure out why. Is it more of a design choice, or is it more akin to a security or language constraint that applies to all statically typed languages?
For the record, I'm not saying I WANT it to exist in C#; I just want to understand why it doesn't.
EDIT: I noticed the issue when I declared a variable after a LINQ query, but the LINQ query was deferred until after the variable declaration.
var results = from c in db.LoanPricingNoFee
              where c.LoanTerm == LoanTerm && c.LoanAdvance <= UpperLimit
              orderby c.LoanInstalment ascending
              select c;
int LoanTerm = 12;
Throws an error whereas:
int LoanTerm = 12;
var results = from c in db.LoanPricingNoFee
              where c.LoanTerm == LoanTerm && c.LoanAdvance <= UpperLimit
              orderby c.LoanInstalment ascending
              select c;
Does not.
Of all the programming languages I have used, Javascript has the most confusing scope system and hoisting is a part of that. The outcome is that it is easy to write unpredictable code in JavaScript and you have to be careful with how you write it to make it into the powerful and expressive language it can be.
C#, in common with almost every other language, assumes that you will not use a variable until you have declared it. Because it has a compiler it can enforce that by simply refusing to compile if you try to use an undeclared variable. The other approach to this, more often seen in scripting languages, is that if a variable is used without having been declared it is instantiated at first use. This can make it somewhat hard to follow the flow of code and is often used as a criticism of languages that behave that way. Most people who have used languages with block level scope ( where variables only exist at the level where they were declared ) find it a particularly weird feature of Javascript.
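A minimal illustration of that enforcement (the commented-out line is the compile error):

static void Main()
{
    // Console.WriteLine(x);   // error CS0841: Cannot use local variable 'x' before it is declared
    int x = 5;
    Console.WriteLine(x);      // fine: declared before use
}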
A couple of big reasons that hoisting can cause problems:
It is absolutely counter-intuitive and makes code harder to read and its behaviour harder to predict unless you are conscious of this behaviour. Hard to read and hard to predict code is far more likely to include bugs.
In terms of limiting the number of bugs in your code, limiting the lifetime of your variables can be really helpful. If you can declare the variable and use it in two lines of code, then having ten lines of code in between those two lines gives a lot of opportunities to accidentally affect the behaviour of the variable. There is a lot of information on this in Code Complete - if you haven't read that, I heartily recommend it.
There is a classic UX concept of the Principle of Least Astonishment; features like hoisting (or like the way Javascript handles equality) tend to break it. People don't often think of user experience when developing programming languages, but actually programmers tend to be quite discerning users, and more than a little grumpy when they find themselves routinely caught out by odd features. Javascript is very lucky that its unique ubiquity in the browser has created a kind of enforced popularity that means we have to tolerate its many quirks and problematic design decisions.
Finally, I cannot imagine a reason why it would be a useful addition to a language like C#: what possible benefit could it confer?
"Is it more of a design choice or is it more akin to a security or language constraint that applies to all statically typed languages?"
It's not a constraint of static typing. It would be trivial for the compiler to move all variable declarations to the top of the scope (in Javascript this is the top of the function, in C# the top of the current block) and to error if a name was declared with different types.
So the reason hoisting doesn't exist in C# is purely a design decision. Why it was designed that way I can't say; I wasn't on the team. But it was probably due to the ease of parsing (both for human programmers and the compiler) when variables are always declared before use.
There is a form of hoisting that exists in C# (and Java) in the context of loop-invariant code motion, which is the JIT compiler optimization that "hoists" (pulls up) expressions out of loop statements when they don't affect the actual loop.
You can learn more about it here.
Quote:
“Hoisting” is a compiler optimization that moves loop-invariant code
out of loops. “Loop-invariant code” is code that is referentially
transparent to the loop and can be replaced with its values, so that
it doesn’t change the semantic of the loop. This optimization improves
runtime performance by executing the code only once rather than at
each iteration.
So code written like this
public void Update(int[] arr, int x, int y)
{
    for (var i = 0; i < arr.Length; i++)
    {
        arr[i] = x + y;
    }
}
is actually optimized to be somewhat like this:
public void Update(int[] arr, int x, int y)
{
    var temp = x + y;
    var length = arr.Length;
    for (var i = 0; i < length; i++)
    {
        arr[i] = temp;
    }
}
This happens in the JIT, i.e. when translating the IL into native machine instructions, so it's not so easy to view (you can check here and here).
I'm not an expert in reading assembly, but here is what I got from running this snippet with BenchmarkDotNet, with my comments showing that the optimization actually took place:
int[] arr = new int[10];
int x = 11;
int y = 19;

public void Update()
{
    for (var i = 0; i < arr.Length; i++)
    {
        arr[i] = x + y;
    }
}
Generated: (the generated assembly listing is not reproduced here)
Because it is a faulty concept, most probably existing because of the rushed implementation of JavaScript. It is a bad approach to coding, one that can mislead even an experienced JavaScript coder about the scope of a variable.
Hoisting has a potentially unnecessary cost in work that the compiler has to perform. For example, if a variable declaration is never even reached, because various code-control decisions returned from the function first, then the processor does not need to waste time pushing an undefined null-reference variable onto the stack memory and then popping it off as part of the method's clean-up operations.
Also, remember that JavaScript has "variable hoisting" and "function hoisting" (among others), which are treated differently. Function hoisting wouldn't make sense in C#, since it is not a top-down interpreted language: once the code is compiled, the method might not ever be called. In JavaScript, however, "self-invoking" functions are evaluated immediately as the interpreter parses them.
I doubt that it was an arbitrary design decision: Not only is hoisting inefficient for C#, but it just wouldn't make sense for the way that C# works.
I quite often write code that copies member variables to a local stack variable in the belief that it will improve performance by removing the pointer dereference that has to take place whenever accessing member variables.
Is this valid?
For example
public class Manager {
    private readonly Constraint[] mConstraints;

    public void DoSomethingPossiblyFaster()
    {
        var constraints = mConstraints;
        for (var i = 0; i < constraints.Length; i++)
        {
            var constraint = constraints[i];
            // Do something with it
        }
    }

    public void DoSomethingPossiblySlower()
    {
        for (var i = 0; i < mConstraints.Length; i++)
        {
            var constraint = mConstraints[i];
            // Do something with it
        }
    }
}
My thinking is that DoSomethingPossiblyFaster is actually faster than DoSomethingPossiblySlower.
I know this is pretty much a micro optimisation, but it would be useful to have a definitive answer.
Edit
Just to add a little bit of background around this. Our application has to process a lot of data coming from telecom networks, and this method is likely to be called about 1 billion times a day for some of our servers. My view is that every little helps, and sometimes all I am trying to do is give the compiler a few hints.
Which is more readable? That should usually be your primary motivating factor. Do you even need to use a for loop instead of foreach?
As mConstraints is readonly I'd potentially expect the JIT compiler to do this for you - but really, what are you doing in the loop? The chances of this being significant are pretty small. I'd almost always pick the second approach simply for readability - and I'd prefer foreach where possible. Whether the JIT compiler optimizes this case will very much depend on the JIT itself - which may vary between versions, architectures, and even how large the method is or other factors. There can be no "definitive" answer here, as it's always possible that an alternative JIT will optimize differently.
If you think you're in a corner case where this really matters, you should benchmark it - thoroughly, with as realistic data as possible. Only then should you change your code away from the most readable form. If you're "quite often" writing code like this, it seems unlikely that you're doing yourself any favours.
Even if the readability difference is relatively small, I'd say it's still present and significant - whereas I'd certainly expect the performance difference to be negligible.
If the compiler/JIT isn't already doing this or a similar optimization for you (this is a big if), then DoSomethingPossiblyFaster should be faster than DoSomethingPossiblySlower. The best way to explain why is to look at a rough translation of the C# code to straight C.
When a non-static member function is called, a hidden pointer to this is passed into the function. You'd have roughly the following, ignoring virtual function dispatch since it's irrelevant to the question (or equivalently making Manager sealed for simplicity):
struct Manager {
    Constraint* mConstraints;
    int mLength;
};

void DoSomethingPossiblyFaster(Manager* this) {
    Constraint* constraints = this->mConstraints;
    int length = this->mLength;
    for (int i = 0; i < length; i++)
    {
        Constraint constraint = constraints[i];
        // Do something with it
    }
}

void DoSomethingPossiblySlower(Manager* this)
{
    for (int i = 0; i < this->mLength; i++)
    {
        Constraint constraint = (this->mConstraints)[i];
        // Do something with it
    }
}
The difference is that in DoSomethingPossiblyFaster, mConstraints lives on the stack and access only requires one layer of pointer indirection, since it's at a fixed offset from the stack pointer. In DoSomethingPossiblySlower, if the compiler misses the optimization opportunity, there's an extra pointer indirection. The compiler has to read a fixed offset from the stack pointer to access this and then read a fixed offset from this to get mConstraints.
There are two possible optimizations that could negate this hit:
The compiler could do exactly what you did manually and cache mConstraints on the stack.
The compiler could store this in a register so that it doesn't need to fetch it from the stack on every loop iteration before dereferencing it. This means that fetching mConstraints from this or from the stack is basically the same operation: A single dereference of a fixed offset from a pointer that's already in a register.
You know the response you will get, right? "Time it."
There is probably not a definitive answer. First, the compiler might do the optimization for you. Second, even if it doesn't, indirect addressing at the assembly level may not be significantly slower. Third, it depends on the cost of making the local copy, compared to the number of loop iterations. Then there are caching effects to consider.
I love to optimize, but this is one place I would definitely say wait until you have a problem, then experiment. This is a possible optimization that can be added when needed, not one of those optimizations that needs to be planned up front to avoid a massive ripple effect later.
Edit: (towards a definitive answer)
Compiling both functions in release mode and examining the IL with ILDasm shows that in both places where the "PossiblyFaster" function uses the local variable, it has one less instruction:
ldloc.0
vs.
ldarg.0; ldfld class Constraint[] Manager::mConstraints
Of course, this is still one level removed from the machine code - you don't know what the JIT compiler will do for you. But it is likely that "PossiblyFaster" is marginally faster.
However, I still don't recommend adding the extra variable until you are sure this function is the most expensive thing in your system.
I've profiled this and came up with a bunch of interesting results that are probably only valid for my specific example, but I thought they would be worth noting here.
The fastest is X86 release mode. That runs one iteration of my test in 7.1 seconds, whereas the equivalent X64 code takes 8.6 seconds. This was running 5 iterations, each iteration processing the loop 19.2 million times.
The fastest approach for the loop was:
foreach (var constraint in mConstraints)
{
    ... do stuff ...
}
The second fastest approach, which massively surprised me, was the following:
for (var i = 0; i < mConstraints.Length; i++)
{
    var constraint = mConstraints[i];
    ... do stuff ...
}
I guess this was because mConstraints was stored in a register for the loop.
This slowed down when I removed the readonly modifier from mConstraints.
So my summary from this is that being readable in this situation does give you performance as well.