Why isn't compiler optimizing away this code - c#

I have a code using third-party-tool iterating over a collection of points.
for (int i = 0; i < pcoll.PointCount; i++) { /* ... */ }
When doing profiling via dotTrace I noticed that the PointCount-proerty is accessed every iteration (see picture above)
.
I expected that the value for this property is optimized away by the compiler but obviously that doesn't happen. Maybe this is actually a problem within the COM-based 3rd-party lib or also within dotTrace self when collecting the information.
I'm not sure if this topic wouldn't fit better to Gis.StackExchange. However maybe someone has any idea under which circumstances optimzation won't take place or how it might happen.

Simply put, how is the compiler to know whether pcoll.PointCount will change between invocations? It can't safely make the assumption that the value will remain unchanged, so it can't optimise this code by caching the value of the first call to pcoll.PointCount.

It may have changed in the meantime.
Indeed, one of the reasons to test i < pcoll.PointCount every iteration rather than just using foreach(var point in pcoll) is precisely because you think the collection might change in the meantime, and enumerators don't guarantee to cope with changes to the collection they enumerate.
This differs from, for example, an array accessed through a local variable, because the only way the Length of an array accessed through a local variable can change, is if the change is made locally.
Even there though, it's worth remembering that the compiler often skips some obvious optimisations because it's known that the jitter makes the same optimisation too.

The expected optimization is true for fields. But property has setter/getter (accessing property is in fact calling them as methods), so compiler will have hard time to try to optimize it.
To fix, make it a field or read it once
var max = pcoll.PointCount;
for (int i = 0; i < max; i++) { /* ... */ }

Related

What is the benefit of using a local variable?

I keep seeing examples online, where there is a property of an element within a method that is copied to a local variable before use. For example, something like this (from Microsoft's StackPanel source code):
UIElementCollection children = arrangeElement.InternalChildren;
...
for (int i = 0, count = children.Count; i < count; ++i)
{
UIElement child = (UIElement)children[i];
if (child == null) { continue; }
...
}
Can anyone explain to me what the benefit of doing that is (if there is one), rather than accessing the property directly each time, like this?:
for (int i = 0, count = arrangeElement.InternalChildren.Count; i < count; ++i)
{
UIElement child = (UIElement)arrangeElement.InternalChildren[i];
if (child == null) { continue; }
...
}
Clearly, it saves a few characters on the screen, but that's not much of a reason to do this. Also, I understand why we might want to do this with a long running method, as a form of caching:
double value = GetValueFromLongRunningMethod();
...
for (int i = 0; i < someCollection.Count; i++) DoSomethingWith(value);
But I see this done with properties a lot and wonder why. Here's another commonly found example from the internet to do with virtualization:
IItemContainerGenerator generator = this.ItemContainerGenerator;
GeneratorPosition position = generator.GeneratorPositionFromIndex(firstVisibleItemIndex);
Why do that instead of this?:
GeneratorPosition position =
this.ItemContainerGenerator.GeneratorPositionFromIndex(firstVisibleItemIndex);
Finally, if this is done for the same reason that we might cache the result of a long running method, then how are we supposed to know which properties need to be accessed in this way?
Firstly, it avoids calling .InternalChildren lots of times. This could be a small but noticeable reduction of virtual calls (since it is used in a loop), but in some cases it might be much more significant. In some cases, a property that returns a collection or array might allocate every time it is called; DataRow.ItemArray is a classic example of this - so it is actively harmful to call it each time. An additional consideration is that even if it returns the same array each time it is called, there is JIT magic that happens to elide bounds checking, but it'll only work if the JIT can see that you are iterating a single array for the entire duration. If you stick a property accessor in the middle: this won't be obvious and the bounds check removal won't happen. It also might not happen if you've manually hoisted the upper bound!
Side note: if it isn't an array, then foreach would probably usually be preferable, and there would not be any advantage to introducing a local, due to how foreach works internally.
Note: since you're using .Count vs .Length, this definitely isn't an array, and you should probably simplify to:
foreach(UIElement child = in arrangeElement.InternalChildren) {...}
or
foreach(var child = in arrangeElement.InternalChildren) {...}
Not only does this remove this question completely, but it means that the type's own iterator (which might be an optimized struct iterator, or might be a simple IEnumerable<T> class, such as a compiler-generated iterator block) can be used. This usually has more direct access to the internals, and thus bypasses a few indirections and API checks that indexers require.
It might be fruitful in some cases like when you have to
debug some piece of code and you need to instantly see the value of variable
do a few operations at a time with an object, which requires casting - as result you cast it once
and sometimes, when you use value type objects this kind of making a local copy gives you an opportunity to not change the value of class' property
Why do that instead of this?:
GeneratorPosition position =
this.ItemContainerGenerator.GeneratorPositionFromIndex(firstVisibleItemIndex);
Let's get very abstract about this:
We get a generator. That apparently is this.ItemContainerGenerator for now, but that could change.
We use it. Only once here, but usually in multiple statements.
When we later decide to get that generator elsewhere, the usage should stay the same.
The example is too small to make this convincing, but there is some kind of logic to be discerned here.

Compiler optimization of properties that remain static for the duration of a loop

I was reading Improving .NET Application Performance and Scalability. The section titled Avoid Repetitive Field or Property Access contains a guideline:
If you use data that is static for the duration of the loop, obtain it
before the loop instead of repeatedly accessing a field or property.
The following code is given as an example of this:
for (int item = 0; item < Customer.Orders.Count; item++)
{
CalculateTax(Customer.State, Customer.Zip, Customer.Orders[item]);
}
becomes
string state = Customer.State;
string zip = Customer.Zip;
int count = Customers.Orders.Count;
for (int item = 0; item < count; item++)
{
CalculateTax(state, zip, Customer.Orders[item]);
}
The article states:
Note that if these are fields, it may be possible for the compiler to
do this optimization automatically. If they are properties, it is much
less likely. If the properties are virtual, it cannot be done
automatically.
Why is it "much less likely" for properties to be optimized by the compiler in this manner, and when can one expect for a particular property to be or not to be optimized? I would assume that properties where additional operations are performed in the accessors are harder for the compiler to optimize, and that those that only modify a backing field are more likely to be optimized, but would like some more concrete rules. Are auto-implemented properties always optimized?
It requires the jitter to apply two optimizations:
First the property getter method must be inlined so it turns into the equivalent of a field access. That tends to work when the getter is small and does not throw exceptions. This is necessary so the optimizer can be sure that the getter does not rely on state that can be affected by other code.
Note how the hand-optimized code would be wrong if, say, the Customer.Orders[] indexer would alter the Customer.State property. Lazy code like this is pretty unlikely of course but it's not like this has never been done :) The optimizer has to be sure.
Secondly, the field access code has to be hoisted out of the loop body. An optimization called "invariant code motion". Works on simple property getter code when the jitter can prove that the statements inside the loop body don't affect the value.
The jitter optimizer implements it but it is not stellar at it. In this particular case it is pretty likely that it will give up when it cannot inline the CalculateTax() method. A native compiler optimizes it much more aggressively, it can afford to burn the memory and analysis time on it. The jitter optimizer must meet a pretty hard deadline to avoid pauses.
Do keep the constraints of the optimizer in mind when you do this yourself. Pretty darn ugly bug of course if these methods do have side-effects that you did not count on. And only do this when the profiler told you that this code is on the hot path, the typical ~10% of your code that actually affects the execution time. Low odds here, the dbase query to get customer/order data is going to orders of magnitude more expensive than calculating tax. Luckily code transforms like this also tend to make code more readable so you usually get it for free. YMMV.
A backgrounder on jitter optimizations is here.
Why is it "much less likely" for properties to be optimized by the compiler in this manner, and when can one expect for a particular property to be or not to be optimized?
Properties are not always just wrappers for a field. If there is any degree of logic in a property, it becomes significantly more difficult for a compiler to prove that it is correct to re-use the value it first got when the loop began.
As an extreme example, consider
private Random rnd = new Random();
public int MyProperty
{
get { return rnd.Next(); }
}

In C#, does copying a member variable to a local stack variable improve performance?

I quite often write code that copies member variables to a local stack variable in the belief that it will improve performance by removing the pointer dereference that has to take place whenever accessing member variables.
Is this valid?
For example
public class Manager {
private readonly Constraint[] mConstraints;
public void DoSomethingPossiblyFaster()
{
var constraints = mConstraints;
for (var i = 0; i < constraints.Length; i++)
{
var constraint = constraints[i];
// Do something with it
}
}
public void DoSomethingPossiblySlower()
{
for (var i = 0; i < mConstraints.Length; i++)
{
var constraint = mConstraints[i];
// Do something with it
}
}
}
My thinking is that DoSomethingPossiblyFaster is actually faster than DoSomethingPossiblySlower.
I know this is pretty much a micro optimisation, but it would be useful to have a definitive answer.
Edit
Just to add a little bit of background around this. Our application has to process a lot of data coming from telecom networks, and this method is likely to be called about 1 billion times a day for some of our servers. My view is that every little helps, and sometimes all I am trying to do is give the compiler a few hints.
Which is more readable? That should usually be your primary motivating factor. Do you even need to use a for loop instead of foreach?
As mConstraints is readonly I'd potentially expect the JIT compiler to do this for you - but really, what are you doing in the loop? The chances of this being significant are pretty small. I'd almost always pick the second approach simply for readability - and I'd prefer foreach where possible. Whether the JIT compiler optimizes this case will very much depend on the JIT itself - which may vary between versions, architectures, and even how large the method is or other factors. There can be no "definitive" answer here, as it's always possible that an alternative JIT will optimize differently.
If you think you're in a corner case where this really matters, you should benchmark it - thoroughly, with as realistic data as possible. Only then should you change your code away from the most readable form. If you're "quite often" writing code like this, it seems unlikely that you're doing yourself any favours.
Even if the readability difference is relatively small, I'd say it's still present and significant - whereas I'd certainly expect the performance difference to be negligible.
If the compiler/JIT isn't already doing this or a similar optimization for you (this is a big if), then DoSomethingPossiblyFaster should be faster than DoSomethingPossiblySlower. The best way to explain why is to look at a rough translation of the C# code to straight C.
When a non-static member function is called, a hidden pointer to this is passed into the function. You'd have roughly the following, ignoring virtual function dispatch since it's irrelevant to the question (or equivalently making Manager sealed for simplicity):
struct Manager {
Constraint* mConstraints;
int mLength;
}
void DoSomethingPossiblyFaster(Manager* this) {
Constraint* constraints = this->mConstraints;
int length = this->mLength;
for (int i = 0; i < length; i++)
{
Constraint constraint = constraints[i];
// Do something with it
}
}
void DoSomethingPossiblySlower()
{
for (int i = 0; i < this->mLength; i++)
{
Constraint constraint = (this->mConstraints)[i];
// Do something with it
}
}
The difference is that in DoSomethingPossiblyFaster, mConstraints lives on the stack and access only requires one layer of pointer indirection, since it's at a fixed offset from the stack pointer. In DoSomethingPossiblySlower, if the compiler misses the optimization opportunity, there's an extra pointer indirection. The compiler has to read a fixed offset from the stack pointer to access this and then read a fixed offset from this to get mConstraints.
There are two possible optimizations that could negate this hit:
The compiler could do exactly what you did manually and cache mConstraints on the stack.
The compiler could store this in a register so that it doesn't need to fetch it from the stack on every loop iteration before dereferencing it. This means that fetching mConstraints from this or from the stack is basically the same operation: A single dereference of a fixed offset from a pointer that's already in a register.
You know the response you will get, right? "Time it."
There is probably not a definitive answer. First, the compiler might do the optimization for you. Second, even if it doesn't, indirect addressing at the assembly level may not be significantly slower. Third, it depends on the cost of making the local copy, compared to the number of loop iterations. Then there are caching effects to consider.
I love to optimize, but this is one place I would definitely say wait until you have a problem, then experiment. This is a possible optimization that can be added when needed, not one of those optimizations that needs to be planned up front to avoid a massive ripple effect later.
Edit: (towards a definitive answer)
Compiling both functions in release mode and examining the IL with IL Dasm shows that in both places the "PossiblyFaster" function uses the local variable, it has one less instruction
ldloc.0 vs
ldarg.0; ldfld class Constraint[] Manager::mConstraints
Of course, this is still one level removed from the machine code - you don't know what the JIT compiler will do for you. But it is likely that "PossiblyFaster" is marginally faster.
However, I still don't recommend adding the extra variable until you are sure this function is the most expensive thing in your system.
I've profiled this and came up with a bunch of interesting results that are probably only valid for my specific example, but I thought would be worth while noting here.
The fastest is X86 release mode. That runs one iteration of my test in 7.1 seconds, whereas the equivalent X64 code takes 8.6 seconds. This was running 5 iterations, each iteration processing the loop 19.2 million times.
The fastest approach for the loop was:
foreach (var constraint in mConstraints)
{
... do stuff ...
}
The second fastest approach, which massively surprised me was the following
for (var i = 0; i < mConstraints.Length; i++)
{
var constraint = mConstraints[i];
... do stuff ...
}
I guess this was because mConstraints was stored in a register for the loop.
This slowed down when I removed the readonly option for mConstraints.
So, my summary from this is that being readable in this situation does give performance as well.

Does the c# compiler optimizes Count properties?

List<int> list = ...
for(int i = 0; i < list.Count; ++i)
{
...
}
So does the compiler know the list.Count does not have to be called each iteration?
Are you sure about that?
List<int> list = new List<int> { 0 };
for (int i = 0; i < list.Count; ++i)
{
if (i < 100)
{
list.Add(i + 1);
}
}
If the compiler cached the Count property above, the contents of list would be 0 and 1. If it did not, the contents would be the integers from 0 to 100.
Now, that might seem like a contrived example to you; but what about this one?
List<int> list = new List<int>();
int i = 0;
while (list.Count <= 100)
{
list.Add(i++);
}
It may seem as if these two code snippets are completely different, but that's only because of the way we tend to think about for loops versus while loops. In either case, the value of a variable is checked on every iteration. And in either case, that value very well could change.
Typically it's not safe to assume the compiler optimizes something when the behavior between "optimized" and "non-optimized" versions of the same code is actually different.
The C# compiler does not do any optimizations like this. The JIT compiler, however, optimizes this for arrays, I believe (which are not resizable), but not for lists.
A List's count property can change within the loop structure, so it would be an incorrect optimization.
It's worth noting, as nobody else has mentioned it, that there is no knowing from looking at a loop like this what the "Count" property will actually do, or what side effects it may have.
Consider the following cases:
A third party implementation of a property called "Count" could execute any code it wished to. e.g. return a Random number for all we know. With List we can be a bit more confident about how it will operate, but how is the JIT to tell these implementations apart?
Any method call within the loop could potentially alter the return value of Count (not just a straight "Add" directly on the collection, but a user method that is called in the loop might also party on the collection)
Any other thread that happens to be executing concurrently could also change the Count value.
The JIT just can't "know" that Count is constant.
However, the JIT compiler can make the code run much more efficiently by inlining the implementation of the Count property (as long as it is a trivial implementation). In your example it may well be inlined down to a simple test of a variable value, avoiding the overhead of a function call on each iteration, and thus making the final code nice and fast. (Note: I don't know if the JIT will do this, just that it could. I don't really care - see the last sentence of my answer to find out why)
But even with inlining, the value may still be changed between iterations of the loop, so it would still need to be read from RAM for each comparison. If you were to copy Count into a local variable and the JIT could determine by looking at the code in the loop that the local variable will remain constant for the loop's lifetime, then it may be able to further optimise it (e.g. by holding the constant value in a register rather than having to read it from RAM on each iteration). So if you (as a programmer) know that Count will be constant for the lifetime of the loop, you may be able to help the JIT by caching Count in a local variable. This gives the JIT the best chance of optimising the loop. (But there are no guarantees that the JIT will actually apply this optimisation, so it may make no difference to the execution times to manually "optimise" this way. You also risk things going wrong if your assumption (that Count is constant) is incorrect. Or your code may break if another programmer edits the contents of the loop so that Count is no longer constant, and he doesn't spot your cleverness)
So the moral of the story is: The JIT can make a pretty good stab at optimising this case by inlining. Even if it doesn't do this now, it may do it with the next C# version. You might not gain any advantage by manually "optmising" the code, and you risk changing its behaviour and thus breaking it, or at least making future maintenance of your code more risky, or possibly losing out on future JIT enhancements. So the best approach is to just write it the way you have, and optimise it when your profiler tells you that the loop is your performance bottleneck.
Hence, IMHO it's interesting to consider/understand cases like this, but ultimately you don't actually need to know. A little bit of knowledge can be a dangerous thing. Just let the JIT do its thing, and then profile the result to see if it needs improving.
If you take a look at the IL generated for Dan Tao's example you'll see a line like this at the condition of the loop:
callvirt instance int32 [mscorlib]System.Collections.Generic.List`1<int32>::get_Count()
This is undeniable proof that Count (i.e. get_Count()) is called for every iteration of the loop.
For all the other commenters who say that the 'Count' property could change in a loop body: JIT optimizations let you take advantage of the actual code that's running, not the worst-case of what might happen. In general, the Count could change. But it doesn't in all code.
So in the poster's example (which might not have any Count-changing), is it unreasonable for the JIT to detect that the code in the loop doesn't change whatever internal variable List uses to hold its length? If it detects that list.Count is constant, wouldn't it lift that variable access out of the loop body?
I don't know if the JIT does this or not. But I am not so quick to brush this problem off as trivially "never."
No, it doesn't. Because condition is calculated on each step. It can be more complex than just comparsion with count, and any boolean expression is allowed:
for(int i = 0; new Random().NextDouble() < .5d; i++)
Console.WriteLine(i);
http://msdn.microsoft.com/en-us/library/aa664753(VS.71).aspx
It depends on the particular implementation of Count; I've never noticed any performance issues with using the Count property on a List so I assume it's ok.
In this case you can save yourself some typing with a foreach.
List<int> list = new List<int>(){0};
foreach (int item in list)
{
// ...
}

Why doesn't the C# compiler stop properties from referring to themselves?

If I do this I get a System.StackOverflowException:
private string abc = "";
public string Abc
{
get
{
return Abc; // Note the mistaken capitalization
}
}
I understand why -- the property is referencing itself, leading to an infinite loop. (See previous questions here and here).
What I'm wondering (and what I didn't see answered in those previous questions) is why doesn't the C# compiler catch this mistake? It checks for some other kinds of circular reference (classes inheriting from themselves, etc.), right? Is it just that this mistake wasn't common enough to be worth checking for? Or is there some situation I'm not thinking of, when you'd want a property to actually reference itself in this way?
You can see the "official" reason in the last comment here.
Posted by Microsoft on 14/11/2008 at
19:52
Thanks for the suggestion for
Visual Studio!
You are right that we could easily
detect property recursion, but we
can't guarantee that there is nothing
useful being accomplished by the
recursion. The body of the property
could set other fields on your object
which change the behavior of the next
recursion, could change its behavior
based on user input from the console,
or could even behave differently based
on random values. In these cases, a
self-recursive property could indeed
terminate the recursion, but we have
no way to determine if that's the case
at compile-time (without solving the
halting problem!).
For the reasons above (and the
breaking change it would take to
disallow this), we wouldn't be able to
prohibit self-recursive properties.
Alex Turner
Program Manager
Visual C# Compiler
Another point in addition to Alex's explanation is that we try to give warnings for code which does something that you probably didn't intend, such that you could accidentally ship with the bug.
In this particular case, how much time would the warning actually save you? A single test run. You'll find this bug the moment you test the code, because it always immediately crashes and dies horribly. The warning wouldn't actually buy you much of a benefit here. The likelihood that there is some subtle bug in a recursive property evaluation is low.
By contrast, we do give a warning if you do something like this:
int customerId;
...
this.customerId= this.customerId;
There's no horrible crash-and-die, and the code is valid code; it assigns a value to a field. But since this is nonsensical code, you probably didn't mean to do it. Since it's not going to die horribly, we give a warning that there's something here that you probably didn't intend and might not otherwise discover via a crash.
Property referring to itself does not always lead to infinite recursion and stack overflow. For example, this works fine:
int count = 0;
public string Abc
{
count++;
if (count < 1) return Abc;
return "Foo";
}
Above is a dummy example, but I'm sure one could come up with useful recursive code that is similar. Compiler cannot determine if infinite recursion will happen (halting problem).
Generating a warning in the simple case would be helpful.
They probably considered it would unnecessary complicate the compiler without any real gain.
You will discover this typo easily the first time you call this property.
First of all, you'll get a warning for unused variable abc.
Second, there is nothing bad in teh recursion, provided that it's not endless recursion. For example, the code might adjust some inner variables and than call the same getter recursively. There is however for the compiler no easy way at all to prove that some recursion is endless or not (the task is at least NP). The compiler could catch some easy cases, but then the consumers would be surprised that the more complicated cases get through the compiler's checks.
The other cases cases that it checks for (except recursive constructor) are invalid IL.
In addition, all of those cases, even recursive constructors) are guarenteed to fail.
However, it is possible, albeit unlikely, to intentionally create a useful recursive property (using if statements).

Categories

Resources