I've been doing a lot of research on C# optimization for a game that I'm building with XNA, and I still don't quite understand whether local variables or instance variables give better performance when constantly being updated and used.
According to http://www.dotnetperls.com/optimization , you should avoid parameters and local variables, meaning instance variables are the best option in terms of performance.
But a while ago, I read on another StackOverflow post (I can't seem to find where it was) that local variables are stored in a part of memory that is far quicker to access, and that every time an instance variable is set, the previous value has to be erased as a tedious extra step before a new value can be assigned.
I know that design-wise, it might break encapsulation to use instance variables in that kind of situation, but I'm strictly curious about performance. Currently in my game, I pass around local variables to 3 out of 7 methods in a class, but I could easily promote the variables to instance variables and be able to entirely avoid parameter passing and local variables.
So which would be better?
Are your variables reference (class, or string) or value (struct) types?
For reference types there's no meaningful difference between passing them as a method argument and holding them on an object instance. In the first case, when entering the function, the argument will (for functions with a small argument count) end up in a register. In the second case the reference lives at an offset within the object that 'this' points to in memory. Either way it's a quick grab of a memory address followed by fetching the associated data out of memory (the fetch is the expensive part).
For value types the above is true for certain types (integers or floats that can fit in your CPU's registers). For those specific things it's probably a little cheaper to pass by value than to extract them off 'this'. For other value types (DateTime, or structs you might make yourself, or any struct with multiple members) the data is going to be too large to pass through a register, so this no longer matters.
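To make those cases concrete, here's a minimal sketch (the types are hypothetical, purely for illustration):

// Hypothetical types, purely to illustrate the three cases above.
class Enemy { public int Health; }          // reference type

struct BigStats                             // value type too large for a register
{
    public double Attack, Defense, Speed, Luck;
}

static void Damage(Enemy e, int amount)     // 'e' is one pointer-sized reference
{
    e.Health -= amount;                     // the expensive part is the memory fetch
}

static double Score(BigStats s)             // 's' is copied in full (32 bytes here)
{
    return s.Attack + s.Defense + s.Speed + s.Luck;
}

static int Add(int a, int b)                // small value types travel in registers
{
    return a + b;
}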
It's pretty unlikely, though, that any of this matters for the performance of your application (even a game). Most common .NET performance problems (that are not simply inefficient algorithms) are going to, in some form, come from garbage generation. This can manifest itself through accidental boxing, poor use of string building, or poor object lifetime management (your objects have lifespans that are neither very short nor very long/permanent).
Personally, I wouldn't be looking at this as the culprit for performance issues (unless you are constantly passing large structs). My naive understanding is that GC pressure is the usual consideration with XNA games, so the main thing is to be frugal with your object instances.
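To illustrate the kinds of garbage generation mentioned above -- accidental boxing and naive string building both look innocent but allocate on every pass:

// Accidental boxing: assigning a value type to 'object' (or passing it to an
// API that takes 'object') allocates a box on the heap.
int frame = 42;
object boxed = frame;               // hidden heap allocation

// Poor string building: each '+' allocates a brand new string...
string log = "";
for (int i = 0; i < 1000; i++)
    log += i;                       // ~1000 intermediate strings for the GC

// ...whereas a StringBuilder reuses one growing buffer.
var sb = new System.Text.StringBuilder();
for (int i = 0; i < 1000; i++)
    sb.Append(i);
string log2 = sb.ToString();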
If the variable is method-local, the value itself or the reference (when a reference type) will be located on the stack. If you promote those to class member variables they will be located in the class's memory on the heap.
Method calls would technically become faster as you are no longer copying references or values on the call (because presumably you can remove the parameters from the method if the method is also local to the class).
I'm not sure about the relative performance, but it seems to me that if you need the value to persist, it makes some sense for it to live in the class...
To me it seems like any potential gain from doing one in favour of the other is outweighed by the subtle differences between the two - making them roughly equivalent, or so small a difference as to not be worth caring about.
Of course, all this stands to be corrected in the face of hard numbers from performance profiling.
Not passing arguments will be slightly faster, and not initialising local objects (if they are objects) will be faster too.
What you read in the two articles is not contradictory: one mentions that passing arguments costs time, and the other mentions that initialising (local) objects can cost time as well.
About allocating new objects, one thing you can do is reuse objects rather than discarding them. For example, some time ago I had to write code for traders that computed the prices of a few products in real time, in C/C++ and C#. I obtained a major performance boost by not re-creating the system of equations from scratch, but merely updating the system incrementally from the previous one.
This avoided allocating memory for new objects and initialising them, and often the system would be nearly the same, so I only had to modify tiny bits to update it.
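In sketch form (the names here are hypothetical, not the real trading code), the idea was roughly:

// Hypothetical sketch of the reuse idea: one long-lived system is patched in
// place on each tick instead of being rebuilt (and reallocated) from scratch.
class EquationSystem
{
    private readonly double[] _coefficients = new double[1024];

    public void UpdateCoefficient(int index, double value)
    {
        _coefficients[index] = value;   // touch only the bits that changed
    }

    public double Solve()
    {
        // ... solve using the current coefficients ...
        return 0;
    }
}

// One instance lives for the whole session; each price tick just patches it:
//   system.UpdateCoefficient(17, newPrice);   // no per-tick allocations
//   var price = system.Solve();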
Usually, before doing any optimisation, you want to make sure that you are optimising something that will significantly impact the overall performance.
Related
I'm asking for a project in C#, but I assume this question applies to other languages as well. I've heard that massive object creation and destruction causes massive overhead and performance issues. I'm wondering if I can get around this by simply using structs instead of objects.
"Making struct instead of object" - as you term it (I suppose what you mean by object is class) would most likely be of little help since creating struct instance, due to struct's nature, will require you to refer it by value rather than by by reference - and this may (not always) make your memory use heavier
That being said, what you probably need is Flyweight Design Pattern
From https://sourcemaking.com/design_patterns/flyweight:
Flyweight Design Pattern
Intent
Use sharing to support large numbers of fine-grained objects efficiently.
The Motif GUI strategy of replacing heavy-weight widgets with light-weight gadgets.
Problem
Designing objects down to the lowest levels of system "granularity" provides optimal flexibility, but can be unacceptably expensive in terms of performance and memory usage.
Discussion
The Flyweight pattern describes how to share objects to allow their use at fine granularities without prohibitive cost. Each "flyweight" object is divided into two pieces: the state-dependent (extrinsic) part, and the state-independent (intrinsic) part. Intrinsic state is stored (shared) in the Flyweight object. Extrinsic state is stored or computed by client objects, and passed to the Flyweight when its operations are invoked.
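A minimal C# sketch of the pattern as described (a hypothetical sprite example: the shared texture data is the intrinsic state; the position passed to Draw is the extrinsic state):

// Flyweight sketch: thousands of on-screen sprites share a handful of
// Sprite instances; only the (x, y) position varies per use.
class Sprite
{
    public string TextureName { get; }
    public Sprite(string textureName) { TextureName = textureName; }

    public void Draw(float x, float y)      // extrinsic state supplied by the caller
    {
        // render TextureName at (x, y)
    }
}

static class SpriteFactory
{
    private static readonly System.Collections.Generic.Dictionary<string, Sprite>
        _cache = new System.Collections.Generic.Dictionary<string, Sprite>();

    public static Sprite Get(string textureName)
    {
        Sprite sprite;
        if (!_cache.TryGetValue(textureName, out sprite))
            _cache[textureName] = sprite = new Sprite(textureName);
        return sprite;                      // shared: no new allocation per use
    }
}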
Here are some facts that you should know about struct and class in C#:
A struct in C# is faster to create than a class since, as a local variable, it's allocated on the stack and not on the heap
struct is a value type, class is a reference type. So working with a reference type (passing it as a parameter, copying it, ...) is much faster than doing the same with a large value type. See Difference between struct and class
struct fields are faster to access than class fields since they are allocated on the stack
Here are some facts about how the GC works in .NET:
You have no control over when the GC is triggered by the CLR; it can interrupt your program at any time (there are some options you can use to tell the CLR that you are running a sensitive part of the code, but they don't prevent the GC from running if memory is needed - see GC Latency Modes)
You have no control over how long the GC takes to do its work
When the GC is doing a full collection, it freezes all your program's threads (depending on whether you are in gcConcurrent or gcServer mode - see gcServer mode).
Knowing all of that, and to keep it short: if you don't want your program to suffer from the GC's work, use reference types for the objects that will live longest in your program, and use value types for objects that will be used for a very short time and in a very localized scope.
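For reference, the latency-mode hint mentioned above looks roughly like this; it's a request, not a guarantee, and RunLatencySensitiveSection is just a hypothetical placeholder for your hot path:

using System.Runtime;

GCLatencyMode previous = GCSettings.LatencyMode;
try
{
    GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
    RunLatencySensitiveSection();        // hypothetical placeholder for your hot path
}
finally
{
    GCSettings.LatencyMode = previous;   // restore the default behaviour
}

static void RunLatencySensitiveSection() { /* game loop, tight trading loop, ... */ }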
Just curious about this. Following are two code snippets for the same function:
void MyFunc1()
{
    int i = 10;
    object obj = null;
    if(something) return;
}
And the other one is...
void MyFunc1()
{
    if(something) return;
    int i = 10;
    object obj = null;
}
Now, does the second one have the benefit of NOT allocating the variables when something is true? Or are the local stack variables (in the current scope) always allocated as soon as the function is called, so that moving the return statement to the top has no effect?
The dotnetperls.com article says: "When you call a method in your C# program, the runtime allocates a separate memory region to store all the local variable slots. This memory is allocated on the stack even if you do not access the variables in the function call."
UPDATED
Here is a comparison of the IL code for these two functions (Func2 refers to the second snippet). It seems the variables in both cases are allocated at the beginning, though in the case of Func2() they are initialized later on. So no benefit as such, I guess.
Peter Duniho's answer is correct. I want to draw attention to the more fundamental problem in your question:
does the second one have the benefit of NOT allocating the variables when something is true?
Why ought that to be a benefit? Your presumption is that allocating the space for a local variable has a cost, that not doing so has a benefit and that this benefit is somehow worth obtaining. Analyzing the actual cost of local variables is very, very difficult; the presumption that there is a clear benefit in avoiding an allocation conditionally is not warranted.
To address your specific question:
The local stack variables (in current scope) are always allocated as soon as the function is called and moving the return statement to the top has no effect?
I can't answer such a complicated question easily. Let's break it down into much simpler questions:
Variables are storage locations. What are the lifetimes of the storage locations associated with local variables?
Storage locations for "ordinary" local variables -- and formal parameters of lambdas, methods, and so on -- have short, predictable lifetimes. None of them live before the method is entered, and none of them live after the method terminates, either normally or exceptionally. The C# language specification clearly calls out that local variable lifetimes are permitted to be shorter at runtime than you might think if doing so does not cause an observable change to a single-threaded program.
Storage locations for "unusual" local variables -- outer variables of lambdas, local variables in iterator blocks, local variables in async methods, and so on -- have lifetimes which are difficult to analyze at compile time or at run time, and are therefore moved to the garbage-collected heap, which uses GC policy to determine the lifetimes of the variables. There is no requirement that such variables ever be cleaned up; their storage lifetime can be extended arbitrarily at the whim of the C# compiler or the runtime.
Can a local which is unused be optimized away entirely?
Yes. If the C# compiler or the runtime can determine that removing the local from the program entirely has no observable effect in a single-threaded program, then it may do so at its whim. Essentially this is reducing its lifetime to zero.
How are storage locations for "ordinary" locals allocated?
This is an implementation detail, but typically there are two techniques. Either space is reserved on the stack, or the local is enregistered.
How does the runtime determine whether a local is enregistered or put on the stack?
This is an implementation detail of the jitter's optimizer. There are many factors, such as:
whether the address of the local could possibly be taken; registers have no address
whether the local is passed as a parameter to another method
whether the local is a parameter of the current method
what the calling conventions are of all the methods involved
the size of the local
and many, many more factors
Suppose we consider only the ordinary locals which are put on the stack. Is it the case that storage locations for all such locals are allocated when a method is entered?
Again, this is an implementation detail, but typically the answer is yes.
So a "stack local" that is used conditionally would not be allocated off the stack conditionally? Rather, its stack location would always be allocated.
Typically, yes.
What are the performance tradeoffs inherent in that decision?
Suppose we have two locals, A and B, and one is used conditionally and the other is used unconditionally. Which is faster:
Add two units to the current stack pointer
Initialize the two new stack slots to zero
or
Add one unit to the current stack pointer
Initialize the new stack slot to zero
If the condition is met, add one unit to the current stack pointer and initialize the new stack slot to zero
Keep in mind that "add one" and "add two" have the same cost.
This scheme is not cheaper if the variable B is unused, and has twice the cost if it is used. That's not a win.
But what about space? The conditional scheme uses either one or two units of stack space but the unconditional scheme uses two regardless.
Correct. Stack space is cheap. Or, more accurately, the million bytes of stack space you get per thread is insanely expensive, and that expense is paid up front, when you allocate the thread. Most programs never use anywhere close to a million bytes of stack space; trying to optimize use of that space is like spending an hour deciding whether to pay $5.01 for a latte vs $5.02 when you have a million dollars in the bank; it's not worth it.
Suppose 100% of the stack-based locals are allocated conditionally. Could the jitter put the addition to the stack pointer after the conditional code?
In theory, yes. Whether the jitter actually makes this optimization -- an optimization which saves literally less than a billionth of a second -- I don't know. Keep in mind that any code the jitter runs to make the decision to save that billionth of a second is code that takes far more than a billionth of a second. Again, it makes no sense to spend hours worrying about pennies; time is money.
And of course, how realistic is it that the billionth of a second you save will be the common path? Most method calls do something, not return immediately.
Also, keep in mind that the stack pointer is going to have to move for all the temporary value slots that aren't enregistered, regardless of whether those slots have names or not. How many scenarios are there where the condition that determines whether or not the method returns itself has no subexpression which touches the stack? Because that's the condition you're actually proposing that gets optimized. This seems like a vanishingly small set of scenarios, in which you get a vanishingly small benefit. If I were writing an optimizer I would spend exactly zero percent of my valuable time on solving this problem, when there are far juicier low-hanging fruit scenarios that I could be optimizing for.
Suppose there are two locals that are each allocated conditionally under different conditions. Are there additional costs imposed by a conditional allocation scheme other than possibly doing two stack pointer moves instead of one or zero?
Yes. In the straightforward scheme where you move the stack pointer two slots and say "stack pointer is A, stack pointer + 1 is B", you now have a consistent-throughout-the-method way to characterize the variables A and B. If you conditionally move the stack pointer then sometimes the stack pointer is A, sometimes it is B, and sometimes it is neither. That greatly complicates all the code that uses A and B.
What if the locals are enregistered?
Then this becomes a problem in register scheduling; I refer you to the extensive literature on this subject. I am far from an expert in it.
The only way to know for sure when this happens for your program, when you run it, is to look at the code the JIT compiler emits when you run your program. None of us can even answer the specific question with authority (well, I guess someone who wrote the CLR could, provided they knew which version of the CLR you're using, and possibly some other details about configuration and your actual program code).
Any allocation on the stack of a local variable is strictly an "implementation detail". And the CLI specification doesn't promise us any specific implementation.
Some locals never wind up on the stack per se, normally due to being stored in a register, but it would be legal for the runtime to use heap space instead, as long as it preserves the normal lifetime semantics of a local variable.
See also Eric Lippert's excellent series The Stack Is An Implementation Detail
Maybe this is just unclear to me, but I've been reading the MSDN docs and trying to understand struct behaviour more deeply.
From msdn
Dealing with Stack:
This will yield performance gains.
and:
Whenever you have a need for a type that will be used often and is mostly just a piece of data, structs might be a good option.
I don't understand, because I'd guess that when I pass a struct as a parameter to a method, the "copy value" process must be slower than the "copy reference" process?
The cost of passing a struct is proportional to its size. If the struct is smaller than a reference or the same size as a reference then passing its value will have the same cost as passing a reference.
If not, then you are correct; copying the struct might be more expensive than copying the reference. That's why the design guidelines say to keep a struct small.
(Note that when you call a method on a struct, the "this" is actually passed as a reference to the variable that contains the struct value; that's how you can write a mutable struct.)
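A quick sketch of that last point:

// 'this' in a struct method is a reference to the caller's variable, which is
// what lets a mutating method like Increment work in place.
struct Counter
{
    public int Value;

    public void Increment() { Value++; }   // mutates the caller's storage
}

// var c = new Counter();
// c.Increment();                 // increments 'c' itself, not a copy
// Console.WriteLine(c.Value);    // prints 1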
There are potential performance gains when using structs, but as you correctly point out, there are potential performance losses as well. Structs are cheap (in both memory and time) to allocate and cheap to deallocate (in time), and cheap to copy if they are small. References are slightly more expensive in both memory and time to allocate, more expensive to deallocate, and cheap to copy. If you have a large number of small structs -- say, a million Point structs -- then it will be cheaper to allocate and deallocate an array with a million structs in it than an array with a million references to a million instances of a Point class.
But if the struct is big, then all that additional copying might be more expensive than the benefit you get from the more efficient allocation and deallocation. You have to look at the whole picture when doing performance analysis; don't make the "struct vs class" decision on the basis of performance without empirical data to back up that decision.
There is much misinformation on the internet, in our own documentation, and in many books, about how memory management works behind the scenes in C#. If you are interested in learning what is myth and what is reality, I recommend reading my series of articles on the subject. Start from the bottom:
http://blogs.msdn.com/b/ericlippert/archive/tags/memory+management/
Another recommendation for structs is that they should be small; not larger than 16 bytes. That way they can be copied with a single instruction, or just a few instructions.
Copying a reasonably small amount of data will be almost as fast as copying a reference, and then it will be faster for the method to access the data as there is no redirection needed.
If the struct is smaller than a pointer (i.e. 32 or 64 bits), it will even be faster to copy the value than to copy a reference.
Even if a structure is a bit larger than a reference, there is still some overhead involved with creating objects. Each object has some overhead and has to be allocated as a separate memory block. A byte as a value type takes up just a single byte, but if you box the byte as an object, it will take up 16 or 24 bytes on the heap, plus another 4 or 8 bytes for the reference.
Anyhow, a decision to use a struct or a class should normally be about what kind of data they represent, and not just about performance. A structure works well for data that represent a single entity, so that you can treat it as a single value.
What you said about the copy process is true: copying a reference takes less time than copying a struct that is larger than a reference. But the reason MSDN suggests that using a struct can give a performance gain is the difference in the time it takes to access the stack versus the heap.
If you need a type that is mostly just a piece of data and is not huge (meaning it does not contain huge arrays, multi-dimensional or otherwise, of value types), then it would be wiser to use a struct rather than a reference type, as access to the stack is much cheaper than to the managed heap.
Along with that, the time taken for allocation and deallocation - in short, the management of the heap - is somewhat time-consuming compared to the stack.
You can have a better understanding of this topic here and here as this has been explained in detail.
Hope it helps.
I need to get three objects out of a function, my instinct is to create a new type to return the three refs. Or if the refs were the same type I could use an array. However pass-by-ref is easier:
private void Mutate_AddNode_GetGenes(ref NeuronGene newNeuronGene, ref ConnectionGene newConnectionGene1, ref ConnectionGene newConnectionGene2)
{
}
There's obviously nothing wrong with this, but I hesitate to use this approach, mostly I think for reasons of aesthetics and psychological bias. Are there actually any good reasons to use one of these approaches over the others? Perhaps a performance issue with creating extra wrapper objects or pushing parameters onto the stack. Note that in my particular case this is CPU-intensive code. CPU cycles matter.
Is there a more elegant C#2 or C#3 approach?
Thanks.
For almost all computing problems, you will not notice the CPU difference. Since your sample code has the word "Gene" in it, you may actually fall into the rare category of code that would notice.
Creating and destroying objects just to wrap other objects would cost a bit of performance (they need to be created and garbage collected after all).
Aesthetically I would not create an object just to group unrelated objects, but if they logically belong together it is perfectly fine to define a containing object.
If you're worried about the performance of a wrapping type (which is a lot cleaner, IMHO), you should use a struct. Current 32-bit implementations of .NET (and the upcoming 64-bit 4.0) support inlining / optimizing away of structs in many cases, so you'd probably see no performance difference whatsoever between a struct and ref arguments.
Worrying about the relative execution speed of those two options is probably a premature optimization. Focus on getting the algorithm correct first, and having clean, maintainable code. When that's done, you can run a profiler on it and optimize the 20% of the code that takes 80% of the CPU time. Even if this method ends up being in that 20%, the difference between the two calling styles is probably to small to register.
So, performance issues aside, I'd probably use a container class. Since this method takes only those three parameters, and (presumably) modifies each one, it sounds like it would make sense to have it as a method of the container class, with three member variables instead of ref parameters.
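A rough sketch of that container-class shape, reusing the question's type names (the gene classes are stubbed here only so the sketch stands alone):

class NeuronGene { /* stub: defined in the question's codebase */ }
class ConnectionGene { /* stub: defined in the question's codebase */ }

// The three ref parameters become members, and the mutation logic moves onto
// the container itself.
class AddNodeMutation
{
    public NeuronGene NewNeuronGene;
    public ConnectionGene NewConnectionGene1;
    public ConnectionGene NewConnectionGene2;

    public void Mutate_AddNode_GetGenes()
    {
        // ... populate the three members ...
    }
}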
This question has been puzzling me for a long time now. I come from a heavy and long C++ background, and since I started programming in C# and dealing with garbage collection I always had the feeling that such 'magic' would come at a cost.
I recently started working on a big MMO project written in Java (server side). My main task is to optimize memory consumption and CPU usage. Hundreds of thousands of messages per second are being sent, and the same number of objects are created as well. After a lot of profiling we discovered that the VM garbage collector was eating a lot of CPU time (due to constant collections) and decided to try to minimize object creation, using pools where applicable and reusing everything we can. This has proven to be a really good optimization so far.
So, from what I've learned, having a garbage collector is awesome, but you can't just pretend it does not exist, and you still need to take care about object creation and what it implies (at least in Java and a big application like this).
So, is this also true for .NET? If it is, to what extent?
I often write pairs of functions like these:
// Combines two envelopes and the result is stored in a new envelope.
public static Envelope Combine( Envelope a, Envelope b )
{
    var envelope = new Envelope( a.Length, 0, 1, 1 );
    Combine( a, b, envelope );
    return envelope;
}
// Combines two envelopes and the result is 'written' to the specified envelope
public static void Combine( Envelope a, Envelope b, Envelope result )
{
    result.Clear();
    ...
}
A second function is provided in case someone has an already made Envelope that may be reused to store the result, but I find this a little odd.
I also sometimes write structs when I'd rather use classes, just because I know there'll be tens of thousands of instances being constantly created and disposed, and this feels really odd to me.
I know that as a .NET developer I shouldn't have to worry about this kind of issue, but my experience with Java and common sense tell me that I should.
Any light and thoughts on this matter would be much appreciated. Thanks in advance.
Yes, it's true of .NET as well. Most of us have the luxury of ignoring the details of memory management, but in your case -- or in cases where high volume is causing memory congestion -- some optimization is called for.
One optimization you might consider for your case -- something I've been thinking about writing an article about, actually -- is the combination of structs and ref for real deterministic disposal.
Since you come from a C++ background, you know that in C++ you can instantiate an object either on the heap (using the new keyword and getting back a pointer) or on the stack (by instantiating it like a primitive type, i.e. MyType myType;). You can pass stack-allocated items by reference to functions and methods by telling the function to accept a reference (using the & keyword before the parameter name in your declaration). Doing this keeps your stack-allocated object in memory for as long as the method in which it was allocated remains in scope; once it goes out of scope, the object is reclaimed, the destructor is called, ba-da-bing, ba-da-boom, Bob's yer Uncle, and all done without pointers.
I used that trick to create some amazingly performant code in my C++ days -- at the expense of a larger stack and the risk of a stack overflow, naturally, but careful analysis managed to keep that risk very minimal.
My point is that you can do the same trick in C# using structs and refs. The tradeoffs? In addition to the risk of a stack overflow if you're not careful or if you use large objects, you are limited to no inheritance, and you tightly couple your code, making it less testable and less maintainable. Additionally, you still have to deal with these issues whenever you use core library calls.
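A small sketch of the struct-plus-ref trick under those constraints (Particle is a hypothetical type):

// The struct lives in the caller's frame; callees receive it by reference, so
// no copy is made and nothing is allocated on the GC heap.
struct Particle
{
    public float X, Y, VelocityX, VelocityY;
}

static void Integrate(ref Particle p, float dt)
{
    p.X += p.VelocityX * dt;
    p.Y += p.VelocityY * dt;
}

// var p = new Particle { VelocityX = 1f };
// Integrate(ref p, 0.016f);   // 'p' is reclaimed when the stack frame unwinds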
Still, it might be worth a look-see in your case.
This is one of those issues where it is really hard to pin down a definitive answer in a way that will help you. The .NET GC is very good at tuning itself to the memory needs of your application. Is it good enough that your application can be coded without you needing to worry about memory management? I don't know.
There are definitely some common-sense things you can do to ensure that you don't hammer the GC. Using value types is definitely one way of accomplishing this but you need to be careful that you don't introduce other issues with poorly-written structs.
For the most part however I would say that the GC will do a good job managing all this stuff for you.
I've seen too many cases where people "optimize" the crap out of their code without much concern for how well it's written or how well it works even. I think the first thought should go towards making code solve the business problem at hand. The code should be well crafted and easily maintainable as well as properly tested.
After all of that, optimization should be considered, if testing indicates it's needed.
Random advice:
Someone mentioned putting dead objects in a queue to be reused instead of letting the GC at them... but be careful, as this means the GC may have more crap to move around when it consolidates the heap, and may not actually help you. Also, the GC is possibly already using techniques like this. Also, I know of at least one project where the engineers tried pooling and it actually hurt performance. It's tough to get a deep intuition about the GC. I'd recommend having a pooled and unpooled setting so you can always measure the perf differences between the two.
Another technique you might use in C# is dropping down to native C++ for key parts that aren't performing well enough... and then use the Dispose pattern in C# or C++/CLI for managed objects which hold unmanaged resources.
Also, be sure when you use value types that you are not using them in ways that implicitly box them and put them on the heap, which might be easy to do coming from Java.
Finally, be sure to find a good memory profiler.
I have the same thought all the time.
The truth is, though we were taught to watch out for unnecessary CPU cycles and memory consumption, the cost of little imperfections in our code is just negligible in practice.
If you are aware of that and keep an eye on it, I believe you are okay writing less-than-perfect code. If you started with .NET/Java and have no prior experience in low-level programming, the chances are you will write very wasteful and inefficient code.
And anyway, as they say, premature optimization is the root of all evil. You can spend hours optimizing one little function, only to find that some other part of the code is the bottleneck. Just keep a balance between keeping it simple and not being stupid about it.
Although the garbage collector is there, bad code remains bad code. So yes, as a .NET developer you should still care about how many objects you create and, more importantly, about writing optimized code.
I have seen a considerable amount of projects get rejected because of this reason in Code Reviews inside our environment, and I strongly believe it is important.
.NET memory management is very good, and being able to programmatically tweak the GC when you need to is a plus.
I like the fact that you can create your own Dispose methods on classes by implementing IDisposable and tailoring them to your needs. This is great for making sure that connections to networks/files/databases are always cleaned up and not leaked. There is also the worry of cleaning up too early, of course.
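A minimal sketch of that shape (LogWriter is a hypothetical example wrapping a file handle):

using System;
using System.IO;

sealed class LogWriter : IDisposable
{
    private readonly StreamWriter _writer = new StreamWriter("game.log");

    public void Write(string message) { _writer.WriteLine(message); }

    public void Dispose() { _writer.Dispose(); }   // deterministic cleanup
}

// using (var log = new LogWriter())
// {
//     log.Write("level loaded");
// }   // Dispose runs here, even if an exception was thrown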
You might consider writing a set of object caches. Instead of creating new instances, you could keep a list of available objects somewhere. It would help you avoid GC pressure.
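In sketch form (the names are hypothetical; and as noted elsewhere in this thread, measure before committing to pooling):

// A minimal free-list cache: dead objects are handed back for reuse instead
// of being left for the garbage collector.
class ObjectPool<T> where T : new()
{
    private readonly System.Collections.Generic.Stack<T> _free =
        new System.Collections.Generic.Stack<T>();

    public T Rent()
    {
        return _free.Count > 0 ? _free.Pop() : new T();
    }

    public void Return(T item)
    {
        _free.Push(item);   // remember to reset the item's state in real code
    }
}

// var pool = new ObjectPool<Message>();   // 'Message' is hypothetical
// var msg = pool.Rent();
// ... use msg ...
// pool.Return(msg);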
I agree with all points said above: the garbage collector is great, but it shouldn't be used as a crutch.
I've sat through many wasted hours in code-reviews debating over finer points of the CLR. The best definitive answer is to develop a culture of performance in your organization and actively profile your application using a tool. Bottlenecks will appear and you address as needed.
I think you answered your own question -- if it becomes a problem, then yes! I don't think this is a .Net vs. Java question, it's really a "should we go to exceptional lengths to avoid doing certain types of work" question. If you need better performance than you have, and after doing some profiling you find that object instantiation or garbage collection is taking tons of time, then that's the time to try some unusual approach (like the pooling you mentioned).
I wish I were a "software legend" and could speak of this in my own voice and breath, but since I'm not, I rely upon SL's for such things.
I think the following blog post by Andrew Hunter on the .NET GC would be helpful:
http://www.simple-talk.com/dotnet/.net-framework/understanding-garbage-collection-in-.net/
Even beyond performance aspects, the semantics of a method which modifies a passed-in mutable object will often be cleaner than those of a method which returns a new mutable object based upon an old one. The statements:
munger.Munge(someThing, otherParams);
someThing = munger.ComputeMungedVersion(someThing, otherParams);
may in some cases behave identically, but while the former does one thing, the latter will do two--equivalent to:
someThing = someThing.Clone(); // Or duplicate it via some other means
munger.Munge(someThing, otherParams);
If someThing is the only reference, anywhere in the universe, to a particular object, then replacing it with a reference to a clone will be a no-op, and so modifying a passed-in object will be equivalent to returning a new one. If, however, someThing identifies an object to which other references exist, the former statement would modify the object identified by all those references, leaving all the references attached to it, while the latter would cause someThing to become "detached".
Depending upon the type of someThing and how it is used, its attachment or detachment may be moot issues. Attachment would be relevant if some object which holds a reference to the object could modify it while other references exist. Attachment is moot if the object will never be modified, or if no references outside someThing itself can possibly exist. If one can show that either of the latter conditions will apply, then replacing someThing with a reference to a new object will be fine. Unless the type of someThing is immutable, however, such a demonstration would require documentation beyond the declaration of someThing, since .NET provides no standard means of annotating that a particular reference will identify an object which--despite its being of mutable type--nobody is allowed to modify, nor of annotating that a particular reference should be the only one anywhere in the universe that identifies a particular object.