I just want to know whether creating local variables to accept the return value of a function will hit memory usage or performance in .NET applications, especially in ASP.NET.
say
MyObject myObject = Foo();
MyOtherObject myOtherObject = Boo();
SomeFunction(myObject, myOtherObject);
OR
Should I use
SomeFunction(Foo(), Boo());
Certainly the former usage has better readability. But what about memory usage and performance?
Thanks in advance
123Developer
Don't optimise prematurely; in a release build it is quite likely that the compiler will optimise these away anyway! Either way, you are just talking a tiny amount of stack space for (presumably) a few references. Either approach is fine; go with whichever is more readable.
CIL (the intermediate language into which C# is compiled) is a stack-based language so the return values of the intermediate functions need to end up on the stack before being passed as arguments to the final one anyway.
There's no way of predicting what the C# compiler will do[1] in terms of locals; it may decide to use locals when you do, or it may use the stack behaviour and skip them altogether. Similarly it may synthesize locals even when you don't use them, or it might not.
Either way, the performance difference isn't worth worrying about.
[1] Yes, of course, you can compile and look at the IL it produces to determine what it will do, but that is only valid for the current version of the compiler you're using and is an implementation detail you shouldn't rely on.
I believe the memory performance would be essentially the same. And unless performance testing were to show a significant difference, choose the option with enhanced readability.
Don't be afraid to use local variables. The difference in memory usage and performance is very small, or in some cases none at all.
In your specific case the local variables may use 8 bytes (16 bytes on a 64-bit application) of stack space. However, the compiler can create local variables by itself if it's needed for temporary storage, so it's possible that both versions have the same set of local variables anyway.
Also, the compiler can use processor registers instead of stack space for some local variables, so it's not even certain that creating a local variable actually uses any stack space at all.
Allocating stack space is very cheap anyhow. When the method is called, a stack frame is created for the local data in the method. If more memory has to be allocated, that will only change how much the stack pointer is moved, it will not produce any extra code at all.
So, just write the code so that it's maintainable and robust, and trust the compiler to optimize the variable usage.
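If in doubt, measure: here is a rough Stopwatch sketch that times both styles (Foo, Boo, and SomeFunction are trivial stand-ins invented for the example). On a release build the two loops should come out essentially identical:

```csharp
using System;
using System.Diagnostics;

static class Benchmark
{
    public static int Foo() => 1;
    public static int Boo() => 2;
    public static int SomeFunction(int a, int b) => a + b;

    static void Main()
    {
        const int iterations = 10_000_000;
        long sink = 0; // prevents the loops from being optimized away entirely

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            int a = Foo();                      // style 1: explicit locals
            int b = Boo();
            sink += SomeFunction(a, b);
        }
        sw.Stop();
        Console.WriteLine($"locals: {sw.ElapsedMilliseconds} ms");

        sw.Restart();
        for (int i = 0; i < iterations; i++)
            sink += SomeFunction(Foo(), Boo()); // style 2: nested calls
        sw.Stop();
        Console.WriteLine($"nested: {sw.ElapsedMilliseconds} ms (checksum {sink})");
    }
}
```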
Related
Please ignore code readability in this question.
In terms of performance, should the following code be written like this:
int maxResults = criteria.MaxResults;
if (maxResults > 0)
{
while (accounts.Count > maxResults)
accounts.RemoveAt(maxResults);
}
or like this:
if (criteria.MaxResults > 0)
{
while (accounts.Count > criteria.MaxResults)
accounts.RemoveAt(criteria.MaxResults);
}
?
Edit: criteria is a class, and MaxResults is a simple integer property (i.e., public int MaxResults { get { return _maxResults; } }).
Does the C# compiler treat MaxResults as a black box and evaluate it every time? Or is it smart enough to figure out that I've got 3 calls to the same property with no modification of that property between the calls? What if MaxResults was a field?
One of the laws of optimization is precalculation, so I instinctively wrote this code like the first listing, but I'm curious if this kind of thing is being done for me automatically (again, ignore code readability).
(Note: I'm not interested in hearing the 'micro-optimization' argument, which may be valid in the specific case I've posted. I'd just like some theory behind what's going on or not going on.)
First off, the only way to actually answer performance questions is to actually try it both ways and test the results in realistic conditions.
That said, the other answers which say that "the compiler" does not do this optimization because the property might have side effects are both right and wrong. The problem with the question (aside from the fundamental problem that it simply cannot be answered without actually trying it and measuring the result) is that "the compiler" is actually two compilers: the C# compiler, which compiles to MSIL, and the JIT compiler, which compiles IL to machine code.
The C# compiler never ever does this sort of optimization; as noted, doing so would require that the compiler peer into the code being called and verify that the result it computes does not change over the lifetime of the callee's code. The C# compiler does not do so.
The JIT compiler might. No reason why it couldn't. It has all the code sitting right there. It is completely free to inline the property getter, and if the jitter determines that the inlined property getter returns a value that can be cached in a register and re-used, then it is free to do so. (If you don't want it to do so because the value could be modified on another thread then you already have a race condition bug; fix the bug before you worry about performance.)
Whether the jitter actually does inline the property fetch and then enregister the value, I have no idea. I know practically nothing about the jitter. But it is allowed to do so if it sees fit. If you are curious about whether it does so or not, you can either (1) ask someone who is on the team that wrote the jitter, or (2) examine the jitted code in the debugger.
And finally, let me take this opportunity to note that computing results once, storing the result and re-using it is not always an optimization. This is a surprisingly complicated question. There are all kinds of things to optimize for:
execution time
executable code size -- this has a major effect on executable time because big code takes longer to load, increases the working set size, puts pressure on processor caches, RAM and the page file. Small slow code is often in the long run faster than big fast code in important metrics like startup time and cache locality.
register allocation -- this also has a major effect on execution time, particularly in architectures like x86 which have a small number of available registers. Enregistering a value for fast re-use can mean that there are fewer registers available for other operations that need optimization; perhaps optimizing those operations instead would be a net win.
and so on. It gets real complicated real fast.
In short, you cannot possibly know whether writing the code to cache the result rather than recomputing it is actually (1) faster, or (2) better performing. Better performance does not always mean making execution of a particular routine faster. Better performance is about figuring out what resources are important to the user -- execution time, memory, working set, startup time, and so on -- and optimizing for those things. You cannot do that without (1) talking to your customers to find out what they care about, and (2) actually measuring to see if your changes are having a measurable effect in the desired direction.
If MaxResults is a property then no, it will not optimize it, because the getter may have complex logic, say:
private int _maxResults;
public int MaxResults {
get { return _maxResults++; }
set { _maxResults = value; }
}
See how the behavior would change if the compiler inlined your code?
If there's no logic...either method you wrote is fine, it's a very minute difference and all about how readable it is TO YOU (or your team)...you're the one looking at it.
Your two code samples are only guaranteed to have the same result in single-threaded environments, which .Net isn't, and if MaxResults is a field (not a property). The compiler can't assume, unless you use the synchronization features, that criteria.MaxResults won't change during the course of your loop. If it's a property, it can't assume that using the property doesn't have side effects.
Eric Lippert points out quite correctly that it depends a lot on what you mean by "the compiler". The C# -> IL compiler? Or the IL -> machine code (JIT) compiler? And he's right to point out that the JIT may well be able to optimize the property getter, since it has all of the information (whereas the C# -> IL compiler doesn't, necessarily). It won't change the situation with multiple threads, but it's a good point nonetheless.
It will be called and evaluated every time. The compiler has no way of determining if a method (or getter) is deterministic and pure (no side effects).
Note that actual evaluation of the property may be inlined by the JIT compiler, making it effectively as fast as a simple field.
It's good practise to make property evaluation an inexpensive operation. If you do some heavy calculation in the getter, consider caching the result manually, or changing it to a method.
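One way to do that manual caching, sketched with Lazy&lt;T&gt; (Report and ComputeTotal are invented for the example):

```csharp
using System;

class Report
{
    private readonly Lazy<int> _total;

    public Report() => _total = new Lazy<int>(ComputeTotal);

    // The getter stays cheap: the heavy work runs once, on first access,
    // and the cached value is returned from then on.
    public int Total => _total.Value;

    private static int ComputeTotal()
    {
        int sum = 0;                      // stand-in for a heavy calculation
        for (int i = 1; i <= 1000; i++) sum += i;
        return sum;
    }
}
```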
why not test it?
just set up 2 console apps, make them loop 10 million times and compare the results ... remember to run them as properly released builds that have been installed properly, or else you cannot guarantee that you are not just running the MSIL.
Really, you are probably going to get about 5 answers saying 'you shouldn't worry about optimisation'. They clearly do not write routines that need to be as fast as possible before being readable (e.g. games).
If this piece of code is part of a loop that is executed billions of times then this optimisation could be worthwhile. For instance, MaxResults could be an overridden method, so you may also need to consider the cost of virtual method calls.
Really, the ONLY way to answer any of these questions is to figure out whether this is a piece of code that will benefit from optimisation. Then you need to know what kinds of things are increasing the time to execute. Us mere mortals cannot do this a priori, and so we have to simply try 2-3 different versions of the code and then test them.
If criteria is a class type, I doubt it would be optimized, because another thread could always change that value in the meantime. For structs I'm not sure, but my gut feeling is that it won't be optimized, but I think it wouldn't make much difference in performance in that case anyhow.
Just curious about this. Following are two code snippets for the same function:
void MyFunc1()
{
int i = 10;
object obj = null;
if(something) return;
}
And the other one is...
void MyFunc1()
{
if(something) return;
int i = 10;
object obj = null;
}
Now does the second one have the benefit of NOT allocating the variables when something is true? Or are the local stack variables (in the current scope) always allocated as soon as the function is called, so that moving the return statement to the top has no effect?
A link to dotnetperls.com article says "When you call a method in your C# program, the runtime allocates a separate memory region to store all the local variable slots. This memory is allocated on the stack even if you do not access the variables in the function call."
UPDATED
Here is a comparison of the IL code for these two functions. Func2 refers to the second snippet. It seems like the variables in both cases are allocated at the beginning, though in the case of Func2() they are initialized later on. So no benefit as such, I guess.
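For reference, typical Debug-build IL for either version declares all of its locals up front in a single `.locals init` directive, which is why the two snippets compare equal (a sketch of typical compiler output; exact IL varies with compiler version and settings):

```
.method private hidebysig instance void MyFunc1() cil managed
{
  .locals init ([0] int32 i,
                [1] object obj)
  // ... the bodies differ only in where the stores and the
  // conditional return appear, not in which locals are declared.
}
```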
Peter Duniho's answer is correct. I want to draw attention to the more fundamental problem in your question:
does the second one have the benefit of NOT allocating the variables when something is true?
Why ought that to be a benefit? Your presumption is that allocating the space for a local variable has a cost, that not doing so has a benefit and that this benefit is somehow worth obtaining. Analyzing the actual cost of local variables is very, very difficult; the presumption that there is a clear benefit in avoiding an allocation conditionally is not warranted.
To address your specific question:
The local stack variables (in current scope) are always allocated as soon as the function is called and moving the return statement to the top has no effect?
I can't answer such a complicated question easily. Let's break it down into much simpler questions:
Variables are storage locations. What are the lifetimes of the storage locations associated with local variables?
Storage locations for "ordinary" local variables -- and formal parameters of lambdas, methods, and so on -- have short, predictable lifetimes. None of them live before the method is entered, and none of them live after the method terminates, either normally or exceptionally. The C# language specification clearly calls out that local variable lifetimes are permitted to be shorter at runtime than you might think if doing so does not cause an observable change to a single-threaded program.
Storage locations for "unusual" local variables -- outer variables of lambdas, local variables in iterator blocks, local variables in async methods, and so on -- have lifetimes which are difficult to analyze at compile time or at run time, and are therefore moved to the garbage-collected heap, which uses GC policy to determine the lifetimes of the variables. There is no requirement that such variables ever be cleaned up; their storage lifetime can be extended arbitrarily at the whim of the C# compiler or the runtime.
Can a local which is unused be optimized away entirely?
Yes. If the C# compiler or the runtime can determine that removing the local from the program entirely has no observable effect in a single-threaded program, then it may do so at its whim. Essentially this is reducing its lifetime to zero.
How are storage locations for "ordinary" locals allocated?
This is an implementation detail, but typically there are two techniques. Either space is reserved on the stack, or the local is enregistered.
How does the runtime determine whether a local is enregistered or put on the stack?
This is an implementation detail of the jitter's optimizer. There are many factors, such as:
whether the address of the local could possibly be taken; registers have no address
whether the local is passed as a parameter to another method
whether the local is a parameter of the current method
what the calling conventions are of all the methods involved
the size of the local
and many, many more factors
Suppose we consider only the ordinary locals which are put on the stack. Is it the case that storage locations for all such locals are allocated when a method is entered?
Again, this is an implementation detail, but typically the answer is yes.
So a "stack local" that is used conditionally would not be allocated off the stack conditionally? Rather, its stack location would always be allocated.
Typically, yes.
What are the performance tradeoffs inherent in that decision?
Suppose we have two locals, A and B, and one is used conditionally and the other is used unconditionally. Which is faster:
Add two units to the current stack pointer
Initialize the two new stack slots to zero
or
Add one unit to the current stack pointer
Initialize the new stack slot to zero
If the condition is met, add one unit to the current stack pointer and initialize the new stack slot to zero
Keep in mind that "add one" and "add two" have the same cost.
This scheme is not cheaper if the variable B is unused, and has twice the cost if it is used. That's not a win.
But what about space? The conditional scheme uses either one or two units of stack space but the unconditional scheme uses two regardless.
Correct. Stack space is cheap. Or, more accurately, the million bytes of stack space you get per thread is insanely expensive, and that expense is paid up front, when you allocate the thread. Most programs never use anywhere close to a million bytes of stack space; trying to optimize use of that space is like spending an hour deciding whether to pay $5.01 for a latte vs $5.02 when you have a million dollars in the bank; it's not worth it.
Suppose 100% of the stack-based locals are allocated conditionally. Could the jitter put the addition to the stack pointer after the conditional code?
In theory, yes. Whether the jitter actually makes this optimization -- an optimization which saves literally less than a billionth of a second -- I don't know. Keep in mind that any code the jitter runs to make the decision to save that billionth of a second is code that takes far more than a billionth of a second. Again, it makes no sense to spend hours worrying about pennies; time is money.
And of course, how realistic is it that the billionth of a second you save will be the common path? Most method calls do something, not return immediately.
Also, keep in mind that the stack pointer is going to have to move for all the temporary value slots that aren't enregistered, regardless of whether those slots have names or not. How many scenarios are there where the condition that determines whether or not the method returns itself has no subexpression which touches the stack? Because that's the condition you're actually proposing that gets optimized. This seems like a vanishingly small set of scenarios, in which you get a vanishingly small benefit. If I were writing an optimizer I would spend exactly zero percent of my valuable time on solving this problem, when there are far juicier low-hanging fruit scenarios that I could be optimizing for.
Suppose there are two locals that are each allocated conditionally under different conditions. Are there additional costs imposed by a conditional allocation scheme other than possibly doing two stack pointer moves instead of one or zero?
Yes. In the straightforward scheme where you move the stack pointer two slots and say "stack pointer is A, stack pointer + 1 is B", you now have a consistent-throughout-the-method way to characterize the variables A and B. If you conditionally move the stack pointer then sometimes the stack pointer is A, sometimes it is B, and sometimes it is neither. That greatly complicates all the code that uses A and B.
What if the locals are enregistered?
Then this becomes a problem in register scheduling; I refer you to the extensive literature on this subject. I am far from an expert in it.
The only way to know for sure when this happens for your program, when you run it, is to look at the code the JIT compiler emits when you run your program. None of us can even answer the specific question with authority (well, I guess someone who wrote the CLR could, provided they knew which version of the CLR you're using, and possible some other details about configuration and your actual program code).
Any allocation on the stack of a local variable is strictly "implementation detail". And the CLS doesn't promise us any specific implementation.
Some locals never wind up on the stack per se, normally due to being stored in a register, but it would be legal for the runtime to use heap space instead, as long as it preserves the normal lifetime semantics of a local variable.
See also Eric Lippert's excellent series The Stack Is An Implementation Detail
I've been doing a lot of research on C# optimization for a game that I'm building with XNA, and I still don't quite understand whether local variables or instance variables give better performance when constantly being updated and used.
According to http://www.dotnetperls.com/optimization , you should avoid parameters and local variables, meaning instance variables are the best option in terms of performance.
But a while ago, I read on another StackOverflow post (I can't seem to find where it was) that local variables are stored in a part of memory that is far quicker to access, and that every time an instance variable is set, the previous value has to be erased as a tedious extra step before a new value can be assigned.
I know that design-wise, it might break encapsulation to use instance variables in that kind of situation, but I'm strictly curious about performance. Currently in my game, I pass around local variables to 3 out of 7 methods in a class, but I could easily promote the variables to instance variables and be able to entirely avoid parameter passing and local variables.
So which would be better?
Are your variables reference (class, or string) or value (struct) types?
For reference types there's no meaningful difference between passing them as a method argument and holding them on an object instance. In the first case when entering the function the argument will (for functions with a small argument count) end up in a register. In the second case the reference exists as an offset of the data pointed to in memory by 'this'. Either scenario is a quick grab of a memory address and then fetching the associated data out of memory (this is the expensive part).
For value types the above is true for certain types (integers or floats that can fit in your CPU's registers). For other value types (DateTime, structs you might make yourself, or any struct with multiple members), the data is too large to pass through a register, so this no longer matters.
It's pretty unlikely, though, that any of this matters for the performance of your application (even a game). Most common .NET performance problems (that are not simply inefficient algorithms) are going to, in some form, come from garbage generation. This can manifest itself through accidental boxing, poor use of string building, or poor object lifetime management (your objects have lifespans that are neither very short nor very long/permanent).
Personally, I wouldn't be looking at this as the culprit for performance issues (unless you are constantly passing large structs). My naive understanding is that GC pressure is the usual consideration with XNA games, so being frugal with your object instances basically.
If the variable is method-local, the value itself or the reference (when a reference type) will be located on the stack. If you promote those to class member variables they will be located in the class's memory on the heap.
Method calls would technically become faster as you are no longer copying references or values on the call (because presumably you can remove the parameters from the method if the method is also local to the class).
I'm not sure about the relative performance, but to me it seems that if you need to persist the value then the value makes some sense being in the class...
To me it seems like any potential gains from doing one in favour of the other is outweighed by the subtle differences between the two - making them roughly equivalent or so small a difference as to not care.
Of course, all this stands to be corrected in the face of hard numbers from performance profiling.
Not passing arguments will be slightly faster, not initialising local objects (if they are objects) will be faster.
What you read in both articles is not contradictory, one is mentioning that passing argument costs time and the other mention that initialising objects (in local objects) can cost time as well.
About allocating new objects: one thing you can do is to reuse objects rather than discarding them. For example, some time ago, I had to write some code for traders which would compute the price of a few products in real time in C/C++ and C#. I obtained a major boost of performance by not re-creating the system of equations from scratch but by merely updating the system incrementally from the previous one.
This avoided allocating memory for the new objects, initialising new objects and often the system would be nearly the same so I would have only to modify tiny bits to update it.
Usually, before doing any optimisation, you want to make sure that you are optimising something that will significantly impact the overall performance.
We all know that it's good practice to create small methods that promote reuse, which inevitably causes lots of nested method calls to be placed on the stack. However, is it possible to reach the scenario where there are so many nested method calls that a StackOverflowException occurs?
Would the accepted solution to be simply increase the stack size?
The documentation states that a such an exception will occur during "very deep or unbounded recursion" so it certainly seems possible, or does the .NET framework dynamically handle stack size for us?
My question can be summed up like so:
Is it possible to have such a well designed program (in terms of small reusable methods) that it becomes necessary to increase the stack size and hence use more resources?
The .NET stack size is fixed, and 1 MB by default.
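If a deeper stack really is needed, the Thread constructor overload that takes a maxStackSize lets you request one per thread. A minimal sketch (BigStackDemo and DeeplyRecursiveWork are placeholder names):

```csharp
using System;
using System.Threading;

static class BigStackDemo
{
    static void Main()
    {
        // The second constructor argument requests a 16 MB stack for this
        // thread instead of the ~1 MB default.
        var worker = new Thread(DeeplyRecursiveWork, 16 * 1024 * 1024);
        worker.Start();
        worker.Join();
    }

    // Placeholder for work that genuinely needs unusual recursion depth.
    static void DeeplyRecursiveWork() => Console.WriteLine("running on an enlarged stack");
}
```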
Is it possible to have such a well designed program (in terms of small reusable methods) that it becomes necessary to increase the stack size and hence use more resources?
It will not be in the decomposition of your logic into methods.
The only way you'll encounter a Stack Overflow that is not a direct bug is with recursion. And when that happens (threatens), don't increase the stack but rewrite the code to use a different way to store data (like a Stack<T>).
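A sketch of that rewrite: a recursive tree walk (Node is a stand-in type) converted to a loop over an explicit Stack&lt;T&gt;, which lives on the heap and so is not bounded by the ~1 MB call stack:

```csharp
using System;
using System.Collections.Generic;

class Node
{
    public int Value;
    public Node Left, Right;
}

static class TreeWalker
{
    // Recursive version: each nested call consumes a call-stack frame,
    // so a deep enough tree overflows the stack.
    public static int SumRecursive(Node node) =>
        node == null
            ? 0
            : node.Value + SumRecursive(node.Left) + SumRecursive(node.Right);

    // Iterative version: the explicit Stack<T> grows on the heap,
    // so depth is limited only by available memory.
    public static int SumIterative(Node root)
    {
        int sum = 0;
        var pending = new Stack<Node>();
        if (root != null) pending.Push(root);
        while (pending.Count > 0)
        {
            var node = pending.Pop();
            sum += node.Value;
            if (node.Left != null) pending.Push(node.Left);
            if (node.Right != null) pending.Push(node.Right);
        }
        return sum;
    }
}
```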
Not really. I just did a very quick test, and a StackOverflowException occurs after 15,000 nested calls.
There's no way you'll be writing code that will non-recursively nest 15,000 times due to the sheer number of methods you have.
Obviously the exact number depends on how many function-local variables you have allocated on the stack. But whatever that actual number may be, it is nowhere near enough to do what you're suggesting.
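If you want to reproduce that quick test safely, a sketch: on newer runtimes (.NET Core 3.0+), RuntimeHelpers.TryEnsureSufficientExecutionStack lets you probe depth and stop before the uncatchable StackOverflowException fires (the exact depth reported will vary by machine and build):

```csharp
using System;
using System.Runtime.CompilerServices;

static class DepthProbe
{
    // Recurse until the runtime reports that stack headroom is running low,
    // then unwind and report how deep we got.
    public static int Probe(int depth)
    {
        if (!RuntimeHelpers.TryEnsureSufficientExecutionStack())
            return depth;
        return Probe(depth + 1);
    }

    static void Main()
    {
        Console.WriteLine($"stopped at depth ~{Probe(0)} with stack headroom running low");
    }
}
```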
In the managed world the stack has a special role for performance. If you manage to allocate something on the stack (using primitives or structs) you don't have to put it on the heap. Allocating on the heap adds GC pressure which on average slows the program.
So I could imagine a program which is faster by allocating lots of stuff on the stack. Even using stackalloc (which is a less well known feature of C# and the CLR).
There are valid cases to do this. They are rare. Just saying "there are no valid uses" is plain wrong.
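A small sketch of stackalloc: a scratch buffer that lives entirely on the stack, produces no garbage, and is reclaimed automatically on return (safe Span&lt;T&gt;-based stackalloc requires C# 7.2+; the method and its 64-element limit are invented for the example):

```csharp
using System;

static class StackAllocExample
{
    // The scratch buffer is stack-allocated: no heap allocation,
    // nothing for the GC to track or collect.
    public static int SumOfSquares(int n)   // requires n <= 64 in this sketch
    {
        Span<int> squares = stackalloc int[64];
        for (int i = 0; i < n; i++) squares[i] = i * i;
        int total = 0;
        for (int i = 0; i < n; i++) total += squares[i];
        return total;
    }

    static void Main() => Console.WriteLine(SumOfSquares(4)); // 0+1+4+9 = 14
}
```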
This question has been puzzling me for a long time now. I come from a heavy and long C++ background, and since I started programming in C# and dealing with garbage collection I always had the feeling that such 'magic' would come at a cost.
I recently started working in a big MMO project written in Java (server side). My main task is to optimize memory consumption and CPU usage. Hundreds of thousands of messages per second are being sent and the same amount of objects are created as well. After a lot of profiling we discovered that the VM garbage collector was eating a lot of CPU time (due to constant collections) and decided to try to minimize object creation, using pools where applicable and reusing everything we can. This has proven to be a really good optimization so far.
So, from what I've learned, having a garbage collector is awesome, but you can't just pretend it does not exist, and you still need to take care about object creation and what it implies (at least in Java and a big application like this).
So, is this also true for .NET? if it is, to what extent?
I often write pairs of functions like these:
// Combines two envelopes and the result is stored in a new envelope.
public static Envelope Combine( Envelope a, Envelope b )
{
var envelope = new Envelope( a.Length, 0, 1, 1 );
Combine( a, b, envelope );
return envelope;
}
// Combines two envelopes and the result is 'written' to the specified envelope
public static void Combine( Envelope a, Envelope b, Envelope result )
{
result.Clear();
...
}
A second function is provided in case someone has an already made Envelope that may be reused to store the result, but I find this a little odd.
I also sometimes write structs when I'd rather use classes, just because I know there'll be tens of thousands of instances being constantly created and disposed, and this feels really odd to me.
I know that as a .NET developer I shouldn't be worrying about these kinds of issues, but my experience with Java and common sense tell me that I should.
Any light and thoughts on this matter would be much appreciated. Thanks in advance.
Yes, it's true of .NET as well. Most of us have the luxury of ignoring the details of memory management, but in your case -- or in cases where high volume is causing memory congestion -- some optimization is called for.
One optimization you might consider for your case -- something I've been thinking about writing an article about, actually -- is the combination of structs and ref for real deterministic disposal.
Since you come from a C++ background, you know that in C++ you can instantiate an object either on the heap (using the new keyword and getting back a pointer) or on the stack (by instantiating it like a primitive type, i.e. MyType myType;). You can pass stack-allocated items by reference to functions and methods by telling the function to accept a reference (using the & keyword before the parameter name in your declaration). Doing this keeps your stack-allocated object in memory for as long as the method in which it was allocated remains in scope; once it goes out of scope, the object is reclaimed, the destructor is called, ba-da-bing, ba-da-boom, Bob's yer Uncle, and all done without pointers.
I used that trick to create some amazingly performant code in my C++ days -- at the expense of a larger stack and the risk of a stack overflow, naturally, but careful analysis managed to keep that risk very minimal.
My point is that you can do the same trick in C# using structs and refs. The tradeoffs? In addition to the risk of a stack overflow if you're not careful or if you use large objects, you are limited to no inheritance, and you tightly couple your code, making it less testable and less maintainable. Additionally, you still have to deal with these issues whenever you use core library calls.
Still, it might be worth a look-see in your case.
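A minimal sketch of the struct-plus-ref idea (Matrix4 is a made-up example of a large value type; `in` requires C# 7.2+):

```csharp
using System;

struct Matrix4
{
    public double M00, M11, M22, M33;               // diagonal
    public double M01, M02, M03, M10, M12, M13,
                  M20, M21, M23, M30, M31, M32;     // 128 bytes total: costly to copy

    // 'in' passes a read-only reference: no 128-byte copy per call.
    public static double Trace(in Matrix4 m) => m.M00 + m.M11 + m.M22 + m.M33;

    // 'ref' lets the callee mutate the caller's copy in place,
    // again without copying the whole struct.
    public static void Scale(ref Matrix4 m, double k)
    {
        m.M00 *= k; m.M11 *= k; m.M22 *= k; m.M33 *= k;
    }
}
```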
This is one of those issues where it is really hard to pin down a definitive answer in a way that will help you. The .NET GC is very good at tuning itself to the memory needs of your application. Is it good enough that your application can be coded without you needing to worry about memory management? I don't know.
There are definitely some common-sense things you can do to ensure that you don't hammer the GC. Using value types is definitely one way of accomplishing this but you need to be careful that you don't introduce other issues with poorly-written structs.
For the most part however I would say that the GC will do a good job managing all this stuff for you.
I've seen too many cases where people "optimize" the crap out of their code without much concern for how well it's written or how well it works even. I think the first thought should go towards making code solve the business problem at hand. The code should be well crafted and easily maintainable as well as properly tested.
After all of that, optimization should be considered, if testing indicates it's needed.
Random advice:
Someone mentioned putting dead objects in a queue to be reused instead of letting the GC at them... but be careful, as this means the GC may have more crap to move around when it consolidates the heap, and may not actually help you. Also, the GC is possibly already using techniques like this. Also, I know of at least one project where the engineers tried pooling and it actually hurt performance. It's tough to get a deep intuition about the GC. I'd recommend having a pooled and unpooled setting so you can always measure the perf differences between the two.
Another technique you might use in C# is dropping down to native C++ for key parts that aren't performing well enough... and then use the Dispose pattern in C# or C++/CLI for managed objects which hold unmanaged resources.
Also, be sure when you use value types that you are not using them in ways that implicitly box them and put them on the heap, which might be easy to do coming from Java.
Finally, be sure to find a good memory profiler.
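A sketch of the implicit-boxing trap mentioned above: storing value types through non-generic collections (or plain object references) silently allocates a heap copy per element, which is exactly the kind of accidental garbage a Java habit can produce:

```csharp
using System;
using System.Collections;           // non-generic ArrayList
using System.Collections.Generic;   // generic List<T>

static class BoxingExample
{
    static void Main()
    {
        var boxed = new ArrayList();
        for (int i = 0; i < 1000; i++)
            boxed.Add(i);           // each int is boxed: one heap allocation apiece

        var unboxed = new List<int>();
        for (int i = 0; i < 1000; i++)
            unboxed.Add(i);         // stored inline: no boxing, no extra garbage

        int first = (int)boxed[0];  // reading back requires an unboxing cast
        Console.WriteLine($"{first} {unboxed[0]}");
    }
}
```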
I have the same thought all the time.
The truth is that, though we were taught to watch out for unnecessary CPU cycles and memory consumption, the cost of little imperfections in our code is usually negligible in practice.
If you are aware of that and keep an eye on it, I believe you are okay writing imperfect code. If, however, you started out with .NET/Java and have no prior experience in low-level programming, the chances are you will write very wasteful and inefficient code.
And anyway, as they say, premature optimization is the root of all evil. You can spend hours optimizing one little function, only to find that some other part of the code is the real bottleneck. Just keep a balance: keep it simple, but don't do it stupidly.
Although the Garbage Collector is there, bad code remains bad code. Therefore I would say yes, as a .Net developer you should still care about how many objects you create and, more importantly, about writing optimized code.
I have seen a considerable number of projects get rejected in code reviews in our environment for this reason, and I strongly believe it is important.
.NET memory management is very good, and being able to programmatically tweak the GC when you need to is a plus.
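One concrete example of tweaking the GC programmatically is adjusting its latency mode around a latency-sensitive section (the "sensitive work" here is a placeholder):

```csharp
using System.Runtime;

// Ask the GC to avoid blocking collections during a latency-sensitive
// section, then restore whatever mode was in effect before.
GCLatencyMode previous = GCSettings.LatencyMode;
try
{
    GCSettings.LatencyMode = GCLatencyMode.LowLatency;
    // ... latency-sensitive work here ...
}
finally
{
    GCSettings.LatencyMode = previous;
}
```

As with pooling, this is worth measuring: low-latency mode trades throughput and memory for fewer pauses, so it should only wrap short, genuinely sensitive sections.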
I like the fact that you can add your own Dispose methods to classes by implementing IDisposable and tailoring the cleanup to your needs. This is great for making sure that connections to networks/files/databases are always cleaned up and not leaked. There is also the risk of cleaning up too early.
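On the consuming side, the `using` statement is what makes that cleanup reliable — Dispose runs even when an exception is thrown (the file name here is made up for illustration):

```csharp
using System.IO;

// Dispose is guaranteed to run when control leaves the block,
// whether normally or via an exception, so the handle never leaks.
using (StreamReader reader = new StreamReader("settings.txt"))
{
    string firstLine = reader.ReadLine();
    // reader.Dispose() is called here automatically
}
```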
You might consider writing a set of object caches. Instead of creating new instances every time, you could keep a list of available objects somewhere. It would help reduce pressure on the GC.
I agree with all points said above: the garbage collector is great, but it shouldn't be used as a crutch.
I've sat through many wasted hours in code-reviews debating over finer points of the CLR. The best definitive answer is to develop a culture of performance in your organization and actively profile your application using a tool. Bottlenecks will appear and you address as needed.
I think you answered your own question -- if it becomes a problem, then yes! I don't think this is a .Net vs. Java question, it's really a "should we go to exceptional lengths to avoid doing certain types of work" question. If you need better performance than you have, and after doing some profiling you find that object instantiation or garbage collection is taking tons of time, then that's the time to try some unusual approach (like the pooling you mentioned).
I wish I were a "software legend" and could speak of this in my own voice and breath, but since I'm not, I rely upon SL's for such things.
I suggest the following blog post by Andrew Hunter on .NET GC would be helpful:
http://www.simple-talk.com/dotnet/.net-framework/understanding-garbage-collection-in-.net/
Even beyond performance aspects, the semantics of a method which modifies a passed-in mutable object will often be cleaner than those of a method which returns a new mutable object based upon an old one. The statements:
munger.Munge(someThing, otherParams);
someThing = munger.ComputeMungedVersion(someThing, otherParams);
may in some cases behave identically, but while the former does one thing, the latter will do two--equivalent to:
someThing = someThing.Clone(); // Or duplicate it via some other means
munger.Munge(someThing, otherParams);
If someThing is the only reference, anywhere in the universe, to a particular object, then replacing it with a reference to a clone will be a no-op, and so modifying a passed-in object will be equivalent to returning a new one. If, however, someThing identifies an object to which other references exist, the former statement would modify the object identified by all those references, leaving all the references attached to it, while the latter would cause someThing to become "detached".
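The attachment/detachment distinction can be seen with a short sketch using two references to the same mutable object:

```csharp
using System.Collections.Generic;

var shared = new List<int> { 1, 2, 3 };
var alias = shared;                    // a second reference to the same object

// Modify in place: every reference observes the change.
shared.Add(4);
// both shared and alias now see [1, 2, 3, 4]

// Replace with a new object: 'shared' becomes detached from 'alias'.
shared = new List<int>(shared);        // clone, then point shared at the clone
shared.Add(5);
// shared sees [1, 2, 3, 4, 5]; alias still sees [1, 2, 3, 4]
```

Whether that divergence is a bug or exactly what you wanted depends entirely on whether other references to the object were supposed to observe the change — which is the point being made above.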
Depending upon the type of someThing and how it is used, its attachment or detachment may be moot issues. Attachment would be relevant if some object which holds a reference to the object could modify it while other references exist. Attachment is moot if the object will never be modified, or if no references outside someThing itself can possibly exist. If one can show that either of the latter conditions will apply, then replacing someThing with a reference to a new object will be fine. Unless the type of someThing is immutable, however, such a demonstration would require documentation beyond the declaration of someThing, since .NET provides no standard means of annotating that a particular reference will identify an object which--despite its being of mutable type--nobody is allowed to modify, nor of annotating that a particular reference should be the only one anywhere in the universe that identifies a particular object.