Please ignore code readability in this question.
In terms of performance, should the following code be written like this:
int maxResults = criteria.MaxResults;
if (maxResults > 0)
{
    while (accounts.Count > maxResults)
        accounts.RemoveAt(maxResults);
}
or like this:
if (criteria.MaxResults > 0)
{
    while (accounts.Count > criteria.MaxResults)
        accounts.RemoveAt(criteria.MaxResults);
}
?
Edit: criteria is a class, and MaxResults is a simple integer property (i.e., public int MaxResults { get { return _maxResults; } }).
Does the C# compiler treat MaxResults as a black box and evaluate it every time? Or is it smart enough to figure out that I've got 3 calls to the same property with no modification of that property between the calls? What if MaxResults was a field?
One of the laws of optimization is precalculation, so I instinctively wrote this code like the first listing, but I'm curious if this kind of thing is being done for me automatically (again, ignore code readability).
(Note: I'm not interested in hearing the 'micro-optimization' argument, which may be valid in the specific case I've posted. I'd just like some theory behind what's going on or not going on.)
First off, the only way to actually answer performance questions is to actually try it both ways and test the results in realistic conditions.
That said, the other answers which say that "the compiler" does not do this optimization because the property might have side effects are both right and wrong. The problem with the question (aside from the fundamental problem that it simply cannot be answered without actually trying it and measuring the result) is that "the compiler" is actually two compilers: the C# compiler, which compiles to MSIL, and the JIT compiler, which compiles IL to machine code.
The C# compiler never ever does this sort of optimization; as noted, doing so would require that the compiler peer into the code being called and verify that the result it computes does not change over the lifetime of the callee's code. The C# compiler does not do so.
The JIT compiler might. No reason why it couldn't. It has all the code sitting right there. It is completely free to inline the property getter, and if the jitter determines that the inlined property getter returns a value that can be cached in a register and re-used, then it is free to do so. (If you don't want it to do so because the value could be modified on another thread then you already have a race condition bug; fix the bug before you worry about performance.)
Whether the jitter actually does inline the property fetch and then enregister the value, I have no idea. I know practically nothing about the jitter. But it is allowed to do so if it sees fit. If you are curious about whether it does so or not, you can either (1) ask someone who is on the team that wrote the jitter, or (2) examine the jitted code in the debugger.
And finally, let me take this opportunity to note that computing results once, storing the result and re-using it is not always an optimization. This is a surprisingly complicated question. There are all kinds of things to optimize for:
execution time
executable code size -- this has a major effect on execution time because big code takes longer to load, increases the working set size, and puts pressure on processor caches, RAM and the page file. Small slow code is often in the long run faster than big fast code in important metrics like startup time and cache locality.
register allocation -- this also has a major effect on execution time, particularly in architectures like x86 which have a small number of available registers. Enregistering a value for fast re-use can mean that there are fewer registers available for other operations that need optimization; perhaps optimizing those operations instead would be a net win.
and so on. It gets real complicated real fast.
In short, you cannot possibly know whether writing the code to cache the result rather than recomputing it is actually (1) faster, or (2) better performing. Better performance does not always mean making execution of a particular routine faster. Better performance is about figuring out what resources are important to the user -- execution time, memory, working set, startup time, and so on -- and optimizing for those things. You cannot do that without (1) talking to your customers to find out what they care about, and (2) actually measuring to see if your changes are having a measurable effect in the desired direction.
If MaxResults is a property then no, it will not optimize it, because the getter may have complex logic, say:
private int _maxResults;
public int MaxResults {
    get { return _maxResults++; }
    set { _maxResults = value; }
}
See how the behavior would change if the compiler inlined your code?
If there's no logic...either method you wrote is fine, it's a very minute difference and all about how readable it is TO YOU (or your team)...you're the one looking at it.
Your two code samples are only guaranteed to have the same result in single-threaded environments, which .Net isn't, and if MaxResults is a field (not a property). The compiler can't assume, unless you use the synchronization features, that criteria.MaxResults won't change during the course of your loop. If it's a property, it can't assume that using the property doesn't have side effects.
Eric Lippert points out quite correctly that it depends a lot on what you mean by "the compiler". The C# -> IL compiler? Or the IL -> machine code (JIT) compiler? And he's right to point out that the JIT may well be able to optimize the property getter, since it has all of the information (whereas the C# -> IL compiler doesn't, necessarily). It won't change the situation with multiple threads, but it's a good point nonetheless.
It will be called and evaluated every time. The compiler has no way of determining if a method (or getter) is deterministic and pure (no side effects).
Note that actual evaluation of the property may be inlined by the JIT compiler, making it effectively as fast as a simple field.
It's good practice to make property evaluation an inexpensive operation. If you do some heavy calculation in the getter, consider caching the result manually, or changing it to a method.
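For illustration, here is a minimal sketch of manual caching in a getter; the class name and the ComputeExpensiveValue method are made-up placeholders, and it assumes the computed value never changes after it is first produced:

public class ReportCriteria
{
    private int? _cachedMaxResults; // null until the value has been computed once

    public int MaxResults
    {
        get
        {
            // Compute on first access, then serve the cached value on later reads.
            if (_cachedMaxResults == null)
                _cachedMaxResults = ComputeExpensiveValue();
            return _cachedMaxResults.Value;
        }
    }

    // Placeholder for whatever heavy calculation the getter would otherwise do.
    private int ComputeExpensiveValue()
    {
        return 42;
    }
}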
why not test it?
Just set up 2 console apps, make them loop 10 million times and compare the results... remember to run them as properly released apps that have been installed properly, or else you cannot guarantee that you are not just running the MSIL.
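Something along these lines is presumably what is meant; the Criteria class, the list size and the iteration count are all assumptions, and the loop bodies only test the condition (and count hits so the work cannot be optimized away) rather than actually removing items:

using System;
using System.Collections.Generic;
using System.Diagnostics;

class Criteria { public int MaxResults { get; set; } }

class Program
{
    static void Main()
    {
        var criteria = new Criteria { MaxResults = 10 };
        var accounts = new List<int>();
        for (int i = 0; i < 100; i++) accounts.Add(i);
        int hits = 0;

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < 10000000; i++)
        {
            // Variant 1: cache the property in a local.
            int maxResults = criteria.MaxResults;
            if (maxResults > 0 && accounts.Count > maxResults) hits++;
        }
        sw.Stop();
        Console.WriteLine("Cached local:       {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        for (int i = 0; i < 10000000; i++)
        {
            // Variant 2: read the property every time.
            if (criteria.MaxResults > 0 && accounts.Count > criteria.MaxResults) hits++;
        }
        sw.Stop();
        Console.WriteLine("Property each time: {0} ms (hits = {1})", sw.ElapsedMilliseconds, hits);
    }
}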
Really you are probably going to get about 5 answers saying 'you shouldn't worry about optimisation'. They clearly do not write routines that need to be as fast as possible before being readable (e.g. games).
If this piece of code is part of a loop that is executed billions of times then this optimisation could be worthwhile. For instance, MaxResults could be an overridden method, and so you may also need to consider the cost of virtual method calls.
Really the ONLY way to answer any of these questions is to figure out whether this is a piece of code that will benefit from optimisation. Then you need to know the kinds of things that are increasing the time to execute. Really, we mere mortals cannot do this a priori and so have to simply try 2-3 different versions of the code and then test them.
If criteria is a class type, I doubt it would be optimized, because another thread could always change that value in the meantime. For structs I'm not sure, but my gut feeling is that it won't be optimized, but I think it wouldn't make much difference in performance in that case anyhow.
Related
Since properties are just methods under the hood, inlining whatever logic they perform may or may not improve performance - so it's understandable why the JIT needs to check whether methods are worth inlining.
Automatic properties however (as far as I understand) cannot have any logic, and simply return or set the value of the underlying field. As far as I know, automatic properties are treated by the Compiler and the JIT just like any other methods.
(Everything below will rely on the assumption that the above paragraph is correct.)
Value Type properties show different behavior than the variable itself, but Reference Type properties supposedly should have the exact same behavior as direct access to the underlying variable.
// Automatic Properties Example
public Object MyObj { get; private set; }
Is there any case where automatic properties to Reference Types could show a performance hit by being inlined?
If not, what prevents either the Compiler or the JIT from automatically inlining them?
Note: I understand that the performance gain would probably be insignificant, especially when the JIT is likely to inline them anyway if used enough times - but small as the gain may be, it seems logical that such a seemingly simple optimization would be introduced regardless.
EDIT: The JIT compiler doesn't work in the way you think it does, which I guess is why you're probably not completely understanding what I was trying to convey above. I've quoted your comment below:
That is a different matter, but as far as I understand methods are only checked for being inline-worthy if they are called enough times. Not to mention that the checking itself is a performance hit. (Let the size of the performance hit be irrelevant for now.)
First, most, if not all, methods are checked to see if they can be inlined. Second, keep in mind that methods are only ever JITed once and it is during that one time that the JITer will determine if any methods called inside of it will be inlined. This can happen before any code is executed at all by your program. What makes a called method a good candidate for inlining?
The x86 JIT compiler (x64 and ia64 don't necessarily use the same optimization techniques) checks a few things to determine if a method is a good candidate for inlining, definitely not just the number of times it is called. The article lists things like if inlining will make the code smaller, if the call site will be executed a lot of times (ie in a loop), and others. Each method is optimized on its own, so the method may be inlined in one calling method but not in another, as in the example of a loop. These optimization heuristics are only available to JIT, the C# compiler just doesn't know: it's producing IL, not native code. There's a huge difference between them; native vs IL code size can be quite different.
To summarize, the C# compiler doesn't inline properties for performance reasons.
The jit compiler inlines most simple properties, including automatic properties. You can read more about how the JIT decides to inline method calls at this interesting blog post.
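If you want to nudge the jitter one way or the other while experimenting, .NET 4.5 and later let you hint (not force) the decision with MethodImplAttribute; a small sketch, with the class and field names made up:

using System.Runtime.CompilerServices;

public class Criteria
{
    private int _maxResults;

    // A trivial getter like this is a typical candidate for JIT inlining.
    public int MaxResults
    {
        get { return _maxResults; }
    }

    // Hint that the JIT should try to inline this call.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public int GetMaxResultsInlined() { return _maxResults; }

    // Tell the JIT not to inline it, e.g. to compare the two in a benchmark.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public int GetMaxResultsNotInlined() { return _maxResults; }
}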
Well, the C# compiler doesn't inline any methods at all. I assume this is the case because of the way the CLR is designed. Each assembly is designed to be portable from machine to machine. A lot of times, you can change the internal behavior of a .NET assembly without having to recompile all the code, it can just be a drop in replacement (at least when types haven't changed). If the code were inlined, it breaks that (great, imo) design and you lose that luster.
Let's talk about inlining in C++ first. (Full disclosure, I haven't used C++ full time in a while, so I may be vague, my explanations rusty, or completely incorrect! I'm counting on my fellow SOers to correct and scold me)
The C++ inline keyword is like telling the compiler, "Hey man, I'd like you to inline this function, because I think it will improve performance". Unfortunately, it is only telling the compiler you'd prefer it inlined; it is not telling it that it must.
Perhaps at an earlier date, when compilers were less optimized than they are now, the compiler would more often than not compile that function inlined. However, as time went on and compilers grew smarter, the compiler writers discovered that in most cases they were better at determining when a function should be inlined than the developer was. For those few cases where they weren't, developers could use the seriouslybro_inlineme keyword (officially called __forceinline in VC++).
Now, why would the compiler writers do this? Well, inlining a function doesn't always mean increased performance. While it certainly can, it can also devastate your program's performance if used incorrectly. For example, we all know one side effect of inlining code is increased code size, or "fat code syndrome" (disclaimer: not a real term). Why is "fat code syndrome" a problem? If you take a look at the article I linked above, it explains that, among other things, memory is slow, and the bigger your code, the less likely it will fit in the fastest CPU cache (L1). Eventually it can only fit in memory, and then inlining has done nothing. However, compilers know when these situations can happen, and do their best to prevent it.
Putting that together with your question, let's look at it this way: the C# compiler is like a developer writing code for the JIT compiler: the JIT is just smarter (but not a genius). It often knows when inlining will benefit or harm execution speed. "Senior developer" C# compiler doesn't have any idea how inlining a method call could benefit the runtime execution of your code, so it doesn't. I guess that actually means the C# compiler is smart, because it leaves the job of optimization to those who are better than it, in this case, the JIT compiler.
Automatic properties however (as far as I understand) cannot have any logic, and simply return or set the value of the underlying field. As far as I know, automatic properties are treated by the Compiler and the JIT just like any other methods.
That automatic properties cannot have any logic is an implementation detail; no special knowledge of that fact is required for compilation. In fact, as you say, auto properties are compiled down to method calls.
Suppose auto props were inlined and the class and property are defined in a different assembly. This would mean that if the property implementation changes, you would have to recompile the application to see that change. That defeats using properties in the first place, which should allow you to change the internal implementation without having to recompile the consuming application.
Automatic properties are just that - property get/set methods generated automatically. As a result there is nothing special in the IL for them. The C# compiler by itself does a very small number of optimizations.
As for reasons not to inline - imagine your type is in a separate assembly, so you are free to change the source of that assembly to have an insanely complicated get/set for the property. As a result the compiler can't reason about the complexity of the get/set code when it first sees your automatic property while compiling a new assembly that depends on your type.
As you've already noted in your question - "especially when the JIT is likely to inline them anyway" - these property methods will likely be inlined at JIT time.
I'm building an application that is seriously slower than it should be (a process takes 4 seconds when it should take only .1 seconds, that's my goal at least).
I have a bunch of methods that pass an array from one to the other. This has kept my code nice and organized, but I'm worried that it's killing the efficiency of my code.
Can anyone confirm if this is the case?
Also, I have all of my code contained in a class separate from my UI. Is this going to make things run significantly slower than if I had my code contained in the Form1.cs file?
Edit: There are about 95000 points that need to be calculated; each point goes through 7 methods that do additional calculations.
Have you tried any profiling or performance tools to narrow down why the slowdown occurs?
It might show you ways that you could use to refactor your code and improve performance.
This question asked by another user has several options that you can choose from:
Good .Net Profilers
No. This is not what is killing your code speed, unless "many methods" means something like a million. You probably have more things iterating through your array than you need or realize, and the array itself may have a larger memory footprint than you realize.
Perhaps you should look into a design where, instead of passing the array to 7 methods, you iterate the array once and pass each member to the 7 methods; this will minimize the number of times you're iterating through 95000 members.
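Roughly, the idea looks like this; the step methods and their bodies are placeholders for your real calculations:

static double StepOne(double v)   { return v * 2.0; }  // placeholder calculation
static double StepTwo(double v)   { return v + 1.0; }  // placeholder calculation
static double StepThree(double v) { return v * 0.5; }  // placeholder calculation

static void ProcessPoints(double[] points)
{
    // One pass over the ~95,000 points: each point flows through every step
    // in turn, instead of the whole array being handed to seven methods
    // one after another.
    for (int i = 0; i < points.Length; i++)
    {
        double value = points[i];
        value = StepOne(value);
        value = StepTwo(value);
        value = StepThree(value);
        // ...remaining steps...
        points[i] = value;
    }
}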
In general, function calls are basic enough to be highly optimized by any interpreter (or compiler). Therefore they do not produce too much blow-up in run time. In fact, if you wrap your problem in, say, some fancy iterative solution, you save the stack handling but instead have to handle some iteration variables, which will not be too hard.
I know there have been programmers who wondered why their recursive algorithms were so slow, until someone told them not to pass array entries by value.
You should provide some sample code. In general, you should look for other bottlenecks, or find another algorithm.
Just need to run it against a good profiling tool. I've got some stuff I wished only took 4 seconds - works with upwards of a hundred million records in a pass.
An Array is a reference type, not a value type. Therefore you never pass the array itself; you are actually passing a reference to the array in memory. So passing the array isn't your issue. Most likely you have an issue with what you do with your array. You need to do what Jamie Keeling said and run it through a profiler, or even just debug it and see if you get stuck in some big loops.
Why are you loading them all into an array and doing each method in turn rather than iterating through them as loaded?
If you can obtain them (from whatever input source), deal with them and output them (whether to screen, file or wherever), this will inevitably use less memory and reduce start-up time, at the very least.
If this answer is applicable to your situation, start by changing your methods to deal with enumerations rather than arrays (non-breaking change, since arrays are enumerations), then change your input method to yield return items as loaded rather than loading an entire array.
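A sketch of that shape, assuming the points come from a text file (the path, the parsing and the per-item work are placeholders):

using System;
using System.Collections.Generic;
using System.IO;

static class PointStream
{
    // Yields values one at a time instead of loading an entire array first.
    public static IEnumerable<double> ReadValues(string path)
    {
        foreach (string line in File.ReadLines(path))
        {
            yield return double.Parse(line);
        }
    }

    public static void Run(string path)
    {
        // A method that accepts IEnumerable<double> also accepts double[],
        // since arrays implement IEnumerable<T>.
        foreach (double value in ReadValues(path))
        {
            Console.WriteLine(value * 2.0); // placeholder for the real calculations
        }
    }
}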
Sorry for posting an old link (.NET 1.1) but it was contained in VS2010 article, so:
Here you can read about method costs. (Initial link)
Then, if you start your code from VS (no matter, even in Release mode), the VS debugger attaches to your code and slows it down.
I know I will be downvoted for this advice, but... the maximum performance will be achieved with unsafe operations on arrays (yes, it's UNSAFE, but when performance is the deal, so...).
And the last point - refactor your code to use a minimum of methods that work with your arrays. It will improve the performance.
I have stumbled on an impossibly good performance behaviour with PostSharp. To evaluate the speed I wrote a little program that would execute one function a specified number of times, and if PostSharp is enabled it would generate and delete a few hundred strings, just in memory (non-fixed composition, so they are not auto-interned). The loop executes in a non-trivial (a few milliseconds) amount of time.
Now, I am unable to measure the difference over a few million runs, and a crazy run of ~40 billion iterations amounted to a difference of just a few nanoseconds vs the non-PostSharp version doing the same number of calls. To me, this is impossible. There must be something wrong with my test. I had the code peer-reviewed by my co-workers, so I am fairly confident the code does what I intend it to.
So, is there something wrong with using string generation (which is the expected use in the intended applications) as the slow-running simulation for the benchmarks?
Alternatively, has someone else performed (or know of) a PostSharp's runtime performance analysis?
Thank you.
On a 3 GHz processor, 40 billion clock cycles alone will take 13 seconds - and I sincerely doubt that a single iteration is taking just one clock cycle. Something's definitely wrong with your test.
Something's likely getting optimized away - maybe it sees that you're doing the same thing over and over again and is deciding not to do it at all (except the first time). You need to make sure you're randomizing your data when you do perf analysis.
I have done performance tests. They were published on the PostSharp blog.
Some aspects can have the same performance as hand written code if they don't use features such as: reflection, access to method parameters, access to method instance. Since PostSharp emits MSIL instructions, the generated code can be inlined by the JIT compiler.
As reminded in other answers, be sure that (1) PostSharp is indeed invoked (use Reflector on the resulting assembly) and (2) you're using the Stopwatch properly. If you're comparing the average time of a single test, it's normal that the difference between PostSharp and hand-written code is just a few nanoseconds (in the hypothesis that you don't use expensive features).
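For reference, a bare-bones way to use Stopwatch for this kind of comparison; the DoWork method is just a stand-in for the string-generating code, and the warm-up and the sink variable are there so JIT time isn't measured and the work can't be optimized away:

using System;
using System.Diagnostics;

class Program
{
    // Stand-in for the string-generating work being measured.
    static int DoWork(int seed)
    {
        string s = "value-" + seed;
        return s.Length;
    }

    static void Main()
    {
        const int iterations = 10000000;
        long sink = 0;

        // Warm up so JIT compilation isn't part of the measurement.
        for (int i = 0; i < 1000; i++) sink += DoWork(i);

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            sink += DoWork(i); // consume the result so it can't be eliminated
        }
        sw.Stop();

        Console.WriteLine("Total: {0} ms (sink = {1})", sw.ElapsedMilliseconds, sink);
        Console.WriteLine("Per call: {0:F1} ns",
            sw.Elapsed.TotalMilliseconds * 1000000.0 / iterations);
    }
}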
Can you change your test, such that the generated strings are used in the next iteration (string length written to the console) or something like that?
Maybe the compiler optimizes your program in such a way that either the PostSharp function is not executed at all, or it is called asynchronously or executed on another CPU, because there is no reason to sync with the other iterations. If you link it more tightly, this may then force the compiler to synchronize the actions.
Consider the following code sample:
private void AddEnvelope(MailMessage mail)
{
    if (this.CopyEnvelope)
    {
        // Perform a few operations
    }
}
vs
private void AddEnvelope(MailMessage mail)
{
    if (!this.CopyEnvelope) return;
    // Perform a few operations
}
Will the bottom code execute any faster? Why would ReSharper make this recommendation?
Update
Having thought about this question the answer might seem obvious to some. But lots of us developers were never in the habit of nesting zounds of if statements in the first place...
It doesn't matter. Stop agonizing over performance issues that don't exist - use a profiler to identify areas in your code that DO exhibit issues, and fix them. Proactive optimization - before you know that there is a problem - is by definition a waste of time.
Updated Answer:
It's a code maintainability suggestion. Easier to read than nesting the rest of the code in an IF statement. Examples/discussion of this can be seen at the following links:
Flattening Arrow Code
Replace Nested Conditional With Guard Clauses
Code Contracts section "Legacy Requires Statements"
Original Answer:
It will actually run (very negligibly) slower from having to perform a NOT operation.
Even so, some people actually consider that the prettier way to code, as it avoids an extra level of indentation for the bulk of the code.
It's a refactor of a conditional that encompasses the entire method contents to a Guard Clause. It has nothing to do with optimization.
I like the comments about optimizing things like this, to add a little more to it...
The only time I can think of that it makes sense to optimize your if statements is when you have the results of TWO or more longish running methods that need to be combined to determine to do something else. You would only want to execute the second operation if the first operation yielded results that would pass the condition. Putting the one that is most likely to return false first will generally be a smarter choice. This is because if it is false, the second one will not be evaluated at all. Again, only worth worrying about if the operations are significant and you can predict which is more likely to pass or fail. Invert this for OR... if true, it will only evaluate the first, and so optimize that way.
i.e.
if (ThisOneUsuallyPasses() && ThisOneUsuallyFails())
isn't so good as
if (ThisOneUsuallyFails() && ThisOneUsuallyPasses())
because it's only on the odd case that the first one actually works that you have to look at the second. There's some other flavors of this you can derive, but I think you should get the point.
Better to worry about how you use strings, collections, index your database, and allocate objects than spend a lot of time worrying about single condition if statements if you are worrying about perf.
In general, what the bottom code you give will do is give you an opportunity to avoid a huge block of code inside an if statement which can lead to silly typo driven errors. Old school thinking was that you should only have one point that you return from a method to avoid a different breed of coder error. Current thinking (at least by some of the tool vendors such as jetbrains resharper, etc) seems to be that wrapping the least amount of code inside of conditional statements is better. Anything more than that would be subjective so I'll leave it at that.
This kind of "Optimizations" are not worth the time spent on refactoring your code, because all modern compilers does already enough small optimizations that they make this kind of tips trivial.
As mentioned above, performance optimization is done with profilers, to measure how your system is performing and find the potential bottlenecks before applying a performance fix, and then again after the fix to see if your fix is any good.
Required reading: Cyclomatic complexity
Cyclomatic complexity is a quantitative measure of the number of linearly independent paths through a program's source code.
Which means, every time you branch using an if statement you increase the cyclomatic complexity by 1.
To test each linearly independent path through the program; in this case, the number of test cases will equal the cyclomatic complexity of the program.
Which means, if you want to test your code completely, for each if statement you would have to introduce a new test case.
So, by introducing more if statements the complexity of your code increases, as does the number of test cases required to test it.
By removing if statements, your code complexity decreases as does the number of test cases required to test.
This question has been puzzling me for a long time now. I come from a heavy and long C++ background, and since I started programming in C# and dealing with garbage collection I always had the feeling that such 'magic' would come at a cost.
I recently started working in a big MMO project written in Java (server side). My main task is to optimize memory consumption and CPU usage. Hundreds of thousands of messages per second are being sent and the same amount of objects are created as well. After a lot of profiling we discovered that the VM garbage collector was eating a lot of CPU time (due to constant collections) and decided to try to minimize object creation, using pools where applicable and reusing everything we can. This has proven to be a really good optimization so far.
So, from what I've learned, having a garbage collector is awesome, but you can't just pretend it does not exist, and you still need to take care about object creation and what it implies (at least in Java and a big application like this).
So, is this also true for .NET? if it is, to what extent?
I often write pairs of functions like these:
// Combines two envelopes and the result is stored in a new envelope.
public static Envelope Combine( Envelope a, Envelope b )
{
    var envelope = new Envelope( a.Length, 0, 1, 1 );
    Combine( a, b, envelope );
    return envelope;
}

// Combines two envelopes and the result is 'written' to the specified envelope.
public static void Combine( Envelope a, Envelope b, Envelope result )
{
    result.Clear();
    ...
}
A second function is provided in case someone has an already made Envelope that may be reused to store the result, but I find this a little odd.
I also sometimes write structs when I'd rather use classes, just because I know there'll be tens of thousands of instances being constantly created and disposed, and this feels really odd to me.
I know that as a .NET developer I shouldn't be worrying about this kind of issues, but my experience with Java and common sense tells me that I should.
Any light and thoughts on this matter would be much appreciated. Thanks in advance.
Yes, it's true of .NET as well. Most of us have the luxury of ignoring the details of memory management, but in your case -- or in cases where high volume is causing memory congestion -- some optimization is called for.
One optimization you might consider for your case -- something I've been thinking about writing an article about, actually -- is the combination of structs and ref for real deterministic disposal.
Since you come from a C++ background, you know that in C++ you can instantiate an object either on the heap (using the new keyword and getting back a pointer) or on the stack (by instantiating it like a primitive type, i.e. MyType myType;). You can pass stack-allocated items by reference to functions and methods by telling the function to accept a reference (using the & keyword before the parameter name in your declaration). Doing this keeps your stack-allocated object in memory for as long as the method in which it was allocated remains in scope; once it goes out of scope, the object is reclaimed, the destructor is called, ba-da-bing, ba-da-boom, Bob's yer Uncle, and all done without pointers.
I used that trick to create some amazingly performant code in my C++ days -- at the expense of a larger stack and the risk of a stack overflow, naturally, but careful analysis managed to keep that risk very minimal.
My point is that you can do the same trick in C# using structs and refs. The tradeoffs? In addition to the risk of a stack overflow if you're not careful or if you use large objects, you are limited to no inheritance, and you tightly couple your code, making it less testable and less maintainable. Additionally, you still have to deal with issues whenever you use core library calls.
Still, it might be worth a look-see in your case.
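To make the idea concrete, here is a rough sketch of the struct-plus-ref pattern; the EnvelopeData type and its fields are invented for the example:

// A small value type: as a local variable it lives on the stack,
// so no garbage is produced for it.
public struct EnvelopeData
{
    public double Min;
    public double Max;
}

public static class EnvelopeMath
{
    // Passing by ref lets the callee update the caller's stack-allocated
    // value in place, without copying it or allocating on the heap.
    public static void Combine(ref EnvelopeData target, ref EnvelopeData other)
    {
        if (other.Min < target.Min) target.Min = other.Min;
        if (other.Max > target.Max) target.Max = other.Max;
    }
}

// Usage:
// var a = new EnvelopeData { Min = 0, Max = 10 };
// var b = new EnvelopeData { Min = -5, Max = 3 };
// EnvelopeMath.Combine(ref a, ref b);  // a is updated in place, nothing hits the heap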
This is one of those issues where it is really hard to pin down a definitive answer in a way that will help you. The .NET GC is very good at tuning itself to the memory needs of you application. Is it good enough that your application can be coded without you needing to worry about memory management? I don't know.
There are definitely some common-sense things you can do to ensure that you don't hammer the GC. Using value types is definitely one way of accomplishing this but you need to be careful that you don't introduce other issues with poorly-written structs.
For the most part however I would say that the GC will do a good job managing all this stuff for you.
I've seen too many cases where people "optimize" the crap out of their code without much concern for how well it's written or how well it works even. I think the first thought should go towards making code solve the business problem at hand. The code should be well crafted and easily maintainable as well as properly tested.
After all of that, optimization should be considered, if testing indicates it's needed.
Random advice:
Someone mentioned putting dead objects in a queue to be reused instead of letting the GC at them... but be careful, as this means the GC may have more crap to move around when it consolidates the heap, and may not actually help you. Also, the GC is possibly already using techniques like this. Also, I know of at least one project where the engineers tried pooling and it actually hurt performance. It's tough to get a deep intuition about the GC. I'd recommend having a pooled and unpooled setting so you can always measure the perf differences between the two.
Another technique you might use in C# is dropping down to native C++ for key parts that aren't performing well enough... and then use the Dispose pattern in C# or C++/CLI for managed objects which hold unmanaged resources.
Also, be sure when you use value types that you are not using them in ways that implicitly box them and put them on the heap, which might be easy to do coming from Java.
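A quick illustration of what implicit boxing looks like, in case it helps:

using System;
using System.Collections;          // non-generic collections
using System.Collections.Generic;

class BoxingDemo
{
    static void Main()
    {
        int x = 42;

        // Boxing: the int is copied into a new object on the heap.
        object boxed = x;
        var untyped = new ArrayList();
        untyped.Add(x);                 // also boxes

        // No boxing: a generic collection stores the ints directly.
        var typed = new List<int>();
        typed.Add(x);

        // Unboxing copies the value back out (and throws if the cast is wrong).
        int y = (int)boxed;
        Console.WriteLine(y);
    }
}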
Finally, be sure to find a good memory profiler.
I have the same thought all the time.
The truth is, though we were taught to watch out for unnecessary CPU cycles and memory consumption, the cost of little imperfections in our code is just negligible in practice.
If you are aware of that and keep an eye on it, I believe it is okay to write less-than-perfect code. If you have started with .NET/Java and have no prior experience in low level programming, the chances are you will write very abusive and ineffective code.
And anyway, as they say, premature optimization is the root of all evil. You can spend hours optimizing one little function and then it turns out that some other part of the code gives you a bottleneck. Just keep a balance: do it simply and don't overthink it.
Although the Garbage Collector is there, bad code remains bad code. Therefore I would say yes, as a .NET developer you should still care about how many objects you create and, more importantly, about writing optimized code.
I have seen a considerable amount of projects get rejected because of this reason in Code Reviews inside our environment, and I strongly believe it is important.
.NET Memory management is very good and being able to programmatically tweak the GC if you need to is good.
I like the fact that you can create your own Dispose methods on classes by implementing IDisposable and tweaking it to your needs. This is great for making sure that connections to networks/files/databases are always cleaned up and not leaking that way. There is also the worry of cleaning up too early, though.
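A minimal example of the pattern being described, assuming the resource is a file (the LogWriter class is invented for the sketch):

using System;
using System.IO;

// The wrapped resource is released deterministically via using/Dispose
// rather than whenever the GC happens to get around to it.
public sealed class LogWriter : IDisposable
{
    private readonly StreamWriter _writer;

    public LogWriter(string path)
    {
        _writer = new StreamWriter(path);
    }

    public void Write(string message)
    {
        _writer.WriteLine(message);
    }

    public void Dispose()
    {
        _writer.Dispose();
    }
}

// Usage: the file handle is closed as soon as the block exits.
// using (var log = new LogWriter("app.log"))
// {
//     log.Write("done");
// }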
You might consider writing a set of object caches. Instead of creating new instances, you could keep a list of available objects somewhere. It would help you avoid the GC.
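Something like this very small cache is presumably what is meant; it is only a sketch (not thread-safe, unbounded), and the pooled type just needs a parameterless constructor:

using System.Collections.Generic;

public class SimplePool<T> where T : new()
{
    private readonly Stack<T> _items = new Stack<T>();

    // Hand out a cached instance if one is available, otherwise create one.
    public T Rent()
    {
        return _items.Count > 0 ? _items.Pop() : new T();
    }

    // Give the instance back so it can be reused instead of collected.
    public void Return(T item)
    {
        _items.Push(item);
    }
}

// Usage (MyMessage is a made-up example type):
// var pool = new SimplePool<MyMessage>();
// var msg = pool.Rent();
// ... use msg ...
// pool.Return(msg);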
I agree with all points said above: the garbage collector is great, but it shouldn't be used as a crutch.
I've sat through many wasted hours in code reviews debating over finer points of the CLR. The best definitive answer is to develop a culture of performance in your organization and actively profile your application using a tool. Bottlenecks will appear and you address them as needed.
I think you answered your own question -- if it becomes a problem, then yes! I don't think this is a .Net vs. Java question, it's really a "should we go to exceptional lengths to avoid doing certain types of work" question. If you need better performance than you have, and after doing some profiling you find that object instantiation or garbage collection is taking tons of time, then that's the time to try some unusual approach (like the pooling you mentioned).
I wish I were a "software legend" and could speak of this in my own voice and breath, but since I'm not, I rely upon SL's for such things.
I suggest the following blog post by Andrew Hunter on .NET GC would be helpful:
http://www.simple-talk.com/dotnet/.net-framework/understanding-garbage-collection-in-.net/
Even beyond performance aspects, the semantics of a method which modifies a passed-in mutable object will often be cleaner than those of a method which returns a new mutable object based upon an old one. The statements:
munger.Munge(someThing, otherParams);
someThing = munger.ComputeMungedVersion(someThing, otherParams);
may in some cases behave identically, but while the former does one thing, the latter will do two--equivalent to:
someThing = someThing.Clone(); // Or duplicate it via some other means
munger.Munge(someThing, otherParams);
If someThing is the only reference, anywhere in the universe, to a particular object, then replacing it with a reference to a clone will be a no-op, and so modifying a passed-in object will be equivalent to returning a new one. If, however, someThing identifies an object to which other references exist, the former statement would modify the object identified by all those references, leaving all the references attached to it, while the latter would cause someThing to become "detached".
Depending upon the type of someThing and how it is used, its attachment or detachment may be moot issues. Attachment would be relevant if some object which holds a reference to the object could modify it while other references exist. Attachment is moot if the object will never be modified, or if no references outside someThing itself can possibly exist. If one can show that either of the latter conditions will apply, then replacing someThing with a reference to a new object will be fine. Unless the type of someThing is immutable, however, such a demonstration would require documentation beyond the declaration of someThing, since .NET provides no standard means of annotating that a particular reference will identify an object which--despite its being of mutable type--nobody is allowed to modify, nor of annotating that a particular reference should be the only one anywhere in the universe that identifies a particular object.