To cache or not to cache - GetCustomAttributes

To cache or not to cache - GetCustomAttributes - c#

I currently have a function:
public static Attribute GetAttribute(MemberInfo Member, Type AttributeType)
{
Object[] Attributes = Member.GetCustomAttributes(AttributeType, true);
if (Attributes.Length > 0)
return (Attribute)Attributes[0];
else
return null;
}
I am wondering if it would be worthwhile caching all the attributes on a property into a
Attribute = _cache[MemberInfo][Type] dictionary,
This would require using GetCustomAttributes without any type parameter then enumerating over the result. Is it worth it?

You will get better bangs for your bucks if you replace the body of your method with this:
return Attribute.GetCustomAttribute(Member, AttributeType,false); // only look in the current member and don't go up the inheritance tree.
If you really need to cache on a type-basis:
public static class MyCacheFor<T>
{
static MyCacheFor()
{
// grab the data
Value = ExtractExpensiveData(typeof(T));
}
public static readonly MyExpensiveToExtractData Value;
private static MyExpensiveToExtractData ExtractExpensiveData(Type type)
{
// ...
}
}
Beats dictionary lookups everytime. Plus it's threadsafe:)
Cheers,
Florian
PS: Depends how often you call this. I had some cases where doing a lot of serialization using reflection really called for caching, as usual, you want to measure the performance gain versus the memory usage increase. Instrument your memory use and profile your CPU time.

The only way you can know for sure, is to profile it. I am sorry if this sounds like a cliche. But the reason why a saying is a cliche is often because it's true.
Caching the attribute is actually making the code more complex, and more error prone. So you might want to take this into account-- your development time-- before you decide.
So like optimization, don't do it unless you have to.
From my experience ( I am talking about AutoCAD-like Windows Application, with a lot of click-edit GUI operations and heavy number crunching), the reading of custom attribute is never--even once-- the performance bottleneck.

I just had a scenario where GetCustomAttributes turned out to be the performance bottleneck. In my case it was getting called hundreds of thousands of times in a dataset with many rows and this made the problem easy to isolate. Caching the attributes solved the problem.
Preliminary testing led to a barely noticeable performance hit at about 5000 calls on a modern day machine. (And it became drastically more noticeable as the dataset size increased.)
I generally agree with the other answers about premature optimization, however, on a scale of CPU instruction to DB call, I'd suggest that GetCustomAttributes would lean more towards the latter.

Your question is a case of premature optimization.
You don't know the inner workings of the reflection classes and therefore are making assumptions about the performance implications of calling GetCustomAttributes multiple times. The method itself could well cache its output already, meaning your code would actually add overhead with no performance improvement.
Save your brain cycles for thinking about things which you already know are problems!

Old question but GetCustomAttributes is costly/heavyweight
Using a cache if it is causing performance problems can be a good idea
The article I linked: (Dodge Common Performance Pitfalls to Craft Speedy Applications) was taken down but here a link to an archived version:
https://web.archive.org/web/20150118044646/http://msdn.microsoft.com:80/en-us/magazine/cc163759.aspx

Are you actually having a performance problem? If not then don't do it until you need it.
It might help depending on how often you call the method with the same paramters. If you only call it once per MemberInfo, Type combination then it won't do any good. Even if you do cache it you are trading speed for memory consumption. That might be fine for your application.

Related

Which is better in accessing a property value?

Which is better in accessing a property value?
Accessing like this
propertyobjA.objB.Prop1
propertyobjA.objB.Prop2
or assign to var
var objB = propertyobjA.objB;
then call objB.Prop1 and objB.Prop1
Which one improves performance in c#?

To be perfectly the honest, the answer is likely that the second will be faster, but I can pretty much guarantee that it will not matter in the slightest. You should be careful of thinking too hard about optimisation too early. 99% of all performance issues are down to much larger issues such as hitting a database too frequently, etc., not trivial issues like this. Even if there was a tiny difference between the two cases, unless this is some of the most time-critical software on the planet, what matters is readability (not that either are hard to read in this case), not which is faster.

It depends on what objB is. If you are calculating something (which you shouldn't do but can do) then of course assigning it to a value will yield better performance.
Another note, you should avoid having dependencies on sub properties of a variable, since you are putting a higher coupling between the classes.

I think this won't make a big difference performancewise (second alternative might be a bit faster). But this is not the place where your performance problems (if any) come from.
UPDATE: Thinking about, the value of propertyobjA.objB could change between getting Prop1 and Prop2, so the two alternatives cannot be considered as being the same code.

The impact to performance largely depends on the implementation of the propertyObjA.objB property getter. For instance, if it is simply implemented as:
public Foo objB { get { return this._objB; } }
Then calling that twice will have a negligible impact on performance.
If, however, that same property did something computationally expensive, then your second suggestion would perform better.
That being said, the framework guidelines state that you should not use property getters to hide potentially computationally expensive operations, instead preferring a method call instead, e.g.:
public objB ComputeB ();

You really ought to not concern yourself with things like that when writing code in a higher level language such as c#.
Modern compilers of such languages as c# and java are extremely sofisticated and will perform all kinds optimizations on your code. The end result for you as developer is that you will never see a difference in performance when writing a particular trivial piece of code one way or the other. The compiler will pick the most optimal way.
Everything else is down to preference. If you like to chain several property accesses, that's fine. If you like to assign an intermediate result to a variable to improve readability of your code, that's fine too.

Does inverting the "if" improve performance? [duplicate]

When I ran ReSharper on my code, for example:
if (some condition)
{
Some code...
}
ReSharper gave me the above warning (Invert "if" statement to reduce nesting), and suggested the following correction:
if (!some condition) return;
Some code...
I would like to understand why that's better. I always thought that using "return" in the middle of a method problematic, somewhat like "goto".

It is not only aesthetic, but it also reduces the maximum nesting level inside the method. This is generally regarded as a plus because it makes methods easier to understand (and indeed, many static analysis tools provide a measure of this as one of the indicators of code quality).
On the other hand, it also makes your method have multiple exit points, something that another group of people believes is a no-no.
Personally, I agree with ReSharper and the first group (in a language that has exceptions I find it silly to discuss "multiple exit points"; almost anything can throw, so there are numerous potential exit points in all methods).
Regarding performance: both versions should be equivalent (if not at the IL level, then certainly after the jitter is through with the code) in every language. Theoretically this depends on the compiler, but practically any widely used compiler of today is capable of handling much more advanced cases of code optimization than this.

A return in the middle of the method is not necessarily bad. It might be better to return immediately if it makes the intent of the code clearer. For example:
double getPayAmount() {
double result;
if (_isDead) result = deadAmount();
else {
if (_isSeparated) result = separatedAmount();
else {
if (_isRetired) result = retiredAmount();
else result = normalPayAmount();
};
}
return result;
};
In this case, if _isDead is true, we can immediately get out of the method. It might be better to structure it this way instead:
double getPayAmount() {
if (_isDead) return deadAmount();
if (_isSeparated) return separatedAmount();
if (_isRetired) return retiredAmount();
return normalPayAmount();
};
I've picked this code from the refactoring catalog. This specific refactoring is called: Replace Nested Conditional with Guard Clauses.

This is a bit of a religious argument, but I agree with ReSharper that you should prefer less nesting. I believe that this outweighs the negatives of having multiple return paths from a function.
The key reason for having less nesting is to improve code readability and maintainability. Remember that many other developers will need to read your code in the future, and code with less indentation is generally much easier to read.
Preconditions are a great example of where it is okay to return early at the start of the function. Why should the readability of the rest of the function be affected by the presence of a precondition check?
As for the negatives about returning multiple times from a method - debuggers are pretty powerful now, and it's very easy to find out exactly where and when a particular function is returning.
Having multiple returns in a function is not going to affect the maintainance programmer's job.
Poor code readability will.

As others have mentioned, there shouldn't be a performance hit, but there are other considerations. Aside from those valid concerns, this also can open you up to gotchas in some circumstances. Suppose you were dealing with a double instead:
public void myfunction(double exampleParam){
if(exampleParam > 0){
//Body will *not* be executed if Double.IsNan(exampleParam)
}
}
Contrast that with the seemingly equivalent inversion:
public void myfunction(double exampleParam){
if(exampleParam <= 0)
return;
//Body *will* be executed if Double.IsNan(exampleParam)
}
So in certain circumstances what appears to be a a correctly inverted if might not be.

The idea of only returning at the end of a function came back from the days before languages had support for exceptions. It enabled programs to rely on being able to put clean-up code at the end of a method, and then being sure it would be called and some other programmer wouldn't hide a return in the method that caused the cleanup code to be skipped. Skipped cleanup code could result in a memory or resource leak.
However, in a language that supports exceptions, it provides no such guarantees. In a language that supports exceptions, the execution of any statement or expression can cause a control flow that causes the method to end. This means clean-up must be done through using the finally or using keywords.
Anyway, I'm saying I think a lot of people quote the 'only return at the end of a method' guideline without understanding why it was ever a good thing to do, and that reducing nesting to improve readability is probably a better aim.

I'd like to add that there is name for those inverted if's - Guard Clause. I use it whenever I can.
I hate reading code where there is if at the beginning, two screens of code and no else. Just invert if and return. That way nobody will waste time scrolling.
http://c2.com/cgi/wiki?GuardClause

It doesn't only affect aesthetics, but it also prevents code nesting.
It can actually function as a precondition to ensure that your data is valid as well.

This is of course subjective, but I think it strongly improves on two points:
It is now immediately obvious that your function has nothing left to do if condition holds.
It keeps the nesting level down. Nesting hurts readability more than you'd think.

Multiple return points were a problem in C (and to a lesser extent C++) because they forced you to duplicate clean-up code before each of the return points. With garbage collection, the try | finally construct and using blocks, there's really no reason why you should be afraid of them.
Ultimately it comes down to what you and your colleagues find easier to read.

Guard clauses or pre-conditions (as you can probably see) check to see if a certain condition is met and then breaks the flow of the program. They're great for places where you're really only interested in one outcome of an if statement. So rather than say:
if (something) {
// a lot of indented code
}
You reverse the condition and break if that reversed condition is fulfilled
if (!something) return false; // or another value to show your other code the function did not execute
// all the code from before, save a lot of tabs
return is nowhere near as dirty as goto. It allows you to pass a value to show the rest of your code that the function couldn't run.
You'll see the best examples of where this can be applied in nested conditions:
if (something) {
do-something();
if (something-else) {
do-another-thing();
} else {
do-something-else();
}
}
vs:
if (!something) return;
do-something();
if (!something-else) return do-something-else();
do-another-thing();
You'll find few people arguing the first is cleaner but of course, it's completely subjective. Some programmers like to know what conditions something is operating under by indentation, while I'd much rather keep method flow linear.
I won't suggest for one moment that precons will change your life or get you laid but you might find your code just that little bit easier to read.

Performance-wise, there will be no noticeable difference between the two approaches.
But coding is about more than performance. Clarity and maintainability are also very important. And, in cases like this where it doesn't affect performance, it is the only thing that matters.
There are competing schools of thought as to which approach is preferable.
One view is the one others have mentioned: the second approach reduces the nesting level, which improves code clarity. This is natural in an imperative style: when you have nothing left to do, you might as well return early.
Another view, from the perspective of a more functional style, is that a method should have only one exit point. Everything in a functional language is an expression. So if statements must always have an else clauses. Otherwise the if expression wouldn't always have a value. So in the functional style, the first approach is more natural.

There are several good points made here, but multiple return points can be unreadable as well, if the method is very lengthy. That being said, if you're going to use multiple return points just make sure that your method is short, otherwise the readability bonus of multiple return points may be lost.

Performance is in two parts. You have performance when the software is in production, but you also want to have performance while developing and debugging. The last thing a developer wants is to "wait" for something trivial. In the end, compiling this with optimization enabled will result in similar code. So it's good to know these little tricks that pay off in both scenarios.
The case in the question is clear, ReSharper is correct. Rather than nesting if statements, and creating new scope in code, you're setting a clear rule at the start of your method. It increases readability, it will be easier to maintain, and it reduces the amount of rules one has to sift through to find where they want to go.

Personally I prefer only 1 exit point. It's easy to accomplish if you keep your methods short and to the point, and it provides a predictable pattern for the next person who works on your code.
eg.
bool PerformDefaultOperation()
{
bool succeeded = false;
DataStructure defaultParameters;
if ((defaultParameters = this.GetApplicationDefaults()) != null)
{
succeeded = this.DoSomething(defaultParameters);
}
return succeeded;
}
This is also very useful if you just want to check the values of certain local variables within a function before it exits. All you need to do is place a breakpoint on the final return and you are guaranteed to hit it (unless an exception is thrown).

Avoiding multiple exit points can lead to performance gains. I am not sure about C# but in C++ the Named Return Value Optimization (Copy Elision, ISO C++ '03 12.8/15) depends on having a single exit point. This optimization avoids copy constructing your return value (in your specific example it doesn't matter). This could lead to considerable gains in performance in tight loops, as you are saving a constructor and a destructor each time the function is invoked.
But for 99% of the cases saving the additional constructor and destructor calls is not worth the loss of readability nested if blocks introduce (as others have pointed out).

Many good reasons about how the code looks like. But what about results?
Let's take a look to some C# code and its IL compiled form:
using System;
public class Test {
public static void Main(string[] args) {
if (args.Length == 0) return;
if ((args.Length+2)/3 == 5) return;
Console.WriteLine("hey!!!");
}
}
This simple snippet can be compiled. You can open the generated .exe file with ildasm and check what is the result. I won't post all the assembler thing but I'll describe the results.
The generated IL code does the following:
If the first condition is false, jumps to the code where the second is.
If it's true jumps to the last instruction. (Note: the last instruction is a return).
In the second condition the same happens after the result is calculated. Compare and: got to the Console.WriteLine if false or to the end if this is true.
Print the message and return.
So it seems that the code will jump to the end. What if we do a normal if with nested code?
using System;
public class Test {
public static void Main(string[] args) {
if (args.Length != 0 && (args.Length+2)/3 != 5)
{
Console.WriteLine("hey!!!");
}
}
}
The results are quite similar in IL instructions. The difference is that before there were two jumps per condition: if false go to next piece of code, if true go to the end. And now the IL code flows better and has 3 jumps (the compiler optimized this a bit):
First jump: when Length is 0 to a part where the code jumps again (Third jump) to the end.
Second: in the middle of the second condition to avoid one instruction.
Third: if the second condition is false, jump to the end.
Anyway, the program counter will always jump.

In theory, inverting if could lead to better performance if it increases branch prediction hit rate. In practice, I think it is very hard to know exactly how branch prediction will behave, especially after compiling, so I would not do it in my day-to-day development, except if I am writing assembly code.
More on branch prediction here.

That is simply controversial. There is no "agreement among programmers" on the question of early return. It's always subjective, as far as I know.
It's possible to make a performance argument, since it's better to have conditions that are written so they are most often true; it can also be argued that it is clearer. It does, on the other hand, create nested tests.
I don't think you will get a conclusive answer to this question.

There are a lot of insightful answers there already, but still, I would to direct to a slightly different situation: Instead of precondition, that should be put on top of a function indeed, think of a step-by-step initialization, where you have to check for each step to succeed and then continue with the next. In this case, you cannot check everything at the top.
I found my code really unreadable when writing an ASIO host application with Steinberg's ASIOSDK, as I followed the nesting paradigm. It went like eight levels deep, and I cannot see a design flaw there, as mentioned by Andrew Bullock above. Of course, I could have packed some inner code to another function, and then nested the remaining levels there to make it more readable, but this seems rather random to me.
By replacing nesting with guard clauses, I even discovered a misconception of mine regarding a portion of cleanup-code that should have occurred much earlier within the function instead of at the end. With nested branches, I would never have seen that, you could even say they led to my misconception.
So this might be another situation where inverted ifs can contribute to a clearer code.

It's a matter of opinion.
My normal approach would be to avoid single line ifs, and returns in the middle of a method.
You wouldn't want lines like it suggests everywhere in your method but there is something to be said for checking a bunch of assumptions at the top of your method, and only doing your actual work if they all pass.

In my opinion early return is fine if you are just returning void (or some useless return code you're never gonna check) and it might improve readability because you avoid nesting and at the same time you make explicit that your function is done.
If you are actually returning a returnValue - nesting is usually a better way to go cause you return your returnValue just in one place (at the end - duh), and it might make your code more maintainable in a whole lot of cases.

I'm not sure, but I think, that R# tries to avoid far jumps. When You have IF-ELSE, compiler does something like this:
Condition false -> far jump to false_condition_label
true_condition_label:
instruction1
...
instruction_n
false_condition_label:
instruction1
...
instruction_n
end block
If condition is true there is no jump and no rollout L1 cache, but jump to false_condition_label can be very far and processor must rollout his own cache. Synchronising cache is expensive. R# tries replace far jumps into short jumps and in this case there is bigger probability, that all instructions are already in cache.

I think it depends on what you prefer, as mentioned, theres no general agreement afaik.
To reduce annoyment, you may reduce this kind of warning to "Hint"

My idea is that the return "in the middle of a function" shouldn't be so "subjective".
The reason is quite simple, take this code:
function do_something( data ){
if (!is_valid_data( data ))
return false;
do_something_that_take_an_hour( data );
istance = new object_with_very_painful_constructor( data );
if ( istance is not valid ) {
error_message( );
return ;
}
connect_to_database ( );
get_some_other_data( );
return;
}
Maybe the first "return" it's not SO intuitive, but that's really saving.
There are too many "ideas" about clean codes, that simply need more practise to lose their "subjective" bad ideas.

There are several advantages to this sort of coding but for me the big win is, if you can return quick you can improve the speed of your application. IE I know that because of Precondition X that I can return quickly with an error. This gets rid of the error cases first and reduces the complexity of your code. In a lot of cases because the cpu pipeline can be now be cleaner it can stop pipeline crashes or switches. Secondly if you are in a loop, breaking or returning out quickly can save you a lots of cpu. Some programmers use loop invariants to do this sort of quick exit but in this you can broke your cpu pipeline and even create memory seek problem and mean the the cpu needs to load from outside cache. But basically I think you should do what you intended, that is end the loop or function not create a complex code path just to implement some abstract notion of correct code. If the only tool you have is a hammer then everything looks like a nail.

Faster assignment or check for bool values

The question is simple, which is faster between CalledOften1 and CalledOften2
class MyTest
{
public bool test = false;
void CalledOften1()
{
if (!test) test = true;
DoSomething();
}
void CalledOften2()
{
test = true;
DoSomething();
}
}
Is the compiler optimized (if possible) to avoid future assignments of test if it's already true?
UPDATE:
This question is just an information, I will not use the if (bla) style if I can write test=true, I prefer code readability.

I prefer to measure for these sorts of questions rather than guess:
CalledOften1: 52 million operations per second
CalledOften2: 53 million operations per second
So they are nearly the same. If anything, the simpler method is also the faster.

This is a perfect example of premature optimization.
If you want to set test to true every time, just set it. Don't complicate your code for a theorized speedup.
That being said, the reduced instruction set of the second example, along with being simpler and more maintainable, is most likely faster due to avoiding the branching and reducing the number of instructions. A single assignment of a bool is a very fast operation. If you really need to know how much faster it may be, I would profile this yourself. However, I suspect that either would be fast enough in any case.

I would expect the second version to be slightly faster, given that it doesn't involve any branching. It also expresses the intention of "make sure the variable is true, whatever it was before" more clearly IMO. However:
I doubt that it's significant
Any number of actual changes in context could make the results change (including your code, or the version of the framework you're running against)
Write the clearest code first, and optimize later
Benchmark this against your real code, under realistic conditions before you decide to change anything

Compiler optimizes only something that is definite at compile time. This is changed at runtime so answer is no. Compiler could optimize if you were checking against constant. CalledOften1 is faster, but the magnitude is so small that you would not notice. This is kind of microptimisation you should avoid.

If I had to guess, I would say that CalledOften2 is more optimized, as there is no logic test operation done.
In the end, if you are looking at this level of optimization, then your application will probably go as fast as it can. Any performance gain you get out of this type of optimization will likely never be noticed by anyone.
My two cents,
Brian

Premature optimization is the root of all evil. Use the one that expresses your intent most clearly.
(I'm guessing a read+branch is going to be more expensive than just a write, but don't really know the CLR. The important thing is that computers are increasing in speed exponentially, and programmers aren't. Algorithmic improvements in performance bottlenecks are worth exploring, barely measurable constant-time improvements for their own sake aren't.)

How to get optimization from a "pure function" in C#?

If I have the following function, it is considered pure in that it has no side effects and will always produce the same result given the same input x.
public static int AddOne(int x) { return x + 1; }
As I understand it, if the runtime understood the functional purity it could optimize execution so that return values wouldn't have to be re-calculated.
Is there a way to achieve this kind of runtime optimization in C#? And I assume there is a name for this kind of optimization. What's it called?
Edit: Obviously, my example function wouldn't have a lot of benefit from this kind of optimization. The example was given to express the type of purity I had in mind rather than the real-world example.

As others have noted, if you want to save on the cost of re-computing a result you've already computed, then you can memoize the function. This trades increased memory usage for increased speed -- remember to clear your cache occasionally if you suspect that you might run out of memory should the cache grow without bound.
However, there are other optimizations one can perform on pure functions than memoizing their results. For example, pure functions, having no side effects, are usually safe to call on other threads. Algorithms which use a lot of pure functions can often be parallelized to take advantage of multiple cores.
This area will become increasingly important as massively multi-core machines become less expensive and more common. We have a long-term research goal for the C# language to figure out some way to take advantage of the power of pure functions (and impure but "isolated" functions) in the language, compiler and runtime. But doing so involves many difficult problems, problems about which there is little consensus in industry or academia as to the best approach. Top minds are thinking about it, but do not expect any major results any time soon.

if the calculation was a costly one, you could cache the result in a dictionary?
static Dictionary<int, int> cache = new Dictionary<int, int>();
public static int AddOne(int x)
{
int result;
if(!cache.TryGetValue(x, out result))
{
result = x + 1;
cache[x] = result;
}
return result;
}
of course, the dictionary lookup in this case is more costly than the add :)
There's another much cooler way to do functional memoization explained by Wes Dyer here: http://blogs.msdn.com/wesdyer/archive/2007/01/26/function-memoization.aspx - if you do a LOT of this caching, then his Memoize function might save you a lot of code...

I think you're looking for functional memoization

The technique you are after is memoization: cache the results of execution, keyed off the arguments passed in to the function, in an array or dictionary. Runtimes do not tend to apply it automatically, although there are certainly cases where they would. Neither C# nor .NET applies memoization automatically. You can implement memoization yourself - it's rather easy -, but doing so is generally useful only for slower pure functions where you tend to repeat calculations and where you have enough memory.

This will probably be inlined (aka inline expansion) by the compiler ...
Just make sure you compile your code with the "Optimize Code" flag set (in VS : project properties / build tab / Optimize Code)
The other thing you can do is to cache the results (aka memoization). However, there is a huge initial performance hit due to your lookup logic, so this is interesting only for slow functions (ie not an int addition).
There is also a memory impact, but this can be managed through a clever use of weak references.
As I understand it, if the runtime
understood the functional purity it
could optimize execution so that
return values wouldn't have to be
re-calculated.
In your example, the runtime WILL have to compute the result, unless x is known at compile time. In that case, your code will be further optimized through the use of constant folding

How could the compiler do that ? How does it know what values of x are going to be passed in at runtime?
and re: other answers that mention inlining...
My understanding is that inlining (as an optimization) is warranted for small functions that are used only once (or only a very few times...) not because they have no side effects...

A compiler can optimize this function through a combination of inlining (replacing a function call with the body of that function at the call site) and constant propagation (replacing an expression with no free variables with the result of that expression). For example, in this bit of code:
AddOne(5);
AddOne can be inlined:
5 + 1;
Constant propagation can then simplify the expression:
6;
(Dead code elimination can then simplify this expression even further, but this is just an example).
Knowing that AddOne() has no side effects might also enable the a compiler to perform common subexpression elimination, so that:
AddOne(3) + AddOne(3)
may be transformed to:
int x = AddOne(3);
x + x;
or by strength reduction, even:
2*AddOne(3);
There is no way to command the c# JIT compiler to perform these optimizations; it optimizes at its own discretion. But it's pretty smart, and you should feel comfortable relying on it to perform these sorts of transformations without your intervention.

Another option is to use a fody plugin https://github.com/Dresel/MethodCache
you can decorate methods that should be cached. When using this you should of course take into consideration all the comments mentioned in the other answers.

Invert "if" statement to reduce nesting

When I ran ReSharper on my code, for example:
if (some condition)
{
Some code...
}
ReSharper gave me the above warning (Invert "if" statement to reduce nesting), and suggested the following correction:
if (!some condition) return;
Some code...
I would like to understand why that's better. I always thought that using "return" in the middle of a method problematic, somewhat like "goto".

It is not only aesthetic, but it also reduces the maximum nesting level inside the method. This is generally regarded as a plus because it makes methods easier to understand (and indeed, many static analysis tools provide a measure of this as one of the indicators of code quality).
On the other hand, it also makes your method have multiple exit points, something that another group of people believes is a no-no.
Personally, I agree with ReSharper and the first group (in a language that has exceptions I find it silly to discuss "multiple exit points"; almost anything can throw, so there are numerous potential exit points in all methods).
Regarding performance: both versions should be equivalent (if not at the IL level, then certainly after the jitter is through with the code) in every language. Theoretically this depends on the compiler, but practically any widely used compiler of today is capable of handling much more advanced cases of code optimization than this.

A return in the middle of the method is not necessarily bad. It might be better to return immediately if it makes the intent of the code clearer. For example:
double getPayAmount() {
double result;
if (_isDead) result = deadAmount();
else {
if (_isSeparated) result = separatedAmount();
else {
if (_isRetired) result = retiredAmount();
else result = normalPayAmount();
};
}
return result;
};
In this case, if _isDead is true, we can immediately get out of the method. It might be better to structure it this way instead:
double getPayAmount() {
if (_isDead) return deadAmount();
if (_isSeparated) return separatedAmount();
if (_isRetired) return retiredAmount();
return normalPayAmount();
};
I've picked this code from the refactoring catalog. This specific refactoring is called: Replace Nested Conditional with Guard Clauses.

This is a bit of a religious argument, but I agree with ReSharper that you should prefer less nesting. I believe that this outweighs the negatives of having multiple return paths from a function.
The key reason for having less nesting is to improve code readability and maintainability. Remember that many other developers will need to read your code in the future, and code with less indentation is generally much easier to read.
Preconditions are a great example of where it is okay to return early at the start of the function. Why should the readability of the rest of the function be affected by the presence of a precondition check?
As for the negatives about returning multiple times from a method - debuggers are pretty powerful now, and it's very easy to find out exactly where and when a particular function is returning.
Having multiple returns in a function is not going to affect the maintainance programmer's job.
Poor code readability will.

As others have mentioned, there shouldn't be a performance hit, but there are other considerations. Aside from those valid concerns, this also can open you up to gotchas in some circumstances. Suppose you were dealing with a double instead:
public void myfunction(double exampleParam){
if(exampleParam > 0){
//Body will *not* be executed if Double.IsNan(exampleParam)
}
}
Contrast that with the seemingly equivalent inversion:
public void myfunction(double exampleParam){
if(exampleParam <= 0)
return;
//Body *will* be executed if Double.IsNan(exampleParam)
}
So in certain circumstances what appears to be a a correctly inverted if might not be.

The idea of only returning at the end of a function came back from the days before languages had support for exceptions. It enabled programs to rely on being able to put clean-up code at the end of a method, and then being sure it would be called and some other programmer wouldn't hide a return in the method that caused the cleanup code to be skipped. Skipped cleanup code could result in a memory or resource leak.
However, in a language that supports exceptions, it provides no such guarantees. In a language that supports exceptions, the execution of any statement or expression can cause a control flow that causes the method to end. This means clean-up must be done through using the finally or using keywords.
Anyway, I'm saying I think a lot of people quote the 'only return at the end of a method' guideline without understanding why it was ever a good thing to do, and that reducing nesting to improve readability is probably a better aim.

I'd like to add that there is name for those inverted if's - Guard Clause. I use it whenever I can.
I hate reading code where there is if at the beginning, two screens of code and no else. Just invert if and return. That way nobody will waste time scrolling.
http://c2.com/cgi/wiki?GuardClause

It doesn't only affect aesthetics, but it also prevents code nesting.
It can actually function as a precondition to ensure that your data is valid as well.

This is of course subjective, but I think it strongly improves on two points:
It is now immediately obvious that your function has nothing left to do if condition holds.
It keeps the nesting level down. Nesting hurts readability more than you'd think.

Multiple return points were a problem in C (and to a lesser extent C++) because they forced you to duplicate clean-up code before each of the return points. With garbage collection, the try | finally construct and using blocks, there's really no reason why you should be afraid of them.
Ultimately it comes down to what you and your colleagues find easier to read.

Guard clauses or pre-conditions (as you can probably see) check to see if a certain condition is met and then breaks the flow of the program. They're great for places where you're really only interested in one outcome of an if statement. So rather than say:
if (something) {
// a lot of indented code
}
You reverse the condition and break if that reversed condition is fulfilled
if (!something) return false; // or another value to show your other code the function did not execute
// all the code from before, save a lot of tabs
return is nowhere near as dirty as goto. It allows you to pass a value to show the rest of your code that the function couldn't run.
You'll see the best examples of where this can be applied in nested conditions:
if (something) {
do-something();
if (something-else) {
do-another-thing();
} else {
do-something-else();
}
}
vs:
if (!something) return;
do-something();
if (!something-else) return do-something-else();
do-another-thing();
You'll find few people arguing the first is cleaner but of course, it's completely subjective. Some programmers like to know what conditions something is operating under by indentation, while I'd much rather keep method flow linear.
I won't suggest for one moment that precons will change your life or get you laid but you might find your code just that little bit easier to read.

Performance-wise, there will be no noticeable difference between the two approaches.
But coding is about more than performance. Clarity and maintainability are also very important. And, in cases like this where it doesn't affect performance, it is the only thing that matters.
There are competing schools of thought as to which approach is preferable.
One view is the one others have mentioned: the second approach reduces the nesting level, which improves code clarity. This is natural in an imperative style: when you have nothing left to do, you might as well return early.
Another view, from the perspective of a more functional style, is that a method should have only one exit point. Everything in a functional language is an expression. So if statements must always have an else clauses. Otherwise the if expression wouldn't always have a value. So in the functional style, the first approach is more natural.

There are several good points made here, but multiple return points can be unreadable as well, if the method is very lengthy. That being said, if you're going to use multiple return points just make sure that your method is short, otherwise the readability bonus of multiple return points may be lost.

Performance is in two parts. You have performance when the software is in production, but you also want to have performance while developing and debugging. The last thing a developer wants is to "wait" for something trivial. In the end, compiling this with optimization enabled will result in similar code. So it's good to know these little tricks that pay off in both scenarios.
The case in the question is clear, ReSharper is correct. Rather than nesting if statements, and creating new scope in code, you're setting a clear rule at the start of your method. It increases readability, it will be easier to maintain, and it reduces the amount of rules one has to sift through to find where they want to go.

Personally I prefer only 1 exit point. It's easy to accomplish if you keep your methods short and to the point, and it provides a predictable pattern for the next person who works on your code.
eg.
bool PerformDefaultOperation()
{
bool succeeded = false;
DataStructure defaultParameters;
if ((defaultParameters = this.GetApplicationDefaults()) != null)
{
succeeded = this.DoSomething(defaultParameters);
}
return succeeded;
}
This is also very useful if you just want to check the values of certain local variables within a function before it exits. All you need to do is place a breakpoint on the final return and you are guaranteed to hit it (unless an exception is thrown).

Avoiding multiple exit points can lead to performance gains. I am not sure about C# but in C++ the Named Return Value Optimization (Copy Elision, ISO C++ '03 12.8/15) depends on having a single exit point. This optimization avoids copy constructing your return value (in your specific example it doesn't matter). This could lead to considerable gains in performance in tight loops, as you are saving a constructor and a destructor each time the function is invoked.
But for 99% of the cases saving the additional constructor and destructor calls is not worth the loss of readability nested if blocks introduce (as others have pointed out).

Many good reasons about how the code looks like. But what about results?
Let's take a look to some C# code and its IL compiled form:
using System;
public class Test {
public static void Main(string[] args) {
if (args.Length == 0) return;
if ((args.Length+2)/3 == 5) return;
Console.WriteLine("hey!!!");
}
}
This simple snippet can be compiled. You can open the generated .exe file with ildasm and check what is the result. I won't post all the assembler thing but I'll describe the results.
The generated IL code does the following:
If the first condition is false, jumps to the code where the second is.
If it's true jumps to the last instruction. (Note: the last instruction is a return).
In the second condition the same happens after the result is calculated. Compare and: got to the Console.WriteLine if false or to the end if this is true.
Print the message and return.
So it seems that the code will jump to the end. What if we do a normal if with nested code?
using System;
public class Test {
public static void Main(string[] args) {
if (args.Length != 0 && (args.Length+2)/3 != 5)
{
Console.WriteLine("hey!!!");
}
}
}
The results are quite similar in IL instructions. The difference is that before there were two jumps per condition: if false go to next piece of code, if true go to the end. And now the IL code flows better and has 3 jumps (the compiler optimized this a bit):
First jump: when Length is 0 to a part where the code jumps again (Third jump) to the end.
Second: in the middle of the second condition to avoid one instruction.
Third: if the second condition is false, jump to the end.
Anyway, the program counter will always jump.

In theory, inverting if could lead to better performance if it increases branch prediction hit rate. In practice, I think it is very hard to know exactly how branch prediction will behave, especially after compiling, so I would not do it in my day-to-day development, except if I am writing assembly code.
More on branch prediction here.

That is simply controversial. There is no "agreement among programmers" on the question of early return. It's always subjective, as far as I know.
It's possible to make a performance argument, since it's better to have conditions that are written so they are most often true; it can also be argued that it is clearer. It does, on the other hand, create nested tests.
I don't think you will get a conclusive answer to this question.

There are a lot of insightful answers there already, but still, I would to direct to a slightly different situation: Instead of precondition, that should be put on top of a function indeed, think of a step-by-step initialization, where you have to check for each step to succeed and then continue with the next. In this case, you cannot check everything at the top.
I found my code really unreadable when writing an ASIO host application with Steinberg's ASIOSDK, as I followed the nesting paradigm. It went like eight levels deep, and I cannot see a design flaw there, as mentioned by Andrew Bullock above. Of course, I could have packed some inner code to another function, and then nested the remaining levels there to make it more readable, but this seems rather random to me.
By replacing nesting with guard clauses, I even discovered a misconception of mine regarding a portion of cleanup-code that should have occurred much earlier within the function instead of at the end. With nested branches, I would never have seen that, you could even say they led to my misconception.
So this might be another situation where inverted ifs can contribute to a clearer code.

It's a matter of opinion.
My normal approach would be to avoid single line ifs, and returns in the middle of a method.
You wouldn't want lines like it suggests everywhere in your method but there is something to be said for checking a bunch of assumptions at the top of your method, and only doing your actual work if they all pass.

In my opinion early return is fine if you are just returning void (or some useless return code you're never gonna check) and it might improve readability because you avoid nesting and at the same time you make explicit that your function is done.
If you are actually returning a returnValue - nesting is usually a better way to go cause you return your returnValue just in one place (at the end - duh), and it might make your code more maintainable in a whole lot of cases.

I'm not sure, but I think, that R# tries to avoid far jumps. When You have IF-ELSE, compiler does something like this:
Condition false -> far jump to false_condition_label
true_condition_label:
instruction1
...
instruction_n
false_condition_label:
instruction1
...
instruction_n
end block
If condition is true there is no jump and no rollout L1 cache, but jump to false_condition_label can be very far and processor must rollout his own cache. Synchronising cache is expensive. R# tries replace far jumps into short jumps and in this case there is bigger probability, that all instructions are already in cache.

I think it depends on what you prefer, as mentioned, theres no general agreement afaik.
To reduce annoyment, you may reduce this kind of warning to "Hint"

My idea is that the return "in the middle of a function" shouldn't be so "subjective".
The reason is quite simple, take this code:
function do_something( data ){
if (!is_valid_data( data ))
return false;
do_something_that_take_an_hour( data );
istance = new object_with_very_painful_constructor( data );
if ( istance is not valid ) {
error_message( );
return ;
}
connect_to_database ( );
get_some_other_data( );
return;
}
Maybe the first "return" it's not SO intuitive, but that's really saving.
There are too many "ideas" about clean codes, that simply need more practise to lose their "subjective" bad ideas.

There are several advantages to this sort of coding but for me the big win is, if you can return quick you can improve the speed of your application. IE I know that because of Precondition X that I can return quickly with an error. This gets rid of the error cases first and reduces the complexity of your code. In a lot of cases because the cpu pipeline can be now be cleaner it can stop pipeline crashes or switches. Secondly if you are in a loop, breaking or returning out quickly can save you a lots of cpu. Some programmers use loop invariants to do this sort of quick exit but in this you can broke your cpu pipeline and even create memory seek problem and mean the the cpu needs to load from outside cache. But basically I think you should do what you intended, that is end the loop or function not create a complex code path just to implement some abstract notion of correct code. If the only tool you have is a hammer then everything looks like a nail.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

To cache or not to cache - GetCustomAttributes - c#

Related

Which is better in accessing a property value?

Does inverting the "if" improve performance? [duplicate]

Faster assignment or check for bool values

How to get optimization from a "pure function" in C#?

Invert "if" statement to reduce nesting

Categories

Resources