If the statement above is correct, then why when I use reflector on .Net BCL I see it is used a lot?
EDIT: let me rephrase: are all the GO-TO's I see in reflector written by humans or compilers?
I think the following excerpt from the Wikipedia Article on Goto is particularly relevant here:
Probably the most famous criticism of
GOTO is a 1968 letter by Edsger
Dijkstra called Go To Statement
Considered Harmful. In that letter
Dijkstra argued that unrestricted GOTO
statements should be abolished from
higher-level languages because they
complicated the task of analyzing and
verifying the correctness of programs
(particularly those involving loops).
An alternative viewpoint is presented
in Donald Knuth's Structured
Programming with go to Statements
which analyzes many common programming
tasks and finds that in some of them
GOTO is the optimal language construct
to use.
So, on the one hand we have Edsger Dijkstra (a incredibly talented computer scientist) arguing against the use of the GOTO statement, and specifically arguing against the excessive use of the GOTO statement on the grounds that it is a much less structured way of writing code.
On the other hand, we have Donald Knuth (another incredibly talented computer scientist) arguing that using GOTO, especially using it judiciously can actually be the "best" and most optimal construct for a given piece of program code.
Ultimately, IMHO, I believe both men are correct. Dijkstra is correct in that overuse of the GOTO statement certainly makes a piece of code less readable and less structured, and this is certainly true when viewing computer programming from a purely theoretical perspective.
However, Knuth is also correct as, in the "real world", where one must take a pragmatic approach, the GOTO statement when used wisely can indeed be the best choice of language construct to use.
The above isn't really correct - it was a polemical device used by Dijkstra at a time when gotos were about the only flow control structure in use. In fact, several people have produced rebuttals, including Knuth's classic "Structured Programming Using Goto" paper (title from memory). And there are some situations (error handling, state machines) where gotos can produce clearer code (IMHO), than the "structured" alternatives.
These goto's are very often generated by the compiler, especially inside enumerators.
The compiler always knows what she's doing.
If you find yourself in the need to use goto, you should make sure it is the only option. Most often you'll find there's a better solution.
Other than that, there are very few instances the use of goto can be justified, such as when using nested loops. Again, there are other options in this case still. You could break out the inner loop in a function and use a return statement instead. You need to look closely if the additional method call is really too costly.
In response to your edit:
No, not all gotos are compiler generated, but a lot of them result from compiler generated state machines (enumerators), switch case statements or optimized if else structures. There are only a few instances you'll be able to judge whether it was the compiler or the original developer. You can get a good hint by looking at the function/class name, a compiler will generate "forbidden" names to avoid name clashes with your code. If everything looks normal and the code has not been optimized or obfuscated the use of goto is probably intended.
Keep in mind that the code you are seeing in Reflector is a disassembly -- Reflector is looking at the compiled byte codes and trying to piece together the original source code.
With that, you must remember that rules against gotos apply to high-level code. All the constructs that are used to replace gotos (for, while, break, switch etc) all compile down to code using JMPs.
So, Reflector looks at code much like this:
A:
if !(a > b)
goto B;
DoStuff();
goto A;
B: ...
And must realize that it was actually coded as:
while (a > b)
DoStuff();
Sometimes the code being read to too complicated for it to recognize the pattern.
Go To statement itself is not harmful, it is even pretty useful sometimes. Harmful are users who tend to put it in inappropriate places in their code.
When compiled down to assembly code, all control structured and converted to (un)conditional jumps. However, the optimizer may be too powerful, and when the disassembler cannot identify what control structure a jump pattern corresponds to, the always-correct statement, i.e. goto label; will be emitted.
This has nothing to do with the harm(ful|less)ness of goto.
what about a double loop or many nested loops, of which you have break out, for ex.
foreach (KeyValuePair<DateTime, ChangedValues> changedValForDate in _changedValForDates)
{
foreach (KeyValuePair<string, int> TypVal in changedValForDate.Value.TypeVales)
{
RefreshProgress("Daten werden geändert...", count++, false);
if (IsProgressCanceled)
{
goto TheEnd; //I like goto :)
}
}
}
TheEnd:
in this case you should consider that here the following should be done with break:
foreach(KeyValuePair<DateTime, ChangedValues> changedValForDate in _changedValForDates)
{
foreach (KeyValuePair<string, int> TypVal in changedValForDate.Value.TypeVales)
{
RefreshProgress("Daten werden geändert...", count++, false);
if (IsProgressCanceled)
{
break; //I miss goto :|
}
}
if (IsProgressCanceled)
{
break; //I really miss goto now :|
}//waaaAA !! so many brakets, I hate'm
}
The general rule is that you don't need to use goto. As with any rule there are of course exceptions, but as with any exceptions they are few.
The goto command is like a drug. If it's used in limited amounts only in special situations, it's good. If you use too much all the time, it will ruin your life.
When you are looing at the code using Reflector, you are not seeing the actual code. You are seeing code that is recreated from what the compiler produced from the original code. When you see a goto in the recreated code, it's not certain that there was a goto in the original code. There might be a more structured command to control the flow, like a break or a continue which has been implemented by the compiler in the same way as a goto, so that Reflector can't tell the difference.
goto considered harmful (for human to use but
for computers its okay).
because no matter how madly we(human) use goto, compiler always knows how to read the code.
Believe me...
Reading others code with gotos in it is HARD. Reading your own code with gotos in it is HARDER.
That is why you see it used in low level (machine languages) and not in high level (human languages e.g. C#,Python...) ;)
"C provides the infinitely-abusable goto statement, and labels to branch to. Formally, the goto is never necessary, and in practice it is almost always easy to write code without it. We have not used goto in this book."
-- K&R (2nd Ed.) : Page 65
I sometimes use goto when I want to perform a termination action:
static void DoAction(params int[] args)
{
foreach (int arg in args)
{
Console.WriteLine(arg);
if (arg == 93) goto exit;
}
//edit:
if (args.Length > 3) goto exit;
//Do another gazillion actions you might wanna skip.
//etc.
//etc.
exit:
Console.Write("Delete resource or whatever");
}
So instead of hitting return, I send it to the last line that performs another final action I can refer to from various places in the snippet instead of just terminating.
In decompiled code, virtually all gotos that you see will be synthetic. Don't worry about them; they're an artifact of how the code is represented at the low level.
As to valid reasons for putting them in your own code? The main one I can think of is where the language you are using does not provide a control construct suitable for the problem you are tackling; languages which make it easy to make custom control flow systems typically don't have goto at all. It's also always possible to avoid using them at all, but rearranging arbitrarily complex code into a while loop and lots of conditionals with a whole battery of control variables... that can actually make the code even more obscure (and slower too; compilers usually aren't smart enough to pick apart such complexity). The main goal of programming should be to produce a description of a program that is both clear to the computer and to the people reading it.
If it's harmful or not, it's a matter of likes and dislikes of each one. I personally don't like them, and find them very unproductive as they attempt maintainability of the code.
Now, one thing is how gotos affect our reading of the code, while another is how the jitter performs when found one. From Eric Lippert's Blog, I'd like to quote:
We first run a pass to transform loops into gotos and labels.
So, in fact the compiler transforms pretty each flow control structure into goto/label pattern while emitting IL. When reflector reads the IL of the assembly, it recognizes the pattern, and transforms it back to the appropriate flow control structure.
In some cases, when the emitted code is too complicated for reflector to understand, it just shows you the C# code that uses labels and gotos, equivalent to the IL it's reading. This is the case for example when implementing IEnumerable<T> methods with yield return and yield break statements. Those kind of methods get transformed into their own classes implementing the IEnumerable<T> interface using an underlying state machine. I believe in BCL you'll find lots of this cases.
GOTO can be useful, if it's not overused as stated above. Microsoft even uses it in several instances within the .NET Framework itself.
These goto's are very often generated by the compiler, especially inside enumerators. The compiler always knows what she's doing.
If you find yourself in the need to use goto, you should make sure it is the only option. Most often you'll find there's a better solution.
Other than that, there are very few instances the use of goto can be justified, such as when using nested loops. Again, there are other options in this case still. You could break out the inner loop in a function and use a return statement instead. You need to look closely if the additional method call is really too costly.
Related
Consider the situation in which the main logic of a method should only actually run given a certain condition. As far as I know, there are two basic ways to achieve this:
If inverse condition is true, simply return:
public void aMethod(){
if(!aBoolean) return;
// rest of method code goes here
}
or
If original condition is true, continue execution:
public void aMethod(){
if(aBoolean){
// rest of method code goes here
}
}
Now, I would guess that which of these implementations is more efficient is dependent on the language its written in and/or how if statements and return statements, and possibly method calls, are implemented by the compiler/interpreter/VM (depending on language); so the first part of my question is, is this true?
The second part of my question is, if the the answer to the first part is "yes", which of the above code-flow patterns is more efficient specifically in C#/.NET 4.6.x?
Edit:
In reference to Dark Falcon's comment: the purpose of this question is not actually to fix performance issues or optimize any real code I've written, I am just curious about how each piece of each pattern is implemented by the compiler, e.g. for arguments sake, if it was compiled verbatim with no compiler optimizations, which would be more efficient?
TL;DR It doesn't make a difference. Current generations of processors (circa Ivy Bridge and later) don't use a static branch-prediction algorithm that you can reason about anymore, so there is no possible performance gain in using one form or the other.
On most older processors, the static branch-prediction strategy is generally that forward conditional jumps are assumed to be taken, while backwards conditional jumps are assumed not-taken. Therefore, there might be a small performance advantage to be gained the first time the code is executed by arranging for the fall-through case to be the most likely—i.e.,
if { expected } else { unexpected }.
But the fact is, this kind of low-level performance analysis makes very little sense when writing in a managed, JIT-compiled language like C#.
You're getting a lot of answers that say readability and maintainability should be your primary concern when writing code. This is regrettably common with "performance" questions, and while it is completely true and unarguable, it mostly skirts the question instead of answering it.
Moreover, it isn't clear why form "A" would be intrinsically more readable than form "B", or vice versa. There are just as many arguments one way or the other—do all parameter validation at the top of the function, or ensure there is only a single return point—and it ultimately gets down to doing what your style guide says, except in really egregious cases where you'd have to contort the code in all sorts of terrible ways, and then you should obviously do what is most readable.
Beyond being a completely reasonable question to ask on conceptual/theoretical grounds, understanding the performance implications also seems like an excellent way to make an informed decision about which general form to adopt when writing your style guide.
The remainder of the existing answers consist of misguided speculation, or downright incorrect information. Of course, that makes sense. Branch prediction is complicated, and as processors get smarter, it only gets harder to understand what is actually happening (or going to happen) under the hood.
First, let's get a couple of things straight. You make reference in the question to analyzing the performance of unoptimized code. No, you don't ever want to do that. It is a waste of time; you'll get meaningless data that does not reflect real-world usage, and then you'll try and draw conclusions from that data, which will end up being wrong (or maybe right, but for the wrong reasons, which is just as bad). Unless you're shipping unoptimized code to your clients (which you shouldn't be doing), then you don't care how unoptimized code performs. When writing in C#, there are effectively two levels of optimization. The first is performed by the C# compiler when it is generating the intermediate language (IL). This is controlled by the optimization switch in the project settings. The second level of optimization is performed by the JIT compiler when it translates the IL into machine code. This is a separate setting, and you can actually analyze the JITed machine code with optimization enabled or disabled. When you're profiling or benchmarking, or even analyzing the generated machine code, you need to have both levels of optimizations enabled.
But benchmarking optimized code is difficult, because the optimization often interferes with the thing you're trying to test. If you tried to benchmark code like that shown in the question, an optimizing compiler would likely notice that neither one of them is actually doing anything useful and transform them into no-ops. One no-op is equally fast as another no-op—or maybe it's not, and that's actually worse, because then all you're benchmarking is noise that has nothing to do with performance.
The best way to go here is to actually understand, on a conceptual level, how the code is going to be transformed by a compiler into machine code. Not only does that allow you to escape the difficulties of creating a good benchmark, but it also has value above and beyond the numbers. A decent programmer knows how to write code that produces correct results; a good programmer knows what is happening under the hood (and then makes an informed decision about whether or not they need to care).
There has been some speculation about whether the compiler will transform form "A" and form "B" into equivalent code. It turns out that the answer is complicated. The IL will almost certainly be different because it will be a more or less literal translation of the C# code that you actually write, regardless of whether or not optimizations are enabled. But it turns out that you really don't care about that, because IL isn't executed directly. It's only executed after the JIT compiler gets done with it, and the JIT compiler will apply its own set of optimizations. The exact optimizations depend on exactly what type of code you've written. If you have:
int A1(bool condition)
{
if (condition) return 42;
return 0;
}
int A2(bool condition)
{
if (!condition) return 0;
return 42;
}
it is very likely that the optimized machine code will be the same. In fact, even something like:
void B1(bool condition)
{
if (condition)
{
DoComplicatedThingA();
DoComplicatedThingB();
}
else
{
throw new InvalidArgumentException();
}
}
void B2(bool condition)
{
if (!condition)
{
throw new InvalidArgumentException();
}
DoComplicatedThingA();
DoComplicatedThingB();
}
will be treated as equivalent in the hands of a sufficiently capable optimizer. It is easy to see why: they are equivalent. It is trivial to prove that one form can be rewritten in the other without changing the semantics or behavior, and that is precisely what an optimizer's job is.
But let's assume that they did give you different machine code, either because you wrote complicated enough code that the optimizer couldn't prove that they were equivalent, or because your optimizer was just falling down on the job (which can sometimes happen with a JIT optimizer, since it prioritizes speed of code generation over maximally efficient generated code). For expository purposes, we'll imagine that the machine code is something like the following (vastly simplified):
C1:
cmp condition, 0 // test the value of the bool parameter against 0 (false)
jne ConditionWasTrue // if true (condition != 1), jump elsewhere;
// otherwise, fall through
call DoComplicatedStuff // condition was false, so do some stuff
ret // return
ConditionWasTrue:
call ThrowException // condition was true, throw an exception and never return
C2:
cmp condition, 0 // test the value of the bool parameter against 0 (false)
je ConditionWasFalse // if false (condition == 0), jump elsewhere;
// otherwise, fall through
call DoComplicatedStuff // condition was true, so do some stuff
ret // return
ConditionWasFalse:
call ThrowException // condition was false, throw an exception and never return
That cmp instruction is equivalent to your if test: it checks the value of condition and determines whether it's true or false, implicitly setting some flags inside the CPU. The next instruction is a conditional branch: it branches to the specification location/label based on the values of one or more flags. In this case, je is going to jump if the "equals" flag is set, while jne is going to jump if the "equals" flag is not set. Simple enough, right? This is exactly how it works on the x86 family of processors, which is probably the CPU for which your JIT compiler is emitting code.
And now we get to the heart of the question that you're really trying to ask; namely, does it matter whether we execute a je instruction to jump if the comparison set the equal flag, or whether we execute a jne instruction to jump if the comparison did not set the equal flag? Again, unfortunately, the answer is complicated, but enlightening.
Before continuing, we need to develop some understanding of branch prediction. These conditional jumps are branches to some arbitrary section in the code. A branch can either be taken (which means the branch actually happens, and the processor begins executing code found at a completely different location), or it can be not taken (which means that execution falls through to the next instruction as if the branch instruction wasn't even there). Branch prediction is very important because mispredicted branches are very expensive on modern processors with deep pipelines that use speculative execution. If it predicts right, it continues uninterrupted; however, if it predicts wrong, it has to throw away all of the code that it speculatively executed and start over. Therefore, a common low-level optimization technique is replacing branches with clever branchless code in cases where the branch is likely to be mispredicted. A sufficiently smart optimizer would turn if (condition) { return 42; } else { return 0; } into a conditional move that didn't use a branch at all, regardless of which way you wrote the if statement, making branch prediction irrelevant. But we're imagining that this didn't happen, and you actually have code with a conditional branch—how does it get predicted?
How branch prediction works is complicated, and getting more complicated all the time as CPU vendors continue to improve the circuitry and logic inside of their processors. Improving branch prediction logic is a significant way that hardware vendors add value and speed to the things they're trying to sell, and every vendor uses different and proprietary branch-prediction mechanisms. Worse, every generation of processor uses slightly different branch-prediction mechanisms, so reasoning about it in the "general case" is exceedingly difficult. Static compilers offer options that allow you to optimize the code they generate for a particular generation of microprocessor, but this doesn't generalize well when shipping code to a large number of clients. You have little choice but to resort to a "general purpose" optimization strategy, although this usually works pretty well. The big promise of a JIT compiler is that, because it compiles the code on your machine right before you use it, it can optimize for your specific machine, just like a static compiler invoked with the perfect options. This promise hasn't exactly been reached, but I won't digress down that rabbit hole.
All modern processors have dynamic branch prediction, but how exactly they implement it is variable. Basically, they "remember" whether a particular (recent) branch was taken or not taken, and then predict that it will go this way the next time. There are all kinds of pathological cases that you can imagine here, and there are, correspondingly, all kinds of cases in or approaches to the branch-prediction logic that help to mitigate the possible damage. Unfortunately, there isn't really anything you can do yourself when writing code to mitigate this problem—except getting rid of branches entirely, which isn't even an option available to you when writing in C# or other managed languages. The optimizer will do whatever it will; you just have to cross your fingers and hope that it is the most optimal thing. In the code we're considering, then, dynamic branch prediction is basically irrelevant and we won't talk about it any more.
What is important is static branch prediction—what prediction is the processor going to make the first time it executes this code, the first time it encounters this branch, when it doesn't have any real basis on which to make a decision? There are a bunch of plausible static prediction algorithms:
Predict all branches are not taken (some early processors did, in fact, use this).
Assume "backwards" conditional branches are taken, while "forwards" conditional branches are not taken. The improvement here is that loops (which jump backwards in the execution stream) will be correctly predicted most of the time. This is the static branch-prediction strategy used by most Intel x86 processors, up to about Sandy Bridge.
Because this strategy was used for so long, the standard advice was to arrange your if statements accordingly:
if (condition)
{
// most likely case
}
else
{
// least likely case
}
This possibly looks counter-intuitive, but you have to go back to what the machine code looks like that this C# code will be transformed into. Compilers will generally transform the if statement into a comparison and a conditional branch into the else block. This static branch prediction algorithm will predict that branch as "not taken", since it's a forward branch. The if block will just fall through without taking the branch, which is why you want to put the "most likely" case there.
If you get into the habit of writing code this way, it might have a performance advantage on certain processors, but it's never enough of an advantage to sacrifice readability. Especially since it only matters the first time the code is executed (after that, dynamic branch prediction kicks in), and executing code for the first time is always slow in a JIT-compiled language!
Always use the dynamic predictor's result, even for never-seen branches.
This strategy is pretty strange, but it's actually what most modern Intel processors use (circa Ivy Bridge and later). Basically, even though the dynamic branch-predictor may have never seen this branch and therefore may not have any information about it, the processor still queries it and uses the prediction that it returns. You can imagine this as being equivalent to an arbitrary static-prediction algorithm.
In this case, it absolutely does not matter how you arrange the conditions of an if statement, because the initial prediction is essentially going to be random. Some 50% of the time, you'll pay the penalty of a mispredicted branch, while the other 50% of the time, you'll benefit from a correctly predicted branch. And that's only the first time—after that, the odds get even better because the dynamic predictor now has more information about the nature of the branch.
This answer has already gotten way too long, so I'll refrain from discussing static prediction hints (implemented only in the Pentium 4) and other such interesting topics, bringing our exploration of branch prediction to a close. If you're interested in more, examine the CPU vendor's technical manuals (although most of what we know has to be empirically determined), read Agner Fog's optimization guides (for x86 processors), search online for various white-papers and blog posts, and/or ask additional questions about it.
The takeaway is probably that it doesn't matter, except on processors that use a certain static branch-prediction strategy, and even there, it hardly matters when you're writing code in a JIT-compiled language like C# because the first-time compilation delay exceeds the cost of a single mispredicted branch (which may not even be mispredicted).
Same issue when validating parameters to functions.
It's much cleaner to act like a night-club bouncer, kicking the no-hopers out as soon as possible.
public void aMethod(SomeParam p)
{
if (!aBoolean || p == null)
return;
// Write code in the knowledge that everything is fine
}
Letting them in only causes trouble later on.
public void aMethod(SomeParam p)
{
if (aBoolean)
{
if (p != null)
{
// Write code, but now you're indented
// and other if statements will be added later
}
// Later on, someone else could add code here by mistake.
}
// or here...
}
The C# language prioritizes safety (bug prevention) over speed. In other words, almost everything has been slowed down to prevent bugs, one way or another.
If you need speed so badly that you start worrying about if statements, then perhaps a faster language would suit your purposes better, possibly C++
Compiler writers can and do make use of statistics to optimize code, for example "else clauses are only executed 30% of the time".
However, the hardware guys probably do a better job of predicting execution paths. I would guess that these days, the most effective optimizations happen within the CPU, with their L1 and L2 caches, and compiler writers don't need to do a thing.
I am just curious about how each piece of each pattern is implemented
by the compiler, e.g. for arguments sake, if it was compiled verbatim
with no compiler optimizations, which would be more efficient?
The best way to test efficiency in this way is to run benchmarks on the code samples you're concerned with. With C# in particular it is not going to be obvious what the JIT is doing with these scenarios.
As a side note, I throw in a +1 for the other answers that point out that efficiency isn't only determined at the compiler level - code maintainability involves magnitudes of levels of efficiency more than what you'll get from this specific sort of pattern choice.
As [~Dark Falcon] mentioned you should not be concerned by micro optimization of little bits of code, the compiler will most probably optimize both approaches to the same thing.
Instead you should be very concerned about your program maintainability and ease of read
From this perspective you should choose B for two reasons:
It only has one exit point (just one return)
The if block is surrounded by curly braces
edit
But hey! as told in the comments that is just my opinion and what I consider good practices
When I ran ReSharper on my code, for example:
if (some condition)
{
Some code...
}
ReSharper gave me the above warning (Invert "if" statement to reduce nesting), and suggested the following correction:
if (!some condition) return;
Some code...
I would like to understand why that's better. I always thought that using "return" in the middle of a method problematic, somewhat like "goto".
It is not only aesthetic, but it also reduces the maximum nesting level inside the method. This is generally regarded as a plus because it makes methods easier to understand (and indeed, many static analysis tools provide a measure of this as one of the indicators of code quality).
On the other hand, it also makes your method have multiple exit points, something that another group of people believes is a no-no.
Personally, I agree with ReSharper and the first group (in a language that has exceptions I find it silly to discuss "multiple exit points"; almost anything can throw, so there are numerous potential exit points in all methods).
Regarding performance: both versions should be equivalent (if not at the IL level, then certainly after the jitter is through with the code) in every language. Theoretically this depends on the compiler, but practically any widely used compiler of today is capable of handling much more advanced cases of code optimization than this.
A return in the middle of the method is not necessarily bad. It might be better to return immediately if it makes the intent of the code clearer. For example:
double getPayAmount() {
double result;
if (_isDead) result = deadAmount();
else {
if (_isSeparated) result = separatedAmount();
else {
if (_isRetired) result = retiredAmount();
else result = normalPayAmount();
};
}
return result;
};
In this case, if _isDead is true, we can immediately get out of the method. It might be better to structure it this way instead:
double getPayAmount() {
if (_isDead) return deadAmount();
if (_isSeparated) return separatedAmount();
if (_isRetired) return retiredAmount();
return normalPayAmount();
};
I've picked this code from the refactoring catalog. This specific refactoring is called: Replace Nested Conditional with Guard Clauses.
This is a bit of a religious argument, but I agree with ReSharper that you should prefer less nesting. I believe that this outweighs the negatives of having multiple return paths from a function.
The key reason for having less nesting is to improve code readability and maintainability. Remember that many other developers will need to read your code in the future, and code with less indentation is generally much easier to read.
Preconditions are a great example of where it is okay to return early at the start of the function. Why should the readability of the rest of the function be affected by the presence of a precondition check?
As for the negatives about returning multiple times from a method - debuggers are pretty powerful now, and it's very easy to find out exactly where and when a particular function is returning.
Having multiple returns in a function is not going to affect the maintainance programmer's job.
Poor code readability will.
As others have mentioned, there shouldn't be a performance hit, but there are other considerations. Aside from those valid concerns, this also can open you up to gotchas in some circumstances. Suppose you were dealing with a double instead:
public void myfunction(double exampleParam){
if(exampleParam > 0){
//Body will *not* be executed if Double.IsNan(exampleParam)
}
}
Contrast that with the seemingly equivalent inversion:
public void myfunction(double exampleParam){
if(exampleParam <= 0)
return;
//Body *will* be executed if Double.IsNan(exampleParam)
}
So in certain circumstances what appears to be a a correctly inverted if might not be.
The idea of only returning at the end of a function came back from the days before languages had support for exceptions. It enabled programs to rely on being able to put clean-up code at the end of a method, and then being sure it would be called and some other programmer wouldn't hide a return in the method that caused the cleanup code to be skipped. Skipped cleanup code could result in a memory or resource leak.
However, in a language that supports exceptions, it provides no such guarantees. In a language that supports exceptions, the execution of any statement or expression can cause a control flow that causes the method to end. This means clean-up must be done through using the finally or using keywords.
Anyway, I'm saying I think a lot of people quote the 'only return at the end of a method' guideline without understanding why it was ever a good thing to do, and that reducing nesting to improve readability is probably a better aim.
I'd like to add that there is name for those inverted if's - Guard Clause. I use it whenever I can.
I hate reading code where there is if at the beginning, two screens of code and no else. Just invert if and return. That way nobody will waste time scrolling.
http://c2.com/cgi/wiki?GuardClause
It doesn't only affect aesthetics, but it also prevents code nesting.
It can actually function as a precondition to ensure that your data is valid as well.
This is of course subjective, but I think it strongly improves on two points:
It is now immediately obvious that your function has nothing left to do if condition holds.
It keeps the nesting level down. Nesting hurts readability more than you'd think.
Multiple return points were a problem in C (and to a lesser extent C++) because they forced you to duplicate clean-up code before each of the return points. With garbage collection, the try | finally construct and using blocks, there's really no reason why you should be afraid of them.
Ultimately it comes down to what you and your colleagues find easier to read.
Guard clauses or pre-conditions (as you can probably see) check to see if a certain condition is met and then breaks the flow of the program. They're great for places where you're really only interested in one outcome of an if statement. So rather than say:
if (something) {
// a lot of indented code
}
You reverse the condition and break if that reversed condition is fulfilled
if (!something) return false; // or another value to show your other code the function did not execute
// all the code from before, save a lot of tabs
return is nowhere near as dirty as goto. It allows you to pass a value to show the rest of your code that the function couldn't run.
You'll see the best examples of where this can be applied in nested conditions:
if (something) {
do-something();
if (something-else) {
do-another-thing();
} else {
do-something-else();
}
}
vs:
if (!something) return;
do-something();
if (!something-else) return do-something-else();
do-another-thing();
You'll find few people arguing the first is cleaner but of course, it's completely subjective. Some programmers like to know what conditions something is operating under by indentation, while I'd much rather keep method flow linear.
I won't suggest for one moment that precons will change your life or get you laid but you might find your code just that little bit easier to read.
Performance-wise, there will be no noticeable difference between the two approaches.
But coding is about more than performance. Clarity and maintainability are also very important. And, in cases like this where it doesn't affect performance, it is the only thing that matters.
There are competing schools of thought as to which approach is preferable.
One view is the one others have mentioned: the second approach reduces the nesting level, which improves code clarity. This is natural in an imperative style: when you have nothing left to do, you might as well return early.
Another view, from the perspective of a more functional style, is that a method should have only one exit point. Everything in a functional language is an expression. So if statements must always have an else clauses. Otherwise the if expression wouldn't always have a value. So in the functional style, the first approach is more natural.
There are several good points made here, but multiple return points can be unreadable as well, if the method is very lengthy. That being said, if you're going to use multiple return points just make sure that your method is short, otherwise the readability bonus of multiple return points may be lost.
Performance is in two parts. You have performance when the software is in production, but you also want to have performance while developing and debugging. The last thing a developer wants is to "wait" for something trivial. In the end, compiling this with optimization enabled will result in similar code. So it's good to know these little tricks that pay off in both scenarios.
The case in the question is clear, ReSharper is correct. Rather than nesting if statements, and creating new scope in code, you're setting a clear rule at the start of your method. It increases readability, it will be easier to maintain, and it reduces the amount of rules one has to sift through to find where they want to go.
Personally I prefer only 1 exit point. It's easy to accomplish if you keep your methods short and to the point, and it provides a predictable pattern for the next person who works on your code.
eg.
bool PerformDefaultOperation()
{
bool succeeded = false;
DataStructure defaultParameters;
if ((defaultParameters = this.GetApplicationDefaults()) != null)
{
succeeded = this.DoSomething(defaultParameters);
}
return succeeded;
}
This is also very useful if you just want to check the values of certain local variables within a function before it exits. All you need to do is place a breakpoint on the final return and you are guaranteed to hit it (unless an exception is thrown).
Avoiding multiple exit points can lead to performance gains. I am not sure about C# but in C++ the Named Return Value Optimization (Copy Elision, ISO C++ '03 12.8/15) depends on having a single exit point. This optimization avoids copy constructing your return value (in your specific example it doesn't matter). This could lead to considerable gains in performance in tight loops, as you are saving a constructor and a destructor each time the function is invoked.
But for 99% of the cases saving the additional constructor and destructor calls is not worth the loss of readability nested if blocks introduce (as others have pointed out).
Many good reasons about how the code looks like. But what about results?
Let's take a look to some C# code and its IL compiled form:
using System;
public class Test {
public static void Main(string[] args) {
if (args.Length == 0) return;
if ((args.Length+2)/3 == 5) return;
Console.WriteLine("hey!!!");
}
}
This simple snippet can be compiled. You can open the generated .exe file with ildasm and check what is the result. I won't post all the assembler thing but I'll describe the results.
The generated IL code does the following:
If the first condition is false, jumps to the code where the second is.
If it's true jumps to the last instruction. (Note: the last instruction is a return).
In the second condition the same happens after the result is calculated. Compare and: got to the Console.WriteLine if false or to the end if this is true.
Print the message and return.
So it seems that the code will jump to the end. What if we do a normal if with nested code?
using System;
public class Test {
public static void Main(string[] args) {
if (args.Length != 0 && (args.Length+2)/3 != 5)
{
Console.WriteLine("hey!!!");
}
}
}
The results are quite similar in IL instructions. The difference is that before there were two jumps per condition: if false go to next piece of code, if true go to the end. And now the IL code flows better and has 3 jumps (the compiler optimized this a bit):
First jump: when Length is 0 to a part where the code jumps again (Third jump) to the end.
Second: in the middle of the second condition to avoid one instruction.
Third: if the second condition is false, jump to the end.
Anyway, the program counter will always jump.
In theory, inverting if could lead to better performance if it increases branch prediction hit rate. In practice, I think it is very hard to know exactly how branch prediction will behave, especially after compiling, so I would not do it in my day-to-day development, except if I am writing assembly code.
More on branch prediction here.
That is simply controversial. There is no "agreement among programmers" on the question of early return. It's always subjective, as far as I know.
It's possible to make a performance argument, since it's better to have conditions that are written so they are most often true; it can also be argued that it is clearer. It does, on the other hand, create nested tests.
I don't think you will get a conclusive answer to this question.
There are a lot of insightful answers there already, but still, I would to direct to a slightly different situation: Instead of precondition, that should be put on top of a function indeed, think of a step-by-step initialization, where you have to check for each step to succeed and then continue with the next. In this case, you cannot check everything at the top.
I found my code really unreadable when writing an ASIO host application with Steinberg's ASIOSDK, as I followed the nesting paradigm. It went like eight levels deep, and I cannot see a design flaw there, as mentioned by Andrew Bullock above. Of course, I could have packed some inner code to another function, and then nested the remaining levels there to make it more readable, but this seems rather random to me.
By replacing nesting with guard clauses, I even discovered a misconception of mine regarding a portion of cleanup-code that should have occurred much earlier within the function instead of at the end. With nested branches, I would never have seen that, you could even say they led to my misconception.
So this might be another situation where inverted ifs can contribute to a clearer code.
It's a matter of opinion.
My normal approach would be to avoid single line ifs, and returns in the middle of a method.
You wouldn't want lines like it suggests everywhere in your method but there is something to be said for checking a bunch of assumptions at the top of your method, and only doing your actual work if they all pass.
In my opinion early return is fine if you are just returning void (or some useless return code you're never gonna check) and it might improve readability because you avoid nesting and at the same time you make explicit that your function is done.
If you are actually returning a returnValue - nesting is usually a better way to go cause you return your returnValue just in one place (at the end - duh), and it might make your code more maintainable in a whole lot of cases.
I'm not sure, but I think, that R# tries to avoid far jumps. When You have IF-ELSE, compiler does something like this:
Condition false -> far jump to false_condition_label
true_condition_label:
instruction1
...
instruction_n
false_condition_label:
instruction1
...
instruction_n
end block
If condition is true there is no jump and no rollout L1 cache, but jump to false_condition_label can be very far and processor must rollout his own cache. Synchronising cache is expensive. R# tries replace far jumps into short jumps and in this case there is bigger probability, that all instructions are already in cache.
I think it depends on what you prefer, as mentioned, theres no general agreement afaik.
To reduce annoyment, you may reduce this kind of warning to "Hint"
My idea is that the return "in the middle of a function" shouldn't be so "subjective".
The reason is quite simple, take this code:
function do_something( data ){
if (!is_valid_data( data ))
return false;
do_something_that_take_an_hour( data );
istance = new object_with_very_painful_constructor( data );
if ( istance is not valid ) {
error_message( );
return ;
}
connect_to_database ( );
get_some_other_data( );
return;
}
Maybe the first "return" it's not SO intuitive, but that's really saving.
There are too many "ideas" about clean codes, that simply need more practise to lose their "subjective" bad ideas.
There are several advantages to this sort of coding but for me the big win is, if you can return quick you can improve the speed of your application. IE I know that because of Precondition X that I can return quickly with an error. This gets rid of the error cases first and reduces the complexity of your code. In a lot of cases because the cpu pipeline can be now be cleaner it can stop pipeline crashes or switches. Secondly if you are in a loop, breaking or returning out quickly can save you a lots of cpu. Some programmers use loop invariants to do this sort of quick exit but in this you can broke your cpu pipeline and even create memory seek problem and mean the the cpu needs to load from outside cache. But basically I think you should do what you intended, that is end the loop or function not create a complex code path just to implement some abstract notion of correct code. If the only tool you have is a hammer then everything looks like a nail.
I was reading through some C# code of mine today and found this line:
if (ProgenyList.ItemContainerGenerator.Status != System.Windows.Controls.Primitives.GeneratorStatus.ContainersGenerated) return;
Notice that you can tell without scrolling that it's an "if" statement that works with ItemContainerGenerator.Status, but you can't easily tell that if the "if" clause evaluates to "true" the method will return at that point.
Realistically I should have moved the "return" statement to a line by itself, but it got me thinking about languages that allow the "then" part of the statement first. If C# permitted it, the line could look like this:
return if (ProgenyList.ItemContainerGenerator.Status != System.Windows.Controls.Primitives.GeneratorStatus.ContainersGenerated);
This might be a bit "argumentative", but I'm wondering what people think about this kind of construct. It might serve to make lines like the one above more readable, but it also might be disastrous. Imagine this code:
return 3 if (x > y);
Logically we can only return if x > y, because there's no "else", but part of me looks at that and thinks, "are we still returning if x <= y? If so, what are we returning?"
What do you think of the "then before the if" construct? Does it exist in your language of choice? Do you use it often? Would C# benefit from it?
Let's reformat that a bit and see:
using System.Windows.Controls.Primitives;
...
if (ProgenyList.ItemContainerGenerator.Status != GeneratorStatus.ContainersGenerated)
{
return;
}
Now how hard is it to see the return statement? Admittedly in SO you still need to scroll over to see the whole of the condition, but in an IDE you wouldn't have to... partly due to not trying to put the condition and the result on the same line, and party due to the using directive.
The benefit of the existing C# syntax is that the textual order reflects the execution order - if you want to know what will happen, you read the code from top to bottom.
Personally I'm not a fan of "return if..." - I'd rather reformat code for readability than change the ordering.
I don't like the ambiguity this invites. Consider the following code:
doSomething(x)
if (x > y);
doSomethingElse(y);
What is it doing? Yes, the compiler could figure it out, but it would look pretty confusing for a programmer.
Yes.
It reads better. Ruby has this as part of its syntax - the term being 'statement modifiers'
irb(main):001:0> puts "Yay Ruby!" if 2 == 2
Yay Ruby!
=> nil
irb(main):002:0> puts "Yay Ruby!" if 2 == 3
=> nil
To close, I need to stress that you need to 'use this with discretion'. The ruby idiom is to use this for one-liners. It can be abused - however I guess this falls into the realm of responsible development - don't constrain the better developers by building in restrictions to protect the poor ones.
It's look ugly for me. The existing syntax much better.
if (x > y) return 3;
I think it's probably OK if the scope were limited to just return statements. As I said in my comment, imagine if this were allowed:
{
doSomething();
doSomethingElse();
// 50 lines...
lastThink();
} if (a < b);
But even just allowing it only on return statements is probably a slippery slope. People will ask, "return x if (a); is allowed, so why not something like doSomething() if (a);?" and then you're on your way down the slope :)
I know other languages do get away with it, but C#'s philosophy is more about making The One Right WayTM easy and having more than one way to do something is usually avoided (though with exceptions). Personally, I think it works pretty well, because I can look at someone else's code and know that it's pretty much in the same style that I'd write it in.
I don't see any problem with
return 3 if (x > y);
It probably bothers you because you are not accustomed to the syntax. It is also nice to be able to say
return 3 unless y <= x
This is a nice syntax option, but I don't think that c# needs it.
I think Larry Wall was very smart when he put this feature into Perl. The idea is that you want to put the most important part at the beginning where it's easy to see. If you have a short statement (i.e. not a compound statement), you can put it before the if/while/etc. If you have a long (i.e. compound) statement, it goes in braces after the condition.
Personally I like languages that let me choose.
That said, if you refactor as well as reformat, it probably doesn't matter what style you use, because they will be equally readable:
using System.Windows.Controls.Primitives;
...
var isContainersGenerated =
ProgenyList.ItemContainerGenerator.Status == GeneratorStatus.ContainersGenerated;
if (!isContainersGenerated) return;
//alternatively
return if (!isContainersGenerated);
There is a concern reading the code that you think a statement will execute only later to find out it might execute.
For example if you read "doSomething(x)", you're thinking "okay so this calls doSomething(x)" but then you read the "if" after it and have to realise that the previous call is conditional on the if statement.
When the "if" is first you know immediately that the following code might happen and can treat it as such.
We tend to read sequentially, so reading and going in your mind "the following might happen" is a lot easier than reading and then realising everything you just read needs to be reparsed and that you need to evaluate everything to see if it's within the scope of your new if statement.
Both Perl and Ruby have this and it works fine. Personally I'm fine with as much functionality you want to throw at me. The more choices I have to write my code the better the overall quality, right? "The right tool for the job" and all that.
Realistically though, it's kind of a moot point since it's pretty late for such a fundamental addition in C#'s lifecycle. We're at the point where any minor syntax change would take a lot of work to implement due to the size of the compiler code and its syntax parsing algorithm. For a new addition to be even considered it would have to bring quite a bit of functionality, and this is just a (not so) different way of saying the same thing.
Humans read beginning to end. In analyzing code flow, limits of the short term memory make it more difficult to read postfix conditions due to additional backtracking required. For short expressions, this may not be a problem, but for longer expressions it will incur significant overhead for users that are not seasoned in the language they are reading.
Agreed with confusing , I never heard about this construction before , so I think correct way using then before if must always contents the result of else, like
return (x > y) ? 3 : null;
else way there is no point of using Imperative constructions like
return 3 if (x > y);
return 4 if (x = y);
return 5 if (x < y);
imho It's kinda weird, because I have no idea where to use it...
It's like a lot of things really, it makes perfect sense when you use it in a limited context(a one liner), and makes absolutely no sense if you use it anywhere else.
The problem with that of course is that it'd be almost impossible to restrict the use to where it makes sense, and allowing its use where it doesn't make sense is just odd.
I know that there's a movement coming out of scripting languages to try and minimize the number of lines of code, but when you're talking about a compiled language, readability is really the key and as much as it might offend your sense of style, the 4 line model is clearer than the reversed if.
I think it's a useful construct and a programmer would use it to emphasize what is important in the code and to de-emphasize what is not important. It is about writing intention-revealing code.
I use something like this (in coffeescript):
index = bla.find 'a'
return if index is -1
The most important thing in this code is to get out (return) if nothing is found - notice the words I just used to explain the intention were in the same order as that in the code.
So this construct helps me to code in a way which reflects my intention slightly better.
It shouldn't be too surprising to realize that the order in which correct English or traditional programming-language grammar has typically required, isn't always the most effective or simplest way to create meaning.
Sometimes you need to let everything hang out and truly reassess what is really the best way to do something.
It's considered grammatically incorrect to put the answer before the question, why would it be any different in code?
I'm trying to understand a method using the disassembly feature of Reflector. As anyone that's used this tool will know, certain code is displayed with C# labels that were (presumably) not used in the original source.
In the 110 line method I'm looking at there are 11 label statements. Random snippet examples:
Label_0076:
if (enumerator.MoveNext())
{
goto Label_008F;
}
if (!base.IsValid)
{
return;
}
goto Label_0219;
Label_0087:
num = 0;
goto Label_01CB;
Label_01CB:
if (num < entityArray.Length)
{
goto Label_0194;
}
goto Label_01AE;
Label_01F3:
num++;
goto Label_01CB;
What sort of code makes Reflector display these labels everywhere and why can't it disassemble them?
Is there a good technique for deciphering them?
Actually, the C# compiler doesn't do much of any optimization - it leaves that to the JIT compiler (or ngen). As such, the IL it generates is pretty consistent and predictable, which is why tools like Reflector are able to decompile IL so effectively. One situation where the compiler does transform your code is in an iterator method. The method you're looking at probably contained something along the lines of:
foreach(var x in something)
if(x.IsValid)
yield return x;
Since the iterator transformation can be pretty complex, Reflector can't really deal with it. To get familiar with what to look for, write your own iterator methods and run them through Reflector to see what kind of IL gets generated based on your C# code. Then you'll know what to look for.
You're looking at code generated by the compiler. The compiler doesn't respect you. No, really. It doesn't respect me or anybody else, either. It looks at our code, scoffs at us, and rewrites it to run as efficiently as possible.
Nested if statements, recursion, "yield"s, case statements, and other code shortcuts will result in weird looking code. And if you're using lambdas with lots of enclosures, well, don't expect it to be pretty.
Any where, any chance the compiler can rewrite your code to make it run faster it will. So there isn't any one "sort of code" that will cause this. Reflector does its best to disassemble, but it can't divine the author's original code from its rewritten version. It does its best (which sometimes is even incorrect!) to translate IL into some form of acceptable code.
If you're having a hard time deciphering it, you could manually edit the code to inline goto's that only get called once and refactor goto's that are called more than once into method calls. Another alternative is to disassemble into another language. The code that translates IL into higher level languages isn't the same. The C++/CLI decompiler may do a better job for you and still be similar enough (find/replace -> with .) to be understandable.
There really isn't a silver bullet for this; at least not until somebody writes a better disassembler plugin.
When I ran ReSharper on my code, for example:
if (some condition)
{
Some code...
}
ReSharper gave me the above warning (Invert "if" statement to reduce nesting), and suggested the following correction:
if (!some condition) return;
Some code...
I would like to understand why that's better. I always thought that using "return" in the middle of a method problematic, somewhat like "goto".
It is not only aesthetic, but it also reduces the maximum nesting level inside the method. This is generally regarded as a plus because it makes methods easier to understand (and indeed, many static analysis tools provide a measure of this as one of the indicators of code quality).
On the other hand, it also makes your method have multiple exit points, something that another group of people believes is a no-no.
Personally, I agree with ReSharper and the first group (in a language that has exceptions I find it silly to discuss "multiple exit points"; almost anything can throw, so there are numerous potential exit points in all methods).
Regarding performance: both versions should be equivalent (if not at the IL level, then certainly after the jitter is through with the code) in every language. Theoretically this depends on the compiler, but practically any widely used compiler of today is capable of handling much more advanced cases of code optimization than this.
A return in the middle of the method is not necessarily bad. It might be better to return immediately if it makes the intent of the code clearer. For example:
double getPayAmount() {
double result;
if (_isDead) result = deadAmount();
else {
if (_isSeparated) result = separatedAmount();
else {
if (_isRetired) result = retiredAmount();
else result = normalPayAmount();
};
}
return result;
};
In this case, if _isDead is true, we can immediately get out of the method. It might be better to structure it this way instead:
double getPayAmount() {
if (_isDead) return deadAmount();
if (_isSeparated) return separatedAmount();
if (_isRetired) return retiredAmount();
return normalPayAmount();
};
I've picked this code from the refactoring catalog. This specific refactoring is called: Replace Nested Conditional with Guard Clauses.
This is a bit of a religious argument, but I agree with ReSharper that you should prefer less nesting. I believe that this outweighs the negatives of having multiple return paths from a function.
The key reason for having less nesting is to improve code readability and maintainability. Remember that many other developers will need to read your code in the future, and code with less indentation is generally much easier to read.
Preconditions are a great example of where it is okay to return early at the start of the function. Why should the readability of the rest of the function be affected by the presence of a precondition check?
As for the negatives about returning multiple times from a method - debuggers are pretty powerful now, and it's very easy to find out exactly where and when a particular function is returning.
Having multiple returns in a function is not going to affect the maintainance programmer's job.
Poor code readability will.
As others have mentioned, there shouldn't be a performance hit, but there are other considerations. Aside from those valid concerns, this also can open you up to gotchas in some circumstances. Suppose you were dealing with a double instead:
public void myfunction(double exampleParam){
if(exampleParam > 0){
//Body will *not* be executed if Double.IsNan(exampleParam)
}
}
Contrast that with the seemingly equivalent inversion:
public void myfunction(double exampleParam){
if(exampleParam <= 0)
return;
//Body *will* be executed if Double.IsNan(exampleParam)
}
So in certain circumstances what appears to be a a correctly inverted if might not be.
The idea of only returning at the end of a function came back from the days before languages had support for exceptions. It enabled programs to rely on being able to put clean-up code at the end of a method, and then being sure it would be called and some other programmer wouldn't hide a return in the method that caused the cleanup code to be skipped. Skipped cleanup code could result in a memory or resource leak.
However, in a language that supports exceptions, it provides no such guarantees. In a language that supports exceptions, the execution of any statement or expression can cause a control flow that causes the method to end. This means clean-up must be done through using the finally or using keywords.
Anyway, I'm saying I think a lot of people quote the 'only return at the end of a method' guideline without understanding why it was ever a good thing to do, and that reducing nesting to improve readability is probably a better aim.
I'd like to add that there is name for those inverted if's - Guard Clause. I use it whenever I can.
I hate reading code where there is if at the beginning, two screens of code and no else. Just invert if and return. That way nobody will waste time scrolling.
http://c2.com/cgi/wiki?GuardClause
It doesn't only affect aesthetics, but it also prevents code nesting.
It can actually function as a precondition to ensure that your data is valid as well.
This is of course subjective, but I think it strongly improves on two points:
It is now immediately obvious that your function has nothing left to do if condition holds.
It keeps the nesting level down. Nesting hurts readability more than you'd think.
Multiple return points were a problem in C (and to a lesser extent C++) because they forced you to duplicate clean-up code before each of the return points. With garbage collection, the try | finally construct and using blocks, there's really no reason why you should be afraid of them.
Ultimately it comes down to what you and your colleagues find easier to read.
Guard clauses or pre-conditions (as you can probably see) check to see if a certain condition is met and then breaks the flow of the program. They're great for places where you're really only interested in one outcome of an if statement. So rather than say:
if (something) {
// a lot of indented code
}
You reverse the condition and break if that reversed condition is fulfilled
if (!something) return false; // or another value to show your other code the function did not execute
// all the code from before, save a lot of tabs
return is nowhere near as dirty as goto. It allows you to pass a value to show the rest of your code that the function couldn't run.
You'll see the best examples of where this can be applied in nested conditions:
if (something) {
do-something();
if (something-else) {
do-another-thing();
} else {
do-something-else();
}
}
vs:
if (!something) return;
do-something();
if (!something-else) return do-something-else();
do-another-thing();
You'll find few people arguing the first is cleaner but of course, it's completely subjective. Some programmers like to know what conditions something is operating under by indentation, while I'd much rather keep method flow linear.
I won't suggest for one moment that precons will change your life or get you laid but you might find your code just that little bit easier to read.
Performance-wise, there will be no noticeable difference between the two approaches.
But coding is about more than performance. Clarity and maintainability are also very important. And, in cases like this where it doesn't affect performance, it is the only thing that matters.
There are competing schools of thought as to which approach is preferable.
One view is the one others have mentioned: the second approach reduces the nesting level, which improves code clarity. This is natural in an imperative style: when you have nothing left to do, you might as well return early.
Another view, from the perspective of a more functional style, is that a method should have only one exit point. Everything in a functional language is an expression. So if statements must always have an else clauses. Otherwise the if expression wouldn't always have a value. So in the functional style, the first approach is more natural.
There are several good points made here, but multiple return points can be unreadable as well, if the method is very lengthy. That being said, if you're going to use multiple return points just make sure that your method is short, otherwise the readability bonus of multiple return points may be lost.
Performance is in two parts. You have performance when the software is in production, but you also want to have performance while developing and debugging. The last thing a developer wants is to "wait" for something trivial. In the end, compiling this with optimization enabled will result in similar code. So it's good to know these little tricks that pay off in both scenarios.
The case in the question is clear, ReSharper is correct. Rather than nesting if statements, and creating new scope in code, you're setting a clear rule at the start of your method. It increases readability, it will be easier to maintain, and it reduces the amount of rules one has to sift through to find where they want to go.
Personally I prefer only 1 exit point. It's easy to accomplish if you keep your methods short and to the point, and it provides a predictable pattern for the next person who works on your code.
eg.
bool PerformDefaultOperation()
{
bool succeeded = false;
DataStructure defaultParameters;
if ((defaultParameters = this.GetApplicationDefaults()) != null)
{
succeeded = this.DoSomething(defaultParameters);
}
return succeeded;
}
This is also very useful if you just want to check the values of certain local variables within a function before it exits. All you need to do is place a breakpoint on the final return and you are guaranteed to hit it (unless an exception is thrown).
Avoiding multiple exit points can lead to performance gains. I am not sure about C# but in C++ the Named Return Value Optimization (Copy Elision, ISO C++ '03 12.8/15) depends on having a single exit point. This optimization avoids copy constructing your return value (in your specific example it doesn't matter). This could lead to considerable gains in performance in tight loops, as you are saving a constructor and a destructor each time the function is invoked.
But for 99% of the cases saving the additional constructor and destructor calls is not worth the loss of readability nested if blocks introduce (as others have pointed out).
Many good reasons about how the code looks like. But what about results?
Let's take a look to some C# code and its IL compiled form:
using System;
public class Test {
public static void Main(string[] args) {
if (args.Length == 0) return;
if ((args.Length+2)/3 == 5) return;
Console.WriteLine("hey!!!");
}
}
This simple snippet can be compiled. You can open the generated .exe file with ildasm and check what is the result. I won't post all the assembler thing but I'll describe the results.
The generated IL code does the following:
If the first condition is false, jumps to the code where the second is.
If it's true jumps to the last instruction. (Note: the last instruction is a return).
In the second condition the same happens after the result is calculated. Compare and: got to the Console.WriteLine if false or to the end if this is true.
Print the message and return.
So it seems that the code will jump to the end. What if we do a normal if with nested code?
using System;
public class Test {
public static void Main(string[] args) {
if (args.Length != 0 && (args.Length+2)/3 != 5)
{
Console.WriteLine("hey!!!");
}
}
}
The results are quite similar in IL instructions. The difference is that before there were two jumps per condition: if false go to next piece of code, if true go to the end. And now the IL code flows better and has 3 jumps (the compiler optimized this a bit):
First jump: when Length is 0 to a part where the code jumps again (Third jump) to the end.
Second: in the middle of the second condition to avoid one instruction.
Third: if the second condition is false, jump to the end.
Anyway, the program counter will always jump.
In theory, inverting if could lead to better performance if it increases branch prediction hit rate. In practice, I think it is very hard to know exactly how branch prediction will behave, especially after compiling, so I would not do it in my day-to-day development, except if I am writing assembly code.
More on branch prediction here.
That is simply controversial. There is no "agreement among programmers" on the question of early return. It's always subjective, as far as I know.
It's possible to make a performance argument, since it's better to have conditions that are written so they are most often true; it can also be argued that it is clearer. It does, on the other hand, create nested tests.
I don't think you will get a conclusive answer to this question.
There are a lot of insightful answers there already, but still, I would to direct to a slightly different situation: Instead of precondition, that should be put on top of a function indeed, think of a step-by-step initialization, where you have to check for each step to succeed and then continue with the next. In this case, you cannot check everything at the top.
I found my code really unreadable when writing an ASIO host application with Steinberg's ASIOSDK, as I followed the nesting paradigm. It went like eight levels deep, and I cannot see a design flaw there, as mentioned by Andrew Bullock above. Of course, I could have packed some inner code to another function, and then nested the remaining levels there to make it more readable, but this seems rather random to me.
By replacing nesting with guard clauses, I even discovered a misconception of mine regarding a portion of cleanup-code that should have occurred much earlier within the function instead of at the end. With nested branches, I would never have seen that, you could even say they led to my misconception.
So this might be another situation where inverted ifs can contribute to a clearer code.
It's a matter of opinion.
My normal approach would be to avoid single line ifs, and returns in the middle of a method.
You wouldn't want lines like it suggests everywhere in your method but there is something to be said for checking a bunch of assumptions at the top of your method, and only doing your actual work if they all pass.
In my opinion early return is fine if you are just returning void (or some useless return code you're never gonna check) and it might improve readability because you avoid nesting and at the same time you make explicit that your function is done.
If you are actually returning a returnValue - nesting is usually a better way to go cause you return your returnValue just in one place (at the end - duh), and it might make your code more maintainable in a whole lot of cases.
I'm not sure, but I think, that R# tries to avoid far jumps. When You have IF-ELSE, compiler does something like this:
Condition false -> far jump to false_condition_label
true_condition_label:
instruction1
...
instruction_n
false_condition_label:
instruction1
...
instruction_n
end block
If condition is true there is no jump and no rollout L1 cache, but jump to false_condition_label can be very far and processor must rollout his own cache. Synchronising cache is expensive. R# tries replace far jumps into short jumps and in this case there is bigger probability, that all instructions are already in cache.
I think it depends on what you prefer, as mentioned, theres no general agreement afaik.
To reduce annoyment, you may reduce this kind of warning to "Hint"
My idea is that the return "in the middle of a function" shouldn't be so "subjective".
The reason is quite simple, take this code:
function do_something( data ){
if (!is_valid_data( data ))
return false;
do_something_that_take_an_hour( data );
istance = new object_with_very_painful_constructor( data );
if ( istance is not valid ) {
error_message( );
return ;
}
connect_to_database ( );
get_some_other_data( );
return;
}
Maybe the first "return" it's not SO intuitive, but that's really saving.
There are too many "ideas" about clean codes, that simply need more practise to lose their "subjective" bad ideas.
There are several advantages to this sort of coding but for me the big win is, if you can return quick you can improve the speed of your application. IE I know that because of Precondition X that I can return quickly with an error. This gets rid of the error cases first and reduces the complexity of your code. In a lot of cases because the cpu pipeline can be now be cleaner it can stop pipeline crashes or switches. Secondly if you are in a loop, breaking or returning out quickly can save you a lots of cpu. Some programmers use loop invariants to do this sort of quick exit but in this you can broke your cpu pipeline and even create memory seek problem and mean the the cpu needs to load from outside cache. But basically I think you should do what you intended, that is end the loop or function not create a complex code path just to implement some abstract notion of correct code. If the only tool you have is a hammer then everything looks like a nail.