Related
I have a simple class intended to store scaled integral values
using member variables "scaled_value" (long) with a "scale_factor".
I have a constructor that fills a new class instance with a decimal
value (although I think the value type is irrelevant).
Assignment to the "scaled_value" slot appears... to not happen.
I've inserted an explicit assignment of the constant 1 to it.
The Debug.Assert below fails... and scaled_value is zero.
On the assertion break in the immediate window I can inspect/set using assignment/inspect "scale_factor"; it changes as I set it.
I can inspect "scaled_value". It is always zero. I can type an
assignment to it which the immediate window executes, but its value
doesn't change.
I'm using Visual Studio 2017 with C# 2017.
What is magic about this slot?
public class ScaledLong : Base // handles scaled-by-power-of-ten long numbers
// intended to support equivalent of fast decimal arithmetic while hiding scale factors from user
{
public long scaled_value; // up to log10_MaxLong digits of decimal precision
public sbyte scale_factor; // power of ten representing location of decimal point range -21..+21. Set by constructor AND NEVER CHANGED.
public byte byte_size; // holds size of value in underlying memory array
string format_string;
<other constructors with same arguments except last value type>
public ScaledLong(sbyte sf, byte size, string format, decimal initial_value)
{
scale_factor = sf;
byte_size = size;
format_string = format;
decimal temp;
sbyte exponent;
{ // rip exponent out of decimal value leaving behind an integer;
_decimal_structure.value = initial_value;
exponent = (sbyte)_decimal_structure.Exponent;
_decimal_structure.Exponent = 0; // now decimal value is integral
temp = _decimal_structure.value;
}
sbyte sfDelta = (sbyte)(sf - exponent);
if (sfDelta >= 0)
{ // sfDelta > 0
this.scaled_value = 1;
Debug.Assert(scaled_value == 1);
scaled_value = (long)Math.Truncate(temp * DecimalTenToPower[sfDelta]);
}
else
{
temp = Math.Truncate(temp / DecimalHalfTenToPower[-sfDelta]);
temp += (temp % 2); /// this can overflow for value at very top of range, not worth fixing; note: this works for both + and- numbers (?)
scaled_value = (long)(temp / 2); // final result
}
}
The biggest puzzles often have the stupidest foundations. This one is a lesson in unintended side effects.
I found this by thinking about, wondering how in earth a member can get modified in unexpected ways. I found the solution before I read #mjwills comment, but he was definitely sniffing at the right thing.
What I left out (of course!) was that I had just coded a ToString() method for the class... that wasn't debugged. Why did I leave it out? Because it obviously can't affect anything so it can't be part of the problem.
Bzzzzt! it used the member variable as a scratchpad and zeroed it (there's the side effect); that was obviously unintended.
When this means is that when code the just runs, ToString() isn't called and the member variable DOES get modified correctly. (I even had unit tests for the "Set" routine checked all that and they were working).
But, when you are debugging.... the debugger can (and did in this case) show local variables. To do that, it will apparently call ToString() to get a nice displayable value. So the act of single stepping caused ToSTring() to get called, and its buggy scratch variable assignment zeroed out the slot after each step call.
So it wasn't a setter that bit me. It was arguably a getter. (Where is FORTRAN's PURE keyword when you need it?)
Einstein hated spooky actions at a distance. Programmers hate spooky side effects at a distance.
One wonders a bit at the idea of the debugger calling ToString() on a class, whose constructor hasn't finished. What assertions about the state of the class can ToString trust, given the constructor isn't done? I think the MS debugger should be fixed. With that, I would have spent my time debugging ToString instead of chasing this.
Thanks for putting up with my question. It got me to the answer.
If you still have a copy of that old/buggy code it would be interesting to try to build it under VS 2019 and Rider (hopefully the latest, 2022.1.1 at this point) with ReSharper (built in) allowed to do the picky scan and with a .ruleset allowed to bitch about just about anything (just for the 1st build - you'll turn off a lot but you need it to scream in order to see what to turn off). And with .NET 5.0 or 6.0
The reason I mention is that I remember some MS bragging about doing dataflow analysis to some degree in 2019 and I did see Rider complaining about some "unsafe assignments". If the old code is long lost - never mind.
CTOR-wise, if CTOR hasn't finished yet, we all know that the object "doesn't exist" yet and has invalid state, but to circumvent that, C# uses default values for everything. When you see code with constant assignments at the point of definition of data members that look trivial and pointless - the reason for that is that a lot of people do remember C++ and don't trust implicit defaults - just in case :-)
There is a 2-phase/round initialization sequence with 2 CTOR-s and implicit initializations in-between. Not widely documented (so that people with weak hearts don't use it :-) but completely deterministic and thread-safe (hidden fuses everywhere). Just for the sake of it's stability you never-ever want to have a call to any method before the 2 round is done (plain CTOR done still doesn't mean fully constructed object and any method invocation from the outside may trigger the 2nd round prematurely).
1st (plain) CTOR can be used in implicit initializations before the 2nd runs => you can control the (implicit) ordering, just have to be careful and step through it in debugger.
Oh and .ToString normally shouldn't be defined at all - on purpose :-) It's de-facto intrinsic => compiler can take it's liberties with it. Plus, if you define it, pretty soon you'll be obliged to support (and process) format specifiers.
I used to define ToJson (before big libs came to fore) to provide, let's say a controllable printable (which can also go over the wire and is 10-100 times faster than deserialization). These days VS debugger has a collection of "visualizers" and an option to tell debugger to use it or not (when it's off then it will jerk ToString's chain if it sees it.
Also, it's good to have dotPeek (or actual Reflector, owned by Redgate these days) with "find source code" turned off. Then you see the real generated code which is sometimes glorious (String is intrinsic and compiler goes a few extra miles to optimize its operations) and sometimes ugly (async/await - total faker, inefficient and flat out dangerous - how do you say "deadlock" in C# :-) - not kidding) but you need to to be able to see the final code or you are driving blind.
When I ran ReSharper on my code, for example:
if (some condition)
{
Some code...
}
ReSharper gave me the above warning (Invert "if" statement to reduce nesting), and suggested the following correction:
if (!some condition) return;
Some code...
I would like to understand why that's better. I always thought that using "return" in the middle of a method problematic, somewhat like "goto".
It is not only aesthetic, but it also reduces the maximum nesting level inside the method. This is generally regarded as a plus because it makes methods easier to understand (and indeed, many static analysis tools provide a measure of this as one of the indicators of code quality).
On the other hand, it also makes your method have multiple exit points, something that another group of people believes is a no-no.
Personally, I agree with ReSharper and the first group (in a language that has exceptions I find it silly to discuss "multiple exit points"; almost anything can throw, so there are numerous potential exit points in all methods).
Regarding performance: both versions should be equivalent (if not at the IL level, then certainly after the jitter is through with the code) in every language. Theoretically this depends on the compiler, but practically any widely used compiler of today is capable of handling much more advanced cases of code optimization than this.
A return in the middle of the method is not necessarily bad. It might be better to return immediately if it makes the intent of the code clearer. For example:
double getPayAmount() {
double result;
if (_isDead) result = deadAmount();
else {
if (_isSeparated) result = separatedAmount();
else {
if (_isRetired) result = retiredAmount();
else result = normalPayAmount();
};
}
return result;
};
In this case, if _isDead is true, we can immediately get out of the method. It might be better to structure it this way instead:
double getPayAmount() {
if (_isDead) return deadAmount();
if (_isSeparated) return separatedAmount();
if (_isRetired) return retiredAmount();
return normalPayAmount();
};
I've picked this code from the refactoring catalog. This specific refactoring is called: Replace Nested Conditional with Guard Clauses.
This is a bit of a religious argument, but I agree with ReSharper that you should prefer less nesting. I believe that this outweighs the negatives of having multiple return paths from a function.
The key reason for having less nesting is to improve code readability and maintainability. Remember that many other developers will need to read your code in the future, and code with less indentation is generally much easier to read.
Preconditions are a great example of where it is okay to return early at the start of the function. Why should the readability of the rest of the function be affected by the presence of a precondition check?
As for the negatives about returning multiple times from a method - debuggers are pretty powerful now, and it's very easy to find out exactly where and when a particular function is returning.
Having multiple returns in a function is not going to affect the maintainance programmer's job.
Poor code readability will.
As others have mentioned, there shouldn't be a performance hit, but there are other considerations. Aside from those valid concerns, this also can open you up to gotchas in some circumstances. Suppose you were dealing with a double instead:
public void myfunction(double exampleParam){
if(exampleParam > 0){
//Body will *not* be executed if Double.IsNan(exampleParam)
}
}
Contrast that with the seemingly equivalent inversion:
public void myfunction(double exampleParam){
if(exampleParam <= 0)
return;
//Body *will* be executed if Double.IsNan(exampleParam)
}
So in certain circumstances what appears to be a a correctly inverted if might not be.
The idea of only returning at the end of a function came back from the days before languages had support for exceptions. It enabled programs to rely on being able to put clean-up code at the end of a method, and then being sure it would be called and some other programmer wouldn't hide a return in the method that caused the cleanup code to be skipped. Skipped cleanup code could result in a memory or resource leak.
However, in a language that supports exceptions, it provides no such guarantees. In a language that supports exceptions, the execution of any statement or expression can cause a control flow that causes the method to end. This means clean-up must be done through using the finally or using keywords.
Anyway, I'm saying I think a lot of people quote the 'only return at the end of a method' guideline without understanding why it was ever a good thing to do, and that reducing nesting to improve readability is probably a better aim.
I'd like to add that there is name for those inverted if's - Guard Clause. I use it whenever I can.
I hate reading code where there is if at the beginning, two screens of code and no else. Just invert if and return. That way nobody will waste time scrolling.
http://c2.com/cgi/wiki?GuardClause
It doesn't only affect aesthetics, but it also prevents code nesting.
It can actually function as a precondition to ensure that your data is valid as well.
This is of course subjective, but I think it strongly improves on two points:
It is now immediately obvious that your function has nothing left to do if condition holds.
It keeps the nesting level down. Nesting hurts readability more than you'd think.
Multiple return points were a problem in C (and to a lesser extent C++) because they forced you to duplicate clean-up code before each of the return points. With garbage collection, the try | finally construct and using blocks, there's really no reason why you should be afraid of them.
Ultimately it comes down to what you and your colleagues find easier to read.
Guard clauses or pre-conditions (as you can probably see) check to see if a certain condition is met and then breaks the flow of the program. They're great for places where you're really only interested in one outcome of an if statement. So rather than say:
if (something) {
// a lot of indented code
}
You reverse the condition and break if that reversed condition is fulfilled
if (!something) return false; // or another value to show your other code the function did not execute
// all the code from before, save a lot of tabs
return is nowhere near as dirty as goto. It allows you to pass a value to show the rest of your code that the function couldn't run.
You'll see the best examples of where this can be applied in nested conditions:
if (something) {
do-something();
if (something-else) {
do-another-thing();
} else {
do-something-else();
}
}
vs:
if (!something) return;
do-something();
if (!something-else) return do-something-else();
do-another-thing();
You'll find few people arguing the first is cleaner but of course, it's completely subjective. Some programmers like to know what conditions something is operating under by indentation, while I'd much rather keep method flow linear.
I won't suggest for one moment that precons will change your life or get you laid but you might find your code just that little bit easier to read.
Performance-wise, there will be no noticeable difference between the two approaches.
But coding is about more than performance. Clarity and maintainability are also very important. And, in cases like this where it doesn't affect performance, it is the only thing that matters.
There are competing schools of thought as to which approach is preferable.
One view is the one others have mentioned: the second approach reduces the nesting level, which improves code clarity. This is natural in an imperative style: when you have nothing left to do, you might as well return early.
Another view, from the perspective of a more functional style, is that a method should have only one exit point. Everything in a functional language is an expression. So if statements must always have an else clauses. Otherwise the if expression wouldn't always have a value. So in the functional style, the first approach is more natural.
There are several good points made here, but multiple return points can be unreadable as well, if the method is very lengthy. That being said, if you're going to use multiple return points just make sure that your method is short, otherwise the readability bonus of multiple return points may be lost.
Performance is in two parts. You have performance when the software is in production, but you also want to have performance while developing and debugging. The last thing a developer wants is to "wait" for something trivial. In the end, compiling this with optimization enabled will result in similar code. So it's good to know these little tricks that pay off in both scenarios.
The case in the question is clear, ReSharper is correct. Rather than nesting if statements, and creating new scope in code, you're setting a clear rule at the start of your method. It increases readability, it will be easier to maintain, and it reduces the amount of rules one has to sift through to find where they want to go.
Personally I prefer only 1 exit point. It's easy to accomplish if you keep your methods short and to the point, and it provides a predictable pattern for the next person who works on your code.
eg.
bool PerformDefaultOperation()
{
bool succeeded = false;
DataStructure defaultParameters;
if ((defaultParameters = this.GetApplicationDefaults()) != null)
{
succeeded = this.DoSomething(defaultParameters);
}
return succeeded;
}
This is also very useful if you just want to check the values of certain local variables within a function before it exits. All you need to do is place a breakpoint on the final return and you are guaranteed to hit it (unless an exception is thrown).
Avoiding multiple exit points can lead to performance gains. I am not sure about C# but in C++ the Named Return Value Optimization (Copy Elision, ISO C++ '03 12.8/15) depends on having a single exit point. This optimization avoids copy constructing your return value (in your specific example it doesn't matter). This could lead to considerable gains in performance in tight loops, as you are saving a constructor and a destructor each time the function is invoked.
But for 99% of the cases saving the additional constructor and destructor calls is not worth the loss of readability nested if blocks introduce (as others have pointed out).
Many good reasons about how the code looks like. But what about results?
Let's take a look to some C# code and its IL compiled form:
using System;
public class Test {
public static void Main(string[] args) {
if (args.Length == 0) return;
if ((args.Length+2)/3 == 5) return;
Console.WriteLine("hey!!!");
}
}
This simple snippet can be compiled. You can open the generated .exe file with ildasm and check what is the result. I won't post all the assembler thing but I'll describe the results.
The generated IL code does the following:
If the first condition is false, jumps to the code where the second is.
If it's true jumps to the last instruction. (Note: the last instruction is a return).
In the second condition the same happens after the result is calculated. Compare and: got to the Console.WriteLine if false or to the end if this is true.
Print the message and return.
So it seems that the code will jump to the end. What if we do a normal if with nested code?
using System;
public class Test {
public static void Main(string[] args) {
if (args.Length != 0 && (args.Length+2)/3 != 5)
{
Console.WriteLine("hey!!!");
}
}
}
The results are quite similar in IL instructions. The difference is that before there were two jumps per condition: if false go to next piece of code, if true go to the end. And now the IL code flows better and has 3 jumps (the compiler optimized this a bit):
First jump: when Length is 0 to a part where the code jumps again (Third jump) to the end.
Second: in the middle of the second condition to avoid one instruction.
Third: if the second condition is false, jump to the end.
Anyway, the program counter will always jump.
In theory, inverting if could lead to better performance if it increases branch prediction hit rate. In practice, I think it is very hard to know exactly how branch prediction will behave, especially after compiling, so I would not do it in my day-to-day development, except if I am writing assembly code.
More on branch prediction here.
That is simply controversial. There is no "agreement among programmers" on the question of early return. It's always subjective, as far as I know.
It's possible to make a performance argument, since it's better to have conditions that are written so they are most often true; it can also be argued that it is clearer. It does, on the other hand, create nested tests.
I don't think you will get a conclusive answer to this question.
There are a lot of insightful answers there already, but still, I would to direct to a slightly different situation: Instead of precondition, that should be put on top of a function indeed, think of a step-by-step initialization, where you have to check for each step to succeed and then continue with the next. In this case, you cannot check everything at the top.
I found my code really unreadable when writing an ASIO host application with Steinberg's ASIOSDK, as I followed the nesting paradigm. It went like eight levels deep, and I cannot see a design flaw there, as mentioned by Andrew Bullock above. Of course, I could have packed some inner code to another function, and then nested the remaining levels there to make it more readable, but this seems rather random to me.
By replacing nesting with guard clauses, I even discovered a misconception of mine regarding a portion of cleanup-code that should have occurred much earlier within the function instead of at the end. With nested branches, I would never have seen that, you could even say they led to my misconception.
So this might be another situation where inverted ifs can contribute to a clearer code.
It's a matter of opinion.
My normal approach would be to avoid single line ifs, and returns in the middle of a method.
You wouldn't want lines like it suggests everywhere in your method but there is something to be said for checking a bunch of assumptions at the top of your method, and only doing your actual work if they all pass.
In my opinion early return is fine if you are just returning void (or some useless return code you're never gonna check) and it might improve readability because you avoid nesting and at the same time you make explicit that your function is done.
If you are actually returning a returnValue - nesting is usually a better way to go cause you return your returnValue just in one place (at the end - duh), and it might make your code more maintainable in a whole lot of cases.
I'm not sure, but I think, that R# tries to avoid far jumps. When You have IF-ELSE, compiler does something like this:
Condition false -> far jump to false_condition_label
true_condition_label:
instruction1
...
instruction_n
false_condition_label:
instruction1
...
instruction_n
end block
If condition is true there is no jump and no rollout L1 cache, but jump to false_condition_label can be very far and processor must rollout his own cache. Synchronising cache is expensive. R# tries replace far jumps into short jumps and in this case there is bigger probability, that all instructions are already in cache.
I think it depends on what you prefer, as mentioned, theres no general agreement afaik.
To reduce annoyment, you may reduce this kind of warning to "Hint"
My idea is that the return "in the middle of a function" shouldn't be so "subjective".
The reason is quite simple, take this code:
function do_something( data ){
if (!is_valid_data( data ))
return false;
do_something_that_take_an_hour( data );
istance = new object_with_very_painful_constructor( data );
if ( istance is not valid ) {
error_message( );
return ;
}
connect_to_database ( );
get_some_other_data( );
return;
}
Maybe the first "return" it's not SO intuitive, but that's really saving.
There are too many "ideas" about clean codes, that simply need more practise to lose their "subjective" bad ideas.
There are several advantages to this sort of coding but for me the big win is, if you can return quick you can improve the speed of your application. IE I know that because of Precondition X that I can return quickly with an error. This gets rid of the error cases first and reduces the complexity of your code. In a lot of cases because the cpu pipeline can be now be cleaner it can stop pipeline crashes or switches. Secondly if you are in a loop, breaking or returning out quickly can save you a lots of cpu. Some programmers use loop invariants to do this sort of quick exit but in this you can broke your cpu pipeline and even create memory seek problem and mean the the cpu needs to load from outside cache. But basically I think you should do what you intended, that is end the loop or function not create a complex code path just to implement some abstract notion of correct code. If the only tool you have is a hammer then everything looks like a nail.
The example below may not be problematic as is, but it should be enough to illustrate a point. Imagine that there is a lot more work than trimming going on.
public string Thingy
{
set
{
// I guess we can throw a null reference exception here on null.
value = value.Trim(); // Well, imagine that there is so much processing to do
this.thingy = value; // That this.thingy = value.Trim() would not fit on one line
...
So, if the assignment has to take two lines, then I either have to abusereuse the parameter, or create a temporary variable. I am not a big fan of temporary variables. On the other hand, I am not a fan of convoluted code. I did not include an example where a function is involved, but I am sure you can imagine it. One concern I have is if a function accepted a string and the parameter was "abused", and then someone changed the signature to ref in both places - this ought to mess things up, but ... who would knowingly make such a change if it already worked without a ref? Seems like it is their responsibility in this case. If I mess with the value of value, am I doing something non-trivial under the hood? If you think that both approaches are acceptable, then which do you prefer and why?
Thanks.
Edit: Here is what I mean when I say I am not a fan of temp variables. I do not like code like this:
string userName = userBox.Text;
if (userName.Length < 5) {
MessageBox.Show("The user name " + userName + " that you entered is too short.");
....
Again, this may not be the best way to communicate a problem to the user, but it is just an illustration. The variable userName is unnecessary in my strong opinion in this case. I am not always against temporary variables, but when their use is very limited and they do not save that much typing, I strongly prefer not to use them.
First off, it's not a big deal.
But I would introduce a temp variable here. It costs nothing and is less prone to errors. Imagine someone has to maintain the code later. Better if value only has 1 meaning and purpose.
And don't call it temp, call it cleanedValue or something.
It is a good practice not to change the values of incoming parameters, even if you technically can. Don't touch the value.
I am not a big fan of temporary variables.
Well, programming is largely about creating temporary variables all over the place, reading and assigning values. You'd better start to love them. :)
One more remark regarding properties. Although you could technically put a lot of logic there, it is recommended to keep properties simple and try not to use any code that could throw exceptions. A need to call other functions may indicate that this property is better be made a method or that there is some initialization code needed somewhere. Just rethink what you're doing and whether it does really look like a property.
When I ran ReSharper on my code, for example:
if (some condition)
{
Some code...
}
ReSharper gave me the above warning (Invert "if" statement to reduce nesting), and suggested the following correction:
if (!some condition) return;
Some code...
I would like to understand why that's better. I always thought that using "return" in the middle of a method problematic, somewhat like "goto".
It is not only aesthetic, but it also reduces the maximum nesting level inside the method. This is generally regarded as a plus because it makes methods easier to understand (and indeed, many static analysis tools provide a measure of this as one of the indicators of code quality).
On the other hand, it also makes your method have multiple exit points, something that another group of people believes is a no-no.
Personally, I agree with ReSharper and the first group (in a language that has exceptions I find it silly to discuss "multiple exit points"; almost anything can throw, so there are numerous potential exit points in all methods).
Regarding performance: both versions should be equivalent (if not at the IL level, then certainly after the jitter is through with the code) in every language. Theoretically this depends on the compiler, but practically any widely used compiler of today is capable of handling much more advanced cases of code optimization than this.
A return in the middle of the method is not necessarily bad. It might be better to return immediately if it makes the intent of the code clearer. For example:
double getPayAmount() {
double result;
if (_isDead) result = deadAmount();
else {
if (_isSeparated) result = separatedAmount();
else {
if (_isRetired) result = retiredAmount();
else result = normalPayAmount();
};
}
return result;
};
In this case, if _isDead is true, we can immediately get out of the method. It might be better to structure it this way instead:
double getPayAmount() {
if (_isDead) return deadAmount();
if (_isSeparated) return separatedAmount();
if (_isRetired) return retiredAmount();
return normalPayAmount();
};
I've picked this code from the refactoring catalog. This specific refactoring is called: Replace Nested Conditional with Guard Clauses.
This is a bit of a religious argument, but I agree with ReSharper that you should prefer less nesting. I believe that this outweighs the negatives of having multiple return paths from a function.
The key reason for having less nesting is to improve code readability and maintainability. Remember that many other developers will need to read your code in the future, and code with less indentation is generally much easier to read.
Preconditions are a great example of where it is okay to return early at the start of the function. Why should the readability of the rest of the function be affected by the presence of a precondition check?
As for the negatives about returning multiple times from a method - debuggers are pretty powerful now, and it's very easy to find out exactly where and when a particular function is returning.
Having multiple returns in a function is not going to affect the maintainance programmer's job.
Poor code readability will.
As others have mentioned, there shouldn't be a performance hit, but there are other considerations. Aside from those valid concerns, this also can open you up to gotchas in some circumstances. Suppose you were dealing with a double instead:
public void myfunction(double exampleParam){
if(exampleParam > 0){
//Body will *not* be executed if Double.IsNan(exampleParam)
}
}
Contrast that with the seemingly equivalent inversion:
public void myfunction(double exampleParam){
if(exampleParam <= 0)
return;
//Body *will* be executed if Double.IsNan(exampleParam)
}
So in certain circumstances what appears to be a a correctly inverted if might not be.
The idea of only returning at the end of a function came back from the days before languages had support for exceptions. It enabled programs to rely on being able to put clean-up code at the end of a method, and then being sure it would be called and some other programmer wouldn't hide a return in the method that caused the cleanup code to be skipped. Skipped cleanup code could result in a memory or resource leak.
However, in a language that supports exceptions, it provides no such guarantees. In a language that supports exceptions, the execution of any statement or expression can cause a control flow that causes the method to end. This means clean-up must be done through using the finally or using keywords.
Anyway, I'm saying I think a lot of people quote the 'only return at the end of a method' guideline without understanding why it was ever a good thing to do, and that reducing nesting to improve readability is probably a better aim.
I'd like to add that there is name for those inverted if's - Guard Clause. I use it whenever I can.
I hate reading code where there is if at the beginning, two screens of code and no else. Just invert if and return. That way nobody will waste time scrolling.
http://c2.com/cgi/wiki?GuardClause
It doesn't only affect aesthetics, but it also prevents code nesting.
It can actually function as a precondition to ensure that your data is valid as well.
This is of course subjective, but I think it strongly improves on two points:
It is now immediately obvious that your function has nothing left to do if condition holds.
It keeps the nesting level down. Nesting hurts readability more than you'd think.
Multiple return points were a problem in C (and to a lesser extent C++) because they forced you to duplicate clean-up code before each of the return points. With garbage collection, the try | finally construct and using blocks, there's really no reason why you should be afraid of them.
Ultimately it comes down to what you and your colleagues find easier to read.
Guard clauses or pre-conditions (as you can probably see) check to see if a certain condition is met and then breaks the flow of the program. They're great for places where you're really only interested in one outcome of an if statement. So rather than say:
if (something) {
// a lot of indented code
}
You reverse the condition and break if that reversed condition is fulfilled
if (!something) return false; // or another value to show your other code the function did not execute
// all the code from before, save a lot of tabs
return is nowhere near as dirty as goto. It allows you to pass a value to show the rest of your code that the function couldn't run.
You'll see the best examples of where this can be applied in nested conditions:
if (something) {
do-something();
if (something-else) {
do-another-thing();
} else {
do-something-else();
}
}
vs:
if (!something) return;
do-something();
if (!something-else) return do-something-else();
do-another-thing();
You'll find few people arguing the first is cleaner but of course, it's completely subjective. Some programmers like to know what conditions something is operating under by indentation, while I'd much rather keep method flow linear.
I won't suggest for one moment that precons will change your life or get you laid but you might find your code just that little bit easier to read.
Performance-wise, there will be no noticeable difference between the two approaches.
But coding is about more than performance. Clarity and maintainability are also very important. And, in cases like this where it doesn't affect performance, it is the only thing that matters.
There are competing schools of thought as to which approach is preferable.
One view is the one others have mentioned: the second approach reduces the nesting level, which improves code clarity. This is natural in an imperative style: when you have nothing left to do, you might as well return early.
Another view, from the perspective of a more functional style, is that a method should have only one exit point. Everything in a functional language is an expression. So if statements must always have an else clauses. Otherwise the if expression wouldn't always have a value. So in the functional style, the first approach is more natural.
There are several good points made here, but multiple return points can be unreadable as well, if the method is very lengthy. That being said, if you're going to use multiple return points just make sure that your method is short, otherwise the readability bonus of multiple return points may be lost.
Performance is in two parts. You have performance when the software is in production, but you also want to have performance while developing and debugging. The last thing a developer wants is to "wait" for something trivial. In the end, compiling this with optimization enabled will result in similar code. So it's good to know these little tricks that pay off in both scenarios.
The case in the question is clear, ReSharper is correct. Rather than nesting if statements, and creating new scope in code, you're setting a clear rule at the start of your method. It increases readability, it will be easier to maintain, and it reduces the amount of rules one has to sift through to find where they want to go.
Personally I prefer only 1 exit point. It's easy to accomplish if you keep your methods short and to the point, and it provides a predictable pattern for the next person who works on your code.
eg.
bool PerformDefaultOperation()
{
bool succeeded = false;
DataStructure defaultParameters;
if ((defaultParameters = this.GetApplicationDefaults()) != null)
{
succeeded = this.DoSomething(defaultParameters);
}
return succeeded;
}
This is also very useful if you just want to check the values of certain local variables within a function before it exits. All you need to do is place a breakpoint on the final return and you are guaranteed to hit it (unless an exception is thrown).
Avoiding multiple exit points can lead to performance gains. I am not sure about C# but in C++ the Named Return Value Optimization (Copy Elision, ISO C++ '03 12.8/15) depends on having a single exit point. This optimization avoids copy constructing your return value (in your specific example it doesn't matter). This could lead to considerable gains in performance in tight loops, as you are saving a constructor and a destructor each time the function is invoked.
But for 99% of the cases saving the additional constructor and destructor calls is not worth the loss of readability nested if blocks introduce (as others have pointed out).
Many good reasons about how the code looks like. But what about results?
Let's take a look to some C# code and its IL compiled form:
using System;
public class Test {
public static void Main(string[] args) {
if (args.Length == 0) return;
if ((args.Length+2)/3 == 5) return;
Console.WriteLine("hey!!!");
}
}
This simple snippet can be compiled. You can open the generated .exe file with ildasm and check what is the result. I won't post all the assembler thing but I'll describe the results.
The generated IL code does the following:
If the first condition is false, jumps to the code where the second is.
If it's true jumps to the last instruction. (Note: the last instruction is a return).
In the second condition the same happens after the result is calculated. Compare and: got to the Console.WriteLine if false or to the end if this is true.
Print the message and return.
So it seems that the code will jump to the end. What if we do a normal if with nested code?
using System;
public class Test {
public static void Main(string[] args) {
if (args.Length != 0 && (args.Length+2)/3 != 5)
{
Console.WriteLine("hey!!!");
}
}
}
The results are quite similar in IL instructions. The difference is that before there were two jumps per condition: if false go to next piece of code, if true go to the end. And now the IL code flows better and has 3 jumps (the compiler optimized this a bit):
First jump: when Length is 0 to a part where the code jumps again (Third jump) to the end.
Second: in the middle of the second condition to avoid one instruction.
Third: if the second condition is false, jump to the end.
Anyway, the program counter will always jump.
In theory, inverting if could lead to better performance if it increases branch prediction hit rate. In practice, I think it is very hard to know exactly how branch prediction will behave, especially after compiling, so I would not do it in my day-to-day development, except if I am writing assembly code.
More on branch prediction here.
That is simply controversial. There is no "agreement among programmers" on the question of early return. It's always subjective, as far as I know.
It's possible to make a performance argument, since it's better to have conditions that are written so they are most often true; it can also be argued that it is clearer. It does, on the other hand, create nested tests.
I don't think you will get a conclusive answer to this question.
There are a lot of insightful answers there already, but still, I would to direct to a slightly different situation: Instead of precondition, that should be put on top of a function indeed, think of a step-by-step initialization, where you have to check for each step to succeed and then continue with the next. In this case, you cannot check everything at the top.
I found my code really unreadable when writing an ASIO host application with Steinberg's ASIOSDK, as I followed the nesting paradigm. It went like eight levels deep, and I cannot see a design flaw there, as mentioned by Andrew Bullock above. Of course, I could have packed some inner code to another function, and then nested the remaining levels there to make it more readable, but this seems rather random to me.
By replacing nesting with guard clauses, I even discovered a misconception of mine regarding a portion of cleanup-code that should have occurred much earlier within the function instead of at the end. With nested branches, I would never have seen that, you could even say they led to my misconception.
So this might be another situation where inverted ifs can contribute to a clearer code.
It's a matter of opinion.
My normal approach would be to avoid single line ifs, and returns in the middle of a method.
You wouldn't want lines like it suggests everywhere in your method but there is something to be said for checking a bunch of assumptions at the top of your method, and only doing your actual work if they all pass.
In my opinion early return is fine if you are just returning void (or some useless return code you're never gonna check) and it might improve readability because you avoid nesting and at the same time you make explicit that your function is done.
If you are actually returning a returnValue - nesting is usually a better way to go cause you return your returnValue just in one place (at the end - duh), and it might make your code more maintainable in a whole lot of cases.
I'm not sure, but I think, that R# tries to avoid far jumps. When You have IF-ELSE, compiler does something like this:
Condition false -> far jump to false_condition_label
true_condition_label:
instruction1
...
instruction_n
false_condition_label:
instruction1
...
instruction_n
end block
If condition is true there is no jump and no rollout L1 cache, but jump to false_condition_label can be very far and processor must rollout his own cache. Synchronising cache is expensive. R# tries replace far jumps into short jumps and in this case there is bigger probability, that all instructions are already in cache.
I think it depends on what you prefer, as mentioned, theres no general agreement afaik.
To reduce annoyment, you may reduce this kind of warning to "Hint"
My idea is that the return "in the middle of a function" shouldn't be so "subjective".
The reason is quite simple, take this code:
function do_something( data ){
if (!is_valid_data( data ))
return false;
do_something_that_take_an_hour( data );
istance = new object_with_very_painful_constructor( data );
if ( istance is not valid ) {
error_message( );
return ;
}
connect_to_database ( );
get_some_other_data( );
return;
}
Maybe the first "return" it's not SO intuitive, but that's really saving.
There are too many "ideas" about clean codes, that simply need more practise to lose their "subjective" bad ideas.
There are several advantages to this sort of coding but for me the big win is, if you can return quick you can improve the speed of your application. IE I know that because of Precondition X that I can return quickly with an error. This gets rid of the error cases first and reduces the complexity of your code. In a lot of cases because the cpu pipeline can be now be cleaner it can stop pipeline crashes or switches. Secondly if you are in a loop, breaking or returning out quickly can save you a lots of cpu. Some programmers use loop invariants to do this sort of quick exit but in this you can broke your cpu pipeline and even create memory seek problem and mean the the cpu needs to load from outside cache. But basically I think you should do what you intended, that is end the loop or function not create a complex code path just to implement some abstract notion of correct code. If the only tool you have is a hammer then everything looks like a nail.
The IT department of a subsidiary of ours had a consulting company write them an ASP.NET application. Now it's having intermittent problems with mixing up who the current user is and has been known to show Joe some of Bob's data by mistake.
The consultants were brought back to troubleshoot and we were invited to listen in on their explanation. Two things stuck out.
First, the consultant lead provided this pseudo-code:
void MyFunction()
{
Session["UserID"] = SomeProprietarySessionManagementLookup();
Response.Redirect("SomeOtherPage.aspx");
}
He went on to say that the assignment of the session variable is asynchronous, which seemed untrue. Granted the call into the lookup function could do something asynchronously, but this seems unwise.
Given that alleged asynchronousness, his theory was that the session variable was not being assigned before the redirect's inevitable ThreadAbort exception was raised. This faulure then prevented SomeOtherPage from displaying the correct user's data.
Second, he gave an example of a coding best practice he recommends. Rather than writing:
int MyFunction(int x, int x)
{
try
{
return x / y;
}
catch(Exception ex)
{
// log it
throw;
}
}
the technique he recommended was:
int MyFunction(int x, int y, out bool isSuccessful)
{
isSuccessful = false;
if (y == 0)
return 0;
isSuccessful = true;
return x / y;
}
This will certainly work and could be better from a performance perspective in some situations.
However, from these and other discussion points it just seemed to us that this team was not well-versed technically.
Opinions?
Rule of thumb: If you need to ask if a consultant knows what he's doing, he probably doesn't ;)
And I tend to agree here. Obviously you haven't provided much, but they don't seem terribly competent.
I would agree. These guys seem quite incompetent.
(BTW, I'd check to see if in "SomeProprietarySessionManagementLookup," they're using static data. Saw this -- with behavior exactly as you describe on a project I inherited several months ago. It was a total head-slap moment when we finally saw it ... And wished we could get face to face with the guys who wrote it ... )
If the consultant has written an application that's supposed to be able to keep track of users and only show the correct data to the correct users and it doesn't do that, then clearly something's wrong. A good consultant would find the problem and fix it. A bad consultant would tell you that it was asynchronicity.
On the asynchronous part, the only way that could be true is if the assignment going on there is actually an indexer setter on Session that is hiding an asynchronous call with no callback indicating success/failure. This would seem to be a HORRIBLE design choice, and it looks like a core class in your framework, so I find it highly unlikely.
Usually asynchronous calls have a way to specify a callback so you can determine what the result is, or if the operation was successful. The documentation for Session should be pretty clear though on if it is actually hiding an asynchronous call, but yeah... doesn't look like the consultant knows what he is talking about...
The method call that is being assigned to the Session indexer cannot be asynch, because to get a value asynchronously, you HAVE to use a callback... no way around that, so if there is no explicit callback, it's definitely not asynch (well, internally there could be an asynchronous call, but the caller of the method would perceive it as synchronous, so it is irrelevant if the method internally for example invokes a web service asynchronously).
For the second point, I think this would be much better, and keep the same functionality essentially:
int MyFunction(int x, int y)
{
if (y == 0)
{
// log it
throw new DivideByZeroException("Divide by zero attempted!");
}
return x / y;
}
For the first point, that does indeed seem bizarre.
On the second one, it's reasonable to try to avoid division by 0 - it's entirely avoidable and that avoidance is simple. However, using an out parameter to indicate success is only reasonable in certain cases, such as int.TryParse and DateTime.TryParseExact - where the caller can't easily determine whether or not their arguments are reasonable. Even then, the return value is usually the success/failure and the out parameter is the result of the method.
Asp.net sessions, if you're using the built-in providers, won't accidentally give you someone else's session. SomeProprietarySessionManagementLookup() is the likely culprit and is returning bad values or just not working.
Session["UserID"] = SomeProprietarySessionManagementLookup();
First of all assigning the return value from an asynchronously SomeProprietarySessionManagementLookup() just wont work. The consultants code probably looks like:
public void SomeProprietarySessionManagementLookup()
{
// do some async lookup
Action<object> d = delegate(object val)
{
LookupSession(); // long running thing that looks up the user.
Session["UserID"] = 1234; // Setting session manually
};
d.BeginInvoke(null,null,null);
}
The consultant isn't totally full of BS, but they have written some buggy code. Response.Redirect() does throw a ThreadAbort, and if the proprietary method is asynchronous, asp.net doesn't know to wait for the asynchronous method to write back to the session before asp.net itself saves the session. This is probably why it sometimes works and sometimes doesn't.
Their code might work if the asp.net session is in-process, but a state server or db server wouldn't. It's timing dependent.
I tested the following. We use state server in development. This code works because the session is written to before the main thread finishes.
Action<object> d = delegate(object val)
{
System.Threading.Thread.Sleep(1000); // waits a little
Session["rubbish"] = DateTime.Now;
};
d.BeginInvoke(null, null, null);
System.Threading.Thread.Sleep(5000); // waits a lot
object stuff = Session["rubbish"];
if( stuff == null ) stuff = "not there";
divStuff.InnerHtml = Convert.ToString(stuff);
This next snippet of code doesn't work because the session was already saved back to state server by the time the asynchronous method gets around to setting a session value.
Action<object> d = delegate(object val)
{
System.Threading.Thread.Sleep(5000); // waits a lot
Session["rubbish"] = DateTime.Now;
};
d.BeginInvoke(null, null, null);
// wait removed - ends immediately.
object stuff = Session["rubbish"];
if( stuff == null ) stuff = "not there";
divStuff.InnerHtml = Convert.ToString(stuff);
The first step is for the consultant to make their code synchronous because their performance trick didn't work at all. If that fixes it, have the consultant properly implement using the Asynchronous Programming Design Pattern
I agree with him in part -- it's definitely better to check y for zero rather than catching the (expensive) exception. The out bool isSuccessful seems really dated to me, but whatever.
re: the asynchronous sessionid buffoonery -- may or may not be true, but it sounds like the consultant is blowing smoke for cover.
Cody's rule of thumb is dead right. If you have to ask, he probably doesn't.
It seems like point two its patently incorrect. .NET's standards explain that if a method fails it should throw an exception, which seems closer to the original; not the consulstant's suggestion. Assuming the exception is accurately & specifically describing the failure.
The consultants created the code in the first place right? And it doesn't work. I think you have quite a bit of dirt on them already.
The asynchronous answer sounds like BS, but there may be something in it. Presumably they have offered a suitable solution as well as pseudo-code describing the problem they themselves created. I would be more tempted to judge them on their solution rather than their expression of the problem. If their understanding is flawed their new solution won't work either. Then you'll know they are idiots. (In fact look round to see if you have a similar proof in any other areas of their code already)
The other one is a code style issue. There are a lot of different ways to cope with that. I personally don't like that style, but there will be circumstances under which it is suitable.
They're wrong on the async.
The assignment happens and then the page redirects. The function can start something asynchronously and return (and could even conceivably alter the Session in its own way), but whatever it does return has to be assigned in the code you gave before the redirect.
They're wrong on that defensive coding style in any low-level code and even in a higher-level function unless it's a specific business case that the 0 or NULL or empty string or whatever should be handled that way - in which case, it's always successful (that successful flag is a nasty code smell) and not an exception. Exceptions are for exceptions. You don't want to mask behaviors like this by coddling the callers of the functions. Catch things early and throw exceptions. I think Maguire covered this in Writing Solid Code or McConnell in Code Complete. Either way, it smells.
This guy does not know what he is doing. The obvious culprit is right here:
Session["UserID"] = SomeProprietarySessionManagementLookup();
I have to agree with John Rudy. My gut tells me the problem is in SomeProprietarySessionManagementLookup().
.. and your consultants do not sound to sure of themselves.
Storing in Session in not async. So that isn't true unless that function is async. But even so, since it isn't calling a BeginCall and have something to call on completion, the next line of code wouldn't execute until the Session line is complete.
For the second statement, while that could be used, it isn't exactly a best practice and you have a few things to note with it. You save the cost of throwing an exception, but wouldn't you want to know that you are trying to divide by zero instead of just moving past it?
I don't think that is a solid suggestion at all.
Quite strange. On the second item it may or may not be faster. It certainly isn't the same functionality though.
Typical "consultant" bollocks:
The problem is with whatever SomeProprietarySessionManagementLookup is doing
Exceptions are only expensive if they're thrown. Don't be afraid of try..catch, but throws should only occur in exceptional circumstances. If variable y shouldn't be zero then an ArgumentOutOfRangeException would be appropriate.
I'm guessing your consultant is suggesting use a status variable instead of exception for error handling is a better practice? I don't agree. How often does people forgot or too lazy to do error checking for return values? Also, pass/fail variable is not informative. There are more things can go wrong other than divide by zero like integer x/y is too big or x is NaN. When things go wrong, status variable cannot tell you what went wrong, but exception can. Exception is for exceptional case, and divide by zero or NaN are definitely exceptional cases.
The session thing is possible. It's a bug, beyond doubt, but it could be that the write arrives at whatever custom session state provider you're using after the next read. The session state provider API accommodates locking to prevent this sort of thing, but if the implementor has just ignored all that, your consultant could be telling the truth.
The second issue is also kinda valid. It's not quite idiomatic - it's a slightly reversed version of things like int.TryParse, which are there to avoid performance issues caused by throwing lots of exceptions. But unless you're calling that code an awful lot, it's unlikely it'll make a noticeable difference (compared to say, one less database query per page etc). It's certainly not something you should do by default.
If SomeProprietarySessionManagementLookup(); is doing an asynchronous assignment it would more likely look like this:
SomeProprietarySessionManagementLookup(Session["UserID"]);
The very fact that the code is assigning the result to Session["UserID"] would suggest that it is not supposed to be asynchronous and the result should be obtained before Response.Redirect is called. If SomeProprietarySessionManagementLookup is returning before its result is calculated they have a design flaw anyway.
The throw an exception or use an out parameter is a matter of opinion and circumstance and in actual practice won't amount to a hill of beans which ever way you do it. For the performance hit of exceptions to become an issue you would need to be calling the function a huge number of times which would probably be a problem in itself.
If the consultants deployed their ASP.NET application on your server(s), then they may have deployed it in uncompiled form, which means there would be a bunch of *.cs files floating around that you could look at.
If all you can find is compiled .NET assemblies (DLLs and EXEs) of theirs, then you should still be able to decompile them into somewhat readable source code. I'll bet if you look through the code you'll find them using static variables in their proprietary lookup code. You'd then have something very concrete to show your bosses.
This entire answer stream is full of typical programmer attitudes. It reminds me of Joel's 'Things you should never do' article (rewrite from scratch.) We don't really know anything about the system, other than there's a bug, and some guy posted some code online. There are so many unknowns that it is ridiculous to say "This guy does not know what he is doing."
Rather than pile on the Consultant, you could just as easily pile on the person who procured their services. No consultant is perfect, nor is a hiring manager ... but at the end of the day the real direction you should be taking is very clear: instead of trying to find fault you should expend energy into working collaboratively to find solutions. No matter how skilled someone is at their roles and responsibilities they will certainly have deficiencies. If you determine there is a pattern of incompentencies then you may choose to transition to another resource going forward, but assigning blame has never solved a single problem in history.
On the second point, I would not use exceptions here. Exceptions are reserved for exceptional cases.
However, division of anything by zero certainly does not equal zero (in math, at least), so this would be case specific.