This is the first time I have faced a problem like this. Programming is my hobby rather than my profession, so I have no previous experience to draw on.
In my program I have added, one by one, several functions to control a machine. After I added the last function (temperature measurement), I started experiencing problems with other functions (approximately 8 of them running together). The problem shows up on a chart (the RPM of a motor) that is not related to this function but is affected by it. You can see the difference between these two charts, with and without the temperature measurement running. The real speed of the motor is the same in both charts, but in the second one I lose pieces of data on the fly because the application slows down.
Without the temperature function.
With the temperature function
This function in particular is disturbing the control above. I think this is because the workload is becoming too heavy for the application, and/or because I need to collect samples, so some time is spent waiting for them:
private void AddT(decimal valueTemp)
{
sumTemp += valueTemp;
countTemp += 1;
if (countTemp >= 20) //take 20 samples and make average
{
OnAvarerageChangedTemp(sumTemp / countTemp);
sumTemp = 0;
countTemp = 0;
}
}
private void OnAvarerageChangedTemp(decimal avTemp)
{
float val3 = (float)avTemp;
decimal alarm = avTemp;
textBox2.Text = avTemp.ToString("F");
if (alarm > 230)
{
System.Media.SoundPlayer player = new System.Media.SoundPlayer();
player.Stream = Properties.Resources.alarma;
player.Play();
timer4.Start();
}
else
{
timer4.Stop();
panel2.BackColor = SystemColors.Control;
}
}
I am wondering whether running this function on a different thread would solve the problem, and how I can do that, or whether there is a different way to solve it. Sample code would be appreciated.
Update, added method call.
This is how I call the method AddT
if (b != "")
{
decimal convTemp; //corrente resistenza
decimal.TryParse(b, out convTemp);
AddT(convTemp);
}
This is how I receive the data from the serial port and pass it to the class that strips out unwanted characters and returns values to the different variables.
This is the class that strips out the unwanted characters and returns the values.
And this is how I manage the incoming serial data. Please do not laugh at me after seeing my code. I do a different job and I am learning on my own.
It's very hard to tell whether there's anything wrong and what it might be; it looks like a subtle problem.
However, it might be easier to get a handle on these things if you refactor your code. There are many things in the code you've shown that make it harder than necessary to reason about what's happening.
You're using both float and decimal. float isn't very accurate, but it is small and fast. decimal tries to be precise and, above all, is predictable, since it rounds the way a human would in base 10; however, it is quite slow and is usually intended for calculations where precise reproducibility is necessary (e.g. financial work). You should probably use double everywhere.
You've got useless else {} code in the Stripper class.
Your Stripper is an instantiable class, when it should simply be a static class with a static method, since Stripper is stateless.
You're catching exceptions just to rethrow them.
You're using TryParse, and not checking for success. Normally you'd only use TryParse if you (a) expect parsing to fail sometimes, and (b) can deal with that parse failure. If you don't expect failure or can't deal with it, you're better off with a crash you learn about soon than with subtly incorrect values.
In stripper, you're duplicating variables such as _currentMot, currentMot, and param4 but they're identical - use only the parameter, and give it a logical name.
You're using out parameters. It's almost always a better idea to define a simple struct and return that instead - this also allows you to ensure you can't easily mix up variable names, and it's much easier to encapsulate and reuse functionality since you don't need to duplicate a long call and argument definition.
Your string parsing logic is too fragile. You should probably avoid Replace entirely and instead explicitly take a Substring without the characters you've checked for. You also have some oddly named things, like test1 and test2, which refer to a lastChar that's not the last character. This might be OK, but better names help keep things straight in your head too.
You have incorrect code comments (decimal convTemp; //corrente resistenza). I usually avoid all purely technical code comments; it's better to use descriptive variable names which are another form of self-documenting code but one in which the compiler can at least check if you use them consistently.
Rather than returning 4 possibly empty values, your Stripper should probably accept a "sink" object as a parameter, on which it can call AddT, AddD and AddA directly.
I don't think any of the above will fix your issue, but I do believe they'll help keep your code a little cleaner and (in the long run) make it easier to find the issues.
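To illustrate the point about out parameters, here is a minimal sketch of returning a small struct from a static method instead. The Readings type, its field names, and the "T=231.5;A=4.2" line format are all assumptions made for illustration; the real serial format is not shown in the question.

```csharp
using System;
using System.Globalization;

// Hypothetical result type: one named field per measurement instead of several out parameters.
public struct Readings
{
    public double? Temperature;
    public double? MotorCurrent;
}

public static class Stripper
{
    // Assumed input format (not taken from the question): "T=231.5;A=4.2"
    public static Readings Parse(string line)
    {
        var result = new Readings();
        foreach (string part in line.Split(new[] { ';' }, StringSplitOptions.RemoveEmptyEntries))
        {
            string[] pair = part.Split('=');
            if (pair.Length != 2)
                continue;

            double value;
            if (!double.TryParse(pair[1], NumberStyles.Float, CultureInfo.InvariantCulture, out value))
                continue; // leave the field null rather than silently storing 0

            if (pair[0] == "T") result.Temperature = value;
            else if (pair[0] == "A") result.MotorCurrent = value;
        }
        return result;
    }
}
```

A caller can then test result.Temperature.HasValue before using it, which makes a missing or unparsable value impossible to confuse with a real reading of 0.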
Your problem is in the parsing of the values. You have
decimal.TryParse(a, out convRes);
AddA(convRes);
and you don't check for failure. You should accept the value only if TryParse returns true:
if(decimal.TryParse(a, out convRes))
{
AddA(convRes);
}
You may have more errors, but this one makes you process a 0 value every time TryParse fails.
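Putting the check together with the averaging from the question, a rough sketch might look like the following. SampleAverager is a hypothetical name, double is used instead of decimal, and parsing with the invariant culture is an assumption about how the device formats its numbers.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: averages samples in blocks and reports each completed average.
public sealed class SampleAverager
{
    private readonly int blockSize;
    private readonly Action<double> onAverage;
    private double sum;
    private int count;

    public SampleAverager(int blockSize, Action<double> onAverage)
    {
        this.blockSize = blockSize;
        this.onAverage = onAverage;
    }

    // Returns false (and adds nothing) when the text is not a valid number.
    public bool TryAdd(string text)
    {
        double value;
        if (!double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out value))
            return false;

        sum += value;
        count++;
        if (count >= blockSize)
        {
            onAverage(sum / count);
            sum = 0;
            count = 0;
        }
        return true;
    }
}
```

With this shape a failed parse never contributes a 0 to the average, and the caller can count or log the rejected strings if it wants to know how often they occur.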
Let's think of it as a family tree, a father has kids, those kids have kids, those kids have kids, etc...
So I have a recursive function that takes the father and uses recursion to get the children; for now it just prints them to the debug output window. But at some point (after about an hour of running and printing around 26000 rows) it gives me a StackOverflowException.
So am I really running out of memory? Then shouldn't I get an "out of memory" exception? In other posts I found people saying that if the number of recursive calls is too high, you might still get a stack overflow exception.
Anyway, my first thought was to break the tree into smaller sub-trees. I know for a fact that my root father always has these five kids, so instead of calling my method once with the root passed to it, I call it five times with the kids of the root passed to it. It helped, I think, but one of them is still so big (26000 rows when it crashes) that I still have this issue.
How about application domains and creating new processes at run time at a certain level of depth? Does that help?
How about creating my own Stack and using that instead of recursive methods? does that help?
Here is also a high-level version of my code. Please take a look; maybe there is actually something silly wrong with it that causes the stack overflow error:
private void MyLoadMethod(string conceptCKI)
{
// make some script calls to DB, so that moTargetConceptList2 will have Concept-Relations for the current node.
// when this is zero, it means its a leaf.
int numberofKids = moTargetConceptList2.ConceptReltns.Count();
if (numberofKids == 0)
return;
for (int i = 1; i <= numberofKids; i++)
{
oUCMRConceptReltn = moTargetConceptList2.ConceptReltns.get_ItemByIndex(i, false);
//Get the concept linked to the relation concept
if (oUCMRConceptReltn.SourceCKI == sConceptCKI)
{
oConcept = moTargetConceptList2.ItemByKeyConceptCKI(oUCMRConceptReltn.TargetCKI, false);
}
else
{
oConcept = moTargetConceptList2.ItemByKeyConceptCKI(oUCMRConceptReltn.SourceCKI, false);
}
//builder.AppendLine("\t" + oConcept.PrimaryCTerm.SourceString);
Debug.WriteLine(oConcept.PrimaryCTerm.SourceString);
MyLoadMethod(oConcept.ConceptCKI);
}
}
How about creating my own Stack and using that instead of recursive methods? does that help?
Yes!
When you instantiate a Stack<T> this will live on the heap and can grow arbitrarily large (until you run out of addressable memory).
If you use recursion you use the call stack. The call stack is much smaller than the heap. The default is 1 MB of call stack space per thread. Note this can be changed, but it's not advisable.
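As a rough illustration of keeping the pending work in a heap-allocated Stack<T>, here is a sketch of a depth-first walk over a tree. The Node type and the TreeWalker class are hypothetical stand-ins, not the types from the question.

```csharp
using System.Collections.Generic;

// Hypothetical node type used only for this illustration.
public sealed class Node
{
    public string Name;
    public List<Node> Children = new List<Node>();
}

public static class TreeWalker
{
    // Depth-first walk that keeps its pending work in a heap-allocated Stack<T>
    // instead of on the thread's call stack.
    public static List<string> Walk(Node root)
    {
        var visited = new List<string>();
        var pending = new Stack<Node>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            visited.Add(current.Name);
            // Push in reverse so children are visited in their original order.
            for (int i = current.Children.Count - 1; i >= 0; i--)
                pending.Push(current.Children[i]);
        }
        return visited;
    }
}
```

However deep the tree is, the call stack here never grows beyond the single Walk frame; only the Stack<Node> on the heap grows.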
StackOverflowException is quite different to OutOfMemoryException.
OOME means that there is no memory available to the process at all. This could be upon trying to create a new thread with a new stack, or in trying to create a new object on the heap (and a few other cases).
SOE means that the thread's stack - by default 1M, though it can be set differently in thread creation or if the executable has a different default; hence ASP.NET threads have 256k as a default rather than 1M - was exhausted. This could be upon calling a method, or allocating a local.
When you call a function (method or property), the arguments of the call are placed on the stack, the address the function should return to is put on the stack, and then execution jumps to the called function. Then some locals will be placed on the stack. Some more may be placed on it as the function continues to execute. stackalloc will also explicitly use some stack space where otherwise heap allocation would be used.
Then it calls another function, and the same happens again. Then that function returns, and execution jumps back to the stored return address, and the pointer within the stack moves back up (no need to clean up the values placed on the stack, they're just ignored now) and that space is available again.
If you use up that 1M of space, you get a StackOverflowException. Because 1M (or even 256k) is a large amount of memory for this kind of use (we don't put really large objects on the stack), the four things that are likely to cause an SOE are:
1. Someone thought it would be a good idea to optimise by using stackalloc when it wasn't, and they used up that 1M fast.
2. Someone thought it would be a good idea to optimise by creating a thread with a smaller than usual stack when it wasn't, and they used up that tiny stack.
3. A recursive call (whether direct or through several steps) falls into an infinite loop.
4. The recursion wasn't quite infinite, but it was deep enough.
You've got case 4. 1 and 2 are quite rare (and you need to be quite deliberate to risk them). Case 3 is by far the most common, and indicates a bug in that the recursion shouldn't be infinite, but a mistake means it is.
Ironically, in this case you should be glad you took the recursive approach rather than iterative - the SOE reveals the bug and where it is, while with an iterative approach you'd probably have an infinite loop bringing everything to a halt, and that can be harder to find.
Now for case 4, we've got two options. In the very very rare cases where we've got just slightly too many calls, we can run it on a thread with a larger stack. This doesn't apply to you.
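For reference, the larger-stack option just mentioned could look like the sketch below, which uses the Thread constructor overload that takes a maximum stack size. BigStackRunner and Depth are hypothetical names, and the 16 MB figure in the usage note is an arbitrary choice.

```csharp
using System;
using System.Threading;

public static class BigStackRunner
{
    // Runs the given work on a dedicated thread whose stack is maxStackSizeBytes large
    // and waits for it to finish.
    public static T Run<T>(Func<T> work, int maxStackSizeBytes)
    {
        T result = default(T);
        Exception error = null;
        var thread = new Thread(() =>
        {
            try { result = work(); }
            catch (Exception ex) { error = ex; }
        }, maxStackSizeBytes);
        thread.Start();
        thread.Join();
        if (error != null)
            throw new InvalidOperationException("Work failed.", error);
        return result;
    }

    // Deliberately deep recursion, only here to exercise the larger stack.
    public static int Depth(int n)
    {
        return n == 0 ? 0 : 1 + Depth(n - 1);
    }
}
```

A call such as BigStackRunner.Run(() => BigStackRunner.Depth(50000), 16 * 1024 * 1024) runs the recursion on the bigger stack. This only raises the ceiling; a deep enough recursion will still overflow.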
Instead, you need to change from a recursive approach to an iterative one. Most of the time, this isn't very hard, though it can be fiddly. Instead of calling itself again, the method uses a loop. For example, consider the classic teaching-example of a factorial method:
private static int Fac(int n)
{
return n <= 1 ? 1 : n * Fac(n - 1);
}
Instead of using recursion we loop in the same method:
private static int Fac(int n)
{
int ret = 1;
for (int i = 1; i <= n; ++i)
ret *= i;
return ret;
}
You can see why this uses less stack space. The iterative version will also be faster 99% of the time. Now, imagine we accidentally call Fac(n) in the first, and leave out the ++i in the second - the equivalent bug in each, and it causes an SOE in the first and a program that never stops in the second.
For the sort of code you're talking about, where you keep producing more and more results as you go based on previous results, you can place the results you've got in a data-structure (Queue<T> and Stack<T> both serve well for a lot of cases) so the code becomes something like:
private void MyLoadMethod(string firstConceptCKI)
{
Queue<string> pendingItems = new Queue<string>();
pendingItems.Enqueue(firstConceptCKI);
while(pendingItems.Count != 0)
{
string conceptCKI = pendingItems.Dequeue();
// make some script calls to DB, so that moTargetConceptList2 will have Concept-Relations for the current node.
// when this is zero, it means its a leaf.
int numberofKids = moTargetConceptList2.ConceptReltns.Count();
for (int i = 1; i <= numberofKids; i++)
{
oUCMRConceptReltn = moTargetConceptList2.ConceptReltns.get_ItemByIndex(i, false);
//Get the concept linked to the relation concept
if (oUCMRConceptReltn.SourceCKI == sConceptCKI)
{
oConcept = moTargetConceptList2.ItemByKeyConceptCKI(oUCMRConceptReltn.TargetCKI, false);
}
else
{
oConcept = moTargetConceptList2.ItemByKeyConceptCKI(oUCMRConceptReltn.SourceCKI, false);
}
//builder.AppendLine("\t" + oConcept.PrimaryCTerm.SourceString);
Debug.WriteLine(oConcept.PrimaryCTerm.SourceString);
pendingItems.Enqueue(oConcept.ConceptCKI);
}
}
}
(I haven't completely checked this, just added the queuing instead of recursing to the code in your question).
This should then do more or less the same as your code, but iteratively. Hopefully that means it'll work. Note that there is a possible infinite loop in this code if the data you are retrieving has a loop. In that case this code will throw an exception when it fills the queue with far too much stuff to cope. You can either debug the source data, or use a HashSet to avoid enqueuing items that have already been processed.
Edit: Better add how to use a HashSet to catch duplicates. First set up a HashSet, this could just be:
HashSet<string> seen = new HashSet<string>();
Or if the strings are used case-insensitively, you'd be better with:
HashSet<string> seen = new HashSet<string>(StringComparer.InvariantCultureIgnoreCase); // or StringComparer.CurrentCultureIgnoreCase if that's closer to how the string is used in the rest of the code.
Then, before you go to use the string (or perhaps before you go to add it to the queue), you have one of the following:
If duplicate strings shouldn't happen:
if(!seen.Add(conceptCKI))
throw new InvalidOperationException("Attempt to use \"" + conceptCKI + "\" which was already seen.");
Or if duplicate strings are valid, and we just want to skip performing the second call:
if(!seen.Add(conceptCKI))
continue;//skip rest of loop, and move on to the next one.
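Put together, the queue and the HashSet might be combined as in the sketch below. The links dictionary is a hypothetical stand-in for the database lookups in the question, and here the check is done before enqueuing rather than after dequeuing.

```csharp
using System.Collections.Generic;

public static class ConceptWalker
{
    // 'links' stands in for the database lookups in the question: it maps a concept key
    // to the keys of the concepts related to it. This shape is an assumption.
    public static List<string> Walk(string first, IDictionary<string, List<string>> links)
    {
        var order = new List<string>();
        var seen = new HashSet<string>();
        var pending = new Queue<string>();

        seen.Add(first);
        pending.Enqueue(first);
        while (pending.Count != 0)
        {
            string current = pending.Dequeue();
            order.Add(current);

            List<string> related;
            if (!links.TryGetValue(current, out related))
                continue; // a leaf: nothing further to queue

            foreach (string next in related)
            {
                // Add returns false when the key was already present, so loops in the data are skipped.
                if (seen.Add(next))
                    pending.Enqueue(next);
            }
        }
        return order;
    }
}
```

Checking before enqueuing keeps the queue from ever holding the same key twice, so its size is bounded by the number of distinct concepts.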
I think you have a cycle in your recursion (infinite recursion), not a genuine shortage of stack space. Even if you had more memory for the stack, you would still get the overflow error.
To test for it:
Declare a field that stores the objects currently being processed:
private Dictionary<int,object> _operableIds = new Dictionary<int,object>();
...
private void Start()
{
_operableIds.Clear();
Recurtion(start_id);
}
...
private void Recurtion(int object_id)
{
if(_operableIds.ContainsKey(object_id))
throw new Exception("Have a ring!");
else
_operableIds.Add(object_id, null/*or object*/);
...
Recurtion(other_id);
...
_operableIds.Remove(object_id);
}
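A compilable variant of the same idea is sketched below, under the assumption that the relationships can be represented as a dictionary from an id to its child ids (the links parameter and the CycleFinder name are hypothetical). It reports the cycle with a return value instead of an exception.

```csharp
using System.Collections.Generic;

public static class CycleFinder
{
    // Returns true if following 'links' from 'start' ever leads back to a node
    // that is still on the current path. 'links' is a hypothetical stand-in for the real data.
    public static bool HasCycle(int start, IDictionary<int, int[]> links)
    {
        return Visit(start, links, new HashSet<int>());
    }

    private static bool Visit(int id, IDictionary<int, int[]> links, HashSet<int> onPath)
    {
        if (!onPath.Add(id))
            return true; // id is already being processed further up the call chain

        int[] children;
        if (links.TryGetValue(id, out children))
        {
            foreach (int child in children)
            {
                if (Visit(child, links, onPath))
                    return true;
            }
        }

        onPath.Remove(id); // finished with this node; it may legitimately be reached again
        return false;
    }
}
```

Because each id is removed once its subtree is done, a node reached by two different routes is not reported; only a node that leads back to one of its own ancestors is.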
When I ran ReSharper on my code, for example:
if (some condition)
{
Some code...
}
ReSharper gave me the above warning (Invert "if" statement to reduce nesting), and suggested the following correction:
if (!some condition) return;
Some code...
I would like to understand why that's better. I always thought that using "return" in the middle of a method was problematic, somewhat like "goto".
It is not only aesthetic, but it also reduces the maximum nesting level inside the method. This is generally regarded as a plus because it makes methods easier to understand (and indeed, many static analysis tools provide a measure of this as one of the indicators of code quality).
On the other hand, it also makes your method have multiple exit points, something that another group of people believes is a no-no.
Personally, I agree with ReSharper and the first group (in a language that has exceptions I find it silly to discuss "multiple exit points"; almost anything can throw, so there are numerous potential exit points in all methods).
Regarding performance: both versions should be equivalent (if not at the IL level, then certainly after the jitter is through with the code) in every language. Theoretically this depends on the compiler, but practically any widely used compiler of today is capable of handling much more advanced cases of code optimization than this.
A return in the middle of the method is not necessarily bad. It might be better to return immediately if it makes the intent of the code clearer. For example:
double getPayAmount() {
double result;
if (_isDead) result = deadAmount();
else {
if (_isSeparated) result = separatedAmount();
else {
if (_isRetired) result = retiredAmount();
else result = normalPayAmount();
};
}
return result;
};
In this case, if _isDead is true, we can immediately get out of the method. It might be better to structure it this way instead:
double getPayAmount() {
if (_isDead) return deadAmount();
if (_isSeparated) return separatedAmount();
if (_isRetired) return retiredAmount();
return normalPayAmount();
};
I've picked this code from the refactoring catalog. This specific refactoring is called: Replace Nested Conditional with Guard Clauses.
This is a bit of a religious argument, but I agree with ReSharper that you should prefer less nesting. I believe that this outweighs the negatives of having multiple return paths from a function.
The key reason for having less nesting is to improve code readability and maintainability. Remember that many other developers will need to read your code in the future, and code with less indentation is generally much easier to read.
Preconditions are a great example of where it is okay to return early at the start of the function. Why should the readability of the rest of the function be affected by the presence of a precondition check?
As for the negatives about returning multiple times from a method - debuggers are pretty powerful now, and it's very easy to find out exactly where and when a particular function is returning.
Having multiple returns in a function is not going to affect the maintenance programmer's job.
Poor code readability will.
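As a small illustration of precondition checks at the top of a method, consider the sketch below. Greeter and its behaviour are made up for the example.

```csharp
using System;

public static class Greeter
{
    public static string Greet(string name, int times)
    {
        // Preconditions first: each one exits immediately, so the main logic below stays unindented.
        if (name == null) throw new ArgumentNullException("name");
        if (times <= 0) return string.Empty;

        var parts = new string[times];
        for (int i = 0; i < times; i++)
            parts[i] = "Hello, " + name + "!";
        return string.Join(" ", parts);
    }
}
```

The body after the two checks can assume a non-null name and a positive count without any extra nesting.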
As others have mentioned, there shouldn't be a performance hit, but there are other considerations. Aside from those valid concerns, this also can open you up to gotchas in some circumstances. Suppose you were dealing with a double instead:
public void myfunction(double exampleParam){
if(exampleParam > 0){
//Body will *not* be executed if double.IsNaN(exampleParam)
}
}
Contrast that with the seemingly equivalent inversion:
public void myfunction(double exampleParam){
if(exampleParam <= 0)
return;
//Body *will* be executed if double.IsNaN(exampleParam)
}
So in certain circumstances what appears to be a correctly inverted if might not be.
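The difference can be checked directly with a sketch like this one, where each method returns whether the "body" would have been reached. NanDemo and its method names are invented for the illustration.

```csharp
public static class NanDemo
{
    // Original form: the body runs only when value > 0.
    public static bool RunsNested(double value)
    {
        if (value > 0)
        {
            return true; // body reached
        }
        return false;
    }

    // Naive inversion: NaN slips past the guard because every ordered comparison with NaN is false.
    public static bool RunsNaiveGuard(double value)
    {
        if (value <= 0)
            return false;
        return true; // body reached
    }

    // Faithful inversion: negate the original condition instead of flipping the operator.
    public static bool RunsNegatedGuard(double value)
    {
        if (!(value > 0))
            return false;
        return true; // body reached
    }
}
```

For ordinary numbers all three agree; they differ only for double.NaN, where the naive guard lets the body run and the other two do not.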
The idea of only returning at the end of a function dates back to the days before languages had support for exceptions. It enabled programs to rely on being able to put clean-up code at the end of a method, and then be sure that it would be called and that some other programmer wouldn't hide a return in the method that caused the clean-up code to be skipped. Skipped clean-up code could result in a memory or resource leak.
However, in a language that supports exceptions, it provides no such guarantees. In a language that supports exceptions, the execution of any statement or expression can cause a control flow that causes the method to end. This means clean-up must be done through using the finally or using keywords.
Anyway, I'm saying I think a lot of people quote the 'only return at the end of a method' guideline without understanding why it was ever a good thing to do, and that reducing nesting to improve readability is probably a better aim.
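To make the point about clean-up concrete, here is a minimal sketch showing that a using block releases its resource whether the method exits by an early return, an exception, or by running to the end. TrackedResource and CleanupDemo are hypothetical types that simply record what happened.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical resource that records when it is acquired and released.
public sealed class TrackedResource : IDisposable
{
    private readonly List<string> log;
    public TrackedResource(List<string> log) { this.log = log; log.Add("open"); }
    public void Dispose() { log.Add("close"); }
}

public static class CleanupDemo
{
    public static void Process(int input, List<string> log)
    {
        using (var resource = new TrackedResource(log))
        {
            if (input < 0)
                return; // early exit: Dispose still runs
            if (input == 0)
                throw new ArgumentException("input must not be zero"); // Dispose still runs
            log.Add("work");
        }
    }
}
```

In every case the log ends with "close", so the number of exit points has no bearing on whether the clean-up happens.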
I'd like to add that there is a name for those inverted ifs: Guard Clause. I use it whenever I can.
I hate reading code where there is if at the beginning, two screens of code and no else. Just invert if and return. That way nobody will waste time scrolling.
http://c2.com/cgi/wiki?GuardClause
It doesn't only affect aesthetics, but it also prevents code nesting.
It can actually function as a precondition to ensure that your data is valid as well.
This is of course subjective, but I think it strongly improves on two points:
It is now immediately obvious that your function has nothing left to do if condition holds.
It keeps the nesting level down. Nesting hurts readability more than you'd think.
Multiple return points were a problem in C (and to a lesser extent C++) because they forced you to duplicate clean-up code before each of the return points. With garbage collection, the try | finally construct and using blocks, there's really no reason why you should be afraid of them.
Ultimately it comes down to what you and your colleagues find easier to read.
Guard clauses or pre-conditions (as you can probably see) check to see if a certain condition is met and then break the flow of the program. They're great for places where you're really only interested in one outcome of an if statement. So rather than say:
if (something) {
// a lot of indented code
}
You reverse the condition and break if that reversed condition is fulfilled:
if (!something) return false; // or another value to show your other code the function did not execute
// all the code from before, save a lot of tabs
return is nowhere near as dirty as goto. It allows you to pass a value to show the rest of your code that the function couldn't run.
You'll see the best examples of where this can be applied in nested conditions:
if (something) {
do-something();
if (something-else) {
do-another-thing();
} else {
do-something-else();
}
}
vs:
if (!something) return;
do-something();
if (!something-else) return do-something-else();
do-another-thing();
You'll find few people arguing the first is cleaner but of course, it's completely subjective. Some programmers like to know what conditions something is operating under by indentation, while I'd much rather keep method flow linear.
I won't suggest for one moment that precons will change your life or get you laid but you might find your code just that little bit easier to read.
Performance-wise, there will be no noticeable difference between the two approaches.
But coding is about more than performance. Clarity and maintainability are also very important. And, in cases like this where it doesn't affect performance, it is the only thing that matters.
There are competing schools of thought as to which approach is preferable.
One view is the one others have mentioned: the second approach reduces the nesting level, which improves code clarity. This is natural in an imperative style: when you have nothing left to do, you might as well return early.
Another view, from the perspective of a more functional style, is that a method should have only one exit point. Everything in a functional language is an expression. So if statements must always have an else clause. Otherwise the if expression wouldn't always have a value. So in the functional style, the first approach is more natural.
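In C# the closest analogue to an if expression is the conditional operator, so the two styles can be compared with a sketch like the following. The Pay class and the amounts are placeholders invented for the example.

```csharp
public static class Pay
{
    // Statement style with early returns.
    public static double WithGuards(bool isDead, bool isRetired)
    {
        if (isDead) return 0.0;
        if (isRetired) return 50.0;
        return 100.0;
    }

    // Expression style: one expression, one exit, and every branch yields a value.
    public static double AsExpression(bool isDead, bool isRetired)
    {
        return isDead ? 0.0
             : isRetired ? 50.0
             : 100.0;
    }
}
```

Both methods compute the same result; the second has no branch without a value, which is what the functional view asks for.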
There are several good points made here, but multiple return points can be unreadable as well, if the method is very lengthy. That being said, if you're going to use multiple return points just make sure that your method is short, otherwise the readability bonus of multiple return points may be lost.
Performance is in two parts. You have performance when the software is in production, but you also want to have performance while developing and debugging. The last thing a developer wants is to "wait" for something trivial. In the end, compiling this with optimization enabled will result in similar code. So it's good to know these little tricks that pay off in both scenarios.
The case in the question is clear, ReSharper is correct. Rather than nesting if statements, and creating new scope in code, you're setting a clear rule at the start of your method. It increases readability, it will be easier to maintain, and it reduces the amount of rules one has to sift through to find where they want to go.
Personally I prefer only 1 exit point. It's easy to accomplish if you keep your methods short and to the point, and it provides a predictable pattern for the next person who works on your code.
eg.
bool PerformDefaultOperation()
{
bool succeeded = false;
DataStructure defaultParameters;
if ((defaultParameters = this.GetApplicationDefaults()) != null)
{
succeeded = this.DoSomething(defaultParameters);
}
return succeeded;
}
This is also very useful if you just want to check the values of certain local variables within a function before it exits. All you need to do is place a breakpoint on the final return and you are guaranteed to hit it (unless an exception is thrown).
Avoiding multiple exit points can lead to performance gains. I am not sure about C# but in C++ the Named Return Value Optimization (Copy Elision, ISO C++ '03 12.8/15) depends on having a single exit point. This optimization avoids copy constructing your return value (in your specific example it doesn't matter). This could lead to considerable gains in performance in tight loops, as you are saving a constructor and a destructor each time the function is invoked.
But for 99% of the cases saving the additional constructor and destructor calls is not worth the loss of readability nested if blocks introduce (as others have pointed out).
Many good reasons have been given about how the code looks. But what about the results?
Let's take a look at some C# code and its compiled IL form:
using System;
public class Test {
public static void Main(string[] args) {
if (args.Length == 0) return;
if ((args.Length+2)/3 == 5) return;
Console.WriteLine("hey!!!");
}
}
This simple snippet can be compiled. You can open the generated .exe file with ildasm and check the result. I won't post all the assembly output, but I'll describe the results.
The generated IL code does the following:
If the first condition is false, jumps to the code where the second is.
If it's true jumps to the last instruction. (Note: the last instruction is a return).
In the second condition the same happens after the result is calculated. Compare and: go to the Console.WriteLine if false, or to the end if true.
Print the message and return.
So it seems that the code will jump to the end. What if we do a normal if with nested code?
using System;
public class Test {
public static void Main(string[] args) {
if (args.Length != 0 && (args.Length+2)/3 != 5)
{
Console.WriteLine("hey!!!");
}
}
}
The results are quite similar in IL instructions. The difference is that before there were two jumps per condition: if false go to next piece of code, if true go to the end. And now the IL code flows better and has 3 jumps (the compiler optimized this a bit):
First jump: when Length is 0 to a part where the code jumps again (Third jump) to the end.
Second: in the middle of the second condition to avoid one instruction.
Third: if the second condition is false, jump to the end.
Anyway, the program counter will always jump.
In theory, inverting if could lead to better performance if it increases branch prediction hit rate. In practice, I think it is very hard to know exactly how branch prediction will behave, especially after compiling, so I would not do it in my day-to-day development, except if I am writing assembly code.
More on branch prediction here.
That is simply controversial. There is no "agreement among programmers" on the question of early return. It's always subjective, as far as I know.
It's possible to make a performance argument, since it's better to have conditions that are written so they are most often true; it can also be argued that it is clearer. It does, on the other hand, create nested tests.
I don't think you will get a conclusive answer to this question.
There are a lot of insightful answers here already, but still, I would like to point to a slightly different situation: instead of a precondition, which should indeed be put at the top of a function, think of a step-by-step initialization, where you have to check that each step succeeds before continuing with the next. In this case, you cannot check everything at the top.
I found my code really unreadable when writing an ASIO host application with Steinberg's ASIOSDK, as I followed the nesting paradigm. It went like eight levels deep, and I cannot see a design flaw there, as mentioned by Andrew Bullock above. Of course, I could have packed some inner code to another function, and then nested the remaining levels there to make it more readable, but this seems rather random to me.
By replacing nesting with guard clauses, I even discovered a misconception of mine regarding a portion of cleanup-code that should have occurred much earlier within the function instead of at the end. With nested branches, I would never have seen that, you could even say they led to my misconception.
So this might be another situation where inverted ifs can contribute to a clearer code.
It's a matter of opinion.
My normal approach would be to avoid single line ifs, and returns in the middle of a method.
You wouldn't want lines like it suggests everywhere in your method but there is something to be said for checking a bunch of assumptions at the top of your method, and only doing your actual work if they all pass.
In my opinion early return is fine if you are just returning void (or some useless return code you're never gonna check) and it might improve readability because you avoid nesting and at the same time you make explicit that your function is done.
If you are actually returning a returnValue - nesting is usually a better way to go cause you return your returnValue just in one place (at the end - duh), and it might make your code more maintainable in a whole lot of cases.
I'm not sure, but I think that R# tries to avoid far jumps. When you have an if-else, the compiler does something like this:
Condition false -> far jump to false_condition_label
true_condition_label:
instruction1
...
instruction_n
false_condition_label:
instruction1
...
instruction_n
end block
If the condition is true there is no jump and the L1 cache is not flushed, but the jump to false_condition_label can be very far, and the processor must then reload its cache. Synchronising the cache is expensive. R# tries to replace far jumps with short jumps, which makes it more likely that all the instructions are already in the cache.
I think it depends on what you prefer; as mentioned, there's no general agreement as far as I know.
To reduce the annoyance, you can downgrade this kind of warning to "Hint".
My idea is that the return "in the middle of a function" shouldn't be so "subjective".
The reason is quite simple, take this code:
function do_something( data ){
if (!is_valid_data( data ))
return false;
do_something_that_take_an_hour( data );
instance = new object_with_very_painful_constructor( data );
if ( instance is not valid ) {
error_message( );
return ;
}
connect_to_database ( );
get_some_other_data( );
return;
}
Maybe the first "return" is not so intuitive, but it really saves a lot of work.
There are too many "ideas" about clean code whose holders simply need more practice to lose their "subjective" bad ideas.
There are several advantages to this sort of coding, but for me the big win is that if you can return quickly, you can improve the speed of your application. That is, I know that because of precondition X I can return quickly with an error. This gets rid of the error cases first and reduces the complexity of your code. In a lot of cases, because the CPU pipeline can now be cleaner, it can avoid pipeline stalls or switches. Secondly, if you are in a loop, breaking or returning early can save you a lot of CPU time. Some programmers use loop invariants to do this sort of quick exit, but this can break your CPU pipeline and even create memory seek problems, meaning that the CPU needs to load from outside the cache. Basically, I think you should do what you intended, that is, end the loop or function, rather than create a complex code path just to implement some abstract notion of correct code. If the only tool you have is a hammer, then everything looks like a nail.
Are there any general rules when using recursion on how to avoid stackoverflows?
How many times you will be able to recurse will depend on:
The stack size (which is usually 1MB IIRC, but the binary can be hand-edited; I wouldn't recommend doing so)
How much stack each level of the recursion uses (a method with 10 uncaptured Guid local variables will take more stack than a method which doesn't have any local variables, for example)
The JIT you're using - sometimes the JIT will use tail recursion, other times it won't. The rules are complicated and I can't remember them. (There's a blog post by David Broman back from 2007, and an MSDN page from the same author/date, but they may be out of date by now.)
How to avoid stack overflows? Don't recurse too far :) If you can't be reasonably sure that your recursion will terminate without going very far (I'd be worried at "more than 10" although that's very safe) then rewrite it to avoid recursion.
It really depends on what recursive algorithm you're using. If it's simple recursion, you can do something like this:
public int CalculateSomethingRecursively(int someNumber)
{
return doSomethingRecursively(someNumber, 0);
}
private int doSomethingRecursively(int someNumber, int level)
{
if (level >= MAX_LEVEL || !shouldKeepCalculating(someNumber))
return someNumber;
return doSomethingRecursively(someNumber, level + 1);
}
It's worth noting that this approach is really only useful where the level of recursion can be defined as a logical limit. Where that is not possible (such as in a divide and conquer algorithm), you will have to decide how you want to balance simplicity versus performance versus resource limitations. In these cases, you may have to switch between methods once you hit an arbitrary pre-defined limit. An effective means of doing this that I have used in the quicksort algorithm is to set the limit as a ratio of the total size of the list. In this case, the logical limit is the point at which conditions are no longer optimal.
I am not aware of any hard set of rules to avoid stack overflows. I personally try to ensure -
1. I have my base cases right.
2. The code reaches the base case at some point.
If you're finding yourself generating that many stack frames, you might want to consider unrolling your recursion into a loop.
Especially if you are doing multiple levels of recursion (A->B->C->A->B...) you might find that you can extract one of those levels into a loop and save yourself some memory.
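As a minimal sketch of what unrolling into a loop can look like (the `SumExample` class and both method names are made up for illustration, they are not from the question):

```csharp
using System;

public static class SumExample
{
    // Recursive form: one stack frame per level, so a very large n can overflow the stack.
    public static long SumToRecursive(int n)
    {
        if (n <= 0) return 0;               // base case
        return n + SumToRecursive(n - 1);   // one more frame per call
    }

    // Loop form: same result, constant stack usage regardless of n.
    public static long SumToIterative(int n)
    {
        long total = 0;
        for (int i = 1; i <= n; i++)
        {
            total += i;
        }
        return total;
    }
}
```

The loop version keeps its running state in a local variable instead of in stack frames, so the input size no longer decides whether the call survives.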
The normal limit, if not much is left on the stack between successive calls, is around 15000-25000 levels deep. 25% of that if you are on IIS 6+.
Most recursive algorithms can be expressed iteratively.
There are various ways to increase the allocated stack space, but I'd rather let you find an iterative version first. :)
Other than having a reasonable stack size and making sure you divide and conquer your problem such that you continually work on a smaller problem, not really.
I just thought of tail recursion, but it turned out that C# does not support it. However, the .NET Framework seems to support it:
http://blogs.msdn.com/abhinaba/archive/2007/07/27/tail-recursion-on-net.aspx
The default stack size for a thread is 1 MB, if you're running under the default CLR. However, other hosts may change that. For example the ASP host changes the default to 256 KB. This means that you may have code that runs perfectly well under VS, but breaks when you deploy it to the real hosting environment.
Fortunately you can specify a stack size, when you create a new thread by using the correct constructor. In my experience it is rarely necessary, but I have seen one case where this was the solution.
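A rough sketch of that constructor in use, assuming a deliberately recursive helper (`Depth`) and a wrapper (`RunWithStack`) that are hypothetical names for this example; the `Thread(ThreadStart, int maxStackSize)` overload is the part that comes from the framework:

```csharp
using System;
using System.Threading;

public static class BigStackExample
{
    // Hypothetical recursive helper, used only to consume stack.
    public static int Depth(int n)
    {
        return n == 0 ? 0 : 1 + Depth(n - 1);
    }

    // Runs Depth(levels) on a new thread created with the requested stack size in bytes.
    public static int RunWithStack(int levels, int stackSizeBytes)
    {
        int result = 0;
        var worker = new Thread(() => result = Depth(levels), stackSizeBytes);
        worker.Start();
        worker.Join();   // wait for the worker so result is safe to read
        return result;
    }
}
```

For example, `BigStackExample.RunWithStack(10000, 16 * 1024 * 1024)` asks for a 16 MB stack. The host or operating system may still adjust the size you request, so treat the number as a request rather than a guarantee.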
You can edit the PE header of the binary itself to change the default size. This is useful if you want to change the size for the main thread. Otherwise I would recommend using the appropriate constructor when creating threads.
I wrote a short article about this here. Basically, I pass an optional parameter called depth, adding 1 to it each time I go deeper into the recursion. Within the recursive method I check the depth against a threshold. If it is greater than the value I set, I throw an exception. The threshold depends on your application's needs.
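Something along these lines, as a sketch only (the `MaxDepth` value of 1000, the class name and the `CountDown` method are assumptions for the example, not taken from the article):

```csharp
using System;

public static class DepthGuardExample
{
    // Hypothetical threshold; choose a value that suits the application.
    public const int MaxDepth = 1000;

    // Counts down from n recursively, tracking how deep the recursion has gone.
    public static int CountDown(int n, int depth = 0)
    {
        if (depth > MaxDepth)
        {
            // Fail with a catchable exception long before the real stack runs out.
            throw new InvalidOperationException("Recursion exceeded " + MaxDepth + " levels.");
        }
        if (n <= 0) return 0;                    // base case
        return 1 + CountDown(n - 1, depth + 1);  // pass the incremented depth down
    }
}
```

Unlike a real StackOverflowException, which takes the process down, the exception thrown here can be caught and handled by the caller.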
Remember, if you have to ask about system limits, then you are probably doing something horribly wrong.
So, if you think you might get a stack overflow in normal operation then you need to think of a different approach to the problem.
It's not difficult to convert a recursive function into an iterative one, especially as C# has the generic Stack<T> collection (in System.Collections.Generic). Using the Stack<T> type moves the memory used into the program's heap instead of the call stack. This gives you the full address range to store the recursive data. If that isn't enough, it's not too difficult to page the data to disk. But I'd seriously consider other solutions if you get to this stage.
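Here is a small sketch of that conversion for a depth-first walk over a tree. The `Node` and `TreeSum` types are invented for the example; only `Stack<T>` comes from the framework:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tree node used only for this illustration.
public sealed class Node
{
    public int Value;
    public List<Node> Children = new List<Node>();
    public Node(int value) { Value = value; }
}

public static class TreeSum
{
    // Depth-first sum using an explicit, heap-allocated stack instead of recursion.
    public static long Sum(Node root)
    {
        if (root == null) return 0;
        long total = 0;
        var pending = new Stack<Node>();
        pending.Push(root);
        while (pending.Count > 0)
        {
            Node current = pending.Pop();
            total += current.Value;
            foreach (Node child in current.Children)
            {
                pending.Push(child);   // work that recursion would have put in a stack frame
            }
        }
        return total;
    }
}
```

The depth of the tree now only affects how large the `Stack<Node>` grows on the heap, so a chain of 100,000 nodes is not a problem for the call stack.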
When I ran ReSharper on my code, for example:
if (some condition)
{
Some code...
}
ReSharper gave me the above warning (Invert "if" statement to reduce nesting), and suggested the following correction:
if (!some condition) return;
Some code...
I would like to understand why that's better. I always thought that using "return" in the middle of a method is problematic, somewhat like "goto".
It is not only aesthetic, but it also reduces the maximum nesting level inside the method. This is generally regarded as a plus because it makes methods easier to understand (and indeed, many static analysis tools provide a measure of this as one of the indicators of code quality).
On the other hand, it also makes your method have multiple exit points, something that another group of people believes is a no-no.
Personally, I agree with ReSharper and the first group (in a language that has exceptions I find it silly to discuss "multiple exit points"; almost anything can throw, so there are numerous potential exit points in all methods).
Regarding performance: both versions should be equivalent (if not at the IL level, then certainly after the jitter is through with the code) in every language. Theoretically this depends on the compiler, but practically any widely used compiler of today is capable of handling much more advanced cases of code optimization than this.
A return in the middle of the method is not necessarily bad. It might be better to return immediately if it makes the intent of the code clearer. For example:
double getPayAmount() {
double result;
if (_isDead) result = deadAmount();
else {
if (_isSeparated) result = separatedAmount();
else {
if (_isRetired) result = retiredAmount();
else result = normalPayAmount();
};
}
return result;
};
In this case, if _isDead is true, we can immediately get out of the method. It might be better to structure it this way instead:
double getPayAmount() {
if (_isDead) return deadAmount();
if (_isSeparated) return separatedAmount();
if (_isRetired) return retiredAmount();
return normalPayAmount();
};
I've picked this code from the refactoring catalog. This specific refactoring is called: Replace Nested Conditional with Guard Clauses.
This is a bit of a religious argument, but I agree with ReSharper that you should prefer less nesting. I believe that this outweighs the negatives of having multiple return paths from a function.
The key reason for having less nesting is to improve code readability and maintainability. Remember that many other developers will need to read your code in the future, and code with less indentation is generally much easier to read.
Preconditions are a great example of where it is okay to return early at the start of the function. Why should the readability of the rest of the function be affected by the presence of a precondition check?
As for the negatives about returning multiple times from a method - debuggers are pretty powerful now, and it's very easy to find out exactly where and when a particular function is returning.
Having multiple returns in a function is not going to affect the maintenance programmer's job.
Poor code readability will.
As others have mentioned, there shouldn't be a performance hit, but there are other considerations. Aside from those valid concerns, this also can open you up to gotchas in some circumstances. Suppose you were dealing with a double instead:
public void myfunction(double exampleParam){
if(exampleParam > 0){
//Body will *not* be executed if Double.IsNaN(exampleParam)
}
}
Contrast that with the seemingly equivalent inversion:
public void myfunction(double exampleParam){
if(exampleParam <= 0)
return;
//Body *will* be executed if Double.IsNaN(exampleParam)
}
So in certain circumstances what appears to be a correctly inverted if might not be.
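One way to keep the inversion faithful is to negate the whole original condition rather than flip the operator. A small sketch that makes the difference observable (the class and method names are made up, and each method simply reports whether the "body" would run):

```csharp
using System;

public static class NaNInversion
{
    // Original nested form: the body runs only when value > 0.
    public static bool NestedRunsBody(double value)
    {
        if (value > 0)
        {
            return true;    // body
        }
        return false;
    }

    // Naive inversion: NaN <= 0 is false, so NaN slips past the guard and the body runs.
    public static bool NaiveGuardRunsBody(double value)
    {
        if (value <= 0) return false;
        return true;        // body
    }

    // Faithful inversion: negate the original condition as a whole.
    public static bool SafeGuardRunsBody(double value)
    {
        if (!(value > 0)) return false;
        return true;        // body
    }
}
```

Every comparison with NaN evaluates to false, which is why `!(value > 0)` and `value <= 0` are not interchangeable for floating-point values.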
The idea of only returning at the end of a function came back from the days before languages had support for exceptions. It enabled programs to rely on being able to put clean-up code at the end of a method, and then being sure it would be called and some other programmer wouldn't hide a return in the method that caused the cleanup code to be skipped. Skipped cleanup code could result in a memory or resource leak.
However, in a language that supports exceptions, it provides no such guarantees. In a language that supports exceptions, the execution of any statement or expression can cause a control flow that causes the method to end. This means clean-up must be done through using the finally or using keywords.
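To illustrate, here is a sketch in which the clean-up runs on every exit path, whether that is an early return, a normal return or an exception. The `Resource` class and its counter are hypothetical and exist only so the effect can be observed:

```csharp
using System;

// Hypothetical resource that counts how many times it has been cleaned up.
public sealed class Resource : IDisposable
{
    public static int DisposedCount;
    public void Dispose() { DisposedCount++; }
}

public static class CleanupExample
{
    public static string Describe(int value)
    {
        using (var resource = new Resource())
        {
            if (value < 0) return "negative";    // early return: Dispose still runs
            if (value == 0)
            {
                throw new ArgumentException("zero is not allowed");   // exception: Dispose still runs
            }
            return "positive";                   // normal return: Dispose runs here too
        }
    }
}
```

A using block is compiled to a try/finally, so writing the try/finally by hand gives the same guarantee for clean-up that is not expressed through IDisposable.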
Anyway, I'm saying I think a lot of people quote the 'only return at the end of a method' guideline without understanding why it was ever a good thing to do, and that reducing nesting to improve readability is probably a better aim.
I'd like to add that there is a name for those inverted ifs: Guard Clause. I use it whenever I can.
I hate reading code where there is an if at the beginning, two screens of code and no else. Just invert the if and return. That way nobody will waste time scrolling.
http://c2.com/cgi/wiki?GuardClause
It doesn't only affect aesthetics, but it also prevents code nesting.
It can actually function as a precondition to ensure that your data is valid as well.
This is of course subjective, but I think it strongly improves on two points:
It is now immediately obvious that your function has nothing left to do if condition holds.
It keeps the nesting level down. Nesting hurts readability more than you'd think.
Multiple return points were a problem in C (and to a lesser extent C++) because they forced you to duplicate clean-up code before each of the return points. With garbage collection, the try | finally construct and using blocks, there's really no reason why you should be afraid of them.
Ultimately it comes down to what you and your colleagues find easier to read.
Guard clauses or pre-conditions (as you can probably see) check to see if a certain condition is met and then break the flow of the program. They're great for places where you're really only interested in one outcome of an if statement. So rather than say:
if (something) {
// a lot of indented code
}
You reverse the condition and break if that reversed condition is fulfilled:
if (!something) return false; // or another value to show your other code the function did not execute
// all the code from before, save a lot of tabs
return is nowhere near as dirty as goto. It allows you to pass a value to show the rest of your code that the function couldn't run.
You'll see the best examples of where this can be applied in nested conditions:
if (something) {
    doSomething();
    if (somethingElse) {
        doAnotherThing();
    } else {
        doSomethingElse();
    }
}
vs:
if (!something) return;
doSomething();
if (!somethingElse) {
    doSomethingElse();
    return;
}
doAnotherThing();
You'll find few people arguing the first is cleaner but of course, it's completely subjective. Some programmers like to know what conditions something is operating under by indentation, while I'd much rather keep method flow linear.
I won't suggest for one moment that precons will change your life or get you laid but you might find your code just that little bit easier to read.
Performance-wise, there will be no noticeable difference between the two approaches.
But coding is about more than performance. Clarity and maintainability are also very important. And, in cases like this where it doesn't affect performance, it is the only thing that matters.
There are competing schools of thought as to which approach is preferable.
One view is the one others have mentioned: the second approach reduces the nesting level, which improves code clarity. This is natural in an imperative style: when you have nothing left to do, you might as well return early.
Another view, from the perspective of a more functional style, is that a method should have only one exit point. Everything in a functional language is an expression, so an if must always have an else clause. Otherwise the if expression wouldn't always have a value. So in the functional style, the first approach is more natural.
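The closest C# gets to "if as an expression" is the conditional operator, which likewise cannot omit its else branch. A sketch with invented amounts (the class name and the numbers 0, 500, 800 and 1000 are purely illustrative):

```csharp
using System;

public static class ExpressionStyle
{
    // Single expression, single exit point: every branch must produce a value.
    public static double PayAmount(bool isDead, bool isSeparated, bool isRetired)
    {
        return isDead ? 0.0
             : isSeparated ? 500.0
             : isRetired ? 800.0
             : 1000.0;   // the final "else" is mandatory, unlike in an if statement
    }
}
```

Written this way the method reads as one value being computed, rather than as a sequence of statements that might leave early.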
There are several good points made here, but multiple return points can be unreadable as well, if the method is very lengthy. That being said, if you're going to use multiple return points just make sure that your method is short, otherwise the readability bonus of multiple return points may be lost.
Performance is in two parts. You have performance when the software is in production, but you also want to have performance while developing and debugging. The last thing a developer wants is to "wait" for something trivial. In the end, compiling this with optimization enabled will result in similar code. So it's good to know these little tricks that pay off in both scenarios.
The case in the question is clear, ReSharper is correct. Rather than nesting if statements, and creating new scope in code, you're setting a clear rule at the start of your method. It increases readability, it will be easier to maintain, and it reduces the amount of rules one has to sift through to find where they want to go.
Personally I prefer only 1 exit point. It's easy to accomplish if you keep your methods short and to the point, and it provides a predictable pattern for the next person who works on your code.
eg.
bool PerformDefaultOperation()
{
bool succeeded = false;
DataStructure defaultParameters;
if ((defaultParameters = this.GetApplicationDefaults()) != null)
{
succeeded = this.DoSomething(defaultParameters);
}
return succeeded;
}
This is also very useful if you just want to check the values of certain local variables within a function before it exits. All you need to do is place a breakpoint on the final return and you are guaranteed to hit it (unless an exception is thrown).
Avoiding multiple exit points can lead to performance gains. I am not sure about C# but in C++ the Named Return Value Optimization (Copy Elision, ISO C++ '03 12.8/15) depends on having a single exit point. This optimization avoids copy constructing your return value (in your specific example it doesn't matter). This could lead to considerable gains in performance in tight loops, as you are saving a constructor and a destructor each time the function is invoked.
But for 99% of the cases saving the additional constructor and destructor calls is not worth the loss of readability nested if blocks introduce (as others have pointed out).
Many good reasons have been given about how the code looks. But what about the results?
Let's take a look at some C# code and its compiled IL form:
using System;
public class Test {
public static void Main(string[] args) {
if (args.Length == 0) return;
if ((args.Length+2)/3 == 5) return;
Console.WriteLine("hey!!!");
}
}
This simple snippet can be compiled. You can open the generated .exe file with ildasm and check the result. I won't post the whole disassembly, but I'll describe the results.
The generated IL code does the following:
If the first condition is false, jumps to the code where the second is.
If it's true jumps to the last instruction. (Note: the last instruction is a return).
In the second condition the same happens after the result is calculated. Compare and: go to the Console.WriteLine if false, or to the end if true.
Print the message and return.
So it seems that the code will jump to the end. What if we do a normal if with nested code?
using System;
public class Test {
public static void Main(string[] args) {
if (args.Length != 0 && (args.Length+2)/3 != 5)
{
Console.WriteLine("hey!!!");
}
}
}
The results are quite similar in IL instructions. The difference is that before there were two jumps per condition: if false go to next piece of code, if true go to the end. And now the IL code flows better and has 3 jumps (the compiler optimized this a bit):
First jump: when Length is 0 to a part where the code jumps again (Third jump) to the end.
Second: in the middle of the second condition to avoid one instruction.
Third: if the second condition is false, jump to the end.
Anyway, the program counter will always jump.
In theory, inverting if could lead to better performance if it increases branch prediction hit rate. In practice, I think it is very hard to know exactly how branch prediction will behave, especially after compiling, so I would not do it in my day-to-day development, except if I am writing assembly code.
More on branch prediction here.
That is simply controversial. There is no "agreement among programmers" on the question of early return. It's always subjective, as far as I know.
It's possible to make a performance argument, since it's better to have conditions that are written so they are most often true; it can also be argued that it is clearer. It does, on the other hand, create nested tests.
I don't think you will get a conclusive answer to this question.
There are a lot of insightful answers here already, but still, I would like to point to a slightly different situation: instead of a precondition, which should indeed be put at the top of a function, think of a step-by-step initialization, where you have to check that each step succeeds before continuing with the next. In this case, you cannot check everything at the top.
I found my code really unreadable when writing an ASIO host application with Steinberg's ASIOSDK, as I followed the nesting paradigm. It went like eight levels deep, and I cannot see a design flaw there, as mentioned by Andrew Bullock above. Of course, I could have packed some inner code to another function, and then nested the remaining levels there to make it more readable, but this seems rather random to me.
By replacing nesting with guard clauses, I even discovered a misconception of mine regarding a portion of cleanup-code that should have occurred much earlier within the function instead of at the end. With nested branches, I would never have seen that, you could even say they led to my misconception.
So this might be another situation where inverted ifs can contribute to clearer code.
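A rough sketch of that shape, with the three steps passed in as delegates so the example stays self-contained. The step names, the log messages and the `StartupExample` class are hypothetical and have nothing to do with the actual ASIO SDK:

```csharp
using System;
using System.Collections.Generic;

public static class StartupExample
{
    // Each step returns true on success; a failing step ends initialization immediately.
    public static bool Initialize(
        Func<bool> loadDriver,
        Func<bool> openDevice,
        Func<bool> createBuffers,
        List<string> log)
    {
        if (!loadDriver())
        {
            log.Add("driver failed");
            return false;
        }
        if (!openDevice())
        {
            log.Add("device failed");
            // Clean-up for the driver would go here, next to the step that needs it.
            return false;
        }
        if (!createBuffers())
        {
            log.Add("buffers failed");
            return false;
        }
        log.Add("ready");
        return true;
    }
}
```

With eight steps this stays at one nesting level, and each failure branch makes it obvious which clean-up belongs to which step.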
It's a matter of opinion.
My normal approach would be to avoid single line ifs, and returns in the middle of a method.
You wouldn't want lines like it suggests everywhere in your method but there is something to be said for checking a bunch of assumptions at the top of your method, and only doing your actual work if they all pass.
In my opinion early return is fine if you are just returning void (or some useless return code you're never gonna check) and it might improve readability because you avoid nesting and at the same time you make explicit that your function is done.
If you are actually returning a returnValue - nesting is usually a better way to go cause you return your returnValue just in one place (at the end - duh), and it might make your code more maintainable in a whole lot of cases.
I'm not sure, but I think that R# tries to avoid far jumps. When you have an if-else, the compiler does something like this:
Condition false -> far jump to false_condition_label
true_condition_label:
instruction1
...
instruction_n
false_condition_label:
instruction1
...
instruction_n
end block
If the condition is true there is no jump and the L1 cache is left alone, but the jump to false_condition_label can be very far, and the processor may then have to reload its cache. Synchronising the cache is expensive. R# tries to replace far jumps with short jumps, in which case it is more likely that all the instructions are already in the cache.
I think it depends on what you prefer; as mentioned, there's no general agreement afaik.
To reduce the annoyance, you can downgrade this kind of warning to "Hint".
My idea is that the return "in the middle of a function" shouldn't be so "subjective".
The reason is quite simple, take this code:
function do_something( data ){
    if (!is_valid_data( data ))
        return false;
    do_something_that_take_an_hour( data );
    instance = new object_with_very_painful_constructor( data );
    if ( instance is not valid ) {
        error_message( );
        return;
    }
    connect_to_database( );
    get_some_other_data( );
    return;
}
Maybe the first "return" is not SO intuitive, but it is a real saving.
There are too many "ideas" about clean code held by people who simply need more practice to lose their "subjective" bad ideas.
There are several advantages to this sort of coding, but for me the big win is that if you can return quickly, you can improve the speed of your application. For example, I know that because of precondition X I can return quickly with an error. This gets rid of the error cases first and reduces the complexity of your code. In a lot of cases, because the CPU pipeline can now be cleaner, it can avoid pipeline stalls or switches. Secondly, if you are in a loop, breaking or returning out quickly can save you a lot of CPU. Some programmers use loop invariants to do this sort of quick exit, but that way you can break your CPU pipeline and even create memory seek problems, which means the CPU needs to load from outside the cache. But basically I think you should do what you intended, that is, end the loop or function, not create a complex code path just to implement some abstract notion of correct code. If the only tool you have is a hammer, then everything looks like a nail.
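For the loop case, a minimal sketch of returning as soon as the answer is known (the class and method names are made up for the example):

```csharp
using System;

public static class EarlyExitExample
{
    // Returns the index of the first negative value, or -1 if there is none.
    public static int IndexOfFirstNegative(int[] values)
    {
        if (values == null) return -1;     // precondition handled up front
        for (int i = 0; i < values.Length; i++)
        {
            if (values[i] < 0) return i;   // stop scanning as soon as the answer is known
        }
        return -1;
    }
}
```

The alternative with a single exit point needs a result variable plus an extra condition in the loop header, which is exactly the kind of added code path the paragraph above argues against.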