foreach loop List performance difference

foreach loop List performance difference - c#

While working on a project I ran into the following piece of code, which raised a performance flag.
foreach (var sample in List.Where(x => !x.Value.Equals("Not Reviewed")))
{
//do other work here
count++;
}
I decided to run a couple of quick tests comparing the original loop to the following loop:
foreach (var sample in List)
{
if (!sample.Value.Equals("Not Reviewed"))
{
//do other work here
count++;
}
}
and threw this loop in too to see what happens:
var tempList = List.Where(x => !x.Value.Equals("Not Reviewed"));
foreach (var sample in tempList)
{
//do other work here
count++;
}
I also populated the original list 3 different ways: 50-50 (so 50% of values where "Not Reviewed" and the rest other), 10-90 and 90-10. These are my results, the first and last loops
are mostly the same but the second one is much faster, especially on 10-90 case. Why exactly? I always thought Lambda had good performance.
EDIT
The count++ is not actually what's inside the loop, I just added that here for demonstration purposes, I guess I should've used "//do something here"
EDIT 2
Results running each one 1000 times:

Basically, there's a small amount of extra indirection - both for the test via a delegate, and for the iterating part. Given just how little work is being done per iteration, that extra indirection is relatively expensive.
That's neither surprising nor worrying, in my view. It's the kind of micro-optimization you can easily perform if you're in the rare situation of it being significant in your real-world application. In my experience it's pretty rare for this sort of loop to be a significant bottleneck in the app. The normal approach should be:
Define performance requirements
Implement the functional requirements in the clearest, simplest way you can
Measure your performance against the requirements
If performance is found wanting, investigate why and only move away from clarity as little as you can, getting the biggest "bang for buck" that you can
Repeat until you're done
Responding to an edit:
The count++ is not actually what's inside the loop, I just added that here for demonstration purposes, I guess I should've used "//do something here"
Well that's the important bit - the more work that is done there, the less significant anything else will be. Just counting is pretty darned fast, so I'd expect to see a large discrepancy. Do any amount of real work, and the difference will be smaller.

Related

Does inverting the "if" improve performance? [duplicate]

When I ran ReSharper on my code, for example:
if (some condition)
{
Some code...
}
ReSharper gave me the above warning (Invert "if" statement to reduce nesting), and suggested the following correction:
if (!some condition) return;
Some code...
I would like to understand why that's better. I always thought that using "return" in the middle of a method problematic, somewhat like "goto".

It is not only aesthetic, but it also reduces the maximum nesting level inside the method. This is generally regarded as a plus because it makes methods easier to understand (and indeed, many static analysis tools provide a measure of this as one of the indicators of code quality).
On the other hand, it also makes your method have multiple exit points, something that another group of people believes is a no-no.
Personally, I agree with ReSharper and the first group (in a language that has exceptions I find it silly to discuss "multiple exit points"; almost anything can throw, so there are numerous potential exit points in all methods).
Regarding performance: both versions should be equivalent (if not at the IL level, then certainly after the jitter is through with the code) in every language. Theoretically this depends on the compiler, but practically any widely used compiler of today is capable of handling much more advanced cases of code optimization than this.

A return in the middle of the method is not necessarily bad. It might be better to return immediately if it makes the intent of the code clearer. For example:
double getPayAmount() {
double result;
if (_isDead) result = deadAmount();
else {
if (_isSeparated) result = separatedAmount();
else {
if (_isRetired) result = retiredAmount();
else result = normalPayAmount();
};
}
return result;
};
In this case, if _isDead is true, we can immediately get out of the method. It might be better to structure it this way instead:
double getPayAmount() {
if (_isDead) return deadAmount();
if (_isSeparated) return separatedAmount();
if (_isRetired) return retiredAmount();
return normalPayAmount();
};
I've picked this code from the refactoring catalog. This specific refactoring is called: Replace Nested Conditional with Guard Clauses.

This is a bit of a religious argument, but I agree with ReSharper that you should prefer less nesting. I believe that this outweighs the negatives of having multiple return paths from a function.
The key reason for having less nesting is to improve code readability and maintainability. Remember that many other developers will need to read your code in the future, and code with less indentation is generally much easier to read.
Preconditions are a great example of where it is okay to return early at the start of the function. Why should the readability of the rest of the function be affected by the presence of a precondition check?
As for the negatives about returning multiple times from a method - debuggers are pretty powerful now, and it's very easy to find out exactly where and when a particular function is returning.
Having multiple returns in a function is not going to affect the maintainance programmer's job.
Poor code readability will.

As others have mentioned, there shouldn't be a performance hit, but there are other considerations. Aside from those valid concerns, this also can open you up to gotchas in some circumstances. Suppose you were dealing with a double instead:
public void myfunction(double exampleParam){
if(exampleParam > 0){
//Body will *not* be executed if Double.IsNan(exampleParam)
}
}
Contrast that with the seemingly equivalent inversion:
public void myfunction(double exampleParam){
if(exampleParam <= 0)
return;
//Body *will* be executed if Double.IsNan(exampleParam)
}
So in certain circumstances what appears to be a a correctly inverted if might not be.

The idea of only returning at the end of a function came back from the days before languages had support for exceptions. It enabled programs to rely on being able to put clean-up code at the end of a method, and then being sure it would be called and some other programmer wouldn't hide a return in the method that caused the cleanup code to be skipped. Skipped cleanup code could result in a memory or resource leak.
However, in a language that supports exceptions, it provides no such guarantees. In a language that supports exceptions, the execution of any statement or expression can cause a control flow that causes the method to end. This means clean-up must be done through using the finally or using keywords.
Anyway, I'm saying I think a lot of people quote the 'only return at the end of a method' guideline without understanding why it was ever a good thing to do, and that reducing nesting to improve readability is probably a better aim.

I'd like to add that there is name for those inverted if's - Guard Clause. I use it whenever I can.
I hate reading code where there is if at the beginning, two screens of code and no else. Just invert if and return. That way nobody will waste time scrolling.
http://c2.com/cgi/wiki?GuardClause

It doesn't only affect aesthetics, but it also prevents code nesting.
It can actually function as a precondition to ensure that your data is valid as well.

This is of course subjective, but I think it strongly improves on two points:
It is now immediately obvious that your function has nothing left to do if condition holds.
It keeps the nesting level down. Nesting hurts readability more than you'd think.

Multiple return points were a problem in C (and to a lesser extent C++) because they forced you to duplicate clean-up code before each of the return points. With garbage collection, the try | finally construct and using blocks, there's really no reason why you should be afraid of them.
Ultimately it comes down to what you and your colleagues find easier to read.

Guard clauses or pre-conditions (as you can probably see) check to see if a certain condition is met and then breaks the flow of the program. They're great for places where you're really only interested in one outcome of an if statement. So rather than say:
if (something) {
// a lot of indented code
}
You reverse the condition and break if that reversed condition is fulfilled
if (!something) return false; // or another value to show your other code the function did not execute
// all the code from before, save a lot of tabs
return is nowhere near as dirty as goto. It allows you to pass a value to show the rest of your code that the function couldn't run.
You'll see the best examples of where this can be applied in nested conditions:
if (something) {
do-something();
if (something-else) {
do-another-thing();
} else {
do-something-else();
}
}
vs:
if (!something) return;
do-something();
if (!something-else) return do-something-else();
do-another-thing();
You'll find few people arguing the first is cleaner but of course, it's completely subjective. Some programmers like to know what conditions something is operating under by indentation, while I'd much rather keep method flow linear.
I won't suggest for one moment that precons will change your life or get you laid but you might find your code just that little bit easier to read.

Performance-wise, there will be no noticeable difference between the two approaches.
But coding is about more than performance. Clarity and maintainability are also very important. And, in cases like this where it doesn't affect performance, it is the only thing that matters.
There are competing schools of thought as to which approach is preferable.
One view is the one others have mentioned: the second approach reduces the nesting level, which improves code clarity. This is natural in an imperative style: when you have nothing left to do, you might as well return early.
Another view, from the perspective of a more functional style, is that a method should have only one exit point. Everything in a functional language is an expression. So if statements must always have an else clauses. Otherwise the if expression wouldn't always have a value. So in the functional style, the first approach is more natural.

There are several good points made here, but multiple return points can be unreadable as well, if the method is very lengthy. That being said, if you're going to use multiple return points just make sure that your method is short, otherwise the readability bonus of multiple return points may be lost.

Performance is in two parts. You have performance when the software is in production, but you also want to have performance while developing and debugging. The last thing a developer wants is to "wait" for something trivial. In the end, compiling this with optimization enabled will result in similar code. So it's good to know these little tricks that pay off in both scenarios.
The case in the question is clear, ReSharper is correct. Rather than nesting if statements, and creating new scope in code, you're setting a clear rule at the start of your method. It increases readability, it will be easier to maintain, and it reduces the amount of rules one has to sift through to find where they want to go.

Personally I prefer only 1 exit point. It's easy to accomplish if you keep your methods short and to the point, and it provides a predictable pattern for the next person who works on your code.
eg.
bool PerformDefaultOperation()
{
bool succeeded = false;
DataStructure defaultParameters;
if ((defaultParameters = this.GetApplicationDefaults()) != null)
{
succeeded = this.DoSomething(defaultParameters);
}
return succeeded;
}
This is also very useful if you just want to check the values of certain local variables within a function before it exits. All you need to do is place a breakpoint on the final return and you are guaranteed to hit it (unless an exception is thrown).

Avoiding multiple exit points can lead to performance gains. I am not sure about C# but in C++ the Named Return Value Optimization (Copy Elision, ISO C++ '03 12.8/15) depends on having a single exit point. This optimization avoids copy constructing your return value (in your specific example it doesn't matter). This could lead to considerable gains in performance in tight loops, as you are saving a constructor and a destructor each time the function is invoked.
But for 99% of the cases saving the additional constructor and destructor calls is not worth the loss of readability nested if blocks introduce (as others have pointed out).

Many good reasons about how the code looks like. But what about results?
Let's take a look to some C# code and its IL compiled form:
using System;
public class Test {
public static void Main(string[] args) {
if (args.Length == 0) return;
if ((args.Length+2)/3 == 5) return;
Console.WriteLine("hey!!!");
}
}
This simple snippet can be compiled. You can open the generated .exe file with ildasm and check what is the result. I won't post all the assembler thing but I'll describe the results.
The generated IL code does the following:
If the first condition is false, jumps to the code where the second is.
If it's true jumps to the last instruction. (Note: the last instruction is a return).
In the second condition the same happens after the result is calculated. Compare and: got to the Console.WriteLine if false or to the end if this is true.
Print the message and return.
So it seems that the code will jump to the end. What if we do a normal if with nested code?
using System;
public class Test {
public static void Main(string[] args) {
if (args.Length != 0 && (args.Length+2)/3 != 5)
{
Console.WriteLine("hey!!!");
}
}
}
The results are quite similar in IL instructions. The difference is that before there were two jumps per condition: if false go to next piece of code, if true go to the end. And now the IL code flows better and has 3 jumps (the compiler optimized this a bit):
First jump: when Length is 0 to a part where the code jumps again (Third jump) to the end.
Second: in the middle of the second condition to avoid one instruction.
Third: if the second condition is false, jump to the end.
Anyway, the program counter will always jump.

In theory, inverting if could lead to better performance if it increases branch prediction hit rate. In practice, I think it is very hard to know exactly how branch prediction will behave, especially after compiling, so I would not do it in my day-to-day development, except if I am writing assembly code.
More on branch prediction here.

That is simply controversial. There is no "agreement among programmers" on the question of early return. It's always subjective, as far as I know.
It's possible to make a performance argument, since it's better to have conditions that are written so they are most often true; it can also be argued that it is clearer. It does, on the other hand, create nested tests.
I don't think you will get a conclusive answer to this question.

There are a lot of insightful answers there already, but still, I would to direct to a slightly different situation: Instead of precondition, that should be put on top of a function indeed, think of a step-by-step initialization, where you have to check for each step to succeed and then continue with the next. In this case, you cannot check everything at the top.
I found my code really unreadable when writing an ASIO host application with Steinberg's ASIOSDK, as I followed the nesting paradigm. It went like eight levels deep, and I cannot see a design flaw there, as mentioned by Andrew Bullock above. Of course, I could have packed some inner code to another function, and then nested the remaining levels there to make it more readable, but this seems rather random to me.
By replacing nesting with guard clauses, I even discovered a misconception of mine regarding a portion of cleanup-code that should have occurred much earlier within the function instead of at the end. With nested branches, I would never have seen that, you could even say they led to my misconception.
So this might be another situation where inverted ifs can contribute to a clearer code.

It's a matter of opinion.
My normal approach would be to avoid single line ifs, and returns in the middle of a method.
You wouldn't want lines like it suggests everywhere in your method but there is something to be said for checking a bunch of assumptions at the top of your method, and only doing your actual work if they all pass.

In my opinion early return is fine if you are just returning void (or some useless return code you're never gonna check) and it might improve readability because you avoid nesting and at the same time you make explicit that your function is done.
If you are actually returning a returnValue - nesting is usually a better way to go cause you return your returnValue just in one place (at the end - duh), and it might make your code more maintainable in a whole lot of cases.

I'm not sure, but I think, that R# tries to avoid far jumps. When You have IF-ELSE, compiler does something like this:
Condition false -> far jump to false_condition_label
true_condition_label:
instruction1
...
instruction_n
false_condition_label:
instruction1
...
instruction_n
end block
If condition is true there is no jump and no rollout L1 cache, but jump to false_condition_label can be very far and processor must rollout his own cache. Synchronising cache is expensive. R# tries replace far jumps into short jumps and in this case there is bigger probability, that all instructions are already in cache.

I think it depends on what you prefer, as mentioned, theres no general agreement afaik.
To reduce annoyment, you may reduce this kind of warning to "Hint"

My idea is that the return "in the middle of a function" shouldn't be so "subjective".
The reason is quite simple, take this code:
function do_something( data ){
if (!is_valid_data( data ))
return false;
do_something_that_take_an_hour( data );
istance = new object_with_very_painful_constructor( data );
if ( istance is not valid ) {
error_message( );
return ;
}
connect_to_database ( );
get_some_other_data( );
return;
}
Maybe the first "return" it's not SO intuitive, but that's really saving.
There are too many "ideas" about clean codes, that simply need more practise to lose their "subjective" bad ideas.

There are several advantages to this sort of coding but for me the big win is, if you can return quick you can improve the speed of your application. IE I know that because of Precondition X that I can return quickly with an error. This gets rid of the error cases first and reduces the complexity of your code. In a lot of cases because the cpu pipeline can be now be cleaner it can stop pipeline crashes or switches. Secondly if you are in a loop, breaking or returning out quickly can save you a lots of cpu. Some programmers use loop invariants to do this sort of quick exit but in this you can broke your cpu pipeline and even create memory seek problem and mean the the cpu needs to load from outside cache. But basically I think you should do what you intended, that is end the loop or function not create a complex code path just to implement some abstract notion of correct code. If the only tool you have is a hammer then everything looks like a nail.

Calling a method in a linq foreach - how much overhead is there?

I'm thinking of replacing a lot of inline foreaches with Linq and in so doing will make new methods, e.g.
Current:
foreach(Item in List)
{
Statement1
Statement2
Statement3
}
Idea:
List.Foreach(Item => Method(Item))
Obviously Method() contains Statement1..3
Is this good practise or is calling a method thousands of times going to degrade performance? My Lists have 10,000-100,000 elements.

Well, for one thing you can probably make the ForEach statement more efficient using a method group conversion
List.ForEach(Method);
That's removed one level of indirection.
Personally though, I don't think it's a good idea. The first approach is more readable, and likely to perform about as well. What's the advantage of using List<T>.ForEach here?
Eric Lippert talks about this more in an excellent blog post. I would use List<T>.ForEach if you already had a delegate you wanted to execute against each element, but I wouldn't introduce a delegate and an extra method just for the sake of it.
In terms of efficiency, I wouldn't expect to see much difference. The first form may perform a little better as it doesn't have the indirection of the delegate call - but the second form may be more efficient if the iteration loop within ForEach makes use of the fact that it has access to the internal data structures of the List<T>. I very much doubt you'll notice it either way. You could try to measure it if you're really bothered, of course.

If your motivation for considering the change is that the three statements in the body are too complicated, then I'd probably use ordinary foreach, but refactor the body to a method:
foreach(var item in List)
Method(item);
If the code in the body isn't complicated, then I'd agree with Jon that there is no good reason for using ForEach (it doesn't make the code more readable/declarative in any way).
I generally don't like using "LINQ-like" constructs to do imperative processing at the end of a LINQ query. I think that using foreach more clearly states that you're finished with querying data and you're doing some processing now.

I'm totally agree with Jon Skeet's answer. But since we are talking about ForEach performance, I have something addtional to your question. Be aware of that if your Statement 1~3 is not relative with each other, that is:
foreach(Item in List)
{
DoSomething();
DoAnotherThing();
DoTheLastThing();
}
The code above probably has a worse performance than the following:
foreach(Item in List)
{
DoSomething();
}
foreach(Item in List)
{
DoAnotherThing();
}
foreach(Item in List)
{
DoTheLastThing();
}
The reason that the latter code which needs 2 more go-over-loops has a better performance, is because when it keeps calling DoSomething() thousands of times, some necessary variables are always warm in CPU registers. Very low costs are used to access those variables. On the other hand, if it calls DoAnotherThing() immediately after calling DoSomthing(), those variables of DoSomething() which already in CPU registers will cool down. Much more costs are needed to access these variables in the next loop.

I've always thought that you should write your code for readability first because the compiler and CLR do an exceptional job at optimisation. If you find that through benchmarking, that this code could be executed more quickly, then have a look at other options by all means.
E.g. for loops are quicker than foreach(), as they use array offsets which are internally optimised in the CLR.
But doesn't a List.ForEach () surely does a foreach ( ) anyway, so you are just giving the work to another method, rather than doing it yourself.
Strictly speaking, introducing more method calls will actually slow your code down on the first pass, because the CLR will JIT-compile methods as they are called, although subsequent calls to the method will not.
So my advice would be stick to writing readable code, then go from there if you can prove that this is a bottleneck of the system.

C# foreach vs functional each [closed]

Closed. This question is opinion-based. It is not currently accepting answers.
Want to improve this question? Update the question so it can be answered with facts and citations by editing this post.
Closed 9 years ago.
Improve this question
Which one of these do you prefer?
foreach(var zombie in zombies)
{
zombie.ShuffleTowardsSurvivors();
zombie.EatNearbyBrains();
}
or
zombies.Each(zombie => {
zombie.ShuffleTowardsSurvivors();
zombie.EatNearbyBrains();
});

The first. It's part of the language for a reason.
Personally, I'd only use the second, functional approach to flow control if there is a good reason to do so, such as using Parallel.ForEach in .NET 4. It has many disadvantages, including:
It's slower. It's going to introduce a delegate invocation at each element, just like you did foreach (..) { myDelegate(); }
It's non-standard, so will be more difficult to understand by most developers
If you close over any locals, you're going to force the compiler to make a closure. This can lead to strange issues if there's threading involved, plus adds completely unnecessary bloat to the assembly.
I see no reason to write your own syntax for a flow control construct that already exists in the language.

Here you're doing some very imperative things like writing a statement rather than an expression (as presumably the Each method returns no value) and mutating state (which one can only assume the methods do, as they also appear to return no value) yet you're trying to pass them off as 'functional programming' by passing a collection of statements as a delegate. This code could barely be further from the ideals and idioms of functional programming, so why try to disguise it as such?
As much as I like multi-paradigm languages such as C#, I think they are easiest to understand and maintain when paradigms are mixed at a higher level (e.g. an entire method written in either a functional or an imperative style) rather than when multiple paradigms are mixed within a single statement or expression.
If you're writing imperative code just be honest about it and use a loop. It's nothing to be ashamed of. Imperative code is not an inherently bad thing.

Second form.
In my opinion, the less language constructs and keywords you have to use, the better. C# has enough extraneous crud in it as it is.
Generally the less you have to type, the better. Seriously, how could you not want to use "var" in situations like this? Surely if being explicit was your only goal, you'd still be using hungarian notation... you have an IDE that gives you type information whenever you hover over... or of course Ctrl+Q if you're using Resharper...
#T.E.D. The performance implications of a delegate invocation are a secondary concern. If you're doing this a thousand terms sure, run dot trace and see if it's not acceptable.
#Reed Copsey: re non-standard, if a developer can't work out what ".Each" is doing then you've got more problems, heh. Hacking the language to make it nicer is one of the great joys of programming.

The lamda version is actually not slower. I just did a quick test and the delegate version is about 30% faster.
Here is the codez:
class Blah {
public void DoStuff() {
}
}
List<Blah> blahs = new List<Blah>();
DateTime start = DateTime.Now;
for(int i = 0; i < 30000000; i++) {
blahs.Add(new Blah());
}
TimeSpan elapsed = (DateTime.Now - start);
Console.WriteLine(string.Format(System.Globalization.CultureInfo.CurrentCulture, "Allocation - {0:00}:{1:00}:{2:00}.{3:000}",
elapsed.Hours,
elapsed.Minutes,
elapsed.Seconds,
elapsed.Milliseconds));
start = DateTime.Now;
foreach(var bl in blahs) {
bl.DoStuff();
}
elapsed = (DateTime.Now - start);
Console.WriteLine(string.Format(System.Globalization.CultureInfo.CurrentCulture, "foreach - {0:00}:{1:00}:{2:00}.{3:000}",
elapsed.Hours,
elapsed.Minutes,
elapsed.Seconds,
elapsed.Milliseconds));
start = DateTime.Now;
blahs.ForEach(bl=>bl.DoStuff());
elapsed = (DateTime.Now - start);
Console.WriteLine(string.Format(System.Globalization.CultureInfo.CurrentCulture, "lambda - {0:00}:{1:00}:{2:00}.{3:000}",
elapsed.Hours,
elapsed.Minutes,
elapsed.Seconds,
elapsed.Milliseconds));
OK, So I've run more tests and here are the results.
The order of the execution(forach, lambda or lambda, foreach) didn't make much difference, lambda version was still faster:
foreach - 00:00:00.561
lambda - 00:00:00.389
lambda - 00:00:00.317
foreach - 00:00:00.337
The difference in performance is a lot less for arrays of classes. Here are the numbers for Blah[30000000]:
lambda - 00:00:00.317
foreach - 00:00:00.337
Here is the same test but Blah being a struct:
Blah[] version
lambda - 00:00:00.676
foreach - 00:00:00.437
List version:
lambda - 00:00:00.461
foreach - 00:00:00.391
Optimized build, Blah is a struct using an array.
lambda - 00:00:00.426
foreach - 00:00:00.079
Conclusion: There is no blanket answer for performance of foreach vs lambda. The answer is It depends. Here is a more scientific test for List<T>. As far as I can tell it's pretty damn efficient. If you are really concerned with performance use for(int i... loop. For iterating over a collection of a thousand customer records (example) it really doesn't matter all that much.
As far as deciding between which version to use I would put potential performance hit for lambda version way at the bottom.
Conclusion #2 T[] (where T is a value type) foreach loop is about 5 times faster for this test in an optimized build. That's the only significant difference between a Debug and Release build. So there you go, for arrays of value types use foreach, everything else - it doesn't matter.

This question contains some useful discussion, as well as a link to an MSDN blog post, on the philosophical aspects of the topic.

I think extension methods are cool, but I think break and edit-and-continue are cooler.

I'd think the second form would be tougher to optimize, as there's no way for the compiler to unroll the loop any differently for this one call than it does for anybody else's call to the Each method.
Since it was asked, I'll elaborate. The method's implementation is quite liable to be compiled separately from the code that invokes it. This means that the compiler does not know exactly how many loops it is going to have to perform.
If you use the "foreach" form then that information may be avaliable to the compiler when it is creating the code for the loop (it also may not be available, in which case no difference).
For example, if the compiler happens to know (from previous code in the same file) that the list has exactly 20 items in it, it can replace the entire loop with 20 references.
However, when the compiler creates code for the "Each" method off in its source file, it has no idea how big the caller's list is going to be. It has to support any size. The best it can do is try to find some kind of optimum unrolling for its CPU, and add extra code to loop through that and do a proper loop if it is too small for the unrolling. For a typical small loop this might even end up being slower. Of course for small loops you don't care as much....unless they happen to be inside a big loop.
As another poster mentioned, this is (and should be) a secondary concern. The important thing is which is easier to read and/or maintain, but I don't see a huge difference there.

I don't prefer either, because of what I consider to be an un-needed use of 'var'. I would write is as:
foreach(Zombie zombie in zombies){
}
But as to the Functional or foreach, for me I most definitely prefer foreach, because there doesn't seem to be a good reason for the latter.

LINQ Optimization Question

So I've been using LINQ for a while, and I have a question.
I have a collection of objects. They fall into two categories based on their property values. I need to set a different property one way for one group, and one way for the other:
foreach(MyItem currentItem in myItemCollection)
{
if (currentItem.myProp == "CATEGORY_ONE")
{
currentItem.value = 1;
}
else if (currentItem.myProp == "CATEGORY_TWO")
{
currentItem.value = 2;
}
}
Alternately, I could do something like:
myItemCollection.Where(currentItem=>currentItem.myProp == "CATEGORY_ONE").ForEach(item=>item.value = 1);
myItemCollection.Where(currentItem=>currentItem.myProp == "CATEGORY_TWO").ForEach(item=>item.value = 2);
I would think the first one is faster, but figured it couldn't hurt to check.

Iterating through the collection only once (and not calling any delegates, and not using as many iterators) is likely to be slightly faster, but I very much doubt that it'll be significant.
Write the most readable code which does the job, and only worry about performance at the micro level (i.e. where it's easy to change) when it's a problem.
I think the first piece of code is more readable in this case. Less LINQy, but more readable.

How about doing it like that?
myItemCollection.ForEach(item => item.value = item.myProp == "CATEGORY_ONE" ? 1 : 2);

Real Answer
Only a profiler will really tell you which one is faster.
Fuzzy Answer
The first one is most likely faster in terms of raw speed. There are two reasons why
The list is only iterated a single time
The second one erquires 2 delegate invocations for every element in the list.
The real question though is "does the speed difference between the two solutions matter?" That is the only question that is relevant to your application. And only profiling can really give you much data on this.

Invert "if" statement to reduce nesting

When I ran ReSharper on my code, for example:
if (some condition)
{
Some code...
}
ReSharper gave me the above warning (Invert "if" statement to reduce nesting), and suggested the following correction:
if (!some condition) return;
Some code...
I would like to understand why that's better. I always thought that using "return" in the middle of a method problematic, somewhat like "goto".

It is not only aesthetic, but it also reduces the maximum nesting level inside the method. This is generally regarded as a plus because it makes methods easier to understand (and indeed, many static analysis tools provide a measure of this as one of the indicators of code quality).
On the other hand, it also makes your method have multiple exit points, something that another group of people believes is a no-no.
Personally, I agree with ReSharper and the first group (in a language that has exceptions I find it silly to discuss "multiple exit points"; almost anything can throw, so there are numerous potential exit points in all methods).
Regarding performance: both versions should be equivalent (if not at the IL level, then certainly after the jitter is through with the code) in every language. Theoretically this depends on the compiler, but practically any widely used compiler of today is capable of handling much more advanced cases of code optimization than this.

A return in the middle of the method is not necessarily bad. It might be better to return immediately if it makes the intent of the code clearer. For example:
double getPayAmount() {
double result;
if (_isDead) result = deadAmount();
else {
if (_isSeparated) result = separatedAmount();
else {
if (_isRetired) result = retiredAmount();
else result = normalPayAmount();
};
}
return result;
};
In this case, if _isDead is true, we can immediately get out of the method. It might be better to structure it this way instead:
double getPayAmount() {
if (_isDead) return deadAmount();
if (_isSeparated) return separatedAmount();
if (_isRetired) return retiredAmount();
return normalPayAmount();
};
I've picked this code from the refactoring catalog. This specific refactoring is called: Replace Nested Conditional with Guard Clauses.

This is a bit of a religious argument, but I agree with ReSharper that you should prefer less nesting. I believe that this outweighs the negatives of having multiple return paths from a function.
The key reason for having less nesting is to improve code readability and maintainability. Remember that many other developers will need to read your code in the future, and code with less indentation is generally much easier to read.
Preconditions are a great example of where it is okay to return early at the start of the function. Why should the readability of the rest of the function be affected by the presence of a precondition check?
As for the negatives about returning multiple times from a method - debuggers are pretty powerful now, and it's very easy to find out exactly where and when a particular function is returning.
Having multiple returns in a function is not going to affect the maintainance programmer's job.
Poor code readability will.

As others have mentioned, there shouldn't be a performance hit, but there are other considerations. Aside from those valid concerns, this also can open you up to gotchas in some circumstances. Suppose you were dealing with a double instead:
public void myfunction(double exampleParam){
if(exampleParam > 0){
//Body will *not* be executed if Double.IsNan(exampleParam)
}
}
Contrast that with the seemingly equivalent inversion:
public void myfunction(double exampleParam){
if(exampleParam <= 0)
return;
//Body *will* be executed if Double.IsNan(exampleParam)
}
So in certain circumstances what appears to be a a correctly inverted if might not be.

The idea of only returning at the end of a function came back from the days before languages had support for exceptions. It enabled programs to rely on being able to put clean-up code at the end of a method, and then being sure it would be called and some other programmer wouldn't hide a return in the method that caused the cleanup code to be skipped. Skipped cleanup code could result in a memory or resource leak.
However, in a language that supports exceptions, it provides no such guarantees. In a language that supports exceptions, the execution of any statement or expression can cause a control flow that causes the method to end. This means clean-up must be done through using the finally or using keywords.
Anyway, I'm saying I think a lot of people quote the 'only return at the end of a method' guideline without understanding why it was ever a good thing to do, and that reducing nesting to improve readability is probably a better aim.

I'd like to add that there is name for those inverted if's - Guard Clause. I use it whenever I can.
I hate reading code where there is if at the beginning, two screens of code and no else. Just invert if and return. That way nobody will waste time scrolling.
http://c2.com/cgi/wiki?GuardClause

It doesn't only affect aesthetics, but it also prevents code nesting.
It can actually function as a precondition to ensure that your data is valid as well.

This is of course subjective, but I think it strongly improves on two points:
It is now immediately obvious that your function has nothing left to do if condition holds.
It keeps the nesting level down. Nesting hurts readability more than you'd think.

Multiple return points were a problem in C (and to a lesser extent C++) because they forced you to duplicate clean-up code before each of the return points. With garbage collection, the try | finally construct and using blocks, there's really no reason why you should be afraid of them.
Ultimately it comes down to what you and your colleagues find easier to read.

Guard clauses or pre-conditions (as you can probably see) check to see if a certain condition is met and then breaks the flow of the program. They're great for places where you're really only interested in one outcome of an if statement. So rather than say:
if (something) {
// a lot of indented code
}
You reverse the condition and break if that reversed condition is fulfilled
if (!something) return false; // or another value to show your other code the function did not execute
// all the code from before, save a lot of tabs
return is nowhere near as dirty as goto. It allows you to pass a value to show the rest of your code that the function couldn't run.
You'll see the best examples of where this can be applied in nested conditions:
if (something) {
do-something();
if (something-else) {
do-another-thing();
} else {
do-something-else();
}
}
vs:
if (!something) return;
do-something();
if (!something-else) return do-something-else();
do-another-thing();
You'll find few people arguing the first is cleaner but of course, it's completely subjective. Some programmers like to know what conditions something is operating under by indentation, while I'd much rather keep method flow linear.
I won't suggest for one moment that precons will change your life or get you laid but you might find your code just that little bit easier to read.

Performance-wise, there will be no noticeable difference between the two approaches.
But coding is about more than performance. Clarity and maintainability are also very important. And, in cases like this where it doesn't affect performance, it is the only thing that matters.
There are competing schools of thought as to which approach is preferable.
One view is the one others have mentioned: the second approach reduces the nesting level, which improves code clarity. This is natural in an imperative style: when you have nothing left to do, you might as well return early.
Another view, from the perspective of a more functional style, is that a method should have only one exit point. Everything in a functional language is an expression. So if statements must always have an else clauses. Otherwise the if expression wouldn't always have a value. So in the functional style, the first approach is more natural.

There are several good points made here, but multiple return points can be unreadable as well, if the method is very lengthy. That being said, if you're going to use multiple return points just make sure that your method is short, otherwise the readability bonus of multiple return points may be lost.

Performance is in two parts. You have performance when the software is in production, but you also want to have performance while developing and debugging. The last thing a developer wants is to "wait" for something trivial. In the end, compiling this with optimization enabled will result in similar code. So it's good to know these little tricks that pay off in both scenarios.
The case in the question is clear, ReSharper is correct. Rather than nesting if statements, and creating new scope in code, you're setting a clear rule at the start of your method. It increases readability, it will be easier to maintain, and it reduces the amount of rules one has to sift through to find where they want to go.

Personally I prefer only 1 exit point. It's easy to accomplish if you keep your methods short and to the point, and it provides a predictable pattern for the next person who works on your code.
eg.
bool PerformDefaultOperation()
{
bool succeeded = false;
DataStructure defaultParameters;
if ((defaultParameters = this.GetApplicationDefaults()) != null)
{
succeeded = this.DoSomething(defaultParameters);
}
return succeeded;
}
This is also very useful if you just want to check the values of certain local variables within a function before it exits. All you need to do is place a breakpoint on the final return and you are guaranteed to hit it (unless an exception is thrown).

Avoiding multiple exit points can lead to performance gains. I am not sure about C# but in C++ the Named Return Value Optimization (Copy Elision, ISO C++ '03 12.8/15) depends on having a single exit point. This optimization avoids copy constructing your return value (in your specific example it doesn't matter). This could lead to considerable gains in performance in tight loops, as you are saving a constructor and a destructor each time the function is invoked.
But for 99% of the cases saving the additional constructor and destructor calls is not worth the loss of readability nested if blocks introduce (as others have pointed out).

Many good reasons about how the code looks like. But what about results?
Let's take a look to some C# code and its IL compiled form:
using System;
public class Test {
public static void Main(string[] args) {
if (args.Length == 0) return;
if ((args.Length+2)/3 == 5) return;
Console.WriteLine("hey!!!");
}
}
This simple snippet can be compiled. You can open the generated .exe file with ildasm and check what is the result. I won't post all the assembler thing but I'll describe the results.
The generated IL code does the following:
If the first condition is false, jumps to the code where the second is.
If it's true jumps to the last instruction. (Note: the last instruction is a return).
In the second condition the same happens after the result is calculated. Compare and: got to the Console.WriteLine if false or to the end if this is true.
Print the message and return.
So it seems that the code will jump to the end. What if we do a normal if with nested code?
using System;
public class Test {
public static void Main(string[] args) {
if (args.Length != 0 && (args.Length+2)/3 != 5)
{
Console.WriteLine("hey!!!");
}
}
}
The results are quite similar in IL instructions. The difference is that before there were two jumps per condition: if false go to next piece of code, if true go to the end. And now the IL code flows better and has 3 jumps (the compiler optimized this a bit):
First jump: when Length is 0 to a part where the code jumps again (Third jump) to the end.
Second: in the middle of the second condition to avoid one instruction.
Third: if the second condition is false, jump to the end.
Anyway, the program counter will always jump.

In theory, inverting if could lead to better performance if it increases branch prediction hit rate. In practice, I think it is very hard to know exactly how branch prediction will behave, especially after compiling, so I would not do it in my day-to-day development, except if I am writing assembly code.
More on branch prediction here.

That is simply controversial. There is no "agreement among programmers" on the question of early return. It's always subjective, as far as I know.
It's possible to make a performance argument, since it's better to have conditions that are written so they are most often true; it can also be argued that it is clearer. It does, on the other hand, create nested tests.
I don't think you will get a conclusive answer to this question.

There are a lot of insightful answers there already, but still, I would to direct to a slightly different situation: Instead of precondition, that should be put on top of a function indeed, think of a step-by-step initialization, where you have to check for each step to succeed and then continue with the next. In this case, you cannot check everything at the top.
I found my code really unreadable when writing an ASIO host application with Steinberg's ASIOSDK, as I followed the nesting paradigm. It went like eight levels deep, and I cannot see a design flaw there, as mentioned by Andrew Bullock above. Of course, I could have packed some inner code to another function, and then nested the remaining levels there to make it more readable, but this seems rather random to me.
By replacing nesting with guard clauses, I even discovered a misconception of mine regarding a portion of cleanup-code that should have occurred much earlier within the function instead of at the end. With nested branches, I would never have seen that, you could even say they led to my misconception.
So this might be another situation where inverted ifs can contribute to a clearer code.

It's a matter of opinion.
My normal approach would be to avoid single line ifs, and returns in the middle of a method.
You wouldn't want lines like it suggests everywhere in your method but there is something to be said for checking a bunch of assumptions at the top of your method, and only doing your actual work if they all pass.

In my opinion early return is fine if you are just returning void (or some useless return code you're never gonna check) and it might improve readability because you avoid nesting and at the same time you make explicit that your function is done.
If you are actually returning a returnValue - nesting is usually a better way to go cause you return your returnValue just in one place (at the end - duh), and it might make your code more maintainable in a whole lot of cases.

I'm not sure, but I think, that R# tries to avoid far jumps. When You have IF-ELSE, compiler does something like this:
Condition false -> far jump to false_condition_label
true_condition_label:
instruction1
...
instruction_n
false_condition_label:
instruction1
...
instruction_n
end block
If condition is true there is no jump and no rollout L1 cache, but jump to false_condition_label can be very far and processor must rollout his own cache. Synchronising cache is expensive. R# tries replace far jumps into short jumps and in this case there is bigger probability, that all instructions are already in cache.

I think it depends on what you prefer, as mentioned, theres no general agreement afaik.
To reduce annoyment, you may reduce this kind of warning to "Hint"

My idea is that the return "in the middle of a function" shouldn't be so "subjective".
The reason is quite simple, take this code:
function do_something( data ){
if (!is_valid_data( data ))
return false;
do_something_that_take_an_hour( data );
istance = new object_with_very_painful_constructor( data );
if ( istance is not valid ) {
error_message( );
return ;
}
connect_to_database ( );
get_some_other_data( );
return;
}
Maybe the first "return" it's not SO intuitive, but that's really saving.
There are too many "ideas" about clean codes, that simply need more practise to lose their "subjective" bad ideas.

There are several advantages to this sort of coding but for me the big win is, if you can return quick you can improve the speed of your application. IE I know that because of Precondition X that I can return quickly with an error. This gets rid of the error cases first and reduces the complexity of your code. In a lot of cases because the cpu pipeline can be now be cleaner it can stop pipeline crashes or switches. Secondly if you are in a loop, breaking or returning out quickly can save you a lots of cpu. Some programmers use loop invariants to do this sort of quick exit but in this you can broke your cpu pipeline and even create memory seek problem and mean the the cpu needs to load from outside cache. But basically I think you should do what you intended, that is end the loop or function not create a complex code path just to implement some abstract notion of correct code. If the only tool you have is a hammer then everything looks like a nail.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

foreach loop List performance difference - c#

Related

Does inverting the "if" improve performance? [duplicate]

Calling a method in a linq foreach - how much overhead is there?

C# foreach vs functional each [closed]

LINQ Optimization Question

Invert "if" statement to reduce nesting

Categories

Resources