A colleague once said that God is killing a kitten every time I write a for-loop.
When asked how to avoid for-loops, his answer was to use a functional language. However, if you are stuck with a non-functional language, say C#, what techniques are there to avoid for-loops or to get rid of them by refactoring? With lambda expressions and LINQ perhaps? If so, how?
Questions
So the question boils down to:
Why are for-loops bad? Or, in what context are for-loops to avoid and why?
Can you provide C# code examples of how it looks before, i.e. with a loop, and afterwards without a loop?
Functional constructs often express your intent more clearly than for-loops in cases where you operate on some data set and want to transform, filter or aggregate the elements.
Loops are very appropriate when you want to repeatedly execute some action.
For example
int x = array.Sum();
much more clearly expresses your intent than
int x = 0;
for (int i = 0; i < array.Length; i++)
{
x += array[i];
}
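The same contrast shows up once filtering and transforming are involved as well. Below is a small sketch, written as a C# 9 top-level program. The array and the "sum of squares of the even numbers" task are made up for illustration. Both versions compute the same value, but in the LINQ version each step is named by its operator.

```csharp
using System;
using System.Linq;

int[] values = { 1, 2, 3, 4, 5, 6 };

// Imperative version: filter, transform and aggregate are interleaved in one loop body.
int loopTotal = 0;
for (int i = 0; i < values.Length; i++)
{
    if (values[i] % 2 == 0)                  // filter: keep even numbers
    {
        loopTotal += values[i] * values[i];  // transform (square) and aggregate (sum)
    }
}

// Declarative version: each step is named by the LINQ operator that performs it.
int linqTotal = values
    .Where(v => v % 2 == 0)   // filter
    .Select(v => v * v)       // transform
    .Sum();                   // aggregate

Console.WriteLine($"{loopTotal} {linqTotal}");
```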
Why are for-loops bad? Or, in what context are for-loops to avoid and why?
If your colleague has a functional programming background, then he's probably already familiar with the basic reasons for avoiding for loops:
Fold / Map / Filter cover most use cases of list traversal, and lend themselves well to function composition. For-loops aren't a good pattern because they aren't composable.
Most of the time, you traverse through a list to fold (aggregate), map, or filter values in a list. These higher order functions already exist in every mainstream functional language, so you rarely see the for-loop idiom used in functional code.
Higher order functions are the bread and butter of function composition, meaning you can easily combine simple functions into something more complex.
To give a non-trivial example, consider the following in an imperative language:
let x = someList;
y = []
for x' in x
    y.Add(f x')
z = []
for y' in y
    z.Add(g y')
In a functional language, we'd write map g (map f x), or we can eliminate the intermediate list using map (g . f) x. Now we can, in principle, eliminate the intermediate list from the imperative version, and that would help a little -- but not much.
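In C#, LINQ's Select plays the role of map. Here is a minimal sketch, assuming placeholder functions f and g (they are illustrative, not from the original). Chaining two Select calls corresponds to map g (map f x), and selecting a composed function corresponds to map (g . f) x. Because LINQ is lazy, neither version materializes an intermediate list before ToArray().

```csharp
using System;
using System.Linq;

// Hypothetical stand-ins for the f and g in the pseudocode above.
Func<int, int> f = n => n + 1;
Func<int, int> g = n => n * 10;

int[] x = { 1, 2, 3 };

// map g (map f x): two chained projections.
int[] twoPasses = x.Select(f).Select(g).ToArray();

// map (g . f) x: compose first, then project once.
Func<int, int> gAfterF = n => g(f(n));
int[] onePass = x.Select(gAfterF).ToArray();

Console.WriteLine(string.Join(", ", onePass)); // 20, 30, 40
```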
The main problem with the imperative version is simply that the for-loops are implementation details. If you want to change the function, you change its implementation -- and you end up modifying a lot of code.
Case in point: how would you write map g (filter f x) imperatively? Well, since you can't reuse your original code, which maps and then maps, you need to write a new function which filters and then maps instead. And if you have 50 ways to map and 50 ways to filter, you need 50 * 50 = 2,500 functions, or you need to simulate the ability to pass functions as first-class parameters using the command pattern (if you've ever tried functional programming in Java, you understand what a nightmare this can be).
Back in the functional universe, you can generalize map g (map f x) in a way that lets you swap out the map with filter or fold as needed:
let apply2 a g b f x = a g (b f x)
And call it using apply2 map g filter f, apply2 map g map f, apply2 filter g filter f, or whatever you need. You'd probably never write code like that in the real world; you'd simplify it using:
let mapmap g f = apply2 map g map f
let mapfilter g f = apply2 map g filter f
Higher-order functions and function composition give you a level of abstraction that you cannot get with the imperative code.
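Below is a rough C# analogue of apply2/mapfilter. It represents each partially applied map or filter as a Func<IEnumerable<int>, IEnumerable<int>> "stage". The Stage alias and the Then helper are hypothetical names, and the element type is fixed to int to keep the sketch short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Stage = System.Func<System.Collections.Generic.IEnumerable<int>, System.Collections.Generic.IEnumerable<int>>;

// Hypothetical helper: compose two stages, applying `first` and then `second`.
// It plays the role of apply2, with each map/filter already given its function.
Stage Then(Stage first, Stage second) => source => second(first(source));

// Reusable stages built from LINQ's filter (Where) and map (Select).
Stage keepEven = s => s.Where(n => n % 2 == 0);
Stage square = s => s.Select(n => n * n);

// "mapfilter", i.e. map g (filter f x): filter first, then map.
var mapFilter = Then(keepEven, square);
// "mapmap": two maps in a row.
var mapMap = Then(square, square);

int[] input = { 1, 2, 3, 4 };
Console.WriteLine(string.Join(", ", mapFilter(input))); // 4, 16
Console.WriteLine(string.Join(", ", mapMap(input)));    // 1, 16, 81, 256
```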
Abstracting out the implementation details of loops lets you seamlessly swap one loop for another.
Remember, for-loops are an implementation detail. If you need to change the implementation, you need to change every for-loop.
Map / fold / filter abstract away the loop. So if you want to change the implementation of your loops, you change it in those functions.
Now you might wonder why you'd want to abstract away a loop. Consider the task of mapping items from one type to another: usually, items are mapped one at a time, sequentially, and independently from all other items. Most of the time, maps like this are prime candidates for parallelization.
Unfortunately, the implementation details for sequential maps and parallel maps aren't interchangeable. If you have a ton of sequential maps all over your code, and you want to swap them out for parallel maps, you have two choices: copy/paste the same parallel mapping code all over your code base, or abstract the mapping logic away into two functions, map and pmap. Once you go the second route, you're already knee-deep in functional programming territory.
If you understand the purpose of function composition and abstracting away implementation details (even details as trivial as looping), you can start to appreciate just how and why functional programming is so powerful in the first place.
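In .NET, PLINQ is an existing example of this map/pmap split. The sketch below uses a trivial Transform function as a stand-in for real per-item work. Switching from sequential to parallel changes only the query operators, not any loop code. Whether the parallel version is actually faster depends on how expensive each item is; for work this small it very likely is not.

```csharp
using System;
using System.Linq;

// A stand-in for some independent, per-item transformation.
int Transform(int n) => n * n;

int[] items = Enumerable.Range(1, 10).ToArray();

// Sequential map.
int[] sequential = items.Select(n => Transform(n)).ToArray();

// Parallel map via PLINQ: only the operators change, the "loop" is not rewritten.
// AsOrdered() keeps results in source order, which PLINQ does not guarantee by default.
int[] parallel = items.AsParallel().AsOrdered().Select(n => Transform(n)).ToArray();

Console.WriteLine(sequential.SequenceEqual(parallel)); // True
```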
For loops are not bad. There are many very valid reasons to keep a for loop.
You can often "avoid" a for loop by reworking it using LINQ in C#, which provides a more declarative syntax. This can be good or bad depending on the situation:
Compare the following:
var collection = GetMyCollection();
for (int i = 0; i < collection.Count; ++i)
{
    if (collection[i].MyValue == someValue)
        return collection[i];
}
vs foreach:
var collection = GetMyCollection();
foreach (var item in collection)
{
    if (item.MyValue == someValue)
        return item;
}
vs. LINQ:
var collection = GetMyCollection();
return collection.FirstOrDefault(item => item.MyValue == someValue);
Personally, I think all three options have their place, and I use them all. It's a matter of using the most appropriate option for your scenario.
There's nothing wrong with for loops, but here are some of the reasons people might prefer functional/declarative approaches like LINQ, where you declare what you want rather than how you get it:
Functional approaches are potentially easier to parallelize either manually using PLINQ or by the compiler. As CPUs move to even more cores this may become more important.
Functional approaches make it easier to achieve lazy evaluation in multi-step processes, because you can pass the intermediate result to the next step as a variable that hasn't been fully evaluated yet, rather than evaluating the first step entirely and then passing a finished collection to the next step (achieving the same thing procedurally requires a separate method with a yield statement).
Functional approaches are often shorter and easier to read.
Functional approaches often eliminate complex conditional bodies within for loops (e.g. if statements and 'continue' statements) because you can break the for loop down into logical steps - selecting all the elements that match, doing an operation on them, ...
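The sketch below illustrates the last two points with a made-up task: take the first two words of length 4 or more, upper-cased. The loop expresses the filter with continue and the early exit with break, while the pipeline gives each step its own operator. The inspected counter shows that LINQ runs nothing until the result is enumerated. After that it pulls only as many source elements as Take(2) needs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] words = { "apple", "fig", "kiwi", "banana", "plum", "cherry" };

// Loop version: filtering and stopping are control flow inside the body.
var loopResult = new List<string>();
foreach (var word in words)
{
    if (word.Length < 4) continue;          // skip short words
    loopResult.Add(word.ToUpperInvariant());
    if (loopResult.Count == 2) break;       // we only want the first two
}

// Pipeline version: one logical step per operator.
int inspected = 0; // counts how many source elements the pipeline actually pulls
var pipeline = words
    .Select(w => { inspected++; return w; })
    .Where(w => w.Length >= 4)
    .Select(w => w.ToUpperInvariant())
    .Take(2);

// Nothing has run yet: LINQ operators defer execution until enumeration.
int beforeEnumeration = inspected;
List<string> pipelineResult = pipeline.ToList();

// Take(2) stops pulling once two matches are found, so the words after "kiwi" are never inspected.
Console.WriteLine($"{beforeEnumeration} {inspected} {string.Join(",", pipelineResult)}");
```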
For loops don't kill people (or kittens, or puppies, or tribbles). People kill people.
For loops, in and of themselves, are not bad. However, like anything else, it's how you use them that can be bad.
Sometimes you don't kill just one kitten.
for (int i = 0; i < kittens.Length; i++)
{
kittens[i].Kill();
}
Sometimes you kill them all.
You can refactor your code well enough that you won't see them often. A good function name is definitely more readable than a for loop.
Taking the example from AndyC:
Loop
// mystrings is a string array
List<string> myList = new List<string>();
foreach (string s in mystrings)
{
    if (s.Length > 5)
    {
        myList.Add(s);
    }
}
Linq
// mystrings is a string array
List<string> myList = mystrings.Where<string>(t => t.Length > 5)
                               .ToList<string>();
Whether you use the first or the second version inside your function, it's easier to read:
var filteredList = myList.GetStringLongerThan(5);
Now that's an overly simple example, but you get my point.
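Here is a sketch of such a helper under the hypothetical name GetStringsLongerThan. The answer uses it as an extension method; here it is a local function, so the example stays a single self-contained file. Either loop version from above could serve as its body.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper (not a BCL method). As an extension method it would live in a
// static class and take `this IEnumerable<string> source`; a local function keeps this short.
List<string> GetStringsLongerThan(IEnumerable<string> source, int minLength) =>
    source.Where(s => s.Length > minLength).ToList();

string[] mystrings = { "short", "longer text", "tiny", "considerably longer" };

// The call site reads as intent; the loop-vs-LINQ choice is hidden inside the helper.
var filteredList = GetStringsLongerThan(mystrings, 5);
Console.WriteLine(string.Join(" | ", filteredList));
```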
Your colleague is not right. For loops are not bad per se. They are clean, readable and not particularly error prone.
Your colleague is wrong about for loops being bad in all cases, but correct that they can be rewritten functionally.
Say you have an extension method that looks like this:
// Declared inside a non-generic static class, as extension methods must be
public static void ForEach<T>(this IEnumerable<T> collection, Action<T> action)
{
    foreach (T item in collection)
    {
        action(item);
    }
}
Then you can write a loop like this:
mycollection.ForEach(x => x.DoStuff());
This may not be very useful now. But if you then replace the implementation of the ForEach extension method with a multithreaded approach, you gain the advantages of parallelism.
This obviously isn't always going to work, this implementation only works if the loop iterations are completely independent of each other, but it can be useful.
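Here is a minimal sketch of that swap. It uses two hypothetical helpers with the same signature: one sequential, and one that delegates to Parallel.ForEach from System.Threading.Tasks. The call sites don't change. The action passed in must be independent per item and thread-safe, which is why the parallel call writes to a ConcurrentBag<int>. The parallel version also does not preserve order.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

// Sequential version of the helper described above (a local function here, not an extension method).
void ForEachSequential<T>(IEnumerable<T> collection, Action<T> action)
{
    foreach (T item in collection)
    {
        action(item);
    }
}

// Same signature, parallel body: Parallel.ForEach spreads the items across thread-pool threads.
// Only safe when iterations are independent and `action` is thread-safe.
void ForEachParallel<T>(IEnumerable<T> collection, Action<T> action)
{
    Parallel.ForEach(collection, action);
}

var seen = new ConcurrentBag<int>(); // thread-safe collection for results
ForEachParallel(new[] { 1, 2, 3, 4, 5 }, n => seen.Add(n * 2));

int sequentialCount = 0;
ForEachSequential(new[] { 1, 2, 3 }, n => sequentialCount += n);

Console.WriteLine($"{seen.Count} {sequentialCount}");
```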
Also: always be wary of people who say some programming construct is always wrong.
A simple (and pointless really) example:
Loop
// mystrings is a string array
List<string> myList = new List<string>();
foreach (string s in mystrings)
{
    if (s.Length > 5)
    {
        myList.Add(s);
    }
}
Linq
// mystrings is a string array
List<string> myList = mystrings.Where<string>(t => t.Length > 5).ToList<string>();
In my book, the second one looks a lot tidier and simpler, though there's nothing wrong with the first one.
Sometimes a for-loop is bad if a more efficient alternative exists. Searching is one example: it might be more efficient to sort the list (for example with quicksort) and then use binary search. Iterating over items in a database is another: it is usually much more efficient to use set-based operations in the database instead of iterating over the items.
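The searching case can be sketched with the BCL's Array.Sort and Array.BinarySearch (the data is made up). Sorting costs O(n log n) once, so this pays off when you do many lookups; for a single lookup the linear scan is cheaper.

```csharp
using System;

int[] data = { 42, 7, 19, 73, 3, 58 };

// Linear search: O(n) comparisons per lookup, works on unsorted data.
int linearIndex = Array.IndexOf(data, 19);

// Sort once (O(n log n)), then each lookup is O(log n).
int[] sorted = (int[])data.Clone();
Array.Sort(sorted);                                // 3, 7, 19, 42, 58, 73
int binaryIndex = Array.BinarySearch(sorted, 19);  // index in the *sorted* array
int missing = Array.BinarySearch(sorted, 20);      // negative when the value is not found

Console.WriteLine($"{linearIndex} {binaryIndex} {missing < 0}");
```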
Otherwise, if the for-loop (especially a foreach) makes the most sense and is readable, then I would go with that rather than refactor it into something that isn't as intuitive. I personally don't believe in these religious-sounding "always do it this way, because that is the only way" rules. Rather, it is better to have guidelines and to understand in which scenarios it is appropriate to apply them. It is good that you ask why!
A for loop is, let's say, "bad" because its exit condition is a branch the CPU has to predict, and a branch misprediction can decrease performance.
But CPUs (with a branch prediction accuracy of 97%) and compilers, with techniques like loop unrolling, make the resulting performance cost negligible.
If you abstract the for loop directly you get:
static void For<T>(T initial, Func<T, bool> whilePredicate, Func<T, T> step, Action<T> action)
{
    // step returns the next value, so its result must be assigned back to t
    for (T t = initial; whilePredicate(t); t = step(t))
    {
        action(t);
    }
}
The problem I have with this from a functional programming perspective is the void return type. It essentially means that for loops do not compose nicely with anything. So the goal is not to have a 1-1 conversion from for loop to some function, it is to think functionally and avoid doing things that do not compose. Instead of thinking of looping and acting think of the whole problem and what you are mapping from and to.
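One way to act on that is sketched below, assuming a sequence is the right "output" of a loop. A hypothetical Generate helper takes the same three ingredients as For above but returns a lazy IEnumerable<T> (via yield return) instead of void. The loop can then be composed with Where, Select and Sum.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: initial value, continuation condition and step, as in For,
// but the values are *returned* as a lazy sequence instead of being fed to an action.
IEnumerable<T> Generate<T>(T initial, Func<T, bool> whilePredicate, Func<T, T> step)
{
    for (T t = initial; whilePredicate(t); t = step(t))
    {
        yield return t;
    }
}

// Because the result is an IEnumerable<T>, it composes with map/filter/fold.
int sumOfEvenSquares = Generate(1, i => i <= 10, i => i + 1)
    .Where(i => i % 2 == 0)
    .Select(i => i * i)
    .Sum();

// Powers of two below 100, again just by composition.
var powers = Generate(1, p => p < 100, p => p * 2).ToList();

Console.WriteLine($"{sumOfEvenSquares} [{string.Join(", ", powers)}]");
```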
A for loop can always be replaced by a recursive function that doesn't involve the use of a loop. A recursive function is a more functional style of programming.
But if you blindly replace for loops with recursive functions, then kittens and puppies will both die by the millions, and you will be done in by a velociraptor.
OK, here's an example. But please keep in mind that I do not advocate making this change!
The for loop
for (int index = 0; index < args.Length; ++index)
    Console.WriteLine(args[index]);
Can be changed to this recursive function call
WriteValuesToTheConsole(args, 0);
static void WriteValuesToTheConsole<T>(T[] values, int startingIndex)
{
    if (startingIndex < values.Length)
    {
        Console.WriteLine(values[startingIndex]);
        WriteValuesToTheConsole<T>(values, startingIndex + 1);
    }
}
This should work just the same for most values, but it is far less clear, less efficient, and could exhaust the stack if the array is too large.
Your colleague may be suggesting that, in certain circumstances where database data is involved, it is better to use an aggregate SQL function such as AVG() or SUM() at query time than to process the data on the C# side of an ADO.NET application.
Otherwise, for loops are highly effective when used properly, but if you find yourself nesting them three or more levels deep, you might need a better algorithm, such as one that involves recursion, subroutines, or both. For example, bubble sort has an O(n^2) worst-case runtime (on reverse-ordered input), whereas a recursive sort algorithm such as merge sort is O(n log n), which is much better.
Hopefully this helps.
Jim
Any construct in any language is there for a reason. It's a tool to be used to accomplish a task: a means to an end. In every case, there are ways to use it appropriately, that is, clearly, concisely and within the spirit of the language, AND ways to abuse it. This applies to the much-maligned goto statement as well as to your for loop conundrum, and to while, do-while, switch/case, if-then-else, etc. If the for loop is the right tool for what you're doing, USE IT, and your colleague will need to come to terms with your design decision.
It depends upon what is in the loop but he/she may be referring to a recursive function
// This is the recursive function
public static void getDirsFiles(DirectoryInfo d)
{
    // Create an array of files using the FileInfo object
    FileInfo[] files;
    // Get all files for the current directory
    files = d.GetFiles("*.*");
    // Iterate through the directory and print the files
    foreach (FileInfo file in files)
    {
        // Get details of each file using the file object
        String fileName = file.FullName;
        String fileSize = file.Length.ToString();
        String fileExtension = file.Extension;
        String fileModified = file.LastWriteTime.ToString();
        Console.WriteLine(fileName + " " + fileSize +
            " " + fileExtension + " " + fileModified);
    }
    // Get sub-folders for the current directory
    DirectoryInfo[] dirs = d.GetDirectories("*.*");
    // This is the code that calls getDirsFiles recursively.
    // It is also the stopping point (end condition) for the recursion:
    // when a folder has no sub-folders, this loop body never runs and the call returns.
    foreach (DirectoryInfo dir in dirs)
    {
        Console.WriteLine("--------->> {0} ", dir.Name);
        getDirsFiles(dir);
    }
}
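If the goal is just to list every file, the recursion can also be delegated to the framework. The sketch below uses Directory.EnumerateFiles with SearchOption.AllDirectories (available since .NET 4). It builds a small temporary directory tree so it is self-contained. Unlike the recursive version, it doesn't print the per-folder "--------->>" separators. Be aware that an inaccessible subdirectory can make the whole enumeration throw an UnauthorizedAccessException.

```csharp
using System;
using System.IO;
using System.Linq;

// Build a small throwaway tree so the sketch is self-contained.
string root = Path.Combine(Path.GetTempPath(), "loop-demo-" + Guid.NewGuid().ToString("N"));
Directory.CreateDirectory(Path.Combine(root, "sub", "deeper"));
File.WriteAllText(Path.Combine(root, "a.txt"), "a");
File.WriteAllText(Path.Combine(root, "sub", "b.log"), "bb");
File.WriteAllText(Path.Combine(root, "sub", "deeper", "c.txt"), "ccc");

// The framework walks the subdirectories; no explicit recursion is written here.
var lines = Directory
    .EnumerateFiles(root, "*", SearchOption.AllDirectories)
    .Select(path => new FileInfo(path))
    .Select(file => $"{file.FullName} {file.Length} {file.Extension} {file.LastWriteTime}")
    .ToList();

lines.ForEach(Console.WriteLine);

int fileCount = lines.Count;
long totalBytes = Directory.EnumerateFiles(root, "*", SearchOption.AllDirectories)
    .Sum(path => new FileInfo(path).Length);

Directory.Delete(root, recursive: true); // clean up the temporary tree
```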
The question is if the loop will be mutating state or causing side effects. If so, use a foreach loop. If not, consider using LINQ or other functional constructs.
See "foreach" vs "ForEach" on Eric Lippert's Blog.
Related
While I was programming I came up with this question,
What is better: having a method accept a single entity, or a List of those entities?
For example I need a List of strings. I can either have:
a method accepting a List and return a List of strings with the results.
List<string> results = methodwithlist(objects);
or
a method accepting an object and returning a string. Then use this function in a loop to fill a list.
List<string> results = new List<string>();
for (int i = 0; i < objects.Count; i++)
{
    results.Add(methodwithsingleobject(objects[i]));
}
This is just an example. I need to know which one is better, or more commonly used, and why.
Thanks!
Well, it's easy to build the first form when you've got the second - but using LINQ, you really don't need to write your own, once you've got the projection. For example, you could write:
List<string> results = objectList.Select(x => MethodWithSingleObject(x)).ToList();
Generally it's easier to write and test a method which only deals with a single value, unless it actually needs to know the rest of the values in the collection (e.g. to find aggregates).
I would choose the second because it's easier to use when you have a single string (i.e. it's more general purpose). Also, the responsibility of the method itself is clearer, because the method should not have anything to do with lists if its purpose is just to modify a string.
Also, you can simplify the call with Linq:
result = yourList.Select(p => methodwithsingleobject(p)).ToList();
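Here is a sketch of that approach with hypothetical names: Describe stands in for methodwithsingleobject, and ints stand in for the objects. The per-item method knows nothing about lists, yet both the loop and Select build the same List<string> from it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical per-item method: it knows nothing about lists.
string Describe(int value) => $"item #{value}";

var objects = new List<int> { 1, 2, 3 };

// Loop that fills the result list one call at a time.
var viaLoop = new List<string>();
for (int i = 0; i < objects.Count; i++)
{
    viaLoop.Add(Describe(objects[i]));
}

// The same result as a projection; no separate list-level method is needed.
List<string> viaSelect = objects.Select(o => Describe(o)).ToList();

Console.WriteLine(string.Join(", ", viaSelect));
```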
This question comes up a lot when learning any language, the answer is somewhat moot since the standard coding practice is to rely upon LINQ to optimize the code for you at runtime. But this presumes you're using a version of the language that supports it. But if you do want to do some research on this there are a few Stack Overflow articles that delve into this and also give external resources to review:
In .NET, which loop runs faster, 'for' or 'foreach'?
C#, For Loops, and speed test... Exact same loop faster second time around?
What I have learned, though, is not to rely too heavily on the LINQ Count() method, and to use the Length property on arrays (or the Count property on typed collections) instead, as that can be a lot faster.
Hope this is helpful.
The question field is a bit too short to pose my real question. If anyone can recapitulate it better, please feel free.
My real question is this: I'm reading a lot of other people's code in C# these days, and I have noticed that one particular form of iteration is widely spread, (see in code).
My first question is:
Are all these iterations equivalent?
And my second is: why prefer the first? Has it something to do with readability? Now, I don't believe the first form is more readable than the for-form once you get used to it, and readability is far too subjective in these constructs. Of course, what you use the most will seem more readable, but I can assure everyone that the for-form is at least as readable, since it has everything on one line, and you can even read the initialization in the construct.
Thus the third question: why is the 3rd form seen much less in code?
// the 'widespread' construct
int nr = getNumber();
while (NotZero(nr))
{
    Console.Write(1/nr);
    nr = getNumber();
}
// the somewhat shorter form
int nr;
while (NotZero(nr = getNumber()))
    Console.Write(1 / nr);
// the for - form
for (int nr = getNumber(); NotZero(nr); nr = getNumber())
    Console.Write(1 / nr);
The first and third forms you've shown repeat the call to GetNumber. I prefer the second form, although it has the disadvantage of using a side-effect within a condition of course. However I pretty much only do that with a while loop. Usually I don't end up passing the result as an argument though - the common situations I find myself in are:
string line;
while ( (line = reader.ReadLine()) != null)
...
and
int bytesRead;
while ( (bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
...
Both of these are now so idiomatic to me that they don't cause me any problems - and as I say, they allow me to only state each piece of logic once.
If you don't like the variable having too much scope, you can just introduce an extra block:
{
int bytesRead;
while ( (bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
{
// Body
}
}
Personally I don't tend to do this - the "too-wide" scope doesn't bother me that much.
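Here is a self-contained version of both idioms. A StringReader and a MemoryStream stand in for a real file or network stream.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// StringReader stands in for a file or network reader.
var reader = new StringReader("first\nsecond\nthird");

var lines = new List<string>();
string line;
// ReadLine is called in exactly one place; the assignment's value feeds the null check.
while ((line = reader.ReadLine()) != null)
{
    lines.Add(line);
}

// The same pattern with Stream.Read, which returns 0 at the end of the stream.
var stream = new MemoryStream(new byte[10]);
var buffer = new byte[4];
int total = 0;
int bytesRead;
while ((bytesRead = stream.Read(buffer, 0, buffer.Length)) > 0)
{
    total += bytesRead; // 4, then 4, then 2
}

Console.WriteLine($"{lines.Count} {total}");
```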
I suspect it wouldn't be too hard to write a method to encapsulate all of this. Something like:
ForEach(() => reader.ReadLine(), // Way to obtain a value
        line => line != null,    // Condition
        line =>
        {
            // body
        });
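Such a helper might look like the following sketch. The name ForEach, the parameter order and the use of a local function are assumptions based on the call above, not an existing API.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper matching the call shape above:
// obtain a value, test it, run the body, repeat.
void ForEach<T>(Func<T> obtain, Func<T, bool> condition, Action<T> body)
{
    for (T value = obtain(); condition(value); value = obtain())
    {
        body(value);
    }
}

var reader = new StringReader("alpha\nbeta");
var seen = new List<string>();

ForEach(() => reader.ReadLine(),
        line => line != null,
        line => seen.Add(line));

Console.WriteLine(string.Join(",", seen));
```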
Mind you, for line reading I have a class which helps:
foreach (string line in new LineReader(file))
{
// body
}
(It doesn't just work with files - it's pretty flexible.)
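LineReader's source isn't shown here, but a similar effect can be sketched with an iterator. The hypothetical ReadLines helper below takes a factory for the reader, so each enumeration starts fresh. It also disposes the reader when the foreach finishes or stops early. For files specifically, .NET 4 and later also provide File.ReadLines(path), which returns a lazy IEnumerable<string>.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical iterator in the spirit of LineReader (not its actual source):
// it opens the reader lazily and disposes it when enumeration ends.
IEnumerable<string> ReadLines(Func<TextReader> openReader)
{
    using (TextReader reader = openReader())
    {
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            yield return line;
        }
    }
}

var lines = ReadLines(() => new StringReader("one\ntwo\nthree"));

foreach (string text in lines)
{
    Console.WriteLine(text);
}

// It is an ordinary IEnumerable<string>, so LINQ works on it; this enumerates a second time,
// which calls the factory again and reads from a fresh reader.
int longLines = lines.Count(l => l.Length > 3);
```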
Are all these iterations equivalent?
yes
Why prefer the first? Has it something to do with readability?
Because you may want the nr variable to remain in scope after the while loop.
Why is the 3rd form seen much less in code?
It is equivalent: same statements!
You may prefer the latter because you don't want to extend the scope of the nr variable
I think that the third form (for-loop) is the best of these alternatives, because it puts things into the right scope. On the other hand, having to repeat the call to getNumber() is a bit awkward, too.
Generally, I think that explicit looping is widely overused. High-level languages should provide mapping, filtering, and reducing. When these high level constructs are applicable and available, looping instead is like using goto instead of looping.
If mapping, filtering, or reducing is not applicable, I would perhaps write a little macro for this kind of loop (C# doesn't have those, though, does it?).
I offer another alternative
foreach (var x in InitInfinite(() => GetNumber()).TakeWhile(NotZero))
{
Console.WriteLine(1.0/x);
}
where InitInfinite is a trivial helper function. Whole program:
using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static IEnumerable<T> InitInfinite<T>(Func<T> f)
    {
        while (true)
        {
            yield return f();
        }
    }

    static int N = 5;

    static int GetNumber()
    {
        N--;
        return N;
    }

    static bool NotZero(int n) { return n != 0; }

    static void Main(string[] args)
    {
        foreach (var x in InitInfinite(() => GetNumber()).TakeWhile(NotZero))
        {
            Console.WriteLine(1.0/x);
        }
    }
}
I think people use the while() loop often because it best represents the way you would visualize the task in your head. I don't think there are any performance benefits to using it over any other loop structure.
Here is a random speculation:
When I write C# code, the only two looping constructs I write are while() and foreach(). That is, no one uses 'for' any more, since 'foreach' often works and is often superior. (This is an overgeneralization, but it has a core of truth.) As a result, my brain has to strain to read any 'for' loop because it's unfamiliar.
As for why (1) and (2) are "preferred" over (3), my feeling is that most people think of the latter as a way to iterate over a range, using the condition to define the range, rather than continuing to iterate over a block while some condition still holds. The keyword semantics lend themselves to this interpretation and I suspect that, partly because of that, people find that the expressions are most readable in that context. For instance, I would never use (1) or (2) to iterate over a range, though I could.
Between (1) and (2), I'm torn. I used to use (2) (in C) most often due to the compactness, but now (in C#) I generally write (1). I suppose that I've come to value readability over compactness and (1) seems easier to parse quickly and thus more readable to my mind even though I do end up repeating a small amount of logic.
Honestly, I rarely write while statements anymore, typically using foreach -- or LINQ -- in the cases where while statements would previously have been used. Come to think of it, I'm not sure I use many for statements, either, except in unit tests where I'm generating some fixed number of a test object.
I'm looking through a generic list to find items based on a certain parameter.
In general, what would be the best and fastest implementation?
1. Looping through each item in the list and saving each match to a new list and returning that
foreach(string s in list)
{
    if(s == "match")
    {
        newList.Add(s);
    }
}
return newList;
Or
2. Using the FindAll method and passing it a delegate.
newList = list.FindAll(delegate(string s){return s == "match";});
Don't they both run in ~ O(N)? What would be the best practice here?
Regards,
Jonathan
You should definitely use the FindAll method, or the equivalent LINQ method. Also, consider using the more concise lambda instead of your delegate if you can (requires C# 3.0):
var list = new List<string>();
var newList = list.FindAll(s => s.Equals("match"));
I would use the FindAll method in this case, as it is more concise, and IMO, has easier readability.
You are right that they are pretty much going to both perform in O(N) time, although the foreach statement should be slightly faster given it doesn't have to perform a delegate invocation (delegates incur a slight overhead as opposed to directly calling methods).
I have to stress how insignificant this difference is; it's more than likely never going to matter unless you are doing a massive number of operations on a massive list.
As always, test to see where the bottlenecks are and act appropriately.
Jonathan,
You can find a good answer to this in chapter 5 (performance considerations) of LINQ in Action.
They measure a foreach search that executes about 50 times and come up with foreach = 68ms per cycle vs. List.FindAll = 62ms per cycle. Really, it would probably be in your interest to just create a test and see for yourself.
List.FindAll is O(n) and will search the entire list.
If you want to run your own iterator with foreach, I'd recommend using the yield statement and returning an IEnumerable if possible. This way, if you end up needing only one element of your collection, it will be quicker (since your caller can stop without exhausting the entire collection).
Otherwise, stick to the BCL interface.
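Here is a sketch of that suggestion using a hypothetical Matches iterator. Because it yields matches as it finds them, a caller that only needs the first one (First()) stops the enumeration early, whereas FindAll always scans the whole list. The examined counter makes the difference visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int examined = 0; // how many items the iterator has looked at

// Hypothetical matcher written as an iterator: it yields each match as soon as it is found.
IEnumerable<string> Matches(IEnumerable<string> source, string target)
{
    foreach (string s in source)
    {
        examined++;
        if (s == target)
        {
            yield return s;
        }
    }
}

var list = new List<string> { "a", "match", "b", "c", "match", "d" };

// The caller only needs the first match, so enumeration stops at the second item.
string first = Matches(list, "match").First();

// FindAll, by contrast, always walks the whole list.
List<string> all = list.FindAll(s => s == "match");

Console.WriteLine($"{first} {examined} {all.Count}");
```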
Any perf difference is going to be extremely minor. I would suggest FindAll for clarity, or, if possible, Enumerable.Where. I prefer using the Enumerable methods because it allows for greater flexibility in refactoring the code (you don't take a dependency on List<T>).
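Here are the three options side by side, as a sketch with a made-up list. All three produce the same matches, but only the Where version accepts any IEnumerable<string> and is evaluated lazily.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var list = new List<string> { "match", "other", "match", "nope" };

// 1. Explicit loop.
var viaLoop = new List<string>();
foreach (string s in list)
{
    if (s == "match")
    {
        viaLoop.Add(s);
    }
}

// 2. List<T>.FindAll with a lambda (C# 3.0+); returns a new List<string>.
List<string> viaFindAll = list.FindAll(s => s == "match");

// 3. Enumerable.Where: works on any IEnumerable<string>, not only List<T>,
//    and does no work until it is enumerated (or ToList() is called).
IEnumerable<string> viaWhere = list.Where(s => s == "match");

Console.WriteLine($"{viaLoop.Count} {viaFindAll.Count} {viaWhere.Count()}");
```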
Yes, both implementations are O(n). They need to look at every element in the list to find all matches. In terms of readability I would also prefer FindAll. For performance considerations have a look at LINQ in Action (Ch 5.3). If you are using C# 3.0 you could also apply a lambda expression. But that's just the icing on the cake:
var newList = aList.FindAll(s => s == "match");
I'm with the lambdas:
List<String> newList = list.FindAll(s => s.Equals("match"));
Unless the C# team has improved the performance for LINQ and FindAll, the following article seems to suggest that for and foreach would outperform LINQ and FindAll on object enumeration: LINQ on Objects Performance.
This article dates from March 2009, just before this question was originally asked.
I'm working with a code base where lists need to be frequently searched for a single element.
Is it faster to use a Predicate and Find() than to manually do an enumeration on the List?
for example:
string needle = "example";
FooObj result = _list.Find(delegate(FooObj foo) {
    return foo.Name == needle;
});
vs.
string needle = "example";
foreach (FooObj foo in _list)
{
    if (foo.Name == needle)
        return foo;
}
While they are equivalent in functionality, are they equivalent in performance as well?
They are not equivalent in performance. The Find() method requires a method (in this case delegate) invocation for every item in the list. Method invocation is not free and is relatively expensive as compared to an inline comparison. The foreach version requires no extra method invocation per object.
That being said, I wouldn't pick one or the other based on performance until I actually profiled my code and found this to be a problem. I haven't yet found the overhead of this scenario to ever be a "hot path" problem for code I've written, and I use this pattern a lot with Find and other similar methods.
If searching your list is too slow as-is, you can probably do better than a linear search. If you can keep the list sorted, you can use a binary search to find the element in O(lg n) time.
If you're searching a whole lot, consider replacing that list with a Dictionary to index your objects by name.
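Here is a sketch of the Dictionary approach. It assumes names are unique (ToDictionary throws on duplicate keys) and uses named tuples as a stand-in for FooObj.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Named tuples stand in for FooObj; the Name field is the lookup key.
var list = new List<(string Name, int Id)>
{
    ("example", 1), ("sample", 2), ("instance", 3)
};

// Linear search: O(n) per lookup.
var found = list.Find(foo => foo.Name == "example");

// Build the index once (O(n)); each lookup afterwards is O(1) on average.
// ToDictionary throws if two items share a Name, so this assumes names are unique.
Dictionary<string, (string Name, int Id)> byName = list.ToDictionary(foo => foo.Name);

bool hit = byName.TryGetValue("example", out var viaDictionary);
bool miss = byName.TryGetValue("missing", out _);

Console.WriteLine($"{found.Id} {hit} {viaDictionary.Id} {miss}");
```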
Technically, the runtime performance of the delegate version will be slightly worse than the other version - but in most cases you'd be hard pressed to perceive any difference.
Of more importance (IMHO) is coding-time performance: being able to write what you want, rather than how you want it. This makes a big difference in maintainability.
This original code:
string needle = "example";
foreach (FooObj foo in _list)
{
    if (foo.Name == needle)
        return foo;
}
requires any maintainer to read the code and understand that you're looking for a particular item.
This code
string needle = "example";
return _list.Find(
    delegate(FooObj foo)
    {
        return foo.Name == needle;
    });
makes it clear that you're looking for a particular item - quicker to understand.
Finally, this code, using features from C# 3.0:
string needle = "example";
return _list.Find( foo => foo.Name == needle);
does exactly the same thing, but in one line that's even faster to read and understand (well, once you understand lambda expressions, anyway).
In summary, given that the performance of the alternatives is nearly equal, choose the one that makes the code easier to read and maintain.
"I'm working with a code base where lists need to be frequently searched for a single element"
It is better to change your data structure to be Dictionary instead of List to get better performance
Similar question was asked for List.ForEach vs. foreach-iteration (foreach vs someList.Foreach(){}).
In that case List.ForEach was a bit faster.
As Jared pointed out, there are differences.
But, as always, don't worry unless you know it's a bottleneck. And if it is a bottleneck, that's probably because the lists are big, in which case you should consider using a faster find - a hash table or binary tree, or even just sorting the list and doing binary search will give you log(n) performance which will have far more impact than tweaking your linear case.
There are apparently many ways to iterate over a collection. Curious if there are any differences, or why you'd use one way over the other.
First type:
List<string> someList = <some way to init>
foreach(string s in someList) {
<process the string>
}
Other Way:
List<string> someList = <some way to init>
someList.ForEach(delegate(string s) {
<process the string>
});
I suppose off the top of my head, that instead of the anonymous delegate I use above, you'd have a reusable delegate you could specify...
There is one important, and useful, distinction between the two.
Because .ForEach uses a for loop to iterate the collection, this is valid (edit: prior to .net 4.5 - the implementation changed and they both throw):
someList.ForEach(x => { if(x.RemoveMe) someList.Remove(x); });
whereas foreach uses an enumerator, so this is not valid:
foreach(var item in someList)
if(item.RemoveMe) someList.Remove(item);
tl;dr: Do NOT copypaste this code into your application!
These examples aren't best practice, they are just to demonstrate the differences between ForEach() and foreach.
Removing items from a list within a for loop can have side effects. The most common one is described in the comments to this question.
Generally, if you are looking to remove multiple items from a list, you would want to separate the determination of which items to remove from the actual removal. It doesn't keep your code compact, but it guarantees that you do not miss any items.
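Here are two ways to do that, sketched with a made-up list of ints. List<T>.RemoveAll takes a predicate and does the removal safely in one call. The second option determines the items first (snapshotting them with ToList()) and only then removes them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var numbers = new List<int> { 1, 2, 3, 4, 5, 6 };

// Option 1: List<T>.RemoveAll does both steps internally.
var a = new List<int>(numbers);
int removed = a.RemoveAll(n => n % 2 == 0);   // returns how many items were removed

// Option 2: decide first, then remove, so the list is never modified mid-enumeration.
var b = new List<int>(numbers);
List<int> toRemove = b.Where(n => n % 2 == 0).ToList(); // ToList() snapshots the matches
foreach (int n in toRemove)
{
    b.Remove(n);
}

Console.WriteLine($"{removed} [{string.Join(",", a)}] [{string.Join(",", b)}]");
```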
We had some code here (in VS2005 and C#2.0) where the previous engineers went out of their way to use list.ForEach( delegate(item) { foo;}); instead of foreach(item in list) {foo; }; for all the code that they wrote. e.g. a block of code for reading rows from a dataReader.
I still don't know exactly why they did this.
The drawbacks of list.ForEach() are:
It is more verbose in C# 2.0. However, in C# 3 onwards, you can use the "=>" syntax to make some nicely terse expressions.
It is less familiar. People who have to maintain this code will wonder why you did it that way. It took me a while to decide that there wasn't any reason, except maybe to make the writer seem clever (the quality of the rest of the code undermined that). It was also less readable, with the "})" at the end of the delegate code block.
See also Bill Wagner's book "Effective C#: 50 Specific Ways to Improve Your C#" where he talks about why foreach is preferred to other loops like for or while loops - the main point is that you are letting the compiler decide the best way to construct the loop. If a future version of the compiler manages to use a faster technique, then you will get this for free by using foreach and rebuilding, rather than changing your code.
a foreach(item in list) construct allows you to use break or continue if you need to exit the iteration or the loop. But you cannot alter the list inside a foreach loop.
I'm surprised to see that list.ForEach is slightly faster. But that's probably not a valid reason to use it throughout; that would be premature optimisation. If your application uses a database or web service, that, not loop control, is almost always going to be where the time goes. And have you benchmarked it against a for loop too? list.ForEach could be faster because it uses a for loop internally, and a for loop without the wrapper would be even faster.
I disagree that the list.ForEach(delegate) version is "more functional" in any significant way. It does pass a function to a function, but there's no big difference in the outcome or program organisation.
I don't think that foreach(item in list) "says exactly how you want it done" - a for(int i = 0; i < count; i++) loop does that, a foreach loop leaves the choice of control up to the compiler.
My feeling is, on a new project, to use foreach(item in list) for most loops in order to adhere to the common usage and for readability, and use list.ForEach() only for short blocks, when you can do something more elegantly or compactly with the C# 3 "=>" operator. In cases like that, there may already be a LINQ extension method that is more specific than ForEach(). See if Where(), Select(), Any(), All(), Max() or one of the many other LINQ methods doesn't already do what you want from the loop.
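As a sketch of that last suggestion (the class and method names are hypothetical), here are a few ForEach-style loops rewritten with the LINQ operators mentioned above. Each one states its intent directly, and Any() can stop at the first match instead of visiting every element the way the ForEach-plus-flag version does.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LinqInsteadOfForEach
{
    // Loop version: a flag set inside ForEach, which always visits every element.
    public static bool HasNegativeLoop(List<int> values)
    {
        bool found = false;
        values.ForEach(v => { if (v < 0) found = true; });
        return found;
    }

    // Any() states the intent and stops at the first match.
    public static bool HasNegative(List<int> values) => values.Any(v => v < 0);

    // Where + Select replace "loop, test, add to a new list".
    public static List<int> SquaresOfEvens(List<int> values) =>
        values.Where(v => v % 2 == 0).Select(v => v * v).ToList();

    // Max replaces a running-maximum loop (it throws on an empty sequence).
    public static int Largest(List<int> values) => values.Max();

    // All replaces "assume true, set to false if any element fails".
    public static bool AllPositive(List<int> values) => values.All(v => v > 0);
}
```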
As they say, the devil is in the details...
The biggest difference between the two methods of collection enumeration is that foreach carries state, whereas ForEach(x => { }) does not.
But let's dig a little deeper, because there are some things that can influence your decision, and some caveats to keep in mind when coding for either case.
Let's use List<T> in our little experiment to observe behavior. For this experiment, I am using .NET 4.7.2:
var names = new List<string>
{
"Henry",
"Shirley",
"Ann",
"Peter",
"Nancy"
};
Let's iterate over this with foreach first:
foreach (var name in names)
{
Console.WriteLine(name);
}
We could expand this into:
using (var enumerator = names.GetEnumerator())
{
while (enumerator.MoveNext())
{
var name = enumerator.Current;
Console.WriteLine(name);
}
}
With the enumerator in hand, looking under the covers we get:
public List<T>.Enumerator GetEnumerator()
{
return new List<T>.Enumerator(this);
}
internal Enumerator(List<T> list)
{
this.list = list;
this.index = 0;
this.version = list._version;
this.current = default (T);
}
public bool MoveNext()
{
List<T> list = this.list;
if (this.version != list._version || (uint) this.index >= (uint) list._size)
return this.MoveNextRare();
this.current = list._items[this.index];
++this.index;
return true;
}
object IEnumerator.Current
{
get
{
if (this.index == 0 || this.index == this.list._size + 1)
ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumOpCantHappen);
return (object) this.Current;
}
}
Two things become immediately evident:
We are returned a stateful object with intimate knowledge of the underlying collection.
The enumerator does not copy the collection; it holds a reference to the list itself (plus the version number it started with).
This is of course in no way thread safe. As was pointed out above, changing the collection while iterating is just bad mojo.
But what about the problem of the collection becoming invalid during iteration by some means other than our own code mucking with it? Best practices suggest versioning the collection during operations and iteration, and checking versions to detect when the underlying collection changes.
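A minimal sketch of that versioning idea, using a hypothetical VersionedBag<T> type rather than any real BCL class: every mutation bumps a counter, and the enumerator remembers the counter value it started with and throws if it changes. List<T> does essentially the same thing with its _version field, as the decompiled MoveNext above shows.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical, deliberately tiny collection that versions itself.
public sealed class VersionedBag<T> : IEnumerable<T>
{
    private readonly List<T> _items = new List<T>();
    private int _version;

    public void Add(T item)
    {
        _items.Add(item);
        _version++; // any mutation invalidates enumerators that are in progress
    }

    public IEnumerator<T> GetEnumerator()
    {
        int startVersion = _version; // captured when enumeration actually starts
        for (int i = 0; i < _items.Count; i++)
        {
            if (startVersion != _version)
                throw new InvalidOperationException("Collection was modified during enumeration.");
            yield return _items[i];
        }
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}
```

Because GetEnumerator here is an iterator method, its body (including the read of _version) only starts running on the first MoveNext() call, which foreach makes immediately anyway.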
Here's where things get really murky. According to the Microsoft documentation:
If changes are made to the collection, such as adding, modifying, or deleting elements, the behavior of the enumerator is undefined.
Well, what does that mean? By way of example, just because List<T> detects the change and throws an InvalidOperationException does not mean that all collections that implement IList<T> will do the same. That seems to be a clear violation of the Liskov Substitution Principle:
Objects of a superclass shall be replaceable with objects of its subclasses without breaking the application.
Another problem is that the enumerator must implement IDisposable -- that means another source of potential memory leaks, not only if the caller gets it wrong, but if the author does not implement the Dispose pattern correctly.
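To show why the Dispose contract matters, here is a sketch using a hypothetical iterator whose finally block stands in for releasing a resource (a file, a data reader, a connection). foreach calls Dispose() on the enumerator when the loop exits, even through break, and that runs the finally block; driving the enumerator by hand and never disposing it skips the cleanup entirely.

```csharp
using System.Collections.Generic;

public static class DisposeDemo
{
    public static bool CleanedUp;

    // Iterator with a finally block: the generated enumerator's Dispose() runs it.
    private static IEnumerable<int> Numbers()
    {
        try
        {
            for (int i = 0; ; i++) yield return i; // infinite sequence
        }
        finally
        {
            CleanedUp = true; // stands in for closing a file, reader, connection...
        }
    }

    // foreach disposes the enumerator when the loop exits, including via break.
    public static int TakeUntil(int limit)
    {
        CleanedUp = false;
        int last = -1;
        foreach (int n in Numbers())
        {
            last = n;
            if (n == limit) break;
        }
        return last;
    }

    // Driving the enumerator by hand and forgetting Dispose() skips the cleanup.
    public static void LeakEnumerator()
    {
        CleanedUp = false;
        IEnumerator<int> e = Numbers().GetEnumerator();
        e.MoveNext(); // enters the try block; e is never disposed
    }
}
```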
Lastly, we have a lifetime issue... what happens if the iterator is valid, but the underlying collection is gone? We now have a snapshot of what was... when you separate the lifetime of a collection and its iterators, you are asking for trouble.
Let's now examine ForEach(x => { }):
names.ForEach(name =>
{
Console.WriteLine(name);
});
This expands to:
public void ForEach(Action<T> action)
{
if (action == null)
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
int version = this._version;
for (int index = 0; index < this._size && (version == this._version || !BinaryCompatibility.TargetsAtLeast_Desktop_V4_5); ++index)
action(this._items[index]);
if (version == this._version || !BinaryCompatibility.TargetsAtLeast_Desktop_V4_5)
return;
ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumFailedVersion);
}
Of particular note is the following:
for (int index = 0; index < this._size && ... ; ++index)
action(this._items[index]);
This code does not allocate any enumerators (nothing to Dispose), and does not pause while iterating.
Note that this does not copy the collection either: the loop reads this._items directly on every pass, and the version check is the only thing that notices the list changing during the loop. If a collection's author does not implement such a check correctly, nothing tells you that the data has gone 'stale'.
This doesn't in any way protect you from lifetime issues... but at least you don't have a Dispose problem to deal with on orphaned iterators...
Yes, I said iterators... sometimes it's advantageous to have state. Suppose you want to maintain something akin to a database cursor... maybe multiple foreach-style enumerators (IEnumerator<T>) are the way to go. I personally dislike this style of design as there are too many lifetime issues, and you rely on the good graces of the authors of the collections you are using (unless you literally write everything yourself from scratch).
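For illustration, here is a small sketch of the cursor idea: two enumerators obtained from the same List<T> keep independent positions. The class and method names are made up; the enumerators are typed as IEnumerator<string> so each is a separate (boxed) object, and the using blocks make sure both get disposed.

```csharp
using System.Collections.Generic;

public static class CursorDemo
{
    // Two enumerators over the same list behave like two independent cursors.
    public static (int First, int Second) AdvanceIndependently(List<string> names)
    {
        using (IEnumerator<string> a = names.GetEnumerator())
        using (IEnumerator<string> b = names.GetEnumerator())
        {
            a.MoveNext(); // a -> names[0]
            a.MoveNext(); // a -> names[1]
            b.MoveNext(); // b -> names[0]; advancing a did not move b
            return (names.IndexOf(a.Current), names.IndexOf(b.Current));
        }
    }
}
```

With the names list from above, this returns (1, 0): each cursor carries its own state, which is exactly the state (and lifetime) you then have to manage.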
There is always a third option...
for (var i = 0; i < names.Count; i++)
{
Console.WriteLine(names[i]);
}
It ain't sexy, but it's got teeth (apologies to Tom Cruise and the movie The Firm)
It's your choice, but now you know and it can be an informed one.
For fun, I popped List into Reflector and this is the resulting C#:
public void ForEach(Action<T> action)
{
if (action == null)
{
ThrowHelper.ThrowArgumentNullException(ExceptionArgument.match);
}
for (int i = 0; i < this._size; i++)
{
action(this._items[i]);
}
}
Similarly, the MoveNext in Enumerator which is what is used by foreach is this:
public bool MoveNext()
{
if (this.version != this.list._version)
{
ThrowHelper.ThrowInvalidOperationException(ExceptionResource.InvalidOperation_EnumFailedVersion);
}
if (this.index < this.list._size)
{
this.current = this.list._items[this.index];
this.index++;
return true;
}
this.index = this.list._size + 1;
this.current = default(T);
return false;
}
List.ForEach is much more trimmed down than MoveNext: far less processing, so it will more likely JIT into something efficient.
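In addition, foreach creates a new Enumerator every time. When the variable's static type is List<T>, that enumerator is a struct and stays off the heap, but when you iterate through an IEnumerable<T> or IList<T> reference it gets boxed. The GC is your friend, but if you're doing the same foreach repeatedly that way, this will make more throwaway objects, as opposed to reusing the same delegate - BUT - this is really a fringe case. In typical usage you will see little or no difference.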
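Whether foreach actually allocates depends on the static type of the collection. The sketch below (names are illustrative) checks that List<T>.Enumerator is a value type, and that asking for an enumerator through the IEnumerable<T> interface hands back a separate boxed object on every call.

```csharp
using System.Collections.Generic;

public static class EnumeratorAllocation
{
    // List<T>.Enumerator is a struct, so foreach over a List<T>-typed variable
    // keeps the enumerator in a local rather than on the heap.
    public static bool ListEnumeratorIsValueType() =>
        typeof(List<int>.Enumerator).IsValueType;

    // Through the interface, the struct is boxed: each call yields a new heap object.
    public static bool InterfaceCallReturnsDistinctObjects(List<int> list)
    {
        IEnumerable<int> sequence = list;
        using (IEnumerator<int> first = sequence.GetEnumerator())
        using (IEnumerator<int> second = sequence.GetEnumerator())
        {
            return !object.ReferenceEquals(first, second);
        }
    }
}
```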
I guess the someList.ForEach() call could easily be parallelized, whereas the normal foreach is not that easy to run in parallel.
You could easily run several different delegates on different cores, which is not that easy to do with a normal foreach.
Just my 2 cents
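As a sketch of that idea: the BCL's Parallel.ForEach (System.Threading.Tasks, .NET 4 and later) takes the same kind of Action<T> delegate, so a ForEach-style body moves over almost unchanged. Note that it is a different method, not a parallel mode of List<T>.ForEach. The delegate may run on several threads in any order, which is why the shared total below is updated with Interlocked.Add. The class and method names are illustrative.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelDemo
{
    // Sequential: List<T>.ForEach runs the delegate on the calling thread, in order.
    public static long SumSequential(List<int> numbers)
    {
        long total = 0;
        numbers.ForEach(n => total += n);
        return total;
    }

    // Parallel: the same delegate shape, but shared state must be updated atomically.
    public static long SumParallel(List<int> numbers)
    {
        long total = 0;
        Parallel.ForEach(numbers, n => Interlocked.Add(ref total, n));
        return total;
    }
}
```

For a body this small, the parallel version is likely to be slower because of scheduling and contention; it pays off when each element needs real work.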
I know two obscure-ish things that make them different. Go me!
Firstly, there's the classic bug of making a delegate for each item in the list. Before C# 5, if you use the foreach keyword, all your delegates can end up referring to the last item of the list (C# 5 changed foreach so that each iteration gets a fresh loop variable):
// A list of actions to execute later
List<Action> actions = new List<Action>();
// Numbers 0 to 9
List<int> numbers = Enumerable.Range(0, 10).ToList();
// Store an action that prints each number (WRONG before C# 5!)
foreach (int number in numbers)
actions.Add(() => Console.WriteLine(number));
// Run the actions: with a C# 4 or earlier compiler this prints 10 copies of "9"
foreach (Action action in actions)
action();
// So try again
actions.Clear();
// Store an action that prints each number (RIGHT!)
numbers.ForEach(number =>
actions.Add(() => Console.WriteLine(number)));
// Run the actions
foreach (Action action in actions)
action();
The List.ForEach method doesn't have this problem. The current item of the iteration is passed by value as an argument to the outer lambda, and then the inner lambda correctly captures that argument in its own closure. Problem solved.
(Sadly I believe ForEach is a member of List, rather than an extension method, though it's easy to define it yourself so you have this facility on any enumerable type.)
Secondly, the ForEach method approach has a limitation. If you are implementing IEnumerable by using yield return, you can't do a yield return inside the lambda. So looping through the items in a collection in order to yield return things is not possible by this method. You'll have to use the foreach keyword and work around the closure problem by manually making a copy of the current loop value inside the loop.
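A small sketch of that workaround (the class and method names are hypothetical). The iterator has to use foreach because yield return cannot appear inside a lambda. The explicit copy is what was needed before C# 5, and it still matters in every C# version for a plain for loop, whose single counter variable is shared by all the lambdas that capture it.

```csharp
using System;
using System.Collections.Generic;

public static class IteratorClosureDemo
{
    // yield return must be in the iterator body itself, so foreach is used here.
    public static IEnumerable<Func<int>> MakeGetters(List<int> numbers)
    {
        foreach (int number in numbers)
        {
            int copy = number; // explicit copy: required before C# 5, harmless after
            yield return () => copy;
        }
    }

    // A for loop has ONE variable 'i' for the whole loop, so lambdas that
    // capture it directly all see its final value.
    public static List<Func<int>> CaptureLoopCounter(int count, bool copyFirst)
    {
        var getters = new List<Func<int>>();
        for (int i = 0; i < count; i++)
        {
            if (copyFirst)
            {
                int copy = i; // a fresh variable per iteration
                getters.Add(() => copy);
            }
            else
            {
                getters.Add(() => i); // every lambda shares the same 'i'
            }
        }
        return getters;
    }
}
```

With count = 3, the getters that capture i directly all return 3 (the value that ended the loop), while the copied ones return 0, 1 and 2.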
You could name the anonymous delegate :-)
And you can write the second as:
someList.ForEach(s => Console.WriteLine(s.ToUpper()));
Which I prefer, and saves a lot of typing.
As Joachim says, parallelism is easier to apply to the second form.
List.ForEach() is considered to be more functional.
List.ForEach() says what you want done. foreach(item in list) also says exactly how you want it done. This leaves List.ForEach free to change the implementation of the how part in the future. For example, a hypothetical future version of .NET might always run List.ForEach in parallel, under the assumption that at this point everyone has a number of CPU cores that are generally sitting idle.
On the other hand, foreach (item in list) gives you a little more control over the loop. For example, you know that the items will be iterated in some kind of sequential order, and you could easily break in the middle if an item meets some condition.
Some more recent remarks on this issue are available here:
https://stackoverflow.com/a/529197/3043
The entire ForEach scope (delegate function) is treated as a single line of code (calling the function), and you cannot set breakpoints or step into the code. If an unhandled exception occurs the entire block is marked.
Behind the scenes, the anonymous delegate gets turned into an actual method, so you could have some overhead with the second choice if the compiler didn't choose to inline the function. Additionally, any local variables referenced by the body of the anonymous delegate are hoisted into fields of a compiler-generated class, a trick the compiler uses to hide the fact that the body gets compiled to a new method. More info here on how C# does this magic:
http://blogs.msdn.com/oldnewthing/archive/2006/08/04/688527.aspx
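To illustrate what happens to captured locals, here is a hand-written approximation of that lowering. The DisplayClass name and its members are hypothetical (the real compiler-generated names are not even valid C# identifiers), but the shape is the same: the captured locals become fields of a heap object, and the lambda body becomes an instance method on it. Both versions should return the same count.

```csharp
using System;
using System.Collections.Generic;

public static class ClosureLowering
{
    // What you write: 'prefix' and 'count' are captured by the lambda.
    public static int CountWithPrefix(List<string> items, string prefix)
    {
        int count = 0;
        items.ForEach(s => { if (s.StartsWith(prefix)) count++; });
        return count;
    }

    // Roughly what the compiler generates instead (hypothetical names).
    private sealed class DisplayClass
    {
        public string prefix;
        public int count; // the former local now lives on the heap as a field

        public void Body(string s)
        {
            if (s.StartsWith(prefix)) count++;
        }
    }

    public static int CountWithPrefixLowered(List<string> items, string prefix)
    {
        var closure = new DisplayClass { prefix = prefix, count = 0 };
        items.ForEach(new Action<string>(closure.Body));
        return closure.count;
    }
}
```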
The ForEach function is a member of the generic class List.
I have created the following extension to reproduce the internal code:
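using System;
using System.Collections.Generic;
// Extension methods must live in a non-generic static class,
// so the type parameter goes on the method instead of the class.
public static class MyExtension
{
public static void MyForEach<T>(this IEnumerable<T> collection, Action<T> action)
{
foreach (T item in collection)
action.Invoke(item);
}
}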
So at the end we are using a normal foreach (or a for loop if you want).
On the other hand, using a delegate function is just another way to define a function, this code:
delegate(string s) {
<process the string>
}
is equivalent to:
private static void myFunction(string s, <other variables...>)
{
<process the string>
}
or using lambda expressions:
(s) => <process the string>
The second way you showed uses the List<T>.ForEach method to execute the delegate for each of the elements in the list.
This way, you have an extra delegate (i.e. method) call for every element.
Additionally, there is the possibility to iterate the list with a for loop.
One thing to be wary of is how to exit from the generic .ForEach method - see this discussion. The linked discussion also seems to say that this way is the fastest. Not sure why - you'd think they would be equivalent once compiled...