When using lambda expressions or anonymous methods in C#, we have to be wary of the access to modified closure pitfall. For example:
foreach (var s in strings)
{
query = query.Where(i => i.Prop == s); // access to modified closure
...
}
Due to the modified closure, the above code will cause all of the Where clauses on the query to be based on the final value of s.
As explained here, this happens because the compiler translates the s variable declared in the foreach loop above like this:
string s;
while (enumerator.MoveNext())
{
s = enumerator.Current;
...
}
instead of like this:
while (enumerator.MoveNext())
{
string s;
s = enumerator.Current;
...
}
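To see what difference the two translations make, here is a minimal sketch that expands a foreach by hand in both ways and captures the loop variable in each. The list contents and the names shared and fresh are illustrative assumptions. The sketch uses top-level statements, so it needs a C# 9 or later compiler.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<string> strings = new List<string> { "a", "b", "c" };

// First translation: one variable s is shared by every iteration.
var shared = new List<Func<string>>();
using (IEnumerator<string> enumerator = strings.GetEnumerator())
{
    string s;
    while (enumerator.MoveNext())
    {
        s = enumerator.Current;
        shared.Add(() => s); // every lambda captures the same s
    }
}

// Second translation: a fresh s is declared inside the loop body.
var fresh = new List<Func<string>>();
using (IEnumerator<string> enumerator = strings.GetEnumerator())
{
    while (enumerator.MoveNext())
    {
        string s = enumerator.Current; // new variable on each iteration
        fresh.Add(() => s);
    }
}

Console.WriteLine(string.Join(",", shared.Select(f => f()))); // c,c,c
Console.WriteLine(string.Join(",", fresh.Select(f => f())));  // a,b,c
```

In the first version the lambdas run after the loop has finished, so they all read the last value assigned to the single shared s.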
As pointed out here, there are no performance advantages to declaring a variable outside the loop, and under normal circumstances the only reason I can think of for doing this is if you plan to use the variable outside the scope of the loop:
string s;
while (enumerator.MoveNext())
{
s = enumerator.Current;
...
}
var finalString = s;
However, variables declared in a foreach loop cannot be used outside the loop:
foreach(string s in strings)
{
}
var finalString = s; // won't work: you're outside the scope.
So the compiler declares the variable in a way that makes it highly prone to an error that is often difficult to find and debug, while producing no perceivable benefits.
Is there something you can do with foreach loops this way that you couldn't if they were compiled with an inner-scoped variable, or is this just an arbitrary choice that was made before anonymous methods and lambda expressions were available or common, and which hasn't been revised since then?
The compiler declares the variable in a way that makes it highly prone to an error that is often difficult to find and debug, while producing no perceivable benefits.
Your criticism is entirely justified.
I discuss this problem in detail here:
Closing over the loop variable considered harmful
Is there something you can do with foreach loops this way that you couldn't if they were compiled with an inner-scoped variable? or is this just an arbitrary choice that was made before anonymous methods and lambda expressions were available or common, and which hasn't been revised since then?
The latter. The C# 1.0 specification actually did not say whether the loop variable was inside or outside the loop body, as it made no observable difference. When closure semantics were introduced in C# 2.0, the choice was made to put the loop variable outside the loop, consistent with the "for" loop.
I think it is fair to say that all regret that decision. This is one of the worst "gotchas" in C#, and we are going to take the breaking change to fix it. In C# 5 the foreach loop variable will be logically inside the body of the loop, and therefore closures will get a fresh copy every time.
The for loop will not be changed, and the change will not be "back ported" to previous versions of C#. You should therefore continue to be careful when using this idiom.
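Here is a minimal sketch of the difference, assuming a C# 5 or later compiler (and C# 9 or later for the top-level statements). The array contents and list names are illustrative. Closures created in a foreach see each element, while closures created in a for loop all see the final value of i.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] values = { 10, 20, 30 };

// foreach (C# 5 and later): each iteration gets its own v.
var fromForeach = new List<Func<int>>();
foreach (var v in values)
{
    fromForeach.Add(() => v);
}

// for: a single i is shared by all iterations, so every lambda sees its final value.
var fromFor = new List<Func<int>>();
for (int i = 0; i < values.Length; i++)
{
    fromFor.Add(() => i);
}

Console.WriteLine(string.Join(",", fromForeach.Select(f => f()))); // 10,20,30
Console.WriteLine(string.Join(",", fromFor.Select(f => f())));     // 3,3,3
```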
What you are asking is thoroughly covered by Eric Lippert in his blog post Closing over the loop variable considered harmful and its sequel.
For me, the most convincing argument is that having a new variable in each iteration would be inconsistent with the for(;;)-style loop. Would you expect to have a new int i in each iteration of for (int i = 0; i < 10; i++)?
The most common problem with this behavior is making a closure over the iteration variable, and it has an easy workaround:
foreach (var s in strings)
{
var s_for_closure = s;
query = query.Where(i => i.Prop == s_for_closure); // s_for_closure is a new variable in each iteration
}
My blog post about this issue: Closure over foreach variable in C#.
Having been bitten by this, I have a habit of declaring a local variable in the innermost scope and using it to pass the value to any closure. In your example:
foreach (var s in strings)
query = query.Where(i => i.Prop == s); // access to modified closure
I do:
foreach (var s in strings)
{
string search = s;
query = query.Where(i => i.Prop == search); // New definition ensures unique per iteration.
}
Once you have that habit, you can skip it in the very rare case where you actually intend to bind to the outer scope. To be honest, I don't think I have ever done so.
In C# 5.0, this problem is fixed and you can close over loop variables and get the results you expect.
The language specification says:
8.8.4 The foreach statement
(...)
A foreach statement of the form
foreach (V v in x) embedded-statement
is then expanded to:
{
E e = ((C)(x)).GetEnumerator();
try {
while (e.MoveNext()) {
V v = (V)(T)e.Current;
embedded-statement
}
}
finally {
… // Dispose e
}
}
(...)
The placement of v inside the while loop is important for how it is
captured by any anonymous function occurring in the
embedded-statement. For example:
int[] values = { 7, 9, 13 };
Action f = null;
foreach (var value in values)
{
if (f == null) f = () => Console.WriteLine("First value: " + value);
}
f();
If v was declared outside of the while loop, it would be shared
among all iterations, and its value after the for loop would be the
final value, 13, which is what the invocation of f would print.
Instead, because each iteration has its own variable v, the one
captured by f in the first iteration will continue to hold the value
7, which is what will be printed. (Note: earlier versions of C#
declared v outside of the while loop.)
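A runnable variant of the specification's example, as a sketch: the delegate returns the string instead of writing it, so the result can be checked. It assumes a C# 5 or later compiler (C# 9 or later for the top-level statements).

```csharp
using System;

int[] values = { 7, 9, 13 };
Func<string> f = null;
foreach (var value in values)
{
    // Only the first iteration assigns f, so f captures that iteration's value variable.
    if (f == null) f = () => "First value: " + value;
}
Console.WriteLine(f()); // First value: 7
```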
I don't know why I'm getting System.IndexOutOfRangeException: 'Index was outside the bounds of the array.' with this code:
IEnumerable<char> query = "Text result";
string illegals = "abcet";
for (int i = 0; i < illegals.Length; i++)
{
query = query.Where(c => c != illegals[i]);
}
foreach (var item in query)
{
Console.Write(item);
}
Please can someone explain what's wrong with my code.
The problem is that your lambda expression is capturing the variable i, but the delegate isn't being executed until after the loop. By the time the expression c != illegals[i] is executed, i is illegals.Length, because that's the final value of i. It's important to understand that lambda expressions capture variables, rather than "the values of those variables at the point of the lambda expression being converted into a delegate".
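The failure can be reproduced deterministically. This sketch reuses the question's code and catches the exception so it can report it; the threw flag is a hypothetical addition used only for the check. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

IEnumerable<char> query = "Text result";
string illegals = "abcet";
for (int i = 0; i < illegals.Length; i++)
{
    query = query.Where(c => c != illegals[i]); // captures the variable i, not its current value
}

// The loop has finished, so i == illegals.Length (5) when the predicates finally run.
bool threw = false;
try
{
    query.ToList(); // enumerating the query is what executes the predicates
}
catch (IndexOutOfRangeException)
{
    threw = true;
}
Console.WriteLine(threw ? "IndexOutOfRangeException" : "no exception");
```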
Here are five ways of fixing your code:
Option 1: local copy of i
Copy the value of i into a local variable within the loop, so that each iteration of the loop captures a new variable in the lambda expression. That new variable isn't changed by the rest of the execution of the loop.
for (int i = 0; i < illegals.Length; i++)
{
int copy = i;
query = query.Where(c => c != illegals[copy]);
}
Option 2: extract illegals[i] outside the lambda expression
Extract the value of illegals[i] in the loop (outside the lambda expression) and use that value in the lambda expression. Again, the changing value of i doesn't affect the variable.
for (int i = 0; i < illegals.Length; i++)
{
char illegal = illegals[i];
query = query.Where(c => c != illegal);
}
Option 3: use a foreach loop
This option only works properly with C# 5 and later compilers, as the meaning of foreach changed (for the better) in C# 5.
foreach (char illegal in illegals)
{
query = query.Where(c => c != illegal);
}
Option 4: use Except once
LINQ provides a method to perform set exclusion: Except. This is not quite the same as the earlier options though, as you'll only get a single copy of any particular character in your output. So if e wasn't in illegals, you'd get a result of "Tex resul" with the above options, but "Tex rsul" using Except. Still, it's worth knowing about:
// Replace the loop entirely with this
query = query.Except(illegals);
Option 5: Use Contains once
You can call Where once, with a lambda expression that calls Contains:
// Replace the loop entirely with this
query = query.Where(c => !illegals.Contains(c));
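As a quick check that the options agree on the question's input, here is a sketch that runs all five side by side (the names q1 to q5 and results are illustrative; top-level statements need C# 9 or later). With "abcet" as the illegal set, every option should yield "Tx rsul". Because no character repeats in what is left of "Text result", Except's de-duplication makes no difference for this particular input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string text = "Text result";
string illegals = "abcet";

// Option 1: copy the loop variable into a per-iteration local.
IEnumerable<char> q1 = text;
for (int i = 0; i < illegals.Length; i++)
{
    int copy = i;
    q1 = q1.Where(c => c != illegals[copy]);
}

// Option 2: read illegals[i] outside the lambda.
IEnumerable<char> q2 = text;
for (int i = 0; i < illegals.Length; i++)
{
    char illegal = illegals[i];
    q2 = q2.Where(c => c != illegal);
}

// Option 3: foreach (fresh variable per iteration in C# 5 and later).
IEnumerable<char> q3 = text;
foreach (char illegal in illegals)
{
    q3 = q3.Where(c => c != illegal);
}

// Option 4: set difference (also removes duplicate characters).
IEnumerable<char> q4 = text.Except(illegals);

// Option 5: a single Where with Contains.
IEnumerable<char> q5 = text.Where(c => !illegals.Contains(c));

string[] results = new[] { q1, q2, q3, q4, q5 }.Select(q => new string(q.ToArray())).ToArray();
Console.WriteLine(string.Join(" | ", results));
```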
This happens because, although your for loop seems at first glance to be correctly bounded, every lambda passed to Where captures the same loop variable i. One of the most useful properties of closures is that they capture by reference, enabling all sorts of powerful and sophisticated techniques. However, in this case it means that by the time the query is executed in the ensuing foreach loop, the index has been incremented past the length of the array.
The most straightforward fix is to create a loop-scoped copy of the current value of the loop control variable and refer to that copy in your closure instead of referring directly to the loop control variable.
Ex:
for (int i = 0; i < illegals.Length; i++)
{
var index = i;
query = query.Where(c => c != illegals[index]);
}
However, as others have noted, there are better ways to write this that avoid the problem entirely and that also raise the level of abstraction.
For example, you can use System.Linq.Enumerable.Except:
var legals = query.Except(illegals);
The language I use is C#.
Let's say we have a List of objects of type T:
List<T> collection = new List<T>{.....};
Say that we want to go over each item of collection. That can be done in many ways; among them are the following two:
foreach(var item in collection)
{
// code goes here
}
and
foreach(T item in collection)
{
// code goes here
}
Is the second way better than the first, and why?
Thanks in advance for your answers.
They're both exactly the same. var is syntactic sugar for convenience. It makes no difference to the speed with which a List is traversed.
The rule of thumb I follow with var is to use it only if the type of the object is present on the right-hand side of an assignment, so in this case I'd prefer to specify the type explicitly in the foreach to make it clearer for other engineers, but it's down to personal choice. If you hover over a var in Visual Studio, it will display the type (assuming it can infer what it should be).
Quoting MSDN:
An implicitly typed local variable is strongly typed just as if you
had declared the type yourself, but the compiler determines the type.
So
var i = 10; // implicitly typed
int i = 10; //explicitly typed
are exactly the same.
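Because var is resolved at compile time, a small check can confirm the inferred types. This is a sketch: StaticTypeOf is a hypothetical generic helper that returns the type argument the compiler inferred for whatever is passed to it, and the sketch uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: returns the compile-time type the compiler inferred for its argument.
Type StaticTypeOf<T>(T _) => typeof(T);

var implicitlyTyped = 10;
int explicitlyTyped = 10;

var collection = new List<string> { "a", "b" };
Type loopType = null;
foreach (var item in collection)
{
    loopType = StaticTypeOf(item); // inferred as string, same as foreach (string item in collection)
}

Console.WriteLine(StaticTypeOf(implicitlyTyped) == StaticTypeOf(explicitlyTyped)); // True
Console.WriteLine(loopType); // System.String
```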
Now, as for 'better': it depends heavily on your criterion. If it's speed, then a for loop may be better than a foreach, and T[] better than List<T>, according to Patrick Smacchia. Main points:
for loops on List are a bit more than 2 times cheaper than foreach loops on List.
Looping on array is around 2 times cheaper than looping on List.
As a consequence, looping on an array using for is 5 times cheaper than looping on a List using foreach (which, I believe, is what we all do).
Quote source: In .NET, which loop runs faster, 'for' or 'foreach'?
Reference: http://msdn.microsoft.com/en-us/library/bb383973.aspx
If you compare the IL code, you will see that they are really 100% the same.
var is only syntactic sugar:
C# Code:
List<int> collection = new List<int>();
collection.Add(1);
collection.Add(2);
collection.Add(3);
foreach (var myInt in collection)
{
Console.WriteLine(myInt);
}
foreach (int item in collection)
{
Console.WriteLine(item);
}
Decompiled (both loops compile to the same code):
System.Collections.Generic.List<int> list = new System.Collections.Generic.List<int>();
list.Add(1);
list.Add(2);
list.Add(3);
System.Collections.Generic.List<int>.Enumerator enumerator = list.GetEnumerator();
try
{
while (enumerator.MoveNext())
{
int i1 = enumerator.Current;
System.Console.WriteLine(i1);
}
}
finally
{
enumerator.Dispose();
}
enumerator = list.GetEnumerator();
try
{
while (enumerator.MoveNext())
{
int i2 = enumerator.Current;
System.Console.WriteLine(i2);
}
}
finally
{
enumerator.Dispose();
}
There is no faster way to iterate through the same collection.
No matter what you use, your own loop or extension methods, it is all the same. When you use var, it still compiles to the same thing.
The only difference might be that a Dictionary is faster than a List<T> or Collection when looking up items by key; Dictionary was designed for fast lookup.
The first way (with var) might be better for readability.
Consider this:
List<User> list = new List<User>();
var users = list.GroupBy(x => x.Name).OrderBy(x => x.Key);
foreach (var user in users)
{
//blah
}
vs
foreach (System.Linq.IGrouping<string, User> user in users)
{
}
I believe that was the main reason for having var in the first place.
What is a closure? Do we have them in .NET?
If they do exist in .NET, could you please provide a code snippet (preferably in C#) explaining it?
I have an article on this very topic. (It has lots of examples.)
In essence, a closure is a block of code which can be executed at a later time, but which maintains the environment in which it was first created - i.e. it can still use the local variables etc of the method which created it, even after that method has finished executing.
The general feature of closures is implemented in C# by anonymous methods and lambda expressions.
Here's an example using an anonymous method:
using System;
class Test
{
static void Main()
{
Action action = CreateAction();
action();
action();
}
static Action CreateAction()
{
int counter = 0;
return delegate
{
// Yes, it could be done in one statement;
// but it is clearer like this.
counter++;
Console.WriteLine("counter={0}", counter);
};
}
}
Output:
counter=1
counter=2
Here we can see that the action returned by CreateAction still has access to the counter variable, and can indeed increment it, even though CreateAction itself has finished.
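Each call to a method like CreateAction creates a new captured variable, so separately created delegates do not share state. A minimal sketch, using a lambda instead of an anonymous method and a hypothetical CreateCounter local function that returns the new count instead of printing it (top-level statements, C# 9 or later):

```csharp
using System;

// Each call creates a new 'counter' variable, so each returned delegate has its own state.
Func<int> CreateCounter()
{
    int counter = 0;
    return () => ++counter;
}

Func<int> first = CreateCounter();
Func<int> second = CreateCounter();

int a = first();  // 1
int b = first();  // 2
int c = second(); // 1: second has its own counter
Console.WriteLine($"{a} {b} {c}"); // 1 2 1
```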
If you are interested in seeing how C# implements closures, read the "I know the answer (it's 42)" blog.
The compiler generates a class in the background to encapsulate the anonymous method and the variable j:
[CompilerGenerated]
private sealed class <>c__DisplayClass1
{
public <>c__DisplayClass1();
public void <fillFunc>b__0()
{
Console.Write("{0} ", this.j);
}
public int j;
}
for the function:
static void fillFunc(int count) {
for (int i = 0; i < count; i++)
{
int j = i;
funcArr[i] = delegate()
{
Console.Write("{0} ", j);
};
}
}
Turning it into:
private static void fillFunc(int count)
{
for (int i = 0; i < count; i++)
{
Program.<>c__DisplayClass1 class1 = new Program.<>c__DisplayClass1();
class1.j = i;
Program.funcArr[i] = new Func(class1.<fillFunc>b__0);
}
}
Closures are functional values that hold onto variables from their original scope. C# can use them in the form of anonymous delegates.
For a very simple example, take this C# code:
delegate int testDel();
static void Main(string[] args)
{
int foo = 4;
testDel myClosure = delegate()
{
return foo;
};
int bar = myClosure();
}
At the end of it, bar will be set to 4, and the myClosure delegate can be passed around to be used elsewhere in the program.
Closures can be used for a lot of useful things, like delayed execution or to simplify interfaces - LINQ is mainly built using closures. The most immediate way it comes in handy for most developers is adding event handlers to dynamically created controls - you can use closures to add behavior when the control is instantiated, rather than storing data elsewhere.
Func<int, int> GetMultiplier(int a)
{
return delegate(int b) { return a * b; } ;
}
//...
var fn2 = GetMultiplier(2);
var fn3 = GetMultiplier(3);
Console.WriteLine(fn2(2)); //outputs 4
Console.WriteLine(fn2(3)); //outputs 6
Console.WriteLine(fn3(2)); //outputs 6
Console.WriteLine(fn3(3)); //outputs 9
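The event-handler use mentioned above can be sketched without any UI framework. In this hypothetical sketch, a Dictionary of Action delegates stands in for dynamically created controls; each handler captures the label of its own control, so no separate table of per-control data is needed. It assumes a C# 5 or later compiler, since it relies on foreach creating a fresh label on each iteration (and C# 9 or later for the top-level statements).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for dynamically created controls: label -> click handler.
var clickHandlers = new Dictionary<string, Action>();
var log = new List<string>();

foreach (var label in new[] { "Save", "Load", "Quit" })
{
    // Each handler captures the label of the control it was created for.
    clickHandlers[label] = () => log.Add(label + " clicked");
}

clickHandlers["Load"]();
clickHandlers["Quit"]();
Console.WriteLine(string.Join("; ", log)); // Load clicked; Quit clicked
```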
A closure is an anonymous function passed outside of the function in which it is created.
It maintains any variables from the function in which it is created that it uses.
A closure is when a function is defined inside another function (or method) and uses the variables of the parent method. This use of variables that live in a method and are wrapped in a function defined within it is called a closure.
Mark Seemann has some interesting examples of closures in his blog post where he does a parallel between oop and functional programming.
To make it more concrete:
var workingDirectory = new DirectoryInfo(Environment.CurrentDirectory);//when this variable
Func<int, string> read = id =>
{
var path = Path.Combine(workingDirectory.FullName, id + ".txt");//is used inside this function
return File.ReadAllText(path);
};//the entire process is called a closure.
Here is a contrived example for C# which I created from similar code in JavaScript:
public delegate T Iterator<T>() where T : class;
public Iterator<T> CreateIterator<T>(IList<T> x) where T : class
{
var i = 0;
return delegate { return (i < x.Count) ? x[i++] : null; };
}
So, here is some code that shows how to use the above code...
var iterator = CreateIterator(new string[3] { "Foo", "Bar", "Baz"});
// So, although CreateIterator() has been called and returned, the variable
// "i" within CreateIterator() will live on because of a closure created
// within that method, so that every time the anonymous delegate returned
// from it is called (by calling iterator()) its value will increment.
string currentString;
currentString = iterator(); // currentString is now "Foo"
currentString = iterator(); // currentString is now "Bar"
currentString = iterator(); // currentString is now "Baz"
currentString = iterator(); // currentString is now null
Hope that is somewhat helpful.
Closures are chunks of code that reference a variable outside themselves (from below them on the stack) and that might be called or executed later (like when an event or delegate is defined, and could get called at some indefinite future point in time). Because the outside variable that the chunk of code references may have gone out of scope (and would otherwise have been lost), the fact that it is referenced by the chunk of code (called a closure) tells the runtime to "hold" that variable in scope until it is no longer needed by the closure.
Basically, a closure is a block of code that you can pass as an argument to a function. C# supports closures in the form of anonymous delegates.
Here is a simple example:
The List<T>.Find method can accept and execute a piece of code (a closure) to find an item in the list.
// Passing a block of code as a function argument
List<int> ints = new List<int> {1, 2, 3};
ints.Find(delegate(int value) { return value == 1; });
Using C# 3.0 syntax, we can write this as:
ints.Find(value => value == 1);
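Strictly speaking, the predicate above captures nothing, so it is just a delegate. A sketch of the same call where the lambda does close over a local variable (target and matchesTarget are hypothetical names; top-level statements need C# 9 or later) shows the capture, including that the variable is read when the delegate runs:

```csharp
using System;
using System.Collections.Generic;

List<int> ints = new List<int> { 1, 2, 3 };

int target = 2;
// The lambda captures the local variable target, which makes it a closure.
Predicate<int> matchesTarget = value => value == target;

int found = ints.Find(matchesTarget);      // target is 2 here
target = 3;                                // change the captured variable
int foundAgain = ints.Find(matchesTarget); // the same delegate now sees 3

Console.WriteLine($"{found} {foundAgain}"); // 2 3
```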
If you write an inline anonymous method (C# 2) or (preferably) a lambda expression (C# 3+), an actual method is still being created. If that code uses an outer-scope local variable, you still need to pass that variable to the method somehow.
For example, take this LINQ Where clause (Where is a simple extension method that takes a lambda expression):
var i = 0;
var items = new List<string>
{
"Hello","World"
};
var filtered = items.Where(x =>
// this is a predicate, i.e. a Func<T, bool> written as a lambda expression
// which is still a method actually being created for you in compile time
{
i++;
return true;
});
If you want to use i in that lambda expression, you have to pass it to the created method.
So the first question that arises is: should it be passed by value or reference?
Pass by reference is (I guess) preferable, as you get read/write access to that variable. This is what C# does; I guess the team at Microsoft weighed the pros and cons and went with by-reference (according to Jon Skeet's article, Java went with by-value).
But then another question arises: Where to allocate that i?
Should it actually/naturally be allocated on the stack?
Well, if you allocate it on the stack and pass it by reference, there can be situations where it outlives its own stack frame. Take this example:
static void Main(string[] args)
{
Outlive();
var list = whereItems.ToList();
Console.ReadLine();
}
static IEnumerable<string> whereItems;
static void Outlive()
{
var i = 0;
var items = new List<string>
{
"Hello","World"
};
whereItems = items.Where(x =>
{
i++;
Console.WriteLine(i);
return true;
});
}
The lambda expression (in the Where clause) again creates a method which refers to an i. If i is allocated on the stack of Outlive, then by the time you enumerate the whereItems, the i used in the generated method will point to the i of Outlive, i.e. to a place in the stack that is no longer accessible.
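A runnable variant of the Outlive example, as a sketch: here Outlive is a local function that returns the query instead of writing a static field, and the log list is a hypothetical addition so the captured i can be checked afterwards instead of printed (top-level statements, C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Local function standing in for Outlive(); it returns the query instead of storing it.
IEnumerable<string> Outlive(List<int> log)
{
    var i = 0;
    var items = new List<string> { "Hello", "World" };
    return items.Where(x =>
    {
        i++;        // i lives in a compiler-generated heap object, not in Outlive's stack frame
        log.Add(i);
        return true;
    });
}

var seen = new List<int>();
IEnumerable<string> whereItems = Outlive(seen); // Outlive has already returned here
var list = whereItems.ToList();                 // the lambda still updates the captured i
var list2 = whereItems.ToList();                // a second enumeration continues from the same i

Console.WriteLine(string.Join(",", seen)); // 1,2,3,4
```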
Ok, so we need it on the heap then.
So what the C# compiler does to support inline anonymous methods and lambdas is use what are called "closures": it generates a class (rather poorly named DisplayClass) with a field containing i and a method containing the lambda's body, and it allocates an instance of that class on the heap.
Something that would be equivalent to this (you can see the IL generated using ILSpy or ILDASM):
class <>c_DisplayClass1
{
public int i;
public bool <GetFunc>b__0()
{
this.i++;
Console.WriteLine(i);
return true;
}
}
It instantiates that class in your local scope, and replaces any code relating to i or the lambda expression with that closure instance. So - anytime you are using the i in your "local scope" code where i was defined, you are actually using that DisplayClass instance field.
So if I change the "local" i in the Main method, it actually changes the DisplayClass field i. For example:
var i = 0;
var items = new List<string>
{
"Hello","World"
};
var filtered = items.Where(x =>
{
i++;
return true;
});
filtered.ToList(); // will enumerate filtered, i = 2
i = 10; // i will be overwriten with 10
filtered.ToList(); // will enumerate filtered again, i = 12
Console.WriteLine(i); // should print out 12
It will print 12, because "i = 10" writes to that DisplayClass field and changes it just before the second enumeration.
A good source on the topic is this Bart De Smet Pluralsight module (requires registration). (Also ignore his erroneous use of the term "hoisting"; what I think he means is that the local variable i is changed to refer to the new DisplayClass field.)
In other news, there seems to be a misconception that closures are related to loops. As I understand it, closures are NOT a concept related to loops, but rather to anonymous methods and lambda expressions using locally scoped variables, although some trick questions use loops to demonstrate them.
A closure aims to simplify functional thinking, and it allows the runtime to manage state, relieving the developer of extra complexity. A closure is a first-class function with free variables that are bound in the lexical environment. Behind these buzzwords hides a simple concept: closures are a more convenient way to give functions access to local state and to pass data into background operations. They are special functions that carry an implicit binding to all the nonlocal variables (also called free variables or up-values) referenced. Moreover, a closure allows a function to access one or more nonlocal variables even when invoked outside its immediate lexical scope, and the body of this special function can transport these free variables as a single entity, defined in its enclosing scope. More importantly, a closure encapsulates behavior and passes it around like any other object, granting access to the context in which the closure was created, reading, and updating these values.
Here is a simpler, more understandable answer from the book C# 7.0 in a Nutshell.
Prerequisite: a lambda expression can reference the local variables and parameters of the method in which it's defined (outer variables).
static void Main()
{
int factor = 2;
//Here factor is the variable that takes part in lambda expression.
Func<int, int> multiplier = n => n * factor;
Console.WriteLine (multiplier (3)); // 6
}
The key part: outer variables referenced by a lambda expression are called captured variables. A lambda expression that captures variables is called a closure.
One last point to note: captured variables are evaluated when the delegate is actually invoked, not when the variables were captured:
int factor = 2;
Func<int, int> multiplier = n => n * factor;
factor = 10;
Console.WriteLine (multiplier (3)); // 30
A closure is a function, defined within a function, that can access its own local variables as well as those of its parent.
public things GetByName(string name)
{
List<things> theThings = new List<things>();
return theThings.Find(t => t.Name == name); // Find returns the first match, or null if there is none
}
So consider the function passed to the Find method:
t => t.Name == name
It can access the variable in its own scope (t) as well as the variable name, which is in its parent's scope, even though it is executed by the Find method as a delegate, from another scope altogether.
Today I encountered a problem with the C# foreach loop: it didn't give me the result I expected. Here is the code:
using System;
using System.Collections.Generic;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
int[] data = new int[] { 1, 2, 3, 4, 5 };
List<Func<int>> actions = new List<Func<int>>();
foreach (int x in data)
{
actions.Add(delegate() { return x; });
}
foreach (var foo in actions)
{
Console.WriteLine(foo());
}
Console.ReadKey();
}
}
}
When I run it as a console application, it prints 5 five times. Why? I just can't understand it. I asked some people and they said that there is a closure in this code, but I am not very clear about this. I remember often encountering closures in JavaScript, but why is there a closure in the code above? Thanks.
In C# 4, all iterations of a foreach loop share the same variable, and thus the same closure.
The specification says:
foreach (V v in x) embedded-statement
is then expanded to:
{
E e = ((C)(x)).GetEnumerator();
try
{
V v;
while (e.MoveNext())
{
v = (V)(T)e.Current;
embedded-statement
}
}
finally
{
… // Dispose e
}
}
You can see that v is declared in a block outside the while-loop, which causes this sharing behavior.
This will probably be changed in C#5.
We are taking the breaking change. In C# 5, the loop variable of a foreach will be logically inside the loop, and therefore closures will close over a fresh copy of the variable each time. The "for" loop will not be changed.
http://blogs.msdn.com/b/ericlippert/archive/2009/11/12/closing-over-the-loop-variable-considered-harmful.aspx
The key is that when you are creating the delegates within your foreach loop you are creating a closure over the loop variable x, not its current value.
Only when you execute the delegates in actions will the value be determined, which is the value of x at that time. Since you have completed the foreach loop by then the value will be the last item in your data array which is 5.
Consider this code.
var values = new List<int> {123, 432, 768};
var funcs = new List<Func<int>>();
values.ForEach(v=>funcs.Add(()=>v));
funcs.ForEach(f=>Console.WriteLine(f()));//prints 123,432,768
funcs.Clear();
foreach (var v1 in values)
{
funcs.Add(()=>v1);
}
foreach (var func in funcs)
{
Console.WriteLine(func()); //prints 768,768,768
}
I know that the second foreach prints 768 three times because of the closure variable captured by the lambda. Why does it not happen in the first case? How is the foreach keyword different from the ForEach method? Is it because the expression is evaluated when I call values.ForEach?
foreach introduces only one variable, while the lambda parameter variable is "fresh" each time the lambda is invoked.
Compare with:
foreach (var v1 in values) // v1 *same* variable each loop, value changed
{
var freshV1 = v1; // freshV1 is *new* variable each loop
funcs.Add(() => freshV1);
}
foreach (var func in funcs)
{
Console.WriteLine(func()); //prints 123,432,768
}
That is,
foreach (T v in ...) { }
can be thought of as:
T v;
foreach(v in ...) {}
Happy coding.
The difference is that in the foreach loop, you've got a single variable v1 which is captured. That variable takes on each value within values - but you're only using it at the end... which means we only see the final value each time.
In your List<T>.ForEach version, each iteration introduces a new variable (the parameter v), so each lambda expression captures a separate variable, which never changes in value.
Eric Lippert has blogged about this - but note that this behaviour may change in future versions of C#.