I don't know why I'm getting System.IndexOutOfRangeException: 'Index was outside the bounds of the array.' with this code:
IEnumerable<char> query = "Text result";
string illegals = "abcet";
for (int i = 0; i < illegals.Length; i++)
{
query = query.Where(c => c != illegals[i]);
}
foreach (var item in query)
{
Console.Write(item);
}
Please can someone explain what's wrong with my code.
The problem is that your lambda expression is capturing the variable i, but the delegate isn't being executed until after the loop. By the time the expression c != illegals[i] is executed, i is illegals.Length, because that's the final value of i. It's important to understand that lambda expressions capture variables, rather than "the values of those variables at the point of the lambda expression being converted into a delegate".
Here are five ways of fixing your code:
Option 1: local copy of i
Copy the value of i into a local variable within the loop, so that each iteration of the loop captures a new variable in the lambda expression. That new variable isn't changed by the rest of the execution of the loop.
for (int i = 0; i < illegals.Length; i++)
{
int copy = i;
query = query.Where(c => c != illegals[copy]);
}
Option 2: extract illegals[i] outside the lambda expression
Extract the value of illegals[i] in the loop (outside the lambda expression) and use that value in the lambda expression. Again, the changing value of i doesn't affect the variable.
for (int i = 0; i < illegals.Length; i++)
{
char illegal = illegals[i];
query = query.Where(c => c != illegal);
}
Option 3: use a foreach loop
This option only works properly with C# 5 and later compilers, as the meaning of foreach changed (for the better) in C# 5.
foreach (char illegal in illegals)
{
query = query.Where(c => c != illegal);
}
Option 4: use Except once
LINQ provides a method to perform set exclusion: Except. This is not quite the same as the earlier options though, as you'll only get a single copy of any particular character in your output. So if e wasn't in illegals, you'd get a result of "Tex resul" with the above options, but "Tex rsul" using Except. Still, it's worth knowing about:
// Replace the loop entirely with this
query = query.Except(illegals);
Option 5: Use Contains once
You can call Where once, with a lambda expression that calls Contains:
// Replace the loop entirely with this
query = query.Where(c => !illegals.Contains(c));
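To make the difference between Options 4 and 5 concrete, here is a minimal sketch. The class and method names are made up for illustration, and the illegal set is "abct" (that is, without e, as in the example under Option 4) so that the two approaches visibly diverge.

```csharp
using System;
using System.Linq;

static class ExceptVersusContains
{
    // One Where call with Contains: repeated characters in the input survive.
    public static string FilterWithContains(string text, string illegals)
    {
        return new string(text.Where(c => !illegals.Contains(c)).ToArray());
    }

    // Except is a set operation, so it also removes duplicates from the input.
    public static string FilterWithExcept(string text, string illegals)
    {
        return new string(text.Except(illegals).ToArray());
    }

    static void Main()
    {
        Console.WriteLine(FilterWithContains("Text result", "abct")); // Tex resul
        Console.WriteLine(FilterWithExcept("Text result", "abct"));   // Tex rsul
    }
}
```

Neither version captures a loop variable, so the original exception cannot occur with either of them.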
This happens because, although your for loop seems at first glance to be correctly bounded, each iteration captures the index in the closure that is passed to Where. One of the most useful properties of closures is that they capture by reference, enabling all sorts of powerful and sophisticated techniques. However, in this case it means that, by the time the query is executed in the ensuing foreach loop, the index has been incremented to illegals.Length, one past the last valid position.
The most straightforward fix is to create a loop-scoped copy of the current value of the loop control variable and refer to that copy in your closure instead of referring directly to the loop control variable.
For example:
for (int i = 0; i < illegals.Length; i++)
{
var index = i;
query = query.Where(c => c != illegals[index]);
}
However, as others have noted, there are better ways to write this that avoid the problem entirely, and they also have the virtue of raising the level of abstraction.
For example, you can use System.Linq.Enumerable.Except (bearing in mind that, as a set operation, it also drops duplicate characters from the result):
var legals = query.Except(illegals);
I have this code and do not understand why the output is 22! I expected it to be 01!
Can anyone explain what happens? If the list stores a method with a parameter, then the parameters should be 0 and 1 respectively!
List<Action> list = new List<Action>();
for (int i = 0; i < 2; i++)
{
list.Add(() => Console.Write(i));
}
foreach (var it in list)
{
it();
}
This is a closure.
In your case, Console.Write(i) uses the value of i at the moment the action is called. The for loop runs to completion first, incrementing i as it goes, and only then does the second loop call each action in the list. By the time any action is called, i has the value 2, so you get 22 as output.
To get the expected result, create a local copy of i and use that:
for (int i = 0; i < 2; i++)
{
var temp = i;
list.Add(() => Console.Write(temp));
}
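In addition to Roma Doskoch's answer, another approach is to avoid the for loop. The lambda parameter i is a fresh variable for each element, so each action captures its own value: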
var list = Enumerable
.Range(0, 2)
.Select<int, Action>(i => () => Console.Write(i));
Closures capture variables, not values.
In your code, the closure captures the variable i, not whatever value happens to be stored in i on each iteration. When you invoke the action, the variable i has a value of 2 (because the loop has finished) and therefore 2 will be printed out twice.
In order to avoid this, as other answers already point out, you need to create a new variable every time around as a workaround to not being able to capture values; if you declare a new variable on every iteration then the result of capturing the variable is equivalent to capturing the value because you won't be changing it on the next iteration.
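A small sketch of what "captures variables, not values" means in practice. The names here (SharedVariableDemo, increment, read) are purely illustrative. The lambdas and the enclosing method all read and write the same variable:

```csharp
using System;

static class SharedVariableDemo
{
    public static int Run()
    {
        int count = 0;

        // The lambda captures the variable count itself, not the value 0.
        Action increment = () => count++;
        Func<int> read = () => count;

        increment();
        increment();

        // The enclosing method changes the very same variable...
        count += 10;

        // ...and the second lambda sees every change made so far.
        return read();
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 12
    }
}
```

If the lambdas had captured a copy of the value instead, the result would be 0 rather than 12.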
When using lambda expressions or anonymous methods in C#, we have to be wary of the access to modified closure pitfall. For example:
foreach (var s in strings)
{
query = query.Where(i => i.Prop == s); // access to modified closure
...
}
Due to the modified closure, the above code will cause all of the Where clauses on the query to be based on the final value of s.
As explained here, this happens because the s variable declared in the foreach loop above is translated by the compiler like this:
string s;
while (enumerator.MoveNext())
{
s = enumerator.Current;
...
}
instead of like this:
while (enumerator.MoveNext())
{
string s;
s = enumerator.Current;
...
}
As pointed out here, there are no performance advantages to declaring a variable outside the loop, and under normal circumstances the only reason I can think of for doing this is if you plan to use the variable outside the scope of the loop:
string s;
while (enumerator.MoveNext())
{
s = enumerator.Current;
...
}
var finalString = s;
However, variables defined in a foreach loop cannot be used outside the loop:
foreach(string s in strings)
{
}
var finalString = s; // won't work: you're outside the scope.
So the compiler declares the variable in a way that makes it highly prone to an error that is often difficult to find and debug, while producing no perceivable benefits.
Is there something you can do with foreach loops this way that you couldn't if they were compiled with an inner-scoped variable, or is this just an arbitrary choice that was made before anonymous methods and lambda expressions were available or common, and which hasn't been revised since then?
The compiler declares the variable in a way that makes it highly prone to an error that is often difficult to find and debug, while producing no perceivable benefits.
Your criticism is entirely justified.
I discuss this problem in detail here:
Closing over the loop variable considered harmful
Is there something you can do with foreach loops this way that you couldn't if they were compiled with an inner-scoped variable? or is this just an arbitrary choice that was made before anonymous methods and lambda expressions were available or common, and which hasn't been revised since then?
The latter. The C# 1.0 specification actually did not say whether the loop variable was inside or outside the loop body, as it made no observable difference. When closure semantics were introduced in C# 2.0, the choice was made to put the loop variable outside the loop, consistent with the "for" loop.
I think it is fair to say that all regret that decision. This is one of the worst "gotchas" in C#, and we are going to take the breaking change to fix it. In C# 5 the foreach loop variable will be logically inside the body of the loop, and therefore closures will get a fresh copy every time.
The for loop will not be changed, and the change will not be "back ported" to previous versions of C#. You should therefore continue to be careful when using this idiom.
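As a rough illustration of the difference described above, the following sketch assumes a C# 5 or later compiler. The class and method names are hypothetical. Each method collects closures inside a loop and invokes them only after the loop has finished:

```csharp
using System;
using System.Collections.Generic;

static class LoopCaptureDemo
{
    // Invokes every delegate and concatenates the results.
    static string InvokeAll(List<Func<int>> funcs)
    {
        string output = "";
        foreach (Func<int> f in funcs)
        {
            output += f();
        }
        return output;
    }

    // for loop: a single variable i is shared by all iterations.
    public static string WithFor()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            funcs.Add(() => i);
        }
        return InvokeAll(funcs);
    }

    // foreach loop: with a C# 5+ compiler, each iteration gets its own n.
    public static string WithForeach()
    {
        var funcs = new List<Func<int>>();
        foreach (int n in new[] { 0, 1, 2 })
        {
            funcs.Add(() => n);
        }
        return InvokeAll(funcs);
    }

    static void Main()
    {
        Console.WriteLine(WithFor());     // 333
        Console.WriteLine(WithForeach()); // 012 (assuming C# 5 or later)
    }
}
```

With a pre-C# 5 compiler, the foreach version would be expected to print 222 instead, because the loop variable was shared in those versions.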
What you are asking is thoroughly covered by Eric Lippert in his blog post Closing over the loop variable considered harmful and its sequel.
For me, the most convincing argument is that having a new variable in each iteration would be inconsistent with for(;;)-style loops. Would you expect to have a new int i in each iteration of for (int i = 0; i < 10; i++)?
The most common problem with this behavior is closing over the iteration variable, and it has an easy workaround:
foreach (var s in strings)
{
var s_for_closure = s;
query = query.Where(i => i.Prop == s_for_closure); // captures the per-iteration copy, not s
}
My blog post about this issue: Closure over foreach variable in C#.
Having been bitten by this, I have a habit of including locally defined variables in the innermost scope which I use to transfer to any closure. In your example:
foreach (var s in strings)
query = query.Where(i => i.Prop == s); // access to modified closure
I do:
foreach (var s in strings)
{
string search = s;
query = query.Where(i => i.Prop == search); // New definition ensures unique per iteration.
}
Once you have that habit, you can avoid it in the very rare case you actually intended to bind to the outer scopes. To be honest, I don't think I have ever done so.
In C# 5.0, this problem is fixed and you can close over loop variables and get the results you expect.
The language specification says:
8.8.4 The foreach statement
(...)
A foreach statement of the form
foreach (V v in x) embedded-statement
is then expanded to:
{
E e = ((C)(x)).GetEnumerator();
try {
while (e.MoveNext()) {
V v = (V)(T)e.Current;
embedded-statement
}
}
finally {
… // Dispose e
}
}
(...)
The placement of v inside the while loop is important for how it is
captured by any anonymous function occurring in the
embedded-statement. For example:
int[] values = { 7, 9, 13 };
Action f = null;
foreach (var value in values)
{
if (f == null) f = () => Console.WriteLine("First value: " + value);
}
f();
If v was declared outside of the while loop, it would be shared
among all iterations, and its value after the for loop would be the
final value, 13, which is what the invocation of f would print.
Instead, because each iteration has its own variable v, the one
captured by f in the first iteration will continue to hold the value
7, which is what will be printed. (Note: earlier versions of C#
declared v outside of the while loop.)
I have a class that contains four EnumerableRowCollections, which all point to the same DataTable. The main one will need different combinations of the other three filtered out in different class instances. Since three of them are related, I put them in an array.
EnumerableRowCollection<DataRow> valid;
EnumerableRowCollection<DataRow>[] pending;
All of these collections are defined in the class constructor, but evaluated later due to LINQ's lazy evaluation.
I also have an array of Booleans, which are used to determine which "pending" collections are filtered out of the "valid" collection. These are also assigned in the constructor, and are never changed.
Boolean[] pendingIsValid;
The "valid" collection is filtered like this:
for (var i = 0; i < pending.Length; i++)
if (pendingIsValid[i] && pending[i].Count() > 0)
valid = valid.Where(r => !pending[i].Contains(r));
This also occurs in the constructor, but the Where clause is evaluated lazily, as expected.
This works most of the time; however, in a few cases I got a weird exception when the collection evaluation took place down the road.
I get an IndexOutOfRangeException because the local iterator variable, i, in my for loop above is set to 3.
Questions:
Can I make "Where" evaluate the array indexer (or other sub-expressions) non-lazily?
How does the iterator get incremented to 3 at all? Does this lazy evaluation count as "re-entering" the loop?
!?!?
Change it to this:
for (var i = 0; i < pending.Length; i++)
if (pendingIsValid[i] && pending[i].Count() > 0)
{
var j = i;
valid = valid.Where(r => !pending[j].Contains(r));
}
For question #1 - you can make it not lazy by adding .ToList() at the end. However, with the above fix, you can keep it lazy.
Have a read of this: Captured variable in a loop in C# for the explanation
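For completeness, here is a sketch of the non-lazy variant mentioned for question #1. It uses plain int collections as a stand-in for the DataRow collections in the question, and the names (EagerFilterDemo, Filter) are invented for the example. Because ToList() runs each Where immediately, the lambda executes while i is still a valid index:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EagerFilterDemo
{
    public static List<int> Filter(IEnumerable<int> valid, int[][] pending)
    {
        for (var i = 0; i < pending.Length; i++)
        {
            // ToList() forces evaluation right here, inside the loop,
            // so pending[i] is read before i is incremented again.
            valid = valid.Where(r => !pending[i].Contains(r)).ToList();
        }
        return valid.ToList();
    }

    static void Main()
    {
        var pending = new[] { new[] { 1, 2 }, new[] { 5 }, new[] { 9, 10 } };
        List<int> result = Filter(Enumerable.Range(1, 10), pending);
        Console.WriteLine(string.Join(",", result)); // 3,4,6,7,8
    }
}
```

The trade-off is that every pass materializes an intermediate list, which is why copying the index into a local variable is usually the better fix.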
Excellent, Rob. I also figured out this while I was waiting for a response, but yours looks a bit cleaner.
for (var i = 0; i < pending.Length; i++) {
var p = pending[i];
if (pendingIsValid[i] && p.Count() > 0)
valid = valid.Where(r => !p.Contains(r));
}
Can someone please explain what I am missing here? Based on my basic understanding, a LINQ result is calculated when the result is used, and I can see that in the following code.
static void Main(string[] args)
{
Action<IEnumerable<int>> print = (x) =>
{
foreach (int i in x)
{
Console.WriteLine(i);
}
};
int[] arr = { 1, 2, 3, 4, 5 };
int cutoff = 1;
IEnumerable<int> result = arr.Where(x => x < cutoff);
Console.WriteLine("First Print");
cutoff = 3;
print(result);
Console.WriteLine("Second Print");
cutoff = 4;
print(result);
Console.Read();
}
Output:
First Print
1
2
Second Print
1
2
3
Now I changed the
arr.Where(x => x < cutoff);
to
IEnumerable<int> result = arr.Take(cutoff);
and the output is as follows.
First Print
1
Second Print
1
Why with Take, it does not use the current value of the variable?
The behavior you're seeing comes from the different ways in which the arguments to the LINQ methods are evaluated. The Where method receives a lambda which captures the variable cutoff by reference. The lambda is evaluated on demand and hence sees the value of cutoff at that time.
The Take method (and similar methods like Skip) takes an int parameter, and hence cutoff is passed by value. The value used is the value of cutoff at the moment the Take method is called, not when the query is evaluated.
Note: The term late binding here is a bit incorrect. Late binding generally refers to the process where the members an expression binds to are determined at runtime rather than at compile time. In C# you'd accomplish this with dynamic or reflection. LINQ's behavior of evaluating its parts on demand is known as delayed execution.
There are a few different things getting confused here.
Late-binding: This is where the meaning of code is determined after it was compiled. For example, x.DoStuff() is early-bound if the compiler checks that objects of x's type have a DoStuff() method (considering extension methods and default arguments too) and then produces the call to it in the code it outputs, or fails with a compiler error otherwise. It is late-bound if the search for the DoStuff() method is done at run-time and throws a run-time exception if there was no DoStuff() method. There are pros and cons to each, and C# is normally early-bound but has support for late-binding (most simply through dynamic but the more convoluted approaches involving reflection also count).
Delayed execution: Strictly speaking, all Linq methods immediately produce a result. However, that result is an object which stores a reference to an enumerable object (often the result of the previous Linq method) which it will process in an appropriate manner when it is itself enumerated. For example, we can write our own Take method as:
private static IEnumerable<T> TakeHelper<T>(IEnumerable<T> source, int number)
{
foreach(T item in source)
{
yield return item;
if(--number == 0)
yield break;
}
}
public static IEnumerable<T> Take<T>(this IEnumerable<T> source, int number)
{
if(source == null)
throw new ArgumentNullException();
if(number < 0)
throw new ArgumentOutOfRangeException();
if(number == 0)
return Enumerable.Empty<T>();
return TakeHelper(source, number);
}
Now, when we use it:
var taken4 = someEnumerable.Take(4);//taken4 has a value, so we've already done
//something. If it was going to throw
//an argument exception it would have done so
//by now.
var firstTaken = taken4.First();//only now does the object in taken4
//do the further processing that iterates
//through someEnumerable.
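The same point can be observed with the built-in Enumerable.Take by counting how many items are pulled from the source. This is only a sketch: the Pulled counter and the Numbers iterator are invented for the demonstration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DeferredExecutionDemo
{
    // Counts how many items the source has been asked to produce.
    public static int Pulled;

    static IEnumerable<int> Numbers()
    {
        for (int i = 1; i <= 10; i++)
        {
            Pulled++;
            yield return i;
        }
    }

    // Returns { items pulled before enumeration, first item, items pulled after First() }.
    public static int[] Run()
    {
        Pulled = 0;
        IEnumerable<int> taken4 = Numbers().Take(4); // builds the query only
        int pulledBefore = Pulled;                   // still 0: nothing has been enumerated
        int first = taken4.First();                  // only now is the source iterated
        int pulledAfter = Pulled;                    // at least 1
        return new[] { pulledBefore, first, pulledAfter };
    }

    static void Main()
    {
        Console.WriteLine(string.Join(",", Run()));
    }
}
```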
Captured variables: Normally when we make use of a variable, we make use of its current state:
int i = 2;
string s = "abc";
Console.WriteLine(i);
Console.WriteLine(s);
i = 3;
s = "xyz";
It's pretty intuitive that this prints 2 and abc and not 3 and xyz. In anonymous functions and lambda expressions though, when we make use of a variable we are "capturing" it as a variable, and so we will end up using the value it has when the delegate is invoked:
int i = 2;
string s = "abc";
Action λ = () =>
{
Console.WriteLine(i);
Console.WriteLine(s);
};
i = 3;
s = "xyz";
λ();
Creating the λ doesn't use the values of i and s, but creates a set of instructions as to what to do with i and s when λ is invoked. Only when that happens are the values of i and s used.
Putting it all together: In none of your cases do you have any late-binding. That is irrelevant to your question.
In both you have delayed execution. Both the call to Take and the call to Where return enumerable objects which will act upon arr when they are enumerated.
In only one do you have a captured variable. The call to Take passes an integer directly to Take and Take makes use of that value. The call to Where passes a Func<int, bool> created from a lambda expression, and that lambda expression captures an int variable. Where knows nothing of this capture, but the Func does.
That's the reason the two behave so differently in how they treat cutoff.
Take doesn't take a lambda, but an integer, as such it can't change when you change the original variable.
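If you do want Take-like behavior that follows later changes to the variable, one possible approach (a sketch, not the only option) is the indexed overload of TakeWhile, whose lambda captures cutoff just as the Where lambda does. The class name is made up for the example:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class CutoffDemo
{
    public static string Run()
    {
        int[] arr = { 1, 2, 3, 4, 5 };
        int cutoff = 1;

        // The int value 1 is copied into Take right now.
        IEnumerable<int> fixedTake = arr.Take(cutoff);

        // The lambda captures the variable cutoff and reads it on enumeration.
        IEnumerable<int> liveTake = arr.TakeWhile((x, index) => index < cutoff);

        cutoff = 3;

        return string.Join(",", fixedTake) + " | " + string.Join(",", liveTake);
    }

    static void Main()
    {
        Console.WriteLine(Run()); // 1 | 1,2,3
    }
}
```

Going the other way, copying cutoff into a separate local variable before building the Where query would freeze its value, in the same way as the loop-variable workaround.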
Will this fail?
Resharper reports this as an instance of "Access to modified closure".
Will the lambda be triggered for every value? Is the iterator generating the complete list of all interval values before the line that changes first runs? Or is the line first = itvl; running for each iteration, with the changed value of first used for subsequent iterations?
public HourInterval FirstInterval
{
get
{
var first = HourInterval.Make(DateTime.MaxValue);
foreach (var itvl in this.Where
(itvl => itvl < first))
first = itvl;
return first;
}
}
NOTE. HourInterval is a value-type struct that represents each one-hour-long calendar hour... and this is an IEnumerable collection of HourInterval objects
EDIT:
The above is what Resharper suggested to convert to a LINQ expression from the below foreach construction ...
public HourInterval FirstInterval
{
get
{
var first = HourInterval.Make(DateTime.MaxValue);
foreach (var itvl in this)
if(itvl < first)
first = itvl;
return first;
}
}
OK, this is a bit of a mess.
First off, it is a poor programming practice to use the same variable name in two slightly inconsistent ways in the same code. It's very confusing. Frankly, I would prefer this to be illegal; the reason that it is not illegal is a bit complicated; see http://blogs.msdn.com/b/ericlippert/archive/2009/11/05/simple-names-are-not-so-simple-part-two.aspx for details.
Let's get rid of that problem:
var first = HourInterval.Make(DateTime.MaxValue);
foreach (var itvl in this.Where(x => x < first))
first = itvl;
Now, the next question is: is Resharper correct to note that this is an access to a modified closure? Yes, Resharper is correct; you are modifying a closed-over variable of a lambda that will be called repeatedly. Resharper is noting that this is dangerous because Resharper does not know what "Where" does. For all Resharper knows, "Where" is caching every predicate it gets and is saving it up to execute later, in the mistaken belief that each predicate will do something different. In fact each predicate is the same, because each predicate is closed over the same variable, not closed over different variables.
Clearly no sensible implementation of "Where" will do that. But Resharper doesn't know that.
The next question is: is this a sensible thing to do? No. This is a terribly unidiomatic and confusing way to implement "Min", by modifying the closed-over variable of a predicate to "Where". If you want to write Min, just write Min:
static DateTime? Min(this IEnumerable<DateTime> seq)
{
DateTime? min = null;
foreach(DateTime current in seq)
{
if (min == null || current < min.Value)
min = current;
}
return min;
}
There, that returns the earliest date in the sequence, or null if the sequence is empty. No messing about with Where and predicates and mutated closures and all that nonsense: write the code to be straightforward and obviously correct.
Your code should work, but that's an unnecessarily complicated way to do it.
Try
this.Aggregate((min, next) => next < min ? next : min);
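One caveat: the seedless Aggregate overload throws InvalidOperationException on an empty sequence, whereas the original property returned the MaxValue-based sentinel in that case. A sketch that keeps the original behavior by passing a seed is shown below. DateTime stands in for HourInterval, whose definition isn't shown, and the names are illustrative only.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EarliestDemo
{
    // Returns the smallest element, or DateTime.MaxValue for an empty sequence.
    public static DateTime Earliest(IEnumerable<DateTime> seq)
    {
        return seq.Aggregate(DateTime.MaxValue, (min, next) => next < min ? next : min);
    }

    static void Main()
    {
        var dates = new[]
        {
            new DateTime(2020, 5, 1),
            new DateTime(2019, 1, 1),
            new DateTime(2021, 3, 3)
        };
        Console.WriteLine(Earliest(dates) == new DateTime(2019, 1, 1));    // True
        Console.WriteLine(Earliest(new DateTime[0]) == DateTime.MaxValue); // True
    }
}
```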