Access to modified Closure - c#

Will this fail?
ReSharper reports this as an instance of "Access to modified closure".
Will the lambda be triggered for every value? Is the iterator generating the complete list of all interval values in this before the line that changes first runs? Or is the line first = itvl; running for each iteration, and that changed value of first used for subsequent iterations?
public HourInterval FirstInterval
{
    get
    {
        var first = HourInterval.Make(DateTime.MaxValue);
        foreach (var itvl in this.Where
            (itvl => itvl < first))
            first = itvl;
        return first;
    }
}
NOTE. HourInterval is a value-type struct that represents each one-hour-long calendar hour... and this is an IEnumerable collection of HourInterval objects
EDIT:
The above is what ReSharper produced when it suggested converting the foreach construction below to a LINQ expression:
public HourInterval FirstInterval
{
    get
    {
        var first = HourInterval.Make(DateTime.MaxValue);
        foreach (var itvl in this)
            if (itvl < first)
                first = itvl;
        return first;
    }
}

OK, this is a bit of a mess.
First off, it is a poor programming practice to use the same variable name in two slightly inconsistent ways in the same code. It's very confusing. Frankly, I would prefer this to be illegal; the reason that it is not illegal is a bit complicated; see http://blogs.msdn.com/b/ericlippert/archive/2009/11/05/simple-names-are-not-so-simple-part-two.aspx for details.
Let's get rid of that problem:
var first = HourInterval.Make(DateTime.MaxValue);
foreach (var itvl in this.Where(x => x < first))
    first = itvl;
Now, the next question is: is Resharper correct to note that this is an access to a modified closure? Yes, Resharper is correct; you are modifying a closed-over variable of a lambda that will be called repeatedly. Resharper is noting that this is dangerous because Resharper does not know what "Where" does. For all Resharper knows, "Where" is caching every predicate it gets and is saving it up to execute later, in the mistaken belief that each predicate will do something different. In fact each predicate is the same, because each predicate is closed over the same variable, not closed over different variables.
Clearly no sensible implementation of "Where" will do that. But Resharper doesn't know that.
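To see why the code nonetheless behaves as intended with the real Where, here is a minimal sketch that uses int in place of HourInterval (the class and method names are made up for illustration). Because Where is lazy, the predicate runs as the foreach pulls each item, so each test sees the most recent value of first:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ModifiedClosureDemo
{
    // Hypothetical stand-in for the FirstInterval property, using int.
    public static int FirstViaWhere(IEnumerable<int> source)
    {
        int first = int.MaxValue; // seed, like HourInterval.Make(DateTime.MaxValue)

        // The lambda closes over the variable first, not over its value.
        // Where evaluates the predicate lazily, one item at a time, so each
        // comparison uses whatever first holds at that moment.
        foreach (int item in source.Where(x => x < first))
            first = item;

        return first;
    }
}
```

With the input 5, 3, 8, 1, 4 the predicate accepts 5, then 3, rejects 8, accepts 1 and rejects 4, so the method returns 1. For an empty sequence it returns the seed, int.MaxValue.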
The next question is: is this a sensible thing to do? No. This is a terribly unidiomatic and confusing way to implement "Min", by modifying the closed-over variable of a predicate to "Where". If you want to write Min, just write Min:
static DateTime? Min(this IEnumerable<DateTime> seq)
{
    DateTime? min = null;
    foreach (DateTime current in seq)
    {
        if (min == null || current < min.Value)
            min = current;
    }
    return min;
}
There, that returns the earliest date in the sequence, or null if the sequence is empty. No messing about with Where and predicates and mutated closures and all that nonsense: write the code to be straightforward and obviously correct.

Your code should work, but that's an unnecessarily complicated way to do it.
Try
this.Aggregate((min, next) => next < min ? next : min);
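One difference between the two suggestions is worth spelling out: Aggregate without a seed throws InvalidOperationException on an empty sequence, whereas the hand-written Min returns null. The sketch below puts generic versions of both side by side; the names MinSketch, MinOrNull and MinByAggregate are invented here and IComparable<T> stands in for the < operator of HourInterval.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MinSketch
{
    // Loop version: returns null for an empty sequence.
    public static T? MinOrNull<T>(IEnumerable<T> seq) where T : struct, IComparable<T>
    {
        T? min = null;
        foreach (T current in seq)
        {
            if (!min.HasValue || current.CompareTo(min.Value) < 0)
                min = current;
        }
        return min;
    }

    // Aggregate version: the first element is used as the seed, so an
    // empty sequence makes Aggregate throw InvalidOperationException.
    public static T MinByAggregate<T>(IEnumerable<T> seq) where T : IComparable<T>
    {
        return seq.Aggregate((min, next) => next.CompareTo(min) < 0 ? next : min);
    }
}
```

Which behavior you want for the empty case is a design choice; the original property returned the MaxValue interval instead.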

Related

Task.Factory.StartNew duplication issue [duplicate]

When using lambda expressions or anonymous methods in C#, we have to be wary of the access to modified closure pitfall. For example:
foreach (var s in strings)
{
query = query.Where(i => i.Prop == s); // access to modified closure
...
}
Due to the modified closure, the above code will cause all of the Where clauses on the query to be based on the final value of s.
As explained here, this happens because the s variable declared in foreach loop above is translated like this in the compiler:
string s;
while (enumerator.MoveNext())
{
s = enumerator.Current;
...
}
instead of like this:
while (enumerator.MoveNext())
{
string s;
s = enumerator.Current;
...
}
As pointed out here, there are no performance advantages to declaring a variable outside the loop, and under normal circumstances the only reason I can think of for doing this is if you plan to use the variable outside the scope of the loop:
string s;
while (enumerator.MoveNext())
{
s = enumerator.Current;
...
}
var finalString = s;
However variables defined in a foreach loop cannot be used outside the loop:
foreach(string s in strings)
{
}
var finalString = s; // won't work: you're outside the scope.
So the compiler declares the variable in a way that makes it highly prone to an error that is often difficult to find and debug, while producing no perceivable benefits.
Is there something you can do with foreach loops this way that you couldn't if they were compiled with an inner-scoped variable, or is this just an arbitrary choice that was made before anonymous methods and lambda expressions were available or common, and which hasn't been revised since then?
"The compiler declares the variable in a way that makes it highly prone to an error that is often difficult to find and debug, while producing no perceivable benefits."
Your criticism is entirely justified.
I discuss this problem in detail here:
Closing over the loop variable considered harmful
"Is there something you can do with foreach loops this way that you couldn't if they were compiled with an inner-scoped variable? or is this just an arbitrary choice that was made before anonymous methods and lambda expressions were available or common, and which hasn't been revised since then?"
The latter. The C# 1.0 specification actually did not say whether the loop variable was inside or outside the loop body, as it made no observable difference. When closure semantics were introduced in C# 2.0, the choice was made to put the loop variable outside the loop, consistent with the "for" loop.
I think it is fair to say that all regret that decision. This is one of the worst "gotchas" in C#, and we are going to take the breaking change to fix it. In C# 5 the foreach loop variable will be logically inside the body of the loop, and therefore closures will get a fresh copy every time.
The for loop will not be changed, and the change will not be "back ported" to previous versions of C#. You should therefore continue to be careful when using this idiom.
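Since the for loop keeps the old behavior, it is a convenient way to reproduce the pitfall on a current compiler. The following is only an illustrative sketch (LoopCaptureDemo and its methods are invented names), assuming a C# 5 or later compiler:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LoopCaptureDemo
{
    // for loop: there is a single variable i shared by every lambda.
    public static List<int> CaptureInFor()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
            funcs.Add(() => i);
        // The lambdas run after the loop has finished, when i is 3.
        return funcs.Select(f => f()).ToList(); // 3, 3, 3
    }

    // for loop with a per-iteration copy: each lambda gets its own variable.
    public static List<int> CaptureInForWithCopy()
    {
        var funcs = new List<Func<int>>();
        for (int i = 0; i < 3; i++)
        {
            int copy = i;
            funcs.Add(() => copy);
        }
        return funcs.Select(f => f()).ToList(); // 0, 1, 2
    }

    // foreach on C# 5 or later: the loop variable is fresh in each iteration.
    public static List<int> CaptureInForeach()
    {
        var funcs = new List<Func<int>>();
        foreach (int value in new[] { 0, 1, 2 })
            funcs.Add(() => value);
        return funcs.Select(f => f()).ToList(); // 0, 1, 2
    }
}
```

On a compiler older than C# 5, CaptureInForeach would instead return 2, 2, 2, which is the behavior this question is about.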
What you are asking is thoroughly covered by Eric Lippert in his blog post Closing over the loop variable considered harmful and its sequel.
For me, the most convincing argument is that having a new variable in each iteration would be inconsistent with a for(;;)-style loop. Would you expect to have a new int i in each iteration of for (int i = 0; i < 10; i++)?
The most common problem with this behavior is making a closure over the iteration variable, and it has an easy workaround:
foreach (var s in strings)
{
    var s_for_closure = s;
    query = query.Where(i => i.Prop == s_for_closure); // closes over a fresh copy each iteration
}
My blog post about this issue: Closure over foreach variable in C#.
Having been bitten by this, I have a habit of declaring a local variable in the innermost scope and using that variable in any closure. In your example:
foreach (var s in strings)
query = query.Where(i => i.Prop == s); // access to modified closure
I do:
foreach (var s in strings)
{
string search = s;
query = query.Where(i => i.Prop == search); // New definition ensures unique per iteration.
}
Once you have that habit, you can avoid it in the very rare case you actually intended to bind to the outer scopes. To be honest, I don't think I have ever done so.
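As a concrete, compilable version of that habit, here is a small sketch that chains one Where per filter string. The names ChainedFilterSketch and FilterAll are hypothetical, and Contains stands in for the i.Prop == s comparison of the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ChainedFilterSketch
{
    // Keeps only the words that contain every one of the given filters.
    public static List<string> FilterAll(IEnumerable<string> words, IEnumerable<string> filters)
    {
        IEnumerable<string> query = words;
        foreach (string s in filters)
        {
            // Local copy in the innermost scope: each lambda captures its
            // own variable, whatever the compiler does with s.
            string search = s;
            query = query.Where(w => w.Contains(search));
        }
        // The chained filters only run here, when the query is enumerated.
        return query.ToList();
    }
}
```

For the words alpha, beta, gamma, delta and the filters "a" and "l", this yields alpha and delta. Without the copy, a pre-C# 5 compiler would have applied the last filter twice.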
In C# 5.0, this problem is fixed and you can close over loop variables and get the results you expect.
The language specification says:
8.8.4 The foreach statement
(...)
A foreach statement of the form
foreach (V v in x) embedded-statement
is then expanded to:
{
    E e = ((C)(x)).GetEnumerator();
    try {
        while (e.MoveNext()) {
            V v = (V)(T)e.Current;
            embedded-statement
        }
    }
    finally {
        … // Dispose e
    }
}
(...)
The placement of v inside the while loop is important for how it is
captured by any anonymous function occurring in the
embedded-statement. For example:
int[] values = { 7, 9, 13 };
Action f = null;
foreach (var value in values)
{
if (f == null) f = () => Console.WriteLine("First value: " + value);
}
f();
If v was declared outside of the while loop, it would be shared
among all iterations, and its value after the for loop would be the
final value, 13, which is what the invocation of f would print.
Instead, because each iteration has its own variable v, the one
captured by f in the first iteration will continue to hold the value
7, which is what will be printed. (Note: earlier versions of C#
declared v outside of the while loop.)

count objects that meet certain condition in List-collection

I want to count the occurrences of objects within a List<T> that match a certain condition.
For example like this
int List<T>.Count(Predicate<T> match)
So, for example, if I have a list of chores, I can see how many are overdue.
int overdue = listOfChores.Count((element) => { return element.DueDate <= DateTime.Today; });
I know that does not exist and so far I solve problems like that in the following way:
int overdue = listOfChores.FindAll([...]).Count;
However that allocates and initializes a new List etc. only to get the count.
A way to do this with less allocation overhead etc.:
int good = 0;
foreach (Chore element in listOfChores)
    if (element.DueDate <= DateTime.Today)
        good++;
The last approach can also be extended to count several conditions without iterating over the list more than once. (I already found that getting the Count property only takes O(1), but building the List to count from still eats a lot of time.)
int a = 0;
int b = 0;
foreach (Chore element in listOfChores)
{
    if (element.CondA)
        a++;
    if (element.CondB)
        b++;
}
Given this I could even imagine something like
int[] List<T>.Count(Predicate<T>[] matches)
My question(s):
Is there such a thing that I just haven't found yet?
If not: What would be a way to implement such functionality?
EDIT :
Adding LINQ looks like it fixes it.
You just have your syntax slightly off. This is how to use Count :
int overdue = listOfChores.Count(element => element.DueDate <= DateTime.Today);
If you already have a Predicate<T> and want to pass it to Count just call it like a function:
Predicate<Chore> p = (element) => element.DueDate <= DateTime.Today;
int overdue = listOfChores.Count(element => p(element));
There is a Count method that takes a predicate: see Enumerable.Count<TSource>(IEnumerable<TSource>, Func<TSource, bool>).
Note that this is an extension method, so you can use it only if you add a using directive for the System.Linq namespace.
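LINQ has no built-in overload that counts several predicates in one pass, as far as I know, but the second part of the question is easy to sketch as your own extension method. CountEach below is a made-up name, not a framework method:

```csharp
using System;
using System.Collections.Generic;

public static class CountSketch
{
    // Hypothetical helper: counts[i] is the number of items matching
    // predicates[i]. The source is enumerated exactly once.
    public static int[] CountEach<T>(this IEnumerable<T> source, params Func<T, bool>[] predicates)
    {
        var counts = new int[predicates.Length];
        foreach (T item in source)
        {
            for (int i = 0; i < predicates.Length; i++)
            {
                if (predicates[i](item))
                    counts[i]++;
            }
        }
        return counts;
    }
}
```

Usage would look like int[] c = listOfChores.CountEach(e => e.CondA, e => e.CondB); with the question's properties. No intermediate list is allocated, only the small result array.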

Why does ReSharper suggest I convert a for loop into a LINQ expression?

In Visual Studio, ReSharper keeps recommending that I convert a for loop to a LINQ expression, but what is the reason for this?
Which is faster?
Here are some example loops where ReSharper suggests a LINQ conversion:
foreach (XmlNode legendEntryNode in _legendEntryNodes)
{
    var xmlElement = legendEntryNode["FeatureType"];
    if (xmlElement == null || !xmlElement.InnerText.Equals(featuretype)) continue;
    var xmlNodeList = legendEntryNode.SelectNodes("Themes/Theme");
    if (xmlNodeList != null)
        foreach (XmlNode themeNode in xmlNodeList)
        {
            var element = themeNode["Value"];
            if (element == null || !element.InnerText.Equals(v)) continue;
            var xmlElement1 = themeNode["Icon"];
            if (xmlElement1 != null)
            {
                string iconname = "<ms:ICON>" + xmlElement1.InnerText + "</ms:ICON>";
                var element1 = themeNode["Highlight"];
                if (element1 != null)
                {
                    string highlightname = "<ms:HIGHLIGHT>" + element1.InnerText + "</ms:HIGHLIGHT>";
                    gml = gml.Insert(c, iconname + highlightname);
                    c += (iconname.Length + highlightname.Length);
                }
            }
            break;
        }
}
And this simpler example:
for (int i = 0; i < getPointsRequest.Attribs.Length; i++)
{
    string attribName = getPointsRequest.Attribs[i].AttributeName;
    if (!String.IsNullOrEmpty(attribName))
    {
        sqlQuery += "<ms:" + attribName + ">||\"" + attribName + "\"||</ms:" + attribName + ">";
    }
}
Speed is very often irrelevant in large portions of your code - you should write code the simplest way, and then measure it to make sure it's fast enough.
If your for loop is really just querying, then LINQ is absolutely a great way to end up with more readable code. It's not universally applicable, but it's something you should at least bear in mind frequently.
Quite often a for loop can be converted into a query to be evaluated lazily, and then a foreach loop which performs some action on each value returned by the query. That can help separate the two aspects, letting you focus on one at a time when reading the code. It's important to keep LINQ queries as queries though, rather than using side-effects within them - it's designed to have a functional approach, which really doesn't mix pleasantly with side-effects.
If you have some concrete examples, we could give more opinions about which loops would make sense to convert to use LINQ, and which wouldn't.
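As a rough illustration of separating the query from the action, the simpler loop in the question could be reshaped along these lines. This is a sketch only: AttribQuerySketch and BuildFragment are invented names, and the method takes plain attribute names rather than the getPointsRequest.Attribs objects, whose type isn't shown.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AttribQuerySketch
{
    public static string BuildFragment(IEnumerable<string> attributeNames)
    {
        // Query part: pure filtering and projection, no side effects.
        IEnumerable<string> fragments = attributeNames
            .Where(name => !string.IsNullOrEmpty(name))
            .Select(name => "<ms:" + name + ">||\"" + name + "\"||</ms:" + name + ">");

        // Action part: a single concatenation of the query results.
        return string.Concat(fragments);
    }
}
```

The caller would then append the result to sqlQuery in one step. The deeply nested XML loop, with its break and its running insert position c, is the kind of loop I would be more hesitant to convert.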
No performance gain as such, but some benefits
Makes code more readable.
Reduces the number of lines.
Easy to maintain.
In some cases you don't require temporary variables, which you might require in for loop. Using Linq you can chain queries.
For more details you can refer:
LINQ query operators: lose that foreach already
The “Anti-For” Campaign
Life After Loops
Hope this helps you.
In general, ReSharper's suggestions are just suggestions, not warnings. So it's up to you to decide which way to go: LINQ or foreach.
I have the same issue with suggestion "Use 'var'". I click that suggestion only if I think the reader could better read the statement.
Readability is one of my highest priorities while writing code.
It's probable that there's no difference in speed, however using Linq can often result in terser code.
That's not to say you should always accept R#'s suggestion to convert to a Linq expression. Sometimes complex but understandable foreach loops are converted into valid but not easily understood Linq expressions.
I'd say there is a reason why not to convert sometimes. It is perhaps less admirable that ReSharper does not offer a refactoring to convert a LINQ expression (back) into a for-loop. I have on a few occasions converted a loop into an expression and then later wanted to put in some further actions (often debugging actions) within the loop; I have to convert them back by hand.
I would warn against converting a for-loop without good reason. Quite often it really doesn't improve readability, and there isn't any other strong reason to do it (as others have rightly pointed out, most loops are not speed critical).
I think some for-loops are more readable than the LINQ equivalent, because they visually break down the actions of the loop into bite-size pieces. I'd say that it tends to be small loops (three or four lines) that are most improved by making them into an expression on one line.
[Sorry this post is mostly opinion, but readability is a bit of a subjective subject. No trolling!]
LINQ is actually running a loop internally. I guess it comes down to LINQ expressions being, in general, easier to read and maintain. If you are really concerned about performance, there is a good comparison between the two: http://geekswithblogs.net/BlackRabbitCoder/archive/2010/04/23/c-linq-vs-foreach---round-1.aspx
As a reference for others, here is an example of a for loop and the LINQ expression that ReSharper suggested in its place:
for (int x = 0; x < grid.Length; x++)
{
    var intCount = grid[x].Select((a, b) => new {Value = a, Index = b})
        .GroupBy(y => y.Value)
        .Where(y => y.Count() > 1).Select(item => item.Key).ToArray();
    if (intCount.Count() > 1)
        return false;
}
To explain this code: for each row of the grid, the loop collects the values that occur more than once. If more than one such value is found in a row, it returns false.
This is the LINQ expression suggested for that loop:
return grid.Select(t => t.Select((a, b) => new {Value = a, Index = b}).
    GroupBy(y => y.Value).Where(y => y.Count() > 1).
    Select(item => item.Key).ToArray()).All(intCount => intCount.Count() <= 1);
There might be no performance gain, but as you can see from the example, the LINQ query is cleaner, easier to read, uses fewer lines (in this case a single statement; I only wrapped it after pasting it here) and is easy to debug as well.

Linq late binding confusion

Can someone please explain to me what I am missing here? Based on my basic understanding, a LINQ result is calculated when the result is used, and I can see that in the following code.
static void Main(string[] args)
{
    Action<IEnumerable<int>> print = (x) =>
    {
        foreach (int i in x)
        {
            Console.WriteLine(i);
        }
    };
    int[] arr = { 1, 2, 3, 4, 5 };
    int cutoff = 1;
    IEnumerable<int> result = arr.Where(x => x < cutoff);
    Console.WriteLine("First Print");
    cutoff = 3;
    print(result);
    Console.WriteLine("Second Print");
    cutoff = 4;
    print(result);
    Console.Read();
}
Output:
First Print
1
2
Second Print
1
2
3
Now I changed the
arr.Where(x => x < cutoff);
to
IEnumerable<int> result = arr.Take(cutoff);
and the output is as follow.
First Print
1
Second Print
1
Why does Take not use the current value of the variable?
The behavior you're seeing comes from the different ways in which the arguments to the LINQ methods are evaluated. The Where method receives a lambda which captures the variable cutoff itself (in effect, by reference) rather than its current value. The lambda is evaluated on demand and hence sees the value of cutoff at that time.
The Take method (and similar methods like Skip) takes an int parameter, and hence cutoff is passed by value. The value used is the value of cutoff at the moment the Take method is called, not when the query is evaluated.
Note: The term late binding here is a bit incorrect. Late binding generally refers to the process where the members an expression binds to are determined at run time rather than compile time. In C# you'd accomplish this with dynamic or reflection. The behavior of LINQ to evaluate its parts on demand is known as delayed (or deferred) execution.
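The difference can be reduced to a few lines. In this sketch (DeferredDemo and its method names are invented for illustration), all three queries are built while cutoff is 1 and enumerated after it has been changed to 3:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // The lambda captures the variable cutoff, so the filter uses the
    // value it has when the query is enumerated.
    public static List<int> WhereSeesLatestCutoff()
    {
        int[] arr = { 1, 2, 3, 4, 5 };
        int cutoff = 1;
        IEnumerable<int> result = arr.Where(x => x < cutoff);
        cutoff = 3;
        return result.ToList(); // 1, 2
    }

    // Take receives a copy of the int at call time; later changes are invisible.
    public static List<int> TakeUsesCutoffAtCallTime()
    {
        int[] arr = { 1, 2, 3, 4, 5 };
        int cutoff = 1;
        IEnumerable<int> result = arr.Take(cutoff);
        cutoff = 3;
        return result.ToList(); // 1
    }

    // To make Where behave like Take, capture a variable that never changes.
    public static List<int> WhereWithSnapshot()
    {
        int[] arr = { 1, 2, 3, 4, 5 };
        int cutoff = 1;
        int snapshot = cutoff;
        IEnumerable<int> result = arr.Where(x => x < snapshot);
        cutoff = 3;
        return result.ToList(); // empty: nothing is less than 1
    }
}
```

Calling ToList() immediately after building the query is another way to freeze the result, at the cost of materializing it.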
There are a few different things getting confused here.
Late-binding: This is where the meaning of code is determined after it was compiled. For example, x.DoStuff() is early-bound if the compiler checks that objects of x's type have a DoStuff() method (considering extension methods and default arguments too) and then produces the call to it in the code it outputs, or fails with a compiler error otherwise. It is late-bound if the search for the DoStuff() method is done at run-time and throws a run-time exception if there was no DoStuff() method. There are pros and cons to each, and C# is normally early-bound but has support for late-binding (most simply through dynamic but the more convoluted approaches involving reflection also count).
Delayed execution: Strictly speaking, all Linq methods immediately produce a result. However, that result is an object which stores a reference to an enumerable object (often the result of the previous Linq method) which it will process in an appropriate manner when it is itself enumerated. For example, we can write our own Take method as:
private static IEnumerable<T> TakeHelper<T>(IEnumerable<T> source, int number)
{
    foreach(T item in source)
    {
        yield return item;
        if(--number == 0)
            yield break;
    }
}
public static IEnumerable<T> Take<T>(this IEnumerable<T> source, int number)
{
    if(source == null)
        throw new ArgumentNullException();
    if(number < 0)
        throw new ArgumentOutOfRangeException();
    if(number == 0)
        return Enumerable.Empty<T>();
    return TakeHelper(source, number);
}
Now, when we use it:
var taken4 = someEnumerable.Take(4); // taken4 has a value, so we've already done
                                     // something. If it was going to throw
                                     // an argument exception it would have done so
                                     // by now.
var firstTaken = taken4.First();     // only now does the object in taken4
                                     // do the further processing that iterates
                                     // through someEnumerable.
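The split into Take and TakeHelper matters because the body of a method containing yield does not start running until the first MoveNext. The sketch below (hypothetical names, not framework code) contrasts a single iterator method, where even the null check is delayed, with a wrapper that validates eagerly:

```csharp
using System;
using System.Collections.Generic;

public static class IteratorValidationSketch
{
    // Iterator method: nothing in this body runs until enumeration starts,
    // so the null check is delayed as well.
    public static IEnumerable<T> DeferredCheck<T>(IEnumerable<T> source, int number)
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source));
        foreach (T item in source)
        {
            if (number-- <= 0)
                yield break;
            yield return item;
        }
    }

    // Ordinary method: the check runs at call time, then the lazy part
    // is handed off to the iterator.
    public static IEnumerable<T> EagerCheck<T>(IEnumerable<T> source, int number)
    {
        if (source == null)
            throw new ArgumentNullException(nameof(source));
        return DeferredCheck(source, number);
    }
}
```

DeferredCheck<int>(null, 2) returns an object without complaint and only throws once something enumerates it, whereas EagerCheck<int>(null, 2) throws at the call site, which is usually where you want the error reported.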
Captured variables: Normally when we make use of a variable, we make use of its current state:
int i = 2;
string s = "abc";
Console.WriteLine(i);
Console.WriteLine(s);
i = 3;
s = "xyz";
It's pretty intuitive that this prints 2 and abc and not 3 and xyz. In anonymous functions and lambda expressions though, when we make use of a variable we are "capturing" it as a variable, and so we will end up using the value it has when the delegate is invoked:
int i = 2;
string s = "abc";
Action λ = () =>
{
Console.WriteLine(i);
Console.WriteLine(s);
};
i = 3;
s = "xyz";
λ();
Creating the λ doesn't use the values of i and s, but creates a set of instructions as to what to do with i and s when λ is invoked. Only when that happens are the values of i and s used.
Putting it all together: In none of your cases do you have any late-binding. That is irrelevant to your question.
In both you have delayed execution. Both the call to Take and the call to Where return enumerable objects which will act upon arr when they are enumerated.
In only one do you have a captured variable. The call to Take passes an integer directly to Take and Take makes use of that value. The call to Where passes a Func<int, bool> created from a lambda expression, and that lambda expression captures an int variable. Where knows nothing of this capture, but the Func does.
That's the reason the two behave so differently in how they treat cutoff.
Take doesn't take a lambda, but an integer, as such it can't change when you change the original variable.

Which is faster for removing items from a List<object> - RemoveAll or a foreach loop?

For removing an object where a property equals a value, which is faster?
foreach(object o in objects)
{
    if(o.name == "John Smith")
    {
        objects.Remove(o);
        break;
    }
}
or
objects.RemoveAll(o => o.Name == "John Smith");
Thanks!
EDIT:
I should have mentioned this is removing one object from the collection, then breaking out of the loop which prevents any errors you have described, although using a for loop with the count is the better option!
If you really want to know if one thing is faster than another, benchmark it. In other words, measure, don't guess! This is probably my favorite mantra.
As well as the fact that you're breaking the rules in the first one (modifying the list during the processing of it, leading me to invoke my second mantra: You can't get any more unoptimised than "wrong"), the second is more readable and that's usually what I aim for first.
And, just to complete my unholy trinity of mantras: Optimise for readability first, then optimise for speed only where necessary :-)
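If you do want to measure, a Stopwatch is enough for a first impression. What follows is a rough sketch, not a rigorous benchmark: the names are invented, the list holds strings rather than the question's objects, and a single run like this is noisy, so treat the numbers with suspicion and repeat the measurement.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class RemoveBenchmarkSketch
{
    // Times a single execution of the given action.
    public static TimeSpan Time(Action action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed;
    }

    private static List<string> MakeNames(int count)
    {
        return Enumerable.Range(0, count).Select(i => "Name " + i).ToList();
    }

    // Removes one matching item with an index loop, and all matching items
    // with RemoveAll, each on its own fresh list.
    public static (TimeSpan loop, TimeSpan removeAll) Compare(int count, string target)
    {
        List<string> a = MakeNames(count);
        List<string> b = MakeNames(count);

        TimeSpan loop = Time(() =>
        {
            for (int i = 0; i < a.Count; i++)
            {
                if (a[i] == target)
                {
                    a.RemoveAt(i);
                    break;
                }
            }
        });
        TimeSpan removeAll = Time(() => b.RemoveAll(n => n == target));
        return (loop, removeAll);
    }
}
```

Calling Compare(10000, "Name 5000") several times and discarding the first result (which includes JIT compilation) gives a fairer picture than a single call.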
From a List<string> of 10,000 items, the speeds are:
for loop: 110,000 ticks
lambda: 1,000 ticks
From this information, we can conclude that the lambda expression is faster.
The source code I used can be found here.
Note that I substituted your foreach with a for loop, since we aren't able to keep enumerating a list with foreach after modifying it.
Assuming you meant something like
for(int i = 0; i < objects.Count; i++)
{
    if(objects[i].name == "John Smith")
    {
        objects.Remove(objects[i--]);
    }
}
RemoveAll would be faster in this case, because with Remove you are iterating over the list again (via IndexOf) when you already have the position.
Here is List<T>.Remove:
public bool Remove(T item)
{
    int index = this.IndexOf(item);
    if (index >= 0)
    {
        this.RemoveAt(index);
        return true;
    }
    return false;
}
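If only the first match should go, as in the question, the second scan can be avoided by finding the index once and removing at that position. A minimal sketch, with RemoveFirst being a made-up helper built on the existing List<T>.FindIndex and List<T>.RemoveAt:

```csharp
using System;
using System.Collections.Generic;

public static class RemoveSketch
{
    // Removes the first item matching the predicate; returns false if none matched.
    public static bool RemoveFirst<T>(List<T> list, Predicate<T> match)
    {
        int index = list.FindIndex(match); // one scan, stops at the first match
        if (index < 0)
            return false;
        list.RemoveAt(index);              // no second search through IndexOf
        return true;
    }
}
```

RemoveAll remains the better fit when every match has to be removed, since it compacts the list in a single pass.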
