How does local initialization with Parallel ForEach work? - c#

I am unsure about the use of the local init function in Parallel.ForEach, as described in the msdn article: http://msdn.microsoft.com/en-us/library/dd997393.aspx
Parallel.ForEach<int, long>(nums, // source collection
() => 0, // method to initialize the local variable
(j, loop, subtotal) => // method invoked by the loop on each iteration
{
subtotal += nums[j]; //modify local variable
return subtotal; // value to be passed to next iteration
},...
How does () => 0 initialize anything? What's the name of the variable and how can I use it in the loop logic?

With reference to the following overload of the Parallel.ForEach static extension method:
public static ParallelLoopResult ForEach<TSource, TLocal>(
IEnumerable<TSource> source,
Func<TLocal> localInit,
Func<TSource, ParallelLoopState, TLocal, TLocal> taskBody,
Action<TLocal> localFinally
)
In your specific example
The line:
() => 0, // method to initialize the local variable
is simply a lambda (anonymous function) which will return the constant integer zero. This lambda is passed as the localInit parameter to Parallel.ForEach - since the lambda returns an integer, it has type Func<int> and type TLocal can be inferred as int by the compiler (similarly, TSource can be inferred from the type of the collection passed as parameter source)
The return value (0) is then passed as the 3rd parameter (named subtotal) to the taskBody Func. This (0) is used the initial seed for the body loop:
(j, loop, subtotal) =>
{
subtotal += nums[j]; //modify local variable (Bad idea, see comment)
return subtotal; // value to be passed to next iteration
}
This second lambda (passed to taskBody) is called N times, where N is the number of items allocated to this task by the TPL partitioner.
Each subsequent call to the second taskBody lambda will pass the new value of subTotal, effectively calculating a running partial total, for this Task. After all the items assigned to this task have been added, the third and last, localFinally function parameter will be called, again, passing the final value of the subtotal returned from taskBody. Because several such tasks will be operating in parallel, there will also need to be a final step to add up all the partial totals into the final 'grand' total. However, because multiple concurrent tasks (on different Threads) could be contending for the grandTotal variable, it is important that changes to it are done in a thread-safe manner.
(I've changed names of the MSDN variables to make it more clear)
long grandTotal = 0;
Parallel.ForEach(nums, // source collection
() => 0, // method to initialize the local variable
(j, loop, subtotal) => // method invoked by the loop on each iteration
subtotal + nums[j], // value to be passed to next iteration subtotal
// The final value of subtotal is passed to the localFinally function parameter
(subtotal) => Interlocked.Add(ref grandTotal, subtotal)
In the MS Example, modification of the parameter subtotal inside the task body is a poor practice, and unnecessary. i.e. The code subtotal += nums[j]; return subtotal; would be better as just return subtotal + nums[j]; which could be abbreviated to the lambda shorthand projection (j, loop, subtotal) => subtotal + nums[j]
In General
The localInit / body / localFinally overloads of Parallel.For / Parallel.ForEach allow once-per task initialization and cleanup code to be run, before, and after (respectively) the taskBody iterations are performed by the Task.
(Noting the For range / Enumerable passed to the parallel For / Foreach will be partitioned into batches of IEnumerable<>, each of which will be allocated a Task)
In each Task, localInit will be called once, the body code will be repeatedly invoked, once per item in batch (0..N times), and localFinally will be called once upon completion.
In addition, you can pass any state required for the duration of the task (i.e. to the taskBody and localFinally delegates) via a generic TLocal return value from the localInit Func - I've called this variable taskLocals below.
Common uses of "localInit":
Creating and initializing expensive resources needed by the loop body, like a database connection or a web service connection.
Keeping Task-Local variables to hold (uncontended) running totals or collections
If you need to return multiple objects from localInit to the taskBody and localFinally, you can make use of a strongly typed class, a Tuple<,,> or, if you use only lambdas for the localInit / taskBody / localFinally, you can also pass data via an anonymous class. Note if you use the return from localInit to share a reference type among multiple tasks, that you will need to consider thread safety on this object - immutability is preferable.
Common uses of the "localFinally" Action:
To release resources such as IDisposables used in the taskLocals (e.g. database connections, file handles, web service clients, etc)
To aggregate / combine / reduce the work done by each task back into shared variable(s). These shared variables will be contended, so thread safety is a concern:
e.g. Interlocked.Increment on primitive types like integers
lock or similar will be required for write operations
Make use of the concurrent collections to save time and effort.
The taskBody is the tight part of the loop operation - you'll want to optimize this for performance.
This is all best summarized with a commented example:
public void MyParallelizedMethod()
{
// Shared variable. Not thread safe
var itemCount = 0;
Parallel.For(myEnumerable,
// localInit - called once per Task.
() =>
{
// Local `task` variables have no contention
// since each Task can never run by multiple threads concurrently
var sqlConnection = new SqlConnection("connstring...");
sqlConnection.Open();
// This is the `task local` state we wish to carry for the duration of the task
return new
{
Conn = sqlConnection,
RunningTotal = 0
}
},
// Task Body. Invoked once per item in the batch assigned to this task
(item, loopState, taskLocals) =>
{
// ... Do some fancy Sql work here on our task's independent connection
using(var command = taskLocals.Conn.CreateCommand())
using(var reader = command.ExecuteReader(...))
{
if (reader.Read())
{
// No contention for `taskLocal`
taskLocals.RunningTotal += Convert.ToInt32(reader["countOfItems"]);
}
}
// The same type of our `taskLocal` param must be returned from the body
return taskLocals;
},
// LocalFinally called once per Task after body completes
// Also takes the taskLocal
(taskLocals) =>
{
// Any cleanup work on our Task Locals (as you would do in a `finally` scope)
if (taskLocals.Conn != null)
taskLocals.Conn.Dispose();
// Do any reduce / aggregate / synchronisation work.
// NB : There is contention here!
Interlocked.Add(ref itemCount, taskLocals.RunningTotal);
}
And more examples:
Example of per-Task uncontended dictionaries
Example of per-Task database connections

as an extension to #Honza Brestan's answer. The way Parallel foreach splits the work into tasks can also be important, it will group several loop iterations into a single task so in practice localInit() is called once for every n iterations of the loop and multiple groups can be started simultaneously.
The point of a localInit and localFinally is to ensure that a parallel foreach loop can combine results from each itteration into a single result without you needing to specify lock statements in the body, to do this you must provide an initialisation for the value you want to create (localInit) then each body itteration can process the local value, then you provide a method to combine values from each group (localFinally) in a thread-safe way.
If you don't need localInit for synchronising tasks, you can use lambda methods to reference values from the surrounding context as normal without any problems. See Threading in C# (Parallel.For and Parallel.ForEach) for a more in depth tutorial on using localInit/Finally and scroll down to Optimization with local values, Joseph Albahari is really my goto source for all things threading.

You can get a hint on MSDN in the correct Parallel.ForEach overload.
The localInit delegate is invoked once for each thread that participates in the loop's execution and returns the initial local state for each of those tasks. These initial states are passed to the first body invocations on each task. Then, every subsequent body invocation returns a possibly modified state value that is passed to the next body invocation.
In your example () => 0 is a delegate just returning 0, so this value is used for the first iteration on each task.

From my side a little bit more easier example
class Program
{
class Person
{
public int Id { get; set; }
public string Name { get; set; }
public int Age { get; set; }
}
static List<Person> GetPerson() => new List<Person>()
{
new Person() { Id = 0, Name = "Artur", Age = 26 },
new Person() { Id = 1, Name = "Edward", Age = 30 },
new Person() { Id = 2, Name = "Krzysiek", Age = 67 },
new Person() { Id = 3, Name = "Piotr", Age = 23 },
new Person() { Id = 4, Name = "Adam", Age = 11 },
};
static void Main(string[] args)
{
List<Person> persons = GetPerson();
int ageTotal = 0;
Parallel.ForEach
(
persons,
() => 0,
(person, loopState, subtotal) => subtotal + person.Age,
(subtotal) => Interlocked.Add(ref ageTotal, subtotal)
);
Console.WriteLine($"Age total: {ageTotal}");
Console.ReadKey();
}
}

Related

C# Access ConcurrentDictionary value by key in async app

I have question about ConcurrencyDictionary in .NET C#.
My app is going to be async (I try to do that :)).
I have some external devices, which send data to my core (C# .NET) via some TCPIP communication. I store the objects in values of ConcurrentDictionary for each device. I have some operations with that data, where I need to read it and sometimes change some in the object.
Now it looks good without deadlock (when I increase the number of external/simulated devices, it does not slow, but it can handle more messages in same time (and without data lose)
But: I am not sure if I'm using it correctly.
I need to change some values inside of the object, call some functions and store all changes in the dict. All objects in the dict must be available to be read by other processes (I know during the "DoJob" other processes can have old values in dict until I will save value, but in my case it is ok). I just need to avoid blocking/locking other tasks and make it as fast as possible.
Which way is better:
1 way (i use it now):
var dict = new ConcurentDictionary<MyClass>(concurrencyLevel, initalCapacity);
private async Task DoJob(string myKey)
{
MyClass myClass;
MyClass myClassInitState;
dict.TryGetValue(myKey, out myClass);
dict.TryGetValue(myKey, out myClassInitState);
var value = myClass.SomeValueToRead;
myClass.Prop1 = 10;
await myClass.DoSomeAnotherJob();
dict.TryUpdate(myKey, myClass, myClassInitState);
}
2 way:
var dict = new ConcurentDictionary<MyClass>(concurrencyLevel, initalCapacity);
private async Task DoJob(string myKey)
{
var value = dict[myKey].SomeValueToRead;
dict[myKey].ChangeProp1(10);
await dict[myKey].DoSomeAnotherJob();
}
The second way looks much more clear and simple. But I am not sure if I can do that because of async.
Will I block the other threads/tasks?
Which way will be faster? I expect first one, because inside of DoJob I do not work with dict, but with some copy of object and after all I will update the dict.
Does the reading of values directly (#2) could slow down the whole process?
Could other processes read last-actualised value from dict even during #2 way without any troubles?
What happen when I call:
dict[myKey].DoSomeAnotherJob();
It is awaitable, so it should not block the threads. But in fact it is called in shared dict in some its value.
The thread-safe ConcurrentDictionary (as opposed to a plain old Dictionary) has nothing to do with async/await.
What this does:
await dict[myKey].DoSomeAnotherJob();
Is this:
var temp = dict[myKey];
await temp.DoSomeAnotherJob();
You do not need a ConcurrentDictionary in order to call that async method, dict can just as well be a regular Dictionary.
Also, assuming MyClass is a reference type (a class as opposed to a struct), saving its original reference in a temporary variable and updating the dictionary, as you do, is unnecessary. The moment after you called myClass.Prop1 = 10, this change is propagated to all other places where you have a reference to that same myClass instance.
You only want to call TryUpdate() if you want to replace the value, but you don't, as it's still the same reference - there's nothing to replace, both myClass and myClassInitState point to the same object.
The only reason to use a ConcurrentDictionary (as opposed to a Dictionary), is when the dictionary is accessed from multiple threads. So if you call DoJob() from different threads, that's when you should use a ConcurrentDictionary.
Also, when multithreading is involved, this is dangerous:
var value = dict[myKey].SomeValueToRead;
dict[myKey].ChangeProp1(10);
await dict[myKey].DoSomeAnotherJob();
Because in the meantime, another thread could change the value for myKey, meaning you obtain a different reference each time you call dict[myKey]. So saving it in a temporary variable is the way to go.
Also, using the indexer property (myDict[]) instead of TryGetValue() has its own issues, but still no threading issues.
Both ways are equal in effect. The first way uses methods to read from the collection and the second way uses an indexer to achieve the same. In fact the indexer internally invokes TryGetValue().
When invoking MyClass myClass = concurrentDictionary[key] (or concurrentDictionary[key].MyClassOperation()) the dictionary internally executes the getter of the indexer property:
public TValue this[TKey key]
{
get
{
if (!TryGetValue(key, out TValue value))
{
throw new KeyNotFoundException();
}
return value;
}
set
{
if (key == null) throw new ArgumentNullException("key");
TryAddInternal(key, value, true, true, out TValue dummy);
}
}
The internal ConcurrentDictionary code shows that
concurrentDictionary.TryGetValue(key, out value)
and
var value = concurrentDictionary[key]
are the same except the indexer will throw an KeyNotFoundException if the key doesn't exist.
From an consuming point of view the first version using TryGetValue enables to write more readable code:
// Only get value if key exists
if (concurrentDictionary.TryGetValue(key, out MyClass value))
{
value.Operation();
}
vs
// Only get value if key exists to avoid an exception
if (concurrentDictionary.Contains(key))
{
MyClass myClass = concurrentDictionary[key];
myClass.Operation();
}
Talking about readability, your code can be simplified as followed:
private async Task DoJob(string myKey)
{
if (dict.TryGetValue(myKey, out MyClass myClass))
{
var value = myClass.SomeValueToRead;
myClass.Prop1 = 10;
await myClass.DoSomeAnotherJob();
}
}
As async/await was designed to asynchronously execute operation on
the UI thread, await myClass.DoSomeAnotherJob() won't block.
neither will TryGetValue nor this[] block other threads
both access variants execute at the same speed as they share the same implementation
dict[myKey].Operation()
is equal to
MyClass myclass = dict.[myKey];
myClass.Operation();.
It's the same when
GetMyClass().Operation()
is equal to
MyClass myClass = GetMyClass();
myClass.Operation();
your perception is wrong. Nothing is called "inside" the dictionary. As you can see from the internal code snippet dict[key] returns a value.
Remarks
ConcurrentDictionary is a way to provide thread-safe access to a collection. E.g., this means the thread that accesses the collection will always access a defined state of the collection.
But be aware that the items itself are not thread-safe because they are stored in a thread-safe collection:
Consider the following method is executed by two threads simultaneously.
Both threads share the same ConcurrentDictionary containing therefore shared objects.
// Thread 1
private async Task DoJob(string myKey)
{
if (dict.TryGetValue(myKey, out MyClass myClass))
{
var value = myClass.SomeValueToRead;
myClass.Prop1 = 10;
await myClass.DoSomeLongRunningJob();
// The result is now '100' and not '20' because myClass.Prop1 is not thread-safe. The second thread was allowed to change the value while this thread was reading it
int result = 2 * myClass.Prop1;
}
}
// Thread 2
private async Task DoJob(string myKey)
{
if (dict.TryGetValue(myKey, out MyClass myClass))
{
var value = myClass.SomeValueToRead;
myClass.Prop1 = 50;
await myClass.DoSomeLongRunningJob();
int result = 2 * myClass.Prop1; // '100'
}
}
Also ConcurrentDictionary.TryUpdate(key, newValue, comparisonValue) is the same like the following code:
if (dict.Contains(key))
{
var value = dict[key];
if (value == comparisonValue)
{
dict[key] = newValue;
}
}
Example: Let's say the dictionary contains a numeric element at key "Amount" with a value of 50. Thread 1 only wants to modify this value if Thread 2 hasn't changed it in the meantime. Thread 2 value is more important (has precedence). You now can use the TryUpdate method to apply this rule:
if (dict.TryGetValue("Amount", out int oldIntValue))
{
// Change value according to rule
if (oldIntValue > 0)
{
// Calculate a new value
int newIntValue = oldIintValue *= 2;
// Replace the old value inside the dictionary ONLY if thread 2 hasn't change it already
dict.TryUpdate("Amount", newIntValue, oldIntValue);
}
else // Change value without rule
{
dict["Amount"] = 1;
}
}

starting tasks with lambda expressions in loops in C#

In the preparation for a C# exam at university I found the following multiple choice question:
Client applications call your library by passing a set of operations
to perform. Your library must ensure that system resources are most
effectively used. Jobs may be scheduled in any order, but your
librarymust log the position of each operation. You have declared this
code:
public IEnumerable<Task> Execute(Action[] jobs)
{
var tasks = new Task[jobs.Length];
for (var i = 0; i < jobs.Length; i++)
{
/* COMPLETION NEEDED */
}
return tasks;
}
public void RunJob(Action job, int index)
{
// implementation omitted
}
Complete the method by inserting code in the for loop. Choose the
correct answer.
1.)
tasks[i] = new Task((idx) => RunJob(jobs[(int)idx], (int)idx), i);
tasks[i].Start();
2.)
tasks[i] = new Task(() => RunJob(jobs[i], i));
tasks[i].Start();
3.)
tasks[i] = Task.Run(() => RunJob(jobs[i], i));
I have opted for answer 3 since Task.Run() queues the specified work on the thread pool and returns a Task object that represents the work.
But the correct answer was 1, using the Task(Action, Object) constructor. The explanation says the following:
In answer 1, the second argument to the constructor is passed as the
only argument to the Action delegate. The current value of the
i variable is captured when the value is boxed and passed to the Task
constructor.
Answer 2 and 3 use a lambda expression that captures the i variable
from the enclosing method. The lambda expression will probably return
the final value of i, in this case 10, before the operating system
preempts the current thread and begins every task delegate created by
the loop. The exact value cannot be determined because the OS
schedules thread execution based on many factors external to your
program.
While I perfectly understand the explanation of answer 1, I don't get the point in the explanations for answer 2 and 3. Why would the lambda expression return the final value?
In options 2 and 3 lambda captures original i variable used in for loop. It's not guaranteed when tasks will be run on thread pool. So possible behavior: for loop is finished, i=10 and then tasks are started to execute. So all of them will use i=10.
Similar behavior you can see here:
void Do()
{
var actions = new List<Action>();
for (int i = 0; i < 3; i++)
{
actions.Add(() => Console.WriteLine(i));
}
//actions executed after loop is finished
foreach(var a in actions)
{
a();
}
}
Output is:
3
3
3
You can fix it like this:
for (int i = 0; i < 3; i++)
{
var local = i;
actions.Add(() => Console.WriteLine(local));
}

Why is a lock required here?

I'm looking at an example on p. 40 of Stephen Cleary's book that is
// Note: this is not the most efficient implementation.
// This is just an example of using a lock to protect shared state.
static int ParallelSum(IEnumerable<int> values)
{
object mutex = new object();
int result = 0;
Parallel.ForEach(source: values,
localInit: () => 0,
body: (item, state, localValue) => localValue + item,
localFinally: localValue =>
{
lock (mutex)
result += localValue;
});
return result;
}
and I'm a little confused why the lock is needed. Because if all we're doing is summing a collection of ints, say {1, 5, 6}, then we shouldn't need to care about the shared sum result being incremented in any order.
(1 + 5) + 6 = 1 + (5 + 6) = (1 + 6) + 5 = ...
Can someone explain where my thinking here is flawed?
I guess I'm a little confused by the body of the method can't simply be
int result = 0;
Parallel.ForReach(values, (val) => { result += val; });
return result;
Operations such as addition are not atomic and thus are not thread safe. In the example code, if the lock was omitted, it's completely possible that two addition operations are executed near-simultaneously, resulting in one of the values being overridden or incorrectly added. There is a method that is thread safe to increment integers: Interlocked.Add(int, int). However, since this isn't being used, the lock is required in the example code to ensure that exactly one non-atomic addition operation at most is completed at a time (in sequence and not in parallel).
Its not about the order in which 'result' is updated, its the race condition of updating it, remember that operator += is not atomic, so two threads may not see the udpate of another's thread before they touch it
The statement result += localValue; is really saying result = result + localValue; You are both reading and updating a resource(variable) shared by different threads. This could easily lead into a Race Condition. lockmakes sure this statement at any given moment in time is accessed by a single thread.

Where does a local variable get stored so that it's accessible from an asyc delegate? [duplicate]

What is a closure? Do we have them in .NET?
If they do exist in .NET, could you please provide a code snippet (preferably in C#) explaining it?
I have an article on this very topic. (It has lots of examples.)
In essence, a closure is a block of code which can be executed at a later time, but which maintains the environment in which it was first created - i.e. it can still use the local variables etc of the method which created it, even after that method has finished executing.
The general feature of closures is implemented in C# by anonymous methods and lambda expressions.
Here's an example using an anonymous method:
using System;
class Test
{
static void Main()
{
Action action = CreateAction();
action();
action();
}
static Action CreateAction()
{
int counter = 0;
return delegate
{
// Yes, it could be done in one statement;
// but it is clearer like this.
counter++;
Console.WriteLine("counter={0}", counter);
};
}
}
Output:
counter=1
counter=2
Here we can see that the action returned by CreateAction still has access to the counter variable, and can indeed increment it, even though CreateAction itself has finished.
If you are interested in seeing how C# implements Closure read "I know the answer (its 42) blog"
The compiler generates a class in the background to encapsulate the anoymous method and the variable j
[CompilerGenerated]
private sealed class <>c__DisplayClass2
{
public <>c__DisplayClass2();
public void <fillFunc>b__0()
{
Console.Write("{0} ", this.j);
}
public int j;
}
for the function:
static void fillFunc(int count) {
for (int i = 0; i < count; i++)
{
int j = i;
funcArr[i] = delegate()
{
Console.Write("{0} ", j);
};
}
}
Turning it into:
private static void fillFunc(int count)
{
for (int i = 0; i < count; i++)
{
Program.<>c__DisplayClass1 class1 = new Program.<>c__DisplayClass1();
class1.j = i;
Program.funcArr[i] = new Func(class1.<fillFunc>b__0);
}
}
Closures are functional values that hold onto variable values from their original scope. C# can use them in the form of anonymous delegates.
For a very simple example, take this C# code:
delegate int testDel();
static void Main(string[] args)
{
int foo = 4;
testDel myClosure = delegate()
{
return foo;
};
int bar = myClosure();
}
At the end of it, bar will be set to 4, and the myClosure delegate can be passed around to be used elsewhere in the program.
Closures can be used for a lot of useful things, like delayed execution or to simplify interfaces - LINQ is mainly built using closures. The most immediate way it comes in handy for most developers is adding event handlers to dynamically created controls - you can use closures to add behavior when the control is instantiated, rather than storing data elsewhere.
Func<int, int> GetMultiplier(int a)
{
return delegate(int b) { return a * b; } ;
}
//...
var fn2 = GetMultiplier(2);
var fn3 = GetMultiplier(3);
Console.WriteLine(fn2(2)); //outputs 4
Console.WriteLine(fn2(3)); //outputs 6
Console.WriteLine(fn3(2)); //outputs 6
Console.WriteLine(fn3(3)); //outputs 9
A closure is an anonymous function passed outside of the function in which it is created.
It maintains any variables from the function in which it is created that it uses.
A closure is when a function is defined inside another function (or method) and it uses the variables from the parent method. This use of variables which are located in a method and wrapped in a function defined within it, is called a closure.
Mark Seemann has some interesting examples of closures in his blog post where he does a parallel between oop and functional programming.
And to make it more detailed
var workingDirectory = new DirectoryInfo(Environment.CurrentDirectory);//when this variable
Func<int, string> read = id =>
{
var path = Path.Combine(workingDirectory.FullName, id + ".txt");//is used inside this function
return File.ReadAllText(path);
};//the entire process is called a closure.
Here is a contrived example for C# which I created from similar code in JavaScript:
public delegate T Iterator<T>() where T : class;
public Iterator<T> CreateIterator<T>(IList<T> x) where T : class
{
var i = 0;
return delegate { return (i < x.Count) ? x[i++] : null; };
}
So, here is some code that shows how to use the above code...
var iterator = CreateIterator(new string[3] { "Foo", "Bar", "Baz"});
// So, although CreateIterator() has been called and returned, the variable
// "i" within CreateIterator() will live on because of a closure created
// within that method, so that every time the anonymous delegate returned
// from it is called (by calling iterator()) it's value will increment.
string currentString;
currentString = iterator(); // currentString is now "Foo"
currentString = iterator(); // currentString is now "Bar"
currentString = iterator(); // currentString is now "Baz"
currentString = iterator(); // currentString is now null
Hope that is somewhat helpful.
Closures are chunks of code that reference a variable outside themselves, (from below them on the stack), that might be called or executed later, (like when an event or delegate is defined, and could get called at some indefinite future point in time)... Because the outside variable that the chunk of code references may gone out of scope (and would otherwise have been lost), the fact that it is referenced by the chunk of code (called a closure) tells the runtime to "hold" that variable in scope until it is no longer needed by the closure chunk of code...
Basically closure is a block of code that you can pass as an argument to a function. C# supports closures in form of anonymous delegates.
Here is a simple example:
List.Find method can accept and execute piece of code (closure) to find list's item.
// Passing a block of code as a function argument
List<int> ints = new List<int> {1, 2, 3};
ints.Find(delegate(int value) { return value == 1; });
Using C#3.0 syntax we can write this as:
ints.Find(value => value == 1);
If you write an inline anonymous method (C#2) or (preferably) a Lambda expression (C#3+), an actual method is still being created. If that code is using an outer-scope local variable - you still need to pass that variable to the method somehow.
e.g. take this Linq Where clause (which is a simple extension method which passes a lambda expression):
var i = 0;
var items = new List<string>
{
"Hello","World"
};
var filtered = items.Where(x =>
// this is a predicate, i.e. a Func<T, bool> written as a lambda expression
// which is still a method actually being created for you in compile time
{
i++;
return true;
});
if you want to use i in that lambda expression, you have to pass it to that created method.
So the first question that arises is: should it be passed by value or reference?
Pass by reference is (I guess) more preferable as you get read/write access to that variable (and this is what C# does; I guess the team in Microsoft weighed the pros and cons and went with by-reference; According to Jon Skeet's article, Java went with by-value).
But then another question arises: Where to allocate that i?
Should it actually/naturally be allocated on the stack?
Well, if you allocate it on the stack and pass it by reference, there can be situations where it outlives it's own stack frame. Take this example:
static void Main(string[] args)
{
Outlive();
var list = whereItems.ToList();
Console.ReadLine();
}
static IEnumerable<string> whereItems;
static void Outlive()
{
var i = 0;
var items = new List<string>
{
"Hello","World"
};
whereItems = items.Where(x =>
{
i++;
Console.WriteLine(i);
return true;
});
}
The lambda expression (in the Where clause) again creates a method which refers to an i. If i is allocated on the stack of Outlive, then by the time you enumerate the whereItems, the i used in the generated method will point to the i of Outlive, i.e. to a place in the stack that is no longer accessible.
Ok, so we need it on the heap then.
So what the C# compiler does to support this inline anonymous/lambda, is use what is called "Closures": It creates a class on the Heap called (rather poorly) DisplayClass which has a field containing the i, and the Function that actually uses it.
Something that would be equivalent to this (you can see the IL generated using ILSpy or ILDASM):
class <>c_DisplayClass1
{
public int i;
public bool <GetFunc>b__0()
{
this.i++;
Console.WriteLine(i);
return true;
}
}
It instantiates that class in your local scope, and replaces any code relating to i or the lambda expression with that closure instance. So - anytime you are using the i in your "local scope" code where i was defined, you are actually using that DisplayClass instance field.
So if I would change the "local" i in the main method, it will actually change _DisplayClass.i ;
i.e.
var i = 0;
var items = new List<string>
{
"Hello","World"
};
var filtered = items.Where(x =>
{
i++;
return true;
});
filtered.ToList(); // will enumerate filtered, i = 2
i = 10; // i will be overwriten with 10
filtered.ToList(); // will enumerate filtered again, i = 12
Console.WriteLine(i); // should print out 12
it will print out 12, as "i = 10" goes to that dispalyclass field and changes it just before the 2nd enumeration.
A good source on the topic is this Bart De Smet Pluralsight module (requires registration) (also ignore his erroneous use of the term "Hoisting" - what (I think) he means is that the local variable (i.e. i) is changed to refer to the the new DisplayClass field).
In other news, there seems to be some misconception that "Closures" are related to loops - as I understand "Closures" are NOT a concept related to loops, but rather to anonymous methods / lambda expressions use of local scoped variables - although some trick questions use loops to demonstrate it.
A closure aims to simplify functional thinking, and it allows the runtime to manage
state, releasing extra complexity for the developer. A closure is a first-class function
with free variables that are bound in the lexical environment. Behind these buzzwords
hides a simple concept: closures are a more convenient way to give functions access
to local state and to pass data into background operations. They are special functions
that carry an implicit binding to all the nonlocal variables (also called free variables or
up-values) referenced. Moreover, a closure allows a function to access one or more nonlocal variables even when invoked outside its immediate lexical scope, and the body
of this special function can transport these free variables as a single entity, defined in
its enclosing scope. More importantly, a closure encapsulates behavior and passes it
around like any other object, granting access to the context in which the closure was
created, reading, and updating these values.
Just out of the blue,a simple and more understanding answer from the book C# 7.0 nutshell.
Pre-requisit you should know :A lambda expression can reference the local variables and parameters of the method
in which it’s defined (outer variables).
static void Main()
{
int factor = 2;
//Here factor is the variable that takes part in lambda expression.
Func<int, int> multiplier = n => n * factor;
Console.WriteLine (multiplier (3)); // 6
}
Real part:Outer variables referenced by a lambda expression are called captured variables. A lambda expression that captures variables is called a closure.
Last Point to be noted:Captured variables are evaluated when the delegate is actually invoked, not when the variables were captured:
int factor = 2;
Func<int, int> multiplier = n => n * factor;
factor = 10;
Console.WriteLine (multiplier (3)); // 30
A closure is a function, defined within a function, that can access the local variables of it as well as its parent.
public string GetByName(string name)
{
List<things> theThings = new List<things>();
return theThings.Find<things>(t => t.Name == name)[0];
}
so the function inside the find method.
t => t.Name == name
can access the variables inside its scope, t, and the variable name which is in its parents scope. Even though it is executed by the find method as a delegate, from another scope all together.

Lambda expression as ThreadStart strange behavior [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
C# - The foreach identifier and closures
From Eric Lippert’s blog: “don’t close over the loop variable”
I'm using a lambda expression as ThreadStart parameter, to run a method in a new thread using Thread class. This is my code:
delegate void del();
static void Do(int i)
{
Console.WriteLine(i);
}
static del CreateLoop(del Do)
{
return () =>
{
while (true)
{
Do();
Thread.Sleep(500);
}
};
}
static void Main(string[] args)
{
int n = 0;
var loop = CreateLoop(() => Do(n));
new Thread(() => loop()).Start();
Thread.Sleep(500);
n = 1;
}
And this is the output:
0
1
1
1
...
How is it possible?
Why if I change the value of my integer variable n, also changes the value of i (Do's parameter)?
You should make a different variable out of it, thus not changing the original value.
After all, all you're really doing is calling that same old 'function', the lambda expression passing the variable i over and over again, which indeed changes. It's nog like you're storing the initial value of the var i somewhere.
var loop = CreateLoop(() => Do(n));
This line is simply creating a new function and assigning it to a variable. This function, among other things, passes the value n to the Do function. But it's not calling the Do function, it's just creating a function which will, when executed, call the Do function.
You then start a new thread which calls the function, etc, but your new thread is still executing Do(n), passing the n variable to Do. That part doesn't change - you've created a function which references a particular place in memory (represented by the variable n) and continues to reference that place in memory even as you change the value which is stored there.
I believe the following would "fix" your code:
var loop = (int x) => () => CreateLoop(() => Do(x));
new Thread(loop(n)).Start();
This passes the value of n to the function represented by loop, but the loop function creates a new place in memory (represented by x) in which to store the value. This new place in memory is not affected by subsequent changes to n. That is to say, the function you've created does not directly reference the place in memory to which n is a pointer.

Categories

Resources