While preparing for a C# exam at university, I found the following multiple-choice question:
Client applications call your library by passing a set of operations
to perform. Your library must ensure that system resources are most
effectively used. Jobs may be scheduled in any order, but your
library must log the position of each operation. You have declared this
code:
public IEnumerable<Task> Execute(Action[] jobs)
{
var tasks = new Task[jobs.Length];
for (var i = 0; i < jobs.Length; i++)
{
/* COMPLETION NEEDED */
}
return tasks;
}
public void RunJob(Action job, int index)
{
// implementation omitted
}
Complete the method by inserting code in the for loop. Choose the
correct answer.
1.)
tasks[i] = new Task((idx) => RunJob(jobs[(int)idx], (int)idx), i);
tasks[i].Start();
2.)
tasks[i] = new Task(() => RunJob(jobs[i], i));
tasks[i].Start();
3.)
tasks[i] = Task.Run(() => RunJob(jobs[i], i));
I chose answer 3, since Task.Run() queues the specified work on the thread pool and returns a Task object that represents that work.
But the correct answer was 1, using the Task(Action, Object) constructor. The explanation says the following:
In answer 1, the second argument to the constructor is passed as the
only argument to the Action delegate. The current value of the
i variable is captured when the value is boxed and passed to the Task
constructor.
Answers 2 and 3 use a lambda expression that captures the i variable
from the enclosing method. The lambda expression will probably return
the final value of i, in this case 10, before the operating system
preempts the current thread and begins every task delegate created by
the loop. The exact value cannot be determined because the OS
schedules thread execution based on many factors external to your
program.
While I fully understand the explanation for answer 1, I don't follow the explanations for answers 2 and 3. Why would the lambda expression use the final value of i?
In options 2 and 3, the lambda captures the i variable of the for loop itself, not its value at the time the lambda is created. There is no guarantee about when the tasks will run on the thread pool, so one possible sequence is: the for loop finishes, i is now 10, and only then do the tasks start executing. In that case, all of them use i = 10 (and jobs[10] is out of range for ten jobs).
You can see similar behavior here:
void Do()
{
var actions = new List<Action>();
for (int i = 0; i < 3; i++)
{
actions.Add(() => Console.WriteLine(i));
}
// actions are executed after the loop has finished
foreach(var a in actions)
{
a();
}
}
The output is:
3
3
3
You can fix it like this:
for (int i = 0; i < 3; i++)
{
var local = i;
actions.Add(() => Console.WriteLine(local));
}
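For reference, here is a minimal runnable sketch of the question's Execute method completed with answer 1, written as a C# 9+ top-level program. The body of RunJob is a hypothetical stand-in (the question omits it) that records the position it receives, so you can check that every index from 0 to 9 shows up exactly once, whatever order the tasks run in.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical RunJob: runs the job, then records the position it was given.
var log = new ConcurrentBag<int>();
void RunJob(Action job, int index)
{
    job();
    log.Add(index);
}

IEnumerable<Task> Execute(Action[] jobs)
{
    var tasks = new Task[jobs.Length];
    for (var i = 0; i < jobs.Length; i++)
    {
        // Answer 1: the current i is boxed and stored as the task's state object,
        // so each delegate receives that iteration's value as its argument.
        tasks[i] = new Task(idx => RunJob(jobs[(int)idx], (int)idx), i);
        tasks[i].Start();
    }
    return tasks;
}

var work = Enumerable.Repeat<Action>(() => { }, 10).ToArray();
Task.WaitAll(Execute(work).ToArray());

// ConcurrentBag is unordered, so sort before printing.
int[] positions = log.OrderBy(p => p).ToArray();
Console.WriteLine(string.Join(",", positions)); // 0,1,2,3,4,5,6,7,8,9
```

Task.Run itself is not what makes answer 3 wrong: copying i into a variable declared inside the loop body (var copy = i;) before creating the lambda would make that option correct as well.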
This question already has answers here:
Captured variable in a loop in C#
(10 answers)
Closed 1 year ago.
I am trying to run several tasks at the same time, and I came across an issue I can neither understand nor solve.
I used to have a function like this:
private async void DoThings(int index, bool b) {
await SomeAsynchronousTasks();
var item = items[index];
item.DoSomeProcessing();
if(b)
AVolatileList[index] = item; //volatile or not, it does not work
else
AnotherVolatileList[index] = item;
}
I wanted to call it in a for loop using Task.Run(). However, I could not find a way to pass parameters to this Action<int, bool>, and everyone recommends using lambdas in similar cases:
for(int index = 0; index < MAX; index++) { //let's say that MAX equals 400
bool b = CheckSomething();
Task.Run(async () => {
await SomeAsynchronousTasks();
var item = items[index]; //here, index is always evaluated at 400
item.DoSomeProcessing();
if(b)
AVolatileList[index] = item; //volatile or not, it does not work
else
AnotherVolatileList[index] = item;
});
}
I thought that lambdas would "capture" the values of local variables, but apparently they do not: the lambda always sees index as if its value had been captured at the end of the for loop. index evaluates to 400 in the lambda on every iteration, so of course I get an IndexOutOfRangeException 400 times (items.Count is actually MAX).
I am really not sure about what is happening here (though I am really curious about it) and I don't know how to do what I am trying to achieve either. Any hints are welcome!
Make a local copy of your index variable:
for(int index = 0; index < MAX; index++) {
var localIndex = index; // a new variable per iteration; the lambda captures this one
bool b = CheckSomething();
Task.Run(async () => {
await SomeAsynchronousTasks();
var item = items[localIndex];
item.DoSomeProcessing();
if(b)
AVolatileList[localIndex] = item;
else
AnotherVolatileList[localIndex] = item;
});
}
This is due to the way C# does a for loop: there is only one index variable that is updated, and all your lambdas are capturing that same variable (with lambdas, variables are captured, not values).
As a side note, I recommend that you:
Avoid async void. You can never know when an async void method completes, and they have difficult error handling semantics.
Await all of your asynchronous operations; i.e., don't ignore the task returned from Task.Run. Use Task.WhenAll or similar to await them. This allows exceptions to propagate.
For example, here's one way to use WhenAll:
var tasks = Enumerable.Range(0, MAX).Select(index => {
bool b = CheckSomething();
return Task.Run(async () => {
await SomeAsynchronousTasks();
var item = items[index]; // index is Select's parameter: a fresh variable per element
item.DoSomeProcessing();
if(b)
AVolatileList[index] = item;
else
AnotherVolatileList[index] = item;
});
});
await Task.WhenAll(tasks);
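A self-contained sketch of the same pattern, assuming a hypothetical DoThingsAsync in place of the question's DoThings (SomeAsynchronousTasks, items and the two lists are not reproduced). It shows both recommendations together: the worker is async Task rather than async void, and awaiting Task.WhenAll surfaces the exception of the failing item to the caller.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical worker in place of DoThings: async Task, not async void, so the
// caller can await it and observe its exceptions. Index 3 fails on purpose.
async Task<int> DoThingsAsync(int index)
{
    await Task.Delay(10);
    if (index == 3)
        throw new InvalidOperationException($"item {index} failed");
    return index * 10;
}

// index is Select's lambda parameter, so each element gets its own variable;
// nothing is shared between iterations the way a for-loop variable is.
Task<int>[] tasks = Enumerable.Range(0, 5)
    .Select(index => Task.Run(() => DoThingsAsync(index)))
    .ToArray();

string? error = null;
try
{
    await Task.WhenAll(tasks);
}
catch (InvalidOperationException ex)
{
    // Awaiting the WhenAll task rethrows the first exception it collected.
    error = ex.Message;
}
Console.WriteLine(error); // item 3 failed
```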
All your lambdas capture the same variable, namely your loop variable. In your case, the lambdas run only after the loop has finished, and at that point the loop variable holds its final value (MAX), so all your lambdas use that value.
Stephen Cleary shows in his answer how to fix it.
Eric Lippert wrote a detailed two-part series about this.
Consider the following code:
attempt = 0;
for (int counter = 0; counter < 8; counter++)
{
if (attempt < totalitems)
{
Tasklist.Add(Task.Run(() => // Tasklist is a List<Task<output>>
{
return someasynctask(inputList[attempt]);
}));
}
else
{
break;
}
attempt++;
}
await Task.WhenAll(Tasklist).ConfigureAwait(false);
I want to have, for example, 8 concurrent tasks, each working on a different input, and then check the results once all of them have finished.
Because I'm not awaiting the completion of Task.Run(), attempt is incremented before the tasks start, so when a task does start, some items in inputList may not be processed at all while others are processed twice or more (because the value of attempt is uncertain).
How can I do that?
The problem lies in the use of a lambda: when Task.Run(() => someasynctask(inputList[attempt])) is reached during execution, the variable attempt is captured, not its value (i.e., the lambda is a "closure"). Consequently, when the lambda actually runs, it uses the value the variable has at that moment.
Just add a temporary copy of the variable before your lambda, and use that. E.g.
if (attempt < totalitems)
{
int localAttempt = attempt;
Tasklist.Add(Task.Run(() =>
{
return someasynctask(inputList[localAttempt]);
}));
}
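Here is a runnable version of this fix, written as a C# 9+ top-level program. SomeAsyncTask and the sample inputList contents are hypothetical stand-ins for the question's someasynctask and data; the loop structure and the copy taken before the lambda follow the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for someasynctask: upper-cases its input after a short delay.
async Task<string> SomeAsyncTask(string input)
{
    await Task.Delay(5);
    return input.ToUpperInvariant();
}

var inputList = new List<string> { "a", "b", "c", "d", "e" };
int totalItems = inputList.Count;
var taskList = new List<Task<string>>();

int attempt = 0;
for (int counter = 0; counter < 8; counter++)
{
    if (attempt >= totalItems)
        break;
    // Snapshot attempt before creating the lambda: each iteration declares a new
    // localAttempt, so the attempt++ below cannot change what this task reads.
    int localAttempt = attempt;
    taskList.Add(Task.Run(() => SomeAsyncTask(inputList[localAttempt])));
    attempt++;
}

// WhenAll returns the results in the same order as taskList.
string[] results = await Task.WhenAll(taskList);
Console.WriteLine(string.Join(",", results)); // A,B,C,D,E
```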
Thanks to @gobes for his answer:
Try this:
attempt = 0;
for (int counter = 0; counter < 8; counter++)
{
if (attempt < totalitems)
{
int tmpAttempt = attempt; // copy taken now, before the lambda is created
Tasklist.Add(Task.Run(() =>
{
return someasynctask(inputList[tmpAttempt]);
}));
}
else
{
break;
}
attempt++;
}
await Task.WhenAll(Tasklist).ConfigureAwait(false);
Actually, what the compiler does is extract your lambda into a method of an automagically generated class, which references the attempt variable. This is the important point: the generated code only references the variable from another class; it doesn't copy it. So every change to attempt is seen by the method.
What happens during the execution is roughly this:
enter the loop with attempt = 0
add a call of the lambda-like-method to your tasklist
increase attempt
repeat
After the loop, you have some method calls awaiting (no pun intended) execution, but each one references THE SAME VARIABLE and therefore shares its value: the last one assigned to it.
For more details, I recommend reading C# in Depth or a similar book; there are also plenty of resources about closures in C# on the web :)
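To make this concrete, the sketch below writes out by hand roughly what the compiler generates. AttemptClosure is a hypothetical, readable stand-in: the real compiler-generated class has an unspeakable name, and its exact shape is an implementation detail. Every delegate in the list is bound to the same object, so all of them observe the final value of its single attempt field.

```csharp
using System;
using System.Collections.Generic;

var closure = new AttemptClosure { attempt = 0 };
var pending = new List<Func<int>>();
for (int counter = 0; counter < 3; counter++)
{
    pending.Add(closure.Body); // the "lambda": a delegate bound to the one shared object
    closure.attempt++;         // the loop mutates that object's field
}

var seen = new List<int>();
foreach (var call in pending)
    seen.Add(call());          // every delegate reads the same field
Console.WriteLine(string.Join(",", seen)); // 3,3,3

// Hypothetical hand-written equivalent of the compiler-generated closure class.
sealed class AttemptClosure
{
    public int attempt;             // the captured variable lives here, on the heap
    public int Body() => attempt;   // the lambda body, compiled into an instance method
}
```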
This question already has answers here:
For-Loop and LINQ's deferred execution don't play well together
(2 answers)
Closed 7 years ago.
I'm trying to do parallel programming using Task in .NET 4.0 (C#).
The output of my program is a little confusing.
class Program
{
static void Main(string[] args)
{
List<Task> lstTasks = new List<Task>();
for (int i = 0; i < 5; i++)
{
Task tsk = Task.Factory.StartNew(() => DoSomething(i.ToString()));
lstTasks.Add(tsk);
}
Task.WaitAll(lstTasks.ToArray());
Console.WriteLine("Done");
Console.ReadLine();
}
static void DoSomething(string tasKname)
{
Console.WriteLine(tasKname);
System.Threading.Thread.Sleep(10000);
}
}
The output is:
5
5
5
5
5
Done.
I'm expecting:
0
1
2
3
4
Done.
Where am I going wrong?
You created a closure when you defined the function () => DoSomething(i.ToString()).
A closure is an anonymous function/lambda that references some variables defined in the method where the closure was created. In your case, that's the variable i.
When this function is executed, it will use the current value of i, not the value that i had when you created it.
Be aware that calling Task.Factory.StartNew only schedules the task; it does not necessarily start executing it immediately. In your case, the tasks started executing after the for loop had finished, so the value of i was 5.
To get the results you expect, use a separate variable in the loop to store the current value of i.
for (int i = 0; i < 5; i++)
{
int k = i;
Task tsk = Task.Factory.StartNew(() => DoSomething(k.ToString()));
lstTasks.Add(tsk);
}
You shouldn't expect the results in any particular order though.
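A runnable sketch of the fixed loop, under the hypothetical assumption that DoSomething records the name in a ConcurrentQueue instead of printing and sleeping, so the result can be checked. All five values appear, but the printed order may differ from run to run.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for DoSomething: records the name instead of printing it.
var printed = new ConcurrentQueue<string>();
void DoSomething(string taskName) => printed.Enqueue(taskName);

var lstTasks = new List<Task>();
for (int i = 0; i < 5; i++)
{
    int k = i; // declared inside the loop body, so each closure gets its own k
    lstTasks.Add(Task.Factory.StartNew(() => DoSomething(k.ToString())));
}
Task.WaitAll(lstTasks.ToArray());

// The arrival order depends on the scheduler; the set of values does not.
Console.WriteLine(string.Join(" ", printed));
string[] sorted = printed.OrderBy(s => s, StringComparer.Ordinal).ToArray();
```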
You are accessing a variable that is changing within your loop. Essentially, your for loop runs so quickly that by the time DoSomething runs, i is 5. Try this:
for (int i = 0; i < 5; i++)
{
Task tsk = Task.Factory.StartNew(() => DoSomething(i.ToString()));
lstTasks.Add(tsk);
System.Threading.Thread.Sleep(50); // gives the task time to read i before the loop increments it
}
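and you will most likely see your expected output in the console, although the delay only makes it probable, not guaranteed, that each task reads i before the loop changes it.
You say you're expecting 0 1 2 3 4, but you shouldn't. The most important aspect of tasks is that you don't know when they'll run or complete. For example, when I alter your code to use Parallel.ForEach():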
Parallel.ForEach(Enumerable.Range(0, 5), i =>
{
Task tsk = Task.Factory.StartNew(() => DoSomething(i.ToString()));
lock (lstTasks) lstTasks.Add(tsk); // List<T> is not thread-safe
});
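I get the expected numbers, 0 through 4, but in a different order each time I run the code, because the tasks all run independently of each other. (The values are correct here because i is now the lambda's parameter, so every invocation gets its own variable.)
In the C# 4.0 specification, section 7.15.5.1: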
Note that unlike an uncaptured variable, a captured local variable can
be simultaneously exposed to multiple threads of execution.
What exactly does it mean by "multiple threads of execution"? Does this mean multiple threads, multiple execution paths or something else?
E.g.:
private static void Main(string[] args)
{
Action[] result = new Action[3];
int x;
for (int i = 0; i < 3; i++)
{
//int x = i * 2 + 1;//this behaves more intuitively. Outputs 1,3,5
x = i*2 + 1;
result[i] = () => { Console.WriteLine(x); };
}
foreach (var a in result)
{
a(); //outputs 5 each time
}
//OR...
int y = 1;
Action one = new Action(() =>
{
Console.WriteLine(y);//Outputs 1
y = 2;
});
Action two = new Action(() =>
{
Console.WriteLine(y);//Outputs 2. Working with same Y
});
var t1 = Task.Factory.StartNew(one);
t1.Wait();
Task.Factory.StartNew(two);
Console.Read();
}
Here x exhibits different behavior depending on where x is declared. In the case of y, the same variable is captured and used by multiple threads, but IMO this behavior is intuitive.
What are they referring to?
"Multiple threads of execution" just means multiple threads; i.e., multiple threads that are executing concurrently. If a particular variable is exposed to multiple threads, any of those threads can read and write that variable's value.
This is potentially dangerous and should be avoided whenever possible. One possibility to avoid this, if your scenario allows, is to create local copies of variables in your tasks' methods.
If you modify the second part of your code a little:
int y = 1;
Action one = new Action(() =>
{
Console.WriteLine(y);//Outputs 1
y = 2;
});
Action two = new Action(() =>
{
Console.WriteLine(y);//Outputs 2. Working with same Y
y = 1;
});
var t1 = Task.Factory.StartNew(one);
t1 = Task.Factory.StartNew(two);
t1.Wait();
t1 = Task.Factory.StartNew(one);
t1.Wait();
t1 = Task.Factory.StartNew(two);
Console.Read();
Run it a few times or put it in a loop. It will output different, seemingly random, results, e.g. 1 1 1 2 or 1 2 1 2.
Multiple threads are accessing the same variable and getting and setting it simultaneously, which may give unexpected results.
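The same hazard in a smaller form, as a hypothetical C# 9+ top-level sketch: eight tasks share two captured variables. The plain ++ on unsafeCount is a read-modify-write that can lose updates when tasks interleave, so its final value can end up below 800,000 and differ between runs, while Interlocked.Increment updates safeCount atomically and always ends at exactly 800,000.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int unsafeCount = 0; // captured by every task; updated with a plain ++
int safeCount = 0;   // also captured; updated only through Interlocked

Task[] tasks = Enumerable.Range(0, 8).Select(_ => Task.Run(() =>
{
    for (int n = 0; n < 100_000; n++)
    {
        unsafeCount++;                        // read-modify-write: concurrent updates can be lost
        Interlocked.Increment(ref safeCount); // atomic update of the shared variable
    }
})).ToArray();
Task.WaitAll(tasks);

Console.WriteLine($"unsafe: {unsafeCount}, safe: {safeCount}");
```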
Refer to the link below.
A reference to the outer variable n is said to be captured when the delegate is created. Unlike local variables, the lifetime of a captured variable extends until the delegates that reference the anonymous methods are eligible for garbage collection.
In other words, the variable's storage location is captured when the anonymous method is created and is then shared with that method, just as if a thread had access to that variable directly, without the anonymous method in between. Some tools (for example, ReSharper's "Access to modified closure" inspection) warn about this in certain cases and suggest copying the value into a local temporary variable to avoid unintended consequences.
MSDN Anonymous Methods
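The lifetime part of that quote can be seen without any threads. In this minimal sketch (MakeCounter is a hypothetical name), the local count outlives the call that declared it because the returned lambda still references it, and each call to MakeCounter creates a new variable.

```csharp
using System;

// Hypothetical helper: count is a local of MakeCounter, but the returned lambda
// captures it, so it lives as long as the delegate does.
Func<int> MakeCounter()
{
    int count = 0;
    return () => ++count;
}

Func<int> next = MakeCounter();
next();
next();
int third = next();          // 3: every call updates the same captured variable
int fresh = MakeCounter()(); // 1: a new call to MakeCounter creates a new variable
Console.WriteLine($"{third} {fresh}"); // 3 1
```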
In the question below, I found this neat trick for calling QueueUserWorkItem in a type-safe way, where you pass a delegate instead of a WaitCallback and an object. However, it doesn't work the way one would expect.
What's the difference between QueueUserWorkItem() and BeginInvoke(), for performing an asynchronous activity with no return types needed
Here's some sample code and output that demonstrates the issue.
for (int i = 0; i < 10; ++i)
{
// doesn't work - somehow DoWork is invoked with i=10 each time!!!
ThreadPool.QueueUserWorkItem(delegate { DoWork("closure", i); });
// not type safe, but it works
ThreadPool.QueueUserWorkItem(new WaitCallback(DoWork), Tuple.Create(" WCB", i));
}
void DoWork(string s, int i)
{
Console.WriteLine("{0} - i:{1}", s, i);
}
void DoWork(object state)
{
var t = (Tuple<string, int>)state;
DoWork(t.Item1, t.Item2);
}
and here is the output:
closure - i:10
WCB - i:0
closure - i:10
WCB - i:2
WCB - i:3
closure - i:10
WCB - i:4
closure - i:10
WCB - i:5
closure - i:10
WCB - i:6
closure - i:10
WCB - i:7
closure - i:10
WCB - i:8
closure - i:10
WCB - i:9
WCB - i:1
closure - i:10
Note that when using the closure to call QueueUserWorkItem, i=10 for every call, but when using the WaitCallback you get the correct values, 0-9.
So my questions are:
Why isn't the correct value of i passed when using the closure/delegate way of doing it?
How on earth does i ever get to be 10? In the loop, it only ever had values 0-9 right?
The answers to both of your questions are related to the scope of the closure when you create the anonymous method.
When you do this:
// Closure for anonymous function call begins here.
for (int i = 0; i < 10; ++i)
{
// i is captured
ThreadPool.QueueUserWorkItem(delegate { DoWork("closure", i); });
}
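You're capturing the one variable i that is shared by the entire loop. You queue up your ten work items very quickly, and by the time they start running, the loop has already finished. Its final increment set i to 10, which is what made the condition i < 10 false and ended the loop, so 10 is the value every delegate reads.
To get around this, reduce the scope of the captured variable by introducing a variable inside the loop, like so: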
for (int i = 0; i < 10; ++i)
{
// Closure extends to here.
var copy = i;
// **copy** is captured
ThreadPool.QueueUserWorkItem(delegate { DoWork("closure", copy); });
}
Here, the closure doesn't extend beyond a single iteration: each iteration declares a new copy variable, so each delegate captures its own copy holding that iteration's value.
That said, the second call to QueueUserWorkItem produces the desired result because you create the Tuple<T1, T2> at the time the work item is queued, so the value is fixed at that point.
Note that in C# 5.0, the behavior of foreach was changed, because this situation (a closure that closes over the loop variable) comes up so often and causes many people a lot of headaches. The change does not apply to for loops like the one you are using.
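If you are on .NET Core 3.0 or later (not .NET Framework), ThreadPool also has a generic overload, QueueUserWorkItem<TState>(Action<TState>, TState, bool preferLocal), which gives the type safety the question was after while still fixing the value when the work item is queued, just like the Tuple. The sketch below assumes that overload is available; DoWork records its line in a bag instead of printing, and a CountdownEvent waits for all ten items.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;

var log = new ConcurrentBag<string>();
void DoWork(string s, int value) => log.Add($"{s} - i:{value}");

using var done = new CountdownEvent(10);
for (int i = 0; i < 10; ++i)
{
    // The state argument (i) is evaluated right now, like the Tuple in the question,
    // and the callback is an Action<int>, so no cast from object is needed.
    ThreadPool.QueueUserWorkItem(state =>
    {
        DoWork("generic", state);
        done.Signal();
    }, i, preferLocal: false);
}
done.Wait();

// Work items finish in any order; sort to compare.
string[] lines = log.OrderBy(l => l, StringComparer.Ordinal).ToArray();
Console.WriteLine(string.Join(Environment.NewLine, lines));
```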
If you want to take advantage of that fact, you can call the Range method on the Enumerable class to use foreach:
foreach (int i in Enumerable.Range(0, 10))
{
// Closure for anonymous function call begins here.
ThreadPool.QueueUserWorkItem(delegate { DoWork("closure", i); });
}
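A minimal, thread-free illustration of that C# 5.0 change (assuming a C# 5 or later compiler): delegates created in a for loop all share one i, while those created in a foreach loop each capture a fresh iteration variable.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var fromFor = new List<Func<int>>();
for (int i = 0; i < 3; i++)
    fromFor.Add(() => i);     // one i for the whole loop

var fromForeach = new List<Func<int>>();
foreach (int i in Enumerable.Range(0, 3))
    fromForeach.Add(() => i); // C# 5+: a fresh i per iteration

// Invoke the delegates only after both loops have finished.
int[] forValues = fromFor.Select(f => f()).ToArray();
int[] foreachValues = fromForeach.Select(f => f()).ToArray();
Console.WriteLine($"{string.Join(",", forValues)} | {string.Join(",", foreachValues)}"); // 3,3,3 | 0,1,2
```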
That's because of how variables are captured: the delegate reads the value of i at the time it actually executes, not at the time it is created, so by then they all see 10. Try copying it to a local variable:
for (int i = 0; i < 10; ++i)
{
int j = i;
ThreadPool.QueueUserWorkItem(delegate { DoWork("closure", j); });