Task.Run() in console application print weird result [duplicate] - c#

This question already has answers here:
Captured variable in a loop in C#
(10 answers)
Closed 1 year ago.
I am trying to run several tasks at the same time and I came across an issue I can't seem to be able to understand nor solve.
I used to have a function like this :
private void async DoThings(int index, bool b) {
await SomeAsynchronousTasks();
var item = items[index];
item.DoSomeProcessing();
if(b)
AVolatileList[index] = item; //volatile or not, it does not work
else
AnotherVolatileList[index] = item;
}
That I wanted to call in a for loop using Task.Run(). However I could not find a way to send parameters to this Action<int, bool> and everyone recommends using lambdas in similar cases:
for(int index = 0; index < MAX; index++) { //let's say that MAX equals 400
bool b = CheckSomething();
Task.Run(async () => {
await SomeAsynchronousTasks();
var item = items[index]; //here, index is always evaluated at 400
item.DoSomeProcessing();
if(b)
AVolatileList[index] = item; //volatile or not, it does not work
else
AnotherVolatileList[index] = item;
}
}
I thought using local variables in lambdas would "capture" their values but it looks like it does not; it will always take the value of index as if the value would be captured at the end of the for loop. The index variable is evaluated at 400 in the lambda at each iteration so of course I get an IndexOutOfRangeException 400 times (items.Count is actually MAX).
I am really not sure about what is happening here (though I am really curious about it) and I don't know how to do what I am trying to achieve either. Any hints are welcome!

Make a local copy of your index variable:
for(int index = 0; index < MAX; index++) {
var localIndex = index;
Task.Run(async () => {
await SomeAsynchronousTasks();
var item = items[index];
item.DoSomeProcessing();
if(b)
AVolatileList[index] = item;
else
AnotherVolatileList[index] = item;
}
}
This is due to the way C# does a for loop: there is only one index variable that is updated, and all your lambdas are capturing that same variable (with lambdas, variables are captured, not values).
As a side note, I recommend that you:
Avoid async void. You can never know when an async void method completes, and they have difficult error handling semantics.
await all of your asynchronous operations. I.e., don't ignore the task returned from Task.Run. Use Task.WhenAll or the like to await for them. This allows exceptions to propagate.
For example, here's one way to use WhenAll:
var tasks = Enumerable.Range(0, MAX).Select(index =>
Task.Run(async () => {
await SomeAsynchronousTasks();
var item = items[localIndex];
item.DoSomeProcessing();
if(b)
AVolatileList[localIndex] = item;
else
AnotherVolatileList[localIndex] = item;
}));
await Task.WhenAll(tasks);

All your lambdas capture the same variable which is your loop variable. However, all your lambdas are executed only after the loop has finished. At that point in time, the loop variable has the maximum value, hence all your lambdas use it.
Stephen Cleary shows in his answer how to fix it.
Eric Lippert wrote a detailled two-part series about this.

Related

starting tasks with lambda expressions in loops in C#

In the preparation for a C# exam at university I found the following multiple choice question:
Client applications call your library by passing a set of operations
to perform. Your library must ensure that system resources are most
effectively used. Jobs may be scheduled in any order, but your
librarymust log the position of each operation. You have declared this
code:
public IEnumerable<Task> Execute(Action[] jobs)
{
var tasks = new Task[jobs.Length];
for (var i = 0; i < jobs.Length; i++)
{
/* COMPLETION NEEDED */
}
return tasks;
}
public void RunJob(Action job, int index)
{
// implementation omitted
}
Complete the method by inserting code in the for loop. Choose the
correct answer.
1.)
tasks[i] = new Task((idx) => RunJob(jobs[(int)idx], (int)idx), i);
tasks[i].Start();
2.)
tasks[i] = new Task(() => RunJob(jobs[i], i));
tasks[i].Start();
3.)
tasks[i] = Task.Run(() => RunJob(jobs[i], i));
I have opted for answer 3 since Task.Run() queues the specified work on the thread pool and returns a Task object that represents the work.
But the correct answer was 1, using the Task(Action, Object) constructor. The explanation says the following:
In answer 1, the second argument to the constructor is passed as the
only argument to the Action delegate. The current value of the
i variable is captured when the value is boxed and passed to the Task
constructor.
Answer 2 and 3 use a lambda expression that captures the i variable
from the enclosing method. The lambda expression will probably return
the final value of i, in this case 10, before the operating system
preempts the current thread and begins every task delegate created by
the loop. The exact value cannot be determined because the OS
schedules thread execution based on many factors external to your
program.
While I perfectly understand the explanation of answer 1, I don't get the point in the explanations for answer 2 and 3. Why would the lambda expression return the final value?
In options 2 and 3 lambda captures original i variable used in for loop. It's not guaranteed when tasks will be run on thread pool. So possible behavior: for loop is finished, i=10 and then tasks are started to execute. So all of them will use i=10.
Similar behavior you can see here:
void Do()
{
var actions = new List<Action>();
for (int i = 0; i < 3; i++)
{
actions.Add(() => Console.WriteLine(i));
}
//actions executed after loop is finished
foreach(var a in actions)
{
a();
}
}
Output is:
3
3
3
You can fix it like this:
for (int i = 0; i < 3; i++)
{
var local = i;
actions.Add(() => Console.WriteLine(local));
}

Task.Run(), passing parameter to

Consider the following code:
attempt = 0;
for (int counter = 0; counter < 8; counter++)
{
if (attempt < totalitems)
{
Tasklist<output>.Add(Task.Run(() =>
{
return someasynctask(inputList[attempt]);
}));
}
else
{
break;
}
attempt++;
}
await Task.WhenAll(Tasklist).ConfigureAwait(false);
I want to have for example 8 concurrent tasks, each working on different inputs concurrently, and finally check the result, when all of them have finished.
Because I'm not awaiting for completion of Task.Run() attempt is increased before starting of tasks, and when the task is started, there may be items in the inputList that are not processed or processed twice or more instead (because of uncertainty in attempt value.
How to do that?
The problem lies within the use of a "lambda": when Task.Run(() => return someasynctask(inputList[attempt])); is reached during the execution, the variable attempt is captured, not its value (i.e. it is a "closure"). Consequently, when the lambda gets executed, the value of the variable at that specific moment will be used.
Just add a temporary copy of the variable before your lambda, and use that. E.g.
if (attempt < totalitems)
{
int localAttempt = attempt;
Tasklist<output>.Add(Task.Run(() =>
{
return someasynctask(inputList[localAttempt]);
}));
}
Thanks to #gobes for his answer:
Try this:
attempt = 0;
for (int counter = 0; counter < 8; counter++)
{
if (attempt < totalitems)
{
Tasklist<output>.Add(Task.Run(() =>
{
int tmpAttempt = attempt;
return someasynctask(inputList[tmpAttempt]);
}));
}
else
{
break;
}
attempt++;
}
await Task.WhenAll(Tasklist).ConfigureAwait(false);
Actually, what the compiler is doing is extracting your lambda into a method, located in an automagically generated class, which is referencing the attempt variable. This is the important point: the generated code only reference the variable from another class; it doesn't copy it. So every change to attempt is seen by the method.
What happens during the execution is roughly this:
enter the loop with attempt = 0
add a call of the lambda-like-method to your tasklist
increase attempt
repeat
After the loop, you have some method calls awaiting (no pun intended) to be executed, but each one is referencing THE SAME VARIABLE, therefore sharing its value - the last affected to it.
For more details, I really recommend reading C# in depth, or some book of the same kind - there are a lot of resources about closure in C# on the web :)

cannot understand lambda expression output [duplicate]

This question already has answers here:
Captured variable in a loop in C#
(10 answers)
Closed last month.
I have this code and do not understand why the out put is 22! I am afraid it should be 01!
can anyone explain what happens? if the list store a method with a parameter, so the parameters should be 0 and 1 respectively!
List<Action> list = new List<Action>();
for (int i = 0; i < 2; i++)
{
list.Add(() => Console.Write(i));
}
foreach (var it in list)
{
it();
}
It is Closure (1, 2).
In your case Console.Write(i) will use value of i in the moment of action call. You firstly increment i in for loop then in second loop you call every action in the list. In the moment of call of every action i has value 2 - so, you get 22 as output.
To get expected result you should create local copy of i and use it:
for (int i = 0; i < 2; i++)
{
var temp = i;
list.Add(() => Console.Write(temp));
}
Addition to Roma Doskoch's anwser, another approach is to avoid for.
var list = Enumerable
.Range(0, 2)
.Select<int, Action>(i => () => Console.Write(i));
Closures capture variables, not values.
In your code, the closure captures the variable i, not whatever value happens to be stored in i on each iteration. When you invoke the action, the variable i has a value of 2 (because the loop has finished) and therefore 2 will be printed out twice.
In order to avoid this, as other answers already point out, you need to create a new variable every time around as a workaround to not being able to capture values; if you declare a new variable on every iteration then the result of capturing the variable is equivalent to capturing the value because you won't be changing it on the next iteration.

How AsParallel extension actually works

So topic is the questions.
I get that method AsParallel returns wrapper ParallelQuery<TSource> that uses the same LINQ keywords, but from System.Linq.ParallelEnumerable instead of System.Linq.Enumerable
It's clear enough, but when i'm looking into decompiled sources, i don't understand how does it works.
Let's begin from an easiest extension : Sum() method. Code:
[__DynamicallyInvokable]
public static int Sum(this ParallelQuery<int> source)
{
if (source == null)
throw new ArgumentNullException("source");
else
return new IntSumAggregationOperator((IEnumerable<int>) source).Aggregate();
}
it's clear, let's go to Aggregate() method. It's a wrapper on InternalAggregate method that traps some exceptions. Now let's take a look on it.
protected override int InternalAggregate(ref Exception singularExceptionToThrow)
{
using (IEnumerator<int> enumerator = this.GetEnumerator(new ParallelMergeOptions?(ParallelMergeOptions.FullyBuffered), true))
{
int num = 0;
while (enumerator.MoveNext())
checked { num += enumerator.Current; }
return num;
}
}
and here is the question: how it works? I see no concurrence safety for a variable, modified by many threads, we see only iterator and summing. Is it magic enumerator? Or how does it works? GetEnumerator() returns QueryOpeningEnumerator<TOutput>, but it's code is too complicated.
Finally in my second PLINQ assault I found an answer. And it's pretty clear.
Problem is that enumerator is not simple. It's a special multithreading one. So how it works? Answer is that enumerator doesn't return a next value of source, it returns a whole sum of next partition. So this code is only executed 2,4,6,8... times (based on Environment.ProcessorCount), when actual summation work is performed inside enumerator.MoveNext in enumerator.OpenQuery method.
So TPL obviosly partition the source enumerable, then sum independently each partition and then pefrorm this summation, see IntSumAggregationOperatorEnumerator<TKey>. No magic here, just could plunge deeper.
The Sum operator aggregates all values in a single thread. There is no multi-threading here. The trick is that multi-threading is happening somewhere else.
The PLINQ Sum method can handle PLINQ enumerables. Those enumerables could be built up using other constructs (such as where) that allows a collection to be processed over multiple threads.
The Sum operator is always the last operator in a chain. Although it is possible to process this sum over multiple threads, the TPL team probably found out that this had a negative impact on performance, which is reasonable, since the only thing this method has to do is a simple integer addition.
So this method processes all results that come available from other threads and processes them on a single thread and returns that value. The real trick is in other PLINQ extension methods.
protected override int InternalAggregate(ref Exception singularExceptionToThrow)
{
using (IEnumerator<int> enumerator = this.GetEnumerator(new ParallelMergeOptions? (ParallelMergeOptions.FullyBuffered), true))
{
int num = 0;
while (enumerator.MoveNext())
checked { num += enumerator.Current; }
return num;
}
}
This code won't be executed parallel, the while will be sequentially execute it's innerscope.
Try this instead
List<int> list = new List<int>();
int num = 0;
Parallel.ForEach(list, (item) =>
{
checked { num += item; }
});
The inner action will be spread on the ThreadPool and the ForEach statement will be complete when all items are handled.
Here you need threadsafety:
List<int> list = new List<int>();
int num = 0;
Parallel.ForEach(list, (item) =>
{
Interlocked.Add(ref num, item);
});

For loop goes out of range [duplicate]

This question already has answers here:
Captured variable in a loop in C#
(10 answers)
Closed 11 days ago.
using System;
using System.Collections.Generic;
using System.Text;
using System.Threading.Tasks;
namespace ConsoleApplication1
{
class Program
{
static void Main(string[] args)
{
MyClass myClass = new MyClass();
myClass.StartTasks();
}
}
class MyClass
{
int[] arr;
public void StartTasks()
{
arr = new int[2];
arr[0] = 100;
arr[1] = 101;
for (int i = 0; i < 2; i++)
{
Task.Factory.StartNew(() => WorkerMethod(arr[i])); // IndexOutOfRangeException: i==2!!!
}
}
void WorkerMethod(int i)
{
}
}
}
It seems that i++ gets executed one more time before the loop iteration is finished. Why do I get the IndexOutOfRangeException?
You are closing over loop variable. When it's time for WorkerMethod to get called, i can have the value of two, not the value of 0 or 1.
When you use closures it's important to understand that you are not using the value that the variable has at the moment, you use the variable itself. So if you create lambdas in loop like so:
for(int i = 0; i < 2; i++) {
actions[i] = () => { Console.WriteLine(i) };
}
and later execute the actions, they all will print "2", because that's what the value of i is at the moment.
Introducing a local variable inside the loop will solve your problem:
for (int i = 0; i < 2; i++)
{
int index = i;
Task.Factory.StartNew(() => WorkerMethod(arr[index]));
}
<Resharper plug> That's one more reason to try Resharper - it gives a lot of warnings that help you catch the bugs like this one early. "Closing over a loop variable" is amongst them </Resharper plug>
The reason is that you are using a loop variable inside a parallel task. Because tasks can execute concurrently the value of the loop variable may be different to the value it had when you started the task.
You started the task inside the loop. By the time the task comes to querying the loop variable the loop has ended becuase the variable i is now beyond the stop point.
That is:
i = 2 and the loop exits.
The task uses variable i (which is now 2)
You should use Parallel.For to perform a loop body in parallel. Here is an example of how to use Parallel.For
Alternativly, if you want to maintain you current strucuture, you can make a copy of i into a loop local variable and the loop local copy will maintain its value into the parallel task.
e.g.
for (int i = 0; i < 2; i++)
{
int localIndex = i;
Task.Factory.StartNew(() => WorkerMethod(arr[localIndex]));
}
Using foreach does not throw:
foreach (var i in arr)
{
Task.Factory.StartNew(() => WorkerMethod(i));
}
But is doesn't work either:
101
101
It executes WorkerMethod with the last entry in the array. Why is nicely explained in the other answers.
This does work:
Parallel.ForEach(arr,
item => Task.Factory.StartNew(() => WorkerMethod(item))
);
Note
This actually is my first hands-on experience with System.Threading.Tasks. I found this question, my naive answer and especially some of the other answers useful for my personal learning experience. I'll leave my answer up here because it might be useful for others.

Categories

Resources