Process not running in parallel using ParallelForEachAsync - c#

I am testing running python via Process.Start in parallel
My machine has a 2.8GHz CPU with 4 cores and 8 logical processors
My main console application is as below
static void Main(string[] args) => MainAsync(args).GetAwaiter().GetResult();
static async Task MainAsync(string[] args)
{
var startTime = DateTime.UtcNow;
Console.WriteLine($"Execution started at {DateTime.UtcNow:T}");
await ExecuteInParallelAsync(args).ConfigureAwait(false);
Console.WriteLine($"Executions completed at {DateTime.UtcNow:T}");
var endTime = DateTime.UtcNow;
var duration = (endTime - startTime);
Console.WriteLine($"Execution took {duration.TotalMilliseconds} milliseconds {duration.TotalSeconds} seconds");
Console.WriteLine("Press Any Key to close");
Console.ReadKey();
}
Where ExecuteInParallelAsync is the method that does the work...
private static async Task ExecuteInParallelAsync(string[] args)
{
var executionNumbers = new List<int>();
var executions = 5;
for (var executionNumber = 1; executionNumber <= executions; executionNumber++)
{
executionNumbers.Add(executionNumber);
}
await executionNumbers.ParallelForEachAsync(async executionNumber =>
{
Console.WriteLine($"Execution {executionNumber} of {executions} {DateTime.UtcNow:T}");
ExecuteSampleModel();
Console.WriteLine($"Execution {executionNumber} complete {DateTime.UtcNow:T}");
}).ConfigureAwait(false);
}
ExecuteSampleModel runs the Python model...
IModelResponse GetResponse()
{
_actualResponse = new ModelResponse();
var fileName = $@"main.py";
var p = new Process();
p.StartInfo = new ProcessStartInfo(@"C:\Program Files (x86)\Microsoft Visual Studio\Shared\Python36_64\python.exe", fileName)
{
WorkingDirectory = RootFolder,
RedirectStandardOutput = true,
UseShellExecute = false,
CreateNoWindow = true
};
p.Start();
_actualResponse.RawResponseFromModel = p.StandardOutput.ReadToEnd();
p.WaitForExit();
return _actualResponse;
}
As you can see, I am asking this model to be executed 5 times
When I use the debugger it appears that, even though I am using ParallelForEachAsync (introduced by the AsyncEnumerator package), this is not being run in parallel.
I thought that each iteration would run on its own thread?
Each Python Model execution takes 5 seconds.
Running in parallel, I would expect the whole process to be done in 15 seconds or so, but it actually takes 34 seconds.
The Console.WriteLines added before and after the call to GetResponse show that the first call starts, is executed in full, then the second starts, and so on.
Is this something to do with me calling Process.Start?
Can anyone see anything wrong with this?
Paul

To make the answer useful, here is an explanation of what happened with the async code. Omitting a lot of details that aren't important for the explanation, the code inside the ParallelForEachAsync loop looks as follows:
// some preparations
...
var itemIndex = 0L;
while (await enumerator.MoveNextAsync(cancellationToken).ConfigureAwait(false))
{
...
Task itemActionTask = null;
try
{
itemActionTask = asyncItemAction(enumerator.Current, itemIndex);
}
catch (Exception ex)
{
// some exception handling
}
...
itemIndex++;
}
where asyncItemAction has type Func<T, long, Task> and is a wrapper around the custom asynchronous action of type Func<T, Task> that is passed as a parameter to the ParallelForEachAsync call (the wrapper adds indexing functionality). The loop code just calls this action in order to obtain a task that represents the asynchronous operation, whose completion it will later wait for. In the case of the given code example, the custom action
async executionNumber =>
{
Console.WriteLine($"Execution {executionNumber} of {executions}{DateTime.UtcNow:T}");
ExecuteSampleModel();
Console.WriteLine($"Execution {executionNumber} complete {DateTime.UtcNow:T}");
}
contains no asynchronous code, but the async prefix allows the compiler to generate a state machine with a method that returns a Task, which makes this code compliant (from the syntax standpoint) with the custom action call inside the loop.
The important thing is that the code inside the loop expects this operation to be asynchronous, which implies that the operation is implicitly split into a synchronous part, executed as part of the asyncItemAction(enumerator.Current, itemIndex) call, and at least one asynchronous part (one or more, depending on the number of awaits inside) that can be executed while iterating over the other loop items. The following pseudo-code gives an idea of that:
{
... synchronous part
await SomeAsyncOperation();
... asynchronous part
}
In this particular case there is no async part in the custom action at all, which means that the call
itemActionTask = asyncItemAction(enumerator.Current, itemIndex);
will be executed synchronously, and the next iteration of the loop won't start until asyncItemAction completes the entire custom action execution.
That's why switching off the asynchrony in the code and using simple parallelism helps.
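As a rough illustration of that conclusion, here is a minimal sketch of two ways the work could actually run in parallel, reusing the executionNumbers list and the ExecuteSampleModel method from the question (this is not the answerer's code, just one way to apply the point above): either offload the blocking process launch to the thread pool so the lambda is genuinely asynchronous, or drop the asynchrony and use plain data parallelism.
// Option 1: keep ParallelForEachAsync, but make the body genuinely asynchronous
// by offloading the blocking Process.Start work to the thread pool.
await executionNumbers.ParallelForEachAsync(async executionNumber =>
{
    Console.WriteLine($"Execution {executionNumber} starting {DateTime.UtcNow:T}");
    await Task.Run(() => ExecuteSampleModel());
    Console.WriteLine($"Execution {executionNumber} complete {DateTime.UtcNow:T}");
}).ConfigureAwait(false);

// Option 2: switch off the asynchrony entirely and use simple parallelism.
Parallel.ForEach(executionNumbers, executionNumber => ExecuteSampleModel());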

Related

Multi-threading in a foreach loop

I have read a few stackoverflow threads about multi-threading in a foreach loop, but I am not sure I am understanding and using it right.
I have tried multiple scenarios, but I am not seeing much increase in performance.
Here is what I believe runs asynchronous tasks, but runs them synchronously in the loop on a single thread:
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
foreach (IExchangeAPI selectedApi in selectedApis)
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
ticker = await selectedApi.GetTickerAsync(symbol);
}
}
stopWatch.Stop();
Here is what I hoped would run asynchronously (still using a single thread) - I would have expected some speed improvement already here:
List<Task<ExchangeTicker>> exchTkrs = new List<Task<ExchangeTicker>>();
stopWatch.Start();
foreach (IExchangeAPI selectedApi in selectedApis)
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
exchTkrs.Add(selectedApi.GetTickerAsync(symbol));
}
}
ExchangeTicker[] retTickers = await Task.WhenAll(exchTkrs);
stopWatch.Stop();
Here is what I would have hoped to run asynchronously on multiple threads:
stopWatch.Start();
Parallel.ForEach(selectedApis, async (IExchangeAPI selectedApi) =>
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
ticker = await selectedApi.GetTickerAsync(symbol);
}
});
stopWatch.Stop();
Stopwatch results are interpreted as follows:
Console.WriteLine("Time elapsed (ns): {0}", stopWatch.Elapsed.TotalMilliseconds * 1000000);
Console outputs:
Time elapsed (ns): 4183308100
Time elapsed (ns): 4183946299.9999995
Time elapsed (ns): 4188032599.9999995
Now, the speed improvement looks minuscule. Am I doing something wrong, or is that more or less what I should be expecting? I suppose writing to files would be a better way to check that.
Would you mind also confirming I am interpreting the different use cases correctly?
Finally, using a foreach loop in order to get the ticker from multiple platforms in parallel may not be the best approach. Suggestions on how to improve this would be welcome.
EDIT
Note that I am using the ExchangeSharp code base that you can find here
Here is what the GetTickerAsync() method looks like:
public virtual async Task<ExchangeTicker> GetTickerAsync(string marketSymbol)
{
marketSymbol = NormalizeMarketSymbol(marketSymbol);
return await Cache.CacheMethod(MethodCachePolicy, async () => await OnGetTickerAsync(marketSymbol), nameof(GetTickerAsync), nameof(marketSymbol), marketSymbol);
}
For the Kraken API, you then have:
protected override async Task<ExchangeTicker> OnGetTickerAsync(string marketSymbol)
{
JToken apiTickers = await MakeJsonRequestAsync<JToken>("/0/public/Ticker", null, new Dictionary<string, object> { { "pair", NormalizeMarketSymbol(marketSymbol) } });
JToken ticker = apiTickers[marketSymbol];
return await ConvertToExchangeTickerAsync(marketSymbol, ticker);
}
And the Caching method:
public static async Task<T> CacheMethod<T>(this ICache cache, Dictionary<string, TimeSpan> methodCachePolicy, Func<Task<T>> method, params object?[] arguments) where T : class
{
await new SynchronizationContextRemover();
methodCachePolicy.ThrowIfNull(nameof(methodCachePolicy));
if (arguments.Length % 2 == 0)
{
throw new ArgumentException("Must pass function name and then name and value of each argument");
}
string methodName = (arguments[0] ?? string.Empty).ToStringInvariant();
string cacheKey = methodName;
for (int i = 1; i < arguments.Length;)
{
cacheKey += "|" + (arguments[i++] ?? string.Empty).ToStringInvariant() + "=" + (arguments[i++] ?? string.Empty).ToStringInvariant("(null)");
}
if (methodCachePolicy.TryGetValue(methodName, out TimeSpan cacheTime))
{
return (await cache.Get<T>(cacheKey, async () =>
{
T innerResult = await method();
return new CachedItem<T>(innerResult, CryptoUtility.UtcNow.Add(cacheTime));
})).Value;
}
else
{
return await method();
}
}
First, it should be pointed out that what you are trying to achieve is performance, not asynchrony. And you are trying to achieve it by running multiple operations concurrently, not in parallel. To keep the explanation simple I'll use a simplified version of your code, and I'll assume that each operation is a direct web request, without an intermediate caching layer, and with no dependencies on values stored in dictionaries.
foreach (var symbol in selectedSymbols)
{
var ticker = await selectedApi.GetTickerAsync(symbol);
}
The above code runs the operations sequentially. Each operation starts after the completion of the previous one.
var tasks = new List<Task<ExchangeTicker>>();
foreach (var symbol in selectedSymbols)
{
tasks.Add(selectedApi.GetTickerAsync(symbol));
}
var tickers = await Task.WhenAll(tasks);
The above code runs the operations concurrently. All operations start at once. The total duration is expected to be the duration of the longest running operation.
Parallel.ForEach(selectedSymbols, async symbol =>
{
var ticker = await selectedApi.GetTickerAsync(symbol);
});
The above code runs the operations concurrently, like the previous version with Task.WhenAll. It offers no advantage, while having the huge disadvantage that you no longer have a way to await the operations to complete. The Parallel.ForEach method will return immediately after launching the operations, because the Parallel class doesn't understand async delegates (it does not accept Func<Task> lambdas). Essentially there are a bunch of async void lambdas in there, that are running out of control, and in case of an exception they will bring down the process.
So the correct way to run the operations concurrently is the second way, using a list of tasks and Task.WhenAll. Since you've already measured this method and haven't observed any performance improvements, I am assuming that there is something else that serializes the concurrent operations. It could be something like a SemaphoreSlim hidden somewhere in your code, or some mechanism on the server side that throttles your requests. You'll have to investigate further to find where and why the throttling happens.
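As a purely illustrative diagnostic (not part of the original answer), one way to see whether the requests actually overlap is to record when each one starts and finishes against a shared stopwatch; if the intervals never overlap, something is serializing them. The sketch below assumes the selectedApis and exchangeSymbols variables from the question, and looks the symbol up twice only for brevity.
var sw = Stopwatch.StartNew();
var tasks = selectedApis
    .Where(api => exchangeSymbols.ContainsKey(api.Name))
    .Select(async api =>
    {
        var started = sw.Elapsed;
        var ticker = await api.GetTickerAsync(exchangeSymbols[api.Name]);
        Console.WriteLine($"{api.Name}: started {started}, finished {sw.Elapsed}");
        return ticker;
    })
    .ToList();
ExchangeTicker[] tickers = await Task.WhenAll(tasks);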
In general, when you do not see an increase from multi-threading, it is because your task is not CPU-bound or not large enough to offset the overhead.
In your example, i.e.:
selectedApi.GetTickerAsync(symbol);
This can have two reasons:
1: Looking up the ticker is brutally fast and it should not be async to start with, e.g. when you look it up in a dictionary.
2: This is running over an HTTP connection where the runtime is LIMITING THE NUMBER OF CONCURRENT CALLS. Regardless of how many tasks you open, it will not use more than 4 at the same time.
Oh, and 3: you think async is using threads. It is not. It is particularly not the case in code like this:
await selectedApi.GetTickerAsync(symbol);
Where you basically IMMEDIATELY WAIT FOR THE RESULT. There is no multi threading involved here at all.
foreach (IExchangeAPI selectedApi in selectedApis) {
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
ticker = await selectedApi.GetTickerAsync(symbol);
} }
This is linear, non-threaded code using an async interface so as not to block the current thread while the (likely expensive IO) operation is in flight. It starts one, THEN WAITS FOR THE RESULT. No two queries ever start at the same time.
If you want a possibly more scalable way (just as an example), as sketched below:
In the foreach, do not await but add the task to a list of tasks.
Then await them once all the tasks have started, like in a second loop.
WAY not perfect, but at least the runtime has a CHANCE to do multiple lookups at the same time. Your await makes sure that you essentially run single-threaded code, except async, so your thread goes back into the pool (and is not waiting for results), increasing your scalability - an item possibly not relevant in this case and definitely not measured in your test.
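A minimal sketch of that two-loop shape, under the same assumptions as the question's code (selectedApis, exchangeSymbols); it is only meant to show where the await moves to:
var pending = new List<Task<ExchangeTicker>>();
// First loop: start every lookup, but do not await yet.
foreach (IExchangeAPI selectedApi in selectedApis)
{
    if (exchangeSymbols.TryGetValue(selectedApi.Name, out var symbol))
        pending.Add(selectedApi.GetTickerAsync(symbol));
}
// Second loop: only now wait for the results; all requests are already in flight.
foreach (var pendingTicker in pending)
{
    var ticker = await pendingTicker;
    // use the ticker here
}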

Thread.Sleep into a Task

I've the following code:
static void Main(string[] args)
{
IEnumerable<int> threadsIds = Enumerable.Range(1, 1000);
DateTime globalStart = DateTime.Now;
Console.WriteLine("{0:s.fff} Starting tasks", globalStart);
Parallel.ForEach(threadsIds, (threadsId) =>
{
DateTime taskStart = DateTime.Now;
const int sleepDuration = 1000;
Console.WriteLine("{1:s.fff} Starting task {0}, sleeping for {2}", threadsId, taskStart, sleepDuration);
Thread.Sleep(sleepDuration);
DateTime taskFinish = DateTime.Now;
Console.WriteLine("{1:s.fff} Ending task {0}, task duration {2}", threadsId, taskFinish, taskFinish- taskStart);
});
DateTime globalFinish= DateTime.Now;
Console.WriteLine("{0:s.fff} Tasks finished. Total duration: {1}", globalFinish, globalFinish-globalStart);
Console.ReadLine();
}
Currently when I run it, it takes ~60 seconds. From what I understand, it's because .NET doesn't create one thread per task but a pool of threads shared by all the Tasks, and when I do the Thread.Sleep, I prevent that thread from executing other tasks.
In my real case, I have some work to do in parallel, and in case of failure, I have to wait some amount of time before trying again.
I'm looking for something other than Thread.Sleep that would allow other tasks to run during the "sleep time" of other tasks.
Unfortunately, I'm currently running .NET 4, which prevents me from using async and await (which I guess could have helped me in this case).
PS: I got the same results by:
Putting Task.Delay(sleepDuration).Wait()
Not using Parallel.ForEach, but a foreach with Task.Factory.StartNew
PS2: I know that I can do my real case differently, but I'm very interested in understanding how it could be achieved that way.
You are on the right path. Task.Delay(timespan) is the solution for your problem. Since you cannot use async/await, you have to write a bit more code to achieve the desired result.
Think about using Task.ContinueWith() method, for example:
Task.Run(() => { /* code before Thread.Sleep */ })
.ContinueWith(task => Task.Delay(sleepDuration)
.ContinueWith(task2 => { /* code after Thread.Sleep */ }))
.Unwrap(); // Unwrap so the resulting task only completes after the inner continuation has run
Also, you will need to create a class to make local method variables accessible across subtasks.
If you want to create a task that polls some condition every second, you could try the following code:
Task PollTask(Func<bool> condition)
{
TaskCompletionSource<bool> tcs = new TaskCompletionSource<bool>();
PollTaskImpl(tcs, condition);
return tcs.Task;
}
void PollTaskImpl(TaskCompletionSource<bool> tcs, Func<bool> condition)
{
if (condition())
tcs.SetResult(true);
else
Task.Delay(1000).ContinueWith(_ => PollTaskImpl(tcs, condition));
}
Don't worry about creating a new task every second - ContinueWith and async/await do the same thing internally.
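For completeness, here is a hypothetical usage sketch of the PollTask helper above (the condition and the follow-up work are placeholders, not from the original answer):
// Poll until some external condition becomes true, then continue,
// without ever blocking a thread with Thread.Sleep.
PollTask(() => File.Exists(@"C:\temp\ready.flag"))
    .ContinueWith(_ =>
    {
        // retry the failed work here
        Console.WriteLine("Condition met, retrying.");
    });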

Random tasks from Task.Factory.StartNew never finishes

I am using async/await with the Task.Factory method.
public async Task<JobDto> ProcessJob(JobDto jobTask)
{
try
{
var T = Task.Factory.StartNew(() =>
{
JobWorker jobWorker = new JobWorker();
jobWorker.Execute(jobTask);
});
await T;
}
I am calling this method inside a loop like this:
for(int i=0; i < jobList.Count(); i++)
{
tasks[i] = ProcessJob(jobList[i]);
}
What I notice is that new tasks open up inside Process Explorer and they also start working (based on the log file). However, out of 10, sometimes 8 or sometimes 7 finish. The rest of them just never come back.
Why would that be happening?
Are they timing out? Where can I set a timeout for my tasks?
UPDATE
Basically, above, I would like each Task to start running as soon as it is called and to wait for the response at the await T keyword. I am assuming here that once they finish, each of them will come back at await T and do the next action. I am already seeing this result for 7 out of 10 tasks, but 3 of them are not coming back.
Thanks
It is hard to say what the issue is without the rest of the code, but your code can be simplified by making ProcessJob synchronous and then calling Task.Run with it.
public JobDto ProcessJob(JobDto jobTask)
{
JobWorker jobWorker = new JobWorker();
return jobWorker.Execute(jobTask);
}
Start tasks and wait for all tasks to finish. Prefer using Task.Run rather than Task.Factory.StartNew as it provides more favourable defaults for pushing work to the background. See here.
for(int i = 0; i < jobList.Count(); i++)
{
var job = jobList[i]; // capture the current item; the shared loop variable i may change before the lambda runs
tasks[i] = Task.Run(() => ProcessJob(job));
}
try
{
await Task.WhenAll(tasks);
}
catch(Exception ex)
{
// handle exception
}
First, let's make a reproducible version of your code. This is NOT the best way to achieve what you are doing, but it shows what is happening in your code!
I'll keep the code almost the same as yours, except I'll use a simple int rather than your JobDto, and on completion of the job's Execute() I'll write to a file that we can verify later. Here's the code:
public class SomeMainClass
{
public void StartProcessing()
{
var jobList = Enumerable.Range(1, 10).ToArray();
var tasks = new Task[10];
//[1] start 10 jobs, one-by-one
for (int i = 0; i < jobList.Count(); i++)
{
tasks[i] = ProcessJob(jobList[i]);
}
//[4] here we have 10 awaitable Task in tasks
//[5] do all other unrelated operations
Thread.Sleep(1500); //assume it works for 1.5 sec
// Task.WaitAll(tasks); //[6] wait for tasks to complete
// The PROCESS IS COMPLETE here
}
public async Task ProcessJob(int jobTask)
{
try
{
//[2] start job in a ThreadPool, Background thread
var T = Task.Factory.StartNew(() =>
{
JobWorker jobWorker = new JobWorker();
jobWorker.Execute(jobTask);
});
//[3] await here will keep context of calling thread
await T; //... and release the calling thread
}
catch (Exception) { /*handle*/ }
}
}
public class JobWorker
{
static object locker = new object();
const string _file = @"C:\YourDirectory\out.txt";
public void Execute(int jobTask) //on complete, writes in file
{
Thread.Sleep(500); //let's assume does something for 0.5 sec
lock(locker)
{
File.AppendAllText(_file,
Environment.NewLine + "Writing the value-" + jobTask);
}
}
}
After running just the StartProcessing(), this is what I get in the file
Writing the value-4
Writing the value-2
Writing the value-3
Writing the value-1
Writing the value-6
Writing the value-7
Writing the value-8
Writing the value-5
So, 8/10 jobs have completed. Obviously, every time you run this, the number and order might change. But the point is, not all the jobs completed!
Now, if I un-comment the step [6] Task.WaitAll(tasks);, this is what I get in my file
Writing the value-2
Writing the value-3
Writing the value-4
Writing the value-1
Writing the value-5
Writing the value-7
Writing the value-8
Writing the value-6
Writing the value-9
Writing the value-10
So, all my jobs completed here!
Why the code behaves like this is already explained in the code comments. The main thing to note is that your tasks run on ThreadPool-based background threads. So, if you do not wait for them, they will be killed when the MAIN process ends and the main thread exits!!
If you still don't want to await the tasks there, you can return the list of tasks from this first method and await the tasks at the very end of the process, something like this
public Task[] StartProcessing()
{
...
for (int i = 0; i < jobList.Count(); i++)
{
tasks[i] = ProcessJob(jobList[i]);
}
...
return tasks;
}
//in the MAIN METHOD of your application/process
var tasks = new SomeMainClass().StartProcessing();
// do all other stuffs here, and just at the end of process
Task.WaitAll(tasks);
Hope this clears all confusion.
It's possible your code is swallowing exceptions. I would add a ContinueWith call to the end of the part of the code that starts the new task. Something like this untested code:
var T = Task.Factory.StartNew(() =>
{
JobWorker jobWorker = new JobWorker();
jobWorker.Execute(jobTask);
}).ContinueWith(tsk =>
{
var flattenedException = tsk.Exception.Flatten();
Console.WriteLine("Exception! " + flattenedException);
return true;
}, TaskContinuationOptions.OnlyOnFaulted); //Only call if task is faulted
Another possibility is that something in one of the tasks is timing out (like you mentioned) or deadlocking. To track down whether a timeout (or maybe deadlock) is the root cause, you could add some timeout logic (as described in this SO answer):
int timeout = 1000; //set to something much greater than the time it should take your task to complete (at least for testing)
var task = TheMethodWhichWrapsYourAsyncLogic(cancellationToken);
if (await Task.WhenAny(task, Task.Delay(timeout, cancellationToken)) == task)
{
// Task completed within timeout.
// Consider that the task may have faulted or been canceled.
// We re-await the task so that any exceptions/cancellation is rethrown.
await task;
}
else
{
// timeout/cancellation logic
}
Check out the documentation on exception handling in the TPL on MSDN.

Execute multiple CMD tasks and update asynchronously

I'm implementing a C# application. I need to execute a program on multiple remote machines at the same time. To do so I'm using PSExec over a CMD with multithreading. Basically, for each machine, I start a thread that starts a CMD process. Depending on the result of the program executed remotely I'd like to either take an action or kill it if it takes more than x minutes (hope that makes sense).
The issue I've got is that I don't really know how to control how long the process has been running other than using WaitForExit, and that doesn't really let me go multi-threaded as it waits until the CMD call has finished.
I'm sure there must be a way of doing this but I cannot really figure it out. Could anyone please help me?
Here is my code (I'm new to C# coding so it might not be the best code; feel free to correct any part of it you consider not right):
public async void BulkExecution()
{
//Some code
foreach (string machine in Machines)
{
//more code to work out the CMDline and other duties.
var result = Task.Factory.StartNew(r => ExecutePsexec((string)r, RunBeforeKillMsec), CMDLine);
await result;
}
//More Code
}
private static void ExecutePsexec(string CMDline, int RunBeforeKillMsec)
{
Process compiler = new Process();
compiler.StartInfo.FileName = "psexec.exe";
compiler.StartInfo.Arguments = CMDline;
compiler.StartInfo.UseShellExecute = false;
compiler.StartInfo.RedirectStandardOutput = true;
compiler.Start();
if (!compiler.WaitForExit(RunBeforeKillMsec))
{
ExecutePSKill(CMDline);
}
else
{
//Some Actions here
Common.Log(LogFile, CMDline.Split(' ')[0] + " finished successfully");
}
}
ExecutePsexec runs in a separate task. All such tasks are independent. await result; is what sequences them. You need to remove it.
Async void methods should be avoided. You should change the signature of the BulkExecution method to return a Task, so that you can await it and handle any exceptions that may occur. Inside the method create a Task for each machine, and then await all tasks with the Task.WhenAll method:
public async Task BulkExecution()
{
//Some code
Task[] tasks = Machines.Select(machine =>
{
//more code to work out the CMDline and other duties.
return Task.Run(() => ExecutePsexec(CMDLine, ExecutionTimeoutMsec));
}).ToArray();
await Task.WhenAll(tasks);
//More Code
}
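And, as a hedged example of how the reworked method might be called so that failures surface (this exact caller is hypothetical; Common.Log and LogFile mirror the question's code):
try
{
    await BulkExecution();
    Common.Log(LogFile, "All machines finished (or were killed after the timeout)");
}
catch (Exception ex)
{
    Common.Log(LogFile, "Bulk execution failed: " + ex);
}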

Async and Await - How is order of execution maintained?

I am currently reading some topics about the Task Parallel Library and asynchronous programming with async and await. The book "C# 5.0 in a Nutshell" states that when awaiting an expression using the await keyword, the compiler transforms the code into something like this:
var awaiter = expression.GetAwaiter();
awaiter.OnCompleted (() =>
{
var result = awaiter.GetResult();
// ... the rest of the method (the continuation) goes here
});
Let's assume, we have this asynchronous function (also from the referred book):
async Task DisplayPrimeCounts()
{
for (int i = 0; i < 10; i++)
Console.WriteLine (await GetPrimesCountAsync (i*1000000 + 2, 1000000) +
" primes between " + (i*1000000) + " and " + ((i+1)*1000000-1));
Console.WriteLine ("Done!");
}
The call of the 'GetPrimesCountAsync' method will be enqueued and executed on a pooled thread. In general, invoking multiple threads from within a for loop has the potential for introducing race conditions.
So how does the CLR ensure that the requests will be processed in the order they were made? I doubt that the compiler simply transforms the code into the above manner, since this would decouple the 'GetPrimesCountAsync' method from the for loop.
Just for the sake of simplicity, I'm going to replace your example with one that's slightly simpler, but has all of the same meaningful properties:
async Task DisplayPrimeCounts()
{
for (int i = 0; i < 10; i++)
{
var value = await SomeExpensiveComputation(i);
Console.WriteLine(value);
}
Console.WriteLine("Done!");
}
The ordering is all maintained because of the definition of your code. Let's imagine stepping through it.
This method is first called
The first line of code is the for loop, so i is initialized.
The loop check passes, so we go to the body of the loop.
SomeExpensiveComputation is called. It should return a Task<T> very quickly, but the work that it's doing will keep going on in the background.
The rest of the method is added as a continuation to the returned task; it will continue executing when that task finishes.
After the task returned from SomeExpensiveComputation finishes, we store the result in value.
value is printed to the console.
GOTO 3; note that the existing expensive operation has already finished before we get to step 4 for the second time and start the next one.
As far as how the C# compiler actually accomplishes step 5, it does so by creating a state machine. Basically every time there is an await there's a label indicating where it left off, and at the start of the method (or after it's resumed after any continuation fires) it checks the current state, and does a goto to the spot where it left off. It also needs to hoist all local variables into fields of a new class so that the state of those local variables is maintained.
Now this transformation isn't actually done in C# code, it's done in IL, but this is sort of the moral equivalent of the code I showed above as a state machine. Note that this isn't valid C# (you cannot goto into a for loop like this, but that restriction doesn't apply to the IL code that is actually used). There are also going to be differences between this and what C# actually does, but it should give you a basic idea of what's going on here:
internal class Foo
{
public int i;
public long value;
private int state = 0;
private Task<int> task;
int result0;
public Task Bar()
{
var tcs = new TaskCompletionSource<object>();
Action continuation = null;
continuation = () =>
{
try
{
if (state == 1)
{
goto state1;
}
for (i = 0; i < 10; i++)
{
Task<int> task = SomeExpensiveComputation(i);
var awaiter = task.GetAwaiter();
if (!awaiter.IsCompleted)
{
awaiter.OnCompleted(() =>
{
result0 = awaiter.GetResult();
continuation();
});
state = 1;
return;
}
else
{
result0 = awaiter.GetResult();
}
state1:
value = result0; // store the awaited result, as the original code does
Console.WriteLine(value);
}
Console.WriteLine("Done!");
tcs.SetResult(true);
}
catch (Exception e)
{
tcs.SetException(e);
}
};
continuation();
return tcs.Task;
}
}
Note that I've ignored task cancellation for the sake of this example, I've ignored the whole concept of capturing the current synchronization context, there's a bit more going on with error handling, etc. Don't consider this a complete implementation.
The call of the 'GetPrimesCountAsync' method will be enqueued and executed on a pooled thread.
No. await does not initiate any kind of background processing. It waits for existing processing to complete. It is up to GetPrimesCountAsync to do that (e.g. using Task.Run). It's more clear this way:
var myRunningTask = GetPrimesCountAsync();
await myRunningTask;
The loop only continues when the awaited task has completed. There is never more than one task outstanding.
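For illustration, here is a sketch of what that offloading inside GetPrimesCountAsync might look like, roughly following the book's example (the exact implementation may differ):
Task<int> GetPrimesCountAsync(int start, int count)
{
    // The CPU-bound work is started here, on the thread pool; the caller's
    // await merely waits for this already-running task to complete.
    return Task.Run(() =>
        ParallelEnumerable.Range(start, count).Count(n =>
            Enumerable.Range(2, (int)Math.Sqrt(n) - 1).All(i => n % i > 0)));
}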
So how does the CLR ensure that the requests will be processed in the order they were made?
The CLR is not involved.
I doubt that the compiler simply transforms the code into the above manner, since this would decouple the 'GetPrimesCountAsync' method from the for loop.
The transform that you show is basically right, but notice that the next loop iteration is not started right away but in the callback. That's what serializes execution.
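By contrast, if the iterations were meant to overlap, the code would have to start all the tasks before awaiting any of them, something like the sketch below (assuming GetPrimesCountAsync returns Task<int> as in the book's example):
async Task DisplayPrimeCountsConcurrently()
{
    // Start all ten computations first; now several tasks are outstanding at once.
    Task<int>[] tasks = Enumerable.Range(0, 10)
        .Select(i => GetPrimesCountAsync(i * 1000000 + 2, 1000000))
        .ToArray();

    int[] counts = await Task.WhenAll(tasks);

    for (int i = 0; i < counts.Length; i++)
        Console.WriteLine(counts[i] + " primes between " + (i * 1000000) +
            " and " + ((i + 1) * 1000000 - 1));

    Console.WriteLine("Done!");
}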
