Asynchronous Tasks take too much time

I have been trying to build an asynchronous approach to my CPU-bound function, which computes some aggregate functions. The problem is that there seems to be a deadlock somewhere (I suppose), because the calculation times vary so much. I am a real newbie in this Task Parallel world. I have also read Stephen Cleary's articles, but I am still unsure about all aspects of this asynchronous approach.
My Code:
private static void Main(string[] args)
{
PIServer server = ConnectToDefaultPIServer();
AFTimeRange timeRange = new AFTimeRange("1/1/2012", "6/30/2012");
Program p = new Program();
for (int i = 0; i < 10; i++)
{
p.TestAsynchronousCall(server, timeRange);
//p.TestAsynchronousCall(server, timeRange).Wait();-same results
}
Console.WriteLine("Main check-disconnected done");
Console.ReadKey();
}
private async Task TestAsynchronousCall(PIServer server, AFTimeRange timeRange)
{
AsyncClass asyn;
for (int i = 0; i < 1; i++)
{
asyn = new AsyncClass();
await asyn.DoAsyncTask(server, timeRange);
//asyn.DoAsyncTask(server, timeRange);-same results
}
}
public async Task DoAsyncTask(PIServer server, AFTimeRange timeRange)
{
var timeRanges = DivideTheTimeRange(timeRange);
Task<Dictionary<PIPoint, AFValues>>[] tasksArray = new Task<Dictionary<PIPoint, AFValues>>[2];
tasksArray[0] = (Task.Run(() => CalculationClass.AverageValueOfTagPerDay(server, timeRanges[0])));
// tasksArray[1] = tasksArray[0].ContinueWith((x) => CalculationClass.AverageValueOfTagPerDay(server, timeRanges[1]));
tasksArray[1] = (Task.Run(() => CalculationClass.AverageValueOfTagPerDay(server, timeRanges[1])));
Task.WaitAll(tasksArray);
//await Task.WhenAll(tasksArray); -same results
for (int i = 0; i < tasksArray.Length; i++)
{
Program.Show(tasksArray[i].Result);
}
}
I measure the time with a Stopwatch inside the AverageValueOfTagPerDay function. This function is synchronous (is that a problem?). Each Task takes 12 seconds. But when I uncomment the line and use the ContinueWith() approach, these Tasks take 5-6 seconds each (which is desirable). How is that possible?
Even stranger, when I set the for loop in Main() to 10 iterations, it sometimes takes 5 seconds as well, just as when I use ContinueWith(). So I guess there is a deadlock somewhere, but I am unable to find it.
Sorry for my English, I still have trouble forming good sentences when I try to explain difficulties.

I have been trying to build an asynchronous approach to my CPU-bound function, which computes some aggregate functions.
"Asynchronous" and "CPU-bound" are not terms that go together. If you have a CPU-bound process, then you should use parallel technologies (Parallel, Parallel LINQ, TPL Dataflow).
I am a real newbie in this Task Parallel world. I have also read Stephen Cleary's articles, but I am still unsure about all aspects of this asynchronous approach.
Possibly because I do not cover parallel technologies in any of my articles or blog posts. :) I do cover them in my book, but not online. My online work focuses on asynchrony, which is ideal for I/O-based operations.
To solve your problem, you should use a parallel approach:
public Dictionary<PIPoint, AFValues>[] DoTask(PIServer server, AFTimeRange timeRange)
{
var timeRanges = DivideTheTimeRange(timeRange);
var result = timeRanges.AsParallel().AsOrdered().
Select(range => CalculationClass.AverageValueOfTagPerDay(server, range)).
ToArray();
return result;
}
Of course, this approach assumes that PIServer is threadsafe. It also assumes that there's no I/O being done by the "server" class; if there is, then TPL Dataflow may be a better choice than Parallel LINQ.
If you are planning to use this code in a UI application and don't want to block the UI thread, then you can call the code asynchronously like this:
var results = await Task.Run(() => DoTask(server, timeRange));
foreach (var result in results)
Program.Show(result);
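A minimal, self-contained sketch of the same pattern, for anyone who wants to try it without a PI server. It assumes a hypothetical CPU-bound function SumOfSquares in place of AverageValueOfTagPerDay and plain integer ranges in place of AFTimeRange; neither comes from the original post. It is written with top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for the CPU-bound calculation (not from the original post).
static long SumOfSquares(int start, int count)
{
    long sum = 0;
    for (int i = start; i < start + count; i++)
        sum += (long)i * i;
    return sum;
}

// Split the work into ranges, much as DivideTheTimeRange splits the time range.
var ranges = new[] { (Start: 1, Count: 1000), (Start: 1001, Count: 1000) };

// PLINQ processes the ranges on multiple threads; AsOrdered keeps the results
// in input order. Task.Run moves the blocking query off the calling thread.
long[] results = await Task.Run(() =>
    ranges.AsParallel()
          .AsOrdered()
          .Select(r => SumOfSquares(r.Start, r.Count))
          .ToArray());

foreach (long result in results)
    Console.WriteLine(result);
```

With only two ranges the speed-up is capped at 2x; splitting into more ranges than cores usually gives PLINQ more room to balance the load.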

Related

How to improve the performance of async?

I'm working on a problem where I have to delete records using a service call. The issue is that I have a foreach loop with multiple await operations. This makes the operation take a long time, and performance suffers.
foreach (var a in b) // b is a List<long>
{
await _serviceresolver().DeleteOperationAsync(id, a);
}

The issue is that I have a foreach loop with multiple await operations.
This makes the operation take a long time, and performance suffers.
The number one solution is to reduce the number of calls. This is often called "chunky" over "chatty". So if your service supports some kind of bulk-delete operation, then expose it in your service type and then you can just do:
await _serviceresolver().BulkDeleteOperationAsync(id, b);
But if that isn't possible, then you can at least use asynchronous concurrency. This is quite different from parallelism; you don't want to use Parallel or PLINQ.
var service = _serviceresolver();
var tasks = b.Select(a => service.DeleteOperationAsync(id, a)).ToList();
await Task.WhenAll(tasks);
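A runnable sketch of that asynchronous concurrency, assuming a hypothetical FakeDeleteAsync that simulates the service call with a 200 ms delay (the real DeleteOperationAsync is not shown in the question). It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

var deleted = new ConcurrentBag<long>();

// Hypothetical stand-in for DeleteOperationAsync: simulates a 200 ms service call.
async Task FakeDeleteAsync(long id)
{
    await Task.Delay(200);
    deleted.Add(id);
}

var ids = new List<long> { 4, 3, 2, 1 };

// Start every call first (ToList forces the lazy Select to run now),
// then await them all together, so the waits overlap instead of adding up.
List<Task> tasks = ids.Select(id => FakeDeleteAsync(id)).ToList();
await Task.WhenAll(tasks);

Console.WriteLine($"Deleted {deleted.Count} records");
```

No extra threads are requested here; the calls overlap because each one yields at its await while the simulated I/O is pending.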

I do not know what code is behind DeleteOperationAsync, but async/await certainly isn't designed to speed things up. It was designed to "spare" threads (colloquially speaking).
The best would be to change the method to take as a parameter the whole list of ids - instead of taking and sending just one id.
And then to perform this async/await heavy operation only once for all of the ids.
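If that is not possible, you could run it in parallel using the TPL (but this is really the worst-case scenario):
Parallel.ForEach(listOfIdsToDelete, idToDelete =>
// Block on each call: Parallel.ForEach does not await async lambdas,
// so an async lambda here would let the loop return before the deletes finish.
_serviceresolver().DeleteOperationAsync(id, idToDelete).GetAwaiter().GetResult()
);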

Right now you're waiting for each async operation to finish before starting the next. If the operations can run concurrently, you can call them without await, or, if you need to know when they finish, start them all and then wait for them all by tracking the tasks in a list:
List<Task> tasks = new List<Task>();
foreach (var a in b) // b is a List<long>
tasks.Add(_serviceresolver().DeleteOperationAsync(id, a));
await Task.WhenAll(tasks);

You can use PLINQ (to leverage all the processors of your machine) and the Task.WhenAll method (to avoid freezing the calling thread). In code, the result looks something like this:
class Program {
static async Task Main(string[] args) {
var list = new List<long> {
4, 3, 2
};
var service = new Service();
var response =
from item in list.AsParallel()
select service.DeleteOperationAsync(item);
await Task.WhenAll(response);
}
}
public class Service {
public async Task DeleteOperationAsync(long value) {
await Task.Delay(2000);
Console.WriteLine($"Finished... {value}");
}
}

Multi-threading in a foreach loop

I have read a few stackoverflow threads about multi-threading in a foreach loop, but I am not sure I am understanding and using it right.
I have tried multiple scenarios, but I am not seeing much increase in performance.
Here is what I believe runs Asynchronous tasks, but running synchronously in the loop using a single thread:
Stopwatch stopWatch = new Stopwatch();
stopWatch.Start();
foreach (IExchangeAPI selectedApi in selectedApis)
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
ticker = await selectedApi.GetTickerAsync(symbol);
}
}
stopWatch.Stop();
Here is what I hoped to be running Asynchronously (still using a single thread) - I would have expected some speed improvement already here:
List<Task<ExchangeTicker>> exchTkrs = new List<Task<ExchangeTicker>>();
stopWatch.Start();
foreach (IExchangeAPI selectedApi in selectedApis)
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
exchTkrs.Add(selectedApi.GetTickerAsync(symbol));
}
}
ExchangeTicker[] retTickers = await Task.WhenAll(exchTkrs);
stopWatch.Stop();
Here is what I would have hoped to run Asynchronously in Multi-thread:
stopWatch.Start();
Parallel.ForEach(selectedApis, async (IExchangeAPI selectedApi) =>
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
ticker = await selectedApi.GetTickerAsync(symbol);
}
});
stopWatch.Stop();
Stop watch results interpreted as follows:
Console.WriteLine("Time elapsed (ns): {0}", stopWatch.Elapsed.TotalMilliseconds * 1000000);
Console outputs:
Time elapsed (ns): 4183308100
Time elapsed (ns): 4183946299.9999995
Time elapsed (ns): 4188032599.9999995
Now, the speed improvement looks minuscule. Am I doing something wrong, or is that more or less what I should be expecting? I suppose writing to files would be a better way to check that.
Would you mind also confirming I am interpreting the different use cases correctly?
Finally, using a foreach loop in order to get the ticker from multiple platforms in parallel may not be the best approach. Suggestions on how to improve this would be welcome.
EDIT
Note that I am using the ExchangeSharp code base that you can find here
Here is what the GetTickerAsync() method looks like:
public virtual async Task<ExchangeTicker> GetTickerAsync(string marketSymbol)
{
marketSymbol = NormalizeMarketSymbol(marketSymbol);
return await Cache.CacheMethod(MethodCachePolicy, async () => await OnGetTickerAsync(marketSymbol), nameof(GetTickerAsync), nameof(marketSymbol), marketSymbol);
}
For the Kraken API, you then have:
protected override async Task<ExchangeTicker> OnGetTickerAsync(string marketSymbol)
{
JToken apiTickers = await MakeJsonRequestAsync<JToken>("/0/public/Ticker", null, new Dictionary<string, object> { { "pair", NormalizeMarketSymbol(marketSymbol) } });
JToken ticker = apiTickers[marketSymbol];
return await ConvertToExchangeTickerAsync(marketSymbol, ticker);
}
And the Caching method:
public static async Task<T> CacheMethod<T>(this ICache cache, Dictionary<string, TimeSpan> methodCachePolicy, Func<Task<T>> method, params object?[] arguments) where T : class
{
await new SynchronizationContextRemover();
methodCachePolicy.ThrowIfNull(nameof(methodCachePolicy));
if (arguments.Length % 2 == 0)
{
throw new ArgumentException("Must pass function name and then name and value of each argument");
}
string methodName = (arguments[0] ?? string.Empty).ToStringInvariant();
string cacheKey = methodName;
for (int i = 1; i < arguments.Length;)
{
cacheKey += "|" + (arguments[i++] ?? string.Empty).ToStringInvariant() + "=" + (arguments[i++] ?? string.Empty).ToStringInvariant("(null)");
}
if (methodCachePolicy.TryGetValue(methodName, out TimeSpan cacheTime))
{
return (await cache.Get<T>(cacheKey, async () =>
{
T innerResult = await method();
return new CachedItem<T>(innerResult, CryptoUtility.UtcNow.Add(cacheTime));
})).Value;
}
else
{
return await method();
}
}

First, it should be pointed out that what you are trying to achieve is performance, not asynchrony. And you are trying to achieve it by running multiple operations concurrently, not in parallel. To keep the explanation simple I'll use a simplified version of your code, and I'll assume that each operation is a direct web request, without an intermediate caching layer, and with no dependencies on values existing in dictionaries.
foreach (var symbol in selectedSymbols)
{
var ticker = await selectedApi.GetTickerAsync(symbol);
}
The above code runs the operations sequentially. Each operation starts after the completion of the previous one.
var tasks = new List<Task<ExchangeTicker>>();
foreach (var symbol in selectedSymbols)
{
tasks.Add(selectedApi.GetTickerAsync(symbol));
}
var tickers = await Task.WhenAll(tasks);
The above code runs the operations concurrently. All operations start at once. The total duration is expected to be the duration of the longest running operation.
Parallel.ForEach(selectedSymbols, async symbol =>
{
var ticker = await selectedApi.GetTickerAsync(symbol);
});
The above code runs the operations concurrently, like the previous version with Task.WhenAll. It offers no advantage, while having the huge disadvantage that you no longer have a way to await the operations to complete. The Parallel.ForEach method will return immediately after launching the operations, because the Parallel class doesn't understand async delegates (it does not accept Func<Task> lambdas). Essentially there are a bunch of async void lambdas in there, that are running out of control, and in case of an exception they will bring down the process.
So the correct way to run the operations concurrently is the second way, using a list of tasks and Task.WhenAll. Since you've already measured this method and haven't observed any performance improvement, I am assuming that there is something else that serializes the concurrent operations. It could be something like a SemaphoreSlim hidden somewhere in your code, or some mechanism on the server side that throttles your requests. You'll have to investigate further to find where and why the throttling happens.
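The difference between the last two versions can be observed with a small sketch. FakeRequestAsync is a hypothetical stand-in for GetTickerAsync (a 300 ms simulated request that bumps a counter when it finishes); the code uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for GetTickerAsync: a 300 ms simulated request
// that increments the given counter when it finishes.
static async Task FakeRequestAsync(int[] counter)
{
    await Task.Delay(300);
    Interlocked.Increment(ref counter[0]);
}

int[] items = Enumerable.Range(0, 5).ToArray();

// The async lambda becomes an async void Action<int>, so Parallel.ForEach
// returns as soon as every lambda has reached its first await.
int[] finishedViaParallel = { 0 };
Parallel.ForEach(items, async _ => await FakeRequestAsync(finishedViaParallel));
int seenAfterParallel = Volatile.Read(ref finishedViaParallel[0]);
Console.WriteLine($"Parallel.ForEach returned with {seenAfterParallel} of {items.Length} finished");

// Task.WhenAll completes only after every request has finished.
int[] finishedViaWhenAll = { 0 };
await Task.WhenAll(items.Select(_ => FakeRequestAsync(finishedViaWhenAll)));
int seenAfterWhenAll = Volatile.Read(ref finishedViaWhenAll[0]);
Console.WriteLine($"Task.WhenAll returned with {seenAfterWhenAll} of {items.Length} finished");
```

A Stopwatch stopped right after Parallel.ForEach therefore measures only the time needed to launch the requests, not the time they take to complete.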

In general, when you do not see an improvement from multi-threading, it is because your task is not CPU-limited, or not large enough to offset the overhead.
In your example, i.e.:
selectedApi.GetTickerAsync(symbol);
This can have 2 reasons:
1: Looking up the ticker is brutally fast and it should not be async to start with, e.g. when you look it up in a dictionary.
2: This is running via an HTTP connection where the runtime is LIMITING THE NUMBER OF CONCURRENT CALLS. Regardless of how many tasks you open, it will not use more than 4 at the same time.
Oh, and 3: you think async is using threads. It is not. That is particularly not the case in code like this:
await selectedApi.GetTickerAsync(symbol);
Where you basically IMMEDIATELY WAIT FOR THE RESULT. There is no multi-threading involved here at all.
foreach (IExchangeAPI selectedApi in selectedApis)
{
if (exchangeSymbols.TryGetValue(selectedApi.Name, out symbol))
{
ticker = await selectedApi.GetTickerAsync(symbol);
}
}
This is linear, non-threaded code that uses an async interface so that the current thread is not blocked while the (likely expensive I/O) operation is in progress. It starts one, THEN WAITS FOR THE RESULT. No 2 queries ever start at the same time.
If you want a possibly more scalable way (just as an example):
In the foreach, do not await but add the task to a list of tasks.
Then start awaiting once all the tasks have started, e.g. in a 2nd loop.
WAY not perfect, but at least the runtime has a CHANCE to do multiple lookups at the same time. Your await makes sure that you essentially run single-threaded code, except async, so your thread goes back into the pool (and is not waiting for results), increasing your scalability - something possibly not relevant in this case and definitely not measured in your test.

Task post-processing to start soon after 2 tasks are done

I have a core task retrieving some core data and multiple other sub-tasks fetching extra data. I would like to run an enricher process on the core data as soon as the core task and any of the sub-tasks are ready. Would you know how to do so?
I thought about something like this, but I am not sure it does what I want:
// Starting the tasks
var coreDataTask = new Task(...);
var extraDataTask1 = new Task(...);
var extraDataTask2 = new Task(...);
coreDataTask.Start();
extraDataTask1.Start();
extraDataTask2.Start();
// Enriching the results
Task.WaitAll(coreDataTask, extraDataTask1);
EnrichCore(coreDataTask.Result, extraDataTask1.Result);
Task.WaitAll(coreDataTask, extraDataTask2);
EnrichCore(coreDataTask.Result, extraDataTask2.Result);
Also, given that the enrichment is on the same core object, I guess I would need to lock it somewhere?
Thanks in advance!

Here is another idea taking advantage of Task.WhenAny() to detect when tasks are completing.
For this minimal example, I just assume that the core data and extra data are strings. But you can adjust for whatever your type is.
Also, I am not actually doing any processing. You would have to plug in your processing.
Also, an assumption I am making, that is not really clear, is that you are mostly trying to parallelize the gathering of your data because that's the expensive part, but that the enriching part is actually pretty fast. Based on that assumption, you'll notice that the tasks run in parallel to gather the core data and extra data. But as the data becomes available, the core data is enriched synchronously to avoid having to complicate the code with locking.
If you copy-paste the code below, you should be able to run it as is to see how it works.
public static void Main(string[] args)
{
StartWork().Wait();
}
private async static Task StartWork()
{
// start core and extra tasks
Task<string> coreDataTask = Task.Run(() => "core data" /* do something more complicated here */);
List<Task<string>> extraDataTaskList = new List<Task<string>>();
for (int i = 0; i < 10; i++)
{
int x = i;
extraDataTaskList.Add(Task.Run(() => "extra data " + x /* do something more complicated here */));
}
// wait for core data to be ready first.
StringBuilder coreData = new StringBuilder(await coreDataTask);
// enrich core as the extra data tasks complete.
while (extraDataTaskList.Count != 0)
{
Task<string> completedExtraDataTask = await Task.WhenAny(extraDataTaskList);
extraDataTaskList.Remove(completedExtraDataTask);
EnrichCore(coreData, await completedExtraDataTask);
}
Console.WriteLine(coreData.ToString());
}
private static void EnrichCore(StringBuilder coreData, string extraData)
{
coreData.Append(" enriched with ").Append(extraData);
}
EDIT: .NET 4.0 version
Here is how I would change it for .NET 4.0, while still retaining the same overall design:
Task.Run() becomes Task.Factory.StartNew()
Instead of doing await on tasks, I call Result, which is a blocking call that waits for the task to complete.
Use Task.WaitAny instead of Task.WhenAny, which is also a blocking call.
The design remains very similar. The one big difference between both versions of the code is that in the .NET 4.5 version, whenever there is an await, the current thread is free to do other work. In the .NET 4.0 version, whenever you call Task.Result or Task.WaitAny, the current thread blocks until the Task completes. It's possible that this difference is not really important to you. But if it is, just make sure to wrap and run the whole block of code in a background thread or task to free up your main thread.
The other difference is with the exception handling. With the .NET 4.5 version, if any of your tasks fails with an unhandled exception, the exception is automatically unwrapped and propagated in a very transparent manner. With the .NET 4.0 version, you'll be getting AggregateExceptions that you will have to unwrap and handle yourself. If this is a concern, make sure you test this beforehand so you know what to expect.
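To see the exception-handling difference in isolation, here is a small sketch. FailAsync is a hypothetical task that always throws; it is only there to show how the exception surfaces. The code uses top-level statements (C# 9 or later), so it illustrates the behavior rather than being literal .NET 4.0 code.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical failing task, used only to show how the exception surfaces.
static async Task<string> FailAsync()
{
    await Task.Yield();
    throw new InvalidOperationException("boom");
}

string caughtByAwait = "";
string caughtByResult = "";
string innerOfAggregate = "";

try
{
    await FailAsync();
}
catch (Exception ex)
{
    // await rethrows the original exception.
    caughtByAwait = ex.GetType().Name;
}

try
{
    _ = FailAsync().Result;
}
catch (Exception ex)
{
    // Result wraps the original exception in an AggregateException.
    caughtByResult = ex.GetType().Name;
    innerOfAggregate = (ex as AggregateException)?.InnerException?.GetType().Name ?? "";
}

Console.WriteLine($"await:  {caughtByAwait}");
Console.WriteLine($"Result: {caughtByResult} wrapping {innerOfAggregate}");
```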
Personally, I try to avoid Task.ContinueWith whenever I can. It tends to make the code really ugly and hard to read.
public static void Main(string[] args)
{
// start core and extra tasks
Task<string> coreDataTask = Task.Factory.StartNew(() => "core data" /* do something more complicated here */);
List<Task<string>> extraDataTaskList = new List<Task<string>>();
for (int i = 0; i < 10; i++)
{
int x = i;
extraDataTaskList.Add(Task.Factory.StartNew(() => "extra data " + x /* do something more complicated here */));
}
// wait for core data to be ready first.
StringBuilder coreData = new StringBuilder(coreDataTask.Result);
// enrich core as the extra data tasks complete.
while (extraDataTaskList.Count != 0)
{
int indexOfCompletedTask = Task.WaitAny(extraDataTaskList.ToArray());
Task<string> completedExtraDataTask = extraDataTaskList[indexOfCompletedTask];
extraDataTaskList.Remove(completedExtraDataTask);
EnrichCore(coreData, completedExtraDataTask.Result);
}
Console.WriteLine(coreData.ToString());
}
private static void EnrichCore(StringBuilder coreData, string extraData)
{
coreData.Append(" enriched with ").Append(extraData);
}

I think what you probably want is "ContinueWith" (Documentation here : https://msdn.microsoft.com/en-us/library/dd270696(v=vs.110).aspx). That is as long as your enriching doesn't need to be done in a specific order.
The code would look something like the following :
var coreTask = new Task<object>(() => { return null; });
var enrichTask1 = new Task<object>(() => { return null; });
var enrichTask2 = new Task<object>(() => { return null; });
coreTask.Start();
coreTask.Wait();
//Create your continue tasks here with the data you want.
enrichTask1.ContinueWith(task => {/*Do enriching here with task.Result*/});
//Start all enricher tasks here.
enrichTask1.Start();
//Wait for all the tasks to complete here.
Task.WaitAll(enrichTask1);
You still need to run your CoreTask first as that's required to finish before all enriching tasks. But from there you can start all tasks, and tell them when they are done to "ContinueWith" doing something else.
You should also take a quick look at the "Enricher Pattern", which may help you in general with what you want to achieve (outside of threading). Examples like here : http://www.enterpriseintegrationpatterns.com/DataEnricher.html

C# async and tasks

I have a function that sends requests to search for information from a url. The search criteria is a list and the search iterates through each item and requests info from the url. To speed it up I divide the list into x subsets, and create a task for each subset. Then each subset sends 3 simultaneous requests, as follows:
This is the main entry point:
Search search = new Search();
await Task.Run(() => search.Start());
The Start function:
public void Start()
{
//Each subset is a List<T> ie where T is certain search criteria
//If originalList.Count = 30 and max items per subset is 10, then subsets will be 3 lists of 10 items each
var subsets = CreateSubsets(originalList);
List<Task> tasks = new List<Task>(subsets.Count);
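for (int i = 0; i < subsets.Count; i++)
{
//Copy to a local so each task captures its own subset, not the shared loop variable i
var subset = subsets[i];
tasks.Add(Task.Factory.StartNew(() => SearchSubset(subset)));
}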
Task.WaitAll(tasks.ToArray());
foreach (Task task in tasks)
if (task != null)
task.Dispose();
}
private void SearchSubset(List<SearchCriteria> subset)
{
//Checking that i+1 and i+2 is within subset.Count-1 has been omitted
for (int i = 0; i < subset.Count; i+=3)
{
Task[] tasks = {Task.Factory.StartNew(() => SearchCriteria(subset[i])),
Task.Factory.StartNew(() => SearchCriteria(subset[i+1])),
Task.Factory.StartNew(() => SearchCriteria(subset[i+2]))};
//Wait & dispose like above
}
}
private void SearchCriteria(SearchCriteria criteria)
{
//SearchForCriteria uses WebRequest and WebResponse (callback)
//to query the url and return the response.content
var results = SearchForCriteria(criteria);
//process results...
}
The above code works fine and the search is quite fast. However, does the above code create too much overhead, and is there a cleaner (or simpler) way to achieve the same results?

This is not the most efficient method, but if this is for a desktop application, efficiency isn't your primary concern anyway. So, unless you are actually seeing performance degradation from this code, you shouldn't change it.
That said, I would have approached this differently.
You're using the TPL to parallelize I/O-bound operations. You're using dynamic parallelism, the most complex kind; as Jeff Mercado commented, your code would be simpler and slightly more efficient if you used a higher-level parallelism abstraction such as Parallel or PLINQ.
However, any parallel approach is going to waste thread pool threads by blocking them. Since this is I/O-bound, I would recommend using async/await to make them concurrent.
If you want to do simple throttling, you can use SemaphoreSlim. I don't think you need to do throttling like this in addition to your subsets, but if you want an async equivalent to your existing code, it would look something like this:
public Task SearchAsync()
{
var subsets = CreateSubsets(originalList);
return Task.WhenAll(subsets.Select(subset => SearchSubsetAsync(subset)));
}
private Task SearchSubsetAsync(List<SearchCriteria> subset)
{
var semaphore = new SemaphoreSlim(3);
return Task.WhenAll(subset.Select(criteria => SearchCriteriaAsync(criteria, semaphore)));
}
private async Task SearchCriteriaAsync(SearchCriteria criteria, SemaphoreSlim semaphore)
{
await semaphore.WaitAsync();
try
{
// SearchForCriteriaAsync uses HttpClient (async).
var results = await SearchForCriteriaAsync(criteria);
// Consider returning results rather than processing them here.
}
finally
{
semaphore.Release();
}
}
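A runnable sketch of the same throttling idea, to check that the semaphore really caps concurrency. FakeSearchAsync is a hypothetical stand-in for SearchCriteriaAsync (a 100 ms simulated request), and the inFlight/maxInFlight counters exist only for the demonstration. It uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var semaphore = new SemaphoreSlim(3);   // at most 3 simulated requests in flight
var gate = new object();
int inFlight = 0, maxInFlight = 0, processed = 0;

// Hypothetical stand-in for SearchCriteriaAsync: a 100 ms simulated request.
async Task FakeSearchAsync(int criteria)
{
    await semaphore.WaitAsync();
    try
    {
        // Track how many requests are running and the highest value seen.
        lock (gate) { inFlight++; maxInFlight = Math.Max(maxInFlight, inFlight); }
        await Task.Delay(100);
        lock (gate) { inFlight--; processed++; }
    }
    finally
    {
        semaphore.Release();
    }
}

// Start all ten searches at once; the semaphore lets only three proceed at a time.
await Task.WhenAll(Enumerable.Range(0, 10).Select(i => FakeSearchAsync(i)));

Console.WriteLine($"processed={processed}, max in flight={maxInFlight}");
```

Note that no thread is blocked while a search waits for the semaphore, which is the main difference from the Task.Factory.StartNew version in the question.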

Why does WebClient.DownloadStringTaskAsync() block ? - new async API/syntax/CTP

For some reason there is a pause after the program below starts. I believe that WebClient().DownloadStringTaskAsync() is the cause.
class Program
{
static void Main(string[] args)
{
AsyncReturnTask();
for (int i = 0; i < 15; i++)
{
Console.WriteLine(i);
Thread.Sleep(100);
}
}
public static async void AsyncReturnTask()
{
var result = await DownloadAndReturnTaskStringAsync();
Console.WriteLine(result);
}
private static async Task<string> DownloadAndReturnTaskStringAsync()
{
return await new WebClient().DownloadStringTaskAsync(new Uri("http://www.weather.gov"));
}
}
As far as I understand my program should start counting from 0 to 15 immediately. Am I doing something wrong?
I had the same problem with the original Netflix download sample (which you get with the CTP): after pressing the search button, the UI first freezes, and after some time it becomes responsive while loading the next movies. I believe it didn't freeze in Anders Hejlsberg's presentation at PDC 2010.
One more thing. When instead of
return await new WebClient().DownloadStringTaskAsync(new Uri("http://www.weather.gov"));
I use my own method:
return await ReturnOrdinaryTask();
Which is:
public static Task<string> ReturnOrdinaryTask()
{
var t = Task.Factory.StartNew(() =>
{
for (int i = 0; i < 10; i++)
{
Console.WriteLine("------------- " + i.ToString());
Thread.Sleep(100);
}
return "some text";
});
return t;
}
It works as it should. I mean it doesn't load anything, but it starts immediately and doesn't block the main thread, while doing its work.
Edit
OK, what I believe right now is: the WebClient.DownloadStringTaskAsync function is screwed up. It should work without the initial blocking period, like this:
static void Main(string[] args)
{
WebClient cli = new WebClient();
Task.Factory.StartNew(() =>
{
cli.DownloadStringCompleted += (sender, e) => Console.WriteLine(e.Result);
cli.DownloadStringAsync(new Uri("http://www.weather.gov"));
});
for (int i = 0; i < 100; i++)
{
Console.WriteLine(i);
Thread.Sleep(100);
}
}

While your program does block for a while, it does resume execution in the for loop, before the result is returned from the remote server.
Remember that the new async API is still single-threaded. So WebClient().DownloadStringTaskAsync() still needs to run on your thread until the request has been prepared and sent to the server, before it can await and yield execution back to your program flow in Main().
I think the results you are seeing are due to the fact that it takes some time to create and send the request out from your machine. Only when that has finished can the implementation of DownloadStringTaskAsync wait for network I/O and the remote server to complete, and return execution to you.
On the other hand, your ReturnOrdinaryTask method just initializes a task, gives it a workload, and tells it to start. Then it returns immediately. That is why you don't see a delay when using ReturnOrdinaryTask.
Here are some links on the subject: Eric Lippert's blog (he is one of the language designers), as well as Jon Skeet's initial blog post about it. Eric has a series of 5 posts about continuation-passing style, which is what async and await are really about. If you want to understand the new feature in detail, you might want to read Eric's posts about CPS and async. Anyway, both links above do a good job of explaining a very important fact:
Asynchronous != parallel
In other words, async and await do not spin up new threads for you. They just let you resume execution of your normal flow while a blocking operation is in progress - times when your CPU would just sit and do nothing in a synchronous program, waiting for some external operation to complete.
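The general mechanism (an async method runs synchronously on the caller's thread until its first incomplete await) can be shown with a sketch. SlowStartAsync is a hypothetical method that simulates slow request setup with Thread.Sleep; whether the delay in WebClient really comes from request preparation is my assumption, as noted above. The code uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

var log = new List<string>();
void Log(string message) { lock (log) log.Add(message); }

// Hypothetical method: slow synchronous setup followed by a truly asynchronous wait.
async Task SlowStartAsync()
{
    Log("setup start");
    Thread.Sleep(200);        // synchronous part: runs on the caller's thread
    Log("setup end");
    await Task.Delay(200);    // first incomplete await: control returns to the caller here
    Log("async part done");
}

Task pending = SlowStartAsync();   // does not return until "setup end" has been logged
Log("caller resumed");
await pending;

Console.WriteLine(string.Join(" -> ", log));
```

The caller only resumes after the synchronous setup has finished, which matches the pause observed before the counting loop starts.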
Edit
Just to be clear about what is happening: DownloadStringTaskAsync sets up a continuation, then calls WebClient.DownloadStringAsync on the same thread, and then yields execution back to your code. Therefore, the blocking time you are seeing before the loop starts counting is the time it takes DownloadStringAsync to complete. Your program with async and await is very close to being the equivalent of the following program, which exhibits the same behaviour as yours: an initial block, then counting starts, and somewhere in the middle the async operation finishes and prints the content from the requested URL:
static void Main(string[] args)
{
WebClient cli = new WebClient();
cli.DownloadStringCompleted += (sender, e) => Console.WriteLine(e.Result);
cli.DownloadStringAsync(new Uri("http://www.weather.gov")); // Blocks until request has been prepared
for (int i = 0; i < 15; i++)
{
Console.WriteLine(i);
Thread.Sleep(100);
}
}
Note: I am by no means an expert on this subject, so I might be wrong on some points. Feel free to correct my understanding of the subject, if you think this is wrong - I just looked at the PDC presentation and played with the CTP last night.

Are you sure the issue isn't related to the proxy configuration settings being detected from IE/Registry/Somewhere Slow?
Try setting webClient.Proxy = null (or specifying settings in app.config) and your "blocking" period should be minimal.

Are you pressing F5 or Ctrl+F5 to run it? With F5 there's a delay for VS just to search for the symbols for AsyncCtpLibrary.dll...
