I'm trying to create an asynchronous console app that does some work on a collection. I have one version that uses a parallel for loop and another that uses async/await. I expected the async/await version to behave like the parallel version, but it executes synchronously. What am I doing wrong?
public class Program
{
public static void Main(string[] args)
{
var worker = new Worker();
worker.ParallelInit();
var t = worker.Init();
t.Wait();
Console.ReadKey();
}
}
public class Worker
{
public async Task<bool> Init()
{
var series = Enumerable.Range(1, 5).ToList();
foreach(var i in series)
{
Console.WriteLine("Starting Process {0}", i);
var result = await DoWorkAsync(i);
if (result)
{
Console.WriteLine("Ending Process {0}", i);
}
}
return true;
}
public async Task<bool> DoWorkAsync(int i)
{
Console.WriteLine("working..{0}", i);
await Task.Delay(1000);
return true;
}
public bool ParallelInit()
{
var series = Enumerable.Range(1, 5).ToList();
Parallel.ForEach(series, i =>
{
Console.WriteLine("Starting Process {0}", i);
DoWorkAsync(i);
Console.WriteLine("Ending Process {0}", i);
});
return true;
}
}
The way you're using the await keyword tells C# that you want to wait each time you pass through the loop, which isn't parallel. You can rewrite your method like this to do what you want, by storing a list of Tasks and then awaiting them all with Task.WhenAll.
public async Task<bool> Init()
{
var series = Enumerable.Range(1, 5).ToList();
var tasks = new List<Task<Tuple<int, bool>>>();
foreach (var i in series)
{
Console.WriteLine("Starting Process {0}", i);
tasks.Add(DoWorkAsync(i));
}
foreach (var task in await Task.WhenAll(tasks))
{
if (task.Item2)
{
Console.WriteLine("Ending Process {0}", task.Item1);
}
}
return true;
}
public async Task<Tuple<int, bool>> DoWorkAsync(int i)
{
Console.WriteLine("working..{0}", i);
await Task.Delay(1000);
return Tuple.Create(i, true);
}
Your code waits for each operation (using await) to finish before starting the next iteration.
Therefore, you don't get any parallelism.
If you want to run an existing asynchronous operation in parallel, you don't need await; you just need to get a collection of Tasks and call Task.WhenAll() to return a task that waits for all of them:
return Task.WhenAll(list.Select(DoWorkAsync));
public async Task<bool> Init()
{
var series = Enumerable.Range(1, 5);
// await the combined task; without await, Init would return before the work finishes
await Task.WhenAll(series.Select(i => DoWorkAsync(i)));
return true;
}
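To see the difference concretely, here is a minimal sketch that times both approaches. The names TimingDemo and WorkAsync are made up for illustration; WorkAsync is just a stand-in for DoWorkAsync with a configurable delay.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

public static class TimingDemo
{
    // Stand-in for DoWorkAsync: waits delayMs, then returns its input.
    public static async Task<int> WorkAsync(int i, int delayMs)
    {
        await Task.Delay(delayMs);
        return i;
    }

    // Awaits inside the loop: each call finishes before the next one starts.
    public static async Task<long> SequentialAsync(int count, int delayMs)
    {
        var sw = Stopwatch.StartNew();
        foreach (var i in Enumerable.Range(1, count))
        {
            await WorkAsync(i, delayMs);
        }
        return sw.ElapsedMilliseconds;
    }

    // Starts every call first, then awaits them all together.
    public static async Task<long> ConcurrentAsync(int count, int delayMs)
    {
        var sw = Stopwatch.StartNew();
        var tasks = Enumerable.Range(1, count).Select(i => WorkAsync(i, delayMs)).ToList();
        await Task.WhenAll(tasks);
        return sw.ElapsedMilliseconds;
    }

    public static async Task Main()
    {
        Console.WriteLine("Sequential: {0} ms", await SequentialAsync(5, 200));
        Console.WriteLine("Concurrent: {0} ms", await ConcurrentAsync(5, 200));
    }
}
```

The sequential version should take roughly count times the delay, while the concurrent one should take roughly one delay; exact numbers depend on timer resolution.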
In C# 7.0 you can give semantic names to each member of the tuple. Here is Tim S.'s answer using the new syntax:
public async Task<bool> Init()
{
var series = Enumerable.Range(1, 5).ToList();
var tasks = new List<Task<(int Index, bool IsDone)>>();
foreach (var i in series)
{
Console.WriteLine("Starting Process {0}", i);
tasks.Add(DoWorkAsync(i));
}
foreach (var task in await Task.WhenAll(tasks))
{
if (task.IsDone)
{
Console.WriteLine("Ending Process {0}", task.Index);
}
}
return true;
}
public async Task<(int Index, bool IsDone)> DoWorkAsync(int i)
{
Console.WriteLine("working..{0}", i);
await Task.Delay(1000);
return (i, true);
}
You could also deconstruct the tuple in the foreach and drop the task. prefix (deconstruction is positional, so Index comes first):
// ...
foreach (var (Index, IsDone) in await Task.WhenAll(tasks))
{
if (IsDone)
{
Console.WriteLine("Ending Process {0}", Index);
}
}
// ...
You can also call an async method from inside a foreach loop to make API calls. Note that blocking on .Result in the loop runs the calls one at a time.
public static void Main(string[] args)
{
List<ZoneDetails> lst = GetRecords();
foreach (var item in lst)
{
// Each iteration blocks on .Result, so the calls run one after another
var result = GetAPIData(item.ZoneId, item.fitnessclassid).Result;
if (result != null && result.EventHistoryId != null)
{
UpdateDB(result);
}
}
}
private static async Task<FODBrandChannelLicense> GetAPIData(int zoneId, int fitnessclassid)
{
HttpClient.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", token);
// await instead of .Result, so no thread is blocked inside the async method
var response = await HttpClient.GetAsync(new Uri(url));
// throws HttpRequestException on a non-success status code
response.EnsureSuccessStatusCode();
Console.WriteLine("API Call completed successfully");
var content = await response.Content.ReadAsStringAsync();
// assumes the payload maps to the declared return type
return JsonConvert.DeserializeObject<FODBrandChannelLicense>(content);
}
To add to the already good answers here, it's always helpful to me to remember that the async method returns a Task.
So in the example in this question, each iteration of the loop has await. This causes the Init() method to return control to its caller with a Task<bool> - not a bool.
Thinking of await as just a magic word that causes execution state to be saved, then skipped to the next available line until ready, encourages confusion: "why doesn't the for loop just skip the line with await and go to the next statement?"
If instead you think of await as something more like a yield statement, that brings a Task with it when it returns control to the caller, in my opinion flow starts to make more sense: "the for loop stops at await, and returns control and the Task to the caller. The for loop won't continue until that is done."
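A minimal sketch of that mental model, recording the order in which things happen. YieldDemo, InitAsync and RunAsync are illustrative names, not code from the question.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class YieldDemo
{
    public static readonly List<string> Log = new List<string>();

    public static async Task<bool> InitAsync()
    {
        Log.Add("Init: before await");
        await Task.Delay(100);              // control returns to the caller here
        Log.Add("Init: after await");
        return true;
    }

    public static async Task RunAsync()
    {
        Task<bool> pending = InitAsync();   // runs synchronously up to the first await
        Log.Add("Caller: got a Task, IsCompleted = " + pending.IsCompleted);
        bool result = await pending;        // resumes once InitAsync has finished
        Log.Add("Caller: result = " + result);
    }

    public static async Task Main()
    {
        await RunAsync();
        Log.ForEach(Console.WriteLine);
        // Init: before await
        // Caller: got a Task, IsCompleted = False
        // Init: after await
        // Caller: result = True
    }
}
```

The caller gets the Task back before "Init: after await" is logged, which is the "yield" behaviour described above.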
In the WPF .net core app there is the following:
An Observable Collection of items (itemObservCollection).
A static readonly HttpClient _httpclient
XML Responses
I am making a URL call to the api on each item in the observable collection (0 to 1000 items in collection). The return is XML. The XML is parsed using XElement. The property values in the observable collection are updated from the XML.
Task.Run is used to run the operation off the UI thread. Parallel.ForEach is used to make the calls in parallel.
I feel I have made the solution overly complicated. Is there a way to simplify this? UpdateItems() is called from a button click.
private async Task UpdateItems()
{
try
{
await Task.Run(() => Parallel.ForEach(itemObservCollection, new ParallelOptions { MaxDegreeOfParallelism = 12 }, async item =>
{
try
{
var apiRequestString = $"http://localhost:6060/" + item.Name;
HttpResponseMessage httpResponseMessage = await _httpclient.GetAsync(apiRequestString);
var httpResponseStream = await httpResponseMessage.Content.ReadAsStreamAsync();
StringBuilder sb = new StringBuilder(1024);
XElement doc = XElement.Load(httpResponseStream);
foreach (var elem in doc.Descendants())
{
if (elem.Name == "ItemDetails")
{
var itemUpdate = itemObservCollection.FirstOrDefault(updateItem => updateItem.Name == item.Name);
if (itemUpdate != null)
{
itemUpdate.Price = decimal.Parse(elem.Attribute("Price").Value);
itemUpdate.Quantity = int.Parse(elem.Attribute("Quantity").Value);
}
}
}
}
catch (Exception ex)
{
LoggerTextBlock.Text = ('\n' + ex.ToString());
}
}));
}
catch (Exception ex)
{
LoggerTextBlock.Text = ('\n' + ex.ToString());
}
}
You could create an array of tasks and await them all using Task.WhenAll.
The following sample code kicks off a task per item in the ObservableCollection<int> and then waits asynchronously for all tasks to finish:
ObservableCollection<int> itemObservCollection =
new ObservableCollection<int>(Enumerable.Range(1, 10));
async Task SendAsync()
{
//query the HTTP API here...
await Task.Delay(1000);
}
await Task.WhenAll(itemObservCollection.Select(x => SendAsync()).ToArray());
If you want to limit the number of concurrent requests, you could either iterate through subsets of the source collection to send requests in batches, or use a SemaphoreSlim to limit the number of requests in flight:
Task[] tasks = new Task[itemObservCollection.Count];
using (SemaphoreSlim semaphoreSlim = new SemaphoreSlim(12))
{
for (int i = 0; i < itemObservCollection.Count; ++i)
{
async Task SendAsync()
{
//query the HTTP API here...
try
{
await Task.Delay(5000);
}
finally
{
semaphoreSlim.Release();
}
}
await semaphoreSlim.WaitAsync();
tasks[i] = SendAsync();
}
await Task.WhenAll(tasks);
}
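For comparison, the batching option mentioned above could look something like the following sketch. BatchDemo, SendAsync and MaxInFlight are hypothetical names; SendAsync stands in for the HTTP call and only records how many calls are in flight.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class BatchDemo
{
    private static readonly object Gate = new object();
    private static int _inFlight;
    public static int MaxInFlight;

    // Hypothetical stand-in for the HTTP call; tracks peak concurrency.
    private static async Task SendAsync(int item)
    {
        lock (Gate)
        {
            _inFlight++;
            MaxInFlight = Math.Max(MaxInFlight, _inFlight);
        }
        await Task.Delay(50);
        lock (Gate)
        {
            _inFlight--;
        }
    }

    // Sends the items batchSize at a time; a batch must finish before the next one starts.
    public static async Task SendInBatchesAsync(IReadOnlyList<int> items, int batchSize)
    {
        for (int start = 0; start < items.Count; start += batchSize)
        {
            var batch = items.Skip(start).Take(batchSize);
            await Task.WhenAll(batch.Select(x => SendAsync(x)));
        }
    }

    public static async Task Main()
    {
        await SendInBatchesAsync(Enumerable.Range(1, 25).ToList(), 10);
        Console.WriteLine("Peak concurrency: {0}", MaxInFlight);
    }
}
```

Batching is simpler to read, but each batch waits for its slowest request, whereas the SemaphoreSlim version starts a new request as soon as any slot frees up.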
I have two loops that use SemaphoreSlim and an array of strings, "Contents".
A foreach loop:
var allTasks = new List<Task>();
var throttle = new SemaphoreSlim(10,10);
foreach (string s in Contents)
{
await throttle.WaitAsync();
allTasks.Add(
Task.Run(async () =>
{
try
{
rootResponse.Add(await POSTAsync(s, siteurl, src, target));
}
finally
{
throttle.Release();
}
}));
}
await Task.WhenAll(allTasks);
A for loop:
var allTasks = new List<Task>();
var throttle = new SemaphoreSlim(10,10);
for(int s=0;s<Contents.Count;s++)
{
await throttle.WaitAsync();
allTasks.Add(
Task.Run(async () =>
{
try
{
rootResponse[s] = await POSTAsync(Contents[s], siteurl, src, target);
}
finally
{
throttle.Release();
}
}));
}
await Task.WhenAll(allTasks);
The foreach loop runs well, but in the for loop Task.WhenAll(allTasks) throws an out-of-range exception. I want the Contents[] index and the List index to match.
Can I fix the for loop, or is there a better approach?
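This would fix your current problems:
for (int s = 0; s < Contents.Count; s++)
{
// copy the loop variable: the lambda runs later, when s may already equal Contents.Count
var index = s;
var content = Contents[index];
allTasks.Add(
Task.Run(async () =>
{
await throttle.WaitAsync();
try
{
// rootResponse must already hold Contents.Count elements for the indexer to work
rootResponse[index] = await POSTAsync(content, siteurl, src, target);
}
finally
{
throttle.Release();
}
}));
}
await Task.WhenAll(allTasks);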
However, this is a fairly messy piece of code. This looks a bit neater:
public static async Task DoStuffAsync(Content[] contents, string siteurl, string src, string target)
{
var throttle = new SemaphoreSlim(10, 10);
// local method
async Task<(Content, SomeResponse)> PostAsyncWrapper(Content content)
{
await throttle.WaitAsync();
try
{
// return a content and result pair
return (content, await PostAsync(content, siteurl, src, target));
}
finally
{
throttle.Release();
}
}
var results = await Task.WhenAll(contents.Select(PostAsyncWrapper));
// do stuff with your results pairs here
}
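On the index-matching part of the question: Task.WhenAll returns its results in the same order as the tasks passed to it, regardless of which task finishes first. A minimal sketch, where OrderDemo and this PostAsync are stand-ins invented for illustration (later items deliberately finish sooner):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class OrderDemo
{
    // Hypothetical stand-in for the real POST call.
    private static async Task<string> PostAsync(string content, int delayMs)
    {
        await Task.Delay(delayMs);
        return "response for " + content;
    }

    public static async Task<string[]> PostAllAsync(IReadOnlyList<string> contents)
    {
        // The first item gets the longest delay, so it completes last.
        var tasks = contents.Select((c, i) => PostAsync(c, (contents.Count - i) * 50));
        return await Task.WhenAll(tasks);
    }

    public static async Task Main()
    {
        var contents = new List<string> { "a", "b", "c" };
        string[] responses = await PostAllAsync(contents);
        for (int i = 0; i < contents.Count; i++)
        {
            Console.WriteLine("{0} -> {1}", contents[i], responses[i]);
        }
        // a -> response for a
        // b -> response for b
        // c -> response for c
    }
}
```

So responses[i] lines up with contents[i] without any shared index variable being captured.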
There are many other ways you could do this: PLINQ, Parallel.For, Parallel.ForEach, or just tidying up the captures in your loops as above.
However, since you have an IO-bound workload and async methods that run it, the most appropriate solution is the async/await pattern, which neither Parallel.For nor Parallel.ForEach caters for well.
Another way is the TPL Dataflow library, which can be found in the System.Threading.Tasks.Dataflow NuGet package.
Code
public static async Task DoStuffAsync(Content[] contents, string siteurl, string src, string target)
{
async Task<(Content, SomeResponse)> PostAsyncWrapper(Content content)
{
return (content, await PostAsync(content, siteurl, src, target));
}
var bufferblock = new BufferBlock<(Content, SomeResponse)>();
var actionBlock = new TransformBlock<Content, (Content, SomeResponse)>(
content => PostAsyncWrapper(content),
new ExecutionDataflowBlockOptions
{
EnsureOrdered = false,
MaxDegreeOfParallelism = 100,
SingleProducerConstrained = true
});
actionBlock.LinkTo(bufferblock);
foreach (var content in contents)
actionBlock.Post(content);
actionBlock.Complete();
await actionBlock.Completion;
if (bufferblock.TryReceiveAll(out var result))
{
// do stuff with your results pairs here
}
}
Basically, this creates a BufferBlock and a TransformBlock. You pump your workload into the TransformBlock, which has a degree of parallelism set in its options and pushes its output into the BufferBlock; you then await completion and read your results.
Why Dataflow? Because it works with async/await, it has MaxDegreeOfParallelism, it is designed for IO-bound or CPU-bound workloads, and it's extremely simple to use. Additionally, as most data is generally processed in many ways (in a pipeline), you can use it to pipe and manipulate streams of data in sequence, in parallel, or any way you choose down the line.
Anyway good luck
I'm a bit new to async programming in C#, and I'm struggling with a small but frustrating challenge.
I have an ASP.NET MVC Web API that runs nicely. On the other side, I have created this Web API client method:
public async Task<DTOVArt> GetVArtFormArt(DTOArt art)
{
using (var client = GetHttpClient())
{
var response = await client.GetAsync("api/APIVArt/Get/" + art.ART_ID);
if (!response.IsSuccessStatusCode) return null;
var arts = await response.Content.ReadAsAsync<DTOVArt>();
return arts;
}
}
which seems to work fine.
This method is called from a WPF view model, and that's where my problem comes in:
private DTOVObs TransFormAaretsGang(DTOAaretsGang aaretsGang)
{
var dtovObs = new DTOVObs();
using (var artservice = new ArtService())
{
artservice.GetVArtFormArt(new DTOArt() {ART_ID = aaretsGang.ART_ID}).ContinueWith(t =>
{
dtovObs.Familie_id = t.Result.Familie_id;
dtovObs.Gruppe_id = t.Result.Gruppe_id;
dtovObs.ART_ID = t.Result.ART_ID;
if (aaretsGang.Dato != null) dtovObs.Aarstal = aaretsGang.Dato.Value.Year;
return dtovObs;
});
}
return dtovObs;
}
The problem is that this last method executes its return statement before the ContinueWith block runs, and that block is what sets the values on the object that should be returned.
Any attempt to do any kind of Wait() or using .Result instead of ContinueWith just blocks everything.
And if I do the return inside the ContinueWith block, the C# compiler says the method is missing a return statement, which is true.
That's the nature of async. Because your call is async, it will be executed later, and the code below it just continues executing.
Try adding an await and removing the ContinueWith. If this is the root of the call chain, the caller is usually an event handler:
private async Task<DTOVObs> TransFormAaretsGang(DTOAaretsGang aaretsGang)
{
var dtovObs = new DTOVObs();
// create the service here, as in the question's code
using (var artservice = new ArtService())
{
DTOVArt result = await artservice.GetVArtFormArt(new DTOArt() {ART_ID = aaretsGang.ART_ID});
dtovObs.Familie_id = result.Familie_id;
dtovObs.Gruppe_id = result.Gruppe_id;
dtovObs.ART_ID = result.ART_ID;
}
if (aaretsGang.Dato != null)
dtovObs.Aarstal = aaretsGang.Dato.Value.Year;
return dtovObs;
}
If you would rather keep ContinueWith and return a Task that any caller of this method can await, drop the async keyword and return the continuation task:
private Task<DTOVObs> TransFormAaretsGang(DTOAaretsGang aaretsGang)
{
var dtovObs = new DTOVObs();
var artservice = new ArtService();
return artservice.GetVArtFormArt(new DTOArt() {ART_ID = aaretsGang.ART_ID}).ContinueWith(t =>
{
// dispose only once the call has completed; a using block around the return would dispose too early
using (artservice)
{
dtovObs.Familie_id = t.Result.Familie_id;
dtovObs.Gruppe_id = t.Result.Gruppe_id;
dtovObs.ART_ID = t.Result.ART_ID;
if (aaretsGang.Dato != null) dtovObs.Aarstal = aaretsGang.Dato.Value.Year;
return dtovObs;
}
});
}
When you use async / await, you don't have to use ContinueWith anymore. ContinueWith means: wait until the previous is finished and use the result to do the next.
async await does this for you.
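As a minimal sketch of that equivalence, here are the two styles side by side. ContinuationDemo and GetNumberAsync are illustrative names only.

```csharp
using System;
using System.Threading.Tasks;

public static class ContinuationDemo
{
    private static async Task<int> GetNumberAsync()
    {
        await Task.Delay(50);
        return 21;
    }

    // Explicit continuation: the lambda runs once the first task has finished.
    public static Task<int> DoubleWithContinueWith()
    {
        return GetNumberAsync().ContinueWith(t => t.Result * 2);
    }

    // Same logic with await: the code after the await is the continuation.
    public static async Task<int> DoubleWithAwait()
    {
        int n = await GetNumberAsync();
        return n * 2;
    }

    public static async Task Main()
    {
        Console.WriteLine(await DoubleWithContinueWith()); // 42
        Console.WriteLine(await DoubleWithAwait());        // 42
    }
}
```

One practical difference: if the first task fails, t.Result throws an AggregateException wrapping the original error, while await rethrows the original exception directly.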
Suppose you have an async function. All async functions return either Task (instead of void) or Task<TResult> (if the result is of type TResult):
private async Task<int> SlowAdditionAsync(int a, int b)
{
await Task.Delay(TimeSpan.FromSeconds(5)); // causing the slow part
return a + b;
}
usage:
private async Task PerformSlowAddition()
{
int a = ...;
int b = ...;
int x = await SlowAdditionAsync(a, b);
// after the await, the task is finished and the result is in x.
// You can use the result just as you would inside ContinueWith:
DisplayAddition(x);
}
Or if you want to do something else during the calculation:
private async Task PerformSlowAddition()
{
int a = ...;
int b = ...;
var taskSlowAddition = SlowAdditionAsync(a, b);
DoSomethingElse(); // while the calculator does its thing
// now we need the result:
int x = await taskSlowAddition;
// no need to use ContinueWith, the next statement will be executed:
DisplayAddition(x);
}
Remember:
Every function that awaits a function returning Task or Task<TResult> should be declared async.
All async functions return Task if they would otherwise return void, or Task<TResult> if they would return TResult.
There is one exception to the Task return: event handlers return void.
After calling an async function, you can do other things.
When you need the result, use await.
You can only await a Task or a Task<TResult>.
The value of await Task<TResult> is the TResult.
There is only one kind of async function that doesn't have to return a task, and that is the event handler:
private async void OnButton1_Clicked(object sender, ...)
{
var taskX = DoSomethingAsync(...);
DoSomethingElse();
// now we need the result of taskX:
var x = await taskX;
ProcessResult(x);
}
Note that although the event handler doesn't return a task, it is still async.
If you have some statements that need to run in the background while your user interface stays responsive, use Task.Factory.StartNew() or the more modern Task.Run():
private int SlowCalculation(int a, int b)
{
// do something really difficult and slow
System.Threading.Thread.Sleep(TimeSpan.FromSeconds(5));
return a + b;
}
// make it async:
private async Task<int> SlowCalculationAsync(int a, int b)
{
return await Task.Run( () => SlowCalculation(a, b));
}
usage:
private async Task CalculateAsync()
{
int a = ...;
int b = ...;
int x = await SlowCalculationAsync(a, b);
Display(x);
}
private async void OnButton1_clicked(object sender, ...)
{
await CalculateAsync();
}
I am trying to learn TPL. I write to files in a parallel manner like this:
public async Task SaveToFilesAsync(string path, List<string> list, CancellationToken ct)
{
int count = 0;
foreach (var str in list)
{
string fullPath = path + @"\" + count.ToString() + "_element.txt";
using (var sw = File.CreateText(fullPath))
{
await sw.WriteLineAsync(str);
}
count++;
Log("Saved in thread: {0} to {1}",
Environment.CurrentManagedThreadId,
fullPath);
if (ct.IsCancellationRequested)
ct.ThrowIfCancellationRequested();
}
}
And call it like this:
var tasks = new List<Task>();
try
{
tasks.Add(SaveToFilesAsync(path, myListOfStrings, cts.Token));
}
catch (Exception ex)
{
Log("Failed to save: " + ex.Message);
throw;
}
tasks.Add(MySecondFuncAsync(), cts.Token);
//...
tasks.Add(MyLastFuncAsync(), cts.Token);
try
{
//Or should I call await Task.WhenAll(tasks) ? What should I call here?
Task.WaitAll(tasks.ToArray());
}
catch (AggregateException ex)
{
foreach (var v in ex.InnerExceptions)
Error(ex.Message + " " + v.Message);
}
finally
{
cts.Dispose();
}
foreach (var task in tasks)
{
// Now, how to print results from the tasks?
//Considering that all tasks return bool value,
//I need to do something like this:
if (task.Status != TaskStatus.Faulted)
Console.WriteLine(task.Result);
else
Log("Error...");
}
My goal is to make all the functions (SaveToFilesAsync, MySecondFuncAsync, ...) run at the same time, in parallel, using all cores on the computer and saving time. But when I look at the logs of SaveToFilesAsync, I see that saving to files always occurs on the same thread, not in parallel. What am I doing wrong?
Second question: how can I get Task.Result from each task in the task list at the end of the code? If the second function returns Task<bool>, how can I get the bool value in my code?
Also, all comments about my code are very welcome, since I am new to TPL.
You need to replace the foreach loop, which runs sequentially from the first to the last item, with a Parallel.ForEach() loop that can be configured for parallelism, or Parallel.For() which gives you the index of the currently processed item. Since you need to use a counter for the files names, you will need to modify the list parameter to provide the file number, which you populate when creating your list, or use the index provided by Parallel.For(). Another option would be to have a long variable on which you could do an Interlocked.Increment after creating the file name but I'm not sure that would be optimal, I haven't tried it.
Here's what it would look like.
Wrap the code that invokes SaveFilesAsync in a try/catch to handle cancellation via the CancellationTokenSource:
var cts = new CancellationTokenSource();
try
{
Task.WaitAll(SaveFilesAsync(@"C:\Some\Path", files, cts.Token));
}
catch (Exception)
{
Debug.Print("SaveFilesAsync Exception");
}
finally
{
cts.Dispose();
}
Then do your parallelism in that method.
public async Task SaveFilesAsync(string path, List<string> list, CancellationToken token)
{
int counter = 0;
var options = new ParallelOptions
{
CancellationToken = token,
MaxDegreeOfParallelism = Environment.ProcessorCount,
TaskScheduler = TaskScheduler.Default
};
await Task.Run(
() =>
{
try
{
Parallel.ForEach(
list,
options,
(item, state) =>
{
// if cancellation is requested, this will throw an OperationCanceledException caught outside the Parallel loop
options.CancellationToken.ThrowIfCancellationRequested();
// safely increment and get your next file number
int index = Interlocked.Increment(ref counter);
string fullPath = string.Format(@"{0}\{1}_element.txt", path, index);
using (var sw = File.CreateText(fullPath))
{
sw.WriteLine(item);
}
Debug.Print(
"Saved in thread: {0} to {1}",
Thread.CurrentThread.ManagedThreadId,
fullPath);
});
}
catch (OperationCanceledException)
{
Debug.Print("Operation Canceled");
}
});
}
The other part of your code doesn't change, simply adapt where you create your list of files contents.
Edit: The try/catch around the invocation of SaveFilesAsync actually does nothing; it is all handled inside SaveFilesAsync.
Try this:
public async Task SaveToFileAsync(string fullPath, string line)
{
using (var sw = File.CreateText(fullPath))
{
await sw.WriteLineAsync(line);
}
Log("Saved in thread: {0} to {1}",
Environment.CurrentManagedThreadId,
fullPath);
}
public async Task SaveToFilesAsync(string path, List<string> list)
{
// start one write per line, then await them all together
await Task.WhenAll(
list
.Select((line, i) =>
SaveToFileAsync(
string.Format(
@"{0}\{1}_element.txt",
path,
i),
line)));
}
Since you're writing only one line per file and you want to parallelize it all, I don't think it's cancellable.
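On the second question (reading the bool from each task), one option is to keep the tasks typed as Task<bool>, await Task.WhenAll, and then inspect each task. This is only a sketch: ResultsDemo, SucceedAsync and FailAsync are made-up stand-ins for the real functions.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ResultsDemo
{
    private static async Task<bool> SucceedAsync()
    {
        await Task.Delay(20);
        return true;
    }

    private static async Task<bool> FailAsync()
    {
        await Task.Delay(20);
        throw new InvalidOperationException("boom");
    }

    public static async Task<List<string>> CollectAsync()
    {
        // Task<bool> (not plain Task) is what makes .Result available.
        var tasks = new List<Task<bool>> { SucceedAsync(), FailAsync(), SucceedAsync() };
        try
        {
            await Task.WhenAll(tasks);
        }
        catch (Exception)
        {
            // At least one task failed; every task has still finished by now.
        }

        var report = new List<string>();
        foreach (var task in tasks)
        {
            if (task.Status == TaskStatus.RanToCompletion)
                report.Add("Result: " + task.Result);
            else if (task.IsFaulted)
                report.Add("Error: " + task.Exception.InnerException.Message);
            else
                report.Add("Canceled");
        }
        return report;
    }

    public static async Task Main()
    {
        foreach (var line in await CollectAsync())
            Console.WriteLine(line);
        // Result: True
        // Error: boom
        // Result: True
    }
}
```

Using await Task.WhenAll rather than Task.WaitAll avoids blocking a thread while the tasks run.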
There is a Task.WaitAll method, which waits for all tasks, and a Task.WaitAny method, which waits for one task. How do I wait for any N tasks?
Use case: search result pages are downloaded, each result needs a separate task to download and process it. If I use WaitAll to wait for the results of the subtasks before getting next search result page, I will not use all available resources (one long task will delay the rest). Not waiting at all can cause thousands of tasks to be queued which isn't the best idea either.
So, how to wait for a subset of tasks to be completed? Or, alternatively, how to wait for the task scheduler queue to have only N tasks?
This looks like an excellent problem for TPL Dataflow, which will allow you to control parallelism and buffering to process at maximum speed.
Here's some (untested) code to show you what I mean:
static void Process()
{
var searchReader =
new TransformManyBlock<SearchResult, SearchResult>(async uri =>
{
// return a list of search results at uri.
return new[]
{
new SearchResult
{
IsResult = true,
Uri = "http://foo.com"
},
new SearchResult
{
// return the next search result page here.
IsResult = false,
Uri = "http://google.com/next"
}
};
}, new ExecutionDataflowBlockOptions
{
BoundedCapacity = 8, // restrict buffer size.
MaxDegreeOfParallelism = 4 // control parallelism.
});
// link "next" pages back to the searchReader.
searchReader.LinkTo(searchReader, x => !x.IsResult);
var resultActor = new ActionBlock<SearchResult>(async uri =>
{
// do something with the search result.
}, new ExecutionDataflowBlockOptions
{
BoundedCapacity = 64,
MaxDegreeOfParallelism = 16
});
// link search results into resultActor.
searchReader.LinkTo(resultActor, x => x.IsResult);
// put in the first piece of input.
searchReader.Post(new SearchResult { Uri = "http://google/first" });
}
struct SearchResult
{
public bool IsResult { get; set; }
public string Uri { get; set; }
}
I think you should independently limit the number of parallel download tasks and the number of concurrent result-processing tasks. I would do it using two SemaphoreSlim objects, like below. This version doesn't use the synchronous SemaphoreSlim.Wait (thanks @svick for making the point). It was only slightly tested and the exception handling can be improved; substitute your own DownloadNextPageAsync and ProcessResults:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
namespace Console_21666797
{
partial class Program
{
// the actual download method
// async Task<string> DownloadNextPageAsync(string url) { ... }
// the actual process methods
// void ProcessResults(string data) { ... }
// download and process all pages
async Task DownloadAndProcessAllAsync(
string startUrl, int maxDownloads, int maxProcesses)
{
// max parallel downloads
var downloadSemaphore = new SemaphoreSlim(maxDownloads);
// max parallel processing tasks
var processSemaphore = new SemaphoreSlim(maxProcesses);
var tasks = new HashSet<Task>();
var complete = false;
var protect = new Object(); // protect tasks
var page = 0;
// do the page
Func<string, Task> doPageAsync = async (url) =>
{
bool downloadSemaphoreAcquired = true;
try
{
// download the page
var data = await DownloadNextPageAsync(
url).ConfigureAwait(false);
if (String.IsNullOrEmpty(data))
{
Volatile.Write(ref complete, true);
}
else
{
// enable the next download to happen
downloadSemaphore.Release();
downloadSemaphoreAcquired = false;
// process this download
await processSemaphore.WaitAsync();
try
{
await Task.Run(() => ProcessResults(data));
}
finally
{
processSemaphore.Release();
}
}
}
catch (Exception)
{
Volatile.Write(ref complete, true);
throw;
}
finally
{
if (downloadSemaphoreAcquired)
downloadSemaphore.Release();
}
};
// do the page and save the task
Func<string, Task> queuePageAsync = async (url) =>
{
var task = doPageAsync(url);
lock (protect)
tasks.Add(task);
await task;
lock (protect)
tasks.Remove(task);
};
// process pages in a loop until complete is true
while (!Volatile.Read(ref complete))
{
page++;
// acquire download semaphore asynchronously
await downloadSemaphore.WaitAsync().ConfigureAwait(false);
// do the page
var task = queuePageAsync(startUrl + "?page=" + page);
}
// await completion of the pending tasks
Task[] pendingTasks;
lock (protect)
pendingTasks = tasks.ToArray();
await Task.WhenAll(pendingTasks);
}
static void Main(string[] args)
{
new Program().DownloadAndProcessAllAsync("http://google.com", 10, 5).Wait();
Console.ReadLine();
}
}
}
Something like this should work. There might be some edge cases, but all in all it should ensure a minimum of completions.
public static async Task WhenN(IEnumerable<Task> tasks, int n, CancellationTokenSource cts = null)
{
var pending = new HashSet<Task>(tasks);
if (n > pending.Count)
{
n = pending.Count;
// or throw
}
var completed = 0;
while (completed != n)
{
var completedTask = await Task.WhenAny(pending);
pending.Remove(completedTask);
completed++;
}
if (cts != null)
{
cts.Cancel();
}
}
Usage:
static void Main(string[] args)
{
var tasks = new List<Task>();
var completed = 0;
var cts = new CancellationTokenSource();
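for (int i = 0; i < 100; i++)
{
// copy the loop variable so each lambda sees its own value
var temp = i;
tasks.Add(Task.Run(async () =>
{
await Task.Delay(temp * 100, cts.Token);
Console.WriteLine("Completed task {0}", temp);
// tasks finish on different threads, so increment atomically
Interlocked.Increment(ref completed);
}, cts.Token));
}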
Extensions.WhenN(tasks, 30, cts).Wait();
Console.WriteLine(completed);
Console.ReadLine();
}
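For the second form of the question (keeping only N tasks queued at a time), a different approach is to start work lazily and use Task.WhenAny to free a slot before starting the next item. This is a sketch under assumptions: SlidingWindowDemo and RunThrottledAsync are hypothetical names, and the work items are passed as Func<Task> so nothing starts until a slot is free.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class SlidingWindowDemo
{
    // Runs all work items with at most maxInFlight running at once.
    // Returns the highest number of simultaneously running tasks observed.
    public static async Task<int> RunThrottledAsync(IEnumerable<Func<Task>> work, int maxInFlight)
    {
        var running = new List<Task>();
        int peak = 0;
        foreach (var start in work)
        {
            if (running.Count >= maxInFlight)
            {
                // Wait for any one task to finish, then free its slot.
                Task finished = await Task.WhenAny(running);
                running.Remove(finished);
                await finished; // rethrows if that task failed
            }
            running.Add(start());
            peak = Math.Max(peak, running.Count);
        }
        await Task.WhenAll(running);
        return peak;
    }

    public static async Task Main()
    {
        var work = Enumerable.Range(0, 20).Select(i => new Func<Task>(() => Task.Delay(30)));
        int peak = await RunThrottledAsync(work, 4);
        Console.WriteLine("Peak in flight: {0}", peak); // 4
    }
}
```

Unlike WhenN above, this never has more than maxInFlight tasks started, so a long search does not queue thousands of tasks up front.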
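// assumes StartTasks() returns an array of Task<T>; a list is used so finished tasks can be removed
var runningTasks = MyTasksFactory.StartTasks().ToList();
while (runningTasks.Any())
{
int finished = Task.WaitAny(runningTasks.ToArray());
var finishedTask = runningTasks[finished];
Task.Factory.StartNew(() => { Consume(finishedTask.Result); });
runningTasks.RemoveAt(finished);
}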