I have hundreds of large files to download from the web in my Windows service (C#). The requirement is to keep at most 4 file downloads running in parallel at any one time.
Can I achieve concurrent/parallel downloads using async/await, or do I have to use a BackgroundWorker or threads? Is async/await multithreaded?
See my sample program using async/await below:
static int i = 0;

private async void Timer_tick() {
    while (i < 4) {
        i++;
        var model = GetNextModel();
        await Download(model);
    }
}

private async Task Download(XYZ model) {
    Task<FilesetResult> t = DoWork(model);
    var result = await t;
    // use result
}

private async Task<FilesetResult> DoWork(XYZ model) {
    var filesetResult = await api.Download(model.path);
    i--;
    return filesetResult;
}
You can limit the number of async tasks running in parallel with the SemaphoreSlim class. Something like:
List<DownloadRequest> requests = Enumerable.Range(0, 100).Select(x => new DownloadRequest()).ToList();
using (var throttler = new SemaphoreSlim(4))
{
Task<DownloadResult>[] downloadTasks = requests.Select(request => Task.Run(async () =>
{
await throttler.WaitAsync();
try
{
return await DownloadTaskAsync(request);
}
finally
{
throttler.Release();
}
})).ToArray();
await Task.WhenAll(downloadTasks);
}
Update: thanks for the comments, the issues are fixed.
Update 2: a sample solution for a dynamic list of requests:
public class DownloadManager : IDisposable
{
private readonly SemaphoreSlim _throttler = new SemaphoreSlim(4);
public async Task<DownloadResult> DownloadAsync(DownloadRequest request)
{
await _throttler.WaitAsync();
try
{
return await api.Download(request);
}
finally
{
_throttler.Release();
}
}
public void Dispose()
{
_throttler?.Dispose();
}
}
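A possible usage sketch for the manager above (the request list and the result handling here are assumptions, not part of the original answer):
// the semaphore inside DownloadManager keeps at most 4 downloads in flight at a time
using (var manager = new DownloadManager())
{
    var requests = Enumerable.Range(0, 100).Select(x => new DownloadRequest());
    DownloadResult[] results = await Task.WhenAll(requests.Select(r => manager.DownloadAsync(r)));
}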
Doing it by hand seems awfully complicated.
var files = new List<Uri>();
Parallel.ForEach(files,
new ParallelOptions { MaxDegreeOfParallelism = 4 },
this.Download);
Now all you need is a single, normal, synchronous method private void Download(Uri file) and you are good to go.
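A minimal sketch of such a method, assuming WebClient and a hypothetical target folder (adapt to HttpClient if you prefer):
// requires System.IO and System.Net
private void Download(Uri file)
{
    var targetPath = Path.Combine(@"C:\Downloads", Path.GetFileName(file.LocalPath));
    using (var client = new WebClient())
    {
        // a synchronous download keeps the Parallel.ForEach body simple
        client.DownloadFile(file, targetPath);
    }
}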
If you need a producer/consumer pattern, the easiest version might be a BlockingCollection:
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;
namespace ConsoleApp11
{
internal class Program
{
internal static void Main()
{
using (var queue = new BlockingCollection<Uri>())
{
// starting the producer task:
Task.Factory.StartNew(() =>
{
for (int i = 0; i < 100; i++)
{
// faking read from message queue... we get a new Uri every 100 ms
queue.Add(new Uri("http://www.example.com/" + i));
Thread.Sleep(100);
}
// just to end this program... you don't need to end this, just listen to your message queue
queue.CompleteAdding();
});
// run the consumers:
Parallel.ForEach(queue.GetConsumingEnumerable(), new ParallelOptions { MaxDegreeOfParallelism = 4 }, Download);
}
}
internal static void Download(Uri uri)
{
// download your file here
Console.WriteLine($"Downloading {uri} [.. ]");
Thread.Sleep(1000);
Console.WriteLine($"Downloading {uri} [..... ]");
Thread.Sleep(1000);
Console.WriteLine($"Downloading {uri} [....... ]");
Thread.Sleep(1000);
Console.WriteLine($"Downloading {uri} [......... ]");
Thread.Sleep(1000);
Console.WriteLine($"Downloading {uri} [..........]");
Thread.Sleep(1000);
Console.WriteLine($"Downloading {uri} OK");
}
}
}
In a WPF .NET Core app there is the following:
An ObservableCollection of items (itemObservCollection).
A static readonly HttpClient (_httpclient).
XML responses.
I make a URL call to the API for each item in the observable collection (0 to 1000 items in the collection). The response is XML, which is parsed with XElement, and the property values in the observable collection are updated from the XML.
Task.Run is used to move the operation off the UI thread, and Parallel.ForEach is used to make the calls in parallel.
I feel I have made the solution overly complicated. Is there a way to simplify this? UpdateItems() is called from a button click.
private async Task UpdateItems()
{
try
{
await Task.Run(() => Parallel.ForEach(itemObservCollection, new ParallelOptions { MaxDegreeOfParallelism = 12 }, async item =>
{
try
{
var apiRequestString = $"http://localhost:6060/" + item.Name;
HttpResponseMessage httpResponseMessage = await _httpclient.GetAsync(apiRequestString);
var httpResponseStream = await httpResponseMessage.Content.ReadAsStreamAsync();
StringBuilder sb = new StringBuilder(1024);
XElement doc = XElement.Load(httpResponseStream);
foreach (var elem in doc.Descendants())
{
if (elem.Name == "ItemDetails")
{
var itemUpdate = itemObservCollection.FirstOrDefault(updateItem => updateItem.Name == item.Name);
if (itemUpdate != null)
{
itemUpdate.Price = decimal.Parse(elem.Attribute("Price").Value);
itemUpdate.Quantity = int.Parse(elem.Attribute("Quantity").Value);
}
}
}
}
catch (Exception ex)
{
LoggerTextBlock.Text = ('\n' + ex.ToString());
}
}));
}
catch (Exception ex)
{
LoggerTextBlock.Text = ('\n' + ex.ToString());
}
}
You could create an array of tasks and await them all using Task.WhenAll.
The following sample code kicks off a task per item in the ObservableCollection<int> and then waits asynchronously for all tasks to finish:
ObservableCollection<int> itemObservCollection =
new ObservableCollection<int>(Enumerable.Range(1, 10));
async Task SendAsync()
{
//query the HTTP API here...
await Task.Delay(1000);
}
await Task.WhenAll(itemObservCollection.Select(x => SendAsync()).ToArray());
If you want to limit the number of concurrent requests, you could either iterate through subsets of the source collection to send the requests in batches (a sketch follows the semaphore example below) or use a SemaphoreSlim to limit the number of actual concurrent requests:
Task[] tasks = new Task[itemObservCollection.Count];
using (SemaphoreSlim semaphoreSlim = new SemaphoreSlim(12))
{
for (int i = 0; i < itemObservCollection.Count; ++i)
{
async Task SendAsync()
{
//query the HTTP API here...
try
{
await Task.Delay(5000);
}
finally
{
semaphoreSlim.Release();
}
}
await semaphoreSlim.WaitAsync();
tasks[i] = SendAsync();
}
await Task.WhenAll(tasks);
}
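For the batching alternative mentioned above, a minimal sketch (it reuses the same hypothetical SendAsync as the samples above, and 12 is just an assumed batch size):
const int batchSize = 12;
for (int offset = 0; offset < itemObservCollection.Count; offset += batchSize)
{
    // take the next slice of the collection and send it as one batch
    var batch = itemObservCollection.Skip(offset).Take(batchSize).ToArray();
    await Task.WhenAll(batch.Select(x => SendAsync()));
}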
My task: keep the server stable under a load that exceeds its capacity.
Here is the code:
private async Task HandleOneRequest(HttpListenerContext listenerContext)
{
// one request processing
await Task.CompletedTask;
}
private async Task HandleContextAsync(HttpListenerContext listenerContext)
{
var allTasks = new List<Task>();
var queue = new Queue<HttpListenerContext>();
queue.Enqueue(listenerContext);
if (queue.Count > 50) // the threshold of 50 is a rough guess
{
// too many requests
}
else
{
using (var throttler = new SemaphoreSlim(Environment.ProcessorCount, Environment.ProcessorCount))
{
foreach (var request in queue)
{
await throttler.WaitAsync();
allTasks.Add(Task.Run(async () =>
{
try
{
await HandleOneRequest(queue.Dequeue());
}
finally
{
throttler.Release();
}
}));
}
await Task.WhenAll(allTasks);
}
}
}
Am I using the semaphore correctly? How else could I implement throttling for an HTTP server? Is the request queue created correctly?
I need a Dataflow block that delays forwarding a message to the next block based on the timestamp in the message (LogEntry).
This is what I came up with, but it doesn't feel right. Any suggestions for improvements?
private IPropagatorBlock<LogEntry, LogEntry> DelayedForwardBlock()
{
var buffer = new ConcurrentQueue<LogEntry>();
var source = new BufferBlock<LogEntry>();
var target = new ActionBlock<LogEntry>(item =>
{
buffer.Enqueue(item);
});
Task.Run(() =>
{
LogEntry entry;
while (true)
{
entry = null;
if (buffer.TryPeek(out entry))
{
if (entry.UtcTimestamp < (DateTime.UtcNow - TimeSpan.FromMinutes(5)))
{
buffer.TryDequeue(out entry);
source.Post(entry);
}
}
}
});
target.Completion.ContinueWith(delegate
{
LogEntry entry;
while (buffer.TryDequeue(out entry))
{
source.Post(entry);
}
source.Complete();
});
return DataflowBlock.Encapsulate(target, source);
}
You could simply use a single TransformBlock that asynchronously waits out the delay using Task.Delay:
IPropagatorBlock<TItem, TItem> DelayedForwardBlock<TItem>(TimeSpan delay)
{
return new TransformBlock<TItem, TItem>(async item =>
{
await Task.Delay(delay);
return item;
});
}
Usage:
var block = DelayedForwardBlock<LogEntry>(TimeSpan.FromMinutes(5));
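For context, a sketch of linking the block above to a downstream consumer (the consuming action here is an assumption, not part of the question):
// forward delayed entries to a hypothetical consumer
var consumer = new ActionBlock<LogEntry>(entry =>
    Console.WriteLine($"Forwarded entry from {entry.UtcTimestamp:u}"));
block.LinkTo(consumer, new DataflowLinkOptions { PropagateCompletion = true });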
There's the Task.WaitAll method, which waits for all tasks, and Task.WaitAny, which waits for a single task. How do I wait for any N tasks?
Use case: search result pages are downloaded, and each result needs a separate task to download and process it. If I use WaitAll to wait for the results of the subtasks before fetching the next search result page, I won't use all available resources (one long task will delay the rest). Not waiting at all can queue up thousands of tasks, which isn't a good idea either.
So, how do I wait for a subset of the tasks to complete? Or, alternatively, how do I wait until the task scheduler queue holds only N tasks?
This looks like an excellent problem for TPL Dataflow, which will allow you to control parallelism and buffering to process at maximum speed.
Here's some (untested) code to show you what I mean:
static void Process()
{
var searchReader =
new TransformManyBlock<SearchResult, SearchResult>(async uri =>
{
// return a list of search results at uri.
return new[]
{
new SearchResult
{
IsResult = true,
Uri = "http://foo.com"
},
new SearchResult
{
// return the next search result page here.
IsResult = false,
Uri = "http://google.com/next"
}
};
}, new ExecutionDataflowBlockOptions
{
BoundedCapacity = 8, // restrict buffer size.
MaxDegreeOfParallelism = 4 // control parallelism.
});
// link "next" pages back to the searchReader.
searchReader.LinkTo(searchReader, x => !x.IsResult);
var resultActor = new ActionBlock<SearchResult>(async uri =>
{
// do something with the search result.
}, new ExecutionDataflowBlockOptions
{
BoundedCapacity = 64,
MaxDegreeOfParallelism = 16
});
// link search results into resultActor.
searchReader.LinkTo(resultActor, x => x.IsResult);
// put in the first piece of input.
searchReader.Post(new SearchResult { Uri = "http://google/first" });
}
struct SearchResult
{
public bool IsResult { get; set; }
public string Uri { get; set; }
}
I think you should independently limit the number of parallel download tasks and the number of concurrent result-processing tasks. I would do it with two SemaphoreSlim objects, like below. This version doesn't use the synchronous SemaphoreSlim.Wait (thanks @svick for making the point). It has only been lightly tested and the exception handling can be improved; substitute your own DownloadNextPageAsync and ProcessResults:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
namespace Console_21666797
{
partial class Program
{
// the actual download method
// async Task<string> DownloadNextPageAsync(string url) { ... }
// the actual process methods
// void ProcessResults(string data) { ... }
// download and process all pages
async Task DownloadAndProcessAllAsync(
string startUrl, int maxDownloads, int maxProcesses)
{
// max parallel downloads
var downloadSemaphore = new SemaphoreSlim(maxDownloads);
// max parallel processing tasks
var processSemaphore = new SemaphoreSlim(maxProcesses);
var tasks = new HashSet<Task>();
var complete = false;
var protect = new Object(); // protect tasks
var page = 0;
// do the page
Func<string, Task> doPageAsync = async (url) =>
{
bool downloadSemaphoreAcquired = true;
try
{
// download the page
var data = await DownloadNextPageAsync(
url).ConfigureAwait(false);
if (String.IsNullOrEmpty(data))
{
Volatile.Write(ref complete, true);
}
else
{
// enable the next download to happen
downloadSemaphore.Release();
downloadSemaphoreAcquired = false;
// process this download
await processSemaphore.WaitAsync();
try
{
await Task.Run(() => ProcessResults(data));
}
finally
{
processSemaphore.Release();
}
}
}
catch (Exception)
{
Volatile.Write(ref complete, true);
throw;
}
finally
{
if (downloadSemaphoreAcquired)
downloadSemaphore.Release();
}
};
// do the page and save the task
Func<string, Task> queuePageAsync = async (url) =>
{
var task = doPageAsync(url);
lock (protect)
tasks.Add(task);
await task;
lock (protect)
tasks.Remove(task);
};
// process pages in a loop until complete is true
while (!Volatile.Read(ref complete))
{
page++;
// acquire the download semaphore asynchronously
await downloadSemaphore.WaitAsync().ConfigureAwait(false);
// do the page
var task = queuePageAsync(startUrl + "?page=" + page);
}
// await completion of the pending tasks
Task[] pendingTasks;
lock (protect)
pendingTasks = tasks.ToArray();
await Task.WhenAll(pendingTasks);
}
static void Main(string[] args)
{
new Program().DownloadAndProcessAllAsync("http://google.com", 10, 5).Wait();
Console.ReadLine();
}
}
}
Something like this should work. There may be some edge cases, but all in all it should guarantee a minimum number of completions.
public static async Task WhenN(IEnumerable<Task> tasks, int n, CancellationTokenSource cts = null)
{
var pending = new HashSet<Task>(tasks);
if (n > pending.Count)
{
n = pending.Count;
// or throw
}
var completed = 0;
while (completed != n)
{
var completedTask = await Task.WhenAny(pending);
pending.Remove(completedTask);
completed++;
}
if (cts != null)
{
cts.Cancel();
}
}
Usage:
static void Main(string[] args)
{
var tasks = new List<Task>();
var completed = 0;
var cts = new CancellationTokenSource();
for (int i = 0; i < 100; i++)
{
    int index = i; // capture a copy; the for-loop variable is shared across iterations
    tasks.Add(Task.Run(async () =>
    {
        await Task.Delay(index * 100, cts.Token);
        Console.WriteLine("Completed task {0}", index);
        Interlocked.Increment(ref completed);
    }, cts.Token));
}
Extensions.WhenN(tasks, 30, cts).Wait();
Console.WriteLine(completed);
Console.ReadLine();
}
// assumes MyTasksFactory.StartTasks() returns tasks that produce a result
var runningTasks = MyTasksFactory.StartTasks().ToList();
while (runningTasks.Any())
{
    // wait for whichever task finishes first, hand its result to Consume,
    // then keep waiting on the rest
    int finished = Task.WaitAny(runningTasks.ToArray());
    var finishedTask = runningTasks[finished];
    Task.Factory.StartNew(() => Consume(finishedTask.Result));
    runningTasks.RemoveAt(finished);
}
I'm trying to create an asynchronous console app that does some work on a collection. I have one version that uses a parallel for loop and another version that uses async/await. I expected the async/await version to behave like the parallel version, but it executes sequentially. What am I doing wrong?
public class Program
{
public static void Main(string[] args)
{
var worker = new Worker();
worker.ParallelInit();
var t = worker.Init();
t.Wait();
Console.ReadKey();
}
}
public class Worker
{
public async Task<bool> Init()
{
var series = Enumerable.Range(1, 5).ToList();
foreach(var i in series)
{
Console.WriteLine("Starting Process {0}", i);
var result = await DoWorkAsync(i);
if (result)
{
Console.WriteLine("Ending Process {0}", i);
}
}
return true;
}
public async Task<bool> DoWorkAsync(int i)
{
Console.WriteLine("working..{0}", i);
await Task.Delay(1000);
return true;
}
public bool ParallelInit()
{
var series = Enumerable.Range(1, 5).ToList();
Parallel.ForEach(series, i =>
{
Console.WriteLine("Starting Process {0}", i);
DoWorkAsync(i);
Console.WriteLine("Ending Process {0}", i);
});
return true;
}
}
The way you're using the await keyword tells C# that you want to wait on each pass through the loop, so nothing runs in parallel. You can rewrite your method like this to do what you want: store a list of Tasks, then await them all with Task.WhenAll.
public async Task<bool> Init()
{
var series = Enumerable.Range(1, 5).ToList();
var tasks = new List<Task<Tuple<int, bool>>>();
foreach (var i in series)
{
Console.WriteLine("Starting Process {0}", i);
tasks.Add(DoWorkAsync(i));
}
foreach (var task in await Task.WhenAll(tasks))
{
if (task.Item2)
{
Console.WriteLine("Ending Process {0}", task.Item1);
}
}
return true;
}
public async Task<Tuple<int, bool>> DoWorkAsync(int i)
{
Console.WriteLine("working..{0}", i);
await Task.Delay(1000);
return Tuple.Create(i, true);
}
Your code waits for each operation (using await) to finish before starting the next iteration.
Therefore, you don't get any parallelism.
If you want to run an existing asynchronous operation in parallel, you don't need await; you just need to get a collection of Tasks and call Task.WhenAll() to return a task that waits for all of them:
return Task.WhenAll(list.Select(DoWorkAsync));
public async Task<bool> Init()
{
var series = Enumerable.Range(1, 5);
    await Task.WhenAll(series.Select(i => DoWorkAsync(i)));
return true;
}
In C# 7.0 you can give semantic names to the members of the tuple; here is Tim S.'s answer using the new syntax:
public async Task<bool> Init()
{
var series = Enumerable.Range(1, 5).ToList();
var tasks = new List<Task<(int Index, bool IsDone)>>();
foreach (var i in series)
{
Console.WriteLine("Starting Process {0}", i);
tasks.Add(DoWorkAsync(i));
}
foreach (var task in await Task.WhenAll(tasks))
{
if (task.IsDone)
{
Console.WriteLine("Ending Process {0}", task.Index);
}
}
return true;
}
public async Task<(int Index, bool IsDone)> DoWorkAsync(int i)
{
Console.WriteLine("working..{0}", i);
await Task.Delay(1000);
return (i, true);
}
You could also get rid of the task. prefix inside the foreach by deconstructing the tuple:
// ...
foreach (var (Index, IsDone) in await Task.WhenAll(tasks))
{
if (IsDone)
{
Console.WriteLine("Ending Process {0}", Index);
}
}
// ...
We can call an async method in a foreach loop to run async API calls.
public static void Main(string[] args)
{
List<ZoneDetails> lst = GetRecords();
foreach (var item in lst)
{
// call the async API for each item (blocking on the result)
var result = GetAPIData(item.ZoneId, item.fitnessclassid).Result;
if (result != null && result.EventHistoryId != null)
{
UpdateDB(result);
}
}
}
private static async Task<FODBrandChannelLicense> GetAPIData(int zoneId, int fitnessclassid)
{
    HttpClient.DefaultRequestHeaders.Authorization = new AuthenticationHeaderValue("Bearer", token);
    var response = await HttpClient.GetAsync(new Uri(url));
    var content = await response.Content.ReadAsStringAsync();
    var result = JsonConvert.DeserializeObject<FODBrandChannelLicense>(content);
    if (response.EnsureSuccessStatusCode().IsSuccessStatusCode)
    {
        Console.WriteLine("API call completed successfully");
    }
    return result;
}
To add to the already good answers here, it always helps me to remember that an async method returns a Task.
So in the example in this question, each iteration of the loop has an await. At the first one, the Init() method returns control to its caller with a Task<bool>, not a bool.
Thinking of await as just a magic word that saves the execution state and then skips to the next available line until the result is ready encourages confusion: "why doesn't the loop just skip the line with await and go on to the next statement?"
If instead you think of await as something more like a yield statement that hands a Task back to the caller when it returns control, the flow starts to make more sense: "the loop stops at the await and returns control, along with a Task, to the caller. The loop won't continue until that awaited operation has completed."
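A minimal sketch of that picture, with illustrative method names that are not from the question:
static async Task<bool> InitAsync()
{
    Console.WriteLine("before await");
    await Task.Delay(1000);           // control returns to the caller here, carrying a Task<bool>
    Console.WriteLine("after await"); // the method resumes once the delay completes
    return true;
}

static async Task Main()
{
    Task<bool> pending = InitAsync(); // InitAsync returns at its first await
    Console.WriteLine("caller keeps running while InitAsync is suspended");
    bool result = await pending;      // only now does the caller wait for the final bool
    Console.WriteLine(result);
}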