async i/o and process results as they become available - c#

I has a simple console app where I want to call many Urls in a loop and put the result in a database table. I am using .Net 4.5 and using async i/o to fetch the URL data. Here is a simplified version of what I am doing. All methods are async except for the database operation. Do you guys see any issues with this? Are there better ways of optimizing?
private async Task Run(){
var items = repo.GetItems(); // sync method to get list from database
var tasks = new List<Task>();
// add each call to task list and process result as it becomes available
// rather than waiting for all downloads
foreach(Item item in items){
tasks.Add(GetFromWeb(item.url).ContinueWith(response => { AddToDatabase(response.Result);}));
}
await Task.WhenAll(tasks); // wait for all tasks to complete.
}
private async Task<string> GetFromWeb(url) {
HttpResponseMessage response = await GetAsync(url);
return await response.Content.ReadAsStringAsync();
}
private void AddToDatabase(string item){
// add data to database.
}

Your solution is acceptable. But you should check out TPL Dataflow, which allows you to set up a dataflow "mesh" (or "pipeline") and then shove the data through it.
For a problem this simple, Dataflow won't really add much other than getting rid of the ContinueWith (I always find manual continuations awkward). But if you plan to add more steps or change your data flow in the future, Dataflow should be something you consider.

Your solution is pretty much correct, with just two minor mistakes (both of which cause compiler errors). First, you don't call ContinueWith on the result of List.Add, you need call continue with on the task and then add the continuation to your list, this is solved by just moving a parenthesis. You also need to call Result on the reponse Task.
Here is the section with the two minor changes:
tasks.Add(GetFromWeb(item.url)
.ContinueWith(response => { AddToDatabase(response.Result);}));
Another option is to leverage a method that takes a sequence of tasks and orders them by the order that they are completed. Here is my implementation of such a method:
public static IEnumerable<Task<T>> Order<T>(this IEnumerable<Task<T>> tasks)
{
var taskList = tasks.ToList();
var taskSources = new BlockingCollection<TaskCompletionSource<T>>();
var taskSourceList = new List<TaskCompletionSource<T>>(taskList.Count);
foreach (var task in taskList)
{
var newSource = new TaskCompletionSource<T>();
taskSources.Add(newSource);
taskSourceList.Add(newSource);
task.ContinueWith(t =>
{
var source = taskSources.Take();
if (t.IsCanceled)
source.TrySetCanceled();
else if (t.IsFaulted)
source.TrySetException(t.Exception.InnerExceptions);
else if (t.IsCompleted)
source.TrySetResult(t.Result);
}, CancellationToken.None, TaskContinuationOptions.PreferFairness, TaskScheduler.Default);
}
return taskSourceList.Select(tcs => tcs.Task);
}
Using this your code can become:
private async Task Run()
{
IEnumerable<Item> items = repo.GetItems(); // sync method to get list from database
foreach (var task in items.Select(item => GetFromWeb(item.url))
.Order())
{
await task.ConfigureAwait(false);
AddToDatabase(task.Result);
}
}

Just though I'd throw in my hat as well with the Rx solution
using System.Reactive;
using System.Reactive.Linq;
private Task Run()
{
var fromWebObservable = from item in repo.GetItems.ToObservable(Scheduler.Default)
select GetFromWeb(item.url);
fromWebObservable
.Select(async x => await x)
.Do(AddToDatabase)
.ToTask();
}

Related

Correct pattern to check if an async Task completed synchronously before awaiting it

I have a bunch of requests to process, some of which may complete synchronously.
I'd like to gather all results that are immediately available and return them early, while waiting for the rest.
Roughly like this:
List<Task<Result>> tasks = new ();
List<Result> results = new ();
foreach (var request in myRequests) {
var task = request.ProcessAsync();
if (task.IsCompleted)
results.Add(task.Result); // or Add(await task) ?
else
tasks.Add(task);
}
// send results that are available "immediately" while waiting for the rest
if (results.Count > 0) SendResults(results);
results = await Task.WhenAll(tasks);
SendResults(results);
I'm not sure whether relying on IsCompleted might be a bad idea; could there be situations where its result cannot be trusted, or where it may change back to false again, etc.?
Similarly, could it be dangerous to use task.Result even after checking IsCompleted, should one always prefer await task? What if were using ValueTask instead of Task?
I'm not sure whether relying on IsCompleted might be a bad idea; could there be situations where its result cannot be trusted...
If you're in a multithreaded context, it's possible that IsCompleted could return false at the moment when you check on it, but it completes immediately thereafter. In cases like the code you're using, the cost of this happening would be very low, so I wouldn't worry about it.
or where it may change back to false again, etc.?
No, once a Task completes, it cannot uncomplete.
could it be dangerous to use task.Result even after checking IsCompleted.
Nope, that should always be safe.
should one always prefer await task?
await is a great default when you don't have a specific reason to do something else, but there are a variety of use cases where other patterns might be useful. The use case you've highlighted is a good example, where you want to return the results of finished tasks without awaiting all of them.
As Stephen Cleary mentioned in a comment below, it may still be worthwhile to use await to maintain expected exception behavior. You might consider doing something more like this:
var requestsByIsCompleted = myRequests.ToLookup(r => r.IsCompleted);
// send results that are available "immediately" while waiting for the rest
SendResults(await Task.WhenAll(requestsByIsCompleted[true]));
SendResults(await Task.WhenAll(requestsByIsCompleted[false]));
What if were using ValueTask instead of Task?
The answers above apply equally to both types.
You could use code like this to continually send the results of completed tasks while waiting on others to complete.
foreach (var request in myRequests)
{
tasks.Add(request.ProcessAsync());
}
// wait for at least one task to be complete, then send all available results
while (tasks.Count > 0)
{
// wait for at least one task to complete
Task.WaitAny(tasks.ToArray());
// send results for each completed task
var completedTasks = tasks.Where(t => t.IsCompleted);
var results = completedTasks.Where(t => t.IsCompletedSuccessfully).Select(t => t.Result).ToList();
SendResults(results);
// TODO: handle completed but failed tasks here
// remove completed tasks from the tasks list and keep waiting
tasks.RemoveAll(t => completedTasks.Contains(t));
}
Using only await you can achieve the desired behavior:
async Task ProcessAsync(MyRequest request, Sender sender)
{
var result = await request.ProcessAsync();
await sender.SendAsync(result);
}
...
async Task ProcessAll()
{
var tasks = new List<Task>();
foreach(var request in requests)
{
var task = ProcessAsync(request, sender);
// Dont await until all requests are queued up
tasks.Add(task);
}
// Await on all outstanding requests
await Task.WhenAll(tasks);
}
There are already good answers, but in addition of them here is my suggestion too, on how to handle multiple tasks and process each task differently, maybe it will suit your needs. My example is with events, but you can replace them with some kind of state management that fits your needs.
public interface IRequestHandler
{
event Func<object, Task> Ready;
Task ProcessAsync();
}
public class RequestHandler : IRequestHandler
{
// Hier where you wraps your request:
// private object request;
private readonly int value;
public RequestHandler(int value)
=> this.value = value;
public event Func<object, Task> Ready;
public async Task ProcessAsync()
{
await Task.Delay(1000 * this.value);
// Hier where you calls:
// var result = await request.ProcessAsync();
//... then do something over the result or wrap the call in try catch for example
var result = $"RequestHandler {this.value} - [{DateTime.Now.ToLongTimeString()}]";
if (this.Ready is not null)
{
// If result passes send the result to all subscribers
await this.Ready.Invoke($"RequestHandler {this.value} - [{DateTime.Now.ToLongTimeString()}]");
}
}
}
static void Main()
{
var a = new RequestHandler(1);
a.Ready += PrintAsync;
var b = new RequestHandler(2);
b.Ready += PrintAsync;
var c = new RequestHandler(3);
c.Ready += PrintAsync;
var d= new RequestHandler(4);
d.Ready += PrintAsync;
var e = new RequestHandler(5);
e.Ready += PrintAsync;
var f = new RequestHandler(6);
f.Ready += PrintAsync;
var requests = new List<IRequestHandler>()
{
a, b, c, d, e, f
};
var tasks = requests
.Select(x => Task.Run(x.ProcessAsync));
// Hier you must await all of the tasks
Task
.Run(async () => await Task.WhenAll(tasks))
.Wait();
}
static Task PrintAsync(object output)
{
Console.WriteLine(output);
return Task.CompletedTask;
}

Asynchronous foreach

is there a way for an Asynchronous foreach in C#?
where id(s) will be processed asynchronously by the method, instead of using Parallel.ForEach
//This Gets all the ID(s)-IEnumerable <int>
var clientIds = new Clients().GetAllClientIds();
Parallel.ForEach(clientIds, ProcessId); //runs the method in parallel
static void ProcessId(int id)
{
// just process the id
}
should be something a foreach but runs asynchronously
foreach(var id in clientIds)
{
ProcessId(id) //runs the method with each Id asynchronously??
}
i'm trying to run the Program in Console, it should wait for all id(s) to complete processing before closing the Console.
No, it is not really possible.
Instead in foreach loop add what you want to do as Task for Task collection and later use Task.WaitAll.
var tasks = new List<Task>();
foreach(var something in somethings)
tasks.Add(DoJobAsync(something));
await Task.WhenAll(tasks);
Note that method DoJobAsync should return Task.
Update:
If your method does not return Task but something else (eg void) you have two options which are essentially the same:
1.Add Task.Run(action) to tasks collection instead
tasks.Add(Task.Run(() => DoJob(something)));
2.Wrap your sync method in method returning Task
private Task DoJobAsync(Something something)
{
return Task.Run(() => DoJob(something));
}
You can also use Task<TResult> generic if you want to receive some results from task execution.
Your target method would have to return a Task
static Task ProcessId(int id)
{
// just process the id
}
Processing ids would be done like this
// This Gets all the ID(s)-IEnumerable <int>
var clientIds = new Clients().GetAllClientIds();
// This gets all the tasks to be executed
var tasks = clientIds.Select(id => ProcessId(id)).
// this will create a task that will complete when all of the `Task`
// objects in an enumerable collection have completed.
await Task.WhenAll(tasks);
Now, in .NET 6 there is already a built-in Parallel.ForEachAsync
See:
https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.parallel.foreachasync?view=net-6.0
https://www.hanselman.com/blog/parallelforeachasync-in-net-6

How to correctly queue up tasks to run in C#

I have an enumeration of items (RunData.Demand), each representing some work involving calling an API over HTTP. It works great if I just foreach through it all and call the API during each iteration. However, each iteration takes a second or two so I'd like to run 2-3 threads and divide up the work between them. Here's what I'm doing:
ThreadPool.SetMaxThreads(2, 5); // Trying to limit the amount of threads
var tasks = RunData.Demand
.Select(service => Task.Run(async delegate
{
var availabilityResponse = await client.QueryAvailability(service);
// Do some other stuff, not really important
}));
await Task.WhenAll(tasks);
The client.QueryAvailability call basically calls an API using the HttpClient class:
public async Task<QueryAvailabilityResponse> QueryAvailability(QueryAvailabilityMultidayRequest request)
{
var response = await client.PostAsJsonAsync("api/queryavailabilitymultiday", request);
if (response.IsSuccessStatusCode)
{
return await response.Content.ReadAsAsync<QueryAvailabilityResponse>();
}
throw new HttpException((int) response.StatusCode, response.ReasonPhrase);
}
This works great for a while, but eventually things start timing out. If I set the HttpClient Timeout to an hour, then I start getting weird internal server errors.
What I started doing was setting a Stopwatch within the QueryAvailability method to see what was going on.
What's happening is all 1200 items in RunData.Demand are being created at once and all 1200 await client.PostAsJsonAsync methods are being called. It appears it then uses the 2 threads to slowly check back on the tasks, so towards the end I have tasks that have been waiting for 9 or 10 minutes.
Here's the behavior I would like:
I'd like to create the 1,200 tasks, then run them 3-4 at a time as threads become available. I do not want to queue up 1,200 HTTP calls immediately.
Is there a good way to go about doing this?
As I always recommend.. what you need is TPL Dataflow (to install: Install-Package System.Threading.Tasks.Dataflow).
You create an ActionBlock with an action to perform on each item. Set MaxDegreeOfParallelism for throttling. Start posting into it and await its completion:
var block = new ActionBlock<QueryAvailabilityMultidayRequest>(async service =>
{
var availabilityResponse = await client.QueryAvailability(service);
// ...
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
foreach (var service in RunData.Demand)
{
block.Post(service);
}
block.Complete();
await block.Completion;
Old question, but I would like to propose an alternative lightweight solution using the SemaphoreSlim class. Just reference System.Threading.
SemaphoreSlim sem = new SemaphoreSlim(4,4);
foreach (var service in RunData.Demand)
{
await sem.WaitAsync();
Task t = Task.Run(async () =>
{
var availabilityResponse = await client.QueryAvailability(serviceCopy));
// do your other stuff here with the result of QueryAvailability
}
t.ContinueWith(sem.Release());
}
The semaphore acts as a locking mechanism. You can only enter the semaphore by calling Wait (WaitAsync) which subtracts one from the count. Calling release adds one to the count.
You're using async HTTP calls, so limiting the number of threads will not help (nor will ParallelOptions.MaxDegreeOfParallelism in Parallel.ForEach as one of the answers suggests). Even a single thread can initiate all requests and process the results as they arrive.
One way to solve it is to use TPL Dataflow.
Another nice solution is to divide the source IEnumerable into partitions and process items in each partition sequentially as described in this blog post:
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate
{
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
While the Dataflow library is great, I think it's a bit heavy when not using block composition. I would tend to use something like the extension method below.
Also, unlike the Partitioner method, this runs the async methods on the calling context - the caveat being that if your code is not truly async, or takes a 'fast path', then it will effectively run synchronously since no threads are explicitly created.
public static async Task RunParallelAsync<T>(this IEnumerable<T> items, Func<T, Task> asyncAction, int maxParallel)
{
var tasks = new List<Task>();
foreach (var item in items)
{
tasks.Add(asyncAction(item));
if (tasks.Count < maxParallel)
continue;
var notCompleted = tasks.Where(t => !t.IsCompleted).ToList();
if (notCompleted.Count >= maxParallel)
await Task.WhenAny(notCompleted);
}
await Task.WhenAll(tasks);
}

Best practice for task/await in a foreach loop

I have some time consuming code in a foreach that uses task/await.
it includes pulling data from the database, generating html, POSTing that to an API, and saving the replies to the DB.
A mock-up looks like this
List<label> labels = db.labels.ToList();
foreach (var x in list)
{
var myLabels = labels.Where(q => !db.filter.Where(y => x.userid ==y.userid))
.Select(y => y.ID)
.Contains(q.id))
//Render the HTML
//do some fast stuff with objects
List<response> res = await api.sendMessage(object); //POST
//put all the responses in the db
foreach (var r in res)
{
db.responses.add(r);
}
db.SaveChanges();
}
Time wise, generating the Html and posting it to the API seem to be taking most of the time.
Ideally it would be great if I could generate the HTML for the next item, and wait for the post to finish, before posting the next item.
Other ideas are also welcome.
How would one go about this?
I first thought of adding a Task above the foreach and wait for that to finish before making the next POST, but then how do I process the last loop... it feels messy...
You can do it in parallel but you will need different context in each Task.
Entity framework is not thread safe, so if you can't use one context in parallel tasks.
var tasks = myLabels.Select( async label=>{
using(var db = new MyDbContext ()){
// do processing...
var response = await api.getresponse();
db.Responses.Add(response);
await db.SaveChangesAsync();
}
});
await Task.WhenAll(tasks);
In this case, all tasks will appear to run in parallel, and each task will have its own context.
If you don't create new Context per task, you will get error mentioned on this question Does Entity Framework support parallel async queries?
It's more an architecture problem than a code issue here, imo.
You could split your work into two separate parts:
Get data from database and generate HTML
Send API request and save response to database
You could run them both in parallel, and use a queue to coordinate that: whenever your HTML is ready it's added to a queue and another worker proceeds from there, taking that HTML and sending to the API.
Both parts can be done in multithreaded way too, e.g. you can process multiple items from the queue at the same time by having a set of workers looking for items to be processed in the queue.
This screams for the producer / consumer pattern: one producer produces data in a speed different than the consumer consumes it. Once the producer does not have anything to produce anymore it notifies the consumer that no data is expected anymore.
MSDN has a nice example of this pattern where several dataflowblocks are chained together: the output of one block is the input of another block.
Walkthrough: Creating a Dataflow Pipeline
The idea is as follows:
Create a class that will generate the HTML.
This class has an object of class System.Threading.Tasks.Dataflow.BufferBlock<T>
An async procedure creates all HTML output and await SendAsync the data to the bufferBlock
The buffer block implements interface ISourceBlock<T>. The class exposes this as a get property:
The code:
class MyProducer<T>
{
private System.Threading.Tasks.Dataflow.BufferBlock<T> bufferBlock = new BufferBlock<T>();
public ISourceBlock<T> Output {get {return this.bufferBlock;}
public async ProcessAsync()
{
while (somethingToProduce)
{
T producedData = ProduceOutput(...)
await this.bufferBlock.SendAsync(producedData);
}
// no date to send anymore. Mark the output complete:
this.bufferBlock.Complete()
}
}
A second class takes this ISourceBlock. It will wait at this source block until data arrives and processes it.
do this in an async function
stop when no more data is available
The code:
public class MyConsumer<T>
{
ISourceBlock<T> Source {get; set;}
public async Task ProcessAsync()
{
while (await this.Source.OutputAvailableAsync())
{ // there is input of type T, read it:
var input = await this.Source.ReceiveAsync();
// process input
}
// if here, no more input expected. finish.
}
}
Now put it together:
private async Task ProduceOutput<T>()
{
var producer = new MyProducer<T>();
var consumer = new MyConsumer<T>() {Source = producer.Output};
var producerTask = Task.Run( () => producer.ProcessAsync());
var consumerTask = Task.Run( () => consumer.ProcessAsync());
// while both tasks are working you can do other things.
// wait until both tasks are finished:
await Task.WhenAll(new Task[] {producerTask, consumerTask});
}
For simplicity I've left out exception handling and cancellation. StackOverFlow has artibles about exception handling and cancellation of Tasks:
Keep UI responsive using Tasks, Handle AggregateException
Cancel an Async Task or a List of Tasks
This is what I ended up using: (https://stackoverflow.com/a/25877042/275990)
List<ToSend> sendToAPI = new List<ToSend>();
List<label> labels = db.labels.ToList();
foreach (var x in list) {
var myLabels = labels.Where(q => !db.filter.Where(y => x.userid ==y.userid))
.Select(y => y.ID)
.Contains(q.id))
//Render the HTML
//do some fast stuff with objects
sendToAPI.add(the object with HTML);
}
int maxParallelPOSTs=5;
await TaskHelper.ForEachAsync(sendToAPI, maxParallelPOSTs, async i => {
using (NasContext db2 = new NasContext()) {
List<response> res = await api.sendMessage(i.object); //POST
//put all the responses in the db
foreach (var r in res)
{
db2.responses.add(r);
}
db2.SaveChanges();
}
});
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body) {
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate {
using (partition)
while (partition.MoveNext()) {
await body(partition.Current).ContinueWith(t => {
if (t.Exception != null) {
string problem = t.Exception.ToString();
}
//observe exceptions
});
}
}));
}
basically lets me generate the HTML sync, which is fine, since it only takes a few seconds to generate 1000's but lets me post and save to DB async, with as many threads as I predefine. In this case I'm posting to the Mandrill API, parallel posts are no problem.

Do I create a deadlock for Task.WhenAll()

I seem to be experiencing a deadlock with the following code, but I do not understand why.
From a certain point in code I call this method.
public async Task<SearchResult> Search(SearchData searchData)
{
var tasks = new List<Task<FolderResult>>();
using (var serviceClient = new Service.ServiceClient())
{
foreach (var result in MethodThatCallsWebservice(serviceClient, config, searchData))
tasks.Add(result);
return await GetResult(tasks);
}
Where GetResult is as following:
private static async Task<SearchResult> GetResult(IEnumerable<Task<FolderResult>> tasks)
{
var result = new SearchResult();
await Task.WhenAll(tasks).ConfigureAwait(false);
foreach (var taskResult in tasks.Select(p => p.MyResult))
{
foreach (var folder in taskResult.Result)
{
// Do stuff to fill result
}
}
return result;
}
The line var result = new SearchResult(); never completes, though the GUI is responsive because of the following code:
public async void DisplaySearchResult(Task<SearchResult> searchResult)
{
var result = await searchResult;
FillResultView(result);
}
This method is called via an event handler that called the Search method.
_view.Search += (sender, args) => _view.DisplaySearchResult(_model.Search(args.Value));
The first line of DisplaySearchResult gets called, which follows the path down to the GetResult method with the Task.WhenAll(...) part.
Why isn't the Task.WhenAll(...) ever completed? Did I not understand the use of await correctly?
If I run the tasks synchronously, I do get the result but then the GUI freezes:
foreach (var task in tasks)
task.RunSynchronously();
I read various solutions, but most were in combination with Task.WaitAll() and therefore did not help much. I also tried to use the help from this blogpost as you can see in DisplaySearchResult but I failed to get it to work.
Update 1:
The method MethodThatCallsWebservice:
private IEnumerable<Task<FolderResult>> MethodThatCallsWebservice(ServiceClient serviceClient, SearchData searchData)
{
// Doing stuff here to determine keys
foreach(var key in keys)
yield return new Task<FolderResult>(() => new FolderResult(key, serviceClient.GetStuff(input))); // NOTE: This is not the async variant
}
Since you have an asynchronous version of GetStuff (GetStuffAsync) it's much better to use it instead of offloading the synchronous GetStuff to a ThreadPool thread with Task.Run. This wastes threads and limits scalability.
async methods return a "hot" task so you don't need to call Start:
IEnumerable<Task<FolderResult>> MethodThatCallsWebservice(ServiceClient serviceClient, SearchData searchData)
{
return keys.Select(async key =>
new FolderResult(key, await serviceClient.GetStuffAsync(input)));
}
You need to start your tasks before you return them. Or even better use Task.Run.
This:
yield return new Task<FolderResult>(() =>
new FolderResult(key, serviceClient.GetStuff(input)))
// NOTE: This is not the async variant
Is better written as:
yield return Task.Run<FolderResult>(() =>
new FolderResult(key, serviceClient.GetStuff(input)));

Categories

Resources