Sequential version of Task.WhenAll - c#

Is there a nonblocking Task.WaitAll similar to Task.WhenAll, but not parallel?
I wrote this, but maybe it’s built-in?
public async Task<IEnumerable<T>> AwaitAllAsync<T>(IEnumerable<Task<T>> tasks)
{
List<T> result = new List<T>();
foreach(var task in tasks)
{
result.Add(await task);
}
return result;
}
I want to know if there is a built-in way to asynchronously wait for all tasks to complete, but sequentially.
Consider this code:
public class SaveFooCommandHandler : ICommandHandler<SaveFooCommand>
{
private readonly IBusinessContext context;
public SaveFooCommandHandler(IBusinessContext context)
{
this.context = context;
}
public async Task Handle(SaveFooCommand command)
{
var foos = (await Task.WhenAll(command.Foos.Select(foo => context.FindAsync<Foo>(foo.Id)))).ToList();
...
}
}
That will fail, but
var foos = await context.AwaitAllAsync(command.Foos.Select(foo => context.FindAsync<Foo>(foo.Id)));
will not. Here, context.FindAsync is an abstraction over dbcontext.Set<T>().FindAsync.
You could do await context.Set<Foo>().Where(f => command.Foos.Contains(f.Id)).ToListAsync(), but the example is simplified.

I think the core misunderstanding is around the Task type. In asynchronous code, a Task is always already running. So this doesn't make sense:
Is there a non-blocking Task.WaitAll similar to Task.WhenAll but not concurrent?
If you have a collection of tasks, they're all already started.
I want to know if there is a built-in way of waiting for all tasks to complete in async but sequential way.
You can, of course, await them sequentially. The standard pattern for this is to use await inside a foreach loop, just like the method you posted.
However, the only reason the sequential-await works is because your LINQ query is lazily evaluated. In particular, if you reify your task collection, it will fail. So this works:
var tasks = command.Foos.Select(foo => context.FindAsync<Foo>(foo.Id));
var foos = await context.AwaitAllAsync(tasks);
and this fails:
var tasks = command.Foos.Select(foo => context.FindAsync<Foo>(foo.Id))
.ToList();
var foos = await context.AwaitAllAsync(tasks);
Internally, Task.WhenAll reifies your task sequence so it knows how many tasks it needs to wait for.
But this is really beside the point. The real problem you're trying to solve is how to serially execute asynchronous code, which is most easily done using foreach:
var foos = new List<Foo>();
foreach (var fooId in command.Foos.Select(f => f.Id))
foos.Add(await context.FindAsync<Foo>(fooId));
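One way to make the sequential intent explicit, rather than relying on a lazily evaluated LINQ query, is to pass around task factories instead of tasks. The following is only a sketch: SequentialTasks and RunSequentiallyAsync are hypothetical names, not part of the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SequentialTasks
{
    // Hypothetical helper: accepts factories (Func<Task<T>>) rather than tasks,
    // so no operation has started before this method invokes its factory.
    public static async Task<List<T>> RunSequentiallyAsync<T>(IEnumerable<Func<Task<T>>> factories)
    {
        var results = new List<T>();
        foreach (var factory in factories)
        {
            // The operation starts here, only after the previous one has completed.
            results.Add(await factory());
        }
        return results;
    }
}
```

Because the input is a sequence of delegates, calling ToList() on it does not start anything, so the helper behaves the same whether or not the sequence has been materialized.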

Related

"Storing" a task for later completion

I'm trying to "store" an async task for later completion - I've found the async cache example but this is effectively caching task results in a concurrent dictionary so that their results can be reloaded without re-doing the task again (the HTML implementation is here).
Basically what I'm trying to design is a dictionary of tasks, with correlation IDs (GUIDs) as the key. This is for co-ordinating incoming results from another place (XML identified by the GUID correlation ID) and my aim is for the task to suspend execution until the results come in (probably from a queue).
Is this going to work? This is my first foray into proper async coding and I can't find anything similar to my hoped-for solution, so I may well be entirely on the wrong track.
Can I effectively "store" a task for later completion, with the task result being set at completion time?
Edit: I've just found out about TaskCompletionSource (based on this question); is that viable?
If I understand your use-case correctly, you can use TaskCompletionSource.
An example of implementation:
public class AsyncCache
{
    private readonly Dictionary<Guid, Task<string>> _cache = new Dictionary<Guid, Task<string>>();

    public Task<string> GetAsync(Guid guid)
    {
        if (_cache.TryGetValue(guid, out var task))
        {
            // The value is either there or already queued
            return task;
        }

        var tcs = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);
        _queue.Enqueue(() => {
            var result = LoadResult();
            tcs.TrySetResult(result); // completes the stored task for every awaiter
        });
        _cache.Add(guid, tcs.Task);
        return tcs.Task;
    }
}
Here, _queue is whatever queuing mechanism you're going to use to process the data.
Of course, you would also have to make that code thread-safe.
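As a rough illustration of what a thread-safe variant might look like for the correlation-ID scenario in the question, here is a sketch built on ConcurrentDictionary. The class and method names (PendingResults, WaitForAsync, Complete) are made up for this example, and it assumes the result payload is a string.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public class PendingResults
{
    private readonly ConcurrentDictionary<Guid, TaskCompletionSource<string>> _pending =
        new ConcurrentDictionary<Guid, TaskCompletionSource<string>>();

    private static TaskCompletionSource<string> Create(Guid _) =>
        new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);

    // Returns a task that completes once a result for this correlation ID arrives.
    public Task<string> WaitForAsync(Guid correlationId) =>
        _pending.GetOrAdd(correlationId, Create).Task;

    // Called by whatever receives the incoming message (for example a queue consumer).
    // GetOrAdd is used here too, so a result that arrives before anyone waits is not lost.
    // Returns false if a result was already set for this ID.
    public bool Complete(Guid correlationId, string result) =>
        _pending.GetOrAdd(correlationId, Create).TrySetResult(result);
}
```

Note that this sketch never removes entries; a real implementation would probably want to evict completed IDs and handle timeouts or cancellation.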
Are you thinking of lazy loading? You could use Lazy<Task> (which will initialise the task but not queue it to run).
var tasks = new Dictionary<Guid, Lazy<Task>>();
tasks.Add(Task1Guid, new Lazy<Task>(() => { whatever the 1st task is }));
tasks.Add(Task2Guid, new Lazy<Task>(() => { whatever the 2nd task is }));
async Task RunTaskAsync(Guid guid)
{
await tasks[guid].Value;
}

Running async methods in parallel

I've got an async method, GetExpensiveThing(), which performs some expensive I/O work. This is how I am using it:
// Serial execution
public async Task<List<Thing>> GetThings()
{
var first = await GetExpensiveThing();
var second = await GetExpensiveThing();
return new List<Thing>() { first, second };
}
But since it's an expensive method, I want to execute these calls in parallel. I would have thought moving the awaits would have solved this:
// Serial execution
public async Task<List<Thing>> GetThings()
{
var first = GetExpensiveThing();
var second = GetExpensiveThing();
return new List<Thing>() { await first, await second };
}
That didn't work, so I wrapped them in some tasks and this works:
// Parallel execution
public async Task<List<Thing>> GetThings()
{
var first = Task.Run(() =>
{
return GetExpensiveThing();
});
var second = Task.Run(() =>
{
return GetExpensiveThing();
});
return new List<Thing>() { first.Result, second.Result };
}
I even tried playing around with awaits and async in and around the tasks, but it got really confusing and I had no luck.
Is there a better way to run async methods in parallel, or are tasks a good approach?
Is there a better way to run async methods in parallel, or are tasks a good approach?
Yes, the "best" approach is to use the Task.WhenAll method. However, your second approach should already have run in parallel; I have created a .NET Fiddle that demonstrates this.
Consider the following:
public Task<Thing[]> GetThingsAsync()
{
var first = GetExpensiveThingAsync();
var second = GetExpensiveThingAsync();
return Task.WhenAll(first, second);
}
Note
It is preferred to use the "Async" suffix, instead of GetThings and GetExpensiveThing - we should have GetThingsAsync and GetExpensiveThingAsync respectively - source.
Task.WhenAll() can perform poorly when a large number of tasks fire simultaneously without any moderation or throttling.
If you have a long list of tasks to run and want to await the final outcome, I propose partitioning the list and limiting the degree of parallelism.
I have adapted the elegant approach from Stephen Toub's blog to modern LINQ:
public static Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> funcBody, int maxDoP = 4)
{
async Task AwaitPartition(IEnumerator<T> partition)
{
using (partition)
{
while (partition.MoveNext())
{
await Task.Yield(); // prevents a sync/hot thread hangup
await funcBody(partition.Current);
}
}
}
return Task.WhenAll(
Partitioner
.Create(source)
.GetPartitions(maxDoP)
.AsParallel()
.Select(p => AwaitPartition(p)));
}
How it works is simple: take an IEnumerable, dissect it into evenish partitions, and then fire a function/method against each element in each partition at the same time. No more than one element per partition is processed at any one time, but there are n Tasks across n partitions.
Extension Usage:
await myList.ParallelForEachAsync(myFunc, Environment.ProcessorCount);
Edit:
I now keep some overloads in a repository on GitHub if you need more options. It's also available as a NuGet package for .NET Standard.
Edit 2: Thanks to comments from Theodor below, I was able to mitigate poorly written Async Tasks from blocking parallelism by using await Task.Yield();.
You can use Task.WhenAll, which completes when all the tasks passed to it are done.
Check this question here for reference
If GetExpensiveThing is properly asynchronous (meaning it doesn't do any IO or CPU work synchronously), your second solution of invoking both methods and then awaiting the results should've worked. You could've also used Task.WhenAll.
However, if it isn't, you may get better results by posting each task to the thread-pool and using the Task.WhenAll combinator, e.g.:
public Task<Thing[]> GetThings() =>
Task.WhenAll(Task.Run(() => GetExpensiveThing()), Task.Run(() => GetExpensiveThing()));
(Note I changed the return type to Thing[], which is what Task.WhenAll produces, to avoid awaits altogether.)
You should avoid using the Result property. It causes the caller thread to block and wait for the task to complete, unlike await or Task.WhenAll which use continuations.
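To illustrate the point about properly asynchronous methods, the sketch below counts how many calls are in flight at once when both are started before either is awaited. GetExpensiveThingAsync here is a hypothetical stand-in that uses Task.Delay in place of real I/O; StartThenAwait and PeakConcurrencyAsync are likewise names invented for the example.

```csharp
using System.Threading.Tasks;

public static class StartThenAwait
{
    private static readonly object Gate = new object();
    private static int _running;
    private static int _peak;

    // Hypothetical stand-in for a properly asynchronous GetExpensiveThing.
    private static async Task<int> GetExpensiveThingAsync(int id)
    {
        lock (Gate)
        {
            _running++;
            if (_running > _peak) _peak = _running;
        }
        await Task.Delay(50); // control returns to the caller here, so the next call can start
        lock (Gate) { _running--; }
        return id;
    }

    // Starts both calls before awaiting either, then reports the peak number in flight.
    public static async Task<int> PeakConcurrencyAsync()
    {
        var first = GetExpensiveThingAsync(1);
        var second = GetExpensiveThingAsync(2);
        await Task.WhenAll(first, second);
        lock (Gate) { return _peak; }
    }
}
```

If the stand-in blocked synchronously instead (for example with Thread.Sleep before its first await), the second call could not start until the first had finished, which is the situation where wrapping each call in Task.Run helps.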

async i/o and process results as they become available

I have a simple console app where I want to call many URLs in a loop and put the results in a database table. I am using .NET 4.5 and async I/O to fetch the URL data. Here is a simplified version of what I am doing. All methods are async except for the database operation. Do you see any issues with this? Are there better ways of optimizing?
private async Task Run(){
var items = repo.GetItems(); // sync method to get list from database
var tasks = new List<Task>();
// add each call to task list and process result as it becomes available
// rather than waiting for all downloads
foreach(Item item in items){
tasks.Add(GetFromWeb(item.url).ContinueWith(response => { AddToDatabase(response.Result);}));
}
await Task.WhenAll(tasks); // wait for all tasks to complete.
}
private async Task<string> GetFromWeb(string url) {
HttpResponseMessage response = await GetAsync(url);
return await response.Content.ReadAsStringAsync();
}
private void AddToDatabase(string item){
// add data to database.
}
Your solution is acceptable. But you should check out TPL Dataflow, which allows you to set up a dataflow "mesh" (or "pipeline") and then shove the data through it.
For a problem this simple, Dataflow won't really add much other than getting rid of the ContinueWith (I always find manual continuations awkward). But if you plan to add more steps or change your data flow in the future, Dataflow should be something you consider.
Your solution is pretty much correct, with just two minor mistakes (both of which cause compiler errors). First, you don't call ContinueWith on the result of List.Add; you need to call ContinueWith on the task and then add the continuation to your list, which is solved by just moving a parenthesis. You also need to call Result on the response Task.
Here is the section with the two minor changes:
tasks.Add(GetFromWeb(item.url)
.ContinueWith(response => { AddToDatabase(response.Result);}));
Another option is to leverage a method that takes a sequence of tasks and orders them by the order that they are completed. Here is my implementation of such a method:
public static IEnumerable<Task<T>> Order<T>(this IEnumerable<Task<T>> tasks)
{
var taskList = tasks.ToList();
var taskSources = new BlockingCollection<TaskCompletionSource<T>>();
var taskSourceList = new List<TaskCompletionSource<T>>(taskList.Count);
foreach (var task in taskList)
{
var newSource = new TaskCompletionSource<T>();
taskSources.Add(newSource);
taskSourceList.Add(newSource);
task.ContinueWith(t =>
{
var source = taskSources.Take();
if (t.IsCanceled)
source.TrySetCanceled();
else if (t.IsFaulted)
source.TrySetException(t.Exception.InnerExceptions);
else if (t.IsCompleted)
source.TrySetResult(t.Result);
}, CancellationToken.None, TaskContinuationOptions.PreferFairness, TaskScheduler.Default);
}
return taskSourceList.Select(tcs => tcs.Task);
}
Using this your code can become:
private async Task Run()
{
IEnumerable<Item> items = repo.GetItems(); // sync method to get list from database
foreach (var task in items.Select(item => GetFromWeb(item.url))
.Order())
{
await task.ConfigureAwait(false);
AddToDatabase(task.Result);
}
}
Just thought I'd throw in my hat as well with the Rx solution:
using System.Reactive;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using System.Reactive.Threading.Tasks;

private Task Run()
{
    return repo.GetItems().ToObservable(Scheduler.Default)
        .SelectMany(item => GetFromWeb(item.url)) // flattens each download task into the stream
        .Do(AddToDatabase)
        .ToTask();
}

Speculative execution using the TPL

I have a List<Task<bool>> that I want to enumerate in parallel, finding the first task to complete with a result of true, without waiting for, or observing exceptions on, any of the other tasks still pending.
var tasks = new List<Task<bool>>
{
Task.Delay(2000).ContinueWith(x => false),
Task.Delay(0).ContinueWith(x => true),
};
I have tried to use PLINQ to do something like:
var task = tasks.AsParallel().FirstOrDefault(t => t.Result);
Which executes in parallel, but doesn't return as soon as it finds a satisfying result, because accessing the Result property is blocking. In order for this to work using PLINQ, I'd have to write this awful statement:
var cts = new CancellationTokenSource();
var task = tasks.AsParallel()
.FirstOrDefault(t =>
{
try
{
t.Wait(cts.Token);
if (t.Result)
{
cts.Cancel();
}
return t.Result;
}
catch (OperationCanceledException)
{
return false;
}
} );
I've written up an extension method that yields tasks as they complete like so.
public static class Exts
{
public static IEnumerable<Task<T>> InCompletionOrder<T>(this IEnumerable<Task<T>> source)
{
var tasks = source.ToList();
while (tasks.Any())
{
var t = Task.WhenAny(tasks);
yield return t.Result;
tasks.Remove(t.Result);
}
}
}
// and run like so
var task = tasks.InCompletionOrder().FirstOrDefault(t => t.Result);
But it feels like this is something common enough that there is a better way. Suggestions?
Maybe something like this?
var tcs = new TaskCompletionSource<Task<bool>>();
foreach (var task in tasks)
{
task.ContinueWith((t, state) =>
{
if (t.Result)
{
((TaskCompletionSource<Task<bool>>)state).TrySetResult(t);
}
},
tcs,
TaskContinuationOptions.OnlyOnRanToCompletion |
TaskContinuationOptions.ExecuteSynchronously);
}
var firstTaskToComplete = tcs.Task;
Perhaps you could try the Rx.Net library. It's very good at providing, in effect, LINQ to Work.
Try this snippet in LinqPad after you reference the Microsoft Rx.Net assemblies.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using System.Reactive.Threading.Tasks;
using System.Threading.Tasks;
void Main()
{
var tasks = new List<Task<bool>>
{
Task.Delay(2000).ContinueWith(x => false),
Task.Delay(0).ContinueWith(x => true),
};
var observable = (from t in tasks.ToObservable()
//Convert task to an observable
let o = t.ToObservable()
//SelectMany
from x in o
select x);
var foo = observable
.SubscribeOn(Scheduler.Default) //Run the tasks on the threadpool
.ToList()
.First();
Console.WriteLine(foo);
}
First, I don't understand why you are trying to use PLINQ here. Enumerating a list of Tasks shouldn't take long, so I don't think you're going to gain anything from parallelizing it.
Now, to get the first Task that already completed with true, you can use the (non-blocking) IsCompleted property:
var task = tasks.FirstOrDefault(t => t.IsCompleted && t.Result);
If you wanted to get a collection of Tasks, ordered by their completion, have a look at Stephen Toub's article Processing tasks as they complete. If you want to list those that return true first, you would need to modify that code. If you don't want to modify it, you can use a version of this approach from Stephen Cleary's AsyncEx library.
Also, in the specific case in your question, you could “fix” your code by adding .WithMergeOptions(ParallelMergeOptions.NotBuffered) to the PLINQ query. But doing so still wouldn't work most of the time and can waste threads a lot even when it does. That's because PLINQ uses a constant number of threads and partitioning and using Result would block those threads most of the time.
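For comparison, a non-blocking way to find the first task that completes with true is to await Task.WhenAny in a loop. This is only a sketch: TaskSearch and FirstTrueAsync are hypothetical names, and faulted or canceled tasks are simply skipped, which is one possible reading of the question's requirement about not observing exceptions.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class TaskSearch
{
    // Returns the first task to complete with a result of true,
    // or null if every task finishes without producing true.
    public static async Task<Task<bool>> FirstTrueAsync(IEnumerable<Task<bool>> source)
    {
        var pending = source.ToList();
        while (pending.Count > 0)
        {
            // Asynchronously wait for whichever task finishes next.
            var finished = await Task.WhenAny(pending);
            pending.Remove(finished);

            // Only read Result from tasks that ran to completion, so no exception is thrown here.
            if (finished.Status == TaskStatus.RanToCompletion && finished.Result)
                return finished;
        }
        return null;
    }
}
```

Calling Task.WhenAny repeatedly over a shrinking list does quadratic work in the number of tasks, so for large collections an ordering approach like the ones linked above may be the better fit.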

List of objects with async Task methods, execute all concurrently

Given the following:
BlockingCollection<MyObject> collection;
public class MyObject
{
public async Task<ReturnObject> DoWork()
{
(...)
return await SomeIOWorkAsync();
}
}
What would be the correct/most performant way to execute all DoWork() tasks asynchronously on all MyObjects in collection concurrently (while capturing the return object), ideally with a sensible thread limit though (I believe the Task Factory/ThreadPool does some management here)?
You can use the static Task.WhenAll method.
var results = await Task.WhenAll(collection.Select(x => x.DoWork()));
It will start all tasks concurrently and wait for all of them to finish.
ThreadPool manages the number of threads running, but that won't help you much with asynchronous Tasks.
Because of that, you need something else. One way to do this is to utilize ActionBlock from TPL Dataflow:
int limit = …;
IEnumerable<MyObject> collection = …;
var block = new ActionBlock<MyObject>(
o => o.DoWork(),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = limit });
foreach (var obj in collection)
block.Post(obj);
block.Complete();
await block.Completion;
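The ActionBlock above limits concurrency, but as written it discards what DoWork returns. If the results need to be captured while still capping the number of concurrent operations, one alternative that needs no extra package is a SemaphoreSlim gate. The following is a sketch under that assumption; Throttled and SelectAsync are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class Throttled
{
    // Runs body for every item with at most `limit` invocations in flight,
    // and returns the results in the same order as the source.
    public static async Task<TResult[]> SelectAsync<T, TResult>(
        IEnumerable<T> source, Func<T, Task<TResult>> body, int limit)
    {
        using (var gate = new SemaphoreSlim(limit))
        {
            var tasks = source.Select(async item =>
            {
                await gate.WaitAsync(); // wait for a free slot without blocking a thread
                try
                {
                    return await body(item);
                }
                finally
                {
                    gate.Release();
                }
            }).ToList(); // materialize so every wrapper starts now; the gate does the limiting

            return await Task.WhenAll(tasks);
        }
    }
}
```

Usage would look something like: ReturnObject[] results = await Throttled.SelectAsync(collection, o => o.DoWork(), limit);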
What would be the correct/most performant way to execute all DoWork() tasks asynchronously on all MyObjects in collection concurrently (while capturing the return object), ideally with a sensible thread limit
The easiest way to do that is with Task.WhenAll:
ReturnObject[] results = await Task.WhenAll(collection.Select(x => x.DoWork()));
This will invoke DoWork on all MyObjects in the collection and then wait for them all to complete. The thread pool handles all throttling sensibly.
Is there a different way if I want to capture every individual DoWork() return immediately instead of waiting for all items to complete?
Yes, you can use the method described by Jon Skeet and Stephen Toub. I have a similar solution in my AsyncEx library (available via NuGet), which you can use like this:
// "tasks" is of type "Task<ReturnObject>[]"
var tasks = collection.Select(x => x.DoWork()).OrderByCompletion();
foreach (var task in tasks)
{
var result = await task;
...
}
My comment was a bit cryptic, so I thought I'd add this answer:
List<Task<ReturnObject>> workTasks =
collection.Select( o => o.DoWork() ).ToList();
List<Task> resultTasks =
workTasks.Select( o => o.ContinueWith( t =>
{
ReturnObject r = t.Result;
// do something with the result
},
// if you want to run this on the UI thread
TaskScheduler.FromCurrentSynchronizationContext()
)
)
.ToList();
await Task.WhenAll( resultTasks );
