Given the following:
BlockingCollection<MyObject> collection;
public class MyObject
{
public async Task<ReturnObject> DoWork()
{
(...)
return await SomeIOWorkAsync();
}
}
What would be the correct/most performant way to execute all DoWork() tasks asynchronously on all MyObjects in collection concurrently (while capturing the return object), ideally with a sensible thread limit though (I believe the Task Factory/ThreadPool does some management here)?
You can make use of the WhenAll extension method.
var combinedTask = await Task.WhenAll(collection.Select(x => x.DoWork());
It will start all tasks concurrently and waits for all to finish.
ThreadPool manages the number of threads running, but that won't help you much with asynchronous Tasks.
Because of that, you need something else. One way to do this is to utilize ActionBlock from TPL Dataflow:
int limit = …;
IEnumerable<MyObject> collection = …;
var block = new ActionBlock<MyObject>(
o => o.DoWork(),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = limit });
foreach (var obj in collection)
block.Post(o);
block.Complete();
await block.Completion;
What would be the correct/most performant way to execute all DoWork() tasks asynchronously on all MyObjects in collection concurrently (while capturing the return object), ideally with a sensible thread limit
The easiest way to do that is with Task.WhenAll:
ReturnObject[] results = await Task.WhenAll(collection.Select(x => x.DoWork()));
This will invoke DoWork on all MyObjects in the collection and then wait for them all to complete. The thread pool handles all throttling sensibly.
Is there a different way if I want to capture every individual DoWork() return immediately instead of waiting for all items to complete?
Yes, you can use the method described by Jon Skeet and Stephen Toub. I have a similar solution in my AsyncEx library (available via NuGet), which you can use like this:
// "tasks" is of type "Task<ReturnObject>[]"
var tasks = collection.Select(x => x.DoWork()).OrderByCompletion();
foreach (var task in tasks)
{
var result = await task;
...
}
My comment was a bit cryptic, so I though I'd add this answer:
List<Task<ReturnObject>> workTasks =
collection.Select( o => o.DoWork() ).ToList();
List<Task> resultTasks =
workTasks.Select( o => o.ContinueWith( t =>
{
ReturnObject r = t.Result;
// do something with the result
},
// if you want to run this on the UI thread
TaskScheduler.FromCurrentSynchronizationContext()
)
)
.ToList();
await Task.WhenAll( resultTasks );
Related
is there a way for an Asynchronous foreach in C#?
where id(s) will be processed asynchronously by the method, instead of using Parallel.ForEach
//This Gets all the ID(s)-IEnumerable <int>
var clientIds = new Clients().GetAllClientIds();
Parallel.ForEach(clientIds, ProcessId); //runs the method in parallel
static void ProcessId(int id)
{
// just process the id
}
should be something a foreach but runs asynchronously
foreach(var id in clientIds)
{
ProcessId(id) //runs the method with each Id asynchronously??
}
i'm trying to run the Program in Console, it should wait for all id(s) to complete processing before closing the Console.
No, it is not really possible.
Instead in foreach loop add what you want to do as Task for Task collection and later use Task.WaitAll.
var tasks = new List<Task>();
foreach(var something in somethings)
tasks.Add(DoJobAsync(something));
await Task.WhenAll(tasks);
Note that method DoJobAsync should return Task.
Update:
If your method does not return Task but something else (eg void) you have two options which are essentially the same:
1.Add Task.Run(action) to tasks collection instead
tasks.Add(Task.Run(() => DoJob(something)));
2.Wrap your sync method in method returning Task
private Task DoJobAsync(Something something)
{
return Task.Run(() => DoJob(something));
}
You can also use Task<TResult> generic if you want to receive some results from task execution.
Your target method would have to return a Task
static Task ProcessId(int id)
{
// just process the id
}
Processing ids would be done like this
// This Gets all the ID(s)-IEnumerable <int>
var clientIds = new Clients().GetAllClientIds();
// This gets all the tasks to be executed
var tasks = clientIds.Select(id => ProcessId(id)).
// this will create a task that will complete when all of the `Task`
// objects in an enumerable collection have completed.
await Task.WhenAll(tasks);
Now, in .NET 6 there is already a built-in Parallel.ForEachAsync
See:
https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.parallel.foreachasync?view=net-6.0
https://www.hanselman.com/blog/parallelforeachasync-in-net-6
I have bunch of async methods, which I invoke from Dispatcher. The methods does not perform any work in the background, they just waits for some I/O operations, or wait for response from the webserver.
async Task FetchAsync()
{
// Prepare request in UI thread
var response = await new WebClient().DownloadDataTaskAsync();
// Process response in UI thread
}
now, I want to perform load tests, by calling multiple FetchAsync() in parallel with some max degree of parallelism.
My first attempt was using Paralell.Foreach(), but id does not work well with async/await.
var option = new ParallelOptions {MaxDegreeOfParallelism = 10};
Parallel.ForEach(UnitsOfWork, uow => uow.FetchAsync().Wait());
I've been looking at reactive extensions, but I'm still not able to take advantage of Dispatcher and async/await.
My goal is to not create separate thread for each FetchAsync(). Can you give me some hints how to do it?
Just call Fetchasync without awaiting each call and then use Task.WhenAll to await all of them together.
var tasks = new List<Task>();
var max = 10;
for(int i = 0; i < max; i++)
{
tasks.Add(FetchAsync());
}
await Task.WhenAll(tasks);
Here is a generic reusable solution to your question that you can reuse not only with your FetchAsync method but for any async method that has the same signature. The api includes real time concurrent throttling support as well:
Parameters are self explanatory:
totalRequestCount: is how many async requests (FatchAsync calls) you want to do in total, async processor is the FetchAsync method itself, maxDegreeOfParallelism is the optional nullable parameter. If you want real time concurrent throttling with max number of concurrent async requests, set it, otherwise not.
public static Task ForEachAsync(
int totalRequestCount,
Func<Task> asyncProcessor,
int? maxDegreeOfParallelism = null)
{
IEnumerable<Task> tasks;
if (maxDegreeOfParallelism != null)
{
SemaphoreSlim throttler = new SemaphoreSlim(maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value);
tasks = Enumerable.Range(0, totalRequestCount).Select(async requestNumber =>
{
await throttler.WaitAsync();
try
{
await asyncProcessor().ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
}
else
{
tasks = Enumerable.Range(0, totalRequestCount).Select(requestNumber => asyncProcessor());
}
return Task.WhenAll(tasks);
}
I have an enumeration of items (RunData.Demand), each representing some work involving calling an API over HTTP. It works great if I just foreach through it all and call the API during each iteration. However, each iteration takes a second or two so I'd like to run 2-3 threads and divide up the work between them. Here's what I'm doing:
ThreadPool.SetMaxThreads(2, 5); // Trying to limit the amount of threads
var tasks = RunData.Demand
.Select(service => Task.Run(async delegate
{
var availabilityResponse = await client.QueryAvailability(service);
// Do some other stuff, not really important
}));
await Task.WhenAll(tasks);
The client.QueryAvailability call basically calls an API using the HttpClient class:
public async Task<QueryAvailabilityResponse> QueryAvailability(QueryAvailabilityMultidayRequest request)
{
var response = await client.PostAsJsonAsync("api/queryavailabilitymultiday", request);
if (response.IsSuccessStatusCode)
{
return await response.Content.ReadAsAsync<QueryAvailabilityResponse>();
}
throw new HttpException((int) response.StatusCode, response.ReasonPhrase);
}
This works great for a while, but eventually things start timing out. If I set the HttpClient Timeout to an hour, then I start getting weird internal server errors.
What I started doing was setting a Stopwatch within the QueryAvailability method to see what was going on.
What's happening is all 1200 items in RunData.Demand are being created at once and all 1200 await client.PostAsJsonAsync methods are being called. It appears it then uses the 2 threads to slowly check back on the tasks, so towards the end I have tasks that have been waiting for 9 or 10 minutes.
Here's the behavior I would like:
I'd like to create the 1,200 tasks, then run them 3-4 at a time as threads become available. I do not want to queue up 1,200 HTTP calls immediately.
Is there a good way to go about doing this?
As I always recommend.. what you need is TPL Dataflow (to install: Install-Package System.Threading.Tasks.Dataflow).
You create an ActionBlock with an action to perform on each item. Set MaxDegreeOfParallelism for throttling. Start posting into it and await its completion:
var block = new ActionBlock<QueryAvailabilityMultidayRequest>(async service =>
{
var availabilityResponse = await client.QueryAvailability(service);
// ...
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
foreach (var service in RunData.Demand)
{
block.Post(service);
}
block.Complete();
await block.Completion;
Old question, but I would like to propose an alternative lightweight solution using the SemaphoreSlim class. Just reference System.Threading.
SemaphoreSlim sem = new SemaphoreSlim(4,4);
foreach (var service in RunData.Demand)
{
await sem.WaitAsync();
Task t = Task.Run(async () =>
{
var availabilityResponse = await client.QueryAvailability(serviceCopy));
// do your other stuff here with the result of QueryAvailability
}
t.ContinueWith(sem.Release());
}
The semaphore acts as a locking mechanism. You can only enter the semaphore by calling Wait (WaitAsync) which subtracts one from the count. Calling release adds one to the count.
You're using async HTTP calls, so limiting the number of threads will not help (nor will ParallelOptions.MaxDegreeOfParallelism in Parallel.ForEach as one of the answers suggests). Even a single thread can initiate all requests and process the results as they arrive.
One way to solve it is to use TPL Dataflow.
Another nice solution is to divide the source IEnumerable into partitions and process items in each partition sequentially as described in this blog post:
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate
{
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
While the Dataflow library is great, I think it's a bit heavy when not using block composition. I would tend to use something like the extension method below.
Also, unlike the Partitioner method, this runs the async methods on the calling context - the caveat being that if your code is not truly async, or takes a 'fast path', then it will effectively run synchronously since no threads are explicitly created.
public static async Task RunParallelAsync<T>(this IEnumerable<T> items, Func<T, Task> asyncAction, int maxParallel)
{
var tasks = new List<Task>();
foreach (var item in items)
{
tasks.Add(asyncAction(item));
if (tasks.Count < maxParallel)
continue;
var notCompleted = tasks.Where(t => !t.IsCompleted).ToList();
if (notCompleted.Count >= maxParallel)
await Task.WhenAny(notCompleted);
}
await Task.WhenAll(tasks);
}
Is there a nonblocking Task.WaitAll similar to Task.WhenAll, but not parallel?
I wrote this, but maybe it’s built-in?
public async Task<IEnumerable<T>> AwaitAllAsync<T>(IEnumerable<Task<T>> tasks)
{
List<T> result = new List<T>();
foreach(var task in tasks)
{
result.Add(await task);
}
return result;
}
I want to know if there is a built-in way of waiting for all tasks to complete in async, but a sequential way.
Consider this code:
public class SaveFooCommandHandler : ICommandHandler<SaveFooCommand>
{
private readonly IBusinessContext context;
public SaveFooCommandHandler(IBusinessContext context)
{
this.context = context;
}
public async Task Handle(SaveFooCommand command)
{
var foos = (await Task.WhenAll(command.Foos.Select(foo => context.FindAsync<Foo>(foo.Id))).ToList()
...
}
}
That will fail, but
var foos = await context.AwaitAllAsync(command.Foos.Select(foo => context.FindAsync<Foo>(foo.Id));
will not, context.FindAsync is an abstraction of dbcontext.Set<T>().FindAsync
You could do await context.Set<Foo>().Where(f => command.Foos.Contains(f.Id)).ToListAsync(), but the example is simplified.
I think the core misunderstanding is around the Task type. In asynchronous code, a Task is always already running. So this doesn't make sense:
Is there a non-blocking Task.WaitAll similar to Task.WhenAll but not parallel concurrent?
If you have a collection of tasks, they're all already started.
I want to know if there is a build in way of waiting for all tasks to complete in async but sequential way.
You can, of course, await them sequentially. The standard pattern for this is to use await inside a foreach loop, just like the method you posted.
However, the only reason the sequential-await works is because your LINQ query is lazily evaluated. In particular, if you reify your task collection, it will fail. So this works:
var tasks = command.Foos.Select(foo => context.FindAsync<Foo>(foo.Id));
var foos = await context.AwaitAllAsync(tasks);
and this fails:
var tasks = command.Foos.Select(foo => context.FindAsync<Foo>(foo.Id))
.ToList();
var foos = await context.AwaitAllAsync(tasks);
Internally, Task.WhenAll reifies your task sequence so it knows how many tasks it needs to wait for.
But this is really beside the point. The real problem you're trying to solve is how to serially execute asynchronous code, which is most easily done using foreach:
var foos = new List<Foo>();
foreach (var fooId in command.Foos.Select(f => f.Id))
foos.Add(await context.FindAsync<Foo>(fooId));
I have a List<Task<bool>> that I want to enumerate in parallel finding the first task to complete with a result of true and not waiting for or observe exceptions on any of the other tasks still pending.
var tasks = new List<Task<bool>>
{
Task.Delay(2000).ContinueWith(x => false),
Task.Delay(0).ContinueWith(x => true),
};
I have tried to use PLINQ to do something like:
var task = tasks.AsParallel().FirstOrDefault(t => t.Result);
Which executes in parallel, but doesn't return as soon as it finds a satisfying result. because accessing the Result property is blocking. In order for this to work using PLINQ, I'd have to write this aweful statement:
var cts = new CancellationTokenSource();
var task = tasks.AsParallel()
.FirstOrDefault(t =>
{
try
{
t.Wait(cts.Token);
if (t.Result)
{
cts.Cancel();
}
return t.Result;
}
catch (OperationCanceledException)
{
return false;
}
} );
I've written up an extension method that yields tasks as they complete like so.
public static class Exts
{
public static IEnumerable<Task<T>> InCompletionOrder<T>(this IEnumerable<Task<T>> source)
{
var tasks = source.ToList();
while (tasks.Any())
{
var t = Task.WhenAny(tasks);
yield return t.Result;
tasks.Remove(t.Result);
}
}
}
// and run like so
var task = tasks.InCompletionOrder().FirstOrDefault(t => t.Result);
But it feels like this is something common enough that there is a better way. Suggestions?
Maybe something like this?
var tcs = new TaskCompletionSource<Task<bool>>();
foreach (var task in tasks)
{
task.ContinueWith((t, state) =>
{
if (t.Result)
{
((TaskCompletionSource<Task<bool>>)state).TrySetResult(t);
}
},
tcs,
TaskContinuationOptions.OnlyOnRanToCompletion |
TaskContinuationOptions.ExecuteSynchronously);
}
var firstTaskToComplete = tcs.Task;
Perhaps you could try the Rx.Net library. Its very good for in effect providing Linq to Work.
Try this snippet in LinqPad after you reference the Microsoft Rx.Net assemblies.
using System
using System.Linq
using System.Reactive.Concurrency
using System.Reactive.Linq
using System.Reactive.Threading.Tasks
using System.Threading.Tasks
void Main()
{
var tasks = new List<Task<bool>>
{
Task.Delay(2000).ContinueWith(x => false),
Task.Delay(0).ContinueWith(x => true),
};
var observable = (from t in tasks.ToObservable()
//Convert task to an observable
let o = t.ToObservable()
//SelectMany
from x in o
select x);
var foo = observable
.SubscribeOn(Scheduler.Default) //Run the tasks on the threadpool
.ToList()
.First();
Console.WriteLine(foo);
}
First, I don't understand why are you trying to use PLINQ here. Enumerating a list of Tasks shouldn't take long, so I don't think you're going to gain anything from parallelizing it.
Now, to get the first Task that already completed with true, you can use the (non-blocking) IsCompleted property:
var task = tasks.FirstOrDefault(t => t.IsCompleted && t.Result);
If you wanted to get a collection of Tasks, ordered by their completion, have a look at Stephen Toub's article Processing tasks as they complete. If you want to list those that return true first, you would need to modify that code. If you don't want to modify it, you can use a version of this approach from Stephen Cleary's AsyncEx library.
Also, in the specific case in your question, you could “fix” your code by adding .WithMergeOptions(ParallelMergeOptions.NotBuffered) to the PLINQ query. But doing so still wouldn't work most of the time and can waste threads a lot even when it does. That's because PLINQ uses a constant number of threads and partitioning and using Result would block those threads most of the time.