I've got an async method, GetExpensiveThing(), which performs some expensive I/O work. This is how I am using it:
// Serial execution
public async Task<List<Thing>> GetThings()
{
var first = await GetExpensiveThing();
var second = await GetExpensiveThing();
return new List<Thing>() { first, second };
}
But since it's an expensive method, I want to execute these calls in parallel. I would have thought that moving the awaits would have solved this:
// Serial execution
public async Task<List<Thing>> GetThings()
{
var first = GetExpensiveThing();
var second = GetExpensiveThing();
return new List<Thing>() { await first, await second };
}
That didn't work, so I wrapped them in some tasks and this works:
// Parallel execution
public async Task<List<Thing>> GetThings()
{
var first = Task.Run(() =>
{
return GetExpensiveThing();
});
var second = Task.Run(() =>
{
return GetExpensiveThing();
});
return new List<Thing>() { first.Result, second.Result };
}
I even tried playing around with awaits and async in and around the tasks, but it got really confusing and I had no luck.
Is there a better way to run async methods in parallel, or are tasks a good approach?
Is there a better way to run async methods in parallel, or are tasks a good approach?
Yes, the "best" approach is to use the Task.WhenAll method. However, your second approach should already have run in parallel; I have created a .NET Fiddle that demonstrates this and should help shed some light.
Consider the following:
public Task<Thing[]> GetThingsAsync()
{
var first = GetExpensiveThingAsync();
var second = GetExpensiveThingAsync();
return Task.WhenAll(first, second);
}
Note
It is preferred to use the "Async" suffix: instead of GetThings and GetExpensiveThing, we should have GetThingsAsync and GetExpensiveThingAsync respectively - source.
Task.WhenAll() tends to perform poorly when a large number of tasks fire simultaneously without any moderation/throttling.
If you are running a lot of tasks from a list and want to await the final outcome, then I propose partitioning the work with a limit on the degree of parallelism.
I have adapted the elegant approach from Stephen Toub's blog to modern LINQ:
// Requires using System.Collections.Concurrent (for Partitioner) and System.Linq (for AsParallel)
public static Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> funcBody, int maxDoP = 4)
{
async Task AwaitPartition(IEnumerator<T> partition)
{
using (partition)
{
while (partition.MoveNext())
{
await Task.Yield(); // prevents a sync/hot thread hangup
await funcBody(partition.Current);
}
}
}
return Task.WhenAll(
Partitioner
.Create(source)
.GetPartitions(maxDoP)
.AsParallel()
.Select(p => AwaitPartition(p)));
}
How it works is simple: take an IEnumerable, dissect it into roughly even partitions, and then fire a function/method against each element, in each partition, at the same time. No more than one element in each partition is processed at any one time, but there are n Tasks running across n partitions.
Extension Usage:
await myList.ParallelForEachAsync(myFunc, Environment.ProcessorCount);
Edit:
I now keep some overloads in a repository on GitHub if you need more options. It's also available as a NuGet package targeting .NET Standard.
Edit 2: Thanks to comments from Theodor below, I was able to prevent poorly written async methods from blocking parallelism by using await Task.Yield();.
You can use Task.WhenAll, which returns when all of the supplied tasks are done.
Check this question here for reference.
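For example, applied to the original GetThings method, it could look something like this (a sketch, assuming GetExpensiveThing() returns Task&lt;Thing&gt;):
public async Task<List<Thing>> GetThings()
{
    var first = GetExpensiveThing();   // start both operations without awaiting them
    var second = GetExpensiveThing();  // so they run concurrently
    var things = await Task.WhenAll(first, second); // wait for both to finish
    return things.ToList();            // ToList() requires System.Linq
}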
If GetExpensiveThing is properly asynchronous (meaning it doesn't do any IO or CPU work synchronously), your second solution of invoking both methods and then awaiting the results should've worked. You could've also used Task.WhenAll.
However, if it isn't, you may get better results by posting each task to the thread-pool and using the Task.WhenAll combinator, e.g.:
public Task<Thing[]> GetThings() =>
Task.WhenAll(Task.Run(() => GetExpensiveThing()), Task.Run(() => GetExpensiveThing()));
(Note I changed the return type to Thing[], which is what Task.WhenAll produces, to avoid awaits altogether.)
You should avoid using the Result property. It causes the caller thread to block and wait for the task to complete, unlike await or Task.WhenAll which use continuations.
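For example, given the first task from the earlier snippets:
// Blocks the calling thread until the task completes:
var thing = first.Result;

// Frees the calling thread while waiting; the rest of the method resumes as a continuation:
var thing2 = await first;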
Related
I've got a loop that needs to be run in parallel as each iteration is slow and processor intensive but I also need to call an async method as part of each iteration in the loop.
I've seen questions on how to handle an async method in the loop but not a combination of async and synchronous, which is what I've got.
My (simplified) code is as follows - I know this won't work properly due to the async action being passed to Parallel.ForEach.
protected IDictionary<int, ReportData> GetReportData()
{
var results = new ConcurrentDictionary<int, ReportData>();
Parallel.ForEach(requestData, async data =>
{
// process data synchronously
var processedData = ProcessData(data);
// get some data async
var reportRequest = await BuildRequestAsync(processedData);
// synchronous building
var report = reportRequest.BuildReport();
results.TryAdd(data.ReportId, report);
});
// This needs to be populated before returning
return results;
}
Is there any way to execute the action in parallel when the action has to be async in order to await the single async call?
It's not a practical option to convert the synchronous functions to async.
I don't want to split the action up and have a Parallel.ForEach followed by the async calls with a WhenAll and another Parallel.ForEach as the speed of each stage can vary greatly between different iterations so splitting it would be inefficient as the faster ones would be waiting for the slower ones before continuing.
I did wonder if a PLINQ ForAll could be used instead of the Parallel.ForEach but have never used PLINQ and not sure if it would wait for all of the iterations to be completed before returning, i.e. would the Tasks still be running at the end of the process.
Is there any way to execute the action in parallel when the action has to be async in order to await the single async call?
Yes, but you'll need to understand what Parallel gives you that you lose when you take alternative approaches. Specifically, Parallel will automatically determine the appropriate number of threads and adjust based on usage.
It's not a practical option to convert the synchronous functions to async.
For CPU-bound methods, you shouldn't convert them.
I don't want to split the action up and have a Parallel.ForEach followed by the async calls with a WhenAll and another Parallel.ForEach as the speed of each stage can vary greatly between different iterations so splitting it would be inefficient as the faster ones would be waiting for the slower ones before continuing.
The first recommendation I would make is to look into TPL Dataflow. It allows you to define a "pipeline" of sorts that keeps the data flowing through while limiting the concurrency at each stage.
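For the OP's scenario, such a pipeline might look roughly like the sketch below. The RequestData type name and the exact shape of the OP's helpers are assumptions here; this needs the System.Threading.Tasks.Dataflow package.
var results = new ConcurrentDictionary<int, ReportData>();

// Stage 1: process each item and build its report, at most 4 at a time.
var buildReports = new TransformBlock<RequestData, (int ReportId, ReportData Report)>(
    async data =>
    {
        var processedData = ProcessData(data);                       // synchronous, CPU-bound
        var reportRequest = await BuildRequestAsync(processedData);  // asynchronous
        return (data.ReportId, reportRequest.BuildReport());
    },
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

// Stage 2: collect the results one at a time.
var collect = new ActionBlock<(int ReportId, ReportData Report)>(
    item => results.TryAdd(item.ReportId, item.Report));

buildReports.LinkTo(collect, new DataflowLinkOptions { PropagateCompletion = true });

foreach (var data in requestData)
    buildReports.Post(data);

buildReports.Complete();
await collect.Completion;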
I did wonder if a PLINQ ForAll could be used instead of the Parallel.ForEach
No. PLINQ is very similar to Parallel in how they work. There are a few differences in how aggressive they are about CPU utilization, and some API differences - e.g., if you have a collection of results coming out the end, PLINQ is usually cleaner than Parallel - but at a high-level view they're very similar. Both only work on synchronous code.
However, you could use a simple Task.Run with Task.WhenAll as such:
protected async Task<IDictionary<int, ReportData>> GetReportDataAsync()
{
var tasks = requestData.Select(data => Task.Run(async () =>
{
// process data synchronously
var processedData = ProcessData(data);
// get some data async
var reportRequest = await BuildRequestAsync(processedData);
// synchronous building
var report = reportRequest.BuildReport();
return (Key: data.ReportId, Value: report);
})).ToList();
var results = await Task.WhenAll(tasks);
return results.ToDictionary(x => x.Key, x => x.Value);
}
You may need to apply a concurrency limit (which Parallel would have done for you). In the asynchronous world, this would look like:
protected async Task<IDictionary<int, ReportData>> GetReportDataAsync()
{
var throttle = new SemaphoreSlim(10);
var tasks = requestData.Select(data => Task.Run(async () =>
{
await throttle.WaitAsync();
try
{
// process data synchronously
var processedData = ProcessData(data);
// get some data async
var reportRequest = await BuildRequestAsync(processedData);
// synchronous building
var report = reportRequest.BuildReport();
return (Key: data.ReportId, Value: report);
}
finally
{
throttle.Release();
}
})).ToList();
var results = await Task.WhenAll(tasks);
return results.ToDictionary(x => x.Key, x => x.Value);
}
I have a method that runs multiple async methods within it. I have to iterate over a list of devices, and pass the device to this method. I am noticing that this is taking a long time to complete so I am thinking of using Parallel.ForEach so it can run this process against multiple devices at the same time.
Let's say this is my method.
public async Task ProcessDevice(Device device) {
var dev = await _deviceService.LookupDeviceIndbAsNoTracking(device);
var result = await DoSomething(dev);
await DoSomething2(dev);
}
Then DoSomething2 also calls an async method.
public async Task DoSomething2(Device dev) {
foreach(var obj in dev.Objects) {
await DoSomething3(obj);
}
}
The list of devices continuously gets larger over time, so the more this list grows, the longer it takes the program to finish running ProcessDevice() against each device. I would like to process more than one device at a time. So I have been looking into using Parallel.ForEach.
Parallel.ForEach(devices, async device => {
try {
await ProcessDevice(device);
} catch (Exception ex) {
throw ex;
}
});
It appears that the program is finishing before the devices are fully processed. I have also tried creating a list of tasks, adding a new task that runs ProcessDevice for each device, and then awaiting Task.WhenAll(listOfTasks);
var listOfTasks = new List<Task>();
foreach(var device in devices) {
var task = Task.Run(async () => await ProcessDevice(device));
listOfTasks.Add(task);
}
await Task.WhenAll(listOfTasks);
But it appears that the task is marked as completed before ProcessDevice() is actually finished running.
Please excuse my ignorance on this issue as I am new to parallel processing and not sure what is going on. What is happening to cause this behavior and is there any documentation that you could provide that could help me better understand what to do?
You can't mix async with Parallel.ForEach. Since your underlying operation is asynchronous, you'd want to use asynchronous concurrency, not parallelism. Asynchronous concurrency is most easily expressed with WhenAll:
var listOfTasks = devices.Select(ProcessDevice).ToList();
await Task.WhenAll(listOfTasks);
In your last example there are a few problems:
var listOfTasks = new List<Task>();
foreach (var device in devices)
{
await Task.Run(async () => await ProcessDevice(device));
}
await Task.WhenAll(listOfTasks);
Doing await Task.Run(async () => await ProcessDevice(device)); means you are not moving to the next iteration of the foreach loop until the previous one is done. Essentially, you're still doing them one at a time.
Additionally, you aren't adding any tasks to listOfTasks, so it remains empty, and therefore Task.WhenAll(listOfTasks) completes instantly because there are no tasks to await.
Try this:
var listOfTasks = new List<Task>();
foreach (var device in devices)
{
var task = Task.Run(async () => await ProcessDevice(device));
listOfTasks.Add(task);
}
await Task.WhenAll(listOfTasks);
I can explain the problem with Parallel.ForEach. An important thing to understand is that when the await keyword acts on an incomplete Task, it returns. It will return its own incomplete Task if the method signature allows (if it's not void). Then it is up to the caller to use that Task object to wait for the job to finish.
But the second parameter in Parallel.ForEach is an Action<T>, which is a void method, which means no Task can be returned, which means the caller (Parallel.ForEach in this case) has no way to wait until the job has finished.
So in your case, as soon as it hits await ProcessDevice(device), the lambda returns, and since nothing waits for it to finish, the next iteration starts. By the time Parallel.ForEach is finished, all it has done is start all the tasks; it has not waited for them.
So don't use Parallel.ForEach with asynchronous code.
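Here is a minimal standalone sketch (not from the original question) that shows the behavior: the async lambda is converted to an Action<int>, i.e. an async void method, so Parallel.ForEach has nothing to wait on.
Parallel.ForEach(Enumerable.Range(0, 5), async i =>
{
    await Task.Delay(1000);      // the lambda returns to Parallel.ForEach here
    Console.WriteLine(i);        // runs later, after ForEach has already returned
});
Console.WriteLine("ForEach finished"); // typically prints before any of the numbers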
Stephen's answer is more appropriate. You can also use WSC's answer, but that can be risky with larger lists: queuing hundreds or thousands of thread-pool work items all at once will not help your performance.
I'm not very sure if this is what you are asking for, but I can give an example of how we start an async process:
private readonly Func<Worker> _worker;
private void StartWorkers(IEnumerable<Props> props){
Parallel.ForEach(props, timestamp => { _worker.Invoke().Consume(timestamp); });
}
I would recommend reading about Parallel.ForEach, as it will handle part of this for you.
I'm working on a problem where I have to delete records using a service call. The issue is that I have a foreach loop in which I have multiple await operations. This is making the operation take a lot of time, and performance is lacking:
foreach (var a in b) // b is a List<long>
{
    await _serviceresolver().DeleteOperationAsync(id, a);
}
The issue is that I have a foreach loop in which I have multiple await operations.
This is making the operation take a lot of time, and performance is lacking.
The number one solution is to reduce the number of calls. This is often called "chunky" over "chatty". So if your service supports some kind of bulk-delete operation, then expose it in your service type and then you can just do:
await _serviceresolver().BulkDeleteOperationAsync(id, b);
But if that isn't possible, then you can at least use asynchronous concurrency. This is quite different from parallelism; you don't want to use Parallel or PLINQ.
var service = _serviceresolver();
var tasks = b.Select(a => service.DeleteOperationAsync(id, a)).ToList();
await Task.WhenAll(tasks);
I do not know what code is behind DeleteOperationAsync, but async/await certainly isn't designed to speed things up; it was designed to "spare" threads (colloquially speaking).
The best approach would be to change the method to take the whole list of ids as a parameter, instead of taking and sending just one id at a time,
and then perform this async/await-heavy operation only once for all of the ids.
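For example, a hypothetical bulk signature on the service could look like this (the method name and parameters are assumptions, since the real service isn't shown):
// Hypothetical bulk overload - one service call for the whole list of ids:
Task DeleteOperationAsync(long id, IReadOnlyCollection<long> idsToDelete);

// Caller - a single awaited call replaces the per-id loop:
await _serviceresolver().DeleteOperationAsync(id, b);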
If that is not possible, you could just run it in parallel using the TPL (but it is really the worst-case scenario - really :) )
Parallel.ForEach(listOfIdsToDelete,
async idToDelete => await _serviceresolver().DeleteOperationAsync(id,idToDelete)
);
You're waiting for each async operation to finish right now. If you can fire them all off concurrently, you can just call them without the await, or if you need to know when they finish, you can just fire them all off and then wait for them all to finish by tracking the tasks in a list:
List<Task> tasks = new List<Task>();
foreach (var a in b) // b is a List<long>
tasks.Add(_serviceresolver().DeleteOperationAsync(id, a));
await Task.WhenAll(tasks);
You can use PLINQ (to leverage all the processors of your machine) and the Task.WhenAll method (so that the calling thread is not frozen). In code, it results in something like this:
class Program {
static async Task Main(string[] args) {
var list = new List<long> {
4, 3, 2
};
var service = new Service();
var response =
from item in list.AsParallel()
select service.DeleteOperationAsync(item);
await Task.WhenAll(response);
}
}
public class Service {
public async Task DeleteOperationAsync(long value) {
await Task.Delay(2000);
Console.WriteLine($"Finished... {value}");
}
}
I have an enumeration of items (RunData.Demand), each representing some work involving calling an API over HTTP. It works great if I just foreach through it all and call the API during each iteration. However, each iteration takes a second or two so I'd like to run 2-3 threads and divide up the work between them. Here's what I'm doing:
ThreadPool.SetMaxThreads(2, 5); // Trying to limit the amount of threads
var tasks = RunData.Demand
.Select(service => Task.Run(async delegate
{
var availabilityResponse = await client.QueryAvailability(service);
// Do some other stuff, not really important
}));
await Task.WhenAll(tasks);
The client.QueryAvailability call basically calls an API using the HttpClient class:
public async Task<QueryAvailabilityResponse> QueryAvailability(QueryAvailabilityMultidayRequest request)
{
var response = await client.PostAsJsonAsync("api/queryavailabilitymultiday", request);
if (response.IsSuccessStatusCode)
{
return await response.Content.ReadAsAsync<QueryAvailabilityResponse>();
}
throw new HttpException((int) response.StatusCode, response.ReasonPhrase);
}
This works great for a while, but eventually things start timing out. If I set the HttpClient Timeout to an hour, then I start getting weird internal server errors.
What I started doing was setting a Stopwatch within the QueryAvailability method to see what was going on.
What's happening is all 1200 items in RunData.Demand are being created at once and all 1200 await client.PostAsJsonAsync methods are being called. It appears it then uses the 2 threads to slowly check back on the tasks, so towards the end I have tasks that have been waiting for 9 or 10 minutes.
Here's the behavior I would like:
I'd like to create the 1,200 tasks, then run them 3-4 at a time as threads become available. I do not want to queue up 1,200 HTTP calls immediately.
Is there a good way to go about doing this?
As I always recommend.. what you need is TPL Dataflow (to install: Install-Package System.Threading.Tasks.Dataflow).
You create an ActionBlock with an action to perform on each item. Set MaxDegreeOfParallelism for throttling. Start posting into it and await its completion:
var block = new ActionBlock<QueryAvailabilityMultidayRequest>(async service =>
{
var availabilityResponse = await client.QueryAvailability(service);
// ...
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
foreach (var service in RunData.Demand)
{
block.Post(service);
}
block.Complete();
await block.Completion;
Old question, but I would like to propose an alternative lightweight solution using the SemaphoreSlim class. Just reference System.Threading.
SemaphoreSlim sem = new SemaphoreSlim(4,4);
foreach (var service in RunData.Demand)
{
await sem.WaitAsync();
Task t = Task.Run(async () =>
{
var availabilityResponse = await client.QueryAvailability(service);
// do your other stuff here with the result of QueryAvailability
});
t.ContinueWith(_ => sem.Release());
}
The semaphore acts as a locking mechanism. You can only enter the semaphore by calling Wait (WaitAsync), which subtracts one from the count. Calling Release adds one to the count.
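A slightly more defensive variant of the same idea (a sketch, not the original code) releases the semaphore in a finally block so that a failed call cannot leak a slot, and collects the tasks so they can all be awaited at the end:
var sem = new SemaphoreSlim(4, 4);
var tasks = new List<Task>();

foreach (var service in RunData.Demand)
{
    await sem.WaitAsync();
    tasks.Add(Task.Run(async () =>
    {
        try
        {
            var availabilityResponse = await client.QueryAvailability(service);
            // do your other stuff here with the result of QueryAvailability
        }
        finally
        {
            sem.Release();
        }
    }));
}

await Task.WhenAll(tasks);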
You're using async HTTP calls, so limiting the number of threads will not help (nor will ParallelOptions.MaxDegreeOfParallelism in Parallel.ForEach as one of the answers suggests). Even a single thread can initiate all requests and process the results as they arrive.
One way to solve it is to use TPL Dataflow.
Another nice solution is to divide the source IEnumerable into partitions and process items in each partition sequentially as described in this blog post:
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(dop)
select Task.Run(async delegate
{
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
While the Dataflow library is great, I think it's a bit heavy when not using block composition. I would tend to use something like the extension method below.
Also, unlike the Partitioner method, this runs the async methods on the calling context - the caveat being that if your code is not truly async, or takes a 'fast path', then it will effectively run synchronously since no threads are explicitly created.
public static async Task RunParallelAsync<T>(this IEnumerable<T> items, Func<T, Task> asyncAction, int maxParallel)
{
var tasks = new List<Task>();
foreach (var item in items)
{
tasks.Add(asyncAction(item));
if (tasks.Count < maxParallel)
continue;
var notCompleted = tasks.Where(t => !t.IsCompleted).ToList();
if (notCompleted.Count >= maxParallel)
await Task.WhenAny(notCompleted);
}
await Task.WhenAll(tasks);
}
I am trying to get to grips with Tasks and the Async Await keywords. I have a little sample method which essentially invokes n number of methods. Two key points to note are
I don't care about the order the methods run in
All methods should be called on a background thread
With this here is the code.
public async void Handle<T>(T entry) {
await Task.Run(() => {
Parallel.ForEach(_handlers, pair => {
pair.Value.Invoke(_reference, new object[] {
entry
});
});
});
}
My question is: did I actually gain any asynchrony or parallelism from the code above?
I'm assuming that you're running in a UI application, since parallel code on a server is quite rare.
In this case, wrapping parallel code in a Task.Run is perfectly normal, and a common pattern when you wish to execute parallel code asynchronously.
I would make a few changes:
Avoid async void. Return a Task, so you can handle errors more cleanly.
Follow the Task-based Asynchronous Pattern naming conventions (i.e., end your method name with Async).
Return the Task directly instead of async/await (if your entire async method is just to await a task, you can avoid some overhead just by returning the task directly).
Like this:
public Task HandleAsync<T>(T entry) {
return Task.Run(() => {
Parallel.ForEach(_handlers, pair => {
pair.Value.Invoke(_reference, new object[] {
entry
});
});
});
}
I'd also consider two other possibilities:
Consider using Parallel.Invoke instead of Parallel.ForEach. It seems to me that Parallel.Invoke is a closer match to what you're actually trying to do. (See svick's comment below).
Consider leaving the parallel code in a synchronous method (e.g., public void Handle<T>(T entry)) and calling it with Task.Run (e.g., await Task.Run(() => Handle(entry));). If your Handle[Async] method is intended to be part of a general-purpose library, then you want to avoid the use of Task.Run within your asynchronous methods.
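For example, the second option could look roughly like this (a sketch; _handlers and _reference come from the original question, and the caller is assumed to be UI code such as an event handler):
public void Handle<T>(T entry)
{
    Parallel.ForEach(_handlers, pair =>
    {
        pair.Value.Invoke(_reference, new object[] { entry });
    });
}

// The caller decides whether to push the work to the thread pool:
await Task.Run(() => Handle(entry));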
No, you've gained nothing here. Even if you made it return Task (so that the caller could check when everything had finished) you'd be better off without async:
public Task Handle<T>(T entry)
{
return Task.Run(() =>
{
Parallel.ForEach(_handlers, pair =>
{
pair.Value.Invoke(_reference, new object[] { entry });
});
});
}