Parallel.ForEach and async-await [duplicate]

Parallel.ForEach and async-await [duplicate] - c#

This question already has answers here:
Nesting await in Parallel.ForEach [duplicate]
(11 answers)
Closed last year.
I had such method:
public async Task<MyResult> GetResult()
{
MyResult result = new MyResult();
foreach(var method in Methods)
{
string json = await Process(method);
result.Prop1 = PopulateProp1(json);
result.Prop2 = PopulateProp2(json);
}
return result;
}
Then I decided to use Parallel.ForEach:
public async Task<MyResult> GetResult()
{
MyResult result = new MyResult();
Parallel.ForEach(Methods, async method =>
{
string json = await Process(method);
result.Prop1 = PopulateProp1(json);
result.Prop2 = PopulateProp2(json);
});
return result;
}
But now I've got an error:
An asynchronous module or handler completed while an asynchronous operation was still pending.

async doesn't work well with ForEach. In particular, your async lambda is being converted to an async void method. There are a number of reasons to avoid async void (as I describe in an MSDN article); one of them is that you can't easily detect when the async lambda has completed. ASP.NET will see your code return without completing the async void method and (appropriately) throw an exception.
What you probably want to do is process the data concurrently, just not in parallel. Parallel code should almost never be used on ASP.NET. Here's what the code would look like with asynchronous concurrent processing:
public async Task<MyResult> GetResult()
{
MyResult result = new MyResult();
var tasks = Methods.Select(method => ProcessAsync(method)).ToArray();
string[] json = await Task.WhenAll(tasks);
result.Prop1 = PopulateProp1(json[0]);
...
return result;
}

.NET 6 finally added Parallel.ForEachAsync, a way to schedule asynchronous work that allows you to control the degree of parallelism:
var urlsToDownload = new []
{
"https://dotnet.microsoft.com",
"https://www.microsoft.com",
"https://twitter.com/shahabfar"
};
var client = new HttpClient();
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
await Parallel.ForEachAsync(urlsToDownload, options, async (url, token) =>
{
var targetPath = Path.Combine(Path.GetTempPath(), "http_cache", url);
var response = await client.GetAsync(url, token);
// The request will be canceled in case of an error in another URL.
if (response.IsSuccessStatusCode)
{
using var target = File.OpenWrite(targetPath);
await response.Content.CopyToAsync(target);
}
});

Alternatively, with the AsyncEnumerator NuGet Package you can do this:
using System.Collections.Async;
public async Task<MyResult> GetResult()
{
MyResult result = new MyResult();
await Methods.ParallelForEachAsync(async method =>
{
string json = await Process(method);
result.Prop1 = PopulateProp1(json);
result.Prop2 = PopulateProp2(json);
}, maxDegreeOfParallelism: 10);
return result;
}
where ParallelForEachAsync is an extension method.

Ahh, okay. I think I know what's going on now. async method => an "async void" which is "fire and forget" (not recommended for anything other than event handlers). This means the caller cannot know when it is completed... So, GetResult returns while the operation is still running. Although the technical details of my first answer are incorrect, the result is the same here: that GetResult is returning while the operations started by ForEach are still running. The only thing you could really do is not await on Process (so that the lambda is no longer async) and wait for Process to complete each iteration. But, that will use at least one thread pool thread to do that and thus stress the pool slightly--likely making use of ForEach pointless. I would simply not use Parallel.ForEach...

Related

Correct pattern to check if an async Task completed synchronously before awaiting it

I have a bunch of requests to process, some of which may complete synchronously.
I'd like to gather all results that are immediately available and return them early, while waiting for the rest.
Roughly like this:
List<Task<Result>> tasks = new ();
List<Result> results = new ();
foreach (var request in myRequests) {
var task = request.ProcessAsync();
if (task.IsCompleted)
results.Add(task.Result); // or Add(await task) ?
else
tasks.Add(task);
}
// send results that are available "immediately" while waiting for the rest
if (results.Count > 0) SendResults(results);
results = await Task.WhenAll(tasks);
SendResults(results);
I'm not sure whether relying on IsCompleted might be a bad idea; could there be situations where its result cannot be trusted, or where it may change back to false again, etc.?
Similarly, could it be dangerous to use task.Result even after checking IsCompleted, should one always prefer await task? What if were using ValueTask instead of Task?

I'm not sure whether relying on IsCompleted might be a bad idea; could there be situations where its result cannot be trusted...
If you're in a multithreaded context, it's possible that IsCompleted could return false at the moment when you check on it, but it completes immediately thereafter. In cases like the code you're using, the cost of this happening would be very low, so I wouldn't worry about it.
or where it may change back to false again, etc.?
No, once a Task completes, it cannot uncomplete.
could it be dangerous to use task.Result even after checking IsCompleted.
Nope, that should always be safe.
should one always prefer await task?
await is a great default when you don't have a specific reason to do something else, but there are a variety of use cases where other patterns might be useful. The use case you've highlighted is a good example, where you want to return the results of finished tasks without awaiting all of them.
As Stephen Cleary mentioned in a comment below, it may still be worthwhile to use await to maintain expected exception behavior. You might consider doing something more like this:
var requestsByIsCompleted = myRequests.ToLookup(r => r.IsCompleted);
// send results that are available "immediately" while waiting for the rest
SendResults(await Task.WhenAll(requestsByIsCompleted[true]));
SendResults(await Task.WhenAll(requestsByIsCompleted[false]));
What if were using ValueTask instead of Task?
The answers above apply equally to both types.

You could use code like this to continually send the results of completed tasks while waiting on others to complete.
foreach (var request in myRequests)
{
tasks.Add(request.ProcessAsync());
}
// wait for at least one task to be complete, then send all available results
while (tasks.Count > 0)
{
// wait for at least one task to complete
Task.WaitAny(tasks.ToArray());
// send results for each completed task
var completedTasks = tasks.Where(t => t.IsCompleted);
var results = completedTasks.Where(t => t.IsCompletedSuccessfully).Select(t => t.Result).ToList();
SendResults(results);
// TODO: handle completed but failed tasks here
// remove completed tasks from the tasks list and keep waiting
tasks.RemoveAll(t => completedTasks.Contains(t));
}

Using only await you can achieve the desired behavior:
async Task ProcessAsync(MyRequest request, Sender sender)
{
var result = await request.ProcessAsync();
await sender.SendAsync(result);
}
...
async Task ProcessAll()
{
var tasks = new List<Task>();
foreach(var request in requests)
{
var task = ProcessAsync(request, sender);
// Dont await until all requests are queued up
tasks.Add(task);
}
// Await on all outstanding requests
await Task.WhenAll(tasks);
}

There are already good answers, but in addition of them here is my suggestion too, on how to handle multiple tasks and process each task differently, maybe it will suit your needs. My example is with events, but you can replace them with some kind of state management that fits your needs.
public interface IRequestHandler
{
event Func<object, Task> Ready;
Task ProcessAsync();
}
public class RequestHandler : IRequestHandler
{
// Hier where you wraps your request:
// private object request;
private readonly int value;
public RequestHandler(int value)
=> this.value = value;
public event Func<object, Task> Ready;
public async Task ProcessAsync()
{
await Task.Delay(1000 * this.value);
// Hier where you calls:
// var result = await request.ProcessAsync();
//... then do something over the result or wrap the call in try catch for example
var result = $"RequestHandler {this.value} - [{DateTime.Now.ToLongTimeString()}]";
if (this.Ready is not null)
{
// If result passes send the result to all subscribers
await this.Ready.Invoke($"RequestHandler {this.value} - [{DateTime.Now.ToLongTimeString()}]");
}
}
}
static void Main()
{
var a = new RequestHandler(1);
a.Ready += PrintAsync;
var b = new RequestHandler(2);
b.Ready += PrintAsync;
var c = new RequestHandler(3);
c.Ready += PrintAsync;
var d= new RequestHandler(4);
d.Ready += PrintAsync;
var e = new RequestHandler(5);
e.Ready += PrintAsync;
var f = new RequestHandler(6);
f.Ready += PrintAsync;
var requests = new List<IRequestHandler>()
{
a, b, c, d, e, f
};
var tasks = requests
.Select(x => Task.Run(x.ProcessAsync));
// Hier you must await all of the tasks
Task
.Run(async () => await Task.WhenAll(tasks))
.Wait();
}
static Task PrintAsync(object output)
{
Console.WriteLine(output);
return Task.CompletedTask;
}

Calling Async method

I am trying calling the async method SendParkInfo method using the await operator like this
await Task.WhenAny(parkList);and await Task.WhenAny(parkInfo);
parkInfo has SendParkInfo method object
Here is some part of my code.
public async Task<AvailableParksResponse> GetAvailableParks(IEnumerable<string> parkRegionList)
{
//
var parkList = parkRegionList.Select(x => SendParkInfo(x)).ToList();
var parkListTask = await Task.WhenAny(parkList);
response.ParkInfoList = new List<Task<ParkInfo>> { parkListTask };
var parkInfo = SendParkInfo(id);
var parkTask = await Task.WhenAny(parkInfo);
response.ParkInfo = new List<Task<ParkInfo>> { parkTask };
//
}
public virtual async Task<ParkInfo> SendParkInfo(string id)
{
//
var apiResponse = await apiClient.GetAsync(RequestUri + id);
//
}
Is it ok to call SendParkInfo like the way I am calling and then using await operator with Task.WhenAny method. Or is there any better way of calling the Async SendParkInfo method
Any help or suggestion would be appreciated.
Thanks in advance

Task.WhenAny() will return a Task which is considered completed when at least one item in the list of tasks passed into WhenAny() has completed. This usually an appropriate option if you want to return data incrementally, as processing completes.
Alternatively, if your intent is to only return a result when all async tasks have completed, consider Task.WhenAll().

Task.Run on method returning T VS method returning Task<T>

Scenario 1 - For each website in string list (_websites), the caller method wraps GetWebContent into a task, waits for all the tasks to finish and return results.
private async Task<string[]> AsyncGetUrlStringFromWebsites()
{
List<Task<string>> tasks = new List<Task<string>>();
foreach (var website in _websites)
{
tasks.Add(Task.Run(() => GetWebsiteContent(website)));
}
var results = await Task.WhenAll(tasks);
return results;
}
private string GetWebContent(string url)
{
var client = new HttpClient();
var content = client.GetStringAsync(url);
return content.Result;
}
Scenario 2 - For each website in string list (_websites), the caller method calls GetWebContent (returns Task< string >), waits for all the tasks to finish and return the results.
private async Task<string[]> AsyncGetUrlStringFromWebsites()
{
List<Task<string>> tasks = new List<Task<string>>();
foreach (var website in _websites)
{
tasks.Add(GetWebContent(website));
}
var results = await Task.WhenAll(tasks);
return results;
}
private async Task<string> GetWebContent(string url)
{
var client = new HttpClient();
var content = await client.GetStringAsync(url);
return content;
}
Questions - Which way is the correct approach and why? How does each approach impact achieving asynchronous processing?

With Task.Run() you occupy a thread from the thread pool and tell it to wait until the web content has been received.
Why would you want to do that? Do you pay someone to stand next to your mailbox to tell you when a letter arrives?
GetStringAsync already is asynchronous. The cpu has nothing to do (with this process) while the content comes in over the network.
So the second approach is correct, no need to use extra threads from the thread pool here.
Always interesting to read: Stephen Cleary's "There is no thread"

#René Vogt gave a great explanation.
There a minor 5 cents from my side.
In the second example there is not need to use async / await in GetWebContent method. You can simply return Task<string> (this would also reduce async depth).

Async\Await methods

I am a new in the Async/Await functionality and I tried to use them in my MVC project.
So from the Controller I call the current method to initialize my model:
var model = this.quantService.GetIndexViewModel(companyIds, isMore, currentrole).Result;
In this GetIndexViewModel I use await:
public async Task<CompanyBoardViewModel> GetIndexViewModel(IEnumerable<int> parameter, bool isMore = false, bool currentRole = false)
{
return new CompanyBoardViewModel
{
TopJoinAplicant = await this.TopJointApplicant(parameter, isMore),
TopPriorityCountry = await this.TopPriorityCountry(parameter),
TopPublicationCountries = await this.TopPublicationCountries(parameter),
TopGrantedInventors = await this.TopGrantedInventors(parameter),
TopIPC = await this.TopIPC(parameter),
TopCPC = await this.TopCPC(parameter),
TopCitedInventors = await this.TopCitedInventors(parameter),
TopCitedPatents = await this.TopCitedPatents(parameter),
CGAR = await this.GetCGAR(parameter),
};
}
For the first method I use these code:
private async Task<QuantTableViewModel<TopFilterViewModel>> TopJointApplicant(IEnumerable<int> ids, bool isMore = false)
{
return await Task.Run(() => new QuantTableViewModel<TopFilterViewModel>
{
Tableid = "TopJointApplicants",
Title = "Top Joint Applicants",
FirstCol = "Position",
SecondCol = "Joint Applicant",
ThirdCol = "#",
IsSeeMore = isMore,
Data = this.cache.TopJointApplicant(ids).ToList()
});
}
In this method I call : Data = this.cache.TopJointApplicant(ids).ToList()
this method created a procedure and get information from the Database(the method is executed without any problems), but when I try to return the QuantTableViewModel<TopFilterViewModel> I stack(as I go in a death log).
I will be really happy if anyone know why this is happened.

I explain the deadlock you're seeing on my blog. In short, don't block on async code; instead, use async all the way.
But there are other problems with your approach. As others have noted, await Task.Run is an antipattern on ASP.NET. You may want to read my article on async ASP.NET.
Finally, one other tip: you're approaching the problem from the wrong direction. Instead of just choosing a method to "make async", you should first think about what your application is doing, and start converting I/O calls to async at the lowest level. Convert them to use async APIs instead of blocking APIs (i.e., no Task.Run). Then change their callers to async, and their callers to async, eventually changing your controller method(s) to async.

You don't need to use Task.Run when using the async/await pattern.

You can actually have an async controller so you don't need to call .Result which renders async operation to run synchronously.
something like:
public Task<ActionResult> Index(object parameter)
{
var model = await this.quantService.GetIndexViewModel(companyIds, isMore, currentRole);
return View(model);
}

In this case I would say that's it's enough that your public method is async, since there's not really any asyncronous going on in TopJointApplicant.
public async Task<CompanyBoardViewModel> GetIndexViewModel(IEnumerable<int> parameter, bool isMore = false, bool currentRole = false)
{
return new CompanyBoardViewModel
{
TopJoinAplicant = this.TopJointApplicant(parameter, isMore),
TopPriorityCountry = await this.TopPriorityCountry(parameter),
TopPublicationCountries = await this.TopPublicationCountries(parameter),
TopGrantedInventors = await this.TopGrantedInventors(parameter),
TopIPC = await this.TopIPC(parameter),
TopCPC = await this.TopCPC(parameter),
TopCitedInventors = await this.TopCitedInventors(parameter),
TopCitedPatents = await this.TopCitedPatents(parameter),
CGAR = await this.GetCGAR(parameter),
};
}
private QuantTableViewModel<TopFilterViewModel> TopJointApplicant(IEnumerable<int> ids, bool isMore = false)
{
var Data = this.cache.TopJointApplicant(ids).ToList();
return new QuantTableViewModel<TopFilterViewModel>
{
Tableid = "TopJointApplicants",
Title = "Top Joint Applicants",
FirstCol = "Position",
SecondCol = "Joint Applicant",
ThirdCol = "#",
IsSeeMore = isMore,
Data = this.cache.TopJointApplicant(ids).ToList()
});
}
I recommend you to fully embrace the await/async-pattern if you're going to use it. This means that your controller also should use await/async.
public class YourController : Controller
{
// Note the Task<ActionResult> and async in your controller.
public async Task<ActionResult> YourControllerMethod()
{
var model = await this.quantService.GetIndexViewModel(companyIds, isMore, currentrole);
return View(model); // Or something like this.
}
}
Also, consider your naming convention. For clarity, async methods should end with the suffix Async, such as GetIndexViewModelAsync.
EDIT:
Based on the comments I think I should clarify what await/async does. An operation will not execute faster simply because you use the await/async-pattern, rather the opposite. Async/await creates an overhead for the thread management which would likely cause your operation to execute slower.
Instead, there are 2 main advantages:
When using async/await you will not block the thread. This means that while you're application is waiting for something else (such as IO, DB or webservice call) the thread can be used for something else, such as executing another we request. This doesn't mean that an DB-call will be executed faster. But it will let the thread do something else while waiting. If you're using IIS the number of threads are limited. So instead of locking them with expensive IO, they can serve another request while waiting.
You may do more things at the same time. For example, you could send a request to you DB, while executing a slow webservice call at the same time. This may cause the total execution time to be faster, since you're doing more things at the same time. However, there are limitations. For instance, if you're using Entity Framework, only one thread may access the context at the time. But while waiting for the DB, you can do something else. For example:
public class MyThreadingClass
{
private Task ExecuteWebServiceCallAsync()
{
return await _myService.DoSomething();
}
private Task ExecuteDbQueryAsync()
{
return await _context.Customer.FirstOrDefaultAsync();
}
public void DoThingsWithWaitAll()
{
var tasks = new Task[2];
// Fire up first task.
tasks[0] = ExecuteWebServiceCallAsync();
// Fire up next task.
tasks[1] = ExecuteDbQueryAsync();
// Wait for all tasks.
Task.WaitAll(tasks);
}
public Task DoThingsWithWithAwaitAsync()
{
// Fire up first task.
var webServiceTask = ExecuteWebServiceCallAsync();
// Fire up next task.
var dbTask = ExecuteDbQueryAsync();
// Wait for all tasks.
await webServiceTask;
await dbTask;
}
}
So, to sum up. The reason why you should use await/async is when you can do it ALL THE WAY down to the execution of the slow operation (such as DB or webservice). Or if you wish to do several things at once.
In your particular case you can do something like this:
public async Task<CompanyBoardViewModel> GetIndexViewModel(IEnumerable<int> parameter, bool isMore = false, bool currentRole = false)
{
// Let the threads start processing.
var topApplicantTask = this.TopJointApplicant(parameter, isMore);
var topPriorityCountryTask = this.TopPriorityCountry(parameter);
var topPublicationContriesTask = this.TopPublicationCountries(parameter);
var topIPCTask = this.TopIPC(parameter);
var topCPCTask = this.TopCPC(parameter);
var topCitedInventorsTask = this.TopCitedInventors(parameter);
var topCitetPatentsTask = this.TopCitedPatents(parameter);
var getCGARTask = this.GetCGAR(parameter);
// Await them later.
return new CompanyBoardViewModel
{
TopJoinAplicant = await topApplicantTask,
TopPriorityCountry = await topPriorityCountryTask,
TopPublicationCountries = await topPublicationContriesTask,
TopGrantedInventors = await this.TopGrantedInventors(parameter),
TopIPC = await topIPCTask,
TopCPC = await topCPCTask,
TopCitedInventors = await topCitedInventorsTask,
TopCitedPatents = await topCitetPatentsTask,
CGAR = await getCGARTask,
};
}
But try to avoid Task.Run since it's considered an anti-pattern. Instead, try to use await/async all the way from the controller to the actual operation (DB, IO, webservice). Also, in the example above there's a lot of threading going on. It should likely be cleaned up a bit, but you can see it as a proof-of-concept more than the suggested solution.

Do I create a deadlock for Task.WhenAll()

I seem to be experiencing a deadlock with the following code, but I do not understand why.
From a certain point in code I call this method.
public async Task<SearchResult> Search(SearchData searchData)
{
var tasks = new List<Task<FolderResult>>();
using (var serviceClient = new Service.ServiceClient())
{
foreach (var result in MethodThatCallsWebservice(serviceClient, config, searchData))
tasks.Add(result);
return await GetResult(tasks);
}
Where GetResult is as following:
private static async Task<SearchResult> GetResult(IEnumerable<Task<FolderResult>> tasks)
{
var result = new SearchResult();
await Task.WhenAll(tasks).ConfigureAwait(false);
foreach (var taskResult in tasks.Select(p => p.MyResult))
{
foreach (var folder in taskResult.Result)
{
// Do stuff to fill result
}
}
return result;
}
The line var result = new SearchResult(); never completes, though the GUI is responsive because of the following code:
public async void DisplaySearchResult(Task<SearchResult> searchResult)
{
var result = await searchResult;
FillResultView(result);
}
This method is called via an event handler that called the Search method.
_view.Search += (sender, args) => _view.DisplaySearchResult(_model.Search(args.Value));
The first line of DisplaySearchResult gets called, which follows the path down to the GetResult method with the Task.WhenAll(...) part.
Why isn't the Task.WhenAll(...) ever completed? Did I not understand the use of await correctly?
If I run the tasks synchronously, I do get the result but then the GUI freezes:
foreach (var task in tasks)
task.RunSynchronously();
I read various solutions, but most were in combination with Task.WaitAll() and therefore did not help much. I also tried to use the help from this blogpost as you can see in DisplaySearchResult but I failed to get it to work.
Update 1:
The method MethodThatCallsWebservice:
private IEnumerable<Task<FolderResult>> MethodThatCallsWebservice(ServiceClient serviceClient, SearchData searchData)
{
// Doing stuff here to determine keys
foreach(var key in keys)
yield return new Task<FolderResult>(() => new FolderResult(key, serviceClient.GetStuff(input))); // NOTE: This is not the async variant
}

Since you have an asynchronous version of GetStuff (GetStuffAsync) it's much better to use it instead of offloading the synchronous GetStuff to a ThreadPool thread with Task.Run. This wastes threads and limits scalability.
async methods return a "hot" task so you don't need to call Start:
IEnumerable<Task<FolderResult>> MethodThatCallsWebservice(ServiceClient serviceClient, SearchData searchData)
{
return keys.Select(async key =>
new FolderResult(key, await serviceClient.GetStuffAsync(input)));
}

You need to start your tasks before you return them. Or even better use Task.Run.
This:
yield return new Task<FolderResult>(() =>
new FolderResult(key, serviceClient.GetStuff(input)))
// NOTE: This is not the async variant
Is better written as:
yield return Task.Run<FolderResult>(() =>
new FolderResult(key, serviceClient.GetStuff(input)));

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Parallel.ForEach and async-await [duplicate] - c#

Related

Correct pattern to check if an async Task completed synchronously before awaiting it

Calling Async method

Task.Run on method returning T VS method returning Task<T>

Async\Await methods

Do I create a deadlock for Task.WhenAll()

Categories

Resources