How do I turn the following into a Parallel.ForEach?
public async void getThreadContents(String[] threads)
{
HttpClient client = new HttpClient();
List<String> usernames = new List<String>();
int i = 0;
foreach (String url in threads)
{
i++;
progressLabel.Text = "Scanning thread " + i.ToString() + "/" + threads.Count<String>();
HttpResponseMessage response = await client.GetAsync(url);
String content = await response.Content.ReadAsStringAsync();
String user;
Predicate<String> userPredicate;
foreach (Match match in regex.Matches(content))
{
user = match.Groups[1].ToString();
userPredicate = (String x) => x == user;
if (usernames.Find(userPredicate) != user)
{
usernames.Add(match.Groups[1].ToString());
}
}
progressBar1.PerformStep();
}
}
I coded it on the assumption that asynchronous and parallel processing would be the same, and I just realized they aren't. I took a look at all the questions I could find on this, and I really can't seem to find an example that does it for me. Most of them lack readable variable names. Using single-letter variable names which don't explain what they contain is a horrible way to state an example.
I normally have between 300 and 2000 entries in the array named threads (it contains URLs to forum threads), and it would seem that parallel processing (due to the many HTTP requests) would speed up the execution.
Do I have to remove all the asynchrony (I got nothing async outside the foreach, only variable definitions) before I can use Parallel.ForEach? How should I go about doing this? Can I do this without blocking the main thread?
I am using .NET 4.5 by the way.
I coded it on the assumption that asynchronous and parallel processing would be the same
Asynchronous processing and parallel processing are quite different. If you don't understand the difference, I think you should first read more about it (for example what is the relation between Asynchronous and parallel programming in c#?).
Now, what you want to do is actually not that simple, because you want to process a big collection asynchronously, with a specific degree of parallelism (8). With synchronous processing, you could use Parallel.ForEach() (along with ParallelOptions to configure the degree of parallelism), but there is no simple alternative that would work with async.
In your code, this is complicated by the fact that you expect everything to execute on the UI thread. (Though ideally, you shouldn't access the UI directly from your computation. Instead, you should use IProgress, which would mean the code no longer has to execute on the UI thread.)
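As a rough sketch of the IProgress suggestion: the class name ThreadScanner, the method ScanAsync and the fetchAsync delegate below are hypothetical and not part of the question's code. The idea is only that the loop reports a number and leaves the display to the caller:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ThreadScanner
{
    // Hypothetical stand-in for the question's loop.
    // fetchAsync is whatever downloads the content of one URL.
    public static async Task ScanAsync(
        IReadOnlyList<string> urls,
        Func<string, Task<string>> fetchAsync,
        IProgress<int> progress)
    {
        for (int index = 0; index < urls.Count; index++)
        {
            string content = await fetchAsync(urls[index]);
            // ... extract user names from content here ...

            // Report how many URLs are done; the caller decides how to show it.
            progress?.Report(index + 1);
        }
    }
}
```

On the UI thread the reporter could be created with something like new Progress<int>(done => progressLabel.Text = "Scanning thread " + done + "/" + threads.Length). Progress<T> captures the synchronization context it was created on, so the callback runs on the UI thread even when ScanAsync does not.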
Probably the best way to do this in .NET 4.5 is to use TPL Dataflow. Its ActionBlock does exactly what you want, but it can be quite verbose (because it's more flexible than what you need). So it makes sense to create a helper method:
public static Task AsyncParallelForEach<T>(
IEnumerable<T> source, Func<T, Task> body,
int maxDegreeOfParallelism = DataflowBlockOptions.Unbounded,
TaskScheduler scheduler = null)
{
var options = new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = maxDegreeOfParallelism
};
if (scheduler != null)
options.TaskScheduler = scheduler;
var block = new ActionBlock<T>(body, options);
foreach (var item in source)
block.Post(item);
block.Complete();
return block.Completion;
}
In your case, you would use it like this:
await AsyncParallelForEach(
threads, async url => await DownloadUrl(url), 8,
TaskScheduler.FromCurrentSynchronizationContext());
Here, DownloadUrl() is an async Task method that processes a single URL (the body of your loop), 8 is the degree of parallelism (probably shouldn't be a literal constant in real code) and FromCurrentSynchronizationContext() makes sure the code executes on the UI thread.
Stephen Toub has a good blog post on implementing a ForEachAsync. Svick's answer is quite good for platforms on which Dataflow is available.
Here's an alternative, using the partitioner from the TPL:
public static Task ForEachAsync<T>(this IEnumerable<T> source,
int degreeOfParallelism, Func<T, Task> body)
{
var partitions = Partitioner.Create(source).GetPartitions(degreeOfParallelism);
var tasks = partitions.Select(async partition =>
{
using (partition)
while (partition.MoveNext())
await body(partition.Current);
});
return Task.WhenAll(tasks);
}
You can then use this as such:
public async Task getThreadContentsAsync(String[] threads)
{
HttpClient client = new HttpClient();
ConcurrentDictionary<String, object> usernames = new ConcurrentDictionary<String, object>();
await threads.ForEachAsync(8, async url =>
{
HttpResponseMessage response = await client.GetAsync(url);
String content = await response.Content.ReadAsStringAsync();
String user;
foreach (Match match in regex.Matches(content))
{
user = match.Groups[1].ToString();
usernames.TryAdd(user, null);
}
progressBar1.PerformStep();
});
}
Yet another alternative is using SemaphoreSlim or AsyncSemaphore (which is included in my AsyncEx library and supports many more platforms than SemaphoreSlim):
public async Task getThreadContentsAsync(String[] threads)
{
SemaphoreSlim semaphore = new SemaphoreSlim(8);
HttpClient client = new HttpClient();
ConcurrentDictionary<String, object> usernames = new ConcurrentDictionary<String, object>();
await Task.WhenAll(threads.Select(async url =>
{
await semaphore.WaitAsync();
try
{
HttpResponseMessage response = await client.GetAsync(url);
String content = await response.Content.ReadAsStringAsync();
String user;
foreach (Match match in regex.Matches(content))
{
user = match.Groups[1].ToString();
usernames.TryAdd(user, null);
}
progressBar1.PerformStep();
}
finally
{
semaphore.Release();
}
}));
}
You can try the ParallelForEachAsync extension method from AsyncEnumerator NuGet Package:
using System.Collections.Async;
public async void getThreadContents(String[] threads)
{
HttpClient client = new HttpClient();
List<String> usernames = new List<String>();
int i = 0;
await threads.ParallelForEachAsync(async url =>
{
i++;
progressLabel.Text = "Scanning thread " + i.ToString() + "/" + threads.Count<String>();
HttpResponseMessage response = await client.GetAsync(url);
String content = await response.Content.ReadAsStringAsync();
String user;
Predicate<String> userPredicate;
foreach (Match match in regex.Matches(content))
{
user = match.Groups[1].ToString();
userPredicate = (String x) => x == user;
if (usernames.Find(userPredicate) != user)
{
usernames.Add(match.Groups[1].ToString());
}
}
// THIS CALL MUST BE THREAD-SAFE!
progressBar1.PerformStep();
},
maxDegreeOfParallelism: 8);
}
I would like to handle a collection in parallel, but I'm having trouble implementing it and I'm therefore hoping for some help.
The trouble arises if I want to call a method marked async in C#, within the lambda of the parallel loop. For example:
var bag = new ConcurrentBag<object>();
Parallel.ForEach(myCollection, async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
});
var count = bag.Count;
The problem occurs with the count being 0, because all the threads created are effectively just background threads and the Parallel.ForEach call doesn't wait for completion. If I remove the async keyword, the method looks like this:
var bag = new ConcurrentBag<object>();
Parallel.ForEach(myCollection, item =>
{
// some pre stuff
var responseTask = GetData(item);
responseTask.Wait();
var response = responseTask.Result;
bag.Add(response);
// some post stuff
});
var count = bag.Count;
It works, but it completely disables the await cleverness, and I have to do some manual exception handling (removed for brevity).
How can I implement a Parallel.ForEach loop, that uses the await keyword within the lambda? Is it possible?
The prototype of the Parallel.ForEach method takes an Action<T> as parameter, but I want it to wait for my asynchronous lambda.
If you just want simple parallelism, you can do this:
var bag = new ConcurrentBag<object>();
var tasks = myCollection.Select(async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
});
await Task.WhenAll(tasks);
var count = bag.Count;
If you need something more complex, check out Stephen Toub's ForEachAsync post.
You can use the ParallelForEachAsync extension method from AsyncEnumerator NuGet Package:
using Dasync.Collections;
var bag = new ConcurrentBag<object>();
await myCollection.ParallelForEachAsync(async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
}, maxDegreeOfParallelism: 10);
var count = bag.Count;
Disclaimer: I'm the author of the AsyncEnumerator library, which is open source and licensed under MIT, and I'm posting this message just to help the community.
One of the new .NET 6 APIs is Parallel.ForEachAsync, a way to schedule asynchronous work that allows you to control the degree of parallelism:
var urls = new []
{
"https://dotnet.microsoft.com",
"https://www.microsoft.com",
"https://stackoverflow.com"
};
var client = new HttpClient();
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
await Parallel.ForEachAsync(urls, options, async (url, token) =>
{
var targetPath = Path.Combine(Path.GetTempPath(), "http_cache", url);
var response = await client.GetAsync(url);
if (response.IsSuccessStatusCode)
{
using var target = File.OpenWrite(targetPath);
await response.Content.CopyToAsync(target);
}
});
Another example in Scott Hanselman's blog.
The source, for reference.
With SemaphoreSlim you can achieve parallelism control.
var bag = new ConcurrentBag<object>();
var maxParallel = 20;
var throttler = new SemaphoreSlim(initialCount: maxParallel);
var tasks = myCollection.Select(async item =>
{
await throttler.WaitAsync();
try
{
var response = await GetData(item);
bag.Add(response);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
var count = bag.Count;
Simplest possible extension method compiled from other answers and the article referenced by the accepted answer:
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> asyncAction, int maxDegreeOfParallelism)
{
var throttler = new SemaphoreSlim(initialCount: maxDegreeOfParallelism);
var tasks = source.Select(async item =>
{
await throttler.WaitAsync();
try
{
await asyncAction(item).ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
}
UPDATE: here's a simple modification that also supports a cancellation token like requested in the comments (untested)
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, CancellationToken, Task> asyncAction, int maxDegreeOfParallelism, CancellationToken cancellationToken)
{
var throttler = new SemaphoreSlim(initialCount: maxDegreeOfParallelism);
var tasks = source.Select(async item =>
{
await throttler.WaitAsync(cancellationToken);
try
{
// Check inside the try block so the acquired semaphore slot is always released.
if (cancellationToken.IsCancellationRequested) return;
await asyncAction(item, cancellationToken).ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
}
My lightweight implementation of ParallelForEach async.
Features:
Throttling (max degree of parallelism).
Exception handling (aggregation exception will be thrown at completion).
Memory efficient (no need to store the list of tasks).
public static class AsyncEx
{
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> asyncAction, int maxDegreeOfParallelism = 10)
{
var semaphoreSlim = new SemaphoreSlim(maxDegreeOfParallelism);
var tcs = new TaskCompletionSource<object>();
var exceptions = new ConcurrentBag<Exception>();
bool addingCompleted = false;
foreach (T item in source)
{
await semaphoreSlim.WaitAsync();
asyncAction(item).ContinueWith(t =>
{
semaphoreSlim.Release();
if (t.Exception != null)
{
exceptions.Add(t.Exception);
}
if (Volatile.Read(ref addingCompleted) && semaphoreSlim.CurrentCount == maxDegreeOfParallelism)
{
tcs.TrySetResult(null);
}
});
}
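Volatile.Write(ref addingCompleted, true);
// Covers an empty source, and the case where the last action finished before the flag was set.
if (semaphoreSlim.CurrentCount == maxDegreeOfParallelism)
{
tcs.TrySetResult(null);
}
await tcs.Task;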
if (exceptions.Count > 0)
{
throw new AggregateException(exceptions);
}
}
}
Usage example:
await Enumerable.Range(1, 10000).ParallelForEachAsync(async (i) =>
{
var data = await GetData(i);
}, maxDegreeOfParallelism: 100);
I've created an extension method for this which makes use of SemaphoreSlim and also allows to set maximum degree of parallelism
/// <summary>
/// Concurrently executes async actions for each item of <see cref="IEnumerable{T}"/>.
/// </summary>
/// <typeparam name="T">Type of the items in the IEnumerable</typeparam>
/// <param name="enumerable">Instance of <see cref="IEnumerable{T}"/></param>
/// <param name="action">An async action to execute for each item</param>
/// <param name="maxDegreeOfParallelism">Optional. An integer that represents the maximum degree of parallelism.
/// Must be greater than 0.</param>
/// <returns>A Task representing an async operation</returns>
/// <exception cref="ArgumentOutOfRangeException">If maxDegreeOfParallelism is less than 1</exception>
public static async Task ForEachAsyncConcurrent<T>(
this IEnumerable<T> enumerable,
Func<T, Task> action,
int? maxDegreeOfParallelism = null)
{
if (maxDegreeOfParallelism.HasValue)
{
using (var semaphoreSlim = new SemaphoreSlim(
maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value))
{
var tasksWithThrottler = new List<Task>();
foreach (var item in enumerable)
{
// Increment the number of currently running tasks and wait if they are more than limit.
await semaphoreSlim.WaitAsync();
tasksWithThrottler.Add(Task.Run(async () =>
{
await action(item).ContinueWith(res =>
{
// action is completed, so decrement the number of currently running tasks
semaphoreSlim.Release();
});
}));
}
// Wait for all tasks to complete.
await Task.WhenAll(tasksWithThrottler.ToArray());
}
}
else
{
await Task.WhenAll(enumerable.Select(item => action(item)));
}
}
Sample Usage:
await enumerable.ForEachAsyncConcurrent(
async item =>
{
await SomeAsyncMethod(item);
},
5);
In the accepted answer the ConcurrentBag is not required.
Here's an implementation without it:
var tasks = myCollection.Select(GetData).ToList();
await Task.WhenAll(tasks);
var results = tasks.Select(t => t.Result);
Any of the "// some pre stuff" and "// some post stuff" can go into the GetData implementation (or another method that calls GetData)
Aside from being shorter, there's no use of an "async void" lambda, which is an anti pattern.
The following is set to work with IAsyncEnumerable but can be modified to use IEnumerable by just changing the type and removing the "await" on the foreach. It's far more appropriate for large sets of data than creating countless parallel tasks and then awaiting them all.
public static async Task ForEachAsyncConcurrent<T>(this IAsyncEnumerable<T> enumerable, Func<T, Task> action, int maxDegreeOfParallelism, int? boundedCapacity = null)
{
ActionBlock<T> block = new ActionBlock<T>(
action,
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = maxDegreeOfParallelism,
BoundedCapacity = boundedCapacity ?? maxDegreeOfParallelism * 3
});
await foreach (T item in enumerable)
{
await block.SendAsync(item).ConfigureAwait(false);
}
block.Complete();
await block.Completion;
}
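The IEnumerable variant mentioned above might look like the following sketch. It is the same method with the parameter type changed and the await removed from the foreach; the class name EnumerableDataflowExtensions is made up for illustration, and the code assumes the System.Threading.Tasks.Dataflow library is available:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class EnumerableDataflowExtensions
{
    public static async Task ForEachAsyncConcurrent<T>(
        this IEnumerable<T> enumerable,
        Func<T, Task> action,
        int maxDegreeOfParallelism,
        int? boundedCapacity = null)
    {
        var block = new ActionBlock<T>(
            action,
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = maxDegreeOfParallelism,
                BoundedCapacity = boundedCapacity ?? maxDegreeOfParallelism * 3
            });

        foreach (T item in enumerable)
        {
            // SendAsync waits while the block is at its bounded capacity,
            // so the source is only enumerated as fast as items are consumed.
            await block.SendAsync(item).ConfigureAwait(false);
        }

        block.Complete();
        await block.Completion.ConfigureAwait(false);
    }
}
```

The bounded capacity is what keeps memory use flat for large inputs: at most that many items are buffered at a time, instead of one task per item being created up front.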
For a simpler solution (not sure if it is the most optimal), you can simply nest Parallel.ForEach inside a Task, as such:
var options = new ParallelOptions { MaxDegreeOfParallelism = 5 };
Task.Run(() =>
{
Parallel.ForEach(myCollection, options, item =>
{
DoWork(item);
});
});
The ParallelOptions will do the throttling for you, out of the box.
I am using it in a real-world scenario to run very long operations in the background. These operations are called via HTTP, and it was designed not to block the HTTP call while the long operation is running:
An HTTP call requests a long background operation.
The operation starts in the background.
The user gets a status ID, which can be used to check the status with another HTTP call.
The background operation updates its status.
That way, the CI/CD call does not time out because of a long HTTP operation; instead it polls the status every x seconds without blocking the process.
Scenario 1 - For each website in the string list (_websites), the caller method wraps GetWebContent in a task, waits for all the tasks to finish and returns the results.
private async Task<string[]> AsyncGetUrlStringFromWebsites()
{
List<Task<string>> tasks = new List<Task<string>>();
foreach (var website in _websites)
{
tasks.Add(Task.Run(() => GetWebContent(website)));
}
var results = await Task.WhenAll(tasks);
return results;
}
private string GetWebContent(string url)
{
var client = new HttpClient();
var content = client.GetStringAsync(url);
return content.Result;
}
Scenario 2 - For each website in the string list (_websites), the caller method calls GetWebContent (which returns Task<string>), waits for all the tasks to finish and returns the results.
private async Task<string[]> AsyncGetUrlStringFromWebsites()
{
List<Task<string>> tasks = new List<Task<string>>();
foreach (var website in _websites)
{
tasks.Add(GetWebContent(website));
}
var results = await Task.WhenAll(tasks);
return results;
}
private async Task<string> GetWebContent(string url)
{
var client = new HttpClient();
var content = await client.GetStringAsync(url);
return content;
}
Questions - Which way is the correct approach and why? How does each approach impact achieving asynchronous processing?
With Task.Run() you occupy a thread from the thread pool and tell it to wait until the web content has been received.
Why would you want to do that? Do you pay someone to stand next to your mailbox to tell you when a letter arrives?
GetStringAsync is already asynchronous. The CPU has nothing to do (for this process) while the content comes in over the network.
So the second approach is correct, no need to use extra threads from the thread pool here.
Always interesting to read: Stephen Cleary's "There is no thread"
@René Vogt gave a great explanation.
Here are my minor five cents.
In the second example there is no need to use async / await in the GetWebContent method. You can simply return the Task<string> directly (this would also reduce the async depth).
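A minimal sketch of that suggestion is shown here. The class name WebContent and the shared static client are assumptions made for the example; the original method creates a new HttpClient on every call:

```csharp
using System.Net.Http;
using System.Threading.Tasks;

public static class WebContent
{
    // Assumption for this sketch: one client shared by all calls.
    private static readonly HttpClient Client = new HttpClient();

    // No async/await here: the Task<string> produced by GetStringAsync
    // is handed straight back to the caller, who awaits it.
    public static Task<string> GetWebContent(string url)
    {
        return Client.GetStringAsync(url);
    }
}
```

The caller in Scenario 2 stays unchanged, since it only needs a Task<string> to pass to Task.WhenAll. One thing to keep in mind: if the method wrapped the call in a using block, the await would have to stay, because otherwise the client would be disposed before the request finished.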
I'm having trouble trying to correctly architect the most efficient way to iterate several async tasks launched from a request object and then performing some other async tasks that depend on both the request object and the result of the first async task. I'm running a C# lambda function in AWS. I've tried a model like this (error handling and such has been omitted for brevity):
public async Task MyAsyncWrapper()
{
List<Task> Tasks = new List<Task>();
foreach (var Request in Requests)
{
var Continuation = this.ExecuteAsync(Request).ContinueWith(async x => {
KeyValuePair<bool, string> Result = x.Result;
if (Result.Key == true)
{
await this.DoSomethingElseAsync(Request.Id, Request.Name, Result.Value);
Console.WriteLine("COMPLETED");
}
});
Tasks.Add(Continuation);
}
Task.WaitAll(Tasks.ToArray());
}
This approach results in the DoSomethingElseAsync() method not really getting awaited on and in a lot of my Lambda Function calls, I never get the "COMPLETED" output. I've also approached this in this method:
public async Task MyAsyncWrapper()
{
foreach (var Request in Requests)
{
KeyValuePair<bool, string> Result = await this.ExecuteAsync(Request);
if (Result.Key == true)
{
await this.DoSomethingElseAsync(Request.Id, Request.Name, Result.Value);
Console.WriteLine("COMPLETED");
}
}
}
This works, but I think it's wasteful, since I can only execute one iteration of the loop at a time while waiting on the async calls to finish. I have also looked at Interleaved Tasks, but the issue is that I basically have two loops, one to populate the tasks and another to iterate them after they've completed, where I don't have access to the original Request object anymore. So basically this:
List<Task<KeyValuePair<bool, string>>> Tasks = new List<Task<KeyValuePair<bool, string>>>();
foreach (var Request in Requests)
{
Tasks.Add(this.ExecuteAsync(Request));
}
foreach (Task<KeyValuePair<bool, string>> ResultTask in Tasks.Interleaved())
{
KeyValuePair<bool, string> Result = ResultTask.Result;
//Can't access the original request for this method's parameters
await this.DoSomethingElseAsync(???, ???, Result.Value);
}
Any ideas on better ways to implement this type of async chaining in a foreach loop? My ideal approach wouldn't be to return the request object back as part of the response from ExecuteAsync(), so I'd like to try and find other options if possible.
I may be misinterpreting, but why not move your "iteration" into its own function and then use Task.WhenAll to wait for all iterations in parallel?
public async Task MyAsyncWrapper()
{
var allTasks = Requests.Select(ProcessRequest);
await Task.WhenAll(allTasks);
}
private async Task ProcessRequest(Request request)
{
KeyValuePair<bool, string> Result = await this.ExecuteAsync(request);
if (Result.Key == true)
{
await this.DoSomethingElseAsync(request.Id, request.Name, Result.Value);
Console.WriteLine("COMPLETED");
}
}
Consider using TPL dataflow:
var a = new TransformBlock<Input, OutputA>(async (Input i) =>
{
// do something async.
return new OutputA();
});
var b = new TransformBlock<OutputA, OutputB>(async (OutputA i) =>
{
// do more async.
return new OutputB();
});
var c = new ActionBlock<OutputB>(async (OutputB i) =>
{
// do some final async.
});
a.LinkTo(b, new DataflowLinkOptions { PropagateCompletion = true });
b.LinkTo(c, new DataflowLinkOptions { PropagateCompletion = true });
// push all of the items into the dataflow.
a.Post(new Input());
a.Complete();
// wait for it all to complete.
await c.Completion;
I want to avoid application crashing problem due to parallel for loop and httpclient but I am unable to apply solutions that are provided elsewhere on the web due to my limited knowledge of programming. My code is pasted below.
class Program
{
public static List<string> words = new List<string>();
public static int count = 0;
public static string output = "";
private static HttpClient Client = new HttpClient();
public static void Main(string[] args)
{
//input path strings...
List<string> links = new List<string>();
links.AddRange(File.ReadAllLines(input));
List<string> longList = new List<string>(File.ReadAllLines(@"a.txt"));
words.AddRange(File.ReadAllLines(output1));
System.Net.ServicePointManager.DefaultConnectionLimit = 8;
count = longList.Count;
//for (int i = 0; i < longList.Count; i++)
Task.Run(() => Parallel.For(0, longList.Count, new ParallelOptions { MaxDegreeOfParallelism = 5 }, (i, loopState) =>
{
Console.WriteLine(i);
string link = @"some link" + longList[i] + "/";
try
{
if (!links.Contains(link))
{
Task.Run(async () => { await Download(link); }).Wait();
}
}
catch (System.Exception e)
{
}
}));
//}
}
public static async Task Download(string link)
{
HtmlAgilityPack.HtmlDocument document = new HtmlDocument();
document.LoadHtml(await getURL(link));
//...stuff with html agility pack
}
public static async Task<string> getURL(string link)
{
string result = "";
HttpResponseMessage response = await Client.GetAsync(link);
Console.WriteLine(response.StatusCode);
if(response.IsSuccessStatusCode)
{
HttpContent content = response.Content;
var bytes = await response.Content.ReadAsByteArrayAsync();
result = Encoding.UTF8.GetString(bytes);
}
return result;
}
}
There are solutions for example this one, but I don't know how to put await keyword in my main method, and currently the program simply exits due to its absence before Task.Run(). As you can see I have already applied a workaround regarding async Download() method to call it in main method.
I have also doubts regarding the use of same instance of httpclient in different parallel threads. Please advise me whether I should create new instance of httpclient each time.
You're right that you have to block on tasks somewhere in a console application, otherwise the program will just exit before it's complete. But you're doing this more than you need to. Aim for just blocking the main thread and delegating the rest to an async method. A good practice is to create a method with a signature like private async Task MainAsync(args), put the "guts" of your program logic there, and call it from Main like this:
MainAsync(args).Wait();
In your example, move everything from Main to MainAsync. Then you're free to use await as much as you want. Task.Run and Parallel.For are explicitly consuming new threads for I/O bound work, which is unnecessary in the async world. Use Task.WhenAll instead. The last part of your MainAsync method should end up looking something like this:
await Task.WhenAll(longList.Select(async s => {
Console.WriteLine(s);
string link = @"some link" + s + "/";
try
{
if (!links.Contains(link))
{
await Download(link);
}
}
catch (System.Exception e)
{
}
}));
There is one little wrinkle here though. Your example is throttling the parallelism at 5. If you find you still need this, TPL Dataflow is a great library for throttled parallelism in the async world. Here's a simple example.
Regarding HttpClient, using a single instance across threads is completely safe and highly encouraged.
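If you would rather keep the limit of 5 without adding a library, a SemaphoreSlim can play the same role. The following is only a sketch: the class ThrottledDownloads, the method RunAsync and the downloadAsync delegate are hypothetical names, with downloadAsync standing in for the Download(link) method from the question:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledDownloads
{
    // downloadAsync stands in for the question's Download(link) method.
    public static async Task RunAsync(
        IEnumerable<string> links,
        Func<string, Task> downloadAsync,
        int maxConcurrency)
    {
        using (var throttler = new SemaphoreSlim(maxConcurrency))
        {
            // ToList forces the query to run once, which starts every task.
            var tasks = links.Select(async link =>
            {
                // At most maxConcurrency downloads get past this point at a time.
                await throttler.WaitAsync();
                try
                {
                    await downloadAsync(link);
                }
                catch (Exception e)
                {
                    // Log the failure instead of silently swallowing it.
                    Console.Error.WriteLine(link + ": " + e.Message);
                }
                finally
                {
                    throttler.Release();
                }
            }).ToList();

            await Task.WhenAll(tasks);
        }
    }
}
```

From MainAsync this would be called as await ThrottledDownloads.RunAsync(links, Download, 5). With C# 7.1 or later, Main itself can be declared as static async Task Main(string[] args), which removes the need for the .Wait() call.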
I had a method like this:
public async Task<MyResult> GetResult()
{
MyResult result = new MyResult();
foreach(var method in Methods)
{
string json = await Process(method);
result.Prop1 = PopulateProp1(json);
result.Prop2 = PopulateProp2(json);
}
return result;
}
Then I decided to use Parallel.ForEach:
public async Task<MyResult> GetResult()
{
MyResult result = new MyResult();
Parallel.ForEach(Methods, async method =>
{
string json = await Process(method);
result.Prop1 = PopulateProp1(json);
result.Prop2 = PopulateProp2(json);
});
return result;
}
But now I've got an error:
An asynchronous module or handler completed while an asynchronous operation was still pending.
async doesn't work well with ForEach. In particular, your async lambda is being converted to an async void method. There are a number of reasons to avoid async void (as I describe in an MSDN article); one of them is that you can't easily detect when the async lambda has completed. ASP.NET will see your code return without completing the async void method and (appropriately) throw an exception.
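The effect can be reproduced outside ASP.NET. The sketch below is an illustration only (the class AsyncVoidDemo and its method are made up for this purpose). It uses a TaskCompletionSource as a gate so that none of the lambdas can finish before Parallel.ForEach returns:

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class AsyncVoidDemo
{
    // Returns how many lambdas had finished at the moment Parallel.ForEach returned.
    public static int CountFinishedWhenForEachReturns()
    {
        var gate = new TaskCompletionSource<bool>();
        int finished = 0;

        // The async lambda binds to Action<int>, so it becomes an async void method.
        Parallel.ForEach(new[] { 1, 2, 3 }, async item =>
        {
            // The gate is still closed, so control returns to ForEach here.
            await gate.Task;
            // Runs only after the gate has been opened.
            Interlocked.Increment(ref finished);
        });

        // ForEach has returned although no lambda has reached its end.
        int finishedAtReturn = Volatile.Read(ref finished);

        // Open the gate so the orphaned lambdas can complete.
        gate.SetResult(true);
        return finishedAtReturn;
    }
}
```

The method returns 0: Parallel.ForEach considers each iteration done as soon as the lambda hits its first incomplete await, which is the same situation ASP.NET detects when the handler completes.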
What you probably want to do is process the data concurrently, just not in parallel. Parallel code should almost never be used on ASP.NET. Here's what the code would look like with asynchronous concurrent processing:
public async Task<MyResult> GetResult()
{
MyResult result = new MyResult();
var tasks = Methods.Select(method => ProcessAsync(method)).ToArray();
string[] json = await Task.WhenAll(tasks);
result.Prop1 = PopulateProp1(json[0]);
...
return result;
}
.NET 6 finally added Parallel.ForEachAsync, a way to schedule asynchronous work that allows you to control the degree of parallelism:
var urlsToDownload = new []
{
"https://dotnet.microsoft.com",
"https://www.microsoft.com",
"https://twitter.com/shahabfar"
};
var client = new HttpClient();
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
await Parallel.ForEachAsync(urlsToDownload, options, async (url, token) =>
{
var targetPath = Path.Combine(Path.GetTempPath(), "http_cache", url);
var response = await client.GetAsync(url, token);
// The request will be canceled in case of an error in another URL.
if (response.IsSuccessStatusCode)
{
using var target = File.OpenWrite(targetPath);
await response.Content.CopyToAsync(target);
}
});
Alternatively, with the AsyncEnumerator NuGet Package you can do this:
using System.Collections.Async;
public async Task<MyResult> GetResult()
{
MyResult result = new MyResult();
await Methods.ParallelForEachAsync(async method =>
{
string json = await Process(method);
result.Prop1 = PopulateProp1(json);
result.Prop2 = PopulateProp2(json);
}, maxDegreeOfParallelism: 10);
return result;
}
where ParallelForEachAsync is an extension method.
Ahh, okay. I think I know what's going on now. The lambda "async method => ..." is compiled as an "async void" method, which is "fire and forget" (not recommended for anything other than event handlers). This means the caller cannot know when it has completed, so GetResult returns while the operation is still running. Although the technical details of my first answer are incorrect, the result is the same here: GetResult is returning while the operations started by ForEach are still running. The only thing you could really do is not await on Process (so that the lambda is no longer async) and wait for Process to complete in each iteration. But that will use at least one thread pool thread and thus stress the pool slightly, likely making the use of ForEach pointless. I would simply not use Parallel.ForEach...