Parallelizing execution with Task.Run

I am trying to improve the performance of some code that performs a shopping function by calling a number of different vendors. Each third-party vendor call is async, and the responses are processed to generate a result. The structure of the code is as follows.
public async Task<List<ShopResult>> DoShopping(IEnumerable<Vendor> vendors)
{
var res = vendors.Select(async s => await DoShopAndProcessResultAsync(s));
await Task.WhenAll(res); ....
}
Since DoShopAndProcessResultAsync is both I/O-bound and CPU-bound, and each vendor iteration is independent, I think Task.Run can be used to do something like the code below.
public async Task<List<ShopResult>> DoShopping(IEnumerable<Vendor> vendors)
{
var res = vendors.Select(s => Task.Run(() => DoShopAndProcessResultAsync(s)));
await Task.WhenAll(res); ...
}
Using Task.Run this way gives a performance gain, and from the order of execution of the calls I can see that multiple threads are involved. It runs without any issue locally on my machine.
However, it is a tasks-of-tasks scenario, and I am wondering whether there are any pitfalls, or whether this is deadlock-prone in a high-traffic production environment.
What are your opinions on the approach of using Task.Run to parallelize async calls?

Tasks are .NET's low-level building blocks. .NET almost always has a better high-level abstraction for specific concurrency paradigms.
To paraphrase Rob Pike (slides): concurrency is not parallelism, which in turn is not asynchronous execution. What you are asking for is concurrent execution with a specific degree of parallelism. .NET already offers high-level classes that can do that, without resorting to low-level task handling.
At the end, I explain why these distinctions matter and how they're implemented using different .NET classes or libraries.
Dataflow blocks
At the highest level, the Dataflow classes allow creating a pipeline of processing blocks similar to a PowerShell or Bash pipeline, where each block can use one or more tasks to process input. Dataflow blocks preserve message order, ensuring results are emitted in the order the input messages were received.
You'll often see combinations of blocks called meshes, not pipelines. Dataflow grew out of the Microsoft Robotics Framework and can be used to create a network of independent processing blocks. Most programmers just use it to build a pipeline of steps though.
In your case, you could use a TransformBlock to execute DoShopAndProcessResultAsync and feed the output either to another processing block, or to a BufferBlock you can read after processing all results. You could even split Shop and Process into separate blocks, each with its own logic and degree of parallelism.
For example:
var buffer=new BufferBlock<ShopResult>();
var blockOptions=new ExecutionDataflowBlockOptions {
MaxDegreeOfParallelism=3,
BoundedCapacity=1
};
var shop=new TransformBlock<Vendor,ShopResult>(DoShopAndProcessResultAsync,
blockOptions);
var linkOptions=new DataflowLinkOptions{ PropagateCompletion=true };
shop.LinkTo(buffer,linkOptions);
foreach(var v in vendors)
{
await shop.SendAsync(v);
}
shop.Complete();
await shop.Completion;
buffer.TryReceiveAll(out IList<ShopResult> results);
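You can use two separate blocks to shop and process. Here shopOptions and processOptions are ExecutionDataflowBlockOptions like blockOptions above, and results is a BufferBlock<ShopResult> like buffer above: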
var shop=new TransformBlock<Vendor,ShopResponse>(DoShopAsync,shopOptions);
var process=new TransformBlock<ShopResponse,ShopResult>(DoProcessAsync,processOptions);
shop.LinkTo(process,linkOptions);
process.LinkTo(results,linkOptions);
foreach(var v in vendors)
{
await shop.SendAsync(v);
}
shop.Complete();
await process.Completion;
In this case we await the completion of the last block in the chain before reading the results.
Instead of reading from a buffer block, we could use an ActionBlock at the end to do whatever we want with the results, e.g. store them in a database. The results can be batched using a BatchBlock to reduce the number of storage operations:
...
var batch=new BatchBlock<ShopResult>(100);
var store=new ActionBlock<ShopResult[]>(DoStoreAsync);
shop.LinkTo(process,linkOptions);
process.LinkTo(batch,linkOptions);
batch.LinkTo(store,linkOptions);
...
shop.Complete();
await store.Completion;
Why do names matter
Tasks are the lowest level building blocks used to implement multiple paradigms. In other languages you'd see them described as Futures or Promises (eg Javascript)
Parallelism in .NET means executing CPU-bound computations over a lot of data using all available cores. Parallel.ForEach will partition the input data into roughly as many partitions as there are cores and use one worker task per partition. PLINQ goes one step further: it allows the use of LINQ operators to specify the computation, and lets PLINQ use algorithms optimized for parallel execution to map, filter, sort, group and collect results. That's why Parallel.ForEach can't be used for async work at all.
Concurrency means executing multiple independent and often I/O-bound jobs. At the lowest level you can use Tasks, but Dataflow, Rx.NET, Channels, IAsyncEnumerable etc. allow the use of high-level patterns like CSP/pipelines, event stream processing and so on.
Asynchronous execution means you don't have to block while waiting for I/O-bound work to complete.

What is alarming about the Task.Run approach in your question is that it depletes the ThreadPool of available worker threads in an uncontrolled manner. It doesn't offer any configuration option that would allow you to reduce the parallelism of each individual request, in favor of preserving the scalability of the whole service. That's something that might bite you in the long run.
Ideally you would like to control both the parallelism and the concurrency, and control them independently. For example you might want to limit the maximum concurrency of the I/O-bound work to 10, and the maximum parallelism of the CPU-bound work to 2. Regarding the former you could take a look at this question: How to limit the amount of concurrent async I/O operations?
Regarding the latter, you could use a TaskScheduler with limited concurrency. The ConcurrentExclusiveSchedulerPair is a handy class for this purpose. Here is an example of how you could rewrite your DoShopping method in a way that limits the ThreadPool usage to at most two threads (per request), without limiting the concurrency of the I/O-bound work at all:
public async Task<ShopResult[]> DoShopping(IEnumerable<Vendor> vendors)
{
var scheduler = new ConcurrentExclusiveSchedulerPair(
TaskScheduler.Default, maxConcurrencyLevel: 2).ConcurrentScheduler;
var tasks = vendors.Select(vendor =>
{
return Task.Factory.StartNew(() => DoShopAndProcessResultAsync(vendor),
default, TaskCreationOptions.DenyChildAttach, scheduler).Unwrap();
});
return await Task.WhenAll(tasks);
}
Important: In order for this to work, the DoShopAndProcessResultAsync method should be implemented internally without .ConfigureAwait(false) at the await points. Otherwise the continuations after the await will not run on our preferred scheduler, and the goal of limiting the ThreadPool utilization will be defeated.
My personal preference though would be to use instead the new (.NET 6) Parallel.ForEachAsync API. Apart from making it easy to control the concurrency through the MaxDegreeOfParallelism option, it also comes with a better behavior in case of exceptions. Instead of launching invariably all the async operations, it stops launching new operations as soon as a previously launched operation has failed. This can make a big difference in the responsiveness of your service, in case for example that all individual async operations are failing with a timeout exception. You can find here a synopsis of the main differences between the Parallel.ForEachAsync and the Task.WhenAll APIs.
Unfortunately the Parallel.ForEachAsync has the disadvantage that it doesn't return the results of the async operations, which means that you have to collect the results manually as a side effect of each async operation. I've posted here a ForEachAsync variant that returns results, and that combines the best aspects of the Parallel.ForEachAsync and the Task.WhenAll APIs. You could use it like this:
public async Task<ShopResult[]> DoShopping(IEnumerable<Vendor> vendors)
{
var scheduler = new ConcurrentExclusiveSchedulerPair(
TaskScheduler.Default, maxConcurrencyLevel: 2).ConcurrentScheduler;
ParallelOptions options = new() { MaxDegreeOfParallelism = 10 };
return await ForEachAsync(vendors, options, async (vendor, ct) =>
{
return await Task.Factory.StartNew(() => DoShopAndProcessResultAsync(vendor),
ct, TaskCreationOptions.DenyChildAttach, scheduler).Unwrap();
});
}
Note: In my initial answer (revision 1) I had suggested erroneously to pass the scheduler through the ParallelOptions.TaskScheduler property. I just found out that this doesn't work as I expected. The ParallelOptions class has an internal property EffectiveMaxConcurrencyLevel that represents the minimum of the MaxDegreeOfParallelism and the TaskScheduler.MaximumConcurrencyLevel. The implementation of the Parallel.ForEachAsync method uses this property, instead of reading directly the MaxDegreeOfParallelism. So the MaxDegreeOfParallelism, by being larger than the MaximumConcurrencyLevel, was effectively ignored.
You've probably also noticed by now that the names of these two settings are confusing. We use the MaximumConcurrencyLevel in order to control the number of threads (aka the parallelization), and we use the MaxDegreeOfParallelism in order to control the amount of concurrent async operations (aka the concurrency). The reason for this confusing terminology can be traced to the historic origins of these APIs. The ParallelOptions class was introduced before the async-await era, and the designers of the new Parallel.ForEachAsync API aimed at making it compatible with the older non-asynchronous members of the Parallel class.
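For the built-in Parallel.ForEachAsync, collecting the results as a side effect could look roughly like the sketch below. The Vendor and ShopResult records and the stubbed DoShopAndProcessResultAsync are placeholders I made up to keep the example self-contained; they are not the question's real types. Note that a ConcurrentBag does not preserve the order of the input.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Placeholder types standing in for the question's Vendor and ShopResult.
public record Vendor(string Name);
public record ShopResult(string VendorName);

public static class ShoppingSketch
{
    // Stub for the real vendor call; it only simulates some I/O latency.
    public static async Task<ShopResult> DoShopAndProcessResultAsync(
        Vendor vendor, CancellationToken ct)
    {
        await Task.Delay(10, ct);
        return new ShopResult(vendor.Name);
    }

    public static async Task<ShopResult[]> DoShopping(IEnumerable<Vendor> vendors)
    {
        // Thread-safe collection; results arrive in completion order, not input order.
        var results = new ConcurrentBag<ShopResult>();
        var options = new ParallelOptions { MaxDegreeOfParallelism = 10 };

        await Parallel.ForEachAsync(vendors, options, async (vendor, ct) =>
        {
            // The result is stored as a side effect of each iteration.
            results.Add(await DoShopAndProcessResultAsync(vendor, ct));
        });

        return results.ToArray();
    }
}
```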

Related

Thread Contention on a ConcurrentDictionary in C#

I have a C# .NET program that uses an external API to process events for real-time stock market data. I use the API callback feature to populate a ConcurrentDictionary with the data it receives on a stock-by-stock basis.
I have a set of algorithms that each run in a constant loop until a terminal condition is met. They are called like this (but all from separate calling functions elsewhere in the code):
Task.Run(() => ExecutionLoop1());
Task.Run(() => ExecutionLoop2());
...
Task.Run(() => ExecutionLoopN());
Each one of those functions calls SnapTotals():
public void SnapTotals()
{
foreach (KeyValuePair<string, MarketData> kvpMarketData in
new ConcurrentDictionary<string, MarketData>(Handler.MessageEventHandler.Realtime))
{
...
The Handler.MessageEventHandler.Realtime object is the ConcurrentDictionary that is updated in real-time by the external API.
At a certain specific point in the day, there is an instant burst of data that comes in from the API. That is the precise time I want my ExecutionLoop() functions to do some work.
As I've grown the program and added more of those execution loop functions, and grown the number of elements in the ConcurrentDictionary, the performance of the program as a whole has seriously degraded. Specifically, those ExecutionLoop() functions all seem to freeze up and take much longer to meet their terminal condition than they should.
I added some logging to all of the functions above, and to the function that updates the ConcurrentDictionary. From what I can gather, the ExecutionLoop() functions appear to access the ConcurrentDictionary so often that they block the API from updating it with real-time data. The loops are dependent on that data to meet their terminal condition so they cannot complete.
I'm stuck trying to figure out a way to re-architect this. I would like for the thread that updates the ConcurrentDictionary to have a higher priority but the message events are handled from within the external API. I don't know if ConcurrentDictionary was the right type of data structure to use, or what the alternative could be, because obviously a regular Dictionary would not work here. Or is there a way to "pause" my execution loops for a few milliseconds to allow the market data feed to catch up? Or something else?

Your basic approach is sound except for one fatal flaw: they are all hitting the same dictionary at the same time via iterators, sets, and gets. So you must do one thing: in SnapTotals you must iterate over a copy of the concurrent dictionary.
When you iterate over Handler.MessageEventHandler.Realtime, or even over new ConcurrentDictionary<string, MarketData>(Handler.MessageEventHandler.Realtime), you are using the ConcurrentDictionary<>'s iterator. Even though that iterator is thread-safe, it keeps using the dictionary for the entire period of iteration (including however long it takes to do the processing for each and every entry in the dictionary). That is most likely where the contention occurs.
Making a copy of the dictionary is much faster, so should lower contention.
Change SnapTotals to
public void SnapTotals()
{
var copy = Handler.MessageEventHandler.Realtime.ToArray();
foreach (var kvpMarketData in copy)
{
...
Now, each ExecutionLoopX can execute in peace without write-side contention (your API updates) and without read-side contention from the other loops. The write-side can execute without read-side contention as well.
The only "contention" should be for the short duration needed to do each copy.
And by the way, the dictionary copy (an array) is not threadsafe; it's just a plain array, but that is ok because each task is executing in isolation on its own copy.

I think that your main problem is not related to the ConcurrentDictionary, but to the large number of ExecutionLoopX methods. Each of these methods saturates a CPU core, and since there are more methods than cores on your machine, the whole CPU is saturated. My assumption is that if you find a way to limit the degree of parallelism of the ExecutionLoopX methods to a number smaller than the Environment.ProcessorCount, your program will behave and perform better. Below is my suggestion for implementing this limitation.
The main obstacle is that currently your ExecutionLoopX methods are monolithic: they can't be separated to pieces so that they can be parallelized. My suggestion is to change their return type from void to async Task, and place an await Task.Yield(); inside the outer loop. This way it will be possible to execute them in steps, with each step being the code from the one await to the next.
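As a rough sketch of that change, one execution loop might end up looking like this. The method name, the iteration counter and the terminal condition are hypothetical stand-ins; the real loops would call SnapTotals() and check their own exit criteria. The return value exists only to make the sketch observable.

```csharp
using System.Threading.Tasks;

public static class ExecutionLoops
{
    // Hypothetical loop: runs until a terminal condition is met and yields
    // once per iteration, so that other loops can be interleaved with it.
    public static async Task<int> ExecutionLoopAsync(int maxIterations)
    {
        int iterations = 0;
        while (iterations < maxIterations) // stand-in for the real terminal condition
        {
            // ... one step of CPU-bound work would go here, e.g. SnapTotals() ...
            iterations++;

            // Ends the current step. With no SynchronizationContext present,
            // the rest of the method is queued to the current TaskScheduler.
            await Task.Yield();
        }
        return iterations;
    }
}
```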
Then create a TaskScheduler with limited concurrency, and a TaskFactory that uses this scheduler:
int maxDegreeOfParallelism = Environment.ProcessorCount - 1;
TaskScheduler scheduler = new ConcurrentExclusiveSchedulerPair(
TaskScheduler.Default, maxDegreeOfParallelism).ConcurrentScheduler;
TaskFactory taskFactory = new TaskFactory(scheduler);
Now you can parallelize the execution of the methods, by starting the tasks with the taskFactory.StartNew method instead of the Task.Run:
List<Task> tasks = new();
tasks.Add(taskFactory.StartNew(() => ExecutionLoop1(data)).Unwrap());
tasks.Add(taskFactory.StartNew(() => ExecutionLoop2(data)).Unwrap());
tasks.Add(taskFactory.StartNew(() => ExecutionLoop3(data)).Unwrap());
tasks.Add(taskFactory.StartNew(() => ExecutionLoop4(data)).Unwrap());
//...
Task.WaitAll(tasks.ToArray());
The .Unwrap() is needed because the taskFactory.StartNew returns a nested task (Task<Task>). The Task.Run method is also doing this unwrapping internally, when the action is asynchronous.
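The difference between the two can be seen in a small sketch. GetValueAsync is a made-up method used only for illustration.

```csharp
using System.Threading.Tasks;

public static class UnwrapSketch
{
    // Made-up async method used only for illustration.
    public static async Task<int> GetValueAsync()
    {
        await Task.Delay(10);
        return 42;
    }

    public static async Task<int> RunAsync()
    {
        // StartNew with an async delegate produces a Task<Task<int>>: the outer
        // task completes as soon as GetValueAsync has returned its inner task.
        Task<Task<int>> nested = Task.Factory.StartNew(() => GetValueAsync());

        // Unwrap creates a proxy Task<int> that completes when the inner task does.
        int viaStartNew = await nested.Unwrap();

        // Task.Run has overloads for Func<Task<T>> and unwraps automatically.
        int viaRun = await Task.Run(() => GetValueAsync());

        return viaStartNew + viaRun;
    }
}
```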
An online demo of this idea can be found here.
The Environment.ProcessorCount - 1 configuration means that one CPU core will be available for other work, like the communication with the external API and the updating of the ConcurrentDictionary.
A more cumbersome implementation of the same idea, using iterators and the Parallel.ForEach method instead of async/await, can be found in the first revision of this answer.

If you're not squeamish about mixing operations in a task, you could redesign such that instead of task A doing A things, B doing B things, C doing C things, etc. you can reduce the number of tasks to the number of processors, and thus run fewer concurrently, greatly easing contention.
So, for example, say you have just two processors. Make a "general purpose/pluggable" task wrapper that accepts delegates. So, wrapper 1 would accept delegates to do A and B work. Wrapper 2 would accept delegates to do C and D work. Then ask each wrapper to spin up a task that calls the delegates in a loop over the dictionary.
This would of course need to be measured. What I am proposing is, say, 4 tasks each doing 4 different types of processing. This is 4 units of work per loop over 4 loops. This is not the same as 16 tasks each doing 1 unit of work. In that case you have 16 loops.
16 loops intuitively would cause more contention than 4.
Again, this is a potential solution that should be measured. There is one drawback for sure: you will have to ensure that a piece of work within a task doesn't affect any of the others.
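A minimal sketch of such a wrapper could look like the following. The WorkerWrapper name and the delegate signature are my own assumptions, and the sketch makes a single pass over a snapshot of the dictionary; a real loop would repeat until its terminal condition is met.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class WorkerWrapper
{
    // Starts one task that takes a snapshot of the dictionary and, for each
    // entry, runs every handler assigned to this wrapper in turn.
    public static Task RunAsync<TValue>(
        ConcurrentDictionary<string, TValue> source,
        params Action<string, TValue>[] handlers)
    {
        return Task.Run(() =>
        {
            foreach (var entry in source.ToArray())
            {
                foreach (var handler in handlers)
                {
                    handler(entry.Key, entry.Value);
                }
            }
        });
    }
}
```

With two processors, one wrapper would receive the A and B delegates and the other the C and D delegates, and both returned tasks would be awaited together with Task.WhenAll.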

Parallel.ForEach with async lambda waiting forall iterations to complete

recently I have seen several SO threads related to Parallel.ForEach mixed with async lambdas, but all proposed answers were some kind of workarounds.
Is there any way how could I write:
List<int> list = new List<int>();
Parallel.ForEach(arrayValues, async (item) =>
{
var x = await LongRunningIoOperationAsync(item);
list.Add(x);
});
How can I ensure that list will contain all items from all iterations executed within the lambdas in each iteration?
How will Parallel.ForEach generally work with async lambdas? If it hits an await, will it hand over its thread to the next iteration?
I assume the ParallelLoopResult IsCompleted field is not the proper one, as it will return true when all iterations are executed, no matter if their actual lambda jobs are finished or not?

recently I have seen several SO threads related to Parallel.ForEach mixed with async lambdas, but all proposed answers were some kind of workarounds.
Well, that's because Parallel doesn't work with async. And from a different perspective, why would you want to mix them in the first place? They do opposite things. Parallel is all about adding threads and async is all about giving up threads. If you want to do asynchronous work concurrently, then use Task.WhenAll. That's the correct tool for the job; Parallel is not.
That said, it sounds like you want to use the wrong tool, so here's how you do it...
How can I ensure that list will contain all items from all iterations executed within the lambdas in each iteration?
You'll need to have some kind of a signal that some code can block on until the processing is done, e.g., CountdownEvent or Monitor. On a side note, you'll need to protect access to the non-thread-safe List<T> as well.
How will generally Parallel.ForEach work with async lambdas, if it hit await will it hand over its thread to next iteration?
Since Parallel doesn't understand async lambdas, when the first await yields (returns) to its caller, Parallel will assume that iteration of the loop is complete.
I assume ParallelLoopResult IsCompleted field is not proper one, as it will return true when all iterations are executed, no matter if their actual lambda jobs are finished or not?
Correct. As far as Parallel knows, it can only "see" the method to the first await that returns to its caller. So it doesn't know when the async lambda is complete. It also will assume iterations are complete too early, which throws partitioning off.

You don't need Parallel.For/ForEach here you just need to await a list of tasks.
Background
In short, you need to be very careful about async lambdas, and whether you are passing them to an Action or a Func<Task>.
Your problem is that Parallel.For / ForEach is not suited for the async and await pattern or for I/O-bound tasks. They are suited for CPU-bound workloads, which means they essentially have Action parameters and let the task scheduler create the tasks for you.
If you want to run multiple async tasks at the same time, use Task.WhenAll or a TPL Dataflow block (or something similar), which can deal effectively with both CPU-bound and I/O-bound workloads. Said more directly, they can deal with tasks, which is what an async method is.
Unless you need to do more inside of your lambda (which you haven't shown), just use a Select and WhenAll:
var tasks = items.Select(LongRunningIoOperationAsync);
var results = await Task.WhenAll(tasks); // here is your list of int
If you do need more, you can still use await inside the lambda:
var tasks = items.Select(async (item) =>
{
var x = await LongRunningIoOperationAsync(item);
// do other stuff
return x;
});
var results = await Task.WhenAll(tasks);
Note: If you need the extended functionality of Parallel.ForEach (namely the options to control max concurrency), there are several approaches; however, Rx or Dataflow might be the most succinct.

Parallel for each or any alternative for parallel loop?

I have this code
Lines.ToList().ForEach(y =>
{
globalQueue.AddRange(GetTasks(y.LineCode).ToList());
});
So for each line in my list of lines I get the tasks that I add to a global production queue. I can have 8 lines. Each GetTasks(y.LineCode) request takes 1 minute. I would like to use parallelism to be sure I issue my 8 calls together and not one by one.
What should I do?
Using another ForEach loop or using another extension method? Is there a ForEachAsync? Make the GetTasks request itself async?

Parallelism isn't concurrency. Concurrency isn't asynchrony. Running multiple slow queries in parallel won't make them run faster, quite the opposite. These are different problems and require very different solutions. Without a specific problem one can only give generic advice.
Parallelism - processing an 800K item array
Parallelism means processing a ton of data using multiple cores in parallel. To do that, you need to partition your data and feed each partition to a "worker" for processing. You need to minimize communication between workers and the need of synchronization to get the best performance, otherwise your workers will spend CPU time doing nothing. That means, no global queue updating.
If you have a lot of lines, or if line processing is CPU-bound, you can use PLINQ to process it :
var query = from y in lines.AsParallel()
from t in GetTasks(y.LineCode)
select t;
var theResults=query.ToList();
That's it. No need to synchronize access to a queue, either through locking or using a concurrent collection. This will use all available cores though. You can add WithDegreeOfParallelism() to reduce the number of cores used, to avoid freezing the machine.
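A sketch of the same query with a capped degree of parallelism is shown below. The Line record and the GetTasks stub are placeholders (the question doesn't show the real types), and leaving one core free is just one possible choice.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Placeholder for the question's line type.
public record Line(string LineCode);

public static class PlinqSketch
{
    // Stand-in for the question's GetTasks; assumed here to be CPU-bound.
    public static IEnumerable<string> GetTasks(string lineCode) =>
        Enumerable.Range(1, 3).Select(i => $"{lineCode}-task{i}");

    public static List<string> GetAllTasks(IEnumerable<Line> lines)
    {
        // Leave one core free; WithDegreeOfParallelism requires a value of at least 1.
        int degree = Math.Max(1, Environment.ProcessorCount - 1);

        var query = from y in lines.AsParallel().WithDegreeOfParallelism(degree)
                    from t in GetTasks(y.LineCode)
                    select t;

        return query.ToList();
    }
}
```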
Concurrency - calling 2000 servers
Concurrency on the other hand means doing several different things at the same time. No partitioning is involved.
For example, if I had to query 8 or 2000 servers for monitoring data (true story) I wouldn't use Parallel or PLINQ. For one thing, Parallel and PLINQ use all available cores. In this case though they won't be doing anything, they'll just wait for responses. Parallelism classes can't handle async methods either because there's no point - they aren't meant to wait for responses.
A very quick & dirty solution would be to start multiple tasks and wait for them to return, eg :
var tasks=lines.Select(y=>Task.Run(()=>GetTasks(y.LineCode)));
//Array of individual results
var resultsArray=await Task.WhenAll(tasks);
//flatten the results
var resultList=resultsArray.SelectMany(r=>r).ToList();
This will start all requests at once. Network Security didn't like the 2000 concurrent requests, since it looked like a hack attack and caused a bit of network flooding.
Concurrency with Dataflow
We can use the TPL Dataflow library and, e.g., an ActionBlock or a TransformBlock to make the requests with a controlled degree of parallelism:
var options=new ExecutionDataflowBlockOptions {
MaxDegreeOfParallelism = 4 ,
BoundedCapacity=10,
};
var spamBlock=new TransformManyBlock<Line,Result>(
y=>GetTasks(y.LineCode),
options);
var outputBlock=new BufferBlock<Result>();
spamBlock.LinkTo(outputBlock);
foreach(var line in lines)
{
await spamBlock.SendAsync(line);
}
spamBlock.Complete();
//Wait for all 4 workers to finish
await spamBlock.Completion;
Once the spamBlock completes, the results can be found in outputBlock. By setting a BoundedCapacity I ensure that the posting loop will wait if there are too many unprocessed messages in spamBlock's input queue.
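A TransformManyBlock can handle asynchronous methods too. Assuming GetTasksAsync returns a Task<Result[]>, we can use an async lambda, because the constructor expects a Task<IEnumerable<Result>> and Task<T> is not covariant:
var spamBlock=new TransformManyBlock<Line,Result>(
async y=>await GetTasksAsync(y.LineCode),
options);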

You can use Parallel.ForEach:
Parallel.ForEach(Lines, (line) =>
{
globalQueue.AddRange(GetTasks(line.LineCode).ToList());
});
A Parallel.ForEach loop works like a Parallel.For loop. The loop partitions the source collection and schedules the work on multiple threads based on the system environment. The more processors on the system, the faster the parallel method runs.

ActionBlock<T> vs Task.WhenAll

I would like to know what is the recommended way to execute multiple async methods in parallel?
in System.Threading.Tasks.Dataflow we can specify the max degree of parallelism but unbounded is probably the default for Task.WhenAll too ?
this :
var tasks = new List<Task>();
foreach(var item in items)
{
tasks.Add(myAsyncMethod(item));
}
await Task.WhenAll(tasks.ToArray());
or that :
var action = new ActionBlock<string>(myAsyncMethod, new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded,
BoundedCapacity = DataflowBlockOptions.Unbounded,
MaxMessagesPerTask = DataflowBlockOptions.Unbounded
});
foreach (var item in items)
{
action.Post(item);
}
action.Complete();
await action.Completion;

I would like to know what is the recommended way to execute multiple async methods in parallel?
Side note: actually not parallel, but concurrent.
in System.Threading.Tasks.Dataflow we can specify the max degree of parallelism but unbounded is probably the default for Task.WhenAll too ?
As someone commented, Task.WhenAll only joins existing tasks; by the time your code gets to Task.WhenAll, all the concurrency decisions have already been made.
You can throttle plain asynchronous code by using something like SemaphoreSlim.
The decision of whether to use asynchronous concurrency directly or TPL Dataflow is dependent on the surrounding code. If this concurrent operation is just called once asynchronously, then asynchronous concurrency is the best bet; but if this concurrent operation is part of a "pipeline" for your data, then TPL Dataflow may be a better fit.
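A minimal sketch of throttling with SemaphoreSlim, assuming the operations return a value, could look like this. The ThrottleSketch class and the RunThrottledAsync helper are names I made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleSketch
{
    // Runs `operation` for every item, with at most `maxConcurrency`
    // operations in flight at any time. Results keep the input order.
    public static async Task<TResult[]> RunThrottledAsync<TItem, TResult>(
        IEnumerable<TItem> items,
        int maxConcurrency,
        Func<TItem, Task<TResult>> operation)
    {
        using var semaphore = new SemaphoreSlim(maxConcurrency);

        var tasks = items.Select(async item =>
        {
            await semaphore.WaitAsync();
            try
            {
                return await operation(item);
            }
            finally
            {
                semaphore.Release();
            }
        }).ToArray(); // ToArray makes the lazy Select create each task exactly once

        return await Task.WhenAll(tasks);
    }
}
```

All the tasks are still created up front; the semaphore only limits how many of them are past the WaitAsync call at once.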

Both methods are acceptable, and the choice should be governed by your requirements. As you can see, Dataflow gives you a lot of configurability that you would otherwise have to implement manually when using tasks directly.
Note that in both situations the thread pool will be responsible for queuing and running the tasks, so the behaviour should remain the same.
Dataflow is good at chaining together groups of composable asynchronous operations whereas using tasks gives you finer grained control.

Parallel execution for IO bound operations

I have read TPL and Task library documents cover to cover. But, I still couldn't comprehend the following case very clearly and right now I need to implement it.
I will simplify my situation. I have an IEnumerable<Uri> of length 1000. I have to make a request for them using HttpClient.
I have two questions.
There is not much computation, just waiting for the HTTP requests. In this case can I still use Parallel.ForEach()?
In case of using Task instead, what is the best practice for creating huge number of them? Let's say I use Task.Factory.StartNew() and add those tasks to a list and wait for all of them. Is there a feature (such as TPL partitioner) that controls number of maximum tasks and maximum HttpClient I can create?
There are a couple of similar questions on SO, but no one mentions the maximums. The requirement is just using maximum tasks with maximum HttpClient.
Thank you in advance.

i3arnon's answer with TPL Dataflow is good; Dataflow is useful especially if you have a mix of CPU and I/O bound code. I'll echo his sentiment that Parallel is designed for CPU-bound code; it's not the best solution for I/O-based code, and especially not appropriate for asynchronous code.
If you want an alternative solution that works well with mostly-I/O code - and doesn't require an external library - the method you're looking for is Task.WhenAll:
var tasks = uris.Select(uri => SendRequestAsync(uri)).ToArray();
await Task.WhenAll(tasks);
This is the easiest solution, but it does have the drawback of starting all requests simultaneously. Particularly if all requests are going to the same service (or a small set of services), this can cause timeouts. To solve this, you need to use some kind of throttling...
Is there a feature (such as TPL partitioner) that controls number of maximum tasks and maximum HttpClient I can create?
TPL Dataflow has that nice MaxDegreeOfParallelism which only starts so many at a time. You can also throttle regular asynchronous code by using another builtin, SemaphoreSlim:
private readonly SemaphoreSlim _sem = new SemaphoreSlim(50);
private async Task SendRequestAsync(Uri uri)
{
await _sem.WaitAsync();
try
{
...
}
finally
{
_sem.Release();
}
}
In case of using Task instead, what is the best practice for creating huge number of them? Let's say I use Task.Factory.StartNew() and add those tasks to a list and wait for all of them.
You actually don't want to use StartNew. It only has one appropriate use case (dynamic task-based parallelism), which is extremely rare. Modern code should use Task.Run if you need to push work onto a background thread. But you don't even need that to begin with, so neither StartNew nor Task.Run is appropriate here.
There are couple of similar questions on SO, but no one mentions the maximums. The requirement is just using maximum tasks with maximum HttpClient.
Maximums are where asynchronous code really gets tricky. With CPU-bound (parallel) code, the solution is obvious: you use as many threads as you have cores. (Well, at least you can start there and adjust as necessary). With asynchronous code, there isn't as obvious of a solution. It depends on a lot of factors - how much memory you have, how the remote server responds (rate limiting, timeouts, etc), etc.
There are no easy solutions here. You just have to test out how your specific application deals with high levels of concurrency, and then throttle to some lower number.
I have some slides for a talk that attempts to explain when different technologies are appropriate (parallelism, asynchrony, TPL Dataflow, and Rx). If you prefer more of a written description with recipes, I think you may benefit from my book on concurrency.

.NET 6
Starting from .NET 6 you can use one of the Parallel.ForEachAsync methods which are async aware:
await Parallel.ForEachAsync(
uris,
async (uri, cancellationToken) => await SendRequestAsync(uri, cancellationToken));
This will use Environment.ProcessorCount as the degree of parallelism. To change it you can use the overload that accepts ParallelOptions:
await Parallel.ForEachAsync(
uris,
new ParallelOptions { MaxDegreeOfParallelism = 50 },
async (uri, cancellationToken) => await SendRequestAsync(uri, cancellationToken));
ParallelOptions also allows passing in a CancellationToken and a TaskScheduler
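As a sketch of the CancellationToken option, the loop below is given an overall timeout. The SendRequestAsync stub only simulates latency, and the SendAllAsync helper and its timeout parameter are assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class CancellableRequests
{
    // Stub for the real HTTP call; it only simulates latency and honors cancellation.
    public static Task SendRequestAsync(Uri uri, CancellationToken cancellationToken) =>
        Task.Delay(10, cancellationToken);

    public static async Task SendAllAsync(IEnumerable<Uri> uris, TimeSpan timeout)
    {
        // Cancels the whole loop if it takes longer than `timeout`.
        using var cts = new CancellationTokenSource(timeout);

        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = 50,
            CancellationToken = cts.Token
        };

        // The token handed to the body is canceled when cts fires
        // or when another iteration throws.
        await Parallel.ForEachAsync(uris, options, async (uri, cancellationToken) =>
            await SendRequestAsync(uri, cancellationToken));
    }
}
```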
.NET 5 and older (including all .NET Framework versions)
In this case can I still use Parallel.ForEach?
This isn't really appropriate. Parallel.ForEach is more for CPU-intensive work. It also doesn't support async operations.
In case of using Task instead, what is the best practice for creating huge number of them?
Use a TPL Dataflow block instead. You don't create huge amounts of tasks that sit there waiting for a thread to become available. You can configure the max amount of tasks and reuse them for all the items that meanwhile sit in a buffer waiting for a task. For example:
var block = new ActionBlock<Uri>(
uri => SendRequestAsync(uri),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 50 });
foreach (var uri in uris)
{
block.Post(uri);
}
block.Complete();
await block.Completion;
