I have read TPL and Task library documents cover to cover. But, I still couldn't comprehend the following case very clearly and right now I need to implement it.
I will simplify my situation. I have an IEnumerable<Uri> of length 1000. I have to make a request for them using HttpClient.
I have two questions.
There is not much computation, just waiting for the HTTP requests. In this case, can I still use Parallel.ForEach()?
In case of using Task instead, what is the best practice for creating huge number of them? Let's say I use Task.Factory.StartNew() and add those tasks to a list and wait for all of them. Is there a feature (such as TPL partitioner) that controls number of maximum tasks and maximum HttpClient I can create?
There are couple of similar questions on SO, but no one mentions the maximums. The requirement is just using maximum tasks with maximum HttpClient.
Thank you in advance.
i3arnon's answer with TPL Dataflow is good; Dataflow is useful especially if you have a mix of CPU and I/O bound code. I'll echo his sentiment that Parallel is designed for CPU-bound code; it's not the best solution for I/O-based code, and especially not appropriate for asynchronous code.
If you want an alternative solution that works well with mostly-I/O code - and doesn't require an external library - the method you're looking for is Task.WhenAll:
var tasks = uris.Select(uri => SendRequestAsync(uri)).ToArray();
await Task.WhenAll(tasks);
This is the easiest solution, but it does have the drawback of starting all requests simultaneously. Particularly if all requests are going to the same service (or a small set of services), this can cause timeouts. To solve this, you need to use some kind of throttling...
Is there a feature (such as TPL partitioner) that controls number of maximum tasks and maximum HttpClient I can create?
TPL Dataflow has that nice MaxDegreeOfParallelism which only starts so many at a time. You can also throttle regular asynchronous code by using another builtin, SemaphoreSlim:
private readonly SemaphoreSlim _sem = new SemaphoreSlim(50);

private async Task SendRequestAsync(Uri uri)
{
    await _sem.WaitAsync();
    try
    {
        ...
    }
    finally
    {
        _sem.Release();
    }
}
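To see the throttle in action end to end, here is a minimal sketch that combines the semaphore with Task.WhenAll. The local SendRequestAsync is a hypothetical stand-in that uses Task.Delay in place of a real HTTP call, and the limit of 5 and the count of 40 are arbitrary numbers chosen for illustration:

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Allow at most 5 operations in flight at once (arbitrary number for the demo).
var sem = new SemaphoreSlim(5);
int inFlight = 0, maxInFlight = 0;
var gate = new object();

// Hypothetical stand-in for SendRequestAsync: Task.Delay simulates the HTTP call.
async Task<int> SendRequestAsync(int id)
{
    await sem.WaitAsync();
    try
    {
        // Track how many operations are inside the throttled section.
        lock (gate) { inFlight++; maxInFlight = Math.Max(maxInFlight, inFlight); }
        await Task.Delay(20);
        return id;
    }
    finally
    {
        lock (gate) { inFlight--; }
        sem.Release();
    }
}

// All 40 tasks are created immediately, but only 5 get past the semaphore at a time.
var tasks = Enumerable.Range(0, 40).Select(SendRequestAsync).ToArray();
int[] results = await Task.WhenAll(tasks);

Console.WriteLine($"Completed {results.Length} requests, peak concurrency {maxInFlight}");
```

Note that no thread is blocked while a request waits for the semaphore, which is why WaitAsync is used rather than Wait.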
In case of using Task instead, what is the best practice for creating huge number of them? Let's say I use Task.Factory.StartNew() and add those tasks to a list and wait for all of them.
You actually don't want to use StartNew. It only has one appropriate use case (dynamic task-based parallelism), which is extremely rare. Modern code should use Task.Run if you need to push work onto a background thread. But you don't even need that to begin with, so neither StartNew nor Task.Run is appropriate here.
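One concrete pitfall, as a minimal sketch: StartNew does not understand async delegates, so it hands back a nested task, whereas Task.Run unwraps it for you. ComputeAsync here is a hypothetical method made up for the illustration:

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical async operation used only for illustration.
async Task<int> ComputeAsync()
{
    await Task.Delay(10);
    return 42;
}

// StartNew does not understand async delegates: the result is a nested task.
Task<Task<int>> nested = Task.Factory.StartNew(() => ComputeAsync());
// The outer task completes as soon as the delegate returns, i.e. when ComputeAsync
// reaches its first await; Unwrap() is needed to observe the real operation.
int viaStartNew = await nested.Unwrap();

// Task.Run has overloads for async delegates and unwraps automatically.
int viaRun = await Task.Run(() => ComputeAsync());

// For pure I/O neither wrapper is needed: just call the async method.
int direct = await ComputeAsync();

Console.WriteLine($"{viaStartNew} {viaRun} {direct}"); // prints: 42 42 42
```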
There are couple of similar questions on SO, but no one mentions the maximums. The requirement is just using maximum tasks with maximum HttpClient.
Maximums are where asynchronous code really gets tricky. With CPU-bound (parallel) code, the solution is obvious: you use as many threads as you have cores. (Well, at least you can start there and adjust as necessary). With asynchronous code, there isn't as obvious of a solution. It depends on a lot of factors - how much memory you have, how the remote server responds (rate limiting, timeouts, etc), etc.
There are no easy solutions here. You just have to test how your specific application deals with high levels of concurrency, and then throttle to some lower number.
I have some slides for a talk that attempts to explain when different technologies are appropriate (parallelism, asynchrony, TPL Dataflow, and Rx). If you prefer more of a written description with recipes, I think you may benefit from my book on concurrency.
.NET 6
Starting from .NET 6 you can use one of the Parallel.ForEachAsync methods which are async aware:
await Parallel.ForEachAsync(
uris,
async (uri, cancellationToken) => await SendRequestAsync(uri, cancellationToken));
This will use Environment.ProcessorCount as the degree of parallelism. To change it you can use the overload that accepts ParallelOptions:
await Parallel.ForEachAsync(
uris,
new ParallelOptions { MaxDegreeOfParallelism = 50 },
async (uri, cancellationToken) => await SendRequestAsync(uri, cancellationToken));
ParallelOptions also allows passing in a CancellationToken and a TaskScheduler.
.NET 5 and older (including all .NET Framework versions)
In this case can I still use Parallel.ForEach ?
This isn't really appropriate. Parallel.ForEach is meant for CPU-intensive work. It also doesn't support async operations.
In case of using Task instead, what is the best practice for creating huge number of them?
Use a TPL Dataflow block instead. You don't create huge amounts of tasks that sit there waiting for a thread to become available. You can configure the max amount of tasks and reuse them for all the items that meanwhile sit in a buffer waiting for a task. For example:
var block = new ActionBlock<Uri>(
uri => SendRequestAsync(uri),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 50 });
foreach (var uri in uris)
{
block.Post(uri);
}
block.Complete();
await block.Completion;
Related
I am trying to improve the performance of some code that performs a shopping function by calling a number of different vendors. The 3rd party vendor call is async, and the results are processed to generate a result. The structure of the code is as follows.
public async Task<List<ShopResult>> DoShopping(IEnumerable<Vendor> vendors)
{
var res = vendors.Select(async s => await DoShopAndProcessResultAsync(s));
await Task.WhenAll(res); ....
}
Since DoShopAndProcessResultAsync is both IO bound and CPU bound, and each vendor iteration is independent, I think Task.Run can be used to do something like below.
public async Task<List<ShopResult>> DoShopping(IEnumerable<Vendor> vendors)
{
var res = vendors.Select(s => Task.Run(() => DoShopAndProcessResultAsync(s)));
await Task.WhenAll(res); ...
}
Using Task.Run as above gives a performance gain, and I can see from the order of execution of the calls that multiple threads are involved. It runs without any issue locally on my machine.
However, it is a task-of-tasks scenario, and I am wondering whether there are any pitfalls or whether this is deadlock-prone in a high-traffic prod environment.
What are your opinions on the approach of using Task.Run to parallelize async calls?
Tasks are .NET's low-level building blocks. .NET almost always has a better high-level abstraction for specific concurrency paradigms.
To paraphrase Rob Pike (slides): concurrency is not parallelism is not asynchronous execution. What you ask for is concurrent execution with a specific degree of parallelism. .NET already offers high-level classes that can do that, without resorting to low-level task handling.
At the end, I explain why these distinctions matter and how they're implemented using different .NET classes or libraries
Dataflow blocks
At the highest level, the Dataflow classes allow creating a pipeline of processing blocks similar to a PowerShell or Bash pipeline, where each block can use one or more tasks to process input. Dataflow blocks preserve message order, ensuring results are emitted in the order the input messages were received.
You'll often see combinations of blocks called meshes, not pipelines. Dataflow grew out of the Microsoft Robotics Framework and can be used to create a network of independent processing blocks. Most programmers just use it to build a pipeline of steps, though.
In your case, you could use a TransformBlock to execute DoShopAndProcessResultAsync and feed the output either to another processing block, or a BufferBlock you can read after processing all results. You could even split Shop and Process into separate blocks, each with its own logic and degree of parallelism
Eg.
var buffer=new BufferBlock<ShopResult>();
var blockOptions=new ExecutionDataflowBlockOptions {
    MaxDegreeOfParallelism=3,
    BoundedCapacity=1
};
var shop=new TransformBlock<Vendor,ShopResult>(DoShopAndProcessResultAsync,
    blockOptions);
var linkOptions=new DataflowLinkOptions{ PropagateCompletion=true };
shop.LinkTo(buffer,linkOptions);
foreach(var v in vendors)
{
    // SendAsync waits when the bounded input buffer is full
    await shop.SendAsync(v);
}
shop.Complete();
await shop.Completion;
buffer.TryReceiveAll(out IList<ShopResult> results);
You can use two separate blocks to shop and process :
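// shopOptions, processOptions and results (e.g. a BufferBlock<ShopResult>) are assumed to be defined elsewhere
var shop=new TransformBlock<Vendor,ShopResponse>(DoShopAsync,shopOptions);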
var process=new TransformBlock<ShopResponse,ShopResult>(DoProcessAsync,processOptions);
shop.LinkTo(process,linkOptions);
process.LinkTo(results,linkOptions);
foreach(var v in vendors)
{
await shop.SendAsync(v);
}
shop.Complete();
await process.Completion;
In this case we await the completion of the last block in the chain before reading the results.
Instead of reading from a buffer block, we could use an ActionBlock at the end to do whatever we want to do with the results, eg store them to a database. The results can be batched using a BatchBlock to reduce the number of storage operations
...
var batch=new BatchBlock<ShopResult>(100);
var store=new ActionBlock<ShopResult[]>(DoStoreAsync);
shop.LinkTo(process,linkOptions);
process.LinkTo(batch,linkOptions);
batch.LinkTo(store,linkOptions);
...
shop.Complete();
await store.Completion;
Why do names matter
Tasks are the lowest level building blocks used to implement multiple paradigms. In other languages you'd see them described as Futures or Promises (eg Javascript)
Parallelism in .NET means executing CPU-bound computations over a lot of data using all available cores. Parallel.ForEach will partition the input data into roughly as many partitions as there are cores and use one worker task per partition. PLINQ goes one step further, allowing the use of LINQ operators to specify the computation and letting PLINQ use algorithms optimized for parallel execution to map, filter, sort, group and collect results. That's why Parallel.ForEach can't be used for async work at all.
Concurrency means executing multiple independent and often IO-bound jobs. At the lowest level you can use Tasks but Dataflow, Rx.NET, Channels, IAsyncEnumerable etc allow the use of high-level patterns like CSP/Pipelines, event stream processing etc
Asynchronous execution means you don't have to block while waiting for I/O-bound work to complete.
What is alarming about the Task.Run approach in your question is that it depletes the ThreadPool of available worker threads in an uncontrolled manner. It doesn't offer any configuration option that would allow you to reduce the parallelism of each individual request, in favor of preserving the scalability of the whole service. That's something that might bite you in the long run.
Ideally you would like to control both the parallelism and the concurrency, and control them independently. For example you might want to limit the maximum concurrency of the I/O-bound work to 10, and the maximum parallelism of the CPU-bound work to 2. Regarding the former you could take a look at this question: How to limit the amount of concurrent async I/O operations?
Regarding the latter, you could use a TaskScheduler with limited concurrency. The ConcurrentExclusiveSchedulerPair is a handy class for this purpose. Here is an example of how you could rewrite your DoShopping method in a way that limits the ThreadPool usage to two threads at maximum (per request), without limiting at all the concurrency of the I/O-bound work:
public async Task<ShopResult[]> DoShopping(IEnumerable<Vendor> vendors)
{
    var scheduler = new ConcurrentExclusiveSchedulerPair(
        TaskScheduler.Default, maxConcurrencyLevel: 2).ConcurrentScheduler;
    var tasks = vendors.Select(vendor =>
    {
        return Task.Factory.StartNew(() => DoShopAndProcessResultAsync(vendor),
            default, TaskCreationOptions.DenyChildAttach, scheduler).Unwrap();
    });
    return await Task.WhenAll(tasks);
}
Important: In order for this to work, the DoShopAndProcessResultAsync method should be implemented internally without .ConfigureAwait(false) at the await points. Otherwise the continuations after the await will not run on our preferred scheduler, and the goal of limiting the ThreadPool utilization will be defeated.
My personal preference though would be to use instead the new (.NET 6) Parallel.ForEachAsync API. Apart from making it easy to control the concurrency through the MaxDegreeOfParallelism option, it also comes with a better behavior in case of exceptions. Instead of launching invariably all the async operations, it stops launching new operations as soon as a previously launched operation has failed. This can make a big difference in the responsiveness of your service, in case for example that all individual async operations are failing with a timeout exception. You can find here a synopsis of the main differences between the Parallel.ForEachAsync and the Task.WhenAll APIs.
Unfortunately the Parallel.ForEachAsync has the disadvantage that it doesn't return the results of the async operations. Which means that you have to collect the results manually as a side-effect of each async operation. I've posted here a ForEachAsync variant that returns results, that combines the best aspects of the Parallel.ForEachAsync and the Task.WhenAll APIs. You could use it like this:
public async Task<ShopResult[]> DoShopping(IEnumerable<Vendor> vendors)
{
    var scheduler = new ConcurrentExclusiveSchedulerPair(
        TaskScheduler.Default, maxConcurrencyLevel: 2).ConcurrentScheduler;
    ParallelOptions options = new() { MaxDegreeOfParallelism = 10 };
    return await ForEachAsync(vendors, options, async (vendor, ct) =>
    {
        return await Task.Factory.StartNew(() => DoShopAndProcessResultAsync(vendor),
            ct, TaskCreationOptions.DenyChildAttach, scheduler).Unwrap();
    });
}
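For comparison, if you stick with the built-in Parallel.ForEachAsync, collecting the results manually as a side effect could look roughly like the sketch below. ShopAsync is a hypothetical stand-in for DoShopAndProcessResultAsync, integers stand in for vendors, and the degree of parallelism of 4 is arbitrary:

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for DoShopAndProcessResultAsync.
async Task<string> ShopAsync(int vendorId, CancellationToken ct)
{
    await Task.Delay(10, ct);
    return $"result-{vendorId}";
}

var vendors = Enumerable.Range(1, 20).ToArray();
// A thread-safe collection is required because the body runs concurrently.
var collected = new ConcurrentDictionary<int, string>();

await Parallel.ForEachAsync(
    vendors,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },
    async (vendor, ct) =>
    {
        collected[vendor] = await ShopAsync(vendor, ct);
    });

// Completion order is not guaranteed, so rebuild the results in input order.
string[] shopResults = vendors.Select(v => collected[v]).ToArray();
Console.WriteLine(string.Join(", ", shopResults.Take(3))); // prints: result-1, result-2, result-3
```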
Note: In my initial answer (revision 1) I had suggested erroneously to pass the scheduler through the ParallelOptions.TaskScheduler property. I just found out that this doesn't work as I expected. The ParallelOptions class has an internal property EffectiveMaxConcurrencyLevel that represents the minimum of the MaxDegreeOfParallelism and the TaskScheduler.MaximumConcurrencyLevel. The implementation of the Parallel.ForEachAsync method uses this property, instead of reading directly the MaxDegreeOfParallelism. So the MaxDegreeOfParallelism, by being larger than the MaximumConcurrencyLevel, was effectively ignored.
You've probably also noticed by now that the names of these two settings are confusing. We use the MaximumConcurrencyLevel in order to control the number of threads (aka the parallelization), and we use the MaxDegreeOfParallelism in order to control the amount of concurrent async operations (aka the concurrency). The reason for this confusing terminology can be traced to the historic origins of these APIs. The ParallelOptions class was introduced before the async-await era, and the designers of the new Parallel.ForEachAsync API aimed at making it compatible with the older non-asynchronous members of the Parallel class.
I have this code
Lines.ToList().ForEach(y =>
{
globalQueue.AddRange(GetTasks(y.LineCode).ToList());
});
So for each line in my list of lines I get the tasks that I add to a global production queue. I can have 8 lines. Each GetTasks(y.LineCode) request takes 1 minute. I would like to use parallelism to be sure my 8 calls are requested together and not one by one.
What should I do?
Using another ForEach loop or using another extension method? Is there a ForEachAsync? Make the GetTasks request itself async?
Parallelism isn't concurrency. Concurrency isn't asynchrony. Running multiple slow queries in parallel won't make them run faster, quite the opposite. These are different problems and require very different solutions. Without a specific problem one can only give generic advice.
Parallelism - processing an 800K item array
Parallelism means processing a ton of data using multiple cores in parallel. To do that, you need to partition your data and feed each partition to a "worker" for processing. You need to minimize communication between workers and the need of synchronization to get the best performance, otherwise your workers will spend CPU time doing nothing. That means, no global queue updating.
If you have a lot of lines, or if line processing is CPU-bound, you can use PLINQ to process it :
var query = from y in lines.AsParallel()
from t in GetTasks(y.LineCode)
select t;
var theResults=query.ToList();
That's it. No need to synchronize access to a queue, either through locking or using a concurrent collection. This will use all available cores though. You can add WithDegreeOfParallelism() to reduce the number of cores used to avoid freezing.
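A minimal sketch of the WithDegreeOfParallelism() variant, assuming a made-up GetTasks that returns three items per line code and using plain integers as line codes:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for GetTasks: returns three items per line code.
IEnumerable<string> GetTasks(int lineCode) =>
    Enumerable.Range(1, 3).Select(i => $"line{lineCode}-task{i}");

var lineCodes = Enumerable.Range(1, 8);

// Cap the query at 2 concurrently executing workers instead of one per core.
var query = from code in lineCodes.AsParallel().WithDegreeOfParallelism(2)
            from t in GetTasks(code)
            select t;

List<string> theResults = query.ToList();
Console.WriteLine(theResults.Count); // prints: 24
```

The order of the items in theResults is not guaranteed unless AsOrdered() is added to the query.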
Concurrency - calling 2000 servers
Concurrency on the other hand means doing several different things at the same time. No partitioning is involved.
For example, if I had to query 8 or 2000 servers for monitoring data (true story) I wouldn't use Parallel or PLINQ. For one thing, Parallel and PLINQ use all available cores. In this case though they won't be doing anything, they'll just wait for responses. Parallelism classes can't handle async methods either because there's no point - they aren't meant to wait for responses.
A very quick & dirty solution would be to start multiple tasks and wait for them to return, eg :
var tasks=lines.Select(y=>Task.Run(()=>GetTasks(y.LineCode)));
//Array of individual results
var resultsArray=await Task.WhenAll(tasks);
//flatten the results
var resultList=resultsArray.SelectMany(r=>r).ToList();
This will start all requests at once. Network Security didn't like the 2000 concurrent requests, since it looked like a hack attack and caused a bit of network flooding.
Concurrency with Dataflow
We can use the TPL Dataflow library and eg ActionBlock or TransformBlock to make the requests with a controlled degree of parallelism :
var options=new ExecutionDataflowBlockOptions {
MaxDegreeOfParallelism = 4 ,
BoundedCapacity=10,
};
var spamBlock=new TransformManyBlock<Line,Result>(
y=>GetTasks(y.LineCode),
options);
var outputBlock=new BufferBlock<Result>();
spamBlock.LinkTo(outputBlock);
foreach(var line in lines)
{
await spamBlock.SendAsync(line);
}
spamBlock.Complete();
//Wait for all 4 workers to finish
await spamBlock.Completion;
Once the spamBlock completes, the results can be found in outputBlock. By setting a BoundedCapacity I ensure that the posting loop will wait if there are too many unprocessed messages in spamBlock's input queue.
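Reading the results back out could look like this minimal, self-contained sketch. GetTasks is again a hypothetical stand-in and integers are used as line codes. On .NET Framework the Dataflow types come from the System.Threading.Tasks.Dataflow NuGet package:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical stand-in for GetTasks: returns three items per line code.
IEnumerable<string> GetTasks(int lineCode) =>
    Enumerable.Range(1, 3).Select(i => $"line{lineCode}-task{i}");

var options = new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = 4,
    BoundedCapacity = 10
};
var spamBlock = new TransformManyBlock<int, string>(code => GetTasks(code), options);
var outputBlock = new BufferBlock<string>();
spamBlock.LinkTo(outputBlock);

foreach (var code in Enumerable.Range(1, 8))
{
    // Waits asynchronously if spamBlock's bounded buffer is full.
    await spamBlock.SendAsync(code);
}
spamBlock.Complete();
// Completes only after every output item has been handed to outputBlock.
await spamBlock.Completion;

// Drain everything that has accumulated in the buffer.
outputBlock.TryReceiveAll(out var received);
int receivedCount = received?.Count ?? 0;
Console.WriteLine(receivedCount); // prints: 24
```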
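TransformManyBlock can handle asynchronous methods too. Assuming GetTasksAsync returns a Task<IEnumerable<Result>> (the constructor's asynchronous overload expects exactly that type, and a Task<Result[]> does not convert to it implicitly), we can use: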
var spamBlock=new TransformManyBlock<Line,Result>(
y=>GetTasksAsync(y.LineCode),
options);
You can use Parallel Foreach:
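Parallel.ForEach(Lines, (line) =>
{
    // Fetch outside the lock so the slow calls still overlap.
    var tasks = GetTasks(line.LineCode).ToList();
    // Assuming globalQueue is a List<T>: AddRange is not thread-safe, so serialize it.
    lock (globalQueue) globalQueue.AddRange(tasks);
});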
A Parallel.ForEach loop works like a Parallel.For loop. The loop partitions the source collection and schedules the work on multiple threads based on the system environment. The more processors on the system, the faster the parallel method runs.
We're developing WebAPI which has some logic of decryption of around 200 items (can be more). Each decryption takes around 20ms.
We've tried to parallelize the tasks so we get it done as soon as possible, but it seems we're hitting some kind of limit, as threads are being reused only after older threads complete (and only a few are used). The overall action takes around 1-2 seconds to complete...
What we basically want to achieve is get x amount of threads start at the same time and finish after those ~20 ms.
We tried this:
Await multiple async Task while setting max running task at a time
But it seems this only describes setting a limit while we want to release it...
Here's a snippet:
var tasks = new List<Task>();
foreach (var element in Elements)
{
    var task = new Task(() =>
    {
        element.Value = Cipher.Decrypt((string)element.Value);
    });
    task.Start();
    tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());
What are we missing here?
Thanks,
Nir.
I cannot recommend parallelism on ASP.NET. It will certainly impact the scalability of your service, particularly if it is public-facing. I have thought "oh, I'm smart enough to do this" a couple of times and added parallelism in an ASP.NET app, only to have to tear it right back out a week later.
However, if you really want to...
it seems we're getting some kind of a limit
Is it the limit of physical cores on your machine?
We tried this: Await multiple async Task while setting max running task at a time
That solution is specifically for asynchronous concurrent code (e.g., I/O-bound). What you want is parallel (threaded) concurrent code (e.g., CPU-bound). Completely different use cases and solutions.
What are we missing here?
Your current code is throwing a ton of simultaneous tasks at the thread pool, which will attempt to handle them as best as it can. You can make this more efficient by using a higher-level abstraction, e.g., Parallel:
Parallel.ForEach(Elements, element =>
{
element.Value = Cipher.Decrypt((string)element.Value);
});
Parallel is more intelligent in terms of its partitioning and (re-)use of threads (i.e., not exceeding number of cores). So you should see some speedup.
However, I would expect it only to be a minor speedup. You are likely being limited by your number of physical cores.
Assuming no hyper-threading:
If it takes 20 ms for 1 item, then you can look at it as if it takes 1 core 20 ms. If you want 200 items to complete in 20 ms, then you need 200 cores all to yourself. If you don't have that many, it just can't be done...
Under normal circumstances, as many tasks will be scheduled in parallel as is optimal for your system.
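The same arithmetic as a small sketch, using the figures from the question (200 items, about 20 ms each). It is a best-case estimate that assumes the work is purely CPU-bound and that the cores are not shared with other requests:

```csharp
using System;

int items = 200;        // from the question
double msPerItem = 20;  // from the question
int cores = Environment.ProcessorCount;

// Total CPU time needed, regardless of how it is scheduled.
double totalCpuMs = items * msPerItem;

// Best case: the work is spread evenly over the cores (never more workers than items).
double bestCaseMs = totalCpuMs / Math.Min(cores, items);

Console.WriteLine($"Total CPU work: {totalCpuMs} ms");
Console.WriteLine($"{cores} cores: at least {bestCaseMs:F0} ms wall-clock");
```

With 2 to 4 cores this gives 1000 to 2000 ms, which is in line with the 1-2 seconds reported in the question.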
I would like to know what is the recommended way to execute multiple async methods in parallel?
in System.Threading.Tasks.Dataflow we can specify the max degree of parallelism but unbounded is probably the default for Task.WhenAll too ?
this :
var tasks = new List<Task>();
foreach(var item in items)
{
tasks.Add(myAsyncMethod(item));
}
await Task.WhenAll(tasks.ToArray());
or that :
var action = new ActionBlock<string>(myAsyncMethod, new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded,
BoundedCapacity = DataflowBlockOptions.Unbounded,
MaxMessagesPerTask = DataflowBlockOptions.Unbounded
});
foreach (var item in items)
{
action.Post(item);
}
action.Complete();
await action.Completion;
I would like to know what is the recommended way to execute multiple async methods in parallel?
Side note: actually not parallel, but concurrent.
in System.Threading.Tasks.Dataflow we can specify the max degree of parallelism but unbounded is probably the default for Task.WhenAll too ?
As someone commented, Task.WhenAll only joins existing tasks; by the time your code gets to Task.WhenAll, all the concurrency decisions have already been made.
You can throttle plain asynchronous code by using something like SemaphoreSlim.
The decision of whether to use asynchronous concurrency directly or TPL Dataflow is dependent on the surrounding code. If this concurrent operation is just called once asynchronously, then asynchronous concurrency is the best bet; but if this concurrent operation is part of a "pipeline" for your data, then TPL Dataflow may be a better fit.
Both methods are acceptable, and the choice should be governed by your requirements. As you can see, Dataflow gives you a lot of configurability that you would otherwise have to implement manually when using Tasks directly.
Note that in both situations the thread pool will be responsible for enqueuing and running the tasks, so the behaviour should remain the same.
Dataflow is good at chaining together groups of composable asynchronous operations whereas using tasks gives you finer grained control.
Short version: how do async calls scale when async methods are called thousands and thousands of times in a loop, and these methods might call other async methods? Will my threadpool explode?
I've been reading and experimenting with the TPL and Async and after reading a lot of material I'm still confused about some aspects that I could not find much information about, like how async calls scale. I will try to go straight to the point.
Async calls
For IO, I read it is better to use async than a new thread/start a task, but from what I understand, performing an async operation without using a different thread is impossible, which means async must use other threads/start tasks at some point.
So my question is: how would code A be better than code B regarding system resources?
Code A
// an array with 5000 urls.
var urls = new string[5000];
// list of awaitable tasks.
var tasks = new List<Task<string>>(5000);
var httpClient = new HttpClient();
foreach (string url in urls)
{
tasks.Add(httpClient.GetStringAsync(url));
}
await Task.WhenAll(tasks);
Code B
...same variables as code A...
foreach (string url in urls)
{
tasks.Add(
Task.Factory.StartNew(() =>
{
// This method represents a
// synchronous version of the GetStringAsync.
httpClient.GetString(url);
})
);
}
await Task.WhenAll(tasks);
Which leads me to the questions:
1 - should async calls be avoided in a loop?
2 - Is there a reasonable max of async calls that should be fired at a time, or is firing any number of async calls ok? How does this scale?
3 - Do async methods, under the hood, start a task for each call?
I tested this with 1000 urls and the number of used threadpool worker threads never even reached 30, and the number of IO completion threads is always about 5.
My Practical Experiment
I created a web application with a simple async controller.
The page is composed of a single form with a textarea where the user enters all urls he wishes to request/do some work with.
Upon submission, the urls are requested in a loop using the HttpClient.GetUrlAsync method just like the code A above.
An interesting point is that if I submit 1000 urls, it takes about 3 minutes to finish all requests.
On the other hand, if I submit 3 forms from 3 different tabs (i.e. clients), each with 1000 urls, it takes much, much longer to get the result (about 10 minutes). This really got me confused because, as per the MSDN definition, it should not take much longer than 3 minutes, especially since, even while processing all the requests at the same time, the number of threads used from the threadpool is approx. 25, which means resources are not being well exploited at all!
The way it is working now, this type of application is far from scalable (say I had about 5000 clients requesting a bunch of urls all the time), and I fail to see how async is the way to fire multiple IO requests.
Further explanation about the application
Client side:
1. user enters the site
2. types 1000 urls in the text area
3. submits the urls
Server side:
1. receive urls as an array
2. perform the code
foreach (string url in urls)
{
tasks.Add(GetUrlAsync(url));
}
await Task.WhenAll(tasks);
//at this point the thread is
// returned to the pool to receive
// further requests.
3. notifies the client that work is done
Please, enlighten me!
Thank you.
from what I understand, performing an async operation without using a different thread is impossible, which means async must use other threads/start tasks at some point.
Nope. As I describe on my blog, pure async methods do not block threads.
So my question is: how would code A be better than code B regarding system resources?
A uses fewer threads than B.
(On a side note, do not use StartNew. It's horribly out-of-date and has very dangerous default parameter values. Use Task.Run instead. If you got this idea/code from a blog post or article, please pass the word along. StartNew is a cancer that seems to be taking over the Internet.)
should async calls be avoided in a loop?
Nope, that's fine.
Is there a reasonable max of async calls that should be fired at a time, or is firing any number of async calls ok?
Any number of them are fine, as long as your backend resource can handle it.
How does this scale?
Asynchronous I/O on .NET almost always uses IOCPs (I/O Completion Ports) underneath, which is generally considered the most scalable form of I/O available on Windows.
Do async methods, under the hood, start a task for each call?
Yes and no. The execution of every asynchronous method is represented by a Task instance, but these do not represent running tasks - they don't represent a thread.
I call async tasks Promise Tasks, as opposed to Delegate Tasks (tasks that actually do run on the thread pool).
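A minimal sketch of the distinction, using TaskCompletionSource to build a Promise Task by hand. In real asynchronous I/O the completion is signalled by the OS rather than by a SetResult call in your own code:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// A Promise Task: no thread executes it; it simply completes when a result is set.
var tcs = new TaskCompletionSource<string>();
Task<string> promiseTask = tcs.Task;
TaskStatus initialStatus = promiseTask.Status;
Console.WriteLine(initialStatus); // prints: WaitingForActivation

// A Delegate Task: a thread-pool thread actually runs this code.
Task<int> delegateTask = Task.Run(() => Thread.CurrentThread.ManagedThreadId);
int workerThreadId = await delegateTask;

// Completing the promise; until now it existed without occupying any thread.
tcs.SetResult("done");
string value = await promiseTask;

Console.WriteLine($"{value}, delegate task ran on thread {workerThreadId}");
```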
really got me confused
One thing to be aware of when you're testing URL requests is that there's automatic throttling for URL requests built-in to .NET. Try setting ServicePointManager.DefaultConnectionLimit to int.MaxValue.
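A caveat for newer runtimes, offered as an assumption rather than something tested against your setup: as far as I know, HttpClient on .NET Core and .NET 5+ does not consult ServicePointManager, and the per-host limit is set on the handler instead. A minimal sketch, with 100 as an arbitrary value:

```csharp
using System;
using System.Net.Http;

// Assumption: a modern .NET runtime, where the per-host connection limit
// is configured on the handler. The value 100 is arbitrary.
var handler = new HttpClientHandler { MaxConnectionsPerServer = 100 };

// Share one HttpClient for all requests instead of creating one per URL.
using var client = new HttpClient(handler);

Console.WriteLine($"Max connections per server: {handler.MaxConnectionsPerServer}");
```

Whichever knob applies, the limit works as a built-in throttle: requests beyond it queue up and wait for a free connection.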