How to make parallel async and await use 100% of the cpu? - c#

I have searched a lot of codes saying that is possible to use multiple cores with parallel async, but none worked. It is always stuck in a single core.
Is it possible?
This is uses 100% of the cpu:
class Program
{
static void Main(string[] args)
{
Parallel.For(0, 100000, p =>
{
while (true) { }
});
}
}
Using parallel async, I dont get more than 12%:
class Program
{
static void Main(string[] args)
{
Task.Run(async () =>
{
var tasks = Enumerable.Range(0, 1000000).Select(i =>
{
while (true) { }
return Task.FromResult(0);
});
await Task.WhenAll(tasks);
}).GetAwaiter().GetResult();
}
}
100% cpu. Thanks all that helped, and not just mocked or tried to close:
class Program
{
static void Main(string[] args)
{
Task.Run(async () =>
{
var tasks = new List<Task>();
for (int i = 0; i < 100; i++)
tasks.Add(Task.Run(() =>
{
while (true) { }
}));
await Task.WhenAll(tasks);
}).GetAwaiter().GetResult();
}
}

Your code only creates one task, Task.FromResult returns a finished task and it's added after the while loop, that will be executed 100000 times, but one after other as the generation is done by the synchronous function Select.
You can change your code to this:
class Program
{
static void Main(string[] args)
{
var tasks = Enumerable.Range(0, 1000000).Select(i =>
{
return Task.Run(() =>
{
while (true) { }
return 0;
});
});
Task.WhenAll(tasks).GetAwaiter().GetResult();
}
}
It will use the 100%, tested.

When WhenAll goes to get the first task, it'll ask your query, task, for the first task. It will use the selector to try to translate the first result of Range (0), into a Task. Since your selector has an infinite loop, that will never return. The only work that ever happens is sitting waiting for the first task to be generated, when it never will be.
Other than that your main thread does nothing but ask a thread pool thread to do all the things, you never create any additional threads, and thus have no oppertunity to use multiple cores on your CPU.

Related

Limit the number of cores: new Task will block until core free from other tasks [duplicate]

I have a collection of 1000 input message to process. I'm looping the input collection and starting the new task for each message to get processed.
//Assume this messages collection contains 1000 items
var messages = new List<string>();
foreach (var msg in messages)
{
Task.Factory.StartNew(() =>
{
Process(msg);
});
}
Can we guess how many maximum messages simultaneously get processed at the time (assuming normal Quad core processor), or can we limit the maximum number of messages to be processed at the time?
How to ensure this message get processed in the same sequence/order of the Collection?
You could use Parallel.Foreach and rely on MaxDegreeOfParallelism instead.
Parallel.ForEach(messages, new ParallelOptions {MaxDegreeOfParallelism = 10},
msg =>
{
// logic
Process(msg);
});
SemaphoreSlim is a very good solution in this case and I higly recommend OP to try this, but #Manoj's answer has flaw as mentioned in comments.semaphore should be waited before spawning the task like this.
Updated Answer: As #Vasyl pointed out Semaphore may be disposed before completion of tasks and will raise exception when Release() method is called so before exiting the using block must wait for the completion of all created Tasks.
int maxConcurrency=10;
var messages = new List<string>();
using(SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
{
List<Task> tasks = new List<Task>();
foreach(var msg in messages)
{
concurrencySemaphore.Wait();
var t = Task.Factory.StartNew(() =>
{
try
{
Process(msg);
}
finally
{
concurrencySemaphore.Release();
}
});
tasks.Add(t);
}
Task.WaitAll(tasks.ToArray());
}
Answer to Comments
for those who want to see how semaphore can be disposed without Task.WaitAll
Run below code in console app and this exception will be raised.
System.ObjectDisposedException: 'The semaphore has been disposed.'
static void Main(string[] args)
{
int maxConcurrency = 5;
List<string> messages = Enumerable.Range(1, 15).Select(e => e.ToString()).ToList();
using (SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
{
List<Task> tasks = new List<Task>();
foreach (var msg in messages)
{
concurrencySemaphore.Wait();
var t = Task.Factory.StartNew(() =>
{
try
{
Process(msg);
}
finally
{
concurrencySemaphore.Release();
}
});
tasks.Add(t);
}
// Task.WaitAll(tasks.ToArray());
}
Console.WriteLine("Exited using block");
Console.ReadKey();
}
private static void Process(string msg)
{
Thread.Sleep(2000);
Console.WriteLine(msg);
}
I think it would be better to use Parallel LINQ
Parallel.ForEach(messages ,
new ParallelOptions{MaxDegreeOfParallelism = 4},
x => Process(x);
);
where x is the MaxDegreeOfParallelism
With .NET 5.0 and Core 3.0 channels were introduced.
The main benefit of this producer/consumer concurrency pattern is that you can also limit the input data processing to reduce resource impact.
This is especially helpful when processing millions of data records.
Instead of reading the whole dataset at once into memory, you can now consecutively query only chunks of the data and wait for the workers to process it before querying more.
Code sample with a queue capacity of 50 messages and 5 consumer threads:
/// <exception cref="System.AggregateException">Thrown on Consumer Task exceptions.</exception>
public static async Task ProcessMessages(List<string> messages)
{
const int producerCapacity = 10, consumerTaskLimit = 3;
var channel = Channel.CreateBounded<string>(producerCapacity);
_ = Task.Run(async () =>
{
foreach (var msg in messages)
{
await channel.Writer.WriteAsync(msg);
// blocking when channel is full
// waiting for the consumer tasks to pop messages from the queue
}
channel.Writer.Complete();
// signaling the end of queue so that
// WaitToReadAsync will return false to stop the consumer tasks
});
var tokenSource = new CancellationTokenSource();
CancellationToken ct = tokenSource.Token;
var consumerTasks = Enumerable
.Range(1, consumerTaskLimit)
.Select(_ => Task.Run(async () =>
{
try
{
while (await channel.Reader.WaitToReadAsync(ct))
{
ct.ThrowIfCancellationRequested();
while (channel.Reader.TryRead(out var message))
{
await Task.Delay(500);
Console.WriteLine(message);
}
}
}
catch (OperationCanceledException) { }
catch
{
tokenSource.Cancel();
throw;
}
}))
.ToArray();
Task waitForConsumers = Task.WhenAll(consumerTasks);
try { await waitForConsumers; }
catch
{
foreach (var e in waitForConsumers.Exception.Flatten().InnerExceptions)
Console.WriteLine(e.ToString());
throw waitForConsumers.Exception.Flatten();
}
}
As pointed out by Theodor Zoulias:
On multiple consumer exceptions, the remaining tasks will continue to run and have to take the load of the killed tasks. To avoid this, I implemented a CancellationToken to stop all the remaining tasks and handle the exceptions combined in the AggregateException of waitForConsumers.Exception.
Side note:
The Task Parallel Library (TPL) might be good at automatically limiting the tasks based on your local resources. But when you are processing data remotely via RPC, it's necessary to manually limit your RPC calls to avoid filling the network/processing stack!
If your Process method is async you can't use Task.Factory.StartNew as it doesn't play well with an async delegate. Also there are some other nuances when using it (see this for example).
The proper way to do it in this case is to use Task.Run. Here's #ClearLogic answer modified for an async Process method.
static void Main(string[] args)
{
int maxConcurrency = 5;
List<string> messages = Enumerable.Range(1, 15).Select(e => e.ToString()).ToList();
using (SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
{
List<Task> tasks = new List<Task>();
foreach (var msg in messages)
{
concurrencySemaphore.Wait();
var t = Task.Run(async () =>
{
try
{
await Process(msg);
}
finally
{
concurrencySemaphore.Release();
}
});
tasks.Add(t);
}
Task.WaitAll(tasks.ToArray());
}
Console.WriteLine("Exited using block");
Console.ReadKey();
}
private static async Task Process(string msg)
{
await Task.Delay(2000);
Console.WriteLine(msg);
}
You can create your own TaskScheduler and override QueueTask there.
protected virtual void QueueTask(Task task)
Then you can do anything you like.
One example here:
Limited concurrency level task scheduler (with task priority) handling wrapped tasks
You can simply set the max concurrency degree like this way:
int maxConcurrency=10;
var messages = new List<1000>();
using(SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
{
foreach(var msg in messages)
{
Task.Factory.StartNew(() =>
{
concurrencySemaphore.Wait();
try
{
Process(msg);
}
finally
{
concurrencySemaphore.Release();
}
});
}
}
If you need in-order queuing (processing might finish in any order), there is no need for a semaphore. Old fashioned if statements work fine:
const int maxConcurrency = 5;
List<Task> tasks = new List<Task>();
foreach (var arg in args)
{
var t = Task.Run(() => { Process(arg); } );
tasks.Add(t);
if(tasks.Count >= maxConcurrency)
Task.WaitAny(tasks.ToArray());
}
Task.WaitAll(tasks.ToArray());
I ran into a similar problem where I wanted to produce 5000 results while calling apis, etc. So, I ran some speed tests.
Parallel.ForEach(products.Select(x => x.KeyValue).Distinct().Take(100), id =>
{
new ParallelOptions { MaxDegreeOfParallelism = 100 };
GetProductMetaData(productsMetaData, client, id).GetAwaiter().GetResult();
});
produced 100 results in 30 seconds.
Parallel.ForEach(products.Select(x => x.KeyValue).Distinct().Take(100), id =>
{
new ParallelOptions { MaxDegreeOfParallelism = 100 };
GetProductMetaData(productsMetaData, client, id);
});
Moving the GetAwaiter().GetResult() to the individual async api calls inside GetProductMetaData resulted in 14.09 seconds to produce 100 results.
foreach (var id in ids.Take(100))
{
GetProductMetaData(productsMetaData, client, id);
}
Complete non-async programming with the GetAwaiter().GetResult() in api calls resulted in 13.417 seconds.
var tasks = new List<Task>();
while (y < ids.Count())
{
foreach (var id in ids.Skip(y).Take(100))
{
tasks.Add(GetProductMetaData(productsMetaData, client, id));
}
y += 100;
Task.WhenAll(tasks).GetAwaiter().GetResult();
Console.WriteLine($"Finished {y}, {sw.Elapsed}");
}
Forming a task list and working through 100 at a time resulted in a speed of 7.36 seconds.
using (SemaphoreSlim cons = new SemaphoreSlim(10))
{
var tasks = new List<Task>();
foreach (var id in ids.Take(100))
{
cons.Wait();
var t = Task.Factory.StartNew(() =>
{
try
{
GetProductMetaData(productsMetaData, client, id);
}
finally
{
cons.Release();
}
});
tasks.Add(t);
}
Task.WaitAll(tasks.ToArray());
}
Using SemaphoreSlim resulted in 13.369 seconds, but also took a moment to boot to start using it.
var throttler = new SemaphoreSlim(initialCount: take);
foreach (var id in ids)
{
throttler.WaitAsync().GetAwaiter().GetResult();
tasks.Add(Task.Run(async () =>
{
try
{
skip += 1;
await GetProductMetaData(productsMetaData, client, id);
if (skip % 100 == 0)
{
Console.WriteLine($"started {skip}/{count}, {sw.Elapsed}");
}
}
finally
{
throttler.Release();
}
}));
}
Using Semaphore Slim with a throttler for my async task took 6.12 seconds.
The answer for me in this specific project was use a throttler with Semaphore Slim. Although the while foreach tasklist did sometimes beat the throttler, 4/6 times the throttler won for 1000 records.
I realize I'm not using the OPs code, but I think this is important and adds to this discussion because how is sometimes not the only question that should be asked, and the answer is sometimes "It depends on what you are trying to do."
Now to answer the specific questions:
How to limit the maximum number of parallel tasks in c#: I showed how to limit the number of tasks that are completed at a time.
Can we guess how many maximum messages simultaneously get processed at the time (assuming normal Quad core processor), or can we limit the maximum number of messages to be processed at the time? I cannot guess how many will be processed at a time unless I set an upper limit but I can set an upper limit. Obviously different computers function at different speeds due to CPU, RAM etc. and how many threads and cores the program itself has access to as well as other programs running in tandem on the same computer.
How to ensure this message get processed in the same sequence/order of the Collection? If you want to process everything in a specific order, it is synchronous programming. The point of being able to run things asynchronously is ensuring that they can do everything without an order. As you can see from my code, the time difference is minimal in 100 records unless you use async code. In the event that you need an order to what you are doing, use asynchronous programming up until that point, then await and do things synchronously from there. For example, task1a.start, task2a.start, then later task1a.await, task2a.await... then later task1b.start task1b.await and task2b.start task 2b.await.
public static void RunTasks(List<NamedTask> importTaskList)
{
List<NamedTask> runningTasks = new List<NamedTask>();
try
{
foreach (NamedTask currentTask in importTaskList)
{
currentTask.Start();
runningTasks.Add(currentTask);
if (runningTasks.Where(x => x.Status == TaskStatus.Running).Count() >= MaxCountImportThread)
{
Task.WaitAny(runningTasks.ToArray());
}
}
Task.WaitAll(runningTasks.ToArray());
}
catch (Exception ex)
{
Log.Fatal("ERROR!", ex);
}
}
you can use the BlockingCollection, If the consume collection limit has reached, the produce will stop producing until a consume process will finish. I find this pattern more easy to understand and implement than the SemaphoreSlim.
int TasksLimit = 10;
BlockingCollection<Task> tasks = new BlockingCollection<Task>(new ConcurrentBag<Task>(), TasksLimit);
void ProduceAndConsume()
{
var producer = Task.Factory.StartNew(RunProducer);
var consumer = Task.Factory.StartNew(RunConsumer);
try
{
Task.WaitAll(new[] { producer, consumer });
}
catch (AggregateException ae) { }
}
void RunConsumer()
{
foreach (var task in tasks.GetConsumingEnumerable())
{
task.Start();
}
}
void RunProducer()
{
for (int i = 0; i < 1000; i++)
{
tasks.Add(new Task(() => Thread.Sleep(1000), TaskCreationOptions.AttachedToParent));
}
}
Note that the RunProducer and RunConsumer has spawn two independent tasks.

When will awaiting a method exit the program?

I've been reading the following post, specifically this point:
On your application’s entry point method, e.g. Main. When you await an instance that’s not yet completed, execution returns to the caller of the method. In the case of Main, this would return out of Main, effectively ending the program.
I've been trying to do that, but I can't seem to make it exit. Here's the code I have:
class Program
{
// ending main!!!
static async Task Main(string[] args)
{
Task<Task<string>> t = new Task<Task<string>>(async () =>
{
Console.WriteLine("Synchronous");
for (int i = 0; i < 10; i++)
{
Console.WriteLine("Synchronous");
Console.WriteLine("Synchronous I is: " + i);
await Task.Run(() => { Console.WriteLine("Asynchronous"); });
await Task.Run(() => { for (int j = 0; j < 1000; j++) { Console.WriteLine( "Asynchronous, program should have exited by now?"); }});
}
return "ASD";
});
t.Start();
await t;
}
}
What exactly am I missing here? Shouldn't the program exit upon hitting the line await t; or by following through the thread at await Task.Run(() => { Console.WriteLine("Asynchronous"); });?
This is a Console application's main method.
The article you linked is from 2012, which predates the addition of async support to Main(), which is why what it's talking about seems wrong.
For the current implementation, consider the following code:
public static async Task Main()
{
Console.WriteLine("Awaiting");
await Task.Delay(2000);
Console.WriteLine("Awaited");
}
This gets converted by the compiler as follows (I used "JustDecompile" to decompile this):
private static void <Main>()
{
Program.Main().GetAwaiter().GetResult();
}
public static async Task Main()
{
Console.WriteLine("Awaiting");
await Task.Delay(2000);
Console.WriteLine("Awaited");
}
Now you can see why the program doesn't exit.
The async Task Main() gets called from static void <Main>() which waits for the Task returned from async Task Main() to complete (by accessing .GetResult()) before the program exits.

Task.Status should be Running but shows RanToCompletion

I'm puzzled with this situation, where a class has a method that launches two periodic Tasks and then a property is used to check if both Tasks are still running or not, but the result is unexpected. Here is the code (simplified):
public partial class UdpClientConnector
{
Task localListener;
Task periodicSubscriber;
bool keepWorking = false;
public bool IsRunning
{
get
{
if ((localListener != null) && (periodicSubscriber != null))
{
return (localListener.Status == TaskStatus.Running) &&
(periodicSubscriber.Status == TaskStatus.Running);
}
else
return false;
}
}
public void Start()
{
keepWorking = true;
localListener = new Task(() => LocalListenerWorker());
localListener.Start();
periodicSubscriber = new Task(() => PeriodicSubscriberWorker());
periodicSubscriber.Start();
}
public void Stop()
{
keepWorking = false;
localListener.Wait();
periodicSubscriber.Wait();
}
async void LocalListenerWorker()
{
while (keepWorking)
{
// Do some work and then wait a bit
await Task.Delay(1000);
}
}
async void PeriodicSubscriberWorker()
{
while (keepWorking)
{
// Do some (other) work and then wait a bit
await Task.Delay(1000);
}
}
}
To test this boilerplate I used the following:
UdpClientConnector connector = new UdpClientConnector();
// This assert is successful because the two Tasks are not yet started
Assert.IsTrue(!connector.IsRunning);
// Starts the tasks and wait a bit
Connector.Start();
Task.Delay(2000).Wait();
// This fails
Assert.IsTrue(connector.IsRunning);
When I've tried to debug the test case, I've found that two Tasks are in the RanToCompletion state, which is unexpected due the fact that both tasks are just loops and should not terminate until keepWorking becomes false.
I've tried also to start the Tasks using Task.Factory.StartNew(..) with same results.
What I'm missing? Thank you!
The problem is with how you start the tasks, and indeed the task methods.
localListener = new Task(() => LocalListenerWorker());
That task will complete when LocalListenerWorker returns - which it will do pretty much immediately (when it hits the first await expression). It doesn't wait for the asynchronous operation to actually complete (i.e. the loop to finish).
async void methods should almost never be used - they're basically only there to support event handlers.
I suggest you rewrite your methods to return Task, and then use Task.Run to start them, passing in a method group:
Task.Run(LocalListenerWorker);
...
private async Task LocalListenerWorker()
{
// Body as before
}
The task by Task.Run will only complete when the task returned by LocalListenerWorker completes, which is when the loop body finishes.
Here's a complete demo:
using System;
using System.Threading.Tasks;
class Program
{
static void Main(string[] args)
{
Task task1 = Task.Run(Loop);
// Don't do this normally! It's just as a simple demo
// in a console app...
task1.Wait();
Console.WriteLine("First task done");
Task task2 = new Task(() => Broken());
task2.Start();
// Don't do this normally! It's just as a simple demo
// in a console app...
task2.Wait();
Console.WriteLine("Second task done");
}
static async Task Loop()
{
for (int i = 0; i < 5; i++)
{
await Task.Delay(1000);
Console.WriteLine(i);
}
}
static async void Broken()
{
for (int i = 0; i < 5; i++)
{
await Task.Delay(1000);
Console.WriteLine(i);
}
}
}
The output shows:
0
1
2
3
4
First task done
Second task done
The first task behaves as expected, and only completes when the first async method has really completed. The second task behaves like your current code: it completes as soon as the second async method has returned - which happens almost immediately due to the await.

How to make a task more independent?

Originally I wrote my C# program using Threads and ThreadPooling, but most of the users on here were telling me Async was a better approach for more efficiency. My program sends JSON objects to a server until status code of 200 is returned and then moves on to the next task.
The problem is once one of my Tasks retrieves a status code of 200, it waits for the other Tasks to get 200 code and then move onto the next Task. I want each task to continue it's next task without waiting for other Tasks to finish (or catch up) by receiving a 200 response.
Below is my main class to run my tasks in parallel.
public static void Main (string[] args)
{
var tasks = new List<Task> ();
for (int i = 0; i < 10; i++) {
tasks.Add (getItemAsync (i));
}
Task.WhenAny (tasks);
}
Here is the getItemAsync() method that does the actual sending of information to another server. Before this method works, it needs a key. The problem also lies here, let's say I run 100 tasks, all 100 tasks will wait until every single one has a key.
public static async Task getItemAsync (int i)
{
if (key == "") {
await getProductKey (i).ConfigureAwait(false);
}
Uri url = new Uri ("http://www.website.com/");
var content = new FormUrlEncodedContent (new[] {
...
});
while (!success) {
using (HttpResponseMessage result = await client.PostAsync (url, content)) {
if (result.IsSuccessStatusCode) {
string resultContent = await result.Content.ReadAsStringAsync ().ConfigureAwait(false);
Console.WriteLine(resultContent);
success=true;
}
}
}
await doMoreAsync (i);
}
Here's the function that retrieves the keys, it uses HttpClient and gets parsed.
public static async Task getProductKey (int i)
{
string url = "http://www.website.com/key";
var stream = await client.GetStringAsync (url).ConfigureAwait(false);
var doc = new HtmlDocument ();
doc.LoadHtml (stream);
HtmlNode node = doc.DocumentNode.SelectNodes ("//input[#name='key']") [0];
try {
key = node.Attributes ["value"].Value;
} catch (Exception e) {
Console.WriteLine ("Exception: " + e);
}
}
After every task receives a key and has a 200 status code, it runs doMoreAsync(). I want any individual Task that retrieved the 200 code to run doMoreAsync() without waiting for other tasks to catch up. How do I go about this?
Your Main method is not waiting for a Task to complete, it justs fires off a bunch of async tasks and returns.
If you want to await an async task you can only do that from an async method. Workaround is to kick off an async task from the Main method and wait for its completion using the blocking Task.Wait():
public static void Main(string[] args) {
Task.Run(async () =>
{
var tasks = new List<Task> ();
for (int i = 0; i < 10; i++) {
tasks.Add (getItemAsync (i));
}
var finishedTask = await Task.WhenAny(tasks); // This awaits one task
}).Wait();
}
When you are invoking getItemAsync() from an async method, you can remove the ConfigureAwait(false) as well. ConfigureAwait(false) just makes sure some code does not execute on the UI thread.
If you want to append another task to a task, you can also append it to the previous Task directly by using ContinueWith():
getItemAsync(i).ContinueWith(anotherTask);
Your main problem seems to be that you're sharing the key field across all your concurrently-running asynchronous operations. This will inevitably lead to race hazards. Instead, you should alter your getProductKey method to return each retrieved key as the result of its asynchronous operation:
// ↓ result type of asynchronous operation
public static async Task<string> getProductKey(int i)
{
// retrieval logic here
// return key as result of asynchronous operation
return node.Attributes["value"].Value;
}
Then, you consume it like so:
public static async Task getItemAsync(int i)
{
string key;
try
{
key = await getProductKey(i).ConfigureAwait(false);
}
catch
{
// handle exceptions
}
// use local variable 'key' in further asynchronous operations
}
Finally, in your main logic, use a WaitAll to prevent the console application from terminating before all your tasks complete. Although WaitAll is a blocking call, it is acceptable in this case since you want the main thread to block.
public static void Main (string[] args)
{
var tasks = new List<Task> ();
for (int i = 0; i < 10; i++) {
tasks.Add(getItemAsync(i));
}
Task.WaitAll(tasks.ToArray());
}

How to throttle multiple asynchronous tasks?

I have some code of the following form:
static async Task DoSomething(int n)
{
...
}
static void RunThreads(int totalThreads, int throttle)
{
var tasks = new List<Task>();
for (var n = 0; n < totalThreads; n++)
{
var task = DoSomething(n);
tasks.Add(task);
}
Task.WhenAll(tasks).Wait(); // all threads must complete
}
Trouble is, if I don't throttle the threads, things start falling apart. Now, I want to launch a maximum of throttle threads, and only start the new thread when an old one is complete. I've tried a few approaches and none so far has worked. Problems I have encountered include:
The tasks collection must be fully populated with all tasks, whether active or awaiting execution, otherwise the final .Wait() call only looks at the threads that it started with.
Chaining the execution seems to require use of Task.Run() or the like. But I need a reference to each task from the outset, and instantiating a task seems to kick it off automatically, which is what I don't want.
How to do this?
First, abstract away from threads. Especially since your operation is asynchronous, you shouldn't be thinking about "threads" at all. In the asynchronous world, you have tasks, and you can have a huge number of tasks compared to threads.
Throttling asynchronous code can be done using SemaphoreSlim:
static async Task DoSomething(int n);
static void RunConcurrently(int total, int throttle)
{
var mutex = new SemaphoreSlim(throttle);
var tasks = Enumerable.Range(0, total).Select(async item =>
{
await mutex.WaitAsync();
try { await DoSomething(item); }
finally { mutex.Release(); }
});
Task.WhenAll(tasks).Wait();
}
The simplest option IMO is to use TPL Dataflow. You just create an ActionBLock, limit it by the desired parallelism and start posting items into it. It makes sure to only run a certain amount of tasks at the same time, and when a task completes, it starts executing the next item:
async Task RunAsync(int totalThreads, int throttle)
{
var block = new ActionBlock<int>(
DoSomething,
new ExecutionDataFlowOptions { MaxDegreeOfParallelism = throttle });
for (var n = 0; n < totalThreads; n++)
{
block.Post(n);
}
block.Complete();
await block.Completion;
}
If I understand correctly, you can start tasks limited number of tasks mentioned by throttle parameter and wait for them to finish before starting next..
To wait for all started tasks to complete before starting new tasks, use the following implementation.
static async Task RunThreads(int totalThreads, int throttle)
{
var tasks = new List<Task>();
for (var n = 0; n < totalThreads; n++)
{
var task = DoSomething(n);
tasks.Add(task);
if (tasks.Count == throttle)
{
await Task.WhenAll(tasks);
tasks.Clear();
}
}
await Task.WhenAll(tasks); // wait for remaining
}
To add tasks as on when it is completed you can use the following code
static async Task RunThreads(int totalThreads, int throttle)
{
var tasks = new List<Task>();
for (var n = 0; n < totalThreads; n++)
{
var task = DoSomething(n);
tasks.Add(task);
if (tasks.Count == throttle)
{
var completed = await Task.WhenAny(tasks);
tasks.Remove(completed);
}
}
await Task.WhenAll(tasks); // all threads must complete
}
Stephen Toub gives the following example for throttling in his The Task-based Asynchronous Pattern document.
const int CONCURRENCY_LEVEL = 15;
Uri [] urls = …;
int nextIndex = 0;
var imageTasks = new List<Task<Bitmap>>();
while(nextIndex < CONCURRENCY_LEVEL && nextIndex < urls.Length)
{
imageTasks.Add(GetBitmapAsync(urls[nextIndex]));
nextIndex++;
}
while(imageTasks.Count > 0)
{
try
{
Task<Bitmap> imageTask = await Task.WhenAny(imageTasks);
imageTasks.Remove(imageTask);
Bitmap image = await imageTask;
panel.AddImage(image);
}
catch(Exception exc) { Log(exc); }
if (nextIndex < urls.Length)
{
imageTasks.Add(GetBitmapAsync(urls[nextIndex]));
nextIndex++;
}
}
Microsoft's Reactive Extensions (Rx) - NuGet "Rx-Main" - has this problem sorted very nicely.
Just do this:
static void RunThreads(int totalThreads, int throttle)
{
Observable
.Range(0, totalThreads)
.Select(n => Observable.FromAsync(() => DoSomething(n)))
.Merge(throttle)
.Wait();
}
Job done.
.NET 6 introduces Parallel.ForEachAsync. You could rewrite your code like this:
static async ValueTask DoSomething(int n)
{
...
}
static Task RunThreads(int totalThreads, int throttle)
=> Parallel.ForEachAsync(Enumerable.Range(0, totalThreads), new ParallelOptions() { MaxDegreeOfParallelism = throttle }, (i, _) => DoSomething(i));
Notes:
I had to change the return type of your DoSomething function from Task to ValueTask.
You probably want to avoid the .Wait() call, so I made the RunThreads method async.
It is not obvious from your example why you need access to the individual tasks. This code does not give you access to the tasks, but might still be helpful in many cases.
Here are some extension method variations to build on Sriram Sakthivel answer.
In the usage example, calls to DoSomething are being wrapped in an explicitly cast closure to allow passing arguments.
public static async Task RunMyThrottledTasks()
{
var myArgsSource = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
await myArgsSource
.Select(a => (Func<Task<object>>)(() => DoSomething(a)))
.Throttle(2);
}
public static async Task<object> DoSomething(int arg)
{
// Await some async calls that need arg..
// ..then return result async Task..
return new object();
}
public static async Task<IEnumerable<T>> Throttle<T>(IEnumerable<Func<Task<T>>> toRun, int throttleTo)
{
var running = new List<Task<T>>(throttleTo);
var completed = new List<Task<T>>(toRun.Count());
foreach(var taskToRun in toRun)
{
running.Add(taskToRun());
if(running.Count == throttleTo)
{
var comTask = await Task.WhenAny(running);
running.Remove(comTask);
completed.Add(comTask);
}
}
return completed.Select(t => t.Result);
}
public static async Task Throttle(this IEnumerable<Func<Task>> toRun, int throttleTo)
{
var running = new List<Task>(throttleTo);
foreach(var taskToRun in toRun)
{
running.Add(taskToRun());
if(running.Count == throttleTo)
{
var comTask = await Task.WhenAny(running);
running.Remove(comTask);
}
}
}
What you need is a custom task scheduler. You can derive a class from System.Threading.Tasks.TaskScheduler and implement two major functions GetScheduledTasks(), QueueTask(), along with other functions to gain complete control over throttling tasks. Here is a well documented example.
https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.taskscheduler?view=net-5.0
You can actually emulate the Parallel.ForEachAsync method introduced as part of .NET 6. In order to emulate the same you can use the following code.
public static Task ForEachAsync<T>(IEnumerable<T> source, int maxDegreeOfParallelism, Func<T, Task> body) {
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(maxDegreeOfParallelism)
select Task.Run(async delegate {
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}

Categories

Resources