I have some code of the following form:
static async Task DoSomething(int n)
{
...
}
static void RunThreads(int totalThreads, int throttle)
{
var tasks = new List<Task>();
for (var n = 0; n < totalThreads; n++)
{
var task = DoSomething(n);
tasks.Add(task);
}
Task.WhenAll(tasks).Wait(); // all threads must complete
}
Trouble is, if I don't throttle the threads, things start falling apart. Now, I want to launch a maximum of throttle threads, and only start the new thread when an old one is complete. I've tried a few approaches and none so far has worked. Problems I have encountered include:
The tasks collection must be fully populated with all tasks, whether active or awaiting execution, otherwise the final .Wait() call only looks at the threads that it started with.
Chaining the execution seems to require use of Task.Run() or the like. But I need a reference to each task from the outset, and instantiating a task seems to kick it off automatically, which is what I don't want.
How to do this?
First, abstract away from threads. Especially since your operation is asynchronous, you shouldn't be thinking about "threads" at all. In the asynchronous world, you have tasks, and you can have a huge number of tasks compared to threads.
Throttling asynchronous code can be done using SemaphoreSlim:
static async Task DoSomething(int n);
static void RunConcurrently(int total, int throttle)
{
var mutex = new SemaphoreSlim(throttle);
var tasks = Enumerable.Range(0, total).Select(async item =>
{
await mutex.WaitAsync();
try { await DoSomething(item); }
finally { mutex.Release(); }
});
Task.WhenAll(tasks).Wait();
}
The simplest option IMO is to use TPL Dataflow. You just create an ActionBLock, limit it by the desired parallelism and start posting items into it. It makes sure to only run a certain amount of tasks at the same time, and when a task completes, it starts executing the next item:
async Task RunAsync(int totalThreads, int throttle)
{
var block = new ActionBlock<int>(
DoSomething,
new ExecutionDataFlowOptions { MaxDegreeOfParallelism = throttle });
for (var n = 0; n < totalThreads; n++)
{
block.Post(n);
}
block.Complete();
await block.Completion;
}
If I understand correctly, you can start tasks limited number of tasks mentioned by throttle parameter and wait for them to finish before starting next..
To wait for all started tasks to complete before starting new tasks, use the following implementation.
static async Task RunThreads(int totalThreads, int throttle)
{
var tasks = new List<Task>();
for (var n = 0; n < totalThreads; n++)
{
var task = DoSomething(n);
tasks.Add(task);
if (tasks.Count == throttle)
{
await Task.WhenAll(tasks);
tasks.Clear();
}
}
await Task.WhenAll(tasks); // wait for remaining
}
To add tasks as on when it is completed you can use the following code
static async Task RunThreads(int totalThreads, int throttle)
{
var tasks = new List<Task>();
for (var n = 0; n < totalThreads; n++)
{
var task = DoSomething(n);
tasks.Add(task);
if (tasks.Count == throttle)
{
var completed = await Task.WhenAny(tasks);
tasks.Remove(completed);
}
}
await Task.WhenAll(tasks); // all threads must complete
}
Stephen Toub gives the following example for throttling in his The Task-based Asynchronous Pattern document.
const int CONCURRENCY_LEVEL = 15;
Uri [] urls = …;
int nextIndex = 0;
var imageTasks = new List<Task<Bitmap>>();
while(nextIndex < CONCURRENCY_LEVEL && nextIndex < urls.Length)
{
imageTasks.Add(GetBitmapAsync(urls[nextIndex]));
nextIndex++;
}
while(imageTasks.Count > 0)
{
try
{
Task<Bitmap> imageTask = await Task.WhenAny(imageTasks);
imageTasks.Remove(imageTask);
Bitmap image = await imageTask;
panel.AddImage(image);
}
catch(Exception exc) { Log(exc); }
if (nextIndex < urls.Length)
{
imageTasks.Add(GetBitmapAsync(urls[nextIndex]));
nextIndex++;
}
}
Microsoft's Reactive Extensions (Rx) - NuGet "Rx-Main" - has this problem sorted very nicely.
Just do this:
static void RunThreads(int totalThreads, int throttle)
{
Observable
.Range(0, totalThreads)
.Select(n => Observable.FromAsync(() => DoSomething(n)))
.Merge(throttle)
.Wait();
}
Job done.
.NET 6 introduces Parallel.ForEachAsync. You could rewrite your code like this:
static async ValueTask DoSomething(int n)
{
...
}
static Task RunThreads(int totalThreads, int throttle)
=> Parallel.ForEachAsync(Enumerable.Range(0, totalThreads), new ParallelOptions() { MaxDegreeOfParallelism = throttle }, (i, _) => DoSomething(i));
Notes:
I had to change the return type of your DoSomething function from Task to ValueTask.
You probably want to avoid the .Wait() call, so I made the RunThreads method async.
It is not obvious from your example why you need access to the individual tasks. This code does not give you access to the tasks, but might still be helpful in many cases.
Here are some extension method variations to build on Sriram Sakthivel answer.
In the usage example, calls to DoSomething are being wrapped in an explicitly cast closure to allow passing arguments.
public static async Task RunMyThrottledTasks()
{
var myArgsSource = new[] { 1, 2, 3, 4, 5, 6, 7, 8, 9 };
await myArgsSource
.Select(a => (Func<Task<object>>)(() => DoSomething(a)))
.Throttle(2);
}
public static async Task<object> DoSomething(int arg)
{
// Await some async calls that need arg..
// ..then return result async Task..
return new object();
}
public static async Task<IEnumerable<T>> Throttle<T>(IEnumerable<Func<Task<T>>> toRun, int throttleTo)
{
var running = new List<Task<T>>(throttleTo);
var completed = new List<Task<T>>(toRun.Count());
foreach(var taskToRun in toRun)
{
running.Add(taskToRun());
if(running.Count == throttleTo)
{
var comTask = await Task.WhenAny(running);
running.Remove(comTask);
completed.Add(comTask);
}
}
return completed.Select(t => t.Result);
}
public static async Task Throttle(this IEnumerable<Func<Task>> toRun, int throttleTo)
{
var running = new List<Task>(throttleTo);
foreach(var taskToRun in toRun)
{
running.Add(taskToRun());
if(running.Count == throttleTo)
{
var comTask = await Task.WhenAny(running);
running.Remove(comTask);
}
}
}
What you need is a custom task scheduler. You can derive a class from System.Threading.Tasks.TaskScheduler and implement two major functions GetScheduledTasks(), QueueTask(), along with other functions to gain complete control over throttling tasks. Here is a well documented example.
https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.taskscheduler?view=net-5.0
You can actually emulate the Parallel.ForEachAsync method introduced as part of .NET 6. In order to emulate the same you can use the following code.
public static Task ForEachAsync<T>(IEnumerable<T> source, int maxDegreeOfParallelism, Func<T, Task> body) {
return Task.WhenAll(
from partition in Partitioner.Create(source).GetPartitions(maxDegreeOfParallelism)
select Task.Run(async delegate {
using (partition)
while (partition.MoveNext())
await body(partition.Current);
}));
}
Related
I have a collection of 1000 input message to process. I'm looping the input collection and starting the new task for each message to get processed.
//Assume this messages collection contains 1000 items
var messages = new List<string>();
foreach (var msg in messages)
{
Task.Factory.StartNew(() =>
{
Process(msg);
});
}
Can we guess how many maximum messages simultaneously get processed at the time (assuming normal Quad core processor), or can we limit the maximum number of messages to be processed at the time?
How to ensure this message get processed in the same sequence/order of the Collection?
You could use Parallel.Foreach and rely on MaxDegreeOfParallelism instead.
Parallel.ForEach(messages, new ParallelOptions {MaxDegreeOfParallelism = 10},
msg =>
{
// logic
Process(msg);
});
SemaphoreSlim is a very good solution in this case and I higly recommend OP to try this, but #Manoj's answer has flaw as mentioned in comments.semaphore should be waited before spawning the task like this.
Updated Answer: As #Vasyl pointed out Semaphore may be disposed before completion of tasks and will raise exception when Release() method is called so before exiting the using block must wait for the completion of all created Tasks.
int maxConcurrency=10;
var messages = new List<string>();
using(SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
{
List<Task> tasks = new List<Task>();
foreach(var msg in messages)
{
concurrencySemaphore.Wait();
var t = Task.Factory.StartNew(() =>
{
try
{
Process(msg);
}
finally
{
concurrencySemaphore.Release();
}
});
tasks.Add(t);
}
Task.WaitAll(tasks.ToArray());
}
Answer to Comments
for those who want to see how semaphore can be disposed without Task.WaitAll
Run below code in console app and this exception will be raised.
System.ObjectDisposedException: 'The semaphore has been disposed.'
static void Main(string[] args)
{
int maxConcurrency = 5;
List<string> messages = Enumerable.Range(1, 15).Select(e => e.ToString()).ToList();
using (SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
{
List<Task> tasks = new List<Task>();
foreach (var msg in messages)
{
concurrencySemaphore.Wait();
var t = Task.Factory.StartNew(() =>
{
try
{
Process(msg);
}
finally
{
concurrencySemaphore.Release();
}
});
tasks.Add(t);
}
// Task.WaitAll(tasks.ToArray());
}
Console.WriteLine("Exited using block");
Console.ReadKey();
}
private static void Process(string msg)
{
Thread.Sleep(2000);
Console.WriteLine(msg);
}
I think it would be better to use Parallel LINQ
Parallel.ForEach(messages ,
new ParallelOptions{MaxDegreeOfParallelism = 4},
x => Process(x);
);
where x is the MaxDegreeOfParallelism
With .NET 5.0 and Core 3.0 channels were introduced.
The main benefit of this producer/consumer concurrency pattern is that you can also limit the input data processing to reduce resource impact.
This is especially helpful when processing millions of data records.
Instead of reading the whole dataset at once into memory, you can now consecutively query only chunks of the data and wait for the workers to process it before querying more.
Code sample with a queue capacity of 50 messages and 5 consumer threads:
/// <exception cref="System.AggregateException">Thrown on Consumer Task exceptions.</exception>
public static async Task ProcessMessages(List<string> messages)
{
const int producerCapacity = 10, consumerTaskLimit = 3;
var channel = Channel.CreateBounded<string>(producerCapacity);
_ = Task.Run(async () =>
{
foreach (var msg in messages)
{
await channel.Writer.WriteAsync(msg);
// blocking when channel is full
// waiting for the consumer tasks to pop messages from the queue
}
channel.Writer.Complete();
// signaling the end of queue so that
// WaitToReadAsync will return false to stop the consumer tasks
});
var tokenSource = new CancellationTokenSource();
CancellationToken ct = tokenSource.Token;
var consumerTasks = Enumerable
.Range(1, consumerTaskLimit)
.Select(_ => Task.Run(async () =>
{
try
{
while (await channel.Reader.WaitToReadAsync(ct))
{
ct.ThrowIfCancellationRequested();
while (channel.Reader.TryRead(out var message))
{
await Task.Delay(500);
Console.WriteLine(message);
}
}
}
catch (OperationCanceledException) { }
catch
{
tokenSource.Cancel();
throw;
}
}))
.ToArray();
Task waitForConsumers = Task.WhenAll(consumerTasks);
try { await waitForConsumers; }
catch
{
foreach (var e in waitForConsumers.Exception.Flatten().InnerExceptions)
Console.WriteLine(e.ToString());
throw waitForConsumers.Exception.Flatten();
}
}
As pointed out by Theodor Zoulias:
On multiple consumer exceptions, the remaining tasks will continue to run and have to take the load of the killed tasks. To avoid this, I implemented a CancellationToken to stop all the remaining tasks and handle the exceptions combined in the AggregateException of waitForConsumers.Exception.
Side note:
The Task Parallel Library (TPL) might be good at automatically limiting the tasks based on your local resources. But when you are processing data remotely via RPC, it's necessary to manually limit your RPC calls to avoid filling the network/processing stack!
If your Process method is async you can't use Task.Factory.StartNew as it doesn't play well with an async delegate. Also there are some other nuances when using it (see this for example).
The proper way to do it in this case is to use Task.Run. Here's #ClearLogic answer modified for an async Process method.
static void Main(string[] args)
{
int maxConcurrency = 5;
List<string> messages = Enumerable.Range(1, 15).Select(e => e.ToString()).ToList();
using (SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
{
List<Task> tasks = new List<Task>();
foreach (var msg in messages)
{
concurrencySemaphore.Wait();
var t = Task.Run(async () =>
{
try
{
await Process(msg);
}
finally
{
concurrencySemaphore.Release();
}
});
tasks.Add(t);
}
Task.WaitAll(tasks.ToArray());
}
Console.WriteLine("Exited using block");
Console.ReadKey();
}
private static async Task Process(string msg)
{
await Task.Delay(2000);
Console.WriteLine(msg);
}
You can create your own TaskScheduler and override QueueTask there.
protected virtual void QueueTask(Task task)
Then you can do anything you like.
One example here:
Limited concurrency level task scheduler (with task priority) handling wrapped tasks
You can simply set the max concurrency degree like this way:
int maxConcurrency=10;
var messages = new List<1000>();
using(SemaphoreSlim concurrencySemaphore = new SemaphoreSlim(maxConcurrency))
{
foreach(var msg in messages)
{
Task.Factory.StartNew(() =>
{
concurrencySemaphore.Wait();
try
{
Process(msg);
}
finally
{
concurrencySemaphore.Release();
}
});
}
}
If you need in-order queuing (processing might finish in any order), there is no need for a semaphore. Old fashioned if statements work fine:
const int maxConcurrency = 5;
List<Task> tasks = new List<Task>();
foreach (var arg in args)
{
var t = Task.Run(() => { Process(arg); } );
tasks.Add(t);
if(tasks.Count >= maxConcurrency)
Task.WaitAny(tasks.ToArray());
}
Task.WaitAll(tasks.ToArray());
I ran into a similar problem where I wanted to produce 5000 results while calling apis, etc. So, I ran some speed tests.
Parallel.ForEach(products.Select(x => x.KeyValue).Distinct().Take(100), id =>
{
new ParallelOptions { MaxDegreeOfParallelism = 100 };
GetProductMetaData(productsMetaData, client, id).GetAwaiter().GetResult();
});
produced 100 results in 30 seconds.
Parallel.ForEach(products.Select(x => x.KeyValue).Distinct().Take(100), id =>
{
new ParallelOptions { MaxDegreeOfParallelism = 100 };
GetProductMetaData(productsMetaData, client, id);
});
Moving the GetAwaiter().GetResult() to the individual async api calls inside GetProductMetaData resulted in 14.09 seconds to produce 100 results.
foreach (var id in ids.Take(100))
{
GetProductMetaData(productsMetaData, client, id);
}
Complete non-async programming with the GetAwaiter().GetResult() in api calls resulted in 13.417 seconds.
var tasks = new List<Task>();
while (y < ids.Count())
{
foreach (var id in ids.Skip(y).Take(100))
{
tasks.Add(GetProductMetaData(productsMetaData, client, id));
}
y += 100;
Task.WhenAll(tasks).GetAwaiter().GetResult();
Console.WriteLine($"Finished {y}, {sw.Elapsed}");
}
Forming a task list and working through 100 at a time resulted in a speed of 7.36 seconds.
using (SemaphoreSlim cons = new SemaphoreSlim(10))
{
var tasks = new List<Task>();
foreach (var id in ids.Take(100))
{
cons.Wait();
var t = Task.Factory.StartNew(() =>
{
try
{
GetProductMetaData(productsMetaData, client, id);
}
finally
{
cons.Release();
}
});
tasks.Add(t);
}
Task.WaitAll(tasks.ToArray());
}
Using SemaphoreSlim resulted in 13.369 seconds, but also took a moment to boot to start using it.
var throttler = new SemaphoreSlim(initialCount: take);
foreach (var id in ids)
{
throttler.WaitAsync().GetAwaiter().GetResult();
tasks.Add(Task.Run(async () =>
{
try
{
skip += 1;
await GetProductMetaData(productsMetaData, client, id);
if (skip % 100 == 0)
{
Console.WriteLine($"started {skip}/{count}, {sw.Elapsed}");
}
}
finally
{
throttler.Release();
}
}));
}
Using Semaphore Slim with a throttler for my async task took 6.12 seconds.
The answer for me in this specific project was use a throttler with Semaphore Slim. Although the while foreach tasklist did sometimes beat the throttler, 4/6 times the throttler won for 1000 records.
I realize I'm not using the OPs code, but I think this is important and adds to this discussion because how is sometimes not the only question that should be asked, and the answer is sometimes "It depends on what you are trying to do."
Now to answer the specific questions:
How to limit the maximum number of parallel tasks in c#: I showed how to limit the number of tasks that are completed at a time.
Can we guess how many maximum messages simultaneously get processed at the time (assuming normal Quad core processor), or can we limit the maximum number of messages to be processed at the time? I cannot guess how many will be processed at a time unless I set an upper limit but I can set an upper limit. Obviously different computers function at different speeds due to CPU, RAM etc. and how many threads and cores the program itself has access to as well as other programs running in tandem on the same computer.
How to ensure this message get processed in the same sequence/order of the Collection? If you want to process everything in a specific order, it is synchronous programming. The point of being able to run things asynchronously is ensuring that they can do everything without an order. As you can see from my code, the time difference is minimal in 100 records unless you use async code. In the event that you need an order to what you are doing, use asynchronous programming up until that point, then await and do things synchronously from there. For example, task1a.start, task2a.start, then later task1a.await, task2a.await... then later task1b.start task1b.await and task2b.start task 2b.await.
public static void RunTasks(List<NamedTask> importTaskList)
{
List<NamedTask> runningTasks = new List<NamedTask>();
try
{
foreach (NamedTask currentTask in importTaskList)
{
currentTask.Start();
runningTasks.Add(currentTask);
if (runningTasks.Where(x => x.Status == TaskStatus.Running).Count() >= MaxCountImportThread)
{
Task.WaitAny(runningTasks.ToArray());
}
}
Task.WaitAll(runningTasks.ToArray());
}
catch (Exception ex)
{
Log.Fatal("ERROR!", ex);
}
}
you can use the BlockingCollection, If the consume collection limit has reached, the produce will stop producing until a consume process will finish. I find this pattern more easy to understand and implement than the SemaphoreSlim.
int TasksLimit = 10;
BlockingCollection<Task> tasks = new BlockingCollection<Task>(new ConcurrentBag<Task>(), TasksLimit);
void ProduceAndConsume()
{
var producer = Task.Factory.StartNew(RunProducer);
var consumer = Task.Factory.StartNew(RunConsumer);
try
{
Task.WaitAll(new[] { producer, consumer });
}
catch (AggregateException ae) { }
}
void RunConsumer()
{
foreach (var task in tasks.GetConsumingEnumerable())
{
task.Start();
}
}
void RunProducer()
{
for (int i = 0; i < 1000; i++)
{
tasks.Add(new Task(() => Thread.Sleep(1000), TaskCreationOptions.AttachedToParent));
}
}
Note that the RunProducer and RunConsumer has spawn two independent tasks.
Let's imagine some abstract code
private void Main()
{
var workTask1 = DoWork1();
var workTask2 = DoWork2();
var workTask3 = DoWork3();
await Task.WhenAll(workTask1, workTask2, workTask3);
AnalyzeWork(workTask1.Result, workTask2.Result, workTask3.Result);
}
private async Task<object> DoWork1()
{
var someOperationTask1 = someOperation1();
var someOperationTask2 = someOperation2();
await Task.WhenAll(someOperationTask1, someOperationTask2);
return new object
{
SomeOperationResult1 = someOperationTask1.Result,
SomeOperationResult2 = someOperationTask2.Result,
};
}
private async Task<object> DoWork2()
{
var someOperationTask3 = someOperation3();
var someOperationTask4 = someOperation4();
await Task.WhenAll(someOperationTask3, someOperationTask4);
return new object
{
SomeOperationResult3 = someOperationTask3.Result,
SomeOperationResult4 = someOperationTask4.Result,
};
}
private async Task<object> DoWork3()
{
var someOperationTask5 = someOperation5();
var someOperationTask6 = someOperation6();
await Task.WhenAll(someOperationTask5, someOperationTask6);
return new object
{
SomeOperationResult5 = someOperationTask5.Result,
SomeOperationResult6 = someOperationTask6.Result,
};
}
Where 3 methods are being run parallelly and each of them consists of 2 parallel's operations. And result of 3 methods is passed to some method.
My question is there are any restrictions? Is it ok to have nested Task.WhenAll and what's difference between nested Task.WhenAll and one level Task.WhenAll operations?
The only restrictions are the available memory of your system. The Task.WhenAll method attaches a continuation to each incomplete task, and this continuation is detached when that task completes. A continuation is a lightweight object similar to a Task. It's quite similar to what you get when you invoke the Task.ContinueWith method. Each continuation weights more or less about 100 bytes. It is unlikely that it will have any noticeable effect to your program, unless you need to Task.WhenAll tens of millions of tasks (or more) at once.
If you want a visual demonstration of what this method looks like inside, below is a rough sketch of its implementation:
// For demonstration purposes only. This code is full of bugs.
static Task WhenAll(params Task[] tasks)
{
var tcs = new TaskCompletionSource();
int completedCount = 0;
foreach (var task in tasks)
{
task.ContinueWith(t =>
{
completedCount++;
if (completedCount == tasks.Length) tcs.SetResult();
});
}
return tcs.Task;
}
Task.WhenAll(params System.Threading.Tasks.Task[] tasks) returns Task, but what is the proper way to asquire task results after calling this method?
After awaiting that task, results can be acquired from the original task by awaiting it once again which should be fine as tasks are completed already. Also it is possible to get result using Task.Result property which is often considered not good practice
Task<TResult1> task1= ...
Task<TResult2> task2= ...
Task<TResult3> task3= ...
await Task.WhenAll(task1, task2, task3)
var a = task1.Result; // returns TResult1
var b = await task1; // also returns TResult1
Which one should I choose here and why?
If you really have only an IEnumerable<Task<TResult>> and the task will be created on-the-fly (e.g. due to a .Select()) you would execute your tasks two times.
So, be sure that you either give an already materialized collection to Task.WhenAll() or get the result from the return value of that method:
var someTasks = Enumerable.Range(1, 10).Select(i => { Task.Delay(i * 100); return i; });
// Bad style, cause someTasks is an IEnumerable created on-the-fly
await Task.WhenAll(someTasks);
foreach(var task in someTasks)
{
var taskResult = await task;
Console.WriteLine(taskResult);
}
// Okay style, cause tasks are materialized before waiting, but easy to misuse wrong variable name.
var myTasks = someTasks.ToList();
await Task.WhenAll(myTasks);
foreach(var task in myTasks)
{
Console.WriteLine(task.Result);
}
// Best style
var results = await Task.WhenAll(someTasks);
foreach(var result in results)
{
Console.WriteLine(result);
}
Update
Just read this in your question:
However, I could not find any overload that would return anything but Task.
This happens, if the collection of tasks you give to the Task.WhenAll() method don't share a common Task<T> type. This could happen, if you e.g. want to run two tasks in parallel, but both return a different value. In that case you have to materialize the tasks and afterwards check the results individually:
public static class Program
{
public static async Task Main(string[] args)
{
var taskOne = ReturnTwo();
var taskTwo = ReturnPi();
await Task.WhenAll(taskOne, taskTwo);
Console.WriteLine(taskOne.Result);
Console.WriteLine(taskTwo.Result);
Console.ReadKey();
}
private static async Task<int> ReturnTwo()
{
await Task.Delay(500);
return 2;
}
private static async Task<double> ReturnPi()
{
await Task.Delay(500);
return Math.PI;
}
}
Here is the overload that returns Task<TResult[]> - MS Docs
Example:
static async Task Test()
{
List<Task<string>> tasks = new List<Task<string>>();
for (int i = 0; i < 5; i++)
{
var currentTask = GetStringAsync();
tasks.Add(currentTask);
}
string[] result = await Task.WhenAll(tasks);
}
static async Task<string> GetStringAsync()
{
await Task.Delay(1000);
return "Result string";
}
I have the following async method, which is a simpler wait and retry in case of failures:
public async Task RetryAsync(Func<Task> _action, int _ms = 1000, int _counter = 3) {
while (true) {
try {
await _action();
return; // success!
}
catch {
if (--_counter == 0)
throw;
await Task.Delay(_ms);
}
}
}
To, in theory, be called like this:
await RetryAsync(async () => {
_newID = myDBFunction();
}, 300);
Since the function passed to the RetryAsync method does not contain any await, then obviously this gets the warning:
This async method lacks 'await' operators and will run synchronously.
Consider using the 'await' operator to await non-blocking API calls,
or 'await Task.Run(...)' to do CPU-bound work on a background thread.
... and changing the calling code to this fixes the problem:
await RetryAsync(async () => {
await Task.Run(() => _newID = myDBFunction(););
}, 300);
Is there any other way to achieve parallelism for this simple case besides using Task.Run()? Any disadvantage you see to the latest code with Task.Run() on it?
First, I recommend using Polly. It's widely used, extensively tested, and has native support for asynchronous as well as synchronous usage.
But if you want to keep using your own, you can add a synchronous equivalent:
public async Task RetryAsync(Func<Task> _action, int _ms = 1000, int _counter = 3);
public void Retry(Action _action, int _ms = 1000, int _counter = 3);
which can be called as such:
Retry(() => {
_newID = myDBFunction();
}, 300);
If you want to always put synchronous code on a thread pool, you can add an overload for that, to:
public async Task RetryAsync(Func<Task> _action, int _ms = 1000, int _counter = 3);
public async Task RetryAsync(Action _action, int _ms = 1000, int _counter = 3) =>
await RetryAsync(() => Task.Run(_action), _ms, _counter);
which can be called as such:
await RetryAsync(() => {
_newID = myDBFunction();
}, 300);
I'm using Asp .Net 4.5.1.
I have tasks to run, which call a web-service, and some might fail. I need to run N successful tasks which perform some light CPU work and mainly call a web service, then stop, and I want to throttle.
For example, let's assume we have 300 URLs in some collection. We need to run a function called Task<bool> CheckUrlAsync(url) on each of them, with throttling, meaning, for example, having only 5 run "at the same time" (in other words, have maximum 5 connections used at any given time). Also, we only want to perform N (assume 100) successful operations, and then stop.
I've read this and this and still I'm not sure what would be the correct way to do it.
How would you do it?
Assume ASP .Net
Assume IO call (http call to web serice), no heavy CPU operations.
Use Semaphore slim.
var semaphore = new SemaphoreSlim(5);
var tasks = urlCollection.Select(async url =>
{
await semaphore.WaitAsync();
try
{
return await CheckUrlAsync(url);
}
finally
{
semaphore.Release();
}
};
while(tasks.Where(t => t.Completed).Count() < 100)
{
await.Task.WhenAny(tasks);
}
Although I would prefer to use Rx.Net to produce some better code.
using(var semaphore = new SemaphoreSlim(5))
{
var results = urlCollection.ToObservable()
.Select(async url =>
{
await semaphore.WaitAsync();
try
{
return await CheckUrlAsync(url);
}
finally
{
semaphore.Release();
}
}).Take(100).ToList();
}
Okay...this is going to be fun.
public static class SemaphoreHelper
{
public static Task<T> ContinueWith<T>(
this SemaphoreSlim semaphore,
Func<Task<T>> action)
var ret = semaphore.WaitAsync()
.ContinueWith(action);
ret.ContinueWith(_ => semaphore.Release(), TaskContinuationOptions.None);
return ret;
}
var semaphore = new SemaphoreSlim(5);
var results = urlCollection.Select(
url => semaphore.ContinueWith(() => CheckUrlAsync(url)).ToList();
I do need to add that the code as it stands will still run all 300 URLs, it just will return quicker...thats all. You would need to add the cancelation token to the semaphore.WaitAsync(token) to cancel the queued work. Again I suggest using Rx.Net for that. Its just easier to use Rx.Net to get the cancelation token to work with .Take(100).
Try something like this?
private const int u_limit = 100;
private const int c_limit = 5;
List<Task> tasks = new List<Task>();
int totalRun = 0;
while (totalRun < u_limit)
{
for (int i = 0; i < c_limit; i++)
{
tasks.Add(Task.Run (() => {
// Your code here.
}));
}
Task.WaitAll(tasks);
totalRun += c_limit;
}