Clarification on running multiple async tasks in parallel with throttling - C#

EDIT: Since the Bulkhead policy needs to be wrapped with a WaitAndRetry policy anyway, I'm leaning towards example 3 as the best solution to keep parallelism, throttling, and Polly policy retrying. It just seems strange, since I thought Parallel.ForEach was for sync operations and that Bulkhead would be better for async.
I'm trying to run multiple async Tasks in parallel with throttling, using Polly's AsyncBulkheadPolicy. My understanding so far is that the policy method ExecuteAsync does not itself dispatch work onto a thread, but leaves that to the default TaskScheduler or whoever called it. Thus, if my tasks are CPU-bound in some way, then I need to use Parallel.ForEach when executing tasks, or Task.Run() together with ExecuteAsync, in order to schedule the tasks onto background threads.
Can someone look at the examples below and clarify how they would behave in terms of parallelism and thread pooling?
https://github.com/App-vNext/Polly/wiki/Bulkhead - Operation: Bulkhead policy does not create its own threads; it assumes we have already done so.
async Task DoSomething(IEnumerable<object> objects);

// Example 1:
// Simple use, but then I don't have access to retry policies from Polly.
Parallel.ForEach(groupedObjects, (set) =>
{
    var task = DoSomething(set);
    task.Wait();
});
// Example 2:
// Uses the default TaskScheduler, which may or may not run the tasks in parallel.
var parallelTasks = new List<Task>();
foreach (var set in groupedObjects)
{
    var task = bulkheadPolicy.ExecuteAsync(async () => await DoSomething(set));
    parallelTasks.Add(task);
}
await Task.WhenAll(parallelTasks);
// Example 3:
// Seems to defeat the purpose of the bulkhead, since Parallel.ForEach and
// the bulkhead policy can both do throttling... just use a basic RetryPolicy
// here?
Parallel.ForEach(groupedObjects, (set) =>
{
    var task = bulkheadPolicy.ExecuteAsync(async () => await DoSomething(set));
    task.Wait();
});
// Example 4:
// Task.Run still uses the default TaskScheduler and isn't any different from
// Example 2; it just makes more tasks... this is my understanding.
var parallelTasks = new List<Task>();
foreach (var set in groupedObjects)
{
    var task = Task.Run(async () => await bulkheadPolicy.ExecuteAsync(async () => await DoSomething(set)));
    parallelTasks.Add(task);
}
await Task.WhenAll(parallelTasks);
DoSomething is an async method doing operations on a set of objects. I'd like this to happen on parallel threads while respecting retry policies from Polly and allowing for throttling.
I seem to have confused myself along the way about what exactly the functional behavior of Parallel.ForEach and Bulkhead.ExecuteAsync is when it comes to how tasks and threads are handled.

You are probably right that using Parallel.ForEach defeats the purpose of the bulkhead. I think that a simple loop with a delay will do the job of feeding the bulkhead with tasks. Although I guess that in a real-life scenario there would be a continuous stream of data, and not a predefined list or array.
using Polly;
using Polly.Bulkhead;

static async Task Main(string[] args)
{
    var groupedObjects = Enumerable.Range(0, 10)
        .Select(n => new object[] { n }); // Create 10 sets to work with

    var bulkheadPolicy = Policy
        .BulkheadAsync(3, 3); // maxParallelization, maxQueuingActions

    var parallelTasks = new List<Task>();
    foreach (var set in groupedObjects)
    {
        Console.WriteLine($"Scheduling, Available: {bulkheadPolicy.BulkheadAvailableCount}, " +
            $"QueueAvailable: {bulkheadPolicy.QueueAvailableCount}");

        // Start the task
        var task = bulkheadPolicy.ExecuteAsync(async () =>
        {
            // Await the task without capturing the context
            await DoSomethingAsync(set).ConfigureAwait(false);
        });
        parallelTasks.Add(task);

        await Task.Delay(50); // Interval between scheduling more tasks
    }

    var whenAllTasks = Task.WhenAll(parallelTasks);
    try
    {
        // Await all the tasks (await throws only one of the exceptions)
        await whenAllTasks;
    }
    catch when (whenAllTasks.IsFaulted) // It might also be canceled
    {
        // Ignore rejections, rethrow other exceptions
        whenAllTasks.Exception.Handle(ex => ex is BulkheadRejectedException);
    }

    Console.WriteLine($"Processed: {parallelTasks.Count(t => t.Status == TaskStatus.RanToCompletion)}");
    Console.WriteLine($"Faulted: {parallelTasks.Count(t => t.IsFaulted)}");
}

static async Task DoSomethingAsync(IEnumerable<object> set)
{
    // Pretend we are doing something with the set
    await Task.Delay(500).ConfigureAwait(false);
}
Output:
Scheduling, Available: 3, QueueAvailable: 3
Scheduling, Available: 2, QueueAvailable: 3
Scheduling, Available: 1, QueueAvailable: 3
Scheduling, Available: 0, QueueAvailable: 3
Scheduling, Available: 0, QueueAvailable: 2
Scheduling, Available: 0, QueueAvailable: 1
Scheduling, Available: 0, QueueAvailable: 0
Scheduling, Available: 0, QueueAvailable: 0
Scheduling, Available: 0, QueueAvailable: 0
Scheduling, Available: 0, QueueAvailable: 1
Processed: 7
Faulted: 3
Update: A slightly more realistic version of DoSomethingAsync that actually forces the CPU to do some real work (CPU utilization near 100% on my quad-core machine).
private static async Task DoSomethingAsync(IEnumerable<object> objects)
{
    await Task.Run(() =>
    {
        long sum = 0;
        for (int i = 0; i < 500000000; i++) sum += i;
    }).ConfigureAwait(false);
}
This method is not running for all the data sets. It's running only for the sets that are not rejected by the bulkhead.
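Since the question's edit mentions wrapping the bulkhead with a WaitAndRetry policy, here is a minimal sketch of what that combination could look like inside the same scheduling loop (Polly v7 API; the retry count and backoff are illustrative assumptions), so that sets rejected by the bulkhead are retried instead of dropped:
// Retry bulkhead rejections with a short backoff (illustrative values).
var bulkheadPolicy = Policy.BulkheadAsync(3, 3);
var retryPolicy = Policy
    .Handle<BulkheadRejectedException>()
    .WaitAndRetryAsync(5, attempt => TimeSpan.FromMilliseconds(200 * attempt));
var resilientPolicy = retryPolicy.WrapAsync(bulkheadPolicy);

// Execute through the combined policy; the retry sits outside the bulkhead,
// so a rejected execution waits and then re-enters the bulkhead queue.
var task = resilientPolicy.ExecuteAsync(() => DoSomethingAsync(set));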

Related

TPL Dataflow: How to start the next async action when the current one hasn't finished yet, preserving the execution order?

Consider the following program, which uses TPL Dataflow; ActionBlock comes from the Dataflow library.
internal static class Program
{
    public static async Task Main(string[] args)
    {
        var actionBlock = new ActionBlock<int>(async i =>
        {
            Console.WriteLine($"Started with {i}");
            await DoSomethingAsync(i);
            Console.WriteLine($"Done with {i}");
        });

        for (int i = 0; i < 5; i++)
        {
            actionBlock.Post(i);
        }

        actionBlock.Complete();
        await actionBlock.Completion;
    }

    private static async Task DoSomethingAsync(int i)
    {
        await Task.Delay(1000);
    }
}
The output of this program is:
Started with 0
Done with 0
Started with 1
Done with 1
Started with 2
Done with 2
Started with 3
Done with 3
Started with 4
Done with 4
The reason is that the ActionBlock only starts processing the next item when the previous asynchronous task has finished.
How can I force it to start processing the next item, even though the previous one hasn't fully finished? MaxDegreeOfParallelism isn't an option, as that can mess up the order.
So I'd like the output to be:
Started with 0
Started with 1
Started with 2
Started with 3
Started with 4
Done with 0
Done with 1
Done with 2
Done with 3
Done with 4
I could get rid of the async/await and replace it with ContinueWith. But that has two disadvantages:
1. The ActionBlock thinks it's done with the message immediately. An optional call to Complete() would result in the pipeline completing right away, instead of after the asynchronous action has completed.
2. I'd like to add a BoundedCapacity to limit the number of messages currently still waiting to be fully finished. But because of 1., this BoundedCapacity has no effect.
In situations like this I would try to remove the requirement that things get processed in order, so that you can process in parallel, and then report sequentially.
// The transform block can process everything in parallel,
// but by default the inputs and outputs remain ordered.
var processStuff = new TransformBlock<int, string>(async i =>
{
    Console.WriteLine($"Started with {i}");
    await DoSomethingAsync(i);
    return $"Done with {i}";
}, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5 });

// This action block is your reporting block that uses the results from
// the transform block, and it will be executed in order.
var useStuff = new ActionBlock<string>(result =>
{
    Console.WriteLine(result);
});

// When linking, make sure to propagate completion.
processStuff.LinkTo(useStuff, new DataflowLinkOptions { PropagateCompletion = true });

for (int i = 0; i < 5; i++)
{
    Console.WriteLine("Posting {0}", i);
    processStuff.Post(i);
}

// Mark the top of your pipeline as complete, and that will propagate
// to the end.
processStuff.Complete();

// Wait on your last block to finish processing everything.
await useStuff.Completion;
Output from this code produced the following as an example. Notice that the "Started with" statements are not necessarily even in the order of the postings.
Posting 0
Posting 1
Posting 2
Posting 3
Posting 4
Started with 1
Started with 0
Started with 2
Started with 4
Started with 3
Done with 0
Done with 1
Done with 2
Done with 3
Done with 4
I did, in the meantime, find a solution/workaround, by using two blocks, and passing the asynchronous Task from the first block to the next block, where it is waited for synchronously using .Wait().
So, like this:
using System.Threading.Tasks.Dataflow;

internal static class Program
{
    public static async Task Main(string[] args)
    {
        var transformBlock = new TransformBlock<int, Task<int>>(async i =>
        {
            Console.WriteLine($"Started with {i}");
            await DoSomethingAsync(i);
            return i;
        });

        var actionBlock = new ActionBlock<Task<int>>(task =>
        {
            task.Wait();
            Console.WriteLine($"Done with {task.Result}");
        });

        transformBlock.LinkTo(actionBlock, new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 0; i < 5; i++)
        {
            transformBlock.Post(i);
        }

        transformBlock.Complete();
        await actionBlock.Completion;
    }

    private static Task DoSomethingAsync(int i)
    {
        return Task.Delay(1000);
    }
}
This way the first block considers itself done with a message almost instantly and is able to handle the next message, in order; DoSomethingAsync is called directly for each message, without waiting for the response of the previous call.

Number of Requests before DDOSing. Limiting # of async Tasks [duplicate]

I am using the HttpClient in System.Net.Http to make requests against an API. The API is limited to 10 requests per second.
My code is roughly like so:
List<Task> tasks = new List<Task>();
tasks.AddRange(items.Select(i => ProcessItem(i)));
try
{
    await Task.WhenAll(tasks);
}
catch (Exception ex)
{
}
The ProcessItem method does a few things but always calls the API using the following:
await SendRequestAsync(..blah), which looks like:
private async Task<Response> SendRequestAsync(HttpRequestMessage request, CancellationToken token)
{
    token.ThrowIfCancellationRequested();

    var response = await HttpClient
        .SendAsync(request: request, cancellationToken: token)
        .ConfigureAwait(continueOnCapturedContext: false);

    token.ThrowIfCancellationRequested();
    return await Response.BuildResponse(response);
}
Originally the code worked fine, but when I started using Task.WhenAll I started getting 'Rate Limit Exceeded' messages from the API. How can I limit the rate at which requests are made?
It's worth noting that ProcessItem can make between 1 and 4 API calls depending on the item.
The API is limited to 10 requests per second.
Then just have your code do a batch of 10 requests, ensuring they take at least one second:
Items[] items = ...;
int index = 0;
while (index < items.Length)
{
    var timer = Task.Delay(TimeSpan.FromSeconds(1.2)); // ".2" to make sure
    var tasks = items.Skip(index).Take(10).Select(i => ProcessItemsAsync(i));
    var tasksAndTimer = tasks.Concat(new[] { timer });
    await Task.WhenAll(tasksAndTimer);
    index += 10;
}
Update
My ProcessItems method makes 1-4 API calls depending on the item.
In this case, batching is not an appropriate solution. You need to limit an asynchronous method to a certain number of concurrent calls, which implies a SemaphoreSlim. The tricky part is that you want to allow more calls over time.
I haven't tried this code, but the general idea I would go with is to have a periodic function that releases the semaphore up to 10 times. So, something like this:
private readonly SemaphoreSlim _semaphore = new SemaphoreSlim(10, 10); // initialCount and maxCount, so Release can throw SemaphoreFullException

private async Task<Response> ThrottledSendRequestAsync(HttpRequestMessage request, CancellationToken token)
{
    await _semaphore.WaitAsync(token);
    return await SendRequestAsync(request, token);
}

private async Task PeriodicallyReleaseAsync(Task stop)
{
    while (true)
    {
        var timer = Task.Delay(TimeSpan.FromSeconds(1.2));
        if (await Task.WhenAny(timer, stop) == stop)
            return;

        // Release the semaphore at most 10 times.
        for (int i = 0; i != 10; ++i)
        {
            try
            {
                _semaphore.Release();
            }
            catch (SemaphoreFullException)
            {
                break;
            }
        }
    }
}
Usage:
// Start the periodic task, with a signal that we can use to stop it.
var stop = new TaskCompletionSource<object>();
var periodicTask = PeriodicallyReleaseAsync(stop.Task);

// Wait for all item processing.
await Task.WhenAll(taskList);

// Stop the periodic task.
stop.SetResult(null);
await periodicTask;
Instead of using a list of tasks and WhenAll, use Parallel.ForEach and use ParallelOptions to limit the number of concurrent tasks to 10, and make sure each one takes at least 1 second:
Parallel.ForEach(
    items,
    new ParallelOptions { MaxDegreeOfParallelism = 10 },
    async item =>
    {
        ProcessItems(item);
        await Task.Delay(1000);
    });
Or if you want to make sure each item takes as close to 1 second as possible:
Parallel.ForEach(
    searches,
    new ParallelOptions { MaxDegreeOfParallelism = 10 },
    async item =>
    {
        var watch = new Stopwatch();
        watch.Start();
        ProcessItems(item);
        watch.Stop();
        if (watch.ElapsedMilliseconds < 1000)
            await Task.Delay((int)(1000 - watch.ElapsedMilliseconds));
    });
Or:
Parallel.ForEach(
    searches,
    new ParallelOptions { MaxDegreeOfParallelism = 10 },
    async item =>
    {
        await Task.WhenAll(
            Task.Delay(1000),
            Task.Run(() => { ProcessItems(item); }));
    });
UPDATED ANSWER
My ProcessItems method makes 1-4 API calls depending on the item. So with a batch size of 10 I still exceed the rate limit.
You need to implement a rolling window in SendRequestAsync. A queue containing the timestamp of each request is a suitable data structure. You dequeue entries with a timestamp older than 10 seconds. As it happens, there is an implementation as an answer to a similar question on SO.
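As a rough illustration of that idea, here is a minimal, untested sketch of such a rolling window (the names WaitForWindowAsync, _requestTimestamps, and _windowLock are invented for this example); SendRequestAsync would await it before each call:
// A rolling-window limiter sketch: allow at most maxRequests per window.
private readonly Queue<DateTime> _requestTimestamps = new Queue<DateTime>();
private readonly object _windowLock = new object();

private async Task WaitForWindowAsync(int maxRequests, TimeSpan window)
{
    while (true)
    {
        TimeSpan delay;
        lock (_windowLock)
        {
            // Drop timestamps that have aged out of the window.
            while (_requestTimestamps.Count > 0 &&
                   DateTime.UtcNow - _requestTimestamps.Peek() > window)
            {
                _requestTimestamps.Dequeue();
            }

            if (_requestTimestamps.Count < maxRequests)
            {
                _requestTimestamps.Enqueue(DateTime.UtcNow);
                return;
            }

            // Wait until the oldest entry leaves the window, then re-check.
            delay = window - (DateTime.UtcNow - _requestTimestamps.Peek());
        }

        if (delay < TimeSpan.Zero) delay = TimeSpan.Zero;
        await Task.Delay(delay);
    }
}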
ORIGINAL ANSWER
May still be useful to others
One straightforward way to handle this is to batch your requests in groups of 10, run those concurrently, and then wait until a total of 10 seconds has elapsed (if it hasn't already). This will bring you in right at the rate limit if the batch of requests can complete in 10 seconds, but is less than optimal if the batch of requests takes longer. Have a look at the .Batch() extension method in MoreLinq. Code would look approximately like
foreach (var taskList in tasks.Batch(10))
{
    Stopwatch sw = Stopwatch.StartNew(); // From System.Diagnostics
    await Task.WhenAll(taskList.ToArray());
    if (sw.Elapsed.TotalSeconds < 10.0)
    {
        // Calculate how long you still have to wait and sleep that long.
        // You might want to wait 10.5 or 11 seconds just in case the rate
        // limiting on the other side isn't perfectly implemented.
        await Task.Delay(TimeSpan.FromSeconds(10.5) - sw.Elapsed);
    }
}
https://github.com/thomhurst/EnumerableAsyncProcessor
I've written a library to help with this sort of logic.
Usage would be:
var responses = await AsyncProcessorBuilder.WithItems(items) // Or extension method: items.ToAsyncProcessorBuilder()
    .SelectAsync(item => ProcessItem(item), CancellationToken.None)
    .ProcessInParallel(levelOfParallelism: 10, TimeSpan.FromSeconds(1));

Parallel processing using concurrent collection

I currently have a function that performs a set of 10 tasks in parallel. After the 10 tasks complete, I move on to the next 10 until my queue is empty. I'm looking to increase the efficiency of this algorithm: right now, if 9 of my tasks complete in 1 minute and the 10th takes another 10 minutes, I have to wait for all 10 tasks to complete even though I have 9 free slots that 9 other tasks could start using.
Is there a way that when a task completes, I immediately send another task for processing within that same level (foreach loop)? I saw that a ConcurrentDictionary can be used. Can you please guide me and provide some sample code?
public async Task Test()
{
    List<Task> listoftasks = new List<Task>();
    foreach (var level in levels)
    {
        Queue<Model1> queue = new Queue<Model1>(Store);
        while (queue.Count > 0)
        {
            for (int i = 0; i < 10; i++)
            {
                if (!queue.TryDequeue(out Model1 item))
                {
                    break;
                }
                listoftasks.Add(Task.Run(() => DoSomething(item)));
            }
            await Task.WhenAll(listoftasks);
            listoftasks.Clear();
        }
    }
}
You can just use LimitedConcurrencyLevelTaskScheduler to achieve the desired behaviour (https://learn.microsoft.com/en-us/dotnet/api/system.threading.tasks.taskscheduler?view=net-5.0). In this case you can push all the tasks at once, and they will be executed with the desired level of concurrency (no more than 10 tasks in parallel in your case).
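For illustration, a minimal sketch of how that might look, assuming the LimitedConcurrencyLevelTaskScheduler class has been copied from the sample in the linked docs (it is a sample class, not part of the BCL):
// The scheduler caps how many of the queued tasks run at the same time.
var scheduler = new LimitedConcurrencyLevelTaskScheduler(10); // max degree of parallelism
var factory = new TaskFactory(scheduler);

// Push every item at once; the scheduler throttles execution to 10 at a time.
var tasks = Store.Select(item => factory.StartNew(() => DoSomething(item))).ToList();
await Task.WhenAll(tasks);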
You can get each Task to dequeue an item. Use a ConcurrentQueue to ensure thread-safety.
It's kind of a poor-man's scheduler, but it's very lightweight.
ConcurrentQueue<Model1> queue;

void Dequeue()
{
    while (queue.TryDequeue(out var item))
        DoSomething(item);
}

public async Task Test()
{
    queue = new ConcurrentQueue<Model1>(Store);
    var listoftasks = new List<Task>();
    for (var i = 0; i < Environment.ProcessorCount; i++)
        listoftasks.Add(Task.Run(() => Dequeue()));
    await Task.WhenAll(listoftasks);
}
Note: this does not handle exceptions, so all exceptions must be handled or swallowed
Personally I'd use an ActionBlock (out of the TPL Dataflow library). It:
- has built-in MaxDegreeOfParallelism
- can easily deal with async IO-bound workloads, or non-async CPU-bound workloads
- has cancellation support (if needed)
- can be built into larger pipelines
- can run as a perpetual consumer in a multi-producer environment
Given
private ActionBlock<Model> _processor;
Setup
_processor = new ActionBlock<Model>(
    DoSomething,
    new ExecutionDataflowBlockOptions()
    {
        CancellationToken = SomeCancelationTokenIfNeeded,
        MaxDegreeOfParallelism = 10,
        SingleProducerConstrained = true
    });
Some Method
public static void DoSomething(Model item)
{ ... }
Usage
await _processor.SendAsync(someItem);

Dispatcher, Async/Await, Concurrent work

I have a bunch of async methods, which I invoke from the Dispatcher. The methods do not perform any work in the background; they just wait for some I/O operations, or wait for a response from the web server.
async Task FetchAsync()
{
    // Prepare request in UI thread
    var response = await new WebClient().DownloadDataTaskAsync(url); // url: the address to fetch
    // Process response in UI thread
}
Now, I want to perform load tests by calling multiple FetchAsync() in parallel with some max degree of parallelism.
My first attempt was using Parallel.ForEach(), but it does not work well with async/await.
var option = new ParallelOptions { MaxDegreeOfParallelism = 10 };
Parallel.ForEach(UnitsOfWork, option, uow => uow.FetchAsync().Wait());
I've been looking at reactive extensions, but I'm still not able to take advantage of Dispatcher and async/await.
My goal is to not create a separate thread for each FetchAsync(). Can you give me some hints on how to do it?
Just call FetchAsync without awaiting each call, and then use Task.WhenAll to await all of them together.
var tasks = new List<Task>();
var max = 10;
for (int i = 0; i < max; i++)
{
    tasks.Add(FetchAsync());
}
await Task.WhenAll(tasks);
Here is a generic, reusable solution to your question that you can use not only with your FetchAsync method but with any async method that has the same signature. The API includes real-time concurrency throttling support as well.
The parameters are self-explanatory: totalRequestCount is how many async requests (FetchAsync calls) you want to make in total, asyncProcessor is the FetchAsync method itself, and maxDegreeOfParallelism is an optional nullable parameter. Set it if you want real-time throttling with a maximum number of concurrent async requests; otherwise leave it null.
public static Task ForEachAsync(
    int totalRequestCount,
    Func<Task> asyncProcessor,
    int? maxDegreeOfParallelism = null)
{
    IEnumerable<Task> tasks;
    if (maxDegreeOfParallelism != null)
    {
        SemaphoreSlim throttler = new SemaphoreSlim(maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value);
        tasks = Enumerable.Range(0, totalRequestCount).Select(async requestNumber =>
        {
            await throttler.WaitAsync();
            try
            {
                await asyncProcessor().ConfigureAwait(false);
            }
            finally
            {
                throttler.Release();
            }
        });
    }
    else
    {
        tasks = Enumerable.Range(0, totalRequestCount).Select(requestNumber => asyncProcessor());
    }

    return Task.WhenAll(tasks);
}
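A hypothetical usage, making 100 requests in total with at most 10 in flight at any moment:
// Fire 100 FetchAsync calls, throttled to 10 concurrent requests.
await ForEachAsync(
    totalRequestCount: 100,
    asyncProcessor: FetchAsync,
    maxDegreeOfParallelism: 10);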

How to correctly queue up tasks to run in C#

I have an enumeration of items (RunData.Demand), each representing some work involving calling an API over HTTP. It works great if I just foreach through it all and call the API during each iteration. However, each iteration takes a second or two so I'd like to run 2-3 threads and divide up the work between them. Here's what I'm doing:
ThreadPool.SetMaxThreads(2, 5); // Trying to limit the amount of threads

var tasks = RunData.Demand
    .Select(service => Task.Run(async delegate
    {
        var availabilityResponse = await client.QueryAvailability(service);
        // Do some other stuff, not really important
    }));

await Task.WhenAll(tasks);
The client.QueryAvailability call basically calls an API using the HttpClient class:
public async Task<QueryAvailabilityResponse> QueryAvailability(QueryAvailabilityMultidayRequest request)
{
    var response = await client.PostAsJsonAsync("api/queryavailabilitymultiday", request);
    if (response.IsSuccessStatusCode)
    {
        return await response.Content.ReadAsAsync<QueryAvailabilityResponse>();
    }
    throw new HttpException((int)response.StatusCode, response.ReasonPhrase);
}
This works great for a while, but eventually things start timing out. If I set the HttpClient Timeout to an hour, then I start getting weird internal server errors.
What I started doing was setting a Stopwatch within the QueryAvailability method to see what was going on.
What's happening is all 1200 items in RunData.Demand are being created at once and all 1200 await client.PostAsJsonAsync methods are being called. It appears it then uses the 2 threads to slowly check back on the tasks, so towards the end I have tasks that have been waiting for 9 or 10 minutes.
Here's the behavior I would like:
I'd like to create the 1,200 tasks, then run them 3-4 at a time as threads become available. I do not want to queue up 1,200 HTTP calls immediately.
Is there a good way to go about doing this?
As I always recommend: what you need is TPL Dataflow (to install: Install-Package System.Threading.Tasks.Dataflow).
You create an ActionBlock with an action to perform on each item. Set MaxDegreeOfParallelism for throttling. Start posting into it and await its completion:
var block = new ActionBlock<QueryAvailabilityMultidayRequest>(
    async service =>
    {
        var availabilityResponse = await client.QueryAvailability(service);
        // ...
    },
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

foreach (var service in RunData.Demand)
{
    block.Post(service);
}

block.Complete();
await block.Completion;
Old question, but I would like to propose an alternative lightweight solution using the SemaphoreSlim class. Just reference System.Threading.
SemaphoreSlim sem = new SemaphoreSlim(4, 4);
foreach (var service in RunData.Demand)
{
    await sem.WaitAsync();
    Task t = Task.Run(async () =>
    {
        var availabilityResponse = await client.QueryAvailability(service);
        // do your other stuff here with the result of QueryAvailability
    });
    t.ContinueWith(_ => sem.Release());
}
The semaphore acts as a locking mechanism. You can only enter the semaphore by calling Wait (WaitAsync), which subtracts one from the count. Calling Release adds one to the count.
You're using async HTTP calls, so limiting the number of threads will not help (nor will ParallelOptions.MaxDegreeOfParallelism in Parallel.ForEach as one of the answers suggests). Even a single thread can initiate all requests and process the results as they arrive.
One way to solve it is to use TPL Dataflow.
Another nice solution is to divide the source IEnumerable into partitions and process items in each partition sequentially as described in this blog post:
public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
    return Task.WhenAll(
        from partition in Partitioner.Create(source).GetPartitions(dop)
        select Task.Run(async delegate
        {
            using (partition)
                while (partition.MoveNext())
                    await body(partition.Current);
        }));
}
While the Dataflow library is great, I think it's a bit heavy when not using block composition. I would tend to use something like the extension method below.
Also, unlike the Partitioner method, this runs the async methods on the calling context - the caveat being that if your code is not truly async, or takes a 'fast path', then it will effectively run synchronously since no threads are explicitly created.
public static async Task RunParallelAsync<T>(this IEnumerable<T> items, Func<T, Task> asyncAction, int maxParallel)
{
    var tasks = new List<Task>();
    foreach (var item in items)
    {
        tasks.Add(asyncAction(item));

        if (tasks.Count < maxParallel)
            continue;

        var notCompleted = tasks.Where(t => !t.IsCompleted).ToList();
        if (notCompleted.Count >= maxParallel)
            await Task.WhenAny(notCompleted);
    }
    await Task.WhenAll(tasks);
}
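A hypothetical usage against the earlier question's data, assuming client.QueryAvailability as the async worker:
// Process all demand items with at most 4 in flight at any time.
await RunData.Demand.RunParallelAsync(service => client.QueryAvailability(service), maxParallel: 4);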
