I want to run a function asynchronously in parallel for all the objects in a collection.
Here's the current code snippet.
Parallel.ForEach(Objs,
new ParallelOptions { MaxDegreeOfParallelism = 10 },
item =>
{
DoSomething(someParameter, item);
});
How can I achieve this with async parallel threads?
It's not clear what you mean by "asynchronously in parallel," since asynchrony (using fewer threads) is kind of the opposite of parallel (using more threads).
But if you mean that you want to run the asynchronous method DoSomething concurrently, then that can be done via Task.WhenAll:
var tasks = Objs.Select(item => DoSomething(someParameter, item));
await Task.WhenAll(tasks);
If you want to limit the degree of concurrency (e.g., to 10), then you can use a SemaphoreSlim:
var semaphore = new SemaphoreSlim(10);
var tasks = Objs.Select(async item =>
{
await semaphore.WaitAsync();
try { await DoSomething(someParameter, item); }
finally { semaphore.Release(); }
});
await Task.WhenAll(tasks);
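On .NET 6 or later, the same throttled pattern is also available through the built-in Parallel.ForEachAsync. The following is a minimal sketch under stated assumptions: DoSomethingAsync is a hypothetical stand-in for the question's DoSomething, objs is a plain range of integers, and the running/peak counters exist only to show that no more than 10 operations overlap.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var objs = Enumerable.Range(1, 50).ToList(); // stand-in for Objs
int running = 0, peak = 0, processed = 0;

// Hypothetical asynchronous operation; replace with the real DoSomething.
async Task DoSomethingAsync(string someParameter, int item)
{
    int now = Interlocked.Increment(ref running);
    // Record the highest number of simultaneously running operations.
    int seen;
    while (now > (seen = Volatile.Read(ref peak)))
        Interlocked.CompareExchange(ref peak, now, seen);
    await Task.Delay(20); // simulated I/O
    Interlocked.Decrement(ref running);
    Interlocked.Increment(ref processed);
}

// At most 10 invocations of the body are in flight at any time.
await Parallel.ForEachAsync(
    objs,
    new ParallelOptions { MaxDegreeOfParallelism = 10 },
    async (item, cancellationToken) => await DoSomethingAsync("demo", item));

Console.WriteLine($"processed={processed}, peak concurrency={peak}");
```

Unlike the SemaphoreSlim version, this does not create one task per item up front, which may matter for very large collections.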
Related
I need to run many tasks in parallel as fast as possible. But if my program runs more than 30 tasks per 1 second, it will be blocked. How to ensure that tasks run no more than 30 per any 1-second interval?
In other words, we must prevent the new task from starting if 30 tasks were completed in the last 1-second interval.
My ugly possible solution:
private async Task Process(List<Task> taskList, int maxIntervalCount, int timeIntervalSeconds)
{
var timeList = new List<DateTime>();
var sem = new Semaphore(maxIntervalCount, maxIntervalCount);
var tasksToRun = taskList.Select(async task =>
{
do
{
sem.WaitOne();
}
while (HasAllowance(timeList, maxIntervalCount, timeIntervalSeconds));
await task;
timeList.Add(DateTime.Now);
sem.Release();
});
await Task.WhenAll(tasksToRun);
}
private bool HasAllowance(List<DateTime> timeList, int maxIntervalCount, int timeIntervalSeconds)
{
return timeList.Count <= maxIntervalCount
|| DateTime.Now.Subtract(TimeSpan.FromSeconds(timeIntervalSeconds)) > timeList[timeList.Count - maxIntervalCount];
}
User code should never have to control how tasks are scheduled directly. For one thing, it can't: controlling how tasks run is the job of the TaskScheduler. When user code calls .Start(), it simply adds a task to a thread pool queue for execution. await doesn't execute anything; it only waits for tasks that are already running.
The TaskScheduler samples show how to create limited concurrency schedulers, but again, there are better, high-level options.
The question's code doesn't throttle the queued tasks anyway, it limits how many of them can be awaited. They are all running already. This is similar to batching the previous asynchronous operation in a pipeline, allowing only a limited number of messages to pass to the next level.
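The point that the tasks "are all running already" can be checked with a small sketch. WorkAsync is a hypothetical method used only for illustration; the flag shows that the operation starts when the method is called, not when its task is awaited.

```csharp
using System;
using System.Threading.Tasks;

bool started = false;

// Hypothetical operation used only for illustration.
async Task<int> WorkAsync()
{
    started = true;        // runs synchronously, as soon as WorkAsync is called
    await Task.Delay(50);  // simulated I/O
    return 42;
}

Task<int> task = WorkAsync();      // the operation is already in flight here
bool startedBeforeAwait = started; // no await was needed to start it
int result = await task;           // await only waits for the result

Console.WriteLine($"started before await: {startedBeforeAwait}, result: {result}");
```

This is why limiting how many tasks are awaited at once does not limit how many operations are running.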
ActionBlock with delay
The easy, out-of-the-box way would be to use an ActionBlock with a limited MaxDegreeOfParallelism, to ensure no more than N concurrent operations can run at the same time. If we know how long each operation takes, we could add a bit of delay to ensure we don't overshoot the throttle limit.
In this case, 7 concurrent workers each perform at most 4 requests per second (because of the 250 ms delay), for a maximum of 28 requests per second. The BoundedCapacity means that at most 7 items are stored in the input buffer; once it is full, awaiting downloader.SendAsync does not complete until a slot frees up. This way we avoid flooding the ActionBlock if the operations take too long.
var downloader = new ActionBlock<string>(
async url => {
await Task.Delay(250);
var response=await httpClient.GetStringAsync(url);
//Do something with it.
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 7, BoundedCapacity=7 }
);
//Start posting to the downloader
foreach(var item in urls)
{
await downloader.SendAsync(item);
}
downloader.Complete();
await downloader.Completion;
ActionBlock with SemaphoreSlim
Another option would be to combine this with a SemaphoreSlim that gets reset periodically by a timer.
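var semaphore = new SemaphoreSlim(30, 30);
// Once per period, top the semaphore back up to 30 permits.
var refreshTimer = new Timer(_ =>
{
var missing = 30 - semaphore.CurrentCount;
if (missing > 0) semaphore.Release(missing);
});
var downloader = new ActionBlock<string>(
async url => {
// Take one of the permits for the current period. It is not released here;
// only the timer hands permits back.
await semaphore.WaitAsync();
var response=await httpClient.GetStringAsync(url);
//Do something with it.
},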
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5, BoundedCapacity=5 }
);
//Start the timer right before we start posting
refreshTimer.Change(1000,1000);
foreach(....)
{
}
This is the snippet:
var tasks = new List<Task>();
foreach (var item in listNeedInsert)
{
var task = TaskToRun(item);
tasks.Add(task);
if(tasks.Count == 100)
{
await Task.WhenAll(tasks);
tasks.Clear();
}
}
// Wait for anything left to finish
await Task.WhenAll(tasks);
Notice that I add each task to a List<Task> and, once a batch has been added, await all of them through that same list.
What you do here:
var tasks = taskList.Select(async task =>
{
do
{
sem.WaitOne();
}
while (timeList.Count <= maxIntervalCount
|| DateTime.Now.Subtract(TimeSpan.FromSeconds(timeIntervalSeconds)) > timeList[timeList.Count - maxIntervalCount]);
await task;
is blocking until the task finishes its work, thus making this call:
Task.WhenAll(tasks).Wait();
completely redundant. Furthermore, this line Task.WhenAll(tasks).Wait(); is performing unnecessary blocking on the WhenAll method.
Is the blocking due to some server/firewall/hardware limit, or is it based on observation?
You should try to use BlockingCollection<Task> or a similar thread-safe collection, especially if the work your tasks do is I/O-bound. You can even set the capacity to 30:
var collection = new BlockingCollection<Task>(30);
Then you can start two methods, one that populates the collection and one that drains it:
var population = Task.Run(() => Populate());
var processing = Task.Run(() => Dequeue());
await Task.WhenAll(population, processing);
void Populate()
{
foreach (...)
collection.Add(...); // blocks while the collection already holds 30 items
collection.CompleteAdding();
}
async Task Dequeue()
{
// GetConsumingEnumerable waits for items and ends once adding is complete and the collection is empty
foreach (var task in collection.GetConsumingEnumerable())
await task;
}
If the limit persists due to some true limitation (which should be very rare), change Populate() as follows:
var stopper = Stopwatch.StartNew();
for (var i = ....) //instead of foreach
{
if (i % 30 == 0)
{
var elapsed = stopper.ElapsedMilliseconds; // read once, so the check and the delay use the same value
if (elapsed < 1000)
await Task.Delay(TimeSpan.FromMilliseconds(1000 - elapsed)); // requires Populate to be declared async Task
stopper.Restart();
}
collection.Add(...);
}
collection.CompleteAdding();
I think that this problem can be solved by a SemaphoreSlim limited to the number of maximum tasks per interval, and also by a Task.Delay that delays the release of the SemaphoreSlim after each task's completion, for an interval equal to the required throttling interval. Below is an implementation based on this idea. The rate limiting can be applied in two ways:
With includeAsynchronousDuration: false the rate limit affects how many operations can be started during the specified time span. The duration of each operation is not taken into account.
With includeAsynchronousDuration: true the rate limit affects how many operations can be counted as "active" during the specified time span, and is more restrictive (makes the enumeration slower). Instead of counting each operation as a moment in time (when started), it is counted as a time span (between start and completion). An operation is counted as "active" for a specified time span, if and only if its own time span intersects with the specified time span.
/// <summary>
/// Applies an asynchronous transformation for each element of a sequence,
/// limiting the number of transformations that can start or be active during
/// the specified time span.
/// </summary>
public static async Task<TResult[]> ForEachAsync<TSource, TResult>(
this IEnumerable<TSource> source,
Func<TSource, Task<TResult>> action,
int maxActionsPerTimeUnit,
TimeSpan timeUnit,
bool includeAsynchronousDuration = false,
bool onErrorContinue = false, /* Affects only asynchronous errors */
bool executeOnCapturedContext = false)
{
if (source == null) throw new ArgumentNullException(nameof(source));
if (action == null) throw new ArgumentNullException(nameof(action));
if (maxActionsPerTimeUnit < 1)
throw new ArgumentOutOfRangeException(nameof(maxActionsPerTimeUnit));
if (timeUnit < TimeSpan.Zero || timeUnit.TotalMilliseconds > Int32.MaxValue)
throw new ArgumentOutOfRangeException(nameof(timeUnit));
using var semaphore = new SemaphoreSlim(maxActionsPerTimeUnit,
maxActionsPerTimeUnit);
using var cts = new CancellationTokenSource();
var tasks = new List<Task<TResult>>();
var releaseTasks = new List<Task>();
try // Watch for exceptions thrown by the source enumerator
{
foreach (var item in source)
{
try
{
await semaphore.WaitAsync(cts.Token)
.ConfigureAwait(executeOnCapturedContext);
}
catch (OperationCanceledException) { break; }
// Exceptions thrown synchronously by invoking the action are breaking
// the loop unconditionally (the onErrorContinue has no effect on them).
var task = action(item);
if (!onErrorContinue) task = ObserveFailureAsync(task);
tasks.Add(task);
releaseTasks.Add(ScheduleSemaphoreReleaseAsync(task));
}
}
catch (Exception ex) { tasks.Add(Task.FromException<TResult>(ex)); }
cts.Cancel(); // Cancel all release tasks
Task<TResult[]> whenAll = Task.WhenAll(tasks);
try { return await whenAll.ConfigureAwait(false); }
catch (OperationCanceledException) when (whenAll.IsCanceled) { throw; }
catch { whenAll.Wait(); throw; } // Propagate AggregateException
finally { await Task.WhenAll(releaseTasks); }
async Task<TResult> ObserveFailureAsync(Task<TResult> task)
{
try { return await task.ConfigureAwait(false); }
catch { cts.Cancel(); throw; }
}
async Task ScheduleSemaphoreReleaseAsync(Task<TResult> task)
{
if (includeAsynchronousDuration)
try { await task.ConfigureAwait(false); } catch { } // Ignore exceptions
// Release only if the Task.Delay completed successfully
try { await Task.Delay(timeUnit, cts.Token).ConfigureAwait(false); }
catch (OperationCanceledException) { return; }
semaphore.Release();
}
}
Usage example:
int[] results = await ForEachAsync(Enumerable.Range(1, 100), async n =>
{
await Task.Delay(500); // Simulate some asynchronous I/O-bound operation
return n;
}, maxActionsPerTimeUnit: 30, timeUnit: TimeSpan.FromSeconds(1.0),
includeAsynchronousDuration: true);
The reasons for propagating an AggregateException using the catch+Wait technique, are explained here.
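The core idea of the implementation above (acquire a semaphore slot before each start, and give it back only after the throttling interval has elapsed) can be isolated in a much smaller sketch. This is a simplified illustration, not the ForEachAsync method itself: it has no error handling or cancellation, corresponds only to the includeAsynchronousDuration: false mode, and the limit of 3 starts per 200 ms is an arbitrary assumption chosen to keep the run short.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

const int maxPerInterval = 3;
TimeSpan interval = TimeSpan.FromMilliseconds(200);
using var semaphore = new SemaphoreSlim(maxPerInterval, maxPerInterval);
var startTimes = new List<TimeSpan>();
var operations = new List<Task>();
var releases = new List<Task>();
var clock = Stopwatch.StartNew();

for (int i = 0; i < 7; i++)
{
    await semaphore.WaitAsync();       // waits while 3 starts fall inside the last interval
    startTimes.Add(clock.Elapsed);
    operations.Add(Task.Delay(10));    // stand-in for the real asynchronous operation
    releases.Add(ReleaseLaterAsync()); // give the slot back one interval after the start
}
await Task.WhenAll(operations);
await Task.WhenAll(releases);          // keep the semaphore alive until all releases ran

// Returns a slot to the semaphore only after the throttling interval has passed.
async Task ReleaseLaterAsync()
{
    await Task.Delay(interval);
    semaphore.Release();
}

Console.WriteLine($"7 operations started over {startTimes[^1].TotalMilliseconds:F0} ms");
```

With these numbers the starts should come in three waves (three, three, then one), roughly one interval apart.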
I have a lot of operations that I want to run asynchronously. I do:
var tasks = new List<Task<bool>>();
for(var i=0;i<1000;i++){
tasks.Add(CreateGeoreferencedImageAsync(properties, scaleIndex, currentXmin, currentYmin, currentXmax, currentYmax));
}
while (tasks.Count > 0)
{
var bunch = tasks.Take(4).ToList();
bool[] firstFinishedTask = await Task.WhenAll(bunch);
tasks.RemoveRange(0,4);
}
But I see that WhenAll executes all the tasks in tasks, not only those in bunch.
What did I miss?
UPDATE
private Task<bool> CreateGeoreferencedImageAsync(ImageGenerationProperties
properties, int scaleIndex,
double currentXmin, double currentYmin, double currentXmax, double currentYmax)
{
return Task.Run(() =>
{
return CreateGeoreferencedImage(properties, scaleIndex, currentXmin, currentYmin, currentXmax,
currentYmax);
});
}
WhenAll doesn't execute the tasks. It waits for them to finish. The execution usually starts as soon as the task is created, in your case inside CreateGeoreferencedImageAsync.
Forget about Tasks and especially about Task.Run().
This problem is best solved with Parallel.ForEach(). Or Linq's AsParallel().
Parallel.For(0, 1000,
new ParallelOptions { MaxDegreeOfParallelism = 4 },
i => CreateGeoreferencedImage(properties, scaleIndex,
currentXmin, currentYmin, currentXmax, currentYmax));
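If the work has to stay task-based, the observation that a task starts when it is created suggests storing delegates instead of tasks and invoking them batch by batch. The sketch below assumes .NET 6 or later (for Enumerable.Chunk) and uses a hypothetical CreateImageAsync in place of CreateGeoreferencedImageAsync; the counters only demonstrate that no more than four operations overlap.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int running = 0, peak = 0;

// Hypothetical stand-in for CreateGeoreferencedImageAsync.
async Task<bool> CreateImageAsync(int index)
{
    int now = Interlocked.Increment(ref running);
    // Record the highest number of simultaneously running operations.
    int seen;
    while (now > (seen = Volatile.Read(ref peak)))
        Interlocked.CompareExchange(ref peak, now, seen);
    await Task.Delay(20); // simulated work
    Interlocked.Decrement(ref running);
    return true;
}

// Store delegates, not tasks: nothing starts until a delegate is invoked.
List<Func<Task<bool>>> factories = Enumerable.Range(0, 20)
    .Select(i => (Func<Task<bool>>)(() => CreateImageAsync(i)))
    .ToList();

var results = new List<bool>();
foreach (Func<Task<bool>>[] bunch in factories.Chunk(4))
{
    // Invoking the delegates here is what starts this batch of four.
    bool[] finished = await Task.WhenAll(bunch.Select(start => start()));
    results.AddRange(finished);
}

Console.WriteLine($"completed={results.Count}, peak concurrency={peak}");
```

Batching like this waits for the slowest item of each batch before starting the next one, so a semaphore-based throttle usually keeps the pipeline fuller.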
I have a bunch of async methods, which I invoke from the Dispatcher. The methods do not perform any work in the background; they just wait for some I/O operations or for a response from the web server.
async Task FetchAsync()
{
// Prepare request in UI thread
var response = await new WebClient().DownloadDataTaskAsync();
// Process response in UI thread
}
Now I want to perform load tests by calling multiple FetchAsync() in parallel with some maximum degree of parallelism.
My first attempt was using Parallel.ForEach(), but it does not work well with async/await.
var option = new ParallelOptions {MaxDegreeOfParallelism = 10};
Parallel.ForEach(UnitsOfWork, uow => uow.FetchAsync().Wait());
I've been looking at reactive extensions, but I'm still not able to take advantage of Dispatcher and async/await.
My goal is to not create a separate thread for each FetchAsync(). Can you give me some hints on how to do it?
Just call FetchAsync without awaiting each call and then use Task.WhenAll to await all of them together.
var tasks = new List<Task>();
var max = 10;
for(int i = 0; i < max; i++)
{
tasks.Add(FetchAsync());
}
await Task.WhenAll(tasks);
Here is a generic solution that you can reuse not only with your FetchAsync method but with any async method that has the same signature. The API also supports throttling the number of concurrent requests:
The parameters are self-explanatory:
totalRequestCount is how many async requests (FetchAsync calls) you want to make in total, asyncProcessor is the FetchAsync method itself, and maxDegreeOfParallelism is an optional nullable parameter. Set it if you want to cap the number of concurrent async requests; otherwise leave it null.
public static Task ForEachAsync(
int totalRequestCount,
Func<Task> asyncProcessor,
int? maxDegreeOfParallelism = null)
{
IEnumerable<Task> tasks;
if (maxDegreeOfParallelism != null)
{
SemaphoreSlim throttler = new SemaphoreSlim(maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value);
tasks = Enumerable.Range(0, totalRequestCount).Select(async requestNumber =>
{
await throttler.WaitAsync();
try
{
await asyncProcessor().ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
}
else
{
tasks = Enumerable.Range(0, totalRequestCount).Select(requestNumber => asyncProcessor());
}
return Task.WhenAll(tasks);
}
My code currently has the following 10 worker threads. Each worker thread keeps polling a job from the queue and then processes the long-running job.
for (int k=0; k<10; k++)
{
Task.Factory.StartNew(() => DoPollingThenWork(), TaskCreationOptions.LongRunning);
}
void DoPollingThenWork()
{
while (true)
{
var msg = Poll();
if (msg != null)
{
Thread.Sleep(3000); // process the I/O bound job
}
}
}
I am refactoring the underlying code to use the async/await pattern. I think I can rewrite the above code as follows. It uses one main thread that keeps creating async tasks, and uses SemaphoreSlim to throttle the number of concurrent tasks to 10.
Task.Factory.StartNew(() => WorkerMainAsync(), TaskCreationOptions.LongRunning);
async Task WorkerMainAsync()
{
SemaphoreSlim ss = new SemaphoreSlim(10);
while (true)
{
await ss.WaitAsync();
Task.Run(async () =>
{
await DoPollingThenWorkAsync();
ss.Release();
});
}
}
async Task DoPollingThenWorkAsync()
{
var msg = Poll();
if (msg != null)
{
await Task.Delay(3000); // process the I/O-bound job
}
}
Both should behave the same. But I think the second option seems better because it doesn't block the thread. The downside is that I can't do Wait (to gracefully stop the task), since the task is fire-and-forget. Is the second option the right way to replace the traditional worker-thread pattern?
When you have code that's asynchronous, you usually have no reason to use Task.Run() (or, even worse, Task.Factory.StartNew()). This means that you can change your code to something like this:
await WorkerMainAsync();
async Task WorkerMainAsync()
{
SemaphoreSlim ss = new SemaphoreSlim(10);
while (true)
{
await ss.WaitAsync();
// you should probably store this task somewhere and then await it
var task = DoPollingThenWorkAsync(ss);
}
}
async Task DoPollingThenWorkAsync(SemaphoreSlim semaphore)
{
var msg = Poll();
if (msg != null)
{
await Task.Delay(3000); // process the I/O-bound job
}
// this assumes you don't have to worry about exceptions
// otherwise consider try-finally
semaphore.Release();
}
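The two comments in the code above (store the task and await it, and consider try-finally) can be combined into a sketch that also supports a graceful stop. The cancellation token, the 300 ms stop timeout, the 30 ms job duration and the handled counter are assumptions added for illustration; Poll() is replaced by a simulated delay.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

int handled = 0;
// Assumption: a stop is requested after 300 ms.
using var cts = new CancellationTokenSource(TimeSpan.FromMilliseconds(300));

// Hypothetical poll-and-process step; Task.Delay stands in for the I/O-bound job.
async Task DoPollingThenWorkAsync(SemaphoreSlim semaphore)
{
    try
    {
        await Task.Delay(30);
        Interlocked.Increment(ref handled);
    }
    finally
    {
        semaphore.Release(); // released even if the job throws
    }
}

async Task WorkerMainAsync(CancellationToken token)
{
    using var ss = new SemaphoreSlim(10);
    var pending = new List<Task>();
    while (!token.IsCancellationRequested)
    {
        try { await ss.WaitAsync(token); }
        catch (OperationCanceledException) { break; }
        // Forget jobs that finished successfully; faulted ones stay so WhenAll rethrows.
        pending.RemoveAll(t => t.IsCompletedSuccessfully);
        pending.Add(DoPollingThenWorkAsync(ss));
    }
    await Task.WhenAll(pending); // graceful stop: wait for the in-flight jobs
}

await WorkerMainAsync(cts.Token);
Console.WriteLine($"handled {handled} jobs before stopping");
```

Because WorkerMainAsync returns only after the in-flight jobs have finished, awaiting it plays the role that Wait had in the thread-based version.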
Usually you don't use async/await inside a CPU-bound task. The method that starts such a task (WorkerMainAsync) can use async/await, but you should be tracking pending tasks:
async Task WorkerMainAsync()
{
SemaphoreSlim ss = new SemaphoreSlim(10);
List<Task> trackedTasks = new List<Task>();
while (DoMore())
{
await ss.WaitAsync();
trackedTasks.Add(Task.Run(() =>
{
DoPollingThenWorkAsync();
ss.Release();
}));
}
await Task.WhenAll(trackedTasks);
}
void DoPollingThenWorkAsync()
{
var msg = Poll();
if (msg != null)
{
Thread.Sleep(2000); // process the long running CPU-bound job
}
}
Another exercise would be to remove tasks from trackedTasks as they finish. For example, you could use ContinueWith to remove finished tasks (in this case, remember to use lock to protect trackedTasks from simultaneous access).
If you really need to use await inside DoPollingThenWorkAsync, the code wouldn't change a lot:
trackedTasks.Add(Task.Run(async () =>
{
await DoPollingThenWorkAsync();
ss.Release();
}));
Note that in this case, you'd be dealing with a nested task here for the async lambda, which Task.Run will automatically unwrap for you.
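The exercise mentioned earlier, removing tasks from trackedTasks as they finish, could look roughly like the sketch below. It assumes a HashSet<Task> instead of a List<Task> (for cheap removal), a dedicated lock object named gate, and Thread.Sleep(10) as a stand-in for the CPU-bound job; none of these come from the answer itself.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var gate = new object();               // protects trackedTasks
var trackedTasks = new HashSet<Task>();
using var ss = new SemaphoreSlim(10);
int completed = 0;

for (int i = 0; i < 40; i++)
{
    await ss.WaitAsync();
    Task task = Task.Run(() =>
    {
        try
        {
            Thread.Sleep(10);          // stand-in for the CPU-bound job
            Interlocked.Increment(ref completed);
        }
        finally { ss.Release(); }
    });
    lock (gate) trackedTasks.Add(task);
    // Remove the task once it finishes; the lock guards against concurrent access.
    _ = task.ContinueWith(
        t => { lock (gate) trackedTasks.Remove(t); },
        TaskContinuationOptions.ExecuteSynchronously);
}

// Only the tasks that are still tracked need to be awaited at the end.
Task[] remaining;
lock (gate) remaining = trackedTasks.ToArray();
await Task.WhenAll(remaining);

Console.WriteLine($"completed={completed}");
```

The task is added under the lock before the continuation is attached, so the removal can never run ahead of the insertion.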
Given the following:
BlockingCollection<MyObject> collection;
public class MyObject
{
public async Task<ReturnObject> DoWork()
{
(...)
return await SomeIOWorkAsync();
}
}
What would be the correct/most performant way to execute all DoWork() tasks asynchronously on all MyObjects in collection concurrently (while capturing the return object), ideally with a sensible thread limit though (I believe the Task Factory/ThreadPool does some management here)?
You can make use of the Task.WhenAll method.
var results = await Task.WhenAll(collection.Select(x => x.DoWork()));
It will start all tasks concurrently and wait for all of them to finish.
ThreadPool manages the number of threads running, but that won't help you much with asynchronous Tasks.
Because of that, you need something else. One way to do this is to utilize ActionBlock from TPL Dataflow:
int limit = …;
IEnumerable<MyObject> collection = …;
var block = new ActionBlock<MyObject>(
o => o.DoWork(),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = limit });
foreach (var obj in collection)
block.Post(obj);
block.Complete();
await block.Completion;
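An ActionBlock discards whatever DoWork returns. If the return objects are needed, one option is a TransformBlock, which exposes results on its output side. The sketch below is an illustration under assumptions: DoWorkAsync is a hypothetical stand-in for MyObject.DoWork(), integers replace MyObject and ReturnObject, and the limit of 4 is arbitrary. On .NET Framework, TPL Dataflow comes from the System.Threading.Tasks.Dataflow NuGet package.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical stand-in for MyObject.DoWork(): squares its input after a short delay.
async Task<int> DoWorkAsync(int input)
{
    await Task.Delay(10);
    return input * input;
}

var block = new TransformBlock<int, int>(
    x => DoWorkAsync(x),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });

foreach (int item in Enumerable.Range(1, 20))
    block.Post(item);   // the input buffer is unbounded here, so Post accepts every item
block.Complete();

// Drain the output side; by default results come out in input order.
var results = new List<int>();
while (await block.OutputAvailableAsync())
    while (block.TryReceive(out int value))
        results.Add(value);
await block.Completion;

Console.WriteLine($"received {results.Count} results");
```

The output must be consumed for block.Completion to finish, which is why the drain loop comes before the final await.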
What would be the correct/most performant way to execute all DoWork() tasks asynchronously on all MyObjects in collection concurrently (while capturing the return object), ideally with a sensible thread limit
The easiest way to do that is with Task.WhenAll:
ReturnObject[] results = await Task.WhenAll(collection.Select(x => x.DoWork()));
This will invoke DoWork on all MyObjects in the collection and then wait for them all to complete. The thread pool handles all throttling sensibly.
Is there a different way if I want to capture every individual DoWork() return immediately instead of waiting for all items to complete?
Yes, you can use the method described by Jon Skeet and Stephen Toub. I have a similar solution in my AsyncEx library (available via NuGet), which you can use like this:
// "tasks" is of type "Task<ReturnObject>[]"
var tasks = collection.Select(x => x.DoWork()).OrderByCompletion();
foreach (var task in tasks)
{
var result = await task;
...
}
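Without an extra library, a similar effect can be approximated with a Task.WhenAny loop. This is a simpler technique than the one used by OrderByCompletion, not a reproduction of it: each WhenAny call scans all remaining tasks, so it is best suited to small collections. DoWorkAsync and the delays are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical stand-in for DoWork(): completes after the given delay and returns it.
async Task<int> DoWorkAsync(int delayMs)
{
    await Task.Delay(delayMs);
    return delayMs;
}

List<Task<int>> pending = new[] { 300, 50, 150 }
    .Select(d => DoWorkAsync(d))
    .ToList();

var order = new List<int>();
while (pending.Count > 0)
{
    Task<int> finished = await Task.WhenAny(pending); // first task to complete
    pending.Remove(finished);
    order.Add(await finished); // already completed, so this does not wait
}

Console.WriteLine(string.Join(", ", order));
```

Each result is handled as soon as its task completes, regardless of the order in which the tasks were created.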
My comment was a bit cryptic, so I thought I'd add this answer:
List<Task<ReturnObject>> workTasks =
collection.Select( o => o.DoWork() ).ToList();
List<Task> resultTasks =
workTasks.Select( o => o.ContinueWith( t =>
{
ReturnObject r = t.Result;
// do something with the result
},
// if you want to run this on the UI thread
TaskScheduler.FromCurrentSynchronizationContext()
)
)
.ToList();
await Task.WhenAll( resultTasks );