When using WaitHandle.WaitAny and the Semaphore class like the following:
var s1 = new Semaphore(1, 1);
var s2 = new Semaphore(1, 1);
var handles = new [] { s1, s2 };
var index = WaitHandle.WaitAny(handles);
handles[index].Release();
It seems guaranteed that only one semaphore is acquired by WaitHandle.WaitAny.
Is it possible to obtain similar behavior for asynchronous (async/await) code?
I cannot think of a built-in solution. I'd do it like this:
var s1 = new SemaphoreSlim(1, 1);
var s2 = new SemaphoreSlim(1, 1);
var waits = new [] { s1.WaitAsync(), s2.WaitAsync() };
var firstWait = await Task.WhenAny(waits);
//The wait is still running - perform compensation.
if (firstWait == waits[0])
waits[1].ContinueWith(_ => s2.Release());
if (firstWait == waits[1])
waits[0].ContinueWith(_ => s1.Release());
This acquires both semaphores but it immediately releases the one that came second. This should be equivalent. I cannot think of a negative consequence of acquiring a semaphore needlessly (except performance of course).
Here is a generalized implementation of a WaitAnyAsync method that asynchronously acquires any one of the supplied semaphores:
/// <summary>
/// Asynchronously waits to enter any of the semaphores in the specified array.
/// </summary>
public static async Task<SemaphoreSlim> WaitAnyAsync(SemaphoreSlim[] semaphores,
CancellationToken cancellationToken = default)
{
// Fast path
cancellationToken.ThrowIfCancellationRequested();
var acquired = semaphores.FirstOrDefault(x => x.Wait(0));
if (acquired != null) return acquired;
// Slow path
using var cts = CancellationTokenSource.CreateLinkedTokenSource(
cancellationToken);
Task<SemaphoreSlim>[] acquireTasks = semaphores
.Select(async s => { await s.WaitAsync(cts.Token); return s; })
.ToArray();
Task<SemaphoreSlim> acquiredTask = await Task.WhenAny(acquireTasks);
cts.Cancel(); // Cancel all other tasks
var releaseOtherTasks = acquireTasks
.Where(task => task != acquiredTask)
.Select(async task => (await task).Release());
try { await Task.WhenAll(releaseOtherTasks); }
catch (OperationCanceledException) { } // Ignore
catch
{
// Consider any other error (possibly SemaphoreFullException or
// ObjectDisposedException) as a failure, and propagate the exception.
try { (await acquiredTask).Release(); } catch { }
throw;
}
try { return await acquiredTask; }
catch (OperationCanceledException)
{
// Propagate an exception holding the correct CancellationToken
cancellationToken.ThrowIfCancellationRequested();
throw; // Should never happen
}
}
This method becomes increasingly inefficient as the contention gets higher and higher, so I wouldn't recommend using it in hot paths.
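For illustration, a hypothetical usage of WaitAnyAsync, mirroring the two-semaphore setup from the question (assuming the method is in scope):
var s1 = new SemaphoreSlim(1, 1);
var s2 = new SemaphoreSlim(1, 1);
SemaphoreSlim acquired = await WaitAnyAsync(new[] { s1, s2 });
try
{
    // ...guarded work...
}
finally
{
    acquired.Release();
}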
Variation of #usr's answer which solved my slightly more general problem (after quite some time going down the rathole of trying to marry AvailableWaitHandle with Task...)
static class SemaphoreSlimExtensions
{
    public static Task AwaitButReleaseAsync(this SemaphoreSlim s) =>
        s.WaitAsync().ContinueWith(_t => s.Release(), TaskContinuationOptions.ExecuteSynchronously);

    public static bool TryTake(this SemaphoreSlim s) =>
        s.Wait(0);
}
In my use case, the await is just a trigger for synchronous logic that then walks the full set; the TryTake helper is, in my case, a natural way to handle the conditional acquisition of the semaphore and the processing that's contingent on it.
var sems = new[] { new SemaphoreSlim(1, 1), new SemaphoreSlim(1, 1) };
await Task.WhenAny(from s in sems select s.AwaitButReleaseAsync());
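After the await fires, the synchronous walk over the full set might look like this sketch; Process is a hypothetical placeholder, not part of the original code:
foreach (var s in sems)
{
    if (s.TryTake())
    {
        try { Process(s); } // processing contingent on having acquired this semaphore
        finally { s.Release(); }
    }
}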
Putting it here as I believe it to be clean, clear and relatively efficient but would be happy to see improvements on it
I have a bunch of requests to process, some of which may complete synchronously.
I'd like to gather all results that are immediately available and return them early, while waiting for the rest.
Roughly like this:
List<Task<Result>> tasks = new ();
List<Result> results = new ();
foreach (var request in myRequests) {
var task = request.ProcessAsync();
if (task.IsCompleted)
results.Add(task.Result); // or Add(await task) ?
else
tasks.Add(task);
}
// send results that are available "immediately" while waiting for the rest
if (results.Count > 0) SendResults(results);
results = await Task.WhenAll(tasks);
SendResults(results);
I'm not sure whether relying on IsCompleted might be a bad idea; could there be situations where its result cannot be trusted, or where it may change back to false again, etc.?
Similarly, could it be dangerous to use task.Result even after checking IsCompleted? Should one always prefer await task? What if we're using ValueTask instead of Task?
I'm not sure whether relying on IsCompleted might be a bad idea; could there be situations where its result cannot be trusted...
If you're in a multithreaded context, it's possible that IsCompleted could return false at the moment when you check on it, but it completes immediately thereafter. In cases like the code you're using, the cost of this happening would be very low, so I wouldn't worry about it.
or where it may change back to false again, etc.?
No, once a Task completes, it cannot uncomplete.
could it be dangerous to use task.Result even after checking IsCompleted.
Nope, that should always be safe.
should one always prefer await task?
await is a great default when you don't have a specific reason to do something else, but there are a variety of use cases where other patterns might be useful. The use case you've highlighted is a good example, where you want to return the results of finished tasks without awaiting all of them.
As Stephen Cleary mentioned in a comment below, it may still be worthwhile to use await to maintain expected exception behavior. You might consider doing something more like this:
var tasksByIsCompleted = myRequests.Select(r => r.ProcessAsync()).ToLookup(t => t.IsCompleted);
// send results that are available "immediately" while waiting for the rest
SendResults(await Task.WhenAll(tasksByIsCompleted[true]));
SendResults(await Task.WhenAll(tasksByIsCompleted[false]));
What if we're using ValueTask instead of Task?
The answers above apply equally to both types.
You could use code like this to continually send the results of completed tasks while waiting on others to complete.
var tasks = new List<Task<Result>>();
foreach (var request in myRequests)
{
    tasks.Add(request.ProcessAsync());
}
// wait for at least one task to be complete, then send all available results
while (tasks.Count > 0)
{
// wait for at least one task to complete
Task.WaitAny(tasks.ToArray());
// send results for each completed task
var completedTasks = tasks.Where(t => t.IsCompleted);
var results = completedTasks.Where(t => t.IsCompletedSuccessfully).Select(t => t.Result).ToList();
SendResults(results);
// TODO: handle completed but failed tasks here
// remove completed tasks from the tasks list and keep waiting
tasks.RemoveAll(t => completedTasks.Contains(t));
}
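If this runs inside an async method, a non-blocking variant of the same loop can swap the blocking Task.WaitAny for an awaited Task.WhenAny; a sketch under the same assumptions:
while (tasks.Count > 0)
{
    // asynchronously wait for at least one task to complete
    await Task.WhenAny(tasks);
    var completedTasks = tasks.Where(t => t.IsCompleted).ToList();
    var results = completedTasks.Where(t => t.IsCompletedSuccessfully).Select(t => t.Result).ToList();
    SendResults(results);
    // TODO: handle completed but failed tasks here
    tasks.RemoveAll(t => completedTasks.Contains(t));
}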
Using only await you can achieve the desired behavior:
async Task ProcessAsync(MyRequest request, Sender sender)
{
var result = await request.ProcessAsync();
await sender.SendAsync(result);
}
...
async Task ProcessAll()
{
var tasks = new List<Task>();
foreach(var request in requests)
{
var task = ProcessAsync(request, sender);
// Don't await until all requests are queued up
tasks.Add(task);
}
// Await on all outstanding requests
await Task.WhenAll(tasks);
}
There are already good answers, but in addition to them, here is my suggestion on how to handle multiple tasks and process each task differently; maybe it will suit your needs. My example is with events, but you can replace them with some kind of state management that fits your needs.
public interface IRequestHandler
{
event Func<object, Task> Ready;
Task ProcessAsync();
}
public class RequestHandler : IRequestHandler
{
// Here is where you wrap your request:
// private object request;
private readonly int value;
public RequestHandler(int value)
=> this.value = value;
public event Func<object, Task> Ready;
public async Task ProcessAsync()
{
await Task.Delay(1000 * this.value);
// Here is where you call:
// var result = await request.ProcessAsync();
// ... then do something with the result, or wrap the call in try/catch, for example
var result = $"RequestHandler {this.value} - [{DateTime.Now.ToLongTimeString()}]";
if (this.Ready is not null)
{
// If result passes send the result to all subscribers
await this.Ready.Invoke(result);
}
}
}
static void Main()
{
var a = new RequestHandler(1);
a.Ready += PrintAsync;
var b = new RequestHandler(2);
b.Ready += PrintAsync;
var c = new RequestHandler(3);
c.Ready += PrintAsync;
var d = new RequestHandler(4);
d.Ready += PrintAsync;
var e = new RequestHandler(5);
e.Ready += PrintAsync;
var f = new RequestHandler(6);
f.Ready += PrintAsync;
var requests = new List<IRequestHandler>()
{
a, b, c, d, e, f
};
var tasks = requests
.Select(x => Task.Run(x.ProcessAsync));
// Here you must await all of the tasks
Task
.Run(async () => await Task.WhenAll(tasks))
.Wait();
}
static Task PrintAsync(object output)
{
Console.WriteLine(output);
return Task.CompletedTask;
}
I would like to handle a collection in parallel, but I'm having trouble implementing it and I'm therefore hoping for some help.
The trouble arises if I want to call a method marked async in C#, within the lambda of the parallel loop. For example:
var bag = new ConcurrentBag<object>();
Parallel.ForEach(myCollection, async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
});
var count = bag.Count;
The problem occurs with the count being 0, because all the threads created are effectively just background threads and the Parallel.ForEach call doesn't wait for completion. If I remove the async keyword, the method looks like this:
var bag = new ConcurrentBag<object>();
Parallel.ForEach(myCollection, item =>
{
// some pre stuff
var responseTask = GetData(item);
responseTask.Wait();
var response = responseTask.Result;
bag.Add(response);
// some post stuff
});
var count = bag.Count;
It works, but it completely disables the await cleverness and I have to do some manual exception handling (removed for brevity).
How can I implement a Parallel.ForEach loop, that uses the await keyword within the lambda? Is it possible?
The prototype of the Parallel.ForEach method takes an Action<T> as parameter, but I want it to wait for my asynchronous lambda.
If you just want simple parallelism, you can do this:
var bag = new ConcurrentBag<object>();
var tasks = myCollection.Select(async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
});
await Task.WhenAll(tasks);
var count = bag.Count;
If you need something more complex, check out Stephen Toub's ForEachAsync post.
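For reference, a sketch in the spirit of that post (not its exact code): partition the source and run one async loop per partition. It assumes using System.Collections.Concurrent and System.Linq:
public static Task ForEachAsync<T>(this IEnumerable<T> source, int degreeOfParallelism, Func<T, Task> body)
{
    // One worker per partition; each worker processes its items sequentially.
    return Task.WhenAll(
        from partition in Partitioner.Create(source).GetPartitions(degreeOfParallelism)
        select Task.Run(async () =>
        {
            using (partition)
                while (partition.MoveNext())
                    await body(partition.Current);
        }));
}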
You can use the ParallelForEachAsync extension method from AsyncEnumerator NuGet Package:
using Dasync.Collections;
var bag = new ConcurrentBag<object>();
await myCollection.ParallelForEachAsync(async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
}, maxDegreeOfParallelism: 10);
var count = bag.Count;
Disclaimer: I'm the author of the AsyncEnumerator library, which is open source and licensed under MIT, and I'm posting this message just to help the community.
One of the new .NET 6 APIs is Parallel.ForEachAsync, a way to schedule asynchronous work that allows you to control the degree of parallelism:
var urls = new []
{
"https://dotnet.microsoft.com",
"https://www.microsoft.com",
"https://stackoverflow.com"
};
var client = new HttpClient();
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
await Parallel.ForEachAsync(urls, options, async (url, token) =>
{
var targetPath = Path.Combine(Path.GetTempPath(), "http_cache", url);
var response = await client.GetAsync(url);
if (response.IsSuccessStatusCode)
{
using var target = File.OpenWrite(targetPath);
await response.Content.CopyToAsync(target);
}
});
Another example in Scott Hanselman's blog.
The source, for reference.
With SemaphoreSlim you can achieve parallelism control.
var bag = new ConcurrentBag<object>();
var maxParallel = 20;
var throttler = new SemaphoreSlim(initialCount: maxParallel);
var tasks = myCollection.Select(async item =>
{
await throttler.WaitAsync();
try
{
var response = await GetData(item);
bag.Add(response);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
var count = bag.Count;
The simplest possible extension method, compiled from other answers and the article referenced by the accepted answer:
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> asyncAction, int maxDegreeOfParallelism)
{
var throttler = new SemaphoreSlim(initialCount: maxDegreeOfParallelism);
var tasks = source.Select(async item =>
{
await throttler.WaitAsync();
try
{
await asyncAction(item).ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
}
UPDATE: here's a simple modification that also supports a cancellation token, as requested in the comments (untested):
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, CancellationToken, Task> asyncAction, int maxDegreeOfParallelism, CancellationToken cancellationToken)
{
var throttler = new SemaphoreSlim(initialCount: maxDegreeOfParallelism);
var tasks = source.Select(async item =>
{
await throttler.WaitAsync(cancellationToken);
if (cancellationToken.IsCancellationRequested) { throttler.Release(); return; }
try
{
await asyncAction(item, cancellationToken).ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
}
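A hypothetical usage of the cancellable overload (ProcessAsync is an illustrative method, not from the answer):
using var cts = new CancellationTokenSource(TimeSpan.FromSeconds(30));
await myCollection.ParallelForEachAsync(
    async (item, token) => await ProcessAsync(item, token),
    maxDegreeOfParallelism: 10,
    cancellationToken: cts.Token);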
My lightweight implementation of ParallelForEachAsync.
Features:
Throttling (max degree of parallelism).
Exception handling (an AggregateException will be thrown at completion).
Memory efficient (no need to store the list of tasks).
public static class AsyncEx
{
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> asyncAction, int maxDegreeOfParallelism = 10)
{
var semaphoreSlim = new SemaphoreSlim(maxDegreeOfParallelism);
var tcs = new TaskCompletionSource<object>();
var exceptions = new ConcurrentBag<Exception>();
bool addingCompleted = false;
foreach (T item in source)
{
await semaphoreSlim.WaitAsync();
asyncAction(item).ContinueWith(t =>
{
semaphoreSlim.Release();
if (t.Exception != null)
{
exceptions.Add(t.Exception);
}
if (Volatile.Read(ref addingCompleted) && semaphoreSlim.CurrentCount == maxDegreeOfParallelism)
{
tcs.TrySetResult(null);
}
});
}
Volatile.Write(ref addingCompleted, true);
// In case every continuation finished before addingCompleted was set:
if (semaphoreSlim.CurrentCount == maxDegreeOfParallelism) tcs.TrySetResult(null);
await tcs.Task;
if (exceptions.Count > 0)
{
throw new AggregateException(exceptions);
}
}
}
Usage example:
await Enumerable.Range(1, 10000).ParallelForEachAsync(async (i) =>
{
var data = await GetData(i);
}, maxDegreeOfParallelism: 100);
I've created an extension method for this which makes use of SemaphoreSlim and also allows setting the maximum degree of parallelism.
/// <summary>
/// Concurrently executes async actions for each item of <see cref="IEnumerable{T}"/>.
/// </summary>
/// <typeparam name="T">Type of IEnumerable</typeparam>
/// <param name="enumerable">instance of <see cref="IEnumerable{T}"/></param>
/// <param name="action">an async <see cref="Func{T, Task}"/> to execute</param>
/// <param name="maxDegreeOfParallelism">Optional, an integer that represents the maximum degree of parallelism.
/// Must be greater than 0.</param>
/// <returns>A Task representing an async operation</returns>
/// <exception cref="ArgumentOutOfRangeException">If maxDegreeOfParallelism is less than 1</exception>
public static async Task ForEachAsyncConcurrent<T>(
this IEnumerable<T> enumerable,
Func<T, Task> action,
int? maxDegreeOfParallelism = null)
{
if (maxDegreeOfParallelism.HasValue)
{
using (var semaphoreSlim = new SemaphoreSlim(
maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value))
{
var tasksWithThrottler = new List<Task>();
foreach (var item in enumerable)
{
// Increment the number of currently running tasks and wait if they are more than limit.
await semaphoreSlim.WaitAsync();
tasksWithThrottler.Add(Task.Run(async () =>
{
    try
    {
        await action(item);
    }
    finally
    {
        // action is completed, so decrement the number of currently running tasks
        semaphoreSlim.Release();
    }
}));
}
// Wait for all tasks to complete.
await Task.WhenAll(tasksWithThrottler.ToArray());
}
}
else
{
await Task.WhenAll(enumerable.Select(item => action(item)));
}
}
Sample Usage:
await enumerable.ForEachAsyncConcurrent(
async item =>
{
await SomeAsyncMethod(item);
},
5);
In the accepted answer the ConcurrentBag is not required.
Here's an implementation without it:
var tasks = myCollection.Select(GetData).ToList();
await Task.WhenAll(tasks);
var results = tasks.Select(t => t.Result);
Any of the "// some pre stuff" and "// some post stuff" can go into the GetData implementation (or another method that calls GetData).
Aside from being shorter, there's no use of an "async void" lambda, which is an anti-pattern.
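A hedged sketch of such a wrapper; ProcessItem and the Item type are illustrative names, not from the original answer:
async Task<object> ProcessItem(Item item)
{
    // some pre stuff
    var response = await GetData(item);
    // some post stuff
    return response;
}

var tasks = myCollection.Select(ProcessItem).ToList();
await Task.WhenAll(tasks);
var results = tasks.Select(t => t.Result);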
The following is set to work with IAsyncEnumerable but can be modified to use IEnumerable by just changing the type and removing the "await" on the foreach. It's far more appropriate for large sets of data than creating countless parallel tasks and then awaiting them all.
public static async Task ForEachAsyncConcurrent<T>(this IAsyncEnumerable<T> enumerable, Func<T, Task> action, int maxDegreeOfParallelism, int? boundedCapacity = null)
{
ActionBlock<T> block = new ActionBlock<T>(
action,
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = maxDegreeOfParallelism,
BoundedCapacity = boundedCapacity ?? maxDegreeOfParallelism * 3
});
await foreach (T item in enumerable)
{
await block.SendAsync(item).ConfigureAwait(false);
}
block.Complete();
await block.Completion;
}
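A hypothetical usage, assuming an IAsyncEnumerable<Item> named items and an illustrative ProcessAsync method:
await items.ForEachAsyncConcurrent(
    async item => await ProcessAsync(item),
    maxDegreeOfParallelism: 8);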
For a simpler solution (not sure if the most optimal), you can simply nest Parallel.ForEach inside a Task, as such:
var options = new ParallelOptions { MaxDegreeOfParallelism = 5 };
Task.Run(() =>
{
    Parallel.ForEach(myCollection, options, item =>
    {
        DoWork(item);
    });
});
The ParallelOptions will do the throttling for you, out of the box.
I am using it in a real-world scenario to run very long operations in the background. These operations are called via HTTP, and the design ensures the HTTP call is not blocked while the long operation is running.
Calling HTTP for long background operation.
Operation starts at the background.
User gets status ID which can be used to check the status using another HTTP call.
The background operation update its status.
That way, the CI/CD call does not time out because of the long HTTP operation; rather, it polls the status every x seconds without blocking the process.
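A minimal, hypothetical sketch of the start-and-poll pattern described above; the endpoint shape, _statusStore (e.g. a ConcurrentDictionary<Guid, string>), and DoWork are illustrative assumptions:
[HttpPost("operations")]
public IActionResult StartLongOperation()
{
    var statusId = Guid.NewGuid();
    _statusStore[statusId] = "Running";
    // Fire-and-forget: the HTTP call returns immediately.
    _ = Task.Run(() =>
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = 5 };
        Parallel.ForEach(myCollection, options, item => DoWork(item));
        _statusStore[statusId] = "Done";
    });
    // The caller polls a status endpoint with this ID.
    return Accepted(statusId);
}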
I have an issue with an endpoint blocking calls from other endpoints in my app. When we call this endpoint, this basically blocks all other api calls from executing, and they need to wait until this is finished.
public async Task<ActionResult> GrantAccesstoUsers()
{
// other operations
var grantResult = await
this._workSpaceProvider.GrantUserAccessAsync(this.CurrentUser.Id).ConfigureAwait(false);
return this.Ok(grantResult);
}
The GrantUserAccessAsync method calls set of tasks that will run on a parallel.
public async Task<List<WorkspaceDetail>> GrantUserAccessAsync(string currentUser)
{
var responselist = new List<WorkspaceDetail>();
try
{
// calling these prematurely to be reused once threads are created
// these are inexpensive calls
var properlyNamedWorkSpaces = await this._helper.GetProperlyNamedWorkspacesAsync(true).ConfigureAwait(false);
var dbGroups = await this._reportCatalogProvider.GetWorkspaceFromCatalog().ConfigureAwait(false);
var catalogInfo = await this._clientServiceHelper.GetDatabaseConfigurationAsync("our-service").ConfigureAwait(false);
if (properlyNamedWorkSpaces != null && properlyNamedWorkSpaces.Count > 0)
{
// these methods returns tasks for parallel processing
var grantUserContributorAccessTaskList = await this.GrantUserContributorAccessTaskList(properlyNamedWorkSpaces, currentUser, dbGroups, catalogInfo).ConfigureAwait(false);
var grantUserAdminAccessTaskList = await this.GrantUserAdminAccessTaskList(properlyNamedWorkSpaces, currentUser, dbGroups, catalogInfo).ConfigureAwait(false);
var removeInvalidUserAndSPNTaskList = await this.RemoveAccessRightsToWorkspaceTaskList(properlyNamedWorkSpaces, dbGroups, currentUser, catalogInfo).ConfigureAwait(false);
var tasklist = new List<Task<WorkspaceDetail>>();
tasklist.AddRange(grantUserContributorAccessTaskList);
tasklist.AddRange(grantUserAdminAccessTaskList);
tasklist.AddRange(removeInvalidUserAndSPNTaskList);
// Start running Parallel Task
Parallel.ForEach(tasklist, task =>
{
Task.Delay(this._config.CurrentValue.PacingDelay);
task.Start();
});
// Get All Client Workspace Processing Results
var clientWorkspaceProcessingResult = await Task.WhenAll(tasklist).ConfigureAwait(false);
// Populate result
responselist.AddRange(clientWorkspaceProcessingResult.ToList());
}
}
catch (Exception)
{
throw;
}
return responselist;
}
These methods are basically identical in structure and they look like this:
private async Task<List<Task<WorkspaceDetail>>> GrantUserContributorAccessTaskList(List<Group> workspaces, string currentUser, List<WorkspaceManagement> dbGroups, DatabaseConfig catalogInfo)
{
var tasklist = new List<Task<WorkspaceDetail>>();
foreach (var workspace in workspaces)
{
tasklist.Add(new Task<WorkspaceDetail>(() =>
this.GrantContributorAccessToUsers(workspace, currentUser, dbGroups, catalogInfo).Result));
// I added a delay here because we encountered an issue in production before, and this seems to solve the problem. This is set to 4 ms.
Task.Delay(this._config.CurrentValue.DelayInMiliseconds);
}
return tasklist;
}
The other methods called here looks like this:
private async Task<WorkspaceDetail> GrantContributorAccessToUsers(Group workspace, string currentUser, List<Data.ReportCatalogDB.WorkspaceManagement> dbGroups, DatabaseConfig catalogInfo)
{
// This prevents other thread or task to start and prevents exceeding the number of threads allowed
await this._batchProcessor.WaitAsync().ConfigureAwait(false);
var result = new WorkspaceDetail();
try
{
var contributorAccessresult = await this.helper.GrantContributorAccessToUsersAsync(workspace, this._powerBIConfig.CurrentValue.SPNUsers).ConfigureAwait(false);
if (contributorAccessresult != null
&& contributorAccessresult.Count > 0)
{
// do something
}
else
{
// do something
}
// this is done to reuse the call that is being executed in the helper above. it's an expensive call from an external endpoint so we opted to reuse what was used in the initial call, instead of calling it again for this process
var syncWorkspaceAccessToDb = await this.SyncWorkspaceAccessAsync(currentUser, workspace.Id, contributorAccessresult, dbGroups, catalogInfo).ConfigureAwait(false);
foreach (var dbResponse in syncWorkspaceAccessToDb) {
result.ResponseMessage += dbResponse.ResponseMessage;
}
}
catch (Exception ex)
{
this._loghelper.LogEvent(this._logger, logEvent, OperationType.GrantContributorAccessToWorkspaceManager, LogEventStatus.FAIL);
}
finally
{
this._batchProcessor.Release();
}
return result;
}
The last method called writes the record in a database table:
private async Task<List<WorkspaceDetail>> SyncWorkspaceAccessAsync(string currentUser,
Guid workspaceId,
List<GroupUser> groupUsers,
List<WorkspaceManagement> dbGroups,
DatabaseConfig catalogInfo) {
var result = new List<WorkspaceDetail>();
var tasklist = new List<Task<WorkspaceDetail>>();
// get active workspace details from the db
var workspace = dbGroups.Where(x => x.PowerBIGroupId == workspaceId).FirstOrDefault();
try
{
// to auto dispose the provider, we are creating this for each instance because
// having only one instance creates an error when the other task starts running
using (var contextProvider = this._contextFactory.GetReportCatalogProvider(
catalogInfo.Server,
catalogInfo.Database,
catalogInfo.Username,
catalogInfo.Password,
this._dbPolicy))
{
if (workspace != null)
{
// get current group users in the db from the workspace object
var currentDbGroupUsers = workspace.WorkspaceAccess.Where(w => w.Id == workspace.Id
&& w.IsDeleted == false).ToList();
#region identify to process
#region users to add
// identify users to add
var usersToAdd = groupUsers.Where(g => !currentDbGroupUsers.Any(w => w.Id == workspace.Id ))
.Select(g => new WorkspaceAccess
{
// class properties
}).ToList();
#endregion
var addTasks = await this.AddWorkspaceAccessToDbTask(contextProvider, usersToAdd, workspace.PowerBIGroupId, workspace.WorkspaceName).ConfigureAwait(false);
tasklist.AddRange(addTasks);
// this is a potential fix that i did, hoping adding another parallel thread can solve the problem
Parallel.ForEach(tasklist, new ParallelOptions { MaxDegreeOfParallelism = this._config.CurrentValue.MaxDegreeOfParallelism }, task =>
{
Task.Delay(this._config.CurrentValue.PacingDelay);
task.Start();
});
var processResult = await Task.WhenAll(tasklist).ConfigureAwait(false);
// Populate result
result.AddRange(processResult.ToList());
}
}
}
catch (Exception ex)
{
// handle error
}
return result;
}
I tried some potential solutions already; for example, the methods here were previously written with Task.FromResult instead of async, so I changed that. Reference is from this thread:
Using Task.FromResult v/s await in C#
Also, I thought it was similar to an issue we faced before, where multiple db context connections are created when running multiple parallel tasks, so I added a small delay on the tasks, but that didn't solve the problem.
Task.Delay(this._config.CurrentValue.DelayInMiliseconds);
Any help would be much appreciated.
I assume your this._batchProcessor is an instance of SemaphoreSlim. If your other endpoints somehow call
await this._batchProcessor.WaitAsync()
that means they can't go further until the semaphore is released.
Another thing I'd like to mention: please avoid using Parallel.ForEach with async/await. The TPL is not designed to work with async/await; here is a good answer on why you should avoid using them together: Nesting await in Parallel.ForEach
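As a hedged sketch of the usual fix for the question's code: invoke the async methods directly so the tasks are already hot, and await them with Task.WhenAll, instead of constructing cold tasks and starting them inside Parallel.ForEach (the method names reuse the question's; the exact wiring is an assumption):
// Inside GrantUserAccessAsync, replacing the cold-task lists and Parallel.ForEach:
var tasklist = properlyNamedWorkSpaces
    .Select(w => this.GrantContributorAccessToUsers(w, currentUser, dbGroups, catalogInfo))
    .ToList();
// The tasks are already running; the SemaphoreSlim inside
// GrantContributorAccessToUsers provides the throttling.
var clientWorkspaceProcessingResult = await Task.WhenAll(tasklist).ConfigureAwait(false);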
I need to run many tasks in parallel as fast as possible. But if my program runs more than 30 tasks per 1 second, it will be blocked. How to ensure that tasks run no more than 30 per any 1-second interval?
In other words, we must prevent the new task from starting if 30 tasks were completed in the last 1-second interval.
My ugly possible solution:
private async Task Process(List<Task> taskList, int maxIntervalCount, int timeIntervalSeconds)
{
var timeList = new List<DateTime>();
var sem = new Semaphore(maxIntervalCount, maxIntervalCount);
var tasksToRun = taskList.Select(async task =>
{
do
{
sem.WaitOne();
}
while (HasAllowance(timeList, maxIntervalCount, timeIntervalSeconds));
await task;
timeList.Add(DateTime.Now);
sem.Release();
});
await Task.WhenAll(tasksToRun);
}
private bool HasAllowance(List<DateTime> timeList, int maxIntervalCount, int timeIntervalSeconds)
{
return timeList.Count <= maxIntervalCount
|| DateTime.Now.Subtract(TimeSpan.FromSeconds(timeIntervalSeconds)) > timeList[timeList.Count - maxIntervalCount];
}
User code should never have to control how tasks are scheduled directly. For one thing, it can't - controlling how tasks run is the job of the TaskScheduler. When user code calls .Start(), it simply adds a task to a threadpool queue for execution. await doesn't execute tasks at all; it awaits tasks that are already running.
The TaskScheduler samples show how to create limited concurrency schedulers, but again, there are better, high-level options.
The question's code doesn't throttle the queued tasks anyway, it limits how many of them can be awaited. They are all running already. This is similar to batching the previous asynchronous operation in a pipeline, allowing only a limited number of messages to pass to the next level.
ActionBlock with delay
The easy, out-of-the-box way would be to use an ActionBlock with a limited MaxDegreeOfParallelism, to ensure no more than N concurrent operations can run at the same time. If we know how long each operation takes, we could add a bit of delay to ensure we don't overshoot the throttle limit.
In this case, 7 concurrent workers perform 4 requests/second, for a maximum total of 28 requests per second. The BoundedCapacity means that only up to 7 items will be stored in the input buffer before downloader.SendAsync waits asynchronously. This way we avoid flooding the ActionBlock if the operations take too long.
var downloader = new ActionBlock<string>(
async url => {
await Task.Delay(250);
var response=await httpClient.GetStringAsync(url);
//Do something with it.
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 7, BoundedCapacity=7 }
);
//Start posting to the downloader
foreach(var item in urls)
{
await downloader.SendAsync(item);
}
downloader.Complete();
await downloader.Completion;
ActionBlock with SemaphoreSlim
Another option would be to combine this with a SemaphoreSlim that gets reset periodically by a timer.
var semaphore = new SemaphoreSlim(30); // assumption: an allowance of 30 operations per interval
var refreshTimer = new Timer(_ => semaphore.Release(30));
var downloader = new ActionBlock<string>(
async url => {
await semaphore.WaitAsync();
try
{
var response=await httpClient.GetStringAsync(url);
//Do something with it.
}
finally
{
semaphore.Release();
}
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5, BoundedCapacity=5 }
);
//Start the timer right before we start posting
refreshTimer.Change(1000,1000);
foreach(....)
{
}
This is the snippet:
var tasks = new List<Task>();
foreach(var item in listNeedInsert)
{
var task = TaskToRun(item);
tasks.Add(task);
if(tasks.Count == 100)
{
await Task.WhenAll(tasks);
tasks.Clear();
}
}
// Wait for anything left to finish
await Task.WhenAll(tasks);
Notice that I rather add each task into a List<Task> and, after all are added, await them all on the same List<Task>.
What you do here:
var tasks = taskList.Select(async task =>
{
do
{
sem.WaitOne();
}
while (timeList.Count <= maxIntervalCount
|| DateTime.Now.Subtract(TimeSpan.FromSeconds(timeIntervalSeconds)) > timeList[timeList.Count - maxIntervalCount]);
await task;
is blocking until the task finishes its work, thus making this call:
Task.WhenAll(tasks).Wait();
completely redundant. Furthermore, this line Task.WhenAll(tasks).Wait(); is performing unnecessary blocking on the WhenAll method.
Is the blocking due to some server/firewall/hardware limit, or is it based on observation?
You should try to use BlockingCollection<Task> or similar thread-safe collections, especially if the job of your tasks is I/O-bound. You can even set the capacity to 30:
var collection = new BlockingCollection<Task>(30);
Then you can start two async methods:
var population = Task.Run(Populate);
var processing = Task.Run(Dequeue);
await Task.WhenAll(population, processing);
void Populate()
{
    foreach (...)
        collection.Add(...);
    collection.CompleteAdding();
}
async Task Dequeue()
{
    while (!collection.IsCompleted)
        await collection.Take(); //consider using TryTake()
}
If the limit persists due to some true limitation (should be very rare), change Populate() to an async method as follows:
var stopper = Stopwatch.StartNew();
for (var i = ....) //instead of foreach
{
    if (i % 30 == 0)
    {
        if (stopper.ElapsedMilliseconds < 1000)
            await Task.Delay((int)(1000 - stopper.ElapsedMilliseconds)); //note that this race condition should be avoided in your code
        stopper.Restart();
    }
    collection.Add(...);
}
collection.CompleteAdding();
I think that this problem can be solved by a SemaphoreSlim limited to the number of maximum tasks per interval, and also by a Task.Delay that delays the release of the SemaphoreSlim after each task's completion, for an interval equal to the required throttling interval. Below is an implementation based on this idea. The rate limiting can be applied in two ways:
With includeAsynchronousDuration: false the rate limit affects how many operations can be started during the specified time span. The duration of each operation is not taken into account.
With includeAsynchronousDuration: true the rate limit affects how many operations can be counted as "active" during the specified time span, and is more restrictive (makes the enumeration slower). Instead of counting each operation as a moment in time (when started), it is counted as a time span (between start and completion). An operation is counted as "active" for a specified time span, if and only if its own time span intersects with the specified time span.
/// <summary>
/// Applies an asynchronous transformation for each element of a sequence,
/// limiting the number of transformations that can start or be active during
/// the specified time span.
/// </summary>
public static async Task<TResult[]> ForEachAsync<TSource, TResult>(
this IEnumerable<TSource> source,
Func<TSource, Task<TResult>> action,
int maxActionsPerTimeUnit,
TimeSpan timeUnit,
bool includeAsynchronousDuration = false,
bool onErrorContinue = false, /* Affects only asynchronous errors */
bool executeOnCapturedContext = false)
{
if (source == null) throw new ArgumentNullException(nameof(source));
if (action == null) throw new ArgumentNullException(nameof(action));
if (maxActionsPerTimeUnit < 1)
throw new ArgumentOutOfRangeException(nameof(maxActionsPerTimeUnit));
if (timeUnit < TimeSpan.Zero || timeUnit.TotalMilliseconds > Int32.MaxValue)
throw new ArgumentOutOfRangeException(nameof(timeUnit));
using var semaphore = new SemaphoreSlim(maxActionsPerTimeUnit,
maxActionsPerTimeUnit);
using var cts = new CancellationTokenSource();
var tasks = new List<Task<TResult>>();
var releaseTasks = new List<Task>();
try // Watch for exceptions thrown by the source enumerator
{
foreach (var item in source)
{
try
{
await semaphore.WaitAsync(cts.Token)
.ConfigureAwait(executeOnCapturedContext);
}
catch (OperationCanceledException) { break; }
// Exceptions thrown synchronously by invoking the action are breaking
// the loop unconditionally (the onErrorContinue has no effect on them).
var task = action(item);
if (!onErrorContinue) task = ObserveFailureAsync(task);
tasks.Add(task);
releaseTasks.Add(ScheduleSemaphoreReleaseAsync(task));
}
}
catch (Exception ex) { tasks.Add(Task.FromException<TResult>(ex)); }
cts.Cancel(); // Cancel all release tasks
Task<TResult[]> whenAll = Task.WhenAll(tasks);
try { return await whenAll.ConfigureAwait(false); }
catch (OperationCanceledException) when (whenAll.IsCanceled) { throw; }
catch { whenAll.Wait(); throw; } // Propagate AggregateException
finally { await Task.WhenAll(releaseTasks); }
async Task<TResult> ObserveFailureAsync(Task<TResult> task)
{
try { return await task.ConfigureAwait(false); }
catch { cts.Cancel(); throw; }
}
async Task ScheduleSemaphoreReleaseAsync(Task<TResult> task)
{
if (includeAsynchronousDuration)
try { await task.ConfigureAwait(false); } catch { } // Ignore exceptions
// Release only if the Task.Delay completed successfully
try { await Task.Delay(timeUnit, cts.Token).ConfigureAwait(false); }
catch (OperationCanceledException) { return; }
semaphore.Release();
}
}
Usage example:
int[] results = await ForEachAsync(Enumerable.Range(1, 100), async n =>
{
await Task.Delay(500); // Simulate some asynchronous I/O-bound operation
return n;
}, maxActionsPerTimeUnit: 30, timeUnit: TimeSpan.FromSeconds(1.0),
includeAsynchronousDuration: true);
The reasons for propagating an AggregateException using the catch+Wait technique are explained here.
I have an array of tasks and I am awaiting them with Task.WhenAll. My tasks are failing frequently, in which case I inform the user with a message box so that she can try again. My problem is that reporting the error is delayed until all tasks are completed. Instead I would like to inform the user as soon as the first task has thrown an exception. In other words I want a version of Task.WhenAll that fails fast. Since no such built-in method exists I tried to make my own, but my implementation does not behave the way I want. Here is what I came up with:
public static async Task<TResult[]> WhenAllFailFast<TResult>(
params Task<TResult>[] tasks)
{
foreach (var task in tasks)
{
await task.ConfigureAwait(false);
}
return await Task.WhenAll(tasks).ConfigureAwait(false);
}
This generally throws faster than the native Task.WhenAll, but usually not fast enough. A faulted task #2 will not be observed before the completion of task #1. How can I improve it so that it fails as fast as possible?
Update: Regarding cancellation, it is not in my requirements right now, but let's say that for consistency the first cancelled task should stop the awaiting immediately. In this case the combining task returned from WhenAllFailFast should have Status == TaskStatus.Canceled.
Clarification: The cancellation scenario is about the user clicking a Cancel button to stop the tasks from completing. It is not about cancelling automatically the incomplete tasks in case of an exception.
Your best bet is to build your WhenAllFailFast method using TaskCompletionSource. You can .ContinueWith() every input task with a synchronous continuation that errors the TCS when the tasks end in the Faulted state (using the same exception object).
Perhaps something like (not fully tested):
using System;
using System.Threading;
using System.Threading.Tasks;
namespace stackoverflow
{
class Program
{
static async Task Main(string[] args)
{
var cts = new CancellationTokenSource();
cts.Cancel();
var arr = await WhenAllFastFail(
Task.FromResult(42),
Task.Delay(2000).ContinueWith<int>(t => throw new Exception("ouch")),
Task.FromCanceled<int>(cts.Token));
Console.WriteLine("Hello World!");
}
public static Task<TResult[]> WhenAllFastFail<TResult>(params Task<TResult>[] tasks)
{
if (tasks is null || tasks.Length == 0) return Task.FromResult(Array.Empty<TResult>());
// defensive copy.
var defensive = tasks.Clone() as Task<TResult>[];
var tcs = new TaskCompletionSource<TResult[]>();
var remaining = defensive.Length;
Action<Task> check = t =>
{
switch (t.Status)
{
case TaskStatus.Faulted:
// we 'try' as some other task may beat us to the punch.
tcs.TrySetException(t.Exception.InnerException);
break;
case TaskStatus.Canceled:
// we 'try' as some other task may beat us to the punch.
tcs.TrySetCanceled();
break;
default:
// we can safely set here as no other task remains to run.
if (Interlocked.Decrement(ref remaining) == 0)
{
// get the results into an array.
var results = new TResult[defensive.Length];
for (var i = 0; i < tasks.Length; ++i) results[i] = defensive[i].Result;
tcs.SetResult(results);
}
break;
}
};
foreach (var task in defensive)
{
task.ContinueWith(check, default, TaskContinuationOptions.ExecuteSynchronously, TaskScheduler.Default);
}
return tcs.Task;
}
}
}
Edit: Unwraps AggregateException, Cancellation support, return array of results. Defend against array mutation, null and empty. Explicit TaskScheduler.
I recently needed once again the WhenAllFailFast method, and I revised #ZaldronGG's excellent solution to make it a bit more performant (and more in line with Stephen Cleary's recommendations). The implementation below handles around 3,500,000 tasks per second on my PC.
public static Task<TResult[]> WhenAllFailFast<TResult>(params Task<TResult>[] tasks)
{
if (tasks is null) throw new ArgumentNullException(nameof(tasks));
if (tasks.Length == 0) return Task.FromResult(new TResult[0]);
var results = new TResult[tasks.Length];
var remaining = tasks.Length;
var tcs = new TaskCompletionSource<TResult[]>(
TaskCreationOptions.RunContinuationsAsynchronously);
for (int i = 0; i < tasks.Length; i++)
{
var task = tasks[i];
if (task == null) throw new ArgumentException(
$"The {nameof(tasks)} argument included a null value.", nameof(tasks));
HandleCompletion(task, i);
}
return tcs.Task;
async void HandleCompletion(Task<TResult> task, int index)
{
try
{
var result = await task.ConfigureAwait(false);
results[index] = result;
if (Interlocked.Decrement(ref remaining) == 0)
{
tcs.TrySetResult(results);
}
}
catch (OperationCanceledException)
{
tcs.TrySetCanceled();
}
catch (Exception ex)
{
tcs.TrySetException(ex);
}
}
}
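A hypothetical usage (urls and DownloadAsync are illustrative names, not part of the answer):
Task<string>[] tasks = urls.Select(url => DownloadAsync(url)).ToArray();
string[] bodies = await WhenAllFailFast(tasks);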
Your loop waits for each of the tasks in pseudo-serial, so that's why it waits for task1 to complete before checking if task2 failed.
You might find this article helpful on a pattern for aborting after the first failure: http://gigi.nullneuron.net/gigilabs/patterns-for-asynchronous-composite-tasks-in-c/
public static async Task<TResult[]> WhenAllFailFast<TResult>(
params Task<TResult>[] tasks)
{
var taskList = tasks.ToList();
while (taskList.Count > 0)
{
var task = await Task.WhenAny(taskList).ConfigureAwait(false);
if(task.Exception != null)
{
// Left as an exercise for the reader:
// properly unwrap the AggregateException;
// handle the exception(s);
// cancel the other running tasks.
throw task.Exception.InnerException;
}
taskList.Remove(task);
}
return await Task.WhenAll(tasks).ConfigureAwait(false);
}
I'm adding one more answer to this problem, not because I've found a faster solution, but because I am now a bit skeptical about starting multiple async void operations on an unknown SynchronizationContext. The solution I am proposing here is significantly slower: about 3 times slower than #ZaldronGG's excellent solution, and about 10 times slower than my previous async void-based implementation. It has, though, the advantage that after the completion of the returned Task<TResult[]>, it doesn't leak fire-and-forget continuations attached to the observed tasks. When this task is completed, all the continuations created internally by the WhenAllFailFast method have been cleaned up. This is a desirable behavior for APIs in general, but in many scenarios it might not be important.
public static Task<TResult[]> WhenAllFailFast<TResult>(params Task<TResult>[] tasks)
{
ArgumentNullException.ThrowIfNull(tasks);
CancellationTokenSource cts = new();
Task<TResult> failedTask = null;
TaskContinuationOptions flags = TaskContinuationOptions.DenyChildAttach |
TaskContinuationOptions.ExecuteSynchronously;
Action<Task<TResult>> continuationAction = new(task =>
{
if (!task.IsCompletedSuccessfully)
if (Interlocked.CompareExchange(ref failedTask, task, null) is null)
cts.Cancel();
});
IEnumerable<Task> continuations = tasks.Select(task => task
.ContinueWith(continuationAction, cts.Token, flags, TaskScheduler.Default));
return Task.WhenAll(continuations).ContinueWith(allContinuations =>
{
cts.Dispose();
var localFailedTask = Volatile.Read(ref failedTask);
if (localFailedTask is not null)
return Task.WhenAll(localFailedTask);
// At this point all the tasks are completed successfully
Debug.Assert(tasks.All(t => t.IsCompletedSuccessfully));
Debug.Assert(allContinuations.IsCompletedSuccessfully);
return Task.WhenAll(tasks);
}, default, flags, TaskScheduler.Default).Unwrap();
}
This implementation is similar to ZaldronGG's in that it attaches one continuation on each task, with the difference being that these continuations are cancelable, and they are canceled en masse when the first non-successful task is observed. It also uses the Unwrap technique that I've discovered recently, which eliminates the need for the manual completion of a TaskCompletionSource<TResult[]> instance, and usually makes for a concise implementation.