Processing Async Calls in ActionBlock - c#

I am having issues with processing Async lambda in ActionBlock.
var state = new ConcurrentBag<UsersData>();
var getData = new ActionBlock<IEnumerable<int>>(async (userIdsBatch) =>
{
var query = _queryBuilder.GetQuery(userIdsBatch);
string response = await _handler.GetResponseAsync(query).ConfigureAwait(false);
UsersData userData = new Parser().GetUserData(response);
state.Add(userData);
},
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = _MAX_DEGREE_OF_PARALLELISM // 1
});
foreach (IEnumerable<int> userIdsBatch in GetUserBatches(userIds, 10))
{
getData.Post(userIdsBatch);
}
getData.Complete();
await getData.Completion.ConfigureAwait(false);
// merge the states for different batches synchronously.
The ActionBlock is exiting before completion of the async call within it. Few Tasks that were produced are thrown away, not all of them are completed.
Is there a way to wait for all the tasks to complete before the synchronously merging the results?

Related

.NET Task returning object and calling Async inside

I have some tasks executing in a WhenAll(). I get a semantic error if a task returns an object and calls an async method inside their Run(). The async method fetches from Blob some string content, then constructs and returns an object.
Do you know how to solve this issue, while maintaining the batch download done by tasks?
I need a list with those FinalWrapperObjects.
Error message
Cannot convert async lamba expression to delegate type
'Func<FinalWrapperObject>'. An async lambda expression may return
void, Task or Task, none of which are convertible to
'Func<FinalWrapperObject>'.
...
List<FinalWrapperObject> finalReturns = new List<FinalWrapperObject>();
List<Task<FinalWrapperObject>> tasks = new List<Task<FinalWrapperObject>>();
var resultsBatch = fetchedObjects.Skip(i).Take(10).ToList();
foreach (var resultBatchItem in resultsBatch)
{
tasks.Add(
new Task<FinalWrapperObject>(async () => //!! errors here on arrow
{
var blobContent = await azureBlobService.GetAsync(resultBatchItem.StoragePath);
return new FinalWrapperObject {
BlobContent = blobContent,
CreationDateTime = resultBatchItem.CreationDateTime
};
})
);
}
FinalWrapperObject[] listFinalWrapperObjects = await Task.WhenAll(tasks);
finalReturns.AddRange(listFinalWrapperObjects);
return finalReturns;
Your code never starts any tasks. Tasks aren't threads anyway. They're a promise that something will complete and maybe produce a value in the future. Some tasks require a thread to run. These are executed using threads that come from a threadpool. Others, eg async IO operations, don't require a thread. Uploading a file is such an IO operation.
Your lambda is asynchronous and already returning a Task so there's no reason to use Task.Run. You can execute it once for all items, collect the Tasks in a list and await all of them. That's the bare-bones way :
async Task<FinalWrapperObject> UploadItemAsync(BatchItem resultBatchItem) =>
{
var blobContent = await azureBlobService.GetAsync(resultBatchItem.StoragePath);
return new FinalWrapperObject {
BlobContent = blobContent,
CreationDateTime = resultBatchItem.CreationDateTime
};
}
...
var tasks=resultsBatch.Select(UploadItemAsync);
var results=await Task.WhenAll(tasks);
Using TPL Dataflow
A better option would be to use the TPL Dataflow classes to upload items concurrently and even construct a pipeline from processing blocks.
var options= new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10
};
var results=new BufferBlock<FinalWrapperObject>();
var uploader=new TransformBlock<BatchItem,FinalWrapperObject>(UploadItemAsync,options);
uploader.LinkTo(results);
foreach(var item in fetchedObjects)
{
uploader.PostAsync(item);
}
uploader.Complete();
await uploader.Completion;
By default, a block only processes one message at a time. Using MaxDegreeOfParallelism = 10 we're telling it to process 10 items concurrently. This code will upload 10 items concurrently at a time, as long as there items to post to the uploader block.
The results are forwarded to the results BufferBlock. The items can be extracted with TryReceiveAll :
IList<FinalWrapperObject> items;
results.TryReceiveAll(out items);
Dataflow blocks can be combined into a pipeline. You could have a block that loads items from disk, another to upload them and a final one that stores the response to another file or database :
var dop10= new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = 10,
BoundedCapacity=4
};
var bounded= new ExecutionDataflowBlockOptions
{
BoundedCapacity=4
};
var loader=new TransformBlock<FileInfo,BatchItem>(LoadFile,bounded);
var uploader=new TransformBlock<BatchItem,FinalWrapperObject>(UploadItemAsync,dop10);
var dbLogger=new ActionBlock<FinalWrapperObject>(bounded);
var linkOptions=new DataflowLinkOptions {PropagateCompletion=true};
loader.LinkTo(uploader,linkOptions);
uploader.LinkTo(dbLogger,linkOptions);
var folder=new DirectoryInfo(rootPath);
foreach(var item in folder.EnumerateFiles())
{
await loader.SendAsync(item);
}
loader.Complete();
await dbLogger.Completion;
In this case, all files in a folder are posted to the loader block which loads files one by one and forwards a BatchItem. The uploader uploads the file and the results are stored by dbLogger. In the end, we tell loader we're finished and wait for all items to get processed all the way to the end with await dbLogger.Completion.
The BoundedCapacity is used to put a limit on how many items can be held at each block's input buffer. This prevents loading all files into memory.

Wait with Read() for Parallel.ForEach [duplicate]

I would like to handle a collection in parallel, but I'm having trouble implementing it and I'm therefore hoping for some help.
The trouble arises if I want to call a method marked async in C#, within the lambda of the parallel loop. For example:
var bag = new ConcurrentBag<object>();
Parallel.ForEach(myCollection, async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
}
var count = bag.Count;
The problem occurs with the count being 0, because all the threads created are effectively just background threads and the Parallel.ForEach call doesn't wait for completion. If I remove the async keyword, the method looks like this:
var bag = new ConcurrentBag<object>();
Parallel.ForEach(myCollection, item =>
{
// some pre stuff
var responseTask = await GetData(item);
responseTask.Wait();
var response = responseTask.Result;
bag.Add(response);
// some post stuff
}
var count = bag.Count;
It works, but it completely disables the await cleverness and I have to do some manual exception handling.. (Removed for brevity).
How can I implement a Parallel.ForEach loop, that uses the await keyword within the lambda? Is it possible?
The prototype of the Parallel.ForEach method takes an Action<T> as parameter, but I want it to wait for my asynchronous lambda.
If you just want simple parallelism, you can do this:
var bag = new ConcurrentBag<object>();
var tasks = myCollection.Select(async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
});
await Task.WhenAll(tasks);
var count = bag.Count;
If you need something more complex, check out Stephen Toub's ForEachAsync post.
You can use the ParallelForEachAsync extension method from AsyncEnumerator NuGet Package:
using Dasync.Collections;
var bag = new ConcurrentBag<object>();
await myCollection.ParallelForEachAsync(async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
}, maxDegreeOfParallelism: 10);
var count = bag.Count;
Disclaimer: I'm the author of the AsyncEnumerator library, which is open source and licensed under MIT, and I'm posting this message just to help the community.
One of the new .NET 6 APIs is Parallel.ForEachAsync, a way to schedule asynchronous work that allows you to control the degree of parallelism:
var urls = new []
{
"https://dotnet.microsoft.com",
"https://www.microsoft.com",
"https://stackoverflow.com"
};
var client = new HttpClient();
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
await Parallel.ForEachAsync(urls, options, async (url, token) =>
{
var targetPath = Path.Combine(Path.GetTempPath(), "http_cache", url);
var response = await client.GetAsync(url);
if (response.IsSuccessStatusCode)
{
using var target = File.OpenWrite(targetPath);
await response.Content.CopyToAsync(target);
}
});
Another example in Scott Hanselman's blog.
The source, for reference.
With SemaphoreSlim you can achieve parallelism control.
var bag = new ConcurrentBag<object>();
var maxParallel = 20;
var throttler = new SemaphoreSlim(initialCount: maxParallel);
var tasks = myCollection.Select(async item =>
{
await throttler.WaitAsync();
try
{
var response = await GetData(item);
bag.Add(response);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
var count = bag.Count;
Simplest possible extension method compiled from other answers and the article referenced by the accepted asnwer:
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> asyncAction, int maxDegreeOfParallelism)
{
var throttler = new SemaphoreSlim(initialCount: maxDegreeOfParallelism);
var tasks = source.Select(async item =>
{
await throttler.WaitAsync();
try
{
await asyncAction(item).ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
}
UPDATE: here's a simple modification that also supports a cancellation token like requested in the comments (untested)
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, CancellationToken, Task> asyncAction, int maxDegreeOfParallelism, CancellationToken cancellationToken)
{
var throttler = new SemaphoreSlim(initialCount: maxDegreeOfParallelism);
var tasks = source.Select(async item =>
{
await throttler.WaitAsync(cancellationToken);
if (cancellationToken.IsCancellationRequested) return;
try
{
await asyncAction(item, cancellationToken).ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
}
My lightweight implementation of ParallelForEach async.
Features:
Throttling (max degree of parallelism).
Exception handling (aggregation exception will be thrown at completion).
Memory efficient (no need to store the list of tasks).
public static class AsyncEx
{
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> asyncAction, int maxDegreeOfParallelism = 10)
{
var semaphoreSlim = new SemaphoreSlim(maxDegreeOfParallelism);
var tcs = new TaskCompletionSource<object>();
var exceptions = new ConcurrentBag<Exception>();
bool addingCompleted = false;
foreach (T item in source)
{
await semaphoreSlim.WaitAsync();
asyncAction(item).ContinueWith(t =>
{
semaphoreSlim.Release();
if (t.Exception != null)
{
exceptions.Add(t.Exception);
}
if (Volatile.Read(ref addingCompleted) && semaphoreSlim.CurrentCount == maxDegreeOfParallelism)
{
tcs.TrySetResult(null);
}
});
}
Volatile.Write(ref addingCompleted, true);
await tcs.Task;
if (exceptions.Count > 0)
{
throw new AggregateException(exceptions);
}
}
}
Usage example:
await Enumerable.Range(1, 10000).ParallelForEachAsync(async (i) =>
{
var data = await GetData(i);
}, maxDegreeOfParallelism: 100);
I've created an extension method for this which makes use of SemaphoreSlim and also allows to set maximum degree of parallelism
/// <summary>
/// Concurrently Executes async actions for each item of <see cref="IEnumerable<typeparamref name="T"/>
/// </summary>
/// <typeparam name="T">Type of IEnumerable</typeparam>
/// <param name="enumerable">instance of <see cref="IEnumerable<typeparamref name="T"/>"/></param>
/// <param name="action">an async <see cref="Action" /> to execute</param>
/// <param name="maxDegreeOfParallelism">Optional, An integer that represents the maximum degree of parallelism,
/// Must be grater than 0</param>
/// <returns>A Task representing an async operation</returns>
/// <exception cref="ArgumentOutOfRangeException">If the maxActionsToRunInParallel is less than 1</exception>
public static async Task ForEachAsyncConcurrent<T>(
this IEnumerable<T> enumerable,
Func<T, Task> action,
int? maxDegreeOfParallelism = null)
{
if (maxDegreeOfParallelism.HasValue)
{
using (var semaphoreSlim = new SemaphoreSlim(
maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value))
{
var tasksWithThrottler = new List<Task>();
foreach (var item in enumerable)
{
// Increment the number of currently running tasks and wait if they are more than limit.
await semaphoreSlim.WaitAsync();
tasksWithThrottler.Add(Task.Run(async () =>
{
await action(item).ContinueWith(res =>
{
// action is completed, so decrement the number of currently running tasks
semaphoreSlim.Release();
});
}));
}
// Wait for all tasks to complete.
await Task.WhenAll(tasksWithThrottler.ToArray());
}
}
else
{
await Task.WhenAll(enumerable.Select(item => action(item)));
}
}
Sample Usage:
await enumerable.ForEachAsyncConcurrent(
async item =>
{
await SomeAsyncMethod(item);
},
5);
In the accepted answer the ConcurrentBag is not required.
Here's an implementation without it:
var tasks = myCollection.Select(GetData).ToList();
await Task.WhenAll(tasks);
var results = tasks.Select(t => t.Result);
Any of the "// some pre stuff" and "// some post stuff" can go into the GetData implementation (or another method that calls GetData)
Aside from being shorter, there's no use of an "async void" lambda, which is an anti pattern.
The following is set to work with IAsyncEnumerable but can be modified to use IEnumerable by just changing the type and removing the "await" on the foreach. It's far more appropriate for large sets of data than creating countless parallel tasks and then awaiting them all.
public static async Task ForEachAsyncConcurrent<T>(this IAsyncEnumerable<T> enumerable, Func<T, Task> action, int maxDegreeOfParallelism, int? boundedCapacity = null)
{
ActionBlock<T> block = new ActionBlock<T>(
action,
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = maxDegreeOfParallelism,
BoundedCapacity = boundedCapacity ?? maxDegreeOfParallelism * 3
});
await foreach (T item in enumerable)
{
await block.SendAsync(item).ConfigureAwait(false);
}
block.Complete();
await block.Completion;
}
For a more simple solution (not sure if the most optimal), you can simply nest Parallel.ForEach inside a Task - as such
var options = new ParallelOptions { MaxDegreeOfParallelism = 5 }
Task.Run(() =>
{
Parallel.ForEach(myCollection, options, item =>
{
DoWork(item);
}
}
The ParallelOptions will do the throttlering for you, out of the box.
I am using it in a real world scenario to run a very long operations in the background. These operations are called via HTTP and it was designed not to block the HTTP call while the long operation is running.
Calling HTTP for long background operation.
Operation starts at the background.
User gets status ID which can be used to check the status using another HTTP call.
The background operation update its status.
That way, the CI/CD call does not timeout because of long HTTP operation, rather it loops the status every x seconds without blocking the process

Retry Failed Tasks in C#.Net -WhenAll Async [duplicate]

I have a solution that creates multiple I/O based tasks and I'm using Task.WhenAny() to manage these tasks. But often many of the tasks will fail due to network issue or request throttling etc.
I can't seem to find a solution that enables me to successfully retry failed tasks when using a Task.WhenAny() approach.
Here is what I'm doing:
var tasks = new List<Task<MyType>>();
foreach(var item in someCollection)
{
task.Add(GetSomethingAsync());
}
while (tasks.Count > 0)
{
var child = await Task.WhenAny(tasks);
tasks.Remove(child);
???
}
So the above structure works for completing tasks, but I haven't found a way to handle and retry failing tasks. The await Task.WhenAny throws an AggregateException rather than allowing me to inspect a task status. When In the exception handler I no longer have any way to retry the failed task.
I believe it would be easier to retry within the tasks, and then replace the Task.WhenAny-in-a-loop antipattern with Task.WhenAll
E.g., using Polly:
var tasks = new List<Task<MyType>>();
var policy = ...; // See Polly documentation
foreach(var item in someCollection)
tasks.Add(policy.ExecuteAsync(() => GetSomethingAsync()));
await Task.WhenAll(tasks);
or, more succinctly:
var policy = ...; // See Polly documentation
var tasks = someCollection.Select(item => policy.ExecuteAsync(() => GetSomethingAsync()));
await Task.WhenAll(tasks);
If you don't want to use the Polly library for some reason, you could use the Retry method bellow. It accepts a task factory, and keeps creating and then awaiting a task until it completes successfully, or the maxAttempts have been reached:
public static async Task<TResult> Retry<TResult>(Func<Task<TResult>> taskFactory,
int maxAttempts)
{
int failedAttempts = 0;
while (true)
{
try
{
var task = taskFactory();
return await task.ConfigureAwait(false);
}
catch
{
failedAttempts++;
if (failedAttempts >= maxAttempts) throw;
}
}
}
You could then use this method to download (for example) some web pages.
string[] urls =
{
"https://stackoverflow.com",
"https://superuser.com",
//"https://no-such.url",
};
var httpClient = new HttpClient();
var tasks = urls.Select(url => Retry(async () =>
{
return (Url: url, Html: await httpClient.GetStringAsync(url));
}, maxAttempts: 5));
var results = await Task.WhenAll(tasks);
foreach (var result in results)
{
Console.WriteLine($"Url: {result.Url}, {result.Html.Length:#,0} chars");
}
Output:
Url: https://stackoverflow.com, 112,276 chars
Url: https://superuser.com, 122,784 chars
If you uncomment the third url then instead of these results an HttpRequestException will be thrown, after five failed attempts.
The Task.WhenAll method will wait for the completion of all tasks before propagating the error. In case it is preferable to report the error as soon as possible, you can find solutions in this question.

C# ForEach Loop With ASync Tasks & Dependent Post ASync Tasks

I'm having trouble trying to correctly architect the most efficient way to iterate several async tasks launched from a request object and then performing some other async tasks that depend on both the request object and the result of the first async task. I'm running a C# lambda function in AWS. I've tried a model like this (error handling and such has been omitted for brevity):
public async Task MyAsyncWrapper()
{
List<Task> Tasks = new List<Task>();
foreach (var Request in Requests)
{
var Continuation = this.ExecuteAsync(Request).ContinueWith(async x => {
var KeyValuePair<bool, string> Result = x.Result;
if (Result.Key == true)
{
await this.DoSomethingElseAsync(Request.Id, Request.Name, Result.Value);
Console.WriteLine("COMPLETED");
}
}
Tasks.Add(Continuation);
}
Task.WaitAll(Tasks.ToArray());
}
This approach results in the DoSomethingElseAsync() method not really getting awaited on and in a lot of my Lambda Function calls, I never get the "COMPLETED" output. I've also approached this in this method:
public async Task MyAsyncWrapper()
{
foreach (var Request in Requests)
{
KeyValuePair<bool, string> Result = await this.ExecuteAsync(Request);
if (Result.Key == true)
{
await this.DoSomethingElseAsync(Request.Id, Request.Name, Result.Value);
Console.WriteLine("COMPLETED");
}
}
}
This works, but I think it's wasteful, since I can only execute one iteration of the loop while waiting on the asnyc's to finish. I also have referenced Interleaved Tasks but the issue is that I basically have two loops, one to populate the tasks, and another to iterate them after they've completed, where I don't have access to the original Request object anymore. So basically this:
List<Task<KeyValuePair<bool, string>>> Tasks = new List<Task<KeyValuePair<bool, string>>>();
foreach (var Request in Requests)
{
Tasks.Add(ths.ExecuteAsync(Request);
}
foreach (Task<KeyValuePair<bool, string>> ResultTask in Tasks.Interleaved())
{
KeyValuePair<bool, string> Result = ResultTask.Result;
//Can't access the original request for this method's parameters
await this.DoSomethingElseAsync(???, ???, Result.Value);
}
Any ideas on better ways to implement this type of async chaining in a foreach loop? My ideal approach wouldn't be to return the request object back as part of the response from ExecuteAsync(), so I'd like to try and find other options if possible.
I may be misinterpreting, but why not move your "iteration" into it's own function and then use Task.WhenAll to wait for all iterations in parallel.
public async Task MyAsyncWrapper()
{
var allTasks = Requests.Select(ProcessRequest);
await Task.WhenAll(allTasks);
}
private async Task ProcessRequest(Request request)
{
KeyValuePair<bool, string> Result = await this.ExecuteAsync(request);
if (Result.Key == true)
{
await this.DoSomethingElseAsync(request.Id, request.Name, Result.Value);
Console.WriteLine("COMPLETED");
}
}
Consider using TPL dataflow:
var a = new TransformBlock<Input, OutputA>(async Input i=>
{
// do something async.
return new OutputA();
});
var b = new TransformBlock<OutputA, OutputB>(async OutputA i =>
{
// do more async.
return new OutputB();
});
var c = new ActionBlock<OutputB>(async OutputB i =>
{
// do some final async.
});
a.LinkTo(b, new DataflowLinkOptions { PropogateCompletion = true });
b.LinkTo(c, new DataflowLinkOptions { PropogateCompletion = true });
// push all of the items into the dataflow.
a.Post(new Input());
a.Complete();
// wait for it all to complete.
await c.Completion;

Asynchronous Parallel threads

I want to run a function asynchronously in parallel for all the objects in a collection.
Here's the current code snippet.
Parallel.ForEach(Objs,
new ParallelOptions { MaxDegreeOfParallelism = 10 },
item =>
{
DoSomething(someParameter, item);
});
How can I achieve this is with Async-Parallel Threads
It's not clear what you mean by "asynchronously in parallel," since asynchrony (using fewer threads) is kind of the opposite of parallel (using more threads).
But if you mean that you want to run the asynchronous method DoSomething concurrently, then that can be done via Task.WhenAll:
var tasks = Objs.Select(item => DoSomething(someParamter, item));
await Task.WhenAll(tasks);
If you want to limit the degree of concurrency (e.g., to 10), then you can use a SemaphoreSlim:
var semaphore = new SemaphoreSlim(10);
var tasks = Objs.Select(item =>
{
await semaphore.WaitAsync();
try { await DoSomething(someParameter, item); }
finally { semaphore.Release(); }
});
await Task.WhenAll(tasks);

Categories

Resources