I'm wondering if there is a neat way for IDataflowBlock.Completion to replace needing to use a cancellation token for ReceiveAsync or a similar method which consumes from BufferBlock or another IDataflowBlock.
DataflowBlock.ReceiveAsync<TOutput>(ISourceBlock<TOutput>, TimeSpan, CancellationToken)
If InputQueue is a BufferBlock:
BufferBlock<String> InputQueue = new BufferBlock<String>();
for (int i = 0; i < 26; i++)
{
await InputQueue.SendAsync(((char)(97 + i)).ToString());
}
If InputQueue.Complete() has been called, then once the queue is emptied, IDataflowBlock.Completion changes to status RanToCompletion, which can be checked with IDataflowBlock.Completion.IsCompleted.
If multiple threads are taking from the queue, this could happen during InputQueue.ReceiveAsync. Is there a neater alternative for handling InputQueue completing than:
try
{
String parcel = await InputQueue.ReceiveAsync(timeSpan);
}
catch(InvalidOperationException x)
{
}
The simplest way to cancel a Dataflow block is to pass the token to the block's constructor, like this:
var options = new ExecutionDataflowBlockOptions
{
CancellationToken = cancellationSource.Token
};
CancellationToken is defined in the DataflowBlockOptions base class, so even a BufferBlock can be canceled.
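A minimal sketch of that idea, assuming the System.Threading.Tasks.Dataflow package is referenced: canceling the token passed through DataflowBlockOptions moves a BufferBlock's Completion into the Canceled state, so consumers can observe shutdown through Completion. CancellableBufferDemo and RunAsync are illustrative names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class CancellableBufferDemo
{
    // Returns the final status of the block's Completion task after the token is canceled.
    public static async Task<TaskStatus> RunAsync()
    {
        using var cancellationSource = new CancellationTokenSource();

        // CancellationToken lives on DataflowBlockOptions, so a plain BufferBlock accepts it too.
        var inputQueue = new BufferBlock<string>(new DataflowBlockOptions
        {
            CancellationToken = cancellationSource.Token
        });

        await inputQueue.SendAsync("a");
        cancellationSource.Cancel();

        try
        {
            await inputQueue.Completion;
        }
        catch (OperationCanceledException)
        {
            // Expected: a canceled block completes in the Canceled state.
        }

        return inputQueue.Completion.Status;
    }
}
```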
Why are you implementing the Receive logic yourself? Is there some restriction preventing you from linking your blocks with PropagateCompletion? For example, if your code looks like this:
internal async Task HandleMessage()
{
try
{
var parcel = await InputQueue.ReceiveAsync(timeSpan);
// handle parcel
}
catch(InvalidOperationException x)
{
}
}
Then you can simply use an ActionBlock like this:
var InputQueue = new BufferBlock<string>();
var Handler = new ActionBlock<string>(parcel =>
{
// handle parcel
});
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
InputQueue.LinkTo(Handler, linkOptions);
// now after you call Complete method for InputQueue the completion will be propagated to your Handler block:
for (int i = 0; i < 26; i++)
{
await InputQueue.SendAsync(((char)(97 + i)).ToString());
}
InputQueue.Complete();
await Handler.Completion;
Also note that if you need some interaction with the UI, you can expose your last block as an IObservable<T> (DataflowBlock.AsObservable, which works on source blocks) and consume it with the Rx.NET library.
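If linking really isn't an option and items must be pulled manually, a different technique from the ones above is to loop on OutputAvailableAsync, which returns false once the block has completed and been drained, instead of catching InvalidOperationException. With several consumers, TryReceive can still lose the race for an item, so its result is checked. A sketch assuming the System.Threading.Tasks.Dataflow package; DrainDemo and ConsumeAsync are hypothetical names.

```csharp
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class DrainDemo
{
    // One consumer: waits for data and exits cleanly when the queue is completed and empty.
    private static async Task ConsumeAsync(BufferBlock<string> inputQueue, ConcurrentBag<string> handled)
    {
        while (await inputQueue.OutputAvailableAsync().ConfigureAwait(false))
        {
            // Another consumer may have taken the item first, so TryReceive can return false.
            if (inputQueue.TryReceive(out string parcel))
            {
                handled.Add(parcel);
            }
        }
    }

    // Starts several consumers, feeds 26 items, completes the queue and returns how many were handled.
    public static async Task<int> RunAsync(int consumerCount)
    {
        var inputQueue = new BufferBlock<string>();
        var handled = new ConcurrentBag<string>();

        var consumers = Enumerable.Range(0, consumerCount)
            .Select(_ => ConsumeAsync(inputQueue, handled))
            .ToArray();

        for (int i = 0; i < 26; i++)
        {
            await inputQueue.SendAsync(((char)('a' + i)).ToString());
        }
        inputQueue.Complete();

        await Task.WhenAll(consumers);
        return handled.Count;
    }
}
```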
I have a bunch of requests to process, some of which may complete synchronously.
I'd like to gather all results that are immediately available and return them early, while waiting for the rest.
Roughly like this:
List<Task<Result>> tasks = new ();
List<Result> results = new ();
foreach (var request in myRequests) {
var task = request.ProcessAsync();
if (task.IsCompleted)
results.Add(task.Result); // or Add(await task) ?
else
tasks.Add(task);
}
// send results that are available "immediately" while waiting for the rest
if (results.Count > 0) SendResults(results);
var remainingResults = await Task.WhenAll(tasks);
SendResults(remainingResults);
I'm not sure whether relying on IsCompleted might be a bad idea; could there be situations where its result cannot be trusted, or where it may change back to false again, etc.?
Similarly, could it be dangerous to use task.Result even after checking IsCompleted? Should one always prefer await task? What if we were using ValueTask instead of Task?
I'm not sure whether relying on IsCompleted might be a bad idea; could there be situations where its result cannot be trusted...
If you're in a multithreaded context, it's possible that IsCompleted could return false at the moment when you check on it, but it completes immediately thereafter. In cases like the code you're using, the cost of this happening would be very low, so I wouldn't worry about it.
or where it may change back to false again, etc.?
No, once a Task completes, it cannot uncomplete.
could it be dangerous to use task.Result even after checking IsCompleted.
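Nope, it won't block once IsCompleted is true. Just keep in mind that if the task faulted or was canceled, .Result throws an AggregateException; check IsCompletedSuccessfully if you want to avoid that.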
should one always prefer await task?
await is a great default when you don't have a specific reason to do something else, but there are a variety of use cases where other patterns might be useful. The use case you've highlighted is a good example, where you want to return the results of finished tasks without awaiting all of them.
As Stephen Cleary mentioned in a comment below, it may still be worthwhile to use await to maintain expected exception behavior. You might consider doing something more like this:
// start each request once, then group the tasks (not the requests) by completion state
var tasksByIsCompleted = myRequests
.Select(r => r.ProcessAsync())
.ToLookup(t => t.IsCompleted);
// send results that are available "immediately" while waiting for the rest
SendResults(await Task.WhenAll(tasksByIsCompleted[true]));
SendResults(await Task.WhenAll(tasksByIsCompleted[false]));
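To illustrate the exception-behavior point: for a task that has already faulted, .Result wraps the failure in an AggregateException, while await rethrows the original exception. ExceptionShapeDemo and its methods are illustrative names.

```csharp
using System;
using System.Threading.Tasks;

public static class ExceptionShapeDemo
{
    // Reads the outcome through .Result and reports which exception type surfaced.
    public static string ViaResult(Task<int> task)
    {
        try
        {
            return "value " + task.Result;
        }
        catch (Exception ex)
        {
            return ex.GetType().Name;
        }
    }

    // Reads the same outcome through await.
    public static async Task<string> ViaAwaitAsync(Task<int> task)
    {
        try
        {
            return "value " + await task;
        }
        catch (Exception ex)
        {
            return ex.GetType().Name;
        }
    }
}
```

With Task.FromException<int>(new InvalidOperationException()), ViaResult reports AggregateException while ViaAwaitAsync reports InvalidOperationException, even though the task is already complete in both cases.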
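What if we were using ValueTask instead of Task?
The answers above mostly carry over, with one extra rule: a ValueTask may be consumed only once. Check IsCompleted (or IsCompletedSuccessfully), then either read .Result or await it, never both and never twice, and call AsTask() before handing it to Task.WhenAll.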
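A sketch of the question's "send what's ready, then the rest" split for ValueTask<T>, assuming the ValueTasks are produced up front: each one is consumed exactly once, either by reading .Result on the fast path or by converting it with AsTask(). ValueTaskSplit and SendInTwoBatchesAsync are hypothetical names, and sendResults stands in for the question's SendResults.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class ValueTaskSplit
{
    public static async Task SendInTwoBatchesAsync<T>(
        IEnumerable<ValueTask<T>> valueTasks, Action<IReadOnlyList<T>> sendResults)
    {
        var immediate = new List<T>();
        var pending = new List<Task<T>>();

        foreach (var valueTask in valueTasks)
        {
            if (valueTask.IsCompletedSuccessfully)
            {
                // Fast path: the value is already there; read it once.
                immediate.Add(valueTask.Result);
            }
            else
            {
                // Slow path: convert once and never touch this ValueTask again.
                pending.Add(valueTask.AsTask());
            }
        }

        // Send what is available "immediately" before waiting for the rest.
        if (immediate.Count > 0)
        {
            sendResults(immediate);
        }
        sendResults(await Task.WhenAll(pending));
    }
}
```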
You could use code like this to continually send the results of completed tasks while waiting on others to complete.
var tasks = new List<Task<Result>>();
foreach (var request in myRequests)
{
tasks.Add(request.ProcessAsync());
}
// wait for at least one task to be complete, then send all available results
while (tasks.Count > 0)
{
// wait (without blocking the thread) for at least one task to complete
await Task.WhenAny(tasks);
// snapshot the completed tasks once, so exactly the same set is sent and then removed
var completedTasks = tasks.Where(t => t.IsCompleted).ToList();
var results = completedTasks.Where(t => t.IsCompletedSuccessfully).Select(t => t.Result).ToList();
SendResults(results);
// TODO: handle completed but failed tasks here
// remove completed tasks from the tasks list and keep waiting
tasks.RemoveAll(t => completedTasks.Contains(t));
}
Using only await you can achieve the desired behavior:
async Task ProcessAsync(MyRequest request, Sender sender)
{
var result = await request.ProcessAsync();
await sender.SendAsync(result);
}
...
async Task ProcessAll()
{
var tasks = new List<Task>();
foreach(var request in requests)
{
var task = ProcessAsync(request, sender);
// Don't await until all requests are queued up
tasks.Add(task);
}
// Await on all outstanding requests
await Task.WhenAll(tasks);
}
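A self-contained version of the same idea, with a hypothetical Func<int, Task<string>> standing in for request.ProcessAsync and a ConcurrentQueue standing in for the sender: each result is "sent" as soon as its own request finishes, and the caller only awaits once at the end. SendAsCompletedDemo and its methods are illustrative names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class SendAsCompletedDemo
{
    // Processes one request and sends its result as soon as it is ready.
    private static async Task ProcessAndSendAsync(int request, Func<int, Task<string>> process, ConcurrentQueue<string> sent)
    {
        var result = await process(request);
        sent.Enqueue(result);
    }

    // Starts every request first (no await in the loop), then awaits them all.
    public static async Task<string[]> ProcessAllAsync(IEnumerable<int> requests, Func<int, Task<string>> process)
    {
        var sent = new ConcurrentQueue<string>();
        var tasks = requests.Select(r => ProcessAndSendAsync(r, process, sent)).ToList();
        await Task.WhenAll(tasks);
        return sent.ToArray(); // results in the order they were sent
    }
}
```

A request whose work completes synchronously (for example Task.Delay(0)) is sent immediately, before slower requests that were started earlier.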
There are already good answers, but in addition to them, here is my suggestion on how to handle multiple tasks and process each one differently; maybe it will suit your needs. My example uses events, but you can replace them with whatever kind of state management fits your needs.
public interface IRequestHandler
{
event Func<object, Task> Ready;
Task ProcessAsync();
}
public class RequestHandler : IRequestHandler
{
// Here is where you wrap your request:
// private object request;
private readonly int value;
public RequestHandler(int value)
=> this.value = value;
public event Func<object, Task> Ready;
public async Task ProcessAsync()
{
await Task.Delay(1000 * this.value);
// Here is where you call:
// var result = await request.ProcessAsync();
// ...then do something with the result, or wrap the call in try/catch, for example
var result = $"RequestHandler {this.value} - [{DateTime.Now.ToLongTimeString()}]";
if (this.Ready is not null)
{
// If the result is valid, send it to all subscribers
await this.Ready.Invoke(result);
}
}
}
static async Task Main()
{
var a = new RequestHandler(1);
a.Ready += PrintAsync;
var b = new RequestHandler(2);
b.Ready += PrintAsync;
var c = new RequestHandler(3);
c.Ready += PrintAsync;
var d = new RequestHandler(4);
d.Ready += PrintAsync;
var e = new RequestHandler(5);
e.Ready += PrintAsync;
var f = new RequestHandler(6);
f.Ready += PrintAsync;
var requests = new List<IRequestHandler>()
{
a, b, c, d, e, f
};
var tasks = requests
.Select(x => Task.Run(x.ProcessAsync));
// Here you must await all of the tasks
await Task.WhenAll(tasks);
}
static Task PrintAsync(object output)
{
Console.WriteLine(output);
return Task.CompletedTask;
}
I would like to handle a collection in parallel, but I'm having trouble implementing it and I'm therefore hoping for some help.
The trouble arises if I want to call a method marked async in C#, within the lambda of the parallel loop. For example:
var bag = new ConcurrentBag<object>();
Parallel.ForEach(myCollection, async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
});
var count = bag.Count;
The problem is that the count is 0: the async lambda is passed where an Action<T> is expected, so it becomes an async void delegate, and Parallel.ForEach returns as soon as each delegate reaches its first await instead of waiting for it to finish. If I remove the async keyword, the method looks like this:
var bag = new ConcurrentBag<object>();
Parallel.ForEach(myCollection, item =>
{
// some pre stuff
var responseTask = GetData(item);
responseTask.Wait();
var response = responseTask.Result;
bag.Add(response);
// some post stuff
});
var count = bag.Count;
It works, but it completely disables the await cleverness, and I have to do some manual exception handling (removed for brevity).
How can I implement a Parallel.ForEach loop, that uses the await keyword within the lambda? Is it possible?
The prototype of the Parallel.ForEach method takes an Action<T> as parameter, but I want it to wait for my asynchronous lambda.
If you just want simple parallelism, you can do this:
var bag = new ConcurrentBag<object>();
var tasks = myCollection.Select(async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
});
await Task.WhenAll(tasks);
var count = bag.Count;
If you need something more complex, check out Stephen Toub's ForEachAsync post.
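For reference, a sketch along the lines of the partition-based ForEachAsync from that post (a reconstruction, not a verbatim copy): the source is split into dop partitions and each partition is drained sequentially by one task, which caps concurrency at dop. PartitionedForEach is an illustrative class name.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class PartitionedForEach
{
    // At most `dop` bodies run at once: one sequential loop per partition.
    public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
    {
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(dop)
            select Task.Run(async () =>
            {
                using (partition)
                {
                    while (partition.MoveNext())
                    {
                        await body(partition.Current);
                    }
                }
            }));
    }
}
```

Exceptions thrown by body fault the owning partition's task and surface from the returned Task.WhenAll.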
You can use the ParallelForEachAsync extension method from AsyncEnumerator NuGet Package:
using Dasync.Collections;
var bag = new ConcurrentBag<object>();
await myCollection.ParallelForEachAsync(async item =>
{
// some pre stuff
var response = await GetData(item);
bag.Add(response);
// some post stuff
}, maxDegreeOfParallelism: 10);
var count = bag.Count;
Disclaimer: I'm the author of the AsyncEnumerator library, which is open source and licensed under MIT, and I'm posting this message just to help the community.
One of the new .NET 6 APIs is Parallel.ForEachAsync, a way to schedule asynchronous work that allows you to control the degree of parallelism:
var urls = new []
{
"https://dotnet.microsoft.com",
"https://www.microsoft.com",
"https://stackoverflow.com"
};
var client = new HttpClient();
var options = new ParallelOptions { MaxDegreeOfParallelism = 2 };
var cacheDir = Path.Combine(Path.GetTempPath(), "http_cache");
Directory.CreateDirectory(cacheDir);
await Parallel.ForEachAsync(urls, options, async (url, token) =>
{
// a full URL is not a valid file name, so use the host name
var targetPath = Path.Combine(cacheDir, new Uri(url).Host);
using var response = await client.GetAsync(url, token);
if (response.IsSuccessStatusCode)
{
using var target = File.OpenWrite(targetPath);
await response.Content.CopyToAsync(target, token);
}
});
Another example is on Scott Hanselman's blog.
The source, for reference.
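A network-free sketch to see the MaxDegreeOfParallelism cap in action, assuming .NET 6 or later: Task.Delay stands in for the HTTP call, and ForEachAsyncThrottleDemo / MaxObservedConcurrencyAsync are illustrative names. It records the highest number of bodies running at the same time.

```csharp
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ForEachAsyncThrottleDemo
{
    public static async Task<int> MaxObservedConcurrencyAsync(int itemCount, int maxDegreeOfParallelism)
    {
        int running = 0, maxRunning = 0;
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };

        await Parallel.ForEachAsync(Enumerable.Range(0, itemCount), options, async (item, token) =>
        {
            int now = Interlocked.Increment(ref running);

            // Record the peak with a compare-and-swap loop.
            int seen;
            while (now > (seen = Volatile.Read(ref maxRunning)) &&
                   Interlocked.CompareExchange(ref maxRunning, now, seen) != seen)
            {
            }

            await Task.Delay(10, token); // stand-in for real I/O such as client.GetAsync(url, token)
            Interlocked.Decrement(ref running);
        });

        return maxRunning;
    }
}
```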
With SemaphoreSlim you can achieve parallelism control.
var bag = new ConcurrentBag<object>();
var maxParallel = 20;
var throttler = new SemaphoreSlim(initialCount: maxParallel);
var tasks = myCollection.Select(async item =>
{
await throttler.WaitAsync();
try
{
var response = await GetData(item);
bag.Add(response);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
var count = bag.Count;
Simplest possible extension method compiled from other answers and the article referenced by the accepted answer:
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> asyncAction, int maxDegreeOfParallelism)
{
var throttler = new SemaphoreSlim(initialCount: maxDegreeOfParallelism);
var tasks = source.Select(async item =>
{
await throttler.WaitAsync();
try
{
await asyncAction(item).ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
}
UPDATE: here's a simple modification that also supports a cancellation token, as requested in the comments (untested)
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, CancellationToken, Task> asyncAction, int maxDegreeOfParallelism, CancellationToken cancellationToken)
{
var throttler = new SemaphoreSlim(initialCount: maxDegreeOfParallelism);
var tasks = source.Select(async item =>
{
await throttler.WaitAsync(cancellationToken);
try
{
// check inside the try block so the slot is released even when bailing out
cancellationToken.ThrowIfCancellationRequested();
await asyncAction(item, cancellationToken).ConfigureAwait(false);
}
finally
{
throttler.Release();
}
});
await Task.WhenAll(tasks);
}
My lightweight implementation of ParallelForEach async.
Features:
Throttling (max degree of parallelism).
Exception handling (an AggregateException is thrown at completion).
Memory efficient (no need to store the list of tasks).
public static class AsyncEx
{
public static async Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> asyncAction, int maxDegreeOfParallelism = 10)
{
var semaphoreSlim = new SemaphoreSlim(maxDegreeOfParallelism);
var tcs = new TaskCompletionSource<object>(TaskCreationOptions.RunContinuationsAsynchronously);
var exceptions = new ConcurrentBag<Exception>();
// Starts at 1 so the count can't reach 0 while the source is still being enumerated
int pending = 1;
void OnActionDone()
{
if (Interlocked.Decrement(ref pending) == 0)
{
tcs.TrySetResult(null);
}
}
foreach (T item in source)
{
await semaphoreSlim.WaitAsync();
Interlocked.Increment(ref pending);
asyncAction(item).ContinueWith(t =>
{
// Record the failure before signalling, so it is visible once the last action finishes
if (t.Exception != null)
{
exceptions.Add(t.Exception);
}
semaphoreSlim.Release();
OnActionDone();
}, TaskScheduler.Default);
}
// Drop the initial count; this also completes tcs for an empty source
// or when every action already finished during enumeration
OnActionDone();
await tcs.Task;
if (!exceptions.IsEmpty)
{
throw new AggregateException(exceptions);
}
}
}
Usage example:
await Enumerable.Range(1, 10000).ParallelForEachAsync(async (i) =>
{
var data = await GetData(i);
}, maxDegreeOfParallelism: 100);
I've created an extension method for this which makes use of SemaphoreSlim and also allows setting the maximum degree of parallelism:
/// <summary>
/// Concurrently executes async actions for each item of <see cref="IEnumerable{T}"/>
/// </summary>
/// <typeparam name="T">Type of the items in the IEnumerable</typeparam>
/// <param name="enumerable">instance of <see cref="IEnumerable{T}"/></param>
/// <param name="action">an async <see cref="Func{T, TResult}"/> to execute</param>
/// <param name="maxDegreeOfParallelism">Optional. An integer that represents the maximum degree of parallelism;
/// must be greater than 0</param>
/// <returns>A Task representing an async operation</returns>
/// <exception cref="ArgumentOutOfRangeException">If maxDegreeOfParallelism is less than 1</exception>
public static async Task ForEachAsyncConcurrent<T>(
this IEnumerable<T> enumerable,
Func<T, Task> action,
int? maxDegreeOfParallelism = null)
{
if (maxDegreeOfParallelism.HasValue)
{
if (maxDegreeOfParallelism.Value < 1)
{
throw new ArgumentOutOfRangeException(nameof(maxDegreeOfParallelism));
}
using (var semaphoreSlim = new SemaphoreSlim(
maxDegreeOfParallelism.Value, maxDegreeOfParallelism.Value))
{
var tasksWithThrottler = new List<Task>();
foreach (var item in enumerable)
{
// Increment the number of currently running tasks and wait if they are more than limit.
await semaphoreSlim.WaitAsync();
tasksWithThrottler.Add(Task.Run(async () =>
{
try
{
await action(item);
}
finally
{
// action is completed (or failed), so decrement the number of currently running tasks
semaphoreSlim.Release();
}
}));
}
// Wait for all tasks to complete; exceptions thrown by action surface here.
await Task.WhenAll(tasksWithThrottler);
}
}
else
{
await Task.WhenAll(enumerable.Select(item => action(item)));
}
}
Sample Usage:
await enumerable.ForEachAsyncConcurrent(
async item =>
{
await SomeAsyncMethod(item);
},
5);
In the accepted answer the ConcurrentBag is not required.
Here's an implementation without it:
var tasks = myCollection.Select(GetData).ToList();
await Task.WhenAll(tasks);
var results = tasks.Select(t => t.Result);
Any of the "// some pre stuff" and "// some post stuff" can go into the GetData implementation (or another method that calls GetData)
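Aside from being shorter, it needs no async lambda at all, so there's no risk of accidentally ending up with an "async void" lambda (as happens with Parallel.ForEach in the question), which is an anti-pattern.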
The following is set to work with IAsyncEnumerable but can be modified to use IEnumerable by just changing the type and removing the "await" on the foreach. It's far more appropriate for large sets of data than creating countless parallel tasks and then awaiting them all.
public static async Task ForEachAsyncConcurrent<T>(this IAsyncEnumerable<T> enumerable, Func<T, Task> action, int maxDegreeOfParallelism, int? boundedCapacity = null)
{
ActionBlock<T> block = new ActionBlock<T>(
action,
new ExecutionDataflowBlockOptions
{
MaxDegreeOfParallelism = maxDegreeOfParallelism,
BoundedCapacity = boundedCapacity ?? maxDegreeOfParallelism * 3
});
await foreach (T item in enumerable)
{
await block.SendAsync(item).ConfigureAwait(false);
}
block.Complete();
await block.Completion;
}
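As described above, switching to IEnumerable<T> mostly means changing the parameter type and using a plain foreach. A sketch assuming the System.Threading.Tasks.Dataflow package, placed in a hypothetical ActionBlockForEach class; the only other change is stopping the loop when SendAsync returns false, which happens when the block no longer accepts items (for example after a fault).

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ActionBlockForEach
{
    public static async Task ForEachAsyncConcurrent<T>(this IEnumerable<T> enumerable, Func<T, Task> action, int maxDegreeOfParallelism, int? boundedCapacity = null)
    {
        var block = new ActionBlock<T>(
            action,
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = maxDegreeOfParallelism,
                // Bounded input: SendAsync waits instead of buffering the whole source.
                BoundedCapacity = boundedCapacity ?? maxDegreeOfParallelism * 3
            });

        // Plain foreach instead of await foreach.
        foreach (T item in enumerable)
        {
            if (!await block.SendAsync(item).ConfigureAwait(false))
            {
                break; // the block stopped accepting items
            }
        }

        block.Complete();
        // Rethrows any exception thrown by action.
        await block.Completion.ConfigureAwait(false);
    }
}
```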
For a more simple solution (not sure if the most optimal), you can simply nest Parallel.ForEach inside a Task - as such
var options = new ParallelOptions { MaxDegreeOfParallelism = 5 };
Task.Run(() =>
{
Parallel.ForEach(myCollection, options, item =>
{
DoWork(item);
});
});
The ParallelOptions will do the throttling for you, out of the box.
I am using it in a real-world scenario to run very long operations in the background. These operations are called via HTTP, and it was designed not to block the HTTP call while the long operation is running:
Calling HTTP for long background operation.
Operation starts at the background.
User gets status ID which can be used to check the status using another HTTP call.
The background operation updates its status.
That way, the CI/CD call doesn't time out because of a long HTTP operation; instead, it polls the status every x seconds without blocking the process.
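The status-ID flow described above could look roughly like this in-memory sketch. BackgroundJobs, Start and GetStatus are hypothetical names; a real service would persist the status and expose Start/GetStatus through HTTP endpoints, which are left out here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class BackgroundJobs
{
    // Hypothetical in-memory status store keyed by job ID.
    private readonly ConcurrentDictionary<Guid, string> _status = new ConcurrentDictionary<Guid, string>();

    // What the "start" call would do: kick off the work and return an ID immediately.
    public Guid Start<T>(IReadOnlyList<T> items, Action<T> doWork, int maxDegreeOfParallelism)
    {
        var id = Guid.NewGuid();
        _status[id] = "Running";

        _ = Task.Run(() =>
        {
            try
            {
                var options = new ParallelOptions { MaxDegreeOfParallelism = maxDegreeOfParallelism };
                Parallel.ForEach(items, options, doWork);
                _status[id] = "Completed";
            }
            catch (Exception ex)
            {
                _status[id] = "Failed: " + ex.GetBaseException().Message;
            }
        });

        return id;
    }

    // What the "status" call would do: the client polls this every x seconds.
    public string GetStatus(Guid id) => _status.TryGetValue(id, out var status) ? status : "Unknown";
}
```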
I have a C# WinForms (.NET 4.5.2) app utilizing the TPL. The tool has a synchronous function which is passed over to a task factory X amount of times (with different input parameters), where X is a number declared by the user before commencing the process. The tasks are started and stored in a List<Task>.
Assuming the user entered 5, we have this in an async button click handler:
for (int i = 0; i < X; i++)
{
var progress = Progress(); // returns a new IProgress<T>
var task = Task<int>.Factory.StartNew(() => MyFunction(progress), TaskCreationOptions.LongRunning);
TaskList.Add(task);
}
Each progress instance updates the UI.
Now, as soon as a task is finished, I want to fire up a new one. Essentially, the process should run indefinitely, having X tasks running at any given time, unless the user cancels via the UI (I'll use cancellation tokens for this). I try to achieve this using the following:
while (TaskList.Count > 0)
{
var completed = await Task.WhenAny(TaskList.ToArray());
if (completed.Exception == null)
{
// report success
}
else
{
// flatten AggregateException, print out, etc
}
// update some labels/textboxes in the UI, and then:
TaskList.Remove(completed);
var task = Task<int>.Factory.StartNew(() => MyFunction(progress), TaskCreationOptions.LongRunning);
TaskList.Add(task);
}
This is bogging down the UI. Is there a better way of achieving this functionality, while keeping the UI responsive?
A suggestion was made in the comments to use TPL Dataflow, but due to time constraints and specs, alternative solutions are welcome.
Update
I'm not sure whether the progress reporting might be the problem? Here's what it looks like:
private IProgress<string> Progress()
{
return new Progress<string>(msg =>
{
txtMsg.AppendText(msg);
});
}
Now, as soon as a task is finished, I want to fire up a new one. Essentially, the process should run indefinitely, having X tasks running at any given time
It sounds to me like you want an infinite loop inside your task:
for (int i = 0; i < X; i++)
{
var progress = Progress(); // returns a new IProgress<T>
var task = RunIndefinitelyAsync(progress);
TaskList.Add(task);
}
private async Task RunIndefinitelyAsync(IProgress<string> progress)
{
while (true)
{
try
{
await Task.Run(() => MyFunction(progress));
// handle success
}
catch (Exception ex)
{
// handle exceptions
}
// update some labels/textboxes in the UI
}
}
However, I suspect that the "bogging down the UI" is probably in the // handle success and/or // handle exceptions code. If my suspicion is correct, then push as much of the logic into the Task.Run as possible.
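Since the question plans to use cancellation tokens, here is a hedged sketch of the same worker loop that stops when a token is canceled. The synchronous MyFunction is replaced by a delegate parameter, and WorkerLoop / RunUntilCanceledAsync are hypothetical names.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WorkerLoop
{
    // Runs `work` back-to-back on the thread pool until the token is canceled.
    // Returns how many runs completed successfully.
    public static async Task<int> RunUntilCanceledAsync(Func<CancellationToken, int> work, CancellationToken token)
    {
        int completedRuns = 0;
        while (!token.IsCancellationRequested)
        {
            try
            {
                // The heavy synchronous work stays off the UI thread.
                await Task.Run(() => work(token), token);
                completedRuns++;
            }
            catch (OperationCanceledException) when (token.IsCancellationRequested)
            {
                break;
            }
            catch (Exception)
            {
                // report the failure here, then keep looping
            }
            // In a UI app, code here resumes on the UI context, so keep it short (label updates only).
        }
        return completedRuns;
    }
}
```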
As I understand it, you simply need parallel execution with a defined degree of parallelism. There are a lot of ways to implement what you want. I suggest using a BlockingCollection and the Parallel class instead of tasks.
So when the user clicks the button, you need to create a new blocking collection, which will be your data source:
BlockingCollection<IProgress<string>> queue = new BlockingCollection<IProgress<string>>();
CancellationTokenSource source = new CancellationTokenSource();
Now you need a runner that will execute your work in parallel:
Task.Factory.StartNew(() =>
Parallel.For(0, X, i =>
{
foreach (IProgress<string> p in queue.GetConsumingEnumerable(source.Token))
{
MyFunction(p);
}
}), source.Token);
Or you can take a more correct approach with a partitioner, for which you'll need a partitioner class:
private class BlockingPartitioner<T> : Partitioner<T>
{
private readonly BlockingCollection<T> _Collection;
private readonly CancellationToken _Token;
public BlockingPartitioner(BlockingCollection<T> collection, CancellationToken token)
{
_Collection = collection;
_Token = token;
}
public override IList<IEnumerator<T>> GetPartitions(int partitionCount)
{
throw new NotImplementedException();
}
public override IEnumerable<T> GetDynamicPartitions()
{
return _Collection.GetConsumingEnumerable(_Token);
}
public override bool SupportsDynamicPartitions
{
get { return true; }
}
}
And the runner will look like this:
ParallelOptions Options = new ParallelOptions();
Options.MaxDegreeOfParallelism = X;
Task.Factory.StartNew(
() => Parallel.ForEach(
new BlockingPartitioner<IProgress<string>>(queue, source.Token),
Options,
p => MyFunction(p)));
So all you need right now is to fill queue with necessary data. You can do it whenever you want.
As a final touch, when the user cancels the operation, you have two options:
first, you can break execution with a source.Cancel() call,
or you can gracefully stop execution by marking collection complete (queue.CompleteAdding), in that case runner will execute all already queued data and finish.
Of course you need additional code to handle exceptions, progress, state and so on. But main idea is here.
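A small sketch of the graceful-stop option with the Parallel.For runner described above, using a plain Action<T> in place of MyFunction; BlockingQueueRunner and StartConsumers are hypothetical names. CompleteAdding lets the consumers drain what is already queued and then finish; canceling the token instead stops them early and surfaces as an exception on the returned task rather than a clean finish.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class BlockingQueueRunner
{
    // Starts `workerCount` consumer loops that process items until the queue is
    // marked complete (graceful stop) or the token is canceled (hard stop).
    public static Task StartConsumers<T>(BlockingCollection<T> queue, int workerCount, Action<T> process, CancellationToken token)
    {
        return Task.Factory.StartNew(() =>
            Parallel.For(0, workerCount, _ =>
            {
                foreach (T item in queue.GetConsumingEnumerable(token))
                {
                    process(item);
                }
            }),
            token, TaskCreationOptions.LongRunning, TaskScheduler.Default);
    }
}
```

Usage: start the runner, Add items whenever they arrive, and call queue.CompleteAdding() when no more work will come.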
At least as I've implemented it in my code, I had to modify the StartNew Task to get the same behavior. In my View there's a start button. Its IsEnabled property is bound to a Boolean in the View Model. Without adding await task.ContinueWith(_ => true); and moving return true; out of the try block, the PopulateListStartNew Task doesn't wait, so the button stays enabled. I prefer to use Task.Factory.StartNew because passing a TaskScheduler makes for more readable code (no Dispatcher clutter). Records is an ObservableCollection.
I thought that Task.Run was basically a shortcut for Task.Factory.StartNew (per Task.Run vs Task.Factory.StartNew). At any rate, I'd like to better understand the difference in behavior and would certainly appreciate any suggestions for making my example code better.
public async Task<bool> PopulateListTaskRun(CancellationToken cancellationToken)
{
try
{
await Task.Run(async () =>
{
// Clear the records out first, if any
Application.Current.Dispatcher.InvokeAsync(() => Records.Clear());
for (var i = 0; i < 10; i++)
{
if (cancellationToken.IsCancellationRequested)
{
return;
}
// Resharper says do this to avoid "Access to modified closure"
var i1 = i;
Application.Current.Dispatcher.InvokeAsync(() =>
{
Records.Add(new Model
{
Name = NamesList[i1],
Number = i1
});
Status = "cur: " +
i1.ToString(
CultureInfo.InvariantCulture);
});
// Artificial delay so we can see what's going on
await Task.Delay(200);
}
Records[0].Name = "Yes!";
}, cancellationToken);
return true;
}
catch (Exception)
{
return false;
}
}
public async Task<bool> PopulateListStartNew(CancellationToken cancellationToken, TaskScheduler taskScheduler)
{
try
{
var task = await Task.Factory.StartNew(async () =>
{
// Clear the records out first, if any
Records.Clear();
for (var i = 0; i < 10; i++)
{
if (cancellationToken.IsCancellationRequested)
{
return;
}
Records.Add(new Model
{
Name = NamesList[i],
Number = i
});
Status = "cur: " +
i.ToString(
CultureInfo.InvariantCulture);
// Artificial delay so we can see what's going on
await Task.Delay(200);
}
Records[0].Name = "Yes!";
}, cancellationToken, TaskCreationOptions.None, taskScheduler);
// Had to add this
await task.ContinueWith(_ => true);
}
catch (Exception)
{
return false;
}
// Had to move this out of try block
return true;
}
The link you posted in your question has the answer: Task.Run understands and unwraps async Task delegates, while StartNew returns a Task<Task> instead, which you have to unwrap yourself by calling Unwrap or doing a double-await.
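A small experiment showing the Task<Task> behavior just described; StartNewUnwrapDemo is an illustrative name and the 500 ms delay is arbitrary. Awaiting StartNew's result once only waits for the outer task, while Unwrap() (or a second await) waits for the async work itself.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class StartNewUnwrapDemo
{
    public static async Task<(bool OuterAwaitWaitedForWork, bool UnwrappedAwaitWaitedForWork)> CompareAsync()
    {
        bool firstDone = false;
        // An async lambda makes StartNew return Task<Task>.
        Task<Task> outer = Task.Factory.StartNew(async () =>
        {
            await Task.Delay(500);
            firstDone = true;
        }, CancellationToken.None, TaskCreationOptions.None, TaskScheduler.Default);

        // The outer task completes as soon as the delegate reaches its first await.
        Task inner = await outer;
        bool outerAwaitWaitedForWork = firstDone;
        await inner; // this second await is what actually waits for the work

        bool secondDone = false;
        // Unwrap() returns a proxy that completes with the inner task (Task.Run does this for you).
        await Task.Factory.StartNew(async () =>
        {
            await Task.Delay(500);
            secondDone = true;
        }, CancellationToken.None, TaskCreationOptions.None, TaskScheduler.Default).Unwrap();

        return (outerAwaitWaitedForWork, secondDone);
    }
}
```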
However, I recommend you completely rewrite the code as follows. Notes:
Don't use Dispatcher. There shouldn't be a need for it with properly-written async code.
Treat all your background worker methods and asynchronous operations as "services" for your UI thread. So your method will return to the UI context periodically as necessary.
Like this:
public async Task<bool> PopulateListTaskRunAsync(CancellationToken cancellationToken)
{
try
{
// Clear the records out first, if any
Records.Clear();
for (var i = 0; i < 10; i++)
{
cancellationToken.ThrowIfCancellationRequested();
Records.Add(new Model
{
Name = NamesList[i],
Number = i
});
Status = "cur: " + i.ToString(CultureInfo.InvariantCulture);
// Artificial delay so we can see what's going on
await Task.Delay(200);
}
Records[0].Name = "Yes!";
return true;
}
catch (Exception)
{
return false;
}
}
I'm not too comfortable with all this plumbing either, but I'll try to answer.
First, why your second code does not work:
you give StartNew an async delegate, which is something like Func<Task>, so StartNew returns a Task<Task>; the outer task you await completes as soon as the delegate hands back the inner Task (I think)
then you await the continuation of the inner task, i.e. the inner flow of execution, which is what you intended; but I guess it would be the same if you awaited the inner task itself directly, this way:
await await Task.Factory.StartNew(async ...
Why your first code works:
according to the MSDN documentation, Task.Run has overloads for async delegates (Func<Task>) that return a proxy for the inner task, so it unwraps it for you
so you directly await the inner task, not an intermediate one, and it just works as expected
At least this is my understanding and keep in mind I've not yet played with all this stuff (no VS 2012). :)
Given the following:
BufferBlock<int> sourceBlock = new BufferBlock<int>();
TransformBlock<int, int> targetBlock = new TransformBlock<int, int>(element =>
{
return element * 2;
});
sourceBlock.LinkTo(targetBlock, new DataflowLinkOptions { PropagateCompletion = true });
//feed some elements into the buffer block
for(int i = 1; i <= 1000000; i++)
{
sourceBlock.SendAsync(i);
}
sourceBlock.Complete();
targetBlock.Completion.ContinueWith(_ =>
{
//notify completion of the target block
});
The targetBlock never seems to complete and I think the reason is that all the items in the TransformBlock targetBlock are waiting in the output queue as I have not linked the targetBlock to any other Dataflow block. However, what I actually want to achieve is a notification when (A) the targetBlock is notified of completion AND (B) the input queue is empty. I do not want to care whether items still sit in the output queue of the TransformBlock. How can I go about that? Is the only way to get what I want to query the completion status of the sourceBlock AND to make sure the InputCount of the targetBlock is zero? I am not sure this is very stable (is the sourceBlock truly only marked completed if the last item in the sourceBlock has been passed to the targetBlock?). Is there a more elegant and more efficient way to get to the same goal?
Edit: I just noticed even the "dirty" way to check on completion of the sourceBlock AND InputCount of the targetBlock being zero is not trivial to implement. Where would that block sit? It cannot be within the targetBlock because once above two conditions are met obviously no message is processed within targetBlock anymore. Also checking on the completion status of the sourceBlock introduces a lot of inefficiency.
I believe you can't directly do this. It's possible you could get this information from some private fields using reflection, but I wouldn't recommend doing that.
But you can do this by creating custom blocks. In the case of Complete() it's simple: just create a block that forwards each method to the original block, except Complete(), which also logs the call.
To figure out when processing of all items is complete, you could link your block to an intermediate BufferBlock. This way, the output queue is emptied quickly, so checking the Completion of the inner block gives you a fairly accurate measurement of when processing is complete. This would affect your measurements, but hopefully not significantly.
Another option would be to add some logging at the end of the block's delegate. This way, you could see when processing of the last item was finished.
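The intermediate-BufferBlock idea can be sketched with the question's own blocks, assuming the System.Threading.Tasks.Dataflow package; ProcessingCompletedDemo is an illustrative name. Because targetBlock's output queue is drained into the buffer, targetBlock.Completion finishes once every item has been transformed, even though nothing has consumed the results yet.

```csharp
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class ProcessingCompletedDemo
{
    public static async Task<(bool TransformCompleted, int OutputCount)> RunAsync(int count)
    {
        var sourceBlock = new BufferBlock<int>();
        var targetBlock = new TransformBlock<int, int>(element => element * 2);
        // Intermediate buffer that drains targetBlock's output queue, so targetBlock can complete.
        var outputBuffer = new BufferBlock<int>();

        sourceBlock.LinkTo(targetBlock, new DataflowLinkOptions { PropagateCompletion = true });
        targetBlock.LinkTo(outputBuffer); // completion is deliberately not propagated here

        for (int i = 1; i <= count; i++)
        {
            await sourceBlock.SendAsync(i);
        }
        sourceBlock.Complete();

        // Completes once all items are processed and handed over to outputBuffer.
        await targetBlock.Completion;
        bool transformCompleted = targetBlock.Completion.IsCompleted;

        // Drain the buffer afterwards, only to count what was produced.
        outputBuffer.Complete();
        var outputs = new List<int>();
        while (await outputBuffer.OutputAvailableAsync())
        {
            while (outputBuffer.TryReceive(out int value))
            {
                outputs.Add(value);
            }
        }
        return (transformCompleted, outputs.Count);
    }
}
```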
It would be nice if the TransformBlock had a ProcessingCompleted event that would fire when the block has completed the processing of all messages in its queue, but there is no such event. Below is an attempt to rectify this omission. The CreateTransformBlockEx method accepts an Action<Exception> handler, that is invoked when this "event" occurs.
The intention was to always invoke the handler before the final completion of the block. Unfortunately, if the supplied CancellationToken is canceled, the completion (cancellation) happens first, and the handler is invoked some milliseconds later. Fixing this inconsistency would require some tricky workarounds and might have other unwanted side effects, so I am leaving it as is.
public static IPropagatorBlock<TInput, TOutput>
CreateTransformBlockEx<TInput, TOutput>(Func<TInput, Task<TOutput>> transform,
Action<Exception> onProcessingCompleted,
ExecutionDataflowBlockOptions dataflowBlockOptions = null)
{
if (onProcessingCompleted == null)
throw new ArgumentNullException(nameof(onProcessingCompleted));
dataflowBlockOptions = dataflowBlockOptions ?? new ExecutionDataflowBlockOptions();
var transformBlock = new TransformBlock<TInput, TOutput>(transform,
dataflowBlockOptions);
var bufferBlock = new BufferBlock<TOutput>(dataflowBlockOptions);
transformBlock.LinkTo(bufferBlock);
PropagateCompletion(transformBlock, bufferBlock, onProcessingCompleted);
return DataflowBlock.Encapsulate(transformBlock, bufferBlock);
async void PropagateCompletion(IDataflowBlock block1, IDataflowBlock block2,
Action<Exception> completionHandler)
{
try
{
await block1.Completion.ConfigureAwait(false);
}
catch { }
var exception =
block1.Completion.IsFaulted ? block1.Completion.Exception : null;
try
{
// Invoke the handler before completing the second block
completionHandler(exception);
}
finally
{
if (exception != null) block2.Fault(exception); else block2.Complete();
}
}
}
// Overload with synchronous lambda
public static IPropagatorBlock<TInput, TOutput>
CreateTransformBlockEx<TInput, TOutput>(Func<TInput, TOutput> transform,
Action<Exception> onProcessingCompleted,
ExecutionDataflowBlockOptions dataflowBlockOptions = null)
{
return CreateTransformBlockEx<TInput, TOutput>(
x => Task.FromResult(transform(x)), onProcessingCompleted,
dataflowBlockOptions);
}
The code of the local function PropagateCompletion mimics the source code of the LinkTo built-in method, when invoked with the PropagateCompletion = true option.
Usage example:
var httpClient = new HttpClient();
var downloader = CreateTransformBlockEx<string, string>(async url =>
{
return await httpClient.GetStringAsync(url);
}, onProcessingCompleted: ex =>
{
Console.WriteLine($"Download completed {(ex == null ? "OK" : "Error")}");
}, new ExecutionDataflowBlockOptions()
{
MaxDegreeOfParallelism = 10
});
First, it is not right to use an IPropagatorBlock as a leaf (terminal) block. Still, your requirement can be met by asynchronously checking the target block's output buffer for messages and consuming them, so that the buffer is emptied.
BufferBlock<int> sourceBlock = new BufferBlock<int>();
TransformBlock<int, int> targetBlock = new TransformBlock<int, int>
(element =>
{
return element * 2;
});
sourceBlock.LinkTo(targetBlock, new DataflowLinkOptions {
PropagateCompletion = true });
//feed some elements into the buffer block
for (int i = 1; i <= 100; i++)
{
await sourceBlock.SendAsync(i);
}
sourceBlock.Complete();
bool isOutputAvailable = await targetBlock.OutputAvailableAsync();
while(isOutputAvailable)
{
int value = await targetBlock.ReceiveAsync();
isOutputAvailable = await targetBlock.OutputAvailableAsync();
}
await targetBlock.Completion.ContinueWith(_ =>
{
Console.WriteLine("Target Block Completed");//notify completion of the target block
});