TPL Dataflow, guarantee completion only when ALL source data blocks completed


How can I rewrite the code so that it completes only when BOTH transform blocks have completed? I thought completion meant that a block is marked complete AND its output queue is empty.
public Test()
{
broadCastBlock = new BroadcastBlock<int>(i =>
{
return i;
});
transformBlock1 = new TransformBlock<int, string>(i =>
{
Console.WriteLine("1 input count: " + transformBlock1.InputCount);
Thread.Sleep(50);
return ("1_" + i);
});
transformBlock2 = new TransformBlock<int, string>(i =>
{
Console.WriteLine("2 input count: " + transformBlock2.InputCount);
Thread.Sleep(20);
return ("2_" + i);
});
processorBlock = new ActionBlock<string>(i =>
{
Console.WriteLine(i);
});
//Linking
broadCastBlock.LinkTo(transformBlock1, new DataflowLinkOptions { PropagateCompletion = true });
broadCastBlock.LinkTo(transformBlock2, new DataflowLinkOptions { PropagateCompletion = true });
transformBlock1.LinkTo(processorBlock, new DataflowLinkOptions { PropagateCompletion = true });
transformBlock2.LinkTo(processorBlock, new DataflowLinkOptions { PropagateCompletion = true });
}
public void Start()
{
const int numElements = 100;
for (int i = 1; i <= numElements; i++)
{
broadCastBlock.SendAsync(i);
}
//mark completion
broadCastBlock.Complete();
processorBlock.Completion.Wait();
Console.WriteLine("Finished");
Console.ReadLine();
}
}
I edited the code, adding an input buffer count for each transform block. Clearly all 100 items are streamed to each of the transform blocks. But as soon as one of the transform blocks finishes, the processor block does not accept any more items, and the transform block that is still running just flushes its input buffer.

The issue is exactly what casperOne said in his answer. Once the first transform block completes, the processor block goes into “finishing mode”: it will process remaining items in its input queue, but it won't accept any new items.
There is a simpler fix than splitting your processor block in two though: don't set PropagateCompletion, but instead set completion of the processor block manually when both transform blocks complete:
Task.WhenAll(transformBlock1.Completion, transformBlock2.Completion)
.ContinueWith(_ => processorBlock.Complete());
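As a minimal, self-contained sketch of this approach (the class name FanInCompletionSketch, the method RunAsync, and the shortened sleep times are assumptions made for illustration, not part of the original code), the whole pipeline could look like this. Note that this version does not forward faults from the transform blocks; it only completes the processor block.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical demo class: two transform blocks feed one processor block.
public static class FanInCompletionSketch
{
    // Returns every string that reached the processor block.
    public static async Task<List<string>> RunAsync(int numElements)
    {
        var received = new List<string>();

        var broadcast = new BroadcastBlock<int>(i => i);
        var transform1 = new TransformBlock<int, string>(i => { Thread.Sleep(5); return "1_" + i; });
        var transform2 = new TransformBlock<int, string>(i => { Thread.Sleep(2); return "2_" + i; });
        // Default MaxDegreeOfParallelism is 1, so the list is only touched by one call at a time.
        var processor = new ActionBlock<string>(s => received.Add(s));

        // Completion is propagated automatically only on the fan-out side.
        var propagate = new DataflowLinkOptions { PropagateCompletion = true };
        broadcast.LinkTo(transform1, propagate);
        broadcast.LinkTo(transform2, propagate);

        // Fan-in side: link WITHOUT PropagateCompletion ...
        transform1.LinkTo(processor);
        transform2.LinkTo(processor);

        // ... and complete the processor only after BOTH sources have finished.
        _ = Task.WhenAll(transform1.Completion, transform2.Completion)
            .ContinueWith(t => processor.Complete());

        for (int i = 1; i <= numElements; i++)
        {
            await broadcast.SendAsync(i);
        }
        broadcast.Complete();

        await processor.Completion;
        return received;
    }
}
```

Calling FanInCompletionSketch.RunAsync(100) and awaiting the result should yield the output of both transform blocks, since the processor block keeps accepting items until the slower block is done.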

The issue here is the combination of two things: you set the PropagateCompletion property each time you call the LinkTo method to link the blocks, and the wait times in your transformation blocks differ.
From the documentation for the Complete method on the IDataflowBlock interface (emphasis mine):
Signals to the IDataflowBlock that it should not accept nor produce any more messages nor consume any more postponed messages.
Because you stagger out your wait times in each of the TransformBlock<TInput, TOutput> instances, transformBlock2 (waiting for 20 ms) is finished before transformBlock1 (waiting for 50 ms). transformBlock2 completes first, and then sends the signal to processorBlock which then says "I'm not accepting anything else" (and transformBlock1 hasn't produced all of its messages yet).
Note that transformBlock2 finishing before transformBlock1 is not absolutely guaranteed; it's feasible that the thread pool (assuming you're using the default scheduler) will process the tasks in a different order (but more than likely it will not, as it will steal work from the queues once the 20 ms items are done).
Your pipeline looks like this:
         broadcastBlock
          /          \
transformBlock1    transformBlock2
          \          /
         processorBlock
In order to get around this, you want to have a pipeline that looks like this:
         broadcastBlock
          /          \
transformBlock1    transformBlock2
       |                  |
processorBlock1    processorBlock2
Which is accomplished by just creating two separate ActionBlock<TInput> instances, like so:
// The action, can be a method, makes it easier to share.
Action<string> a = i => Console.WriteLine(i);
// Create the processor blocks.
processorBlock1 = new ActionBlock<string>(a);
processorBlock2 = new ActionBlock<string>(a);
// Linking
broadCastBlock.LinkTo(transformBlock1,
new DataflowLinkOptions { PropagateCompletion = true });
broadCastBlock.LinkTo(transformBlock2,
new DataflowLinkOptions { PropagateCompletion = true });
transformBlock1.LinkTo(processorBlock1,
new DataflowLinkOptions { PropagateCompletion = true });
transformBlock2.LinkTo(processorBlock2,
new DataflowLinkOptions { PropagateCompletion = true });
You then need to wait on both processor blocks instead of just one:
Task.WhenAll(processorBlock1.Completion, processorBlock2.Completion).Wait();
A very important note here; when creating an ActionBlock<TInput>, the default is to have the MaxDegreeOfParallelism property on the ExecutionDataflowBlockOptions instance passed to it set to one.
This means that the calls to the Action<T> delegate that you pass to the ActionBlock<TInput> are thread-safe, only one will execute at a time.
Because you now have two ActionBlock<TInput> instances pointing to the same Action<T> delegate, you aren't guaranteed thread-safety.
If your method is thread-safe, then you don't have to do anything (which would allow you to set the MaxDegreeOfParallelism property to DataflowBlockOptions.Unbounded, since there's no reason to block).
If it's not thread-safe, and you need to guarantee it, you need to resort to traditional synchronization primitives, like the lock statement.
In this case, you'd do it like so (although it's clearly not needed, as the WriteLine method on the Console class is thread-safe):
// The lock.
var l = new object();
// The action, can be a method, makes it easier to share.
Action<string> a = i => {
// Ensure one call at a time.
lock (l) Console.WriteLine(i);
};
// And so on...

An addition to svick's answer: to be consistent with the behaviour you get with the PropagateCompletion option, you also need to forward exceptions in case a preceding block faulted. An extension method like the following takes care of that as well:
public static void CompleteWhenAll(this IDataflowBlock target, params IDataflowBlock[] sources) {
if (target == null) return;
if (sources.Length == 0) { target.Complete(); return; }
Task.Factory.ContinueWhenAll(
sources.Select(b => b.Completion).ToArray(),
tasks => {
var exceptions = (from t in tasks where t.IsFaulted select t.Exception).ToList();
if (exceptions.Count != 0) {
target.Fault(new AggregateException(exceptions));
} else {
target.Complete();
}
}
);
}

Here is a method that is functionally equivalent to pkt's CompleteWhenAll method, but with slightly less code:
public static void PropagateCompletion(IDataflowBlock[] sources,
IDataflowBlock target)
{
// Arguments validation omitted
Task allSourcesCompletion = Task.WhenAll(sources.Select(s => s.Completion));
ThreadPool.QueueUserWorkItem(async _ =>
{
try { await allSourcesCompletion.ConfigureAwait(false); } catch { }
Exception exception = allSourcesCompletion.IsFaulted ?
allSourcesCompletion.Exception : null;
if (exception is null) target.Complete(); else target.Fault(exception);
});
}
Usage example:
PropagateCompletion(new[] { transformBlock1, transformBlock2 }, processorBlock);
The PropagateCompletion method is a variant of a more general method with the same name, that I have posted here.

Other answers are quite clear about why PropagateCompletion = true messes things up when a block has two or more sources.
To provide a simple solution to the problem, you may want to look at the open source library DataflowEx, which solves this kind of problem with smarter completion rules built in. (It uses TPL Dataflow linking internally but supports complex completion propagation. The implementation looks similar to WhenAll but also handles links that are added dynamically. Please check Dataflow.RegisterDependency() and TaskEx.AwaitableWhenAll() for implementation details.)
I slightly changed your code to make everything work using DataflowEx:
public CompletionDemo1()
{
broadCaster = new BroadcastBlock<int>(
i =>
{
return i;
}).ToDataflow();
transformBlock1 = new TransformBlock<int, string>(
i =>
{
Console.WriteLine("1 input count: " + transformBlock1.InputCount);
Thread.Sleep(50);
return ("1_" + i);
});
transformBlock2 = new TransformBlock<int, string>(
i =>
{
Console.WriteLine("2 input count: " + transformBlock2.InputCount);
Thread.Sleep(20);
return ("2_" + i);
});
processor = new ActionBlock<string>(
i =>
{
Console.WriteLine(i);
}).ToDataflow();
/** rather than TPL linking
broadCastBlock.LinkTo(transformBlock1, new DataflowLinkOptions { PropagateCompletion = true });
broadCastBlock.LinkTo(transformBlock2, new DataflowLinkOptions { PropagateCompletion = true });
transformBlock1.LinkTo(processorBlock, new DataflowLinkOptions { PropagateCompletion = true });
transformBlock2.LinkTo(processorBlock, new DataflowLinkOptions { PropagateCompletion = true });
**/
//Use DataflowEx linking
var transform1 = transformBlock1.ToDataflow();
var transform2 = transformBlock2.ToDataflow();
broadCaster.LinkTo(transform1);
broadCaster.LinkTo(transform2);
transform1.LinkTo(processor);
transform2.LinkTo(processor);
}
Full code is here.
Disclaimer: I am the author of DataflowEx, which is published under MIT license.

Related

Can not run TPL Dataflow pipeline

I am trying to create a pipeline using TPL Dataflow where I can store messages in a batch block, and whenever its threshold is hit it would send the data to an action block. I have added a buffer block in case the action block is too slow.
So far I have tried all possible methods to move data from the first block to the second, to no avail. I have linked the blocks and added DataflowLinkOptions with PropagateCompletion set to true. What else do I have to do in order for this pipeline to work?
Pipeline
class LogPipeline<T>
{
private ActionBlock<T[]> actionBlock;
private BufferBlock<T> bufferBlock;
private BatchBlock<T> batchBlock;
private readonly Action<T[]> action;
private readonly int BufferSize;
private readonly int BatchSize;
public LogPipeline(Action<T[]> action, int bufferSize = 4, int batchSize = 2)
{
this.BufferSize = bufferSize;
this.BatchSize = batchSize;
this.action = action;
}
private void Initialize()
{
this.bufferBlock = new BufferBlock<T>(new DataflowBlockOptions
{ TaskScheduler = TaskScheduler.Default,
BoundedCapacity = this.BufferSize });
this.actionBlock = new ActionBlock<T[]>(this.action);
this.batchBlock = new BatchBlock<T>(BatchSize);
this.bufferBlock.LinkTo(this.batchBlock, new DataflowLinkOptions
{ PropagateCompletion = true });
this.batchBlock.LinkTo(this.actionBlock, new DataflowLinkOptions
{ PropagateCompletion = true });
}
public void Post(T log)
{
this.bufferBlock.Post(log);
}
public void Start()
{
this.Initialize();
}
public void Stop()
{
actionBlock.Complete();
}
}
Test
[TestCase(100, 1000, 5)]
public void CanBatchPipelineResults(int batchSize, int bufferSize, int cycles)
{
List<int> data = new List<int>();
LogPipeline<int> logPipeline = new LogPipeline<int>(
batchSize: batchSize,
bufferSize: bufferSize,
action: (logs) =>
{
data.AddRange(logs);
});
logPipeline.Start();
int SelectWithEffect(int element)
{
logPipeline.Post(element);
return 3;
}
int count = 0;
while (true)
{
if (count++ > cycles)
{
break;
}
var sent = Parallel.For(0, bufferSize, (x) => SelectWithEffect(x));
}
logPipeline.Stop();
Assert.IsTrue(data.Count == cycles * batchSize);
}
Why are all my blocks empty besides the buffer? I have tried with SendAsync also, to no avail. No data is moved from the first block to the next no matter what I do.
I have tried both with and without the link options.
Update :
I have completely erased the pipeline and also the Parallel.
I have tried with all kinds of input blocks (batch/buffer/transform) and it seems there is no way subsequent blocks are getting something.
I have also tried with await SendAsync as well as Post.
I have only tried within unit tests classes.
Could this be the issue ?
Update 2
I was wrong to complicate things, so I have tried a simpler example. Inside a test case even this doesn't work:
List<int> items=new List<int>();
var tf=new TransformBlock<int,int>(x=>x+1);
var action= new ActionBlock<int>(x=>items.Add(x));
tf.LinkTo(action, new DataflowLinkOptions{ PropagateCompletion=true});
tf.Post(3);
//Breakpoint here

The reason nothing seems to happen before the test ends is that none of the blocks has a chance to run. The code blocks all CPUs by using Parallel.For, so no other task has a chance to run. This means that all posted messages are still in the first block. The code then calls Complete on the last block but doesn't even wait for it to finish processing before checking the results.
The code can be simplified a lot. For starters, all blocks have input buffers, they don't need extra buffering.
The pipeline could be replaced with just this :
//Arrange
var list=new List<int>();
var head=new BatchBlock<int>(BatchSize);
var act=new ActionBlock<int[]>(nums=>list.AddRange(nums));
var options= new DataflowLinkOptions{ PropagateCompletion = true };
//Pass the options, otherwise completion never reaches the action block
head.LinkTo(act, options);
//ACT
//Just fire everything at once, because why not
var tasks=Enumerable.Range(0,cycles)
.Select(i=>Task.Run(()=> head.Post(i)));
await Task.WhenAll(tasks);
//Tell the head block we're done
head.Complete();
//Wait for the last block to complete
await act.Completion;
//ASSERT
Assert.Equal(cycles, list.Count);
There's no real need to create a complex class to encapsulate the pipeline. It doesn't "start": the blocks do nothing if they have no data. To abstract it, one only needs to provide access to the head block and the last block's Completion task.
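A thin abstraction along those lines might look like the following sketch. The class name BatchPipelineSketch and its members Input and Completion are hypothetical names chosen for illustration; only the Dataflow types and calls come from the library.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical wrapper: exposes only the head block and the tail's Completion.
public sealed class BatchPipelineSketch<T>
{
    private readonly BatchBlock<T> head;
    private readonly ActionBlock<T[]> tail;

    public BatchPipelineSketch(Action<T[]> action, int batchSize)
    {
        head = new BatchBlock<T>(batchSize);
        tail = new ActionBlock<T[]>(action);
        // Completing the head flushes any partial batch and then completes the tail.
        head.LinkTo(tail, new DataflowLinkOptions { PropagateCompletion = true });
    }

    // Callers post to the head and call Complete() on it when they are done.
    public ITargetBlock<T> Input => head;

    // Callers await this to know that every batch has been processed.
    public Task Completion => tail.Completion;
}
```

A caller would post items to Input, call Input.Complete() once no more data will arrive, and then await Completion before asserting on the results.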

By calling logPipeline.Stop immediately after sending the data to the BufferBlock, you are completing the ActionBlock, and so it declines all messages that the BatchBlock is trying later to send to it. From the documentation of the ActionBlock.Complete method:
Signals to the dataflow block that it shouldn't accept or produce any more messages and shouldn't consume any more postponed messages.
Update: Regarding the updated requirements in the question:
Whenever its threshold is hit it would send the data to an action block.
...my suggestion is to move this logic inside the LogPipeline.Post method. The method BufferBlock.Post returns false if the block hasn't accepted the data sent to it.
public void Post(T log)
{
if (!this.bufferBlock.Post(log)) this.actionBlock.Post(log);
}

C# using DataflowBlock.Completion to cancel consumer tasks instead of CancellationToken

I'm wondering if there is a neat way for IDataflowBlock.Completion to replace needing to use a cancellation token for ReceiveAsync or a similar method which consumes from BufferBlock or another IDataflowBlock.
IDataflowBlock.ReceiveAsync<T>(TimeSpan, CancellationToken)
If InputQueue is a BufferBlock:
BufferBlock<String> InputQueue
for (int i = 0; i < 26; i++)
{
await InputQueue.SendAsync(((char)(97 + i)).ToString());
}
If InputQueue.Complete() has been called, then once the queue is emptied, IDataflowBlock.Completion will change to status RanToCompletion,
which can be checked with IDataflowBlock.Completion.IsCompleted.
If multiple threads are taking from the queue this could happen during InputQueue.ReceiveAsync, is there a neater alternative to handle InputQueue completing than:
try
{
String parcel = await InputQueue.ReceiveAsync(timeSpan);
}
catch(InvalidOperationException x)
{
}

The simplest way to cancel a dataflow block is to pass a token in the options given to the block's constructor, like this:
new ExecutionDataflowBlockOptions
{
CancellationToken = cancellationSource.Token
});
CancellationToken is defined in DataflowBlockOptions class, so even BufferBlock could be canceled.
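To make the token approach concrete, here is a minimal sketch using a BufferBlock. The class name CancellationSketch and the method RunAsync are hypothetical; the sketch assumes the documented behavior that a canceled block discards its buffered items and that its Completion task ends in the Canceled state.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical demo class for canceling a BufferBlock through its options.
public static class CancellationSketch
{
    // Returns true if the block's Completion task ended up canceled.
    public static async Task<bool> RunAsync()
    {
        using var cts = new CancellationTokenSource();
        var queue = new BufferBlock<string>(
            new DataflowBlockOptions { CancellationToken = cts.Token });

        queue.Post("a");   // buffered, never consumed in this sketch
        cts.Cancel();      // request cancellation of the block

        try
        {
            await queue.Completion;
            return false;  // completed normally, which is not expected here
        }
        catch (OperationCanceledException)
        {
            return queue.Completion.IsCanceled;
        }
    }
}
```

Consumers that await the block's Completion task then observe the cancellation as an OperationCanceledException.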
Why are you implementing the Receive logic yourself? Is there some restriction that prevents you from using PropagateCompletion when linking your blocks? For example, if your code looks like this:
internal async Task HandleMessage()
{
try
{
var parcel = await InputQueue.ReceiveAsync(timeSpan);
// handle parcel
}
catch(InvalidOperationException x)
{
}
}
Then you simply may use the ActionBlock like this:
var InputQueue = new BufferBlock<string>();
var Handler = new ActionBlock<string>(parcel =>
{
// handle parcel
});
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
InputQueue.LinkTo(Handler, linkOptions);
// now after you call Complete method for InputQueue the completion will be propagated to your Handler block:
for (int i = 0; i < 26; i++)
{
await InputQueue.SendAsync(((char)(97 + i)).ToString());
}
InputQueue.Complete();
await Handler.Completion;
Also note that if you need some interaction with UI, you may use your last block as IObservable with Rx.Net library.

Propagating completion

I have a very basic, linear pipeline, in which I'd like to propagate completion and wait until everything completes:
static void Main(string[] args)
{
ExecutePipeline().Wait();
}
static async Task ExecutePipeline()
{
var addBlock = new TransformBlock<int, int>(x =>
{
var result = x + 2;
Console.WriteLine(result);
return result;
});
var subBlock = new TransformBlock<int, int>(x =>
{
var result = x - 2;
Console.WriteLine(result);
return result;
});
var mulBlock = new TransformBlock<int, int>(x =>
{
var result = x * 2;
Console.WriteLine(result);
return result;
});
var divBlock = new TransformBlock<int, int>(x =>
{
var result = x / 2;
Console.WriteLine(result);
return result;
});
var flowOptions = new DataflowLinkOptions { PropagateCompletion = true };
addBlock.LinkTo(mulBlock, flowOptions);
mulBlock.LinkTo(subBlock, flowOptions);
subBlock.LinkTo(divBlock, flowOptions);
addBlock.Post(4);
addBlock.Complete();
mulBlock.Complete();
subBlock.Complete();
await divBlock.Completion;
}
Unfortunately, in its current state, only the result of addBlock gets printed and the program terminates, instead of printing all of the results before termination.
If I comment out all of the lines which call Complete() on their blocks, or if I leave only addBlock.Complete() uncommented, I get a printout of all results in the pipeline, but the program never ends, since the completion is not propagated. However, if I uncomment either mulBlock.Complete() or subBlock.Complete(), the program behaves like the default code: it prints out the result of addBlock and terminates.
What's interesting is that uncommenting either of those two last mentioned blocks or all of them has the same behavior, which makes me question how the completion propagates if one of them is commented. Obviously, I'm missing something in the logic, but I just can't figure out what it is. How would I accomplish the desired behavior of printing all of the results?
EDIT:
So, I finally found something that worked for me at https://stackoverflow.com/a/26803579/2006048
It appears that I needed to change the last block of code to simply this:
addBlock.Post(4);
addBlock.Complete();
await addBlock.Completion;
The original code did not work because Complete() was called on each block before data could propagate, so it was a case of a race condition.
However, this new edited code calls Complete() on addBlock and awaits its completion. This makes the program work as intended, but leaves me even more confused. Why is it that Completion must be awaited on addBlock and not on the last block in the chain, which is divBlock? I would think that Complete() is only called on addBlock because PropagateCompletion is set to true, but then I would expect to wait for completion of the last block, not the first one.
If I await for the completion of mulBlock, then only the results of addBlock get printed. If I await for the completion of subBlock, the results of addBlock and mulBlock get printed. If I await for completion of divBlock, the results of addBlock, mulBlock and subBlock get printed.
I was basing my code on Stephen Cleary's Concurrency in C# Cookbook example (Section 4.1 Linking Blocks (Page 48)):
var multiplyBlock = new TransformBlock<int, int>(item => item * 2);
var subtractBlock = new TransformBlock<int, int>(item => item - 2);
var options = new DataflowLinkOptions { PropagateCompletion = true };
multiplyBlock.LinkTo(subtractBlock, options);
...
// The first block's completion is automatically propagated to the second block.
multiplyBlock.Complete();
await subtractBlock.Completion;
When I setup Cleary's code to match what I have, the same behavior is exhibited. Program prints result and terminates only when I await for multiplyBlock.Completion.

The problem is that a block completes only after all its queues are emptied, which includes the output queue. What happens in your case is that the completion propagates correctly, but then divBlock gets stuck in the "almost complete" mode, waiting for the item in its output queue to be removed.
To solve this, you can either change divBlock to be an ActionBlock, or you can link it to a DataflowBlock.NullTarget<int>().
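The NullTarget option could be sketched as follows. The class name NullTargetSketch, the method RunAsync, and the local helper Step are hypothetical; results are recorded in a queue instead of being written to the console so that they can be inspected.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical demo class: a linear chain whose last TransformBlock is drained.
public static class NullTargetSketch
{
    // Returns the intermediate results in the order they were produced.
    public static async Task<int[]> RunAsync(int input)
    {
        var seen = new ConcurrentQueue<int>();

        // Helper that builds a block which records and forwards its result.
        TransformBlock<int, int> Step(Func<int, int> f) =>
            new TransformBlock<int, int>(x => { var r = f(x); seen.Enqueue(r); return r; });

        var addBlock = Step(x => x + 2);
        var mulBlock = Step(x => x * 2);
        var subBlock = Step(x => x - 2);
        var divBlock = Step(x => x / 2);

        var flowOptions = new DataflowLinkOptions { PropagateCompletion = true };
        addBlock.LinkTo(mulBlock, flowOptions);
        mulBlock.LinkTo(subBlock, flowOptions);
        subBlock.LinkTo(divBlock, flowOptions);

        // Drain divBlock's output queue so that it is able to complete.
        divBlock.LinkTo(DataflowBlock.NullTarget<int>());

        addBlock.Post(input);
        addBlock.Complete();       // only the first block is completed manually
        await divBlock.Completion; // now the LAST block can be awaited

        return seen.ToArray();
    }
}
```

With this change only addBlock.Complete() is called by hand, and awaiting divBlock.Completion returns once every stage has processed the item.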

How do I signal completion of my dataflow?

I've got a class that implements a dataflow composed of 3 steps using TPL Dataflow.
In the constructor I create the steps as TransformBlocks and link them up using LinkTo with DataflowLinkOptions.PropagateCompletion set to true. The class exposes a single method which kicks off the workflow by calling SendAsync on the 1st step. The method returns the "Completion" property of the final step of the workflow.
At the moment the steps in the workflow appear to execute as expected but final step never completes unless I explicitly call Complete on it. But doing that short-circuits the workflow and none of the steps are executed? What am I doing wrong?
public class MessagePipeline {
private TransformBlock<object, object> step1;
private TransformBlock<object, object> step2;
private TransformBlock<object, object> step3;
public MessagePipeline() {
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
step1 = new TransformBlock<object, object>(
x => {
Console.WriteLine("Step1...");
return x;
});
step2 = new TransformBlock<object, object>(
x => {
Console.WriteLine("Step2...");
return x;
});
step3 = new TransformBlock<object, object>(
x => {
Console.WriteLine("Step3...");
return x;
});
step1.LinkTo(step2, linkOptions);
step2.LinkTo(step3, linkOptions);
}
public Task Push(object message) {
step1.SendAsync(message);
step1.Complete();
return step3.Completion;
}
}
...
public class Program {
public static void Main(string[] args) {
var pipeline = new MessagePipeline();
var result = pipeline.Push("Hello, world!");
result.ContinueWith(_ => Console.WriteLine("Completed"));
Console.ReadLine();
}
}

When you link the steps, you need to pass a DataflowLinkOptions with the PropagateCompletion property set to true to propagate both completion and errors. Once you do that, calling Complete() on the first block will propagate completion to downstream blocks.
Once a block receives the completion event, it finishes processing then notifies its linked downstream targets.
This way you can post all your data to the first step and call Complete(). The final block will only complete when all upstream blocks have completed.
For example,
var linkOptions=new DataflowLinkOptions { PropagateCompletion = true};
myFirstBlock.LinkTo(mySecondBlock,linkOptions);
mySecondBlock.LinkTo(myFinalBlock,linkOptions);
foreach(var message in messages)
{
myFirstBlock.Post(message);
}
myFirstBlock.Complete();
......
await myFinalBlock.Completion;
PropagateCompletion isn't true by default because in more complex scenarios (eg non-linear flows, or dynamically changing flows) you don't want completion and errors to propagate automatically. You may also want to avoid automatic completion if you want to handle errors without terminating the entire flow.
Way back when TPL Dataflow was in beta the default was true, but this was changed at RTM.
UPDATE
The code never completes because the final step is a TransformBlock with no linked target to receive its output. This means that even though the block received the completion signal, it hasn't finished all its work and can't change its own Completion status.
Changing it to an ActionBlock<object> removes the issue.
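A sketch of the pipeline with that change applied is shown below. The class name MessagePipelineSketch, the PushAsync method, and the Log queue (used here in place of Console.WriteLine) are assumptions made for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

// Hypothetical variant of MessagePipeline whose final step is an ActionBlock.
public sealed class MessagePipelineSketch
{
    private readonly TransformBlock<object, object> step1;
    private readonly TransformBlock<object, object> step2;
    private readonly ActionBlock<object> step3;

    // Records which steps ran, in order.
    public ConcurrentQueue<string> Log { get; } = new ConcurrentQueue<string>();

    public MessagePipelineSketch()
    {
        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        step1 = new TransformBlock<object, object>(x => { Log.Enqueue("Step1"); return x; });
        step2 = new TransformBlock<object, object>(x => { Log.Enqueue("Step2"); return x; });
        // An ActionBlock has no output queue, so nothing keeps it from completing.
        step3 = new ActionBlock<object>(x => Log.Enqueue("Step3"));
        step1.LinkTo(step2, linkOptions);
        step2.LinkTo(step3, linkOptions);
    }

    public async Task PushAsync(object message)
    {
        await step1.SendAsync(message);
        step1.Complete();        // completion flows to step2 and step3
        await step3.Completion;  // finishes once step3 has handled the message
    }
}
```

As in the original class, the pipeline accepts a single message, because the first block is completed inside the push method.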

You need to explicitly call Complete.

TPL Dataflow, can I query whether a data block is marked complete but has not yet completed?

Given the following:
BufferBlock<int> sourceBlock = new BufferBlock<int>();
TransformBlock<int, int> targetBlock = new TransformBlock<int, int>(element =>
{
return element * 2;
});
sourceBlock.LinkTo(targetBlock, new DataflowLinkOptions { PropagateCompletion = true });
//feed some elements into the buffer block
for(int i = 1; i <= 1000000; i++)
{
sourceBlock.SendAsync(i);
}
sourceBlock.Complete();
targetBlock.Completion.ContinueWith(_ =>
{
//notify completion of the target block
});
The targetBlock never seems to complete and I think the reason is that all the items in the TransformBlock targetBlock are waiting in the output queue as I have not linked the targetBlock to any other Dataflow block. However, what I actually want to achieve is a notification when (A) the targetBlock is notified of completion AND (B) the input queue is empty. I do not want to care whether items still sit in the output queue of the TransformBlock. How can I go about that? Is the only way to get what I want to query the completion status of the sourceBlock AND to make sure the InputCount of the targetBlock is zero? I am not sure this is very stable (is the sourceBlock truly only marked completed if the last item in the sourceBlock has been passed to the targetBlock?). Is there a more elegant and more efficient way to get to the same goal?
Edit: I just noticed even the "dirty" way to check on completion of the sourceBlock AND InputCount of the targetBlock being zero is not trivial to implement. Where would that block sit? It cannot be within the targetBlock because once above two conditions are met obviously no message is processed within targetBlock anymore. Also checking on the completion status of the sourceBlock introduces a lot of inefficiency.

I believe you can't directly do this. It's possible you could get this information from some private fields using reflection, but I wouldn't recommend doing that.
But you can do this by creating custom blocks. In the case of Complete() it's simple: just create a block that forwards each method to the original block. Except Complete(), where it will also log it.
In the case of figuring out when processing of all items is complete, you could link your block to an intermediate BufferBlock. This way, the output queue will be emptied quickly, and so checking the Completion of the internal block would give you a fairly accurate measurement of when the processing is complete. This would affect your measurements, but hopefully not significantly.
Another option would be to add some logging at the end of the block's delegate. This way, you could see when processing of the last item was finished.

It would be nice if the TransformBlock had a ProcessingCompleted event that would fire when the block has completed the processing of all messages in its queue, but there is no such event. Below is an attempt to rectify this omission. The CreateTransformBlockEx method accepts an Action<Exception> handler, that is invoked when this "event" occurs.
The intention was to always invoke the handler before the final completion of the block. Unfortunately, when the supplied CancellationToken is canceled, the completion (cancellation) happens first, and the handler is invoked some milliseconds later. Fixing this inconsistency would require some tricky workarounds and might have other unwanted side effects, so I am leaving it as is.
public static IPropagatorBlock<TInput, TOutput>
CreateTransformBlockEx<TInput, TOutput>(Func<TInput, Task<TOutput>> transform,
Action<Exception> onProcessingCompleted,
ExecutionDataflowBlockOptions dataflowBlockOptions = null)
{
if (onProcessingCompleted == null)
throw new ArgumentNullException(nameof(onProcessingCompleted));
dataflowBlockOptions = dataflowBlockOptions ?? new ExecutionDataflowBlockOptions();
var transformBlock = new TransformBlock<TInput, TOutput>(transform,
dataflowBlockOptions);
var bufferBlock = new BufferBlock<TOutput>(dataflowBlockOptions);
transformBlock.LinkTo(bufferBlock);
PropagateCompletion(transformBlock, bufferBlock, onProcessingCompleted);
return DataflowBlock.Encapsulate(transformBlock, bufferBlock);
async void PropagateCompletion(IDataflowBlock block1, IDataflowBlock block2,
Action<Exception> completionHandler)
{
try
{
await block1.Completion.ConfigureAwait(false);
}
catch { }
var exception =
block1.Completion.IsFaulted ? block1.Completion.Exception : null;
try
{
// Invoke the handler before completing the second block
completionHandler(exception);
}
finally
{
if (exception != null) block2.Fault(exception); else block2.Complete();
}
}
}
// Overload with synchronous lambda
public static IPropagatorBlock<TInput, TOutput>
CreateTransformBlockEx<TInput, TOutput>(Func<TInput, TOutput> transform,
Action<Exception> onProcessingCompleted,
ExecutionDataflowBlockOptions dataflowBlockOptions = null)
{
return CreateTransformBlockEx<TInput, TOutput>(
x => Task.FromResult(transform(x)), onProcessingCompleted,
dataflowBlockOptions);
}
The code of the local function PropagateCompletion mimics the source code of the LinkTo built-in method, when invoked with the PropagateCompletion = true option.
Usage example:
var httpClient = new HttpClient();
var downloader = CreateTransformBlockEx<string, string>(async url =>
{
return await httpClient.GetStringAsync(url);
}, onProcessingCompleted: ex =>
{
Console.WriteLine($"Download completed {(ex == null ? "OK" : "Error")}");
}, new ExecutionDataflowBlockOptions()
{
MaxDegreeOfParallelism = 10
});

First of all, it is not right to use an IPropagatorBlock as a terminal (leaf) block. Still, your requirement can be fulfilled by asynchronously checking the output buffer of the target block for output messages and then consuming them, so that the buffer is emptied.
BufferBlock<int> sourceBlock = new BufferBlock<int>();
TransformBlock<int, int> targetBlock = new TransformBlock<int, int>
(element =>
{
return element * 2;
});
sourceBlock.LinkTo(targetBlock, new DataflowLinkOptions {
PropagateCompletion = true });
//feed some elements into the buffer block
for (int i = 1; i <= 100; i++)
{
sourceBlock.SendAsync(i);
}
sourceBlock.Complete();
bool isOutputAvailable = await targetBlock.OutputAvailableAsync();
while(isOutputAvailable)
{
int value = await targetBlock.ReceiveAsync();
isOutputAvailable = await targetBlock.OutputAvailableAsync();
}
await targetBlock.Completion.ContinueWith(_ =>
{
Console.WriteLine("Target Block Completed");//notify completion of the target block
});
