Can not run TPL Dataflow pipeline

Can not run TPL Dataflow pipeline - c#

I am trying to create a pipeline using TPL Dataflow where i can store messages in a batch block , and whenever its treshold is hit it would send the data to an action block.I have added a buffer block in case the action block is too slow.
So far i have tried all possible methods to move data from the first block to the second to no avail. I have linked the blocks , added the DataFlowLinkOptions of PropagateCompletion set to true. What else do I have to do in order for this pipeline to work ?
Pipeline
class LogPipeline<T>
{
private ActionBlock<T[]> actionBlock;
private BufferBlock<T> bufferBlock;
private BatchBlock<T> batchBlock;
private readonly Action<T[]> action;
private readonly int BufferSize;
private readonly int BatchSize;
public LogPipeline(Action<T[]> action, int bufferSize = 4, int batchSize = 2)
{
this.BufferSize = bufferSize;
this.BatchSize = batchSize;
this.action = action;
}
private void Initialize()
{
this.bufferBlock = new BufferBlock<T>(new DataflowBlockOptions
{ TaskScheduler = TaskScheduler.Default,
BoundedCapacity = this.BufferSize });
this.actionBlock = new ActionBlock<T[]>(this.action);
this.batchBlock = new BatchBlock<T>(BatchSize);
this.bufferBlock.LinkTo(this.batchBlock, new DataflowLinkOptions
{ PropagateCompletion = true });
this.batchBlock.LinkTo(this.actionBlock, new DataflowLinkOptions
{ PropagateCompletion = true });
}
public void Post(T log)
{
this.bufferBlock.Post(log);
}
public void Start()
{
this.Initialize();
}
public void Stop()
{
actionBlock.Complete();
}
}
Test
[TestCase(100, 1000, 5)]
public void CanBatchPipelineResults(int batchSize, int bufferSize, int cycles)
{
List<int> data = new List<int>();
LogPipeline<int> logPipeline = new LogPipeline<int>(
batchSize: batchSize,
bufferSize: bufferSize,
action: (logs) =>
{
data.AddRange(logs);
});
logPipeline.Start();
int SelectWithEffect(int element)
{
logPipeline.Post(element);
return 3;
}
int count = 0;
while (true)
{
if (count++ > cycles)
{
break;
}
var sent = Parallel.For(0, bufferSize, (x) => SelectWithEffect(x));
}
logPipeline.Stop();
Assert.IsTrue(data.Count == cycles * batchSize);
}
Why are all my blocks empty besides the buffer? I have tried with SendAsync also to no avail. No data is moved from the first block to the next no matter what I do.
I have both with and without the link options.
Update :
I have completely erased the pipeline and also the Parallel.
I have tried with all kinds of input blocks (batch/buffer/transform) and it seems there is no way subsequent blocks are getting something.
I have also tried with await SendAsync as well as Post.
I have only tried within unit tests classes.
Could this be the issue ?
Update 2
I was wrong complicating things , i have tried a more simple example . Inside a testcase even this doesnt work:
List<int> items=new List<int>();
var tf=new TransformBlock<int,int>(x=>x+1);
var action= new ActionBlock<int>(x=>items.Add(x));
tf.LinkTo(action, new DataFlowOptions{ PropagateCompletion=true});
tf.Post(3);
//Breakpoint here

The reason nothing seems to happen before the test ends is that none of the block has a chance to run. The code blocks all CPUs by using Parallel.For so no other task has a chance to run. This means that all posted messages are still in the first block. The code then calls Complete on the last block but doesn't even await for it to finish processing before checking the results.
The code can be simplified a lot. For starters, all blocks have input buffers, they don't need extra buffering.
The pipeline could be replaced with just this :
//Arrange
var list=new List<int>();
var head=new BatchBlock<int>(BatchSize);
var act=new ActionBlock<int[]>(nums=>list.AddRange(nums);
var options= new DataflowLinkOptions{ PropagateCompletion = true };
head.LinkTo(act);
//ACT
//Just fire everything at once, because why not
var tasks=Enumerable.Range(0,cycles)(
i=>Task.Run(()=> head.Post(i)));
await tasks;
//Tell the head block we're done
head.Complete();
//Wait for the last block to complete
await act.Completion;
//ASSERT
Assert.Equal(cycles, data.Count);
There's no real need to create a complex class to encapsulate the pipeline. It doesn't "start" - the blocks do nothing if they have no data. To abstract it, one only needs to provide access to the head block and the last block's Completion task

By calling logPipeline.Stop immediately after sending the data to the BufferBlock, you are completing the ActionBlock, and so it declines all messages that the BatchBlock is trying later to send to it. From the documentation of the ActionBlock.Complete method:
Signals to the dataflow block that it shouldn't accept or produce any more messages and shouldn't consume any more postponed messages.
Update: Regarding the updated requirements in the question:
Whenever its threshold is hit it would send the data to an action block.
...my suggestion is to move this logic inside the LogPipeline.Post method. The method BufferBlock.Post returns false if the block hasn't accepted the data sent to it.
public void Post(T log)
{
if (!this.bufferBlock.Post(log)) this.actionBlock.Post(log);
}

Related

C# Abortable Asynchronous Fifo Queue - leaking massive amounts of memory

I need to process data from a producer in FIFO fashion with the ability to abort processing if the same producer produces a new bit of data.
So I implemented an abortable FIFO queue based on Stephen Cleary's AsyncCollection (called AsyncCollectionAbortableFifoQueuein my sample) and one on TPL's BufferBlock (BufferBlockAbortableAsyncFifoQueue in my sample). Here's the implementation based on AsyncCollection
public class AsyncCollectionAbortableFifoQueue<T> : IExecutableAsyncFifoQueue<T>
{
private AsyncCollection<AsyncWorkItem<T>> taskQueue = new AsyncCollection<AsyncWorkItem<T>>();
private readonly CancellationToken stopProcessingToken;
public AsyncCollectionAbortableFifoQueue(CancellationToken cancelToken)
{
stopProcessingToken = cancelToken;
_ = processQueuedItems();
}
public Task<T> EnqueueTask(Func<Task<T>> action, CancellationToken? cancelToken)
{
var tcs = new TaskCompletionSource<T>();
var item = new AsyncWorkItem<T>(tcs, action, cancelToken);
taskQueue.Add(item);
return tcs.Task;
}
protected virtual async Task processQueuedItems()
{
while (!stopProcessingToken.IsCancellationRequested)
{
try
{
var item = await taskQueue.TakeAsync(stopProcessingToken).ConfigureAwait(false);
if (item.CancelToken.HasValue && item.CancelToken.Value.IsCancellationRequested)
item.TaskSource.SetCanceled();
else
{
try
{
T result = await item.Action().ConfigureAwait(false);
item.TaskSource.SetResult(result); // Indicate completion
}
catch (Exception ex)
{
if (ex is OperationCanceledException && ((OperationCanceledException)ex).CancellationToken == item.CancelToken)
item.TaskSource.SetCanceled();
item.TaskSource.SetException(ex);
}
}
}
catch (Exception) { }
}
}
}
public interface IExecutableAsyncFifoQueue<T>
{
Task<T> EnqueueTask(Func<Task<T>> action, CancellationToken? cancelToken);
}
processQueuedItems is the task that dequeues AsyncWorkItem's from the queue, and executes them unless cancellation has been requested.
The asynchronous action to execute gets wrapped into an AsyncWorkItem which looks like this
internal class AsyncWorkItem<T>
{
public readonly TaskCompletionSource<T> TaskSource;
public readonly Func<Task<T>> Action;
public readonly CancellationToken? CancelToken;
public AsyncWorkItem(TaskCompletionSource<T> taskSource, Func<Task<T>> action, CancellationToken? cancelToken)
{
TaskSource = taskSource;
Action = action;
CancelToken = cancelToken;
}
}
Then there's a task looking and dequeueing items for processing and either processing them, or aborting if the CancellationToken has been triggered.
That all works just fine - data gets processed, and if a new piece of data is received, processing of the old is aborted. My problem now stems from these Queues leaking massive amounts of memory if I crank up the usage (producer producing a lot more than the consumer processes). Given it's abortable, the data that is not processed, should be discarded and eventually disappear from memory.
So let's look at how I'm using these queues. I have a 1:1 match of producer and consumer. Every consumer handles data of a single producer. Whenever I get a new data item, and it doesn't match the previous one, I catch the queue for the given producer (User.UserId) or create a new one (the 'executor' in the code snippet). Then I have a ConcurrentDictionary that holds a CancellationTokenSource per producer/consumer combo. If there's a previous CancellationTokenSource, I call Cancel on it and Dispose it 20 seconds later (immediate disposal would cause exceptions in the queue). I then enqueue processing of the new data. The queue returns me a task that I can await so I know when processing of the data is complete, and I then return the result.
Here's that in code
internal class SimpleLeakyConsumer
{
private ConcurrentDictionary<string, IExecutableAsyncFifoQueue<bool>> groupStateChangeExecutors = new ConcurrentDictionary<string, IExecutableAsyncFifoQueue<bool>>();
private readonly ConcurrentDictionary<string, CancellationTokenSource> userStateChangeAborters = new ConcurrentDictionary<string, CancellationTokenSource>();
protected CancellationTokenSource serverShutDownSource;
private readonly int operationDuration = 1000;
internal SimpleLeakyConsumer(CancellationTokenSource serverShutDownSource, int operationDuration)
{
this.serverShutDownSource = serverShutDownSource;
this.operationDuration = operationDuration * 1000; // convert from seconds to milliseconds
}
internal async Task<bool> ProcessStateChange(string userId)
{
var executor = groupStateChangeExecutors.GetOrAdd(userId, new AsyncCollectionAbortableFifoQueue<bool>(serverShutDownSource.Token));
CancellationTokenSource oldSource = null;
using (var cancelSource = userStateChangeAborters.AddOrUpdate(userId, new CancellationTokenSource(), (key, existingValue) =>
{
oldSource = existingValue;
return new CancellationTokenSource();
}))
{
if (oldSource != null && !oldSource.IsCancellationRequested)
{
oldSource.Cancel();
_ = delayedDispose(oldSource);
}
try
{
var executionTask = executor.EnqueueTask(async () => { await Task.Delay(operationDuration, cancelSource.Token).ConfigureAwait(false); return true; }, cancelSource.Token);
var result = await executionTask.ConfigureAwait(false);
userStateChangeAborters.TryRemove(userId, out var aborter);
return result;
}
catch (Exception e)
{
if (e is TaskCanceledException || e is OperationCanceledException)
return true;
else
{
userStateChangeAborters.TryRemove(userId, out var aborter);
return false;
}
}
}
}
private async Task delayedDispose(CancellationTokenSource src)
{
try
{
await Task.Delay(20 * 1000).ConfigureAwait(false);
}
finally
{
try
{
src.Dispose();
}
catch (ObjectDisposedException) { }
}
}
}
In this sample implementation, all that is being done is wait, then return true.
To test this mechanism, I wrote the following Data producer class:
internal class SimpleProducer
{
//variables defining the test
readonly int nbOfusers = 10;
readonly int minimumDelayBetweenTest = 1; // seconds
readonly int maximumDelayBetweenTests = 6; // seconds
readonly int operationDuration = 3; // number of seconds an operation takes in the tester
private readonly Random rand;
private List<User> users;
private readonly SimpleLeakyConsumer consumer;
protected CancellationTokenSource serverShutDownSource, testAbortSource;
private CancellationToken internalToken = CancellationToken.None;
internal SimpleProducer()
{
rand = new Random();
testAbortSource = new CancellationTokenSource();
serverShutDownSource = new CancellationTokenSource();
generateTestObjects(nbOfusers, 0, false);
consumer = new SimpleLeakyConsumer(serverShutDownSource, operationDuration);
}
internal void StartTests()
{
if (internalToken == CancellationToken.None || internalToken.IsCancellationRequested)
{
internalToken = testAbortSource.Token;
foreach (var user in users)
_ = setNewUserPresence(internalToken, user);
}
}
internal void StopTests()
{
testAbortSource.Cancel();
try
{
testAbortSource.Dispose();
}
catch (ObjectDisposedException) { }
testAbortSource = new CancellationTokenSource();
}
internal void Shutdown()
{
serverShutDownSource.Cancel();
}
private async Task setNewUserPresence(CancellationToken token, User user)
{
while (!token.IsCancellationRequested)
{
var nextInterval = rand.Next(minimumDelayBetweenTest, maximumDelayBetweenTests);
try
{
await Task.Delay(nextInterval * 1000, testAbortSource.Token).ConfigureAwait(false);
}
catch (TaskCanceledException)
{
break;
}
//now randomly generate a new state and submit it to the tester class
UserState? status;
var nbStates = Enum.GetValues(typeof(UserState)).Length;
if (user.CurrentStatus == null)
{
var newInt = rand.Next(nbStates);
status = (UserState)newInt;
}
else
{
do
{
var newInt = rand.Next(nbStates);
status = (UserState)newInt;
}
while (status == user.CurrentStatus);
}
_ = sendUserStatus(user, status.Value);
}
}
private async Task sendUserStatus(User user, UserState status)
{
await consumer.ProcessStateChange(user.UserId).ConfigureAwait(false);
}
private void generateTestObjects(int nbUsers, int nbTeams, bool addAllUsersToTeams = false)
{
users = new List<User>();
for (int i = 0; i < nbUsers; i++)
{
var usr = new User
{
UserId = $"User_{i}",
Groups = new List<Team>()
};
users.Add(usr);
}
}
}
It uses the variables at the beginning of the class to control the test. You can define the number of users (nbOfusers - every user is a producer that produces new data), the minimum (minimumDelayBetweenTest) and maximum (maximumDelayBetweenTests) delay between a user producing the next data and how long it takes the consumer to process the data (operationDuration).
StartTests starts the actual test, and StopTests stops the tests again.
I'm calling these as follows
static void Main(string[] args)
{
var tester = new SimpleProducer();
Console.WriteLine("Test successfully started, type exit to stop");
string str;
do
{
str = Console.ReadLine();
if (str == "start")
tester.StartTests();
else if (str == "stop")
tester.StopTests();
}
while (str != "exit");
tester.Shutdown();
}
So, if I run my tester and type 'start', the Producer class starts producing states that are consumed by Consumer. And memory usage starts to grow and grow and grow. The sample is configured to the extreme, the real-life scenario I'm dealing with is less intensive, but one action of the producer could trigger multiple actions on the consumer side which also have to be executed in the same asynchronous abortable fifo fashion - so worst case, one set of data produced triggers an action for ~10 consumers (that last part I stripped out for brevity).
When I'm having a 100 producers, and each producer produces a new data item every 1-6 seconds (randomly, also the data produces is random). Consuming the data takes 3 seconds.. so there's plenty of cases where there's a new set of data before the old one has been properly processed.
Looking at two consecutive memory dumps, it's obvious where the memory usage is coming from.. it's all fragments that have to do with the queue. Given that I'm disposing every TaskCancellationSource and not keeping any references to the produced data (and the AsyncWorkItem they're put into), I'm at a loss to explain why this keeps eating up my memory and I'm hoping somebody else can show me the errors of my way. You can also abort testing by typing 'stop'.. you'll see that no longer is memory being eaten, but even if you pause and trigger GC, memory is not being freed either.
The source code of the project in runnable form is on Github. After starting it, you have to type start (plus enter) in the console to tell the producer to start producing data. And you can stop producing data by typing stop (plus enter)

Your code has so many issues making it impossible to find a leak through debugging. But here are several things that already are an issue and should be fixed first:
Looks like getQueue creates a new queue for the same user each time processUseStateUpdateAsync gets called and does not reuse existing queues:
var executor = groupStateChangeExecutors.GetOrAdd(user.UserId, getQueue());
CancellationTokenSource is leaking on each call of the code below, as new value created each time the method AddOrUpdate is called, it should not be passed there that way:
userStateChangeAborters.AddOrUpdate(user.UserId, new CancellationTokenSource(), (key, existingValue
Also code below should use the same cts as you pass as new cts, if dictionary has no value for specific user.UserId:
return new CancellationTokenSource();
Also there is a potential leak of cancelSource variable as it gets bound to a delegate which can live for a time longer than you want, it's better to pass concrete CancellationToken there:
executor.EnqueueTask(() => processUserStateUpdateAsync(user, state, previousState,
cancelSource.Token));
By some reason you do not dispose aborter here and in one more place:
userStateChangeAborters.TryRemove(user.UserId, out var aborter);
Creation of Channel can have potential leaks:
taskQueue = Channel.CreateBounded<AsyncWorkItem<T>>(new BoundedChannelOptions(1)
You picked option FullMode = BoundedChannelFullMode.DropOldest which should remove oldest values if there are any, so I assume that that stops queued items from processing as they would not be read. It's a hypotheses, but I assume that if an old item is removed without being handled, then processUserStateUpdateAsync won't get called and all resources won't be freed.
You can start with these found issues and it should be easier to find the real cause after that.

Run X number of Task<T> at any given time while keeping UI responsive

I have a C# WinForms (.NET 4.5.2) app utilizing the TPL. The tool has a synchronous function which is passed over to a task factory X amount of times (with different input parameters), where X is a number declared by the user before commencing the process. The tasks are started and stored in a List<Task>.
Assuming the user entered 5, we have this in an async button click handler:
for (int i = 0; i < X; i++)
{
var progress = Progress(); // returns a new IProgress<T>
var task = Task<int>.Factory.StartNew(() => MyFunction(progress), TaskCreationOptions.LongRunning);
TaskList.Add(task);
}
Each progress instance updates the UI.
Now, as soon as a task is finished, I want to fire up a new one. Essentially, the process should run indefinitely, having X tasks running at any given time, unless the user cancels via the UI (I'll use cancellation tokens for this). I try to achieve this using the following:
while (TaskList.Count > 0)
{
var completed = await Task.WhenAny(TaskList.ToArray());
if (completed.Exception == null)
{
// report success
}
else
{
// flatten AggregateException, print out, etc
}
// update some labels/textboxes in the UI, and then:
TaskList.Remove(completed);
var task = Task<int>.Factory.StartNew(() => MyFunction(progress), TaskCreationOptions.LongRunning);
TaskList.Add(task);
}
This is bogging down the UI. Is there a better way of achieving this functionality, while keeping the UI responsive?
A suggestion was made in the comments to use TPL Dataflow but due to time constraints and specs, alternative solutions are welcome
Update
I'm not sure whether the progress reporting might be the problem? Here's what it looks like:
private IProgress<string> Progress()
{
return new Progress<string>(msg =>
{
txtMsg.AppendText(msg);
});
}

Now, as soon as a task is finished, I want to fire up a new one. Essentially, the process should run indefinitely, having X tasks running at any given time
It sounds to me like you want an infinite loop inside your task:
for (int i = 0; i < X; i++)
{
var progress = Progress(); // returns a new IProgress<T>
var task = RunIndefinitelyAsync(progress);
TaskList.Add(task);
}
private async Task RunIndefinitelyAsync(IProgress<T> progress)
{
while (true)
{
try
{
await Task.Run(() => MyFunction(progress));
// handle success
}
catch (Exception ex)
{
// handle exceptions
}
// update some labels/textboxes in the UI
}
}
However, I suspect that the "bogging down the UI" is probably in the // handle success and/or // handle exceptions code. If my suspicion is correct, then push as much of the logic into the Task.Run as possible.

As I understand, you simply need a parallel execution with the defined degree of parallelization. There is a lot of ways to implement what you want. I suggest to use blocking collection and parallel class instead of tasks.
So when user clicks button, you need to create a new blocking collection which will be your data source:
BlockingCollection<IProgress> queue = new BlockingCollection<IProgress>();
CancellationTokenSource source = new CancellationTokenSource();
Now you need a runner that will execute your in parallel:
Task.Factory.StartNew(() =>
Parallel.For(0, X, i =>
{
foreach (IProgress p in queue.GetConsumingEnumerable(source.Token))
{
MyFunction(p);
}
}), source.Token);
Or you can choose more correct way with partitioner. So you'll need a partitioner class:
private class BlockingPartitioner<T> : Partitioner<T>
{
private readonly BlockingCollection<T> _Collection;
private readonly CancellationToken _Token;
public BlockingPartitioner(BlockingCollection<T> collection, CancellationToken token)
{
_Collection = collection;
_Token = token;
}
public override IList<IEnumerator<T>> GetPartitions(int partitionCount)
{
throw new NotImplementedException();
}
public override IEnumerable<T> GetDynamicPartitions()
{
return _Collection.GetConsumingEnumerable(_Token);
}
public override bool SupportsDynamicPartitions
{
get { return true; }
}
}
And runner will looks like this:
ParallelOptions Options = new ParallelOptions();
Options.MaxDegreeOfParallelism = X;
Task.Factory.StartNew(
() => Parallel.ForEach(
new BlockingPartitioner<IProgress>(queue, source.Token),
Options,
p => MyFunction(p)));
So all you need right now is to fill queue with necessary data. You can do it whenever you want.
And final touch, when the user cancels operation, you have two options:
first you can break execution with source.Cancel call,
or you can gracefully stop execution by marking collection complete (queue.CompleteAdding), in that case runner will execute all already queued data and finish.
Of course you need additional code to handle exceptions, progress, state and so on. But main idea is here.

How to consume chuncks from ConcurrentQueue correctly

I need to implement a queue of requests which can be populated from multiple threads. When this queue becomes larger than 1000 completed requests, this requests should be stored into database. Here is my implementation:
public class RequestQueue
{
private static BlockingCollection<VerificationRequest> _queue = new BlockingCollection<VerificationRequest>();
private static ConcurrentQueue<VerificationRequest> _storageQueue = new ConcurrentQueue<VerificationRequest>();
private static volatile bool isLoading = false;
private static object _lock = new object();
public static void Launch()
{
Task.Factory.StartNew(execute);
}
public static void Add(VerificationRequest request)
{
_queue.Add(request);
}
public static void AddRange(List<VerificationRequest> requests)
{
Parallel.ForEach(requests, new ParallelOptions() {MaxDegreeOfParallelism = 3},
(request) => { _queue.Add(request); });
}
private static void execute()
{
Parallel.ForEach(_queue.GetConsumingEnumerable(), new ParallelOptions {MaxDegreeOfParallelism = 5}, EnqueueSaveRequest );
}
private static void EnqueueSaveRequest(VerificationRequest request)
{
_storageQueue.Enqueue( new RequestExecuter().ExecuteVerificationRequest( request ) );
if (_storageQueue.Count > 1000 && !isLoading)
{
lock ( _lock )
{
if ( _storageQueue.Count > 1000 && !isLoading )
{
isLoading = true;
var requestChunck = new List<VerificationRequest>();
VerificationRequest req;
for (var i = 0; i < 1000; i++)
{
if( _storageQueue.TryDequeue(out req))
requestChunck.Add(req);
}
new VerificationRequestRepository().InsertRange(requestChunck);
isLoading = false;
}
}
}
}
}
Is there any way to implement this without lock and isLoading?

The easiest way to do what you ask is to use the blocks in the TPL Dataflow library. Eg
var batchBlock = new BatchBlock<VerificationRequest>(1000);
var exportBlock = new ActionBlock<VerificationRequest[]>(records=>{
new VerificationRequestRepository().InsertRange(records);
};
batchBlock.LinkTo(exportBlock , new DataflowLinkOptions { PropagateCompletion = true });
That's it.
You can send messages to the starting block with
batchBlock.Post(new VerificationRequest(...));
Once you finish your work, you can take down the entire pipeline and flush any leftover messages by calling batchBlock.Complete(); and await for the final block to finish:
batchBlock.Complete();
await exportBlock.Completion;
The BatchBlock batches up to 1000 records into arrays of 1000 items and passes them to the next block. An ActionBlock uses 1 task only by default, so it is thread-safe. You could use an existing instance of your repository without worrying about cross-thread access:
var repository=new VerificationRequestRepository();
var exportBlock = new ActionBlock<VerificationRequest[]>(records=>{
repository.InsertRange(records);
};
Almost all blocks have a concurrent input buffer. Each block runs on its own TPL task, so each step runs concurrently with each other. This means that you get asynchronous execution "for free" and can be important if you have multiple linked steps, eg you use a TransformBlock to modify the messages flowing through the pipeline.
I use such pipelines to create pipelines that call external services, parse responses, generate the final records, batch them and send them to the database with a block that uses SqlBulkCopy.

How do I signal completion of my dataflow?

I've got a class the implements a dataflow composed of 3 steps using TPL Dataflow.
In the constructor I create the steps as TransformBlocks and link them up using LinkTo with DataflowLinkOptions.PropagateCompletion set to true. The class exposes a single method which kicks of the workflow by calling SendAsync on the 1st step. The method returns the "Completion" property of the final step of the workflow.
At the moment the steps in the workflow appear to execute as expected but final step never completes unless I explicitly call Complete on it. But doing that short-circuits the workflow and none of the steps are executed? What am I doing wrong?
public class MessagePipeline {
private TransformBlock<object, object> step1;
private TransformBlock<object, object> step2;
private TransformBlock<object, object> step3;
public MessagePipeline() {
var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
step1 = new TransformBlock<object, object>(
x => {
Console.WriteLine("Step1...");
return x;
});
step2 = new TransformBlock<object, object>(
x => {
Console.WriteLine("Step2...");
return x;
});
step3 = new TransformBlock<object, object>(
x => {
Console.WriteLine("Step3...");
return x;
});
step1.LinkTo(step2, linkOptions);
step2.LinkTo(step3, linkOptions);
}
public Task Push(object message) {
step1.SendAsync(message);
step1.Complete();
return step3.Completion;
}
}
...
public class Program {
public static void Main(string[] args) {
var pipeline = new MessagePipeline();
var result = pipeline.Push("Hello, world!");
result.ContinueWith(_ => Console.WriteLine("Completed"));
Console.ReadLine();
}
}

When you link the steps, you need to pass a DataflowLinkOptions with the the PropagateCompletion property set to true to propagate both completion and errors. Once you do that, calling Complete() on the first block will propagete completion to downstream blocks.
Once a block receives the completion event, it finishes processing then notifies its linked downstream targets.
This way you can post all your data to the first step and call Complete(). The final block will only complete when all upstream blocks have completed.
For example,
var linkOptions=new DataflowLinkOptions { PropagateCompletion = true};
myFirstBlock.LinkTo(mySecondBlock,linkOptions);
mySecondBlock.LinkTo(myFinalBlock,linkOptions);
foreach(var message in messages)
{
myFirstBlock.Post(message);
}
myFirstBlock.Complete();
......
await myFinalBlock.Completion;
PropagateCompletion isn't true by default because in more complex scenarios (eg non-linear flows, or dynamically changing flows) you don't want completion and errors to propagate automatically. You may also want to avoid automatic completion if you want to handle errors without terminating the entire flow.
Way back when TPL Dataflow was in beta the default was true but this was changed on RTM
UPDATE
The code never completes because the final step is a TransformBlock with no linked target to receive its output. This means that even though the block received the completion signal, it hasn't finished all its work and can't change its own Completion status.
Changing it to an ActionBlock<object> removes the issue.

You need to explicitly call Complete.

TPL Dataflow, can I query whether a data block is marked complete but has not yet completed?

Given the following:
BufferBlock<int> sourceBlock = new BufferBlock<int>();
TransformBlock<int, int> targetBlock = new TransformBlock<int, int>(element =>
{
return element * 2;
});
sourceBlock.LinkTo(targetBlock, new DataflowLinkOptions { PropagateCompletion = true });
//feed some elements into the buffer block
for(int i = 1; i <= 1000000; i++)
{
sourceBlock.SendAsync(i);
}
sourceBlock.Complete();
targetBlock.Completion.ContinueWith(_ =>
{
//notify completion of the target block
});
The targetBlock never seems to complete and I think the reason is that all the items in the TransformBlock targetBlock are waiting in the output queue as I have not linked the targetBlock to any other Dataflow block. However, what I actually want to achieve is a notification when (A) the targetBlock is notified of completion AND (B) the input queue is empty. I do not want to care whether items still sit in the output queue of the TransformBlock. How can I go about that? Is the only way to get what I want to query the completion status of the sourceBlock AND to make sure the InputCount of the targetBlock is zero? I am not sure this is very stable (is the sourceBlock truly only marked completed if the last item in the sourceBlock has been passed to the targetBlock?). Is there a more elegant and more efficient way to get to the same goal?
Edit: I just noticed even the "dirty" way to check on completion of the sourceBlock AND InputCount of the targetBlock being zero is not trivial to implement. Where would that block sit? It cannot be within the targetBlock because once above two conditions are met obviously no message is processed within targetBlock anymore. Also checking on the completion status of the sourceBlock introduces a lot of inefficiency.

I believe you can't directly do this. It's possible you could get this information from some private fields using reflection, but I wouldn't recommend doing that.
But you can do this by creating custom blocks. In the case of Complete() it's simple: just create a block that forwards each method to the original block. Except Complete(), where it will also log it.
In the case of figuring out when processing of all items is complete, you could link your block to an intermediate BufferBlock. This way, the output queue will be emptied quickly and so checking Completed of the internal block would give you fairly accurate measurement of when the processing is complete. This would affect your measurements, but hopefully not significantly.
Another option would be to add some logging at the end of the block's delegate. This way, you could see when processing of the last item was finished.

It would be nice if the TransformBlock had a ProcessingCompleted event that would fire when the block has completed the processing of all messages in its queue, but there is no such event. Below is an attempt to rectify this omission. The CreateTransformBlockEx method accepts an Action<Exception> handler, that is invoked when this "event" occurs.
The intention was to always invoke the handler before the final completion of the block. Unfortunately in the case that the supplied CancellationToken is canceled, the completion (cancellation) happens first, and the handler is invoked some milliseconds later. To fix this inconsistency would require some tricky workarounds, and may had other unwanted side-effects, so I am leaving it as is.
public static IPropagatorBlock<TInput, TOutput>
CreateTransformBlockEx<TInput, TOutput>(Func<TInput, Task<TOutput>> transform,
Action<Exception> onProcessingCompleted,
ExecutionDataflowBlockOptions dataflowBlockOptions = null)
{
if (onProcessingCompleted == null)
throw new ArgumentNullException(nameof(onProcessingCompleted));
dataflowBlockOptions = dataflowBlockOptions ?? new ExecutionDataflowBlockOptions();
var transformBlock = new TransformBlock<TInput, TOutput>(transform,
dataflowBlockOptions);
var bufferBlock = new BufferBlock<TOutput>(dataflowBlockOptions);
transformBlock.LinkTo(bufferBlock);
PropagateCompletion(transformBlock, bufferBlock, onProcessingCompleted);
return DataflowBlock.Encapsulate(transformBlock, bufferBlock);
async void PropagateCompletion(IDataflowBlock block1, IDataflowBlock block2,
Action<Exception> completionHandler)
{
try
{
await block1.Completion.ConfigureAwait(false);
}
catch { }
var exception =
block1.Completion.IsFaulted ? block1.Completion.Exception : null;
try
{
// Invoke the handler before completing the second block
completionHandler(exception);
}
finally
{
if (exception != null) block2.Fault(exception); else block2.Complete();
}
}
}
// Overload with synchronous lambda
public static IPropagatorBlock<TInput, TOutput>
CreateTransformBlockEx<TInput, TOutput>(Func<TInput, TOutput> transform,
Action<Exception> onProcessingCompleted,
ExecutionDataflowBlockOptions dataflowBlockOptions = null)
{
return CreateTransformBlockEx<TInput, TOutput>(
x => Task.FromResult(transform(x)), onProcessingCompleted,
dataflowBlockOptions);
}
The code of the local function PropagateCompletion mimics the source code of the LinkTo built-in method, when invoked with the PropagateCompletion = true option.
Usage example:
var httpClient = new HttpClient();
var downloader = CreateTransformBlockEx<string, string>(async url =>
{
return await httpClient.GetStringAsync(url);
}, onProcessingCompleted: ex =>
{
Console.WriteLine($"Download completed {(ex == null ? "OK" : "Error")}");
}, new ExecutionDataflowBlockOptions()
{
MaxDegreeOfParallelism = 10
});

First thing it is not right to use a IPropagator Block as a leaf terminal. But still your requirement can be fulfilled by asynchronously checking the output buffer of the TargetBlock for output messages and then consuming then so that the buffer could be emptied.
` BufferBlock<int> sourceBlock = new BufferBlock<int>();
TransformBlock<int, int> targetBlock = new TransformBlock<int, int>
(element =>
{
return element * 2;
});
sourceBlock.LinkTo(targetBlock, new DataflowLinkOptions {
PropagateCompletion = true });
//feed some elements into the buffer block
for (int i = 1; i <= 100; i++)
{
sourceBlock.SendAsync(i);
}
sourceBlock.Complete();
bool isOutputAvailable = await targetBlock.OutputAvailableAsync();
while(isOutputAvailable)
{
int value = await targetBlock.ReceiveAsync();
isOutputAvailable = await targetBlock.OutputAvailableAsync();
}
await targetBlock.Completion.ContinueWith(_ =>
{
Console.WriteLine("Target Block Completed");//notify completion of the target block
});
`

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.

Can not run TPL Dataflow pipeline - c#

Related

C# Abortable Asynchronous Fifo Queue - leaking massive amounts of memory

Run X number of Task<T> at any given time while keeping UI responsive

How to consume chuncks from ConcurrentQueue correctly

How do I signal completion of my dataflow?

TPL Dataflow, can I query whether a data block is marked complete but has not yet completed?

Categories

Resources