TPL Dataflow Blocks using LinkTo Predicate - C#

I have some blocks that eventually go from a TransformBlock to one of three other transform blocks, based on the LinkTo predicate. I am using DataflowLinkOptions to propagate completion. The problem is that when a predicate is satisfied and that block starts, the rest of my pipeline continues on. It would seem that the pipeline should wait for this block to finish first.
The code for this is something like this:
var linkOptions = new DataflowLinkOptions {PropagateCompletion = true};
mainBlock.LinkTo(block1, linkOptions, x => x.Status == Status.Complete);
mainBlock.LinkTo(block2, linkOptions, x => x.Status == Status.Cancelled);
mainBlock.LinkTo(block3, linkOptions, x => x.Status == Status.Delayed);
mainBlock.LinkTo(DataflowBlock.NullTarget<Thing>(), linkOptions);
Now, this doesn't work as I'd expect, as I said, so the only way I've found to get the behavior I want is to take the linkOptions out and add the following into the lambda for the mainBlock.
mainBlock = new TransformBlock<Thing, Thing>(input =>
{
    DoMyStuff(input);
    if (input.Status == Status.Complete)
    {
        mainBlock.Completion.ContinueWith(t => block1.Complete());
    }
    if (input.Status == Status.Cancelled)
    {
        mainBlock.Completion.ContinueWith(t => block2.Complete());
    }
    if (input.Status == Status.Delayed)
    {
        mainBlock.Completion.ContinueWith(t => block3.Complete());
    }
    return input;
});
So the question is: is this the only way to get this to work?
BTW, this has been run in my unit test with a single data item running through it, to try and debug the pipeline behavior. Each block has been tested individually with multiple unit tests. So what happens in my pipeline unit test is that the assert is hit before the block has finished executing, and so it fails.
If I remove the block2 and block3 links and debug the test using the linkOptions, it works fine.

Your problem is not with the code in your question; that works correctly: when the main block completes, all three follow-up blocks are marked for completion too.
The problem is with the end block: you're using PropagateCompletion there too, which means that when any of the three previous blocks completes, the end block is marked for completion. What you want is to mark it for completion when all three blocks complete, and the Task.WhenAll().ContinueWith() combination from your answer does that (though the first part of that snippet is unnecessary; it does exactly the same thing PropagateCompletion would).
As it turns out, the link option propagation (at least this is my guess) will propagate the completion for blocks that don't satisfy the predicate in the LinkTo.
Yes, it propagates completion always. Completion doesn't have any item associated with it, so it doesn't make any sense to apply the predicate to it. Maybe the fact that you always have only a single item (which is not common) makes this more confusing for you?
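To see this in isolation, here's a minimal sketch (hypothetical blocks, inside an async method) showing that PropagateCompletion fires regardless of predicates:
var source = new TransformBlock<int, int>(x => x);
var neverMatched = new ActionBlock<int>(x => { });

// This predicate never matches, so no item could ever reach neverMatched...
source.LinkTo(neverMatched,
    new DataflowLinkOptions { PropagateCompletion = true },
    x => false);

source.Complete();

// ...yet completion still propagates to it.
await neverMatched.Completion;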
If my guess is correct, I sort of feel like this is a bug or design error in the link option completion propagation. Why should a block be complete if it was never used?
Why shouldn't it? To me, this makes perfect sense: even when there were no items with Status.Delayed this time around, you still want to complete the block that processes those items, so that any follow-up code can know that all delayed items were already processed. And the fact that there weren't any doesn't matter.
Anyway, if you encounter this often, you might want to create a helper method that links several source blocks to a single target block at the same time and propagates completion correctly:
public static void LinkTo<T>(
    this IReadOnlyCollection<ISourceBlock<T>> sources, ITargetBlock<T> target,
    bool propagateCompletion)
{
    foreach (var source in sources)
    {
        source.LinkTo(target);
    }

    if (propagateCompletion)
        Task.WhenAll(sources.Select(source => source.Completion))
            .ContinueWith(_ => target.Complete());
}
Usage:
new[] { block1, block2, block3 }.LinkTo(endBlock, propagateCompletion: true);

Ok. So I have to thank Cory first off. When I first read his comment I was a little annoyed, because I felt my code illustrated the concept pretty well and could be turned into a working version easily. But anyway, his comment made me feel the need to put together a complete, testable version I could post.
In my test, the surprising part was that even though it mimicked my real code, the path I thought would fail passed, and the path I thought would pass failed. This made my head spin a bit. So I started on some more permutations of the original code. Basically, I created blocks that were synchronous and blocks that were asynchronous and made both kinds of pipelines. Four in total, two sync and two async; one of each used link options to propagate completion, and the other used completion tasks in the MainBlock as shown.
After adding some task delays to the async tasks, I found that the synchronous versions passed the test and the async ones failed.
So, the eventual solution to the problem was sort of none of the above. As it turns out, the link option propagation (at least this is my guess) will propagate the completion for blocks that don't satisfy the predicate in the LinkTo. So, when a Thing with a Status of Complete comes down, it goes to Block1.
Oh, I should point out that in the complete test code I made, all blocks 1, 2 & 3 connect to the same EndBlock, which is not shown in the original code.
Anyway, after the predicate is satisfied and the Thing goes to Block1, blocks 2 and 3, I believe, are set to complete. This causes the EndBlock to complete, which we are awaiting in the unit test, and the Assert fails because Block1 isn't done doing its work yet.
If my guess is correct, I sort of feel like this is a bug or design error in the link option completion propagation. Why should a block be complete if it was never used?
So, here is what I did to solve the problem. I took out the link options and manually wired up the completion events. Like this:
MainBlock.Completion.ContinueWith(t =>
{
    Block1.Complete();
    Block2.Complete();
    Block3.Complete();
});

Task.WhenAll(Block1.Completion, Block2.Completion, Block3.Completion)
    .ContinueWith(t =>
    {
        EndBlock.Complete();
    });
This worked fine, and when moved to my real code it worked as well. The Task.WhenAll is what made me believe that unused blocks were being set to complete, and why automatic propagation was the problem.
I hope this helps someone. I will come back and add a link when I post all my test code.
Edit:
Here is a link to the test code gist https://gist.github.com/jmichas/bfab9cec84f0d1e40e12

Related

Prevent multiple execution of a ReactiveCommand (CreateAsyncTask)

Is it possible to prevent multiple executions of a ReactiveCommand?
Here is the 'simple' code I use:
The command is created:
this.LoadCommand = ReactiveCommand.CreateAsyncTask(
    async _ => await this._dataService.Load(),
    RxApp.TaskpoolScheduler);
After that, I add the subscription to the command:
this.LoadCommand.Subscribe(assets => ...);
And finally, I execute the command:
this.LoadCommand.ExecuteAsyncTask();
If I call ExecuteAsyncTask multiple times from several locations, I would like any subsequent calls to wait for the first one to finish.
EDIT:
Here is the complete code for the Subscribe method:
this.LoadCommand.Subscribe(assets =>
{
    Application.Current.Dispatcher.Invoke(
        DispatcherPriority.Background,
        new Action(() => this.Assets.Clear()));

    foreach (Asset asset in assets)
    {
        Application.Current.Dispatcher.Invoke(
            DispatcherPriority.Background,
            new Action<Asset>(a =>
            {
                this.Assets.Add(a);
            }), asset);
    }
});
Thanks,
Adrien.
I downloaded your sample application, and was able to fix it. Here's my 2 cents:
1) I took out the RxApp.TaskpoolScheduler parameter in your command creation. That parameter tells the command to deliver its results using that scheduler, and I think you want to stick to delivering results on the UI thread.
2) Since with that change your Subscribe logic now runs on the UI thread, you don't need to deal with all that Invoking. You can access the collection directly:
this.LoadCommand.Subscribe(dataCollection =>
{
    DataCollection.Clear();
    DataCollection.AddRange(dataCollection);
});
Making just those 2 changes caused it to "work".
I'm no expert, but what I think was happening is that the ReactiveCommand LoadCommand was immediately returning and delivering results on various TaskPool threads. The command never allows concurrency within itself, which is by design. However, I think the subscribes were happening concurrently (a race), since each was coming in on a different thread. So all the clears occurred, then all the adds.
By subscribing and handling all on the same thread you can avoid this, and if you can manage it on the UI thread, you won't need to involve Invoking to the Dispatcher.
Also, in this particular situation, using Invoke on the Dispatcher with DispatcherPriority.Background seems to execute things in a non-serial fashion. I'm not sure of the exact order, but it seemed to do all the clears, then the adds in reverse order (I incremented a counter so I could tell which invocation was which). So there is definitely something to be said for that. FWIW, changing the priority to DispatcherPriority.Send kept it serial and displayed the "expected" behavior. That said, I still prefer avoiding Invoking to the Dispatcher altogether if you can.
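For what it's worth, here's another sketch (not from the sample; it assumes ReactiveUI's RxApp.MainThreadScheduler is available in this version) that keeps the TaskpoolScheduler for the work but explicitly marshals results back to the UI thread before touching the collection:
this.LoadCommand
    .ObserveOn(RxApp.MainThreadScheduler) // deliver results on the UI thread
    .Subscribe(assets =>
    {
        this.Assets.Clear(); // safe: we're on the UI thread now
        foreach (var asset in assets)
        {
            this.Assets.Add(asset);
        }
    });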

Use Rx Start, Retry, Delay, Wait for synchronous file delete retry

I need to delete a file, and some other process in the application blocks it. As a workaround, I decided to try several times with an interval between attempts. Is this the correct approach:
Observable.Start(() => File.Delete(path)).Retry(2)
    .Delay(TimeSpan.FromMilliseconds(500)).Wait();
This won't work the way you want. There are three problems:
Delay doesn't work the way you think: it delays passing on the events, but the source still runs immediately.
You are issuing the Retry before the Delay.
You need to use Defer to create a factory, because Start will only call the embedded function once on evaluation.
Have a look at this answer for more detail on Delay and why DelaySubscription is better: Rx back off and retry.
This answer has a good implementation of a back-off retry: Write an Rx "RetryAfter" extension method
A simple fix for your code could be this, which catches the exception and rethrows it after a delay - but there's no delay if it works:
Observable.Defer(() => Observable.Start(() => File.Delete(path)))
    .Catch((Exception ex) =>
        Observable.Throw<Unit>(ex)
            .DelaySubscription(TimeSpan.FromMilliseconds(500)))
    .Retry(2)
    .Wait();
Do have a look at the second link above for a fuller and better implementation though.
I kept the code above simple to make the point, and it isn't perfect - it always delays the exception, for example.
You really want to have the DelaySubscription on the action and have its delay time be dynamically calculated depending on the number of retries, which is what the linked implementation does.
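For illustration, here's a hedged sketch of that idea with a hypothetical linear back-off (the delay grows with the attempt number); the linked "RetryAfter" answer has a fuller, better implementation:
public static class RetryExtensions
{
    public static IObservable<T> RetryWithBackoff<T>(
        this IObservable<T> source, int retryCount, TimeSpan delay)
    {
        // The outer Defer resets the attempt counter for each new subscription.
        return Observable.Defer(() =>
        {
            int attempt = 0;
            return Observable
                .Defer(() => ++attempt == 1
                    ? source // first try: subscribe immediately
                    : source.DelaySubscription(
                          TimeSpan.FromTicks(delay.Ticks * (attempt - 1))))
                .Retry(retryCount);
        });
    }
}
Usage - note the Defer around Start, so each retry actually re-runs the delete:
Observable.Defer(() => Observable.Start(() => File.Delete(path)))
    .RetryWithBackoff(3, TimeSpan.FromMilliseconds(500))
    .Wait();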

When to use OrderByCompletion (Jon Skeet) vs Parallel.ForEach with async delegates

Recently, Jon Skeet spoke at NDC London about C# 5 async/await and presented the idea of "ordering by completion" a list of async tasks. Here is a link: http://msmvps.com/blogs/jon_skeet/archive/2012/01/16/eduasync-part-19-ordering-by-completion-ahead-of-time.aspx
I am a bit confused, or should I say I am not sure, when this technique would be more appropriate to use.
I cannot understand the difference between this and the example below:
var bag = new ConcurrentBag<object>();
Parallel.ForEach(myCollection, async item =>
{
    // some pre stuff
    var response = await GetData(item);
    bag.Add(response);
    // some post stuff
});
or ForEachAsync as explained by Stephen Toub - http://blogs.msdn.com/b/pfxteam/archive/2012/03/05/10278165.aspx
EDIT: Found a blog post from Stephen Toub explaining "ordering by completion", i.e. "processing tasks as they complete". Worth reading. After reading it, I could clearly understand how this technique works and also when to use it.
Don't use Parallel.ForEach to execute async code. Parallel.ForEach doesn't understand async, so your lambda will be turned into async void, which won't work correctly (Parallel.ForEach will return before all work is done; exceptions won't be handled properly; possibly other issues).
Use something like ForEachAsync() when you have a collection of objects (not Tasks), you want to perform some async action for each of them and the actions should execute in parallel.
Use OrderByCompletion() when you have a collection of Tasks, you want to perform some action (asynchronous or not) on the result of each Task, the actions should not execute in parallel, and you want to execute the actions based on the order in which the Tasks complete.
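For reference, here's a minimal sketch of the "order by completion" idea, along the lines of Stephen Toub's "processing tasks as they complete" post (the names are illustrative, not Jon Skeet's exact API):
public static class TaskExtensions
{
    // Returns tasks that complete in the order the inputs actually finish.
    public static Task<Task<T>>[] OrderByCompletion<T>(this IEnumerable<Task<T>> tasks)
    {
        var inputs = tasks.ToList();
        var buckets = new TaskCompletionSource<Task<T>>[inputs.Count];
        for (int i = 0; i < buckets.Length; i++)
            buckets[i] = new TaskCompletionSource<Task<T>>();

        int next = -1;
        foreach (var input in inputs)
        {
            // Whichever input finishes next fills the next free bucket.
            input.ContinueWith(
                completed => buckets[Interlocked.Increment(ref next)].TrySetResult(completed),
                TaskContinuationOptions.ExecuteSynchronously);
        }

        return buckets.Select(b => b.Task).ToArray();
    }
}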
Parallel.ForEach(myCollection, async item =>
This is almost certainly not what you want. The delegate has type Action<T>, and so the anonymous method is an async void method. That means it gets launched, and you have no way of checking its status other than by checking for any of its side effects. In particular, if anything goes wrong, you cannot catch and handle the exception.
Assuming nothing goes wrong, though, results will be added to bag as they complete. Until anything completes, bag will be empty.
In contrast, OrderByCompletion returns an IEnumerable<Task<T>> that immediately contains all not-yet-finished tasks. You could await the fifth element and continue when any five tasks have completed. This might be useful when, for example, you want to run a large number of tasks and periodically update a form to show the progress.
The third option you gave, ForEachAsync, would behave like ForEach, except it would do it right, without the problems mentioned above.
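And for completeness, a sketch of ForEachAsync in the spirit of Stephen Toub's post linked above (partitioned work with a fixed degree of parallelism):
public static class EnumerableExtensions
{
    public static Task ForEachAsync<T>(
        this IEnumerable<T> source, int degreeOfParallelism, Func<T, Task> body)
    {
        // One worker task per partition; each worker awaits the async body
        // item by item, so exceptions flow into the returned Task.
        return Task.WhenAll(
            from partition in Partitioner.Create(source).GetPartitions(degreeOfParallelism)
            select Task.Run(async () =>
            {
                using (partition)
                    while (partition.MoveNext())
                        await body(partition.Current);
            }));
    }
}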

Using Tasks as a way to separate computation of results from committing results

An API pattern we are considering for separating the work of calculating some results from the committing of those results is:
interface IResults { }

class Results : IResults { }

Task<IResults> CalculateResultsAsync(CancellationToken ct)
{
    return Task.Run<IResults>(() => new Results(), ct);
}

void CommitResults(IResults iresults)
{
    Results results = (Results)iresults;
    // Commit the results
}
This would allow a client to have a UI that kicks off the calculation of some results, knows when the calculation is ready, and then decides whether or not to commit the results. This is mainly to help us deal with the case where, during the calculation, the UI allows the user to cancel the operation. We want to ensure that:
The cancel UI is only shown while the action is still cancellable (i.e. once we're in CommitResults, there is no going back), so once the CalculateResultsAsync task completes, we take down the cancel UI and, as long as the user hasn't cancelled, go ahead and call the commit method.
We don't want to have a case (i.e. a race condition) where the user hits cancel and the results are committed anyways.
The client will never make use of IResults other than to pass it back to CommitResults.
Question:
The general question is: is this a good approach? Specifically:
It doesn't feel right having this split into two methods, since the client never inspects IResults; it just hands it back to the Commit method.
Is there a standard approach to this problem?
This is a very standard pattern (if not the ideal pattern), especially when your Results object is immutable. We do this regularly in TPL-using code inside the Visual Studio codebase. Much happiness always exists when your asynchronous/parallel logic is processing data, and the mutating crap lives apart from that.
If you're familiar with or have heard of the "Roslyn" project, this is a pattern we're actually encouraging people to use. The idea is that refactorings can process asynchronously in the background and produce an object, just like your result one, that represents the result of the refactoring being applied. Then, on the UI thread, anybody can take one of those result objects and apply it, which goes and updates all your files to contain the new text.
I do find the entire IResults/Results thing a bit strange - it's not clear whether you're using this to hide implementations from yourself or not. If the empty interface and the cast bug you, you could consider adding a Commit method to IResults, which the result object implements. Up to you.
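A quick sketch of that alternative shape (hypothetical, just to illustrate the suggestion):
interface IResults
{
    void Commit();
}

class Results : IResults
{
    public void Commit()
    {
        // Commit the results - no cast needed at the call site
    }
}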
I'm not sure why exactly you would need this pattern. To me, it seems that if you check the CancellationToken just before starting the commit, you're going to get exactly the same result with a simpler interface.
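A minimal sketch of that simpler flow (hypothetical call site, reusing the question's methods):
var results = await CalculateResultsAsync(cts.Token);

// If the user cancelled while we were calculating, never reach the commit.
cts.Token.ThrowIfCancellationRequested();
CommitResults(results);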

Use case for .PublishLast() (previously Prune)

In my opinion, I have a pretty good "feel" for Rx functions - I use many of them, or can imagine how others could be useful - but I can't find a place for the .Prune function. I know that it is a Multicast to an AsyncSubject, but how can this be useful in a real scenario?
Edit: Richard says WebRequest is a good candidate for Prune(). I still don't see how. Let's take an example - I want to transform incoming URIs into images:
public static IObservable<BitmapImage> ToImage(this IObservable<string> source)
{
    var streams =
        from wc in source.Select(WebRequest.Create)
        from s in Observable
            .FromAsyncPattern<WebResponse>(wc.BeginGetResponse,
                                           wc.EndGetResponse)()
            .Catch(Observable.Empty<WebResponse>())
        select s.GetResponseStream();

    return streams
        .ObserveOnDispatcher()
        .Select(x =>
        {
            var bmp = new BitmapImage();
            bmp.SetSource(x);
            return bmp;
        });
}
I don't see it as necessary to append .Prune to .FromAsyncPattern, because when you're calling FromAsyncPattern() (which is hot) you subscribe "instantly".
As was confirmed on the Rx Forum, Prune is just a convenience operator.
If your observable has a single value and you're publishing it, you can replace Publish/Connect with a single call to .Prune().
So from my experience, the most common scenario for Prune is:
You have a cold observable that produces side-effects and emits only one value
You have more than one subscriber to that observable, so you want to make it hot (because of side-effects)
Another, pointed out in the forum, is when you need to cache a particular value of a hot observable (usually the first). Then you use FromEvent(...).Take(1).Prune(), and anybody who subscribes to it is guaranteed to get the same value. This one is not just "convenience"; it's pretty much the only easy way to achieve the result.
Pretty useful, after all!
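Here's a sketch of that "cache the first value" case, using a hypothetical button and the modern name PublishLast (Prune in early Rx):
var firstClick = Observable.FromEventPattern(button, "Click")
    .Take(1)
    .PublishLast(); // AsyncSubject under the hood: remembers the one value

firstClick.Connect();

// Even a subscriber that arrives long after the click sees that same value.
firstClick.Subscribe(_ => Console.WriteLine("Saw the first click"));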
The most common scenario is when the source observable is hot and can complete before you subscribe to it. The AsyncSubject captures the last value and re-emits it for any future subscribers.
Edit
I'd have to check, but I believe FromAsyncPattern uses an AsyncSubject internally, so it is actually already "Pruned".
However, assuming you were working with some other hot source that did not, the use of Prune comes entirely down to the lifetime of the IObservable before it is subscribed to. If you subscribe to it instantly, there is no need for Prune. If the IObservable will exist for a while before being subscribed to, however, it may have already completed.
This is my understanding, as someone who has ported Rx but never used Prune. Maybe you should ask the same question on the Rx forums? You've got a chance of it being answered by someone on the Rx team.
I've also found a neat use for it when I have multiple UI components that need to listen to a task (e.g. a callback); by default, Subscribe() on a cold observable will kick off that task several times, which is typically not what you want when sharing state across UI components.
I know Richard mentioned a lot of these points, but I figured this is such a perfect candidate for single-run Tasks that I'd add this example too.
var oTask = Observable.FromAsync(() => Task.Factory.StartNew(() =>
{
    Thread.Sleep(1000);
    Console.WriteLine("Executed Task");
}));

// Set up the IConnectableObservable
var oTask2 = oTask.PublishLast();

// Subscribe - nothing happens yet
oTask2.Subscribe(x => { Console.WriteLine("Called from Task 1"); });
oTask2.Subscribe(x => { Console.WriteLine("Called from Task 2"); });

// The one and only time the Task is run
oTask2.Connect();

// Subscribe after the task is already complete - we still get the result
Thread.Sleep(5000);
oTask2.Subscribe(x => { Console.WriteLine("Called from Task 3"); });
