Use case for .PublishLast() (previously Prune) - c#

In my opinion, I have a pretty good "feel" for Rx functions: I use many of them or can imagine how others could be useful, but I can't find a place for the .Prune function. I know that it is a Multicast to an AsyncSubject, but how can that be useful in a real scenario?
Edit: Richard says WebRequest is a good candidate for Prune(). I still don't see how. Let's take an example: I want to transform incoming URIs into images:
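public static IObservable<BitmapImage> ToImage(this IObservable<string> source)
{
    var streams =
        from wc in source.Select(WebRequest.Create)
        // FromAsyncPattern returns a function; invoking it starts the request
        from s in Observable
            .FromAsyncPattern<WebResponse>(wc.BeginGetResponse,
                                           wc.EndGetResponse)()
            // a failed request is dropped instead of terminating the whole sequence
            .Catch(Observable.Empty<WebResponse>())
        select s.GetResponseStream();
    return streams
        // BitmapImage is a UI object, so build it on the dispatcher thread
        .ObserveOnDispatcher()
        .Select(x =>
        {
            var bmp = new BitmapImage();
            bmp.SetSource(x);
            return bmp;
        });
}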
I don't see why it would be necessary to append .Prune to .FromAsyncPattern, because when you call FromAsyncPattern() (which is hot) you subscribe "instantly".

As was confirmed on the Rx forum, Prune is just a convenience operator.
If your observable has a single value and you're publishing it, you can replace Publish/Connect with a single call to .Prune().
So from my experience, the most common scenario for Prune is:
You have a cold observable that produces side-effects and emits only one value
You have more than one subscriber to that observable, so you want to make it hot (because of side-effects)
Another, pointed out in the forum, is when you need to cache a particular value (usually the first) of a hot observable. Then you use FromEvent(...).Take(1).Prune(), and anybody who subscribes to it (once it has been connected) is guaranteed to get the same value. This one is not just "convenience", but pretty much the only easy way to achieve the result.
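To make the Take(1).PublishLast() idea concrete, here is a minimal sketch. It assumes the current System.Reactive NuGet package (where Prune is now called PublishLast); a Subject<int> stands in for a hot FromEvent(...) source, and the class and method names are only illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Reactive.Subjects;

public static class FirstValueCache
{
    // Returns what an early and a late subscriber each receive.
    public static (List<int> early, List<int> late) Run()
    {
        // Stand-in for a hot event source: values pushed while nobody
        // listens are simply lost.
        var hotSource = new Subject<int>();

        // Take(1) completes after the first value; PublishLast keeps that
        // value in an AsyncSubject so every subscriber, early or late, gets it.
        var firstValue = hotSource.Take(1).PublishLast();
        firstValue.Connect(); // start listening to the hot source now

        var early = new List<int>();
        var late = new List<int>();

        firstValue.Subscribe(early.Add);

        hotSource.OnNext(1); // captured: this is the first value
        hotSource.OnNext(2); // ignored: Take(1) has already completed

        // Subscribing after completion still delivers the cached value.
        firstValue.Subscribe(late.Add);

        return (early, late);
    }
}
```

Note that Connect() has to happen before the value you want to cache is raised; anything the hot source pushes earlier is not captured.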
Pretty useful, after all!

The most common scenario is when the source observable is hot and can complete before you subscribe to it. The AsyncSubject captures the last value and re-emits it for any future subscribers.
Edit
I'd have to check, but I believe FromAsyncPattern uses an AsyncSubject internally, so it is effectively already "Pruned".
However, assuming you were working with some other hot source that did not, whether to use Prune comes down entirely to how long the IObservable lives before it is subscribed to. If you subscribe to it instantly, there is no need for Prune. If the IObservable will exist for a while before being subscribed to, however, it may already have completed, and a late subscriber to a plain hot source would then receive no value at all.
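A small sketch of the "completed before you subscribe" case, assuming the System.Reactive package. It compares Publish() (a plain Subject underneath) with PublishLast() (an AsyncSubject underneath) for a subscriber that only arrives after the connected source has finished; LateSubscriberDemo is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Concurrency;
using System.Reactive.Linq;

public static class LateSubscriberDemo
{
    // Returns what a subscriber that arrives after the source has completed
    // receives through Publish() and through PublishLast().
    public static (List<int> viaPublish, List<int> viaPublishLast) Run()
    {
        // Emits 42 and completes synchronously as soon as it is subscribed to.
        var source = Observable.Return(42, ImmediateScheduler.Instance);

        var published = source.Publish();         // multicasts through a plain Subject
        var publishedLast = source.PublishLast(); // multicasts through an AsyncSubject

        // Connecting runs the source to completion before anyone subscribes.
        published.Connect();
        publishedLast.Connect();

        var viaPublish = new List<int>();
        var viaPublishLast = new List<int>();
        published.Subscribe(viaPublish.Add);         // only sees OnCompleted
        publishedLast.Subscribe(viaPublishLast.Add); // sees the cached 42
        return (viaPublish, viaPublishLast);
    }
}
```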
This is my understanding, as someone who has ported Rx but never used Prune. Maybe you should ask the same question on the Rx forums? You've got a chance of it being answered by someone on the Rx team.

I've also found a neat use for it when I have multiple UI components that need to listen to a task (e.g. a callback). By default, every Subscribe() on a cold observable kicks off that task again, which is typically not what you want when sharing state across UI components.
I know Richard mentioned a lot of these points, but single-run Tasks are such a perfect fit that I figured I'd add this example too.
var oTask = Observable.FromAsync(() => Task.Factory.StartNew(() =>
{
    Thread.Sleep(1000);
    Console.WriteLine("Executed Task");
}));
// Set up the IConnectableObservable
var oTask2 = oTask.PublishLast();
// Subscribe - nothing happens yet
oTask2.Subscribe(x => { Console.WriteLine("Called from Task 1"); });
oTask2.Subscribe(x => { Console.WriteLine("Called from Task 2"); });
// The one and only time the Task is run
oTask2.Connect();
// Subscribe after the task has already completed - we still get the result
Thread.Sleep(5000);
oTask2.Subscribe(x => { Console.WriteLine("Called from Task 3"); });

Related

TPL Dataflow Blocks using LinkTo Predicate

I have some blocks that eventually go from a TransformBlock to one of three other transform blocks, based on the LinkTo predicate. I am using DataflowLinkOptions to propagate completion. The problem is that when a predicate is satisfied and that block starts, the rest of my pipeline continues on. It would seem that the pipeline should wait for this block to finish first.
The code for this is something like this:
var linkOptions = new DataflowLinkOptions {PropagateCompletion = true};
mainBlock.LinkTo(block1, linkOptions, x => x.Status == Status.Complete);
mainBlock.LinkTo(block2, linkOptions, x => x.Status == Status.Cancelled);
mainBlock.LinkTo(block3, linkOptions, x => x.Status == Status.Delayed);
// anything that matches none of the predicates is discarded
mainBlock.LinkTo(DataflowBlock.NullTarget<Thing>(), linkOptions);
Now, as I said, this doesn't work as I'd expect, so the only way I've found to get the behavior I want is to take the linkOptions out and add the following to the lambda for the mainBlock:
mainBlock = new TransformBlock<Thing,Thing>(input =>
{
    DoMyStuff(input);
    if (input.Status == Status.Complete)
    {
        mainBlock.Completion.ContinueWith(t => block1.Complete());
    }
    if (input.Status == Status.Cancelled)
    {
        mainBlock.Completion.ContinueWith(t => block2.Complete());
    }
    if (input.Status == Status.Delayed)
    {
        mainBlock.Completion.ContinueWith(t => block3.Complete());
    }
    return input;
});
So the question is: is this the only way to get this to work?
BTW, I've run this in my unit test with a single data item going through it, to try to debug the pipeline behavior. Each block has been tested individually with multiple unit tests. What happens in my pipeline unit test is that the assert is hit before the block has finished executing, so the test fails.
If I remove the block2 and block3 links and debug the test using the linkOptions it works fine.
Your problem is not with the code in your question, that works correctly: when the main block completes, all the three followup blocks are marked for completion too.
The problem is with the end block: you're using PropagateCompletion there too, which means that when any of the three previous blocks completes, the end block is marked for completion. What you want is to mark it for completion when all three blocks complete, and the Task.WhenAll().ContinueWith() combination from your answer does that (though the first part of that snippet is unnecessary: it does exactly what PropagateCompletion would).
As it turns out, the link option propagation (at least this is my guess) will propagate the completion for blocks that don't satisfy the predicate in the linkTo.
Yes, it propagates completion always. Completion doesn't have any item associated with it, so it doesn't make any sense to apply the predicate to it. Maybe the fact that you always have only a single item (which is not common) makes this more confusing for you?
If my guess is correct I sort of feel like this is bug or design error in the link option completion propagation. Why should a block be complete if it was never used?
Why shouldn't it? To me, this makes perfect sense: even when there were no items with Status.Delayed this time around, you still want to complete the block that processes those items, so that any follow-up code can know that all delayed items were already processed. And the fact that there weren't any doesn't matter.
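The point that completion ignores link predicates is easy to check in isolation. This sketch assumes the System.Threading.Tasks.Dataflow package; a BufferBlock and two ActionBlocks stand in for the question's TransformBlocks, and even/odd predicates replace the Status checks.

```csharp
using System;
using System.Threading.Tasks.Dataflow;

public static class PredicateCompletionDemo
{
    // Returns whether a linked block that never matched its predicate was
    // completed anyway, and how many items it actually processed.
    public static (bool completed, int itemsProcessed) Run()
    {
        var linkOptions = new DataflowLinkOptions { PropagateCompletion = true };
        var main = new BufferBlock<int>();

        int oddItems = 0;
        var evens = new ActionBlock<int>(_ => { });
        var odds = new ActionBlock<int>(_ => oddItems++);

        main.LinkTo(evens, linkOptions, x => x % 2 == 0);
        main.LinkTo(odds, linkOptions, x => x % 2 != 0);

        main.Post(2);     // only the "evens" link accepts this item
        main.Complete();

        // Completion carries no item, so no predicate is consulted:
        // "odds" is told to complete even though it never received anything.
        bool completed = odds.Completion.Wait(TimeSpan.FromSeconds(5));
        return (completed, oddItems);
    }
}
```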
Anyway, if you encounter this often, you might want to create a helper method that links several source blocks to a single target block at the same time and propagates completion correctly:
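public static void LinkTo<T>(
    this IReadOnlyCollection<ISourceBlock<T>> sources, ITargetBlock<T> target,
    bool propagateCompletion)
{
    foreach (var source in sources)
    {
        // linked without PropagateCompletion, so no single source can complete the target
        source.LinkTo(target);
    }
    if (propagateCompletion)
    {
        // complete the target only once every source has completed (needs System.Linq)
        Task.WhenAll(sources.Select(source => source.Completion))
            .ContinueWith(_ => target.Complete());
    }
}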
Usage:
new[] { block1, block2, block3 }.LinkTo(endBlock, propagateCompletion: true);
OK. So I have to thank Cory first off. When I first read his comment I was a little annoyed, because I felt my code illustrated the concept pretty well and could easily be turned into a working version. But because of his comment, I felt the need to write a complete, testable version I could post.
In my test, the surprising part was that even though it mimicked my real code, the path I thought would fail passed and the path I thought would pass failed. This made my head spin a bit, so I tried more permutations of the original code. I created blocks that were synchronous and blocks that were asynchronous, and built both kinds of pipelines: four in total, two sync and two async. One of each pair used link options to propagate completion, and the other used completion tasks in the MainBlock as shown.
After adding some task delays to the async tasks I found that the synchronous versions passed the test and the async ones failed.
So, the eventual solution to the problem was sort of none of the above. As it turns out, the link option propagation (at least this is my guess) will propagate the completion for blocks that don't satisfy the predicate in the linkTo. So, when a Thing with a Status of Complete comes down it goes to Block1.
Oh, I should point out that in the complete test code I made blocks 1, 2 & 3 all connect to the same EndBlock, which is not shown in the original code.
Anyway, after the predicate is satisfied and the Thing goes to Block1, blocks 2 and 3, I believe, are set to complete. This causes the EndBlock to complete, which is what we are awaiting in the unit test, and the Assert fails because Block1 hasn't finished its work yet.
If my guess is correct I sort of feel like this is bug or design error in the link option completion propagation. Why should a block be complete if it was never used?
So, here is what I did to solve the problem. I took out the link options and manually wired up the completion events. Like this:
MainBlock.Completion.ContinueWith(t =>
{
    Block1.Complete();
    Block2.Complete();
    Block3.Complete();
});
Task.WhenAll(Block1.Completion, Block2.Completion, Block3.Completion)
    .ContinueWith(t =>
    {
        EndBlock.Complete();
    });
This worked fine, and when moved to my real code worked as well. The Task.WhenAll is what made me believe that unused blocks were set to complete and why automatic propagation was the problem.
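Putting the fix together, here is a hedged, self-contained sketch of the fan-out/fan-in shape described above, assuming the System.Threading.Tasks.Dataflow package. The names main, slow, fast and end are hypothetical stand-ins for MainBlock, Block1..3 and EndBlock: the fan-out links keep PropagateCompletion, the fan-in links don't, and Task.WhenAll decides when end completes.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class FanInCompletionDemo
{
    // Runs one item through main -> (slow | fast) -> end and returns how many
    // items the end block processed before it completed.
    public static int Run()
    {
        var propagate = new DataflowLinkOptions { PropagateCompletion = true };

        var main = new TransformBlock<int, int>(x => x);
        // "slow" gets the item and takes a while; "fast" never gets anything.
        var slow = new TransformBlock<int, int>(async x => { await Task.Delay(200); return x; });
        var fast = new TransformBlock<int, int>(x => x);

        int received = 0;
        var end = new ActionBlock<int>(_ => Interlocked.Increment(ref received));

        // Fan-out: completion may flow from main to both branches.
        main.LinkTo(slow, propagate, x => x > 0);
        main.LinkTo(fast, propagate, x => x <= 0);

        // Fan-in without PropagateCompletion: complete "end" only once
        // *both* branches are done.
        slow.LinkTo(end);
        fast.LinkTo(end);
        Task.WhenAll(slow.Completion, fast.Completion)
            .ContinueWith(_ => end.Complete());

        main.Post(1);
        main.Complete();
        end.Completion.Wait();

        // With PropagateCompletion on the fan-in links instead, "fast" could
        // complete "end" before the delayed item arrives, leaving this at 0.
        return received;
    }
}
```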
I hope this helps someone. I will come back and add a link when I post all my test code.
Edit:
Here is a link to the test code gist https://gist.github.com/jmichas/bfab9cec84f0d1e40e12

What is the correct way to schedule periodic events in Rx?

Simple question, I would hope: I'm writing an application in which I want to retrieve data from a database; I've elected to use Rx for this purpose to represent the database as a sequence of values.
I only want to poll the database (and thus have my observer's notifications occur) at a maximum of once every 5 seconds. Right now, I have something like this, where the Scheduler is scheduling a periodic task that causes my observer to be subscribed to the observable that is my database:
_scheduler.SchedulePeriodic(_repository, TimeSpan.FromSeconds(5),
(repo) => repo.AsObservable()
.Where(item => _SomeFilter(item))
.Subscribe(item => _SomeProcessFunction(item))
);
Function names and the like omitted for brevity; repo.AsObservable() is simply a function that returns an IObservable<T> of all the items inside the repository at that point.
Now, I figure that this is the correct way of doing things; however, before I came up with this solution I came up with a different one, in which I had an Observable.Timer whose subscribed observer would subscribe to the AsObservable() return value on every timer tick instead.
My question is that this seems very... odd: why am I subscribing multiple times to the observable?
Sorry if this question is confusing, it confused me while writing it, however the schedulers are also confusing for me :P
What if you use the built in operators instead of manually scheduling tasks?
repo.AsObservable()
    .Where(_SomeFilter)
    // Wait 5 seconds before completing
    .Concat(Observable.Empty<T>().Delay(TimeSpan.FromSeconds(5)))
    // Resubscribe indefinitely after source completes
    .Repeat()
    // Subscribe
    .Subscribe(_SomeProcessFunction);
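If repo.AsObservable() builds its snapshot when it is called rather than when it is subscribed to, Repeat() would replay the same snapshot forever. One way around that, sketched here under the assumption of the System.Reactive package, is to wrap the query in Observable.Defer; Poll and query are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;

public static class PollingSketch
{
    // query is a hypothetical repository call returning a fresh snapshot.
    // Defer re-runs it on every resubscription made by Repeat, instead of
    // replaying an observable that was built once up front.
    public static IObservable<T> Poll<T>(Func<IEnumerable<T>> query, TimeSpan interval) =>
        Observable.Defer(() => query().ToObservable())
            .Concat(Observable.Empty<T>().Delay(interval)) // pause before completing
            .Repeat();                                     // then query again
}
```

Unlike an Observable.Interval-driven poll, this waits for the interval after each query has finished, so a slow query never overlaps with the next one.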

Is a Subject in RX.Net always harmful?

I was talking to a colleague who pointed me to the SO question about subjects being considered harmful. However, I have two cases with non-deterministic code where I can't see a reasonable alternative.
Non-standard event:
void Handler(MyClass sender, ResultClass result)
{
    subject.OnNext(result);
}
public delegate void _handler(
    [MarshalAs(UnmanagedType.Interface), In] MyClass sender,
    [MarshalAs(UnmanagedType.Interface), In] ResultClass result);
Parallel Tasks (Non-Deterministic number of tasks all running in parallel, starting at different times):
Task.Run(() => ...).ContinueWith(prevTask => subject.OnNext(prevTask.Result))
The subject is not exposed, only an observable over it. Is there another suggested route that isn't a ton of boilerplate?
Subjects are not always harmful. There are many legitimate uses of them, even within Rx itself. However, many times when a person reaches for a Subject, there's already a robust Rx method written for that scenario (which may or may not use subjects internally). That is the case for both of your examples: look at Task.ToObservable and Observable.FromEventPattern.
Another common case where subjects are misused is when a developer breaks a stream in two. They become convinced they need to subscribe to a stream and, in the callback, produce data for a new stream, which they do with a Subject. Usually they should just have used Select instead.
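To illustrate the "use Select instead" point, a minimal sketch assuming the System.Reactive package. Both methods produce the same values, but only the Select version forwards errors and completion and ties the derived stream's lifetime to its source; the class and method names are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Reactive.Subjects;

public static class SubjectVsSelect
{
    // Anti-pattern: subscribe to one stream and push derived values into a Subject.
    public static List<string> ViaSubject(IObservable<int> source)
    {
        var results = new List<string>();
        var derived = new Subject<string>();
        derived.Subscribe(results.Add);
        // OnError/OnCompleted are silently dropped unless forwarded by hand.
        source.Subscribe(x => derived.OnNext($"item {x}"));
        return results;
    }

    // Same values with an operator: errors, completion and disposal flow through.
    public static List<string> ViaSelect(IObservable<int> source)
    {
        var results = new List<string>();
        source.Select(x => $"item {x}").Subscribe(results.Add);
        return results;
    }
}
```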
Observable.FromEvent
Observable.FromEvent works for more than just built-in event types: you just need to use the correct overload.
using System;
using System.Reactive.Linq;

class Program
{
    private static event Action<int> MyEvent;
    public static void Main(string[] args)
    {
        Observable.FromEvent<int>(
            (handler) => Program.MyEvent += handler,
            (handler) => Program.MyEvent -= handler
        )
        .Subscribe(Console.WriteLine);
        Program.MyEvent(5);
        Console.ReadLine();
    }
}
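The question's delegate takes two arguments and isn't an EventHandler, which is what the FromEvent<TDelegate, TEventArgs> overload with a conversion function is for. In this sketch (System.Reactive package assumed), MyClass, ResultClass, ResultHandler and ResultSource are hypothetical stand-ins for the question's COM types; the conversion turns Rx's Action<ResultClass> into a handler that drops the sender.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;

// Hypothetical stand-ins for the question's types and delegate.
public class MyClass { }
public class ResultClass { public int Value; }
public delegate void ResultHandler(MyClass sender, ResultClass result);

public class ResultSource
{
    public event ResultHandler ResultReady;

    // Simulates the native side raising the event.
    public void Raise(int value) =>
        ResultReady?.Invoke(new MyClass(), new ResultClass { Value = value });
}

public static class ResultObservables
{
    // Wraps the two-argument delegate without a Subject: the handler is
    // added on Subscribe and removed again when the subscription is disposed.
    public static IObservable<ResultClass> Results(ResultSource source) =>
        Observable.FromEvent<ResultHandler, ResultClass>(
            onNext => (sender, result) => onNext(result),
            h => source.ResultReady += h,
            h => source.ResultReady -= h);
}
```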
Task.ToObservable & Merge
If you already have access to all of your tasks, you can convert them to Observables, and Merge them into a single observable.
class Program
{
    public static void Main(string[] args)
    {
        Observable.Merge(
            // Async / Await
            (
                (Func<Task<string>>)
                (async () => { await Task.Delay(250); return "async await"; })
            )().ToObservable(),
            // FromResult
            Task.FromResult("FromResult").ToObservable(),
            // Run
            Task.Run(() => "Run").ToObservable()
        )
        .Subscribe(Console.WriteLine);
        Console.ReadLine();
    }
}
Merge Observable
Alternatively, if you do not have all of your tasks up front, you can still use Merge, but you'll need some way of communicating future tasks. In this case, I've used a subject, but you should use the simplest Observable possible to express this. If that's a subject, then by all means, use a subject.
class Program
{
    public static void Main(string[] args)
    {
        // We use a subject here since we don't have all of the tasks yet.
        var tasks = new Subject<Task<string>>();
        // Make up some tasks.
        var fromResult = Task.FromResult("FromResult");
        var run = Task.Run(() => "Run");
        Func<Task<string>> asyncAwait = async () => {
            await Task.Delay(250);
            return "async await";
        };
        // Merge any future Tasks into an observable, and subscribe.
        tasks.Merge().Subscribe(Console.WriteLine);
        // Send tasks.
        tasks.OnNext(fromResult);
        tasks.OnNext(run);
        tasks.OnNext(asyncAwait());
        Console.ReadLine();
    }
}
Subjects
Why to use or not to use Subjects is a question I don't have the time to answer adequately. Typically speaking, however, I find that using a Subject tends to be the "easy way out" when it appears an operator does not already exist.
If you can somehow limit the exposure of a subject in terms of its visibility to the rest of the application, then by all means use a subject. If you're looking for message bus functionality, however, you should rethink the design of the application, as message buses are anti-patterns.
Subjects aren't harmful; calling them harmful is probably a little too dogmatic even for me (and I am the first to boo-boo the use of subjects). I would say that Subjects indicate a code smell. You probably could be doing it better without them, but if you keep them encapsulated within your class, then at least you keep the smell in one place.
Here I would say that you are already using "non-standard" event patterns, and it seems you don't want to, or can't, change that. In this case, using subjects as a bridge isn't going to make it any worse than it is.
If you were starting from scratch, then I would suggest that you deeply think about your design and you will probably find that you just wouldn't need a subject.
Lastly, I agree with the other comments that you should be using FromEvent and ToTask, but you suggest these do not work. Why? I don't think you provide nearly enough of your code base to help with design questions like this. For example, how are these non-deterministic tasks being created, and by what? What is the actual problem you are trying to solve? If you could provide a full example, you might get the amount of attention you are looking for.
Here is what a good book about Rx says regarding why and when Subject can be harmful:
http://www.introtorx.com/Content/v1.0.10621.0/18_UsageGuidelines.html
"Avoid the use of the subject types. Rx is effectively a functional
programming paradigm. Using subjects means we are now managing state,
which is potentially mutating. Dealing with both mutating state and
asynchronous programming at the same time is very hard to get right.
Furthermore, many of the operators (extension methods) have been
carefully written to ensure that correct and consistent lifetime of
subscriptions and sequences is maintained; when you introduce
subjects, you can break this. Future releases may also see significant
performance degradation if you explicitly use subjects."

Reactive extensions Subject uses

I am at the early stages of learning about Rx and have come across the Subject class. I don't quite understand why this class exists. I understand that it implements both IObservable and IObserver but what are Subjects used for?
As far as I can tell, they can act as a proxy between a source and a bunch of subscribers but couldn't the subscribers just subscribe directly to the source? When I see instances of a Subject being used as an observable and observer I get confused.
I am sure I am just not getting some basic fact here but I don't know what Subject brings to the game. I guess I am looking for some basic (but hopefully real world) example of when Subjects are useful and when they are not (as I have also read that Subjects are not usually used, replaced with Observable.Create).
First, a lot of folks will tell you Subject<T> doesn't belong, since it goes against some other tenets/patterns in the Rx framework.
That said, they act as both an IObservable and an IObserver, so you get some useful functionality out of them. I generally use them during the initial development stages for:
A "debug point" of sorts, where I can subscribe to an IObservable chain inline with a Subject<T>, and inspect the contents with the debugger.
An "observable on demand", where I can manually call OnNext and pass in data I want to inject into the stream
I used to use them to replicate what ConnectableObservable now does: a "broadcast" mechanism for multiple subscribers to a single Observable, but that can be done with Publish now.
A bridging layer between disparate systems; again, this is largely unnecessary now with the various FromAsync and FromEvent extensions, but they can still be used that way (basically, the "old" system injects events into the Subject<T> via OnNext, and from then on it's the normal Rx flow).
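For the "broadcast" item above, a minimal sketch (System.Reactive package assumed) of Publish() sharing a single subscription to a side-effecting cold source between two observers, where a hand-rolled Subject used to be needed; the counters only exist to make the sharing visible.

```csharp
using System;
using System.Reactive.Linq;

public static class BroadcastWithPublish
{
    // Returns how often the source was subscribed to and how many values
    // the two observers received in total.
    public static (int subscriptions, int received) Run()
    {
        int subscriptions = 0, received = 0;

        // Every subscription to this cold source repeats the side effect.
        var source = Observable.Defer(() =>
        {
            subscriptions++;
            return Observable.Range(1, 3);
        });

        var shared = source.Publish();     // IConnectableObservable<int>
        shared.Subscribe(_ => received++); // first observer
        shared.Subscribe(_ => received++); // second observer
        shared.Connect();                  // the single subscription to source

        return (subscriptions, received);
    }
}
```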
Using subjects means we are now managing state, which is potentially mutating. Mutating state and asynchronous programming are very hard to get right. Furthermore, many of the operators (extension methods) have been carefully written to ensure that correct and consistent lifetime of subscriptions and sequences is maintained. When you introduce subjects you can break this.
A significant benefit that the Create method has over subjects is that the sequence will be lazily evaluated.
In this example we show how we might first return a sequence via a standard, blocking, eagerly evaluated call, and then we show the correct way to return an observable sequence without blocking, by lazy evaluation.
In the example below, consumers will be blocked for at least 1 second before they even receive the IObservable, regardless of whether they actually subscribe to it.
private IObservable<string> BlockingMethod()
{
    var subject = new ReplaySubject<string>();
    subject.OnNext("a");
    subject.OnNext("b");
    subject.OnCompleted();
    Thread.Sleep(1000);
    return subject;
}
Whereas in the example below, the consumer immediately receives the IObservable and only incurs the cost of the thread sleep if it subscribes.
private IObservable<string> NonBlocking()
{
    return Observable.Create<string>(
        (IObserver<string> observer) =>
        {
            observer.OnNext("a");
            observer.OnNext("b");
            observer.OnCompleted();
            Thread.Sleep(1000);
            return Disposable.Create(() => Console.WriteLine("Observer has unsubscribed"));
            //or can return an Action like
            //return () => Console.WriteLine("Observer has unsubscribed");
        });
}
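The laziness claim can be checked without sleeping. In this variant of NonBlocking (System.Reactive package assumed), a hypothetical onSubscribe callback replaces Thread.Sleep(1000), so we can count when the expensive work actually runs.

```csharp
using System;
using System.Reactive.Disposables;
using System.Reactive.Linq;

public static class LazyCreateDemo
{
    // onSubscribe stands in for the expensive Thread.Sleep(1000) work.
    public static IObservable<string> NonBlocking(Action onSubscribe) =>
        Observable.Create<string>(observer =>
        {
            onSubscribe();             // runs per subscription, never at creation
            observer.OnNext("a");
            observer.OnNext("b");
            observer.OnCompleted();
            return Disposable.Empty;   // nothing to clean up in this sketch
        });
}
```

Creating the observable does no work at all; each subscription runs the body again, because Create produces a cold observable.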

response-gating next message send, with Rx

Given a List<Message>, I send out the first message with my Send(message). Now I would like to wait for an (asynchronous) response to come back before I send out the next message...
Block until notified 'old' way
I know how to implement an event-based solution for this situation, using thread locking with Monitor.Wait and Monitor.Pulse.
Reactive 'new' way?
But I was wondering whether it would make sense to utilize Reactive Extensions here?
If Rx would convey worthwhile benefits here then how could I make the response reactively gate the next send invocation? Obviously it would involve IObservable, probably two as primary sources, but then what?
The question is not very specific and seems very general, in the sense that you have not mentioned what the sender, receiver, etc. are, so the answer will also be very general :)
var receiveObs = ...; // an observable you have created around the receive mechanism
var responses = messages.Select(m =>
{
    send(m);
    // block until the response to this message arrives
    return receiveObs.First();
}).ToList();
I think Rx is a good choice here, but I think I could be missing something in your requirements. From what I understand Rx provides a very simple solution.
If you already have a list of messages then you can send them reactively like so:
messages
    .ToObservable()
    .ObserveOn(Scheduler.ThreadPool)
    .Subscribe(m =>
    {
        Send(m);
    });
This pushes the calls to Send to the thread-pool and, by the built-in behaviour of observables, each call to Send waits until the previous call is completed.
Since this is all happening on a different thread your code is non-blocking.
The extra benefit of Rx is that you wouldn't need to change the behaviour or signature of your Send method to make this work.
Simple, huh?
I tested this and it worked fine given my understanding of your problem. Is this all you need or is there something I missed?
I'm not sure Rx is a good fit here. Rx is based on the concept of "push collections", i.e. pushing data to consumers instead of pulling it. What you want is to pull the first item, send it asynchronously, and continue with the next element when the asynchronous operation has finished. For this kind of job, the perfect tool would be async / await*!
async Task SendMessages(List<Message> messages)
{
    foreach (Message message in messages)
    {
        // each send starts only after the previous one has completed
        await SendAsync(message);
    }
}
with
Task SendAsync(Message message);
* available in the Async CTP or the .NET 4.5 Preview
Assuming your Send method follows the APM model, the following approach should work for you
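List<Message> messages;
IObservable<Response> xs;
// BeginSend/EndSend: the APM pair behind your Send method (names assumed)
xs = messages.ToObservable().SelectMany(msg => Observable.FromAsyncPattern<Message, Response>(BeginSend, EndSend)(msg));
Edit: as Anderson has pointed out, this won't work: SelectMany starts every send at once instead of waiting for each response. Here's an example (LINQPad code, hence Dump()) showing the problem: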
Func<int,string> Send = (ii) => { "in Send".Dump(); Thread.Sleep(2000); return ii.ToString(); };
Func<int,IObservable<string>> sendIO = Observable.FromAsyncPattern<int,string>(Send.BeginInvoke, Send.EndInvoke);
(new [] { 1, 2, 3 }).ToObservable().SelectMany(sendIO).Dump();
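One way to keep the Rx style and still get the response gating the question asks for is to replace SelectMany with Select plus Concat, so each send only starts once the previous one has completed. This sketch assumes the System.Reactive package and a hypothetical Task-returning sendAsync in place of the APM Send; GatedSender is a made-up name.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Threading.Tasks;

public static class GatedSender
{
    // sendAsync is a hypothetical async Send that completes with the response.
    public static IObservable<TResponse> SendSequentially<TMessage, TResponse>(
        IEnumerable<TMessage> messages, Func<TMessage, Task<TResponse>> sendAsync) =>
        messages.ToObservable()
            // FromAsync is cold: no send starts until Concat subscribes to it
            .Select(m => Observable.FromAsync(() => sendAsync(m)))
            // subscribe to the next send only after the previous one completes
            .Concat();
}
```

With SelectMany in place of Concat, all the inner observables would be subscribed to immediately, which is exactly the overlap the LINQPad example above shows.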
