I need to delete a file, but some other process in the application blocks it. As a workaround I decided to try several times with an interval between attempts. Is this a correct approach:
Observable.Start(() => File.Delete(path)).Retry(2)
.Delay(TimeSpan.FromMilliseconds(500)).Wait();
This won't work the way you want. There are three problems:
Delay doesn't work how you think - it delays passing on the events, but the source still runs immediately.
You are issuing the Retry before the Delay
You need to use Defer to create a factory because Start will only call the embedded function once on evaluation.
Have a look at this answer for more detail on Delay and why DelaySubscription is better: Rx back off and retry.
This answer has a good implementation of a back-off retry: Write an Rx "RetryAfter" extension method
A simple fix for your code could be this, which catches the exception and rethrows it after a delay - but there's no delay if it works:
Observable.Defer(() => Observable.Start(() => File.Delete(path)))
.Catch((Exception ex) =>
Observable.Throw<Unit>(ex)
.DelaySubscription(TimeSpan.FromMilliseconds(500)))
.Retry(2)
.Wait();
Do have a look at the second link above for a fuller and better implementation though.
I kept the code above simple to make the point, and it isn't perfect - it always delays the exception, for example.
You really want to have the DelaySubscription on the action and have its delay time be dynamically calculated depending on the number of retries, which is what the linked implementation does.
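For a flavour of what that looks like, here is a rough sketch of such an extension - this is not the implementation from the linked answer, and RetryWithBackoff/delayStrategy are just illustrative names:
// Requires System.Reactive.Linq and System.Reactive.Concurrency.
public static IObservable<T> RetryWithBackoff<T>(
    this IObservable<T> source,
    int retryCount,
    Func<int, TimeSpan> delayStrategy,
    IScheduler scheduler = null)
{
    scheduler = scheduler ?? Scheduler.Default;

    // Attempt 0 subscribes immediately; attempt n delays its subscription
    // by delayStrategy(n) before trying again.
    IObservable<T> Attempt(int attempt) =>
        (attempt == 0
            ? source
            : source.DelaySubscription(delayStrategy(attempt), scheduler))
        .Catch((Exception ex) =>
            attempt + 1 < retryCount
                ? Attempt(attempt + 1)
                : Observable.Throw<T>(ex));

    return Observable.Defer(() => Attempt(0));
}
Used against your code it would look something like:
Observable.Defer(() => Observable.Start(() => File.Delete(path)))
    .RetryWithBackoff(3, n => TimeSpan.FromMilliseconds(500 * n))
    .Wait();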
Does anyone have a steer on when to use one of these methods over the other? They seem to do the same thing in that they convert from a TPL Task to an Observable.
Observable.FromAsync appears to support cancellation tokens, which might be the subtle difference that allows the method generating the task to participate in cooperative cancellation if the observable is disposed.
Just wondering if I'm missing something obvious as to why you'd use one over the other.
Thanks
Observable.FromAsync accepts a task factory in the form of a Func<Task> or Func<Task<TResult>>;
in this case, the task is only created and started when the observable is subscribed to.
Whereas .ToObservable() requires an already created (and thus started) Task.
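A small sketch of the difference (DoWorkAsync here is just an assumed async method returning Task<int>):
// Deferred: the factory runs (and a fresh task starts) once per subscription.
IObservable<int> deferred = Observable.FromAsync(ct => DoWorkAsync(ct));

// Eager: DoWorkAsync has already been called (and its task started) before
// ToObservable is invoked; subscribers merely observe that one task's result.
IObservable<int> eager = DoWorkAsync(CancellationToken.None).ToObservable();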
Sickboy's answer is correct.
Observable.FromAsync() will start the task at the moment of subscription.
Task.ToObservable() needs an already running task.
One use for Observable.FromAsync is to control reentrancy for multiple calls to an async method.
This is an example where these two methods are not equivalent:
//ob is some IObservable<T>
//ExecuteQueryAsync is some async method
//Here, ExecuteQueryAsync will run **serially**, the second call will start
//only when the first one is already finished. This is an important property
//if ExecuteQueryAsync doesn't support reentrancy
ob
    .Select(x => Observable.FromAsync(() => ExecuteQueryAsync(x)))
    .Concat()
    .ObserveOnDispatcher()
    .Subscribe(action)
vs
//ob is some IObservable<T>
//ExecuteQueryAsync is some async method
//Even though the `Subscribe` action order will be the same as in the first
//example because of the `Concat`, the ExecuteQueryAsync calls could run in
//parallel: the second call to the method could start before the end of the
//first call.
ob
    .Select(x => ExecuteQueryAsync(x).ToObservable())
    .Concat()
    .Subscribe(action)
Note that in the first example one may need the ObserveOn() or ObserveOnDispatcher() method to ensure that the action is executed on the original dispatcher, since Observable.FromAsync doesn't await the task, so the continuation may run on whatever thread the task completes on.
Looking at the code, it appears that (at least in some flows) Observable.FromAsync calls into .ToObservable()*. I am sure the intent is that they should be semantically equivalent (assuming you pass the same parameters, e.g. Scheduler, CancellationToken, etc.).
One is better suited to chaining/fluent syntax; the other may read better in isolation. Whichever your coding style favors.
*https://github.com/Reactive-Extensions/Rx.NET/blob/859e6159cb07be67fd36b18c2ae2b9a62979cb6d/Rx.NET/Source/System.Reactive.Linq/Reactive/Linq/QueryLanguage.Async.cs#L727
Aside from being able to use a CancellationToken, FromAsync wraps the call in a defer, so this allows changing the task logic based upon conditions at the time of subscription. Note that the Task will not be started by FromAsync; internally task.ToObservable() will be called. The Func does allow you to start the task yourself when you create it, though.
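For example, the cancellation side of this (a sketch; FetchAsync is an assumed async method that accepts a CancellationToken):
// The CancellationToken handed to the factory is cancelled when the
// subscription is disposed, so the async work can observe unsubscription.
var subscription = Observable
    .FromAsync(ct => FetchAsync(ct))
    .Subscribe(result => Console.WriteLine(result));

// Disposing before the task completes signals the token.
subscription.Dispose();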
I have tried to simplify my issue with the sample code below. I have a producer thread constantly pumping in data, and I am trying to batch it with a time delay between batches so that the UI has time to render it. But the result is not as expected; the producer and consumer seem to be on the same thread.
I don't want the batch buffer to sleep on the thread that is producing. I tried SubscribeOn, but it did not help much. What am I doing wrong here, and how do I get this to print different thread IDs on the producer and consumer threads?
static void Main(string[] args)
{
var stream = new ReplaySubject<int>();
Task.Factory.StartNew(() =>
{
int seed = 1;
while (true)
{
Console.WriteLine("Thread {0} Producing {1}",
Thread.CurrentThread.ManagedThreadId, seed);
stream.OnNext(seed);
seed++;
Thread.Sleep(TimeSpan.FromMilliseconds(500));
}
});
stream.Buffer(5).Do(x =>
{
Console.WriteLine("Thread {0} sleeping to create time gap between batches",
Thread.CurrentThread.ManagedThreadId);
Thread.Sleep(TimeSpan.FromSeconds(2));
})
.SubscribeOn(NewThreadScheduler.Default).Subscribe(items =>
{
foreach (var item in items)
{
Console.WriteLine("Thread {0} Consuming {1}",
Thread.CurrentThread.ManagedThreadId, item);
}
});
Console.Read();
}
Understanding the difference between ObserveOn and SubscribeOn is key here. See - ObserveOn and SubscribeOn - where the work is being done for an in depth explanation of these.
Also, you absolutely don't want to use a Thread.Sleep in your Rx. Or anywhere. Ever. Do is almost as evil, but Thread.Sleep is almost always totally evil. Buffer has several overloads you want to use instead - these include a time-based overload and an overload that accepts both a count limit and a time limit, returning a buffer when either of these is reached. Time-based buffering will introduce the necessary concurrency between producer and consumer - that is, it delivers each buffer to its subscriber on a separate thread from the producer.
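Applied to the code in the question, that might look something like this (a sketch - the 2 second/5 item numbers are just illustrative):
stream
    .Buffer(TimeSpan.FromSeconds(2), 5)     // emit a buffer every 2 seconds, or sooner if 5 items arrive
    .ObserveOn(NewThreadScheduler.Default)  // deliver each buffer on a different thread from the producer
    .Subscribe(items =>
    {
        foreach (var item in items)
        {
            Console.WriteLine("Thread {0} Consuming {1}",
                Thread.CurrentThread.ManagedThreadId, item);
        }
    });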
Also see these questions and answers which have good discussions on keeping consumers responsive (in the context of WPF here, but the points are generally applicable).
Process lots of small tasks and keep the UI responsive
Buffer data from database cursor while keeping UI responsive
The last question above specifically uses the time-based buffer overload. As I said, using Buffer or ObserveOn in your call chain will allow you to add concurrency between producer and consumer. You still need to take care that the processing of a buffer is fast enough that you don't get a queue building up on the buffer subscriber.
If queues do build up, you'll need to think about means of applying backpressure, dropping updates and/or conflating the updates. This is a big topic, too broad for in-depth discussion here - but basically you either:
Drop events (a crude sketch of one way to do this follows below). There have been many ways discussed to tackle this in Rx. I currently like Ignore incoming stream updates if last callback hasn't finished yet, but also see With Rx, how do I ignore all-except-the-latest value when my Subscribe method is running - and there are many other discussions of this.
Signal the producer out of band to tell it to slow down or send conflated updates, or
Introduce an operator that does in-stream conflation - like a smarter Buffer that could compress events to, for example, only include the latest price on a stock item. You can author operators that are sensitive to the time that OnNext invocations take to process, for example.
See if proper buffering helps first, then think about throttling/conflating events at the source (a UI can only show so much information anyway) - then consider smarter conflation, as this can get quite complex. https://github.com/AdaptiveConsulting/ReactiveTrader is a good example of a project using some advanced conflation techniques.
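As a crude illustration of the "drop events" option (and not the consumer-aware conflation just described), a time-based Sample keeps only the latest value in each window and discards the rest:
stream
    .Sample(TimeSpan.FromMilliseconds(250))   // only the most recent value per 250ms window survives
    .ObserveOn(NewThreadScheduler.Default)
    .Subscribe(x => Console.WriteLine("Consuming {0}", x));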
Although the other answers are correct, I'd like to identify your actual problem as perhaps a misunderstanding of the behavior of Rx. Putting the producer to sleep blocks subsequent calls to OnNext and it seems as though you're assuming Rx automatically calls OnNext concurrently, but in fact it doesn't for very good reasons. Actually, Rx has a contract that requires serialized notifications.
See §§4.2, 6.7 in the Rx Design Guidelines for details.
Ultimately, it looks as though you're trying to implement the BufferIntrospective operator from Rxx. This operator allows you to pass in a concurrency-introducing scheduler, similar to ObserveOn, to create a concurrency boundary between a producer and a consumer. BufferIntrospective is a dynamic backpressure strategy that pushes out heterogeneously-sized batches based on the changing latencies of an observer. While the observer is processing the current batch, the operator buffers all incoming concurrent notifications. To accomplish this, the operator takes advantage of the fact that OnNext is a blocking call (per the §4.2 contract) and for that reason this operator should be applied as close to the edge of the query as possible, generally immediately before you call Subscribe.
As James described, you could call it a "smart buffering" strategy itself, or see it as the baseline for implementing such a strategy; e.g., I've also defined a SampleIntrospective operator that drops all but the last notification in each batch.
ObserveOn is probably what you want. It takes a SynchronizationContext as an argument, which should be the SynchronizationContext of your UI. If you don't know how to get it, see Using SynchronizationContext for sending events back to the UI for WinForms or WPF.
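A minimal sketch, assuming this code runs on the UI thread so that SynchronizationContext.Current is the UI context (RenderBatch is a hypothetical UI update method):
var uiContext = SynchronizationContext.Current;   // capture on the UI thread

stream
    .Buffer(TimeSpan.FromSeconds(2), 5)
    .ObserveOn(uiContext)                          // notifications are posted back to the UI context
    .Subscribe(items => RenderBatch(items));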
I have some blocks that eventually go from a TransformBlock to one of three other transform blocks, based on the LinkTo predicate. I am using DataflowLinkOptions to propagate completion. The problem is that when a predicate is satisfied and that block is started, the rest of my pipeline continues on. It would seem that the pipeline should wait for this block to finish first.
The code for this is something like this:
var linkOptions = new DataflowLinkOptions {PropagateCompletion = true};
mainBlock.LinkTo(block1, linkOptions, x => x.Status == Status.Complete);
mainBlock.LinkTo(block2, linkOptions, x => x.Status == Status.Cancelled);
mainBlock.LinkTo(block3, linkOptions, x => x.Status == Status.Delayed);
mainBlock.LinkTo(DataflowBlock.NullTarget<Thing>(), linkOptions);
Now, this doesn't work as I'd expect, as I said, so the only way I've found to get the behavior that I want is to take the linkOptions out and add the following into the lambda for the mainBlock.
mainBlock = new TransformBlock<Thing,Thing>(input =>
{
DoMyStuff(input);
if (input.Status == Status.Complete)
{
mainBlock.Completion.ContinueWith(t => block1.Complete());
}
if (input.Status == Status.Cancelled)
{
mainBlock.Completion.ContinueWith(t => block2.Complete());
}
if (input.Status == Status.Delayed)
{
mainBlock.Completion.ContinueWith(t => block3.Complete());
}
return input;
});
So the question, is this the only way to get this to work?
BTW, this has been run in my unit test with a single data item running through it to try and debug the pipeline behavior. Each block has been tested individually with multiple unit tests. So what happens in my pipeline unit test is that the assert is hit before the block has finished executing, and so it fails.
If I remove the block2 and block3 links and debug the test using the linkOptions it works fine.
Your problem is not with the code in your question, that works correctly: when the main block completes, all the three followup blocks are marked for completion too.
The problem is with the end block: you're using PropagateCompletion there too, which means that when any of the three previous blocks completes, the end block is marked for completion. What you want is to mark it for completion when all three blocks complete, and the Task.WhenAll().ContinueWith() combination from your answer does that (though the first part of that snippet is unnecessary; it does exactly the same thing PropagateCompletion would).
"As it turns out, the link option propagation (at least this is my guess) will propagate the completion for blocks that don't satisfy the predicate in the LinkTo."
Yes, it propagates completion always. Completion doesn't have any item associated with it, so it doesn't make any sense to apply the predicate to it. Maybe the fact that you always have only a single item (which is not common) makes this more confusing for you?
"If my guess is correct I sort of feel like this is a bug or design error in the link option completion propagation. Why should a block be complete if it was never used?"
Why shouldn't it? To me, this makes perfect sense: even when there were no items with Status.Delayed this time around, you still want to complete the block that processes those items, so that any follow-up code can know that all delayed items were already processed. And the fact that there weren't any doesn't matter.
Anyway, if you encounter this often, you might want to create a helper method that links several source blocks to a single target block at the same time and propagates completion correctly:
public static void LinkTo<T>(
this IReadOnlyCollection<ISourceBlock<T>> sources, ITargetBlock<T> target,
bool propagateCompletion)
{
foreach (var source in sources)
{
source.LinkTo(target);
}
if (propagateCompletion)
Task.WhenAll(sources.Select(source => source.Completion))
.ContinueWith(_ => target.Complete());
}
Usage:
new[] { block1, block2, block3 }.LinkTo(endBlock, propagateCompletion: true);
Ok. So I have to thank Cory first off. When I first read his comment I was a little annoyed, because I felt like my code illustrated the concept pretty well and could be turned into a working version easily. But anyway, because of his comment I felt the need to do a complete, testable version I could post.
In my test the surprising part was that even though it mimicked my real code, the path I thought would fail passed and the path that I thought would pass failed. This made my head spin a bit. So I started to do some more permutations of the original code. Basically I created blocks that were synchronous and blocks that were asynchronous and made both kinds of pipelines. Four in total, 2 sync and 2 async; one of each used link options to propagate completion and the other used completion tasks in the MainBlock as shown.
After adding some task delays to the async tasks I found that the synchronous versions passed the test and the async ones failed.
So, the eventual solution to the problem was sort of none of the above. As it turns out, the link option propagation (at least this is my guess) will propagate the completion for blocks that don't satisfy the predicate in the linkTo. So, when a Thing with a Status of Complete comes down it goes to Block1.
Oh, I should point out that in the complete test code I made all blocks 1, 2 & 3 connect to the same EndBlock, which is not shown in the original code.
Anyway, after the predicate is satisfied and the Thing goes to Block1, blocks 2 and 3 are, I believe, set to complete. This causes the EndBlock to complete, which we are awaiting in the unit test, and the Assert fails because Block1 isn't done doing its work yet.
If my guess is correct I sort of feel like this is a bug or design error in the link option completion propagation. Why should a block be complete if it was never used?
So, here is what I did to solve the problem. I took out the link options and manually wired up the completion events. Like this:
MainBlock.Completion.ContinueWith(t =>
{
Block1.Complete();
Block2.Complete();
Block3.Complete();
});
Task.WhenAll(Block1.Completion, Block2.Completion, Block3.Completion)
.ContinueWith(t =>
{
EndBlock.Complete();
});
This worked fine, and when moved to my real code it worked as well. The Task.WhenAll is what made me believe that unused blocks were being set to complete and that automatic propagation was the problem.
I hope this helps someone. I will come back and add a link when I post all my test code.
Edit:
Here is a link to the test code gist https://gist.github.com/jmichas/bfab9cec84f0d1e40e12
Simple question, I would hope: I'm writing an application in which I want to retrieve data from a database; I've elected to use Rx for this purpose to represent the database as a sequence of values.
I only want to poll the database (and thus have my observer's notifications occur) at a maximum of once every 5 seconds. Right now, I have something like this, where the Scheduler is scheduling a periodic task that causes my observer to be subscribed to the observable that is my database:
_scheduler.SchedulePeriodic(_repository, TimeSpan.FromSeconds(5),
(repo) => repo.AsObservable()
.Where(item => _SomeFilter(item))
.Subscribe(item => _SomeProcessFunction(item))
);
Function names and the like omitted for brevity; repo.AsObservable() is simply a function that returns an IObservable<T> of all the items inside the repository at that point.
Now, I figure that this is the correct way of doing things; however, before I came up with this solution I did come up with a different one, in which I had an Observable.Timer whose subscribed observer would subscribe to the AsObservable() return value on every timer tick instead.
My question is that this seems very odd - why am I subscribing to the observable multiple times?
Sorry if this question is confusing, it confused me while writing it, however the schedulers are also confusing for me :P
What if you use the built in operators instead of manually scheduling tasks?
repo.AsObservable()
.Where(_SomeFilter)
// Wait 5 seconds before completing
.Concat(Observable.Empty<T>().Delay(TimeSpan.FromSeconds(5)))
// Resubscribe indefinitely after source completes
.Repeat()
// Subscribe
.Subscribe(_SomeProcessFunction);
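Another common shape, if a fixed polling cadence is fine rather than waiting 5 seconds after each query completes (this assumes repo.AsObservable() is cheap to call repeatedly):
Observable.Interval(TimeSpan.FromSeconds(5))
    .SelectMany(_ => repo.AsObservable().Where(_SomeFilter))
    .Subscribe(_SomeProcessFunction);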
I'm making an observable out of an AsyncPattern, which I want to keep polling at intervals. So far I've got to here:
var observer = Observable.Defer(ObservableFunc)
.Concat(Observable.Empty<int>().Delay(TimeSpan.FromSeconds(_pollInterval)))
.Timeout(TimeSpan.FromSeconds(_Timeout_s))
.Materialize()
.Repeat()
.Publish()
.RefCount();
Don't poll until someone subscribes (Defer)
re-poll a given time after last response (and not just keep blindly polling) (Concat/Delay)
Detect if the poll has timed out (no answer) (Timeout)
Start again if it does time out (Repeat)
Don't re-subscribe for new subscribers, stop polling when there are no more subscribers (Publish/RefCount).
My question is about the Materialize in the middle there. This (to me) seems to be a fairly elegant way of letting the TimeoutException 'through' so that the subscribers can know about it. I'm just not sure whether I should let it carry on as a Notification, or maybe re-materialize it into some kind of Maybe/Nullable<T>.
This may not "qualify" as an answer, but probably too long for a comment...sigh
My gut says: propagate the timeout as a Maybe/Nullable.
Reasoning:
Probably no one subscribing to this cares what the reason was that it failed to produce a value; they just care that a next value was unavailable for some reason. (Of course, I'm making a lot of assumptions here.)
To wit, I'd follow the Timeout call with a Catch that would inject/return a "null value" (defined however you'd like - Maybe, Nullable<T>, etc.), thus making the "shape" of the resulting stream way more clear to any subscribers.
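A minimal sketch of that shape, using Nullable<int> as the "null value" and applying the Timeout to the poll itself, before the inter-poll delay (ObservableFunc, _pollInterval and _Timeout_s come from the question; the rest is illustrative):
var polled = Observable.Defer(ObservableFunc)
    .Select(x => (int?)x)
    .Timeout(TimeSpan.FromSeconds(_Timeout_s))
    .Catch((TimeoutException _) => Observable.Return((int?)null))   // a timeout becomes a "null" next value
    .Concat(Observable.Empty<int?>().Delay(TimeSpan.FromSeconds(_pollInterval)))
    .Repeat()
    .Publish()
    .RefCount();
Subscribers then just see a gap (a null) where a timeout happened, rather than an OnError that tears the sequence down.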