Is there a way to use an aggregate function (Max, Count, ...) together with Buffer before a sequence has completed?
When the sequence completes, this produces results, but with a continuous stream it never emits anything. I was expecting there to be some way to make this work with Buffer:
IObservable<long> source;

IObservable<IGroupedObservable<long, long>> group = source
    .Buffer(TimeSpan.FromSeconds(5))
    .GroupBy(i => i % 3);

IObservable<long> sub = group.SelectMany(grp => grp.Max());

sub.Subscribe(l =>
{
    Console.WriteLine("working");
});
Use Scan instead of Aggregate. Scan works just like Aggregate except that it sends out intermediate values as the stream advances. It is good for "running totals", which appears to be what you are asking for.
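For example, here is a sketch of a running maximum built with Scan (the `Subject<long>` source is just a stand-in for your own stream):

```csharp
using System;
using System.Reactive.Linq;
using System.Reactive.Subjects;

static class ScanDemo
{
    // Scan emits the running maximum on every value, where Max would
    // emit a single value only when the source completes.
    public static IObservable<long> RunningMax(IObservable<long> source) =>
        source.Scan((max, next) => Math.Max(max, next));

    static void Main()
    {
        var source = new Subject<long>();
        RunningMax(source).Subscribe(m => Console.WriteLine($"max so far: {m}"));

        source.OnNext(3);  // max so far: 3
        source.OnNext(1);  // max so far: 3
        source.OnNext(7);  // max so far: 7
    }
}
```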
All the "statistical" operators in Rx (Min/Max/Sum/Count/Average) use a mechanism that propagates the calculated value only when the subscription completes. That is the big difference between Scan and Aggregate: if you want to be notified every time a new value is pushed into your subscription, you need to use Scan.
In your case, if you want to keep the same logic, you should combine it with the GroupByUntil or Window operators; the conditions given to either one create and complete the group subscriptions regularly, and each completion pushes the next value.
You can get more info here: http://www.introtorx.com/content/v1.0.10621.0/07_Aggregation.html#BuildYourOwn
By the way, I wrote an article related to what you want. Check: http://www.codeproject.com/Tips/853256/Real-time-statistics-with-Rx-Statistical-Demo-App
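As a sketch of the Window approach applied to the code in the question, each 5-second window yields inner sequences that do complete, so Max can emit per window (the `i % 3` grouping is taken from the question):

```csharp
using System;
using System.Reactive.Linq;

static class WindowDemo
{
    // Emits the maximum per group (i % 3) once every 5-second window closes.
    public static IObservable<long> MaxPerGroup(IObservable<long> source) =>
        source
            .Window(TimeSpan.FromSeconds(5))       // a new inner sequence every 5 seconds
            .SelectMany(window => window
                .GroupBy(i => i % 3)
                .SelectMany(grp => grp.Max()));    // Max completes when the window closes
}
```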
Related
I have a sequence of Images (IObservable<ImageSource>) that goes through this "pipeline".
Each image is recognized using OCR
If the results have valid values, they are uploaded to a service that can register a set of results at a given time (not concurrently).
If the results have any invalid value, they are presented to the user in order to fix them. After they are fixed, the process continues.
During the process, the UI should stay responsive.
The problem is that I don't know how to handle the case when the user has to interact. I just cannot do this
subscription = images
    .Do(source => source.Freeze())
    .Select(image => OcrService.Recognize(image))
    .Subscribe(ocrResults => Upload(ocrResults));
...because when ocrResults have to be fixed by the user, the flow should be kept on hold until the valid values are accepted (i.e. the user could execute a Command by clicking a Button).
How do I say: if the results are NOT valid, wait until the user fixes them?
This seems to be a mix of UX, WPF and Rx all wrapped up in one problem. Trying to solve it with only Rx is probably going to send you into a tailspin. I am sure you could solve it with just Rx and no more thought about it, but would you want to? Would it be testable, loosely coupled and easy to maintain?
In my understanding of the problem, you have the following steps:
User Uploads/Selects some images
The system performs OCR on each image
If the OCR tool deems the image source to be valid, the result of the processing is uploaded
If the OCR tool deems the image source to be invalid, the user "fixes" the result and the result is uploaded
But this may be better described as
User Uploads/Selects some images
The system performs OCR on each image
The result of the OCR is placed in a validation queue
While the result is invalid, a user is required to manually update it to a valid state.
The valid result is uploaded
So this to me seems like you need a task/queue-based UI so that a user can see the invalid OCR results they need to work on. It also tells me that if a person is involved, that part should probably live outside of the Rx query.
Step 1 - Perform OCR
subscription = images
    .Subscribe(image =>
    {
        //image.Freeze() --probably should be done by the source sequence
        var result = _ocrService.Recognize(image);
        _validator.Enqueue(result);
    });
Step 2 - Validate Result
//In the Enqueue method: if the queue is empty, ProcessHead(); else add to the queue.
//When the head item is updated, ProcessHead().
//ProcessHead checks whether the head item is valid; if it is, uploads it and removes it from the queue. Once removed, if (!IsEmpty) { ProcessHead(); }
//Display the head of the queue (and probably the queue depth) to the user so they can interact with it.
Step 3 - Upload result
Upload(ocrResults)
So here Rx is just a tool in our arsenal, not the one hammer that needs to solve all problems. I have found that with most "Rx" problems that grow in size, Rx just acts as the entry and exit points for various queue structures. This allows us to make the queuing in our system explicit instead of implicit (i.e. hidden inside of Rx operators).
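A minimal sketch of the step 2 queue described above; `OcrResult` and its `IsValid` property are hypothetical names standing in for whatever your OCR service returns, and the upload callback stands in for step 3:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical result type; IsValid flips to true once the user fixes the values.
public class OcrResult
{
    public bool IsValid { get; set; }
}

public class ValidationQueue
{
    private readonly Queue<OcrResult> _queue = new Queue<OcrResult>();
    private readonly Action<OcrResult> _upload;

    public ValidationQueue(Action<OcrResult> upload) => _upload = upload;

    public void Enqueue(OcrResult result)
    {
        _queue.Enqueue(result);
        if (_queue.Count == 1) ProcessHead();
    }

    // Call again whenever the user updates the head item.
    public void ProcessHead()
    {
        while (_queue.Count > 0 && _queue.Peek().IsValid)
            _upload(_queue.Dequeue());
        // Any remaining head is invalid: display it (and the queue depth)
        // to the user so they can fix it.
    }
}
```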
I'm assuming your UploadAsync method returns a Task to allow you to wait for it to finish? If so, there are overloads of SelectMany that handle tasks.
images.Select(originalImage => ImageOperations.Resize(originalImage))
.SelectMany(resizedImg => imageUploader.UploadAsync(resizedImg))
.Subscribe();
Assuming you've got an async method which implements the "user fix process":
/* show the image to the user, who fixes it; returns true if fixed, false if it should be skipped */
async Task<bool> UserFixesTheOcrResults(OcrResults ocrResults);
Then your observable becomes:
subscription = images
    .Do(source => source.Freeze())
    .Select(image => OcrService.Recognize(image))
    .Select(ocrResults =>
    {
        if (ocrResults.IsValid)
            return Observable.Return(ocrResults);
        else
            return UserFixesTheOcrResults(ocrResults).ToObservable().Select(_ => ocrResults);
    })
    .Concat()
    .Subscribe(ocrResults => Upload(ocrResults));
I'm trying to merge two sensor data streams on a regular interval and I'm having trouble doing this properly in Rx. The best I've come up with is the sample below, but I doubt this is an optimal use of Rx.
Is there a better way?
I've tried Sample() but the sensors produce values at irregular intervals, slow (>1sec) and fast (<1sec). Sample() only seems to deal with fast data.
IObservable<SensorA> sensorA = ... /* hot */
IObservable<SensorB> sensorB = ... /* hot */

SensorA lastKnownSensorA;
SensorB lastKnownSensorB;

sensorA.Subscribe(s => lastKnownSensorA = s);
sensorB.Subscribe(s => lastKnownSensorB = s);

var combined = Observable.Interval(TimeSpan.FromSeconds(1))
    .Where(t => lastKnownSensorA != null)
    .Select(t => new SensorAB(lastKnownSensorA, lastKnownSensorB));
I think @JonasChapuis's answer may be what you are after, but there are a couple of issues which might be problematic:
CombineLatest does not emit a value until all sources have emitted at least one value each, which can cause loss of data from faster sources up until that point. That can be mitigated by using StartWith to seed a null object or default value on each sensor stream.
Sample will not emit a value if no new values have been observed in the sample period. I can't tell from the question whether this is desirable or not; if not, there is an interesting trick to address it using a "pace" stream, described below, which creates a fixed frequency instead of the maximum frequency obtained with Sample.
To address the CombineLatest issue, determine appropriate null values for your sensor streams - I usually make these available via a static Null property on the type - which makes the intention very clear. For value types use of Nullable<T> can also be a good option:
IObservable<SensorA> sensorA = ... .StartWith(SensorA.Null);
IObservable<SensorB> sensorB = ... .StartWith(SensorB.Null);
N.B. Don't make the common mistake of applying StartWith only to the output of CombineLatest... that won't help!
Now, if you need regular results (which naturally could include repeats of the most recent readings), create a "pace" stream that emits at the desired interval:
var pace = Observable.Interval(TimeSpan.FromSeconds(1));
Then combine as follows, omitting the pace value from results:
var sensorReadings = Observable.CombineLatest(
pace, sensorA, sensorB,
(_, a, b) => new SensorAB(a,b));
It's also worth knowing about the MostRecent operator which can be combined with Zip very effectively if you want to drive output at the speed of a specific stream. See these answers where I demonstrate that approach: How to combine a slow moving observable with the most recent value of a fast moving observable and the more interesting tweak to handle multiple streams: How do I combine three observables such that
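A sketch of that MostRecent-plus-Zip approach for the sensor streams above; the sensor types are stubbed out here for illustration, and the output is driven entirely by sensorA:

```csharp
using System;
using System.Reactive.Linq;

// Stub sensor types for illustration; SensorB.Null is the null-object
// placeholder suggested earlier in the answer.
public class SensorA { }
public class SensorB { public static readonly SensorB Null = new SensorB(); }
public class SensorAB
{
    public SensorAB(SensorA a, SensorB b) { A = a; B = b; }
    public SensorA A { get; }
    public SensorB B { get; }
}

static class SensorPairing
{
    // Emits exactly once per sensorA reading, pairing it with whatever
    // value sensorB has produced most recently (or SensorB.Null before
    // its first reading).
    public static IObservable<SensorAB> DriveBySensorA(
        IObservable<SensorA> sensorA, IObservable<SensorB> sensorB) =>
        sensorA.Zip(sensorB.MostRecent(SensorB.Null), (a, b) => new SensorAB(a, b));
}
```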
How about using the CombineLatest() operator to merge the latest values of the sensors every time either produces a value, followed by Sample() to ensure a max frequency of one measurement per second?
sensorA.CombineLatest(sensorB, (a, b) => new {A=a, B=b}).Sample(TimeSpan.FromSeconds(1))
Very similar to this question: Rx IObservable buffering to smooth out bursts of events, I am interested in smoothing out observables that may occur in bursts.
Hopefully the diagram below illustrates what I am aiming for:
Raw: A--B--CDE-F--------------G-----------------------
Interval: o--o--o--o--o--o--o--o--o--o--o--o--o--o--o--o--o
Output: A--B--C--D--E--F-----------G---------------------
Given the raw stream, I wish to stretch these events over regular intervals.
Throttling does not work as then I end up losing elements of the raw sequence.
Zip works well if the raw stream is more frequent than the timer, but fails if there are periods where there are no raw events.
EDIT
In response to Dan's answer, the problem with Buffer is that if bursts of many events arrive within a short time interval then I receive the events too often. Below shows what could happen with a buffer size of 3, and a timeout configured to the required interval:
Raw: -ABC-DEF-----------G-H-------------------------------
Interval: o--------o--------o--------o--------o--------o--------
Buffered: ---A---D-------------------G--------------------------
B E H
C F
Desired: ---------A--------B--------C--------D--------E ..etc.
How about this? (inspired by James' answer mentioned in the comments)...
public static IObservable<T> Regulate<T>(this IObservable<T> source, TimeSpan period)
{
    var interval = Observable.Interval(period).Publish().RefCount();

    return source.Select(x => Observable.Return(x)
            .CombineLatest(interval, (v, _) => v)
            .Take(1))
        .Concat();
}
It turns each value in the raw observable into its own observable. The CombineLatest means it won't produce a value until the interval does. Then we just take one value from each of these observables and concatenate.
The first value in the raw observable gets delayed by one period. I'm not sure if that is an issue for you or not.
It looks like what you want to use is Buffer. One of the overloads allows you to specify an interval as well as the buffer length. You could conceivably set the length to 1.
Raw.Buffer(interval, 1);
For some more examples of its use, you can refer to the IntroToRX site.
I've been looking for examples of how to use Observable.Buffer in Rx but can't find anything more substantial than boilerplate time-buffered examples.
There does seem to be an overload to specify a "bufferClosingSelector" but I can't wrap my mind around it.
What I'm trying to do is create a sequence that buffers by time or by an "accumulation".
Consider a request stream where every request has some sort of weight to it; I do not want to process more than x accumulated weight at a time, or, if not enough has accumulated, just give me what has come in during the last timeframe (the regular Buffer functionality).
bufferClosingSelector is a function called each time a new buffer opens; it returns an observable whose first value signals that the buffer should close.
For example,
source.Buffer(() => Observable.Timer(TimeSpan.FromSeconds(1))) works like the regular Buffer(time) overload.
If you want to weight a sequence, you can apply a Scan over it and then decide on your aggregating condition.
E.g., source.Scan((a, c) => a + c).SkipWhile(a => a < 100) gives you a sequence which produces a value once the source sequence has added up to 100 or more.
You can use Amb to race these two closing conditions to see which reacts first:
.Buffer(() => Observable.Amb
(
Observable.Timer(TimeSpan.FromSeconds(1)),
source.Scan((a,c) => a + c).SkipWhile(a => a < 100)
)
)
You can use any series of combinators which produces any value for the buffer to be closed at that point.
Note:
The value given to the closing selector doesn't matter - it's the notification that matters. So to combine sources of different types with Amb, simply project each to System.Reactive.Unit:
Observable.Amb(stream1.Select(_ => new Unit()), stream2.Select(_ => new Unit()))
With regards to this solution.
Is there a way to limit the number of words taken into consideration? For example, I'd like only the first 1000 words of the text to be counted. There is a Take method in LINQ, but it serves a different purpose - all words would be counted, and N records returned. What's the right alternative to do this correctly?
Simply apply Take earlier - straight after the call to Split:
var results = src.Split()
.Take(1000)
.GroupBy(...) // etc
Well, strictly speaking LINQ is not necessarily going to read everything; Take will stop as soon as it can. The problem is that in the related question you look at Count, and it is hard to get a Count without consuming all the data. Likewise, string.Split will look at everything.
But if you wrote a lazy non-buffering Split function (using yield return) and you wanted the first 1000 unique words, then
var words = LazySplit(text).Distinct().Take(1000);
would work
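A sketch of such a lazy, non-buffering split on whitespace using yield return; no array of all the words is ever materialized, so Take can stop the work early:

```csharp
using System;
using System.Collections.Generic;

static class Text
{
    // Yields words one at a time as the string is scanned,
    // instead of building a full array up front like string.Split.
    public static IEnumerable<string> LazySplit(string text)
    {
        var start = 0;
        for (var i = 0; i < text.Length; i++)
        {
            if (char.IsWhiteSpace(text[i]))
            {
                if (i > start)
                    yield return text.Substring(start, i - start);
                start = i + 1;
            }
        }
        if (start < text.Length)
            yield return text.Substring(start);
    }
}
```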
Enumerable.Take does in fact stream results out; it doesn't buffer up its source entirely and then return only the first N. Looking at your original solution though, the problem is that the input to where you would want to do a Take is String.Split. Unfortunately, this method doesn't use any sort of deferred execution; it eagerly creates an array of all the 'splits' and then returns it.
Consequently, the technique to get a streaming sequence of words from some text would be something like:
var words = src.StreamingSplit() // you'll have to implement that
.Take(1000);
However, I do note that the rest of your query is:
...
.GroupBy(str => str) // group words by the value
.Select(g => new
{
str = g.Key, // the value
count = g.Count() // the count of that value
});
Do note that GroupBy is a buffering operation - you can expect that all of the 1,000 words from its source will end up getting stored somewhere in the process of the groups being piped out.
As I see it, the options are:
If you don't mind going through all of the text for splitting purposes, then src.Split().Take(1000) is fine. The downside is wasted time (splitting continues after it is no longer necessary) and wasted space (all of the words are stored in an array even though only the first 1,000 will be needed). However, the rest of the query will not operate on any more words than necessary.
If you can't afford to do (1) because of time / memory constraints, go with src.StreamingSplit().Take(1000) or equivalent. In this case, none of the original text will be processed after 1,000 words have been found.
Do note that those 1,000 words themselves will end up getting buffered by the GroupBy clause in both cases.