Replay()-like functionality but with the ability to displace stale values? - c#

Wondering if anyone can think of an elegant solution to this use case:
I am consuming an observable (IObservable of type TEntity) which is providing me a stream of entities. If any of those entities are updated, then the provider of the observable will push down the updated entity.
I am using a Replay() on this stream so that I only need to subscribe to the underlying stream once and so that late subscribers can see all the values. The problem is that there is potential for a memory-leak here, because the Replay() will hold onto all the updates it sees, whereas all I need is the latest update for each entity.
I could replace the Replay() with a Scan() which allows me to maintain the latest updates only, but then I would have to push out a Dictionary of all the updates observed so far, rather than just the specific entity that has changed.
The only solution I can think of is to use a Scan() as above, but inside the Scan() implementation push every update into a Subject. Subscribers to the IObservable I expose will then receive a merge of the snapshot stored in the Scan() dictionary plus any subsequent updates, as follows:
private Subject<Entity> _updateSubject = new Subject<Entity>();
private IObservable<Dictionary<string, Entity>> _subscriptionStream;

// called once on initialisation
private void CreateSubscription()
{
    IObservable<Entity> source = GetSomeLongRunningSubscriptionStream();
    _subscriptionStream = source
        .Scan(new Dictionary<string, Entity>(), (accumulator, update) =>
        {
            accumulator[update.ID] = update;
            _updateSubject.OnNext(update);
            return accumulator;
        })
        .Replay(1);
}

// called each time a consumer wants access to the stream
public IObservable<Entity> GetStream()
{
    return _subscriptionStream.Take(1).SelectMany(x => x).Select(x => x.Value)
        .Merge(_updateSubject.AsObservable());
}
Can anyone think of a more elegant solution which holds the state within a single stream rather than resorting to Subjects?
Thanks
************** Edit **************
As per my comment, I've gone with something similar to this. Let me know your thoughts
// called once on initialisation
private void CreateSubscription()
{
    _baseSubscriptionObservable = GetSomeLongRunningSubscriptionStream().Publish();
    _snapshotObservable = _baseSubscriptionObservable
        .Scan(new Dictionary<string, Entity>(), (accumulator, update) =>
        {
            accumulator[update.ID] = update;
            return accumulator;
        })
        .StartWith(new Dictionary<string, Entity>())
        .Replay(1);

    _baseSubscriptionObservable.Connect();
    _snapshotObservable.Connect();
}

public IObservable<Entity> GetStream()
{
    return _snapshotObservable.Take(1).Select(x => x.Values).SelectMany(x => x)
        .Merge(_baseSubscriptionObservable);
}

I generally like what you're doing, but there are a number of issues that I can see.
To start with, you've split CreateSubscription and GetStream into two methods, with the idea that you'll have one underlying subscription to the GetSomeLongRunningSubscriptionStream() stream. Unfortunately, in this case you'll have zero subscriptions regardless of how many subscribers your final observable gets, because .Replay(1) returns an IConnectableObservable<>, on which you need to call .Connect() to begin the flow of values.
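As a minimal illustration of that behaviour (source here is any hypothetical IObservable<int>):
IConnectableObservable<int> replayed = source.Replay(1);
replayed.Subscribe(Console.WriteLine);       // subscribers attach, but nothing flows yet...
IDisposable connection = replayed.Connect(); // ...until Connect() subscribes to source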
The next thing is that you're updating your accumulator with the latest value, and then in GetStream you're merging a flattened copy of that accumulator (which already contains the latest value) with the live update stream. You're returning the latest value twice each time.
Here's how I would suggest that you do it:
private IObservable<IList<Timestamped<Entity>>> GetStream()
{
    return
        Observable
            .Create<IList<Timestamped<Entity>>>(o =>
                GetSomeLongRunningSubscriptionStream()
                    .Timestamp()
                    .Scan(
                        new Dictionary<string, Timestamped<Entity>>(),
                        (accumulator, update) =>
                        {
                            accumulator[update.Value.ID] = update;
                            return accumulator;
                        })
                    .Select(x => x.Select(y => y.Value).ToList())
                    .Replay(1)
                    .RefCount()
                    .Subscribe(o));
}
It's almost always best to avoid any state when using Rx (other than state localized within the observable itself). So I've merged CreateSubscription and GetStream into a single GetStream method and encapsulated the whole observable in an Observable.Create.
In order to avoid pushing out values twice, and so you can tell which update is the latest, I've added a call to .Timestamp() to record the time at which each Entity was received.
I've kept the .Scan(...) with the dictionary, but it is now a Dictionary<string, Timestamped<Entity>>.
For each value added/updated I then flatten the dictionary and return the underlying values as a list. At this point you could order the list to make sure that the latest values are either first or last to suit your needs.
I've then used the .Replay(1).RefCount() combination to turn the IConnectableObservable<> returned by .Replay(1) back into an IObservable<>, with the understanding that you'll dispose of the underlying subscription when all subscribers dispose. This is probably the most crucial part of your query. It should be done this way. This is the Rx way of ensuring that you avoid memory leaks.
If you desperately need to keep the underlying connection open then you would need to encapsulate all of your code within a class that implements IDisposable to clean up the .Connect() that you would require.
Something like this:
public class EntityStream : IDisposable
{
    private readonly IConnectableObservable<IList<Timestamped<Entity>>> _stream;
    private IDisposable _connection = null;

    public EntityStream(IObservable<Entity> someLongRunningSubscriptionStream)
    {
        _stream =
            someLongRunningSubscriptionStream
                .Timestamp()
                .Scan(
                    new Dictionary<string, Timestamped<Entity>>(),
                    (accumulator, update) =>
                    {
                        accumulator[update.Value.ID] = update;
                        return accumulator;
                    })
                .Select(x => x.Select(y => y.Value).ToList())
                .Replay(1);
        _connection = _stream.Connect();
    }

    public IObservable<IList<Timestamped<Entity>>> GetStream()
    {
        return _stream.AsObservable();
    }

    public void Dispose()
    {
        if (_connection != null)
        {
            _connection.Dispose();
            _connection = null;
        }
    }
}
I so very rarely do this, though. I thoroughly recommend the first approach. You should only mix OOP and Rx when you have to.
Please let me know if you need any clarification.


How to listen to change feed continuously RethinkDB

I am having the following problem: with RethinkDB's RunChangesAsync method, the query runs once and then starts listening for changes. When the query's results change, you are given a Cursor<Change<Class>>, which is a delta between the initial state and the current state.
My question is how can I make this run continuously?
If I use:
while(true)
{
code.... //changes happening while program is here
....../
...RunChangesAsync();
/......processed buffered items
code //new changes here
}
If changes happen at the points I've marked in the code, they are not caught by RunChangesAsync. The only changes caught are those that occur while RunChangesAsync is listening, not before it starts or after it has retrieved its results.
So I tried wrapping RunChangesAsync in an observable, but it does not listen continuously for changes as I would have expected; it just retrieves two null items (garbage, I suppose) and ends.
Observable
public IObservable<Cursor<Change<UserStatus?>>> GetObservable() =>
    r.Db(Constants.DB_NAME)
     .Table(Constants.CLIENT_TABLE)
     .RunChangesAsync<UserStatus?>(this.con, CancellationToken.None)
     .ToObservable();
Observer
class PlayerSubscriber : IObserver<Cursor<Change<UserStatus?>>>
{
    public void OnCompleted() => Console.WriteLine("Finished");
    public void OnError(Exception error) => Console.WriteLine("error");
    public void OnNext(Cursor<Change<UserStatus?>> value)
    {
        foreach (var item in value.BufferedItems)
            Console.WriteLine(item);
    }
}
Program
class Program
{
    public static RethinkDB r = RethinkDB.R;
    public static bool End = false;

    static async Task Main(string[] args)
    {
        var address = new Address { Host = "127.0.0.1", Port = 28015 };
        var con = await r.Connection().Hostname(address.Host).Port(address.Port).ConnectAsync();
        var database = new Database(r, con);
        var obs = database.GetObservable();
        var sub = new PlayerSubscriber();
        var disp = obs.Subscribe(sub);
        Console.ReadKey();
        Console.WriteLine("Hello World!");
    }
}
When I debug, as you can see, the OnNext method of the observer is executed only once (returning two null objects) and then the feed closes.
P.S.: Database is just a wrapper around RethinkDB queries; the only method used is GetObservable, which I posted above. UserStatus is a POCO.
When creating a change feed, you only want to create one change feed object. The Cursor<Change<T>> you get back from query.RunChangesAsync() is your change feed object, and it is the one you will use for the entire lifetime over which you want to receive changes.
In your example:
while(true)
{
code.... //changes happening while program is here
....../
...RunChangesAsync();
/......processed buffered items
code //new changes here
}
Having .RunChangesAsync(); in a while loop is not the correct approach. You don't need to re-run the query again and get another Cursor<Change<T>>. I'll explain how this works at the end of this post.
Also, do not use cursor.BufferedItems on the cursor object. The cursor.BufferedItems property is not meant to be consumed by your code directly; it is only exposed for those special situations where you want to "peek ahead" inside the cursor object (client-side) at items, specific to your change feed query, that are ready to be consumed.
The proper way to consume items in your change feed is to enumerate over the cursor object itself as shown below:
var cursor = await query.RunChangesAsync(conn);
foreach (var item in cursor)
{
    Console.WriteLine(item);
}
When the cursor runs out of items, it will make a request to the RethinkDB server for more. Keep in mind that each iteration of the foreach loop can potentially be a blocking call. For example, the foreach loop can block indefinitely when 1) there are no items on the client side to be consumed (.BufferedItems.Count == 0) and 2) no items have changed on the server side according to your change feed query criteria. Under those circumstances, the foreach loop will block until the RethinkDB server sends you an item that is ready to be consumed.
Documentation about using Reactive Extensions and RethinkDB in C#
There is a driver unit test that shows how .NET Reactive Extensions can work here.
Specifically, Lines 31 - 47 in this unit test set up a change feed with Reactive Extensions:
var changes = R.Db(DbName).Table(TableName)
    //.changes()[new {include_states = true, include_initial = true}]
    .Changes()
    .RunChanges<JObject>(conn);
changes.IsFeed.Should().BeTrue();

var observable = changes.ToObservable();

// use a new thread if you want to continue,
// otherwise, subscription will block.
observable.SubscribeOn(NewThreadScheduler.Default)
    .Subscribe(
        x => OnNext(x),
        e => OnError(e),
        () => OnCompleted()
    );
Additionally, here is a good example and explanation of what happens and how to consume a change feed with C#:
Hope that helps.
Thanks,
Brian
If you have an operation with the signature Task<int> ReadAsync(), then the way to set up polling is like this:
IObservable<int> PollRead(TimeSpan interval)
{
    return
        Observable
            .Interval(interval)
            .SelectMany(n => Observable.FromAsync(() => ReadAsync()));
}
I'd also caution against creating your own implementation of IObservable<T>; it's fraught with danger. And if you are creating your own observer that you want to hand around, use Observer.Create(...). Generally you don't even need to do that.
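For instance, a rough sketch of what Observer.Create would look like in place of the hand-rolled PlayerSubscriber from the question (obs as in the question's Program):
var observer = Observer.Create<Cursor<Change<UserStatus?>>>(
    value =>
    {
        foreach (var item in value.BufferedItems)
            Console.WriteLine(item);
    },
    error => Console.WriteLine("error"),
    () => Console.WriteLine("Finished"));
var disp = obs.Subscribe(observer);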

"yield return" from event handler

I have a class which takes a stream in the constructor. You can then set up callbacks for various events, and then call StartProcessing. The issue is that I want to use it from a function which should return an IEnumerable.
Example:
public class Parser
{
    public Parser(System.IO.Stream s) { /* saves stream and does some set up */ }

    public delegate void OnParsedHandler(List<string> token);
    public event OnParsedHandler OnParsedData;

    public void StartProcessing()
    {
        // reads stream and makes callback when it has a whole record
    }
}
public class Application
{
    public IEnumerable<Thing> GetThings(System.IO.Stream s)
    {
        Parser p = new Parser(s);
        p.OnParsedData += (List<string> str) =>
        {
            Thing t = new Thing(str[0]);
            // here is where I would like to yield
            // but I can't
            yield return t;
        };
        p.StartProcessing();
    }
}
Right now my solution, which isn't great, is to put all the Things into a List that is captured by the lambda, and then iterate over them after calling StartProcessing.
public class Application
{
    public IEnumerable<Thing> GetThings(System.IO.Stream s)
    {
        Parser p = new Parser(s);
        List<Thing> thingList = new List<Thing>();
        p.OnParsedData += (List<string> str) =>
        {
            Thing t = new Thing(str[0]);
            thingList.Add(t);
        };
        p.StartProcessing();
        foreach (Thing t in thingList)
        {
            yield return t;
        }
    }
}
The issue here is that now I have to save all of the Thing objects into a list before yielding any of them.
The problem you have here is that you don't fundamentally have a "pull" mechanic; you're trying to push data from the parser. If the parser is going to push data to you rather than letting the caller pull it, then GetThings should return an IObservable rather than an IEnumerable, so the caller can consume the data when it's ready.
If it really is important to have a pull mechanic here then Parser shouldn't fire an event to indicate that it has new data, but rather the caller should be able to ask it for new data and have it get it; it should either return all of the parsed data, or itself return an IEnumerable.
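Going back to the push option, here is a rough sketch of that IObservable shape, assuming the Parser from the question and that StartProcessing blocks until the whole stream has been parsed:
public IObservable<Thing> GetThings(System.IO.Stream s)
{
    return Observable.Create<Thing>(observer =>
    {
        var parser = new Parser(s);
        parser.OnParsedData += (List<string> str) => observer.OnNext(new Thing(str[0]));
        parser.StartProcessing(); // fires OnParsedData for each record
        observer.OnCompleted();   // parsing finished, so complete the stream
        return Disposable.Empty;  // nothing to tear down in this simple sketch
    });
}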
Interesting question. I would like to build upon what @servy has said regarding push and pull. In your implementation above, you are effectively adapting a push mechanism to a pull interface.
Now, first things first. You have not specified whether the call to the StartProcessing() method is a blocking call or not. A couple of remarks regarding that:
If the method is blocking (synchronous), then there is really no point in adapting it to a pull model anyway. The caller will see all the data processed in a single blocking call.
In that regard, receiving the data indirectly via an event handler scatters into two seemingly unrelated constructs what should otherwise be a single, cohesive, explicit operation. For example:
void ProcessAll(Action<Thing> callback);
On the other hand, if the StartProcessing() method actually spawns a new thread (it might then better be named BeginProcessing() and follow the Event-based Asynchronous Pattern or another async processing pattern), you could adapt it to a pull mechanism by means of a synchronization construct using a wait handle: ManualResetEvent, mutex and the like. Pseudo-code:
public IEnumerable<Thing> GetThings(System.IO.Stream s)
{
    var parser = new Parser(s);
    var waitable = new AutoResetEvent(false);
    Thing item = null;
    parser.OnParsedData += (Thing thing) =>
    {
        item = thing;
        waitable.Set();
    };
    IAsyncResult result = parser.BeginProcessing();
    while (!result.IsCompleted)
    {
        waitable.WaitOne();
        yield return item;
    }
}
Disclaimer
The above code serves only as a means for presenting an idea. It is not thread-safe and the synchronization mechanics do not work properly. See the producer-consumer pattern for more information.
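For a thread-safe take on the same push-to-pull adaptation, here is a sketch using BlockingCollection<T>, assuming the Parser from the question and that the blocking StartProcessing call is moved onto a worker thread:
public IEnumerable<Thing> GetThings(System.IO.Stream s)
{
    var parser = new Parser(s);
    using (var queue = new BlockingCollection<Thing>())
    {
        parser.OnParsedData += (List<string> str) => queue.Add(new Thing(str[0]));

        // Run the blocking parse on a worker thread; seal the queue when it finishes.
        var work = Task.Run(() =>
        {
            try { parser.StartProcessing(); }
            finally { queue.CompleteAdding(); }
        });

        // Blocks until items arrive; ends once CompleteAdding is called and the queue drains.
        foreach (var thing in queue.GetConsumingEnumerable())
            yield return thing;

        work.Wait(); // propagate any parsing exception
    }
}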

'Flushing' observable Scan

This is a weird 'problem', I'm not sure what the best way to handle it is.
To simplify, let's say that I've got an observable source with some data reading coming from 'outside':
{ Value, TimeStamp }
I'm putting that through Observable.Scan so that I can output:
{ Value, TimeStamp, TimeDelta }
This means that my data always comes out 'one late', but that's not a problem.
We're 'recording' from this observable, and when you stop a recording, there's still one data value 'stuck', waiting for its follower.
Even that's not a problem. The problem is that when you start recording again, the last value from the previous recording gets tacked onto the beginning of the new one.
The most obvious thing to do is just to unsubscribe and resubscribe, but.... it's not that simple, because this scanned source is not only recorded, but also sent to the UI, and used for further calculations down the line: so I'd have to do an enormous unsubscribe/resubscribe.
I'm trying to think of a way to inject some kind of 'reset' data, but not sure how one goes about sending information back 'up' the observable stream...
Maybe I've just bitten off more than I can chew? Or used too much Observable?
There are going to be a number of ways to do this, but one that is fairly easy is to use the .Switch() operator.
It essentially works like this: if you have an IObservable<IObservable<T>>, you can call .Switch() to turn it into an IObservable<T> that subscribes to the latest inner observable produced by the outer observable and unsubscribes from the previously produced one.
Now that sounds a bit funky, but here's how it can work. Given an observable called outsideObservable, you define a second observable (resubscribeObservable) that produces a value every time you want to resubscribe, and you combine them like this:
var subscription =
    resubscribeObservable
        .Select(_ => outsideObservable)
        .Switch()
        .Subscribe(x =>
        {
            /* Do stuff here */
        });
Now to resubscribe to outsideObservable you just have to produce a value from resubscribeObservable.
The easiest way to do this is to define it like var resubscribeObservable = new Subject<Unit>(); and then call resubscribeObservable.OnNext(Unit.Default); every time you want to resubscribe.
Alternatively if you have some event, say a user clicking a button, then you could use an observable based on that event as your resubscribeObservable.
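For example, with a hypothetical button control exposing a standard Click event:
// Each click becomes a resubscribe signal (button is hypothetical).
IObservable<Unit> resubscribeObservable =
    Observable.FromEventPattern(h => button.Click += h, h => button.Click -= h)
        .Select(_ => Unit.Default);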
Integrating suggestions from the comments, this would look something like:
var factory = Observable.Defer(() => outsideObservable);
var resetterObservable = new Subject<Unit>();
var resettableObservable =
    resetterObservable
        .StartWith(Unit.Default)
        .Select(_ => factory)
        .Switch()
        .Publish()
        .RefCount();
The Publish().RefCount() is just to protect the outsideObservable from multiple simultaneous subscriptions.
This is what I've boiled the accepted answer down to. Not yet in production, but tests seem to show it does what I want.
public interface IResetter
{
    IObservable<T> MakeResettable<T>(Func<IObservable<T>> selector);
}

public class Resetter : IResetter
{
    private Subject<Unit> _Resetter = new Subject<Unit>();

    public void Reset()
    {
        _Resetter.OnNext(Unit.Default);
    }

    public IObservable<T> MakeResettable<T>(Func<IObservable<T>> selector)
    {
        return
            _Resetter
                .StartWith(Unit.Default)
                .Select(_ => Observable.Defer(selector))
                .Switch()
                .Publish().RefCount();
    }
}
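Usage might then look something like this (outsideObservable as in the answer above):
var resetter = new Resetter();
var resettable = resetter.MakeResettable(() => outsideObservable);
var subscription = resettable.Subscribe(x => { /* record, update the UI, ... */ });

// Later, when a recording stops: flush the pending Scan state
// without disturbing downstream subscribers.
resetter.Reset();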

What is the best practice for implementing an Rx handler?

I have this class for explaining my problem:
public class DataObserver : IDisposable
{
    private readonly List<IDisposable> _subscriptions = new List<IDisposable>();
    private readonly SomeBusinessLogicServer _server;

    public DataObserver(SomeBusinessLogicServer server, IObservable<SomeData> data)
    {
        _server = server;
        _subscriptions.Add(data.Subscribe(TryHandle));
    }

    private void TryHandle(SomeData data)
    {
        try
        {
            _server.MakeApiCallAsync(data).Wait();
        }
        catch (Exception)
        {
            // Handle exceptions somehow!
        }
    }

    public void Dispose()
    {
        _subscriptions.ForEach(s => s.Dispose());
        _subscriptions.Clear();
    }
}
A) How can I avoid blocking inside the TryHandle() function?
B) How would you publish exceptions caught inside that function for handling them properly?
The Rx Design Guidelines provide a lot of useful advice when writing your own Rx operators:
http://go.microsoft.com/fwlink/?LinkID=205219
I'm sure I'll get lambasted for linking to an external article, but this link has been good for a couple of years and it's too big to republish on SO.
First, take a look at CompositeDisposable instead of re-implementing it yourself.
Other than that, there are many answers to your question. The best insight I've had when working with Rx is that most cases where you want to subscribe are really just more links in the chain of the observable you are building: you don't really want to subscribe, you want to apply yet another transform to the incoming observable, and let code further "on the edge of the system", which knows how to handle errors, do the actual subscribing.
In the example you have presented:
A) Don't block by just transforming the IObservable<SomeData> into an IObservable<Task> (which is really better expressed as an IObservable<IObservable<Unit>>).
B) Publish exceptions by just ending the observable with an error or, if you don't want the exception to end the observable, exposing an IObservable<Exception>.
Here's how I'd re-write your example, assuming you did not want the stream to end on error, but instead just keep running after reporting the errors:
public static class DataObserver
{
    public static IObservable<Exception> ApplyLogic(this IObservable<SomeData> source, SomeBusinessLogicServer server)
    {
        return source
            .Select(data =>
            {
                // execute the async method as an observable<Unit>
                // ignore its results, but capture its error (if any) and yield it.
                return Observable
                    .FromAsync(() => server.MakeApiCallAsync(data))
                    .IgnoreElements()
                    .Select(_ => (Exception)null) // to cast from IObservable<Unit> to IObservable<Exception>
                    .Catch((Exception e) => Observable.Return(e));
            })
            // runs the Api calls sequentially (so they will not run concurrently)
            // If you prefer to let the calls run in parallel, then use
            // .Merge() instead of .Concat()
            .Concat();
    }
}
// Usage (in Main() perhaps)
IObservable<SomeData> dataStream = ...;
var subscription = dataStream.ApplyLogic(server).Subscribe(
    error =>
    {
        Console.WriteLine("An error occurred processing a dataItem: {0}", error);
    },
    fatalError =>
    {
        Console.WriteLine("A fatal error occurred retrieving data from the dataStream: {0}", fatalError);
    });

RX: How to concat a Snapshot stream and an Update stream?

I've been trying to create an observable which streams a state-of-the-world (snapshot) from a repository cache, followed by live updates from a separate feed. The catch is that the snapshot call is blocking, so the updates have to be buffered during that time.
This is what I've come up with, a little simplified. The GetStream() method is the one I'm concerned with. I'm wondering whether there is a more elegant solution. Assume GetDataFeed() pulses updates to the cache all day long.
private static readonly IConnectableObservable<long> _updateStream;

static Program() // static constructor
{
    _updateStream = GetDataFeed().Publish();
    _updateStream.Connect();
}

static void Main(string[] args)
{
    _updateStream.Subscribe(Console.WriteLine);
    Console.ReadLine();
    GetStream().Subscribe(l => Console.WriteLine("Stream: " + l));
    Console.ReadLine();
}

public static IObservable<long> GetStream()
{
    return Observable.Create<long>(observer =>
    {
        var bufferedStream = new ReplaySubject<long>();
        _updateStream.Subscribe(bufferedStream);

        var data = GetSnapshot();

        // This returns the ticks from GetSnapshot,
        // followed by the buffered ticks from _updateStream,
        // followed by any subsequent ticks from _updateStream
        data.ToObservable().Concat(bufferedStream).Subscribe(observer);
        return Disposable.Empty;
    });
}

private static IObservable<long> GetDataFeed()
{
    var feed = Observable.Interval(TimeSpan.FromSeconds(1));
    return Observable.Create<long>(observer =>
    {
        feed.Subscribe(observer);
        return Disposable.Empty;
    });
}
Popular opinion opposes Subjects as they are not 'functional', but I can't find a way of doing this without a ReplaySubject. The Replay filter on a hot observable wouldn't work because it would replay everything (potentially a whole day's worth of stale updates).
I'm also concerned about race conditions. Is there a way to guarantee sequencing of some sort, should an earlier update be buffered before the snapshot? Can the whole thing be done more safely and elegantly with other RX operators?
Thanks.
-Will
Whether you use a ReplaySubject or the Replay function really makes no difference. Replay uses a ReplaySubject under the hood. I'll also note that you are leaking subscriptions like mad, which can cause a resource leak. Also, you put no limit on the size of the replay buffer. If you watch the observable all day long, then that replay buffer will keep growing and growing. You should put a limit on it to prevent that.
Here is an updated version of GetStream. In this version I take the simplistic approach of just limiting the Replay to the most recent 1 minute of data. This assumes that GetSnapshot will always complete and the observer will observe the results within that minute. Your mileage may vary and you can probably improve upon this scheme. But at least this way, when you have watched the observable all day long, the buffer will not have grown unbounded and will still contain only a minute's worth of updates.
public static IObservable<long> GetStream()
{
    return Observable.Create<long>(observer =>
    {
        var updateStreamSubscription = new SingleAssignmentDisposable();
        var sequenceDisposable = new SingleAssignmentDisposable();
        var subscriptions = new CompositeDisposable(updateStreamSubscription, sequenceDisposable);

        // start buffering the updates
        var bufferedStream = _updateStream.Replay(TimeSpan.FromMinutes(1));
        updateStreamSubscription.Disposable = bufferedStream.Connect();

        // now retrieve the initial snapshot data
        var data = GetSnapshot();

        // subscribe to the snapshot followed by the buffered data
        sequenceDisposable.Disposable = data.ToObservable().Concat(bufferedStream).Subscribe(observer);

        // return the composite disposable which will unsubscribe when the observer wishes
        return subscriptions;
    });
}
As for your questions about race conditions and filtering out "old" updates...if your snapshot data includes some sort of version information, and your update stream also providers version information, then you can effectively measure the latest version returned by your snapshot query and then filter the buffered stream to ignore values for older versions. Here is a rough example:
public static IObservable<long> GetStream()
{
    return Observable.Create<long>(observer =>
    {
        var updateStreamSubscription = new SingleAssignmentDisposable();
        var sequenceDisposable = new SingleAssignmentDisposable();
        var subscriptions = new CompositeDisposable(updateStreamSubscription, sequenceDisposable);

        // start buffering the updates
        var bufferedStream = _updateStream.Replay(TimeSpan.FromMinutes(1));
        updateStreamSubscription.Disposable = bufferedStream.Connect();

        // now retrieve the initial snapshot data
        var data = GetSnapshot();
        var snapshotVersion = data.Length > 0 ? data[data.Length - 1].Version : 0;
        var filteredUpdates = bufferedStream.Where(update => update.Version > snapshotVersion);

        // subscribe to the snapshot followed by the filtered buffered data
        sequenceDisposable.Disposable = data.ToObservable().Concat(filteredUpdates).Subscribe(observer);

        // return the composite disposable which will unsubscribe when the observer wishes
        return subscriptions;
    });
}
I have successfully used this pattern when merging live updates with a stored snapshot. I haven't yet found an elegant Rx operator that already does this without any race conditions. But the above method could probably be turned into such. :)
Edit: Note that I have left out error handling in the examples above. In theory the call to GetSnapshot could fail, and you'd leak the subscription to the update stream. I suggest wrapping everything after the CompositeDisposable declaration in a try/catch block and, in the catch handler, calling subscriptions.Dispose() before re-throwing the exception.
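A sketch of that error handling, applied to the body of the first GetStream above:
return Observable.Create<long>(observer =>
{
    var updateStreamSubscription = new SingleAssignmentDisposable();
    var sequenceDisposable = new SingleAssignmentDisposable();
    var subscriptions = new CompositeDisposable(updateStreamSubscription, sequenceDisposable);
    try
    {
        var bufferedStream = _updateStream.Replay(TimeSpan.FromMinutes(1));
        updateStreamSubscription.Disposable = bufferedStream.Connect();
        var data = GetSnapshot(); // may throw
        sequenceDisposable.Disposable = data.ToObservable().Concat(bufferedStream).Subscribe(observer);
    }
    catch
    {
        // GetSnapshot failed: release the buffering connection before re-throwing
        // so we do not leak the subscription to the update stream.
        subscriptions.Dispose();
        throw;
    }
    return subscriptions;
});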
