store retrieve IObservable subscription state in Rx - c#

[ this question is in the realm of Reactive Extensions (Rx) ]
A subscription that needs to continue on application restart
int nValuesBeforeOutput = 123;
myStream.Buffer(nValuesBeforeOutput).Subscribe(
i => Debug.WriteLine("Something Critical on Every 123rd Value"));
Now I need to serialize and deserialize the state of this subscription so that next time the application is started the buffer count does NOT start from zero, but from whatever the buffer count got to before application exit.
How could you persist the state of IObservable.Subscribe() in this case and later load it?
Is there a general solution to saving observer state in Rx?
From Answer to Solution
Based on Paul Betts approach, here's a semi-generalizable implementation that worked in my initial testing
Use
int nValuesBeforeOutput = 123;
var myRecordableStream = myStream.Record(serializer);
myRecordableStream.Buffer(nValuesBeforeOutput).ClearRecords(serializer).Subscribe(
i => Debug.WriteLine("Something Critical on Every 123rd Value"));
Extension methods
private static bool _alreadyRecording;
public static IObservable<T> Record<T>(this IObservable<T> input,
IRepositor repositor)
{
IObservable<T> output = input;
List<T> records = null;
if (repositor.Deserialize(ref records))
{
ISubject<T> history = new ReplaySubject<T>();
records.ForEach(history.OnNext);
output = input.Merge(history);
}
if (!_alreadyRecording)
{
_alreadyRecording = true;
input.Subscribe(i => repositor.SerializeAppend(new List<T> {i}));
}
return output;
}
public static IObservable<T> ClearRecords<T>(this IObservable<T> input,
IRepositor repositor)
{
input.Subscribe(i => repositor.Clear());
return input;
}
Notes
This will not work for storing states that depend on time-intervals between the values produced
You need a serializer implementation that supports serializing T
_alreadyRecording is needed if you subscribe to myRecordableStream more than once
_alreadyRecording is a static boolean, very ugly, and prevents the extension methods from being used in more than one place if needing parallel subscriptions - needs to be re-implemented for future use

There is no general solution for this, and making one would be NonTrivialâ„¢. The closest thing you can do is make myStream some sort of replay Observable (i.e. instead of serializing the state, serialize the state of myStream and redo the work to get you back to where you were).

Related

"yield return" from event handler

I have a class which takes a stream in the constructor. You can then set up callbacks for various events, and then call StartProcessing. The issue is that I want to use it from a function which should return an IEnumerable.
Example:
public class Parser
{
public Parser(System.IO.Stream s) { // saves stream and does some set up }
public delegate void OnParsedHandler(List<string> token);
public event OnParsedHandler OnParsedData;
public void StartProcessing()
{
// reads stream and makes callback when it has a whole record
}
}
public class Application
{
public IEnumerable<Thing> GetThings(System.IO.Stream s)
{
Parser p = new Parser(s);
p.OnParsedData += (List<string> str) =>
{
Thing t = new Thing(str[0]);
// here is where I would like to yield
// but I can't
yield return t;
};
p.StartProcessing();
}
}
Right now my solution, which isn't so great, is to put them all the Things into a List which is captured by the lambda, and then iterate over them after calling StartProcessing.
public class Application
{
public IEnumerable<Thing> GetThings(System.IO.Stream s)
{
Parser p = new Parser(s);
List<Thing> thingList = new List<Thing>();
p.OnParsedData += (List<string> str) =>
{
Thing t = new Thing(str[0]);
thingList .Add(t);
};
p.StartProcessing();
foreach(Thing t in thingList )
{
yield return t;
}
}
}
The issue here is that now I have to save all of the Thing objects into list.
The problem you have here is that you don't fundamentally have a "pull" mechanic here, you're trying to push data from the parser. If the parser is going to push data to you, rather than letting the caller pull the data, then GetThings should return an IObservable, rather than an IEnumerable, so the caller can consume the data when it's ready.
If it really is important to have a pull mechanic here then Parser shouldn't fire an event to indicate that it has new data, but rather the caller should be able to ask it for new data and have it get it; it should either return all of the parsed data, or itself return an IEnumerable.
Interesting question. I would like to build upon what #servy has said regarding push and pull. In your implementation above, you are effectively adapting a push mechanism to a pull interface.
Now, first things first. You have not specified whether the call to the StartProcessing() method is a blocking call or not. A couple of remarks regarding that:
If the method is blocking (synchronous), then there is really no point in adapting it to a pull model anyway. The caller will see all the data processed in a single blocking call.
In that regard, receiving the data indirectly via an event handler scatters into two seemingly unrelated constructs what should otherwise be a single, cohesive, explicit operation. For example:
void ProcessAll(Action<Thing> callback);
On the other hand, if the StartProcessing() method actually spawns a new thread (maybe better named BeginProcessing() and follow the Event-based Asynchronous Pattern or another async processing pattern), you could adapt it to a pull machanism by means of a synchronization construct using a wait handle: ManualResetEvent, mutex and the like. Pseudo-code:
public IEnumerable<Thing> GetThings(System.IO.Stream s)
{
var parser = new Parser(s);
var waitable = new AutoResetEvent(false);
Thing item = null;
parser.OnParsedData += (Thing thing) =>
{
item = thing;
waitable.Set();
};
IAsyncResult result = parser.BeginProcessing();
while (!result.IsCompleted)
{
waitable.WaitOne();
yield return item;
}
}
Disclaimer
The above code serves only as a means for presenting an idea. It is not thread-safe and the synchronization mechanics do not work properly. See the producer-consumer pattern for more information.

How to use Lazy to handle concurrent request?

I'm new in C# and trying to understand how to work with Lazy.
I need to handle concurrent request by waiting the result of an already running operation. Requests for data may come in simultaneously with same/different credentials.
For each unique set of credentials there can be at most one GetDataInternal call in progress, with the result from that one call returned to all queued waiters when it is ready
private readonly ConcurrentDictionary<Credential, Lazy<Data>> Cache
= new ConcurrentDictionary<Credential, Lazy<Data>>();
public Data GetData(Credential credential)
{
// This instance will be thrown away if a cached
// value with our "credential" key already exists.
Lazy<Data> newLazy = new Lazy<Data>(
() => GetDataInternal(credential),
LazyThreadSafetyMode.ExecutionAndPublication
);
Lazy<Data> lazy = Cache.GetOrAdd(credential, newLazy);
bool added = ReferenceEquals(newLazy, lazy); // If true, we won the race.
Data data;
try
{
// Wait for the GetDataInternal call to complete.
data = lazy.Value;
}
finally
{
// Only the thread which created the cache value
// is allowed to remove it, to prevent races.
if (added) {
Cache.TryRemove(credential, out lazy);
}
}
return data;
}
Is that right way to use Lazy or my code is not safe?
Update:
Is it good idea to start using MemoryCache instead of ConcurrentDictionary? If yes, how to create a key value, because it's a string inside MemoryCache.Default.AddOrGetExisting()
This is correct. This is a standard pattern (except for the removal) and it's a really good cache because it prevents cache stampeding.
I'm not sure you want to remove from the cache when the computation is done because the computation will be redone over and over that way. If you don't need the removal you can simplify the code by basically deleting the second half.
Note, that Lazy has a problem in the case of an exception: The exception is stored and the factory will never be re-executed. The problem persists forever (until a human restarts the app). In my mind this makes Lazy completely unsuitable for production use in most cases.
This means that a transient error such as a network issue can render the app unavailable permanently.
This answer is directed to the updated part of the original question. See #usr answer regarding thread-safety with Lazy<T> and the potential pitfalls.
I would like to know how to avoid using ConcurrentDictionary<TKey, TValue> and start
using MemoryCache? How to implement
MemoryCache.Default.AddOrGetExisting()?
If you're looking for a cache which has a mechanism for auto expiry, then MemoryCache is a good choice if you don't want to implement the mechanics yourself.
In order to utilize MemoryCache which forces a string representation for a key, you'll need to create a unique string representation of a credential, perhaps a given user id or a unique username?
If you can, you can create an override of ToString which represents your unique identifier or simply use the said property, and utilize MemoryCache like this:
public class Credential
{
public Credential(int userId)
{
UserId = userId;
}
public int UserId { get; private set; }
}
And now your method will look like this:
private const EvictionIntervalMinutes = 10;
public Data GetData(Credential credential)
{
Lazy<Data> newLazy = new Lazy<Data>(
() => GetDataInternal(credential), LazyThreadSafetyMode.ExecutionAndPublication);
CacheItemPolicy evictionPolicy = new CacheItemPolicy
{
AbsoluteExpiration = DateTimeOffset.UtcNow.AddMinutes(EvictionIntervalMinutes)
};
var result = MemoryCache.Default.AddOrGetExisting(
new CacheItem(credential.UserId.ToString(), newLazy), evictionPolicy);
return result != null ? ((Lazy<Data>)result.Value).Value : newLazy.Value;
}
MemoryCache provides you with a thread-safe implementation, this means that two threads accessing AddOrGetExisting will only cause a single cache item to be added or retrieved. Further, Lazy<T> with ExecutionAndPublication guarantess only a single unique invocation of the factory method.

Adding Sample(TimeSpan span) to Reactive Extensions pipeline causes threading issues

Using Reactive Extensions, I have created a rolling buffer of values that caches a small history of recent values in a data stream for use in a plotting application. Since the values arrive much faster than I am interested in displaying, I would like to use the Sample(Timespan span) method in my Reactive pipeline to slow things down. However, adding it to the sample below causes an Exception to be thrown after a bit in the WriteEnumerable method (collection was modified). This is obviously a threading issue related to Sample, but I'm stumped on how exactly to alleviate it. I've tried setting the Scheduler to use in the Sample method to no avail.
Any advice?
class Program
{
static void Main(string[] args)
{
Observable.Interval(TimeSpan.FromSeconds(0.1))
.Take(500)
.TimedRollingBuffer(TimeSpan.FromSeconds(10))
.Sample(TimeSpan.FromSeconds(0.5))
.Subscribe(frame => WriteEnumerable(frame));
var input = "";
while (input != "exit")
{
input = Console.ReadLine();
}
}
private static void WriteEnumerable<T>(IEnumerable<T> enumerable)
{
foreach (T thing in enumerable)
Console.WriteLine(thing + " " + DateTime.UtcNow);
Console.WriteLine(Environment.NewLine);
}
}
public static class Extensions
{
public static IObservable<IEnumerable<Timestamped<T>>> TimedRollingBuffer<T>(this IObservable<T> observable, TimeSpan timeRange)
{
return Observable.Create<IEnumerable<Timestamped<T>>>(
o =>
{
var queue = new Queue<Timestamped<T>>();
return observable.Timestamp().Subscribe(
tx =>
{
queue.Enqueue(tx);
DateTime now = DateTime.Now;
while (queue.Peek().Timestamp < now.Subtract(timeRange))
queue.Dequeue();
o.OnNext(queue);
},
ex => o.OnError(ex),
() => o.OnCompleted()
);
});
}
}
credit where credit is due: reactive extensions sliding time window
The exception is due to the fact you are modifying the queue whilst enumerating it.
The RollingBuffer implementation you cited by #Enigmativity does the right thing - you will notice in his implemention the OnNext is invoked with a ToArray() ensuring a copy of the list as it stands is dispatched to observers rather than the mutating original.
In your case, the introduction of Sample introduces concurrency (which is fine - that's what it is supposed to do) - however, you are passing the queue itself, which will mutate whilst enumeration is occurring due to this introduced concurrency. This is a bug.
In your TimedRollingBuffer, if you were to use o.OnNext(queue.ToArray()) instead of o.OnNext(queue) you wouldn't have this problem.

How to manage observable subscription for dependent observables?

This sample console application has 2 observables. The first one pushes numbers from 1 to 100. This observable is subscribed by the AsyncClass which runs a long running process for each number it gets. Upon completion of this new async process I want to be able to 'push' to 2 subscribers which would be doing something with this new value.
My attempts are commented in the source code below.
AsyncClass:
class AsyncClass
{
private readonly IConnectableObservable<int> _source;
private readonly IDisposable _sourceDisposeObj;
public IObservable<string> _asyncOpObservable;
public AsyncClass(IConnectableObservable<int> source)
{
_source = source;
_sourceDisposeObj = _source.Subscribe(
ProcessArguments,
ExceptionHandler,
Completed
);
_source.Connect();
}
private void Completed()
{
Console.WriteLine("Completed");
Console.ReadKey();
}
private void ExceptionHandler(Exception exp)
{
throw exp;
}
private void ProcessArguments(int evtArgs)
{
Console.WriteLine("Argument being processed with value: " + evtArgs);
//_asyncOpObservable = LongRunningOperationAsync("hello").Publish();
// not going to work either since this creates a new observable for each value from main observer
}
// http://rxwiki.wikidot.com/101samples
public IObservable<string> LongRunningOperationAsync(string param)
{
// should not be creating an observable here, rather 'pushing' values?
return Observable.Create<string>(
o => Observable.ToAsync<string, string>(DoLongRunningOperation)(param).Subscribe(o)
);
}
private string DoLongRunningOperation(string arg)
{
return "Hello";
}
}
Main:
static void Main(string[] args)
{
var source = Observable
.Range(1, 100)
.Publish();
var asyncObj = new AsyncClass(source);
var _asyncTaskSource = asyncObj._asyncOpObservable;
var ui1 = new UI1(_asyncTaskSource);
var ui2 = new UI2(_asyncTaskSource);
}
UI1 (and UI2, they're basically the same):
class UI1
{
private IConnectableObservable<string> _asyncTaskSource;
private IDisposable _taskSourceDisposable;
public UI1(IConnectableObservable<string> asyncTaskSource)
{
_asyncTaskSource = asyncTaskSource;
_asyncTaskSource.Connect();
_taskSourceDisposable = _asyncTaskSource.Subscribe(RefreshUI, HandleException, Completed);
}
private void Completed()
{
Console.WriteLine("UI1: Stream completed");
}
private void HandleException(Exception obj)
{
Console.WriteLine("Exception! "+obj.Message);
}
private void RefreshUI(string obj)
{
Console.WriteLine("UI1: UI refreshing with value "+obj);
}
}
This is my first project with Rx so let me know if I should be thinking differently. Any help would be highly appreciated!
I'm going to let you know you should be thinking differently... :) Flippancy aside, this looks like a case of bad collision between object-oriented and functional-reactive styles.
It's not clear what the requirements are around timing of the data flow and caching of results here - the use of Publish and IConnectableObservable is a little confused. I'm going to guess you want to avoid the 2 downstream subscriptions causing the processing of a value being duplicated? I'm basing some of my answer on that premise. The use of Publish() can achieve this by allowing multiple subscribers to share a subscription to a single source.
Idiomatic Rx wants you to try and keep to a functional style. In order to do this, you want to present the long running work as a function. So let's say, instead of trying to wire your AsyncClass logic directly into the Rx chain as a class, you could present it as a function like this contrived example:
async Task<int> ProcessArgument(int argument)
{
// perform your lengthy calculation - maybe in an OO style,
// maybe creating class instances and invoking methods etc.
await Task.Delay(TimeSpan.FromSeconds(1));
return argument + 1;
}
Now, you can construct a complete Rx observable chain calling this function, and through the use of Publish().RefCount() you can avoid multiple subscribers causing duplicate effort. Note how this separates concerns too - the code processing the value is simpler because the reuse is handled elsewhere.
var query = source.SelectMany(x => ProcessArgument(x).ToObservable())
.Publish().RefCount();
By creating a single chain for subscribers, the work is only started when necessary on subscription. I've used Publish().RefCount() - but if you want to ensure values aren't missed by the second and subsequent subscribers, you could use Replay (easy) or use Publish() and then Connect - but you'll want the Connect logic outside the individual subscriber's code because you just need to call it once when all subscribers have subscribed.

RX: How to concat a Snapshot stream and an Update stream?

I've been trying to create an observable which streams a state-of-the-world (snapshot) from a repository cache, followed by live updates from a separate feed. The catch is that the snapshot call is blocking, so the updates have to be buffered during that time.
This is what I've come up with, a little simplified. The GetStream() method is the one I'm concerned with. I'm wondering whether there is a more elegant solution. Assume GetDataFeed() pulses updates to the cache all day long.
private static readonly IConnectableObservable<long> _updateStream;
public static Constructor()
{
_updateStream = GetDataFeed().Publish();
_updateStream.Connect();
}
static void Main(string[] args)
{
_updateStream.Subscribe(Console.WriteLine);
Console.ReadLine();
GetStream().Subscribe(l => Console.WriteLine("Stream: " + l));
Console.ReadLine();
}
public static IObservable<long> GetStream()
{
return Observable.Create<long>(observer =>
{
var bufferedStream = new ReplaySubject<long>();
_updateStream.Subscribe(bufferedStream);
var data = GetSnapshot();
// This returns the ticks from GetSnapshot
// followed by the buffered ticks from _updateStream
// followed by any subsequent ticks from _updateStream
data.ToObservable().Concat(bufferedStream).Subscribe(observer);
return Disposable.Empty;
});
}
private static IObservable<long> GetDataFeed()
{
var feed = Observable.Interval(TimeSpan.FromSeconds(1));
return Observable.Create<long>(observer =>
{
feed.Subscribe(observer);
return Disposable.Empty;
});
}
Popular opinion opposes Subjects as they are not 'functional', but I can't find a way of doing this without a ReplaySubject. The Replay filter on a hot observable wouldn't work because it would replay everything (potentially a whole day's worth of stale updates).
I'm also concerned about race conditions. Is there a way to guarantee sequencing of some sort, should an earlier update be buffered before the snapshot? Can the whole thing be done more safely and elegantly with other RX operators?
Thanks.
-Will
Whether you use a ReplaySubject or the Replay function really makes no difference. Replay uses a ReplaySubject under the hood. I'll also note that you are leaking subscriptions like mad, which can cause a resource leak. Also, you put no limit on the size of the replay buffer. If you watch the observable all day long, then that replay buffer will keep growing and growing. You should put a limit on it to prevent that.
Here is an updated version of GetStream. In this version I take the simplistic approach of just limitting the Replay to the most recent 1 minute of data. This assumes that GetData will always complete and the observer will observe the results within that 1 minute. Your mileage may vary and you can probably improve upon this scheme. But at least this way when you have watched the observable all day long, that buffer will not have grown unbounded and will still only contain a minute's worth of updates.
public static IObservable<long> GetStream()
{
return Observable.Create<long>(observer =>
{
var updateStreamSubscription = new SingleAssignmentDisposable();
var sequenceDisposable = new SingleAssignmentDisposable();
var subscriptions = new CompositeDisposable(updateStreamDisposable, sequenceDisposable);
// start buffering the updates
var bufferedStream = _updateStream.Replay(TimeSpan.FromMinutes(1));
updateStreamSubscription.Disposable = bufferedStream.Connect();
// now retrieve the initial snapshot data
var data = GetSnapshot();
// subscribe to the snapshot followed by the buffered data
sequenceDisposable.Disposable = data.ToObservable().Concat(bufferedStream).subscribe(observer);
// return the composite disposable which will unsubscribe when the observer wishes
return subscriptions;
});
}
As for your questions about race conditions and filtering out "old" updates...if your snapshot data includes some sort of version information, and your update stream also providers version information, then you can effectively measure the latest version returned by your snapshot query and then filter the buffered stream to ignore values for older versions. Here is a rough example:
public static IObservable<long> GetStream()
{
return Observable.Create<long>(observer =>
{
var updateStreamSubscription = new SingleAssignmentDisposable();
var sequenceDisposable = new SingleAssignmentDisposable();
var subscriptions = new CompositeDisposable(updateStreamDisposable, sequenceDisposable);
// start buffering the updates
var bufferedStream = _updateStream.Replay(TimeSpan.FromMinutes(1));
updateStreamSubscription.Disposable = bufferedStream.Connect();
// now retrieve the initial snapshot data
var data = GetSnapshot();
var snapshotVersion = data.Length > 0 ? data[data.Length - 1].Version : 0;
var filteredUpdates = bufferedStream.Where(update => update.Version > snapshotVersion);
// subscribe to the snapshot followed by the buffered data
sequenceDisposable.Disposable = data.ToObservable().Concat(filteredUpdates).subscribe(observer);
// return the composite disposable which will unsubscribe when the observer wishes
return subscriptions;
});
}
I have successfully used this pattern when merging live updates with a stored snapshot. I haven't yet found an elegant Rx operator that already does this without any race conditions. But the above method could probably be turned into such. :)
Edit: Note I have left out error handling in the examples above. In theory the call to GetSnapshot could fail and you'd leak the subscription to the update stream. I suggest wrapping everything after the CompositeDisposable declaration in a try/catch block, and in the catch handler, ensure call subscriptions.Dispose() before re-throwing the exception.

Categories

Resources