RX: How to concat a Snapshot stream and an Update stream? - c#

I've been trying to create an observable which streams a state-of-the-world (snapshot) from a repository cache, followed by live updates from a separate feed. The catch is that the snapshot call is blocking, so the updates have to be buffered during that time.
This is what I've come up with, a little simplified. The GetStream() method is the one I'm concerned with. I'm wondering whether there is a more elegant solution. Assume GetDataFeed() pulses updates to the cache all day long.
private static readonly IConnectableObservable<long> _updateStream;
static Program() // static constructor; the class name Program is assumed
{
_updateStream = GetDataFeed().Publish();
_updateStream.Connect();
}
static void Main(string[] args)
{
_updateStream.Subscribe(Console.WriteLine);
Console.ReadLine();
GetStream().Subscribe(l => Console.WriteLine("Stream: " + l));
Console.ReadLine();
}
public static IObservable<long> GetStream()
{
return Observable.Create<long>(observer =>
{
var bufferedStream = new ReplaySubject<long>();
_updateStream.Subscribe(bufferedStream);
var data = GetSnapshot();
// This returns the ticks from GetSnapshot
// followed by the buffered ticks from _updateStream
// followed by any subsequent ticks from _updateStream
data.ToObservable().Concat(bufferedStream).Subscribe(observer);
return Disposable.Empty;
});
}
private static IObservable<long> GetDataFeed()
{
var feed = Observable.Interval(TimeSpan.FromSeconds(1));
return Observable.Create<long>(observer =>
{
feed.Subscribe(observer);
return Disposable.Empty;
});
}
Popular opinion opposes Subjects as they are not 'functional', but I can't find a way of doing this without a ReplaySubject. The Replay filter on a hot observable wouldn't work because it would replay everything (potentially a whole day's worth of stale updates).
I'm also concerned about race conditions. Is there a way to guarantee sequencing of some sort, should an earlier update be buffered before the snapshot? Can the whole thing be done more safely and elegantly with other RX operators?
Thanks.
-Will

Whether you use a ReplaySubject or the Replay function really makes no difference. Replay uses a ReplaySubject under the hood. I'll also note that you are leaking subscriptions like mad, which can cause a resource leak. Also, you put no limit on the size of the replay buffer. If you watch the observable all day long, then that replay buffer will keep growing and growing. You should put a limit on it to prevent that.
Here is an updated version of GetStream. In this version I take the simplistic approach of just limiting the Replay to the most recent 1 minute of data. This assumes that GetSnapshot will always complete and the observer will observe the results within that 1 minute. Your mileage may vary and you can probably improve upon this scheme. But at least this way, when you have watched the observable all day long, that buffer will not have grown unbounded and will still only contain a minute's worth of updates.
public static IObservable<long> GetStream()
{
return Observable.Create<long>(observer =>
{
var updateStreamSubscription = new SingleAssignmentDisposable();
var sequenceDisposable = new SingleAssignmentDisposable();
var subscriptions = new CompositeDisposable(updateStreamSubscription, sequenceDisposable);
// start buffering the updates
var bufferedStream = _updateStream.Replay(TimeSpan.FromMinutes(1));
updateStreamSubscription.Disposable = bufferedStream.Connect();
// now retrieve the initial snapshot data
var data = GetSnapshot();
// subscribe to the snapshot followed by the buffered data
sequenceDisposable.Disposable = data.ToObservable().Concat(bufferedStream).Subscribe(observer);
// return the composite disposable which will unsubscribe when the observer wishes
return subscriptions;
});
}
As for your questions about race conditions and filtering out "old" updates: if your snapshot data includes some sort of version information, and your update stream also provides version information, then you can determine the latest version returned by your snapshot query and filter the buffered stream to ignore values for older versions. Here is a rough example:
public static IObservable<long> GetStream()
{
return Observable.Create<long>(observer =>
{
var updateStreamSubscription = new SingleAssignmentDisposable();
var sequenceDisposable = new SingleAssignmentDisposable();
var subscriptions = new CompositeDisposable(updateStreamSubscription, sequenceDisposable);
// start buffering the updates
var bufferedStream = _updateStream.Replay(TimeSpan.FromMinutes(1));
updateStreamSubscription.Disposable = bufferedStream.Connect();
// now retrieve the initial snapshot data
var data = GetSnapshot();
var snapshotVersion = data.Length > 0 ? data[data.Length - 1].Version : 0;
var filteredUpdates = bufferedStream.Where(update => update.Version > snapshotVersion);
// subscribe to the snapshot followed by the buffered data
sequenceDisposable.Disposable = data.ToObservable().Concat(filteredUpdates).Subscribe(observer);
// return the composite disposable which will unsubscribe when the observer wishes
return subscriptions;
});
}
I have successfully used this pattern when merging live updates with a stored snapshot. I haven't yet found an elegant Rx operator that already does this without any race conditions. But the above method could probably be turned into such. :)
Edit: Note I have left out error handling in the examples above. In theory the call to GetSnapshot could fail and you'd leak the subscription to the update stream. I suggest wrapping everything after the CompositeDisposable declaration in a try/catch block and, in the catch handler, calling subscriptions.Dispose() before re-throwing the exception.
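The try/catch suggested in the edit could look roughly like the sketch below. It is not the original author's code: `updates` and `getSnapshot` are hypothetical parameters standing in for `_updateStream` and `GetSnapshot()` so that the example is self-contained, and it assumes the System.Reactive package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Disposables;
using System.Reactive.Linq;

public static class SnapshotThenUpdates
{
    // 'updates' stands in for _updateStream and 'getSnapshot' for GetSnapshot();
    // both are hypothetical parameters introduced for this sketch.
    public static IObservable<long> GetStream(
        IObservable<long> updates, Func<IEnumerable<long>> getSnapshot)
    {
        return Observable.Create<long>(observer =>
        {
            var updateStreamSubscription = new SingleAssignmentDisposable();
            var sequenceDisposable = new SingleAssignmentDisposable();
            var subscriptions = new CompositeDisposable(
                updateStreamSubscription, sequenceDisposable);
            try
            {
                // start buffering the updates
                var bufferedStream = updates.Replay(TimeSpan.FromMinutes(1));
                updateStreamSubscription.Disposable = bufferedStream.Connect();

                // this call may throw
                var data = getSnapshot();

                // snapshot first, then the buffered and live updates
                sequenceDisposable.Disposable =
                    data.ToObservable().Concat(bufferedStream).Subscribe(observer);
                return subscriptions;
            }
            catch
            {
                // release the connection to the update stream before re-throwing
                subscriptions.Dispose();
                throw;
            }
        });
    }
}
```

If the snapshot call throws, the connection made by Connect() is disposed, so the update stream is not left with a dangling subscriber.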

Related

Reentrance method and partial synchronized calls

I have a singleton component that manages some information blocks. An information block is calculated information identified by some characteristics (concretely, an Id and a time period). These calculations may take some seconds. All information blocks are stored in a collection.
Some other consumers are using these information blocks. The calculation should start when the first request for this Id and time period comes. I had following flow in mind:
The first consumer requests the data identified by Id and time period.
The component checks if the information block already exists
If not: Create the information block, put it into the collection and start the calculation in a background task. If yes: Take it from the collection
After that the flow goes to the information block:
When the calculation is already finished (by a former call), a callback from the consumer is called with the result of the calculation.
When the calculation is still in process, the callback is called when the calculation is finished.
So far, so good.
The critical section comes when the second (or any subsequent) call arrives while the calculation is still running. The idea is that the calculation method holds each consumer's callback, and when the calculation is finished all consumers' callbacks are called.
public class SingletonInformationService
{
private readonly Collection<InformationBlock> blocks = new();
private object syncObject = new();
public void GetInformationBlock(Guid id, TimePeriod timePeriod,
Action<InformationBlock> callOnFinish)
{
InformationBlock block = null;
lock(syncObject)
{
// check out if the block already exists
block = blocks.SingleOrDefault(b => b.Id ...);
if (block == null)
{
block = new InformationBlock(...);
blocks.Add(block);
}
}
block?.BeginCalculation(callOnFinish);
}
}
public class InformationBlock
{
private Task calculationTask = null;
private CalculationState isCalculating = CalculationState.Unknown;
private List<Action<InformationBlock>> waitingRoom = new();
internal void BeginCalculation(Action<InformationBlock> callOnFinish)
{
if (isCalculating == CalculationState.Finished)
{
callOnFinish(this);
return;
}
else if (isCalculating == CalculationState.IsRunning)
{
waitingRoom.Add(callOnFinish);
return;
}
// add the first call to the waitingRoom
waitingRoom.Add(callOnFinish);
isCalculating = CalculationState.IsRunning;
calculationTask = Task.Run(() => { /* run the calculation */ })
.ContinueWith(taskResult =>
{
//.. apply the calculation result to local properties
this.Property1 = taskResult.Result.Property1;
// set the state to mark this instance as complete
isCalculating = CalculationState.Finished;
// inform all calls about the result
waitingRoom.ForEach(c => c(this));
waitingRoom.Clear();
}, TaskScheduler.FromCurrentSynchronizationContext());
}
}
Is that approach a good idea? Do you see any failures or possible deadlocks? The method BeginCalculation might be called more than once while the calculation is running. Should I await the calculationTask?
To have a deadlock, you need a cycle: object A depends on object B, which in turn depends on object A. As far as I can see, that's not your case, since the InformationBlock class doesn't access the service, but is only called by it.
The lock block is also very small, so it probably won't cause you trouble.
You could look at the thread-safe collections in the .NET standard library (System.Collections.Concurrent). They could simplify your code.
I suggest using a ConcurrentDictionary, because a keyed lookup is faster than iterating over the collection on every request.
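As a rough illustration of the ConcurrentDictionary suggestion, here is a minimal sketch. It swaps the callback list from the question for a `Task` that every caller can await, which is a different technique from the original code. The `TimePeriod` and `InformationBlock` shapes, the `InformationService` name, and the `calculate` delegate are all assumptions made up for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical stand-ins for the types in the question.
public readonly struct TimePeriod
{
    public TimePeriod(DateTime from, DateTime to) { From = from; To = to; }
    public DateTime From { get; }
    public DateTime To { get; }
}

public sealed class InformationBlock
{
    public InformationBlock(Guid id, TimePeriod period, int result)
    { Id = id; Period = period; Result = result; }
    public Guid Id { get; }
    public TimePeriod Period { get; }
    public int Result { get; }
}

public sealed class InformationService
{
    // Lazy<Task<...>> makes sure the calculation is started at most once per key,
    // even if GetOrAdd runs its factory more than once under contention.
    private readonly ConcurrentDictionary<(Guid Id, TimePeriod Period), Lazy<Task<InformationBlock>>> _blocks =
        new ConcurrentDictionary<(Guid Id, TimePeriod Period), Lazy<Task<InformationBlock>>>();

    private readonly Func<Guid, TimePeriod, InformationBlock> _calculate;

    public InformationService(Func<Guid, TimePeriod, InformationBlock> calculate)
    {
        _calculate = calculate;
    }

    public Task<InformationBlock> GetInformationBlockAsync(Guid id, TimePeriod period)
    {
        var lazy = _blocks.GetOrAdd(
            (id, period),
            key => new Lazy<Task<InformationBlock>>(
                () => Task.Run(() => _calculate(key.Id, key.Period))));

        // Every caller gets the same task: already completed, or still running.
        return lazy.Value;
    }
}
```

A consumer would then write `var block = await service.GetInformationBlockAsync(id, period);` and needs no waiting room, because awaiting a finished task returns immediately and awaiting a running one resumes when it completes.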

Parallel.ForEach: Best way to save off a collection when its record count gets high?

So I'm running a Parallel.ForEach that basically generates a bunch of data which is ultimately going to be saved to a database. However, since the collection of data can get quite large, I need to be able to occasionally save/clear the collection so as to not run into an OutOfMemoryException.
I'm new to using Parallel.ForEach, concurrent collections, and locks, so I'm a little fuzzy on what exactly needs to be done to make sure everything works correctly (i.e. we don't get any records added to the collection between the Save and Clear operations).
Currently I'm saying, if the record count is above a certain threshold, save the data in the current collection, within a lock block.
ConcurrentStack<OutRecord> OutRecs = new ConcurrentStack<OutRecord>();
object StackLock = new object();
Parallel.ForEach(inputrecords, input =>
{
lock(StackLock)
{
if (OutRecs.Count >= 50000)
{
Save(OutRecs);
OutRecs.Clear();
}
}
OutRecs.Push(CreateOutputRecord(input));
});
if (OutRecs.Count > 0) Save(OutRecs);
I'm not 100% certain whether or not this works the way I think it does. Does the lock stop other instances of the loop from writing to output collection? If not is there a better way to do this?
Your lock will work correctly but it will not be very efficient, because all your worker threads will be forced to pause for the entire duration of each save operation. Also, locks tend to be (relatively) expensive, so taking a lock in each iteration of each thread is a bit wasteful.
One of your comments mentioned giving each worker thread its own data storage: yes, you can do this. Here's an example that you could tailor to your needs:
Parallel.ForEach(
// collection of objects to iterate over
inputrecords,
// delegate to initialize thread-local data
() => new List<OutRecord>(),
// body of loop
(inputrecord, loopstate, localstorage) =>
{
localstorage.Add(CreateOutputRecord(inputrecord));
if (localstorage.Count > 1000)
{
// Save() must be thread-safe, or you'll need to wrap it in a lock
Save(localstorage);
localstorage.Clear();
}
return localstorage;
},
// finally block gets executed after each thread exits
localstorage =>
{
if (localstorage.Count > 0)
{
// Save() must be thread-safe, or you'll need to wrap it in a lock
Save(localstorage);
localstorage.Clear();
}
});
One approach is to define an abstraction that represents the destination for your data. It could be something like this:
public interface IRecordWriter<T> // perhaps come up with a better name.
{
void WriteRecord(T record);
void Flush();
}
Your class that processes the records in parallel doesn't need to worry about how those records are handled or what happens when there's too many of them. The implementation of IRecordWriter handles all those details, making your other class easier to test.
An implementation of IRecordWriter could look something like this:
public abstract class BufferedRecordWriter<T> : IRecordWriter<T>
{
private readonly ConcurrentQueue<T> _buffer = new ConcurrentQueue<T>();
private readonly int _maxCapacity;
private bool _flushing;
public BufferedRecordWriter(int maxCapacity = 100)
{
_maxCapacity = maxCapacity;
}
public void WriteRecord(T record)
{
_buffer.Enqueue(record);
if (_buffer.Count >= _maxCapacity && !_flushing)
Flush();
}
public void Flush()
{
_flushing = true;
try
{
var recordsToWrite = new List<T>();
while (_buffer.TryDequeue(out T dequeued))
{
recordsToWrite.Add(dequeued);
}
if(recordsToWrite.Any())
WriteRecords(recordsToWrite);
}
finally
{
_flushing = false;
}
}
protected abstract void WriteRecords(IEnumerable<T> records);
}
When the buffer reaches the maximum size, all the records in it are sent to WriteRecords. Because _buffer is a ConcurrentQueue it can keep reading records even as they are added.
That WriteRecords method could be anything specific to how you write your records. Instead of this being an abstract class, the actual output to a database or file could be yet another dependency that gets injected into this one. You can make decisions like that, refactor, and change your mind because the very first class isn't affected by those changes. All it knows about is the IRecordWriter interface, which doesn't change.
You might notice that I haven't made absolutely certain that Flush won't execute concurrently on different threads. I could put more locking around this, but it really doesn't matter. This will avoid most concurrent executions, but it's okay if concurrent executions both read from the ConcurrentQueue.
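If you do want to rule out overlapping flushes, one option (not part of the class above) is to replace the bool flag with an int that is claimed via Interlocked.CompareExchange. The sketch below assumes that approach; `SingleFlushRecordWriter` and `InMemoryRecordWriter` are hypothetical names used only for illustration.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

public abstract class SingleFlushRecordWriter<T>
{
    private readonly ConcurrentQueue<T> _buffer = new ConcurrentQueue<T>();
    private readonly int _maxCapacity;
    private int _flushing; // 0 = idle, 1 = a flush is in progress

    protected SingleFlushRecordWriter(int maxCapacity = 100)
    {
        _maxCapacity = maxCapacity;
    }

    public void WriteRecord(T record)
    {
        _buffer.Enqueue(record);
        if (_buffer.Count >= _maxCapacity)
            Flush();
    }

    public void Flush()
    {
        // Only the thread that flips the flag from 0 to 1 performs the flush;
        // any other thread returns immediately and leaves its records queued.
        if (Interlocked.CompareExchange(ref _flushing, 1, 0) != 0)
            return;
        try
        {
            var batch = new List<T>();
            while (_buffer.TryDequeue(out var item))
                batch.Add(item);
            if (batch.Count > 0)
                WriteRecords(batch);
        }
        finally
        {
            Volatile.Write(ref _flushing, 0);
        }
    }

    protected abstract void WriteRecords(IEnumerable<T> records);
}

// Hypothetical writer that just collects records in memory, for demonstration.
public sealed class InMemoryRecordWriter<T> : SingleFlushRecordWriter<T>
{
    private readonly List<T> _written = new List<T>();

    public InMemoryRecordWriter(int maxCapacity) : base(maxCapacity) { }

    public int WrittenCount { get { lock (_written) return _written.Count; } }

    protected override void WriteRecords(IEnumerable<T> records)
    {
        lock (_written) _written.AddRange(records);
    }
}
```

Because a Flush call that loses the race simply returns, the caller should still call Flush() once after all producers have finished, so that whatever remains in the queue gets written.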
This is just a rough outline, but it shows how all of the steps become simpler and easier to test if we separate them. One class converts inputs to outputs. Another class buffers the outputs and writes them. That second class can even be split into two - one as a buffer, and another as the "final" writer that sends them to a database or file or some other destination.

How to listen to change feed continuously RethinkDB

I am having the following problem: in RethinkDB, the RunChangesAsync method runs once and, when called, starts listening to changes on a given query. When the query result changes, you are given a Cursor<Change<Class>>, which is a delta between the initial state and the actual state.
My question is how can I make this run continuously?
If I use:
while(true)
{
code.... //changes happening while program is here
....../
...RunChangesAsync();
/......processed buffered items
code //new changes here
}
If changes happen at the points I marked in the code, they are not caught by RunChanges. The only changes caught are those that occur while RunChanges is listening, not before it starts or after it retrieves the results.
So I tried wrapping RunChanges in an observable, but it does not listen continuously for changes as I expected. It just retrieves 2 null items (garbage, I suppose) and ends.
Observable
public IObservable<Cursor<Change<UserStatus?>>> GetObservable() =>
r.Db(Constants.DB_NAME).Table(Constants.CLIENT_TABLE).RunChangesAsync<UserStatus?>(this.con,CancellationToken.None).ToObservable();
Observer
class PlayerSubscriber : IObserver<Cursor<Change<UserStatus?>>>
{
public void OnCompleted() => Console.WriteLine("Finished");
public void OnError(Exception error) => Console.WriteLine("error");
public void OnNext(Cursor<Change<UserStatus?>> value)
{
foreach (var item in value.BufferedItems)
Console.WriteLine(item);
}
}
Program
class Program
{
public static RethinkDB r = RethinkDB.R;
public static bool End = false;
static async Task Main(string[] args)
{
var address = new Address { Host = "127.0.0.1", Port = 28015 };
var con = await r.Connection().Hostname(address.Host).Port(address.Port).ConnectAsync();
var database = new Database(r, con);
var obs = database.GetObservable();
var sub = new PlayerSubscriber();
var disp = obs.Subscribe(sub);
Console.ReadKey();
Console.WriteLine("Hello World!");
}
}
When debugging, I can see that the OnNext method of the Observer is executed only once (it returns two null objects) and then it closes.
P.S: Database is just a wrapper around rethinkdb queries. The only method used is GetObservable, which I posted. The UserStatus is a POCO.
When creating a change feed, you'll want to create one change feed object. For example, when you get back a Cursor<Change<T>> after running .RunChangesAsync(); that is really all you need.
The cursor object you get back from query.RunChangesAsync() is your change feed object that you will use for the entire lifetime you want to receive changes.
In your example:
while(true)
{
code.... //changes happening while program is here
....../
...RunChangesAsync();
/......processed buffered items
code //new changes here
}
Having .RunChangesAsync(); in a while loop is not the correct approach. You don't need to re-run the query again and get another Cursor<Change<T>>. I'll explain how this works at the end of this post.
Also, do not use cursor.BufferedItems on the cursor object. The cursor.BufferedItems property is not meant to be consumed by your code directly; it is only exposed for those special situations where you want to "peek ahead" inside the cursor object (client-side) for items that are ready to be consumed and that are specific to your change feed query.
The proper way to consume items in your change feed is to enumerate over the cursor object itself as shown below:
var cursor = await query.RunChangesAsync(conn);
foreach (var item in cursor){
Console.WriteLine(item);
}
When the cursor runs out of items, it will make a request to the RethinkDB server for more items. Keep in mind that each iteration of the foreach loop can potentially be a blocking call. For example, the foreach loop can block indefinitely when 1) there are no items on the client side to be consumed (.BufferedItems.Count == 0) and 2) there are no items that have been changed on the server side according to your change feed query criteria. Under these circumstances, the foreach loop will block until the RethinkDB server sends you an item that is ready to be consumed.
Documentation about using Reactive Extensions and RethinkDB in C#
There is a driver unit test that shows how .NET Reactive Extensions can work here.
Specifically, Lines 31 - 47 in this unit test set up a change feed with Reactive Extensions:
var changes = R.Db(DbName).Table(TableName)
//.changes()[new {include_states = true, include_initial = true}]
.Changes()
.RunChanges<JObject>(conn);
changes.IsFeed.Should().BeTrue();
var observable = changes.ToObservable();
//use a new thread if you want to continue,
//otherwise, subscription will block.
observable.SubscribeOn(NewThreadScheduler.Default)
.Subscribe(
x => OnNext(x),
e => OnError(e),
() => OnCompleted()
);
Additionally, here is a good example and explanation of what happens and how to consume a change feed with C#:
Hope that helps.
Thanks,
Brian
If you have an operation that has the signature Task<int> ReadAsync(), then the way to set up polling, is like this:
IObservable<int> PollRead(TimeSpan interval)
{
return
Observable
.Interval(interval)
.SelectMany(n => Observable.FromAsync(() => ReadAsync()));
}
I'd also caution you against creating your own implementation of IObservable<T>: it's fraught with danger. You should use Observer.Create(...) if you are creating your own observer that you want to hand around. Generally you don't even do that.

Replay()-like functionality but with the ability to displace stale values?

Wondering if anyone can think of an elegant solution to this use case:
I am consuming an observable (IObservable of type TEntity) which is providing me a stream of entities. If any of those entities are updated, then the provider of the observable will push down the updated entity.
I am using a Replay() on this stream so that I only need to subscribe to the underlying stream once and so that late subscribers can see all the values. The problem is that there is potential for a memory-leak here, because the Replay() will hold onto all the updates it sees, whereas all I need is the latest update for each entity.
I could replace the Replay() with a Scan() which allows me to maintain the latest updates only, but then I would have to push out a Dictionary of all the updates observed so far, rather than just the specific entity that has changed.
The only solution I can think of is to use a Scan() as above, but in the Scan() implementation I will push all updates into a Subject. Subscribers to the IObservable I expose will receive a merge of the snapshot stored in the Scan() dictionary plus any updates, as follows:
private Subject<Entity> _updateSubject = new Subject<Entity>();
private IObservable<Dictionary<string, Entity>> _subscriptionStream;
//called once on initialisation
private void CreateSubscription()
{
IObservable<Entity> source = GetSomeLongRunningSubscriptionStream();
_subscriptionStream = source
.Scan(new Dictionary<string, Entity>(), (accumulator,update) =>
{
accumulator[update.ID] = update;
_updateSubject.OnNext(update);
return accumulator;
})
.Replay(1);
}
//called each time a consumer wants access to the stream
public IObservable<Entity> GetStream()
{
return _subscriptionStream.Take(1).SelectMany(x => x).Select(x => x.Value)
.Merge(_updateSubject.AsObservable());
}
Can anyone think of a more elegant solution which holds the state within a single stream rather than resorting to Subjects?
Thanks
************** Edit **************
As per my comment, I've gone with something similar to this. Let me know your thoughts
//called once on initialisation
private void CreateSubscription()
{
_baseSubscriptionObservable = GetSomeLongRunningSubscriptionStream ().Publish();
_snapshotObservable = _baseSubscriptionObservable
.Scan(new Dictionary<string,Entity>(), (accumulator, update) =>
{
accumulator[update.ID] = update;
return accumulator;
})
.StartWith(new Dictionary<string, Entity>())
.Replay(1);
_baseSubscriptionObservable.Connect ();
_snapshotObservable.Connect ();
}
public IObservable<Entity> GetStream()
{
return _snapshotObservable.Take (1).Select (x => x.Values).SelectMany (x => x)
.Merge (_baseSubscriptionObservable);
}
I generally like what you're doing, but there are a number of issues that I can see.
To start with, you've split CreateSubscription and GetStream into two methods, with the idea that you'll have one underlying subscription to the GetSomeLongRunningSubscriptionStream() stream. Unfortunately, in this case, you'll have zero subscriptions regardless of how many subscriptions you get to your final observable, as .Replay(1) returns an IConnectableObservable<> which you need to call .Connect() on to begin the flow of values.
The next thing is that you're updating your accumulator with the latest value and then in GetStream you're adding in the latest value along with merging in a flattened stream of your accumulator. You're returning the latest value twice each time.
Here's how I would suggest that you do it:
private IObservable<IList<Timestamped<Entity>>> GetStream()
{
return
Observable
.Create<IList<Timestamped<Entity>>>(o =>
GetSomeLongRunningSubscriptionStream()
.Timestamp()
.Scan(
new Dictionary<string, Timestamped<Entity>>(),
(accumulator, update) =>
{
accumulator[update.Value.ID] = update;
return accumulator;
})
.Select(x => x.Select(y => y.Value).ToList())
.Replay(1)
.RefCount()
.Subscribe(o));
}
It's almost always best to avoid any state when using Rx (that isn't localized within the observable). So I've merged CreateSubscription and GetStream into a single GetStream method and I've encapsulated the whole observable in an Observable.Create.
In order to avoid pushing out values twice and to facilitate your ability to know what the latest update is I've added a call to .Timestamp() to put the latest time an Entity was returned.
I've kept the .Scan(...) with the dictionary, but it is now a Dictionary<string, Timestamped<Entity>>.
For each value added/updated I then flatten the dictionary and return the underlying values as a list. At this point you could order the list to make sure that the latest values are either first or last to suit your needs.
I've then used the .Replay(1).RefCount() combination to turn the IConnectableObservable<> returned by .Replay(1) back into an IObservable<>, with the understanding that you'll dispose of the underlying subscription when all subscribers dispose. This is probably the most crucial part of your query. It should be done this way. This is the Rx way of ensuring that you avoid memory leaks.
If you desperately need to keep the underlying connection open then you would need to encapsulate all of your code within a class that implements IDisposable to clean up the .Connect() that you would require.
Something like this:
public class EntityStream : IDisposable
{
private IDisposable _connection = null;
public EntityStream(IObservable<Entity> someLongRunningSubscriptionStream)
{
_stream =
someLongRunningSubscriptionStream
.Timestamp()
.Scan(
new Dictionary<string, Timestamped<Entity>>(),
(accumulator, update) =>
{
accumulator[update.Value.ID] = update;
return accumulator;
})
.Select(x => x.Select(y => y.Value).ToList())
.Replay(1);
_connection = _stream.Connect();
}
private IConnectableObservable<IList<Timestamped<Entity>>> _stream = null;
public IObservable<IList<Timestamped<Entity>>> GetStream()
{
return _stream.AsObservable();
}
public void Dispose()
{
if (_connection != null)
{
_connection.Dispose();
_connection = null;
}
}
}
I so very rarely do this though. I would thoroughly recommend the first method. You should only mix OOP and Rx when you have to.
Please let me know if you need any clarification.

Creating generated sequence of events as a cold sequence

FWIW - I'm scrapping the previous version of this question in favor of different one along the same way after asking for advice on meta
I have a webservice that contains configuration data. I would like to call it at regular intervals Tok in order to refresh the configuration data in the application that uses it. If the service is in error (timeout, down, etc) I want to keep the data from the previous call and call the service again after a different time interval Tnotok. Finally I want the behavior to be testable.
Since managing time sequences and testability seems like a strong point of the Reactive Extensions, I started using an Observable that will be fed by a generated sequence. Here is how I create the sequence:
Observable.Generate<DataProviderResult, DataProviderResult>(
// we start with some empty data
new DataProviderResult() {
Failures = 0
, Informations = new List<Information>()},
// never stop
(r) => true,
// there is no iteration
(r) => r,
// we get the next value from a call to the webservice
(r) => FetchNextResults(r),
// we select time for next msg depending on the current failures
(r) => r.Failures > 0 ? tnotok : tok,
// we pass a TestScheduler
scheduler)
.Subscribe(r => HandleResults(r));
I have two problems currently:
It looks like I am creating a hot observable. Even when using Publish/Connect, the subscribed action misses the first event. How can I create it as a cold observable?
myObservable = myObservable.Publish();
myObservable.Subscribe(r => HandleResults(r));
myObservable.Connect(); // doesn't call onNext for first element in sequence
When I subscribe, the order of the subscription and the generation seems off, since for any frame the subscription method is fired before the FetchNextResults method. Is this normal? I would expect the sequence to call the method for frame f, not f+1.
Here is the code that I'm using for fetching and subscription:
private DataProviderResult FetchNextResults(DataProviderResult previousResult)
{
Console.WriteLine(string.Format("Fetching at {0:hh:mm:ss:fff}", scheduler.Now));
try
{
return new DataProviderResult() { Informations = dataProvider.GetInformation().ToList(), Failures = 0};
}
catch (Exception)
{}
previousResult.Failures++;
return previousResult;
}
private void HandleResults(DataProviderResult result)
{
Console.WriteLine(string.Format("Managing at {0:hh:mm:ss:fff}", scheduler.Now));
dataResult = result;
}
Here is what I'm seeing that prompted me to ask these questions:
Starting at 12:00:00:000
Fetching at 12:00:00:000 < no managing the result that has been fetched here
Managing at 12:00:01:000 < managing before fetching for frame f
Fetching at 12:00:01:000
Managing at 12:00:02:000
Fetching at 12:00:02:000
EDIT: Here is a bare bones copy-pastable program that illustrates the problem.
/*using System;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using Microsoft.Reactive.Testing;*/
private static int fetchData(int i, IScheduler scheduler)
{
writeTime("fetching " + (i+1).ToString(), scheduler);
return i+1;
}
private static void manageData(int i, IScheduler scheduler)
{
writeTime("managing " + i.ToString(), scheduler);
}
private static void writeTime(string msg, IScheduler scheduler)
{
Console.WriteLine(string.Format("{0:mm:ss:fff} {1}", scheduler.Now, msg));
}
private static void Main(string[] args)
{
var scheduler = new TestScheduler();
writeTime("start", scheduler);
var datas = Observable.Generate<int, int>(fetchData(0, scheduler),
(d) => true,
(d) => fetchData(d, scheduler),
(d) => d,
(d) => TimeSpan.FromMilliseconds(1000),
scheduler)
.Subscribe(i => manageData(i, scheduler));
scheduler.AdvanceBy(TimeSpan.FromMilliseconds(3000).Ticks);
}
This outputs the following:
00:00:000 start
00:00:000 fetching 1
00:01:000 managing 1
00:01:000 fetching 2
00:02:000 managing 2
00:02:000 fetching 3
I don't understand why the managing of the first element is not picked up immediately after its fetching. There is one second between the sequence effectively pulling the data and the data being handed to the observer. Am I missing something here or is it expected behavior? If so is there a way to have the observer react immediately to the new value?
You are misunderstanding the purpose of the timeSelector parameter. It is called each time a value is generated and it returns a time which indicates how long to delay before delivering that value to observers and then generating the next value.
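To make the timeSelector behaviour concrete, here is a small sketch (my own example, not the asker's code) in which the selector returns TimeSpan.Zero for the first value so that it is delivered without the initial wait. `GenerateTiming` and `FirstValueImmediately` are made-up names, and the System.Reactive package is assumed.

```csharp
using System;
using System.Reactive.Concurrency;
using System.Reactive.Linq;

public static class GenerateTiming
{
    // Emits 1, 2, 3. The time selector is evaluated for each generated value and
    // decides how long to wait before that value is handed to observers.
    public static IObservable<int> FirstValueImmediately(TimeSpan period, IScheduler scheduler)
    {
        return Observable.Generate(
            1,                                      // initial state
            i => i <= 3,                            // keep going while this holds
            i => i + 1,                             // iterate: produce the next state
            i => i,                                 // result selector: identity
            i => i == 1 ? TimeSpan.Zero : period,   // no delay for the first value only
            scheduler);
    }
}
```

With a one-second period, the first value should arrive roughly as soon as the scheduler runs, and the remaining values follow about one period apart. The same idea carries over to the question's code: return a zero delay from the time selector for the initial state.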
Here's a non-Generate way to tackle your problem.
private DataProviderResult FetchNextResult()
{
// let exceptions throw
return new DataProviderResult { Informations = dataProvider.GetInformation().ToList(), Failures = 0 };
}
private IObservable<DataProviderResult> CreateObservable(IScheduler scheduler)
{
// an observable that produces a single result then completes
var fetch = Observable.Defer(
() => Observable.Return(FetchNextResult()));
// concatenate this observable with one that will pause
// for "tok" time before completing.
// This observable will send the result
// then pause before completing.
var fetchThenPause = fetch.Concat(Observable
.Empty<DataProviderResult>()
.Delay(tok, scheduler));
// Now, if fetchThenPause fails, we want to consume/ignore the exception
// and then pause for tnotok time before completing with no results
var fetchPauseOnErrors = fetchThenPause.Catch(Observable
.Empty<DataProviderResult>()
.Delay(tnotok, scheduler));
// Now, whenever our observable completes (after its pause), start it again.
var fetchLoop = fetchPauseOnErrors.Repeat();
// Now use Publish(initialValue) so that we remember the most recent value
var fetchLoopWithMemory = fetchLoop.Publish(null);
// YMMV from here on. Lets use RefCount() to start the
// connection the first time someone subscribes
var fetchLoopAuto = fetchLoopWithMemory.RefCount();
// And lets filter out that first null that will arrive before
// we ever get the first result from the data provider
return fetchLoopAuto.Where(t => t != null);
}
public MyClass()
{
// CreateObservable requires a scheduler; Scheduler.Default is used here
Information = CreateObservable(Scheduler.Default);
}
public IObservable<DataProviderResult> Information { get; private set; }
Generate produces cold observable sequences, so that is my first alarm bell.
I tried to pull your code into linqpad* and run it, and changed it a bit to focus on the problem. It seems to me that you have the Iterator and ResultSelector functions confused. These are back-to-front. When you iterate, you should take the value from your last iteration and use it to produce your next value. The result selector is used to pick off (Select) the value from the instance you are iterating on.
So in your case, the type you are iterating on is the type you want to produce values of. Therefore keep your ResultSelector function as just the identity function x=>x, and your IteratorFunction should be the one that makes the WebService call.
Observable.Generate<DataProviderResult, DataProviderResult>(
// we start with some empty data
new DataProviderResult() {
Failures = 0
, Informations = new List<Information>()},
// never stop
(r) => true,
// we get the next value(iterate) by making a call to the webservice
(r) => FetchNextResults(r),
// there is no projection
(r) => r,
// we select time for next msg depending on the current failures
(r) => r.Failures > 0 ? tnotok : tok,
// we pass a TestScheduler
scheduler)
.Subscribe(r => HandleResults(r));
As a side note, try to prefer immutable types instead of mutating values as you iterate.
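One way the immutable variant might look is sketched below. It is only an illustration: `string` stands in for the `Information` type, the `fetch` delegate replaces the `dataProvider` field, and `WithOneMoreFailure` is a helper invented for the example.

```csharp
using System;
using System.Collections.Generic;

// Immutable result: every fetch produces a new instance instead of mutating the old one.
public sealed class DataProviderResult
{
    public DataProviderResult(IReadOnlyList<string> informations, int failures)
    {
        Informations = informations;
        Failures = failures;
    }

    public IReadOnlyList<string> Informations { get; }
    public int Failures { get; }

    // Keeps the previous data and returns a copy with the failure count bumped.
    public DataProviderResult WithOneMoreFailure()
    {
        return new DataProviderResult(Informations, Failures + 1);
    }
}

public static class Fetcher
{
    // 'fetch' is a hypothetical stand-in for dataProvider.GetInformation().
    public static DataProviderResult FetchNextResults(
        DataProviderResult previous, Func<IReadOnlyList<string>> fetch)
    {
        try
        {
            return new DataProviderResult(fetch(), 0);
        }
        catch (Exception)
        {
            // On failure, the previous instance is left untouched.
            return previous.WithOneMoreFailure();
        }
    }
}
```

Since the previous result is never modified, an observer that still holds a reference to it will not see its Failures count change underneath it.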
*Please provide an autonomous working snippet of code so people can better answer your question. :-)
