Taking a snapshot of an IObservable<T> - c#

Suppose I have a service:
public interface ICustomersService
{
    IObservable<ICustomer> Customers { get; }
}
The implementation of the Customers property starts by passing all existing customers on to the observer, after which it passes on only customers that are added to the system later. Thus, it never completes.
Now suppose I wanted to grab a snapshot (as a List<ICustomer>) of the current customers, ignoring any that may be added in future. How do I do that? Any invocation of ToList() or its kin will block forever because the sequence never completes.
I figured I could write my own extension, so I tried this:
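using System;
using System.Collections.Generic;
public static class RxExtensions
{
    public static List<T> ToSnapshot<T>(this IObservable<T> @this)
    {
        var list = new List<T>();
        // Rx's Subscribe(Action<T>) overload; the subscription is disposed at once,
        // so only items delivered during the Subscribe call end up in the list.
        using (@this.Subscribe(x => list.Add(x))) { }
        return list;
    }
}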
This appears to work. For example:
var customers = new ReplaySubject<string>();
// snapshot has nothing in it
var snapshot1 = customers.ToSnapshot();
customers.OnNext("A");
customers.OnNext("B");
// snapshot has just the two customers in it
var snapshot2 = customers.ToSnapshot();
customers.OnNext("C");
// snapshot has three customers in it
var snapshot3 = customers.ToSnapshot();
I realize the current implementation depends on the scheduler being the current thread; otherwise ToSnapshot will likely dispose of its subscription before the items are received. However, I suspect I could also add a ToSnapshot overload that takes an IScheduler and ensures that any items scheduled there are received before the snapshot ends.
I can't find this sort of snapshot functionality built into Rx. Am I missing something?

You could try cutting the observable off at a point in time:
source.Customers.TakeUntil(DateTime.Now).ToEnumerable();

There are several ways to approach this. I have tried the following with success in commercial projects:
1) A separate method to get an enumerable of current customers as Chris demonstrated.
2) A method to combine a "state of the world" call with a live stream. This was somewhat more involved than Chris's example because, to ensure no data is missed, one typically has to start listening to the live stream first, then get the snapshot, then combine the two with de-duping.
I achieved this with a custom Observable.Create implementation that cached the live stream until the history was retrieved and then merged the cache with the history before switching to live.
This returned Customers but wrapped with additional metadata that described the age of the data.
3) Most recently, it's been more useful to me to return IObservable<IEnumerable<Customer>> where the first event is the entire state of the world. The reason this has been more useful is that many systems I work on get updates in batches, and it's often faster to update a UI with an entire batch than item by item. It is otherwise similar to (2) except you can just use a FirstAsync() to get the snapshot you need.
I propose you consider this approach. You can always use a SelectMany(x => x) to flatten a stream of IObservable<IEnumerable<Customer>> to an IObservable<Customer> if you need to.
I'll see if I can dig out an example implementation when I get back to the home office!
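Approach 2 above can be sketched without Rx, using only the BCL's IObservable<T>/IObserver<T> interfaces. This is not the answerer's Observable.Create implementation: HistoryThenLive, ActionObserver and SimpleSubject are hypothetical names, and the sketch assumes each customer has a unique key and is published at most once, so de-duping by key is safe. It subscribes to the live stream first, buffers while the history is fetched, emits the history, then the buffered items that were not already in it, and then passes live items straight through.

```csharp
using System;
using System.Collections.Generic;

// Minimal IObserver<T> around a delegate (Rx's Subscribe(Action<T>) does this for you).
public sealed class ActionObserver<T> : IObserver<T>
{
    private readonly Action<T> _onNext;
    public ActionObserver(Action<T> onNext) { _onNext = onNext; }
    public void OnNext(T value) => _onNext(value);
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}

// Tiny hypothetical subject standing in for the live feed (Rx's Subject<T> plays this role).
public sealed class SimpleSubject<T> : IObservable<T>
{
    private readonly List<IObserver<T>> _observers = new List<IObserver<T>>();

    public void OnNext(T value)
    {
        foreach (var observer in _observers.ToArray())
            observer.OnNext(value);
    }

    public IDisposable Subscribe(IObserver<T> observer)
    {
        _observers.Add(observer);
        return new Unsubscriber(() => _observers.Remove(observer));
    }

    private sealed class Unsubscriber : IDisposable
    {
        private readonly Action _dispose;
        public Unsubscriber(Action dispose) { _dispose = dispose; }
        public void Dispose() => _dispose();
    }
}

public static class HistoryThenLive
{
    // Subscribe to `live` FIRST and buffer, then fetch history, emit it,
    // emit buffered items history did not contain, then pass live items through.
    // Assumes every item has a unique key and is published at most once.
    public static IDisposable Subscribe<T, TKey>(
        IObservable<T> live,
        Func<IReadOnlyList<T>> getHistory,
        Func<T, TKey> keyOf,
        Action<T> onNext)
    {
        var gate = new object();
        var buffer = new List<T>();
        var seen = new HashSet<TKey>();
        var caughtUp = false;

        // 1. Start listening before taking the snapshot so nothing is missed.
        var subscription = live.Subscribe(new ActionObserver<T>(item =>
        {
            lock (gate)
            {
                if (!caughtUp) { buffer.Add(item); return; }
                if (seen.Add(keyOf(item))) onNext(item); // skip late duplicates
            }
        }));

        // 2. Fetch the "state of the world" outside the lock (it may be slow).
        var history = getHistory();

        // 3. Replay history, then the de-duplicated buffer, then switch to live.
        lock (gate)
        {
            foreach (var item in history)
                if (seen.Add(keyOf(item))) onNext(item);
            foreach (var item in buffer)
                if (seen.Add(keyOf(item))) onNext(item);
            buffer.Clear();
            caughtUp = true;
        }
        return subscription;
    }
}
```

The lock is held while onNext runs so that buffered and live items cannot interleave out of order. Calling out to consumer code while holding a lock can deadlock if that code blocks on another thread, so treat this as an illustration rather than production code.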

What you've done here is actually pretty nifty. ToSnapshot works because the underlying implementation of your subscribe logic yields all of the customers to the observer before releasing control flow. Dispose is called only after control flow is released, and control flow is only released after all pre-existing customers have been yielded.
While this is cool, it's also misleading. The method you've written, ToSnapshot, should really be named something like TakeSynchronousNotifications. The extension makes heavy assumptions about how the underlying observable works, and isn't really in the spirit of Rx.
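To see why ToSnapshot depends on synchronous delivery, here is a minimal BCL-only sketch (no Rx). CustomerFeed, ActionObserver and SnapshotExtensions are hypothetical stand-ins: the feed replays existing customers inside Subscribe, so they arrive before Subscribe returns and before the immediate Dispose, while customers published afterwards never reach that snapshot.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical feed: replays existing customers synchronously inside
// Subscribe, then forwards customers published later. It never completes.
public sealed class CustomerFeed : IObservable<string>
{
    private readonly List<string> _existing = new List<string>();
    private readonly List<IObserver<string>> _observers = new List<IObserver<string>>();

    public void Publish(string customer)
    {
        _existing.Add(customer);
        foreach (var observer in _observers.ToArray())
            observer.OnNext(customer);
    }

    public IDisposable Subscribe(IObserver<string> observer)
    {
        // Every existing customer is delivered before Subscribe returns.
        foreach (var customer in _existing.ToArray())
            observer.OnNext(customer);
        _observers.Add(observer);
        return new Unsubscriber(() => _observers.Remove(observer));
    }

    private sealed class Unsubscriber : IDisposable
    {
        private readonly Action _dispose;
        public Unsubscriber(Action dispose) { _dispose = dispose; }
        public void Dispose() => _dispose();
    }
}

// Minimal IObserver<T> around a delegate (Rx provides richer helpers).
public sealed class ActionObserver<T> : IObserver<T>
{
    private readonly Action<T> _onNext;
    public ActionObserver(Action<T> onNext) { _onNext = onNext; }
    public void OnNext(T value) => _onNext(value);
    public void OnError(Exception error) { }
    public void OnCompleted() { }
}

public static class SnapshotExtensions
{
    // Same idea as the question's ToSnapshot: subscribe, then dispose at once.
    public static List<T> ToSnapshot<T>(this IObservable<T> source)
    {
        var list = new List<T>();
        using (source.Subscribe(new ActionObserver<T>(list.Add))) { }
        return list;
    }
}
```

If the feed instead delivered its replay on another thread or scheduler, the same ToSnapshot would return before those items arrived, which is exactly the hidden assumption the name hides.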
To make things easier to understand for the consumer, I would expose additional properties which explicitly state what is being returned.
public interface ICustomersService
{
    IEnumerable<ICustomer> ExistingCustomers { get; }
    IObservable<ICustomer> NewCustomers { get; }
    IObservable<ICustomer> Customers { get; }
}
public class CustomerService : ICustomersService
{
    public IEnumerable<ICustomer> ExistingCustomers { get { ... } }
    public IObservable<ICustomer> NewCustomers { get { ... } }
    // Existing customers first, then new ones (ToObservable and Concat are Rx operators)
    public IObservable<ICustomer> Customers
    {
        get
        {
            return this.ExistingCustomers.ToObservable().Concat(this.NewCustomers);
        }
    }
}
Edit:
Consider the following problem:
50 = x + y. Solve for x.
The math just doesn't work unless you know what y is. In this example, y is the "new customers", x is the "existing customers", and 50 is the combination of the two.
By exposing only a combination of the existing and new customers, and not the existing and new customers themselves, you've lost too much data. You need to expose at least x or y to the consumer, otherwise there's no way to solve for the other.

Related

Rx.NET 'Distinct' to get the latest value?

I'm new to Rx and I'm trying to make a GUI to display stock market data. The concept is a bit like ReactiveTrader, but I'll need to display the whole "depth", i.e., all prices and their buy/sell quantities in the market instead of only the "top level" of the market buy/sells, sorting by price.
The data structure for each "price level" is like this:
public class MarketDepthLevel
{
    public int MarketBidQuantity { get; set; }
    public decimal Price { get; set; }
    public int MarketAskQuantity { get; set; }
}
Underneath the GUI, a socket listens for network updates and returns them as an observable:
IObservable<MarketDepthLevel> MarketDepthLevelStream;
This stream is transformed into a ReactiveList, which is eventually bound to a DataGrid.
The transformation should basically keep the latest update for each price level and sort the levels by price. So I came up with something like this:
public IReactiveDerivedList<MarketDepthLevel> MarketDepthStream
{
    get
    {
        return MarketDepthLevelStream
            .Distinct(x => x.Price)
            .CreateCollection()
            .CreateDerivedCollection(x => x, orderer: (x, y) => y.Price.CompareTo(x.Price));
    }
}
But there are problems:
1) When Distinct sees a price that has appeared before, it discards the new item, but I need the new item to replace the old one (as it contains the latest MarketBidQuantity/MarketAskQuantity).
2) It seems a bit clumsy to call CreateCollection and then CreateDerivedCollection.
Any thoughts on solving these (especially the first problem)?
Thanks
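Just group the items and then project each group to be the last item in the group:
return MarketDepthLevelStream
    .GroupBy(x => x.Price, (key, items) => items.Last());
Note that this GroupBy overload, with a (key, items) result selector, comes from LINQ to Objects and works on IEnumerable<T>. Rx's GroupBy on IObservable<T> has no such overload, and taking the last item of a group only produces a value once that group completes, so this does not carry over directly to a live stream that never completes.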
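Run with LINQ to Objects over a finite batch of updates (for example, a recorded list), the same GroupBy call keeps the last update per price. A minimal sketch; LadderSnapshot and LatestPerPrice are hypothetical names, and the descending sort is added to mirror the orderer in the question:

```csharp
using System.Collections.Generic;
using System.Linq;

public class MarketDepthLevel
{
    public int MarketBidQuantity { get; set; }
    public decimal Price { get; set; }
    public int MarketAskQuantity { get; set; }
}

public static class LadderSnapshot
{
    // Keep the last update seen for each price, highest price first.
    // items.Last() needs the whole group, so the input must be finite.
    public static List<MarketDepthLevel> LatestPerPrice(IEnumerable<MarketDepthLevel> updates) =>
        updates
            .GroupBy(x => x.Price, (key, items) => items.Last())
            .OrderByDescending(x => x.Price)
            .ToList();
}
```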
If I understand you correctly, you want to project a stream of MarketDepthLevel updates into a list of the latest bid/ask quantities for each level (in finance parlance, this is a type of ladder). The ladder is held as a ReactiveList bound to the UI. (ObserveOn may be required, although I believe ReactiveList handles this in most cases.)
An example of such a ladder can be found at http://ratesetter.com, where the "Price" is expressed as a percentage (Rate), and the bid/ask sizes are the amounts lenders want and borrowers need at each price level.
At this point, I begin to get slightly lost. I'm confused as to why you need any further Rx operators, since you could simply Subscribe to the update stream as is and have the handler update a fixed list data-bound to a UI. Doesn't each new event simply need to be added to the ReactiveList if it's a new price, or replace an existing entry with a matching price? Imperative code to do this is fine if it's the last step in the chain to the UI.
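The imperative approach just described amounts to a small upsert in the subscription handler. A minimal sketch, assuming the ladder is any IList<MarketDepthLevel> (an ObservableCollection<T>, for example) kept sorted by price, highest first, to match the orderer in the question; LadderUpdater and Upsert are hypothetical names, and the linear scan is chosen for clarity rather than speed.

```csharp
using System.Collections.Generic;

public class MarketDepthLevel
{
    public int MarketBidQuantity { get; set; }
    public decimal Price { get; set; }
    public int MarketAskQuantity { get; set; }
}

public static class LadderUpdater
{
    // Replace the entry with the same price, or insert a new one so the
    // list stays sorted by price, highest first.
    public static void Upsert(IList<MarketDepthLevel> ladder, MarketDepthLevel update)
    {
        for (int i = 0; i < ladder.Count; i++)
        {
            if (ladder[i].Price == update.Price)
            {
                ladder[i] = update;        // existing price level: replace in place
                return;
            }
            if (ladder[i].Price < update.Price)
            {
                ladder.Insert(i, update);  // new level: goes before the first lower price
                return;
            }
        }
        ladder.Add(update);                // lowest price so far: append at the end
    }
}
```

With Rx referenced, this could be wired up as MarketDepthLevelStream.Subscribe(update => LadderUpdater.Upsert(ladder, update)), with an ObserveOn to the UI thread first if the collection requires it.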
There is only value in doing this in the Rx stream itself if you need to convert the MarketDepthLevelStream into a stream of ladders. That could be useful, but you don't mention this need.
Such a need could be driven by either the desire to multicast the stream to many subscribers, and/or because you have further transformations or projections you need to make on the ladders.
Bear in mind, if the ladder is large, then working with whole ladder updates in the UI might give you performance issues - in many cases, individual updates into a mutable structure like a ReactiveList are a practical way to go.
If working with a stream of ladders is a requirement, then look to Observable.Scan. This is the work-horse operator in Rx that maintains local state. It is used for any form of running aggregate - such as a running total, average etc., or in this case, a ladder.
Now if all you need is the static list described above, I am potentially off on a massive detour here, but it's a useful discussion so...
You'll want to think carefully about the type used for the ladder aggregate; you need to be conscious of how downstream events will be consumed. It will likely need to be an immutable collection type so that things don't get weird for subscribers (each event should in effect be a static snapshot of the ladder). With immutable collections, it may be important to think about memory efficiency.
Here's a simple example of how a ladder stream might work, using an immutable collection from the NuGet pre-release package System.Collections.Immutable:
public static class ObservableExtensions
{
    public static IObservable<ImmutableSortedDictionary<decimal, MarketDepthLevel>>
        ToLadder(this IObservable<MarketDepthLevel> source)
    {
        return source.Scan(ImmutableSortedDictionary<decimal, MarketDepthLevel>.Empty,
            (lastLadder, depthLevel) =>
                lastLadder.SetItem(depthLevel.Price, depthLevel));
    }
}
The ToLadder extension method creates an empty immutable ladder as the seed aggregate, and each successive MarketDepthLevel event produces a new, updated ladder. You may want to see if ImmutableSortedSet is sufficient.
You would probably want to wrap/project this into your own type, but hopefully you get the idea.
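Scan's behaviour is easy to try out without Rx. The sketch below defines a hypothetical IEnumerable<T> counterpart of Scan (LINQ's Aggregate returns only the final value, not every intermediate one) and reuses the same accumulator as ToLadder. It assumes a modern .NET runtime, where System.Collections.Immutable is part of the shared framework; on older frameworks, reference the NuGet package. Because each ladder is immutable, an earlier snapshot keeps its values after later updates are applied.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;

public class MarketDepthLevel
{
    public int MarketBidQuantity { get; set; }
    public decimal Price { get; set; }
    public int MarketAskQuantity { get; set; }
}

public static class EnumerableScan
{
    // Hypothetical IEnumerable version of Rx's Scan: yields every
    // intermediate accumulator value, one per source item.
    public static IEnumerable<TAcc> Scan<TSource, TAcc>(
        this IEnumerable<TSource> source, TAcc seed, Func<TAcc, TSource, TAcc> accumulator)
    {
        var acc = seed;
        foreach (var item in source)
        {
            acc = accumulator(acc, item);
            yield return acc;
        }
    }

    // Same seed and accumulator as ToLadder: each update yields a new ladder.
    public static IEnumerable<ImmutableSortedDictionary<decimal, MarketDepthLevel>> ToLadders(
        this IEnumerable<MarketDepthLevel> updates) =>
        updates.Scan(ImmutableSortedDictionary<decimal, MarketDepthLevel>.Empty,
                     (ladder, level) => ladder.SetItem(level.Price, level));
}
```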
Ultimately, this still leaves you with the challenge of updating a UI. As mentioned before, you are now stuck with the whole ladder, meaning you have to bind a whole ladder every time or convert it back into a stream of individual updates, and it's getting way too off-topic to tackle that here!

Property initialisation anti-pattern

Now and again I end up with code along these lines, where I create some objects then loop through them to initialise some properties using another class...
ThingRepository thingRepos = new ThingRepository();
GizmoProcessor gizmoProcessor = new GizmoProcessor();
WidgetProcessor widgetProcessor = new WidgetProcessor();
public List<Thing> GetThings(DateTime date)
{
    List<Thing> allThings = thingRepos.FetchThings();
    // Loops through setting thing.Gizmo to a new Gizmo
    gizmoProcessor.AddGizmosToThings(allThings);
    // Loops through setting thing.Widget to a new Widget
    widgetProcessor.AddWidgetsToThings(allThings);
    return allThings;
}
...which just, well, feels wrong.
Is this a bad idea?
Is there a name of an anti-pattern that I'm using here?
What are the alternatives?
Edit: assume that both GizmoProcessor and WidgetProcessor have to go off and do some calculation, and get some extra data from other tables. They're not just data stored in a repository. They're creating new Gizmos and Widgets based on each Thing and assigning them to Thing's properties.
The reason this feels odd to me is that Thing isn't an autonomous object; it can't create itself and child objects. It's requiring higher-up code to create a fully finished object. I'm not sure if that's a bad thing or not!
ThingRepository is supposed to be the single access point for getting collections of Things, or at least that's where developers will intuitively look. For that reason, it feels strange that GetThings(DateTime date) should be provided by another object. I'd rather place that method in ThingRepository itself.
The fact that the Things returned by GetThings(DateTime date) are different, "fatter" animals than those returned by ThingRepository.FetchThings() also feels awkward and counter-intuitive. If Gizmo and Widget are really part of the Thing entity, you should be able to access them every time you have an instance of Thing, not just for instances returned by GetThings(DateTime date).
If the Date parameter in GetThings() isn't important or could be gathered at another time, I would use calculated properties on Thing to implement on-demand access to Gizmo and Widget:
public class Thing
{
    //...
    public Gizmo Gizmo
    {
        get
        {
            // calculations here
        }
    }
    public Widget Widget
    {
        get
        {
            // calculations here
        }
    }
}
Note that this approach is valid as long as the calculations performed are not too costly. Calculated properties with expensive processing are not recommended - see http://msdn.microsoft.com/en-us/library/bzwdh01d%28VS.71%29.aspx#cpconpropertyusageguidelinesanchor1
However, these calculations don't have to be implemented inline in the getters - they can be delegated to third-party Gizmo/Widget processors, potentially with a caching strategy, etc.
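A calculated property that delegates to a processor and caches the result can be sketched with Lazy<T>. GizmoProcessor.BuildGizmo, its call counter and the Gizmo type are hypothetical placeholders; the point is only that the expensive calculation runs at most once, on first access, rather than in every getter call.

```csharp
using System;
using System.Threading;

public class Gizmo
{
    public string Description { get; }
    public Gizmo(string description) { Description = description; }
}

// Hypothetical processor standing in for the "calculation and extra data" work.
public class GizmoProcessor
{
    public int Calls { get; private set; }

    public Gizmo BuildGizmo(Thing thing)
    {
        Calls++;
        return new Gizmo("Gizmo for thing " + thing.Id);
    }
}

public class Thing
{
    private readonly Lazy<Gizmo> _gizmo;

    public int Id { get; }

    public Thing(int id, GizmoProcessor processor)
    {
        Id = id;
        // Deferred and cached: BuildGizmo runs once, on first access to Gizmo.
        _gizmo = new Lazy<Gizmo>(() => processor.BuildGizmo(this),
                                 LazyThreadSafetyMode.ExecutionAndPublication);
    }

    public Gizmo Gizmo => _gizmo.Value;
}
```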
If you have complex initialization, then you could use the Strategy pattern. Here is a quick overview adapted from a strategy pattern overview.
Create a strategy interface to abstract the initialization:
public interface IThingInitializationStrategy
{
    void Initialize(Thing thing);
}
The initialization implementations that can be used as strategies:
public class GizmosInitialization : IThingInitializationStrategy
{
    public void Initialize(Thing thing)
    {
        // Add gizmos here and other initialization
    }
}
public class WidgetsInitialization : IThingInitializationStrategy
{
    public void Initialize(Thing thing)
    {
        // Add widgets here and other initialization
    }
}
And finally, a service class that accepts the strategy implementation in an abstract way:
internal class ThingInitializationService
{
    private readonly IThingInitializationStrategy _initStrategy;
    public ThingInitializationService(IThingInitializationStrategy initStrategy)
    {
        _initStrategy = initStrategy;
    }
    public void Initialize(Thing thing)
    {
        _initStrategy.Initialize(thing);
    }
}
You can then use the initialization strategies like so:
var initializationStrategy = new GizmosInitialization();
var initializationService = new ThingInitializationService(initializationStrategy);
List<Thing> allThings = thingRepos.FetchThings();
allThings.ForEach(thing => initializationService.Initialize(thing));
The only real potential problem is that you're iterating over the same list multiple times, but if you need to hit a database to get all the gizmos and widgets, it might be more efficient to request them in batches, so passing the full list to your Add... methods would make sense.
The other option would be to look into returning the gizmos and widgets with the thing in the first repository call (assuming they reside in the same repo). It might make the query more complex, but it would probably be more efficient. Unless of course you don't ALWAYS need to get gizmos and widgets when you fetch things.
To answer your questions:
Is this a bad idea?
From my experience, you rarely know if it's a good/bad idea until you need to change it.
IMO, code is either over-engineered, under-engineered, or unreadable.
In the meantime, you do your best and stick to best practices (KISS, single responsibility, etc.).
Personally, I don't think the processor classes should be modifying the state of any Thing.
I also don't think the processor classes should be given a collection of Things to modify.
Is there a name of an anti-pattern that I'm using here?
Sorry, unable to help.
What are the alternatives?
Personally, I would write the code as such:
public List<Thing> GetThings(DateTime date)
{
    List<Thing> allThings = thingRepos.FetchThings();
    // Build the gizmo and widget for each thing
    foreach (var thing in allThings)
    {
        thing.Gizmo = gizmoProcessor.BuildGizmo(thing);
        thing.Widget = widgetProcessor.BuildWidget(thing);
    }
    return allThings;
}
My reasons being:
The code is in a class that "Gets things". So logically, I think it's acceptable for it to traverse each Thing object and initialise them.
The intention is clear: I'm initialising the properties for each Thing before returning them.
I prefer initialising any properties of Thing in a central location.
I don't think that gizmoProcessor and widgetProcessor classes should have any business with a Collection of Things
I prefer the Processors to have a method to build and return a single widget/gizmo
However, if your processor classes build several properties at once, only then would I move the property initialisation into each processor:
public List<Thing> GetThings(DateTime date)
{
    List<Thing> allThings = thingRepos.FetchThings();
    // Build the gizmo and widget for each thing
    foreach (var thing in allThings)
    {
        // [Edited]
        // Notice a trend here: The common Initialize(Thing) interface
        // Could probably be refactored into some
        // super-mega-complex Composite Builder-esque class should you ever want to
        gizmoProcessor.Initialize(thing);
        widgetProcessor.Initialize(thing);
    }
    return allThings;
}
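The comment in the code above hints at a composite. A minimal sketch of that idea, assuming a hypothetical IThingInitializer interface that both processors implement; Gizmo and Widget are reduced to strings purely to keep the example self-contained.

```csharp
using System.Collections.Generic;

public class Thing
{
    public int Id { get; set; }
    public string Gizmo { get; set; }
    public string Widget { get; set; }
}

// Hypothetical common interface shared by the processors.
public interface IThingInitializer
{
    void Initialize(Thing thing);
}

public class GizmoProcessor : IThingInitializer
{
    public void Initialize(Thing thing) => thing.Gizmo = "gizmo-" + thing.Id;
}

public class WidgetProcessor : IThingInitializer
{
    public void Initialize(Thing thing) => thing.Widget = "widget-" + thing.Id;
}

// Composite: runs each child initializer in order, so callers depend on one object.
public class CompositeThingInitializer : IThingInitializer
{
    private readonly IReadOnlyList<IThingInitializer> _initializers;

    public CompositeThingInitializer(params IThingInitializer[] initializers)
    {
        _initializers = initializers;
    }

    public void Initialize(Thing thing)
    {
        foreach (var initializer in _initializers)
            initializer.Initialize(thing);
    }
}
```

GetThings would then call a single initializer.Initialize(thing) inside its loop, and adding another kind of child object means adding another processor to the composite rather than editing GetThings.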
P.S.:
I personally don't care that much for (anti)pattern names.
While they help to discuss a problem at a higher level of abstraction, I wouldn't commit every (anti)pattern name to memory.
When I come across a pattern that I believe is helpful, only then do I remember it.
I'm quite lazy, and my rationale is: why bother remembering every pattern and anti-pattern if I'm only going to use a handful?
[Edit]
Noticed an answer was already given regarding using a Strategy Service.

Function should get called only once

I have a C# function that returns a list of states. I want this function to be called only once, like a static variable.
public List<string> GetStateList()
{
    List<string> lstState = new List<string>();
    lstState.Add("State1");
    lstState.Add("State2");
    lstState.Add("State3");
    return lstState;
}
I'm calling this function from many places, and since the state list is always the same, I want the function to be called only once; subsequent calls should not re-create the whole list.
How can I achieve this in C#?
Memoise it. It'll still be called multiple times, but only do the full work once:
private List<string> _states; // if GetStateList() doesn't depend on object
                              // state, then this can be static.
public List<string> GetStateList()
{
    if (_states == null)
    {
        List<string> lstState = new List<string>();
        lstState.Add("State1");
        lstState.Add("State2");
        lstState.Add("State3");
        _states = lstState;
    }
    return _states;
}
Depending on threading issues, you may wish to either:
Lock on the whole thing. Guaranteed single execution.
Lock on the assignment to _states. There may be some needless work in the early period, but all callers will receive the same object.
Allow for early callers to overwrite each other.
While the last may seem the most wasteful, it can be the best in the long run: after the initial period, where different calls may needlessly overwrite each other, it becomes simpler and faster from that point on. It really depends on how much work is done, and how often the method may be called concurrently before _states is assigned.
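The three options map onto standard .NET primitives. In this sketch (StateList and its members are illustrative names, not from the question), Lazy<T> in its default ExecutionAndPublication mode corresponds to locking the whole thing; Interlocked.CompareExchange corresponds to protecting only the assignment, where racing callers may each build a list but only the first one published is kept and returned to everyone; and a plain ??= assignment is the unsynchronized third option.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class StateList
{
    private static int _buildCount;
    public static int BuildCount => Volatile.Read(ref _buildCount);

    private static List<string> Build()
    {
        Interlocked.Increment(ref _buildCount);
        return new List<string> { "State1", "State2", "State3" };
    }

    // Option 1: lock on the whole thing. Lazy<T>'s default mode runs Build at most once.
    private static readonly Lazy<List<string>> _locked = new Lazy<List<string>>(Build);
    public static List<string> GetLocked() => _locked.Value;

    // Option 2: protect only the assignment. Racing callers may each call Build,
    // but CompareExchange keeps the first published list and everyone gets that one.
    private static List<string> _published;
    public static List<string> GetPublished()
    {
        var existing = Volatile.Read(ref _published);
        if (existing != null) return existing;
        var candidate = Build();
        return Interlocked.CompareExchange(ref _published, candidate, null) ?? candidate;
    }

    // Option 3: no synchronization. Early racing callers may overwrite each other
    // and briefly see different lists; afterwards it is a plain field read.
    private static List<string> _unsynchronized;
    public static List<string> GetUnsynchronized() => _unsynchronized ??= Build();
}
```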
One issue with reusing a list is that callers can modify this list, which will affect any pre-existing references to it. For such a small amount of data, this isn't likely to save you very much in the long run. I'd probably be content to just return a new array each time.
I certainly wouldn't bother with lazy instantiation; populate it in the static constructor and be done:
using System;
using System.Collections.ObjectModel;
public static class States
{
    static States()
    {
        All = Array.AsReadOnly(new string[] { "state1", "state2", "state3" });
    }
    public static readonly ReadOnlyCollection<string> All;
}
Now it's thread-safe, (relatively) tamper-proof, and above all, simple.

Looking for advice on thread safety using static methods to 'process' a class instance

I have recently inherited a system that uses a very basic approach to processing work items: it processes them one by one. To be honest, up until recently this worked well. However, we are looking to implement a similar process for another type of work item, and I have been looking into the Task Parallel Library, which I think will fit the bill. However, I have some concerns about thread safety, and to be honest this is an area in which I lack knowledge, so I am asking only my second question here in the hope that someone can give me some good pointers, as I have yet to find a definitive yes or no answer.
So we have our 'WorkItem' class
public class WorkItem
{
    public int Id { get; set; }
    public string Data { get; set; }
}
A List<WorkItem> will be generated, and these will then be processed using a Parallel.ForEach loop.
The Parallel.ForEach will call a private method, which in turn will call static methods from another assembly:
// Windows service that will run the Parallel.ForEach
private void MainMethod(WorkItem item)
{
    item.Data = Processor.ProcessWorkItemDataProcess1(item.Data);
    item.Data = Processor.ProcessWorkItemDataProcess2(item.Data);
    SendToWorkFlow(item);
}
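public static class Processor
{
    public static string ProcessWorkItemDataProcess1(string data)
    {
        // Process it here
        string result = data; // placeholder for the real processing
        return result;
    }
    public static string ProcessWorkItemDataProcess2(string data)
    {
        // Process it here
        string result = data; // placeholder for the real processing
        return result;
    }
}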
And so on. All of these methods have logic in them to process the WorkItem instance at various different stages. Once complete, the MainMethod will send the processed WorkItem off to a Workflow System.
We will be processing these in batches of up to 30 in order not to overload the other systems. My concern is that 30 instances of WorkItem accessing the same static methods could cause data integrity issues. For example, ProcessWorkItemDataProcess2 is called with WorkItem1.Data and then with WorkItem2.Data, and somehow WorkItem2.Data is returned when it should be WorkItem1.Data.
All of the static methods are self-contained insofar as they have defined logic and will (in theory) only use the WorkItem data they were called with. There is no DB access, file access, etc.
So, hopefully that explains what I am doing. Should I have any concerns? If so, will creating an instance of the Processor class for each WorkItem solve any potential problems?
Thanks in advance
The scenario you describe doesn't sound like it has any blatant threading issues. Your worry about a static method being called on two different threads and getting the data mixed up is unfounded, unless you write code that mixes things up. ;>
Since the methods are static, they don't have any shared object instance to worry about. That's good. You have isolated the work into self-contained work items. That is good.
You will need to check to make sure that none of the static methods access any global state, like static variables or properties, or reading from a file (the same file name for multiple work items). Reading of global state is less of a concern, writing is what will throw a wrench in the works.
You should also review your code to see how data is assigned to your work items and whether any of the code that processes the work items modifies the work item data. If the work items are treated as strictly read only by the methods, that's good. If the methods write changes back to fields or properties of the work items, you will need to double check that the data in the work items is not shared with any other work items. If the code that constructs the work item instances assigns a cached value to a property of multiple work items, and the static methods modify properties of that value, you will have threading conflicts. If the work item construction always constructs new instances of values that are assigned to properties of the work item, this shouldn't be an issue.
In a nutshell, if you have multiple threads accessing shared state, and at least one is writing, then you need to worry about thread safety. If not then you're golden.
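A minimal sketch of the scenario in the question, showing the distinction drawn above: the static processing methods only touch their argument and locals, so they are safe to call concurrently, while a shared static counter is written by every thread and therefore goes through Interlocked. The method bodies, the counter and BatchRunner are hypothetical stand-ins; MaxDegreeOfParallelism = 30 is one way to cap concurrency at the batch size mentioned in the question.

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public class WorkItem
{
    public int Id { get; set; }
    public string Data { get; set; }
}

public static class Processor
{
    // Safe to call concurrently: they read only their argument (stand-in processing).
    public static string ProcessWorkItemDataProcess1(string data) => data.ToUpperInvariant();
    public static string ProcessWorkItemDataProcess2(string data) => data + "!";

    // Shared, writable static state: every thread writes it, so it must be synchronized.
    private static int _processedCount;
    public static int ProcessedCount => Volatile.Read(ref _processedCount);
    public static void RecordProcessed() => Interlocked.Increment(ref _processedCount);
}

public static class BatchRunner
{
    public static void Run(IEnumerable<WorkItem> items, int maxParallelism)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = maxParallelism };
        Parallel.ForEach(items, options, item =>
        {
            // Each iteration touches only its own WorkItem instance.
            item.Data = Processor.ProcessWorkItemDataProcess1(item.Data);
            item.Data = Processor.ProcessWorkItemDataProcess2(item.Data);
            Processor.RecordProcessed();
        });
    }
}
```

Had RecordProcessed used a plain _processedCount++ instead, concurrent increments could be lost, which is the "at least one is writing" case described above.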

How to implement badges?

I've given some thought to implementing badges (just like the badges here on Stack Overflow) and think it would be difficult without Windows services, but I'd like to avoid that if possible.
I came up with a plan to implement some examples:
Autobiographer: Check if all fields in the profile are filled out.
Commentator: When making a comment, check if the number of comments equals 10; if so, award the badge.
Good Answer: When voting up, check to see if the vote score is 25 or higher.
How could this be implemented in the database? Or would another way be better?
A similar-to-Stackoverflow implementation is actually a lot simpler than you have described, based on bits of info dropped by the team every once in a while.
In the database, you simply store a collection of BadgeID-UserID pairs to track who has what (and a count or a rowID to allow multiple awards for some badges).
In the application, there is a worker object for each badge type. The object is in cache, and when the cache expires, the worker runs its own logic for determining who should get the badge and making the updates, and then it re-inserts itself into the cache:
using System;
using System.Web;
using System.Web.Caching;
public abstract class BadgeJob
{
    protected BadgeJob()
    {
        // start cycling on initialization
        Insert();
    }
    // override to provide specific badge logic
    protected abstract void AwardBadges();
    // how long to wait between iterations
    protected abstract TimeSpan Interval { get; }
    private void Callback(string key, object value, CacheItemRemovedReason reason)
    {
        if (reason == CacheItemRemovedReason.Expired)
        {
            this.AwardBadges();
            this.Insert();
        }
    }
    private void Insert()
    {
        // Add this job to the ASP.NET cache with a sliding expiration;
        // when the item expires, Callback runs the badge logic and re-inserts it.
        HttpRuntime.Cache.Add(this.GetType().ToString(),
            this,
            null,
            Cache.NoAbsoluteExpiration,
            this.Interval,
            CacheItemPriority.Normal,
            this.Callback);
    }
}
And a concrete implementation:
public class CommenterBadge : BadgeJob
{
    public CommenterBadge() : base() { }
    protected override void AwardBadges()
    {
        // select all users who have more than x comments
        // and don't have the commenter badge
        // add badges
    }
    // run every 10 minutes
    protected override TimeSpan Interval
    {
        get { return new TimeSpan(0, 10, 0); }
    }
}
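The body of AwardBadges is left as comments above. Here is a minimal in-memory sketch of that logic, storing awards as BadgeID-UserID pairs as described; BadgeStore, CommenterRule, the badge ID and the threshold of 10 comments are hypothetical, and a real implementation would run the equivalent query against the database.

```csharp
using System.Collections.Generic;
using System.Linq;

public class BadgeStore
{
    // Awarded badges as (BadgeId, UserId) pairs.
    private readonly HashSet<(int BadgeId, int UserId)> _awards = new HashSet<(int BadgeId, int UserId)>();

    public bool Has(int badgeId, int userId) => _awards.Contains((badgeId, userId));
    public void Award(int badgeId, int userId) => _awards.Add((badgeId, userId));
}

public static class CommenterRule
{
    public const int CommenterBadgeId = 1;   // hypothetical ID
    public const int RequiredComments = 10;  // hypothetical threshold

    // commentAuthorIds: one entry per comment, holding the author's user ID.
    // Returns the users newly awarded on this run.
    public static List<int> AwardBadges(IEnumerable<int> commentAuthorIds, BadgeStore store)
    {
        var newlyAwarded = commentAuthorIds
            .GroupBy(userId => userId)
            .Where(g => g.Count() >= RequiredComments)         // enough comments
            .Select(g => g.Key)
            .Where(userId => !store.Has(CommenterBadgeId, userId)) // not yet awarded
            .ToList();

        foreach (var userId in newlyAwarded)
            store.Award(CommenterBadgeId, userId);
        return newlyAwarded;
    }
}
```

Because already-awarded users are filtered out, running the job again on the next cache expiry awards nothing new until more users cross the threshold.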
Jobs: that is the key. Out-of-process jobs that run at set intervals to check the criteria you mention. I don't think you even need a Windows service unless setting the levels requires some external resources. I actually think StackOverflow uses jobs for its calculations as well.
You could use triggers that check on update or insert, and award the badge if your conditions are met. That would handle it pretty seamlessly. Commence the trigger bashing in 3, 2, 1...
Comments must be stored within the database, right? Then I think there are two main ways to do this.
1) When a user logs in, you get a count of the comments. This is obviously not the desired approach, as the count could take a lot of time.
2) When a user posts a comment, you could either do a count then and store the count with the user details, or you could use a trigger that executes when a comment is added. The trigger would then get the details of the newly created comment, grab the user ID, get a count and store that against the user in a table of some sort.
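I like the idea of a trigger, as it keeps the counting logic in the database rather than in your program. Bear in mind, though, that a SQL Server trigger runs as part of the INSERT statement, so the insert does not complete until the trigger has finished.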
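Option 2's application-side variant (keep a stored count up to date when a comment is posted, rather than re-counting at login) could look like this minimal sketch. UserStats, OnCommentPosted and the threshold of 10 are hypothetical, and in a real system the count would be persisted with the user's record rather than held in memory.

```csharp
public class UserStats
{
    public const int CommenterThreshold = 10; // hypothetical threshold

    public int UserId { get; }
    public int CommentCount { get; private set; }
    public bool HasCommenterBadge { get; private set; }

    public UserStats(int userId) { UserId = userId; }

    // Called whenever this user posts a comment: bump the running count
    // instead of re-counting all comments, and award the badge only once.
    public bool OnCommentPosted()
    {
        CommentCount++;
        if (!HasCommenterBadge && CommentCount >= CommenterThreshold)
        {
            HasCommenterBadge = true;
            return true; // this comment earned the badge
        }
        return false;
    }
}
```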
