Rx.NET 'Distinct' to get the latest value? - c#

I'm new to Rx and I'm trying to make a GUI to display stock market data. The concept is a bit like ReactiveTrader, but I'll need to display the whole "depth", i.e., all prices and their buy/sell quantities in the market, instead of only the "top level" of the market buys/sells, sorted by price.
The data structure for each "price level" is like this:
public class MarketDepthLevel
{
    public int MarketBidQuantity { get; set; }
    public decimal Price { get; set; }
    public int MarketAskQuantity { get; set; }
}
And underneath the GUI, a socket listens for network updates and returns them as an observable:
IObservable<MarketDepthLevel> MarketPriceLevelStream;
This is then transformed into a ReactiveList and eventually bound to a DataGrid.
The transformation should basically pick the latest update for each price level and sort the levels by price. So I came up with something like this:
public IReactiveDerivedList<MarketDepthLevel> MarketDepthStream
{
    get
    {
        return MarketDepthLevelStream
            .Distinct(x => x.Price)
            .CreateCollection()
            .CreateDerivedCollection(x => x, orderer: (x, y) => y.Price.CompareTo(x.Price));
    }
}
But there are problems:
When 'Distinct' sees a price it has already seen, it discards the new item, but I need the new item to replace the old one (it contains the latest MarketBidQuantity/MarketAskQuantity)
It seems a bit clumsy to chain CreateCollection/CreateDerivedCollection
Any thoughts on solving these (especially the first problem)?
Thanks

Just group the items and then project each group to its last item. (The GroupBy overload with a result selector exists in LINQ to Objects but not in Rx, so project via SelectMany; note the "last" item of a group is only known once the source completes.)
return MarketDepthLevelStream
    .GroupBy(x => x.Price)
    .SelectMany(group => group.LastAsync());

If I understand you correctly, you want to project a stream of MarketDepthLevel updates into a list of the latest bid/ask quantities at each price level (in finance parlance, this is a type of ladder). The ladder is held as a ReactiveList bound to the UI. (ObserveOn may be required, although ReactiveList handles this in most cases, I believe.)
Here's an example ladder snapped from http://ratesetter.com, where the "Price" is expressed as a percentage (Rate) and the bid/ask sizes are the amounts lenders want and borrowers need at each price level.
At this point, I begin to get slightly lost. I'm confused as to why you need any further Rx operators, since you could simply Subscribe to the update stream as-is and have the handler update a fixed list data-bound to the UI. Doesn't each new event simply need to be added to the ReactiveList if it has a new price, or replace an existing entry with a matching price? Imperative code to do this is fine if it's the last step in the chain to the UI.
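For example, a minimal imperative handler along those lines might look like this (a sketch assuming ReactiveUI's ReactiveList and RxApp.MainThreadScheduler; the DataGrid binds to ladder):
var ladder = new ReactiveList<MarketDepthLevel>();

MarketPriceLevelStream
    .ObserveOn(RxApp.MainThreadScheduler) // marshal updates onto the UI thread
    .Subscribe(level =>
    {
        var existing = ladder.FirstOrDefault(l => l.Price == level.Price);
        if (existing != null)
        {
            // Same price level: replace it with the latest quantities.
            ladder[ladder.IndexOf(existing)] = level;
        }
        else
        {
            // New price level: insert so the list stays sorted by descending price.
            var index = ladder.TakeWhile(l => l.Price > level.Price).Count();
            ladder.Insert(index, level);
        }
    });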
There is only value in doing this in the Rx stream itself if you need to convert the MarketDepthLevelStream into a stream of ladders. That could be useful, but you don't mention this need.
Such a need could be driven by either the desire to multicast the stream to many subscribers, and/or because you have further transformations or projections you need to make on the ladders.
Bear in mind, if the ladder is large, then working with whole ladder updates in the UI might give you performance issues - in many cases, individual updates into a mutable structure like a ReactiveList are a practical way to go.
If working with a stream of ladders is a requirement, then look to Observable.Scan. This is the work-horse operator in Rx that maintains local state. It is used for any form of running aggregate - such as a running total, average etc., or in this case, a ladder.
Now if all you need is the static list described above, I am potentially off on a massive detour here, but it's a useful discussion so...
You'll want to think carefully about the type used for the ladder aggregate - you need to be conscious of how downstream events would be consumed. It's likely to need an immutable collection type so that things don't get weird for subscribers (each event should in effect be a static snapshot of the ladder). With immutable collections, it may be important to think about memory efficiency.
Here's a simple example of how a ladder stream might work, using an immutable collection from the NuGet pre-release package System.Collections.Immutable:
public static class ObservableExtensions
{
    public static IObservable<ImmutableSortedDictionary<decimal, MarketDepthLevel>>
        ToLadder(this IObservable<MarketDepthLevel> source)
    {
        return source.Scan(
            ImmutableSortedDictionary<decimal, MarketDepthLevel>.Empty,
            (lastLadder, depthLevel) => lastLadder.SetItem(depthLevel.Price, depthLevel));
    }
}
The ToLadder extension method creates an empty immutable ladder as the seed aggregate, and each successive MarketDepthLevel event produces a new, updated ladder. You may want to see if ImmutableSortedSet is sufficient.
You would probably want to wrap/project this into your own type, but hopefully you get the idea.
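For instance, a subscriber might consume the snapshots like this (a sketch; console output stands in for the real UI binding):
MarketPriceLevelStream
    .ToLadder()
    .Subscribe(ladder =>
    {
        // Each event is a complete, immutable snapshot, keyed and sorted by price.
        foreach (var level in ladder.Values)
            Console.WriteLine($"{level.Price}: bid {level.MarketBidQuantity}, ask {level.MarketAskQuantity}");
    });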
Ultimately, this still leaves you with the challenge of updating a UI - and, as mentioned before, now you are stuck with the whole ladder, meaning you have to bind a whole ladder every time or convert it back to a stream of individual updates - and it's getting way too off-topic to tackle that here...!

Related

Status of aggregate root dependent on the status of children

I am new to using DDD and need some advice. I have an aggregate root that contains a collection of children, and the status of the root (IsFinished) depends on the children. Let's say I absolutely need the aggregate root to have a string field '_someField', and there can be an unlimited number of aggregate roots with the same value in '_someField'. Based on this field, I need to find all aggregate roots that have some specific value (e.g. 'Test') and are in the !IsFinished state. I am using EF6 and the search is quite slow. How would one generally go about solving something like this?
public class MyAggregateRoot : AggregateRoot
{
    private IList<Item> _items;
    private string _someField;

    public bool IsFinished => _items.All(i => i.Status == ItemStatus.Finished);

    public MyAggregateRoot()
    {
        _items = new List<Item>();
    }
    ...
}
For example, I was thinking of having a status variable inside the aggregate root that would be set to finished the moment the last item is set to ItemStatus.Finished. But in that case it could happen that the root's status says finished while some item is not actually in the finished state, leaving the data inconsistent.
EDIT:
So let me ask it a little differently. What bothers me is whether it's good to always walk the collection to derive some state or total - for example, if I have a cart with some items, each item has a price and I want to calculate the total price.
One option would be to go through the collection of all items and add up their prices. The other option would be to have a totalPrice variable in the aggregate root (the cart) and recalculate it appropriately whenever an item is added to the cart. I think the first solution is better from a DDD point of view, but it is worse from a performance point of view.
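In code, the two options might look roughly like this (a sketch with illustrative names):
public class CartItem
{
    public decimal Price { get; set; }
}

public class Cart
{
    private readonly List<CartItem> _items = new List<CartItem>();
    private decimal _totalPrice; // option 2's denormalized state

    // Option 1: derive the total on demand - always consistent, O(n) per read.
    public decimal TotalPrice => _items.Sum(i => i.Price);

    // Option 2: maintain a running total - O(1) per read, but every mutation
    // must keep it in sync or the state drifts.
    public void AddItem(CartItem item)
    {
        _items.Add(item);
        _totalPrice += item.Price;
    }
}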
Another example I can think of is a todo list that contains individual tasks. Each task has a status of not-completed or completed, and the todo list is completed when all of its tasks are completed. If I had a really large number of todo lists and wanted to display all completed ones, I would have to go through the tasks of every todo list and determine each list's status from the status of all its tasks.
I'd like to get advice from an experienced developer who could, for example, explain to me when it's better to use the first solution and when to use the second (or a completely different one). Or tell me if, for example, the second solution somehow violates DDD.
It depends on the use case.
If the use case states that there may be a large or unlimited number of children, it is technically not practical to put the children into the aggregate root.
On the other hand, a small set of children in the parent may be handled by the ORM mapper (hint: I'm a Java developer; in my case it would be Hibernate) without performance problems.
I had a similar issue: my Process parent could have an unlimited number of Activity children. I decided to implement Process as an entity (the aggregate root) and Activity as a value object with a reference to its Process. Changes to an Activity that cause changes to its parent are propagated by events and handled asynchronously (so eventual consistency comes into play).
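In C# terms, to match the question, the event propagation might look roughly like this (all names are illustrative; IProcessRepository and Process stand in for your own types):
// Raised by an Activity when it reaches the finished state.
public class ActivityFinished
{
    public Guid ProcessId { get; }
    public ActivityFinished(Guid processId) => ProcessId = processId;
}

// Handled asynchronously: the Process catches up eventually, rather than
// being updated in the same transaction as the Activity.
public class ActivityFinishedHandler
{
    private readonly IProcessRepository _processes;

    public ActivityFinishedHandler(IProcessRepository processes) => _processes = processes;

    public async Task Handle(ActivityFinished evt)
    {
        var process = await _processes.GetAsync(evt.ProcessId);
        process.RecalculateStatus(); // e.g. check whether unfinished activities remain
        await _processes.SaveAsync(process);
    }
}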
Vaughn Vernon discusses such design decisions in his "red book", which helped me a lot in coming to an implementation decision.

Handling Population within a game (Unity)

I am currently working on a SimCity-style game, and I am currently looking into managing the population for the city. The script will eventually work like this: when a house is placed, it adds to the capacity of how many people the city can house. When construction is complete, I will add that number of Citizen structs to a List of citizens.
However, imagine that the population reaches in excess of 1,000 or 10,000 citizens. Will this be the optimal solution for controlling a large volume of citizens? Moreover, when a house is removed, this will subtract that amount from the population (removing from the list), thus leaving job vacancies. I would eventually like the player to be able to shift focus, so that any buildings whose enum category matches the focus have their jobs filled by the work force first. Again, would using a List and LINQ queries be the way to go, or would a better solution be found with something else?
public class City : MonoBehaviour
{
    public List<Citizen> citizens = new List<Citizen>();
    public List<Building> cityBuildings = new List<Building>();

    // TODO (LINQ): If a building has no employees and a citizen is unemployed,
    // assign that citizen to the building.
}

public struct Citizen
{
    public Building employedAt;

    public bool CheckEmployment()
    {
        return employedAt != null;
    }
}
The answer is - as you may have expected - it depends. LINQ operations are usually quite fast unless you are talking about millions of objects. However, they will produce some garbage that has to be collected eventually. If you perform such operations every frame, you may run into GC hiccups. If you run them less often (e.g. only when a player places/removes a house), this approach should work fine.
If you need maximum performance, you may want to have a look at the new DOTS architecture (a.k.a. ECS) in Unity, which allows you to manage large quantities of data fast. That being said, premature optimization is the root of all evil, and DOTS is quite a beast to wrap your head around.
I'd start with the LINQ queries, making sure they are not called every frame, maybe add some clever caching, and only bring in the big guns when I actually have a performance problem.
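As a rough sketch of the TODO in the question (Employees and Capacity are assumed members of Building, not part of the original code; for brevity this fills at most one vacancy per building per call):
// Run occasionally (e.g. when a building or house changes), never every frame.
public void AssignUnemployedCitizens()
{
    foreach (var building in cityBuildings.Where(b => b.Employees.Count < b.Capacity))
    {
        int index = citizens.FindIndex(c => !c.CheckEmployment());
        if (index < 0) return; // nobody left to employ

        // Citizen is a struct: modify a copy, then write it back into the list.
        Citizen citizen = citizens[index];
        citizen.employedAt = building;
        citizens[index] = citizen;

        building.Employees.Add(citizen);
    }
}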

Taking a snapshot of an IObservable<T>

Suppose I have a service:
public interface ICustomersService
{
IObservable<ICustomer> Customers
{
get;
}
}
The implementation of the Customers property starts by grabbing all existing customers and passing them on to the observer, after which it only passes on customers that are added to the system later. Thus, it never completes.
Now suppose I wanted to grab a snapshot (as a List<ICustomer>) of the current customers, ignoring any that may be added in future. How do I do that? Any invocation of ToList() or its kin will block forever because the sequence never completes.
I figured I could write my own extension, so I tried this:
public static class RxExtensions
{
    public static List<T> ToSnapshot<T>(this IObservable<T> @this)
    {
        var list = new List<T>();
        using (@this.Subscribe(x => list.Add(x)))
        {
        }
        return list;
    }
}
This appears to work. For example:
var customers = new ReplaySubject<string>();
// snapshot has nothing in it
var snapshot1 = customers.ToSnapshot();
customers.OnNext("A");
customers.OnNext("B");
// snapshot has just the two customers in it
var snapshot2 = customers.ToSnapshot();
customers.OnNext("C");
// snapshot has three customers in it
var snapshot3 = customers.ToSnapshot();
I realize the current implementation depends on the scheduler being the current thread; otherwise, ToSnapshot will likely close its subscription before items are received. However, I suspect I could also include a ToSnapshot overload that takes an IScheduler and ensures any items scheduled there are received prior to ending the snapshot.
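That overload might look something like this (a sketch only, added alongside the ToSnapshot above; it assumes the scheduler dispatches queued work in FIFO order, as an EventLoopScheduler does):
public static List<T> ToSnapshot<T>(this IObservable<T> @this, IScheduler scheduler)
{
    var list = new List<T>();
    using (var drained = new ManualResetEventSlim())
    using (@this.Subscribe(list.Add))
    {
        // Any notifications the subscription queued on the scheduler run
        // before this sentinel, so the list is complete when it fires.
        scheduler.Schedule(drained.Set);
        drained.Wait();
    }
    return list;
}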
I can't find this sort of snapshot functionality built into Rx. Am I missing something?
You could try using a timeout on your observable:
source.Customers.TakeUntil(DateTime.Now).ToEnumerable();
There are several ways to approach this. I have tried the following with success in commercial projects:
1) A separate method to get an enumerable of current customers as Chris demonstrated.
2) A method to combine a "state of the world" call with a live stream - this was somewhat more involved than Chris's example, because to ensure no missed data one typically has to start listening to the live stream first, then get the snapshot, then combine the two with de-duping.
I achieved this with a custom Observable.Create implementation that cached the live stream until the history was retrieved, then merged the cache with the history before switching to live.
This returned Customers wrapped with additional metadata describing the age of the data.
3) Most recently, it's been more useful to me to return IObservable<IEnumerable<Customer>>, where the first event is the entire state of the world. This has been more useful because many systems I work on get updates in batches, and it's often faster to update a UI with an entire batch than item by item. It is otherwise similar to (2), except you can just use FirstAsync() to get the snapshot you need.
I propose you consider this approach. You can always use SelectMany(x => x) to flatten an IObservable<IEnumerable<Customer>> into an IObservable<Customer> if you need to.
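For example, assuming the question's ICustomer and a hypothetical service reshaped so Customers returns IObservable<IEnumerable<ICustomer>> with the snapshot as its first event:
// Just the snapshot - take the first (full) batch and stop:
IObservable<IEnumerable<ICustomer>> snapshot = service.Customers.FirstAsync();

// Flattened back to a stream of individual customers:
IObservable<ICustomer> individual = service.Customers.SelectMany(batch => batch);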
I'll see if I can dig out an example implementation when I get back to the home office!
What you've done here is actually pretty nifty. The reason ToSnapshot works is that the underlying implementation of your subscribe logic yields all of the customers to the observer before releasing control flow. Basically, Dispose is called only after the control flow is released, and the control flow is only released after you've yielded all pre-existing customers.
While this is cool, it's also misleading. The method you've written, ToSnapshot, should really be named something like TakeSynchronousNotifications. The extension makes heavy assumptions about how the underlying observable works, and isn't really in the spirit of Rx.
To make things easier to understand for the consumer, I would expose additional properties which explicitly state what is being returned.
public interface ICustomersService
{
    IEnumerable<ICustomer> ExistingCustomers { get; }
    IObservable<ICustomer> NewCustomers { get; }
    IObservable<ICustomer> Customers { get; }
}

public class CustomersService : ICustomersService
{
    public IEnumerable<ICustomer> ExistingCustomers { get { ... } }
    public IObservable<ICustomer> NewCustomers { get { ... } }

    public IObservable<ICustomer> Customers
    {
        get
        {
            return this.ExistingCustomers.ToObservable().Concat(this.NewCustomers);
        }
    }
}
Edit:
Consider the following problem...
50 = x + y. Solve for and evaluate x.
The math just doesn't work unless you know what y is. In this example, y is the "new customers", x is the "existing customers", and 50 is the combination of the two.
By exposing only a combination of the existing and new customers, and not the existing and new customers themselves, you've lost too much data. You need to expose at least x or y to the consumer, otherwise there's no way to solve for the other.

code performance question

Let's say I have a relatively large list of an object MyObjectModel called MyBigList. One of the properties of MyObjectModel is an int called ObjectID. In theory, I think MyBigList could reach 15-20MB in size. I also have a table in my database that stores some scalars about this list so that it can be recomposed later.
What is going to be more efficient?
Option A:
List<MyObjectModel> MyBigList = null;
MyBigList = GetBigList(some parameters);
int RowID = PutScalarsInDB(MyBigList);
Option B:
List<MyObjectModel> MyBigList = null;
MyBigList = GetBigList(some parameters);
int TheCount = MyBigList.Count;
StringBuilder ListOfObjectID = new StringBuilder();
foreach (MyObjectModel ThisObject in MyBigList)
{
    ListOfObjectID.Append(ThisObject.ObjectID.ToString());
}
int RowID = PutScalarsInDB(TheCount, ListOfObjectID);
In option A I pass MyBigList to a function that extracts the scalars from the list, stores these in the DB and returns the row where these entries were made. In option B, I keep MyBigList in the page method where I do the extraction of the scalars and I just pass these to the PutScalarsInDB function.
What's the better option - and could yet another be better still? I'm concerned about passing around objects of this size and about memory usage.
I don't think you'll see a material difference between these two approaches. From your description, it sounds like you'll be burning the same CPU cycles either way. The things that matter are:
Get the list
Iterate through the list to get the IDs
Iterate through the list to update the database
The order in which these three activities occur, and whether they occur within a single method or across subroutines, doesn't matter. All other activities (declaring variables, assigning results, etc.) have zero to negligible performance impact.
Other things being equal, your first option may be slightly more performant because you'll only be iterating once, I assume, both extracting IDs and updating the database in a single pass. But the cost of iteration will likely be very small compared with the cost of updating the database, so it's not a performance difference you're likely to notice.
Having said all that, there are many, many more factors that may impact performance, such as the type of list you're iterating through, the speed of your database connection, etc., that could dwarf these considerations. It doesn't look like too much code either way. I'd strongly suggest building both and testing them.
Then let us know your results!
If you want to know which method performs better, you can use the Stopwatch class to measure the time each method takes. See here for Stopwatch usage: http://www.dotnetperls.com/stopwatch
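A minimal timing harness might look like this:
var stopwatch = System.Diagnostics.Stopwatch.StartNew();
int rowId = PutScalarsInDB(MyBigList); // or option B's body
stopwatch.Stop();
Console.WriteLine("Elapsed: " + stopwatch.ElapsedMilliseconds + " ms");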
I think there are other issues you need to consider for an ASP.NET application:
Where do you read your list from? If you read it from the database, would it be more efficient to do the work in the database, within a stored procedure?
Where is it stored? Is it only read and destroyed, or is it stored in session or application state?

How to break down large 'macro' classes?

One application I work on does only one thing, viewed from the outside world: it takes a file as input and, after ~5 minutes, spits out another file.
What happens inside is actually a sequential series of actions. The application is, in our opinion, structured well, because each action is like a small box without too many dependencies.
Usually some later actions use information from previous ones, and just a few could be executed in parallel - for the sake of simplicity we prefer to keep the execution sequential.
Now the problem is that the function that executes all these actions is like a batch file: a long list of calls to different functions with different arguments. Looking at the code:
main
{
    try
    {
        result1 = Action1(inputFile);
        result2 = Action2(inputFile);
        result3 = Action3(result2.value);
        result4 = Action4(result1.value, inputFile);
        ... // You get the idea. There is no pattern to the passed parameters.
        resultN = ActionN(parameters);
        write output
    }
    catch
    {
        // something went wrong, display the error
    }
}
How would you model the main function of this application so is not just a long list of commands?
Not everything needs to fit to a clever pattern. There are few more elegant ways to express a long series of imperative statements than as, well, a long series of imperative statements.
If there are certain kinds of flexibility you feel you are currently lacking, express them, and we can try to propose solutions.
If there are certain clusters of actions and results that are re-used often, you could pull them out into new functions and build "aggregate" actions from them.
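For example (a sketch reusing the question's own action names; the result types are whatever your actions actually return):
// A hypothetical "aggregate" action for two steps that always run together.
static Result3 RunActions2And3(InputFile inputFile)
{
    var result2 = Action2(inputFile);
    return Action3(result2.value);
}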
You could look into dataflow languages and libraries, but I expect the gain to be small.
I'm not sure if it's the best approach, but you could have an object that stores all the results and pass it to each method in turn. Every method would read the parameters it needs and write its result there. You could then have a collection of actions (either delegates or objects implementing an interface) and call them in a loop.
class Results
{
    public int Result1 { get; set; }
    public string Result2 { get; set; }
    …
}

var actions = new Action<Results>[] { Action1, Action2, … };

Results results = new Results();
foreach (var action in actions)
    action(results);
You could think about implementing a Sequential Workflow from Windows Workflow Foundation.
First of all, this solution is not bad at all. If the actions are disjoint - I mean, there are no global parameters or other hidden dependencies between different actions or between actions and the environment - it's a good solution. It is easy to maintain and read; when you need to expand the functionality, you just add new actions, and when the "quantity" changes, you just add or remove lines from the macro sequence. If there's no need to change the process chain frequently: don't move!
If it's a system where the implementation of the actions doesn't change often, but their order and parameters do, you may design a simple script language and transform the macro class into that script. The script should be maintained by someone other than you - someone who is familiar with the problem domain at the level of your "actions" - so that he/she can assemble the application using the script language without your assistance.
One nice approach to that kind of problem splitting is dataflow programming (a.k.a. flow-based programming). In dataflow programming, there are pre-written components. Components are black boxes (from the view of the application developer); they have consumer (input) and producer (output) ports, which can be connected to form a processing network, which is then the application. If there's a good set of components for a domain, many applications can be created without programming new components. Also, components can be built from other components (these are called composite components).
Wikipedia (good starting point):
http://en.wikipedia.org/wiki/Dataflow_programming
http://en.wikipedia.org/wiki/Flow-based_programming
JPM's site (book, wiki, everything):
http://jpaulmorrison.com/fbp/
I think bigger systems must have the split point you describe as "macro". Even games have that point: e.g., FPS games have a 3D engine and game-logic scripts, and there's ScummVM, which is the same.
