Join n observable sources on key, with potentially missing keys - c#

I have multiple data sources which share a tag/key which I need to re-synchronize. The type signature would look a bit like this:
IObservable<R> JoinOnKey<T,R>(IObservable<T>[] sources,
Func<T,int> getKey,
Func<T[],R> projection)
Unfortunately there are two complications:
some sources may have missing tags, but I still want to get the others
This implies that the function needs to 'give up' waiting after a certain time period. So the signature changes to this:
IObservable<R> JoinOnKey<T,R>(IObservable<T>[] sources,
Func<T,int> getKey,
Func<T[],R> projection,
int maxItemsToWaitBeforeGivingUp)
though tags arrive in the same (increasing) order on each source, there is an upstream 'reset' function which can set them all back to zero.
After researching Buffer, Join, GroupJoin, Zip etc., I've ended up hacking my own solution, which maintains an internal queue of arrays, which has to be locked on every new incoming item. I'm not happy with it, so I'm interested in any ideas or pointers to potentially 'cleaner' solutions.

You mention giving up after a certain time but your code has a parameter for giving up after a certain number of items. I am going to assume the code is a typo.
I think this code satisfies your first constraint. I am not 100% certain of your 2nd constraint. Do you receive some notification of this reset event? Without such a notification I'm not sure you can reliably handle it correctly.
var count = sources.Length;
var timer = Observable.Timer(maxTimeToWaitBeforeGivingUp);
sources
.Merge()
.GroupByUntil(getKey, g => g.Take(count).TakeUntil(timer).Count())
.SelectMany(g => g.ToArray().Select(projection));

Related

C# DateTime OrderBy when list contains some identical DateTime values

I have a list of reference types that are sorted by a DateTime property. Some of them have identical DateTime's. For example, multiple baseball games starting at the exact same time.
I want to confirm that OrderBy will sort the same way every time; that is to say because I provided the input in the order of A->B->C; that the output will also be A->B->C (all items have identical DateTime).
I wrote a unit test to confirm that the order is preserved. The test passed. But without really knowing what is going on I still don't feel confident.
Can someone please confirm the OrderBy behavior for me? I tried searching via google and couldn't find anything definitive.
The formal name for the concept you're asking about is called a Stable Sort.
Knowing this, you can check the documentation for Enumerable.OrderBy and see that it does, indeed, use a stable sorting algorithm. From near the end of the Remarks section:
This method performs a stable sort; that is, if the keys of two elements are equal, the order of the elements is preserved.
Additionally, there was some confustion with Linq-to-SQL in the comments on the question. If your data is already in a List<T> object you are not using Linq-to-SQL anymore. However, it's worth noting that Linq-to-SQL uses IQueryable.OrderBy rather than Enumerable.ORderBy, and IQueryable.OrderBy does not guarantee a stable sort. You may get a stable sort, but it depends on what the database engine does.
In short the OrderBy won't change the order if the OrderBy value is identical. So if the collection you called it on is ordered then it will keep that order. If it's a list then you're all good but other collection types do not guarantee order, such as dictionary or hashset so I imagine the order of them could change as you can't guarantee their order in the underlying collection.
Edit: as someone has mentioned in the comments linq to objects OrderBy is a stable (or deterministic sort) so the order will be the same every time and items considered equal will not have their order changed
First make sure you read the excelent answer of Joel.
That said, if you can not rely on Linq2Object's Stable Sort (EF for example), you also have the option of ThenBy:
OrderBy(c => c.MyDate).ThenBy(n => n.MyId)
This way you can apply a second order, if the first one has multiple same values.

Rx.NET 'Distinct' to get the lastest value?

I'm new to Rx and I'm trying to make a GUI to display stock market data. The concept is a bit like ReactiveTrader, but I'll need to display the whole "depth", i.e., all prices and their buy/sell quantities in the market instead of only the "top level" of the market buy/sells, sorting by price.
The data structure for each "price level" is like this:
public class MarketDepthLevel
{
public int MarketBidQuantity { get; set; }
public decimal Price { get; set; }
public int MarketAskQuantity { get; set; }
}
And underneath the GUI, a socket listens to network updates and return them as an Observable:
IObservable<MarketDepthLevel> MarketPriceLevelStream;
Which after transformed into a ReactiveList, eventually bound to a DataGrid.
The transformation would basically choose the latest updates of each price level, and sort them by price. So I come up with something like this:
public IReactiveDerivedList<MarketDepthLevel> MarketDepthStream
{
get
{
return MarketDepthLevelStream
.Distinct(x => x.Price)
.CreateCollection()
.CreateDerivedCollection(x => x, orderer: (x, y) => y.Price.CompareTo(x.Price));
}
}
But there are problems:
When 'Distinct' sees a same price as appeared before, it discards the new one, but I need the new one to replace the old ones (as they contain the lasted MarketBidQuantity/MarketAskQuantity)
It seems a bit clumsy to CreateCollection/CreateDerivedColleciton
Any thoughts on solving these (especially the 1st problem)?
Thanks
Just group the items and then project each group to be the last item in the group:
return MarketDepthLevelStream
.GroupBy(x => x.Price, (key, items) => items.Last());
If I understand you correctly, you want to project a stream of MarketDepthLevel updates in a list of the latest bid/ask quantities for each level (in finance parlance, this is a type of ladder). The ladder is held as a ReactiveList bound the UI. (ObserveOn may be required, although ReactiveList handles this in most cases I believe)
Here's an example ladder snapped from http://ratesetter.com, where the "Price" is expressed as a percentage (Rate), and the bid/ask sizes by the amount lenders want and borrowers need at each price level:
At this point, I begin to get slightly lost. I'm confused as to why you need any further Rx operators, since you could simply Subscribe to the update stream as is and have the handler update a fixed list data-bound to a UI. Doesn't each new event simply need to be added to the ReactiveList if it's a new price, or replace an existing entry with a matching price? Imperative code to do this is fine if it's the last step in the chain to the UI.
There is only value in doing this in the Rx stream itself if you need to convert the MarketDepthLevelStream into a stream of ladders. That could be useful, but you don't mention this need.
Such a need could be driven by either the desire to multicast the stream to many subscribers, and/or because you have further transformations or projections you need to make on the ladders.
Bear in mind, if the ladder is large, then working with whole ladder updates in the UI might give you performance issues - in many cases, individual updates into a mutable structure like a ReactiveList are a practical way to go.
If working with a stream of ladders is a requirement, then look to Observable.Scan. This is the work-horse operator in Rx that maintains local state. It is used for any form of running aggregate - such as a running total, average etc., or in this case, a ladder.
Now if all you need is the static list described above, I am potentially off on a massive detour here, but it's a useful discussion so...
You'll want to think carefully about the type used for the ladder aggregate - you need to be concious of how down-stream events would be consumed. It's likely to need an immutable collection type so that things don't get weird for subscribers (each event should be in effect a static snapshot of the ladder). With immutable collections, it may be important to think about memory efficiency.
Here's a simple example of how a ladder stream might work, using an immutable collection from nuget pre-release package System.Collections.Immutable:
public static class ObservableExtensions
{
public static IObservable<ImmutableSortedDictionary<decimal, MarketDepthLevel>>
ToLadder(this IObservable<MarketDepthLevel> source)
{
return source.Scan(ImmutableSortedDictionary<decimal, MarketDepthLevel>.Empty,
(lastLadder, depthLevel) =>
lastLadder.SetItem(depthLevel.Price, depthLevel));
}
}
The ToLadder extension method creates an empty immutable ladder as the seed aggregate, and each successive MarketDepthLevel event produces a new update ladder. You may want to see if ImmutableSortedSet is sufficient.
You would probably want to wrap/project this into your own type, but hopefully you get the idea.
Ultimately, this still leaves you with the challenge of updating a UI - and as mentioned before now you are stuck with the whole ladder, meaning you have to bind a whole ladder every time, or convert it back to a stream of individual updates - and it's getting way to off topic to tackle that here...!

Using .Where() on a List

Assuming the two following possible blocks of code inside of a view, with a model passed to it using something like return View(db.Products.Find(id));
List<UserReview> reviews = Model.UserReviews.OrderByDescending(ur => ur.Date).ToList();
if (myUserReview != null)
reviews = reviews.Where(ur => ur.Id != myUserReview.Id).ToList();
IEnumerable<UserReview> reviews = Model.UserReviews.OrderByDescending(ur => ur.Date);
if (myUserReview != null)
reviews = reviews.Where(ur => ur.Id != myUserReview.Id);
What are the performance implications between the two? By this point, is all of the product related data in memory, including its navigation properties? Does using ToList() in this instance matter at all? If not, is there a better approach to using Linq queries on a List without having to call ToList() every time, or is this the typical way one would do it?
Read http://blogs.msdn.com/b/charlie/archive/2007/12/09/deferred-execution.aspx
Deferred execution is one of the many marvels intrinsic to linq. The short version is that your data is never touched (it remains idle in the source be that in-memory, or in-database, or wherever). When you construct a linq query all you are doing is creating an IEnumerable class that is 'capable' of enumerating your data. The work doesn't begin until you actually start enumerating and then each piece of data comes all the way from the source, through the pipeline, and is serviced by your code one item at a time. If you break your loop early, you have saved some work - later items are never processed. That's the simple version.
Some linq operations cannot work this way. Orderby is the best example. Orderby has to know every piece of data because it possible that the last piece retrieved from the source very well could be the first piece that you are supposed to get. So when an operation such as orderby is in the pipe, it will actually cache your dataset internally. So all data has been pulled from the source, and has gone through the pipeline, up to the orderby, and then the orderby becomes your new temporary data source for any operations that come afterwards in the expression. Even so, orderby tries as much as possible to follow the deferred execution paradigm by waiting until the last possible moment to build its cache. Including orderby in your query still doesn't do any work, immediately, but the work begins once you start enumerating.
To answer your question directly, your call to ToList is doing exactly that. OrderByDescending is caching the data from your datasource => ToList additionally persists it into a variable that you can actually touch (reviews) => where starts pulling records one at a time from reviews, and if it matches then your final ToList call is storing the results into yet another list in memory.
Beyond the memory implications, ToList is additionally thwarting deferred execution because it also STOPS the processing of your view at the time of the call, to entirely process that entire linq expression, in order to build its in-memory representation of the results.
Now none of this is a real big deal if the number of records we're talking about are in the dozens. You'll never notice the difference at runtime because it happens so quick. But when dealing with large scale datasets, deferring as much as possible for as long as possible in hopes that something will happen allowing you to cancel a full enumeration... in addition to the memory savings... gold.
In your version without ToList: OrderByDescending will still cache a copy of your dataset as processed through the pipeline up to that point, internally, sorted of course. That's ok, you gotta do what you gotta do. But that doesn't happen until you actually try to retrieve your first record later in your view. Once that cache is complete, you get your first record, and for every next record you are then pulling from that cache, checking against the where clause, you get it or you don't based upon that where and have saved a couple of in memory copies and a lot of work.
Magically, I bet even your lead-in of db.Products.Find(id) doesn't even start spinning until your view starts enumerating (if not using ToList). If db.Products is a Linq2SQL datasource, then every other element you've specified will reduce into SQL verbiage, and your entire expression will be deferred.
Hope this helps! Read further on Deferred Execution. And if you want to know 'how' that works, look into c# iterators (yield return). There's a page somewhere on MSDN that I'm having trouble finding that contains the common linq operations, and whether they defer execution or not. I'll update if I can track that down.
/*edit*/ to clarify - all of the above is in the context of raw linq, or Linq2Objects. Until we find that page, common sense will tell you how it works. If you close your eyes and imagine implementing orderby, or any other linq operation, if you can't think of a way to implement it with 'yield return', or without caching, then execution is not likely deferred and a cache copy is likely and/or a full enumeration... orderby, distinct, count, sum, etc... Linq2SQL is a whole other can of worms. Even in that context, ToList will still stop and process the whole expression and store the results because a list, is a list, and is in memory. But Linq2SQL is uniquely capable of deferring many of those aforementioned clauses, and then some, by incorporating them into the generated SQL that is sent to the SQL server. So even orderby can be deferred in this way because the clause will be pushed down into your original datasource and then ignored in the pipe.
Good luck ;)
Not enough context to know for sure.
But ToList() guarantees that the data has been copied into memory, and your first example does that twice.
The second example could involve queryable data or some other on-demand scenario. Even if the original data was all already in memory and even if you only added a call to ToList() at the end, that would be one less copy in-memory than the first example.
And it's entirely possible that in the second example, by the time you get to the end of that little snippet, no actual processing of the original data has been done at all. In that case, the database might not even be queried until some code actually enumerates the final reviews value.
As for whether there's a "better" way to do it, not possible to say. You haven't defined "better". Personally, I tend to prefer the second example...why materialize data before you need it? But in some cases, you do want to force materialization. It just depends on the scenario.

Do linq queries re-execute if no changes occured?

I was reading up on LINQ. I know that there are deferred and immediate queries. I know with deferred types running the query when it's enumerated allows any changes to the data set to be reflected in each enumeration. But I can't seem to find an answer if there's a mechanism in place to prevent the query from running if no changes occurred to the data set since the last enumeration.
I read on MSDN referring to LINQ queries:
Therefore, it follows that if a query is enumerated twice it will be executed twice.
Have I overlooked an obvious - but...?
Indeed, there is none. Actually, that's not quite true - some LINQ providers will spot trivial but common examples like:
int id = ...
var customer = ctx.Customers.SingleOrDefault(x => x.Id == id);
and will intercept that via the identity-manager, i.e. it will check whether it has a matching record already in the context; if it does: it doesn't execute anything.
You should note that the re-execution also has nothing to do with whether or not data has changed; it will re-execute (or not) regardless.
There are two take-away messages here:
don't iterate any enumerable more than once: not least, it isn't guaranteed to work at all
if you want to buffer data, put it into a list/array
So:
var list = someQuery.ToList();
will never re-execute the query, no matter how many times you iterate over list. Because list is not a query: it is a list.
A third take-away would be:
if you have a context that lives long enough that it is interesting to ask about data migration, then you are probably holding your data-context for far, far too long - they are intended to be short-lived

How do you buffer items into groups in Reactive Extensions?

I have an IObservable; where a property change has an entity ID and PropertyName. I want to use this to update a database, but if multiple properties change almost simultaneously I only want to do one update for all properties of the same entity.
If this was a static IEnumerable and I was using LINQ I could simply use:
MyList.GroupBy(C=>C.EntityID);
However, the list never terminates (never calls IObserver.OnComplete). What I want to be able to do is wait a period of time, say 1 second, group all the calls appropriately for that one second.
Ideally I would have separate counters for each EntityID and they would reset whenever a new property change was found for that EntityID.
I can't use something like Throttle because I want to handle all of the property changes, I just want to handle them together in one go.
Here you go:
MyObservable
.Buffer(TimeSpan.FromSeconds(1.0))
.Select(MyList =>
MyList.GroupBy(C=>C.EntityID));
The Buffer method seems to do what you want. Give it the TimeSpan and it'll collapse all the messages into a list. There is also the Window method which does something similar, but I'm not entirely sure what its semantics might be.

Categories

Resources