How do you buffer items into groups in Reactive Extensions? - c#

I have an IObservable of property changes, where each change has an EntityID and a PropertyName. I want to use this to update a database, but if multiple properties change almost simultaneously I only want to do one update for all properties of the same entity.
If this were a static IEnumerable and I were using LINQ, I could simply use:
MyList.GroupBy(C=>C.EntityID);
However, the sequence never terminates (it never calls IObserver.OnCompleted). What I want to do is wait for a period of time, say 1 second, and then group all the changes that arrived during that second.
Ideally I would have separate counters for each EntityID and they would reset whenever a new property change was found for that EntityID.
I can't use something like Throttle because I want to handle all of the property changes; I just want to handle them together in one go.

Here you go:
MyObservable
    .Buffer(TimeSpan.FromSeconds(1.0))
    .Select(MyList =>
        MyList.GroupBy(C => C.EntityID));

The Buffer method seems to do what you want. Give it a TimeSpan and it will collect all the messages that arrive within each interval into a list. There is also the Window method, which does something similar but hands you each interval as a nested observable rather than a finished list.
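The Buffer call above uses one shared one-second window for every entity. For the per-entity timers that reset on each new change, which the question lists as the ideal, one possible approach is GroupByUntil combined with Throttle. This is only a sketch: it assumes the System.Reactive NuGet package is referenced, and PropertyChange, PropertyChangeBatching and BatchPerEntity are made-up names, not something from the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;

// Hypothetical shape of the change notification described in the question.
public class PropertyChange
{
    public int EntityID { get; set; }
    public string PropertyName { get; set; } = "";
}

public static class PropertyChangeBatching
{
    // Emits one list per entity once that entity has been quiet for quietPeriod.
    public static IObservable<IList<PropertyChange>> BatchPerEntity(
        this IObservable<PropertyChange> source, TimeSpan quietPeriod)
    {
        return source
            // One group per EntityID. The group closes when its Throttle fires,
            // i.e. when no further change for that entity arrived within quietPeriod.
            .GroupByUntil(c => c.EntityID, g => g.Throttle(quietPeriod))
            // Collect everything the group saw into a single list.
            .SelectMany(g => g.ToList());
    }
}
```

With this, MyObservable.BatchPerEntity(TimeSpan.FromSeconds(1)) would hand each subscriber one list of changes per entity, and a later change for the same entity simply starts a new group.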

Related

Join n observable sources on key, with potentially missing keys

I have multiple data sources which share a tag/key which I need to re-synchronize. The type signature would look a bit like this:
IObservable<R> JoinOnKey<T,R>(IObservable<T>[] sources,
                              Func<T,int> getKey,
                              Func<T[],R> projection)
Unfortunately there are two complications:
1. Some sources may have missing tags, but I still want to get the others. This implies that the function needs to 'give up' waiting after a certain time period. So the signature changes to this:
IObservable<R> JoinOnKey<T,R>(IObservable<T>[] sources,
                              Func<T,int> getKey,
                              Func<T[],R> projection,
                              int maxItemsToWaitBeforeGivingUp)
2. Though tags arrive in the same (increasing) order on each source, there is an upstream 'reset' function which can set them all back to zero.
After researching Buffer, Join, GroupJoin, Zip etc., I've ended up hacking my own solution, which maintains an internal queue of arrays, which has to be locked on every new incoming item. I'm not happy with it, so I'm interested in any ideas or pointers to potentially 'cleaner' solutions.
You mention giving up after a certain time, but your code has a parameter for giving up after a certain number of items. I am going to assume the code is a typo and that the last parameter is meant to be a TimeSpan called maxTimeToWaitBeforeGivingUp.
I think this code satisfies your first constraint. I am not 100% certain of your 2nd constraint. Do you receive some notification of this reset event? Without such a notification I'm not sure you can reliably handle it correctly.
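var count = sources.Length;
// Observable.Timer is cold, so each group that subscribes gets its own timeout.
var timer = Observable.Timer(maxTimeToWaitBeforeGivingUp);
return sources
    .Merge()
    // Close a group once every source has reported the key, or when the timeout elapses.
    .GroupByUntil(getKey, g => g.Take(count).TakeUntil(timer).Count())
    // Turn each closed group into an array and project it to the result type.
    .SelectMany(g => g.ToArray().Select(projection));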

Method that is getting all the latest updates since last run of the same method

I was not sure how to write the title, but anyway. I have a method that sends me email objects at random intervals of 10 to 60 seconds. The problem is that I need to write a new method called getNewMessages that returns a list of the emails that are new since the last time the method was called.
So let's say that the first time I call this method it gets all the new emails (this part is not a problem). The second time I call it, the old emails should be removed from the list and the list should contain only the emails that arrived since the previous call.
Create an enclosing class that stores the time of the last retrieval, then use a LINQ query to get only the newer emails.
Sample:
// ToList() runs the query now; Where is lazy, so it must execute before _lastUpdate changes.
var result = list.Where(v => v.LastUpdate > _lastUpdate).ToList();
_lastUpdate = DateTime.Now;
result.ForEach(v => v.LastUpdate = _lastUpdate);
Your question is quite vague, so it is quite hard to give a concrete answer. But maybe some ideas may help you.
Every mail you receive needs some kind of unique id. In your case this id could be derived from the sender and the timestamp of the received mail. You could write an IEqualityComparer for this and give it a sound GetHashCode implementation. Then you simply put all your mails into a new HashSet<Email>(MyEqualityComparer), and when your method gets called you can check whether you have already processed a given message.
If you don't want to store all mails in the HashSet<T> (perhaps to reduce the memory footprint), you could call MyMailComparer.GetHashCode(email), put the results into a HashSet<int>, and discard the mail afterwards. But be aware that this could drop a new message, because two different messages can produce the same hash code. That is why the Equals() method exists: to check whether two messages with the same hash are really equal. Doing that check, however, requires keeping the mails in memory.
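To make the HashSet idea concrete, here is a rough sketch. The Email class, EmailComparer and MailTracker are invented for illustration (the question does not show its real types), and identity is assumed to be sender plus received timestamp, as suggested above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical email shape; the question does not show the real type.
public class Email
{
    public string Sender { get; set; } = "";
    public DateTime ReceivedAt { get; set; }
    public string Subject { get; set; } = "";
}

// Two emails count as the same message when sender and timestamp match.
public class EmailComparer : IEqualityComparer<Email>
{
    public bool Equals(Email x, Email y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x is null || y is null) return false;
        return x.Sender == y.Sender && x.ReceivedAt == y.ReceivedAt;
    }

    public int GetHashCode(Email e)
    {
        // Hash exactly the fields that Equals compares.
        return (e.Sender, e.ReceivedAt).GetHashCode();
    }
}

public class MailTracker
{
    private readonly HashSet<Email> _seen = new HashSet<Email>(new EmailComparer());

    // Returns only the emails that no earlier call has returned.
    public List<Email> GetNewMessages(IEnumerable<Email> inbox)
    {
        var fresh = new List<Email>();
        foreach (var mail in inbox)
        {
            // HashSet.Add returns false when an equal email is already stored.
            if (_seen.Add(mail)) fresh.Add(mail);
        }
        return fresh;
    }
}
```

Because the set keeps the full Email objects, a hash collision between two different mails is resolved by Equals, which is exactly the safety the HashSet<int> variant gives up.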

Collection properties should be read only - Loophole?

In the process of resolving code analysis errors, I'm changing my properties to have private setters. Then I started trying to understand why a bit more. From some research, MS says this:
A writable collection property allows a user to replace the collection with a completely different collection.
And the answer, here, states:
Adding a public setter on a List<T> object is dangerous.
But the reason why it's dangerous is not listed. And that's the part where I'm curious.
If we have this collection:
public List<Foo> Foos { get; set; }
Why make the setter private? Apparently we don't want client code to replace the collection, but if a client can remove every element, and then add whatever they want, what's the point? Is that not the same as replacing the collection entirely? How is value provided by following this code analysis rule?
Not exposing the setter prevents a situation where the collection is assigned a value of null. There's a difference between null and a collection without any values. Consider:
foreach (var value in this.myCollection) { /* do something */ }
When there are no values (i.e. someone has called Remove on every value), nothing bad happens. When this.myCollection is null, however, a NullReferenceException will be thrown.
Code Analysis is making the assumption that your code doesn't check that myCollection is null before operating on it.
It's probably also an additional safeguard for the thread-safe collection types defined in System.Collections.Concurrent. Imagine some thread trying to replace the entire collection by overwriting it. By getting rid of the public setter, the only option the thread has is to call the thread-safe Add and Remove methods.
If you're exposing an IList (which would be better practice) the consumer could replace the collection with an entirely different class that implements IList, which could have unpredictable effects. You could have subscribed to events on that collection, or on items in that collection that you're now incorrectly responding to.
In addition to SimpleCoder's null checking (which is, of course, important), there are other things you need to consider.
1. Someone could replace the List, causing big problems in thread safety.
2. Events from a replaced List won't be sent to subscribers of the old one.
3. You're exposing much, much more behavior than you need to. For example, I wouldn't even make the getter public.
To clarify point 3, don't do cust.Orders.clear(), but make a function called clearOrders() instead.
What if a customer isn't allowed to go over a credit limit? You have no control over that if you expose the list. You'd have to check that (and every other piece of business logic) every place where you might add an order. Yikes! That's a lot of potential for bugs. Instead, you can place it all in an addOrder(Order o) function and be right as rain.
For almost every (I'd say every, but sometimes cheating feels good...) business class, every property should be private for get and set, and if feasible make them readonly too. In this way, users of your class get only behaviors. Protect as much of your data as you can!
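A sketch of what that could look like in C#. The Customer and Order classes, the CreditLimit property and the rule that the sum of order amounts may not exceed it are all assumptions made for the example, not code from the question.

```csharp
using System;
using System.Collections.Generic;

public class Order
{
    public decimal Amount { get; set; }
}

public class Customer
{
    // The list never leaves the class, so it can be neither replaced nor set to null.
    private readonly List<Order> _orders = new List<Order>();

    public decimal CreditLimit { get; }

    public Customer(decimal creditLimit)
    {
        CreditLimit = creditLimit;
    }

    // Read-only view for callers that only need to enumerate or count.
    public IReadOnlyCollection<Order> Orders => _orders.AsReadOnly();

    public void AddOrder(Order order)
    {
        if (order == null) throw new ArgumentNullException(nameof(order));

        decimal total = order.Amount;
        foreach (var existing in _orders) total += existing.Amount;

        // The business rule lives in exactly one place.
        if (total > CreditLimit)
            throw new InvalidOperationException("Order would exceed the credit limit.");

        _orders.Add(order);
    }

    public void ClearOrders() => _orders.Clear();
}
```

Callers can still read customer.Orders, but the only ways to change it are AddOrder and ClearOrders, so the credit check cannot be bypassed.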
ReadOnlyCollection and ReadOnlyObservableCollection exist specifically for read-only collection scenarios.
ReadOnlyObservableCollection is very useful for one way binding in WPF/Silverlight/Metro apps.
If you have a Customer class with a List Property then this property should always have a private setter else it can be changed from outside the customer object via:
customer.Orders = new List<Order>(); // this could overwrite data.
Always use the add and remove methods of the collection.
The Orders List should be created inside the Customer constructor via:
Orders = new List<Order>();
Do you really want to check everywhere in your code whether customer.Orders != null before operating on the Orders?
Or you create the Orders property in your Customer object as suggested and never check for customer.Orders == null; instead you just enumerate the Orders, and if the count is zero nothing happens.

Why do we need Single() in LINQ?

What is the main purpose of the extension method Single()?
I know it will throw an exception if more than one element in the sequence matches the predicate, but I still don't understand in which context it could be useful.
Edit:
I do understand what Single is doing, so you don't need to explain in your answer what this method does.
It's useful for declaratively stating
I want the single element in the list and if more than one item matches then something is very wrong
There are many times when programs need to reduce a set of elements to the one that is interesting based on a particular predicate. If more than one matches, it indicates an error in the program. Without the Single method a program would need to traverse parts of the potentially expensive list more than once.
Compare
Item i = someCollection.Single(thePredicate);
To
Contract.Requires(someCollection.Where(thePredicate).Count() == 1);
Item i = someCollection.First(thePredicate);
The latter requires two statements and iterates a potentially expensive list twice. Not good.
Note: Yes, First is potentially faster because it only has to iterate the enumeration up until the first element that matches. The rest of the elements are of no consequence. Single, on the other hand, must keep scanning past the first match to confirm there is no second one. If multiple matches are of no consequence to your program and indicate no programming errors, then yes, use First.
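A small illustration of the difference, using a made-up User class and UserLookup helper (neither is from the question). With a duplicated id, First quietly returns one of the matches while Single throws InvalidOperationException.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record type for the example.
public class User
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}

public static class UserLookup
{
    // Returns the first match and never notices a duplicate id.
    public static User FindWithFirst(IEnumerable<User> users, int id)
    {
        return users.First(u => u.Id == id);
    }

    // Throws InvalidOperationException when zero or several users share the id.
    public static User FindWithSingle(IEnumerable<User> users, int id)
    {
        return users.Single(u => u.Id == id);
    }
}
```

If ids are supposed to be unique, the Single version turns a silent data problem into an immediate, easy-to-locate failure.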
Using Single allows you to document your expectations on the number of results, and to fail early, fail hard if they are wrong. Unless you enjoy long debugging sessions for their own sake, I'd say it's enormously useful for increasing the robustness of your code.
Most LINQ operators return a sequence, that is, an IEnumerable<T>. To get an actual element, you need one of the First, Last or Single methods. You use the latter if you know for sure the sequence only contains one element. An example would be a 1:1 ID:Name mapping in a database.
A Single will return a single instance of the class/object and not a collection. Very handy when you get a single record by Id. I never expect more than one row.

Transaction support in an observable collection

I'm interested in the most efficient way to change an observable collection so that only one change notification is fired. Let's say that I want to populate the list with 3 items. There is no addCollection method or anything like that, so I have to do a clear plus 3 adds. Do I need to create a different observable collection and assign it? Or what techniques do others use?
The .NET Framework's ObservableCollection class sends an individual notification as each item is added to the collection and provides no mechanism for AddRange-type functionality. However, you can very easily create your own collection that implements INotifyCollectionChanged and sends whatever notifications you like.
One issue you may encounter is that the INotifyCollectionChanged interface includes the ability to specify that multiple items were added to the collection in a single message, but no standard .NET Framework classes actually create these notifications. Because of this, some third-party and open source controls assume only one item has been added when they receive an Add notification. Even the built-in .NET Framework classes may have undiscovered bugs related to this.
For these reasons I would recommend your custom collection have a mode in which it can be set to always send a Reset notification at the end of an AddRange instead of a single multi-item Add notification. You could optimize this further by sending multiple single-item Add notifications or a Reset notification depending on the actual number of items added.
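One way to sketch the Reset variant is to derive from ObservableCollection<T> and write to its protected Items list, which raises no events by itself. RangeObservableCollection, AddRange and ReplaceWith are names chosen for this example; they are not part of the framework.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Collections.Specialized;
using System.ComponentModel;

public class RangeObservableCollection<T> : ObservableCollection<T>
{
    // Appends all items and raises a single Reset instead of one Add per item.
    public void AddRange(IEnumerable<T> items)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        CheckReentrancy();

        // Copy first so that passing this collection itself is safe.
        var snapshot = new List<T>(items);
        foreach (var item in snapshot) Items.Add(item);
        RaiseReset();
    }

    // Clear plus add, again with only one notification.
    public void ReplaceWith(IEnumerable<T> items)
    {
        if (items == null) throw new ArgumentNullException(nameof(items));
        CheckReentrancy();

        var snapshot = new List<T>(items);
        Items.Clear();
        foreach (var item in snapshot) Items.Add(item);
        RaiseReset();
    }

    private void RaiseReset()
    {
        // Same property notifications the base class sends on a change.
        OnPropertyChanged(new PropertyChangedEventArgs(nameof(Count)));
        OnPropertyChanged(new PropertyChangedEventArgs("Item[]"));
        OnCollectionChanged(
            new NotifyCollectionChangedEventArgs(NotifyCollectionChangedAction.Reset));
    }
}
```

For the three-item case in the question, a single call to ReplaceWith would produce one CollectionChanged event with the Reset action.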
Of course there are situations in which it is just as easy to replace the ObservableCollection with a new one. At times this will be much less efficient than looping Add() because event handlers and CollectionViews are rebuilt. Other times it will be more efficient if the collection is large and your loop only adds a few items at a time.
And sometimes it won't work at all.
