I'm trying to turn an IEnumerable into an IObservable that delivers its items in chunks one second apart.
var spartans = Enumerable.Range(0, 300).ToObservable();

spartans
    .Window(30)
    .Zip(Observable.Timer(DateTimeOffset.Now, TimeSpan.FromMilliseconds(1000)), (x, _) => x)
    .SelectMany(w => w)
    .Subscribe(
        n => Console.WriteLine("{0}", n),
        () => Console.WriteLine("all end"));
With this code, the only thing that is printed is "all end" after ten seconds. If I remove the .Zip then the entire sequence prints instantaneously, and if I remove the .Window and .SelectMany then the entire sequence prints one item per second. If I peek into the "windowed" observable inside the lambda passed to SelectMany, I can see that it is empty. My question is, why?
The problem is occurring because of how Window works with a count - and this one isn't particularly intuitive!
As you know, Window serves a stream of streams. However, with a count, the child streams are "warm" - i.e. when an observer of this stream receives a new window in its OnNext handler, it must subscribe to it before it cedes control back to the observable, or the events are lost.
Zip doesn't "know" it's dealing with this situation, and doesn't give you the opportunity to subscribe to each child window before it grabs the next.
If you remove the Zip, you see all the events because the SelectMany does subscribe to all the child windows as it receives them.
The easiest fix is to use Buffer instead of Window - make that one change and your code works. That's because Buffer works very similarly to SelectMany, effectively preserving the windows by doing this:
Window(30).SelectMany(x => x.ToList())
The elements are no longer warm windows but are crystallized as lists, and your Zip will now work as expected, with the following SelectMany flattening the lists out.
Important Performance Consideration
It's important to note that this approach will cause the entire IEnumerable<T> to be run through in one go. If the source enumerable should be lazily evaluated (which is usually desirable), you'll need to go a different way. Using a downstream observable to control the pace of an upstream one is tricky ground.
Let's replace your enumerable with a helper method so we can see when each batch of 30 is evaluated:
static IEnumerable<int> Spartans()
{
    for (int i = 0; i < 300; i++)
    {
        if (i % 30 == 0)
            Console.WriteLine("30 More!");

        yield return i;
    }
}
And use it like this (with the Buffer "fix" here, but the behaviour is similar with Window):
Spartans().ToObservable()
    .Buffer(30)
    .Zip(Observable.Timer(DateTimeOffset.Now,
                          TimeSpan.FromMilliseconds(1000)),
         (x, _) => x)
    .SelectMany(w => w)
    .Subscribe(
        n => Console.WriteLine("{0}", n),
        () => Console.WriteLine("all end"));
Then you see this kind of output demonstrating how the source enumerable is drained all at once:
30 More!
0
1
...miss a few...
29
30 More!
30 More!
30 More!
30 More!
30 More!
30 More!
30 More!
30 More!
30 More!
30
31
32
...etc...
To truly pace the source, rather than using ToObservable() directly you could do the following. Note the Buffer operation on the Spartans() IEnumerable<T> comes from the NuGet package Ix-Main - added by the Rx team to plug a few holes in the IEnumerable<T> monad:
var spartans = Spartans().Buffer(30);
var pace = Observable.Timer(DateTimeOffset.Now, TimeSpan.FromMilliseconds(1000));

pace.Zip(spartans, (_, x) => x)
    .SelectMany(x => x)
    .Subscribe(
        n => Console.WriteLine("{0}", n),
        () => Console.WriteLine("all end"));
And the output becomes a probably much more desirable lazily evaluated output:
30 More!
0
1
2
...miss a few...
29
30 More!
30
31
32
...miss a few...
59
30 More!
60
61
62
...etc
I'm not sure how to get this working with Window, but what about this:
var spartans = Enumerable.Range(0, 300).ToObservable();

spartans
    .Select(x => Observable.Timer(TimeSpan.FromSeconds(1)).Select(_ => x))
    .Merge(30);

Here Merge(30) caps the number of concurrently subscribed inner timers at 30, so the values should surface in waves of roughly 30 per second.
Related
I am making a meme ranking app that ranks your favorite memes from best to worst, and removes the lowest-ranked memes past a certain point to clear out old and outdated memes and save disk space. I thought that because List<T>.Sort() is pretty fast, it would quickly help the user sort through possibly hundreds of memes. This was not the case: when I tried to sort using the method below, I got some strange results.
// Using Task.Run() temporarily due to the easy access. Will thread this properly in the future.
Task.Run(() =>
{
    Manager.Files.Sort(delegate (Photo x, Photo y)
    {
        // I have Invoke built into the ChangeImage method but having double duty doesn't slow it down.
        Invoke(new MethodInvoker(() =>
        {
            ChangeImage(pictureBox1, x.Filename);
            ChangeImage(pictureBox2, y.Filename);
        }));

        WaitForButtonPress.WaitOne(); // Pauses the thread until an image is chosen.

        switch (LastButton)
        {
            case 1:  // x is better than y
                return 1;
            case 2:  // y is better than x
                return -1;
            case 3:  // x and y are equal
                return 0;
            default:
                return 0;
        }
    });
});
The issue I'm having with this code is that sometimes pepe.jpg and isThisAPidgon.png are compared to each other multiple times, especially at the end of a streak of appearing in comparisons: pepe.jpg vs. 1.jpg, pepe.jpg vs. 2.png, ..., pepe.jpg vs. nth.jpg, pepe.jpg vs. isThisAPidgon.png, then isThisAPidgon.png vs. pepe.jpg again, reversed. Upon finding this strange behavior I tried checking how many times they are compared.
static void Main(string[] args)
{
    List<Number> numbers = new List<Number>();
    Random rand = new Random();

    for (int i = 0; i < 500; i++)
    {
        numbers.Add(new Number() { Num = rand.Next(0, 500) });
    }

    foreach (Number num in numbers)
    {
        Console.WriteLine(num.num);
    }

    numbers.Sort((Number x, Number y) =>
    {
        int numx = x.Num;
        int numy = y.Num;

        if (numx > numy)
            return 1;
        else if (numy > numx)
            return -1;
        else
            return 0;

        //return x.Num - y.Num;
    });

    int total = 0;
    foreach (Number num in numbers)
    {
        Console.WriteLine($"Num: {num.num} Times Checked: {num.timesChecked}");
        total += num.timesChecked;
    }

    Console.WriteLine($"Finished with {total} checks.");
}
Class Number:
class Number
{
    public int num;
    public int timesChecked = 0;

    public int Num
    {
        get { timesChecked++; return num; }
        set => num = value;
    }
}
Both the <==> comparison (returning 1, -1, or 0) and returning the difference of x.num and y.num yield the same result: some items are compared far more often than others. Here are some examples.
#checked with differences
Num: 168 Times Checked: 8
Num: 170 Times Checked: 17
Num: 170 Times Checked: 316 #316?
Num: 170 Times Checked: 14
Num: 171 Times Checked: 15
#checked with differences
Num: 237 Times Checked: 12
Num: 237 Times Checked: 9
Num: 240 Times Checked: 105 #More reasonable...
Num: 241 Times Checked: 14
Num: 242 Times Checked: 15
#checked with differences
Num: 395 Times Checked: 10
Num: 397 Times Checked: 8
Num: 398 Times Checked: 502 #How could it fail to sort this number in more tries than the array is long?
Num: 398 Times Checked: 7
Num: 399 Times Checked: 8
#checked with <==>
Num: 306 Times Checked: 15
Num: 306 Times Checked: 17
Num: 307 Times Checked: 756 #This is ridiculous how does this happen?
Num: 307 Times Checked: 13
Num: 309 Times Checked: 15
Checking with differences stays below 10,000 total checks, but the <==> (1/-1/0) method seems to consistently go above 15,000. Is there a sorting algorithm that focuses on reducing the number of comparisons an object needs in order to be sorted?
Edit: I made a mistake in the <==> comparison example. I was reading x.Num and y.Num twice, which inflated the result. After storing the two properties in locals, the total dropped from above 15,000 to below 10,000 (around 9,000), while subtracting still remains below 10,000 (around 8,000).
Most sort algorithms have complexity O(n log n), which means they need to perform on the order of n log n comparisons in order to sort the data. So, no, you aren't going to be able to use Sort for what you are doing.
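To put a number on that: counting the comparisons made by List<T>.Sort on 500 random items (a minimal sketch; the exact count depends on the runtime's sort implementation):

```csharp
using System;
using System.Collections.Generic;

class ComparisonCountDemo
{
    static void Main()
    {
        var rand = new Random(42);   // fixed seed so the run is repeatable
        var items = new List<int>();
        for (int i = 0; i < 500; i++)
            items.Add(rand.Next(0, 500));

        int comparisons = 0;
        items.Sort((x, y) => { comparisons++; return x.CompareTo(y); });

        // A comparison sort needs at least n - 1 comparisons, and on the
        // order of n * log2(n) (roughly 4500 for n = 500) in practice.
        Console.WriteLine($"{comparisons} comparisons for {items.Count} items");
    }
}
```

If every comparison means showing the user a pair of memes, thousands of clicks for a few hundred items is the floor, not a bug.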
Secondly, some built-in Sort methods switch behavior depending on the size of the list, so your user interface may feel very different depending on which algorithm gets picked. I've never seen someone use Sort to drive UI behavior before - novel, but unusual.
If you do want to use a sort algorithm maybe go for insertion sort (compare each new item against existing list using binary search to find where it goes) or quicksort (partition the elements into two sets by comparing one against all others).
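The insertion-sort idea can be sketched like this, using BinarySearch so each new item costs only O(log n) questions. The Rank/ask names are illustrative; in the real app the comparison delegate would display two images and block until the user clicks a button:

```csharp
using System;
using System.Collections.Generic;

class BinaryInsertRanker
{
    // Inserts each new item into an already-ranked list, asking the
    // comparer only O(log n) questions per item via binary search.
    public static List<T> Rank<T>(IEnumerable<T> items, Comparison<T> ask)
    {
        var ranked = new List<T>();
        var comparer = Comparer<T>.Create(ask);
        foreach (var item in items)
        {
            int index = ranked.BinarySearch(item, comparer);
            if (index < 0) index = ~index;   // complement gives the insertion point
            ranked.Insert(index, item);
        }
        return ranked;
    }

    static void Main()
    {
        // Here `ask` is a plain comparison; in the app it would be the user.
        var result = Rank(new[] { 3, 1, 4, 1, 5, 9, 2, 6 }, (x, y) => x.CompareTo(y));
        Console.WriteLine(string.Join(", ", result)); // 1, 1, 2, 3, 4, 5, 6, 9
    }
}
```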
But ... I don't think either of these is going to be a great user experience; both will feel repetitive. And, given that this is a subjective question, the answer often isn't a purely linear ordering of items. People aren't consistent, and they will create cycles A->B->C->A when they do this.
So here's a suggestion for a UI experience that feels less repetitive, can handle subjective anomalies and is easy to implement:
Pick pairs of images maybe at random and ask the user to rank one over the other. Let the user be inconsistent if they wish. Take each pair A->B that they create and put them in a Graph. Find any nodes in the graph that aren't connected yet, or which only have a single connection, and focus asking how they rank against nodes you've already scored.
That way if they've ranked A->B->C and then rank C->D the algorithm isn't going to keep asking how A and B compare to D.
And finally apply a technique called Topological Sort and ignore any cycles you find. An approximate topological sort if you like.
There's a Graph library (that I wrote) which includes this capability. See this test for an example of calling .TopologicalSortApprox().
Once all the items are in the graph you can keep going, using comparisons that try to flatten the graph closer to a straight line. But at any time if the user gets bored and wants to stop (nobody wants to do n log n comparisons!) you at least have an approximate rank you can use.
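In case you'd rather not take a library dependency, here is a minimal hand-rolled sketch of an approximate topological sort: repeatedly emit the node with the fewest unresolved "losses", so a cycle can't deadlock the ordering. All names are illustrative, and this is far less efficient than a proper graph library:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class ApproxTopoSort
{
    // An edge (Winner, Loser) means "Winner was ranked above Loser".
    public static List<string> Sort(List<(string Winner, string Loser)> edges)
    {
        var remaining = new HashSet<string>(
            edges.SelectMany(e => new[] { e.Winner, e.Loser }));
        var result = new List<string>();

        while (remaining.Count > 0)
        {
            // Normally we'd emit a node with no unresolved losses; inside a
            // cycle no such node exists, so take the one with the fewest.
            var next = remaining
                .OrderBy(n => edges.Count(e => e.Loser == n && remaining.Contains(e.Winner)))
                .First();
            remaining.Remove(next);
            result.Add(next);
        }
        return result;
    }

    static void Main()
    {
        // A beats B, B beats C, plus an inconsistent C beats A.
        var order = Sort(new List<(string, string)> { ("A", "B"), ("B", "C"), ("C", "A") });
        Console.WriteLine(string.Join(" > ", order));
    }
}
```

The cycle edge is simply absorbed: every node still gets a rank, which is exactly the "approximate" behavior described above.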
LINQ evaluates clauses from right to left? Is that why so many articles explaining "lazy evaluation" put a Take operation at the end?
In the following example, Code Snippet 2 is a lot faster than Code Snippet 1 because it doesn't call "ToList" at every step.
Code Snippet 1 (Takes about 13000 msec)
var lotsOfNums = Enumerable.Range(0, 10000000).ToList();
Stopwatch sw = new Stopwatch();
sw.Start();
// Get all the even numbers
var a = lotsOfNums.Where(num => num % 2 == 0).ToList();
// Multiply each even number by 100.
var b = a.Select(num => num * 100).ToList();
var c = b.Select(num => new Random(num).NextDouble()).ToList();
// Get the top 10
var d = c.Take(10);
// a, b and c have each fully executed; d is still a deferred query.
foreach (var num in d)
{
Console.WriteLine(num);
}
sw.Stop();
Console.WriteLine("Elapsed milliseconds: " + sw.ElapsedMilliseconds);
Code Snippet 2 (3 msec)
sw.Reset();
sw.Start();
var e = lotsOfNums.Where(num => num % 2 == 0).Select(num => num * 100).Select(num => new Random(num).NextDouble()).Take(10);
foreach (var num in e)
{
Console.WriteLine(num);
}
sw.Stop();
Console.WriteLine("Elapsed milliseconds: " + sw.ElapsedMilliseconds);
Console.Read();
However, for Code Snippet 2, I find the relative position of "Take" doesn't seem to matter?
To be specific, I changed from:
var e = lotsOfNums.Where(num => num % 2 == 0).Select(num => num * 100).Select(num => new Random(num).NextDouble()).Take(10);
To:
var e = lotsOfNums.Take(10).Where(num => num % 2 == 0).Select(num => num * 100).Select(num => new Random(num).NextDouble());
There's no difference in performance?
Also worth noting: if you move the NextDouble select to the front, your result list will be empty - the Where clause now filters doubles in [0, 1), which are almost never exactly divisible by 2 - and since every element must pass through Select(NextDouble) before the filter, the whole list gets enumerated and it takes much longer to evaluate.
var e = lotsOfNums.Select(num => new Random(num).NextDouble()).Where(num => num % 2 == 0).Select(num => num * 100).Take(10);
LINQ evaluates clauses from right to left?
No, clauses are evaluated left to right. Everything is evaluated left to right in C#.
Is that why so many articles explaining "lazy evaluation" put a Take operation at the end?
I don't understand the question.
UPDATE: I understand the question. The original poster believes incorrectly that Take has the semantics of ToList; that it executes the query, and therefore goes at the end. This belief is incorrect. A Take clause just appends a Take operation to the query; it does not execute the query.
You must put the Take operation where it needs to be. Remember, x.Take(y).Where(z) and x.Where(z).Take(y) are very different queries. You can't just move a Take around without changing the meaning of the query, so put it in the right place: as early as possible, but not so early that it changes the meaning of the query.
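A tiny demonstration of why the two placements are different queries:

```csharp
using System;
using System.Linq;

class TakePlacementDemo
{
    static void Main()
    {
        var nums = Enumerable.Range(0, 10);   // 0..9

        // Take first, then filter: only 0..4 are ever examined.
        var takeThenWhere = nums.Take(5).Where(n => n % 2 == 0).ToList();
        Console.WriteLine(string.Join(",", takeThenWhere)); // 0,2,4

        // Filter first, then take: keeps the first five even numbers.
        var whereThenTake = nums.Where(n => n % 2 == 0).Take(5).ToList();
        Console.WriteLine(string.Join(",", whereThenTake)); // 0,2,4,6,8
    }
}
```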
Position of "NextDouble" select clause matters?
Matters to who? Again, I don't understand the question. Can you clarify it?
Why do code snippet 1 and code snippet 2 have the same performance stats?
Since you have not given us your measurements, we have no basis upon which to make a comparison. But your two code samples do completely different things; one executes a query, one just builds a query. Building a query that is never executed is faster than executing it!
I thought "ToList" forces early evaluation, thus making things slower?
That's correct.
There's no difference in performance? (between my two query constructions)
You've constructed two queries; you have not executed them. Construction of queries is fast, and not typically worth measuring. Measure the performance of the execution of the query, not the construction of the query, if you want to know how fast the query executes!
You seem to have the impression that .Take() forces evaluation, which it does not. You're seeing similar performance regardless of the position of Take() because your query isn't actually being evaluated at all. You have to add a .ToList() at the end (or iterate over the result) to test the performance of the query you've built.
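The deferred-execution point is easy to see with a side-effect counter (a minimal sketch):

```csharp
using System;
using System.Linq;

class DeferredExecutionDemo
{
    static void Main()
    {
        int calls = 0;
        var query = Enumerable.Range(0, 1000)
            .Select(n => { calls++; return n * 2; })
            .Take(10);

        // Nothing has run yet: the query was only constructed.
        Console.WriteLine(calls); // 0

        var results = query.ToList();

        // Execution happened lazily: Take(10) pulled just 10 elements
        // through the Select, not all 1000.
        Console.WriteLine(calls); // 10
    }
}
```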
I would like to process some items in parallel. This processing is independent (order does not matter) and returns an output. These outputs should then be relayed back in order as quickly as possible.
That is to say, the method should behave equivalently to this (except calling Process in parallel):
IEnumerable<T> OrderedParallelImmediateSelect<T>(IEnumerable<T> source)
{
    foreach (var input in source)
    {
        var result = Process(input);
        yield return result;
    }
}
Accordingly, it is required to try to process the items in order. As this is (of course) not guaranteed to finish in order, the result collector must be sure to wait for delayed results.
As soon as the next result in order comes in, it must be returned immediately. We cannot wait for the whole input to be processed before sorting the results.
This is an example of how this could look:
begin 0
begin 1 <-- we start processing in increasing order
begin 2
complete 1 <-- 1 is complete but we are still waiting for 0
begin 3
complete 0 <-- 0 is complete, so we can return it and 1, too
return 0
return 1
begin 4
begin 5
complete 4 <-- 2 and 3 are missing before we may return this
complete 2 <-- 2 is done, 4 must keep waiting
return 2
begin 6
complete 3 <-- 3 and 4 can now be returned
return 3
return 4
If at all possible, I would like to perform processing on a regular thread pool.
Is this scenario something .NET provides a solution for? I've built a custom solution, but would prefer to use something simpler.
I'm aware of a lot of similar questions, but it seems they all either allow waiting for all items to finish processing or do not guarantee ordered results.
Here's an attempt that sadly does not seem to work. Replacing IEnumerable with ParallelQuery had no effect.
int Process(int item)
{
    Console.WriteLine($"+ {item}");
    Thread.Sleep(new Random(item).Next(100, 1000));
    Console.WriteLine($"- {item}");
    return item;
}

void Output(IEnumerable<int> items)
{
    foreach (var it in items)
    {
        Console.WriteLine($"=> {it}");
    }
}

IEnumerable<int> OrderedParallelImmediateSelect(IEnumerable<int> source)
{
    // This processes in parallel but does not return the results immediately
    return source.AsParallel().AsOrdered().Select(Process);
}

var input = Enumerable.Range(0, 20);
Output(OrderedParallelImmediateSelect(input));
Output:
+0 +1 +3 +2 +4 +5 +6 +7 +9 +10 +11 +8 -1 +12 -3 +13 -5 +14 -7 +15 -9 +16 -11 +17 -14 +18 -16 +19 -0 -18 -2 -4 -6 -8 -13 -10 -15 -17 -12 -19 =>0 =>1 =>2 =>3 =>4 =>5 =>6 =>7 =>8 =>9 =>10 =>11 =>12 =>13 =>14 =>15 =>16 =>17 =>18 =>19
I created this program, as a console application:
using System;
using System.Linq;
using System.Threading;

namespace PlayAreaCSCon
{
    class Program
    {
        static void Main(string[] args)
        {
            var items = Enumerable.Range(0, 1000);
            int prodCount = 0;

            foreach (var item in items.AsParallel()
                                      .AsOrdered()
                                      .WithMergeOptions(ParallelMergeOptions.NotBuffered)
                                      .Select((i) =>
                                      {
                                          Thread.Sleep(i % 100);
                                          Interlocked.Increment(ref prodCount);
                                          return i;
                                      }))
            {
                Console.WriteLine(item);
            }

            Console.ReadLine();
        }
    }
}
I then initially set a breakpoint on Console.WriteLine(item);. Running the program, when I first hit that breakpoint, prodCount is 5 - we're definitely consuming results before all processing has completed. And after removing the breakpoint, all results appear to be produced in the original order.
The ParallelMergeOptions.NotBuffered disables the buffering of the output, but there is also buffering happening on the other side. PLINQ employs chunk partitioning by default, which means that the source is enumerated in chunks. This is easy to miss, because the chunks initially have a size of one and become progressively larger as the enumeration unfolds. To remove the buffering at the input side, you must use the EnumerablePartitionerOptions.NoBuffering option:
IEnumerable<int> OrderedParallelImmediateSelect(IEnumerable<int> source)
{
    return Partitioner
        .Create(source, EnumerablePartitionerOptions.NoBuffering)
        .AsParallel()
        .AsOrdered()
        .WithMergeOptions(ParallelMergeOptions.NotBuffered)
        .Select(Process);
}
Something else you might be interested to know is that the current thread participates in the processing of the source, along with ThreadPool threads. So if you have additional work to do during the enumeration of the resulting parallel query, this work will use less than the full power of a thread. It will be like running on a low-priority thread. If you don't want this to happen, you can offload the enumeration of the query to a separate ThreadPool thread, so that the Process runs only on ThreadPool threads, and the current thread is freed and can dedicate itself to the work on the results. There is a custom OffloadEnumeration method in this answer, that could be appended at the end of the query:
//...
.Select(Process)
.OffloadEnumeration();
...or used in the foreach loop:
foreach (var item in OffloadEnumeration(query)) // ...
Sorry, kinda new to RX.NET here... I've been able to do a lot of things in RX, but not the main thing I need to do.
I only want the observable to pop if it has received X elements in the past Y seconds. In other words, at any point in time, when a new element is pushed, look back at the past X elements and see if they all occurred within the past Y seconds.
I can't figure out the "look back and count" part.
It seems like Window would be the right operator, due to its slidability, but perhaps I'm biased in my definition of Window. It feels like the window-closing function is where I'd do that, but I'm no wizard yet.
This function will return the list of x elements that fall within your y timespan.
public static IObservable<IList<T>> GetFullWindows<T>(this IObservable<T> source, int x, TimeSpan y)
{
    return source
        .Publish(_obs => _obs
            .Window(_obs, _ => Observable.Timer(y))
            .SelectMany(window => window.Take(x).ToList())
            .Where(l => l.Count == x)
        );
}
Here's a usage example:
var obs = Observable.Generate(0, i => i < 20, i => i + 1, i => i,
    i => TimeSpan.FromMilliseconds((20 - i) * 100));

var b = obs.GetFullWindows(4, TimeSpan.FromSeconds(4.5));

b.Subscribe(l => l.Dump()); // using LINQPad
The trick is using Buffer or Window with the overlapping window feature. Window works better here, because you can short-circuit it if you have met the count criteria.
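Independent of Rx, the "look back at the last X arrivals" check the question describes can be sketched with a plain queue of timestamps; the class and method names here are illustrative, not part of any library:

```csharp
using System;
using System.Collections.Generic;

// Tracks arrival times and reports whether the newest element plus the
// x - 1 elements before it all fell within the window y.
class LookBackCounter
{
    private readonly Queue<DateTime> _arrivals = new Queue<DateTime>();
    private readonly int _x;
    private readonly TimeSpan _y;

    public LookBackCounter(int x, TimeSpan y) { _x = x; _y = y; }

    public bool OnNext(DateTime arrival)
    {
        _arrivals.Enqueue(arrival);
        while (_arrivals.Count > _x) _arrivals.Dequeue();   // keep only the last x
        return _arrivals.Count == _x && arrival - _arrivals.Peek() <= _y;
    }
}

class Demo
{
    static void Main()
    {
        var counter = new LookBackCounter(3, TimeSpan.FromSeconds(1));
        var t0 = new DateTime(2024, 1, 1);
        Console.WriteLine(counter.OnNext(t0));                       // False (only 1 element)
        Console.WriteLine(counter.OnNext(t0.AddMilliseconds(400)));  // False (only 2 elements)
        Console.WriteLine(counter.OnNext(t0.AddMilliseconds(800)));  // True  (3 within 1 s)
        Console.WriteLine(counter.OnNext(t0.AddSeconds(5)));         // False (spread too wide)
    }
}
```

The GetFullWindows extension above does the equivalent job reactively; this sketch just makes the underlying sliding-count logic explicit.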
I have an observable which streams one value per millisecond, delivered in batches: every 250 ms it produces 250 values (give or take).
Mock sample code:
IObservable<IEnumerable<int>> input = from _ in Observable.Interval(TimeSpan.FromMilliseconds(250))
                                      select CreateSamples(250);

input.Subscribe(values =>
{
    for (int i = 0; i < values.Count(); i++)
    {
        Console.WriteLine("Value : {0}", i);
    }
});

Console.ReadKey();

private static IEnumerable<int> CreateSamples(int count)
{
    for (int i = 0; i < count; i++)
    {
        yield return i;
    }
}
What I need is to create some form of processing observable which processes the input observable at a rate of 8 values every 33 ms.
Something along the lines of this:
IObservable<IEnumerable<int>> process = from _ in Observable.Interval(TimeSpan.FromMilliseconds(33))
                                        select stream.Take(8);
I was wondering two things:
1) How can I write the first sample with the built-in operators that Reactive Extensions provides?
2) How can I create that process stream which takes values from the input stream with the behavior I've described?
I tried using Window as suggested in the comments below:

input.Window(TimeSpan.FromMilliseconds(33)).Take(8).Subscribe(winObservable => Debug.WriteLine(" !! "));

It seems as though I get 8, and only 8, observables of an unknown number of values. What I require is a recurrence of 8 values every 33 ms from the input observable. What the code above did is produce 8 observables of IEnumerable and then stand idle.
EDIT: Thanks to James World, here's a sample:

var input = Observable.Range(1, int.MaxValue);

var timedInput = Observable.Interval(TimeSpan.FromMilliseconds(33))
    .Zip(input.Buffer(8), (_, buffer) => buffer);

timedInput.SelectMany(x => x).Subscribe(Console.WriteLine);
But now it gets trickier: I need the Buffer size to be calculated from the actual milliseconds that pass between intervals. When you write TimeSpan.FromMilliseconds(33), the timer's Interval event is actually raised around every 45 ms, give or take. Is there any way to calculate the buffer size, something like this pseudocode:

input.TimeInterval().Buffer(s => s.Interval.Milliseconds / 4)
You won't be able to do this with any kind of accuracy with a reasonable solution, because the default .NET timer resolution is 15 ms.
If the timer was fast enough, you would have to flatten and repackage the stream with a pacer, something like:
// flatten stream
var fs = input.SelectMany(x => x);

// buffer 8 values and release every 33 milliseconds
var xs = Observable.Interval(TimeSpan.FromMilliseconds(33))
    .Zip(fs.Buffer(8), (_, buffer) => buffer);
Although as I said, this will give very jittery timing. If that kind of timing resolution is important to you, go native!
I agree with James' analysis.
I'm wondering if this query gives you a better result:
IObservable<IList<int>> input =
    Observable
        .Generate(
            0,
            x => true,
            x => x < 250 ? x + 1 : 0,
            x => x,
            x => TimeSpan.FromMilliseconds(33.0 / 8.0))
        .Buffer(TimeSpan.FromMilliseconds(33.0));