I have an observable that streams one value per millisecond, delivered in batches every 250 ms (meaning roughly 250 values every 250 ms, give or take).
Mock sample code:
IObservable<IEnumerable<int>> input = from _ in Observable.Interval(TimeSpan.FromMilliseconds(250))
select CreateSamples(250);
input.Subscribe(values =>
{
    foreach (var value in values)
    {
        Console.WriteLine("Value : {0}", value);
    }
});
Console.ReadKey();
private static IEnumerable<int> CreateSamples(int count)
{
    for (int i = 0; i < count; i++)
    {
        yield return i;
    }
}
What I need is to create some form of process observable which processes the input observable at a rate of 8 values every 33 ms. Something along the lines of this:
IObservable<IEnumerable<int>> process = from _ in Observable.Interval(TimeSpan.FromMilliseconds(33))
select stream.Take(8);
I was wondering two things:
1) How can I write the first sample with the built-in operators that Reactive Extensions provides?
2) How can I create that process stream which takes values from the input stream with the behavior I've described?
I tried using Window, as suggested in the comments below.
input.Window(TimeSpan.FromMilliseconds(33)).Take(8).Subscribe(winObservable => Debug.WriteLine(" !! "));
It seems as though I get 8, and only 8, observables of an unknown number of values.
What I require is a recurrence of 8 values every 33 ms from the input observable.
What the code above did is produce 8 observables of IEnumerable and then stand idle.
EDIT: Thanks to James World, here's a sample.
var input = Observable.Range(1, int.MaxValue);
var timedInput = Observable.Interval(TimeSpan.FromMilliseconds(33))
.Zip(input.Buffer(8), (_, buffer) => buffer);
timedInput.SelectMany(x => x).Subscribe(Console.WriteLine);
But now it gets trickier: I need the Buffer count to be calculated from the actual milliseconds that pass between Intervals, because when you write TimeSpan.FromMilliseconds(33), the Interval event of the timer is actually raised around every 45 ms, give or take.
Is there any way to calculate the buffer size, something like this pseudocode:
input.TimeInterval().Buffer( s => s.Interval.Milliseconds / 4)
You won't be able to do this with any kind of accuracy in a reasonable solution, because the default .NET timer resolution is about 15 ms.
If the timer were fast enough, you would have to flatten and repackage the stream with a pacer, something like:
// flatten stream
var fs = input.SelectMany(x => x);
// buffer 8 values and release every 33 milliseconds
var xs = Observable.Interval(TimeSpan.FromMilliseconds(33))
.Zip(fs.Buffer(8), (_,buffer) => buffer);
Although as I said, this will give very jittery timing. If that kind of timing resolution is important to you, go native!
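As an aside, you can get a rough feel for this granularity on your own machine. The sketch below (my own illustration, not from the original answer; the names are mine) times a batch of Thread.Sleep(1) calls; with the default ~15 ms Windows timer period the average lands far above the requested 1 ms:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class SleepGranularity
{
    // Average wall-clock duration of a 1 ms sleep request over several reps.
    public static double AverageSleepMs(int reps)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < reps; i++)
            Thread.Sleep(1); // request 1 ms; the OS scheduler decides what we get
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds / reps;
    }

    static void Main()
    {
        // Typically ~15 ms with the default Windows timer period,
        // often much closer to 1 ms on other systems.
        Console.WriteLine($"average Thread.Sleep(1): {AverageSleepMs(50):F2} ms");
    }
}
```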
I agree with James' analysis.
I'm wondering if this query gives you a better result:
IObservable<IList<int>> input =
Observable
.Generate(
0,
x => true,
x => x < 250 ? x + 1 : 0,
x => x,
x => TimeSpan.FromMilliseconds(33.0 / 8.0))
.Buffer(TimeSpan.FromMilliseconds(33.0));
Related
I need to convert a large List of length n into a double[n,1] array. What is the fastest way to make the conversion?
For further background, this is to pass into an Excel object's Range.Value property, which requires a two-dimensional array.
I'm writing this on the assumption that you really want the most efficient way to do this. Extreme performance almost always comes with a trade-off, usually code readability.
I can still substantially optimize one part of this as the comments note, but I didn't want to go overboard using dynamic methods on first pass.
const int TEST_SIZE = 100 * 1000;
//Test data setup
var list = new List<double>();
for (int i = 0; i < TEST_SIZE; i++)
list.Add(i);
//Grab the list's underlying array, which is not public
//This can be made MUCH faster with dynamic methods if you want me to optimize
var underlying = (double[])typeof(List<double>)
.GetField("_items", BindingFlags.NonPublic | BindingFlags.Instance)
.GetValue(list);
//We need the actual length of the list because there can be extra space in the array
//Do NOT use "underlying.Length"
int underlyingLength = list.Count;
//Benchmark it
var sw = Stopwatch.StartNew();
var twodarray = new double[underlyingLength, 1];
Buffer.BlockCopy(underlying, 0, twodarray, 0, underlyingLength * sizeof(double));
var elapsed = sw.Elapsed;
Console.WriteLine($"Elapsed: {elapsed}");
Output:
Elapsed: 00:00:00.0001998
Hardware used:
AMD Ryzen 7 3800X @ 3.9 GHz
32 GB DDR4 3200 RAM
I think this is what you want.
This operation will take no more than a few milliseconds even on a slow core. So why bother? How many times will you do this conversion? If millions of times, then try to find a better approach. But if you do this when the end-user presses a button...
Criticize the answer, but please provide metrics if it's about efficiency.
// Populate a List with 100,000 doubles
Random r = new Random();
List<double> dList = new List<double>();
int i = 0;
while (i++ < 100000) dList.Add(r.NextDouble());
// Convert to double[100000,1]
Stopwatch chrono = Stopwatch.StartNew();
// Conversion:
double[,] ddArray = new double[dList.Count, 1];
int dIndex = 0;
dList.ForEach((x) => ddArray[dIndex++, 0] = x);
Console.WriteLine("Completed in: {0}ms", chrono.Elapsed);
Outputs: (10 repetitions) - Maximum: 2.6 ms
Completed in: 00:00:00.0020677ms
Completed in: 00:00:00.0026287ms
Completed in: 00:00:00.0013854ms
Completed in: 00:00:00.0010382ms
Completed in: 00:00:00.0019168ms
Completed in: 00:00:00.0011480ms
Completed in: 00:00:00.0011172ms
Completed in: 00:00:00.0013586ms
Completed in: 00:00:00.0017165ms
Completed in: 00:00:00.0010508ms
Edit 1.
double[,] ddArray = new double[dList.Count, 1];
foreach (double x in dList) ddArray[dIndex++, 0] = x;
seems just a little bit faster, but needs more testing:
Completed in: 00:00:00.0020318ms
Completed in: 00:00:00.0019077ms
Completed in: 00:00:00.0023162ms
Completed in: 00:00:00.0015881ms
Completed in: 00:00:00.0013692ms
Completed in: 00:00:00.0022482ms
Completed in: 00:00:00.0015960ms
Completed in: 00:00:00.0012306ms
Completed in: 00:00:00.0015039ms
Completed in: 00:00:00.0016553ms
I'm trying to send/publish at 100ms, and the message looks like this
x.x.x.x.x.x.x.x.x.x
So every 100 ms or so, subscribe will be called. My problem is that I think it's not fast enough (i.e., the current subscribe callback may not be done when the next message is published).
I was wondering how I could keep populating the list while at the same time graphing with OxyPlot. Can I use threading for this?
var x = 0;
channel.Subscribe(message =>
{
this.RunOnUiThread(() =>
{
var sample = message.Data;
byte[] data = (byte[])sample;
var data1 = System.Text.Encoding.ASCII.GetString(data);
var splitData = data1.Split('-');
foreach(string s in splitData) //Contains 10
{
double y = double.Parse(s);
y /= 100;
series1.Points.Add(new DataPoint(x, y));
MyModel.InvalidatePlot(true);
x++;
}
if (x >= xaxis.Maximum)
{
xaxis.Pan(xaxis.Transform(-1 + xaxis.Offset));
}
});
});
Guaranteeing a minimum execution time falls under realtime programming, and with an app on a smartphone OS you are about as far from that as I can imagine. The only thing farther "off" would be an interpreted language (PHP, Python).
The only thing you can do is define a minimum time between iterations. I once wrote some example code for doing that from within an alternative thread. Basic rate-limiting code:
int interval = 20;
DateTime dueTime = DateTime.Now.AddMilliseconds(interval);
while (true)
{
    if (DateTime.Now >= dueTime)
    {
        // insert code here

        // Update the next dueTime
        dueTime = DateTime.Now.AddMilliseconds(interval);
    }
    else
    {
        // Just yield so as not to max out the CPU
        Thread.Sleep(1);
    }
}
Note that DateTime.Now only returns a new value about every 18 ms, so any interval less than 20 would be too small.
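The exact step size varies by OS and .NET version; a quick way to measure it yourself (an illustrative sketch, the names are mine) is to spin until DateTime.Now changes and record the smallest step observed:

```csharp
using System;

static class NowGranularity
{
    // Smallest observed increment of DateTime.Now over a number of clock changes.
    public static TimeSpan SmallestStep(int changesToObserve)
    {
        long minTicks = long.MaxValue;
        var prev = DateTime.Now;
        for (int seen = 0; seen < changesToObserve; )
        {
            var now = DateTime.Now;
            if (now != prev)
            {
                minTicks = Math.Min(minTicks, (now - prev).Ticks);
                prev = now;
                seen++;
            }
        }
        return TimeSpan.FromTicks(minTicks);
    }

    static void Main()
    {
        // Historically ~15-18 ms on Windows; often sub-millisecond on newer runtimes.
        Console.WriteLine($"smallest step: {SmallestStep(20).TotalMilliseconds:F3} ms");
    }
}
```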
If you think you can not afford a minimum time, you may need to read the Speed Rant.
I'm trying to turn an IEnumerable into an IObservable that delivers its items in chunks one second apart.
var spartans = Enumerable.Range(0, 300).ToObservable();
spartans
.Window(30)
.Zip(Observable.Timer(DateTimeOffset.Now, TimeSpan.FromMilliseconds(1000)), (x, _) => x)
.SelectMany(w => w)
.Subscribe(
n => Console.WriteLine("{0}", n),
() => Console.WriteLine("all end"));
With this code, the only thing that is printed is "all end" after ten seconds. If I remove the .Zip then the entire sequence prints instantaneously, and if I remove the .Window and .SelectMany then the entire sequence prints one item per second. If I peek into the "windowed" observable inside the lambda passed to SelectMany, I can see that it is empty. My question is, why?
The problem is occurring because of how Window works with a count - and this one isn't particularly intuitive!
As you know, Window serves a stream of streams. However, with a count, the child streams are "warm" - i.e. when an observer of this stream receives a new window in its OnNext handler, it must subscribe to it before it cedes control back to the observable, or the events are lost.
Zip doesn't "know" it's dealing with this situation, and doesn't give you the opportunity to subscribe to each child window before it grabs the next.
If you remove the Zip, you see all the events because the SelectMany does subscribe to all the child windows as it receives them.
The easiest fix is to use Buffer instead of Window - make that one change and your code works. That's because Buffer works very similarly to SelectMany, effectively preserving the windows by doing this:
Window(30).SelectMany(x => x.ToList())
The elements are no longer warm windows but are crystallized as lists, and your Zip will now work as expected, with the following SelectMany flattening the lists out.
Important Performance Consideration
It's important to note that this approach will cause the entire IEnumerable<T> to be run through in one go. If the source enumerable should be lazily evaluated (which is usually desirable), you'll need to go a different way. Using a downstream observable to control the pace of an upstream one is tricky ground.
Let's replace your enumerable with a helper method so we can see when each batch of 30 is evaluated:
static IEnumerable<int> Spartans()
{
for(int i = 0; i < 300; i++)
{
if(i % 30 == 0)
Console.WriteLine("30 More!");
yield return i;
}
}
And use it like this (with the Buffer "fix" here, but the behaviour is similar with Window):
Spartans().ToObservable()
.Buffer(30)
.Zip(Observable.Timer(DateTimeOffset.Now,
TimeSpan.FromMilliseconds(1000)),
(x, _) => x)
.SelectMany(w => w)
.Subscribe(
n => Console.WriteLine("{0}", n),
() => Console.WriteLine("all end"));
Then you see this kind of output demonstrating how the source enumerable is drained all at once:
30 More!
0
1
...miss a few...
29
30 More!
30 More!
30 More!
30 More!
30 More!
30 More!
30 More!
30 More!
30 More!
30
31
32
...etc...
To truly pace the source, rather than using ToObservable() directly you could do the following. Note the Buffer operation on the Spartans() IEnumerable<T> comes from nuget package Ix-Main - added by the Rx team to plug a few holes on the IEnumerable<T> monad:
var spartans = Spartans().Buffer(30);
var pace = Observable.Timer(DateTimeOffset.Now, TimeSpan.FromMilliseconds(1000));
pace.Zip(spartans, (_,x) => x)
.SelectMany(x => x)
.Subscribe(
n => Console.WriteLine("{0}", n),
() => Console.WriteLine("all end"));
And the output becomes a probably much more desirable lazily evaluated output:
30 More!
0
1
2
...miss a few...
29
30 More!
30
31
32
...miss a few...
59
30 More!
60
61
62
...etc
I'm not sure how to get this working with Window, but what about this:
var spartans = Enumerable.Range(0, 300).ToObservable();
spartans
.Select(x => Observable.Timer(TimeSpan.FromSeconds(1)).Select(_ => x))
.Merge(30);
I'm writing a C# (.NET 4.5) application that is used to aggregate time-based events for reporting purposes. To make my query logic reusable for both realtime and historical data, I make use of the Reactive Extensions (2.0) and their IScheduler infrastructure (HistoricalScheduler and friends).
For example, assume we create a list of events (sorted chronologically, but they may coincide!) whose only payload is their timestamp, and we want to know their distribution across buffers of a fixed duration:
const int num = 100000;
const int dist = 10;
var events = new List<DateTimeOffset>();
var curr = DateTimeOffset.Now;
var gap = new Random();
var time = new HistoricalScheduler(curr);
for (int i = 0; i < num; i++)
{
events.Add(curr);
curr += TimeSpan.FromMilliseconds(gap.Next(dist));
}
var stream = Observable.Generate<int, DateTimeOffset>(
0,
s => s < events.Count,
s => s + 1,
s => events[s],
s => events[s],
time);
stream.Buffer(TimeSpan.FromMilliseconds(num), time)
.Subscribe(l => Console.WriteLine(time.Now + ": " + l.Count));
time.AdvanceBy(TimeSpan.FromMilliseconds(num * dist));
Running this code results in a System.StackOverflowException with the following stack trace (it's the last 3 lines all the way down):
mscorlib.dll!System.Threading.Interlocked.Exchange<System.IDisposable>(ref System.IDisposable location1, System.IDisposable value) + 0x3d bytes
System.Reactive.Core.dll!System.Reactive.Disposables.SingleAssignmentDisposable.Dispose() + 0x37 bytes
System.Reactive.Core.dll!System.Reactive.Concurrency.ScheduledItem<System.DateTimeOffset>.Cancel() + 0x23 bytes
...
System.Reactive.Core.dll!System.Reactive.Disposables.AnonymousDisposable.Dispose() + 0x4d bytes
System.Reactive.Core.dll!System.Reactive.Disposables.SingleAssignmentDisposable.Dispose() + 0x4f bytes
System.Reactive.Core.dll!System.Reactive.Concurrency.ScheduledItem<System.DateTimeOffset>.Cancel() + 0x23 bytes
...
Ok, the problem seems to come from my use of Observable.Generate(), depending on the list size (num) and regardless of the choice of scheduler.
What am I doing wrong? Or more generally, what would be the preferred way to create an IObservable from an IEnumerable of events that provide their own timestamps?
(Update - realized I didn't provide an alternative: see the bottom of this answer.)
The problem is in how Observable.Generate works - it's used to unfold a corecursive (think recursion turned inside out) generator based on the arguments; if those arguments end up generating a very nested corecursive generator, you'll blow your stack.
From this point on, I'm speculating a lot (don't have the Rx source in front of me) (see below), but I'm willing to bet your definition ends up expanding into something like:
initial_state =>
generate_next(initial_state) =>
generate_next(generate_next(initial_state)) =>
generate_next(generate_next(generate_next(initial_state))) =>
generate_next(generate_next(generate_next(generate_next(initial_state)))) => ...
And on and on until your call stack gets big enough to overflow. At, say, a method signature + your int counter, that'd be something like 8-16 bytes per recursive call (more depending on how the state machine generator is implemented), so 60,000 sounds about right (1M / 16 ~ 62500 maximum depth)
EDIT: Pulled up the source - confirmed: the "Run" method of Generate looks like this - take note of the nested calls to Generate:
protected override IDisposable Run(
IObserver<TResult> observer,
IDisposable cancel,
Action<IDisposable> setSink)
{
if (this._timeSelectorA != null)
{
Generate<TState, TResult>.α α =
new Generate<TState, TResult>.α(
(Generate<TState, TResult>) this,
observer,
cancel);
setSink(α);
return α.Run();
}
if (this._timeSelectorR != null)
{
Generate<TState, TResult>.δ δ =
new Generate<TState, TResult>.δ(
(Generate<TState, TResult>) this,
observer,
cancel);
setSink(δ);
return δ.Run();
}
Generate<TState, TResult>._ _ =
new Generate<TState, TResult>._(
(Generate<TState, TResult>) this,
observer,
cancel);
setSink(_);
return _.Run();
}
EDIT: Derp, didn't offer any alternatives...here's one that might work:
(EDIT: fixed Enumerable.Range, so the stream size won't be multiplied by chunkSize)
const int num = 160000;
const int dist = 10;
var events = new List<DateTimeOffset>();
var curr = DateTimeOffset.Now;
var gap = new Random();
var time = new HistoricalScheduler(curr);
for (int i = 0; i < num; i++)
{
events.Add(curr);
curr += TimeSpan.FromMilliseconds(gap.Next(dist));
}
// Size too big? Fine, we'll chunk it up!
const int chunkSize = 10000;
// Generate a whole mess of streams based on start/end indices
var streams =
from chunkIndex in Enumerable.Range(0, (int)Math.Ceiling((double)events.Count / chunkSize))
let startIdx = chunkIndex * chunkSize
let endIdx = Math.Min(events.Count, startIdx + chunkSize)
select Observable.Generate<int, DateTimeOffset>(
startIdx,
s => s < endIdx,
s => s + 1,
s => events[s],
s => events[s],
time);
// E pluribus streamum
var stream = Observable.Concat(streams);
stream.Buffer(TimeSpan.FromMilliseconds(num), time)
.Subscribe(l => Console.WriteLine(time.Now + ": " + l.Count));
time.AdvanceBy(TimeSpan.FromMilliseconds(num * dist));
OK, I've taken a different factory method that doesn't require lambda expressions as state transitions, and now I don't see any stack overflows anymore. I'm not yet sure if this would qualify as a correct answer to my question, but it works and I thought I'd share it here:
var stream = Observable.Create<DateTimeOffset>(o =>
{
foreach (var e in events)
{
time.Schedule(e, () => o.OnNext(e));
}
time.Schedule(events[events.Count - 1], () => o.OnCompleted());
return Disposable.Empty;
});
Manually scheduling the events before (!) returning the subscription seems awkward to me, but in this case it can be done inside the lambda expression.
If there is anything wrong about this approach, please correct me. Also, I'd still be happy to hear what implicit assumptions by System.Reactive I have violated with my original code.
(Oh my, I should have checked that earlier: with Rx v1.0, the original Observable.Generate() does in fact seem to work!)
I was hoping to figure out a way to write the below in a functional style with extension functions. Ideally this functional style would perform well compared to the iterative/loop version. I'm guessing that there isn't a way. Probably because of the many additional function calls and stack allocations, etc.
Fundamentally, I think the pattern that makes it troublesome is that it calculates a value to use in the predicate and then needs that calculated value again as part of the resulting collection.
// This is what is passed to each function.
// Do not assume the array is in order.
var a = (0).To(999999).ToArray().Shuffle();
// Approx times in release mode (on my machine):
// Functional is avg 20ms per call
// Iterative is avg 5ms per call
// Linq is avg 14ms per call
private static List<int> Iterative(int[] a)
{
var squares = new List<int>(a.Length);
for (int i = 0; i < a.Length; i++)
{
var n = a[i];
if (n % 2 == 0)
{
int square = n * n;
if (square < 1000000)
{
squares.Add(square);
}
}
}
return squares;
}
private static List<int> Functional(int[] a)
{
return
a
.Where(x => x % 2 == 0 && x * x < 1000000)
.Select(x => x * x)
.ToList();
}
private static List<int> Linq(int[] a)
{
var squares =
from num in a
where num % 2 == 0 && num * num < 1000000
select num * num;
return squares.ToList();
}
An alternative to Konrad's suggestion. This avoids the double calculation, but also avoids even calculating the square when it doesn't have to:
return a.Where(x => x % 2 == 0)
.Select(x => x * x)
.Where(square => square < 1000000)
.ToList();
Personally, I wouldn't sweat the difference in performance until I'd seen it be significant in a larger context.
(I'm assuming that this is just an example, by the way. Normally you'd possibly compute the square root of 1000000 once and then just compare n with that, to shave off a few milliseconds. It does require two comparisons or an Abs operation though, of course.)
EDIT: Note that a more functional version would avoid using ToList at all. Return IEnumerable<int> instead, and let the caller transform it into a List<T> if they want to. If they don't, they don't take the hit. If they only want the first 5 values, they can call Take(5). That laziness could be a big performance win over the original version, depending on the context.
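Combining both suggestions in one sketch (FunctionalLazy is a name I'm introducing, and the precomputed bound assumes non-negative input, as in the question):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SquaresLazy
{
    // 1000 is the square root of 1000000; comparing n directly avoids
    // computing the square just to test it (valid only for non-negative n).
    private const int Bound = 1000;

    public static IEnumerable<int> FunctionalLazy(int[] a) =>
        a.Where(x => x % 2 == 0 && x < Bound)
         .Select(x => x * x);
}
```

Callers that really need a list can still call ToList(); callers that need only the first few values can call Take(5) and never pay for the rest.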
Just solving your problem of the double calculation:
return (from x in a
let sq = x * x
where x % 2 == 0 && sq < 1000000
select sq).ToList();
That said, I’m not sure that this will lead to much performance improvement. Is the functional variant actually noticeably faster than the iterative one? The code offers quite a lot of potential for automated optimisation.
How about some parallel processing? Or does the solution have to be LINQ (which I consider to be slow).
var squares = new ConcurrentBag<int>();
Parallel.ForEach(a, n =>
{
    // A plain List<int> is not safe to Add to from multiple threads,
    // so use a concurrent collection (or a lock) here.
    if (n < 1000 && n % 2 == 0) squares.Add(n * n);
});
The Linq version would be:
return a.AsParallel()
.Where(n => n < 1000 && n % 2 == 0)
.Select(n => n * n)
.ToList();
I don't think there's a functional solution that will be completely on-par with the iterative solution performance-wise. In my timings (see below) the 'functional' implementation from the OP appears to be around twice as slow as the iterative implementation.
Micro-benchmarks like this one are prone to all manner of issues. A common tactic in dealing with variability problems is to repeatedly call the method being timed and compute an average time per call - like this:
// from main
Time(Functional, "Functional", a);
Time(Linq, "Linq", a);
Time(Iterative, "Iterative", a);
// ...
static int reps = 1000;
private static List<int> Time(Func<int[],List<int>> func, string name, int[] a)
{
var sw = System.Diagnostics.Stopwatch.StartNew();
List<int> ret = null;
for(int i = 0; i < reps; ++i)
{
ret = func(a);
}
sw.Stop();
Console.WriteLine(
"{0} per call timings - {1} ticks, {2} ms",
name,
sw.ElapsedTicks/(double)reps,
sw.ElapsedMilliseconds/(double)reps);
return ret;
}
Here are the timings from one session:
Functional per call timings - 46493.541 ticks, 16.945 ms
Linq per call timings - 46526.734 ticks, 16.958 ms
Iterative per call timings - 21971.274 ticks, 8.008 ms
There are a host of other challenges as well: strobe-effects with the timer use, how and when the just-in-time compiler does its thing, the garbage collector running its collections, the order that competing algorithms are run, the type of cpu, the OS swapping other processes in and out, etc.
I tried my hand at a little optimization. I removed the square from the test (num * num < 1000000) - changing it to (num < 1000) - which seemed safe since there are no negatives in the input - that is, I took the square root of both sides of the inequality.
Surprisingly, I got different results as compared to the methods in the OP - there were only 500 items in my optimized output as compared to the 241,849 from the three implementations in the OP. So why the difference? Much of the input, when squared, overflows 32-bit integers, so those extra 241,349 items came from numbers that when squared overflowed to either negative numbers or numbers under 1 million, while still passing our evenness test.
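To make the overflow concrete (a small illustration of the point above, not code from the original post): an even input such as 46342 has a true square of 2,147,580,964, which wraps past int.MaxValue and then passes the < 1000000 test:

```csharp
using System;

class OverflowDemo
{
    static void Main()
    {
        int n = 46342;                        // even, and far larger than 1000
        int square = unchecked(n * n);        // true value 2,147,580,964 wraps around
        Console.WriteLine(square);            // -2147386332
        Console.WriteLine(square < 1000000);  // True: the wrapped value slips through
    }
}
```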
optimized (functional) timing:
Optimized per call timings - 16849.529 ticks, 6.141 ms
This was one of the functional implementations altered as suggested. It output the 500 items passing the criteria as expected. It is deceptively "faster" only because it output fewer items than the iterative solution.
We can make the original implementations blow up with an OverflowException by adding a checked block around their implementations. Here is a checked block added to the "Iterative" method:
private static List<int> Iterative(int[] a)
{
checked
{
var squares = new List<int>(a.Length);
// rest of method omitted for brevity...
return squares;
}
}