Self-contained Reactive Extensions helper methods that require state - C#

Looking at https://eprystupa.wordpress.com/2009/12/18/detecting-running-highlow-prices-using-reactive-extensions-for-net/ it has an interesting code block:
var rnd = new Random();
var feed = Observable.Defer(() =>
        Observable.Return(Math.Round(30.0 + rnd.NextDouble(), 2))
            .Delay(TimeSpan.FromSeconds(1 * rnd.NextDouble())))
    .Repeat();

// Daily low price feed
double min = double.MaxValue;
var feedLo = feed
    .Where(p => p < min)
    .Do(p => min = Math.Min(min, p))
    .Select(p => "New LO: " + p);

// Daily high price feed
double max = double.MinValue;
var feedHi = feed
    .Where(p => p > max)
    .Do(p => max = Math.Max(max, p))
    .Select(p => "New HI: " + p);

// Combine hi and lo in one feed and subscribe to it
feedLo.Merge(feedHi).Subscribe(Console.WriteLine);
The above is OK and does the job, but the local variables max and min make the code quite specific, whereas I would like to attach the NewLowHi code/indicator to an existing IObservable<double>, much like https://github.com/fiatsasia/Financier does:
public static IObservable<TSource> SimpleMovingAverage<TSource>(this IObservable<TSource> source, int period)
{
    return source.Buffer(period, 1).Select(e => e.Average());
}
What would be the best practice for creating a self-contained NewLowHi indicator which I could subscribe to without using (or at least hiding internally) the local variables max and min?

The code that you referred to on the WordPress site has some flaws.
Because of the way the feed is created, it is a cold observable: every subscription will receive a different set of figures. So the feedLo and feedHi observables will be working from different sets of values.
But it gets worse. If two subscriptions are made to feedLo, for example, then there will be two subscriptions to feed but only one state variable, min, which means that the value coming out will be the minimum across both subscriptions and not the minimum for each.
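To see that concretely, here is a hypothetical sketch (using the feedLo defined above) of two subscribers interfering with each other through the shared min variable:
var subscriptionA = feedLo.Subscribe(x => Console.WriteLine("A " + x));
var subscriptionB = feedLo.Subscribe(x => Console.WriteLine("B " + x));
// Each Subscribe call creates a fresh subscription to 'feed', so A and B
// observe different random sequences - yet both mutate the single 'min',
// so neither reports the true running low of its own sequence.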
I'll show how to do this properly, but first your question is about how to encapsulate state. Here's how:
IObservable<T> feed =
    Observable
        .Defer(() =>
        {
            int state = 42;
            return Observable... // define your observable here.
        });
Now, the feed source uses Random for its state. We can go ahead and rewrite feed using the above pattern.
var feed =
    Observable
        .Defer(() =>
        {
            var rnd = new Random();
            return
                Observable
                    .Generate(
                        0, x => true, x => x,
                        x => Math.Round(30.0 + rnd.NextDouble(), 2),
                        x => TimeSpan.FromSeconds(rnd.NextDouble()));
        });
I prefer to use Observable.Generate rather than the Defer/Return/Delay/Repeat pattern.
Now for how to get the min and max values out.
I want an IObservable<(State state, double value)> that gives me the high and low values from a single subscription to the source observable. Here's what State looks like:
public enum State
{
    High,
    Low,
}
Here's my observable:
IObservable<(State state, double value)> feedHighLow(IObservable<double> source) =>
    source.Publish(xs => Observable.Merge(
        xs.Scan(Math.Min).DistinctUntilChanged().Select(x => (state: State.Low, value: x)),
        xs.Scan(Math.Max).DistinctUntilChanged().Select(x => (state: State.High, value: x))));
Now I can call feedHighLow(feed) and get a stream of the High/Low values from a single subscription to the source feed. The Publish call ensures a single subscription to the source and the Merge means I can run two distinct observables to get the min and the max respectively.
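Here's a minimal usage sketch (the output formatting is my own, not from the original):
feedHighLow(feed)
    .Subscribe(hl => Console.WriteLine($"New {hl.state}: {hl.value}"));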
Running this prints each new low and high as it occurs.

Related

UniRx: using the Merge operator with a shared observable

I have an issue when using Merge with shared observables. In my project I have several streams that load different data which must be added in a certain order. So I've made a simple example to find a solution. The issue is that the second merged observable won't get the emitted values, for reasons explained below.
If I remove the Share operator everything works, but then the root observable executes twice.
Another option is to add a Replay operator after Share, but then I have to call Connect somewhere. Unfortunately, in my project the observable is just a small part of a huge loading chain.
And that's where I got stuck.
The following code shows what the problem is. The observableFlatMap variable doesn't emit anything, because every value that sharedObservable emits goes through observableNotEven, and observableFlatMap connects only after every integer has already been emitted.
using System.Linq;
using UniRx;
using UniRx.Diagnostics;
using UnityEngine;

public class Share : MonoBehaviour
{
    void Start()
    {
        PrintNumbers();
    }

    private void PrintNumbers()
    {
        System.IObservable<int> sharedObservable = GetObservableInts();

        var observableEven = sharedObservable
            .Where(x => x % 2 == 0)
            .Debug("Even");

        var observableNotEven = sharedObservable
            .Where(x => x % 2 == 1)
            .Debug("NotEven");

        var observableFlatMap = observableEven
            .Select(x => x * 10);

        _ = Observable.Merge(observableNotEven, observableFlatMap)
            .Subscribe(_number => Debug.Log(_number))
            .AddTo(this);
    }

    private static System.IObservable<int> GetObservableInts()
    {
        var count = 10;
        var arrayInt = new int[count];
        for (int id = 0; id != count; ++id)
            arrayInt[id] = id;

        var sharedObservable = arrayInt.ToObservable()
            .Debug("Array")
            .Share();
        return sharedObservable;
    }
}
I found two similar solutions:
Use the .DelayFrame/.Delay or .DelayFrameSubscription/.DelaySubscription operators before .Share.
A feature of .Share's behaviour is that it starts emitting values after the first subscription and continues until the last subscriber unsubscribes. In my case, after the first subscriber (observableNotEven) connects, .Share emits every value of the integer array before the next observable (observableFlatMap) is connected (merged) to the overall sequence:
var sharedObservable = arrayInt.ToObservable()
    .Debug("Array")
    .DelayFrameSubscription(1)
    .Share();
Update
The answer to my question is to use Publish even if you already have a shared observable.
private void PrintNumbers()
{
    IConnectableObservable<int> sharedObservable = GetObservableInts().Publish();

    var observableEven = sharedObservable
        .Where(x => x % 2 == 0)
        .Debug("Even");

    var observableNotEven = sharedObservable
        .Where(x => x % 2 == 1)
        .Debug("NotEven");

    var observableFlatMap = observableEven
        .Select(x => x * 10);

    _ = Observable.Merge(observableNotEven, observableFlatMap)
        .Subscribe(_number => Debug.Log(_number))
        .AddTo(this);

    // After the observable chain is fully built, call Connect.
    sharedObservable.Connect().AddTo(this);
}

Looking for an elegant Rx.NET way to implement certain data processing

Given:
Database as the source of the data
The data has to be grouped and aggregated, where the aggregation process must be done in code and is asynchronous.
I am using the following simple code to simulate real life:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Reactive.Linq;
using System.Reactive.Threading.Tasks;
using System.Threading.Tasks;

namespace ObservableTest
{
    class Program
    {
        public class Result
        {
            public int Key;
            private int m_previous = -1;

            public async Task<Result> AggregateAsync(int x)
            {
                return await Task.Run(async () =>
                {
                    await Task.Delay(10);
                    Debug.Assert(m_previous < 0 ? x == Key : m_previous == x - 10);
                    m_previous = x;
                    return this;
                });
            }

            public int Complete()
            {
                Debug.Assert(m_previous / 10 == 9);
                return Key;
            }
        }

        static void Main()
        {
            var obs = GetSource()
                .GroupBy(x => x % 10)
                .SelectMany(g => g.Aggregate(Observable.Return(new Result { Key = g.Key }),
                        (resultObs, x) => resultObs.SelectMany(result => result.AggregateAsync(x).ToObservable()))
                    .Merge()
                    .Select(x => x.Complete()));

            obs.Subscribe(Console.WriteLine, () => Console.WriteLine("Press enter to exit ..."));
            Console.ReadLine();
        }

        static IObservable<int> GetSource()
        {
            return Enumerable.Range(0, 10)
                .SelectMany(remainder => Enumerable.Range(0, 10).Select(i => 10 * i + remainder))
                .ToObservable();
        }
    }
}
GetSource returns numbers from 0 to 99 in a certain order. The order already matches the one needed for the grouping. View this method as if it were querying a database using a SQL statement with an ORDER BY matching the anticipated grouping.
So, having an observable of database content I need to group it, aggregate asynchronously and replace each group with the aggregation result.
Here is my solution (from the code above):
var obs = GetSource()
    .GroupBy(x => x % 10)
    .SelectMany(g => g.Aggregate(Observable.Return(new Result { Key = g.Key }),
            (resultObs, x) => resultObs.SelectMany(result => result.AggregateAsync(x).ToObservable()))
        .Merge()
        .Select(x => x.Complete()));
I see multiple problems with it:
GroupBy is wrong here, because the data is already in the right order. It should be a sort of Window or Buffer, but driven by a predicate rather than sample count or time interval.
The asynchronous aggregation looks cumbersome and hence I assume I botched it too.
What is the proper Rx.NET way of achieving what I want?
I am not entirely sure whether there is a proper Rx way to solve this problem, but things start getting messy in Rx when dealing with collections, especially when items need to be added, updated or removed.
I wrote DynamicData, an open source project which specifically deals with manipulating collections, so my disclaimer with this answer is that I am very biased as to the solution.
Back to the problem: I would instantiate an observable cache like this:
var myCache = new SourceCache<MyObject, MyId>(myobject => myobject.Id);
You can now observe the cache and apply operators. To group and apply some transforms, do the following:
var mystream = myCache.Connect()
    .Group(myobject => /* group key selector */)   // creates an observable cache for each group
    .Transform((myGroup, key) => myGroup.Cache.Connect()
        .QueryWhenChanged(query => /* aggregate result */));
// now do something with the result
where Transform is an overload of the Rx Select operator. I previously blogged a detailed solution which may be appropriate to your problem here: Aggregation Example.
This cache is thread safe and you can use the AddOrUpdate and Remove methods to load and change it asynchronously.
Remember that by default Rx avoids introducing concurrency. However, if you need to, you can introduce schedulers to assign work where you want it.
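As a rough sketch of loading the cache from a background task (MyObject, its Id property, and LoadFromDatabaseAsync are hypothetical placeholders, not from the original question):
var myCache = new SourceCache<MyObject, int>(x => x.Id);

// The cache is thread safe, so items can be added or updated from any thread.
Task.Run(async () =>
{
    var items = await LoadFromDatabaseAsync();  // assumed data-access helper
    myCache.AddOrUpdate(items);                 // DynamicData batch add/update
});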
Per your comments:
I don't believe using GroupBy is bad here at all if you want a predicate to drive the partitioning.
My approach is below (you can paste it into LINQPad with the reactive library included). I'm still struggling with wrapping my mind around observables, but I believe this follows a good idiom, as it's also shown by Microsoft at https://msdn.microsoft.com/en-us/library/hh242963%28v=vs.103%29.aspx (last example).
void Main()
{
    Console.WriteLine("starting on thread {0}", Thread.CurrentThread.ManagedThreadId);

    //GetSource()
    //.GroupBy(x => x % 10)
    var sharedSource = GetSource().Publish();
    var closingSignal = sharedSource.Where(MyPredicateFunc);

    sharedSource.Window(() => closingSignal)
        .Select(x => x.ObserveOn(TaskPoolScheduler.Default))
        .SelectMany(g => g.Aggregate(0, (s, i) => ExpensiveAggregateFunctionNoTask(s, i)).SingleAsync())
        .Subscribe(i => Console.WriteLine("Got {0} on thread {1}", i, Thread.CurrentThread.ManagedThreadId));

    sharedSource.Connect();
}

// Define other methods and classes here
bool MyPredicateFunc(int i)
{
    return (i % 10 == 0);
}

static IObservable<int> GetSource()
{
    return Enumerable.Range(0, 10)
        .SelectMany(remainder => Enumerable.Range(0, 10).Select(i => 10 * i + remainder))
        .ToObservable();
}

int ExpensiveAggregateFunctionNoTask(int lastResult, int currentElement)
{
    var r = lastResult + currentElement;
    Console.WriteLine("Adding {0} and {1} on thread {2}", lastResult, currentElement, Thread.CurrentThread.ManagedThreadId);
    Task.Delay(250).Wait(); // simulate expensive operation
    return r;
}
Doing this you will see that we have created a new thread for each grouping and then wait asynchronously in the SelectMany.

Take a specific number of array items first, then process

I have this code below:
InstanceCollection instances = this.MyService(typeID, referencesIDs);
My problem is that when referencesIDs.Count() is greater than a specific count, the call throws a SQL-related error.
It was suggested to me to call this.MyService multiple times so that it doesn't process too many referencesIDs at once.
What is the way to do that? I am thinking of using a while loop like this:
while (referencesIDs.Count() != maxCount)
{
    newReferencesIDs = referencesIDs.Take(500).ToArray();
    instances = this.MyService(typeID, newReferencesIDs);
    maxCount += newReferencesIDs.Count();
}
The problem I can see here is: how can I remove the first 500 referencesIDs from referencesIDs? If I don't remove the first 500 after the first iteration, the loop will keep taking the same 500 referencesIDs.
Are you just looking to update the referencesIDs value? Something like this:
referencesIDs = referencesIDs.Skip(500);
Then the next time you call .Take(500) on referencesIDs it'll get the next 500 values.
Alternatively, without updating the referencesIDs variable, you can include the Skip in your loop. Something like this:
var pageSize = 500;
var skipCount = 0;
while (...)
{
    newReferencesIDs = referencesIDs.Skip(skipCount).Take(pageSize).ToArray();
    skipCount += pageSize;
    ...
}
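Putting it together, a complete version might look like the sketch below (the termination condition and result handling are assumptions, since the original elides them; MyService, typeID, and InstanceCollection come from the question):
var pageSize = 500;
var allIDs = referencesIDs.ToArray();       // materialize the IDs once
var results = new List<Instance>();         // 'Instance' is a hypothetical item type

for (var skipCount = 0; skipCount < allIDs.Length; skipCount += pageSize)
{
    var chunk = allIDs.Skip(skipCount).Take(pageSize).ToArray();
    var instances = this.MyService(typeID, chunk);
    results.AddRange(instances);            // assumes InstanceCollection is enumerable
}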
My first choice would be to fix the service, if you have access to it. A SQL-specific error could be a result of an incomplete database configuration, or a poorly written SQL query on the server. For example, Oracle limits IN lists in SQL queries to about 1000 items by default, but your Oracle DBA should be able to re-configure this limit for you. Alternatively, server side programmers could rewrite their query to avoid hitting this limit in the first place.
If this does not work, you could split your list into blocks of a maximum size that does not trigger the error, make multiple calls to the server, and combine the instances on your end, like this:
InstanceCollection instances = referencesIDs
    .Select((id, index) => new { Id = id, Index = index })
    .GroupBy(p => p.Index / 500) // 500 is the max number of IDs
    .SelectMany(g => this.MyService(typeID, g.Select(item => item.Id).ToArray()))
    .ToList();
If you want a general way of splitting lists into chunks, you can use something like:
/// <summary>
/// Split a source IEnumerable into smaller (more manageable) lists.
/// </summary>
public static IEnumerable<IList<TSource>> SplitIntoChunks<TSource>(this IEnumerable<TSource> source, int chunkSize)
{
    long i = 1;
    var list = new List<TSource>();
    foreach (var t in source)
    {
        list.Add(t);
        if (i++ % chunkSize == 0)
        {
            yield return list;
            list = new List<TSource>();
        }
    }
    if (list.Count > 0)
        yield return list;
}
And then you can use SelectMany to flatten results:
InstanceCollection instances = referencesIDs
    .SplitIntoChunks(500)
    .SelectMany(chunk => MyService(typeID, chunk.ToArray()))
    .ToList();

Sliding time window for record analysis

I have a data structure of phone calls. For this question there are two fields, CallTime and NumberDialled.
The analysis I want to perform is: "Are there more than two calls to the same number in a 10-second window?" The collection is already sorted by CallTime and is a List<Cdr>.
My solution is:
List<Cdr> records = GetRecordsSortedByCallTime();
for (int i = 0; i < records.Count; i++)
{
    var baseRecord = records[i];
    for (int j = i + 1; j < records.Count; j++)
    {
        var comparisonRec = records[j];
        if (comparisonRec.CallTime.Subtract(baseRecord.CallTime).TotalSeconds < 20)
        {
            if (comparisonRec.NumberDialled == baseRecord.NumberDialled)
                ReportProblem(baseRecord, comparisonRec);
        }
        else
        {
            // We're more than 20 seconds away from the base record. Break out of the inner loop.
            break;
        }
    }
}
This is ugly to say the least. Is there a better, cleaner and faster way of doing this?
Although I haven't tested this on a large data set, I will be running it on about 100,000 records per hour so there will be a large number of comparisons for each record.
Update: The data is sorted by time, not by number as in an earlier version of the question.
If the phone calls are already sorted by call time, you can do the following:
Initialize a hash table that has a counter for every phone number (the hash table can start empty; you add entries as you go).
Keep two pointers into your list; call them 'left' and 'right'.
Whenever the time between the 'left' and 'right' calls is less than 10 seconds, move 'right' forward by one and increment the count of the newly encountered phone number.
Whenever the difference is more than 10 seconds, move 'left' forward by one and decrement the count of the phone number the 'left' pointer leaves behind.
At any point, if there is a phone number whose counter in the hash table is 3 or more, you have found a phone number with more than 2 calls within a 10-second window.
This is a linear-time algorithm that tracks all the numbers in a single pass. A sketch of the approach follows.
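A minimal sketch of that sliding-window approach, assuming the Cdr class from the question exposes CallTime and NumberDialled (the string key type is an assumption; adapt it as needed):
static void FindRepeatCalls(List<Cdr> records)
{
    // Call counts per number within the current 10-second window.
    var counts = new Dictionary<string, int>();
    int left = 0;

    for (int right = 0; right < records.Count; right++)
    {
        var number = records[right].NumberDialled;
        counts.TryGetValue(number, out var current);
        counts[number] = current + 1;

        // Shrink the window from the left until it spans at most 10 seconds.
        while ((records[right].CallTime - records[left].CallTime).TotalSeconds > 10)
        {
            counts[records[left].NumberDialled]--;
            left++;
        }

        if (counts[number] > 2)
            Console.WriteLine("More than two calls to {0} within 10 seconds", number);
    }
}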
I didn't know your exact structures, so I created my own for this demonstration:
using System;
using System.Collections.Generic;
using System.Linq;

class CallRecord
{
    public long NumberDialled { get; set; }
    public DateTime Stamp { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        var calls = new List<CallRecord>()
        {
            new CallRecord { NumberDialled = 123, Stamp = new DateTime(2011, 01, 01, 10, 10, 0) },
            new CallRecord { NumberDialled = 123, Stamp = new DateTime(2011, 01, 01, 10, 10, 9) },
            new CallRecord { NumberDialled = 123, Stamp = new DateTime(2011, 01, 01, 10, 10, 18) },
        };

        var dupCalls = calls.Where(x => calls.Any(y => y.NumberDialled == x.NumberDialled
                                                       && (x.Stamp - y.Stamp).TotalSeconds > 0
                                                       && (x.Stamp - y.Stamp).TotalSeconds <= 10))
                            .Select(x => x.NumberDialled)
                            .Distinct();

        foreach (var dupCall in dupCalls)
        {
            Console.WriteLine(dupCall);
        }
        Console.ReadKey();
    }
}
The LINQ expression loops through all records and finds records which are ahead of the current record (.TotalSeconds > 0) and within the time limit (.TotalSeconds <= 10). This might be a bit of a performance hog due to the Any method repeatedly going over your whole list, but at least the code is cleaner :)
I recommend you use the Reactive Extensions (Rx) and the Interval method.
The Reactive Extensions (Rx) is a library for composing asynchronous and event-based programs using observable sequences and LINQ-style query operators. Using Rx, developers represent asynchronous data streams with Observables, query asynchronous data streams using LINQ operators, and parameterize the concurrency in the asynchronous data streams using Schedulers
The Interval method returns an observable sequence that produces a value after each period.
Here is a quick example:
var callsPer10Seconds = Observable.Interval(TimeSpan.FromSeconds(10));

var q = from x in callsPer10Seconds
        group x by x into g
        let count = g.Count()
        orderby count descending
        select new { Value = g.Key, Count = count };

foreach (var x in q)
{
    Console.WriteLine("Value: " + x.Value + " Count: " + x.Count);
}
records.OrderBy(p => p.CallTime)
    .GroupBy(p => p.NumberDialled)
    .Select(p => new { number = p.Key, cdr = p.ToList() })
    .Select(p => new
    {
        number = p.number,
        cdr = p.cdr.Select((value, index) => index == 0 ? null : (TimeSpan?)(value.CallTime - p.cdr[index - 1].CallTime))
            .FirstOrDefault(q => q.HasValue && q.Value.TotalSeconds < 10)
    })
    .Where(p => p.cdr != null);
In two steps:
Generate an enumeration with the call itself and all calls in the interesting span.
Filter this list to find consecutive calls.
The computation is done in parallel on each record using the AsParallel extension method.
It is also possible to skip the ToArray call at the end and let other code execute on the thread while the parallel computation proceeds, instead of forcing it to wait for the computation to finish.
var records = new[] {
    new { CallTime = DateTime.Now, NumberDialled = 1 },
    new { CallTime = DateTime.Now.AddSeconds(1), NumberDialled = 1 }
};
var span = TimeSpan.FromSeconds(10);

// Select for each call itself and all other calls in the next 'span' seconds
var callInfos = records.AsParallel()
    .Select((r, i) =>
        new
        {
            Record = r,
            Following = records.Skip(i + 1)
                .TakeWhile(r2 => r2.CallTime - r.CallTime < span)
        }
    );

// Filter the calls that interest us
var problematic = (from callinfo in callInfos
                   where callinfo.Following.Any(r => callinfo.Record.NumberDialled == r.NumberDialled)
                   select callinfo.Record)
    .ToArray();
If performance is acceptable (which I think it should be, since 100k records is not particularly many), this approach is (I think) nice and clean:
First we group up the records by number:
var byNumber =
    from cdr in calls
    group cdr by cdr.NumberDialled into g
    select new
    {
        NumberDialled = g.Key,
        Calls = g.OrderBy(cdr => cdr.CallTime)
    };
What we do now is Zip (.NET 4) each calls collection with itself-shifted-by-one, to transform the list of call times into a list of gaps between calls. We then look for numbers where there's a gap of at most 10 seconds:
var interestingNumbers =
    from g in byNumber
    let callGaps = g.Calls.Zip(g.Calls.Skip(1),
                               (cdr1, cdr2) => cdr2.CallTime - cdr1.CallTime)
    where callGaps.Any(ts => ts.TotalSeconds <= 10)
    select g.NumberDialled;
Now interestingNumbers is a sequence of the numbers of interest.

Multiple SUM using LINQ

I have a loop like the following, can I do the same using multiple SUM?
foreach (var detail in ArticleLedgerEntries.Where(pd => pd.LedgerEntryType == LedgerEntryTypeTypes.Unload &&
                                                        pd.InventoryType == InventoryTypes.Finished))
{
    weight += detail.GrossWeight;
    length += detail.Length;
    items += detail.NrDistaff;
}
Technically speaking, what you have is probably the most efficient way to do what you are asking. However, you could create an extension method on IEnumerable<T> called Each that might make it simpler:
public static class EnumerableExtensions
{
    public static void Each<T>(this IEnumerable<T> col, Action<T> itemWorker)
    {
        foreach (var item in col)
        {
            itemWorker(item);
        }
    }
}
And call it like so:
// Declare and initialize variables in parent scope
double weight = 0;
double length = 0;
int items = 0;

ArticleLedgerEntries
    .Where(
        pd =>
            pd.LedgerEntryType == LedgerEntryTypeTypes.Unload &&
            pd.InventoryType == InventoryTypes.Finished
    )
    .Each(
        pd =>
        {
            // Close over variables defined in parent scope
            weight += pd.GrossWeight;
            length += pd.Length;
            items += pd.NrDistaff;
        }
    );
UPDATE:
Just one additional note. The above example relies on a closure. The variables weight, length, and items should be declared in a parent scope, allowing them to persist beyond each call to the itemWorker action. I've updated the example to reflect this for clarity's sake.
You can call Sum three times, but it will be slower because it will make three loops.
For example:
var list = ArticleLedgerEntries.Where(pd => pd.LedgerEntryType == LedgerEntryTypeTypes.Unload
                                            && pd.InventoryType == InventoryTypes.Finished);

var totalWeight = list.Sum(pd => pd.GrossWeight);
var totalLength = list.Sum(pd => pd.Length);
var items = list.Sum(pd => pd.NrDistaff);
Because of delayed execution, it will also re-evaluate the Where call every time, although that's not such an issue in your case. This could be avoided by calling ToArray, but that will cause an array allocation. (And it would still run three loops)
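For example, a sketch of the ToArray variant just mentioned:
// Materialize the filtered entries once, so the Where clause runs a single time.
var entries = ArticleLedgerEntries
    .Where(pd => pd.LedgerEntryType == LedgerEntryTypeTypes.Unload
              && pd.InventoryType == InventoryTypes.Finished)
    .ToArray();

var totalWeight = entries.Sum(pd => pd.GrossWeight);
var totalLength = entries.Sum(pd => pd.Length);
var items = entries.Sum(pd => pd.NrDistaff);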
However, unless you have a very large number of entries or are running this code in a tight loop, you don't need to worry about performance.
EDIT: If you really want to use LINQ, you could misuse Aggregate, like this:
double totalWeight = 0, totalLength = 0;
int items = 0;
list.Aggregate(0, (a, pd) =>
{
    totalWeight += pd.GrossWeight;
    totalLength += pd.Length;
    items += pd.NrDistaff;
    return a;
});
This is phenomenally ugly code, but should perform almost as well as a straight loop.
You could also sum in the accumulator (see example below), but this would allocate a temporary object for every item in your list, which is a bad idea. (Anonymous types are immutable.)
var totals = list.Aggregate(
    new { Weight = 0.0, Length = 0.0, Items = 0 },
    (t, pd) => new
    {
        Weight = t.Weight + pd.GrossWeight,
        Length = t.Length + pd.Length,
        Items = t.Items + pd.NrDistaff
    }
);
You could also group by a constant, 1 (which effectively puts all of the items into a single group that can then be counted or summed):
var results = from x in ArticleLedgerEntries
              group x by 1 into aggregatedTable
              select new
              {
                  SumOfWeight = aggregatedTable.Sum(y => y.GrossWeight),
                  SumOfLength = aggregatedTable.Sum(y => y.Length),
                  SumOfNrDistaff = aggregatedTable.Sum(y => y.NrDistaff)
              };
As far as running time goes, it is almost as good as the loop (with a small constant overhead).
You'd be able to do this pivot-style, using the answer in this topic: Is it possible to Pivot data using LINQ?
OK. I realize that there isn't an easy way to do this using LINQ. I'll keep my foreach loop because I understand that it isn't so bad. Thanks to all of you.
