System.Reactive: time-based buffer by Timestamp field - c#

I am using the Reactive library in my C# project to group data according to configured policies. All these data implement the following interface:
public interface IPoint
{
    object Value { get; }
    DateTimeOffset Timestamp { get; }
}
One of the grouping policies I have to implement consists of creating non-overlapping groups of a fixed time size, which, looking at the Reactive documentation, could be achieved using the Buffer(TimeSpan) operator. This is exactly what I need, except that instead of using a runtime-calculated timestamp I need to use the one defined in the Timestamp property of my objects.
I found this solution, which seems to mostly work:
public void Subscribe(Action<IEnumerable<IPoint>> callback)
{
    long windowSizeTicks = TimeRange.Ticks; // TimeRange is my TimeSpan "buffer" size

    // dataPoints is ISubject<IPoint>
    dataPoints.GroupByUntil(x => x.Timestamp.Ticks / windowSizeTicks,
            g => dataPoints.Where(x => x.Timestamp.Ticks / windowSizeTicks != g.Key))
        .SelectMany(x => x.ToList())
        .Subscribe(callback);
}
This solution creates groups based on the divisibility of Ticks by TimeRange, which doesn't work correctly if the first item's timestamp is not aligned to that boundary.
An example to explain what I mean: consider the following points
Value: 1, Timestamp: "2021-04-26T00:00:01"
Value: 2, Timestamp: "2021-04-26T00:00:02"
Value: 3, Timestamp: "2021-04-26T00:00:03"
Value: 4, Timestamp: "2021-04-26T00:00:04"
and a "buffer size" of 2 seconds, I am expecting them to be grouped as [1, 2], [3, 4], but instead I receive [1], [2, 3], [4]. This happens because the grouping key is created considering the absolute time and not the difference from the starting of the data list.
I could save the timestamp of the first item and change the grouping function as shown below, but I think (or at least I hope) there is a better solution:
public void Subscribe(Action<IEnumerable<IPoint>> callback)
{
    long windowSizeTicks = TimeRange.Ticks; // TimeRange is my TimeSpan "buffer" size

    // dataPoints is ISubject<IPoint>
    dataPoints.GroupByUntil(x => (x.Timestamp.Ticks - firstPoint.Timestamp.Ticks) / windowSizeTicks,
            g => dataPoints.Where(x => (x.Timestamp.Ticks - firstPoint.Timestamp.Ticks) / windowSizeTicks != g.Key))
        .SelectMany(x => x.ToList())
        .Subscribe(callback);
}
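For reference, here is a rough, untested sketch of how the first timestamp could be captured inside the query itself (via Publish and Take(1)) rather than in a separate firstPoint field; it assumes the points arrive in timestamp order and ignores completion and error handling:
public void Subscribe(Action<IEnumerable<IPoint>> callback)
{
    long windowSizeTicks = TimeRange.Ticks;

    dataPoints
        .Publish(ps => ps
            .Take(1)                                  // capture the first point...
            .SelectMany(first =>
            {
                long originTicks = first.Timestamp.Ticks;   // ...and use its timestamp as the window origin
                long KeyOf(IPoint p) => (p.Timestamp.Ticks - originTicks) / windowSizeTicks;

                return ps.StartWith(first)            // re-inject the first point into the grouping
                    .GroupByUntil(
                        p => KeyOf(p),
                        g => ps.Where(p => KeyOf(p) != g.Key))
                    .SelectMany(g => g.ToList());
            }))
        .Subscribe(callback);
}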
I am a newbie at Reactive, so any helpful comment is welcome.
Thank you.

Related

System.Reactive: Buffer(timeSpan, timeShift) by Timestamp field

As mentioned in this previous question, I am using the Reactive library in a C# project to group incoming data according to pre-configured policies. All these data implement the following interface:
public interface IPoint
{
    object Value { get; }
    DateTimeOffset Timestamp { get; }
}
My goal is to implement a "hopping" buffer based on the Timestamps of the received data (both buffer size and hop/shift size are declared at the beginning as TimeSpans). Hop/shift size can be less than the buffer size, which means that some IPoint instances can belong to more than one group.
An example: consider the following IPoints
Value: 1, Timestamp: "2021-05-25T00:00:01"
Value: 2, Timestamp: "2021-05-25T00:00:02"
Value: 3, Timestamp: "2021-05-25T00:00:03"
Value: 4, Timestamp: "2021-05-25T00:00:04"
Value: 5, Timestamp: "2021-05-25T00:00:05"
With a buffer size of 3 seconds and a hop/shift size of 2 seconds, I am expecting them to be grouped as [1, 2, 3], [3, 4, 5].
With a buffer size of 2 seconds and a hop/shift size of 3 seconds, I am expecting them to be grouped as [1, 2], [4, 5].
I've seen that there is a Buffer(timeSpan, timeShift) overload doing this job, but it works on runtime-calculated timestamps instead of the Timestamp values of the passed IPoints.
I've tried to look for a solution, but I couldn't find anything helpful.
I am a newbie at Reactive, so any helpful comment is welcome (also for the other question). Thank you.
Edit: as in the previous question, I am using an ISubject<IPoint> in this way:
ISubject<IPoint> subject = new Subject<IPoint>();

// ...

// when new data come from an external source
public void Add(IPoint newPoint)
{
    subject.OnNext(newPoint);
}

// subscription made by another class in order to be called when "hopping" buffer is ready
public void Subscribe(Action<IEnumerable<IPoint>> callback)
{
    // TODO: implement buffer + .Subscribe(callback)
}
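For illustration, here is a rough sketch of the direction I have been considering for the TODO (untested; it assumes the points arrive in non-decreasing Timestamp order, that BufferSize and HopSize are the configured TimeSpans, and that originTicks, the first point's timestamp, has been captured as in the previous question; trailing windows only flush once a later point or completion arrives):
// Rough sketch only: expand each point into every hop window it falls into,
// then group by window index. Window w covers [w*hop, w*hop + buffer) ticks from originTicks.
public void Subscribe(Action<IEnumerable<IPoint>> callback)
{
    long bufferTicks = BufferSize.Ticks; // assumed configured TimeSpans
    long hopTicks = HopSize.Ticks;

    // all hop windows covering a given offset, in ascending order
    IEnumerable<long> WindowsOf(long offset)
    {
        long last = offset / hopTicks;
        long first = last;
        while (first > 0 && (first - 1) * hopTicks + bufferTicks > offset)
            first--;
        for (long w = first; w <= last; w++)
            if (w * hopTicks + bufferTicks > offset)
                yield return w;
    }

    var tagged = subject.SelectMany(p =>
        WindowsOf(p.Timestamp.Ticks - originTicks).Select(w => (Window: w, Point: p)));

    tagged
        .GroupByUntil(
            t => t.Window,
            // close a window once a point beyond its right edge arrives
            g => subject.Where(p =>
                p.Timestamp.Ticks - originTicks >= g.Key * hopTicks + bufferTicks))
        .SelectMany(g => g.Select(t => t.Point).ToList())
        .Subscribe(callback);
}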

C# - Best Way to Match 2 items from a List Without Nested Loops

Say I have a list that holds film durations in minutes, called filmDurations, of type int.
And I have an int parameter called flightDuration for the duration of any given flight in minutes.
My objective is:
For any given flightDuration, I want to find 2 films from my filmDurations whose combined runtime ends exactly 30 minutes before the flight does.
For example:
filmDurations = {130,105,125,140,120}
flightDuration = 280
My output: (130 120)
I can do it with nested loops, but that is not efficient and it is time consuming.
I want to do it more efficiently.
I thought about using LINQ, but that is still O(n^2).
What is the most efficient way?
Edit: I want to clarify one thing.
I want to find filmDurations[i] and filmDurations[j] such that:
filmDurations[i] + filmDurations[j] == flightDuration - 30
And say I have a very large number of film durations.
You could sort all durations (removing duplicates), which is O(n log n), and then iterate through them (up to flightDuration - 30). For each one, binary-search for the corresponding duration of the second film, which is O(log n).
This way you get all duration pairs in O(n log n).
You can also use a hash map (duration -> films) to find matching pairs.
This way you avoid sorting and binary search: iterate through all durations and look up in the map whether there is an entry whose duration equals flightDuration - 30 minus the current duration.
Filling the map is O(n), each lookup is O(1), and you iterate all durations once.
-> Overall complexity O(n), but you lose the ability to find "nearly matching" pairs, which would be easy to implement with the sorted-list approach described above.
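For illustration, a minimal sketch of the sorted-list idea in its classic two-pointer form (it returns the two duration values rather than their original indices, and assumes the usual System.Linq using):
// Sketch of the sorted-list approach: sort once, then walk two pointers inward.
// target = flightDuration - 30, as in the question.
public static (int, int)? FindPair(List<int> filmDurations, int flightDuration)
{
    int target = flightDuration - 30;
    var sorted = filmDurations.OrderBy(d => d).ToList();   // O(n log n)

    int lo = 0, hi = sorted.Count - 1;
    while (lo < hi)                                         // O(n) scan
    {
        int sum = sorted[lo] + sorted[hi];
        if (sum == target)
            return (sorted[lo], sorted[hi]);                // e.g. (120, 130) for the sample data
        if (sum < target)
            lo++;                                           // need a bigger sum
        else
            hi--;                                           // need a smaller sum
    }
    return null;                                            // no matching pair
}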
As Leisen Chang said, you can put all items into a dictionary. After doing that, rewrite your equation
filmDurations[i] + filmDurations[j] == flightDuration - 30
as
filmDurations[i] == (flightDuration - 30 - filmDurations[j])
Now for each item in filmDurations, search for (flightDuration - 30 - filmDurations[j]) in the dictionary. If such an item is found, you have a solution.
The following code implements this concept:
public class IndicesSearch
{
    private readonly List<int> filmDurations;
    private readonly Dictionary<int, int> valuesAndIndices;

    public IndicesSearch(List<int> filmDurations)
    {
        this.filmDurations = filmDurations;

        // preprocessing O(n)
        valuesAndIndices = filmDurations
            .Select((v, i) => new {value = v, index = i})
            .ToDictionary(k => k.value, v => v.index);
    }

    public (int, int) FindIndices(
        int flightDuration,
        int diff = 30)
    {
        // search, also O(n)
        for (var i = 0; i < filmDurations.Count; ++i)
        {
            var filmDuration = filmDurations[i];
            var toFind = flightDuration - filmDuration - diff;
            if (valuesAndIndices.TryGetValue(toFind, out var j))
                return (i, j);
        }

        // no solution found
        return (-1, -1); // or throw exception
    }
}
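A quick usage sketch with the numbers from the question (variable names are just for illustration):
// With the sample data, 130 + 120 == 280 - 30, so the indices of 130 and 120 come back.
var search = new IndicesSearch(new List<int> { 130, 105, 125, 140, 120 });
var (i, j) = search.FindIndices(flightDuration: 280);   // returns (0, 4)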

NTILE function equivalent in C#

I need to implement the following SQL in C# Linq:
SELECT NTILE (3) OVER (ORDER BY TransactionCount DESC) AS A...
I couldn't find any answer to a similar problem except this. However I don't think that is what I am looking for.
I don't even know where to start; if anyone could give me at least a starting point, I'd appreciate it.
-- EDIT --
Trying to explain a little better.
I have one Store X with Transactions, Items, Units and other data that I retrieve from SQL and store in an object in C#.
I have a list of all stores with the same data, but in this case I retrieve it from Analysis Services due to the large amount of data involved (and other reasons), and I store all of it in another object in C#.
So what I need is to order the list and find out whether store X is in the top quartile of that list, the second, the third...
I hope that helps to clarify what I am trying to achieve.
Thank you
I believe that there is no simple LINQ equivalent of NTILE(n). Anyway, depending on your needs it's not that hard to write one.
The T-SQL documentation says
Distributes the rows in an ordered partition into a specified number of groups. The groups are numbered, starting at one. For each row, NTILE returns the number of the group to which the row belongs.
(see here)
For a very crude implementation of NTILE you can use GroupBy. The following example uses an int[] for the sake of simplicity, but of course you are not restricted to that:
int n = 4;
int[] data = { 5, 2, 8, 2, 3, 8, 3, 2, 9, 5 };
var ntile = data.OrderBy(value => value)
                .Select((value, index) => new { Value = value, Index = index })
                .GroupBy(c => Math.Floor(c.Index / (data.Count() / (double)n)), c => c.Value);
First, our data is ordered ascending by its values. If you are not using simple ints, this could be something like store => store.Revenue (given you'd like to get the quantiles by revenue of the stores). Furthermore, we select the ordered data into an anonymous type to include the indices. This is necessary since the indices are needed for grouping, but it seems that GroupBy does not support lambdas with indices, as Select does.
The third line is a bit less intuitive, but I'll try to explain: the NTILE function assigns rows to groups. To create n groups, we divide N (the number of items) by n to get the number of items per group, and then divide the current index by that to determine which group the current item is in. To get the number of groups right I had to keep the number of items per group fractional and floor the calculated group number, but admittedly this is rather empirical.
ntile will contain n groups, each one having a Key equal to the group number. Each group is enumerable. If you'd like to determine whether an element is in the second quartile, you can check whether ntile.Where(g => g.Key == 1) contains the element.
Remarks: The method I've used to determine the group may need some fine adjustment.
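A quick illustration of using the ntile query above, for example to check whether a given value landed in the top tile (the groups' Key runs from 0 to n - 1):
// For the sample data above, the top tile (Key == 3) contains 8 and 9.
bool nineInTopTile = ntile.Single(g => g.Key == n - 1).Contains(9);
Console.WriteLine(nineInTopTile);   // True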
You can do it using the GroupBy function by grouping on the index of each object. Consider a list of integers like this:
List<int> numbers = new List<int> { 1, 2, 3, 4, 5, 6, 7, 8 };
You can first project the index of each element using Select and then group by that index. While calculating the index, we divide it by the group size (3 in this case):
var result = numbers.Select((v, i) => new { Value = v, Index = i / 3 })
                    .GroupBy(x => x.Index)
                    .Select(x => x.Select(z => z.Value).ToList());

Getting all combinations of K and less elements in List of N elements with big K

I want to have all combinations of the elements in a list, for a result like this:
List: {1,2,3}
1
2
3
1,2
1,3
2,3
My problem is that I have 180 elements, and I want all combinations of up to 5 elements. In my tests with 4 elements it took a long time (2 minutes), but all went well. With 5 elements, however, I get an out-of-memory exception.
My code presently is this:
public IEnumerable<IEnumerable<Rondin>> getPossibilites(List<Rondin> rondins)
{
    var combin5 = rondins.Combinations(5);
    var combin4 = rondins.Combinations(4);
    var combin3 = rondins.Combinations(3);
    var combin2 = rondins.Combinations(2);
    var combin1 = rondins.Combinations(1);

    return combin5.Concat(combin4).Concat(combin3).Concat(combin2).Concat(combin1).ToList();
}
Using this extension method (taken from this question: Algorithm to return all combinations of k elements from n):
public static IEnumerable<IEnumerable<T>> Combinations<T>(this IEnumerable<T> elements, int k)
{
    return k == 0 ? new[] { new T[0] } :
        elements.SelectMany((e, i) =>
            elements.Skip(i + 1).Combinations(k - 1).Select(c => (new[] { e }).Concat(c)));
}
I need to search the list for combinations whose elements sum to a value near (within a certain precision) a target value, and this for each element of another list. Here is all my code for this part:
var possibilites = getPossibilites(opt.rondins);
possibilites = possibilites.Where(p => p.Sum(r => r.longueur + traitScie) < 144);

foreach (BilleOptimisee b in opt.billesOptimisees)
{
    var proches = possibilites.Where(p => p.Sum(r => (r.longueur + traitScie)) < b.chute && Math.Abs(b.chute - p.Sum(r => r.longueur)) - (p.Count() * 0.22) < 0.01).OrderByDescending(p => p.Sum(r => r.longueur)).ElementAt(0);
    if (proches != null)
    {
        foreach (Rondin r in proches)
        {
            opt.rondins.Remove(r);
            b.rondins.Add(r);
            possibilites = possibilites.Where(p => !p.Contains(r));
        }
    }
}
With the code I have, how can I limit the memory taken by my list? Or is there a better way to search a very big set of combinations?
Please, if my question is not good, tell me why and I will do my best to learn and ask better questions next time ;)
Your output list for combinations of 5 elements will have ~1.5*10^9 (that's billion with a b) sublists of size 5. Even if you use 32-bit integers and assume a perfect list with zero overhead, that is ~30GB of raw values; with real .NET list overhead on top, it is far more!
You should reconsider whether you actually need to generate the list the way you do; an alternative would be streaming the elements, i.e. generating them on the fly.
That can be done by creating a function which takes the last combination as an argument and outputs the next one. (To see how it is done, think about incrementing a number by one: you go from the last digit to the first, remembering a carry, until you are done.)
A streaming example for choosing 2 out of 4:
start: {4,3}
curr = start {4, 3}
curr = next(curr) {4, 2} // reduce last by one
curr = next(curr) {4, 1} // reduce last by one
curr = next(curr) {3, 2} // cannot reduce more, reduce the first by one, and set the follower to maximal possible value
curr = next(curr) {3, 1} // reduce last by one
curr = next(curr) {2, 1} // similar to {3,2}
done.
Now you need to figure out how to do it for lists of size 2, then generalize it to arbitrary size, and program your streaming combination generator (a rough sketch follows below).
Good Luck!
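To make the streaming idea concrete, here is a minimal sketch of such a generator in ascending (lexicographic) index order rather than the descending order shown above; it yields one index combination at a time instead of materializing them all (assumes 0 <= k <= n):
// Streams all k-combinations of the indices 0..n-1 without building a giant list.
public static IEnumerable<int[]> CombinationIndices(int n, int k)
{
    var current = Enumerable.Range(0, k).ToArray();   // first combination: 0, 1, ..., k-1
    while (true)
    {
        yield return (int[])current.Clone();          // hand out a copy, keep the state private

        // find the rightmost position that can still be incremented
        int i = k - 1;
        while (i >= 0 && current[i] == n - k + i)
            i--;
        if (i < 0)
            yield break;                              // last combination reached

        current[i]++;                                 // increment it...
        for (int j = i + 1; j < k; j++)               // ...and reset everything to its right
            current[j] = current[j - 1] + 1;
    }
}
You could then map the index arrays back to your Rondin objects and filter as you go (for example by iterating Enumerable.Range(1, 5).SelectMany(k => CombinationIndices(rondins.Count, k))), so memory stays flat no matter how many combinations exist.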
Let your precision be defined in the imaginary spectrum.
Use a real index to access the leaf and then traverse the leaf with the required precision.
See PrecisLise # http://net7mma.codeplex.com/SourceControl/latest#Common/Collections/Generic/PrecicseList.cs
While the implementation is not 100% complete as linked you can find where I used a similar concept here:
http://net7mma.codeplex.com/SourceControl/latest#RtspServer/MediaTypes/RFC6184Media.cs
Using this concept I was able to re-order h.264 Access Units and their underlying Network Abstraction Layer components in what I consider a very interesting way... besides being interesting, it also has the potential to be more efficient while using close to the same amount of memory.
For example, 0 can be followed by 0.1 or 0.01 or 0.001; depending on the type of the key in the list (double, float, Vector, inter alia) you may have the added benefit of using the FPU, or possibly even intrinsics if supported by your processor, thus making sorting and indexing much faster than would be possible on normal sets, regardless of the underlying storage mechanism.
Using this concept allows for very interesting ordering... especially if you provide a mechanism to filter the precision.
I was also able to find several bugs in the bit-stream parser of quite a few well known media libraries using this methodology...
I found my solution. I'm writing it here so that other people who have a similar problem can have something to work with...
I made a recursive function that checks for a fixed number of possibilities that fit the conditions. When that number of possibilities is found, I return the list of possibilities, do some calculations with the results, and I can restart the process. I added a timer to stop the search when it takes too long. Since my condition is based on the sum of the elements, I build possibilities from distinct values only, and search for a small number of possibilities each time (like 1).
So the function returns a possibility with a very high precision; I do what I need to do with it, remove its elements from the original list, and call the function again with the same precision until nothing is returned, so I can continue with another precision. When many precisions are done, there are only about 30 elements left in my list, so I can ask for all the possibilities (that still fit the maximum sum), and this part is much easier than the beginning.
Here is my code:
public List<IEnumerable<Rondin>> getPossibilites(IEnumerable<Rondin> rondins, int nbElements, double minimum, double maximum, int instance = 0, double longueur = 0)
{
    if (instance == 0)
        timer = DateTime.Now;

    List<IEnumerable<Rondin>> liste = new List<IEnumerable<Rondin>>();

    // Get all distinct rondins that can fit into the maximal length
    foreach (Rondin r in rondins.Where(r => r.longueur < (maximum - longueur)).DistinctBy(r => r.longueur).OrderBy(r => r.longueur))
    {
        // Check the current length
        double longueur2 = longueur + r.longueur + traitScie;

        // If the current length is under the maximal length
        if (longueur2 < maximum)
        {
            // Get all the possibilities with all rondins except the current one, and add them to the list
            foreach (IEnumerable<Rondin> poss in getPossibilites(rondins.Where(rondin => rondin.id != r.id), nbElements - liste.Count, minimum, maximum, instance + 1, longueur2).Select(possibilite => possibilite.Concat(new Rondin[] { r })))
            {
                liste.Add(poss);
                if (liste.Count >= nbElements && nbElements > 0)
                    break;
            }

            // If the current length is higher than the minimum, add it to the list
            if (longueur2 >= minimum)
                liste.Add(new Rondin[] { r });
        }

        // If we have enough possibilities, we stop the search
        if (liste.Count >= nbElements && nbElements > 0)
            break;

        // If the search is taking too long, stop and return the list
        if (DateTime.Now.Subtract(timer).TotalSeconds > 30)
            break;
    }

    return liste;
}

IObservable - Ignore new elements for a span of time

I'm trying to "throttle" an IObservable in (what I think is) a different way of the standard throttle methods.
I want to ignore values for 1s following a first non ignored value in the stream.
For example, if 1s=5 dashes
source: --1-23--45-----678901234
result: --1-----4------6----1---
Any ideas on how to achieve this?
Here is an idiomatic way to do this in Rx, as an extension method - an explanation and example using your scenario follows.
The desired function works a lot like Observable.Throttle but emits qualifying events as soon as they arrive rather than delaying for the duration of the throttle or sample period. For a given duration after a qualifying event, subsequent events are suppressed:
public static IObservable<T> SampleFirst<T>(
    this IObservable<T> source,
    TimeSpan sampleDuration,
    IScheduler scheduler = null)
{
    scheduler = scheduler ?? Scheduler.Default;
    return source.Publish(ps =>
        ps.Window(() => ps.Delay(sampleDuration, scheduler))
          .SelectMany(x => x.Take(1)));
}
The idea is to use the overload of Window that creates non-overlapping windows using a windowClosingSelector that uses the source time-shifted back by the sampleDuration. Each window will therefore: (a) be closed by the first element in it and (b) remain open until a new element is permitted. We then simply select the first element from each window.
In the following example, I have repeated exactly your test scenario, modelling one "dash" as 100 ticks. Note the delay is specified as 499 ticks rather than 500, due to the resolution of passing events between multiple schedulers causing 1-tick drifts; in practice you wouldn't need to dwell on this, as single-tick resolution is unlikely to be meaningful. The ReactiveTest class and OnNext helper methods are made available by including the Rx testing framework NuGet package rx-testing:
public class Tests : ReactiveTest
{
    public void Scenario()
    {
        var scheduler = new TestScheduler();
        var test = scheduler.CreateHotObservable<int>(
            // set up events as per the OP scenario
            // using 1 dash = 100 ticks
            OnNext(200, 1),
            OnNext(400, 2),
            OnNext(500, 3),
            OnNext(800, 4),
            OnNext(900, 5),
            OnNext(1500, 6),
            OnNext(1600, 7),
            OnNext(1700, 8),
            OnNext(1800, 9),
            OnNext(1900, 0),
            OnNext(2000, 1),
            OnNext(2100, 2),
            OnNext(2200, 3),
            OnNext(2300, 4)
        );

        test.SampleFirst(TimeSpan.FromTicks(499), scheduler)
            .Timestamp(scheduler)
            .Subscribe(x => Console.WriteLine(
                "Time: {0} Value: {1}", x.Timestamp.Ticks, x.Value));

        scheduler.Start();
    }
}
Note that output is as per your scenario:
Time: 200 Value: 1
Time: 800 Value: 4
Time: 1500 Value: 6
Time: 2000 Value: 1
This should do the trick; there may be a shorter implementation.
The accumulator in the Scan stores the timestamp of the last kept item and marks whether to keep each item.
public static IObservable<T> RateLimit<T>(this IObservable<T> source, TimeSpan duration)
{
    return source
        .Timestamp()
        .Scan(
            new
            {
                Item = default(T),
                Timestamp = DateTimeOffset.MinValue,
                Keep = false
            },
            (a, x) =>
            {
                var keep = a.Timestamp + duration <= x.Timestamp;
                return new
                {
                    Item = x.Value,
                    Timestamp = keep ? x.Timestamp : a.Timestamp,
                    Keep = keep
                };
            })
        .Where(a => a.Keep)
        .Select(a => a.Item);
}
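A quick usage sketch for the extension above (hypothetical stream; just to show the call shape):
// Keep the first value, then ignore further values for 1 second after each kept one.
var source = new Subject<int>();
source.RateLimit(TimeSpan.FromSeconds(1))
      .Subscribe(x => Console.WriteLine($"kept {x}"));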
