RX Throttle with timeout [duplicate] - c#

I want to effectively throttle an event stream, so that my delegate is called when the first event is received but then not for 1 second if subsequent events are received. After expiry of that timeout (1 second), if a subsequent event was received I want my delegate to be called.
Is there a simple way to do this using Reactive Extensions?
Sample code:
static void Main(string[] args)
{
Console.WriteLine("Running...");
var generator = Observable
.GenerateWithTime(1, x => x <= 100, x => x, x => TimeSpan.FromMilliseconds(1), x => x + 1)
.Timestamp();
var builder = new StringBuilder();
generator
.Sample(TimeSpan.FromSeconds(1))
.Finally(() => Console.WriteLine(builder.ToString()))
.Subscribe(feed =>
builder.AppendLine(string.Format("Observed {0:000}, generated at {1}, observed at {2}",
feed.Value,
feed.Timestamp.ToString("mm:ss.fff"),
DateTime.Now.ToString("mm:ss.fff"))));
Console.ReadKey();
}
Current output:
Running...
Observed 064, generated at 41:43.602, observed at 41:43.602
Observed 100, generated at 41:44.165, observed at 41:44.602
But I want to observe (timestamps obviously will change)
Running...
Observed 001, generated at 41:43.602, observed at 41:43.602
....
Observed 100, generated at 41:44.165, observed at 41:44.602
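To pin down the requested timing before looking at Rx operators, here is a minimal sketch that models it as a plain function over timestamped events. `ResponsiveSampleModel` and `Simulate` are hypothetical names used only for illustration and are not part of Rx; times are plain milliseconds, and the events are assumed to arrive ordered by time.

```csharp
using System;
using System.Collections.Generic;

public static class ResponsiveSampleModel
{
    // Input: events as (time in ms, value), ordered by time.
    // Output: (emission time, value) pairs under the rule "emit at once if
    // the window is open, otherwise emit the latest held-back value when
    // the window reopens".
    public static List<(long Time, T Value)> Simulate<T>(
        IEnumerable<(long Time, T Value)> events, long intervalMs)
    {
        var output = new List<(long Time, T Value)>();
        long nextAllowed = long.MinValue; // the window starts open
        bool hasPending = false;
        T pending = default(T);

        foreach (var e in events)
        {
            // A value held back while the window was closed is released
            // at the moment the window reopened.
            if (hasPending && e.Time >= nextAllowed)
            {
                output.Add((nextAllowed, pending));
                nextAllowed += intervalMs;
                hasPending = false;
            }

            if (!hasPending && e.Time >= nextAllowed)
            {
                // Window is open: forward immediately and close it.
                output.Add((e.Time, e.Value));
                nextAllowed = e.Time + intervalMs;
            }
            else
            {
                // Window is closed: remember only the latest value.
                pending = e.Value;
                hasPending = true;
            }
        }

        if (hasPending) output.Add((nextAllowed, pending));
        return output;
    }
}

public static class Program
{
    public static void Main()
    {
        var events = new[]
        {
            (0L, 1), (100L, 2), (200L, 3), (1500L, 4), (1600L, 5), (4000L, 6)
        };
        foreach (var (time, value) in ResponsiveSampleModel.Simulate(events, 1000))
            Console.WriteLine($"{time} ms: {value}");
    }
}
```

With these six events and a 1000 ms interval, the sketch emits values 1, 3, 5 and 6 at 0, 1000, 2000 and 4000 ms: the first event passes immediately, and later bursts are reduced to their latest value once the window reopens.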

Okay,
you have 4 scenarios here:
1) You want a value only once the stream has been quiet for the given time span.
This means: while events keep arriving faster than the time span, nothing is emitted; the last event is emitted once the stream pauses.
observableStream.Throttle(timeSpan)
2) You want the latest event that was produced before each second elapses.
This means: the other events are dropped.
observableStream.Sample(TimeSpan.FromSeconds(1))
3) You want all events that happened in the last second, delivered every second.
observableStream.BufferWithTime(timeSpan)
4) You want to decide what happens with the values that arrive within the second, and get your result each time the second has passed.
observableStream.CombineLatest(Observable.Interval(TimeSpan.FromSeconds(1)), selectorOnEachEvent)

Here is what I got with some help from the Rx forum:
The idea is to issue a series of "tickets" for the original sequence to fire. These "tickets" are delayed for the timeout, excluding the very first one, which is immediately pre-pended to the ticket sequence. When an event comes in and there is a ticket waiting, the event fires immediately, otherwise it waits till the ticket and then fires. When it fires, the next ticket is issued, and so on...
To combine the tickets and original events, we need a combinator. Unfortunately, the "standard" .CombineLatest cannot be used here because it would fire on tickets and events that were used previously. So I had to create my own combinator, which is basically a filtered .CombineLatest that fires only when both elements in the combination are "fresh", that is, were never returned before. I call it .CombineVeryLatest aka .BrokenZip ;)
Using .CombineVeryLatest, the above idea can be implemented as such:
public static IObservable<T> SampleResponsive<T>(
this IObservable<T> source, TimeSpan delay)
{
return source.Publish(src =>
{
var fire = new Subject<T>();
var whenCanFire = fire
.Select(u => new Unit())
.Delay(delay)
.StartWith(new Unit());
var subscription = src
.CombineVeryLatest(whenCanFire, (x, flag) => x)
.Subscribe(fire);
return fire.Finally(subscription.Dispose);
});
}
public static IObservable<TResult> CombineVeryLatest
<TLeft, TRight, TResult>(this IObservable<TLeft> leftSource,
IObservable<TRight> rightSource, Func<TLeft, TRight, TResult> selector)
{
var ls = leftSource.Select(x => new Used<TLeft>(x));
var rs = rightSource.Select(x => new Used<TRight>(x));
var cmb = ls.CombineLatest(rs, (x, y) => new { x, y });
var fltCmb = cmb
.Where(a => !(a.x.IsUsed || a.y.IsUsed))
.Do(a => { a.x.IsUsed = true; a.y.IsUsed = true; });
return fltCmb.Select(a => selector(a.x.Value, a.y.Value));
}
private class Used<T>
{
internal T Value { get; private set; }
internal bool IsUsed { get; set; }
internal Used(T value)
{
Value = value;
}
}
Edit: here's another more compact variation of CombineVeryLatest proposed by Andreas Köpf on the forum:
public static IObservable<TResult> CombineVeryLatest
<TLeft, TRight, TResult>(this IObservable<TLeft> leftSource,
IObservable<TRight> rightSource, Func<TLeft, TRight, TResult> selector)
{
return Observable.Defer(() =>
{
int l = -1, r = -1;
return Observable.CombineLatest(
leftSource.Select(Tuple.Create<TLeft, int>),
rightSource.Select(Tuple.Create<TRight, int>),
(x, y) => new { x, y })
.Where(t => t.x.Item2 != l && t.y.Item2 != r)
.Do(t => { l = t.x.Item2; r = t.y.Item2; })
.Select(t => selector(t.x.Item1, t.y.Item1));
});
}

I was struggling with this same problem last night, and believe I've found a more elegant (or at least shorter) solution:
var delay = Observable.Empty<T>().Delay(TimeSpan.FromSeconds(1));
var throttledSource = source.Take(1).Concat(delay).Repeat();

This is what I posted as an answer to this question in the Rx forum:
UPDATE:
Here is a new version that no longer delays event forwarding when events occur more than one second apart:
public static IObservable<T> ThrottleResponsive3<T>(this IObservable<T> source, TimeSpan minInterval)
{
return Observable.CreateWithDisposable<T>(o =>
{
object gate = new Object();
Notification<T> last = null, lastNonTerminal = null;
DateTime referenceTime = DateTime.UtcNow - minInterval;
var delayedReplay = new MutableDisposable();
return new CompositeDisposable(source.Materialize().Subscribe(x =>
{
lock (gate)
{
var elapsed = DateTime.UtcNow - referenceTime;
if (elapsed >= minInterval && delayedReplay.Disposable == null)
{
referenceTime = DateTime.UtcNow;
x.Accept(o);
}
else
{
if (x.Kind == NotificationKind.OnNext)
lastNonTerminal = x;
last = x;
if (delayedReplay.Disposable == null)
{
delayedReplay.Disposable = Scheduler.ThreadPool.Schedule(() =>
{
lock (gate)
{
referenceTime = DateTime.UtcNow;
if (lastNonTerminal != null && lastNonTerminal != last)
lastNonTerminal.Accept(o);
last.Accept(o);
last = lastNonTerminal = null;
delayedReplay.Disposable = null;
}
}, minInterval - elapsed);
}
}
}
}), delayedReplay);
});
}
This was my earlier try:
var source = Observable.GenerateWithTime(1,
x => x <= 100, x => x, x => TimeSpan.FromMilliseconds(1), x => x + 1)
.Timestamp();
source.Publish(o =>
o.Take(1).Merge(o.Skip(1).Sample(TimeSpan.FromSeconds(1)))
).Run(x => Console.WriteLine(x));

Ok, here's one solution. I don't like it, particularly, but... oh well.
Hat tips to Jon for pointing me at SkipWhile, and to cRichter for the BufferWithTime. Thanks guys.
static void Main(string[] args)
{
Console.WriteLine("Running...");
var generator = Observable
.GenerateWithTime(1, x => x <= 100, x => x, x => TimeSpan.FromMilliseconds(1), x => x + 1)
.Timestamp();
var bufferedAtOneSec = generator.BufferWithTime(TimeSpan.FromSeconds(1));
var action = new Action<Timestamped<int>>(
feed => Console.WriteLine("Observed {0:000}, generated at {1}, observed at {2}",
feed.Value,
feed.Timestamp.ToString("mm:ss.fff"),
DateTime.Now.ToString("mm:ss.fff")));
var reactImmediately = true;
bufferedAtOneSec.Subscribe(list =>
{
if (list.Count == 0)
{
reactImmediately = true;
}
else
{
action(list.Last());
}
});
generator
.SkipWhile(item => reactImmediately == false)
.Subscribe(feed =>
{
if(reactImmediately)
{
reactImmediately = false;
action(feed);
}
});
Console.ReadKey();
}

Have you tried the Throttle extension method?
From the docs:
Ignores values from an observable sequence which are followed by another value before dueTime
It's not quite clear to me whether that's going to do what you want or not - in that you want to ignore the following values rather than the first value... but I would expect it to be what you want. Give it a try :)
EDIT: Hmmm... no, I don't think Throttle is the right thing, after all. I believe I see what you want to do, but I can't see anything in the framework to do it. I may well have missed something though. Have you asked on the Rx forum? It may well be that if it's not there now, they'd be happy to add it :)
I suspect you could do it cunningly with SkipUntil and SelectMany somehow... but I think it should be in its own method.

What you are searching for is the CombineLatest.
public static IObservable<TResult> CombineLatest<TLeft, TRight, TResult>(
IObservable<TLeft> leftSource,
IObservable<TRight> rightSource,
Func<TLeft, TRight, TResult> selector
)
It merges two observables and returns a combined value whenever one of them (here, the timer) produces a value.
Edit: John is right, that is maybe not the preferred solution.

Inspired by Bluelings' answer, here is a version that compiles with Reactive Extensions 2.2.5.
This particular version counts the number of samples and also provides the last sampled value. To do this, the following class is used:
class Sample<T> {
public Sample(T lastValue, Int32 count) {
LastValue = lastValue;
Count = count;
}
public T LastValue { get; private set; }
public Int32 Count { get; private set; }
}
Here is the operator:
public static IObservable<Sample<T>> SampleResponsive<T>(this IObservable<T> source, TimeSpan interval, IScheduler scheduler = null) {
if (source == null)
throw new ArgumentNullException(nameof(source));
return Observable.Create<Sample<T>>(
observer => {
var gate = new Object();
var lastSampleValue = default(T);
var lastSampleTime = default(DateTime);
var sampleCount = 0;
var scheduledTask = new SerialDisposable();
return new CompositeDisposable(
source.Subscribe(
value => {
lock (gate) {
var now = DateTime.UtcNow;
var elapsed = now - lastSampleTime;
if (elapsed >= interval) {
observer.OnNext(new Sample<T>(value, 1));
lastSampleValue = value;
lastSampleTime = now;
sampleCount = 0;
}
else {
if (scheduledTask.Disposable == null) {
scheduledTask.Disposable = (scheduler ?? Scheduler.Default).Schedule(
interval - elapsed,
() => {
lock (gate) {
if (sampleCount > 0) {
lastSampleTime = DateTime.UtcNow;
observer.OnNext(new Sample<T>(lastSampleValue, sampleCount));
sampleCount = 0;
}
scheduledTask.Disposable = null;
}
}
);
}
lastSampleValue = value;
sampleCount += 1;
}
}
},
error => {
if (sampleCount > 0)
observer.OnNext(new Sample<T>(lastSampleValue, sampleCount));
observer.OnError(error);
},
() => {
if (sampleCount > 0)
observer.OnNext(new Sample<T>(lastSampleValue, sampleCount));
observer.OnCompleted();
}
),
scheduledTask
);
}
);
}

Related

check discontinuity of multiple ranges in a list

I would like to ask you if there's a way by Linq to check discontinuity of multiple ranges, for example we have a class AgeRange:
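public class AgeRange
{
public int firstValue {get;set;}
public int secondValue {get;set;}
// constructor needed by the new AgeRange(0,2) calls below
public AgeRange(int firstValue, int secondValue)
{
this.firstValue = firstValue;
this.secondValue = secondValue;
}
}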
var ageRange1 = new AgeRange(0,2); // interval [0,2]
var ageRange2 = new AgeRange(4,10); // interval [4,10]
var ageRange3 = new AgeRange(11,int.MaxValue); // interval [11,+oo[
var ageRangeList = new List<AgeRange>();
ageRangeList.Add(ageRange1);
ageRangeList.Add(ageRange2);
ageRangeList.Add(ageRange3);
in this example we have a discontinuity between first range and second range.
is there a way in Linq to check discontinuity between elements in ageRangeList ?
Thanks for you help.
Assuming firstValue always <= secondValue (for the same element), you can try to use Aggregate:
var start = ageRangeList
.OrderBy(a => a.firstValue)
.First();
var result = ageRangeList
.OrderBy(a => a.firstValue)
.Aggregate(
(hasGap: false, s: start.secondValue),
(tuple, range) =>
{
if (tuple.hasGap)
{
return tuple;
}
else
{
var max = Math.Max(tuple.s, tuple.s+1); //hacky overflow protection
if (max < range.firstValue)
{
return (true, tuple.s);
}
else
{
return (false, Math.Max(tuple.s, range.secondValue));
}
}
})
.hasGap;
The downside of this approach is that it still needs to loop through all age ranges.
If you want to find the first discontinuity and use that information elsewhere:
public static IEnumerable<AgeRange> FindDiscontinuity(List<AgeRange> ageRangeList) {
foreach(var ageRange in ageRangeList.Zip(ageRangeList.Skip(1), (a, b) => new {Prev = a, Current = b})) {
if((long)ageRange.Current.firstValue - ageRange.Prev.secondValue > 1) { // a gap exists when at least one integer is skipped
yield return ageRange.Prev;
yield return ageRange.Current;
break;
}
}
}
public static void Main()
{
var ageRange1 = new AgeRange(0, 2);
var ageRange2 = new AgeRange(4, 10);
var ageRange3 = new AgeRange(11, int.MaxValue);
var ageRangeList = new List<AgeRange>();
ageRangeList.Add(ageRange1);
ageRangeList.Add(ageRange2);
ageRangeList.Add(ageRange3);
var result = FindDiscontinuity(ageRangeList);
foreach(var ageRange in result) {
Console.WriteLine("{0}, {1}", ageRange.firstValue, ageRange.secondValue);
}
}
You can change the function so that it returns a boolean value instead of the data.
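A boolean variant could look like the minimal sketch below. `AgeRangeChecks` and `HasDiscontinuity` are hypothetical names, the `AgeRange` class is repeated with a constructor so the block is self-contained, and the ranges are assumed not to overlap or nest (otherwise a running maximum of the upper bounds would be needed).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class AgeRange
{
    public int firstValue { get; set; }
    public int secondValue { get; set; }

    public AgeRange(int firstValue, int secondValue)
    {
        this.firstValue = firstValue;
        this.secondValue = secondValue;
    }
}

public static class AgeRangeChecks
{
    // Returns true when two neighbouring ranges skip at least one integer.
    // Assumption: ranges do not overlap or nest.
    public static bool HasDiscontinuity(IEnumerable<AgeRange> ranges)
    {
        var ordered = ranges.OrderBy(r => r.firstValue).ToList();
        return ordered
            .Zip(ordered.Skip(1),
                 // long arithmetic avoids overflow near int.MaxValue
                 (prev, next) => (long)next.firstValue - prev.secondValue > 1)
            .Any(hasGap => hasGap); // stops at the first gap found
    }
}

public static class Program
{
    public static void Main()
    {
        var ranges = new List<AgeRange>
        {
            new AgeRange(0, 2),
            new AgeRange(4, 10),
            new AgeRange(11, int.MaxValue)
        };
        Console.WriteLine(AgeRangeChecks.HasDiscontinuity(ranges));
    }
}
```

Because `Any` short-circuits, the pairs after the first gap are never examined, although the initial sort still touches every element.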

Most efficient way to find all modes in a List of randomly generated integers and how often they occured

If a method written in C# will be passed either a null or somewhere between 0 to 6,000,000 randomly generated and unsorted integers, what is the most efficient way to determine all modes and how many times they occurred? In particular, can anyone help me with a LINQ based solution, which I'm struggling with?
Here is what I have so far:
My closest LINQ solution so far only grabs the first mode it finds and does not specify the number of occurrences. It is also about 7 times as slow on my computer as my ugly, bulky implementation, which is hideous.
int mode = numbers.GroupBy(number => number).OrderByDescending(group => group.Count()).Select(k => k.Key).FirstOrDefault();
My manually coded method.
public class NumberCount
{
public int Value;
public int Occurrences;
public NumberCount(int value, int occurrences)
{
Value = value;
Occurrences = occurrences;
}
}
private static List<NumberCount> findMostCommon(List<int> integers)
{
if (integers == null)
return null;
else if (integers.Count < 1)
return new List<NumberCount>();
List<NumberCount> mostCommon = new List<NumberCount>();
integers.Sort();
mostCommon.Add(new NumberCount(integers[0], 1));
for (int i=1; i<integers.Count; i++)
{
if (mostCommon[mostCommon.Count - 1].Value != integers[i])
mostCommon.Add(new NumberCount(integers[i], 1));
else
mostCommon[mostCommon.Count - 1].Occurrences++;
}
List<NumberCount> answer = new List<NumberCount>();
answer.Add(mostCommon[0]);
for (int i=1; i<mostCommon.Count; i++)
{
if (mostCommon[i].Occurrences > answer[0].Occurrences)
{
if (answer.Count == 1)
{
answer[0] = mostCommon[i];
}
else
{
answer = new List<NumberCount>();
answer.Add(mostCommon[i]);
}
}
else if (mostCommon[i].Occurrences == answer[0].Occurrences)
{
answer.Add(mostCommon[i]);
}
}
return answer;
}
Basically, I'm trying to get an elegant, compact LINQ solution at least as fast as my ugly method. Thanks in advance for any suggestions.
I would personally use a ConcurrentDictionary to keep a counter per value, since dictionary lookups are fast. I use this method quite a lot and find it more readable.
// create a dictionary
var dictionary = new ConcurrentDictionary<int, int>();
// list of your integers
var numbers = new List<int>();
// parallelize the iteration (we can, because ConcurrentDictionary is thread safe-ish)
numbers.AsParallel().ForAll((number) =>
{
// add the key with a value of 1 if it's not there; if it is there, use the lambda to increment it by 1
dictionary.AddOrUpdate(number, 1, (key, old) => old + 1);
});
Then it's only a matter of finding the highest count, and there are many ways to do that. I don't fully understand your version, but finding the single most frequent value takes just one Aggregate, like so:
var topMostOccurence = dictionary.Aggregate((x, y) => { return x.Value > y.Value ? x : y; });
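The Aggregate call above keeps only one key/value pair, so ties are lost. Here is a minimal single-threaded sketch that returns every mode with its count. `ModeFinder` and `AllModes` are hypothetical names, and a plain Dictionary is assumed to be sufficient when no parallelism is involved.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ModeFinder
{
    // Returns every value that shares the highest count, with that count.
    // Mirrors the question's contract: null in gives null out.
    public static List<KeyValuePair<int, int>> AllModes(IEnumerable<int> numbers)
    {
        if (numbers == null) return null;

        var counts = new Dictionary<int, int>();
        foreach (int n in numbers)
        {
            counts.TryGetValue(n, out int c); // c is 0 when n is new
            counts[n] = c + 1;
        }

        if (counts.Count == 0) return new List<KeyValuePair<int, int>>();

        int max = counts.Values.Max();
        return counts.Where(kv => kv.Value == max).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        var modes = ModeFinder.AllModes(new[] { 1, 1, 1, 2, 2, 2, 3, 3, 3, 4 });
        foreach (var kv in modes.OrderBy(kv => kv.Key))
            Console.WriteLine($"{kv.Key} occurs {kv.Value} times");
    }
}
```

The result order follows the dictionary's enumeration order, which is not guaranteed, so sort the result if a stable order matters.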
What you want: two or more numbers can appear the same number of times in an array, like {1,1,1,2,2,2,3,3,3}.
Your current code is from here: Find the most occurring number in a List<int>
but it returns only a single number, which is the wrong result for your case.
A limitation of LINQ is that you cannot stop the iteration early once you no longer want it to continue.
Still, here is a LINQ version that returns a list, as you required:
List<NumberCount> MaxOccurrences(List<int> integers)
{
return integers?.AsParallel()
.GroupBy(x => x)//group numbers, key is number, count is count
.Select(k => new NumberCount(k.Key, k.Count()))
.GroupBy(x => x.Occurrences)//group by Occurrences, key is Occurrences, value is result
.OrderByDescending(x => x.Key) //sort
.FirstOrDefault()? //the first one is result
.ToList();
}
Test details:
Array Size:30000
30000
MaxOccurrences only
MaxOccurrences1: 207
MaxOccurrences2: 38
=============
Full List
Original1: 28
Original2: 23
ConcurrentDictionary1: 32
ConcurrentDictionary2: 34
AsParallel1: 27
AsParallel2: 19
AsParallel3: 36
ArraySize: 3000000
3000000
MaxOccurrences only
MaxOccurrences1: 3009
MaxOccurrences2: 1962 //<==this is the best one in big loop.
=============
Full List
Original1: 3200
Original2: 3234
ConcurrentDictionary1: 3391
ConcurrentDictionary2: 2681
AsParallel1: 3776
AsParallel2: 2389
AsParallel3: 2155
Here is code:
class Program
{
static void Main(string[] args)
{
const int listSize = 3000000;
var rnd = new Random();
var randomList = Enumerable.Range(1, listSize).OrderBy(e => rnd.Next()).ToList();
// the code that you want to measure comes here
Console.WriteLine(randomList.Count);
Console.WriteLine("MaxOccurrences only");
Test(randomList, MaxOccurrences1);
Test(randomList, MaxOccurrences2);
Console.WriteLine("=============");
Console.WriteLine("Full List");
Test(randomList, Original1);
Test(randomList, Original2);
Test(randomList, AsParallel1);
Test(randomList, AsParallel2);
Test(randomList, AsParallel3);
Console.ReadLine();
}
private static void Test(List<int> data, Action<List<int>> method)
{
var watch = System.Diagnostics.Stopwatch.StartNew();
method(data);
watch.Stop();
Console.WriteLine($"{method.Method.Name}: {watch.ElapsedMilliseconds}");
}
private static void Original1(List<int> integers)
{
integers?.GroupBy(number => number)
.OrderByDescending(group => group.Count())
.Select(k => new NumberCount(k.Key, k.Count()))
.ToList();
}
private static void Original2(List<int> integers)
{
integers?.GroupBy(number => number)
.Select(k => new NumberCount(k.Key, k.Count()))
.OrderByDescending(x => x.Occurrences)
.ToList();
}
private static void AsParallel1(List<int> integers)
{
integers?.GroupBy(number => number)
.AsParallel() // groups are counted in parallel across CPU cores
.Select(k => new NumberCount(k.Key, k.Count())) // grab the result before sorting
.OrderByDescending(x => x.Occurrences) // sort once the result is built
.ToList();
}
private static void AsParallel2(List<int> integers)
{
integers?.AsParallel()
.GroupBy(number => number)
.Select(k => new
{
Key = k.Key,
Occurrences = k.Count()
}) // grab the result before sorting
.OrderByDescending(x => x.Occurrences) // sort once the result is built
.ToList();
}
private static void AsParallel3(List<int> integers)
{
integers?.AsParallel()
.GroupBy(number => number)
.Select(k => new NumberCount(k.Key, k.Count())) // grab the result before sorting
.OrderByDescending(x => x.Occurrences) // sort once the result is built
.ToList();
}
private static void MaxOccurrences1(List<int> integers)
{
integers?.AsParallel()
.GroupBy(number => number)
.GroupBy(x => x.Count())
.OrderByDescending(x => x.Key)
.FirstOrDefault()?
.ToList()
.Select(k => new NumberCount(k.Key, k.Count()))
.ToList();
}
private static void MaxOccurrences2(List<int> integers)
{
integers?.AsParallel()
.GroupBy(x => x)//group numbers, key is number, count is count
.Select(k => new NumberCount(k.Key, k.Count()))
.GroupBy(x => x.Occurrences)//group by Occurrences, key is Occurrences, value is result
.OrderByDescending(x => x.Key) //sort
.FirstOrDefault()? //the first one is result
.ToList();
}
private static void ConcurrentDictionary1(List<int> integers)
{
ConcurrentDictionary<int, int> result = new ConcurrentDictionary<int, int>();
integers?.ForEach(x => { result.AddOrUpdate(x, 1, (key, old) => old + 1); });
result.OrderByDescending(x => x.Value).ToList();
}
private static void ConcurrentDictionary2(List<int> integers)
{
ConcurrentDictionary<int, int> result = new ConcurrentDictionary<int, int>();
integers?.AsParallel().ForAll(x => { result.AddOrUpdate(x, 1, (key, old) => old + 1); });
result.OrderByDescending(x => x.Value).ToList();
}
}
public class NumberCount
{
public int Value;
public int Occurrences;
public NumberCount(int value, int occurrences)
{
Value = value;
Occurrences = occurrences;
}
}
Different code is more efficient for differing lengths, but as the length approaches 6 million, this approach seems fastest. In general, LINQ is not for improving the speed of code, but the understanding and maintainability, depending on how you feel about functional programming styles.
Your code is fairly fast, and beats the simple LINQ approaches using GroupBy. It gains a good advantage from using the fact that List.Sort is highly optimized, and my code uses that as well, but on a local copy of the list to avoid changing the source. My code is similar in approach to yours, but is designed around a single pass doing all the computation needed. It uses an extension method I re-optimized for this problem, called GroupByRuns, that returns an IEnumerable<IGrouping<T,T>>. It also is hand expanded rather than fall back on the generic GroupByRuns that takes extra arguments for key and result selection. Since .Net doesn't have an end user accessible IGrouping<,> implementation (!), I rolled my own that implements ICollection to optimize Count().
This code runs about 1.3x as fast as yours (after I slightly optimized yours by 5%).
First, the RunGrouping class to return a group of runs:
public class RunGrouping<T> : IGrouping<T, T>, ICollection<T> {
public T Key { get; }
int Count;
int ICollection<T>.Count => Count;
public bool IsReadOnly => true;
public RunGrouping(T key, int count) {
Key = key;
Count = count;
}
public IEnumerator<T> GetEnumerator() {
for (int j1 = 0; j1 < Count; ++j1)
yield return Key;
}
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
public void Add(T item) => throw new NotImplementedException();
public void Clear() => throw new NotImplementedException();
public bool Contains(T item) => Count > 0 && EqualityComparer<T>.Default.Equals(Key, item);
public void CopyTo(T[] array, int arrayIndex) => throw new NotImplementedException();
public bool Remove(T item) => throw new NotImplementedException();
}
Second, the extension method on IEnumerable that groups the runs:
public static class IEnumerableExt {
public static IEnumerable<IGrouping<T, T>> GroupByRuns<T>(this IEnumerable<T> src) {
var cmp = EqualityComparer<T>.Default;
bool notAtEnd = true;
using (var e = src.GetEnumerator()) {
bool moveNext() {
return notAtEnd;
}
IGrouping<T, T> NextRun() {
var prev = e.Current;
var ct = 0;
while (notAtEnd && cmp.Equals(e.Current, prev)) {
++ct;
notAtEnd = e.MoveNext();
}
return new RunGrouping<T>(prev, ct);
}
notAtEnd = e.MoveNext();
while (notAtEnd)
yield return NextRun();
}
}
}
Finally, the extension method that finds the max count modes. Basically it goes through the runs and keeps a record of those int with the current longest run count.
public static class IEnumerableIntExt {
public static IEnumerable<KeyValuePair<int, int>> MostCommon(this IEnumerable<int> src) {
var mysrc = new List<int>(src);
mysrc.Sort();
var maxc = 0;
var maxmodes = new List<int>();
foreach (var g in mysrc.GroupByRuns()) {
var gc = g.Count();
if (gc > maxc) {
maxmodes.Clear();
maxmodes.Add(g.Key);
maxc = gc;
}
else if (gc == maxc)
maxmodes.Add(g.Key);
}
return maxmodes.Select(m => new KeyValuePair<int, int>(m, maxc));
}
}
Given an existing random list of integers rl, you can get the answer with:
var ans = rl.MostCommon();
I tested with the code below on my Intel i7-8700K and achieved the following results:
Lambda: found 78 in 134 ms.
Manual: found 78 in 368 ms.
Dictionary: found 78 in 195 ms.
static IEnumerable<int> GenerateNumbers(int amount)
{
Random r = new Random();
for (int i = 0; i < amount; i++)
yield return r.Next(100);
}
static void Main(string[] args)
{
var numbers = GenerateNumbers(6_000_000).ToList();
Stopwatch sw = Stopwatch.StartNew();
int mode = numbers.GroupBy(number => number).OrderByDescending(group => group.Count()).Select(k =>
{
int count = k.Count();
return new { Mode = k.Key, Count = count };
}).FirstOrDefault().Mode;
sw.Stop();
Console.WriteLine($"Lambda: found {mode} in {sw.ElapsedMilliseconds} ms.");
sw = Stopwatch.StartNew();
mode = findMostCommon(numbers)[0].Value;
sw.Stop();
Console.WriteLine($"Manual: found {mode} in {sw.ElapsedMilliseconds} ms.");
// create a dictionary
var dictionary = new ConcurrentDictionary<int, int>();
sw = Stopwatch.StartNew();
// parallelize the iteration (we can, because ConcurrentDictionary is thread safe-ish)
numbers.AsParallel().ForAll((number) =>
{
// add the key with a value of 1 if it's not there; if it is there, use the lambda to increment it by 1
dictionary.AddOrUpdate(number, 1, (key, old) => old + 1);
});
mode = dictionary.Aggregate((x, y) => { return x.Value > y.Value ? x : y; }).Key;
sw.Stop();
Console.WriteLine($"Dictionary: found {mode} in {sw.ElapsedMilliseconds} ms.");
Console.ReadLine();
}
So far, Netmage's is the fastest I've found. The only thing I have been able to make that can beat it (at least with a valid range of 1 to 500,000,000) will only work with arrays ranging from 1 to 500,000,000 or smaller in value on my computer because I only have 8 GB of RAM. This prevents me from testing it with the full 1 to int.MaxValue range and I suspect it will fall behind in terms of speed at that size as it appears to struggle more and more with larger ranges. It uses the values as indexes and the value at those indexes as the occurrences. With 6 million randomly generated positive 16 bit integers, it is about 20 times as fast as my original method with both in Release Mode. It is only about 1.6 times as fast with 32 bit integers ranging from 1 to 500,000,000.
private static List<NumberCount> findMostCommon(List<int> integers)
{
List<NumberCount> answers = new List<NumberCount>();
int[] mostCommon = new int[_Max];
int max = 0;
for (int i = 0; i < integers.Count; i++) // start at 0 so the first element is counted too
{
int iValue = integers[i];
mostCommon[iValue]++;
int intVal = mostCommon[iValue];
if (intVal > 1)
{
if (intVal > max)
{
max = intVal; // track the actual highest count, not just a step of one
answers.Clear();
answers.Add(new NumberCount(iValue, max));
}
else if (intVal == max)
{
answers.Add(new NumberCount(iValue, max));
}
}
}
if (answers.Count < 1)
answers.Add(new NumberCount(0, -100)); // This -100 Occurrences value signifies that no value occurred more than once (all counts are equal).
return answers;
}
Perhaps a branching like this would be optimal:
if (list.Count < sizeLimit)
answers = getFromSmallRangeMethod(list);
else
answers = getFromStandardMethod(list);
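The "values as indexes" idea can be sketched compactly as below. This is an illustration under stated assumptions, not the code measured above: `CountingModes` and `Modes` are hypothetical names, and every value is assumed to lie in the range 0 to rangeSize - 1 so that it can be used as an array index.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountingModes
{
    // Counts occurrences in an array indexed by value and tracks the
    // current set of modes in the same pass.
    // Assumption: 0 <= value < rangeSize for every element.
    public static List<(int Value, int Occurrences)> Modes(
        IReadOnlyList<int> values, int rangeSize)
    {
        var counts = new int[rangeSize];
        var modes = new List<int>();
        int max = 0;

        foreach (int v in values)
        {
            int c = ++counts[v];
            if (c > max)
            {
                // v alone now holds the highest count
                max = c;
                modes.Clear();
                modes.Add(v);
            }
            else if (c == max)
            {
                // v ties with the current modes
                modes.Add(v);
            }
        }

        return modes.Select(v => (Value: v, Occurrences: max)).ToList();
    }
}

public static class Program
{
    public static void Main()
    {
        foreach (var m in CountingModes.Modes(new[] { 3, 1, 3, 1, 2 }, 4))
            Console.WriteLine($"{m.Value} occurs {m.Occurrences} times");
    }
}
```

Memory grows with the value range rather than with the list length, which is why this only pays off for small ranges and fits the size-based branching suggested above.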

Can I use an anonymous type in a List<T> instead of a helper class?

I need a list with some objects for calculation.
my current code looks like this
private class HelperClass
{
public DateTime TheDate {get;set;}
public TimeSpan TheDuration {get;set;}
public bool Enabled {get;set;}
}
private TimeSpan TheMethod()
{
// create entries for every date
var items = new List<HelperClass>();
foreach(DateTime d in GetAllDatesOrdered())
{
items.Add(new HelperClass { TheDate = d, Enabled = GetEnabled(d), });
}
// calculate the duration for every entry
for (int i = 0; i < items.Count; i++)
{
var item = items[i];
if (i == items.Count -1) // the last one
item.TheDuration = DateTime.Now - item.TheDate;
else
item.TheDuration = items[i+1].TheDate - item.TheDate;
}
// calculate the total duration and return the result
var result = TimeSpan.Zero;
foreach(var item in items.Where(x => x.Enabled))
result = result.Add(item.TheDuration);
return result;
}
Now I find it a bit ugly just to introduce a type for my calculation (HelperClass).
My first approach was to use Tuple<DateTime, TimeSpan, bool> like I usually do this but since I need to modify the TimeSpan after creating the instance I can't use Tuple since Tuple.ItemX is readonly.
I thought about an anonymous type, but I can't figure out how to init my List
var item1 = new { TheDate = DateTime.Now,
TheDuration = TimeSpan.Zero, Enabled = true };
var items = new List<?>(); // How to declare this ???
items.Add(item1);
Using a projection looks like the way forward to me - but you can compute the durations as you go, by "zipping" your collection with itself, offset by one. You can then do the whole method in one query:
// Materialize the result to avoid computing possibly different sequences
var allDatesAndNow = GetAllDatesOrdered().Concat(new[] { DateTime.Now })
.ToList();
return allDatesAndNow.Zip(allDatesAndNow.Skip(1),
(x, y) => new { Enabled = GetEnabled(x),
Duration = y - x })
.Where(x => x.Enabled)
.Aggregate(TimeSpan.Zero, (t, pair) => t + pair.Duration);
The Zip call pairs up each date with its subsequent one, converting each pair of values into a duration and an enabled flag. The Where call filters out disabled pairs. The Aggregate call sums the durations from the resulting pairs.
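To make the pipeline concrete, here is a small self-contained sketch with fixed inputs. `DurationCalculator` and `EnabledDuration` are hypothetical names; the dates, the "now" value and the enabled rule are passed in as parameters (an assumption made so the result is deterministic) instead of calling GetAllDatesOrdered, DateTime.Now and GetEnabled.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DurationCalculator
{
    // Sums the time between consecutive dates, counting only the spans
    // whose start date is enabled. The last span ends at 'now'.
    public static TimeSpan EnabledDuration(
        IEnumerable<DateTime> orderedDates,
        Func<DateTime, bool> isEnabled,
        DateTime now)
    {
        // Materialize once so both Zip inputs see the same sequence.
        var points = orderedDates.Concat(new[] { now }).ToList();

        return points
            .Zip(points.Skip(1),
                 (start, end) => new { Enabled = isEnabled(start), Duration = end - start })
            .Where(span => span.Enabled)
            .Aggregate(TimeSpan.Zero, (total, span) => total + span.Duration);
    }
}

public static class Program
{
    public static void Main()
    {
        var dates = new[]
        {
            new DateTime(2020, 1, 1),
            new DateTime(2020, 1, 2),
            new DateTime(2020, 1, 4)
        };
        var total = DurationCalculator.EnabledDuration(
            dates, d => d.Day != 2, new DateTime(2020, 1, 5));
        Console.WriteLine(total.TotalDays);
    }
}
```

Here the spans are 1 day (enabled), 2 days (disabled, because it starts on the 2nd) and 1 day (enabled), so the total is 2 days.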
You could do it with LINQ like:
var itemsWithoutDuration = GetAllDatesOrdered()
.Select(d => new { TheDate = d, Enabled = GetEnabled(d) })
.ToList();
var items = itemsWithoutDuration
.Select((it, k) => new { TheDate = it.TheDate, Enabled = it.Enabled,
TheDuration = (k == (itemsWithoutDuration.Count - 1) ? DateTime.Now : itemsWithoutDuration[k+1].TheDate) - it.TheDate })
.ToList();
But by that point the Tuple is both more readable and more concise!
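On the literal question of how to declare the list: the element type of an anonymous type cannot be written out, but it can be inferred. The sketch below (with the hypothetical names `AnonymousListDemo` and `Run`) builds the list from an array. Note that anonymous-type properties are read-only as well, so this does not remove the limitation that ruled out Tuple.

```csharp
using System;
using System.Linq;

public static class AnonymousListDemo
{
    public static int Run()
    {
        var item1 = new { TheDate = DateTime.Now, TheDuration = TimeSpan.Zero, Enabled = true };

        // The compiler infers List<anonymous type> from the array element.
        var items = new[] { item1 }.ToList();

        // Same property names, types and order, so this is the same anonymous type.
        items.Add(new { TheDate = DateTime.Now, TheDuration = TimeSpan.FromHours(1), Enabled = false });

        // items[0].TheDuration = TimeSpan.Zero; // would not compile: the property is read-only

        return items.Count;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(AnonymousListDemo.Run());
    }
}
```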

How can I use LINQ to calculate the longest streak?

Currently, this is just something I am curious about, I don't have any code I am working on but I am wondering how this could be achieved...
Let's say, for example, that I have an application that tracks the results of all the football teams in the world. What I want to be able to do is identify the longest "win" streak for any given team.
I imagine I would most likely have some sort of data table like so:
MatchDate datetime
TeamA string
TeamB string
TeamAGoals int
TeamBGoals int
So what I would want to do for example is find the longest win streak where TeamA = "My Team" and obviously this would mean TeamAGoals must be greater than TeamBGoals.
As I have said, this is all just for example. It may be better for a different DB design for something like this. But the root question is how to calculate the longest streak/run of matching results.
This is an old question now, but I just had to solve the same problem myself, and thought people might be interested in a fully LINQ implementation of Rawling's LongestStreak extension method. This uses Aggregate with a seed and result selector to run through the list.
public static int LongestStreak<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
{
return source.Aggregate(
new {Longest = 0, Current = 0},
(agg, element) => predicate(element) ?
new {Longest = Math.Max(agg.Longest, agg.Current + 1), Current = agg.Current + 1} :
new {agg.Longest, Current = 0},
agg => agg.Longest);
}
There's no out-of-the-box LINQ method to count streaks, so you'll need a custom LINQy method such as
public static int LongestStreak<TSource>(
this IEnumerable<TSource> source,
Func<TSource, bool> predicate)
{
int longestStreak = 0;
int currentStreak = 0;
foreach (TSource s in source)
{
if (predicate(s))
currentStreak++;
else
{
if (currentStreak > longestStreak) longestStreak = currentStreak;
currentStreak = 0;
}
}
if (currentStreak > longestStreak) longestStreak = currentStreak;
return longestStreak;
}
Then, to use this, first turn each "match result" into a pair of "team results".
var teamResults = matches.SelectMany(m => new[] {
new {
MatchDate = m.MatchDate,
Team = m.TeamA,
Won = m.TeamAGoals > m.TeamBGoals },
new {
MatchDate = m.MatchDate,
Team = m.TeamB,
Won = m.TeamBGoals > m.TeamAGoals }
});
Group these by team.
var groupedResults = teamResults.GroupBy(r => r.Team);
Then calculate the streaks.
var streaks = groupedResults.Select(g => new
{
Team = g.Key,
StreakLength = g
// unnecessary if the matches were ordered originally
.OrderBy(r => r.MatchDate)
.LongestStreak(r => r.Won)
});
If you want the longest streak only, use MoreLinq's MaxBy; if you want them all ordered, you can use OrderByDescending(s => s.StreakLength).
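Put together, the steps above can be exercised end to end as in the sketch below. The `Match` class, the team names and `LongestWinStreaks` are assumptions made for illustration, and LongestStreak is restated in a slightly shorter form so the block is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Match
{
    public DateTime MatchDate;
    public string TeamA;
    public string TeamB;
    public int TeamAGoals;
    public int TeamBGoals;
}

public static class StreakExtensions
{
    // Length of the longest run of consecutive elements matching the predicate.
    public static int LongestStreak<T>(this IEnumerable<T> source, Func<T, bool> predicate)
    {
        int longest = 0, current = 0;
        foreach (var item in source)
        {
            current = predicate(item) ? current + 1 : 0;
            if (current > longest) longest = current;
        }
        return longest;
    }

    // One entry per team: the longest run of wins in date order.
    public static Dictionary<string, int> LongestWinStreaks(IEnumerable<Match> matches)
    {
        return matches
            .SelectMany(m => new[]
            {
                new { m.MatchDate, Team = m.TeamA, Won = m.TeamAGoals > m.TeamBGoals },
                new { m.MatchDate, Team = m.TeamB, Won = m.TeamBGoals > m.TeamAGoals }
            })
            .GroupBy(r => r.Team)
            .ToDictionary(
                g => g.Key,
                g => g.OrderBy(r => r.MatchDate).LongestStreak(r => r.Won));
    }
}

public static class Program
{
    public static void Main()
    {
        var matches = new List<Match>
        {
            new Match { MatchDate = new DateTime(2020, 1, 1), TeamA = "Red", TeamB = "Blue", TeamAGoals = 2, TeamBGoals = 0 },
            new Match { MatchDate = new DateTime(2020, 1, 2), TeamA = "Red", TeamB = "Green", TeamAGoals = 1, TeamBGoals = 0 },
            new Match { MatchDate = new DateTime(2020, 1, 3), TeamA = "Blue", TeamB = "Red", TeamAGoals = 3, TeamBGoals = 1 },
            new Match { MatchDate = new DateTime(2020, 1, 4), TeamA = "Red", TeamB = "Green", TeamAGoals = 2, TeamBGoals = 1 }
        };
        foreach (var kv in StreakExtensions.LongestWinStreaks(matches))
            Console.WriteLine($"{kv.Key}: {kv.Value}");
    }
}
```

In this sample Red's results are win, win, loss, win, so its longest streak is 2; Blue has 1 and Green has 0.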
Alternatively, if you want to do this in one pass, and assuming matches is already ordered, using the following class
class StreakAggregator<TKey>
{
public Dictionary<TKey, int> Best = new Dictionary<TKey, int>();
public Dictionary<TKey, int> Current = new Dictionary<TKey, int>();
public StreakAggregator<TKey> UpdateWith(TKey key, bool success)
{
int c = 0;
Current.TryGetValue(key, out c);
if (success)
{
Current[key] = c + 1;
}
else
{
int b = 0;
Best.TryGetValue(key, out b);
if (c > b)
{
Best[key] = c;
}
Current[key] = 0;
}
return this;
}
public StreakAggregator<TKey> Finalise()
{
foreach (TKey k in Current.Keys.ToArray())
{
UpdateWith(k, false);
}
return this;
}
}
you can then do
var streaks = teamResults.Aggregate(
new StreakAggregator<string>(),
(a, r) => a.UpdateWith(r.Team, r.Won),
(a) => a.Finalise().Best.Select(kvp =>
new { Team = kvp.Key, StreakLength = kvp.Value }));
and OrderBy or whatever as before.
You can get all the results for a team with a single query:
var results = from m in Matches
let homeMatch = m.TeamA == teamName
let awayMatch = m.TeamB == teamName
let hasWon = (homeMatch && m.TeamAGoals > m.TeamBGoals) ||
(awayMatch && m.TeamBGoals > m.TeamAGoals)
where homeMatch || awayMatch
orderby m.MatchDate
select hasWon;
Then do a simple calculation of the longest streak:
int longestStreak = 0;
int currentStreak = 0;
foreach (var hasWon in results)
{
if (hasWon)
{
currentStreak++;
if (currentStreak > longestStreak)
longestStreak = currentStreak;
continue;
}
currentStreak = 0;
}
You can use it as is, extract it to a method, or create an IEnumerable extension method for calculating the longest sequence in the results.
You could make use of string.Split. Something like this:
int longestStreak =
string.Concat(results.Select(r => (r.ours > r.theirs) ? "1" : "0"))
.Split(new[] { '0' })
.Max(s => s.Length);
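To make the trick concrete, here is a small trace of what the intermediate string and its pieces look like. It is a sketch only: SplitTrace, LongestRun and the sample array are illustrative names, and the input is simplified to a bool[] of wins.

```csharp
using System;
using System.Linq;

static class SplitTrace
{
    public static int LongestRun(bool[] wins)
    {
        // Encode each win as '1' and anything else as '0', e.g. "1101110".
        string encoded = string.Concat(wins.Select(w => w ? "1" : "0"));

        // Splitting on '0' leaves only the runs of '1': "11", "111", "".
        string[] runs = encoded.Split('0');

        // The longest piece is the longest winning streak.
        return runs.Max(r => r.Length);
    }

    static void Main()
    {
        // Hypothetical sample: W W L W W W L
        var wins = new[] { true, true, false, true, true, true, false };
        Console.WriteLine(LongestRun(wins)); // prints 3
    }
}
```

Splitting an empty string still yields one empty piece, so an empty input gives 0 instead of making Max throw.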
Or, better, create a Split extension method for IEnumerable<T> to avoid the need to go via a string, like this:
public static IEnumerable<IEnumerable<T>> Split<T>(this IEnumerable<T> items, Predicate<T> p)
{
while (true)
{
items = items.SkipWhile(i => !p(i));
var trueItems = items.TakeWhile (i => p(i)).ToList();
if (trueItems.Count > 0)
{
yield return trueItems;
items = items.Skip(trueItems.Count);
}
else
{
break;
}
}
}
You can then simply do this:
int longestStreak = results.Split(r => r.ours > r.theirs).Max(g => g.Count());

Does reactive extensions support rolling buffers?

I'm using reactive extensions to collate data into buffers of 100ms:
this.subscription = this.dataService
.Where(x => !string.Equals("FOO", x.Key.Source))
.Buffer(TimeSpan.FromMilliseconds(100))
.ObserveOn(this.dispatcherService)
.Where(x => x.Count != 0)
.Subscribe(this.OnBufferReceived);
This works fine. However, I want slightly different behavior than that provided by the Buffer operation. Essentially, I want to reset the timer if another data item is received. Only when no data has been received for the entire 100ms do I want to handle it. This opens up the possibility of never handling the data, so I should also be able to specify a maximum count. I would imagine something along the lines of:
.SlidingBuffer(TimeSpan.FromMilliseconds(100), 10000)
I've had a look around and haven't been able to find anything like this in Rx. Can anyone confirm/deny this?
This is possible by combining the built-in Window and Throttle methods of Observable. First, let's solve the simpler problem where we ignore the maximum count condition:
public static IObservable<IList<T>> BufferUntilInactive<T>(this IObservable<T> stream, TimeSpan delay)
{
var closes = stream.Throttle(delay);
return stream.Window(() => closes).SelectMany(window => window.ToList());
}
The powerful Window method did the heavy lifting. Now it's easy enough to see how to add a maximum count:
public static IObservable<IList<T>> BufferUntilInactive<T>(this IObservable<T> stream, TimeSpan delay, Int32? max=null)
{
var closes = stream.Throttle(delay);
if (max != null)
{
var overflows = stream.Where((x,index) => index+1>=max);
closes = closes.Merge(overflows);
}
return stream.Window(() => closes).SelectMany(window => window.ToList());
}
I'll write a post explaining this on my blog. https://gist.github.com/2244036
Documentation for the Window method:
http://leecampbell.blogspot.co.uk/2011/03/rx-part-9join-window-buffer-and-group.html
http://enumeratethis.com/2011/07/26/financial-charts-reactive-extensions/
I wrote an extension to do most of what you're after - BufferWithInactivity.
Here it is:
public static IObservable<IEnumerable<T>> BufferWithInactivity<T>(
this IObservable<T> source,
TimeSpan inactivity,
int maximumBufferSize)
{
return Observable.Create<IEnumerable<T>>(o =>
{
var gate = new object();
var buffer = new List<T>();
var mutable = new SerialDisposable();
var subscription = (IDisposable)null;
var scheduler = Scheduler.ThreadPool;
Action dump = () =>
{
var bts = buffer.ToArray();
buffer = new List<T>();
if (o != null)
{
o.OnNext(bts);
}
};
Action dispose = () =>
{
if (subscription != null)
{
subscription.Dispose();
}
mutable.Dispose();
};
Action<Action<IObserver<IEnumerable<T>>>> onErrorOrCompleted =
onAction =>
{
lock (gate)
{
dispose();
dump();
if (o != null)
{
onAction(o);
}
}
};
Action<Exception> onError = ex =>
onErrorOrCompleted(x => x.OnError(ex));
Action onCompleted = () => onErrorOrCompleted(x => x.OnCompleted());
Action<T> onNext = t =>
{
lock (gate)
{
buffer.Add(t);
if (buffer.Count == maximumBufferSize)
{
dump();
mutable.Disposable = Disposable.Empty;
}
else
{
mutable.Disposable = scheduler.Schedule(inactivity, () =>
{
lock (gate)
{
dump();
}
});
}
}
};
subscription =
source
.ObserveOn(scheduler)
.Subscribe(onNext, onError, onCompleted);
return () =>
{
lock (gate)
{
o = null;
dispose();
}
};
});
}
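With Rx 2.0, you can get close to both requirements with a Buffer overload that accepts a time span and a count. Note that this overload closes a buffer when either the time span elapses or the count is reached; it does not restart the timer each time an item arrives, so it only approximates the inactivity behavior asked for: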
this.subscription = this.dataService
.Where(x => !string.Equals("FOO", x.Key.Source))
.Buffer(TimeSpan.FromMilliseconds(100), 10000) // emit after 100ms or 10000 items, whichever comes first
.ObserveOn(this.dispatcherService)
.Where(x => x.Count != 0)
.Subscribe(this.OnBufferReceived);
See https://msdn.microsoft.com/en-us/library/hh229200(v=vs.103).aspx for the documentation.
I guess this can be implemented on top of the Buffer method as shown below:
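public static IObservable<IList<T>> SlidingBuffer<T>(this IObservable<T> obs, TimeSpan span, int max)
{
    // Observable.Create takes the place of CreateWithDisposable from the early Rx previews
    return Observable.Create<IList<T>>(cl =>
    {
        var acc = new List<T>();
        // Emit a snapshot and start a fresh list, so subscribers never see their buffer cleared
        Action flush = () =>
        {
            if (acc.Count == 0) return; // nothing collected, nothing to report
            var snapshot = acc;
            acc = new List<T>();
            cl.OnNext(snapshot);
        };
        return obs.Buffer(span)
            .Subscribe(next =>
            {
                acc.AddRange(next);
                // flush when there was no activity in the time span, or when max items are collected
                if (next.Count == 0 || acc.Count >= max) flush();
            }, cl.OnError, () => { flush(); cl.OnCompleted(); });
    });
}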
NOTE: I haven't tested it, but I hope it gives you the idea.
Colonel Panic's solution is almost perfect. The only thing that is missing is a Publish component, in order to make the solution work with cold sequences too.
/// <summary>
/// Projects each element of an observable sequence into a buffer that's sent out
/// when either a given inactivity timespan has elapsed, or it's full,
/// using the specified scheduler to run timers.
/// </summary>
public static IObservable<IList<T>> BufferUntilInactive<T>(
this IObservable<T> source, TimeSpan dueTime, int maxCount,
IScheduler scheduler = default)
{
if (maxCount < 1) throw new ArgumentOutOfRangeException(nameof(maxCount));
scheduler ??= Scheduler.Default;
return source.Publish(published =>
{
var combinedBoundaries = Observable.Merge
(
published.Throttle(dueTime, scheduler),
published.Skip(maxCount - 1)
);
return published
.Window(() => combinedBoundaries)
.SelectMany(window => window.ToList());
});
}
Beyond adding the Publish, I've also replaced the original .Where((_, index) => index + 1 >= maxCount) with the equivalent but shorter .Skip(maxCount - 1). For completeness there is also an IScheduler parameter, which configures the scheduler where the timer is run.
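For anyone who wants to poke at the behaviour without waiting on real timers, below is a rough sketch that drives the operator with a HistoricalScheduler (a virtual-time scheduler from System.Reactive). The operator body is copied from above so the snippet compiles on its own; BufferDemo, Run and the chosen timings are assumptions made up for the demo, and the System.Reactive package must be referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using System.Reactive.Subjects;

static class BufferDemo
{
    // Copy of the operator above, with the scheduler made mandatory for the demo.
    public static IObservable<IList<T>> BufferUntilInactive<T>(
        this IObservable<T> source, TimeSpan dueTime, int maxCount, IScheduler scheduler)
    {
        return source.Publish(published =>
        {
            var boundaries = Observable.Merge(
                published.Throttle(dueTime, scheduler),
                published.Skip(maxCount - 1));
            return published
                .Window(() => boundaries)
                .SelectMany(window => window.ToList());
        });
    }

    // Feeds five items through the operator and records each emitted buffer.
    public static List<string> Run()
    {
        var scheduler = new HistoricalScheduler(); // virtual clock, advanced by hand
        var source = new Subject<int>();
        var seen = new List<string>();

        using (source
            .BufferUntilInactive(TimeSpan.FromMilliseconds(100), 3, scheduler)
            .Subscribe(buffer => seen.Add(string.Join(",", buffer))))
        {
            source.OnNext(1);
            scheduler.AdvanceBy(TimeSpan.FromMilliseconds(50));
            source.OnNext(2);                                    // restarts the 100 ms timer
            scheduler.AdvanceBy(TimeSpan.FromMilliseconds(100)); // quiet period elapses
            source.OnNext(3);
            source.OnNext(4);
            source.OnNext(5);                                    // third item fills the buffer
        }
        return seen;
    }

    static void Main()
    {
        foreach (var buffer in Run())
            Console.WriteLine("[" + buffer + "]");
    }
}
```

The intent is that items 1 and 2 come out together once 100 ms pass without activity, and items 3 to 5 come out as soon as the third one arrives. The source is deliberately never completed here, so behaviour on completion is not exercised.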
