I want to effectively throttle an event stream, so that my delegate is called when the first event is received but then not for 1 second if subsequent events are received. After expiry of that timeout (1 second), if a subsequent event was received I want my delegate to be called.
Is there a simple way to do this using Reactive Extensions?
Sample code:
static void Main(string[] args)
{
Console.WriteLine("Running...");
var generator = Observable
.GenerateWithTime(1, x => x <= 100, x => x, x => TimeSpan.FromMilliseconds(1), x => x + 1)
.Timestamp();
var builder = new StringBuilder();
generator
.Sample(TimeSpan.FromSeconds(1))
.Finally(() => Console.WriteLine(builder.ToString()))
.Subscribe(feed =>
builder.AppendLine(string.Format("Observed {0:000}, generated at {1}, observed at {2}",
feed.Value,
feed.Timestamp.ToString("mm:ss.fff"),
DateTime.Now.ToString("mm:ss.fff"))));
Console.ReadKey();
}
Current output:
Running...
Observed 064, generated at 41:43.602, observed at 41:43.602
Observed 100, generated at 41:44.165, observed at 41:44.602
But I want to observe the following (the timestamps will obviously differ):
Running...
Observed 001, generated at 41:43.602, observed at 41:43.602
....
Observed 100, generated at 41:44.165, observed at 41:44.602
Okay,
you have 4 scenarios here:
1) You would like a value only once the stream has gone quiet for the given time span.
This means that a value is dropped whenever another one follows it within that time span.
observableStream.Throttle(timeSpan)
2) You would like to get the latest event that was produced before each second elapses.
This means that the other events get dropped.
observableStream.Sample(TimeSpan.FromSeconds(1))
3) You would like to get all events that happened in the last second, every second.
observableStream.BufferWithTime(timeSpan)
4) You want to select what happens with all the values in between, until the second has passed and your result is returned.
observableStream.CombineLatest(Observable.Interval(TimeSpan.FromSeconds(1)), selectorOnEachEvent)
Here is what I got with some help from the Rx forum:
The idea is to issue a series of "tickets" for the original sequence to fire. These "tickets" are delayed for the timeout, excluding the very first one, which is immediately prepended to the ticket sequence. When an event comes in and there is a ticket waiting, the event fires immediately; otherwise it waits until the ticket arrives and then fires. When it fires, the next ticket is issued, and so on...
To combine the tickets and original events, we need a combinator. Unfortunately, the "standard" .CombineLatest cannot be used here because it would fire on tickets and events that were used previously. So I had to create my own combinator, which is basically a filtered .CombineLatest that fires only when both elements in the combination are "fresh", i.e. were never returned before. I call it .CombineVeryLatest aka .BrokenZip ;)
Using .CombineVeryLatest, the above idea can be implemented as such:
public static IObservable<T> SampleResponsive<T>(
this IObservable<T> source, TimeSpan delay)
{
return source.Publish(src =>
{
var fire = new Subject<T>();
var whenCanFire = fire
.Select(u => new Unit())
.Delay(delay)
.StartWith(new Unit());
var subscription = src
.CombineVeryLatest(whenCanFire, (x, flag) => x)
.Subscribe(fire);
return fire.Finally(subscription.Dispose);
});
}
public static IObservable<TResult> CombineVeryLatest
<TLeft, TRight, TResult>(this IObservable<TLeft> leftSource,
IObservable<TRight> rightSource, Func<TLeft, TRight, TResult> selector)
{
var ls = leftSource.Select(x => new Used<TLeft>(x));
var rs = rightSource.Select(x => new Used<TRight>(x));
var cmb = ls.CombineLatest(rs, (x, y) => new { x, y });
var fltCmb = cmb
.Where(a => !(a.x.IsUsed || a.y.IsUsed))
.Do(a => { a.x.IsUsed = true; a.y.IsUsed = true; });
return fltCmb.Select(a => selector(a.x.Value, a.y.Value));
}
private class Used<T>
{
internal T Value { get; private set; }
internal bool IsUsed { get; set; }
internal Used(T value)
{
Value = value;
}
}
Edit: here's another more compact variation of CombineVeryLatest proposed by Andreas Köpf on the forum:
public static IObservable<TResult> CombineVeryLatest
<TLeft, TRight, TResult>(this IObservable<TLeft> leftSource,
IObservable<TRight> rightSource, Func<TLeft, TRight, TResult> selector)
{
return Observable.Defer(() =>
{
int l = -1, r = -1;
return Observable.CombineLatest(
leftSource.Select(Tuple.Create<TLeft, int>),
rightSource.Select(Tuple.Create<TRight, int>),
(x, y) => new { x, y })
.Where(t => t.x.Item2 != l && t.y.Item2 != r)
.Do(t => { l = t.x.Item2; r = t.y.Item2; })
.Select(t => selector(t.x.Item1, t.y.Item1));
});
}
I was struggling with this same problem last night, and believe I've found a more elegant (or at least shorter) solution:
var delay = Observable.Empty<T>().Delay(TimeSpan.FromSeconds(1));
var throttledSource = source.Take(1).Concat(delay).Repeat();
This is what I posted as an answer to this question in the Rx forum:
UPDATE:
Here is a new version that no longer delays event forwarding when events occur more than one second apart:
public static IObservable<T> ThrottleResponsive3<T>(this IObservable<T> source, TimeSpan minInterval)
{
return Observable.CreateWithDisposable<T>(o =>
{
object gate = new Object();
Notification<T> last = null, lastNonTerminal = null;
DateTime referenceTime = DateTime.UtcNow - minInterval;
var delayedReplay = new MutableDisposable();
return new CompositeDisposable(source.Materialize().Subscribe(x =>
{
lock (gate)
{
var elapsed = DateTime.UtcNow - referenceTime;
if (elapsed >= minInterval && delayedReplay.Disposable == null)
{
referenceTime = DateTime.UtcNow;
x.Accept(o);
}
else
{
if (x.Kind == NotificationKind.OnNext)
lastNonTerminal = x;
last = x;
if (delayedReplay.Disposable == null)
{
delayedReplay.Disposable = Scheduler.ThreadPool.Schedule(() =>
{
lock (gate)
{
referenceTime = DateTime.UtcNow;
if (lastNonTerminal != null && lastNonTerminal != last)
lastNonTerminal.Accept(o);
last.Accept(o);
last = lastNonTerminal = null;
delayedReplay.Disposable = null;
}
}, minInterval - elapsed);
}
}
}
}), delayedReplay);
});
}
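For anyone who wants to check the timing rule without Rx, here is a minimal offline sketch. It is not the operator above: it replays a pre-sorted list of timestamped events (times in milliseconds) and applies the same rule as I understand it, namely fire immediately when the interval has elapsed, otherwise hold the latest value and fire it when the interval expires. The class name ResponsiveThrottleSimulation and the method Simulate are made up for this illustration.

```csharp
using System;
using System.Collections.Generic;

public static class ResponsiveThrottleSimulation
{
    // events: (time in ms, value), assumed sorted by time.
    // Returns (emit time in ms, value) for every value that would be forwarded.
    public static List<(long Time, T Value)> Simulate<T>(
        IEnumerable<(long Time, T Value)> events, long intervalMs)
    {
        var output = new List<(long Time, T Value)>();
        long? lastEmit = null;    // time of the most recent emission, if any
        bool hasPending = false;  // true while a held-back value is waiting
        T pending = default(T);
        long pendingDue = 0;

        foreach (var e in events)
        {
            // A held-back value whose timer expired before this event fires first.
            if (hasPending && pendingDue <= e.Time)
            {
                output.Add((pendingDue, pending));
                lastEmit = pendingDue;
                hasPending = false;
            }

            bool windowOpen = lastEmit == null || e.Time - lastEmit.Value >= intervalMs;
            if (!hasPending && windowOpen)
            {
                // Nothing fired recently: forward the event right away.
                output.Add((e.Time, e.Value));
                lastEmit = e.Time;
            }
            else
            {
                // Too soon: remember only the latest value until the interval expires.
                if (!hasPending)
                {
                    hasPending = true;
                    pendingDue = lastEmit.Value + intervalMs;
                }
                pending = e.Value;
            }
        }

        // The source ended while a value was held back: it still fires at its due time.
        if (hasPending)
            output.Add((pendingDue, pending));
        return output;
    }
}
```

Feeding it the values 1 to 100 at 1 ms spacing with a 1000 ms interval yields two emissions: the first value at once and the last value one second after it, which is the shape of output the question asks for.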
This was my earlier try:
var source = Observable.GenerateWithTime(1,
x => x <= 100, x => x, x => TimeSpan.FromMilliseconds(1), x => x + 1)
.Timestamp();
source.Publish(o =>
o.Take(1).Merge(o.Skip(1).Sample(TimeSpan.FromSeconds(1)))
).Run(x => Console.WriteLine(x));
Ok, here's one solution. I don't like it, particularly, but... oh well.
Hat tips to Jon for pointing me at SkipWhile, and to cRichter for the BufferWithTime. Thanks guys.
static void Main(string[] args)
{
Console.WriteLine("Running...");
var generator = Observable
.GenerateWithTime(1, x => x <= 100, x => x, x => TimeSpan.FromMilliseconds(1), x => x + 1)
.Timestamp();
var bufferedAtOneSec = generator.BufferWithTime(TimeSpan.FromSeconds(1));
var action = new Action<Timestamped<int>>(
feed => Console.WriteLine("Observed {0:000}, generated at {1}, observed at {2}",
feed.Value,
feed.Timestamp.ToString("mm:ss.fff"),
DateTime.Now.ToString("mm:ss.fff")));
var reactImmediately = true;
bufferedAtOneSec.Subscribe(list =>
{
if (list.Count == 0)
{
reactImmediately = true;
}
else
{
action(list.Last());
}
});
generator
.SkipWhile(item => reactImmediately == false)
.Subscribe(feed =>
{
if(reactImmediately)
{
reactImmediately = false;
action(feed);
}
});
Console.ReadKey();
}
Have you tried the Throttle extension method?
From the docs:
Ignores values from an observable sequence which are followed by another value before dueTime
It's not quite clear to me whether that's going to do what you want or not - in that you want to ignore the following values rather than the first value... but I would expect it to be what you want. Give it a try :)
EDIT: Hmmm... no, I don't think Throttle is the right thing, after all. I believe I see what you want to do, but I can't see anything in the framework to do it. I may well have missed something though. Have you asked on the Rx forum? It may well be that if it's not there now, they'd be happy to add it :)
I suspect you could do it cunningly with SkipUntil and SelectMany somehow... but I think it should be in its own method.
What you are searching for is the CombineLatest.
public static IObservable<TResult> CombineLatest<TLeft, TRight, TResult>(
IObservable<TLeft> leftSource,
IObservable<TRight> rightSource,
Func<TLeft, TRight, TResult> selector
)
It merges two observables and returns all values whenever the selector (time) has a value.
Edit: Jon is right, this is maybe not the preferred solution.
Inspired by Blueling's answer, I provide here a version that compiles with Reactive Extensions 2.2.5.
This particular version counts the number of samples and also provides the last sampled value. To do this, the following class is used:
class Sample<T> {
public Sample(T lastValue, Int32 count) {
LastValue = lastValue;
Count = count;
}
public T LastValue { get; private set; }
public Int32 Count { get; private set; }
}
Here is the operator:
public static IObservable<Sample<T>> SampleResponsive<T>(this IObservable<T> source, TimeSpan interval, IScheduler scheduler = null) {
if (source == null)
throw new ArgumentNullException(nameof(source));
return Observable.Create<Sample<T>>(
observer => {
var gate = new Object();
var lastSampleValue = default(T);
var lastSampleTime = default(DateTime);
var sampleCount = 0;
var scheduledTask = new SerialDisposable();
return new CompositeDisposable(
source.Subscribe(
value => {
lock (gate) {
var now = DateTime.UtcNow;
var elapsed = now - lastSampleTime;
if (elapsed >= interval) {
observer.OnNext(new Sample<T>(value, 1));
lastSampleValue = value;
lastSampleTime = now;
sampleCount = 0;
}
else {
if (scheduledTask.Disposable == null) {
scheduledTask.Disposable = (scheduler ?? Scheduler.Default).Schedule(
interval - elapsed,
() => {
lock (gate) {
if (sampleCount > 0) {
lastSampleTime = DateTime.UtcNow;
observer.OnNext(new Sample<T>(lastSampleValue, sampleCount));
sampleCount = 0;
}
scheduledTask.Disposable = null;
}
}
);
}
lastSampleValue = value;
sampleCount += 1;
}
}
},
error => {
if (sampleCount > 0)
observer.OnNext(new Sample<T>(lastSampleValue, sampleCount));
observer.OnError(error);
},
() => {
if (sampleCount > 0)
observer.OnNext(new Sample<T>(lastSampleValue, sampleCount));
observer.OnCompleted();
}
),
scheduledTask
);
}
);
}
In the following code I expected the result to be 3:
Task<int> parent = Task.Factory.StartNew(() =>
{
var sum = 0;
TaskFactory tf = new TaskFactory(TaskCreationOptions.AttachedToParent,
TaskContinuationOptions.ExecuteSynchronously);
tf.StartNew(() => sum++);
tf.StartNew(() => sum++);
tf.StartNew(() => sum++);
return sum;
});
var finalTask = parent.ContinueWith(parentTask => Console.WriteLine(parentTask.Result));
finalTask.Wait();
However, the result is 0, which I don't understand. The odd thing is that when I change it to use an array, it does seem to do the right thing:
Task<Int32[]> parent = Task.Factory.StartNew(() =>
{
var results = new Int32[3];
TaskFactory tf = new TaskFactory(TaskCreationOptions.AttachedToParent,
TaskContinuationOptions.ExecuteSynchronously);
tf.StartNew(() => results[0] = 0);
tf.StartNew(() => results[1] = 1);
tf.StartNew(() => results[2] = 2);
return results;
});
var finalTask = parent.ContinueWith(
parentTask =>
{
foreach (int i in parentTask.Result)
Console.WriteLine(i);
});
finalTask.Wait();
Here the result is as expected:
0
1
2
I guess I am missing something very obvious. What do I need to fix in the first piece of code to have it return 3?
Update
I already had a look at this Solution which is why I haven't used Task.Run but it didn't really make a difference
The difference between the two cases comes from value-type versus reference-type semantics. int is a value type, so it is copied when the parent returns it, and any later changes to the sum variable are not seen by the caller. Arrays are reference types, so only the reference is copied on return, and any changes the child tasks make to the array remain visible because it is the same array. To make your first case work, you need to replace int with some reference type:
public class Reference<T> {
public T Value;
public Reference(T value) {
Value=value;
}
}
public static void Test() {
Task<Reference<int>> parent=Task.Factory.StartNew(() => {
var sum=new Reference<int>(0);
TaskFactory tf=new TaskFactory(TaskCreationOptions.AttachedToParent,
TaskContinuationOptions.ExecuteSynchronously);
tf.StartNew(() => sum.Value++);
tf.StartNew(() => sum.Value++);
tf.StartNew(() => sum.Value++);
return sum;
});
var finalTask=parent.ContinueWith(parentTask => Console.WriteLine(parentTask.Result.Value));
finalTask.Wait();
}
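One caveat with the code above: Value++ is not atomic, so if the children happen to run concurrently an increment could in principle be lost. A minimal sketch of a variant that avoids this, assuming it is acceptable to read the counter after the parent has finished instead of returning it from the parent. The names AttachedChildrenDemo and CountWithChildren are made up for this example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class AttachedChildrenDemo
{
    public static int CountWithChildren()
    {
        int sum = 0; // captured by the lambdas, so all tasks share the same variable

        Task parent = Task.Factory.StartNew(() =>
        {
            var tf = new TaskFactory(TaskCreationOptions.AttachedToParent,
                                     TaskContinuationOptions.None);
            for (int i = 0; i < 3; i++)
            {
                // Interlocked.Increment makes each increment atomic.
                tf.StartNew(() => Interlocked.Increment(ref sum));
            }
        });

        // The parent only completes once its attached children have completed,
        // so reading sum after Wait() sees all three increments.
        parent.Wait();
        return sum;
    }
}
```

This relies on Task.Factory.StartNew for the parent, as in the question, because the children need to be allowed to attach to it.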
I've been trying for a long time to find a "clean" pattern to handle a .SelectMany with anonymous types when you don't always want to return a result. My most common use case looks like this:
We have a list of customers that I want to do reporting on.
Each customer's data resides in a separate database, so I do a parallel .SelectMany
In each lambda expression, I gather results for the customer toward the final report.
If a particular customer should be skipped, I need to return an empty list.
I whip these up often for quick reporting, so I'd prefer an anonymous type.
For example, the logic may look something like this:
//c is a customer
var context = GetContextForCustomer(c);
// look up some data, myData using the context connection
if (someCondition)
return myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 });
else
return null;
This could be implemented as a foreach statement:
var results = new List<WhatType?>();
foreach (var c in customers) {
var context = GetContextForCustomer(c);
if (someCondition)
results.AddRange(myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 }));
}
Or it could be implemented with a .SelectMany that is pre-filtered with a .Where:
customers
.Where(c => someCondition)
.AsParallel()
.SelectMany(c => {
var context = GetContextForCustomer(c);
return myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 });
})
.ToList();
There are problems with both of these approaches. The foreach solution requires initializing a List to store the results, and you have to define the type. The .SelectMany with .Where is often impractical because the logic for someCondition is fairly complex and depends on some data lookups. So my ideal solution would look something like this:
customers
.AsParallel()
.SelectMany(c => {
var context = GetContextForCustomer(c);
if (someCondition)
return myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 });
else
continue? return null? return empty list?
})
.ToList();
What do I put in the else line to skip a return value? None of the solutions I can come up with work or are ideal:
continue doesn't compile because it's not an active foreach loop
return null causes an NRE
return empty list requires me to initialize a list of anonymous type again.
Is there a way to accomplish the above that is clean, simple, and neat, and satisfies all my (picky) requirements?
You could return an empty IEnumerable<dynamic>. Here's an example (without your customers and someCondition, because I don't know what they are, but of the same general form as your example):
new int[] { 1, 2, 3, 4 }
.AsParallel()
.SelectMany(i => {
if (i % 2 == 0)
return Enumerable.Repeat(new { i, squared = i * i }, i);
else
return Enumerable.Empty<dynamic>();
})
.ToList();
So, with your objects and someCondition, it would look like
customers
.AsParallel()
.SelectMany(c => {
var context = GetContextForCustomer(c);
if (someCondition)
return myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 });
else
return Enumerable.Empty<dynamic>();
})
.ToList();
Without knowing what someCondition and myData look like...
Why don't you just Select and Where the contexts as well:
customers
.Select(c => GetContextForCustomer(c))
.Where(ctx => someCondition)
.SelectMany(ctx =>
myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 }));
EDIT: I just realized you need to carry both the customer and context further, so you can do this:
customers
.Select(c => new { Customer = c, Context = GetContextForCustomer(c) })
.Where(x => someCondition(x.Context))
.SelectMany(x =>
myData.Select(d => new { CustomerID = x.Customer, X1 = d.x1, X2 = d.x2 }));
You can try following:
customers
.AsParallel()
.SelectMany(c => {
var context = GetContextForCustomer(c);
if (someCondition)
return myData.Select(x => new { CustomerID = c, X1 = x.x1, X2 = x.x2 });
else
return Enumerable.Empty<int>().Select(x => new { CustomerID = 0, X1 = "defValue", X2 = "defValue" });
})
.ToList();
All anonymous types with the same set of properties (the same names and types, in the same order) are combined into one anonymous class by the compiler. That's why both your Select and the one on Enumerable.Empty will return the same T.
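A small sketch to check this claim, assuming both anonymous objects are created in the same assembly. The class and method names are made up for the example, and it also shows that the order of the properties matters.

```csharp
using System;

public static class AnonymousTypeIdentity
{
    // Same property names, types and order: the compiler reuses one class.
    public static bool SameShapeSharesType()
    {
        var a = new { CustomerID = 1, X1 = "p" };
        var b = new { CustomerID = 2, X1 = "q" };
        return a.GetType() == b.GetType();
    }

    // Same properties but in a different order: a different class is generated.
    public static bool DifferentOrderSharesType()
    {
        var a = new { CustomerID = 1, X1 = "p" };
        var c = new { X1 = "p", CustomerID = 1 };
        return a.GetType() == c.GetType();
    }
}
```

The first method returns true and the second returns false, so the placeholder object in the else branch has to list its properties in exactly the same order, and with the same types, as the real projection.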
You can create your own variation of the SelectMany LINQ method which supports nulls:
public static class EnumerableExtensions
{
public static IEnumerable<TResult> NullableSelectMany<TSource, TResult> (
this IEnumerable<TSource> source,
Func<TSource, IEnumerable<TResult>> selector)
{
if (source == null)
throw new ArgumentNullException("source");
if (selector == null)
throw new ArgumentNullException("selector");
foreach (TSource item in source) {
IEnumerable<TResult> results = selector(item);
if (results != null) {
foreach (TResult result in results)
yield return result;
}
}
}
}
Now you can return null in the selector lambda.
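If you would rather not add an extension method, a similar effect can be had with the standard operators: project with Select, drop the null batches with Where, then flatten with SelectMany. Here is a minimal sketch on plain integers. The names SkipNullBatchesDemo and Run are made up, and the even/odd check is only a stand-in for someCondition.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SkipNullBatchesDemo
{
    public static List<string> Run()
    {
        return new[] { 1, 2, 3, 4 }
            .Select(i =>
            {
                if (i % 2 == 0) // stand-in for someCondition
                    return Enumerable.Repeat(new { Id = i, Squared = i * i }, 2);
                return null;    // skip this item entirely
            })
            .Where(batch => batch != null) // remove the skipped items
            .SelectMany(batch => batch)    // flatten the remaining batches
            .Select(x => x.Id + ":" + x.Squared)
            .ToList();
    }
}
```

The lambda's return type is inferred from the anonymous-type branch, so no type has to be spelled out. The final Select to strings is only there to make the result easy to print.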
The accepted answer returns dynamic. The cleanest approach would be to move the filtering logic into a Where, which makes the whole thing look better in a LINQ context. Since you specifically rule that out in the question, and since I'm not a fan of delegates written over multiple lines in a LINQ call, I would try this instead, though one can argue it's more hacky.
var results = new
{
customerID = default(int), //notice the casing of property names
x1 = default(U), //whatever types they are
x2 = default(V)
}.GetEmptyListOfThisType();
foreach (var customerID in customers) {
var context = GetContextForCustomer(customerID);
if (someCondition)
results.AddRange(myData.Select(x => new { customerID, x.x1, x.x2 }));
}
public static List<T> GetEmptyListOfThisType<T>(this T item)
{
return new List<T>();
}
Notice that the property names match the other variable names, so you don't have to write the property names a second time in the Select call.
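For reference, here is a self-contained sketch of the same "type by example" trick that compiles on its own. The helper EmptyListLike, the sample data and the odd/even check are assumptions made up for the illustration; they stand in for GetEmptyListOfThisType, myData and someCondition.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TypedByExample
{
    // The argument is never used; it only lets the compiler infer T.
    public static List<T> EmptyListLike<T>(T example)
    {
        return new List<T>();
    }

    public static int Run()
    {
        var results = EmptyListLike(new { CustomerId = 0, Name = "" });

        foreach (var id in new[] { 1, 2, 3 })
        {
            if (id % 2 == 1) // stand-in for someCondition
            {
                results.AddRange(
                    new[] { "a", "b" }.Select(n => new { CustomerId = id, Name = n }));
            }
        }
        return results.Count;
    }
}
```

Run returns 4 here, because customers 1 and 3 each contribute two rows and customer 2 is skipped.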
I have the following code:
Task.Factory.ContinueWhenAll(items.Select(p =>
{
return CreateItem(p);
}).ToArray(), completedTasks => { Console.WriteLine("completed"); });
Is it possible to convert ContinueWhenAll to a synchronous method? I want to switch back between async and sync.
Edit: I should mention that each of the "tasks" in the ContinueWhenAll method should be executing synchronously.
If you want to leave your existing code intact and have a variable option of executing synchronously you should make these changes:
bool isAsync = false; // some flag to check for async operation
var batch = Task.Factory.ContinueWhenAll(items.Select(p =>
{
return CreateItem(p);
}).ToArray(), completedTasks => { Console.WriteLine("completed"); });
if (!isAsync)
batch.Wait();
This way you can toggle it programmatically instead of by editing your source code. And you can keep the continuation code the same for both methods.
Edit:
Here is a simple pattern for having the same method represented as a synchronous and async version:
public Item CreateItem(string name)
{
return new Item(name);
}
public Task<Item> CreateItemAsync(string name)
{
return Task.Factory.StartNew(() => CreateItem(name));
}
Unless I am mistaken, this is what you're looking for:
Task.WaitAll(tasks);
//continuation code here
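A minimal runnable sketch of that idea, in case it helps. CreateItemAsync here is a hypothetical stand-in that just returns the length of a name, since I don't know what your CreateItem does.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

public static class WaitAllDemo
{
    // Hypothetical stand-in for the question's CreateItem.
    static Task<int> CreateItemAsync(string name)
    {
        return Task.Factory.StartNew(() => name.Length);
    }

    public static int RunSynchronously(string[] names)
    {
        Task<int>[] tasks = names.Select(n => CreateItemAsync(n)).ToArray();

        // Blocks the calling thread until every task has finished.
        Task.WaitAll(tasks);

        // Continuation code runs here, on the calling thread.
        return tasks.Sum(t => t.Result);
    }
}
```

The tasks themselves still run on the thread pool; only the caller is blocked until they are all done.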
I think you can try this.
Using TaskContinuationOptions for a simple scenario:
var taskFactory = new TaskFactory(TaskScheduler.Defau
var random = new Random();
var tasks = Enumerable.Range(1, 30).Select(p => {
return taskFactory.StartNew(() => {
var timeout = random.Next(5, p * 50);
Thread.Sleep(timeout / 2);
Console.WriteLine(@" 1: ID = " + p);
return p;
}).ContinueWith(t => {
Console.WriteLine(@"* 2: ID = " + t.Result);
}, TaskContinuationOptions.ExecuteSynchronously);
}).ToArray();
Task.WaitAll(tasks);
or using TPL Dataflow for a complex scenario.
var step2 = new ActionBlock<int>(i => {
Thread.Sleep(i);
Console.WriteLine(@"* 2: ID = " + i);
}, new ExecutionDataflowBlockOptions {
MaxDegreeOfParallelism = 1,
//MaxMessagesPerTask = 1
});
var random = new Random();
var tasks = Enumerable.Range(1, 50).Select(p => {
return Task.Factory.StartNew(() => {
var timeout = random.Next(5, p * 50);
Thread.Sleep(timeout / 2);
Console.WriteLine(@" 1: ID = " + p);
return p;
}).ContinueWith(t => {
Thread.Sleep(t.Result);
step2.Post(t.Result);
});
}).ToArray();
await Task.WhenAll(tasks).ContinueWith(t => step2.Complete());
await step2.Completion;
I have a LINQ query as follows
m_FOO = rawcollection.Select(p=> p.Split(' ')).Select(p =>
{
int thing = 0;
try
{
thing = CalculationThatCanFail(p[1]);
}
catch{}
return new { Test = p[0], FooThing = thing};
})
.GroupBy(p => p.Test)
.ToDictionary(p => p.Key, s => s.Select(q => q.FooThing).ToList());
So, the CalculationThatCanFail throws sometimes. I don't want to put null in and then filter that out with another Where statement later, and a junk value is equally unacceptable. Does anyone know how to handle this cleanly? Thanks.
EDIT: There's a good reason for the double Select statement. This example was edited for brevity
It's not clear to me from the question whether you mean that you don't want to use null for FooThing or that you don't want to use null for the entire anonymously typed object. In any case, would this fit the bill?
m_FOO = rawcollection.Select(p=> p.Split(' ')).Select(p =>
{
int thing = 0;
try
{
thing = CalculationThatCanFail(p[1]);
return new { Test = p[0], FooThing = thing};
}
catch
{
return null;
}
})
.Where(p => p != null)
.GroupBy(p => p.Test)
.ToDictionary(p => p.Key, s => s.Select(q => q.FooThing).ToList());
For these situations I use a Maybe type (similar to this one) for calculations that may or may not return a value, instead of nulls or junk values. It would look like this:
Maybe<int> CalculationThatMayHaveAValue(string x)
{
try
{
return CalculationThatCanFail(x);
}
catch
{
return Maybe<int>.None;
}
}
//...
var xs = ps.Select(p =>
{
Maybe<int> thing = CalculationThatMayHaveAValue(p[1]);
return new { Test = p[0], FooThing = thing};
})
.Where(x => x.FooThing.HasValue);
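If you don't want to pull in a Maybe type, the built-in int? can play a similar role. This is only a sketch under some assumptions: CalculationThatCanFail is replaced by a stand-in that calls int.Parse, only FormatException is treated as "no value", and the names NullableInsteadOfMaybe, TryCalculate and Build are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NullableInsteadOfMaybe
{
    // Hypothetical stand-in: throws FormatException on non-numeric input.
    static int CalculationThatCanFail(string s)
    {
        return int.Parse(s);
    }

    // Wraps the failing call so the query only ever sees "value" or "no value".
    static int? TryCalculate(string s)
    {
        try { return CalculationThatCanFail(s); }
        catch (FormatException) { return null; }
    }

    public static Dictionary<string, List<int>> Build(IEnumerable<string> raw)
    {
        return raw
            .Select(line => line.Split(' '))
            .Select(p => new { Test = p[0], FooThing = TryCalculate(p[1]) })
            .Where(x => x.FooThing.HasValue) // drop the failed calculations
            .GroupBy(x => x.Test)
            .ToDictionary(g => g.Key,
                          g => g.Select(x => x.FooThing.Value).ToList());
    }
}
```

For the input lines "a 1", "a x" and "b 2" this builds a dictionary with a mapped to a list containing 1 and b mapped to a list containing 2; the "a x" line is dropped because its calculation fails.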