Reactive Extensions: buffer until subscriber is idle - c#

I have a program where I'm receiving events and want to process them in batches, so that all items that come in while I'm processing the current batch will appear in the next batch.
The simple TimeSpan and count based Buffer methods in Rx will give me multiple batches of items instead of giving me one big batch of everything that has come in (in cases when the subscriber takes longer than the specified TimeSpan or more than N items come in and N is greater than count).
I looked at using the more complex Buffer overloads that take Func<IObservable<TBufferClosing>> or IObservable<TBufferOpening> and Func<TBufferOpening, IObservable<TBufferClosing>>, but I can't find examples of how to use these, much less figure out how to apply them to what I'm trying to do.

Does this do what you want?
var xs = new Subject<int>();
var ys = new Subject<Unit>();
var zss =
xs.Buffer(ys);
zss
.ObserveOn(Scheduler.Default)
.Subscribe(zs =>
{
Thread.Sleep(1000);
Console.WriteLine(String.Join("-", zs));
ys.OnNext(Unit.Default);
});
ys.OnNext(Unit.Default);
xs.OnNext(1);
Thread.Sleep(200);
xs.OnNext(2);
Thread.Sleep(600);
xs.OnNext(3);
Thread.Sleep(400);
xs.OnNext(4);
Thread.Sleep(300);
xs.OnNext(5);
Thread.Sleep(900);
xs.OnNext(6);
Thread.Sleep(100);
xs.OnNext(7);
Thread.Sleep(1000);
My Result:
1-2-3
4-5
6-7

What you need is something to buffer the values and then when the worker
is ready it asks for the current buffer and then resets it. This can
be done with a combination of RX and Task
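class TicTac<Stuff> {
private readonly object gate = new object();
// RunContinuationsAsynchronously keeps the awaiting worker from resuming inside Push's lock
private TaskCompletionSource<List<Stuff>> pending = NewPending();
// "in" is a reserved word in C#, so the list is called buffer; null means nothing has arrived since the last reset
private List<Stuff> buffer = null;
private static TaskCompletionSource<List<Stuff>> NewPending(){
return new TaskCompletionSource<List<Stuff>>(TaskCreationOptions.RunContinuationsAsynchronously);
}
public void Push(Stuff stuff){
lock(gate){
if(buffer == null){
// first item since the last reset: wake up the waiting worker
buffer = new List<Stuff>();
pending.SetResult(buffer);
}
buffer.Add(stuff);
}
}
private void Reset(){
lock(gate){
pending = NewPending();
buffer = null;
}
}
public async Task<List<Stuff>> Items(){
List<Stuff> list = await pending.Task;
Reset();
return list;
}
}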
then
var tictac = new TicTac<double>();
IObservable<double> source = ....
source.Subscribe(x=>tictac.Push(x));
Then in your worker
while(true){
var items = await tictac.Items();
Thread.Sleep(100);
foreach (var item in items){
Console.WriteLine(item);
}
}

The way I have done this before is to open the ObserveOn method in dotPeek/Reflector, take the queuing concept it uses, and adapt it to our requirements. For example, in UI applications with fast-ticking data (like finance), the UI thread can get flooded with events and sometimes can't update quickly enough. In these cases we want to drop all events except the last one (for a particular instrument), so we changed the internal Queue of ObserveOn to a single value of T (look for ObserveLatestOn(IScheduler)). In your case you want to keep the Queue, but push the whole queue rather than just the first value. This should get you started.
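A minimal sketch of the "push the whole queue" idea, stripped of the scheduler plumbing that ObserveOn carries. The class name DrainAllQueue and its members are hypothetical and not part of Rx. The producer appends under a lock and the consumer swaps the list out in one step, so everything that arrived while it was busy comes back as a single batch.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not an Rx type: collects items and hands them over in bulk.
public sealed class DrainAllQueue<T>
{
    private readonly object gate = new object();
    private List<T> items = new List<T>();

    // Called by the producer for every incoming value.
    public void Enqueue(T item)
    {
        lock (gate)
        {
            items.Add(item);
        }
    }

    // Called by the consumer when it is idle: returns everything queued so far
    // and starts a fresh list for whatever arrives next.
    public IReadOnlyList<T> DrainAll()
    {
        lock (gate)
        {
            var batch = items;
            items = new List<T>();
            return batch;
        }
    }
}
```

An adapted ObserveOn would call DrainAll from its scheduled action and pass the resulting list to the observer, instead of dequeuing one value at a time.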

Kind of an expansion of @Enigmativity's answer. I have used this to solve the problem:
public static IObservable<(Action ready, IReadOnlyList<T> values)> BufferUntilReady<T>(this IObservable<T> stream)
{
var gate = new BehaviorSubject<Guid>(Guid.NewGuid());
void Ready() => gate.OnNext(Guid.NewGuid());
return stream.Publish(shared => shared
.Buffer(gate.CombineLatest(shared, ValueTuple.Create)
.DistinctUntilChanged(new AnyEqualityComparer<Guid, T>()))
.Where(x => x.Any())
.Select(x => ((Action) Ready, (IReadOnlyList<T>) x)));
}
public class AnyEqualityComparer<T1, T2> : IEqualityComparer<(T1 a, T2 b)>
{
public bool Equals((T1 a, T2 b) x, (T1 a, T2 b) y) => Equals(x.a, y.a) || Equals(x.b, y.b);
public int GetHashCode((T1 a, T2 b) obj) => throw new NotSupportedException();
}
The subscriber receives a Ready() function to be called when ready to receive next buffer. I don't observe each buffer on the same thread to avoid cycles, but I guess you could break it some other place, if you need each buffer to be handled on the same thread.

Related

In RX how to create buffers no faster than they can be processed

Using RX's buffer operator allows the creation of batches after a certain number of results have appeared, or after a specified time, whichever is sooner. This is very useful when piping results to, say, a database on another machine, where one wants to keep latency down, but avoid sending huge numbers of requests (one per result).
I have an additional requirement, which is to preserve the ordering of results into the database (some are updates, which must come after the corresponding adds). This means that outgoing requests cannot overlap in case they get out of order.
Ideally each buffer should continue filling up even after it would normally emit if a previous database request has not yet returned, as this will minimise latency and the number of requests going to the database.
How could the following code be modified to make this work?
source
.Buffer(TimeSpan.FromSeconds(1), 25)
.Subscribe(async batch => await SendToDatabase(batch));
To force outgoing requests to wait until the previous one has returned before being processed, there is an RX trick which turns each result into an observable which completes only when it has finished processing. By combining these with concat the next will not be started until the previous one completes.
source
.Buffer(TimeSpan.FromSeconds(1), 25)
.Select(batch =>
Observable.FromAsync(async () =>
await SendToDatabase(batch)
)
)
.Concat()
.Subscribe();
This will still produce batches while waiting, though, so is not a perfect solution.
I have written a new observable extension BufferAndAct which does this.
In summary, it takes a time interval, a number (of items), and an action to be applied to each batch. It tries to act on a batch when the time interval expires or when the number of items has been reached, but it will never start acting on a new batch until the previous one has completed, so there is no limit on the potential size of a batch. Modifications could be made to bring this in line with some of the other overloads of Buffer.
It uses a further extension Split which acts like one of the overloads of Buffer, turning an observable of source items into an observable of observables of source items, splitting them when a signal is received from an input observable.
BufferAndAct uses Split to create an observable which gives a tick when a normal, timed, buffer would be emitted on the source observable, and is reset when the actual buffer is released. This could be later, because there is another observable which ticks when there is no request currently in progress. By zipping these two ticks together, Buffer can be used to emit a batch as soon as both criteria are met.
Usage is as follows:
source
.BufferAndAct(TimeSpan.FromSeconds(1), 25, async batch =>
await SendToDatabase(batch)
)
.Subscribe(r => {});
And the source for both extensions:
public static IObservable<TDest> BufferAndAct<TSource, TDest>(
this IObservable<TSource> source,
TimeSpan timeSpan,
int count,
Func<IList<TSource>, Task<TDest>> action
)
{
return new AnonymousObservable<TDest>(observer =>
{
var actionStartedObserver = new Subject<Unit>();
var actionCompleteObserver = new Subject<Unit>();
var published = source.Publish();
var batchReady = published.Select(i => Unit.Default).Split(actionStartedObserver).Select(s => s.Buffer(timeSpan, count).Select(u => Unit.Default).Take(1)).Concat();
var disposable = published.Buffer(Observable.Zip(actionCompleteObserver.StartWith(Unit.Default), batchReady)).SelectMany(async list =>
{
actionStartedObserver.OnNext(Unit.Default);
try
{
return await action(list);
}
finally
{
actionCompleteObserver.OnNext(Unit.Default);
}
}).Finally(() => {}).Subscribe(observer);
published.Connect();
return Disposable.Create(() =>
{
disposable.Dispose();
actionCompleteObserver.Dispose();
});
});
}
public static IObservable<Unit> BufferAndAct<TSource>(
this IObservable<TSource> source,
TimeSpan timeSpan,
int count,
Func<IList<TSource>, Task> action
)
{
return BufferAndAct(source, timeSpan, count, async s =>
{
// Await the action so the next batch is not released until it has finished
await action(s);
return Unit.Default;
});
}
public static IObservable<IObservable<TSource>> Split<TSource>(
this IObservable<TSource> source,
IObservable<Unit> boundaries
)
{
return Observable.Create<IObservable<TSource>>(observer =>
{
var tuple = Split(observer);
var d1 = boundaries.Subscribe(tuple.Item2);
var d2 = source.Subscribe(tuple.Item1);
return Disposable.Create(() =>
{
d2.Dispose();
d1.Dispose();
});
});
}
private static Tuple<IObserver<TSource>, IObserver<Unit>> Split<TSource>(this IObserver<IObservable<TSource>> output)
{
ReplaySubject<TSource> obs = null;
var completed = 0; // int not bool to use in interlocked
Action newObservable = () =>
{
obs?.OnCompleted();
obs = new ReplaySubject<TSource>();
output.OnNext(obs);
};
Action completeOutput = () =>
{
// Flip completed from 0 to 1; only the first caller sees 0 and completes the output
if (Interlocked.CompareExchange(ref completed, 1, 0) == 0)
{
output.OnCompleted();
}
};
newObservable();
return new Tuple<IObserver<TSource>, IObserver<Unit>>(Observer.Create<TSource>(obs.OnNext, output.OnError, () =>
{
obs.OnCompleted();
completeOutput();
}), Observer.Create<Unit>(s => newObservable(), output.OnError, () => completeOutput()));
}

FIFO Queue improvement for C#

Hi, I am working on an assignment in which I have to implement a queue that handles jobs waiting to be processed (the producer-consumer problem). I have to develop a queue that works more efficiently than a plain FIFO queue. There are parameters that describe how long a consumer can wait before starvation occurs and how long it needs to process once it reaches the front of the queue. Consumers arrive at a specified time, can wait for a specified time, and take some time to execute whatever they want to do when their turn comes. Can you help me with a better queue than the FIFO method?
First of all, you are trying to solve several problems at once. If you want to improve on the regular queue, you can implement a queue based on the priority of its elements (a heap). If you want to preserve the ordering of the regular queue, you can use an integer as the priority and increase that number every time you add an element to the heap.
Here I am attaching the first link that I found on Google for a priority queue. Insertion is O(log n) if you use a binary heap.
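The idea of pairing a priority with an ever-increasing insertion number can be sketched as follows. This assumes .NET 6 or later, where System.Collections.Generic.PriorityQueue<TElement, TPriority> (a heap-backed queue) is available; the wrapper name StablePriorityQueue is hypothetical. Lower priority numbers are dequeued first, and the sequence number keeps FIFO order among items that share a priority.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper: a priority queue that stays FIFO within one priority level.
public sealed class StablePriorityQueue<T>
{
    // The priority is a tuple; tuples compare element by element, so the
    // sequence number only matters when two priorities are equal.
    private readonly PriorityQueue<T, (int Priority, long Sequence)> heap =
        new PriorityQueue<T, (int Priority, long Sequence)>();
    private long sequence;

    public int Count => heap.Count;

    // O(log n): the item is sifted into place in the underlying heap.
    public void Enqueue(T item, int priority)
    {
        heap.Enqueue(item, (priority, sequence++));
    }

    // Removes the item with the lowest priority number, oldest first on ties.
    public T Dequeue()
    {
        return heap.Dequeue();
    }
}
```

This class is not thread-safe on its own; for a producer-consumer setup it would still need a lock or a concurrent wrapper around it.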
Now, if you want that queue to support concurrency, you need to isolate the shared resource (for example, the underlying structure where the heap stores its elements).
Albahari is a good reference for how producer-consumer works with concurrency.
And here are all the classes that you can use to implement concurrency for producer-consumer: Concurrency sheet
I am adding an example with one of those types:
//BlockingCollection with a fixed number of Products to put, it works with 10 items max on the collection
class Program
{
private static int counter = 1;
private static BlockingCollection<Product> products =
new BlockingCollection<Product>(10);
static void Main(string[] args)
{
//three producers
Task.Run(() => Producer());
Task.Run(() => Producer());
Task.Run(() => Producer());
Task.Run(() => Consumer());
Console.ReadLine();
}
static void Producer()
{
while (true)
{
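//Interlocked.Increment gives each of the three producers a unique number
var number = Interlocked.Increment(ref counter) - 1;
var product = new Product()
{
Number = number,
Name = "Product " + number
};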
//Adding one element
Console.WriteLine("Producing: " + product);
products.Add(product);
Thread.Sleep(2000);
}
}
static void Consumer()
{
while (true)
{
//Take() blocks until an element exists, so no polling is needed
var product = products.Take();
Console.WriteLine("Consuming: " + product);
Thread.Sleep(2000);
}
}
}
public class Product
{
public int Number { get; set; }
public string Name { get; set; }
public override string ToString()
{
return Name;
}
}

How to use Rx to deliver events on a schedule?

I'm using Reactive Extensions for C#. I want several threads to enqueue items on a ConcurrentQueue. Then I want to Subscribe to that queue, but only get 1 element every 1 second. This answer almost works, but not when I add more elements to the queue.
Given a queue of ints: [1, 2, 3, 4, 5, 6]. I want Subscribe(Console.WriteLine) to print a value every second. I want to add more ints from another thread onto the queue while Rx is printing these numbers out. Any ideas?
To pace an input stream so that it emits no faster than one item per TimeSpan interval, use this:
var paced = input.Select(i => Observable.Empty<T>()
.Delay(interval)
.StartWith(i)).Concat();
See here for an explanation. Here's an example implementation tailored to a concurrent queue that dequeues quickly. Note that using the ToObservable extension of IEnumerable<T> to convert ConcurrentQueue<T> to an observable directly would be a mistake, because sadly this observable completes as soon as the queue is empty. It's jolly annoying that - at least as far as I can see - there's no asynchronous dequeue on a ConcurrentQueue<T> and so I had to introduce a polling mechanism. Other abstractions (e.g. BlockingCollection<T>) may serve you better!
public static class ObservableExtensions
{
public static IObservable<T> Pace<T>(this ConcurrentQueue<T> queue,
TimeSpan interval)
{
var source = Observable.Create<T>(async (o, ct) => {
while(!ct.IsCancellationRequested)
{
T next;
while(queue.TryDequeue(out next))
o.OnNext(next);
// You might want to use some arbitrary shorter interval here
// to allow the stream to resume after a long delay in source
// events more promptly
await Task.Delay(interval, ct);
}
ct.ThrowIfCancellationRequested();
});
// this does the pacing
return source.Select(i => Observable.Empty<T>()
.Delay(interval)
.StartWith(i)).Concat()
.Publish().RefCount(); // to allow multiple subscribers
}
}
Example usage:
public static void Main()
{
var queue = new ConcurrentQueue<int>();
var stopwatch = new Stopwatch();
queue.Pace(TimeSpan.FromSeconds(1))
.Subscribe(
x => Console.WriteLine(stopwatch.ElapsedMilliseconds + ": x" + x),
e => Console.WriteLine(e.Message),
() => Console.WriteLine("Done"));
stopwatch.Start();
queue.Enqueue(1);
queue.Enqueue(2);
Thread.Sleep(500);
queue.Enqueue(3);
Thread.Sleep(5000);
queue.Enqueue(4);
queue.Enqueue(5);
queue.Enqueue(6);
Console.ReadLine();
}
You may be satisfied with one of the Observable.Buffer overloads, but consider avoiding buffering with long-running subscriptions, because the buffered elements can put pressure on memory.
You can also build your own extension method with any desired behavior using Observable.Generate:
void Main()
{
var queue = new ConcurrentQueue<int>();
queue.Enqueue(1);
queue.Enqueue(2);
queue.Enqueue(3);
queue.Enqueue(4);
queue.ObserveConcurrentQueue(TimeSpan.FromSeconds(1)).DumpLive("queue");
}
// Define other methods and classes here
public static class Ex {
public static IObservable<T> ObserveConcurrentQueue<T>(this ConcurrentQueue<T> queue, TimeSpan period)
{
return Observable
.Generate(
queue,
x => true,
x => x,
x => x.DequeueOrDefault(),
x => period)
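// EqualityComparer avoids a NullReferenceException for reference types; genuine default values (e.g. 0) are dropped too
.Where(x => !EqualityComparer<T>.Default.Equals(x, default(T)));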
}
public static T DequeueOrDefault<T>(this ConcurrentQueue<T> queue)
{
T result;
if (queue.TryDequeue(out result))
return result;
else
return default(T);
}
}

Parallel ForEach wait 500 ms before spawning

I have this situation:
var tasks = new List<ITask> ...
Parallel.ForEach(tasks, currentTask => currentTask.Execute() );
Is it possible to instruct PLinq to wait for 500ms before the next thread is spawned?
System.Threading.Thread.Sleep(5000);
You are using Parallel.ForEach in a way it was not meant to be used. You should write a special enumerator that rate-limits itself to fetching data once every 500 ms.
I made some assumptions about how your DTO works because you did not provide any details.
private IEnumerable<SomeResource> GetRateLimitedResource()
{
SomeResource someResource = null;
do
{
someResource = _remoteProvider.GetData();
if(someResource != null)
{
yield return someResource;
Thread.Sleep(500);
}
} while (someResource != null);
}
Here is how your Parallel.ForEach call should look then:
Parallel.ForEach(GetRateLimitedResource(), SomeFunctionToProcessSomeResource);
There are already some good suggestions. I would agree with others that you are using PLINQ in a manner it wasn't meant to be used.
My suggestion would be to use System.Threading.Timer. This is probably better than writing a method that returns an IEnumerable<> that forces a half second delay, because you may not need to wait the full half second, depending on how much time has passed since your last API call.
With the timer, it will invoke a delegate that you've provided it at the interval you specify, so even if the first task isn't done, a half second later it will invoke your delegate on another thread, so there won't be any extra waiting.
From your example code, it sounds like you have a list of tasks, in this case, I would use System.Collections.Concurrent.ConcurrentQueue to keep track of the tasks. Once the queue is empty, turn off the timer.
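The timer approach described above can be sketched like this. The names TimedDispatcher and Start are hypothetical, and the queue holds plain Action delegates as a stand-in for the ITask objects in the question. Each tick starts at most one queued action on a thread-pool thread, and the timer is turned off when a tick finds the queue empty.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Hypothetical helper: starts one queued action per timer tick.
public static class TimedDispatcher
{
    // The returned event is set when a tick finds the queue empty.
    // Actions started on earlier ticks may still be running at that point.
    public static ManualResetEventSlim Start(ConcurrentQueue<Action> work, TimeSpan period)
    {
        var drained = new ManualResetEventSlim(false);
        Timer timer = null;
        // Created stopped so that `timer` is assigned before the first callback runs.
        timer = new Timer(_ =>
        {
            if (work.TryDequeue(out var action))
            {
                action(); // the next tick does not wait for this call to finish
            }
            else
            {
                timer.Dispose(); // queue is empty: turn the timer off
                drained.Set();
            }
        }, null, Timeout.Infinite, Timeout.Infinite);
        timer.Change(TimeSpan.Zero, period); // first tick immediately, then every period
        return drained;
    }
}
```

With a period of TimeSpan.FromMilliseconds(500), a new task is started every half second regardless of whether the previous one has finished, which is the behaviour the answer describes.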
You could use Enumerable.Aggregate instead.
var task = tasks.Aggregate((t1, t2) =>
t1.ContinueWith(_ =>
{ Thread.Sleep(500); return t2.Result; }));
If you don't want the tasks chained then there is also the overload to Select assuming the tasks are in order of delay.
var tasks = Enumerable
.Range(1, 10)
.Select(x => Task.Run(() => x * 2))
.Select((x, i) => Task.Delay(TimeSpan.FromMilliseconds(i * 500))
.ContinueWith(_ => x.Result));
foreach(var result in tasks.Select(x => x.Result))
{
Console.WriteLine(result);
}
From the comments, a better option would be to guard the resource instead of using a time delay.
static object Locker = new object();
static int GetResultFromResource(int arg)
{
lock(Locker)
{
Thread.Sleep(500);
return arg * 2;
}
}
var tasks = Enumerable
.Range(1, 10)
.Select(x => Task.Run(() => GetResultFromResource(x)));
foreach(var result in tasks.Select(x => x.Result))
{
Console.WriteLine(result);
}
In this case how about a Producer-Consumer pattern with a BlockingCollection<T>?
var tasks = new BlockingCollection<ITask>();
// add tasks, if this is an expensive process, put it out onto a Task
// tasks.Add(x);
// we're done producin' (allows GetConsumingEnumerable to finish)
tasks.CompleteAdding();
RunTasks(tasks);
With a single consumer thread:
static void RunTasks(BlockingCollection<ITask> tasks)
{
foreach (var task in tasks.GetConsumingEnumerable())
{
task.Execute();
// this may not be as accurate as you would like
Thread.Sleep(500);
}
}
If you have access to .NET 4.5 you can use Task.Delay:
static void RunTasks(BlockingCollection<ITask> tasks)
{
foreach (var task in tasks.GetConsumingEnumerable())
{
Task.Delay(500)
.ContinueWith(_ => task.Execute())
.Wait();
}
}

C# multithreading return

I am not familiar with multithreading. Imagine I have a method that does some intensive search on a string and returns 2 lists of integers as out parameters.
public static void CalcModel(string s, out List<int> startPos, out List<int> len)
{
// Do some intensive search
}
Searching a long string is very time consuming, so I want to split the string into several fragments, search them with multiple threads, and recombine the results (adjusting startPos accordingly).
How do I integrate multithreading into this kind of process? Thanks.
I forgot to mention the following two things:
I want to set a string length cutoff and let the code decide how many fragments it needs.
I had a hard time associating the startPos of each fragment (on the original string) with its thread. How can I do that?
Rather than get too bogged down in details, generally, you send each thread a "return object." Once you've started all the threads, you block on them and wait until they are all finished.
While each thread is running, the thread modifies its work object and terminates when it has produced the output.
So roughly this (I can't tell exactly how you want to split it up, so perhaps you can modify this):
public class WorkItem {
public string InputString;
public List<int> startPos;
public List<int> len;
}
public static void CalcLotsOfStrings(string s, out List<int> startPos, out List<int> len)
{
WorkItem wi1 = new WorkItem();
wi1.InputString = s;
Thread t1 = new Thread(InternalCalcThread1);
t1.Start(wi1);
WorkItem wi2 = new WorkItem();
wi2.InputString = s;
Thread t2 = new Thread(InternalCalcThread2);
t2.Start(wi2);
// You can now wait for the threads to complete or start new threads
// When you're done, wi1 and wi2 will be filled with the updated data
// but make sure not to use them until the threads are done!
}
public static void InternalCalcThread1(object item)
{
WorkItem w = item as WorkItem;
w.startPos = new List<int>();
w.len = new List<int>();
// Do work here - populate the work item data
}
public static void InternalCalcThread2(object item)
{
// Do work here
}
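You can try this, but I am not sure about the performance of these methods:
// Each call needs its own result variables; in practice, pass each call its own fragment of s
List<int> startPos1 = null, len1 = null, startPos2 = null, len2 = null;
Parallel.Invoke(
() => CalcModel(s, out startPos1, out len1),
() => CalcModel(s, out startPos2, out len2)
);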
Creating and running multiple threads is easy. All you need is a method that acts as the starting point for a thread.
Suppose you have a parameterless method, here called CalcModelEntry (a hypothetical wrapper, because ThreadStart cannot pass the parameters that CalcModel from your original post takes). Then you only have to do:
// instantiate the thread with a method as starting point
Thread t = new Thread(new ThreadStart(CalcModelEntry));
// run the thread
t.Start();
However, if you want the thread to return some values, you need a little trick, because a thread's start method cannot hand values back directly through a return statement or out parameters.
You can 'wrap' the thread in its own class and let it store its data in the class's fields:
public class ThreadClass {
public string FieldA;
public string FieldB;
//...
// Run must be an instance method so that _run can write to this object's fields
public void Run () {
Thread t = new Thread(new ThreadStart(_run));
t.Start();
}
private void _run() {
//...
FieldA = "someData";
FieldB = "otherData";
//...
}
}
That's only a very rough example to illustrate the idea. It doesn't include any parts for thread synchronization or thread control.
I would say the more difficult task is working out how to split your CalcModel method so that it can be parallelized and, maybe more importantly, how the partial results can be joined together to form one single end result.
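One way to handle the cutoff and the startPos bookkeeping is to derive each fragment's offset from its index, so no thread has to be told its position separately. The sketch below is only an illustration under assumptions: FragmentSearch and Run are hypothetical names, the search delegate stands in for CalcModel and returns (Start, Length) pairs instead of two out lists, and fragments have a fixed size with no overlap, so a match that straddles a fragment boundary would be missed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical helper: splits s into fragments, searches them in parallel,
// and shifts each hit back to its position in the original string.
public static class FragmentSearch
{
    public static List<(int Start, int Length)> Run(
        string s, int cutoff, Func<string, List<(int Start, int Length)>> search)
    {
        if (cutoff <= 0) throw new ArgumentOutOfRangeException(nameof(cutoff));

        // The cutoff decides how many fragments are needed.
        int fragmentCount = (s.Length + cutoff - 1) / cutoff;
        var partial = new List<(int Start, int Length)>[fragmentCount];

        Parallel.For(0, fragmentCount, i =>
        {
            int offset = i * cutoff; // where this fragment begins in s
            string fragment = s.Substring(offset, Math.Min(cutoff, s.Length - offset));
            // Each iteration writes only to its own slot, so no locking is needed.
            partial[i] = search(fragment)
                .Select(h => (Start: h.Start + offset, Length: h.Length))
                .ToList();
        });

        // Concatenating the slots in index order keeps hits sorted by fragment.
        return partial.SelectMany(p => p).ToList();
    }
}
```

If matches can span a boundary, one option is to let neighbouring fragments overlap by the maximum match length and drop duplicates when merging.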
