I'm trying to implement a consumer in C#. There are many publishers which could be executing concurrently. I've created three examples, one with Rx and subject, one with BlockingCollection and a third using ToObservable from the BlockingCollection. They all do the same thing in this simple example and I want them to work with multiple producers.
What are the different qualities of each approach?
I'm already using Rx, so I'd prefer this approach. But I'm concerned that OnNext has no thread safe guarantee and I don't know what the queuing semantics are of Subject and the default scheduler.
Is there a thread safe subject?
Are all messages going to be processed?
Are there any other scenarios where this won't work? Does it process items concurrently?
void SubjectOnDefaultScheduler()
{
    var observable = new Subject<long>();
    observable
        .ObserveOn(Scheduler.Default)
        .Subscribe(i => DoWork(i));

    observable.OnNext(1);
    observable.OnNext(2);
    observable.OnNext(3);
}
Not Rx, but easily adapted to use/subscribe to it. It takes an item and then processes it. This should happen serially.
void BlockingCollectionAndConsumingTask()
{
    var blockingCollection = new BlockingCollection<long>();
    var taskFactory = new TaskFactory();
    taskFactory.StartNew(() =>
    {
        foreach (var i in blockingCollection.GetConsumingEnumerable())
        {
            DoWork(i);
        }
    });

    blockingCollection.Add(1);
    blockingCollection.Add(2);
    blockingCollection.Add(3);
}
Using a blocking collection a bit like a subject seems like a good compromise. I'm guessing this will implicitly schedule onto a task, so that I can use async/await; is that correct?
void BlockingCollectionToObservable()
{
    var blockingCollection = new BlockingCollection<long>();
    blockingCollection
        .GetConsumingEnumerable()
        .ToObservable(Scheduler.Default)
        .Subscribe(i => DoWork(i));

    blockingCollection.Add(1);
    blockingCollection.Add(2);
    blockingCollection.Add(3);
}
Subject is not thread-safe. OnNexts issued concurrently will directly call an Observer concurrently. Personally I find this quite surprising given the extent to which other areas of Rx enforce the correct semantics. I can only assume this was done for performance considerations.
Subject is kind of a half-way house though, in that it does enforce termination with OnError or OnComplete - after either of these are raised, OnNext is a NOP. And this behaviour is thread-safe.
But use Observable.Synchronize() on a Subject and it will force outgoing calls to obey the proper Rx semantics. In particular, OnNext calls will block if made concurrently.
The underlying mechanism is the standard .NET lock. When the lock is contended by multiple threads they are granted the lock on a first-come first-served basis most of the time. There are certain conditions where fairness is violated. However, you will definitely get the serialized access you are looking for.
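For example, a minimal sketch (assuming a DoWork method like the one in the question):
var subject = new Subject<long>();

// Synchronize() serializes the outgoing calls with a lock,
// so DoWork is never entered concurrently.
subject.Synchronize()
    .Subscribe(i => DoWork(i));

// Producers on any thread can push values; a concurrent OnNext
// blocks until the call in progress completes.
Task.Factory.StartNew(() => subject.OnNext(1));
Task.Factory.StartNew(() => subject.OnNext(2));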
ObserveOn has behaviour that is platform specific - if available, you can supply a SynchronizationContext and OnNext calls are Posted to it. With a Scheduler, it ends up putting calls onto a ConcurrentQueue<T> and dispatching them serially via the scheduler - so the thread of execution will depend on the scheduler. Either way, the queuing behaviour will also enforce the correct semantics.
In both cases (Synchronize & ObserveOn), you certainly won't lose messages. With ObserveOn, you can implicitly choose the thread you'll process messages on through your choice of Scheduler/Context; with Synchronize you'll process messages on the calling thread. Which is better will depend on your scenario.
There's more to consider as well - such as what you want to do if your producers out-pace your consumer.
You might want to have a look at Rxx Consume as well: http://rxx.codeplex.com/SourceControl/changeset/view/63470#1100703
Sample code showing the Synchronize behaviour (NuGet: Rx-Testing, NUnit). It's a bit hokey with the Thread.Sleep code, but provoking the bad behaviour deterministically is quite fiddly and I was lazy :):
public class SubjectTests
{
    [Test]
    public void SubjectDoesNotRespectGrammar()
    {
        var subject = new Subject<int>();
        var spy = new ObserverSpy(Scheduler.Default);
        var sut = subject.Subscribe(spy);

        // Swap the following with the preceding line to make this test pass
        //var sut = subject.Synchronize().Subscribe(spy);

        Task.Factory.StartNew(() => subject.OnNext(1));
        Task.Factory.StartNew(() => subject.OnNext(2));

        Thread.Sleep(2000);

        Assert.IsFalse(spy.ConcurrencyViolation);
    }

    private class ObserverSpy : IObserver<int>
    {
        private int _inOnNext;
        private readonly IScheduler _scheduler;

        public bool ConcurrencyViolation = false;

        public ObserverSpy(IScheduler scheduler)
        {
            _scheduler = scheduler;
        }

        public void OnNext(int value)
        {
            // A second thread entering OnNext while another is still
            // inside it is a violation of the Rx grammar.
            var isInOnNext = Interlocked.CompareExchange(ref _inOnNext, 1, 0);
            if (isInOnNext == 1)
            {
                ConcurrencyViolation = true;
                return;
            }

            // Dwell inside OnNext for a second to give a concurrent
            // call the chance to overlap.
            var wait = new ManualResetEvent(false);
            _scheduler.Schedule(TimeSpan.FromSeconds(1), () => wait.Set());
            wait.WaitOne();

            _inOnNext = 0;
        }

        public void OnError(Exception error) { }

        public void OnCompleted() { }
    }
}
Related
I've been trying to implement a simple producer-consumer pattern using Rx and observable collections. I also need to be able to throttle the number of subscribers easily. I have seen lots of references to LimitedConcurrencyLevelTaskScheduler in parallel extensions but I don't seem to be able to get this to use multiple threads.
I think I'm doing something silly so I was hoping someone could explain what. In the unit test below, I expect multiple (2) threads to be used to consume the strings in the blocking collection. What am I doing wrong?
[TestClass]
public class LimitedConcurrencyLevelTaskSchedulerTests
{
    private readonly ConcurrentBag<string> _testStrings = new ConcurrentBag<string>();
    private readonly ConcurrentBag<int> _threadIds = new ConcurrentBag<int>();

    [TestMethod]
    public void WhenConsumingFromBlockingCollection_GivenLimitOfTwoThreads_TwoThreadsAreUsed()
    {
        // Setup the command queue for processing combinations
        var commandQueue = new BlockingCollection<string>();
        var taskFactory = new TaskFactory(new LimitedConcurrencyLevelTaskScheduler(2));
        var scheduler = new TaskPoolScheduler(taskFactory);

        commandQueue.GetConsumingEnumerable()
            .ToObservable(scheduler)
            .Subscribe(Go, ex => { throw ex; });

        var iterationCount = 100;
        for (int i = 0; i < iterationCount; i++)
        {
            commandQueue.Add(string.Format("string {0}", i));
        }
        commandQueue.CompleteAdding();

        while (!commandQueue.IsCompleted)
        {
            Thread.Sleep(100);
        }

        Assert.AreEqual(iterationCount, _testStrings.Count);
        Assert.AreEqual(2, _threadIds.Distinct().Count());
    }

    private void Go(string testString)
    {
        _testStrings.Add(testString);
        _threadIds.Add(Thread.CurrentThread.ManagedThreadId);
    }
}
Everyone seems to go through the same learning curve with Rx. The thing to understand is that Rx doesn't do parallel processing unless you explicitly make a query that forces parallelism. Schedulers do not introduce parallelism.
Rx has a contract of behaviour that says zero or more values are produced in series (regardless of how many threads might be used), one after another, with no overlap, finally to be followed by an optional single error or a single complete message, and then nothing else.
This is often written as OnNext*(OnError|OnCompleted).
All a scheduler does is define the rule that determines which thread a new value is processed on, when the scheduler has no pending values it is already processing for the current observable.
Now take your code:
var taskFactory = new TaskFactory(new LimitedConcurrencyLevelTaskScheduler(2));
var scheduler = new TaskPoolScheduler(taskFactory);
This says that the scheduler will run values for a subscription on one of two threads. But it doesn't mean that it will do this for every value produced. Remember, since values are produced in series, one after another, it is better to re-use an existing thread than to go to the high cost of creating a new thread. So what Rx does is re-use the existing thread if a new value is scheduled on the scheduler before the current value is finished being processed.
This is the key - it re-uses the thread if a new value is scheduled before the processing of existing values is complete.
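To see this in action, here's a hypothetical sketch (not from the question): because the values below are produced back-to-back, they will typically all be dispatched on the same thread, despite the two-thread limit.
var scheduler = new TaskPoolScheduler(
    new TaskFactory(new LimitedConcurrencyLevelTaskScheduler(2)));

// All ten values usually print the same thread id, because each value
// is scheduled before the previous one has finished being processed.
Observable.Range(0, 10)
    .ObserveOn(scheduler)
    .Subscribe(i => Console.WriteLine(
        "value {0} on thread {1}", i, Thread.CurrentThread.ManagedThreadId));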
So your code does this:
commandQueue.GetConsumingEnumerable()
    .ToObservable(scheduler)
    .Subscribe(Go, ex => { throw ex; });
It means that the scheduler will only create a thread when the first value comes along. By the time the expensive thread-creation operation is complete, the code that adds values to the commandQueue has also finished, so it has queued them all, and hence Rx can more efficiently use a single thread rather than create a costly second one.
To avoid this you need to construct the query to introduce parallelism.
Here's how:
public void WhenConsumingFromBlockingCollection_GivenLimitOfTwoThreads_TwoThreadsAreUsed()
{
    var taskFactory = new TaskFactory(new LimitedConcurrencyLevelTaskScheduler(2));
    var scheduler = new TaskPoolScheduler(taskFactory);

    var iterationCount = 100;

    Observable
        .Range(0, iterationCount)
        .SelectMany(n => Observable.Start(() => n.ToString(), scheduler)
            .Do(x => Go(x)))
        .Wait();

    // Dump() is LINQPad's output method; replace with asserts in a test project.
    (iterationCount == _testStrings.Count).Dump();
    (2 == _threadIds.Distinct().Count()).Dump();
}
Now, I've used the Do(...)/.Wait() combo to give you the equivalent of a blocking .Subscribe(...) method.
This results in both of your asserts returning true.
I have found that by modifying the subscription as follows, I can add 5 subscribers but only two threads will process the contents of the collection, so this serves my purpose.
for (int i = 0; i < 5; i++)
    observable.Subscribe(Go, ex => { throw ex; });
I'd be interested to know if there is a better or more elegant way to achieve this!
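One possibly more elegant option, sketched here untested and assuming the same commandQueue, scheduler and Go as in the test above, is to cap the concurrency explicitly with Merge:
commandQueue.GetConsumingEnumerable()
    .ToObservable(Scheduler.Default)
    // Defer so the work does not start until Merge subscribes.
    .Select(s => Observable.Defer(() => Observable.Start(() => Go(s), scheduler)))
    .Merge(2) // at most two inner observables (and hence two Go calls) at a time
    .Subscribe();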
I searched hard for a piece of code that does what I want and that I am happy with. Reading this and this helped a lot.
I have a scenario where I need a single consumer to be notified by a single producer when new data is available, but I would also like the consumer to be notified periodically regardless of whether new data is available.
It is fine if the consumer is notified more often than the recurring period, but it should not be notified less frequently.
It is possible that multiple notifications for 'new data' occur while the consumer is already notified and working. (So SemaphoreSlim was not a good fit.)
Hence, a consumer which is slower than the rate of producer notifications would not queue up subsequent notifications; they would just re-signal the same "data available" flag without effect.
I would also like the consumer to asynchronously wait for the notifications (without blocking a thread).
I have stitched together the class below, which wraps a TaskCompletionSource and also uses an internal Timer.
public class PeriodicalNotifier : IDisposable
{
    // Need some dummy type since TaskCompletionSource has only the generic version
    internal struct VoidTypeStruct { }

    // Always reuse this allocation
    private static VoidTypeStruct dummyStruct;

    private TaskCompletionSource<VoidTypeStruct> internalCompletionSource;
    private Timer reSendTimer;

    public PeriodicalNotifier(int autoNotifyIntervalMs)
    {
        internalCompletionSource = new TaskCompletionSource<VoidTypeStruct>();
        reSendTimer = new Timer(_ => Notify(), null, 0, autoNotifyIntervalMs);
    }

    public async Task WaitForNotificationAsync(CancellationToken cancellationToken)
    {
        using (cancellationToken.Register(() => internalCompletionSource.TrySetCanceled()))
        {
            await internalCompletionSource.Task;
            // Recreate - to be able to set again upon the next wait
            internalCompletionSource = new TaskCompletionSource<VoidTypeStruct>();
        }
    }

    public void Notify()
    {
        internalCompletionSource.TrySetResult(dummyStruct);
    }

    public void Dispose()
    {
        reSendTimer.Dispose();
        internalCompletionSource.TrySetCanceled();
    }
}
Users of this class can do something like this:
private PeriodicalNotifier notifier = new PeriodicalNotifier(100);

// ... In some task - which should be non-blocking
while (some condition)
{
    await notifier.WaitForNotificationAsync(_tokenSource.Token);
    // Do some work...
}

// ... In some thread, the producer added new data
notifier.Notify();
Efficiency is important to me; the scenario is a high-frequency data stream, so I had in mind:
The non-blocking nature of the wait.
I assume a Timer is more efficient than repeatedly recreating Task.Delay and cancelling it if it's not the one to notify.
A concern about the recreation of the TaskCompletionSource.
My questions are:
Does my code correctly solve the problem? Any hidden pitfalls?
Am I missing some trivial solution / existing building block for this use case?
Update:
I have reached the conclusion that, aside from re-implementing a leaner task-completion structure (like in here and here), I have no more optimizations to make. Hope that helps anyone looking at a similar scenario.
Yes, your implementation makes sense but the TaskCompletionSource recreation should be outside the using scope, otherwise the "old" cancellation token may cancel the "new" TaskCompletionSource.
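A minimal sketch of just that placement fix, using the same members as the question's class:
public async Task WaitForNotificationAsync(CancellationToken cancellationToken)
{
    using (cancellationToken.Register(() => internalCompletionSource.TrySetCanceled()))
    {
        await internalCompletionSource.Task;
    }
    // Recreate outside the using scope, so the old token registration
    // cannot cancel the new TaskCompletionSource.
    internalCompletionSource = new TaskCompletionSource<VoidTypeStruct>();
}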
I think using some kind of AsyncManualResetEvent combined with a Timer would be simpler and less error-prone. There's a very nice namespace with async tools in the Visual Studio SDK by Microsoft. You need to install the SDK and then reference the Microsoft.VisualStudio.Threading assembly. Here's an implementation using their AsyncManualResetEvent with the same API:
public class PeriodicalNotifier : IDisposable
{
    private readonly Timer _timer;
    private readonly AsyncManualResetEvent _asyncManualResetEvent;

    public PeriodicalNotifier(TimeSpan autoNotifyInterval)
    {
        _asyncManualResetEvent = new AsyncManualResetEvent();
        _timer = new Timer(_ => Notify(), null, TimeSpan.Zero, autoNotifyInterval);
    }

    public async Task WaitForNotificationAsync(CancellationToken cancellationToken)
    {
        await _asyncManualResetEvent.WaitAsync().WithCancellation(cancellationToken);
        _asyncManualResetEvent.Reset();
    }

    public void Notify()
    {
        _asyncManualResetEvent.Set();
    }

    public void Dispose()
    {
        _timer.Dispose();
    }
}
You notify by setting the reset event, asynchronously wait using WaitAsync, enable cancellation using the WithCancellation extension method, and then reset the event. Multiple notifications are "merged" by setting the same reset event.
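Usage is the same as in the question, except the interval is now expressed as a TimeSpan, for instance:
var notifier = new PeriodicalNotifier(TimeSpan.FromMilliseconds(100));
await notifier.WaitForNotificationAsync(_tokenSource.Token);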
Subject<Result> notifier = new Subject<Result>();

notifier
    .Select(value => Observable.Interval(TimeSpan.FromMilliseconds(100))
        .Select(_ => value))
    .Switch()
    .Subscribe(value => DoSomething(value));

// Some other thread...
notifier.OnNext(...);
This Rx query will keep re-sending the current value every 100 milliseconds until a new value turns up; then we notify that new value every 100 milliseconds.
If we receive values faster than once every 100 milliseconds, then we basically have the same output as input.
I have an event source fired by network I/O very frequently; because of the underlying design, the event is raised on a different thread each time. I wrapped this event via Rx with Observable.FromEventPattern(...), and I'm now using TakeWhile(predicate) to filter for some special event data.
Now I have some concerns about its thread safety. TakeWhile(predicate) works as a hit-and-mute gate, but in a concurrent situation can that still be guaranteed? I guess the underlying implementation could look like this (I can't read the real source code since it's too complicated...):
public static IObservable<TSource> TakeWhile<TSource>(this IObservable<TSource> source, Func<TSource, bool> predicate)
{
    ISubject<TSource> takeUntilObservable = new TempObservable<TSource>();
    IDisposable dps = null;

    // 0 while takeUntilObservable is still active; 1 once the predicate has
    // failed and we have disposed and already sent OnCompleted.
    int state = 0;

    dps = source.Subscribe(
        (s) =>
        {
            /* NOTE: the 'hit and mute' here is still not thread safe. One thread
             * may enter the 'else' branch and be inside CompareExchange while
             * another thread has already passed predicate(...) and is calling
             * OnNext(...). The CompareExchange mainly avoids calling
             * OnCompleted() and Dispose() multiple times.
             */
            if (predicate(s) && state == 0)
            {
                takeUntilObservable.OnNext(s);
            }
            else
            {
                // != 0 means we already disposed and sent OnCompleted;
                // avoid doing so multiple times from parallel threads.
                if (0 == Interlocked.CompareExchange(ref state, 1, 0))
                {
                    try
                    {
                        takeUntilObservable.OnCompleted();
                    }
                    finally
                    {
                        dps.Dispose();
                    }
                }
            }
        },
        (ex) => { takeUntilObservable.OnError(ex); },
        () =>
        {
            try
            {
                takeUntilObservable.OnCompleted();
            }
            finally { dps.Dispose(); }
        });

    return takeUntilObservable;
}
That TempObservable is just a simple implementation of ISubject.
If my guess is reasonable, then it seems thread safety can't be guaranteed: some unexpected event data may still arrive at OnNext(...) because the 'mute' is still in progress.
Then I wrote a simple test to verify, but contrary to my expectation, the results were all positive:
public class MultipleThreadEventSource
{
    public event EventHandler OnSthNew;

    private readonly int concurrentCount = 1000;

    public void Start()
    {
        for (int i = 0; i < this.concurrentCount; i++)
        {
            int j = i;
            ThreadPool.QueueUserWorkItem((state) =>
            {
                var safe = this.OnSthNew;
                if (safe != null)
                    safe(j, null);
            });
        }
    }
}
[TestMethod()]
public void MultipleThreadEventSourceTest()
{
    int loopTimes = 10;
    int onCompletedCalledTimes = 0;

    for (int i = 0; i < loopTimes; i++)
    {
        MultipleThreadEventSource eventSim = new MultipleThreadEventSource();
        var host = Observable.FromEventPattern(eventSim, "OnSthNew");

        host.TakeWhile(p => { return int.Parse(p.Sender.ToString()) < 110; })
            .Subscribe((nxt) =>
            {
                // Try to print the unexpected values, BUT I never saw it happen!!!
                if (int.Parse(nxt.Sender.ToString()) >= 110)
                {
                    this.testContextInstance.WriteLine(nxt.Sender.ToString());
                }
            },
            () => { Interlocked.Increment(ref onCompletedCalledTimes); });

        eventSim.Start();
    }

    // Simply wait for everything to finish.
    Thread.Sleep(60000);
    this.testContextInstance.WriteLine("onCompletedCalledTimes: " + onCompletedCalledTimes);
}
Before I did the testing, some friends here suggested I try Synchronize<TSource> or ObserveOn to make it thread safe. So, any thoughts on my reasoning above, and on why the issue did not reproduce?
As per your other question, the answer still remains the same: In Rx you should assume that Observers are called in a serialized fashion.
To provide a better answer: originally the Rx team ensured that observable sequences were thread safe, but the performance penalty for well-behaved/designed applications was unnecessary, so a decision was taken to remove the thread safety and its performance cost. To opt back in to thread safety, you can apply the Synchronize() method, which will serialize all OnNext/OnError/OnCompleted calls. This doesn't mean they will be called on the same thread, but you won't get your OnNext method called while another one is being processed.
The bad news: from memory this happened in Rx 2.0, and you are specifically asking about Rx 1.0. (I am not sure Synchronize() even exists in 1.x?)
So if you are on Rx v1, then you have this blurry certainty of what is thread safe and what isn't. I am pretty sure the Subjects are safe, but I can't be sure about the factory methods like FromEventPattern.
My recommendation is: if you need to ensure thread safety, serialize your data pipeline. The easiest way to do this is to use a single-threaded IScheduler implementation, i.e. a DispatcherScheduler or an EventLoopScheduler instance.
Some good news is that when I wrote the book on Rx it did target v1, so this section is very relevant for you: http://introtorx.com/Content/v1.0.10621.0/15_SchedulingAndThreading.html
So if your query right now looked like this:
Observable.FromEventPattern(....)
    .TakeWhile(x => x > 5)
    .Subscribe(....);
To ensure that the pipeline is serialized you can create an EventLoopScheduler (at the cost of dedicating a thread to this):
var scheduler = new EventLoopScheduler();

Observable.FromEventPattern(....)
    .ObserveOn(scheduler)
    .TakeWhile(x => x > 5)
    .Subscribe(....);
I need a single-producer, single-consumer FIFO queue because:
I need to process messages in the order they are received.
I need to do this asynchronously, because the caller should not wait while I'm processing a message.
Processing of the next message should start only when processing of the previous message has finished. Sometimes the frequency of receiving messages is higher than the frequency of processing them, but on average I should be able to process all messages; I just sometimes have to queue a batch of them.
So it's a lot like TCP/IP, I think, where you have one producer and one consumer, and SOMETIMES you can receive messages faster than you can process them, so you have to queue them. Order IS important, and the caller is absolutely not interested in what you are doing with that stuff.
This sounds easy enough, and I could probably use a plain Queue for it, but I want to use BlockingCollection because I don't want to write any code with ManualResetEvent etc.
How suitable is BlockingCollection for my task, or can you suggest something else?
The BlockingCollection<T> class wraps an IProducerConsumerCollection<T> (by default a ConcurrentQueue<T>, which gives FIFO order), so it fits your requirements perfectly.
You can create two Tasks, one for the async producer and another as the consumer worker. The former adds items to the BlockingCollection and the latter consumes them as soon as new ones are available, in FIFO order.
Producer-consumer sample application using TPL Tasks and BlockingCollection:
class ProducerConsumer
{
    private static BlockingCollection<string> queue = new BlockingCollection<string>();

    static void Main(string[] args)
    {
        Start();
    }

    public static void Start()
    {
        var producerWorker = Task.Factory.StartNew(() => RunProducer());
        var consumerWorker = Task.Factory.StartNew(() => RunConsumer());
        Task.WaitAll(producerWorker, consumerWorker);
    }

    private static void RunProducer()
    {
        int itemsCount = 100;
        while (itemsCount-- > 0)
        {
            queue.Add(itemsCount + " - " + Guid.NewGuid().ToString());
            Thread.Sleep(250);
        }
    }

    private static void RunConsumer()
    {
        foreach (var item in queue.GetConsumingEnumerable())
        {
            Console.WriteLine(DateTime.Now.ToString("HH:mm:ss.ffff") + " | " + item);
        }
    }
}
IProducerConsumerCollection: Defines methods to manipulate thread-safe collections intended for producer/consumer usage. This interface provides a unified representation for producer/consumer collections so that higher level abstractions such as System.Collections.Concurrent.BlockingCollection(Of T) can use the collection as the underlying storage mechanism.
Since it's a queue you need, why not stick to a queue? You can use a synchronized Queue (Queue.Synchronized).
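A minimal sketch, assuming the non-generic System.Collections.Queue; unlike BlockingCollection, the synchronized wrapper gives you no blocking "wait for an item" semantics:
var queue = Queue.Synchronized(new Queue()); // thread-safe wrapper

queue.Enqueue("some work"); // safe from any producer thread

// Note: this check-then-act is not atomic even on the synchronized
// wrapper; lock around it (or use BlockingCollection) in real code.
if (queue.Count > 0)
{
    var item = (string)queue.Dequeue();
}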
I've heard a bunch of podcasts recently about the TPL in .NET 4.0. Most of them describe background activities like downloading images or doing a computation, using tasks so that the work doesn't interfere with a GUI thread.
Most of the code I work on has more of a multiple-producer / single-consumer flavor, where work items from multiple sources must be queued and then processed in order. One example would be logging, where log lines from multiple threads are sequentialized into a single queue for eventual writing to a file or database. All the records from any single source must remain in order, and records from the same moment in time should be "close" to each other in the eventual output.
So multiple threads or tasks or whatever are all invoking a queuer:
lock (_queue) // or use a lock-free queue!
{
    _queue.Enqueue(some_work);
    _queueSemaphore.Release();
}
And a dedicated worker thread processes the queue:
while (_queueSemaphore.WaitOne())
{
    lock (_queue)
    {
        some_work = _queue.Dequeue();
    }
    deal_with(some_work);
}
It's always seemed reasonable to dedicate a worker thread for the consumer side of these tasks. Should I write future programs using some construct from the TPL instead? Which one? Why?
You can use a long running Task to process items from a BlockingCollection, as suggested by Wilka. Here's an example which pretty much meets your application's requirements. You'll see output something like this:
Log from task B
Log from task A
Log from task B1
Log from task D
Log from task C
Note that the outputs from A, B, C & D appear random because they depend on the start times of the threads, but B always appears before B1.
public class LogItem
{
    public string Message { get; private set; }

    public LogItem(string message)
    {
        Message = message;
    }
}

public void Example()
{
    BlockingCollection<LogItem> _queue = new BlockingCollection<LogItem>();

    // Start queue listener...
    CancellationTokenSource canceller = new CancellationTokenSource();
    Task listener = Task.Factory.StartNew(() =>
    {
        while (!canceller.Token.IsCancellationRequested)
        {
            LogItem item;
            if (_queue.TryTake(out item))
                Console.WriteLine(item.Message);
        }
    },
    canceller.Token,
    TaskCreationOptions.LongRunning,
    TaskScheduler.Default);

    // Add some log messages in parallel...
    Parallel.Invoke(
        () => { _queue.Add(new LogItem("Log from task A")); },
        () =>
        {
            _queue.Add(new LogItem("Log from task B"));
            _queue.Add(new LogItem("Log from task B1"));
        },
        () => { _queue.Add(new LogItem("Log from task C")); },
        () => { _queue.Add(new LogItem("Log from task D")); });

    // Pretend to do other things...
    Thread.Sleep(1000);

    // Shut down the listener...
    canceller.Cancel();
    listener.Wait();
}
I know this answer is about a year late, but take a look at MSDN, which shows how to create a LimitedConcurrencyLevelTaskScheduler from the TaskScheduler class. By limiting the concurrency to a single task, it will process your tasks in order as they are queued via:
LimitedConcurrencyLevelTaskScheduler lcts = new LimitedConcurrencyLevelTaskScheduler(1);
TaskFactory factory = new TaskFactory(lcts);

factory.StartNew(() =>
{
    // your code
});
I'm not sure that the TPL is adequate in your use case. From my understanding, the main use case for the TPL is to split one huge task into several smaller tasks that can be run side by side. For example, if you have a big list and you want to apply the same transformation to each element, you can have several tasks applying the transformation to subsets of the list.
The case you describe doesn't seem to fit this picture for me. In your case you don't have several tasks that do the same thing in parallel. You have several different tasks that each do their own job (the producers) and one task that consumes. Perhaps the TPL could be used for the consumer part if you want to have multiple consumers, because in that case each consumer does the same job (assuming you find a logic to enforce the temporal consistency you are looking for).
Well, this of course is just my personal view on the subject.
Live long and prosper.
It sounds like BlockingCollection would be handy for you. So for your code above, you could use something like (assuming _queue is a BlockingCollection instance):
// for your producers
_queue.Add(some_work);
A dedicated worker thread processing the queue:
foreach (var some_work in _queue.GetConsumingEnumerable())
{
    deal_with(some_work);
}
Note: when all your producers have finished producing stuff, you'll need to call CompleteAdding() on _queue, otherwise your consumer will be stuck waiting for more work.
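For example, a minimal sketch of the shutdown sequence, using the same _queue:
// Producer side, once the last item has been added:
_queue.CompleteAdding();

// The consumer's foreach over GetConsumingEnumerable() then drains
// any remaining items and exits normally instead of blocking.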