How to manage observable subscription for dependent observables? - c#

This sample console application has 2 observables. The first one pushes numbers from 1 to 100. This observable is subscribed to by the AsyncClass, which runs a long-running process for each number it receives. When each of these async operations completes, I want to 'push' the result to 2 subscribers, which would each do something with the new value.
My attempts are commented in the source code below.
AsyncClass:
class AsyncClass
{
    private readonly IConnectableObservable<int> _source;
    private readonly IDisposable _sourceDisposeObj;
    public IObservable<string> _asyncOpObservable;

    public AsyncClass(IConnectableObservable<int> source)
    {
        _source = source;
        _sourceDisposeObj = _source.Subscribe(
            ProcessArguments,
            ExceptionHandler,
            Completed
        );
        _source.Connect();
    }

    private void Completed()
    {
        Console.WriteLine("Completed");
        Console.ReadKey();
    }

    private void ExceptionHandler(Exception exp)
    {
        throw exp;
    }

    private void ProcessArguments(int evtArgs)
    {
        Console.WriteLine("Argument being processed with value: " + evtArgs);
        //_asyncOpObservable = LongRunningOperationAsync("hello").Publish();
        // not going to work either since this creates a new observable for each value from main observer
    }

    // http://rxwiki.wikidot.com/101samples
    public IObservable<string> LongRunningOperationAsync(string param)
    {
        // should not be creating an observable here, rather 'pushing' values?
        return Observable.Create<string>(
            o => Observable.ToAsync<string, string>(DoLongRunningOperation)(param).Subscribe(o)
        );
    }

    private string DoLongRunningOperation(string arg)
    {
        return "Hello";
    }
}
Main:
static void Main(string[] args)
{
    var source = Observable
        .Range(1, 100)
        .Publish();

    var asyncObj = new AsyncClass(source);
    var _asyncTaskSource = asyncObj._asyncOpObservable;

    var ui1 = new UI1(_asyncTaskSource);
    var ui2 = new UI2(_asyncTaskSource);
}
UI1 (and UI2, they're basically the same):
class UI1
{
    private IConnectableObservable<string> _asyncTaskSource;
    private IDisposable _taskSourceDisposable;

    public UI1(IConnectableObservable<string> asyncTaskSource)
    {
        _asyncTaskSource = asyncTaskSource;
        _asyncTaskSource.Connect();
        _taskSourceDisposable = _asyncTaskSource.Subscribe(RefreshUI, HandleException, Completed);
    }

    private void Completed()
    {
        Console.WriteLine("UI1: Stream completed");
    }

    private void HandleException(Exception obj)
    {
        Console.WriteLine("Exception! " + obj.Message);
    }

    private void RefreshUI(string obj)
    {
        Console.WriteLine("UI1: UI refreshing with value " + obj);
    }
}
This is my first project with Rx so let me know if I should be thinking differently. Any help would be highly appreciated!

I'm going to let you know you should be thinking differently... :) Flippancy aside, this looks like a case of a bad collision between object-oriented and functional-reactive styles.
It's not clear what the requirements are around timing of the data flow and caching of results here - the use of Publish and IConnectableObservable is a little confused. I'm going to guess you want to avoid the 2 downstream subscriptions causing the processing of a value to be duplicated? I'm basing some of my answer on that premise. Publish() can achieve this by allowing multiple subscribers to share a subscription to a single source.
Idiomatic Rx wants you to try and keep to a functional style. In order to do this, you want to present the long-running work as a function. So, instead of trying to wire your AsyncClass logic directly into the Rx chain as a class, you could present it as a function like this contrived example:
async Task<int> ProcessArgument(int argument)
{
    // perform your lengthy calculation - maybe in an OO style,
    // maybe creating class instances and invoking methods etc.
    await Task.Delay(TimeSpan.FromSeconds(1));
    return argument + 1;
}
Now, you can construct a complete Rx observable chain calling this function, and through the use of Publish().RefCount() you can avoid multiple subscribers causing duplicate effort. Note how this separates concerns too - the code processing the value is simpler because the reuse is handled elsewhere.
var query = source.SelectMany(x => ProcessArgument(x).ToObservable())
                  .Publish().RefCount();
By creating a single chain for subscribers, the work is only started when necessary, on subscription. I've used Publish().RefCount() - but if you want to ensure values aren't missed by the second and subsequent subscribers, you could use Replay (easy), or use Publish() and then Connect - but you'll want the Connect logic outside the individual subscribers' code, because you only need to call it once, after all subscribers have subscribed.
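For completeness, here is a minimal sketch of the Replay alternative mentioned above (my own illustration, not part of the original answer). Replay buffers the emitted values, so a subscriber that arrives late still receives everything published before it subscribed:

var query = source.SelectMany(x => ProcessArgument(x).ToObservable())
                  .Replay()     // buffers values so late subscribers still receive them
                  .RefCount();  // connects on the first subscription, disconnects on the last disposal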

Related

"yield return" from event handler

I have a class which takes a stream in the constructor. You can then set up callbacks for various events, and then call StartProcessing. The issue is that I want to use it from a function which should return an IEnumerable.
Example:
public class Parser
{
    public Parser(System.IO.Stream s) { /* saves stream and does some set up */ }

    public delegate void OnParsedHandler(List<string> token);
    public event OnParsedHandler OnParsedData;

    public void StartProcessing()
    {
        // reads stream and makes callback when it has a whole record
    }
}
public class Application
{
    public IEnumerable<Thing> GetThings(System.IO.Stream s)
    {
        Parser p = new Parser(s);
        p.OnParsedData += (List<string> str) =>
        {
            Thing t = new Thing(str[0]);
            // here is where I would like to yield
            // but I can't
            yield return t;
        };
        p.StartProcessing();
    }
}
Right now my solution, which isn't so great, is to put all the Things into a List that is captured by the lambda, and then iterate over them after calling StartProcessing.
public class Application
{
    public IEnumerable<Thing> GetThings(System.IO.Stream s)
    {
        Parser p = new Parser(s);
        List<Thing> thingList = new List<Thing>();
        p.OnParsedData += (List<string> str) =>
        {
            Thing t = new Thing(str[0]);
            thingList.Add(t);
        };
        p.StartProcessing();
        foreach (Thing t in thingList)
        {
            yield return t;
        }
    }
}
The issue here is that now I have to save all of the Thing objects into a list.
The problem is that you don't fundamentally have a "pull" mechanic here; you're trying to push data from the parser. If the parser is going to push data to you, rather than letting the caller pull the data, then GetThings should return an IObservable rather than an IEnumerable, so the caller can consume the data when it's ready.
If it really is important to have a pull mechanic here, then Parser shouldn't fire an event to indicate that it has new data; rather, the caller should be able to ask it for new data and have it fetch it. It should either return all of the parsed data at once, or itself return an IEnumerable.
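As a rough sketch of the push-based option (my illustration, assuming the System.Reactive package; Parser, OnParsedData and StartProcessing are the members from the question), the event can be adapted into an observable sequence:

public IObservable<Thing> GetThings(System.IO.Stream s)
{
    return Observable.Create<Thing>(observer =>
    {
        var p = new Parser(s);
        p.OnParsedData += str => observer.OnNext(new Thing(str[0]));
        p.StartProcessing();       // assumes this blocks until parsing finishes
        observer.OnCompleted();
        return Disposable.Empty;   // from System.Reactive.Disposables
    });
}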
Interesting question. I would like to build upon what @servy has said regarding push and pull. In your implementation above, you are effectively adapting a push mechanism to a pull interface.
Now, first things first. You have not specified whether the call to the StartProcessing() method is a blocking call or not. A couple of remarks regarding that:
If the method is blocking (synchronous), then there is really no point in adapting it to a pull model anyway. The caller will see all the data processed in a single blocking call.
In that regard, receiving the data indirectly via an event handler scatters what should otherwise be a single, cohesive, explicit operation across two seemingly unrelated constructs. For example:
void ProcessAll(Action<Thing> callback);
On the other hand, if the StartProcessing() method actually spawns a new thread (it might be better named BeginProcessing(), following the Event-based Asynchronous Pattern or another async processing pattern), you could adapt it to a pull mechanism by means of a synchronization construct using a wait handle: ManualResetEvent, a mutex and the like. Pseudo-code:
public IEnumerable<Thing> GetThings(System.IO.Stream s)
{
    var parser = new Parser(s);
    var waitable = new AutoResetEvent(false);
    Thing item = null;
    parser.OnParsedData += (List<string> str) =>
    {
        item = new Thing(str[0]);
        waitable.Set();
    };
    IAsyncResult result = parser.BeginProcessing();
    while (!result.IsCompleted)
    {
        waitable.WaitOne();
        yield return item;
    }
}
Disclaimer
The above code serves only as a means for presenting an idea. It is not thread-safe and the synchronization mechanics do not work properly. See the producer-consumer pattern for more information.
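For reference, here is a minimal, safer sketch of the same adaptation (an editorial addition, not the answerer's code). It uses BlockingCollection to handle the producer-consumer synchronization and assumes StartProcessing blocks until parsing is finished:

public IEnumerable<Thing> GetThings(System.IO.Stream s)
{
    var parser = new Parser(s);
    var queue = new BlockingCollection<Thing>();
    parser.OnParsedData += (List<string> str) => queue.Add(new Thing(str[0]));
    // Run the blocking parse on another thread and close the queue when it finishes,
    // so GetConsumingEnumerable ends instead of waiting forever.
    Task.Run(() =>
    {
        try { parser.StartProcessing(); }
        finally { queue.CompleteAdding(); }
    });
    return queue.GetConsumingEnumerable();
}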

How to return data before a method completes execution?

I have a slow and expensive method that returns some data:
public Data GetData() { ... }
I don't want to wait for this method to finish executing. Instead, I want to return cached data immediately.
I have a class CachedData that contains one property, Data cachedData.
So I want to create another method, public CachedData GetCachedData(), that will start a new task (calling GetData inside it) and immediately return the cached data; after the task finishes, we update the cache.
GetCachedData() needs to be thread-safe because multiple requests will call this method.
Every minute I will do a lightweight "has anything changed?" ping, and if it returns true (cachedData != currentData) I will call GetCachedData().
I'm new to C#. Please help me implement this method.
I'm using .NET Framework 4.5.2.
The basic idea is clear:
You have a Data property which is a wrapper around an expensive function call.
In order to have some response immediately the property holds a cached value and performs updating in the background.
No need for an event when the updater is done because you poll, for now.
That seems like a straight-forward design. At some point you may want to use events, but that can be added later.
Depending on the circumstances it may be necessary to make access to the property thread-safe. I think that if the Data cache is a simple reference and no other data is updated together with it, a lock is not necessary, but you may want to declare the reference volatile so that the reading thread does not rely on a stale cached (ha!) version. This post seems to have good links which discuss the issues.
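A minimal sketch of that idea (my own illustration; the class and member names are placeholders, and GetData stands for the slow call from the question), assuming the Data instance is not mutated after it is published:

public class DataCache
{
    private volatile Data _cached;        // volatile: readers always see the latest published reference

    public Data Current
    {
        get { return _cached; }           // returns immediately, possibly stale
    }

    public void BeginRefresh()
    {
        // refresh in the background; swapping the reference is atomic
        Task.Run(() => { _cached = GetData(); });
    }

    private Data GetData() { /* the slow, expensive call from the question */ return new Data(); }
}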
If you will not call GetCachedData from multiple threads at the same time, you may not need the lock. If the data is null (which is certainly true on the first run), we wait for the long-running method to finish its work.
public class SlowClass
{
    private static readonly object _lock = new object();
    private static Data _cachedData;

    public void GetCachedData()
    {
        var task = new Task(DoStuffLongRun);
        task.Start();
        if (_cachedData == null)
            task.Wait();
    }

    public Data GetData()
    {
        if (_cachedData == null)
            GetCachedData();
        return _cachedData;
    }

    private void DoStuffLongRun()
    {
        lock (_lock)
        {
            Console.WriteLine("Locked Entered");
            Thread.Sleep(5000); // Do Long Stuff
            _cachedData = new Data();
        }
    }
}
I have tested it in a console application.
static void Main(string[] args)
{
    var mySlow = new SlowClass();
    var mySlow2 = new SlowClass();
    mySlow.GetCachedData();
    for (int i = 0; i < 5; i++)
    {
        Console.WriteLine(i);
        mySlow.GetData();
        mySlow2.GetData();
    }
    mySlow.GetCachedData();
    Console.Read();
}
Maybe you can use the MemoryCache class, as explained in MSDN.
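For illustration, a minimal sketch of that approach (my own example; the cache key and expiration are arbitrary placeholders) using System.Runtime.Caching.MemoryCache. Note that this version still waits for GetData on a cache miss:

public Data GetDataCached()
{
    var cache = MemoryCache.Default;
    var cached = cache.Get("data") as Data;
    if (cached != null)
        return cached;                    // serve the cached copy immediately

    var fresh = GetData();                // the slow call from the question
    cache.Set("data", fresh, new CacheItemPolicy { AbsoluteExpiration = DateTimeOffset.Now.AddMinutes(1) });
    return fresh;
}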

What is the best practice for implementing an Rx handler?

I have this class for explaining my problem:
public class DataObserver : IDisposable
{
    private readonly List<IDisposable> _subscriptions = new List<IDisposable>();
    private readonly SomeBusinessLogicServer _server;

    public DataObserver(SomeBusinessLogicServer server, IObservable<SomeData> data)
    {
        _server = server;
        _subscriptions.Add(data.Subscribe(TryHandle));
    }

    private void TryHandle(SomeData data)
    {
        try
        {
            _server.MakeApiCallAsync(data).Wait();
        }
        catch (Exception)
        {
            // Handle exceptions somehow!
        }
    }

    public void Dispose()
    {
        _subscriptions.ForEach(s => s.Dispose());
        _subscriptions.Clear();
    }
}
A) How can I avoid blocking inside the TryHandle() function?
B) How would you publish exceptions caught inside that function for handling them properly?
The Rx Design Guidelines provide a lot of useful advice when writing your own Rx operators:
http://go.microsoft.com/fwlink/?LinkID=205219
I'm sure I'll get lambasted for linking to an external article, but this link has been good for a couple of years and it's too big to republish on SO.
First, take a look at CompositeDisposable instead of re-implementing it yourself.
Other than that, there are many answers to your question. The best insight I've had when working with Rx is realizing that most cases where you want to subscribe are really just more links in the chain of the observable you are building: you don't really want to subscribe, but instead want to apply yet another transform to the incoming observable, and let some code further "on the edge of the system", with more knowledge of how to handle errors, do the actual subscribing.
In the example you have presented:
A) Don't block by just transforming the IObservable<SomeData> into an IObservable<Task> (which is really better expressed as an IObservable<IObservable<Unit>>).
B) Publish exceptions by just ending the observable with an error or, if you don't want the exception to end the observable, exposing an IObservable<Exception>.
Here's how I'd re-write your example, assuming you did not want the stream to end on error, but instead just keep running after reporting the errors:
public static class DataObserver
{
    public static IObservable<Exception> ApplyLogic(this IObservable<SomeData> source, SomeBusinessLogicServer server)
    {
        return source
            .Select(data =>
            {
                // execute the async method as an observable<Unit>
                // ignore its results, but capture its error (if any) and yield it.
                return Observable
                    .FromAsync(() => server.MakeApiCallAsync(data))
                    .IgnoreElements()
                    .Select(_ => (Exception)null) // to cast from IObservable<Unit> to IObservable<Exception>
                    .Catch((Exception e) => Observable.Return(e));
            })
            // runs the Api calls sequentially (so they will not run concurrently)
            // If you prefer to let the calls run in parallel, then use
            // .Merge() instead of .Concat()
            .Concat();
    }
}
// Usage (in Main() perhaps)
IObservable<SomeData> dataStream = ...;
var subscription = dataStream.ApplyLogic(server).Subscribe(
    error =>
    {
        Console.WriteLine("An error occurred processing a dataItem: {0}", error);
    },
    fatalError =>
    {
        Console.WriteLine("A fatal error occurred retrieving data from the dataStream: {0}", fatalError);
    });

Threading and asynchronous operations in C#

I'm an old dog trying to learn a new trick. I'm extremely familiar with a language called PowerBuilder and in that language, when you want to do things asynchronously, you spawn an object in a new thread. I'll reiterate that: the entire object is instantiated in a separate thread and has a different execution context. Any and all methods on that object execute in the context of that separate thread.
Well now, I'm trying to implement some asynchronous execution using C#, and the threading model in .NET feels completely different to me. It looks like I'm instantiating objects in one thread, but that I can specify (on a call-by-call basis) that certain methods execute in a different thread.
The difference seems subtle, but it's frustrating me. My old-school thinking says, "I have a helper named Bob. Bob goes off and does stuff." The new-school thinking, if I understand it right, is "I am Bob. If I need to, I can sometimes rub my belly and pat my head at the same time."
My real-world coding problem: I'm writing an interface engine that accepts messages via TCP, parses them into usable data, then puts that data into a database. "Parsing" a message takes approximately one second. Depending on the parsed data, the database operation may take less than a second or it might take ten seconds. (All times made up to clarify the problem.)
My old-school thinking tells me that my database class should live in a separate thread and have something like a ConcurrentQueue. It would simply spin on that queue, processing anything that might be in there. The Parser, on the other hand, would need to push messages into that queue. These messages would be (delegates?) things like "Create an order based on the data in this object" or "Update an order based on the data in this object". It might be worth noting that I actually want to process the "messages" in the "queue" in a strict, single-threaded FIFO order.
Basically, my database connection can't always keep up with my parser. I need a way to make sure my parser doesn't slow down while my database processes try to catch up. Advice?
-- edit: with code!
Everyone and everything is telling me to use BlockingCollection. So here's a brief explanation of the end goal and code to go with it:
This will be a Windows service. When started, it will spawn multiple "environments", with each "environment" containing one "dbworker" and one "interface". The "interface" will have one "parser" and one "listener".
class cEnvironment {
    private cDBWorker MyDatabase;
    private cInterface MyInterface;

    public void OnStart () {
        MyDatabase = new cDBWorker ();
        MyInterface = new cInterface ();
        MyInterface.OrderReceived += this.InterfaceOrderReceivedEventHandler;
        MyDatabase.OnStart ();
        MyInterface.OnStart ();
    }

    public void OnStop () {
        MyInterface.OnStop ();
        MyDatabase.OnStop ();
        MyInterface.OrderReceived -= this.InterfaceOrderReceivedEventHandler;
    }

    void InterfaceOrderReceivedEventHandler (object sender, OrderReceivedEventArgs e) {
        MyDatabase.OrderQueue.Add (e.Order);
    }
}
class cDBWorker {
    public BlockingCollection<cOrder> OrderQueue = new BlockingCollection<cOrder> ();
    private Task ProcessingTask;

    public void OnStart () {
        ProcessingTask = Task.Factory.StartNew (() => Process (), TaskCreationOptions.LongRunning);
    }

    public void OnStop () {
        OrderQueue.CompleteAdding ();
        ProcessingTask.Wait ();
    }

    public void Process () {
        foreach (cOrder Order in OrderQueue.GetConsumingEnumerable ()) {
            switch (Order.OrderType) {
                case 1:
                    SuperFastMethod (Order);
                    break;
                case 2:
                    ReallySlowMethod (Order);
                    break;
            }
        }
    }

    public void SuperFastMethod (cOrder Order) {
    }

    public void ReallySlowMethod (cOrder Order) {
    }
}
class cInterface {
    protected cListener MyListener;
    protected cParser MyParser;

    public void OnStart () {
        MyListener = new cListener ();
        MyParser = new cParser ();
        MyListener.DataReceived += this.ListenerDataReceivedHandler;
        MyListener.OnStart ();
    }

    public void OnStop () {
        MyListener.OnStop ();
        MyListener.DataReceived -= this.ListenerDataReceivedHandler;
    }

    public event OrderReceivedEventHandler OrderReceived;

    protected virtual void OnOrderReceived (OrderReceivedEventArgs e) {
        if (OrderReceived != null)
            OrderReceived (this, e);
    }

    void ListenerDataReceivedHandler (object sender, DataReceivedEventArgs e) {
        foreach (string Message in MyParser.GetMessages (e.RawData)) {
            OnOrderReceived (new OrderReceivedEventArgs (MyParser.ParseMessage (Message)));
        }
    }
}
It compiles. (SHIP IT!) But does that mean that I'm doing it right?
BlockingCollection makes putting this kind of thing together pretty easy:
// the queue
private BlockingCollection<Message> MessagesQueue = new BlockingCollection<Message>();

// the consumer
private void MessageParser()
{
    foreach (var msg in MessagesQueue.GetConsumingEnumerable())
    {
        var parsedMessage = ParseMessage(msg);
        // do something with the parsed message
    }
}

// In your main program
// start the consumer
var consumer = Task.Factory.StartNew(() => MessageParser(),
    TaskCreationOptions.LongRunning);

// the main loop
while (messageAvailable)
{
    var msg = GetMessageFromTcp();
    // add it to the queue
    MessagesQueue.Add(msg);
}

// done receiving messages
// tell the consumer that no more messages will be added
MessagesQueue.CompleteAdding();

// wait for consumer to finish
consumer.Wait();
The consumer does a non-busy wait on the queue, so it's not eating CPU resources when there's nothing available.
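One related detail worth knowing (an editorial note, not part of the answer above): BlockingCollection can also be constructed with a bounded capacity, in which case Add blocks once the limit is reached. That applies back-pressure to the producer instead of letting the queue grow without bound, which is a trade-off against the goal of never slowing the parser down:

// a bounded queue: Add blocks when 10,000 items are waiting (the capacity is an arbitrary example)
private BlockingCollection<Message> MessagesQueue = new BlockingCollection<Message>(10000);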

Multiple publishers sending concurrent messages to a single subscriber in Retlang?

I need to build an application where some number of instances of an object are generating "pulses", concurrently. (Essentially this just means that they are incrementing a counter.) I also need to track the total counters for each object. Also, whenever I perform a read on a counter, it needs to be reset to zero.
So I was talking to a guy at work, and he mentioned Retlang and message-based concurrency, which sounded super interesting. But obviously I am very new to the concept. So I've built a small prototype, and I get the expected results, which is awesome - but I'm not sure if I've potentially made some logical errors and left the software open to bugs, due to my inexperience with Retlang and concurrent programming in general.
First off, I have these classes:
public class Plc {
    private readonly IChannel<Pulse> _channel;
    private readonly IFiber _fiber;
    private readonly int _pulseInterval;
    private readonly int _plcId;

    public Plc(IChannel<Pulse> channel, int plcId, int pulseInterval) {
        _channel = channel;
        _pulseInterval = pulseInterval;
        _fiber = new PoolFiber();
        _plcId = plcId;
    }

    public void Start() {
        _fiber.Start();
        // Not sure if it's safe to pass in a delegate which will run in an infinite loop...
        // AND use a shared channel object...
        _fiber.Enqueue(() => {
            SendPulse();
        });
    }

    private void SendPulse() {
        while (true) {
            // Not sure if it's safe to use the same channel object in different
            // IFibers...
            _channel.Publish(new Pulse() { PlcId = _plcId });
            Thread.Sleep(_pulseInterval);
        }
    }
}
public class Pulse {
    public int PlcId { get; set; }
}
The idea here is that I can instantiate multiple Plcs, pass each one the same IChannel, and then have them execute the SendPulse function concurrently, which would allow each one to publish to the same channel. But as you can see from my comments, I'm a little skeptical that what I'm doing is actually legit. I'm mostly worried about using the same IChannel object to Publish in the context of different IFibers, but I'm also worried about never returning from the delegate that was passed to Enqueue. I'm hoping someone can provide some insight as to how I should be handling this.
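As an aside, one way to avoid parking a fiber in an infinite loop is to let the fiber schedule the publish periodically instead. This is only a sketch based on my understanding of Retlang's scheduling support (ScheduleOnInterval on the fiber); check it against the version you are using:

public void Start() {
    _fiber.Start();
    // publish a pulse after _pulseInterval ms, and then every _pulseInterval ms thereafter,
    // on the fiber's own execution context - no thread stuck in a while(true) loop
    _fiber.ScheduleOnInterval(
        () => _channel.Publish(new Pulse() { PlcId = _plcId }),
        _pulseInterval,
        _pulseInterval);
}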
Also, here is the "subscriber" class:
public class PulseReceiver {
    private int[] _pulseTotals;
    private readonly IFiber _fiber;
    private readonly IChannel<Pulse> _channel;
    private object _pulseTotalsLock;

    public PulseReceiver(IChannel<Pulse> channel, int numberOfPlcs) {
        _pulseTotals = new int[numberOfPlcs];
        _channel = channel;
        _fiber = new PoolFiber();
        _pulseTotalsLock = new object();
    }

    public void Start() {
        _fiber.Start();
        _channel.Subscribe(_fiber, this.UpdatePulseTotals);
    }

    private void UpdatePulseTotals(Pulse pulse) {
        // This occurs in the execution context of the IFiber.
        // If we were just dealing with the published Pulses from the channel, I think
        // we wouldn't need the lock, since I THINK the published messages would be taken
        // from a queue (i.e. each Plc is publishing concurrently, but Retlang enqueues
        // the messages).
        lock (_pulseTotalsLock) {
            _pulseTotals[pulse.PlcId - 1]++;
        }
    }

    public int GetTotalForPlc(int plcId) {
        // However, this access takes place in the application thread, not in the IFiber,
        // and I think there could potentially be a race condition here. I.e. the array
        // is being updated from the IFiber, but I think I'm reading from it and resetting values
        // concurrently in a different thread.
        lock (_pulseTotalsLock) {
            if (plcId <= _pulseTotals.Length) {
                int currentTotal = _pulseTotals[plcId - 1];
                _pulseTotals[plcId - 1] = 0;
                return currentTotal;
            }
        }
        return -1;
    }
}
So here, I am reusing the same IChannel that was given to the Plc instances, but having a different IFiber subscribe to it. Ideally then I could receive the messages from each Plc, and update a single private field within my class, but in a thread safe way.
From what I understand (and I mentioned in my comments), I think that I would be safe to simply update the _pulseTotals array in the delegate which I gave to the Subscribe function, because I would receive each message from the Plcs serially.
However, I'm not sure how best to handle the bit where I need to read the totals and reset them. As you can see from the code and comments, I ended up wrapping a lock around any access to the _pulseTotals array. But I'm not sure if this is necessary, and I would love to know a) if it is in fact necessary to do this, and why, or b) the correct way to implement something similar.
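As an illustration of one lock-free alternative for this read-and-reset concern (an editorial sketch, not code from the original post), System.Threading.Interlocked can make both the per-slot increment and the swap-to-zero atomic:

private void UpdatePulseTotals(Pulse pulse) {
    Interlocked.Increment(ref _pulseTotals[pulse.PlcId - 1]);
}

public int GetTotalForPlc(int plcId) {
    if (plcId > _pulseTotals.Length) return -1;
    // atomically read the current total and reset the slot to zero
    return Interlocked.Exchange(ref _pulseTotals[plcId - 1], 0);
}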
And finally for good measure, here's my main function:
static void Main(string[] args) {
    Channel<Pulse> pulseChannel = new Channel<Pulse>();

    PulseReceiver pulseReceiver = new PulseReceiver(pulseChannel, 3);
    pulseReceiver.Start();

    List<Plc> plcs = new List<Plc>() {
        new Plc(pulseChannel, 1, 500),
        new Plc(pulseChannel, 2, 250),
        new Plc(pulseChannel, 3, 1000)
    };
    plcs.ForEach(plc => plc.Start());

    while (true) {
        Thread.Sleep(10000);
        Console.WriteLine(string.Format("Plc 1: {0}\nPlc 2: {1}\nPlc 3: {2}\n",
            pulseReceiver.GetTotalForPlc(1),
            pulseReceiver.GetTotalForPlc(2),
            pulseReceiver.GetTotalForPlc(3)));
    }
}
I instantiate one single IChannel, pass it to everything, where internally the Receiver subscribes with an IFiber, and where the Plcs use IFibers to "enqueue" a non-returning method which continually publishes to the channel.
Again, the console output looks exactly like I would expect it to look, i.e. I see 20 "pulses" for Plc 1 after waiting 10 seconds. And the resetting of the counters after a read also seems to work, i.e. Plc 1 has 20 "pulses" after each 10 second increment. But that doesn't reassure me that I haven't overlooked something important.
I'm really excited to learn a bit more about Retlang and concurrent programming techniques, so hopefully someone has the time to sift through my code and offer some suggestions for my specific concerns, or even a different design based on my requirements!
