Is a Subject in RX.Net always harmful?


I was talking to a colleague who pointed me to the SO question about subjects being considered harmful. However, I have two cases where I have some non-deterministic code that does not seem reasonable any other way.
Non-standard event:
void Handler(MyClass sender, ResultClass result)
{
    subject.OnNext(result);
}
public delegate void _handler(
    [MarshalAs(UnmanagedType.Interface), In] MyClass sender,
    [MarshalAs(UnmanagedType.Interface), In] ResultClass result);
Parallel tasks (a non-deterministic number of tasks all running in parallel, starting at different times):
Task.Factory.StartNew(() => ...).ContinueWith(prevTask => subject.OnNext(prevTask.Result));
The subject is not exposed directly, only as an observable. Is there another suggested route that isn't a ton of boilerplate?

Subjects are not always harmful. There are many legitimate uses of them, even within Rx itself. However, many times when a person reaches for a Subject, there is already a robust Rx method written for that scenario (and it may or may not be using subjects internally). This is the case for your two examples. Look at Task.ToObservable and Observable.FromEventPattern.
Another common case where subjects are misused is when a developer breaks a stream in two. They become convinced they need to subscribe to a stream and, in the callback, produce data for a new stream. They do this with a Subject. But usually they should just have used Select instead.
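To make the "breaking a stream in two" point concrete, here is a minimal sketch contrasting the two approaches. The class and method names are made up for illustration, and it assumes the System.Reactive package is referenced.

```csharp
using System;
using System.Reactive.Linq;
using System.Reactive.Subjects;

public static class SelectInsteadOfSubject
{
    // Anti-pattern: subscribe to the source and re-publish through a Subject.
    // The inner subscription is created eagerly, is never disposed, and
    // errors and completion from the source are not forwarded.
    public static IObservable<string> ViaSubject(IObservable<int> source)
    {
        var subject = new Subject<string>();
        source.Subscribe(x => subject.OnNext("Value " + x));
        return subject;
    }

    // Preferred: the same projection expressed with Select. It stays lazy,
    // and errors, completion, and unsubscription flow through unchanged.
    public static IObservable<string> ViaSelect(IObservable<int> source)
    {
        return source.Select(x => "Value " + x);
    }
}
```

Both methods produce the same values for a well-behaved source; the difference shows up in subscription lifetime and error handling.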

Observable.FromEvent
Observable.FromEvent works for more than just built-in event types: you just need to use the correct overload.
class Program
{
private static event Action<int> MyEvent;
public static void Main(string[] args)
{
Observable.FromEvent<int>(
(handler) => Program.MyEvent += handler,
(handler) => Program.MyEvent -= handler
)
.Subscribe(Console.WriteLine);
Program.MyEvent(5);
Console.ReadLine();
}
}
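For a delegate that takes more than one argument, like the one in the question, the overload with a conversion function can be used. The following is only a sketch: ResultHandler, Source, and Raise are hypothetical stand-ins for the question's types, and it assumes the System.Reactive package is referenced.

```csharp
using System;
using System.Reactive.Linq;

// Hypothetical two-argument delegate, loosely modelled on the question's handler.
public delegate void ResultHandler(object sender, string result);

// Hypothetical event source that exposes the non-standard event.
public class Source
{
    public event ResultHandler Completed;
    public void Raise(string result) => Completed?.Invoke(this, result);
}

public static class FromEventSketch
{
    public static IObservable<string> Results(Source source)
    {
        return Observable.FromEvent<ResultHandler, string>(
            // Conversion: adapt Rx's Action<string> to the event's delegate shape.
            onNext => (sender, result) => onNext(result),
            handler => source.Completed += handler,
            handler => source.Completed -= handler);
    }
}
```

Subscribing attaches the handler and disposing the subscription detaches it, so no Subject is needed to bridge the event.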
Task.ToObservable & Merge
If you already have access to all of your tasks, you can convert them to Observables, and Merge them into a single observable.
class Program
{
public static void Main(string[] args)
{
Observable.Merge(
// Async / Await
(
(Func<Task<string>>)
(async () => { await Task.Delay(250); return "async await"; })
)().ToObservable(),
// FromResult
Task.FromResult("FromResult").ToObservable(),
// Run
Task.Run(() => "Run").ToObservable()
)
.Subscribe(Console.WriteLine);
Console.ReadLine();
}
}
Merge Observable
Alternatively, if you do not have all of your tasks up front, you can still use Merge, but you'll need some way of communicating future tasks. In this case, I've used a subject, but you should use the simplest Observable possible to express this. If that's a subject, then by all means, use a subject.
class Program
{
public static void Main(string[] args)
{
// We use a subject here since we don't have all of the tasks yet.
var tasks = new Subject<Task<string>>();
// Make up some tasks.
var fromResult = Task.FromResult("FromResult");
var run = Task.Run(() => "Run");
Func<Task<string>> asyncAwait = async () => {
await Task.Delay(250);
return "async await";
};
// Merge any future Tasks into an observable, and subscribe.
tasks.Merge().Subscribe(Console.WriteLine);
// Send tasks.
tasks.OnNext(fromResult);
tasks.OnNext(run);
tasks.OnNext(asyncAwait());
Console.ReadLine();
}
}
Subjects
Why to use or not to use Subjects is a question I don't have the time to answer adequately. Typically speaking, however, I find that using a Subject tends to be the "easy way out" when it appears an operator does not already exist.
If you can somehow limit the exposure of a subject in terms of its visibility to the rest of the application, then by all means use a subject. If you're looking for message bus functionality, however, you should rethink the design of the application, as message buses are anti-patterns.

Subjects aren't harmful. That is probably even a little too dogmatic for me (and I am the first to pooh-pooh the use of subjects). I would say that subjects indicate a code smell. You probably could be doing it better without them, but if you keep them encapsulated within your class then at least you keep the smell in one place.
Here I would say that you are already using "non-standard" event patterns, and it seems you don't want to, or can't, change that. In this case, it seems the usage of subjects as a bridge isn't going to make it any worse than it is.
If you were starting from scratch, then I would suggest that you think deeply about your design, and you will probably find that you don't need a subject.
Lastly, I agree with the other comments that you should be using FromEvent and Task.ToObservable, but you suggest these do not work. Why? I don't think you have provided nearly enough of your code base to help with design questions like this, e.g. how are these nondeterministic tasks being created, and by what? What is the actual problem you are trying to solve? If you could provide a full example, you might get the amount of attention you are looking for.

Here is what a good book about Rx says regarding why and when a Subject can be harmful:
http://www.introtorx.com/Content/v1.0.10621.0/18_UsageGuidelines.html
"Avoid the use of the subject types. Rx is effectively a functional
programming paradigm. Using subjects means we are now managing state,
which is potentially mutating. Dealing with both mutating state and
asynchronous programming at the same time is very hard to get right.
Furthermore, many of the operators (extension methods) have been
carefully written to ensure that correct and consistent lifetime of
subscriptions and sequences is maintained; when you introduce
subjects, you can break this. Future releases may also see significant
performance degradation if you explicitly use subjects."

Related

Chaining arbitrary number of tasks together in C#.NET

What I have
I have a set of asynchronous processing methods, similar to:
public class AsyncProcessor<T>
{
//...rest of members, etc.
public Task Process(T input)
{
//Some special processing, most likely inside a Task, so
//maybe spawn a new Task, etc.
Task task = Task.Run(/* maybe private method that does the processing*/);
return task;
}
}
What I want
I would like to chain them all together, to execute in sequential order.
What I tried
I have tried to do the following:
public class CompositeAsyncProcessor<T>
{
private readonly IEnumerable<AsyncProcessor<T>> m_processors;
//Constructor receives the IEnumerable<AsyncProcessor<T>> and
//stores it in the field above.
public Task ProcessInput(T input)
{
Task chainedTask = Task.CompletedTask;
foreach (AsyncProcessor<T> processor in m_processors)
{
chainedTask = chainedTask.ContinueWith(t => processor.Process(input));
}
return chainedTask;
}
}
What went wrong
However, tasks do not run in order because, from what I have understood, inside the call to ContinueWith, the processor.Process(input) call is performed immediately and the method returns independently of the status of the returned task. Therefore, all processing Tasks still begin almost simultaneously.
My question
My question is whether there is something elegant that I can do to chain the tasks in order (i.e. without execution overlap). Could I achieve this using the following statement, (I am struggling a bit with the details), for example?
chainedTask = chainedTask.ContinueWith(async t => await processor.Process(input));
Also, how would I do this without using async/await, only ContinueWith?
Why would I want to do this?
Because my Processor objects have access to, and request things from "thread-unsafe" resources. Also, I cannot just await all the methods because I have no idea about how many they are, so I cannot just write down the necessary lines of code.
What do I mean by thread-unsafe? A specific problem
Because I may be using the term incorrectly, an illustration is a bit better to explain this bit. Among the "resources" used by my Processor objects, all of them have access to an object such as the following:
public interface IRepository
{
void Add(object obj);
bool Remove(object obj);
IEnumerable<object> Items { get; }
}
The implementation currently used is relatively naive. So some Processor objects add things, while others retrieve the Items for inspection. Naturally, one of the exceptions I get all too often is:
InvalidOperationException: Collection was modified, enumeration
operation may not execute.
I could spend some time locking access and pre-running the enumerations. However, this was the second option I would get down to, while my first thought was to just make the processes run sequentially.
Why must I use Tasks?
While I have full control in this case, I could say that for the purposes of the question, I might not be able to change the base implementation, so what would happen if I were stuck with Tasks? Furthermore, the operations actually do represent relatively time-consuming CPU-bound operations plus I am trying to achieve a responsive user interface so I needed to unload some burden to asynchronous operations. While being useful and, in most of my use-cases, not having the necessity to chain multiple of them, rather a single one each time (or a couple, but always specific and of a specific count, so I was able to hook them together without iterations and async/await), one of the use-cases finally necessitated chaining an unknown number of Tasks together.
How I deal with this currently
The way I am dealing with this currently is to append a call to Wait() inside the ContinueWith call, i.e.:
foreach (AsyncProcessor<T> processor in m_processors)
{
chainedTask = chainedTask.ContinueWith(t => processor.Process(input).Wait());
}
I would appreciate any idea on how I should do this, or how I could do it more elegantly (or, "async-properly", so to speak). Also, I would like to know how I can do this without async/await.
Why my question is different from this question, which did not answer my question entirely.
Because the linked question has two tasks, so the solution is to simply write the two lines required, while I have an arbitrary (and unknown) number of tasks, so I need a suitable iteration. Also, my method is not async. I now understand (from the single briefly available answer, which was deleted) that I could do it fairly easily if I changed my method to async and awaited each processor's Task method, but I still wish to know how this could be achieved without async/await syntax.
Why my question is not a duplicate of the other linked questions
Because none of them explains how to chain correctly using ContinueWith and I am interested in a solution that utilizes ContinueWith and does not make use of the async/await pattern. I know this pattern may be the preferable solution, I want to understand how to (if possible) make arbitrary chaining using ContinueWith calls properly. I now know I don't need ContinueWith. The question is, how do I do it with ContinueWith?

foreach + await will run the processors sequentially.
public async Task ProcessInputAsync(T input)
{
    foreach (var processor in m_processors)
    {
        // Each call finishes before the next processor starts.
        await processor.Process(input);
    }
}
By the way, Process should be called ProcessAsync.

The method Task.ContinueWith does not understand async delegates the way Task.Run does, so when you return a Task it treats it as a normal return value and wraps it in another Task. So you end up receiving a Task<Task> instead of what you expected to get. The problem would be obvious if AsyncProcessor.Process returned a generic Task<T>. In that case you would get a compile error because of the illegal conversion from Task<Task<T>> to Task<T>. In your case you convert from Task<Task> to Task, which is legal, since Task<TResult> derives from Task.
Solving the problem is easy. You just need to unwrap the Task<Task> to a simple Task, and there is a built-in method Unwrap that does exactly that.
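The difference between the wrapped and the unwrapped task can be seen in isolation with a small sketch. UnwrapDemo and the TaskCompletionSource used as a "gate" are illustrative assumptions, not part of the question's code.

```csharp
using System;
using System.Threading.Tasks;

public static class UnwrapDemo
{
    public static async Task<(bool outerCompleted, bool proxyCompletedEarly)> RunAsync()
    {
        // The inner task stays pending until we release the gate.
        var gate = new TaskCompletionSource<bool>();

        // ContinueWith wraps the returned inner task: the result is Task<Task>.
        Task<Task> wrapped = Task.CompletedTask.ContinueWith(_ => (Task)gate.Task);
        // Unwrap returns a proxy that completes only when the inner task does.
        Task unwrapped = wrapped.Unwrap();

        await wrapped; // completes as soon as the delegate has returned
        bool outerCompleted = wrapped.IsCompleted;        // the outer task is done
        bool proxyCompletedEarly = unwrapped.IsCompleted; // the proxy is still waiting

        gate.SetResult(true);
        await unwrapped; // now the inner task, and therefore the proxy, completes
        return (outerCompleted, proxyCompletedEarly);
    }
}
```

Chaining on the outer Task<Task> is what lets the next processor start too early; chaining on the unwrapped proxy waits for the inner work.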
There is another problem that you need to solve, though. Currently your code suppresses all exceptions that may occur in each individual AsyncProcessor.Process call, which I don't think was intended. So you must decide which strategy to follow in this case. Are you going to propagate the first exception immediately, or would you prefer to collect them all and propagate them at the end bundled in an AggregateException, as Task.WhenAll does? The example below implements the first strategy.
public class CompositeAsyncProcessor<T>
{
//...
public Task Process(T input)
{
Task current = Task.CompletedTask;
foreach (AsyncProcessor<T> processor in m_processors)
{
current = current.ContinueWith(antecessor =>
{
if (antecessor.IsFaulted)
return Task.FromException<T>(antecessor.Exception.InnerException);
return processor.Process(input);
},
CancellationToken.None,
TaskContinuationOptions.ExecuteSynchronously,
TaskScheduler.Default
).Unwrap();
}
return current;
}
}
I have used an overload of ContinueWith that allows configuring all the options, because the defaults are not ideal. The default TaskContinuationOptions is None. By configuring it to ExecuteSynchronously you minimize thread switches, since each continuation will run on the same thread that completed the previous one.
The default task scheduler is TaskScheduler.Current. By specifying TaskScheduler.Default you make it explicit that you want the continuations to run in thread-pool threads (for some exceptional cases that won't be able to run synchronously). The TaskScheduler.Current is context specific, and if it ever surprises you it won't be in a good way.
As you can see, there are a lot of gotchas with the old-school ContinueWith approach. Using the modern await in a loop is a lot easier to implement, and a lot more difficult to get wrong.

Using async await pattern from static method when concurrency isn't needed

Is there a benefit to using the async/await pattern when you are running things in a synchronous manner?
For instance in my app I have static methods that are called by cron (hangfire) to do various IO bound tasks. A simple contrived example is this:
static void Run(string[] args)
{
var data = Test(args);
//..do stuff with returned data
}
public static List<string> Test(string[] args)
{
return Db.Select(args);
}
Is there any advantage to writing this code like so:
static void Run(string[] args)
{
var dataTask = TestAsync(args);
dataTask.Wait();
//..do stuff with returned data
}
public static async Task<List<string>> TestAsync(string[] args)
{
return await Db.SelectAsync(args);
}
My colleague tells me that I should always use this pattern and use async methods if they are available as it adds under the hood optimization but he is unable to explain why that is and I can't really find any clear cut explanation.
If I write my code with this type of pattern in my static methods it ends up looking like:
var data = someMethod();
data.Wait();
var data2 = someOtherMethod(data);
data2.Wait();
I understand using async await pattern when firing up lots of concurrent tasks but when the code originates from a static method and has to run in order like this is there any benefit at all? Which way should I write it?

as it adds under the hood optimization but he is unable to explain why that is
It is amazing to me how many people believe that async is always the best choice yet they cannot say why. This is a big misunderstanding in the community. Unfortunately, Microsoft is kind of pushing this notion. I believe they are doing this to simplify guidance.
Async IO helps with two things: 1) save threads 2) make GUI apps easier by doing away with thread management.
Most applications are totally unconstrained by the number of threads that are running. For those apps, async IO adds zero throughput, it costs additional CPU and complicates the code. I know, because I have measured throughput and scalability. I have worked on many applications.
In particular, the IO itself does not become faster. The only thing that changes is the way the call is initiated and completed. There are no IO optimizations here whatsoever.
Use async when it is either convenient to you or you have evidence that the number of running threads will be a problem. Do not use it by default because productivity will be lower. There are additional ways to add bugs. Tooling is worse, the code is longer, debugging and profiling is harder.
Of course, there is nothing wrong with using it if it is the right tool for the job.

Improving performance of Parallel.For in C# with more methods

Recently I've stumbled upon a Parallel.For loop that performs way better than a regular for loop for my purposes.
This is how I use it:
Parallel.For(0, values.Count, i => Products.Add(GetAllProductByID(values[i])));
It made my application work a lot faster, but still not fast enough. My question to you guys is:
Does Parallel.ForEach perform faster than Parallel.For?
Is there some "hybrid" method that I can combine with my Parallel.For loop to perform even faster (i.e. use more CPU power)? If yes, how?
Can someone help me out with this?

If you want to play with parallel, I suggest using Parallel Linq (PLinq) instead of Parallel.For / Parallel.ForEach , e.g.
var Products = Enumerable
.Range(0, values.Count)
.AsParallel()
//.WithDegreeOfParallelism(10) // <- if you want, say 10 threads
.Select(i => GetAllProductByID(values[i]))
.ToList(); // <- this is thread safe now
With the help of the With* methods (e.g. WithDegreeOfParallelism) you can try tuning your implementation.

There are two related concepts: asynchronous programming and multithreading. Basically, to do things "in parallel" or asynchronously, you can either create new threads or work asynchronously on the same thread.
Keep in mind that either way you'll need some mechanism to prevent race conditions. From the Wikipedia article I linked to, a race condition is defined as follows:
A race condition or race hazard is the behavior of an electronic,
software or other system where the output is dependent on the sequence
or timing of other uncontrollable events. It becomes a bug when events
do not happen in the order the programmer intended.
As a few people have mentioned in the comments, you can't rely on the standard List class to be thread-safe - i.e. it might behave in unexpected ways if you're updating it from multiple threads. Microsoft now offers special "built-in" collection classes (in the System.Collections.Concurrent namespace) that'll behave in the expected way if you're updating it asynchronously or from multiple threads.
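As a rough sketch of what that looks like for the loop in the question, the following collects results into a ConcurrentBag instead of a List. LoadProduct is a hypothetical stand-in for GetAllProductByID, and the class name is made up for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class ConcurrentCollectionSketch
{
    // Hypothetical stand-in for the question's GetAllProductByID.
    private static string LoadProduct(int id) => "product-" + id;

    public static List<string> LoadAll(IReadOnlyList<int> ids)
    {
        // ConcurrentBag<T>.Add is safe to call from many threads at once.
        var products = new ConcurrentBag<string>();
        Parallel.For(0, ids.Count, i => products.Add(LoadProduct(ids[i])));

        // A ConcurrentBag is unordered, so sort to get a stable result.
        return products.OrderBy(p => p, StringComparer.Ordinal).ToList();
    }
}
```

If the original order matters, a pre-sized array indexed by i is another option, since each iteration then writes to its own slot.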
For well-documented libraries (and Microsoft's generally pretty good about this in their documentation), the documentation will often explicitly state whether the class or method in question is thread-safe. For example, in the documentation for System.Collections.Generic.List, it states the following:
Public static (Shared in Visual Basic) members of this type are thread
safe. Any instance members are not guaranteed to be thread safe.
In terms of asynchronous programming (vs. multithreading), my standard illustration of this is as follows: suppose you go to a restaurant with 10 people. When the waiter comes by, the first person he asks for his order isn't ready; however, the other 9 people are. Thus, the waiter asks the other 9 people for their orders and then comes back to the original guy. (It's definitely not the case that they'll get a second waiter to wait for the original guy to be ready to order, and doing so probably wouldn't save much time anyway.) That's how async/await typically works (the exception being that some of the Task Parallel Library calls, like Task.Run(...), actually execute on other threads, which in our illustration means bringing in a second waiter, so make sure you check the documentation for which is which).
Basically, which you choose (asynchronously on the same thread or creating new threads) depends on whether you're trying to do something that's I/O-bound (i.e. you're just waiting for an operation to complete or for a result) or CPU-bound.
If your main purpose is to wait for a result from eBay, it would probably be better to work asynchronously on the same thread, as you may not get much of a performance benefit from using multithreading. Think back to our analogy: bringing in a second waiter just to wait for the first guy to be ready to order isn't necessarily any better than just having the waiter come back to him.
I'm not sitting in front of an IDE so forgive me if this syntax isn't perfect, but here's an approximate idea of what you can do:
public async Task GetResults(int[] productIDsToGet) {
var tasks = new List<Task>();
foreach (int productID in productIDsToGet) {
Task task = GetResultFromEbay(productID);
tasks.Add(task);
}
// Wait for all of the tasks to complete
await Task.WhenAll(tasks);
}
private async Task GetResultFromEbay(int productIdToGet) {
// Get result asynchronously from eBay
}

Reactive extensions Subject uses

I am at the early stages of learning about Rx and have come across the Subject class. I don't quite understand why this class exists. I understand that it implements both IObservable and IObserver but what are Subjects used for?
As far as I can tell, they can act as a proxy between a source and a bunch of subscribers but couldn't the subscribers just subscribe directly to the source? When I see instances of a Subject being used as an observable and observer I get confused.
I am sure I am just not getting some basic fact here but I don't know what Subject brings to the game. I guess I am looking for some basic (but hopefully real world) example of when Subjects are useful and when they are not (as I have also read that Subjects are not usually used, replaced with Observable.Create).

First, a lot of folks will tell you Subject<T> doesn't belong, since it goes against some other tenets/patterns in the Rx framework.
That said, they act as either an IObservable or an IObserver, so you get some useful functionality out of them - I generally use them during the initial development stages for:
A "debug point" of sorts, where I can subscribe to an IObservable chain inline with a Subject<T>, and inspect the contents with the debugger.
An "observable on demand", where I can manually call OnNext and pass in data I want to inject into the stream
I used to use them to replicate what IConnectableObservable now does: a "broadcast" mechanism for multiple subscribers to a single Observable, but that can be done with Publish now.
A bridging layer between disparate systems; again, this is largely unnecessary now with the various FromAsync and FromEvent extensions, but they can still be used as such (basically, the "old" system injects events into the Subject<T> via OnNext, and from then on the normal Rx flow applies).

Using subjects means we are now managing state, which is potentially mutating. Mutating state and asynchronous programming are very hard to get right. Furthermore many of the operators (extension methods) have been carefully written to ensure correct and consistent lifetime of subscriptions and sequences are maintained. When you introduce subjects you can break this.
A significant benefit that the Create method has over subjects is that the sequence will be lazily evaluated.
In this example we show how we might first return a sequence via standard blocking eagerly evaluated call, and then we show the correct way to return an observable sequence without blocking by lazy evaluation.
In the example below, the consumer will be blocked for at least 1 second before they even receive the IObservable, regardless of whether they actually subscribe to it or not.
private IObservable<string> BlockingMethod()
{
var subject = new ReplaySubject<string>();
subject.OnNext("a");
subject.OnNext("b");
subject.OnCompleted();
Thread.Sleep(1000);
return subject;
}
Whereas in the example below, the consumer immediately receives the IObservable and will only incur the cost of the thread sleep if they subscribe.
private IObservable<string> NonBlocking()
{
return Observable.Create<string>(
(IObserver<string> observer) =>
{
observer.OnNext("a");
observer.OnNext("b");
observer.OnCompleted();
Thread.Sleep(1000);
return Disposable.Create(() => Console.WriteLine("Observer has unsubscribed"));
//or can return an Action like
//return () => Console.WriteLine("Observer has unsubscribed");
});
}

Use case for .PublishLast() (previously Prune)

In my opinion, I have a pretty good "feel" for Rx functions - I use many of them or can imagine how others can be useful - but I can't find a place for the .Prune function. I know that this is a Multicast to an AsyncSubject, but how can this be useful in a real scenario?
Edit: Richard says WebRequest is a good candidate for Prune(). I still don't see how. Let's take an example - I want to transform incoming URIs to images:
public static IObservable<BitmapImage> ToImage(this IObservable<string> source)
{
var streams =
from wc in source.Select(WebRequest.Create)
from s in Observable
.FromAsyncPattern<WebResponse>(wc.BeginGetResponse,
wc.EndGetResponse)()
.Catch(Observable.Empty<WebResponse>())
select s.GetResponseStream();
return streams
.ObserveOnDispatcher()
.Select(x =>
{
var bmp = new BitmapImage();
bmp.SetSource(x);
return bmp;
});
}
I don't see it necessary to append .Prune to .FromAsyncPattern, because when you're calling FromAsyncPattern() (which is hot) you subscribe "instantly".

As was confirmed on the Rx forum, Prune is just a convenience operator.
If your observable has a single value and you're publishing it, you can replace Publish\Connect with a single call to .Prune()
So from my experience, the most common scenario for Prune is:
You have a cold observable that produces side-effects and emits only one value
You have more than one subscriber to that observable, so you want to make it hot (because of side-effects)
Another, pointed out in the forum, is when you need to cache a particular value from a hot observable (usually the first). Then you use FromEvent(...).Take(1).Prune() and anybody who subscribes to this is guaranteed to get the same value. This one is not just "convenience", but pretty much the only easy way to achieve the result.
Pretty useful, after all!
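A small sketch of that caching scenario, using PublishLast (the current name for Prune). A Subject<int> stands in here for the hot event source purely for illustration, the names are made up, and the System.Reactive package is assumed.

```csharp
using System;
using System.Reactive.Linq;
using System.Reactive.Subjects;

public static class PublishLastSketch
{
    public static (int early, int late) Run()
    {
        // Stand-in for a hot source such as FromEvent(...).
        var hot = new Subject<int>();
        // Take(1) completes after the first value; PublishLast caches it.
        IConnectableObservable<int> cached = hot.Take(1).PublishLast();

        int early = 0, late = 0;
        using (cached.Connect())
        {
            cached.Subscribe(x => early = x);
            hot.OnNext(42); // first value: captured, and the sequence completes
            hot.OnNext(7);  // ignored, Take(1) has already completed
            // A late subscriber still receives the cached first value.
            cached.Subscribe(x => late = x);
        }
        return (early, late);
    }
}
```

Both the early and the late subscriber observe the same first value, even though the late one subscribed after the source had moved on.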

The most common scenario is when the source observable is hot and can complete before you subscribe to it. The AsyncSubject captures the last value and re-emits it for any future subscribers.
Edit
I'd have to check, but I believe FromAsyncPattern uses an AsyncSubject internally, so is actually already "Pruned".
However, assuming you were working with some other hot source that did not, the use of Prune comes entirely down to the lifetime of the IObservable before it is subscribed to. If you subscribe to it instantly, there is no need for Prune. If the IObservable will exist for a while before being subscribed to, however, it may have already completed.
This is my understanding, as someone who has ported Rx but never used Prune. Maybe you should ask the same question on the Rx forums? You've got a chance of it being answered by someone on the Rx team.

I've also found a neat use for it when I've got multiple UI components that need to listen to a task (e.g. callback) and by default, Subscribe() on a cold observable will kick off that task several times which is typically not what you want when sharing state across UI components.
I know Richard mentioned a lot of these points but I figured this is such a perfect candidate for single-run Tasks, to add this example in too.
var oTask = Observable.FromAsync(() => Task.Factory.StartNew(() =>
{
Thread.Sleep(1000);
Console.WriteLine("Executed Task");
}));
//Set up the IConnectableObservable
var oTask2 = oTask.PublishLast();
//Subscribe - nothing happens
oTask2.Subscribe(x => { Console.WriteLine("Called from Task 1"); });
oTask2.Subscribe(x => { Console.WriteLine("Called from Task 2"); });
//The one and only time the Task is run
oTask2.Connect();
//Subscribe after the task is already complete - we want the results
Thread.Sleep(5000);
oTask2.Subscribe(x => { Console.WriteLine("Called from Task 3"); });
