Tasks not running concurrently by custom LINQ operator


I am attempting to create a concurrent version of SelectAwait (and other operators) from System.Linq.Async, which provides LINQ extension methods for IAsyncEnumerable<T>. This is the code that I am using:
private static async IAsyncEnumerable<TOut> SelectParallelAsync<T, TOut>(
    this IAsyncEnumerable<T> enumerable, Func<T, Task<TOut>> predicate)
{
    // Up to 10 concurrent calls (initialCount: 10, maxCount: 10).
    var sem = new SemaphoreSlim(10, 10);
    var tasks = enumerable.Select(item =>
    {
        var task = Task.Run(async () =>
        {
            await sem.WaitAsync();
            try
            {
                return await predicate(item);
            }
            finally
            {
                sem.Release(); // release even if predicate throws
            }
        });
        return task;
    });
    await foreach (var task in tasks)
        yield return await task;
}
The source enumerable is a simple sequence of the numbers 0 to 1000. The code is called as:
.SelectParallelAsync(async i =>
{
    Console.WriteLine($"In Select : {i}");
    await Task.Delay(1000);
    return i + 5;
});
I was expecting all the tasks to start immediately and to run 10 at a time. However, they get triggered one after another. Is there any way I can achieve something like this? Much appreciated.
EDIT: I am using a semaphore instead of Parallel.ForEach or .AsParallel().WithDegreeOfParallelism because I want to share this semaphore between multiple methods. Furthermore, PLINQ is not very extensible and I can't add my own extension methods to it.

The enumeration of the source IAsyncEnumerable<T> enumerable is driven by the enumeration of the resulting IAsyncEnumerable<TOut>. When the consumer of the resulting sequence requests the first TOut element, a T value is requested from the source enumerable at that point. Then the value is projected to a Task<TOut>, the task is awaited, and finally the result of the task is returned to the consumer. Everything happens sequentially; there is no concurrency. Nothing happens internally before the consumer asks for an element, nor after the element has been delivered to the consumer.
Adding concurrency to a LINQ operator is much more involved than it might appear at first glance. It means that when the consumer asks for the first element, 10 tasks must start at once. And when any of these tasks completes, another task must start automatically in its place, without the consumer asking for it. And there must be a limit to how many tasks that have not yet been requested by the consumer can be stored internally; no more tasks should be started when this limit is reached, until the consumer takes one and frees a slot. You must also think about what to do with the internal mechanism that actively starts the tasks and watches for their completion, in case the consumer decides that it has had enough and won't request any more elements (by exiting the consuming loop). You must also think about what to do with the stored tasks in case the one that is about to be delivered to the consumer has failed. And what if more than one task has failed? And what should happen if the enumeration is canceled with a CancellationToken?
Doing all this correctly using only primitive tools like TaskCompletionSource and SemaphoreSlim, without higher-level tools like Channel<T>, is extremely difficult. If you are not familiar with Channel<T>, my advice is to spend some time and familiarize yourself with it. It's quite a simple mechanism. If you know the BlockingCollection<T> class, Channel<T> is essentially an asynchronous version of it.
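To make the comparison with BlockingCollection<T> concrete, here is a minimal producer/consumer sketch using System.Threading.Channels (built into .NET Core 3.0 and later). The class and method names are illustrative only. The bounded channel applies back-pressure: the producer's WriteAsync waits while the buffer is full, and ReadAllAsync finishes once the writer is completed.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ChannelBasics
{
    // Hypothetical demo: one producer writes 1..count, one consumer reads them all.
    public static async Task<List<int>> ProduceAndConsumeAsync(int count)
    {
        // WriteAsync waits while 5 items are buffered and unread.
        var channel = Channel.CreateBounded<int>(capacity: 5);

        Task producer = Task.Run(async () =>
        {
            try
            {
                for (int i = 1; i <= count; i++)
                    await channel.Writer.WriteAsync(i);
            }
            finally
            {
                channel.Writer.Complete(); // lets ReadAllAsync finish, even on failure
            }
        });

        var received = new List<int>();
        await foreach (int item in channel.Reader.ReadAllAsync())
            received.Add(item);

        await producer; // surfaces any exception thrown by the producer
        return received;
    }
}
```

With a single producer and a single consumer the items arrive in the order they were written.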
In another question I have posted an AwaitResults method that could be used to implement the SelectParallelAsync operator quite easily:
private static IAsyncEnumerable<TOut> SelectParallelAsync<T, TOut>(
    this IAsyncEnumerable<T> enumerable, Func<T, Task<TOut>> predicate)
{
    return enumerable
        .Select(item => predicate(item))
        .AwaitResults(maxConcurrency: 10);
}
You could study that implementation, and change it to fit your needs.
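The AwaitResults implementation is not reproduced here, so the following is only a rough sketch of the Channel<T> approach described above, not the linked method. It assumes .NET Core 3.0 or later; the class name and the extra maxConcurrency and cancellationToken parameters are hypothetical. A background producer pulls from the source, waits on a SemaphoreSlim so that at most maxConcurrency selectors run at once, and writes each started task into a bounded channel; the consumer reads the tasks in source order and awaits each one.

```csharp
using System;
using System.Collections.Generic;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Channels;
using System.Threading.Tasks;

public static class ChannelSelectExtensions
{
    // Sketch: at most maxConcurrency selector calls run at once, and results
    // are yielded in source order. Hypothetical helper, not a library API.
    public static async IAsyncEnumerable<TOut> SelectParallelAsync<T, TOut>(
        this IAsyncEnumerable<T> source,
        Func<T, Task<TOut>> selector,
        int maxConcurrency,
        [EnumeratorCancellation] CancellationToken cancellationToken = default)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(cancellationToken);
        // Deliberately not disposed: in-flight selectors may still call
        // Release() after the consumer has stopped enumerating.
        var throttler = new SemaphoreSlim(maxConcurrency, maxConcurrency);
        // Started tasks the consumer has not taken yet, in source order.
        var buffer = Channel.CreateBounded<Task<TOut>>(maxConcurrency);

        async Task<TOut> RunThrottledAsync(T item)
        {
            try { return await selector(item); }
            finally { throttler.Release(); }
        }

        // Producer: keeps starting tasks in the background, without waiting
        // for the consumer to ask for the next element.
        Task producer = Task.Run(async () =>
        {
            try
            {
                await foreach (T item in source.WithCancellation(cts.Token))
                {
                    await throttler.WaitAsync(cts.Token);
                    await buffer.Writer.WriteAsync(RunThrottledAsync(item), cts.Token);
                }
                buffer.Writer.TryComplete();
            }
            catch (Exception ex)
            {
                buffer.Writer.TryComplete(ex); // rethrown by the consumer's read loop
            }
        });

        try
        {
            await foreach (Task<TOut> task in buffer.Reader.ReadAllAsync(cts.Token))
                yield return await task;
        }
        finally
        {
            cts.Cancel();   // stop the producer if the consumer exits early or fails
            await producer; // the producer catches everything, so this does not throw
        }
    }
}
```

This sketch answers only some of the questions raised above: if a task fails, the first failure in source order is propagated and the exceptions of other in-flight tasks go unobserved, and exiting early stops the producer but does not cancel selectors that are already running.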

Related

Parallel.ForEach with async lambda waiting for all iterations to complete

Recently I have seen several SO threads related to Parallel.ForEach mixed with async lambdas, but all the proposed answers were some kind of workaround.
Is there any way I could write:
List<int> list = new List<int>();
Parallel.ForEach(arrayValues, async (item) =>
{
    var x = await LongRunningIoOperationAsync(item);
    list.Add(x);
});
How can I ensure that list will contain all items from all iterations executed within the lambdas?
How does Parallel.ForEach generally work with async lambdas? If it hits an await, will it hand over its thread to the next iteration?
I assume the ParallelLoopResult.IsCompleted property is not the proper one to check, as it will return true when all iterations have been executed, regardless of whether their actual lambda jobs have finished?

Recently I have seen several SO threads related to Parallel.ForEach mixed with async lambdas, but all the proposed answers were some kind of workaround.
Well, that's because Parallel doesn't work with async. And from a different perspective, why would you want to mix them in the first place? They do opposite things. Parallel is all about adding threads and async is all about giving up threads. If you want to do asynchronous work concurrently, then use Task.WhenAll. That's the correct tool for the job; Parallel is not.
That said, it sounds like you want to use the wrong tool, so here's how you do it...
How can I ensure that list will contain all items from all iterations executed within the lambdas?
You'll need to have some kind of a signal that some code can block on until the processing is done, e.g., CountdownEvent or Monitor. On a side note, you'll need to protect access to the non-thread-safe List<T> as well.
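Following that suggestion, here is a minimal sketch that keeps Parallel.ForEach but waits with a CountdownEvent and protects the list with a lock. LongRunningIoOperationAsync is a hypothetical stand-in for the question's method. This is still the "wrong tool" version: exceptions from the async work are silently dropped, and blocking with Wait can deadlock in a UI app if any awaited code resumes on the blocked UI thread (hence ConfigureAwait(false) throughout).

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class CountdownForEach
{
    // Hypothetical stand-in for the question's LongRunningIoOperationAsync.
    static async Task<int> LongRunningIoOperationAsync(int item)
    {
        await Task.Delay(10).ConfigureAwait(false);
        return item * 10;
    }

    public static List<int> Run(int[] arrayValues)
    {
        var list = new List<int>();
        var listLock = new object();
        using var done = new CountdownEvent(arrayValues.Length);

        async Task ProcessAsync(int item)
        {
            try
            {
                int x = await LongRunningIoOperationAsync(item).ConfigureAwait(false);
                lock (listLock) list.Add(x); // List<T> is not thread-safe
            }
            finally
            {
                done.Signal(); // signal even if the operation failed
            }
        }

        // Parallel.ForEach cannot await, so each body only starts the async work.
        Parallel.ForEach(arrayValues, item => { _ = ProcessAsync(item); });

        done.Wait(); // block until every async operation has signalled
        return list;
    }
}
```

The list is filled in completion order, so sort it if order matters.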
How does Parallel.ForEach generally work with async lambdas? If it hits an await, will it hand over its thread to the next iteration?
Since Parallel doesn't understand async lambdas, when the first await yields (returns) to its caller, Parallel will assume that iteration of the loop is complete.
I assume the ParallelLoopResult.IsCompleted property is not the proper one to check, as it will return true when all iterations have been executed, regardless of whether their actual lambda jobs have finished?
Correct. Parallel can only "see" the method up to the first await that returns to its caller, so it doesn't know when the async lambda is complete. It will also assume iterations are complete too early, which throws its partitioning off.
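The early completion can be made visible deterministically. In this sketch (class and method names are illustrative), every async body awaits a TaskCompletionSource that is never completed, yet Parallel.ForEach returns with IsCompleted == true while none of the bodies has finished.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelAsyncLambdaDemo
{
    public static (bool isCompleted, int finishedBodies) Run()
    {
        // Never completed before Run returns, so no body can get past its await.
        var gate = new TaskCompletionSource<bool>();
        int finished = 0;

        // The async lambda binds to Action<int>, i.e. it becomes an async void method.
        ParallelLoopResult result = Parallel.ForEach(Enumerable.Range(0, 8), async item =>
        {
            await gate.Task;                     // the body "returns" to Parallel here
            Interlocked.Increment(ref finished); // not reached while gate is pending
        });

        return (result.IsCompleted, Volatile.Read(ref finished));
    }
}
```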

You don't need Parallel.For/ForEach here; you just need to await a list of tasks.
Background
In short, you need to be very careful with async lambdas, in particular whether you are passing them to an Action (where they become async void and cannot be awaited) or to a Func<Task>.
Your problem is that Parallel.For/ForEach is not suited to the async/await pattern or to I/O-bound tasks; it is suited to CPU-bound workloads. That means it essentially takes Action parameters and lets the task scheduler create the tasks for you.
If you want to run multiple async tasks at the same time, use Task.WhenAll or a TPL Dataflow block (or something similar), which can deal effectively with both CPU-bound and I/O-bound workloads; said more directly, they can deal with tasks, which is what an async method returns.
Unless you need to do more inside your lambda (which you haven't shown), just use a Select and WhenAll:
var tasks = items.Select(LongRunningIoOperationAsync);
var results = await Task.WhenAll(tasks); // here is your list of int
If you do, you can still use await:
var tasks = items.Select(async (item) =>
{
    var x = await LongRunningIoOperationAsync(item);
    // do other stuff
    return x;
});
var results = await Task.WhenAll(tasks);
Note: if you need the extended functionality of Parallel.ForEach (namely the options to control max concurrency), there are several approaches; however, Rx or TPL Dataflow might be the most succinct.
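As one of those approaches (a SemaphoreSlim rather than Rx or Dataflow), here is a sketch that throttles Select + Task.WhenAll; the helper name SelectWhenAllAsync is hypothetical. On .NET 6 and later, Parallel.ForEachAsync with ParallelOptions.MaxDegreeOfParallelism is another built-in option.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottledWhenAll
{
    // Hypothetical helper: like Task.WhenAll(items.Select(selector)), but with at
    // most maxConcurrency selector calls in flight at any one time.
    public static async Task<TResult[]> SelectWhenAllAsync<T, TResult>(
        IEnumerable<T> items, Func<T, Task<TResult>> selector, int maxConcurrency)
    {
        using var throttler = new SemaphoreSlim(maxConcurrency, maxConcurrency);
        IEnumerable<Task<TResult>> tasks = items.Select(async item =>
        {
            await throttler.WaitAsync(); // wait for a free slot
            try { return await selector(item); }
            finally { throttler.Release(); }
        });
        // WhenAll enumerates the sequence once, starting every wrapper; the
        // semaphore decides how many selectors actually run at the same time.
        // Results come back in input order.
        return await Task.WhenAll(tasks);
    }
}
```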

Chaining arbitrary number of tasks together in C#.NET

What I have
I have a set of asynchronous processing methods, similar to:
public class AsyncProcessor<T>
{
    //...rest of members, etc.
    public Task Process(T input)
    {
        //Some special processing, most likely inside a Task, so
        //maybe spawn a new Task, etc.
        Task task = Task.Run(/* maybe private method that does the processing*/);
        return task;
    }
}
What I want
I would like to chain them all together, to execute in sequential order.
What I tried
I have tried to do the following:
public class CompositeAsyncProcessor<T>
{
    private readonly IEnumerable<AsyncProcessor<T>> m_processors;
    //Constructor receives the IEnumerable<AsyncProcessor<T>> and
    //stores it in the field above.
    public Task ProcessInput(T input)
    {
        Task chainedTask = Task.CompletedTask;
        foreach (AsyncProcessor<T> processor in m_processors)
        {
            chainedTask = chainedTask.ContinueWith(t => processor.Process(input));
        }
        return chainedTask;
    }
}
What went wrong
However, tasks do not run in order because, from what I have understood, inside the call to ContinueWith, the processor.Process(input) call is performed immediately and the method returns independently of the status of the returned task. Therefore, all processing Tasks still begin almost simultaneously.
My question
My question is whether there is something elegant that I can do to chain the tasks in order (i.e. without execution overlap). Could I achieve this using the following statement, (I am struggling a bit with the details), for example?
chainedTask = chainedTask.ContinueWith(async t => await processor.Process(input));
Also, how would I do this without using async/await, only ContinueWith?
Why would I want to do this?
Because my Processor objects access and request things from "thread-unsafe" resources. Also, I cannot just await all the methods individually because I have no idea how many there are, so I cannot just write down the necessary lines of code.
What do I mean by thread-unsafe? A specific problem
Because I may be using the term incorrectly, an illustration is a bit better to explain this bit. Among the "resources" used by my Processor objects, all of them have access to an object such as the following:
public interface IRepository
{
    void Add(object obj);
    bool Remove(object obj);
    IEnumerable<object> Items { get; }
}
The implementation currently used is relatively naive. So some Processor objects add things, while others retrieve the Items for inspection. Naturally, one of the exceptions I get all too often is:
InvalidOperationException: Collection was modified; enumeration operation may not execute.
I could spend some time locking access and pre-running the enumerations. However, that was my second option; my first thought was to just make the processes run sequentially.
Why must I use Tasks?
While I have full control in this case, for the purposes of the question let's say I might not be able to change the base implementation: what would happen if I were stuck with Tasks? Furthermore, the operations really are relatively time-consuming CPU-bound operations, and I am trying to keep the user interface responsive, so I needed to offload some of the burden to asynchronous operations. In most of my use cases I only need a single one at a time (or a couple, but always a specific, known number, so I could hook them together without loops or async/await). One use case, however, finally required chaining an unknown number of Tasks together.
How I deal with this currently
The way I am dealing with this currently is to append a call to Wait() inside the ContinueWith call, i.e.:
foreach (AsyncProcessor<T> processor in m_processors)
{
    chainedTask = chainedTask.ContinueWith(t => processor.Process(input).Wait());
}
I would appreciate any idea on how I should do this, or how I could do it more elegantly (or, "async-properly", so to speak). Also, I would like to know how I can do this without async/await.
Why my question is different from this question, which did not answer my question entirely.
Because the linked question has two tasks, so the solution is simply to write the two lines required, while I have an arbitrary (and unknown) number of tasks, so I need a suitable iteration. Also, my method is not async. I now understand (from the single, briefly available answer, which was deleted) that I could do it fairly easily if I changed my method to async and awaited the Task returned by each processor, but I still wish to know how this could be achieved without async/await syntax.
Why my question is not a duplicate of the other linked questions
Because none of them explains how to chain correctly using ContinueWith, and I am interested in a solution that uses ContinueWith and does not use the async/await pattern. I know this pattern may be the preferable solution, but I want to understand how (if possible) to chain an arbitrary number of tasks properly with ContinueWith calls. I now know I don't need ContinueWith. The question is, how do I do it with ContinueWith?

foreach + await will run Processes sequentially.
public async Task ProcessInputAsync(T input)
{
    foreach (var processor in m_processors)
    {
        await processor.Process(input);
    }
}
By the way, Process should be called ProcessAsync, following the naming convention for Task-returning methods.
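To check that the loop really avoids overlap, here is a self-contained sketch in which each processor is simplified to a Func<T, Task> (an assumption standing in for AsyncProcessor<T>.Process); the class name is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class SequentialChain
{
    // Simplification: each processor is a Func<T, Task> standing in for
    // AsyncProcessor<T>.Process.
    public static async Task ProcessInputAsync<T>(IEnumerable<Func<T, Task>> processors, T input)
    {
        foreach (Func<T, Task> process in processors)
        {
            // The next processor is invoked only after this task has completed,
            // so no two processors ever run at the same time.
            await process(input);
        }
    }
}
```

Because each processor is invoked only after the previous task has finished, the number of processors does not need to be known in advance.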

The Task.ContinueWith method does not understand async delegates, unlike Task.Run, so when your lambda returns a Task, it treats it as a normal return value and wraps it in another Task. So you end up with a Task<Task> instead of what you expected. The problem would be obvious if AsyncProcessor.Process returned a generic Task<T>: you would get a compile error because of the invalid conversion from Task<Task<T>> to Task<T>. In your case you convert from Task<Task> to Task, which is legal, since Task<TResult> derives from Task.
Solving the problem is easy. You just need to unwrap the Task<Task> to a simple Task, and there is a built-in method Unwrap that does exactly that.
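Here is a small sketch of that Task<Task> behavior (the class name is illustrative; a TaskCompletionSource stands in for the task returned by processor.Process). The outer task completes as soon as the lambda returns the inner task, while the unwrapped proxy completes only when the inner task does.

```csharp
using System;
using System.Threading.Tasks;

public static class UnwrapDemo
{
    public static (bool outerDone, bool unwrappedDoneEarly, bool unwrappedDoneLater) Run()
    {
        // Stands in for the Task returned by processor.Process(input).
        var inner = new TaskCompletionSource<bool>();

        // The lambda returns a Task, so ContinueWith yields a Task<Task>.
        Task<Task> outer = Task.CompletedTask.ContinueWith(
            _ => (Task)inner.Task, TaskScheduler.Default);
        Task unwrapped = outer.Unwrap(); // proxy for the inner task

        outer.Wait(); // completes as soon as the lambda has returned the inner task
        bool unwrappedDoneEarly = unwrapped.IsCompleted; // false: inner work still pending

        inner.SetResult(true); // let the "processing" finish
        unwrapped.Wait();
        return (outer.IsCompleted, unwrappedDoneEarly, unwrapped.IsCompleted);
    }
}
```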
There is another problem that you need to solve, though. Currently your code suppresses all exceptions that may occur in each individual AsyncProcessor.Process, which I don't think was intended. So you must decide which strategy to follow in this case. Are you going to propagate the first exception immediately, or do you prefer to collect them all and propagate them at the end, bundled in an AggregateException, like Task.WhenAll does? The example below implements the first strategy.
public class CompositeAsyncProcessor<T>
{
    //...
    public Task Process(T input)
    {
        Task current = Task.CompletedTask;
        foreach (AsyncProcessor<T> processor in m_processors)
        {
            current = current.ContinueWith(antecessor =>
            {
                // Propagate the first failure and skip the remaining processors.
                if (antecessor.IsFaulted)
                    return Task.FromException(antecessor.Exception.InnerException);
                return processor.Process(input);
            },
            CancellationToken.None,
            TaskContinuationOptions.ExecuteSynchronously,
            TaskScheduler.Default
            ).Unwrap();
        }
        return current;
    }
}
I have used an overload of ContinueWith that allows configuring all the options, because the defaults are not ideal. The default TaskContinuationOptions is None. By configuring it to ExecuteSynchronously you minimize thread switches, since each continuation will run on the same thread that completed the previous one.
The default task scheduler is TaskScheduler.Current. By specifying TaskScheduler.Default you make it explicit that you want the continuations to run in thread-pool threads (for some exceptional cases that won't be able to run synchronously). The TaskScheduler.Current is context specific, and if it ever surprises you it won't be in a good way.
As you can see, there are a lot of gotchas with the old-school ContinueWith approach. Using the modern await in a loop is a lot easier to implement, and a lot harder to get wrong.

IQueryable async extension methods exact time execution and returned task

I have a method that returns data from the database
public Task<List<T>> GetAsync(int someId)
{
    return dbSet
        .Where(x => x.Id == someId)
        .ToListAsync();
}
I have a dictionary that stores Tasks by some key
Dictionary<int, Task<List<T>>> SomeDictionary { get; set; }
In some other place I get the Task and put it in the dictionary
var someTask = repo.GetAsync(someId);
someInstance.AddInDictionary(key, someTask);
Then I get the task and await it to get the result
var task = someInstance.GetFromDictionary(key);
List<T> result = await task;
So my questions are:
1. Where does the IQueryable get translated into a SQL query and executed in the database:
when I call the method in the repo
var someTask = repo.GetAsync(someId);
or when I await the task
List<T> result = await task;
2. In the dictionary, when I store Tasks
Dictionary<int, Task<List<T>>> SomeDictionary { get; set; }
...do I store only an operation that should return a result, or an operation together with the actual result? In other words, does saving the Task instead of saving the List save any memory?
Thanks in advance.

The await keyword is effectively syntactic sugar. It works in a similar (but not identical) fashion to a callback, where the callback is the code following the await. It does not control when or how the Task is executed.
This means that Entity Framework is responsible for scheduling and executing the Task without knowledge of await, and therefore has the ability to schedule the Task's execution however it desires.
Depending on how ToListAsync is implemented, it may contain a portion of code which executes immediately and synchronously with the current thread.
I have not seen Microsoft state when the IQueryable is translated into SQL, or when the connection to the SQL server is initiated. My guess would be that the IQueryable is translated to SQL and the connection handshake starts immediately upon calling ToListAsync, and that the remainder of the method is executed as soon as possible on one of the ThreadPool threads.
That being said, calling code should be written in such a way that it does not rely on the internal operations of the asynchronous method.
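The following sketch does not show what Entity Framework actually does; it only illustrates the general point that a Task-returning method starts running when it is called, not when it is awaited. QueryAsync is a hypothetical stand-in, and a TaskCompletionSource simulates the database response.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class HotTaskDemo
{
    // Hypothetical stand-in for a method such as ToListAsync: the code before
    // its first incomplete await runs synchronously, during the call itself.
    static async Task<int> QueryAsync(List<string> log, Task ioCompleted)
    {
        log.Add("started");
        await ioCompleted; // e.g. waiting for the database to respond
        log.Add("finished");
        return 42;
    }

    public static async Task<(int entriesBeforeAwait, int result, int entriesAfterAwait)> RunAsync()
    {
        var log = new List<string>();
        var io = new TaskCompletionSource<bool>();

        Task<int> task = QueryAsync(log, io.Task); // already running; nothing awaited yet
        int entriesBeforeAwait = log.Count;        // 1: "started" was logged during the call
        io.SetResult(true);                        // simulate the I/O completing
        int result = await task;                   // await only observes the outcome
        return (entriesBeforeAwait, result, log.Count);
    }
}
```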
As for the second part of your question, Task is a reference type. Each variable, field or dictionary entry that holds a Task costs the same as any other reference: B bits, where B is 32 or 64 depending on whether the process is 32-bit or 64-bit, in addition to the Task object itself on the heap. Storing the Task instead of the List does not save memory: once the task has completed, it keeps a reference to its result, so the List stays alive as well. If you store N Tasks in a List, which is also a reference type, you'll probably consume somewhat more than N * B bits, because List<T> internally keeps a larger array than necessary so that it can append items efficiently. An array of N Tasks costs N * B bits for its elements plus the array object's own overhead.
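A quick way to see that a completed Task<List<T>> holds on to its result (class and method names are illustrative): the task's Result is the very same List instance, so caching the task keeps the list alive.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class TaskRetainsResultDemo
{
    public static bool TaskHoldsSameList()
    {
        var list = new List<int> { 1, 2, 3 };
        Task<List<int>> task = Task.FromResult(list);
        // The completed task stores a reference to this exact List,
        // not a copy, so the list cannot be collected while the task is cached.
        return ReferenceEquals(task.Result, list);
    }
}
```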

What's the difference between Task.WhenAll() and foreach(var task in tasks)

After a few hours of struggle I found a bug in my app. I considered the 2 functions below to have identical behavior, but it turned out they don't.
Can anyone tell me what's really going on under the hood, and why they behave in a different way?
public async Task MyFunction1(IEnumerable<Task> tasks){
    await Task.WhenAll(tasks);
    Console.WriteLine("all done"); // happens AFTER all tasks are finished
}
public async Task MyFunction2(IEnumerable<Task> tasks){
    foreach(var task in tasks){
        await task;
    }
    Console.WriteLine("all done"); // happens BEFORE all tasks are finished
}

They'll function identically if all tasks complete successfully.
If you use WhenAll and any items fail, the returned task still won't complete until all of the items have finished, and its Exception will be an AggregateException that wraps all errors from all tasks (although await rethrows only the first of them).
If you await each one then it'll complete as soon as it hits any item that fails, and it'll represent an exception for that one error, not any others.
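A small sketch of the WhenAll failure behavior (names are illustrative): await surfaces a single exception, but the task returned by WhenAll still carries every failure in its AggregateException.

```csharp
using System;
using System.Threading.Tasks;

public static class WhenAllFailureDemo
{
    public static async Task<(string caught, int innerCount)> RunAsync()
    {
        Task first = Task.FromException(new InvalidOperationException("first"));
        Task second = Task.FromException(new ArgumentException("second"));
        Task all = Task.WhenAll(first, second);
        try
        {
            await all;
        }
        catch (Exception ex)
        {
            // await rethrows only one exception, but all.Exception keeps both.
            return (ex.Message, all.Exception.InnerExceptions.Count);
        }
        return ("none", 0);
    }
}
```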
The two also differ in that WhenAll will materialize the entire IEnumerable right at the start, before adding any continuations to other items. If the IEnumerable represents a collection of already existing and started tasks, then this isn't relevant, but if the act of iterating the enumerable creates and/or starts tasks, then materializing the sequence at the start would run them all concurrently, and awaiting each before fetching the next task would execute them sequentially. Below is an IEnumerable you could pass in that would behave as I've described here:
public static IEnumerable<Task> TaskGeneratorSequence()
{
    for(int i = 0; i < 10; i++)
        yield return Task.Delay(TimeSpan.FromSeconds(2));
}
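To observe the difference without relying on wall-clock timing, this sketch (class names are illustrative) uses a variant of the generator above in which each task records how many generated tasks are running at once. Awaiting inside foreach never has more than one in flight, while WhenAll starts them all before any finishes.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class ConcurrencyProbe
{
    private readonly object _gate = new object();
    private int _running;
    private int _peak;

    public int Peak { get { lock (_gate) return _peak; } }

    public async Task TrackedDelayAsync()
    {
        lock (_gate)
        {
            _running++;
            if (_running > _peak) _peak = _running; // remember the highest overlap
        }
        await Task.Delay(50);
        lock (_gate) _running--;
    }

    // Lazily creates (and thereby starts) one task per iteration.
    public IEnumerable<Task> Sequence(int count)
    {
        for (int i = 0; i < count; i++)
            yield return TrackedDelayAsync();
    }
}

public static class LazyTaskSequenceDemo
{
    public static async Task<(int peakForeach, int peakWhenAll)> RunAsync()
    {
        var sequential = new ConcurrencyProbe();
        foreach (Task task in sequential.Sequence(5))
            await task; // the next task is created only after this one finishes

        var concurrent = new ConcurrencyProbe();
        await Task.WhenAll(concurrent.Sequence(5)); // enumerates (starts) all five first

        return (sequential.Peak, concurrent.Peak);
    }
}
```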

Likely the most important functional difference is that Task.WhenAll can introduce concurrency when your tasks perform truly asynchronous operations, for example, IO. This may or may not be what you want depending on your situation.
For example, if your tasks are querying the database using the same EF DbContext, the next query would fire as soon as the first one is "in flight" which causes EF to blow up as it doesn't support multiple simultaneous queries using the same context.
That's because you're not awaiting each asynchronous operation individually. You're awaiting a task that represents the completion of all of those asynchronous operations. They can also be completed in any order.
However, when you await each one individually in a foreach over a lazily created sequence (as in the example below), you only fire the next task when the current one completes, preventing concurrency and ensuring serial execution.
A simple example demonstrating this behavior:
async Task Main()
{
    // Select is deferred: each enumeration of 'tasks' calls OperationAsync again.
    var tasks = new []{1, 2, 3, 4, 5}.Select(i => OperationAsync(i));
    foreach(var t in tasks)
    {
        await t;
    }
    await Task.WhenAll(tasks);
}
static Random _rand = new Random();
public async Task OperationAsync(int number)
{
    // simulate an asynchronous operation
    // taking anywhere between 100 and 3000 milliseconds
    await Task.Delay(_rand.Next(100, 3000));
    Console.WriteLine(number);
}
You'll see that no matter how long OperationAsync takes, with foreach you always get 1, 2, 3, 4, 5 printed. But with Task.WhenAll they are executed concurrently and printed in their completion order.

Executing TPL code in a reactive pipeline and controlling execution via test scheduler

I'm struggling to get my head around why the following test does not work:
[Fact]
public void repro()
{
    var scheduler = new TestScheduler();
    var count = 0;
    // this observable is a simplification of the system under test
    // I've just included it directly in the test for clarity
    // in reality it is NOT accessible from the test code - it is
    // an implementation detail of the system under test
    // but by passing in a TestScheduler to the sut, the test code
    // can theoretically control the execution of the pipeline
    // but per this question, that doesn't work when using FromAsync
    Observable
        .Return(1)
        .Select(i => Observable.FromAsync(Whatever))
        .Concat()
        .ObserveOn(scheduler)
        .Subscribe(_ => Interlocked.Increment(ref count));
    Assert.Equal(0, count);
    // this call initiates the observable pipeline, but does not
    // wait until the entire pipeline has been executed before
    // returning control to the caller
    // the question is: why? Rx knows I'm instigating an async task
    // as part of the pipeline (that's the point of the FromAsync
    // method), so why can't it still treat the pipeline atomically
    // when I call Start() on the scheduler?
    scheduler.Start();
    // count is still zero at this point
    Assert.Equal(1, count);
}
private async Task<Unit> Whatever()
{
    await Task.Delay(100);
    return Unit.Default;
}
What I'm trying to do is run some asynchronous code (represented above by Whatever()) whenever an observable ticks. Importantly, I want those calls to be queued. More importantly, I want to be able to control the execution of the pipeline by using the TestScheduler.
It seems like the call to scheduler.Start() is instigating the execution of Whatever() but it isn't waiting until it completes. If I change Whatever() so that it is synchronous:
private async Task<Unit> Whatever()
{
    //await Task.Delay(100);
    return Unit.Default;
}
then the test passes, but of course that defeats the purpose of what I'm trying to achieve. I could imagine there being a StartAsync() method on the TestScheduler that I could await, but that does not exist.
Can anyone tell me whether there's a way for me to instigate the execution of the reactive pipeline and wait for its completion even when it contains asynchronous calls?

Let me boil down your question to its essentials:
Is there a way, using the TestScheduler, to execute a reactive pipeline and wait for its completion even when it contains asynchronous calls?
I should warn you up front, there is no quick and easy answer here, no convenient "trick" that can be deployed.
Asynchronous Calls and Schedulers
To answer this question I think we need to clarify some points. The term "asynchronous call" in the question above seems to be used specifically to refer to methods with a Task or Task<T> signature - i.e. methods that use the Task Parallel Library (TPL) to run asynchronously.
This is important to note because Reactive Extensions (Rx) takes a different approach to handling asynchronous operations.
In Rx the introduction of concurrency is managed via a scheduler, a type implementing the IScheduler interface. Any operation that introduces concurrency should make a scheduler parameter available so that the caller can decide on an appropriate scheduler. The core library slavishly adheres to this principle. So, for example, Delay allows specification of a scheduler but Where does not.
As you can see from the source, IScheduler provides a number of Schedule overloads. Operations requiring concurrency use these to schedule execution of work. Exactly how that work is executed is deferred completely to the scheduler. This is the power of the scheduler abstraction.
Rx operations introducing concurrency generally provide overloads that allow the scheduler to be omitted, and in that case select a sensible default. This is important to note, because if you want your code to be testable via TestScheduler, you must use a TestScheduler for all operations that introduce concurrency. A rogue method that doesn't allow this could scupper your testing efforts.
TPL Scheduling Abstraction
The TPL has its own abstraction to handle concurrency: the TaskScheduler. The idea is very similar.
There are two very important differences between the two abstractions:
Rx schedulers have a first class representation of their own notion of time - the Now property. TPL schedulers do not.
The use of custom schedulers in the TPL is much less prevalent, and there is no equivalent best practice of providing overloads that accept a specific TaskScheduler for a method that introduces concurrency (returning a Task or Task<T>). The vast majority of Task-returning methods assume use of the default TaskScheduler and give you no choice about where work is run.
Motivation for TestScheduler
The motivation to use a TestScheduler is generally two-fold:
To remove the need to "wait" for operations by speeding up time.
To check that events occurred at expected points in time.
The way this works depends entirely on the fact that schedulers have their own notion of time. Every time an operation is scheduled via an IScheduler, we specify when it must execute - either as soon as possible, or at a specific time in the future. The scheduler then queues work for execution and will execute it when the specified time (according to the scheduler itself) is reached.
When you call Start on the TestScheduler, it works by emptying the queue of all operations with execution times at or before its current notion of Now - and then advancing its clock to the next scheduled work time and repeating until its queue is empty.
This allows neat tricks like being able to test that an operation will never result in an event! If using real time this would be a challenging task, but with virtual time it's easy: once the scheduler queue is completely empty, the TestScheduler concludes that no further events will ever happen, since if nothing is left on its queue, there is nothing there to schedule further tasks. In fact, Start returns at precisely this point. For this to work, clearly all concurrent operations to be measured must be scheduled on the TestScheduler.
A custom operator that carelessly makes its own choice of scheduler without allowing that choice to be overridden, or an operation that uses its own form of concurrency without a notion of time (such as TPL-based calls), will make it difficult, if not impossible, to control execution via a TestScheduler.
If you have an asynchronous operation run by other means, judicious use of the AdvanceTo and AdvanceBy methods of the TestScheduler can allow you to coordinate with that foreign source of concurrency - but the extent to which this is achievable depends on the control afforded by that foreign source.
In the case of the TPL, you do know when a task is done, which does allow the use of waits and timeouts in tests, as ugly as these can be. Through the use of TaskCompletionSources (TCS) you can mock tasks and use AdvanceTo to hit specific points and complete TCSs, but there is no one simple approach here. Often you just have to resort to ugly waits and timeouts because you don't have sufficient control over foreign concurrency.
Rx is generally free-threaded and tries to avoid introducing concurrency wherever possible. Conversely, it's quite possible that different operations within an Rx call chain will need different types of scheduler abstraction. It's not always possible to simulate a call chain with a single test scheduler. Certainly, I have had cause to use multiple TestSchedulers to simulate some complex scenarios - e.g. chains that use the DispatcherScheduler and TaskScheduler sometimes need complex coordination that means you can't simply serialize their operations on to one TestScheduler.
Some projects I have worked on have mandated the use of Rx for all concurrency specifically to avoid these problems. That is not always feasible, and even in these cases, some use of TPL is generally inevitable.
One particular pain point
One particular pain point of Rx that leaves many testers scratching their heads, is the fact that the TPL -> Rx family of conversions introduce concurrency. e.g. ToObservable, SelectMany's overload accepting Task<T> etc. don't provide overloads with a scheduler and insidiously force you off the TestScheduler thread, even if mocking with TCS. For all the pain this causes in testing alone, I consider this a bug. You can read all about this here - dig through and you'll find Dave Sexton's proposed fix, which provides an overload for specifying a scheduler, and is under consideration for inclusion. You may want to look into that pull request.
A Potential Workaround
If you can edit your code to use it, the following helper method might be of use. It converts a task to an observable that will run on the TestScheduler and complete at the correct virtual time.
It schedules work on the TestScheduler that is responsible for collecting the task result - at the virtual time we state the task should complete. The work itself blocks until the task result is available - allowing the TPL task to run for however long it takes, or until a real amount of specified time has passed in which case a TimeoutException is thrown.
The effect of blocking the work means that the TestScheduler won't advance its virtual time past the expected virtual completion time of the task until the task has actually completed. This way, the rest of the Rx chain can run in full-speed virtual time and we only wait on the TPL task, pausing the rest of the chain at the task completion virtual time whilst this happens.
Crucially, other concurrent Rx operations scheduled to run in between the start virtual time of the Task based operation and the stated end virtual time of the Task are not blocked and their virtual completion time will be unaffected.
So set duration to the length of virtual time you want the task to appear to have taken. The result will then be collected at whatever the virtual time is when the task is started, plus the duration specified.
Set timeout to the actual time you will allow the task to take. If it takes longer, a timeout exception is thrown:
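using System;
using System.Reactive.Concurrency;
using System.Reactive.Disposables;
using System.Reactive.Linq;
using System.Reactive.Subjects;
using System.Threading.Tasks;
using Microsoft.Reactive.Testing;

// Extension methods must live in a static class; the class name is arbitrary.
public static class TestSchedulerTaskExtensions
{
    // Delivers the task's outcome on the TestScheduler at (virtual now + duration).
    // The scheduled action blocks for up to `timeout` of real time.
    public static IObservable<T> ToTestScheduledObseravble<T>(
        this Task<T> task,
        TestScheduler scheduler,
        TimeSpan duration,
        TimeSpan? timeout = null)
    {
        var realTimeout = timeout ?? TimeSpan.FromSeconds(100);
        var subject = Subject.Synchronize(new AsyncSubject<T>(), scheduler);

        scheduler.Schedule<Task<T>>(task, duration, (s, t) =>
        {
            bool completed;
            try
            {
                completed = t.Wait(realTimeout);
            }
            catch (AggregateException)
            {
                // Wait throws (instead of returning true) for faulted or
                // canceled tasks; the switch below reports those outcomes.
                completed = true;
            }

            if (!completed)
            {
                subject.OnError(new TimeoutException("Task duration too long"));
                return Disposable.Empty;
            }

            switch (t.Status)
            {
                case TaskStatus.RanToCompletion:
                    subject.OnNext(t.Result);
                    subject.OnCompleted();
                    break;
                case TaskStatus.Faulted:
                    subject.OnError(t.Exception.InnerException);
                    break;
                case TaskStatus.Canceled:
                    subject.OnError(new TaskCanceledException(t));
                    break;
            }
            return Disposable.Empty;
        });

        return subject.AsObservable();
    }
}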
Usage in your code would be like this, and your assert will pass:
Observable
    .Return(1)
    .Select(i => Whatever().ToTestScheduledObseravble(
        scheduler, TimeSpan.FromSeconds(1)))
    .Concat()
    .Subscribe(_ => Interlocked.Increment(ref count));
Conclusion
In summary, you haven't missed any convenient trick. You need to think about how Rx works and how the TPL works, and decide whether:
You avoid mixing TPL and Rx
You mock the interface between TPL and Rx (using a TCS or similar), so you can test each independently
You live with ugly waits and timeouts and abandon the TestScheduler altogether
You mix ugly waits and timeouts with the TestScheduler to bring some modicum of control over your tests.
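The second option can be sketched without Rx at all. This is a minimal illustration under stated assumptions: `IWorker`, `Consumer`, and `StubWorker` are hypothetical names, not part of any library. The stub hands out a TaskCompletionSource's task, so nothing completes until the test says so, and the test decides exactly when (and how) the TPL side finishes.

```csharp
using System.Threading.Tasks;

// Hypothetical seam: the only place the system under test touches the TPL.
public interface IWorker
{
    Task<int> DoWorkAsync();
}

// Hypothetical system under test: awaits the worker and stores a derived value.
public sealed class Consumer
{
    private readonly IWorker _worker;

    public Consumer(IWorker worker) => _worker = worker;

    public int? LastResult { get; private set; }

    public async Task RunAsync()
    {
        int value = await _worker.DoWorkAsync();
        LastResult = value + 5;
    }
}

// Test double backed by a TaskCompletionSource: the task stays incomplete
// until the test calls SetResult, SetException or SetCanceled.
public sealed class StubWorker : IWorker
{
    public TaskCompletionSource<int> Pending { get; } = new TaskCompletionSource<int>();

    public Task<int> DoWorkAsync() => Pending.Task;
}
```

A test would start `RunAsync()`, assert that `LastResult` is still null, call `stub.Pending.SetResult(37)`, wait for the returned task, and then assert that `LastResult` is 42. The Rx side can then be tested separately against a TestScheduler.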

Here is Noseratio's more elegant, Rx-style way of writing this test. You can await an observable to get its last value; combine that with Count() and the test becomes trivial.
Note that the TestScheduler isn't serving any purpose in this example.
[Fact]
public async Task repro()
{
    var scheduler = new TestScheduler(); // kept only to mirror the original test
    var countObs = Observable
        .Return(1)
        .Select(i => Observable.FromAsync(Whatever))
        .Concat()
        //.ObserveOn(scheduler) // serves no purpose in this test
        .Count();
    //scheduler.Start(); // serves no purpose in this test

    // awaiting the observable yields its last (here: only) value,
    // which after Count() is the number of elements that flowed through
    var count = await countObs;
    Assert.Equal(1, count);
}
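The two Rx behaviours this test relies on can be seen in isolation in the following sketch. It assumes the System.Reactive package (which supplies the `GetAwaiter` extension that makes an `IObservable<T>` awaitable); `AwaitObservableDemo` is a hypothetical name.

```csharp
using System.Reactive.Linq;
using System.Threading.Tasks;

public static class AwaitObservableDemo
{
    // Awaiting an observable completes with its LAST element once the
    // sequence completes. Observable.Range(10, 3) produces 10, 11, 12,
    // so this returns 12.
    public static async Task<int> LastValueAsync() =>
        await Observable.Range(10, 3);

    // Count() collapses a finite sequence into a single element, so
    // awaiting it yields how many elements were produced: 3.
    public static async Task<int> ElementCountAsync() =>
        await Observable.Range(10, 3).Count();
}
```

In the test above, that means `await countObs` waits for the whole Concat pipeline to finish and returns how many results came out of it.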

As James mentions above, you can't mix concurrency models the way you are. Using the TestScheduler removes concurrency from Rx, but your concurrency never actually comes from Rx in the first place. It comes from the TPL (i.e. Task.Delay(100)), which really does run asynchronously: the code after the delay runs on a thread-pool thread. So your synchronous test completes before the task has completed.
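The TPL side of the problem can be reproduced without Rx. This is a minimal sketch assuming a host with no SynchronizationContext (such as a console app); `DelayDemo` and `WhateverAsync` are hypothetical stand-ins for the question's `Whatever()`, assumed here to await `Task.Delay(100)`. A check made immediately after starting the work sees nothing done yet, and the code after the await resumes on a thread-pool thread that no TestScheduler controls.

```csharp
using System.Threading;
using System.Threading.Tasks;

public static class DelayDemo
{
    private static int s_count;
    private static int s_resumedOnPool; // 1 if the code after the await ran on a pool thread

    public static int Count => Volatile.Read(ref s_count);
    public static bool ResumedOnPoolThread => Volatile.Read(ref s_resumedOnPool) == 1;

    // Hypothetical stand-in for Whatever(): a real, wall-clock delay.
    public static async Task WhateverAsync()
    {
        await Task.Delay(100);
        // With no SynchronizationContext this continuation runs on the
        // thread pool, entirely outside any TestScheduler's control.
        Volatile.Write(ref s_resumedOnPool, Thread.CurrentThread.IsThreadPoolThread ? 1 : 0);
        Interlocked.Increment(ref s_count);
    }
}
```

Calling `WhateverAsync()` returns at the first await, so an assertion on `Count` made right away still sees 0; only after waiting on the returned task does it become 1.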
You could change it to something like this, so that the delay is driven by the TestScheduler rather than by real time:
[Fact]
public void repro()
{
    var scheduler = new TestScheduler();
    var count = 0;
    // this observable is a simplification of the system under test
    // I've just included it directly in the test for clarity
    // in reality it is NOT accessible from the test code - it is
    // an implementation detail of the system under test
    // but by passing in a TestScheduler to the sut, the test code
    // can control the execution of the pipeline
    Observable
        .Return(1)
        .Select(_ => Observable.FromAsync(() => Whatever(scheduler)))
        .Concat()
        .ObserveOn(scheduler)
        .Subscribe(_ => Interlocked.Increment(ref count));
    Assert.Equal(0, count);
    // Start() now also fires the 100 ms timer inside Whatever(scheduler),
    // in virtual time, instead of the test racing a real Task.Delay
    scheduler.Start();
    Assert.Equal(1, count);
}

// the delay is an Rx timer on the supplied scheduler, so it elapses
// in virtual time as the TestScheduler advances
private async Task<Unit> Whatever(IScheduler scheduler)
{
    return await Observable.Timer(TimeSpan.FromMilliseconds(100), scheduler)
        .Select(_ => Unit.Default)
        .ToTask();
}
Alternatively, you could put the Whatever method behind an interface that you can mock out for testing. In that case your stub/mock/double would just return the code from above, i.e. return await Observable.Timer(TimeSpan.FromMilliseconds(100), scheduler).Select(_ => Unit.Default).ToTask();
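A minimal sketch of that interface approach, assuming the System.Reactive and Microsoft.Reactive.Testing packages. `IWhatever` and `VirtualTimeWhatever` are hypothetical names, and the double returns the Timer-based task directly rather than through async/await.

```csharp
using System;
using System.Reactive;
using System.Reactive.Linq;
using System.Reactive.Threading.Tasks;
using System.Threading.Tasks;
using Microsoft.Reactive.Testing;

// Hypothetical interface wrapping the TPL call the system under test makes.
public interface IWhatever
{
    Task<Unit> Whatever();
}

// Test double: the "delay" is an Rx timer on the TestScheduler, so the
// returned task only completes when the test advances virtual time.
public sealed class VirtualTimeWhatever : IWhatever
{
    private readonly TestScheduler _scheduler;

    public VirtualTimeWhatever(TestScheduler scheduler) => _scheduler = scheduler;

    public Task<Unit> Whatever() =>
        Observable.Timer(TimeSpan.FromMilliseconds(100), _scheduler)
            .Select(_ => Unit.Default)
            .ToTask();
}
```

Production code would pass an implementation that awaits the real work, while the test passes `VirtualTimeWhatever` and moves time forward with `scheduler.AdvanceBy(...)` or `scheduler.Start()`.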
