Convert IEnumerable<Task<T>> to IObservable<T> - c#

I'm trying to use the Reactive Extensions (Rx) to buffer an enumeration of Tasks as they complete. Does anyone know of a clean, built-in way of doing this? The ToObservable extension method just produces an IObservable<Task<T>>, which is not what I want. I want an IObservable<T> that I can then use Buffer on.
Contrived example:
//Method designed to be awaitable
public static Task<int> makeInt()
{
return Task.Run(() => 5);
}
//In practice, however, I don't want to await each individual task
//I want to await chunks of them at a time, which *should* be easy with Observable.Buffer
public static void Main()
{
//Make a bunch of tasks
IEnumerable<Task<int>> futureInts = Enumerable.Range(1, 100).Select(t => makeInt());
//Is there a built in way to turn this into an Observable that I can then buffer?
IObservable<IList<int>> buffered = futureInts.TasksToObservable().Buffer(15); //????
buffered.Subscribe(ints => {
Console.WriteLine(ints.Count()); //Should be 15
});
}

You can use the fact that Task can be converted to observable using another overload of ToObservable().
When you have a collection of (single-item) observables, you can create a single observable that contains the items as they complete using Merge().
So, your code could look like this:
futureInts.Select(t => t.ToObservable())
.Merge()
.Buffer(15)
.Subscribe(ints => Console.WriteLine(ints.Count));
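As a rough sketch of how the Select/Merge idea above could be packaged into the TasksToObservable method the question wished for: the helper name comes from the question (it is not part of Rx), and the code assumes the System.Reactive NuGet package is referenced.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive.Linq;
using System.Reactive.Threading.Tasks;
using System.Threading.Tasks;

public static class TaskObservableSketch
{
    // Hypothetical helper (name taken from the question, not from Rx):
    // each task becomes a single-item observable, and Merge surfaces
    // the results in the order the tasks complete.
    public static IObservable<T> TasksToObservable<T>(this IEnumerable<Task<T>> tasks) =>
        tasks.Select(t => t.ToObservable()).Merge();

    public static async Task<IList<IList<int>>> CollectBatchesAsync()
    {
        IEnumerable<Task<int>> futureInts =
            Enumerable.Range(1, 100).Select(i => Task.Run(() => i));

        // Buffer(15) emits IList<int> chunks; ToList gathers all chunks
        // so the complete result can be awaited once.
        return await futureInts.TasksToObservable().Buffer(15).ToList();
    }

    public static async Task Main()
    {
        foreach (IList<int> batch in await CollectBatchesAsync())
            Console.WriteLine(batch.Count);
    }
}
```

With 100 tasks and a buffer size of 15, the last buffer is smaller than the others, because Buffer flushes whatever is left when the merged sequence completes.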

Related

Convert `IObservable<T>` to `IEnumerable<Task<T>>`?

I have a couple of asynchronous APIs that use callbacks or events instead of async. I successfully used TaskCompletionSource to wrap them as described here.
Now, I would like to use an API that returns IObservable<T> and yields multiple objects. I read about Rx for .NET, which seems the way to go. However, I'm hesitant to include another dependency and another new paradigm, since I'm already using a lot of things that are new for me in this app (like XAML, MVVM, C#'s async/await).
Is there any way to wrap IObservable<T> analogously to how you wrap a single callback API? I would like to call the API as such:
foreach (var t in GetMultipleInstancesAsync()) {
var res = await t;
Console.WriteLine("Received item: {0}", res);
}
If the observable emits multiple values, you can convert them to Task<T> instances and add them to any IEnumerable structure.
Take a look at the ToTask extension method for IObservable<T>. As discussed here, the task only completes once the observable completes, because until then more values might still arrive.
This guide here might do the trick for you too:
public static Task<IList<T>> BufferAllAsync<T>(this IObservable<T> observable)
{
List<T> result = new List<T>();
object gate = new object();
TaskCompletionSource<IList<T>> finalTask = new TaskCompletionSource<IList<T>>();
observable.Subscribe(
value =>
{
lock (gate)
{
result.Add(value);
}
},
exception => finalTask.TrySetException(exception),
() => finalTask.SetResult(result.AsReadOnly())
);
return finalTask.Task;
}
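If taking the Rx dependency is acceptable after all, the same "collect everything, then await" behaviour can probably be had from operators Rx already ships. The sketch below assumes the System.Reactive package; CollectAllAsync is a made-up name used only for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Reactive.Linq;
using System.Reactive.Threading.Tasks;
using System.Threading.Tasks;

public static class CollectAllSketch
{
    // Hypothetical helper: ToList turns the stream into a single IList<T>
    // emitted on completion, and ToTask exposes that single value as a Task.
    public static Task<IList<T>> CollectAllAsync<T>(this IObservable<T> source) =>
        source.ToList().ToTask();

    public static async Task Main()
    {
        // Observable.Range stands in for the real multi-value API.
        IList<int> items = await Observable.Range(1, 5).CollectAllAsync();
        Console.WriteLine(string.Join(", ", items));
    }
}
```

As with BufferAllAsync, the task only completes when the source observable completes, so this is unsuitable for streams that never end.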
If you would like to use Rx, you can convert the returned list to an observable:
GetMultipleInstancesAsync().ToObservable().Subscribe(...);
You can also pass OnCompleted/OnError handlers to Subscribe.
Alternatively, you can await the whole list of tasks at once:
var result = await Task.WhenAll(GetMultipleInstancesAsync().ToArray());
This gives you an array of all the results.

Producer consumer collection with ability to read and write batches of data

I'm looking for a collection like BufferBlock
but with methods like:
SendAsync<T>(T[])
T[] ReceiveAsync<T>()
Can anyone help with this?
These methods aren't available. SendAsync<T> only takes a single T and ReceiveAsync<T> only returns a single T, not arrays.
However, there is TryReceiveAll<T>(out IList<T> items), and you can call SendAsync<T> in a loop to send an array into the BufferBlock, or write your own extension method, something like this:
public static async Task SendAllAsync<T>(this ITargetBlock<T> block, IEnumerable<T> items)
{
foreach(var item in items)
{
await block.SendAsync(item);
}
}
Note that SendAsync returns a bool indicating whether the message was accepted. You could return an array of booleans, or return early if any of them comes back false, but that's up to you.
It would likely be easier to use a BatchBlock<T>: you send items to it one at a time in a loop, and it emits them in batches. If you're building a pipeline, that is easier than using TryReceiveAll. See the BatchBlock Walkthrough and BatchBlock Example.
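A minimal sketch of the BatchBlock<T> idea, assuming the System.Threading.Tasks.Dataflow library is available; ReadBatchesAsync and its parameters are illustrative names, not part of the library.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class BatchBlockSketch
{
    // Hypothetical helper: sends itemCount integers one by one and
    // reads them back as arrays of (at most) batchSize elements.
    public static async Task<List<int[]>> ReadBatchesAsync(int itemCount, int batchSize)
    {
        var batcher = new BatchBlock<int>(batchSize);

        for (int i = 0; i < itemCount; i++)
            await batcher.SendAsync(i);

        // Completing the block flushes any leftover items as a final, smaller batch.
        batcher.Complete();

        var batches = new List<int[]>();
        while (await batcher.OutputAvailableAsync())
            batches.Add(await batcher.ReceiveAsync());
        return batches;
    }

    public static async Task Main()
    {
        foreach (int[] batch in await ReadBatchesAsync(10, 4))
            Console.WriteLine(string.Join(", ", batch));
    }
}
```

In a real pipeline the BatchBlock would more likely be linked to a downstream block with LinkTo rather than drained in a loop; the loop is used here only to keep the example short.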
ReceiveAsync and SendAsync are available as extension methods on the ISourceBlock<T> and ITargetBlock<T> interfaces. A BufferBlock<T> implements both, so the extension methods can be called on the block directly; casting to the interfaces is optional but makes each role explicit, eg :
var buffer=new BufferBlock<string>();
var source=(ISourceBlock<string>)buffer;
var target=(ITargetBlock<string>)buffer;
await target.SendAsync("something");
Typically that's not a problem because all Dataflow methods accept interfaces, not concrete types, eg :
async Task MyProducer(ITargetBlock<string> target)
{
...
await target.SendAsync(..);
...
target.Complete();
}
async Task MyConsumer(ISourceBlock<string> source)
{
...
var message=await source.ReceiveAsync();
...
}
public static async Task Main()
{
var buffer=new BufferBlock<string>();
MyProducer(buffer);
await MyConsumer(buffer);
}

Running async methods in parallel

I've got an async method, GetExpensiveThing(), which performs some expensive I/O work. This is how I am using it:
// Serial execution
public async Task<List<Thing>> GetThings()
{
var first = await GetExpensiveThing();
var second = await GetExpensiveThing();
return new List<Thing>() { first, second };
}
But since it's an expensive method, I want to execute these calls in parallel. I would have thought moving the awaits would have solved this:
// Serial execution
public async Task<List<Thing>> GetThings()
{
var first = GetExpensiveThing();
var second = GetExpensiveThing();
return new List<Thing>() { await first, await second };
}
That didn't work, so I wrapped them in some tasks and this works:
// Parallel execution
public async Task<List<Thing>> GetThings()
{
var first = Task.Run(() =>
{
return GetExpensiveThing();
});
var second = Task.Run(() =>
{
return GetExpensiveThing();
});
return new List<Thing>() { first.Result, second.Result };
}
I even tried playing around with awaits and async in and around the tasks, but it got really confusing and I had no luck.
Is there a better way to run async methods in parallel, or are tasks a good approach?
Yes, the "best" approach is to utilize the Task.WhenAll method. However, your second approach should already have run in parallel. I have created a .NET Fiddle that demonstrates this and should help shed some light.
Consider the following:
public Task<Thing[]> GetThingsAsync()
{
var first = GetExpensiveThingAsync();
var second = GetExpensiveThingAsync();
return Task.WhenAll(first, second);
}
Note
It is preferred to use the "Async" suffix: instead of GetThings and GetExpensiveThing, we should have GetThingsAsync and GetExpensiveThingAsync respectively (source).
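To see the effect without the fiddle, here is a small self-contained sketch. GetExpensiveThingAsync is a stand-in that simply awaits Task.Delay (an assumption, since the real method is not shown); because both calls are started before anything is awaited, the total time should be close to one delay rather than two.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class WhenAllSketch
{
    // Stand-in for the real I/O-bound method; the 200 ms delay is arbitrary.
    public static async Task<int> GetExpensiveThingAsync(int id)
    {
        await Task.Delay(200);
        return id;
    }

    public static async Task<(int[] Results, TimeSpan Elapsed)> RunBothAsync()
    {
        var stopwatch = Stopwatch.StartNew();

        // Both operations are already in flight once these two lines have run.
        Task<int> first = GetExpensiveThingAsync(1);
        Task<int> second = GetExpensiveThingAsync(2);

        // WhenAll returns the results in the order the tasks were passed in.
        int[] results = await Task.WhenAll(first, second);
        return (results, stopwatch.Elapsed);
    }

    public static async Task Main()
    {
        var (results, elapsed) = await RunBothAsync();
        Console.WriteLine($"{string.Join(", ", results)} in {elapsed.TotalMilliseconds:F0} ms");
    }
}
```

If the stand-in did its work synchronously before the first await (for example with Thread.Sleep), the two calls would run one after the other, which is the situation where Task.Run becomes relevant.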
Task.WhenAll() tends to perform poorly when a large number of tasks fire simultaneously without any moderation or throttling.
If you have a long list of tasks and want to await the final outcome, I propose partitioning the work with a limit on the degree of parallelism.
I have adapted the elegant approach from Stephen Toub's blog to modern LINQ:
public static Task ParallelForEachAsync<T>(this IEnumerable<T> source, Func<T, Task> funcBody, int maxDoP = 4)
{
async Task AwaitPartition(IEnumerator<T> partition)
{
using (partition)
{
while (partition.MoveNext())
{
await Task.Yield(); // prevents a sync/hot thread hangup
await funcBody(partition.Current);
}
}
}
return Task.WhenAll(
Partitioner
.Create(source)
.GetPartitions(maxDoP)
.AsParallel()
.Select(p => AwaitPartition(p)));
}
How it works is simple: take an IEnumerable, dissect it into roughly even partitions, and then fire a function/method against each element of each partition. No more than one element per partition is processed at any one time, but n tasks run across n partitions at the same time.
Extension Usage:
await myList.ParallelForEachAsync(myFunc, Environment.ProcessorCount);
Edit:
I now keep some overloads in a repository on GitHub if you need more options. It's also available as a NuGet package for .NET Standard.
Edit 2: Thanks to comments from Theodor below, I was able to mitigate poorly written Async Tasks from blocking parallelism by using await Task.Yield();.
You can use Task.WhenAll, which returns a task that completes when all of the supplied tasks are done.
Check this question here for reference
If GetExpensiveThing is properly asynchronous (meaning it doesn't do any IO or CPU work synchronously), your second solution of invoking both methods and then awaiting the results should've worked. You could've also used Task.WhenAll.
However, if it isn't, you may get better results by posting each task to the thread-pool and using the Task.WhenAll combinator, e.g.:
public Task<Thing[]> GetThings() =>
Task.WhenAll(Task.Run(() => GetExpensiveThing()), Task.Run(() => GetExpensiveThing()));
(Note I changed the return type to Thing[], which is what Task.WhenAll produces, to avoid awaits altogether.)
You should avoid using the Result property. It causes the caller thread to block and wait for the task to complete, unlike await or Task.WhenAll which use continuations.

C# Parallel - Adding items to the collection being iterated over, or equivalent?

Right now, I've got a C# program that performs the following steps on a recurring basis:
Grab current list of tasks from the database
Using Parallel.ForEach(), do work for each task
However, some of these tasks are very long-running. This delays the processing of other pending tasks because we only look for new ones at the start of the program.
Now, I know that modifying the collection being iterated over isn't possible (right?), but is there some equivalent functionality in the C# Parallel framework that would allow me to add work to the list while also processing items in the list?
Generally speaking, you're right that modifying a collection while iterating it is not allowed. But there are other approaches you could be using:
Use ActionBlock<T> from TPL Dataflow. The code could look something like:
var actionBlock = new ActionBlock<MyTask>(
task => DoWorkForTask(task),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = DataflowBlockOptions.Unbounded });
while (true)
{
var tasks = GrabCurrentListOfTasks();
foreach (var task in tasks)
{
actionBlock.Post(task);
await Task.Delay(someShortDelay);
// or use Thread.Sleep() if you don't want to use async
}
}
Use BlockingCollection<T>, which can be modified while consuming items from it, along with GetConsumingPartitioner() from ParallelExtensionsExtras to make it work with Parallel.ForEach():
var collection = new BlockingCollection<MyTask>();
Task.Run(async () =>
{
while (true)
{
var tasks = GrabCurrentListOfTasks();
foreach (var task in tasks)
{
collection.Add(task);
await Task.Delay(someShortDelay);
}
}
});
Parallel.ForEach(collection.GetConsumingPartitioner(), task => DoWorkForTask(task));
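If pulling in ParallelExtensionsExtras is not an option, a similar effect can be sketched with only the base class library by letting a fixed number of consumer tasks drain the collection through GetConsumingEnumerable(). This swaps Parallel.ForEach for hand-rolled consumers; ProcessAsync, itemCount and consumerCount are illustrative names.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class BlockingCollectionSketch
{
    // Hypothetical helper: returns how many items the consumers handled.
    public static async Task<int> ProcessAsync(int itemCount, int consumerCount)
    {
        using var queue = new BlockingCollection<int>();
        var processed = new ConcurrentBag<int>();

        // Each consumer blocks until an item arrives and exits once
        // CompleteAdding has been called and the queue has been drained.
        Task[] consumers = Enumerable.Range(0, consumerCount)
            .Select(_ => Task.Run(() =>
            {
                foreach (int item in queue.GetConsumingEnumerable())
                    processed.Add(item); // placeholder for the real work
            }))
            .ToArray();

        // The producer keeps adding work while the consumers are already running.
        for (int i = 0; i < itemCount; i++)
            queue.Add(i);
        queue.CompleteAdding();

        await Task.WhenAll(consumers);
        return processed.Count;
    }

    public static async Task Main()
    {
        Console.WriteLine(await ProcessAsync(1000, 4));
    }
}
```

In the scenario from the question the producer loop would never call CompleteAdding; it would keep polling the database and adding new tasks, and the consumers would simply keep waiting for more.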
Here is an example of an approach you could try. I think you want to get away from Parallel.ForEach and use asynchronous programming instead, because you need to retrieve results as they finish rather than in discrete chunks that could conceivably contain both long-running tasks and tasks that finish very quickly.
This approach uses a simple sequential loop to retrieve results from a list of asynchronous tasks. In this case, you should be safe to use a simple non-thread safe mutable list because all of the mutation of the list happens sequentially in the same thread.
Note that this approach uses Task.WhenAny in a loop which isn't very efficient for large task lists and you should consider an alternative approach in that case. (See this blog: http://blogs.msdn.com/b/pfxteam/archive/2012/08/02/processing-tasks-as-they-complete.aspx)
This example is based on: https://msdn.microsoft.com/en-GB/library/jj155756.aspx
private async Task<ProcessResult> processTask(ProcessTask task)
{
// do something intensive with data
}
private IEnumerable<ProcessTask> GetOutstandingTasks()
{
// retrieve some tasks from db
}
private async Task ProcessAllData()
{
List<Task<ProcessResult>> taskQueue =
GetOutstandingTasks()
.Select(tsk => processTask(tsk))
.ToList(); // grab initial task queue
while(taskQueue.Any()) // iterate while tasks need completing
{
Task<ProcessResult> firstFinishedTask = await Task.WhenAny(taskQueue); // get first to finish
taskQueue.Remove(firstFinishedTask); // remove the one that finished
ProcessResult result = await firstFinishedTask; // get the result
// do something with task result
taskQueue.AddRange(GetOutstandingTasks().Select(tsk => processTask(tsk))); // add more tasks that need performing
}
}

ConcurrentBag.ToObservable() runs once and completes prematurely

I have a static collection, say of tasks to call remote rest api:
static ConcurrentBag<Task<HttpResponseMessage>> _collection = new ConcurrentBag<Task<HttpResponseMessage>>();
static void Main(string[] args)
{
Task.Factory.StartNew(() => Produce());
Task.Factory.StartNew(() => Consume());
Console.ReadKey();
}
One thread adds new items into it:
private static void Produce()
{
while (true)
{
var task = HttpClientFactory.Create().GetAsync("http://example.com");
_collection.Add(task);
Thread.Sleep(500);
}
}
And another thread should process those items:
private static void Consume()
{
_collection.ToObservable()
.Subscribe(
t => Console.WriteLine("++"),
ex => Console.WriteLine(ex.Message),
() => Console.WriteLine("Done"));
}
But it runs only once and completes prematurely. So the output is:
++
Done
It would be interesting if it worked like that... but sadly it doesn't. The ToObservable extension method is defined on the IEnumerable<T> interface, so it takes a point-in-time snapshot of the collection.
You need a collection that can be observed, such as ObservableCollection. With this, you can respond to add events to feed an Rx pipeline (perhaps by wiring the CollectionChanged event up with Observable.FromEventPattern). Bear in mind that this collection doesn't support concurrent adds. Such a technique is one way to "enter the monad" (i.e. obtain an IObservable<T>).
An equivalent approach is to add your request payloads to a Subject. Either way, you can then project them into asynchronous requests. So say (for argument's sake) your Produce signature looked like this:
private static async Task<HttpResponseMessage> Produce(string requestUrl)
Then you might construct an observable to convert the requestUrls to async web requests using your Produce method like so:
var requests = new Subject<string>();
var responses = requests.SelectMany(
x => Observable.FromAsync(() => Produce(x)));
responses.Subscribe(
t => Console.WriteLine("++"),
ex => Console.WriteLine(ex.Message),
() => Console.WriteLine("Done"));
And submit each request with something like:
requests.OnNext("http://myurl");
If you need concurrent adds, see Observable.Synchronize.
If you need to control the thread(s) that handle the responses, use ObserveOn which I wrote a lengthy explanation of here.
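The ObservableCollection route mentioned earlier in this answer could look roughly like the sketch below. It assumes the System.Reactive package; ObserveAdds is a made-up helper name, and the URLs are placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.ObjectModel;
using System.Collections.Specialized;
using System.Linq;
using System.Reactive.Linq;

public static class CollectionObservableSketch
{
    // Hypothetical helper: exposes every item added to the collection
    // (after subscription) as an IObservable<T>.
    public static IObservable<T> ObserveAdds<T>(this ObservableCollection<T> collection) =>
        Observable
            .FromEventPattern<NotifyCollectionChangedEventHandler, NotifyCollectionChangedEventArgs>(
                h => collection.CollectionChanged += h,
                h => collection.CollectionChanged -= h)
            .Where(e => e.EventArgs.Action == NotifyCollectionChangedAction.Add)
            .SelectMany(e => e.EventArgs.NewItems.Cast<T>());

    public static List<string> Demo()
    {
        var urls = new ObservableCollection<string>();
        var seen = new List<string>();

        using (urls.ObserveAdds().Subscribe(seen.Add))
        {
            urls.Add("http://example.com/a");
            urls.Add("http://example.com/b");
        }

        // The subscription has been disposed, so this add is not observed.
        urls.Add("http://example.com/c");
        return seen;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(Environment.NewLine, Demo()));
    }
}
```

Unlike ConcurrentBag, ObservableCollection is not safe for concurrent writers, so all adds in this sketch happen on a single thread.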
