IAsyncEnumerable like a Source for akka streams - c#

I want to use IAsyncEnumerable like a Source for akka streams. But I not found, how do it.
No sutiable method in Source class for this code.
using System.Collections.Generic;
using System.Threading.Tasks;
using Akka.Streams.Dsl;
namespace ConsoleApp1
{
class Program
{
static async Task Main(string[] args)
{
Source.From(await AsyncEnumerable())
.Via(/*some action*/)
//.....
}
private static async IAsyncEnumerable<int> AsyncEnumerable()
{
//some async enumerable
}
}
}
How use IAsyncEnumerbale for Source?

This has been done in the past as a part of Akka.NET Streams contrib package, but since I don't see it there anymore, let's go through on how to implement such source. The topic can be quite long, as:
Akka.NET Streams is really about graph processing - we're talking about many-inputs/many-outputs configurations (in Akka.NET they're called inlets and outlets) with support for cycles in graphs.
Akka.NET is not build on top of .NET async/await or even on top of .NET standard thread pool library - they're both pluggable, which means that the lowest barier is basically using callbacks and encoding what C# compiler sometimes does for us.
Akka.NET streams is capable of both pushing and pulling values between stages/operators. IAsyncEnumerable<T> can only pull data while IObservable<T> can only push it, so we get more expressive power here, but this comes at a cost.
The basics of low level API used to implement custom stages can be found in the docs.
The starter boilerplate looks like this:
public static class AsyncEnumerableExtensions {
// Helper method to change IAsyncEnumerable into Akka.NET Source.
public static Source<T, NotUsed> AsSource<T>(this IAsyncEnumerable<T> source) =>
Source.FromGraph(new AsyncEnumerableSource<T>(source));
}
// Source stage is description of a part of the graph that doesn't consume
// any data, only produce it using a single output channel.
public sealed class AsyncEnumerableSource<T> : GraphStage<SourceShape<T>>
{
private readonly IAsyncEnumerable<T> _enumerable;
public AsyncEnumerableSource(IAsyncEnumerable<T> enumerable)
{
_enumerable = enumerable;
Outlet = new Outlet<T>("asyncenumerable.out");
Shape = new SourceShape<T>(Outlet);
}
public Outlet<T> Outlet { get; }
public override SourceShape<T> Shape { get; }
/// Logic if to a graph stage, what enumerator is to enumerable.
protected override GraphStageLogic CreateLogic(Attributes inheritedAttributes) =>
new Logic(this);
sealed class Logic: OutGraphStageLogic
{
public override void OnPull()
{
// method called whenever a consumer asks for new data
}
public override void OnDownstreamFinish()
{
// method called whenever a consumer stage finishes,used for disposals
}
}
}
As mentioned, we don't use async/await straight away here: even more, calling Logic methods in asynchronous context is unsafe. To make it safe we need to register out methods that may be called from other threads using GetAsyncCallback<T> and call them via returned wrappers. This will ensure, that not data races will happen when executing asynchronous code.
sealed class Logic : OutGraphStageLogic
{
private readonly Outlet<T> _outlet;
// enumerator we'll call for MoveNextAsync, and eventually dispose
private readonly IAsyncEnumerator<T> _enumerator;
// callback called whenever _enumerator.MoveNextAsync completes asynchronously
private readonly Action<Task<bool>> _onMoveNext;
// callback called whenever _enumerator.DisposeAsync completes asynchronously
private readonly Action<Task> _onDisposed;
// cache used for errors thrown by _enumerator.MoveNextAsync, that
// should be rethrown after _enumerator.DisposeAsync
private Exception? _failReason = null;
public Logic(AsyncEnumerableSource<T> source) : base(source.Shape)
{
_outlet = source.Outlet;
_enumerator = source._enumerable.GetAsyncEnumerator();
_onMoveNext = GetAsyncCallback<Task<bool>>(OnMoveNext);
_onDisposed = GetAsyncCallback<Task>(OnDisposed);
}
// ... other methods
}
The last part to do are methods overriden on `Logic:
OnPull used whenever the downstream stage calls for new data. Here we need to call for next element of async enumerator sequence.
OnDownstreamFinish called whenever the downstream stage has finished and will not ask for any new data. It's the place for us to dispose our enumerator.
Thing is these methods are not async/await, while their enumerator's equivalent are. What we basically need to do there is to:
Call corresponding async methods of underlying enumerator (OnPull → MoveNextAsync and OnDownstreamFinish → DisposeAsync).
See, if we can take their results immediately - it's important part that usually is done for us as part of C# compiler in async/await calls.
If not, and we need to wait for the results - call ContinueWith to register our callback wrappers to be called once async methods are done.
sealed class Logic : OutGraphStageLogic
{
// ... constructor and fields
public override void OnPull()
{
var hasNext = _enumerator.MoveNextAsync();
if (hasNext.IsCompletedSuccessfully)
{
// first try short-path: values is returned immediately
if (hasNext.Result)
// check if there was next value and push it downstream
Push(_outlet, _enumerator.Current);
else
// if there was none, we reached end of async enumerable
// and we can dispose it
DisposeAndComplete();
}
else
// we need to wait for the result
hasNext.AsTask().ContinueWith(_onMoveNext);
}
// This method is called when another stage downstream has been completed
public override void OnDownstreamFinish() =>
// dispose enumerator on downstream finish
DisposeAndComplete();
private void DisposeAndComplete()
{
var disposed = _enumerator.DisposeAsync();
if (disposed.IsCompletedSuccessfully)
{
// enumerator disposal completed immediately
if (_failReason is not null)
// if we close this stream in result of error in MoveNextAsync,
// fail the stage
FailStage(_failReason);
else
// we can close the stage with no issues
CompleteStage();
}
else
// we need to await for enumerator to be disposed
disposed.AsTask().ContinueWith(_onDisposed);
}
private void OnMoveNext(Task<bool> task)
{
// since this is callback, it will always be completed, we just need
// to check for exceptions
if (task.IsCompletedSuccessfully)
{
if (task.Result)
// if task returns true, it means we read a value
Push(_outlet, _enumerator.Current);
else
// otherwise there are no more values to read and we can close the source
DisposeAndComplete();
}
else
{
// task either failed or has been cancelled
_failReason = task.Exception as Exception ?? new TaskCanceledException(task);
FailStage(_failReason);
}
}
private void OnDisposed(Task task)
{
if (task.IsCompletedSuccessfully) CompleteStage();
else {
var reason = task.Exception as Exception
?? _failReason
?? new TaskCanceledException(task);
FailStage(reason);
}
}
}

As of Akka.NET v1.4.30 this is now natively supported inside Akka.Streams via the RunAsAsyncEnumerable method:
var input = Enumerable.Range(1, 6).ToList();
var cts = new CancellationTokenSource();
var token = cts.Token;
var asyncEnumerable = Source.From(input).RunAsAsyncEnumerable(Materializer);
var output = input.ToArray();
bool caught = false;
try
{
await foreach (var a in asyncEnumerable.WithCancellation(token))
{
cts.Cancel();
}
}
catch (OperationCanceledException e)
{
caught = true;
}
caught.ShouldBeTrue();
I copied that sample from the Akka.NET test suite, in case you're wondering.

You can also use an existing primitive for streaming large collections of data. Here is an example of using Source.unfoldAsync to stream pages of data - in this case github repositories using Octokit - until there is no more.
var source = Source.UnfoldAsync<int, RepositoryPage>(startPage, page =>
{
var pageTask = client.GetRepositoriesAsync(page, pageSize);
var next = pageTask.ContinueWith(task =>
{
var page = task.Result;
if (page.PageNumber * pageSize > page.Total) return Option<(int, RepositoryPage)>.None;
else return new Option<(int, RepositoryPage)>((page.PageNumber + 1, page));
});
return next;
});
To run
using var sys = ActorSystem.Create("system");
using var mat = sys.Materializer();
int startPage = 1;
int pageSize = 50;
var client = new GitHubClient(new ProductHeaderValue("github-search-app"));
var source = ...
var sink = Sink.ForEach<RepositoryPage>(Console.WriteLine);
var result = source.RunWith(sink, mat);
await result.ContinueWith(_ => sys.Terminate());
class Page<T>
{
public Page(IReadOnlyList<T> contents, int page, long total)
{
Contents = contents;
PageNumber = page;
Total = total;
}
public IReadOnlyList<T> Contents { get; set; } = new List<T>();
public int PageNumber { get; set; }
public long Total { get; set; }
}
class RepositoryPage : Page<Repository>
{
public RepositoryPage(IReadOnlyList<Repository> contents, int page, long total)
: base(contents, page, total)
{
}
public override string ToString() =>
$"Page {PageNumber}\n{string.Join("", Contents.Select(x => x.Name + "\n"))}";
}
static class GitHubClientExtensions
{
public static async Task<RepositoryPage> GetRepositoriesAsync(this GitHubClient client, int page, int size)
{
// specify a search term here
var request = new SearchRepositoriesRequest("bootstrap")
{
Page = page,
PerPage = size
};
var result = await client.Search.SearchRepo(request);
return new RepositoryPage(result.Items, page, result.TotalCount);
}
}

Related

Triggering DynamicData cache update using Reactive Subject

As a caveat I'm a novice with Rx (2 weeks) and have been experimenting with using Rx, RxUI and Roland Pheasant's DynamicData.
I have a service that initially loads data from local persistence and then, upon some user (or system) instruction will contact the server (TriggerServer in the example) to get additional or replacement data. The solution I've come up with uses a Subject and I've come across many a site discussing the pros/cons of using them. Although I understand the basics of hot/cold it's all based on reading rather than real world.
So, using the below as a simplified version, is this 'right' way of going about this problem or is there something I haven't properly understood somewhere?
NB: I'm not sure how important it is, but the actual code is taken from a Xamarin.Forms app, that uses RxUI, the user input being a ReactiveCommand.
Example:
using DynamicData;
using System;
using System.Linq;
using System.Reactive;
using System.Reactive.Disposables;
using System.Reactive.Linq;
using System.Reactive.Subjects;
using System.Threading.Tasks;
public class MyService : IDisposable
{
private CompositeDisposable _cleanup;
private Subject<Unit> _serverSubject = new Subject<Unit>();
public MyService()
{
var data = Initialise().Publish();
AllData = data.AsObservableCache();
_cleanup = new CompositeDisposable(AllData, data.Connect());
}
public IObservableCache<MyData, Guid> AllData { get; }
public void TriggerServer()
{
// This is what I'm not sure about...
_serverSubject.OnNext(Unit.Default);
}
private IObservable<IChangeSet<MyData, Guid>> Initialise()
{
return ObservableChangeSet.Create<MyData, Guid>(async cache =>
{
// inital load - is this okay?
cache.AddOrUpdate(await LoadLocalData());
// is this a valid way of doing this?
var sync = _serverSubject.Select(_ => GetDataFromServer())
.Subscribe(async task =>
{
var data = await task.ConfigureAwait(false);
cache.AddOrUpdate(data);
});
return new CompositeDisposable(sync);
}, d=> d.Id);
}
private IObservable<MyData> LoadLocalData()
{
return Observable.Timer(TimeSpan.FromSeconds(3)).Select(_ => new MyData("localdata"));
}
private async Task<MyData> GetDataFromServer()
{
await Task.Delay(2000).ConfigureAwait(true);
return new MyData("serverdata");
}
public void Dispose()
{
_cleanup?.Dispose();
}
}
public class MyData
{
public MyData(string value)
{
Value = value;
}
public Guid Id { get; } = Guid.NewGuid();
public string Value { get; set; }
}
And a simple Console app to run:
public static class TestProgram
{
public static void Main()
{
var service = new MyService();
service.AllData.Connect()
.Bind(out var myData)
.Subscribe(_=> Console.WriteLine("data in"), ()=> Console.WriteLine("COMPLETE"));
while (Continue())
{
Console.WriteLine("");
Console.WriteLine("");
Console.WriteLine($"Triggering Server Call, current data is: {string.Join(", ", myData.Select(x=> x.Value))}");
service.TriggerServer();
}
}
private static bool Continue()
{
Console.WriteLine("Press any key to call server, x to exit");
var key = Console.ReadKey();
return key.Key != ConsoleKey.X;
}
}
Looks very good for first try with Rx
I would suggest few changes:
1) Remove the Initialize() call from the constructor and make it a public method - helps a lot with unit tests and now you can await it if you need to
public static void Main()
{
var service = new MyService();
service.Initialize();
2) Add Throttle to you trigger - this fixes parallel calls to the server returning the same results
3) Don't do anything that can throw in Subscribe, use Do instead:
var sync = _serverSubject
.Throttle(Timespan.FromSeconds(0.5), RxApp.TaskPoolScheduler) // you can pass a scheduler via arguments, or use TestScheduler in unit tests to make time pass faster
.Do(async _ =>
{
var data = await GetDataFromServer().ConfigureAwait(false); // I just think this is more readable, your way was also correct
cache.AddOrUpdate(data);
})
// .Retry(); // or anything alese to handle failures
.Subscribe();
I'm putting what I've come to as my solution just in case there's others that find this while they're wandering the internets.
I ended up removing the Subjects all together and chaining together several SourceCache, so when one changed it pushed into the other and so on. I've removed some code for brevity:
public class MyService : IDisposable
{
private SourceCache<MyData, Guid> _localCache = new SourceCache<MyData, Guid>(x=> x.Id);
private SourceCache<MyData, Guid> _serverCache = new SourceCache<MyData, Guid>(x=> x.Id);
public MyService()
{
var localdata = _localCache.Connect();
var serverdata = _serverCache.Connect();
var alldata = localdata.Merge(serverdata);
AllData = alldata.AsObservableCache();
}
public IObservableCache<MyData, Guid> AllData { get; }
public IObservable<Unit> TriggerLocal()
{
return LoadLocalAsync().ToObservable();
}
public IObservable<Unit> TriggerServer()
{
return LoadServerAsync().ToObservable();
}
}
EDIT: I've changed this again to remove any chaining of caches - I just manage the one cache internally. Lesson is not to post too early.

Creating an async stream source

I have an expensive method to call for creating a batch of source items:
private Task<List<SourceItem>> GetUnprocessedBatch(int batchSize)
{
//impl
}
I want to populate new items only when there is no item to process(or it falls below a certain threshold). I couldn't figure out which Source method to use so far.
I have implemented a crude stream that would keep returning new items:
public class Stream
{
private readonly Queue<SourceItem> scrapeAttempts;
private int batchSize = 100;
private int minItemCount = 10;
public Stream()
{
scrapeAttempts = new Queue<SourceItem>();
}
public async Task<SourceItem> Next()
{
if (scrapeAttempts.Count < minItemCount)
{
var entryScrapeAttempts = await GetUnprocessedBatch(batchSize);
entryScrapeAttempts.ForEach(attempt => scrapeAttempts.Enqueue(attempt));
}
return scrapeAttempts.Dequeue();
}
}
I expected Source.Task would work but it looks like it calls it only once. How can I create a source for this scenario?
So, conceptually what you want is a Source stage, that fetches elements asynchronously in batches, buffers the batch and propagates events downstream one by one. When the buffer is close to being empty, we want to eagerly call the next fetch on the side thread (but not more than once), so it could complete while we're emptying current batch.
This sort of behavior will require building a custom GraphStage. One that could look like this:
sealed class PreFetch<T> : GraphStage<SourceShape<T>>
{
private readonly int threshold;
private readonly Func<Task<IEnumerable<T>>> fetch;
private readonly Outlet<T> outlet = new Outlet<T>("prefetch");
public PreFetch(int threshold, Func<Task<IEnumerable<T>>> fetch)
{
this.threshold = threshold;
this.fetch = fetch;
this.Shape = new SourceShape<T>(this.outlet);
}
public override SourceShape<T> Shape { get; }
protected override GraphStageLogic CreateLogic(Attributes inheritedAttributes) => new Logic(this);
private sealed class Logic : GraphStageLogic
{
public Logic(PreFetch<T> stage) : base(stage.Shape)
{
// queue for batched elements
var queue = new Queue<T>();
// flag which indicates, that pull from downstream was made,
// but we didn't have any elements at that moment
var wasPulled = false;
// determines if fetch was already called
var fetchInProgress = false;
// in order to cooperate with async calls without data races,
// we need to register async callbacks for success and failure scenarios
var onSuccess = this.GetAsyncCallback<IEnumerable<T>>(batch =>
{
foreach (var item in batch) queue.Enqueue(item);
if (wasPulled)
{
// if pull was requested but not fulfilled, we need to push now, as we have elements
// it assumes that fetch returned non-empty batch
Push(stage.outlet, queue.Dequeue());
wasPulled = false;
}
fetchInProgress = false;
});
var onFailure = this.GetAsyncCallback<Exception>(this.FailStage);
SetHandler(stage.outlet, onPull: () => {
if (queue.Count < stage.threshold && !fetchInProgress)
{
// if queue occupation reached bellow expected capacity
// call fetch on a side thread and handle its result asynchronously
stage.fetch().ContinueWith(task =>
{
// depending on if task was failed or not, we call corresponding callback
if (task.IsFaulted || task.IsCanceled)
onFailure(task.Exception as Exception ?? new TaskCanceledException(task));
else onSuccess(task.Result);
});
fetchInProgress = true;
}
// if queue is empty, we cannot push immediatelly, so we only mark
// that pull request has been made but not fulfilled
if (queue.Count == 0)
wasPulled = true;
else
{
Push(stage.outlet, queue.Dequeue());
wasPulled = false;
}
});
}
}
}

How to implement a continuous producer-consumer pattern inside a Windows Service

Here's what I'm trying to do:
Keep a queue in memory of items that need processed (i.e. IsProcessed = 0)
Every 5 seconds, get unprocessed items from the db, and if they're not already in the queue, add them
Continuous pull items from the queue, process them, and each time an item is processed, update it in the db (IsProcessed = 1)
Do this all "as parallel as possible"
I have a constructor for my service like
public MyService()
{
Ticker.Elapsed += FillQueue;
}
and I start that timer when the service starts like
protected override void OnStart(string[] args)
{
Ticker.Enabled = true;
Task.Run(() => { ConsumeWork(); });
}
and my FillQueue is like
private static async void FillQueue(object source, ElapsedEventArgs e)
{
var items = GetUnprocessedItemsFromDb();
foreach(var item in items)
{
if(!Work.Contains(item))
{
Work.Enqueue(item);
}
}
}
and my ConsumeWork is like
private static void ConsumeWork()
{
while(true)
{
if(Work.Count > 0)
{
var item = Work.Peek();
Process(item);
Work.Dequeue();
}
else
{
Thread.Sleep(500);
}
}
}
However this is probably a naive implementation and I'm wondering whether .NET has any type of class that is exactly what I need for this type of situation.
Though #JSteward' answer is a good start, you can improve it with mixing up the TPL-Dataflow and Rx.NET extensions, as a dataflow block may easily become an observer for your data, and with Rx Timer it will be much less effort for you (Rx.Timer explanation).
We can adjust MSDN article for your needs, like this:
private const int EventIntervalInSeconds = 5;
private const int DueIntervalInSeconds = 60;
var source =
// sequence of Int64 numbers, starting from 0
// https://msdn.microsoft.com/en-us/library/hh229435.aspx
Observable.Timer(
// fire first event after 1 minute waiting
TimeSpan.FromSeconds(DueIntervalInSeconds),
// fire all next events each 5 seconds
TimeSpan.FromSeconds(EventIntervalInSeconds))
// each number will have a timestamp
.Timestamp()
// each time we select some items to process
.SelectMany(GetItemsFromDB)
// filter already added
.Where(i => !_processedItemIds.Contains(i.Id));
var action = new ActionBlock<Item>(ProcessItem, new ExecutionDataflowBlockOptions
{
// we can start as many item processing as processor count
MaxDegreeOfParallelism = Environment.ProcessorCount,
});
IDisposable subscription = source.Subscribe(action.AsObserver());
Also, your check for item being already processed isn't quite accurate, as there is a possibility to item get selected as unprocessed from db right at the time you've finished it's processing, yet didn't update it in database. In this case item will be removed from Queue<T>, and after that added there again by producer, this is why I've added the ConcurrentBag<T> to this solution (HashSet<T> isn't thread-safe):
private static async Task ProcessItem(Item item)
{
if (_processedItemIds.Contains(item.Id))
{
return;
}
_processedItemIds.Add(item.Id);
// actual work here
// save item as processed in database
// we need to wait to ensure item not to appear in queue again
await Task.Delay(TimeSpan.FromSeconds(EventIntervalInSeconds * 2));
// clear the processed cache to reduce memory usage
_processedItemIds.Remove(item.Id);
}
public class Item
{
public Guid Id { get; set; }
}
// temporary cache for items in process
private static ConcurrentBag<Guid> _processedItemIds = new ConcurrentBag<Guid>();
private static IEnumerable<Item> GetItemsFromDB(Timestamped<long> time)
{
// log event timing
Console.WriteLine($"Event # {time.Value} at {time.Timestamp}");
// return items from DB
return new[] { new Item { Id = Guid.NewGuid() } };
}
You can implement cache clean up in other way, for example, start a "GC" timer, which will remove processed items from cache on regular basis.
To stop events and processing items you should Dispose the subscription and, maybe, Complete the ActionBlock:
subscription.Dispose();
action.Complete();
You can find more information about Rx.Net in their guidelines on github.
You could use an ActionBlock to do your processing, it has a built in queue that you can post work to. You can read up on tpl-dataflow here: Intro to TPL-Dataflow also Introduction to Dataflow, Part 1. Finally, this is a quick sample to get you going. I've left out a lot but it should at least get you started.
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
namespace MyWorkProcessor {
public class WorkProcessor {
public WorkProcessor() {
Processor = CreatePipeline();
}
public async Task StartProcessing() {
try {
await Task.Run(() => GetWorkFromDatabase());
} catch (OperationCanceledException) {
//handle cancel
}
}
private CancellationTokenSource cts {
get;
set;
}
private ITargetBlock<WorkItem> Processor {
get;
}
private TimeSpan DatabasePollingFrequency {
get;
} = TimeSpan.FromSeconds(5);
private ITargetBlock<WorkItem> CreatePipeline() {
var options = new ExecutionDataflowBlockOptions() {
BoundedCapacity = 100,
CancellationToken = cts.Token
};
return new ActionBlock<WorkItem>(item => ProcessWork(item), options);
}
private async Task GetWorkFromDatabase() {
while (!cts.IsCancellationRequested) {
var work = await GetWork();
await Processor.SendAsync(work);
await Task.Delay(DatabasePollingFrequency);
}
}
private async Task<WorkItem> GetWork() {
return await Context.GetWork();
}
private void ProcessWork(WorkItem item) {
//do processing
}
}
}

How to implement generic callbacks using the C# Task Parallel Library and IProducerConsumerCollection?

I have a component that submits requests to a web-based API, but these requests must be throttled so as not to contravene the API's data limits. This means that all requests must pass through a queue to control the rate at which they are submitted, but they can (and should) execute concurrently to achieve maximum throughput. Each request must return some data to the calling code at some point in the future when it completes.
I'm struggling to create a nice model to handle the return of data.
Using a BlockingCollection I can't just return a Task<TResult> from the Schedule method, because the enqueuing and dequeuing processes are at either ends of the buffer. So instead I create a RequestItem<TResult> type that contains a callback of the form Action<Task<TResult>>.
The idea is that once an item has been pulled from the queue the callback can be invoked with the started task, but I've lost the generic type parameters by that point and I'm left using reflection and all kinds of nastiness (if it's even possible).
For example:
public class RequestScheduler
{
private readonly BlockingCollection<IRequestItem> _queue = new BlockingCollection<IRequestItem>();
public RequestScheduler()
{
this.Start();
}
// This can't return Task<TResult>, so returns void.
// Instead RequestItem is generic but this poses problems when adding to the queue
public void Schedule<TResult>(RequestItem<TResult> request)
{
_queue.Add(request);
}
private void Start()
{
Task.Factory.StartNew(() =>
{
foreach (var item in _queue.GetConsumingEnumerable())
{
// I want to be able to use the original type parameters here
// is there a nice way without reflection?
// ProcessItem submits an HttpWebRequest
Task.Factory.StartNew(() => ProcessItem(item))
.ContinueWith(t => { item.Callback(t); });
}
});
}
public void Stop()
{
_queue.CompleteAdding();
}
}
public class RequestItem<TResult> : IRequestItem
{
public IOperation<TResult> Operation { get; set; }
public Action<Task<TResult>> Callback { get; set; }
}
How can I continue to buffer my requests but return a Task<TResult> to the client when the request is pulled from the buffer and submitted to the API?
First, you can return Task<TResult> from Schedule(), you just need to use TaskCompletionSource for that.
Second, to get around the genericity issue, you can hide all of it inside (non-generic) Actions. In Schedule(), create an action using a lambda that does exactly what you need. The consuming loop will then execute that action, it doesn't need to know what's inside.
Third, I don't understand why are you starting a new Task in each iteration of the loop. For one, it means you won't actually get any throttling.
With these modifications, the code could look like this:
public class RequestScheduler
{
private readonly BlockingCollection<Action> m_queue = new BlockingCollection<Action>();
public RequestScheduler()
{
this.Start();
}
private void Start()
{
Task.Factory.StartNew(() =>
{
foreach (var action in m_queue.GetConsumingEnumerable())
{
action();
}
}, TaskCreationOptions.LongRunning);
}
public Task<TResult> Schedule<TResult>(IOperation<TResult> operation)
{
var tcs = new TaskCompletionSource<TResult>();
Action action = () =>
{
try
{
tcs.SetResult(ProcessItem(operation));
}
catch (Exception e)
{
tcs.SetException(e);
}
};
m_queue.Add(action);
return tcs.Task;
}
private T ProcessItem<T>(IOperation<T> operation)
{
// whatever
}
}

Async result handle to return to callers

I have a method that queues some work to be executed asynchronously. I'd like to return some sort of handle to the caller that can be polled, waited on, or used to fetch the return value from the operation, but I can't find a class or interface that's suitable for the task.
BackgroundWorker comes close, but it's geared to the case where the worker has its own dedicated thread, which isn't true in my case. IAsyncResult looks promising, but the provided AsyncResult implementation is also unusable for me. Should I implement IAsyncResult myself?
Clarification:
I have a class that conceptually looks like this:
class AsyncScheduler
{
private List<object> _workList = new List<object>();
private bool _finished = false;
public SomeHandle QueueAsyncWork(object workObject)
{
// simplified for the sake of example
_workList.Add(workObject);
return SomeHandle;
}
private void WorkThread()
{
// simplified for the sake of example
while (!_finished)
{
foreach (object workObject in _workList)
{
if (!workObject.IsFinished)
{
workObject.DoSomeWork();
}
}
Thread.Sleep(1000);
}
}
}
The QueueAsyncWork function pushes a work item onto the polling list for a dedicated work thread, of which there will only over be one. My problem is not with writing the QueueAsyncWork function--that's fine. My question is, what do I return to the caller? What should SomeHandle be?
The existing .Net classes for this are geared towards the situation where the asynchronous operation can be encapsulated in a single method call that returns. That's not the case here--all of the work objects do their work on the same thread, and a complete work operation might span multiple calls to workObject.DoSomeWork(). In this case, what's a reasonable approach for offering the caller some handle for progress notification, completion, and getting the final outcome of the operation?
Yes, implement IAsyncResult (or rather, an extended version of it, to provide for progress reporting).
public class WorkObjectHandle : IAsyncResult, IDisposable
{
private int _percentComplete;
private ManualResetEvent _waitHandle;
public int PercentComplete {
get {return _percentComplete;}
set
{
if (value < 0 || value > 100) throw new InvalidArgumentException("Percent complete should be between 0 and 100");
if (_percentComplete = 100) throw new InvalidOperationException("Already complete");
if (value == 100 && Complete != null) Complete(this, new CompleteArgs(WorkObject));
_percentComplete = value;
}
public IWorkObject WorkObject {get; private set;}
public object AsyncState {get {return WorkObject;}}
public bool IsCompleted {get {return _percentComplete == 100;}}
public event EventHandler<CompleteArgs> Complete; // CompleteArgs in a usual pattern
// you may also want to have Progress event
public bool CompletedSynchronously {get {return false;}}
public WaitHandle
{
get
{
// initialize it lazily
if (_waitHandle == null)
{
ManualResetEvent newWaitHandle = new ManualResetEvent(false);
if (Interlocked.CompareExchange(ref _waitHandle, newWaitHandle, null) != null)
newWaitHandle.Dispose();
}
return _waitHandle;
}
}
public void Dispose()
{
if (_waitHandle != null)
_waitHandle.Dispose();
// dispose _workObject too, if needed
}
public WorkObjectHandle(IWorkObject workObject)
{
WorkObject = workObject;
_percentComplete = 0;
}
}
public class AsyncScheduler
{
private Queue<WorkObjectHandle> _workQueue = new Queue<WorkObjectHandle>();
private bool _finished = false;
public WorkObjectHandle QueueAsyncWork(IWorkObject workObject)
{
var handle = new WorkObjectHandle(workObject);
lock(_workQueue)
{
_workQueue.Enqueue(handle);
}
return handle;
}
private void WorkThread()
{
// simplified for the sake of example
while (!_finished)
{
WorkObjectHandle handle;
lock(_workQueue)
{
if (_workQueue.Count == 0) break;
handle = _workQueue.Dequeue();
}
try
{
var workObject = handle.WorkObject;
// do whatever you want with workObject, set handle.PercentCompleted, etc.
}
finally
{
handle.Dispose();
}
}
}
}
If I understand correctly you have a collection of work objects (IWorkObject) that each complete a task via multiple calls to a DoSomeWork method. When an IWorkObject object has finished its work you'd like to respond to that somehow and during the process you'd like to respond to any reported progress?
In that case I'd suggest you take a slightly different approach. You could take a look at the Parallel Extension framework (blog). Using the framework, you could write something like this:
public void QueueWork(IWorkObject workObject)
{
Task.TaskFactory.StartNew(() =>
{
while (!workObject.Finished)
{
int progress = workObject.DoSomeWork();
DoSomethingWithReportedProgress(workObject, progress);
}
WorkObjectIsFinished(workObject);
});
}
Some things to note:
QueueWork now returns void. The reason for this is that the actions that occur when progress is reported or when the task completes have become part of the thread that executes the work. You could of course return the Task that the factory creates and return that from the method (to enable polling for example).
The progress-reporting and finish-handling are now part of the thread because you should always avoid polling when possible. Polling is more expensive because usually you either poll too frequently (too early) or not often enough (too late). There is no reason you can't report on the progress and finishing of the task from within the thread that is running the task.
The above could also be implemented using the (lower level) ThreadPool.QueueUserWorkItem method.
Using QueueUserWorkItem:
public void QueueWork(IWorkObject workObject)
{
ThreadPool.QueueUserWorkItem(() =>
{
while (!workObject.Finished)
{
int progress = workObject.DoSomeWork();
DoSomethingWithReportedProgress(workObject, progress);
}
WorkObjectIsFinished(workObject);
});
}
The WorkObject class can contain the properties that need to be tracked.
public class WorkObject
{
public PercentComplete { get; private set; }
public IsFinished { get; private set; }
public void DoSomeWork()
{
// work done here
this.PercentComplete = 50;
// some more work done here
this.PercentComplete = 100;
this.IsFinished = true;
}
}
Then in your example:
Change the collection from a List to a Dictionary that can hold Guid values (or any other means of uniquely identifying the value).
Expose the correct WorkObject's properties by having the caller pass the Guid that it received from QueueAsyncWork.
I'm assuming that you'll start WorkThread asynchronously (albeit, the only asynchronous thread); plus, you'll have to make retrieving the dictionary values and WorkObject properties thread-safe.
private Dictionary<Guid, WorkObject> _workList =
new Dictionary<Guid, WorkObject>();
private bool _finished = false;
public Guid QueueAsyncWork(WorkObject workObject)
{
Guid guid = Guid.NewGuid();
// simplified for the sake of example
_workList.Add(guid, workObject);
return guid;
}
private void WorkThread()
{
// simplified for the sake of example
while (!_finished)
{
foreach (WorkObject workObject in _workList)
{
if (!workObject.IsFinished)
{
workObject.DoSomeWork();
}
}
Thread.Sleep(1000);
}
}
// an example of getting the WorkObject's property
public int GetPercentComplete(Guid guid)
{
WorkObject workObject = null;
if (!_workList.TryGetValue(guid, out workObject)
throw new Exception("Unable to find Guid");
return workObject.PercentComplete;
}
The simplest way to do this is described here. Suppose you have a method string DoSomeWork(int). You then create a delegate of the correct type, for example:
Func<int, string> myDelegate = DoSomeWork;
Then you call the BeginInvoke method on the delegate:
int parameter = 10;
myDelegate.BeginInvoke(parameter, Callback, null);
The Callback delegate will be called once your asynchronous call has completed. You can define this method as follows:
void Callback(IAsyncResult result)
{
var asyncResult = (AsyncResult) result;
var #delegate = (Func<int, string>) asyncResult.AsyncDelegate;
string methodReturnValue = #delegate.EndInvoke(result);
}
Using the described scenario, you can also poll for results or wait on them. Take a look at the url I provided for more info.
Regards,
Ronald
If you don't want to use async callbacks, you can use an explicit WaitHandle, such as a ManualResetEvent:
public abstract class WorkObject : IDispose
{
ManualResetEvent _waitHandle = new ManualResetEvent(false);
public void DoSomeWork()
{
try
{
this.DoSomeWorkOverride();
}
finally
{
_waitHandle.Set();
}
}
protected abstract DoSomeWorkOverride();
public void WaitForCompletion()
{
_waitHandle.WaitOne();
}
public void Dispose()
{
_waitHandle.Dispose();
}
}
And in your code you could say
using (var workObject = new SomeConcreteWorkObject())
{
asyncScheduler.QueueAsyncWork(workObject);
workObject.WaitForCompletion();
}
Don't forget to call Dispose on your workObject though.
You can always use alternate implementations which create a wrapper like this for every work object, and who call _waitHandle.Dispose() in WaitForCompletion(), you can lazily instantiate the wait handle (careful: race conditions ahead), etc. (That's pretty much what BeginInvoke does for delegates.)

Categories

Resources