Here's what I'm trying to do:
Keep a queue in memory of items that need processed (i.e. IsProcessed = 0)
Every 5 seconds, get unprocessed items from the db, and if they're not already in the queue, add them
Continuous pull items from the queue, process them, and each time an item is processed, update it in the db (IsProcessed = 1)
Do this all "as parallel as possible"
I have a constructor for my service like
public MyService()
{
Ticker.Elapsed += FillQueue;
}
and I start that timer when the service starts like
protected override void OnStart(string[] args)
{
Ticker.Enabled = true;
Task.Run(() => { ConsumeWork(); });
}
and my FillQueue is like
private static async void FillQueue(object source, ElapsedEventArgs e)
{
var items = GetUnprocessedItemsFromDb();
foreach(var item in items)
{
if(!Work.Contains(item))
{
Work.Enqueue(item);
}
}
}
and my ConsumeWork is like
private static void ConsumeWork()
{
while(true)
{
if(Work.Count > 0)
{
var item = Work.Peek();
Process(item);
Work.Dequeue();
}
else
{
Thread.Sleep(500);
}
}
}
However this is probably a naive implementation and I'm wondering whether .NET has any type of class that is exactly what I need for this type of situation.
Though #JSteward' answer is a good start, you can improve it with mixing up the TPL-Dataflow and Rx.NET extensions, as a dataflow block may easily become an observer for your data, and with Rx Timer it will be much less effort for you (Rx.Timer explanation).
We can adjust MSDN article for your needs, like this:
private const int EventIntervalInSeconds = 5;
private const int DueIntervalInSeconds = 60;
var source =
// sequence of Int64 numbers, starting from 0
// https://msdn.microsoft.com/en-us/library/hh229435.aspx
Observable.Timer(
// fire first event after 1 minute waiting
TimeSpan.FromSeconds(DueIntervalInSeconds),
// fire all next events each 5 seconds
TimeSpan.FromSeconds(EventIntervalInSeconds))
// each number will have a timestamp
.Timestamp()
// each time we select some items to process
.SelectMany(GetItemsFromDB)
// filter already added
.Where(i => !_processedItemIds.Contains(i.Id));
var action = new ActionBlock<Item>(ProcessItem, new ExecutionDataflowBlockOptions
{
// we can start as many item processing as processor count
MaxDegreeOfParallelism = Environment.ProcessorCount,
});
IDisposable subscription = source.Subscribe(action.AsObserver());
Also, your check for item being already processed isn't quite accurate, as there is a possibility to item get selected as unprocessed from db right at the time you've finished it's processing, yet didn't update it in database. In this case item will be removed from Queue<T>, and after that added there again by producer, this is why I've added the ConcurrentBag<T> to this solution (HashSet<T> isn't thread-safe):
private static async Task ProcessItem(Item item)
{
if (_processedItemIds.Contains(item.Id))
{
return;
}
_processedItemIds.Add(item.Id);
// actual work here
// save item as processed in database
// we need to wait to ensure item not to appear in queue again
await Task.Delay(TimeSpan.FromSeconds(EventIntervalInSeconds * 2));
// clear the processed cache to reduce memory usage
_processedItemIds.Remove(item.Id);
}
public class Item
{
public Guid Id { get; set; }
}
// temporary cache for items in process
private static ConcurrentBag<Guid> _processedItemIds = new ConcurrentBag<Guid>();
private static IEnumerable<Item> GetItemsFromDB(Timestamped<long> time)
{
// log event timing
Console.WriteLine($"Event # {time.Value} at {time.Timestamp}");
// return items from DB
return new[] { new Item { Id = Guid.NewGuid() } };
}
You can implement cache clean up in other way, for example, start a "GC" timer, which will remove processed items from cache on regular basis.
To stop events and processing items you should Dispose the subscription and, maybe, Complete the ActionBlock:
subscription.Dispose();
action.Complete();
You can find more information about Rx.Net in their guidelines on github.
You could use an ActionBlock to do your processing, it has a built in queue that you can post work to. You can read up on tpl-dataflow here: Intro to TPL-Dataflow also Introduction to Dataflow, Part 1. Finally, this is a quick sample to get you going. I've left out a lot but it should at least get you started.
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
namespace MyWorkProcessor {
public class WorkProcessor {
public WorkProcessor() {
Processor = CreatePipeline();
}
public async Task StartProcessing() {
try {
await Task.Run(() => GetWorkFromDatabase());
} catch (OperationCanceledException) {
//handle cancel
}
}
private CancellationTokenSource cts {
get;
set;
}
private ITargetBlock<WorkItem> Processor {
get;
}
private TimeSpan DatabasePollingFrequency {
get;
} = TimeSpan.FromSeconds(5);
private ITargetBlock<WorkItem> CreatePipeline() {
var options = new ExecutionDataflowBlockOptions() {
BoundedCapacity = 100,
CancellationToken = cts.Token
};
return new ActionBlock<WorkItem>(item => ProcessWork(item), options);
}
private async Task GetWorkFromDatabase() {
while (!cts.IsCancellationRequested) {
var work = await GetWork();
await Processor.SendAsync(work);
await Task.Delay(DatabasePollingFrequency);
}
}
private async Task<WorkItem> GetWork() {
return await Context.GetWork();
}
private void ProcessWork(WorkItem item) {
//do processing
}
}
}
Related
I want to use IAsyncEnumerable like a Source for akka streams. But I not found, how do it.
No sutiable method in Source class for this code.
using System.Collections.Generic;
using System.Threading.Tasks;
using Akka.Streams.Dsl;
namespace ConsoleApp1
{
class Program
{
static async Task Main(string[] args)
{
Source.From(await AsyncEnumerable())
.Via(/*some action*/)
//.....
}
private static async IAsyncEnumerable<int> AsyncEnumerable()
{
//some async enumerable
}
}
}
How use IAsyncEnumerbale for Source?
This has been done in the past as a part of Akka.NET Streams contrib package, but since I don't see it there anymore, let's go through on how to implement such source. The topic can be quite long, as:
Akka.NET Streams is really about graph processing - we're talking about many-inputs/many-outputs configurations (in Akka.NET they're called inlets and outlets) with support for cycles in graphs.
Akka.NET is not build on top of .NET async/await or even on top of .NET standard thread pool library - they're both pluggable, which means that the lowest barier is basically using callbacks and encoding what C# compiler sometimes does for us.
Akka.NET streams is capable of both pushing and pulling values between stages/operators. IAsyncEnumerable<T> can only pull data while IObservable<T> can only push it, so we get more expressive power here, but this comes at a cost.
The basics of low level API used to implement custom stages can be found in the docs.
The starter boilerplate looks like this:
public static class AsyncEnumerableExtensions {
// Helper method to change IAsyncEnumerable into Akka.NET Source.
public static Source<T, NotUsed> AsSource<T>(this IAsyncEnumerable<T> source) =>
Source.FromGraph(new AsyncEnumerableSource<T>(source));
}
// Source stage is description of a part of the graph that doesn't consume
// any data, only produce it using a single output channel.
public sealed class AsyncEnumerableSource<T> : GraphStage<SourceShape<T>>
{
private readonly IAsyncEnumerable<T> _enumerable;
public AsyncEnumerableSource(IAsyncEnumerable<T> enumerable)
{
_enumerable = enumerable;
Outlet = new Outlet<T>("asyncenumerable.out");
Shape = new SourceShape<T>(Outlet);
}
public Outlet<T> Outlet { get; }
public override SourceShape<T> Shape { get; }
/// Logic if to a graph stage, what enumerator is to enumerable.
protected override GraphStageLogic CreateLogic(Attributes inheritedAttributes) =>
new Logic(this);
sealed class Logic: OutGraphStageLogic
{
public override void OnPull()
{
// method called whenever a consumer asks for new data
}
public override void OnDownstreamFinish()
{
// method called whenever a consumer stage finishes,used for disposals
}
}
}
As mentioned, we don't use async/await straight away here: even more, calling Logic methods in asynchronous context is unsafe. To make it safe we need to register out methods that may be called from other threads using GetAsyncCallback<T> and call them via returned wrappers. This will ensure, that not data races will happen when executing asynchronous code.
sealed class Logic : OutGraphStageLogic
{
private readonly Outlet<T> _outlet;
// enumerator we'll call for MoveNextAsync, and eventually dispose
private readonly IAsyncEnumerator<T> _enumerator;
// callback called whenever _enumerator.MoveNextAsync completes asynchronously
private readonly Action<Task<bool>> _onMoveNext;
// callback called whenever _enumerator.DisposeAsync completes asynchronously
private readonly Action<Task> _onDisposed;
// cache used for errors thrown by _enumerator.MoveNextAsync, that
// should be rethrown after _enumerator.DisposeAsync
private Exception? _failReason = null;
public Logic(AsyncEnumerableSource<T> source) : base(source.Shape)
{
_outlet = source.Outlet;
_enumerator = source._enumerable.GetAsyncEnumerator();
_onMoveNext = GetAsyncCallback<Task<bool>>(OnMoveNext);
_onDisposed = GetAsyncCallback<Task>(OnDisposed);
}
// ... other methods
}
The last part to do are methods overriden on `Logic:
OnPull used whenever the downstream stage calls for new data. Here we need to call for next element of async enumerator sequence.
OnDownstreamFinish called whenever the downstream stage has finished and will not ask for any new data. It's the place for us to dispose our enumerator.
Thing is these methods are not async/await, while their enumerator's equivalent are. What we basically need to do there is to:
Call corresponding async methods of underlying enumerator (OnPull → MoveNextAsync and OnDownstreamFinish → DisposeAsync).
See, if we can take their results immediately - it's important part that usually is done for us as part of C# compiler in async/await calls.
If not, and we need to wait for the results - call ContinueWith to register our callback wrappers to be called once async methods are done.
sealed class Logic : OutGraphStageLogic
{
// ... constructor and fields
public override void OnPull()
{
var hasNext = _enumerator.MoveNextAsync();
if (hasNext.IsCompletedSuccessfully)
{
// first try short-path: values is returned immediately
if (hasNext.Result)
// check if there was next value and push it downstream
Push(_outlet, _enumerator.Current);
else
// if there was none, we reached end of async enumerable
// and we can dispose it
DisposeAndComplete();
}
else
// we need to wait for the result
hasNext.AsTask().ContinueWith(_onMoveNext);
}
// This method is called when another stage downstream has been completed
public override void OnDownstreamFinish() =>
// dispose enumerator on downstream finish
DisposeAndComplete();
private void DisposeAndComplete()
{
var disposed = _enumerator.DisposeAsync();
if (disposed.IsCompletedSuccessfully)
{
// enumerator disposal completed immediately
if (_failReason is not null)
// if we close this stream in result of error in MoveNextAsync,
// fail the stage
FailStage(_failReason);
else
// we can close the stage with no issues
CompleteStage();
}
else
// we need to await for enumerator to be disposed
disposed.AsTask().ContinueWith(_onDisposed);
}
private void OnMoveNext(Task<bool> task)
{
// since this is callback, it will always be completed, we just need
// to check for exceptions
if (task.IsCompletedSuccessfully)
{
if (task.Result)
// if task returns true, it means we read a value
Push(_outlet, _enumerator.Current);
else
// otherwise there are no more values to read and we can close the source
DisposeAndComplete();
}
else
{
// task either failed or has been cancelled
_failReason = task.Exception as Exception ?? new TaskCanceledException(task);
FailStage(_failReason);
}
}
private void OnDisposed(Task task)
{
if (task.IsCompletedSuccessfully) CompleteStage();
else {
var reason = task.Exception as Exception
?? _failReason
?? new TaskCanceledException(task);
FailStage(reason);
}
}
}
As of Akka.NET v1.4.30 this is now natively supported inside Akka.Streams via the RunAsAsyncEnumerable method:
var input = Enumerable.Range(1, 6).ToList();
var cts = new CancellationTokenSource();
var token = cts.Token;
var asyncEnumerable = Source.From(input).RunAsAsyncEnumerable(Materializer);
var output = input.ToArray();
bool caught = false;
try
{
await foreach (var a in asyncEnumerable.WithCancellation(token))
{
cts.Cancel();
}
}
catch (OperationCanceledException e)
{
caught = true;
}
caught.ShouldBeTrue();
I copied that sample from the Akka.NET test suite, in case you're wondering.
You can also use an existing primitive for streaming large collections of data. Here is an example of using Source.unfoldAsync to stream pages of data - in this case github repositories using Octokit - until there is no more.
var source = Source.UnfoldAsync<int, RepositoryPage>(startPage, page =>
{
var pageTask = client.GetRepositoriesAsync(page, pageSize);
var next = pageTask.ContinueWith(task =>
{
var page = task.Result;
if (page.PageNumber * pageSize > page.Total) return Option<(int, RepositoryPage)>.None;
else return new Option<(int, RepositoryPage)>((page.PageNumber + 1, page));
});
return next;
});
To run
using var sys = ActorSystem.Create("system");
using var mat = sys.Materializer();
int startPage = 1;
int pageSize = 50;
var client = new GitHubClient(new ProductHeaderValue("github-search-app"));
var source = ...
var sink = Sink.ForEach<RepositoryPage>(Console.WriteLine);
var result = source.RunWith(sink, mat);
await result.ContinueWith(_ => sys.Terminate());
class Page<T>
{
public Page(IReadOnlyList<T> contents, int page, long total)
{
Contents = contents;
PageNumber = page;
Total = total;
}
public IReadOnlyList<T> Contents { get; set; } = new List<T>();
public int PageNumber { get; set; }
public long Total { get; set; }
}
class RepositoryPage : Page<Repository>
{
public RepositoryPage(IReadOnlyList<Repository> contents, int page, long total)
: base(contents, page, total)
{
}
public override string ToString() =>
$"Page {PageNumber}\n{string.Join("", Contents.Select(x => x.Name + "\n"))}";
}
static class GitHubClientExtensions
{
public static async Task<RepositoryPage> GetRepositoriesAsync(this GitHubClient client, int page, int size)
{
// specify a search term here
var request = new SearchRepositoriesRequest("bootstrap")
{
Page = page,
PerPage = size
};
var result = await client.Search.SearchRepo(request);
return new RepositoryPage(result.Items, page, result.TotalCount);
}
}
Suppose I am provided with an event producer API consisting of Start(), Pause(), and Resume() methods, and an ItemAvailable event. The producer itself is external code, and I have no control over its threading. A few items may still come through after Pause() is called (the producer is actually remote, so items may already be in flight over the network).
Suppose also that I am writing consumer code, where consumption may be slower than production.
Critical requirements are
The consumer event handler must not block the producer thread, and
All events must be processed (no data can be dropped).
I introduce a buffer into the consumer to smooth out some burstiness. But in the case of extended burstiness, I want to call Producer.Pause(), and then Resume() at an appropriate time, to avoid running out of memory at the consumer side.
I have a solution making use of Interlocked to increment and decrement a counter, which is compared to a threshold to decide whether it is time to Pause or Resume.
Question: Is there a better solution than the Interlocked counter (int current in the code below), in terms of efficiency (and elegance)?
Updated MVP (no longer bounces off the limiter):
namespace Experiments
{
internal class Program
{
// simple external producer API for demo purposes
private class Producer
{
public void Pause(int i) { _blocker.Reset(); Console.WriteLine($"paused at {i}"); }
public void Resume(int i) { _blocker.Set(); Console.WriteLine($"resumed at {i}"); }
public async Task Start()
{
await Task.Run
(
() =>
{
for (int i = 0; i < 10000; i++)
{
_blocker.Wait();
ItemAvailable?.Invoke(this, i);
}
}
);
}
public event EventHandler<int> ItemAvailable;
private ManualResetEventSlim _blocker = new(true);
}
private static async Task Main(string[] args)
{
var p = new Producer();
var buffer = Channel.CreateUnbounded<int>(new UnboundedChannelOptions { SingleWriter = true });
int threshold = 1000;
int resumeAt = 10;
int current = 0;
int paused = 0;
p.ItemAvailable += (_, i) =>
{
if (Interlocked.Increment(ref current) >= threshold
&& Interlocked.CompareExchange(ref paused, 0, 1) == 0
) p.Pause(i);
buffer.Writer.TryWrite(i);
};
var processor = Task.Run
(
async () =>
{
await foreach (int i in buffer.Reader.ReadAllAsync())
{
Console.WriteLine($"processing {i}");
await Task.Delay(10);
if
(
Interlocked.Decrement(ref current) < resumeAt
&& Interlocked.CompareExchange(ref paused, 1, 0) == 1
) p.Resume(i);
}
}
);
p.Start();
await processor;
}
}
}
If you are aiming at elegance, you could consider baking the pressure-awareness functionality inside a custom Channel<T>. Below is a PressureAwareUnboundedChannel<T> class that derives from the Channel<T>. It offers all the functionality of the base class, plus it emits notifications when the channel becomes under pressure, and when the pressure is relieved. The notifications are pushed through an IProgress<bool> instance, that emits a true value when the pressure surpasses a specific high-threshold, and a false value when the pressure drops under a specific low-threshold.
public sealed class PressureAwareUnboundedChannel<T> : Channel<T>
{
private readonly Channel<T> _channel;
private readonly int _highPressureThreshold;
private readonly int _lowPressureThreshold;
private readonly IProgress<bool> _pressureProgress;
private int _pressureState = 0; // 0: no pressure, 1: under pressure
public PressureAwareUnboundedChannel(int lowPressureThreshold,
int highPressureThreshold, IProgress<bool> pressureProgress)
{
if (lowPressureThreshold < 0)
throw new ArgumentOutOfRangeException(nameof(lowPressureThreshold));
if (highPressureThreshold < lowPressureThreshold)
throw new ArgumentOutOfRangeException(nameof(highPressureThreshold));
if (pressureProgress == null)
throw new ArgumentNullException(nameof(pressureProgress));
_highPressureThreshold = highPressureThreshold;
_lowPressureThreshold = lowPressureThreshold;
_pressureProgress = pressureProgress;
_channel = Channel.CreateBounded<T>(Int32.MaxValue);
this.Writer = new ChannelWriter(this);
this.Reader = new ChannelReader(this);
}
private class ChannelWriter : ChannelWriter<T>
{
private readonly PressureAwareUnboundedChannel<T> _parent;
public ChannelWriter(PressureAwareUnboundedChannel<T> parent)
=> _parent = parent;
public override bool TryComplete(Exception error = null)
=> _parent._channel.Writer.TryComplete(error);
public override bool TryWrite(T item)
{
bool success = _parent._channel.Writer.TryWrite(item);
if (success) _parent.SignalWriteOrRead();
return success;
}
public override ValueTask<bool> WaitToWriteAsync(
CancellationToken cancellationToken = default)
=> _parent._channel.Writer.WaitToWriteAsync(cancellationToken);
}
private class ChannelReader : ChannelReader<T>
{
private readonly PressureAwareUnboundedChannel<T> _parent;
public ChannelReader(PressureAwareUnboundedChannel<T> parent)
=> _parent = parent;
public override Task Completion => _parent._channel.Reader.Completion;
public override bool CanCount => _parent._channel.Reader.CanCount;
public override int Count => _parent._channel.Reader.Count;
public override bool TryRead(out T item)
{
bool success = _parent._channel.Reader.TryRead(out item);
if (success) _parent.SignalWriteOrRead();
return success;
}
public override ValueTask<bool> WaitToReadAsync(
CancellationToken cancellationToken = default)
=> _parent._channel.Reader.WaitToReadAsync(cancellationToken);
}
private void SignalWriteOrRead()
{
var currentCount = _channel.Reader.Count;
bool underPressure;
if (currentCount > _highPressureThreshold)
underPressure = true;
else if (currentCount <= _lowPressureThreshold)
underPressure = false;
else
return;
int newState = underPressure ? 1 : 0;
int oldState = underPressure ? 0 : 1;
if (Interlocked.CompareExchange(
ref _pressureState, newState, oldState) != oldState) return;
_pressureProgress.Report(underPressure);
}
}
The encapsulated Channel<T> is actually a bounded channel, having capacity equal to the maximum Int32 value, because only bounded channels implement the Reader.Count property.¹
Usage example:
var progress = new Progress<bool>(underPressure =>
{
if (underPressure) Producer.Pause(); else Producer.Resume();
});
var channel = new PressureAwareUnboundedChannel<Item>(500, 1000, progress);
In this example the Producer will be paused when the items stored inside the channel become more than 1000, and it will be resumed when the number of items drops to 500 or less.
The Progress<bool> action is invoked on the context that was captured at the time of the Progress<bool>'s creation. So if you create it on the UI thread of a GUI application, the action will be invoked on the UI thread, otherwise in will be invoked on the ThreadPool. In the later case there will be no protection against overlapping invocations of the Action<bool>. If the Producer class is not thread-safe, you'll have to add synchronization inside the handler. Example:
var progress = new Progress<bool>(underPressure =>
{
lock (Producer) if (underPressure) Producer.Pause(); else Producer.Resume();
});
¹ Actually unbounded channels also support the Count property, unless they are configured with the SingleReader option.
This is relatively straightforward if you realize there are three "steps" in this problem.
The first step ToChannel(Producer) receives messages from the producer.
The next step, PauseAt signals pause() if there are too many pending items in the out panel.
The third step, ResumeAt signals resume() if its input channel has a count less than a threshold.
It's easy to combine all three steps using typical Channel patterns.
producer.ToChannel(token)
.PauseAt(1000,()=>producer.PauseAsync(),token)
.ResumeAt(10,()=>producer.ResumeAsync(),token)
....
Or a single, generic TrafficJam method:
static ChannelReader<T> TrafficJam(this ChannelReader<T> source,
int pauseAt,int resumeAt,
Func<Task> pause,Func<Task> resume,
CancellationToken token=default)
{
return source
.PauseAt(pauseAt,pause,token)
.ResumeAt(resumeAt,resume,token);
}
ToChannel
The first step is relatively straightforward, an unbounded Channel source based from the producer's events.
static ChannelReader<int> ToChannel(this Producer producer,
CancellationToken token=default)
{
Channel<int> channel=Channel.CreateUnbounded();
var writer=channel.Writer;
producer.ItemAvailable += OnItem;
return channel;
void OnItem(object sender, int item)
{
writer.TryWriteAsync(item);
if(token.IsCancellationRequested)
{
producer.ItemAvailable-=OnItem;
writer.Complete();
}
}
}
The only unusual part is using a local function to allow disabling the event handler and completing the output channel when cancellation is requested
That's enough to queue all the incoming items. ToChannel doesn't bother with starting, pausing etc, that's not its job.
PauseAt
The next function, PauseAt, uses a BoundedChannel to implement the threshold. It forwards incoming messages if it can. If the channel can't accept any more messages it calls the pause callback and awaits until it can resume forwarding :
static ChannelReader<T> PauseAt(this ChannelReader<T> source,
int threshold, Func<Task> pause,
CancellationToken token=default)
{
Channel<T> channel=Channel.CreateBounded(threshold);
var writer=channel.Writer;
_ = Task.Run(async ()=>
await foreach(var msg in source.ReadAllAsync(token))
{
if(writer.CanWrite())
{
await writer.WriteAsync(msg);
}
else
{
await pause();
//Wait until we can post again
await writer.WriteAsync(msg);
}
}
},token)
.ContinueWith(t=>writer.TryComplete(t.Exception));
return channel;
}
ResumeAt
The final step, ResumeAt, calls resume() if its input was previously above the threshold and now has fewer items.
If the input isn't bounded, it just forwards all messages.
static ChannelReader<T> ResumeAt(this ChannelReader<T> source,
int resumeAt, Func<Task> resume,
CancellationToken token=default)
{
Channel<T> channel=Channel.CreateUnbounded();
var writer=channel.Writer;
_ = Task.Run(async ()=>{
bool above=false;
await foreach(var msg in source.ReadAllAsync(token))
{
await writer.WriteAsync(msg);
//Do nothing if the source isn't bounded
if(source.CanCount)
{
if(above && source.Count<=resumeAt)
{
await resume();
above=false;
}
above=source.Count>resumeAt;
}
}
},token)
.ContinueWith(t=>writer.TryComplete(t.Exception));
return channel;
}
Since only a single thread is used, we can keep count of the previous count. and whether it was above or below the threshold.
Combining Pause and Resume
Since Pause and Resume work with just channels, they can be combined into a single method :
static ChannelReader<T> TrafficJam(this ChannelReader<T> source,
int pauseAt,int resumeAt,
Func<Task> pause,Func<Task> resume,
CancellationToken token=default)
{
return source.PauseAt(pauseAt,pause,token)
.ResumeAt(resumeAt,resume,token);
}
I have an expensive method to call for creating a batch of source items:
private Task<List<SourceItem>> GetUnprocessedBatch(int batchSize)
{
//impl
}
I want to populate new items only when there is no item to process(or it falls below a certain threshold). I couldn't figure out which Source method to use so far.
I have implemented a crude stream that would keep returning new items:
public class Stream
{
private readonly Queue<SourceItem> scrapeAttempts;
private int batchSize = 100;
private int minItemCount = 10;
public Stream()
{
scrapeAttempts = new Queue<SourceItem>();
}
public async Task<SourceItem> Next()
{
if (scrapeAttempts.Count < minItemCount)
{
var entryScrapeAttempts = await GetUnprocessedBatch(batchSize);
entryScrapeAttempts.ForEach(attempt => scrapeAttempts.Enqueue(attempt));
}
return scrapeAttempts.Dequeue();
}
}
I expected Source.Task would work but it looks like it calls it only once. How can I create a source for this scenario?
So, conceptually what you want is a Source stage, that fetches elements asynchronously in batches, buffers the batch and propagates events downstream one by one. When the buffer is close to being empty, we want to eagerly call the next fetch on the side thread (but not more than once), so it could complete while we're emptying current batch.
This sort of behavior will require building a custom GraphStage. One that could look like this:
sealed class PreFetch<T> : GraphStage<SourceShape<T>>
{
private readonly int threshold;
private readonly Func<Task<IEnumerable<T>>> fetch;
private readonly Outlet<T> outlet = new Outlet<T>("prefetch");
public PreFetch(int threshold, Func<Task<IEnumerable<T>>> fetch)
{
this.threshold = threshold;
this.fetch = fetch;
this.Shape = new SourceShape<T>(this.outlet);
}
public override SourceShape<T> Shape { get; }
protected override GraphStageLogic CreateLogic(Attributes inheritedAttributes) => new Logic(this);
private sealed class Logic : GraphStageLogic
{
public Logic(PreFetch<T> stage) : base(stage.Shape)
{
// queue for batched elements
var queue = new Queue<T>();
// flag which indicates, that pull from downstream was made,
// but we didn't have any elements at that moment
var wasPulled = false;
// determines if fetch was already called
var fetchInProgress = false;
// in order to cooperate with async calls without data races,
// we need to register async callbacks for success and failure scenarios
var onSuccess = this.GetAsyncCallback<IEnumerable<T>>(batch =>
{
foreach (var item in batch) queue.Enqueue(item);
if (wasPulled)
{
// if pull was requested but not fulfilled, we need to push now, as we have elements
// it assumes that fetch returned non-empty batch
Push(stage.outlet, queue.Dequeue());
wasPulled = false;
}
fetchInProgress = false;
});
var onFailure = this.GetAsyncCallback<Exception>(this.FailStage);
SetHandler(stage.outlet, onPull: () => {
if (queue.Count < stage.threshold && !fetchInProgress)
{
// if queue occupation reached bellow expected capacity
// call fetch on a side thread and handle its result asynchronously
stage.fetch().ContinueWith(task =>
{
// depending on if task was failed or not, we call corresponding callback
if (task.IsFaulted || task.IsCanceled)
onFailure(task.Exception as Exception ?? new TaskCanceledException(task));
else onSuccess(task.Result);
});
fetchInProgress = true;
}
// if queue is empty, we cannot push immediatelly, so we only mark
// that pull request has been made but not fulfilled
if (queue.Count == 0)
wasPulled = true;
else
{
Push(stage.outlet, queue.Dequeue());
wasPulled = false;
}
});
}
}
}
I working on real-time search. At this moment on property setter which is bounded to edit text, I call a method which calls API and then fills the list with the result it looks like this:
private string searchPhrase;
public string SearchPhrase
{
get => searchPhrase;
set
{
SetProperty(ref searchPhrase, value);
RunOnMainThread(SearchResult.Clear);
isAllFriends = false;
currentPage = 0;
RunInAsync(LoadData);
}
}
private async Task LoadData()
{
var response = await connectionRepository.GetConnections(currentPage,
pageSize, searchPhrase);
foreach (UserConnection uc in response)
{
if (uc.Type != UserConnection.TypeEnum.Awaiting)
{
RunOnMainThread(() =>
SearchResult.Add(new ConnectionUser(uc)));
}
}
}
But this way is totally useless because of it totally mashup list of a result if a text is entering quickly. So to prevent this I want to run this method async in a property but if a property is changed again I want to kill the previous Task and star it again. How can I achieve this?
Some informations from this thread:
create a CancellationTokenSource
var ctc = new CancellationTokenSource();
create a method doing the async work
private static Task ExecuteLongCancellableMethod(CancellationToken token)
{
return Task.Run(() =>
{
token.ThrowIfCancellationRequested();
// more code here
// check again if this task is canceled
token.ThrowIfCancellationRequested();
// more code
}
}
It is important to have this checks for cancel in the code.
Execute the function:
var cancellable = ExecuteLongCancellableMethod(ctc.Token);
To stop the long running execution use
ctc.Cancel();
For further details please consult the linked thread.
This question can be answered in many different ways. However IMO I would look at creating a class that
Delays itself automatically for X (ms) before performing the seach
Has the ability to be cancelled at any time as the search request changes.
Realistically this will change your code design, and should encapsulate the logic for both 1 & 2 in a separate class.
My initial thoughts are (and none of this is tested and mostly pseudo code).
class ConnectionSearch
{
public ConnectionSearch(string phrase, Action<object> addAction)
{
_searchPhrase = phrase;
_addAction = addAction;
_cancelSource = new CancellationTokenSource();
}
readonly string _searchPhrase = null;
readonly Action<object> _addAction;
readonly CancellationTokenSource _cancelSource;
public void Cancel()
{
_cancelSource?.Cancel();
}
public async void PerformSearch()
{
await Task.Delay(300); //await 300ms between keystrokes
if (_cancelSource.IsCancellationRequested)
return;
//continue your code keep checking for
//loop your dataset
//call _addAction?.Invoke(uc);
}
}
This is basic, really just encapsulates the logic for both points 1 & 2, you will need to adapt the code to do the search.
Next you could change your property to cancel a previous running instance, and then start another instance immediatly after something like below.
ConnectionSearch connectionSearch;
string searchPhrase;
public string SearchPhrase
{
get => searchPhrase;
set
{
//do your setter work
if(connectionSearch != null)
{
connectionSearch.Cancel();
}
connectionSearch = new ConnectionSearch(value, addConnectionUser);
connectionSearch.PerformSearch();
}
}
void addConnectionUser(object uc)
{
//pperform your add logic..
}
The code is pretty straight forward, however you will see in the setter is simply cancelling an existing request and then creating a new request. You could put some disposal cleanup logic in place but this should get you started.
You can implement some sort of debouncer which will encapsulate the logics of task result debouncing, i.e. it will assure if you run many tasks, then only the latest task result will be used:
public class TaskDebouncer<TResult>
{
public delegate void TaskDebouncerHandler(TResult result, object sender);
public event TaskDebouncerHandler OnCompleted;
public event TaskDebouncerHandler OnDebounced;
private Task _lastTask;
private object _lock = new object();
public void Run(Task<TResult> task)
{
lock (_lock)
{
_lastTask = task;
}
task.ContinueWith(t =>
{
if (t.IsFaulted)
throw t.Exception;
lock (_lock)
{
if (_lastTask == task)
{
OnCompleted?.Invoke(t.Result, this);
}
else
{
OnDebounced?.Invoke(t.Result, this);
}
}
});
}
public async Task WaitLast()
{
await _lastTask;
}
}
Then, you can just do:
private readonly TaskDebouncer<Connections[]> _connectionsDebouncer = new TaskDebouncer<Connections[]>();
public ClassName()
{
_connectionsDebouncer.OnCompleted += OnConnectionUpdate;
}
public void OnConnectionUpdate(Connections[] connections, object sender)
{
RunOnMainThread(SearchResult.Clear);
isAllFriends = false;
currentPage = 0;
foreach (var conn in connections)
RunOnMainThread(() => SearchResult.Add(new ConnectionUser(conn)));
}
private string searchPhrase;
public string SearchPhrase
{
get => searchPhrase;
set
{
SetProperty(ref searchPhrase, value);
_connectionsDebouncer.Add(RunInAsync(LoadData));
}
}
private async Task<Connection[]> LoadData()
{
return await connectionRepository
.GetConnections(currentPage, pageSize, searchPhrase)
.Where(conn => conn.Type != UserConnection.TypeEnum.Awaiting)
.ToArray();
}
It is not pretty clear what RunInAsync and RunOnMainThread methods are.
I guess, you don't actually need them.
I would like to have a custom thread pool satisfying the following requirements:
Real threads are preallocated according to the pool capacity. The actual work is free to use the standard .NET thread pool, if needed to spawn concurrent tasks.
The pool must be able to return the number of idle threads. The returned number may be less than the actual number of the idle threads, but it must not be greater. Of course, the more accurate the number the better.
Queuing work to the pool should return a corresponding Task, which should place nice with the Task based API.
NEW The max job capacity (or degree of parallelism) should be adjustable dynamically. Trying to reduce the capacity does not have to take effect immediately, but increasing it should do so immediately.
The rationale for the first item is depicted below:
The machine is not supposed to be running more than N work items concurrently, where N is relatively small - between 10 and 30.
The work is fetched from the database and if K items are fetched then we want to make sure that there are K idle threads to start the work right away. A situation where work is fetched from the database, but remains waiting for the next available thread is unacceptable.
The last item also explains the reason for having the idle thread count - I am going to fetch that many work items from the database. It also explains why the reported idle thread count must never be higher than the actual one - otherwise I might fetch more work that can be immediately started.
Anyway, here is my implementation along with a small program to test it (BJE stands for Background Job Engine):
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
namespace TaskStartLatency
{
public class BJEThreadPool
{
private sealed class InternalTaskScheduler : TaskScheduler
{
private int m_idleThreadCount;
private readonly BlockingCollection<Task> m_bus;
public InternalTaskScheduler(int threadCount, BlockingCollection<Task> bus)
{
m_idleThreadCount = threadCount;
m_bus = bus;
}
public void RunInline(Task task)
{
Interlocked.Decrement(ref m_idleThreadCount);
try
{
TryExecuteTask(task);
}
catch
{
// The action is responsible itself for the error handling, for the time being...
}
Interlocked.Increment(ref m_idleThreadCount);
}
public int IdleThreadCount
{
get { return m_idleThreadCount; }
}
#region Overrides of TaskScheduler
protected override void QueueTask(Task task)
{
m_bus.Add(task);
}
protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
{
return TryExecuteTask(task);
}
protected override IEnumerable<Task> GetScheduledTasks()
{
throw new NotSupportedException();
}
#endregion
public void DecrementIdleThreadCount()
{
Interlocked.Decrement(ref m_idleThreadCount);
}
}
private class ThreadContext
{
private readonly InternalTaskScheduler m_ts;
private readonly BlockingCollection<Task> m_bus;
private readonly CancellationTokenSource m_cts;
public readonly Thread Thread;
public ThreadContext(string name, InternalTaskScheduler ts, BlockingCollection<Task> bus, CancellationTokenSource cts)
{
m_ts = ts;
m_bus = bus;
m_cts = cts;
Thread = new Thread(Start)
{
IsBackground = true,
Name = name
};
Thread.Start();
}
private void Start()
{
try
{
foreach (var task in m_bus.GetConsumingEnumerable(m_cts.Token))
{
m_ts.RunInline(task);
}
}
catch (OperationCanceledException)
{
}
m_ts.DecrementIdleThreadCount();
}
}
private readonly InternalTaskScheduler m_ts;
private readonly CancellationTokenSource m_cts = new CancellationTokenSource();
private readonly BlockingCollection<Task> m_bus = new BlockingCollection<Task>();
private readonly List<ThreadContext> m_threadCtxs = new List<ThreadContext>();
public BJEThreadPool(int threadCount)
{
m_ts = new InternalTaskScheduler(threadCount, m_bus);
for (int i = 0; i < threadCount; ++i)
{
m_threadCtxs.Add(new ThreadContext("BJE Thread " + i, m_ts, m_bus, m_cts));
}
}
public void Terminate()
{
m_cts.Cancel();
foreach (var t in m_threadCtxs)
{
t.Thread.Join();
}
}
public Task Run(Action<CancellationToken> action)
{
return Task.Factory.StartNew(() => action(m_cts.Token), m_cts.Token, TaskCreationOptions.DenyChildAttach, m_ts);
}
public Task Run(Action action)
{
return Task.Factory.StartNew(action, m_cts.Token, TaskCreationOptions.DenyChildAttach, m_ts);
}
public int IdleThreadCount
{
get { return m_ts.IdleThreadCount; }
}
}
class Program
{
static void Main()
{
const int THREAD_COUNT = 32;
var pool = new BJEThreadPool(THREAD_COUNT);
var tcs = new TaskCompletionSource<bool>();
var tasks = new List<Task>();
var allRunning = new CountdownEvent(THREAD_COUNT);
for (int i = pool.IdleThreadCount; i > 0; --i)
{
var index = i;
tasks.Add(pool.Run(cancellationToken =>
{
Console.WriteLine("Started action " + index);
allRunning.Signal();
tcs.Task.Wait(cancellationToken);
Console.WriteLine(" Ended action " + index);
}));
}
Console.WriteLine("pool.IdleThreadCount = " + pool.IdleThreadCount);
allRunning.Wait();
Debug.Assert(pool.IdleThreadCount == 0);
int expectedIdleThreadCount = THREAD_COUNT;
Console.WriteLine("Press [c]ancel, [e]rror, [a]bort or any other key");
switch (Console.ReadKey().KeyChar)
{
case 'c':
Console.WriteLine("Cancel All");
tcs.TrySetCanceled();
break;
case 'e':
Console.WriteLine("Error All");
tcs.TrySetException(new Exception("Failed"));
break;
case 'a':
Console.WriteLine("Abort All");
pool.Terminate();
expectedIdleThreadCount = 0;
break;
default:
Console.WriteLine("Done All");
tcs.TrySetResult(true);
break;
}
try
{
Task.WaitAll(tasks.ToArray());
}
catch (AggregateException exc)
{
Console.WriteLine(exc.Flatten().InnerException.Message);
}
Debug.Assert(pool.IdleThreadCount == expectedIdleThreadCount);
pool.Terminate();
Console.WriteLine("Press any key");
Console.ReadKey();
}
}
}
It is a very simple implementation and it appears to be working. However, there is a problem - the BJEThreadPool.Run method does not accept asynchronous methods. I.e. my implementation does not allow me to add the following overloads:
public Task Run(Func<CancellationToken, Task> action)
{
return Task.Factory.StartNew(() => action(m_cts.Token), m_cts.Token, TaskCreationOptions.DenyChildAttach, m_ts).Unwrap();
}
public Task Run(Func<Task> action)
{
return Task.Factory.StartNew(action, m_cts.Token, TaskCreationOptions.DenyChildAttach, m_ts).Unwrap();
}
The pattern I use in InternalTaskScheduler.RunInline does not work in this case.
So, my question is how to add the support for asynchronous work items? I am fine with changing the entire design as long as the requirements outlined at the beginning of the post are upheld.
EDIT
I would like to clarify the intented usage of the desired pool. Please, observe the following code:
if (pool.IdleThreadCount == 0)
{
return;
}
foreach (var jobData in FetchFromDB(pool.IdleThreadCount))
{
pool.Run(CreateJobAction(jobData));
}
Notes:
The code is going to be run periodically, say every 1 minute.
The code is going to be run concurrently by multiple machines watching the same database.
FetchFromDB is going to use the technique described in Using SQL Server as a DB queue with multiple clients to atomically fetch and lock the work from the DB.
CreateJobAction is going to invoke the code denoted by jobData (the job code) and close the work upon the completion of that code. The job code is out of my control and it could be pretty much anything - heavy CPU bound code or light asynchronous IO bound code, badly written synchronous IO bound code or a mix of all. It could run for minutes and it could run for hours. Closing the work is my code and it would by asynchronous IO bound code. Because of this, the signature of the returned job action is that of an asynchronous method.
Item 2 underlines the importance of correctly identifying the amount of idle threads. If there are 900 pending work items and 10 agent machines I cannot allow an agent to fetch 300 work items and queue them on the thread pool. Why? Because, it is most unlikely that the agent will be able to run 300 work items concurrently. It will run some, sure enough, but others will be waiting in the thread pool work queue. Suppose it will run 100 and let 200 wait (even though 100 is probably far fetched). This wields 3 fully loaded agents and 7 idle ones. But only 300 work items out of 900 are actually being processed concurrently!!!
My goal is to maximize the spread of the work amongst the available agents. Ideally, I should evaluate the load of an agent and the "heaviness" of the pending work, but it is a formidable task and is reserved for the future versions. Right now, I wish to assign each agent the max job capacity with the intention to provide the means to increase/decrease it dynamically without restarting the agents.
Next observation. The work can take quite a long time to run and it could be all synchronous code. As far as I understand it is undesirable to utilize thread pool threads for such kind of work.
EDIT 2
There is a statement that TaskScheduler is only for the CPU bound work. But what if I do not know the nature of the work? I mean it is a general purpose Background Job Engine and it runs thousands of different kinds of jobs. I do not have means to tell "that job is CPU bound" and "that on is synchronous IO bound" and yet another one is asynchronous IO bound. I wish I could, but I cannot.
EDIT 3
At the end, I do not use the SemaphoreSlim, but neither do I use the TaskScheduler - it finally trickled down my thick skull that it is unappropriate and plain wrong, plus it makes the code overly complex.
Still, I failed to see how SemaphoreSlim is the way. The proposed pattern:
public async Task Enqueue(Func<Task> taskGenerator)
{
await semaphore.WaitAsync();
try
{
await taskGenerator();
}
finally
{
semaphore.Release();
}
}
Expects taskGenerator either be an asynchronous IO bound code or open a new thread otherwise. However, I have no means to determine whether the work to be executed is one or another. Plus, as I have learned from SemaphoreSlim.WaitAsync continuation code if the semaphore is unlocked, the code following the WaitAsync() is going to run on the same thread, which is not very good for me.
Anyway, below is my implementation, in case anyone fancies. Unfortunately, I am yet to understand how to reduce the pool thread count dynamically, but this is a topic for another question.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
namespace TaskStartLatency
{
public interface IBJEThreadPool
{
void SetThreadCount(int threadCount);
void Terminate();
Task Run(Action action);
Task Run(Action<CancellationToken> action);
Task Run(Func<Task> action);
Task Run(Func<CancellationToken, Task> action);
int IdleThreadCount { get; }
}
public class BJEThreadPool : IBJEThreadPool
{
private interface IActionContext
{
Task Run(CancellationToken ct);
TaskCompletionSource<object> TaskCompletionSource { get; }
}
private class ActionContext : IActionContext
{
private readonly Action m_action;
public ActionContext(Action action)
{
m_action = action;
TaskCompletionSource = new TaskCompletionSource<object>();
}
#region Implementation of IActionContext
public Task Run(CancellationToken ct)
{
m_action();
return null;
}
public TaskCompletionSource<object> TaskCompletionSource { get; private set; }
#endregion
}
private class CancellableActionContext : IActionContext
{
private readonly Action<CancellationToken> m_action;
public CancellableActionContext(Action<CancellationToken> action)
{
m_action = action;
TaskCompletionSource = new TaskCompletionSource<object>();
}
#region Implementation of IActionContext
public Task Run(CancellationToken ct)
{
m_action(ct);
return null;
}
public TaskCompletionSource<object> TaskCompletionSource { get; private set; }
#endregion
}
private class AsyncActionContext : IActionContext
{
private readonly Func<Task> m_action;
public AsyncActionContext(Func<Task> action)
{
m_action = action;
TaskCompletionSource = new TaskCompletionSource<object>();
}
#region Implementation of IActionContext
public Task Run(CancellationToken ct)
{
return m_action();
}
public TaskCompletionSource<object> TaskCompletionSource { get; private set; }
#endregion
}
private class AsyncCancellableActionContext : IActionContext
{
private readonly Func<CancellationToken, Task> m_action;
public AsyncCancellableActionContext(Func<CancellationToken, Task> action)
{
m_action = action;
TaskCompletionSource = new TaskCompletionSource<object>();
}
#region Implementation of IActionContext
public Task Run(CancellationToken ct)
{
return m_action(ct);
}
public TaskCompletionSource<object> TaskCompletionSource { get; private set; }
#endregion
}
private readonly CancellationTokenSource m_ctsTerminateAll = new CancellationTokenSource();
private readonly BlockingCollection<IActionContext> m_bus = new BlockingCollection<IActionContext>();
private readonly LinkedList<Thread> m_threads = new LinkedList<Thread>();
private int m_idleThreadCount;
private static int s_threadCount;
public BJEThreadPool(int threadCount)
{
ReserveAdditionalThreads(threadCount);
}
private void ReserveAdditionalThreads(int n)
{
for (int i = 0; i < n; ++i)
{
var index = Interlocked.Increment(ref s_threadCount) - 1;
var t = new Thread(Start)
{
IsBackground = true,
Name = "BJE Thread " + index
};
Interlocked.Increment(ref m_idleThreadCount);
t.Start();
m_threads.AddLast(t);
}
}
private void Start()
{
try
{
foreach (var actionContext in m_bus.GetConsumingEnumerable(m_ctsTerminateAll.Token))
{
RunWork(actionContext).Wait();
}
}
catch (OperationCanceledException)
{
}
catch
{
// Should never happen - log the error
}
Interlocked.Decrement(ref m_idleThreadCount);
}
private async Task RunWork(IActionContext actionContext)
{
Interlocked.Decrement(ref m_idleThreadCount);
try
{
var task = actionContext.Run(m_ctsTerminateAll.Token);
if (task != null)
{
await task;
}
actionContext.TaskCompletionSource.SetResult(null);
}
catch (OperationCanceledException)
{
actionContext.TaskCompletionSource.TrySetCanceled();
}
catch (Exception exc)
{
actionContext.TaskCompletionSource.TrySetException(exc);
}
Interlocked.Increment(ref m_idleThreadCount);
}
private Task PostWork(IActionContext actionContext)
{
m_bus.Add(actionContext);
return actionContext.TaskCompletionSource.Task;
}
#region Implementation of IBJEThreadPool
public void SetThreadCount(int threadCount)
{
if (threadCount > m_threads.Count)
{
ReserveAdditionalThreads(threadCount - m_threads.Count);
}
else if (threadCount < m_threads.Count)
{
throw new NotSupportedException();
}
}
public void Terminate()
{
m_ctsTerminateAll.Cancel();
foreach (var t in m_threads)
{
t.Join();
}
}
public Task Run(Action action)
{
return PostWork(new ActionContext(action));
}
public Task Run(Action<CancellationToken> action)
{
return PostWork(new CancellableActionContext(action));
}
public Task Run(Func<Task> action)
{
return PostWork(new AsyncActionContext(action));
}
public Task Run(Func<CancellationToken, Task> action)
{
return PostWork(new AsyncCancellableActionContext(action));
}
public int IdleThreadCount
{
get { return m_idleThreadCount; }
}
#endregion
}
public static class Extensions
{
public static Task WithCancellation(this Task task, CancellationToken token)
{
return task.ContinueWith(t => t.GetAwaiter().GetResult(), token);
}
}
class Program
{
static void Main()
{
const int THREAD_COUNT = 16;
var pool = new BJEThreadPool(THREAD_COUNT);
var tcs = new TaskCompletionSource<bool>();
var tasks = new List<Task>();
var allRunning = new CountdownEvent(THREAD_COUNT);
for (int i = pool.IdleThreadCount; i > 0; --i)
{
var index = i;
tasks.Add(pool.Run(async ct =>
{
Console.WriteLine("Started action " + index);
allRunning.Signal();
await tcs.Task.WithCancellation(ct);
Console.WriteLine(" Ended action " + index);
}));
}
Console.WriteLine("pool.IdleThreadCount = " + pool.IdleThreadCount);
allRunning.Wait();
Debug.Assert(pool.IdleThreadCount == 0);
int expectedIdleThreadCount = THREAD_COUNT;
Console.WriteLine("Press [c]ancel, [e]rror, [a]bort or any other key");
switch (Console.ReadKey().KeyChar)
{
case 'c':
Console.WriteLine("ancel All");
tcs.TrySetCanceled();
break;
case 'e':
Console.WriteLine("rror All");
tcs.TrySetException(new Exception("Failed"));
break;
case 'a':
Console.WriteLine("bort All");
pool.Terminate();
expectedIdleThreadCount = 0;
break;
default:
Console.WriteLine("Done All");
tcs.TrySetResult(true);
break;
}
try
{
Task.WaitAll(tasks.ToArray());
}
catch (AggregateException exc)
{
Console.WriteLine(exc.Flatten().InnerException.Message);
}
Debug.Assert(pool.IdleThreadCount == expectedIdleThreadCount);
pool.Terminate();
Console.WriteLine("Press any key");
Console.ReadKey();
}
}
}
Asynchronous "work items" are often based on async IO. Async IO does not use threads while it runs. Task schedulers are used to execute CPU work (tasks based on a delegate). The concept TaskScheduler does not apply. You cannot use a custom TaskScheduler to influence what async code does.
Make your work items throttle themselves:
static SemaphoreSlim sem = new SemaphoreSlim(maxDegreeOfParallelism); //shared object
async Task MyWorkerFunction()
{
await sem.WaitAsync();
try
{
MyWork();
}
finally
{
sem.Release();
}
}
As mentioned in another answer by usr you can't do this with a TaskScheduler as that is only for CPU bound work, not limiting the level of parallelization of all types of work, whether parallel or not. He also shows you how you can use a SemaphoreSlim to asynchronously limit the degrees of parallelism.
You can expand on this to generalize these concepts in a few ways. The one that seems like it would be the most beneficial to you would be to create a special type of queue that takes operations that return a Task and executes them in such a way that a given max degree of parallelization is achieved.
public class FixedParallelismQueue
{
private SemaphoreSlim semaphore;
public FixedParallelismQueue(int maxDegreesOfParallelism)
{
semaphore = new SemaphoreSlim(maxDegreesOfParallelism,
maxDegreesOfParallelism);
}
public async Task<T> Enqueue<T>(Func<Task<T>> taskGenerator)
{
await semaphore.WaitAsync();
try
{
return await taskGenerator();
}
finally
{
semaphore.Release();
}
}
public async Task Enqueue(Func<Task> taskGenerator)
{
await semaphore.WaitAsync();
try
{
await taskGenerator();
}
finally
{
semaphore.Release();
}
}
}
This allows you to create a queue for your application (you can even have several separate queues if you want) that has a fixed degree of parallelization. You can then provide operations returning a Task when they complete and the queue will schedule it when it can and return a Task representing when that unit of work has finished.