Dealing with throttling/rate limits (429 error) when using async/await - C#

I have the following async code that gets called from many places in my project:
public async Task<HttpResponseMessage> MakeRequestAsync(HttpRequestMessage request)
{
var client = new HttpClient();
return await client.SendAsync(request).ConfigureAwait(false);
}
An example of how the above method gets called:
var tasks = items.Select(async i =>
{
var response = await MakeRequestAsync(new HttpRequestMessage(HttpMethod.Get, i.Url));
//do something with response
});
The ZenDesk API that I'm hitting allows about 200 requests per minute, after which I get a 429 error. I need to do some sort of Thread.Sleep if I encounter the 429 error, but with async/await there may be many requests in parallel waiting to process, and I'm not sure how I can make all of them sleep for 5 seconds or so and then resume.
What's the correct way to approach this problem? I'd like to hear quick solutions as well as good-design solutions.

I do not think that this is a duplicate, as marked recently. The other SO poster does not need a time-based sliding window (or time-based throttling), and the answer there does not cover this situation. That approach only works when you want to set a hard limit on outgoing requests.
Anyway, a quasi-quick solution is to make the throttling in the MakeRequestAsync method. Something like this:
public async Task<HttpResponseMessage> MakeRequestAsync(HttpRequestMessage request)
{
//Wait while the limit has been reached.
while(!_throttlingHelper.RequestAllowed)
{
await Task.Delay(1000);
}
var client = new HttpClient();
_throttlingHelper.StartRequest();
var result = await client.SendAsync(request).ConfigureAwait(false);
_throttlingHelper.EndRequest();
return result;
}
The ThrottlingHelper class below is just something I wrote now, so you may need to debug it a bit (read: it may not work out of the box).
It tries to be a timestamp sliding window.
public class ThrottlingHelper : IDisposable
{
//Holds time stamps for all started requests
private readonly List<long> _requestsTx;
private readonly ReaderWriterLockSlim _lock;
private readonly int _maxLimit;
private TimeSpan _interval;
public ThrottlingHelper(int maxLimit, TimeSpan interval)
{
_requestsTx = new List<long>();
_maxLimit = maxLimit;
_interval = interval;
_lock = new ReaderWriterLockSlim(LockRecursionPolicy.NoRecursion);
}
public bool RequestAllowed
{
get
{
_lock.EnterReadLock();
try
{
var nowTx = DateTime.Now.Ticks;
return _requestsTx.Count(tx => nowTx - tx < _interval.Ticks) < _maxLimit;
}
finally
{
_lock.ExitReadLock();
}
}
}
public void StartRequest()
{
_lock.EnterWriteLock();
try
{
_requestsTx.Add(DateTime.Now.Ticks);
}
finally
{
_lock.ExitWriteLock();
}
}
public void EndRequest()
{
_lock.EnterWriteLock();
try
{
var nowTx = DateTime.Now.Ticks;
_requestsTx.RemoveAll(tx => nowTx - tx >= _interval.Ticks);
}
finally
{
_lock.ExitWriteLock();
}
}
public void Dispose()
{
_lock.Dispose();
}
}
You would use it as a member in the class that makes the requests, and instantiate it like this:
_throttlingHelper = new ThrottlingHelper(200, TimeSpan.FromMinutes(1));
Don't forget to dispose it when you're done with it.
A bit of documentation about ThrottlingHelper:
Constructor params are the maximum requests you want to be able to do in a certain interval and the interval itself as a time span. So, 200 and 1 minute means that you want no more than 200 requests/minute.
Property RequestAllowed lets you know if you are able to do a request with the current throttling settings.
Methods StartRequest & EndRequest register/unregister a request by using the current date/time.
EDIT/Pitfalls
As indicated by @PhilipABarnes, EndRequest can potentially remove requests that are still in progress. As far as I can see, this can happen in two situations:
The interval is small, such that requests do not get to complete in good time.
Requests actually take more than the interval to execute.
The proposed solution involves actually matching EndRequest calls to StartRequest calls by means of a GUID or something similar.
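For illustration, here is a minimal sketch of that idea (hypothetical names, untested): StartRequest hands out a GUID token, EndRequest retires exactly that token, and cleanup only prunes entries that are both finished and outside the window, so an in-flight request can never be dropped early.
using System;
using System.Collections.Generic;
using System.Linq;
public class TrackedThrottlingHelper
{
    private readonly object _gate = new object();
    // timestamp of each started request, keyed by its token
    private readonly Dictionary<Guid, long> _requestsTx = new Dictionary<Guid, long>();
    // tokens of requests that are still running
    private readonly HashSet<Guid> _inFlight = new HashSet<Guid>();
    private readonly int _maxLimit;
    private readonly TimeSpan _interval;
    public TrackedThrottlingHelper(int maxLimit, TimeSpan interval)
    {
        _maxLimit = maxLimit;
        _interval = interval;
    }
    public bool TryStartRequest(out Guid token)
    {
        lock (_gate)
        {
            var nowTx = DateTime.Now.Ticks;
            // only entries still inside the window count against the limit
            if (_requestsTx.Values.Count(tx => nowTx - tx < _interval.Ticks) >= _maxLimit)
            {
                token = Guid.Empty;
                return false;
            }
            token = Guid.NewGuid();
            _requestsTx[token] = nowTx;
            _inFlight.Add(token);
            return true;
        }
    }
    public void EndRequest(Guid token)
    {
        lock (_gate)
        {
            _inFlight.Remove(token);
            var nowTx = DateTime.Now.Ticks;
            // prune only entries that are both finished and outside the window
            var expired = _requestsTx
                .Where(kv => !_inFlight.Contains(kv.Key) && nowTx - kv.Value >= _interval.Ticks)
                .Select(kv => kv.Key)
                .ToList();
            foreach (var key in expired)
            {
                _requestsTx.Remove(key);
            }
        }
    }
}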

If there are multiple requests waiting in the while loop for RequestAllowed, some of them might start at the same time. How about a simple StartRequestIfAllowed?
public class ThrottlingHelper : DisposeBase
{
//Holds time stamps for all started requests
private readonly List<long> _requestsTx;
private readonly Mutex _mutex = new Mutex();
private readonly int _maxLimit;
private readonly TimeSpan _interval;
public ThrottlingHelper(int maxLimit, TimeSpan interval)
{
_requestsTx = new List<long>();
_maxLimit = maxLimit;
_interval = interval;
}
public bool StartRequestIfAllowed()
{
_mutex.WaitOne();
try
{
var nowTx = DateTime.Now.Ticks;
if (_requestsTx.Count(tx => nowTx - tx < _interval.Ticks) < _maxLimit)
{
_requestsTx.Add(DateTime.Now.Ticks);
return true;
}
else
{
return false;
}
}
finally
{
_mutex.ReleaseMutex();
}
}
public void EndRequest()
{
_mutex.WaitOne();
try
{
var nowTx = DateTime.Now.Ticks;
_requestsTx.RemoveAll(tx => nowTx - tx >= _interval.Ticks);
}
finally
{
_mutex.ReleaseMutex();
}
}
protected override void DisposeResources()
{
_mutex.Dispose();
}
}
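With this variant, MakeRequestAsync polls StartRequestIfAllowed instead of the separate RequestAllowed/StartRequest pair, which closes the race between checking and registering (a sketch):
public async Task<HttpResponseMessage> MakeRequestAsync(HttpRequestMessage request)
{
    // loop until we atomically acquire a slot in the sliding window
    while (!_throttlingHelper.StartRequestIfAllowed())
    {
        await Task.Delay(1000);
    }
    var client = new HttpClient();
    try
    {
        return await client.SendAsync(request).ConfigureAwait(false);
    }
    finally
    {
        _throttlingHelper.EndRequest();
    }
}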

How to simulate time passing from a unit test?

I have a method running in an infinite loop in a task started from the ctor. The method does the equivalent of sending a ping and, upon a certain number of failed pings, prevents other commands from being sent. The method is started from the ctor with Task.Factory.StartNew(() => InitPingBackgroundTask(pingCts.Token));
public async Task InitPingBackgroundTask (CancellationToken token)
{
while(!token.IsCancellationRequested)
{
try
{
Ping();
// handle logic
_pingsFailed = 0;
_canSend = true;
}
catch(TimeoutException)
{
// handle logic
if(++_pingsFailed >= settings.MaxPingLossNoConnection)
_canSend = false;
}
finally
{
await Task.Delay(settings.PingInterval);
}
}
}
I have a second method DoCmd like so:
public void DoCmd()
{
if(_canSend)
{
// handle logic
}
}
In my test, using Moq, I can set the settings to have PingInterval=TimeSpan.FromSeconds(1) and MaxPingLossNoConnection = 2 for the sake of my test.
After setting up the sut and everything else, I want to call DoCmd() and verify that it is not actually sent. (I have mocked its dependencies and can verify that the logic in its method was never called)
One way to achieve this is to add a sleep or delay within the test before calling sut.DoCmd(), but this makes the test take longer. Is there a way to simulate time passing (like tick(desiredTime) in Angular, or something similar) without having to actually wait that long?
Any leads would be appreciated :-)
EDIT
Added the test code:
public void NotAllowSendIfTooManyPingsFailed()
{
var settings = new Settings()
{
PingInterval=TimeSpan.FromSeconds(1),
MaxPingLossNoConnection = 0 // ended up changing to 0 to fail immediately
};
// set up code, that Ping messages should fail
Thread.Sleep(100); // necessary so Ping has a chance to fail before the method is called
sut.DoCmd();
// assert logic, verifying that DoCmd did not go through
}
Ideally, I would like to be able to play with MaxPingLossNoConnection and set it to different numbers to test different circumstances. That would also require adding sleeps to let the time pass. In this test I would want to remove the Sleep altogether, as well as in other similar tests where the sleep would be longer.
In order to deal with time concerns in my tests, I found a little Clock class a while ago and replaced all the calls to DateTime.Now in my systems with Clock.Now. I tweaked it a little bit to have the option to set the time and freeze it, or to keep it running during the tests.
Clock.cs
using System;
namespace ComumLib
{
public class Clock : IDisposable
{
private static DateTime? _startTime;
private static DateTime? _nowForTest;
public static DateTime Now
{
get
{
if (_nowForTest == null)
{
return DateTime.Now;
}
else
{
//freeze time
if (_startTime == null)
{
return _nowForTest.GetValueOrDefault();
}
//keep running
else
{
TimeSpan elapsedTime = DateTime.Now.Subtract(_startTime.GetValueOrDefault());
return _nowForTest.GetValueOrDefault().Add(elapsedTime);
}
}
}
}
public static IDisposable NowIs(DateTime dateTime, bool keepTimeRunning = false)
{
_nowForTest = dateTime;
if (keepTimeRunning)
{
_startTime = DateTime.Now;
}
return new Clock();
}
public static IDisposable ResetNowIs()
{
_startTime = null;
_nowForTest = null;
return new Clock();
}
public void Dispose()
{
_startTime = null;
_nowForTest = null;
}
}
}
In my tests, I either call
DateTime dt = new DateTime(2022,01,01,22,12,44,2);
Clock.NowIs(dt,true); //time keeps running during the test
or
Clock.NowIs(dt); //time remains the same during the test
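Since NowIs returns an IDisposable, it can also be scoped to a single test so the clock resets automatically afterwards (a sketch based on the test above):
public void NotAllowSendIfTooManyPingsFailed()
{
    var dt = new DateTime(2022, 01, 01, 22, 12, 44, 2);
    using (Clock.NowIs(dt)) // time is frozen inside this block
    {
        // set up code, that Ping messages should fail
        sut.DoCmd();
        // assert logic, verifying that DoCmd did not go through
    }
    // Clock.Now falls back to DateTime.Now here
}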

C# Trying to wrap a function with a stopwatch

I've been attempting to see how long functions take to execute in my code, as practice to see where I can optimize. Right now I use a helper class that is essentially a stopwatch with a message to check these. The goal is that I should be able to wrap whatever method call I want in the helper and get its duration.
public class StopwatcherData
{
public long Time { get; set; }
public string Message { get; set; }
public StopwatcherData(long time, string message)
{
Time = time;
Message = message;
}
}
public class Stopwatcher
{
public delegate void CompletedCallBack(string result);
public static List<StopwatcherData> Data { get; set; }
private static Stopwatch stopwatch { get; set;}
public Stopwatcher()
{
Data = new List<StopwatcherData>();
stopwatch = new Stopwatch();
stopwatch.Start();
}
public static void Click(string message)
{
Data.Add(new StopwatcherData(stopwatch.ElapsedMilliseconds, message));
}
public static void Reset()
{
stopwatch.Reset();
stopwatch.Start();
}
}
Right now to use this, I have to call Reset before the function I want timed so that the timer is restarted, and then call Click after it.
Stopwatcher.Reset();
MyFunction();
Stopwatcher.Click("MyFunction");
I've read a bit about delegates and actions, but I'm unsure of how to apply them to this situation. Ideally, I would pass the function as part of the Stopwatcher call.
//End Goal:
Stopwatcher.Track(MyFunction(), "MyFunction Time");
Any help is welcome.
It's not really a good idea to profile your application like that, but if you insist, you can at least make some improvements.
First, don't reuse the Stopwatch; just create a new one every time you need it.
Second, you need to handle two cases: one where the delegate you pass returns a value and one where it does not.
Since your Track method is static, it's common practice to make it thread safe; non-thread-safe static methods are quite a bad idea. For that you can store your messages in a thread-safe collection like ConcurrentBag, or just take a lock every time you add an item to your list.
In the end you can have something like this:
using System;
using System.Collections.Concurrent;
using System.Diagnostics;
public class Stopwatcher {
private static readonly ConcurrentBag<StopwatcherData> _data = new ConcurrentBag<StopwatcherData>();
public static void Track(Action action, string message) {
var w = Stopwatch.StartNew();
try {
action();
}
finally {
w.Stop();
_data.Add(new StopwatcherData(w.ElapsedMilliseconds, message));
}
}
public static T Track<T>(Func<T> func, string message) {
var w = Stopwatch.StartNew();
try {
return func();
}
finally {
w.Stop();
_data.Add(new StopwatcherData(w.ElapsedMilliseconds, message));
}
}
}
And use it like this:
Stopwatcher.Track(() => SomeAction(param1), "test");
bool result = Stopwatcher.Track(() => SomeFunc(param2), "test");
If you are going to use that with async delegates (which return Task or Task<T>) - you need to add two more overloads for that case.
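Those overloads might look roughly like this (a sketch following the same pattern, placed inside the Stopwatcher class above; named TrackAsync to avoid overload-resolution surprises between Action and Func<Task>):
public static async Task TrackAsync(Func<Task> func, string message) {
    var w = Stopwatch.StartNew();
    try {
        await func();
    }
    finally {
        w.Stop();
        _data.Add(new StopwatcherData(w.ElapsedMilliseconds, message));
    }
}
public static async Task<T> TrackAsync<T>(Func<Task<T>> func, string message) {
    var w = Stopwatch.StartNew();
    try {
        return await func();
    }
    finally {
        w.Stop();
        _data.Add(new StopwatcherData(w.ElapsedMilliseconds, message));
    }
}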
Yes, you can create a timer function that accepts any action as a delegate. Try this block:
public static long TimeAction(Action action)
{
var timer = new Stopwatch();
timer.Start();
action();
timer.Stop();
return timer.ElapsedMilliseconds;
}
This can be used like this:
var elapsedMilliseconds = TimeAction(() => MyFunc(param1, param2));
This is a bit more awkward if your wrapped function returns a value, but you can deal with this by assigning a variable from within the closure, like this:
bool isSuccess = false;
var elapsedMilliseconds = TimeAction(() => {
isSuccess = MyFunc(param1, param2);
});
I've had this problem a while ago as well and was always afraid I'd leave errors behind when changing Stopwatcher.Track(() => SomeFunc(), "test") (see Evk's answer) back to SomeFunc(). So I thought about something that wraps the code without changing it!
I came up with a using block, which is for sure not the intended purpose of the construct.
public class OneTimeStopwatch : IDisposable
{
private string _logPath = "C:\\Temp\\OneTimeStopwatch.log";
private readonly string _itemname;
private System.Diagnostics.Stopwatch sw = new System.Diagnostics.Stopwatch();
public OneTimeStopwatch(string itemname)
{
_itemname = itemname;
sw.Start();
}
public void Dispose()
{
sw.Stop();
System.IO.File.AppendAllText(_logPath, string.Format($"{_itemname}: {sw.ElapsedMilliseconds}ms{Environment.NewLine}"));
}
}
This can be used in an easy way:
using (new OneTimeStopwatch("test"))
{
//some sensible code not to touch
System.Threading.Thread.Sleep(1000);
}
//logfile with line "test: 1000ms"
I only need to remove 2 lines (and auto-format) to make it normal again.
Plus I can easily wrap multiple lines here, which isn't possible without defining new functions in the other approach.
Again, this is not recommended for measuring things that take only a few milliseconds.

Cancel a task before running it again

I'm working on real-time search. At the moment, in the setter of a property that is bound to an edit text, I call a method which calls the API and then fills a list with the result. It looks like this:
private string searchPhrase;
public string SearchPhrase
{
get => searchPhrase;
set
{
SetProperty(ref searchPhrase, value);
RunOnMainThread(SearchResult.Clear);
isAllFriends = false;
currentPage = 0;
RunInAsync(LoadData);
}
}
private async Task LoadData()
{
var response = await connectionRepository.GetConnections(currentPage,
pageSize, searchPhrase);
foreach (UserConnection uc in response)
{
if (uc.Type != UserConnection.TypeEnum.Awaiting)
{
RunOnMainThread(() =>
SearchResult.Add(new ConnectionUser(uc)));
}
}
}
But this approach is useless because it completely mangles the result list when text is entered quickly. To prevent this I want to run the method asynchronously from the property, but if the property changes again I want to cancel the previous Task and start it again. How can I achieve this?
Some information from this thread:
create a CancellationTokenSource
var ctc = new CancellationTokenSource();
create a method doing the async work
private static Task ExecuteLongCancellableMethod(CancellationToken token)
{
return Task.Run(() =>
{
token.ThrowIfCancellationRequested();
// more code here
// check again if this task is canceled
token.ThrowIfCancellationRequested();
// more code
});
}
It is important to have these checks for cancellation in the code.
Execute the function:
var cancellable = ExecuteLongCancellableMethod(ctc.Token);
To stop the long running execution use
ctc.Cancel();
For further details please consult the linked thread.
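Applied to the property from the question, the pieces fit together roughly like this (a sketch; RunInAsync is replaced by a direct call for clarity):
private CancellationTokenSource searchCts;
public string SearchPhrase
{
    get => searchPhrase;
    set
    {
        SetProperty(ref searchPhrase, value);
        // cancel the previous search, then start a new one
        searchCts?.Cancel();
        searchCts = new CancellationTokenSource();
        _ = LoadData(searchCts.Token);
    }
}
private async Task LoadData(CancellationToken token)
{
    var response = await connectionRepository.GetConnections(currentPage, pageSize, searchPhrase);
    foreach (UserConnection uc in response)
    {
        if (token.IsCancellationRequested)
            return; // a newer search has started; stop touching the list
        if (uc.Type != UserConnection.TypeEnum.Awaiting)
        {
            RunOnMainThread(() => SearchResult.Add(new ConnectionUser(uc)));
        }
    }
}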
This question can be answered in many different ways. However IMO I would look at creating a class that
Delays itself automatically for X (ms) before performing the seach
Has the ability to be cancelled at any time as the search request changes.
Realistically this will change your code design; the logic for both 1 & 2 should be encapsulated in a separate class.
My initial thoughts are below (none of this is tested, and it is mostly pseudo code).
class ConnectionSearch
{
public ConnectionSearch(string phrase, Action<object> addAction)
{
_searchPhrase = phrase;
_addAction = addAction;
_cancelSource = new CancellationTokenSource();
}
readonly string _searchPhrase = null;
readonly Action<object> _addAction;
readonly CancellationTokenSource _cancelSource;
public void Cancel()
{
_cancelSource?.Cancel();
}
public async void PerformSearch()
{
await Task.Delay(300); //await 300ms between keystrokes
if (_cancelSource.IsCancellationRequested)
return;
//continue your code keep checking for
//loop your dataset
//call _addAction?.Invoke(uc);
}
}
This is basic and really just encapsulates the logic for points 1 & 2; you will need to adapt the code to do the actual search.
Next you could change your property to cancel a previously running instance and then start another instance immediately after, something like below.
ConnectionSearch connectionSearch;
string searchPhrase;
public string SearchPhrase
{
get => searchPhrase;
set
{
//do your setter work
if(connectionSearch != null)
{
connectionSearch.Cancel();
}
connectionSearch = new ConnectionSearch(value, addConnectionUser);
connectionSearch.PerformSearch();
}
}
void addConnectionUser(object uc)
{
//perform your add logic
}
The code is pretty straightforward: the setter simply cancels any existing request and then creates a new one. You could put some disposal/cleanup logic in place, but this should get you started.
You can implement some sort of debouncer which encapsulates the logic of task result debouncing, i.e. it ensures that when you run many tasks, only the latest task's result is used:
public class TaskDebouncer<TResult>
{
public delegate void TaskDebouncerHandler(TResult result, object sender);
public event TaskDebouncerHandler OnCompleted;
public event TaskDebouncerHandler OnDebounced;
private Task _lastTask;
private object _lock = new object();
public void Run(Task<TResult> task)
{
lock (_lock)
{
_lastTask = task;
}
task.ContinueWith(t =>
{
if (t.IsFaulted)
throw t.Exception;
lock (_lock)
{
if (_lastTask == task)
{
OnCompleted?.Invoke(t.Result, this);
}
else
{
OnDebounced?.Invoke(t.Result, this);
}
}
});
}
public async Task WaitLast()
{
await _lastTask;
}
}
Then, you can just do:
private readonly TaskDebouncer<UserConnection[]> _connectionsDebouncer = new TaskDebouncer<UserConnection[]>();
public ClassName()
{
_connectionsDebouncer.OnCompleted += OnConnectionUpdate;
}
public void OnConnectionUpdate(UserConnection[] connections, object sender)
{
RunOnMainThread(SearchResult.Clear);
isAllFriends = false;
currentPage = 0;
foreach (var conn in connections)
RunOnMainThread(() => SearchResult.Add(new ConnectionUser(conn)));
}
private string searchPhrase;
public string SearchPhrase
{
get => searchPhrase;
set
{
SetProperty(ref searchPhrase, value);
_connectionsDebouncer.Run(RunInAsync(LoadData));
}
}
private async Task<UserConnection[]> LoadData()
{
var response = await connectionRepository
.GetConnections(currentPage, pageSize, searchPhrase);
return response
.Where(conn => conn.Type != UserConnection.TypeEnum.Awaiting)
.ToArray();
}
It is not quite clear what the RunInAsync and RunOnMainThread methods are; I guess you don't actually need them.

How to implement a continuous producer-consumer pattern inside a Windows Service

Here's what I'm trying to do:
Keep a queue in memory of items that need to be processed (i.e. IsProcessed = 0)
Every 5 seconds, get unprocessed items from the db, and if they're not already in the queue, add them
Continuously pull items from the queue, process them, and each time an item is processed, update it in the db (IsProcessed = 1)
Do this all "as parallel as possible"
I have a constructor for my service like
public MyService()
{
Ticker.Elapsed += FillQueue;
}
and I start that timer when the service starts like
protected override void OnStart(string[] args)
{
Ticker.Enabled = true;
Task.Run(() => { ConsumeWork(); });
}
and my FillQueue is like
private static async void FillQueue(object source, ElapsedEventArgs e)
{
var items = GetUnprocessedItemsFromDb();
foreach(var item in items)
{
if(!Work.Contains(item))
{
Work.Enqueue(item);
}
}
}
and my ConsumeWork is like
private static void ConsumeWork()
{
while(true)
{
if(Work.Count > 0)
{
var item = Work.Peek();
Process(item);
Work.Dequeue();
}
else
{
Thread.Sleep(500);
}
}
}
However this is probably a naive implementation and I'm wondering whether .NET has any type of class that is exactly what I need for this type of situation.
Though @JSteward's answer is a good start, you can improve it by mixing TPL Dataflow with Rx.NET, as a dataflow block can easily become an observer for your data, and with the Rx timer it will be much less effort for you (Rx.Timer explanation).
We can adapt the MSDN article to your needs, like this:
private const int EventIntervalInSeconds = 5;
private const int DueIntervalInSeconds = 60;
var source =
// sequence of Int64 numbers, starting from 0
// https://msdn.microsoft.com/en-us/library/hh229435.aspx
Observable.Timer(
// fire first event after 1 minute waiting
TimeSpan.FromSeconds(DueIntervalInSeconds),
// fire all next events each 5 seconds
TimeSpan.FromSeconds(EventIntervalInSeconds))
// each number will have a timestamp
.Timestamp()
// each time we select some items to process
.SelectMany(GetItemsFromDB)
// filter already added
.Where(i => !_processedItemIds.ContainsKey(i.Id));
var action = new ActionBlock<Item>(ProcessItem, new ExecutionDataflowBlockOptions
{
// we can start as many item processing as processor count
MaxDegreeOfParallelism = Environment.ProcessorCount,
});
IDisposable subscription = source.Subscribe(action.AsObserver());
Also, your check for an item being already processed isn't quite accurate: an item can be selected as unprocessed from the db right at the moment you've finished processing it but haven't yet updated it in the database. In that case the item is removed from the Queue<T> and then added again by the producer. This is why I've used a ConcurrentDictionary<Guid, bool> in this solution (HashSet<T> isn't thread-safe, and ConcurrentBag<T> offers no way to remove a specific item):
private static async Task ProcessItem(Item item)
{
if (_processedItemIds.ContainsKey(item.Id))
{
return;
}
_processedItemIds.TryAdd(item.Id, true);
// actual work here
// save item as processed in database
// we need to wait to ensure the item does not reappear in the queue
await Task.Delay(TimeSpan.FromSeconds(EventIntervalInSeconds * 2));
// clear the processed cache to reduce memory usage
_processedItemIds.TryRemove(item.Id, out _);
}
public class Item
{
public Guid Id { get; set; }
}
// temporary cache for items in process
private static readonly ConcurrentDictionary<Guid, bool> _processedItemIds = new ConcurrentDictionary<Guid, bool>();
private static IEnumerable<Item> GetItemsFromDB(Timestamped<long> time)
{
// log event timing
Console.WriteLine($"Event # {time.Value} at {time.Timestamp}");
// return items from DB
return new[] { new Item { Id = Guid.NewGuid() } };
}
You can implement the cache cleanup in another way: for example, start a "GC" timer which removes processed items from the cache on a regular basis.
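A sketch of such a "GC" timer, assuming the cache is changed to record when each id entered processing (so its age can be checked):
// cache now stores when each id entered processing
private static readonly ConcurrentDictionary<Guid, DateTime> _processedItemIds =
    new ConcurrentDictionary<Guid, DateTime>();
// periodically drop entries that are old enough to be safely forgotten
private static readonly Timer _cacheGc = new Timer(_ =>
{
    var cutoff = DateTime.UtcNow - TimeSpan.FromSeconds(EventIntervalInSeconds * 2);
    foreach (var entry in _processedItemIds)
    {
        if (entry.Value < cutoff)
        {
            _processedItemIds.TryRemove(entry.Key, out _);
        }
    }
}, null, TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(1));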
To stop events and processing items you should Dispose the subscription and, maybe, Complete the ActionBlock:
subscription.Dispose();
action.Complete();
You can find more information about Rx.NET in their guidelines on GitHub.
You could use an ActionBlock to do your processing; it has a built-in queue that you can post work to. You can read up on TPL Dataflow here: Intro to TPL-Dataflow and Introduction to Dataflow, Part 1. Finally, here is a quick sample to get you going. I've left out a lot, but it should at least get you started.
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;
namespace MyWorkProcessor {
public class WorkProcessor {
public WorkProcessor() {
Processor = CreatePipeline();
}
public async Task StartProcessing() {
try {
await Task.Run(() => GetWorkFromDatabase());
} catch (OperationCanceledException) {
//handle cancel
}
}
private CancellationTokenSource cts {
get;
} = new CancellationTokenSource();
private ITargetBlock<WorkItem> Processor {
get;
}
private TimeSpan DatabasePollingFrequency {
get;
} = TimeSpan.FromSeconds(5);
private ITargetBlock<WorkItem> CreatePipeline() {
var options = new ExecutionDataflowBlockOptions() {
BoundedCapacity = 100,
CancellationToken = cts.Token
};
return new ActionBlock<WorkItem>(item => ProcessWork(item), options);
}
private async Task GetWorkFromDatabase() {
while (!cts.IsCancellationRequested) {
var work = await GetWork();
await Processor.SendAsync(work);
await Task.Delay(DatabasePollingFrequency);
}
}
private async Task<WorkItem> GetWork() {
return await Context.GetWork();
}
private void ProcessWork(WorkItem item) {
//do processing
}
}
}
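For completeness, wiring this into the Windows Service from the question might look roughly like this (a sketch; shutdown and error handling omitted):
private WorkProcessor _workProcessor;
protected override void OnStart(string[] args)
{
    _workProcessor = new WorkProcessor();
    // fire and forget; StartProcessing polls the database and feeds the ActionBlock
    _ = _workProcessor.StartProcessing();
}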

Custom thread pool supporting async actions

I would like to have a custom thread pool satisfying the following requirements:
Real threads are preallocated according to the pool capacity. The actual work is free to use the standard .NET thread pool, if needed to spawn concurrent tasks.
The pool must be able to return the number of idle threads. The returned number may be less than the actual number of the idle threads, but it must not be greater. Of course, the more accurate the number the better.
Queuing work to the pool should return a corresponding Task, which should play nice with the Task-based API.
NEW The max job capacity (or degree of parallelism) should be adjustable dynamically. Trying to reduce the capacity does not have to take effect immediately, but increasing it should do so immediately.
The rationale for the first item is depicted below:
The machine is not supposed to be running more than N work items concurrently, where N is relatively small - between 10 and 30.
The work is fetched from the database and if K items are fetched then we want to make sure that there are K idle threads to start the work right away. A situation where work is fetched from the database, but remains waiting for the next available thread is unacceptable.
The last item also explains the reason for having the idle thread count - I am going to fetch that many work items from the database. It also explains why the reported idle thread count must never be higher than the actual one - otherwise I might fetch more work than can be immediately started.
Anyway, here is my implementation along with a small program to test it (BJE stands for Background Job Engine):
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
namespace TaskStartLatency
{
public class BJEThreadPool
{
private sealed class InternalTaskScheduler : TaskScheduler
{
private int m_idleThreadCount;
private readonly BlockingCollection<Task> m_bus;
public InternalTaskScheduler(int threadCount, BlockingCollection<Task> bus)
{
m_idleThreadCount = threadCount;
m_bus = bus;
}
public void RunInline(Task task)
{
Interlocked.Decrement(ref m_idleThreadCount);
try
{
TryExecuteTask(task);
}
catch
{
// The action is responsible itself for the error handling, for the time being...
}
Interlocked.Increment(ref m_idleThreadCount);
}
public int IdleThreadCount
{
get { return m_idleThreadCount; }
}
#region Overrides of TaskScheduler
protected override void QueueTask(Task task)
{
m_bus.Add(task);
}
protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
{
return TryExecuteTask(task);
}
protected override IEnumerable<Task> GetScheduledTasks()
{
throw new NotSupportedException();
}
#endregion
public void DecrementIdleThreadCount()
{
Interlocked.Decrement(ref m_idleThreadCount);
}
}
private class ThreadContext
{
private readonly InternalTaskScheduler m_ts;
private readonly BlockingCollection<Task> m_bus;
private readonly CancellationTokenSource m_cts;
public readonly Thread Thread;
public ThreadContext(string name, InternalTaskScheduler ts, BlockingCollection<Task> bus, CancellationTokenSource cts)
{
m_ts = ts;
m_bus = bus;
m_cts = cts;
Thread = new Thread(Start)
{
IsBackground = true,
Name = name
};
Thread.Start();
}
private void Start()
{
try
{
foreach (var task in m_bus.GetConsumingEnumerable(m_cts.Token))
{
m_ts.RunInline(task);
}
}
catch (OperationCanceledException)
{
}
m_ts.DecrementIdleThreadCount();
}
}
private readonly InternalTaskScheduler m_ts;
private readonly CancellationTokenSource m_cts = new CancellationTokenSource();
private readonly BlockingCollection<Task> m_bus = new BlockingCollection<Task>();
private readonly List<ThreadContext> m_threadCtxs = new List<ThreadContext>();
public BJEThreadPool(int threadCount)
{
m_ts = new InternalTaskScheduler(threadCount, m_bus);
for (int i = 0; i < threadCount; ++i)
{
m_threadCtxs.Add(new ThreadContext("BJE Thread " + i, m_ts, m_bus, m_cts));
}
}
public void Terminate()
{
m_cts.Cancel();
foreach (var t in m_threadCtxs)
{
t.Thread.Join();
}
}
public Task Run(Action<CancellationToken> action)
{
return Task.Factory.StartNew(() => action(m_cts.Token), m_cts.Token, TaskCreationOptions.DenyChildAttach, m_ts);
}
public Task Run(Action action)
{
return Task.Factory.StartNew(action, m_cts.Token, TaskCreationOptions.DenyChildAttach, m_ts);
}
public int IdleThreadCount
{
get { return m_ts.IdleThreadCount; }
}
}
class Program
{
static void Main()
{
const int THREAD_COUNT = 32;
var pool = new BJEThreadPool(THREAD_COUNT);
var tcs = new TaskCompletionSource<bool>();
var tasks = new List<Task>();
var allRunning = new CountdownEvent(THREAD_COUNT);
for (int i = pool.IdleThreadCount; i > 0; --i)
{
var index = i;
tasks.Add(pool.Run(cancellationToken =>
{
Console.WriteLine("Started action " + index);
allRunning.Signal();
tcs.Task.Wait(cancellationToken);
Console.WriteLine(" Ended action " + index);
}));
}
Console.WriteLine("pool.IdleThreadCount = " + pool.IdleThreadCount);
allRunning.Wait();
Debug.Assert(pool.IdleThreadCount == 0);
int expectedIdleThreadCount = THREAD_COUNT;
Console.WriteLine("Press [c]ancel, [e]rror, [a]bort or any other key");
switch (Console.ReadKey().KeyChar)
{
case 'c':
Console.WriteLine("Cancel All");
tcs.TrySetCanceled();
break;
case 'e':
Console.WriteLine("Error All");
tcs.TrySetException(new Exception("Failed"));
break;
case 'a':
Console.WriteLine("Abort All");
pool.Terminate();
expectedIdleThreadCount = 0;
break;
default:
Console.WriteLine("Done All");
tcs.TrySetResult(true);
break;
}
try
{
Task.WaitAll(tasks.ToArray());
}
catch (AggregateException exc)
{
Console.WriteLine(exc.Flatten().InnerException.Message);
}
Debug.Assert(pool.IdleThreadCount == expectedIdleThreadCount);
pool.Terminate();
Console.WriteLine("Press any key");
Console.ReadKey();
}
}
}
It is a very simple implementation and it appears to be working. However, there is a problem - the BJEThreadPool.Run method does not accept asynchronous methods. I.e. my implementation does not allow me to add the following overloads:
public Task Run(Func<CancellationToken, Task> action)
{
return Task.Factory.StartNew(() => action(m_cts.Token), m_cts.Token, TaskCreationOptions.DenyChildAttach, m_ts).Unwrap();
}
public Task Run(Func<Task> action)
{
return Task.Factory.StartNew(action, m_cts.Token, TaskCreationOptions.DenyChildAttach, m_ts).Unwrap();
}
The pattern I use in InternalTaskScheduler.RunInline does not work in this case.
So, my question is how to add the support for asynchronous work items? I am fine with changing the entire design as long as the requirements outlined at the beginning of the post are upheld.
EDIT
I would like to clarify the intended usage of the desired pool. Please observe the following code:
if (pool.IdleThreadCount == 0)
{
return;
}
foreach (var jobData in FetchFromDB(pool.IdleThreadCount))
{
pool.Run(CreateJobAction(jobData));
}
Notes:
The code is going to be run periodically, say every 1 minute.
The code is going to be run concurrently by multiple machines watching the same database.
FetchFromDB is going to use the technique described in Using SQL Server as a DB queue with multiple clients to atomically fetch and lock the work from the DB.
CreateJobAction is going to invoke the code denoted by jobData (the job code) and close the work upon the completion of that code. The job code is out of my control and it could be pretty much anything - heavy CPU bound code or light asynchronous IO bound code, badly written synchronous IO bound code or a mix of all. It could run for minutes and it could run for hours. Closing the work is my code and it would be asynchronous IO bound code. Because of this, the signature of the returned job action is that of an asynchronous method.
Item 2 underlines the importance of correctly identifying the number of idle threads. If there are 900 pending work items and 10 agent machines, I cannot allow an agent to fetch 300 work items and queue them on the thread pool. Why? Because it is most unlikely that the agent will be able to run 300 work items concurrently. It will run some, sure enough, but others will be waiting in the thread pool work queue. Suppose it runs 100 and lets 200 wait (even though 100 is probably far-fetched). That yields 3 fully loaded agents and 7 idle ones, yet only 300 work items out of 900 are actually being processed concurrently!
My goal is to maximize the spread of the work amongst the available agents. Ideally, I should evaluate the load of an agent and the "heaviness" of the pending work, but it is a formidable task and is reserved for the future versions. Right now, I wish to assign each agent the max job capacity with the intention to provide the means to increase/decrease it dynamically without restarting the agents.
Next observation: the work can take quite a long time to run and it could be all synchronous code. As far as I understand, it is undesirable to utilize thread pool threads for that kind of work.
EDIT 2
There is a statement that TaskScheduler is only for CPU-bound work. But what if I do not know the nature of the work? I mean, it is a general-purpose Background Job Engine and it runs thousands of different kinds of jobs. I have no means to tell "this job is CPU bound", "that one is synchronous IO bound", and yet another one is asynchronous IO bound. I wish I could, but I cannot.
EDIT 3
In the end, I do not use the SemaphoreSlim, but neither do I use the TaskScheduler - it finally trickled down my thick skull that it is inappropriate and plain wrong, plus it makes the code overly complex.
Still, I failed to see how SemaphoreSlim is the way. The proposed pattern:
public async Task Enqueue(Func<Task> taskGenerator)
{
await semaphore.WaitAsync();
try
{
await taskGenerator();
}
finally
{
semaphore.Release();
}
}
It expects taskGenerator either to be asynchronous IO bound code or to open a new thread otherwise. However, I have no means to determine which kind of work is going to be executed. Plus, as I have learned from SemaphoreSlim.WaitAsync continuation code, if the semaphore is unlocked, the code following WaitAsync() is going to run on the same thread, which is not very good for me.
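For that last point, one workaround (with the usual Task.Run trade-offs) is to push the delegate onto the thread pool explicitly, so a long-running synchronous job never executes on the caller's thread; a sketch:
public async Task Enqueue(Func<Task> taskGenerator)
{
    await semaphore.WaitAsync().ConfigureAwait(false);
    try
    {
        // Task.Run guarantees the (possibly synchronous) work starts
        // on a thread pool thread rather than on the caller's thread
        await Task.Run(taskGenerator);
    }
    finally
    {
        semaphore.Release();
    }
}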
Anyway, below is my implementation, in case anyone fancies it. Unfortunately, I have yet to understand how to reduce the pool thread count dynamically, but this is a topic for another question.
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
namespace TaskStartLatency
{
public interface IBJEThreadPool
{
void SetThreadCount(int threadCount);
void Terminate();
Task Run(Action action);
Task Run(Action<CancellationToken> action);
Task Run(Func<Task> action);
Task Run(Func<CancellationToken, Task> action);
int IdleThreadCount { get; }
}
public class BJEThreadPool : IBJEThreadPool
{
private interface IActionContext
{
Task Run(CancellationToken ct);
TaskCompletionSource<object> TaskCompletionSource { get; }
}
private class ActionContext : IActionContext
{
private readonly Action m_action;
public ActionContext(Action action)
{
m_action = action;
TaskCompletionSource = new TaskCompletionSource<object>();
}
#region Implementation of IActionContext
public Task Run(CancellationToken ct)
{
m_action();
return null;
}
public TaskCompletionSource<object> TaskCompletionSource { get; private set; }
#endregion
}
private class CancellableActionContext : IActionContext
{
private readonly Action<CancellationToken> m_action;
public CancellableActionContext(Action<CancellationToken> action)
{
m_action = action;
TaskCompletionSource = new TaskCompletionSource<object>();
}
#region Implementation of IActionContext
public Task Run(CancellationToken ct)
{
m_action(ct);
return null;
}
public TaskCompletionSource<object> TaskCompletionSource { get; private set; }
#endregion
}
private class AsyncActionContext : IActionContext
{
private readonly Func<Task> m_action;
public AsyncActionContext(Func<Task> action)
{
m_action = action;
TaskCompletionSource = new TaskCompletionSource<object>();
}
#region Implementation of IActionContext
public Task Run(CancellationToken ct)
{
return m_action();
}
public TaskCompletionSource<object> TaskCompletionSource { get; private set; }
#endregion
}
private class AsyncCancellableActionContext : IActionContext
{
private readonly Func<CancellationToken, Task> m_action;
public AsyncCancellableActionContext(Func<CancellationToken, Task> action)
{
m_action = action;
TaskCompletionSource = new TaskCompletionSource<object>();
}
#region Implementation of IActionContext
public Task Run(CancellationToken ct)
{
return m_action(ct);
}
public TaskCompletionSource<object> TaskCompletionSource { get; private set; }
#endregion
}
private readonly CancellationTokenSource m_ctsTerminateAll = new CancellationTokenSource();
private readonly BlockingCollection<IActionContext> m_bus = new BlockingCollection<IActionContext>();
private readonly LinkedList<Thread> m_threads = new LinkedList<Thread>();
private int m_idleThreadCount;
private static int s_threadCount;
public BJEThreadPool(int threadCount)
{
ReserveAdditionalThreads(threadCount);
}
private void ReserveAdditionalThreads(int n)
{
for (int i = 0; i < n; ++i)
{
var index = Interlocked.Increment(ref s_threadCount) - 1;
var t = new Thread(Start)
{
IsBackground = true,
Name = "BJE Thread " + index
};
Interlocked.Increment(ref m_idleThreadCount);
t.Start();
m_threads.AddLast(t);
}
}
private void Start()
{
try
{
foreach (var actionContext in m_bus.GetConsumingEnumerable(m_ctsTerminateAll.Token))
{
RunWork(actionContext).Wait();
}
}
catch (OperationCanceledException)
{
}
catch
{
// Should never happen - log the error
}
Interlocked.Decrement(ref m_idleThreadCount);
}
private async Task RunWork(IActionContext actionContext)
{
Interlocked.Decrement(ref m_idleThreadCount);
try
{
var task = actionContext.Run(m_ctsTerminateAll.Token);
if (task != null)
{
await task;
}
actionContext.TaskCompletionSource.SetResult(null);
}
catch (OperationCanceledException)
{
actionContext.TaskCompletionSource.TrySetCanceled();
}
catch (Exception exc)
{
actionContext.TaskCompletionSource.TrySetException(exc);
}
Interlocked.Increment(ref m_idleThreadCount);
}
private Task PostWork(IActionContext actionContext)
{
m_bus.Add(actionContext);
return actionContext.TaskCompletionSource.Task;
}
#region Implementation of IBJEThreadPool
public void SetThreadCount(int threadCount)
{
if (threadCount > m_threads.Count)
{
ReserveAdditionalThreads(threadCount - m_threads.Count);
}
else if (threadCount < m_threads.Count)
{
throw new NotSupportedException();
}
}
public void Terminate()
{
m_ctsTerminateAll.Cancel();
foreach (var t in m_threads)
{
t.Join();
}
}
public Task Run(Action action)
{
return PostWork(new ActionContext(action));
}
public Task Run(Action<CancellationToken> action)
{
return PostWork(new CancellableActionContext(action));
}
public Task Run(Func<Task> action)
{
return PostWork(new AsyncActionContext(action));
}
public Task Run(Func<CancellationToken, Task> action)
{
return PostWork(new AsyncCancellableActionContext(action));
}
public int IdleThreadCount
{
get { return m_idleThreadCount; }
}
#endregion
}
public static class Extensions
{
public static Task WithCancellation(this Task task, CancellationToken token)
{
return task.ContinueWith(t => t.GetAwaiter().GetResult(), token);
}
}
class Program
{
static void Main()
{
const int THREAD_COUNT = 16;
var pool = new BJEThreadPool(THREAD_COUNT);
var tcs = new TaskCompletionSource<bool>();
var tasks = new List<Task>();
var allRunning = new CountdownEvent(THREAD_COUNT);
for (int i = pool.IdleThreadCount; i > 0; --i)
{
var index = i;
tasks.Add(pool.Run(async ct =>
{
Console.WriteLine("Started action " + index);
allRunning.Signal();
await tcs.Task.WithCancellation(ct);
Console.WriteLine(" Ended action " + index);
}));
}
Console.WriteLine("pool.IdleThreadCount = " + pool.IdleThreadCount);
allRunning.Wait();
Debug.Assert(pool.IdleThreadCount == 0);
int expectedIdleThreadCount = THREAD_COUNT;
Console.WriteLine("Press [c]ancel, [e]rror, [a]bort or any other key");
switch (Console.ReadKey().KeyChar)
{
case 'c':
Console.WriteLine("ancel All");
tcs.TrySetCanceled();
break;
case 'e':
Console.WriteLine("rror All");
tcs.TrySetException(new Exception("Failed"));
break;
case 'a':
Console.WriteLine("bort All");
pool.Terminate();
expectedIdleThreadCount = 0;
break;
default:
Console.WriteLine("Done All");
tcs.TrySetResult(true);
break;
}
try
{
Task.WaitAll(tasks.ToArray());
}
catch (AggregateException exc)
{
Console.WriteLine(exc.Flatten().InnerException.Message);
}
Debug.Assert(pool.IdleThreadCount == expectedIdleThreadCount);
pool.Terminate();
Console.WriteLine("Press any key");
Console.ReadKey();
}
}
}
Asynchronous "work items" are often based on async IO. Async IO does not use threads while it runs. Task schedulers are used to execute CPU work (tasks based on a delegate). The concept TaskScheduler does not apply. You cannot use a custom TaskScheduler to influence what async code does.
Make your work items throttle themselves:
static SemaphoreSlim sem = new SemaphoreSlim(maxDegreeOfParallelism); //shared object
async Task MyWorkerFunction()
{
await sem.WaitAsync();
try
{
MyWork();
}
finally
{
sem.Release();
}
}
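SemaphoreSlim can also cover the dynamic-capacity requirement: Release(n) raises the limit immediately, while lowering it can be done by quietly retiring slots as running work items return them (a sketch of the idea, untested):
using System;
using System.Threading;
using System.Threading.Tasks;
class AdjustableThrottle
{
    private readonly SemaphoreSlim _sem;
    private int _pendingReductions; // slots to retire instead of releasing
    public AdjustableThrottle(int initialLimit)
    {
        _sem = new SemaphoreSlim(initialLimit);
    }
    // takes effect immediately
    public void IncreaseLimit(int by) => _sem.Release(by);
    // takes effect gradually, as running work items finish
    public void DecreaseLimit(int by) => Interlocked.Add(ref _pendingReductions, by);
    public async Task Run(Func<Task> work)
    {
        await _sem.WaitAsync();
        try
        {
            await work();
        }
        finally
        {
            if (Interlocked.Decrement(ref _pendingReductions) >= 0)
            {
                // a reduction was pending - retire this slot instead of releasing it
            }
            else
            {
                Interlocked.Increment(ref _pendingReductions); // undo, nothing was pending
                _sem.Release();
            }
        }
    }
}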
As mentioned in another answer by usr, you can't do this with a TaskScheduler, as that only applies to CPU-bound work and cannot limit the level of parallelism of all types of work. He also shows how you can use a SemaphoreSlim to asynchronously limit the degree of parallelism.
You can expand on this to generalize these concepts in a few ways. The one that seems most beneficial for you would be to create a special type of queue that takes operations returning a Task and executes them in such a way that a given max degree of parallelism is maintained.
public class FixedParallelismQueue
{
private SemaphoreSlim semaphore;
public FixedParallelismQueue(int maxDegreesOfParallelism)
{
semaphore = new SemaphoreSlim(maxDegreesOfParallelism,
maxDegreesOfParallelism);
}
public async Task<T> Enqueue<T>(Func<Task<T>> taskGenerator)
{
await semaphore.WaitAsync();
try
{
return await taskGenerator();
}
finally
{
semaphore.Release();
}
}
public async Task Enqueue(Func<Task> taskGenerator)
{
await semaphore.WaitAsync();
try
{
await taskGenerator();
}
finally
{
semaphore.Release();
}
}
}
This allows you to create a queue for your application (you can even have several separate queues if you want) that has a fixed degree of parallelism. You can then provide operations that return a Task when they complete, and the queue will schedule them when it can, returning a Task that represents when each unit of work has finished.
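Usage might then look like this (ProcessJobAsync, ComputeAsync, jobData and input are hypothetical placeholders):
var queue = new FixedParallelismQueue(maxDegreesOfParallelism: 20);
// queue many jobs; at most 20 run concurrently
Task t1 = queue.Enqueue(() => ProcessJobAsync(jobData));
Task<int> t2 = queue.Enqueue(() => ComputeAsync(input));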
