How to reset a postponed / declined message in TPL Dataflow - c#

I am using TPL Dataflow (TDF) for my application, which works great so far. Unfortunately, I have stumbled upon a specific problem that apparently cannot be handled directly with the existing Dataflow mechanisms:
I have N producers (BufferBlocks in this case) that are all linked to the same single ActionBlock. This block processes one item at a time and also has capacity for only one item.
I also want to add a filter to the link from the producers to the ActionBlock, but the special case here is that the filter condition can change independently of the processed item, and the item must not be discarded!
So basically I want to process all items, but the order/time at which an item is processed can change.
Unfortunately I learned that once an item is "declined" (the filter condition evaluates to false) and it is not passed to another block (e.g. a NullTarget), the target block does not retry the same item (and does not re-evaluate the filter).
public class ConsumeTest
{
private readonly BufferBlock<int> m_bufferBlock1;
private readonly BufferBlock<int> m_bufferBlock2;
private readonly ActionBlock<int> m_actionBlock;
public ConsumeTest()
{
m_bufferBlock1 = new BufferBlock<int>();
m_bufferBlock2 = new BufferBlock<int>();
var options = new ExecutionDataflowBlockOptions() { BoundedCapacity = 1, MaxDegreeOfParallelism = 1 };
m_actionBlock = new ActionBlock<int>((item) => BlockAction(item), options);
var start = DateTime.Now;
var elapsed = TimeSpan.FromMinutes(1);
m_bufferBlock1.LinkTo(m_actionBlock, x => IsTimeElapsed(start, elapsed));
m_bufferBlock2.LinkTo(m_actionBlock);
FillBuffers();
}
private void BlockAction(int item)
{
Console.WriteLine(item);
Thread.Sleep(2000);
}
private void FillBuffers()
{
for (int i = 0; i < 1000; i++)
{
if (i % 2 == 0)
{
m_bufferBlock1.Post(i);
}
else
{
m_bufferBlock2.Post(i);
}
}
}
private bool IsTimeElapsed(DateTime start, TimeSpan elapsed)
{
Console.WriteLine("checking time elapsed");
return DateTime.Now > (start + elapsed);
}
public async Task Start()
{
await m_actionBlock.Completion;
}
}
The code sets up a testing pipeline and fills the two buffers with even and odd numbers. Both BufferBlocks are connected to one single ActionBlock that only prints the "processed" number and waits 2 seconds.
The filter condition between m_bufferBlock1 and the m_actionBlock checks (for testing purposes) whether 1 minute has elapsed since we started the whole thing.
If we run this, it generates the following output:
1
checking time elapsed
3
5
7
9
11
13
15
17
19
As we can see, the ActionBlock takes the first element from the BufferBlock without the filter, then tries to take an element from the BufferBlock with the filter. The filter evaluates to false, and the ActionBlock then continues taking all elements from the block without the filter.
My expectation was that after an element from the BufferBlock without the filter has been processed, the ActionBlock would again try to take the element from the other BufferBlock (the one with the filter), re-evaluating the filter.
This would be my expected (or desired) result:
1
checking time elapsed
3
checking time elapsed
5
checking time elapsed
7
checking time elapsed
9
checking time elapsed
11
checking time elapsed
13
checking time elapsed
15
// after timer has elapsed take elements also from other buffer
2
17
4
19
My question now is: is there a way to "reset" the already "declined" message so that it is evaluated again, or is there another way to model this? To be clear, it is NOT important that items are pulled from both buffers in strict alternation (I know this is scheduling dependent, and it is totally fine if two items from the same block are dequeued from time to time).
But it is important that the "declined" message is neither discarded nor re-queued, as the order within one buffer matters.
Thank you in advance

One idea is to refresh the link between the two blocks, periodically or on demand. Implementing a periodically refreshable LinkTo is not very difficult. Here is an implementation:
public static IDisposable LinkTo<TOutput>(this ISourceBlock<TOutput> source,
ITargetBlock<TOutput> target, Predicate<TOutput> predicate,
TimeSpan refreshInterval, DataflowLinkOptions linkOptions = null)
{
if (source == null) throw new ArgumentNullException(nameof(source));
if (target == null) throw new ArgumentNullException(nameof(target));
if (predicate == null) throw new ArgumentNullException(nameof(predicate));
if (refreshInterval < TimeSpan.Zero)
throw new ArgumentOutOfRangeException(nameof(refreshInterval));
linkOptions = linkOptions ?? new DataflowLinkOptions();
var locker = new object();
var cts = new CancellationTokenSource();
var token = cts.Token;
var currentLink = source.LinkTo(target, linkOptions, predicate);
var loopTask = Task.Run(async () =>
{
try
{
while (true)
{
await Task.Delay(refreshInterval, token).ConfigureAwait(false);
currentLink.Dispose();
currentLink = source.LinkTo(target, linkOptions, predicate);
}
}
finally
{
lock (locker) { cts.Dispose(); cts = null; }
}
}, token);
_ = Task.Factory.ContinueWhenAny(new[] { source.Completion, target.Completion },
_ => { lock (locker) cts?.Cancel(); }, token, TaskContinuationOptions.None,
TaskScheduler.Default);
return new Unlinker(() =>
{
lock (locker) cts?.Cancel();
// Wait synchronously for the task to complete, ignoring cancellation exceptions.
try { loopTask.GetAwaiter().GetResult(); } catch (OperationCanceledException) { }
currentLink.Dispose();
});
}
private struct Unlinker : IDisposable
{
private readonly Action _action;
public Unlinker(Action disposeAction) => _action = disposeAction;
void IDisposable.Dispose() => _action?.Invoke();
}
Usage example:
m_bufferBlock1.LinkTo(m_actionBlock, x => IsTimeElapsed(start, elapsed),
refreshInterval: TimeSpan.FromSeconds(10));
The link between the m_bufferBlock1 and the m_actionBlock will be refreshed every 10 seconds, until one of the two blocks completes.
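If the refresh should happen on demand instead of periodically, a minimal alternative is to keep the IDisposable returned by LinkTo and recreate the link whenever the filter condition may have changed. A rough sketch (the m_link field, the RefreshLink method and the m_start/m_elapsed fields are illustrative, not part of the answer above):
// Keep the current link so that it can be refreshed on demand.
private IDisposable m_link;

private void RefreshLink()
{
    // Disposing the old link and relinking with the same predicate causes the
    // source block to offer its buffered (previously declined) items again.
    m_link?.Dispose();
    m_link = m_bufferBlock1.LinkTo(m_actionBlock, x => IsTimeElapsed(m_start, m_elapsed));
}
Here m_start and m_elapsed stand for the values that were locals in the constructor of the original example.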

Related

Reactive Extensions: Items not processed, when busy processing previous items

I'm using Rx to process events in groups of at most X items, or after no new events have been sent for Y ms. For this purpose I use the code suggested in this answer here.
This worked fine until I noticed that, when I wait until all events are processed before continuing, some events stay in the queue unprocessed.
I found out that this happens when a new event is sent while the queue is still busy processing the previous events. If an event is sent after the processing has finished, all events in the queue are processed, even the events which were stuck previously.
Test code and output
To reproduce the behavior I wrote some sample code which adds numbers to the subject and then waits until the number is processed or the timeout is hit (10 sec).
If the number is processed, another number is added.
This second number is sent while the previous number is still being processed, which leads to the described behavior.
The behavior can also be reproduced without any delay; it just won't happen regularly, because the timing has to be just right for it to happen.
private DispatcherScheduler _schedulerProviderDispatcher = new DispatcherScheduler(Application.Current.Dispatcher);
private Subject<int> _numberEvents = new Subject<int>();
private readonly ConcurrentDictionary<int, object> _numbersInProgress = new ConcurrentDictionary<int, object>();
private IDisposable _disposable;
public async Task SendNumbersAsync()
{
_disposable = _numberEvents.Synchronize()
.BufferUntilInactive(TimeSpan.FromMilliseconds(2000), _schedulerProviderDispatcher, 100)
.Subscribe(numbers =>
{
if (numbers.Count == 0)
return;
var values = string.Empty;
// Handle numbers in queue and remove them from the dictionary
for (var i = 0; i < numbers.Count; ++i)
{
var number = numbers[i];
if (_numbersInProgress.TryRemove(number, out _) == false)
Trace.WriteLine($"Failed to remove number '{number}'");
values += $"{number}, ";
Trace.WriteLine($"Handled Number: {number}, Count: {i + 1}/{numbers.Count}");
}
Trace.WriteLine($"Handled '{numbers.Count}' numbers. Values: '{values}'");
// delay the execution by 1000ms to simulate a slow processing of the numbers
// and create a timeframe where new numbers will be sent and received, but the subject is still busy with the previous processing
var task = Task.Delay(1000);
Task.WaitAll(task);
Trace.WriteLine($"Finished handling '{numbers.Count}' numbers. Values: '{values}'");
});
// push numbers to the subject
var number = 0;
while (_numbersInProgress.Count == 0)
{
++number;
Trace.WriteLine($"Sending number: {number}");
_numberEvents.OnNext(number);
Trace.WriteLine($"Finished sending number: {number}");
// add the number to the progress dictionary
if (_numbersInProgress.TryAdd(number, 0) == false)
Trace.WriteLine($"Failed to add number '{number}'");
var waitCount = 0;
var timedOut = false;
// wait for the numbers to be processed
// if we waited 100 times (10sec) stop
while (_numbersInProgress.Count != 0)
{
Trace.WriteLine($"Waiting ({waitCount})");
await Task.Delay(100);
++waitCount;
if (waitCount > 100)
{
timedOut = true;
break;
}
}
if (timedOut)
Trace.WriteLine($"Timeout waiting: {number}");
else
Trace.WriteLine($"Finished waiting: {number}");
}
// if we still have unprocessed numbers
// send another number to the subject to trigger processing of both numbers
if (_numbersInProgress.Count != 0)
{
// Failed
Trace.WriteLine($"Failed waiting: {number}");
Trace.WriteLine($"Sending number: {9999}");
_numberEvents.OnNext(9999);
Trace.WriteLine($"Finished sending number: {9999}");
}
}
And the extension method, used to group the numbers:
public static IObservable<IList<T>> BufferUntilInactive<T>(this IObservable<T> stream, TimeSpan delay, IScheduler scheduler = null, Int32? maxCount = null)
{
var s = scheduler ?? Scheduler.Default;
var publish = stream.Publish(p =>
{
var closes = p.Throttle(delay, s);
if (maxCount != null)
{
var overflows = p.Where((x, index) => index + 1 >= maxCount);
closes = closes.Amb(overflows);
}
return p.Window(() => closes).SelectMany(window => window.ToList());
});
return publish;
}
When executing this method, the trace shows the following sequence:
the second number is sent
number 2 is sent before the processing of the previously sent number 1 has finished
the wait for the second number times out
the loop hits the timeout condition after 100 wait cycles and then sends a new number, which triggers the processing of both numbers in the queue
As you can see, I've already tried to solve this problem by encapsulating the throttling and the grouping of the items (BufferUntilInactive) in a Publish, but without success.
Am I missing anything in the BufferUntilInactive method or somewhere else?

Number of Request before DDOSing. Limiting # of async Tasks [duplicate]

I am using the HTTPClient in System.Net.Http to make requests against an API. The API is limited to 10 requests per second.
My code is roughly like so:
List<Task> tasks = new List<Task>();
tasks.AddRange(items.Select(i => ProcessItem(i)));
try
{
await Task.WhenAll(tasks.ToArray());
}
catch (Exception ex)
{
}
The ProcessItem method does a few things, but always calls the API using the following:
await SendRequestAsync(..blah), which looks like:
private async Task<Response> SendRequestAsync(HttpRequestMessage request, CancellationToken token)
{
token.ThrowIfCancellationRequested();
var response = await HttpClient
.SendAsync(request: request, cancellationToken: token).ConfigureAwait(continueOnCapturedContext: false);
token.ThrowIfCancellationRequested();
return await Response.BuildResponse(response);
}
Originally the code worked fine but when I started using Task.WhenAll I started getting 'Rate Limit Exceeded' messages from the API. How can I limit the rate at which requests are made?
It's worth noting that ProcessItem can make between 1-4 API calls depending on the item.
The API is limited to 10 requests per second.
Then just have your code do a batch of 10 requests, ensuring they take at least one second:
Items[] items = ...;
int index = 0;
while (index < items.Length)
{
var timer = Task.Delay(TimeSpan.FromSeconds(1.2)); // ".2" to make sure
var tasks = items.Skip(index).Take(10).Select(i => ProcessItemsAsync(i));
var tasksAndTimer = tasks.Concat(new[] { timer });
await Task.WhenAll(tasksAndTimer);
index += 10;
}
Update
My ProcessItems method makes 1-4 API calls depending on the item.
In this case, batching is not an appropriate solution. You need to limit an asynchronous method to a certain number of concurrent calls, which implies a SemaphoreSlim. The tricky part is that you want to allow more calls over time.
I haven't tried this code, but the general idea I would go with is to have a periodic function that releases the semaphore up to 10 times. So, something like this:
private readonly SemaphoreSlim _semaphore = new SemaphoreSlim(10);
private async Task<Response> ThrottledSendRequestAsync(HttpRequestMessage request, CancellationToken token)
{
await _semaphore.WaitAsync(token);
return await SendRequestAsync(request, token);
}
private async Task PeriodicallyReleaseAsync(Task stop)
{
while (true)
{
var timer = Task.Delay(TimeSpan.FromSeconds(1.2));
if (await Task.WhenAny(timer, stop) == stop)
return;
// Release the semaphore at most 10 times.
for (int i = 0; i != 10; ++i)
{
try
{
_semaphore.Release();
}
catch (SemaphoreFullException)
{
break;
}
}
}
}
Usage:
// Start the periodic task, with a signal that we can use to stop it.
var stop = new TaskCompletionSource<object>();
var periodicTask = PeriodicallyReleaseAsync(stop.Task);
// Wait for all item processing.
await Task.WhenAll(taskList);
// Stop the periodic task.
stop.SetResult(null);
await periodicTask;
The answer is similar to this one.
Instead of using a list of tasks and WhenAll, use Parallel.ForEach and use ParallelOptions to limit the number of concurrent tasks to 10, and make sure each one takes at least 1 second:
Parallel.ForEach(
items,
new ParallelOptions { MaxDegreeOfParallelism = 10 },
async item => {
ProcessItems(item);
await Task.Delay(1000);
}
);
Or if you want to make sure each item takes as close to 1 second as possible:
Parallel.ForEach(
searches,
new ParallelOptions { MaxDegreeOfParallelism = 10 },
async item => {
var watch = new Stopwatch();
watch.Start();
ProcessItems(item);
watch.Stop();
if (watch.ElapsedMilliseconds < 1000) await Task.Delay((int)(1000 - watch.ElapsedMilliseconds));
}
);
Or:
Parallel.ForEach(
searches,
new ParallelOptions { MaxDegreeOfParallelism = 10 },
async item => {
await Task.WhenAll(
Task.Delay(1000),
Task.Run(() => { ProcessItems(item); })
);
}
);
UPDATED ANSWER
My ProcessItems method makes 1-4 API calls depending on the item. So with a batch size of 10 I still exceed the rate limit.
You need to implement a rolling window in SendRequestAsync. A queue containing the timestamp of each request is a suitable data structure. You dequeue entries whose timestamp is older than the rate window (one second for a limit of 10 requests per second). As it so happens, there is an implementation as an answer to a similar question on SO.
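As a rough illustration of that idea (a sketch only, with illustrative names; the implementation linked above is more complete):
// Sketch of a rolling-window limiter: at most 'maxRequests' calls may proceed
// within any window of the given length.
private readonly Queue<DateTime> _requestTimestamps = new Queue<DateTime>();
private readonly SemaphoreSlim _windowLock = new SemaphoreSlim(1, 1);

private async Task WaitForRollingWindowAsync(int maxRequests, TimeSpan window, CancellationToken token)
{
    await _windowLock.WaitAsync(token);
    try
    {
        // Drop timestamps that have fallen out of the window.
        while (_requestTimestamps.Count > 0 && DateTime.UtcNow - _requestTimestamps.Peek() > window)
            _requestTimestamps.Dequeue();
        if (_requestTimestamps.Count >= maxRequests)
        {
            // The window is full: wait until the oldest entry leaves it.
            var wait = window - (DateTime.UtcNow - _requestTimestamps.Peek());
            if (wait > TimeSpan.Zero) await Task.Delay(wait, token);
            _requestTimestamps.Dequeue();
        }
        _requestTimestamps.Enqueue(DateTime.UtcNow);
    }
    finally { _windowLock.Release(); }
}
SendRequestAsync would then call await WaitForRollingWindowAsync(10, TimeSpan.FromSeconds(1), token); before issuing the actual HTTP request.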
ORIGINAL ANSWER
May still be useful to others
One straightforward way to handle this is to batch your requests in groups of 10, run those concurrently, and then wait until a total of 10 seconds has elapsed (if it hasn't already). This will bring you in right at the rate limit if the batch of requests can complete in 10 seconds, but is less than optimal if the batch of requests takes longer. Have a look at the .Batch() extension method in MoreLinq. Code would look approximately like
foreach (var taskList in tasks.Batch(10))
{
Stopwatch sw = Stopwatch.StartNew(); // From System.Diagnostics
await Task.WhenAll(taskList.ToArray());
if (sw.Elapsed.TotalSeconds < 10.0)
{
// Calculate how long you still have to wait and sleep that long.
// You might want to wait 10.5 or 11 seconds just in case the rate
// limiting on the other side isn't perfectly implemented.
await Task.Delay(TimeSpan.FromSeconds(10.0) - sw.Elapsed);
}
}
https://github.com/thomhurst/EnumerableAsyncProcessor
I've written a library to help with this sort of logic.
Usage would be:
var responses = await AsyncProcessorBuilder.WithItems(items) // Or Extension Method: items.ToAsyncProcessorBuilder()
.SelectAsync(item => ProcessItem(item), CancellationToken.None)
.ProcessInParallel(levelOfParallelism: 10, TimeSpan.FromSeconds(1));

C# Abortable Asynchronous Fifo Queue - leaking massive amounts of memory

I need to process data from a producer in FIFO fashion with the ability to abort processing if the same producer produces a new bit of data.
So I implemented an abortable FIFO queue based on Stephen Cleary's AsyncCollection (called AsyncCollectionAbortableFifoQueue in my sample) and one based on TPL's BufferBlock (BufferBlockAbortableAsyncFifoQueue in my sample). Here's the implementation based on AsyncCollection:
public class AsyncCollectionAbortableFifoQueue<T> : IExecutableAsyncFifoQueue<T>
{
private AsyncCollection<AsyncWorkItem<T>> taskQueue = new AsyncCollection<AsyncWorkItem<T>>();
private readonly CancellationToken stopProcessingToken;
public AsyncCollectionAbortableFifoQueue(CancellationToken cancelToken)
{
stopProcessingToken = cancelToken;
_ = processQueuedItems();
}
public Task<T> EnqueueTask(Func<Task<T>> action, CancellationToken? cancelToken)
{
var tcs = new TaskCompletionSource<T>();
var item = new AsyncWorkItem<T>(tcs, action, cancelToken);
taskQueue.Add(item);
return tcs.Task;
}
protected virtual async Task processQueuedItems()
{
while (!stopProcessingToken.IsCancellationRequested)
{
try
{
var item = await taskQueue.TakeAsync(stopProcessingToken).ConfigureAwait(false);
if (item.CancelToken.HasValue && item.CancelToken.Value.IsCancellationRequested)
item.TaskSource.SetCanceled();
else
{
try
{
T result = await item.Action().ConfigureAwait(false);
item.TaskSource.SetResult(result); // Indicate completion
}
catch (Exception ex)
{
if (ex is OperationCanceledException && ((OperationCanceledException)ex).CancellationToken == item.CancelToken)
item.TaskSource.SetCanceled();
else
item.TaskSource.SetException(ex);
}
}
}
catch (Exception) { }
}
}
}
public interface IExecutableAsyncFifoQueue<T>
{
Task<T> EnqueueTask(Func<Task<T>> action, CancellationToken? cancelToken);
}
processQueuedItems is the task that dequeues AsyncWorkItems from the queue and executes them unless cancellation has been requested.
The asynchronous action to execute gets wrapped into an AsyncWorkItem, which looks like this:
internal class AsyncWorkItem<T>
{
public readonly TaskCompletionSource<T> TaskSource;
public readonly Func<Task<T>> Action;
public readonly CancellationToken? CancelToken;
public AsyncWorkItem(TaskCompletionSource<T> taskSource, Func<Task<T>> action, CancellationToken? cancelToken)
{
TaskSource = taskSource;
Action = action;
CancelToken = cancelToken;
}
}
Then there's a task that watches the queue, dequeues items, and either processes them or aborts if the CancellationToken has been triggered.
That all works just fine - data gets processed, and if a new piece of data is received, processing of the old one is aborted. My problem now stems from these queues leaking massive amounts of memory if I crank up the usage (the producer producing a lot more than the consumer processes). Given that it's abortable, the data that is not processed should be discarded and eventually disappear from memory.
So let's look at how I'm using these queues. I have a 1:1 match of producer and consumer. Every consumer handles the data of a single producer. Whenever I get a new data item, and it doesn't match the previous one, I fetch the queue for the given producer (User.UserId) or create a new one (the 'executor' in the code snippet). Then I have a ConcurrentDictionary that holds a CancellationTokenSource per producer/consumer combo. If there's a previous CancellationTokenSource, I call Cancel on it and Dispose it 20 seconds later (immediate disposal would cause exceptions in the queue). I then enqueue processing of the new data. The queue returns a task that I can await, so I know when processing of the data is complete, and I then return the result.
Here's that in code
internal class SimpleLeakyConsumer
{
private ConcurrentDictionary<string, IExecutableAsyncFifoQueue<bool>> groupStateChangeExecutors = new ConcurrentDictionary<string, IExecutableAsyncFifoQueue<bool>>();
private readonly ConcurrentDictionary<string, CancellationTokenSource> userStateChangeAborters = new ConcurrentDictionary<string, CancellationTokenSource>();
protected CancellationTokenSource serverShutDownSource;
private readonly int operationDuration = 1000;
internal SimpleLeakyConsumer(CancellationTokenSource serverShutDownSource, int operationDuration)
{
this.serverShutDownSource = serverShutDownSource;
this.operationDuration = operationDuration * 1000; // convert from seconds to milliseconds
}
internal async Task<bool> ProcessStateChange(string userId)
{
var executor = groupStateChangeExecutors.GetOrAdd(userId, new AsyncCollectionAbortableFifoQueue<bool>(serverShutDownSource.Token));
CancellationTokenSource oldSource = null;
using (var cancelSource = userStateChangeAborters.AddOrUpdate(userId, new CancellationTokenSource(), (key, existingValue) =>
{
oldSource = existingValue;
return new CancellationTokenSource();
}))
{
if (oldSource != null && !oldSource.IsCancellationRequested)
{
oldSource.Cancel();
_ = delayedDispose(oldSource);
}
try
{
var executionTask = executor.EnqueueTask(async () => { await Task.Delay(operationDuration, cancelSource.Token).ConfigureAwait(false); return true; }, cancelSource.Token);
var result = await executionTask.ConfigureAwait(false);
userStateChangeAborters.TryRemove(userId, out var aborter);
return result;
}
catch (Exception e)
{
if (e is TaskCanceledException || e is OperationCanceledException)
return true;
else
{
userStateChangeAborters.TryRemove(userId, out var aborter);
return false;
}
}
}
}
private async Task delayedDispose(CancellationTokenSource src)
{
try
{
await Task.Delay(20 * 1000).ConfigureAwait(false);
}
finally
{
try
{
src.Dispose();
}
catch (ObjectDisposedException) { }
}
}
}
In this sample implementation, all that is being done is wait, then return true.
To test this mechanism, I wrote the following Data producer class:
internal class SimpleProducer
{
//variables defining the test
readonly int nbOfusers = 10;
readonly int minimumDelayBetweenTest = 1; // seconds
readonly int maximumDelayBetweenTests = 6; // seconds
readonly int operationDuration = 3; // number of seconds an operation takes in the tester
private readonly Random rand;
private List<User> users;
private readonly SimpleLeakyConsumer consumer;
protected CancellationTokenSource serverShutDownSource, testAbortSource;
private CancellationToken internalToken = CancellationToken.None;
internal SimpleProducer()
{
rand = new Random();
testAbortSource = new CancellationTokenSource();
serverShutDownSource = new CancellationTokenSource();
generateTestObjects(nbOfusers, 0, false);
consumer = new SimpleLeakyConsumer(serverShutDownSource, operationDuration);
}
internal void StartTests()
{
if (internalToken == CancellationToken.None || internalToken.IsCancellationRequested)
{
internalToken = testAbortSource.Token;
foreach (var user in users)
_ = setNewUserPresence(internalToken, user);
}
}
internal void StopTests()
{
testAbortSource.Cancel();
try
{
testAbortSource.Dispose();
}
catch (ObjectDisposedException) { }
testAbortSource = new CancellationTokenSource();
}
internal void Shutdown()
{
serverShutDownSource.Cancel();
}
private async Task setNewUserPresence(CancellationToken token, User user)
{
while (!token.IsCancellationRequested)
{
var nextInterval = rand.Next(minimumDelayBetweenTest, maximumDelayBetweenTests);
try
{
await Task.Delay(nextInterval * 1000, testAbortSource.Token).ConfigureAwait(false);
}
catch (TaskCanceledException)
{
break;
}
//now randomly generate a new state and submit it to the tester class
UserState? status;
var nbStates = Enum.GetValues(typeof(UserState)).Length;
if (user.CurrentStatus == null)
{
var newInt = rand.Next(nbStates);
status = (UserState)newInt;
}
else
{
do
{
var newInt = rand.Next(nbStates);
status = (UserState)newInt;
}
while (status == user.CurrentStatus);
}
_ = sendUserStatus(user, status.Value);
}
}
private async Task sendUserStatus(User user, UserState status)
{
await consumer.ProcessStateChange(user.UserId).ConfigureAwait(false);
}
private void generateTestObjects(int nbUsers, int nbTeams, bool addAllUsersToTeams = false)
{
users = new List<User>();
for (int i = 0; i < nbUsers; i++)
{
var usr = new User
{
UserId = $"User_{i}",
Groups = new List<Team>()
};
users.Add(usr);
}
}
}
It uses the variables at the beginning of the class to control the test. You can define the number of users (nbOfusers - every user is a producer that produces new data), the minimum (minimumDelayBetweenTest) and maximum (maximumDelayBetweenTests) delay between a user producing the next data and how long it takes the consumer to process the data (operationDuration).
StartTests starts the actual test, and StopTests stops the tests again.
I'm calling these as follows
static void Main(string[] args)
{
var tester = new SimpleProducer();
Console.WriteLine("Test successfully started, type exit to stop");
string str;
do
{
str = Console.ReadLine();
if (str == "start")
tester.StartTests();
else if (str == "stop")
tester.StopTests();
}
while (str != "exit");
tester.Shutdown();
}
So, if I run my tester and type 'start', the Producer class starts producing states that are consumed by Consumer. And memory usage starts to grow and grow and grow. The sample is configured to the extreme, the real-life scenario I'm dealing with is less intensive, but one action of the producer could trigger multiple actions on the consumer side which also have to be executed in the same asynchronous abortable fifo fashion - so worst case, one set of data produced triggers an action for ~10 consumers (that last part I stripped out for brevity).
When I have 100 producers, each producing a new data item every 1-6 seconds (randomly; the data produced is also random), and consuming the data takes 3 seconds, there are plenty of cases where a new set of data arrives before the old one has been properly processed.
Looking at two consecutive memory dumps, it's obvious where the memory usage is coming from: it's all fragments that have to do with the queue. Given that I'm disposing every CancellationTokenSource and not keeping any references to the produced data (or the AsyncWorkItem it's wrapped in), I'm at a loss to explain why this keeps eating up my memory, and I'm hoping somebody else can show me the error of my ways. You can also abort testing by typing 'stop'; you'll see that memory is no longer being consumed, but even if you pause and trigger a GC, the memory is not freed either.
The source code of the project in runnable form is on Github. After starting it, you have to type start (plus enter) in the console to tell the producer to start producing data. And you can stop producing data by typing stop (plus enter)
Your code has so many issues that finding the leak through debugging is practically impossible. But here are several things that are already a problem and should be fixed first:
It looks like getQueue creates a new queue for the same user each time processUserStateUpdateAsync gets called, and does not reuse existing queues:
var executor = groupStateChangeExecutors.GetOrAdd(user.UserId, getQueue());
A CancellationTokenSource is leaked on each call of the code below, as a new instance is created every time the method AddOrUpdate is called; it should not be passed there that way:
userStateChangeAborters.AddOrUpdate(user.UserId, new CancellationTokenSource(), (key, existingValue
Also, the code below should return the same cts that you pass in as the new value, for the case when the dictionary has no value for the specific user.UserId:
return new CancellationTokenSource();
Also, there is a potential leak of the cancelSource variable, as it gets bound to a delegate which can live longer than you want; it's better to pass a concrete CancellationToken there:
executor.EnqueueTask(() => processUserStateUpdateAsync(user, state, previousState,
cancelSource.Token));
For some reason you do not dispose aborter here, and in one more place:
userStateChangeAborters.TryRemove(user.UserId, out var aborter);
The creation of the Channel also has potential leaks:
taskQueue = Channel.CreateBounded<AsyncWorkItem<T>>(new BoundedChannelOptions(1)
You picked the option FullMode = BoundedChannelFullMode.DropOldest, which removes the oldest values if there are any, so I assume this stops queued items from being processed, as they will never be read. It's a hypothesis, but I assume that if an old item is removed without being handled, then processUserStateUpdateAsync never gets called and the associated resources are never freed.
You can start with these found issues and it should be easier to find the real cause after that.
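For instance, the first two points could look roughly like this in the simplified consumer above (a sketch of the intent, not a verified fix for the leak):
// Use the factory overload so a new queue (and its processing loop) is only
// constructed when no queue exists yet for this user.
var executor = groupStateChangeExecutors.GetOrAdd(
    userId,
    _ => new AsyncCollectionAbortableFifoQueue<bool>(serverShutDownSource.Token));

// Create the replacement CancellationTokenSource once, and return that same
// instance from the update delegate instead of constructing one per call.
CancellationTokenSource oldSource = null;
var newSource = new CancellationTokenSource();
var cancelSource = userStateChangeAborters.AddOrUpdate(
    userId,
    newSource,
    (key, existingValue) => { oldSource = existingValue; return newSource; });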

How to execute tasks in parallel but not more than N tasks per T seconds?

I need to run many tasks in parallel as fast as possible. But if my program runs more than 30 tasks per 1 second, it will be blocked. How can I ensure that no more than 30 tasks run in any 1-second interval?
In other words, we must prevent a new task from starting if 30 tasks were completed in the last 1-second interval.
My ugly possible solution:
private async Task Process(List<Task> taskList, int maxIntervalCount, int timeIntervalSeconds)
{
var timeList = new List<DateTime>();
var sem = new Semaphore(maxIntervalCount, maxIntervalCount);
var tasksToRun = taskList.Select(async task =>
{
do
{
sem.WaitOne();
}
while (HasAllowance(timeList, maxIntervalCount, timeIntervalSeconds));
await task;
timeList.Add(DateTime.Now);
sem.Release();
});
await Task.WhenAll(tasksToRun);
}
private bool HasAllowance(List<DateTime> timeList, int maxIntervalCount, int timeIntervalSeconds)
{
return timeList.Count <= maxIntervalCount
|| DateTime.Now.Subtract(TimeSpan.FromSeconds(timeIntervalSeconds)) > timeList[timeList.Count - maxIntervalCount];
}
User code should never have to control how tasks are scheduled directly. For one thing, it can't - controlling how tasks run is the job of the TaskScheduler. When user code calls .Start(), it simply adds a task to a thread-pool queue for execution. await does not execute a task; it awaits a task that is already executing.
The TaskScheduler samples show how to create limited concurrency schedulers, but again, there are better, high-level options.
The question's code doesn't throttle the queued tasks anyway, it limits how many of them can be awaited. They are all running already. This is similar to batching the previous asynchronous operation in a pipeline, allowing only a limited number of messages to pass to the next level.
ActionBlock with delay
The easy, out-of-the-box way would be to use an ActionBlock with a limited MaxDegreeOfParallelism, to ensure no more than N concurrent operations can run at the same time. If we know how long each operation takes, we could add a bit of delay to ensure we don't overshoot the throttle limit.
In this case, 7 concurrent workers perform 4 requests per second each, for a maximum of 28 requests per second in total. The BoundedCapacity means that only up to 7 items will be stored in the input buffer before downloader.SendAsync blocks. This way we avoid flooding the ActionBlock if the operations take too long.
var downloader = new ActionBlock<string>(
async url => {
await Task.Delay(250);
var response=await httpClient.GetStringAsync(url);
//Do something with it.
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 7, BoundedCapacity=7 }
);
//Start posting to the downloader
foreach(var item in urls)
{
await downloader.SendAsync(item);
}
downloader.Complete();
await downloader.Completion;
ActionBlock with SemaphoreSlim
Another option would be to combine this with a SemaphoreSlim that gets reset periodically by a timer.
var semaphore = new SemaphoreSlim(30);
var refreshTimer = new Timer(_ => semaphore.Release(30));
var downloader = new ActionBlock<string>(
async url => {
await semaphore.WaitAsync();
try
{
var response=await httpClient.GetStringAsync(url);
//Do something with it.
}
finally
{
semaphore.Release();
}
},
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 5, BoundedCapacity=5 }
);
//Start the timer right before we start posting
refreshTimer.Change(1000,1000);
foreach(....)
{
}
This is the snippet:
var tasks = new List<Task>();
foreach(var item in listNeedInsert)
{
var task = TaskToRun(item);
tasks.Add(task);
if(tasks.Count == 100)
{
await Task.WhenAll(tasks);
tasks.Clear();
}
}
// Wait for anything left to finish
await Task.WhenAll(tasks);
Notice that I rather add the tasks into a List<Task>, and after all are added, I await them all from the same list.
What you do here:
var tasks = taskList.Select(async task =>
{
do
{
sem.WaitOne();
}
while (timeList.Count <= maxIntervalCount
|| DateTime.Now.Subtract(TimeSpan.FromSeconds(timeIntervalSeconds)) > timeList[timeList.Count - maxIntervalCount]);
await task;
is blocking until the task finishes its work, thus making this call:
Task.WhenAll(tasks).Wait();
completely redundant. Furthermore, this line Task.WhenAll(tasks).Wait(); is performing unnecessary blocking on the WhenAll method.
Is the blocking due to some server/firewall/hardware limit, or is it based on observation?
You should try to use BlockingCollection<Task> or a similar thread-safe collection, especially if the work of your tasks is I/O-bound. You can even set the capacity to 30:
var collection = new BlockingCollection<Task>(30);
Then you can start 2 async methods:
var population = Task.Run(Populate);
var processing = Task.Run(Dequeue);
await Task.WhenAll(population, processing);
async Task Populate()
{
foreach (...)
collection.Add(...);
collection.CompleteAdding();
}
async Task Dequeue()
{
while (!collection.IsCompleted)
await collection.Take(); //consider using TryTake()
}
If the limit persists due to some true limitation (should be very rare), change Populate() as follows:
var stopper = Stopwatch.StartNew();
for (var i = ....) //instead of foreach
{
if (i % 30 == 0)
{
if (stopper.ElapsedMilliseconds < 1000)
await Task.Delay((int)(1000 - stopper.ElapsedMilliseconds)); //note that this race condition should be avoided in your code
stopper.Restart();
}
collection.Add(...);
}
collection.CompleteAdding();
I think that this problem can be solved by a SemaphoreSlim limited to the maximum number of tasks per interval, and also by a Task.Delay that delays the release of the SemaphoreSlim after each task's completion, for an interval equal to the required throttling interval. Below is an implementation based on this idea. The rate limiting can be applied in two ways:
With includeAsynchronousDuration: false the rate limit affects how many operations can be started during the specified time span. The duration of each operation is not taken into account.
With includeAsynchronousDuration: true the rate limit affects how many operations can be counted as "active" during the specified time span, and is more restrictive (makes the enumeration slower). Instead of counting each operation as a moment in time (when started), it is counted as a time span (between start and completion). An operation is counted as "active" for a specified time span, if and only if its own time span intersects with the specified time span.
/// <summary>
/// Applies an asynchronous transformation for each element of a sequence,
/// limiting the number of transformations that can start or be active during
/// the specified time span.
/// </summary>
public static async Task<TResult[]> ForEachAsync<TSource, TResult>(
this IEnumerable<TSource> source,
Func<TSource, Task<TResult>> action,
int maxActionsPerTimeUnit,
TimeSpan timeUnit,
bool includeAsynchronousDuration = false,
bool onErrorContinue = false, /* Affects only asynchronous errors */
bool executeOnCapturedContext = false)
{
if (source == null) throw new ArgumentNullException(nameof(source));
if (action == null) throw new ArgumentNullException(nameof(action));
if (maxActionsPerTimeUnit < 1)
throw new ArgumentOutOfRangeException(nameof(maxActionsPerTimeUnit));
if (timeUnit < TimeSpan.Zero || timeUnit.TotalMilliseconds > Int32.MaxValue)
throw new ArgumentOutOfRangeException(nameof(timeUnit));
using var semaphore = new SemaphoreSlim(maxActionsPerTimeUnit,
maxActionsPerTimeUnit);
using var cts = new CancellationTokenSource();
var tasks = new List<Task<TResult>>();
var releaseTasks = new List<Task>();
try // Watch for exceptions thrown by the source enumerator
{
foreach (var item in source)
{
try
{
await semaphore.WaitAsync(cts.Token)
.ConfigureAwait(executeOnCapturedContext);
}
catch (OperationCanceledException) { break; }
// Exceptions thrown synchronously by invoking the action are breaking
// the loop unconditionally (the onErrorContinue has no effect on them).
var task = action(item);
if (!onErrorContinue) task = ObserveFailureAsync(task);
tasks.Add(task);
releaseTasks.Add(ScheduleSemaphoreReleaseAsync(task));
}
}
catch (Exception ex) { tasks.Add(Task.FromException<TResult>(ex)); }
cts.Cancel(); // Cancel all release tasks
Task<TResult[]> whenAll = Task.WhenAll(tasks);
try { return await whenAll.ConfigureAwait(false); }
catch (OperationCanceledException) when (whenAll.IsCanceled) { throw; }
catch { whenAll.Wait(); throw; } // Propagate AggregateException
finally { await Task.WhenAll(releaseTasks); }
async Task<TResult> ObserveFailureAsync(Task<TResult> task)
{
try { return await task.ConfigureAwait(false); }
catch { cts.Cancel(); throw; }
}
async Task ScheduleSemaphoreReleaseAsync(Task<TResult> task)
{
if (includeAsynchronousDuration)
try { await task.ConfigureAwait(false); } catch { } // Ignore exceptions
// Release only if the Task.Delay completed successfully
try { await Task.Delay(timeUnit, cts.Token).ConfigureAwait(false); }
catch (OperationCanceledException) { return; }
semaphore.Release();
}
}
Usage example:
int[] results = await ForEachAsync(Enumerable.Range(1, 100), async n =>
{
await Task.Delay(500); // Simulate some asynchronous I/O-bound operation
return n;
}, maxActionsPerTimeUnit: 30, timeUnit: TimeSpan.FromSeconds(1.0),
includeAsynchronousDuration: true);
The reasons for propagating an AggregateException using the catch+Wait technique, are explained here.

How to limit number of HttpWebRequest per second towards a webserver?

I need to implement a throttling mechanism (requests per second) when using HttpWebRequest for making parallel requests towards one application server. My C# app must issue no more than 80 requests per second to a remote server. The limit is imposed by the remote service admins, not as a hard limit but as an "SLA" between my platform and theirs.
How can I control the number of requests per second when using HttpWebRequest?
I had the same problem and couldn't find a ready solution so I made one, and here it is. The idea is to use a BlockingCollection<T> to add items that need processing and use Reactive Extensions to subscribe with a rate-limited processor.
The Throttle class is a renamed version of this rate limiter.
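The Throttle implementation itself is not reproduced here (it is the linked rate limiter), but a minimal stand-in with the same surface area (a constructor taking a count and a period, a blocking WaitToProceed, and IDisposable) could look roughly like this:
// Minimal stand-in for the linked rate limiter: allows at most 'occurrences'
// calls to WaitToProceed per 'timeUnit'. Not the original implementation.
public sealed class Throttle : IDisposable
{
    private readonly SemaphoreSlim _semaphore;
    private readonly TimeSpan _timeUnit;

    public Throttle(int occurrences, TimeSpan timeUnit)
    {
        _semaphore = new SemaphoreSlim(occurrences, occurrences);
        _timeUnit = timeUnit;
    }

    public void WaitToProceed()
    {
        _semaphore.Wait();
        // Give the slot back once the time unit has passed.
        Task.Delay(_timeUnit).ContinueWith(_ =>
        {
            try { _semaphore.Release(); } catch (ObjectDisposedException) { }
        });
    }

    public void Dispose() => _semaphore.Dispose();
}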
public static class BlockingCollectionExtensions
{
// TODO: devise a way to avoid problems if collection gets too big (produced faster than consumed)
public static IObservable<T> AsRateLimitedObservable<T>(this BlockingCollection<T> sequence, int items, TimeSpan timePeriod, CancellationToken producerToken)
{
Subject<T> subject = new Subject<T>();
// this is a dummyToken just so we can recreate the TokenSource
// which we will pass the proxy class so it can cancel the task
// on disposal
CancellationToken dummyToken = new CancellationToken();
CancellationTokenSource tokenSource = CancellationTokenSource.CreateLinkedTokenSource(producerToken, dummyToken);
var consumingTask = new Task(() =>
{
using (var throttle = new Throttle(items, timePeriod))
{
while (!sequence.IsCompleted)
{
try
{
T item = sequence.Take(producerToken);
throttle.WaitToProceed();
try
{
subject.OnNext(item);
}
catch (Exception ex)
{
subject.OnError(ex);
}
}
catch (OperationCanceledException)
{
break;
}
}
subject.OnCompleted();
}
}, TaskCreationOptions.LongRunning);
return new TaskAwareObservable<T>(subject, consumingTask, tokenSource);
}
private class TaskAwareObservable<T> : IObservable<T>, IDisposable
{
private readonly Task task;
private readonly Subject<T> subject;
private readonly CancellationTokenSource taskCancellationTokenSource;
public TaskAwareObservable(Subject<T> subject, Task task, CancellationTokenSource tokenSource)
{
this.task = task;
this.subject = subject;
this.taskCancellationTokenSource = tokenSource;
}
public IDisposable Subscribe(IObserver<T> observer)
{
var disposable = subject.Subscribe(observer);
if (task.Status == TaskStatus.Created)
task.Start();
return disposable;
}
public void Dispose()
{
// cancel consumption and wait task to finish
taskCancellationTokenSource.Cancel();
task.Wait();
// dispose tokenSource and task
taskCancellationTokenSource.Dispose();
task.Dispose();
// dispose subject
subject.Dispose();
}
}
}
Unit test:
class BlockCollectionExtensionsTest
{
[Fact]
public void AsRateLimitedObservable()
{
const int maxItems = 1; // fix this to 1 to ease testing
TimeSpan during = TimeSpan.FromSeconds(1);
// populate collection
int[] items = new[] { 1, 2, 3, 4 };
BlockingCollection<int> collection = new BlockingCollection<int>();
foreach (var i in items) collection.Add(i);
collection.CompleteAdding();
IObservable<int> observable = collection.AsRateLimitedObservable(maxItems, during, CancellationToken.None);
BlockingCollection<int> processedItems = new BlockingCollection<int>();
ManualResetEvent completed = new ManualResetEvent(false);
DateTime last = DateTime.UtcNow;
observable
// this is so we'll receive exceptions
.ObserveOn(new SynchronizationContext())
.Subscribe(item =>
{
if (item == 1)
last = DateTime.UtcNow;
else
{
TimeSpan diff = (DateTime.UtcNow - last);
last = DateTime.UtcNow;
Assert.InRange(diff.TotalMilliseconds,
during.TotalMilliseconds - 30,
during.TotalMilliseconds + 30);
}
processedItems.Add(item);
},
() => completed.Set()
);
completed.WaitOne();
Assert.Equal(items, processedItems, new CollectionEqualityComparer<int>());
}
}
The Throttle() and Sample() extension methods (on Observable) allow you to regulate a fast sequence of events into a "slower" sequence.
Here is a blog post with an example of Sample(TimeSpan) that ensures a maximum rate.
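As a small illustration of what Sample does (a generic sketch, not taken from the linked post), note that it forwards only the most recent value of each interval and drops the rest:
// Assuming 'values' is a fast IObservable<int> (System.Reactive), this pushes
// at most one value per second downstream; intermediate values are dropped.
IDisposable subscription = values
    .Sample(TimeSpan.FromSeconds(1))
    .Subscribe(v => Console.WriteLine($"Latest value this second: {v}"));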
My original post discussed how to add a throttling mechanism to WCF via client behavior extensions, but then it was pointed out that I had misread the question (doh!).
Overall, the approach can be to check with a class that determines whether we are violating the rate limit or not. There has already been a lot of discussion around how to check for rate violations.
Throttling method calls to M requests in N seconds
If you are violating the rate limit, then sleep for a fixed interval and check again. If not, go ahead and make the HttpWebRequest call.
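A bare-bones sketch of that check-and-sleep approach (an illustrative class, not the implementation from the linked question):
// Illustrative helper: blocks until issuing another request would not exceed
// 'maxRequests' per 'window'.
public sealed class RequestRateChecker
{
    private readonly Queue<DateTime> _timestamps = new Queue<DateTime>();
    private readonly object _lock = new object();
    private readonly int _maxRequests;
    private readonly TimeSpan _window;

    public RequestRateChecker(int maxRequests, TimeSpan window)
    {
        _maxRequests = maxRequests;
        _window = window;
    }

    public void WaitUntilAllowed()
    {
        while (true)
        {
            lock (_lock)
            {
                // Forget requests that have left the window.
                while (_timestamps.Count > 0 && DateTime.UtcNow - _timestamps.Peek() > _window)
                    _timestamps.Dequeue();
                if (_timestamps.Count < _maxRequests)
                {
                    _timestamps.Enqueue(DateTime.UtcNow);
                    return;
                }
            }
            // Violating the rate limit: sleep for a fixed interval and check again.
            Thread.Sleep(50);
        }
    }
}
Each caller would invoke WaitUntilAllowed() right before creating and sending its HttpWebRequest, e.g. new RequestRateChecker(80, TimeSpan.FromSeconds(1)) for the 80 requests per second mentioned in the question.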
