I need to implement a queue of requests which can be populated from multiple threads. When this queue grows beyond 1000 completed requests, these requests should be stored in the database. Here is my implementation:
public class RequestQueue
{
private static BlockingCollection<VerificationRequest> _queue = new BlockingCollection<VerificationRequest>();
private static ConcurrentQueue<VerificationRequest> _storageQueue = new ConcurrentQueue<VerificationRequest>();
private static volatile bool isLoading = false;
private static object _lock = new object();
public static void Launch()
{
Task.Factory.StartNew(execute);
}
public static void Add(VerificationRequest request)
{
_queue.Add(request);
}
public static void AddRange(List<VerificationRequest> requests)
{
Parallel.ForEach(requests, new ParallelOptions() {MaxDegreeOfParallelism = 3},
(request) => { _queue.Add(request); });
}
private static void execute()
{
Parallel.ForEach(_queue.GetConsumingEnumerable(), new ParallelOptions {MaxDegreeOfParallelism = 5}, EnqueueSaveRequest );
}
private static void EnqueueSaveRequest(VerificationRequest request)
{
_storageQueue.Enqueue( new RequestExecuter().ExecuteVerificationRequest( request ) );
if (_storageQueue.Count > 1000 && !isLoading)
{
lock ( _lock )
{
if ( _storageQueue.Count > 1000 && !isLoading )
{
isLoading = true;
var requestChunk = new List<VerificationRequest>();
VerificationRequest req;
for (var i = 0; i < 1000; i++)
{
    if (_storageQueue.TryDequeue(out req))
        requestChunk.Add(req);
}
new VerificationRequestRepository().InsertRange(requestChunk);
isLoading = false;
}
}
}
}
}
Is there any way to implement this without lock and isLoading?
The easiest way to do what you ask is to use the blocks in the TPL Dataflow library, e.g.:
var batchBlock = new BatchBlock<VerificationRequest>(1000);
var exportBlock = new ActionBlock<VerificationRequest[]>(records =>
{
    new VerificationRequestRepository().InsertRange(records);
});
batchBlock.LinkTo(exportBlock , new DataflowLinkOptions { PropagateCompletion = true });
That's it.
You can send messages to the starting block with
batchBlock.Post(new VerificationRequest(...));
Once you finish your work, you can take down the entire pipeline and flush any leftover messages by calling batchBlock.Complete() and awaiting the final block's Completion:
batchBlock.Complete();
await exportBlock.Completion;
The BatchBlock collects records into arrays of 1000 items and passes each full batch to the next block. An ActionBlock uses only one task by default, so it is thread-safe. You could use an existing instance of your repository without worrying about cross-thread access:
var repository=new VerificationRequestRepository();
var exportBlock = new ActionBlock<VerificationRequest[]>(records =>
{
    repository.InsertRange(records);
});
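By default an ActionBlock processes one message at a time; if that single task ever becomes a bottleneck you can widen it, at the cost of the thread-safety guarantee above. A sketch; only do this if InsertRange tolerates concurrent calls:

var exportBlock = new ActionBlock<VerificationRequest[]>(
    records => new VerificationRequestRepository().InsertRange(records),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 2 });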
Almost all blocks have a concurrent input buffer. Each block runs on its own TPL task, so the steps run concurrently with each other. This means you get asynchronous execution "for free", which can be important if you have multiple linked steps, e.g. a TransformBlock that modifies the messages flowing through the pipeline.
I use such pipelines to call external services, parse responses, generate the final records, batch them, and send them to the database with a block that uses SqlBulkCopy.
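One caveat for long-running services: a BatchBlock holds a partial batch until it fills up or the block completes, so the last few records can wait indefinitely. If that matters, a timer that calls TriggerBatch flushes whatever has accumulated. A sketch, assuming a 30-second flush interval is acceptable:

// force out any partial batch every 30 seconds
var flushTimer = new System.Threading.Timer(
    _ => batchBlock.TriggerBatch(),
    null, TimeSpan.FromSeconds(30), TimeSpan.FromSeconds(30));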
Related
I am trying to create a pipeline using TPL Dataflow where I can store messages in a batch block, and whenever its threshold is hit it would send the data to an action block. I have added a buffer block in case the action block is too slow.
So far I have tried all possible methods to move data from the first block to the second, to no avail. I have linked the blocks and added DataflowLinkOptions with PropagateCompletion set to true. What else do I have to do in order for this pipeline to work?
Pipeline
class LogPipeline<T>
{
private ActionBlock<T[]> actionBlock;
private BufferBlock<T> bufferBlock;
private BatchBlock<T> batchBlock;
private readonly Action<T[]> action;
private readonly int BufferSize;
private readonly int BatchSize;
public LogPipeline(Action<T[]> action, int bufferSize = 4, int batchSize = 2)
{
this.BufferSize = bufferSize;
this.BatchSize = batchSize;
this.action = action;
}
private void Initialize()
{
this.bufferBlock = new BufferBlock<T>(new DataflowBlockOptions
{ TaskScheduler = TaskScheduler.Default,
BoundedCapacity = this.BufferSize });
this.actionBlock = new ActionBlock<T[]>(this.action);
this.batchBlock = new BatchBlock<T>(BatchSize);
this.bufferBlock.LinkTo(this.batchBlock, new DataflowLinkOptions
{ PropagateCompletion = true });
this.batchBlock.LinkTo(this.actionBlock, new DataflowLinkOptions
{ PropagateCompletion = true });
}
public void Post(T log)
{
this.bufferBlock.Post(log);
}
public void Start()
{
this.Initialize();
}
public void Stop()
{
actionBlock.Complete();
}
}
Test
[TestCase(100, 1000, 5)]
public void CanBatchPipelineResults(int batchSize, int bufferSize, int cycles)
{
List<int> data = new List<int>();
LogPipeline<int> logPipeline = new LogPipeline<int>(
batchSize: batchSize,
bufferSize: bufferSize,
action: (logs) =>
{
data.AddRange(logs);
});
logPipeline.Start();
int SelectWithEffect(int element)
{
logPipeline.Post(element);
return 3;
}
int count = 0;
while (true)
{
if (count++ > cycles)
{
break;
}
var sent = Parallel.For(0, bufferSize, (x) => SelectWithEffect(x));
}
logPipeline.Stop();
Assert.IsTrue(data.Count == cycles * batchSize);
}
Why are all my blocks empty besides the buffer? I have tried with SendAsync also to no avail. No data is moved from the first block to the next no matter what I do.
I have tried both with and without the link options.
Update:
I have completely erased the pipeline and also removed the Parallel.For.
I have tried with all kinds of input blocks (batch/buffer/transform), and it seems subsequent blocks never receive anything.
I have also tried with await SendAsync as well as Post.
I have only tried within unit tests classes.
Could this be the issue ?
Update 2
I was wrong to complicate things, so I tried a simpler example. Even this doesn't work inside a test case:
List<int> items = new List<int>();
var tf = new TransformBlock<int, int>(x => x + 1);
var action = new ActionBlock<int>(x => items.Add(x));
tf.LinkTo(action, new DataflowLinkOptions { PropagateCompletion = true });
tf.Post(3);
//Breakpoint here
The reason nothing seems to happen before the test ends is that none of the blocks has a chance to run. The code occupies all CPUs with Parallel.For, so no other task gets a chance to run. This means that all posted messages are still in the first block. The code then calls Complete on the last block but doesn't even await it to finish processing before checking the results.
The code can be simplified a lot. For starters, all blocks have input buffers; they don't need extra buffering.
The pipeline could be replaced with just this:
//Arrange
var list = new List<int>();
var head = new BatchBlock<int>(BatchSize);
var act = new ActionBlock<int[]>(nums => list.AddRange(nums));
var options = new DataflowLinkOptions { PropagateCompletion = true };
head.LinkTo(act, options);
//ACT
//Just fire everything at once, because why not
var tasks = Enumerable.Range(0, cycles)
    .Select(i => Task.Run(() => head.Post(i)));
await Task.WhenAll(tasks);
//Tell the head block we're done
head.Complete();
//Wait for the last block to complete
await act.Completion;
//ASSERT
Assert.Equal(cycles, list.Count);
There's no real need to create a complex class to encapsulate the pipeline. It doesn't "start" - the blocks do nothing if they have no data. To abstract it, one only needs to provide access to the head block and the last block's Completion task, as in the sketch below.
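A minimal sketch of such a wrapper (the names here are illustrative, not from the question):

class BatchingSink<T>
{
    private readonly BatchBlock<T> head;
    private readonly ActionBlock<T[]> tail;

    public BatchingSink(int batchSize, Action<T[]> action)
    {
        head = new BatchBlock<T>(batchSize);
        tail = new ActionBlock<T[]>(action);
        head.LinkTo(tail, new DataflowLinkOptions { PropagateCompletion = true });
    }

    public bool Post(T item) => head.Post(item);

    // complete the head and hand back the tail's Completion so callers can await the final flush
    public Task CompleteAsync()
    {
        head.Complete();
        return tail.Completion;
    }
}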
By calling logPipeline.Stop immediately after sending the data to the BufferBlock, you are completing the ActionBlock, so it declines all messages that the BatchBlock later tries to send to it. From the documentation of the ActionBlock.Complete method:
Signals to the dataflow block that it shouldn't accept or produce any more messages and shouldn't consume any more postponed messages.
Update: Regarding the updated requirements in the question:
Whenever its threshold is hit it would send the data to an action block.
...my suggestion is to move this logic inside the LogPipeline.Post method. The method BufferBlock.Post returns false if the block hasn't accepted the data sent to it.
public void Post(T log)
{
if (!this.bufferBlock.Post(log)) this.actionBlock.Post(new[] { log }); // the ActionBlock consumes T[], so wrap the single item
}
I need to process data from a producer in FIFO fashion with the ability to abort processing if the same producer produces a new bit of data.
So I implemented an abortable FIFO queue based on Stephen Cleary's AsyncCollection (called AsyncCollectionAbortableFifoQueue in my sample) and one on TPL's BufferBlock (BufferBlockAbortableAsyncFifoQueue in my sample). Here's the implementation based on AsyncCollection:
public class AsyncCollectionAbortableFifoQueue<T> : IExecutableAsyncFifoQueue<T>
{
private AsyncCollection<AsyncWorkItem<T>> taskQueue = new AsyncCollection<AsyncWorkItem<T>>();
private readonly CancellationToken stopProcessingToken;
public AsyncCollectionAbortableFifoQueue(CancellationToken cancelToken)
{
stopProcessingToken = cancelToken;
_ = processQueuedItems();
}
public Task<T> EnqueueTask(Func<Task<T>> action, CancellationToken? cancelToken)
{
var tcs = new TaskCompletionSource<T>();
var item = new AsyncWorkItem<T>(tcs, action, cancelToken);
taskQueue.Add(item);
return tcs.Task;
}
protected virtual async Task processQueuedItems()
{
while (!stopProcessingToken.IsCancellationRequested)
{
try
{
var item = await taskQueue.TakeAsync(stopProcessingToken).ConfigureAwait(false);
if (item.CancelToken.HasValue && item.CancelToken.Value.IsCancellationRequested)
item.TaskSource.SetCanceled();
else
{
try
{
T result = await item.Action().ConfigureAwait(false);
item.TaskSource.SetResult(result); // Indicate completion
}
catch (Exception ex)
{
if (ex is OperationCanceledException && ((OperationCanceledException)ex).CancellationToken == item.CancelToken)
    item.TaskSource.SetCanceled();
else
    item.TaskSource.SetException(ex); // a TCS accepts only one completion; SetException after SetCanceled would throw
}
}
}
catch (Exception) { }
}
}
}
public interface IExecutableAsyncFifoQueue<T>
{
Task<T> EnqueueTask(Func<Task<T>> action, CancellationToken? cancelToken);
}
processQueuedItems is the task that dequeues AsyncWorkItem instances from the queue and executes them unless cancellation has been requested.
The asynchronous action to execute gets wrapped into an AsyncWorkItem, which looks like this:
internal class AsyncWorkItem<T>
{
public readonly TaskCompletionSource<T> TaskSource;
public readonly Func<Task<T>> Action;
public readonly CancellationToken? CancelToken;
public AsyncWorkItem(TaskCompletionSource<T> taskSource, Func<Task<T>> action, CancellationToken? cancelToken)
{
TaskSource = taskSource;
Action = action;
CancelToken = cancelToken;
}
}
Then there's a task that watches the queue, dequeues items, and either processes them or aborts if the CancellationToken has been triggered.
That all works just fine - data gets processed, and if a new piece of data is received, processing of the old is aborted. My problem now stems from these queues leaking massive amounts of memory if I crank up the usage (the producer producing a lot more than the consumer processes). Given that it's abortable, the data that is not processed should be discarded and eventually disappear from memory.
So let's look at how I'm using these queues. I have a 1:1 match of producer and consumer. Every consumer handles data of a single producer. Whenever I get a new data item, and it doesn't match the previous one, I fetch the queue for the given producer (User.UserId) or create a new one (the 'executor' in the code snippet). Then I have a ConcurrentDictionary that holds a CancellationTokenSource per producer/consumer combo. If there's a previous CancellationTokenSource, I call Cancel on it and Dispose it 20 seconds later (immediate disposal would cause exceptions in the queue). I then enqueue processing of the new data. The queue returns a task that I can await so I know when processing of the data is complete, and I then return the result.
Here's that in code
internal class SimpleLeakyConsumer
{
private ConcurrentDictionary<string, IExecutableAsyncFifoQueue<bool>> groupStateChangeExecutors = new ConcurrentDictionary<string, IExecutableAsyncFifoQueue<bool>>();
private readonly ConcurrentDictionary<string, CancellationTokenSource> userStateChangeAborters = new ConcurrentDictionary<string, CancellationTokenSource>();
protected CancellationTokenSource serverShutDownSource;
private readonly int operationDuration = 1000;
internal SimpleLeakyConsumer(CancellationTokenSource serverShutDownSource, int operationDuration)
{
this.serverShutDownSource = serverShutDownSource;
this.operationDuration = operationDuration * 1000; // convert from seconds to milliseconds
}
internal async Task<bool> ProcessStateChange(string userId)
{
var executor = groupStateChangeExecutors.GetOrAdd(userId, new AsyncCollectionAbortableFifoQueue<bool>(serverShutDownSource.Token));
CancellationTokenSource oldSource = null;
using (var cancelSource = userStateChangeAborters.AddOrUpdate(userId, new CancellationTokenSource(), (key, existingValue) =>
{
oldSource = existingValue;
return new CancellationTokenSource();
}))
{
if (oldSource != null && !oldSource.IsCancellationRequested)
{
oldSource.Cancel();
_ = delayedDispose(oldSource);
}
try
{
var executionTask = executor.EnqueueTask(async () => { await Task.Delay(operationDuration, cancelSource.Token).ConfigureAwait(false); return true; }, cancelSource.Token);
var result = await executionTask.ConfigureAwait(false);
userStateChangeAborters.TryRemove(userId, out var aborter);
return result;
}
catch (Exception e)
{
if (e is TaskCanceledException || e is OperationCanceledException)
return true;
else
{
userStateChangeAborters.TryRemove(userId, out var aborter);
return false;
}
}
}
}
private async Task delayedDispose(CancellationTokenSource src)
{
try
{
await Task.Delay(20 * 1000).ConfigureAwait(false);
}
finally
{
try
{
src.Dispose();
}
catch (ObjectDisposedException) { }
}
}
}
In this sample implementation, all that is being done is wait, then return true.
To test this mechanism, I wrote the following Data producer class:
internal class SimpleProducer
{
//variables defining the test
readonly int nbOfusers = 10;
readonly int minimumDelayBetweenTest = 1; // seconds
readonly int maximumDelayBetweenTests = 6; // seconds
readonly int operationDuration = 3; // number of seconds an operation takes in the tester
private readonly Random rand;
private List<User> users;
private readonly SimpleLeakyConsumer consumer;
protected CancellationTokenSource serverShutDownSource, testAbortSource;
private CancellationToken internalToken = CancellationToken.None;
internal SimpleProducer()
{
rand = new Random();
testAbortSource = new CancellationTokenSource();
serverShutDownSource = new CancellationTokenSource();
generateTestObjects(nbOfusers, 0, false);
consumer = new SimpleLeakyConsumer(serverShutDownSource, operationDuration);
}
internal void StartTests()
{
if (internalToken == CancellationToken.None || internalToken.IsCancellationRequested)
{
internalToken = testAbortSource.Token;
foreach (var user in users)
_ = setNewUserPresence(internalToken, user);
}
}
internal void StopTests()
{
testAbortSource.Cancel();
try
{
testAbortSource.Dispose();
}
catch (ObjectDisposedException) { }
testAbortSource = new CancellationTokenSource();
}
internal void Shutdown()
{
serverShutDownSource.Cancel();
}
private async Task setNewUserPresence(CancellationToken token, User user)
{
while (!token.IsCancellationRequested)
{
var nextInterval = rand.Next(minimumDelayBetweenTest, maximumDelayBetweenTests);
try
{
await Task.Delay(nextInterval * 1000, testAbortSource.Token).ConfigureAwait(false);
}
catch (TaskCanceledException)
{
break;
}
//now randomly generate a new state and submit it to the tester class
UserState? status;
var nbStates = Enum.GetValues(typeof(UserState)).Length;
if (user.CurrentStatus == null)
{
var newInt = rand.Next(nbStates);
status = (UserState)newInt;
}
else
{
do
{
var newInt = rand.Next(nbStates);
status = (UserState)newInt;
}
while (status == user.CurrentStatus);
}
_ = sendUserStatus(user, status.Value);
}
}
private async Task sendUserStatus(User user, UserState status)
{
await consumer.ProcessStateChange(user.UserId).ConfigureAwait(false);
}
private void generateTestObjects(int nbUsers, int nbTeams, bool addAllUsersToTeams = false)
{
users = new List<User>();
for (int i = 0; i < nbUsers; i++)
{
var usr = new User
{
UserId = $"User_{i}",
Groups = new List<Team>()
};
users.Add(usr);
}
}
}
It uses the variables at the beginning of the class to control the test. You can define the number of users (nbOfusers - every user is a producer that produces new data), the minimum (minimumDelayBetweenTest) and maximum (maximumDelayBetweenTests) delay between a user producing the next data and how long it takes the consumer to process the data (operationDuration).
StartTests starts the actual test, and StopTests stops the tests again.
I'm calling these as follows
static void Main(string[] args)
{
var tester = new SimpleProducer();
Console.WriteLine("Test successfully started, type exit to stop");
string str;
do
{
str = Console.ReadLine();
if (str == "start")
tester.StartTests();
else if (str == "stop")
tester.StopTests();
}
while (str != "exit");
tester.Shutdown();
}
So, if I run my tester and type 'start', the Producer class starts producing states that are consumed by the Consumer, and memory usage starts to grow and grow. The sample is configured to the extreme; the real-life scenario I'm dealing with is less intensive, but one action of the producer could trigger multiple actions on the consumer side, which also have to be executed in the same asynchronous, abortable FIFO fashion - so worst case, one set of data produced triggers an action for ~10 consumers (that last part I stripped out for brevity).
With 100 producers, each producing a new data item every 1-6 seconds (at random; the data produced is also random), and consumption taking 3 seconds, there are plenty of cases where a new set of data arrives before the old one has been properly processed.
Looking at two consecutive memory dumps, it's obvious where the memory usage is coming from: it's all fragments that have to do with the queue. Given that I'm disposing every CancellationTokenSource and not keeping any references to the produced data (or the AsyncWorkItem it's put into), I'm at a loss to explain why this keeps eating up my memory, and I'm hoping somebody else can show me the error of my ways. You can also abort testing by typing 'stop': you'll see that memory is no longer being eaten, but even if you pause and trigger GC, memory is not freed either.
The source code of the project in runnable form is on GitHub. After starting it, you have to type start (plus enter) in the console to tell the producer to start producing data, and you can stop producing data by typing stop (plus enter).
Your code has so many issues that finding the leak through debugging alone is impossible. But here are several things that are already an issue and should be fixed first:
It looks like getQueue creates a new queue for the same user each time processUserStateUpdateAsync gets called, and does not reuse existing queues:
var executor = groupStateChangeExecutors.GetOrAdd(user.UserId, getQueue());
A CancellationTokenSource leaks on each call of the code below, as a new instance is created every time the method AddOrUpdate is called; it should not be passed there that way:
userStateChangeAborters.AddOrUpdate(user.UserId, new CancellationTokenSource(), (key, existingValue
Also, the update callback below should return the same CancellationTokenSource you pass as the add value, so that when the dictionary has no value for a specific user.UserId, the instance stored is the one you actually use:
return new CancellationTokenSource();
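A sketch of how those two call sites could look, using the factory overloads; the identifiers come from the question and the surrounding logic is assumed unchanged:

// lazy factory: only constructs a queue when the key is actually missing
var executor = groupStateChangeExecutors.GetOrAdd(
    user.UserId,
    id => new AsyncCollectionAbortableFifoQueue<bool>(serverShutDownSource.Token));

// create the new CTS once and return that same instance from the update callback,
// so the value stored in the dictionary is always the one you hold
var newSource = new CancellationTokenSource();
var cancelSource = userStateChangeAborters.AddOrUpdate(
    user.UserId,
    newSource,
    (key, existing) => { oldSource = existing; return newSource; });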
There is also a potential leak of the cancelSource variable, as it gets captured by a delegate that can live longer than you want; it's better to pass a concrete CancellationToken there:
executor.EnqueueTask(() => processUserStateUpdateAsync(user, state, previousState,
cancelSource.Token));
For some reason you do not dispose aborter here and in one more place:
userStateChangeAborters.TryRemove(user.UserId, out var aborter);
Creation of the Channel can also have potential leaks:
taskQueue = Channel.CreateBounded<AsyncWorkItem<T>>(new BoundedChannelOptions(1)
You picked FullMode = BoundedChannelFullMode.DropOldest, which removes the oldest values when the channel is full, so dropped items are never read. It's a hypothesis, but I assume that if an old item is removed without being handled, processUserStateUpdateAsync never gets called for it and its resources are never freed.
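If DropOldest is kept, one way to make sure a dropped item releases its resources is the bounded-channel overload that takes an item-dropped callback (available in newer versions of System.Threading.Channels; a sketch using the question's types):

taskQueue = Channel.CreateBounded<AsyncWorkItem<T>>(
    new BoundedChannelOptions(1) { FullMode = BoundedChannelFullMode.DropOldest },
    dropped => dropped.TaskSource.TrySetCanceled()); // unblock whoever is awaiting the dropped item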
You can start with these found issues and it should be easier to find the real cause after that.
I had to write a console application that called Microsoft Dynamics CRM web service to perform an action on over eight thousand CRM objects. The details of the web service call are irrelevant and not shown here but I needed a multi-threaded client so that I could make calls in parallel. I wanted to be able to control the number of threads used from a config setting and also for the application to cancel the whole operation if the number of service errors reached a config-defined threshold.
I wrote it using Task Parallel Library Task.Run and ContinueWith, keeping track of how many calls (threads) were in progress, how many errors we'd received, and whether the user had cancelled from the keyboard. Everything worked fine and I had extensive logging to assure myself that threads were finishing cleanly and that everything was tidy at the end of the run. I could see that the program was using the maximum number of threads in parallel and, if our maximum limit was reached, waiting until a running task completed before starting another one.
During my code review, my colleague suggested that it would be better to do it with async/await instead of tasks and continuations, so I created a branch and rewrote it that way. The results were interesting - the async/await version was almost twice as slow, and it never reached the maximum number of allowed parallel operations/threads. The TPL one always got to 10 threads in parallel whereas the async/await version never got beyond 5.
My question is: have I made a mistake in the way I have written the async/await code (or the TPL code, even)? If I have not coded it wrong, can you explain why the async/await version is less efficient, and does that mean it is better to carry on using TPL for multi-threaded code?
Note that the code I tested with did not actually call CRM - the CrmClient class simply thread-sleeps for a duration specified in the config (five seconds) and then throws an exception. This meant that there were no external variables that could affect the performance.
For the purposes of this question I created a stripped down program that combines both versions; which one is called is determined by a config setting. Each of them starts with a bootstrap runner that sets up the environment, creates the queue class, then uses a TaskCompletionSource to wait for completion. A CancellationTokenSource is used to signal a cancellation from the user. The list of ids to process is read from an embedded file and pushed onto a ConcurrentQueue. They both start off calling StartCrmRequest as many times as max-threads; subsequently, every time a result is processed, the ProcessResult method calls StartCrmRequest again, keeping going until all of our ids are processed.
You can clone/download the complete program from here: https://bitbucket.org/kentrob/pmgfixso/
Here is the relevant configuration:
<appSettings>
<add key="TellUserAfterNCalls" value="5"/>
<add key="CrmErrorsBeforeQuitting" value="20"/>
<add key="MaxThreads" value="10"/>
<add key="CallIntervalMsecs" value="5000"/>
<add key="UseAsyncAwait" value="True" />
</appSettings>
Starting with the TPL version, here is the bootstrap runner that kicks off the queue manager:
public static class TplRunner
{
private static readonly CancellationTokenSource CancellationTokenSource = new CancellationTokenSource();
public static void StartQueue(RuntimeParameters parameters, IEnumerable<string> idList)
{
Console.CancelKeyPress += (s, args) =>
{
CancelCrmClient();
args.Cancel = true;
};
var start = DateTime.Now;
Program.TellUser("Start: " + start);
var taskCompletionSource = new TplQueue(parameters)
.Start(CancellationTokenSource.Token, idList);
while (!taskCompletionSource.Task.IsCompleted)
{
if (Console.KeyAvailable)
{
if (Console.ReadKey().Key != ConsoleKey.Q) continue;
Console.WriteLine("When all threads are complete, press any key to continue.");
CancelCrmClient();
}
}
var end = DateTime.Now;
Program.TellUser("End: {0}. Elapsed = {1} secs.", end, (end - start).TotalSeconds);
}
private static void CancelCrmClient()
{
CancellationTokenSource.Cancel();
Console.WriteLine("Cancelling Crm client. Web service calls in operation will have to run to completion.");
}
}
Here is the TPL queue manager itself:
public class TplQueue
{
private readonly RuntimeParameters parameters;
private readonly object locker = new object();
private ConcurrentQueue<string> idQueue = new ConcurrentQueue<string>();
private readonly CrmClient crmClient;
private readonly TaskCompletionSource<bool> taskCompletionSource = new TaskCompletionSource<bool>();
private int threadCount;
private int crmErrorCount;
private int processedCount;
private CancellationToken cancelToken;
public TplQueue(RuntimeParameters parameters)
{
this.parameters = parameters;
crmClient = new CrmClient();
}
public TaskCompletionSource<bool> Start(CancellationToken cancellationToken, IEnumerable<string> ids)
{
cancelToken = cancellationToken;
foreach (var id in ids)
{
idQueue.Enqueue(id);
}
threadCount = 0;
// Prime our thread pump with max threads.
for (var i = 0; i < parameters.MaxThreads; i++)
{
Task.Run((Action) StartCrmRequest, cancellationToken);
}
return taskCompletionSource;
}
private void StartCrmRequest()
{
if (taskCompletionSource.Task.IsCompleted)
{
return;
}
if (cancelToken.IsCancellationRequested)
{
Program.TellUser("Crm client cancelling...");
ClearQueue();
return;
}
var count = GetThreadCount();
if (count >= parameters.MaxThreads)
{
return;
}
string id;
if (!idQueue.TryDequeue(out id)) return;
IncrementThreadCount();
crmClient.CompleteActivityAsync(new Guid(id), parameters.CallIntervalMsecs).ContinueWith(ProcessResult);
processedCount += 1;
if (parameters.TellUserAfterNCalls > 0 && processedCount%parameters.TellUserAfterNCalls == 0)
{
ShowProgress(processedCount);
}
}
private void ProcessResult(Task<CrmResultMessage> response)
{
if (response.Result.CrmResult == CrmResult.Failed && ++crmErrorCount == parameters.CrmErrorsBeforeQuitting)
{
Program.TellUser(
"Quitting because CRM error count is equal to {0}. Already queued web service calls will have to run to completion.",
crmErrorCount);
ClearQueue();
}
var count = DecrementThreadCount();
if (idQueue.Count == 0 && count == 0)
{
taskCompletionSource.SetResult(true);
}
else
{
StartCrmRequest();
}
}
private int GetThreadCount()
{
lock (locker)
{
return threadCount;
}
}
private void IncrementThreadCount()
{
lock (locker)
{
threadCount = threadCount + 1;
}
}
private int DecrementThreadCount()
{
lock (locker)
{
threadCount = threadCount - 1;
return threadCount;
}
}
private void ClearQueue()
{
idQueue = new ConcurrentQueue<string>();
}
private static void ShowProgress(int processedCount)
{
Program.TellUser("{0} activities processed.", processedCount);
}
}
Note that I am aware that a couple of the counters are not thread safe but they are not critical; the threadCount variable is the only critical one.
Here is the dummy CRM client method:
public Task<CrmResultMessage> CompleteActivityAsync(Guid activityId, int callIntervalMsecs)
{
// Here we would normally call a CRM web service.
return Task.Run(() =>
{
try
{
if (callIntervalMsecs > 0)
{
Thread.Sleep(callIntervalMsecs);
}
throw new ApplicationException("Crm web service not available at the moment.");
}
catch
{
return new CrmResultMessage(activityId, CrmResult.Failed);
}
});
}
And here are the same async/await classes (with common methods removed for the sake of brevity):
public static class AsyncRunner
{
private static readonly CancellationTokenSource CancellationTokenSource = new CancellationTokenSource();
public static void StartQueue(RuntimeParameters parameters, IEnumerable<string> idList)
{
var start = DateTime.Now;
Program.TellUser("Start: " + start);
var taskCompletionSource = new AsyncQueue(parameters)
.StartAsync(CancellationTokenSource.Token, idList).Result;
while (!taskCompletionSource.Task.IsCompleted)
{
...
}
var end = DateTime.Now;
Program.TellUser("End: {0}. Elapsed = {1} secs.", end, (end - start).TotalSeconds);
}
}
The async/await queue manager:
public class AsyncQueue
{
private readonly RuntimeParameters parameters;
private readonly object locker = new object();
private readonly CrmClient crmClient;
private readonly TaskCompletionSource<bool> taskCompletionSource = new TaskCompletionSource<bool>();
private CancellationToken cancelToken;
private ConcurrentQueue<string> idQueue = new ConcurrentQueue<string>();
private int threadCount;
private int crmErrorCount;
private int processedCount;
public AsyncQueue(RuntimeParameters parameters)
{
this.parameters = parameters;
crmClient = new CrmClient();
}
public async Task<TaskCompletionSource<bool>> StartAsync(CancellationToken cancellationToken,
IEnumerable<string> ids)
{
cancelToken = cancellationToken;
foreach (var id in ids)
{
idQueue.Enqueue(id);
}
threadCount = 0;
// Prime our thread pump with max threads.
for (var i = 0; i < parameters.MaxThreads; i++)
{
await StartCrmRequest();
}
return taskCompletionSource;
}
private async Task StartCrmRequest()
{
if (taskCompletionSource.Task.IsCompleted)
{
return;
}
if (cancelToken.IsCancellationRequested)
{
...
return;
}
var count = GetThreadCount();
if (count >= parameters.MaxThreads)
{
return;
}
string id;
if (!idQueue.TryDequeue(out id)) return;
IncrementThreadCount();
var crmMessage = await crmClient.CompleteActivityAsync(new Guid(id), parameters.CallIntervalMsecs);
ProcessResult(crmMessage);
processedCount += 1;
if (parameters.TellUserAfterNCalls > 0 && processedCount%parameters.TellUserAfterNCalls == 0)
{
ShowProgress(processedCount);
}
}
private async void ProcessResult(CrmResultMessage response)
{
if (response.CrmResult == CrmResult.Failed && ++crmErrorCount == parameters.CrmErrorsBeforeQuitting)
{
Program.TellUser(
"Quitting because CRM error count is equal to {0}. Already queued web service calls will have to run to completion.",
crmErrorCount);
ClearQueue();
}
var count = DecrementThreadCount();
if (idQueue.Count == 0 && count == 0)
{
taskCompletionSource.SetResult(true);
}
else
{
await StartCrmRequest();
}
}
}
So, setting MaxThreads to 10 and CrmErrorsBeforeQuitting to 20, the TPL version on my machine completes in 19 seconds and the async/await version takes 35 seconds. Given that I have over 8000 calls to make this is a significant difference. Any ideas?
I think I'm seeing the problem here, or at least a part of it. Look closely at the two bits of code below; they are not equivalent.
// Prime our thread pump with max threads.
for (var i = 0; i < parameters.MaxThreads; i++)
{
Task.Run((Action) StartCrmRequest, cancellationToken);
}
And:
// Prime our thread pump with max threads.
for (var i = 0; i < parameters.MaxThreads; i++)
{
await StartCrmRequest();
}
In the original code (I am taking it as a given that it is functionally sound) there is a single call to ContinueWith. That is exactly how many await statements I would expect to see in a trivial rewrite if it is meant to preserve the original behaviour.
Not a hard and fast rule and only applicable in simple cases, but nevertheless a good thing to keep an eye out for.
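To make that concrete: the await in the priming loop runs each StartCrmRequest to completion before starting the next one, so the loop never has more than one operation in flight. A sketch of a rewrite that starts all the pumps first and then awaits them together (names taken from the question):

// start MaxThreads request pumps without awaiting each one serially
var pumps = new List<Task>();
for (var i = 0; i < parameters.MaxThreads; i++)
{
    pumps.Add(StartCrmRequest());
}
await Task.WhenAll(pumps);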
I think you overcomplicated your solution and ended up not getting where you wanted in either implementation.
First of all, connections to any HTTP host are limited by the service point manager. The default limit for client environments is 2, but you can increase it yourself.
No matter how many threads you spawn, there won't be more active requests than those allowed.
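For example (this applies to HttpWebRequest on .NET Framework):

// allow up to 10 concurrent connections per host instead of the client default of 2
ServicePointManager.DefaultConnectionLimit = 10;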
Then, as someone pointed out, await logically blocks the execution flow.
And finally, you spent your time creating an AsyncQueue when you should have used TPL Dataflow.
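A sketch of what that could look like here, reusing the question's names; a bounded ActionBlock replaces the hand-rolled thread counting:

// an ActionBlock capped at MaxThreads concurrent CRM calls
var worker = new ActionBlock<string>(
    async id => await crmClient.CompleteActivityAsync(new Guid(id), parameters.CallIntervalMsecs),
    new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = parameters.MaxThreads });

foreach (var id in idList)
    worker.Post(id);

worker.Complete();
await worker.Completion;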
When implemented with async/await, I would expect the I/O-bound algorithm to run on a single thread. Unlike @KirillShlenskiy, I believe that the bit responsible for "bringing back" to the caller's context is not responsible for the slowdown. I think you overran the thread pool by trying to use it for I/O-bound operations. It's designed primarily for compute-bound ops.
Have a look at ForEachAsync. I feel that's what you're looking for (Stephen Toub's discussion; you'll find Wischik's videos meaningful too):
http://blogs.msdn.com/b/pfxteam/archive/2012/03/05/10278165.aspx
(Use degree of concurrency to reduce memory footprint)
http://vimeo.com/43808831
http://vimeo.com/43808833
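For reference, the ForEachAsync from that first post is roughly this shape (it partitions the source across a fixed degree of concurrency):

public static Task ForEachAsync<T>(this IEnumerable<T> source, int dop, Func<T, Task> body)
{
    return Task.WhenAll(
        from partition in Partitioner.Create(source).GetPartitions(dop)
        select Task.Run(async delegate
        {
            // each partition is drained sequentially; dop partitions run in parallel
            using (partition)
                while (partition.MoveNext())
                    await body(partition.Current);
        }));
}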
I need to implement a throttling mechanism (requests per second) when using HttpWebRequest for making parallel requests towards one application server. My C# app must issue no more than 80 requests per second to a remote server. The limit is imposed by the remote service admins not as a hard limit but as "SLA" between my platform and theirs.
How can I control the number of requests per second when using HttpWebRequest?
I had the same problem and couldn't find a ready solution so I made one, and here it is. The idea is to use a BlockingCollection<T> to add items that need processing and use Reactive Extensions to subscribe with a rate-limited processor.
The Throttle class is a renamed version of this rate limiter.
public static class BlockingCollectionExtensions
{
// TODO: devise a way to avoid problems if collection gets too big (produced faster than consumed)
public static IObservable<T> AsRateLimitedObservable<T>(this BlockingCollection<T> sequence, int items, TimeSpan timePeriod, CancellationToken producerToken)
{
Subject<T> subject = new Subject<T>();
// this is a dummyToken just so we can recreate the TokenSource
// which we will pass the proxy class so it can cancel the task
// on disposal
CancellationToken dummyToken = new CancellationToken();
CancellationTokenSource tokenSource = CancellationTokenSource.CreateLinkedTokenSource(producerToken, dummyToken);
var consumingTask = new Task(() =>
{
using (var throttle = new Throttle(items, timePeriod))
{
while (!sequence.IsCompleted)
{
try
{
T item = sequence.Take(producerToken);
throttle.WaitToProceed();
try
{
subject.OnNext(item);
}
catch (Exception ex)
{
subject.OnError(ex);
}
}
catch (OperationCanceledException)
{
break;
}
}
subject.OnCompleted();
}
}, TaskCreationOptions.LongRunning);
return new TaskAwareObservable<T>(subject, consumingTask, tokenSource);
}
private class TaskAwareObservable<T> : IObservable<T>, IDisposable
{
private readonly Task task;
private readonly Subject<T> subject;
private readonly CancellationTokenSource taskCancellationTokenSource;
public TaskAwareObservable(Subject<T> subject, Task task, CancellationTokenSource tokenSource)
{
this.task = task;
this.subject = subject;
this.taskCancellationTokenSource = tokenSource;
}
public IDisposable Subscribe(IObserver<T> observer)
{
var disposable = subject.Subscribe(observer);
if (task.Status == TaskStatus.Created)
task.Start();
return disposable;
}
public void Dispose()
{
// cancel consumption and wait task to finish
taskCancellationTokenSource.Cancel();
task.Wait();
// dispose tokenSource and task
taskCancellationTokenSource.Dispose();
task.Dispose();
// dispose subject
subject.Dispose();
}
}
}
Unit test:
class BlockCollectionExtensionsTest
{
[Fact]
public void AsRateLimitedObservable()
{
const int maxItems = 1; // fix this to 1 to ease testing
TimeSpan during = TimeSpan.FromSeconds(1);
// populate collection
int[] items = new[] { 1, 2, 3, 4 };
BlockingCollection<int> collection = new BlockingCollection<int>();
foreach (var i in items) collection.Add(i);
collection.CompleteAdding();
IObservable<int> observable = collection.AsRateLimitedObservable(maxItems, during, CancellationToken.None);
BlockingCollection<int> processedItems = new BlockingCollection<int>();
ManualResetEvent completed = new ManualResetEvent(false);
DateTime last = DateTime.UtcNow;
observable
// this is so we'll receive exceptions
.ObserveOn(new SynchronizationContext())
.Subscribe(item =>
{
if (item == 1)
last = DateTime.UtcNow;
else
{
TimeSpan diff = (DateTime.UtcNow - last);
last = DateTime.UtcNow;
Assert.InRange(diff.TotalMilliseconds,
during.TotalMilliseconds - 30,
during.TotalMilliseconds + 30);
}
processedItems.Add(item);
},
() => completed.Set()
);
completed.WaitOne();
Assert.Equal(items, processedItems, new CollectionEqualityComparer<int>());
}
}
The Throttle() and Sample() extension methods (on IObservable) allow you to regulate a fast sequence of events into a "slower" sequence.
Here is a blog post with an example of Sample(TimeSpan) that ensures a maximum rate.
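A sketch of the Sample approach; requests and SendRequest are stand-ins for your observable of pending work and your HttpWebRequest call. Note that Sample forwards only the latest value per window and drops the rest, so this only fits if dropping is acceptable:

IDisposable subscription = requests
    .Sample(TimeSpan.FromMilliseconds(1000.0 / 80)) // at most ~80 values forwarded per second
    .Subscribe(r => SendRequest(r));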
My original post discussed how to add a throttling mechanism to WCF via client behavior extensions, but then it was pointed out that I misread the question (doh!).
Overall, the approach can be to check with a class that determines whether we are violating the rate limit. There's already been a lot of discussion around how to check for rate violations:
Throttling method calls to M requests in N seconds
If you are violating the rate limit, then sleep for a fixed interval and check again. If not, go ahead and make the HttpWebRequest call.
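A minimal sketch of such a check, assuming a sliding one-second window and a fixed 50 ms retry sleep:

sealed class RateGate
{
    private readonly Queue<DateTime> timestamps = new Queue<DateTime>();
    private readonly int limit;       // e.g. 80 requests
    private readonly TimeSpan window; // e.g. TimeSpan.FromSeconds(1)
    private readonly object sync = new object();

    public RateGate(int limit, TimeSpan window)
    {
        this.limit = limit;
        this.window = window;
    }

    // blocks until issuing one more request would not violate the limit
    public void WaitToProceed()
    {
        while (true)
        {
            lock (sync)
            {
                // drop timestamps that have slid out of the window
                var cutoff = DateTime.UtcNow - window;
                while (timestamps.Count > 0 && timestamps.Peek() < cutoff)
                    timestamps.Dequeue();

                if (timestamps.Count < limit)
                {
                    timestamps.Enqueue(DateTime.UtcNow);
                    return;
                }
            }
            Thread.Sleep(50); // fixed interval before checking again
        }
    }
}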
Our company has a web service to which I want to send XML files (stored on my drive) via my own HttpWebRequest client in C#. This already works. The web service supports 5 synchronous requests at the same time (I get a response from the web service once the processing on the server is completed). Processing takes about 5 minutes for each request.
Throwing too many requests (> 5) at it results in timeouts for my client. Also, this can lead to errors on the server side and incoherent data. Making changes on the server side is not an option (different vendor).
Right now, my web request client sends the XML and waits for the response using result.AsyncWaitHandle.WaitOne();
However, this way only one request can be processed at a time, although the web service supports 5. I tried using a BackgroundWorker and ThreadPool, but they create too many requests at the same time, which makes them useless to me. Any suggestion how one could solve this problem? Create my own thread pool with exactly 5 threads? Any suggestions how to implement this?
The easy way is to create 5 threads (aside: that's an odd number!) that consume the XML files from a BlockingCollection.
Something like:
var bc = new BlockingCollection<string>();
for ( int i = 0 ; i < 5 ; i++ )
{
new Thread( () =>
{
foreach ( var xml in bc.GetConsumingEnumerable() )
{
// do work
}
}
).Start();
}
bc.Add( xml_1 );
bc.Add( xml_2 );
...
bc.CompleteAdding(); // threads will end when queue is exhausted
If you're on .NET 4, this looks like a perfect fit for Parallel.ForEach(). You can set its MaxDegreeOfParallelism, which guarantees that no more than that many items are processed at one time.
Parallel.ForEach(items,
new ParallelOptions { MaxDegreeOfParallelism = 5 },
ProcessItem);
Here, ProcessItem is a method that processes one item by accessing your server and blocking until the processing is done. You could use a lambda instead, if you wanted.
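For instance, with a lambda wrapping a hypothetical blocking call:

Parallel.ForEach(items,
    new ParallelOptions { MaxDegreeOfParallelism = 5 },
    item =>
    {
        // client.Send is a stand-in for whatever blocking request you make per item
        client.Send(item);
    });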
Creating your own thread pool of five threads isn't tricky: just create a concurrent queue of objects describing the request to make, and have five threads that loop, performing the tasks as needed. Add in an AutoResetEvent and you can make sure they don't spin furiously while there are no requests that need handling.
It can, though, be tricky to return the response to the correct calling thread. If that's the case for how the rest of your code works, I'd take a different approach and create a limiter that acts a bit like a monitor but allows 5 simultaneous threads rather than only one:
private static class RequestLimiter
{
private static AutoResetEvent _are = new AutoResetEvent(false);
private static int _reqCnt = 0;
public static ResponseObject DoRequest(RequestObject req) // must be static inside a static class
{
for(;;)
{
if(Interlocked.Increment(ref _reqCnt) <= 5)
{
//code to create response object "resp".
Interlocked.Decrement(ref _reqCnt);
_are.Set();
return resp;
}
else
{
if(Interlocked.Decrement(ref _reqCnt) >= 5)//test so we don't end up waiting due to race on decrementing from finished thread.
_are.WaitOne();
}
}
}
}
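For comparison, a counted semaphore gives the same "at most five in flight" behavior without the retry loop; this is a self-contained sketch of that alternative, not the limiter above:

static class RequestGate
{
    // at most five callers may be inside the request section at once
    private static readonly SemaphoreSlim Gate = new SemaphoreSlim(5, 5);

    public static T DoThrottled<T>(Func<T> makeRequest)
    {
        Gate.Wait();
        try { return makeRequest(); }
        finally { Gate.Release(); }
    }
}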
You could write a little helper method that blocks the current thread until all the threads have finished executing the given action delegate.
static void SpawnThreads(int count, Action action)
{
var countdown = new CountdownEvent(count);
for (int i = 0; i < count; i++)
{
new Thread(() =>
{
action();
countdown.Signal();
}).Start();
}
countdown.Wait();
}
And then use a BlockingCollection<string> (thread-safe collection), to keep track of your xml files. By using the helper method above, you could write something like:
static void Main(string[] args)
{
var xmlFiles = new BlockingCollection<string>();
// Add some xml files....
SpawnThreads(5, () =>
{
using (var web = new WebClient())
{
    // UploadFile needs the target address as well as the file name;
    // "http://example.com/upload" is a hypothetical placeholder for your service endpoint
    web.UploadFile("http://example.com/upload", xmlFiles.Take());
}
});
Console.WriteLine("Done");
Console.ReadKey();
}
Update
An even better approach would be to upload the files asynchronously, so that you don't waste resources using threads for an IO task.
Again you could write a helper method:
static void SpawnAsyncs(int count, Action<CountdownEvent> action)
{
var countdown = new CountdownEvent(count);
for (int i = 0; i < count; i++)
{
action(countdown);
}
countdown.Wait();
}
And use it like:
static void Main(string[] args)
{
var urlXML = new BlockingCollection<Tuple<string, string>>();
urlXML.Add(Tuple.Create("http://someurl.com", "filename"));
// Add some more to collection...
SpawnAsyncs(5, c =>
{
    // no using block here: disposing the WebClient immediately would cancel the upload in flight
    var web = new WebClient();
    var current = urlXML.Take();
    web.UploadFileCompleted += (s, e) =>
    {
        // some code to mess with e.Result (response)
        web.Dispose();
        c.Signal();
    };
    web.UploadFileAsync(new Uri(current.Item1), current.Item2);
});
Console.WriteLine("Done");
Console.ReadKey();
}