I'm trying to create an AWS SQS windows service consumer that will poll messages in batch of 10. Each messages will be executed in its own task for parallel execution. Message processing includes calling different api's and sending email so it might take some time.
My problem is that first, I only want to poll the queue when 10 messages can be processed immediately. This is due to sqs visibility timeout and having the received messages "wait" might go over the visibility timeout and be "back" on the queue. This will produce duplication. I don't think tweaking the visibility timeout is good, because there are still chances that messages will be duplicated and that's what I'm trying to avoid. Second, I want to have some sort of limit for parallelism (ex. max limit of 100 concurrent tasks), so that server resources can be kept at bay since there are also other apps running in the server.
How to achieve this? Or are there any other way to remedy these problems?
This answer makes the following assumptions:
Fetching messages from the AWS should be serialized. Only the processing of messages should be parallelized.
Every message fetched from the AWS should be processed. The whole execution should not terminate before all fetched messages have a chance to be processed.
Every message-processing operation should be awaited. The whole execution should not terminate before the completion of all started tasks.
Any error that occurs during the processing of a message should be ignored. The whole execution should not terminate because the processing of a single message failed.
Any error that occurs during the fetching of messages from the AWS should be fatal. The whole execution should terminate, but not before all currently running message-processing operations have completed.
The execution mechanism should be able to handle the case that a fetch-from-the-AWS operation returned a batch having a different number of messages than the requested number.
Below is an implementation that (hopefully) satisfies these requirements:
/// <summary>
/// Starts an execution loop that fetches batches of messages sequentially,
/// and process them one by one in parallel.
/// </summary>
public static async Task ExecutionLoopAsync<TMessage>(
Func<int, Task<TMessage[]>> fetchMessagesAsync,
Func<TMessage, Task> processMessageAsync,
int fetchCount,
int maxDegreeOfParallelism,
CancellationToken cancellationToken = default)
{
// Arguments validation omitted
var semaphore = new SemaphoreSlim(maxDegreeOfParallelism, maxDegreeOfParallelism);
// Count how many times we have acquired the semaphore, so that we know
// how many more times we have to acquire it before we exit from this method.
int acquiredCount = 0;
try
{
while (true)
{
Debug.Assert(acquiredCount == 0);
for (int i = 0; i < fetchCount; i++)
{
await semaphore.WaitAsync(cancellationToken);
acquiredCount++;
}
TMessage[] messages = await fetchMessagesAsync(fetchCount)
?? Array.Empty<TMessage>();
for (int i = 0; i < messages.Length; i++)
{
if (i >= fetchCount) // We got more messages than we asked for
{
await semaphore.WaitAsync();
acquiredCount++;
}
ProcessAndRelease(messages[i]);
acquiredCount--;
}
if (messages.Length < fetchCount)
{
// We got less messages than we asked for
semaphore.Release(fetchCount - messages.Length);
acquiredCount -= fetchCount - messages.Length;
}
// This method is 'async void' because it is not expected to throw ever
async void ProcessAndRelease(TMessage message)
{
try { await processMessageAsync(message); }
catch { } // Swallow exceptions
finally { semaphore.Release(); }
}
}
}
catch (SemaphoreFullException)
{
// Guard against the (unlikely) scenario that the counting logic is flawed.
// The counter is no longer reliable, so skip the awaiting in finally.
acquiredCount = maxDegreeOfParallelism;
throw;
}
finally
{
// Wait for all pending operations to complete. This could cause a deadlock
// in case the counter has become out of sync.
for (int i = acquiredCount; i < maxDegreeOfParallelism; i++)
await semaphore.WaitAsync();
}
}
Usage example:
var cts = new CancellationTokenSource();
Task executionTask = ExecutionLoopAsync<Message>(async count =>
{
return await GetBatchFromAwsAsync(count);
}, async message =>
{
await ProcessMessageAsync(message);
}, fetchCount: 10, maxDegreeOfParallelism: 100, cts.Token);
Related
it seems that I do not understand TPL Dataflow error handling.
Lets assume I have a list of items I wanna process and I use a ActionBlock for that:
var actionBlock = new ActionBlock<int[]>(async tasks =>
{
foreach (var task in tasks)
{
await Task.Delay(1);
if (task > 30)
{
throw new InvalidOperationException();
}
Console.WriteLine("{0} Completed", task);
}
}, new ExecutionDataflowBlockOptions
{
BoundedCapacity = 200,
MaxDegreeOfParallelism = 4
});
for (i = 0; i < 10000; i++)
{
if (!await bufferBlock.SendAsync(i))
{
break;
}
}
actionBlock.Complete();
await actionBlock.Completion;
If an error occurs the block transitions to faulted state and SendAsync(...) returns false. I can just stop my loop and complete it and when I await the completion an exception is thrown. So far so good.
When I put a BufferBlock in between it does not work anymore:
bufferBlock.LinkTo(actionBlock, new DataflowLinkOptions
{
PropagateCompletion = true
});
for (i = 0; i < 10000; i++)
{
if (!await bufferBlock.SendAsync(i, cts.Token))
{
break;
}
}
bufferBlock.Complete();
await actionBlock.Completion;
The call to SendAsync() just "blocks" forever, because the BufferBlock never transitions to faulted state.
The only solution I found is this:
using (var cts = new CancellationTokenSource())
{
actionBlock.Completion.ContinueWith(x =>
{
if (x.Status != TaskStatus.RanToCompletion)
{
cts.Cancel();
}
});
var i = 0;
try
{
for (i = 0; i < 10000; i++)
{
if (cts.Token.IsCancellationRequested)
{
break;
}
if (!await bufferBlock.SendAsync(i, cts.Token))
{
break;
}
}
}
catch (OperationCanceledException)
{
}
bufferBlock.Complete();
await actionBlock.Completion;
}
Because the state propagates I have to listen to the state of the last block in my network and when this block stops I have to stop my loop.
Is this the intended way to work with Dataflow library or is there a better solution?
Don't allow unhandled exceptions. An unhandled exception in a block means the block and by extension the entire pipeline is terminally broken and must be aborted. That's not a TPL Dataflow bug, that's how the overall dataflow paradigm works. Exceptions are meant to signal errors up a call stack. There's no call stack in a dataflow though.
Blocks are independent workers that communicate through messages. There's no ownership relation between linked blocks and a faulting block doesn't mean any previous or following blocks should have to abort as well. That's why PropagateCompletion is false by default.
If a source links to more than one blocks the messages can easily go to the other blocks. It's also possible to change the links between blocks at runtime.
In a pipeline there are two different kinds of errors:
Message errors that occur when a block/actor/worker processes a message
Pipeline errors that invalidate the pipeline and may require aborting
There's no reason to abort the pipeline if a single message faults.
Message errors
If something goes wrong while processing a message, the actor should do something with that message and proceed with the next one. That something may be:
Log the error and go on
Send an "error" message to another block
Use a Result<TMessage,TError> class in the entire pipeline instead of using raw message types, and add any errors to the result
Retry and recovery strategies can be built on top of that, eg forwarding any failed messages to a "retry" block or dead message block
The simplest way would be to just catch the exceptions and log them :
var block=new ActionBlock<int[]>(msg=>{
try
{
...
}
catch(Exception exc)
{
_logger.LogError(exc);
}
});
Another option is to manually post to eg a dead-letter queue :
var dead=new BufferBlock<(int[] data,Exception error)>();
var block=new ActionBlock<int[]>(msg=>{
try
{
...
}
catch(Exception exc)
{
await _dead.SendAsync(msg,exc);
_logger.LogError(exc);
}
});
Going even further, one could define a Result<TMessage,TError> class to wrap results. Downstream blocks could ignore faulted results. The LinkTo predicate can also be used to reroute error messages. I'll cheat and hard-code the error to Exception. A better implementation would use different types for success and error :
record Result<TMessage>(TMessage? Message,Exception ? Error)
{
public bool HasError=>error!=null;
}
var block1=new TransformBlock<Result<int[]>,Result<double>>(msg=>{
if (msg.HasError)
{
//Propagate the error message
return new Result<double>(default,msg.Error);
}
try
{
var sum=(double)msg.Message.Sum();
if (sum % 5 ==0)
{
throw new Exception("Why not?");
}
return new Result(sum,null);
}
catch(Exception exc)
{
return new Result(null,exc);
}
});
var block2=new ActionBlock<Result<double>>(...);
block1.LinkTo(block2);
Another option is to redirect error messages to a different block:
var errorBlock=new ActionBlock<Result<int[]>>(msg=>{
_logger.LogError(msg.Error);
});
block1.LinkTo(errorBlock,msg=>msg.HasError);
block1.LinkTo(block2);
This redirects all errored messages to the error block. All other messages move on to block2
Pipeline errors
In some cases, an error is so severe the current block can't recover and perhaps even the entire pipeline must be cancelled/aborted. Cancellation in .NET is handled through a CancellationToken. All blocks accept a CancellationToken to allow aborting.
There's no single abort strategy that's appropriate to all pipelines. Propagating cancellation forward is common but definitely not the only option.
In the simplest case,
var pipeLineCancellation = new CancellationTokenSource();
var block1=new TransformBlock<Result<int[]>,Result<double>>(msg=>{
...
},
new ExecutionDataflowBlockOptions {
CancellationToken=pipeLineCancellation.Token
});
The block exception handler could request cancellation in case of a serious error :
//Wrong table name. We can't use the database
catch(SqlException exc) when (exc.Number ==208)
{
...
pipeLineCancellation.Cancel();
}
This would abort all blocks that use the same CancellationTokenSource. That doesn't mean that all blocks should be connected to the same CancellationTokenSource though.
Flowing cancellation backwards
In Go pipelines it's common to use an error channel that sends a cancellation message to the previous block. The same can be done in C# using linked CancellationTokenSources. One could even say this is even better than Go.
It's possible to create multiple linked CancellationTokenSources with CreateLinkedTokenSource. By creating sources that link backwards we can have a block signal cancellation for its own source and have the cancellation flow to the root.
var cts5=new CancellationTokenSource();
var cts4=CancellationTokenSource.CreateLinkedTokenSource(cts5.Token);
...
var cts1=CancellationTokenSource.CreateLinkedTokenSource(cts2.Token);
...
var block3=new TransformBlock<Result<int[]>,Result<double>>(msg=>{
...
catch(SqlException)
{
cts3.Cancel();
}
},
new ExecutionDataflowBlockOptions {
CancellationToken=cts3.Token
});
This will signal cancellation backwards, block by block, without cancelling the downstream blocks.
Pipeline Patterns
Dataflow in .NET is a gem few people know about, so it's really hard to find good references and patterns. The concepts are similar in Go though, so one could use the patterns found in Go Concurrency Patterns: Pipelines and cancellation.
The TPL Dataflow implements the processing loop and completion propagation so one typically only needs to provide the Action or Func that processes messages. The rest of the patterns have to be implemented, although .NET offers some advantages over Go.
The done channel is essentially a CancellationTokenSource.
Fan-in, fan-out are already handled through existing blocks, or can be handled using a relatively simple custom block that clones messages
CancellationTokenSources can be linked explicitly. In Go each "stage" (essentially a block) has to propagate completion/cancellation to other stages
One CancellationTokenSource can be used by all stages/blocks.
Linking allows not just easier composition but even runtime modifications to the pipeline/mesh.
Let's say we want to just stop processing messages after a while, even though there's no error. All that's needed is to create a CTS used by all blocks:
var pipeLineCancellation = new CancellationTokenSource();
var block1=new TransformBlock<Result<int[]>,Result<double>>(msg=>{
...
},
new ExecutionDataflowBlockOptions {
CancellationToken=pipeLineCancellation.Token
});
var block2 =.....;
pipeLineCancellation.Cancel();
Perhaps we want to run the pipeline for only a minute? Easy with
var pipeLineCancellation = new CancellationTokenSource(60000);
There are some disadvantages too, as a Dataflow block has no access to the "channels" or control over the loop
In Go it's easy to pass data, error and done channels to each stage, simplifying the error reporting and completion. In .NET the block delegates may have to access other blocks or CTSs directly.
In Go it's easier to use common state to eg accumulate data, or manage session/remote connection state. Imagine stage/block that controls a screen scraper like Selenium. We really don't want to restart the browser on every message.
Or we may want to insert data into a database using SqlBulkCopy. With an ActionBlock we'd have to create a new instance for each batch, which may or may not be a problem.
I have a problem with determining how to detect completion within a looping TPL Dataflow.
I have a feedback loop in part of a dataflow which is making GET requests to a remote server and processing data responses (transforming these with more dataflow then committing the results).
The data source splits its results into pages of 1000 records, and won't tell me how many pages it has available for me. I have to just keep reading until i get less than a full page of data.
Usually the number of pages is 1, frequently it is up to 10, every now and again we have 1000s.
I have many requests to fetch at the start.
I want to be able to use a pool of threads to deal with this, all of which is fine, I can queue multiple requests for data and request them concurrently. If I stumble across an instance where I need to get a big number of pages I want to be using all of my threads for this. I don't want to be left with one thread churning away whilst the others have finished.
The issue I have is when I drop this logic into dataflow, such as:
//generate initial requests for activity
var request = new TransformManyBlock<int, DataRequest>(cmp => QueueRequests(cmp));
//fetch the initial requests and feedback more requests to our input buffer if we need to
TransformBlock<DataRequest, DataResponse> fetch = null;
fetch = new TransformBlock<DataRequest, DataResponse>(async req =>
{
var resp = await Fetch(req);
if (resp.Results.Count == 1000)
await fetch.SendAsync(QueueAnotherRequest(req));
return resp;
}
, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });
//commit each type of request
var commit = new ActionBlock<DataResponse>(async resp => await Commit(resp));
request.LinkTo(fetch);
fetch.LinkTo(commit);
//when are we complete?
QueueRequests produces an IEnumerable<DataRequest>. I queue the next N page requests at once, accepting that this means I send slightly more calls than I need to. DataRequest instances share a LastPage counter to avoid neadlessly making requests that we know are after the last page. All this is fine.
The problem:
If I loop by feeding back more requests into fetch's input buffer as I've shown in this example, then i have a problem with how to signal (or even detect) completion. I can't set completion on fetch from request, as once completion is set I can't feedback any more.
I can monitor for the input and output buffers being empty on fetch, but I think I'd be risking fetch still being busy with a request when I set completion, thus preventing queuing requests for additional pages.
I could do with some way of knowing that fetch is busy (either has input or is busy processing an input).
Am I missing an obvious/straightforward way to solve this?
I could loop within fetch, rather than queuing more requests. The problem with that is I want to be able to use a set maximum number of threads to throttle what I'm doing to the remote server. Could a parallel loop inside the block share a scheduler with the block itself and the resulting thread count be controlled via the scheduler?
I could create a custom transform block for fetch to handle the completion signalling. Seems like a lot of work for such a simple scenario.
Many thanks for any help offered!
In TPL Dataflow, you can link the blocks with DataflowLinkOptions with specifying the propagation of completion of the block:
request.LinkTo(fetch, new DataflowLinkOptions { PropagateCompletion = true });
fetch.LinkTo(commit, new DataflowLinkOptions { PropagateCompletion = true });
After that, you simply call the Complete() method for the request block, and you're done!
// the completion will be propagated to all the blocks
request.Complete();
The final thing you should use is Completion task property of the last block:
commit.Completion.ContinueWith(t =>
{
/* check the status of the task and correctness of the requests handling */
});
For now I have added a simple busy state counter to the fetch block:-
int fetch_busy = 0;
TransformBlock<DataRequest, DataResponse> fetch_activity=null;
fetch = new TransformBlock<DataRequest, ActivityResponse>(async req =>
{
try
{
Interlocked.Increment(ref fetch_busy);
var resp = await Fetch(req);
if (resp.Results.Count == 1000)
{
await fetch.SendAsync( QueueAnotherRequest(req) );
}
Interlocked.Decrement(ref fetch_busy);
return resp;
}
catch (Exception ex)
{
Interlocked.Decrement(ref fetch_busy);
throw ex;
}
}
, new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 10 });
Which I then use to signal complete as follows:-
request.Completion.ContinueWith(async _ =>
{
while ( fetch.InputCount > 0 || fetch_busy > 0 )
{
await Task.Delay(100);
}
fetch.Complete();
});
Which doesnt seem very elegant, but should work I think.
I am trying to implement the following Use Case. I have an Azure Worker Role that will monitor the Azure Storage Queue, and when a message comes in, this will trigger a job to run Asynchronously. I want to use the TPL if possible, and need the operations to support cancellation, so that when the Azure Role OnStop fires, jobs can exit gracefully if possible. The MyFixIt example posted by Scott Guthrie is almost exactly what I need, and I have used this as the template for my project. The one critical aspect not supported is the requirement to run the jobs asynchronously. In the FixIt code, once a job is launched, no other jobs will process until that one finishes. Some of the jobs my application will process are long running, and I need the worker role to be able to notice other incoming jobs and run those while the long running job is running.
The 2 key methods here are ProcessMessagesAsync, which monitors the queue, and ProcessMessage, which will run the job when a message comes in. Here is what I have, and it mostly works except it does not handle the CancellationRequest properly, and the Azure Worker Role will shut down without waiting for jobs to complete.
/// <summary>
/// Continuous loop that monitors the queue and launches jobs when they are retrieved.
/// </summary>
/// <param name="token"></param>
/// <returns></returns>
public virtual async Task ProcessMessagesAsync(CancellationToken token)
{
CloudQueue queue = _queueClient.GetQueueReference(_queueName);
await queue.CreateIfNotExistsAsync(token);
while (!token.IsCancellationRequested)
{
Debug.WriteLine("inLoop");
// The default timeout is 90 seconds, so we won’t continuously poll the queue if there are no messages.
// Pass in a cancellation token, because the operation can be long-running.
CloudQueueMessage message = await queue.GetMessageAsync(token);
if (message != null)
{
ProcessMessage(message, queue, token);
}
else
{
await Task.Delay(500, token);
}
}
}
protected virtual async Task ProcessMessage(CloudQueueMessage message, CloudQueue queue, CancellationToken token)
{
var jobDetails = JobDetails.DeserializeJson(message.AsString);
var result = await _jobRunner.RunJob(jobDetails, token);
//todo handle error
//if (result.Status == JobStatus.Error)
await queue.DeleteMessageAsync(message);
}
Then the JobRunner runs the job requested. I have written a TestJob in which I am trying to simulate a long running job that can notice the CancellationRequest, and after a short cleanup period, exit the job early.
public virtual async Task<JobResult> RunJob(JobDetails jobDetails, CancellationToken token)
{
switch (jobDetails.JobName.ToLower())
{
case "testjob":
return await TestJob(jobDetails.Args, token);
}
return new JobResult(JobStatus.Error) { ErrorMessage = "The job requested does not exist." };
}
protected virtual async Task<JobResult> TestJob(List<string> jobArgs, CancellationToken token)
{
var message = "no message";
if (jobArgs != null && jobArgs.Any())
message = jobArgs[0];
return await Task.Run(async () =>
{
Debug.WriteLine(string.Format("Start:{0}", message));
for (int i = 1; i <= 800; i++)
{
if (token.IsCancellationRequested)
{
Debug.WriteLine("CancelationRequest in TestJob");
//simulate short time to cleanup and exit early
Thread.Sleep(1500);
Debug.WriteLine("Cancelation Job Cleanup finsihed.");
token.ThrowIfCancellationRequested();
}
Thread.Sleep(10);
}
Debug.WriteLine(string.Format("Finish:{0}", message));
return new JobResult(JobStatus.Success);
});
}
I have been searching and researching for 2 days now, including the TPL DataFlow library, and have not yet been able to come up with a way to make this work properly. I feel like the Call to ProcessMessage(message, queue, token) is not being done correctly, there even is a compiler warning 'Because this call is not awaited...'. But I DON'T want to await (which is what the FixIt example does), because then no other jobs get noticed until the running one is finished. This seems like it would not be an uncommon use case, though I cannot seem to find anyone describing it.
Thank you in advance for any help!
Danny Green
The reason this is happening is because you are not honouring the task returned from ProcessMessage. Because of this ProcessMessageAsync can finish before ProcessMessage gracefully completes or cancels. Keeping in mind that you don't want to await ProcessMessage because it will make message processing sequential, I would suggest that you keep a list of running tasks.
In other words, create a List in ProcessMessageAsync and add the task returned from ProcessMessage to this list. Then at the end of while loop you should loop through this list to cancel all pending tasks if token was cancelled.
Sorry I don't have VS handy but I hope you get the point.
Thank you Sanjay, Based on your suggestion I have come up with the following.
/// <summary>
/// Continuous loop that monitors the queue and launches jobs when they are retrieved.
/// </summary>
/// <param name="token"></param>
/// <returns></returns>
public virtual async Task ProcessMessagesAsync(CancellationToken token)
{
CloudQueue queue = _queueClient.GetQueueReference(_queueName);
await queue.CreateIfNotExistsAsync(token);
var runningTasks = new ConcurrentDictionary<int, Task>();
while (!token.IsCancellationRequested)
{
Debug.WriteLine("inLoop");
// The default timeout is 90 seconds, so we won’t continuously poll the queue if there are no messages.
// Pass in a cancellation token, because the operation can be long-running.
CloudQueueMessage message = await queue.GetMessageAsync(token);
if (message != null)
{
var t = ProcessMessage(message, queue, token);
var c = t.ContinueWith(z => RemoveRunningTask(t.Id, runningTasks));
while (true)
{
if (runningTasks.TryAdd(t.Id, t))
break;
Task.Delay(25);
}
}
else
{
try
{
await Task.Delay(500, token);
}
catch (Exception ex)
{
Debug.WriteLine(ex.Message);
}
}
}
while (!runningTasks.IsEmpty)
{
Debug.WriteLine("Waiting for running tasks");
Task.Delay(500);
}
}
private static void RemoveRunningTask(int id, ConcurrentDictionary<int, Task> runningTasks)
{
while (true)
{
Task outTask;
if (runningTasks.TryRemove(id, out outTask))
break;
Task.Delay(25);
}
}
This seems to work, though I feel it is a little clumsy. I started out coding the 'ContinueWith' like this, but was surprised that the incoming task had a different Id value (I expected it to be the same Task):
var task = ProcessMessage(message, queue, token).ContinueWith(x =>
{
while (true)
{
Task outTask;
if (runningTasks.TryRemove(x.Id, out outTask))
break;
Task.Delay(25);
}
});
UPDATE:
It turns out that this still does not quite work, I somehow misread the results when testing earlier. Based on the MyFixIt example, in the Work Role OnStop I have the following code:
public override void OnStop()
{
Debug.WriteLine("OnStop_Begin");
tokenSource.Cancel();
tokenSource.Token.WaitHandle.WaitOne();
base.OnStop();
Debug.WriteLine("Onstop_End");
tokenSource.Dispose();
}
It appears that the tokenSource.Token.WaitHandle.WaitOne isn't really able to wait until all of the tasks that have a reference to the token have finished, so the role continues and stops even when tasks are still in the processing of finishing up. Is there some way to properly use the token to signal when the cancellation is actually completed?
Thanks!
UPDATE 2
Ok, I think I have a solution that is now working. It appears that the CancellationToken.WaitHandle is signaled when the .Cancel is called, so I'm not sure what the purpose of having it immediately after the .Cancel is called, it seems like it would always just continue immediately through that code? This is how it is in the FixIt example, but I don't really understand it. For my purpose, I have changed ProcessMessagesAsync to now get passed in a ManualResetEventSlim, and then set that after all tasks have finished. Then in OnStop I wait on that before finishing the Stop.
/// <summary>
/// Continuous loop that monitors the queue and launches jobs when they are retrieved.
/// </summary>
/// <param name="token"></param>
/// <returns></returns>
public virtual async Task ProcessMessagesAsync(CancellationToken token, ManualResetEventSlim reset)
{
CloudQueue queue = _queueClient.GetQueueReference(_queueName);
await queue.CreateIfNotExistsAsync(token);
var runningTasks = new ConcurrentDictionary<int, Task>();
while (!token.IsCancellationRequested)
{
Debug.WriteLine("inLoop");
// The default timeout is 90 seconds, so we won’t continuously poll the queue if there are no messages.
// Pass in a cancellation token, because the operation can be long-running.
CloudQueueMessage message = await queue.GetMessageAsync(token);
if (message != null)
{
var t = ProcessMessage(message, queue, token);
var c = t.ContinueWith(z => RemoveRunningTask(t.Id, runningTasks));
while (true)
{
if (runningTasks.TryAdd(t.Id, t))
break;
await Task.Delay(25);
}
}
else
{
try
{
await Task.Delay(500, token);
}
catch (Exception ex)
{
Debug.WriteLine(ex.Message);
}
}
}
while (!runningTasks.IsEmpty)
{
Debug.WriteLine("Waiting for running tasks");
await Task.Delay(500);
}
Debug.WriteLine("All tasks have finished, exiting ProcessMessagesAsync.");
reset.Set();
}
public override void OnStop()
{
Debug.WriteLine("OnStop_Begin");
tokenSource.Cancel();
tokenSource.Token.WaitHandle.WaitOne();
_reset.Wait();
base.OnStop();
Debug.WriteLine("Onstop_End");
tokenSource.Dispose();
}
I'm writing a Windows Service that will kick off multiple worker threads that will listen to Amazon SQS queues and process messages. There will be about 20 threads listening to 10 queues.
The threads will have to be always running and that's why I'm leaning towards to actually using actual threads for the worker loops rather than threadpool threads.
Here is a top level implementation. Windows service will kick off multiple worker threads and each will listen to it's queue and process messages.
protected override void OnStart(string[] args)
{
for (int i = 0; i < _workers; i++)
{
new Thread(RunWorker).Start();
}
}
Here is the implementation of the work
public async void RunWorker()
{
while(true)
{
// .. get message from amazon sqs sync.. about 20ms
var message = sqsClient.ReceiveMessage();
try
{
await PerformWebRequestAsync(message);
await InsertIntoDbAsync(message);
}
catch(SomeExeception)
{
// ... log
//continue to retry
continue;
}
sqsClient.DeleteMessage();
}
}
I know I can perform the same operation with Task.Run and execute it on the threadpool thread rather than starting individual thread, but I don't see a reason for that since each thread will always be running.
Do you see any problems with this implementation? How reliable would it be to leave threads always running in this fashion and what can I do to make sure that each thread is always running?
One problem with your existing solution is that you call your RunWorker in a fire-and-forget manner, albeit on a new thread (i.e., new Thread(RunWorker).Start()).
RunWorker is an async method, it will return to the caller when the execution point hits the first await (i.e. await PerformWebRequestAsync(message)). If PerformWebRequestAsync returns a pending task, RunWorker returns and the new thread you just started terminates.
I don't think you need a new thread here at all, just use AmazonSQSClient.ReceiveMessageAsync and await its result. Another thing is that you shouldn't be using async void methods unless you really don't care about tracking the state of the asynchronous task. Use async Task instead.
Your code might look like this:
List<Task> _workers = new List<Task>();
CancellationTokenSource _cts = new CancellationTokenSource();
protected override void OnStart(string[] args)
{
for (int i = 0; i < _MAX_WORKERS; i++)
{
_workers.Add(RunWorkerAsync(_cts.Token));
}
}
public async Task RunWorkerAsync(CancellationToken token)
{
while(true)
{
token.ThrowIfCancellationRequested();
// .. get message from amazon sqs sync.. about 20ms
var message = await sqsClient.ReceiveMessageAsync().ConfigureAwait(false);
try
{
await PerformWebRequestAsync(message);
await InsertIntoDbAsync(message);
}
catch(SomeExeception)
{
// ... log
//continue to retry
continue;
}
sqsClient.DeleteMessage();
}
}
Now, to stop all pending workers, you could simple do this (from the main "request dispatcher" thread):
_cts.Cancel();
try
{
Task.WaitAll(_workers.ToArray());
}
catch (AggregateException ex)
{
ex.Handle(inner => inner is OperationCanceledException);
}
Note, ConfigureAwait(false) is optional for Windows Service, because there's no synchronization context on the initial thread, by default. However, I'd keep it that way to make the code independent of the execution environment (for cases where there is synchronization context).
Finally, if for some reason you cannot use ReceiveMessageAsync, or you need to call another blocking API, or simply do a piece of CPU intensive work at the beginning of RunWorkerAsync, just wrap it with Task.Run (as opposed to wrapping the whole RunWorkerAsync):
var message = await Task.Run(
() => sqsClient.ReceiveMessage()).ConfigureAwait(false);
Well, for one I'd use a CancellationTokenSource instantiated in the service and passed down to the workers. Your while statement would become:
while(!cancellationTokenSource.IsCancellationRequested)
{
//rest of the code
}
This way you can cancel all your workers from the OnStop service method.
Additionally, you should watch for:
If you're playing with thread states from outside of the thread, then a ThreadStateException, or ThreadInterruptedException or one of the others might be thrown. So, you want to handle a proper thread restart.
Do the workers need to run without pause in-between iterations? I would throw in a sleep in there (even a few ms's) just so they don't keep the CPU up for nothing.
You need to handle ThreadStartException and restart the worker, if it occurs.
Other than that there's no reason why those 10 treads can't run for as long as the service runs (days, weeks, months at a time).
I am trying to achieve the following behaviour using the Task Parallel Library:
As messages arrive I would like to process them sequentially but in groups. So when the first message arrives it should be processed immediately. If 2 messages come in while the first is being processed then they should be processed in a group of 2.
I can almost get what I want using a BatchBlock linked to an ActionBlock
var batchBlock = new BatchBlock<int>(100);
var actionBlock = new ActionBlock<int[]>(list =>
{
// do work
// now trigger
batchBlock.TriggerBatch();
});
batchBlock.LinkTo(actionBlock);
The problem with the code above is that if an item arrives after the TriggerBatch() call then it needs to wait for the batch to fill up. If I trigger batch after each post instead then the ActionBlock always receives single messages.
Instead of BatchBlock, you could use BufferBlock with a Task the receives items from it and resends them in batches to the target, according to your logic. Because you need to try to send a message containing a batch, and cancel it if another item comes in, the target block (actionBlock in your sample) has to have BoundedCapacity set to 1.
So, what you do is that you first receive something. When you have that, you start sending asynchronously and you also try to receive more items. If sending completes first, you start over. If receiving completes first, you cancel sending, add the received items to the batch, and then start both asynchronous actions again.
The actual code is a bit more complicated, because it needs to deal with some corner cases (receiving and sending complete at the same time; sending couldn't be canceled; receiving completed, because the whole was completed; exceptions):
public static ITargetBlock<T> CreateBatchingWrapper<T>(
ITargetBlock<IReadOnlyList<T>> target)
{
// target should have BoundedCapacity == 1,
// but there is no way to check for that
var source = new BufferBlock<T>();
Task.Run(() => BatchItems(source, target));
return source;
}
private static async Task BatchItems<T>(
IReceivableSourceBlock<T> source, ITargetBlock<IReadOnlyList<T>> target)
{
try
{
while (true)
{
var messages = new List<T>();
// wait for first message in batch
if (!await source.OutputAvailableAsync())
{
// source was completed, complete target and return
target.Complete();
return;
}
// receive all there is right now
source.ReceiveAllInto(messages);
// try sending what we've got
var sendCancellation = new CancellationTokenSource();
var sendTask = target.SendAsync(messages, sendCancellation.Token);
var outputAvailableTask = source.OutputAvailableAsync();
while (true)
{
await Task.WhenAny(sendTask, outputAvailableTask);
// got another message, try cancelling send
if (outputAvailableTask.IsCompleted
&& outputAvailableTask.Result)
{
sendCancellation.Cancel();
// cancellation wasn't successful
// and the message was received, start another batch
if (!await sendTask.EnsureCancelled() && sendTask.Result)
break;
// send was cancelled, receive messages
source.ReceiveAllInto(messages);
// restart both Tasks
sendCancellation = new CancellationTokenSource();
sendTask = target.SendAsync(
messages, sendCancellation.Token);
outputAvailableTask = source.OutputAvailableAsync();
}
else
{
// we get here in three situations:
// 1. send was completed succesfully
// 2. send failed
// 3. input has completed
// in cases 2 and 3, this await is necessary
// in case 1, it's harmless
await sendTask;
break;
}
}
}
}
catch (Exception e)
{
source.Fault(e);
target.Fault(e);
}
}
/// <summary>
/// Returns a Task that completes when the given Task completes.
/// The Result is true if the Task was cancelled,
/// and false if it completed successfully.
/// If the Task was faulted, the returned Task is faulted too.
/// </summary>
public static Task<bool> EnsureCancelled(this Task task)
{
return task.ContinueWith(t =>
{
if (t.IsCanceled)
return true;
if (t.IsFaulted)
{
// rethrow the exception
ExceptionDispatchInfo.Capture(task.Exception.InnerException)
.Throw();
}
// completed successfully
return false;
});
}
public static void ReceiveAllInto<T>(
this IReceivableSourceBlock<T> source, List<T> targetCollection)
{
// TryReceiveAll would be best suited for this, except it's bugged
// (see http://connect.microsoft.com/VisualStudio/feedback/details/785185)
T item;
while (source.TryReceive(out item))
targetCollection.Add(item);
}
You can also use Timer; which will Trigger Batch on every 10 seconds