I am trying to achieve the following behaviour using the Task Parallel Library:
As messages arrive I would like to process them sequentially but in groups. So when the first message arrives it should be processed immediately. If 2 messages come in while the first is being processed then they should be processed in a group of 2.
I can almost get what I want using a BatchBlock linked to an ActionBlock:
var batchBlock = new BatchBlock<int>(100);
var actionBlock = new ActionBlock<int[]>(list =>
{
// do work
// now trigger
batchBlock.TriggerBatch();
});
batchBlock.LinkTo(actionBlock);
The problem with the code above is that if an item arrives after the TriggerBatch() call then it needs to wait for the batch to fill up. If I trigger batch after each post instead then the ActionBlock always receives single messages.
Instead of BatchBlock, you could use a BufferBlock with a Task that receives items from it and resends them in batches to the target, according to your logic. Because you need to try to send a message containing a batch, and cancel it if another item comes in, the target block (actionBlock in your sample) has to have BoundedCapacity set to 1.
So, what you do is that you first receive something. When you have that, you start sending asynchronously and you also try to receive more items. If sending completes first, you start over. If receiving completes first, you cancel sending, add the received items to the batch, and then start both asynchronous actions again.
The actual code is a bit more complicated, because it needs to deal with some corner cases (receiving and sending complete at the same time; sending couldn't be canceled; receiving completed because the whole source was completed; exceptions):
public static ITargetBlock<T> CreateBatchingWrapper<T>(
ITargetBlock<IReadOnlyList<T>> target)
{
// target should have BoundedCapacity == 1,
// but there is no way to check for that
var source = new BufferBlock<T>();
Task.Run(() => BatchItems(source, target));
return source;
}
private static async Task BatchItems<T>(
IReceivableSourceBlock<T> source, ITargetBlock<IReadOnlyList<T>> target)
{
try
{
while (true)
{
var messages = new List<T>();
// wait for first message in batch
if (!await source.OutputAvailableAsync())
{
// source was completed, complete target and return
target.Complete();
return;
}
// receive all there is right now
source.ReceiveAllInto(messages);
// try sending what we've got
var sendCancellation = new CancellationTokenSource();
var sendTask = target.SendAsync(messages, sendCancellation.Token);
var outputAvailableTask = source.OutputAvailableAsync();
while (true)
{
await Task.WhenAny(sendTask, outputAvailableTask);
// got another message, try cancelling send
if (outputAvailableTask.IsCompleted
&& outputAvailableTask.Result)
{
sendCancellation.Cancel();
// cancellation wasn't successful
// and the message was received, start another batch
if (!await sendTask.EnsureCancelled() && sendTask.Result)
break;
// send was cancelled, receive messages
source.ReceiveAllInto(messages);
// restart both Tasks
sendCancellation = new CancellationTokenSource();
sendTask = target.SendAsync(
messages, sendCancellation.Token);
outputAvailableTask = source.OutputAvailableAsync();
}
else
{
// we get here in three situations:
// 1. send was completed successfully
// 2. send failed
// 3. input has completed
// in cases 2 and 3, this await is necessary
// in case 1, it's harmless
await sendTask;
break;
}
}
}
}
catch (Exception e)
{
source.Fault(e);
target.Fault(e);
}
}
/// <summary>
/// Returns a Task that completes when the given Task completes.
/// The Result is true if the Task was cancelled,
/// and false if it completed successfully.
/// If the Task was faulted, the returned Task is faulted too.
/// </summary>
public static Task<bool> EnsureCancelled(this Task task)
{
return task.ContinueWith(t =>
{
if (t.IsCanceled)
return true;
if (t.IsFaulted)
{
// rethrow the exception
ExceptionDispatchInfo.Capture(task.Exception.InnerException)
.Throw();
}
// completed successfully
return false;
});
}
public static void ReceiveAllInto<T>(
this IReceivableSourceBlock<T> source, List<T> targetCollection)
{
// TryReceiveAll would be best suited for this, except it's bugged
// (see http://connect.microsoft.com/VisualStudio/feedback/details/785185)
T item;
while (source.TryReceive(out item))
targetCollection.Add(item);
}
You can also use a Timer that calls TriggerBatch() at a fixed interval, for example every 10 seconds.
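The timer idea can be sketched roughly as follows. This is a minimal sketch, not a drop-in solution: the class and method names are made up for illustration, the interval is shortened to 200 ms so the demo finishes quickly, and it assumes the System.Threading.Tasks.Dataflow assembly is available (via the NuGet package or the .NET shared framework). Note that a timer adds up to one interval of latency, so it does not give the "process the first message immediately" behaviour asked for in the question.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public static class TimedBatchDemo
{
    // Hypothetical helper: posts itemCount items and returns the first batch that comes out.
    public static async Task<int[]> RunAsync(int itemCount, TimeSpan flushInterval)
    {
        var batchBlock = new BatchBlock<int>(100);

        // Each time the timer fires, release whatever has accumulated so far.
        // TriggerBatch does nothing when the block holds no items.
        using var timer = new Timer(_ => batchBlock.TriggerBatch(), null, flushInterval, flushInterval);

        for (int i = 0; i < itemCount; i++)
            batchBlock.Post(i);

        // With fewer than 100 items posted, only the timer can release this batch.
        return await batchBlock.ReceiveAsync();
    }

    public static async Task Main()
    {
        int[] batch = await RunAsync(3, TimeSpan.FromMilliseconds(200));
        Console.WriteLine($"Received a batch of {batch.Length} item(s)");
    }
}
```

In a real pipeline the BatchBlock would be linked to the ActionBlock as in the question, and the timer would live as long as the pipeline does.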
Problem: I have a subscription to a never-ending messaging service. My code needs to check whether any message satisfies a condition. If one does, it should close the subscription before all the messages are processed and return true. If all the messages have been processed and the condition was never satisfied, it should close the subscription and return false.
For example, the condition is foo = 5:
message dataset early success :
msg1: foo=1
msg2: foo=2
msg3: foo=5 <= condition satisfied, return true and stop processing
msg4: foo=6
message dataset failure :
msg1: foo=1
msg2: foo=2
msg3: foo=3
msg4: foo=4 <= no more messages, return false and stop processing
The subscription I use has a synchronous method that I have to pass an async EventHandler.
Here is my functioning code that works for both scenarios. lastMessageReceivedDateTime tracks when a message was last received (to identify the end of the messages) and _conditionSatisfied tells me whether I've got my data:
private DateTime lastMessageReceivedDateTime;
private bool _conditionSatisfied;
public Task<bool> CheckSubscription(IThirdParyCode connection)
{
var subscription = connection.Subscribe(async (obj, args) =>
{
lastMessageReceivedDateTime = DateTime.Now;
if(args.Message.foo == 5)
{
_conditionSatisfied = true;
}
});
while (lastMessageReceivedDateTime.AddSeconds(1) > DateTime.Now && !_conditionSatisfied)
{
Thread.Sleep(500);
}
subscription?.Unsubscribe();
return Task.FromResult(_conditionSatisfied); // the method is not async, so wrap the flag in a completed Task
}
This works, but I wanted to know if there was a better solution.
Note: I can't simply await the async method, as it never returns/completes until I unsubscribe.
More info: The type of the connection is an IStanConnection (from NATS), and the signature of Subscribe is:
IStanSubscription Subscribe(string subject, StanSubscriptionOptions options,
EventHandler<StanMsgHandlerArgs> handler);
I simplified the signature to focus on the code I had an issue with.
Based on your code example I can assume that the message stream ends if there were no new messages within a second of the last message.
Your solution can be modified to eliminate the active waiting loop and replace it with a single await call. It would be based on two tasks:
The first task tracks successful completion (_conditionSatisfied in your example) and is completed by TaskCompletionSource.SetResult.
The second task signals the end of the stream by combining a CancellationToken task wrapper (example implementation of such a wrapper) with CancellationTokenSource.CancelAfter, which reschedules the cancellation after every message. This replaces the lastMessageReceivedDateTime.AddSeconds(1) > DateTime.Now condition.
Modified code should look like this:
private CancellationTokenSource streamEndCancellation = new CancellationTokenSource();
private TaskCompletionSource<bool> satisfiedCompletionSource = new TaskCompletionSource<bool>();
public async Task<bool> CheckSubscription(IThirdParyCode connection)
{
// CancellationTokenTaskSource is in third-party library and not part of .NET
var streamEndSource = new CancellationTokenTaskSource<bool>(streamEndCancellation.Token);
var subscription = connection.Subscribe(async (obj, args) =>
{
lastMessageReceivedDateTime = DateTime.Now;
if(args.Message.foo == 5)
{
satisfiedCompletionSource.SetResult(true);
}
streamEndCancellation.CancelAfter(1000);
});
Task<bool> actualTask = await Task.WhenAny<bool>(satisfiedCompletionSource.Task, streamEndSource.Task);
subscription?.Unsubscribe();
return !actualTask.IsCanceled;
}
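If pulling in a third-party library for the token wrapper is not desirable, the same idea can be sketched with only TaskCompletionSource and CancellationToken.Register. Everything named here (IdleTimeoutDemo, AsTask, CheckAsync) is hypothetical, and a plain array stands in for the subscription handler so that the example runs on its own. The sketch also arms the idle timer once before any message arrives, which is an addition of this sketch so that an empty stream ends as well.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class IdleTimeoutDemo
{
    // Hypothetical helper: a Task that ends in the Canceled state once the token is cancelled.
    public static Task<bool> AsTask(CancellationToken token)
    {
        var tcs = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        token.Register(() => tcs.TrySetCanceled(token));
        return tcs.Task;
    }

    // Returns true if 'wanted' shows up, false if the "stream" goes idle first.
    public static async Task<bool> CheckAsync(int[] values, int wanted, int idleMilliseconds)
    {
        using var streamEnd = new CancellationTokenSource();
        var satisfied = new TaskCompletionSource<bool>(TaskCreationOptions.RunContinuationsAsynchronously);
        Task<bool> idle = AsTask(streamEnd.Token);
        streamEnd.CancelAfter(idleMilliseconds);          // covers the case of no messages at all

        foreach (int value in values)                     // stands in for the message handler
        {
            if (value == wanted)
                satisfied.TrySetResult(true);             // Try*: safe if several messages match
            streamEnd.CancelAfter(idleMilliseconds);      // every message pushes the deadline back
        }

        Task<bool> first = await Task.WhenAny(satisfied.Task, idle);
        return !first.IsCanceled;
    }

    public static async Task Main()
    {
        Console.WriteLine(await CheckAsync(new[] { 1, 2, 5, 6 }, 5, 100)); // True
        Console.WriteLine(await CheckAsync(new[] { 1, 2, 3, 4 }, 5, 100)); // False
    }
}
```

In the real handler, the loop body would be the code inside the Subscribe callback, and Unsubscribe would be called right after the WhenAny.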
I'm trying to create an AWS SQS Windows service consumer that polls messages in batches of 10. Each message will be executed in its own task for parallel execution. Message processing includes calling different APIs and sending email, so it might take some time.
My problem is that, first, I only want to poll the queue when 10 messages can be processed immediately. This is because of the SQS visibility timeout: received messages that have to "wait" might exceed the visibility timeout and be put "back" on the queue, which produces duplicates. I don't think tweaking the visibility timeout is a good fix, because there is still a chance that messages will be duplicated, and that's what I'm trying to avoid. Second, I want some sort of limit on parallelism (e.g. a maximum of 100 concurrent tasks), so that server resources are kept in check, since other apps are also running on the server.
How can I achieve this? Or is there any other way to remedy these problems?
This answer makes the following assumptions:
Fetching messages from the AWS should be serialized. Only the processing of messages should be parallelized.
Every message fetched from the AWS should be processed. The whole execution should not terminate before all fetched messages have a chance to be processed.
Every message-processing operation should be awaited. The whole execution should not terminate before the completion of all started tasks.
Any error that occurs during the processing of a message should be ignored. The whole execution should not terminate because the processing of a single message failed.
Any error that occurs during the fetching of messages from the AWS should be fatal. The whole execution should terminate, but not before all currently running message-processing operations have completed.
The execution mechanism should be able to handle the case that a fetch-from-the-AWS operation returned a batch having a different number of messages than the requested number.
Below is an implementation that (hopefully) satisfies these requirements:
/// <summary>
/// Starts an execution loop that fetches batches of messages sequentially,
/// and processes them one by one in parallel.
/// </summary>
public static async Task ExecutionLoopAsync<TMessage>(
Func<int, Task<TMessage[]>> fetchMessagesAsync,
Func<TMessage, Task> processMessageAsync,
int fetchCount,
int maxDegreeOfParallelism,
CancellationToken cancellationToken = default)
{
// Arguments validation omitted
var semaphore = new SemaphoreSlim(maxDegreeOfParallelism, maxDegreeOfParallelism);
// Count how many times we have acquired the semaphore, so that we know
// how many more times we have to acquire it before we exit from this method.
int acquiredCount = 0;
try
{
while (true)
{
Debug.Assert(acquiredCount == 0);
for (int i = 0; i < fetchCount; i++)
{
await semaphore.WaitAsync(cancellationToken);
acquiredCount++;
}
TMessage[] messages = await fetchMessagesAsync(fetchCount)
?? Array.Empty<TMessage>();
for (int i = 0; i < messages.Length; i++)
{
if (i >= fetchCount) // We got more messages than we asked for
{
await semaphore.WaitAsync();
acquiredCount++;
}
ProcessAndRelease(messages[i]);
acquiredCount--;
}
if (messages.Length < fetchCount)
{
// We got fewer messages than we asked for
semaphore.Release(fetchCount - messages.Length);
acquiredCount -= fetchCount - messages.Length;
}
// This method is 'async void' because it is not expected to throw ever
async void ProcessAndRelease(TMessage message)
{
try { await processMessageAsync(message); }
catch { } // Swallow exceptions
finally { semaphore.Release(); }
}
}
}
catch (SemaphoreFullException)
{
// Guard against the (unlikely) scenario that the counting logic is flawed.
// The counter is no longer reliable, so skip the awaiting in finally.
acquiredCount = maxDegreeOfParallelism;
throw;
}
finally
{
// Wait for all pending operations to complete. This could cause a deadlock
// in case the counter has become out of sync.
for (int i = acquiredCount; i < maxDegreeOfParallelism; i++)
await semaphore.WaitAsync();
}
}
Usage example:
var cts = new CancellationTokenSource();
Task executionTask = ExecutionLoopAsync<Message>(async count =>
{
return await GetBatchFromAwsAsync(count);
}, async message =>
{
await ProcessMessageAsync(message);
}, fetchCount: 10, maxDegreeOfParallelism: 100, cts.Token);
I have written a method that uses a TPL ActionBlock to find the XPath of each element I post to the block. The elements come from a real-time application: whenever the user finds an element, it is posted to the block. When a save button is clicked in another class, I want to check whether the block has finished. If the ActionBlock is complete when the save button is clicked, the element is saved using some save logic; otherwise a message box says it is not ready yet. This works for the first element, but from the second element onwards the ActionBlock does not accept anything. I am using block.Complete() and block.Completion.Wait() for the save check. Here is how I am calling the code:
Once an element is created, it is posted to the ActionBlock:
Class1
......
jobQ.Enqueue(familarizedElement);
......
Inside the ActionBlock
Class2
public class TPLCommonDataFlow
{
private ActionBlock<Element> _jobs;
public static bool TPLFlowStatus = false;
public static string JobStatus = string.Empty;
public TPLCommonDataFlow()
{
var executionDataflowBlockOptions = new ExecutionDataflowBlockOptions()
{
MaxDegreeOfParallelism = 2,
};
_jobs = new ActionBlock<Element>((job) =>
{
Thread.Sleep(5);
File.WriteAllText("C:\\Temp\\A.txt", "Started" + _jobs.InputCount.ToString());
JobStatus = "started";
Familiarization.FindAndUpdateXPath(job);
File.WriteAllText("C:\\Temp\\A.txt", "Finished");
}, executionDataflowBlockOptions);
_jobs.Complete();
//Wait for all messages to propagate through the network
_jobs.Completion.Wait();
//Check whether all jobs are completed; if so, change the bool value.
if (_jobs.InputCount == 0)
{
TPLFlowStatus = true;
JobStatus = "stoped";
}
}
}
For the save I am checking the TPLFlowStatus boolean
Class3
if (class2.TPLFlowStatus == true)
{
//Save Logic
}
else
{
//Message Box showing Not Ready
}
What I want is to check, each time save is requested, whether the job has completed for every element. If the block has two elements in its queue and only one has finished, the MessageBox should pop up when the save button is pressed. If everything in the block has completed, the save logic should run.
The issue you're seeing is that once a block is completed it no longer accepts new messages: you've told the block you're done sending messages. To start sending a new batch of messages, you can either keep the block alive by not calling Complete and track batch completion another way, or you can just reset the block. A simple worker like this might be what you're looking for:
public class DataflowWorker
{
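private ActionBlock<string> _jobs;
public DataflowWorker()
{
// Build the first block up front so that Enque never sees a null _jobs
BuildJobsHandler();
}
private void BuildJobsHandler()
{
_jobs = new ActionBlock<string>(x => Console.WriteLine(x) /*Do your work in the action block*/);
}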
public async Task Enque(string element)
{
await _jobs.SendAsync(element);
}
public async Task CompleteJobsAndSave()
{
_jobs.Complete();
await _jobs.Completion;
//Save data when complete
BuildJobsHandler();
}
}
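The other option mentioned above, keeping the block alive and tracking completion another way, could look roughly like this. It is only a sketch under assumptions: PendingTrackingWorker, IsIdle and Enqueue are made-up names, the work delegate stands in for the XPath lookup, and the Dataflow assembly is assumed to be available. The block is never completed, so it keeps accepting elements, and the save button would check IsIdle instead of TPLFlowStatus.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

public class PendingTrackingWorker
{
    private readonly ActionBlock<string> _jobs;
    private int _pending; // elements posted but not yet finished

    public PendingTrackingWorker(Action<string> work)
    {
        _jobs = new ActionBlock<string>(item =>
        {
            // Note: an exception thrown by work still faults the block;
            // the finally only keeps the counter accurate.
            try { work(item); }
            finally { Interlocked.Decrement(ref _pending); }
        });
    }

    // True when every element posted so far has been processed.
    public bool IsIdle => Volatile.Read(ref _pending) == 0;

    public bool Enqueue(string element)
    {
        Interlocked.Increment(ref _pending);   // count first, so IsIdle can never be true too early
        if (_jobs.Post(element)) return true;
        Interlocked.Decrement(ref _pending);   // the block declined the element
        return false;
    }

    public static async Task Main()
    {
        var worker = new PendingTrackingWorker(_ => Thread.Sleep(200)); // simulated slow job
        worker.Enqueue("element-1");
        Console.WriteLine($"Idle right after posting: {worker.IsIdle}");
        await Task.Delay(1000);
        Console.WriteLine($"Idle one second later: {worker.IsIdle}");
    }
}
```

The counter is incremented before Post so that a save check racing with the post cannot observe an idle worker while an element is on its way into the block.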
I am working on a Web API project in which a user performs some action and all related users get a notification about that activity. To notify every user I start a new thread that performs the desired action. Is it necessary to wait for this thread to terminate before the request completes and returns a result to the user?
P.S. The execution time of the thread may increase with the number of users.
Please suggest an alternative if possible.
Program logic (at present I use await to wait for the async function to finish):
public async Task<IHttpActionResult> doSomething(arguments)
{
.
.
.
.
// Perform some operation which includes some database transactions
if(operation succeeded)
{
await Notification(userid);
}
return result;
}
static void Main(string[] args)
{
var userIds = new[] { 1, 2, 3, 4, 5};
Console.WriteLine("Updating db for users...");
// Start the notification send-out on a thread-pool thread
Task.Run(() =>
{
foreach (var i in userIds)
Console.WriteLine("Sending notification for #user " + i);
}).ContinueWith(t => Console.WriteLine("Notification all sent!"));
Console.WriteLine("Return the result before notification all sent out!");
}
If you remove the await in front of Task.Run() (equivalent to Notification(), which returns a Task in your case), the notification send-out runs on a separate thread-pool thread and the caller returns without waiting for it.
public async Task<IHttpActionResult> doSomething(arguments)
{
bool isInsertDone = false, isUpdateDone = false;
// create the task list
var task = new List<Task>();
// add the parallel tasks to the list; Task.Run starts each one immediately
task.Add(Task.Run(() =>
{
isInsertDone = insertData(arguments);
}));
task.Add(Task.Run(() =>
{
isUpdateDone = updateData(arguments);
}));
// wait for all of the above tasks without blocking the request thread
await Task.WhenAll(task);
// return the response based on the results of insert and update
return Ok<bool>(isInsertDone && isUpdateDone);
}
If it is going to be a long running function and there is no direct impact on the current function then there is no need to wait. Fire and forget. You can safely remove the await.
public async Task<IHttpActionResult> doSomething(arguments) {
//... Perform some operation which includes some async database transactions
if(operation succeeded) {
NotificationsAsync(userid); //Start notifications and continue
}
return result;
}
I would suggest using a messaging queue for jobs like that but that is a more advanced topic which is out of scope for this question.
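One caveat with dropping the await is that an exception thrown by the notification code is never observed by the caller. A minimal sketch of a fire-and-forget call that still logs failures is shown below. ObserveAsync and NotifyAsync are hypothetical names, the failing notification is simulated, and the sketch does not address whether the host keeps the process alive long enough for the background work to finish.

```csharp
using System;
using System.Threading.Tasks;

public static class FireAndForgetDemo
{
    // Hypothetical helper: awaits an already-started task and routes any exception to onError.
    public static async Task ObserveAsync(Task task, Action<Exception> onError)
    {
        try { await task.ConfigureAwait(false); }
        catch (Exception ex) { onError(ex); }
    }

    // Stand-in for the real notification method; it fails on purpose.
    public static async Task NotifyAsync(int userId)
    {
        await Task.Delay(50);
        throw new InvalidOperationException($"notification for user {userId} failed");
    }

    public static async Task Main()
    {
        // Start the notification and keep going without waiting for it.
        Task observed = ObserveAsync(NotifyAsync(42), ex => Console.WriteLine("Logged: " + ex.Message));
        Console.WriteLine("Result returned to the caller");

        // Awaited here only so that the demo process does not exit before the log line appears.
        await observed;
    }
}
// Prints:
// Result returned to the caller
// Logged: notification for user 42 failed
```

In the controller, the call would be `_ = ObserveAsync(NotificationsAsync(userid), log);` with whatever logging delegate the project already has.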
I am trying to implement the following Use Case. I have an Azure Worker Role that will monitor the Azure Storage Queue, and when a message comes in, this will trigger a job to run Asynchronously. I want to use the TPL if possible, and need the operations to support cancellation, so that when the Azure Role OnStop fires, jobs can exit gracefully if possible. The MyFixIt example posted by Scott Guthrie is almost exactly what I need, and I have used this as the template for my project. The one critical aspect not supported is the requirement to run the jobs asynchronously. In the FixIt code, once a job is launched, no other jobs will process until that one finishes. Some of the jobs my application will process are long running, and I need the worker role to be able to notice other incoming jobs and run those while the long running job is running.
The 2 key methods here are ProcessMessagesAsync, which monitors the queue, and ProcessMessage, which will run the job when a message comes in. Here is what I have, and it mostly works except it does not handle the CancellationRequest properly, and the Azure Worker Role will shut down without waiting for jobs to complete.
/// <summary>
/// Continuous loop that monitors the queue and launches jobs when they are retrieved.
/// </summary>
/// <param name="token"></param>
/// <returns></returns>
public virtual async Task ProcessMessagesAsync(CancellationToken token)
{
CloudQueue queue = _queueClient.GetQueueReference(_queueName);
await queue.CreateIfNotExistsAsync(token);
while (!token.IsCancellationRequested)
{
Debug.WriteLine("inLoop");
// The default timeout is 90 seconds, so we won’t continuously poll the queue if there are no messages.
// Pass in a cancellation token, because the operation can be long-running.
CloudQueueMessage message = await queue.GetMessageAsync(token);
if (message != null)
{
ProcessMessage(message, queue, token);
}
else
{
await Task.Delay(500, token);
}
}
}
protected virtual async Task ProcessMessage(CloudQueueMessage message, CloudQueue queue, CancellationToken token)
{
var jobDetails = JobDetails.DeserializeJson(message.AsString);
var result = await _jobRunner.RunJob(jobDetails, token);
//todo handle error
//if (result.Status == JobStatus.Error)
await queue.DeleteMessageAsync(message);
}
Then the JobRunner runs the job requested. I have written a TestJob in which I am trying to simulate a long running job that can notice the CancellationRequest, and after a short cleanup period, exit the job early.
public virtual async Task<JobResult> RunJob(JobDetails jobDetails, CancellationToken token)
{
switch (jobDetails.JobName.ToLower())
{
case "testjob":
return await TestJob(jobDetails.Args, token);
}
return new JobResult(JobStatus.Error) { ErrorMessage = "The job requested does not exist." };
}
protected virtual async Task<JobResult> TestJob(List<string> jobArgs, CancellationToken token)
{
var message = "no message";
if (jobArgs != null && jobArgs.Any())
message = jobArgs[0];
return await Task.Run(async () =>
{
Debug.WriteLine(string.Format("Start:{0}", message));
for (int i = 1; i <= 800; i++)
{
if (token.IsCancellationRequested)
{
Debug.WriteLine("CancellationRequest in TestJob");
//simulate short time to cleanup and exit early
Thread.Sleep(1500);
Debug.WriteLine("Cancellation Job Cleanup finished.");
token.ThrowIfCancellationRequested();
}
Thread.Sleep(10);
}
Debug.WriteLine(string.Format("Finish:{0}", message));
return new JobResult(JobStatus.Success);
});
}
I have been searching and researching for 2 days now, including the TPL DataFlow library, and have not yet been able to come up with a way to make this work properly. I feel like the Call to ProcessMessage(message, queue, token) is not being done correctly, there even is a compiler warning 'Because this call is not awaited...'. But I DON'T want to await (which is what the FixIt example does), because then no other jobs get noticed until the running one is finished. This seems like it would not be an uncommon use case, though I cannot seem to find anyone describing it.
Thank you in advance for any help!
Danny Green
The reason this is happening is that you are not observing the task returned from ProcessMessage. Because of this, ProcessMessagesAsync can finish before ProcessMessage gracefully completes or cancels. Keeping in mind that you don't want to await ProcessMessage, because that would make message processing sequential, I would suggest that you keep a list of running tasks.
In other words, create a List in ProcessMessagesAsync and add the task returned from ProcessMessage to this list. Then, after the while loop, loop through this list and wait for all pending tasks to complete or cancel once the token has been cancelled.
Sorry I don't have VS handy, but I hope you get the point.
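The suggestion above can be sketched as follows. This is a simplified, self-contained sketch rather than the worker role code: RunningTasksSketch, JobAsync and PumpAsync are made-up stand-ins for the class, ProcessMessage and ProcessMessagesAsync, the queue is replaced by a fixed number of jobs, and each job simulates a short cleanup after noticing the cancellation.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class RunningTasksSketch
{
    // Stand-in for ProcessMessage: works until cancelled, then takes a moment to clean up.
    private static async Task JobAsync(int id, CancellationToken token, Action<string> log)
    {
        try { await Task.Delay(Timeout.Infinite, token); }
        catch (OperationCanceledException) { }
        await Task.Delay(100);                         // simulated cleanup
        log($"job {id} cleaned up");
    }

    // Stand-in for ProcessMessagesAsync: starts jobs without awaiting them, but remembers them.
    public static async Task PumpAsync(int jobsToStart, CancellationToken token, Action<string> log)
    {
        var running = new List<Task>();
        for (int id = 0; id < jobsToStart; id++)
        {
            running.RemoveAll(t => t.IsCompleted);     // drop finished jobs so the list stays small
            running.Add(JobAsync(id, token, log));     // start the job, do not await it here
        }

        try { await Task.Delay(Timeout.Infinite, token); }   // stands in for the polling loop
        catch (OperationCanceledException) { }

        await Task.WhenAll(running);                   // do not return until every job has exited
        log("pump finished");
    }

    public static void Main()
    {
        using var cts = new CancellationTokenSource();
        Task pump = PumpAsync(3, cts.Token, Console.WriteLine);
        cts.Cancel();
        pump.Wait();                                   // what OnStop would block on
    }
}
```

Because the list is only touched from the pump method itself, a plain List<Task> is enough here and no concurrent collection is needed. The three "cleaned up" lines may appear in any order, but "pump finished" always comes last.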
Thank you, Sanjay. Based on your suggestion, I have come up with the following:
/// <summary>
/// Continuous loop that monitors the queue and launches jobs when they are retrieved.
/// </summary>
/// <param name="token"></param>
/// <returns></returns>
public virtual async Task ProcessMessagesAsync(CancellationToken token)
{
CloudQueue queue = _queueClient.GetQueueReference(_queueName);
await queue.CreateIfNotExistsAsync(token);
var runningTasks = new ConcurrentDictionary<int, Task>();
while (!token.IsCancellationRequested)
{
Debug.WriteLine("inLoop");
// The default timeout is 90 seconds, so we won’t continuously poll the queue if there are no messages.
// Pass in a cancellation token, because the operation can be long-running.
CloudQueueMessage message = await queue.GetMessageAsync(token);
if (message != null)
{
var t = ProcessMessage(message, queue, token);
var c = t.ContinueWith(z => RemoveRunningTask(t.Id, runningTasks));
while (true)
{
if (runningTasks.TryAdd(t.Id, t))
break;
await Task.Delay(25);
}
}
else
{
try
{
await Task.Delay(500, token);
}
catch (Exception ex)
{
Debug.WriteLine(ex.Message);
}
}
}
while (!runningTasks.IsEmpty)
{
Debug.WriteLine("Waiting for running tasks");
await Task.Delay(500);
}
}
private static void RemoveRunningTask(int id, ConcurrentDictionary<int, Task> runningTasks)
{
while (true)
{
Task outTask;
if (runningTasks.TryRemove(id, out outTask))
break;
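Thread.Sleep(25); // this method is synchronous, so pause the thread; an un-awaited Task.Delay would not wait at all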
}
}
This seems to work, though I feel it is a little clumsy. I started out coding the 'ContinueWith' like this, but was surprised that the incoming task had a different Id value (I expected it to be the same Task):
var task = ProcessMessage(message, queue, token).ContinueWith(x =>
{
while (true)
{
Task outTask;
if (runningTasks.TryRemove(x.Id, out outTask))
break;
Task.Delay(25);
}
});
UPDATE:
It turns out that this still does not quite work; I somehow misread the results when testing earlier. Based on the MyFixIt example, in the Worker Role OnStop I have the following code:
public override void OnStop()
{
Debug.WriteLine("OnStop_Begin");
tokenSource.Cancel();
tokenSource.Token.WaitHandle.WaitOne();
base.OnStop();
Debug.WriteLine("Onstop_End");
tokenSource.Dispose();
}
It appears that the tokenSource.Token.WaitHandle.WaitOne isn't really able to wait until all of the tasks that have a reference to the token have finished, so the role continues and stops even when tasks are still in the processing of finishing up. Is there some way to properly use the token to signal when the cancellation is actually completed?
Thanks!
UPDATE 2
OK, I think I have a solution that is now working. It appears that CancellationToken.WaitHandle is signaled as soon as .Cancel is called, so I'm not sure what the purpose is of waiting on it immediately after .Cancel; it seems like it would always just continue straight through that code. This is how it is in the FixIt example, but I don't really understand it. For my purpose, I have changed ProcessMessagesAsync so that it is passed a ManualResetEventSlim, which it sets after all tasks have finished. Then in OnStop I wait on that before finishing the stop.
/// <summary>
/// Continuous loop that monitors the queue and launches jobs when they are retrieved.
/// </summary>
/// <param name="token"></param>
/// <returns></returns>
public virtual async Task ProcessMessagesAsync(CancellationToken token, ManualResetEventSlim reset)
{
CloudQueue queue = _queueClient.GetQueueReference(_queueName);
await queue.CreateIfNotExistsAsync(token);
var runningTasks = new ConcurrentDictionary<int, Task>();
while (!token.IsCancellationRequested)
{
Debug.WriteLine("inLoop");
// The default timeout is 90 seconds, so we won’t continuously poll the queue if there are no messages.
// Pass in a cancellation token, because the operation can be long-running.
CloudQueueMessage message = await queue.GetMessageAsync(token);
if (message != null)
{
var t = ProcessMessage(message, queue, token);
var c = t.ContinueWith(z => RemoveRunningTask(t.Id, runningTasks));
while (true)
{
if (runningTasks.TryAdd(t.Id, t))
break;
await Task.Delay(25);
}
}
else
{
try
{
await Task.Delay(500, token);
}
catch (Exception ex)
{
Debug.WriteLine(ex.Message);
}
}
}
while (!runningTasks.IsEmpty)
{
Debug.WriteLine("Waiting for running tasks");
await Task.Delay(500);
}
Debug.WriteLine("All tasks have finished, exiting ProcessMessagesAsync.");
reset.Set();
}
public override void OnStop()
{
Debug.WriteLine("OnStop_Begin");
tokenSource.Cancel();
tokenSource.Token.WaitHandle.WaitOne();
_reset.Wait();
base.OnStop();
Debug.WriteLine("Onstop_End");
tokenSource.Dispose();
}
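The observation about the wait handle can be checked with a small stand-alone program. The sketch below uses hypothetical names (WaitHandleDemo, Run) and a simulated worker with a 300 ms cleanup. It shows that the token's WaitHandle is already signaled right after Cancel while the worker is still cleaning up, and that waiting on the worker's Task is what actually blocks until the cleanup is over.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class WaitHandleDemo
{
    public static (bool handleSignaled, bool workerDoneAtThatPoint) Run()
    {
        using var cts = new CancellationTokenSource();

        // Simulated job: runs until cancelled, then needs 300 ms to clean up.
        Task worker = Task.Run(async () =>
        {
            try { await Task.Delay(Timeout.Infinite, cts.Token); }
            catch (OperationCanceledException) { }
            await Task.Delay(300);
        });

        cts.Cancel();
        bool signaled = cts.Token.WaitHandle.WaitOne(0);   // zero timeout: only polls the handle
        bool doneYet = worker.IsCompleted;                 // the cleanup cannot have finished yet

        worker.Wait();                                     // this is the wait that matters for OnStop
        return (signaled, doneYet);
    }

    public static void Main()
    {
        var (signaled, doneYet) = Run();
        Console.WriteLine($"WaitHandle signaled right after Cancel: {signaled}");   // True
        Console.WriteLine($"Worker finished at that moment: {doneYet}");            // False
    }
}
```

Following the same reasoning, OnStop could keep the Task returned by ProcessMessagesAsync and call Wait() on it, which would make the extra ManualResetEventSlim optional. That variation is a suggestion of this sketch, not something taken from the FixIt sample.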