Execute set of tasks in parallel but with a group timeout

I'm currently trying to write a status checking tool with a reliable timeout value. One approach I'd seen is to use Task.WhenAny() together with a Task.Delay; however, it doesn't seem to produce the results I expect:
public void DoIUnderstandTasksTest()
{
var checkTasks = new List<Task>();
// Create a list of dummy tasks that should just delay or "wait"
// for some multiple of the timeout
for (int i = 0; i < 10; i++)
{
checkTasks.Add(Task.Delay(_timeoutMilliseconds/2));
}
// Wrap the group of tasks in a task that will wait till they all finish
var allChecks = Task.WhenAll(checkTasks);
// I think WhenAny is supposed to return the first task that completes
bool didntTimeOut = Task.WhenAny(allChecks, Task.Delay(_timeoutMilliseconds)) == allChecks;
Assert.True(didntTimeOut);
}
What am I missing here?

I think you're confusing the workings of the When... calls with Wait....
Task.WhenAny doesn't directly return the first task to complete among those you pass to it. Rather, it returns a new Task<Task> that completes when any of the supplied tasks finishes; the task that finished first is only available as the result of that wrapper. This means your equality check will always be false, because the wrapper task is never the same object as allChecks.
The behavior you're expecting seems similar to Task.WaitAny, which will block current execution until any of the internal tasks complete, and return the index of the completed task.
Using WaitAny, your code will look like this:
// Wrap the group of tasks in a task that will wait till they all finish
var allChecks = Task.WhenAll(checkTasks);
var taskIndexThatCompleted = Task.WaitAny(allChecks, Task.Delay(_timeoutMilliseconds));
Assert.AreEqual(0, taskIndexThatCompleted);
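If blocking the calling thread is not desirable, the same check can also be written with await, which unwraps the task returned by Task.WhenAny. The following is a minimal sketch rather than the original poster's code; the class name GroupTimeout and the method name AllCompleteWithinAsync are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class GroupTimeout
{
    // Hypothetical helper: true if every task finished before the timeout elapsed.
    public static async Task<bool> AllCompleteWithinAsync(IEnumerable<Task> tasks, TimeSpan timeout)
    {
        Task all = Task.WhenAll(tasks);
        // Awaiting WhenAny yields the inner task that finished first.
        Task first = await Task.WhenAny(all, Task.Delay(timeout));
        return first == all;
    }

    public static void Main()
    {
        var checks = new List<Task>();
        for (int i = 0; i < 10; i++)
        {
            checks.Add(Task.Delay(50));
        }
        bool inTime = AllCompleteWithinAsync(checks, TimeSpan.FromSeconds(5))
            .GetAwaiter().GetResult();
        Console.WriteLine("Finished in time: " + inTime);
    }
}
```

Note that the helper only reports whether the group finished first. If one of the checks can fail, the caller would still need to await or inspect the WhenAll task to observe the exception.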

Related

C# async aggregate and dispatch

I'm having trouble assimilating the c# Task, async and await patterns.
Windows service, .NET v4.5.2 server-side.
I have a Windows service accepting a variety of sources of incoming records, arriving externally ad-hoc via a self-hosted web api. I would like to batch up these records and then forward them on to another service. If the number of batched records exceeds a threshold, the batch should be dispatched immediately. Furthermore, the batch as it stands should also be dispatched if a time interval has elapsed. This means that a record is never held for more than N seconds.
I'm struggling to fit this into a Task based async pattern.
In days gone by, I would have created a Thread, a ManualResetEvent and a System.Threading.Timer. The Thread would loop around a Wait on the reset event. The Timer would set the event when fired, as would the code doing the aggregation when the batch size exceeded the threshold. Following the Wait, the Thread would stop the Timer, do the dispatch (an HTTP Post), reset the Timer and clear the ManualResetEvent, then loop back and Wait.
However, I am seeing folk say that this is 'bad' as the Wait just blocks a valuable thread resource, and that async/await is my panacea.
First off, are they right? Is my way out-of-date and inefficient or can I JFDI?
I've found examples here for batching and here for tasks at intervals, but not a combination of the two.
Is this requirement actually compatible with async/await?

Actually, you're almost doing the right thing, and they are also partially right.
What you should know is that you should avoid idle threads that spend a long time waiting on events or waiting for I/O to complete (waiting on locks with little contention and fast statement blocks, or spinning loops with compare-and-swap, is usually OK).
What most of them don't know is that tasks are not magic: for instance, Task.Delay uses a timer (more exactly, a System.Threading.Timer), and blocking on an incomplete task ends up using a ManualResetEventSlim (an improvement over ManualResetEvent, as it doesn't create a Win32 event unless explicitly asked for, e.g. ((IAsyncResult)task).AsyncWaitHandle).
So yes, your requirements are achievable with async/await, or tasks in general.
Runnable example at .NET Fiddle:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
public class Record
{
private int n;
public Record(int n)
{
this.n = n;
}
public int N { get { return n; } }
}
public class RecordReceiver
{
// Arbitrary constants
// You should fetch value from configuration and define sensible defaults
private static readonly int threshold = 5;
// I chose a low value so the example wouldn't timeout in .NET Fiddle
private static readonly TimeSpan timeout = TimeSpan.FromMilliseconds(100);
// I'll use a Stopwatch to trace execution times
private readonly Stopwatch sw = Stopwatch.StartNew();
// Using a separate private object for locking
private readonly object lockObj = new object();
// The list of accumulated records to execute in a batch
private List<Record> records = new List<Record>();
// The most recent TCS to signal completion when:
// - the list count reached the threshold
// - enough time has passed
private TaskCompletionSource<IEnumerable<Record>> batchTcs;
// A CTS to cancel the timer-based task when the threshold is reached
// Not strictly necessary, but it reduces resource usage
private CancellationTokenSource delayCts;
// The task that will be completed when a batch of records has been dispatched
private Task dispatchTask;
// This method doesn't use async/await,
// because we're not doing an async flow here.
public Task ReceiveAsync(Record record)
{
Console.WriteLine("Received record {0} ({1})", record.N, sw.ElapsedMilliseconds);
lock (lockObj)
{
// When the list of records is empty, set up the next task
//
// TaskCompletionSource is just what we need, we'll complete a task
// not when we've finished some computation, but when we reach some criteria
//
// This is the main reason this method doesn't use async/await
if (records.Count == 0)
{
// I want the dispatch task to run on the thread pool
// In .NET 4.6, there's TaskCreationOptions.RunContinuationsAsynchronously
// .NET 4.6
//batchTcs = new TaskCompletionSource<IEnumerable<Record>>(TaskCreationOptions.RunContinuationsAsynchronously);
//dispatchTask = DispatchRecordsAsync(batchTcs.Task);
// Before that, we had to set up a continuation task using the default task scheduler
// .NET 4.5.2
batchTcs = new TaskCompletionSource<IEnumerable<Record>>();
var asyncContinuationsTask = batchTcs.Task
.ContinueWith(bt => bt.Result, TaskScheduler.Default);
dispatchTask = DispatchRecordsAsync(asyncContinuationsTask);
// Create a cancellation token source to be able to cancel the timer
//
// To be used when we reach the threshold, to release timer resources
delayCts = new CancellationTokenSource();
Task.Delay(timeout, delayCts.Token)
.ContinueWith(
dt =>
{
// When we hit the timer, take the lock and set the batch
// task as complete, moving the current records to its result
lock (lockObj)
{
// Avoid dispatching an empty list of records
//
// Also avoid a race condition by checking the cancellation token
//
// The race would be for the actual timer function to start before
// we had a chance to cancel it
if ((records.Count > 0) && !delayCts.IsCancellationRequested)
{
batchTcs.TrySetResult(new List<Record>(records));
records.Clear();
}
}
},
// Since our continuation function is fast, we want it to run
// ASAP on the same thread where the actual timer function runs
//
// Note: this is just a hint, but I trust it'll be favored most of the time
TaskContinuationOptions.ExecuteSynchronously);
// Remember that we want our batch task to have continuations
// running outside the timer thread, since dispatching records
// is probably too much work for a timer thread.
}
// Actually store the new record somewhere
records.Add(record);
// When we reach the threshold, set the batch task as complete,
// moving the current records to its result
//
// Also, cancel the timer task
if (records.Count >= threshold)
{
batchTcs.TrySetResult(new List<Record>(records));
delayCts.Cancel();
records.Clear();
}
// Return the last saved dispatch continuation task
//
// It'll start after either the timer or the threshold,
// but more importantly, it'll complete after it dispatches all records
return dispatchTask;
}
}
// This method uses async/await, since we want to use the async flow
internal async Task DispatchRecordsAsync(Task<IEnumerable<Record>> batchTask)
{
// We expect it to return a task right here, since the batch task hasn't had
// a chance to complete when the first record arrives
//
// Task.ConfigureAwait(false) allows us to run synchronously and on the same thread
// as the completer, but again, this is just a hint
//
// Remember we've set our task to run completions on the thread pool?
//
// With .NET 4.6, completing a TaskCompletionSource created with
// TaskCreationOptions.RunContinuationsAsynchronously will start scheduling
// continuations either on their captured SynchronizationContext or TaskScheduler,
// or forced to use TaskScheduler.Default
//
// Before .NET 4.6, completing a TaskCompletionSource could mean
// that continuations ran within the completer, especially when
// Task.ConfigureAwait(false) was used on an async awaiter, or when
// Task.ContinueWith(..., TaskContinuationOptions.ExecuteSynchronously) was used
// to set up a continuation
//
// That's why, before .NET 4.6, we need to actually run a task for that effect,
// and we used Task.ContinueWith without TaskContinuationOptions.ExecuteSynchronously
// and with TaskScheduler.Default, to ensure it gets scheduled
//
// So, why am I using Task.ConfigureAwait(false) here anyway?
// Because it'll make a difference if this method is run from within
// a Windows Forms or WPF thread, or any thread with a SynchronizationContext
// or TaskScheduler that schedules tasks on a dedicated thread
var batchedRecords = await batchTask.ConfigureAwait(false);
// Async methods are transformed into state machines,
// much like iterator methods, but with async specifics
//
// What await actually does is:
// - check if the awaitable is complete
// - if so, continue executing
// Note: if every awaited awaitable is complete along an async method,
// the method will complete synchronously
// This is only to be expected with tasks that have already completed
// or I/O that is always ready, e.g. MemoryStream
// - if not, return a task and schedule a continuation for just after the await expression
// Note: the continuation will resume the state machine on the next state
// Note: the returned task will complete on return or on exception,
// but that is something the compiled state machine will handle
foreach (var record in batchedRecords)
{
Console.WriteLine("Dispatched record {0} ({1})", record.N, sw.ElapsedMilliseconds);
// I used Task.Yield as a replacement for actual work
//
// It'll force the async state machine to always return here
// and schedule a continuation that reenters the async state machine right afterwards
//
// This is not something you usually want on production code,
// so please replace this with the actual dispatch
await Task.Yield();
}
}
}
public class Program
{
public static void Main()
{
// Our main entry point is synchronous, so we run an async entry point and wait on it
//
// The difference between MainAsync().Result and MainAsync().GetAwaiter().GetResult()
// is in the way exceptions are thrown:
// - the former aggregates exceptions, throwing an AggregateException
// - the latter doesn't aggregate exceptions if it doesn't have to, throwing the actual exception
//
// Since I'm not combining tasks (e.g. Task.WhenAll), I'm not expecting multiple exceptions
//
// If my main method returned int, I could return the task's result
// and I'd make MainAsync return Task<int> instead of just Task
MainAsync().GetAwaiter().GetResult();
}
// Async entry point
public static async Task MainAsync()
{
var receiver = new RecordReceiver();
// I'll provide a few records:
// - a delay big enough between the 1st and the 2nd such that the 1st will be dispatched
// - 8 records in a row, such that 5 of them will be dispatched, and 3 of them will wait
// - again, a delay big enough that will provoke the last 3 records to be dispatched
// - and a final record, which will wait to be dispatched
//
// We await for Task.Delay between providing records,
// but we'll await for the records in the end only
//
// That is, we'll not await each record before the next,
// as that would mean each record would only be dispatched after at least the timeout
var t1 = receiver.ReceiveAsync(new Record(1));
await Task.Delay(TimeSpan.FromMilliseconds(300));
var t2 = receiver.ReceiveAsync(new Record(2));
var t3 = receiver.ReceiveAsync(new Record(3));
var t4 = receiver.ReceiveAsync(new Record(4));
var t5 = receiver.ReceiveAsync(new Record(5));
var t6 = receiver.ReceiveAsync(new Record(6));
var t7 = receiver.ReceiveAsync(new Record(7));
var t8 = receiver.ReceiveAsync(new Record(8));
var t9 = receiver.ReceiveAsync(new Record(9));
await Task.Delay(TimeSpan.FromMilliseconds(300));
var t10 = receiver.ReceiveAsync(new Record(10));
// I probably should have used a list of records, but this is just an example
await Task.WhenAll(t1, t2, t3, t4, t5, t6, t7, t8, t9, t10);
}
}
You can make this more interesting by returning a distinct task, such as Task<RecordDispatchReport>, from ReceiveAsync, completed by the processing part of DispatchRecordsAsync using a TaskCompletionSource for each record.
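As a rough illustration of the per-record idea, here is a sketch that is separate from the example above. RecordDispatchReport, PerRecordTracker, Track and CompleteBatch are all hypothetical names; the real dispatch code would call something like CompleteBatch once the HTTP post for a batch has finished.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical report type; the example above does not define one.
public class RecordDispatchReport
{
    public int RecordNumber;
    public bool Succeeded;
}

public class PerRecordTracker
{
    private class Entry
    {
        public int Number;
        public TaskCompletionSource<RecordDispatchReport> Source;
    }

    private readonly object gate = new object();
    private readonly List<Entry> pending = new List<Entry>();

    // Called when a record arrives; the returned task belongs to that record only.
    public Task<RecordDispatchReport> Track(int recordNumber)
    {
        var entry = new Entry
        {
            Number = recordNumber,
            Source = new TaskCompletionSource<RecordDispatchReport>()
        };
        lock (gate) { pending.Add(entry); }
        return entry.Source.Task;
    }

    // Called after a batch has been dispatched; returns how many records were completed.
    public int CompleteBatch()
    {
        List<Entry> batch;
        lock (gate)
        {
            batch = new List<Entry>(pending);
            pending.Clear();
        }
        // Complete outside the lock so continuations never run while it is held.
        foreach (Entry entry in batch)
        {
            entry.Source.TrySetResult(
                new RecordDispatchReport { RecordNumber = entry.Number, Succeeded = true });
        }
        return batch.Count;
    }
}

public static class TrackerDemo
{
    public static void Main()
    {
        var tracker = new PerRecordTracker();
        Task<RecordDispatchReport> first = tracker.Track(1);
        Task<RecordDispatchReport> second = tracker.Track(2);
        tracker.CompleteBatch();
        Console.WriteLine("{0} {1}", first.Result.RecordNumber, second.Result.Succeeded);
    }
}
```

A failed dispatch could be reported the same way, either by setting Succeeded to false or by calling TrySetException on each source.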

WaitAll for Changing List<Task>

Updated to explain things more clearly
I've got an application that runs a number of tasks. Some are created initially and others can be added later. I need a programming structure that will wait on all the tasks to complete. Once all the tasks complete, some other code should run that cleans things up and does some final processing of data generated by the other tasks.
I've come up with a way to do this, but wouldn't call it elegant. So I'm looking to see if there is a better way.
What I do is keep a list of the tasks in a ConcurrentBag (a thread safe collection). At the start of the process I create and add some tasks to the ConcurrentBag. As the process does its thing if a new task is created that also needs to finish before the final steps I also add it to the ConcurrentBag.
Task.WaitAll accepts an array of Tasks as its argument. I can convert the ConcurrentBag into an array, but that array won't include any Tasks added to the Bag after Task.WaitAll was called.
So I have a two-step wait process in a do-while loop. In the body of the loop I do a simple Task.WaitAll on the array generated from the Bag. When it completes, it means all the original tasks are done. Then in the while test I do a quick 1 millisecond Task.WaitAll on a new array generated from the ConcurrentBag. If no new tasks were added, or any new tasks also completed, it will return true, so the not condition exits the loop.
If it returns false (because a new task was added that didn't complete), we go back and do a non-timed Task.WaitAll. Then rinse and repeat until all new and old tasks are done.
// defined on the class; Token is a property because a field initializer
// cannot reference another instance field
CancellationTokenSource Source = new CancellationTokenSource();
CancellationToken Token { get { return Source.Token; } }
ConcurrentBag<Task> ToDoList = new ConcurrentBag<Task>();
public void RunAndWait() {
// start some tasks add them to the list
for (int i = 0; i < 12; i++)
{
Task task = new Task(() => SillyExample(Token), Token);
ToDoList.Add(task);
task.Start();
}
// now wait for those task, and any other tasks added to ToDoList to complete
try
{
do
{
Task.WaitAll(ToDoList.ToArray(), Token);
} while (! Task.WaitAll(ToDoList.ToArray(), 1, Token));
}
catch (OperationCanceledException e)
{
// any special handling of cancel we might want to do
}
// code that should only run after all tasks complete
}
Is there a more elegant way to do this?

I'd recommend using a ConcurrentQueue and removing items as you wait for them. Due to the first-in-first-out nature of queues, if you get to the point where there's nothing left in the queue, you know that you've waited for all the tasks that have been added up to that point.
ConcurrentQueue<Task> ToDoQueue = new ConcurrentQueue<Task>();
...
while(ToDoQueue.Count > 0 && !Token.IsCancellationRequested)
{
Task task;
if(ToDoQueue.TryDequeue(out task))
{
task.Wait(Token);
}
}
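To see the queue approach handle tasks that are added while waiting, here is a small self-contained sketch. The names QueueDrain and RunAndWait are illustrative, and it assumes every follow-up task is enqueued before the task that creates it finishes; otherwise the loop could find the queue empty too early.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class QueueDrain
{
    // Returns the number of tasks that ran to completion.
    public static int RunAndWait()
    {
        var queue = new ConcurrentQueue<Task>();
        int completed = 0;

        // The initial task enqueues a follow-up task before it finishes.
        queue.Enqueue(Task.Run(() =>
        {
            queue.Enqueue(Task.Run(() => { Interlocked.Increment(ref completed); }));
            Interlocked.Increment(ref completed);
        }));

        // Keep waiting until nothing is left, including tasks added meanwhile.
        Task next;
        while (queue.TryDequeue(out next))
        {
            next.Wait();
        }
        return completed;
    }

    public static void Main()
    {
        Console.WriteLine("Tasks completed: " + RunAndWait());
    }
}
```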

Here's a very cool way using Microsoft's Reactive Framework (NuGet "Rx-Main").
var taskSubject = new Subject<Task>();
var query = taskSubject.Select(t => Observable.FromAsync(() => t)).Merge();
var subscription =
query.Subscribe(
u => { /* Each Task Completed */ },
() => Console.WriteLine("All Tasks Completed."));
Now, to add tasks, just do this:
taskSubject.OnNext(Task.Run(() => { }));
taskSubject.OnNext(Task.Run(() => { }));
taskSubject.OnNext(Task.Run(() => { }));
And then to signal completion:
taskSubject.OnCompleted();
It is important to note that signalling completion doesn't complete the query immediately, it will wait for all of the tasks to finish too. Signalling completion just says that you will no longer add any new tasks.
Finally, if you want to cancel, then just do this:
subscription.Dispose();

Async and Await - How is order of execution maintained?

I am actually reading some topics about the Task Parallel Library and the asynchronous programming with async and await. The book "C# 5.0 in a Nutshell" states that when awaiting an expression using the await keyword, the compiler transforms the code into something like this:
var awaiter = expression.GetAwaiter();
awaiter.OnCompleted (() =>
{
var result = awaiter.GetResult();
// ... the statements that follow the await ...
});
Let's assume, we have this asynchronous function (also from the referred book):
async Task DisplayPrimeCounts()
{
for (int i = 0; i < 10; i++)
Console.WriteLine (await GetPrimesCountAsync (i*1000000 + 2, 1000000) +
" primes between " + (i*1000000) + " and " + ((i+1)*1000000-1));
Console.WriteLine ("Done!");
}
The call of the 'GetPrimesCountAsync' method will be enqueued and executed on a pooled thread. In general invoking multiple threads from within a for loop has the potential for introducing race conditions.
So how does the CLR ensure that the requests will be processed in the order they were made? I doubt that the compiler simply transforms the code into the above manner, since this would decouple the 'GetPrimesCountAsync' method from the for loop.

Just for the sake of simplicity, I'm going to replace your example with one that's slightly simpler, but has all of the same meaningful properties:
async Task DisplayPrimeCounts()
{
for (int i = 0; i < 10; i++)
{
var value = await SomeExpensiveComputation(i);
Console.WriteLine(value);
}
Console.WriteLine("Done!");
}
The ordering is all maintained because of the definition of your code. Let's imagine stepping through it.
1. This method is first called.
2. The first line of code is the for loop, so i is initialized.
3. The loop check passes, so we go to the body of the loop.
4. SomeExpensiveComputation is called. It should return a Task<T> very quickly, but the work that it's doing will keep going on in the background.
5. The rest of the method is added as a continuation to the returned task; it will continue executing when that task finishes.
6. After the task returned from SomeExpensiveComputation finishes, we store the result in value.
7. value is printed to the console.
8. GOTO 3; note that the existing expensive operation has already finished before we get to step 4 for the second time and start the next one.
As far as how the C# compiler actually accomplishes step 5, it does so by creating a state machine. Basically every time there is an await there's a label indicating where it left off, and at the start of the method (or after it's resumed after any continuation fires) it checks the current state, and does a goto to the spot where it left off. It also needs to hoist all local variables into fields of a new class so that the state of those local variables is maintained.
Now this transformation isn't actually done in C# code, it's done in IL, but this is sort of the moral equivalent of the code I showed above in a state machine. Note that this isn't valid C# (you cannot goto into a for loop like this), but that restriction doesn't apply to the IL code that is actually used. There are also going to be differences between this and what C# actually does, but it should give you a basic idea of what's going on here:
internal class Foo
{
public int i;
public long value;
private int state = 0;
private Task<int> task;
int result0;
public Task Bar()
{
var tcs = new TaskCompletionSource<object>();
Action continuation = null;
continuation = () =>
{
try
{
if (state == 1)
{
goto state1;
}
for (i = 0; i < 10; i++)
{
Task<int> task = SomeExpensiveComputation(i);
var awaiter = task.GetAwaiter();
if (!awaiter.IsCompleted)
{
awaiter.OnCompleted(() =>
{
result0 = awaiter.GetResult();
continuation();
});
state = 1;
return;
}
else
{
result0 = awaiter.GetResult();
}
state1:
// copy the awaited result into the hoisted local before using it
value = result0;
Console.WriteLine(value);
}
Console.WriteLine("Done!");
tcs.SetResult(true);
}
catch (Exception e)
{
tcs.SetException(e);
}
};
continuation();
// hand the caller the task that completes when the whole method is done
return tcs.Task;
}
}
Note that I've ignored task cancellation for the sake of this example, I've ignored the whole concept of capturing the current synchronization context, there's a bit more going on with error handling, etc. Don't consider this a complete implementation.

"The call of the 'GetPrimesCountAsync' method will be enqueued and executed on a pooled thread."
No. await does not initiate any kind of background processing. It waits for existing processing to complete. It is up to GetPrimesCountAsync to do that (e.g. using Task.Run). It's more clear this way:
var myRunningTask = GetPrimesCountAsync();
await myRunningTask;
The loop only continues when the awaited task has completed. There is never more than one task outstanding.
"So how does the CLR ensure that the requests will be processed in the order they were made?"
The CLR is not involved.
"I doubt that the compiler simply transforms the code into the above manner, since this would decouple the 'GetPrimesCountAsync' method from the for loop."
The transform that you show is basically right, but notice that the next loop iteration is not started right away; it is started in the callback. That's what serializes execution.
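A small experiment makes the serialization visible. In this sketch the stand-in SomeExpensiveComputation (a hypothetical body, since the answers never define one) finishes faster for larger inputs, so results would arrive out of order if the calls overlapped. Because each iteration awaits before the next call is made, the results still come back in loop order.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class AwaitOrdering
{
    // Stand-in for SomeExpensiveComputation: larger inputs finish sooner,
    // so overlapping calls would complete in reverse order.
    private static async Task<int> SomeExpensiveComputation(int i)
    {
        await Task.Delay(50 - i * 10);
        return i * i;
    }

    public static async Task<List<int>> RunSequentiallyAsync()
    {
        var results = new List<int>();
        for (int i = 0; i < 5; i++)
        {
            // The next iteration cannot begin until this await has completed.
            results.Add(await SomeExpensiveComputation(i));
        }
        return results;
    }

    public static void Main()
    {
        List<int> results = RunSequentiallyAsync().GetAwaiter().GetResult();
        Console.WriteLine(string.Join(", ", results));
    }
}
```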

c# Executing Multiple calls in Parallel

I'm looping through an Array of values, for each value I want to execute a long running process. Since I have multiple tasks to be performed that have no inter dependency I want to be able to execute them in parallel.
My code is:
List<Task<bool>> dependantTasksQuery = new List<Task<bool>>();
foreach (int dependantID in dependantIDList)
{
dependantTasksQuery.Add(WaitForDependantObject(dependantID));
}
Task<bool>[] dependantTasks = dependantTasksQuery.ToArray();
//Wait for all dependant tasks to complete
bool[] lengths = await Task.WhenAll(dependantTasks);
The WaitForDependantObject method just looks like:
async Task<bool> WaitForDependantObject(int idVal)
{
System.Threading.Thread.Sleep(20000);
bool waitDone = true;
return waitDone;
}
As you can see I've just added a sleep to highlight my issue. What is happening when debugging is that on the line:
dependantTasksQuery.Add(WaitForDependantObject(dependantID));
My code is stopping and waiting the 20 seconds for the method to complete. I did not want to start the execution until I had completed the loop and built up the Array. Can somebody point me to what I'm doing wrong? I'm pretty sure I need an await somewhere.

In your case WaitForDependantObject isn't asynchronous at all even though it returns a task. If that's your goal do as Luke Willis suggests. To make these calls both asynchronous and truly parallel you need to offload them to a Thread Pool thread with Task.Run:
bool[] lengths = await Task.WhenAll(dependantIDList.Select(dependantID => Task.Run(() => WaitForDependantObject(dependantID))));
async methods run synchronously until an await is reached and then return a task representing the asynchronous operation. In your case you don't have an await, so the methods simply execute one after the other. Task.Run uses multiple threads to enable parallelism even for these synchronous parts, on top of the concurrency gained by awaiting all the tasks together with Task.WhenAll.
For WaitForDependantObject to represent an async method more accurately it should look like this:
async Task<bool> WaitForDependantObject(int idVal)
{
await Task.Delay(20000);
return true;
}
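The "synchronous until the first await" behaviour can be observed with a log. This is only a sketch: the log parameter and the SyncUntilAwait class are additions for illustration, and WaitForDependantObject is reduced to the Task.Delay version shown above.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class SyncUntilAwait
{
    private static async Task<bool> WaitForDependantObject(int idVal, List<string> log)
    {
        // Everything before the first await runs synchronously inside the caller.
        log.Add("start " + idVal);
        await Task.Delay(100);
        return true;
    }

    public static async Task<List<string>> RunAllAsync()
    {
        var log = new List<string>();
        var tasks = new List<Task<bool>>();
        foreach (int id in new[] { 1, 2, 3 })
        {
            // Returns as soon as the method hits its await, so the loop is not held up.
            tasks.Add(WaitForDependantObject(id, log));
        }
        log.Add("all started");
        bool[] results = await Task.WhenAll(tasks);
        log.Add("completed " + results.Length);
        return log;
    }

    public static void Main()
    {
        foreach (string line in RunAllAsync().GetAwaiter().GetResult())
        {
            Console.WriteLine(line);
        }
    }
}
```

All three "start" entries are logged before "all started", and the three delays then overlap instead of running back to back.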

Use Task.Delay to make the method asynchronous; this is a more realistic replacement for the mocked code:
async Task<bool> WaitForDependantObject(int idVal)
{
// how long synchronous part of method takes (before first await)
System.Threading.Thread.Sleep(1000);
// the method returns to its caller as soon as awaiting starts
await Task.Delay(2000); // how long IO or other async operation takes place
// simulate data processing; this runs on a thread-pool thread unless a SynchronizationContext
// was captured (WPF/WinForms/ASP.NET) and the await above did not use ConfigureAwait(false).
System.Threading.Thread.Sleep(1000);
bool waitDone = true;
return waitDone;
}

You can do this using Task.Factory.StartNew.
Replace this:
dependantTasksQuery.Add(WaitForDependantObject(dependantID));
with this:
dependantTasksQuery.Add(
Task.Factory.StartNew(
() => WaitForDependantObject(dependantID)
)
);
This will run your method within a new Task and add the task to your List.
You will also want to change the method signature of WaitForDependantObject to be:
bool WaitForDependantObject(int idVal)
You can then wait for your tasks to complete with:
Task.WaitAll(dependantTasksQuery.ToArray());
And get your results with:
bool[] lengths = dependantTasksQuery.Select(task => task.Result).ToArray();

.NET Framework 4.0: Chaining tasks in a loop

I want to chain multiple Tasks, so that when one ends the next one starts. I know I can do this using ContinueWith. But what if I have a large number of tasks, so that:
t1 continues with t2
t2 continues with t3
t3 continues with t4
...
Is there a nice way to do it, other than creating this chain manually using a loop?

Well, assuming you have some sort of enumerable of Action<Task> delegates (the signature ContinueWith expects) or something similar that you want to run, you can easily use LINQ to do the following:
// Create the base task. Run synchronously.
var task = new Task(() => { });
task.RunSynchronously();
// Chain them all together.
var query =
// For each action
from action in actions
// Assign the task to the continuation and
// return that.
select (task = task.ContinueWith(action));
// Get the last task to wait on.
// Note that this cannot be changed to "Last"
// because the actions enumeration could have no
// elements, meaning that Last would throw.
// That means task can be null, so a check
// would have to be performed on it before
// waiting on it (unless you are assured that
// there are items in the action enumeration).
task = query.LastOrDefault();
The above code is really your loop, just in a fancier form. It does the same thing in that it takes the previous task (after primed with a dummy "noop" Task) and then adds a continuation in the form of ContinueWith (assigning the continuation to the current task in the process for the next iteration of the loop, which is performed when LastOrDefault is called).

You may use the TaskFactory.ContinueWhenAll method here (for example via Task.Factory.ContinueWhenAll), which lets you pass multiple tasks.
Update
You can use a chaining extension such as this:
public static class MyTaskExtensions
{
public static Task BuildChain(this Task task,
IEnumerable<Action<Task>> actions)
{
if (!actions.Any())
return task;
else
{
Task continueWith = task.ContinueWith(actions.First());
return continueWith.BuildChain(actions.Skip(1));
}
}
}
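For comparison, the plain loop that the question mentions can be sketched as follows. It is an assumption-laden illustration, not code from either answer: TaskChaining and Chain are made-up names, and the seed task is built with TaskCompletionSource because Task.FromResult is not available on .NET 4.0.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public static class TaskChaining
{
    // Chains the actions so that each one starts only after the previous one has finished.
    public static Task Chain(IEnumerable<Action> actions)
    {
        // Seed the chain with an already-completed task.
        var seed = new TaskCompletionSource<object>();
        seed.SetResult(null);
        Task current = seed.Task;

        foreach (Action action in actions)
        {
            // Copy to a local: before C# 5 the foreach variable was shared by all closures.
            Action step = action;
            current = current.ContinueWith(_ => step());
        }
        return current;
    }

    public static void Main()
    {
        var order = new List<int>();
        var actions = new List<Action>
        {
            () => order.Add(1),
            () => order.Add(2),
            () => order.Add(3)
        };
        Chain(actions).Wait();
        Console.WriteLine(string.Join(", ", order));
    }
}
```

With the default options a continuation also runs when the previous step faulted; passing TaskContinuationOptions.OnlyOnRanToCompletion to ContinueWith would stop the chain at the first failure instead.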
