Working with Lists and Parallel.Invoke - C#

I'm attempting to run methods using Parallel.Invoke, with each method appending its response to a list outside of Parallel.Invoke.
I've been playing around with a lock, but the following code doesn't work:
var allResults = new List<ResultRecord>();
var sync = new object();
Parallel.Invoke(
() => { var results = GetResultSet1(); lock (sync) { allResults.Concat(results); } },
() => { var results = GetResultSet2(); lock (sync) { allResults.Concat(results); } });
This code doesn't populate the list; allResults ends up being empty.

As already explained by PetSerAl in a comment, Concat() does not modify the list; it returns a new sequence containing the combined elements. Using AddRange() instead of Concat() is a solution, but I think using Tasks is clearer than Parallel.Invoke() here:
var resultSet1Task = Task.Run(() => GetResultSet1());
var resultSet2Task = Task.Run(() => GetResultSet2());
List<ResultRecord> allResults = resultSet1Task.Result.Concat(resultSet2Task.Result).ToList();
This way, there is no need for explicit locking, which makes the code safer.
If you can, you could also use await instead of .Result.
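For reference, the original Parallel.Invoke approach also works once Concat() is replaced with AddRange() inside the lock. A minimal sketch of that variant:
var allResults = new List<ResultRecord>();
var sync = new object();
Parallel.Invoke(
() => { var results = GetResultSet1(); lock (sync) { allResults.AddRange(results); } },
() => { var results = GetResultSet2(); lock (sync) { allResults.AddRange(results); } });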

Related

How to bundle async tasks for Task.WhenAll?

I fire up some async tasks in parallel, as in the following example:
var BooksTask = _client.GetBooks(clientId);
var ExtrasTask = _client.GetBooksExtras(clientId);
var InvoicesTask = _client.GetBooksInvoice(clientId);
var ReceiptsTask = _client.GetBooksRecceipts(clientId);
await Task.WhenAll(
BooksTask,
ExtrasTask,
InvoicesTask,
ReceiptsTask
);
model.Books = BooksTask.Result;
model.Extras = ExtrasTask.Result;
model.Invoices = InvoicesTask.Result;
model.Receipts = ReceiptsTask.Result;
This results in a lot of typing. I searched the .Net Framework for a way to shorten this up. I imagine it to be like this. I call the class Collector, as I don't know what to name the concept.
var collector = new Collector();
collector.Bind(_client.GetBooks(clientId), out model.Books);
collector.Bind(_client.GetBooksExtras(clientId), out model.Extras);
collector.Bind(_client.GetBooksInvoice(clientId), out model.Invoices);
collector.Bind(_client.GetBooksRecceipts(clientId), out model.Receipts);
collector.Run();
Is this a valid approach? Is there something like that?
Personally, I prefer the code in the question (but using await instead of Result for code maintainability reasons). As noted in andyb952's answer, the Task.WhenAll is not required. I do prefer it for readability reasons; it makes the semantics explicit and IMO makes the code easier to read.
I searched the .Net Framework for a way to shorten this up.
There isn't anything built-in, nor (to my knowledge) any libraries for this. I've thought about writing one using tuples. For your code, it would look like this:
public static class TaskHelpers
{
public static async Task<(T1, T2, T3, T4)> WhenAll<T1, T2, T3, T4>(Task<T1> task1, Task<T2> task2, Task<T3> task3, Task<T4> task4)
{
await Task.WhenAll(task1, task2, task3, task4).ConfigureAwait(false);
return (await task1, await task2, await task3, await task4);
}
}
With this helper in place, your original code simplifies to:
(model.Books, model.Extras, model.Invoices, model.Receipts) = await TaskHelpers.WhenAll(
_client.GetBooks(clientId),
_client.GetBooksExtras(clientId),
_client.GetBooksInvoice(clientId),
_client.GetBooksRecceipts(clientId)
);
But is it really more readable? So far, I have not been convinced enough to make this into a library.
In this case I believe the WhenAll is somewhat redundant, since you are using the results immediately afterwards. Changing to this will have the same effect:
var BooksTask = _client.GetBooks(clientId);
var ExtrasTask = _client.GetBooksExtras(clientId);
var InvoicesTask = _client.GetBooksInvoice(clientId);
var ReceiptsTask = _client.GetBooksRecceipts(clientId);
model.Books = await BooksTask;
model.Extras = await ExtrasTask;
model.Invoices = await InvoicesTask;
model.Receipts = await ReceiptsTask;
The awaits ensure that execution doesn't move past the four assignments until all the tasks have completed.
As pointed out in andyb952's answer, in this case there is no real need to call Task.WhenAll, since all the tasks are hot and already running.
But, there are situations where you may still desire to have an AsyncCollector type.
TL;DR:
Async helper function:
async Task Async(Func<Task> asyncDelegate) =>
await asyncDelegate().ConfigureAwait(false);
AsyncCollector usage example (implementation further below):
var collector = new AsyncCollector();
collector.Register(async () => model.Books = await _client.GetBooks(clientId));
collector.Register(async () => model.Extras = await _client.GetBooksExtras(clientId));
collector.Register(async () => model.Invoices = await _client.GetBooksInvoice(clientId));
collector.Register(async () => model.Receipts = await _client.GetBooksReceipts(clientId));
await collector.WhenAll();
If you're worried about closures, see the note at the end.
Let's see why someone would want that.
This is the solution that runs the tasks concurrently:
var task1 = _client.GetFooAsync();
var task2 = _client.GetBarAsync();
// Both tasks are running.
var v1 = await task1;
var v2 = await task2;
// It doesn't matter if task2 completed before task1:
// at this point both tasks completed and they ran concurrently.
The problem
What about when you don't know how many tasks you'll use?
In this scenario, you can't define the task variables at compile time.
Storing the tasks in a collection, alone, won't solve the problem, since the result of each task was meant to be assigned to a specific variable!
var tasks = new List<Task<string>>();
foreach (var translation in translations)
{
var translationTask = _client.TranslateAsync(translation.Eng);
tasks.Add(translationTask);
}
await Task.WhenAll(tasks);
// Now there are N completed tasks, each with a value that
// should be associated to the translation instance that
// was used to generate the async operation.
Solutions
A workaround would be to assign the values based on the index of the task, which of course only works if the tasks were created (and stored) in the same order of the items:
await Task.WhenAll(tasks);
for (int i = 0; i < tasks.Count; i++)
translations[i].Value = await tasks[i];
A more appropriate solution would be to use LINQ and generate a Task that wraps two operations: fetching the data and assigning it to its receiver:
List<Task> translationTasks = translations
.Select(async t => t.Value = await _client.TranslateAsync(t.Eng))
// Enumerating the result of the Select forces the tasks to be created.
.ToList();
await Task.WhenAll(translationTasks);
// Now all the translations have been fetched and assigned to the right property.
This looks OK, until you need to execute the same pattern on another list, or on a single value; then you end up with many List<Task> and Task variables inside your function to manage:
var translationTasks = translations
.Select(async t => t.Value = await _client.TranslateAsync(t.Eng))
.ToList();
var fooTasks = foos
.Select(async f => f.Value = await _client.GetFooAsync(f.Id))
.ToList();
var bar = ...;
var barTask = _client.GetBarAsync(bar.Id);
// Now all tasks are running concurrently, some are also assigning the value
// to the right property, but now the "await" part is a bit more cumbersome.
bar.Value = await barTask;
await Task.WhenAll(translationTasks);
await Task.WhenAll(fooTasks);
A cleaner solution (imho)
In these situations, I like to use a helper function that wraps an async operation (of any kind), very similar to how the tasks are created with Select above:
async Task Async(Func<Task> asyncDelegate) =>
await asyncDelegate().ConfigureAwait(false);
Using this function in the previous scenario results in this code:
var tasks = new List<Task>();
foreach (var t in translations)
{
// The fetch of the value and its assignment are wrapped by the Task.
var fetchAndAssignTask = Async(async () =>
{
t.Value = await _client.TranslateAsync(t.Eng);
});
tasks.Add(fetchAndAssignTask);
}
foreach (var f in foos)
// Short syntax
tasks.Add(Async(async () => f.Value = await _client.GetFooAsync(f.Id)));
// It works even without enumerables!
var bar = ...;
tasks.Add(Async(async () => bar.Value = await _client.GetBarAsync(bar.Id)));
await Task.WhenAll(tasks);
// Now all the values have been fetched and assigned to their receiver.
The full example using this helper function, without the comments, becomes:
var tasks = new List<Task>();
foreach (var t in translations)
tasks.Add(Async(async () => t.Value = await _client.TranslateAsync(t.Eng)));
foreach (var f in foos)
tasks.Add(Async(async () => f.Value = await _client.GetFooAsync(f.Id)));
tasks.Add(Async(async () => bar.Value = await _client.GetBarAsync(bar.Id)));
await Task.WhenAll(tasks);
The AsyncCollector type
This technique can be easily wrapped inside a "Collector" type:
class AsyncCollector
{
private readonly List<Task> _tasks = new List<Task>();
public void Register(Func<Task> asyncDelegate) => _tasks.Add(asyncDelegate());
public Task WhenAll() => Task.WhenAll(_tasks);
}
Note: as pointed out in the comments, there are risks involved when using closures and enumerators, but from C# 5 onwards the use of foreach is safe because closures will close over a fresh copy of the variable each time.
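Note that this only applies to foreach: a for loop's index variable is still shared across iterations in every C# version, so closing over it directly remains unsafe. Copying the element into a local before the closure captures it, as in this sketch, avoids the problem:
for (int i = 0; i < translations.Count; i++)
{
var t = translations[i]; // copy into a local before the closure captures it
tasks.Add(Async(async () => t.Value = await _client.TranslateAsync(t.Eng)));
}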
If you would still like to use this type with an earlier version of C# and need safety when closing over the loop variable, the Register method can be changed to accept a subject that is passed into the delegate, avoiding the closure.
public void Register<TSubject>(TSubject subject, Func<TSubject, Task> asyncDelegate)
{
var task = asyncDelegate(subject);
_tasks.Add(task);
}
The code then becomes:
var collector = new AsyncCollector();
foreach (var translation in translations)
// Register translation as a subject, and use it inside the delegate as "t".
collector.Register(translation,
async t => t.Value = await _client.TranslateAsync(t.Eng));
foreach (var foo in foos)
collector.Register(foo, async f => f.Value = await _client.GetFooAsync(f.Id));
collector.Register(bar, async b => b.Value = await _client.GetBarAsync(b.Id));
await collector.WhenAll();

How to force Parallel.Invoke to wait for all tasks to complete? [duplicate]

This question already has answers here:
Running multiple async tasks and waiting for them all to complete
(10 answers)
Closed 4 years ago.
I am using Parallel.Invoke to execute multiple tasks, but it returns before all the operations are completed. I want to use the await keyword when calling the methods, since all the methods are async, but it doesn't allow me to do so. I also know I can use Task.WhenAll(), but the requirement is to use Parallel.Invoke, so is there any way to achieve my goal?
var list = new List<string>();
var myArr = new string[] { "usa", "eng", "fra" };
Parallel.Invoke(() =>
{
myArr.Select(i => {
var data= MyMethod(i);
list.Add(1, data);
});
});
and the MyMethod implementation (shown here with the name GetName):
public async Task<string> GetName(string country)
{
var data = await _db.GetAll().FindAsync(country).Capital;
return data;
}
Also, I have another question: is there a better approach to calling the method 3 or 4 times inside Parallel.Invoke?
I also know I can use Task.WhenAll(), but the requirement is to use
Parallel.Invoke, so is there any way to achieve my goal?
Your requirement to use Parallel.Invoke is likely wrong.
Your GetName method is I/O-bound. There is very little benefit in threading this. All you're doing is spawning more threads whose async results you then have to await. This is less efficient than just looping.
It would make more sense to run this asynchronously. That way you don't use unnecessary threads but you use your main thread more effectively. You can do this using Task.WhenAll:
var list = new List<string>();
var myArr = new string[] { "usa", "eng", "fra" };
IEnumerable<Task<string>> myTasks = myArr.Select(i => GetName(i));
await Task.WhenAll(myTasks);
foreach(Task<string> task in myTasks)
{
string result = task.Result;
}
var list = new List<string>();
var myArr = new string[] { "usa", "eng", "fra" };
var tasks = myArr.Select(o => GetName(o));
var results = await Task.WhenAll(tasks);
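If you still need the results in the original list variable, you can copy them over once the WhenAll has completed, for example:
list.AddRange(results);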

async i/o and process results as they become available

I have a simple console app where I want to call many URLs in a loop and put the result in a database table. I am using .NET 4.5 and async I/O to fetch the URL data. Here is a simplified version of what I am doing. All methods are async except for the database operation. Do you guys see any issues with this? Are there better ways of optimizing?
private async Task Run(){
var items = repo.GetItems(); // sync method to get list from database
var tasks = new List<Task>();
// add each call to task list and process result as it becomes available
// rather than waiting for all downloads
foreach(Item item in items){
tasks.Add(GetFromWeb(item.url).ContinueWith(response => { AddToDatabase(response.Result);}));
}
await Task.WhenAll(tasks); // wait for all tasks to complete.
}
private async Task<string> GetFromWeb(string url) {
HttpResponseMessage response = await GetAsync(url);
return await response.Content.ReadAsStringAsync();
}
private void AddToDatabase(string item){
// add data to database.
}
Your solution is acceptable. But you should check out TPL Dataflow, which allows you to set up a dataflow "mesh" (or "pipeline") and then shove the data through it.
For a problem this simple, Dataflow won't really add much other than getting rid of the ContinueWith (I always find manual continuations awkward). But if you plan to add more steps or change your data flow in the future, Dataflow should be something you consider.
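For illustration, a Dataflow pipeline for this particular problem might look roughly like the following sketch (it reuses GetFromWeb and AddToDatabase from the question; the degree of parallelism is an arbitrary choice):
using System.Threading.Tasks.Dataflow;
var downloader = new TransformBlock<string, string>(
url => GetFromWeb(url), // async fetch, up to 4 in flight at a time
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = 4 });
var writer = new ActionBlock<string>(data => AddToDatabase(data)); // database writes stay sequential
downloader.LinkTo(writer, new DataflowLinkOptions { PropagateCompletion = true });
foreach (var item in repo.GetItems())
downloader.Post(item.url);
downloader.Complete();
await writer.Completion;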
Your solution is pretty much correct, with just two minor mistakes (both of which cause compiler errors). First, you don't call ContinueWith on the result of List.Add; you need to call ContinueWith on the task and then add the continuation to your list, which is solved by just moving a parenthesis. You also need to call Result on the response Task.
Here is the section with the two minor changes:
tasks.Add(GetFromWeb(item.url)
.ContinueWith(response => { AddToDatabase(response.Result);}));
Another option is to leverage a method that takes a sequence of tasks and orders them by completion. Here is my implementation of such a method:
public static IEnumerable<Task<T>> Order<T>(this IEnumerable<Task<T>> tasks)
{
var taskList = tasks.ToList();
var taskSources = new BlockingCollection<TaskCompletionSource<T>>();
var taskSourceList = new List<TaskCompletionSource<T>>(taskList.Count);
foreach (var task in taskList)
{
var newSource = new TaskCompletionSource<T>();
taskSources.Add(newSource);
taskSourceList.Add(newSource);
task.ContinueWith(t =>
{
var source = taskSources.Take();
if (t.IsCanceled)
source.TrySetCanceled();
else if (t.IsFaulted)
source.TrySetException(t.Exception.InnerExceptions);
else if (t.IsCompleted)
source.TrySetResult(t.Result);
}, CancellationToken.None, TaskContinuationOptions.PreferFairness, TaskScheduler.Default);
}
return taskSourceList.Select(tcs => tcs.Task);
}
Using this your code can become:
private async Task Run()
{
IEnumerable<Item> items = repo.GetItems(); // sync method to get list from database
foreach (var task in items.Select(item => GetFromWeb(item.url))
.Order())
{
await task.ConfigureAwait(false);
AddToDatabase(task.Result);
}
}
Just thought I'd throw my hat in as well with an Rx solution:
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using System.Reactive.Threading.Tasks;
private Task Run()
{
var fromWebObservable = from item in repo.GetItems().ToObservable(Scheduler.Default)
select GetFromWeb(item.url);
return fromWebObservable
.SelectMany(task => task.ToObservable()) // unwrap each Task<string> as it completes
.Do(AddToDatabase)
.ToList()
.ToTask();
}

Speculative execution using the TPL

I have a List<Task<bool>> that I want to enumerate in parallel, finding the first task that completes with a result of true, without waiting for or observing exceptions on any of the other tasks still pending.
var tasks = new List<Task<bool>>
{
Task.Delay(2000).ContinueWith(x => false),
Task.Delay(0).ContinueWith(x => true),
};
I have tried to use PLINQ to do something like:
var task = tasks.AsParallel().FirstOrDefault(t => t.Result);
This executes in parallel, but doesn't return as soon as it finds a satisfying result, because accessing the Result property is blocking. In order for this to work using PLINQ, I'd have to write this awful statement:
var cts = new CancellationTokenSource();
var task = tasks.AsParallel()
.FirstOrDefault(t =>
{
try
{
t.Wait(cts.Token);
if (t.Result)
{
cts.Cancel();
}
return t.Result;
}
catch (OperationCanceledException)
{
return false;
}
} );
I've written up an extension method that yields tasks as they complete like so.
public static class Exts
{
public static IEnumerable<Task<T>> InCompletionOrder<T>(this IEnumerable<Task<T>> source)
{
var tasks = source.ToList();
while (tasks.Any())
{
var t = Task.WhenAny(tasks);
yield return t.Result;
tasks.Remove(t.Result);
}
}
}
// and run like so
var task = tasks.InCompletionOrder().FirstOrDefault(t => t.Result);
But it feels like this is something common enough that there is a better way. Suggestions?
Maybe something like this?
var tcs = new TaskCompletionSource<Task<bool>>();
foreach (var task in tasks)
{
task.ContinueWith((t, state) =>
{
if (t.Result)
{
((TaskCompletionSource<Task<bool>>)state).TrySetResult(t);
}
},
tcs,
TaskContinuationOptions.OnlyOnRanToCompletion |
TaskContinuationOptions.ExecuteSynchronously);
}
var firstTaskToComplete = tcs.Task;
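A usage sketch for the above; note that tcs.Task never completes if no task finishes with a result of true, so in practice you may want to race it against something like Task.WhenAll of the original tasks:
Task<bool> winner = await firstTaskToComplete; // the first task whose result was true
bool result = await winner; // true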
Perhaps you could try the Rx.Net library. It's very good for, in effect, providing LINQ to work.
Try this snippet in LinqPad after you reference the Microsoft Rx.Net assemblies.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Reactive.Concurrency;
using System.Reactive.Linq;
using System.Reactive.Threading.Tasks;
using System.Threading.Tasks;
void Main()
{
var tasks = new List<Task<bool>>
{
Task.Delay(2000).ContinueWith(x => false),
Task.Delay(0).ContinueWith(x => true),
};
var observable = (from t in tasks.ToObservable()
//Convert task to an observable
let o = t.ToObservable()
//SelectMany
from x in o
select x);
var foo = observable
.SubscribeOn(Scheduler.Default) //Run the tasks on the threadpool
.ToList()
.First();
Console.WriteLine(foo);
}
First, I don't understand why you are trying to use PLINQ here. Enumerating a list of Tasks shouldn't take long, so I don't think you're going to gain anything from parallelizing it.
Now, to get the first Task that already completed with true, you can use the (non-blocking) IsCompleted property:
var task = tasks.FirstOrDefault(t => t.IsCompleted && t.Result);
If you wanted to get a collection of Tasks, ordered by their completion, have a look at Stephen Toub's article Processing tasks as they complete. If you want to list those that return true first, you would need to modify that code. If you don't want to modify it, you can use a version of this approach from Stephen Cleary's AsyncEx library.
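Alternatively, a plain Task.WhenAny loop gives the "first true" behaviour directly without blocking any threads. A minimal sketch (the helper name is mine):
static async Task<Task<bool>> FirstTrueOrDefaultAsync(IEnumerable<Task<bool>> source)
{
var pending = source.ToList();
while (pending.Count > 0)
{
var finished = await Task.WhenAny(pending);
if (finished.Status == TaskStatus.RanToCompletion && finished.Result)
return finished; // the first task that completed with true
pending.Remove(finished); // faulted, cancelled or false: keep waiting on the rest
}
return null; // no task completed with true
}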
Also, in the specific case in your question, you could “fix” your code by adding .WithMergeOptions(ParallelMergeOptions.NotBuffered) to the PLINQ query. But doing so still wouldn't work most of the time, and it can waste a lot of threads even when it does. That's because PLINQ uses a constant number of threads with partitioning, and using Result would block those threads most of the time.

List of objects with async Task methods, execute all concurrently

Given the following:
BlockingCollection<MyObject> collection;
public class MyObject
{
public async Task<ReturnObject> DoWork()
{
(...)
return await SomeIOWorkAsync();
}
}
What would be the correct/most performant way to execute all DoWork() tasks asynchronously on all MyObjects in collection concurrently (while capturing the return object), ideally with a sensible thread limit though (I believe the Task Factory/ThreadPool does some management here)?
You can make use of Task.WhenAll:
var combinedTask = await Task.WhenAll(collection.Select(x => x.DoWork()));
It starts all the tasks concurrently and waits for them all to finish.
ThreadPool manages the number of threads running, but that won't help you much with asynchronous Tasks.
Because of that, you need something else. One way to do this is to utilize ActionBlock from TPL Dataflow:
int limit = …;
IEnumerable<MyObject> collection = …;
var block = new ActionBlock<MyObject>(
o => o.DoWork(),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = limit });
foreach (var obj in collection)
block.Post(obj);
block.Complete();
await block.Completion;
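If the ReturnObject results also need to be captured, as the question asks, the action can write into a thread-safe collection. A sketch of that variation (using ConcurrentBag from System.Collections.Concurrent):
var results = new ConcurrentBag<ReturnObject>();
var block = new ActionBlock<MyObject>(
async o => results.Add(await o.DoWork()),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = limit });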
What would be the correct/most performant way to execute all DoWork() tasks asynchronously on all MyObjects in collection concurrently (while capturing the return object), ideally with a sensible thread limit
The easiest way to do that is with Task.WhenAll:
ReturnObject[] results = await Task.WhenAll(collection.Select(x => x.DoWork()));
This will invoke DoWork on all MyObjects in the collection and then wait for them all to complete. The thread pool handles all throttling sensibly.
Is there a different way if I want to capture every individual DoWork() return immediately instead of waiting for all items to complete?
Yes, you can use the method described by Jon Skeet and Stephen Toub. I have a similar solution in my AsyncEx library (available via NuGet), which you can use like this:
// "tasks" is of type "Task<ReturnObject>[]"
var tasks = collection.Select(x => x.DoWork()).OrderByCompletion();
foreach (var task in tasks)
{
var result = await task;
...
}
My comment was a bit cryptic, so I thought I'd add this answer:
List<Task<ReturnObject>> workTasks =
collection.Select( o => o.DoWork() ).ToList();
List<Task> resultTasks =
workTasks.Select( o => o.ContinueWith( t =>
{
ReturnObject r = t.Result;
// do something with the result
},
// if you want to run this on the UI thread
TaskScheduler.FromCurrentSynchronizationContext()
)
)
.ToList();
await Task.WhenAll( resultTasks );
