What are the differences between using Parallel.ForEach or Task.Run() to start a set of tasks asynchronously?
Version 1:
List<string> strings = new List<string> { "s1", "s2", "s3" };
Parallel.ForEach(strings, s =>
{
DoSomething(s);
});
Version 2:
List<string> strings = new List<string> { "s1", "s2", "s3" };
List<Task> Tasks = new List<Task>();
foreach (var s in strings)
{
Tasks.Add(Task.Run(() => DoSomething(s)));
}
await Task.WhenAll(Tasks);
In this case, the second method will asynchronously wait for the tasks to complete instead of blocking.
However, using Task.Run in a loop has a disadvantage. Parallel.ForEach creates a Partitioner to avoid making more tasks than necessary. Task.Run always creates one task per item (since that is what you asked for), whereas the Parallel class batches work so that it creates fewer tasks than there are work items. This can provide significantly better overall performance, especially if the loop body does only a small amount of work per item.
If this is the case, you can combine both options by writing:
await Task.Run(() => Parallel.ForEach(strings, s =>
{
DoSomething(s);
}));
Note that this can also be written in this shorter form:
await Task.Run(() => Parallel.ForEach(strings, DoSomething));
The first version will synchronously block the calling thread (and run some of the tasks on it).
If it's a UI thread, this will freeze the UI.
The second version will run the tasks asynchronously in the thread pool and release the calling thread until they're done.
There are also differences in the scheduling algorithms used.
Note that your second example can be shortened to
await Task.WhenAll(strings.Select(s => Task.Run(() => DoSomething(s))));
I have seen Parallel.ForEach used inappropriately, and I figured an example in this question would help.
When you run the code below in a console app, you will see that Parallel.ForEach does not wait for the tasks started by an async method: it returns as soon as each call has handed back its Task, so the calling thread is not blocked until the work finishes. This could be okay if you don't care about the result (positive or negative), but if you do need the result, you should make sure to use Task.WhenAll.
using System;
using System.Linq;
using System.Threading.Tasks;
namespace ParrellelEachExample
{
class Program
{
static void Main(string[] args)
{
var indexes = new int[] { 1, 2, 3 };
RunExample((prefix) => Parallel.ForEach(indexes, (i) => DoSomethingAsync(i, prefix)),
"Parallel.Foreach");
Console.ForegroundColor = ConsoleColor.Yellow;
Console.WriteLine("*You'll notice the tasks haven't run yet, because the main thread was not blocked*");
Console.WriteLine("Press any key to start the next example...");
Console.ReadKey();
RunExample((prefix) => Task.WhenAll(indexes.Select(i => DoSomethingAsync(i, prefix)).ToArray()).Wait(),
"Task.WhenAll");
Console.WriteLine("All tasks are done. Press any key to close...");
Console.ReadKey();
}
static void RunExample(Action<string> action, string prefix)
{
Console.ForegroundColor = ConsoleColor.White;
Console.WriteLine($"{Environment.NewLine}Starting '{prefix}'...");
action(prefix);
Console.WriteLine($"{Environment.NewLine}Finished '{prefix}'{Environment.NewLine}");
}
static async Task DoSomethingAsync(int i, string prefix)
{
await Task.Delay(i * 1000);
Console.WriteLine($"Finished: {prefix}[{i}]");
}
}
}
Here is the result:
Conclusion:
Parallel.ForEach does not wait for the Tasks returned by an async method, so the calling thread is not blocked until they finish. If you care about the result, make sure to await the tasks.
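For what it's worth, on .NET 6 or later there is also Parallel.ForEachAsync, which does await an async loop body. Below is a minimal sketch, assuming .NET 6+; the class and method names are made up for illustration, and the short Task.Delay only stands in for DoSomethingAsync.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

public static class ForEachAsyncSketch
{
    // Hypothetical helper: runs an async body per index and returns the
    // indexes that finished, sorted so the result is deterministic.
    public static async Task<int[]> RunAsync(int[] indexes)
    {
        var finished = new ConcurrentBag<int>();

        await Parallel.ForEachAsync(indexes, async (i, cancellationToken) =>
        {
            await Task.Delay(i * 10, cancellationToken); // stand-in for real async work
            finished.Add(i);
        });

        // This line is reached only after every async body has completed.
        return finished.OrderBy(i => i).ToArray();
    }

    public static async Task Main()
    {
        int[] done = await RunAsync(new[] { 1, 2, 3 });
        Console.WriteLine("Finished: " + string.Join(", ", done)); // prints "Finished: 1, 2, 3"
    }
}
```

Unlike Parallel.ForEach with an async lambda, the await on Parallel.ForEachAsync does not complete until the bodies themselves have completed.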
I ended up doing this, as it felt easier to read:
List<Task> x = new List<Task>();
foreach(var s in myCollectionOfObject)
{
// Note there is no await here. We are just collecting the Tasks
x.Add(s.DoSomethingAsync());
}
await Task.WhenAll(x);
I'm currently working on a concurrent file downloader.
For that reason I want to parameterize the number of concurrent tasks. I don't want to wait for all the tasks to complete, but to keep the same number running.
In fact, this thread on Stack Overflow gave me a proper clue, but I'm struggling to make it async:
Keep running a specific number of tasks
Here is my code:
public async Task StartAsync()
{
var semaphore = new SemaphoreSlim(1, _concurrentTransfers);
var queueHasMessages = true;
while (queueHasMessages)
{
try {
await Task.Run(async () =>
{
await semaphore.WaitAsync();
await asyncStuff();
});
}
finally {
semaphore.Release();
};
}
}
But the code just gets executed one task at a time. I think the await is preventing me from generating the desired number of tasks, but I don't know how to avoid it while respecting the limit established by the semaphore.
If I add all the tasks to a list and call Task.WhenAll, the semaphore throws an exception because it has reached the max count.
Any suggestions?
It was brought to my attention that the struck-through solution will drop any exceptions that occur during execution. That's bad.
Here is a solution that will not drop exceptions:
Task.Run is a factory method for creating a Task. You can check this yourself by looking at the return type in IntelliSense. You can assign the returned Task anywhere you like.
"await" is an operator that suspends the enclosing method until the task it operates on completes, without blocking the thread. You can use any Task with the await operator.
public static async Task RunTasksConcurrently()
{
IList<Task> tasks = new List<Task>();
for (int i = 1; i < 4; i++)
{
tasks.Add(RunNextTask());
}
foreach (var task in tasks) {
await task;
}
}
public static async Task RunNextTask()
{
while(true) {
await Task.Delay(500);
}
}
By adding the values of the Task we create to a list, we can await them later on in execution.
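To also respect the concurrency limit from the question, the same "collect the tasks, await them later" idea can be combined with a SemaphoreSlim. This is only a sketch under assumptions: RunThrottledAsync and its parameters are hypothetical names, and Task.Delay stands in for the real download.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ThrottleSketch
{
    // Runs one work item per input with at most maxConcurrency in flight.
    // Returns the highest number of items observed running at the same time.
    public static async Task<int> RunThrottledAsync(
        IEnumerable<int> items, int maxConcurrency, Func<int, Task> work)
    {
        // Initial count equals the maximum, so maxConcurrency slots are free at the start.
        var semaphore = new SemaphoreSlim(maxConcurrency, maxConcurrency);
        var gate = new object();
        int running = 0, peak = 0;
        var tasks = new List<Task>();

        foreach (int item in items)
        {
            await semaphore.WaitAsync();   // wait for a free slot before starting the next item
            tasks.Add(RunOneAsync(item));  // start it, but do not await it here
        }
        await Task.WhenAll(tasks);         // observe completion (and exceptions) of everything
        return peak;

        async Task RunOneAsync(int item)
        {
            lock (gate) { running++; peak = Math.Max(peak, running); }
            try { await work(item); }
            finally
            {
                lock (gate) { running--; }
                semaphore.Release();       // free the slot acquired in the loop
            }
        }
    }

    public static async Task Main()
    {
        // 10 simulated downloads, at most 3 at a time.
        int peak = await RunThrottledAsync(Enumerable.Range(1, 10), 3, _ => Task.Delay(50));
        Console.WriteLine("Peak concurrency: " + peak);
    }
}
```

Compared with the code in the question, the semaphore starts with all of its slots available, each Release is paired with the WaitAsync of the same item, and the loop never awaits the work itself, only a free slot.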
Previous Answer below
Edit: With the clarification I think I understand better.
Instead of running every task at once, you want to start 3 tasks, and as soon as a task is finished, run the next one.
I believe this can happen using the .ContinueWith(Action<Task>) method.
See if this gets closer to your intended solution.
public void SpawnInitialTasks()
{
for (int i = 0; i < 3; i++)
{
RunNextTask();
}
}
public void RunNextTask()
{
Task.Run(async () => await Task.Delay(500))
.ContinueWith(t => RunNextTask());
// Recurse here to keep running tasks whenever we finish one.
}
The idea is that we spawn 3 tasks right away, then whenever one finishes we spawn the next. If you need to keep data flowing between the tasks, you can use parameters:
RunNextTask(DataObject data)
You can do this easily the old-fashioned way without using await by using Parallel.ForEach(), which lets you specify the maximum number of concurrent threads to use.
For example:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
namespace Demo
{
class Program
{
public static void Main(string[] args)
{
IEnumerable<string> filenames = Enumerable.Range(1, 100).Select(x => x.ToString());
Parallel.ForEach(
filenames,
new ParallelOptions { MaxDegreeOfParallelism = 4},
download
);
}
static void download(string filepath)
{
Console.WriteLine("Downloading " + filepath);
Thread.Sleep(1000); // Simulate downloading time.
Console.WriteLine("Downloaded " + filepath);
}
}
}
If you run this and observe the output, you'll see that the "files" are being "downloaded" in batches.
A better simulation is to change download() so that it takes a random amount of time to process each "file", like so:
static Random rng = new Random();
static void download(string filepath)
{
Console.WriteLine("Downloading " + filepath);
Thread.Sleep(500 + rng.Next(1000)); // Simulate random downloading time.
Console.WriteLine("Downloaded " + filepath);
}
Try that and see the difference in the output.
However, if you want a more modern way to do this, you could look into the Dataflow part of the TPL (Task Parallel Library) - this works well with async methods.
This is a lot more complicated to get to grips with, but it's a lot more powerful. You could use an ActionBlock to do it, but describing how to do that is a bit beyond the scope of an answer I could give here.
Have a look at this other answer on StackOverflow; it gives a brief example.
Also note that TPL Dataflow is not built in to .NET; you have to get it from NuGet (the System.Threading.Tasks.Dataflow package).
I have three methods that I call to do some number crunching that are as follows
results.LeftFront.CalcAi();
results.RightFront.CalcAi();
results.RearSuspension.CalcAi(geom, vehDef.Geometry.LTa.TaStiffness, vehDef.Geometry.RTa.TaStiffness);
The functions are independent of each other and can be computed in parallel with no deadlocks.
What is the easiest way to compute these in parallel without the containing method finishing until all three are done?
See the TPL documentation. They list this sample:
Parallel.Invoke(() => DoSomeWork(), () => DoSomeOtherWork());
So in your case this should just work:
Parallel.Invoke(
() => results.LeftFront.CalcAi(),
() => results.RightFront.CalcAi(),
() => results.RearSuspension.CalcAi(geom,
vehDef.Geometry.LTa.TaStiffness,
vehDef.Geometry.RTa.TaStiffness));
EDIT: The call returns after all actions have finished executing. Invoke() does not guarantee that they will actually run in parallel, nor does it guarantee the order in which the actions execute.
You can do this with tasks too (nicer if you later need Cancellation or something like results)
var task1 = Task.Factory.StartNew(() => results.LeftFront.CalcAi());
var task2 = Task.Factory.StartNew(() => results.RightFront.CalcAi());
var task3 = Task.Factory.StartNew(() => results.RearSuspension.CalcAi(geom,
vehDef.Geometry.LTa.TaStiffness,
vehDef.Geometry.RTa.TaStiffness));
Task.WaitAll(task1, task2, task3);
In .NET 4, Microsoft introduced the Task Parallel Library which was designed to handle this kind of problem, see Parallel Programming in the .NET Framework.
ThreadPool.QueueUserWorkItem can also be used to run independent methods in parallel. Here is a sample method:
public static void ExecuteParallel(params Action[] tasks)
{
// Initialize the reset events to keep track of completed threads
ManualResetEvent[] resetEvents = new ManualResetEvent[tasks.Length];
// Launch each method on its own thread pool thread
for (int i = 0; i < tasks.Length; i++)
{
resetEvents[i] = new ManualResetEvent(false);
ThreadPool.QueueUserWorkItem(new WaitCallback((object index) =>
{
int taskIndex = (int)index;
// Execute the method
tasks[taskIndex]();
// Tell the calling thread that we're done
resetEvents[taskIndex].Set();
}), i);
}
// Wait for all threads to execute
WaitHandle.WaitAll(resetEvents);
}
More detail about this function can be found here:
http://newapputil.blogspot.in/2016/03/running-parallel-tasks-using.html
var task1 = SomeLongRunningTask();
var task2 = SomeOtherLongRunningTask();
await Task.WhenAll(task1, task2);
The benefit of this over Task.WaitAll is that this will release the thread and await the completion of the two tasks.
Updated to explain things more clearly
I've got an application that runs a number of tasks. Some are created initially and others can be added later. I need a programming structure that will wait for all the tasks to complete. Once all the tasks complete, some other code should run that cleans things up and does some final processing of data generated by the other tasks.
I've come up with a way to do this, but wouldn't call it elegant. So I'm looking to see if there is a better way.
What I do is keep a list of the tasks in a ConcurrentBag (a thread safe collection). At the start of the process I create and add some tasks to the ConcurrentBag. As the process does its thing if a new task is created that also needs to finish before the final steps I also add it to the ConcurrentBag.
Task.WaitAll accepts an array of Tasks as its argument. I can convert the ConcurrentBag into an array, but that array won't include any Tasks added to the Bag after Task.WaitAll was called.
So I have a two-step wait process in a do-while loop. In the body of the loop I do a simple Task.WaitAll on the array generated from the Bag. When it completes, it means all the original tasks are done. Then in the while test I do a quick Task.WaitAll with a 1 millisecond timeout on a new array generated from the ConcurrentBag. If no new tasks were added, or any new tasks have also completed, it returns true, so the negated condition exits the loop.
If it returns false (because a new task was added that didn't complete) we go back and do a non-timed Task.Wait. Then rinse and repeat until all new and old tasks are done.
// defined on the class, perhaps they should be properties
CancellationTokenSource Source = new CancellationTokenSource();
CancellationToken Token = Source.Token;
ConcurrentBag<Task> ToDoList = new ConcurrentBag<Task>();
public void RunAndWait() {
// start some tasks add them to the list
for (int i = 0; i < 12; i++)
{
Task task = new Task(() => SillyExample(Token), Token);
ToDoList.Add(task);
task.Start();
}
// now wait for those task, and any other tasks added to ToDoList to complete
try
{
do
{
Task.WaitAll(ToDoList.ToArray(), Token);
} while (! Task.WaitAll(ToDoList.ToArray(), 1, Token));
}
catch (OperationCanceledException e)
{
// any special handling of cancel we might want to do
}
// code that should only run after all tasks complete
}
Is there a more elegant way to do this?
I'd recommend using a ConcurrentQueue and removing items as you wait for them. Due to the first-in-first-out nature of queues, if you get to the point where there's nothing left in the queue, you know that you've waited for all the tasks that have been added up to that point.
ConcurrentQueue<Task> ToDoQueue = new ConcurrentQueue<Task>();
...
while(ToDoQueue.Count > 0 && !Token.IsCancellationRequested)
{
Task task;
if(ToDoQueue.TryDequeue(out task))
{
task.Wait(Token);
}
}
Here's a very cool way using Microsoft's Reactive Framework (NuGet "Rx-Main", published as "System.Reactive" in later versions).
var taskSubject = new Subject<Task>();
var query = taskSubject.Select(t => Observable.FromAsync(() => t)).Merge();
var subscription =
query.Subscribe(
u => { /* Each Task Completed */ },
() => Console.WriteLine("All Tasks Completed."));
Now, to add tasks, just do this:
taskSubject.OnNext(Task.Run(() => { }));
taskSubject.OnNext(Task.Run(() => { }));
taskSubject.OnNext(Task.Run(() => { }));
And then to signal completion:
taskSubject.OnCompleted();
It is important to note that signalling completion doesn't complete the query immediately, it will wait for all of the tasks to finish too. Signalling completion just says that you will no longer add any new tasks.
Finally, if you want to cancel, then just do this:
subscription.Dispose();
Given the following code...
static void DoSomething(int id) {
Thread.Sleep(50);
Console.WriteLine(@"DidSomething({0})", id);
}
I know I can convert this to an async task as follows...
static async Task DoSomethingAsync(int id) {
await Task.Delay(50);
Console.WriteLine(@"DidSomethingAsync({0})", id);
}
And that by doing so, if I am calling it multiple times (Task.WhenAll), everything will be faster and more efficient than perhaps using Parallel.ForEach or even calling it from within a loop.
But for a minute, let's pretend that Task.Delay() does not exist and I actually have to use Thread.Sleep(). I know in reality this is not the case, but this is concept code, and where the Delay/Sleep is there would normally be an I/O operation with no async option (such as early EF).
I have tried the following...
static async Task DoSomethingAsync2(int id) {
await Task.Run(() => {
Thread.Sleep(50);
Console.WriteLine(@"DidSomethingAsync({0})", id);
});
}
But, though it runs without error, according to Lucian Wischik this is in fact bad practice, as it merely spins up threads from the pool to complete each task (it is also slower: using the following console application, if you swap between the DoSomethingAsync and DoSomethingAsync2 calls you can see a significant difference in the time it takes to complete)...
static void Main(string[] args) {
MainAsync(args).Wait();
}
static async Task MainAsync(String[] args) {
List<Task> tasks = new List<Task>();
for (int i = 1; i <= 1000; i++)
tasks.Add(DoSomethingAsync2(i)); // Can replace with any version
await Task.WhenAll(tasks);
}
I then tried the following...
static async Task DoSomethingAsync3(int id) {
await new Task(() => {
Thread.Sleep(50);
Console.WriteLine(@"DidSomethingAsync({0})", id);
});
}
Transplanting this in place of the original DoSomethingAsync, the test never completes and nothing is shown on screen!
I have also tried multiple other variations that either do not compile or do not complete!
So, given the constraint that you cannot call any existing asynchronous methods and must complete both the Thread.Sleep and the Console.WriteLine in an asynchronous task, how do you do it in a manner that is as efficient as the original code?
The objective here, for those of you who are interested, is to give me a better understanding of how to create my own async methods where I am not calling anybody else's. Despite many searches, this seems to be the one area where examples are really lacking. Whilst there are many thousands of examples of calling async methods that call other async methods in turn, I cannot find any that convert an existing void method to an async task where there is no call to a further async task, other than those that use the Task.Run(() => {} ) method.
There are two kinds of tasks: those that execute code (e.g., Task.Run and friends), and those that respond to some external event (e.g., TaskCompletionSource<T> and friends).
What you're looking for is TaskCompletionSource<T>. There are various "shorthand" forms for common situations so you don't always have to use TaskCompletionSource<T> directly. For example, Task.FromResult or TaskFactory.FromAsync. FromAsync is most commonly used if you have an existing *Begin/*End implementation of your I/O; otherwise, you can use TaskCompletionSource<T> directly.
For more information, see the "I/O-bound Tasks" section of Implementing the Task-based Asynchronous Pattern.
The Task constructor is (unfortunately) a holdover from Task-based parallelism, and should not be used in asynchronous code. It can only be used to create a code-based task, not an external event task.
So, given the constraint that you cannot call any existing asynchronous methods and must complete both the Thread.Sleep and the Console.WriteLine in an asynchronous task, how do you do it in a manner that is as efficient as the original code?
I would use a timer of some kind and have it complete a TaskCompletionSource<T> when the timer fires. I'm almost positive that's what the actual Task.Delay implementation does anyway.
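A minimal sketch of that idea, under assumptions: DelayWithTimer is a hypothetical helper name (not a framework API), and this is not claimed to be how Task.Delay is actually implemented. It uses System.Threading.Timer to complete a TaskCompletionSource<bool>, so no thread is blocked while waiting.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TimerDelaySketch
{
    // Hypothetical helper: returns a task that completes once the timer fires.
    public static Task DelayWithTimer(int milliseconds)
    {
        var tcs = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        Timer timer = null;
        // Create the timer disarmed so 'timer' is assigned before the callback can run.
        timer = new Timer(_ =>
        {
            timer.Dispose();          // one-shot: release the timer
            tcs.TrySetResult(true);   // complete the "external event" task
        }, null, Timeout.Infinite, Timeout.Infinite);

        timer.Change(milliseconds, Timeout.Infinite); // arm it: fire once after the delay
        return tcs.Task;
    }

    public static async Task DoSomethingAsync(int id)
    {
        await DelayWithTimer(50);
        Console.WriteLine("DidSomethingAsync({0})", id);
    }

    public static void Main()
    {
        DoSomethingAsync(1).Wait(); // prints "DidSomethingAsync(1)"
    }
}
```

The task returned here is an "external event" task in the sense described above: no code runs on its behalf until the timer callback sets the result.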
So, given the constraint that you cannot call any existing
asynchronous methods and must complete both the Thread.Sleep and the
Console.WriteLine in an asynchronous task, how do you do it in a
manner that is as efficient as the original code?
IMO, this is a very synthetic constraint that you really need to stick with Thread.Sleep. Under this constraint, you still can slightly improve your Thread.Sleep-based code. Instead of this:
static async Task DoSomethingAsync2(int id) {
await Task.Run(() => {
Thread.Sleep(50);
Console.WriteLine(@"DidSomethingAsync({0})", id);
});
}
You could do this:
static Task DoSomethingAsync2(int id) {
return Task.Run(() => {
Thread.Sleep(50);
Console.WriteLine(@"DidSomethingAsync({0})", id);
});
}
This way, you'd avoid an overhead of the compiler-generated state machine class. There is a subtle difference between these two code fragments, in how exceptions are propagated.
Anyhow, this is not where the bottleneck of the slowdown is.
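As a side note on the exception difference mentioned above, here is a small sketch. The method names and the argument check are made up for illustration. It shows the best-known part of that difference: code that throws before the first await ends up in the returned Task for the async version, but is thrown straight at the caller for the non-async version.

```csharp
using System;
using System.Threading.Tasks;

public static class ExceptionTimingSketch
{
    // async version: the exception is captured and stored in the returned task.
    public static async Task WithAwaitAsync(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        await Task.Run(() => Console.WriteLine(text));
    }

    // non-async version: the exception is thrown directly at the call site.
    public static Task WithoutAwait(string text)
    {
        if (text == null) throw new ArgumentNullException(nameof(text));
        return Task.Run(() => Console.WriteLine(text));
    }

    public static void Main()
    {
        Task task = WithAwaitAsync(null); // does not throw here
        Console.WriteLine("async version faulted: " + task.IsFaulted); // prints "async version faulted: True"

        try
        {
            WithoutAwait(null);           // throws before any Task is returned
        }
        catch (ArgumentNullException)
        {
            Console.WriteLine("non-async version threw at the call site");
        }
    }
}
```

Exceptions thrown inside the Task.Run delegate itself are stored in the task in both versions.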
(it is also slower using the following console application - if you
swap between DoSomethingAsync and DoSomethingAsync2 call you can see a
significant difference in the time that it takes to complete)
Let's look one more time at your main loop code:
static async Task MainAsync(String[] args) {
List<Task> tasks = new List<Task>();
for (int i = 1; i <= 1000; i++)
tasks.Add(DoSomethingAsync2(i)); // Can replace with any version
await Task.WhenAll(tasks);
}
Technically, it requests 1000 tasks to be run in parallel, each supposedly to run on its own thread. In an ideal universe, you'd expect to execute Thread.Sleep(50) 1000 times in parallel and complete the whole thing in about 50ms.
However, this request is never satisfied by the TPL's default task scheduler, for a good reason: a thread is a precious and expensive resource. Moreover, the number of operations that can actually execute at the same time is limited by the number of CPUs/cores. So in reality, with the default size of the ThreadPool, I'm getting 21 pool threads (at peak) serving this operation in parallel. That is why DoSomethingAsync2 / Thread.Sleep takes so much longer than DoSomethingAsync / Task.Delay. DoSomethingAsync doesn't block a pool thread; it only requests one when the time-out completes. Thus, more DoSomethingAsync tasks than DoSomethingAsync2 tasks can actually run in parallel.
The test (a console app):
// https://stackoverflow.com/q/21800450/1768303
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
namespace Console_21800450
{
public class Program
{
static async Task DoSomethingAsync(int id)
{
await Task.Delay(50);
UpdateMaxThreads();
Console.WriteLine(@"DidSomethingAsync({0})", id);
}
static async Task DoSomethingAsync2(int id)
{
await Task.Run(() =>
{
Thread.Sleep(50);
UpdateMaxThreads();
Console.WriteLine(@"DidSomethingAsync2({0})", id);
});
}
static async Task MainAsync(Func<int, Task> tester)
{
List<Task> tasks = new List<Task>();
for (int i = 1; i <= 1000; i++)
tasks.Add(tester(i)); // Can replace with any version
await Task.WhenAll(tasks);
}
volatile static int s_maxThreads = 0;
static void UpdateMaxThreads()
{
var threads = Process.GetCurrentProcess().Threads.Count;
// not using locks for simplicity
if (s_maxThreads < threads)
s_maxThreads = threads;
}
static void TestAsync(Func<int, Task> tester)
{
s_maxThreads = 0;
var stopwatch = new Stopwatch();
stopwatch.Start();
MainAsync(tester).Wait();
Console.WriteLine(
"time, ms: " + stopwatch.ElapsedMilliseconds +
", threads at peak: " + s_maxThreads);
}
static void Main()
{
Console.WriteLine("Press enter to test with Task.Delay ...");
Console.ReadLine();
TestAsync(DoSomethingAsync);
Console.ReadLine();
Console.WriteLine("Press enter to test with Thread.Sleep ...");
Console.ReadLine();
TestAsync(DoSomethingAsync2);
Console.ReadLine();
}
}
}
Output:
Press enter to test with Task.Delay ...
...
time, ms: 1077, threads at peak: 13
Press enter to test with Thread.Sleep ...
...
time, ms: 8684, threads at peak: 21
Is it possible to improve the timing figure for the Thread.Sleep-based DoSomethingAsync2? The only way I can think of is to use TaskCreationOptions.LongRunning with Task.Factory.StartNew.
You should think twice before doing this in any real-life application:
static async Task DoSomethingAsync2(int id)
{
await Task.Factory.StartNew(() =>
{
Thread.Sleep(50);
UpdateMaxThreads();
Console.WriteLine(@"DidSomethingAsync2({0})", id);
}, TaskCreationOptions.LongRunning | TaskCreationOptions.PreferFairness);
}
// ...
static void Main()
{
Console.WriteLine("Press enter to test with Task.Delay ...");
Console.ReadLine();
TestAsync(DoSomethingAsync);
Console.ReadLine();
Console.WriteLine("Press enter to test with Thread.Sleep ...");
Console.ReadLine();
TestAsync(DoSomethingAsync2);
Console.ReadLine();
}
Output:
Press enter to test with Thread.Sleep ...
...
time, ms: 3600, threads at peak: 163
The timing gets better, but the price for this is high. This code asks the task scheduler to create a new thread for each new task. Do not expect this thread to come from the pool:
Task.Factory.StartNew(() =>
{
Thread.Sleep(1000);
Console.WriteLine("Thread pool: " +
Thread.CurrentThread.IsThreadPoolThread); // false!
}, TaskCreationOptions.LongRunning).Wait();
Given the following:
BlockingCollection<MyObject> collection;
public class MyObject
{
public async Task<ReturnObject> DoWork()
{
(...)
return await SomeIOWorkAsync();
}
}
What would be the correct/most performant way to execute all DoWork() tasks asynchronously on all MyObjects in collection concurrently (while capturing the return object), ideally with a sensible thread limit though (I believe the Task Factory/ThreadPool does some management here)?
You can make use of the Task.WhenAll method.
var results = await Task.WhenAll(collection.Select(x => x.DoWork()));
It will start all tasks concurrently and wait for all of them to finish.
ThreadPool manages the number of threads running, but that won't help you much with asynchronous Tasks.
Because of that, you need something else. One way to do this is to utilize ActionBlock from TPL Dataflow:
int limit = …;
IEnumerable<MyObject> collection = …;
var block = new ActionBlock<MyObject>(
o => o.DoWork(),
new ExecutionDataflowBlockOptions { MaxDegreeOfParallelism = limit });
foreach (var obj in collection)
block.Post(obj);
block.Complete();
await block.Completion;
What would be the correct/most performant way to execute all DoWork() tasks asynchronously on all MyObjects in collection concurrently (while capturing the return object), ideally with a sensible thread limit
The easiest way to do that is with Task.WhenAll:
ReturnObject[] results = await Task.WhenAll(collection.Select(x => x.DoWork()));
This will invoke DoWork on all MyObjects in the collection and then wait for them all to complete. The thread pool handles all throttling sensibly.
Is there a different way if I want to capture every individual DoWork() return immediately instead of waiting for all items to complete?
Yes, you can use the method described by Jon Skeet and Stephen Toub. I have a similar solution in my AsyncEx library (available via NuGet), which you can use like this:
// "tasks" is of type "Task<ReturnObject>[]"
var tasks = collection.Select(x => x.DoWork()).OrderByCompletion();
foreach (var task in tasks)
{
var result = await task;
...
}
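If you would rather not add a library, a simple alternative is a Task.WhenAny loop. This is a sketch of a swapped-in technique, not the OrderByCompletion implementation: CollectInCompletionOrderAsync is a hypothetical helper, and because every iteration re-registers on all remaining tasks it is O(n²), so it is only reasonable for small numbers of tasks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

public static class CompletionOrderSketch
{
    // Hypothetical helper: returns the results in the order the tasks finish.
    public static async Task<List<T>> CollectInCompletionOrderAsync<T>(
        IEnumerable<Task<T>> source)
    {
        var pending = source.ToList();      // materialize so every task is started once
        var results = new List<T>();

        while (pending.Count > 0)
        {
            Task<T> finished = await Task.WhenAny(pending); // first remaining task to complete
            pending.Remove(finished);
            results.Add(await finished);    // rethrows here if that task faulted
        }
        return results;
    }

    // Stand-in for DoWork(): completes after the given delay and returns it.
    private static async Task<int> WorkAsync(int delayMs)
    {
        await Task.Delay(delayMs);
        return delayMs;
    }

    public static async Task Main()
    {
        var tasks = new[] { 300, 100, 200 }.Select(WorkAsync);
        List<int> order = await CollectInCompletionOrderAsync(tasks);
        Console.WriteLine("Completion order: " + string.Join(", ", order));
    }
}
```

In a real loop you would process each result where results.Add is called, instead of collecting everything into a list first.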
My comment was a bit cryptic, so I thought I'd add this answer:
List<Task<ReturnObject>> workTasks =
collection.Select( o => o.DoWork() ).ToList();
List<Task> resultTasks =
workTasks.Select( o => o.ContinueWith( t =>
{
ReturnObject r = t.Result;
// do something with the result
},
// if you want to run this on the UI thread
TaskScheduler.FromCurrentSynchronizationContext()
)
)
.ToList();
await Task.WhenAll( resultTasks );