Given the following code...
static void DoSomething(int id) {
Thread.Sleep(50);
Console.WriteLine(#"DidSomething({0})", id);
}
I know I can convert this to an async task as follows...
static async Task DoSomethingAsync(int id) {
await Task.Delay(50);
Console.WriteLine(#"DidSomethingAsync({0})", id);
}
And that by doing so if I am calling multiple times (Task.WhenAll) everything will be faster and more efficient than perhaps using Parallel.Foreach or even calling from within a loop.
But for a minute, lets pretend that Task.Delay() does not exist and I actually have to use Thread.Sleep(); I know in reality this is not the case, but this is concept code and where the Delay/Sleep is would normally be an IO operation where there is no async option (such as early EF).
I have tried the following...
static async Task DoSomethingAsync2(int id) {
await Task.Run(() => {
Thread.Sleep(50);
Console.WriteLine(#"DidSomethingAsync({0})", id);
});
}
But, though it runs without error, according to Lucien Wischik this is in fact bad practice as it is merely spinning up threads from the pool to complete each task (it is also slower using the following console application - if you swap between DoSomethingAsync and DoSomethingAsync2 call you can see a significant difference in the time that it takes to complete)...
static void Main(string[] args) {
MainAsync(args).Wait();
}
static async Task MainAsync(String[] args) {
List<Task> tasks = new List<Task>();
for (int i = 1; i <= 1000; i++)
tasks.Add(DoSomethingAsync2(i)); // Can replace with any version
await Task.WhenAll(tasks);
}
I then tried the following...
static async Task DoSomethingAsync3(int id) {
await new Task(() => {
Thread.Sleep(50);
Console.WriteLine(#"DidSomethingAsync({0})", id);
});
}
Transplanting this in place of the original DoSomethingAsync, the test never completes and nothing is shown on screen!
I have also tried multiple other variations that either do not compile or do not complete!
So, given the constraint that you cannot call any existing asynchronous methods and must complete both the Thread.Sleep and the Console.WriteLine in an asynchronous task, how do you do it in a manner that is as efficient as the original code?
The objective here for those of you who are interested is to give me a better understanding of how to create my own async methods where I am not calling anybody elses. Despite many searches, this seems to be the one area where examples are really lacking - whilst there are many thousands of examples of calling async methods that call other async methods in turn I cannot find any that convert an existing void method to an async task where there is no call to a further async task other than those that use the Task.Run(() => {} ) method.
There are two kinds of tasks: those that execute code (e.g., Task.Run and friends), and those that respond to some external event (e.g., TaskCompletionSource<T> and friends).
What you're looking for is TaskCompletionSource<T>. There are various "shorthand" forms for common situations so you don't always have to use TaskCompletionSource<T> directly. For example, Task.FromResult or TaskFactory.FromAsync. FromAsync is most commonly used if you have an existing *Begin/*End implementation of your I/O; otherwise, you can use TaskCompletionSource<T> directly.
For more information, see the "I/O-bound Tasks" section of Implementing the Task-based Asynchronous Pattern.
The Task constructor is (unfortunately) a holdover from Task-based parallelism, and should not be used in asynchronous code. It can only be used to create a code-based task, not an external event task.
So, given the constraint that you cannot call any existing asynchronous methods and must complete both the Thread.Sleep and the Console.WriteLine in an asynchronous task, how do you do it in a manner that is as efficient as the original code?
I would use a timer of some kind and have it complete a TaskCompletionSource<T> when the timer fires. I'm almost positive that's what the actual Task.Delay implementation does anyway.
So, given the constraint that you cannot call any existing
asynchronous methods and must complete both the Thread.Sleep and the
Console.WriteLine in an asynchronous task, how do you do it in a
manner that is as efficient as the original code?
IMO, this is a very synthetic constraint that you really need to stick with Thread.Sleep. Under this constraint, you still can slightly improve your Thread.Sleep-based code. Instead of this:
static async Task DoSomethingAsync2(int id) {
await Task.Run(() => {
Thread.Sleep(50);
Console.WriteLine(#"DidSomethingAsync({0})", id);
});
}
You could do this:
static Task DoSomethingAsync2(int id) {
return Task.Run(() => {
Thread.Sleep(50);
Console.WriteLine(#"DidSomethingAsync({0})", id);
});
}
This way, you'd avoid an overhead of the compiler-generated state machine class. There is a subtle difference between these two code fragments, in how exceptions are propagated.
Anyhow, this is not where the bottleneck of the slowdown is.
(it is also slower using the following console application - if you
swap between DoSomethingAsync and DoSomethingAsync2 call you can see a
significant difference in the time that it takes to complete)
Let's look one more time at your main loop code:
static async Task MainAsync(String[] args) {
List<Task> tasks = new List<Task>();
for (int i = 1; i <= 1000; i++)
tasks.Add(DoSomethingAsync2(i)); // Can replace with any version
await Task.WhenAll(tasks);
}
Technically, it requests 1000 tasks to be run in parallel, each supposedly to run on its own thread. In an ideal universe, you'd expect to execute Thread.Sleep(50) 1000 times in parallel and complete the whole thing in about 50ms.
However, this request is never satisfied by the TPL's default task scheduler, for a good reason: thread is a precious and expensive resource. Moreover, the actual number of concurrent operations is limited to the number of CPUs/cores. So in reality, with the default size of ThreadPool, I'm getting 21 pool threads (at peak) serving this operation in parallel. That is why DoSomethingAsync2 / Thread.Sleep takes so much longer than DoSomethingAsync / Task.Delay. DoSomethingAsync doesn't block a pool thread, it only requests one upon the completion of the time-out. Thus, more DoSomethingAsync tasks can actually run in parallel, than DoSomethingAsync2 those.
The test (a console app):
// https://stackoverflow.com/q/21800450/1768303
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;
namespace Console_21800450
{
public class Program
{
static async Task DoSomethingAsync(int id)
{
await Task.Delay(50);
UpdateMaxThreads();
Console.WriteLine(#"DidSomethingAsync({0})", id);
}
static async Task DoSomethingAsync2(int id)
{
await Task.Run(() =>
{
Thread.Sleep(50);
UpdateMaxThreads();
Console.WriteLine(#"DidSomethingAsync2({0})", id);
});
}
static async Task MainAsync(Func<int, Task> tester)
{
List<Task> tasks = new List<Task>();
for (int i = 1; i <= 1000; i++)
tasks.Add(tester(i)); // Can replace with any version
await Task.WhenAll(tasks);
}
volatile static int s_maxThreads = 0;
static void UpdateMaxThreads()
{
var threads = Process.GetCurrentProcess().Threads.Count;
// not using locks for simplicity
if (s_maxThreads < threads)
s_maxThreads = threads;
}
static void TestAsync(Func<int, Task> tester)
{
s_maxThreads = 0;
var stopwatch = new Stopwatch();
stopwatch.Start();
MainAsync(tester).Wait();
Console.WriteLine(
"time, ms: " + stopwatch.ElapsedMilliseconds +
", threads at peak: " + s_maxThreads);
}
static void Main()
{
Console.WriteLine("Press enter to test with Task.Delay ...");
Console.ReadLine();
TestAsync(DoSomethingAsync);
Console.ReadLine();
Console.WriteLine("Press enter to test with Thread.Sleep ...");
Console.ReadLine();
TestAsync(DoSomethingAsync2);
Console.ReadLine();
}
}
}
Output:
Press enter to test with Task.Delay ...
...
time, ms: 1077, threads at peak: 13
Press enter to test with Thread.Sleep ...
...
time, ms: 8684, threads at peak: 21
Is it possible to improve the timing figure for the Thread.Sleep-based DoSomethingAsync2? The only way I can think of is to use TaskCreationOptions.LongRunning with Task.Factory.StartNew:
You should think twice before doing this in any real-life application:
static async Task DoSomethingAsync2(int id)
{
await Task.Factory.StartNew(() =>
{
Thread.Sleep(50);
UpdateMaxThreads();
Console.WriteLine(#"DidSomethingAsync2({0})", id);
}, TaskCreationOptions.LongRunning | TaskCreationOptions.PreferFairness);
}
// ...
static void Main()
{
Console.WriteLine("Press enter to test with Task.Delay ...");
Console.ReadLine();
TestAsync(DoSomethingAsync);
Console.ReadLine();
Console.WriteLine("Press enter to test with Thread.Sleep ...");
Console.ReadLine();
TestAsync(DoSomethingAsync2);
Console.ReadLine();
}
Output:
Press enter to test with Thread.Sleep ...
...
time, ms: 3600, threads at peak: 163
The timing gets better, but the price for this is high. This code asks the task scheduler to create a new thread for each new task. Do not expect this thread to come from the pool:
Task.Factory.StartNew(() =>
{
Thread.Sleep(1000);
Console.WriteLine("Thread pool: " +
Thread.CurrentThread.IsThreadPoolThread); // false!
}, TaskCreationOptions.LongRunning).Wait();
Related
I want to create a Task, which may run for many minutes, collecting data via an API call to another system. At some point in the future I need to stop the task and return the collected data. This future point is unknown at the time of starting the task.
I have read many question about returning data from tasks, but I can't find any that answer this scenario. I may be missing a trick, but all of the examples actually seem to wait in the man thread for the task to finish before continuing. This seems counter-intuitive, surely the purpose of a task is to hand off an activity whilst continuing with other activities in your main thread?
Here is one of those many examples, taken from DotNetPearls..
namespace TaskBasedAsynchronousProgramming
{
class Program
{
static void Main(string[] args)
{
Console.WriteLine($"Main Thread Started");
Task<double> task1 = Task.Run(() =>
{
return CalculateSum(10);
});
Console.WriteLine($"Sum is: {task1.Result}");
Console.WriteLine($"Main Thread Completed");
Console.ReadKey();
}
static double CalculateSum(int num)
{
double sum = 0;
for (int count = 1; count <= num; count++)
{
sum += count;
}
return sum;
}
}
}
Is it possible to do what I need, and have a long-running task running in parallel, stop it and return the data at an arbitrary future point?
Here is a sample application how you can do that:
static double partialResult = -1;
static void Main()
{
CancellationTokenSource calculationEndSignal = new(TimeSpan.FromSeconds(3));
Task meaningOfLife = Task.Run(() =>
GetTheMeaningOfLife(calculationEndSignal.Token),
calculationEndSignal.Token);
calculationEndSignal.Token.Register(() => Console.WriteLine(partialResult));
Console.ReadLine();
}
static async Task GetTheMeaningOfLife(CancellationToken cancellationToken)
{
foreach (var semiResult in Enumerable.Range(1, 42))
{
partialResult = semiResult;
cancellationToken.ThrowIfCancellationRequested();
await Task.Delay(1000);
}
}
partialResult is a shared variable between the two threads
The worker thread (GetTheMeaningOfLife) only writes it
The main thread (Main) only reads it
The read operation is performed only after the Task has been cancelled
calculationEndSignal is used to cancel the long-running operation
I've have specified a timeout, but you can call the Cancel method if you want
meaningOfLife is the Task which represents the long-running operation call
I have passed the CancellationToken to the GetTheMeaningOfLife and to the Task.Run as well
For this very simple example the Task.Run should not need to receive the token but it is generally a good practice to pass there as well
Register is receiving a callback which should be called after the token is cancelled
ReadLine can be any other computation
I've used ReadLine to keep the application running
GetTheMeaningOfLife simply increments the partialResult shared variable
either until it reaches the meaning of life
or until it is cancelled
Here is one approach. It features a CancellationTokenSource that is used as a stopping mechanism, instead of its normal usage as a cancellation mechanism. That's because you want to get the partial results, and a canceled Task does not propagate results:
CancellationTokenSource stoppingTokenSource = new();
Task<List<int>> longRunningTask = Task.Run(() =>
{
List<int> list = new();
for (int i = 1; i <= 60; i++)
{
if (stoppingTokenSource.IsCancellationRequested) break;
// Simulate a synchronous operation that has 1 second duration.
Thread.Sleep(1000);
list.Add(i);
}
return list;
});
Then, somewhere else in your program, you can send a stopping signal to the task, and then await asynchronously until the task acknowledges the signal and completes successfully. The await will also propagate the partial results:
stoppingTokenSource.Cancel();
List<int> partialResults = await longRunningTask;
Or, if you are not in an asynchronous workflow, you can wait synchronously until the partial results are available:
stoppingTokenSource.Cancel();
List<int> partialResults = longRunningTask.Result;
I am thinking a way to handle and expensive process which would require me to call and get multiple data from multiple Industrial field. So for each career I will make them into one individual task inside the Generate Report Class. I would like to ask that when using Task.WhenAll is it that task 1,task 2,task 3 will run together without waiting task1 to be completed. But if using Task.WaitAll it will wait for MiningSector to Complete in order to run ChemicalSector. Am I correct since this is what I wanted to achieve.
public static async Task<bool> GenerateReport()
{
static async Task<string> WriteText(string name)
{
var start = DateTime.Now;
Console.WriteLine("Enter {0}, {1}", name, start);
return name;
}
static async Task MiningSector()
{
var task1 = WriteText("task1");
var task2 = WriteText("task2");
var task3 = WriteText("task3");
await Task.WhenAll(task1, task2, task3); //Run All 3 task together
Console.WriteLine("MiningSectorresults: {0}", String.Join(", ", t1.Result, t2.Result, t3.Result));
}
static async Task ChemicalsSector()
{
var task4 = WriteText("task4");
var task5 = WriteText("task5");
var task6 = WriteText("task6");
await Task.WhenAll(task4 , task5 , task6 ); //Run All 3 task together
Console.WriteLine("ChemicalsSectorresults: {0}", String.Join(", ", t1.Result, t2.Result, t3.Result));
}
static void Main(string[] args)
{
Task.WaitAll(MiningSector(), ChemicalsSector()); //Wait when MiningSectoris complete then start ChemicalsSector.
}
return true;
}
The thing I wanted to achieve is, inside MiningSector have 10 or more functions need to be run and ChemicalSector is same as well. I wanted to run all the functions inside the MiningSector at parallel once it's done then the ChemicalSector will start running the function.
But if using Task.WaitAll it will wait for MiningSector to Complete in order to run ChemicalSector. Am I correct since this is what I wanted to achieve.
No. The key thing to understand here is that arguments are evaluated first, and then the method call. This is the same way all arguments and method calls work across the entire C# language.
So this code:
Task.WaitAll(MiningSector(), ChemicalsSector());
has the same semantics as this code:
var miningTask = MiningSector();
var sectorTask = ChemicalsSector();
Task.WaitAll(miningTask, sectorTask);
In other words, both methods are called (and both tasks started) before Task.WaitAll is called.
If you want to do one then the other, then don't call the second method until the first task has completed. This is most easily done using async Main, e.g.:
static async Task Main(string[] args)
{
await MiningSector();
await ChemicalsSector();
}
I'm currently working on a concurrent file downloader.
For that reason I want to parametrize the number of concurrent tasks. I don't want to wait for all the tasks to be completed but to keep the same number being runned.
In fact, this thread on star overflow gave me a proper clue, but I'm struggling making it async:
Keep running a specific number of tasks
Here is my code:
public async Task StartAsync()
{
var semaphore = new SemaphoreSlim(1, _concurrentTransfers);
var queueHasMessages = true;
while (queueHasMessages)
{
try {
await Task.Run(async () =>
{
await semaphore.WaitAsync();
await asyncStuff();
});
}
finally {
semaphore.Release();
};
}
}
But the code just get executed one at a time. I think that the await is blocking me for generating the desired amount of tasks, but I don't know how to avoid it while respecting the limit established by the semaphore.
If I add all the tasks to a list and make a whenall, the semaphore throws an exception since it has reached the max count.
Any suggestions?
It was brought to my attention that the struck-through solution will drop any exceptions that occur during execution. That's bad.
Here is a solution that will not drop exceptions:
Task.Run is a Factory Method for creating a Task. You can check yourself with the intellisense return value. You can assign the returned Task anywhere you like.
"await" is an operator that will wait until the task it operates on completes. You are able to use any Task with the await operator.
public static async Task RunTasksConcurrently()
{
IList<Task> tasks = new List<Task>();
for (int i = 1; i < 4; i++)
{
tasks.Add(RunNextTask());
}
foreach (var task in tasks) {
await task;
}
}
public static async Task RunNextTask()
{
while(true) {
await Task.Delay(500);
}
}
By adding the values of the Task we create to a list, we can await them later on in execution.
Previous Answer below
Edit: With the clarification I think I understand better.
Instead of running every task at once, you want to start 3 tasks, and as soon as a task is finished, run the next one.
I believe this can happen using the .ContinueWith(Action<Task>) method.
See if this gets closer to your intended solution.
public void SpawnInitialTasks()
{
for (int i = 0; i < 3; i++)
{
RunNextTask();
}
}
public void RunNextTask()
{
Task.Run(async () => await Task.Delay(500))
.ContinueWith(t => RunNextTask());
// Recurse here to keep running tasks whenever we finish one.
}
The idea is that we spawn 3 tasks right away, then whenever one finishes we spawn the next. If you need to keep data flowing between the tasks, you can use parameters:
RunNextTask(DataObject object)
You can do this easily the old-fashioned way without using await by using Parallel.ForEach(), which lets you specify the maximum number of concurrent threads to use.
For example:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;
namespace Demo
{
class Program
{
public static void Main(string[] args)
{
IEnumerable<string> filenames = Enumerable.Range(1, 100).Select(x => x.ToString());
Parallel.ForEach(
filenames,
new ParallelOptions { MaxDegreeOfParallelism = 4},
download
);
}
static void download(string filepath)
{
Console.WriteLine("Downloading " + filepath);
Thread.Sleep(1000); // Simulate downloading time.
Console.WriteLine("Downloaded " + filepath);
}
}
}
If you run this and observe the output, you'll see that the "files" are being "downloaded" in batchs.
A better simulation is the change download() so that it takes a random amount of time to process each "file", like so:
static Random rng = new Random();
static void download(string filepath)
{
Console.WriteLine("Downloading " + filepath);
Thread.Sleep(500 + rng.Next(1000)); // Simulate random downloading time.
Console.WriteLine("Downloaded " + filepath);
}
Try that and see the difference in the output.
However, if you want a more modern way to do this, you could look into the Dataflow part of the TPL (Task Parallel Library) - this works well with async methods.
This is a lot more complicated to get to grips with, but it's a lot more powerful. You could use an ActionBlock to do it, but describing how to do that is a bit beyond the scope of an answer I could give here.
Have a look at this other answer on StackOverflow; it gives a brief example.
Also note that the TPL is not built in to .Net - you have to get it from NuGet.
There is such application:
static void Main(string[] args)
{
HandleRequests(10).Wait();
HandleRequests(50).Wait();
HandleRequests(100).Wait();
HandleRequests(1000).Wait();
Console.ReadKey();
}
private static async Task IoBoundWork()
{
await Task.Delay(100);
}
private static void CpuBoundWork()
{
Thread.Sleep(100);
}
private static async Task HandleRequest()
{
CpuBoundWork();
await IoBoundWork();
}
private static async Task HandleRequests(int numberOfRequests)
{
var sw = Stopwatch.StartNew();
var tasks = new List<Task>();
for (int i = 0; i < numberOfRequests; i++)
{
tasks.Add(HandleRequest());
}
await Task.WhenAll(tasks.ToArray());
sw.Stop();
Console.WriteLine(sw.Elapsed);
}
Below the output of this app:
From my perspective having CPU-bound and IO-bound parts in one method it is quite regular situation, e.g. parsing/archiving/serialization of some object and saving that to the disk, so it should probably work well. However in the implementation above it works very slow. Could you please help me to understand why?
If we wrap the body of CpuBoundWork() in Task it significantly improve performance:
private static async Task CpuBoundWork()
{
await Task.Run(() => Thread.Sleep(100));
}
private static async Task HandleRequest()
{
await CpuBoundWork();
await IoBoundWork();
}
Why it works so slow without Task.Run? Why we can see performance boost after adding Task.Run? Should we always use such approach in similar methods?
for (int i = 0; i < numberOfRequests; i++)
{
tasks.Add(HandleRequest());
}
The returned task is created at the first await in the HandleRequest(). So you are executing all CPU bound code on one thread: the for loop thread. complete serialization, no parallelism at all.
When you wrap the CPU part in a task you are actually submitting the CPU part as Tasks, so they are executed in parallel.
The way you're doing, this is what happens:
|-----------HandleRequest Timeline-----------|
|CpuBoundWork Timeline| |IoBoundWork Timeline|
Try doing it like this:
private static async Task HandleRequest()
{
await IoBoundWork();
CpuBoundWork();
}
It has the advantage of starting the IO work and while it waits, the CpuBoundWork() can do the processing. You only await at the last moment you need the response.
The timeline would look somewhat like this:
|--HandleRequest Timeline--|
|Io...
|CpuBoundWork Timeline|
...BoundWork Timeline|
On a side note, open extra threads (Task.Run) with caution in an web environment, you already have a thread per request, so multiplying them will have a negative impact on scalability.
You've indicated that your method ought to be asynchronous, by having it return a Task, but you've not actually made it (entirely) asynchronous. You're implementation of the method does a bunch of expensive, long running, work synchronously, and then returns to the caller and does some other work asynchronously.
Your callers of the method, however, assume that it's actually asynchronous (in entirety) and that it doesn't do expensive work synchronously. They assume that they can call the method multiple times, have it return immediately, and then continue on, but since your implementation doesn't return immediately, and instead does a bunch of expensive work before returning, that code doesn't work properly (specifically, it's not able to start the next operation until the previous one returns, so that synchronous work isn't being done in parallel).
Note that your "fix" isn't quite idiomatic. You're using the async over sync anti-pattern. Rather than making CpuBoundWork async and having it return a Task, despite being a CPU bound operation, it should remain as is an HandleRequest should handle indicating that the CPU bound work should be done asynchronously in another thread by calling Task.Run:
private static async Task HandleRequest()
{
await Task.Run(() => CpuBoundWork());
await IoBoundWork();
}
I'm having some trouble getting a task to asynchronously delay. I am writing an application that needs to run at a scale of tens/hundreds of thousands of asynchronously executing scripts. I am doing this using C# Actions and sometimes, during the execution of a particular sequence, in order for the script to execute properly, it needs to wait on an external resource to reach an expected state. At first I wrote this using Thread.Sleep() but that turned out to be a torpedo in the applications performance, so I'm looking into async/await for async sleep. But I can't get it to actually wait on the pause! Can someone explain this?
static void Main(string[] args)
{
var sync = new List<Action>();
var async = new List<Action>();
var syncStopWatch = new Stopwatch();
sync.Add(syncStopWatch.Start);
sync.Add(() => Thread.Sleep(1000));
sync.Add(syncStopWatch.Stop);
sync.Add(() => Console.Write("Sync:\t" + syncStopWatch.ElapsedMilliseconds + "\n"));
var asyncStopWatch = new Stopwatch();
sync.Add(asyncStopWatch.Start);
sync.Add(async () => await Task.Delay(1000));
sync.Add(asyncStopWatch.Stop);
sync.Add(() => Console.Write("Async:\t" + asyncStopWatch.ElapsedMilliseconds + "\n"));
foreach (Action a in sync)
{
a.Invoke();
}
foreach (Action a in async)
{
a.Invoke();
}
}
The results of the execution are:
Sync: 999
Async: 2
How do I get it to wait asynchronously?
You're running into a problem with async void. When you pass an async lambda to an Action, the compiler is creating an async void method for you.
As a best practice, you should avoid async void.
One way to do this is to have your list of actions actually be a List<Func<Task>> instead of List<Action>. This allows you to queue async Task methods instead of async void methods.
This means your "execution" code would have to wait for each Task as it completes. Also, your synchronous methods would have to return Task.FromResult(0) or something like that so they match the Func<Task> signature.
If you want a bigger scope solution, I recommend you strongly consider TPL Dataflow instead of creating your own queue.