I have a program that generates txt files with randomized contents. Many documents have to be generated, so I use Tasks to get the work done faster. I want to notify the user of the progress being made every few seconds (e.g. "Generated 50000 documents out of 100000"). I create another task, RecordProgress(), which records progress every 3 seconds.
However, if the program generates many tasks, RecordProgress() never runs. If the program only generates 4 tasks, RecordProgress() runs correctly, i.e. it logs every 3 seconds. But if the program generates many tasks, RecordProgress() only runs once processing is finished or nearly finished.
Is there any way to increase the priority of the RecordProgress() task?
I've tried logging progress in each task, but that generates too many log messages to the console, which severely slows down my program.
I've tried logging in each task and waiting 3 seconds between logs, but if the program generates 50 tasks, then 50 messages will be logged to the console at the same time, which once again slows down my program and is unnecessary. I'd rather have ONE log message to the console every 3 seconds.
public void RecordProgress()
{
    Stopwatch sw = new Stopwatch();
    sw.Start();
    //only record data while this generator is generating
    while (_processing)
    {
        if (sw.ElapsedMilliseconds < _logFrequency)
            continue;
        Console.WriteLine("Generated " + _docsGenerated + " documents.");
        sw.Restart();
    }
}
public void GenerateDocs()
{
    List<Task> tasks = new List<Task>();
    _processing = true;
    for (int i = 0; i < 50; i++)
    {
        tasks.Add(Task.Run(() => DoWork()));
    }
    //task which records progress
    //ONLY runs if not many tasks are created above
    Task.Run(() => RecordProgress());
    Task.WaitAll(tasks.ToArray());
}
I'd like the RecordProgress() task to run every 3 seconds regardless of the number of tasks generated by this program.
Edit: As per the comments, I removed the use of Thread.Sleep(). However, that only delayed the starting of my RecordProgress() task.
I've attempted to use a Stopwatch in RecordProgress() to only record progress every 3 seconds, but it greatly slows the performance of my program.
So new question: how to record progress of tasks without using a timer that heavily impacts performance?
In the original case, you create many tasks and exhaust the available threads in the thread pool. Starting RecordProgress() last delays its execution until most of the other tasks are complete. I see you corrected your code.
As for the priority question: keep in mind that you cannot change the priority of a task. You might achieve something like it by implementing your own TaskScheduler, but by default all tasks run at normal priority.
If you really need to run work at a higher priority, you need to create a Thread and set its priority to AboveNormal or Highest. This is something you should be very careful about.
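If you go that route, a minimal sketch might look like the following (assuming the same `_processing` flag from the question; note that a dedicated thread avoids the starvation problem even at normal priority, since it never competes for a pool thread):

```csharp
// Dedicated thread for progress reporting. It is not a thread-pool
// thread, so the 50 worker tasks cannot starve it.
var progressThread = new Thread(RecordProgress)
{
    IsBackground = true,                    // don't keep the process alive
    Priority = ThreadPriority.AboveNormal   // optional priority boost
};
progressThread.Start();
```

The priority boost is rarely needed in practice; simply being off the thread pool is usually enough.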
I've found it:
I've created a Stopwatch object that requires the use of a lock to access. At the end of my DoWork() task, the task locks and checks how much time has passed since the program last logged. If over 3 seconds have passed, the task logs progress and resets the Stopwatch object. The RecordProgress() task is no longer necessary.
public void GenerateDocs()
{
    List<Task> tasks = new List<Task>();
    lock (_lockForLog)
    {
        _swForLogging.Start();
    }
    _processing = true;
    for (int i = 0; i < 50; i++)
    {
        tasks.Add(Task.Run(() => DoWork()));
    }
    Task.WaitAll(tasks.ToArray());
    lock (_lockForLog)
    {
        _swForLogging.Stop();
    }
}
public void DoWork()
{
    //do work
    lock (_lockForLog)
    {
        if (_swForLogging.ElapsedMilliseconds > 3000)
        {
            Console.WriteLine("Generated " + _docsGenerated + " documents.");
            _swForLogging.Restart();
        }
    }
}
Consider a .NET Core 3.1 API with the following endpoints:
GET : /computation - Performs a CPU-intensive computation task
GET : /livecheck
High load on the /computation endpoint:
When there is high load on the '/computation' endpoint, around 300 requests per second, other endpoints slow down because all thread pool threads are used up.
During the high load, a call to the '/livecheck' endpoint takes 5-10 seconds to respond, which is too long.
This is a problem because if the '/livecheck' endpoint does not respond in time, the app is killed (AWS ECS kills the container when the livecheck takes more than 5 seconds).
Is it possible to ensure the '/livecheck' endpoint still responds, by running the '/computation' endpoint on a separate thread pool, so that it does not use up all the worker threads and they remain available for other endpoints?
Note:
'/computation' has to return its result as part of the same request; I don't want to queue it as a background task.
Any other solutions are also welcome.
I would suggest that the CPU-intensive computation is offloaded to another app and not done in the same app as the live check. An async message can be sent to trigger the computation, and the app can subscribe to a finished or failed event from the processing app. That way the heavy processing would not impact the response time of the web API.
Better yet, you can dockerize the apps.
To extend my comments under the question:
I still recommend following the best practices and avoiding blocking calls.
But for now, let's assume you really cannot. Then you should at least have the computation yield its thread periodically, to give the thread pool a chance to run the livecheck.
Consider this example:
void Computation()
{
    for (var i = 0; i < 300; i++)
    {
        Thread.Sleep(1);
    }
}

void LiveCheck()
{
    Console.WriteLine("I'm alive.");
}

async Task Main()
{
    var tasks = new List<Task>();
    // Create 1000 blocking computations to simulate a busy thread pool
    // or thread pool starvation
    for (int i = 0; i < 1000; i++)
    {
        tasks.Add(Task.Run(Computation));
    }
    // 3 seconds after the thread pool gets busy, execute the livecheck
    Thread.Sleep(3000);
    var sw = Stopwatch.StartNew();
    await Task.Run(LiveCheck);
    sw.Stop();
    Console.WriteLine($"LiveCheck completed in {sw.Elapsed.TotalSeconds} seconds");
    await Task.WhenAll(tasks);
}
In most cases, the return result on my machine is like:
LiveCheck completed in 31.2030817 seconds
If the computation instead yields its thread periodically:
async Task Computation()
{
    for (var i = 0; i < 300; i++)
    {
        Thread.Sleep(1);
        // Yield the thread roughly every 50ms
        if ((i % 50) == 0)
        {
            await Task.Yield();
        }
    }
}
Output usually is:
LiveCheck completed in 2.4753268 seconds
The trade-off here is that a single run of the computation will be slower than the sync version.
I'm deciding between using TPL Dataflow blocks and some sort of producer/consumer approach for these tests. Producing a list of tasks will be super-fast, as each task will just be a string containing a list of test parameters such as the setup parameters, the measurements required, and time between measurements. This list of tasks will simply be files loaded through the GUI (1 file per test).
When a test is started, it should start right away. The tests can be very long and very asynchronous, in that an action could take seconds or tens of minutes (e.g. heating up a device), followed by a measurement that takes a few seconds (or minutes), followed by a long period of inaction (24 hours) before the test is repeated again.
I could have up to 16 tests running at the same time, but I need the flexibility to be able to cancel any one of those tests at any time. I also need to be able to ADD a new test at any time (i.e. try to picture testing of 16 devices, or the span of a month in which individual test devices are added and removed throughout the month).
(Visual C#) I tried this example code for TPL Dataflow, where I tell it to run 32 simple tasks all at the same time. Each task is just a 5-second delay to simulate work. It appears to be processing the tasks in parallel, as the time to complete them was 15 seconds. I assume all 32 tasks did not finish in 5 seconds due to scheduling and other overhead, but I am a bit worried that some task might have been blocked.
class Program
{
    // Performs several computations by using dataflow and returns the elapsed
    // time required to perform the computations.
    static TimeSpan TimeDataflowComputations(int messageCount)
    {
        // Create an ActionBlock<int> that performs some work.
        var workerBlock = new ActionBlock<int>(
            // Simulate work by suspending the current thread.
            millisecondsTimeout => Thread.Sleep(millisecondsTimeout),
            // Specify a maximum degree of parallelism.
            new ExecutionDataflowBlockOptions
            {
                MaxDegreeOfParallelism = messageCount
            });

        // Compute the time that it takes for several messages to
        // flow through the dataflow block.
        Stopwatch stopwatch = new Stopwatch();
        stopwatch.Start();

        for (int i = 0; i < messageCount; i++)
        {
            workerBlock.Post(5000); // simulated work: a delay of 5 seconds.
        }
        workerBlock.Complete();

        // Wait for all messages to propagate through the network.
        workerBlock.Completion.Wait();

        // Stop the timer and return the elapsed number of milliseconds.
        stopwatch.Stop();
        return stopwatch.Elapsed;
    }

    static void Main(string[] args)
    {
        int messageCount = 32;
        TimeSpan elapsed;

        // The block's maximum degree of parallelism causes
        // multiple messages to be processed in parallel.
        Console.WriteLine("START:\r\n");
        elapsed = TimeDataflowComputations(messageCount);
        Console.WriteLine("message count = {0}; " +
            "elapsed time = {1}ms.", messageCount,
            (int)elapsed.TotalMilliseconds);
        Console.ReadLine();
    }
}
The demo seems to work, but I am not sure if any of the tasks were blocked until one or more of the 5 second tasks were completed. I am also not sure how one would go about identifying each action block in order to cancel a specific one.
The reason that you don't get the expected performance is because your workload is synchronous and blocks the thread-pool threads. Do you expect to actually have synchronous (blocking) workload in your production environment? If yes, you could try boosting the ThreadPool reserve of available threads before starting the TPL Dataflow pipeline:
ThreadPool.SetMinThreads(workerThreads: 100, completionPortThreads: 100);
If your actual workload is asynchronous, then you could better simulate it with Task.Delay instead of Thread.Sleep.
var workerBlock = new ActionBlock<int>(async millisecondsTimeout =>
{
    await Task.Delay(millisecondsTimeout);
}, new ExecutionDataflowBlockOptions
{
    MaxDegreeOfParallelism = messageCount
});
I didn't test it, but you should get completion times of around 5 seconds with both of these approaches.
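The cancellation part of the question isn't covered above. One hedged sketch (the dictionary and test ids here are assumptions, not part of the original code) is to give each test its own block and its own CancellationTokenSource, keyed by a test id, so any one test can be cancelled without touching the others:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;
using System.Threading.Tasks.Dataflow;

class TestRunner
{
    // One CancellationTokenSource per test, keyed by a hypothetical test id.
    private readonly Dictionary<string, CancellationTokenSource> _cancellers
        = new Dictionary<string, CancellationTokenSource>();

    public ActionBlock<int> CreateTestBlock(string testId)
    {
        var cts = new CancellationTokenSource();
        _cancellers[testId] = cts;
        return new ActionBlock<int>(
            async ms => await Task.Delay(ms, cts.Token), // simulated async work
            new ExecutionDataflowBlockOptions { CancellationToken = cts.Token });
    }

    // Cancel just one test by id; other blocks keep running.
    public void CancelTest(string testId) => _cancellers[testId].Cancel();
}
```

A cancelled block's `Completion` task faults with cancellation, which gives you a place to log or clean up that specific test.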
I have a Task which I do not await because I want it to continue its own logic in the background. Part of that logic is to delay 60 seconds and check back in to see if some minute work is to be done. The abbreviated code looks something like this:
public Dictionary<string, Task> taskQueue = new Dictionary<string, Task>();

// Entry point
public void DoMainWork(string workId, XmlDocument workInstructions)
{
    // A work task (i.e. "workInstructions") is actually a plugin which might use
    // its own tasks internally or any other logic it sees fit.
    var workTask = Task.Factory.StartNew(() => {
        // Main work code that interprets workInstructions
        // .........
        // .........
        // etc.
    }, TaskCreationOptions.LongRunning);

    // Add the work task to the queue of currently running tasks
    taskQueue.Add(workId, workTask);

    // Delay a period of time and then see if we need to extend our timeout for doing main work code
    this.QueueCheckinOnWorkTask(workId); // Note the non-awaited task
}

private async Task QueueCheckinOnWorkTask(string workId)
{
    DateTime startTime = DateTime.Now;

    // Delay 60 seconds
    await Task.Delay(60 * 1000).ConfigureAwait(false);

    // Find out how long Task.Delay delayed for.
    TimeSpan duration = DateTime.Now - startTime; // THIS SOMETIMES DENOTES TIMES MUCH LARGER THAN EXPECTED, I.E. 80+ SECONDS VS. 60

    if (!taskQueue.ContainsKey(workId))
    {
        // Do something based on work being complete
    }
    else
    {
        // Work is not complete, inform outside source we're still working
        QueueCheckinOnWorkTask(workId); // Note the non-awaited task
    }
}
Keep in mind, this is example code just to show an extremely minimal version of what is going on in my actual program.
My problem is that Task.Delay() is delaying for longer than the time specified. Something is blocking this from continuing in a reasonable timeframe.
Unfortunately I haven't been able to replicate the issue on my development machine and it only happens on the server every couple of days. Lastly, it seems related to the number of work tasks we have running at a time.
What would cause this to delay longer than expected? Additionally, how might one go about debugging this type of situation?
This is a follow-up to my other question, which did not receive an answer: await Task.Delay() delaying for longer than expected
Most often that happens because of thread pool saturation. You can clearly see its effect with this simple console application (I measure time the same way you are doing; it doesn't matter in this case whether we use a stopwatch or not):
public class Program {
    public static void Main() {
        for (int j = 0; j < 10; j++)
            for (int i = 1; i < 10; i++) {
                TestDelay(i * 1000);
            }
        Console.ReadKey();
    }

    static async Task TestDelay(int expected) {
        var startTime = DateTime.Now;
        await Task.Delay(expected).ConfigureAwait(false);
        var actual = (int) (DateTime.Now - startTime).TotalMilliseconds;
        ThreadPool.GetAvailableThreads(out int aw, out _);
        ThreadPool.GetMaxThreads(out int mw, out _);
        Console.WriteLine("Thread: {3}, Total threads in pool: {4}, Expected: {0}, Actual: {1}, Diff: {2}",
            expected, actual, actual - expected, Thread.CurrentThread.ManagedThreadId, mw - aw);
        Thread.Sleep(5000);
    }
}
This program starts 90 tasks which await Task.Delay for 1-9 seconds, and then uses Thread.Sleep for 5 seconds to simulate work on the thread on which the continuation runs (a thread pool thread). It also outputs the total number of threads in the thread pool, so you will see how it increases over time.
If you run it, you will see that in almost all cases (except the first 8) the actual time after the delay is much longer than expected, in some cases 5 times longer (you delayed for 3 seconds but 15 seconds have passed).
That's not because Task.Delay is so imprecise. The reason is that the continuation after await has to be executed on a thread pool thread, and the thread pool will not always give you a thread the moment you request one. It can decide that, instead of creating a new thread, it's better to wait for one of the currently busy threads to finish its work. It waits for a certain time, and if no thread has become free, it creates a new one anyway. If you request 10 thread pool threads at once and none is free, it will wait for a while and create a new one. Now you have 9 requests in the queue. It will again wait and create another one. Now you have 8 in the queue, and so on. This waiting for a thread pool thread to become free is what causes the increased delay in this console application (and most likely in your real program): we keep the thread pool threads busy with long Thread.Sleep calls, and the thread pool is saturated.
Some parameters of the heuristics used by the thread pool are available for you to control. The most influential one is the "minimum" number of threads in the pool. The thread pool is expected to always create a new thread without delay until the total number of threads in the pool reaches the configurable minimum. After that, if you request a thread, it might either still create a new one or wait for an existing one to become free.
So the most straightforward way to remove this delay is to increase minimum number of threads in a pool. For example if you do this:
ThreadPool.GetMinThreads(out int wt, out int ct);
ThreadPool.SetMinThreads(100, ct); // increase min worker threads to 100
All tasks in the example above will complete at the expected time with no additional delay.
This is usually not the recommended way to solve this problem, though. It's better to avoid performing long-running heavy operations on thread pool threads, because the thread pool is a global resource and doing so affects your whole application. For example, if we remove Thread.Sleep(5000) from the example above, all tasks will delay for the expected amount of time, because all that keeps the thread pool thread busy now is the Console.WriteLine statement, which completes in no time, making the thread available for other work.
So to sum up: identify the places where you perform heavy work on thread pool threads and avoid doing that (perform heavy work on separate, non-thread-pool threads instead). Alternatively, you might consider increasing the minimum number of threads in the pool to a reasonable amount.
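For the "separate, non-thread-pool threads" option, a minimal sketch is to start the long, blocking work with `TaskCreationOptions.LongRunning`, which in the current .NET implementations runs the delegate on a dedicated thread instead of a pool thread (the `Thread.Sleep` here is just a stand-in for real blocking work):

```csharp
// Heavy, blocking work goes to a dedicated thread, leaving the
// thread pool free to run Task.Delay continuations promptly.
var heavy = Task.Factory.StartNew(
    () => Thread.Sleep(5000),          // stand-in for the real blocking work
    CancellationToken.None,
    TaskCreationOptions.LongRunning,   // hint: use a dedicated thread
    TaskScheduler.Default);
```

Note that `LongRunning` is only a hint to the scheduler, but the default scheduler honors it with a new thread.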
private async Task MainTask(CancellationToken token)
{
    List<Task> tasks = new List<Task>();
    do
    {
        var data = StaticVariables.AllData;
        foreach (var dataPiece in data)
        {
            tasks.Add(new Task(() => DoSomething(dataPiece)));
        }
        Parallel.ForEach(tasks, task => task.Start());
        await Task.WhenAll(tasks);
        tasks.Clear();
        await Task.Delay(2000);
    } while (!token.IsCancellationRequested);
}
The above function is supposed to start a number of DoSomething() calls and run them all at the same time. DoSomething has a timeout of 2 seconds, after which it returns false. After some testing, it seems that the part between
await Task.WhenAll(tasks);
and
tasks.Clear()
is taking roughly 2 sec * the number of tasks. So it would seem they run like this:
Start task
do it or abort after 2 sec
start next task
...
How could I do it so that they all start at the same time and perform their operations simultaneously?
EDIT
Doing it like so:
await Task.WhenAll(data.Select(dataPiece => Task.Run(() => DoSomething(dataPiece))));
results in horrible performance (around 25 sec to complete the old code, 115 sec to complete this)
The issue you are seeing here is due to the fact that the thread pool maintains a minimum number of threads ready to run. If the thread pool needs to create more threads than that minimum, it introduces a deliberate 1 second delay between creating each new thread.
This is done to prevent things like "thread stampedes" from swamping the system with many simultaneous thread creations.
You can change the minimum thread limit using the ThreadPool.SetMinThreads() method. However, it is not recommended to do this, since it is subverting the expected thread pool operation and may cause other processes to slow down.
If you really must do it though, here's an example console application:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

namespace ConsoleApp3
{
    class Program
    {
        static Stopwatch sw = Stopwatch.StartNew();

        static void Main()
        {
            runTasks();
            setMinThreadPoolThreads(30);
            runTasks();
        }

        static void setMinThreadPoolThreads(int count)
        {
            Console.WriteLine("\nSetting min thread pool threads to {0}.\n", count);
            int workerThreads, completionPortThreads;
            ThreadPool.GetMinThreads(out workerThreads, out completionPortThreads);
            ThreadPool.SetMinThreads(count, completionPortThreads);
        }

        static void runTasks()
        {
            var sw = Stopwatch.StartNew();
            Console.WriteLine("\nStarting tasks.");
            var task = test(20);
            Console.WriteLine("Waiting for tasks to finish.");
            task.Wait();
            Console.WriteLine("Finished after " + sw.Elapsed);
        }

        static async Task test(int n)
        {
            var tasks = new List<Task>();
            for (int i = 0; i < n; ++i)
                tasks.Add(Task.Run(new Action(task)));
            await Task.WhenAll(tasks);
        }

        static void task()
        {
            Console.WriteLine("Task starting at time " + sw.Elapsed);
            Thread.Sleep(5000);
            Console.WriteLine("Task stopping at time " + sw.Elapsed);
        }
    }
}
If you run it, you'll see from the output that when test() runs before the minimum thread pool size is set, the tasks take around 10 seconds (and the delay between the task start times increases after the first few tasks).
After setting the minimum thread pool threads to 30, the delay between new tasks starting is much shorter, and the overall time to run test() drops to around 5 seconds (on my PC - yours may be different!).
However, I just want to reiterate that setting the minimum thread pool size is not a normal thing to do, and should be approached with caution. As the Microsoft documentation says:
By default, the minimum number of threads is set to the number of processors on a system. You can use the SetMinThreads method to increase the minimum number of threads. However, unnecessarily increasing these values can cause performance problems. If too many tasks start at the same time, all of them might appear to be slow. In most cases, the thread pool will perform better with its own algorithm for allocating threads. Reducing the minimum to less than the number of processors can also hurt performance.
First of all, you should utilize Task.Run instead of creating and starting tasks in separate steps.
You can do so inside a loop or LINQ-style. If you use LINQ, just ensure that you are not stuck with deferred execution, where the second task only starts after the first one is completed. Create a list, array or some other persistent collection of your selected tasks:
await Task.WhenAll(data.Select(dataPiece => Task.Run(() => DoSomething(dataPiece))).ToList());
The other problem is with the content of DoSomething. As long as this is a synchronous method, it will block its executing thread until it is done. For an inherently asynchronous operation (like pinging some network address), redesigning the method can prevent this thread blocking behavior.
Another option, as answered by Matthew Watson is to increase the amount of available threads, so each task can run in its own thread. This is not the best option, but if you have many tasks that have long blocking time without doing actual work, more threads will help to get the work done.
More threads will not help if the tasks are actually using the available physical resources, CPU or IO bound work.
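As a hedged sketch of the redesign mentioned above, suppose DoSomething pings a network address (an assumption; the question never says what it actually does). Making it truly asynchronous releases the thread for the duration of the wait instead of blocking it:

```csharp
using System.Net.NetworkInformation;
using System.Threading.Tasks;

// Hypothetical async redesign: the calling thread is released while the
// ping (or any other I/O) is in flight, instead of being blocked on it.
static async Task<bool> DoSomethingAsync(string host)
{
    using var ping = new Ping();
    PingReply reply = await ping.SendPingAsync(host, 2000); // 2-second timeout
    return reply.Status == IPStatus.Success;
}
```

With this shape, `await Task.WhenAll(data.Select(d => DoSomethingAsync(d)))` runs all the waits concurrently on a handful of threads.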
Let's say I want to start roughly N tasks per second distributed equally.
So I tried this:
public async Task Generate(int numberOfCallsPerSecond)
{
    var delay = TimeSpan.FromMilliseconds(1000 / numberOfCallsPerSecond); // a call should happen every 1000 / numberOfCallsPerSecond milliseconds
    for (int i = 0; i < numberOfCallsPerSecond; i++)
    {
        Task t = Call(); // don't wait for result here
        await Task.Delay(delay);
    }
}
At first I expected this to run in 1 second but for numberOfCallsPerSecond = 100 it takes 16 seconds on my 12 core CPU.
It seems the await Task.Delay adds a lot of overhead (without it in place, generation of the calls happens in 3 ms).
I didn't expect that await would add so much overhead in this scenario. Is this normal?
EDIT:
Please forget about the Call(). Running this code shows similiar result:
public async Task Generate(int numberOfCallsPerSecond)
{
    var delay = TimeSpan.FromMilliseconds(1000 / numberOfCallsPerSecond); // a call should happen every 1000 / numberOfCallsPerSecond milliseconds
    for (int i = 0; i < numberOfCallsPerSecond; i++)
    {
        await Task.Delay(delay);
    }
}
I tried to run it with numberOfCallsPerSecond = 500 and it takes around 10 seconds; I expected Generate to take roughly 1 second, not 10 times more.
Task.Delay is lightweight but not accurate. Since the loop without delay completes much faster, it sounds like your thread is going idle and using an OS sleep to wait for the timer to elapse. The timer is checked according to the OS thread scheduling quantum (in the same interrupt handler which performs thread pre-emption), which is 16ms by default.
You can reduce the quantum with timeBeginPeriod, but a better (more power efficient) approach if you need rate limiting rather than exact timing is to keep track of elapsed time (the Stopwatch class is good for this) and number of calls made, and only delay when calls made have caught up to elapsed time. The overall effect is that your thread will wake up ~60 times per second, and start a few work items each time it does. If your CPU becomes busy with something else, you'll start extra work items when you get control back -- although it's also pretty straightforward to cap the number of items started at once, if that's what you need.
public async Task Generate(int numberOfCallsPerSecond)
{
    var elapsed = Stopwatch.StartNew();
    var delay = TimeSpan.FromMilliseconds(1000 / numberOfCallsPerSecond); // a call should happen every 1000 / numberOfCallsPerSecond milliseconds
    for (int i = 0; i < numberOfCallsPerSecond; i++)
    {
        Call(); // don't wait for result here
        var expectedI = (int)(elapsed.Elapsed.TotalSeconds * numberOfCallsPerSecond);
        if (i > expectedI) await Task.Delay(delay);
    }
}
My psychic debugger says your Call method has a significant synchronous part (i.e. the part before an await) which takes time to execute synchronously.
If you want the Generate method only to "fire up" these Call calls and have them run concurrently (including the synchronous parts) you need to offload them to a ThreadPool thread using Task.Run:
var task = Task.Run(() => Call());
await Task.Delay(delay);
Task.Delay adds almost no overhead. It uses a System.Threading.Timer internally that requires very little resources.
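As an illustrative sketch (not from the answer above), the same periodic behavior can be had directly from System.Threading.Timer, whose callbacks fire on thread pool threads without a dedicated waiting thread:

```csharp
using System;
using System.Threading;

class TimerDemo
{
    static void Main()
    {
        // Fire the callback after 3 seconds, then every 3 seconds.
        // Callbacks run on thread pool threads; no thread sits blocked waiting.
        using var timer = new Timer(
            _ => Console.WriteLine("tick"),  // periodic callback
            null,                            // no state object
            TimeSpan.FromSeconds(3),         // due time before first tick
            TimeSpan.FromSeconds(3));        // period between ticks

        Thread.Sleep(10000);                 // keep the process alive for a few ticks
    }
}
```

Because the callbacks run on the pool, a saturated thread pool can delay them too, which is the same failure mode the answers above describe.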
If you use a TimeSpan with Task.Delay(), it'll kill the CPU. Use an integer and it won't. True story; no idea why.