I am using Task.Run(() => this.someMethod()) to schedule a back ground job. I am not interested in the operation result and need to move on with flow of application.
But, sometimes my background task is not getting scheduled for a long time.This has started to happen since we moved from .Net 4.7 from 4.5. Even while debugging the break points are either not hit or hit after considerable delay( > 10 minutes).
Has anyone noticed this behavior or know whats causing it?
I am running on i7 core, 16 GB RAM.
Having your Task taking 10 minutes to even start sounds fishy. My guess is your system is under heavy load, or you have a lot of tasks running.
I'm going to attack the later (for a specific situation).
TaskCreationOptions
LongRunning Specifies that a task will be a long-running,
coarse-grained operation involving fewer, larger components than
fine-grained systems. It provides a hint to the TaskScheduler that
oversubscription may be warranted. Oversubscription lets you create
more threads than the available number of hardware threads. It also
provides a hint to the task scheduler that an additional thread might
be required for the task so that it does not block the forward
progress of other threads or work items on the local thread-pool
queue.
var task = new Task(() => MyLongRunningMethod(),TaskCreationOptions.LongRunning);
task.Start();
This is a quote from Stephen Toub - MSFT on this post
Under the covers, it's going to result in a higher number of threads
being used, because its purpose is to allow the ThreadPool to continue
to process work items even though one task is running for an extended
period of time; if that task were running in a thread from the pool,
that thread wouldn't be able to service other tasks. You'd typically
only use LongRunning if you found through performance testing that not
using it was causing long delays in the processing of other work.
Its hard to know what is your problem with out looking at all your code, however i posted this as a sugestion.
I don't know how many tasks you start this way, but unless the number is really high I would focus the debugging on the method being called, not the caller. A delay of 10 minutes is more likely caused by a deadlock or network issue than the task scheduling.
Some ideas:
For a start, I would add something to the beginning and the end of the method being called that lets you know when it starts to execute and when it finishes. Like a Debug.WriteLine() with a timestamp and task ID.
Make sure the method being called releases all resources, even if it crashes. Crashing threads/tasks may go unnoticed, because they don't cause the application to crash.
Double check that the method being called is thread-safe. You may have been lucky in the past and some new framework optimization is now causing havoc.
Related
Context: we have a task which might take from 30 seconds to 5 minutes depending on a service we are consuming in some Azure Functions.
We are planning to monitor the current status of that task object to make sure it's running and has not been cancelled/faulted.
There are two ways to go around it:
Create a Task, run it and then cancel it when the main task is finished. Alternatively, maybe use Task.Delay along with a while with a condition.
Create a Thread, run it and wait for it to finish (with a while condition to avoid a while that runs forever).
We have done some research and have realised that both have pros and cons. But we are still not sure about which one would be the best approach and why.
In a similar scenario, what would you use? A task, a thread, or something else?
Using a thread is a bit wasteful, but slightly more reliable.
It is wasteful because each thread allocates 1 MB of memory just for its mere existence.
It is more reliable because it doesn't depend on the availability of ThreadPool threads for running a timer event. A sudden burst in demand for ThreadPool threads could leave the ThreadPool starved for several seconds, or even minutes (in extreme scenarios).
So if wasting 1 MB of memory is a non-issue for the app, use a thread. On the other hand if the absolute precision in the timing of the events is something unimportant, use a task.
You could also use a task started with the option LongRunning, but this is essentially a thread in disguise.
When working with tasks, a rule of thumb appears to be that the thread pool - typically used by e.g. invoking Task.Run(), or Parallel.Invoke() - should be used for relatively short operations. When working with long running operations, we are supposed to use the TaskCreationOptions.LongRunning flag in order to - as far as I understand it - avoid clogging the thread pool queue, i.e. to push work to a newly-created thread.
But what exactly is a long running operation? How long is long, in terms of time? Are there other factors besides the expected task duration to be considered when deciding whether or not to use the LongRunning, like the anticipated CPU architecture (frequency, the number of cores, ...) or the number of tasks that will be attempted to be run at once from the programmer's perspective?
For example, suppose I have 500 tasks to process in a dedicated application, each taking 10-20 seconds to complete. Should I just start all 500 tasks using Task.Run (e.g. in a loop) and then await them all, perhaps as LongRunning, while leaving the default max level of concurrency? Then again, if I set LongRunning in such case, wouldn't this create 500 new threads and actually cause a lot of overhead and higher memory usage (due to extra threads being allocated) as compared to omitting LongRunning? This is assuming that no new tasks will be scheduled for execution while these 500 are being awaited.
I would guess that the decision to set LongRunning depends on the number of requests made to the thread pool in a given time interval, and that LongRunning should only be used for tasks that are expected to take significantly longer that the majority of the thread pool-placed tasks - by definition, at most a small percentage of all tasks. In other words, this appears to be a queuing and thread pool utilization optimization problem that should likely be solved case-by-case through testing, if at all. Am I correct?
It kind of doesn't matter. The problem isn't really about time, it's about what your code is doing. If you're doing asynchronous I/O, you're only using the thread for the short amount of time between individual requests. If you're doing CPU work... well, you're using the CPU. There's no "thread-pool starvation", because the CPUs are fully utilized.
The real problem is when you're doing blocking work that doesn't use the CPU. In case like that, thread-pool starvation leads to CPU-underutilization - you said "I need the CPU for my work" and then you don't actually use it.
If you're not using blocking APIs, there's no point in using Task.Run with LongRunning. If you have to run some legacy blocking code asynchronously, using LongRunning may be a good idea. Total work time isn't as important as "how often you are doing this". If you spin up one thread based on a user clicking on a GUI, the cost is tiny compared to all the latencies already included in the act of clicking a button in the first place, and you can use LongRunning just fine to avoid the thread-pool. If you're running a loop that spawns lots of blocking tasks... stop doing that. It's a bad idea :D
For example, imagine there is no asynchronous API alternative File.Exists. So if you see that this is giving you trouble (e.g. over a faulty network connection), you'd fire it up using Task.Run - and since you're not doing CPU work, you'd use LongRunning.
In contrast, if you need to do some image manipulation that's basically 100% CPU work, it doesn't matter how long the operation takes - it's not a LongRunning thing.
And finally, the most common scenario for using LongRunning is when your "work" is actually the old-school "loop and periodically check if something should be done, do it and then loop again". Long running, but 99% of the time just blocking on some wait handle or something like that. Again, this is only useful when dealing with code that isn't CPU-bound, but that doesn't have proper asynchronous APIs. You might find something like this if you ever need to write your own SynchronizationContext, for example.
Now, how do we apply this to your example? Well, we can't, not without more information. If your code is CPU-bound, Parallel.For and friends are what you want - those ensure you only use enough threads to sature the CPUs, and it's fine to use the thread-pool for that. If it's not CPU bound... you don't really have any option besides using LongRunning if you want to run the tasks in parallel. Ideally, such work would consist of asynchronous calls you can safely invoke and await Task.WhenAll(...) from your own thread.
When working with tasks, a rule of thumb appears to be that the thread pool - typically used by e.g. invoking Task.Run(), or Parallel.Invoke() - should be used for relatively short operations. When working with long running operations, we are supposed to set the TaskCreationOptions.LongRunning to true in order to - as far as I understand it - avoid clogging the thread pool queue, i.e. to push work to a newly-created thread.
The vast majority of the time, you don't need to use LongRunning at all, because the thread pool will adjust to "losing" a thread to a long-running operation after 2 seconds.
The main problem with LongRunning is that it forces you to use the very dangerous StartNew API.
In other words, this appears to be a queuing and thread pool utilization optimization problem that should likely be solved case-by-case through testing, if at all. Am I correct?
Yes. You should never set LongRunning when first writing code. If you are seeing delays due to the thread pool injection rate, then you can carefully add LongRunning.
You should not use TaskCreationOptions.LongRunning in your case. I would use Parallel.For.
The LongRunning option is not to be used if you're going to create a lot of tasks, just like in your case. It is to be used for creating couple of tasks that will be running for a Long Time.
By the way, i never used this option in any similar scenario.
As you point out, TaskCreationOptions.LongRunning's purpose is
to allow the ThreadPool to continue to process work items even though one task is running for an extended period of time
As for when to use it:
It's not a specific length per se...You'd typically only use LongRunning if you found through performance testing that not using it was causing long delays in the processing of other work.
Source
By default, the CLR runs tasks on pooled threads, which is ideal for
short-running compute-bound work. For longer-running and blocking
operations, you can prevent use of a pooled thread as follows:
Task task = Task.Factory.StartNew (() => ...,
TaskCreationOptions.LongRunning);
I am reading topic about thread and task. Can you explain to me what are "long[er]-running" and "short-running" tasks?
In general thread pooling, you distinguish short-running and long-running threads based on the comparison between their start-up time and run time.
Threads generally take some time to be created and get up to the point where they can start running your code.
The means that if you run a large number of threads where they each take a minute to start but only run for a second (not accurate times but the intent here is simply to show the relationship), the run time of each will be swamped by the time taken to get them going in the first place.
That's one of the reasons for using a thread pool: the threads aren't terminated once their work is done. Instead, they hang around to be reused so that the start-up time isn't incurred again.
So, in that sense, a long running thread is one whose run time is far greater than the time required to start it. In that case, the start-up time is far less important than it is for short running threads.
Conversely, short running threads are ones whose run time is less than or comparable to the start-up time.
For .NET specifically, it's a little different in operation. The thread pooling code will, once it's reached the minimum number of threads, attempt to limit thread creation to one per half-second.
Hence, if you know your thread is going to be long running, you should notify the scheduler so that it can adjust itself accordingly. This will probably mean just creating a new thread rather than grabbing one from the pool, so that the pool can be left to service short-running tasks as intended (no guarantees on that behaviour but it would make sense to do it that way).
However, that doesn't change the meaning of long-running and short-running, all it means is that there's some threshold at which it makes sense to distinguish between the two. For .NET, I would suggest the half-second figure would be a decent choice.
After working a while noticed that, even if you spawn 1000 tasks, they don't start immediately. So basically even if i start 1000 tasks, 100 of them running and 900 of them waiting to run.
So my question is, how are they begining to start ?
How .net determines when to start running task or make it waittorun ?
What methodolgy i can follow to start them immediately ?
I want to have certain number of task/thread running all the time.
If i use threads instead of tasks would they start running immediately or .net will start them as it please like tasks ?
Question may not be very clear, so please ask me to clarify.
Basically i am spawning 1000 (keeping this number spawned. when 1 task completed starting another task) tasks but only 125 of them Running and 875 of them WaitingToRun :)
this is how i start task
Task.Factory.StartNew(() =>
{
startCheckingProxies();
});
c# wpf 4.5
If you are talking about Task objects, they are run on top of the thread pool, so they will not all start immediately by running each on a separate thread. Instead, the limited number of tasks will initially be started on threads coming from the pool, and then the threads will be reused to run next tasks and so on.
Of course, this is just a high-level description, the logic behind is more complex and implements lot of optimizations.
You can find more info here and here
You can also start tasks with the overload of StartNew which lets you tweak options and scheduler settings. Note, however, that running on a large number of threads will likely result in worse performance. Thread creation and context switching have significant costs, and running thousands of threads will, IMO, backfire.
Tasks are really just threads under the hood.
There is a limit to how much benefit you can get by spawning new threads. Each thread has some overhead, so at some point, the overhead is going to exceed the benefit of spawning a new thread. If you leave the spawning of those tasks to the Framework, it is going to decide for itself how many threads it's going to run at once, and it's going to make that decision based on how much productivity it thinks it can get from those threads.
I'm pretty sure that optimal number is not going to be a thousand; I've written Windows Services where the optimal number of threads to run at the same time is the number of cores in the machine (in my case, it was 4).
I updated my code to use Tasks instead of threads....
Looking at memory usage and CPU I do not notices any improvements on the multi-core PC, Is this expected?
My application essentially starts up threads/tasks in different objects when it runs...
All I'm doing is a simple
Task a = new Task(...)
a.Start();
There are various implications to using Tasks instead of Threads, but performance isn't a major one (assuming you weren't creating huge numbers of threads.) A few key differences:
The default TaskScheduler will use thread pooling, so some Tasks may not start until other pending Tasks have completed. If you use Thread directly, every use will start a new Thread.
When an exception occurs in a Task, it gets wrapped into an AggregateException that calling code can receive when it waits for the Task to complete or if you register a continuation on the Task. This is because you can also do things like wait on multiple Tasks to complete, in which case multiple exceptions can be thrown and aggregated.
If you don't observe an unhandled exception thrown by a Task, it will (well, may) eventually be thrown by the finalizer of the Task, which is particularly nasty. I always recommend hooking the TaskScheduler.UnobservedTaskException event so that you can at least log these failures before the application blows up. This is different from Thread exceptions, which show up in the AppDomain.UnhandledException event.
If you simply replaced every usage of Thread with Task and did no other changes I would expect virtually the same performance. The Task API is really just that, it's an API over an existing set of constructs. Under the hood it uses threads to schedule it's activities and hence has similar performance characteristics.
What's great about Task are the new things you can do with them
Composition with ContinueWith
Cancellation
Hierarchies
Etc ...
One great improvement of Takss vs. Threads is that you can easiely build chains of tasks. You can specify when a task should start after the previous task ("OnSuccess", "OnError", a.s.o.) and you can specify if there should be a synchronization context switch. That gives you the great opportunity to run a long running task in bakcground and after that a UI refershing task on the UI thread.
If you are using .Net 4.0 then you can use the Parallel.Invoke method like so
Parallel.Invoke(()=> {
// What ever code you add here will get threaded.
});
for more info see
http://msdn.microsoft.com/en-us/library/dd992634.aspx
You would see difference if your original or converted code do not utlize CPU completely. I.e. if original code always limited number of threads to 2, on quad-core machine it will run at about 50% load with manually created threads and potentially 100% load with tasks (if your tasks can be actaully paralellized). So it looks like either your original code was reasonable from performance point of view, or both implemetaion suffer issues showing similar underutiliztion of CPU.