By default, the CLR runs tasks on pooled threads, which is ideal for
short-running compute-bound work. For longer-running and blocking
operations, you can prevent use of a pooled thread as follows:
Task task = Task.Factory.StartNew (() => ...,
TaskCreationOptions.LongRunning);
I am reading about threads and tasks. Can you explain what "long[er]-running" and "short-running" tasks are?
In general thread pooling, you distinguish short-running and long-running threads based on the comparison between their start-up time and run time.
Threads generally take some time to be created and get up to the point where they can start running your code.
This means that if you run a large number of threads where they each take a minute to start but only run for a second (not accurate times but the intent here is simply to show the relationship), the run time of each will be swamped by the time taken to get them going in the first place.
That's one of the reasons for using a thread pool: the threads aren't terminated once their work is done. Instead, they hang around to be reused so that the start-up time isn't incurred again.
So, in that sense, a long running thread is one whose run time is far greater than the time required to start it. In that case, the start-up time is far less important than it is for short running threads.
Conversely, short running threads are ones whose run time is less than or comparable to the start-up time.
For .NET specifically, it's a little different in operation. The thread pooling code will, once it's reached the minimum number of threads, attempt to limit thread creation to one per half-second.
Hence, if you know your thread is going to be long running, you should notify the scheduler so that it can adjust itself accordingly. This will probably mean just creating a new thread rather than grabbing one from the pool, so that the pool can be left to service short-running tasks as intended (no guarantees on that behaviour but it would make sense to do it that way).
However, that doesn't change the meaning of long-running and short-running, all it means is that there's some threshold at which it makes sense to distinguish between the two. For .NET, I would suggest the half-second figure would be a decent choice.
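To see whether the hint really moves work off the pool, a small check like the following can help. It is only a sketch: the class and method names are made up for illustration, and the result reflects what the default scheduler currently does rather than any documented guarantee.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class LongRunningDemo
{
    // Runs a tiny delegate with the given options on the default scheduler
    // and reports whether it executed on a thread-pool thread.
    public static bool RunsOnPoolThread(TaskCreationOptions options)
    {
        Task<bool> task = Task.Factory.StartNew(
            () => Thread.CurrentThread.IsThreadPoolThread,
            CancellationToken.None,
            options,
            TaskScheduler.Default);   // be explicit; do not rely on TaskScheduler.Current
        return task.Result;
    }
}
```

Calling LongRunningDemo.RunsOnPoolThread(TaskCreationOptions.None) and then the same with TaskCreationOptions.LongRunning lets you compare the two cases on your own runtime.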
When working with tasks, a rule of thumb appears to be that the thread pool - typically used by e.g. invoking Task.Run(), or Parallel.Invoke() - should be used for relatively short operations. When working with long running operations, we are supposed to use the TaskCreationOptions.LongRunning flag in order to - as far as I understand it - avoid clogging the thread pool queue, i.e. to push work to a newly-created thread.
But what exactly is a long running operation? How long is long, in terms of time? Are there other factors besides the expected task duration to be considered when deciding whether or not to use the LongRunning, like the anticipated CPU architecture (frequency, the number of cores, ...) or the number of tasks that will be attempted to be run at once from the programmer's perspective?
For example, suppose I have 500 tasks to process in a dedicated application, each taking 10-20 seconds to complete. Should I just start all 500 tasks using Task.Run (e.g. in a loop) and then await them all, perhaps as LongRunning, while leaving the default max level of concurrency? Then again, if I set LongRunning in such case, wouldn't this create 500 new threads and actually cause a lot of overhead and higher memory usage (due to extra threads being allocated) as compared to omitting LongRunning? This is assuming that no new tasks will be scheduled for execution while these 500 are being awaited.
I would guess that the decision to set LongRunning depends on the number of requests made to the thread pool in a given time interval, and that LongRunning should only be used for tasks that are expected to take significantly longer than the majority of the thread pool-placed tasks - by definition, at most a small percentage of all tasks. In other words, this appears to be a queuing and thread pool utilization optimization problem that should likely be solved case-by-case through testing, if at all. Am I correct?
It kind of doesn't matter. The problem isn't really about time, it's about what your code is doing. If you're doing asynchronous I/O, you're only using the thread for the short amount of time between individual requests. If you're doing CPU work... well, you're using the CPU. There's no "thread-pool starvation", because the CPUs are fully utilized.
The real problem is when you're doing blocking work that doesn't use the CPU. In a case like that, thread-pool starvation leads to CPU underutilization - you said "I need the CPU for my work" and then you don't actually use it.
If you're not using blocking APIs, there's no point in using Task.Run with LongRunning. If you have to run some legacy blocking code asynchronously, using LongRunning may be a good idea. Total work time isn't as important as "how often you are doing this". If you spin up one thread based on a user clicking on a GUI, the cost is tiny compared to all the latencies already included in the act of clicking a button in the first place, and you can use LongRunning just fine to avoid the thread-pool. If you're running a loop that spawns lots of blocking tasks... stop doing that. It's a bad idea :D
For example, imagine there is no asynchronous API alternative to File.Exists. So if you see that this is giving you trouble (e.g. over a faulty network connection), you'd fire it up using Task.Factory.StartNew - and since you're not doing CPU work, you'd use LongRunning.
In contrast, if you need to do some image manipulation that's basically 100% CPU work, it doesn't matter how long the operation takes - it's not a LongRunning thing.
And finally, the most common scenario for using LongRunning is when your "work" is actually the old-school "loop and periodically check if something should be done, do it and then loop again". Long running, but 99% of the time just blocking on some wait handle or something like that. Again, this is only useful when dealing with code that isn't CPU-bound, but that doesn't have proper asynchronous APIs. You might find something like this if you ever need to write your own SynchronizationContext, for example.
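A rough sketch of that "loop and wait" shape, assuming a BlockingCollection<int> as the thing being waited on. The class and method names here are invented for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class BlockingLoopWorker
{
    // Starts a consumer that spends most of its life blocked, waiting for items.
    // LongRunning hints that this should not occupy a pooled thread.
    public static Task<List<int>> StartConsumer(BlockingCollection<int> queue)
    {
        return Task.Factory.StartNew(() =>
        {
            var seen = new List<int>();
            // GetConsumingEnumerable blocks until an item arrives and
            // ends once CompleteAdding has been called and the queue is empty.
            foreach (int item in queue.GetConsumingEnumerable())
                seen.Add(item);
            return seen;
        }, CancellationToken.None, TaskCreationOptions.LongRunning, TaskScheduler.Default);
    }
}
```

The producer side just calls queue.Add(...) whenever there is work and queue.CompleteAdding() when the worker should shut down.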
Now, how do we apply this to your example? Well, we can't, not without more information. If your code is CPU-bound, Parallel.For and friends are what you want - those ensure you only use enough threads to saturate the CPUs, and it's fine to use the thread-pool for that. If it's not CPU bound... you don't really have any option besides using LongRunning if you want to run the tasks in parallel. Ideally, such work would consist of asynchronous calls you can safely invoke and await Task.WhenAll(...) from your own thread.
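For the ideal case, where the work is truly asynchronous, the fan-out could look roughly like this. FakeIoAsync is a hypothetical stand-in for whatever asynchronous API you actually have; Task.Delay only simulates the wait.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

static class AsyncFanOut
{
    // Hypothetical stand-in for a real asynchronous I/O call.
    private static async Task<int> FakeIoAsync(int id)
    {
        await Task.Delay(50);   // no thread is blocked while this waits
        return id * 2;
    }

    // Starts `count` operations at once and awaits them together.
    // Task.WhenAll returns results in the same order as the input tasks.
    public static async Task<int[]> RunAllAsync(int count)
    {
        Task<int>[] tasks = Enumerable.Range(0, count)
                                      .Select(i => FakeIoAsync(i))
                                      .ToArray();
        return await Task.WhenAll(tasks);
    }
}
```

With 500 operations this needs neither 500 threads nor LongRunning, because no thread is held while the operations are pending.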
When working with tasks, a rule of thumb appears to be that the thread pool - typically used by e.g. invoking Task.Run(), or Parallel.Invoke() - should be used for relatively short operations. When working with long running operations, we are supposed to set the TaskCreationOptions.LongRunning to true in order to - as far as I understand it - avoid clogging the thread pool queue, i.e. to push work to a newly-created thread.
The vast majority of the time, you don't need to use LongRunning at all, because the thread pool will adjust to "losing" a thread to a long-running operation after 2 seconds.
The main problem with LongRunning is that it forces you to use the very dangerous StartNew API.
In other words, this appears to be a queuing and thread pool utilization optimization problem that should likely be solved case-by-case through testing, if at all. Am I correct?
Yes. You should never set LongRunning when first writing code. If you are seeing delays due to the thread pool injection rate, then you can carefully add LongRunning.
You should not use TaskCreationOptions.LongRunning in your case. I would use Parallel.For.
The LongRunning option is not to be used if you're going to create a lot of tasks, as in your case. It is meant for creating a couple of tasks that will run for a long time.
By the way, I have never used this option in any similar scenario.
As you point out, TaskCreationOptions.LongRunning's purpose is
to allow the ThreadPool to continue to process work items even though one task is running for an extended period of time
As for when to use it:
It's not a specific length per se...You'd typically only use LongRunning if you found through performance testing that not using it was causing long delays in the processing of other work.
Source
Okay, So I wanted to know what happens when I use TaskCreationOptions.LongRunning. By this answer, I came to know that for long running tasks, I should use this options because it creates a thread outside of threadpool.
Cool. But what advantage would I get when I create a thread outside threadpool? And when to do it and avoid it?
what advantage would I get when I create a thread outside threadpool?
The threadpool, as its name states, is a pool of threads which are allocated once and re-used throughout, in order to save the time and resources necessary to allocate a thread. The pool itself re-sizes on demand. If you queue more work than actual workers exist in the pool, it will allocate more threads in 500ms intervals, one at a time (this exists to avoid allocation of multiple threads simultaneously where existing threads may already finish executing and can serve requests). If many long running operations are performed on the thread-pool, it causes "thread starvation", meaning delegates will start getting queued and run only once a thread frees up. That's why you'd want to avoid a large number of thread-pool threads doing lengthy work.
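If you want to see the limits your own process is working with, the pool exposes them directly. A small sketch (the PoolInfo class is just a name for the example); the numbers vary by runtime version and machine.

```csharp
using System;
using System.Threading;

static class PoolInfo
{
    // Reads the current minimum and maximum thread counts of the pool.
    // Up to the minimum, threads are created on demand; beyond it,
    // the pool throttles how quickly it adds more.
    public static string Describe()
    {
        ThreadPool.GetMinThreads(out int minWorker, out int minIo);
        ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
        return $"worker threads: min={minWorker}, max={maxWorker}; " +
               $"I/O threads: min={minIo}, max={maxIo}";
    }
}
```

Printing PoolInfo.Describe() at startup is a cheap first step before deciding whether starvation is a realistic concern for your workload.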
The Managed Thread-Pool docs also have a section on this question:
There are several scenarios in which it is appropriate to create and manage your own threads instead of using thread pool threads:
You require a foreground thread.
You require a thread to have a particular priority.
You have tasks that cause the thread to block for long periods of time. The thread pool has a maximum number of threads, so a large number of blocked thread pool threads might prevent tasks from starting.
You need to place threads into a single-threaded apartment. All ThreadPool threads are in the multithreaded apartment.
You need to have a stable identity associated with the thread, or to dedicate a thread to a task.
For more, see:
Thread vs ThreadPool
When should I not use the ThreadPool in .Net?
Dedicated thread or thread-pool thread?
"Long running" can be quantified pretty well, a thread that takes more than half a second is running long. That's a mountain of processor instructions on a modern machine, you'd have to burn a fat five billion of them per second. Pretty hard to do in a constructive way unless you are calculating the value of Pi to thousands of decimals in the fraction.
Practical threads can only take that long when they are not burning core but are waiting a lot. Invariably on an I/O completion, like reading data from a disk, a network, a dbase server. And often the reason you'd start considering using a thread in the first place.
The threadpool has a "manager". It determines when a threadpool thread is allowed to start. It doesn't happen immediately when you start it in your program. The manager tries to limit the number of running threads to the number of CPU cores you have. It is much more efficient that way, context switching between too many active threads is expensive. And a good throttle, preventing your program from consuming too many resources in a burst.
But the threadpool manager has the very common problem with managers, it doesn't know enough about what is going on. Just like my manager doesn't know that I'm goofing off at Stackoverflow.com, the tp manager doesn't know that a thread is waiting for something and not actually performing useful work. Without that knowledge it cannot make good decisions. A thread that does a lot of waiting should be ignored and another one should be allowed to run in its place. Actually doing real work.
Just like you tell your manager that you go on vacation, so he can expect no work to get done, you tell the threadpool manager the same thing with LongRunning.
Do note that it isn't quite as bad as it, perhaps, sounds in this answer. In particular, .NET 4.0 hired a new manager that's a lot smarter at figuring out the optimum number of running threads. It does so with a feedback loop, collecting data to discover if active threads actually get work done, and adjusts the optimum accordingly. The only problem with this approach is the common one when you close a feedback loop: you have to make it slow so the loop cannot become unstable. In other words, it isn't particularly quick at driving up the number of active threads.
If you know ahead of time that the thread is pretty abysmal, running for many seconds with no real cpu load then always pick LongRunning. Otherwise it is a tuning job, observing the program when it is done and tinkering with it to make it more optimal.
After working with this for a while, I noticed that, even if you spawn 1000 tasks, they don't start immediately. So even if I start 1000 tasks, 100 of them are running and 900 of them are waiting to run.
So my question is: how do they get started?
How does .NET determine when to start running a task and when to leave it in WaitingToRun?
What approach can I follow to start them immediately?
I want to have a certain number of tasks/threads running all the time.
If I use threads instead of tasks, would they start running immediately, or will .NET start them as it pleases, like tasks?
The question may not be very clear, so please ask me to clarify.
Basically I am spawning 1000 tasks (keeping this number spawned: when one task completes, another is started), but only 125 of them are Running and 875 of them are WaitingToRun :)
This is how I start a task:
Task.Factory.StartNew(() =>
{
startCheckingProxies();
});
If you are talking about Task objects, they are run on top of the thread pool, so they will not all start immediately by running each on a separate thread. Instead, the limited number of tasks will initially be started on threads coming from the pool, and then the threads will be reused to run next tasks and so on.
Of course, this is just a high-level description, the logic behind is more complex and implements lot of optimizations.
You can find more info here and here
You can also start tasks with the overload of StartNew which lets you tweak options and scheduler settings. Note, however, that running on a large number of threads will likely result in worse performance. Thread creation and context switching have significant costs, and running thousands of threads will, IMO, backfire.
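If the real goal is "keep a fixed number of jobs in flight", one common approach is to throttle explicitly instead of spawning everything up front. A sketch using SemaphoreSlim follows; the Throttle class, its parameters and the peak counter are illustrative names, and the job is assumed to be asynchronous.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

static class Throttle
{
    // Runs `total` jobs with at most `maxParallel` in flight at any time.
    // Returns the highest concurrency actually observed.
    public static async Task<int> RunAsync(int total, int maxParallel, Func<int, Task> job)
    {
        using var gate = new SemaphoreSlim(maxParallel);
        object sync = new object();
        int running = 0, peak = 0;
        var tasks = new List<Task>();

        for (int i = 0; i < total; i++)
        {
            await gate.WaitAsync();          // wait for a free slot before starting the next job
            int id = i;
            tasks.Add(Task.Run(async () =>
            {
                lock (sync) { running++; if (running > peak) peak = running; }
                try { await job(id); }
                finally
                {
                    lock (sync) { running--; }
                    gate.Release();          // hand the slot to the next job
                }
            }));
        }

        await Task.WhenAll(tasks);
        return peak;
    }
}
```

As soon as one job finishes, its slot is released and the loop starts the next one, so the number in flight stays at or below maxParallel without depending on how the pool decides to grow.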
Tasks are not threads themselves, but under the hood they are executed by threads (by default, thread-pool threads).
There is a limit to how much benefit you can get by spawning new threads. Each thread has some overhead, so at some point, the overhead is going to exceed the benefit of spawning a new thread. If you leave the spawning of those tasks to the Framework, it is going to decide for itself how many threads it's going to run at once, and it's going to make that decision based on how much productivity it thinks it can get from those threads.
I'm pretty sure that optimal number is not going to be a thousand; I've written Windows Services where the optimal number of threads to run at the same time is the number of cores in the machine (in my case, it was 4).
In an attempt to speed up processing of physics objects in C# I decided to change a linear update algorithm into a parallel algorithm. I believed the best approach was to use the ThreadPool as it is built for completing a queue of jobs.
When I first implemented the parallel algorithm, I queued up a job for every physics object. Keep in mind, a single job completes fairly quickly (updates forces, velocity, position, checks for collision with the old state of any surrounding objects to make it thread safe, etc). I would then wait on all jobs to be finished using a single wait handle, with an interlocked integer that I decremented each time a physics object completed (upon hitting zero, I then set the wait handle). The wait was required as the next task I needed to do involved having the objects all be updated.
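The wait scheme described here (one work item per object, an interlocked counter, and a single wait handle set when the counter reaches zero) might look roughly like this. FrameUpdate and updateOne are names invented for the sketch; the built-in CountdownEvent class wraps the same idea.

```csharp
using System;
using System.Threading;

static class FrameUpdate
{
    // Queues one pool work item per index and blocks until all of them finish.
    public static void UpdateAll(int count, Action<int> updateOne)
    {
        if (count <= 0) return;

        int remaining = count;
        using (var done = new ManualResetEvent(false))
        {
            for (int i = 0; i < count; i++)
            {
                int index = i;   // copy the loop variable for the closure
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    try { updateOne(index); }
                    finally
                    {
                        // The last work item to finish signals the waiter.
                        if (Interlocked.Decrement(ref remaining) == 0)
                            done.Set();
                    }
                });
            }
            done.WaitOne();
        }
    }
}
```

Note that an exception thrown by updateOne on a pool thread is unhandled here and would normally bring the process down; a real implementation would need to catch and report it.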
The first thing I noticed was that performance was crazy. When averaged, the thread pooling seemed to be going a bit faster, but had massive spikes in performance (on the order of 10 ms per update, with random jumps to 40-60ms). I attempted to profile this using ANTS, however I could not gain any insight into why the spikes were occurring.
My next approach was to still use the ThreadPool, however instead I split all the objects into groups. I initially started with only 8 groups, as that was how many cores my computer had. The performance was great. It far outperformed the single threaded approach, and had no spikes (about 6ms per update).
The only thing I thought about was that, if one job completed before the others, there would be an idle core. Therefore, I increased the number of jobs to about 20, and even up to 500. As I expected, it dropped to 5ms.
So my questions are as follows:
Why would spikes occur when I made the job sizes quick / many?
Is there any insight into how the ThreadPool is implemented that would help me to understand how best to use it?
Using threads has a price - you need context switching, you need locking (the job queue is most probably locked when a thread tries to fetch a new job) - it all comes at a price. This price is usually small compared to the actual work your thread is doing, but if the work ends quickly, the price becomes meaningful.
Your solution seems correct. A reasonable rule of thumb is to have twice as many threads as there are cores.
As you probably expect yourself, the spikes are likely caused by the code that manages the thread pools and distributes tasks to them.
For parallel programming, there are more sophisticated approaches than "manually" distributing work across different threads (even if using the threadpool).
See Parallel Programming in the .NET Framework for instance for an overview and different options. In your case, the "solution" may be as simple as this:
Parallel.ForEach(physicObjects, physicObject => Process(physicObject));
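When each item is very cheap, the grouping idea from the question can be kept while still letting the framework decide how many threads to use. A sketch using range partitioning follows; the position/velocity arrays are a made-up stand-in for the physics objects.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

static class ChunkedUpdate
{
    // Advances every position by velocity * dt, processing contiguous
    // index ranges so the per-item delegate overhead is paid once per chunk.
    public static void Run(double[] positions, double[] velocities, double dt)
    {
        if (positions.Length == 0) return;   // Partitioner.Create rejects an empty range

        Parallel.ForEach(Partitioner.Create(0, positions.Length), range =>
        {
            // range.Item1 is inclusive, range.Item2 is exclusive.
            for (int i = range.Item1; i < range.Item2; i++)
                positions[i] += velocities[i] * dt;
        });
    }
}
```

Partitioner.Create also has an overload that takes an explicit range size, which is one knob to try when tuning chunk granularity.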
Here's my take on your two questions:
I'd like to start with question 2 (how the thread pool works) because it actually holds the key to answering question 1. The thread pool is implemented (without going into details) as a (thread-safe) work queue and a group of worker threads (which may shrink or enlarge as needed). As the user calls QueueUserWorkItem the task is put into the work queue. The workers keep polling the queue and taking work if they are idle. Once they manage to take a task, they execute it and then return to the queue for more work (this is very important!). So the work is done by the workers on-demand: as the workers become idle they take more pieces of work to do.
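To make the "work queue plus workers" picture concrete, here is a deliberately simplified toy pool. It is not how the real ThreadPool is implemented (that one resizes itself and uses per-thread queues with work stealing); ToyPool is just an illustration of the basic loop.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

sealed class ToyPool : IDisposable
{
    private readonly BlockingCollection<Action> _queue = new BlockingCollection<Action>();
    private readonly Thread[] _workers;

    public ToyPool(int workerCount)
    {
        _workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            _workers[i] = new Thread(WorkLoop) { IsBackground = true };
            _workers[i].Start();
        }
    }

    // Puts a piece of work into the shared queue.
    public void Queue(Action work) => _queue.Add(work);

    // Each worker takes an item, runs it, then comes back for more.
    private void WorkLoop()
    {
        foreach (Action work in _queue.GetConsumingEnumerable())
            work();
    }

    // Stops accepting work, lets the workers drain the queue, then waits for them.
    public void Dispose()
    {
        _queue.CompleteAdding();
        foreach (Thread t in _workers) t.Join();
        _queue.Dispose();
    }
}
```

Because idle workers pull from one shared queue, a worker that gets a short item simply returns sooner for the next one, which is the load-balancing effect discussed in this answer.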
Having said the above, it's simple to see the answer to question 1 (why did you see a performance difference with more fine-grained tasks): it's because with fine-grained tasks you get more load-balancing (a very desirable property), i.e. your workers do more or less the same amount of work and all cores are exploited uniformly. As you said, with a coarse-grained task distribution, there may be longer and shorter tasks, so one or more cores may be lagging behind, slowing down the overall computation, while others do nothing. With small tasks the problem goes away. Each worker thread takes one small task at a time and then goes back for more. If one thread picks up a shorter task it will go to the queue more often. If it takes a longer task it will go to the queue less often, so things are balanced.
Finally, when the jobs are too fine-grained, and considering that the pool may enlarge to over 1K threads, there is very high contention on the queue when all threads go back to take more work (which happens very often), which may account for the spikes you are seeing. If the underlying implementation uses a blocking lock to access the queue, then context switches are very frequent which hurts performance a lot and makes it seem rather random.
Answer to question 1:
This is because of thread switching. Thread switching (context switching, in OS terms) costs CPU cycles every time the processor moves from one thread to another. Most of the time multithreading speeds up a program, but when each unit of work is very small and quick, the context switching takes more time than the work itself, so overall throughput decreases. You can find more information about this in books on operating-system concepts.
Answer to question 2:
I only have a general picture of the ThreadPool and can't explain its exact structure.
To learn more about the ThreadPool, start with the ThreadPool Class documentation.
Each version of the .NET Framework adds more capabilities that use the ThreadPool indirectly, such as the Parallel.ForEach method mentioned before, added in .NET 4 along with System.Threading.Tasks, which makes code more readable and neat. You can learn more about this under Task Schedulers as well.
At a very basic level, what it does is this: it creates, let's say, 20 threads and puts them into a list. Each time it receives a delegate to execute asynchronously, it takes an idle thread from the list and executes the delegate. If no thread is available, it puts the delegate into a queue. Every time a delegate finishes executing, the thread checks whether the queue has any items and, if so, dequeues one and executes it on the same thread.
I have written a program which depends on threads heavily. In addition, there is a requirement to measure the total time taken by each thread, and also the execution time (kernel time plus user time).
There can be an arbitrary number of threads and many may run at once. This is down to user activity. I need them to run as quickly as possible, so using something which has some overhead like WMI/Performance Monitor to measure thread times is not ideal.
At the moment, I'm using GetThreadTimes, as shown in this article: http://www.codeproject.com/KB/dotnet/ExecutionStopwatch.aspx
My question is simple: I understand .NET threads may not correspond on a one-to-one basis with system threads (though in all my testing so far, it seems to have been one to one). That being the case, if .NET decides to put two or more of my threads into one system thread, am I going to get strange results from my timing code? If so (or even if not), is there another way to measure the kernel and user time of a .NET thread?
To quote: Multithreading is managed internally by a thread scheduler, a function the CLR typically delegates to the operating system. A thread scheduler ensures all active threads are allocated appropriate execution time, and that threads that are waiting or blocked (for instance, on an exclusive lock or on user input) do not consume CPU time.
Theoretically the .NET team could implement their own scheduler, but I doubt they have. So I think the GetThreadTimes function is what you need.
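For reference, a minimal P/Invoke sketch of measuring kernel plus user time around a piece of work with GetThreadTimes. It is Windows-only, the ThreadCpuTimer wrapper is a name made up here, and the Thread.BeginThreadAffinity call is a precaution against the managed thread being moved to another OS thread by a custom host, not something the question established as necessary.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;
using System.Threading;

static class ThreadCpuTimer
{
    [DllImport("kernel32.dll")]
    private static extern IntPtr GetCurrentThread();

    // FILETIME values are 64-bit counts of 100 ns units, marshalled here as long.
    [DllImport("kernel32.dll", SetLastError = true)]
    [return: MarshalAs(UnmanagedType.Bool)]
    private static extern bool GetThreadTimes(IntPtr thread,
        out long creation, out long exit, out long kernel, out long user);

    // Returns the kernel + user CPU time the calling thread spent inside `work`.
    public static TimeSpan Measure(Action work)
    {
        if (!RuntimeInformation.IsOSPlatform(OSPlatform.Windows))
            throw new PlatformNotSupportedException("GetThreadTimes is a Win32 API.");

        Thread.BeginThreadAffinity();   // ask the host to keep this code on one OS thread
        try
        {
            long before = ReadCpuTicks();
            work();
            // 100 ns units are exactly TimeSpan ticks.
            return TimeSpan.FromTicks(ReadCpuTicks() - before);
        }
        finally
        {
            Thread.EndThreadAffinity();
        }
    }

    private static long ReadCpuTicks()
    {
        if (!GetThreadTimes(GetCurrentThread(), out _, out _, out long kernel, out long user))
            throw new Win32Exception(Marshal.GetLastWin32Error());
        return kernel + user;
    }
}
```

The resolution of these counters is tied to the system timer tick, so very short pieces of work may legitimately report zero.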