2000 Worker Threads and only few real - c#

I paused VS and came to the thread window. I see there > 2000 "Worker Thread" entries with the same call stack and different Id's (threads are created with Task.Factory.StartNew
method).
All these threads are waiting for one lock to be unlocked. This can be a bug in my application. The issue, that when I come to the task manager, i see +- standard amount of thread and memory usage. Is this a CLR optimization to not have many idle thread, or VS thread window bug?

It is a bug in your code. Deadlock is one of the universal threading bugs.
Getting to 2000 threads is possible. It is the job of the ThreadPool manager to limit the number of threads that can run. Governed by its SetMaxThreads() method. The default is a ridiculously large number, 1023 on my 4 core laptop. Depends on the .NET version as well, you probably have an 8 core machine. Actually getting that many started takes a while.
Deadlock is the easier threading bug to solve, you've got a lot of time to look at the call stacks to figure out where they are deadlocking. Unlike threading race bugs, the really nasty ones you're liable to get when you remove whatever lock causes the deadlock right now. Temporarily calling ThreadPool.SetMaxThreads(4, 10000) to limit the carnage is a decent strategy to not drown in the number of threads to look at and make the debugging attempt seem futile.

Related

Fire and Forget not getting scheduled

I am using Task.Run(() => this.someMethod()) to schedule a back ground job. I am not interested in the operation result and need to move on with flow of application.
But, sometimes my background task is not getting scheduled for a long time.This has started to happen since we moved from .Net 4.7 from 4.5. Even while debugging the break points are either not hit or hit after considerable delay( > 10 minutes).
Has anyone noticed this behavior or know whats causing it?
I am running on i7 core, 16 GB RAM.
Having your Task taking 10 minutes to even start sounds fishy. My guess is your system is under heavy load, or you have a lot of tasks running.
I'm going to attack the later (for a specific situation).
TaskCreationOptions
LongRunning Specifies that a task will be a long-running,
coarse-grained operation involving fewer, larger components than
fine-grained systems. It provides a hint to the TaskScheduler that
oversubscription may be warranted. Oversubscription lets you create
more threads than the available number of hardware threads. It also
provides a hint to the task scheduler that an additional thread might
be required for the task so that it does not block the forward
progress of other threads or work items on the local thread-pool
queue.
var task = new Task(() => MyLongRunningMethod(),TaskCreationOptions.LongRunning);
task.Start();
This is a quote from Stephen Toub - MSFT on this post
Under the covers, it's going to result in a higher number of threads
being used, because its purpose is to allow the ThreadPool to continue
to process work items even though one task is running for an extended
period of time; if that task were running in a thread from the pool,
that thread wouldn't be able to service other tasks. You'd typically
only use LongRunning if you found through performance testing that not
using it was causing long delays in the processing of other work.
Its hard to know what is your problem with out looking at all your code, however i posted this as a sugestion.
I don't know how many tasks you start this way, but unless the number is really high I would focus the debugging on the method being called, not the caller. A delay of 10 minutes is more likely caused by a deadlock or network issue than the task scheduling.
Some ideas:
For a start, I would add something to the beginning and the end of the method being called that lets you know when it starts to execute and when it finishes. Like a Debug.WriteLine() with a timestamp and task ID.
Make sure the method being called releases all resources, even if it crashes. Crashing threads/tasks may go unnoticed, because they don't cause the application to crash.
Double check that the method being called is thread-safe. You may have been lucky in the past and some new framework optimization is now causing havoc.

What is the advantage of creating a thread outside threadpool?

Okay, So I wanted to know what happens when I use TaskCreationOptions.LongRunning. By this answer, I came to know that for long running tasks, I should use this options because it creates a thread outside of threadpool.
Cool. But what advantage would I get when I create a thread outside threadpool? And when to do it and avoid it?
what advantage would I get when I create a thread outside threadpool?
The threadpool, as it name states, is a pool of threads which are allocated once and re-used throughout, in order to save the time and resources necessary to allocate a thread. The pool itself re-sizes on demand. If you queue more work than actual workers exist in the pool, it will allocate more threads in 500ms intervals, one at a time (this exists to avoid allocation of multiple threads simultaneously where existing threads may already finish executing and can serve requests). If many long running operations are performed on the thread-pool, it causes "thread starvation", meaning delegates will start getting queued and ran only once a thread frees up. That's why you'd want to avoid a large amount of threads doing lengthy work with thread-pool threads.
The Managed Thread-Pool docs also have a section on this question:
There are several scenarios in which it is appropriate to create and
manage your own threads instead of using thread pool threads:
You require a foreground thread.
You require a thread to have a particular priority.
You have tasks that cause the thread to block for long periods of time. The thread pool has a maximum number of threads, so a large
number of blocked thread pool threads might prevent tasks from
starting.
You need to place threads into a single-threaded apartment. All ThreadPool threads are in the multithreaded apartment.
You need to have a stable identity associated with the thread, or to dedicate a thread to a task.
For more, see:
Thread vs ThreadPool
When should I not use the ThreadPool in .Net?
Dedicated thread or thread-pool thread?
"Long running" can be quantified pretty well, a thread that takes more than half a second is running long. That's a mountain of processor instructions on a modern machine, you'd have to burn a fat five billion of them per second. Pretty hard to do in a constructive way unless you are calculating the value of Pi to thousands of decimals in the fraction.
Practical threads can only take that long when they are not burning core but are waiting a lot. Invariably on an I/O completion, like reading data from a disk, a network, a dbase server. And often the reason you'd start considering using a thread in the first place.
The threadpool has a "manager". It determines when a threadpool thread is allowed to start. It doesn't happen immediately when you start it in your program. The manager tries to limit the number of running threads to the number of CPU cores you have. It is much more efficient that way, context switching between too many active threads is expensive. And a good throttle, preventing your program from consuming too many resources in a burst.
But the threadpool manager has the very common problem with managers, it doesn't know enough about what is going on. Just like my manager doesn't know that I'm goofing off at Stackoverflow.com, the tp manager doesn't know that a thread is waiting for something and not actually performing useful work. Without that knowledge it cannot make good decisions. A thread that does a lot of waiting should be ignored and another one should be allowed to run in its place. Actually doing real work.
Just like you tell your manager that you go on vacation, so he can expect no work to get done, you tell the threadpool manager the same thing with LongRunning.
Do note that it isn't quite a bad as it, perhaps, sounds in this answer. Particularly .NET 4.0 hired a new manager that's a lot smarter at figuring out the optimum number of running threads. It does so with a feedback loop, collecting data to discover if active threads actually get work done. And adjusts the optimum accordingly. Only problem with this approach is the common one when you close a feedback loop, you have to make it slow so the loop cannot become unstable. In other words, it isn't particularly quick at driving up the number of active threads.
If you know ahead of time that the thread is pretty abysmal, running for many seconds with no real cpu load then always pick LongRunning. Otherwise it is a tuning job, observing the program when it is done and tinkering with it to make it more optimal.

C# ThreadPool Implementation / Performance Spikes

In an attempt to speed up processing of physics objects in C# I decided to change a linear update algorithm into a parallel algorithm. I believed the best approach was to use the ThreadPool as it is built for completing a queue of jobs.
When I first implemented the parallel algorithm, I queued up a job for every physics object. Keep in mind, a single job completes fairly quickly (updates forces, velocity, position, checks for collision with the old state of any surrounding objects to make it thread safe, etc). I would then wait on all jobs to be finished using a single wait handle, with an interlocked integer that I decremented each time a physics object completed (upon hitting zero, I then set the wait handle). The wait was required as the next task I needed to do involved having the objects all be updated.
The first thing I noticed was that performance was crazy. When averaged, the thread pooling seemed to be going a bit faster, but had massive spikes in performance (on the order of 10 ms per update, with random jumps to 40-60ms). I attempted to profile this using ANTS, however I could not gain any insight into why the spikes were occurring.
My next approach was to still use the ThreadPool, however instead I split all the objects into groups. I initially started with only 8 groups, as that was how any cores my computer had. The performance was great. It far outperformed the single threaded approach, and had no spikes (about 6ms per update).
The only thing I thought about was that, if one job completed before the others, there would be an idle core. Therefore, I increased the number of jobs to about 20, and even up to 500. As I expected, it dropped to 5ms.
So my questions are as follows:
Why would spikes occur when I made the job sizes quick / many?
Is there any insight into how the ThreadPool is implemented that would help me to understand how best to use it?
Using threads has a price - you need context switching, you need locking (the job queue is most probably locked when a thread tries to fetch a new job) - it all comes at a price. This price is usually small compared to the actual work your thread is doing, but if the work ends quickly, the price becomes meaningful.
Your solution seems correct. A reasonable rule of thumb is to have twice as many threads as there are cores.
As you probably expect yourself, the spikes are likely caused by the code that manages the thread pools and distributes tasks to them.
For parallel programming, there are more sophisticated approaches than "manually" distributing work across different threads (even if using the threadpool).
See Parallel Programming in the .NET Framework for instance for an overview and different options. In your case, the "solution" may be as simple as this:
Parallel.ForEach(physicObjects, physicObject => Process(physicObject));
Here's my take on your two questions:
I'd like to start with question 2 (how the thread pool works) because it actually holds the key to answering question 1. The thread pool is implemented (without going into details) as a (thread-safe) work queue and a group of worker threads (which may shrink or enlarge as needed). As the user calls QueueUserWorkItem the task is put into the work queue. The workers keep polling the queue and taking work if they are idle. Once they manage to take a task, they execute it and then return to the queue for more work (this is very important!). So the work is done by the workers on-demand: as the workers become idle they take more pieces of work to do.
Having said the above, it's simple to see what is the answer to question 1 (why did you see a performance difference with more fine-grained tasks): it's because with fine-grain you get more load-balancing (a very desirable property), i.e. your workers do more or less the same amount of work and all cores are exploited uniformly. As you said, with a coarse-grain task distribution, there may be longer and shorter tasks, so one or more cores may be lagging behind, slowing down the overall computation, while other do nothing. With small tasks the problem goes away. Each worker thread takes one small task at a time and then goes back for more. If one thread picks up a shorter task it will go to the queue more often, If it takes a longer task it will go to the queue less often, so things are balanced.
Finally, when the jobs are too fine-grained, and considering that the pool may enlarge to over 1K threads, there is very high contention on the queue when all threads go back to take more work (which happens very often), which may account for the spikes you are seeing. If the underlying implementation uses a blocking lock to access the queue, then context switches are very frequent which hurts performance a lot and makes it seem rather random.
answer of question 1:
this is because of Thread switching , thread switching (or context switching in OS concepts) is CPU clocks that takes to switch between each thread , most of times multi-threading increases the speed of programs and process but when it's process is so small and quick size then context switching will take more time than thread's self process so the whole program throughput decreases, you can find more information about this in O.S concepts books .
answer of question 2:
actually i have a overall insight of ThreadPool , and i cant explain what is it's structure exactly.
to learn more about ThreadPool start here ThreadPool Class
each version of .NET Framework adds more and more capabilities utilizing ThreadPool indirectly. such as Parallel.ForEach Method mentioned before added in .NET 4 along with System.Threading.Tasks which makes code more readable and neat. You can learn more on this here Task Schedulers as well.
At very basic level what it does is: it creates let's say 20 threads and puts them into a lits. Each time it receives a delegate to execute async it takes idle thread from the list and executes delegate. if no available threads found it puts it into a queue. every time deletegate execution completes it will check if queue has any item and if so peeks one and executes in the same thread.

Multithreading takes more time than Sequential Execution in C#

I am using Multithreading in my while loop ,
as
while(reader.read())
{
string Number= (reader["Num"] != DBNull.Value) ? reader["Num"].ToString() : string.Empty;
threadarray[RowCount] = new Thread(() =>
{
object ID= (from r in Datasetds.Tables[0].AsEnumerable()
where r.Field<string>("Num") == Number
select r.Field<int>("ID")).First<int>();
});
threadarray[RowCount].Start();
RowCount++;
}
But with sequential execution ,for 200 readers it just takes 0.4 s
but with threading it takes 1.1 sec... This is an example but i have same problem when i execute it with number of lines of code in threading with multiple database operations.
for sequential it takes 10 sec for threading it takes more...
can anyone please suggest me?
Thanks...
Threading is not always quicker and in many cases can be slower (such as the case seen here). There are plenty of reasons why but the two most significant are
Creating a thread is a relatively expensive OS operation
Context switching (where the CPU stops working on one thread and starts working on another) is again a relatively expensive operation
Creating 200 threads will take a fair amount of time (with the default stack size this will allocate 200MB of memory for stacks alone), and unless you have a machine with 200 cores the OS will also need to spend a fair amount of time context switching between those threads too.
The end result is that the time that the machine spends creating threads and switching between them simply outstrips the amount of time that the machine spends doing any work. You may see a performance improvement if you reduce the number of threads being used. Try starting with 1 thread for each core that your machine has.
Multithreading where you have more threads than cores is generally only useful in scenarios where the CPU is hanging around waiting for things to happen (like disk IO or network communication). This isn't the case here.
Threading isn't always the solution and the way you're using it is definitely not thread-safe. Things like disk I/O or other bottlenecks won't benefit from threading in certain circumstances.
Also, there is a cost for starting up threads. Not that I would recommend it for your situation, but check out the TPL. http://msdn.microsoft.com/en-us/library/dd460717.aspx
Multithreading usually is a choice for non blocking execution. Like everything on earth it has its associated costs.
For the commodity of parallel execution, we pay with performance.
Usually there is nothing faster then sequential execution of a single task.
It's hard to suggest something real, in you concrete scenario.
May be you can think about multiple process execution, instead of multiple threads execution.
But I repeast it's hard to tell if you will get benefits from this, without knowing application complete architecture and requirements.
it seems you are creating a thread for each read(). so if it has 200 read(), you have 200 threads running (maybe fewer since some may finished quickly). depends on what you are doing in the thread, 200 threads running at the same time may actually slow down the system because of overheads like others mentioned.
multuthread helps you when 1) the job in the thread takes some time to finish; 2) you have a control on how many threads running at the same time.
in your case, you need try, say 10 threads. if 10 threads are running, wait until 1 of them finished, then allocate the thread to a new read().
if the job in the thread does not take much time, then better use single thread.
Sci Fi author and technologist Jerry Pournelle once said that in an ideal world every process should have its own processor. This is not an ideal world, and your machine has probably 1 - 4 processors. Your Windows system alone is running scads of processes even when you yourself are daydreaming. I just counted the processes running on my Core 2 Quad XP machine, and SYSTEM is running 65 processes. That's 65 processes that have to be shared between 4 processors. Add more threads and each one gets only a slice of processor power.
If you had a Beowulf Cluster you could share threads off into individual machines and you'd probably get very good timings. But your machine can't do this with only 4 processors. The more you ask it to do the worse performance is going to be.

ThreadPool behaviour: not growing from minimum size

I had set up my thread pool like this:
ThreadPool.SetMaxThreads(10000, 10000);
ThreadPool.SetMinThreads(20, 20);
However, my app started hanging under heavy load. This seemed to be because worker tasks were not executing: I had used ThreadPool.QueueUserWorkItem to run some tasks which in turn used the same method to queue further work. This is obviously dangerous with a limited thread pool (a deadlock situation), but I am using a thread pool not to limit maximum threads but to reduce thread creation overhead.
I can see the potential trap there, but I believed that setting a maximum of 10000 threads on the pool would mean that if an item was queued, all threads were busy, and there weren't 10000 threads in the pool, a new one would be created and the task processed there.
However, I changed to this:
ThreadPool.SetMaxThreads(10000, 10000);
ThreadPool.SetMinThreads(200, 200);
..and the app started working. If that made it start working, am I missing something about how/when the thread pool expands from minimum toward maximum size?
The job of the threadpool scheduler is to ensure there are no more executing TP threads than cpu cores. The default minimum is equal to the number of cores. A happy number since that minimizes the overhead due to thread context switching. Twice a second, the scheduler steps in and allows another thread to execute if the existing ones haven't completed.
It will therefore take a hour and twenty minutes of having threads that don't complete to get to your new maximum. It is fairly unlikely to ever get there, a 32-bit machine will keel over when 2000 threads have consumed all available virtual memory. You'd have a shot at it on a 64-bit operating system with a very large paging file. Lots of RAM required to avoid paging death, you'd need at least 12 gigabytes.
The generic diagnostic is that you are using TP threads inappropriately. They take too long, usually caused by blocking on I/O. A regular Thread is the proper choice for those kind of jobs. That's probably hard to fix right now, especially since you're happy with what you got. Raising the minimum is indeed a quick workaround. You'll have to hand-tune it since the TP scheduler can't do a reasonable job anymore.
Whenever you use the thread pool, you are at the mercy of its "thread injection and retirement algorithm".
The algorithm is not properly documented ( that I know of ) and not configurable.
If you're using Tasks, you can write your own Task Scheduler
The performance issue you described, is similar to what is documented in this ASP.NET KB article,
http://support.microsoft.com/kb/821268
To summarize, you need to carefully choose the parameters (this article mentions the typical settings for default ASP.NET thread pool, but you can apply the trick to your app), and further tune them based on performance testing and the characteristics of your app.
Notice that the more you learn about load, you will see that "heavy load" is no longer a good term to describe the situation. Sometimes you need to further categorize the cases, to include detailed term, such as burst load, and so on.
If your logic depends on having a minimum amount of threads you need to change that, urgently.
Setting a MinThreads of 200 (or even 20) is wasting quite a bit of memory. Note that the MaxThreads won't be relevant here, you probably don't have the 10 GB mem for that.
The fact that a min of 200 helps you out is suspicious and as a solution it is probably very brittle.
Take a look at normal Producer/Consumer patterns, and/or use a bounded queue to couple your tasks.

Categories

Resources