Scenario
I have a Windows Forms Application. Inside the main form there is a loop that iterates around 3000 times, Creating a new instance of a class on a new thread to perform some calculations. Bearing in mind that this setup uses a Thread Pool, the UI does stay responsive when there are only around 100 iterations of this loop (100 Assets to process). But as soon as this number begins to increase heavily, the UI locks up into eggtimer mode and the thus the log that is writing out to the listbox on the form becomes unreadable.
Question
Am I right in thinking that the best way around this is to use a Background Worker?
And is the UI locking up because even though I'm using lots of different threads (for speed), the UI itself is not on its own separate thread?
Suggested Implementations greatly appreciated.
EDIT!!
So lets say that instead of just firing off and queuing up 3000 assets to process, I decide to do them in batches of 100. How would I go about doing this efficiently? I made an attempt earlier at adding "Thread.Sleep(5000);" after every batch of 100 were fired off, but the whole thing seemed to crap out....
If you are creating 3000 separate threads, you are pushing a documented limitation of the ThreadPool class:
If an application is subject to bursts
of activity in which large numbers of
thread pool tasks are queued, use the
SetMinThreads method to increase the
minimum number of idle threads.
Otherwise, the built-in delay in
creating new idle threads could cause
a bottleneck.
See that MSDN topic for suggestions to configure the thread pool for your situation.
If your work is CPU intensive, having that many separate threads will cause more overhead than it's worth. However, if it's very IO intensive, having a large number of threads may help things somewhat.
.NET 4 introduces outstanding support for parallel programming. If that is an option for you, I suggest you have a look at that.
More threads does not equal top speed. In fact too many threads equals less speed. If your task is simply CPU related you should only be using as many threads as you have cores otherwise you're wasting resources.
With 3,000 iterations and your form thread attempting to create a thread each time what's probably happening is you are maxing out the thread pool and the form is hanging because it needs to wait for a prior thread to complete before it can allocate a new one.
Apparently ThreadPool doesn't work this way. I have never checked it with threads before so I am not sure. Another possibility is that the tasks begin flooding the UI thread with invocations at which point it will give up on the GUI.
It's difficult to tell without seeing code - but, based on what you're describing, there is one suspect.
You mentioned that you have this running on the ThreadPool now. Switching to a BackgroundWorker won't change anything, dramatically, since it also uses the ThreadPool to execute. (BackgroundWorker just simplifies the invoke calls...)
That being said, I suspect the problem is your notifications back to the UI thread for your ListBox. If you're invoking too frequently, your UI may become unresponsive while it tries to "catch up". This can happen if you're feeding too much status info back to the UI thread via Control.Invoke.
Otherwise, make sure that ALL of your work is being done on the ThreadPool, and you're not blocking on the UI thread, and it should work.
If every thread logs something to your ui, every written log line must invoke the main thread. Better to cache the log-output and update the gui only every 100 iterations or something like that.
Since I haven't seen your code so this is just a lot of conjecture with some highly hopefully educated guessing.
All a threadpool does is queue up your requests and then fire new threads off as others complete their work. Now 3000 threads doesn't sounds like a lot but if there's a ton of processing going on you could be destroying your CPU.
I'm not convinced a background worker would help out since you will end up re-creating a manager to handle all the pooling the threadpool gives you. I think more you issue is you've got too much data chunking going on. I think a good place to start would be to throttle the amount of threads you start and maintain. The threadpool manager easily allows you to do this. Find a balance that allows you to process data while still keeping the UI responsive.
Related
Okay, So I wanted to know what happens when I use TaskCreationOptions.LongRunning. By this answer, I came to know that for long running tasks, I should use this options because it creates a thread outside of threadpool.
Cool. But what advantage would I get when I create a thread outside threadpool? And when to do it and avoid it?
what advantage would I get when I create a thread outside threadpool?
The threadpool, as it name states, is a pool of threads which are allocated once and re-used throughout, in order to save the time and resources necessary to allocate a thread. The pool itself re-sizes on demand. If you queue more work than actual workers exist in the pool, it will allocate more threads in 500ms intervals, one at a time (this exists to avoid allocation of multiple threads simultaneously where existing threads may already finish executing and can serve requests). If many long running operations are performed on the thread-pool, it causes "thread starvation", meaning delegates will start getting queued and ran only once a thread frees up. That's why you'd want to avoid a large amount of threads doing lengthy work with thread-pool threads.
The Managed Thread-Pool docs also have a section on this question:
There are several scenarios in which it is appropriate to create and
manage your own threads instead of using thread pool threads:
You require a foreground thread.
You require a thread to have a particular priority.
You have tasks that cause the thread to block for long periods of time. The thread pool has a maximum number of threads, so a large
number of blocked thread pool threads might prevent tasks from
starting.
You need to place threads into a single-threaded apartment. All ThreadPool threads are in the multithreaded apartment.
You need to have a stable identity associated with the thread, or to dedicate a thread to a task.
For more, see:
Thread vs ThreadPool
When should I not use the ThreadPool in .Net?
Dedicated thread or thread-pool thread?
"Long running" can be quantified pretty well, a thread that takes more than half a second is running long. That's a mountain of processor instructions on a modern machine, you'd have to burn a fat five billion of them per second. Pretty hard to do in a constructive way unless you are calculating the value of Pi to thousands of decimals in the fraction.
Practical threads can only take that long when they are not burning core but are waiting a lot. Invariably on an I/O completion, like reading data from a disk, a network, a dbase server. And often the reason you'd start considering using a thread in the first place.
The threadpool has a "manager". It determines when a threadpool thread is allowed to start. It doesn't happen immediately when you start it in your program. The manager tries to limit the number of running threads to the number of CPU cores you have. It is much more efficient that way, context switching between too many active threads is expensive. And a good throttle, preventing your program from consuming too many resources in a burst.
But the threadpool manager has the very common problem with managers, it doesn't know enough about what is going on. Just like my manager doesn't know that I'm goofing off at Stackoverflow.com, the tp manager doesn't know that a thread is waiting for something and not actually performing useful work. Without that knowledge it cannot make good decisions. A thread that does a lot of waiting should be ignored and another one should be allowed to run in its place. Actually doing real work.
Just like you tell your manager that you go on vacation, so he can expect no work to get done, you tell the threadpool manager the same thing with LongRunning.
Do note that it isn't quite a bad as it, perhaps, sounds in this answer. Particularly .NET 4.0 hired a new manager that's a lot smarter at figuring out the optimum number of running threads. It does so with a feedback loop, collecting data to discover if active threads actually get work done. And adjusts the optimum accordingly. Only problem with this approach is the common one when you close a feedback loop, you have to make it slow so the loop cannot become unstable. In other words, it isn't particularly quick at driving up the number of active threads.
If you know ahead of time that the thread is pretty abysmal, running for many seconds with no real cpu load then always pick LongRunning. Otherwise it is a tuning job, observing the program when it is done and tinkering with it to make it more optimal.
I have some code that I need to "fire and forget," in the sense that the calling code doesn't need to wait around for a response.
If I were to iterate a loop and call this 10,000 times in the span of a second, would I have 10,000 threads floating around, fighting for resources? I have a nightmare of my machine instantly slowing to a crawl.
Alternately, does the thread pool manage these requests, and queue them for resolution?
Put another way, how easy is it to do something stupid with Task.Run().
So, when you invoke Task.Run it creates a new Task and schedules it on TaskScheduler.Default. That happens to be ThreadPoolTaskScheduler which then queues that task to the ThreadPool. The ThreadPool uses a pool of threads (surprisingly) to slowly go through the work items. The ThreadPool manages its own threads internally and may create more of these when needed.
To your question:
would I have 10,000 threads floating around, fighting for resources?
No, the ThreadPool usually knows how to manage its threads effectively and grows very slowly.
However, you would be allocating a lot of tasks that would wait there taking up space until completed. They may also hold on to other references (usually because of capturing) that take up resources as well.
By the way, if instead of the safer Task.Run you would use Task.Factory.StartNew and pass TaskCreationOptions.LongRunning you would be creating 10,000 background threads.
When you call Task.Run, you are queueing work to run in the thread pool. If you start a bunch of work, it can take a while to finish as you have limited computing resources. It's easy to bite yourself when running background tasks using Task.Run. It's best to understand the scenario and use the right tools for the job. If you're looking to process lots of background work, take a look at things like TPL, BlockingCollection and Concurrent Collections. Things like Producer/Consumer flows are the right direction, but all depends on what problem you're trying to solve.
In an attempt to speed up processing of physics objects in C# I decided to change a linear update algorithm into a parallel algorithm. I believed the best approach was to use the ThreadPool as it is built for completing a queue of jobs.
When I first implemented the parallel algorithm, I queued up a job for every physics object. Keep in mind, a single job completes fairly quickly (updates forces, velocity, position, checks for collision with the old state of any surrounding objects to make it thread safe, etc). I would then wait on all jobs to be finished using a single wait handle, with an interlocked integer that I decremented each time a physics object completed (upon hitting zero, I then set the wait handle). The wait was required as the next task I needed to do involved having the objects all be updated.
The first thing I noticed was that performance was crazy. When averaged, the thread pooling seemed to be going a bit faster, but had massive spikes in performance (on the order of 10 ms per update, with random jumps to 40-60ms). I attempted to profile this using ANTS, however I could not gain any insight into why the spikes were occurring.
My next approach was to still use the ThreadPool, however instead I split all the objects into groups. I initially started with only 8 groups, as that was how any cores my computer had. The performance was great. It far outperformed the single threaded approach, and had no spikes (about 6ms per update).
The only thing I thought about was that, if one job completed before the others, there would be an idle core. Therefore, I increased the number of jobs to about 20, and even up to 500. As I expected, it dropped to 5ms.
So my questions are as follows:
Why would spikes occur when I made the job sizes quick / many?
Is there any insight into how the ThreadPool is implemented that would help me to understand how best to use it?
Using threads has a price - you need context switching, you need locking (the job queue is most probably locked when a thread tries to fetch a new job) - it all comes at a price. This price is usually small compared to the actual work your thread is doing, but if the work ends quickly, the price becomes meaningful.
Your solution seems correct. A reasonable rule of thumb is to have twice as many threads as there are cores.
As you probably expect yourself, the spikes are likely caused by the code that manages the thread pools and distributes tasks to them.
For parallel programming, there are more sophisticated approaches than "manually" distributing work across different threads (even if using the threadpool).
See Parallel Programming in the .NET Framework for instance for an overview and different options. In your case, the "solution" may be as simple as this:
Parallel.ForEach(physicObjects, physicObject => Process(physicObject));
Here's my take on your two questions:
I'd like to start with question 2 (how the thread pool works) because it actually holds the key to answering question 1. The thread pool is implemented (without going into details) as a (thread-safe) work queue and a group of worker threads (which may shrink or enlarge as needed). As the user calls QueueUserWorkItem the task is put into the work queue. The workers keep polling the queue and taking work if they are idle. Once they manage to take a task, they execute it and then return to the queue for more work (this is very important!). So the work is done by the workers on-demand: as the workers become idle they take more pieces of work to do.
Having said the above, it's simple to see what is the answer to question 1 (why did you see a performance difference with more fine-grained tasks): it's because with fine-grain you get more load-balancing (a very desirable property), i.e. your workers do more or less the same amount of work and all cores are exploited uniformly. As you said, with a coarse-grain task distribution, there may be longer and shorter tasks, so one or more cores may be lagging behind, slowing down the overall computation, while other do nothing. With small tasks the problem goes away. Each worker thread takes one small task at a time and then goes back for more. If one thread picks up a shorter task it will go to the queue more often, If it takes a longer task it will go to the queue less often, so things are balanced.
Finally, when the jobs are too fine-grained, and considering that the pool may enlarge to over 1K threads, there is very high contention on the queue when all threads go back to take more work (which happens very often), which may account for the spikes you are seeing. If the underlying implementation uses a blocking lock to access the queue, then context switches are very frequent which hurts performance a lot and makes it seem rather random.
answer of question 1:
this is because of Thread switching , thread switching (or context switching in OS concepts) is CPU clocks that takes to switch between each thread , most of times multi-threading increases the speed of programs and process but when it's process is so small and quick size then context switching will take more time than thread's self process so the whole program throughput decreases, you can find more information about this in O.S concepts books .
answer of question 2:
actually i have a overall insight of ThreadPool , and i cant explain what is it's structure exactly.
to learn more about ThreadPool start here ThreadPool Class
each version of .NET Framework adds more and more capabilities utilizing ThreadPool indirectly. such as Parallel.ForEach Method mentioned before added in .NET 4 along with System.Threading.Tasks which makes code more readable and neat. You can learn more on this here Task Schedulers as well.
At very basic level what it does is: it creates let's say 20 threads and puts them into a lits. Each time it receives a delegate to execute async it takes idle thread from the list and executes delegate. if no available threads found it puts it into a queue. every time deletegate execution completes it will check if queue has any item and if so peeks one and executes in the same thread.
I have written a .net winforms application that does some heavy processing and slows my computer down pretty much. I have read something about
Thread.CurrentThread.Priority
but i dont really understand if i should give the main thread more priority or to lower its priority to remove the "lagging" and the slowing of my computer.
Thank you.
It depends entirely what your application and any additional threads are doing. You shouldn't really boost your UI thread priority, however you could lower any background thread priorities.
To keep the UI responsive, don't do any heavy processing on that thread - do the work on a background thread.
It's a bit vague perhaps, but then so if your question. Happy to go into more detail if you can too. Hope that helps !
Yes, it will solve your problem. Set it to ThreadPriority.BelowNormal (or Lowest) and any thread that is started by other processes on your machine will get scheduled ahead of your worker thread. It notably keeps any program you use interactively more responsive. The consequence is that your worker thread can get starved for cpu time when another process is burning cpu. It will still run occasionally, just not very often.
In general, avoid starting more threads than you have cpu cores. Environment.ProcessorCount. The threadpool scheduler already does this automatically, but doesn't pay attention to other processes.
If you're experiencing general "slowing" and "lag" then boosting the priority of any individual thread is only going to make things worse.
It depends exactly how your app is structured, but if you have some heavy processing going on and it's using enough processor time to have a negative impact on the rest of the system then you have two main options:
Reduce the priority of the processing threads so that activity in the rest of the system takes priority.
Introduce artificial breaks in the processing thread. Typically this is done by regularly sending the thread to sleep or yielding to other threads or an event loop.
You should only be increasing the priority on threads that require a higher priority within the system, not just because they are running slowly. Priority means that your thread should be considered more important than other threads. In this case, I don't think that this is the case.
If you lower the priority, then you will probably find it appears to run slower, as other tasks may take the processing time.
What you need to do is reasses your processing, possibly add threads ( or more threads ), or consider how the processing can be improved in other ways. Prioirty is not the answer.
Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 6 years ago.
Improve this question
When should I not use the ThreadPool in .Net?
It looks like the best option is to use a ThreadPool, in which case, why is it not the only option?
What are your experiences around this?
#Eric, I'm going to have to agree with Dean. Threads are expensive. You can't assume that your program is the only one running. When everyone is greedy with resources, the problem multiplies.
I prefer to create my threads manually and control them myself. It keeps the code very easy to understand.
That's fine when it's appropriate. If you need a bunch of worker threads, though, all you've done is make your code more complicated. Now you have to write code to manage them. If you just used a thread pool, you'd get all the thread management for free. And the thread pool provided by the language is very likely to be more robust, more efficient, and less buggy than whatever you roll for yourself.
Thread t = new Thread(new ThreadStart(DoSomething));
t.Start();
t.Join();
I hope that you would normally have some additional code in between Start() and Join(). Otherwise, the extra thread is useless, and you're wasting resources for no reason.
People are way too afraid of the resources used by threads. I've never seen creating and starting a thread to take more than a millisecond. There is no hard limit on the number of threads you can create. RAM usage is minimal. Once you have a few hundred threads, CPU becomes an issue because of context switches, so at that point you might want to get fancy with your design.
A millisecond is a long time on modern hardware. That's 3 million cycles on a 3GHz machine. And again, you aren't the only one creating threads. Your threads compete for the CPU along with every other program's threads. If you use not-quite-too-many threads, and so does another program, then together you've used too many threads.
Seriously, don't make life more complex than it needs to be. Don't use the thread pool unless you need something very specific that it offers.
Indeed. Don't make life more complex. If your program needs multiple worker threads, don't reinvent the wheel. Use the thread pool. That's why it's there. Would you roll your own string class?
The only reason why I wouldn't use the ThreadPool for cheap multithreading is if I need to…
interract with the method running (e.g., to kill it)
run code on a STA thread (this happened to me)
keep the thread alive after my application has died (ThreadPool threads are background threads)
in case I need to change the priority of the Thread. We can not change priority of threads in ThreadPool which is by default Normal.
P.S.: The MSDN article "The Managed Thread Pool" contains a section titled, "When Not to Use Thread Pool Threads", with a very similar but slightly more complete list of possible reasons for not using the thread pool.
There are lots of reasons why you would need to skip the ThreadPool, but if you don't know them then the ThreadPool should be good enough for you.
Alternatively, look at the new Parallel Extensions Framework, which has some neat stuff in there that may suit your needs without having to use the ThreadPool.
To quarrelsome's answer, I would add that it's best not to use a ThreadPool thread if you need to guarantee that your thread will begin work immediately. The maximum number of running thread-pooled threads is limited per appdomain, so your piece of work may have to wait if they're all busy. It's called "queue user work item", after all.
Two caveats, of course:
You can change the maximum number of thread-pooled threads in code, at runtime, so there's nothing to stop you checking the current vs maximum number and upping the maximum if required.
Spinning up a new thread comes with its own time penalty - whether it's worthwhile for you to take the hit depends on your circumstances.
Thread pools make sense whenever you have the concept of worker threads. Any time you can easily partition processing into smaller jobs, each of which can be processed independently, worker threads (and therefore a thread pool) make sense.
Thread pools do not make sense when you need thread which perform entirely dissimilar and unrelated actions, which cannot be considered "jobs"; e.g., One thread for GUI event handling, another for backend processing. Thread pools also don't make sense when processing forms a pipeline.
Basically, if you have threads which start, process a job, and quit, a thread pool is probably the way to go. Otherwise, the thread pool isn't really going to help.
I'm not speaking as someone with only
theoretical knowledge here. I write
and maintain high volume applications
that make heavy use of multithreading,
and I generally don't find the thread
pool to be the correct answer.
Ah, argument from authority - but always be on the look out for people who might be on the Windows kernel team.
Neither of us were arguing with the fact that if you have some specific requirements then the .NET ThreadPool might not be the right thing. What we're objecting to is the trivialisation of the costs to the machine of creating a thread.
The significant expense of creating a thread at the raison d'etre for the ThreadPool in the first place. I don't want my machines to be filled with code written by people who have been misinformed about the expense of creating a thread, and don't, for example, know that it causes a method to be called in every single DLL which is attached to the process (some of which will be created by 3rd parties), and which may well hot-up a load of code which need not be in RAM at all and almost certainly didn't need to be in L1.
The shape of the memory hierarchy in a modern machine means that 'distracting' a CPU is about the worst thing you can possibly do, and everybody who cares about their craft should work hard to avoid it.
When you're going to perform an operation that is going to take a long time, or perhaps a continuous background thread.
I guess you could always push the amount of threads available in the pool up but there would be little point in incurring the management costs of a thread that is never going to be given back to the pool.
Threadpool threads are appropriate for tasks that meet both of the following criteria:
The task will not have to spend any significant time waiting for something to happen
Anything that's waiting for the task to finish will likely be waiting for many tasks to finish, so its scheduling priority isn't apt to affect things much.
Using a threadpool thread instead of creating a new one will save a significant but bounded amount of time. If that time is significant compared with the time it will take to perform a task, a threadpool task is likely appropriate. The longer the time required to perform a task, however, the smaller the benefit of using the threadpool and the greater the likelihood of the task impeding threadpool efficiency.
MSDN has a list some reasons here:
http://msdn.microsoft.com/en-us/library/0ka9477y.aspx
There are several scenarios in which it is appropriate to create and
manage your own threads instead of using thread pool threads:
You require a foreground thread.
You require a thread to have a particular priority.
You have tasks that cause the thread to block for long periods of time. The thread pool has a maximum number of threads, so a large
number of blocked thread pool threads might prevent tasks from
starting.
You need to place threads into a single-threaded apartment. All ThreadPool threads are in the multithreaded apartment.
You need to have a stable identity associated with the thread, or to dedicate a thread to a task.
#Eric
#Derek, I don't exactly agree with the scenario you use as an example. If you don't know exactly what's running on your machine and exactly how many total threads, handles, CPU time, RAM, etc, that your app will use under a certain amount of load, you are in trouble.
Are you the only target customer for the programs you write? If not, you can't be certain about most of that. You generally have no idea when you write a program whether it will execute effectively solo, or if it will run on a webserver being hammered by a DDOS attack. You can't know how much CPU time you are going to have.
Assuming your program's behavior changes based on input, it's rare to even know exactly how much memory or CPU time your program will consume. Sure, you should have a pretty good idea about how your program is going to behave, but most programs are never analyzed to determine exactly how much memory, how many handles, etc. will be used, because a full analysis is expensive. If you aren't writing real-time software, the payoff isn't worth the effort.
In general, claiming to know exactly how your program will behave is far-fetched, and claiming to know everything about the machine approaches ludicrous.
And to be honest, if you don't know exactly what method you should use: manual threads, thread pool, delegates, and how to implement it to do just what your application needs, you are in trouble.
I don't fully disagree, but I don't really see how that's relevant. This site is here specifically because programmers don't always have all the answers.
If your application is complex enough to require throttling the number of threads that you use, aren't you almost always going to want more control than what the framework gives you?
No. If I need a thread pool, I will use the one that's provided, unless and until I find that it is not sufficient. I will not simply assume that the provided thread pool is insufficient for my needs without confirming that to be the case.
I'm not speaking as someone with only theoretical knowledge here. I write and maintain high volume applications that make heavy use of multithreading, and I generally don't find the thread pool to be the correct answer.
Most of my professional experience has been with multithreading and multiprocessing programs. I have often needed to roll my own solution as well. That doesn't mean that the thread pool isn't useful, or appropriate in many cases. The thread pool is built to handle worker threads. In cases where multiple worker threads are appropriate, the provided thread pool should should generally be the first approach.