At which time and in which context should I call ThreadPool.SetMinThreads - c#

I have an ASP .NET MVC 2 - application, which calls functions of a C#-DLL.
The DLL itself is multithreaded. In the worst case it uses up to 200 threads, which do not run very long.
I use asynchronous delegates in order to generate the threads. In order to speed up the initialization of the delegates, I calculate the number of threads I need in advance and give it to the ThreadPool:
ThreadPool.SetMinThreads(my_num_threads, ...);
I am wondering whether I need to do this early enough that the ThreadPool has time to create the threads. Do I have to consider when I set the size of the ThreadPool, or are the threads available immediately after I call SetMinThreads?
Furthermore, if I set the size outside the DLL in my ASP .NET MVC-application (before I call the DLL), will this setting be available/visible for the DLL?

They run in the same process and application domain, and the managed thread pool is shared across the whole process, so setting ThreadPool limits anywhere affects everything. Note that this also affects the ASP.NET framework, which uses the ThreadPool for its own async tasks.
With that in mind, if you knew roughly the minimum number of threads you wanted you could set that on application startup ready for use later on.
However, 200 threads seems somewhat excessive. To put it into context, my Chrome with about 8 tabs open uses about 35 threads and my SQL Server about 50. What are you doing that demands so many?
Also realise that eventually you'll reach a limit where performance degrades because there are so many threads to service. Microsoft says as much on MSDN:
You can use the SetMinThreads method to increase the minimum number of threads. However, unnecessarily increasing these values can cause performance problems. If too many tasks start at the same time, all of them might appear to be slow. In most cases, the thread pool will perform better with its own algorithm for allocating threads. Reducing the minimum to less than the number of processors can also hurt performance.
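As a rough illustration of setting the minimum once at startup, here is a minimal sketch. The class and method names (`ThreadPoolSetup`, `EnsureMinWorkerThreads`) are made up for this example; only `ThreadPool.GetMinThreads`, `GetMaxThreads` and `SetMinThreads` are real API. As far as the documentation describes it, the minimum is the number of threads the pool will create on demand without throttling, not a number of threads created up front, so there should be no need to call it "early enough" for threads to be spun up.

```csharp
using System;
using System.Threading;

public static class ThreadPoolSetup
{
    // Raises the worker-thread minimum to at least `desiredWorkers`,
    // leaving the I/O completion-port minimum as it was.
    // Returns true if the pool accepted the values.
    public static bool EnsureMinWorkerThreads(int desiredWorkers)
    {
        int minWorkers, minIo;
        ThreadPool.GetMinThreads(out minWorkers, out minIo);

        int maxWorkers, maxIo;
        ThreadPool.GetMaxThreads(out maxWorkers, out maxIo);

        // Never lower an existing minimum, and never ask for more than the maximum.
        int target = Math.Min(Math.Max(minWorkers, desiredWorkers), maxWorkers);
        return ThreadPool.SetMinThreads(target, minIo);
    }
}
```

In an ASP.NET application this could be called once from `Application_Start`; because the pool is shared, the DLL would then see the same setting.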

Related

How can I get started creating a multithreaded load balancer?

I have an interesting exercise to solve from my professor. But I need a little bit of help so it does not become boring during the holidays.
The exercise is to
create a multithreaded load balancer, that reads 1 measuring point from 5 sensors every second. (therefore 5 values every second).
Then do some "complex" calculations with those values.
Printing results of the calculations on the screen. (like max value or average value of sensor 1-5 and so on, of course multithreaded)
As an additional task I also have to ensure that if, in the future, for example 500 sensors were read every second, the computer would not give up on the job (load balancing).
I have a csv textfile with ~400 measuring points from 5 imaginary sensors.
What I think I have to do:
Read the measuring points into an array
Ensure thread safe access to that array
Spawn a new thread for every value that calculates some math stuff
Set a max value for maximum concurrent working threads
I am new to multithreaded applications in C#, but I think using the thread pool is the right way. I am currently working on a queue, and may start it inside a task so it won't block the application.
What would you recommend?
There are a couple of environment dependencies here:
What version of .NET are you using?
What UI are you using - desktop (WPF/WinForms) or ASP.NET?
Let's assume that it's .NET 4.0 or higher and a desktop app.
Reading the sensors
In a WPF or WinForms application, I would use a single BackgroundWorker to read data from the sensors. 500 reads per second is trivial - even 500,00 is usually trivial. And the BackgroundWorker type is specifically designed for interacting with desktop apps, for example handing-off results to the UI without worrying about thread interactions.
Processing the calculations
Then you need to process the "complex" calculations. This depends on how long-lived these calculations are. If we assume they're short-lived (say less than 1 second each), then I think using the TaskScheduler and the standard ThreadPool will be fine. So you create a Task for each calculation, and then let the TaskScheduler take care of allocating tasks to threads.
The job of the TaskScheduler is to load-balance the work by queuing lightweight tasks to more heavyweight threads, and managing the ThreadPool to best balance the workload vs the number of cores on the machine. You can even override the default TaskScheduler to schedule tasks in whatever manner you want.
The ThreadPool is a FIFO queue of work items that need to be processed. In .NET 4.0, the ThreadPool has improved performance by making the work queue a thread-safe ConcurrentQueue collection.
Measuring task throughput and efficiency
You can use PerformanceCounter to measure both CPU and memory usage. This will give you a good idea of whether the cores and memory are being used efficiently. The task throughput is simply measured by looking at the rate at which tasks are being processed and supplying results.
Note that I haven't included any code here, as I assume you want to deal with the implementation details for your professor :-)

ThreadPool behaviour: not growing from minimum size

I had set up my thread pool like this:
ThreadPool.SetMaxThreads(10000, 10000);
ThreadPool.SetMinThreads(20, 20);
However, my app started hanging under heavy load. This seemed to be because worker tasks were not executing: I had used ThreadPool.QueueUserWorkItem to run some tasks which in turn used the same method to queue further work. This is obviously dangerous with a limited thread pool (a deadlock situation), but I am using a thread pool not to limit maximum threads but to reduce thread creation overhead.
I can see the potential trap there, but I believed that setting a maximum of 10000 threads on the pool would mean that if an item was queued, all threads were busy, and there weren't 10000 threads in the pool, a new one would be created and the task processed there.
However, I changed to this:
ThreadPool.SetMaxThreads(10000, 10000);
ThreadPool.SetMinThreads(200, 200);
..and the app started working. If that made it start working, am I missing something about how/when the thread pool expands from minimum toward maximum size?
The job of the threadpool scheduler is to ensure there are no more executing TP threads than cpu cores. The default minimum is equal to the number of cores. A happy number since that minimizes the overhead due to thread context switching. Twice a second, the scheduler steps in and allows another thread to execute if the existing ones haven't completed.
It will therefore take about an hour and twenty minutes of threads not completing to reach your new maximum. It is fairly unlikely ever to get there: a 32-bit machine will keel over once 2000 threads have consumed all available virtual memory. You'd have a shot at it on a 64-bit operating system with a very large paging file, but you'd need lots of RAM to avoid paging death, at least 12 gigabytes.
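The arithmetic behind that estimate can be sketched as follows. The helper name `EstimateRampUp` is hypothetical, and the rate of two threads per second is simply the figure given in this answer, not a documented constant.

```csharp
using System;

public static class PoolRampUp
{
    // Estimates how long the pool would need to grow from `minThreads` to
    // `maxThreads` if it adds `threadsPerSecond` threads while none complete.
    public static TimeSpan EstimateRampUp(int minThreads, int maxThreads, double threadsPerSecond)
    {
        if (threadsPerSecond <= 0) throw new ArgumentOutOfRangeException("threadsPerSecond");
        int missing = Math.Max(0, maxThreads - minThreads);
        return TimeSpan.FromSeconds(missing / threadsPerSecond);
    }
}
```

With the question's numbers, `EstimateRampUp(20, 10000, 2.0)` comes to 4990 seconds, a little over 83 minutes. The same reasoning suggests why the app appeared to hang: under this model, growing from 20 to the 200 threads that turned out to be enough takes `EstimateRampUp(20, 200, 2.0)`, or 90 seconds.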
The generic diagnosis is that you are using TP threads inappropriately. They take too long, usually because they block on I/O. A regular Thread is the proper choice for that kind of job. That's probably hard to fix right now, especially since you're happy with what you've got. Raising the minimum is indeed a quick workaround. You'll have to hand-tune it, since the TP scheduler can't do a reasonable job anymore.
Whenever you use the thread pool, you are at the mercy of its "thread injection and retirement algorithm".
The algorithm is not properly documented (that I know of) and not configurable.
If you're using Tasks, you can write your own TaskScheduler.
The performance issue you described is similar to what is documented in this ASP.NET KB article:
http://support.microsoft.com/kb/821268
To summarize, you need to carefully choose the parameters (this article mentions the typical settings for default ASP.NET thread pool, but you can apply the trick to your app), and further tune them based on performance testing and the characteristics of your app.
Note that the more you learn about load, the more you will see that "heavy load" is not a good enough term to describe the situation. Sometimes you need to categorize the cases further, using more specific terms such as burst load.
If your logic depends on having a minimum amount of threads you need to change that, urgently.
Setting a MinThreads of 200 (or even 20) is wasting quite a bit of memory. Note that the MaxThreads won't be relevant here, you probably don't have the 10 GB mem for that.
The fact that a min of 200 helps you out is suspicious and as a solution it is probably very brittle.
Take a look at normal Producer/Consumer patterns, and/or use a bounded queue to couple your tasks.
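A minimal sketch of the bounded producer/consumer idea, assuming .NET 4.0 or later for `BlockingCollection<T>`. The class name `BoundedPipeline` and its parameters are made up for illustration; the point is that a fixed set of consumers drains a bounded queue, so nothing depends on the pool growing to a particular size.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedPipeline
{
    // Runs `handler` over `items` with a fixed number of consumers and a
    // bounded queue, so the producer blocks instead of flooding the pool.
    // Returns the sum of the handler results.
    public static int Process(int[] items, int consumerCount, int capacity, Func<int, int> handler)
    {
        int total = 0;
        using (var queue = new BlockingCollection<int>(capacity))
        {
            var consumers = new Task[consumerCount];
            for (int i = 0; i < consumerCount; i++)
            {
                consumers[i] = Task.Factory.StartNew(() =>
                {
                    // The enumeration ends once CompleteAdding has been
                    // called and the queue has been drained.
                    foreach (int item in queue.GetConsumingEnumerable())
                    {
                        Interlocked.Add(ref total, handler(item));
                    }
                }, TaskCreationOptions.LongRunning);
            }

            foreach (int item in items)
            {
                queue.Add(item); // blocks while the queue is full
            }
            queue.CompleteAdding();

            Task.WaitAll(consumers);
        }
        return total;
    }
}
```

For example, `BoundedPipeline.Process(new[] { 1, 2, 3, 4 }, 2, 2, x => x * x)` sums the squares using two consumers and a queue of capacity two. Error handling is left out: a consumer that throws would need to be dealt with before the producer blocks on a full queue.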

Are Socket.*Async methods threaded?

I'm currently trying to figure what is the best way to minimize the amount of threads I use in a TCP master server, in order to maximize performance.
As I've been reading a lot recently with the new async features of C# 5.0, asynchronous does not necessarily mean multithreaded. It could mean separated in smaller chunks of finite state objects, then processed alongside other operations, by alternating. However, I don't see how this could be done in networking, since I'm basically "waiting" for input (from the client).
Therefore, I wouldn't use ReceiveAsync() for all my sockets, it would just be creating and ending threads continuously (assuming it does create threads).
Consequently, my question is more or less: what architecture can a master server take without having one "thread" per connection?
Side question for bonus coolness points: Why is having multiple threads bad, considering that having an amount of threads that is over your amount of processing cores simply makes the machine "fake" multithreading, just like any other asynchronous method would?
No, you would not necessarily be creating threads. There are two possible ways you can do async without setting up and tearing down threads all the time:
You can have a "small" number of long-lived threads, and have them sleep when there's no work to do (this means that the OS will never schedule them for execution, so the resource drain is minimal). Then, when work arrives (i.e. Async method called), wake one of them up and tell it what needs to be done. Pleased to meet you, managed thread pool.
In Windows, the most efficient mechanism for async is I/O completion ports, which synchronize access to I/O operations and allow a small number of threads to manage massive workloads.
Regarding multiple threads:
Having multiple threads is not bad for performance, if
the number of threads is not excessive
the threads do not oversaturate the CPU
If the number of threads is excessive then obviously we are taxing the OS with having to keep track of and schedule all these threads, which uses up global resources and slows it down.
If the threads are CPU-bound, then the OS will need to perform much more frequent context switches in order to maintain fairness, and context switches kill performance. In fact, with user-mode threads (which all highly scalable systems use -- think RDBMS) we make our lives harder just so we can avoid context switches.
Update:
I just found this question, which lends support to the position that you can't say how many threads are too much beforehand -- there are just too many unknown variables.
Seems like the *Async methods use IOCP (by looking at the code with Reflector).
Jon's answer is great. As for the 'side question'... See http://en.wikipedia.org/wiki/Amdahl%27s_law. Amdahl's law says that serial code quickly diminishes the gains to be had from parallel code. We also know that thread coordination (scheduling, context switching, etc.) is serial, so at some point more threads means there are so many serial steps that the parallelization benefits are lost and you have a net negative performance. This is tricky stuff. That's why there is so much effort going into letting .NET manage threads while we define 'tasks' for the framework to decide what thread to run on. The framework can switch between tasks much more efficiently than the OS can switch between threads because the OS has a lot of extra things it needs to worry about when doing so.
Asynchronous work can be done without one-thread-per-connection or a thread pool, given OS support for select or poll (Windows supports this, and it is exposed via Socket.Select). I am not sure of the performance on Windows, but this is a very common idiom elsewhere.
One thread is the "pump" that manages the IO connections and monitors changes to the streams and then dispatches messages to/from other threads (conceivably 0 ... n depending upon model). Approaches with 0 or 1 additional threads may fall into the "Event Machine" category like twisted (Python) or POE (Perl). With >1 threads the callers form an "implicit thread pool" (themselves) and basically just offload the blocking IO.
There are also approaches like Actors, Continuations or Fibres exposed in the underlying models of some languages which alter how the basic problem is approached -- don't wait, react.
Happy coding.
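To make "no thread per connection" concrete, here is a minimal echo-server sketch using C# 5.0 async/await, assuming .NET 4.5 or later. The class and method names are made up for illustration, and it uses `TcpListener` and `NetworkStream` rather than the `Socket.*Async` methods from the question, for brevity. While a read or write is pending, no thread is blocked on that connection; on Windows these calls are, as the answers above suggest, generally completed through I/O completion ports.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

public static class AsyncEcho
{
    // Echoes bytes back until the client closes the connection.
    private static async Task HandleClientAsync(TcpClient client)
    {
        using (client)
        {
            NetworkStream stream = client.GetStream();
            var buffer = new byte[4096];
            int read;
            while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
            {
                await stream.WriteAsync(buffer, 0, read);
            }
        }
    }

    // Accepts `connections` clients, serves them concurrently, then stops listening.
    public static async Task ServeAsync(TcpListener listener, int connections)
    {
        var handlers = new Task[connections];
        for (int i = 0; i < connections; i++)
        {
            TcpClient client = await listener.AcceptTcpClientAsync();
            handlers[i] = HandleClientAsync(client); // not awaited here, so the loop keeps accepting
        }
        await Task.WhenAll(handlers);
        listener.Stop();
    }

    // Loopback round trip used to exercise the server.
    public static async Task<string> RoundTripAsync(string message)
    {
        var listener = new TcpListener(IPAddress.Loopback, 0); // port 0: let the OS pick one
        listener.Start();
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;
        Task server = ServeAsync(listener, 1);

        string reply;
        using (var client = new TcpClient())
        {
            await client.ConnectAsync(IPAddress.Loopback, port);
            NetworkStream stream = client.GetStream();
            byte[] payload = Encoding.UTF8.GetBytes(message);
            await stream.WriteAsync(payload, 0, payload.Length);

            // Read until the whole echo has arrived or the server closes.
            var buffer = new byte[payload.Length];
            int total = 0;
            while (total < buffer.Length)
            {
                int n = await stream.ReadAsync(buffer, total, buffer.Length - total);
                if (n == 0) break;
                total += n;
            }
            reply = Encoding.UTF8.GetString(buffer, 0, total);
        }
        await server; // completes once the client has disconnected
        return reply;
    }
}
```

A real master server would loop on accept indefinitely and add error handling and timeouts; this sketch only shows that many connections can be in flight without dedicating a thread to each.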

Threading cost - minimum execution time when threads would add speed

I am working on a C# application that works with an array. It walks through it (meaning that at one time only a narrow part of the array is used). I am considering adding threads in it to make it perform faster (it runs on a dualcore computer). The problem is that I do not know if it would actually help, because threads cost something and this cost could easily be more than the parallel gain... So how do I determine if threading would help?
Try writing some benchmarks that mimic, as closely as possible, the real-world conditions in which your software will actually be used.
Test and time the single-threaded version. Test and time the multi-threaded version. Compare the two sets of results.
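A minimal benchmarking sketch along those lines, assuming .NET 4.0 or later for `Parallel.For`. The `Work` function is a hypothetical stand-in for the real per-element computation, and the class name is made up for this example.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class ArrayBenchmark
{
    // Hypothetical per-element work; replace with the real computation.
    private static double Work(double x)
    {
        return Math.Sqrt(x) * Math.Sin(x);
    }

    public static double[] RunSequential(double[] input)
    {
        var output = new double[input.Length];
        for (int i = 0; i < input.Length; i++) output[i] = Work(input[i]);
        return output;
    }

    public static double[] RunParallel(double[] input)
    {
        var output = new double[input.Length];
        // Each index is written by exactly one iteration, so no locking is needed.
        Parallel.For(0, input.Length, i => { output[i] = Work(input[i]); });
        return output;
    }

    // Returns the elapsed milliseconds for one run of `action`.
    public static long Time(Action action)
    {
        Stopwatch watch = Stopwatch.StartNew();
        action();
        watch.Stop();
        return watch.ElapsedMilliseconds;
    }
}
```

Typical use would be to call `ArrayBenchmark.Time(() => ArrayBenchmark.RunSequential(data))` and the parallel equivalent several times each, discard the first run (JIT warm-up), and compare the rest on a release build.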
If your application is CPU bound (i.e. it isn't spending time trying to read files or waiting for data from a device) and there is little to no sharing of live data between the threads (data being altered; if it's read-only, it's fine), then you can pretty much increase the speed by 50 to 75% by adding another thread (as long as it still remains CPU bound, of course).
The main overhead in multithreading comes from 2 places.
Creation & initialization of the thread. Creating a thread requires quite a few resources to be allocated and involves swaps between kernel and user mode, this is expensive though a once off per thread so you can pretty much ignore it if the thread is running for any reasonable amount of time. The best way to mitigate this problem is to use a thread pool as it will keep the thread on hand and not need to be recreated.
Handling synchronization of data. If one thread is reading from data that another is writing, bad things will generally happen (worse if both are changing it). This requires you to lock your data before altering it so that no thread reads a half written value. These locks are generally quite slow as well. To mitigate this problem, you need to design your data layout so that the threads don't need to read or write to the same data as much as possible. If you do need a lot of these locks it can then become slower than the single thread option.
In short, if you are doing something that requires the CPUs to share a lot of data, then multi-threading it will be slower, and if the program isn't CPU bound there will be little or no difference (it could be a lot slower depending on what it is bound to, e.g. a CD or hard drive). If your program avoids both problems (it is CPU bound and shares little live data), then it will PROBABLY be worthwhile to add another thread (though the only way to be certain is profiling).
One more little note, you should only create as many CPU bound threads as you have physical cores (threads that idle most of the time, such as a GUI message pump thread, can be ignored for this condition).
P.S. You can reduce the cost of locking data by using a methodology called "lock-free programming", though this is something that should really only be attempted by people with a lot of experience with multi-threading and a clear understanding of their target architecture (including how the cache is treated and the memory bus).
I agree with Luke's answer. Benchmark it, it's the only way to be sure.
I can also give a prediction of the results - the fastest version will be when the number of threads matches the number of cores, EXCEPT if the array is very small and each thread would have to process just a few items, the setup/teardown times might get larger than the processing itself. How few - that depends on what you do. Again - benchmark.
I'd advise to find out a "minimum number of items for a thread to be useful". Then, when you are deciding how many threads to spawn (or take from a pool), check how many cores the computer has and how many items there are. Spawn as many threads as possible, but no more than the computer has cores, and not so many that each thread would have less than the minimum number of items to process.
For example if the minimum number of items is, say, 1000; and the computer has 4 cores; and your list contains 2500 items, you would spawn just 2 threads, because more threads would be inefficient (each would process less than 1000 items).
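The rule described above can be sketched as a small helper. The names `ThreadCountHeuristic` and `Choose` are made up for illustration, and the minimum of 1000 items is just the figure from the example, to be replaced by whatever your own benchmark suggests.

```csharp
using System;

public static class ThreadCountHeuristic
{
    // Picks a thread count that is no more than the core count and gives
    // every thread at least `minItemsPerThread` items to process.
    public static int Choose(int itemCount, int minItemsPerThread, int coreCount)
    {
        if (minItemsPerThread <= 0) throw new ArgumentOutOfRangeException("minItemsPerThread");
        int byWorkload = itemCount / minItemsPerThread; // threads that would each get a full share
        int threads = Math.Min(coreCount, byWorkload);
        return Math.Max(1, threads); // small inputs still run on one thread
    }
}
```

With the numbers from the example, `ThreadCountHeuristic.Choose(2500, 1000, 4)` gives 2. In practice the core count would come from `Environment.ProcessorCount`.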
Making a step by step list for Luke's idea:
Make a single threaded test app
Download Sysinternals Process Explorer (the tool whose process properties have a Performance Graph tab) and run it
Run your test app and find it on the process list (remember to run it as a release build outside of Visual Studio)
Double click the process and select the Performance Graph tab
Observe the CPU time used by your process
If the CPU time is sitting flat at 50% for more than a few seconds, you can probably speed your overall process up using threads (assuming the bunch of stuff Mr Peters referred to holds true)
(However, the best you can do on a dual-core machine is to halve the time it takes to run. If your process only takes 4 seconds, it might not be worth getting it to run in 2 seconds)
Using the task parallel library / Rx provides a friendlier interface than System.Threading.ThreadPool, which might make your world a bit easier.
IMHO you are missing one item, which is that it is not always about execution time. There is:
The problem of keeping a UI operational during an operation. Even if the UI is "dormant", a nonresponsive message pump makes a bad impression.
The possibility of using a thread pool so that you do not have to start and stop threads all the time. I use thread pools very extensively, and various parts of the applications keep them busy.
Anyhow, ignoring my point 1 (where you may go multi-threaded without speeding things up, in order to keep your UI responsive), I would say it is always faster when you can actually either split up the work (so you can keep more than one core busy) or offload it for other reasons.

How do I pick the best number of threads for hyperthreading/multicore?

I have some embarrassingly-parallelizable work in a .NET 3.5 console app and I want to take advantage of hyperthreading and multi-core processors. How do I pick the best number of worker threads to utilize either of these the best on an arbitrary system? For example, if it's a dual core I will want 2 threads; quad core I will want 4 threads. What I'm ultimately after is determining the processor characteristics so I can know how many threads to create.
I'm not asking how to split up the work nor how to do threading, I'm asking how do I determine the "optimal" number of the threads on an arbitrary machine this console app will run on.
I'd suggest that you don't try to determine it yourself. Use the ThreadPool and let .NET manage the threads for you.
You can use Environment.ProcessorCount if that's the only thing you're after. But usually using a ThreadPool is indeed the better option.
The .NET thread pool also has provisions for sometimes allocating more threads than you have cores to maximise throughput in certain scenarios where many threads are waiting for I/O to finish.
The correct number is obviously 42.
Now on the serious note. Just use the thread pool, always.
1) If you have a lengthy processing task (i.e. CPU intensive) that can be partitioned into multiple pieces of work, then you should partition your task and submit all the individual work items to the ThreadPool. The thread pool will pick up work items and start churning on them dynamically: it has self-monitoring capabilities that include starting new threads as needed, and it can be configured at deployment by administrators according to the deployment site's requirements, as opposed to pre-computing the numbers at development time. While it is true that the proper partitioning size of your processing task can take into account the number of CPUs available, the right answer depends so much on the nature of the task and the data that it is not even worth talking about at this stage (and besides, the primary concerns should be your NUMA nodes, memory locality and interlocked cache contention, and only after that the number of cores).
2) If you're doing I/O (including DB calls) then you should use Asynchronous I/O and complete the calls in ThreadPool called completion routines.
These two are the only valid reasons why you should have multiple threads, and they're both best handled by using the ThreadPool. Anything else, including starting a thread per 'request' or 'connection', is in fact an anti-pattern in the Win32 API world (fork is a valid pattern in *nix, but definitely not on Windows).
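Point 1 can be sketched with nothing newer than the .NET 3.5 ThreadPool API. The class name `PooledSum` and the sum-of-squares workload are made up for illustration; the pattern is to partition the data, queue one work item per chunk, and wait on a single event that the last work item signals.

```csharp
using System;
using System.Threading;

public static class PooledSum
{
    // Splits `data` into `chunkCount` ranges, queues one work item per range
    // on the ThreadPool, and blocks until all of them have finished.
    public static long SumOfSquares(int[] data, int chunkCount)
    {
        if (chunkCount < 1) throw new ArgumentOutOfRangeException("chunkCount");

        long total = 0;
        int pending = chunkCount;
        using (var done = new ManualResetEvent(false))
        {
            int chunkSize = (data.Length + chunkCount - 1) / chunkCount; // round up
            for (int c = 0; c < chunkCount; c++)
            {
                // Declared inside the loop so each work item captures its own range.
                int start = c * chunkSize;
                int end = Math.Min(data.Length, start + chunkSize);
                ThreadPool.QueueUserWorkItem(delegate
                {
                    long local = 0;
                    for (int i = start; i < end; i++) local += (long)data[i] * data[i];
                    Interlocked.Add(ref total, local);
                    // The last work item to finish releases the waiting caller.
                    if (Interlocked.Decrement(ref pending) == 0) done.Set();
                });
            }
            done.WaitOne();
        }
        return total;
    }
}
```

A call such as `PooledSum.SumOfSquares(data, Environment.ProcessorCount)` uses one chunk per logical processor as a starting point; the pool, not the caller, decides how many threads actually run them.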
For a more specialized and way, way more detailed discussion of the topic I can only recommend the Rick Vicik papers on the subject:
designing-applications-for-high-performance-part-1.aspx
designing-applications-for-high-performance-part-ii.aspx
designing-applications-for-high-performance-part-iii.aspx
The optimal number would just be the processor count. Optimally you would always have one thread running on a CPU (logical or physical) to minimise context switches and the overhead that has with it.
Whether that is the right number depends (very much as everyone has said) on what you are doing. The threadpool (if I understand it correctly) pretty much tries to use as few threads as possible but spins up another one each time a thread blocks.
The blocking is never optimal but if you are doing any form of blocking then the answer would change dramatically.
The simplest and easiest way to get good (not necessarily optimal) behaviour is to use the threadpool. In my opinion it's really hard to do any better than the threadpool, so that's simply the best place to start; only think about something else if you can demonstrate why it is not good enough.
A good rule of the thumb, given that you're completely CPU-bound, is processorCount+1.
That's +1 because you will always get some tasks started/stopped/interrupted and n tasks will almost never completely fill up n processors.
The only way is a combination of data and code analysis based on performance data.
Different CPU families and speeds vs. memory speed vs other activities on the system are all going to make the tuning different.
Potentially some self-tuning is possible, but this will mean having some form of live performance tuning and self adjustment.
Or even better than the ThreadPool, use .NET 4.0 Task instances from the TPL. The Task Parallel Library is built on a foundation in the .NET 4.0 framework that will actually determine the optimal number of threads to perform the tasks as efficiently as possible for you.
I read something on this recently (see the accepted answer to this question for example).
The simple answer is that you let the operating system decide. It can do a far better job of deciding what's optimal than you can.
There are a number of questions on a similar theme - search for "optimal number threads" (without the quotes) gives you a couple of pages of results.
I would say it also depends on what you are doing. If you're making a server application, then getting all you can out of the CPUs via either Environment.ProcessorCount or a thread pool is a good idea.
But if this is running on a desktop or a machine that is not dedicated to this task, you might want to leave some CPU idle so the machine still "functions" for the user.
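One way to leave some CPU free, sketched under the assumption that .NET 4.0's Task Parallel Library is available (the question targets .NET 3.5, where this API does not exist). `ParallelOptions.MaxDegreeOfParallelism` is real API; the class and method names are made up for this example.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class PoliteParallel
{
    // Builds options that leave `reservedCores` logical processors free
    // for the rest of the machine, but always allow at least one.
    public static ParallelOptions LeaveHeadroom(int reservedCores)
    {
        int degree = Math.Max(1, Environment.ProcessorCount - reservedCores);
        return new ParallelOptions { MaxDegreeOfParallelism = degree };
    }

    // Example workload: counts the even numbers in `data`.
    public static long CountEven(int[] data, ParallelOptions options)
    {
        long count = 0;
        Parallel.For(0, data.Length, options, i =>
        {
            if (data[i] % 2 == 0) Interlocked.Increment(ref count);
        });
        return count;
    }
}
```

On a desktop machine, `PoliteParallel.LeaveHeadroom(1)` would cap the loop at one fewer concurrent iterations than there are logical processors; a dedicated server could pass 0.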
It can be argued that the real way to pick the best number of threads is for the application to profile itself and adaptively change its threading behavior based on what gives the best performance.
I wrote a simple number crunching app that used multiple threads, and found that on my Quad-core system, it completed the most work in a fixed period using 6 threads.
I think the only real way to determine is through trialling or profiling.
In addition to processor count, you may want to take into account the process's processor affinity by counting bits in the affinity mask returned by the GetProcessAffinityMask function.
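Counting the bits in the affinity mask can be sketched from managed code with the `Process.ProcessorAffinity` property, which exposes the same kind of mask as the Win32 function named above. The class and method names are made up for illustration, and the property is assumed to be supported on the target platform (it is not available everywhere).

```csharp
using System;
using System.Diagnostics;

public static class AffinityInfo
{
    // Counts the set bits in an affinity mask; each set bit is one
    // logical processor the process may run on.
    public static int CountBits(long mask)
    {
        int count = 0;
        ulong bits = unchecked((ulong)mask);
        while (bits != 0)
        {
            bits &= bits - 1; // clears the lowest set bit
            count++;
        }
        return count;
    }

    // Number of logical processors the current process is allowed to use.
    public static int UsableProcessors()
    {
        using (Process current = Process.GetCurrentProcess())
        {
            return CountBits(current.ProcessorAffinity.ToInt64());
        }
    }
}
```

The result could be used in place of `Environment.ProcessorCount` when sizing a fixed set of worker threads. The mask is a single machine word, so it cannot describe more processors than it has bits.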
If there is no excessive I/O processing or system calls while the threads are running, then the number of threads (not counting the main thread) should in general equal the number of processors/cores in your system; otherwise you can try increasing the number of threads and testing.
