How many threads are too many threads? - c#

I need to develop a procedure that will retrieve 100 rows (product items) and perform some very complex calculations on each of them.
On average, each item takes 0.3s, which means that I may face a 30s delay if I perform the calculations serially.
The calculation for each item does not depend on the results for the other items, so I am thinking of using C# asynchronous programming features to create threads that will perform the calculations in parallel.
The calculation will be performed in an ASP.NET Core app (.NET 6) that will serve about 10 users.
Until now, I used asynchronous programming to keep the main thread responsive, so I had no worries about system resources. Now I have to design a procedure that may require 100 x 10 = 1000 threads.
Keep in mind that the calculations are performed in the database, so the calculation itself does not require any additional resources.
Should I do it?

If whatever calculation you are running is compute limited, there is no reason to use more threads than you have logical CPU cores. There might be reasons to use fewer threads to reserve some resources for other things, like keeping the UI responsive. Parallel.For is a typical solution for running compute-limited code concurrently: it automatically scales the number of threads used, but also allows a maximum to be set if you want to reserve some cores.
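A minimal sketch of the Parallel.For approach with a core held in reserve, assuming the per-item work is pure CPU. `ComputeItem` is a hypothetical stand-in for the real calculation, and reserving exactly one core is just an example choice.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for one CPU-bound calculation.
static double ComputeItem(int i)
{
    double acc = 0;
    for (int k = 1; k <= 10_000; k++)
        acc += Math.Sqrt(i + k);
    return acc;
}

const int itemCount = 100;
var results = new double[itemCount];

var options = new ParallelOptions
{
    // Leave one logical core free for other work, but never go below 1.
    MaxDegreeOfParallelism = Math.Max(1, Environment.ProcessorCount - 1)
};

// Parallel.For picks how many pool threads to use, capped by the option above.
// Each iteration writes only its own slot, so no locking is needed.
Parallel.For(0, itemCount, options, i => results[i] = ComputeItem(i));

Console.WriteLine($"Computed {results.Length} items with at most {options.MaxDegreeOfParallelism} concurrent workers.");
```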
If you are IO limited you do not really need to use any threads. As long as you are using "true" asynchronous calls, no threads are used while the IO system is working. But note that IO operations may not scale well with higher concurrency, since there will be hardware limits. This is especially true if you have spinning disks.
If your workload is mixed compute and IO, you might want to pipeline the IO and the compute, so take a look at TPL Dataflow. If most of the work is performed by the database, you may need to just try and see how many threads can be used before performance starts to drop. Databases involve a mix of IO and compute, but also things like locks and semaphores that might add additional limits. You should also check your queries to ensure that they are as efficient as possible; there is no need to spend a lot of time optimizing concurrency if an index will make the queries 100 times faster.
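When the per-item work is an async database call, one way to "try and see" is to cap the number of in-flight calls with a SemaphoreSlim and tune that cap by measurement. A minimal sketch: `CalculateItemAsync` is a hypothetical placeholder (simulated here with Task.Delay) for the real async query, and `maxConcurrency = 8` is an arbitrary starting value, not a recommendation. No thread is held while a call is awaiting.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

const int maxConcurrency = 8;          // hypothetical value; tune against the real database
var throttle = new SemaphoreSlim(maxConcurrency);
int inFlight = 0, peakInFlight = 0;

// Hypothetical placeholder for an async database call; awaiting it blocks no thread.
static async Task<int> CalculateItemAsync(int id)
{
    await Task.Delay(20);
    return id * 2;
}

async Task<int> ThrottledAsync(int id)
{
    await throttle.WaitAsync();
    try
    {
        int now = Interlocked.Increment(ref inFlight);
        // Record the highest concurrency observed (for demonstration only).
        int seen;
        while (now > (seen = Volatile.Read(ref peakInFlight)) &&
               Interlocked.CompareExchange(ref peakInFlight, now, seen) != seen) { }
        return await CalculateItemAsync(id);
    }
    finally
    {
        Interlocked.Decrement(ref inFlight);
        throttle.Release();
    }
}

int[] results = await Task.WhenAll(Enumerable.Range(0, 100).Select(id => ThrottledAsync(id)));
Console.WriteLine($"{results.Length} items, peak concurrency {peakInFlight} (cap {maxConcurrency})");
```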

Related

Is there a general rule for a maximum amount of parallel tasks?

I am using parallel tasks for the first time instead of using a traditional threadpool. In my application I allow for the user to input the number of tasks started to complete the job. (jobs can be very big). I noticed that if I allow any more than 10 or so tasks, the application starts to hang and I actually get worse performance due to the resources used.
I am wondering if there is any correlation between the number of processors and the maximum number of tasks, so that I can limit the maximum number of tasks on the user's PC so it doesn't slow it down.
No, mostly because there is no definition of a task. A task can be CPU intensive (the limit is something like cores * factor), IO intensive (the limit can be very low), or network intensive against a limited resource (which does not like to handle 1000 requests at the same time).
So it is up to you as a programmer to use your brain and come up with a concept, then validate it, and then put it into your program, depending on what the task actually IS and where the bottlenecks are foreseen. Such planning can be complex (very complex), but most of the time it is quite simple.
The TPL will automatically change how tasks are scheduled and add or remove ThreadPool threads over time. This means that, given enough time and similar work, the default behavior should improve to be the best option.
By default, it will start by using more threads than cores, since many tasks are not "pure CPU". Given that you're seeing extra tasks causing a slowdown, you likely either have resource contention (via locking) or your tasks are CPU bound, and having more tasks than processor cores will cause slowdowns. If this is going to be problematic, you can make a custom TaskScheduler that limits the number of tasks allowed at once, such as the LimitedConcurrencyLevelTaskScheduler sample. This allows you to limit the number of tasks to the number of processors in pure CPU scenarios.
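The custom scheduler mentioned above comes from a sample in Microsoft's TaskScheduler documentation rather than from the framework itself. A minimal sketch of the same idea using a swapped-in, built-in alternative, ConcurrentExclusiveSchedulerPair, whose ConcurrentScheduler runs at most the given number of tasks at once. The 50 tasks and the busy loop are hypothetical CPU-bound work for illustration.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

int limit = Environment.ProcessorCount;
// The pair's ConcurrentScheduler never runs more than `limit` tasks simultaneously.
var pair = new ConcurrentExclusiveSchedulerPair(TaskScheduler.Default, limit);
var factory = new TaskFactory(pair.ConcurrentScheduler);

int running = 0, peak = 0;
Task<double>[] tasks = Enumerable.Range(0, 50).Select(i => factory.StartNew(() =>
{
    int now = Interlocked.Increment(ref running);
    int seen;
    while (now > (seen = Volatile.Read(ref peak)) &&
           Interlocked.CompareExchange(ref peak, now, seen) != seen) { }

    // Hypothetical CPU-bound work.
    double acc = 0;
    for (int k = 0; k < 200_000; k++) acc += Math.Sqrt(k);

    Interlocked.Decrement(ref running);
    return acc;
})).ToArray();

Task.WaitAll(tasks);
Console.WriteLine($"Peak concurrency: {peak} (limit {limit})");
```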
If your tasks are bound by other factors, such as IO, then you may need to profile to determine the best balance between the number of concurrently scheduled tasks and throughput, though this will be system specific.
Assuming that your tasks are CPU-intensive (i.e. they don't do a lot of I/O blocking such as reading files), you probably want to limit the number of parallel tasks to the number of CPU cores available to your application. For example, if your application is running on a computer with a quad-core processor (i.e. 4 cores), limit it to 4 simultaneous tasks.
If your tasks are limited by something other than the CPU (e.g. disk access, network access, etc.), then you'll need to figure out what share of that resource each task takes on average. If you know the average, then the number of tasks you should run to fully utilize your resource is 100 / average (with the average expressed as a percentage of the resource).
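Both rules of thumb fit in a small helper. A sketch only: `CpuBoundDegree`, `ResourceBoundDegree`, and `averageSharePercent` are hypothetical names, and the share values would come from your own measurements. Note that Environment.ProcessorCount reports logical processors.

```csharp
using System;

// CPU-bound work: one task per logical processor reported by the runtime.
static int CpuBoundDegree() => Environment.ProcessorCount;

// Resource-bound work: if one task uses, on average, averageSharePercent of the
// bottleneck resource (disk, network, ...), about 100 / averageSharePercent tasks saturate it.
static int ResourceBoundDegree(double averageSharePercent)
{
    if (averageSharePercent <= 0 || averageSharePercent > 100)
        throw new ArgumentOutOfRangeException(nameof(averageSharePercent));
    return Math.Max(1, (int)(100 / averageSharePercent)); // round down so the resource is not oversubscribed
}

Console.WriteLine($"CPU-bound: {CpuBoundDegree()} tasks");
Console.WriteLine($"Each task uses 25% of the disk: {ResourceBoundDegree(25)} tasks"); // 4
Console.WriteLine($"Each task uses 40% of the disk: {ResourceBoundDegree(40)} tasks"); // 2
```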

At which time and in which context should I call ThreadPool.SetMinThreads

I have an ASP.NET MVC 2 application, which calls functions of a C# DLL.
The DLL itself is multithreaded. In the worst case it uses up to 200 threads, which do not run very long.
I use asynchronous delegates in order to generate the threads. In order to speed up the initialization of the delegates, I calculate the number of threads I need in advance and give it to the ThreadPool:
ThreadPool.SetMinThreads(my_num_threads, ...);
I wonder whether I need to do this early enough for the ThreadPool to have time to create the threads. Do I have to consider when I set the size of the ThreadPool, or are the threads available immediately after I call SetMinThreads?
Furthermore, if I set the size outside the DLL in my ASP.NET MVC application (before I call the DLL), will this setting be available/visible to the DLL?
They share the same application domain, so setting the ThreadPool size anywhere affects everything in it. Note that this also impacts the ASP.NET framework, which uses the ThreadPool itself for all its own async tasks, etc.
With that in mind, if you knew roughly the minimum number of threads you wanted, you could set it on application startup, ready for use later on.
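A minimal sketch of raising the minimum once at startup (in an ASP.NET MVC app that would be somewhere like Application_Start; here it is a console program). The value 200 is taken from the question's worst case, and the current completion-port minimum is read with GetMinThreads first so it is left unchanged.

```csharp
using System;
using System.Threading;

const int desiredWorkerMin = 200; // the question's worst case

// Read the current minimums so the I/O completion-port value can be preserved.
ThreadPool.GetMinThreads(out int workerMin, out int ioMin);

// SetMinThreads returns false, and changes nothing, if a value is negative or above the pool maximum.
bool applied = ThreadPool.SetMinThreads(Math.Max(workerMin, desiredWorkerMin), ioMin);

ThreadPool.GetMinThreads(out int newWorkerMin, out int newIoMin);
Console.WriteLine($"applied={applied}, worker min {workerMin} -> {newWorkerMin}, IO min {newIoMin}");
```

Per the ThreadPool documentation, the minimum does not pre-create threads: the pool creates threads on demand, without its usual throttling delay, until it reaches the minimum.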
However, 200 threads seems somewhat excessive, to put it into context my Chrome with about 8 tabs open uses about 35 and my SQL Server ~50. What are you doing that demands so many?
Also realize that eventually you'll reach a limit where performance degrades because there are so many threads that must be serviced. Microsoft says as much on MSDN:
You can use the SetMinThreads method to increase the minimum number of threads. However, unnecessarily increasing these values can cause performance problems. If too many tasks start at the same time, all of them might appear to be slow. In most cases, the thread pool will perform better with its own algorithm for allocating threads. Reducing the minimum to less than the number of processors can also hurt performance.

Task Parallel Library and IIS worker Threads?

I want to use Task Parallel Library for some calculation intensive tasks, but I have been told by a colleague there is a huge overhead for IIS creating worker threads.
I am not sure quite what is done when you call Task.Factory.StartNew(), say, 100 times. How does IIS handle this? Is it a huge risk, or are there ways to make this very beneficial for an application?
First, Tasks != Threads. You may have many tasks being serviced by a few threads (which are already pooled).
As a general rule, I'm against running long-running processes on web servers. There are tons of problems keeping long-running jobs up, and you tend to reduce your web server's scalability, especially if you are parallelizing long-running, CPU-intensive jobs. Don't forget that the optimal number of threads to have running on a machine is equal to the number of "logical" cores. You want to avoid creating excess threads (each managed thread eats something like a megabyte in overhead). Running CPU-intensive jobs takes CPU time away from serving requests.
In my opinion, the best way to use the TPL on a web server is with the goal of making requests as non-blocking as possible, which allows the greatest number of requests to be served with the smallest number of threads. Keep in mind that many people decide that the extra scale gained by highly asynchronous request handling is not worth the extra complexity. It depends on your specific case.
So, in short, running many long-running CPU-bound tasks on a web server risks your scalability. It doesn't really matter whether you are using tasks, threads, BackgroundWorkers, or the thread pool. It boils down to the same thing.
One of the great things about the Task abstraction is that it abstracts creating threads away. What that means is that the TPL (actually, the ThreadPool) can decide what the best amount of actual threads is. Because of this, creating 100 Tasks most likely won't create 100 Threads. Because of that, you don't have to worry about the overhead of creating Threads.
But it also depends on what kind of Tasks they are. If you have 100 Tasks that perform some long IO-bound operations and so they block most of the time, that's not a good use of TPL and your code will be quite inefficient (and you may actually end up with 100 Threads).
On the other hand, if you have 100 CPU-bound, relatively short Tasks, that's the sweet spot of TPL and you will get good efficiency.
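One way to see that 100 Tasks do not mean 100 Threads is to start 100 short CPU-bound tasks and count how many distinct pool threads actually ran them. A sketch; the exact count depends on the machine and the pool's heuristics, so no particular number is promised.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading.Tasks;

var threadIds = new ConcurrentDictionary<int, bool>();

Task<long>[] tasks = Enumerable.Range(0, 100).Select(n => Task.Run(() =>
{
    // Record which pool thread executed this task.
    threadIds.TryAdd(Environment.CurrentManagedThreadId, true);

    // Short CPU-bound work.
    long sum = 0;
    for (int k = 0; k < 100_000; k++) sum += k % (n + 1);
    return sum;
})).ToArray();

long[] results = await Task.WhenAll(tasks);
Console.WriteLine($"{tasks.Length} tasks ran on {threadIds.Count} distinct threads (logical processors: {Environment.ProcessorCount}).");
```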
If you are really concerned about efficiency, you should also know that Tasks do have some overhead. Because of that, in some cases it might make sense to merge multiple Tasks into one larger one to make the overhead smaller. Or you can use something that already does that: Parallel.ForEach or Parallel.For, if they fit your use case. As another advantage, code using them will be more readable than using Tasks manually.
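A minimal sketch of letting Parallel.ForEach do the merging: Partitioner.Create(from, to) hands out index ranges, so each delegate call processes a chunk instead of a single element, and the localInit/localFinally overload keeps a per-worker running total so the shared variable is touched only once per worker. Summing an array is just an example workload.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

long[] data = Enumerable.Range(1, 1_000_000).Select(i => (long)i).ToArray();
long total = 0;

Parallel.ForEach(
    Partitioner.Create(0, data.Length),          // yields Tuple<int, int> index ranges
    () => 0L,                                     // per-worker running sum
    (range, _, local) =>
    {
        for (int i = range.Item1; i < range.Item2; i++)
            local += data[i];
        return local;
    },
    local => Interlocked.Add(ref total, local));  // combine once per worker

Console.WriteLine(total); // 500000500000
```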
How about just creating a service to handle this work? You'll be much better off in terms of scaling and can isolate that unit of work nicely... even if the work is compute-bound.
In my opinion - don't use the Thread Pool/BackgroundWorker/Thread in ASP.NET. In your case, the TPL simply wraps the thread pool. It's usually more trouble than it's worth.
Threading overheads are the same for any host. Has nothing to do with IIS, at least when it comes to performance.
There are other concerns as well. For example, at application shutdown, user threads are rudely aborted.

Are Socket.*Async methods threaded?

I'm currently trying to figure out the best way to minimize the number of threads I use in a TCP master server, in order to maximize performance.
As I've been reading a lot recently about the new async features of C# 5.0, asynchronous does not necessarily mean multithreaded. It could mean splitting the work into smaller chunks (finite state machines) that are processed alongside other operations, by alternating. However, I don't see how this could be done in networking, since I'm basically "waiting" for input (from the client).
Therefore, I wouldn't use ReceiveAsync() for all my sockets, because it would just be creating and ending threads continuously (assuming it does create threads).
Consequently, my question is more or less: what architecture can a master server take without having one "thread" per connection?
Side question for bonus coolness points: why is having multiple threads bad, considering that having more threads than processing cores simply makes the machine "fake" multithreading, just like any other asynchronous method would?
No, you would not necessarily be creating threads. There are two possible ways you can do async without setting up and tearing down threads all the time:
You can have a "small" number of long-lived threads, and have them sleep when there's no work to do (this means that the OS will never schedule them for execution, so the resource drain is minimal). Then, when work arrives (i.e. Async method called), wake one of them up and tell it what needs to be done. Pleased to meet you, managed thread pool.
In Windows, the most efficient mechanism for async is I/O completion ports, which synchronize access to I/O operations and allow a small number of threads to manage massive workloads.
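A minimal sketch of a connection-per-Task (not thread-per-connection) server using the async socket APIs. It is an echo server on loopback with 50 demo clients; the port (OS-assigned), message format, and client count are arbitrary choices for illustration. While a ReadAsync is pending, no thread waits on it: the runtime's async I/O layer (IOCP on Windows) completes it, and the method resumes on a pool thread.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

// Loopback listener on an OS-assigned port, for the demo only.
var listener = new TcpListener(IPAddress.Loopback, 0);
listener.Start();
int port = ((IPEndPoint)listener.LocalEndpoint).Port;

const int clientCount = 50;
Task server = AcceptLoopAsync(clientCount);
string[] replies = await Task.WhenAll(
    Enumerable.Range(0, clientCount).Select(i => SendAsync(port, $"hello {i}")));
await server;
listener.Stop();
Console.WriteLine($"{replies.Length} connections served without a thread per connection.");

// Each accepted connection becomes a Task; no thread is dedicated to it.
async Task AcceptLoopAsync(int connections)
{
    var handlers = new Task[connections];
    for (int i = 0; i < connections; i++)
    {
        TcpClient client = await listener.AcceptTcpClientAsync();
        handlers[i] = EchoAsync(client);
    }
    await Task.WhenAll(handlers);
}

// Echo until the client stops sending; awaiting ReadAsync blocks no thread.
static async Task EchoAsync(TcpClient client)
{
    using (client)
    {
        NetworkStream stream = client.GetStream();
        var buffer = new byte[1024];
        int read;
        while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
            await stream.WriteAsync(buffer, 0, read);
    }
}

// Demo client: send one message, signal end of data, then read the echo.
static async Task<string> SendAsync(int port, string message)
{
    using var client = new TcpClient();
    await client.ConnectAsync(IPAddress.Loopback, port);
    NetworkStream stream = client.GetStream();
    byte[] payload = Encoding.UTF8.GetBytes(message);
    await stream.WriteAsync(payload, 0, payload.Length);
    client.Client.Shutdown(SocketShutdown.Send);

    using var received = new MemoryStream();
    await stream.CopyToAsync(received); // reads until the server closes the connection
    return Encoding.UTF8.GetString(received.ToArray());
}
```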
Regarding multiple threads:
Having multiple threads is not bad for performance, if
the number of threads is not excessive
the threads do not oversaturate the CPU
If the number of threads is excessive then obviously we are taxing the OS with having to keep track of and schedule all these threads, which uses up global resources and slows it down.
If the threads are CPU-bound, then the OS will need to perform much more frequent context switches in order to maintain fairness, and context switches kill performance. In fact, with user-mode threads (which all highly scalable systems use -- think RDBMS) we make our lives harder just so we can avoid context switches.
Update:
I just found this question, which lends support to the position that you can't say how many threads are too much beforehand -- there are just too many unknown variables.
It seems the *Async methods use IOCP (judging by the code in Reflector).
Jon's answer is great. As for the 'side question'... See http://en.wikipedia.org/wiki/Amdahl%27s_law. Amdahl's law says that serial code quickly diminishes the gains to be had from parallel code. We also know that thread coordination (scheduling, context switching, etc.) is serial, so at some point more threads means there are so many serial steps that the parallelization benefits are lost and you have a net negative performance. This is tricky stuff. That's why there is so much effort going into letting .NET manage threads while we define 'tasks' for the framework to decide what thread to run on. The framework can switch between tasks much more efficiently than the OS can switch between threads, because the OS has a lot of extra things it needs to worry about when doing so.
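For reference, Amdahl's law for a program in which a fraction p of the work parallelizes perfectly across n processors; the numbers in the second line are illustrative only. Any serial coordination overhead effectively adds to 1 - p, which lowers the ceiling further.

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}}, \qquad \lim_{n \to \infty} S(n) = \frac{1}{1 - p}

% Example: p = 0.9, n = 4
S(4) = \frac{1}{0.1 + 0.225} = \frac{1}{0.325} \approx 3.08, \qquad \lim_{n \to \infty} S(n) = 10
```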
Asynchronous work can be done without one thread per connection or a thread pool, by using OS support for select or poll (Windows supports this, and it is exposed via Socket.Select). I am not sure of the performance on Windows, but this is a very common idiom elsewhere.
One thread is the "pump" that manages the IO connections, monitors changes to the streams, and then dispatches messages to/from other threads (conceivably 0 ... n, depending upon the model). Approaches with 0 or 1 additional threads may fall into the "event machine" category, like Twisted (Python) or POE (Perl). With more than 1 additional thread, the callers form an "implicit thread pool" (themselves) and basically just offload the blocking IO.
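A minimal sketch of the single "pump" thread using Socket.Select, assuming a plain echo protocol. To keep everything on one thread for the demo, the clients connect and send before the pump starts (the OS queues pending connections and buffers the bytes). The message text, client count, and backlog are arbitrary, and a real server would also watch for writability and errors.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Net.Sockets;
using System.Text;

var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
listener.Bind(new IPEndPoint(IPAddress.Loopback, 0));
listener.Listen(100);
var endPoint = (IPEndPoint)listener.LocalEndPoint;

// Demo clients: connect and send up front; the OS queues the connections and buffers the data.
const int clientCount = 20;
var clients = new List<Socket>();
int expectedBytes = 0;
for (int i = 0; i < clientCount; i++)
{
    var client = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
    client.Connect(endPoint);
    byte[] message = Encoding.UTF8.GetBytes($"msg {i}");
    client.Send(message);
    expectedBytes += message.Length;
    clients.Add(client);
}

// The pump: one thread waits on all sockets at once and services whichever are ready.
var connections = new List<Socket>();
var buffer = new byte[1024];
int echoedBytes = 0;
while (echoedBytes < expectedBytes)
{
    var readable = new List<Socket>(connections) { listener };
    Socket.Select(readable, null, null, 1_000_000); // keeps only ready sockets; timeout in microseconds
    foreach (Socket s in readable)
    {
        if (s == listener)
        {
            connections.Add(listener.Accept()); // a connection is pending, so Accept won't block
            continue;
        }
        int n = s.Receive(buffer);
        if (n == 0) { connections.Remove(s); s.Close(); continue; }
        s.Send(buffer, n, SocketFlags.None); // echo back
        echoedBytes += n;
    }
}

// Each demo client reads back exactly the bytes it sent.
static string ReadExactly(Socket s, int length)
{
    var data = new byte[length];
    int got = 0;
    while (got < length) got += s.Receive(data, got, length - got, SocketFlags.None);
    return Encoding.UTF8.GetString(data);
}

string[] replies = clients
    .Select((c, i) => ReadExactly(c, Encoding.UTF8.GetByteCount($"msg {i}")))
    .ToArray();
foreach (Socket c in clients.Concat(connections)) c.Close();
listener.Close();
Console.WriteLine($"{replies.Length} clients served by a single pump thread.");
```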
There are also approaches like Actors, Continuations or Fibres exposed in the underlying models of some languages which alter how the basic problem is approached -- don't wait, react.
Happy coding.

Threading cost - minimum execution time when threads would add speed

I am working on a C# application that works with an array. It walks through the array (meaning that at any one time only a narrow part of it is used). I am considering adding threads to make it perform faster (it runs on a dual-core computer). The problem is that I do not know whether it would actually help, because threads cost something and this cost could easily exceed the parallel gain. So how do I determine if threading would help?
Try writing some benchmarks that mimic, as closely as possible, the real-world conditions in which your software will actually be used.
Test and time the single-threaded version. Test and time the multi-threaded version. Compare the two sets of results.
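A minimal sketch of timing both versions with Stopwatch, assuming the work is a per-element transform over an array. `Transform` is a hypothetical stand-in for the real work. Use a Release build, warm up first, and repeat the measurement, since a single run is noisy; no particular speedup is implied.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical per-element work; replace with the real computation.
static double Transform(double x) => Math.Sqrt(x) * Math.Sin(x);

double[] input = Enumerable.Range(0, 2_000_000).Select(i => (double)i).ToArray();
var sequential = new double[input.Length];
var parallel = new double[input.Length];

// Warm up both paths so JIT compilation is mostly excluded from the timings.
Transform(1.0);
Parallel.For(0, 1, _ => { });

var sw = Stopwatch.StartNew();
for (int i = 0; i < input.Length; i++) sequential[i] = Transform(input[i]);
TimeSpan sequentialTime = sw.Elapsed;

sw.Restart();
Parallel.For(0, input.Length, i => parallel[i] = Transform(input[i]));
TimeSpan parallelTime = sw.Elapsed;

Console.WriteLine($"sequential: {sequentialTime.TotalMilliseconds:F1} ms, parallel: {parallelTime.TotalMilliseconds:F1} ms");
```

Checking that both versions produce identical output is part of the comparison: a fast parallel version that computes something different is not a win.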
If your application is CPU bound (i.e. it isn't spending time reading files or waiting for data from a device) and there is little to no sharing of live data between the threads (data being altered; read-only data is fine), then you can pretty much increase the speed by 50-75% by adding another thread (as long as it still remains CPU bound, of course).
The main overhead in multithreading comes from 2 places.
Creation and initialization of the thread. Creating a thread requires quite a few resources to be allocated and involves switches between kernel and user mode. This is expensive, but it is a one-off cost per thread, so you can pretty much ignore it if the thread runs for any reasonable amount of time. The best way to mitigate this problem is to use a thread pool, as it keeps threads on hand so they don't need to be recreated.
Handling synchronization of data. If one thread is reading data that another is writing, bad things will generally happen (worse if both are changing it). This requires you to lock your data before altering it so that no thread reads a half-written value. These locks are generally quite slow as well. To mitigate this problem, design your data layout so that the threads need to read or write the same data as little as possible. If you do need a lot of these locks, it can become slower than the single-threaded option.
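The two layouts can be compared in a small sketch: summing squares with a lock taken on every iteration versus writing to disjoint slots and combining once at the end. The workload is arbitrary; time both with a Stopwatch on your own machine rather than trusting any particular ratio.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

int[] input = Enumerable.Range(0, 100_000).ToArray();

// Shared-data version: every iteration contends for the same lock.
long lockedTotal = 0;
var gate = new object();
Parallel.For(0, input.Length, i =>
{
    long square = (long)input[i] * input[i];
    lock (gate) { lockedTotal += square; }
});

// Disjoint layout: each iteration writes only its own slot, so no lock is needed;
// the partial results are combined once, on one thread, at the end.
var squares = new long[input.Length];
Parallel.For(0, input.Length, i => squares[i] = (long)input[i] * input[i]);
long disjointTotal = squares.Sum();

Console.WriteLine($"{lockedTotal} == {disjointTotal}");
```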
In short, if you are doing something that requires the CPUs to share a lot of data, multithreading it will be slower, and if the program isn't CPU bound there will be little or no difference (it could be a lot slower, depending on what it is bound to, e.g. a CD or hard drive). If your program is CPU bound and shares little data, then it will PROBABLY be worthwhile to add another thread (though the only way to be certain is profiling).
One more little note, you should only create as many CPU bound threads as you have physical cores (threads that idle most of the time, such as a GUI message pump thread, can be ignored for this condition).
P.S. You can reduce the cost of locking data by using a methodology called "lock-free programming", though this is something that should really only be attempted by people with a lot of experience with multithreading and a clear understanding of their target architecture (including how the cache and the memory bus behave).
I agree with Luke's answer. Benchmark it, it's the only way to be sure.
I can also give a prediction of the results - the fastest version will be when the number of threads matches the number of cores, EXCEPT if the array is very small and each thread would have to process just a few items, the setup/teardown times might get larger than the processing itself. How few - that depends on what you do. Again - benchmark.
I'd advise to find out a "minimum number of items for a thread to be useful". Then, when you are deciding how many threads to spawn (or take from a pool), check how many cores the computer has and how many items there are. Spawn as many threads as possible, but no more than the computer has cores, and not so many that each thread would have less than the minimum number of items to process.
For example if the minimum number of items is, say, 1000; and the computer has 4 cores; and your list contains 2500 items, you would spawn just 2 threads, because more threads would be inefficient (each would process less than 1000 items).
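The rule fits in a few lines. A sketch with hypothetical helper names; with the numbers from the example (a minimum of 1000 items, 4 cores, 2500 items) it picks 2 threads and splits the list into two contiguous ranges. When there are fewer items than the minimum, it falls back to a single thread rather than zero.

```csharp
using System;

// minItemsPerThread would come from your own measurements; it is a hypothetical parameter here.
static int ThreadCount(int itemCount, int minItemsPerThread, int coreCount)
{
    int byWork = itemCount / minItemsPerThread; // threads that would each get at least the minimum
    return Math.Max(1, Math.Min(coreCount, byWork));
}

// Split [0, itemCount) into threadCount contiguous ranges of nearly equal size.
static (int Start, int End)[] SplitRanges(int itemCount, int threadCount)
{
    var ranges = new (int Start, int End)[threadCount];
    for (int t = 0; t < threadCount; t++)
        ranges[t] = ((int)((long)itemCount * t / threadCount), (int)((long)itemCount * (t + 1) / threadCount));
    return ranges;
}

int threads = ThreadCount(2500, 1000, 4);   // 2
var ranges = SplitRanges(2500, threads);     // (0, 1250), (1250, 2500)
Console.WriteLine($"{threads} threads: {string.Join(", ", ranges)}");
```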
Making a step by step list for Luke's idea:
Make a single threaded test app
Download Sysinternals Process Explorer and run it
Run your test app and find it on the process list (remember to run it as a release build outside of Visual Studio)
Double click the process and select the Performance Graph tab
Observe the CPU time used by your process
If the CPU time is sitting flat at 50% for more than a few seconds, you can probably speed your overall process up using threads (assuming the bunch of stuff Mr Peters referred to holds true)
(However, the best you can do on a dual-core machine is to halve the time it takes to run. If your process only takes 4 seconds, it might not be worth getting it to run in 2 seconds.)
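A similar check can be approximated from inside the program by comparing the process's CPU time with wall-clock time. A rough sketch; `BusyWork` is a hypothetical stand-in for the single-threaded job. A result near 1 / ProcessorCount (50% on a dual-core machine) suggests one core is saturated; the figure also includes other runtime threads such as the GC, so treat it as approximate.

```csharp
using System;
using System.Diagnostics;

// Hypothetical single-threaded CPU-bound job.
static double BusyWork(int iterations)
{
    double acc = 0;
    for (int i = 0; i < iterations; i++) acc += Math.Sqrt(i);
    return acc;
}

Process process = Process.GetCurrentProcess();
TimeSpan cpuBefore = process.TotalProcessorTime;
var wall = Stopwatch.StartNew();

double result = BusyWork(50_000_000);

wall.Stop();
process.Refresh(); // discard any cached process information before reading it again
TimeSpan cpuUsed = process.TotalProcessorTime - cpuBefore;

// Fraction of the whole machine this process used while the job ran.
double utilization = cpuUsed.TotalMilliseconds / (wall.Elapsed.TotalMilliseconds * Environment.ProcessorCount);
Console.WriteLine($"result={result:E3}, CPU utilization: {utilization:P0} of {Environment.ProcessorCount} logical processors");
```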
Using the task parallel library / Rx provides a friendlier interface than System.Threading.ThreadPool, which might make your world a bit easier.
In my opinion you miss one item, which is that it is not always about execution time. There is:
The problem of keeping a UI operational during an operation. Even if the UI is "dormant", a nonresponsive message pump makes a worse impression.
The possibility of using a thread pool so you don't have to start/stop threads all the time. I use thread pools very extensively, and various parts of the applications keep them busy.
Anyhow, ignoring my point 1 (where you may go multithreaded without speeding things up, in order to keep your UI responsive), I would say it is always faster when you can actually either split up the work (so you can keep more than one core busy) or offload it for other reasons.
