We have a computation-intensive web application that serves at most a few dozen users at any given time. However, we have encountered situations where a task performed by just one user can prevent the entire site from processing any other request.
I read that the thread pool has up to 128 threads by default. How can a single thread deprive the remaining threads of CPU time? How does that work? Is the operating system judging that data access, for example, deserves higher priority because the TCP connection to SQL Server should be kept reliable while a large dataset is being fetched or saved?
Can someone with deeper insight into how things actually work enlighten me on this?
What about multi-core CPUs? We have 8 CPU cores on the server. Will they participate in the processing, or do we have to increase the number of processes / actively engage in parallel processing to take advantage of the multi-core environment?
Related
I have written a program in C# that does a lot of parallel work using different threads. When I reach approximately 300 threads, the GUI of the program starts to become slow and the execution of the threads slows down drastically. The threads are reading and writing data from a MySQL database running on a different machine.
The funny thing is that if I split the work between two processes on the same machine, everything runs perfectly. Is there a thread limit per process in the .NET Framework or in Windows? Or why am I getting this behaviour? Could it be a network-related problem? I am running Windows 7 Ultimate and I have tried both VS2010 and VS2012, with the same behaviour.
The way processor time is allocated is that the Operating System gives processor time to every process, then every process gives time to every thread.
So two processes will get twice the processor time, and that's why it works faster if you divide the program into two processes.
If you want to make the GUI run smoother, just set the priority higher for that thread.
This way the GUI thread will get more processor time than the other threads, but not so much that it noticeably slows down the other threads.
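As a rough sketch of what that might look like (the worker body is just a placeholder):

using System.Threading;

// Sketch: run this line from the UI thread itself (e.g. in the form's
// constructor) so the GUI thread is scheduled ahead of the workers.
Thread.CurrentThread.Priority = ThreadPriority.AboveNormal;

// Optionally start workers slightly below normal so they yield to the GUI.
var worker = new Thread(() => { /* worker logic goes here */ })
{
    Priority = ThreadPriority.BelowNormal,
    IsBackground = true
};
worker.Start();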
300 threads is silly.
The number of threads should be in the range of your number of cores (2..8) and/or the max simultaneous connections (sometimes only 4 over TCP) your system supports.
Go beyond that and you're only wasting memory, at roughly 1 MB of stack per thread. On a 32-bit system, 300 MB already consumes a lot of the available memory space, and I assume each thread has some buffers attached as well.
If 2 separate processes perform better than 1, then it probably isn't the context switching but either memory usage or a connection limit that is holding you back.
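If you do end up keeping a large number of mostly-idle threads, one way to reduce that roughly 1 MB-per-thread cost is the Thread constructor overload that takes a maximum stack size; a minimal sketch, where the 256 KB figure is only an illustration, not a recommendation:

using System.Threading;

// Sketch: cap this thread's stack at 256 KB instead of the default ~1 MB,
// so 300 such threads reserve ~75 MB of address space instead of ~300 MB.
var t = new Thread(() => { /* mostly-blocking work goes here */ },
                   maxStackSize: 256 * 1024);
t.IsBackground = true;
t.Start();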
Use the ThreadPool. It automatically throttles the number of threads in existence, allocating a number appropriate to your system, and you can also set the maximum number of threads allowed at any one time.
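A minimal sketch of what that can look like (the cap of 64 and the work inside the callback are arbitrary placeholders, not tuned values):

using System;
using System.Threading;

// Sketch: cap the pool, then queue work items and let the pool schedule them.
ThreadPool.SetMaxThreads(workerThreads: 64, completionPortThreads: 64);

for (int i = 0; i < 1000; i++)
{
    int jobId = i;                              // capture the loop variable
    ThreadPool.QueueUserWorkItem(_ =>
    {
        // the per-job database read/write would go here
        Console.WriteLine("job {0} on pool thread {1}",
                          jobId, Thread.CurrentThread.ManagedThreadId);
    });
}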
Also, if you're spinning up threads to parallelize tasks from within a for loop, foreach loop, or LINQ statement, you should look at the Parallel class or PLINQ.
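For instance, a hand-rolled thread-per-item loop could be replaced with something like this sketch (the data and the transform are placeholders):

using System;
using System.Linq;
using System.Threading.Tasks;

// Sketch: let the framework decide how many threads to use.
int[] workItems = Enumerable.Range(0, 100).ToArray();   // placeholder data

Parallel.ForEach(workItems, item =>
{
    Console.WriteLine("processing {0}", item);           // your logic here
});

// PLINQ equivalent for a query-shaped workload:
var results = workItems.AsParallel()
                       .Select(item => item * 2)         // placeholder transform
                       .ToList();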
The accepted answer to this question will probably explain what is happening, but 300 threads seems like too many to be a good idea for any normal application.
First of all, if you have 300 threads in an application, you should probably rethink your program design.
Raising the GUI thread's priority may give you better GUI performance. But if you run that many threads, the OS has to allocate stack space for each of them, and a stack is a contiguous segment of memory. So each time you create a new thread, the memory set aside for stacks may not be able to hold it, and the OS may then have to allocate a larger contiguous region and copy all the data from the old stack to the new one. This can obviously slow your program down.
I am using WMI to monitor some hundreds of hosts, polling for CPU usage about every 5 seconds. I am using C#'s thread pool to run the currently scheduled WMI queries; usually no more than 30 or so threads are running queries at once. Sometimes there is a gap of about 16 seconds instead of 5, with no visible CPU usage. Because the CPU is underutilized, I suspect the bottleneck is in RPC or the TCP/IP stack. However, I think it is not the TCP/IP stack, because the connections are held open permanently. So I suspect the bottleneck is in RPC on the monitoring machine.
Is there any RPC tuning I can do on the monitoring machine?
UPDATE 1:
I have already done some .NET tuning before I posted. I have tuned the ThreadPool with the ThreadPool.SetMinThreads(200, 200) and ThreadPool.SetMaxThreads(300,300) calls. I am using the Task objects, all created with TaskCreationOptions.LongRunning | TaskCreationOptions.PreferFairness.
I am using C#'s thread pool
Which is not a good idea if you are running code that does a lot of blocking and very little executing, like WMI queries. The thread pool scheduler tries to limit the number of executing threads to the number of cores on your machine. That's an optimization; it reduces the overhead lost to thread context switches. But it can't predict or detect that threads are not actually executing code. It has an adaptive scheduling algorithm to deal with this, allowing extra threads to run when the existing ones are not finishing, but it adapts slowly.
You can call ThreadPool.SetMinThreads() to increase the number of threads that are allowed to execute concurrently. The default is the number of cores. Increasing it to 30 fixes your problem, but it has global side effects. Using a dedicated Thread instead of the thread pool is a local solution.
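As a rough sketch of both options (the value 30 comes from this answer; the polling body and interval are placeholders):

using System;
using System.Threading;

// Option 1 (global): let ~30 pool threads run at once without waiting for
// the pool's slow ramp-up. Affects every ThreadPool user in the process.
ThreadPool.SetMinThreads(workerThreads: 30, completionPortThreads: 30);

// Option 2 (local): give the blocking WMI polling its own dedicated thread,
// so the pool's heuristics never get involved.
var poller = new Thread(() =>
{
    while (true)
    {
        // the blocking WMI query for one host would go here
        Thread.Sleep(TimeSpan.FromSeconds(5));    // poll interval from the question
    }
})
{ IsBackground = true };
poller.Start();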
I have a TCP/IP server written in C# .NET which can easily hold 10,000 connections at once. However, when a callback is received from a socket, it is handled by a new thread from the application thread pool. This means the real limit on concurrent communication comes down to the number of threads in the thread pool: if those 10,000 connections all attempt to send data at the same time, the majority will have to wait while the thread pool works through them as fast as it can. Can anyone share their experience with high-performance socket services and advise how a large corporation would go about ensuring the 10,000 connections can not only be connected at the same time, but can also communicate at the same time? Thanks
Don't process the packets inline in the callback. Do the absolute minimum work there, and then hand them off to a separate worker thread pool via a producer-consumer queue that (ideally) never blocks the producer threads, which are your socket listeners. BlockingCollection<T> may be useful here.
You have to be careful that the queue does not grow without bound: if your consumers are much slower than your producers and the queue grows under normal load, you have a problem, and throttling the network receives is the obvious solution, despite its undesirability.
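A minimal sketch of that hand-off, assuming a trivial Packet type of your own; the bound of 10,000 and the consumer count are illustrative, not recommendations:

using System;
using System.Collections.Concurrent;
using System.Threading;

// Sketch: socket callbacks only enqueue; a small set of workers dequeues.
class Packet { public byte[] Data; }

class PacketPipeline
{
    // Bounded so a slow consumer applies back-pressure instead of eating memory.
    private readonly BlockingCollection<Packet> queue =
        new BlockingCollection<Packet>(boundedCapacity: 10000);

    public PacketPipeline(int consumerCount)
    {
        for (int i = 0; i < consumerCount; i++)
        {
            new Thread(Consume) { IsBackground = true }.Start();
        }
    }

    // Called from the socket receive callback: do the bare minimum and return.
    public void OnReceive(byte[] data)
    {
        queue.Add(new Packet { Data = data });   // only blocks if the queue is full
    }

    private void Consume()
    {
        foreach (var packet in queue.GetConsumingEnumerable())
        {
            // real packet processing (parsing, database work, ...) goes here
            Console.WriteLine("processed {0} bytes", packet.Data.Length);
        }
    }
}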
You're making a logical mistake here. Regardless of how many threads you have, data always has to wait unless you have one CPU core per connection. Scalability is not about having unlimited parallelism, but about being able to handle a lot of connections while keeping the CPU at full power.
The thread pool is perfectly sized for that. Once the CPU reaches full utilization, you cannot do anything more anyway.
and advise how a large corporation would go about ensuring the 10,000 connections can not only be connected at the same time, but can also communicate at the same time?
Many computers, with, say, a total of 500 processor cores between them. The trick is deciding what latency is acceptable; you don't need instant communication. You are trying to solve this from the wrong end.
I've been playing around with threading, attempting to push some limits to the extreme, for my own amusement. I know the thread pool defaults to 25 threads and can be pushed up to 1000 (according to MSDN). What, though, is the practical limit of threads per CPU core? At some point, context switching will cause more of a bottleneck than threading saves. Does anyone have any best practices covering this? Are we talking 100, 200, 500? Does it depend on what the threads are doing? What, other than framework-dictated architecture, determines how many threads operate optimally per CPU core?
It's all dependent on what the threads are doing, of course. If they are CPU-bound (say sitting tight in an infinite loop) then one thread per core will be enough to saturate the CPU; any more than that (and you will already have more, from background processes etc) and you will start getting contention.
On the other extreme, if the threads are not eligible to run (e.g. blocked on some synchronization object), then the limit of how many you could have would be dictated by factors other than the CPU (memory for the stacks, OS internal limits, etc).
If your application is not CPU-bound (like the majority), then context switches are not a big deal, because every time your app has to wait, a context switch is necessary anyway. The problem with having too many threads lies in the OS data structures and in synchronization anomalies like starvation, where a thread never (or very rarely) gets a chance to execute because of the randomness of the synchronization algorithms.
If your application is CPU-bound (it spends 99% of its time working in memory and very rarely does I/O or waits for something else such as user input or another thread), then the optimum is 1 thread per logical core, because in this case there will be no context switching.
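In practice that usually means sizing the work to Environment.ProcessorCount; here is a minimal sketch with placeholder CPU-bound work:

using System;
using System.Threading.Tasks;

// Sketch: cap CPU-bound work at one task per logical core.
var options = new ParallelOptions
{
    MaxDegreeOfParallelism = Environment.ProcessorCount
};

Parallel.For(0, 1000, options, i =>
{
    // purely CPU-bound placeholder work
    double x = 0;
    for (int k = 0; k < 1000000; k++) x += Math.Sqrt(k + i);
});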
Beware that the OS interrupts threads periodically, even when there are fewer threads than CPUs. The OS interrupts threads not only for task switching but also for thread management purposes (like updating the counters shown in Task Manager, or to allow a superuser to kill them).
I'm running a .NET remoting application built using .NET 2.0. It is a console app, although I removed the [STAThread] on Main.
The TCP channel I'm using uses a ThreadPool in the background.
I've been told that when running on a dual-core box under heavy load, the application never uses more than 50% of the CPU (although I've seen it at 70% or more on a quad core).
Is there any restriction in terms of multi-core for remoting apps or ThreadPools?
Is it needed to change something in order to make a multithreaded app run on several cores?
Thanks
There shouldn't be.
There are several reasons why you could be seeing this behavior:
Your threads are IO bound.
In that case you won't see a lot of parallelism, because everything will be waiting on the disk. A single disk is inherently sequential.
Your lock granularity is too small
Your app may be spending most of its time acquiring locks rather than executing your application logic. This can slow things down considerably.
Your lock granularity is too big
If your locking is not granular enough, your other threads may spend a lot of time waiting.
You have a lot of lock contention
Your threads might all be trying to lock the same resources at the same time, making them inherently sequential (see the sketch after this answer for the contrast).
You may not be partitioning your threads correctly.
You may be running the wrong things on multiple threads. For example, if you are using one thread per connection, you may not be taking advantage of available parallelism within the task you are running on that thread. Try splitting those tasks up into chunks that can run in parallel.
Your process may not have a lot of available parallelism
You might just be doing stuff that can't really be done in parallel.
I would try and investigate each one to see what the cause is.
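To make the lock-related cases above concrete, here is a small sketch contrasting one global lock with partitioned state; the thread and iteration counts are arbitrary:

using System;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

// Sketch: a single global lock makes the parallel loop effectively sequential;
// per-partition state lets the threads actually run side by side.
object gate = new object();
long sharedTotal = 0;

var sw = Stopwatch.StartNew();
Parallel.For(0, 8, _ =>
{
    for (int i = 0; i < 5000000; i++)
        lock (gate) { sharedTotal++; }          // every iteration contends here
});
Console.WriteLine("contended:   {0} ms, total {1}", sw.ElapsedMilliseconds, sharedTotal);

sw.Restart();
long[] partials = new long[8];
Parallel.For(0, 8, p =>
{
    long local = 0;
    for (int i = 0; i < 5000000; i++) local++;  // no shared state in the hot loop
    partials[p] = local;
});
Console.WriteLine("partitioned: {0} ms, total {1}",
                  sw.ElapsedMilliseconds, partials.Sum());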
Multithreaded applications will use all of your cores.
I suspect your behavior is due to this statement:
The TCP channel I'm using uses a ThreadPool in the background.
TCP, as well as most socket/file/etc code, tends to use very little CPU. It's spending most of its time waiting, so the CPU usage of your program will probably never spike. Try using the threadpool with heavy computations, and you'll see your processor spike to near 100% CPU usage.
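As a quick sketch of that contrast, queue one CPU-bound work item per core and watch CPU usage climb, unlike socket waits which leave the cores idle:

using System;
using System.Threading;

// Sketch: one CPU-bound work item per core; unlike I/O waits, these keep
// the processors busy, so overall CPU usage climbs toward 100%.
for (int i = 0; i < Environment.ProcessorCount; i++)
{
    ThreadPool.QueueUserWorkItem(_ =>
    {
        double x = 0;
        for (long k = 0; k < 500000000; k++) x += Math.Sqrt(k);
        Console.WriteLine(x);
    });
}

Console.ReadLine();   // keep the process alive while the work items run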
Multi-threaded apps are not required to be bound to a single core. You can check and change which cores a process runs on with Process.ProcessorAffinity, and set an individual thread's affinity with ProcessThread.ProcessorAffinity. I'm not sure what the default behavior is, but you can change it programmatically if you need to.
Here is an example of how to do this (taken directly from TechRepublic)
Console.WriteLine("Current ProcessorAffinity: {0}",
Process.GetCurrentProcess().ProcessorAffinity);
Process.GetCurrentProcess().ProcessorAffinity = (System.IntPtr)2;
Console.WriteLine("Current ProcessorAffinity: {0}",
Process.GetCurrentProcess().ProcessorAffinity);
And the output:
Current ProcessorAffinity: 3
Current ProcessorAffinity: 2
The code above first shows that the process is running on both cores (affinity mask 3, binary 11). Then it changes the process to use only the second core (mask 2, binary 10) and shows that the change took effect. You can read the .NET documentation on ProcessorAffinity to see what the various mask values mean.