I'm running a .NET remoting application built on .NET 2.0. It is a console app, although I removed the [STAThread] attribute from Main.
The TCP channel I'm using uses a ThreadPool in the background.
I've received reports that when running on a dual-core box under heavy load, the application never uses more than 50% of the CPU (although I've seen it at 70% or more on a quad core).
Is there any restriction in terms of multi-core for remoting apps or ThreadPools?
Do I need to change anything to make a multithreaded app run on several cores?
Thanks
There shouldn't be.
There are several reasons why you could be seeing this behavior:
Your threads are IO bound.
In that case you won't see a lot of parallelism, because everything will be waiting on the disk. A single disk is inherently sequential.
Your lock granularity is too small
Your app may be spending most of its time obtaining locks, rather than executing your app logic. This can slow things down considerably.
Your lock granularity is too big
If your locking is not granular enough, your other threads may spend a lot of time waiting.
You have a lot of lock contention
Your threads might all be trying to lock the same resources at the same time, making them inherently sequential.
You may not be partitioning your threads correctly.
You may be running the wrong things on multiple threads. For example, if you are using one thread per connection, you may not be taking advantage of available parallelism within the task you are running on that thread. Try splitting those tasks up into chunks that can run in parallel (see the sketch after this list).
Your process may not have a lot of available parallelism
You might just be doing stuff that can't really be done in parallel.
I would investigate each of these to see which is the cause.
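For the partitioning point above, here is a minimal sketch, compatible with .NET 2.0 to match the question, of splitting one large job into chunks and fanning them out to the ThreadPool. ProcessRange is a hypothetical stand-in for your per-chunk work.

using System;
using System.Threading;

class ChunkedWork
{
    static void ProcessInParallel(int totalItems, int chunkCount)
    {
        int chunkSize = (totalItems + chunkCount - 1) / chunkCount;
        using (ManualResetEvent done = new ManualResetEvent(false))
        {
            int pending = chunkCount;
            for (int c = 0; c < chunkCount; c++)
            {
                int start = c * chunkSize;
                int end = Math.Min(start + chunkSize, totalItems);
                ThreadPool.QueueUserWorkItem(delegate
                {
                    ProcessRange(start, end); // CPU-bound work on this chunk
                    if (Interlocked.Decrement(ref pending) == 0)
                        done.Set(); // last chunk to finish signals completion
                });
            }
            done.WaitOne(); // block until every chunk has finished
        }
    }

    static void ProcessRange(int start, int end) { /* per-item work here */ }
}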
Multithreaded applications will use all of your cores.
I suspect your behavior is due to this statement:
The TCP channel I'm using uses a ThreadPool in the background.
TCP, as well as most socket/file/etc. code, tends to use very little CPU. It spends most of its time waiting, so your program's CPU usage will probably never spike. Try giving the thread pool heavy computations instead, and you'll see your processor climb to near 100% CPU usage, as in the sketch below.
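A quick way to verify this: queue one CPU-bound loop per logical processor and watch Task Manager; usage should approach 100% on every core. This is just an illustrative sketch, not a benchmark.

using System;
using System.Threading;

class CpuSpike
{
    static void Main()
    {
        for (int i = 0; i < Environment.ProcessorCount; i++)
        {
            ThreadPool.QueueUserWorkItem(delegate
            {
                double x = 1.0001;
                while (true) x = Math.Sqrt(x) * x; // pure CPU work, never waits
            });
        }
        Console.ReadLine(); // keep the process alive while the loops spin
    }
}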
Multi-threaded apps are not required to be bound to a single core. You can inspect and change which cores a process may run on through Process.ProcessorAffinity, and restrict an individual thread with ProcessThread.ProcessorAffinity. By default a process may run on all cores, but you can change that programmatically if you need to.
Here is an example of how to do this (taken directly from TechRepublic):
Console.WriteLine("Current ProcessorAffinity: {0}",
Process.GetCurrentProcess().ProcessorAffinity);
Process.GetCurrentProcess().ProcessorAffinity = (System.IntPtr)2;
Console.WriteLine("Current ProcessorAffinity: {0}",
Process.GetCurrentProcess().ProcessorAffinity);
And the output:
Current ProcessorAffinity: 3
Current ProcessorAffinity: 2
The code above first shows that the process may run on both cores (mask 3, binary 11), then restricts it to the second core only (mask 2, binary 10) and shows the new value. You can read the .NET documentation on ProcessorAffinity to see what the various mask values mean.
Related
Are .NET threads lightweight user-mode threads or are they kernel-mode operating system threads?
Also, SQL Server aside, is there a one-to-one correspondence between a .NET thread and an operating system thread?
I am also intrigued by the Thread class's symmetric pair of methods, BeginThreadAffinity and EndThreadAffinity, whose documentation subtly suggests that .NET threads are lightweight abstractions over real operating system threads.
Also, I read a while ago on a Stack Overflow thread that Microsoft scrapped an attempt to maintain this separation in the CLR, like SQL Server does. There was some project underway to use the Fiber API for this purpose, I recollect, but I can't say I understood all the details of what I read.
I would like more detailed literature on this topic, such as the internal structure of a .NET thread vis-à-vis that of a thread created by Windows. While there is plenty of information available on the structure of a thread created by Windows (Jeffrey Richter's Advanced Windows Programming book being one source, for instance), I can't find any literature dedicated to the internal structure of a .NET thread.
One might argue that this information is available in the .NET source code, which is now publicly available, or through a disassembler such as Reflector or ILSpy. However, I don't see anything in the Thread class representing the Thread Control Block (TCB), Program Counter (PC), Stack Pointer (SP), the thread's wait queue, or the list of queues the thread is currently a member of.
Where can I read about this? Does the documentation mention any of it? I have read all of these pages from the MSDN but they do not seem to mention it.
.NET's threads are indeed abstractions, but you can basically think of them as nearly identical to OS threads. There are some key differences especially with respect to garbage collection, but to the vast majority of programmers (read: programmers who are unlikely to spin up WinDBG) there is no functional difference.
For more detail, read this
It's important to have some idea of the expected performance and the nature of the intended concurrency before you decide how to implement it. For instance, .NET code runs on a virtual machine, which adds a layer between your threads and the OS; that means more overhead to start executing than a direct OS thread has.
If your application needs significant concurrency on demand, .NET Tasks or threads (even the ThreadPool) will be slow to create and begin executing. In such a case you may benefit from Windows OS threads. However, you'll want to use the unmanaged thread pool (unless you're like me and prefer writing your own thread pool).
If the performance of kicking off an indeterminate number of threads on demand is not a design goal, you should prefer .NET concurrency: it is easier to program, and you can take advantage of the concurrency keywords built into the language.
To put this into perspective, I wrote a test application that tested an algorithm running in different concurrent units of execution. I ran the test under my own unmanaged thread pool, .NET Tasks, .NET Threads and the .NET ThreadPool.
With concurrency set to 512, meaning 512 concurrent units of execution were invoked as quickly as possible, I found anything .NET to be extremely slow off the starting block. I ran the test on several systems, from an i5 4-core desktop with 16 GB RAM to a Windows Server 2012 R2 machine, and the results were the same.
I captured the number of concurrent units of execution completed, how long each unit took to start, and CPU core utilization. The duration of each test was 30 seconds.
All tests resulted in well-balanced CPU core utilization (contrary to the beliefs of some). However, anything .NET would lose the race. In 30 seconds...
.NET Tasks and Threads (via the ThreadPool) had 34 tasks completed, with an 885 ms average start time.
All 512 OS threads ran to completion, with a 59 ms average start time.
Regardless of what anyone says, the path from invoking the API that starts a unit of execution to that unit actually executing is much longer in .NET.
I am wondering whether one C# thread uses up one CPU thread. For example:
If you have a CPU with eight hardware threads (a 4790K, for example) and you start two new C# threads, do you lose two? As in 8 - 2 = 6?
Threads are not cores. In the case you gave, your computer has 4 physical cores which, via Hyper-Threading, appear as 8 logical processors that can handle threads.
Computers handle hundreds of threads at once and switch between them extremely fast. If you create a Thread in your C# application, it will execute on any of the cores of your CPU. The more cores you have, the faster your computer runs, because it can handle more threads at once (though there are other factors at play in a computer's speed besides how many cores it has).
The virtual machine (the CLR) is free to do whatever it wants here, if I recall correctly. There are implementations using so-called green threads, for example, that don't map managed threads to native threads. However, on desktop x86, the CLR implementations we know try hard to map managed threads directly to system threads.
Now this is one layer, but it still says nothing about the CPU. The OS scheduler lets a hardware thread (virtualized because of hyper-threading) execute a native thread when it decides the time is right.
There is also such a thing as overcommit, which occurs when more native threads want to run (are present in the OS scheduler's ready queue) than the CPU has hardware threads to offer. Some threads then remain waiting until their weight (an internal measure of the OS scheduler) decides it's their turn.
You can check this document for more information: CFS Scheduler
Then as for the CLR, maybe this question: How does a managed thread work and exist on a native platform?
Indeed, in that discussion the useful MSDN page
Managed and unmanaged threading in Windows
explains that:
An operating-system ThreadId has no fixed relationship to a managed thread, because an unmanaged host can control the relationship between managed and unmanaged threads. Specifically, a sophisticated host can use the Fiber API to schedule many managed threads against the same operating system thread, or to move a managed thread among different operating system threads.
This is so for very interesting reasons. One, of course, is freedom of virtual machine implementation. Another is the freedom to cross the native/managed barrier with lots of flexibility, as explained on that page: notably, a native thread can enter a managed application domain from the outside and have the virtual machine allocate a managed thread object for it automatically. This bookkeeping is maintained using thread-id hashes. Crazy stuff :)
Actually, we can see on the CLR Inside Out page that the CLR has a configurable thread API mapping. So it is quite flexible, which indeed proves that you cannot say for sure that one C# thread == one native thread. The sketch below shows how to observe both identities.
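Here is a small sketch contrasting the two identities. It P/Invokes the Win32 GetCurrentThreadId to obtain the native id; on the desktop CLR the pairing is stable, but as the quote above notes, a host is allowed to remap it.

using System;
using System.Runtime.InteropServices;
using System.Threading;

class ThreadIds
{
    [DllImport("kernel32.dll")]
    static extern uint GetCurrentThreadId(); // the OS (native) thread id

    static void Main()
    {
        // The CLR's id is stable for the managed thread's lifetime...
        Console.WriteLine("Managed id: {0}", Thread.CurrentThread.ManagedThreadId);
        // ...while this id belongs to whichever OS thread is running it.
        Console.WriteLine("Native id:  {0}", GetCurrentThreadId());
    }
}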
As JRLambert already touched on, you can have many threads running on the same core.
When you spawn new threads in your .NET application, you are not necessarily multi-"tasking" at all. What's happening is time-slicing: your CPU constantly switches between the different threads, making them seem to work in parallel. However, a core can only work on a single thread at any given time.
When you use the Task Parallel Library (TPL), for instance, on a machine with multiple cores, you'll see the cores get more evenly distributed workloads than when simply spawning new threads. Microsoft really did some fantastic work with TPL to simplify multi-threaded asynchronous programming without the heavy lifting with the thread pool and synchronization primitives we had to do in the past.
Have a look at this YouTube video. He describes time-slicing vs. multi-tasking quite well.
If you're not familiar with TPL yet, Sacha Barber has a fantastic series on Code Project.
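To make the TPL point concrete, here is a minimal sketch (requires .NET 4 or later): Parallel.For partitions the iteration range and load-balances the chunks across the available cores for you.

using System;
using System.Threading.Tasks;

class TplDemo
{
    static void Main()
    {
        // Iterations are distributed over ThreadPool worker threads;
        // the printed thread ids show the work spreading across cores.
        Parallel.For(0, 100, i =>
        {
            Console.WriteLine("Item {0} on thread {1}",
                i, System.Threading.Thread.CurrentThread.ManagedThreadId);
        });
    }
}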
Just want to share an experience as an enthusiastic programmer. I hope it may help someone.
I was developing code that defined a number of tasks planned to start at fraction-of-a-second intervals. However, I noticed that the tasks ran in groups of 8! I wondered whether the number of CPU threads imposed a limitation here. I had expected all the tasks, say 40, to start one by one in a queue, according to the time interval I set in the code. My old laptop had a Core i5 CPU with 4 cores and 8 threads; that same number 8 confused me so much I wanted to hit my head on a wall :)
Task<object>[] tasks = new Task<object>[maxLoop];
String[] timeStamps = new String[maxLoop];
for (int i = 0; i < maxLoop; i++)
{
tasks[i] = SubmitRequest();
timeStamps[i] = DateTime.Now.TimeOfDay.ToString();
await Task.Delay(timeInterval);
}
await Task.WhenAll(tasks);
After reading the Microsoft documentation, I reached this page: ThreadPool.SetMinThreads(Int32, Int32) Method.
According to the page, "By default, the minimum number of threads is set to the number of processors on a system. You can use the SetMinThreads method to increase the minimum number of threads."
then I added this to the code:
ThreadPool.SetMinThreads(maxLoop, maxLoop);
and the problem was resolved.
Just note that unnecessarily increasing the minimum thread count may decrease performance: if too many tasks start at the same time, overall throughput may suffer. Normally it is advised to let the thread pool decide with its own thread-allocation algorithm.
I had set up my thread pool like this:
ThreadPool.SetMaxThreads(10000, 10000);
ThreadPool.SetMinThreads(20, 20);
However, my app started hanging under heavy load. This seemed to be because worker tasks were not executing: I had used ThreadPool.QueueUserWorkItem to run some tasks which in turn used the same method to queue further work. This is obviously dangerous with a limited thread pool (a deadlock situation), but I am using a thread pool not to limit maximum threads but to reduce thread creation overhead.
I can see the potential trap there, but I believed that setting a maximum of 10000 threads on the pool would mean that if an item was queued while all threads were busy and there weren't yet 10000 threads in the pool, a new thread would be created and the task processed there.
However, I changed to this:
ThreadPool.SetMaxThreads(10000, 10000);
ThreadPool.SetMinThreads(200, 200);
...and the app started working. If that made it start working, am I missing something about how/when the thread pool expands from its minimum toward its maximum size?
The job of the threadpool scheduler is to ensure there are no more executing TP threads than CPU cores. The default minimum is equal to the number of cores. A happy number, since that minimizes the overhead of thread context switching. Twice a second, the scheduler steps in and allows another thread to execute if the existing ones haven't completed.
It will therefore take an hour and twenty minutes of threads not completing to reach your new maximum. It is fairly unlikely to ever get there; a 32-bit machine will keel over once 2000 threads have consumed all available virtual memory. You'd have a shot at it on a 64-bit operating system with a very large paging file. Lots of RAM is required to avoid paging death; you'd need at least 12 gigabytes.
The generic diagnosis is that you are using TP threads inappropriately: they take too long, usually because they block on I/O. A regular Thread is the proper choice for those kinds of jobs. That's probably hard to fix right now, especially since you're happy with what you've got. Raising the minimum is indeed a quick workaround. You'll have to hand-tune it, since the TP scheduler can't do a reasonable job anymore.
Whenever you use the thread pool, you are at the mercy of its "thread injection and retirement algorithm".
The algorithm is not properly documented (that I know of) and not configurable.
If you're using Tasks, you can write your own TaskScheduler; a minimal sketch follows.
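For illustration only, here is a minimal TaskScheduler that sidesteps the pool's injection algorithm by giving every queued task its own dedicated background thread. A production scheduler would reuse threads; this is just a sketch of the extension point.

using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

sealed class DedicatedThreadTaskScheduler : TaskScheduler
{
    protected override void QueueTask(Task task)
    {
        // One fresh background thread per task: no injection delay,
        // but also no thread reuse.
        Thread t = new Thread(() => TryExecuteTask(task));
        t.IsBackground = true;
        t.Start();
    }

    protected override bool TryExecuteTaskInline(Task task, bool taskWasPreviouslyQueued)
    {
        return TryExecuteTask(task); // allow running synchronously on the caller
    }

    protected override IEnumerable<Task> GetScheduledTasks()
    {
        return new Task[0]; // nothing is ever held in a queue here (debugger support)
    }
}

// Usage:
// Task.Factory.StartNew(work, CancellationToken.None,
//     TaskCreationOptions.None, new DedicatedThreadTaskScheduler());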
The performance issue you described is similar to what is documented in this ASP.NET KB article:
http://support.microsoft.com/kb/821268
To summarize, you need to carefully choose the parameters (this article mentions the typical settings for default ASP.NET thread pool, but you can apply the trick to your app), and further tune them based on performance testing and the characteristics of your app.
Note that as you learn more about load, you will see that "heavy load" is no longer a good term to describe the situation; sometimes you need to categorize the cases further using more precise terms, such as burst load, and so on.
If your logic depends on having a minimum number of threads, you need to change that, urgently.
Setting a MinThreads of 200 (or even 20) wastes quite a bit of memory. Note that MaxThreads won't be relevant here; you probably don't have the roughly 10 GB of memory that 10000 thread stacks would require.
The fact that a minimum of 200 helps you out is suspicious, and as a solution it is probably very brittle.
Take a look at normal Producer/Consumer patterns, and/or use a bounded queue to couple your tasks; a sketch follows.
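A minimal sketch of that bounded-queue coupling, using BlockingCollection (.NET 4+); the capacity of 100 and the item counts are arbitrary illustrative choices. A fast producer blocks instead of flooding the pool with work.

using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

class BoundedPipeline
{
    static void Main()
    {
        var queue = new BlockingCollection<int>(100); // bounded to 100 items

        var consumer = Task.Factory.StartNew(() =>
        {
            foreach (int item in queue.GetConsumingEnumerable())
                Console.WriteLine("processed {0}", item); // stand-in for real work
        });

        for (int i = 0; i < 1000; i++)
            queue.Add(i);       // blocks whenever the queue is full
        queue.CompleteAdding(); // signal the consumer no more items are coming

        consumer.Wait();
    }
}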
Scenario
I have a very heavy number-crunching process that pulls large datasets from 3 different databases and then does a bit of processing on each to eventually produce a result.
This process is fine when run for a single asset. However, I now have 3500 assets to process, which takes about 1 hr 30 min with the current process.
Question
What is my best option for speeding this process up with a multi-threaded C# application? Realistically I don't have to share anything between the processing of individual assets, so I'm confident that processing multiple assets at a time shouldn't cause too many issues.
Thoughts
I've heard good things about thread pools, but realistically I want something that isn't too huge to implement, is easily understandable, and can run a decent number of threads at a time.
Help would be greatly appreciated.
In .NET you can use the existing ThreadPool; there's no need to implement one yourself. Here is the relevant MSDN.
You should take care not to run too many work items at once (3500 simultaneously would be a bit much), but the supplied queuing mechanism should get you started in the right direction.
Another thing to try is PLINQ; a minimal sketch follows.
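In this sketch, the asset ids and ProcessAsset are hypothetical stand-ins for your own data and routine; AsParallel() spreads the work across the available cores automatically.

using System;
using System.Linq;

class PlinqDemo
{
    static void Main()
    {
        int[] assets = { 1, 2, 3, 4, 5 }; // stand-in for your 3500 asset ids

        // AsParallel() partitions the source and runs ProcessAsset on
        // multiple cores; ToList() waits for all the results.
        var results = assets.AsParallel()
                            .Select(a => ProcessAsset(a))
                            .ToList();

        Console.WriteLine("Processed {0} assets", results.Count);
    }

    static double ProcessAsset(int id) { return id * 2.0; } // hypothetical work
}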
If you don't have a multi-core processor or multiple machines, and the threads' work is not I/O bound, multithreading will not help. Start by profiling the current processing to see where the time is going.
Thread pools are fine, and you can use a task queue to do simple load-balancing, but if there are no spare CPU cycles in the current application this would be a waste of time.
The nicest option would be to use the new Task Parallel Library in .NET 4, if you can do this using VS 2010 RC. This has built-in load balancing and work stealing queues, so it will make this task easy to thread, and very scalable.
However, if you need to do this in .NET 3.5, I would recommend using the ThreadPool, just using ThreadPool.QueueUserWorkItem to start each task; a sketch follows.
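A minimal .NET 3.5-era sketch of that approach, where "assets" and ProcessAsset are hypothetical stand-ins for your own types; the pool decides how many items run concurrently.

using System;
using System.Threading;

class AssetRunner
{
    static void ProcessAll(string[] assets)
    {
        if (assets.Length == 0) return;
        int pending = assets.Length;
        using (ManualResetEvent done = new ManualResetEvent(false))
        {
            foreach (string asset in assets)
            {
                string current = asset; // per-iteration copy for the closure
                ThreadPool.QueueUserWorkItem(delegate
                {
                    ProcessAsset(current);
                    if (Interlocked.Decrement(ref pending) == 0)
                        done.Set(); // last item signals completion
                });
            }
            done.WaitOne(); // wait for every asset to finish
        }
    }

    static void ProcessAsset(string asset) { /* number crunching here */ }
}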
If your tasks are all very computationally intensive for their entire lifetime, you may want to prevent having too many running concurrently. Some form of queue, which you pull work from and execute, can be beneficial in this case. Just place all of your work items into a queue, and have threads pull work from the queue (with appropriate locking), and process.
If you have a multi-core system, and CPU cycles are your bottleneck, this should scale very well.
The built-in .NET ThreadPool meets both of your requirements: it runs a decent number of threads and is simple to work with. I have previously written an article on the subject, which you can find here.
With SQL Server 2005 or later, you can create user-defined functions in C# and use them from within T-SQL procedures, which can give a marked speedup for number crunching. SQL Server is multi-threaded and does a good job with it, so consider keeping as much of the processing in the database engine as you can.
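For illustration, a minimal SQL CLR scalar-function sketch; the class name, method name, and computation are all hypothetical. After deploying the assembly with CREATE ASSEMBLY / CREATE FUNCTION, T-SQL can call it like any scalar function.

using Microsoft.SqlServer.Server;
using System.Data.SqlTypes;

public class AssetMath
{
    // Marked deterministic so SQL Server may cache/index results.
    [SqlFunction(IsDeterministic = true)]
    public static SqlDouble CrunchValue(SqlDouble input)
    {
        if (input.IsNull) return SqlDouble.Null;
        return new SqlDouble(input.Value * input.Value); // stand-in computation
    }
}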
I have some embarrassingly-parallelizable work in a .NET 3.5 console app and I want to take advantage of hyperthreading and multi-core processors. How do I pick the number of worker threads that makes the best use of either on an arbitrary system? For example, if it's a dual core I will want 2 threads; quad core, 4 threads. What I'm ultimately after is determining the processor characteristics so I can know how many threads to create.
I'm not asking how to split up the work nor how to do threading, I'm asking how do I determine the "optimal" number of the threads on an arbitrary machine this console app will run on.
I'd suggest that you don't try to determine it yourself. Use the ThreadPool and let .NET manage the threads for you.
You can use Environment.ProcessorCount if that's the only thing you're after. But usually using a ThreadPool is indeed the better option.
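A minimal sketch of that first suggestion: size the worker count from Environment.ProcessorCount rather than hard-coding it (DoWork is a hypothetical stand-in for your task).

using System;
using System.Threading;

class WorkerStartup
{
    static void Main()
    {
        // One worker per logical processor reported by the machine.
        int workerCount = Environment.ProcessorCount;
        for (int i = 0; i < workerCount; i++)
        {
            Thread t = new Thread(DoWork);
            t.Start();
        }
    }

    static void DoWork() { /* CPU-bound work here */ }
}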
The .NET thread pool also has provisions for sometimes allocating more threads than you have cores to maximise throughput in certain scenarios where many threads are waiting for I/O to finish.
The correct number is obviously 42.
Now on the serious note. Just use the thread pool, always.
1) If you have a lengthy processing task (i.e., CPU intensive) that can be partitioned into multiple smaller work items, you should partition your task and submit the individual work items to the ThreadPool. The thread pool picks up work items and churns through them dynamically: it has self-monitoring capabilities, including starting new threads as needed, and it can be configured at deployment by administrators according to the deployment site's requirements, as opposed to numbers pre-computed at development time. While it is true that the proper partitioning size can take into account the number of CPUs available, the right answer depends so much on the nature of the task and the data that it is not even worth talking about at this stage (and besides, the primary concerns should be your NUMA nodes, memory locality, and interlocked cache contention, and only after that the number of cores).
2) If you're doing I/O (including DB calls), you should use asynchronous I/O and complete the calls in ThreadPool-invoked completion routines; a sketch follows this answer's main points.
These two are the only valid reasons to have multiple threads, and they're both best handled by using the ThreadPool. Anything else, including starting a thread per 'request' or 'connection', is in fact an anti-pattern in the Win32 API world (fork is a valid pattern in *nix, but definitely not on Windows).
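A minimal sketch of point 2, using the classic Begin/End (APM) pattern: BeginRead returns immediately, and the completion callback runs on a ThreadPool I/O-completion thread when the data arrives ("data.bin" is a hypothetical file).

using System;
using System.IO;

class AsyncRead
{
    static void Main()
    {
        // FileOptions.Asynchronous requests true overlapped I/O.
        FileStream fs = new FileStream("data.bin", FileMode.Open,
            FileAccess.Read, FileShare.Read, 4096, FileOptions.Asynchronous);
        byte[] buffer = new byte[4096];

        fs.BeginRead(buffer, 0, buffer.Length, delegate(IAsyncResult ar)
        {
            int bytesRead = fs.EndRead(ar); // completes on a pool thread
            Console.WriteLine("Read {0} bytes", bytesRead);
            fs.Close();
        }, null);

        Console.ReadLine(); // keep the process alive for the callback
    }
}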
For a more specialized and way, way more detailed discussion of the topic I can only recommend the Rick Vicik papers on the subject:
designing-applications-for-high-performance-part-1.aspx
designing-applications-for-high-performance-part-ii.aspx
designing-applications-for-high-performance-part-iii.aspx
The optimal number would just be the processor count. Optimally you would always have exactly one thread running on each CPU (logical or physical) to minimise context switches and the overhead that comes with them.
Whether that is the right number depends (very much as everyone has said) on what you are doing. The threadpool (if I understand it correctly) pretty much tries to use as few threads as possible but spins up another one each time a thread blocks.
Blocking is never optimal, but if you are doing any form of blocking then the answer changes dramatically.
The simplest and easiest way to get good (not necessarily optimal) behaviour is to use the threadpool. In my opinion it's really hard to do any better than the threadpool, so that's simply the best place to start; only think about something else if you can demonstrate why it is not good enough.
A good rule of the thumb, given that you're completely CPU-bound, is processorCount+1.
That's +1 because you will always get some tasks started/stopped/interrupted and n tasks will almost never completely fill up n processors.
The only way is a combination of data and code analysis based on performance data.
Different CPU families and speeds vs. memory speed vs other activities on the system are all going to make the tuning different.
Potentially some self-tuning is possible, but this will mean having some form of live performance tuning and self adjustment.
Or even better than the ThreadPool, use .NET 4.0 Task instances from the TPL. The Task Parallel Library is built on a foundation in the .NET 4.0 framework that will actually determine the optimal number of threads to perform the tasks as efficiently as possible for you.
I read something on this recently (see the accepted answer to this question for example).
The simple answer is that you let the operating system decide. It can do a far better job of deciding what's optimal than you can.
There are a number of questions on a similar theme - search for "optimal number threads" (without the quotes) gives you a couple of pages of results.
I would say it also depends on what you are doing. If you're making a server application, then using all the CPU you can via either Environment.ProcessorCount or a thread pool is a good idea.
But if this is running on a desktop or a machine that is not dedicated to this task, you might want to leave some CPU idle so the machine still "functions" for the user.
It can be argued that the real way to pick the best number of threads is for the application to profile itself and adaptively change its threading behavior based on what gives the best performance.
I wrote a simple number crunching app that used multiple threads, and found that on my Quad-core system, it completed the most work in a fixed period using 6 threads.
I think the only real way to determine it is through trials or profiling.
In addition to processor count, you may want to take into account the process's processor affinity by counting bits in the affinity mask returned by the GetProcessAffinityMask function.
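A small sketch of that idea, using the managed Process.ProcessorAffinity property (the managed view of the mask GetProcessAffinityMask returns) and counting its set bits.

using System;
using System.Diagnostics;

class AffinityCount
{
    static void Main()
    {
        // Each set bit in the mask is one logical processor the
        // process is allowed to run on.
        ulong mask = (ulong)(long)Process.GetCurrentProcess().ProcessorAffinity;
        int allowedCores = 0;
        while (mask != 0)
        {
            allowedCores += (int)(mask & 1);
            mask >>= 1;
        }
        Console.WriteLine("Process may run on {0} core(s)", allowedCores);
    }
}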
If there is no excessive I/O or system-call activity while the threads are running, then the number of threads (excluding the main thread) should in general equal the number of processors/cores in your system; otherwise you can try increasing the number of threads through testing.