I am wondering if one C# thread uses up one CPU thread as example:
If you have a CPU with eight threads, for example: 4790k and you start two new C# threads, do you lose two threads? Like 8 - 2 = 6?
Threads are not cores. In the case you gave, your CPU (a 4790K) has 4 physical cores that present 8 logical processors (hardware threads) to the OS via Hyper-Threading.
Computers handle hundreds of threads at once and switch in-between them extremely fast. If you create a Thread class in your C# application, it will create a new thread that will execute on any of the cores on your CPU. The more cores that you have, the faster your computer runs because it can handle more threads at once (there are other factors at play in the speed of the computer than just how many cores you have).
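As an illustration (the class and names here are mine, not from the question), a minimal sketch of spawning one managed thread and asking the machine how many logical processors it reports:

```csharp
using System;
using System.Threading;

class ThreadDemo
{
    // Spawn one managed thread; the OS scheduler decides which core runs it.
    public static int RunOnWorker()
    {
        int workerId = 0;
        var worker = new Thread(() => workerId = Thread.CurrentThread.ManagedThreadId);
        worker.Start();
        worker.Join(); // block until the worker finishes
        return workerId;
    }

    static void Main()
    {
        Console.WriteLine($"Worker ran on managed thread {RunOnWorker()}");
        Console.WriteLine($"This machine reports {Environment.ProcessorCount} logical processors");
    }
}
```

Starting this thread does not "use up" a hardware thread; it just adds one more schedulable unit for the OS to time-slice.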
The virtual machine (CLR) is free to do whatever it wants, if I recall correctly. There are implementations called green threads, for example, that don't necessarily map managed threads to native threads. However, in .NET on desktop x86, the implementations we know try hard to really map managed threads to system threads.
Now this is one layer, but it still says nothing about the CPU. The OS scheduler will let a hardware thread (virtualized because of hyper threading) execute a native thread when it decides it's OK.
There is also such a thing as overcommit: too many native threads want to run (they sit in the OS scheduler's ready queue), but the CPU has only so many hardware threads, so some remain waiting until their weight, an internal measure used by the OS scheduler, decides it is their turn.
You can check this document for more information: CFS Scheduler
Then as for the CLR, maybe this question: How does a managed thread work and exist on a native platform?
Indeed, in that discussion the useful MSDN page "Managed and unmanaged threading in Windows" explains that:
An operating-system ThreadId has no fixed relationship to a managed thread, because an unmanaged host can control the relationship between managed and unmanaged threads. Specifically, a sophisticated host can use the Fiber API to schedule many managed threads against the same operating system thread, or to move a managed thread among different operating system threads.
The reasons are interesting. One, of course, is freedom of virtual machine implementation. Another is the flexibility of crossing the native/managed barrier, as explained on that page; notably, native threads can enter managed application domains from the native side, and the virtual machine will allocate a managed thread object for them automatically. This mapping is maintained using thread ID hashes. Crazy stuff :)
Actually, we can see on the CLR Inside Out page that the CLR has a configurable thread API mapping. So it is quite flexible, and that indeed proves that you cannot say for sure that one C# thread == one native thread.
As JRLambert already touched on, you can have many threads running on the same core.
When you spawn new threads in your .NET application, you are not actually multi-"tasking" at all. What is happening is time-slicing: your CPU constantly switches between the different threads, making them seem to work in parallel. However, a single core (more precisely, a single logical processor) can only execute one thread at any given instant.
When implementing the Task Parallel Library (TPL) for instance, and your machine has multiple cores, you'll see that the cores get more equally distributed work loads, than when simply spawning new threads. Microsoft really did some fantastic work with TPL to simplify a multi-thread asynchronous environment without having to do heavy work with the Thread Pool and Synchronization primitives as we may have had to in the past.
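As a rough sketch of what that looks like (not the poster's code), Parallel.For lets the TPL partition a CPU-bound loop across the available cores with thread-local partial results:

```csharp
using System;
using System.Threading.Tasks;

class TplDemo
{
    // Sum 0..999 with the TPL: the runtime partitions the range across cores.
    public static long ParallelSum()
    {
        long total = 0;
        object gate = new object();
        Parallel.For(0, 1000,
            () => 0L,                               // thread-local partial sum
            (i, loopState, partial) => partial + i, // accumulate without locking
            partial => { lock (gate) total += partial; }); // merge once per thread
        return total;
    }

    static void Main() => Console.WriteLine(ParallelSum()); // 499500
}
```

The thread-local accumulator is the design point here: each worker locks only once at the end, instead of contending on `total` for every iteration.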
Have a look at this YouTube video. He describes time-slicing vs. multi-tasking quite well.
If you're not familiar with TPL yet, Sacha Barber has a fantastic series on Code Project.
Just want to share an experience as an enthusiastic programmer. I hope it may help someone.
I was developing code that defined a certain number of tasks planned to start at a fraction-of-a-second interval. However, I noticed that the tasks always ran in groups of 8! I wondered whether the number of CPU threads imposed a limitation here. Actually, I expected all tasks, say 40, to start one by one in a queue, according to the time interval I set in the code. My old laptop had a Core i5 CPU with 4 cores and 8 threads; seeing that same number 8 confused me so much that I wanted to hit my head on a wall :)
Task<object>[] tasks = new Task<object>[maxLoop];
String[] timeStamps = new String[maxLoop];
for (int i = 0; i < maxLoop; i++)
{
    tasks[i] = SubmitRequest();
    timeStamps[i] = DateTime.Now.TimeOfDay.ToString();
    await Task.Delay(timeInterval);
}
await Task.WhenAll(tasks);
After reading the Microsoft documents, I reached this page: ThreadPool.SetMinThreads(Int32, Int32) Method
According to the page, "By default, the minimum number of threads is set to the number of processors on a system. You can use the SetMinThreads method to increase the minimum number of threads."
Then I added this to the code:
ThreadPool.SetMinThreads(maxLoop, maxLoop);
and the problem was resolved.
Just note that unnecessarily increasing the minimum thread count may decrease performance: if too many tasks start at the same time, overall throughput may suffer. So normally it is advised to leave thread allocation to the thread pool's own algorithm.
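For completeness, a small sketch of reading and then raising the pool's floor, along the lines of the fix above (the helper name is mine; the target of 40 matches the example):

```csharp
using System;
using System.Threading;

class PoolFloorDemo
{
    // Read the pool's current minimum, then raise it so a burst of `target`
    // tasks is not throttled by the pool's gradual thread injection.
    public static (int worker, int io) RaiseFloor(int target)
    {
        ThreadPool.GetMinThreads(out int worker, out int io);
        Console.WriteLine($"Default minimum: {worker} worker / {io} I/O threads");
        ThreadPool.SetMinThreads(Math.Max(worker, target), Math.Max(io, target));
        ThreadPool.GetMinThreads(out worker, out io);
        return (worker, io);
    }

    static void Main() => RaiseFloor(40);
}
```

By default the minimum equals the processor count, which is exactly why the tasks above started in groups of 8 on an 8-logical-processor machine.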
Related
Are .NET threads lightweight user-mode threads or are they kernel-mode operating system threads?
Also, sparing SQL Server, is there a one-to-one correspondence between a .NET thread and an operating system thread?
I am also intrigued because the Thread class has a symmetric pair of methods named BeginThreadAffinity and EndThreadAffinity, whose documentation subtly suggests that .NET threads are lightweight abstractions over real operating system threads.
Also, I read a while ago in some Stack Overflow thread that Microsoft scrapped an attempt to maintain this separation in the CLR, like SQL Server does. There was some project underway to use the Fiber API for this purpose, I recollect, but I can't say I understood all the details of what I read.
I would like some more detailed literature on this topic such as the internal structure of a .NET thread vis-a-vis that of a thread created by Windows. While there is plenty of information available on the structure of a thread created by Windows, Jeffrey Richter's Advanced Windows Programming book being one of the sources, for instance, I can't find any literature dedicated to the internal structure of a .NET thread.
One might argue that this information is available in the .NET source code, which is now publicly available, or using a disassembler such as Reflector or IL Spy, but I don't see anything to represent the Thread Control Block (TCB) and Program Counter (PC) and Stack Pointer (SP) or the thread's wait queue, or the list of queues that the thread is currently a member of in the Thread class.
Where can I read about this? Does the documentation mention any of it? I have read all of these pages from the MSDN but they do not seem to mention it.
.NET's threads are indeed abstractions, but you can basically think of them as nearly identical to OS threads. There are some key differences especially with respect to garbage collection, but to the vast majority of programmers (read: programmers who are unlikely to spin up WinDBG) there is no functional difference.
For more detail, read this
It's important to have some idea of the expected performance and the nature of the intended concurrency before you decide how to do concurrency. For instance, .NET runs on a virtual machine (the CLR), which adds a layer of management on top of OS threads. This means more overhead for starting execution than direct OS threads.
If your application intends to have significant concurrency on demand, .NET tasks or threads (even ThreadPool) will be slow to create and begin executing. In such a case you may benefit from Windows OS threads. However, you'll want to use the unmanaged thread pool (unless you're like me and you prefer writing your own thread pool).
If the performance of kicking off an indeterminate number of threads on demand is not a design goal, then you should consider the .NET concurrency primitives. The reason is that they are easier to program, and you can take advantage of the concurrency keywords built into the language.
To put this into perspective, I wrote a test application that tested an algorithm running in different concurrent units of execution. I ran the test under my own unmanaged thread pool, .NET Tasks, .NET Threads and the .NET ThreadPool.
With concurrency set to 512, meaning 512 concurrent units of execution are invoked as quickly as possible, I found anything .NET to be extremely slow off the starting block. I ran the test on several systems, from an i5 4-core desktop with 16 GB of RAM to a Windows Server 2012 R2 machine, and the results were the same.
I captured the number of concurrent units of execution completed, how long each unit took to start, and CPU core utilization. The duration of each test was 30 seconds.
All tests resulted in well-balanced CPU core utilization (contrary to the beliefs of some). However, anything .NET would lose the race. In 30 seconds...
.NET Tasks and Threads (ThreadPool) had 34 Tasks completed with an 885ms average start time
All 512 OS Threads ran to completion with a 59ms average start time.
Regardless of what anyone says, the path from invoking an API to start a unit of execution and the actual unit executing, is much longer in .NET.
I have written a program which depends on threads heavily. In addition, there is a requirement to measure the total time taken by each thread, and also the execution time (kernel time plus user time).
There can be an arbitrary number of threads and many may run at once. This is down to user activity. I need them to run as quickly as possible, so using something which has some overhead like WMI/Performance Monitor to measure thread times is not ideal.
At the moment, I'm using GetThreadTimes, as shown in this article: http://www.codeproject.com/KB/dotnet/ExecutionStopwatch.aspx
My question is simple: I understand .NET threads may not correspond on a one-to-one basis with system threads (though in all my testing so far, it seems to have been one to one). That being the case, if .NET decides to put two or more of my threads into one system thread, am I going to get strange results from my timing code? If so (or even if not), is there another way to measure the kernel and user time of a .NET thread?
As it is stated elsewhere: multithreading is managed internally by a thread scheduler, a function the CLR typically delegates to the operating system. A thread scheduler ensures all active threads are allocated appropriate execution time, and that threads that are waiting or blocked (for instance, on an exclusive lock or on user input) do not consume CPU time.
Theoretically the .NET team could implement their own scheduler, but I doubt it. So I think the GetThreadTimes function is what you need.
I'm currently trying to figure what is the best way to minimize the amount of threads I use in a TCP master server, in order to maximize performance.
As I've been reading a lot recently with the new async features of C# 5.0, asynchronous does not necessarily mean multithreaded. It could mean separated in smaller chunks of finite state objects, then processed alongside other operations, by alternating. However, I don't see how this could be done in networking, since I'm basically "waiting" for input (from the client).
Therefore, I wouldn't use ReceiveAsync() for all my sockets; it would just be creating and ending threads continuously (assuming it does create threads).
Consequently, my question is more or less: what architecture can a master server take without having one "thread" per connection?
Side question for bonus coolness points: Why is having multiple threads bad, considering that having an amount of threads that is over your amount of processing cores simply makes the machine "fake" multithreading, just like any other asynchronous method would?
No, you would not necessarily be creating threads. There are two possible ways you can do async without setting up and tearing down threads all the time:
You can have a "small" number of long-lived threads, and have them sleep when there's no work to do (this means that the OS will never schedule them for execution, so the resource drain is minimal). Then, when work arrives (i.e. Async method called), wake one of them up and tell it what needs to be done. Pleased to meet you, managed thread pool.
In Windows, the most efficient mechanism for async is I/O completion ports which synchronizes access to I/O operations and allows a small number of threads to manage massive workloads.
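A minimal sketch of an accept loop in this style (the class and method names are mine; on Windows the awaitable socket APIs are backed by I/O completion ports, so no thread is parked per connection):

```csharp
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

class AsyncEcho
{
    // One accept loop; every connection is served by async I/O,
    // not by a dedicated thread.
    public static async Task RunAsync(TcpListener listener)
    {
        while (true)
        {
            TcpClient client = await listener.AcceptTcpClientAsync();
            _ = HandleAsync(client); // completion-based I/O drives it from here
        }
    }

    // Echo whatever the client sends until it closes the connection.
    public static async Task HandleAsync(TcpClient client)
    {
        using (client)
        {
            NetworkStream stream = client.GetStream();
            var buffer = new byte[4096];
            int read;
            while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                await stream.WriteAsync(buffer, 0, read); // echo back
        }
    }
}
```

Between the `await`s no thread is consumed at all; a pool thread is borrowed only for the brief continuations.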
Regarding multiple threads:
Having multiple threads is not bad for performance, if
the number of threads is not excessive
the threads do not oversaturate the CPU
If the number of threads is excessive then obviously we are taxing the OS with having to keep track of and schedule all these threads, which uses up global resources and slows it down.
If the threads are CPU-bound, then the OS will need to perform much more frequent context switches in order to maintain fairness, and context switches kill performance. In fact, with user-mode threads (which all highly scalable systems use -- think RDBMS) we make our lives harder just so we can avoid context switches.
Update:
I just found this question, which lends support to the position that you can't say how many threads are too much beforehand -- there are just too many unknown variables.
Seems like the *Async methods use IOCP (by looking at the code with Reflector).
Jon's answer is great. As for the 'side question'... See http://en.wikipedia.org/wiki/Amdahl%27s_law. Amdahl's law says that serial code quickly diminishes the gains to be had from parallel code. We also know that thread coordination (scheduling, context switching, etc.) is serial, so at some point more threads mean there are so many serial steps that the parallelization benefits are lost and you get a net negative performance. This is tricky stuff. That's why there is so much effort going into letting .NET manage threads while we define 'tasks' and let the framework decide what thread to run them on. The framework can switch between tasks much more efficiently than the OS can switch between threads, because the OS has a lot of extra things it needs to worry about when doing so.
Asynchronous work can be done without one-thread-per-connection or a thread pool, using OS support for select or poll (Windows supports this too, exposed via Socket.Select). I am not sure of the performance on Windows, but this is a very common idiom elsewhere.
One thread is the "pump" that manages the IO connections and monitors changes to the streams and then dispatches messages to/from other threads (conceivably 0 ... n depending upon model). Approaches with 0 or 1 additional threads may fall into the "Event Machine" category like twisted (Python) or POE (Perl). With >1 threads the callers form an "implicit thread pool" (themselves) and basically just offload the blocking IO.
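A sketch of such a pump using Socket.Select (the buffer size and timeout are arbitrary, and a real pump would dispatch the received bytes to handlers rather than drop them):

```csharp
using System.Collections.Generic;
using System.Net.Sockets;

class SelectPump
{
    // One "pump" thread multiplexes many sockets without a thread per connection.
    public static void PumpOnce(List<Socket> connections, int timeoutMicros)
    {
        if (connections.Count == 0) return;
        var readable = new List<Socket>(connections);
        // Select prunes the list in place; what remains has data (or a close) pending.
        Socket.Select(readable, null, null, timeoutMicros);
        foreach (var socket in readable)
        {
            var buffer = new byte[4096];
            int read = socket.Receive(buffer);
            if (read == 0)
                connections.Remove(socket); // peer closed the connection
            // else: dispatch buffer[0..read) to a handler / another thread
        }
    }
}
```

Calling PumpOnce in a loop from a single thread is the "pump" described above; sockets that time out simply stay in the list for the next pass.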
There are also approaches like Actors, Continuations or Fibres exposed in the underlying models of some languages which alter how the basic problem is approached -- don't wait, react.
Happy coding.
I have a soft real-time .NET application running on W2008R2.
I just realised that I can't explain precisely how threads are being scheduled.
And to my embarrassment, I don't know how it works down at the OS-thread level at all.
So I will explain what I know, and I'd appreciate it if anyone could help me fill the gaps and refer me to a simple description of the algorithms used in .NET and Windows to schedule threads.
My code runs in managed threads. As far as I know, managed threads (let's call them .NET threads) run on unmanaged threads (let's call them OS threads).
I know that threads compete for CPU time and other resources, and that there is a piece of software, the scheduler, which monitors resources and threads and makes the whole thing work.
Here I am not sure: is there just one scheduler for the OS, or is there also a .NET scheduler that schedules .NET threads? And if there are two schedulers, how do they work together?
And what are the rules to schedule a thread?
I want to narrow this question to a thread which does only arithmetic computations and no access to any devices so we are talking purely CPU consuming thread.
How does the scheduler factor in .NET thread priority, process priority, and a shared resource (the CPU in this case)?
The current versions of the .NET desktop runtime do have a 1:1 mapping between .NET threads and OS threads. This may change in future versions (and may not be true in other hosts; I know it's not true in AutoCAD but I believe it's true in SQL Server and MS Office).
Note that the .NET thread pool is different than the OS thread pool (though it does make use of a single process-wide I/O Completion Port, which has some pool-like semantics).
For full and complete answers, you'll need two books: Windows Internals and Concurrent Programming on Windows. I recommend reading them in that order.
I have some embarrassingly-parallelizable work in a .NET 3.5 console app and I want to take advantage of hyperthreading and multi-core processors. How do I pick the best number of worker threads to utilize either of these the best on an arbitrary system? For example, if it's a dual core I will want 2 threads; quad core I will want 4 threads. What I'm ultimately after is determining the processor characteristics so I can know how many threads to create.
I'm not asking how to split up the work nor how to do threading, I'm asking how do I determine the "optimal" number of the threads on an arbitrary machine this console app will run on.
I'd suggest that you don't try to determine it yourself. Use the ThreadPool and let .NET manage the threads for you.
You can use Environment.ProcessorCount if that's the only thing you're after. But usually using a ThreadPool is indeed the better option.
The .NET thread pool also has provisions for sometimes allocating more threads than you have cores to maximise throughput in certain scenarios where many threads are waiting for I/O to finish.
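A quick sketch of querying those numbers (the class name is mine):

```csharp
using System;
using System.Threading;

class CpuInfo
{
    // The logical processor count the thread pool sizes itself from.
    public static int LogicalProcessors() => Environment.ProcessorCount;

    static void Main()
    {
        Console.WriteLine($"Logical processors: {LogicalProcessors()}");
        ThreadPool.GetMaxThreads(out int workers, out int io);
        Console.WriteLine($"Pool ceiling: {workers} worker / {io} I/O threads");
    }
}
```

Note that the pool's ceiling is far above the core count, precisely because of the I/O-wait scenario mentioned above.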
The correct number is obviously 42.
Now on a serious note: just use the thread pool, always.
1) If you have a lengthy processing task (i.e. CPU-intensive) that can be partitioned into multiple pieces of work, you should partition your task and submit the individual work items to the ThreadPool. The thread pool will pick up the work items and start churning on them dynamically; it has self-monitoring capabilities, including starting new threads as needed, and can be configured at deployment by administrators according to the deployment site's requirements, as opposed to pre-computing the numbers at development time. While it is true that the proper partition size can take into account the number of CPUs available, the right answer depends so much on the nature of the task and the data that it is not even worth discussing at this stage (besides, the primary concerns should be your NUMA nodes, memory locality, and interlocked cache contention, and only after that the number of cores).
2) If you're doing I/O (including DB calls), then you should use asynchronous I/O and complete the calls in ThreadPool-run completion routines.
These two are the only valid reasons why you should have multiple threads, and they're both best handled by using the ThreadPool. Anything else, including starting a thread per 'request' or 'connection', is in fact an anti-pattern in the Win32 API world (fork is a valid pattern in *nix, but definitely not on Windows).
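As a sketch of case 1) above (the names and partition sizes are mine, purely illustrative): partition a CPU-bound sum into work items, queue them to the pool, and let it decide how many threads actually run.

```csharp
using System;
using System.Threading;

class PartitionDemo
{
    // Split a CPU-bound sum into `pieces` work items for the thread pool.
    public static long SumInPieces(int pieces, int chunk)
    {
        long[] partial = new long[pieces];
        using (var done = new CountdownEvent(pieces))
        {
            for (int p = 0; p < pieces; p++)
            {
                int piece = p; // capture a fresh copy per work item
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    long sum = 0;
                    for (int i = piece * chunk; i < (piece + 1) * chunk; i++)
                        sum += i;
                    partial[piece] = sum;
                    done.Signal();
                });
            }
            done.Wait(); // all pieces finished
        }
        long total = 0;
        foreach (long s in partial) total += s;
        return total;
    }

    static void Main() => Console.WriteLine(SumInPieces(16, 1000)); // sum of 0..15999
}
```

Notice that nothing here hard-codes a thread count; the pool may run the 16 items on 4 threads or 8 depending on the machine.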
For a more specialized and way, way more detailed discussion of the topic I can only recommend the Rick Vicik papers on the subject:
designing-applications-for-high-performance-part-1.aspx
designing-applications-for-high-performance-part-ii.aspx
designing-applications-for-high-performance-part-iii.aspx
The optimal number would just be the processor count. Optimally you would always have exactly one thread running on each CPU (logical or physical) to minimise context switches and the overhead that comes with them.
Whether that is the right number depends (very much as everyone has said) on what you are doing. The threadpool (if I understand it correctly) pretty much tries to use as few threads as possible but spins up another one each time a thread blocks.
The blocking is never optimal but if you are doing any form of blocking then the answer would change dramatically.
The simplest and easiest way to get good (not necessarily optimal) behaviour is to use the threadpool. In my opinion it's really hard to do any better than the threadpool, so that's simply the best place to start; only think about something else if you can demonstrate why it is not good enough.
A good rule of thumb, given that you're completely CPU-bound, is processorCount + 1.
That's +1 because you will always get some tasks started/stopped/interrupted and n tasks will almost never completely fill up n processors.
The only way is a combination of data and code analysis based on performance data.
Different CPU families and speeds vs. memory speed vs other activities on the system are all going to make the tuning different.
Potentially some self-tuning is possible, but this will mean having some form of live performance tuning and self adjustment.
Or even better than the ThreadPool, use .NET 4.0 Task instances from the TPL. The Task Parallel Library is built on a foundation in the .NET 4.0 framework that will actually determine the optimal number of threads to perform the tasks as efficiently as possible for you.
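A minimal sketch of that approach (the work items here are trivial squares, purely illustrative): queue the items as Tasks and let the TPL pick the degree of parallelism.

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class TaskDemo
{
    // Queue 100 independent work items as Tasks; the TPL chooses the thread count.
    public static int SumOfSquares()
    {
        Task<int>[] tasks = Enumerable.Range(0, 100)
                                      .Select(n => Task.Run(() => n * n))
                                      .ToArray();
        Task.WaitAll(tasks); // block until every work item has completed
        return tasks.Sum(t => t.Result);
    }

    static void Main() => Console.WriteLine(SumOfSquares()); // 328350
}
```

(Task.Run is the .NET 4.5 spelling; on plain .NET 4.0 the equivalent is Task.Factory.StartNew.)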
I read something on this recently (see the accepted answer to this question for example).
The simple answer is that you let the operating system decide. It can do a far better job of deciding what's optimal than you can.
There are a number of questions on a similar theme - search for "optimal number threads" (without the quotes) gives you a couple of pages of results.
I would say it also depends on what you are doing. If you're making a server application, then using everything you can get out of the CPUs, via either Environment.ProcessorCount or a thread pool, is a good idea.
But if this is running on a desktop or a machine that is not dedicated to the task, you might want to leave some CPU idle so the machine still "functions" for the user.
It can be argued that the real way to pick the best number of threads is for the application to profile itself and adaptively change its threading behavior based on what gives the best performance.
I wrote a simple number crunching app that used multiple threads, and found that on my Quad-core system, it completed the most work in a fixed period using 6 threads.
I think the only real way to determine is through trialling or profiling.
In addition to processor count, you may want to take into account the process's processor affinity by counting bits in the affinity mask returned by the GetProcessAffinityMask function.
If there is no excessive I/O processing or system-call activity while the threads are running, then the number of threads (excluding the main thread) should in general equal the number of processors/cores in your system; otherwise you can try increasing the number of threads by testing.