Are .NET threads different from operating system threads?

Are .NET threads different from operating system threads? - c#

Are .NET threads lightweight user-mode threads or are they kernel-mode operating system threads?
Also, sparing SQL Server, is there a one-to-one correspondence between a .NET thread an an operating system thread?
I am also intrigued because the Thread class that has a symmetric pair of methods named BeginThreadAffinity and EndThreadAffinity, whose documentation subtly suggests that .NET threads are lightweight abstractions over real operating system threads.
Also, I read a while ago on some stack overflow thread itself that Microsoft scratched an attempt to maintain this separation in the CLR, like SQL Server does. There was some project underway to use the Fiber API for this purpose, I recollect, but I can't say I understood all the details of what I read.
I would like some more detailed literature on this topic such as the internal structure of a .NET thread vis-a-vis that of a thread created by Windows. While there is plenty of information available on the structure of a thread created by Windows, Jeffrey Richter's Advanced Windows Programming book being one of the sources, for instance, I can't find any literature dedicated to the internal structure of a .NET thread.
One might argue that this information is available in the .NET source code, which is now publicly available, or using a disassembler such as Reflector or IL Spy, but I don't see anything to represent the Thread Control Block (TCB) and Program Counter (PC) and Stack Pointer (SP) or the thread's wait queue, or the list of queues that the thread is currently a member of in the Thread class.
Where can I read about this? Does the documentation mention any of it? I have read all of these pages from the MSDN but they do not seem to mention it.

.NET's threads are indeed abstractions, but you can basically think of them as nearly identical to OS threads. There are some key differences especially with respect to garbage collection, but to the vast majority of programmers (read: programmers who are unlikely to spin up WinDBG) there is no functional difference.
For more detail, read this

It's important to have some ideas about expected performance and the nature of intended concurrency, before you decide how to do concurrency. For instance, .NET is a virtual machine, so concurrency is simulated. This means more overhead for starting the execution than direct OS threads.
If your application intends to have significant concurrency on demand, .NET tasks or threads (even ThreadPool) will be slow to create and begin executing. In such a case you may benefit from Windows OS threads. However, you'll want to use the unmanaged thread pool (unless you're like me and you prefer writing your own thread pool).
If performance of kicking off an indeterminate number of threads on demand is not a design goal then you should consider the .NET concurrency. The reason is it is easier to program and you can take advantage of keywords impacting concurrency built into the programming language.
To put this into perspective, I wrote a test application that tested an algorithm running in different concurrent units of execution. I ran the test under my own unmanaged thread pool, .NET Tasks, .NET Threads and the .NET ThreadPool.
With concurrency set to 512, meaning 512 concurrent units of execution will be invoked as quickly as possible, I found anything .NET to be extremely slow off the starting block. I ran the test on several systems from an i5 4-core desktop with 16gb RAM to a Windows Server 2012 R2 and the results are the same.
I captured the number of concurrent units of execution completed, how long each unit took to start, and CPU core utilization. The duration of each test was 30 seconds.
All tests resulted in well-balanced CPU core utilization (contrary to the beliefs of some). However, anything .NET would lose the race. In 30 seconds...
.NET Tasks and Threads (ThreadPool) had 34 Tasks completed with an 885ms average start time
All 512 OS Threads ran to completion with a 59ms average start time.
Regardless of what anyone says, the path from invoking an API to start a unit of execution and the actual unit executing, is much longer in .NET.

Related

Is one C# thread one CPU thread?

I am wondering if one C# thread uses up one CPU thread as example:
If you have a CPU with eight threads, for example: 4790k and you start two new C# threads, do you lose two threads? Like 8 - 2 = 6?

Threads are not cores. In the case you gave, your computer has 8 cores (4 physical and 4 logical) that can handle threads.
Computers handle hundreds of threads at once and switch in-between them extremely fast. If you create a Thread class in your C# application, it will create a new thread that will execute on any of the cores on your CPU. The more cores that you have, the faster your computer runs because it can handle more threads at once (there are other factors at play in the speed of the computer than just how many cores you have).

The virtual machine (CLR) is free to do whatever it wants if I recall correctly. There are implementations called green threads for example, that don't necessarily map managed threads to native threads. However, of course, in .NET on a desktop x86, the implementation we know try hard to really map to system threads.
Now this is one layer, but it still says nothing about the CPU. The OS scheduler will let a hardware thread (virtualized because of hyper threading) execute a native thread when it decides it's OK.
There is such a thing known as overcommit which speaks of when too many native threads wants to run (are present in the OS's scheduler ready queue), but the CPU have only that many threads to use, so some remains waiting, until their weight which is an internal measure of the OS scheduler decides that it's their turn.
You can check this document for more information: CFS Scheduler
Then as for the CLR, maybe this question: How does a managed thread work and exist on a native platform?
Indeed in the discussion the useful MSDN page
Managed and unmanaged threading in Windows
explains that:
An operating-system ThreadId has no fixed relationship to a managed
thread, because an unmanaged host can control the relationship between
managed and unmanaged threads. Specifically, a sophisticated host can
use the Fiber API to schedule many managed threads against the same
operating system thread, or to move a managed thread among different
operating system threads.
For very interesting reasons. One of course is the freedom of virtual machine implementation. But another one is the freedom of passing between the native/managed barrier with lots of flexibility as explained on this page. Notably the fact that native threads can enter to managed application domains backwards, and see the virtual machine allocating a managed thread object automatically. This management is maintained using thread id hashes. Crazy stuff :)
Actually, we can see on page CLR Inside Out that the CLR has configurable thread API mapping. So quite flexible, and that proves indeed that you cannot say for sure that one C# thread == one native thread.

As JRLambert already touched on, you can have many threads running on the same core.
When you spawn new threads in your .NET application, you are not actually multi-"tasking" at all. What's happening is time-slicing. Your CPU will constantly switch between the different threads letting them seem like they're working in parallel. However, a core can only work on a single thread at any given time.
When implementing the Task Parallel Library (TPL) for instance, and your machine has multiple cores, you'll see that the cores get more equally distributed work loads, than when simply spawning new threads. Microsoft really did some fantastic work with TPL to simplify a multi-thread asynchronous environment without having to do heavy work with the Thread Pool and Synchronization primitives as we may have had to in the past.
Have a look at this YouTube video. He describes time-slicing vs. multi-tasking quite well.
If you're not familiar with TPL yet, Sacha Barber has a fantastic series on Code Project.

Just want to share an experience as an enthusiastic programmer. I hope it may help someone.
I was developing a code, defining a certain amount of tasks which were planned to start in a fraction of a second interval. however, I noticed that all tasks were run in a group of 8! I was wondering whether the number of CPU thread imposed a limitation here? Actually, I expected all tasks, say 40, should've started exactly in a queue, one by one, according to time interval I set in the code. My old laptop had a CPU corei5, with 4 cores and 8 threats. the same number of 8 made me so confused so that I wanted to hit my head on a wall :)
Task<object>[] tasks = new Task<object>[maxLoop];
String[] timeStamps = new String[maxLoop];
for (int i = 0; i < maxLoop; i++)
{
tasks[i] = SubmitRequest();
timeStamps[i] = DateTime.Now.TimeOfDay.ToString();
await Task.Delay(timeInterval);
}
await Task.WhenAll(tasks);
After reading the Microsoft documents, I reached to this page:ThreadPool.SetMinThreads(Int32, Int32) Method
According to the page, "By default, the minimum number of threads is set to the number of processors on a system. You can use the SetMinThreads method to increase the minimum number of threads."
then I added this to the code:
ThreadPool.SetMinThreads(maxLoop, maxLoop);
and the problem was resolved.
Just notice unnecessarily increasing the minimum threads may decrease performance. If too many tasks start at the same time, it may impact the overall speed of the task. So normally, it was advised to leave it to the thread pool to decide with its own algorithm for allocating threads.

Cross AppDomain Thread Pool W/O Context Switching

Our application is heavily CPU bound to process billions of tuples of data as fast as possible. It uses multi-cores and soon moving to distributed in the cloud.
So the goal is to using the CPU absolutely as efficiently as possible. This question is about how to maintain that high level of performance will allows plugs to loaded/unloaded dynamically at run time.
Please understand that while it's easy to communicate across AppDomains, none of the "easy" ways will meet the performance requirements above. So this questions discusses the reasons that the common techniques are too slow and further requirement plus specific questions of our ideas to solve.
To achieve the performance, the application has been designed to communicate among components via "message passing" which means that we have a user mode task scheduler to keep those threads busy without ever (except when necessary) relinquishing any time slices or context switches to the operating system.
The ONLY way to unload DLLs in .NET is via AppDomains. However, to maintain the high level of performance described above it means that the thread pool (we have our own home-grown thread pool) must be able to perform tasks in various different AppDomains.
More specifically, it will be terrible performance to have separate threads for each of dozens of AppDomains that then compete for CPU. More threads than cores for CPU bound work will kill performance as the operating systems spends tremendous time context switching among threads.
There seems after preliminary research zero efficient way for our own thread pool to cross into other AppDomain with any kind of decent performance. Remoting or serializing are both impossibly slow even with zero arguments to methods.
Please note that this is not about data sharing. We don't want to use the thread calls into different AppDomain to passing data. Instead, we expect to share data among AppDomains via sockets and memory mapped files for top flight performance just like proper Interprocess Communication. So this question only relates to get threads working across AppDomains.
The following link here on Stackoverflow which is over 2 years old hints at capitalizing on the built-in thread pool in the CLR for .Net which states that it crosses other AppDomains to do tasks. MS documentation also verifies that the CLR thread pool operates across all AppDomains.
.Net How to create a custom ThreadPool shared across all the AppDomain of a process?
Still after reading documentation, how to use the built-in thread pool across AppDomain while NEVER allowing any context switches when crossing AppDomains?
So the design goal is how to rotate the threads (one per core) to frequently check the "run queue" of tasks in each AppDomain to see if there is work to do there and then move to the next AppDomain? And so on, looping through each AppDomains scheduler? How to do that w/o waiting any context switching overhead or remoting or marshalling?
Note, of course, we'll have some cleverness into which AppDomains are assigned to each threads to avoid L1 cache misses and such to avoid hardware bottlenecks.
Also another idea that we wonder about is writing our own custom CLR host. It appears that the C++ API allows implementing our own thread pool. Does anyone know if that will allow for the above capabilities? If so, is that the only way to do it through unmanaged code?

Are Socket.*Async methods threaded?

I'm currently trying to figure what is the best way to minimize the amount of threads I use in a TCP master server, in order to maximize performance.
As I've been reading a lot recently with the new async features of C# 5.0, asynchronous does not necessarily mean multithreaded. It could mean separated in smaller chunks of finite state objects, then processed alongside other operations, by alternating. However, I don't see how this could be done in networking, since I'm basically "waiting" for input (from the client).
Therefore, I wouldn't use ReceiveAsync() for all my sockets, it would just be creating and ending threads continuously (assuming it does create threads).
Consequently, my question is more or less: what architecture can a master server take without having one "thread" per connection?
Side question for bonus coolness points: Why is having multiple threads bad, considering that having an amount of threads that is over your amount of processing cores simply makes the machine "fake" multithreading, just like any other asynchronous method would?

No, you would not necessarily be creating threads. There are two possible ways you can do async without setting up and tearing down threads all the time:
You can have a "small" number of long-lived threads, and have them sleep when there's no work to do (this means that the OS will never schedule them for execution, so the resource drain is minimal). Then, when work arrives (i.e. Async method called), wake one of them up and tell it what needs to be done. Pleased to meet you, managed thread pool.
In Windows, the most efficient mechanism for async is I/O completion ports which synchronizes access to I/O operations and allows a small number of threads to manage massive workloads.
Regarding multiple threads:
Having multiple threads is not bad for performance, if
the number of threads is not excessive
the threads do not oversaturate the CPU
If the number of threads is excessive then obviously we are taxing the OS with having to keep track of and schedule all these threads, which uses up global resources and slows it down.
If the threads are CPU-bound, then the OS will need to perform much more frequent context switches in order to maintain fairness, and context switches kill performance. In fact, with user-mode threads (which all highly scalable systems use -- think RDBMS) we make our lives harder just so we can avoid context switches.
Update:
I just found this question, which lends support to the position that you can't say how many threads are too much beforehand -- there are just too many unknown variables.

Seems like the *Async methods use IOCP (by looking at the code with Reflector).

Jon's answer is great. As for the 'side question'... See http://en.wikipedia.org/wiki/Amdahl%27s_law. Amdel's law says that serial code quickly diminishes the gains to be had from parallel code. We also know that thread coordination (scheduling, context switching, etc) is serial - so at some point more threads means there are so many serial steps that parallelization benefits are lost and you have a net negative performance. This is tricky stuff. That's why there is so much effort going into letting .NET manage threads while we define 'tasks' for the framework to decide what thread to run on. The framework can switch between tasks much more efficiently than the OS can switch between threads because the OS has a lot of extra things it needs to worry about when doing so.

Asynchronous work can be done without one-thread-per-connection or a thread pool with OS support for select or poll (and Windows supports this and it is exposed via Socket.Select). I am not sure of the performance on windows, but this is a very common idiom elsewhere.
One thread is the "pump" that manages the IO connections and monitors changes to the streams and then dispatches messages to/from other threads (conceivably 0 ... n depending upon model). Approaches with 0 or 1 additional threads may fall into the "Event Machine" category like twisted (Python) or POE (Perl). With >1 threads the callers form an "implicit thread pool" (themselves) and basically just offload the blocking IO.
There are also approaches like Actors, Continuations or Fibres exposed in the underlying models of some languages which alter how the basic problem is approached -- don't wait, react.
Happy coding.

Is there simple and conscious diagram / algorithm on threads scheduling?

I have soft real-time .NET application run on W2008R2.
I just realised that I cant explain how precisely threads are being scheduled.
And for my embarassement I dont know how it works down to OS threads at all.
So I will explain what I know and I'd appreciate if anyone could help me to fill the gaps and refer me to a simple description of algorithms which are used in .NET and Windows to schedule threads.
My code runs in managed threads. As I know managed threads (lets call them .NET threads) run in unmanaged threads (lets call them OS threads).
I know that threads are competing for CPU time and other resources. And that there is a piece of software - scheduler which monitors resources and threads and make the whole thing work.
Here I am not sure - is the scheduler just one for the OS or there is also .NET scheduler which schedules .NET threads? And if there are two schedulers then how they are working together?
And what are the rules to schedule a thread?
I want to narrow this question to a thread which does only arithmetic computations and no access to any devices so we are talking purely CPU consuming thread.
How scheduler factors .NET thread priority, process priority and a shared resource (CPU in this case) ?

The current versions of the .NET desktop runtime do have a 1:1 mapping between .NET threads and OS threads. This may change in future versions (and may not be true in other hosts; I know it's not true in AutoCAD but I believe it's true in SQL Server and MS Office).
Note that the .NET thread pool is different than the OS thread pool (though it does make use of a single process-wide I/O Completion Port, which has some pool-like semantics).
For full and complete answers, you'll need two books: Windows Internals and Concurrent Programming on Windows. I recommend reading them in that order.

Are multithreaded apps bound to a single core?

I'm running a .NET remoting application built using .NET 2.0. It is a console app, although I removed the [STAThread] on Main.
The TCP channel I'm using uses a ThreadPool in the background.
I've been reported that when running on a dual core box, under heay load, the application never uses more than 50% of the CPU (although I've seen it at 70% or more on a quad core).
Is there any restriction in terms of multi-core for remoting apps or ThreadPools?
Is it needed to change something in order to make a multithreaded app run on several cores?
Thanks

There shouldn't be.
There are several reasons why you could be seeing this behavior:
Your threads are IO bound.
In that case you won't see a lot of parallelism, because everything will be waiting on the disk. A single disk is inherently sequential.
Your lock granularity is too small
Your app may be spending most of it's time obtaining locks, rather than executing your app logic. This can slow things down considerably.
Your lock granularity is too big
If your locking is not granular enough, your other threads may spend a lot of time waiting.
You have a lot of lock contention
Your threads might all be trying to lock the same resources at the same time, making them inherently sequential.
You may not be partitioning your threads correctly.
You may be running the wrong things on multiple threads. For example, if you are using one thread per connection, you may not be taking advantage of available parallelism within the task you are running on that thread. Try splitting those tasks up into chunks that can run in parallel.
Your process may not have a lot of available parallelism
You might just be doing stuff that can't really be done in parallel.
I would try and investigate each one to see what the cause is.

Multithreaded applications will use all of your cores.
I suspect your behavior is due to this statement:
The TCP channel I'm using uses a ThreadPool in the background.
TCP, as well as most socket/file/etc code, tends to use very little CPU. It's spending most of its time waiting, so the CPU usage of your program will probably never spike. Try using the threadpool with heavy computations, and you'll see your processor spike to near 100% CPU usage.

Multi-threaded apps are not required to be bound to a single core. You can check the affinity (how many cores it operates on) of a thread by using ProcessThread.ProcessorAffinity. I'm not sure what the default behavior is, but you can change it programmatically if you need to.
Here is an example of how to do this (taken directly from TechRepublic)
Console.WriteLine("Current ProcessorAffinity: {0}",
Process.GetCurrentProcess().ProcessorAffinity);
Process.GetCurrentProcess().ProcessorAffinity = (System.IntPtr)2;
Console.WriteLine("Current ProcessorAffinity: {0}",
Process.GetCurrentProcess().ProcessorAffinity);
And the output:
Current ProcessorAffinity: 3
Current ProcessorAffinity: 2
The code above, first shows that the process is running on both cores. Then it changes to only use the second core and shows that it is now using only the second core. You can read the .NET documentation on ProcessorAffinity to see what the various numbers mean for affinity.

Develop Reference

C# (C-Sharp) is a programming language developed by Microsoft that runs on the .NET Framework.