Cross AppDomain Thread Pool W/O Context Switching - c#

Our application is heavily CPU bound to process billions of tuples of data as fast as possible. It uses multi-cores and soon moving to distributed in the cloud.
So the goal is to using the CPU absolutely as efficiently as possible. This question is about how to maintain that high level of performance will allows plugs to loaded/unloaded dynamically at run time.
Please understand that while it's easy to communicate across AppDomains, none of the "easy" ways will meet the performance requirements above. So this questions discusses the reasons that the common techniques are too slow and further requirement plus specific questions of our ideas to solve.
To achieve the performance, the application has been designed to communicate among components via "message passing" which means that we have a user mode task scheduler to keep those threads busy without ever (except when necessary) relinquishing any time slices or context switches to the operating system.
The ONLY way to unload DLLs in .NET is via AppDomains. However, to maintain the high level of performance described above it means that the thread pool (we have our own home-grown thread pool) must be able to perform tasks in various different AppDomains.
More specifically, it will be terrible performance to have separate threads for each of dozens of AppDomains that then compete for CPU. More threads than cores for CPU bound work will kill performance as the operating systems spends tremendous time context switching among threads.
There seems after preliminary research zero efficient way for our own thread pool to cross into other AppDomain with any kind of decent performance. Remoting or serializing are both impossibly slow even with zero arguments to methods.
Please note that this is not about data sharing. We don't want to use the thread calls into different AppDomain to passing data. Instead, we expect to share data among AppDomains via sockets and memory mapped files for top flight performance just like proper Interprocess Communication. So this question only relates to get threads working across AppDomains.
The following link here on Stackoverflow which is over 2 years old hints at capitalizing on the built-in thread pool in the CLR for .Net which states that it crosses other AppDomains to do tasks. MS documentation also verifies that the CLR thread pool operates across all AppDomains.
.Net How to create a custom ThreadPool shared across all the AppDomain of a process?
Still after reading documentation, how to use the built-in thread pool across AppDomain while NEVER allowing any context switches when crossing AppDomains?
So the design goal is how to rotate the threads (one per core) to frequently check the "run queue" of tasks in each AppDomain to see if there is work to do there and then move to the next AppDomain? And so on, looping through each AppDomains scheduler? How to do that w/o waiting any context switching overhead or remoting or marshalling?
Note, of course, we'll have some cleverness into which AppDomains are assigned to each threads to avoid L1 cache misses and such to avoid hardware bottlenecks.
Also another idea that we wonder about is writing our own custom CLR host. It appears that the C++ API allows implementing our own thread pool. Does anyone know if that will allow for the above capabilities? If so, is that the only way to do it through unmanaged code?

Related

Are .NET threads different from operating system threads?

Are .NET threads lightweight user-mode threads or are they kernel-mode operating system threads?
Also, sparing SQL Server, is there a one-to-one correspondence between a .NET thread an an operating system thread?
I am also intrigued because the Thread class that has a symmetric pair of methods named BeginThreadAffinity and EndThreadAffinity, whose documentation subtly suggests that .NET threads are lightweight abstractions over real operating system threads.
Also, I read a while ago on some stack overflow thread itself that Microsoft scratched an attempt to maintain this separation in the CLR, like SQL Server does. There was some project underway to use the Fiber API for this purpose, I recollect, but I can't say I understood all the details of what I read.
I would like some more detailed literature on this topic such as the internal structure of a .NET thread vis-a-vis that of a thread created by Windows. While there is plenty of information available on the structure of a thread created by Windows, Jeffrey Richter's Advanced Windows Programming book being one of the sources, for instance, I can't find any literature dedicated to the internal structure of a .NET thread.
One might argue that this information is available in the .NET source code, which is now publicly available, or using a disassembler such as Reflector or IL Spy, but I don't see anything to represent the Thread Control Block (TCB) and Program Counter (PC) and Stack Pointer (SP) or the thread's wait queue, or the list of queues that the thread is currently a member of in the Thread class.
Where can I read about this? Does the documentation mention any of it? I have read all of these pages from the MSDN but they do not seem to mention it.
.NET's threads are indeed abstractions, but you can basically think of them as nearly identical to OS threads. There are some key differences especially with respect to garbage collection, but to the vast majority of programmers (read: programmers who are unlikely to spin up WinDBG) there is no functional difference.
For more detail, read this
It's important to have some ideas about expected performance and the nature of intended concurrency, before you decide how to do concurrency. For instance, .NET is a virtual machine, so concurrency is simulated. This means more overhead for starting the execution than direct OS threads.
If your application intends to have significant concurrency on demand, .NET tasks or threads (even ThreadPool) will be slow to create and begin executing. In such a case you may benefit from Windows OS threads. However, you'll want to use the unmanaged thread pool (unless you're like me and you prefer writing your own thread pool).
If performance of kicking off an indeterminate number of threads on demand is not a design goal then you should consider the .NET concurrency. The reason is it is easier to program and you can take advantage of keywords impacting concurrency built into the programming language.
To put this into perspective, I wrote a test application that tested an algorithm running in different concurrent units of execution. I ran the test under my own unmanaged thread pool, .NET Tasks, .NET Threads and the .NET ThreadPool.
With concurrency set to 512, meaning 512 concurrent units of execution will be invoked as quickly as possible, I found anything .NET to be extremely slow off the starting block. I ran the test on several systems from an i5 4-core desktop with 16gb RAM to a Windows Server 2012 R2 and the results are the same.
I captured the number of concurrent units of execution completed, how long each unit took to start, and CPU core utilization. The duration of each test was 30 seconds.
All tests resulted in well-balanced CPU core utilization (contrary to the beliefs of some). However, anything .NET would lose the race. In 30 seconds...
.NET Tasks and Threads (ThreadPool) had 34 Tasks completed with an 885ms average start time
All 512 OS Threads ran to completion with a 59ms average start time.
Regardless of what anyone says, the path from invoking an API to start a unit of execution and the actual unit executing, is much longer in .NET.

Is one C# thread one CPU thread?

I am wondering if one C# thread uses up one CPU thread as example:
If you have a CPU with eight threads, for example: 4790k and you start two new C# threads, do you lose two threads? Like 8 - 2 = 6?
Threads are not cores. In the case you gave, your computer has 8 cores (4 physical and 4 logical) that can handle threads.
Computers handle hundreds of threads at once and switch in-between them extremely fast. If you create a Thread class in your C# application, it will create a new thread that will execute on any of the cores on your CPU. The more cores that you have, the faster your computer runs because it can handle more threads at once (there are other factors at play in the speed of the computer than just how many cores you have).
The virtual machine (CLR) is free to do whatever it wants if I recall correctly. There are implementations called green threads for example, that don't necessarily map managed threads to native threads. However, of course, in .NET on a desktop x86, the implementation we know try hard to really map to system threads.
Now this is one layer, but it still says nothing about the CPU. The OS scheduler will let a hardware thread (virtualized because of hyper threading) execute a native thread when it decides it's OK.
There is such a thing known as overcommit which speaks of when too many native threads wants to run (are present in the OS's scheduler ready queue), but the CPU have only that many threads to use, so some remains waiting, until their weight which is an internal measure of the OS scheduler decides that it's their turn.
You can check this document for more information: CFS Scheduler
Then as for the CLR, maybe this question: How does a managed thread work and exist on a native platform?
Indeed in the discussion the useful MSDN page
Managed and unmanaged threading in Windows
explains that:
An operating-system ThreadId has no fixed relationship to a managed
thread, because an unmanaged host can control the relationship between
managed and unmanaged threads. Specifically, a sophisticated host can
use the Fiber API to schedule many managed threads against the same
operating system thread, or to move a managed thread among different
operating system threads.
For very interesting reasons. One of course is the freedom of virtual machine implementation. But another one is the freedom of passing between the native/managed barrier with lots of flexibility as explained on this page. Notably the fact that native threads can enter to managed application domains backwards, and see the virtual machine allocating a managed thread object automatically. This management is maintained using thread id hashes. Crazy stuff :)
Actually, we can see on page CLR Inside Out that the CLR has configurable thread API mapping. So quite flexible, and that proves indeed that you cannot say for sure that one C# thread == one native thread.
As JRLambert already touched on, you can have many threads running on the same core.
When you spawn new threads in your .NET application, you are not actually multi-"tasking" at all. What's happening is time-slicing. Your CPU will constantly switch between the different threads letting them seem like they're working in parallel. However, a core can only work on a single thread at any given time.
When implementing the Task Parallel Library (TPL) for instance, and your machine has multiple cores, you'll see that the cores get more equally distributed work loads, than when simply spawning new threads. Microsoft really did some fantastic work with TPL to simplify a multi-thread asynchronous environment without having to do heavy work with the Thread Pool and Synchronization primitives as we may have had to in the past.
Have a look at this YouTube video. He describes time-slicing vs. multi-tasking quite well.
If you're not familiar with TPL yet, Sacha Barber has a fantastic series on Code Project.
Just want to share an experience as an enthusiastic programmer. I hope it may help someone.
I was developing a code, defining a certain amount of tasks which were planned to start in a fraction of a second interval. however, I noticed that all tasks were run in a group of 8! I was wondering whether the number of CPU thread imposed a limitation here? Actually, I expected all tasks, say 40, should've started exactly in a queue, one by one, according to time interval I set in the code. My old laptop had a CPU corei5, with 4 cores and 8 threats. the same number of 8 made me so confused so that I wanted to hit my head on a wall :)
Task<object>[] tasks = new Task<object>[maxLoop];
String[] timeStamps = new String[maxLoop];
for (int i = 0; i < maxLoop; i++)
{
tasks[i] = SubmitRequest();
timeStamps[i] = DateTime.Now.TimeOfDay.ToString();
await Task.Delay(timeInterval);
}
await Task.WhenAll(tasks);
After reading the Microsoft documents, I reached to this page:ThreadPool.SetMinThreads(Int32, Int32) Method
According to the page, "By default, the minimum number of threads is set to the number of processors on a system. You can use the SetMinThreads method to increase the minimum number of threads."
then I added this to the code:
ThreadPool.SetMinThreads(maxLoop, maxLoop);
and the problem was resolved.
Just notice unnecessarily increasing the minimum threads may decrease performance. If too many tasks start at the same time, it may impact the overall speed of the task. So normally, it was advised to leave it to the thread pool to decide with its own algorithm for allocating threads.

does C# lock affect other processes on the same computer system

I'm trying to understand whether a process would interfere another process running on the same piece of hardware system. This could happen in a wild range of products. ie. vmware or as simple as running multiple .net applications.
If I have repetitive lock happening of a particular process say, interlock, or lock keywords in C# terms, will it affect the performance other processes due to its intensive usage of lock? The setting is a heavy loaded www system, and I am experience some situational delay, I would like to determine whether the delay was caused by a dense while loop of locks that was completely isolated by a different windows kernel thread.
If there is no isolation, will application domain in .net help me in this case?
Thanks for your answer
No it won't. A lock in C#, and .Net overall, is local to a process. It can't directly affect other processes on the machine.
A lock statement operates on a particular instance of an object. In order for a lock to effect multiple processes they would all have to lock on the same instance of an object. This is not possible since objects are local to a process.
I'm trying to understand whether a process would interfere another process running on the same piece of hardware system
Is there anything that lead you to this question or are you simply just imagining some scenario based on a whim?
A lock is local to the process running those threads. If you want to synchronize across processes, consider using a Semaphore.
will it affect the performance other processes due to its intensive usage of lock?
Short answer, no. Of course, unfettered and whimsical use of lock will probably lead to some live-lock/deadlock scenarios.
No, that's not going to be a problem... you're only locking your own worker process only. Other tasks have their process. While locks are useful for specific tasks I'd recommend you keep them to a minimum since you'll introduce waits in your application.

Are Socket.*Async methods threaded?

I'm currently trying to figure what is the best way to minimize the amount of threads I use in a TCP master server, in order to maximize performance.
As I've been reading a lot recently with the new async features of C# 5.0, asynchronous does not necessarily mean multithreaded. It could mean separated in smaller chunks of finite state objects, then processed alongside other operations, by alternating. However, I don't see how this could be done in networking, since I'm basically "waiting" for input (from the client).
Therefore, I wouldn't use ReceiveAsync() for all my sockets, it would just be creating and ending threads continuously (assuming it does create threads).
Consequently, my question is more or less: what architecture can a master server take without having one "thread" per connection?
Side question for bonus coolness points: Why is having multiple threads bad, considering that having an amount of threads that is over your amount of processing cores simply makes the machine "fake" multithreading, just like any other asynchronous method would?
No, you would not necessarily be creating threads. There are two possible ways you can do async without setting up and tearing down threads all the time:
You can have a "small" number of long-lived threads, and have them sleep when there's no work to do (this means that the OS will never schedule them for execution, so the resource drain is minimal). Then, when work arrives (i.e. Async method called), wake one of them up and tell it what needs to be done. Pleased to meet you, managed thread pool.
In Windows, the most efficient mechanism for async is I/O completion ports which synchronizes access to I/O operations and allows a small number of threads to manage massive workloads.
Regarding multiple threads:
Having multiple threads is not bad for performance, if
the number of threads is not excessive
the threads do not oversaturate the CPU
If the number of threads is excessive then obviously we are taxing the OS with having to keep track of and schedule all these threads, which uses up global resources and slows it down.
If the threads are CPU-bound, then the OS will need to perform much more frequent context switches in order to maintain fairness, and context switches kill performance. In fact, with user-mode threads (which all highly scalable systems use -- think RDBMS) we make our lives harder just so we can avoid context switches.
Update:
I just found this question, which lends support to the position that you can't say how many threads are too much beforehand -- there are just too many unknown variables.
Seems like the *Async methods use IOCP (by looking at the code with Reflector).
Jon's answer is great. As for the 'side question'... See http://en.wikipedia.org/wiki/Amdahl%27s_law. Amdel's law says that serial code quickly diminishes the gains to be had from parallel code. We also know that thread coordination (scheduling, context switching, etc) is serial - so at some point more threads means there are so many serial steps that parallelization benefits are lost and you have a net negative performance. This is tricky stuff. That's why there is so much effort going into letting .NET manage threads while we define 'tasks' for the framework to decide what thread to run on. The framework can switch between tasks much more efficiently than the OS can switch between threads because the OS has a lot of extra things it needs to worry about when doing so.
Asynchronous work can be done without one-thread-per-connection or a thread pool with OS support for select or poll (and Windows supports this and it is exposed via Socket.Select). I am not sure of the performance on windows, but this is a very common idiom elsewhere.
One thread is the "pump" that manages the IO connections and monitors changes to the streams and then dispatches messages to/from other threads (conceivably 0 ... n depending upon model). Approaches with 0 or 1 additional threads may fall into the "Event Machine" category like twisted (Python) or POE (Perl). With >1 threads the callers form an "implicit thread pool" (themselves) and basically just offload the blocking IO.
There are also approaches like Actors, Continuations or Fibres exposed in the underlying models of some languages which alter how the basic problem is approached -- don't wait, react.
Happy coding.

Alternative to Threads

I've read that threads are very problematic. What alternatives are available? Something that handles blocking and stuff automatically?
A lot of people recommend the background worker, but I've no idea why.
Anyone care to explain "easy" alternatives? The user will be able to select the number of threads to use (depending on their speed needs and computer power).
Any ideas?
To summarize the problems with threads:
if threads share memory, you can get
race conditions
if you avoid races by liberally using locks, you
can get deadlocks (see the dining philosophers problem)
An example of a race: suppose two threads share access to some memory where a number is stored. Thread 1 reads from the memory address and stores it in a CPU register. Thread 2 does the same. Now thread 1 increments the number and writes it back to memory. Thread 2 then does the same. End result: the number was only incremented by 1, while both threads tried to increment it. The outcome of such interactions depend on timing. Worse, your code may seem to work bug-free but once in a blue moon the timing is wrong and bad things happen.
To avoid these problems, the answer is simple: avoid sharing writable memory. Instead, use message passing to communicate between threads. An extreme example is to put the threads in separate processes and communicate via TCP/IP connections or named pipes.
Another approach is to share only read-only data structures, which is why functional programming languages can work so well with multiple threads.
This is a bit higher-level answer, but it may be useful if you want to consider other alternatives to threads. Anyway, most of the answers discussed solutions based on threads (or thread pools) or maybe tasks from .NET 4.0, but there is one more alternative, which is called message-passing. This has been successfuly used in Erlang (a functional language used by Ericsson). Since functional programming is becoming more mainstream in these days (e.g. F#), I thought I could mention it. In genral:
Threads (or thread pools) can usually used when you have some relatively long-running computation. When it needs to share state with other threads, it gets tricky (you have to correctly use locks or other synchronization primitives).
Tasks (available in TPL in .NET 4.0) are very lightweight - you can split your program into thousands of tasks and then let the runtime run them (it will use optimal number of threads). If you can write your algorithm using tasks instead of threads, it sounds like a good idea - you can avoid some synchronization when you run computation using smaller steps.
Declarative approaches (PLINQ in .NET 4.0 is a great option) if you have some higher-level data processing operation that can be encoded using LINQ primitives, then you can use this technique. The runtime will automatically parallelize your code, because LINQ doesn't specify how exactly should it evaluate the results (you just say what results you want to get).
Message-passing allows you two write program as concurrently running processes that perform some (relatively simple) tasks and communicate by sending messages to each other. This is great, because you can share some state (send messages) without the usual synchronization issues (you just send a message, then do other thing or wait for messages). Here is a good introduction to message-passing in F# from Robert Pickering.
Note that the last three techniques are quite related to functional programming - in functional programming, you desing programs differently - as computations that return result (which makes it easier to use Tasks). You also often write declarative and higher-level code (which makes it easier to use Declarative approaches).
When it comes to actual implementation, F# has a wonderful message-passing library right in the core libraries. In C#, you can use Concurrency & Coordination Runtime, which feels a bit "hacky", but is probably quite powerful too (but may look too complicated).
Won't the parallel programming options in .Net 4 be an "easy" way to use threads? I'm not sure what I'd suggest for .Net 3.5 and earlier...
This MSDN link to the Parallel Computing Developer Center has links to lots of info on Parellel Programming including links to videos, etc.
I can recommend this project. Smart Thread Pool
Project Description
Smart Thread Pool is a thread pool written in C#. It is far more advanced than the .NET built-in thread pool.
Here is a list of the thread pool features:
The number of threads dynamically changes according to the workload on the threads in the pool.
Work items can return a value.
A work item can be cancelled.
The caller thread's context is used when the work item is executed (limited).
Usage of minimum number of Win32 event handles, so the handle count of the application won't explode.
The caller can wait for multiple or all the work items to complete.
Work item can have a PostExecute callback, which is called as soon the work item is completed.
The state object, that accompanies the work item, can be disposed automatically.
Work item exceptions are sent back to the caller.
Work items have priority.
Work items group.
The caller can suspend the start of a thread pool and work items group.
Threads have priority.
Can run COM objects that have single threaded apartment.
Support Action and Func delegates.
Support for WindowsCE (limited)
The MaxThreads and MinThreads can be changed at run time.
Cancel behavior is imporved.
"Problematic" is not the word I would use to describe working with threads. "Tedious" is a more appropriate description.
If you are new to threaded programming, I would suggest reading this thread as a starting point. It is by no means exhaustive but has some good introductory information. From there, I would continue to scour this website and other programming sites for information related to specific threading questions you may have.
As for specific threading options in C#, here's some suggestions on when to use each one.
Use BackgroundWorker if you have a single task that runs in the background and needs to interact with the UI. The task of marshalling data and method calls to the UI thread are handled automatically through its event-based model. Avoid BackgroundWorker if (1) your assembly does not already reference the System.Windows.Form assembly, (2) you need the thread to be a foreground thread, or (3) you need to manipulate the thread priority.
Use a ThreadPool thread when efficiency is desired. The ThreadPool helps avoid the overhead associated with creating, starting, and stopping threads. Avoid using the ThreadPool if (1) the task runs for the lifetime of your application, (2) you need the thread to be a foreground thread, (3) you need to manipulate the thread priority, or (4) you need the thread to have a fixed identity (aborting, suspending, discovering).
Use the Thread class for long-running tasks and when you require features offered by a formal threading model, e.g., choosing between foreground and background threads, tweaking the thread priority, fine-grained control over thread execution, etc.
Any time you introduce multiple threads, each running at once, you open up the potential for race conditions. To avoid these, you tend to need to add synchronization, which adds complexity, as well as the potential for deadlocks.
Many tools make this easier. .NET has quite a few classes specifically meant to ease the pain of dealing with multiple threads, including the BackgroundWorker class, which makes running background work and interacting with a user interface much simpler.
.NET 4 is going to do a lot to ease this even more. The Task Parallel Library and PLINQ dramatically ease working with multiple threads.
As for your last comment:
The user will be able to select the number of threads to use (depending on their speed needs and computer power).
Most of the routines in .NET are built upon the ThreadPool. In .NET 4, when using the TPL, the work load will actually scale at runtime, for you, eliminating the burden of having to specify the number of threads to use. However, there are ways to do this now.
Currently, you can use ThreadPool.SetMaxThreads to help limit the number of threads generated. In TPL, you can specify ParallelOptions.MaxDegreesOfParallelism, and pass an instance of the ParallelOptions into your routine to control this. The default behavior scales up with more threads as you add more processing cores, which is usually the best behavior in any case.
Threads are not problematic if you understand what causes problems with them.
For ex. if you avoid statics, you know which API's to use (e.g. use synchronized streams), you will avoid many of the issues that come up for their bad utilization.
If threading is a problem (this can happen if you have unsafe/unmanaged 3rd party dll's that cannot support multithreading. In this can an option is to create a meachism to queue the operations. ie store the parameters of the action to a database and just run through them one at a time. This can be done in a windows service. Obviously this will take longer but in some cases is the only option.
Threads are indispensable tools for solving many problems, and it behooves the maturing developer to know how to effectively use them. But like many tools, they can cause some very difficult-to-find bugs.
Don't shy away from some so useful just because it can cause problems, instead study and practice until you become the go-to guy for multi-threaded apps.
A great place to start is Joe Albahari's article: http://www.albahari.com/threading/.

Categories

Resources