I know something about IOCP, but I'm a little confused by APM.
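using System;
using System.IO;

class Program
{
    static FileStream fs;

    static void Main(string[] args)
    {
        // Opened without useAsync/FileOptions.Asynchronous, i.e. with a synchronous handle.
        fs = new FileStream(@"c:\bigfile.txt", FileMode.Open);
        var buffer = new byte[10000000];
        // Start the read; OnCompletedRead is called when it completes.
        IAsyncResult asyncResult = fs.BeginRead(buffer, 0, buffer.Length, OnCompletedRead, null);
        Console.WriteLine("async...");
        // EndRead blocks the main thread until the read has finished.
        int bytesRead = fs.EndRead(asyncResult);
        Console.WriteLine("async... over");
        fs.Dispose();
    }

    static void OnCompletedRead(IAsyncResult ar)
    {
        Console.WriteLine("finished");
    }
}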
I wonder: is the read executed asynchronously by an IO thread, or by a worker thread in the thread pool?
And is the callback function OnCompletedRead also executed by an IO thread in the CLR thread pool?
Are these two threads the same one? If not, two threads are involved: one executes the read and another runs the callback.
If you don't use an AsyncCallback argument with BeginRead then there is only one thread that runs code in your program. This uses IO completion ports to signal when the IO is complete by running a small amount of code on a thread in the IO thread pool to update the status of the operation as being complete. When you call EndRead it will block the current thread until the IO operation is complete. It is asynchronous in that when you start the read operation the current thread does not need to do anything other than wait for the IO hardware to perform the read operation, so you can do other things in the meantime and then decide when you want to stop and wait for the IO to finish.
If you do pass in an AsyncCallback then when the IO operation is complete it will execute a small amount of code on an IO thread pool thread which will trigger your callback method to be executed on a thread from the .NET thread pool.
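A minimal sketch of the callback flow described above, assuming .NET 6 or later and a throwaway temp file created just for the demo. It records which thread runs the callback; the exact thread depends on the runtime and platform (if the read completes synchronously, the callback can even run on the calling thread inside BeginRead), so treat the printed IDs as illustrative rather than guaranteed.

```csharp
using System;
using System.IO;
using System.Threading;

// Throwaway demo file (an assumption of this sketch) with 4096 known bytes.
string path = Path.GetTempFileName();
File.WriteAllBytes(path, new byte[4096]);

var done = new ManualResetEventSlim(false);
int bytesRead = -1;
int callbackThreadId = -1;
bool callbackOnPoolThread = false;

// FileOptions.Asynchronous requests an overlapped handle (IOCP-backed on Windows).
using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read,
                               4096, FileOptions.Asynchronous))
{
    var buffer = new byte[4096];
    Console.WriteLine($"Main thread: {Environment.CurrentManagedThreadId}");

    fs.BeginRead(buffer, 0, buffer.Length, ar =>
    {
        // The callback calls EndRead itself, so the main thread never blocks in EndRead.
        bytesRead = fs.EndRead(ar);
        callbackThreadId = Environment.CurrentManagedThreadId;
        callbackOnPoolThread = Thread.CurrentThread.IsThreadPoolThread;
        done.Set();
    }, null);

    Console.WriteLine("Read started; the main thread could do other work here.");
    done.Wait(); // keep the stream open until the callback is finished with it
}

Console.WriteLine($"Callback ran on thread {callbackThreadId} " +
                  $"(thread pool: {callbackOnPoolThread}); bytes read: {bytesRead}");
File.Delete(path);
```

Passing null instead of a callback and calling EndRead on the main thread gives the first scenario from this answer: the main thread then blocks in EndRead until the read completes.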
Usually, mclaassen is right about the nature of IO-bound work, IOCP and the APM. When BeginRead executes, it does so asynchronously all the way down to kernel mode. But there is one caveat, specifically in your example, that he didn't mention in his answer.
In your example, you use the FileStream class. One important thing to note is that if you don't use the FileStream overload that accepts a useAsync boolean, then when you invoke a BeginRead/EndRead (or BeginWrite/EndWrite) operation, the work is queued to a ThreadPool thread.
This is the proper overload:
public FileStream(
string path,
FileMode mode,
FileAccess access,
FileShare share,
int bufferSize,
bool useAsync
)
From MSDN:
useAsync:
Type: System.Boolean
Specifies whether to use asynchronous I/O or synchronous I/O. However, note that the underlying operating system might not support asynchronous I/O, so when specifying true, the handle might be opened synchronously depending on the platform. When opened asynchronously, the BeginRead and BeginWrite methods perform better on large reads or writes, but they might be much slower for small reads or writes. If the application is designed to take advantage of asynchronous I/O, set the useAsync parameter to true. Using asynchronous I/O correctly can speed up applications by as much as a factor of 10, but using it without redesigning the application for asynchronous I/O can decrease performance by as much as a factor of 10.
You have to make sure that each specific method implementing the APM pattern really does asynchronous work all the way down.
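As a rough check, the sketch below opens the same file through the useAsync overload quoted above and through the default overload, and prints FileStream.IsAsync for both. It assumes .NET 6 or later and a temp file made for the demo. Since the MSDN note says the handle may be opened synchronously depending on the platform, the sketch only prints the async handle's IsAsync value rather than relying on it.

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

string path = Path.GetTempFileName();
await File.WriteAllTextAsync(path, "hello, overlapped I/O");

// The constructor overload from the answer: the last argument is useAsync.
using var asyncFs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                   FileShare.Read, bufferSize: 4096, useAsync: true);
// The shorter overload defaults to synchronous I/O.
using var syncFs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);

Console.WriteLine($"useAsync: true   -> IsAsync = {asyncFs.IsAsync}");
Console.WriteLine($"default overload -> IsAsync = {syncFs.IsAsync}");

// ReadAsync is the Task-based counterpart of BeginRead/EndRead.
var buffer = new byte[4096];
int n = await asyncFs.ReadAsync(buffer, 0, buffer.Length);
string text = Encoding.UTF8.GetString(buffer, 0, n);
Console.WriteLine(text);
```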
I completely don't understand the practical meaning of async/await.
I just started learning async/await, and I know that there are already a huge number of topics on it. If I understand correctly, then async/await is not needed anywhere else except for operations with a long wait in a thread, if this is not related to a long calculation. For example, database responses, network requests, file handling. Many people write that async/await is also needed so as not to block the main thread. And here it is completely unclear to me why it should be blocked. Don't block without async/await, just create a task. So I'm trying to write code that waits a long time for a response from the network.
I created an example. In the Windows Task Manager I can see that the while (i < int.MaxValue) loop is processed first, taking up all of the processor, even though I launched DownloadFile first. Only when the processor is released do I see the files downloading. On my machine, the example runs in ~54 seconds.
Question: how could I first run the DownloadFile asynchronously so that the threads do not idle uselessly, but can do while (i < int.MaxValue)?
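using System;
using System.Collections.Generic;
using System.IO;
using System.Net;
using System.Threading;
using System.Threading.Tasks;
string PathProject = Directory.GetParent(Directory.GetCurrentDirectory()).Parent.Parent.Parent.FullName;
//Folder 1 must already exist in the project folder
DirectoryInfo Path = new DirectoryInfo($"{PathProject}\\1");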
int Iterations = Environment.ProcessorCount * 3;
string file = "https://s182vla.storage.yandex.net/rdisk/82b08d86b9920a5e889c6947e4221eb1350374db8d799ee9161395f7195b0b0e/62f75403/geIEA69cusBRNOpxmtup5BdJ7AbRoezTJE9GH4TIzcUe-Cp7uoav-lLks4AknK2SfU_yxi16QmxiuZOGFm-hLQ==?uid=0&filename=004%20-%2002%20Lesnik.mp3&disposition=attachment&hash=e0E3gNC19eqNvFi1rXJjnP1y8SAS38sn5%2ByGEWhnzE5cwAGsEnlbazlMDWSjXpyvq/J6bpmRyOJonT3VoXnDag%3D%3D&limit=0&content_type=audio%2Fmpeg&owner_uid=160716081&fsize=3862987&hid=98984d857027117759bc5ce6092eaa6a&media_type=audio&tknv=v2&rtoken=k9xogU6296eg&force_default=no&ycrid=na-2bc914314062204f1cbf810798018afd-downloader16e&ts=5e61a6daac6c0&s=eef8b08190dc7b22befd6bad89e1393b394869a1668d9b8af3730cce4774e8ad&pb=U2FsdGVkX1__q3AvjJzgzWG4wVR80Oh8XMl-0Dlfyu9FhqAYQVVkoBV0dtBmajpmOkCXKUXPbREOS-MZCxMNu2rkAkKq_n-AXcZ85svtSFs";
List<Task> tasks = new List<Task>();
void MyMethod1(int i)
{
WebClient client = new WebClient();
client.DownloadFile(file, $"{Path}\\{i}.mp3");
}
void MyMethod2()
{
int i = 0;
while (i < int.MaxValue)
{
i++;
}
}
DateTime dateTimeStart = DateTime.Now;
for (int i = 0; i < Iterations; i++)
{
int j = i;
tasks.Add(Task.Run(() => MyMethod1(j)));
}
for (int i = 0; i < Iterations; i++)
{
tasks.Add(Task.Run(() => { MyMethod2(); MyMethod2(); }));
}
Task.WaitAll(tasks.ToArray());
Console.WriteLine(DateTime.Now - dateTimeStart);
while (true)
{
Thread.Sleep(100);
if (Path.GetFiles().Length == Iterations)
{
Thread.Sleep(1000);
foreach (FileInfo f in Path.GetFiles())
{
f.Delete();
}
return;
}
}
If there are 2 web servers that talk to a database and they run on 2 machines with the same spec the web server with async code will be able to handle more concurrent requests.
The following is from the 2014 article "Async Programming: Introduction to Async/Await on ASP.NET":
Why Not Increase the Thread Pool Size?
At this point, a question is always asked: Why not just increase the size of the thread pool? The answer is twofold: Asynchronous code scales both further and faster than blocking thread pool threads.
Asynchronous code can scale further than blocking threads because it uses much less memory; every thread pool thread on a modern OS has a 1MB stack, plus an unpageable kernel stack. That doesn’t sound like a lot until you start getting a whole lot of threads on your server. In contrast, the memory overhead for an asynchronous operation is much smaller. So, a request with an asynchronous operation has much less memory pressure than a request with a blocked thread. Asynchronous code allows you to use more of your memory for other things (caching, for example).
Asynchronous code can scale faster than blocking threads because the thread pool has a limited injection rate. As of this writing, the rate is one thread every two seconds. This injection rate limit is a good thing; it avoids constant thread construction and destruction. However, consider what happens when a sudden flood of requests comes in. Synchronous code can easily get bogged down as the requests use up all available threads and the remaining requests have to wait for the thread pool to inject new threads. On the other hand, asynchronous code doesn’t need a limit like this; it’s “always on,” so to speak. Asynchronous code is more responsive to sudden swings in request volume.
(These days, threads are added every 0.5 seconds.)
WebRequest.Create("https://192.168.1.1").GetResponse()
At some point the above code will probably hit the OS method recv(). The OS will suspend your thread until data becomes available. The state of your function, in CPU registers and the thread stack, will be preserved by the OS while the thread is suspended. In the meantime, this thread can't be used for anything else.
If you start that method via Task.Run(), then your method will consume a thread from a thread pool that has been prepared for you by the runtime. Since these threads aren't used for anything else, your program can continue handling other requests on other threads. However, creating a large number of OS threads has significant overheads.
Every OS thread must have some memory reserved for its stack, and the OS must use some memory to store the full state of the CPU for any suspended thread. Switching threads can have a significant performance cost. For maximum performance, you want to keep a small number of threads busy, rather than having a large number of suspended threads which the OS must keep swapping in and out of each CPU core.
When you use async & await, the C# compiler will transform your method into a coroutine, ensuring that any state your program needs to remember is no longer stored in CPU registers or on the OS thread stack. Instead, all of that state will be stored in heap memory while your task is suspended. When your task is suspended and resumed, only the data which you actually need will be loaded and stored, rather than the entire CPU state.
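A small sketch of that idea, assuming a console app with no SynchronizationContext; DescribeAsync is a made-up method for the illustration. The locals used after the await survive the suspension because the compiler keeps them in the method's state machine, even though the code after the await may resume on a different thread-pool thread.

```csharp
using System;
using System.Threading.Tasks;

async Task<string> DescribeAsync(int input)
{
    // 'label' and 'before' are used after the await, so the compiler stores them
    // in the generated state machine (moved to the heap when the method suspends)
    // instead of relying on this thread's stack.
    string label = $"value-{input}";
    int before = Environment.CurrentManagedThreadId;

    await Task.Delay(50); // no thread is blocked while this timer is pending

    int after = Environment.CurrentManagedThreadId;
    Console.WriteLine($"{label}: thread {before} -> {after}");
    return label;
}

string result = await DescribeAsync(42);
```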
If you change your code to use .GetResponseAsync(), the runtime will call an OS method that supports overlapped I/O. While your task is suspended, no OS thread will be busy. When data is available, the runtime will continue to execute your task on a thread from the thread pool.
Is this going to impact the program you are writing today? Will you be able to tell the difference? Not until the CPU starts to become the bottleneck, when you are attempting to scale your program to thousands of concurrent requests.
If you are writing new code, look for the Async version of any I/O method. Sprinkle async & await around. It doesn't cost you anything.
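For example, a sketch using the Async counterparts of the File helpers (File.WriteAllTextAsync and File.ReadAllTextAsync, available in modern .NET); CountCharsAsync and the temp file are invented for the demo.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

async Task<int> CountCharsAsync(string path)
{
    // Instead of the blocking File.ReadAllText, await the Async version.
    string contents = await File.ReadAllTextAsync(path);
    return contents.Length;
}

string tmp = Path.GetTempFileName();
await File.WriteAllTextAsync(tmp, "async all the way");
int length = await CountCharsAsync(tmp);
Console.WriteLine(length);
File.Delete(tmp);
```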
If I understand correctly, then async/await is not needed anywhere else except for operations with a long wait in a thread, if this is not related to a long calculation.
It's kind of recursive, but async is best used whenever there's something asynchronous. In other words, anything where the CPU would be wasted if it had to just spin (or block) while waiting for the operation to complete. Operations that are naturally asynchronous are generally I/O-based (as you mention, DB and other network calls, as well as file I/O), but they can be more arbitrary events, too (e.g., timers). Anything where there isn't actual code to run to get the response.
Many people write that async/await is also needed so as not to block the main thread.
At a higher level, there are two primary benefits to async/await, depending on what kind of code you're talking about:
On the server side (e.g., web apps), async/await provides scalability by using fewer threads per request.
On the client side (e.g., UI apps), async/await provides responsiveness by keeping the UI thread free to respond to user input.
Developers tend to emphasize one or the other depending on the kind of work they normally do. So if you see an async article talking about "not blocking the main thread", they're talking about UI apps specifically.
And here it is completely unclear to me why it should be blocked. Don't block without async/await, just create a task.
That works just fine for many situations. But it doesn't work well in others.
E.g., it would be a bad idea to just Task.Run onto a background thread in a web app. The primary benefit of async in a web app is to provide scalability by using fewer threads per request, so using Task.Run does not provide any benefits at all (in fact, scalability is reduced). So, the idea of "use Task.Run instead of async/await" cannot be adopted as a universal principle.
The other problem is in resource-constrained environments, such as mobile devices. You can only have so many threads there before you start running into other problems.
But if you're talking Desktop apps (e.g., WPF and friends), then sure, you can use async/await to free up the UI thread, or you can use Task.Run to free up the UI thread. They both achieve the same goal.
Question: how could I first run the DownloadFile asynchronously so that the threads do not idle uselessly, but can do while (i < int.MaxValue)?
There's nothing in your code that is asynchronous at all. So really, you're dealing with multithreading/parallelism. In general, I recommend using higher-level constructs such as Parallel for parallelism rather than Task.Run.
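A sketch of the CPU-bound half of the question written with Parallel.For instead of a list of Task.Run calls. CountUp is a hypothetical stand-in for MyMethod2 with a much smaller limit so the demo finishes quickly; the download half is left out.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for MyMethod2: a pure CPU-bound loop.
long CountUp(int limit)
{
    long i = 0;
    while (i < limit) i++;
    return i;
}

int iterations = Environment.ProcessorCount * 3;
long total = 0;

// Parallel.For spreads the iterations over pool threads and blocks until all of
// them are done; it is parallelism, not asynchrony.
Parallel.For(0, iterations, _ =>
{
    long n = CountUp(1_000_000);
    Interlocked.Add(ref total, n);
});

Console.WriteLine($"{iterations} iterations, total = {total}");
```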
But regardless of the API used, the underlying problem is that you're kicking off Environment.ProcessorCount * 6 threads. You'll want to ensure that your thread pool is ready for that many threads by calling ThreadPool.SetMinThreads with the workerThreads set to a high enough number.
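A sketch of that call, assuming the goal is simply to let the pool create ProcessorCount * 6 worker threads on demand instead of waiting for its gradual thread injection. The I/O completion-thread minimum is passed back unchanged, and the value is clamped to the pool's maximum.

```csharp
using System;
using System.Threading;

ThreadPool.GetMinThreads(out int minWorker, out int minIo);
ThreadPool.GetMaxThreads(out int maxWorker, out _);

// The question starts ProcessorCount * 6 tasks that each block a thread.
int wanted = Math.Min(Math.Max(minWorker, Environment.ProcessorCount * 6), maxWorker);

// Returns false (and changes nothing) if the values are out of range.
bool changed = ThreadPool.SetMinThreads(wanted, minIo);

ThreadPool.GetMinThreads(out int newMinWorker, out _);
Console.WriteLine($"min worker threads: {minWorker} -> {newMinWorker} (ok: {changed})");
```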
It's not web requests but here's a toy example:
Test:
n: 1 await: 00:00:00.1373839 sleep: 00:00:00.1195186
n: 10 await: 00:00:00.1290465 sleep: 00:00:00.1086578
n: 100 await: 00:00:00.1101379 sleep: 00:00:00.6517959
n: 300 await: 00:00:00.1207069 sleep: 00:00:02.0564836
n: 500 await: 00:00:00.1211736 sleep: 00:00:02.2742309
n: 1000 await: 00:00:00.1571661 sleep: 00:00:05.3987737
Code:
using System;
using System.Diagnostics;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

foreach (var n in new[] { 1, 10, 100, 300, 500, 1000 })
{
    // n tasks that each wait 100 ms without holding a thread (timer-based).
    var sw = Stopwatch.StartNew();
    var tasks = Enumerable.Range(0, n)
        .Select(i => Task.Run(async () =>
        {
            await Task.Delay(TimeSpan.FromMilliseconds(100));
        }));
    await Task.WhenAll(tasks);
    var tAwait = sw.Elapsed;

    // n tasks that each block a pool thread for 100 ms.
    sw = Stopwatch.StartNew();
    var tasks2 = Enumerable.Range(0, n)
        .Select(i => Task.Run(() =>
        {
            Thread.Sleep(TimeSpan.FromMilliseconds(100));
        }));
    await Task.WhenAll(tasks2);
    var tSleep = sw.Elapsed;

    Console.WriteLine($"n: {n,4} await: {tAwait} sleep: {tSleep}");
}
Before the introduction of async-await programming into C#, how was one able to put a network request into another thread and yield execution time back to the CPU until a response is received so that this thread will not waste CPU time?
Because when CPU allocates time to this thread and thread sits idle waiting for a response, that would be a waste of CPU time, right?
There were several ways; however, the Asynchronous Programming Model (APM) was the go-to for this type of asynchrony.
An asynchronous operation that uses the IAsyncResult design pattern is implemented as two methods named BeginOperationName and EndOperationName that begin and end the asynchronous operation OperationName respectively. For example, the FileStream class provides the BeginRead and EndRead methods to asynchronously read bytes from a file. These methods implement the asynchronous version of the Read method.
To answer your question:
Because when CPU allocates time to this thread and thread sits idle waiting for a response, that would be a waste of CPU time, right?
No. Blocking a thread while waiting for a completion port to call back doesn't burn CPU cycles, whereas polling on a thread does.
There is a lot to how this works, but here is an example of usage:
using System;
using System.IO;
using System.Text;

partial class Program
{
    // FilePath and TestRead are other members of Program that are not shown here.

    private static void TestWrite()
    {
        // Must specify FileOptions.Asynchronous otherwise the BeginXxx/EndXxx methods are
        // handled synchronously.
        FileStream fs = new FileStream(Program.FilePath, FileMode.OpenOrCreate,
            FileAccess.Write, FileShare.None, 8, FileOptions.Asynchronous);
        string content = "A quick brown fox jumps over the lazy dog";
        byte[] data = Encoding.Unicode.GetBytes(content);
        // Begins to write content to the file stream; fs is passed as the state object.
        Console.WriteLine("Begin to write");
        fs.BeginWrite(data, 0, data.Length, Program.OnWriteCompleted, fs);
        Console.WriteLine("Write queued");
    }

    private static void OnWriteCompleted(IAsyncResult asyncResult)
    {
        // End the async operation.
        FileStream fs = (FileStream)asyncResult.AsyncState;
        fs.EndWrite(asyncResult);
        // Close the file stream.
        fs.Close();
        Console.WriteLine("Write completed");
        // Test async read bytes from the file stream.
        Program.TestRead();
    }
}
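Existing Begin/End pairs like the ones above can also be consumed from Task-based and async/await code through Task.Factory.FromAsync, a TPL method that adapts an APM pair into an awaitable Task. A minimal sketch, assuming .NET 6 or later and a temp file created for the demo:

```csharp
using System;
using System.IO;
using System.Text;
using System.Threading.Tasks;

string path = Path.GetTempFileName();
await File.WriteAllTextAsync(path, "A quick brown fox jumps over the lazy dog");

using var fs = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read,
                              4096, FileOptions.Asynchronous);
var buffer = new byte[4096];

// FromAsync calls BeginRead with its own callback, calls EndRead when that callback
// fires, and completes the Task with EndRead's return value.
int bytesRead = await Task<int>.Factory.FromAsync(
    fs.BeginRead, fs.EndRead, buffer, 0, buffer.Length, null);

string text = Encoding.UTF8.GetString(buffer, 0, bytesRead);
Console.WriteLine($"{bytesRead} bytes: {text}");
```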
There is a strong emphasis that async/await is unrelated to multi-threading in most tutorials; that a single thread can dispatch multiple I/O operations and then handle the results as they complete without creating new threads. The concept makes sense but I've never seen that actual behavior in practice.
Take the below example:
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        // Leave exactly one of the three scenarios uncommented.
        // No Delay
        // var tasks = new List<int> { 3, 2, 1 }.Select(x => DelayedResult(x, 0));
        // Staggered delay
        // var tasks = new List<int> { 3, 2, 1 }.Select(x => DelayedResult(x, x));
        // Simultaneous Delay
        var tasks = new List<int> { 3, 2, 1 }.Select(x => DelayedResult(x, 1));
        var allTasks = Task.WhenAll(tasks);
        allTasks.Wait(); // blocks the main thread until every task has finished
        Console.ReadLine();
    }

    static async Task<T> DelayedResult<T>(T result, int seconds = 0)
    {
        ThreadPrint("Yield:" + result);
        await Task.Delay(TimeSpan.FromSeconds(seconds));
        ThreadPrint("Continuation:" + result);
        return result;
    }

    static void ThreadPrint(string message)
    {
        int threadId = Thread.CurrentThread.ManagedThreadId;
        Console.WriteLine("Thread:" + threadId + "|" + message);
    }
}
"No Delay" uses only one thread and executes the continuation immediately as though it were synchronous code. Looks good.
Thread:1|Yield:3
Thread:1|Continuation:3
Thread:1|Yield:2
Thread:1|Continuation:2
Thread:1|Yield:1
Thread:1|Continuation:1
"Staggered Delay" uses two threads. We have left the single-threaded world behind and there are absolutely new threads being created in the thread pool. At least the thread used for processing the continuations is reused and processing occurs in the order completed rather than the order invoked.
Thread:1|Yield:3
Thread:1|Yield:2
Thread:1|Yield:1
Thread:4|Continuation:1
Thread:4|Continuation:2
Thread:4|Continuation:3
"Simultaneous Delay" uses...4 threads! This is no better than regular old multi-threading; in fact, it's worse, since there is an ugly state machine hiding under the covers in the IL.
Thread:1|Yield:3
Thread:1|Yield:2
Thread:1|Yield:1
Thread:4|Continuation:1
Thread:7|Continuation:3
Thread:5|Continuation:2
Please provide a code example for the "Simultaneous Delay" that only uses one thread. I suspect there isn't one...which begs the question of why the async/await pattern is advertised as unrelated to multi-threading when it clearly either a) uses the ThreadPool and dispatches new threads as necessary or b) in a UI or ASP.NET context, simply deadlocks on a single thread unless you await "all the way up" which just means that the magic additional thread is being handled by the framework (not that it does not exist).
IMHO, async/await is an awesome abstraction for using continuations everywhere for high availability without getting mired in callback hell...but let's not pretend we are somehow dodging multi-threading. What am I missing?
You are forcing the multithreading in the code you posted.
When you await Task.Delay, the current thread is freed to accomplish other tasks if the task scheduler decides the method must continue asynchronously. In this case, after the thread is released from the three tasks, you block it with Task.WhenAll(...).Wait(), which is a synchronous call.
Also, with a non-zero Task.Delay the returned task is not complete yet, so the await has to suspend the method and schedule the rest of it as a continuation, instead of continuing synchronously as in the No Delay case (yes, you also await Task.Delay in the No Delay case, but Task.Delay with a delay of 0 seconds returns an already-completed task, so the await simply continues on the same thread).
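This is easy to check in isolation. The sketch below (assuming .NET 6 or later, where Task.Delay with a zero delay returns an already-completed task) samples IsCompleted right after creating a zero delay and a 200 ms delay, and shows that awaiting the completed one continues on the same thread.

```csharp
using System;
using System.Threading.Tasks;

Task zero = Task.Delay(TimeSpan.Zero);
Task pending = Task.Delay(TimeSpan.FromMilliseconds(200));

// Sampled immediately after creation.
bool zeroDone = zero.IsCompleted;
bool pendingDone = pending.IsCompleted;
Console.WriteLine($"Delay(0) already complete: {zeroDone}; Delay(200 ms) already complete: {pendingDone}");

int before = Environment.CurrentManagedThreadId;
await zero;    // already complete: the method keeps running, nothing is scheduled
int after = Environment.CurrentManagedThreadId;
Console.WriteLine($"Thread before/after awaiting Delay(0): {before}/{after}");

await pending; // not complete yet: the method is suspended and resumed later
```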
As all the tasks resume at the same time, the task scheduler finds the first thread occupied, so it uses a new thread for the first resumed task; the next task then sees both threads occupied, and so on.
Basically you are asking something impossible to the async mechanism, you want the methods to be executed in parallel while being executed in one thread.
Also, async is not advertised as unrelated to multithreading; if someone says that, then they don't understand what async is. In fact, asynchronous implies multithreading, but the async mechanism in .NET is smart enough to complete some tasks synchronously to ensure maximum efficiency.
It can be advertised as thread-efficient: if a thread is waiting for an I/O operation, for example, it can be used for other tasks instead of sitting blocked doing nothing. Take a TcpClient, which uses a Socket: at the OS level the socket uses completion threads, so keeping your own thread waiting is totally inefficient. Or, going lower level, take a disk read/write that uses DMA to transfer data without using the processor; in that case no other thread is needed at all, and holding the thread is a waste of resources.
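To illustrate the point that await by itself creates no threads, here is a sketch in which only the main thread ever runs this code. StartOperation is a hypothetical stand-in for an I/O call: it hands back a TaskCompletionSource task, and the main thread itself completes the three operations in a different order from the one they were started in, playing the role of an event loop. In current .NET, when there is no SynchronizationContext, an await continuation normally runs synchronously on the thread that completes the task, so every continuation below runs on the main thread. This simulates a single-threaded message loop; it is not how Task.Delay is implemented.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

var pending = new Dictionary<int, TaskCompletionSource<int>>();
var continuationThreads = new List<int>();
var completionOrder = new List<int>();

// Hypothetical "I/O operation": returns a task that someone else completes later.
Task<int> StartOperation(int id)
{
    var tcs = new TaskCompletionSource<int>();
    pending[id] = tcs;
    return tcs.Task;
}

async Task<int> DelayedResult(int id)
{
    Console.WriteLine($"Thread:{Environment.CurrentManagedThreadId}|Yield:{id}");
    int result = await StartOperation(id);
    continuationThreads.Add(Environment.CurrentManagedThreadId);
    completionOrder.Add(result);
    Console.WriteLine($"Thread:{Environment.CurrentManagedThreadId}|Continuation:{id}");
    return result;
}

int mainThread = Environment.CurrentManagedThreadId;
Task<int[]> all = Task.WhenAll(DelayedResult(3), DelayedResult(2), DelayedResult(1));

// Play the event loop: complete the operations in the opposite order.
foreach (int id in new[] { 1, 2, 3 })
    pending[id].SetResult(id);

int[] results = await all;
```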
Just as a fact, take this description from Microsoft when they introduced async:
Visual Studio 2012 introduces a simplified approach, async programming, that leverages asynchronous support in the .NET Framework 4.5 and the Windows Runtime. The compiler does the difficult work that the developer used to do, and your application retains a logical structure that resembles synchronous code. As a result, you get all the advantages of asynchronous programming with a fraction of the effort.
Also, using async on a UI thread does not block the thread; that's the benefit. The UI thread is freed and keeps the UI responsive while it's waiting for long tasks, and instead of manually programming the multithreading and synchronization, you let the async mechanism take care of everything for you.
I already know that async/await keeps the thread context and also handles exception forwarding etc. (which helps a lot).
But consider the following example :
public async Task<int> ExampleMethodAsync()
{
    var httpClient = new HttpClient();

    // start async task...
    Task<string> contentsTask = httpClient.GetStringAsync("http://msdn.microsoft.com");

    // wait and return...
    string contents = await contentsTask;

    // get the length...
    int exampleInt = contents.Length;

    // return the length...
    return exampleInt;
}
If the async method (httpClient.GetStringAsync) is an IO operation (like in my sample above), I gain these things:
Caller Thread is not blocked
The worker thread is released because there is an IO operation (IO completion ports...) (GetStringAsync uses TaskCompletionSource and does not open a new thread)
Preserved thread context
Exception is thrown back
But what if, instead of httpClient.GetStringAsync (an IO operation), I have a Task from CalcFirstMillionsDigitsOf_PI_Async (a heavy compute-bound operation on a separate thread)?
It seems that the only things I gain here are:
Preserved thread context
Exception is thrown back
Caller Thread is not blocked
But I still have another (parallel) thread which executes the operation, and the CPU is switching between the main thread and the operation.
Is my diagnosis correct?
Actually, you only get the second set of advantages in both cases. await doesn't start asynchronous execution of anything, it's simply a keyword to the compiler to generate code for handling completion, context etc.
You can find a better explanation of this in '"Invoke the method with await"... ugh!' by Stephen Toub.
It's up to the asynchronous method itself to decide how it achieves the asynchronous execution:
Some methods will use a Task to run their code on a ThreadPool thread,
Some will use some IO-completion mechanism. There is even a special ThreadPool for that, which you can use with Tasks with a custom TaskScheduler
Some will wrap a TaskCompletionSource over another mechanism like events or callbacks.
In every case, it is the specific implementation that releases the thread (if one is used). The TaskScheduler releases the thread automatically when a Task finishes execution, so you get this functionality for cases #1 and #2 anyway.
What happens in case #3 for callbacks depends on how the callback is made. Most of the time the callback is made on a thread managed by some external library. In this case you have to process the callback quickly and return, so the library can reuse the thread.
EDIT
Using a decompiler, it's possible to see that GetStringAsync uses the third option: It creates a TaskCompletionSource that gets signalled when the operation finishes. Executing the operation is delegated to an HttpMessageHandler.
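A sketch of that third option: wrapping a callback-based API in a TaskCompletionSource so callers can await it. LegacyDownload is a hypothetical callback API made up for the illustration (it fakes a completion notification with Task.Delay); only the wrapping pattern is the point.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical callback-style API: reports success or failure through a callback.
void LegacyDownload(string url, Action<string?, Exception?> onCompleted)
{
    // Simulates a completion notification arriving later on a pool thread.
    Task.Delay(50).ContinueWith(_ =>
    {
        if (url.StartsWith("https://", StringComparison.Ordinal))
            onCompleted($"<contents of {url}>", null);
        else
            onCompleted(null, new ArgumentException("unsupported scheme"));
    });
}

// Task-based wrapper: no thread waits while the operation is pending.
Task<string> DownloadAsync(string url)
{
    // RunContinuationsAsynchronously keeps the awaiting code off the callback's thread.
    var tcs = new TaskCompletionSource<string>(TaskCreationOptions.RunContinuationsAsynchronously);
    LegacyDownload(url, (result, error) =>
    {
        if (error != null) tcs.TrySetException(error);
        else tcs.TrySetResult(result!);
    });
    return tcs.Task;
}

string page = await DownloadAsync("https://example.com");
Console.WriteLine(page);

string? failure = null;
try { await DownloadAsync("ftp://example.com"); }
catch (ArgumentException ex) { failure = ex.Message; }
Console.WriteLine($"failure: {failure}");
```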
Your analysis is correct, though the wording on your second part makes it sound like async is creating a worker thread for you, which it is not.
In library code, you actually want to keep your synchronous methods synchronous. If you want to consume a synchronous method asynchronously (e.g., from a UI thread), then call it using await Task.Run(..).
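A sketch of that advice for the CPU-bound case from the question. CountPrimes is a hypothetical synchronous, CPU-bound method standing in for CalcFirstMillionsDigitsOf_PI: it stays synchronous, the caller offloads it with await Task.Run, and an exception thrown on the pool thread is rethrown at the await.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical CPU-bound library method, kept synchronous on purpose.
int CountPrimes(int limit)
{
    if (limit < 2) throw new ArgumentOutOfRangeException(nameof(limit));
    int count = 0;
    for (int n = 2; n <= limit; n++)
    {
        bool prime = true;
        for (int d = 2; d * d <= n; d++)
            if (n % d == 0) { prime = false; break; }
        if (prime) count++;
    }
    return count;
}

// Caller side: offload to a pool thread; the calling thread is free meanwhile.
int primes = await Task.Run(() => CountPrimes(100_000));
Console.WriteLine($"primes up to 100000: {primes}");

// The exception thrown inside Task.Run surfaces here, at the await.
bool caught = false;
try { await Task.Run(() => CountPrimes(1)); }
catch (ArgumentOutOfRangeException) { caught = true; }
Console.WriteLine($"exception propagated: {caught}");
```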
Yes, you're correct. I cannot find any wrong statement in your question. Just the term "Preserved thread context" is unclear to me. Do you mean the "logical control flow"? In that case I'd agree.
Regarding the CPU-bound example: you'd normally not do it that way, because starting a CPU-based task and waiting for it increases overhead and decreases throughput. But this might be valid if you need the caller to be unblocked (in the case of a WinForms or WPF project, for example).
(The following items have different goals, but I'm interested in knowing how they "pause".)
Questions:
Thread.Sleep: does it impact performance on a system? Does it tie up a thread with its wait?
What about Monitor.Wait? What is the difference in the way they "wait"? Do they tie up a thread with their wait?
What about RegisteredWaitHandle (obtained from RegisterWaitForSingleObject)? That method accepts a delegate that is executed when a wait handle is signaled. While it's waiting, it doesn't tie up a thread.
So some threads are paused and can be woken by a delegate, while others just wait? Spin?
Can someone please make things clearer?
Edit:
http://www.albahari.com/threading/part2.aspx
Both Thread.Sleep and Monitor.Wait put the thread in the WaitSleepJoin state:
WaitSleepJoin: The thread is blocked. This could be the result of calling
Thread::Sleep or Thread::Join, of requesting a lock — for example, by
calling Monitor::Enter or Monitor::Wait — or of waiting on a thread
synchronization object such as ManualResetEvent.
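A quick way to observe this in a plain console app: start a thread that calls Thread.Sleep and poll its ThreadState from the main thread. The sleeping thread is parked by the OS rather than spinning; per the quote above, Monitor.Wait would put a thread in the same state.

```csharp
using System;
using System.Threading;

var sleeper = new Thread(() => Thread.Sleep(1000)) { IsBackground = true };
sleeper.Start();

// Poll briefly until the new thread has actually entered Sleep.
bool sawWaitSleepJoin = false;
for (int attempt = 0; attempt < 100 && !sawWaitSleepJoin; attempt++)
{
    sawWaitSleepJoin = (sleeper.ThreadState & ThreadState.WaitSleepJoin) != 0;
    if (!sawWaitSleepJoin) Thread.Sleep(10);
}

Console.WriteLine($"Sleeping thread observed in WaitSleepJoin: {sawWaitSleepJoin}");
sleeper.Join();
```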
RegisteredWaitHandle is obtained by calling RegisterWaitForSingleObject and passing a WaitHandle. Generally all descendants of this class use blocking mechanisms, so calling Wait will again put the thread in WaitSleepJoin (e.g. AutoResetEvent).
Here's another quote from MSDN:
The RegisterWaitForSingleObject method checks the current state of the specified object's WaitHandle. If the object's state is unsignaled, the method registers a wait operation. The wait operation is performed by a thread from the thread pool. The delegate is executed by a worker thread when the object's state becomes signaled or the time-out interval elapses.
So a thread in the pool does wait for the signal.
Regarding ThreadPool.RegisterWaitForSingleObject, this does not tie up a thread per registration (pooled or otherwise). You can test this easily: run the following script in LINQPad which calls that method 20,000 times:
static ManualResetEvent _starter = new ManualResetEvent (false);
void Main()
{
var regs = Enumerable.Range (0, 20000)
.Select (_ => ThreadPool.RegisterWaitForSingleObject (_starter, Go, "Some Data", -1, true))
.ToArray();
Thread.Sleep (5000);
Console.WriteLine ("Signaling worker...");
_starter.Set();
Console.ReadLine();
foreach (var reg in regs) reg.Unregister (_starter);
}
public static void Go (object data, bool timedOut)
{
Console.WriteLine ("Started - " + data);
// Perform task...
}
If that code tied up 20,000 threads for the duration of the 5-second "wait", it couldn't possibly work.
Edit - in response to:
"This is a proof. But is there still a single thread in the thread pool which only checks for signals?"
This is an implementation detail. Yes, it could be implemented with a single thread that offloads the callbacks to the managed thread pool, although there's no guarantee of this. Wait handles are ultimately managed by operating system, which will most likely trigger the callbacks, too. It might use one thread (or a small number of threads) in its internal implementation. Or with interrupts, it might not block a single thread. It might even vary according to the operating system version. This is an implementation detail that's of no real relevance to us.
While it's true RegisterWaitForSingleObject creates wait threads, not every call creates one.
From MSDN:
New wait threads are created automatically when required
From Raymond Chen's blog:
...instead of costing a whole thread, it costs something closer to (but not exactly) 1/64 of a thread
So using RegisterWaitForSingleObject is generally preferable to creating your own wait threads.
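A small sketch of the callback signature, assuming a ManualResetEvent that is deliberately never set: the registration is given a 200 ms timeout, and the pool reports the expiry through the timedOut argument. The main thread only waits at the end so the program does not exit before the callback runs.

```csharp
using System;
using System.Threading;

var neverSet = new ManualResetEvent(false);
var callbackDone = new ManualResetEventSlim(false);
bool? timedOutFlag = null;

// One registration; the runtime's wait thread(s) watch the handle on our behalf.
RegisteredWaitHandle registration = ThreadPool.RegisterWaitForSingleObject(
    neverSet,
    (state, timedOut) =>
    {
        timedOutFlag = timedOut;
        Console.WriteLine($"{state}: timedOut = {timedOut}");
        callbackDone.Set();
    },
    "demo",
    TimeSpan.FromMilliseconds(200),
    executeOnlyOnce: true);

callbackDone.Wait();
registration.Unregister(null);
```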
Thread.Sleep and RegisteredWaitHandle work at different levels. Let me try and clear it up:
Processes have multiple threads, which execute simultaneously (depending on the OS scheduler). If a thread calls Thread.Sleep or Monitor.Wait, it doesn't spin - it is put to WaitSleepJoin state, and the CPU is given to other threads.
Now, when you have many simultaneous work items, you use a thread pool: a mechanism which creates several threads and uses its own understanding of work items to dispatch calls to its threads. In this model, worker threads are called from the thread pool dispatcher to do some work, and then return to the pool. If a worker thread calls a blocking operation, like Thread.Sleep or Monitor.Wait, then this thread is "tied up", since the thread pool dispatcher can't use it for additional work items.
I'm not familiar with the actual API, but I think RegisteredWaitHandle would tell the thread pool dispatcher to call a worker thread when needed - and your own thread is not "tied up", and can continue its work or return to the thread pool.
ThreadPool.RegisterWaitForSingleObject ultimately calls QueueUserAPC in its native implementation. See the Rotor sources (sscli20\clr\src\vm\win32threadpool.cpp(1981)). Unlike Wait or Thread.Sleep, your thread will not be put to a halt when you use RegisterWaitForSingleObject.
Instead, a FIFO queue of user-mode callbacks is registered for this thread, and those callbacks will be called when the thread is in an alertable state. That means you can continue to work, and when your thread is blocked the OS will work on the registered callbacks, giving your thread the opportunity to do something meaningful while it is waiting.
Edit1:
To complete the analysis: on the thread that called RegisterWaitForSingleObject, a callback is called when that thread is in an alertable state. Once this happens, the thread that called RegisterWaitForSingleObject executes a CLR callback that registers another callback, which is processed by a thread pool wait thread whose only job is to wait for signaled callbacks. This wait thread checks for signaled callbacks at regular intervals.
This wait thread finally calls QueueUserWorkItem for the signaled callback to be executed on a thread pool thread.