C# NetworkStream asynchronous Begin methods callback on which thread? [duplicate] - c#

It's very hard to find a detailed but simple description of worker and I/O threads in .NET.
What's clear to me regarding this topic (but may not be technically precise):
Worker threads are threads that should employ CPU for their work;
I/O threads (also called "completion port threads") should employ device drivers for their work and essentially "do nothing", only monitor the completion of non-CPU operations.
What is not clear:
Although the method ThreadPool.GetAvailableThreads returns the number of available threads of both types, there seems to be no public API to schedule work onto an I/O thread. Can you only manually create worker threads in .NET?
It seems that a single I/O thread can monitor multiple I/O operations. Is that true? If so, why does the ThreadPool have so many available I/O threads by default?
In some texts I've read that the callback triggered after an I/O operation completes is performed by an I/O thread. Is that true? Isn't this a job for a worker thread, considering that the callback is a CPU operation?
To be more specific: do ASP.NET asynchronous pages use I/O threads? What exactly is the performance benefit of switching I/O work to a separate thread instead of increasing the maximum number of worker threads? Is it because a single I/O thread monitors multiple operations? Or does Windows do more efficient context switching when using I/O threads?

The term 'worker thread' in .net/CLR typically just refers to any thread other than the Main thread that does some 'work' on behalf of the application that spawned the thread. 'Work' could really mean anything, including waiting for some I/O to complete. The ThreadPool keeps a cache of worker threads because threads are expensive to create.
The term 'I/O thread' in .net/CLR refers to the threads the ThreadPool reserves in order to dispatch NativeOverlapped callbacks from "overlapped" win32 calls (also known as "completion port I/O"). The CLR maintains its own I/O completion port, and can bind any handle to it (via the ThreadPool.BindHandle API). Example here: http://blogs.msdn.com/junfeng/archive/2008/12/01/threadpool-bindhandle.aspx. Many .net APIs use this mechanism internally to receive NativeOverlapped callbacks, though the typical .net developer won't ever use it directly.
There is really no technical difference between 'worker thread' and 'I/O thread' -- they are both just normal threads. But the CLR ThreadPool keeps separate pools of each simply to avoid a situation where high demand on worker threads exhausts all the threads available to dispatch native I/O callbacks, potentially leading to deadlock. (Imagine an application using all 250 worker threads, where each one is waiting for some I/O to complete).
The developer does need to take some care when handling an I/O callback in order to ensure that the I/O thread is returned to the ThreadPool -- that is, I/O callback code should do the minimum work required to service the callback and then return control of the thread to the CLR threadpool. If more work is required, that work should be scheduled on a worker thread. Otherwise, the application risks 'hijacking' the CLR's pool of reserved I/O completion threads for use as normal worker threads, leading to the deadlock situation described above.
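To make that concrete, here's a minimal sketch of the pattern: the I/O callback does only the bookkeeping (EndRead) and then queues the heavier work onto a worker thread. The temp file and the "heavy" step (just recording the byte count) are placeholders for illustration.

```csharp
using System;
using System.IO;
using System.Threading;

string path = Path.GetTempFileName();           // placeholder test file
File.WriteAllText(path, "hello, completion port");

var done = new ManualResetEvent(false);
int byteCount = -1;

// useAsync: true opts this handle into overlapped (completion-port) I/O on Windows.
var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                        FileShare.Read, 4096, useAsync: true);
var buffer = new byte[4096];

fs.BeginRead(buffer, 0, buffer.Length, ar =>
{
    // Runs on an I/O completion thread: do the minimum here.
    int read = fs.EndRead(ar);
    fs.Dispose();

    // Hand the "heavy" part to a worker thread so the I/O thread
    // returns to the pool immediately.
    ThreadPool.QueueUserWorkItem(_ =>
    {
        byteCount = read;                       // stand-in for real processing
        done.Set();
    });
}, null);

done.WaitOne();
Console.WriteLine(byteCount);
File.Delete(path);
```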
Some good references for further reading:
win32 I/O completion ports: http://msdn.microsoft.com/en-us/library/aa365198(VS.85).aspx
managed threadpool: http://msdn.microsoft.com/en-us/library/0ka9477y.aspx
example of BindHandle: http://blogs.msdn.com/junfeng/archive/2008/12/01/threadpool-bindhandle.aspx

I'll begin with a description of how asynchronous I/O is used by programs in NT.
You may be familiar with the Win32 API function ReadFile (as an example), which is a wrapper around the Native API function NtReadFile. This function supports asynchronous I/O, with several ways of being notified of completion:
You can create an event object and pass it to NtReadFile. This event will then be signaled when the read operation completes.
You can pass an asynchronous procedure call (APC) function to NtReadFile. Essentially what this means is that when the read operation completes, the function will be queued to the thread which initiated the operation and it will be executed when the thread performs an alertable wait.
There is, however, a third way of being notified when an I/O operation completes. You can create an I/O completion port object and associate file handles with it. Whenever an operation completes on a file associated with the I/O completion port, the results of the operation (such as the I/O status) are queued to the I/O completion port. You can then set up a dedicated thread to remove results from the queue and perform the appropriate tasks, like calling callback functions. This is essentially what an "I/O worker thread" is.
A normal "worker thread" is very similar; instead of removing I/O results from a queue, it removes work items from a queue. You can queue work items (QueueUserWorkItem) and have the worker threads execute them. This prevents you from having to spawn a thread every single time you want to perform a task asynchronously.
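A minimal QueueUserWorkItem sketch of that idea (the work items here are trivial placeholders):

```csharp
using System;
using System.Threading;

using var done = new CountdownEvent(3);
int[] squares = new int[3];

for (int i = 0; i < 3; i++)
{
    int n = i;                                  // capture the loop variable
    ThreadPool.QueueUserWorkItem(_ =>
    {
        squares[n] = (n + 1) * (n + 1);         // the queued "work item"
        done.Signal();
    });
}

done.Wait();                                    // wait for all three items
Console.WriteLine(string.Join(",", squares));   // 1,4,9
```

No thread is spawned per task; the pool reuses its cached worker threads.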

Simply put, a worker thread is meant to perform a short piece of work and will delete itself when it has completed it. A callback may be used to notify the parent process that it has completed, or to pass back data.
An I/O thread will perform the same operation or series of operations continuously until stopped by the parent process. It is so called because device drivers typically run continuously, monitoring the device port. An I/O thread will typically signal events whenever it wishes to communicate with other threads.
Every process runs as at least one thread.
Your application runs as a thread.
Any thread may spawn worker threads or I/O threads (as you call them).
There is always a fine balance between performance and the number or type of threads used. Too many callbacks or events handled by a process will severely degrade its performance, due to the number of interruptions to its main processing loop as it handles them.
Examples of worker-thread tasks would be adding data to a database after user interaction, performing a long mathematical calculation, or writing data to a file. By using a worker thread you free up the main application; this is most useful for GUIs, since the UI doesn't freeze whilst the task is being performed.

Someone with more skills than me is going to jump in here to help out.
Worker threads have a lot of state, they are scheduled by the OS, and you control everything they do.
IO Completion Ports are provided by the operating system for very specific tasks involving little shared state, and are thus faster to use. A good example in .NET is the WCF framework. Every "call" to a WCF service is actually executed by an IO Completion Port thread, because they are the fastest to launch and the OS looks after them for you.


When an I/O operation is being done in a synchronous method, does it always spawn a new thread?

The following read claims that when we're in a method A, and we for example read a stream to its end, the read operation will spawn an I/O thread, while the main thread will be waiting on that to complete.
Is that really the case? Why wouldn't the main thread do all that work, instead of waiting on something else to do it? Isn't that the main idea of what "there is no thread" is all about?
Since .NET asynchronous I/O originally arose in the Windows environment, the considerations below are based on Windows I/O concepts.
To start with, there are two types of I/O flow: synchronous and asynchronous. The former is based on a waiting mechanism: a thread which initiated an I/O operation is, at some point (typically when the request hits a driver), put into the waiting state by the OS scheduler and is woken by the scheduler when the I/O operation completes. The latter is based on a notification mechanism: after sending an I/O request, the thread keeps doing other things, and the I/O completion notification is delivered separately to this or any other thread, depending on the internal thread configuration.
Now, as for I/O threads: the I/O notification mechanism on Windows is implemented using so-called I/O completion ports (IOCP). Briefly, an application can create a completion port (think of it as a queue) which can be associated with more than one file handle, and a thread becomes associated with the completion port when it first calls a specific API on that port. The scheduler keeps track of the associations between completion ports and their threads in order to handle I/O completions more efficiently. A thread associated with a completion port is put into a waiting state and is woken when the status of a completion request is updated. In the .NET world, the infrastructure creates pools of such threads, and these are what get called I/O threads.
The example given in the article uses the synchronous I/O flow, with the initiating thread waiting for the I/O operation to complete. In contrast, an asynchronous I/O scenario from the .NET perspective means using an additional thread for the I/O completion handling (but not before the completion actually occurs).
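For reference, the CLR thread pool exposes the two kinds of threads separately; a quick sketch of inspecting them:

```csharp
using System;
using System.Threading;

// The CLR thread pool tracks worker threads and I/O completion threads separately.
ThreadPool.GetMaxThreads(out int maxWorker, out int maxIo);
ThreadPool.GetAvailableThreads(out int availWorker, out int availIo);

Console.WriteLine($"worker threads:         {availWorker} of {maxWorker} available");
Console.WriteLine($"I/O completion threads: {availIo} of {maxIo} available");
```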

Why does a blocking thread consume more than async/await?

See this question and answer;
Why use async controllers, when IIS already handles the request concurrency?
OK, a thread consumes more resources than the async/await construction, but why? What is the core difference? You still need to remember all the state etc., don't you?
Why would a thread pool be limited, while you can have tons more idle async/await constructions?
Is it because async/await knows more about your application?
Well, let's imagine a web server. Most of the time, all it does is wait. It usually isn't CPU-bound, but rather I/O-bound: it waits for network I/O, disk I/O, etc. After each wait, it has something (usually very short) to do, and then all it does is wait again. Now, the interesting part is what happens while it waits. In the most "trivial" case (which of course is absolutely not production-grade), you would create a thread to deal with every socket you have.
Now, each of those threads has its own cost: some handles, 1 MB of stack space... And of course, not all those threads can run at the same time, so the OS scheduler needs to deal with that and choose the right thread to run each time (which means A LOT of context switching). It will work for 1 client. It'll work for 10 clients. But let's imagine 10,000 clients at the same time. 10,000 threads means 10 GB of memory. That's more than the average web server in the world has.
All of these resources are spent because you dedicated a thread to each user. BUT most of these threads do nothing! They just wait for something to happen. And the OS has APIs for async IO that allow you to queue an operation that will be completed once the IO operation finishes, without having a dedicated thread waiting for it.
If you use async/await, you can write an application that will easily use fewer threads, and each of those threads will be utilized much more: less "doing nothing" time.
async/await is not the only way of doing that; you could do this before async/await was introduced. BUT async/await allows you to write code that's very readable and very easy to write, and that looks almost as if it runs on a single thread (not a lot of callbacks and delegates moving around like before).
By combining the easy syntax of async/await with some features of the OS like async I/O (using an IO completion port), you can write much more scalable code without losing readability.
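A minimal sketch of that combination, using a temp file as a stand-in for real I/O (File.ReadAllTextAsync is the asynchronous, OS-backed read here; the method name ReadUpperAsync is made up for the example):

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

string path = Path.GetTempFileName();           // stand-in for a real I/O source
await File.WriteAllTextAsync(path, "scalable");

string text = await ReadUpperAsync(path);
Console.WriteLine(text);                        // SCALABLE
File.Delete(path);

static async Task<string> ReadUpperAsync(string p)
{
    // The calling thread is released here; a pool thread runs the
    // continuation when the OS reports the read as complete.
    string s = await File.ReadAllTextAsync(p);
    return s.ToUpperInvariant();
}
```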
Another famous example is WPF/WinForms. You have the UI thread, whose whole job is to process events; usually it has nothing special to do. But you can't block it, or the GUI will hang and the user won't like it. By using async/await and splitting each "hard" piece of work into short operations, you can achieve a responsive UI and readable code. If you have to access the DB to execute a query, you'll start the async operation from the UI thread, and then "await" it until it ends and you have results that you can process on the UI thread (because you need to show them to the user, for example). You could do this before, but using async/await makes it much more readable.
Hope it helps.
Creating a new thread allocates a separate memory area exclusively for that thread, holding its resources, mainly its call stack, which in Windows takes up 1 MB of memory.
So if you have 1,000 idle threads, you are using up at least 1 GB of memory doing nothing.
The state for async operations takes memory as well but it's just the actual size needed for that operation and the state machine generated by the compiler and it's kept on the heap.
Moreover, using many threads and blocking them has another cost (which IMO is bigger). When a thread is blocked, it is taken off the CPU and switched with another (i.e. a context switch). That means that your threads aren't using their time-slices optimally when they get blocked. A higher rate of context switching means your machine does more context-switching overhead and less actual work on the individual threads.
Using async-await appropriately enables using all the given time-slice since the thread, instead of blocking, goes back to the thread pool and takes another task to execute while the asynchronous operation continues concurrently.
So, in conclusion, the resources async await frees up are CPU and memory, which allows your server to handle more requests concurrently with the same amount of resources or the same amount of requests with less resources.
The important thing to realize here is that a blocked thread is not usable to do any other work until it becomes unblocked. A thread that encounters an await is free to return to the threadpool and pick up other work until the value being awaited becomes available.
When you call a synchronous I/O method, the thread executing your code is blocked waiting for the I/O to complete. To handle 1000 concurrent requests, you will need 1000 threads.
When you call an asynchronous I/O method, the thread is not blocked. It initializes the I/O operation and can work on something else. It can be the rest of your method (if you don't await), or it can be some other request if you await the I/O method. The thread pool doesn't need to create new threads for new requests, as all the threads can be used optimally and keep the CPUs busy.
Async I/O operations are actually implemented asynchronously at the OS level.
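A small sketch of that point: 100 concurrent asynchronous waits all complete without anything close to 100 threads being used. Task.Delay stands in for real async I/O here.

```csharp
using System;
using System.Collections.Concurrent;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

var threadIds = new ConcurrentBag<int>();

// 100 concurrent "I/O-like" operations; Task.Delay stands in for real async I/O.
var tasks = Enumerable.Range(0, 100).Select(async _ =>
{
    await Task.Delay(50);
    threadIds.Add(Thread.CurrentThread.ManagedThreadId);
}).ToArray();

await Task.WhenAll(tasks);

Console.WriteLine($"completed operations:  {threadIds.Count}");
Console.WriteLine($"distinct threads used: {threadIds.Distinct().Count()}");
```

The distinct-thread count is typically a small fraction of 100, since the continuations share pool threads.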

When does the CLR create the IO thread when I call BeginXXX()

Suppose I call the HttpWebRequest.BeginGetRequestStream() method.
I know that this method is asynchronous and that when it finishes, a callback method will be called.
What I'm fairly sure of is that when the callback method is called, an I/O thread from the CLR performs the callback.
What I'm not sure about is whether, when I call HttpWebRequest.BeginGetRequestStream(), any I/O thread is created by the CLR, or whether just a worker thread is created to send the request to the device.
Async IO is thread-less. There is no thread being created or blocked. That is the entire point of using async IO! What would it help you to unblock one thread and block another?
The internals of this have been discussed many times. Basically, the OS notifies the CLR when the IO is done. This causes the CLR to queue the completion callback you specified onto the thread-pool.
Short answer: You don't care.
Long answer:
There is no thread. More exactly, there is no thread for each of the asynchronous request you create. Instead, there's a bunch of I/O threads on the thread pool, which are mostly "waiting" on IOCP - waiting for the kernel to wake them up when data is available.
The important point is that each of these can respond to any notification; they're not (necessarily) tied to a particular operation. If you need to process 1000 responses, you can still do so with just the one thread; it is only when a callback method has to execute that a thread is requested from the thread pool. With synchronous methods, by contrast, you'd have to keep a thousand threads around to handle a thousand TCP connections.
Using asynchronous methods, you only need the one IOCP thread, and you only have to spawn (or rather, borrow) new threads to execute the callbacks, which are only executed after you get new data / a connection / whatever, and are returned to the thread pool as soon as you're done. In practice, this means that a TCP server can handle thousands of simultaneous TCP connections with just a handful of threads. After all, there's little point in spawning more threads than your CPU can handle (which these days is usually around twice as many as you've got cores), and all the things that don't require CPU (like asynchronous I/O) don't require you to spawn new threads. If the handful of processing threads isn't enough, adding more will not help, unlike in the synchronous I/O scenario.
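A minimal sketch of such a server, reduced to a single loopback echo exchange; every accept, read, and write is asynchronous, so no thread sits blocked on the socket:

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

var listener = new TcpListener(IPAddress.Loopback, 0);  // port 0: pick a free port
listener.Start();
int port = ((IPEndPoint)listener.LocalEndpoint).Port;

// "Server": accept one client asynchronously and echo whatever arrives.
var serverTask = Task.Run(async () =>
{
    using TcpClient client = await listener.AcceptTcpClientAsync();
    NetworkStream ns = client.GetStream();
    var buf = new byte[64];
    int n = await ns.ReadAsync(buf, 0, buf.Length);
    await ns.WriteAsync(buf, 0, n);
});

// "Client": send a message and read the echo back.
using var tcp = new TcpClient();
await tcp.ConnectAsync(IPAddress.Loopback, port);
NetworkStream stream = tcp.GetStream();
byte[] msg = Encoding.ASCII.GetBytes("ping");
await stream.WriteAsync(msg, 0, msg.Length);

var reply = new byte[64];
int got = await stream.ReadAsync(reply, 0, reply.Length);
string echoed = Encoding.ASCII.GetString(reply, 0, got);

await serverTask;
listener.Stop();
Console.WriteLine(echoed);
```

A real server would accept in a loop and handle each connection the same way, still with only a handful of threads.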

Does smtp.SendMailAsync(message) runs in new thread

Do async functions run in a new thread while execution continues normally?
smtp.SendMailAsync(message);
If there are 100 messages in the message list msgList and we loop over it with a foreach, will it create 100 threads and run them in parallel?
foreach (var item in msgList)
{
smtp.SendMailAsync(item);
}
Please explain, and also cover the performance implications.
And please let me know if there is a better way to send mass emails at once.
Firstly, SendMailAsync is not TAP. You cannot await it. Secondly, there is no need for a thread to exist when sending an email; most of the "wait" time is in the latency for the server to respond. Finally, "is there a better way to send mass emails at once"? What problems have you found?
The best way to find out if there are performance problems is to try it.
SendMailAsync and all methods that use the Task Parallel Library execute on a threadpool thread, although you can make them use a new thread if you need to. This means that instead of creating a new thread, an available thread is picked from the pool and returned to it when the method finishes.
The number of threads in the pool varies with the version of .NET, the OS, the number of cores, etc. It can go from 25 threads per core in .NET 4 to hundreds per core in .NET 4.5 on a server OS.
An additional optimization for IO-bound (disk, network) tasks is that instead of using a thread, an IO completion port is used. Roughly, this is a callback from the IO stack when an IO operation (disk or network) finishes. This way the framework doesn't waste a thread waiting for an IO call to finish.
When you start an asynchronous network operation, .NET make the network call, registers for the callback and releases the threadpool thread. When the call finishes, the framework gets notified and schedules the rest of the asynchronous method (essentially what comes after the await or ContinueWith) on a threadpool thread.
Submitting 100 asynchronous operations doesn't mean that 100 threads will be used, nor that all 100 of them will execute in parallel. Rather, the framework will take into account the number of cores, the load, and the number of available threads to execute as many of them as possible without hurting overall performance. Waiting on the network calls may not use a thread at all, while processing the messages themselves will execute on threadpool threads.
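A small sketch of sending several messages with SendMailAsync, using SmtpClient's pickup-directory delivery so it runs locally without a real SMTP server (the addresses are placeholders). Note that a single SmtpClient instance allows only one send in flight at a time, so the sends are awaited one by one rather than passed to Task.WhenAll:

```csharp
using System;
using System.IO;
using System.Net.Mail;

string pickup = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
Directory.CreateDirectory(pickup);

// Write .eml files to a local folder instead of talking to a real server.
using var smtp = new SmtpClient
{
    DeliveryMethod = SmtpDeliveryMethod.SpecifiedPickupDirectory,
    PickupDirectoryLocation = pickup
};

// SmtpClient permits only one send in flight, so await each in turn.
foreach (string body in new[] { "one", "two", "three" })
{
    await smtp.SendMailAsync(
        new MailMessage("a@example.com", "b@example.com", "test", body));
}

int written = Directory.GetFiles(pickup, "*.eml").Length;
Console.WriteLine(written);                             // 3
Directory.Delete(pickup, recursive: true);
```

To send in parallel against a real server, you would need one SmtpClient per concurrent send.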
SendMailAsync is just a TPL wrapper around the SendAsync method, but neither method uses a thread. Instead it uses a model known as an IO completion port (IOCP).
When you call SendMailAsync, your thread writes the mail message to the socket connecting to the SMTP server and registers a callback with the operating system which will be executed when the client receives a response back from the server. This callback is triggered by the "completion" event handled by the IO completion port.
The callback itself is invoked on one of a number of IO completion threads managed by the thread pool. This pool of threads just handles the callbacks from IO completion events. Part of completing the callback is marking the Task returned by the call to SendMailAsync as "completed", allowing any awaiting code to resume execution in its own context.

Will a thread waiting for IO block the CPU?

Suppose I have a C# thread doing some blocking IO and waiting for it to finish. Now the OS scheduler gives it CPU time. Will the time be given back right away, or will it just be wasted by the thread doing nothing?
Or perhaps something entirely else?
On Windows, blocking IO to any device (accessible via the file-system interface or otherwise) works by sending the IO request to the driver associated with the device, along with a handle to an event object, and then blocking the calling thread by waiting on that event object. (The event gets signaled when the driver completes the IO.) Hence, when a thread does blocking IO it does not hog the CPU, as it is only waiting on the event handle.
All blocking IO APIs work in this fashion, with perhaps subtle differences in implementation.
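A quick sketch that illustrates this: block a thread for half a second and compare wall-clock time with the CPU time the process actually consumed. The margins are rough, for illustration only.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

var proc = Process.GetCurrentProcess();
TimeSpan cpuBefore = proc.TotalProcessorTime;
var wall = Stopwatch.StartNew();

// Wait on an event that is never signaled; the thread is descheduled,
// not spinning, until the 500 ms timeout elapses.
using var never = new ManualResetEvent(false);
never.WaitOne(500);

wall.Stop();
proc.Refresh();
TimeSpan cpuUsed = proc.TotalProcessorTime - cpuBefore;

Console.WriteLine($"wall clock: {wall.ElapsedMilliseconds} ms");
Console.WriteLine($"CPU time:   {cpuUsed.TotalMilliseconds:F1} ms");
```

The CPU time spent across the wait is close to zero, even though half a second of wall-clock time passed.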
