Ok, I think I have understood the whole async/await thing. Whenever you await something, the function you're running returns, allowing the current thread to do something else while the async function completes. The advantage is that you don't start a new thread.
This is not that hard to understand, as it's somewhat how Node.js works, except that Node uses a lot of callbacks to make this happen. This is where I fail to understand the advantage, however.
The socket class doesn't currently have any Async methods (that work with async/await). I can of course pass a socket to the stream class, and use the async methods there, however this leaves a problem with the accepting of new sockets.
There are two ways of doing this, as far as I know. In both cases I accept new sockets in an infinite loop on the main thread. In the first case I can start a new task for every socket that I accept, and run the stream.ReceiveAsync within that task. However, won't an await actually block that task, since the task will have nothing else to do? Which again will result in more threads spawned on the threadpool, which again is no better than using synchronous methods inside a task?
My second option is to put all accepted sockets in one of several lists (one list per thread), and inside those threads run a loop, running await stream.ReceiveAsync for every socket. This way, whenever I run into an await, control returns from stream.ReceiveAsync and I can start receiving from all the other sockets.
I guess my real question is if this is in any way more effective than a threadpool, and in the first case, if it really will be worse than just using the APM methods.
I also know you can wrap APM methods into functions using await/async, but the way I see it, you still get the "disadvantage" of APM methods, with the extra overhead of state machines in async/await.
The async socket API is not based around Task[<T>], so it isn't directly usable from async/await - but you can bridge that fairly easily - for example (completely untested):
public class AsyncSocketWrapper : IDisposable
{
    public void Dispose()
    {
        var tmp = socket;
        socket = null;
        if(tmp != null) tmp.Dispose();
    }

    public AsyncSocketWrapper(Socket socket)
    {
        this.socket = socket;
        args = new SocketAsyncEventArgs();
        args.Completed += args_Completed;
    }

    void args_Completed(object sender, SocketAsyncEventArgs e)
    {
        // might want to switch on e.LastOperation
        var source = (TaskCompletionSource<int>)e.UserToken;
        if (ShouldSetResult(source, args)) source.TrySetResult(args.BytesTransferred);
    }

    private Socket socket;
    private readonly SocketAsyncEventArgs args;

    public Task<int> ReceiveAsync(byte[] buffer, int offset, int count)
    {
        TaskCompletionSource<int> source = new TaskCompletionSource<int>();
        try
        {
            args.SetBuffer(buffer, offset, count);
            args.UserToken = source;
            if (!socket.ReceiveAsync(args))
            {
                // completed synchronously; the Completed event will not fire
                if (ShouldSetResult(source, args))
                {
                    return Task.FromResult(args.BytesTransferred);
                }
            }
        }
        catch (Exception ex)
        {
            source.TrySetException(ex);
        }
        return source.Task;
    }

    static bool ShouldSetResult<T>(TaskCompletionSource<T> source, SocketAsyncEventArgs args)
    {
        if (args.SocketError == SocketError.Success) return true;
        var ex = new InvalidOperationException(args.SocketError.ToString());
        source.TrySetException(ex);
        return false;
    }
}
Note: you should probably avoid running the receives in a loop - I would advise making each socket responsible for pumping itself as it receives data. The only thing you need a loop for is to periodically sweep for zombies, since not all socket deaths are detectable.
Note also that the raw async socket API is perfectly usable without Task[<T>] - I use that extensively. While await may have uses here, it is not essential.
This is not that hard to understand, as it's somewhat how Node.js works, except that Node uses a lot of callbacks to make this happen. This is where I fail to understand the advantage, however.
Node.js does use callbacks, but it has one other significant facet that really simplifies those callbacks: they are all serialized to the same thread. So when you're looking at asynchronous callbacks in .NET, you're usually dealing with multithreading as well as asynchronous programming (except for EAP-style callbacks).
Asynchronous programming using callbacks is called "continuation-passing style" (CPS). It's the only real option for Node.js but is one of many options on .NET. In particular, CPS code can get extremely complex and difficult to maintain, so the async/await compiler transform was introduced so you could write "normal-looking" code and the compiler would translate it to CPS for you.
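To make the contrast concrete, here is a minimal sketch of the same two-step operation written once in continuation-passing style and once with async/await. FetchNumberAsync is a hypothetical stand-in for any asynchronous operation, and the callback version leaves out error handling to stay short.

```csharp
using System;
using System.Threading.Tasks;

public static class CpsVersusAwait
{
    // Hypothetical asynchronous operation, used only for this illustration.
    public static Task<int> FetchNumberAsync(int n)
    {
        return Task.Delay(10).ContinueWith(_ => n);
    }

    // Continuation-passing style: each step lives inside the callback of the previous step.
    // Error handling is omitted here; a real version would have to forward faults by hand.
    public static Task<int> SumWithCallbacks()
    {
        var tcs = new TaskCompletionSource<int>();
        FetchNumberAsync(1).ContinueWith(first =>
        {
            FetchNumberAsync(2).ContinueWith(second =>
            {
                tcs.SetResult(first.Result + second.Result);
            });
        });
        return tcs.Task;
    }

    // The same logic with await: the compiler generates the continuations.
    public static async Task<int> SumWithAwait()
    {
        int first = await FetchNumberAsync(1);
        int second = await FetchNumberAsync(2);
        return first + second;
    }

    public static void Main()
    {
        Console.WriteLine(SumWithCallbacks().Result); // prints 3
        Console.WriteLine(SumWithAwait().Result);     // prints 3
    }
}
```

The nesting in SumWithCallbacks grows with every extra step, which is the maintainability problem the compiler transform is meant to remove.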
In both cases I accept new sockets in an infinite loop on the main thread.
If you're writing a server, then yes, somewhere you will be repeatedly accepting new client connections. Also, you should be continuously reading from each connected socket, so each socket also has a loop.
In the first case I can start a new task for every socket that I accept, and run the stream.ReceiveAsync within that task.
You wouldn't need a new task. That's the whole point of asynchronous programming.
My second option is to put all accepted sockets in one of several lists (one list per thread), and inside those threads run a loop, running await stream.ReceiveAsync for every socket.
I'm not sure why you'd need multiple threads, or any dedicated threads at all.
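As a rough sketch of what "no new task, no dedicated threads" can look like, the following echo server uses one accept loop and one read loop per connection, all driven by await. The class and method names are made up for this illustration, and it uses TcpListener and NetworkStream rather than the asker's own stream wrapper.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

public static class AsyncEchoServer
{
    // One logical loop per connection; no thread is blocked while a read is pending.
    static async Task HandleClientAsync(TcpClient client)
    {
        try
        {
            using (client)
            {
                NetworkStream stream = client.GetStream();
                var buffer = new byte[4096];
                int read;
                while ((read = await stream.ReadAsync(buffer, 0, buffer.Length)) > 0)
                {
                    await stream.WriteAsync(buffer, 0, read); // echo back what was received
                }
            }
        }
        catch (IOException) { /* client went away mid-operation */ }
    }

    // The accept loop also awaits, so it does not need a thread of its own.
    static async Task AcceptLoopAsync(TcpListener listener)
    {
        try
        {
            while (true)
            {
                TcpClient client = await listener.AcceptTcpClientAsync();
                // Deliberately not awaited, so the next accept starts immediately.
                Task ignored = HandleClientAsync(client);
            }
        }
        catch (InvalidOperationException) { /* listener stopped or disposed */ }
        catch (SocketException) { /* pending accept aborted by Stop() */ }
    }

    // Demo helper: starts the server on a free loopback port, sends one message, returns the echo.
    public static async Task<string> RoundTripAsync(string message)
    {
        var listener = new TcpListener(IPAddress.Loopback, 0); // port 0: the OS picks a free port
        listener.Start();
        Task acceptLoop = AcceptLoopAsync(listener);
        int port = ((IPEndPoint)listener.LocalEndpoint).Port;
        try
        {
            using (var client = new TcpClient())
            {
                await client.ConnectAsync(IPAddress.Loopback, port);
                NetworkStream stream = client.GetStream();
                byte[] payload = Encoding.UTF8.GetBytes(message);
                await stream.WriteAsync(payload, 0, payload.Length);

                var reply = new byte[payload.Length];
                int total = 0;
                while (total < reply.Length)
                {
                    int n = await stream.ReadAsync(reply, total, reply.Length - total);
                    if (n == 0) break;
                    total += n;
                }
                return Encoding.UTF8.GetString(reply, 0, total);
            }
        }
        finally
        {
            listener.Stop();
        }
    }

    public static void Main()
    {
        Console.WriteLine(RoundTripAsync("hello").GetAwaiter().GetResult());
    }
}
```

Each connection costs a state machine and a buffer rather than a thread; the continuations run on whichever pool thread picks up the I/O completion.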
You seem a bit confused on how async and await work. I recommend reading my own introduction, the MSDN overview, the Task-Based Asynchronous Pattern guidance, and the async FAQ, in that order.
I also know you can wrap APM methods into functions using await/async, but the way I see it, you still get the "disadvantage" of APM methods, with the extra overhead of state machines in async/await.
I'm not sure what disadvantage you're referring to. The overhead of state machines, while non-zero, is negligible in the face of socket I/O.
If you're looking to do socket I/O, you have several options. For reads, you can either do them in an "infinite" loop using APM or Task wrappers around the APM or Async methods. Alternatively, you could convert them into a stream-like abstraction using Rx or TPL Dataflow.
Another option is a library I wrote a few years ago called Nito.Async. It provides EAP-style (event-based) sockets that handle all the thread marshaling for you, so you end up with something simpler like Node.js. Of course, like Node.js, this simplicity means it won't scale as well as a more complex solution.
I'm currently rewriting my TCP server from using StreamSocketListener to TcpListener because I need to be able to use SSL. Since it was some time ago that I wrote the code, I'm also trying to make it cleaner and easier to read, and hopefully increase the performance with a higher number of clients, but I'm currently stuck.
I'm calling a receive method recursively until the client disconnects, but I'm starting to wonder if it wouldn't be better to use a single long running task for it. But I hesitate to use it, since it will then create a new long running task for every connected client. That's why I'm turning to the Stack Overflow community for some guidance on how to proceed.
Note: The connection is supposed to be open 24/7 or as much as possible for most of the connected clients.
Any comments are appreciated.
The current code looks something like this:
private async Task ReceiveData(SocketStream socket) {
await Task.Yield();
try {
using (var reader = new DataReader(socket.InputStream)) {
uint received;
do {
received = await reader.LoadAsync(4);
if (received == 0) return;
} while (reader.UnconsumedBufferLength < 4);
if (received == 0) return;
var length = reader.ReadUInt32();
do {
received = await reader.LoadAsync(length);
if (received == 0) return;
} while (reader.UnconsumedBufferLength < length);
if (received == 0) return;
// Publish the data asynchronously using an event aggregator
Console.WriteLine(reader.ReadString(length));
}
ReceiveData(socket);
}
catch (IOException ex) {
// Client probably disconnected. Can check hresult to be sure.
}
catch (Exception ex) {
Console.WriteLine(ex);
}
}
But I'm wondering if I should use something like the following code instead and start it as a long running task:
// Not sure about this part, never used Factory.StartNew before.
Task.Factory.StartNew(async delegate { await ReceiveData(_socket); }, TaskCreationOptions.LongRunning);
private async Task ReceiveData(SocketStream socket) {
try {
using (var reader = new DataReader(socket.InputStream)) {
while (true) {
uint received;
do {
received = await reader.LoadAsync(4);
if (received == 0) break;
} while (reader.UnconsumedBufferLength < 4);
if (received == 0) break;
var length = reader.ReadUInt32();
do {
received = await reader.LoadAsync(length);
if (received == 0) break;
} while (reader.UnconsumedBufferLength < length);
if (received == 0) break;
// Publish the data asynchronously using an event aggregator
Console.WriteLine(reader.ReadString(length));
}
}
// Client disconnected.
}
catch (IOException ex) {
// Client probably disconnected. Can check hresult to be sure.
}
catch (Exception ex) {
Console.WriteLine(ex);
}
}
In the first, over-simplified version of the code that was posted, the "recursive" approach had no exception handling. That in and of itself would be enough to disqualify it. However, in your updated code example it's clear that you are catching exceptions in the async method itself; thus the method is not expected to throw any exceptions, and so failing to await the method call is much less of a concern.
So, what else can we use to compare and contrast the two options?
You wrote:
I'm also trying to make it cleaner and easier to read
While the first version is not really recursive, in the sense that each call to itself would increase the depth of the stack, it does share some of the readability and maintainability issues with true recursive methods. For experienced programmers, comprehending such a method may not be hard, but it will at the very least slow down the inexperienced, if not make them scratch their heads for a while.
So there's that. It seems like a significant disadvantage, given the stated goals.
So what about the second option, about which you wrote:
…it will then create a new long running task for every connected client
This is an incorrect understanding of how that would work.
Without delving too deeply into how async methods work, the basic behavior is that an async method will in fact return at each use of await (ignoring for a moment the possibility of operations that complete synchronously…the assumption is that the typical case is asynchronous completions).
This means that the task you initiate with this line of code:
Task.Factory.StartNew(
async delegate { await ReceiveData(_socket); },
TaskCreationOptions.LongRunning);
…lives only long enough to reach the first await in the ReceiveData() method. At that point, the method returns and the task which was started terminates (either allowing the thread to terminate completely, or to be returned to the thread pool, depending on how the task scheduler decided to run the task).
There is no "long running task" for every connected client, at least not in the sense of there being a thread being used up. (In some sense, there is since of course there's a Task object involved. But that's just as true for the "recursive" approach as it is for the looping approach.)
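A small experiment can make this visible. The sketch below (class and method names are made up for illustration) starts an async lambda with Task.Factory.StartNew and checks which task is finished when: the outer task created by StartNew completes as soon as the lambda reaches its first await, while the inner task representing the async work is still pending.

```csharp
using System;
using System.Threading.Tasks;

public static class StartNewWithAsyncDelegate
{
    // Returns { outer completed, inner completed right after outer, inner completed at the end }.
    public static async Task<bool[]> ObserveAsync()
    {
        // The lambda returns a Task, so StartNew produces a Task<Task>.
        Task<Task> outer = Task.Factory.StartNew(
            async () => { await Task.Delay(1000); },
            TaskCreationOptions.LongRunning);

        // The outer task finishes once the lambda hits its first await and returns.
        Task inner = await outer;
        bool innerDoneEarly = inner.IsCompleted; // the one-second delay is still pending here

        await inner; // now wait for the actual asynchronous work
        return new[] { outer.IsCompleted, innerDoneEarly, inner.IsCompleted };
    }

    public static void Main()
    {
        bool[] flags = ObserveAsync().GetAwaiter().GetResult();
        Console.WriteLine(string.Join(", ", flags)); // prints True, False, True
    }
}
```

The LongRunning hint therefore only affects the brief synchronous prefix of the method, which is why it buys nothing for an await-based receive loop.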
So, that's the technical comparison. Of course, it's up to you to decide what the implications are for your own code. But I'll offer my own opinion anyway…
For me, the second approach is significantly more readable. And it is specifically because of the way async and await were designed and why they were designed. That is, this feature in C# is specifically there to allow asynchronous code to be implemented in a way that reads almost exactly like regular synchronous code. And in fact, this is borne out by the false impression that there is a "long running task" dedicated to each connection.
Prior to the async/await feature, the correct way to write a scalable networking implementation would have been to use one of the APIs from the "Asynchronous Programming Model". In this model, the IOCP thread pool is used to service I/O completions, such that a small number of threads can monitor and respond to a very large number of connections.
The underlying implementation details actually do not change when switching over to the new async/await syntax. The process still uses a small number of IOCP thread pool threads to handle the I/O completions as they occur.
The difference is that when using async/await, the code looks like the kind of code that one would write if using a single thread for each connection (hence the misunderstanding of how this code actually works). It's just a big loop, with all the necessary handling in one place, and without the need for different code to initiate an I/O operation and to complete one (i.e. the call to Begin...() and later to End...()).
And to me, that is the beauty of async/await, and why the first version of your code is inferior to the second. The first fails to take advantage of what makes async/await actually useful and beneficial to code. The second takes full advantage of that.
Beauty is, of course, in the eye of the beholder. And to at least a limited extent, the same thing can be said of readability and maintainability. But given your stated goal of making the code "cleaner and easier to read", it seems to me that the second version of your code serves that purpose best.
Assuming the code is correct now, it will be easier for you to read (and remember, it's not just you today that needs to read it…you want it readable for the "you" a year from now, after you haven't seen the code for a while). And if it turns out the code is not correct now, the simpler second version will be easier to read, debug, and fix.
The two versions are in fact almost identical. In that respect, it almost doesn't matter. But less code is always better. The first version has two extra method calls and dangling awaitable operations, while the second replaces those with a simple while loop. I know which one I find more readable. :)
Related questions/useful additional reading:
Long Running Blocking Methods. Difference between Blocking, Sleeping, Begin/End and Async
Is async recursion safe in C# (async ctp/.net 4.5)?
I am getting my hands dirty with the TPL. I stumbled upon TaskCompletionSource, which is one of the ways to create a Task and gives you more control over the task by letting you set the result, exception, and so on. Here is an example using TaskCompletionSource:
public static Task<int> RunAsyncFunction(Func<int> sampleFunction)
{
    // ArgumentNullException is the conventional exception for a null argument
    if (sampleFunction == null)
        throw new ArgumentNullException("sampleFunction", "Method cannot be null");
    var tcs = new TaskCompletionSource<int>();
    ThreadPool.QueueUserWorkItem(_ =>
    {
        try
        {
            int result = sampleFunction();
            tcs.SetResult(result);
        }
        catch (Exception ex)
        {
            tcs.SetException(ex);
        }
    });
    return tcs.Task;
}
However, this is not truly asynchronous programming. It is asynchronous programming using multithreading. How can I convert this example to run on a single thread rather than multiple threads? Or is there any other example I can follow?
For it to be asynchronous, it needs some capacity to be completed independently in the future. That is typically via one of two things:
via a callback from an operation such as socket IO, file IO, a system timer, etc (some external source that can cause reactivation)
a second thread (possibly a queued work pool thread, like in your example)
If you only have a single thread, and no external callback, then there really isn't any need or sense in using Task<T>. However, you can still expose that by simply performing the calculation now, and setting the result now - or more simply: using Task.FromResult.
However, the code you have shown is genuinely asynchronous - or more specifically: the Task<T> that you return is. It perhaps isn't the greatest use-case, but there's nothing inherently wrong with it - except that your entire method can be hugely simplified to:
return Task.Run(sampleFunction);
The Task.Run<T> method:
Queues the specified work to run on the ThreadPool and returns a task or Task<TResult> handle for that work.
Normally, if I'm using TaskCompletionSource, it is because I am writing IO-callback based tasks, not ThreadPool based tasks; Task.Run is fine for most of those.
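The two alternatives mentioned above can be sketched side by side. The helper names RunNow and RunOnThreadPool are made up for this illustration; the first computes the value on the calling thread and wraps it in an already-completed task, the second queues the work to the thread pool.

```csharp
using System;
using System.Threading.Tasks;

public static class FromResultVersusRun
{
    // Single thread, no callback: compute now and hand back a completed task.
    public static Task<int> RunNow(Func<int> work)
    {
        return Task.FromResult(work());
    }

    // Second thread: the work is queued to the thread pool.
    public static Task<int> RunOnThreadPool(Func<int> work)
    {
        return Task.Run(work);
    }

    public static void Main()
    {
        Task<int> now = RunNow(() => 21 * 2);
        Console.WriteLine(now.IsCompleted); // prints True: nothing is left to wait for
        Console.WriteLine(now.Result);      // prints 42

        Task<int> pooled = RunOnThreadPool(() => 21 * 2);
        Console.WriteLine(pooled.Result);   // prints 42 once the pool thread has run the work
    }
}
```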
TaskCompletionSource doesn't make your code asynchronous. It's a utility to enable someone else to asynchronously await your operation.
Your operation needs to already be asynchronous on its own. For example if it's in an older paradigm, like the BeginXXX/EndXXX one.
TaskCompletionSource is mostly used to convert different types of asynchronous programming into Task based asynchronous programming.
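As an illustration of that conversion, here is a minimal sketch that wraps the Begin/End pair of Stream in a Task<int> by hand. ReadAsTask is a hypothetical helper written for this example; in practice Task.Factory.FromAsync does the same job for most Begin/End pairs.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class ApmToTask
{
    // Wraps Stream.BeginRead/EndRead in a Task<int> using TaskCompletionSource.
    public static Task<int> ReadAsTask(Stream stream, byte[] buffer, int offset, int count)
    {
        var tcs = new TaskCompletionSource<int>();
        try
        {
            stream.BeginRead(buffer, offset, count, asyncResult =>
            {
                // The APM callback completes the task; EndRead rethrows any I/O failure.
                try { tcs.TrySetResult(stream.EndRead(asyncResult)); }
                catch (Exception ex) { tcs.TrySetException(ex); }
            }, null);
        }
        catch (Exception ex)
        {
            // BeginRead itself can throw, for example on a disposed stream.
            tcs.TrySetException(ex);
        }
        return tcs.Task;
    }

    public static void Main()
    {
        var stream = new MemoryStream(new byte[] { 1, 2, 3, 4, 5 });
        var buffer = new byte[16];
        int read = ReadAsTask(stream, buffer, 0, buffer.Length).GetAwaiter().GetResult();
        Console.WriteLine(read); // prints 5
    }
}
```

The TaskCompletionSource adds no asynchrony of its own here; it only exposes the completion of the underlying Begin/End operation as a Task.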
In this scenario, system A needs to send a message to system B. The following code shows a sample of how this was accomplished:
public interface IExecutionStrategy
{
    Task<Result> ExecuteMessage(Message message);
}

public class WcfExecutionStrategy : IExecutionStrategy
{
    public async Task<Result> ExecuteMessage(Message message)
    {
        using (var client = new Client())
        {
            return await client.RunMessageOnServer(message);
        }
    }
}

public class MessageExecutor
{
    private readonly IExecutionStrategy _strategy;

    public MessageExecutor(IExecutionStrategy strategy)
    {
        _strategy = strategy;
    }

    // async is required because the body awaits the strategy
    public async Task<Result> ExecuteMessage(Message msg)
    {
        // ....
        // Do some common checks and code here
        // ....
        var result = await _strategy.ExecuteMessage(msg);
        // ....
        // Do some common cleanup and logging here
        // .....
        return result;
    }
}
For reasons out of scope of this question we decided to switch from Wcf to using raw http streams, but we needed both side by side to gather metrics and test it out. So I created a new IExecutionStrategy implementation to handle this:
public class HttpclientExecutionStrategy : IExecutionStrategy
{
    public async Task<Result> ExecuteMessage(Message message)
    {
        var webRequest = CreateWebRequest(message);
        // The blocking GetResponse() call is pushed onto a thread pool thread
        var responseStream = await Task.Run(() =>
        {
            var webResponse = (HttpWebResponse)webRequest.GetResponse();
            return webResponse.GetResponseStream();
        });
        return MessageStreamer.ReadResultFromStream(responseStream);
    }
}
Essentially, the only way I could get this to be asynchronous was to wrap it in a Task.Run() so that the web request was non-blocking. (Note: due to unique stream manipulation requirements on both sending and receiving, it is not straightforward to implement this in HttpClient, and even if it's possible, that is out of scope for this question.)
We thought this was fine until we read Stephen Cleary's multiple blog posts about how Task.Run() is bad, both in library code and in Asp.Net applications. This makes perfect sense to me.
What doesn't make sense is how you actually implement a naturally asynchronous call if the third-party library does not expose an asynchronous method. For example, if you were to use HttpClient.GetStreamAsync(), what does that do that makes it better for asynchronous operations than Task.Run(() => HttpClient.GetStream()), and is there any way to remedy this for non-async third-party libraries?
HttpClient.GetStreamAsync is a pure asynchronous method, which means no new threads will be introduced while making the call, and when used in combination with await, it will yield control back to the caller until the IO request is done. This will scale well, as you actually free the ThreadPool thread that invoked the operation to do more work while the request is executing, so your server can actually process more requests in the meantime.
On the contrary, using a dedicated thread ("async over sync") just to make a blocking IO request will definitely not scale well, and might eventually cause thread starvation if the execution time is long enough.
Edit
The truly asynchronous nature of the XXXAsync implementation comes from the network device driver supplying an asynchronous endpoint to the OS. Under the covers, the WinHTTP library (thanks @Noseratio for the correction) is used for the async operations. What that means is that an I/O Request Packet (IRP) is generated and passed to the device driver. Once the request is complete, a CPU interrupt will occur which will eventually cause the registered callback to be invoked. You can look at these examples: Using WinInet HTTP functions in Full Asynchronous Mode or Windows with C++ - Asynchronous WinHTTP for asynchronous examples, and of course read the excellent There Is No Thread by Stephen Cleary. You can natively implement it yourself and wrap it in a managed wrapper.
Short Answer
Use async-await if the operation is truly asynchronous. If it's not, then don't pretend it is.
Long Answer
Asynchronous operations have only 2 real benefits: scalability and offloading. Scalability is mostly for the server-side while offloading is used for "special" threads (mostly GUI threads).
If you're dependent on a synchronous library and there's nothing you can do about it, making it asynchronous using Task.Run doesn't improve scalability at all (and may in fact hinder it), and you're only left with offloading. async reduces resource usage only when the operation is inherently asynchronous, that is, when no thread is held throughout the actual asynchronous part (network, disk, etc.).
If you're developing a rich-client application, go ahead and use "async over sync" with async-await. But for any other type of application, there's no reason to do so, and I would recommend against it.
Okay, I have read many questions about writing highly scalable servers, but I never really came across a good answer. Anyway, I want to create a highly scalable client, which handles lots of data and connections. What I created was a simple client using SocketAsyncEventArgs and C# 5 async/await, like this:
public async Task<int> SendAsync(byte[] buffer, int offset, int size)
{
var socketArgs = new SocketAsyncEventArgs();
var tcs = new TaskCompletionSource<int>();
socketArgs.SetBuffer(buffer, offset, size);
socketArgs.Completed += (sender, args) =>
{
tcs.SetResult(socketArgs.BytesTransferred);
LastSocketError = socketArgs.SocketError;
};
if (_clientSocket.SendAsync(socketArgs))
return await tcs.Task;
LastSocketError = socketArgs.SocketError;
return socketArgs.BytesTransferred;
}
I did the same for ConnectAsync, ReceiveAsync and AcceptAsync. This works great for the client part, but on my server I don't know how to get it right. (I would end up creating a thread for every client for receiving.)
I can use the APM (Begin/End) or an EventHandler, but that defeats the purpose of using async/await, and in the end it isn't memory-efficient to create a new SocketAsyncEventArgs for every call. I tried creating a pool (using a ConcurrentBag), but then I still had to create a TaskCompletionSource over and over (as you can use it only once).
So I don't think this design is a good idea, although I really like it. What would be a good design for a highly scalable, high-performance server?
We could, of course, wrap such asynchronous methods in a Task, but that would largely defeat the purpose of these methods, since it means allocating an object per operation, similar to the allocation of an IAsyncResult in the case of APM. Even so, we want to be able to take advantage of the compiler's async/await support to make it easier to write asynchronous code using sockets. We want to have our cake and eat it too.
Of course, the compiler supports awaiting more than just Tasks. So, if you have a specialized scenario like this, you can take advantage of the compiler's pattern-based support for awaiting things.
More information:
https://blogs.msdn.microsoft.com/pfxteam/2011/12/15/awaiting-socket-operations/
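To show the shape of that pattern-based support without any socket plumbing, here is a minimal sketch of a reusable awaitable. ReusableSignal and its members are hypothetical names for this illustration, and the thread pool work item stands in for a socket completion callback; this is not the code from the linked post, only the same general idea of one object awaited many times instead of a new TaskCompletionSource per operation.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical reusable awaitable: the compiler only needs GetAwaiter(),
// IsCompleted, OnCompleted() and GetResult(); no Task is involved.
public sealed class ReusableSignal : INotifyCompletion
{
    private static readonly Action Sentinel = () => { };
    private Action continuation;
    private int result;

    public ReusableSignal GetAwaiter() { return this; }

    // Always false, so every await goes through OnCompleted.
    public bool IsCompleted { get { return false; } }

    public int GetResult() { return result; }

    public void OnCompleted(Action next)
    {
        // If Complete() already ran it left the sentinel behind; schedule the continuation ourselves.
        if (continuation == Sentinel ||
            Interlocked.CompareExchange(ref continuation, next, null) == Sentinel)
        {
            Task.Run(next);
        }
    }

    // Called by the producer, for example from an I/O completion callback.
    public void Complete(int value)
    {
        result = value;
        Action previous = continuation
            ?? Interlocked.CompareExchange(ref continuation, Sentinel, null);
        if (previous != null) previous(); // a continuation was already registered: run it
    }

    // Must be called before each reuse.
    public void Reset() { continuation = null; }
}

public static class ReusableSignalDemo
{
    public static async Task<int> SumAsync()
    {
        var signal = new ReusableSignal(); // allocated once, awaited three times
        int total = 0;
        for (int i = 1; i <= 3; i++)
        {
            signal.Reset();
            int captured = i;
            ThreadPool.QueueUserWorkItem(_ => signal.Complete(captured));
            int received = await signal;
            total += received;
        }
        return total;
    }

    public static void Main()
    {
        Console.WriteLine(SumAsync().GetAwaiter().GetResult()); // prints 6
    }
}
```

The same object can be pooled alongside a SocketAsyncEventArgs, which addresses the concern about creating a TaskCompletionSource for every call.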
What is the correct way to accept sockets in a multi connection environment in .NET?
Will the following be enough even if the load is high?
while (true)
{
    // block until a socket is accepted
    var socket = tcpListener.AcceptSocket();
    DoStuff(socket); // e.g. spawn thread and read data
}
That is, can I accept sockets in a single thread and then handle the sockets in a thread / dataflow / whatever.
So the question is just about the accept part..
You'll probably want the BeginAccept async operation instead of the synchronous Accept.
And if you want to handle high load, you definitely don't want a thread per connection - again, you want async methods.
Take a look at either the Reactor or Proactor pattern, depending on whether you want to block or not. I recommend the book Patterns for Concurrent and Networked Objects.
This should be fine but if the load gets even higher you might consider using the asynchronous versions of this method: BeginAcceptSocket/EndAcceptSocket.
The BeginAcceptSocket is a better choice if you want the most performant server.
More importantly, these async operations use a Threadpool under the hood whilst in your current implementation you are creating and destroying lots of threads which is really expensive.
I think the best approach is to call BeginAccept(), and within OnAccept call BeginAccept again right away. This should give you the best concurrency.
The OnAccept should be something like this:
private void OnAccept(IAsyncResult ar)
{
bool beginAcceptCalled = false;
try
{
//start the listener again
_listener.BeginAcceptSocket(OnAccept, null);
beginAcceptCalled = true;
Socket socket = _listener.EndAcceptSocket(ar);
//do something with the socket..
}
catch (Exception ex)
{
if (!beginAcceptCalled)
{
//try listening to connections again
_listener.BeginAcceptSocket(OnAccept, null);
}
}
}
It doesn't really matter performance-wise. What matters is how you communicate with each client. That handling will consume a lot more CPU than accepting sockets.
I would use BeginAccept/EndAccept for the listener socket AND BeginReceive/EndReceive for the client sockets.
Since I'm using Async CTP and DataFlow, the current code looks like this:
private async void WaitForSockets()
{
var socket = await tcpListener.AcceptSocketAsync();
WaitForSockets();
incomingSockets.Post(socket);
}
Note that what looks like a recursive call will not cause stack overflow or block.
It will simply start a new awaiter for a new socket and exit.