I want to read from a socket in an asynchronous way.
If I used the synchronous code below
StreamReader aReadStream = new StreamReader(aStream);
String aLine = aReadStream.ReadLine(); // application might hang here
with a valid open connection, but no data available for reading, the application would hang at the ReadLine() function.
If I switch to the code below, which uses the Asynchronous Programming Model
NetworkStream aStream = aClient.GetStream();
while(true)
{
IAsyncResult res = aStream.BeginRead(data, 0, data.Length, null, null);
if (res.IsCompleted){
Trace.WriteLine("here");
} else {
Thread.Sleep(100);
}
}
In the same scenario, what would happen? If there is no data to read, would res report IsCompleted as true or not?
Most importantly, when there is no data to read, is every call to aStream.BeginRead() in that while loop scheduled on the thread pool? If so, do I risk degrading the application's performance because the thread pool grows too large?
Thanks for the help
AFG
By writing the code in this way, you've basically made it synchronous.
The proper way to make this code work is to call BeginRead, passing a callback handler which will process the data which has been read when it is ready, then go do other work rather than just entering a loop.
The callback handler you specify will also be triggered when the data stream is terminated (e.g. because the connection was closed) which you can handle appropriately.
See http://msdn.microsoft.com/en-us/library/system.net.sockets.networkstream.beginread.aspx for a sample.
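A minimal sketch of that callback pattern, assuming an already connected NetworkStream. The class name CallbackReader and the onData/onClosed delegates are illustrative names, not taken from the MSDN sample. Each completed read processes its data and then issues the next BeginRead, so only one read is ever outstanding and no thread sits in a polling loop.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical helper: reads from a NetworkStream with BeginRead/EndRead callbacks.
public sealed class CallbackReader
{
    private readonly NetworkStream stream;
    private readonly byte[] buffer = new byte[4096];
    private readonly Action<byte[], int> onData; // receives the buffer and the byte count
    private readonly Action onClosed;            // called when the stream ends or fails

    public CallbackReader(NetworkStream stream, Action<byte[], int> onData, Action onClosed)
    {
        this.stream = stream;
        this.onData = onData;
        this.onClosed = onClosed;
    }

    // Issues one read and returns immediately; nothing waits while the socket is idle.
    public void Start()
    {
        try
        {
            stream.BeginRead(buffer, 0, buffer.Length, OnRead, null);
        }
        catch (Exception) // e.g. IOException or ObjectDisposedException
        {
            onClosed();
        }
    }

    private void OnRead(IAsyncResult ar)
    {
        int read;
        try
        {
            read = stream.EndRead(ar);
        }
        catch (Exception)
        {
            onClosed();
            return;
        }

        if (read == 0) // zero bytes means the peer closed the connection
        {
            onClosed();
            return;
        }

        onData(buffer, read);
        Start(); // re-arm: exactly one read is pending at any time
    }
}
```

Calling new CallbackReader(aClient.GetStream(), handler, cleanup).Start() returns at once, and the calling thread is free to do other work.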
Have a look at the article I wrote on asynchronous sockets, here on CodeProject. It is based on what I learnt from "MSDN, August 2005, 'Winsock - Get closer to the wire with high-performance sockets in .NET', Daryn Kiely, Pg 81."
Hope this helps,
Best regards,
Tom.
Ok, I think I have understood the whole async/await thing. Whenever you await something, the function you're running returns, allowing the current thread to do something else while the async function completes. The advantage is that you don't start a new thread.
This is not that hard to understand as it's somewhat how Node.JS works, except Node uses a lot of callbacks to make this happen. This is where I fail to understand the advantage however.
The socket class doesn't currently have any Async methods (that work with async/await). I can of course pass a socket to the stream class, and use the async methods there, however this leaves a problem with the accepting of new sockets.
There are two ways of doing this, as far as I know. In both cases I accept new sockets in an infinite loop on the main thread. In the first case I can start a new task for every socket that I accept, and run the stream.ReceiveAsync within that task. However, won't an await actually block that task, since the task will have nothing else to do? Which again will result in more threads spawned on the threadpool, which again is no better than using synchronous methods inside a task?
My second option is to put all accepted sockets in one of several lists (one list per thread), and inside those threads run a loop, running await stream.ReceiveAsync for every socket. This way, whenever I run into an await on stream.ReceiveAsync, the thread can move on and start receiving from the other sockets.
I guess my real question is if this is in any way more effective than a threadpool, and in the first case, if it really will be worse than just using the APM methods.
I also know you can wrap APM methods into functions using await/async, but the way I see it, you still get the "disadvantage" of APM methods, with the extra overhead of state machines in async/await.
The async socket API is not based around Task[<T>], so it isn't directly usable from async/await - but you can bridge that fairly easily - for example (completely untested):
public class AsyncSocketWrapper : IDisposable
{
public void Dispose()
{
var tmp = socket;
socket = null;
if(tmp != null) tmp.Dispose();
}
public AsyncSocketWrapper(Socket socket)
{
this.socket = socket;
args = new SocketAsyncEventArgs();
args.Completed += args_Completed;
}
void args_Completed(object sender, SocketAsyncEventArgs e)
{
// might want to switch on e.LastOperation
var source = (TaskCompletionSource<int>)e.UserToken;
if (ShouldSetResult(source, args)) source.TrySetResult(args.BytesTransferred);
}
private Socket socket;
private readonly SocketAsyncEventArgs args;
public Task<int> ReceiveAsync(byte[] buffer, int offset, int count)
{
TaskCompletionSource<int> source = new TaskCompletionSource<int>();
try
{
args.SetBuffer(buffer, offset, count);
args.UserToken = source;
if (!socket.ReceiveAsync(args))
{
if (ShouldSetResult(source, args))
{
return Task.FromResult(args.BytesTransferred);
}
}
}
catch (Exception ex)
{
source.TrySetException(ex);
}
return source.Task;
}
static bool ShouldSetResult<T>(TaskCompletionSource<T> source, SocketAsyncEventArgs args)
{
if (args.SocketError == SocketError.Success) return true;
var ex = new InvalidOperationException(args.SocketError.ToString());
source.TrySetException(ex);
return false;
}
}
Note: you should probably avoid running the receives in a loop - I would advise making each socket responsible for pumping itself as it receives data. The only thing you need a loop for is to periodically sweep for zombies, since not all socket deaths are detectable.
Note also that the raw async socket API is perfectly usable without Task[<T>] - I use that extensively. While await may have uses here, it is not essential.
This is not that hard to understand as it's somewhat how Node.JS works, except Node uses a lot of callbacks to make this happen. This is where I fail to understand the advantage however.
Node.js does use callbacks, but it has one other significant facet that really simplifies those callbacks: they are all serialized to the same thread. So when you're looking at asynchronous callbacks in .NET, you're usually dealing with multithreading as well as asynchronous programming (except for EAP-style callbacks).
Asynchronous programming using callbacks is called "continuation-passing style" (CPS). It's the only real option for Node.js but is one of many options on .NET. In particular, CPS code can get extremely complex and difficult to maintain, so the async/await compiler transform was introduced so you could write "normal-looking" code and the compiler would translate it to CPS for you.
In both cases I accept new sockets in an infinite loop on the main thread.
If you're writing a server, then yes, somewhere you will be repeatedly accepting new client connections. Also, you should be continuously reading from each connected socket, so each socket also has a loop.
In the first case I can start a new task for every socket that I accept, and run the stream.ReceiveAsync within that task.
You wouldn't need a new task. That's the whole point of asynchronous programming.
My second option is to put all accepted sockets in one of several lists (one list per thread), and inside those threads run a loop, running await stream.ReceiveAsync for every socket.
I'm not sure why you'd need multiple threads, or any dedicated threads at all.
You seem a bit confused on how async and await work. I recommend reading my own introduction, the MSDN overview, the Task-Based Asynchronous Pattern guidance, and the async FAQ, in that order.
I also know you can wrap APM methods into functions using await/async, but the way I see it, you still get the "disadvantage" of APM methods, with the extra overhead of state machines in async/await.
I'm not sure what disadvantage you're referring to. The overhead of state machines, while non-zero, is negligible in the face of socket I/O.
If you're looking to do socket I/O, you have several options. For reads, you can either do them in an "infinite" loop using APM or Task wrappers around the APM or Async methods. Alternatively, you could convert them into a stream-like abstraction using Rx or TPL Dataflow.
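As a rough sketch of the "infinite" read loop with Task-based reads: the helper below (ConnectionPump and handleChunk are hypothetical names) awaits Stream.ReadAsync in a loop. It assumes each socket has been wrapped in a NetworkStream. While a read is pending, the method is suspended and holds no thread, which is why neither a task per socket nor a dedicated thread is needed.

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class ConnectionPump
{
    // Reads until the stream ends, handing each chunk to a caller-supplied handler.
    // Returns the total number of bytes read.
    public static async Task<long> PumpAsync(Stream stream, Action<byte[], int> handleChunk)
    {
        var buffer = new byte[4096];
        long total = 0;
        while (true)
        {
            // No thread is blocked on this connection while the read is pending.
            int read = await stream.ReadAsync(buffer, 0, buffer.Length).ConfigureAwait(false);
            if (read == 0) break; // end of stream: the peer closed the connection
            handleChunk(buffer, read);
            total += read;
        }
        return total;
    }
}
```

A server would call PumpAsync once per accepted connection and keep the returned Task to observe errors, without awaiting it inside the accept loop.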
Another option is a library I wrote a few years ago called Nito.Async. It provides EAP-style (event-based) sockets that handle all the thread marshaling for you, so you end up with something simpler like Node.js. Of course, like Node.js, this simplicity means it won't scale as well as a more complex solution.
What is the correct way to accept sockets in a multi connection environment in .NET?
Will the following be enough even if the load is high?
while (true)
{
//block until socket accepted
var socket = tcpListener.AcceptSocket();
DoStuff(socket); //e.g. spawn thread and read data
}
That is, can I accept sockets in a single thread and then handle the sockets in a thread / dataflow / whatever.
So the question is just about the accept part.
You'll probably want the BeginAccept async operation instead of the synchronous Accept.
And if you want to handle high load, you definitely don't want a thread per connection. Again, use async methods.
Take a look at either the Reactor or Proactor pattern, depending on whether you want to block or not. I recommend the Patterns for Concurrent and Networked Objects book.
This should be fine but if the load gets even higher you might consider using the asynchronous versions of this method: BeginAcceptSocket/EndAcceptSocket.
The BeginAcceptSocket is a better choice if you want the most performant server.
More importantly, these async operations use a Threadpool under the hood whilst in your current implementation you are creating and destroying lots of threads which is really expensive.
I think the best approach is to call BeginAccept(), and within OnAccept call BeginAccept again right away. This should give you the best concurrency.
The OnAccept should be something like this:
private void OnAccept(IAsyncResult ar)
{
bool beginAcceptCalled = false;
try
{
//start the listener again
_listener.BeginAcceptSocket(OnAccept, null);
beginAcceptCalled = true;
Socket socket = _listener.EndAcceptSocket(ar);
//do something with the socket..
}
catch (Exception ex)
{
if (!beginAcceptCalled)
{
//try listening to connections again
_listener.BeginAcceptSocket(OnAccept, null);
}
}
}
It doesn't really matter performance-wise. What matters is how you communicate with each client. That handling will consume a lot more CPU than accepting sockets.
I would use BeginAccept/EndAccept for the listener socket AND BeginReceive/EndReceive for the client sockets.
Since I'm using Async CTP and DataFlow, the current code looks like this:
private async void WaitForSockets()
{
var socket = await tcpListener.AcceptSocketAsync();
WaitForSockets();
incomingSockets.Post(socket);
}
Note that what looks like a recursive call will not cause stack overflow or block.
It will simply start a new awaiter for a new socket and exit.
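The same accept logic can also be written as a plain loop, which some find easier to follow than the self-call. This is only a sketch: AcceptLoop and handOff are hypothetical names, and treating every SocketException as a shutdown signal is a simplification.

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class AcceptLoop
{
    // Accepts connections until the listener is stopped. handOff is a caller-supplied
    // callback that takes ownership of each socket and should return quickly.
    public static async Task RunAsync(TcpListener listener, Action<Socket> handOff)
    {
        while (true)
        {
            Socket socket;
            try
            {
                // The await releases the thread until a client connects.
                socket = await listener.AcceptSocketAsync().ConfigureAwait(false);
            }
            catch (ObjectDisposedException)
            {
                return; // the listener was stopped
            }
            catch (SocketException)
            {
                return; // simplification: treat an aborted accept as shutdown
            }
            handOff(socket);
        }
    }
}
```

With the Dataflow setup above, handOff could simply be s => incomingSockets.Post(s).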
Quick preface of what I'm trying to do. I want to start a process and start up two threads to monitor its stderr and stdout. Each thread chews off bits of the stream and then fires it out to a NetworkStream. If there is an error in either thread, both threads need to die immediately.
Each of these processes, with its stdout and stderr monitoring threads, is spun off by a main server process. The reason this becomes tricky is that there can easily be 40 or 50 of these processes at any given time. Only during morning restart bursts are there ever more than 50 connections, but it really needs to be able to handle 100 or more. I test with 100 simultaneous connections.
try
{
StreamReader reader = this.myProcess.StandardOutput;
char[] buffer = new char[4096];
byte[] data;
int read;
while (reader.Peek() > -1 ) // This can block before stream is streamed to
{
read = reader.Read(buffer, 0, 4096);
data = Server.ClientEncoding.GetBytes(buffer, 0, read);
this.clientStream.Write(data, 0, data.Length); //ClientStream is a NetworkStream
}
}
catch (Exception err)
{
Utilities.ConsoleOut(string.Format("StdOut err for client {0} -- {1}", this.clientID, err));
this.ShutdownClient(true);
}
This code block is run in one Thread which is right now not Background. There is a similar thread for the StandardError stream. I am using this method instead of listening to OutputDataReceived and ErrorDataReceived because there was an issue in Mono that caused these events to not always fire properly and even though it appears to be fixed now I like that this method ensures I'm reading and writing everything sequentially.
ShutdownClient with true simply tries to kill both threads. Unfortunately the only way I have found to make this work is to use an interrupt on the stdErrThread and stdOutThread objects. Ideally Peek would not block, and I could just use a manual reset event to keep checking for new data on stdOut or stdErr and then just die when the event is flipped.
I doubt this is the best way to do it. Is there a way to execute this without using an Interrupt?
I'd like to change, because I just saw in my logs that I missed a ThreadInterruptedException thrown inside Utilities.ConsoleOut. This just does a System.Console.Write if a static variable is true, but I guess this blocks somewhere.
Edits:
These threads are part of a parent Thread that is launched en masse by a server upon a request. Therefore I cannot set the StdOut and StdErr threads to background and kill the application. I could kill the parent thread from the main server, but this again would get sticky with Peek blocking.
Added info about this being a server.
Also I'm starting to realize a better Queuing method for queries might be the ultimate solution.
I can tell this whole mess stems from the fact that Peek blocks. You're really trying to fix something that is fundamentally broken in the framework and that is never easy (i.e. not a dirty hack). Personally, I would fix the root of the problem, which is the blocking Peek. Mono would've followed Microsoft's implementation and thus ends up with the same problem.
While I know exactly how to fix the problem should I be allowed to change the framework source code, the workaround is lengthy and time consuming.
But here goes.
Essentially, what Microsoft needs to do is change Process.StartWithCreateProcess such that standardOutput and standardError are both assigned a specialised type of StreamReader (e.g. PipeStreamReader).
In this PipeStreamReader, they need to override both ReadBuffer overloads (i.e. need to change both overloads to virtual in StreamReader first) such that prior to a read, PeekNamedPipe is called to do the actual peek. As it is at the moment, FileStream.Read() (called by Peek()) will block on pipe reads when no data is available for read. While a FileStream.Read() with 0 bytes works well on files, it doesn't work all that well on pipes. In fact, the .NET team missed an important part of the pipe documentation - PeekNamedPipe WinAPI.
The PeekNamedPipe function is similar to the ReadFile function with the following exceptions:
...
The function always returns immediately in a single-threaded application, even if there is no data in the pipe. The wait mode of a named pipe handle (blocking or nonblocking) has no effect on the function.
The best thing at this moment without this issue solved in the framework would be to roll out your own Process class (a thin wrapper around WinAPI would suffice).
Why don't you just set both threads to be background threads and then kill the app? It would cause both threads to close immediately.
You're building a server. You want to avoid blocking. The obvious solution is to use the asynchronous APIs:
// myProcess is the child process from the question, started with redirected output
StreamReader reader = this.myProcess.StandardOutput;
char[] buffer = new char[4096];
int read;
// ReadAsync returns 0 once the process closes its stdout, which ends the loop
while ((read = await reader.ReadAsync(buffer, 0, buffer.Length)) > 0)
{
byte[] data = Server.ClientEncoding.GetBytes(buffer, 0, read);
await this.clientStream.WriteAsync(data, 0, data.Length);
}
No need to waste threads doing I/O work :)
Get rid of peek and use the method below to read from the process output streams. ReadLine() returns null when the process ends. To join this thread with your calling thread either wait for the process to end or kill the process yourself. ShutdownClient() should just Kill() the process which will cause the other thread reading the StdOut or StdErr to also exit.
private void ReadToEnd()
{
string nextLine;
while ((nextLine = stream.ReadLine()) != null)
{
output.WriteLine(nextLine);
}
}
While I am attempting to send a message from a queue, the BeginSend call seems to behave as a blocking call.
Specifically I have:
public void Send(MyMessage message)
{
lock(SEND_LOCK){
var state = ...
try {
log.Info("Begin Sending...");
socket.BeginSend(message.AsBytes(),0, message.ByteLength, SocketFlags.None,
(r) => EndSend(r), state);
log.Info("Begin Send Complete.");
}
catch (SocketException e) {
...
}
}
}
The callback would be something like this:
private void EndSend(IAsyncResult result) {
log.Info("EndSend: Ending send.");
var state = (MySendState) result.AsyncState;
...
state.Socket.EndSend(result, out code);
log.Info("EndSend: Send ended.");
WaitUntilNewMessageInQueue();
SendNextMessage();
}
Most of the time this works fine, but sometimes it hangs. Logging indicates this happens when BeginSend and EndSend are executed on the same thread. WaitUntilNewMessageInQueue blocks until there is a new message in the queue, so when there is no new message it can wait quite a while.
As far as I can tell this should not really be a problem, but in some cases BeginSend blocks, causing a deadlock: EndSend is blocking on WaitUntilNewMessageInQueue (expected), but Send is in turn blocking on BeginSend, as it seems to be waiting for the EndSend callback to return (not expected).
This behaviour was not what I was expecting. Why does BeginSend sometimes block if the callback does not return in a timely fashion?
First of all, why are you locking in your Send method? The lock will be released before the send is complete since you are using BeginSend. The result is that multiple sends can be executing at the same time.
Secondly, do not write (r) => EndSend(r), just write EndSend (without any parameters).
Third: You do not need to include the socket in your state. Your EndSend method works like any other instance method, so you can access the socket field directly.
As for your deadlocks, it's hard to tell. Your delegate may have something to do with it (optimizations by the compiler / runtime), but I have no knowledge in that area.
Need more help? Post more code. But I suggest that you fix the issues above and try again first.
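One possible way to keep sends strictly one at a time without holding a lock across the send, and without blocking inside the completion callback, is a small queue that pumps itself. This is a sketch only: SendQueue is a hypothetical class, and the sendAsync delegate stands in for whatever actually writes to the socket.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

public sealed class SendQueue
{
    private readonly Queue<byte[]> pending = new Queue<byte[]>();
    private readonly object gate = new object();
    private readonly Func<byte[], Task> sendAsync; // hypothetical: wraps the real socket send
    private bool sending;

    public SendQueue(Func<byte[], Task> sendAsync)
    {
        this.sendAsync = sendAsync;
    }

    // Never blocks on I/O: either starts the pump or leaves the message for it.
    public void Enqueue(byte[] message)
    {
        lock (gate)
        {
            pending.Enqueue(message);
            if (sending) return; // the running pump will pick this message up
            sending = true;
        }
        _ = PumpAsync();
    }

    private async Task PumpAsync()
    {
        while (true)
        {
            byte[] next;
            lock (gate)
            {
                if (pending.Count == 0)
                {
                    sending = false; // idle again; the next Enqueue restarts the pump
                    return;
                }
                next = pending.Dequeue();
            }
            try
            {
                // The lock is not held here, and only one send is in flight.
                await sendAsync(next).ConfigureAwait(false);
            }
            catch (Exception)
            {
                // Simplification: drop the backlog and stop; real code would report the error.
                lock (gate)
                {
                    pending.Clear();
                    sending = false;
                }
                return;
            }
        }
    }
}
```

Because the pump exits when the queue is empty instead of waiting for the next message, no callback thread is ever parked waiting for new work.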
Which operating system are you running on?
Are you sure you're seeing what you think you're seeing?
The notes on the MSDN page say that Send() CAN block if there's no OS buffer space to initiate your async send unless you have put the socket in non blocking mode. Could that be the case? Are you potentially sending data very quickly and filling the TCP window to the peer? If you break into the debugger what does the call stack show?
The rest is speculation based on my understanding of the underlying native technologies involved...
The notes for Send() are likely wrong about I/O being cancelled if the thread exits. This almost certainly depends on the underlying OS, as it's a low-level I/O Completion Port/overlapped I/O issue that changed with Windows Vista (see here: http://www.lenholgate.com/blog/2008/02/major-vista-overlapped-io-change.html). Given that they're wrong about that, they could also be wrong about how the completions (calls to EndSend()) are dispatched on later operating systems. From Vista onwards it's possible that the completions could be dispatched on the issuing thread if the .NET sockets wrapper is enabling the correct options on the socket (see here where I talk about FILE_SKIP_COMPLETION_PORT_ON_SUCCESS). However, if this were the case then it's likely that you'd see this behaviour a lot, as initially most sends are likely to complete 'in line' and so you'd see most completions happening on the same thread. I'm pretty sure that this is NOT the case and that .NET does NOT enable this option without asking.
This is how you check if it completed synchronously so you avoid the callback on another thread.
For a single send:
var result = socket.BeginSend(...);
if (result.CompletedSynchronously)
{
socket.EndSend(result);
}
For a queue of multiple sends, you can just loop and finalize all synchronous sends:
while (true)
{
var result = socket.BeginSend(...);
if (!result.CompletedSynchronously)
{
break;
}
socket.EndSend(result);
}
I am using the TcpClient class in C#.
Each time there is a new tcp connection request, the usual practice is to create a new thread to handle it. And it should be possible for the main thread to terminate these handler threads anytime.
My solution for each of these handler thread is as follows:
1 Check NetworkStream's DataAvailable property
1.1 If new data available then read and process new data
1.2 If end of stream then self terminate
2 Check for terminate signal from main thread
2.1 If terminate signal activated then self terminate
3 Goto 1.
The problem with this polling approach is that all of these handler threads will be taking up significant processor resources and especially so if there is a huge number of these threads. This makes it highly inefficient.
Is there a better way of doing this?
See Asynchronous Server Socket Example to learn how to do this the ".NET way", without creating new threads for each request.
Believe it or not, that 1000-tick sleep (a tick is 100 ns, so 100 microseconds) will really keep things running smoothly.
private readonly Queue<Socket> sockets = new Queue<Socket>();
private readonly object locker = new object();
private readonly TimeSpan sleepTimeSpan = new TimeSpan(1000);
private volatile Boolean terminate;
private void HandleRequests()
{
Socket socket = null;
while (!terminate)
{
lock (locker)
{
socket = null;
if (sockets.Count > 0)
{
socket = sockets.Dequeue();
}
}
if (socket != null)
{
// process
}
Thread.Sleep(sleepTimeSpan);
}
}
I remember working on a similar kind of Windows Service. It was an NTRIP Server that could take around 1000 TCP connections and route the data to an NTRIP Caster.
If you have a dedicated server for this application then it will not be a problem unless you add more code to each thread (File IO, Database etc - although in my case I also had Database processing to log the in/out for each connection).
The things to watch out for:
Bandwidth, once the number of threads goes up to 600 or so. You will start seeing disconnections when the TCP buffer window is choked for some reason or the available bandwidth falls short
The operating system on which you are running this application might have some restrictions, which can cause disconnections
The above might not be applicable in your case, but I just wanted to put it here because I faced these issues during development.
You're right that you do not want all of your threads "busy waiting" (i.e. running a small loop over and over). You either want them blocking, or you want to use asynchronous I/O.
As John Saunders mentioned, asynchronous I/O is the "right way" to do this, since it can scale up to hundreds of connections. Basically, you call BeginRead() and pass it a callback function. BeginRead() returns immediately, and when data arrives, the callback function is invoked on a thread from the thread pool. The callback function processes the data, calls BeginRead() again, and then returns, which releases the thread back into the pool.
However, if you'll only be holding a handful of connections open at a time, it's perfectly fine to create a thread for each connection. Instead of checking the DataAvailable property in a loop, go ahead and call Read(). The thread will block, consuming no CPU, until data is available to read. If the connection is lost, or you close it from another thread, the Read() call will throw an exception, which you can handle by terminating your reader thread (a graceful close by the peer makes Read() return 0 instead).
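A minimal sketch of that thread-per-connection reader, assuming it runs on its own thread. BlockingReader and handleChunk are hypothetical names. It treats a return value of 0 and the two exceptions Read() commonly throws on a closed stream as the signal to stop.

```csharp
using System;
using System.IO;

public static class BlockingReader
{
    // Intended to run on a dedicated thread. Returns the total bytes read once
    // the stream ends or is closed from another thread.
    public static long ReadUntilClosed(Stream stream, Action<byte[], int> handleChunk)
    {
        var buffer = new byte[4096];
        long total = 0;
        try
        {
            int read;
            // Read() blocks without using CPU until data arrives; 0 means end of stream.
            while ((read = stream.Read(buffer, 0, buffer.Length)) > 0)
            {
                handleChunk(buffer, read);
                total += read;
            }
        }
        catch (IOException)
        {
            // connection lost or stream closed while blocked in Read()
        }
        catch (ObjectDisposedException)
        {
            // the stream was disposed by another thread
        }
        return total;
    }
}
```

The main thread can then terminate a handler by closing that connection's stream, which wakes the blocked Read() without any polling or terminate flag.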