.NET WebSocket closing for no apparent reason - c#

I am testing how .NET WebSockets behave when the client can't process data from the server fast enough. For this purpose, I wrote an application that sends data continuously over a WebSocket, but includes an artificial delay in the receive loop. As expected, once the TCP window and other buffers fill up, the SendAsync calls start to take a long time to return. But after a few minutes, one of these exceptions is thrown by SendAsync:
System.Net.HttpListenerException: The device does not recognize the command
System.Net.HttpListenerException: The I/O operation has been aborted because of either a thread exit or an application request.
What's weird is that this only happens with certain message sizes and certain timing. When the client is allowed to read all data unrestricted, the connection is stable. Also, when the client is blocked completely and does not read at all, the connection stays open.
Examining the data flow through Wireshark revealed that it is the server that is resetting the TCP connection while the client's TCP window is exhausted.
I tried to follow this answer (.NET WebSockets forcibly closed despite keep-alive and activity on the connection) without success. Tweaking the WebSocket keep alive interval has no effect. Also, I know that the final application needs to be able to handle unexpected disconnections gracefully, but I do not want them to occur if they can be avoided.
Did anybody encounter this? Is there some timeout tweaking that I can do? Running this should produce the error within one and a half to three minutes:
using System;
using System.Net;
using System.Net.WebSockets;
using System.Threading;
using System.Threading.Tasks;

class Program
{
    static void Main(string[] args)
    {
        System.Net.ServicePointManager.MaxServicePointIdleTime = Int32.MaxValue; // has no effect
        HttpListener httpListener = new HttpListener();
        httpListener.Prefixes.Add("http://*/ws/");
        Listen(httpListener);
        Thread.Sleep(500);
        Receive("ws://localhost/ws/");
        Console.WriteLine("running...");
        Console.ReadKey();
    }

    private static async void Listen(HttpListener listener)
    {
        listener.Start();
        while (true)
        {
            HttpListenerContext ctx = await listener.GetContextAsync();
            if (!ctx.Request.IsWebSocketRequest)
            {
                ctx.Response.StatusCode = (int)HttpStatusCode.NotImplemented;
                ctx.Response.Close();
                return;
            }
            Send(ctx);
        }
    }

    private static async void Send(HttpListenerContext ctx)
    {
        TimeSpan keepAliveInterval = TimeSpan.FromSeconds(5); // tweaking has no effect
        HttpListenerWebSocketContext wsCtx = await ctx.AcceptWebSocketAsync(null, keepAliveInterval);
        WebSocket webSocket = wsCtx.WebSocket;
        byte[] buffer = new byte[100];
        while (true)
        {
            await webSocket.SendAsync(new ArraySegment<byte>(buffer), WebSocketMessageType.Binary, true, CancellationToken.None);
        }
    }

    private static async void Receive(string serverAddress)
    {
        ClientWebSocket webSocket = new ClientWebSocket();
        webSocket.Options.KeepAliveInterval = TimeSpan.FromSeconds(5); // tweaking has no effect
        await webSocket.ConnectAsync(new Uri(serverAddress), CancellationToken.None);
        byte[] receiveBuffer = new byte[10000];
        while (true)
        {
            await Task.Delay(10); // simulate a slow client
            var message = await webSocket.ReceiveAsync(new ArraySegment<byte>(receiveBuffer), CancellationToken.None);
            if (message.CloseStatus.HasValue)
                break;
        }
    }
}

I'm not a .NET developer, but from what I've seen of this kind of WebSocket problem, the usual causes are:
1. Very short timeout settings on the WebSocket, on either side.
2. Client- or server-side runtime exceptions (besides logging, check the onError and onClose handlers to see why).
3. Internet or connection failures. WebSockets sometimes go idle, too; you have to implement a heartbeat system to keep them alive, using ping and pong packets.
4. The maximum binary or text message size on the server side. Also configure the buffers so that large messages don't cause failures.
Since, as you said, your error usually happens within a certain time, points 1 and 2 should help you. Again, sorry that I can't provide code, but I have had the same problems in Java, and these are the settings that had to be configured to make WebSockets work. Look up how to set them in your client and server implementations and you should be fine after that (a rough .NET sketch of points 3 and 4 follows below).
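Since the question already uses the .NET ClientWebSocket, here is a minimal sketch of how points 3 and 4 might map onto that API; the interval and buffer sizes are illustrative values, not recommendations:
ClientWebSocket webSocket = new ClientWebSocket();
// point 3: the .NET client sends ping frames at this interval; on the server side the
// interval is the keepAliveInterval argument passed to AcceptWebSocketAsync
webSocket.Options.KeepAliveInterval = TimeSpan.FromSeconds(30);
// point 4: give the client explicit receive/send buffer sizes instead of the defaults
webSocket.Options.SetBuffer(receiveBufferSize: 64 * 1024, sendBufferSize: 16 * 1024);
await webSocket.ConnectAsync(new Uri("ws://localhost/ws/"), CancellationToken.None);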

Apparently, I was hitting an HTTP.SYS low speed connection attack countermeasure, as roughly described in KB 3137046 (https://support.microsoft.com/en-us/help/3137046/http-sys-forcibly-disconnects-http-bindings-for-wcf-self-hosted-servic):
By default, Http.sys considers any speed rate of less than 150 bytes per second as a potential low speed connection attack, and it drops the TCP connection to release the resource.
When HTTP.SYS does that, it writes a trace entry to the log at %windir%\System32\LogFiles\HTTPERR.
Switching it off from code was simple:
httpListener.TimeoutManager.MinSendBytesPerSecond = UInt32.MaxValue;
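For context, this is roughly where that line would go in the repro from the question; a sketch only, with UInt32.MaxValue effectively disabling the low-speed check for this listener:
HttpListener httpListener = new HttpListener();
httpListener.Prefixes.Add("http://*/ws/");
// raise the minimum send rate HTTP.SYS enforces so slow readers are not disconnected
httpListener.TimeoutManager.MinSendBytesPerSecond = UInt32.MaxValue;
Listen(httpListener);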

Related

How can I investigate application/network bottlenecks in my TCP application server and the environment?

I'm trying to write a high-performance TCP server (an LDAP server) using this tutorial by David Fowler as the basis of MyServerListener.cs, which handles incoming connections.
This is a simple .NET 7 console app (with small changes) that I borrowed from David; it just accepts incoming clients, processes the requests, and writes "hello" to the response:
using System;
using System.Buffers;
using System.IO.Pipelines;
using System.Net;
using System.Net.Sockets;
using System.Text;
using System.Threading.Tasks;

internal class Program
{
    const int PORT = 389; // injecting from config
    const int BACKLOG_LENGTH = 200; // max backlog size in windows server

    static async Task Main(string[] args)
    {
        var listenSocket = new Socket(SocketType.Stream, ProtocolType.Tcp);
        listenSocket.Bind(new IPEndPoint(IPAddress.Any, PORT));
        Console.WriteLine("Listening on port " + PORT);
        listenSocket.Listen(BACKLOG_LENGTH);
        while (true)
        {
            var socket = await listenSocket.AcceptAsync();
            _ = ProcessLinesAsync(socket);
        }
    }

    private static async Task ProcessLinesAsync(Socket socket)
    {
#if DEBUG
        Console.WriteLine($"[{socket.RemoteEndPoint}]: connected");
#endif
        // Create a PipeReader/PipeWriter over the network stream
        var stream = new NetworkStream(socket);
        var reader = PipeReader.Create(stream);
        var writer = PipeWriter.Create(stream);
        while (true)
        {
            ReadResult result = await reader.ReadAsync();
            ReadOnlySequence<byte> buffer = result.Buffer;
            while (TryReadLine(ref buffer, out ReadOnlySequence<byte> line))
            {
                // Process the line.
                ProcessLine(line);
                try
                {
                    // writing a sample message to the response
                    var helloBytes = Encoding.ASCII.GetBytes("hello\n");
                    await writer.WriteAsync(helloBytes);
                }
                catch (Exception)
                {
                    throw;
                }
            }
            // Tell the PipeReader how much of the buffer has been consumed.
            reader.AdvanceTo(buffer.Start, buffer.End);
            // Stop reading if there's no more data coming.
            if (result.IsCompleted)
            {
                break;
            }
        }
        // Mark the PipeReader as complete.
        await reader.CompleteAsync();
#if DEBUG
        Console.WriteLine($"[{socket.RemoteEndPoint}]: disconnected");
#endif
    }

    private static bool TryReadLine(ref ReadOnlySequence<byte> buffer, out ReadOnlySequence<byte> line)
    {
        // Look for an EOL in the buffer.
        SequencePosition? position = buffer.PositionOf((byte)'\n');
        if (position == null)
        {
            line = default;
            return false;
        }
        // Skip the line + the \n.
        line = buffer.Slice(0, position.Value);
        buffer = buffer.Slice(buffer.GetPosition(1, position.Value));
        return true;
    }

    private static void ProcessLine(in ReadOnlySequence<byte> buffer)
    {
        foreach (var segment in buffer)
        {
            // Doing some tasks
#if DEBUG
            Console.Write(Encoding.UTF8.GetString(segment.Span));
            Console.WriteLine();
#endif
        }
    }
}
This server listens on a port (389), processes the incoming request, does some work, and then writes a message to the response using PipeReader and PipeWriter.
I'm trying my best to keep memory/heap allocations low (using Span<>, Memory<>, ...) so the code stays fast and optimized. But for now, I'm testing the production environment with the above code to examine the throughput; I mean the server resources, my TCP server application itself, the clients, and the network.
I'm using Apache JMeter to test (load/stress test).
In some scenarios (sending more than 5000 requests/sec) I get "Connection refused" error messages in the JMeter logs, but there is no high pressure on the server's or the clients' (JMeter) resources (CPU/memory).
I tried to optimize the server's configuration and changed some TCP-related parameters (I googled them) like MaxUserPort: 65534, TcpTimedWaitDelay: 30, or different backlog sizes, but no improvement.
So I'm almost sure that there is something related to the network (packet dropping/rejecting or something like that).
I also turned off the firewall on the testing clients and the server, but I don't have any access to the network infrastructure (and I don't know what it consists of), like firewalls, ISA, TMG, etc.
_____________
Update 1:
I already increased our clients' ephemeral port range to the maximum using this command:
netsh int ipv4 set dynamic tcp start=5000 num=65535
and now we have this :
netsh int ipv4 show dynamicport tcp
Start Port : 1024
Number of Ports : 64511
We also checked the JMeter logs for any errors indicating this situation (ephemeral port exhaustion); at first we saw this message:
Non HTTP response code: java.net.BindException,Non HTTP response
message: Address already in use
But now it's gone, and we don't have a large number of TIME_WAIT ports to worry about.
We are also testing our scenario with SO_LINGER:0 and monitoring TIME_WAIT ports in real time (using some tools), and we are sure this isn't our concern right now.
_____________
So my question is: how can I find out why I can't push more traffic (threads/requests per second from the JMeter clients) to the server to test my TCP server application's performance? For now, the server's CPU doesn't go above ~10%.
At this point, is this a network-related problem? How can I be sure about that? For example, can I use a network analyzer (e.g. PRTG Network Monitor) to find dropped TCP packets? Any other tips are welcome.
Most probably the TCP ports are not being recycled fast enough. There is a network parameter which controls how long a connection can stay in the TIME_WAIT state, so you might also want to reduce TcpTimedWaitDelay.
It might also be a good idea to increase the maximum number of TCP connections via the TcpNumConnections parameter.
And last but not least, it might be that JMeter is not capable of sending requests fast enough, so you may need to apply the same tuning on the load-generator side. In addition, make sure to follow JMeter Best Practices and monitor CPU/RAM/network/disk/swap usage on the JMeter side, as you may need to switch to Distributed Testing if one machine cannot generate more than 5k requests per second.

System.Net.WebSockets Connection OPEN before server is listening

I'm running into issues on a setup under load: the connection is accepted and set to OPEN before the server is ready to read from the socket.
Example (not the actual code):
while (true)
{
    var objContext = httpListener.GetContext();
    if (objContext.Request.IsWebSocketRequest)
    {
        var webSocket = (await objContext.AcceptWebSocketAsync(null)).WebSocket;
        while (webSocket.State == WebSocketState.Open)
        {
            var buffer = new byte[BUFFERSIZE];
            var result = await webSocket.ReceiveAsync(new ArraySegment<byte>(buffer), CancellationToken.None);
            ....
        }
    }
}
When adding breakpoints on the client and server, I noticed that after AcceptWebSocketAsync is called, the client sees that the connection is OPEN and ready to be used, and starts sending data.
The issue is that ReceiveAsync hasn't started yet at that point; the reason is that ReceiveAsync is a bit slower to start running the task. The tasks stay in WaitingForActivation a bit longer than on an idle system because the server in question is under high load (lots of other tasks/threads and a low number of CPUs).
Is this normal behavior, and how can it be stopped?
Had someone go over my code; that person found that ReceiveAsync being called later isn't an issue, since the data is waiting in the buffer.
The actual problem was higher up: an event handler was attached too late.

Stream CopyToAsync - Detect client disconnection and set a timeout

I am writing a ConnectionHandler as part of Kestrel. The idea is that when a client connects, the ConnectionHandler opens a socket to another server in the network, gets a continuous stream of data, and forwards it back to the client. In the meantime, the client can also send data to the ConnectionHandler, which it constantly forwards to the other server in the network (over the opened socket).
public override async Task OnConnectedAsync(ConnectionContext connection)
{
    TcpClient serverSocket = new TcpClient(address, port);
    serverSocket.ReceiveTimeout = 10000;
    serverSocket.SendTimeout = 10000;

    NetworkStream dataStream = serverSocket.GetStream();
    dataStream.ReadTimeout = 10000;
    dataStream.WriteTimeout = 10000;

    Stream clientStreamOut = connection.Transport.Output.AsStream();
    Stream clientStreamIn = connection.Transport.Input.AsStream();

    Task dataTask = Task.Run(async () =>
    {
        try
        {
            await dataStream.CopyToAsync(clientStreamOut);
        }
        catch
        {
            await LogsHelper.Log(logStream, LogsHelper.BROKEN_CLIENT_STREAM);
            return;
        }
    }, connection.ConnectionClosed);

    Task clientTask = Task.Run(async () =>
    {
        try
        {
            await clientStreamIn.CopyToAsync(dataStream);
        }
        catch
        {
            await LogsHelper.Log(logStream, LogsHelper.BROKEN_DATA_STREAM);
            return;
        }
    }, connection.ConnectionClosed);

    await Task.WhenAny(dataTask, clientTask);
}
I am encountering 3 issues:
For the socket to the other server, I am using a TcpClient and a NetworkStream. Even though I am setting all of the timeouts to 10 seconds, on both the TcpClient and the NetworkStream, the opened socket waits forever, even if the other server does not send any data for 5 minutes.
Setting a timeout on clientStreamOut and clientStreamIn (e.g. clientStreamIn.ReadTimeout = 10000;) also fails, with an exception saying it's not supported for that particular stream. Is it somehow possible to provide a timeout?
When a client connects to the ConnectionHandler, OnConnectedAsync is triggered. The issue comes when a client disconnects (due to a network drop or for whatever reason). Sometimes the disconnection is detected and the session terminates, while other times it hangs forever, even though the client has actually disconnected. I was expecting CopyToAsync to throw an exception on disconnection, since I assume CopyToAsync is trying to write, but that's not always the case.
connection.ConnectionClosed is a CancellationToken that comes with OnConnectedAsync. I read here https://github.com/dotnet/runtime/issues/23207 that it can be used with CopyToAsync, but I am not sure how. Also, it is worth mentioning that I have zero control over the client code.
I am running the app using Docker:
FROM mcr.microsoft.com/dotnet/core/sdk:3.1
FROM mcr.microsoft.com/dotnet/core/aspnet:3.1
The ReadTimeout and WriteTimeout properties only apply to synchronous reads/writes, not asynchronous ones.
For asynchronous code, you'll need to implement your own read timeouts (write timeouts are generally unnecessary). E.g., use Task.Delay and kill the connection if data isn't received in that time.
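As a rough illustration of that suggestion, here is a minimal sketch of an idle-timeout copy loop that could stand in for the plain CopyToAsync calls; the helper name, buffer size, and timeout value are illustrative, not part of any Kestrel or BCL API:
// Copies source to destination, but gives up if no data arrives within idleTimeout.
// The caller should then abort/dispose the underlying connection, which also completes
// the abandoned read.
static async Task CopyWithIdleTimeoutAsync(Stream source, Stream destination,
    TimeSpan idleTimeout, CancellationToken connectionClosed)
{
    var buffer = new byte[8192];
    while (!connectionClosed.IsCancellationRequested)
    {
        var readTask = source.ReadAsync(buffer, 0, buffer.Length, connectionClosed);
        var completed = await Task.WhenAny(readTask, Task.Delay(idleTimeout, connectionClosed));
        if (completed != readTask)
            break; // nothing received within idleTimeout (or the connection closed) - treat the peer as gone
        int read = await readTask; // IO errors surface here and can be handled by the caller's try/catch
        if (read == 0)
            break; // the remote side closed its end of the stream
        await destination.WriteAsync(buffer, 0, read, connectionClosed);
    }
}
Called as, for example, CopyWithIdleTimeoutAsync(dataStream, clientStreamOut, TimeSpan.FromSeconds(10), connection.ConnectionClosed), this is also one way to put the ConnectionClosed token from the question to use.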

RabbitMQ c# System.IO.EndOfStreamException

I get the following exception when a consumer blocks waiting to receive a message from the SharedQueue:
Unhandled Exception: System.IO.EndOfStreamException: SharedQueue closed
at RabbitMQ.Util.SharedQueue.EnsureIsOpen()
at RabbitMQ.Util.SharedQueue.Dequeue()
at Consumer.Program.Main(String[] args) in c:\Users\pdecker\Documents\Visual
Studio 2012\Projects\RabbitMQTest1\Consumer\Program.cs:line 33
Here is the line of code that is being executed when the exception is thrown:
BasicDeliverEventArgs e = (BasicDeliverEventArgs)consumer.Queue.Dequeue();
So far I have seen the exception occurring when RabbitMQ is inactive. Our application needs the consumer to always be connected and listening for keystrokes. Does anyone know the cause of this problem? Does anyone know how to recover from it?
Thanks in advance.
The consumer is tied to the channel:
var consumer = new QueueingBasicConsumer(channel);
So if the channel has closed, then the consumer will not be able to fetch any additional events once the local Queue has been cleared.
Check for the channel to be open with
channel.IsOpen == true
and that the Queue has available events with
if( consumer.Queue.Count() > 0 )
before calling:
BasicDeliverEventArgs e = (BasicDeliverEventArgs)consumer.Queue.Dequeue();
To be more specific, I would check the following before calling Dequeue()
if (!channel.IsOpen || !connection.IsOpen)
{
    Your_Connection_Channel_Init_Function();
    consumer = new QueueingBasicConsumer(channel); // consumer is tied to the channel
}

if (consumer.Queue.Any())
{
    BasicDeliverEventArgs e = (BasicDeliverEventArgs)consumer.Queue.Dequeue();
}
Don't worry, this is just expected behavior; it means there is no message left in the queue to process. And don't even try checking consumer.Queue.Any() beforehand, it is not going to work.
Just catch the EndOfStreamException:
private void ConsumeMessages(string queueName)
{
    using (IConnection conn = factory.CreateConnection())
    {
        using (IModel channel = conn.CreateModel())
        {
            var consumer = new QueueingBasicConsumer(channel);
            channel.BasicConsume(queueName, false, consumer);
            Trace.WriteLine(string.Format("Waiting for messages from: {0}", queueName));
            while (true)
            {
                BasicDeliverEventArgs ea = null;
                try
                {
                    ea = consumer.Queue.Dequeue();
                }
                catch (EndOfStreamException endOfStreamException)
                {
                    Trace.WriteLine(endOfStreamException);
                    // If you want to stop listening when the queue closes, call break;
                    break;
                }
                if (ea == null) break;
                var body = ea.Body;
                // Consume the message however you want
                Thread.Sleep(300);
                channel.BasicAck(ea.DeliveryTag, false);
            }
        }
    }
}
There is another possible source of trouble: your corporate firewall.
That's because such a firewall can drop your connection to RabbitMQ when it has been idle for a certain amount of time.
Although the RabbitMQ connection has a heartbeat feature to prevent this, the heartbeat is useless if its interval is longer than the firewall's idle timeout.
This is the default heartbeat interval configuration in seconds:
Default: 60 (580 prior to release 3.5.5)
From RabbitMQ:
Detecting Dead TCP Connections with Heartbeats
Introduction
Network can fail in many ways, sometimes pretty subtle (e.g. high ratio packet loss). Disrupted TCP connections take a moderately long time (about 11 minutes with default configuration on Linux, for example) to be detected by the operating system. AMQP 0-9-1 offers a heartbeat feature to ensure that the application layer promptly finds out about disrupted connections (and also completely unresponsive peers).
Heartbeats also defend against certain network equipment which may terminate "idle" TCP connections.
That happened to us and we solved the problem by decreasing the Heartbeat Timeout Interval in the global configuration:
In your rabbitmq.config, find the heartbeat and set it to a value smaller than that of your firewall rule.
You can change the interval in your client, too:
Enabling Heartbeats with the Java Client: To configure the heartbeat timeout in the Java client, set it with ConnectionFactory#setRequestedHeartbeat before creating a connection:
ConnectionFactory cf = new ConnectionFactory();
// set the heartbeat timeout to 60 seconds
cf.setRequestedHeartbeat(60);
Enabling Heartbeats with the .NET Client: To configure the heartbeat timeout in the .NET client, set it with ConnectionFactory.RequestedHeartbeat before creating a connection:
var cf = new ConnectionFactory();
//set the heartbeat timeout to 60 seconds
cf.RequestedHeartbeat = 60;
The answers here that say this is the expected behavior are correct; however, I would argue that having it throw an exception by design like this is bad.
from the documentation: "Callers of Dequeue() will block if no items are available until some other thread calls Enqueue() or the queue is closed. In the latter case this method will throw EndOfStreamException."
So, as GlenH7 said, you have to check that the channel is open (IModel.IsOpen) before calling Dequeue().
However, what if the channel closes while Dequeue() is blocking? I think it's best to call Queue.DequeueNoWait(null) and block the thread yourself by waiting for it to return something that isn't null. So, something like:
while (channel.IsOpen)
{
    var args = consumer.Queue.DequeueNoWait(null);
    if (args == null) continue;
    //...
}
This way, it won't throw that exception.

Why would TcpListener be leaking ESTABLISHED connections?

I have an application that listens for messages from modems in some 30 cars. I've used TcpListener to implement server code that looks like this (error handling elided):
...
listener.Start();
...

void BeginAcceptTcpClient()
{
    if (listener.Server.IsBound)
    {
        listener.BeginAcceptTcpClient(TcpClientAccepted, null);
    }
}

void TcpClientAccepted(IAsyncResult ar)
{
    var buffer = new byte[bufferSize];
    var total = 0;
    BeginAcceptTcpClient();
    using (var client = listener.EndAcceptTcpClient(ar))
    {
        using (var stream = client.GetStream())
        {
            var count = 0;
            while ((count = stream.Read(buffer, total, bufferSize - total)) > 0)
            {
                total += count;
            }
        }
        DoSomething(buffer);
    }
}
I get the messages correctly; my problem lies with disconnections. Every 12 hours the modems are reset and get a new IP address, but the server keeps the old connections active (they are marked as ESTABLISHED in TCPView). Is there any way to set a timeout for the old connections? I thought that closing the TcpClient closed the TCP connection (and that's what happens in my local tests), so what am I doing wrong?
I'm actually a little confused by the code sample - the question suggests that these connections are open for a reasonably long time, but the code is more typical of very short bursts; for a long-running connection, I would expect to see one of the async APIs here, not the sync API.
Sockets that die without trace are very common, especially when the traffic passes through a number of intermediate devices that would all need to spot the shutdown. Wireless networks in particular sometimes try to keep sockets artificially alive, since it is pretty common to briefly lose a wireless connection, and the devices don't want that to kill every connection every time.
As such, it is pretty common to implement some kind of heartbeat on connections, so that you can keep track of who is still really alive.
As an example - I have a websocket server here, which in theory handles both graceful shutdowns (via a particular sequence that indicates closure), and ungraceful socket closure (unexpectedly terminating the connection) - but of the 19k connections I've seen in the last hour or so, 70 have died without hitting either of those. So instead, I track activity against a (slow) heartbeat, and kill them if they fail to respond after too long.
Re the timeout: you can try ReceiveTimeout, but that will only help you if you aren't usually expecting big gaps in traffic.
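As a rough sketch of the ReceiveTimeout route, reusing the synchronous read loop from the question (the 30-second value is only an example): a stalled modem connection then surfaces as an IOException instead of a socket that stays ESTABLISHED forever.
void TcpClientAccepted(IAsyncResult ar)
{
    var buffer = new byte[bufferSize];
    var total = 0;
    BeginAcceptTcpClient();
    using (var client = listener.EndAcceptTcpClient(ar))
    {
        client.ReceiveTimeout = 30000; // blocking reads give up after 30 s without data
        using (var stream = client.GetStream())
        {
            try
            {
                var count = 0;
                while ((count = stream.Read(buffer, total, bufferSize - total)) > 0)
                {
                    total += count;
                }
            }
            catch (IOException)
            {
                // the read timed out or the modem vanished; disposing the client below
                // closes the socket instead of leaving it ESTABLISHED
            }
        }
        DoSomething(buffer);
    }
}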
