Scalability test - Connections dropped after many connections are established

I am programming with sockets (TcpListener and TcpClient actually) in C#. I wrote a server that accepts client connections and streams data to them.
In order to test scalability, I wrote a test harness that creates a certain number of connections (say 1000) in a loop, connects to the server, and writes whatever data is received to the console.
After the server receives about 1300 connections, the clients' connection attempts start failing with a regular "No connection could be made because the target machine actively refused it" exception. If the clients keep trying, some connections get through, but there are still many of them that don't. I even tried putting in delays, e.g. three simultaneous clients each opening one connection per second to the server, but the problem remains.
My guess was that the listen backlog was becoming full, but given the delays I introduced, I now doubt it. How can this behaviour be explained and solved?
Edit: before anyone else jumps on this question and marks it as duplicate without having read it...
I am using asynchronous sockets using the Asynchronous Programming Model. That's the old BeginXXX/EndXXX pattern, not the new async/await pattern. The APM uses the Thread Pool underneath, so this is not a naive one-thread-per-connection model. The connections are dormant most of the time unless I/O occurs. In that case, the .NET Framework automatically allocates threads to handle this.
Edit 2: The gist of this question, for those who thought it was too [insert silly adjective here], is: why does a server drop connections when under a heavy load? The error message I quoted usually occurs when a connection cannot be established (i.e. when you got the ip/port wrong), but this clearly isn't the case.

Related

Winsock IOCP Weird Behaviour On Disconnect Flood

I'm programming a Socket/Client/Server library for C#, since I do a lot of cross-platform programming, and I didn't find mono/dotnet/dotnet core efficient enough at high-performance socket handling.
Since Linux epoll unarguably won the performance and usability "fight", I decided to use an epoll-like interface as the common API, so I'm trying to emulate it in a Windows environment (Windows socket performance is not as important to me as Linux performance, but the interface is). To achieve this I use the Winsock2 and Kernel32 APIs directly with marshaling, and I use IOCP.
Almost everything works fine, with one exception. When I create a TCP server with Winsock and connect to it with more than 10000 connections (from the local machine or from a remote machine via LAN, it does not matter), all connections are accepted without a problem. When all clients send data (flood) to the server, the server receives all packets, again without a problem. But when I disconnect all clients at the same time, the server does not recognize all disconnection events (i.e. 0 byte read/available); usually 1500-8000 clients get stuck. The completion event is not triggered, so I cannot detect the connection loss.
The server does not crash. It continues to accept new connections and everything else works as expected; only the lost connections are not recognized.
I've read that, because overlapped I/O needs a pre-allocated read buffer, IOCP locks these buffers on read and releases the locks on completion. If too many events happen at the same time, it cannot lock all affected buffers because of an OS limit, and this causes IOCP to hang indefinitely.
I've read that the solution to this buffer-lock problem is to use a zero-sized buffer with a null pointer for the buffer itself, so the read event will not lock it, and to use a real buffer only when I read real data.
I've implemented the above workaround and it works, but it does not fix the original problem: after disconnecting many thousands of clients at the same time, a few thousand remain stuck.
Of course I allow for the possibility that my code is wrong, so I made a basic server with dotnet's built-in SocketAsyncEventArgs class (as the official example describes), which basically does the same thing using IOCP, and the results are the same.
Everything works fine, except that when thousands of clients disconnect at the same time, a few thousand disconnection (read on disconnect) events are not recognized.
I know I should perform an I/O operation and check the return value to see whether the socket can still perform I/O, and disconnect it if not. The problem is that in some cases I have nothing to tell the socket, I just receive data. And if I did this periodically it would be almost the same as polling, and would cause high load and wasted CPU work with thousands of connections.
(I have tried numerous ways of closing the clients, from graceful disconnection to proper TCP socket closing, with both Windows and Linux clients; the results are always the same.)
My questions:
Is there any known solution to this problem?
Is there any efficient way to recognize TCP (graceful) connection closing by remote?
Can I somehow set a read-timeout to overlapped socket read?
Any help would be appreciated, thank You!

error message trying to connect a few hundred clients to a single server all at once (socket error 10061)

I've created a server-client application (well, I created the client; my coworker created the server in a different language). When I try to connect 900+ clients (all individual TCP clients) to the server I get the following exception:
No connection could be made because the target computer actively refused it.
Socket error: 10061
WSAECONNREFUSED
Connection refused. No connection could be made because the target computer actively refused it. This usually results from trying to connect to a service that is inactive on the foreign host—that is, one with no server application running.
Eventually, if I wait long enough, they will all connect, because we've built our own reconnect/keep-alive on top of the TCP socket. So if a connection fails, it will simply try again until it succeeds.
Do I get this error because the server is 'overloaded' and can't handle all the requests all at once? (I'm creating 900 clients, each in a separate thread, so they are pretty much all trying to connect simultaneously.)
Is there a way to counteract this? Can we tweak a TCP socket option so it can handle more clients all at once?
It might also be good to note that I'm running the server and client on the same machine; running them on different machines seems to reduce the number of error messages. This is why I think it is some sort of capacity problem: the server can't handle them all that fast.

Do I get this error because the server is 'overloaded' and can't
handle all the requests all at once (I'm creating 900 clients each in
a separate thread so it's pretty much all trying to connect
simultaneously)?
Yes. Creating 900 individual threads is insane; you should use a thread pool fed by a ConcurrentQueue and limit the queue length.
You can increase the backlog using Socket.Listen(backlog), where backlog is the maximum length of the pending connections queue. The maximum value depends on the operating system: you may specify a higher value, but the OS will cap it at its own limit.
Is there a way to counteract this? Can we tweak a TCP socket option
so it can handle more clients all at once?
Not in this case (you already have 900 requests). In general, though, yes: pass a larger backlog to Socket.Listen(backlog) in cases where the current backlog is smaller than it could be. Also, use a connection pool (already managed by the .NET runtime environment) and a ConcurrentQueue for handling threads in order. But remember, you can't get a larger backlog than the limit imposed by the OS. 900 is way too much for simpler machines!
It might also be good to note that I'm running the server and client
on the same machine, running it on different machines seems to reduce
the number of error messages.
It will, because the load is then distributed over two different machines. Still, you shouldn't do what you're doing. Create a ConcurrentQueue to manage the threads (in order).
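
To make the backlog suggestion concrete, here is a minimal sketch, assuming the server is written with TcpListener. The class and method names (BacklogDemo, StartListener, AcceptLoopAsync) are made up for illustration and do not come from the question. TcpListener.Start(int backlog) passes the requested queue length to the underlying listen call, and the operating system may cap it. Accepting in a tight loop and handing each client off immediately helps keep the pending queue short.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class BacklogDemo
{
    // Starts a listener on a loopback port chosen by the OS (port 0),
    // requesting an explicit backlog. The OS may cap the value.
    public static TcpListener StartListener(int backlog)
    {
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start(backlog);
        return listener;
    }

    // Accepts connections back to back so the pending queue drains quickly.
    // onClient should return fast (e.g. queue the client for later work).
    public static async Task AcceptLoopAsync(TcpListener listener, Action<TcpClient> onClient)
    {
        while (true)
        {
            TcpClient client;
            try
            {
                client = await listener.AcceptTcpClientAsync();
            }
            catch (InvalidOperationException) { break; } // listener stopped or disposed
            catch (SocketException) { break; }           // accept aborted
            onClient(client);
        }
    }
}
```

Calling BacklogDemo.StartListener(1000) and then running AcceptLoopAsync on the returned listener is one way to try it. Whether a larger backlog is enough for 900 simultaneous connection attempts depends on the OS limit, so the clients should still retry on failure.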

Accepting many tcp connections at one time using c#

I want to accept about 5000 TCP clients that are all trying to connect at exactly the same time.
When I test the program, many of the clients connect successfully, but many of them can't and fail with the error "No connection could be made because the target machine actively refused it".
I increased the backlog parameter of my socket's Listen method, but it didn't help.
The code I used is the MSDN example at this link. Can anybody help me?

It is OK for the underlying stack to refuse connections while it is busy accepting other connections (nothing is really parallel inside). If you really need to connect that many clients at a time, you can change the client logic a bit: reconnect on failure (as proposed in the comments). Or you can start multiple listeners on different threads and different ports, and have each client pick a port at random.
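
The "reconnect on failure" idea could look roughly like the sketch below. The names (RetryingConnector, ConnectWithRetryAsync) and the parameter defaults are assumptions for illustration only. It retries only when the connection is refused, doubles the delay on each attempt, and adds a small random jitter so that thousands of clients do not all retry at the same instant.

```csharp
using System;
using System.Net.Sockets;
using System.Threading.Tasks;

public static class RetryingConnector
{
    // Tries to connect, retrying on "connection refused" with exponential
    // backoff plus jitter. Rethrows once maxAttempts has been reached.
    public static async Task<TcpClient> ConnectWithRetryAsync(
        string host, int port, int maxAttempts = 5, int baseDelayMs = 100)
    {
        var rng = new Random();
        for (int attempt = 1; ; attempt++)
        {
            var client = new TcpClient();
            try
            {
                await client.ConnectAsync(host, port);
                return client;
            }
            catch (SocketException ex)
                when (ex.SocketErrorCode == SocketError.ConnectionRefused && attempt < maxAttempts)
            {
                client.Close();
                // 100 ms, 200 ms, 400 ms, ... plus up to baseDelayMs of jitter.
                int delayMs = baseDelayMs * (1 << (attempt - 1)) + rng.Next(0, baseDelayMs);
                await Task.Delay(delayMs);
            }
            catch
            {
                client.Close(); // any other failure: clean up and let the caller see it
                throw;
            }
        }
    }
}
```

A client would call something like `var client = await RetryingConnector.ConnectWithRetryAsync("127.0.0.1", 9000);`, where the address and port are placeholders.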

You may want to look into using the SocketAsyncEventArgs object. Then you can have accept sockets to handle the initial connections and once the connection is established the accept socket will hand it off to a worker SocketAsyncEventArgs object.
Check out this project to get started with it. http://www.codeproject.com/Articles/83102/C-SocketAsyncEventArgs-High-Performance-Socket-Cod
What I have found is that this technique works really well, but you will run up against the limitations of the OS and hardware. I tested my TCP server (running on Windows Server 2008 R2), which uses SocketAsyncEventArgs objects, with a few thousand connections, and it worked without any clients being rejected (I had to increase the backlog for this). The problem was that the time between the client and server establishing the connection and the client getting a response grew as the number of simultaneous connection requests grew.

Are TCP Connections resource intensive?

I have a TCP server that gets data from one (and only one) client. When this client sends the data, it makes a connection to my server, sends one (logical) message and then does not send any more on that connection.
It will then make another connection to send the next message.
I have a co-worker who says that this is very bad from a resources point of view. He says that making a connection is resource intensive and takes a while. He says that I need to get this client to make a connection and then just keep using it for as long as we need to communicate (or until there is an error).
One benefit of using separate connections is that I can probably multi-thread them and get more throughput on the line. I mentioned this to my co-worker and he told me that having lots of sockets open will kill the server.
Is this true? Or can I just allow it to make a separate connection for each logical message that needs to be sent. (Note that by logical message I mean an xml file that is of variable length.)

It depends entirely on the number of connections that you are intending to open and close and the rate at which you intend to open them.
Unless you go out of your way to avoid the TIME_WAIT state by aborting the connections rather than closing them gracefully you will accumulate sockets in TIME_WAIT state on either the client or the server. With a single client it doesn't actually matter where these accumulate as the issue will be the same. If the rate at which you use your connections is faster than the rate at which your TIME_WAIT connections close then you will eventually get to a point where you cannot open any new connections because you have no ephemeral ports left as all of them are in use with sockets that are in TIME_WAIT.
I write about this in much more detail here: http://www.serverframework.com/asynchronousevents/2011/01/time-wait-and-its-design-implications-for-protocols-and-scalable-servers.html
In general I would suggest that you keep a single connection and simply reopen it if it gets reset. The logic may appear to be a little more complex but the system will scale far better; you may only have one client now and the rate of connections may be such that you do not expect to suffer from TIME_WAIT issues but these facts may not stay the same for the life of your system...
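
As a rough sketch of "keep a single connection and reopen it if it gets reset", assuming a blocking TcpClient on the sending side: the class name PersistentSender is hypothetical. It connects lazily, and if a write fails it reopens the connection once and retries. Two caveats apply: a successful write does not prove the peer received the data, so a message can still be lost or sent twice around a reset, and with several messages on one stream the receiver needs some framing (for example a length prefix) to tell them apart.

```csharp
using System;
using System.IO;
using System.Net.Sockets;

public sealed class PersistentSender : IDisposable
{
    private readonly string _host;
    private readonly int _port;
    private TcpClient _client;

    public PersistentSender(string host, int port)
    {
        _host = host;
        _port = port;
    }

    // Sends on the existing connection; on failure, reconnects once and retries.
    public void Send(byte[] message)
    {
        try
        {
            EnsureConnected();
            _client.GetStream().Write(message, 0, message.Length);
        }
        catch (Exception ex) when (ex is IOException || ex is SocketException)
        {
            Reset();
            EnsureConnected(); // a second failure propagates to the caller
            _client.GetStream().Write(message, 0, message.Length);
        }
    }

    private void EnsureConnected()
    {
        if (_client == null)
            _client = new TcpClient(_host, _port); // connects synchronously
    }

    private void Reset()
    {
        _client?.Close();
        _client = null;
    }

    public void Dispose() => Reset();
}
```

Because only one socket is ever open from this client, it leaves at most one TIME_WAIT entry per reconnect rather than one per message.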

The initiation sequence of a TCP connection is a very simple 3-way handshake which has very low overhead. No need to maintain a constant connection.
Also, having many TCP connections won't kill your server so fast. Modern hardware and operating systems can handle hundreds of concurrent TCP connections, unless you are afraid of denial-of-service attacks, which are obviously out of the scope of this question.

If your server has only a single client, I can't imagine in practice there'd be any issues with opening a new TCP socket per message. Sounds like your co-worker likes to prematurely optimize.
However, if you're flooding the server with messages, it may become an issue. But still, with a single client, I wouldn't worry about it.
Just make sure you close the socket when you're done with it. No need to be rude to the server :)

In addition to what everyone said, consider UDP. It's perfect for small messages where no response is expected, and on a local network (as opposed to Internet) it's practically reliable.

From the server's perspective, it is not a problem to have a very large number of connections open.
How many socket connections can a web server handle?
From the client's perspective, if measuring shows you need to avoid the time it takes to initiate connections and you want parallelism, you could create a connection pool. Multiple threads can re-use each of the connections and release them back into the pool when they're done. That does raise the complexity level, so once again, make sure you need it. You could also have logic to shrink and grow the pool based on activity; it would be a shame to hold connections open to the server overnight while the app is just sitting there idle.
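
A very small connection pool along those lines might look like this. It is only a sketch: TcpConnectionPool, Rent and Return are invented names, the pool is unbounded, and it does no shrinking or health checking beyond looking at TcpClient.Connected, which only reflects the state as of the last I/O operation.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net.Sockets;

public sealed class TcpConnectionPool : IDisposable
{
    private readonly ConcurrentBag<TcpClient> _idle = new ConcurrentBag<TcpClient>();
    private readonly string _host;
    private readonly int _port;

    public TcpConnectionPool(string host, int port)
    {
        _host = host;
        _port = port;
    }

    // Reuses an idle connection if one exists, otherwise opens a new one.
    public TcpClient Rent()
    {
        return _idle.TryTake(out var client) ? client : new TcpClient(_host, _port);
    }

    // Puts a connection back for reuse; connections known to be dead are closed.
    public void Return(TcpClient client)
    {
        if (client.Connected) _idle.Add(client);
        else client.Close();
    }

    // Closes every idle connection still held by the pool.
    public void Dispose()
    {
        while (_idle.TryTake(out var client)) client.Close();
    }
}
```

Each worker thread would call Rent, send its message, and call Return in a finally block. A production pool would also want an upper bound and an idle timeout, as the answer suggests.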

What is the preferred way to handle this TCP connection in C#?

I have a server application (singleton, simple .NET console application) that talks to a GlobalCache GC-100-12 for the purpose of routing IR commands. Various .NET WinForm clients on the local network connect to my server application and send ASCII commands to it. The server application queues these ASCII commands and then sends them to the GC-100-12 via a TCP connection.
My question is, what is the best way to handle this connection from the server's point of view? I can think of two ways:
1. Create and open a new TcpClient for each individual request. Close the TcpClient when the request is done.
2. Create and open one TcpClient when the server starts and use a keep-alive (if necessary) to keep the connection open for the lifetime of the server object.
I ask this question because I wonder about the overhead of creating a new TcpClient for each request. Is it an expensive operation? Is this a bad practice?
Currently I am doing #1, and printing the results of each transmission to the console. Occasionally some connections time out and the command doesn't get routed, and I was wondering if that was because of the overhead of creating a new TcpClient each time, or if it is due to something else.
I can see #2 being more complicated because if the connection does drop it has to be recreated, and that will require a bit more code to handle that circumstance.
I'm looking for any general advice on this. I don't have a lot of experience working with the TcpClient class.

We had a similar case of opening a telnet session to an old PICK-based system. We found that the cost of opening the TCP connection each time a request came in was fairly expensive, and we decided to implement a no-op routine to keep the connection open. It is more complex, but as long as your end point is not trying to serve a great many clients, pinning a connection sounds like a viable solution.
You could also set it up to have a timeout, if you want to prevent keeping a connection open when there is no traffic. Five minutes of no activity then shut down the connection.
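
The idle-timeout variant could be sketched like this, assuming a single long-lived TcpClient. IdleTimeoutConnection is a made-up name, and the idle limit is a parameter rather than a fixed five minutes. Every send re-arms a one-shot timer; if the timer fires, the connection is closed and the next send reopens it.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;

public sealed class IdleTimeoutConnection : IDisposable
{
    private readonly object _gate = new object();
    private readonly string _host;
    private readonly int _port;
    private readonly TimeSpan _idleLimit;
    private readonly Timer _timer;
    private TcpClient _client;

    public IdleTimeoutConnection(string host, int port, TimeSpan idleLimit)
    {
        _host = host;
        _port = port;
        _idleLimit = idleLimit;
        // Created disarmed; armed after each successful send.
        _timer = new Timer(_ => CloseIfOpen(), null,
                           Timeout.InfiniteTimeSpan, Timeout.InfiniteTimeSpan);
    }

    public bool IsOpen
    {
        get { lock (_gate) return _client != null; }
    }

    // Opens the connection on demand, sends, then restarts the idle countdown.
    public void Send(byte[] data)
    {
        lock (_gate)
        {
            if (_client == null) _client = new TcpClient(_host, _port);
            _client.GetStream().Write(data, 0, data.Length);
            _timer.Change(_idleLimit, Timeout.InfiniteTimeSpan);
        }
    }

    private void CloseIfOpen()
    {
        lock (_gate)
        {
            _client?.Close();
            _client = null;
        }
    }

    public void Dispose()
    {
        _timer.Dispose();
        CloseIfOpen();
    }
}
```

With `new IdleTimeoutConnection(host, port, TimeSpan.FromMinutes(5))` the connection is dropped after five quiet minutes. A timer callback that was already queued can occasionally close the connection right after a send; the only cost is one extra reconnect on the following send.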