C# TCP asynchronous connection actively refused with low CPU utilization

I have a C# TCP Asynchronous socket server.
Testing locally, I can accept 5K connections over 35 seconds. I am running a quad-core, hyper-threaded PC (8 threads) and the utilization is ~20-40% on each core. As the connections are accepted, the server asks each client for a bunch of data and then makes a bunch of database entries.
I moved the server application and SQL database to Amazon AWS, with the database on a Small database instance and the application on a Medium server instance.
The Amazon Medium server (EC2) has 1 virtual core and 2 ECUs. From what I can tell in Performance Monitor, it only has 1 thread.
If I try to connect 1000 clients over 35 seconds to the Medium server, then after ~650 connections I start to receive: Connect failed(b) No connection could be made because the target machine actively refused it.
Looking at performance monitor, I noticed that the CPU utilization is only ~10-15%.
My guess is that the single virtual core never gets fully loaded because it is working through a huge queue of connections, each involving only small operations, so the work never adds up to enough load to drive CPU usage higher.
Does my theory make sense? Am I stuck at a hardware limitation (and need to bump up the server size)?
If not, any ideas on how to get more utilization and support more connections, more quickly?
Does anyone have experience with this?
EDIT:
I upgraded my Amazon EC2 instance to the High-CPU Extra Large instance.
The instance now has 8 cores and 20 ECUs.
I am experiencing the same problem: I still get "No connection could be made because the target machine actively refused it" after ~600 connections.

Check whether the Amazon server is configured to allow that many open TCP connections. On Windows you may have to allow more connections by adding a registry entry for the maximum number of TCP connections.
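The answer doesn't name the exact registry entry; the values most often cited for this symptom are MaxUserPort and TcpTimedWaitDelay under the Tcpip\Parameters key. Treat the names and numbers below as assumptions to verify for your Windows version, not as the poster's exact setting. A minimal C# sketch of applying them:

    using Microsoft.Win32;

    class TcpRegistrySettings
    {
        static void Main()
        {
            const string key = @"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters";

            // MaxUserPort raises the upper bound of the ephemeral port range
            // (mainly relevant on older Windows versions); TcpTimedWaitDelay
            // shortens how long closed connections sit in TIME_WAIT.
            // Requires administrator rights and a reboot to take effect.
            Registry.SetValue(key, "MaxUserPort", 65534, RegistryValueKind.DWord);
            Registry.SetValue(key, "TcpTimedWaitDelay", 30, RegistryValueKind.DWord);
        }
    }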

Your code seems to be utilizing multiple cores while running locally. Did you make any changes to the code when you moved it to the single-core instance? I have seen a similar case in my own application when I was creating dummy users and programmatically making them access the application at the same time; after a certain number, connections started getting dropped or refused.
If your requirement is heavy utilization, you can go with a higher-spec EC2 instance and buy it as a reserved instance (if you plan to use it long term), which can work out very economical. Also, try using CloudFront and a load balancer (if you have two or more instances); they will definitely improve the number of users you can handle. Hope it helps.

I was able to fix this issue. I upgraded my instance to use PIOPS (provisioned IOPS) storage. I also used a better internet connection locally, as my old upload rate was too slow to support sending all the data and connections.
With these two changes I am able to create 5K+ TCP connections locally and connect to the server with no problem.

Related

Pushing a notification to million users with ASP.NET core, websocket limits and SignalR performance

I have a social-style web-app, and I want to add "live" updates to votes under posts.
At first I thought that a client-side loop polling the server with GET requests every few seconds would be an inferior choice, and that all the cool kids use websockets or server-sent events.
But then I found that websockets would be limited to about 65k live connections (even fewer in practice).
Is there a way to push vote updates to a big number of users in realtime?
The app has around 2 million daily users, so I'd expect 200-300k simultaneous socket connections.
The stack is an ASP.NET Core backend hosted on an Ubuntu machine behind an nginx reverse proxy.
At the current state all load is easily handled by a single machine, and I don't really want to add multiple instances just to be able to work with SignalR.
Maybe there is a simple solution that I'm missing?
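No answer is shown for this question here, but as a rough illustration of the push model it asks about, the sketch below uses SignalR groups so that a vote update is sent only to clients currently watching that post. The hub name, method names, and the notifier class are all hypothetical, and this does not by itself settle the connection-count concern.

    using System.Threading.Tasks;
    using Microsoft.AspNetCore.SignalR;

    // Hypothetical hub: clients join a group per post they are viewing,
    // so a vote update is pushed only to users who can actually see it.
    public class VotesHub : Hub
    {
        public Task WatchPost(string postId) =>
            Groups.AddToGroupAsync(Context.ConnectionId, postId);

        public Task UnwatchPost(string postId) =>
            Groups.RemoveFromGroupAsync(Context.ConnectionId, postId);
    }

    // Called from the voting endpoint after the vote is persisted.
    // Requires services.AddSignalR() and MapHub<VotesHub>("/votes") at startup.
    public class VoteNotifier
    {
        private readonly IHubContext<VotesHub> _hub;
        public VoteNotifier(IHubContext<VotesHub> hub) => _hub = hub;

        public Task PublishAsync(string postId, int newScore) =>
            _hub.Clients.Group(postId).SendAsync("voteChanged", postId, newScore);
    }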

30,000+ sticky TCP connections causing delaying issue and unresponsiveness in .NET 4.5 Application

I have 30k+ active devices that connect to my application through sticky TCP sessions. Each connection then creates its own thread and starts receiving and sending data. Each request takes around 20 minutes to send its complete packets over the network, which means the application needs to create approximately 30K threads.
The problem is that my application sometimes becomes unresponsive and stops sending and receiving messages, and this happens randomly. I was reading HERE that the maximum number of threads in each process is 2000. I am wondering whether it is bad design to hold 30k+ sessions open while waiting for new messages to send and receive. What can be done to manage such a load?
My assumptions/questions:
1. Would creating at most 2000 threads cause the application to slow down because of context switching?
2. Would 28,000 devices be left waiting at any point in time because of the worker-thread limit?
System Information:
64 bit
8 CPU cores
Use fewer threads. Use async IO instead. I would strongly recommend using Kestrel to host this using the "pipelines" model, which would deal with threading, buffer management, and most of the TCP nuances for you, so you can just worry about your protocol and application code. It scales far beyond your needs (although for 30k sockets you might want to start using multiple ports, especially if you're behind a load balancer, due to ephemeral port exhaustion). Kestrel would require .NET Core or above, but the same approach is available on .NET Framework via Pipelines.Sockets.Unofficial. One guide to pipelines and Kestrel in this context is in a 4-part series here - other guides are available.
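To make that concrete, here is a rough sketch (not the linked series' code) of a raw-TCP Kestrel ConnectionHandler on top of System.IO.Pipelines. The port number, handler name, and echo-style framing comment are placeholders; real protocol parsing goes where the comment sits.

    using System.Buffers;
    using System.IO.Pipelines;
    using System.Threading.Tasks;
    using Microsoft.AspNetCore.Connections;
    using Microsoft.AspNetCore.Hosting;
    using Microsoft.Extensions.Hosting;

    // Kestrel owns the sockets, threads and buffers; each device connection
    // is serviced by async pipeline reads, not by a dedicated thread.
    public class DeviceConnectionHandler : ConnectionHandler
    {
        public override async Task OnConnectedAsync(ConnectionContext connection)
        {
            PipeReader input = connection.Transport.Input;
            while (true)
            {
                ReadResult result = await input.ReadAsync();
                ReadOnlySequence<byte> buffer = result.Buffer;

                // Parse as many complete frames from 'buffer' as your protocol allows here.

                input.AdvanceTo(buffer.Start, buffer.End);
                if (result.IsCompleted) break;
            }
        }
    }

    public class Program
    {
        public static void Main(string[] args) =>
            Host.CreateDefaultBuilder(args)
                .ConfigureWebHostDefaults(web => web
                    .ConfigureKestrel(o => o.ListenAnyIP(8007,        // placeholder port
                        listen => listen.UseConnectionHandler<DeviceConnectionHandler>()))
                    .Configure(app => { }))
                .Build().Run();
    }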

error message trying to connect a few hundred clients to a single server all at once (socket error 10061)

I've created a server-client application (well, I created the client; my coworker created the server in a different language). When I try to connect 900+ clients (all individual TCP clients) to the server I get the following exception:
No connection could be made because the target computer actively refused it.
Socket error: 10061
WSAECONNREFUSED
Connection refused. No connection could be made because the target computer actively refused it. This usually results from trying to connect to a service that is inactive on the foreign host—that is, one with no server application running.
Eventually, if I wait long enough, they will all connect, because we've built our own reconnect/keep-alive on top of the TCP socket: if a connection fails it simply tries again until it succeeds.
Do I get this error because the server is 'overloaded' and can't handle all the requests at once (I'm creating 900 clients, each in a separate thread, so they're pretty much all trying to connect simultaneously)?
Is there a way to counteract this? Can we tweak a TCP socket option so it can handle more clients at once?
It might also be good to note that I'm running the server and the client on the same machine; running them on different machines seems to reduce the number of error messages. This is why I think it is some sort of capacity problem, where the server can't handle them all that fast.
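For readers hitting the same symptom, a minimal sketch of the retry-until-accepted approach described above might look like this. The host, port, and back-off values are placeholders, not the asker's code; a real client should also cap retries and add jitter so 900 clients don't retry in lockstep.

    using System;
    using System.Net.Sockets;
    using System.Threading.Tasks;

    static class RetryingClient
    {
        public static async Task<TcpClient> ConnectWithRetryAsync(string host, int port)
        {
            var delay = TimeSpan.FromMilliseconds(250);
            while (true)
            {
                var client = new TcpClient();
                try
                {
                    await client.ConnectAsync(host, port);
                    return client;
                }
                catch (SocketException ex) when (ex.SocketErrorCode == SocketError.ConnectionRefused)
                {
                    // 10061 / WSAECONNREFUSED: back off and try again.
                    client.Dispose();
                    await Task.Delay(delay);
                    // Exponential backoff, capped at 5 seconds.
                    delay = TimeSpan.FromMilliseconds(Math.Min(delay.TotalMilliseconds * 2, 5000));
                }
            }
        }
    }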
Do I get this error because the server is 'overloaded' and can't handle all the requests at once (I'm creating 900 clients, each in a separate thread, so they're pretty much all trying to connect simultaneously)?
Yes, that is insane (creating 900 individual threads)! You should create a thread pool fed by a ConcurrentQueue and limit the queue.
You can increase the size of the pending-connection queue using Socket.Listen(backlog), where backlog is the maximum length of the pending connections queue. The backlog parameter is limited to different values depending on the operating system; you may specify a higher value, but it will be clamped to the operating system's limit.
Is there a way to counteract this? Can we tweak a TCP socket option so it can handle more clients at once?
Not in this case (it is already 900 requests); but in general, yes: provide a larger backlog in Socket.Listen(backlog) for cases with a smaller burst. Also, use a connection pool (already managed by the .NET runtime environment) and a ConcurrentQueue to handle the work in order. But remember, you can't get a bigger backlog than the limit imposed by the OS; 900 is way too much for simpler machines!
It might also be good to note that I'm running the server and the client on the same machine; running them on different machines seems to reduce the number of error messages.
It will, because the load is then distributed across two different machines. But you still shouldn't do what you're doing; create a ConcurrentQueue to manage the threads (in order).
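As a concrete illustration of the backlog advice above, here is a minimal async accept loop in C#. The port, backlog value, and echo handling are placeholders, and the Task-returning socket methods assume a reasonably modern .NET runtime; the OS will still clamp the effective backlog.

    using System;
    using System.Net;
    using System.Net.Sockets;
    using System.Threading.Tasks;

    class BacklogServer
    {
        static async Task Main()
        {
            var listener = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
            listener.Bind(new IPEndPoint(IPAddress.Any, 9050));

            // Ask for a deep pending-connection queue; the OS clamps this to its
            // own limit, but a value larger than the default helps absorb bursts
            // of simultaneous connection attempts.
            listener.Listen(512);

            while (true)
            {
                Socket client = await listener.AcceptAsync();
                // Hand each accepted socket off to async handling; never block the accept loop.
                _ = Task.Run(() => HandleClientAsync(client));
            }
        }

        static async Task HandleClientAsync(Socket client)
        {
            using (client)
            {
                var buffer = new byte[4096];
                int read;
                while ((read = await client.ReceiveAsync(new ArraySegment<byte>(buffer), SocketFlags.None)) > 0)
                {
                    // Echo back as a placeholder for real protocol handling.
                    await client.SendAsync(new ArraySegment<byte>(buffer, 0, read), SocketFlags.None);
                }
            }
        }
    }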

Any Downside to Increasing "maxconnection" Setting in system.net?

Our system was having a problem with WCF connections being limited, which was solved by this answer. We added this setting to the client's web.config, and the limit of two concurrent connections went away:
Outside of the obvious impacts (e.g. overloading the server), are there any downsides to setting this limit to a number (possibly much) higher than the default "2"? Any source on the reasoning for having the default so low to begin with?
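The asker's exact web.config snippet isn't reproduced above, so purely as an illustration (not their configuration), this is the programmatic counterpart of the system.net "maxconnection" setting:

    using System.Net;

    static class ConnectionLimitConfig
    {
        public static void Apply()
        {
            // Applies to ServicePoints created afterwards (the per-host HTTP
            // connection cap). The value 16 is an example, not a recommendation;
            // pick a number you have actually load-tested.
            ServicePointManager.DefaultConnectionLimit = 16;
        }
    }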
In general, it's OK to raise the client connection limit, with a few caveats:
If you don't own the server, then be careful, because your client app might be mistaken for a DoS attack, which might lead to your client IP address being blocked by the server. Even if you own the server, this is sometimes a risk; for example, we've had cases where a bug in our app's login page caused multiple requests to be issued when the user held down the Enter key. This caused these users to get blocked from our app by our firewall's DoS protection!
Connections aren't free. They take up RAM, CPU, and other scarce resources. Having 5 or 10 client connections isn't a problem, but when you have hundreds of open client connections then you risk running out of resources on the client.
Proxies or edge servers between client and server may impose their own limits. So you may try to open 1,000 simultaneous connections only to have #5 and later refused by the proxy.
Sometimes, adding more client connections is a workaround for an architectural problem. Consider fixing the architectural problem instead. For example, if you're opening so many connections because each request takes 10 minutes to return results, then you really should look at a more loosely-coupled solution (e.g. post requests to a server queue and come back later to pick up results) because long-lived connections are vulnerable to network disruption, transient client or server outages, etc.
Being able to open many simultaneous connections can make it risky to restart your client or server app, because even if your "normal" load is only X requests/sec, if either client or server has been offline for a while, then the client may try to catch up on pending requests by issuing hundreds or thousands of requests all at once. Many servers have a non-linear response to overload conditions, where an extra 10% of load may double response times, creating a runaway overload condition.
The solution to all these risks is to carefully load-test both client and server with the maximum # of connections you want to support... and don't set your connection limit higher than what you've tested. Don't just crank the connection limit to 1,000,000 just because you can!
To answer the other part of your question, the default limit of 2 connections goes back to a very old version of the HTTP specification which limited clients to 2 connections per domain, so that web browsers wouldn't swamp servers with a lot of simultaneous connections. For more details, see this answer: https://stackoverflow.com/a/39520881/126352

Winsock closesocket() performance (local computer, 127.0.0.1): Why is it so slow on some computers and ultra fast with others?

I'm struggling with an odd performance problem related to closing database connections in my C# code. We are using a database server called Raima on a local computer (only local TCP connection 127.0.0.1 to the local database server on the same computer, not across a LAN) via its native Raima API (not ADO.NET, just a .NET wrapper).
The problem is that on many computers (high-performance dual-core or quad-core machines) the close takes about 120ms-250ms most of the time (e.g. 120ms in a .NET C# web service and 250ms in a .NET C# Windows application), while on other computers it takes only 4ms (steadily). What confuses me is that on some computers it's, for example, 120ms most of the time, but occasionally it may drop to 4ms.
Our database vendor (Raima) has told us that they can't do anything about it because these slowdowns are caused by the Winsock method closesocket().
So my question is: is it true that Winsock closesocket() can cause these kinds of slowdowns on a local connection? Or is it, after all, just the database vendor and their slow database driver/server?
Thanks!
I suggest you test your Raima setup using performance/stress-testing tools; test your network using networking tools; and install another database on the same computer (a free edition of SQL Server, IBM DB2, MySQL, Oracle, or others), connect to it, and measure its performance.
AFAIK it is not the Winsock implementation, it is the computer configuration. If every database is slow on that machine, then you can suspect Winsock.
If everything is slow, perhaps you should consider upgrading the hardware.
If everything except Raima is fast, you can suspect that the Raima database is slow.
If you have connection difficulties, such as constant disconnections, just change your LAN cable.
My name is Jason from Raima. Hopefully you received our email about this problem back in 2011, but for anyone else who has experienced this performance problem, we will explain how this was fixed:
We had said that the performance problem was in closesocket() and that we could not do anything about it. Immediately afterwards, one of our engineers found that we were using the SO_LINGER option, which causes closesocket() to block until all outstanding data has been sent before returning. We removed this option and sent a patch on October 11, 2011. With SO_LINGER turned off, closesocket() will still send outstanding data before closing the socket, but it may return before this operation is complete. The patch improved performance in most cases.
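This is not Raima's actual fix (that lives in their native driver), but for C# readers who hit the same slow-close symptom in their own socket code, the equivalent knob in .NET looks roughly like this:

    using System.Net.Sockets;

    static class LingerExample
    {
        public static void ConfigureClose(Socket socket)
        {
            // With linger enabled, Close()/closesocket() blocks until pending
            // data is sent or the timeout (10s here) expires.
            socket.LingerState = new LingerOption(true, 10);

            // With linger disabled, Close() returns immediately while the stack
            // still flushes queued data in the background, which is the behaviour
            // the Raima patch switched to.
            socket.LingerState = new LingerOption(false, 0);
        }
    }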
