How to investigate and test IIS connection pooling / port reuse - C#

We are developing a middleware solution in the form of an Azure service, and we're experiencing a port exhaustion issue. As I'm not aware of any tools within Azure that could give me more insight here, I want to do some testing on my local IIS Express.
Our middleware solution (a .NET Core Web API) connects to Azure Cosmos DB and a wide range of other REST APIs. We thought our code was stable and solid: we use IHttpClientFactory for the Cosmos DB requests and RestSharp for all other API requests. But there must be a leak somewhere. Some sub-process is creating too many instances of HttpClient or similar, and that causes messages like
"An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full"
Now I'm simulating requests via Postman and running some netstat commands at the same time, but I'm not able to get the insights I'm looking for. Netstat just keeps listing IPs and port numbers; I don't even see the IPs behind the hostnames I'm connecting to.
So I'm a bit lost here.
Is there a way to ask netstat to only show what ports are in use by IISExpress? Or is there an even better way to get some insights on port usage?
What I'm doing now is running this command while executing web requests in a loop, and watching whether the count of TIME_WAIT lines increases. But is this a reliable check?
netstat -ano | select-string TIME_WAIT | measure-object
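For reference, here is a programmatic way to take the same measurement from C# instead of counting netstat output. This is a minimal sketch using System.Net.NetworkInformation; note it sees all connections on the machine, not just those belonging to IIS Express:

using System;
using System.Linq;
using System.Net.NetworkInformation;

class TimeWaitCounter
{
    static void Main()
    {
        // Snapshot of every TCP connection on the machine (the API
        // does not expose which process owns each connection).
        TcpConnectionInformation[] connections =
            IPGlobalProperties.GetIPGlobalProperties().GetActiveTcpConnections();

        int timeWait = connections.Count(c => c.State == TcpState.TimeWait);
        Console.WriteLine($"TIME_WAIT sockets: {timeWait}");
    }
}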

Your question sounds exactly like you are creating a new HttpClient every time you open a connection. This is incorrect behaviour and leads to exactly what you are experiencing.
As per the docs, you create an HttpClient once and reuse it. There are cases where you may use more than one HttpClient (e.g. one per API you are calling), and sometimes you may need to 'refresh the DNS', but generally you create one and keep using it.
https://learn.microsoft.com/en-us/dotnet/api/system.net.http.httpclient?redirectedfrom=MSDN&view=net-6.0
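A minimal sketch of that pattern, assuming .NET Core 2.1+ where SocketsHttpHandler is available; the two-minute lifetime is an illustrative value that also covers the 'refresh the DNS' case:

using System;
using System.Net.Http;

public static class ApiClient
{
    // One shared instance for the whole process; never wrap it in a using block.
    public static readonly HttpClient Instance = new HttpClient(
        new SocketsHttpHandler
        {
            // Recycle pooled connections periodically so DNS changes are picked up.
            PooledConnectionLifetime = TimeSpan.FromMinutes(2)
        });
}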

Related

Azure and local number of TCP connections are very different

I'm now trying to read data from a lot of Azure blobs in parallel using an Azure Function, and I'm failing to do so because my service plan does not allow more than ~4000 TCP connections (I get an error about this in the portal). However, when I try to run it locally, all of the following:
netstat with all possible flags
Wireshark
TCPView
network inspector in Windows task manager
just show a couple dozen items. Is there a tool, or maybe a code snippet, that would allow me to emulate locally the situation that I have once my app is deployed?
Even better would be knowing whether it is possible to somehow limit the number of TCP connections that my Azure Function tries to open (using the .NET Azure SDK, the Azure portal, some settings.json file, or whatever)
Edit 1: I've rewritten the whole thing to be sequential and also split the blob reads into chunks of 100 items. This seemed to somewhat help the number of TCP connections (it's about 500 at peak now, still a lot, but at least fitting the app service plan; the app, of course, became slow as hell as a result). But it still tries to allocate ~4000 "socket handles" and fails, and I still can't find a way to see the same number of allocated socket handles locally: the Handles column in the Details tab of Windows Task Manager shows roughly the same number of handles during the whole process execution.
To answer the question itself: I wasn't able to find a way to see locally the TCP-related metrics that I get when actually running my functions in Azure. For now it feels like some important development tools and/or docs are missing. The "serverless" experience turned out to be the deepest dive into Windows system programming I've ever had as a .NET developer.
The solution for the problem itself was the following:
I rewrote the whole thing to be sequential, which got it down to establishing about a hundred simultaneous connections. Then I did a binary search on MaxDegreeOfParallelism until I found a value suitable for my plan.
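A minimal sketch of that approach; ReadBlob is a hypothetical helper standing in for whatever per-blob work is being done, and the degree of parallelism is the knob to binary-search on:

using System.Collections.Generic;
using System.Threading.Tasks;

static class BlobReader
{
    public static void ReadAllBlobs(IEnumerable<string> blobNames)
    {
        var options = new ParallelOptions
        {
            // Caps concurrent reads, and therefore open TCP connections;
            // tune this until it fits the service plan's connection limit.
            MaxDegreeOfParallelism = 64
        };

        Parallel.ForEach(blobNames, options, name => ReadBlob(name));
    }

    // Hypothetical helper: downloads and processes a single blob.
    static void ReadBlob(string name) { /* ... */ }
}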
You may be bumping up against the HTTP standard implementation within HttpClient, which restricts the number of open connections to two per server by default (the HTTP/1.1 specification recommends this limit). You can override that default using the DefaultConnectionLimit property of ServicePointManager. Microsoft has an article on it here.
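A sketch of raising that limit at startup; this applies to the classic HttpWebRequest/ServicePoint stack on .NET Framework (on .NET Core, the equivalent knob is MaxConnectionsPerServer on the handler):

using System.Net;

static class ConnectionConfig
{
    public static void Apply()
    {
        // Allow up to 100 concurrent connections per remote host instead
        // of the default of 2. Set once, before any requests are issued.
        ServicePointManager.DefaultConnectionLimit = 100;
    }
}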

How does WCF's ClientBase<TChannel> handle TCP connections when Disposed?

A while ago I came across an interesting article explaining that putting HttpClient in a using block will dispose of the object when the code has executed, but will not close the TCP socket; the TCP connection eventually goes to TIME_WAIT and stays in that state, listening for further activity, for 4 minutes (the default).
So basically using this multiple times:
using (var client = new HttpClient())
{
    // do something with the http client
}
results in many open TCP connections sitting in TIME_WAIT.
You can read the whole thing here:
You're using HttpClient wrong and it is destabilizing your software
So I was wondering what would happen if I did the same with the ClientBase<TChannel>-derived service class created by Visual Studio when you right-click a project and select Add Service Reference, and implemented this:
// SomeServiceOutThere inherits from ClientBase
using (var service = new SomeServiceOutThere())
{
    var serviceRequestParameter = txtInputBox.Text;
    var result = service.BaddaBingMethod(serviceRequestParameter);
    // do some magic (see Fred Brooks quote)
}
However, I haven't been able to recreate exactly the same behavior, and I wonder why.
I created a small desktop app and added a reference to an IIS-hosted WCF service.
Next I added a button that calls the code via the using block shown above.
After hitting the service the first time, I ran netstat for the IP, and this is the result:
So far so good. I clicked the button again, and sure enough, a new connection was established while the first one went into the TIME_WAIT state:
However, after this, every time I hit the service it would reuse the ESTABLISHED connection rather than opening more connections like in the HttpClient demo (even when passing different parameters to the service, but keeping the app running).
It seems that WCF is smart enough to realize there is already an established connection to the server, and uses that.
The interesting part is that, when I repeated the process above, but stopped and restarted the application between each call to the service, I did get the same behavior as with HttpClient:
There are some other potential problems with ClientBase (e.g. see here), and I know that temporarily open sockets may not be an issue at all if traffic to the service is relatively low or the server is set up for a large number of maximum connections. But I would still like to be able to reliably test whether this could be a problem, and under what conditions (e.g. a running Windows service hitting the WCF service vs. a desktop application).
Any thoughts?
WCF does not use HttpClient internally. WCF probably uses HttpWebRequest, because that API was available at the time, and it's likely a bit faster since HttpClient is a wrapper around it.
WCF is meant for high performance use cases so they made sure that HTTP connections are reused. Not reusing connections by default is, in my mind, unacceptable. This is either a bug or a design problem with HttpClient.
The 4.6.2 desktop .NET Framework contains this line in HttpClientHandler.Dispose:
ServicePointManager.CloseConnectionGroups(this.connectionGroupName);
Since this code is not in CoreCLR, there is no documentation for it, and I don't know why it was added. It even has a bug, because of this.connectionGroupName = RuntimeHelpers.GetHashCode(this).ToString(NumberFormatInfo.InvariantInfo); in the ctor: two connectionGroupNames can clash. This is a terrible way of obtaining random numbers that are supposed to be unique.
If you restart the process, there is no way to reuse existing connections. That's why you are seeing the old connections in a TIME_WAIT state. The two processes are unrelated; as far as the code in them (and the OS) knows, they are not cooperating in any way. It's also hard to carry a TCP connection across a process restart (though possible), and no app that I know of does this.
Are you starting processes so often that this might become a problem? Unlikely, but if so, you can apply one of the general workarounds, such as reducing the TIME_WAIT duration (TcpTimedWaitDelay in the Windows registry).
Replicating this is easy: just start 100k test processes in a loop.

WCF or Custom Socket Architecture

I'm writing a client/server architecture where there are going to be possibly hundreds of clients over multiple virtual machines, mostly on the intranet but some in other locations.
Each client will be gathering data constantly and sending a message to a server every second or so. Each message will probably be about 128 characters or so in length.
My question is: for this architecture, where I am writing both client and server in .NET, should I go with WCF or with the socket code I've written previously? I need scalability (which the socket code was designed with in mind), reliability, and the ability to handle that many messages.
I would not make a final decision without performing a proof of concept. Create a very simple service, host it, and use a stress test to get real performance results. Then validate the results against your requirements. You have mentioned the number of messages, but you didn't mention the expected response time. There is a similar question currently being discussed on the MSDN forum, which complains about the slow response time of WCF compared to sockets.
Other requirements are not directly mentioned in your post, so I will make some assumptions for best performance:
Use netTcpBinding - best performance, binary encoding, requires .NET on both server and clients. I guess you are going to use net.tcp anyway, because your other choice was direct socket programming.
Don't use security if you don't have to - it reduces performance. This is probably not an option for clients outside your intranet.
Reuse proxies on clients if possible. Opening a TCP connection is expensive; if you reuse the same proxy, you will have a single connection per proxy. This will affect the instancing of your services - by default, a single service instance will handle all requests from a single proxy.
Set service throttling so that your service host is ready for many clients (see the sketch after this list).
You should also make some decisions about load balancing. Load balancing for WCF net.tcp connections requires sticky sessions (session affinity), so that after opening the channel the client always calls the service on the same server (because the instance of that service was created on a single server only).
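A minimal self-hosting sketch covering the binding and throttling suggestions; the echo contract is a hypothetical example (along the lines the next answer suggests), and the limits are illustrative values to tune under load:

using System;
using System.ServiceModel;
using System.ServiceModel.Description;

[ServiceContract]
public interface IEcho
{
    [OperationContract]
    string Echo(string text);
}

public class EchoService : IEcho
{
    public string Echo(string text) => text;
}

public static class EchoHost
{
    public static void Run()
    {
        var host = new ServiceHost(typeof(EchoService));

        // netTcpBinding without security, per the recommendations above.
        host.AddServiceEndpoint(typeof(IEcho),
            new NetTcpBinding(SecurityMode.None),
            "net.tcp://localhost:9000/echo");

        // Raise the conservative throttling defaults so that many clients
        // holding long-lived proxies can be served concurrently.
        host.Description.Behaviors.Add(new ServiceThrottlingBehavior
        {
            MaxConcurrentCalls = 512,
            MaxConcurrentSessions = 2048,
            MaxConcurrentInstances = 2048
        });

        host.Open();
        Console.WriteLine("Host running; press Enter to stop.");
        Console.ReadLine();
        host.Close();
    }
}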
100 requests per second does not sound like much for a WCF service, especially with that small a payload. It should be quite quick to set up a simple test: a WCF service with one echo method that just returns the input, and a client with a bunch of threads calling it in a loop.
If you already have a working socket implementation you might keep it, but otherwise you can pick WCF and spend your precious development time elsewhere.
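A sketch of the client side of such a test, reusing the hypothetical IEcho contract and net.tcp endpoint from the hosting sketch above:

using System;
using System.Diagnostics;
using System.ServiceModel;
using System.Threading.Tasks;

public static class EchoLoadTest
{
    public static void Run()
    {
        var factory = new ChannelFactory<IEcho>(
            new NetTcpBinding(SecurityMode.None),
            new EndpointAddress("net.tcp://localhost:9000/echo"));

        // A handful of threads hammering the echo method in a loop.
        Parallel.For(0, 8, _ =>
        {
            IEcho proxy = factory.CreateChannel(); // one reused proxy per thread
            var timer = Stopwatch.StartNew();
            for (int i = 0; i < 1000; i++)
            {
                proxy.Echo("payload of roughly 128 characters goes here");
            }
            Console.WriteLine($"1000 calls in {timer.ElapsedMilliseconds} ms");
            ((IClientChannel)proxy).Close();
        });
    }
}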
From my experience with WCF, I can tell you that its performance under high load is very good. In particular, you can choose between several bindings to meet your requirements in different scenarios (e.g. httpBinding for outside communication, netPeerTcpBinding on a local network).

Azure: Will it work for my App?

I'm creating an application that I want to put into the cloud. This application has one main function.
It hosts socket CLIENT sessions on behalf of other users (think of Beejive IM for the iPhone, where it hosts IM sessions for clients to maintain state on those IM networks, allowing the client to connect/disconnect at will, without breaking the IM network connection).
Now, the way I've planned it, one 'worker instance' can likely handle only a finite number of client sessions (let's say 50,000 for argument's sake). Those sessions will be very long-lived worker tasks.
The issue I'm trying to get my head around is that I will sometimes need to perform tasks on specific client sessions (e.g. if I need to disconnect a client session). With Azure, would I be able to queue up a smaller task that only the instance hosting that specific client session would be able to dequeue?
Right now I'm contemplating GoGrid as my provider, and I solve this issue by using Apache's ActiveMQ messaging software. My web app enqueues 'disconnect' tasks that are assigned to a specific instance ID. Each client session is therefore assigned to a specific instance ID, and each instance dequeues only the 'disconnect' tasks assigned to it.
I'm wondering if it's feasible to do something similar on Azure, and how I would generally do it. I like the idea of not having to set up many different VMs to scale, but instead just deploying a single package. Also, it would be nice to make use of Azure's queues instead of integrating a third-party product such as Apache ActiveMQ, or even MSMQ.
I'd be very concerned about building a production application on Azure until the feature set, pricing, and licensing terms are finalized. For starters, you can't even do a cost comparison between it and e.g. GoGrid or EC2 or Mosso, so I don't see how it could possibly end up a front-runner. Also, we know that all of these systems will have glitches as they mature. Amazon's services are in much wider use than any of the others and have been publicly available for much longer. IMHO, choosing Azure now is a recipe for pain while they stabilize.
Have you considered Amazon's Simple Queue Service for queueing?
I think you can absolutely use Windows Azure for this. My recommendation would be to create a queue for each session you're tracking, then enqueue the disconnect message (for example) on the queue for that session. The worker instance that's handling that connection should be the only one polling that queue, so it should be the one that performs the task on that connection.
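A sketch of that queue-per-session idea using the Azure.Storage.Queues SDK (which postdates this answer; code of the era would have used the CloudQueue API). The queue naming scheme, connection string, and sessionId are illustrative:

using System.Threading.Tasks;
using Azure.Storage.Queues;

public static class SessionControl
{
    // Control side: enqueue a command on the session's own queue.
    public static async Task RequestDisconnectAsync(string connectionString, string sessionId)
    {
        var queue = new QueueClient(connectionString, $"session-{sessionId}");
        await queue.CreateIfNotExistsAsync();
        await queue.SendMessageAsync("disconnect");
    }

    // Worker side: only the instance that owns the session polls its queue.
    public static async Task PollAsync(QueueClient queue)
    {
        var response = await queue.ReceiveMessageAsync();
        if (response.Value?.MessageText == "disconnect")
        {
            // Tear down the client session here, then remove the message.
            await queue.DeleteMessageAsync(response.Value.MessageId, response.Value.PopReceipt);
        }
    }
}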
Regarding the application hosting socket connections for clients to connect to, I'd double-check what's allowed, as I think only HTTP and HTTPS connections can be made with Azure.

web service slowdown

I have a web service slowdown.
My (web) service is written in gSOAP and managed C++. It's not IIS/Apache hosted, but it speaks XML.
My client is in .NET
The service computation time is light (<0.1 s to prepare a reply). I expect the service to be smooth and fast, with good availability.
I have about 100 clients; a 1 s response time is mandatory.
Clients make about 1 request per minute.
Clients check for web service presence with a TCP open-port test.
So, to avoid possible congestion, I turned gSOAP's KeepAlive off.
Up to there, everything ran fine: I barely saw connections in TCPView (Sysinternals).
A new synchronisation program now calls the service in a loop.
It's a higher load, but everything is processed in under 30 seconds.
With Sysinternals TCPView, I see that about a thousand connections are in TIME_WAIT.
They slow down the service, and it now takes seconds for it to reply.
Could it be that I need to reset the SoapHttpClientProtocol connection?
Has anyone else seen TIME_WAIT ghosts when calling a web service in a loop?
Sounds like you aren't closing the connection after the call and are opening a new connection on each request. Either close the connection or reuse the open connections.
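A sketch of the reuse approach on the .NET client side. In practice this override would go on the wsdl.exe-generated proxy (which derives from SoapHttpClientProtocol), e.g. via a subclass like this; note the server must also permit keep-alive, which the question disabled on the gSOAP side:

using System;
using System.Net;
using System.Web.Services.Protocols;

public class ReusableProxy : SoapHttpClientProtocol
{
    protected override WebRequest GetWebRequest(Uri uri)
    {
        var request = (HttpWebRequest)base.GetWebRequest(uri);
        // Keep the TCP connection open between calls instead of closing
        // it (and leaving a TIME_WAIT socket) after every request.
        request.KeepAlive = true;
        return request;
    }
}

Then create one proxy instance and reuse it for every call in the loop, rather than constructing a new one per request.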
Be very careful with the implementations mentioned above. There are serious problems with them.
The implementation described in yakkowarner.blogspot.com/2008/11/calling-web-service-in-loop.html (COMMENT ABOVE):
PROBLEM: All your work will be wiped out the next time you regenerate the web service proxy using wsdl.exe, and you are going to forget what you did; not to mention that this fix is rather hacky, relying on a message string to take action.
The implementation described in forums.asp.net/t/1003135.aspx (COMMENT ABOVE):
PROBLEM: You are selecting a local port between 5000 and 65535, so on the surface this looks like a good idea. But if you think about it, there is no way (at least none I can think of) to reserve ports to be used later. How can you guarantee that the next port on your list is not currently in use? You are sequentially picking ports, and if some other application picks a port that is next on your list, you are hosed. Or what if some other application running on your client machine starts using random ports for its connections? You would be hosed at UNPREDICTABLE points in time, and would RANDOMLY get an error like "remote host can't be reached or is unavailable" - even harder to troubleshoot.
Although I can't give you the right solution to this problem, some things you can do are:
Try to minimize the number of web service requests, or spread them out over a longer period of time
For your type of app, maybe web services weren't the correct architecture - for something with a 1 ms response time you should be using a messaging system, not a web service
Raise your OS's limit on the number of connections (ephemeral ports) to 65K using the registry (MaxUserPort on Windows)
Lower the time your OS keeps sockets in TIME_WAIT (this brings its own list of problems)
