Azure and local number of TCP connections are very different - C#

I'm trying to read data from a lot of Azure blobs in parallel using an Azure Function, and I'm failing because my service plan does not allow more than ~4000 TCP connections (I get an error about this in the portal). However, when I run the same code locally, all of the following:
netstat with all possible flags
Wireshark
TCPView
network inspector in Windows task manager
just show a couple dozen items. Is there a tool, or maybe a code snippet, that would let me reproduce locally the situation I hit once my app is deployed?
Even better would be knowing whether it is possible to somehow limit the number of TCP connections my Azure Function tries to open (using the .NET Azure SDK, the Azure portal, some settings.json file, or whatever).
Edit 1: I've rewritten the whole thing to be sequential and split the blob reads into chunks of 100 items. This somewhat helped the number of TCP connections (it peaks at about 500 now, which is still a lot, but at least fits the App Service plan; the app, of course, became painfully slow as a result). However, it still tries to allocate ~4000 "socket handles" and fails. I still can't find a way to see the same number of socket handles allocated locally: the Handles column in the Details tab of Windows Task Manager shows roughly the same count throughout the whole process execution.

To answer the question itself: I wasn't able to find a way to see locally the TCP-related metrics that I get when actually running my functions in Azure. For now it feels like some important development tools and/or documentation are missing. The "serverless" experience turned out to be the deepest dive into Windows system programming I've ever had as a .NET developer.
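The closest I got to a local approximation was polling System.Net.NetworkInformation from a side process. A rough sketch (machine-wide counts with no per-process breakdown, and not the same metric the portal reports):

```csharp
using System;
using System.Linq;
using System.Net.NetworkInformation;

class TcpConnectionCounter
{
    static void Main()
    {
        IPGlobalProperties props = IPGlobalProperties.GetIPGlobalProperties();

        // All TCP connections on the machine (no per-process breakdown).
        TcpConnectionInformation[] connections = props.GetActiveTcpConnections();

        Console.WriteLine($"Total TCP connections: {connections.Length}");

        // Break the total down by state (Established, TimeWait, ...).
        foreach (var group in connections.GroupBy(c => c.State))
            Console.WriteLine($"  {group.Key}: {group.Count()}");
    }
}
```

Run this in a loop while exercising the app to watch the counts move; it still won't show the "socket handles" metric Azure enforces.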
The solution for the problem itself was the following:
I've rewritten the whole thing to be sequential, which got it down to about a hundred simultaneous connections. Then I used binary search on MaxDegreeOfParallelism until I found a value that suits my plan.
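In code, capping the parallelism looked roughly like this sketch. ReadBlobAsync and blobNames stand in for the actual download logic, and 64 is just the kind of value the binary search converges on; note Parallel.ForEachAsync needs .NET 6+, and on older runtimes a SemaphoreSlim achieves the same cap:

```csharp
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

class ThrottledReader
{
    // Placeholder for the real blob download.
    static Task ReadBlobAsync(string name, CancellationToken ct) => Task.Delay(10, ct);

    static async Task Main()
    {
        var blobNames = new List<string> { "blob-001", "blob-002", "blob-003" };

        // Cap the number of simultaneous reads; tune this value until the
        // TCP connection count fits the App Service plan.
        var options = new ParallelOptions { MaxDegreeOfParallelism = 64 };

        await Parallel.ForEachAsync(blobNames, options, async (name, ct) =>
        {
            await ReadBlobAsync(name, ct);
        });
    }
}
```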

You may be bumping up against the default connection limit in HttpClient, which restricts the number of open connections per server to 2 by default on .NET Framework, following the HTTP/1.1 specification's recommendation of two connections per server. You can override that default using the DefaultConnectionLimit property of the ServicePointManager. Microsoft has an article on it here.
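On .NET Framework the override is a one-liner, set once at startup before any requests go out (50 is an arbitrary example value, not a recommendation):

```csharp
using System.Net;

class Startup
{
    static void Main()
    {
        // Raise the per-host connection limit from the default of 2.
        // Must run before the first request to a given host.
        ServicePointManager.DefaultConnectionLimit = 50;
    }
}
```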

Related

How to investigate and test IIS connection pooling / port reusing

We are developing a middleware solution in the form of an Azure service, and we're experiencing a port exhaustion issue. As I'm not aware of any tools within Azure that could give me more insight here, I want to do some testing on my local IIS Express.
Our middleware solution (a .NET Core Web API) connects to Azure Cosmos DB and a wide range of other REST APIs. We thought our code was stable and solid: we use IHttpClientFactory for the Cosmos DB requests and RestSharp for all other API requests. But there must be a leak. Some subprocess is creating too many instances of HttpClient (or similar), and that causes messages like
"An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full"
Now I'm simulating requests via Postman and running some netstat commands at the same time, but I'm not able to get the insights I'm looking for. Netstat just keeps listing IPs and port numbers; I don't even see the IPs behind the hostnames I'm connecting to.
So I'm a bit lost here.
Is there a way to ask netstat to only show what ports are in use by IISExpress? Or is there an even better way to get some insights on port usage?
What I'm doing now is running this command while executing web requests in a loop, and watching whether the count of TIME_WAIT lines increases. But is this a reliable check?
netstat -ano | select-string TIME_WAIT | measure-object
Your question sounds exactly like you are creating a new HttpClient every time you open a connection. This is incorrect behaviour and leads to exactly what you are experiencing.
As per the docs, you create an HttpClient once and reuse it. There are cases where you may use more than one HttpClient (e.g. one per API you are calling), and sometimes you may need to 'refresh the DNS', but generally you create one and keep using it.
https://learn.microsoft.com/en-us/dotnet/api/system.net.http.httpclient?redirectedfrom=MSDN&view=net-6.0
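The pattern from the docs is roughly the following (the class and member names here are illustrative):

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;

static class ApiClient
{
    // One instance for the lifetime of the application.
    // HttpClient is thread-safe for concurrent requests.
    private static readonly HttpClient Client = new HttpClient
    {
        Timeout = TimeSpan.FromSeconds(30)
    };

    public static Task<string> GetStringAsync(string url) => Client.GetStringAsync(url);
}
```

Disposing an HttpClient per request leaves its sockets in TIME_WAIT, which is exactly the "insufficient buffer space" error described above.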

WCF Service called from SharePoint workflows - Underlying connection was closed errors

I have developed a WCF web service that is called from several SharePoint Online workflows. At certain points there could be around 4 users starting up to 10 workflows within a very short time frame, and one workflow could make as many as 3 requests to the web service. Needless to say, at certain points the WCF service becomes overloaded. When SharePoint workflows make HTTP web service calls and the service is unavailable, the workflow runs into an error and attempts to restart after a short period of time, which only makes things worse.
These are some of the exceptions the web service logged today during an approximately 40-minute period of "overloading":
Unable to read data from the transport connection: An existing
connection was forcibly closed by the remote host.
The underlying connection was closed: An unexpected error occurred on
a receive.
The underlying connection was closed: A connection that was expected
to be kept alive was closed by the server.
I have looked into ways to keep the WCF web service from malfunctioning when several requests are made. Besides the obvious step of reducing the number of calls to the web service (which is not always an option), I came across the terms WCF Concurrency Modes and Throttling Limits.
Given the scenario described above, could anyone guide me into the right direction as to which Concurrency Mode and Throttling limits would be most ideal? Presently, my WCF service has default configuration.
Concurrency Mode can be:
Single,
Multiple, or
Reentrant
Throttling Limit options are shown below:
<serviceThrottling maxConcurrentCalls="Integer"
maxConcurrentInstances="Integer"
maxConcurrentSessions="Integer" />
I am still quite new to this area of programming and am finding it a tad complicated, so any help would be greatly appreciated!
Update: The SharePoint system is highly customised and it covers a Business process that is quite complicated. The Web Service methods are varied and it would take me a long time to explain what every method does but I will mention some examples. The web service is used for operations that either cannot be done (easily or at all) using out of the box SharePoint designer actions. For example: moving documents and copying metadata from one folder to another (in the same or different lists), syncing information between lists/libraries, calculating values based on metadata of several documents living within a given folder, scheduling data into an external database to be used with other components such as a console application running as a scheduled task, etc.
The web service calls take an average of 2 minutes to execute and return a value. The fastest methods take around 30 seconds, and the slowest around 4 minutes. Both the slow and fast methods are frequently utilised.
Your problem could be caused by a number of things, and you need to gather more information before anyone can help you effectively.
With that said, the best I can do here is give you some pointers on how to gather such information, such as:
Turn on WCF tracing and try to understand when the error occurs on the SharePoint side. Does the error occur while the web service is processing the request, afterwards, or does the service never receive the request in the first place?
If the tracing doesn't give you many answers, write code in your web services to trace specific messages, so you get more information on what the web service is doing and what it is receiving from and returning to SharePoint; or use your preferred logging library.
In specific cases, the Event Viewer may have information on what is happening. Check for any messages that show up around the time the error occurs on the client.
Finally, relaxing your serviceThrottling settings might mitigate some of your issues, but it won't solve them.
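For example, a relaxed serviceThrottling section might look like this (the values are purely illustrative; the .NET 4 defaults already scale with processor count, so measure before and after changing them):

```xml
<serviceBehaviors>
  <behavior>
    <!-- Illustrative values only; tune against your own load tests. -->
    <serviceThrottling maxConcurrentCalls="64"
                       maxConcurrentInstances="164"
                       maxConcurrentSessions="100" />
  </behavior>
</serviceBehaviors>
```

With calls taking 2-4 minutes each, maxConcurrentCalls is the limit you will hit first.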
If you have a lot of I/O operations in your web services (access to databases, the filesystem, or other web services), you might improve their performance by using asynchronous I/O with the Task Parallel Library (TPL).
If you are returning a lot of data from your web service (such as a big object, an object with cyclic references, or a big file), this might also be why the server is forcing the connections to be closed.
Hope this helps you in solving your issue.

Android app disconnects from WCF web service randomly

We have designed an Android and iOS application for a client using their WCF created backend.
We have a method that allows users to check in for their appointments if they are within a certain geolocation.
Both apps are able to find the server and grab data; however, the Android application only works about 50% of the time (the iOS version works 100% of the time). I have tried on Wi-Fi and cellular and get the same results.
The biggest issue is that I can't even determine whether the issue is with the client or the server, or how I should handle it.
I have read that it could be due to an unclosed HttpURLConnection, but that's my only real lead at the moment.
Please help!!!
Thanks
Since it works half of the time on Android and always on iOS, in my opinion the problem can only be a network problem. It just happens in some network environments, on some devices, or on devices with a certain Android/Java version. I think you will not be able to do anything about it.
When I started to have those problems, I kept bumping into a simple fact from the fallacies of network programming and the CAP theorem: "the network is reliable." The fact is that the network is NOT reliable, and you should account for that in your solutions.
http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing
http://en.wikipedia.org/wiki/CAP_theorem
This problem can also turn into a "request succeeded, but no response arrived" problem. Again, it is almost the same network failure as above; what you can do is handle the possible timeout exceptions and deal with them appropriately on both the server and the client. So for operations like submitting a payment, don't use a plain WCF method call: use an enterprise service bus that stores the messages in queues and processes them, plus an additional call that checks the results. See the CQRS pattern, for example.
http://msdn.microsoft.com/en-us/library/jj591573.aspx
Not very helpful, I know, but it is a possible answer and my thinking about it, because it annoyed me too. It may be that you will not be able to do anything about it (except end users handling their network issues themselves).
I would also log the request at the very beginning of the WCF service method call to rule out any internal WCF behaviour (that is, to rule out a problem in your own code). Full logging on both server and client is the way to go if you are still convinced the problem is in the code. Also check whether this happens on all Android devices (LG, etc.) or only on certain ones with certain Java versions.

EWS Managed API: Only one usage of each socket address (protocol/network address/port) is normally permitted

We're developing software that allows our custom scheduling application (master role) to synchronize with Microsoft Exchange Server 2010/2007 (slave role). Our solution is based on .NET 4.0, the EWS Managed API and Parallel Fx, and of course our own C# code. We've taken special care of the fact that "ExchangeService" class instances are not used by multiple threads concurrently and are conservative regarding the total number of instances (we currently have 10 live instances at any given time). We do however make a lot of calls (FindItems, CreateItems, UpdateItems, DeleteItems, LoadPropertiesForItems).
Digging deeper, we've found that this approach doesn't really buy us much. Whenever an operation is issued, a new HttpWebRequest is created, executed, and closed. For authenticated requests (in our case HTTPS + WebCredentials), the underlying TCP connection appears to be returned to the OS in the TIME_WAIT state (as described in this article: msdn.microsoft.com/en-us/library/aa560610(BTS.10).aspx), where it sits doing nothing for 4 minutes (by default). We have already applied the suggestions in the above article (further discussion can be found here: blogs.msdn.com/b/dgorti/archive/2005/09/18/470766.aspx) and reduced the "sits there" time to 30 seconds. This is all fine in a test environment, but not on a production system where TCP connections are a scarcer resource and our app is not the only one using them.
I think the problem has to do with how the EWS Managed API uses HttpWebRequest (or something even deeper in System.Net). We wanted to toy with msdn.microsoft.com/en-us/library/system.net.httpwebrequest.unsafeauthenticatedconnectionsharing.aspx and msdn.microsoft.com/en-us/library/6y3d5dts.aspx, but since everything in the EWS Managed API is either internal or sealed, it is difficult to extend or try this out.
Advice on how to proceed welcome!
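For anyone wondering, on a raw HttpWebRequest the two knobs we wanted to experiment with look like this (a sketch with a made-up endpoint and credentials; the point is that the EWS Managed API never lets us reach the request object to set them):

```csharp
using System.Net;

class ConnectionSharingSketch
{
    static void Main()
    {
        // Hypothetical EWS endpoint; substitute your own.
        var request = (HttpWebRequest)WebRequest.Create(
            "https://exchange.example.com/EWS/Exchange.asmx");

        request.Credentials = new NetworkCredential("user", "password", "DOMAIN");
        request.PreAuthenticate = true;

        // Keep the authenticated TCP connection alive between requests
        // instead of returning it to TIME_WAIT after every call.
        request.UnsafeAuthenticatedConnectionSharing = true;

        // Scope the shared connections to a named group.
        request.ConnectionGroupName = "ews-shared";
    }
}
```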

Can a TCP/IP Stack be killed programmatically?

Our server application is listening on a port, and after a period of time it no longer accepts incoming connections. (And while I'd love to solve this issue, it's not what I'm asking about here;)
The strange thing is that when our app stops accepting connections on port 44044, so does IIS (on port 8080). Killing our app fixes everything: IIS starts responding again.
So the question is, can an application mess up the entire TCP/IP stack? Or perhaps, how can an application do that?
Senseless detail: Our app is written in C#, under .Net 2.0, on XP/SP2.
Clarification: IIS is not "refusing" the attempted connections. It is never seeing them. Clients are getting a "server did not respond in a timely manner" message (using the .Net TCP Client.)
You may well be starving the stack. It is pretty easy to drain in an environment with a high rate of connection opens and closes per second, e.g. a web server serving lots of unpooled requests.
This is exacerbated by the default TIME_WAIT delay: the amount of time a closed socket must wait before being recycled, which defaults to 240 seconds on Windows.
There are a bunch of registry keys that can be tweaked; I suggest at least the following values are created/edited under:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters
TcpTimedWaitDelay = 30
MaxUserPort = 65534
MaxHashTableSize = 65536
MaxFreeTcbs = 16000
Plenty of docs on MSDN & Technet about the function of these keys.
You haven't maxed out the available port handles, have you?
netstat -a
I saw something similar when an app was opening and closing ports (but not actually closing them correctly).
Use netstat -a to see the active connections when this happens. Perhaps, your server app is not closing/disposing of 'closed' connections.
Good suggestions from everyone, thanks for your help.
So here's what was going on:
It turns out that we had several services competing for the same port, and most of the time the "proper" service would get it. Occasionally a second service would grab the port first, and the first service would open a different port instead. From then on, the services would keep grabbing new ports every time they serviced a request (since they weren't using their preferred ports), and eventually we would exhaust all available ports.
Of course, the actual question was: "Can an application mess up the entire TCP/IP stack?", and the answer to that question is: Yes. One way to do it is to listen on a whole bunch of ports.
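A minimal sketch of that failure mode: each listener below binds a fresh ephemeral port and never releases it, so a long-running loop like this steadily eats the port range (the count of 100 is arbitrary).

```csharp
using System.Collections.Generic;
using System.Net;
using System.Net.Sockets;

class PortHog
{
    static void Main()
    {
        var listeners = new List<TcpListener>();

        // Port 0 asks the OS for any free ephemeral port; never calling
        // Stop() means each one stays allocated for the process lifetime.
        for (int i = 0; i < 100; i++)
        {
            var listener = new TcpListener(IPAddress.Loopback, 0);
            listener.Start();
            listeners.Add(listener);
        }
    }
}
```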
I guess the port number comment from RichS is correct.
Other than that, the TCP/IP stack is just a module in your operating system and, as such, can have bugs that might allow an application to kill it. It wouldn't be the first driver to be killed by a program.
(A tip to the hat towards Andrew Tanenbaum for insisting that operating systems should be modular instead of monolithic.)
I've been in a couple of similar situations myself. A good troubleshooting step is to attempt a connection from the affected machine to a known-good destination that isn't experiencing any connectivity issues at that moment. If the connection attempt fails, you are likely to get more interesting details in the error message or code; for example, it could say that there aren't enough handles, or not enough memory.
From a support and sys admin standpoint, I have only seen this on the rarest of occasions (more than once), but it certainly can happen.
When you are diagnosing the problem, you should carefully eliminate the possible causes, rather than blindly rebooting the system at the first sign of trouble. I only say this because many customers I work with are tempted to do that.
