HttpWebResponse won't scale for concurrent outbound requests

I have an ASP.NET 3.5 server application written in C#. It makes outbound requests to a REST API using HttpWebRequest and HttpWebResponse.
I have set up a test application to send these requests on separate threads (to vaguely mimic concurrency against the server).
Please note this is more of a Mono/Environment question than a code question; so please keep in mind that the code below is not verbatim; just a cut/paste of the functional bits.
Here is some pseudo-code:
// threaded client piece
int numThreads = 1;
int pending = numThreads; // separate countdown, so the loop bound below is never modified by the workers
ManualResetEvent doneEvent;
using (doneEvent = new ManualResetEvent(false))
{
    for (int i = 0; i < numThreads; i++)
    {
        ThreadPool.QueueUserWorkItem(new WaitCallback(Test), random_url_to_same_host);
    }
    doneEvent.WaitOne(); // block until the last worker signals completion
}

void Test(object some_url)
{
    // set up the service point here just to show which config settings I'm using
    ServicePoint lgsp = ServicePointManager.FindServicePoint(new Uri(some_url.ToString()));
    // set these to optimal for Mono and .NET
    lgsp.Expect100Continue = false;
    lgsp.ConnectionLimit = 100;
    lgsp.UseNagleAlgorithm = true;
    lgsp.MaxIdleTime = 100000;
    // WebRequest.Create takes a string or a Uri, not an object
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(some_url.ToString());
    using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
    {
        // do stuff
    } // disposes the response and releases its connection
    // close out threading stuff
    if (Interlocked.Decrement(ref pending) == 0)
    {
        doneEvent.Set();
    }
}
If I run the application on my local development machine (Windows 7) in the Visual Studio web server, I can up the numThreads and receive the same avg response time with minimal variation whether it's 1 "user" or 100.
After publishing and deploying the application to Apache2 in a Mono 2.10.2 environment, the response times scale almost linearly (i.e., 1 thread = 300 ms, 5 threads = 1500 ms, 10 threads = 3000 ms). This happens regardless of server endpoint (different hostname, different network, etc.).
Using IPTRAF (and other network tools), it appears as though the application only opens 1 or 2 ports to route all connections through and the remaining responses have to wait.
We have built a similar PHP application and deployed in Mono with the same requests and the responses scale appropriately.
I have run through every single configuration setting I can think of for Mono and Apache and the ONLY setting that is different between the two environments (at least in code) is that sometimes the ServicePoint SupportsPipelining=false in Mono, while it is true from my machine.
It seems as though the ConnectionLimit (default of 2) is not being changed in Mono for some reason but I am setting it to a higher value both in code and the web.config for the specified host(s).
Either my team and I are overlooking something significant or this is some sort of bug in Mono.

I believe that you're hitting a bottleneck in the HttpWebRequest. The web requests each use a common service point infrastructure within the .NET framework. This appears to be intended to allow requests to the same host to be reused, but in my experience results in two bottlenecks.
First, the service points allow only two concurrent connections to a given host by default, in order to comply with the HTTP specification. This can be overridden by setting the static property ServicePointManager.DefaultConnectionLimit to a higher value. See this MSDN page for more details. It looks as if you're already addressing this for the individual service point itself, but due to the concurrency locking scheme at the service point level, doing so may be contributing to the bottleneck.
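A minimal sketch of the override described above, assuming it runs once at startup before any request to the host has been created. The class name, method name and example URL are hypothetical; only ServicePointManager.DefaultConnectionLimit, FindServicePoint and ServicePoint.ConnectionLimit come from the framework. Reading the limit back is a cheap way to confirm that the runtime actually honoured the setting.

```csharp
using System;
using System.Net;

public static class ConnectionLimitSetup
{
    // Hypothetical helper: raise the process-wide default, then return the
    // limit that the service point for the given host actually reports.
    public static int RaiseDefaultLimit(Uri host, int limit)
    {
        // Only service points created after this assignment pick up the new default.
        ServicePointManager.DefaultConnectionLimit = limit;

        ServicePoint servicePoint = ServicePointManager.FindServicePoint(host);
        return servicePoint.ConnectionLimit;
    }
}
```

For example, ConnectionLimitSetup.RaiseDefaultLimit(new Uri("http://api.example.invalid/"), 100) could be called during application start. If it still returns 2, the default is not being applied in that environment and the limit has to be set on the service point directly.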
Second, there appears to be an issue with lock granularity in the ServicePoint class itself. If you decompile the class and search the source for the lock keyword, you'll find that it uses the instance itself to synchronize and does so in many places. With the service point instance being shared among web requests for a given host, in my experience this tends to bottleneck as more HttpWebRequests are opened and causes it to scale poorly. This second point is mostly personal observation and poking around the source, so take it with a grain of salt; I wouldn't consider it an authoritative source.
Unfortunately, I did not find a reasonable substitute at the time that I was working with it. Now that the ASP.NET Web API has been released, you may wish to give the HttpClient a look. Hope that helps.

I know this is pretty old but I'm putting this here in case it might help somebody else who runs into this issue. We ran into the same problem with parallel outbound HTTPS requests. There are a few issues at play.
The first issue is that ServicePointManager.DefaultConnectionLimit did not change the connection limit as far as I can tell. Setting this to 50, creating a new connection, and then checking the connection limit on the service point for the new connection says 2. Setting it on that service point to 50 once appears to work and persist for all connections that will end up going through that service point.
The second issue we ran into was with threading. The current implementation of the mono thread pool appears to create at most 2 new threads per second. This is an eternity if you are doing many parallel requests that start at exactly the same time. To counteract this, we tried setting ThreadPool.SetMinThreads to a higher number. It appears that Mono only creates up to 1 new thread when you make this call, regardless of the delta between the current number of threads and the desired number. We were able to work around this by calling SetMinThreads in a loop until the thread pool had the desired number of idle threads.
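The loop workaround might look roughly like the sketch below. It is an illustration under assumptions, not the code we actually ran: the helper name is hypothetical, and it checks the minimum reported by ThreadPool.GetMinThreads rather than the number of idle threads, because the framework has no direct "idle thread count" call. The attempt cap is there so a runtime that never reaches the target cannot spin forever.

```csharp
using System;
using System.Threading;

public static class ThreadPoolWarmup
{
    // Hypothetical helper: keep asking for a higher worker-thread minimum
    // until the pool reports it, or give up after maxAttempts calls.
    public static bool EnsureMinWorkerThreads(int desired, int maxAttempts)
    {
        int workerMin, ioMin;
        for (int attempt = 0; attempt < maxAttempts; attempt++)
        {
            ThreadPool.GetMinThreads(out workerMin, out ioMin);
            if (workerMin >= desired)
            {
                return true;
            }

            // Leave the I/O completion-port minimum unchanged.
            ThreadPool.SetMinThreads(desired, ioMin);
            Thread.Sleep(10); // brief pause before checking again
        }

        // Final check, in case the last SetMinThreads call took effect.
        ThreadPool.GetMinThreads(out workerMin, out ioMin);
        return workerMin >= desired;
    }
}
```

Calling ThreadPoolWarmup.EnsureMinWorkerThreads(50, 200) once at startup, before the burst of parallel requests, is the intended usage.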
I opened a bug about the latter issue because that's the one I'm most confident is not working as intended: https://bugzilla.xamarin.com/show_bug.cgi?id=7055

If @jake-moshenko is right about ServicePointManager.DefaultConnectionLimit not having any effect when changed in Mono, please file this as a bug at http://bugzilla.xamarin.com/.
However I would try some things before discarding this completely as a Mono issue:
Try using the SGen garbage collector instead of the old Boehm one, by passing --gc=sgen as a flag to mono.
If the above doesn't help, upgrade to Mono 3.2 (which, by the way, defaults to the SGen GC too), because there have been a lot of fixes since you asked the question.
If the above doesn't help, build your own Mono (master branch), as this important pull request about threading has been merged recently.
If the above doesn't help, build your own Mono with this pull request added. If it fixes your problem, please add a "+1" to the pull request. It might be a fix for bug 7055.

Related

429 Too many requests only production server side, not localhost, not browser

I read this post: C# (429) Too Many Requests
and I understood the response code, but why is this status code returned only when the call is made from the server side (backend) in production (hosted)? The service never returns this code when I call the same service from Chrome's address bar, or when I make the call server side (backend) from my localhost.
CASE 1 (works fine from localhost; the service URL is not localhost, it is hosted)
App A (localhost) calls App B (hosted) --> works fine
for (int i = 0; i < 1000; i++)
{
    HttpClient client = new HttpClient();
    client.BaseAddress = new Uri(url);
    client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
    String response = client.GetStringAsync(urlParameters).Result;
    client.Dispose();
}
CASE 2 (works fine)
Chrome browser calls App B (hosted) --> works fine
CASE 3 (similar to case 1 but with far fewer requests; does NOT work)
App A (hosted) calls App B (hosted) --> 429
Why? What is the problem? How can I solve it?

What's Happening
The HTTP 429 response code indicates you have been rate limited. The idea is to prevent one caller from overwhelming a service, making it less available to other callers.
Most Common
That limiting can be based on many things. Most common are
Number of calls per unit time (usually per second)
Number of concurrent calls
The General Case
A rate limiter may also forgive a short burst of calls that happens occasionally, may allow more calls before hitting the brakes based on who you are (using your IP or an API key for example), dynamically adjust its limits based on total system load, or do other things.
Probably Happening Here
Based on your description, I would guess the number of concurrent calls could be causing production rate limiting. Rather than hitting the external API hard trying to guess what the rules are, try reaching out to them to ask. If that is not an option, running multiple requests in parallel could validate this theory.
Handling
A great way to deal with this is to back off your requests when you receive an HTTP 429.
The service should return a Retry-After header indicating how many seconds you should wait before trying again. If it does, wait that long before resubmitting your request.
If the service does not provide that header (I work with a major one that does not), use exponential backoff instead.
Depending on your needs, you may want to tell your own caller to try again later (return an HTTP 429 yourself) or you may want to queue up pending requests and work off the queue to submit them all.
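As a rough sketch of the back-off logic described above: honour Retry-After when the service sends it, otherwise fall back to exponential backoff. The class name, method name, and the baseDelay and maxDelay parameters are hypothetical choices for illustration; the header parsing uses the standard HttpResponseMessage.Headers.RetryAfter property.

```csharp
using System;
using System.Net.Http;
using System.Net.Http.Headers;

public static class RetryDelay
{
    // attempt is zero-based: 0 for the first retry, 1 for the second, and so on.
    public static TimeSpan Compute(HttpResponseMessage response, int attempt,
                                   TimeSpan baseDelay, TimeSpan maxDelay)
    {
        RetryConditionHeaderValue retryAfter = response.Headers.RetryAfter;
        if (retryAfter != null)
        {
            // Retry-After as a number of seconds.
            if (retryAfter.Delta.HasValue)
            {
                return retryAfter.Delta.Value;
            }

            // Retry-After as an absolute HTTP date.
            if (retryAfter.Date.HasValue)
            {
                TimeSpan untilDate = retryAfter.Date.Value - DateTimeOffset.UtcNow;
                return untilDate > TimeSpan.Zero ? untilDate : TimeSpan.Zero;
            }
        }

        // No usable header: baseDelay * 2^attempt, capped at maxDelay.
        double factor = Math.Pow(2, Math.Min(attempt, 30));
        double milliseconds = Math.Min(baseDelay.TotalMilliseconds * factor,
                                       maxDelay.TotalMilliseconds);
        return TimeSpan.FromMilliseconds(milliseconds);
    }
}
```

The caller would check for status code 429, await Task.Delay(RetryDelay.Compute(...)), and resubmit, giving up after some maximum number of attempts.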
Preventing
If you know the rate limits, you can pre-emptively limit your outbound call rate so you get into this situation less often.
For call-per-second limits, you can use a counter variable that you reset (in a thread-safe way) every second. If the known call limit would be exceeded, calculate when the counter will reset (store a timestamp when it does) and delay processing that long.
For a concurrent-call limit, a SemaphoreSlim works nicely. Set the maximum count to whatever your concurrent rate limit is. Acquire the semaphore before making a request and release it (in a finally block) after your call completes.
If you have multiple servers subject to the same rate limit (e.g. if rate limiting is based on an API key rather than IP address), it gets harder to self-limit, but you can set self-limiting parameters (calls per second and concurrent calls) in a configuration file, and tune them over time to maximize your throughput without hitting excessive HTTP 429's.
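The concurrent-call limit with SemaphoreSlim can be sketched as follows. The ConcurrencyLimiter class and its RunAsync method are hypothetical names; the maximum would come from the service's documented limit or from your configuration file.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class ConcurrencyLimiter
{
    private readonly SemaphoreSlim _slots;

    public ConcurrencyLimiter(int maxConcurrentCalls)
    {
        // Initial count and maximum count are both the concurrency limit.
        _slots = new SemaphoreSlim(maxConcurrentCalls, maxConcurrentCalls);
    }

    // Runs the supplied call once a slot is free and always releases the slot.
    public async Task<T> RunAsync<T>(Func<Task<T>> call)
    {
        await _slots.WaitAsync().ConfigureAwait(false);
        try
        {
            return await call().ConfigureAwait(false);
        }
        finally
        {
            _slots.Release();
        }
    }
}
```

A single shared limiter instance per rate-limited service is the idea, for example limiter.RunAsync(() => client.GetStringAsync(url)), so that every outbound call to that service goes through the same semaphore.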

DownloadFileAsync throw Get TimeOut exception

I have put Flurl under high load, using the DownloadFileAsync method to download files over a private network from one server to another. After several hours the method starts to throw "Get TimeOut" exceptions. The only fix is to restart the application.
downloadUrl.DownloadFileAsync(Helper.CreateTempFolder()).Result;
I have added a second method as a failover using HttpClient, and it downloads files fine after Flurl fails, so it is not a server problem.
private void DownloadFile(string fileUri, string locationToStoreTo)
{
    using (var client = new HttpClient())
    using (var response = client.GetAsync(new Uri(fileUri)).Result)
    {
        response.EnsureSuccessStatusCode();
        var stream = response.Content.ReadAsStreamAsync().Result;
        using (var fileStream = File.Create(locationToStoreTo))
        {
            stream.CopyTo(fileStream);
        }
    }
}
Do you have any idea why the Get TimeOut error starts popping up under high load with this method?
public static Task<string> DownloadFileAsync(this string url, string localFolderPath, string localFileName = null, int bufferSize = 4096, CancellationToken cancellationToken = default(CancellationToken));
The two download paths differ only in that Flurl reuses one HttpClient instance for all requests, while my code creates and destroys a new HttpClient object for every request. I know that creating and destroying HttpClient is time and resource consuming, so I would rather use Flurl if it worked.

As others point out, you're trying to use Flurl synchronously by calling .Result. This is not supported, and under highly concurrent workloads you're most likely witnessing deadlocks.
The HttpClient solution is using a new instance for every call, and since instances aren't shared at all it's probably less prone to deadlocks. But it's inviting a whole new problem: port exhaustion.
In short, if you want to continue using Flurl then go ahead and do so, especially since you're getting smart HttpClient reuse "for free". Just use it asynchronously (with async/await) as intended. See the docs for more information and examples.
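To illustrate what "asynchronously, as intended" means, here is a sketch of the question's HttpClient failover rewritten with async/await and one shared client. This is plain HttpClient, not Flurl's implementation; the class and method names are hypothetical. With Flurl the equivalent change is simply to await DownloadFileAsync instead of calling .Result on it.

```csharp
using System;
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;

public static class AsyncDownloader
{
    // One client for the whole process, so connections are pooled and reused.
    private static readonly HttpClient SharedClient = new HttpClient();

    public static async Task DownloadFileAsync(string fileUri, string locationToStoreTo)
    {
        // ResponseHeadersRead streams the body instead of buffering it in memory.
        using (HttpResponseMessage response = await SharedClient
            .GetAsync(fileUri, HttpCompletionOption.ResponseHeadersRead)
            .ConfigureAwait(false))
        {
            response.EnsureSuccessStatusCode();
            using (Stream body = await response.Content.ReadAsStreamAsync().ConfigureAwait(false))
            {
                await SaveStreamAsync(body, locationToStoreTo).ConfigureAwait(false);
            }
        }
    }

    // Copies a stream to a file without blocking a thread on the I/O.
    public static async Task SaveStreamAsync(Stream source, string path)
    {
        using (FileStream file = new FileStream(path, FileMode.Create, FileAccess.Write,
                                                FileShare.None, 81920, useAsync: true))
        {
            await source.CopyToAsync(file).ConfigureAwait(false);
        }
    }
}
```

The important part is that every caller up the chain also awaits; a single .Result or .Wait() higher up reintroduces the blocking.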

I can think of two or three possibilities (I'm sure there are others that I can't think of as well)
Server IP address has changed.
You wrote that Flurl reuses an HttpClient. I've never used, or even heard of, Flurl, so I have no idea how it works. But an HttpClient reuses a pool of connections, which is why it's efficient to reuse a single instance and why it's critical to do so in a high-volume microservice application; otherwise you're likely to exhaust all ports. That gives a different error message, though, not a timeout, so I know you haven't hit that case. However, while it's important to reuse an HttpClient in the short term, HttpClient will cache DNS results, which means it's important to dispose and create new HttpClients periodically. In short-lived processes, you can use a static or singleton instance. But in long-running processes, you should create a new instance periodically. If you only use it to access one server, that server's DNS TTL is a good value to use.
So, what might be happening is that the server changed IP addresses a few hours after your program started, and because Flurl keeps reusing the same HttpClient, it doesn't get the new IP address from the DNS entry. One way to check whether this is the problem is to write the server's IP address to a log at the beginning of the process and, when you encounter the problem, check whether the IP address is still the same.
If this is the problem, you can look into ASP.NET Core 2.1's HttpClientFactory. It's a bit awkward to use outside of ASP.NET, but I did once. It gives you re-use of HttpClients, to avoid the TCP port exhaustion problem of using more than 32k HttpClients in 120 seconds, but also avoid DNS caching issues. My memory is that it creates a new HttpClient every 5 minutes by default.
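For completeness, here is a different technique from the factory mentioned above, shown only as a sketch: on .NET Core 2.1 and later, SocketsHttpHandler has a PooledConnectionLifetime property that retires pooled connections after a set age, so a long-lived HttpClient reconnects (and re-resolves DNS) periodically. The class and method names are hypothetical, and this does not apply to .NET Framework, where SocketsHttpHandler is not available.

```csharp
using System;
using System.Net.Http;

public static class LongLivedHttp
{
    // Handler whose pooled connections are replaced once they reach the given age.
    public static SocketsHttpHandler CreateHandler(TimeSpan connectionLifetime)
    {
        return new SocketsHttpHandler
        {
            PooledConnectionLifetime = connectionLifetime
        };
    }

    // A client built this way can be kept as a singleton for the process lifetime.
    public static HttpClient CreateClient(TimeSpan connectionLifetime)
    {
        return new HttpClient(CreateHandler(connectionLifetime));
    }
}
```

Something like LongLivedHttp.CreateClient(TimeSpan.FromMinutes(5)) would be created once and shared; the five-minute value is only an example and could instead follow the server's DNS TTL.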
Reaching the maximum connections per server
ServicePointManager.DefaultConnectionLimit sets the maximum number of HTTP connections that a client will open to a server. If your code tries to use more than this simultaneously, the requests that exceed the limit wait for an in-flight request to finish and then use the newly available connection. However, in the past when I was looking into this, the HTTP timeout started from when the HttpClient's method was called, not when the HttpClient sent the request to the server over a connection. This means that if your limit is 2 and both connections are in use for longer than the timeout period (for example, when downloading 2 large files), other requests to download from the same server will time out, even though no HTTP request was ever sent to the server.
So, depending on your application and server, you may be able to use a higher connection limit, otherwise you need to implement request queuing in your app.
Thread pool exhaustion
Async code is awesome for performance when used correctly in highly concurrent, IO-bound workloads. I sometimes think it's a bad idea to use it anywhere else because it has such huge potential for causing weird problems when used incorrectly. Like Crowcoder wrote in a comment on the question, you shouldn't use .Result, or any code that blocks a running thread, when in an async context. Although the code sample you provided says private void DownloadFile(..., if it's actually private async Task DownloadFile(..., or if DownloadFile is called from an async method, then there's a real risk of issues. If DownloadFile is not called from an async method, but is called on the thread pool, there's the same risk of errors.
Understanding async is a huge topic, unfortunately with a lot of misinformation on the internet as well, so I can't possibly cover it in detail here. A key thing to note is that async tasks run on the thread pool. So, if you call ThreadPool.QueueUserWorkItem and block the thread that your code runs on, or if you have async tasks that you block on (for example by calling .Result), what could happen is that you block every thread in the thread pool, and when an HTTP response comes back from the network, the .NET run time has no threads available to complete the task. The problem with this idea is that there are also no threads available to signal the timeout, so I don't believe you're exhausting the thread pool (if you were, I would expect a deadlock), but I don't know how timeouts are implemented. If timeouts/timers use a dedicated thread it could be possible for a cancellation token (the thing that signals a timeout) to be set by the timer's thread, and then any code on a blocking wait for either the HTTP response or the cancellation token could be triggered. But thread pool exhaustion generally causes deadlocks, so if you're getting an error back, it's probably not this.
To check if you're having threadpool exhaustion issues, when your program starts getting the timeout errors, get a memory dump of your app (for example using Task Manager). If you have the Enterprise or Ultimate SKU of Visual Studio, you can open/debug the memory dump in VS. Otherwise you'll need to learn how to use windbg (or find another tool). When debugging the memory dump, check the number of threads. If there's a very large number of threads, that's a hint you might be on the right track. Check where the thread was at the time of the memory dump. If they're all in blocking calls like WaitForObject, or something similar, then there's a real risk you've exhausted the thread pool. I've never debugged an async task deadlock/thread pool exhaustion issue before, so I'm not sure if there's a way to get a list of tasks and see from their runstate if they're likely to be deadlocked or not. If you ever see more tasks in the running state than you have cores on your CPU, you almost certainly have blocking in an async task, however.
In summary, you haven't given us enough details to give you an answer that will work with 100% certainty. You need to keep investigating to understand the problem until you can either solve it yourself, or provide us with more information. I've given you some of the most likely causes, but it could very easily be something else completely.

TCP Port Exhaustion with WindowsAzure.Storage in .NET 5.0

I'm running the latest version of the WindowsAzure.Storage library, 6.1.1. This was previously a known issue but is supposed to have been fixed back in .NET 4.5.1. It's exactly the issue I'm having.
I'm hitting a table in Azure Storage with 100m+ rows to insert. I've focused on making the code fast and scalable, it maxes out an Azure D12 VM running Datacenter 2012 R2. I'm seeing 5,000 - 10,000 entities processed per second (read file, process, upload).
Update: This ONLY happens on Azure VMs. On my home system it doesn't occur.
The process always crashes out at ~16,384 batches (around 320,000 records) with a classic port exhaustion error: Only one usage of each socket address (protocol/network address/port) is normally permitted.
I've done the usual things: increase MaxUserPort (64434) and decreased TcpTimedWaitDelay (15 seconds). MaxUserPort seems to be ignored given the suspiciously logical 16,384 it fails at.
Netstat shows that the ports are never being closed in the first place. The state on all of them remains 'Established' until the process itself is closed, then they disappear.
The actual connection code comes down to:
var acx = CloudStorageAccount.Parse(conn);
var client = acx.CreateCloudTableClient();
var table = client.GetTableReference("Test");
var op = new TableBatchOperation();
foreach (var record in batch) // batch is just a bunch of entity objects
    op.InsertOrReplace(record);
try
{
    await table.ExecuteBatchAsync(op, opsConfig, null);
    Interlocked.Add(ref totalUploaded, batch.Count);
}
catch...
I've tried every variation I can think of - reusing a pool of TableBatchOperations, having a single table/client/account to creating an object for every hit, and every combination in between.
The problem seems to be lower level than I can get to. When the issue was supposedly fixed two years ago the connections were staying open because the response stream wasn't being read properly.
Grateful for any suggestions! Please just ask if you need more information or clarification.

How to disable Nagle's algorithm in ServiceStack?

We're using ServiceStack 3.9.71.0 and we're currently experiencing unexplained latency issues with clients over a WAN connection.
A reply with a very small payload (<100 bytes) is received after 200ms+.
The round-trip-time (RTT) on the link is about 40ms due to the geographical distance. This has been verified by pinging the other host and using a simple echo service to test the latency of a TCP connection.
Both ping and echo test show latencies which are in line with expectations. Getting a reply from our ServiceStack host takes much longer than expected.
We've verified that:
WAN link is only running at 25% of capacity (no congestion)
No QOS is employed on the WAN link
same host gives fast reply to same request from a different host on local network
delay is not caused by our code processing the request
We've now stumbled across Nagle's algorithm and that it can mean delays for small requests on WAN networks (http://blogs.msdn.com/b/windowsazurestorage/archive/2010/06/25/nagle-s-algorithm-is-not-friendly-towards-small-requests.aspx).
In .NET it can be disabled by setting TcpClient.NoDelay = true (https://msdn.microsoft.com/en-us/en-US/library/system.net.sockets.tcpclient.nodelay(v=vs.110).aspx).
How can this be disabled for ServiceStack's TCP handling?
EDIT: I don't think that this is a duplicate of HttpWebRequest is slow with chunked data. The mentioned question covers HttpWebRequest which isn't used by ServiceStack. ServiceStack uses HttpListener which also happens to be controlled / managed by the mentioned ServicePointManager. We're going to conduct a test to see whether setting ServicePointManager.UseNagleAlgorithm = false solves the issue.

I think you provided the answer in your update: UseNagleAlgorithm = false should solve this issue. But be careful, because ServicePointManager.UseNagleAlgorithm = false; is a global setting, which means it turns off the algorithm for all of your endpoints and all of your requests in the entire AppDomain. When you call more than one service endpoint (which is usually the case) with requests of mixed sizes, it will come back to bite you. So you should consider setting this only for one specific ServicePoint, which you can acquire by:
ServicePoint sp = ServicePointManager.FindServicePoint(<uri>);
sp.UseNagleAlgorithm = false;
rather than setting it globally.
Here is an article about it: https://technet2.github.io/Wiki/blogs/windowsazurestorage/nagles-algorithm-is-not-friendly-towards-small-requests.html

ServicePoint safety checks to prevent blocking on new HttpWebRequests

I'm using a 3rd party library that makes a number of HTTP calls. By decompiling the code, I've determined that it is creating and using raw HttpWebRequests, all going to a single URL. The issue is that some of the requests don't get closed properly. After some time, all new HttpWebRequests block forever when the library calls GetRequestStream()* on them. I've determined this blocking is due to the ConnectionLimit on the ServicePoint for that particular host, which has the default value of 2. In other words, the library has opened 2 requests, and then tries to open a 3rd, which blocks.
I want to protect against this blocking. The library is fairly resilient and will reconnect itself, so it's okay if I kill the existing connections it has made. The problem is that I don't have access to any of the HttpWebRequest or HttpWebResponses this library makes. However I do know the URL it accesses and therefore I can access the ServicePoint for it.
var sp = ServicePointManager.FindServicePoint(new Uri("http://UrlThatIKnowAbout.com"));
(Note: KeepAlive is enabled on these HttpWebRequests)

This worked, though I'm not sure it's the best way to solve the problem.
Get the service point object for the url
var sp = ServicePointManager.FindServicePoint(new Uri("http://UrlThatIKnowAbout.com"));
Increase the ConnectionLimit to int.MaxValue
Create a background thread that periodically checks the connection count (CurrentConnections) on the service point. If it goes above 5, call CloseConnectionGroup()
Set MaxIdleTime to 1 hour (instead of default)
Setting the ConnectionLimit should prevent the blocking. The monitor thread will ensure that too many connections are never active at the same time. Setting MaxIdleTime should serve as a fall back.
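The steps above can be sketched roughly as follows. This is an illustration rather than the exact code I ran: it uses a System.Threading.Timer in place of a dedicated background thread, the class name and parameters are hypothetical, and passing an empty group name to CloseConnectionGroup assumes the library never sets ConnectionGroupName on its requests.

```csharp
using System;
using System.Net;
using System.Threading;

public static class ServicePointGuard
{
    // Returns the timer; the caller must keep a reference to it (and dispose it
    // on shutdown), otherwise it can be garbage collected and the checks stop.
    public static Timer Start(Uri host, int maxConnections, TimeSpan checkInterval)
    {
        ServicePoint sp = ServicePointManager.FindServicePoint(host);
        sp.ConnectionLimit = int.MaxValue;                              // never block on the limit
        sp.MaxIdleTime = (int)TimeSpan.FromHours(1).TotalMilliseconds;  // fallback: 1 hour

        return new Timer(_ =>
        {
            // Too many open connections: close the default connection group
            // and let the library reconnect by itself.
            if (sp.CurrentConnections > maxConnections)
            {
                sp.CloseConnectionGroup(string.Empty);
            }
        }, null, checkInterval, checkInterval);
    }
}
```

In my case that would be started once with something like ServicePointGuard.Start(new Uri("http://UrlThatIKnowAbout.com"), 5, TimeSpan.FromSeconds(30)); the 30-second interval is an arbitrary example.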
