Introduction
I'm making an application that crawls files/websites for proxy IPs/ports. The implementation is purely asynchronous so it runs at the speed I want. The whole purpose of the application is to find proxies that are "alive".
To check whether a proxy is alive, I make an HttpWebRequest to a specific website; if the request succeeds, the proxy is alive, otherwise it's dead.
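For context, each per-proxy check looks roughly like this (a minimal sketch, not my exact code; the target URL and method name are illustrative):

using System;
using System.Net;
using System.Threading.Tasks;

// Sketch of a single proxy check. Note: HttpWebRequest.Timeout only applies
// to the synchronous GetResponse path; the async path needs its own timeout.
static async Task<bool> IsProxyAliveAsync(string ip, int port)
{
    try
    {
        var request = (HttpWebRequest)WebRequest.Create("http://example.com/");
        request.Proxy = new WebProxy(ip, port);

        // Dispose the response so the underlying connection is released.
        using (var response = (HttpWebResponse)await request.GetResponseAsync())
        {
            return response.StatusCode == HttpStatusCode.OK; // alive
        }
    }
    catch (WebException)
    {
        return false; // refused or timed out: dead
    }
}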
The Problem
An example:
First run:
Crawls a text file with 30k proxies.
Makes an HttpWebRequest through each proxy to check if it's "alive".
Finds 30 proxies that are alive.
Second run:
Crawls the same text file with 30k proxies.
Makes an HttpWebRequest through each proxy to check if it's "alive".
Finds 0 proxies that are alive.
If I wait approximately 10 minutes, the second run will also find ~30 proxies that are alive. Otherwise it simply times out on them and marks them as dead.
As you have probably already figured out, my question is why the second run doesn't produce the same result as the first run, or close to it.
I've changed the following:
ServicePointManager.DefaultConnectionLimit = int.MaxValue, and ServicePointManager.MaxServicePoints = int.MaxValue
Is this a basic limitation of my network, or is something else at play?
Have you ensured that you are closing the first web request?
Related
I read this post: C# (429) Too Many Requests
and I understand the response code, but... why is this status code only returned when the call is made server-side (backend) in production (hosted)? The service never returns this code when I call the same service from Chrome's address bar, or when I make the call server-side (backend) from my localhost.
CASE 1 (works fine from localhost - the service URL is not localhost, it's hosted)
App A (localhost) calls App B (hosted) --> works fine
for (int i = 0; i < 1000; i++)
{
    // A new HttpClient is created and disposed on every iteration,
    // and .Result blocks, so these calls run sequentially.
    HttpClient client = new HttpClient();
    client.BaseAddress = new Uri(url);
    client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
    String response = client.GetStringAsync(urlParameters).Result;
    client.Dispose();
}
CASE 2 (works fine)
Chrome browser calls App B (hosted) --> works fine
CASE 3 (similar to case 1 but with far fewer requests - DOES NOT WORK)
App A (hosted) calls App B (hosted) --> 429
Why? What is the problem? How can I solve it?
What's Happening
The HTTP 429 response code indicates you have been rate limited. The idea is to prevent one caller from overwhelming a service, making it less available to other callers.
Most Common
That limiting can be based on many things. The most common are:
Number of calls per unit time (usually per second)
Number of concurrent calls
The General Case
A rate limiter may also forgive an occasional short burst of calls, allow more calls before hitting the brakes based on who you are (identified by your IP or an API key, for example), dynamically adjust its limits based on total system load, or do other things.
Probably Happening Here
Based on your description, I would guess the number of concurrent calls is triggering the production rate limiting. Rather than hitting the external API hard trying to guess what the rules are, try reaching out to them to ask. If that is not an option, running multiple requests in parallel from the environment that currently works and seeing whether you get a 429 there could validate this theory.
Handling
A great way to deal with this is to back off your requests when you receive an HTTP 429.
The service should return a Retry-After header indicating how many seconds you should wait before trying again. If it does, wait that long before resubmitting your request.
If the service does not provide that header (I work with a major one that does not), use exponential backoff instead.
Depending on your needs, you may want to tell your own caller to try again later (return an HTTP 429 yourself) or you may want to queue up pending requests and work off the queue to submit them all.
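Putting those points together, a retry loop might look like this (a sketch using HttpClient; the method name, attempt cap, and fallback delays are mine, not from any particular service):

using System;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

static class RetryingCaller
{
    private static readonly HttpClient client = new HttpClient();

    // Retries on HTTP 429, honoring Retry-After when present and
    // falling back to exponential backoff (1s, 2s, 4s, ...) when not.
    public static async Task<string> GetWithRetryAsync(string url, int maxAttempts = 5)
    {
        for (int attempt = 1; ; attempt++)
        {
            HttpResponseMessage response = await client.GetAsync(url);

            if (response.StatusCode != (HttpStatusCode)429)
            {
                response.EnsureSuccessStatusCode();
                return await response.Content.ReadAsStringAsync();
            }

            if (attempt == maxAttempts)
                throw new HttpRequestException("Still rate limited after " + maxAttempts + " attempts.");

            // Prefer the server's hint; otherwise back off exponentially.
            TimeSpan delay = response.Headers.RetryAfter?.Delta
                             ?? TimeSpan.FromSeconds(Math.Pow(2, attempt - 1));
            await Task.Delay(delay);
        }
    }
}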
Preventing
If you know the rate limits, you can pre-emptively limit your outbound call rate so you get into this situation less often.
For call-per-second limits, you can use a counter variable that you reset (in a thread-safe way) every second. If the known call limit would be exceeded, calculate when the counter will reset (store a timestamp when it does) and delay processing that long.
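As a sketch, that counter approach could look like this (the class name and the way the wait is computed are mine):

using System;
using System.Threading.Tasks;

class PerSecondGate
{
    private readonly int maxPerSecond;
    private readonly object sync = new object();
    private int count;
    private DateTime windowStart = DateTime.UtcNow; // timestamp of the last reset

    public PerSecondGate(int maxPerSecond) { this.maxPerSecond = maxPerSecond; }

    // Call before each outbound request; returns once a slot is free.
    public async Task WaitAsync()
    {
        while (true)
        {
            TimeSpan wait;
            lock (sync) // thread-safe reset and increment
            {
                DateTime now = DateTime.UtcNow;
                if (now - windowStart >= TimeSpan.FromSeconds(1))
                {
                    windowStart = now;
                    count = 0;
                }
                if (count < maxPerSecond)
                {
                    count++;
                    return;
                }
                // Limit reached: compute how long until the counter resets.
                wait = windowStart.AddSeconds(1) - now;
            }
            if (wait > TimeSpan.Zero)
                await Task.Delay(wait);
        }
    }
}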
For a concurrent-call limit, a SemaphoreSlim works nicely. Set the maximum count to whatever your concurrent rate limit is. Acquire the semaphore before making a request and release it (in a finally block) after your call completes.
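For example (a sketch; the limit of 10 is illustrative):

using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

class ConcurrencyLimitedClient
{
    private static readonly HttpClient client = new HttpClient();

    // Maximum count = your known concurrent-call limit.
    private static readonly SemaphoreSlim gate = new SemaphoreSlim(10, 10);

    public static async Task<string> GetAsync(string url)
    {
        await gate.WaitAsync(); // acquire before making the request
        try
        {
            return await client.GetStringAsync(url);
        }
        finally
        {
            gate.Release(); // always release, even if the call throws
        }
    }
}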
If you have multiple servers subject to the same rate limit (e.g. if limiting is based on an API key rather than IP address), it gets harder to self-limit, but you can set self-limiting parameters (calls per second and concurrent calls) in a configuration file and tune them over time to maximize your throughput without hitting excessive HTTP 429s.
My understanding is that the point of Task is to abstract away threads, and that a new thread is not guaranteed per Task.
I'm debugging in VS2010, and I have something similar to this:
var request = WebRequest.Create(URL);
Task.Factory.FromAsync<WebResponse>(
    request.BeginGetResponse,
    request.EndGetResponse,
    null /* state */).ContinueWith(
        t => { /* ... Stuff to do with response ... */ });
If I make X calls to this, i.e. start X async web requests, how can I work out how many simultaneous (concurrent) connections are actually open at any given time during execution? I assume that it somehow opens only the maximum it can (in the case X is very high), and the other Tasks are blocked while waiting?
Any insight into this, or into how I can check with the debugger how many active (open) connections exist at a given point in execution, would be great.
Basically, I'm wondering if it's handled for me, or if I have to take special consideration so that I do not appear to be attacking a server?
This won't really be specific to Task. The external connection is created as soon as you make your call to Task.Factory.FromAsync. The "task" that the Task is performing is simply waiting for the response to come back (not for it to be sent in the first place). Thus the call to BeginGetResponse will fail if your machine is unable to send any more requests, and the response will contain an error message if the server is rejecting your requests because it believes you are flooding it.
The only real place that Task comes into play here is the amount of time between when the response is actually received by the machine and when your continuation runs. If you are getting lots of responses, or otherwise have lots of work in the thread pool, it could take some time for it to get to your continuation.
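One concrete way to see what is happening is to inspect the ServicePoint for the target host at a breakpoint or in a watch expression (the URL is a placeholder):

using System;
using System.Net;

ServicePoint sp = ServicePointManager.FindServicePoint(new Uri("http://example.com/"));
Console.WriteLine("Open connections: {0} / limit: {1}",
                  sp.CurrentConnections, // connections currently open to this host
                  sp.ConnectionLimit);   // defaults to 2 for non-local hosts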
I have an ASP.NET 3.5 server application written in C#. It makes outbound requests to a REST API using HttpWebRequest and HttpWebResponse.
I have setup a test application to send these requests on separate threads (to vaguely mimic concurrency against the server).
Please note this is more of a Mono/environment question than a code question, so keep in mind that the code below is not verbatim, just a cut/paste of the functional bits.
Here is some pseudo-code:
// threaded client piece
int numThreads = 1;
ManualResetEvent doneEvent;

using (doneEvent = new ManualResetEvent(false))
{
    for (int i = 0; i < numThreads; i++)
    {
        ThreadPool.QueueUserWorkItem(new WaitCallback(Test), random_url_to_same_host);
    }
    doneEvent.WaitOne();
}

void Test(object some_url)
{
    // setup service point here just to show what config settings I'm using
    ServicePoint lgsp = ServicePointManager.FindServicePoint(new Uri(some_url.ToString()));

    // set these to optimal for MONO and .NET
    lgsp.Expect100Continue = false;
    lgsp.ConnectionLimit = 100;
    lgsp.UseNagleAlgorithm = true;
    lgsp.MaxIdleTime = 100000;

    _request = (HttpWebRequest)WebRequest.Create(some_url);

    using (HttpWebResponse _response = (HttpWebResponse)_request.GetResponse())
    {
        // do stuff
    } // releases the response object

    // close out threading stuff
    if (Interlocked.Decrement(ref numThreads) == 0)
    {
        doneEvent.Set();
    }
}
If I run the application on my local development machine (Windows 7) in the Visual Studio web server, I can up the numThreads and receive the same avg response time with minimal variation whether it's 1 "user" or 100.
Publishing and deploying the application to Apache2 on a Mono 2.10.2 environment, the response times scale almost linearly (e.g. 1 thread = 300 ms, 5 threads = 1500 ms, 10 threads = 3000 ms). This happens regardless of server endpoint (different hostname, different network, etc.).
Using IPTRAF (and other network tools), it appears as though the application only opens 1 or 2 ports to route all connections through and the remaining responses have to wait.
We have built a similar PHP application and deployed in Mono with the same requests and the responses scale appropriately.
I have run through every single configuration setting I can think of for Mono and Apache and the ONLY setting that is different between the two environments (at least in code) is that sometimes the ServicePoint SupportsPipelining=false in Mono, while it is true from my machine.
It seems as though the ConnectionLimit (default of 2) is not being changed in Mono for some reason but I am setting it to a higher value both in code and the web.config for the specified host(s).
Either my team and I are overlooking something significant, or this is some sort of bug in Mono.
I believe that you're hitting a bottleneck in the HttpWebRequest. The web requests each use a common service point infrastructure within the .NET framework. This appears to be intended to allow requests to the same host to be reused, but in my experience results in two bottlenecks.
First, the service points allow only two concurrent connections to a given host by default, in order to be compliant with the HTTP specification. This can be overridden by setting the static property ServicePointManager.DefaultConnectionLimit to a higher value. See this MSDN page for more details. It looks as if you're already addressing this for the individual service point itself, but due to the concurrency locking scheme at the service point level, doing so may be contributing to the bottleneck.
Second, there appears to be an issue with lock granularity in the ServicePoint class itself. If you decompile and look at the source for the lock keyword, you'll find that it uses the instance itself to synchronize and does so in many places. With the service point instance being shared among web requests for a given host, in my experience this tends to bottleneck as more HttpWebRequests are opened and causes it to scale poorly. This second point is mostly personal observation and poking around the source, so take it with a grain of salt; I wouldn't consider it an authoritative source.
Unfortunately, I did not find a reasonable substitute at the time that I was working with it. Now that the ASP.NET Web API has been released, you may wish to give the HttpClient a look. Hope that helps.
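If you do give HttpClient a look, the usual pattern is one shared instance issuing concurrent requests (a sketch; the class and method names are mine):

using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;

class ApiFetcher
{
    // A single shared instance; HttpClient is safe for concurrent use.
    private static readonly HttpClient client = new HttpClient();

    // Issue all requests concurrently and wait for every response.
    public static Task<string[]> FetchAllAsync(string[] urls)
    {
        return Task.WhenAll(urls.Select(url => client.GetStringAsync(url)));
    }
}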
I know this is pretty old but I'm putting this here in case it might help somebody else who runs into this issue. We ran into the same problem with parallel outbound HTTPS requests. There are a few issues at play.
The first issue is that ServicePointManager.DefaultConnectionLimit did not change the connection limit as far as I can tell. Setting this to 50, creating a new connection, and then checking the connection limit on the service point for the new connection says 2. Setting it on that service point to 50 once appears to work and persist for all connections that will end up going through that service point.
The second issue we ran into was with threading. The current implementation of the mono thread pool appears to create at most 2 new threads per second. This is an eternity if you are doing many parallel requests that start at exactly the same time. To counteract this, we tried setting ThreadPool.SetMinThreads to a higher number. It appears that Mono only creates up to 1 new thread when you make this call, regardless of the delta between the current number of threads and the desired number. We were able to work around this by calling SetMinThreads in a loop until the thread pool had the desired number of idle threads.
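The loop we ended up with looked roughly like this (a sketch from memory; the sleep interval is illustrative):

using System.Threading;

static class ThreadPoolWarmup
{
    // On the affected Mono version, each SetMinThreads call appeared to add
    // at most one pool thread, so repeat the call until the pool reports
    // the minimum we asked for.
    public static void EnsureMinThreads(int desiredWorkers, int desiredIo)
    {
        int workers, io;
        ThreadPool.GetMinThreads(out workers, out io);
        while (workers < desiredWorkers || io < desiredIo)
        {
            ThreadPool.SetMinThreads(desiredWorkers, desiredIo);
            Thread.Sleep(10); // brief pause so the pool can spin up a thread
            ThreadPool.GetMinThreads(out workers, out io);
        }
    }
}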
I opened a bug about the latter issue because that's the one I'm most confident is not working as intended: https://bugzilla.xamarin.com/show_bug.cgi?id=7055
If #jake-moshenko is right about ServicePointManager.DefaultConnectionLimit not having any effect if changed in Mono, please file this as a bug in http://bugzilla.xamarin.com/.
However I would try some things before discarding this completely as a Mono issue:
Try using the SGen garbage collector instead of the old Boehm one, by passing --gc=sgen as a flag to mono.
If the above doesn't help, upgrade to Mono 3.2 (which BTW defaults to SGEN GC too), because there has been a lot of fixes since you asked the question.
If the above doesn't help, build your own Mono (master branch), as this important pull request about threading has been merged recently.
If the above doesn't help, build your own Mono with this pull request added. If it fixes your problem, please add a "+1" to the pull request. It might be a fix for bug 7055.
I'm using a 3rd party library that makes a number of HTTP calls. By decompiling the code, I've determined that it creates and uses raw HttpWebRequests, all going to a single URL. The issue is that some of the requests don't get closed properly. After some time, all new HttpWebRequests block forever when the library calls GetRequestStream() on them. I've determined this blocking is due to the ConnectionLimit on the ServicePoint for that particular host, which has the default value of 2. In other words, the library has opened 2 requests and then tries to open a 3rd, which blocks.
I want to protect against this blocking. The library is fairly resilient and will reconnect itself, so it's okay if I kill the existing connections it has made. The problem is that I don't have access to any of the HttpWebRequest or HttpWebResponses this library makes. However I do know the URL it accesses and therefore I can access the ServicePoint for it.
var sp = ServicePointManager.FindServicePoint(new Uri("http://UrlThatIKnowAbout.com"));
(Note: KeepAlive is enabled on these HttpWebRequests)
This worked, though I'm not sure it's the best way to solve the problem.
Get the service point object for the URL
var sp = ServicePointManager.FindServicePoint(new Uri("http://UrlThatIKnowAbout.com"));
Increase the ConnectionLimit to int.MaxValue
Create a background thread that periodically checks the ConnectionCount on the service point. If it goes above 5, call CloseConnectionGroup()
Set MaxIdleTime to 1 hour (instead of default)
Setting the ConnectionLimit should prevent the blocking. The monitor thread will ensure that too many connections are never active at the same time. Setting MaxIdleTime should serve as a fallback.
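Put together, the workaround looks roughly like this (a sketch; the 30-second check interval is mine, and passing "" to CloseConnectionGroup is an assumption - it targets the default group used when HttpWebRequest.ConnectionGroupName is never set):

using System;
using System.Net;
using System.Threading;

static class ServicePointGuard
{
    public static void Start(string url)
    {
        ServicePoint sp = ServicePointManager.FindServicePoint(new Uri(url));
        sp.ConnectionLimit = int.MaxValue; // never block on the limit
        sp.MaxIdleTime = (int)TimeSpan.FromHours(1).TotalMilliseconds;

        // Background thread that kills leaked connections.
        var monitor = new Thread(() =>
        {
            while (true)
            {
                if (sp.CurrentConnections > 5)
                {
                    // The library is resilient and reconnects on its own.
                    sp.CloseConnectionGroup("");
                }
                Thread.Sleep(TimeSpan.FromSeconds(30));
            }
        });
        monitor.IsBackground = true;
        monitor.Start();
    }
}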
I'd like to implement a WebService containing a method whose reply will be delayed anywhere from under 1 second to about an hour (depending on whether the data is already cached or needs to be fetched).
Basically my question is: what would be the best way to implement this if you are only able to connect from the client to the WebService (no notification possible)?
AFAIK this is only possible with some kind of polling. But polling is bad, so I'd rather avoid it. The other extreme would be to just leave the connection open until the method is done, but I guess this could end up slowing down the web server and the network. I've considered combining these two techniques: the client would call the method, and the server would return within at most 10 seconds with either a message that the client needs to poll again or the actual result.
What are your thoughts?
You probably want to have a look at Comet.
I would suggest a sort of intelligent polling, if possible:
On first request, return a token to represent the request. This is what gets presented in future requests, so it's easy to check whether or not that request has really completed.
On future requests, hold the connection open for a certain amount of time (e.g. a minute, possibly specified on the client) and return either the result or a result of "still no results; please try again at X", where X is the best guess you have about when the response will be completed.
Advantages:
You allow the client to use the "hold a connection open" model, which is relatively expensive (in terms of connections) but allows the response to be served as soon as it's ready. Make sure you don't hold onto a thread per connection though! (And have some kind of time limit...)
By saying when the client should come back, you can implement a backoff policy - even if you don't know when it will be ready, you could have a "backoff for 1, 2, 4, 8, 16, 30, 30, 30, 30..." minutes policy. (You should potentially check that the client isn't ignoring this.) You don't end up with masses of wasted polls for long misses, but you still get quick results quickly.
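Client-side, that flow might look something like this (a sketch; the service contract and PollResult shape are hypothetical, and the backoff schedule mirrors the one above):

using System;
using System.Threading.Tasks;

// Hypothetical contract: Start returns a token; Poll returns either the
// result or the server's best guess for when to come back.
interface ILongRunningService
{
    Task<string> StartRequestAsync();
    Task<PollResult> PollAsync(string token);
}

class PollResult
{
    public bool Completed;
    public string Result;
    public DateTime RetryAt;
}

static class PollingClient
{
    public static async Task<string> GetResultAsync(ILongRunningService service)
    {
        string token = await service.StartRequestAsync();
        int[] backoffMinutes = { 1, 2, 4, 8, 16, 30 }; // capped at 30

        for (int attempt = 0; ; attempt++)
        {
            PollResult poll = await service.PollAsync(token);
            if (poll.Completed)
                return poll.Result;

            // Honor the server's hint, but never poll faster than the backoff.
            TimeSpan hint = poll.RetryAt - DateTime.UtcNow;
            TimeSpan floor = TimeSpan.FromMinutes(
                backoffMinutes[Math.Min(attempt, backoffMinutes.Length - 1)]);
            await Task.Delay(hint > floor ? hint : floor);
        }
    }
}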
I think that for something which could take an hour to respond a web service is not the best mechanism to use.
Why is polling bad? Surely if you adjust the frequency of the polling it won't be so bad. Perhaps double the time between polls, with a maximum of about five minutes.
Some web services I've worked with return a "please try again in " XML message when they can't respond immediately. I realise that this is just a refinement of the polling technique, but if your server can determine at the time of the request what the likely delay is going to be, it could tell the client that and then forget about the request, leaving the client to ask again once the polling interval has expired.
There are timeouts on IIS and on the client side that will prevent you from leaving the connection open.
This is also not practical, because resources/connections are blocked on the server.
Why do you want the user to wait for such a long-running task? Let them look up the status of the operation somewhere.