Disable Kestrel (dotnet asp.net core server) request queuing

Kestrel (the ASP.NET Core server) queues requests when too many arrive at once. I want it to return a 503 rather than queue them, to avoid timeouts. We have
.UseKestrel(options => { options.Limits.MaxConcurrentConnections = 100; })
But with more than 100 requests it still queues them up, and some requests just time out.

The MaxConcurrentConnections property specifies the number of connections the Kestrel server can accept before it starts rejecting connections.
In other words, MaxConcurrentConnections caps how many connections may be open at once. In the example above, Kestrel starts dropping new connections once it has accepted 100 and is still processing them.
https://github.com/aspnet/AspNetCore/blob/b31bdd43738a55e10bb38336406ee0db56c66b44/src/Servers/Kestrel/Core/src/Middleware/ConnectionLimitMiddleware.cs#L32-L39
If your site receives fewer than 10 requests per second and you process each request within 5 seconds, you will be fine: that is at most about 50 requests in flight at once, well under the limit of 100 (assuming roughly one request per connection).
Also, there is no option to specify a custom HTTP error code. The server terminates the TCP connection abruptly. Your client should detect and handle the network error.
Also see this open issue: https://github.com/aspnet/AspNetCore/issues/4777
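Since Kestrel itself cannot return a custom status code here, one possible workaround is to count in-flight requests in the application and answer 503 once a threshold is reached. The sketch below is only an illustration: RequestGate is a hypothetical helper, not part of Kestrel, and the middleware wiring in the trailing comment assumes the usual app.Use overload.

```csharp
using System.Threading;

// Hypothetical helper (not part of Kestrel): counts in-flight requests so the
// application can answer 503 itself instead of letting work pile up.
public sealed class RequestGate
{
    private readonly int _limit;
    private int _inFlight;

    public RequestGate(int limit) => _limit = limit;

    // Returns true if the caller may proceed; false if the limit is reached.
    public bool TryEnter()
    {
        if (Interlocked.Increment(ref _inFlight) > _limit)
        {
            Interlocked.Decrement(ref _inFlight); // undo our own increment
            return false;
        }
        return true;
    }

    // Must be called exactly once for every successful TryEnter.
    public void Leave() => Interlocked.Decrement(ref _inFlight);
}

// Possible wiring in Startup.Configure (assumption, shown as a comment only):
//
//   var gate = new RequestGate(100);
//   app.Use(async (context, next) =>
//   {
//       if (!gate.TryEnter())
//       {
//           context.Response.StatusCode = 503; // Service Unavailable
//           return;
//       }
//       try { await next(); }
//       finally { gate.Leave(); }
//   });
```

This limits requests rather than connections, so MaxConcurrentConnections would presumably need to stay above the gate's limit for the 503 to be reached before Kestrel drops the connection.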

Related

429 Too many requests only production server side, not localhost, not browser

I read this post: C# (429) Too Many Requests
and I understood the response code, but why is this status code returned only when the call is made server side (backend) in production (hosted)? The service never returns this code when I call the same service from Chrome's address bar, or when I make the call server side (backend) from my localhost.
CASE 1 (works fine from localhost; the service URL is not localhost, it is hosted)
App A (localhost) calls App B (hosted) --> works fine
for (int i = 0; i < 1000; i++)
{
    HttpClient client = new HttpClient();
    client.BaseAddress = new Uri(url);
    client.DefaultRequestHeaders.Accept.Add(new MediaTypeWithQualityHeaderValue("application/json"));
    String response = client.GetStringAsync(urlParameters).Result;
    client.Dispose();
}
CASE 2 (works fine)
Chrome browser calls App B (hosted) --> works fine
CASE 3 (similar to case 1 but with far fewer requests - does NOT work)
App A (hosted) calls App B (hosted) --> 429
Why? What is the problem? How can I solve it?

What's Happening
The HTTP 429 response code indicates you have been rate limited. The idea is to prevent one caller from overwhelming a service, making it less available to other callers.
Most Common
That limiting can be based on many things. Most common are
Number of calls per unit time (usually per second)
Number of concurrent calls
The General Case
A rate limiter may also forgive a short burst of calls that happens occasionally, may allow more calls before hitting the brakes based on who you are (using your IP or an API key for example), dynamically adjust its limits based on total system load, or do other things.
Probably Happening Here
Based on your description, I would guess the number of concurrent calls could be causing production rate limiting. Rather than hitting the external API hard trying to guess what the rules are, try reaching out to them to ask. If that is not an option, running multiple requests in parallel could validate this theory.
Handling
A great way to deal with this is to back off your requests when you receive an HTTP 429.
The service should return a Retry-After header indicating how many seconds you should wait before trying again. If it does, wait that long before resubmitting your request.
If the service does not provide that header (I work with a major one that does not), use exponential backoff instead.
Depending on your needs, you may want to tell your own caller to try again later (return an HTTP 429 yourself) or you may want to queue up pending requests and work off the queue to submit them all.
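The choice between Retry-After and exponential backoff described above could be sketched roughly as follows. Backoff.NextDelay, the base delay, and the cap are hypothetical names and values chosen for illustration; nothing here comes from the service in question.

```csharp
using System;

public static class Backoff
{
    // Picks the wait before the next attempt (attempt is 1-based).
    // If the server supplied Retry-After (in seconds), honor it; otherwise
    // double a base delay on every attempt, capped at maxDelay.
    public static TimeSpan NextDelay(int attempt, int? retryAfterSeconds,
                                     TimeSpan baseDelay, TimeSpan maxDelay)
    {
        if (retryAfterSeconds.HasValue)
            return TimeSpan.FromSeconds(retryAfterSeconds.Value);

        // Clamp the exponent so the shift below cannot overflow.
        int exponent = Math.Min(Math.Max(attempt - 1, 0), 30);
        double ms = baseDelay.TotalMilliseconds * (1L << exponent);
        return ms >= maxDelay.TotalMilliseconds
            ? maxDelay
            : TimeSpan.FromMilliseconds(ms);
    }
}
```

With HttpClient, the header value can usually be read from response.Headers.RetryAfter?.Delta. Adding a little random jitter to the computed delay is a common refinement when many callers back off at the same time.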
Preventing
If you know the rate limits, you can pre-emptively limit your outbound call rate so you get into this situation less often.
For call-per-second limits, you can use a counter variable that you reset (in a thread-safe way) every second. If the known call limit would be exceeded, calculate when the counter will reset (store a timestamp when it does) and delay processing that long.
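A minimal sketch of that counter approach, assuming a fixed one-second window. PerSecondLimiter and its injectable clock are hypothetical names introduced for illustration only.

```csharp
using System;

// Hypothetical fixed-window limiter: at most `limit` calls per one-second window.
public sealed class PerSecondLimiter
{
    private readonly object _lock = new object();
    private readonly int _limit;
    private readonly Func<DateTime> _now;   // injectable clock, handy for tests
    private DateTime _windowStart;
    private int _count;

    public PerSecondLimiter(int limit, Func<DateTime> now = null)
    {
        _limit = limit;
        _now = now ?? (() => DateTime.UtcNow);
        _windowStart = _now();
    }

    // Returns TimeSpan.Zero if the call may go out now (and counts it),
    // otherwise how long to wait until the counter resets.
    public TimeSpan Reserve()
    {
        lock (_lock)
        {
            DateTime now = _now();
            if (now - _windowStart >= TimeSpan.FromSeconds(1))
            {
                _windowStart = now;   // new window: store when it began
                _count = 0;
            }
            if (_count < _limit)
            {
                _count++;
                return TimeSpan.Zero;
            }
            return _windowStart + TimeSpan.FromSeconds(1) - now;
        }
    }
}
```

A caller would loop: call Reserve(), and while the result is greater than zero, await Task.Delay for that long and call Reserve() again before sending the request.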
For a concurrent-call limit, a SemaphoreSlim works nicely. Set the maximum count to whatever your concurrent rate limit is. Acquire the semaphore before making a request and release it (in a finally block) after your call completes.
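The SemaphoreSlim pattern just described might look something like this. ConcurrencyLimiter and RunAsync are hypothetical names; the maximum is whatever the remote service's concurrent-call limit turns out to be.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class ConcurrencyLimiter
{
    private readonly SemaphoreSlim _slots;

    // maxConcurrent is the remote service's concurrent-call limit.
    public ConcurrencyLimiter(int maxConcurrent) =>
        _slots = new SemaphoreSlim(maxConcurrent, maxConcurrent);

    public async Task<T> RunAsync<T>(Func<Task<T>> call)
    {
        await _slots.WaitAsync();   // acquire a slot before making the request
        try
        {
            return await call();
        }
        finally
        {
            _slots.Release();       // always give the slot back, even on failure
        }
    }
}
```

A call site could then wrap each outbound request, for example limiter.RunAsync(() => client.GetStringAsync(url)), with a single limiter instance shared across all callers.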
If you have multiple servers subject to the same rate limit (e.g. if rate limiting is based on an API key rather than IP address), it gets harder to self-limit, but you can set self-limiting parameters (calls per second and concurrent calls) in a configuration file, and tune them over time to maximize your throughput without hitting excessive HTTP 429s.

gRPC - bidirectional stream goes to TRANSIENT_FAILURE if idle for too long

I'm trying to get a fairly simple test scenario to work - I'd like to create a long-lived bidirectional streaming rpc that may sit idle for long periods of time (electron app with local server).
A Node gRPC client starts a C# gRPC server locally and initiates a bidirectional stream. The streaming service receives each message, waits 50 ms, and sends it back.
The Node client test code is set up to send 5 messages, wait 30 seconds, and then send 5 more messages. The first 5 messages successfully roundtrip. The second 5 messages eventually roundtrip, but not until 5 minutes later. The server side code is not hit during this time.
I'm sure I'm being a baboon here, but I don't understand why the connection seems to be dying so fast. I'm also not sure what options could help here, if any. It seems like keepalive is intended for tracking whether the TCP connection is still alive, but doesn't actually help keep it alive. idleTimeout doesn't seem relevant either, because we're going to TRANSIENT_FAILURE status according to the enum documentation here.
This discussion from 2016 is close to what I'm trying to do, but the solution was a RYO heartbeat. This grpc-dotnet issue seems to rely on a heartbeat-type solution specific to ASP.NET, which is not currently used.
gRPC server logs:
After the first 5 messages are sent:
transport 000001A7B5A63090 set connectivity_state=4
Start BDP ping err..."Endpoint read failed" (paraphrasing)
5 minutes later right before the second set of 5 messages comes through:
W:000001A7B5AC8A10 SERVER [ipv6:[::1]:57416] state IDLE -> WRITING [RETRY_SEND_PING]
Node library is @grpc/grpc-js
tl;dr How can I keep the connection healthy & working in the case of downtime?

MongoDB connection problems on Azure

We have an ASP.NET MVC application deployed to an Azure Website that connects to MongoDB and does both read and write operations. The application does this iteratively. A few thousand times per minute.
We initialize the C# driver using Autofac and we set the MaxConnectionIdleTime to 45 seconds as suggested in https://groups.google.com/forum/#!topic/mongodb-user/_Z8YepNHnbI and a few other places.
We are still getting a large number of the below error:
Unable to read data from the transport connection: A connection
attempt failed because the connected party did not properly respond
after a period of time, or established connection failed because
connected host has failed to respond. Method
Message:":{"ClassName":"System.IO.IOException","Message":"Unable to
read data from the transport connection: A connection attempt failed
because the connected party did not properly respond after a period of
time, or established connection failed because connected host has
failed to respond.
We get this error while connecting to both a MongoDB instance deployed on a VM in the same datacenter/region on Azure and also while connecting to an external PaaS MongoDB provider.
I run the same code in my local computer and connect to the same DB and I don't receive these errors. It's only when I deploy the code to an Azure Website.
Any suggestions?

A few thousand requests per minute is a big load, and the only way to do it right is by controlling and limiting the maximum number of threads which could be running at any one time.
As there's not much information posted as to how you've implemented this, I'm going to cover a few possible circumstances.
Time to experiment...
The constants:
Items to process:
50 per second, or in other words...
3,000 per minute, and one more way to look at it...
180,000 per hour
The variables:
Data transfer rates:
How much data you can transfer per second is going to play a role no matter what we do, and this will vary throughout the day.
The only thing we can do is fire off more requests from different CPUs to distribute the weight of traffic we're sending back and forth.
Processing power:
I'm assuming you have this in a WebJob as opposed to having this coded inside the MVC site itself. Running it inside the MVC site is highly inefficient and not fit for the purpose that you're trying to achieve. By using a WebJob we can queue work items to be processed by other WebJobs. The queue in question is Azure Queue Storage.
Azure Queue storage is a service for storing large numbers of messages
that can be accessed from anywhere in the world via authenticated
calls using HTTP or HTTPS. A single queue message can be up to 64 KB
in size, and a queue can contain millions of messages, up to the total
capacity limit of a storage account. A storage account can contain up
to 200 TB of blob, queue, and table data. See Azure Storage
Scalability and Performance Targets for details about storage account
capacity.
Common uses of Queue storage include:
Creating a backlog of work to process asynchronously
Passing messages from an Azure Web role to an Azure Worker role
The issues:
We're attempting to complete 50 transactions per second, so each transaction should be done in under 1 second if we were utilising 50 threads. Our 45 second time out serves no purpose at this point.
We're expecting 50 threads to run concurrently, and all complete in under a second, every second, on a single cpu. (I'm exaggerating a point here, just to make a point... but imagine downloading 50 text files every single second. Processing it, then trying to shoot it back over to a colleague in the hopes they'll even be ready to catch it)
We need retry logic in place: if after 3 attempts the item isn't processed, it needs to be placed back into the queue. Ideally we should give the server more time to respond with each failure than just one second. Let's say we gave it a 2 second break on the first failure, then 4 seconds, then 10; this will greatly increase the odds of persisting / retrieving the data that we needed.
We're assuming that our MongoDB can handle this number of requests per second. If you haven't already, start looking at ways to scale it out. The issue isn't that it's MongoDB (the data layer could have been anything); it's that making this number of requests from a single source is the most likely cause of your issues.
The solution:
Set up a WebJob and name it EnqueueJob. This WebJob will have one sole purpose: to queue items of work to be processed in the Queue Storage.
Create a Queue Storage Container named WorkItemQueue, this queue will act as a trigger to the next step and kick off our scaling out operations.
Create another WebJob named DequeueJob. This WebJob will also have one sole purpose, to dequeue the work items from the WorkItemQueue and fire out the requests to your data store.
Configure the DequeueJob to spin up once an item has been placed inside the WorkItemQueue, start 5 separate threads on each and while the queue is not empty, dequeue work items for each thread and attempt to execute the dequeued job.
Attempt 1, if fail, wait & retry.
Attempt 2, if fail, wait & retry.
Attempt 3, if fail, enqueue item back to WorkItemQueue
Configure your website to autoscale out to x number of CPUs (note that your website and WebJobs share the same resources)
Here's a short 10 minute video that gives an overview on how to utilise queue storages and web jobs.
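The three-attempt rule from the steps above could be sketched like this. RetryThenRequeue and TryProcessAsync are hypothetical names, and the Azure queue calls are deliberately left to the caller so the example stays self-contained.

```csharp
using System;
using System.Threading.Tasks;

public static class RetryThenRequeue
{
    // Runs `work` up to waits.Length + 1 times, pausing waits[i] after the
    // i-th failure. Returns true on success; false means every attempt
    // failed and the caller should enqueue the item again.
    public static async Task<bool> TryProcessAsync(Func<Task> work,
                                                   params TimeSpan[] waits)
    {
        for (int attempt = 0; attempt <= waits.Length; attempt++)
        {
            try
            {
                await work();
                return true;
            }
            catch (Exception)   // a real job would catch only transient errors
            {
                if (attempt == waits.Length) break;   // out of attempts
                await Task.Delay(waits[attempt]);     // wait, then retry
            }
        }
        return false;
    }
}
```

Passing TimeSpan.FromSeconds(2) and TimeSpan.FromSeconds(4) gives three attempts with the growing pauses suggested earlier; when the method returns false, the DequeueJob would put the item back on WorkItemQueue.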
Edit:
Another reason you may be getting those errors could be because of two other factors as well, again caused by it being in an MVC app...
If you're compiling the application with the DEBUG attribute applied but pushing the RELEASE version instead, you could be running into issues due to the settings in your web.config. Without the DEBUG attribute, an ASP.NET web application will run a request for a maximum of 90 seconds (110 seconds by default on .NET Framework 2.0 and later); if the request takes longer than this, it will be aborted.
To increase the timeout beyond that default you will need to change the httpRuntime element in your web.config...

<httpRuntime executionTimeout="300" />
The other thing that you need to be aware of is the request timeout between your browser and the web app. If you insist on keeping the code in MVC as opposed to extracting it into a WebJob, you can use the following code to fire a request off to your web app and extend the timeout of the request.
// Requires: using System; using System.IO; using System.Net;
string html = string.Empty;
string uri = "http://google.com";
HttpWebRequest request = (HttpWebRequest)WebRequest.Create(uri);
// Timeout is an int in milliseconds, so convert the TimeSpan
request.Timeout = (int)TimeSpan.FromMinutes(5).TotalMilliseconds;
using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
using (Stream stream = response.GetResponseStream())
using (StreamReader reader = new StreamReader(stream))
{
    html = reader.ReadToEnd();
}

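Are you using MongoDB in a VM? It seems to be a network problem. Transient faults of this kind are to be expected, so the best you can do is implement a retry pattern or use a library such as Polly to do that:
var retryPolicy = Policy
    .Handle<IOException>()
    .Retry(3, (exception, retryCount) =>
    {
        // do something, e.g. log the failure before the next attempt
    });
// Wrap the MongoDB call so it is retried up to 3 times on IOException
retryPolicy.Execute(() => { /* MongoDB operation goes here */ });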
https://github.com/michael-wolfenden/Polly

UploadOperation timeout restrictions

UploadOperation has the following timeout restrictions:
1.) When establishing a new connection for an upload over TCP/SSL, the connection attempt is aborted if not established within five minutes.
2.) After the connection has been established, an HTTP request message that has not received a response within two minutes is aborted.
Is it normal for POST and PUT requests to be limited in time like this (for example, if I want to upload a big file)?
Is there a way to overwrite that "two minutes" value to allow longer requests?

The 0x80072EE2 exception is ERROR_INTERNET_TIMEOUT. That happens because the server has not read data from the request in 2 minutes, or because the server has not sent response data in 2 minutes.
If the server keeps reading or writing data constantly, the connection does not time out.

wcf - difference between MaxConcurrentCalls and MaxConcurrentSessions property

After reading http://msdn.microsoft.com/en-us/library/system.servicemodel.description.servicethrottlingbehavior.maxconcurrentsessions.aspx
and
http://msdn.microsoft.com/en-us/library/system.servicemodel.description.servicethrottlingbehavior.maxconcurrentcalls.aspx
I have concluded that:
MaxConcurrentSessions is the number of queued sessions per client (default of 10)
MaxConcurrentCalls is the number of active connections on the service (default of 16), i.e. all clients accessing the service at any one time, meaning that if 2 clients made 10 calls each, 4 would have to wait in the queue for processing.
Questions:
Is my conclusion correct?
How does MaxConnections interact with these?
Does MaxConnections take precedence over the MaxConcurrentX settings?
(Note: I am using .NET 3.5)

MaxConcurrentCalls has to do with the number of calls on the service that are currently executing.
MaxConnections has to do with the total number of open connections on the service, regardless if the service is executing anything for the connection.
For example, if a client opens a connection to the service, calls a method, and is waiting for the method to return, it will count against the MaxConcurrentCalls. As soon as the service returns a response to the client’s method call, it will not count against the MaxConcurrentCalls… even if you didn’t close the client-side proxy. Assuming you didn’t close the client-side proxy, the connection would count towards the MaxConnections on the service since you still have the connection open, but it’s not currently executing anything on the service so it would not count against the MaxConcurrentCalls.
