HttpRequest throttling between two web apps - c#

I have two web apps running .NET Core 2.1, self-hosted, with NancyFx (they run on Azure, but the behavior is the same when I run them locally).
App1 is just a proxy: it collects client requests, does authentication, forwards them to App2, and then sends the responses back to the clients. This architecture is not likely to change due to some internal limitations.
App1 uses HttpClient and HttpClientFactory for forwarding.
Now the problem. I run a load test which generates a lot of requests. In the logs I see that App2 is generally fast enough at handling them, but when the load increases, App1 still runs into timeouts. I investigated the failing requests and see roughly the following:
8:30:00 Client request came to App1
8:30:01 HttpClient in App1 posted request to App2
8:30:31 Request accepted by App2 (What? 30 seconds when both apps are on localhost?!)
8:32:00 In App2:
  ERROR|Microsoft.AspNetCore.Server.Kestrel|Connection id "0HLFE17I2N0C2", Request id "0HLFE17GH3G4R:00000001": An unhandled exception was thrown by the application.
  System.OperationCanceledException: The operation was canceled.
  In App1: 503
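To see where the 30 seconds go on App1's side, here is a sketch of a hypothetical DelegatingHandler (TimingHandler is not part of the original setup) that logs when a forwarded request enters App1's outgoing HTTP pipeline and when it comes back. If the "sending" line is already late relative to the client's arrival, the time was spent inside App1 before forwarding; if the gap is between "sending" and "finished", it is between App1's HttpClient and App2.

```csharp
using System;
using System.Diagnostics;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical diagnostic handler: logs the start and end of every outgoing request.
public sealed class TimingHandler : DelegatingHandler
{
    private readonly Action<string> _log;

    // Use this constructor with IHttpClientFactory, which sets InnerHandler itself.
    public TimingHandler(Action<string> log)
    {
        _log = log;
    }

    // Use this constructor when building an HttpClient by hand.
    public TimingHandler(Action<string> log, HttpMessageHandler innerHandler)
        : base(innerHandler)
    {
        _log = log;
    }

    protected override async Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var stopwatch = Stopwatch.StartNew();
        _log($"{DateTime.Now:HH:mm:ss.fff} sending {request.Method} {request.RequestUri}");
        try
        {
            return await base.SendAsync(request, cancellationToken).ConfigureAwait(false);
        }
        finally
        {
            // Runs on success, failure and cancellation alike.
            _log($"{DateTime.Now:HH:mm:ss.fff} finished {request.Method} {request.RequestUri} after {stopwatch.ElapsedMilliseconds} ms");
        }
    }
}
```

With the factory, the handler could be attached to the forwarding client with something like services.AddHttpClient("app2").AddHttpMessageHandler(() => new TimingHandler(Console.WriteLine)); the client name "app2" is a placeholder.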
Some research:
- In App2, CPU and RAM usage remain low.
- In App2, database access is fast and all queries execute in well under 1 second.
- In dotTrace I see that a lot of time is spent in Nancy.NancyEngine.HandleRequest.
- I added ServicePointManager.DefaultConnectionLimit = 100; to both apps.
- I added <connectionManagement><add address="http://localhost:5001" maxconnection="100" /></connectionManagement>
- I tried using HttpClient without the factory, passing in an HttpClientHandler with MaxConnectionsPerServer = 100.
I added this to App2:
builder.UseKestrel(options =>
{
options.Listen(IPAddress.Loopback, 5001, listenOptions =>
{
listenOptions.KestrelServerOptions.Limits.MaxConcurrentConnections = 400;
listenOptions.KestrelServerOptions.Limits.MaxConcurrentUpgradedConnections = 400;
});
});
Nothing helps. It looks to me like throttling, but where? The IP of App1 doesn't change, of course, and perhaps App2 doesn't like so many requests from one address. But the behavior is the same locally, so it's not some Azure protection.
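For reference, a minimal sketch of the handler-level connection limit tried above, with hypothetical helper names (App2ClientFactory, CreateHandler, CreateClient). As far as I know, on .NET Core HttpClient does not read ServicePointManager or the <connectionManagement> config section; the per-server limit is a property of the handler itself.

```csharp
using System;
using System.Net.Http;

// Hypothetical helper: builds App1's forwarding client with an explicit
// per-server connection limit on its handler.
public static class App2ClientFactory
{
    public static HttpClientHandler CreateHandler(int maxConnectionsPerServer) =>
        new HttpClientHandler
        {
            // Upper bound on simultaneous connections to a single endpoint (App2).
            MaxConnectionsPerServer = maxConnectionsPerServer
        };

    public static HttpClient CreateClient(Uri app2BaseAddress, int maxConnectionsPerServer) =>
        new HttpClient(CreateHandler(maxConnectionsPerServer))
        {
            BaseAddress = app2BaseAddress
        };
}
```

With IHttpClientFactory, the same handler could be supplied through services.AddHttpClient("app2").ConfigurePrimaryHttpMessageHandler(() => App2ClientFactory.CreateHandler(100)), where "app2" is a placeholder client name.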

Related

Net Core 3.1 IHttpClientFactory/HttpClient Slow on First Request

I have a .Net Core 3.1 app. In the app, I use the IHttpClientFactory to create an HttpClient. When I make a call using SendAsync, the first request takes over 2 seconds whereas subsequent requests take less than 100 ms. This is not acceptable performance for a production application.
I have also noticed that it happens if I don't make any requests for a while. I came across the PooledConnectionIdleTimeout property, which defaults to 2 minutes, and I can extend that time, but that would only work for pooled connections that already exist, not when needing to create a new one.
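The PooledConnectionIdleTimeout property mentioned above is set on SocketsHttpHandler (.NET Core 2.1 and later). A minimal sketch with a hypothetical helper name; as the question says, it only keeps existing connections alive longer and does not make creating a new connection any faster.

```csharp
using System;
using System.Net.Http;

// Hypothetical helper: a handler whose idle pooled connections are kept
// for a configurable time before being closed.
public static class PooledHandlerFactory
{
    public static SocketsHttpHandler CreateHandler(TimeSpan pooledConnectionIdleTimeout) =>
        new SocketsHttpHandler
        {
            // How long a connection may sit unused in the pool before it is disposed.
            PooledConnectionIdleTimeout = pooledConnectionIdleTimeout
        };
}
```

It could be attached to the named client with services.AddHttpClient("HttpClient", ...).ConfigurePrimaryHttpMessageHandler(() => PooledHandlerFactory.CreateHandler(TimeSpan.FromMinutes(10))); the 10-minute value is only an example. Note that the factory also rotates handlers (HandlerLifetime, 2 minutes by default, adjustable with SetHandlerLifetime), and a new handler starts with an empty pool.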
I configure the HttpClient in my Startup.cs as such:
services.AddHttpClient("HttpClient",
h =>
{
h.BaseAddress = new Uri(Configuration["PythonUrl"]);
});
Use the HttpClient like this:
var client = httpClientFactory.CreateClient("HttpClient");
var res = await client.GetAsync(nameof(Accounts).ToLower() + "/" + id.ToString() + "/");
This happens when Configuration["PythonUrl"] contains the PC name, like this:
{
"AllowedHosts": "*",
"PythonUrl": "http://PC202003261059:8000/",
"url": "http://*:5000"
}
The HttpClient's first request becomes very slow. Can anything be done to avoid this?
I had a similar problem and found a solution that works well for me.
Note that this solution will only work for IIS hosted WebApps.
When hosting an application in IIS, the AppPool is responsible for actually running it. However, it only starts the application when it receives the first request. This is fine in most situations and is therefore the default setting, but this leads to a slow first request.
- In IIS Manager, right-click the Application Pool that runs your app and select 'Advanced Settings'. Set 'Start Mode' to 'Always Running'.
- Right-click your Site and, under 'Manage Website', select 'Advanced Settings'. Set 'Preload Enabled' to 'true'.
This should make the first request faster and won't put your AppPool to sleep after some time. It should also be noted that this may impact the performance of other apps hosted on the same server.
This article helped me with this.
Disclaimer: the first request might still be slower for several reasons, some of which may be client-side (like DNS requests) or network-related (lost packets, a poor internet connection, etc.).
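Since the slow case in the question is the one where PythonUrl contains a machine name, one way to check whether name resolution accounts for the delay is to time it on its own. A hypothetical diagnostic sketch (DnsTiming is not from the original posts): if resolving the host alone takes most of the 2 seconds, the cost is in name resolution rather than in HttpClient, and trying an IP address in PythonUrl would be a quick way to test that hypothesis.

```csharp
using System;
using System.Diagnostics;
using System.Net;
using System.Threading.Tasks;

// Hypothetical diagnostic: measures how long resolving the host of a base URL takes.
public static class DnsTiming
{
    public static async Task<(IPAddress[] Addresses, TimeSpan Elapsed)> ResolveAsync(Uri baseUrl)
    {
        var stopwatch = Stopwatch.StartNew();
        // Only name resolution is timed here; no HTTP connection is opened.
        IPAddress[] addresses = await Dns.GetHostAddressesAsync(baseUrl.DnsSafeHost);
        stopwatch.Stop();
        return (addresses, stopwatch.Elapsed);
    }
}
```

For example, DnsTiming.ResolveAsync(new Uri(Configuration["PythonUrl"])) run twice in a row shows whether the first lookup is much slower than a cached one.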

Disable Kestrel (dotnet asp.net core server) request queuing

Kestrel (the ASP.NET Core server) queues requests if too many arrive at once. I want it to return a 503 instead of queuing, to avoid timeouts. We have:
.UseKestrel(options => { options.Limits.MaxConcurrentConnections = 100; })
But with more than 100 requests it still queues them, and some requests just time out.
The MaxConcurrentConnections property specifies the number of connections the Kestrel server can accept before it starts rejecting connections.
So, in other words, MaxConcurrentConnections specifies the queue length. In the example above, the server starts dropping connections once it has accepted 100 and is still processing them.
https://github.com/aspnet/AspNetCore/blob/b31bdd43738a55e10bb38336406ee0db56c66b44/src/Servers/Kestrel/Core/src/Middleware/ConnectionLimitMiddleware.cs#L32-L39
If your site receives fewer than 10 requests per second and processes each request within 5 seconds, you will be fine.
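The rule of thumb above can be checked with Little's law (my framing, not the original answer's): the average number of requests in flight, L, equals the arrival rate, λ, times the time each request spends in the server, W. Assuming roughly one connection per in-flight request:

```latex
L = \lambda W
  = 10\ \tfrac{\text{requests}}{\text{s}} \times 5\ \text{s}
  = 50\ \text{requests in flight}
  < 100 = \texttt{MaxConcurrentConnections}
```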
Also, there is no option to specify a custom HTTP error code. The server terminates the TCP connection abruptly, so your client should detect and handle the network error.
Also refer this open issue: https://github.com/aspnet/AspNetCore/issues/4777

Kestrel request per second issue

I'm a newbie to ASP.NET Core.
I'm writing a web API service which stores posted data in a database. In the future the server is expected to receive about 300-400 requests per second, and the response time must be less than 10 seconds.
But first of all, I'm trying to run some load tests with Locust.
I wrote a simple app with one controller and only one POST method, which simply returns Ok() without any processing.
I tried to generate load on this service with 1000 users. My service runs on Ubuntu 16.04 with .NET Core 2.1 (2 Xeon 8175M with 8 GB of RAM). Locust runs from a dedicated computer.
But I see only ~400 RPS and a response time of about 1400 ms, which is very large for an empty action.
I turned off all logging and run in production mode, but no luck: still ~400 RPS.
In the system monitor (I use nmon) I see that each CPU is loaded only 12-15% (24-30% in total). I have about 3 GB of free RAM, almost no network usage (about 200-300 KB/s) and no disk usage, so the system has the hardware resources to handle the requests.
So I think there is a problem with some configuration, or maybe with system resources like sockets, handles, etc.
I also tried libuv instead of managed sockets, but the result is the same.
In the Kestrel configuration I explicitly set Limits.MaxConcurrentConnections and Limits.MaxConcurrentUpgradedConnections to null (although that is the default value).
So I have two questions:
- In theory, can Kestrel provide high RPS?
- If so, can you give me some advice on a starting point (links, articles and so on)?

Azure ASP .net WebApp The request timed out

I have deployed an ASP.NET MVC web app to Azure App Service.
I make a GET request from my site to a controller method which gets data from the DB (DbContext). Sometimes getting the data from the DB may take more than 4 minutes, which means my request has no activity for more than 4 minutes. After that, Azure kills the connection and I get this message:
500 - The request timed out. The web server failed to respond within the specified time.
Here is an example method:
[HttpGet]
public async Task<JsonResult> LongGet(string testString)
{
var task = Task.Delay(360000);
await task;
return Json("Woke", JsonRequestBehavior.AllowGet);
}
I have seen a lot of questions like this, but I got no answer:
Not working 1
I can't post other links because my reputation is too low.
I have read this article. It's about Azure Load Balancer, which is not available for web apps, but it says that the common way of handling my problem in an Azure web app is to use TCP keep-alive. So I changed my method:
[HttpGet]
public async Task<JsonResult> LongPost(string testString)
{
ServicePointManager.SetTcpKeepAlive(true, 1000, 5000);
ServicePointManager.MaxServicePointIdleTime = 400000;
ServicePointManager.FindServicePoint(Request.Url).MaxIdleTime = 4000000;
var task = Task.Delay(360000);
await task;
return Json("Woke", JsonRequestBehavior.AllowGet);
}
But still get same error.
I am using a simple GET request like:
GET /Home/LongPost?testString="abc" HTTP/1.1
Host: longgetrequest.azurewebsites.net
Cache-Control: no-cache
Postman-Token: bde0d996-8cf3-2b3f-20cd-d704016b29c6
So I am looking for an answer: what am I doing wrong, and how can I increase the request timeout in an Azure Web App? Any help is appreciated.
Azure settings in the portal:
Web sockets - On
Always On - On
App settings:
SCM_COMMAND_IDLE_TIMEOUT = 3600
WEBSITE_NODE_DEFAULT_VERSION = 4.2.3
230 seconds. That's it. That's the in-flight request timeout in Azure App Service. It's hardcoded in the platform, so with or without TCP keep-alives, you're still bound by it.
Source -- see David Ebbo's answer here:
https://social.msdn.microsoft.com/Forums/en-US/17305ddc-07b2-436c-881b-286d1744c98f/503-errors-with-large-pdf-file?forum=windowsazurewebsitespreview
There is a 230 second (i.e. a little less than 4 mins) timeout for requests that are not sending any data back. After that, the client gets the 500 you saw, even though in reality the request is allowed to continue server side.
Without knowing more about your application, it's difficult to suggest a different approach. However, what's clear is that you do need a different approach.
Maybe return a 202 Accepted instead, with a Location header to poll for the result later?
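A minimal sketch of the server-side bookkeeping behind that suggestion, assuming an in-memory store is acceptable (a single instance that is not recycled while jobs run). JobStore, StartJob and Poll are hypothetical names, not an Azure or ASP.NET API. The action that starts the work would return 202 Accepted with a Location header such as /jobs/{id}, and the polling action would map Poll's status code onto its response, so no single request comes near the 230-second limit.

```csharp
using System;
using System.Collections.Concurrent;
using System.Net;
using System.Threading.Tasks;

// Hypothetical in-memory job tracking for a "202 Accepted + poll" API.
public sealed class JobStore
{
    private readonly ConcurrentDictionary<Guid, Task<string>> _jobs =
        new ConcurrentDictionary<Guid, Task<string>>();

    // Starts the long-running work in the background and returns its id immediately.
    public Guid StartJob(Func<Task<string>> work)
    {
        var id = Guid.NewGuid();
        _jobs[id] = Task.Run(work);
        return id;
    }

    // 404 for an unknown id, 202 while running, 200 with the result when done,
    // 500 if the work failed or was canceled.
    public (HttpStatusCode Status, string Result) Poll(Guid id)
    {
        if (!_jobs.TryGetValue(id, out var task))
            return (HttpStatusCode.NotFound, null);
        if (!task.IsCompleted)
            return (HttpStatusCode.Accepted, null);
        if (task.Status == TaskStatus.RanToCompletion)
            return (HttpStatusCode.OK, task.Result);
        return (HttpStatusCode.InternalServerError, null);
    }
}
```

In this sketch completed jobs are never removed, and results are lost if the process recycles or the site scales out; a durable store would be needed to avoid that.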
I just changed my Azure Web Site from the Shared environment to Standard, and it works.

Invalid or expired security context token in WCF web service

All,
I have a WCF web service (let's call it service "B") hosted under IIS using a service account (VM, Windows 2003 SP2). The service exposes an endpoint that uses WSHttpBinding with the default values, except for maxReceivedMessageSize, maxBufferPoolSize, maxBufferSize and some of the timeouts, which have been increased.
The web service has been load tested using Visual Studio Load Test framework with around 800 concurrent users and successfully passed all tests with no exceptions being thrown. The proxy in the unit test has been created from configuration.
There is a SharePoint application that uses the Office SharePoint Server Search service to call web services "A" and "B". The application gets data from service "A" to create a request that is sent to service "B". The response coming from service "B" is indexed for search. The proxy is created programmatically using the ChannelFactory.
When service "A" takes less than 10 minutes, the calls to service "B" are successful. But when service "A" takes longer (~20 minutes), the calls to service "B" throw the following exception:
Exception Message: An unsecured or incorrectly secured fault was received from the other party. See the inner FaultException for the fault code and detail
Inner Exception Message: The message could not be processed. This is most likely because the action 'namespace/OperationName' is incorrect or because the message contains an invalid or expired security context token or because there is a mismatch between bindings. The security context token would be invalid if the service aborted the channel due to inactivity. To prevent the service from aborting idle sessions prematurely increase the Receive timeout on the service endpoint's binding.
The binding settings are the same, and the clocks on both the client server and the web service server are synchronized with the Windows Time service, in the same time zone.
When I look at the server where web service "B" is hosted, I can see the following security errors being logged:
Source: Security
Category: Logon/Logoff
Event ID: 537
User NT AUTHORITY\SYSTEM
Logon Failure:
Reason: An error occurred during logon
Logon Type: 3
Logon Process: Kerberos
Authentication Package: Kerberos
Status code: 0xC000006D
Substatus code: 0xC0000133
After reading some blogs online, I found that the status code means STATUS_LOGON_FAILURE and the substatus code means STATUS_TIME_DIFFERENCE_AT_DC. But I already checked both the server and client clocks, and they are synchronized.
I also noticed that the security token seems to be cached somewhere on the client server, because they have another process that calls web service "B" using the same service account and successfully gets data the first time it is called. Then they start the process to update the Office SharePoint Server Search service indexes, and it fails. Then, if they call the first process again, it fails too.
Has anyone experienced this type of problems or have any ideas?
Regards,
--Damian
10 minutes is the default receive timeout. If a proxy has been idle for more than 10 minutes, the server aborts that proxy's security session. Enable logging and you will see this in the server's diagnostics log. The error message you reported fits this behavior.
Search your system diagnostics file for "SessionIdleManager". If you find it, the above is your problem.
Give it a whirl: set establishSecurityContext="false" on both the client and the server.
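For readers who build the binding in code rather than in config (the question says the proxy is created programmatically with ChannelFactory), a sketch of the equivalent settings on .NET Framework WCF. This is a configuration fragment, not a complete program; the 30-minute value is an assumption chosen only to exceed the ~20 minutes service "A" can take.

```csharp
using System;
using System.ServiceModel;

// Sketch: code equivalent of establishSecurityContext="false" on a WSHttpBinding.
// The same setting must be applied on both the client and the service binding.
var binding = new WSHttpBinding(SecurityMode.Message);

// Secure each message on its own instead of establishing a security session
// (security context token) that the service can abort while the proxy is idle.
binding.Security.Message.EstablishSecurityContext = false;

// Alternative suggested by the error message itself: keep the session and raise
// the service binding's idle limit (ReceiveTimeout, 10 minutes by default), e.g.
// binding.ReceiveTimeout = TimeSpan.FromMinutes(30);
```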
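Don't call the service operation in a using statement. Instead, use a pattern such as:
var client = new ServiceClient("Ws<binding>");
try
{
    client.Operation(x, y);
    client.Close();
}
catch (Exception)
{
    // Close() can throw when the channel is faulted; Abort() tears it down immediately.
    client.Abort();
    throw;
}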
I don't understand why this works, but I would guess that when the proxy goes out of scope in the using statement, Close isn't called. The service then waits until receiveTimeout (on the binding) has expired and then aborts the connection, causing subsequent calls to fail.
What I believe is happening here is that your channel is timing out (as you suspect).
If I understand correctly, it is not the calls to service A that are timing out, but rather the calls to service B, before you call your operation.
I'm guessing that you are creating your channel before you call service A, rather than just in time (i.e. before calling service B). You should create the channel (proxy, service client) just before you use it, like this:
AResponse aResp = null;
BResponse bResp = null;
using (ServiceAProxy proxyA = new ServiceAProxy())
{
aResp = proxyA.DoServiceAWork();
using (ServiceBProxy proxyB = new ServiceBProxy())
{
bResp = proxyB.DoOtherWork(aResp);
}
}
return bResp;
I believe, however, that once you get past that problem (service B timing out), you'll find that the SharePoint app's proxy (the one that called service A) will time out.
To solve that, you may wish to change your service model from a request-response, to a publish-subscribe model.
With long-running services, you'll want your SharePoint app to subscribe to service A, and have service A publish its results when it is ready to do so, regardless of how long it takes.
Programming WCF Services (O'Reilly) by Juval Löwy has a great explanation, and IDesign (Juval's company) has published a great set of coding standards for WCF, as well as the code for a great Publish-Subscribe Framework.
Hope this helps,
Assaf.
I actually triggered this error just now by doing something silly. I have a unit test that modifies the system date in order to test some time-based features. And I guess the apparent time difference between when I created the context and when I called my method (because of the changes to the system date), caused something to expire.
