Kestrel + nginx intermittent 502 - C#

I have an ASP.NET Core 1.0 app running on Ubuntu 16.04 behind nginx/1.10.0. However, I notice that nginx intermittently logs the following error (in the nginx error log):
2017/06/08 05:19:19 [error] 11572#11572: *119049 upstream prematurely closed connection while reading response header from upstream, client: <ipaddress>, server: <servername>, request: "POST /<uri> HTTP/1.1", upstream: "http://127.0.0.1:5000/<uri>", host: "<servername>"
which results in a 502 Bad Gateway for the client, even though the request completes successfully in the application:
2017-06-08 05:19:14.399 [DBG] Executed action method "<Controller.Action>", returned result "Microsoft.AspNetCore.Mvc.CreatedAtRouteResult
Could this be an nginx configuration issue or an ASP.NET/Kestrel issue? Can someone give me hints on how to debug this?
Further notes:
I also see the following in the application log just after the request completes:
2017-06-08 05:19:14.399 [DBG] Executed action method "<Controller.Action>", returned result "Microsoft.AspNetCore.Mvc.CreatedAtRouteResult
2017-06-08 05:19:18.632 [DBG] Some connections failed to close gracefully during server shutdown.
2017-06-08 05:19:22.402 [DBG] Hosting starting
2017-06-08 05:19:22.689 [DBG] Hosting started
and the nginx error occurs at around the same time (2017/06/08 05:19:19).
Also, for normal requests the logs contain messages about executing the ObjectResult and the elapsed time, which I don't see for the failing request:
2017-06-08 00:10:14.391 [DBG] Selected output formatter '"Microsoft.AspNetCore.Mvc.Formatters.JsonOutputFormatter"' and content type '"application/json"' to write the response.
2017-06-08 00:10:14.391 [INF] Executing ObjectResult, writing value "Microsoft.AspNetCore.Mvc.ControllerContext".
2017-06-08 00:10:14.504 [INF] Executed action "<Controller.Action>" in 1875.3845ms
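One way to check the hunch above (that the 502 coincides with the app shutting down and restarting) is to line up the nginx error timestamp with lifecycle messages in the application log. The C# sketch below is a hypothetical helper, not part of ASP.NET Core; it assumes both logs use the timestamp formats shown above and the same clock and time zone.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical helper: finds app-log lifecycle messages close to an nginx error.
public static class RestartCorrelator
{
    // nginx error-log lines start with "yyyy/MM/dd HH:mm:ss" (19 characters).
    public static DateTime ParseNginx(string line) =>
        DateTime.ParseExact(line.Substring(0, 19), "yyyy/MM/dd HH:mm:ss", CultureInfo.InvariantCulture);

    // Application log lines start with "yyyy-MM-dd HH:mm:ss.fff" (23 characters).
    public static DateTime ParseApp(string line) =>
        DateTime.ParseExact(line.Substring(0, 23), "yyyy-MM-dd HH:mm:ss.fff", CultureInfo.InvariantCulture);

    // Messages from the log above that indicate the host shutting down or starting up.
    static readonly string[] Markers =
    {
        "failed to close gracefully", "Hosting starting", "Hosting started"
    };

    // Returns lifecycle lines whose timestamp is within `window` of the nginx error.
    public static List<string> LifecycleEventsNear(string nginxLine, IEnumerable<string> appLines, TimeSpan window)
    {
        DateTime errorTime = ParseNginx(nginxLine);
        return appLines
            .Where(line => Markers.Any(marker => line.Contains(marker)))
            .Where(line => (ParseApp(line) - errorTime).Duration() <= window)
            .ToList();
    }
}
```

If lifecycle lines keep turning up next to each 502, that would suggest the process is being stopped and restarted while requests are in flight, and the next place to look is whatever is restarting it.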

Related

Query health check endpoints in ASP.NET

I implemented a health check endpoint following this doc. My ASP.NET application is dockerized and runs using docker-compose, with the port mapped/exposed.
Question: I am not sure how to query the health check endpoint from clients such as Postman.
When I send a GET request to the /healthz endpoint as follows, Postman throws the following error:
http://host.docker.internal:1200/healthz
Error: Client network socket disconnected before secure TLS connection was established
while the Docker container logs show the following:
[05:01:03 DBG] Connection id "0HMLSHSV1HRD2" accepted.
[05:01:03 DBG] Connection id "0HMLSHSV1HRD2" started.
[05:01:03 INF] Request starting HTTP/1.1 GET http://host.docker.internal:1200/healthz - -
[05:01:03 DBG] Wildcard detected, all requests with hosts will be allowed.
[05:01:03 VRB] All hosts are allowed.
[05:01:03 DBG] 1 candidate(s) found for the request path '/healthz'
[05:01:03 DBG] Request matched endpoint 'Health checks'
[05:01:03 DBG] Static files was skipped as the request already matched an endpoint.
[05:01:03 DBG] Https port '1200' loaded from configuration.
[05:01:03 DBG] Redirecting to 'https://host.docker.internal:1200/healthz'.
[05:01:03 DBG] Connection id "0HMLSHSV1HRD2" completed keep alive response.
[05:01:03 INF] Request finished HTTP/1.1 GET http://host.docker.internal:1200/healthz - - - 307 0 - 89.5220ms
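One reading of this log: the request is answered with a 307 redirect to https://host.docker.internal:1200/healthz, i.e. the same port 1200, because "Https port '1200' loaded from configuration". If port 1200 only serves plain HTTP, a client that follows the redirect would attempt a TLS handshake against a non-TLS port, which would fit the Postman error. The C# sketch below only reproduces how such a redirect target is composed (https scheme, same host and path, configured HTTPS port); HttpsRedirectSketch is a hypothetical helper, not the middleware's actual code.

```csharp
using System;

public static class HttpsRedirectSketch
{
    // Hypothetical helper: builds an HTTP -> HTTPS redirect target the way the log line
    // suggests: keep host and path, switch the scheme, use the configured HTTPS port.
    public static Uri BuildHttpsRedirect(Uri requestUri, int httpsPort)
    {
        var builder = new UriBuilder(requestUri)
        {
            Scheme = Uri.UriSchemeHttps,
            // -1 tells UriBuilder to use the scheme's default port (443), which is then omitted.
            Port = httpsPort == 443 ? -1 : httpsPort
        };
        return builder.Uri;
    }
}
```

With an HTTPS port of 1200 the target keeps port 1200, matching the "Redirecting to" line above.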

Timeout on a WCF call, but getting an HTTP 500(64) error

I have an application (in a production environment) that makes a lot of concurrent (multithreaded) calls to WCF services (.NET Framework 4.0, SOAP BasicHttpBinding). Sometimes, the last request throws a TimeoutException:
System.TimeoutException: The request channel timed out while waiting for a reply after...
I can't reproduce it in a local environment, so it's difficult to try out changes. Anyway, I've increased the timeout, but the exception is still thrown (just later, evidently). I've traced the server looking for inter-thread locks or Oracle locks, but I couldn't find anything.
I've activated internal code traces and deduced that the request never reached my server code. Looking at the IIS logs, I found an HTTP error logged about 2 minutes after the request:
sc-status sc-substatus sc-win32-status time-taken
500 0 64 118265
However, the timeout exception is thrown later, depending on the WCF binding configuration. So I have two questions:
Why doesn't WCF catch that HTTP error instead of continuing to wait for a response? I've searched for the error, but my serviceThrottling parameters are already set to high values.
Why does the server throw that error without the request ever reaching my server code, not even an IDispatchMessageInspector I've implemented to log some data on request and response?
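For reference, the IIS fields above decode as status 500, substatus 0, Win32 status 64 (ERROR_NETNAME_DELETED, "The specified network name is no longer available") and a time-taken of 118265 ms, just under 2 minutes. Win32 status 64 is commonly logged when the network connection was closed before IIS finished with the request. A small C# sketch for decoding these four fields; IisStatusFields is a hypothetical helper and its code table only lists the values needed here.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for the IIS log fields
// "sc-status sc-substatus sc-win32-status time-taken".
public sealed class IisStatusFields
{
    public int Status { get; private set; }
    public int SubStatus { get; private set; }
    public int Win32Status { get; private set; }
    public TimeSpan TimeTaken { get; private set; }

    // Only the codes relevant here; 64 is ERROR_NETNAME_DELETED.
    static readonly Dictionary<int, string> Win32Names = new Dictionary<int, string>
    {
        [0] = "ERROR_SUCCESS",
        [64] = "ERROR_NETNAME_DELETED",
    };

    public string Win32Name =>
        Win32Names.TryGetValue(Win32Status, out string name) ? name : "unknown";

    public static IisStatusFields Parse(string fields)
    {
        string[] parts = fields.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return new IisStatusFields
        {
            Status = int.Parse(parts[0]),
            SubStatus = int.Parse(parts[1]),
            Win32Status = int.Parse(parts[2]),
            // IIS records time-taken in milliseconds.
            TimeTaken = TimeSpan.FromMilliseconds(long.Parse(parts[3])),
        };
    }
}
```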

Upload S3 from on-premise host?

I'm following this guide, https://docs.aws.amazon.com/AmazonS3/latest/dev/HLuploadFileDotNet.html,
to upload files from a local machine to an S3 bucket in a VPC. The application is also being tested and run on the on-premises machine.
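using Amazon;
using Amazon.S3;
using Amazon.S3.Transfer;
var s3Client = new AmazonS3Client(RegionEndpoint.USEast2);
var fileTransferUtility = new TransferUtility(s3Client);
await fileTransferUtility.UploadAsync(@"c:\tmp\test.txt", "bucketName"); // verbatim string: in a regular literal "\t" is a tab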
However, the code gets the following error.
A socket operation was attempted to an unreachable network
Do I need to specify a URL?
Here is the network traffic captured by Fiddler (with Fiddler running, the code throws a different exception):
GET http://1xx.1xx.1xx.2xx/latest/meta-data/iam/security-credentials HTTP/1.1
Host: 1xx.1xx.1xx.2xx
HTTP/1.1 503 Service Unavailable
Cache-Control: no-cache
Pragma: no-cache
Content-Type: text/html; charset=utf-8
Proxy-Connection: Keep-Alive
Connection: Keep-Alive
Content-Length: 787
Network Error
Network Error (tcp_error)
A communication error occurred: "Operation timed out"
The Web Server may be down, too busy, or experiencing other problems preventing it from responding to requests. You may wish to try again at a later time.
For assistance, contact your network support team.
.aws\config
[default]
region = USWest2
I had the same error today, even though I had a valid $USERPROFILE\.aws\credentials file. It was actually because $USERPROFILE\AppData\Local\AWSToolkit\RegisteredAccounts.json couldn't be decrypted (not sure why), which causes the AWS SDK to think you don't have local credentials, so it tries to connect to the EC2 metadata URL, http://169.254.169.254/latest/meta-data/. On a local development machine that URL isn't reachable. For me, deleting the $USERPROFILE\AppData\Local\AWSToolkit\RegisteredAccounts.json file did the trick. FWIW, I only managed to figure this out by reading through the source of the AWS SDK...
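Before deleting anything, it can help to confirm which of the two files mentioned above are actually present. A minimal C# sketch; AwsLocalFiles is a hypothetical helper that only reports whether each file exists. It does not read or decrypt them, so it cannot tell whether RegisteredAccounts.json is the unreadable one.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class AwsLocalFiles
{
    // The two local files discussed above, relative to the user profile directory.
    public static string[] CandidatePaths(string userProfile) => new[]
    {
        Path.Combine(userProfile, ".aws", "credentials"),
        Path.Combine(userProfile, "AppData", "Local", "AWSToolkit", "RegisteredAccounts.json"),
    };

    // Maps each candidate path to whether it exists; existence says nothing about validity.
    public static Dictionary<string, bool> Inspect(string userProfile)
    {
        var result = new Dictionary<string, bool>();
        foreach (string path in CandidatePaths(userProfile))
            result[path] = File.Exists(path);
        return result;
    }
}
```

On Windows it would be called as AwsLocalFiles.Inspect(Environment.GetEnvironmentVariable("USERPROFILE")).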

Owin Self-Hosted WebApi Timeout Settings

I have an Owin self-hosted Web API server, and I'm wondering whether I need to change timeout settings for huge file downloads.
The client I'm using reads the response with HttpCompletionOption.ResponseHeadersRead.
During debugging, after I stopped for some time in a breakpoint, I got an exception on client side while trying to read from a received stream:
Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.
While debugging I can reproduce this issue. It happens after around 30 seconds waiting in a breakpoint, after the Get-Request to the server returned.
Is this due to some kind of idle timeout, because I'm paused at a breakpoint and not consuming the received stream? Or can it also happen while I'm reading from the stream, if my collection is slow and it takes too long?
Very old question but may help whoever hits the same wall.
I had the same problem with streaming content and found the initial clue in the HTTPERR folder (C:\Windows\System32\LogFiles\HTTPERR):
2016-08-12 09:17:52 ::1%0 60095 ::1%0 8000 HTTP/1.1 GET /endpoint/audiostream/0/0/streamer.mp3 - - - Timer_MinBytesPerSecond -
2016-08-12 09:18:19 ::1%0 60118 ::1%0 8000 HTTP/1.1 GET /endpoint/audiostream/0/0/streamer.mp3 - - - Request_Cancelled -
The HttpListener behind the Owin host has a TimeoutManager property that lets you change most timeouts and limits. The only way I found to get my web app's HttpListener instance was through the app's Properties dictionary:
using Microsoft.Owin.Host.HttpListener;
// Inside Startup.Configuration(IAppBuilder app):
var listener = (OwinHttpListener)app.Properties["Microsoft.Owin.Host.HttpListener.OwinHttpListener"];
listener.Listener.TimeoutManager.MinSendBytesPerSecond = uint.MaxValue;
According to the Owin codebase, setting MinSendBytesPerSecond to uint.MaxValue simply disables this check.
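When hunting for these entries in a large HTTPERR folder, a small C# sketch can tally the reason phrases. HttpErrReasons is a hypothetical helper; it assumes, as in the excerpt above, that the reason phrase is the second-to-last space-separated field, and it skips the '#' header lines that W3C-style log files start with.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class HttpErrReasons
{
    // Assumption: the reason phrase is the second-to-last field, as in the excerpt above.
    public static string Reason(string line)
    {
        string[] fields = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        return fields.Length >= 2 ? fields[fields.Length - 2] : null;
    }

    // Tallies reason phrases, ignoring '#' header lines and empty ("-") reasons.
    public static Dictionary<string, int> CountReasons(IEnumerable<string> lines) =>
        lines.Where(line => !line.StartsWith("#"))
             .Select(line => Reason(line))
             .Where(reason => reason != null && reason != "-")
             .GroupBy(reason => reason)
             .ToDictionary(group => group.Key, group => group.Count());
}
```

Comparing the Timer_MinBytesPerSecond count before and after changing MinSendBytesPerSecond is a quick way to see whether the change had any effect.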

SignalR with Redis Backplane Behind F5 - StatusCode: 400, ReasonPhrase: 'Bad Request'

I'm using SignalR version 2.1.2 with SignalR.Redis 2.1.2 on Server 2012 R2, IIS 8.5 with WebSockets enabled.
Everything runs perfectly in my development environment. I can even stand up copies of the site on different servers (e.g. http machine1/myapp/signalr, http machine2/myapp/signalr) configured to use the same backplane, and both UIs get messages published to them perfectly.
I then moved "myapp" to our next environment, which is a cluster of 2 machines behind an F5 load balancer, with a DNS alias set up to route to the F5, which then round-robins "myapp". The website itself can connect to SignalR just fine and can receive published messages it subscribes to, BUT when I try to publish to the site via the alias (e.g. http myappalias/signalr), I get a 400 Bad Request error response. Here is an example of the error:
InnerException: Microsoft.AspNet.SignalR.Client.Infrastructure.StartException
_HResult=-2146233088
_message=Error during start request. Stopping the connection.
HResult=-2146233088
IsTransient=false
Message=Error during start request. Stopping the connection.
InnerException: System.AggregateException
_HResult=-2146233088
_message=One or more errors occurred.
HResult=-2146233088
IsTransient=false
Message=One or more errors occurred.
InnerException: Microsoft.AspNet.SignalR.Client.HttpClientException
_HResult=-2146233088
_message=StatusCode: 400, ReasonPhrase: 'Bad Request', Version: 1.1, Content: System.Net.Http.StreamContent, Headers:
{
Pragma: no-cache
Transfer-Encoding: chunked
X-Content-Type-Options: nosniff
Persistent-Auth: true
Cache-Control: no-cache
Date: Thu, 13 Nov 2014 22:30:22 GMT
Server: Microsoft-IIS/8.5
X-AspNet-Version: 4.0.30319
X-Powered-By: ASP.NET
Content-Type: text/html
Expires: -1
}
Here is some test code I'm using to publish test messages to each environment; it fails on "connection.Start().Wait()":
using System;
using Microsoft.AspNet.SignalR.Client;
class Program
{
static void Main(string[] args)
{
var connection = new HubConnection("http://myappalias/signalr");
connection.Credentials = System.Net.CredentialCache.DefaultNetworkCredentials;
var proxy = connection.CreateHubProxy("MyAppHub");
connection.Start().Wait();
ConsoleKeyInfo key = Console.ReadKey();
do
{
proxy.Invoke("NewMessage", new Message() { Payload = "Hello" }).Wait();
Console.WriteLine("Message fired.");
key = Console.ReadKey();
} while (key.Key != ConsoleKey.Escape);
}
}
Now, if I don't use "myappalias" and instead hit the server directly, it works perfectly. It appears that either the F5 is the problem, the client needs to be configured differently for this scenario, or I have to do something different when setting up SignalR's startup class. Here is an example of the startup class I'm using:
[assembly: OwinStartup(typeof(MyApp.Startup))]
namespace MyApp
{
public class Startup
{
private static readonly ILog log = LogManager.GetLogger
(System.Reflection.MethodBase.GetCurrentMethod().DeclaringType);
public void Configuration(IAppBuilder app)
{
try
{
log.Debug(LoggingConstants.Begin);
string redisServer = ConfigurationManager.AppSettings["redis:server"];
int redisPort = Convert.ToInt32(ConfigurationManager.AppSettings["redis:port"]);
HubConfiguration configuration = new HubConfiguration();
configuration.EnableDetailedErrors = true;
configuration.EnableJavaScriptProxies = false;
configuration.Resolver = GlobalHost.DependencyResolver.UseRedis(redisServer, redisPort, string.Empty, "MyApp");
app.MapSignalR("/signalr", configuration);
log.Info("SIGNALR - Startup Complete");
}
finally
{
log.Debug(LoggingConstants.End);
}
}
}
}
I downloaded the client source code and wired it in directly instead of the NuGet package, so I could step through everything. It seems it successfully negotiates, then attempts to "connect" with the SSE and then the LongPolling transports, but fails with both.
Question 1.1
Does anyone know of an alternative to SignalR for .NET that supports scaling with load balancing in a less "I want to pull my hair out" kind of way?
It should not be necessary to configure source address affinity to use SignalR behind a load balancer. It's certainly not wrong to set up session affinity, but that doesn't fix your underlying problem.
If you look closely at the content of the 400 response, you will probably see a message similar to "The ConnectionId is in the incorrect format."
SignalR uses the server's machine key to create an anti-CSRF token, but this requires that all the servers in your farm share a machine key so that the token can be decrypted when SignalR requests hop between servers. The /negotiate request that you see succeed is the request that retrieves the anti-CSRF token. When the SignalR client then uses the anti-CSRF token to make a /connect request, it fails because the /connect request is processed by a different server that didn't create the token and is unable to decrypt it.
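The reasoning above can be illustrated with a toy model: a token protected with one server's key only validates on a server holding the same key. The C# sketch below uses an HMAC-SHA256 tag as a stand-in for machine-key protection; it is not SignalR's actual token format or algorithm, and FarmTokenDemo and the connection id are hypothetical.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class FarmTokenDemo
{
    // A "server" issues a token: payload plus an HMAC-SHA256 tag computed with its key.
    public static string Issue(byte[] serverKey, string connectionId)
    {
        using (var hmac = new HMACSHA256(serverKey))
        {
            byte[] tag = hmac.ComputeHash(Encoding.UTF8.GetBytes(connectionId));
            return connectionId + "." + Convert.ToBase64String(tag);
        }
    }

    // Another "server" accepts the token only if recomputing the tag with its own key matches.
    // Base64 never contains '.', so the last '.' separates payload and tag.
    public static bool Validate(byte[] serverKey, string token)
    {
        int dot = token.LastIndexOf('.');
        if (dot < 0) return false;
        string expected = Issue(serverKey, token.Substring(0, dot));
        // A real implementation should compare in constant time; ordinal equality keeps the demo short.
        return string.Equals(expected, token, StringComparison.Ordinal);
    }

    // Random 256-bit key standing in for one server's machine key.
    public static byte[] NewKey()
    {
        var key = new byte[32];
        using (var rng = RandomNumberGenerator.Create()) rng.GetBytes(key);
        return key;
    }
}
```

A token issued with server A's key fails validation with server B's key, and passes as soon as B holds a copy of A's key.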
This explains why setting up session affinity fixed your problem, but sharing a machine key will help you avoid this problem even if something goes wrong with session affinity.
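Sharing a machine key means every server in the farm carries the same <machineKey> element under <system.web> in its web.config. A C# sketch that generates one such element with random keys; MachineKeyGenerator is hypothetical, and the HMACSHA256/AES choice and key sizes (64-byte validation key, 32-byte decryption key) are assumptions rather than something the answer specifies.

```csharp
using System;
using System.Security.Cryptography;

public static class MachineKeyGenerator
{
    // Cryptographically random bytes rendered as uppercase hex.
    static string RandomHex(int byteCount)
    {
        var bytes = new byte[byteCount];
        using (var rng = RandomNumberGenerator.Create()) rng.GetBytes(bytes);
        return BitConverter.ToString(bytes).Replace("-", "");
    }

    // One <machineKey> element to copy, unchanged, into web.config on every node.
    public static string Generate() =>
        "<machineKey validationKey=\"" + RandomHex(64) +
        "\" decryptionKey=\"" + RandomHex(32) +
        "\" validation=\"HMACSHA256\" decryption=\"AES\" />";
}
```

Run it once and copy the same output to every node; generating a separate key per server would reproduce the original problem.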
Here is an issue filed on GitHub by someone who experienced a similar problem: https://github.com/SignalR/SignalR/issues/2292.
The problem was fixed by switching the profile for "MyApp" in the F5 to use the F5's built-in "source_addr" profile as a parent profile, with a timeout of 1 hour. Here is a description of what that profile does:
Source address affinity persistence: Also known as simple persistence, source address affinity persistence supports TCP and UDP protocols, and directs session requests to the same server based solely on the source IP address of a packet.
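To make the quoted description concrete, here is a toy C# model of source-address persistence with an idle timeout: the first request from an IP is assigned round-robin, and later requests from the same IP stick to that server until the entry has been idle longer than the timeout. This is only an illustration under those assumptions, not how the F5 implements its persistence table; SourceAddressAffinity and the server names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public sealed class SourceAddressAffinity
{
    private sealed class Entry { public string Server; public DateTime LastSeen; }

    private readonly string[] _servers;
    private readonly TimeSpan _idleTimeout;
    private readonly Dictionary<string, Entry> _table = new Dictionary<string, Entry>();
    private int _next; // round-robin cursor for clients without a live entry

    public SourceAddressAffinity(string[] servers, TimeSpan idleTimeout)
    {
        _servers = servers;
        _idleTimeout = idleTimeout;
    }

    public string Route(string sourceIp, DateTime now)
    {
        if (_table.TryGetValue(sourceIp, out Entry entry) && now - entry.LastSeen <= _idleTimeout)
        {
            entry.LastSeen = now; // each request refreshes the persistence entry
            return entry.Server;
        }
        // No entry, or it has expired: pick the next server round-robin and remember it.
        string server = _servers[_next];
        _next = (_next + 1) % _servers.Length;
        _table[sourceIp] = new Entry { Server = server, LastSeen = now };
        return server;
    }
}
```

The model also hints at why affinity alone is fragile here: if a client's entry expires, or its source address changes, between /negotiate and /connect, it can land on a different server, which is why sharing a machine key is the sturdier fix.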
EDIT
This ended up "working" for a while, but if I deploy a publisher (something that simply publishes through the SignalR client) without republishing the Hub, the publisher times out trying to connect over and over again. Ugh.
