HealthChecks failing after several hours

I've implemented health checks in one of our web apps:
services.AddHealthChecks()
.AddSqlServer(connectionString.ConnectionString, null, HealthCheckName); // Sql HealthCheck
What I've noticed is that we get the following error at least once a day:
An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full. (directaccess-.....****) An operation on a socket could not be performed because the system lacked sufficient buffer space or because a queue was full.
Then the app will restart and function again. Did anyone run into this issue before?

Option -- 1
Application Insights appears to be integrated with this app, so review the Application Insights data to identify why the application code threw custom exceptions or why the app took a long time to load.
Please follow these instructions to view Application Insights data.
Go to Application Insights blade for this App.
Click on View Application Insights Data.
Option -- 2
If the issue is happening right now, collect .NET Profiler trace to troubleshoot the issue. A profiler trace helps you easily identify the ExceptionType, message and callstack for a .NET exception without installing any additional tools and without changing the state of the problem. Profiler trace helps you identify exceptions in both ASP.NET and ASP.NET Core applications.
Please follow these instructions to collect a profiler trace.
Go to App Service Diagnostics
Choose Diagnostic Tools.
Click on the Collect .NET Profiler Trace tile and follow the instructions.
(The Collect .NET Profiler Trace tile is enabled only for ASP.NET and ASP.NET Core applications. If your app is an ASP.NET app and you don't see this tile, choose the application stack from the top right.)
Option -- 3
If the issue is not reproducible or intermittent, you can configure AutoHealing's custom action to collect some data (like profiler trace or memory dump) that will help you debug the issue further. The triggers and actions allow you to define various conditions based on request count, slow requests, memory limit on which you can take specific actions like restarting the process, logging an event, or starting another executable.
Please follow these instructions to configure an autohealing rule.
Go to App Service Diagnostics
Choose Diagnostic Tools.
Click on the Auto Healing tile under the Proactive Tools category.
Configure a rule based on your scenario.

Related

Why is my azure webapp request sometimes slow

My Azure web application sometimes responds very slowly. It waits a few seconds before executing the request.
Of course I have the setting "always on" turned on.
It's running on a S2 service plan.
Avg users online 3
No vertical or horizontal scaling configured.
Application
Asp.net MVC
.net Framework 4.6.1
C#
Does anyone have an idea why this problem occasionally occurs?

OK, based on your picture I see a wait time of 98.71%, much of it spent in the compiler, so I would recommend using precompiled views in your MVC app to avoid compiling the views at runtime. If you are using Azure DevOps, you should be able to change your build task and add the following options to the MSBuild arguments:
/p:PrecompileBeforePublish=true /p:UseMerge=true /p:SingleAssemblyName=AppCode

When the web app is slow, it is important to understand which HTTP requests are slow and whether they are slow all the time or only intermittently. How do the CPU and memory metrics look, and what is the pattern of slowness? If you have Application Insights enabled, navigate to the "Performance" tab to see which requests are slow and whether they depend on an external component.
Collecting a CLR profiler trace while the slowness is occurring will reveal where the time is spent.
You can navigate to Azure Portal-->WebApp-->Diagnose and solve problem blade-->Diagnostic tools-->Autoheal and enable the rule to collect the CLR profiler traces on slowness.
Once the rule triggers it will collect the profiler traces and build a report for your review.

Is the tracing built in to ASP.Net Web Api 2 only meant for a non-production environment?

I read the Global Error Handling recommendations and the Tracing in Web API 2 articles, and I understand how to set these things up. However, I noticed in the error handling part, that it states:
While Web API does have tracing infrastructure that captures error conditions the tracing infrastructure is for diagnostics purposes and is not designed or suited for running in production environments. Global exception handling and logging should be services that can run during production and be plugged into existing monitoring solutions
I'm looking for clarification on this. Is this statement saying that errors should only be logged as part of the trace when not in production, or that a custom implementation of ITraceWriter should only be registered with the HttpConfiguration when not in production?
I would assume that the article says
not designed or suited for running in production environments
simply for the performance impact, but is there some different contextual info that I could see for a specific error by looking at the Exception on the TraceRecord vs. the Exception that gets passed into the IExceptionLogger?

Going by what was written, it's meant as a rudimentary form of tracing and logging that, while fine for development and diagnostic environments, is not suited to production for performance and feature reasons.
To be duly diligent, it advises using an out-of-process service (e.g. log4net in a separate process, or take your pick from Azure) to reduce the probability of logging failing due to a fault in the core process, to leave room to scale performance, and to allow a more feature-rich logging system than the default design provides.

Running a workflow within another workflow - Both terminates

I have been using windows workflow foundation and calling a workflow from another workflow.
Using .Net framework 4.6.1 along with MS SQL Server 2014. The whole scenario is working absolutely fine.
However, when I deploy it to a customer environment, it starts terminating without an exception or log trace. I have added detailed logging (log statements printed at different lines), but it terminates at a different statement each time.
As debugging in the customer environment is not a good idea, any pointers for identifying this issue would be appreciated.

IIS app pool crashing on Azure load-balanced VMs

We have a new ASP.NET website running on a pair of load balanced Azure VMs. The website is fairly simple and uses Kentico CMS. Twice in the 24 hours since going live the application pool on both web servers has suddenly stopped (within 5-10 minutes of each other) causing 503: Service unavailable errors.
Looking at Windows system logs I see the error which caused the problem:
Application pool '[[NAME]]' is being automatically disabled due to a
series of failures in the process(es) serving that application pool.
Leading up to this are a series of warnings:
A process serving application pool '[[NAME]]' suffered a fatal
communication error with the Windows Process Activation Service. The
process id was '[[PROCESS ID]]'. The data field contains the error
number.
Evidently this is IIS's rapid-fail protection kicking in. What's not clear is how to find the cause of this "fatal communication error".
After some web searching I've installed the Debug Diagnostics Tool which has helped me identify that in every case the relevant process was the IIS worker process (w3wp.exe). This tool is new to me and unfortunately the only time the problem occurred since I installed it, no dumps were generated. However, its logs contain a lot of messages like this:
First chance exception - 0xe0434352 caused by thread with System ID:
[[ID]]
The frustrating thing is that I don't know what steps to take to replicate the error conditions. It never occurred in UAT in a very similar environment, even under load test. Here are some facts about my setup:
ASP.NET version = 4.5.2
Application pool running with identity set to a domain account with modify permission on the website directory
Application set with max one worker process
Any advice much appreciated.
* UPDATE 1 *
I now have DebugDiag dump generated by the "fatal communication error" warning event. Dump summary reads:
Dump Summary
------------
Process Name: w3wp.exe : C:\Windows\SysWOW64\inetsrv\w3wp.exe
Process Architecture: x86
Exception Code: 0xC00000FD
Exception Information: The thread used up its stack.
Heap Information: Present

In the end I tracked this down to a bug in my code. Under very edge-case circumstances the CMS was returning an empty Guid instead of an actual ID which was causing a stack overflow in a recursive method.
The 0xC00000FD exception code I posted above is actually a stack overflow exception, so once I knew that and downloaded the Debug Diagnostics dump file I was able to replicate the crash scenario locally. That tool, by the way, is incredibly powerful and was able to demonstrate the exact conditions of the crash.
All I can say to people who arrive here with a similar issue is: firstly, don't assume the issue is not with your code! And secondly, use Debug Diagnostics.
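For readers chasing the same 0xC00000FD crash, here is a minimal sketch of the kind of guard that keeps a parent-chain walk from recursing without bound when a lookup hands back an empty Guid. The `ParentWalker` class, the `FindRoot` method, and the dictionary of parent links are all hypothetical and are not Kentico APIs or the original code; they only illustrate the failure mode described above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical tree walk; the names are illustrative, not Kentico APIs.
public static class ParentWalker
{
    // Follows parent links until a node has no parent and returns that root ID.
    // Throws on malformed data instead of recursing or looping forever.
    public static Guid FindRoot(Guid start, IReadOnlyDictionary<Guid, Guid?> parentOf)
    {
        if (start == Guid.Empty)
            throw new ArgumentException("Empty Guid is not a valid node ID.", nameof(start));

        var visited = new HashSet<Guid>();
        var current = start;

        // Iterative loop: bad data cannot exhaust the thread's stack.
        while (true)
        {
            if (!visited.Add(current))
                throw new InvalidOperationException($"Cycle detected at node {current}.");

            if (!parentOf.TryGetValue(current, out var parent))
                throw new KeyNotFoundException($"Unknown node {current}.");

            if (parent is null)
                return current; // reached the root

            if (parent.Value == Guid.Empty)
                throw new InvalidOperationException($"Node {current} has an empty parent ID.");

            current = parent.Value;
        }
    }

    public static void Main()
    {
        var root = Guid.NewGuid();
        var child = Guid.NewGuid();
        var parents = new Dictionary<Guid, Guid?> { [root] = null, [child] = root };
        Console.WriteLine(FindRoot(child, parents) == root);
    }
}
```

An ordinary exception like these can be caught and logged, whereas a stack overflow takes down the whole w3wp.exe process, which is what trips rapid-fail protection.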

First of all, what are your app pool's regular recycle interval and overlapping settings in IIS? If these incidents occur when recycling is scheduled and overlapping is disabled, this behavior is to be expected. Even when overlapping is enabled, I'd guess that it is somehow connected to automatic recycling of the app pool, since both instances are impacted at roughly the same time and it occurs twice a day, and recycling can cause the warning you mentioned to be logged. (Here you might find how to disable logging this warning in case it is caused by automatic recycling.)
If that leads nowhere, you can find more details about the warning event here:
IIS Application Pool Availability
And about the Debug Diagnostics tool here:
How to use the Debug Diagnostics tool to troubleshoot an IIS process that stops unexpectedly

ASP.NET Custom errors for developer machines

Does this ever happen to you?
You are sitting at your development machine and you are made aware of an unhandled exception in a deployed asp.net application. You visit the deployed web app. You can't see the exception detail in your browser, because custom errors is set to remote only. So you have to login to the web server and instigate the exception.
Is there a built in way to turn custom errors off for certain remote clients?
This only happens to me for trivial applications where I haven't implemented a better solution, like ELMAH. But, it's still annoying when it happens.

Two things. First, if you don't have a sophisticated exception/logging policy already implemented, check out the Microsoft Patterns and Practices Enterprise Library - http://entlib.codeplex.com/ - this may be helpful in tracking down bugs in your software.
Second, at the very least, put some logging in the Application_Error event of your global.asax code-behind. You can capture the last unhandled exception by using something like:
Dim lastError As Exception = Server.GetLastError.GetBaseException
Then you can add custom error pages to your web.config and not worry about debugging from a yellow screen, but still capture any error details.
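For C# readers, here is a small sketch of what the `GetBaseException` call above gives you. It runs as a plain console program, so the wrapped exception is a stand-in for whatever `Server.GetLastError()` would return inside `Application_Error`; the `LastErrorDemo` class and `Describe` method are hypothetical names, and the logging destination is left out.

```csharp
using System;

public static class LastErrorDemo
{
    // Formats the innermost exception, which is what GetBaseException returns.
    public static string Describe(Exception lastError)
    {
        Exception root = lastError.GetBaseException();
        return $"{root.GetType().Name}: {root.Message}";
    }

    public static void Main()
    {
        // Stand-in for the wrapped exception ASP.NET would hand to Application_Error.
        var wrapped = new InvalidOperationException(
            "outer wrapper", new FormatException("bad input"));

        // Prints "FormatException: bad input"
        Console.WriteLine(Describe(wrapped));
    }
}
```

Logging the base exception rather than the outer one is usually what you want here, because the outer exception is often just a wrapper around the real failure.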
HTH

You can use remote debugging.
This MSDN Article discusses debugging strategies for ASP.NET. If you scroll down to the "Local and Remote Debugging" heading there's some information for you and a link to the remote debugging article.
Basically, you can debug a remote server from Visual Studio. Not recommended for production servers, but fine for staging servers.
