How to troubleshoot slow requests in an Azure .NET Web application

With this post, I want to help my team, and hopefully other .NET teams around the world, troubleshoot slow requests in an Azure Web application running as an API for an Angular frontend. The goal is to build a small guideline together with the community. There is a lot of scattered information out there, and each application is different, so maybe we can list some checks based on the experience of other teams.
Context:
We have an API running in a .NET Core 2.1.x Azure web application on:
P3v2 App Service plan
Separate S1 Web application for Angular frontend (out of scope of this post)
Azure SQL Database Standard S6
Azure Redis cache Standard 6GB
Application Insights
Important note: This post is about troubleshooting a generally fast-running API application that suddenly slows down. It is not about specific API calls: all requests are suddenly slow, so the slowdown cannot be linked directly to specific code.
The API application has an average load of 35k requests per 5 minutes (see Application Insights - Requests) and 300-400 simultaneous users.
Usually the requests have an average response time of a few milliseconds.
A slow application can have several causes. Please find below our first standard troubleshooting guideline:
Check the Memory and CPU usage of the Azure service plan. Is it too high?
Example: we once had out-of-memory exceptions because the dotnet.exe process was still running as 32-bit. Make sure you deploy your .NET application as 64-bit if it needs a lot of memory.
Check the database DTU usage. If it is too high, API requests can become slow because they are waiting for the database to reply.
Check the load on your Redis cache instance and see if it's not too high.
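For the 32-bit versus 64-bit check mentioned above, a minimal sketch of what the application could log about itself at startup or from a diagnostics endpoint. The class and method names (ProcessSnapshot, Describe) are made up for illustration; only standard .NET calls are used.

```csharp
using System;
using System.Diagnostics;

public static class ProcessSnapshot
{
    // Builds a one-line summary of the current process: bitness and memory use.
    public static string Describe()
    {
        using (Process process = Process.GetCurrentProcess())
        {
            long workingSetMb = process.WorkingSet64 / (1024 * 1024);
            long managedHeapMb = GC.GetTotalMemory(false) / (1024 * 1024);
            return $"64-bit process: {Environment.Is64BitProcess}, " +
                   $"working set: {workingSetMb} MB, managed heap: {managedHeapMb} MB";
        }
    }

    public static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

If the first value is False on a plan with a lot of memory, the process is limited to a 32-bit address space regardless of how much RAM the plan offers.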
Now, my question to the community: are there other things that can slow down the application significantly but do not show up in the three bullets above?
The API was suddenly responding very slowly yesterday; however, there was:
no high CPU usage and no high memory usage of the App Service plan
no high DTU usage of the database
no high usage of Redis cache
Maybe some examples I can think of:
Can sudden application exceptions on one particular request slow down other API requests without influencing the three bullets above?
Can too many requests, sent by one external client, slow down the application without influencing the three bullets above? If yes how can this be noticed?
Other things that can be checked?
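On the question of how to notice a single client sending too many requests: one possible way, assuming you can export a client identifier per request (IP address, API key, or similar) from your request logs for a given time window, is to compute each client's share of the traffic. The sketch below is only an illustration; the names ClientLoad and FindDominantClient, the 50% threshold, and the sample IPs are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ClientLoad
{
    // Returns the client whose share of the requests is at least `threshold`
    // (a fraction between 0 and 1), or null when no single client dominates.
    public static string FindDominantClient(IEnumerable<string> clientIds, double threshold)
    {
        var counts = clientIds
            .GroupBy(id => id)
            .Select(g => new { Client = g.Key, Count = g.Count() })
            .OrderByDescending(x => x.Count)
            .ToList();

        int total = counts.Sum(x => x.Count);
        if (total == 0) return null;

        var top = counts[0];
        return (double)top.Count / total >= threshold ? top.Client : null;
    }

    public static void Main()
    {
        // Hypothetical sample: client IPs taken from one window of request logs.
        var window = new[] { "10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.1", "10.0.0.3" };
        Console.WriteLine(FindDominantClient(window, 0.5) ?? "(none)"); // prints 10.0.0.1
    }
}
```

Comparing the result for a slow window against a normal window would show whether one client's share changed when the slowdown started.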

Related

Temporarily Scale Up Azure Web App RAM (e.g. App Service Plan) for Single Requests

We have an Azure web app used for internal reporting, and 99% of the time it can handle all the traffic / requests it needs to on the minimum pricing tier (3.5 GB RAM).
But there is one specific request to generate an Excel Report that temporarily requires ~8 GB of RAM to service (ClosedXML is a beast, and we've already minimized the peak RAM footprint in every way possible). Unfortunately, this requires not only the next pricing tier up (7GB) but the one after that, giving us 14 GB to play with.
This request only takes ~1 minute to service, so after trying everything else, I'm considering using Azure APIs to programmatically change the App Service Plan when the request comes in, wait the 10 seconds or so for it to kick in, then process the request, and scale back down afterwards.
Is this a sane approach, or is there some other feature I'm not aware of to temporarily perform a memory-hungry action? I considered an Azure function, but I've read those are limited to 1.5GB RAM... As far as I can tell, this work can't be subdivided up in any way without becoming an expert on manipulating the zipped-XML underlying Excel workbooks.

What you are trying to do sounds reasonable. We do something similar: before running massive monthly imports we scale up both the front-end functions and the back-end CosmosDB, and then scale back down once the import is done, so I don't think you will have any issues doing this.
On a side note, there is no fixed 1.5 GB limit on Azure Functions; it depends entirely on the underlying hosting solution. You can host a function on a P3V3 App Service Plan or even bigger dedicated plans and benefit from the resources they provide, but that is a different topic.

There is nothing out of the box for this in App Service (Plan). In a similar situation, we started with an Automation account but later moved to Logic Apps.
Use a Logic App as a request broker: for this specific kind of operation, invoke https://learn.microsoft.com/en-us/rest/api/appservice/app-service-plans/update to scale up the App Service plan, and scale back down after successful completion. By the way, we also put the Logic App behind APIM before exposing the URL!
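The scale up, run, scale down sequence described in this thread could be sketched as follows. This is only an outline under assumptions: setTierAsync is a hypothetical callback that would wrap the real plan update call (it is not an Azure SDK method), and the tier names are example values. The point of the sketch is the try/finally, so the plan is scaled back down even when the work fails.

```csharp
using System;
using System.Threading.Tasks;

public static class TemporaryScale
{
    // Scales up, runs the work, and always scales back down, even if the work throws.
    // `setTierAsync` is a hypothetical callback wrapping the real plan update call.
    public static async Task<T> RunAsync<T>(
        Func<string, Task> setTierAsync,
        string highTier,
        string normalTier,
        Func<Task<T>> work)
    {
        await setTierAsync(highTier);
        try
        {
            return await work();
        }
        finally
        {
            await setTierAsync(normalTier);
        }
    }

    public static async Task Main()
    {
        // Stand-in that only logs; a real broker would call the plan update here.
        Func<string, Task> fakeSetTier = tier =>
        {
            Console.WriteLine($"scaling to {tier}");
            return Task.CompletedTask;
        };

        string result = await RunAsync(fakeSetTier, "P3v2", "P1v2",
            () => Task.FromResult("report generated"));
        Console.WriteLine(result);
    }
}
```

Note that this sketch does not handle two overlapping requests (the first to finish would scale down while the second is still running), so a real broker would probably need a counter or a lock around the tier change.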

Why is my azure webapp request sometimes slow

My Azure web application sometimes reacts very slowly. It waits a few seconds before executing the request.
Of course I have the setting "always on" turned on.
It's running on a S2 service plan.
Avg users online 3
No vertical or horizontal scaling configured.
Application
Asp.net MVC
.net Framework 4.6.1
C#
Does anyone have an idea why this problem occasionally occurs?

OK, based on your picture I see a wait time of 98.71% and a lot of wait time from the compiler, so I would recommend using precompiled views in your MVC app to avoid runtime compilation of the views. If you are using Azure DevOps, you should be able to change the task that builds the solution and add the following options to the MSBuild arguments:
/p:PrecompileBeforePublish=true /p:UseMerge=true /p:SingleAssemblyName=AppCode

When the web app is slow, it is important to understand which HTTP requests are slow and whether they are slow all the time or only intermittently. How do the CPU and memory metrics look, and what is the pattern of the slowness? If you have Application Insights enabled, navigate to the "Performance" tab to see which requests are slow and whether they depend on an external component.
Collecting a CLR profiler trace while the slowness occurs will reveal where the time is spent.
You can navigate to Azure Portal-->WebApp-->Diagnose and solve problems blade-->Diagnostic tools-->Autoheal and enable a rule to collect CLR profiler traces on slowness.
Once the rule triggers it will collect the profiler traces and build a report for your review.

slow performance- IIS or application?

Our team has an Android application with a .NET C# backend, hosted in IIS.
Recently, we have observed sudden and unexplainable latencies for our customers, in the following scenario:
Without any warning, users are unable to change the channel (zapping), since the product does live media streaming, and they cannot even log out of the application
The mobile application, connected to another backend (also a C# backend), is working properly, without any problem
After some time (which varies from 6 hours for the first incident to 5 minutes for the last one), everything goes back to normal.
I have enabled Failed Request Tracing logs, to see if I can get anything from there, and I have results as follows:
<failedRequest url="https://ourDNS.com:443/servertime.aspx"
siteId="1"
appPoolId="DefaultAppPool"
processId="22232"
verb="POST"
remoteUserName=""
userName=""
tokenUserName="NT AUTHORITY\IUSR"
authenticationType="anonymous"
activityId="{80013C53-0802-B500-B63F-84710C7967BB}"
failureReason="TIME_TAKEN"
statusCode="200"
triggerStatusCode="0"
timeTaken="45141"
xmlns:freb="http://schemas.microsoft.com/win/2006/06/iis/freb"
>
The page described above is a simple page that first gets the server's timezone and then, after getting the customer's timezone (which can be set manually from the client), returns the exact date and time of the device where the application is hosted, for further calculations such as the stream program and what is playing now. However, this page, which returns a simple JSON with a string in it, sometimes takes more than 45 seconds (to me this is insane).
Another log, from the client side at the same moment, shows the following exception:
java.net.SocketTimeoutException
at java.net.PlainSocketImpl.read(PlainSocketImpl.java:491)
at java.net.PlainSocketImpl.access$000(PlainSocketImpl.java:46)
at java.net.PlainSocketImpl$PlainSocketInputStream.read(PlainSocketImpl.java:240)
at org.apache.http.impl.io.AbstractSessionInputBuffer.fillBuffer(AbstractSessionInputBuffer.java:103)
at org.apache.http.impl.io.AbstractSessionInputBuffer.readLine(AbstractSessionInputBuffer.java:191)
at org.apache.http.impl.conn.DefaultResponseParser.parseHead(DefaultResponseParser.java:82)
at org.apache.http.impl.io.AbstractMessageParser.parse(AbstractMessageParser.java:174)
at org.apache.http.impl.AbstractHttpClientConnection.receiveResponseHeader(AbstractHttpClientConnection.java:180)
at org.apache.http.impl.conn.DefaultClientConnection.receiveResponseHeader(DefaultClientConnection.java:235)
at org.apache.http.impl.conn.AbstractClientConnAdapter.receiveResponseHeader(AbstractClientConnAdapter.java:259)
at org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse(HttpRequestExecutor.java:279)
at org.apache.http.protocol.HttpRequestExecutor.execute(HttpRequestExecutor.java:121)
at org.apache.http.impl.client.DefaultRequestDirector.execute(DefaultRequestDirector.java:428)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:555)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:487)
at org.apache.http.impl.client.AbstractHttpClient.execute(AbstractHttpClient.java:465)
at com.framework.utilityframe.webhelper.HttpRequest.getHttpResponse(HttpRequest.java:316)
at com.framework.utilityframe.webhelper.HttpRequest.httpRequest(HttpRequest.java:393)
at com.tibo.webtv.web.TiboLog.logBufferingError(TiboLog.java:319)
at com.tibo.webtv.CustomVideoView$Buffering_Problem.doInBackground(CustomVideoView.java:324)
at com.tibo.webtv.CustomVideoView$Buffering_Problem.doInBackground(CustomVideoView.java:307)
at android.os.AsyncTask$2.call(AsyncTask.java:287)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:305)
at java.util.concurrent.FutureTask.run(FutureTask.java:137)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1076)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:569)
at java.lang.Thread.run(Thread.java:856)
Reading through different forums, I have seen different causes of performance problems, ranging from the database to IIS and even a misconfiguration of the application. I have ruled out the database as a cause because:
At the moment of the problem, the database parameters were absolutely fine: no changes in query execution time, no waiting tasks, no locking
Secondly, the mobile and Decoder applications connect to the same database, and the mobile application runs just fine with the same queries
Now, if I think of IIS, every application hosted in that AppPool was running fine and without delays, but there may still be something I am missing there
Lastly, something that makes me suspicious is that the mobile application differs from the Decoder application in two ways:
First, the mobile application takes the responses from the backend in XML format, while the Decoder uses JSON.
Second, the mobile application uses HTTP requests, and the Decoder uses HTTPS (SSL)
If anyone has experienced similar issues, their help would be greatly appreciated. And for any other detail you need, just ask and I will provide.

So,
Today, our team made another test, which included :
Application hosted in one server and database in another
Application and database hosted in a completely different server (Azure environment)
In both cases, the result was the same: Latencies and problem at the service.
The problem was neither in the backend nor in the server. First, the Java application mistakenly executed synchronous tasks when saving the logs to another server (a dedicated one, with plenty of capacity to store data). Second, the log server had a full HDD, with more than 1 TB of DB logs alone, so when the application executed those synchronous tasks (which came as the first call, before any interaction with the channels), it received the socket exceptions. So, for anyone else who may see this post: PLEASE, ALWAYS CHECK THE TASKS IN YOUR APPLICATION, AND ALWAYS CHECK ANY SERVER RELATED TO YOUR APPLICATION!!! Thank you very much :D
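The client in this case was a Java application, but the general idea of not letting a non-critical log call block the main flow can be sketched in C#, the language of the backend. This is one possible approach, not what the team actually implemented: the names BoundedLogging and TrySendAsync are hypothetical, and the slow log server is simulated with a delay.

```csharp
using System;
using System.Threading.Tasks;

public static class BoundedLogging
{
    // Waits for the log call at most `timeout`. Returns false when the call did
    // not finish in time or failed, so the caller can carry on instead of blocking.
    public static async Task<bool> TrySendAsync(Func<Task> sendLogAsync, TimeSpan timeout)
    {
        Task send = sendLogAsync();
        Task finished = await Task.WhenAny(send, Task.Delay(timeout));
        if (finished != send)
        {
            return false; // timed out; the send keeps running in the background
        }

        try
        {
            await send;
            return true;
        }
        catch (Exception)
        {
            // Logging is treated as non-critical here, so a failure is reported, not rethrown.
            return false;
        }
    }

    public static async Task Main()
    {
        // Hypothetical slow log server, simulated with a 2 second delay.
        bool sent = await TrySendAsync(() => Task.Delay(2000), TimeSpan.FromMilliseconds(100));
        Console.WriteLine(sent ? "log sent" : "log skipped, continuing");
    }
}
```

With the simulated 2 second delay and a 100 ms timeout, the example prints "log skipped, continuing" and the caller moves on.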

What could be the reason for such kind of Azure Web Site hangs?

I have a rather high-load deployment on Azure: 4 Large instances serving about 300-600 requests per second. Under normal conditions the "Average Response Time" is 70 to 150ms; sometimes it grows to 200-300ms, which is absolutely OK.
However, once or twice per day (not at "rush hours") I see the following picture on the Web Site Monitoring tab:
The number of requests per minute drops significantly, the average response time grows to 3 minutes, and after a while everything comes back to normal.
During this "blackout" only 0.1% of requests are dropped (HTTP server errors with timeout); the other requests just wait in the queue and are processed normally after a few minutes. However, not all clients are ready to wait :-(
Memory usage is under 30% all the time, CPU usage is only up to 40-50%.
What I've already checked?:
Traces for timed-out requests: they timed out at random locations.
Throttling for Azure Storage and other components used: no throttling at all.
I also tried to route all traffic through CloudFlare: and saw the same problems.
What could be the reason for such problems? What may I check next?
Thank you all in advance!
Update 1: BenV proposed a good thing to try, but unfortunately it showed nothing :-(
I configured process recycling every 500k requests and also added worker nodes, so CPU utilization is now less than 40% all day long, but the blackouts still appear.
Update 2: Project uses ASP.Net MVC 4.

I had this exact same problem. For me I saw a lot of WinCache errors in my logs.
Whenever the site would fail, it would have a lot of WinCache errors in the log. WinCache is how IIS handles PHP to try to speed up the processing. It’s a Microsoft built add-on that is enabled by default in IIS and all Azure sites. WinCache would get hung up and instead of recycling and continuing, it would consume all the memory and file handles on an instance, essentially locking it up.
I added new App setting in the Azure Portal to scan a folder for php.ini settings changes.
d:\home\site\ini
Added a file in d:\home\site\ini\settings.ini
that contains the following
wincache.fcenabled=1
session.save_handler = files
memory_limit = 256M
wincache.chkinterval=5
wincache.ucachesize=200
wincache.scachesize=64
wincache.enablecli=1
wincache.ocenabled=0
This does a few things:
wincache.fcenabled=1
Enables file caching using WinCache (I think that's the default anyway)
session.save_handler = files
Changes the session handler from WinCache (Azure Default) to standard file based to reduce the cache engine stress
memory_limit = 256M
wincache.chkinterval=5
wincache.ucachesize=200
wincache.scachesize=64
wincache.enablecli=1
Sets the PHP memory limit to 256 megabytes per script and limits the overall cache size (200 MB for the WinCache user cache, 64 MB for the session cache). This forces WinCache to clear out old data and recycle the cache more often.
wincache.ocenabled=0
This is the big one. DISABLE WinCache opcode (operation code) caching. That is WinCache caching the actual PHP scripts in memory. Files are still cached because of the first setting, but PHP is interpreted as normal and not cached into large binary files.
I went from having my Azure Website crash about once every 3 days, with logs that look like yours, to 120 days straight so far without any issues.
Good luck!

There are some nice tools available for Web Apps in the preview portal.
The Application Insights extension especially can be useful for monitoring and troubleshooting app performance.

CPU usage goes high in Asp.Net MVC application while longer process run by other utility

I have an application developed in ASP.NET MVC 3 that uses a SQL Server database.
Apart from this, I have a console application that calls an external web service and updates the same database according to our business rules (basically, we iterate over the records from the web service, apply the business rules, and update the database). We have configured the console application with the Windows scheduler to run periodically.
The problem is that when the console application runs, it uses 100% of the CPU (because we get more than 2000 records from the web service), and because of that my MVC application hangs or sometimes works very slowly, since both applications are hosted on the same Windows server.
Could anybody please let me know how to resolve this problem? I want both applications on the same server because they share a central database.
Thanks in advance.

You haven't given enough detail for anyone to really provide a resolution, so I'll simply suggest how I would approach it.
First, I would review the database schema with a DBA to make sure there aren't things like table locks (or if there are, come up with strategies to compensate for them). I would then use the SQL Server profiler to see where (or if) there are any bottlenecks in SQL Server while these things are running. I would then profile the console application to make sure it's not doing something it doesn't need to be doing. I might even consider profiling the web site to see if there's anything in there that might be contributing to the slowness.
After that, I would figure out how to get rid of the Console application and work its functionality into the site. Spawning another application on a given web request is not scalable. More than a couple of those come in at once and you've got the potential to bog the server down very easily.
