Application Insights Delay? - C#

I've looked in many places for details about how long it takes for Application Insights data to appear in my dashboard, but can't find it documented anywhere.
I spent some time yesterday trying to debug an issue where my code seemed unable to send data to Application Insights, only for the data to appear some time later (~40 mins).
Does anybody have any details on how long I should expect to wait before seeing data on my dashboard?
I've read a few FAQs and articles such as: https://azure.microsoft.com/en-gb/documentation/articles/app-insights-troubleshoot-faq/ but am none the wiser.
More specifically, these were attempts to track exceptions and custom events.

Generally, raw examples of your data should be available within a couple of minutes of sending it, and aggregated data takes about 5-10 minutes to appear. Also, when we are experiencing a processing delay, we display a banner on the Overview page in Application Insights in the portal.
If you saw a 40-minute delay before seeing your data, it was either an ongoing issue with the processing pipeline, in which case a banner should have been shown (and if it wasn't, that is a detection problem on our side), or, as we often see, a configuration problem with your application that was later fixed.

Agree with the comments in the accepted answer that real-time logging is an absolute requirement of an enterprise system. Even the portal says the following on the Monitor section of the Azure Functions blade:
This appears to be due to metric aggregation. However, I've just been shown Application Insights' Live Metrics Stream by a colleague. It has 1-second latency, which is probably what most readers of this question are after, so I thought it worth sharing.
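For classic .NET apps where Live Metrics isn't already wired up by the SDK, the QuickPulse module can be registered in code at startup. This is a rough sketch based on the standard Application Insights setup pattern and assumes the Microsoft.ApplicationInsights.PerfCounterCollector package is installed:

```csharp
using Microsoft.ApplicationInsights.Extensibility;
using Microsoft.ApplicationInsights.Extensibility.PerfCounterCollector.QuickPulse;

// Register the Live Metrics (QuickPulse) processor and module once at startup,
// e.g. in Application_Start. Uses the active telemetry configuration.
var config = TelemetryConfiguration.Active;
QuickPulseTelemetryProcessor processor = null;

config.TelemetryProcessorChainBuilder
    .Use(next =>
    {
        processor = new QuickPulseTelemetryProcessor(next);
        return processor;
    })
    .Build();

var quickPulse = new QuickPulseTelemetryModule();
quickPulse.Initialize(config);
quickPulse.RegisterTelemetryProcessor(processor);
```

Once this runs, the Live Metrics Stream blade in the portal should light up within seconds of the app starting.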

Best way to rate limit a client-side API call in C#

I've run into an issue which I'm struggling to decide the best way to solve. Perhaps my software architecture needs to change?
I have a cron job which hits a method on my website every 10 seconds, and that method then makes a call to an external API each time. However, the API is rate limited to x requests a minute and y requests a day.
Currently I'm exceeding the API limits and need to control this in the website method somehow. I've thought about storing state in a file, but that seems hacky; similarly a database, as I don't currently use one for this project.
I've tried this package: https://github.com/David-Desmaisons/RateLimiter but it doesn't work in my scenario; I think it would only work if I made all the requests from a single loop, as in his examples. I noticed he has a persistent timer (PersistentCountByIntervalAwaitableConstraint), but there is no documentation or examples for it (I emailed him in case). I've done a lot of googling and can only find examples of server-side rate limiting (a server limiting its clients), which is the other way around from what I need (a client limiting its own requests to a server).
How can I solve my issue without changing the cron jobs? What does everyone think the best solution is?
Assuming that you don't want to change the clients generating the load, there is no choice but to implement rate limiting on the server.
Since an ASP.NET application can be restarted at any time, the state used for that rate-limiting must be persisted somewhere. You can choose any data store you like for that.
In this case you have two limits: one per minute and one per day. If you simply apply two separate rate limiters, you will hit the daily limit fairly early in the day; after that, there will be no further access for the rest of the day. This is likely undesirable.
It seems better to only apply the daily limit because it is more restrictive. A simple solution would be to calculate how far apart requests must be to meet the daily limit. Then, you store the date of the last request. Any new incoming request is immediately failed if not enough time has passed.
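A minimal sketch of that spacing approach (the class and member names are illustrative; in a real ASP.NET app the last-accepted timestamp should live in your chosen persistent store rather than in memory, since the process can restart at any time):

```csharp
using System;

// Sketch: enforce the daily limit by spacing requests evenly across the day.
// SpacingRateLimiter is a made-up name; the state here is in-memory for
// brevity, but should be persisted (file, database, etc.) in production.
public class SpacingRateLimiter
{
    private readonly TimeSpan _minInterval;
    private DateTimeOffset _lastAccepted = DateTimeOffset.MinValue;
    private readonly object _gate = new object();

    public SpacingRateLimiter(int requestsPerDay)
    {
        // e.g. 1000/day -> one request every 86.4 seconds
        _minInterval = TimeSpan.FromSeconds(86400.0 / requestsPerDay);
    }

    // Returns true if the request may proceed; false to fail it immediately.
    public bool TryAcquire(DateTimeOffset now)
    {
        lock (_gate)
        {
            if (now - _lastAccepted < _minInterval)
                return false;
            _lastAccepted = now;
            return true;
        }
    }
}
```

Your website method would call `TryAcquire(DateTimeOffset.UtcNow)` before hitting the external API and simply return early when it gets `false`.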
Let me know if this helps you.

Azure Portal: How to See Callstacks

Apologies, this is not a short question:
Background
I have a B1 Azure Website, and for the life of me, cannot get exceptions with callstacks.
The WebAPI is hosted side-by-side with the website in the same solution, which I hear is unusual. Almost all configuration has been done through the solution, I believe; nearly everything in the portal is probably at the default settings for a brand-new site.
I will be the first to admit I am a novice at Azure. I have previously hosted some exceedingly simple ASP websites (mostly pre-.NET). I have found the Azure Portal to be overwhelming, to say the least. Hence why I am here!
The main place I look for exceptions is in Application Insights, under the Failures > Exceptions tab. However, while it usually (not always...) shows that there were 500s, the vast majority of the time it shows no callstack.
Situations
The few times it does catch a callstack, it's your normal bots poking at random directories... not the crippling exception I need to debug immediately. I recall hearing that Azure will use "AI to determine which callstacks to keep" or something market-y like that, but I can't find any settings regarding it. Even if that market-speak is true, why is it recording callstacks for daily bot attempts, but not the rare application-crippling exception?
A month or so ago, I attempted to debug the live website via Visual Studio, but I get an error saying that Internet Explorer could not be found. Given that it's the year 2018 and Microsoft has moved onto Edge, I don't know why it wants Internet Explorer at all. I did find a response to this, saying to hack the registry and reinstall Internet Explorer, but that seemed overkill at the time.
Viewing Azure errors through Visual Studio's embedded Azure portal thing seems to show very similar data as the Azure portal does. No callstacks to be found.
Many years ago, a classic alert was set up for Http Server Errors, which still triggers to this day. It does not trigger on HttpExceptions from bots poking at the site, but it does for important 500s, and that's good. What is interesting is that it is the most reliable way to hear about errors, besides user reports. Too bad they don't have callstacks...
Last night, we encountered an exception, presumably in the view, of a page. We got e-mails from the classic alert, as expected, but the Failures section does not show any failures at all. In the past, we'd see 500s, but no callstack. It would seem that last night's errors were not detected by anything but the classic alert and the user. I don't know if it is because last night's error was unique, or if we now mysteriously get even less information out of Azure.
Attempted Solutions
Over the years, I have followed a myriad of guides, ranging from flipping switches in the portal itself to FTPing in and looking at the raw logs (which are apparently not really about your application so much as Microsoft's hosting of it). If I had a penny for every time I read a guide that said, "Simply click on the Exceptions tab to see your callstacks", I'd be rich :-P.
A month ago, I got so desperate I implemented Application_Error in the HttpApplication class for the application, and implemented ExceptionLogger for WebAPI, to manually log all exceptions to text files. Unfortunately, while this helped me fix one error, subsequent exceptions have not appeared there either. Just like Application Insights, mostly bots poking at non-existent directories show up in these logs.
A week ago, I got desperate enough that I wrote a janky "unit test" (ha!), that'd pull a copy of production data down and test it locally, which is absolutely bonkers.
I have spoken to other architect-level ASP.NET engineers who use the Azure portal with varying frequency, and they could not come up with any suggestions. We looked at the web.configs; there is one in the root and one in the Views folder. We played with turning on customErrors, but obviously we can't have that running in production because it would display the errors to users. That said, I wouldn't mind having real error messages appear for certain users. How would one accomplish that? If I were to guess, the issue is hidden in those web.configs, simply because they're ancient and so many hands have touched them.
Conclusion
I need a 100% bullet-proof way to get exceptions and their callstacks from ASP.NET hosted on Azure. Otherwise, it's nearly impossible to solve edge cases that appear unexpectedly in production. I don't recall this being a problem in my days before Azure.
I am certain an expert out there will have this solved in mere minutes, but, for now, I'm completely stumped. Thank you for your time!
A couple of things to try and check for:
Make sure that your Application Insights NuGet packages are up to date. Over the last couple of years I've had metrics quit working, or new metrics show up on the AppInsights blade that I wasn't collecting; upgrading to the latest NuGet packages did the trick.
Are you catching exceptions within your web app and then returning an HTTP 500 response explicitly? If so, you won't see a stack trace. Stack traces are captured only when an exception bubbles all the way up through your controller method unhandled.
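If you do need to catch exceptions and convert them to 500s yourself, you can still report them to Application Insights explicitly so the callstack is preserved. A rough Web API sketch (ImagesController and LoadImage are made-up names, not from the question):

```csharp
using System;
using System.Web.Http;
using Microsoft.ApplicationInsights;

// Sketch: report a caught exception to Application Insights before converting
// it into a 500, so the stack trace still appears under Failures > Exceptions.
public class ImagesController : ApiController
{
    private static readonly TelemetryClient Telemetry = new TelemetryClient();

    public IHttpActionResult Get(int id)
    {
        try
        {
            return Ok(LoadImage(id));
        }
        catch (Exception ex)
        {
            Telemetry.TrackException(ex); // keeps the callstack
            return InternalServerError();
        }
    }

    private object LoadImage(int id)
    {
        // placeholder for the real work that may throw
        throw new NotImplementedException();
    }
}
```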

Testing, analyzing, preventing, and handling mass or spam requests to ASP.NET Web API

I am developing a web API in ASP.NET that does some image processing. Essentially the user application will make get requests with a few arguments i.e an image, quality, text to draw on it, size, etc.
My concern is that I do not know exactly how fast these requests are going to come in. If I spam refresh on a GET request for long enough, I see memory slowly increase until it hits 1 GB and finally throws an OutOfMemoryException. Strangely enough, sometimes before hitting the OOM I get an ArgumentException (even though I am using a valid request that works otherwise).
My questions are broad and as follows:
1) Is there a good tool to test this sort of mass request? I'd like to be able to spam my server so I can consistently analyze and troubleshoot any problems that arise. I haven't found anything and have just been refreshing manually in the browser.
2) Is there a tool you'd recommend to analyze which specific parts of my program are causing this memory issue? If the Diagnostic Tools in VS are good enough, can you offer some guidance as to what I should be looking for? E.g. investigating the call stack, memory profiling, etc.
3) Perhaps none of the above questions is even necessary if this one can be answered: can these sorts of requests be prevented? Maybe my API can ensure that they are only processed at a rate it can handle (at the expense of user image load time). I know that catching the exceptions alone isn't going to be enough, so is there something that ASP.NET provides for this sort of mass-request prevention?
Thanks for taking the time to read, any answers are appreciated.

Multi-server n-tier synchronized timing and performance metrics?

[I'm not sure whether to post this in stackoverflow or serverfault, but since this is a C# development project, I'll stick with stackoverflow...]
We've got a multi-tiered application that is exhibiting poor performance at unpredictable times of the day, and we're trying to track down the cause(s). It's particularly difficult to fix because we can't reproduce it on our development environment - it's a sporadic problem on our production servers only.
The architecture is as follows: Load balanced front end web servers (IIS) running an MVC application (C#). A home-grown service bus, implemented with MSMQ running in domain-integration mode. Five 'worker pool' servers, running our Windows Service, which responds to requests placed on the bus. Back end SQL Server 2012 database, mirrored and replicated.
All servers have high-spec hardware, running Windows Server 2012, latest releases, latest Windows updates. Everything bang up to date.
When a user hits an action in the MVC app, the controller itself is very thin. Pretty much all it does is put a request message on the bus (sends an MSMQ message) and awaits the reply.
One of the servers in the worker pool picks up the message, works out what to do and then performs queries on the SQL Server back end and does other grunt work. The result is then placed back on the bus for the MVC app to pick back up using the Correlation ID.
It's a nice architecture to work with in respect to the simplicity of each individual component. As demand increases, we can simply add more servers to the worker pool and all is normally well. It also allows us to hot-swap code in the middle tier. Most of the time, the solution performs extremely well.
However, as stated we do have these moments where performance is a problem. It's proving difficult to track down at which point(s) in the architecture the bottleneck is.
What we have attempted to do is send a request down the bus and roundtrip it back to the MVC app with a whole suite of timings and metrics embedded in the message. At each stop on the route, a timestamp and other metrics are added to the message. Then when the MVC app receives the reply, we can screen dump the timestamps and metrics and try to determine which part of the process is causing the issue.
However, we soon realised that we cannot rely on the Windows clock as an accurate measure, because many of our processes are down at the 5-100ms level and a message can pass through 5 servers (and back again). We cannot synchronize the time across the servers to that resolution. MS article: http://support.microsoft.com/kb/939322/en-us
To compound the problem, each time we send a request, we can't predict which particular worker pool server will handle the message.
What is the best way to get an accurate, coordinated and synchronized time that is accurate to the 5ms level? If we have to call out to an external (web)service at each step, this would add extra time to the process, and how can we guarantee that each call takes the same amount of time on each server? Even a small amount of latency in an external call on one server would skew the results and give us a false positive.
Hope I have explained our predicament and look forward to your help.
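One partial workaround for the time spent *inside* each server is to record monotonic durations with Stopwatch rather than wall-clock timestamps; durations need no cross-server clock sync, leaving only the queue/network wait unaccounted for. A sketch of that idea (HopTiming and the message shape are illustrative, not the actual bus code):

```csharp
using System;
using System.Diagnostics;

// Sketch: measure each hop with Stopwatch (monotonic, high-resolution,
// unaffected by clock adjustments) and embed the duration in the message.
// The round-trip total minus the sum of hop durations approximates the
// time spent in queues and on the network.
public class HopTiming
{
    public string Server { get; set; }
    public string Stage { get; set; }
    public double ElapsedMs { get; set; }
}

public static class TimingTrace
{
    public static HopTiming Measure(string server, string stage, Action work)
    {
        var sw = Stopwatch.StartNew();
        work();
        sw.Stop();
        return new HopTiming
        {
            Server = server,
            Stage = stage,
            ElapsedMs = sw.Elapsed.TotalMilliseconds
        };
    }
}
```

Each worker appends its HopTiming to the reply message; the MVC app then only needs its own local clock to measure the full round trip.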
Update
I've just found this: http://www.pool.ntp.org/en/use.html, which might be promising. Perhaps a scheduled job every x hours to keep the time synchronised could get me to the sub 5 ms resolution I need. Comments or experience?
Update 2
FWIW, we've found the cause of the performance issue. It occurs when the software tests whether a queue has been created before opening it, so it was essentially looking up the queue twice, which is fairly expensive. With the redundant check removed, the issue has gone away.
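For anyone hitting the same thing, the fix amounts to doing the existence check once per queue path and caching the result. A sketch (QueueCache is an illustrative helper, not the actual bus code; it assumes the classic System.Messaging API on .NET Framework):

```csharp
using System.Collections.Concurrent;
using System.Messaging;

// Sketch: MessageQueue.Exists is an expensive call, so check each queue path
// at most once per process lifetime and cache the result.
public static class QueueCache
{
    private static readonly ConcurrentDictionary<string, bool> Known =
        new ConcurrentDictionary<string, bool>();

    public static MessageQueue Open(string path)
    {
        Known.GetOrAdd(path, p =>
        {
            if (!MessageQueue.Exists(p))
                MessageQueue.Create(p);
            return true;
        });
        return new MessageQueue(path);
    }
}
```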
What you should try is the Performance Monitor that's part of Windows itself. Create a Data Collector Set on each of the servers and select the metrics you want to monitor; something like Request Execution Time would be a good one to watch.
Here's a tutorial for Data Collector Sets: https://www.youtube.com/watch?v=591kfPROYbs
Hopefully this will give you a start on troubleshooting the problem.
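If you'd rather sample the same counters from code than set up a collector set, System.Diagnostics exposes them directly. A sketch (Windows-only; the category/counter names assume the standard ASP.NET performance counters are installed):

```csharp
using System;
using System.Diagnostics;

// Sketch: poll the ASP.NET "Requests Queued" and "Request Execution Time"
// counters once a second. Requires the ASP.NET counters to be present.
class CounterSampler
{
    static void Main()
    {
        using (var queued = new PerformanceCounter("ASP.NET", "Requests Queued"))
        using (var execTime = new PerformanceCounter("ASP.NET", "Request Execution Time"))
        {
            for (int i = 0; i < 5; i++)
            {
                Console.WriteLine($"queued={queued.NextValue()} execMs={execTime.NextValue()}");
                System.Threading.Thread.Sleep(1000);
            }
        }
    }
}
```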

High number of Request Timeouts on IIS

I have a fairly busy site which does around 10m views a month.
One of my app pools seemed to jam up for a few hours, and I'm looking for some ideas on how to troubleshoot it. I suspect that it somehow ran out of threads, but I'm not sure how to determine this retroactively. Here's what I know:
The site never went 'down', but around 90% of requests started timing out.
I can see a high number of "HttpException - Request timed out." in the log during the outage
I can't find any SQL errors or code errors that would have caused the timeouts.
The timeouts seem to have been site wide on all pages.
There was one page with a bug on it which would have caused errors on that specific page.
The site had to be restarted.
The site is ASP.NET C# 3.5 WebForms.
Possibilities:
Thread depletion: My thought is that the page causing the error may have somehow started jamming up the available threads?
Global code error: Another possibility is that one of my static classes has an undiscovered bug in it somewhere. This is unlikely, as this has never happened before and I can't find any log errors for these classes, but it is a possibility.
UPDATE
I've managed to trace the issue now while it's occurring. The pages are being loaded normally but for some reason WebResource.axd and ScriptResource.axd are both taking a minute to load. In the performance counters I can see ASP.NET Requests Queued spikes at this point.
The first thing I'd try is Sam Saffron's CPU analyzer tool, which should give an indication if there is something common that is happening too much / too long. In part because it doesn't involve any changes; just run it at the server.
After that, there are various other debugging tools available; we've found that some very ghetto approaches can be insanely effective at seeing where time is spent (of course, it'll only work on the 10% of successful results).
You can of course just open the server profiling tools and drag in various .NET / IIS counters, which may help you spot some things.
Between these three options, you should be covered for:
code dropping into a black hole and never coming out (typically threading related)
code running, but too slowly (typically data access related)
