I'm developing on SharePoint 2010 with all the latest updates installed (SP2, etc.).
It's a standard farm with 2 application servers, 2 front-end servers, an Active Directory server and 2 SQL servers, all hosted on Windows Azure virtual machines inside a virtual network.
I noticed that a simple SPWebApplication.Lookup() call takes very long to complete - about 16 seconds. For comparison, locally it takes about 1 second, and on another very similar farm, also hosted in Azure, about 2 seconds.
Here's what I've tried so far to fix the performance degradation:
Checked configs, network settings, pings, etc. - everything looks fine.
Profiled with SQL Profiler - no bottlenecks found; there is actually no heavy SQL behind this request.
Double-checked that all servers and databases are upgraded and up to date.
Fixed all the errors that showed up in the ULS and Windows logs - they're clean now.
Investigated metrics with Metalogix Diagnostics Manager - nothing critical was found. It only occasionally showed a high processor queue length, but as far as I know the normal value is the number of cores + 1, so 4-5 in my case is fine, and for a VM such a value seems normal anyway.
Wrote a very simple console app that performs a lookup of the web application (a sketch of such an app follows this list) and profiled it with ANTS profiler. I noticed that the call tree differs from the result I get locally, which may be expected since my local environment is a standalone installation.
The result on the farm is not encouraging - several calls have a huge Hits count. Although it's clear where the bottleneck sits in the call tree, I've run out of ideas about its source. The profiling result is here: http://1drv.ms/1kYT3rT
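A minimal sketch of the kind of console app I mean (the URL is a placeholder; for SP2010 it targets .NET 3.5/x64 and runs on a farm server):

    using System;
    using System.Diagnostics;
    using Microsoft.SharePoint.Administration;

    class LookupTimer
    {
        static void Main()
        {
            // Placeholder - replace with the real web application URL.
            var uri = new Uri("http://sharepoint.contoso.local");

            var sw = Stopwatch.StartNew();
            SPWebApplication webApp = SPWebApplication.Lookup(uri);
            sw.Stop();

            Console.WriteLine("Lookup returned '{0}' in {1} ms",
                webApp == null ? "(not found)" : webApp.DisplayName,
                sw.ElapsedMilliseconds);
        }
    }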
It would be great if you could advise.
Thanks in advance.
How many disks do you have on your VM?
Azure has limited IOPS - about 500 per disk. Organise your databases so they are on different disks to get more IOPS; you can have up to 16 disks per VM.
http://msdn.microsoft.com/library/azure/dn248436.aspx
With this post, I'm trying to help my team, and hopefully other .NET teams around the world, troubleshoot slow requests in an Azure Web application running as an API for an Angular frontend. The goal is to create a small guideline together with the community. There is a lot of chaotic information out there, and every application is different, but maybe we can list some checks based on the experience of other teams.
Context:
We have an API running as a .NET Core 2.1.x Azure web application on:
P3v2 App Service plan
Separate S1 web application for the Angular frontend (out of scope for this post)
Azure SQL Database Standard S6
Azure Redis cache Standard 6GB
Application Insights
Important note: this post is about troubleshooting an API application that is generally fast but suddenly slows down. So no specific API calls - all requests are suddenly slow, which means they cannot be directly linked to specific code.
The API application has an average load of 35k requests per 5 minutes (see Application Insights - Requests) and 300-400 simultaneous users.
Usually the requests have an average response time of a few milliseconds.
When the application is running slow, it can be caused by several factors. Below is our first standard troubleshooting guideline:
Check the memory and CPU usage of the Azure service plan. Is it too high?
Example: we once had out-of-memory exceptions caused by the dotnet.exe process still running in 32-bit. Make sure you deploy your .NET application as 64-bit if it needs a lot of memory.
Check the database DTU usage. If it's too high, this can also cause slow API requests, as they are left waiting for the database to reply.
Check the load on your Redis cache instance and make sure it's not too high. (A dependency-tracking sketch follows this list.)
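Since Application Insights is already in the stack, one complementary check is to see where the time actually goes by looking at dependency durations. SQL and HTTP calls are usually collected automatically; anything else (a Redis call, an internal step) can be tracked manually. A rough sketch only - the repository, method and dependency names are placeholders, and it assumes the Application Insights TelemetryClient is registered in DI:

    using System;
    using System.Diagnostics;
    using Microsoft.ApplicationInsights;

    public class CustomerRepository
    {
        private readonly TelemetryClient _telemetry;

        public CustomerRepository(TelemetryClient telemetry)   // injected by the App Insights SDK
        {
            _telemetry = telemetry;
        }

        public Customer GetCustomer(int id)
        {
            var start = DateTimeOffset.UtcNow;
            var sw = Stopwatch.StartNew();
            var success = true;
            try
            {
                return LoadCustomerFromSql(id);   // placeholder for the real data call
            }
            catch
            {
                success = false;
                throw;
            }
            finally
            {
                sw.Stop();
                // Shows up under Dependencies in Application Insights, so slow calls
                // are visible even when CPU, DTU and Redis load all look fine.
                _telemetry.TrackDependency("SQL", "GetCustomer", id.ToString(), start, sw.Elapsed, success);
            }
        }

        private Customer LoadCustomerFromSql(int id) { /* ... existing data access ... */ return null; }
    }

    public class Customer { }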
Now, my question to the community: are there other things that can slow the application down significantly but are not visible in the three bullets above?
The API was suddenly responding very slowly yesterday; however, there was:
no high CPU or memory usage on the App Service plan
no high DTU usage on the database
no high load on the Redis cache
Some examples I can think of:
Can sudden application exceptions on one particular request slow down other API requests without influencing the three bullets above?
Can too many requests sent by one external client slow down the application without influencing the three bullets above? If yes, how can this be noticed? (See the sketch after this list.)
Other things that can be checked?
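For the second example above (one noisy external client), one way it could be noticed is by counting requests per client (per IP or API key) in a small piece of middleware and emitting the count as a custom event. This is only a sketch of the idea - the names are made up, and it assumes Application Insights is registered so TelemetryClient can be injected:

    using System.Collections.Concurrent;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using Microsoft.ApplicationInsights;
    using Microsoft.AspNetCore.Http;

    public class ClientRequestCounterMiddleware
    {
        // Cumulative count per client since app start - enough to spot a runaway client.
        private static readonly ConcurrentDictionary<string, long> Counts = new ConcurrentDictionary<string, long>();
        private readonly RequestDelegate _next;
        private readonly TelemetryClient _telemetry;

        public ClientRequestCounterMiddleware(RequestDelegate next, TelemetryClient telemetry)
        {
            _next = next;
            _telemetry = telemetry;
        }

        public async Task Invoke(HttpContext context)
        {
            // Keyed by client IP here; an API key or user id would work as well.
            var client = context.Connection.RemoteIpAddress?.ToString() ?? "unknown";
            var count = Counts.AddOrUpdate(client, 1, (_, c) => c + 1);

            // Emit periodically so a single noisy client stands out in App Insights custom events.
            if (count % 1000 == 0)
            {
                _telemetry.TrackEvent("NoisyClient",
                    new Dictionary<string, string> { ["client"] = client },
                    new Dictionary<string, double> { ["requestCount"] = count });
            }

            await _next(context);
        }
    }

Registered with app.UseMiddleware<ClientRequestCounterMiddleware>() in Startup.Configure, a single client hammering the API would then show up in the custom events without touching the three metrics above.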
My code is in ASP.NET MVC (Razor, C#) and the database is SQL Server 2012. Right now the code runs on localhost.
OK. Before moving to the server, I want to test whether the website design is able to scale to and support 100,000 concurrent users.
Question: I am only a .NET developer. Is there any way to do the testing for 100,000 concurrent users on a single machine?
Short answer: yes, you can create load tests in Visual Studio (using the VS Ultimate/Enterprise test tools) without problems.
Some basic info here: https://msdn.microsoft.com/en-us/library/vstudio/dd293540(v=vs.110).aspx
But...
Your machine will not be able to handle creating 100,000 simultaneous requests, let alone service those requests for the site/application on the same single machine.
You really need to set up a staging environment that mimics your production implementation, then deploy and load test on that with load balancing and all the bells and whistles. Otherwise the load/stress test will be a waste of time: the stats you get back will show 100% timeouts above roughly 1,000 concurrent users, which is not a representation of the speed of your app, just the speed of your machine.
Then, once you have that staging environment set up, I would suggest spreading the load test over 5-10 PCs/VMs as well. This will give the most "real-world" results.
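To see why a single box tops out so early, here is a rough sketch of what even a hand-rolled load generator looks like (HttpClient with capped concurrency; the URL is a placeholder, and this is not a substitute for the VS load test tooling). Long before 100,000 concurrent requests you hit connection limits, ephemeral port exhaustion and thread-pool pressure on the generating machine itself:

    using System;
    using System.Linq;
    using System.Net;
    using System.Net.Http;
    using System.Threading;
    using System.Threading.Tasks;

    class SimpleLoadGenerator
    {
        static void Main()
        {
            // Placeholder URL - point this at the staging site, never at production.
            const string url = "http://your-staging-site.example.com/";

            ServicePointManager.DefaultConnectionLimit = 1000;   // the default is far lower
            var gate = new SemaphoreSlim(1000);                  // cap in-flight requests

            using (var client = new HttpClient())
            {
                var tasks = Enumerable.Range(0, 100000).Select(async _ =>
                {
                    await gate.WaitAsync();
                    try
                    {
                        var response = await client.GetAsync(url);
                        return (int)response.StatusCode;
                    }
                    catch
                    {
                        return -1;   // timeout or connection-level failure
                    }
                    finally
                    {
                        gate.Release();
                    }
                }).ToArray();

                var results = Task.WhenAll(tasks).GetAwaiter().GetResult();
                Console.WriteLine("Failed or non-200 responses: {0}", results.Count(code => code != 200));
            }
        }
    }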
As per your question, you cannot test an application (web or desktop) with 100,000 users without the help of testing software. The type of testing you want to perform is known as volume testing or stress testing, in which a large number of users access the application at the same time and you observe its behaviour. This can be done with HP Performance Center, developed by HP, but it is licensed software; you won't get it for free.
I have a rather high-load deployment on Azure: 4 Large instances serving about 300-600 requests per second. Under normal conditions the "Average Response Time" is 70 to 150 ms; sometimes it may grow to 200-300 ms, which is absolutely fine.
However, once or twice a day (not at rush hours) I see the following picture on the Web Site Monitoring tab:
The number of requests per minute drops significantly, the average response time grows to as much as 3 minutes, and after a while everything comes back to normal.
During this "blackout" only 0.1% of requests are dropped (HTTP server errors with timeout); the other requests just wait in the queue and are processed normally after a few minutes. However, not all clients are willing to wait :-(
Memory usage is under 30% all the time, CPU usage is only up to 40-50%.
What I've already checked:
Traces for timed-out requests: they timed out at random locations.
Throttling for Azure Storage and other components used: no throttling at all.
I also tried routing all traffic through CloudFlare and saw the same problems.
What could be the reason for such problems? What may I check next?
Thank you all in advance!
Update 1: BenV proposed a good thing to try, but unfortunately it showed nothing :-(
I configured process recycling every 500k requests and also added worker nodes, so CPU utilization is now below 40% all day long, but the blackouts still occur.
Update 2: the project uses ASP.NET MVC 4.
I had this exact same problem. In my case I saw a lot of WinCache errors in my logs.
Whenever the site failed, there would be a lot of WinCache errors in the log. WinCache is how IIS handles PHP to try to speed up processing. It's a Microsoft-built add-on that is enabled by default in IIS and on all Azure sites. WinCache would get hung up and, instead of recycling and continuing, it would consume all the memory and file handles on an instance, essentially locking it up.
I added a new app setting in the Azure Portal to scan a folder for php.ini setting changes:
d:\home\site\ini
Then I added a file at d:\home\site\ini\settings.ini that contains the following:
wincache.fcenabled=1
session.save_handler = files
memory_limit = 256M
wincache.chkinterval=5
wincache.ucachesize=200
wincache.scachesize=64
wincache.enablecli=1
wincache.ocenabled=0
This does a few things:
wincache.fcenabled=1
Enables file caching using WinCache (I think that's the default anyway)
session.save_handler = files
Changes the session handler from WinCache (the Azure default) to standard file-based sessions, to reduce the stress on the cache engine
memory_limit = 256M
wincache.chkinterval=5
wincache.ucachesize=200
wincache.scachesize=64
wincache.enablecli=1
Sets the WinCache size to 256 megabytes per thread and limits the overall Cache size. This forces WinCache to clear out old data and recycle the cache more often.
wincache.ocenabled=0
This is the big one: DISABLE WinCache opcode caching, i.e. WinCache caching the compiled PHP scripts in memory. Files are still cached (per the first setting), but PHP is interpreted as normal and not cached into large binary files.
I went from having my Azure Website crash about once every 3 days, with logs that look like yours, to 120 days straight (so far) without any issues.
Good luck!
There are some nice tools available for Web Apps in the preview portal.
The Application Insights extension especially can be useful for monitoring and troubleshooting app performance.
Basically, I have a Windows service which performs a batch job.
I have two collections that are related, customerAccounts and events. The events collection logs actions that customers performed on a site, containing the timestamp, the name of the event, the page it occurred on and the username.
The service runs through each account and works out their journey phase and risk of account closure based on what events they have in the Events collection and a set of user-defined rules.
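To give an idea of the access pattern (simplified, and the collection/field names are only illustrative - this is not the actual code), the service effectively issues one events query per account using the C# driver:

    using MongoDB.Bson;
    using MongoDB.Driver;

    public class JourneyCalculator
    {
        private readonly IMongoCollection<BsonDocument> _accounts;
        private readonly IMongoCollection<BsonDocument> _events;

        public JourneyCalculator(IMongoDatabase db)
        {
            _accounts = db.GetCollection<BsonDocument>("customerAccounts");
            _events = db.GetCollection<BsonDocument>("events");
        }

        public void Run()
        {
            foreach (var account in _accounts.Find(FilterDefinition<BsonDocument>.Empty).ToEnumerable())
            {
                var username = account["username"].AsString;

                // One query per account: ~3,500 round trips against ~100,000 events.
                // Without an index on "username", each of these is a collection scan.
                var accountEvents = _events
                    .Find(Builders<BsonDocument>.Filter.Eq("username", username))
                    .ToList();

                // ... apply the user-defined rules to work out journey phase / closure risk ...
            }
        }
    }

With a working set this small, even an unindexed version of this pattern shouldn't take hours, which is what makes the server behaviour so odd.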
There are about 3,500 accounts and around 100,000 events in my database at present. The service takes just over 1 minute to run on my development PC, but seemingly forever on the server (I've estimated roughly 2.5 hours, based on modifying the service so it only processes a single customer account).
My machine is a Core i7 with 16GB of RAM; the server is an Intel Xeon E5-2609 (64-bit, Windows 2008 R2) with 24GB of RAM. I also put the database on a much older server (32-bit, Windows 2003) and the service took about 2 minutes to run. So on my dev machine it takes just over a minute, on older server hardware just over 2 minutes, yet on a modern server it takes a matter of hours.
Originally, the Mongo shell warned that NUMA was enabled on the server and should be switched off to avoid performance problems. This has since been turned off, but it doesn't seem to have had an effect on performance.
When I run db.currentOp() on the server, I notice there is always some kind of "createIndexes" operation running (the indexes were created ages ago), yet when I mongodump/restore the database to my dev machine and run the service and currentOp there, the "createIndexes" operation isn't present. Apart from that, nothing jumps out at me.
Does anyone have any ideas / help on this mysterious performance issue? I'll post currentOp/mongostats if/when required.
Quick answer: I re-installed Mongo. No fancy configuration, just ran the setup and it fixed the issue.
I never worked out why Mongo was constantly creating indexes. The log file for a single day is 0.25GB, full of "creating index" entries.
I'd like to know my options for the following scenario:
I have a C# WinForms application (developed in VS 2010) distributed to a number of offices within the country. The application communicates with a C# web service that sits on a main server at a separate location, and there is one database (SQL Server 2012) at yet another location. (All servers run Windows Server 2008.)
Head Office (where we are) uses the same front end to manage certain information in the database, which needs to be readily available to all offices in real time. At the same time, any data the offices change needs to be readily available to us at Head Office, as we have a real-time dashboard web application that monitors site-wide statistics.
Currently, the users are complaining about the speed at which the application operates. They say it is really slow. We work in a business-critical environment where every minute of waiting may mean losing a client.
I have researched the following options, but I don't come from a DB background, so I'm not too sure what the best route is for my scenario.
Terminal Services/sessions (which I've just implemented at Head Office, and they say it's a great improvement, although there's a terrible lag - like remoting onto someone's desktop - which is not nice to work on).
Transactional replication (sounds quite plausible for my scenario, but it would require all offices to have their own SQL Server database on their individual servers, and they have a tendency to "fiddle" and break everything they're left in charge of! I wish we could take over all their servers, but they are franchises, so they have their own IT people on site).
I currently have a whole lot of the look-up data cached at application start-up, but this also takes 2-3 minutes to complete, which is just not acceptable!
Does anyone have any ideas?
With everything running through the web service, there is no need for additional SQL Servers to be deployed locally at the clients; the WS wouldn't be able to communicate with those databases unless the WS was also deployed locally.
Before suggesting any specific improvements, you need to benchmark where your bottlenecks are occurring. What is the latency between the various clients and the web service, and between the web service and the database? Does the database show any waits? Once you know the worst offender, improve that, and then work your way down the list.
Some general thoughts, though:
Move the WS closer to the database
Cache the data at the web service level to save on DB calls (see the sketch after this list)
Find the expensive WS calls and try to optimize their throughput
If the lookup data doesn't change all that often, use a local copy of SQL CE to cache that data, and use the MS Sync Framework to keep the data synchronized to the SQL Server
Use SQL CE for everything on the client computer, and use a background process to sync between the client and WS
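For the caching bullet, a rough sketch of what a web-service-level cache could look like with System.Runtime.Caching (the type and method names here are made up for illustration):

    using System;
    using System.Collections.Generic;
    using System.Runtime.Caching;

    public class LookupDataService
    {
        private static readonly MemoryCache Cache = MemoryCache.Default;

        public IList<LookupItem> GetLookupData(string listName)
        {
            var cached = Cache.Get(listName) as IList<LookupItem>;
            if (cached != null)
            {
                return cached;   // served from memory, no DB round trip
            }

            var items = LoadLookupDataFromDatabase(listName);   // the existing DB call

            // Lookup data changes rarely, so an absolute expiry of a few minutes
            // (or explicit invalidation when Head Office edits it) is usually enough.
            Cache.Set(listName, items, DateTimeOffset.UtcNow.AddMinutes(10));
            return items;
        }

        private IList<LookupItem> LoadLookupDataFromDatabase(string listName)
        {
            // ... existing ADO.NET / ORM call ...
            return new List<LookupItem>();
        }
    }

    public class LookupItem { }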
UPDATE
After your comment, two additional thoughts. If your web service payloads are large, you can try adding compression on the web service (if it hasn't already been implemented).
You can also update your client to make the WS calls asynchronously, either on a background thread or, if you are using .NET 4.5, with async/await. This would at least keep the client UI responsive, but it wouldn't necessarily fix any issues with data load times.
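For the async/await route on .NET 4.5, the WinForms pattern looks roughly like this (the service proxy and control names are placeholders):

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;
    using System.Windows.Forms;

    // Placeholder for the generated web-service proxy.
    public interface ICustomerService
    {
        IList<string> GetCustomers();
    }

    public class CustomerForm : Form
    {
        private readonly ICustomerService _serviceClient;
        private readonly Button _loadButton = new Button { Text = "Load" };
        private readonly DataGridView _grid = new DataGridView();

        public CustomerForm(ICustomerService serviceClient)
        {
            _serviceClient = serviceClient;
            Controls.Add(_grid);
            Controls.Add(_loadButton);
            _loadButton.Click += LoadButton_Click;
        }

        private async void LoadButton_Click(object sender, EventArgs e)
        {
            _loadButton.Enabled = false;
            try
            {
                // Run the existing synchronous proxy call off the UI thread;
                // the UI stays responsive while the WS call is in flight.
                var customers = await Task.Run(() => _serviceClient.GetCustomers());

                _grid.DataSource = customers;   // continuation runs back on the UI thread
            }
            finally
            {
                _loadButton.Enabled = true;
            }
        }
    }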