I'm using Spring.NET 1.2 and the Spark view engine for my web application running on the .NET 3.5 runtime. Recently I have been investigating the performance of my application under load on a multicore processor. I noticed that under load an AOP-proxied method takes much, much longer to complete, with high context switching but low CPU utilization. I profiled my application using the VS2010 resource contention profiler and it showed lock contention happening in every part of the application. I was wondering what could be wrong; is it because of the Spring framework we use?
We have identified the root of the problem. Our application uses slot-based thread local storage which, based on our proof-of-concept testing, performs badly under concurrent load. A good reference for Spring.NET is http://piers7.blogspot.com/2005/11/threadstatic-callcontext-and_02.html. The VS2010 resource contention profiler helped us identify the problem. Coming from a Java background, I didn't believe the problem could be the thread local storage until we did a POC.
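For anyone hitting the same thing, here is a minimal single-threaded sketch of the kind of proof of concept we ran (the class name and iteration count are illustrative; the real POC ran the same comparison from many threads at once). It compares slot-based thread local storage (Thread.AllocateDataSlot with GetData/SetData) against a [ThreadStatic] field.

using System;
using System.Diagnostics;
using System.Threading;

class TlsPoc
{
    // Slot-based TLS: every access goes through Thread.GetData/SetData.
    static readonly LocalDataStoreSlot Slot = Thread.AllocateDataSlot();

    // Field-based TLS: each thread sees its own copy of this field.
    [ThreadStatic]
    static int threadStaticValue;

    static void Main()
    {
        const int iterations = 1000000;
        long checksum = 0;

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            Thread.SetData(Slot, i);               // boxes the int on every write
            checksum += (int)Thread.GetData(Slot); // unboxes on every read
        }
        Console.WriteLine("Slot-based TLS: {0} ms", sw.ElapsedMilliseconds);

        sw.Reset();
        sw.Start();
        for (int i = 0; i < iterations; i++)
        {
            threadStaticValue = i;
            checksum += threadStaticValue;
        }
        Console.WriteLine("[ThreadStatic]: {0} ms (checksum {1})", sw.ElapsedMilliseconds, checksum);
    }
}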
Related
We are seeing very high CPU and memory usage from one of our .NET MVC apps and can't seem to track down the cause. Our group does not have access to the web server itself but instead gets notified automatically when certain limits are hit (90+% of CPU or memory). Running locally we can't seem to find the problem. Some items we think might be the culprit:
The app has a number of threads running in the background when users take certain actions
We are using memcached (on a different machine than the web server)
We are using web sockets
Other than that the app is pretty standard as far as web applications go. Couple of forms here, login/logout there, some admin capabilities to manage users and data; nothing super fancy.
I'm looking at two different solutions and wondering what would be best.
Create a page inside the app itself (available only to app admins) that shows information about memory and CPU being used. Are there examples of this or is it even possible?
Use some type of 3rd party profiling service or application that gets installed on the web servers and allows us to drive down to find what is causing the high CPU and memory usage in the app.
I recommend the ASP.NET MVC MiniProfiler: http://miniprofiler.com/
It is simple to implement and extend, can run in production, and can store its results to SQL Server. I have used it many times to find difficult performance issues.
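For reference, a minimal sketch of how it can be wired into Global.asax (this assumes the StackExchange.Profiling package; the step name is just a placeholder):

using System.Web;
using StackExchange.Profiling;

public class MvcApplication : HttpApplication
{
    protected void Application_BeginRequest()
    {
        MiniProfiler.Start();   // begin profiling this request (you can gate this to admins)
    }

    protected void Application_EndRequest()
    {
        MiniProfiler.Stop();    // stop and persist/discard the results
    }
}

// Then wrap suspicious code in named steps:
// using (MiniProfiler.Current.Step("Load dashboard data"))
// {
//     // expensive work here
// }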
Another possibility is to use http://getglimpse.com/ in combination with the miniprofiler glimpse-plugin https://github.com/mcliment/miniprofiler-glimpse-plugin
Both tools are open source and don't require admin access to the server.
You can hook up Preemptive's Runtime Intelligence to it. - http://www.preemptive.com/
Otherwise a profiler or a load test could help find the problem. Do you have anything monitoring the actual machine health (processor usage, memory usage, disk queue lengths, etc.)?
http://blogs.msdn.com/b/visualstudioalm/archive/2012/06/04/getting-started-with-load-testing-in-visual-studio-2012.aspx
Visual Studio has a built-in profiler (depending on version and edition). You may be able to run WMI queries against the web server that has the issues, or write/provide diagnostic recording/monitoring tools and hand them over to someone who does have access.
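To illustrate the WMI idea, here is a hedged sketch of querying processor load on a remote web server (the machine name and account are placeholders; it needs a reference to System.Management and the right WMI/DCOM permissions):

using System;
using System.Management;

class RemoteCpuQuery
{
    static void Main()
    {
        ConnectionOptions options = new ConnectionOptions
        {
            Username = @"DOMAIN\monitoring",   // placeholder account
            Password = "placeholder-password"
        };
        ManagementScope scope = new ManagementScope(@"\\WEBSRV01\root\cimv2", options);
        scope.Connect();

        ObjectQuery query = new ObjectQuery("SELECT LoadPercentage FROM Win32_Processor");
        using (ManagementObjectSearcher searcher = new ManagementObjectSearcher(scope, query))
        {
            foreach (ManagementObject cpu in searcher.Get())
            {
                Console.WriteLine("CPU load: {0}%", cpu["LoadPercentage"]);
            }
        }
    }
}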
Do you have any output caching? What version of IIS? Is the 90% processor usage you are being alerted to actually coming from your web process? (Perhaps it's not your app if the alert is improperly configured.)
I had a similar situation and I created a system monitor for my app admins based on this project
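For the "page inside the app" option, a bare-bones sketch could look like this (the counter categories are the standard Windows ones; the controller, view, and role names are placeholders):

using System;
using System.Diagnostics;
using System.Threading;
using System.Web.Mvc;

[Authorize(Roles = "Admin")]
public class HealthController : Controller
{
    public ActionResult Index()
    {
        PerformanceCounter cpu = new PerformanceCounter("Processor", "% Processor Time", "_Total");
        PerformanceCounter freeMem = new PerformanceCounter("Memory", "Available MBytes");

        cpu.NextValue();        // the first sample is always 0, so prime the counter
        Thread.Sleep(500);      // short interval before taking the real sample

        ViewBag.CpuPercent = cpu.NextValue();
        ViewBag.AvailableMemoryMb = freeMem.NextValue();
        ViewBag.WorkingSetMb = Environment.WorkingSet / (1024 * 1024);

        return View();
    }
}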
I have a C# server developed in both Visual Studio 2010 and MonoDevelop 2.8, targeting .NET Framework 4.0.
It looks like this server behaves much better (in terms of scalability) on Windows than on Linux.
I tested the server's scalability on native Windows (12 physical cores), and on 8- and 12-core Windows and Ubuntu virtual machines, using Apache's ab tool.
The Windows response time is pretty much flat. It starts picking up when the concurrency level approaches/exceeds the number of cores.
For some reason the Linux response times are much worse. They grow pretty much linearly starting from concurrency level 5. The 8- and 12-core Linux VMs also behave similarly.
So my question is: why does it perform worse on Linux, and how can I fix that?
Please take a look at the attached graph; it shows the average time to fulfill 75% of the requests as a function of request concurrency (the range bars are set at 50% and 100%).
I have a feeling that this might be due to Mono's garbage collector. I tried playing around with the GC settings but had no success. Any suggestions?
Some additional background information: the server is based on an HTTP listener that quickly parses requests and queues them on a thread pool. The thread pool takes care of replying to those requests with some intensive math (computing an answer takes ~10 seconds).
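Roughly, the shape of the server is something like this (heavily simplified sketch; the port and the stand-in math are placeholders):

using System;
using System.Net;
using System.Text;
using System.Threading;

class MathServer
{
    static void Main()
    {
        HttpListener listener = new HttpListener();
        listener.Prefixes.Add("http://+:8080/");
        listener.Start();

        while (true)
        {
            // Accept and parse quickly on the listener thread...
            HttpListenerContext ctx = listener.GetContext();

            // ...then queue the expensive reply on the thread pool.
            ThreadPool.QueueUserWorkItem(state =>
            {
                byte[] body = Encoding.UTF8.GetBytes(ComputeAnswer());
                ctx.Response.OutputStream.Write(body, 0, body.Length);
                ctx.Response.Close();
            });
        }
    }

    // Stand-in for the intensive math (~10 seconds per request in the real server).
    static string ComputeAnswer()
    {
        double x = 0;
        for (int i = 1; i < 100000000; i++) x += Math.Sqrt(i);
        return x.ToString();
    }
}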
You need to isolate where the problem is first. Start by monitoring your memory usage with HeapShot. If it's not memory, then profile your code to pinpoint the time consuming methods.
This page, Performance Tips: Writing better performing .NET and Mono applications, contains some useful information including using the mono profiler.
Excessive String manipulation and Boxing are often 'hidden' culprits of code that doesn't scale well.
Try the sgen garbage collector (and for that, Mono 2.11.x is recommended). Look at the mono man page for more details.
I don't believe it's because of the GC. AFAIK the GC side effects should be more or less evenly distributed across the threads.
My blind guess: you may be able to fix it by playing with the ThreadPool.SetMinThreads/SetMaxThreads API.
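Something along these lines (the numbers are purely illustrative; measure before and after):

using System;
using System.Threading;

class ThreadPoolTuning
{
    static void Main()
    {
        int workers, io;
        ThreadPool.GetMinThreads(out workers, out io);
        Console.WriteLine("Min before: {0} workers, {1} IO", workers, io);

        // Pre-warm enough workers to cover the expected request concurrency,
        // so the pool doesn't ramp up one thread at a time under a burst.
        ThreadPool.SetMinThreads(32, io);

        ThreadPool.GetMaxThreads(out workers, out io);
        Console.WriteLine("Max:        {0} workers, {1} IO", workers, io);
    }
}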
I would strongly recommend you profile the code to see how long individual methods are running. It's quite likely that you're seeing some locking or similar multi-threading difficulties that are not being handled perfectly by Mono. CPU and RAM usage figures would also help.
I believe this may be the same problem that we tracked down involving the thread pool and the starting behavior for new threads, as well as a bug in the mono implementation of setMinThreads. Please see my answer on that thread for more information: https://stackoverflow.com/a/12371795/1663096
If your code throws a lot of exceptions, then Mono is 10x faster than .NET.
I was just wondering what will have the best performance.
Let's say we have 3 physical servers, where each server has 32 cores and 64 GB of RAM, and the application is a "standard" ASP.NET application. Load balancing is already in place.
Setup #1 - One application consumes all
- One IIS server with 1 application running on each physical server (total of 3 application "endpoints")
Setup #2 - Shared resources
- One IIS server with 16 applications in a web farm on each physical server (total of 48 application "endpoints")
Setup #3 - Virtualization
- 15 virtual servers per physical server, each running one application (total of 45 application "endpoints")
What would have the best performance, and why?
It depends! Much depends on what the application is doing and where it spends its time.
In broad terms, though:
If an application is compute-bound -- i.e. the time taken to retrieve data from an external source such as a database is small -- then in most cases setup #1 will likely be fastest. IIS is itself highly multi-threaded and giving it control of the machine's resources will allow it to self-tune.
If the application is data-bound -- i.e. more than (say) 40% of the time taken for each request is spent getting and waiting for data -- then setup #2 may be better. This is especially the case for less-well-written applications that do synchronous in-process database access: even if a thread is just sitting around waiting for a database call to complete, it's still consuming resources.
As discussed here: How to increase thread-pool threads on IIS 7.0 you'll run out of thread pool threads eventually. However, as discussed on MSDN here: http://blogs.msdn.com/b/david.wang/archive/2006/03/14/thoughts-on-application-pools-running-out-of-threads.aspx by creating multiple IIS worker processes you're really just papering over the cracks of larger underlying issues.
Unless there's other reasons -- such as manageability -- I'd not recommend setup #3 as the overhead of managing additional operating systems in entire virtual machines is quite considerable.
So: monitor your system, use something like the MiniProfiler (http://code.google.com/p/mvc-mini-profiler/) to figure out where the issues in the code lie, and use asynchronous non-blocking calls whenever you can.
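On the "asynchronous non-blocking calls" point, here is a hedged sketch of what that can look like in an MVC action (assumes .NET 4.5 / MVC 4+, a connection string named "Main", and a placeholder query):

using System.Configuration;
using System.Data.SqlClient;
using System.Threading.Tasks;
using System.Web.Mvc;

public class ReportsController : Controller
{
    public async Task<ActionResult> Summary()
    {
        string cs = ConfigurationManager.ConnectionStrings["Main"].ConnectionString;
        using (SqlConnection conn = new SqlConnection(cs))
        using (SqlCommand cmd = new SqlCommand("SELECT COUNT(*) FROM Orders", conn))
        {
            await conn.OpenAsync();
            // The worker thread goes back to the pool while SQL Server does the work.
            int count = (int)await cmd.ExecuteScalarAsync();
            return View(count);
        }
    }
}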
It really depends on your application: you have to design for each architecture and performance-test your setups. Some applications will run fast on setup #1 and not on the others, and vice versa. There are many more things you can optimize for performance in IIS. The key is to design your application for monitoring and scaling.
Our application is written in .NET (Framework 3.5). We are experiencing problems with the application's performance when deployed in a Terminal Services environment. The client is using a TS farm. They have 4 GB of RAM and a decent Xeon processor.
When the application is opened in this environment, it sits at 25% CPU usage even when idle. When deployed in a normal client-server environment, it behaves normally, spiking CPU usage when necessary and dropping down to 0 when idle.
Does anyone have any ideas what could be causing this? Or, what I could do to investigate? We have no memory leaks that we can find using performance profiling tools.
This is a WinForms application
We don't have a TS environment available to test on
The application is a Business Application.
Basically, capturing and updating of data. It's a massive business application, but there is little multithreading, listeners, etc. We do have ANTS Profiler (memory/performance) but, as mentioned, we don't see the problem in our environment; it only occurs in the TS environment.
Well, there are a few questions before we can really get you too far.
Is this a Console Application? WinForms Application? or Windows Service?
Do you have a Terminal Services environment available?
What does your application do?
Depending on what the application does, you might check to see if there is unusually high activity on their hardware that you have not accounted for. Examples that I have noticed in the past are items such as having a FileSystemWatcher accidentally listening to a "drop location" for reporting on a client server. Things of that nature, items that while "idle" shouldn't be busy, but are.
Otherwise, if you have the ability to do so, you could also use a tool such as ANTS Profiler from RedGate to see WHAT is using the CPU time on the environment.
Look for sections of your application that constantly repaint the window. Factor those out so that when the application is sitting idle it isn't constantly repainting the window.
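As a sketch of that idea, repaint only when the underlying data has actually changed, for example (class and member names are placeholders):

using System;
using System.Windows.Forms;

public class DashboardForm : Form
{
    private readonly Timer refreshTimer = new Timer { Interval = 500 };
    private bool dataChanged;

    public DashboardForm()
    {
        refreshTimer.Tick += (sender, e) =>
        {
            if (dataChanged)        // only repaint when something is new
            {
                dataChanged = false;
                Invalidate();
            }
        };
        refreshTimer.Start();
    }

    // Call this from whatever updates the underlying data.
    public void OnDataUpdated()
    {
        dataChanged = true;
    }
}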
I have a large multi-threaded C# application running on a multi-core 4-way server. Currently we're using "server mode" garbage collection. However testing has shown that workstation mode GC is quicker.
MSDN says:
Managed code applications that use the server API receive significant benefits from using the server-optimized garbage collector (GC) instead of the default workstation GC.
Workstation is the default GC mode and the only one available on single-processor computers. Workstation GC is hosted in console and Windows Forms applications. It performs full (generation 2) collections concurrently with the running program, thereby minimizing latency. This mode is useful for client applications, where perceived performance is usually more important than raw throughput.
The server GC is available only on multiprocessor computers. It creates a separate managed heap and thread for each processor and performs collections in parallel. During collection, all managed threads are paused (threads running native code are paused only when the native call returns). In this way, the server GC mode maximizes throughput (the number of requests per second) and improves performance as the number of processors increases. Performance especially shines on computers with four or more processors.
But we're not seeing performance shine!!!! Has anyone got any advice?
It's not explained very well, but as far as I can tell, the server mode is synchronous per core, while the workstation mode is asynchronous.
In other words, the workstation mode is intended for a small number of long running applications that need consistent performance. The garbage collection tries to "stay out of the way" but, as a result, is less efficient on average.
The server mode is intended for applications where each "job" is relatively short-lived and handled by a single core (edit: think multi-threaded web server). The idea is that each "job" gets all the CPU power and gets done quickly, but that occasionally the core stops handling requests and cleans up memory. So in this case the hope is that GC is more efficient on average, but the core is unavailable while it's running, so the application needs to be able to adapt to that.
In your case it sounds like, because you have a single application whose threads are relatively coupled, you're fitting better into the model expected by the first mode rather than the second.
But that's all just after-the-fact justification. Measure your system's performance (as ammoQ said, not your GC performance, but how well you application behaves) and use what you measure to be best.
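As part of that measurement, it's worth confirming which GC mode the process actually ended up in and how often each generation collects during your test run. A small sketch (GCSettings.IsServerGC needs .NET 4.5+; on earlier versions check the <gcServer> setting in the .config instead):

using System;
using System.Runtime;

class GcModeCheck
{
    static void Main()
    {
        Console.WriteLine("Server GC:    " + GCSettings.IsServerGC);
        Console.WriteLine("Latency mode: " + GCSettings.LatencyMode);

        RunWorkload();   // stand-in for your real multi-threaded test

        Console.WriteLine("Gen0 collections: " + GC.CollectionCount(0));
        Console.WriteLine("Gen1 collections: " + GC.CollectionCount(1));
        Console.WriteLine("Gen2 collections: " + GC.CollectionCount(2));
    }

    static void RunWorkload()
    {
        // Placeholder: allocate enough garbage for the counters to move.
        for (int i = 0; i < 1000000; i++)
        {
            byte[] tmp = new byte[1024];
        }
    }
}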
.NET 4.5 introduces concurrent server garbage collection.
http://msdn.microsoft.com/en-us/library/ee787088.aspx
specify <gcServer enabled="true"/> in the <runtime> section of your configuration file
specify <gcConcurrent enabled="true"/> (this is the default, so it can be omitted)
And there is the new SustainedLowLatency mode:
In the .NET Framework 4.5, SustainedLowLatency mode is available for both workstation and server GC. To turn it on, set the GCSettings.LatencyMode property to GCLatencyMode.SustainedLowLatency.
Server: Your program is the only significant application on the machine and needs the lowest possible latency for GCs.
Workstation: You have a UI or share the machine with other important processes.
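A minimal sketch of switching it on around a latency-sensitive phase (GCSettings lives in System.Runtime; the work delegate is a placeholder):

using System;
using System.Runtime;

class LatencySensitiveWork
{
    static void Run(Action work)
    {
        GCLatencyMode previous = GCSettings.LatencyMode;
        try
        {
            // Avoid blocking gen-2 collections while the critical work runs.
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
            work();
        }
        finally
        {
            GCSettings.LatencyMode = previous;   // restore the previous mode
        }
    }
}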
I have a test of my DB engine on .NET 6 where I compare performance across different GC settings. In general the improvement never exceeded roughly 300%, which I got with 6 heaps.
https://vimeo.com/711964445
runtimeconfig.template.json
{
  "configProperties": {
    "System.GC.HeapHardLimit": 8000000000,
    "System.GC.Server": true,
    "System.GC.HeapCount": 6
  }
}