I am trying to get some detailed performance information from an application my company is developing. Examples of information I am trying to get would be how long a network transaction takes, how much CPU/memory the application is using, how long it takes for a given method to complete, etc.
I have had some failed attempts at this in the past (like trying to measure small time periods by using DateTime.Now). On top of that I don't know much of anything about getting CPU and memory statistics. Are there any good .Net classes, libraries, or frameworks out there that would help me collect this sort of information and/or log it to a text file?
What you are looking for is performance counters. For .NET, you need the PerformanceCounter class.
Performance counters are one way to go, and together with the System.Diagnostics.Stopwatch class they are good foundational places to look for doing this.
With performance counters (beyond those provided) you will need to manage both the infrastructure of tracking the events, as well as reporting the data. The performance counter base classes supply the connection details for hooking up to the event log, but you will need to provide other reporting infrastructure if you need to report the data in another way (such as to a log file, or database).
The Stopwatch class is a wrapper around the high-performance timer, giving you microsecond or better resolution depending on the processor and the platform. If you do not need resolution that high, you can use System.DateTime.Now.Ticks to get the current time in 100-nanosecond ticks and do differential math with that; note, though, that the underlying system clock typically updates only every 10-15 milliseconds, so it is suitable only for coarser measurements.
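A minimal sketch of the Stopwatch approach described above (the work being timed is just a placeholder):

```csharp
using System;
using System.Diagnostics;

class StopwatchDemo
{
    static void Main()
    {
        // IsHighResolution tells you whether the high-performance timer
        // is available; Frequency is its tick rate in ticks per second.
        Console.WriteLine($"High resolution: {Stopwatch.IsHighResolution}");
        Console.WriteLine($"Ticks per second: {Stopwatch.Frequency}");

        var sw = Stopwatch.StartNew();
        DoWork();                       // the operation being measured
        sw.Stop();

        // Elapsed gives a TimeSpan; ElapsedTicks gives raw timer ticks.
        Console.WriteLine($"Elapsed: {sw.Elapsed.TotalMilliseconds:F3} ms");
    }

    static void DoWork()
    {
        System.Threading.Thread.Sleep(50); // stand-in for real work
    }
}
```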
When tracking CPU statistics be aware that multiple processors and multiple cores will complicate any accurate statistics in some cases.
One last caution with performance counters, be aware that not all performance counters are on all machines. For instance ASP.NET counters are not present on a machine which does not have IIS installed, etc.
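For the CPU/memory side, a sketch of reading system-wide counters with System.Diagnostics.PerformanceCounter. This is Windows-only, and on modern .NET it requires the System.Diagnostics.PerformanceCounter NuGet package; the category/counter names below are the standard built-in ones:

```csharp
using System;
using System.Diagnostics;
using System.Threading;

class CounterDemo
{
    static void Main()
    {
        // "_Total" aggregates across all cores. The first NextValue()
        // call always returns 0, so sample, wait, then sample again.
        using var cpu = new PerformanceCounter("Processor", "% Processor Time", "_Total");
        using var mem = new PerformanceCounter("Memory", "Available MBytes");

        cpu.NextValue();
        Thread.Sleep(1000);

        Console.WriteLine($"CPU: {cpu.NextValue():F1} %");
        Console.WriteLine($"Available memory: {mem.NextValue()} MB");
    }
}
```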
For a modern open-source library for performance metrics and monitoring, consider App Metrics.
GitHub: https://github.com/AppMetrics/AppMetrics
Website: https://www.app-metrics.io
A platform-independent open-source performance recorder built on top of Aspect Injector can be found here: https://gitlab.com/hectorjsmith/csharp-performance-recorder. It can be added to any C# project. Instructions on how to use it can be found in the README on GitLab.
Benefits
It's aspect-oriented, meaning you don't have to litter your code with DateTime.Now - you just annotate the appropriate methods
It takes care of pretty-printing the results - you can do what you like with them, e.g. print them to a file
Notes
This performance recorder is focused on the timing of methods only. It doesn't cover CPU or memory usage.
For CPU/memory, use performance counters. For more specific information about particular methods (or lines of code) and particular objects, use a profiler. Red Gate makes a great profiler:
http://www.red-gate.com/products/ants_performance_profiler
Related
Folks, I've been programming high-speed software for over 20 years and know virtually every trick in the book - micro-benchmarking, profiling, cooperative user-mode multitasking, tail recursion, you name it - for very high performance stuff on Linux, Windows, and more.
The problem is that I find myself befuddled by what happens when multiple threads of CPU-intensive work run on a multi-core processor.
The micro-benchmark performance results for various ways of sharing data between threads (on different cores) don't seem to follow logic.
It's clear that there is some "hidden interaction" between the cores which isn't obvious from my own programming code. I hear of L1 cache and other issues, but those are opaque to me.
Question is: where can I learn this stuff? I am looking for an in-depth book on how multi-core processors work, and how to program to capitalize on their memory caches and other hardware architecture instead of being punished by them.
Any advice or great websites or books? After much Googling, I'm coming up empty.
Sincerely,
Wayne
This book taught me a lot about these sorts of issues and why raw CPU power is not necessarily the only thing to pay attention to. I used it in grad school years ago, but I think all of the principles still apply:
http://www.amazon.com/Computer-Architecture-Quantitative-Approach-4th/dp/0123704901
Essentially, a major issue in multi-processor configurations is synchronizing access to main memory; if you don't do this right it can be a real performance bottleneck. It gets pretty complex with the caches that have to be kept in sync.
My own question, with answer, on Stack Overflow's sister site: https://softwareengineering.stackexchange.com/questions/126986/where-can-i-find-an-overview-of-known-multithreading-design-patterns/126993#126993
I will copy the answer to avoid the need for click-through:
Quote Boris:
Parallel Programming with Microsoft .NET: Design Patterns for Decomposition and Coordination on Multicore Architectures https://rads.stackoverflow.com/amzn/click/0735651590
This is a book I recommend wholeheartedly. It is:
New - published last year, which means you are not reading somewhat outdated practices.
Short - about 200+ pages, dense with information. These days there is too much to read and too little time to read 1000+ page books.
Easy to read - not only is it very well written, it introduces hard-to-grasp concepts in a really simple way.
Intended to teach - each chapter gives exercises to do. I know it is always beneficial to do these, but rarely do. This book gives very compelling and interesting tasks. Surprisingly, I did most of them and enjoyed doing them.
Additionally, if you wish to learn more of the low-level details, this is the best resource I have found: "The Art of Multiprocessor Programming". It uses Java for its code samples, which plays nicely with my C# background.
PS: I have about 5 years of "hard core" parallel programming experience (albeit using C#), so I hope you can trust me when I say that "The Art of Multiprocessor Programming" rocks.
My answer on "Are you concerned about multicores"
Herb Sutter's articles
Video Series on Parallel Programming
One specific cause of unexpectedly poor results in parallelized code is false sharing; you won't see it coming if you don't know what's going on down there (I didn't). Here are two articles that discuss the cause and remedy for .NET:
http://msdn.microsoft.com/en-us/magazine/cc872851.aspx
http://www.codeproject.com/KB/threads/FalseSharing.aspx
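A minimal sketch of false sharing in C#: two threads increment their own counters, but when the counters sit in the same 64-byte cache line the cores keep invalidating each other's caches. The exact timings (and whether the padded case actually lands on a separate line) depend on the runtime and hardware, so treat this as illustrative:

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

class FalseSharingDemo
{
    const long Iterations = 100_000_000;

    // Each thread increments its own slot; "stride" controls how far
    // apart (in longs, 8 bytes each) the two slots are in memory.
    static long Run(int stride)
    {
        var counters = new long[64]; // room for generous padding
        var sw = Stopwatch.StartNew();
        Parallel.Invoke(
            () => { for (long i = 0; i < Iterations; i++) counters[0] += 1; },
            () => { for (long i = 0; i < Iterations; i++) counters[stride] += 1; });
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    static void Main()
    {
        // On most hardware the adjacent case is noticeably slower.
        Console.WriteLine($"Adjacent (likely same cache line): {Run(1)} ms");
        Console.WriteLine($"Padded (8 longs = 64 bytes apart):  {Run(8)} ms");
    }
}
```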
Rgds GJ
There are different aspects to multi-threading requiring different approaches.
On a webserver, for example, the use of thread-pools is widely used since it supposedly is "good for" performance. Such pools may contain hundreds of threads waiting to be put to work. Using that many threads will cause the scheduler to work overtime which is detrimental to performance but can't be avoided on Linux systems. For Windows the method of choice is the IOCP mechanism which recommends a number of threads not greater than the number of cores installed. It causes an application to become (I/O completion) event driven which means that no cycles are wasted on polling. The few threads involved reduce scheduler work to a minimum.
If the object is to implement functionality that is scalable (more cores <=> higher performance), then the main issue will be memory bus saturation. Saturation occurs due to code fetching, data reading, and data writing. Incorrectly implemented code will run slower with two threads than with one. The only way around this is to reduce the memory bus work by actively:
tailoring the code to a minimal memory footprint (so it fits in the code cache) and so that it doesn't call other functions or jump all over the place.
tailoring memory reads and writes to a minimum size.
informing the prefetch mechanism of coming RAM reads.
tailoring the work such that the ratio of work performed inside the core's own caches (L1 & L2) is as great as possible in relation to the work outside them (L3 & RAM).
To put this another way: fit the applicable code and data chunks into as few cache lines (64 bytes each) as possible, because ultimately this is what will decide the scalability. If the cache/memory system is capable of x cache-line operations every second, your code will run faster if its requirement is five cache lines per unit of work (=> x/5) rather than eleven (x/11) or fifty-two (x/52).
Achieving this is not trivial since it requires a more or less unique solution every time. Some compilers do a good job of instruction ordering to take advantage of the host processor's pipelining. This does not necessarily mean that it will be a good ordering for multiple cores.
An efficient implementation of scalable code will not necessarily be a pretty one. Recommended coding techniques and styles may, in the end, hinder the code's execution.
My advice is to test how this works by writing a simple multi-threaded application in a low-level language (such as C) that can be adjusted to run in single or multi-threaded mode and then profiling the code for the different modes. You will need to analyze the code at the instruction level. Then you experiment using different (C) code constructs, data organization, etc. You may have to think outside the box and rethink the algorithm to make it more cache-friendly.
The first time will require lots of work. You will not learn what will work for all multi-threaded solutions but you will perhaps get an inkling of what not to do and what indications to look for when analyzing profiled code.
I found this link that specifically explains the issues of multicore cache handling on CPUs that were affecting my multithreaded program:
http://www.multicoreinfo.com/research/intel/mem-issues.pdf
The site multicoreinfo.com in general has lots of good information and references about multicore programming.
I am getting ready to perform a series of performance comparisons of various off-the-shelf products.
What do I need to do to show credibility in the tests? How do I design my benchmark tests so that they are respectable?
I am also interested in any suggestions on the actual design of the tests: ways to load data without affecting the tests (Heisenberg Uncertainty Principle), ways to monitor, etc.
This is a bit tricky to answer without knowing what sort of "off the shelf" products you are trying to assess. Are you looking for UI responsiveness, throughput (e.g. email, transactions/sec), startup time, etc - all of these have different criteria for what measures you should track and different tools for testing or evaluating. But to answer some of your general questions:
Credibility - this is important. Try to make sure that whatever you are measuring has little run to run variance. Utilize the technique of doing several runs of the same scenario, get rid of outliers (i.e. your lowest and highest), and evaluate your avg/max/min/median values. If you're doing some sort of throughput test, consider making it long running so you have a good sample set. For example, if you are looking at something like Microsoft Exchange and thus are using their perf counters, try to make sure you are taking frequent samples (once per sec or every few secs) and have the test run for 20mins or so. Again, chop off the first few mins and the last few mins to eliminate any startup/shutdown noise.
Heisenberg - tricky. In most modern systems, depending on what application/measures you are measuring, you can minimize this impact by being smart about what/how you are measuring. Sometimes (like in the Exchange example), you'll see near-zero impact. Try to use the least invasive tools possible. For example, if you're measuring startup time, consider using xperf and utilize the events built into the kernel. If you're using perfmon, don't flood the system with extraneous counters that you don't care about. If you're doing some extremely long-running test, ratchet down your sampling interval.
Also try to eliminate any sources of environmental variability or possible sources of noise. If you're doing something network-intensive, consider isolating the network. Try to disable any services or applications that you don't care about. Limit any sort of disk I/O, memory-intensive operations, etc. If disk I/O might introduce noise into something that is CPU-bound, consider using an SSD.
When designing your tests, keep repeatability in mind. If you're doing some sort of microbenchmark-type testing (e.g. a perf unit test), then have your infrastructure support running the same operation n times exactly the same way. If you're driving UI, try not to physically drive the mouse and instead use the underlying accessibility layer (MSAA, UIAutomation, etc.) to hit controls directly programmatically.
Again, this is just general advice. If you have more specifics then I can try to follow up with more relevant guidance.
Enjoy!
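The run-it-several-times, drop-the-outliers advice above can be sketched as a small harness (the workload here is a made-up example; SkipLast assumes a reasonably recent .NET):

```csharp
using System;
using System.Diagnostics;
using System.Linq;

class Benchmark
{
    // Runs the action several times, drops the fastest and slowest run,
    // and reports avg/min/max/median of the remaining samples.
    static void Measure(string name, Action action, int runs = 12)
    {
        action(); // warm-up run (JIT, caches) - not counted

        var samples = new double[runs];
        for (int i = 0; i < runs; i++)
        {
            var sw = Stopwatch.StartNew();
            action();
            sw.Stop();
            samples[i] = sw.Elapsed.TotalMilliseconds;
        }

        var trimmed = samples.OrderBy(t => t).Skip(1).SkipLast(1).ToArray();
        var median = trimmed[trimmed.Length / 2];
        Console.WriteLine($"{name}: avg {trimmed.Average():F2} ms, " +
            $"min {trimmed.Min():F2}, max {trimmed.Max():F2}, median {median:F2}");
    }

    static void Main() =>
        Measure("sort 100k ints", () =>
        {
            var rnd = new Random(42);
            var data = Enumerable.Range(0, 100_000).Select(_ => rnd.Next()).ToArray();
            Array.Sort(data);
        });
}
```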
Your question is very interesting, but a bit vague, because without knowing what to test it is not easy to give you any clues.
You can test performance from many different angles; depending on the use or target of the library, you should try one approach or another. I will try to enumerate some of the things you may have to consider for measurement:
Multithreading: if the library uses it, or your software will use the library in a multithreaded context, then you may have to test it with many different processor and multiprocessor configurations to see how it reacts.
Startup time: its importance depends on how intensively you will use the library and on the nature of the product being built with it (client, server, …).
Response time: for this, do not take the first execution; execute the same call many times after the first one and take an average. Using System.Diagnostics.Stopwatch could be very useful for that.
Memory consumption: analyze the growth, and beware of exponential ones ;). Go a step further and measure the quantity of objects being created and disposed.
Responsiveness: you should not only measure raw performance; how the user perceives the speed of the product is very important too.
Network: if the library uses resources on the network, you may have to test it with different bandwidth and latency configurations; there is software to simulate these situations.
Data: try to create many different testing data packages, trying to cover, for example: a big bunch of raw data, then a large set made of many smaller chunks, then a long iteration with small pieces of data, …
Tools:
System.Diagnostics.Stopwatch: essential for benchmarking method calls
Performance counters: whenever available they are very useful to know what’s happening inside, allowing you to monitor the software without affecting its performance.
Profilers: there are some good memory and performance profilers on the market, but as you said, they always affect the measurements. They are good for finding bottlenecks in your software, but I don't think you can use them for a comparison test.
Why do you care about the performance? In both cases, the time taken to write the message to wherever you are storing your log will be a lot slower than anything else.
If you are really doing that much logging, then you are likely to need to index your log files so you can find the log entry you need; at that point you are not doing standard logging.
I am using an SOA architecture for a project using Microsoft technologies on the .NET 3.5 platform. Can you give me steps/tools/guidelines/knowledge on the shortest and fastest route to finding the methods that cause the major hardware bottlenecks, like CPU time and memory usage? Also, please suggest ways to improve throughput and scalability along with response time.
Regards/Anand
I don't know any "short and fast route" to find any kind of bottleneck. So this is how I would approach the problem:
We usually generate logs for general time measures. You could inject a WCF behavior which logs the duration of each server method call, and produce statistics from that. Consider the duration of a method call and also the number of calls to the same method (only optimize frequent method calls).
Memory is more complicated. You need to call a method separately to measure the memory of a single method, and mostly it depends on existing data. There are tools to hunt memory leaks, if you intend to do this.
I found most unnecessary performance problems by observing database activity (e.g. using Profiler for SQL Server).
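One way to implement the WCF-behavior idea above is with an IParameterInspector that times each operation; this is only the inspector itself - wiring it to an endpoint via a custom IEndpointBehavior is omitted here:

```csharp
using System;
using System.Diagnostics;
using System.ServiceModel.Dispatcher;

// Times every operation on the endpoint it is attached to and writes
// the duration to the trace output (swap in your own logger as needed).
class TimingInspector : IParameterInspector
{
    public object BeforeCall(string operationName, object[] inputs)
        => Stopwatch.StartNew(); // returned as correlation state

    public void AfterCall(string operationName, object[] outputs,
                          object returnValue, object correlationState)
    {
        var sw = (Stopwatch)correlationState;
        sw.Stop();
        Trace.WriteLine($"{operationName} took {sw.ElapsedMilliseconds} ms");
    }
}
```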
I've been researching a bit about .NET Performance counters, but couldn't find anything detailed (unlike the rest of the framework).
My question is: which are the most relevant performance counters to get the maximum computing power of the machine? Well, of the .NET virtual machine, that is.
Thank you,
Chuck Mysak
You haven't described what you mean by "computing power". Here are some of the things you can get through performance counters that might be relevant:
Number of SQL queries executed.
Number of IIS requests completed.
Number of distributed transactions committed.
Bytes of I/O performed (disk, network, etc.).
There are also relative numbers, such as percentages of processor and memory in use which can give an indication of how much of the "power" of your system is actually being used.
However, you will not find anything that correlates cleanly with raw computing "power". In fact, who is to say that the machine's full "power" is being taken advantage of at the time you look at the counters? It sounds like what you really want to do is run a benchmark, which includes the exact work to be performed and the collection of measurements to be taken. You can search the Internet for various benchmark applications. Some of these run directly in .NET while the most popular are native applications which you could shell out to execute.
What is the best available tool to monitor the memory usage of my C#/.Net windows service over a long period of time. As far as I know, tools like perfmon can monitor the memory usage over a short period of time, but not graphically over a long period of time. I need trend data over days, not seconds.
To be clear, I want to monitor the memory usage at a fine level of detail over a long time, and have the graph show both the whole time frame and the level of detail. I need a small sampling interval, and a large graph.
Perfmon, in my opinion, is one of the best tools for this, but make sure you properly configure the sampling interval according to the length of time you wish to monitor.
For example if you want to monitor a process:
for 1 hour : I would use 1 second intervals (this will generate 60*60 samples)
for 1 day : I would use 30 second intervals (this will generate 2*60*24 samples)
for 1 week : I would use 1 minute intervals (this will generate 60*24*7 samples)
With these sampling intervals Perfmon should have no problem generating a nice graphical output of your counters.
Well, I used perfmon, exported the results to a CSV, and used Excel for statistics afterwards. That worked pretty well the last time I needed to monitor a process.
Playing around with Computer Management (assuming you're running Windows here), it seems you can make it monitor a process over time. Go to Computer Management -> Performance Logs and Alerts and look at the counter/trace logs. Right-click on Counter Logs and add a new log. Now click Add Objects and select Memory. Then click Add Counters, change the "Performance object" to Process, and select your process.
As good as monitoring the memory is by itself, you're probably thinking of memory profiling to identify leaks or stale objects - http://memprofiler.com/ is a good choice here, but there are plenty of others.
If you want to do something very specific, don't be afraid to write your own WMI-based logger running on a timer - you could get this to email you process statistics, warn when it grows too fast or too high, send it as XML for charting, etc.
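As a simpler alternative to a full WMI query, a timer-driven logger can read the same per-process figures from the System.Diagnostics.Process API; the process name and log path below are hypothetical command-line arguments:

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Threading;

class MemoryLogger
{
    // Hypothetical usage: MemoryLogger <processName> <logFile>
    static void Main(string[] args)
    {
        var process = Process.GetProcessesByName(args[0])[0];
        using var log = new StreamWriter(args[1], append: true);

        // Sample every 30 seconds; adjust the interval to your time frame.
        using var timer = new Timer(_ =>
        {
            process.Refresh(); // re-read the cached process counters
            log.WriteLine($"{DateTime.UtcNow:o},{process.PrivateMemorySize64},{process.WorkingSet64}");
            log.Flush();
        }, null, TimeSpan.Zero, TimeSpan.FromSeconds(30));

        Console.WriteLine("Logging... press Enter to stop.");
        Console.ReadLine();
    }
}
```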
If you're familiar with Python, it's pretty easy to write a script for this.
Activestate Python (which is free) exposes the relevant parts of the Win32 API through the win32process module.
You can also check out all win32 related modules or use gotAPI to browse the Python standard libs.
I would recommend using the .NET Memory Validator tool from Software Verify.
This tool helped me to solve many different issues related to memory management in .Net application I have to work with.
I use the C++ version more frequently, but they are quite similar, and the fact that you can see in real time the types of the objects being allocated will be invaluable to you.
I've used ProcessMonitor if you need something more powerful than perfmon.