I read the following:
One way to understand how the processor used its time is to look at
the hardware counters. To help with performance tuning, modern
processors track various counters as they execute code: the number of
instructions executed, the number of various types of memory accesses,
the number of branches encountered, and so forth. To read the
counters, you’ll need a tool such as the profiler in Visual Studio
2010 Premium or Ultimate, AMD Code Analyst or Intel VTune.
So how can I do this in code, e.g. using the PerformanceCounter class, and get the number of instructions executed?
And is there any way to count instruction hits the way dotTrace, ANTS, and the VS profiler do?
For the VS Profiler:
To view a list of all CPU counters that are supported on the current platform:
In Performance Explorer, right-click the performance session and then click Properties.
Do one of the following:
Click Sampling, and then select Performance counter from the Sample event list. The CPU counters are listed in Available performance counters.
Note: Click Cancel to return to the previous sampling configuration.
-or-
Select CPU Counters, and then select Collect CPU Counters. The CPU counters are listed in Available counters.
Note: Click Cancel to return to the previous counter collection configuration.
You can't access the full CPU counters from your program. As explained in https://stackoverflow.com/a/8800266/613130:
You can use the RDPMC instruction or the __readpmc MSVC compiler intrinsic, which is the same thing.
However, Windows prohibits user-mode applications from executing this instruction by setting CR4.PCE to 0. Presumably, this is done because the meaning of each counter is determined by MSR registers, which are only accessible in kernel mode. In other words, unless you're a kernel-mode module (e.g. a device driver), you are going to get a "privileged instruction" trap if you attempt to execute this instruction.
(RDPMC is the instruction that returns the CPU counters)
I'll add that the raw number of instructions executed is normally quite useless. What matters is the CPU time used to execute the code: each instruction takes a different amount of CPU time, so even knowing how many instructions ran, you wouldn't know the number of CPU cycles/time used.
If you want to know the CPU cycles used by some instructions, you can use the ASM instruction RDTSC/RDTSCP. Using it from C# is complex and quite time-consuming (slow enough that it often compromises the measurement you are trying to take). If you are interested, I wrote a response about it some days ago: https://stackoverflow.com/a/29646856/613130
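For reference, the usual trick for using RDTSC from C# is to write the instruction's bytes into executable memory and call them through a delegate. The following is only a minimal sketch of that idea for x64 Windows; the names and structure here are mine, not the code from the linked answer:

    using System;
    using System.Runtime.InteropServices;

    static class Rdtsc
    {
        [DllImport("kernel32.dll", SetLastError = true)]
        static extern IntPtr VirtualAlloc(IntPtr lpAddress, UIntPtr dwSize,
                                          uint flAllocationType, uint flProtect);

        const uint MEM_COMMIT = 0x1000, MEM_RESERVE = 0x2000;
        const uint PAGE_EXECUTE_READWRITE = 0x40;

        public delegate ulong RdtscDelegate();

        public static RdtscDelegate Create()
        {
            // x64 machine code: rdtsc; shl rdx,32; or rax,rdx; ret
            // RDTSC puts the timestamp counter in EDX:EAX, so the stub
            // merges the two halves into RAX before returning.
            byte[] code = { 0x0F, 0x31, 0x48, 0xC1, 0xE2, 0x20, 0x48, 0x09, 0xD0, 0xC3 };
            IntPtr mem = VirtualAlloc(IntPtr.Zero, (UIntPtr)(uint)code.Length,
                                      MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
            Marshal.Copy(code, 0, mem, code.Length);
            return (RdtscDelegate)Marshal.GetDelegateForFunctionPointer(mem, typeof(RdtscDelegate));
        }
    }

You would then call it around the code under test: var rdtsc = Rdtsc.Create(); ulong start = rdtsc(); /* code under test */; ulong cycles = rdtsc() - start;. Note that the delegate call itself has overhead, which is exactly the measurement problem mentioned above.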
Your question has already been answered, so I think this is a duplicate question.
These are most likely native methods in the WinAPI; you can invoke them from C# via DllImport. But for simplicity you could try using a third-party wrapper from here.
But you should clearly understand what you are measuring. There will be a difference between the first call of your function and the second one because of JIT compilation time. And if your method allocates memory, a GC might occur at any time during the call and will be reflected in the measurement.
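To illustrate both points, here is roughly what a DllImport-based timing helper looks like, including the JIT warm-up and GC precautions just mentioned. This is a sketch; the helper name and structure are mine:

    using System;
    using System.Runtime.InteropServices;

    static class HighResTimer
    {
        [DllImport("kernel32.dll")]
        static extern bool QueryPerformanceCounter(out long value);

        [DllImport("kernel32.dll")]
        static extern bool QueryPerformanceFrequency(out long value);

        public static double MeasureSeconds(Action action)
        {
            action();                     // warm-up call: pays the JIT cost up front
            GC.Collect();                 // reduce the chance of a GC during timing
            GC.WaitForPendingFinalizers();
            GC.Collect();

            QueryPerformanceFrequency(out long frequency);
            QueryPerformanceCounter(out long start);
            action();
            QueryPerformanceCounter(out long end);
            return (end - start) / (double)frequency;
        }
    }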
I'm trying to analyze my program with the Visual Studio Performance Analyzer, but I'm new to this tool.
If I start my program in the analyzer, I get a report showing the % of the total analysis time each function took. But the total time can vary between 5 sec and 500 sec, so how can I see whether my optimizations have had any effect?
If it was in milliseconds I would not have this problem, but I cannot find any function like "show in milliseconds" or similar. Does such a function exist?
There are two different CPU profiling methods in Visual Studio Profiler: Sampling & Instrumentation.
Sampling (Default)
The sampling profiling method interrupts the computer processor at set intervals and collects the function call stack. Exclusive sample counts are incremented for the function that is executing and inclusive counts are incremented for all of the calling functions on the call stack. Sampling reports present the totals of these counts for the profiled module, function, source code line, and instruction.
The sampling method is lightweight (no changes in your binaries) and has little effect on the execution of the application methods: it collects only statistical data about the work that is performed by an application during a profiling session.
It's good for initial explorations. A high % can mean a slow function or a function that is called too often.
Instrumentation
The instrumentation profiling method collects detailed timing for the function calls in a profiled application. How? It injects code that captures timing information for each function in the instrumented file and for each function call made by those functions. Instrumentation also identifies when a function calls into the operating system for operations such as writing to a file.
In the reports, you will see Application Time (total time spent executing a piece of code, excluding time spent in calls to the operating system, ADO.NET, service calls, ...) and Elapsed Time (total time spent executing a piece of code).
This profiling mode also has higher runtime overhead. This inevitably changes the performance characteristics of your application a little bit, but it's quite minimal.
Only this option allows you to see milliseconds. So change the profiling method in the wizard or in the Performance Explorer. Also note that this option is sometimes not available, such as when profiling unit tests.
I have just finished a project, but I've got a question from my teacher: why does my program (with the same algorithm, same data, and same environment) finish in a different time at different moments?
Can anyone help me?
Example: right now my program runs in 1.03 s, but then it runs in 1.05 s (and sometimes faster, 1.01 s).
That happens because your program is not the only entity executing in the system and it does not get all the resources immediately at all times.
For this reason it's practically of little value to measure short execution times as they are going to vary quite noticeably. Instead, if you're interested in more accurate time measurements, you should execute your code many times and calculate the average time of all runs.
Just an idea here, but could it be because memory usage and CPU usage by the background applications change at different times? I mean, the time difference would only come from:
Memory usage by the other applications
Physical conditions such as CPU heat (the resulting changes in time are really small)
The system clock: if you do random number generation, or any other operation that uses the system clock in the background, that might create the change.
Hope this helps.
Cheers.
That's easy. You capture a system time difference using a counter that's imprecise, as it relies on system resources. There are more programs running in parallel with yours, and some take priority over your code, causing temporary (~20 ms, depending on OS settings) suspension of your thread. Even in DOS there is code that runs quasi-parallel with yours; given that only one thread is possible, your code is stalled while the time keeps ticking (it's governed by that code).
Because Windows is not a real-time operating system. Much other activity can happen while your program is executing, and the CPU shares its cycles with other running processes. Times can change even more if your program needs to read from physical devices such as the disk (databases too) or the network, because those physical resources can be busy serving other requests. Memory can change things as well: if there are page faults, your app needs to read pages back in from virtual memory, and you will see a performance decrease. Since you are using C#, times can also change noticeably from the first execution to the following ones in the same process, because the code is JITted, i.e. compiled from intermediate code to machine code the first time it is seen; after that the compiled form is used, which is dramatically faster.
The assumption is wrong: the environment does not stay the same. The resources available to your program depend on many things, e.g. CPU and memory utilization by other processes (such as background processes) and hard-disk and/or network utilization caused by other processes. Even if no other processes are running, your program changes the internal state of the caches.
In "real world" performance scenarios it is not uncommon to see fluctuations of +/- 20% after "warm up". That is: measure 10 times in a row as "warm up" and discard the results. Measure 10 times more and collect the results. --> +/- 20% is quite common. If you do not warm up you might even see differences several orders of magnitude due to "cold" caches.
Conclusion: your program is very small, uses very few resources, and does not benefit from durable cache mechanisms.
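For completeness, a minimal Stopwatch sketch of the warm-up-then-measure procedure described above; the workload is a placeholder for the code being measured:

    using System;
    using System.Diagnostics;

    Action workload = () => { /* the code being measured */ };

    // Warm-up runs: pay for JIT compilation and populate the caches,
    // then throw the results away.
    for (int i = 0; i < 10; i++)
        workload();

    // Measured runs: collect and average.
    var sw = new Stopwatch();
    double totalMs = 0;
    for (int i = 0; i < 10; i++)
    {
        sw.Restart();
        workload();
        sw.Stop();
        totalMs += sw.Elapsed.TotalMilliseconds;
    }
    Console.WriteLine($"Average over 10 runs: {totalMs / 10:F2} ms");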
I have an application which has to process hundreds of thousands of records. Right now, I can only process 500 of them at a time. Each batch can take up to 5 minutes to process/analyze (a total of ~10 hours of processing). The reason for that limit of 500 records is memory consumption. I think one of the main reasons why our program takes so much memory is the fact that we don't set the size of lists or dictionaries (e.g. new List() instead of new List(100000)). I made the changes to set the size of collections so that .NET stops creating and copying new lists with extra capacity.
Here's my question: how do I prove that one version of a program is more memory efficient than another? Are there performance counters I should look at? Tools? Monitoring?
There are tools like .NET Memory Profiler with which you can get an in-depth analysis of the memory management and memory leaks of a .NET application.
.NET Memory Profiler is a powerful tool for finding memory leaks and optimizing the memory usage in programs written in C#, VB.NET or any other .NET Language. With the help of the profiling guides, the automatic memory analyzer, and specialized trackers, you can make sure that your program has no memory or resource leaks, and that the memory usage is as optimal as possible.
I don't think a profiler would give you a real overview of how much more efficient one version is than the other.
I recommend using memory performance counters to do this. You can set up a few data collection sessions using perfmon, for both of your app's versions.
At first you should monitor only the process memory (of your process, of course). You can also add some of the system-wide memory performance counters, just to see how the overall virtual memory status changes during your process's lifetime.
If there is a difference and you still can't get a good overview and conclusion, you can start digging into the .NET CLR memory performance counters. They are not as easy to analyze as the ones mentioned above, but they are much more detailed.
You should probably run each session for one hour or two in order to get some good data, given the lengthy processing done by your app.
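If you later want to read the same counters from code rather than from perfmon, the PerformanceCounter class exposes them. A minimal sketch, with counter and category names as they appear on an English Windows installation:

    using System;
    using System.Diagnostics;

    string name = Process.GetCurrentProcess().ProcessName;

    // Overall process memory, plus the managed heap specifically.
    using (var privateBytes = new PerformanceCounter("Process", "Private Bytes", name))
    using (var gcHeap = new PerformanceCounter(".NET CLR Memory", "# Bytes in all Heaps", name))
    {
        Console.WriteLine("Private Bytes: " + privateBytes.NextValue().ToString("N0"));
        Console.WriteLine("GC heap:       " + gcHeap.NextValue().ToString("N0"));
    }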
You can use the built-in Performance Profiler tool if you have Visual Studio Ultimate: Debug -> Start Performance Analysis. If you don't have this available, you can use dotTrace by JetBrains, which accomplishes the same thing.
You can also measure execution time by using the Stopwatch class. Stopwatch is specifically designed to measure elapsed time and may (if available on your hardware) provide better granularity/accuracy than DateTime.Now by using an underlying high-frequency hardware timer. By using the Stopwatch class and comparing the two execution times, you can see which version runs faster. This does not give any information about CPU or memory usage.
Using lots of memory does not normally slow you down much until you run out of physical memory and start hitting the swap file. You can check that in Task Manager while your app is running, where you can also see your app's peak and working memory usage.
I would profile a bit (there are several memory profilers available) to see what objects are taking up your memory.
As for the lists: I do not think setting the capacity of lists will fix your problem. The List class grows by doubling its capacity (tested, as I could not find documentation), so in the worst case you are using twice the memory. If you create new lists in abundance, you could create an object cache to re-use the lists, but in my experience this does not help much in .NET, as the runtime is quite efficient.
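To make the capacity point concrete, here is a small sketch of the difference between a grown and a presized list; the grown list allocates a new backing array at each doubling and the old arrays then have to be garbage collected:

    using System.Collections.Generic;

    // Grows through a series of ever-larger backing arrays;
    // each doubling allocates a new array and copies the old one.
    var grown = new List<int>();
    for (int i = 0; i < 100000; i++) grown.Add(i);

    // One right-sized backing array up front, no intermediate copies.
    var sized = new List<int>(100000);
    for (int i = 0; i < 100000; i++) sized.Add(i);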
You can use the CLR GC ETW events. Download PerfView from Microsoft, run your application under it, and then just check the GC page.
Just check "total memory allocation" and "time spent in GC" would give you an idea of how managed memory is used.
If you want more details, check the CLR allocation tick events.
PerfView can also analyse the live objects in your managed heap.
It seems that the ANTS profiler does instrumentation and sampling of code at exactly the same time, which I find very interesting.
I have used the VS profiler and you have to run two different profile sessions to identify bottlenecks - How does ANTS capture IO bound function calls without modifying and injecting code into the compiled functions?
EDIT: Does ANTS use instrumentation or sampling?
The ANTS profiler offers several different profiling modes, some of which use sampling and some of which use instrumentation (the instrumentation modes are only available in the professional edition, and the sampling mode was introduced fairly recently). A brief description of the modes that are available is here, as well as a comparison between the different modes.
RedGate doesn't publish technical details about how their profiler works, but from experimentation I haven't found any significant differences from how other profilers work (just a different user interface, and I'm sure there are optimizations in some areas or fringe features that aren't available in other profilers). Based on your question I'm guessing you're somewhat familiar with other profilers, but if you're interested in how it works on a lower level, here's a brief overview:
In sampling mode, the profiler periodically uses OS interrupts to pause program execution and checks which method the program is currently in. Every method in a binary or intermediate-language assembly consists of an instruction set. When a program is executed, every thread progresses along that instruction set, jumping to a different instruction set location when a method is invoked. The current location of a thread's execution can be thought of as a pointer into this instruction set, and you can find out the address of the instruction set for a given method. So a profiler builds a map from instruction set locations to method names, and when it pauses the program it checks where the current execution is. By mapping that to a method name, it can count how often the method was observed and estimate how long it is taking to run. But since this is only a sample, there may be other methods that were called that we didn't notice, because they returned before we paused the program at the next interval.
In instrumentation mode, the profiler will inject additional instructions into the program's instruction sets. Let's say you have an instruction sequence A->B->C that is invoked when the doSomething() method is called. A very crude profiler could inject additional instructions to do something like this:
    long startTime = currentTime();
    A
    B
    C
    long elapsed = currentTime() - startTime;
This will tell you how much time it took to run the method. Of course, modern profilers inject much more elaborate instructions than this, to minimize overhead, get per-line performance, and gather memory and IO information as well as timing information, etc., but the principle is the same.
Modern OSes also have a decent capability to expose hardware-level diagnostics, so profilers can get more detailed information about most of the system, including memory, disk IO, CPU utilization, etc. How these different systems work is very device- and driver-specific.
Note that this injection can be done at various stages - on the source level, on the binary level before execution, at runtime, etc. Especially with languages like C#, where there is an intermediate stage between compilation and assembly execution (the CLR), it's easier to inject these additional instructions at runtime. It also allows you to surround methods within the internal .NET framework (such as the IO operations that I think you are asking about) with custom instructions at runtime, so that you can get performance information even if you don't have the original source code. This again relies on its ability to build a mapping from instruction sets to method names, but the difference is that you can still inject the additional instructions without having to resort to sampling. I think there are special precautions you can take to make this more difficult, but there's no real incentive for Microsoft to do this to the internals of the .NET framework.
If the Ants Profiler you are referring to is the one from RedGate then this is for .NET runtimes. I suspect that they are using the very extensive API for profiling applications, provided by Microsoft; I suggest you look for ICorProfilerCallback/2/3 and ICorProfilerInfo/2/3 for starters. The API allows for instrumentation and filtered callbacks for method entry/exit calls and other features.
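For context, an ICorProfilerCallback implementation is a COM object that the CLR loads into the target process because of environment variables set before that process starts. A hypothetical sketch of launching an app with a profiler attached; the executable name and CLSID here are placeholders:

    using System.Diagnostics;

    var psi = new ProcessStartInfo("MyApp.exe") { UseShellExecute = false };
    psi.EnvironmentVariables["COR_ENABLE_PROFILING"] = "1";
    // CLSID under which the profiler DLL is COM-registered (placeholder value).
    psi.EnvironmentVariables["COR_PROFILER"] = "{00000000-0000-0000-0000-000000000000}";
    Process.Start(psi);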
Some open source (or code available) profilers of interest I suspect for you based on your query are CLRProfiler4 (Microsoft) and SlimTune.
I've been researching a bit about .NET Performance counters, but couldn't find anything detailed (unlike the rest of the framework).
My question is: which are the most relevant performance counters to get the maximum computing power of the machine? Well, of the .NET virtual machine, that is.
Thank you,
Chuck Mysak
You haven't described what you mean by "computing power". Here are some of the things you can get through performance counters that might be relevant:
Number of SQL queries executed.
Number of IIS requests completed.
Number of distributed transactions committed.
Bytes of I/O performed (disk, network, etc.).
There are also relative numbers, such as the percentages of processor and memory in use, which can give an indication of how much of the "power" of your system is actually being used.
However, you will not find anything that correlates cleanly with raw computing "power". In fact, who is to say that the machine's full "power" is being taken advantage of at the time you look at the counters? It sounds like what you really want to do is run a benchmark, which includes the exact work to be performed and the collection of measurements to be taken. You can search the Internet for various benchmark applications. Some of these run directly in .NET while the most popular are native applications which you could shell out to execute.
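That said, if you do want to sample the relative numbers mentioned above from code, the PerformanceCounter class can read them. A sketch; note that rate counters such as "% Processor Time" return 0 on the first sample, so you have to sample twice:

    using System;
    using System.Diagnostics;
    using System.Threading;

    using (var cpu = new PerformanceCounter("Processor", "% Processor Time", "_Total"))
    {
        cpu.NextValue();      // prime the counter; the first sample of a rate counter is 0
        Thread.Sleep(1000);   // give it an interval to average over
        Console.WriteLine("CPU in use: " + cpu.NextValue().ToString("F1") + " %");
    }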