I have a simple program that's supposed to sort a table, and measure the number of processor ticks needed to do it:
timePerRun = Stopwatch.StartNew();
QuickSortLibrary.Quicksort.QuickSort(tabOfInts, 0, tabOfInts.Length-1);
timePerRun.Stop();
The only problem is that when I'm trying to sort a table of ~15 elements, I get 1-4 ticks. Is it possible that it happens so quickly, or does the stopwatch only measure what happens in this method, not in the one that does the actual sorting?
It is really possible that it happens so quickly. For so few elements, which all fit in the cache, this is a piece of cake for a contemporary CPU.
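If you want a number you can trust at this scale, one option (just a sketch; the run count and the per-run cloning are illustrative and add some overhead of their own) is to repeat the sort many times on fresh copies of the array and average:

const int runs = 100000;
var template = (int[])tabOfInts.Clone();          // keep an unsorted copy to clone from
var sw = Stopwatch.StartNew();
for (int run = 0; run < runs; run++)
{
    var copy = (int[])template.Clone();           // sort a fresh, unsorted array each time
    QuickSortLibrary.Quicksort.QuickSort(copy, 0, copy.Length - 1);
}
sw.Stop();
Console.WriteLine("Average ticks per sort: {0}", (double)sw.ElapsedTicks / runs);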
Here is the code:
using (var context = new AventureWorksDataContext())
{
    IEnumerable<Customer> _customerQuery = from c in context.Customers
                                           where c.FirstName.StartsWith("A")
                                           select c;

    var watch = new Stopwatch();
    watch.Start();
    var result = Parallel.ForEach(_customerQuery, c => Console.WriteLine(c.FirstName));
    watch.Stop();
    Debug.WriteLine(watch.ElapsedMilliseconds);

    watch = new Stopwatch();
    watch.Start();
    foreach (var customer in _customerQuery)
    {
        Console.WriteLine(customer.FirstName);
    }
    watch.Stop();
    Debug.WriteLine(watch.ElapsedMilliseconds);
}
The problem is, Parallel.ForEach takes about 400ms vs a regular foreach, which takes about 40ms. What exactly am I doing wrong and why doesn't this work as I expect it to?
Suppose you have a task to perform. Let's say you're a math teacher and you have twenty papers to grade. It takes you two minutes to grade a paper, so it's going to take you about forty minutes.
Now let's suppose that you decide to hire some assistants to help you grade papers. It takes you an hour to locate four assistants. You each take four papers and you are all done in eight minutes. You've traded 40 minutes of work for 68 total minutes of work including the extra hour to find the assistants, so this isn't a savings. The overhead of finding the assistants is larger than the cost of doing the work yourself.
Now suppose you have twenty thousand papers to grade, so it is going to take you about 40000 minutes. Now if you spend an hour finding assistants, that's a win. You each take 4000 papers and are done in a total of 8060 minutes instead of 40000 minutes, a savings of almost a factor of 5. The overhead of finding the assistants is basically irrelevant.
Parallelization is not free. The cost of splitting up work amongst different threads needs to be tiny compared to the amount of work done per thread.
Further reading:
Amdahl's law
Gives the theoretical speedup in latency of the execution of a task at fixed workload, that can be expected of a system whose resources are improved.
Gustafson's law
Gives the theoretical speedup in latency of the execution of a task at fixed execution time, that can be expected of a system whose resources are improved.
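For reference, the standard formulations of the two laws (not quoted from the answer above), where p is the fraction of the work that can be parallelized and s is the speedup of the improved part:

% Amdahl's law (fixed workload)
S_{\text{latency}}(s) = \frac{1}{(1 - p) + \frac{p}{s}}

% Gustafson's law (fixed execution time)
S_{\text{latency}}(s) = (1 - p) + s p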
The first thing you should realize is that not all parallelism is beneficial. Parallelism carries a certain amount of overhead, and that overhead may or may not be significant depending on the complexity of what is being parallelized. Since the work in your parallel function is very small, the overhead of the management the parallelism has to do becomes significant, thus slowing down the overall work.
The additional overhead of creating all the threads for your enumerable, versus just executing the enumerable, is more than likely the cause of the slowdown. Parallel.ForEach is not a blanket performance-increasing move; you need to weigh whether the operation to be completed for each element is likely to block.
For example, if you were to make a web request or something instead of simply writing to the console, the parallel version might be faster. As it is, writing to the console is a very fast operation, so the overhead of creating and starting the threads is going to outweigh the gain.
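To illustrate the point, here is a hedged sketch (the item count and the 50 ms delay are arbitrary stand-ins for a blocking call such as a web request, not measurements from the code above); with per-item work this heavy, the parallel version usually comes out ahead:

var items = Enumerable.Range(0, 100).ToList();

var sw = Stopwatch.StartNew();
Parallel.ForEach(items, i => Thread.Sleep(50));   // items are processed concurrently
sw.Stop();
Console.WriteLine("Parallel:   {0} ms", sw.ElapsedMilliseconds);

sw = Stopwatch.StartNew();
foreach (var i in items) Thread.Sleep(50);        // items are processed one after another
sw.Stop();
Console.WriteLine("Sequential: {0} ms", sw.ElapsedMilliseconds);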
As a previous writer said, there is some overhead associated with Parallel.ForEach, but that is not why you can't see a performance improvement. Console.WriteLine is a synchronized operation, so only one thread is writing at a time. Try changing the body to something non-blocking and you should see a performance increase (as long as the amount of work in the body is big enough to outweigh the overhead).
I like salomon's answer and would like to add that you also have the additional overhead of:
Allocating delegates.
Calling through them.
I have an application which monitors a particular event and then starts to calculate things once it happens. Events are irregular and can come in any pattern, from bunches within a second to none for a long time.
I want to measure the percentage of time the application is busy (similar to CPU % usage).
I want to use a Timer100Ns counter.
Two questions:
Do I increment it by hardware ticks or by DateTime ticks (e.g. if I use Stopwatch, do I use sw.ElapsedTicks or sw.Elapsed.Ticks)?
Do I need a base counter for it?
So I am about to write something like this:
Stopwatch sw = new Stopwatch();
sw.Start();
// Do some operation which is irregular by nature
sw.Stop();
// Measure utilization of the application
myCounterOfTypeTimer100Ns.IncrementBy(sw.Elapsed.Ticks);
Will it do?
EDIT: I experimented with it a bit and now it's even more confusing. It actually shows the values I increment it by, not a percentage.
The mystery unravelled. It turns out I wasn't using it the way it was supposed to be used (or rather I didn't read TFM properly). If the sampling interval is 1 s (as in the perfmon live window) and your intervals are longer than 1 s, it shows you a nonsense number. To get smooth readings, the activity you are trying to measure must really take fractions of a second; otherwise this counter is not a good idea.
The answer for this kind of problem (although it's not obvious, and it's still disturbing that nobody suggested it in a week) is actually SampleCounter.
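If the built-in counter types keep getting in the way, an alternative (a sketch with illustrative names, not something from the thread above) is to track the busy fraction manually with two stopwatches and report the ratio:

// Accumulates busy time across Start/Stop cycles and compares it to wall-clock time.
// Not thread-safe; add locking if events can overlap.
static readonly Stopwatch _busy = new Stopwatch();
static readonly Stopwatch _total = Stopwatch.StartNew();

static void OnEvent()
{
    _busy.Start();
    // ... do the irregular calculation here ...
    _busy.Stop();
}

static double BusyPercent()
{
    // Stopwatch keeps accumulating elapsed time across Start/Stop pairs.
    return 100.0 * _busy.ElapsedTicks / _total.ElapsedTicks;
}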
I would like to understand why the first iteration in the loop executes quicker than the rest.
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 10; i++)
{
    System.Threading.Thread.Sleep(100);
    Console.WriteLine("Finished at : {0}", ((double)sw.ElapsedTicks / Stopwatch.Frequency) * 1e3);
}
When I execute the code, the first iteration consistently reports a noticeably shorter time than the later ones, which come out roughly 100 ms apart.
Initially I thought it could be due to the accuracy of the Stopwatch class, but then why would it affect only the first iteration? Correct me if I'm missing something.
This is a very flawed benchmark. For one, Thread.Sleep does not guarantee that you'll sleep for exactly 100 ms. Try much longer sleeps and you'll see more consistent results.
So it might even just be scheduling - the later iterations are always just doing sleep after sleep. Since Sleep works off the system interrupt clock, the sleeps after the first should take a similar amount of time, while the first has to "sync up" with the clock first.
If you add another sleep before the cycle (and before starting the stopwatch), you'll likely get closer times for each of the iterations.
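For instance, a minimal variation of the loop from the question with such a warm-up sleep (illustrative only):

Thread.Sleep(100);                  // warm-up: lets the sleep "sync up" with the interrupt clock
Stopwatch sw = Stopwatch.StartNew();
for (int i = 0; i < 10; i++)
{
    Thread.Sleep(100);
    Console.WriteLine("Finished at : {0}", ((double)sw.ElapsedTicks / Stopwatch.Frequency) * 1e3);
}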
Or even better, don't use sleeps. If you use some actual CPU work instead, you'll avoid thread switches (provided you've got enough CPU to do that) and many other costs not associated with the cycle itself. For example:
Stopwatch sw = new Stopwatch();
sw.Start();
for (int i = 0; i < 10; i++)
{
    Thread.SpinWait(10000000);
    Console.WriteLine("Finished at : {0}", ((double)sw.ElapsedTicks / Stopwatch.Frequency) * 1e3);
}
This will give you much more consistent results, because it doesn't depend on the clock at all.
There are many other things that can complicate a benchmark like this, which is why benchmarks simply aren't done this way. There will always be deviations, and they can get rather big, especially on a system with a lot of work.
In other words, if you're getting differences in CPU work execution time on the scale of milliseconds, someone is stealing your work. There's nothing in a modern CPU that would account for such a huge difference just based on e.g. i++ being there or not.
I could describe a lot more issues with your code, but it probably isn't worth it. Just google for some best practices on CPU work benchmarking in C#, and you'll get much more worth out of it.
Oh, and just to help hammer the point home: on my computer, the first sleep tends to take anywhere from 99 ms up to 100 ms. That would be highly unusual, since the default timer resolution is 15.6 ms rather than 1 ms, but the culprit is easily found - Chrome sets it to 1 ms. Ouch.
What you're outputting for times is the total time elapsed since the start, so times increasing by about 100 ms are exactly what you should expect.
But when you use Thread.Sleep you're giving up control of the thread, and you only get it back after something close to (but not exactly) the time you've specified. That time will be in multiples of the system quantum, so what you specify cannot possibly be exact. If other threads of higher priority are doing work, it's less likely that your thread will be given processor time at a granularity close to the time you've asked for.
I need to display a set of signals. Each signal is defined by millions of samples. Just processing the collection of samples (converting samples to points according to the bitmap size) takes a significant amount of time (especially during scrolling).
So I implemented some kind of downsampling: I just skip some points - take every 2nd, every 3rd, or every 50th point, depending on the signal's characteristics. It speeds things up a great deal but significantly distorts the signal's shape.
Are there any smarter approaches?
We've had a similar issue in a recent application. Our visualization (a simple line graph) became too cluttered when zoomed out to see the full extent of the data (about 7 days of samples, with a sample taken roughly every 6 seconds), so down-sampling was actually the way to go. If we hadn't done that, zooming out wouldn't have had much meaning, as all you would have seen was a big blob of lines smeared over the screen.
It all depends on how you are going to implement the down-sampling. There are two (simple) approaches: down-sample at the moment you receive a sample, or down-sample at display time.
What really gives a huge performance boost in both of these cases is the proper selection of your data-sources.
Let's say you have 7 million samples, and your viewing window is just interested in the last million points. If your implementation depends on an IEnumerable, this means that the IEnumerable will have to MoveNext 6 million times before actually starting. However, if you're using something which is optimized for random reads (a List comes to mind), you can implement your own enumerator for that, more or less like this:
public IEnumerator<T> GetEnumerator(int start, int count, int skip)
{
    // Assume we have a field in the class which contains the data as a List<T>, named _data.
    // Yield up to 'count' samples, starting at 'start' and taking every 'skip'-th one.
    for (int i = start, taken = 0; taken < count && i < _data.Count; i += skip, taken++)
    {
        yield return _data[i];
    }
}
Obviously this is a very naive implementation, but you can do whatever you want within the for-loop (for example, use an algorithm that averages the surrounding samples). Be aware, though, that averaging like this will usually smooth out any extreme spikes in your signal, so be wary of that.
Another approach would be to create some generalized versions of your dataset for different ranges, which update themselves whenever you receive a new signal. You usually don't need to update the complete dataset; just updating the end of your set is probably good enough. This allows you to do a bit more advanced processing of your data, but it will cost more memory: you will have to cache the distinct 'layers' of detail in your application.
However, reading your (short) explanation, I think a display-time optimization might be good enough. You will always get some distortion in your signal if you generalize; you always lose data. It's up to the algorithm you choose how this distortion occurs and how noticeable it is.
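If spikes matter, one common display-time approach (a sketch of min/max bucketing, not something from the answer above; names are illustrative) is to emit the minimum and maximum of each bucket instead of a single skipped sample:

// For each bucket of samples that maps to one pixel column, keep the min and max.
// This preserves spikes that plain "take every Nth sample" skipping would drop.
// Note: it does not preserve whether the min or the max came first within a bucket.
public static IEnumerable<float> MinMaxDownsample(IList<float> data, int bucketSize)
{
    for (int start = 0; start < data.Count; start += bucketSize)
    {
        int end = Math.Min(start + bucketSize, data.Count);
        float min = data[start], max = data[start];
        for (int i = start + 1; i < end; i++)
        {
            if (data[i] < min) min = data[i];
            if (data[i] > max) max = data[i];
        }
        yield return min;
        yield return max;
    }
}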
You need a better sampling algorithm; you can also employ the parallel processing features of C#. Refer to the Task Parallel Library.
I want to get the maximum count I have to execute a loop for it to take x milliseconds to finish.
For example:
int GetIterationsForExecutionTime(int ms)
{
    int count = 0;

    /* pseudocode
    do
        some code here
        count++;
    until executionTime > ms
    */

    return count;
}
How do I accomplish something like this?
I want to get the maximum count I have to execute a loop for it to take x milliseconds to finish.
First off, simply do not do that. If you need to wait a certain number of milliseconds do not busy-wait in a loop. Rather, start a timer and return. When the timer ticks, have it call a method that resumes where you left off. The Task.Delay method might be a good one to use; it takes care of the timer details for you.
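A minimal sketch of that timer-based approach using Task.Delay (assuming the caller can be made asynchronous; the method name is illustrative):

// Instead of spinning in a loop for x milliseconds, yield and resume later.
async Task DoAfterDelayAsync(int ms)
{
    await Task.Delay(ms);   // no thread is burned while waiting
    // ... resume where you left off ...
}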
If your question is actually about how to time the amount of time that some code takes then you need much more than simply a good timer. There is a lot of art and science to getting accurate timings.
First you should always use Stopwatch and never use DateTime.Now for these timings. Stopwatch is designed to be a high-precision timer for telling you how much time elapsed. DateTime.Now is a low-precision timer for telling you if it is time to watch Doctor Who yet. You wouldn't use a wall clock to time an Olympic race; you'd use the highest precision stopwatch you could get your hands on. So use the one provided for you.
Second, you need to remember that C# code is compiled Just In Time. The first time you go through a loop can therefore be hundreds or thousands of times more expensive than every subsequent time due to the cost of the jitter analyzing the code that the loop calls. If you are intending on measuring the "warm" cost of a loop then you need to run the loop once before you start timing it. If you are intending on measuring the average cost including the jit time then you need to decide how many times makes up a reasonable number of trials, so that the average works out correctly.
Third, you need to make sure that you are not wearing any lead weights when you are running. Never make performance measurements while debugging. It is astonishing the number of people who do this. If you are in the debugger then the runtime may be talking back and forth with the debugger to make sure that you are getting the debugging experience you want, and that chatter takes time. The jitter is generating worse code than it normally would, so that your debugging experience is more consistent. The garbage collector is collecting less aggressively. And so on. Always run your performance measurements outside the debugger, and with optimizations turned on.
Fourth, remember that virtual memory systems impose costs similar to those of jitters. If you are already running a managed program, or have recently run one, then the pages of the CLR that you need are likely "hot" -- already in RAM -- where they are fast. If not, then the pages might be cold, on disk, and need to be page faulted in. That can change timings enormously.
Fifth, remember that the jitter can make optimizations that you do not expect. If you try to time:
// Let's time addition!
for (int i = 0; i < 1000000; ++i) { int j = i + 1; }
the jitter is entirely within its rights to remove the entire loop. It can realize that the loop computes no value that is used anywhere else in the program and remove it entirely, giving it a time of zero. Does it do so? Maybe. Maybe not. That's up to the jitter. You should measure the performance of realistic code, where the values computed are actually used somehow; the jitter will then know that it cannot optimize them away.
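One way to keep the jitter from discarding the loop (a sketch, not from the answer above) is to make it produce a value that the program actually consumes:

// The sum is printed after the loop, so the jitter cannot remove the additions as dead code.
long sum = 0;
var sw = Stopwatch.StartNew();
for (int i = 0; i < 1000000; ++i) { sum += i + 1; }
sw.Stop();
Console.WriteLine("{0} ms (sum = {1})", sw.ElapsedMilliseconds, sum);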
Sixth, timings of tests which create lots of garbage can be thrown off by the garbage collector. Suppose you have two tests, one that makes a lot of garbage and one that makes a little bit. The cost of the collection of the garbage produced by the first test can be "charged" to the time taken to run the second test if by luck the first test manages to run without a collection but the second test triggers one. If your tests produce a lot of garbage then consider (1) is my test realistic to begin with? It doesn't make any sense to do a performance measurement of an unrealistic program because you cannot make good inferences to how your real program will behave. And (2) should I be charging the cost of garbage collection to the test that produced the garbage? If so, then make sure that you force a full collection before the timing of the test is done.
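The usual idiom for forcing a full collection before the timed region looks something like this (a sketch):

// Collect, wait for finalizers, and collect again so earlier garbage is not charged to this test.
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
var sw = Stopwatch.StartNew();
// ... run the test ...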
Seventh, you are running your code in a multithreaded, multiprocessor environment where threads can be switched at will, and where the thread quantum (the amount of time the operating system will give another thread until yours might get a chance to run again) is about 16 milliseconds. 16 milliseconds is about fifty million processor cycles. Coming up with accurate timings of sub-millisecond operations can be quite difficult if the thread switch happens within one of the several million processor cycles that you are trying to measure. Take that into consideration.
var sw = Stopwatch.StartNew();
...
long elapsedMilliseconds = sw.ElapsedMilliseconds;
You could also use the Stopwatch class:
int GetIterationsForExecutionTime(int ms)
{
    int count = 0;
    Stopwatch stopwatch = new Stopwatch();
    stopwatch.Start();

    do
    {
        // some code here
        count++;
    } while (stopwatch.ElapsedMilliseconds < ms);

    stopwatch.Stop();
    return count;
}
Good points from Eric Lippert. I've been benchmarking and unit testing for a while, and I'd advise you to discard every first pass through your code because of JIT compilation.
So in benchmarking code that uses a loop and a Stopwatch, remember to put this at the end of the loop body:
// JIT optimization.
if (i == 0)
{
    // Discard every result you've collected.
    // And restart the timer.
    stopwatch.Restart();
}