Time of program at different moments - c#

I have just finished a project, but I've got a question from my teacher: why does my program (with the same algorithm, same data, and same environment) finish in a different time on different runs?
Can anyone help me?
Example: right now my program runs in 1.03 s, but then it runs in 1.05 s (and sometimes faster, 1.01 s).

That happens because your program is not the only entity executing in the system and it does not get all the resources immediately at all times.
For this reason it's practically of little value to measure short execution times as they are going to vary quite noticeably. Instead, if you're interested in more accurate time measurements, you should execute your code many times and calculate the average time of all runs.
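As a minimal sketch of that idea using Stopwatch (MeasuredWork here is just a placeholder for whatever code you actually want to time):
using System;
using System.Diagnostics;

class TimingDemo
{
    static void MeasuredWork()
    {
        // placeholder for the code you actually want to time
        long sum = 0;
        for (int i = 0; i < 1000000; i++) sum += i;
    }

    static void Main()
    {
        const int runs = 100;
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < runs; i++)
            MeasuredWork();
        sw.Stop();
        // averaging over many runs smooths out the scheduling noise from other processes
        Console.WriteLine("Average: {0:F3} ms", sw.ElapsedMilliseconds / (double)runs);
    }
}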

Just an idea here, but could it be due to changes in memory usage and CPU usage by background applications at different times? I mean, the time difference would only come from:
The memory usage of the other applications.
Physical conditions such as CPU heat (the resulting change in time is really small).
The system clock. If you generate random numbers, or do any other operation that uses the system clock in the background, that might create the change.
Hope this helps.
Cheers.

That's easy. You are measuring a difference in system time, using a counter that is itself imprecise because it relies on system resources. There are other programs running in parallel with yours, and some take priority over your code, causing temporary (~20 ms, depending on OS settings) suspension of your thread. Even in DOS there is code that runs quasi-parallel with yours: given that only one thread is possible, your code is stalled while time keeps ticking (the timing is governed by that other code).

Because Windows is not a real-time operating system. Many other activities can happen while your program is executing, and the CPU shares its cycles with other running processes. Times can change even more if your program needs to read from physical devices such as the disk (databases too) or the network, because those physical resources can be busy serving other requests. Memory can change things as well: if there are page faults, your app needs to read pages back in from virtual memory, and you will see a performance decrease. Also, since you are using C#, timings can change noticeably from the first execution to subsequent ones in the same process, because the code is JIT-compiled, i.e. translated from intermediate code to machine code the first time it is seen, and then reused in the compiled form, which is dramatically faster.

The assumption is wrong: the environment does not stay the same. The resources available to your program depend on many things, e.g. CPU and memory utilization by other processes (such as background processes) and hard disk and/or network utilization by other processes. Even if there are no other processes running, your program will change the internal state of the caches.
In "real world" performance scenarios it is not uncommon to see fluctuations of +/- 20% after "warm up". That is: measure 10 times in a row as "warm up" and discard the results, then measure 10 more times and collect the results; +/- 20% is quite common. If you do not warm up, you might even see differences of several orders of magnitude due to "cold" caches.
Conclusion: your program is very small, uses very few resources, and does not benefit from durable caching mechanisms.

Related

What limits debugging output speed?

For control purposes, I print all values in a collection to the debug console, with
Debug.WriteLine(...);
Since I'm also watching the task manager for performance control, I noticed that neither of the 2 CPU cores is under full load while printing. RAM usage also doesn't exceed about 50%.
Both cores have got work to do, so it's not a problem of not having enough tasks to perform
So my question is:
What component or something like that determines the maximum speed at which the debug output can be written?
I would guess that most of the time is spent in I/O operations, i.e. writing to the log file or the console (which might be even more expensive). So the CPU spends the idle time waiting for the hard drive, the GPU, and/or the additional memory operations.
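As an illustration of how much the per-call I/O cost matters, one way to reduce it (not necessarily what you want for interactive debugging) is to build the text in memory and write it in a single call; DumpAll and values are made-up names:
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

static class DebugDump
{
    // 'values' stands in for whatever collection you are inspecting
    public static void DumpAll(IEnumerable<object> values)
    {
        var sb = new StringBuilder();
        foreach (var value in values)
            sb.AppendLine(value.ToString());
        // one Debug.WriteLine call instead of one per element
        Debug.WriteLine(sb.ToString());
    }
}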

Control Memory-Hungy Multi-Threaded App

This a VERY open question.
Basically, I have a computing application that launches test combinations for N Scenarios.
Each test is conducted in a single dedicated thread, and involves reading large binary data, processing it, and dropping results to DB.
If the number of threads is too large, the app goes rogue, eats up all available memory, and hangs.
What is the most efficient way to exploit all the CPU+RAM capabilities (high-performance computing, i.e. 12 cores/16 GB RAM) without bringing the system to its knees (which happens if "too many" simultaneous threads are launched, "too many" being a relative notion of course)?
I should specify that I have a worker buffer queue with N workers; every time one finishes and dies, a new one is launched via the queue. This works pretty well as of now, but I would like to avoid setting the number of simultaneous threads "manually" and "empirically" and instead have an intelligent, scalable system that launches as many threads at a time as the system can properly handle, and stops at a "reasonable" memory usage (the target server is dedicated to the app, so there is no problem regarding other applications except the system itself).
PS: I know that .NET 3.5 comes with thread pools and .NET 4 has interesting TPL capabilities, which I am still considering right now (I have never gone very deep into this so far).
PS 2: After reading this post I was a bit puzzled by the "don't do this" answers. Though I think such a request is fair for a memory-demanding computing program.
EDIT
After reading this post I will try to use the WMI features
None of the built-in threading capabilities in .NET support adjusting according to memory usage. You need to build this yourself.
You can either predict memory usage or react to low memory conditions. Alternatives:
Look at the amount of free memory on the system before launching a new task. If it is below 500 MB, wait until enough has been freed.
Launch tasks as they come and throttle as soon as some of them start to fail because of OOM. Restart them later. This alternative sucks big time because your process will do garbage collections like crazy to avoid the OOMs.
I recommend (1).
You can either look at free system memory or your own process's memory usage. In order to get the memory usage, I recommend looking at private bytes using the Process class.
If you set aside 1 GB of buffer on your 16 GB system, you run at 94% efficiency and are pretty safe.
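A rough sketch of option (1), assuming made-up names (MemoryAwareLauncher, LaunchWhenMemoryAllows, FreeMemoryThresholdMb) and the "Available MBytes" performance counter:
using System;
using System.Diagnostics;
using System.Threading;

class MemoryAwareLauncher
{
    const int FreeMemoryThresholdMb = 500;   // the ~500 MB threshold suggested above; tune for your box

    static readonly PerformanceCounter AvailableMb =
        new PerformanceCounter("Memory", "Available MBytes");

    public static void LaunchWhenMemoryAllows(Action work)
    {
        // wait until the system has enough free memory for another task
        while (AvailableMb.NextValue() < FreeMemoryThresholdMb)
            Thread.Sleep(1000);

        // optionally check our own footprint via private bytes, as mentioned above
        long privateBytes = Process.GetCurrentProcess().PrivateMemorySize64;
        Console.WriteLine("Private bytes before launch: {0} MB", privateBytes / (1024 * 1024));

        new Thread(() => work()).Start();
    }
}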

Threads vs Processes in .NET

I have a long-running process that reads large files and writes summary files. To speed things up, I'm processing multiple files simultaneously using regular old threads:
ThreadStart ts = new ThreadStart(Work);
Thread t = new Thread(ts);
t.Start();
What I've found is that even with separate threads reading separate files and no locking between them and using 4 threads on a 24-core box, I can't even get up to 10% on the CPU or 10% on disk I/O. If I use more threads in my app, it seems to run even more slowly.
I'd guess I'm doing something wrong, but where it gets curious is that if I start the whole exe a second and third time, then it actually processes files two and three times faster. My question is, why can't I get 12 threads in my one app to process data and tax the machine as well as 4 threads in 3 instances of my app?
I've profiled the app and the most time-intensive and frequently called functions are all string processing calls.
It's possible that your computing problem is not CPU bound, but I/O bound. It doesn't help much to state that your disk I/O is "only at 10%"; I'm not sure such a performance counter even exists.
The reason why it gets slower when using more threads is that those threads are all trying to get to their respective files at the same time, while the disk subsystem is having a hard time trying to accommodate all of the different threads. You see, even with a modern technology like SSDs, where the seek time is several orders of magnitude smaller than with traditional hard drives, there's still a penalty involved.
Rather, you should conclude that your problem is disk bound and a single thread will probably be the fastest way to solve your problem.
One could argue that you could use asynchronous techniques to process a bit that's been read while, in the background, the next bit is being read in, but I think you'd see very little performance improvement there.
I had a similar problem not too long ago in a small tool where I wanted to calculate MD5 signatures of all the files on my hard drive, and I found that the CPU is way too fast compared to the storage system; I got similar results when trying to get more performance by using more threads.
Using the Task Parallel Library didn't alleviate this problem.
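If you still want to try the overlap idea, here is a sketch along those lines (ReaderPipeline and ProcessText are made-up names, and this assumes .NET 4's BlockingCollection); it keeps a single thread on the disk while the CPU-bound string work runs elsewhere:
using System.Collections.Concurrent;
using System.IO;
using System.Threading.Tasks;

class ReaderPipeline
{
    public static void Run(string[] files)
    {
        // small bound so the reader never runs far ahead of the processing
        var queue = new BlockingCollection<string>(boundedCapacity: 4);

        // single reader task: keeps the disk access sequential
        var reader = Task.Factory.StartNew(() =>
        {
            foreach (var path in files)
                queue.Add(File.ReadAllText(path));
            queue.CompleteAdding();
        });

        // CPU-bound consumer: the string processing happens here
        foreach (var content in queue.GetConsumingEnumerable())
            ProcessText(content);   // placeholder for the summary work

        reader.Wait();
    }

    static void ProcessText(string content) { /* ... */ }
}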
First of all, on a 24-core box, if you are using only 4 threads the most CPU they could ever use is 16.7%, so really you are getting 60% of that, which is fairly good.
It is hard to tell if your program is I/O bound at this point; my guess is that it is. You need to run a profiler on your project and see which sections of code your project is spending most of its time in. If it is sitting on a read/write operation, it is I/O bound.
It is possible you have some form of inter-thread locking in use. That would cause the program to slow down as you add more threads, and yes, running a second process would fix that, but fixing your locking would too.
What it all boils down to is that without profiling information we cannot say whether using a second process will speed things up or slow them down; we need to know if the program is hanging on an I/O operation, a locking operation, or just taking a long time in a function that could be parallelized better.
I think what you have found is that the file cache is not ideal when one process writes data to many files concurrently. The file cache should sync to disk when the number of dirty cache pages exceeds a threshold, and it seems concurrent writers in one process hit that threshold faster than a single-threaded writer. You can read about the file system cache here: File Cache Performance and Tuning
Try using the Task library from .NET 4 (System.Threading.Tasks). This library has built-in optimizations for different numbers of processors.
I have no clue what your problem is, though, maybe because your code snippet is not really informative.
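For example, a sketch of that Task-library route using Parallel.ForEach with a capped degree of parallelism (ProcessFile is a stand-in for your per-file work, and 4 is an arbitrary cap):
using System.Threading.Tasks;

class ParallelFiles
{
    public static void Run(string[] files)
    {
        var options = new ParallelOptions
        {
            // cap the concurrency so the disk is not hit by too many readers at once
            MaxDegreeOfParallelism = 4
        };

        Parallel.ForEach(files, options, file =>
        {
            ProcessFile(file);   // placeholder for read + summarize
        });
    }

    static void ProcessFile(string file) { /* ... */ }
}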

How many threads to use?

I know there are some existing questions and they provide a very good general perspective on things. I'm hoping to get some details on the C#/VB.Net side for the actual implementation (not philosophy) of some of these perspectives.
My Particular Case
I have a WCF Service which, amongst other things, receives files. For most of the service's life this particular area actually just sits doing nothing - when work does come, it arrives in high bursts of greatly varying quantities.
For each file received (which at a max can be thousands per second) the service needs to work on the files for between 1-10 seconds (each) depending on a number of other services, local resources, and network IO wait times.
To aid the service with these burst workloads I implemented a Queue system. Those thousands of files received per second are placed onto the queue. A controller calculates the number of threads to use based on the size of the queue, up until it reaches a "Peak Max Threads" setting which prevents it from creating additional threads. These threads are placed in a thread pool and reused to cycle through the queue. The controller will, at intervals, recalculate the number of threads required. If the queue size reduces, a relevant number of threads is released.
The age old problem
How many threads should I peak at? Clearly, adding a new thread every time a file is received would be silly, for lack of a better word - the performance, at best, would deteriorate. Capping the threads when CPU utilization is only 10% across each core also doesn't seem to be the best use of resources.
So, is there an appropriate way to determine how many threads to cap at? I would rather the service could determine this for itself by sampling available resources, but is there a performance hit from doing so? I know the common answer is to monitor workloads, adjust the counts through trial and error until I find a number I like, but due to the nature of this service (long periods of idle followed by high/burst workloads) it could take a long time to get that kind of information.
What happens, then, if we move the server's image to a different host which is faster/slower/different from the first? Do I have to re-sample the process all over again?
Ideally what I'm after, is for the co-ordinator to intelligently increase the size of the threadpool until CPU utilisation is at x% (would 80% be reasonable? 90%? 99%?). Clearly, I want to do this without adding more threads than is necessary to hit x% otherwise all I'll end up with is threads not just waiting on IO resources, but awaiting each other too.
Thanks in advance!
Related questions (if you want some generic ideas):
How many threads to create?
How many threads is too many?
How many threads to create and when?
A Complication for you
Where would be the fun if I didn't make the problem more difficult?
As it currently stands, the service does hit 100% CPU during these bursts, regularly. The issue is that the CPU utilisation spikes: it goes from idle (0-10%) to 100%, and back down again. I'm not sure I can help that - ideally I wouldn't take it all the way to 100%. The problem exists because the files mentioned are in fact images, and part of the service's process is to pass each image through to the System.Windows.Media black box, which does some complex image processing for me.
There are then lulls in between the spikes because of the IO waits and other processing that goes on. If the spikes hitting 100% can't be helped (and I'm all for knowing how to prevent that, or if I should) how should I aim for the CPU utilisation graph to look? Sat constantly at 100%? Bouncing between 50-100? If I do go through the effort of sampling to decide what does seem to work best, is it guaranteed that switching the virtual servers' host will also work best with the same graph?
This added complexity I won't take into consideration for those of you willing to answer. Feel free to ignore this section. However, any answer that also accounts for this complication, or even answers that just provide tips on how to handle it, I'll at the very least upvote!
Heck of a long question - sorry about that - and thanks for reading so much!!
PerformanceCounter allows you to query for processor usage.
However, have you tried something the framework provides?
foreach (var file in files)
{
    var workitem = file;
    Task.Factory.StartNew(() =>
    {
        // do work on workitem
    }, TaskCreationOptions.LongRunning | TaskCreationOptions.PreferFairness);
}
You can tune the concurrency level for Tasks in the Task.Factory.
The .NET 4 thread pool will by default schedule the number of threads it finds performs best on the hardware it runs on, but you can change how that works via the previous link.
You probably need a custom solution, but it would be worthwhile to benchmark yours against the standard one.
Edit: (comment note):
No links needed; I may have used an invented term, since English is not my language. What I mean is: keep a variable where you store the delta from the previous check (prevDelta), and call the current one delta. Each time you 'check', add the new delta to the variable averageDelta and divide by 2. averageDelta will mostly stay low, since you have little activity. Then keep another set of delta variables: store the delta (delta - prevDelta) in a variable that is not the average of all deltas, but the average of the deltas over a small timespan (you will have to come up with an algorithm to calculate this temporal variance accurately). Once that is done, you can compare the average delta and the 'temporal delta'. The average delta will be mostly low and will slowly go up when bursts come; over the same period the temporal delta will go up really fast. Then, when the burst stops, the average delta goes down slowly and the 'temporal' one drops really fast.
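A rough interpretation of that idea as code (the class name, member names, smoothing factors, and threshold are all made up for illustration; the long-term average uses a small weight so it moves slowly, while the short-window one reacts fast):
class BurstDetector
{
    double prevValue;
    double averageDelta;     // long-running average: moves slowly
    double temporalDelta;    // short-window average: reacts fast

    // call this on every periodic check with the current measurement (e.g. queue length)
    public bool Sample(double value)
    {
        double delta = value - prevValue;
        prevValue = value;

        // exponential moving averages with different smoothing factors (made-up values)
        averageDelta = averageDelta * 0.95 + delta * 0.05;    // slow: bursts raise it gradually
        temporalDelta = temporalDelta * 0.50 + delta * 0.50;  // fast: bursts raise it immediately

        // a burst shows up as the temporal delta racing ahead of the long-term one
        return temporalDelta > averageDelta * 2.0;
    }
}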
You could use I/O Completion Ports to asynchronously fetch your images without tying up any threads until it comes time to process what you have fetched.
You could then limit your thread pool based on the number of cores on your client PC, making sure to leave a core free for other processes to use.
What about a dynamic thread manager that monitors the threads' overall performance and, based on that, spawns new threads or kills old ones? The main problem here is only how to define the performance measurement function. The rest can be done with a periodically scheduled job that increases or decreases the number of threads according to the previous thread count and the performance measured in that period, or something like that. Maybe also in connection with resource utilization (CPU, disks, network...).
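A very rough sketch of such a manager, using the processor time performance counter; growPool/shrinkPool are hypothetical hooks into your own thread handling, and the thresholds and intervals are arbitrary:
using System;
using System.Diagnostics;
using System.Threading;

class DynamicThreadManager
{
    readonly PerformanceCounter cpu =
        new PerformanceCounter("Processor", "% Processor Time", "_Total");
    Timer timer;   // kept as a field so the timer is not garbage collected

    public void StartMonitoring(Action growPool, Action shrinkPool)
    {
        // periodically scheduled job: adjust the worker count based on measured load
        timer = new Timer(_ =>
        {
            float load = cpu.NextValue();   // note: the very first reading is always 0
            if (load < 60f)
                growPool();                 // headroom available, add a worker
            else if (load > 90f)
                shrinkPool();               // saturated, retire a worker
        }, null, 1000, 5000);
    }
}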

How to simulate different CPU frequency and limit RAM

I have to build a simulator with C#. This simulator should be able to run a second thread with configurable CPU speed and limited RAM size, e.g. 144 MHz and 50 MB.
Of course I know that a simulator can never be as accurate as the real hardware. But I try to get almost similar performance.
At the moment I'm thinking about creating a thread which I will stop/sleep from time to time. Depending on the desired CPU speed, the simulator would adjust the sleep time of this thread and thereby simulate a different CPU frequency. To measure the achieved speed I thought about using PerformanceCounters. But with this approach I have the problem that I don't know how to limit the RAM size the thread could use.
Do you have any ideas how to realize such a simulator?
Thanks in advance!!
Limiting memory is easy with virtual machines like VMware. You can change the CPU speed with some overclocking tools. For example http://cpu.rightmark.org/products/rmclock.shtml
Good luck!
CPU speed limiting? You should check this, perhaps it will be useful (to some degree at least).
CPU Emulation and locking to a specific clock speed
If you are concerned with simulating an operating system environment, then one answer would be to use a virtual machine environment where you can control memory, CPU parameters, etc.
The thread pause/stop approach may help you to simulate CPU frequency, but it is going to be terribly inaccurate: when you pause the thread it is de-scheduled, and then it is up to the operating system to re-schedule it at some "random" point in time, i.e. a point over which you have no control.
As for limiting the memory, an option is to start a new process that will host your code and then limit the memory of that process, e.g.:
http://www.codeproject.com/KB/threads/Setting_Max_Memory_Limit.aspx
This will not really simulate overall OS memory limitations though.
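For what it's worth, here is a duty-cycle sketch of the sleep-based throttling idea from the question; ThrottledWorker and the ratio are made up, and this only approximates "slowness", it does not model a real 144 MHz CPU:
using System;
using System.Diagnostics;
using System.Threading;

class ThrottledWorker
{
    // fraction of the time the thread is allowed to run, e.g. 0.25 = busy about a quarter of the time
    readonly double dutyCycle;

    public ThrottledWorker(double dutyCycle) { this.dutyCycle = dutyCycle; }

    public void Run(Action step)
    {
        var sw = new Stopwatch();
        while (true)
        {
            sw.Restart();
            step();                                   // one slice of the simulated workload
            sw.Stop();

            // sleep long enough that the busy time is roughly 'dutyCycle' of the total
            int sleepMs = (int)(sw.ElapsedMilliseconds * (1 - dutyCycle) / dutyCycle);
            Thread.Sleep(Math.Max(1, sleepMs));
        }
    }
}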
A thread that sleeps to slow down the software execution of your guest opcodes?
I think it works, but a little weirdly, like fast-forward, pause, ff, pause, etc.
If you just want to slow down a process, try this: use the CPU single-step feature and "debug" the process. You have to write a custom handler for the CPU single-stepping trap; your handler's job is only a big loop of NOPs.
That gives you a fine-grained delay between each instruction.
