What limits debugging output speed? (C#)

For control purposes, I print all values in a collection to the debug console, with
Debug.WriteLine(...);
Since I'm also watching Task Manager to check performance, I noticed that neither of the two CPU cores is under full load while printing, and RAM usage doesn't exceed about 50% either.
Both cores have work to do, so it's not a problem of having too few tasks to perform.
So my question is:
What component determines the maximum speed at which the debug output can be written?

I would guess that most of the time is spent in I/O operations, i.e. writing to the log file or to the console (which may be even more expensive). The CPU then spends the idle time waiting for the hard drive, the GPU, and/or the additional memory operations.
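If per-call I/O overhead really is the limit, batching the output into one large write should be noticeably faster than one Debug.WriteLine per element. A minimal sketch, assuming a collection named values (Debug and StringBuilder are the real System.Diagnostics/System.Text types):

var sb = new System.Text.StringBuilder();
foreach (var v in values)
    sb.AppendLine(v.ToString());   // build the whole dump in memory first
System.Diagnostics.Debug.WriteLine(sb.ToString());   // pay the listener round-trip only once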

Related

Lock CPU resources in production critical applications (safety overhead)

When coding an app for Windows (C++, C#), is there a way to lock a certain amount of CPU percentage, cores, or threads so they cannot be used by other programs or processes while said app is running? I know you can tinker with CPU priority and affinity in Task Manager, but I don't know whether that prevents other programs from 'stealing' CPU power when they need it.
The app is very CPU-intensive and depends on 'real-time' operation, so when usage reaches 100%, the CPU cannot deal with all the load and errors occur.
So ideally the code would make sure that, if the app is currently working nicely and using 80% of the CPU, no other process would ever be allowed to take the remaining 20% (allowing only 10% usage, for example). I guess you could call that a 'safety overhead'? I hope I made myself clear.
I am trying to figure out whether such a concept exists at all; I couldn't be sure of the keywords or find a thread to start pulling on.
If that is not possible in Windows C++/C#, is it a thing in other environments?
Thanks!
To my knowledge there is no nice way of doing such things, especially since what you are effectively asking for is a custom scheduler, and as schedulers are usually hard-coded into the operating system I don't see much hope for you.
If real-time functionality is your main concern, I would either recommend using a real-time-operating-system or optimizing your software, so it doesn't need 80% of your CPU, if possible. You could also just upgrade your CPU (if money is not really a concern), so it can handle your software.
Under other operating systems there are ways to encourage the scheduler to favor your software (look up "nice value"), but that's similar to changing the priority in Task Manager, just on steroids.
I also remember from my operating systems lecture that there are operating systems that allow the scheduler to be modified, this might be further than you wanted to go, but that is a possibility if you become desperate enough.
And as my last idea: if you have something really computationally intense, it is often doing the same thing over and over again. Assuming that these steps are (partially) independent of each other, it can be a massive performance gain to move work from the CPU to the GPU; from my experience (n=1), saving 50% is possible. From C++ with an Nvidia GPU you want to look up CUDA; for everything else you likely want OpenCL.
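For what the priority/affinity tweaking mentioned above is worth, it can at least be done from code rather than Task Manager. A minimal sketch using the System.Diagnostics APIs (the affinity mask 0x3, meaning cores 0 and 1, is just an example value):

using System;
using System.Diagnostics;

var me = Process.GetCurrentProcess();
me.PriorityClass = ProcessPriorityClass.High;   // ask the scheduler to favor this process
me.ProcessorAffinity = (IntPtr)0x3;             // pin to cores 0 and 1 (example mask)

Note that this only biases the scheduler; it does not reserve the cores, so other processes can still run on them.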

Run time of a program at different moments

I have just finished a project, but I've got a question from my teacher: why does my program (with the same algorithm, same data, same environment) finish in a different time at different moments?
Can anyone help me?
Example: now my program runs for 1.03 s,
but then it runs for 1.05 s (and sometimes faster, 1.01 s).
That happens because your program is not the only entity executing in the system and it does not get all the resources immediately at all times.
For this reason it's practically of little value to measure short execution times, as they are going to vary quite noticeably. Instead, if you're interested in more accurate time measurements, you should execute your code many times and calculate the average time over all runs.
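A minimal sketch of that measurement protocol using System.Diagnostics.Stopwatch (the Work method and the run counts are placeholders):

var sw = new System.Diagnostics.Stopwatch();
for (int i = 0; i < 10; i++) Work();        // warm-up runs (JIT, caches); discard these

const int runs = 100;
sw.Start();
for (int i = 0; i < runs; i++) Work();      // measured runs
sw.Stop();

Console.WriteLine($"avg: {sw.Elapsed.TotalMilliseconds / runs:F3} ms");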
Just an idea here, but could it be because memory usage and CPU usage by background applications change over time? I mean, the time difference would only come from:
The memory usage of the other applications.
Physical conditions such as CPU heat (the resulting change in time is really small).
And the system clock: if you do random number generation or any operation that uses the system clock in the background, that might create such a change.
Hope this helps.
Cheers.
That's easy. You capture a system time difference using a counter that is imprecise because it uses system resources. There are other programs running in parallel with yours, and some take priority over your code, causing temporary (~20 ms, depending on OS settings) suspension of your thread. Even under DOS there is code that runs quasi-parallel to yours; given that only one thread is possible, your code is stalled while the time is still ticking (it's governed by that code).
Because Windows is not a real-time operating system. Much other activity can happen while your program is executing, and the CPU shares its cycles with other running processes. Time can vary even more if your program needs to read from physical devices such as disks (databases too) or the network, because the physical resource can be busy serving other requests. Memory can change things as well: if there are page faults, your app needs to read pages back from virtual memory, and you will see a performance decrease. Since you are using C#, time can change noticeably from the first execution to the following ones in the same process, because the code is JITted, i.e. compiled from intermediate code to assembly the first time it is seen; afterwards the compiled form is used, which is dramatically faster.
The assumption is wrong: the environment does not stay the same. The resources available to your program depend on many things, e.g. CPU and memory utilization by other processes (such as background processes), and hard disk and/or network utilization due to other processes. Even if no other processes are running, your program changes the internal state of the caches.
In "real world" performance scenarios it is not uncommon to see fluctuations of +/- 20% after "warm-up". That is: measure 10 times in a row as warm-up and discard the results; measure 10 times more and collect the results. +/- 20% is quite common. If you do not warm up, you might even see differences of several orders of magnitude due to "cold" caches.
Conclusion: your program is very small, uses very few resources, and does not benefit from durable cache mechanisms.

Control Memory-Hungry Multi-Threaded App

This is a VERY open question.
Basically, I have a computing application that launches test combinations for N Scenarios.
Each test is conducted in a single dedicated thread, and involves reading large binary data, processing it, and dropping results to DB.
If the number of threads is too large, the app goes rogue, eats all available memory, and hangs.
What is the most efficient way to exploit all CPU+RAM capabilities (high-performance computing, i.e. 12 cores/16 GB RAM) without bringing the system to its knees (which happens if "too many" simultaneous threads are launched, "too many" being a relative notion of course)?
I have to specify that I have a worker buffer queue with N workers; every time one finishes and dies, a new one is launched via the queue. This works pretty well as of now. But I would like to avoid "manually" and "empirically" setting the number of simultaneous threads, and instead have an intelligent, scalable system that launches as many threads at a time as the system can properly handle and stops at a "reasonable" memory usage (the target server is dedicated to the app, so there is no problem regarding other applications except the system itself).
PS: I know that .NET 3.5 comes with thread pools and .NET 4 has interesting TPL capabilities, which I am still considering right now (I never went very deep into this so far).
PS 2: After reading this post I was a bit puzzled by the "don't do this" answers. Though I think such a request is fair for a memory-demanding computing program.
EDIT
After reading this post I will try to use WMI features.
All built-in threading capabilities in .NET do not support adjusting according to memory usage. You need to build this yourself.
You can either predict memory usage or react to low-memory conditions. Two alternatives:
1. Look at the amount of free memory on the system before launching a new task. If it is below 500 MB, wait until enough has been freed.
2. Launch tasks as they come and throttle as soon as some of them start to fail because of OOM, then restart them later. This alternative sucks big time, because your process will do garbage collections like crazy trying to avoid the OOMs.
I recommend (1).
You can either look at free system memory or at your own process's memory usage. To get the latter, I recommend looking at private bytes using the Process class.
If you set aside 1 GB of buffer on your 16 GB system, you run at 94% efficiency and are pretty safe.
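A minimal sketch of alternative (1), gating new workers on available system memory via a performance counter (the 500 MB threshold, the poll interval, and StartNextWorker are placeholders):

using System.Diagnostics;
using System.Threading;

// "Memory" / "Available MBytes" is a standard Windows counter.
var freeMem = new PerformanceCounter("Memory", "Available MBytes");

void LaunchWhenMemoryAllows()
{
    while (freeMem.NextValue() < 500)   // below the safety threshold?
        Thread.Sleep(1000);             // wait for memory to be freed
    StartNextWorker();                  // placeholder: dequeue and start a worker
}

This can be combined with the private-bytes check on your own process via Process.GetCurrentProcess().PrivateMemorySize64.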

Threads vs Processes in .NET

I have a long-running process that reads large files and writes summary files. To speed things up, I'm processing multiple files simultaneously using regular old threads:
ThreadStart ts = new ThreadStart(Work); // Work reads one file and writes its summary
Thread t = new Thread(ts);
t.Start();
What I've found is that even with separate threads reading separate files and no locking between them and using 4 threads on a 24-core box, I can't even get up to 10% on the CPU or 10% on disk I/O. If I use more threads in my app, it seems to run even more slowly.
I'd guess I'm doing something wrong, but where it gets curious is that if I start the whole exe a second and third time, then it actually processes files two and three times faster. My question is, why can't I get 12 threads in my one app to process data and tax the machine as well as 4 threads in 3 instances of my app?
I've profiled the app and the most time-intensive and frequently called functions are all string processing calls.
It's possible that your computing problem is not CPU-bound but I/O-bound. It doesn't help to state that your disk I/O is "only at 10%"; I'm not sure such a performance counter even exists.
The reason it gets slower with more threads is that those threads are all trying to get to their respective files at the same time, while the disk subsystem is having a hard time trying to accommodate all of the different threads. You see, even with a modern technology like SSDs, where the seek time is several orders of magnitude smaller than with traditional hard drives, there's still a penalty involved.
Rather, you should conclude that your problem is disk bound and a single thread will probably be the fastest way to solve your problem.
One could argue that you could use asynchronous techniques to process a bit that's been read, while on the background the next bit is being read in, but I think you'll see very little performance improvement there.
I had a similar problem not too long ago in a small tool where I wanted to calculate MD5 signatures of all the files on my hard drive, and I found that the CPU is way too fast compared to the storage system; I got similar results when trying to get more performance by using more threads.
Using the Task Parallel Library didn't alleviate this problem.
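For what the asynchronous overlap idea mentioned above is worth, a minimal sketch with two buffers, reading the next chunk while the current one is processed (ProcessChunk, the buffer size, and the method name are placeholders):

using System.IO;
using System.Threading.Tasks;

static void ProcessChunk(byte[] buffer, int count) { /* string/summary work here */ }

static async Task SummarizeAsync(string path)
{
    using var fs = new FileStream(path, FileMode.Open, FileAccess.Read,
                                  FileShare.Read, 1 << 16, useAsync: true);
    var front = new byte[1 << 16];
    var back = new byte[1 << 16];

    int read = await fs.ReadAsync(front, 0, front.Length);
    while (read > 0)
    {
        Task<int> next = fs.ReadAsync(back, 0, back.Length); // kick off the next read
        ProcessChunk(front, read);                           // CPU work overlaps the I/O
        read = await next;
        (front, back) = (back, front);                       // swap buffers
    }
}

As noted above, though, if the disk is the bottleneck this mostly hides CPU time inside I/O time, so the gain is limited.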
First of all, on a 24-core box, if you are using only 4 threads the most CPU they could ever use is 16.7%, so really you are getting 60% utilization of those threads, which is fairly good.
It is hard to tell whether your program is I/O-bound at this point; my guess is that it is. You need to run a profiler on your project and see in which sections of code your project spends most of its time. If it is sitting in read/write operations, it is I/O-bound.
It is possible you have some form of inter-thread locking being used. That would cause the program to slow down as you add more threads, and yes, running a second process would fix that, but fixing your locking would too.
What it all boils down to is that without profiling information we cannot say whether using a second process will speed things up or slow them down; we need to know whether the program is hanging on an I/O operation, on a locking operation, or just taking a long time in a function that could be parallelized better.
I think what you are seeing is that the file cache is not ideal when one process writes data to many files concurrently. The file cache syncs to disk when the number of dirty pages exceeds a threshold, and it seems that concurrent writers in one process hit that threshold faster than a single-threaded writer would. You can read about the file system cache here: File Cache Performance and Tuning
Try using the Task library from .NET 4 (System.Threading.Tasks). It has built-in optimizations for different numbers of processors.
I have no clue what your problem is, though; your code snippet is not really informative.
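A minimal sketch of what that could look like, capping the parallelism explicitly (files, ProcessFile, and the degree of 4 are placeholders):

using System.Threading.Tasks;

Parallel.ForEach(files,
    new ParallelOptions { MaxDegreeOfParallelism = 4 },  // cap concurrent workers
    file => ProcessFile(file));                          // placeholder per-file work

Given the symptoms above, though, measure first: if the job is disk-bound, more parallelism will not help.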

How to simulate different CPU frequency and limit RAM

I have to build a simulator in C#. This simulator should be able to run a second thread with a configurable CPU speed and a limited RAM size, e.g. 144 MHz and 50 MB.
Of course I know that a simulator can never be as accurate as the real hardware, but I'm trying to get reasonably close performance.
At the moment I'm thinking about creating a thread which I will stop/sleep from time to time. Depending on the desired CPU speed, the simulator should adjust the sleep time of this thread and thereby simulate different CPU frequencies. To measure the achieved speed I thought about using PerformanceCounters. But with this approach I have the problem that I don't know how to limit the RAM size the thread can use.
Do you have any ideas how to realize such a simulator?
Thanks in advance!!
Limiting memory is easy with virtual machines like VMware. You can change the CPU speed with some over-/underclocking tools, for example http://cpu.rightmark.org/products/rmclock.shtml
Good luck!
CPU speed limiting? You should check this; perhaps it will be useful (to some degree at least):
CPU Emulation and locking to a specific clock speed
If you are concerned with simulating an operating-system environment, then one answer would be to use a virtual machine environment in which you can control memory, CPU parameters, etc.
Pausing/stopping the thread may help you simulate a lower CPU frequency, but it is going to be terribly inaccurate: when you pause the thread it is de-scheduled, and it is then up to the operating system to re-schedule it at some "random" point in time, i.e. a point over which you have no control.
As for limiting the memory, one option is to start a new process that hosts your code and then limit the memory of that process, e.g.:
http://www.codeproject.com/KB/threads/Setting_Max_Memory_Limit.aspx
This will not really simulate overall OS memory limitations though.
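Despite the scheduling inaccuracy noted above, the sleep-based duty-cycle idea from the question can at least be sketched: a cooperative throttle that the simulated thread calls from its main loop. The 10 ms slice and the duty-cycle value are assumed example values, not calibrated ones:

using System.Diagnostics;
using System.Threading;

class CpuThrottle
{
    private readonly double _dutyCycle;              // fraction of wall time to run, e.g. 0.1
    private readonly Stopwatch _sw = Stopwatch.StartNew();

    public CpuThrottle(double dutyCycle) { _dutyCycle = dutyCycle; }

    // Call this regularly from the simulated thread's main loop.
    public void Throttle()
    {
        if (_sw.ElapsedMilliseconds < 10) return;    // run in ~10 ms slices
        double ran = _sw.Elapsed.TotalMilliseconds;
        Thread.Sleep((int)(ran * (1 - _dutyCycle) / _dutyCycle)); // sleep away the rest
        _sw.Restart();
    }
}

As the answer says, Thread.Sleep only guarantees a minimum delay, so the effective speed will jitter with the OS scheduler.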
A thread that sleeps to slow down the software execution of your guest opcodes? I think it works, but it feels a little weird: like fast-forward, pause, fast-forward, pause, and so on.
If you just want to slow down a process, try this: use the CPU's single-stepping feature and "debug" the process. You have to write a custom handler for the single-stepping trap, whose only job is a big loop of NOPs.
That gives you a fine-grained delay between each instruction.
