The Windows Task Manager is fine for checking the CPU and memory usage of an application, but our program has several threads and we want to know how much of the total each thread accounts for.
We would like to check this with an external program and also from the application itself at runtime. It would be great if a thread could report its own memory and CPU usage.
Here's an example:
You have ThreadA and ThreadB.
ThreadA creates an object X.
ThreadB uses this object.
So what do you want to see in the thread's information: who created the object, or who is using it?
The only thing you can see is how much CPU time each thread is using, AFAIK.
In any case, the only program I know of that shows the most detailed information about a process is Process Explorer: http://technet.microsoft.com/en-us/sysinternals/bb896653
You can use Performance Monitor to see how much memory is allocated to a process, but you cannot see the same for the individual threads inside it.
However, you could create custom performance counters from within your code to expose any value you want to monitor, roughly as in the sketch below.
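For illustration, here is a minimal sketch of publishing a custom counter from .NET; the category and counter names ("MyApp Threads", "Worker queue length") are made up for the example, and creating the category needs administrative rights:

```csharp
using System.Diagnostics;

class ThreadMetrics
{
    public static void Publish()
    {
        // One-time setup (needs administrative rights): create a custom category.
        // "MyApp Threads" and "Worker queue length" are made-up names for this sketch.
        if (!PerformanceCounterCategory.Exists("MyApp Threads"))
        {
            PerformanceCounterCategory.Create(
                "MyApp Threads",
                "Metrics published by MyApp",
                PerformanceCounterCategoryType.SingleInstance,
                "Worker queue length",
                "Number of items waiting in the worker queue");
        }

        // At runtime, update the counter; Performance Monitor can then chart it.
        using (var counter = new PerformanceCounter("MyApp Threads", "Worker queue length", readOnly: false))
        {
            counter.RawValue = 42; // publish whatever value you want to monitor
        }
    }
}
```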
SysInternals Process Explorer has this feature; check this Server Fault thread.
There is an open source project on CodeProject whose screenshot looks promising: How to get CPU usage of processes and threads. However, the demo project seems to crash on Win7 (probably missing some privileges).
[Edit] If you want to write it yourself, you can P/Invoke the Thread32First and Thread32Next functions to enumerate the threads of a single process, and then call QueryThreadCycleTime to query the CPU time of each thread, as sketched below.
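A rough sketch of that approach (error handling omitted; the constants and struct layout follow the Win32 documentation, but treat this as a starting point rather than production code):

```csharp
using System;
using System.Diagnostics;
using System.Runtime.InteropServices;

static class ThreadCycles
{
    const uint TH32CS_SNAPTHREAD = 0x00000004;
    const uint THREAD_QUERY_LIMITED_INFORMATION = 0x0800;

    [StructLayout(LayoutKind.Sequential)]
    struct THREADENTRY32
    {
        public uint dwSize;
        public uint cntUsage;
        public uint th32ThreadID;
        public uint th32OwnerProcessID;
        public int tpBasePri;
        public int tpDeltaPri;
        public uint dwFlags;
    }

    [DllImport("kernel32.dll")] static extern IntPtr CreateToolhelp32Snapshot(uint dwFlags, uint th32ProcessID);
    [DllImport("kernel32.dll")] static extern bool Thread32First(IntPtr hSnapshot, ref THREADENTRY32 lpte);
    [DllImport("kernel32.dll")] static extern bool Thread32Next(IntPtr hSnapshot, ref THREADENTRY32 lpte);
    [DllImport("kernel32.dll")] static extern IntPtr OpenThread(uint dwDesiredAccess, bool bInheritHandle, uint dwThreadId);
    [DllImport("kernel32.dll")] static extern bool QueryThreadCycleTime(IntPtr threadHandle, out ulong cycleTime);
    [DllImport("kernel32.dll")] static extern bool CloseHandle(IntPtr hObject);

    public static void Dump()
    {
        uint pid = (uint)Process.GetCurrentProcess().Id;

        // TH32CS_SNAPTHREAD enumerates every thread on the system,
        // so we filter on the owning process id below.
        IntPtr snapshot = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);
        var entry = new THREADENTRY32 { dwSize = (uint)Marshal.SizeOf(typeof(THREADENTRY32)) };

        for (bool ok = Thread32First(snapshot, ref entry); ok; ok = Thread32Next(snapshot, ref entry))
        {
            if (entry.th32OwnerProcessID != pid) continue;

            IntPtr thread = OpenThread(THREAD_QUERY_LIMITED_INFORMATION, false, entry.th32ThreadID);
            if (thread == IntPtr.Zero) continue;

            ulong cycles;
            if (QueryThreadCycleTime(thread, out cycles))
                Console.WriteLine("Thread {0}: {1} CPU cycles", entry.th32ThreadID, cycles);

            CloseHandle(thread);
        }
        CloseHandle(snapshot);
    }
}
```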
Objects are shared between threads, threads do not own objects.
Memory for an object is allocated on the heap, which lives in the realm of the application. Any thread can access any of this memory at any time during the lifetime of the application.
There is no way to determine which thread is or may be using any arbitrary blocks of memory.
Threads perform units of work. Unless you know which thread is going to run which unit of work, you will not be able to get any reliable metrics out of CPU usage. If you do know which thread will perform which tasks, then Process Explorer by SysInternals has this metric.
I'm trying to profile my .NET application, written in C#, which uses 100% of the CPU. The application is very big and contains tons of code, so it is impossible to provide the whole project. I tried to get the stacks of the threads that use 25% CPU (one core), and I often got this:
ntoskrnl.exe!KeSynchronizeExecution+0x2246
ntoskrnl.exe!KeWaitForMultipleObjects+0x135e
ntoskrnl.exe!KeWaitForMultipleObjects+0xdd9
ntoskrnl.exe!KeWaitForMutexObject+0x373
ntoskrnl.exe!KeStallWhileFrozen+0x1977
ntoskrnl.exe!_misaligned_access+0x13f9
ntoskrnl.exe!KeWaitForMultipleObjects+0x152f
ntoskrnl.exe!KeWaitForMultipleObjects+0xdd9
ntoskrnl.exe!KeWaitForMutexObject+0x373
ntoskrnl.exe!NtWaitForSingleObject+0xb2
ntoskrnl.exe!setjmpex+0x34a3
ntdll.dll!ZwWaitForSingleObject+0xa
KERNELBASE.dll!WaitForSingleObjectEx+0x98
clr.dll!GetMetaDataInternalInterface+0x25b1f
clr.dll!GetMetaDataInternalInterface+0x25ad3
clr.dll!GetMetaDataInternalInterface+0x25a92
clr.dll!GetMetaDataInternalInterface+0x39106
clr.dll!GetMetaDataInternalInterface+0x39a81
clr.dll!GetMetaDataInternalInterface+0x394ad
clr.dll!GetMetaDataInternalInterface+0x39979
clr.dll!GetMetaDataInternalInterface+0x398c1
clr.dll!GetMetaDataInternalInterface+0x3539a
clr.dll!ClrCreateManagedInstance+0x2747
KERNEL32.dll!BaseThreadInitThunk+0x22
ntdll.dll!RtlUserThreadStart+0x34
Can anyone explain to me why a thread with this call stack consumes one core of my CPU?
What does this 'KeSynchronizeExecution' do?
How can I avoid high CPU usage in such situations?
Just trying to help you here, I am not an expert.
The ntoskrnl.exe!KeSynchronizeExecution routine synchronizes the execution of the specified routine with the interrupt service routine (ISR) that is assigned to a set of one or more interrupt objects.
The ntoskrnl.exe!KeWaitForMultipleObjects routine puts the current thread into an alertable or nonalertable wait state until any or all of a number of dispatcher objects are set to a signaled state or (optionally) until the wait times out.
The ntoskrnl.exe!KeWaitForMutexObject routine puts the current thread into an alertable or nonalertable wait state until the given mutex object is set to a signaled state or (optionally) until the wait times out.
I think the ntoskrnl.exe!KeStallWhileFrozen routine is called when the wait-for-multiple-objects routines are not resolved.
The ntoskrnl.exe!_misaligned_access routine signals that the CPU could not read misaligned data. Misaligned memory accesses can incur enormous performance losses on targets that do not support them in hardware. Ref: https://msdn.microsoft.com/en-us/library/ms253949(v=vs.80).aspx; also check the 'Avoiding Alignment Errors' section.
ntoskrnl.exe!NtWaitForSingleObject waits until the specified object attains a state of signaled.
A call to the setjmp function saves the current instruction address as well as other CPU registers. A subsequent call to the longjmp function restores the instruction pointer and registers, and execution resumes at the point just after the setjmp call.
ntdll.dll!ZwWaitForSingleObject routine waits until the specified object attains a state of Signaled. An optional time-out can also be specified.
KERNELBASE.dll!WaitForSingleObjectEx waits until the specified object is in the signaled state, an I/O completion routine or asynchronous procedure call (APC) is queued to the thread, or the time-out interval elapses.
clr.dll!GetMetaDataInternalInterface gets a pointer to an internal interface instance that is used to read and write metadata in memory.
I have used the JetBrains solution for this before, and it made it really easy to find optimization points. My advice: use https://www.jetbrains.com/profiler/ to find which processes and methods have high CPU usage; you can also profile memory, etc.
You can install it as a trial, so install the trial and solve your problem.
I have just seen a similar problem in my application.
I was able to profile it using the PerfView application.
In my case it was a lock inside the Dictionary.Insert method. Accessing the dictionary from multiple threads at the same time caused those threads to loop indefinitely.
It looks like one CPU core goes to 25% usage with no chance of unblocking it.
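For context, a minimal sketch of the kind of fix that applies here: replacing the shared Dictionary with a ConcurrentDictionary, whose operations are safe for concurrent callers (the key/value types below are just for illustration):

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

class Example
{
    static void Main()
    {
        // Dictionary<K,V> is not thread-safe: concurrent writers can corrupt its
        // internal buckets and leave a reader spinning forever on one core.
        // ConcurrentDictionary does the necessary synchronization internally.
        var cache = new ConcurrentDictionary<string, int>();

        Parallel.For(0, 100000, i =>
        {
            // GetOrAdd is atomic, so many threads can insert safely at once.
            cache.GetOrAdd("key" + (i % 100), _ => i);
        });
    }
}
```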
I have written a program in C# that does a lot of parallel work using different threads. When I reach approximately 300 threads, the GUI of the program starts to become slow and the execution of the threads also slows down drastically. The threads are reading and writing data from a MySQL database running on a different machine.
The funny thing is that if I split the work between two processes on the same machine, everything runs perfectly. Is there a thread limit per process in the .NET Framework or in Windows? Or why am I getting this behaviour? Could it be a network-related problem? I am running Windows 7 Ultimate and I have tried both VS2010 and VS2012 with the same behaviour.
The way processor time is allocated is that the Operating System gives processor time to every process, then every process gives time to every thread.
So two processes will get twice the processor time, and that's why it works faster if you divide the program into two processes.
If you want to make the GUI run more smoothly, just set a higher priority for that thread.
This way the GUI thread will get more processor time than the other threads, but not so much that it noticeably slows the other threads down.
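For example (a sketch only; whether changing priorities actually helps depends on your workload), you could raise the GUI thread's priority or, alternatively, lower the workers' priority; DoWork here is a placeholder for your own worker method:

```csharp
using System.Threading;

class GuiPriorityExample
{
    static void DoWork()
    {
        // placeholder for your own background work
    }

    static void Main()
    {
        // On the GUI thread (e.g. at startup), favour it slightly...
        Thread.CurrentThread.Priority = ThreadPriority.AboveNormal;

        // ...or, alternatively, start the background workers at a lower priority.
        var worker = new Thread(DoWork) { Priority = ThreadPriority.BelowNormal, IsBackground = true };
        worker.Start();
    }
}
```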
300 threads is silly.
The number of threads should be in the range of your number of cores (2..8) and/or the max simultaneous connections (sometimes only 4 over TCP) your system supports.
Go beyond that and you're only wasting memory, at 1 MB of stack per thread. On a 32-bit system, 300 MB already consumes a lot of the available address space, and I assume each thread has some buffers attached as well.
If 2 separate processes perform better than 1, then it probably isn't the context switching but either memory usage or a connection limit that is holding you back.
Use ThreadPool. That should automatically allocate the optimal number of threads based on your system by throttling the number of threads in existence. You can also set the maximum number of threads allowable at any one time.
Also, if you're spawning threads to parallelize work from within a for loop, foreach loop, or LINQ statement, you should look at the Parallel class or PLINQ, as sketched below.
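To illustrate that last point, a sketch using Parallel.ForEach and PLINQ; rows and ProcessRow are placeholders for your own data and per-item work, and the runtime sizes the underlying thread pool for you:

```csharp
using System;
using System.Linq;
using System.Threading.Tasks;

class ParallelExample
{
    static int ProcessRow(string row) => row.Length;   // placeholder per-item work

    static void Run(string[] rows)
    {
        // Instead of spawning one Thread per item, let the runtime partition the work.
        Parallel.ForEach(rows,
            new ParallelOptions { MaxDegreeOfParallelism = Environment.ProcessorCount },
            row => ProcessRow(row));

        // Or the PLINQ equivalent for query-shaped work:
        var results = rows.AsParallel()
                          .WithDegreeOfParallelism(Environment.ProcessorCount)
                          .Select(ProcessRow)
                          .ToList();
    }
}
```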
The accepted answer to this question will probably explain what is happening, but 300 threads seems like too many to be a good idea for any normal application.
First of all, if you have 300 threads in an application then you should probably rethink your program's design.
Raising the GUI thread's priority may make the GUI feel more responsive, but if you run that many threads the OS has to reserve a stack for each of them (1 MB by default), so 300 threads tie up a significant amount of memory and address space, and the scheduler has to context-switch between all of them. That is an obvious cause of your program slowing down.
I'm trying to understand whether one process can interfere with another process running on the same hardware. This could happen in a wide range of products, e.g. VMware, or something as simple as running multiple .NET applications.
If a particular process locks repeatedly (Interlocked operations or the lock keyword in C# terms), will its intensive use of locks affect the performance of other processes? The setting is a heavily loaded web system, and I am experiencing some situational delays; I would like to determine whether the delay was caused by a tight loop of locks that was completely isolated on a different Windows kernel thread.
If there is no isolation, would an application domain in .NET help me in this case?
Thanks for your answers.
No, it won't. A lock in C#, and .NET overall, is local to a process. It can't directly affect other processes on the machine.
A lock statement operates on a particular instance of an object. In order for a lock to affect multiple processes, they would all have to lock on the same object instance, which is not possible since objects are local to a process.
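To make that concrete, a small sketch: the lock below only ever excludes other threads of the same process, because _gate is an object instance that no other process can see:

```csharp
class Worker
{
    // Visible only within this process; another process has its own copy of everything.
    private static readonly object _gate = new object();

    public void UpdateSharedState()
    {
        lock (_gate)   // excludes other threads of *this* process, nothing more
        {
            // ... touch state shared between this process's threads ...
        }
    }
}
```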
I'm trying to understand whether a process would interfere another process running on the same piece of hardware system
Is there anything that lead you to this question or are you simply just imagining some scenario based on a whim?
A lock is local to the process running those threads. If you want to synchronize across processes, consider using a named Semaphore (or Mutex), as sketched below.
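A sketch of what that could look like with a named system semaphore (the name Global\MyAppLock is made up for the example; a named Mutex works the same way):

```csharp
using System.Threading;

class CrossProcessLock
{
    static void Main()
    {
        // A *named* semaphore is a kernel object, so every process on the machine
        // that opens the same name shares it.
        using (var sem = new Semaphore(initialCount: 1, maximumCount: 1, name: @"Global\MyAppLock"))
        {
            sem.WaitOne();            // blocks until no other process/thread holds it
            try
            {
                // ... critical section shared across processes ...
            }
            finally
            {
                sem.Release();
            }
        }
    }
}
```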
will it affect the performance other processes due to its intensive usage of lock?
Short answer, no. Of course, unfettered and whimsical use of lock will probably lead to some live-lock/deadlock scenarios.
No, that's not going to be a problem... you're only locking within your own worker process; other tasks have their own processes. While locks are useful for specific tasks, I'd recommend you keep them to a minimum, since you'll introduce waits into your application.
I have written a program which depends on threads heavily. In addition, there is a requirement to measure the total time taken by each thread, and also the execution time (kernel time plus user time).
There can be an arbitrary number of threads and many may run at once. This is down to user activity. I need them to run as quickly as possible, so using something which has some overhead like WMI/Performance Monitor to measure thread times is not ideal.
At the moment, I'm using GetThreadTimes, as shown in this article: http://www.codeproject.com/KB/dotnet/ExecutionStopwatch.aspx
My question is simple: I understand .NET threads may not correspond on a one-to-one basis with system threads (though in all my testing so far, it seems to have been one to one). That being the case, if .NET decides to put two or more of my threads into one system thread, am I going to get strange results from my timing code? If so (or even if not), is there another way to measure the kernel and user time of a .NET thread?
As it is stated: multithreading is managed internally by a thread scheduler, a function the CLR typically delegates to the operating system. A thread scheduler ensures all active threads are allocated appropriate execution time, and that threads that are waiting or blocked (for instance, on an exclusive lock or on user input) do not consume CPU time.
Theoretically the .NET team could implement their own scheduler, but I doubt it. So I think the GetThreadTimes function is what you need.
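For reference, the core of the GetThreadTimes approach from the linked article looks roughly like this (a sketch without error handling); note that it measures the current OS thread, so it is only meaningful as long as your managed thread stays on the same native thread:

```csharp
using System;
using System.Runtime.InteropServices;

static class ThreadTimes
{
    [DllImport("kernel32.dll")]
    static extern IntPtr GetCurrentThread();

    [DllImport("kernel32.dll", SetLastError = true)]
    static extern bool GetThreadTimes(IntPtr hThread,
        out long creationTime, out long exitTime,
        out long kernelTime, out long userTime);

    // Kernel + user CPU time consumed so far by the calling OS thread.
    public static TimeSpan CurrentThreadCpuTime()
    {
        long creation, exit, kernel, user;
        GetThreadTimes(GetCurrentThread(), out creation, out exit, out kernel, out user);

        // The FILETIME values are in 100-nanosecond units, same as TimeSpan ticks.
        return TimeSpan.FromTicks(kernel + user);
    }
}
```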
I've written a small, in-house C# program which batch converts between various file formats, mostly relying on other APIs. Currently, the UI spawns a BackgroundWorker (not a Thread) to handle the conversions, and then fills a queue with the requests, which empties as the worker completes jobs. The queued objects themselves are very small (3 strings telling the worker what to do) and don't contribute greatly to the program's memory footprint. I've been fairly draconian with my memory management (disposing of Images when finished with them, and manually garbage collecting at certain points). Still, the program tends to use about 100 MB of memory at a time, and about 50% of total CPU time. It seems like if I naively implemented threading, it would quickly run out of system memory (unless the CLR does some sort of magic I don't know about).
Is there a simple/efficient way to spawn threads to prevent the system from running out of memory besides catching OutOfMemory exceptions and rolling back the thread that died (seems very inefficient, but there's no way to preserve state without using prohibitive amounts of memory)?
If you use ThreadPool.QueueUserWorkItem to spawn the conversions, you automatically get a limit on how many threads are running. The ThreadPool manages this internally and will process queued calls as soon as a pool thread becomes available.
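A minimal sketch of that pattern; ConversionRequest, ConvertFile, and requests are placeholders for your own types and conversion routine:

```csharp
using System.Threading;

class ConversionQueue
{
    // Placeholder request type and conversion routine for this sketch.
    class ConversionRequest { public string Source, Target, Format; }

    static void ConvertFile(ConversionRequest request) { /* ... real conversion ... */ }

    static void EnqueueAll(ConversionRequest[] requests)
    {
        foreach (var request in requests)
        {
            // Each request becomes a pool work item; the ThreadPool decides how many
            // run concurrently, so you never explicitly spawn hundreds of threads.
            ThreadPool.QueueUserWorkItem(state => ConvertFile((ConversionRequest)state), request);
        }
    }
}
```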
Put a limit on the queue size and make the sender wait if it's full, as sketched below. You would have to find the right queue size limit empirically, though.
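One way to get that behaviour is a bounded BlockingCollection: Add blocks the producer while the queue is full. A sketch, with an arbitrary capacity of 10 and placeholder names (ConversionRequest, ConvertFile, nextRequest):

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks;

class BoundedQueueExample
{
    // Placeholder request type and conversion routine for this sketch.
    class ConversionRequest { public string Source, Target, Format; }
    static void ConvertFile(ConversionRequest job) { /* ... real conversion ... */ }

    // Bounded to 10 pending jobs; Add() blocks the producer until space frees up.
    static readonly BlockingCollection<ConversionRequest> Queue =
        new BlockingCollection<ConversionRequest>(boundedCapacity: 10);

    static void StartWorker()
    {
        Task.Run(() =>
        {
            foreach (var job in Queue.GetConsumingEnumerable())
                ConvertFile(job);
        });
    }

    static void Enqueue(ConversionRequest nextRequest)
    {
        Queue.Add(nextRequest);   // waits here if 10 jobs are already queued
    }
}
```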
Out-of-memory exceptions can be tricky: they may be caused by fragmented memory rather than by actually running out of memory, so they can be hard to track down.
Tess from Microsoft product support in Sweden (http://blogs.msdn.com/tess/) has a good number of posts on tracking down where memory is going, which should help with this process.
A good profile of your app could also help; JetBrains has a good profiler, as does AQTime.
Using 100MB of memory is not a lot for an image processing application.
Also, in .NET, when you get an OutOfMemory exception the entire process dies; you can't recover from it.
If you need more memory (or, more correctly, more address space) than the system can give you can either page memory in and out as you use it or switch to 64bit.
You can limit the queue size to make sure the only "heavy" memory user is the worker thread. You can have more than one worker thread; that will increase your memory usage but will also empty the queue faster. Just remember that for CPU-intensive operations like this, having more threads than CPU cores is inefficient.
You mention "naively implemented threading": multi-threading is full of pitfalls, and a naive implementation will indeed be inefficient and full of bugs.