My application (base application is MFC interop with C++/CLI but it also contains a lot of C#, Windows Forms, WPF) has has a handle leak. Shortly after application start I can see the handle count in the task manager grow continuously (at a rate of 10 new handles per second). So I used handles.exe to see what kind of handles they are. I found out that the leaking handles are process handles. And they are process handles to the process of my application.
So I wonder what operations would typically create a handle to the process it runs in. Any idea? Have you ever seen something like this? What else could I do to track down the leak, considering that I can't use debug DLLs and that I can only use tools that can be xcopy deployed.
Update:
I was able to throw windbg and !handle, !htrace at it and found out that the process handles are all created using the following stack traces (ordered by frequency):
0x79f7570b: mscorwks!CorExitProcess+0x00022055
0x79f03edd: mscorwks!GetPrivateContextsPerfCounters+0x0000b6fe
0x79f04b87: mscorwks!GetPrivateContextsPerfCounters+0x0000c3a8
0x79f04b03: mscorwks!GetPrivateContextsPerfCounters+0x0000c324
0x79f919bf: mscorwks!CorExitProcess+0x0003e309
0x79f91b28: mscorwks!CorExitProcess+0x0003e472
0x792d6b4c: mscorlib_ni+0x00216b4c
0x1391a663: +0x1391a663
0x1391a0b1: +0x1391a0b1
0x7a9ea544: System_ni+0x005aa544
0x792a842f: mscorlib_ni+0x001e842f
or
0x7c8106f5: kernel32!CreateThread+0x0000001e
0x79f04bb2: mscorwks!GetPrivateContextsPerfCounters+0x0000c3d3
0x79f04b03: mscorwks!GetPrivateContextsPerfCounters+0x0000c324
0x79f919bf: mscorwks!CorExitProcess+0x0003e309
0x79f91b28: mscorwks!CorExitProcess+0x0003e472
0x792d6b4c: mscorlib_ni+0x00216b4c
0x1391a663: +0x1391a663
0x1391a0b1: +0x1391a0b1
0x7a9ea544: System_ni+0x005aa544
0x792a842f: mscorlib_ni+0x001e842f
or
0x08ec2eba: +0x08ec2eba
0x792b8277: mscorlib_ni+0x001f8277
0x792b8190: mscorlib_ni+0x001f8190
0x792b8040: mscorlib_ni+0x001f8040
0x792b7ff2: mscorlib_ni+0x001f7ff2
0x677e48f3: System_Runtime_Remoting_ni+0x000748f3
0x677e44be: System_Runtime_Remoting_ni+0x000744be
0x677e46ec: System_Runtime_Remoting_ni+0x000746ec
0x677e8408: System_Runtime_Remoting_ni+0x00078408
0x7926eb8d: mscorlib_ni+0x001aeb8d
Now what does that tell me?
The call stacks look wrong. Did you setup the symbol server correctly? .symfix should do the trick in Windbg. Afterwards you should get a better stacktrace.
It looks like part of the code that has this problem is managed so it would make sense to break on DuplicateHandle and OpenProcess and dump the managed call stack there. These two methods are the only ones which could produce a real process handle.
You can declare a breakpoint like this and execute commands when the breakpoint is hit.
In this case the managed stack is printed and then the execution does continue.
bp kernel32!OpenProcess "!ClrStack;g"
Had same issues with a webservice calling COM objects through interop.
I solved this by explicitely calling Marshal.ReleaseComObject against the interop objects I created. No issues after that moment for me.
Hope it helps.
So... are you doing performance counters explicitely (if so, try disabling them to narrow down on the source of the leaks).
Related
I have a computationally-expensive multi-threaded C# app that seems to crash consistently after 30-90 minutes of running. The error it gives is
The runtime has encountered a fatal error. The address of the error was at 0xec37ebae, on thread 0xbcc. The error code is 0xc0000005. This error may be a bug in the CLR or in the unsafe or non-verifiable portions of user code. Common sources of this bug include user marshaling errors for COM-interop or PInvoke, which may corrupt the stack.
(0xc0000005 is the error-code for Access Violation)
My app does not invoke any native code, or use any unsafe blocks, or even any non-CLS compliant types like uint. In fact, the line of code that the debugger says caused the crash is
overallLength += distanceTravelled;
Where both values are of type double
Given all this, I believe the crash must be due to a bug in the compiler or CLR or JIT. I'd like to figure out what causes it, or at the very least write a smaller reproduction to send into Microsoft, but I have no idea where to even begin. I've never had to view the CIL-binary, or the compiled JIT output, or the native stacktrace (there is no managed stacktrace at the time of the crash), so I'm not sure how. I can't even figure out how to view the state of all the variables at the time of the crash (VS unfortunately won't tell me like it does after managed-exceptions, and outputting them to console/a file would slow down the app 1000-fold, which is obviously not an option).
So, how do I go about debugging this?
[Edit] Compiled under VS 2010 SP1, running latest version of .Net 4.0 Client Profile. Apparently it's ".Net 4.0C/.Net 4.0E, .Net CLR 1.1.4322"
I'd like to figure out what causes it, or at the very least write a smaller reproduction to send into Microsoft, but I have no idea where to even begin.
"Smaller reproduction" definitely sounds like a great idea here... even if "smaller" won't mean "quicker to reproduce".
Before you even start, try to reproduce the error on another machine. If you can't reproduce it on another machine, that suggests a whole different set of tests to do - hardware, installation etc.
Also, check you're on the latest version of everything. It would be annoying to spend days debugging this (which is likely, I'm afraid) and then end up with a response of "Yes, we know about this - it was a bug in .NET 4 which was fixed in .NET 4.5" for example. If you can reproduce it on a variety of framework versions, that would be even better :)
Next, cut out everything you can in the program:
Does it have a user interface at all? If possible, remove that.
Does it use a database? See if you can remove all database access: definitely any output which isn't used later, and ideally input too. If you can hard code the input within the app, that would be ideal - but if not, files are simpler for reproductions than database access.
Is it data-sensitive? Again, without knowing much about the app it's hard to know whether this is useful, but assuming it's processing a lot of data, can you use a binary search to find a relatively small amount of data which causes the problem?
Does it have to be multi-threaded? If you can remove all the threading, obviously that may well then take much longer to reproduce the problem - but does it still happen at all?
Try removing bits of business logic: if your app is componentized appropriately, you can probably fake out whole significant components by first creating a stub implementation, and then simply removing the calls.
All of this will gradually reduce the size of the app until it's more manageable. At each step, you'll need to run the app again until it either crashes or you're convinced it won't crash. If you have a lot of machines available to you, that should help...
tl;dr Make sure you're compiling to .Net 4.5
This sounds suspiciously like the same error found here. From the MSDN page:
This bug can be encountered when the Garbage Collector is freeing and compacting memory. The error can happen when the Concurrent Garbage Collection is enabled and a certain combination of foreground Garbage Collection and background Garbage Collection occurs. When this situation happens you will see the same call stack over and over. On the heap you will see one free object and before it ends you will see another free object corrupting the heap.
The fix is to compile to .Net 4.5. If for some reason you can't do this, you can also disable concurrent garbage collection by disabling gcConcurrent in the app.config file:
<configuration>
<runtime>
<gcConcurrent enabled="false"/>
</runtime>
</configuration>
Or just compile to x86.
WinDbg is your friend:
http://blogs.msdn.com/b/tess/archive/2006/02/09/net-crash-managed-heap-corruption-calling-unmanaged-code.aspx
http://www.codeproject.com/Articles/23589/Get-Started-Debugging-Memory-Related-Issues-in-Net
http://www.codeproject.com/Articles/22245/Quick-start-to-using-WinDbg
Download Debug Diagnostic Tool v1.2
Run program
Add Rule "Crash"
Select "Specific Process"
on page Advanced Configuration set your exception if you know on which exception it fails or just leave this page as is
Set userdump location
Now wait for process to crash, log file is created by DebugDiag. Now activate tab Advanced Analysis, select Crash/Hang Analyzers in top list and dump file in lower list and hit Start Analysis. This will generate html report for you. Hopes you found usefull info in that report. If you have problem with analyze, upload html report somewhere and place url here so we can focus on it.
My app does not invoke any native code, or use any unsafe blocks, or
even any non-CLS compliant types like uint
You may think this, but threading, synchronization via semaphore, mutex it any handles all are native. .net is a layer over operating system, .net itself does not support pure clr code for multithreading apps, this is because OS already does it.
Most likely this is thread synchronization error. Probably multiple threads are trying to access shared resource like file etc that is outside clr boundary.
You may think you aren't accessing com etc, but when you call certain API like get desktop folder path etc it is called through shell com API.
You have following two options,
Publish your code so that we can review the bottleneck
Redesign your app using .net parallel threading framework, which includes variety of algorithms requiring CPU intensive operations.
Most likely programs fail after certain period of time as collections grow up and operations fail to execute before other thread interfere. For example, producer consumer problem, you will not notice any problem till producer will become slower or fail to finish its operation before consumer kicks in.
Bug in clr is rare, because clr is very stable. But poorly written code may lead error to appear as bug in clr. Clr can not and will never detect whether the bug is in your code or in clr itself.
Did you run a memory test for your machine as the one time I had comparable symptoms one of my dimms turned out to be faulty (a very good memorytester is included in Win7; http://www.tomstricks.com/how-to-test-your-ram-or-memory-with-windows-memory-diagnostic-tool-in-windows-7/)
It might also be a heating/throttling issue if your CPU gets too hot after this period of time. Although that would happen sooner imho.
There should be a dumpfile that you can analyze. If you never did this find someone who did, or send that to microsoft
I will suggest you open a support case via http://support.microsoft.com immediately, as the support guys can show you how to collect the necessary information.
Generally speaking, like #paulsm4 and #psulek said, you can utilize WinDbg or Debug Diag to capture crash dumps of the process, and within it, all necessary information is embedded. However, if this is the very first time you use those tools, you might be puzzled. Microsoft support team can provide you step by step guidance on them, or they can even set up a Live Meeting session with you to capture the data, as the program crashes so often.
Once you are familiar with the tools, in the future you can perform similar troubleshooting more easily,
http://blogs.msdn.com/b/lexli/archive/2009/08/23/when-the-application-program-crashes-on-windows.aspx
BTW, it is too early to say "I've found a bug". Though you cannot obviously find in your program a dependency on native code, it might still have a dependency on native code. We should not draw a conclusion before debugging further into the issue.
I've got a strange behavior here:
I get a massive memory leak in production running a WPF application that runs on a DLOG-Terminal (Windows Embedded Standard SP1) that behaves perfectly fine if I run it localy on a normal desktop (Win7 prof.)
After many unsucessful attempts to find any problem I put one of those directly beside my monitor, installed the ANTs MemoryProfiler and did one hour test run simulating user operations on both the terminal and my development PC.
Result is, that due to some strange reasons the embedded system piles up a huge amount of WeakReference and EffectiveValueEntry[] Objects.
Here are are some pictures:
Development (PC):
And the terminal:
Just look at the class list...
Has anyone seen something like this before and are there known solutions to this?
Where can I get help?
(PS the terminals where installed with images prepared for .net4)
PPS: for the close-voter: I think the question is clear: how can I fix this.
You could argue if this is a IT/OS problem vs. a programming problem but I think if I post this in Server Fault it will get a off-topic close in no time...
UPDATE:
I was able to find a big portion of the problem - but it feels a bit like C++:
I use a ViewModel-like Items class for a WPF-List that provides (among others) a ICommand (RelayCommand-pattern). The Items where created on the fly in the getter of a ViewModel-Property for the view and it seems that the application/GC did never free those unused commands - or the subscribtions to their CanExecuteChanged - the memory profiler shows those as "held by a weak reference". I changed my code to reuse those item-viewmodels and Dispose/set to null every used properties in their Dispose and use this too as clean up - as I said: feels like "delete" in those old C++ days.
On top of this I use a forced GC.Collect every 30mins (yeah I know - you never should - but I got no other solution till now).
With this setup the applications runs for 6+ hours without problems so far but it don't feel right.
I cannot understand why those WeakReferences are not claimed as they are on my desktop machine...
Any thoughts on this? Please!
UPDATE:
I am still not able to pin down this problem but I see a strange behavior:
If I use PC-Anywhere to observe the operation of my software on one of the terminals the problem goes away!
Even after running 8hr. straight the software runs as it should - it will even free memory (I put a little memorycounter-display in the main-screen - let's say I connect to the terminal and see that memory is low - after waiting a few minutes the memory is reclaimed)
So I think Devin (one Answer below) has a lead in the right direction - something in the Remote-Control software unblocks the finalizer-thread or whatever is blocking the GC - be it the simulated keyboard/mouse or whatever.
Any thoughts on this?
We had a (somewhat) similar issue running my app on a tablet. The memory would be reclaimed when run on a desktop, but not when run on a tablet or some other device that used a PC Input panel. The problem is that the finalization queue was getting stuck. The COM object finalizer was waiting to run something on the main thread, which didn't have a message loop.
The solution was to find an adequate time to invoke Application.DoEvents(). We had a method that would be called intermittently and we invoke it with every 10th call. I don't know if this is the same issue you are having, but maybe it can shed some light.
EDIT: I do need to make it clear, in general, calling DoEvents() is a bad idea. It works in that case because there isn't any UI on that thread or anything else happening that those events can interfere with.
From the screenshots it is interesting to see that the LOH grows at the same time the used space does not grow much. The free space is growing a lot at the LOH which indicates memory fragmentation due to pinned objects. This looks like a stuck finalizer thread which does prevent the cleanup of managed objects. You should get a memory dump and check in which method the finalizer thread was stuck. You can do this quite easy with Windbg.
Is it possible to enumerate all managed threads in C#? Visual Studio seems to be able to do this when you hit a break point while debugging. In the "Threads" window it shows a list of all running threads, including managed names. Does anyone know how it does this?
Debuggers are often able to do things "normal" code can't. You'd probably find that you can do this if you use the profiling/debugging API, but I don't believe you can do it from "normal" .NET code.
This sounds like a duplicate of "How to enumerate threads in .NET using the Name property?" - If so, the short answer is "keep track of your own threads yourself" - i.e. in a List<Thread> or similar.
Take a look at Managed Stack Explorer:
MSE works by quickly attaching to a
process when a stack trace is
requested and the detaching the second
the stack trace has been retrieved.
This way the interference in the
normal operation of the process is
minimized.
Unfortunately, this means it has to done by an external process.
A similar tool is StackDump, which uses MDbg to generate the dump.
You can enumerate threads by making an HTTP GET request to ProcInsp. ProcInsp allows you to get threads with stack traces of CLR processes (either in its UI or as a JSON message). The web API is available at /Process/%PID%/Threads.
Disclaimer: I am the developer of this tool. It is MIT-licensed and free for use.
Other than that I don't know if I can reproduce it now that it's happened (I've been using this particular application for a week or two now without issue), assuming that I'm running my application in the VS debugger, how should I go about debugging a deadlock after it's happened? I thought I might be able to get at call stacks if I paused the program and hence see where the different threads were when it happened, but clicking pause just threw Visual Studio into a deadlock too till I killed my application.
Is there some way other than browsing through my source tree to find potential problems? Is there a way to get at the call stacks once the problem has occured to see where the problem is? Any other tools/tips/tricks that might help?
What you did was the correct way. If Visual Studio also deadlocks, that happens now and then. It's just bad luck, unless there's some other issue.
You don't have to run the application in the debugger in order to debug it. Run the application normally, and if the deadlock happens, you can attach VS later. Ctrl+Alt+P, select the process, choose debugger type and click attach. Using a different set of debugger types might reduce the risk of VS crashing (especially if you don't debug native code)
A deadlock involves 2 or more threads. You probably know the first one (probably your UI thread) since you noticed the deadlock in your application. Now you only need to find the other one. With knowledge of the architecture, it should be easy to find (e.g. what other threads use the same locks, interact with the UI etc)
If VS doesn't work at all, you can always use windbg. Download here: http://www.microsoft.com/whdc/devtools/debugging/default.mspx
I'd try different approaches in the following order:
First, inspect the code to look for thread-safety violations, making sure that your critical regions don't call other functions that will in turn try to lock a critical region.
Use whatever tool you can get your hands on to visualize thread activity, I use an in-house perl script that parses an OS log we made and graphs all the context switches and shows when a thread gets pre-empted.
If you can't find a good tool, do some logging to see the last threads that were running before the deadlock occurred. This will give you a clue as to where the issue might be caused, it helps if the locking mechanisms have unique names, like if an object has it's own thread, create a dedicated semaphore or mutex just to manage that thread.
I hope this helps. Good luck!
You can use different programs like Intel(R) Parallel Inspector:
http://software.intel.com/en-us/intel-parallel-inspector/
Such programs can show you places in your code with potential deadlocks. However you should pay for it, or use it only evaluation period. Don't know if there is any free tools like this.
Just like anywhere, there're no "Silver bullet" tools to catch all the deadlocks. It is all about the sequence in which different threads aquire resources so your job is to find out where the order was violated. Usually Visual Studio or other debugger will provide stack traces and you will be able to find out where the discrepancy is. DevPartner Studio does provide deadlock analysis but last time I've checked there were too many false positives. Some static analysis tools will find some potential deadlocks too.
Other than that it helps to get the architecture straight to enforce resource aquisition order. For example, layering helps to make sure upper level locks are taken before lower ones but beware of callbacks.
From C#, is it possible to detect the number of context switches that occurred while executing a block of code on a particular thread? Ideally, I'd like to know how many times and what CPU my thread code was scheduled on.
I know I can use tools like Event Tracing for Windows and the associated viewers, but this seemed a bit complicated to get the data I wanted.
Also, tools like Process Explorer make it too hard to tell how many switches occurred as a result of a specific block of code.
Background: I'm trying to test the actual performance of a low level lock primitive in .NET (as a result of some comments on a recent blog post I made.
It sounds like you may be looking for a programmatic solution, but if not, Microsoft's Process Explorer tool will tell you very easily the number of context switches for a particular thread.
Once in the tool, double-click your process, select the Threads tab, and select your thread.
The .NET tab has more specific .NET-related perf data.
I've never done this, but here are a few leads that might help:
The .NET profiler APIs might allow you to hook in? The ICorProfilerCallback interface has RuntimeThreadSuspended and RuntimeThreadResumed callbacks. But a comment on this blog post seems to indicate that they won't get you what you are looking for: "RuntimeThreadSuspended is issued when a thread is being suspended by the runtime, typically in preparation for doing a GC."
There is a "Context Switches/sec" perfmon counter that might help. I haven't looked at this counter specifically, but I'm guessing it operates on Win32 threads and not managed threads. You can use the profiling APIs to get the Win32 thread ID for any given managed thread ID.
Good luck! ;)
It looks like procexp might be using the kernel thread (KTHREAD) or executive thread (ETHREAD) data structures that have a ContextSwitches field on them. It might be possible to get this from managed code.