How to debug a potential memory leak? - C#

I wrote a Windows service to do some routine work.
I installed it with InstallUtil; it wakes up, does something, and then calls Thread.Sleep for 5 minutes.
The code is simple, but I've noticed a potential memory leak. I traced it using the DOS tasklist command and drew a chart:
Can I say it's pretty clear there is a memory leak, even if a small one?
My code is below. Please help me find the potential leak. Thanks.
public partial class AutoReport : ServiceBase
{
    int Time = Convert.ToInt32(AppSettings["Time"].ToString());
    private Utilities.RequestHelper requestHelper = new RequestHelper();

    public AutoReport()
    {
        InitializeComponent();
    }

    protected override void OnStart(string[] args)
    {
        Thread thread = new Thread(new ParameterizedThreadStart(DoWork));
        thread.Start();
    }

    protected override void OnStop()
    {
    }

    public void DoWork(object data)
    {
        while (true)
        {
            string jsonOutStr = requestHelper.PostDataToUrl("{\"KeyString\":\"somestring\"}", "http://myurl.ashx");
            Thread.Sleep(Time);
        }
    }
}
Edit: Here is the result after using WinDbg as Russell suggested. What should I do about these classes?
MT Count TotalSize ClassName
79330b24 1529 123096 System.String
793042f4 471 41952 System.Object[]
79332b54 337 8088 System.Collections.ArrayList
79333594 211 70600 System.Byte[]
79331ca4 199 3980 System.RuntimeType
7a5e9ea4 159 2544 System.Collections.Specialized.NameObjectCollectionBase+NameObjectEntry
79333274 143 30888 System.Collections.Hashtable+bucket[]
79333178 142 7952 System.Collections.Hashtable
79331754 121 57208 System.Char[]
7a5d8120 100 4000 System.Net.LazyAsyncResult
00d522e4 95 5320 System.Configuration.FactoryRecord
00d54d60 76 3952 System.Configuration.ConfigurationProperty
7a5df92c 74 2664 System.Net.CoreResponseData
7a5d8060 74 5032 System.Net.WebHeaderCollection
79332d70 73 876 System.Int32
79330c60 73 1460 System.Text.StringBuilder
79332e4c 72 2016 System.Collections.ArrayList+ArrayListEnumeratorSimple
7.93E+09 69 1380 Microsoft.Win32.SafeHandles.SafeTokenHandle
7a5e0d0c 53 1060 System.Net.HeaderInfo
7a5e4444 53 2120 System.Net.TimerThread+TimerNode
79330740 52 624 System.Object
7a5df1d0 50 2000 System.Net.AuthenticationState
7a5e031c 50 5800 System.Net.ConnectStream
7aa46f78 49 588 System.Net.ConnectStreamContext
793180f4 48 960 System.IntPtr[]

This is how I'd go about finding the memory leak:
1) Download WinDbg if you don't already have it. It's a really powerful, although complicated, debugger.
2) Run WinDbg and attach it to your process by pressing F6 and selecting your exe.
3) When it has attached, type these commands (each followed by Enter):
//this will load the managed extensions
.loadby sos clr
//this will dump the details of all your objects on the heap
!dumpheap -stat
//this will start the service again
g
Now wait a few minutes and press Ctrl+Break to break back into the service. Run the !dumpheap -stat command again to see what is on the heap now. If you have a memory leak in managed code, you will see one or more of your classes keep getting added to the heap over time. You now know what is being kept in memory, so you know where to look for the problem in your code. You can also work out what is holding references to the leaked objects from within WinDbg, but it's a complicated process. If you decide to use WinDbg then you probably want to start by reading Tess's blog and doing the labs.
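If one of your own types does keep growing, a rough follow-up inside WinDbg (these are standard SOS commands; the MT and address are placeholders, not values from the dump above) is:
//list every instance of the suspicious type, using the MT column from !dumpheap -stat
!dumpheap -mt <MT>
//pick one object address from that list and ask what is keeping it alive
!gcroot <address>
The !gcroot output walks from a GC root (a static field, a local, a handle) down to the object, which usually points straight at the collection, cache or event subscription that is holding on to the leaked instances.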

You will need to use an allocation profiler to detect memory leaks. There are some good profilers for that; I can recommend AQTime (see this video).
Also read: How to Locate Memory Leaks Using the Allocation Profiler.
Maybe this article can be helpful too.

To find a memory leak, you should watch the performance counters over a long period of time. If you see the handle count or "# Bytes in all Heaps" growing without ever decreasing, you have a real memory leak. Then you can use, for example, the profiling tools in Visual Studio to track it down. There is also a tool from Red Gate which works quite well.
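For example, here is a minimal sketch of watching those counters from code (the ".NET CLR Memory" and "Process" categories and the counter names are the standard ones; the instance name "AutoReport" is an assumption based on the service class above and must match your actual process name):
using System;
using System.Diagnostics;
using System.Threading;

class HeapWatcher
{
    static void Main()
    {
        // the instance name is normally the process name without .exe
        var heapBytes = new PerformanceCounter(".NET CLR Memory", "# Bytes in all Heaps", "AutoReport");
        var handles = new PerformanceCounter("Process", "Handle Count", "AutoReport");

        while (true)
        {
            Console.WriteLine("{0:T}  heap={1:N0}  handles={2:N0}",
                DateTime.Now, heapBytes.NextValue(), handles.NextValue());
            Thread.Sleep(TimeSpan.FromMinutes(1));
        }
    }
}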

Difficult to say, but I suspect this line in your while loop within DoWork:
JsonIn jsonIn = new JsonIn { KeyString = "secretekeystring", };
Although jsonIn is only in scope within the while block, I would hazard a guess that the garbage collector is taking its time to remove the unwanted instances.
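One way to test that theory (a diagnostic sketch only, not something to keep in the service) is to force a full collection at the end of each loop iteration and log the managed heap size; if the numbers stop climbing, it was just lazy collection rather than a true leak:
// diagnostic only - do not leave this in production code
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
Console.WriteLine("Managed heap after full collect: {0:N0} bytes", GC.GetTotalMemory(false));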


Find source of token handle leak in managed process

I'm investigating a handle leak in a WCF service, running on .NET 4.6.2. The service runs fine but over time handle count keeps increasing, with thousands of Token type handles sitting around in the process. It seems memory is also leaking very slowly (likely related to the handle leak).
Edit: it looks like event and thread handles are also leaked.
Process Explorer shows that the suspect handles all have the same name:
DOMAIN\some.username$:183db90
and all share the same address.
I attached WinDbg to the process and ran !htrace -enable and then !htrace -diff some time later. This gave me a list of almost 2000 newly opened handles and native stack traces, like this one:
Handle = 0x000000000000b02c - OPEN
Thread ID = 0x000000000000484c, Process ID = 0x0000000000002cdc
0x00007ffc66e80b3a: ntdll!NtCreateEvent+0x000000000000000a
0x00007ffc64272ce8: KERNELBASE!CreateEventW+0x0000000000000084
0x00007ffc5b392e0a: clr!CLREventBase::CreateManualEvent+0x000000000000003a
0x00007ffc5b3935c7: clr!Thread::AllocHandles+0x000000000000007b
0x00007ffc5b3943c7: clr!Thread::CreateNewOSThread+0x000000000000007f
0x00007ffc5b394308: clr!Thread::CreateNewThread+0x0000000000000090
0x00007ffc5b394afb: clr!ThreadpoolMgr::CreateUnimpersonatedThread+0x00000000000000cb
0x00007ffc5b394baf: clr!ThreadpoolMgr::MaybeAddWorkingWorker+0x000000000000010c
0x00007ffc5b1d8c74: clr!ManagedPerAppDomainTPCount::SetAppDomainRequestsActive+0x0000000000000024
0x00007ffc5b1d8d27: clr!ThreadpoolMgr::SetAppDomainRequestsActive+0x000000000000003f
0x00007ffc5b1d8cae: clr!ThreadPoolNative::RequestWorkerThread+0x000000000000002f
0x00007ffc5a019028: mscorlib_ni+0x0000000000549028
0x00007ffc59f5f48f: mscorlib_ni+0x000000000048f48f
0x00007ffc59f5f3b9: mscorlib_ni+0x000000000048f3b9
Another stack trace (a large portion of the ~2000 new handles have this):
Handle = 0x000000000000a0c8 - OPEN
Thread ID = 0x0000000000003614, Process ID = 0x0000000000002cdc
0x00007ffc66e817aa: ntdll!NtOpenProcessToken+0x000000000000000a
0x00007ffc64272eba: KERNELBASE!OpenProcessToken+0x000000000000000a
0x00007ffc5a01aa9b: mscorlib_ni+0x000000000054aa9b
0x00007ffc5a002ebd: mscorlib_ni+0x0000000000532ebd
0x00007ffc5a002e68: mscorlib_ni+0x0000000000532e68
0x00007ffc5a002d40: mscorlib_ni+0x0000000000532d40
0x00007ffc5a0027c7: mscorlib_ni+0x00000000005327c7
0x00007ffbfbfb3d6a: +0x00007ffbfbfb3d6a
When I run the !handle 0 0 command in WinDbg, I get the following result:
21046 Handles
Type Count
None 4
Event 2635 **
Section 360
File 408
Directory 4
Mutant 9
Semaphore 121
Key 77
Token 16803 **
Thread 554 **
IoCompletion 8
Timer 3
TpWorkerFactory 2
ALPC Port 7
WaitCompletionPacket 51
The ones marked with ** are increasing over time, although at a different rate.
Edit 3:
I ran !dumpheap to see the number of Thread objects (unrelated classes removed):
!DumpHeap -stat -type System.Threading.Thread
Statistics:
MT Count TotalSize Class Name
00007ffc5a152bb0 745 71520 System.Threading.Thread
Active thread count fluctuates between 56 and 62 in Process Explorer as the process is handling some background tasks periodically.
Some of the stack traces are different but they're all native traces so I don't know what managed code triggered the handle creation. Is it possible to get the managed function call that's running on the newly created thread when Thread::CreateNewThread is called? I don't know what WinDbg command I'd use for this.
Note:
I cannot attach sample code to the question because the WCF service loads hundreds of DLL-s, most of which are built from many source files - I have some very vague suspicions in what major area this may come from but I don't know any details to show an MCVE.
Edit: after Harry Johnston's comments below, I noticed that thread handle count is also increasing - I overlooked this earlier because of the high number of token handles.

How to find out deadlocking awaited Tasks and their current call stack?

Here is a simplified example I found of a deadlock in awaited tasks that can be really hard to debug in some cases:
class Program
{
    static void Main(string[] args)
    {
        var task = Hang();
        task.Wait();
    }

    static async Task Hang()
    {
        var tcs = new TaskCompletionSource<object>();
        // do some more stuff, e.g. another await Task.FromResult(0);
        await tcs.Task;
        tcs.SetResult(0);
    }
}
In this example it is easy to understand why it deadlocks: it is awaiting a task which is only completed later in the same method. It looks silly, but a similar scenario can happen in more complicated production code, and deadlocks can be introduced by mistake due to a lack of multithreading experience.
The interesting thing about this example is that inside the Hang method there is no thread-blocking code like Task.Wait() or Task.Result. When I attach the VS debugger, it just shows that the main thread is waiting for the task to finish. However, no thread in the Parallel Stacks view shows where the code has stopped inside the Hang method.
Here are the call stacks on each thread (3 in all) I have in the Parallel Stacks:
Thread 1:
[Managed to Native Transition]
Microsoft.VisualStudio.HostingProcess.HostProc.WaitForThreadExit
Microsoft.VisualStudio.HostingProcess.HostProc.RunParkingWindowThread
System.Threading.ThreadHelper.ThreadStart_Context
System.Threading.ExecutionContext.RunInternal
System.Threading.ExecutionContext.Run
System.Threading.ExecutionContext.Run
System.Threading.ThreadHelper.ThreadStart
Thread 2:
[Managed to Native Transition]
Microsoft.Win32.SystemEvents.WindowThreadProc
System.Threading.ThreadHelper.ThreadStart_Context
System.Threading.ExecutionContext.RunInternal
System.Threading.ExecutionContext.Run
System.Threading.ExecutionContext.Run
System.Threading.ThreadHelper.ThreadStart
Main Thread:
System.Threading.Monitor.Wait
System.Threading.Monitor.Wait
System.Threading.ManualResetEventSlim.Wait
System.Threading.Tasks.Task.SpinThenBlockingWait
System.Threading.Tasks.Task.InternalWait
System.Threading.Tasks.Task.Wait
System.Threading.Tasks.Task.Wait
ConsoleApplication.Program.Main Line 12 //this is our Main function
[Native to Managed Transition]
[Managed to Native Transition]
System.AppDomain.ExecuteAssembly
Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly
System.Threading.ThreadHelper.ThreadStart_Context
System.Threading.ExecutionContext.RunInternal
System.Threading.ExecutionContext.Run
System.Threading.ExecutionContext.Run
System.Threading.ThreadHelper.ThreadStart
Is there any way to find out where the task has stopped inside the Hang method, and ideally get its call stack? I believe that internally there must be some state for each task and its continuation points so that the scheduler can work, but I don't know how to inspect it.
Inside Visual Studio, I am not aware of a simple way to debug this sort of situation. However, there are two other ways to visualize this for full framework applications, plus a bonus preview of a way to do this in .NET Core 3.
tl;dr version: yep, it's hard, and yep, the information you want is there, just difficult to find. Once you find the heap objects using the methods below, you can use their addresses in the VS watch window and use the visualizers to take a deeper dive.
WinDbg
WinDbg has a primitive but helpful extension that provides a !dumpasync command.
If you download the extension from the vs-threading release branch and copy the x64 and x86 AsyncDebugTools.dll to C:\Program Files (x86)\Windows Kits\10\Debuggers\[x86|x64]\winext folders, you can do the following:
.load AsyncDebugTools
!dumpasync
The output (taken from the link above) looks like:
07494c7c <0> Microsoft.Cascade.Rpc.RpcSession+<SendRequestAsync>d__49
.07491d10 <1> Microsoft.Cascade.Agent.WorkspaceService+<JoinRemoteWorkspaceAsync>d__28
..073c8be4 <5> Microsoft.Cascade.Agent.WorkspaceService+<JoinWorkspaceAsync>d__22
...073b7e94 <0> Microsoft.Cascade.Rpc.RpcDispatcher`1+<>c__DisplayClass23_2+<<BuildMethodMap>b__2>d[[Microsoft.Cascade.Contracts.IWorkspaceService, Microsoft.Cascade.Common]]
....073b60e0 <0> Microsoft.Cascade.Rpc.RpcServiceUtil+<RequestAsync>d__3
.....073b366c <0> Microsoft.Cascade.Rpc.RpcSession+<ReceiveRequestAsync>d__42
......073b815c <0> Microsoft.Cascade.Rpc.RpcSession+<>c__DisplayClass40_1+<<Receive>b__0>d
On your sample above the output is less interesting:
033a23c8 <0> StackOverflow41476418.Program+<Hang>d__1
The description of the output is:
The output above is a set of stacks – not exactly callstacks, but actually "continuation stacks". A continuation stack is synthesized based on what code has 'awaited' the call to an async method. It's possible that the Task returned by an async method was awaited from multiple places (e.g. the Task was stored in a field, then awaited by multiple interested parties). When there are multiple awaiters, the stack can branch and show multiple descendents of a given frame. The stacks above are therefore actually "trees", and the leading dots at each frame helps recognize when trees have multiple branches.
If an async method is invoked but not awaited on, the caller won't appear in the continuation stack.
Once you see the nested hierarchy for more complex situations, you can at least deep-dive into the state objects and find their continuations and roots.
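The deep dive itself is ordinary SOS work; as a rough sketch (standard SOS commands, with <address> standing for whatever !dumpasync printed for the state machine object):
//dump the state machine object to see its fields, e.g. the <>1__state field and the builder/task fields
!do <address>
//find out what is keeping the state machine (and therefore the whole async chain) alive
!gcroot <address>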
LinqPad and ClrMd
Another useful tool is LINQPad coupled with ClrMD and ClrMD.Extensions. The latter package is used to bridge ClrMD into LINQPad - there is a getting started guide. Once you have the packages/namespaces set up, this query is what you want:
var session = ClrMD.Extensions.ClrMDSession.LoadCrashDump(@"dmpfile.dmp");

var stateMachineTypes = (
    from type in session.Heap.EnumerateTypes()
    where type.Interfaces.Any(item => item.Name == "System.Runtime.CompilerServices.IAsyncStateMachine")
    select type);

session.Heap.EnumerateDynamicObjects(stateMachineTypes).Dump(2);
Below is a sample of the output running on your sample code:
DotNet Core 3
For .NET Core 3.x, !dumpasync was added to the WinDbg SOS extension. It is MUCH better than the extension described above, as it gives much more context. You can see it is part of a much larger user story to improve debugging of async code. Here is the output under .NET Core 3.0 preview 6 with a preview 7 version of SOS with extended options. Note that line numbers are present, something you don't get with the options above:
0:000> !dumpasync -stacks -roots
Statistics:
MT Count TotalSize Class Name
00007ffb564e9be0 1 96 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[StackOverflow41476418_Core.Program+<Hang>d__1, StackOverflow41476418_Core]]
Total 1 objects
In 1 chains.
Address MT Size State Description
00000209915d21a8 00007ffb564e9be0 96 0 StackOverflow41476418_Core.Program+<Hang>d__1
Async "stack":
.00000209915d2738 System.Threading.Tasks.Task+SetOnInvokeMres
GC roots:
Thread bc20:
000000e08057e8c0 00007ffbb580a292 System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs # 2939]
rbp+10: 000000e08057e930
-> 00000209915d21a8 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[StackOverflow41476418_Core.Program+<Hang>d__1, StackOverflow41476418_Core]]
000000e08057e930 00007ffbb580a093 System.Threading.Tasks.Task.InternalWaitCore(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs # 2878]
rsi:
-> 00000209915d21a8 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[StackOverflow41476418_Core.Program+<Hang>d__1, StackOverflow41476418_Core]]
000000e08057e9b0 00007ffbb5809f0a System.Threading.Tasks.Task.Wait(Int32, System.Threading.CancellationToken) [/_/src/System.Private.CoreLib/shared/System/Threading/Tasks/Task.cs # 2789]
rsi:
-> 00000209915d21a8 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[StackOverflow41476418_Core.Program+<Hang>d__1, StackOverflow41476418_Core]]
Windows symbol path parsing FAILED
000000e08057ea10 00007ffb56421f17 StackOverflow41476418_Core.Program.Main(System.String[]) [C:\StackOverflow41476418_Core\Program.cs # 12]
rbp+28: 000000e08057ea38
-> 00000209915d21a8 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[StackOverflow41476418_Core.Program+<Hang>d__1, StackOverflow41476418_Core]]
000000e08057ea10 00007ffb56421f17 StackOverflow41476418_Core.Program.Main(System.String[]) [C:\StackOverflow41476418_Core\Program.cs # 12]
rbp+30: 000000e08057ea40
-> 00000209915d21a8 System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1[[System.Threading.Tasks.VoidTaskResult, System.Private.CoreLib],[StackOverflow41476418_Core.Program+<Hang>d__1, StackOverflow41476418_Core]]

Web application uses too much memory

At my hosting company, I have two more or less identical .NET applications. These two applications share the same application pool, which has a memory limit of 300 MB.
I can refresh the start page of one of the applications about 10 times, then I get an out of memory exception and the pool crashes.
In my application I print out these memory values:
PrivateMemorySize64: 197 804.00 kb (193.00 mb)
PeakPagedMemorySize64: 247 220.00 kb (241.00 mb)
VirtualMemorySize64: 27 327 128.00 kb (26 686.00 mb)
PagedMemorySize64: 197 804.00 kb (193.00 mb)
PagedSystemMemorySize64: 415.00 kb (0.00 mb)
PeakWorkingSet64: 109 196.00 kb (106.00 mb)
WorkingSet64: 61 196.00 kb (59.00 mb)
GC.GetTotalMemory(true): 2 960.00 kb (2.00 mb)
GC.GetTotalMemory(false): 2 968.00 kb (2.00 mb)
I have read and read and read, and watched videos about memory profiling, but I can't find any problem when I profile my application.
I use ANTS Memory Profiler 8 and get this result when I refresh the start page once after the build:
When I look at the Summary, .NET is using 41.65 MB of the 135.8 MB total private bytes allocated for the application.
These values get bigger and bigger with each refresh. Is that normal? When I refresh 8 times I get this:
.NET is using 56.11 MB of 153 MB total private bytes allocated for the application.
Where should I start? What could be the problem that uses so much memory? Is 300 MB too low?
This is undoubtedly due to a memory leak in your code, likely in the form of not disposing/closing connections to something like a queue or database. Profiling aside, review your code and ensure that you're closing/disposing all appropriate resources: your problem should then resolve itself.
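A minimal sketch of the pattern to check for (SqlConnection is just an example of the kind of resource meant; queue clients, readers and streams follow the same shape):
using System.Data.SqlClient;

public static int CountRows(string connectionString)
{
    // the using blocks guarantee Close/Dispose even if an exception is thrown,
    // so the connection goes back to the pool instead of piling up
    using (var connection = new SqlConnection(connectionString))
    using (var command = new SqlCommand("SELECT COUNT(*) FROM SomeTable", connection))
    {
        connection.Open();
        return (int)command.ExecuteScalar();
    }
}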
There were some DB connections that were not disposed. I also have a class which removes ETags, like this:
public class CustomHeaderModule : IHttpModule
{
    public void Dispose() { }

    public void Init(HttpApplication context)
    {
        context.PreSendRequestHeaders += OnPreSendRequestHeaders;
    }

    void OnPreSendRequestHeaders(object sender, EventArgs e)
    {
        HttpContext.Current.Response.Headers.Remove("ETag");
    }
}
Must I remove the event handler I add in the Init function, or will the GC take care of that?
And I have a lot of this:
Task.Factory.StartNew(() =>
{
    Add(...);
});
But I don't dispose of them in my code. Will the GC take care of that, or should I handle it some other way?

Size of the finalizer queue

How can I get the current size of the finalizer queue in C#?
I am trying to debug an application that is a little too liberal with letting the garbage collector dispose of its IDisposables, which I suspect is related to occasional crashes.
The IDisposables in question hold unmanaged memory and therefore implement their own finalizer, which disposes them if the Dispose method is not called explicitly before they are garbage collected.
The garbage collector handles all finalizable objects on a separate finalizer thread, and the queue of pending finalizers can grow if objects are created faster than they are finalized.
It would be a big help if I could output the number of objects currently waiting for finalization at various stages in the program.
The IDisposable objects come from various libraries, so adding logging to the respective finalizers themselves is not trivial. Is there a way to get the number of objects waiting for finalisation, directly from the GC?
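For context, the pattern described above typically looks something like this (a minimal sketch; the class name and the AllocHGlobal buffer are made up for illustration):
using System;
using System.Runtime.InteropServices;

public sealed class NativeBuffer : IDisposable
{
    private IntPtr _buffer = Marshal.AllocHGlobal(1024); // unmanaged memory
    private bool _disposed;

    public void Dispose()
    {
        Free();
        GC.SuppressFinalize(this); // explicit Dispose: never enters the finalizer queue
    }

    ~NativeBuffer() // safety net: only runs when Dispose was forgotten
    {
        Free();
    }

    private void Free()
    {
        if (_disposed) return;
        Marshal.FreeHGlobal(_buffer);
        _disposed = true;
    }
}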
You can use Memory Performance Counters.
http://msdn.microsoft.com/en-us/library/x2tyfybc(v=vs.110).aspx
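The counter most relevant here is "Finalization Survivors" in the ".NET CLR Memory" category. A minimal sketch of reading it from inside the process (the instance name is normally the process name, but it may carry a #1-style suffix when several instances of the same executable are running):
// requires: using System; using System.Diagnostics;
var survivors = new PerformanceCounter(".NET CLR Memory", "Finalization Survivors",
                                       Process.GetCurrentProcess().ProcessName);
Console.WriteLine("Finalization survivors after last GC: {0}", survivors.NextValue());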
From C# you can use WMemoryProfiler, which automates WinDbg debugging from C#. You can even debug your own process if you wish, like this:
using System;
using WMemoryProfiler;

namespace AllocateManyObjects
{
    /// <summary>
    /// Shows the basic usage of WMemoryProfiler
    /// </summary>
    class Program
    {
        static void Main(string[] args)
        {
            try
            {
                // Start this application not under the debugger.
                // Press Ctrl+F5 in Visual Studio to start it without debugging.
                SelfDebug();
            }
            catch (Exception ex)
            {
                Console.WriteLine("Got Exception: {0}", ex);
            }
            Console.WriteLine("Press Enter to exit.");
            Console.ReadLine();
        }

        /// <summary>
        /// Issue a !FinalizeQueue command and print the output to the console
        /// </summary>
        private static void SelfDebug()
        {
            using (var debugger = new MdbEng())
            {
                string[] output = debugger.Execute("!FinalizeQueue");
                Console.WriteLine(String.Join(Environment.NewLine, output));
            }
        }
    }
}
You only need to download https://wmemoryprofiler.codeplex.com/SourceControl/latest,
compile the sources, and install WinDbg from the Windows 8.0 SDK, or deploy it to the location where WMemoryProfiler searches for WinDbg:
C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x64
C:\Program Files (x86)\Windows Kits\8.0\Debuggers\x86
Then you can parse the output of SOS commands or do anything you would like with it (a small parsing sketch follows the sample output below).
This sample prints on my machine:
0:008> !FinalizeQueue
*** ERROR: Symbol file could not be found. Defaulted to export symbols for C:\Windows\Microsoft.NET\Framework64\v4.0.30319\clr.dll -
************* Symbol Loading Error Summary **************
Module name Error
clr The system cannot find the file specified : srv*c:\symbols*http://msdl.microsoft.com/download/symbols
You can troubleshoot most symbol related issues by turning on symbol loading diagnostics (!sym noisy) and repeating the command that caused symbols to be loaded.
You should also verify that your symbol search path (.sympath) is correct.
PDB symbol for clr.dll not loaded
SyncBlocks to be cleaned up: 0
Free-Threaded Interfaces to be released: 0
MTA Interfaces to be released: 0
STA Interfaces to be released: 0
----------------------------------
generation 0 has 165 finalizable objects (0000000000a70ed0->0000000000a713f8)
generation 1 has 0 finalizable objects (0000000000a70ed0->0000000000a70ed0)
generation 2 has 0 finalizable objects (0000000000a70ed0->0000000000a70ed0)
Ready for finalization 0 objects (0000000000a713f8->0000000000a713f8)
Statistics for all finalizable objects (including all objects ready for finalization):
MT Count TotalSize Class Name
00007ff84e389230 1 24 System.WeakReference
00007ff84e3874a0 1 32 Microsoft.Win32.SafeHandles.SafePEFileHandle
00007ff84e397a28 1 88 System.Diagnostics.Tracing.EventSource+OverideEventProvider
00007ff84e36eca8 1 104 System.Runtime.Remoting.Contexts.Context
00007ff84e397600 1 160 System.Diagnostics.Tracing.FrameworkEventSource
00007ff84e38a5e0 6 192 Microsoft.Win32.SafeHandles.SafeWaitHandle
00007ff84e388038 3 192 System.Threading.ReaderWriterLock
00007ff84e384928 7 224 Microsoft.Win32.SafeHandles.SafeFileHandle
00007ff84d4298a8 8 256 Microsoft.Win32.SafeHandles.SafeProcessHandle
00007ff84e3822f8 4 384 System.Threading.Thread
00007ff84e3a15a0 17 544 Microsoft.Win32.SafeHandles.SafeTokenHandle
00007ff84e38ca60 7 728 System.IO.FileStream
00007ff84d43c908 8 2240 System.Diagnostics.Process
00007ff84d3f94d8 100 5600 System.Diagnostics.ProcessModule
Total 165 objects
0:008> .remote_exit g;
Press Enter to exit.
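As a small example of parsing that output, here is a sketch of a helper (my own, not part of WMemoryProfiler) that adds up the per-generation finalizable-object counts from the !FinalizeQueue text above:
// requires: using System.Text.RegularExpressions;
static int CountFinalizableObjects(string[] finalizeQueueOutput)
{
    int total = 0;
    var line = new Regex(@"^generation \d+ has (\d+) finalizable objects");
    foreach (string text in finalizeQueueOutput)
    {
        Match m = line.Match(text);
        if (m.Success)
            total += int.Parse(m.Groups[1].Value);
    }
    return total;
}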
You can use the SOS !FinalizeQueue command.

How to diagnose a corrupted suffix pattern in a mixed managed/unmanaged x32 .NET application

I've got a .NET application that P/Invokes several libraries, all 32-bit (the application is 32-bit as well). I recently started getting crash bugs that occurred when the GC started freeing memory, and when I attached I saw that it was an access violation. After some web searches, I got myself set up with gflags and WinDbg, and was able to get the actual problem:
===========================================================
VERIFIER STOP 0000000F: pid 0x9650: corrupted suffix pattern
001B1000 : Heap handle
20A5F008 : Heap block
00000006 : Block size
20A5F00E : corruption address
===========================================================
This verifier stop is not continuable. Process will be terminated
when you use the `go' debugger command.
===========================================================
After doing some more reading, I was able to get a stack trace :
0:009> !heap -p -a 20A5F008
address 20a5f008 found in
_HEAP # f420000
HEAP_ENTRY Size Prev Flags UserPtr UserSize - state
20a5efe0 0008 0000 [00] 20a5f008 00006 - (busy)
Trace: 0a94
60cba6a7 verifier!AVrfpDphNormalHeapAllocate+0x000000d7
60cb8f6e verifier!AVrfDebugPageHeapAllocate+0x0000030e
77e00d96 ntdll!RtlDebugAllocateHeap+0x00000030
77dbaf0d ntdll!RtlpAllocateHeap+0x000000c4
77d63cfe ntdll!RtlAllocateHeap+0x0000023a
60cccb62 verifier!AVrfpRtlAllocateHeap+0x00000092
7666ea43 ole32!CRetailMalloc_Alloc+0x00000016
7666ea5f ole32!CoTaskMemAlloc+0x00000013
6c40b25d clr!MngdNativeArrayMarshaler::ConvertSpaceToNative+0x000000bd
... and some more detailed information on the block entry :
0:009> !heap -i 20a5f008
Detailed information for block entry 20a5f008
Assumed heap : 0x0f610000 (Use !heap -i NewHeapHandle to change)
Header content : 0x00000000 0x00000001
Owning segment : 0x0f610000 (offset 0)
Block flags : 0x0 (free )
Total block size : 0x0 units (0x0 bytes)
Previous block size: 0xb4e4 units (0x5a720 bytes)
Block CRC : OK - 0x0
List corrupted: (Blink->Flink = 00000000) != (Block = 20a5f010)
Free list entry : CORRUPTED
Previous block : 0x20a048e8
Next block : 0x20a5f008
I'm kind of stuck with this data. Unfortunately, ConvertSpaceToNative isn't an illuminating call, since it encompasses... pretty much every unmanaged allocation request. I've tried branching out further to find the information I'd need to trace it back to the offending call, and spent days looking through documentation, but I am not finding a way to determine the actual source of the corruption. I've tried setting breakpoints and stepping through, but I can't find a way to verify the contents of the heap manually that actually works - it always reports that everything is okay. It also seems to me that I should be able to get the application to halt immediately by turning on full page heaps, but it still looks like it's not halting until the free call (this is the call stack when execution halts):
0:009> kL
ChildEBP RetAddr
2354ecac 60cb9df2 verifier!VerifierStopMessage+0x1f8
2354ed10 60cba22a verifier!AVrfpDphReportCorruptedBlock+0x1c2
2354ed6c 60cba742 verifier!AVrfpDphCheckNormalHeapBlock+0x11a
2354ed8c 60cb90d3 verifier!AVrfpDphNormalHeapFree+0x22
2354edb0 77e01564 verifier!AVrfDebugPageHeapFree+0xe3
2354edf8 77dbac29 ntdll!RtlDebugFreeHeap+0x2f
2354eeec 77d634a2 ntdll!RtlpFreeHeap+0x5d
2354ef0c 60cccc4f ntdll!RtlFreeHeap+0x142
2354ef54 76676e6a verifier!AVrfpRtlFreeHeap+0x86
2354ef68 76676f54 ole32!CRetailMalloc_Free+0x1c
2354ef78 6c40b346 ole32!CoTaskMemFree+0x13
2354f008 231f7e8a clr!MngdNativeArrayMarshaler::ClearNative+0x78
WARNING: Frame IP not in any known module. Following frames may be wrong.
2354f08c 231f6442 0x231f7e8a
2354f154 231f5a7b 0x231f6442
2354f264 231f572b 0x231f5a7b
2354f288 231f56a4 0x231f572b
2354f2a4 231f7b3e 0x231f56a4
2354f330 231f207b 0x231f7b3e
2354f3b4 1edf60e0 0x231f207b
*** WARNING: Unable to verify checksum for C:\Windows\assembly\NativeImages_v4.0.30319_32\mscorlib\045c9588954c3662d542b53f4462268b\mscorlib.ni.dll
2354f850 6a746ed4 0x1edf60e0
2354f85c 6a724157 mscorlib_ni+0x386ed4
2354f8c0 6a724096 mscorlib_ni+0x364157
2354f8d4 6a724051 mscorlib_ni+0x364096
2354f8f0 6a691cd2 mscorlib_ni+0x364051
2354f908 6c353e22 mscorlib_ni+0x2d1cd2
2354f914 6c363355 clr!CallDescrWorkerInternal+0x34
2354f968 6c366d1f clr!CallDescrWorkerWithHandler+0x6b
2354f9e0 6c4d29d6 clr!MethodDescCallSite::CallTargetWorker+0x152
2354fb54 6c3c8357 clr!ThreadNative::KickOffThread_Worker+0x19d
2354fb68 6c3c83c5 clr!Thread::DoExtraWorkForFinalizer+0x1ca
2354fc10 6c3c8492 clr!Thread::DoExtraWorkForFinalizer+0x256
2354fc6c 6c3c84ff clr!Thread::DoExtraWorkForFinalizer+0x615
2354fc90 6c4d2ad8 clr!Thread::DoExtraWorkForFinalizer+0x6b2
2354fd14 6c3fb4ad clr!ThreadNative::KickOffThread+0x1d2
2354feb0 60cd11d3 clr!Thread::intermediateThreadProc+0x4d
2354fee8 75c6336a verifier!AVrfpStandardThreadFunction+0x2f
2354fef4 77d69f72 KERNEL32!BaseThreadInitThunk+0xe
2354ff34 77d69f45 ntdll!__RtlUserThreadStart+0x70
2354ff4c 00000000 ntdll!_RtlUserThreadStart+0x1b
I feel like it's supposed to be obvious what I ought to be doing now, but no avenue of investigation is turning up anything to move me towards resolving the bug.
I finally resolved this, and found that I had made one crucial mistake. With a corrupted suffix pattern, the error will come on a free attempt, which led me to believe that it was unlikely that the allocation would have come right before the free. This was not accurate. When dealing with corruption that occurs on free, barring further information, any allocation point is equally likely. In this case, the verifier halt was coming on freeing a parameter which had been incorrectly defined as a struct of shorts instead of as a struct of ints.
Here's the offending code:
[DllImport("gdi32.dll", CharSet = CharSet.Unicode)]
[return: MarshalAs(UnmanagedType.Bool)]
static extern bool GetCharABCWidths(IntPtr hdc, uint uFirstChar, uint uLastChar, [Out] ABC[] lpabc);
(This declaration is okay)
[StructLayout(LayoutKind.Sequential)]
public struct ABC
{
    public short A;
    public ushort B;
    public short C;
}
(This is not okay, per the MSDN article on the ABC struct: http://msdn.microsoft.com/en-us/library/windows/desktop/dd162454(v=vs.85).aspx )
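For reference, a layout that matches the native definition (int, UINT, int) would look like this:
// matches the native GDI ABC struct: int abcA; UINT abcB; int abcC;
[StructLayout(LayoutKind.Sequential)]
public struct ABC
{
    public int A;
    public uint B;
    public int C;
}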
So, if you find yourself debugging memory corruption that halts on free, keep in mind: never discount the possibility that the memory being freed was incorrectly allocated to begin with... and mind those [Out] parameters on unmanaged calls!
