Unloading NativeAOT compiled dll

With .NET 7's NativeAOT compilation, we can now load a C# DLL as a regular Win32 module.
HMODULE module = LoadLibraryW(L"AOT.dll");
auto hello = reinterpret_cast<void (*)()>(GetProcAddress(module, "Hello"));
hello();
This works fine and prints some output to the console.
However, unloading the DLL simply doesn't work. No matter how many times I call FreeLibrary(module), GetModuleHandle("AOT.dll") still returns the handle to the module, implying that it did not unload successfully.
My "wild guess" was that the runtime still has some background threads running (GC?). So I enumerated all threads, used NtQueryInformationThread to retrieve the start address of each thread, and then called GetModuleHandleEx with GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS to find the module in which each thread started. The results were as follows.
Before:
THREAD ID = 7052
base priority = 8
delta priority = 0
Start address: 00007FF69D751613
Module: 00007FF69D740000 => CppRun.exe
THREAD ID = 3248
base priority = 8
delta priority = 0
Start address: 00007FFEF1F42B20
Module: 00007FFEF1EF0000 => ntdll.dll
THREAD ID = 7160
base priority = 8
delta priority = 0
Start address: 00007FFEF1F42B20
Module: 00007FFEF1EF0000 => ntdll.dll
After:
THREAD ID = 7052
base priority = 8
delta priority = 0
Start address: 00007FF69D751613
Module: 00007FF69D740000 => CppRun.exe
THREAD ID = 3248
base priority = 8
delta priority = 0
Start address: 00007FFEF1F42B20
Module: 00007FFEF1EF0000 => ntdll.dll
THREAD ID = 7160
base priority = 8
delta priority = 0
Start address: 00007FFEF1F42B20
Module: 00007FFEF1EF0000 => ntdll.dll
THREAD ID = 5944
base priority = 8
delta priority = 0
Start address: 00007FFEF1F42B20
Module: 00007FFEF1EF0000 => ntdll.dll
THREAD ID = 17444
base priority = 10
delta priority = 0
Start address: 00007FFE206DBEF0
Module: 00007FFE206D0000 => AOT.dll
"CppRun.exe" is my testing application.
As you can see, two additional threads were spawned: one from ntdll (5944) and one from my AOT-compiled DLL (17444).
I don't know what the leftover thread in "AOT.dll" is for (maybe GC?), but I force-terminated it successfully (definitely unhealthy, I know).
However, when I tried to open the thread in ntdll (5944), it threw an exception:
An invalid thread, handle %p, is specified for this operation. Possibly, a threadpool worker thread was specified
Given that, I assume .NET starts a threadpool worker during initialization? How can I stop that pool and unload the DLL?
Or is there a better way to unload a NativeAOT-compiled DLL?
Update: I've hooked the CreateThreadpool function, but the runtime doesn't call it. Still trying to figure out what spawned that thread.

Edit:
A NativeAOT (aka CoreRT) compiled DLL was unloadable at first, but Microsoft later blocked that functionality because of a memory leak and a crash on process exit. See this PR for more details.
This answer simply restores the functionality using a detour hook and deals with neither the memory leak nor the crash. Use it at your own risk.
I was able to prevent the access violation crash by manually freeing the FLS (fiber-local storage) created by .NET. Here is a simple demo.
Original answer below:
It turns out that thread is used by Windows 10 for parallel library loading (TppWorkerThread) and isn't the problem.
I ended up inspecting the WinAPI calls with this handy tool, and found that .NET calls GetModuleHandleEx with the GET_MODULE_HANDLE_EX_FLAG_PIN flag, which keeps the module loaded until the process exits, no matter how many times FreeLibrary is called.
So I hooked GetModuleHandleEx to intercept those calls and mask out the flag.
Voila! Now I can unload the NativeAOT-compiled DLL.
I know this approach is quite hacky, but at least it works.
If anyone happens to have a better solution, please let me know.
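The flag-masking step in the hook can be sketched as follows. This is only an illustration of the bit manipulation, written in Python for brevity; the real hook would be native code installed with a detour library. The three flag values are the ones defined in the Windows SDK headers, while strip_pin_flag, hooked_get_module_handle_ex and real_call are hypothetical names that do not come from the original answer.

```python
# Flag values as defined in the Windows SDK (libloaderapi.h).
GET_MODULE_HANDLE_EX_FLAG_PIN = 0x00000001
GET_MODULE_HANDLE_EX_FLAG_UNCHANGED_REFCOUNT = 0x00000002
GET_MODULE_HANDLE_EX_FLAG_FROM_ADDRESS = 0x00000004


def strip_pin_flag(flags: int) -> int:
    """Return flags with the PIN bit cleared and every other bit untouched."""
    return flags & ~GET_MODULE_HANDLE_EX_FLAG_PIN & 0xFFFFFFFF


def hooked_get_module_handle_ex(real_call, flags, module_name):
    """Hypothetical detour body: forward the call with the PIN bit removed.

    real_call stands in for the original GetModuleHandleEx entry point.
    """
    return real_call(strip_pin_flag(flags), module_name)
```

For example, a call made with PIN | FROM_ADDRESS (0x5) would be forwarded with only FROM_ADDRESS (0x4), so the module stays reference-counted and can still be released with FreeLibrary.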

Related

Getting error from window service with Exception code: 0xc0000005

We are getting an error on the server, and our Windows service stops automatically.
The application crashes at random, after approximately one hour, with the error below:
Faulting application name: Chubb.Studio.Event.Processor.exe, version: 0.0.0.0, time stamp: 0x5c0ab1b7
Faulting module name: KERNELBASE.dll, version: 6.3.9600.19425, time stamp: 0x5d26b6e9
Exception code: 0xc0000005
Fault offset: 0x0000000000001556
Faulting process id: 0x115c
Faulting application start time: 0x01d5a35fd202f96d
Faulting application path: E:\WindowsService\DevInt\Chubb.Studio.EventProcessor\Chubb.Studio.Event.Processor.exe
Faulting module path: C:\Windows\system32\KERNELBASE.dll
Report Id: 762c15d4-0f5b-11ea-8120-005056a27597
Faulting package full name:
Faulting package-relative application ID:
Our code looks like this:
protected override void OnStarted()
{
    //IntializeEventsExecution();
    Task task = Task.Factory.StartNew(() => IntializeEventsExecution());
    base.OnStarted();
}

public void IntializeEventsExecution()
{
    StartEvents();
}

public void StartEvents()
{
    var eventList = GetEventTopics();
    Parallel.ForEach(eventList,
        new ParallelOptions { MaxDegreeOfParallelism = eventList.Count },
        (item, state, index) =>
        {
            StartProcessingEvent(eventList[(int)index]);
        });
}

/// <summary>
///
/// </summary>
/// <param name="index"></param>
public void StartProcessingEvent(EventTopic topic)
{
    try
    {
        Task task = Task.Factory.StartNew(() => ExecuteProcessingEvent(topic));
        task.Wait();
    }
    catch (Exception)
    {
    }
    finally
    {
        new _processingDelegate(StartProcessingEvent).Invoke(topic);
    }
}

As Klaus says in his comment, a STATUS_ACCESS_VIOLATION exception is caused by a process reading or writing memory that it doesn't own. Given that this is C#, the most likely cause is either incorrect use of P/Invoke or unsafe code.
The best approach to debugging something vague like this is to isolate the issue by removing P/Invoke calls one by one until the exception doesn't happen. It's hard to be more precise because the exception may be triggered a long way from the cause (memory or stack corruption).
This SO answer gives a good list of the likely causes of an access violation in managed code.
Access violations in managed apps typically happen for one of these reasons:
You P/Invoke a native API passing in a handle to a managed object and the native API uses that handle. If you get a collection and compaction while the native API is running, the managed object may move and the pointer becomes invalid.
You P/Invoke something with a buffer that is too small or smaller than the size you pass in and the API overruns a read or write
A pointer (IntPtr, etc) you pass to a P/Invoke call is invalid (-1 or 0) and the native isn't checking it before use
You P/Invoke a native call and the native code runs out of memory (usually virtual) and isn't checking for failed allocations and reads/writes to an invalid address
You use a GCHandle that is not initialized or that somehow is pointing to an already finalized and collected object (so it's not pointing to an object, it's pointing to an address where an object used to be)
Your app uses a handle to something that got invalidated by a sleep/wake. This is more esoteric but certainly happens. For example, if you're running an application off of a storage card, the entire app isn't loaded into RAM. Pieces in use are demand-paged in for execution. This is all well and good. Now if you power the device off, the drivers all shut down. When you power back up, many devices simply re-mount the storage devices. When your app needs to demand-page in more program, it's no longer where it was and it dies. Similar behavior can happen with databases on mounted stores. If you have an open handle to the database, after a sleep/wake cycle the connection handle may no longer be valid.

Find source of token handle leak in managed process

I'm investigating a handle leak in a WCF service, running on .NET 4.6.2. The service runs fine but over time handle count keeps increasing, with thousands of Token type handles sitting around in the process. It seems memory is also leaking very slowly (likely related to the handle leak).
Edit: it looks like event and thread handles are also leaked.
Process Explorer shows that the suspect handles all have the same name:
DOMAIN\some.username$:183db90
and all share the same address.
I attached WinDbg to the process and ran !htrace -enable and then !htrace -diff some time later. This gave me a list of almost 2000 newly opened handles and native stack traces, like this one:
Handle = 0x000000000000b02c - OPEN
Thread ID = 0x000000000000484c, Process ID = 0x0000000000002cdc
0x00007ffc66e80b3a: ntdll!NtCreateEvent+0x000000000000000a
0x00007ffc64272ce8: KERNELBASE!CreateEventW+0x0000000000000084
0x00007ffc5b392e0a: clr!CLREventBase::CreateManualEvent+0x000000000000003a
0x00007ffc5b3935c7: clr!Thread::AllocHandles+0x000000000000007b
0x00007ffc5b3943c7: clr!Thread::CreateNewOSThread+0x000000000000007f
0x00007ffc5b394308: clr!Thread::CreateNewThread+0x0000000000000090
0x00007ffc5b394afb: clr!ThreadpoolMgr::CreateUnimpersonatedThread+0x00000000000000cb
0x00007ffc5b394baf: clr!ThreadpoolMgr::MaybeAddWorkingWorker+0x000000000000010c
0x00007ffc5b1d8c74: clr!ManagedPerAppDomainTPCount::SetAppDomainRequestsActive+0x0000000000000024
0x00007ffc5b1d8d27: clr!ThreadpoolMgr::SetAppDomainRequestsActive+0x000000000000003f
0x00007ffc5b1d8cae: clr!ThreadPoolNative::RequestWorkerThread+0x000000000000002f
0x00007ffc5a019028: mscorlib_ni+0x0000000000549028
0x00007ffc59f5f48f: mscorlib_ni+0x000000000048f48f
0x00007ffc59f5f3b9: mscorlib_ni+0x000000000048f3b9
Another stack trace (a large portion of the ~2000 new handles have this):
Handle = 0x000000000000a0c8 - OPEN
Thread ID = 0x0000000000003614, Process ID = 0x0000000000002cdc
0x00007ffc66e817aa: ntdll!NtOpenProcessToken+0x000000000000000a
0x00007ffc64272eba: KERNELBASE!OpenProcessToken+0x000000000000000a
0x00007ffc5a01aa9b: mscorlib_ni+0x000000000054aa9b
0x00007ffc5a002ebd: mscorlib_ni+0x0000000000532ebd
0x00007ffc5a002e68: mscorlib_ni+0x0000000000532e68
0x00007ffc5a002d40: mscorlib_ni+0x0000000000532d40
0x00007ffc5a0027c7: mscorlib_ni+0x00000000005327c7
0x00007ffbfbfb3d6a: +0x00007ffbfbfb3d6a
When I run the !handle 0 0 command in WinDbg, I get the following result:
21046 Handles
Type Count
None 4
Event 2635 **
Section 360
File 408
Directory 4
Mutant 9
Semaphore 121
Key 77
Token 16803 **
Thread 554 **
IoCompletion 8
Timer 3
TpWorkerFactory 2
ALPC Port 7
WaitCompletionPacket 51
The ones marked with ** are increasing over time, although at a different rate.
Edit 3:
I ran !dumpheap to see the number of Thread objects (unrelated classes removed):
!DumpHeap -stat -type System.Threading.Thread
Statistics:
MT Count TotalSize Class Name
00007ffc5a152bb0 745 71520 System.Threading.Thread
Active thread count fluctuates between 56 and 62 in Process Explorer as the process is handling some background tasks periodically.
Some of the stack traces are different but they're all native traces so I don't know what managed code triggered the handle creation. Is it possible to get the managed function call that's running on the newly created thread when Thread::CreateNewThread is called? I don't know what WinDbg command I'd use for this.
Note:
I cannot attach sample code to the question because the WCF service loads hundreds of DLLs, most of which are built from many source files. I have some very vague suspicions about which major area this may come from, but I don't know enough detail to produce an MCVE.
Edit: after Harry Johnston's comments below, I noticed that thread handle count is also increasing - I overlooked this earlier because of the high number of token handles.

OutOfMemory exception in WPF video rendering application with COM interop

We have a rich client application developed using WPF/C#.Net 4.0 which interops with in-house COM DLLs. Regular events are raised via this COM interface containing video data.
As part of the application we render video via Windows Media Foundation and have created interops to use Windows Media Foundation. We have multiple WMF pipelines rendering different video at the same time.
The application runs for 6-8 hours rendering video. Private bytes remain steady during this time (around 500-600MB).
At some point the application appears to hang. From then on, private bytes increase very rapidly until the process consumes approximately 1.4GB of memory and crashes with an OutOfMemoryException.
We have reproduced this on 5 different workstations with different graphic cards (NVIDIA and ATI cards) and a mixture of Windows 7 32 and 64bit.
We have analyzed 3 dump files and found that the finalizer thread is waiting on a call to the ole32.GetToSTA() method. We are unable to determine what causes the finalizer thread to block and how to resolve this. I have pasted excerpts from three dumps we've been analyzing:
Dump 1)
Thread 2:ae0 is waiting on an STA thread efc
Thread 28:efc is calling a WaitForSingleObject. The handle it is waiting on is actually a thread handle 5ab4 which is thread id 14a4
Thread 130:14a4 has the following stack:
37f4fdf4 753776a6 ntdll!NtRemoveIoCompletion+0x15
37f4fe20 63301743 KERNELBASE!GetQueuedCompletionStatus+0x29
37f4fe74 6330d0db WMNetMgr!CNSIoCompletionPortNT::WaitAndServeCompletionsLoop+0x5e
37f4fe94 633199bf WMNetMgr!CNSIoCompletionPortNT::WaitAndServeCompletions+0x4c
37f4fecc 63312dbd WMNetMgr!CWorkThreadManager::CWorkerThread::ThreadMain+0xa2
37f4fed8 769b3677 WMNetMgr!CWMThread::ThreadFunc+0x3b
37f4fee4 77679f42 kernel32!BaseThreadInitThunk+0xe
37f4ff24 77679f15 ntdll!__RtlUserThreadStart+0x70
37f4ff3c 00000000 ntdll!_RtlUserThreadStart+0x1b
Dump2)
STA thread:
1127f474 75f80a91 ntdll!ZwWaitForSingleObject+0x15
1127f4e0 77411184 KERNELBASE!WaitForSingleObjectEx+0x98
1127f4f8 77411138 kernel32!WaitForSingleObjectExImplementation+0x75
1127f50c 63ae5f29 kernel32!WaitForSingleObject+0x12
1127f530 63a8eb2e WMNetMgr!CWMThread::Wait+0x78
1127f54c 63a8f128 WMNetMgr!CWorkThreadManager::CThreadPool::Shutdown+0x70
1127f568 63a76e10 WMNetMgr!CWorkThreadManager::Shutdown+0x34
1127f59c 63a76f2d WMNetMgr!CNSClientNetManagerHelper::Shutdown+0xdd
1127f5a4 63cd228e WMNetMgr!CNSClientNetManager::Shutdown+0x66
WARNING: Stack unwind information not available. Following frames may be wrong.
1127f5bc 63cd23a6 WMVCORE!WMCreateProfileManager+0xeef6
1127f5dc 63c573ca WMVCORE!WMCreateProfileManager+0xf00e
1127f5e8 63c62f18 WMVCORE!WMIsAvailableOffline+0x2ba3b
1127f618 63c19da6 WMVCORE!WMIsAvailableOffline+0x37589
1127f630 63c1aca2 WMVCORE!WMIsContentProtected+0x56e4
1127f63c 63c14bd7 WMVCORE!WMIsContentProtected+0x65e0
1127f650 113de6e8 WMVCORE!WMIsContentProtected+0x515
1127f660 113de513 wmp!CWMDRMReaderStub::CExternalStub::ShutdownInternalRefs+0x1d0
1127f674 113c1988 wmp!CWMDRMReaderStub::ExternalRelease+0x4f
1127f67c 1160a5b9 wmp!CWMDRMReaderStub::CExternalStub::Release+0x13
1127f6a4 1161745f wmp!CWMGraph::CleanupUpStream_selfprotected+0xbe
Finalizer thread is trying to switch to STA:
0126eccc 75f80a91 ntdll!ZwWaitForSingleObject+0x15
0126ed38 77411184 KERNELBASE!WaitForSingleObjectEx+0x98
0126ed50 77411138 kernel32!WaitForSingleObjectExImplementation+0x75
0126ed64 75d78907 kernel32!WaitForSingleObject+0x12
0126ed88 75e9a819 ole32!GetToSTA+0xad
Dump3)
The finalizer thread is in the GetToSTA call, so it is waiting for a COM object to free
Thread 29 is a COM object in the STA, and it is waiting on a critical section owned by thread 53 (1bf4)
Thread 53 is doing:
1cbcf990 76310a91 ntdll!ZwWaitForSingleObject+0x15
1cbcf9fc 74cb1184 KERNELBASE!WaitForSingleObjectEx+0x98
1cbcfa14 74cb1138 kernel32!WaitForSingleObjectExImplementation+0x75
1cbcfa28 65dfb6bb kernel32!WaitForSingleObject+0x12
WARNING: Stack unwind information not available. Following frames may be wrong.
1cbcfa48 74cb3677 wmp!Ordinal3000+0x53280
1cbcfa54 77029f42 kernel32!BaseThreadInitThunk+0xe
1cbcfa94 77029f15 ntdll!__RtlUserThreadStart+0x70
1cbcfaac 00000000 ntdll!_RtlUserThreadStart+0x1b
Any ideas on how we might resolve this issue?

Well, the finalizer thread is deadlocked. That will certainly result in an eventual OOM, because objects waiting for finalization are never released. We can't see the full stack trace for the finalizer thread, but odds are that you'll see SwitchAptAndDispatchCall() and ReleaseRCWListInCorrectCtx() in the trace, indicating that it is trying to call IUnknown::Release() to release a COM object. That object is apartment threaded, so a thread switch is required to make the call safely.
I don't see any decent candidates in the stack traces you posted, possibly because you didn't get the right one or because the thread is already busy shutting down due to the exception. Try to catch it earlier with a debugger break as soon as you see the virtual memory size climb.
The most common cause of a deadlock like this is violating the requirements for an STA thread, which state that it must never block and must pump a message loop. The never-block requirement is typically easily met in a .NET program: the CLR pumps a message loop when necessary when you use the lock statement or a WaitHandle.WaitXxx() call. It is, however, very common to forget to pump a message loop, especially since doing so is kinda painful. Application.Run() is required.

ActiveX DLL called from thread

I have an ActiveX (COM) DLL that makes windows system calls (such as ReadFile() and WriteFile()). My GUIs (in Python or C#) create an instance of the DLL in the main GUI thread. However, in order to make calls to it from threads, a new instance of the DLL must be created in each thread, regardless of using C# or Python. (As a side note, the original instance could be called from a thread in C#, but this blocks the main thread; doing this in Python crashes the GUI.) Is there any way to avoid creating a new instance of the DLL in threads?
The reason why using the original DLL instance is desired: The DLL allows connection to a HID microcontroller. The DLL provides an option to allow only one exclusive handle to the microcontroller. If the GUI designer chooses this option (which is necessary in certain situations), the second DLL instance would not work as desired, since only one instance of the DLL could make the connection.

I haven't worked with Python, but for C# I would suggest creating a helper class that exposes the ActiveX object as a public static property. Have the main thread create the ActiveX object, and then have all threads access it through that property as needed.

When you build an ActiveX/COM component you can specify a threading model for it, e.g. apartment-threaded. Depending on which model you choose, COM takes care of serializing requests.
If you "open" an ActiveX/COM component multiple times (depending on the threading model?), only one instance is actually created.
I'm assuming you use win32com.client.Dispatch(".") to "open" your ActiveX/COM component.
Also, don't forget the pythoncom.CoInitialize() and CoUninitialize() pair of calls.
Google what those actually do.
If you can't change the given ActiveX/COM component and its threading model is unacceptable, you can wrap all "outbound" calls in one dedicated Python thread with a monitor "interface."
Here's an outline of the code I once wrote when faced with a similar situation:
import threading

import pythoncom
import win32com.client


class Driver(threading.Thread):
    """Owns the COM object on one dedicated thread and serializes calls to it."""

    def __init__(self, **kw):
        super().__init__(**kw)
        self.daemon = True  # optional, helps termination
        self.quit = False  # set to True for a graceful exit
        self.con = threading.Condition()
        self.request = None
        self.response = None

    def run(self):
        pythoncom.CoInitialize()
        try:
            handle = win32com.client.Dispatch("SomeActiveX.SomeInterface")
            with self.con:
                while not self.quit:
                    # wait for work, and for the previous result to be collected
                    while not self.request or self.response:
                        self.con.wait()
                    method, args = self.request
                    try:
                        self.response = getattr(handle, method)(*args), None  # buffer result
                    except Exception as e:
                        self.response = None, e  # buffer exception
                    self.con.notify_all()  # result ready
        finally:
            pythoncom.CoUninitialize()

    def call(self, method, *args):
        with self.con:
            while self.request:
                self.con.wait()  # driver is busy
            self.request = method, args
            self.con.notify_all()  # driver can start
            while not self.response:
                self.con.wait()  # wait for driver
            rv, ex = self.response
            self.request = self.response = None  # free driver
            self.con.notify_all()  # other clients can continue
        if ex:
            raise ex
        return rv
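The same "one dedicated owner thread" idea can also be sketched without a hand-written condition-variable protocol, using a queue of jobs and one Future per call. This is a minimal sketch under assumptions, not the component's real API: SingleThreadProxy and FakeDevice are hypothetical names, and FakeDevice only stands in for the COM object so the example runs without pywin32. With a real component, the factory would call win32com.client.Dispatch, and the worker would call pythoncom.CoInitialize() before creating it.

```python
import queue
import threading
from concurrent.futures import Future


class SingleThreadProxy:
    """Runs every call on one dedicated worker thread (hypothetical helper)."""

    def __init__(self, factory):
        self._factory = factory  # builds the wrapped object on the worker thread
        self._jobs = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # For a COM object, pythoncom.CoInitialize() would go here and
        # pythoncom.CoUninitialize() in a finally block.
        target = self._factory()
        while True:
            job = self._jobs.get()
            if job is None:  # sentinel: shut down
                break
            future, method, args = job
            try:
                future.set_result(getattr(target, method)(*args))
            except Exception as exc:  # hand the exception back to the caller
                future.set_exception(exc)

    def call(self, method, *args):
        future = Future()
        self._jobs.put((future, method, args))
        return future.result()  # blocks until the worker has finished

    def close(self):
        self._jobs.put(None)
        self._thread.join()


class FakeDevice:
    """Stand-in for the real component; reports which thread ran the call."""

    def read(self, n):
        return threading.current_thread().name, n * 2
```

Any number of GUI or worker threads can then share one proxy, e.g. proxy = SingleThreadProxy(FakeDevice) followed by proxy.call("read", 3). Only one instance of the wrapped object ever exists, which matches the exclusive-handle requirement described in the question.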

I need to develop a webservices monitoring .net Application. What is best way to design this

I need to develop a web service monitoring system. There may be hundreds of web services running on different servers. The system needs to check each web service at a given interval and update its status in the DB.
These are the options I have designed so far.
Option 1: I created a Monitoring class with a method (MonitorWebService) that calls the web service dynamically at a regular interval, say 10 minutes, and stores the status (Fail/Success) in the DB.
In a loop, I create a new instance of the Monitoring class and a new thread for each web service.
Example:
foreach (int i in idlist)
{
    Monitoring monObj = new Monitoring();
    Thread workerT = new Thread(monObj.MonitorWebService);
    workerT.Start(i);
}
MonitorWebService contains an infinite loop that calls the given web service at a given interval, such as 1 minute or 10 minutes. To run it at a regular interval I'm using EventWaitHandle.WaitOne(T1 * 1000, false) instead of Thread.Sleep(). Here T1 can be anything from 1 minute to 1 or 5 hours.
Option 2:
In the loop, create a new AppDomain with a new name and start a new thread, as shown below.
foreach (int i in idlist)
{
    string appDNname = WSMonitor + i.ToString();
    AppDomain WMSObj = AppDomain.CreateDomain(appDNname);
    Type t = typeof(Monitoring);
    Monitoring monWSObj = (Monitoring) WMSObj.CreateInstanceAndUnwrap(Assembly.GetExecutingAssembly().FullName, t.FullName);
    Thread WorkerT = new Thread(monWSObj.MonitorWebService);
    WorkerT.Start(i);
}
In option 2 I unload the AppDomain when the time interval is more than 10 minutes and load it again whenever it is required. I thought option 2 would release resources when they are not needed and reload them when they are.
Which is the better approach? Is there a better solution? Quick help is highly appreciated.

First of all:
Option 2 is bad. It will not unload any more data than your Option 1 does.
.NET will automatically release all application data when it is no longer referenced/needed. It just won't unload the application itself. But in your case you cannot unload the application itself anyway, so using an AppDomain is completely useless here.
Option 1 is not terribly good either, because (ab)using thread synchronization primitives for timing has huge overhead and is never a good idea.
Better options are:
1) If you don't need to run permanently, just have the external Windows Task Scheduler call your application at the needed times. This has the advantage that it is easily configurable externally and you don't have to worry about any timing in your code at all.
2) If you need/want to run permanently, then the simplest and clearest way would be to use one of the available Timer objects.
3) If you don't like 2), use a loop with Thread.Sleep (don't try to abuse the Sleep interval for timing; just sleep for e.g. 1 minute, then wake up and check whether anything needs to be done).
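Option 3) can be sketched as one loop that wakes up on a short tick and runs whatever is due, instead of one thread per web service. The sketch below is in Python for brevity and rests on assumptions: Check, run_due, monitor_forever and the store callback are hypothetical names, and the action callables stand in for the actual web service calls and DB update.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Check:
    name: str
    interval: float  # seconds between runs
    action: Callable[[], bool]  # returns True on success
    next_run: float = 0.0  # timestamp at which the check is next due


def run_due(checks: List[Check], now: float) -> Dict[str, str]:
    """Run every check whose next_run has passed; return {name: status}."""
    results = {}
    for check in checks:
        if now >= check.next_run:
            try:
                ok = check.action()
            except Exception:
                ok = False  # a failing call counts as "Fail"
            results[check.name] = "Success" if ok else "Fail"
            check.next_run = now + check.interval
    return results


def monitor_forever(checks: List[Check], tick: float = 60.0, store=print):
    """Wake up every `tick` seconds and run whatever is due."""
    while True:
        for name, status in run_due(checks, time.monotonic()).items():
            store(name, status)  # stand-in for the DB update
        time.sleep(tick)
```

Keeping the "what is due?" decision in run_due, separate from the sleeping loop, means the sleep interval is not used for timing, and the scheduling logic can be exercised with made-up timestamps.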
