The question is general, but I am actually doing this with Mono and not .NET, so if there are differences I am very interested in what they are.
I have a simple data-containing class (not a struct, for other reasons) which should be blittable in the sense that it consists only of ints, doubles, and smaller structs that themselves consist of doubles. I am passing instances of it to a native DLL by reference through DllImport static methods.
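Roughly, the declarations involved look like this (the type names and the native entry point are made up for illustration):

using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct Vec3                     // small struct consisting only of doubles
{
    public double X, Y, Z;
}

[StructLayout(LayoutKind.Sequential)]
class SampleData                // class, not struct, holding only blittable fields
{
    public int Id;
    public double Value;
    public Vec3 Position;
}

static class Native
{
    // Hypothetical native entry point; the class instance is handed to the
    // native code as a reference/pointer since SampleData is a reference type.
    [DllImport("nativelib")]
    public static extern void Process(SampleData data);
}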
My understanding was that for a simple object like that, the object is pinned in managed memory, its managed address is passed to the native code as a reference/pointer (depending on how the native signature is declared, effectively the same thing), the native code reads and writes it directly, and when the function returns the object is unpinned and may now contain changes written by the native code.
Others think the object is instead copied into a block of native memory, the native code operates on that copy, and the data is then copied back into the managed object afterwards. That would obviously be less performant and wasteful when the marshaler does not actually have to convert the data.
I made a test where I note the address of the data as seen by the native code, and it does not change per object: objectA gets one address, objectB gets another, and each keeps its address for as long as I tested. While that seems to support my understanding, there could still be other explanations for the stable addresses, so I would be grateful for a concrete explanation, especially since the Mono documentation does not mention pinning blittable objects while the Microsoft documentation does.
Extra question:
Can there be a situation where a class (not a struct) containing only ushort, int, and double fields is not blittable and therefore requires a copy? With Mono on Android it was observed that changes made to the data by the native code were not visible on the managed side (when the Out attribute was not used), which seems to indicate that copying, and not pinning, was being used.
It is possible that the Mono int differs from the C++ int in the native code, but size and alignment mismatches like that should not be detectable from the managed side, so how would the runtime "know" to marshal by copy rather than by pinning? In tests on Windows such a mismatch just results in garbled data, as expected, so that is probably not the reason for marshaling by copy.
Related
I have read this and this, and was wondering: if I use functions from an unmanaged C++ library in C# via a C# wrapper of that library, is there going to be any difference in performance compared with the same program written entirely in unmanaged C++ against the C++ library? I am asking about a significant performance difference, bigger than 1.5 times. Note that I am asking only about the performance of the C++ library's functions (in the two scenarios, with and without the C# wrapper), isolating all other code!
After edit:
I was just wondering: if I want to use an unmanaged C++ dynamic library (.dll) in C# through a wrapper, which part gets compiled to intermediate CIL code and which does not? I guess only the wrapper is compiled to CIL, and when the C# code wants to use a C++ function from the library it just marshals and passes the arguments to the C++ function through the wrapper, so there may be some delay, but not as much as if I had written the whole library in C#. Please correct me if I am mistaken.
Of course, there is overhead involved in switching from managed to unmanaged code execution. It is very modest, about 12 CPU cycles. All that needs to be done is writing a "cookie" on the stack so that the garbage collector can recognize that subsequent stack frames belong to unmanaged code and should therefore not be inspected for object references.
These cookies are strung together like a linked list that the GC traverses when it collects, which supports the scenario where C# code calls native code which in turn calls back into managed code. That is not as uncommon as it sounds; it happens in any GUI app, for example. The Click event is a good example, triggered when the UI thread pinvokes GetMessage().
That is not the only thing that needs to happen, however; in any practical scenario you also pass arguments to the native function, and they can require a lot more work to be marshaled into a format that the native code understands. Arrays in particular: they need to be pinned if the array elements are blittable, which is still pretty cheap, but it gets expensive when the entire array has to be converted because the element type is not blittable. That is not always easy to recognize, so a profiler remains the proper tool to detect inefficient interop code.
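As a rough illustration (the library and function names below are hypothetical), the first declaration only needs the array pinned, while the second forces the marshaller to convert every element:

using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct Point { public double X, Y; }      // blittable: doubles only

[StructLayout(LayoutKind.Sequential)]
struct Named                               // not blittable: contains a string
{
    public int Id;
    [MarshalAs(UnmanagedType.LPStr)] public string Name;
}

static class NativeMethods
{
    // Blittable element type: the array is pinned and its address passed, cheap.
    [DllImport("example")]
    public static extern void SumPoints(Point[] points, int count);

    // Non-blittable element type: a native copy of every element is built,
    // and with [In, Out] it is copied back afterwards, much more expensive.
    [DllImport("example")]
    public static extern void ProcessNames([In, Out] Named[] items, int count);
}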
Okay, I messed something up. I've written a DLL in C++ which I call from managed code (C# .NET). The library works like a charm and is blazingly fast.
My DLL keeps internal state, i.e. it allocates heaps of memory and uses a myriad of variables that are not cleared between the calls from .NET. Instead they persist, and the C# code relies on that (there is preprocessing and building of data structures); this is actually required for performance.
So what is the problem?
I want to add multi-threading, effectively by letting each .NET thread use its own DLL. If no data were stored between calls, this would be easily achievable with just one DLL.
But in my case, do I have to copy the *.dll as many times as there are threads and write a P/Invoke wrapper for each file separately?! I mean a [DllImport(...)] for each of roughly 40 functions?
No way, there must be something more clever. Help?
Simply put, you need to stop sharing variables between threads.
Your global variables are the problem. Instead, each thread needs its own copy of the state that persists between calls. Typically you would put this state into a structure of some sort. An initial call into the DLL then returns a new instance of that structure, and you pass it back into the DLL every time you call a function that needs access to the persistent state. When you are done, call back into the DLL to deallocate the structure. You don't need to declare the structure in managed code; you can treat it as an opaque pointer and use IntPtr.
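A minimal sketch of that pattern, with made-up native entry points:

using System;
using System.Runtime.InteropServices;

static class NativeLib
{
    // Hypothetical exports: the native side allocates, uses, and frees its own state.
    [DllImport("mylib")] public static extern IntPtr CreateState();
    [DllImport("mylib")] public static extern int DoWork(IntPtr state, int input);
    [DllImport("mylib")] public static extern void DestroyState(IntPtr state);
}

class Worker : IDisposable
{
    // One opaque state handle per Worker (and thus per thread).
    private IntPtr _state = NativeLib.CreateState();

    public int Run(int input)
    {
        return NativeLib.DoWork(_state, input);
    }

    public void Dispose()
    {
        if (_state != IntPtr.Zero)
        {
            NativeLib.DestroyState(_state);
            _state = IntPtr.Zero;
        }
    }
}

Each thread then creates and disposes its own Worker, so nothing inside the DLL has to be shared.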
Of course, perhaps you'd just be better off with a C++/CLI assembly.
I tried to find out how pinned pointers defined with the fixed keyword work. My idea was that GCHandle.Alloc(object, GCHandleType.Pinned) was used internally for that. But when I looked at the IL generated for the following C# code:
unsafe static void f1()
{
    var arr = new MyObject[10];          // MyObject is a struct (a value type, per the IL below)
    fixed (MyObject* aptr = &arr[0])
    {
        Console.WriteLine(*aptr);
    }
}
I couldn't find any traces of GCHandle.
The only hint I saw that the pinned pointer was used in the method was the following IL declaration:
.locals init ([0] valuetype TestPointerPinning.MyObject[] arr,
[1] valuetype TestPointerPinning.MyObject& pinned aptr)
So the pointer was declared as pinned, and that did not require any additional method calls to pin it.
My questions are:
Is there any difference between a pointer declared as pinned this way and one pinned by using the GCHandle class?
Is there any way to declare a pinned pointer in C# without using the fixed keyword? I need this in order to pin a bunch of pointers within a loop, and there is no way to do that with the fixed keyword.
Well, sure there's a difference, you saw it. The CLR supports more than one way to pin an object; only the GCHandleType.Pinned method is directly exposed to user code. But there are others, like "async pinned handles", a feature that keeps I/O buffers pinned while a driver performs an overlapped I/O operation. And the one the fixed keyword uses doesn't involve an explicit handle or method call at all. These extra ways were added to make unpinning the objects again as quick and reliable as possible, which is very important to GC health.
Pins created by the fixed keyword are implemented by the jitter, which performs two important jobs when it translates MSIL to machine code. The highly visible one is the machine code itself; you can easily see it with the debugger. But it also generates a data structure used by the garbage collector, completely invisible in the debugger, which the GC needs to reliably find the object references stored in stack frames or CPU registers. More about that data structure in this answer.
The jitter uses the [pinned] attribute on the variable declaration in the metadata to set a bit in that data structure, indicating that the object referenced by the variable is temporarily pinned. The GC sees this and knows not to move the object. This is very efficient because it requires neither an explicit method call to allocate a handle nor any extra storage.
But no, these tricks are not otherwise available to C# code; you really do need to use the fixed keyword in your code, or GCHandle.Alloc(). If you find yourself getting lost in the pins, the odds are high that you ought to be considering pinvoke or C++/CLI so you can easily call native code. The temporary pins that the pinvoke marshaller uses to keep objects stable while the native code is running are another example of automatic pinning that doesn't require explicit code.
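For completeness, here is a sketch of the GCHandle.Alloc() route when a batch of objects has to be pinned in a loop: pin them before the native call, then free the handles promptly afterwards.

using System.Runtime.InteropServices;

static class PinningExample
{
    static void PinMany(byte[][] buffers)
    {
        var handles = new GCHandle[buffers.Length];
        try
        {
            for (int i = 0; i < buffers.Length; i++)
                handles[i] = GCHandle.Alloc(buffers[i], GCHandleType.Pinned);

            // handles[i].AddrOfPinnedObject() now yields a stable address for each
            // buffer that can be handed to native code for the duration of the call.
        }
        finally
        {
            // Unpin promptly; long-lived pins are bad for GC health.
            for (int i = 0; i < handles.Length; i++)
                if (handles[i].IsAllocated)
                    handles[i].Free();
        }
    }
}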
I'm trying to write a pinvoke for the ITaskTrigger::GetTriggerString method (defined at http://msdn.microsoft.com/en-us/library/windows/desktop/aa381866(v=vs.85).aspx). If you look at the page, it says that the caller of the method is responsible for freeing the memory (via CoTaskMemFree) of the LPWSTR referenced by the first argument. While I could do that manually in .NET or write a custom marshaler using ICustomMarshaler, I was wondering whether using the MarshalAs(UnmanagedType.LPWStr) attribute for that particular argument will free the memory appropriately.
Can anyone provide some insight?
First things first: you're talking about COM Interop here (ITaskTrigger is a COM interface), not P/Invoke. There are different interop rules for the two, so it's important to keep them straight. For example, you'll need to define C# interop wrappers for the entire interface, not just the method you want. These should get you started: pinvoke.net
The short answer is: you're in luck, because the CLR should take care of things properly for you.
The longer answer involves the different types of marshalling the COM interop code does, depending on the parameter types, directions, and what attributes you add to your interop signatures.
In this case, the parameter you will get on the call is an "out string" parameter with a MarshalAs(UnmanagedType.LPWStr) attribute. When a COM server exposes a call that has an "out" parameter of LPWSTR string type, assuming the server keeps up its end of the deal, it will allocate a memory buffer with CoTaskMemAlloc() and return it to you. (If it were a different string type, like a BSTR, the specific allocation call would be different, but the basic concept is the same.) At that point, you are responsible for cleaning up that memory when you no longer need it, using the matching CoTaskMemFree() call.
This is a special type of operation called a "reference change": the parameter you are sending in is already a reference parameter, but the COM server is going to replace it with a different reference. A good explanation for this process is found in the "Memory Ownership" section of this MSDN magazine article. As you can see from that article, when the CLR receives data back from an "out" parameter on a reference type, it recognizes that it is taking responsibility for freeing that memory. While marshaling that call back to managed code, it uses the MarshalAs attribute to determine that this is a LPWSTR string-type pointer in COM, and that it should therefore have been allocated using CoTaskMemAlloc(). After creating a managed string out of the data, it will call CoTaskMemFree() on the original buffer on your behalf. The data you get back will be fully managed and you won't have to deal with any ownership problems.
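A rough sketch of the relevant declaration (the GUID below is a placeholder, not the real IID, which you should take from mstask.h, and a real wrapper must declare all of ITaskTrigger's methods in vtable order):

using System.Runtime.InteropServices;

// Sketch only: placeholder GUID, and every ITaskTrigger method must be
// declared in vtable order before GetTriggerString in a real wrapper.
[ComImport]
[Guid("00000000-0000-0000-0000-000000000000")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
interface ITaskTriggerSketch
{
    // Native: HRESULT GetTriggerString([out] LPWSTR *ppwszTrigger);
    // The CLR converts the returned buffer to a System.String and then
    // calls CoTaskMemFree() on the original LPWSTR for you.
    void GetTriggerString([MarshalAs(UnmanagedType.LPWStr)] out string trigger);
}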
There's a WinForms application written in C# using .NET Framework 3.5. This application uses a C++ DLL which is imported using the following declaration:
[DllImport(DllName)]
public static unsafe extern int LoadDBData(String dsn, String userid, String password);
This method imports data from a given ODBC DSN backed by a SQL Server database. The call crashes when there is too much data in the database. The provider of this external DLL said this happens because the DLL is unable to obtain more heap space and that my application should provide more heap memory.
How can I solve this problem? As far as I know, the only way to exclude a component from automatic garbage collection is the unsafe keyword, which I have already used.
Any idea would be appreciated.
Thanks in advance
Martin
This seems like a problem with the vendor's library, rather than your code.
Managed and unmanaged memory should be considered to be completely separate. Managed memory is typically memory allocated on a garbage-collected heap, while unmanaged memory is anything else: the ANSI C memory pool allocated through malloc(3), custom memory pools, and garbage-collected heaps outside the control of the CLI implementation...
Note that the above quote is from the Mono documentation, but I believe the same holds for .NET in general. If the data is being loaded into the DLL's internal data structures, then the DLL should allocate its own memory for it. If you're providing a buffer that gets filled with the data, then it will only be filled with as much data as you allocated for the buffer (and pinned before marshaling). So where is the data being loaded?
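For the caller-supplied-buffer case, the pattern looks roughly like this (the export name is made up):

using System.Runtime.InteropServices;

static class NativeMethods
{
    // Hypothetical export that fills a buffer supplied by the caller.
    // The managed array is pinned for the duration of the call, and the native
    // code can only write as many bytes as the caller allocated.
    [DllImport("vendor.dll")]
    public static extern int FillBuffer(byte[] buffer, int bufferSize);
}

// usage:
// var buffer = new byte[64 * 1024];
// int written = NativeMethods.FillBuffer(buffer, buffer.Length);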
You can't increase the heap size in .NET.
You could create an EXE in C/C++ that your .NET app calls using Process.Start.
Your C/C++ EXE would just call the DLL function and return the result (or, if you have more than one function, it could take a command-line parameter to select it). If you don't want a separate EXE you could try using RunDll32 instead.
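A sketch of that approach (the helper EXE's name and its exit-code convention are made up):

using System.Diagnostics;

static class DbLoader
{
    // Runs the native helper out-of-process so its memory use does not share
    // the .NET process's address space.
    public static int LoadViaHelper(string dsn, string user, string password)
    {
        var psi = new ProcessStartInfo
        {
            FileName = "LoadDbHelper.exe",   // hypothetical native helper wrapping LoadDBData
            Arguments = string.Format("\"{0}\" \"{1}\" \"{2}\"", dsn, user, password),
            UseShellExecute = false
        };

        using (var process = Process.Start(psi))
        {
            process.WaitForExit();
            return process.ExitCode;         // helper forwards LoadDBData's return value
        }
    }
}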
I doubt this is specific to .NET, managed memory, garbage collection, etc. It's a native DLL, so it uses regular, unmanaged memory. Of course, the .NET runtime will also use its share of memory, but a native application using the DLL would do the same.
If you're running in a 32-bit process, the total memory available to .NET and unmanaged code combined can be limited to roughly 1.5 GB in practice. It's difficult to tell without additional information, but you might have hit that limit.
So one option would be to ask your vendor whether they have a 64-bit version of the library, and switch to a 64-bit process. In a 64-bit process, address space is practically unlimited (by today's standards).