Using SizeConst and SizeParamIndex in Custom Marshaler? - c#

I'm currently trying to implement a custom marshaler for UTF-8 C strings.
The problem is that the strings I'm dealing with are not necessarily null-terminated, so I need to rely on their constant size or a size parameter.
When marshaling them as LPStr I can use the SizeParamIndex and SizeConst parameters of the MarshalAs attribute, but I don't seem to have access to those inside my implementation of ICustomMarshaler.
I'd like to avoid using byte[] and manual UTF-8 conversion on each function, but it seems like that's the only way?
Or am I missing some way to get access to the SizeParamIndex/SizeConst information?
And even if I could somehow pass this data to the marshaler, how would I get the actual size value for a SizeParamIndex?
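For reference, the byte[] fallback mentioned in the question could be kept in one helper rather than repeated per function. This is only a sketch: the Utf8Fixed class and its Decode method are hypothetical names, and it assumes the native side either fills the whole buffer or pads it with zero bytes.

```csharp
using System;
using System.Text;

public static class Utf8Fixed
{
    // Decodes at most 'length' bytes of UTF-8 and stops early at the
    // first zero byte, if there is one (the text may be unterminated).
    public static string Decode(byte[] buffer, int length)
    {
        if (buffer == null) throw new ArgumentNullException(nameof(buffer));

        // Clamp the caller-supplied size to the real buffer size.
        int max = Math.Min(Math.Max(length, 0), buffer.Length);

        // Look for an optional terminator inside the valid range.
        int end = Array.IndexOf(buffer, (byte)0, 0, max);
        if (end < 0) end = max;

        return Encoding.UTF8.GetString(buffer, 0, end);
    }
}
```

A P/Invoke wrapper would then declare the parameter as byte[] (with SizeConst or SizeParamIndex where the marshaler needs it) and call Utf8Fixed.Decode on the result.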

Related

Do I need to call Marshal.DestroyStructure when marshalling structures containing ByValTStr strings

I am doing some manual marshaling for interop from C#/.NET to unmanaged DLLs.
Consider the following struct:
[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
private struct LockInfo
{
    ushort lockVersion;
    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 32)]
    public string lockName;
}
I marshal this to unmanaged memory:
var lockInfo = new LockInfo();
var lockInfoPtr = Marshal.AllocHGlobal(Marshal.SizeOf(lockInfo));
Marshal.StructureToPtr(lockInfo, lockInfoPtr, false);
Once I'm done with it, do I need to call Marshal.DestroyStructure on lockInfoPtr?
I am aware of the need to call Marshal.FreeHGlobal, but prior to that, is Marshal.DestroyStructure actually required in this case?
I have found it very difficult to understand Microsoft's documentation around this. Google searching hasn't helped, possibly because I just don't quite understand marshalling properly yet (but I am learning).
Similar questions...
I have reviewed the similar question "Marshal.DestroyStructure vs Marshal.FreeHGlobal in .Net" but this question does not address the issue of the content a struct should contain that would require the use of DestroyStructure. My limited understanding is that DestroyStructure does not always need to be called, only when the structure contains certain kinds of fields. In my case I am unsure if a string, being marshalled as ByValTStr, requires the use of DestroyStructure.
Marshaling is complex stuff and a full answer could fill a whole chapter in a book, which of course is not appropriate here. So, in a very succinct nutshell:
Normally, when calling native functions from managed code (COM interop in particular), .NET marshals strings and arrays to natively allocated copies such as BSTR strings and SafeArray arrays.
To accomplish this, the marshaler calls SysAllocString and SafeArrayCreate respectively.
At some point, when these native-side strings and arrays are no longer needed, the marshaler will call SysFreeString and SafeArrayDestroy respectively to free memory.
If you take over .NET's automatic marshaling, and call methods like Marshal.StructureToPtr to manually marshal a structure, you become responsible for freeing/destroying those native-side BSTRs and SafeArrays. That's exactly what Marshal.DestroyStructure is for.
However...
By prepending the [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 32)] attribute to your string field, you instructed the marshaler to not marshal the string to a BSTR string, but rather to allocate a fixed-length character array within the native-side structure itself.
That being the case, there is no need to call Marshal.DestroyStructure because there is no BSTR string to free. Of course you will still need to call Marshal.FreeHGlobal, I see you are aware of that.
Credit to @SimonMourier for his comment that made it all click.
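A minimal sketch of the pattern this answer describes, assuming the struct from the question (made public here so it can be used from a helper). The LockInfoDemo class and RoundTrip method are hypothetical names; the read-back via PtrToStructure stands in for whatever the native call would do with the pointer.

```csharp
using System;
using System.Runtime.InteropServices;

public static class LockInfoDemo
{
    [StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
    public struct LockInfo
    {
        public ushort lockVersion;
        // Stored inline as a fixed array of 32 characters, so the
        // marshaler allocates nothing outside the struct itself.
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 32)]
        public string lockName;
    }

    public static LockInfo RoundTrip(LockInfo input)
    {
        IntPtr ptr = Marshal.AllocHGlobal(Marshal.SizeOf<LockInfo>());
        try
        {
            // fDeleteOld is false: the block is fresh, nothing to destroy.
            Marshal.StructureToPtr(input, ptr, false);

            // A native call taking 'ptr' would go here.

            return Marshal.PtrToStructure<LockInfo>(ptr);
        }
        finally
        {
            // No Marshal.DestroyStructure: the struct holds no pointers
            // to separately allocated native memory.
            Marshal.FreeHGlobal(ptr);
        }
    }
}
```

If the struct later gains a field that is marshaled as a pointer to its own allocation (for example a BSTR), DestroyStructure would need to be added before FreeHGlobal.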

For C#, is there a down-side to using 'string' instead of 'StringBuilder' when calling Win32 functions such as GetWindowText?

Consider these two definitions for GetWindowText. One uses a string for the buffer, the other uses a StringBuilder instead:
[DllImport("user32.dll", CharSet = CharSet.Auto, SetLastError = true)]
public static extern int GetWindowText(IntPtr hWnd, StringBuilder lpString, int nMaxCount);
[DllImport("user32.dll", CharSet = CharSet.Auto, SetLastError = true)]
public static extern int GetWindowText(IntPtr hWnd, string lpString, int nMaxCount);
Here's how you call them:
var windowTextLength = GetWindowTextLength(hWnd);
// You can use either of these as they both work
var buffer = new string('\0', windowTextLength);
//var buffer = new StringBuilder(windowTextLength);
// Add 1 to windowTextLength for the trailing null character
var readSize = GetWindowText(hWnd, buffer, windowTextLength + 1);
Console.WriteLine($"The title is '{buffer}'");
They both seem to work correctly whether I pass in a string, or a StringBuilder. However, all the examples I've seen use the StringBuilder variant. Even PInvoke.net lists that one.
My guess is the thinking goes 'In C# strings are immutable, therefore use StringBuilder', but since we're poking down to the Win32 API and messing with the memory locations directly, and that memory buffer is for all intents and purposes (pre)allocated (i.e. reserved for, and being currently used by the string) by the nature of it being assigned a value at its definition, that restriction doesn't actually apply, hence string works just fine. But I'm wondering if that assumption is wrong.
I don't think so because if you test this by increasing the buffer by say 10, and change the character you're initializing it with to say 'A', then pass in that larger buffer size to GetWindowText, the string you get back is the actual title, right-padded with the ten extra 'A's that weren't overwritten, showing it did update that memory location of the earlier characters.
So provided you pre-initialize the strings, can't you do this? Could those strings ever 'move out from under you' while using them because the CLR is assuming they're immutable? That's what I'm trying to figure out.
First off, "pre-allocated" is a misleading word in the current context. The string is no different from any other .NET immutable string, and is as immutable as Hugh Jackman in real life. I believe the OP knows this already.
In fact:
// You can use either of these as they both work
var buffer = new string('\0', windowTextLength);
is exactly same as:
// assuming windowTextLength is 5
var buffer = "\0\0\0\0\0";
Why shouldn't we use String/string and instead use StringBuilder for passing callee modifiable arguments to Interop/Unmanaged code? Are there specific scenarios where it will fail?
Honestly, I found this an interesting question and tested a few scenarios by writing a custom native DLL that takes a string and a StringBuilder, while forcing garbage collection, including from a different thread. My intention was to force-relocate the object while its address was passed to an external library through P/Invoke. In all cases, the object's address remained the same even when other objects were relocated. On research I found this by Jeffrey himself: The Managed Heap and Garbage Collection in the CLR
When you use the CLR’s P/Invoke mechanism to call a method, the CLR
pins the arguments for you automatically and unpins them when the
native method returns.
So the takeaway is that we can use it because it seems to work. But should we? I believe not:
Because the docs clearly cover this case under Fixed-Length String Buffers. So string works for now, but may not work in future releases.
Because StringBuilder is a library-provided mutable type, and logically it makes more sense to let a mutable type be modified than an immutable one (string).
There's a subtle advantage. When using StringBuilder, we're pre-allocating capacity, not a string. This spares us the additional steps to trim/sanitize the string, and we don't have to bother about the terminating null character.
If you pass a string to a function using P/Invoke, the CLR will assume the function will read the string. For efficiency, the string is pinned in memory and a pointer to the first character is passed to the function. No character data needs to be copied this way.
Of course, the function can do whatever it wants to the data in the string, including modifying it.
This means the function will overwrite the first few characters without issue, but buffer.Length will remain unchanged and you'll end up with the existing data at the end of the string still present in the string. .NET strings store their length in a field. They are also null-terminated, but the null terminator is only used as a convenience for interoperability with C code and has no effect in managed code.
Such a string would be inconvenient to use: unless you sized it to match exactly where the null terminator ends up once written, .NET's length field will be out of sync with the underlying data.
Also, it's better this way, since changing the length of a string would certainly corrupt the CLR heap (the GC wouldn't be able to walk the objects). Strings and arrays are the only two object types that don't have a fixed size.
On the other hand, if you pass a StringBuilder through P/Invoke, you're explicitly telling the marshaler the function is expected to write to the instance, and when you call ToString() on it, it does update the length based on the null-termination character and everything is perfectly in sync.
Better use the right tool for the job. :)
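As a sketch of the StringBuilder variant both answers recommend: the WindowText class and its Get method are hypothetical names, and CharSet.Unicode is used here instead of the question's CharSet.Auto as an assumption to keep the example unambiguous. It only does something useful on Windows.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Text;

public static class WindowText
{
    [DllImport("user32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern int GetWindowTextLength(IntPtr hWnd);

    [DllImport("user32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern int GetWindowText(IntPtr hWnd, StringBuilder lpString, int nMaxCount);

    public static string Get(IntPtr hWnd)
    {
        // Length in characters, not counting the terminating null.
        int length = GetWindowTextLength(hWnd);

        // Reserve room for the text plus the terminating null.
        var buffer = new StringBuilder(length + 1);
        GetWindowText(hWnd, buffer, buffer.Capacity);

        // ToString() stops at the null terminator written by the API,
        // so no trimming of leftover characters is needed.
        return buffer.ToString();
    }
}
```

Compared with the string version in the question, the returned value never carries leftover padding characters, because its length is taken from what the function actually wrote.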

C# Interop Marshaling behaviour for arrays seems inconsistent with documentation

I am currently writing a thin C# binding for OpenGL. I've just recently implemented the OpenGL GenVertexArrays function, whose signature is given in the OpenGL documentation on glGenVertexArrays.
Essentially, you pass it an array in which to store generated object values for the vertex arrays created by OpenGL.
In order to create the binding, I use delegates as glGenVertexArrays is an OpenGL extension function, so I have to load it dynamically using wglGetProcAddress. The delegate signature I have defined in C# looks like this:
[SuppressUnmanagedCodeSecurity]
[UnmanagedFunctionPointer(CallingConvention.StdCall)]
private delegate void glGenVertexArrays(uint amount, uint[] array);
The function pointer is retrieved and converted to this delegate using Marshal.GetDelegateForFunctionPointer, like this:
IntPtr proc = wglGetProcAddress(name);
del = Marshal.GetDelegateForFunctionPointer(proc, delegateType);
Anyways, here's what bothers me:
All the official documentation I can find on default marshalling behaviour for reference types (which includes arrays) says this:
By default, reference types (classes, arrays, strings, and interfaces)
passed by value are marshaled as In parameters for performance
reasons. You do not see changes to these types unless you apply
InAttribute and OutAttribute (or just OutAttribute) to the method
parameter.
This is taken from this MSDN page: MSDN page on directional attributes
However, as can be seen from my delegate signature, the [In] and [Out] directional attributes have not been used on the array of unsigned integers, meaning that when I call this function I should not actually be able to see the generated object values which OpenGL should have stored in it. Except, I am. Using this signature, I get the following result when running the debugger:
As can be seen, the call absolutely did affect the array, even though I did not explicitly use the [Out] attribute. This is not, from what I understand, a result I should expect.
Does anyone know the reason behind this? I know it might seem as a minor deal, but I am very curious to know why this seems to break the default marshalling behaviour described by Microsoft. Is there some behind-the-scenes stuff going on when invoking delegates compared to pure platform invoke prototypes? Or am I misinterpreting the documentation?
[EDIT]
For anyone curious, the public method that invokes the delegate is defined on a static "GL" class, and is as follows:
public static void GenVertexArrays(uint amount, uint[] array)
{
InvokeExtensionFunction<glGenVertexArrays>()(amount, array);
}
It is not mentioned on the documentation page you linked, but there is another topic dedicated to the marshaling of arrays, where it says:
With pinning optimization, a blittable array can appear to operate as an In/Out parameter when interacting with objects in the same apartment.
Both conditions are met in your case: array of uint is blittable, and there is no machine-to-machine marshaling. It is still a good idea to declare it [Out], so your intention is documented within the code.
The documentation is correct in the general case. But uint is a bit special: it is a blittable type. That is an expensive word meaning that the pinvoke marshaller does not have to do anything special to convert the array element values. A uint in C# is exactly the same type as an unsigned int in C. Not a coincidence at all, it is the kind of type that a processor can handle natively.
So the marshaller can simply pin the array and pass a pointer to the first array element as the second argument. Very fast, always what you want. And the function scribbles directly into the managed array, so copying the values back is not necessary. A bit dangerous too, you never ever want to lie about the amount argument, GC heap corruption is an excessively ugly bug to diagnose.
Most simple value types and structs of simple values types are blittable. bool is a notable exception. You'll otherwise never have to be sorry for using [Out] even if it is not necessary. The marshaller simply ignores it here.
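To make the pinning behaviour concrete, here is a rough sketch that simulates it in managed code. FakeNativeFill is a hypothetical stand-in for native code such as glGenVertexArrays, and the manual GCHandle pin only imitates what the marshaller is described as doing for a blittable array. The delegate at the top shows the question's signature with [Out] added, as both answers suggest.

```csharp
using System;
using System.Runtime.InteropServices;

public static class PinningDemo
{
    // The question's delegate, with the intent documented via [Out].
    [UnmanagedFunctionPointer(CallingConvention.StdCall)]
    private delegate void glGenVertexArrays(uint amount, [Out] uint[] array);

    // Hypothetical stand-in for native code: writes n values through
    // a raw pointer, exactly as a C function would.
    private static void FakeNativeFill(int n, IntPtr arrays)
    {
        for (int i = 0; i < n; i++)
            Marshal.WriteInt32(arrays, i * sizeof(uint), i + 1);
    }

    public static uint[] Generate(int n)
    {
        var ids = new uint[n];

        // Pin the array so the GC cannot move it, then hand out the
        // address of its first element. No copy is made.
        GCHandle handle = GCHandle.Alloc(ids, GCHandleType.Pinned);
        try
        {
            FakeNativeFill(n, handle.AddrOfPinnedObject());
        }
        finally
        {
            handle.Free();
        }

        // The writes landed in the managed array directly.
        return ids;
    }
}
```

Because the writes go straight into managed memory, passing a count larger than the array length would overwrite neighbouring heap objects, which is the corruption risk mentioned above.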

What is the difference between [In, Out] and ref when using pinvoke in C#?

Is there a difference between using [In, Out] and just using ref when passing parameters from C# to C++?
I've found a couple different SO posts, and some stuff from MSDN as well that comes close to my question but doesn't quite answer it. My guess is that I can safely use ref just like I would use [In, Out], and that the marshaller won't act any differently. My concern is that it will act differently, and that C++ won't be happy with my C# struct being passed. I've seen both things done in the code base I'm working in...
Here are the posts I've found and have been reading through:
Are P/Invoke [In, Out] attributes optional for marshaling arrays?
Makes me think I should use [In, Out].
MSDN: InAttribute
MSDN: OutAttribute
MSDN: Directional Attributes
These three posts make me think that I should use [In, Out], but that I can use ref instead and it will have the same machine code. That makes me think I'm wrong -- hence asking here.
The usage of ref or out is not arbitrary. If the native code requires pass-by-reference (a pointer) then you must use those keywords if the parameter type is a value type. So that the jitter knows to generate a pointer to the value. And you must omit them if the parameter type is a reference type (class), objects are already pointers under the hood.
The [In] and [Out] attributes are then necessary to resolve the ambiguity about pointers, they don't specify the data flow. [In] is always implied by the pinvoke marshaller so doesn't have to be stated explicitly. But you must use [Out] if you expect to see any changes made by the native code to a struct or class member back in your code. The pinvoke marshaller avoids copying back automatically to avoid the expense.
A further quirk then is that [Out] is not often necessary. Happens when the value is blittable, an expensive word that means that the managed value or object layout is identical to the native layout. The pinvoke marshaller can then take a shortcut, pinning the object and passing a pointer to managed object storage. You'll inevitably see changes then since the native code is directly modifying the managed object.
Something you in general strongly want to pursue, it is very efficient. You help by giving the type the [StructLayout(LayoutKind.Sequential)] attribute, which suppresses an optimization that the CLR uses to rearrange the fields to get the smallest object. And by using only fields of simple value types or fixed size buffers, albeit that you don't often have that choice. Never use a bool, use byte instead. There is no easy way to find out if a type is blittable, other than it not working correctly or by using the debugger and comparing pointer values.
Just be explicit and always use [Out] when you need it. It doesn't cost anything if it turned out not to be necessary. And it is self-documenting. And you can feel good that it still will work if the architecture of the native code changes.
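The two declaration styles discussed above might look as follows. This is a sketch only: "native_lib" and update_point are hypothetical names for a native function taking a pointer to a two-int struct, and nothing here is called, so the library does not need to exist for the code to compile.

```csharp
using System;
using System.Runtime.InteropServices;

// Value type: needs 'ref' so that a pointer is passed.
[StructLayout(LayoutKind.Sequential)]
public struct PointStruct
{
    public int X;
    public int Y;
}

// Reference type: already passed as a pointer, so no 'ref'.
[StructLayout(LayoutKind.Sequential)]
public class PointClass
{
    public int X;
    public int Y;
}

public static class NativeMethods
{
    // Assumed native prototype: void update_point(Point* p);
    [DllImport("native_lib", EntryPoint = "update_point")]
    public static extern void UpdateStruct(ref PointStruct p);

    // Same native function, bound to the class version. [In, Out]
    // states explicitly that changes should be visible afterwards.
    [DllImport("native_lib", EntryPoint = "update_point")]
    public static extern void UpdateClass([In, Out] PointClass p);
}
```

Both declarations hand the native side a single pointer; the difference is in how the managed side expresses it, which matches the answer's point that ref is dictated by the parameter type while [Out] documents the expected data flow.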

.NET COM Interop Method Signature

What interop signature would you use for the following COM method? I am interested particularly in the final two parameters, and whether to try to use MarshalAs with a SizeParamIndex or not.
HRESULT GetOutputSetting(
DWORD dwOutputNum,
LPCWSTR pszName,
WMT_ATTR_DATATYPE* pType,
BYTE* pValue,
WORD* pcbLength
);
Documentation states:
pValue [out] Pointer to a byte buffer containing the value. Pass NULL
to retrieve the length of the buffer
required.
pcbLength [in, out] On input, pointer to a variable containing the
length of pValue. On output, the
variable contains the number of bytes
in pValue used.
You could try the PInvoke Signature Toolkit. It's rather useful for getting marshaling right when performing platform interops. It quite possibly won't cover your particular problem, but you may find a comparable one that gives you the information you seek.
I would use the SizeParamIndex, because your scenario is exactly the one this feature is for: To specify the length of a variable sized array.
So the last two parameters in the C# signature would be:
byte[] pValue,
ref ushort pcbLength
The byte-Array is passed without ref, because the array corresponds to a pointer in native code.
You pass NULL (null in C#) for pValue in order to retrieve the size of the buffer needed. That also means that the caller has to allocate the byte array.
The parameter pcbLength is passed by ref, because it is used as an in/out-parameter.
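The two-call pattern described above (first null to get the size, then a real buffer) could be wrapped like this. Everything here is a sketch under assumptions: IOutputSettings, AttrDataType, OutputSettingReader and FakeSettings are hypothetical names, the enum values are placeholders, and the interface is a plain managed one without the COM attributes a real import would need. SizeParamIndex is left out to keep the sketch simple.

```csharp
using System;
using System.Runtime.InteropServices;

// Placeholder for WMT_ATTR_DATATYPE; the members are assumptions.
public enum AttrDataType { Unknown = 0, Binary = 1 }

public interface IOutputSettings
{
    void GetOutputSetting(
        uint dwOutputNum,
        [MarshalAs(UnmanagedType.LPWStr)] string pszName,
        out AttrDataType pType,
        [Out, MarshalAs(UnmanagedType.LPArray)] byte[] pValue,
        ref ushort pcbLength);
}

public static class OutputSettingReader
{
    public static byte[] Read(IOutputSettings settings, uint output,
                              string name, out AttrDataType type)
    {
        // First call: null buffer, the callee reports the size needed.
        ushort length = 0;
        settings.GetOutputSetting(output, name, out type, null, ref length);

        // Second call: the caller-allocated buffer is filled in.
        var value = new byte[length];
        settings.GetOutputSetting(output, name, out type, value, ref length);

        // On output, length holds the number of bytes actually used.
        if (length < value.Length) Array.Resize(ref value, length);
        return value;
    }
}

// Managed stand-in for the COM object, only to exercise the helper.
public sealed class FakeSettings : IOutputSettings
{
    private readonly byte[] _data = { 1, 2, 3, 4 };

    public void GetOutputSetting(uint dwOutputNum, string pszName,
        out AttrDataType pType, byte[] pValue, ref ushort pcbLength)
    {
        pType = AttrDataType.Binary;
        if (pValue != null)
            Array.Copy(_data, pValue, Math.Min(_data.Length, pValue.Length));
        pcbLength = (ushort)_data.Length;
    }
}
```

Calling OutputSettingReader.Read(new FakeSettings(), 0, "name", out var type) exercises both calls without any native code involved.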
