I'm in the process of rewriting an overengineered and unmaintainable chunk of my company's library code that interfaces between C# and C++. I've started looking into P/Invoke, but it seems like there's not much in the way of accessible help.
We're passing a struct that contains various parameters and settings down to unmanaged codes, so we're defining identical structs. We don't need to change any of those parameters on the C++ side, but we do need to access them after the P/Invoked function has returned.
My questions are:
What is the best way to pass strings? Some are short (device id's which can be set by us), and some are file paths (which may contain Asian characters)
Should I pass an IntPtr to the C# struct or should I just let the Marshaller take care of it by putting the struct type in the function signature?
Should I be worried about any non-pointer datatypes like bools or enums (in other, related structs)? We have the treat warnings as errors flag set in C++ so we can't use the Microsoft extension for enums to force a datatype.
Is P/Invoke actually the way to go? There was some Microsoft documentation about Implicit P/Invoke that said it was more type-safe and performant.
For reference, here is one of the pairs of structs I've written so far:
C++
/**
Struct used for marshalling Scan parameters from managed to unmanaged code.
*/
struct ScanParameters
{
LPSTR deviceID;
LPSTR spdClock;
LPSTR spdStartTrigger;
double spinRpm;
double startRadius;
double endRadius;
double trackSpacing;
UINT64 numTracks;
UINT32 nominalSampleCount;
double gainLimit;
double sampleRate;
double scanHeight;
LPWSTR qmoPath; //includes filename
LPWSTR qzpPath; //includes filename
};
C#
/// <summary>
/// Struct used for marshalling scan parameters between managed and unmanaged code.
/// </summary>
[StructLayout(LayoutKind.Sequential)]
public struct ScanParameters
{
[MarshalAs(UnmanagedType.LPStr)]
public string deviceID;
[MarshalAs(UnmanagedType.LPStr)]
public string spdClock;
[MarshalAs(UnmanagedType.LPStr)]
public string spdStartTrigger;
public Double spinRpm;
public Double startRadius;
public Double endRadius;
public Double trackSpacing;
public UInt64 numTracks;
public UInt32 nominalSampleCount;
public Double gainLimit;
public Double sampleRate;
public Double scanHeight;
[MarshalAs(UnmanagedType.LPWStr)]
public string qmoPath;
[MarshalAs(UnmanagedType.LPWStr)]
public string qzpPath;
}
A blittable type is a type that has a common representation between managed and unmanaged code and can therefore be passed between them with little or no problem, e.g. byte, int32, etc.
A non-blittable type does not have the common representation, e.g. System.Array, System.String, System.Boolean, etc.
By specifying the MarshalAs attribute for a non-blittable type you can tell the marshaller what it should be converted to.
See this article on Blittable and Non-Blittable Types for more information
1 - What is the best way to pass strings? Some are short (device id's which can be set by us), and some are file paths (which may contain Asian characters)
StringBuilder is generally recommended as the easiest to use but I often use plain byte arrays.
2 - Should I pass an IntPtr to the C# struct or should I just let the Marshaller take care of it by putting the struct type in the function signature?
If the method is expecting a pointer then pass an IntPtr although you can get probably away with a ref in many cases depending on what it's going to be used for. If it's something that needs to stick around in the same place for a long time then I would manually allocate the memory with Marshal and pass the resulting IntPtr.
3 - Should I be worried about any non-pointer datatypes like bools or enums (in other, related structs)? We have the treat warnings as errors flag set in C++ so we can't use the Microsoft extension for enums to force a datatype.
Once you've got everything set up with the correct marshalling attributes I don't see why you'd need to worry. If in doubt put in the attribute, if the struct only ever gets used by managed code then the attribute won't be used.
4 - Is P/Invoke actually the way to go? There was some Microsoft documentation about Implicit P/Invoke that said it was more type-safe and performant.
Can't comment on this, you're into Visual C++ territory there.
Related
In a C API I have BOOL defined as follows
#ifndef BOOL
#define BOOL int
And I have a struct which, among others, has a simple BOOL member and an array of BOOLs
struct SomeStruct
{
BOOL bIsSomething;
BOOL bHasSomething[5];
}
Now I found out that when I want to cast the whole struct I have to marshal them differently:
the single BOOL I marshal with I1 and the fixed length array I have to marshal with I4 (if I don't their struct sizes won't match and I will have problems extracting an array of these structs into C#):
[StructLayout(LayoutKind.Sequential)]
public struct SomenNativeStruct
{
[MarshalAs(UnmanagedType.I1)]
public bool bIsSomething;
[MarshalAs(UnmanagedType.ByValArray, ArraySubType = UnmanagedType.I4, SizeConst = 5)]
public bool[] bHasSomething;
}
I suspect I do something wrong because I'm not sure why I should need to marshal the same type differently depending on whether I get it as a fixed size array or as a single member.
If I'm marshalling them all as I4 I get a System.ArgumentException
An unhandled exception of type 'System.ArgumentException' occurred in SomeDll.dll
Additional information: Type 'Namespace.Document+SomeNativeStruct' cannot be marshaled as an unmanaged structure; no meaningful size or offset can be computed.
bool is a tricky type to interop. There's many mutually incompatible definitions of what a boolean value is, so bool is considered a non-blittable type - that is, it needs to be truly marshalled, rather than just sticking a "totally a bool" tag to the data. And arrays of non-blittable types are doubly-tricky.
The simplest solution would be to avoid using bool entirely. Just replace the bool[] with int[], and provided the original type is actually a 32-bit int (depends on the compiler and platform), you'll get correct interop. You can then manually copy the interop struct to a managed struct with a more sane layout, if you so choose - which also gives you full control over interpreting which int values correspond to true and false, respectively.
In general, native interop is always tricky; you need to have a good understanding of the actual memory layout as well as the meaning of the values and types you're dealing with. The types aren't enough - they're too ambiguous, especially in standard C (which is often the standard for native interop even today). Headers aren't enough - you also need the docs, and perhaps even a look in a (native) debugger.
Extra danger comes from the fact that there's no safety net that tells you you're doing things somewhat wrong - the wrong interop approach can appear to work just fine for years, and then suddenly blow up in your face when e.g. a true value happens to be 42 instead of the more usual -1, and your bitwise arithmetics breaks subtly (this can actually happen in C#, if you use unsafe code). Everything might work great for values smaller than 32768, and then break horribly. There's plenty of hard to catch error cases, so you need extra caution.
If I have a native code that expects a structure with two fields:
[DllImport(LibraryName, CallingConvention = CallingConvention.Cdecl)]
public static extern int my_method(ref MyStruct myStruct);
// this is what native code expects
public struct MyStruct {
IntPtr First;
IntPtr Second;
}
but instead I pass another struct to it, will it work or not - by design or by accident?
[DllImport(LibraryName, CallingConvention = CallingConvention.Cdecl)]
public static extern int my_method(ref MyLongerStruct myLongerStruct);
// this is what I want to pass to it
public struct MyLongerStruct {
IntPtr First;
IntPtr Second;
object ObjectPointer;
}
Will the object reference added to the end of the struct at C# side somehow affect P/Invoke call?
I shouldn't work. And even more, you need to add and properly set StructLayoutAttribute to the structure, as it explained here
I think, the result should be like this:
[StructLayout(LayoutKind.Sequential)]
public struct MyStruct {
IntPtr First;
IntPtr Second;
}
If the total difference in structure is fields added to the end, and you use StructLayout to prevent the compiler from optimizing the memory layout of your struct (as Alex Butenko suggests), then it's unlikely that there will be any negative side effects apart from a slight speed hit.
When you pass a managed struct to an external function via P/Invoke (using the DllImport attribute) there is a marshaling phase that converts your struct to a compatible format for the target. For ref and out parameters the temporary is converted back when the invoked function returns, copying the values back to your struct instance. All of this is abstracted away, although exactly how the marshaling is performed for each member can be tweaked with the right attributes.
This is how the .NET framework handles strings in P/Invoke. Since it can't just send a string instance pointer to an API function that is expecting a char * (the two are nothing alike) there has to be some translation.
The fun part is that the marshaling code doesn't know anything about what the target is expecting other than what you tell it at the C# end, so if you are sending an extended version of the structure it will do the whole thing. At the other end the native code will get a pointer to a memory block containing the information it's expecting, and it won't have any way to tell that there is more after the end of the structure.
Apart from that, no problem... as long as you're passing by reference and not by value. Passing structs by value is something that should raise big red stop signs all over your brain. Don't do it, it's evil.
I need to do complex marshaling of several nested structures, containing variable length arrays to other structures, hence I decided to use ICustomMarshaler (see for a good JaredPar's tutorial here). But then I have a problem with a struct defined in C++ as:
typedef struct AStruct{
int32_t a;
AType* b;
int32_t bLength;
bool aBoolean;
bool bBoolean;
};
On the C# side, in the MarshalManagedToNative implementation of ICustomMarshaler I was using:
Marshal.WriteByte(intPtr, offset, Convert.ToByte(aBoolean));
offset += 1;
Marshal.WriteByte(intPtr, offset, Convert.ToByte(bBoolean));
But it was not working since I discovered that each bool in the C++ struct was taking 2 bytes. Indeed in x86 sizeof(AStruct) = 16, not 14. Ok, bool is not guaranteed to take 1 byte and so I tried with unsigned char and uint8_t but still the size is 16.
Now, I know I could use an int32 instead than a boolean, but since I care about the taken space and there are several structs containing boolean that flow to disk (I use HDF5 file format and I want to map those boolean with H5T_NATIVE_UINT8 defined in the HDF5 library that takes 1 byte), is there another way? I mean can I have something inside a struct that is guaranteed to take 1 byte?
EDIT
the same problem applies also to int16 values: depending on how many values are present because of alignment reasons the size of the struct at the end might be different from what expected. On the C# side I do not "see" the C++ struct, I simply write on the unmanaged memory by following the definition of my structs in C++. It is quite a simple process, but if I have instead to think to the real space taken by the struct (either by guessing or by measuring it) it will become more difficult and prone to errors every time I modify the struct.
This answer is in addition to what Hans Passant has said.
It might be easiest to have your structures use a fixed packing size, so you can readily predict the member layout. Keep in mind though that this could affect performance.
The rest of this answer is specific to Microsoft Visual C++, but most compilers offer their own variant of this.
To get you started, check out this SO answer #pragma pack effect and MSDN http://msdn.microsoft.com/en-us/library/2e70t5y1.aspx
What you often use is a pragma pack(push, ...) followed by a pragma pack(pop, ...) idiom to only affect packing for the structures defined between the two pragma's:
#pragma pack(push, 4)
struct someStructure
{
char a;
int b;
...
};
#pragma pack(pop)
This will make someStructure have a predictable packing of 4 byte-alignment of each of its members.
EDIT: From the MSDN page on packing
The alignment of a member will be on a boundary that is either a multiple of n
or a multiple of the size of the member, whichever is smaller.
So for pack(4) a char will be aligned on a 1-byte boundary, a short on a 2-byte, and the rest on a 4-byte boundary.
Which value is best depends on your situation. You'll need to explicitly pack all structures you intend to access, and probably all structures that are members of structures you want to access.
sizeof(AStruct) = 16, not 14
That's correct. The struct has two extra bytes at the end that are not used. They ensure that, if you put the struct in an array, that the fields in the struct are still properly aligned. In 32-bit mode, the int32_t and AType* members require 4 bytes and should be aligned to a multiple of 4 to allow the processor to access them quickly. That can only be achieved if the structure size is a multiple of 4. Thus 14 is rounded up to 16.
Do keep in mind that this does not mean that the bool fields take 2 bytes. A C++ compiler uses just 1 byte for them. The extra 2 bytes are pure padding.
If you use Marshal.SizeOf(typeof(AStruct)) in your C# program then you'll discover that the struct you declared takes 20 bytes. This is not good and the problem you are trying to fix. The bool members are the problem, an issue that goes way, way, back to early versions of the C language. Which did not have a bool type. The default marshaling that the CLR uses is compatible with BOOL, the typedef in the winapi. Which is a 32-bit type.
So you have to be explicit about it when you declare the struct in your C# code, you have to tell the marshaller that you want the 1-byte type. Which you do by declaring the struct member as byte. Or by overriding the default marshaling:
[StructLayout(LayoutKind.Sequential)]
private struct AStruct{
public int a;
public IntPtr b;
public int bLength;
[MarshalAs(UnmanagedType.U1)]
public bool aBoolean;
[MarshalAs(UnmanagedType.U1)]
public bool bBoolean;
}
And you'll now see that Marshal.SizeOf() now returns 16. Do be aware that you have to force your program in 32-bit mode, make sure that the EXE project's Platform Target setting is x86.
Here is a simple C code (VS 2013 C++ project, "compiled as C"):
typedef struct {
bool bool_value;
} BoolContainer;
BoolContainer create_bool_container (bool bool_value)
{
return (BoolContainer) { bool_value };
}
Here is my P/Invoke wrapper
public partial class NativeMethods
{
[DllImport(LibraryName, CallingConvention = CallingConvention.Cdecl)]
public static extern BoolContainer create_bool_container([MarshalAs(UnmanagedType.I1)] bool bool_value);
}
Here are the managed versions of the BoolContainer:
First is throwing MarshalDirectiveException: Method's type signature is not PInvoke compatible.:
public struct BoolContainer // Marshal.SizeOf(typeof(BoolContainer)) = 1
{
[MarshalAs(UnmanagedType.I1)] // same with UnmanagedType.U1
public bool bool_value;
}
Second is working:
public struct BoolContainer // Marshal.SizeOf(typeof(BoolContainer)) = 1
{
public byte bool_value;
}
Test code:
BoolContainer boolContainer = NativeMethods.create_bool_container(true);
Is seems like DllImport ignores MarshalAs for any boolean members in the returned structure. Is there a way to keep bool in the managed declaration?
Structs as return types are pretty difficult in interop, they don't fit a CPU register. This is handled in native code by the caller reserving space on its stack frame for the return value and passing a pointer to the callee. Which then copies the return value through the pointer. In general a pretty troublesome feature, different C compilers have different strategies to pass that pointer. The pinvoke marshaller assumes MSVC behavior.
This is a problem with structs that are not blittable, in other words when its layout is no longer an exact match with the unmanaged struct that the native code needs. That requires an extra step to convert the unmanaged struct to the managed struct, the equivalent of Marshal.PtrToStructure(). You see this for example if you add a string field to your struct. That makes it non-blittable since the C string has to be converted to a System.String, you'll get the exact same exception.
A similar kind of problem in your case, bool is also a very difficult type to interop with. Caused by the C language not having the type and it getting added later with highly incompatible choices. It is 1 byte in C++ and the CLR, 2 bytes in COM, 4 bytes in C and the winapi. The pinvoke marshaller picked the winapi version as the default, making it match BOOL, the most common reason to pinvoke. This forced you to add the [MarshalAs] attribute and that also requires extra work, again to be done by the equivalent of Marshal.PtrToStructure().
So what the pinvoke marshaller is missing is the support for this extra step, custom marshaling of the return value after the call. Not sure if this was a design constraint or a resource constraint. Probably both, ownership of non-blittable struct members is completely unspecified in unmanaged languages and guessing at it, especially as a return value, very rarely turns out well. So they may well have made the call that working on the feature just wasn't worth the low odds of success. This is a guess of course.
Using byte instead is a fine workaround.
faacEncConfigurationPtr FAACAPI faacEncGetCurrentConfiguration(
faacEncHandle hEncoder);
I'm trying to come up with a simple wrapper for this C++ library; I've never done more than very simple p/invoke interop before - like one function call with primitive arguments.
So, given the above C++ function, for example, what should I do to deal with the return type, and parameter?
FAACAPI is defined as: #define FAACAPI __stdcall
faacEncConfigurationPtr is defined:
typedef struct faacEncConfiguration
{
int version;
char *name;
char *copyright;
unsigned int mpegVersion;
unsigned long bitRate;
unsigned int inputFormat;
int shortctl;
psymodellist_t *psymodellist;
int channel_map[64];
} faacEncConfiguration, *faacEncConfigurationPtr;
AFAIK this means that the return type of the function is a reference to this struct?
And faacEncHandle is:
typedef struct {
unsigned int numChannels;
unsigned long sampleRate;
...
SR_INFO *srInfo;
double *sampleBuff[MAX_CHANNELS];
...
double *freqBuff[MAX_CHANNELS];
double *overlapBuff[MAX_CHANNELS];
double *msSpectrum[MAX_CHANNELS];
CoderInfo coderInfo[MAX_CHANNELS];
ChannelInfo channelInfo[MAX_CHANNELS];
PsyInfo psyInfo[MAX_CHANNELS];
GlobalPsyInfo gpsyInfo;
faacEncConfiguration config;
psymodel_t *psymodel;
/* quantizer specific config */
AACQuantCfg aacquantCfg;
/* FFT Tables */
FFT_Tables fft_tables;
int bitDiff;
} faacEncStruct, *faacEncHandle;
So within that struct we see a lot of other types... hmm.
Essentially, I'm trying to figure out how to deal with these types in my managed wrapper?
Do I need to create versions of these types/structs, in C#? Something like this:
[StructLayout(LayoutKind.Sequential)]
struct faacEncConfiguration
{
uint useTns;
ulong bitRate;
...
}
If so then can the runtime automatically "map" these objects onto eachother?
And, would I have to create these "mapped" types for all the types in these return types/parameter type hierarchies, all the way down until I get to all primitives?
I know this is a broad topic, any advice on getting up-to-speed quickly on what I need to learn to make this happen would be very much appreciated! Thanks!
Your are on the right track with how you would need to create managed structures that represent unamanged structures for use with P/Invoke.
It is however not the best strategy for interop with unmanaged libraries, because using this API from C# would still feel like using a C API - create and initialise a structure, pass it to a function and get some other structure in return.
It is OK to use P/Invoke for an odd function call that is not otherwise exposed as .NET API, but for a full blown API wrapper I would highly recommend using managed C++(C++/CLI). It is absolutely unparalleled in creating unmanaged interop layers for .NET.
The biggest challenge would be to convert this essentially C interface into an object orientated interface where you call methods off objects, as opposed to calling global functions with structures of all-public members.
As you start writing elaborate structure graphs for P/Invoke you would yourself having to deal with quite a bit of "magic" that governs how managed primitive types convert to unmanaged types. Often, using incorrect types will cause a run-time error.
With managed C++ (IJW - It Just Works) you define managed structures, classes, interfaces in C++, which allows more natural use of the underlying library and a more native interface to your C# application.
This is a great overview of C++/CLI. Also MSDN has extensive documentation for all managed C++ features.
Yes, you would need to declare all these structures in c#. Be careful to declare members with correct sizes. For example, 'long' is 32-bit in C++, but 64-bit in C#. For a pointer or void* in C++, use IntPtr in C#, etc.