Marshaling complex nested structures containing booleans


I need to do complex marshaling of several nested structures that contain variable-length arrays of other structures, so I decided to use ICustomMarshaler (JaredPar has a good tutorial here). But then I have a problem with a struct defined in C++ as:
typedef struct AStruct{
int32_t a;
AType* b;
int32_t bLength;
bool aBoolean;
bool bBoolean;
};
On the C# side, in the MarshalManagedToNative implementation of ICustomMarshaler I was using:
Marshal.WriteByte(intPtr, offset, Convert.ToByte(aBoolean));
offset += 1;
Marshal.WriteByte(intPtr, offset, Convert.ToByte(bBoolean));
But it was not working, and I discovered that each bool in the C++ struct seemed to take 2 bytes: on x86, sizeof(AStruct) = 16, not 14. OK, bool is not guaranteed to take 1 byte, so I tried unsigned char and uint8_t, but the size is still 16.
Now, I know I could use an int32 instead of a boolean, but I care about the space taken, and several structs containing booleans flow to disk (I use the HDF5 file format and want to map those booleans to H5T_NATIVE_UINT8, defined in the HDF5 library, which takes 1 byte). Is there another way? I mean, can I have something inside a struct that is guaranteed to take 1 byte?
EDIT
The same problem also applies to int16 values: depending on how many values are present, alignment can make the final size of the struct differ from what I expected. On the C# side I do not "see" the C++ struct; I simply write to unmanaged memory by following the definition of my structs in C++. It is quite a simple process, but if I instead have to think about the real space taken by the struct (either by guessing or by measuring it), it becomes more difficult and error-prone every time I modify the struct.

This answer is in addition to what Hans Passant has said.
It might be easiest to have your structures use a fixed packing size, so you can readily predict the member layout. Keep in mind though that this could affect performance.
The rest of this answer is specific to Microsoft Visual C++, but most compilers offer their own variant of this.
To get you started, check out this SO answer #pragma pack effect and MSDN http://msdn.microsoft.com/en-us/library/2e70t5y1.aspx
A common idiom is #pragma pack(push, ...) followed by #pragma pack(pop), which only affects packing for the structures defined between the two pragmas:
#pragma pack(push, 4)
struct someStructure
{
char a;
int b;
...
};
#pragma pack(pop)
This gives someStructure a predictable layout: each member is aligned on a boundary of at most 4 bytes.
EDIT: From the MSDN page on packing
The alignment of a member will be on a boundary that is either a multiple of n
or a multiple of the size of the member, whichever is smaller.
So for pack(4) a char will be aligned on a 1-byte boundary, a short on a 2-byte, and the rest on a 4-byte boundary.
Which value is best depends on your situation. You'll need to explicitly pack all structures you intend to access, and probably all structures that are members of structures you want to access.
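As a rough illustration of the MSDN rule quoted above, here is a small C++ sketch that applies pack(4) and pack(1) to the same members. The struct names Packed4 and Packed1 and the choice of fields are hypothetical, not taken from the question. It assumes a compiler that accepts #pragma pack(push, n) (MSVC, GCC and Clang do) and the common sizes of 2 and 8 bytes for short and double.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical example types; they are not part of the question's code.
#pragma pack(push, 4)
struct Packed4 {
    char   a;  // offset 0: aligned to min(4, 1) = 1
    short  s;  // offset 2: aligned to min(4, 2) = 2
    double d;  // offset 4: aligned to min(4, 8) = 4
};
#pragma pack(pop)

#pragma pack(push, 1)
struct Packed1 {
    char   a;  // offset 0
    short  s;  // offset 1: no padding at all
    double d;  // offset 3
};
#pragma pack(pop)

static_assert(sizeof(short) == 2 && sizeof(double) == 8,
              "sketch assumes the common sizes of short and double");
static_assert(offsetof(Packed4, s) == 2, "short sits on a 2-byte boundary");
static_assert(offsetof(Packed4, d) == 4, "double is capped at a 4-byte boundary");
static_assert(sizeof(Packed4) == 12, "1 + 1 (padding) + 2 + 8");
static_assert(offsetof(Packed1, d) == 3, "members are contiguous under pack(1)");
static_assert(sizeof(Packed1) == 11, "1 + 2 + 8");
```

Keeping static_assert checks like these next to the declarations means a layout change shows up at compile time rather than as corrupted data on the C# side.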

sizeof(AStruct) = 16, not 14
That's correct. The struct has two extra bytes at the end that are not used. They ensure that, if you put the struct in an array, the fields in the struct are still properly aligned. In 32-bit mode, the int32_t and AType* members require 4 bytes and should be aligned to a multiple of 4 to allow the processor to access them quickly. That can only be achieved if the structure size is a multiple of 4. Thus 14 is rounded up to 16.
Do keep in mind that this does not mean that the bool fields take 2 bytes. A C++ compiler uses just 1 byte for them. The extra 2 bytes are pure padding.
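The tail padding can be checked with offsetof and sizeof. The following is a minimal sketch, not the question's actual header: AStruct32 and tail_padding are hypothetical names, and the pointer member is modelled as a 32-bit integer (an assumption) so that the numbers match a 32-bit build even when the sketch is compiled for 64-bit.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Stand-in for the question's AStruct as a 32-bit build lays it out.
// Assumption: `b` is a 32-bit integer here instead of AType*.
struct AStruct32 {
    std::int32_t  a;         // offset 0
    std::uint32_t b;         // offset 4 (stands in for a 4-byte pointer)
    std::int32_t  bLength;   // offset 8
    bool          aBoolean;  // offset 12
    bool          bBoolean;  // offset 13
};                           // bytes 14 and 15 are tail padding

// Number of unused bytes after the last member.
constexpr std::size_t tail_padding() {
    return sizeof(AStruct32) - (offsetof(AStruct32, bBoolean) + sizeof(bool));
}

static_assert(sizeof(bool) == 1, "sketch assumes a 1-byte bool");
static_assert(offsetof(AStruct32, aBoolean) == 12, "first bool follows bLength");
static_assert(offsetof(AStruct32, bBoolean) == 13, "second bool is one byte later");
static_assert(sizeof(AStruct32) == 16, "14 bytes of members rounded up to 16");
static_assert(tail_padding() == 2, "two unused bytes at the end");
```

So writing the two booleans at offsets 12 and 13, as the question's MarshalManagedToNative code does, would be consistent with this layout as long as the buffer is 16 bytes long.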
If you use Marshal.SizeOf(typeof(AStruct)) in your C# program then you'll discover that the struct you declared takes 20 bytes. This is not good, and it is the problem you are trying to fix. The bool members are the problem, an issue that goes way, way back to early versions of the C language, which did not have a bool type. The default marshaling that the CLR uses is compatible with BOOL, the typedef in the winapi, which is a 32-bit type.
So you have to be explicit about it when you declare the struct in your C# code: you have to tell the marshaller that you want the 1-byte type. You do that by declaring the struct member as byte, or by overriding the default marshaling:
[StructLayout(LayoutKind.Sequential)]
private struct AStruct{
public int a;
public IntPtr b;
public int bLength;
[MarshalAs(UnmanagedType.U1)]
public bool aBoolean;
[MarshalAs(UnmanagedType.U1)]
public bool bBoolean;
}
And you'll now see that Marshal.SizeOf() returns 16. Do be aware that you have to force your program into 32-bit mode: make sure that the EXE project's Platform Target setting is x86.
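The 20 versus 16 difference can be reproduced on the native side by spelling out the two layouts. This is only a sketch: Bool32, DefaultMarshaled and ByteMarshaled are hypothetical names, and the pointer is again modelled as a 32-bit integer to mimic an x86 build.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical alias with the same width as the winapi BOOL (a 32-bit int).
typedef std::int32_t Bool32;

// Native layout that matches the C# struct under default bool marshaling.
struct DefaultMarshaled {
    std::int32_t  a;
    std::uint32_t b;         // assumption: stands in for a 4-byte pointer (x86)
    std::int32_t  bLength;
    Bool32        aBoolean;  // 4 bytes
    Bool32        bBoolean;  // 4 bytes
};

// Native layout that matches [MarshalAs(UnmanagedType.U1)] on both fields.
struct ByteMarshaled {
    std::int32_t  a;
    std::uint32_t b;         // assumption: stands in for a 4-byte pointer (x86)
    std::int32_t  bLength;
    std::uint8_t  aBoolean;  // 1 byte
    std::uint8_t  bBoolean;  // 1 byte
};

static_assert(sizeof(DefaultMarshaled) == 20, "five 4-byte members");
static_assert(sizeof(ByteMarshaled) == 16, "14 bytes of members plus 2 bytes of padding");
```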

Related

Difference between Marshal.SizeOf and sizeof, I just don't get it

Until now I have just taken for granted that Marshal.SizeOf is the right way to compute the memory size of a blittable struct on the unmanaged heap (which seems to be the consensus here on SO and almost everywhere else on the web).
But after having read some cautions against Marshal.SizeOf (this article after "But there's a problem...") I tried it out and now I am completely confused:
public struct TestStruct
{
public char x;
public char y;
}
class Program
{
public static unsafe void Main(string[] args)
{
TestStruct s;
s.x = (char)0xABCD;
s.y = (char)0x1234;
// this results in size 4 (two Unicode characters)
Console.WriteLine(sizeof(TestStruct));
TestStruct* ps = &s;
// shows how the struct is seen from the managed side... okay!
Console.WriteLine((int)s.x);
Console.WriteLine((int)s.y);
// shows the same as before (meaning that -> is based on
// the same memory layout as in the managed case?)... okay!
Console.WriteLine((int)ps->x);
Console.WriteLine((int)ps->y);
// let's try the same on the unmanaged heap
int marshalSize = Marshal.SizeOf(typeof(TestStruct));
// this results in size 2 (two single byte characters)
Console.WriteLine(marshalSize);
TestStruct* ps2 = (TestStruct*)Marshal.AllocHGlobal(marshalSize);
// hmmm, put two 16 bit numbers into only 2 allocated
// bytes, this must surely fail...
ps2->x = (char)0xABCD;
ps2->y = (char)0x1234;
// huh??? same result as before, storing two 16bit values in
// only two bytes??? next will be a perpetuum mobile...
// at least I'd expect an access violation
Console.WriteLine((int)ps2->x);
Console.WriteLine((int)ps2->y);
Console.Write("Press any key to continue . . . ");
Console.ReadKey(true);
}
}
What's going wrong here? What memory layout does the field dereferencing operator '->' assume? Is '->' even the right operator for addressing unmanaged structs? Or is Marshal.SizeOf the wrong size operator for unmanaged structs?
I have found nothing that explains this in a language I understand. Except for "...struct layout is undiscoverable..." and "...in most cases..." wishy-washy kind of stuff.

The difference is: the sizeof operator takes a type name and tells you how many bytes of managed memory need to be allocated for an instance of that struct. This is not necessarily stack memory; structs are allocated off the heap when they are array elements, fields of a class, and so on. By contrast, Marshal.SizeOf takes either a type object or an instance of the type, and tells you how many bytes of unmanaged memory need to be allocated. These can be different for a variety of reasons. The name of the type gives you a clue: Marshal.SizeOf is intended to be used when marshaling a structure to unmanaged memory.
Another difference between the two is that the sizeof operator can only take the name of an unmanaged type; that is, a struct type whose fields are only integral types, Booleans, pointers and so on. (See the specification for an exact definition.) Marshal.SizeOf by contrast can take any class or struct type.

I think the one question you still don't have answered is what's going on in your particular situation:
&ps2->x
0x02ca4370 <------
*&ps2->x: 0xabcd 'ꯍ'
&ps2->y
0x02ca4372 <-------
*&ps2->y: 0x1234 'ሴ'
You are writing to and reading from (possibly) unallocated memory. Because of the memory area you're in, it's not detected.
This will reproduce the expected behavior (at least on my system, YMMV):
TestStruct* ps2 = (TestStruct*)Marshal.AllocHGlobal(marshalSize*10000);
// hmmm, put two 16 bit numbers into only 2 allocated
// bytes, this must surely fail...
for (int i = 0; i < 10000; i++)
{
ps2->x = (char)0xABCD;
ps2->y = (char)0x1234;
ps2++;
}

What memory layout does the field dereferencing operator '->' assume?
Whatever the CLI decides
Is '->' even the right operator for addressing unmanaged structs?
That is an ambiguous concept. There are structs in unmanaged memory accessed via the CLI: these follow CLI rules. And there are structs that are merely notional monikers for unmanaged code (perhaps C/C++) accessing the same memory. This follows the rules of that framework. Marshalling usually refers to P/Invoke, but that isn't necessarily applicable here.
Or is Marshal.SizeOf the wrong size operator for unmanaged structs?
I'd default to Unsafe.SizeOf<T>, which is essentially sizeof(T) - which is perfectly well-defined for the CLI/IL (including padding rules etc), but isn't possible in C#.

A char marshals, by default, to an ANSI byte. This allows interoperability with most C libraries and is fundamental to the operation of the .NET runtime.
I believe the correct solution is to change TestStruct to:
public struct TestStruct
{
[System.Runtime.InteropServices.MarshalAs(UnmanagedType.U2)]
public char x;
[System.Runtime.InteropServices.MarshalAs(UnmanagedType.U2)]
public char y;
}
UnmanagedType.U2 means an unsigned integer 2 bytes long, which makes it equivalent to the wchar_t type in a C header on Windows, where wchar_t is 16 bits wide.
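Seen from the native side, the two sizes in the question correspond to two different structs. The sketch below uses hypothetical names (AnsiPair, Utf16Pair, overrun_bytes) and char16_t rather than wchar_t, because wchar_t is not 2 bytes on every platform.

```cpp
#include <cassert>
#include <cstddef>

// One byte per character: the layout Marshal.SizeOf describes by default.
struct AnsiPair {
    char x;
    char y;
};

// Two bytes per character: the layout the managed sizeof operator describes.
struct Utf16Pair {
    char16_t x;
    char16_t y;
};

// Bytes written past the end when a Utf16Pair is stored in a buffer
// that was sized for an AnsiPair, as in the question's AllocHGlobal call.
constexpr std::size_t overrun_bytes() {
    return sizeof(Utf16Pair) - sizeof(AnsiPair);
}

static_assert(sizeof(char16_t) == 2, "sketch assumes a 16-bit char16_t");
static_assert(sizeof(AnsiPair) == 2, "two single-byte characters");
static_assert(sizeof(Utf16Pair) == 4, "two 16-bit characters");
static_assert(overrun_bytes() == 2, "two bytes land outside the allocation");
```

With the U2 attributes applied, the marshaled size would be expected to match the second struct, so the allocation and the pointer access agree.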
Seamless porting of C structures to .NET is possible with attention to detail and opens many doors for interop with native libraries.

Marshaling an array of boolean vs marshaling a single boolean (defined as int) to bool in C#

In a C API I have BOOL defined as follows
#ifndef BOOL
#define BOOL int
#endif
And I have a struct which, among others, has a simple BOOL member and an array of BOOLs
struct SomeStruct
{
BOOL bIsSomething;
BOOL bHasSomething[5];
};
Now I found out that when I want to cast the whole struct I have to marshal them differently: the single BOOL I marshal with I1 and the fixed-length array I have to marshal with I4 (if I don't, the struct sizes won't match and I will have problems extracting an array of these structs into C#):
[StructLayout(LayoutKind.Sequential)]
public struct SomeNativeStruct
{
[MarshalAs(UnmanagedType.I1)]
public bool bIsSomething;
[MarshalAs(UnmanagedType.ByValArray, ArraySubType = UnmanagedType.I4, SizeConst = 5)]
public bool[] bHasSomething;
}
I suspect I do something wrong because I'm not sure why I should need to marshal the same type differently depending on whether I get it as a fixed size array or as a single member.
If I marshal them all as I4, I get a System.ArgumentException:
An unhandled exception of type 'System.ArgumentException' occurred in SomeDll.dll
Additional information: Type 'Namespace.Document+SomeNativeStruct' cannot be marshaled as an unmanaged structure; no meaningful size or offset can be computed.

bool is a tricky type to interop. There are many mutually incompatible definitions of what a boolean value is, so bool is considered a non-blittable type: it needs to be truly marshalled, rather than just sticking a "totally a bool" tag on the data. And arrays of non-blittable types are doubly tricky.
The simplest solution would be to avoid using bool entirely. Just replace the bool[] with int[], and provided the original type is actually a 32-bit int (depends on the compiler and platform), you'll get correct interop. You can then manually copy the interop struct to a managed struct with a more sane layout, if you so choose - which also gives you full control over interpreting which int values correspond to true and false, respectively.
In general, native interop is always tricky; you need to have a good understanding of the actual memory layout as well as the meaning of the values and types you're dealing with. The types aren't enough - they're too ambiguous, especially in standard C (which is often the standard for native interop even today). Headers aren't enough - you also need the docs, and perhaps even a look in a (native) debugger.
Extra danger comes from the fact that there's no safety net that tells you you're doing things somewhat wrong. The wrong interop approach can appear to work just fine for years, and then suddenly blow up in your face when, e.g., a true value happens to be 42 instead of the more usual -1 and your bitwise arithmetic breaks subtly (this can actually happen in C#, if you use unsafe code). Everything might work great for values smaller than 32768, and then break horribly. There are plenty of hard-to-catch error cases, so you need extra caution.
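To make the "true happens to be 42" hazard concrete, here is a small C++ sketch built around the question's int-based BOOL. The helper names (to_bool, both_set_bitwise, both_set_logical) are hypothetical, and the size check assumes a 32-bit int.

```cpp
#include <cassert>

typedef int BOOL;  // as defined in the question's C header

struct SomeStruct {
    BOOL bIsSomething;
    BOOL bHasSomething[5];
};

// C convention: any non-zero value counts as true.
inline bool to_bool(BOOL value) { return value != 0; }

// Fragile: the bitwise AND of two "true" values can still be zero.
inline bool both_set_bitwise(BOOL a, BOOL b) { return (a & b) != 0; }

// Robust: normalise each value before combining them.
inline bool both_set_logical(BOOL a, BOOL b) { return to_bool(a) && to_bool(b); }

static_assert(sizeof(int) == 4, "sketch assumes a 32-bit int");
static_assert(sizeof(SomeStruct) == 24, "six 4-byte members, no padding");
```

For example, both_set_bitwise(1, 2) is false although both inputs are "true", while both_set_logical(1, 2) is true. The bitwise version only appears to work as long as every true value shares a set bit, for instance when true is always -1.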

Accommodating nested unsafe structs in C#

What is the best way to accommodate the following:
Real time, performance critical application that interfaces with a native C dll for communicating with a proprietary back end.
The native api has hundreds upon hundreds of structs, nested structs and methods that pass data back and forth via these structs.
I want to use C# for logic, so I decided on unsafe C# over CLI and marshaling. I know how to do the latter and have implemented it, so please don't reply "use cli". Marshaling hundreds of structs a hundred times a second introduces a significant enough delay that it warranted investigating unsafe C#.
Most of the C structs contain dozens of fields, so I am looking for a method that needs minimal typing for each. At this point, I have it down to running a VS macro that converts each line to its C# equivalent, setting arrays to fixed size when necessary. This works pretty well until I hit a nested struct array. For example, I have these 2 structs:
[StructLayout(LayoutKind.Sequential,Pack=1)]
unsafe struct User{
int id;
fixed char name[12];
}
[StructLayout(LayoutKind.Sequential,Pack=1)]
unsafe struct UserGroup{
fixed char name[12];
fixed User users[512];
int somethingElse;
fixed char anotherThing[16];
}
What is the best way to accommodate fixed User users[512] so that little has to be done at run time?
I have seen examples where the suggestion is to do
[StructLayout(LayoutKind.Sequential,Pack=1)]
unsafe struct UserGroup{
fixed char name[12];
User users_1;
User users_2;
...
User users_511;
int somethingElse;
fixed char anotherThing[16];
}
Another idea has been, to compute the size of User in bytes and just do this
[StructLayout(LayoutKind.Sequential,Pack=1)]
unsafe struct UserGroup{
fixed char name[12];
fixed byte Users[28*512];
int somethingElse;
fixed char anotherThing[16];
}
But that would mean giving this struct special treatment every time I need to use it, or wrapping it with some other code. There are enough of those in the API that I would like to avoid this approach, but if someone can demonstrate an elegant way, that could work as well.
A third approach, which eludes me enough that I can't produce an example (I think I saw it somewhere but can't find it anymore), is to specify a size for User, or somehow make it strictly sized, so that you could use the fixed keyword on it.
Can anyone recommend a reasonable approach that they have utilized and scales well under load?

The best way I could find to handle nested structs in unsafe structs is to define them as fixed byte arrays and then provide a runtime conversion property for the field. For example:
[StructLayout(LayoutKind.Sequential,Pack=1)]
unsafe struct UserGroup{
fixed char name[12];
fixed User users[512];
int somethingElse;
fixed char anotherThing[16];
}
Turns into:
[StructLayout(LayoutKind.Sequential,Pack=1)]
unsafe struct UserGroup{
fixed char name[12];
fixed byte users[512 * Constants.SizeOfUser];
int somethingElse;
fixed char anotherThing[16];
public User[] Users
{
get
{
var retArr = new User[512];
fixed(User* retArrRef = retArr){
fixed(byte* usersFixed = users){
{
Memory.Copy(usersFixed, retArrRef, 512 * Constants.SizeOfUser);
}
}
}
return retArr;
}
}
}
Please note, this code uses the Memory.Copy function provided here: http://msdn.microsoft.com/en-us/library/aa664786(v=vs.71).aspx
The general explanation of the getter is as follows:
allocate a managed array for the return value
get and fix an unsafe pointer to it
get and fix an unsafe pointer to the byte array for the struct
copy the memory from one to the other
The managed array is not stored back into the struct itself because that would modify the struct's layout, which would then no longer translate correctly, whereas the property causes no issue when the struct comes from unmanaged code. Alternatively, this could be wrapped in another managed object that does the storing.
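The same copy-out pattern can be sketched in C++, which may also help when checking Constants.SizeOfUser against the native header. Everything here is illustrative: UserGroupRaw, kUserCount and copy_users are hypothetical names, and the sketch assumes 1-byte native char fields, which makes User 16 bytes. The question's 28-byte figure suggests 2-byte characters on the C# side, so the real constant has to follow whatever the native API actually declares.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kUserCount = 512;

#pragma pack(push, 1)
struct User {
    std::int32_t id;
    char name[12];  // assumption: 1-byte characters on the native side
};

// The nested array is kept as raw bytes, mirroring `fixed byte users[...]`.
struct UserGroupRaw {
    char name[12];
    unsigned char users[kUserCount * sizeof(User)];
    std::int32_t somethingElse;
    char anotherThing[16];
};
#pragma pack(pop)

static_assert(sizeof(User) == 16, "4-byte id plus 12 single-byte characters");
static_assert(sizeof(UserGroupRaw) == 12 + kUserCount * 16 + 4 + 16,
              "no padding under pack(1)");

// Copies the raw bytes into a typed array, like the C# getter does.
inline std::array<User, kUserCount> copy_users(const UserGroupRaw& group) {
    std::array<User, kUserCount> out;
    std::memcpy(out.data(), group.users, sizeof(group.users));
    return out;
}
```

As in the C# property, every call copies the whole array, so callers that only need one element may prefer to compute the element's byte offset and copy just that one User.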

Sending C# struct to C++ through a socket

I want to send a struct in C# to C++ using sockets.
For example, I use this struct:
[StructLayout(LayoutKind.Sequential, Pack = 1)]
struct pos {
public int i;
public float x;
};
If I somehow convert it into bytes and send over the network, I should be able to cast it to this in c++:
struct pos {
int i;
float x;
};
... I think.
1) how do you break down a struct instance in C# to send it over the network?
2) can I safely cast it to the c++ struct once I get it?
Thanks

The marshaller helps you with converting between .NET structs and raw bytes. In this answer, I posted a simple solution, which boils down to Marshal.StructureToPtr and Marshal.PtrToStructure. In contrast to the more advanced solutions provided by Johann du Toit, this is in my opinion the best thing you can do if all you want to do is to push some structures through a byte stream.
If you do this, you can safely cast to the C++ struct if the length is correct, and your C++ struct is declared with the same packing as the C# struct (i.e. #pragma pack in VC++ or __attribute__((packed)) in GCC).
Note that this also works with fixed length C strings, but will not take care of the endianness of larger values. I found it a simple solution to provide getters and setters for the latter problem which just swap the bytes accordingly (with BitConverter).
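If the byte swapping is done on the C++ end rather than with BitConverter in C#, a helper along these lines would do for 32-bit fields. This is a sketch only, and swap32 is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Reverses the byte order of a 32-bit value (little-endian <-> big-endian).
constexpr std::uint32_t swap32(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

static_assert(swap32(0x11223344u) == 0x44332211u, "bytes are reversed");
static_assert(swap32(swap32(0xDEADBEEFu)) == 0xDEADBEEFu, "swapping twice is a no-op");
```

Whether a swap is needed at all depends on both machines: when sender and receiver are both little-endian (the usual case on x86 and x64), the bytes can be used as they arrive.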
Some elaboration on the packing:
Take the following structure:
struct MyStruct {
uint8_t a;
float b;
};
With the C# declaration using StructLayout with Pack = 1, this struct will have a size of five bytes. The C++ struct, however, may have eight bytes (or even more), depending on the default packing of the compiler, which may happily insert some padding bytes to align the float value on a 32-bit boundary (just an example). Because of this, you have to apply the very same packing to both the C# and C++ struct. In Visual C++:
#pragma pack(push, 1)
// ... struct declarations...
#pragma pack(pop)
This means all structs declared between the two pragmas will have a packing of one. In GCC:
struct x {
// ...
} __attribute__((packed));
This will do the same. You can #define __attribute__(x) on Windows platforms and #ifdef _WIN32 around the pragmas to make the code compatible with both worlds.
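On the receiving side, the packed declaration can be combined with a size check and a memcpy out of the receive buffer. The sketch below is one possible shape, not a definitive implementation: read_my_struct is a hypothetical helper, and it uses #pragma pack on all compilers, relying on the fact that GCC and Clang accept that pragma as well.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)
struct MyStruct {
    std::uint8_t a;
    float b;
};
#pragma pack(pop)

static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");
static_assert(sizeof(MyStruct) == 5, "must match the C# struct declared with Pack = 1");

// Copies one MyStruct out of a receive buffer.
// Returns false when the buffer holds fewer bytes than the struct needs.
inline bool read_my_struct(const unsigned char* data, std::size_t length,
                           MyStruct& out) {
    if (length < sizeof(MyStruct)) {
        return false;
    }
    std::memcpy(&out, data, sizeof(MyStruct));
    return true;
}
```

Copying with memcpy instead of casting the buffer pointer avoids depending on how the buffer happens to be aligned. It does nothing about endianness, which still has to be handled separately.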

You can either encode it in a format like JSON (there are a lot of JSON parsers out there; check the json.org website for a list) or XML, or just roll your own. You could also try existing libraries like Protobuf, which lets you serialize structures that you define in a file in .proto format (use Protobuf-Net for C#). Another option would be Thrift, which provides serialization and also supplies a ready-to-use RPC system. It supports C#, C++ and a ton of other languages by default.
So it depends on taste, take your pick :D

64 Bit P/Invoke Idiosyncrasy

I am trying to properly Marshal some structs for a P/Invoke, but am finding strange behavior when testing on a 64 bit OS.
I have a struct defined as:
/// <summary>http://msdn.microsoft.com/en-us/library/aa366870(v=VS.85).aspx</summary>
[StructLayout(LayoutKind.Sequential)]
private struct MIB_IPNETTABLE
{
[MarshalAs(UnmanagedType.U4)]
public UInt32 dwNumEntries;
public IntPtr table; //MIB_IPNETROW[]
}
Now, to get the address of the table, I would like to do a Marshal.OffsetOf() call like so:
IntPtr offset = Marshal.OffsetOf(typeof(MIB_IPNETTABLE), "table");
This should be 4 - I have dumped the bytes of the buffer to confirm this as well as replacing the above call with a hard coded 4 in my pointer arithmetic, which yielded correct results.
I do get the expected 4 if I instantiate MIB_IPNETTABLE and perform the following call:
IntPtr offset = (IntPtr)Marshal.SizeOf(ipNetTable.dwNumEntries);
Now, in a sequential struct the offset of a field should be sum of the sizes of preceding fields, correct? Or is it the case that when it is an unmanaged structure the offset really is 8 (on an x64 system), but becomes 4 only after Marshalling magic? Is there a way to get the OffsetOf() call to give me the correct offset? I can limp along using calls to SizeOf(), but OffsetOf() is simpler for larger structs.

In a 64-bit C/C++ build the offset of your table field would be 8 due to alignment requirements (unless you forced it otherwise). I suspect that the CLR is doing the same to you:
http://msdn.microsoft.com/en-us/library/system.runtime.interopservices.layoutkind.aspx
The members of the object are laid out sequentially, in the order in which they appear when
exported to unmanaged memory. The members are laid out according to the packing specified in StructLayoutAttribute.Pack, and can be noncontiguous.
You may want to use that attribute, or use LayoutKind.Explicit along with the FieldOffset attribute on each field, if you need that level of control.
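The two offsets in the question can be reproduced in C++. As far as I know, the Windows headers declare table as an inline array of MIB_IPNETROW rather than as a pointer, which would explain why the buffer shows the rows starting at offset 4 while the IntPtr-based C# declaration reports 8 on x64. The sketch uses a hypothetical Row type with 4-byte alignment in place of the real MIB_IPNETROW.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical row type with 4-byte alignment, standing in for MIB_IPNETROW.
struct Row {
    std::uint32_t index;
    std::uint32_t address;
};

// Mirrors the C# declaration from the question: a count followed by a pointer.
struct TableWithPointer {
    std::uint32_t dwNumEntries;
    Row*          table;
};

// A count followed directly by the rows themselves.
struct TableWithInlineRows {
    std::uint32_t dwNumEntries;
    Row           table[1];
};

static_assert(offsetof(TableWithPointer, table) == alignof(Row*),
              "a pointer member starts on a pointer-sized boundary: 8 on x64, 4 on x86");
static_assert(offsetof(TableWithInlineRows, table) == 4,
              "rows with 4-byte alignment follow the count without padding");
```

Under that reading, the rows would be reached by adding 4 to the buffer address and marshaling each row from there, rather than by reading a pointer field.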
