I was reading about marshaling, and I'm confused: what do HRESULT, DWORD, and HANDLE mean in unmanaged code?
The original text is:
You already know that there is no such compatibility between managed and unmanaged environments. In other words, .NET does not contain the types HRESULT, DWORD, and HANDLE that exist in the realm of unmanaged code. Therefore, you need to find a .NET substitute or create your own if needed. That is what is called marshaling.
short answer:
It is just telling you that you must "map" a data type used in one programming language to the corresponding data type used in a different programming language, and the data types must match.
quick answer:
For this one, the details may not be correct, but the concept is.
These are a few of the data types defined in the Windows header files for C/C++. They are typedefs which "abstract" the primitive data types of C/C++ into more meaningful data types used in Windows programming. For instance, DWORD is really a 32-bit unsigned integer in C/C++, and it stays 32 bits even on 64-bit processors; it is the pointer-sized types, such as HANDLE, that grow to 64 bits there. The idea is to provide an abstraction layer between the data type needed by the processor and the data types used by the language.
During marshalling, this "dword" will be converted to the CLR data type you specify in the DllImport declaration. This is an important point.
Let's say you want to call a Windows API method that takes a DWORD parameter. When declaring this call in C# using DllImport, you must specify the parameter data type as System.UInt32. If you don't, "bad things will happen".
For example, suppose you mistakenly specify the parameter data type as System.UInt64. When the actual call is made, the stack will become corrupt because more bytes are placed on the stack than the API call expects, which can lead to completely unexpected behavior: crashing the application, crashing Windows, invalid return values, or whatever.
That is why it is important to specify the correct data type.
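For instance, here is a minimal sketch using the real kernel32 Sleep function, which takes a DWORD:

using System.Runtime.InteropServices;

class SleepDemo
{
    // Sleep takes a DWORD, so the parameter is declared as System.UInt32;
    // declaring it as ulong (UInt64) would put 8 bytes on the stack where
    // the API expects 4.
    [DllImport("kernel32.dll")]
    public static extern void Sleep(uint dwMilliseconds);
}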
data types in question:
DWORD is defined as a 32-bit unsigned integer, or the CLR type System.UInt32.
HANDLE maps to the CLR types IntPtr, UIntPtr, or HandleRef.
HRESULT maps to System.Int32 or System.UInt32.
References:
Using P/Invoke to Call Unmanaged APIs from Your Managed Classes at http://msdn.microsoft.com/en-us/library/aa719104(v=vs.71).aspx has a table listing each Windows data type with its corresponding CLR data type, which specifically answers your question.
Windows Data Types (Windows) at http://msdn.microsoft.com/en-us/library/aa383751(v=VS.85).aspx
.NET Column: Calling Win32 DLLs in C# with P/Invoke at http://msdn.microsoft.com/en-us/magazine/cc164123.aspx
HRESULT: http://en.wikipedia.org/wiki/HRESULT
In the field of computer programming, the HRESULT is a data type used in Windows operating systems, and the earlier IBM/Microsoft OS/2 operating system, to represent error conditions and warning conditions. The original purpose of HRESULTs was to formally lay out ranges of error codes for both public and Microsoft internal use in order to prevent collisions between error codes in different subsystems of the OS/2 operating system. HRESULTs are numerical error codes. Various bits within an HRESULT encode information about the nature of the error code, and where it came from. HRESULT error codes are most commonly encountered in COM programming, where they form the basis for a standardized COM error handling convention.
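Since the quote mentions that various bits encode information, here is a small sketch that decodes those documented bit fields, mirroring the SUCCEEDED, HRESULT_FACILITY, and HRESULT_CODE macros from winerror.h:

using System;

class HResultDemo
{
    static void Describe(int hr)
    {
        bool failed = hr < 0;               // bit 31 is the severity bit
        int facility = (hr >> 16) & 0x1FFF; // which subsystem produced the code
        int code = hr & 0xFFFF;             // the error code itself
        Console.WriteLine($"failed={failed}, facility={facility}, code=0x{code:X4}");
    }
}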
DWORD: http://en.wikipedia.org/wiki/DWORD#Size_families
HANDLE: http://en.wikipedia.org/wiki/Handle_(computing)
In computer programming, a handle is an abstract reference to a resource. Handles are used when application software references blocks of memory or objects managed by another system, such as a database or an operating system. While a pointer literally contains the address of the item to which it refers, a handle is an abstraction of a reference which is managed externally; its opacity allows the referent to be relocated in memory by the system without invalidating the handle, which is impossible with pointers. The extra layer of indirection also increases the control the managing system has over operations performed on the referent. Typically the handle is an index or a pointer into a global array of tombstones.
HRESULT, DWORD, and HANDLE are typedefs (i.e., aliases for plain data types) defined by Microsoft for use by programmers compiling unmanaged code in a Windows environment. They are defined in C (or C++) header files provided by Microsoft that are, typically, automatically included in unmanaged Windows projects created within Microsoft Visual Studio.
Related
I am writing some tools to help validate IL that is emitted at runtime. Part of this validation involves maintaining a Stack<Type> as OpCodes are emitted, so that future OpCodes that use these stack elements can be validated as using the proper types. I am confused, however, as to how to handle the ldind.i opcode.
The Microsoft documentation states:
The ldind.i instruction indirectly loads a native int value from the specified address (of type native int, &, or *) onto the stack as a native int.
In C#, native int is not defined, and I am confused as to what type most accurately represents this data. How can I determine what its size is, and which C# type should be used to represent it? I am concerned it will vary by system hardware.
To my mind, you'd be better off looking at how the VES is defined and using a dedicated enum to model the types on the stack rather than C# visible types. Otherwise you're in for a rude surprise when we get to the floating point type.
From MS Partition I.pdf [1], Section 12.1:
The CLI model uses an evaluation stack [...] However, the CLI supports only a subset of these types in its operations upon values stored on its evaluation stack: int32, int64, and native int. In addition, the CLI supports an internal data type to represent floating-point values on the internal evaluation stack. The size of the internal data type is implementation-dependent.
So those, as well as things like references are the things you should track, and I'd recommend you do that with an explicit model of the VES Stack using its terms.
[1] ECMA C# and Common Language Infrastructure Standards
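To make the suggestion concrete, here is a minimal sketch of such a dedicated model; the type and member names are my own choice, following Partition I's terms:

enum VesStackType
{
    Int32,          // int32
    Int64,          // int64
    NativeInt,      // native int; what ldind.i loads, size is implementation-dependent
    F,              // the internal floating-point type, implementation-sized
    ObjectRef,      // O (object reference)
    ManagedPointer, // & (managed pointer)
}

Your validator would then maintain a Stack<VesStackType> and push VesStackType.NativeInt whenever ldind.i is emitted.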
I am working on a rather large codebase in which C++ functionality is P/Invoked from C#.
There are many calls in our codebase such as...
C++:
extern "C" int __stdcall InvokedFunction(int);
With a corresponding C#:
[DllImport("CPlusPlus.dll", ExactSpelling = true, SetLastError = true, CallingConvention = CallingConvention.Cdecl)]
private static extern int InvokedFunction(IntPtr intArg);
I have scoured the net (insofar as I am capable) for the reasoning behind this apparent mismatch. For example, why is there Cdecl in the C# and __stdcall in the C++? Apparently this results in the stack being cleared twice, but in both cases the arguments are pushed onto the stack in the same reverse order, so I do not see any errors; perhaps return information is cleared, which would only show up when attempting a stack trace during debugging?
From MSDN: http://msdn.microsoft.com/en-us/library/2x8kf7zx%28v=vs.100%29.aspx
// explicit DLLImport needed here to use P/Invoke marshalling
[DllImport("msvcrt.dll", EntryPoint = "printf", CallingConvention = CallingConvention::Cdecl, CharSet = CharSet::Ansi)]
// Implicit DLLImport specifying calling convention
extern "C" int __stdcall MessageBeep(int);
Once again, there is both extern "C" in the C++ code, and CallingConvention.Cdecl in the C#. Why is it not CallingConvention.Stdcall? Or, moreover, why is there __stdcall in the C++?
Thanks in advance!
This comes up repeatedly in SO questions, so I'll try to turn this into a (long) reference answer. 32-bit code is saddled with a long history of incompatible calling conventions, choices about how to make a function call that made sense a long time ago but are mostly a giant pain in the rear end today. 64-bit code has only one calling convention; whoever is going to add another one is going to get sent to a small island in the South Atlantic.
I'll try to annotate that history, and the relevance of each convention, beyond what's in the Wikipedia article. The starting point is that the choices to be made in how to make a function call are the order in which to pass the arguments, where to store the arguments, and how to clean up after the call.
__stdcall found its way into Windows programming through the old 16-bit Pascal calling convention, used in 16-bit Windows and OS/2. It is the convention used by all Windows API functions as well as COM. Since most pinvoke was intended to make OS calls, Stdcall is the default if you don't specify it explicitly in the [DllImport] attribute. Its one and only reason for existence is that it specifies that the callee cleans up, which produces more compact code, very important back in the days when they had to squeeze a GUI operating system into 640 kilobytes of RAM. Its biggest disadvantage is that it is dangerous: a mismatch between what the caller assumes are the arguments for a function and what the callee implemented causes the stack to become imbalanced, which in turn can cause extremely hard-to-diagnose crashes.
__cdecl is the standard calling convention for code written in the C language. Its prime reason for existence is that it supports making function calls with a variable number of arguments. Common in C code with functions like printf() and scanf(). With the side effect that since it is the caller that knows how many arguments were actually passed, it is the caller that cleans up. Forgetting CallingConvention = CallingConvention.Cdecl in the [DllImport] declaration is a very common bug.
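As an illustration, here is a minimal sketch declaring one fixed arity of printf through pinvoke; the explicit Cdecl is the part that is easy to forget, since the DllImport default is StdCall:

using System.Runtime.InteropServices;

class CdeclDemo
{
    // printf is a varargs C function, so it uses __cdecl and the caller
    // cleans up the stack.
    [DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Ansi)]
    static extern int printf(string format, int value);
}

A call such as printf("value = %d\n", 42) then behaves as expected; only this one fixed arity of the varargs function is declared.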
__fastcall is a fairly poorly defined calling convention with mutually incompatible choices. It was common in Borland compilers, a company once very influential in compiler technology until they disintegrated. Also the former employer of many Microsoft employees, including Anders Hejlsberg of C# fame. It was invented to make argument passing cheaper by passing some of them through CPU registers instead of the stack. It is not supported in managed code due to the poor standardization.
__thiscall is a calling convention invented for C++ code. Very similar to __cdecl but it also specifies how the hidden this pointer for a class object is passed to instance methods of a class. An extra detail in C++ beyond C. While it looks simple to implement, the .NET pinvoke marshaller does not support it. A major reason that you cannot pinvoke C++ code. The complication is not the calling convention, it is the proper value of the this pointer. Which can get very convoluted due to C++'s support for multiple inheritance. Only a C++ compiler can ever figure out what exactly needs to be passed. And only the exact same C++ compiler that generated the code for the C++ class, different compilers have made different choices on how to implement MI and how to optimize it.
__clrcall is the calling convention for managed code. It is a blend of the other ones, this pointer passing like __thiscall, optimized argument passing like __fastcall, argument order like __cdecl and caller cleanup like __stdcall. The great advantage of managed code is the verifier built into the jitter. Which makes sure that there can never be an incompatibility between caller and callee. Thus allowing the designers to take the advantages of all of these conventions but without the baggage of trouble. An example of how managed code could stay competitive with native code in spite of the overhead of making code safe.
You mention extern "C"; understanding its significance is important as well to survive interop. Language compilers often decorate the names of exported functions with extra characters, also called "name mangling". It is a pretty crappy trick that never stops causing trouble, and you need to understand it to determine the proper values of the CharSet, EntryPoint and ExactSpelling properties of a [DllImport] attribute. There are many conventions:
Windows api decoration. Windows was originally a non-Unicode operating system, using 8-bit encoding for strings. Windows NT was the first one that became Unicode at its core. That caused a rather major compatibility problem, old code would not have been able to run on new operating systems since it would pass 8-bit encoded strings to winapi functions that expect a utf-16 encoded Unicode string. They solved this by writing two versions of every winapi function. One that takes 8-bit strings, another that takes Unicode strings. And distinguished between the two by gluing the letter A at the end of the name of the legacy version (A = Ansi) and a W at the end of the new version (W = wide). Nothing is added if the function doesn't take a string. The pinvoke marshaller handles this automatically without your help, it will simply try to find all 3 possible versions. You should however always specify CharSet.Auto (or Unicode), the overhead of the legacy function translating the string from Ansi to Unicode is unnecessary and lossy.
The standard decoration for __stdcall functions is _foo@4: a leading underscore and an @n postfix that indicates the combined size of the arguments. This postfix was designed to help solve the nasty stack imbalance problem if the caller and callee don't agree about the number of arguments. Works well, although the error message isn't great; the pinvoke marshaller will tell you that it cannot find the entrypoint. Notable is that Windows, while using __stdcall, does not use this decoration. That was intentional, giving programmers a shot at getting the GetProcAddress() argument right. The pinvoke marshaller also takes care of this automatically, first trying to find the entrypoint with the @n postfix, next trying the one without.
The standard decoration for a __cdecl function is _foo, a single leading underscore. The pinvoke marshaller sorts this out automatically. Sadly, without an @n postfix like __stdcall's, it cannot tell you that your CallingConvention property is wrong, a great loss.
C++ compilers use name mangling, producing truly bizarre-looking names like "??2@YAPAXI@Z", the exported name for "operator new". This was a necessary evil due to its support for function overloading, and because C++ was originally designed as a preprocessor that used legacy C language tooling to get the program built, which made it necessary to distinguish between, say, a void foo(char) and a void foo(int) overload by giving them different names. This is where the extern "C" syntax comes into play: it tells the C++ compiler to not apply the name mangling to the function name. Most programmers who write interop code intentionally use it to make the declaration in the other language easier to write. Which is actually a mistake: the decoration is very useful to catch mismatches. You'd use the linker's .map file or the Dumpbin.exe /exports utility to see the decorated names. The undname.exe SDK utility is very handy to convert a mangled name back to its original C++ declaration.
So this should clear up the properties. You use EntryPoint to give the exact name of the exported function, one that might not be a good match for what you want to call it in your own code, especially for C++ mangled names. And you use ExactSpelling to tell the pinvoke marshaller to not try to find the alternative names because you already gave the correct name.
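Putting those properties together, a small sketch using the real MessageBox export (MessageBoxA/MessageBoxW in user32.dll):

using System;
using System.Runtime.InteropServices;

class CharSetDemo
{
    // CharSet.Auto lets the marshaller pick MessageBoxA or MessageBoxW;
    // ExactSpelling stays false (the default) so the A/W probing happens.
    [DllImport("user32.dll", CharSet = CharSet.Auto)]
    static extern int MessageBox(IntPtr hWnd, string text, string caption, uint type);
}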
I'll nurse my writing cramp for a while now. The answer to your question title should be clear, Stdcall is the default but is a mismatch for code written in C or C++. And your [DllImport] declaration is not compatible. This should produce a warning in the debugger from the PInvokeStackImbalance Managed Debugger Assistant, a debugger extension that was designed to detect bad declarations. And can rather randomly crash your code, particularly in the Release build. Make sure you didn't turn the MDA off.
cdecl and stdcall are both valid and usable between C++ and .NET, but they must be consistent between the unmanaged and managed worlds. So your C# declaration for InvokedFunction is invalid; it should be stdcall. The MSDN sample just gives two different examples, one with stdcall (MessageBeep) and one with cdecl (printf). They are unrelated.
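Assuming the C++ export really is the one shown in the question, a corrected sketch of the declaration would be:

using System.Runtime.InteropServices;

class NativeMethods
{
    // StdCall matches the __stdcall export, and the parameter is a plain
    // 32-bit int rather than an IntPtr.
    [DllImport("CPlusPlus.dll", ExactSpelling = true, SetLastError = true,
        CallingConvention = CallingConvention.StdCall)]
    private static extern int InvokedFunction(int intArg);
}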
I want to declare a COM Interface in MIDL that allows for returning a pointer (like in the ID3D11Blob). I understand that pointers are a special thing in COM because of the stubs generated for RPC calls. I do not need RPC, but only want to access the COM server from C#. The question is: can I declare the interface in such a way that the C# stub returns an IntPtr? I have tried to add [local] to enable void pointers, but that does not suffice.
The interface should look in MIDL like
[local] void *PeekData(void)
and in C# like
IntPtr PeekData()
Is this possible? If so, how?
Thanks in advance,
Christoph
Edit: To rephrase the question: Why is
HRESULT GetData([in, out, size_is(*size)] BYTE data[], [in, out] ULONG *size);
becoming
void GetData(ref byte, ref uint)
and how can I avoid the first parameter becoming a single byte in C#?
This goes wrong because you imported the COM server declarations from a type library. Type libraries were originally designed to support a subset of COM called "OLE Automation", which restricts the kinds of types you can use for method arguments. In particular, raw pointers are not permitted, and an array must be declared as a SAFEARRAY, which ensures that the caller can always index an array safely; safe arrays carry extra metadata that describes the rank and the lower/upper bounds of the array.
The [size_is] attribute is only understood by MIDL, it is used to create the proxy and the stub for the interface. Knowing how many elements the array contains is also important when it needs to be copied into an interop packet that's sent on the wire to the stub.
Since type libraries don't support a declaration like this, the [size_is] attribute is stripped and the type library importer only sees BYTE*. Which is ambiguous, that can be a byte passed by reference or it can be a pointer to an array of bytes. The importer chooses the former since it has no hope of making an array work, it doesn't know the size of the array. So you get ref byte.
To fix this issue, you have to alter the interop library so you can provide the proper declaration of the method. That requires the [MarshalAs] attribute to declare the byte[] argument an LPArray, with the SizeParamIndex property set so you can tell the CLR that the array size is determined by the size argument. There are two basic ways to go about it:
Decompile the interop library with ildasm.exe, modify the .il file and put it back together with ilasm.exe. You'd use a sample C# declaration that you look at with ildasm.exe to know how to edit the IL. This is the approach that Microsoft recommends.
Use a good decompiler that can decompile IL back to C#; Reflector and ILSpy are popular. Copy/paste the generated code into a source file of your project and edit the method, applying the [MarshalAs] attribute. The advantage is that editing is easier and you no longer have a dependency on the interop library.
In either case, you want to make sure that the COM server is stable so you don't have to do this very often. If it is not, then modifying the server itself is highly recommended: use a safe array.
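For reference, a hedged sketch of what the hand-edited declaration could look like; the interface name and GUID below are placeholders for whatever the importer generated:

using System.Runtime.InteropServices;

[ComImport]
[Guid("00000000-0000-0000-0000-000000000000")] // placeholder; keep the GUID the importer generated
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
interface IDataProvider // placeholder name
{
    // SizeParamIndex = 1 tells the CLR that the array length comes from
    // the parameter at index 1, i.e. the size argument.
    void GetData(
        [In, Out, MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 1)] byte[] data,
        ref uint size);
}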
I think I found the solution on http://msdn.microsoft.com/en-gb/library/z6cfh6e6(v=vs.110).aspx#cpcondefaultmarshalingforarraysanchor2: This is the default behaviour for C-style arrays. One can avoid that by using SAFEARRAYs.
When should we use this attribute, and why do we need it? For example, if the native function in C takes a pointer to unsigned char as a parameter, and I know that it is expected to be filled with an array of unsigned chars, why can't I just use an array of bytes in C# to call this function? Is it necessary to do marshalling?
The runtime will be able to automatically determine how to marshal data between native and managed code in most cases, so you generally don't need to specify the attribute. MarshalAs is only necessary when there is an ambiguity in the definition (and you want to tell the runtime precisely how to marshal the data) or if you require non-default behaviour.
In my experience, MarshalAs is only really required when working with strings, since there are so many different representations in native code: Unicode/ANSI, C-style strings or not, etc.
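A small sketch of that; the DLL and both functions are hypothetical, but they show the same managed string marshalled into two different native representations:

using System.Runtime.InteropServices;

class StringDemo
{
    [DllImport("native.dll")]
    static extern void TakesAnsi([MarshalAs(UnmanagedType.LPStr)] string s);  // const char*

    [DllImport("native.dll")]
    static extern void TakesWide([MarshalAs(UnmanagedType.LPWStr)] string s); // const wchar_t*
}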
Another use of the MarshalAs attribute is marshalling fixed-size arrays (including fixed-size strings) with the ByValArray and SizeConst parameters. For example, many structures from the Windows API contain fixed-size strings.
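A hedged sketch, loosely modeled on such structures; the struct and field names are illustrative:

using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
struct DeviceInfo
{
    public uint Version;

    [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 260)]
    public string Name;     // WCHAR Name[260] on the native side

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 16)]
    public byte[] Reserved; // BYTE Reserved[16]
}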
Based on the Microsoft documentation for type marshaling:
Marshaling is the process of transforming types when they need to cross between managed and native code. Marshaling is needed because the types in the managed and unmanaged code are different. In managed code, for instance, you have a String, while in the unmanaged world strings can be Unicode ("wide"), non-Unicode, null-terminated, ASCII, etc. By default, the P/Invoke subsystem tries to do the right thing based on the default behavior described in that article. However, for those situations where you need extra control, you can employ the MarshalAs attribute to specify what the expected type is on the unmanaged side. Generally, the runtime tries to do the "right thing" when marshaling, to require the least amount of work from you.
Which types need special handling is explained at the following link from the doc, Blittable and Non-Blittable Types:
Most data types have a common representation in both managed and unmanaged memory and do not require special handling by the interop marshaler. These types are called blittable types because they do not require conversion when passed between managed and unmanaged code.
Non-blittable types would be the answer to your question. You would have to marshal the following: Array, Boolean, Char, class, object, String, value types (structures), delegates, and unmanaged arrays that are either COM-style safe arrays or C-style arrays with fixed or variable length.
Unmanaged structures can also contain embedded arrays or Booleans (non-blittable types). There you have to be careful; according to the doc:
Structures that are returned from platform invoke calls must be blittable types. Platform invoke does not support non-blittable structures as return types.
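To illustrate the distinction, a small sketch with invented types: the first struct is blittable, the second is not and needs [MarshalAs] hints:

using System.Runtime.InteropServices;

// Blittable: identical representation in managed and unmanaged memory.
[StructLayout(LayoutKind.Sequential)]
struct BlittablePoint
{
    public int X;
    public double Y;
}

// Non-blittable: bool and arrays need conversion by the interop marshaler.
[StructLayout(LayoutKind.Sequential)]
struct NonBlittableExample
{
    [MarshalAs(UnmanagedType.Bool)] // 4-byte Win32 BOOL
    public bool Flag;

    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 4)]
    public int[] Values;
}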
I'm after help on how to use complex objects, either as return values or as parameters, with C# class methods exposed to unmanaged C++ as COM components.
Here's why:
I'm working on a project where we have half a dozen unmanaged C++ applications that each directly access the same Microsoft SQL Server database. We want to be able to use MS-Sql/Oracle/MySql with minimum changes and we've decided to implement a business logic plus data layer exposed via WCF services to get the required flexibility.
This strategy hinges on being able to get the unmanaged C++ to interop with the WCF service. There are a number of ways to do this, but the strategy I want to follow is to create a C# assembly exposed as a COM component which will act as a bridge between C++ and the WCF layer. This C# assembly will be loaded into the unmanaged C++ process as a COM component.
The C# bridge assembly will contain a helper class which has a number of methods that describe the operations that were formerly expressed as direct sql or stored proc calls in the C++ code.
I have two problems to solve
1) For an INSERT, I need to pass an object representing the entity to be inserted. On the unmanaged C++ side, I already know that one of the entities has about 40 properties which have to make it into SQL. I don't want a C# method with 40 parameters; I want to pass an object. I don't know how to marshal a C++ object via COM into C#, so I thought about defining a struct on the C# side and then making the struct COM-visible.
2) How to return the result of a "SELECT this, that, other, ... ". I've seen two examples. One returns a struct[] and another returns a single struct containing a string[] for each column field and an int count member describing the length of the other member arrays.
On the C# side, I think it will be a case of defining and exposing a number of request/response structs which will be used to pass data in/out. These structs will need to be decorated with attributes that cause their members not to "change position" as a result of optimization. And the struct members may need to be decorated with the attribute that hints to the marshaller how the member should be exposed in COM.
Then of course I'll have to work out how to instantiate and populate these structs as seen as COM objects from the unmanaged C++, then I'll have to pass them in method calls and process them as return values.
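For what it's worth, a minimal sketch of what one such COM-visible struct might look like on the C# side; all names are invented:

using System.Runtime.InteropServices;

// Sequential layout keeps the members in declared order, and BStr is the
// natural COM representation for the string field.
[ComVisible(true)]
[StructLayout(LayoutKind.Sequential)]
public struct CustomerRecord
{
    [MarshalAs(UnmanagedType.BStr)]
    public string Name;
    public int Age;
}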
This is the most difficult part for me; I grok C++ and some MFC/ATL but COM under C++ is a whole extra level of complexity. Any recommended books, blogs, tutorials on the subject of parameter passing and return value processing as I've described would be very helpful indeed.
If possible, I'd avoid bringing COM into the picture. If you control the C++ code (it sounds like you do), it should be easier to add a single C++/CLI .cpp file that calls into your C# code. C++/CLI can directly access and create both managed and unmanaged types and copy between them.