I am working on a rather large codebase in which C++ functionality is P/Invoked from C#.
There are many calls in our codebase such as...
C++:
extern "C" int __stdcall InvokedFunction(int);
With a corresponding C#:
[DllImport("CPlusPlus.dll", ExactSpelling = true, SetLastError = true, CallingConvention = CallingConvention.Cdecl)]
private static extern int InvokedFunction(IntPtr intArg);
I have scoured the net (insofar as I am capable) for the reasoning behind this apparent mismatch. For example, why is there Cdecl in the C# and __stdcall in the C++? Apparently this results in the stack being cleared twice, but in both cases the arguments are pushed onto the stack in the same reverse order, so I do not see any errors, aside from the possibility that return information is cleared if I attempt a stack trace while debugging.
From MSDN: http://msdn.microsoft.com/en-us/library/2x8kf7zx%28v=vs.100%29.aspx
// explicit DLLImport needed here to use P/Invoke marshalling
[DllImport("msvcrt.dll", EntryPoint = "printf", CallingConvention = CallingConvention::Cdecl, CharSet = CharSet::Ansi)]
// Implicit DLLImport specifying calling convention
extern "C" int __stdcall MessageBeep(int);
Once again, there is both extern "C" in the C++ code, and CallingConvention.Cdecl in the C#. Why is it not CallingConvention.Stdcall? Or, moreover, why is there __stdcall in the C++?
Thanks in advance!
This comes up repeatedly in SO questions, I'll try to turn this into a (long) reference answer. 32-bit code is saddled with a long history of incompatible calling conventions. Choices on how to make a function call that made sense a long time ago but are mostly a giant pain in the rear end today. 64-bit code has only one calling convention, whoever adds another one is going to get sent to a small island in the South Atlantic.
I'll try to annotate that history and the relevance of these conventions beyond what's in the Wikipedia article. The starting point is that the choices to be made in how to make a function call are the order in which to pass the arguments, where to store the arguments, and how to clean up after the call.
__stdcall found its way into Windows programming through the olden 16-bit pascal calling convention, used in 16-bit Windows and OS/2. It is the convention used by all Windows api functions as well as COM. Since most pinvoke was intended to make OS calls, Stdcall is the default if you don't specify it explicitly in the [DllImport] attribute. Its one and only reason for existence is that it specifies that the callee cleans up. Which produces more compact code, very important back in the days when they had to squeeze a GUI operating system in 640 kilobytes of RAM. Its biggest disadvantage is that it is dangerous. A mismatch between what the caller assumes are the arguments for a function and what the callee implemented causes the stack to get imbalanced. Which in turn can cause extremely hard to diagnose crashes.
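Since Stdcall is the default, a plain winapi import needs no explicit convention at all. A minimal sketch, using the real user32 MessageBeep export with the usual type mappings:
// Stdcall is the implicit pinvoke default, so no CallingConvention is named here.
[DllImport("user32.dll")]
static extern bool MessageBeep(uint uType);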
__cdecl is the standard calling convention for code written in the C language. Its prime reason for existence is that it supports making function calls with a variable number of arguments. Common in C code with functions like printf() and scanf(). With the side effect that since it is the caller that knows how many arguments were actually passed, it is the caller that cleans up. Forgetting CallingConvention = CallingConvention.Cdecl in the [DllImport] declaration is a very common bug.
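To illustrate, a minimal sketch of a correct cdecl declaration, using the same msvcrt.dll printf export the MSDN sample in the question uses (declaring a fixed-argument overload of the varargs function is a common simplification):
// printf is __cdecl, so the convention must be stated explicitly.
[DllImport("msvcrt.dll", CallingConvention = CallingConvention.Cdecl, CharSet = CharSet.Ansi)]
static extern int printf(string format, int value);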
__fastcall is a fairly poorly defined calling convention with mutually incompatible choices. It was common in Borland compilers, a company once very influential in compiler technology until they disintegrated. Also the former employer of many Microsoft employees, including Anders Hejlsberg of C# fame. It was invented to make argument passing cheaper by passing some of them through CPU registers instead of the stack. It is not supported in managed code due to the poor standardization.
__thiscall is a calling convention invented for C++ code. Very similar to __cdecl but it also specifies how the hidden this pointer for a class object is passed to instance methods of a class. An extra detail in C++ beyond C. While it looks simple to implement, the .NET pinvoke marshaller does not support it. A major reason that you cannot pinvoke C++ code. The complication is not the calling convention, it is the proper value of the this pointer. Which can get very convoluted due to C++'s support for multiple inheritance. Only a C++ compiler can ever figure out what exactly needs to be passed. And only the exact same C++ compiler that generated the code for the C++ class, different compilers have made different choices on how to implement MI and how to optimize it.
__clrcall is the calling convention for managed code. It is a blend of the other ones, this pointer passing like __thiscall, optimized argument passing like __fastcall, argument order like __cdecl and caller cleanup like __stdcall. The great advantage of managed code is the verifier built into the jitter. Which makes sure that there can never be an incompatibility between caller and callee. Thus allowing the designers to take the advantages of all of these conventions but without the baggage of trouble. An example of how managed code could stay competitive with native code in spite of the overhead of making code safe.
You mention extern "C"; understanding the significance of that is important as well to survive interop. Language compilers often decorate the names of exported functions with extra characters. Also called "name mangling". It is a pretty crappy trick that never stops causing trouble. And you need to understand it to determine the proper values of the CharSet, EntryPoint and ExactSpelling properties of a [DllImport] attribute. There are many conventions:
Windows api decoration. Windows was originally a non-Unicode operating system, using 8-bit encoding for strings. Windows NT was the first one that became Unicode at its core. That caused a rather major compatibility problem, old code would not have been able to run on new operating systems since it would pass 8-bit encoded strings to winapi functions that expect a utf-16 encoded Unicode string. They solved this by writing two versions of every winapi function. One that takes 8-bit strings, another that takes Unicode strings. And distinguished between the two by gluing the letter A at the end of the name of the legacy version (A = Ansi) and a W at the end of the new version (W = wide). Nothing is added if the function doesn't take a string. The pinvoke marshaller handles this automatically without your help, it will simply try to find all 3 possible versions. You should however always specify CharSet.Auto (or Unicode), the overhead of the legacy function translating the string from Ansi to Unicode is unnecessary and lossy.
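For example, a sketch of how the CharSet property drives this lookup, using the real user32 MessageBox function that exists as both MessageBoxA and MessageBoxW:
// CharSet.Auto binds MessageBoxW on NT-based Windows; CharSet.Ansi would bind MessageBoxA.
[DllImport("user32.dll", CharSet = CharSet.Auto)]
static extern int MessageBox(IntPtr hWnd, string text, string caption, uint type);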
The standard decoration for __stdcall functions is _foo@4. A leading underscore and a @n postfix that indicates the combined size of the arguments. This postfix was designed to help solve the nasty stack imbalance problem if the caller and callee don't agree about the number of arguments. Works well, although the error message isn't great, the pinvoke marshaller will tell you that it cannot find the entrypoint. Notable is that Windows, while using __stdcall, does not use this decoration. That was intentional, giving programmers a shot at getting the GetProcAddress() argument right. The pinvoke marshaller also takes care of this automatically, first trying to find the entrypoint with the @n postfix, next trying the one without.
The standard decoration for __cdecl functions is _foo. A single leading underscore. The pinvoke marshaller sorts this out automatically. Sadly, the optional @n postfix for __stdcall does not allow it to tell you that your CallingConvention property is wrong, a great loss.
C++ compilers use name mangling, producing truly bizarre looking names like "??2@YAPAXI@Z", the exported name for "operator new". This was a necessary evil due to its support for function overloading. And due to C++ originally having been designed as a preprocessor that used legacy C language tooling to get the program built. Which made it necessary to distinguish between, say, a void foo(char) and a void foo(int) overload by giving them different names. This is where the extern "C" syntax comes into play, it tells the C++ compiler to not apply the name mangling to the function name. Most programmers who write interop code intentionally use it to make the declaration in the other language easier to write. Which is actually a mistake, the decoration is very useful to catch mismatches. You'd use the linker's .map file or the Dumpbin.exe /exports utility to see the decorated names. The undname.exe SDK utility is very handy to convert a mangled name back to its original C++ declaration.
So this should clear up the properties. You use EntryPoint to give the exact name of the exported function, one that might not be a good match for what you want to call it in your own code, especially for C++ mangled names. And you use ExactSpelling to tell the pinvoke marshaller to not try to find the alternative names because you already gave the correct name.
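A hedged sketch of those two properties in action; the mangled export name and the DLL name below are made up for illustration, you would copy the real name from Dumpbin.exe /exports:
// EntryPoint supplies the decorated C++ name; ExactSpelling stops the
// marshaller from probing for A/W or @n variants of it.
[DllImport("mylib.dll", EntryPoint = "?Compute@@YAHH@Z", ExactSpelling = true,
    CallingConvention = CallingConvention.Cdecl)]
static extern int Compute(int value);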
I'll nurse my writing cramp for a while now. The answer to your question title should be clear, Stdcall is the default but is a mismatch for code written in C or C++. And your [DllImport] declaration is not compatible. This should produce a warning in the debugger from the PInvokeStackImbalance Managed Debugger Assistant, a debugger extension that was designed to detect bad declarations. And can rather randomly crash your code, particularly in the Release build. Make sure you didn't turn the MDA off.
cdecl and stdcall are both valid and usable between C++ and .NET, but they must be consistent between the unmanaged and managed worlds. So your C# declaration for InvokedFunction is invalid; it should be stdcall. The MSDN sample just gives two different examples, one with stdcall (MessageBeep) and one with cdecl (printf). They are unrelated.
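Concretely, one way the question's declaration could be corrected, keeping its original attributes, letting the Stdcall default apply and matching the native int argument instead of IntPtr:
// Stdcall is the pinvoke default, so no CallingConvention needs to be named.
[DllImport("CPlusPlus.dll", ExactSpelling = true, SetLastError = true)]
private static extern int InvokedFunction(int intArg);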
A third-party app I have can call extension DLLs if they have C-compliant interfaces as described below.
I would like the third-party app to call my C# DLL, since I would rather write in C# than C. (Maybe my other choice would be to write a C DLL wrapper to call my C# DLL, which is an extra step that I might not get right).
I've searched the net and SO but haven't found (or recognized) a good match to my question.
The best suggestion seemed to be the special marshalling declarations in the C# code here: Calling a C# DLL from C; linking problems
Is it possible to write a C# DLL that has a C-compliant interface like the one below? If so, could someone tell me what I have to do or point me to some documentation?
The third-party app documentation says, "The DLL function must be a PASCAL function that takes a single LPCSTR (long pointer to a constant string) argument. C or C++ DLL functions called must have prototypes equivalent to:
extern "C" __declspec(dllexport) void __stdcall fn(LPCTSTR szParam );
Take a look at this article: https://www.codeproject.com/Articles/12512/Using-the-CDECL-calling-convention-in-C-changing
It discusses a different problem (opposite to yours) but also explains the linkage of C# functions - by default C# uses __stdcall (and it cannot be overridden). This means your third party app should be able to call the C# function as it stands. Have you tried it? Did you get any errors?
I want to declare a COM Interface in MIDL that allows for returning a pointer (like in the ID3D11Blob). I understand that pointers are a special thing in COM because of the stubs generated for RPC calls. I do not need RPC, but only want to access the COM server from C#. The question is: can I declare the interface in such a way that the C# stub returns an IntPtr? I have tried to add [local] to enable void pointers, but that does not suffice.
The interface should look in MIDL like
[local] void *PeekData(void)
and in C# like
IntPtr PeekData()
Is this possible? If so, how?
Thanks in advance,
Christoph
Edit: To rephrase the question: Why is
HRESULT GetData([in, out, size_is(*size)] BYTE data[], [in, out] ULONG *size);
becoming
void GetData(ref byte, ref uint)
and how can I avoid the first parameter becoming a single byte in C#?
This goes wrong because you imported the COM server declarations from a type library. Type libraries were originally designed to support a sub-set of COM originally called "OLE Automation". Which restricts the kind of types you can use for method arguments. In particular, raw pointers are not permitted. An array must be declared as a SAFEARRAY. Which ensures that the caller can always index an array safely, safe arrays have extra metadata that describes the rank and the lower/upper bounds of the array.
The [size_is] attribute is only understood by MIDL, it is used to create the proxy and the stub for the interface. Knowing how many elements the array contains is also important when it needs to be copied into an interop packet that's sent on the wire to the stub.
Since type libraries don't support a declaration like this, the [size_is] attribute is stripped and the type library importer only sees BYTE*. Which is ambiguous, that can be a byte passed by reference or it can be a pointer to an array of bytes. The importer chooses the former since it has no hope of making an array work, it doesn't know the size of the array. So you get ref byte.
To fix this issue, you have to alter the interop library so you can provide the proper declaration of the method. Which requires the [MarshalAs] attribute to declare the byte[] argument an LPArray with the SizeParamIndex property set, so you can tell the CLR that the array size is determined by the size argument. There are two basic ways to go about it:
Decompile the interop library with ildasm.exe, modify the .il file and put it back together with ilasm.exe. You'd use a sample C# declaration that you look at with ildasm.exe to know how to edit the IL. This is the approach that Microsoft recommends.
Use a good decompiler that can decompile IL back to C#. Reflector and ILSpy are popular. Copy/paste the generated code into a source file of your project and edit the method, applying the [MarshalAs] attribute. The advantage is that editing is easier and you no longer have a dependency on the interop library.
In either case, you want to make sure that the COM server is stable so you don't have to do this very often. If it is not then modifying the server itself is highly recommended, use a safe array.
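For illustration, a hedged sketch of what the edited method could look like after decompiling; the interface name and GUID are placeholders, only the shape of the [MarshalAs] attribute matters:
[ComImport, Guid("00000000-0000-0000-0000-000000000000")]
[InterfaceType(ComInterfaceType.InterfaceIsIUnknown)]
interface IDataSource
{
    // data is marshaled as a C-style array whose element count is taken
    // from the size argument (zero-based parameter index 1).
    void GetData(
        [In, Out, MarshalAs(UnmanagedType.LPArray, SizeParamIndex = 1)] byte[] data,
        ref uint size);
}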
I think I found the solution on http://msdn.microsoft.com/en-gb/library/z6cfh6e6(v=vs.110).aspx#cpcondefaultmarshalingforarraysanchor2: This is the default behaviour for C-style arrays. One can avoid that by using SAFEARRAYs.
I was reading about marshaling, and I'm confused about what these mean in unmanaged code:
HRESULT, DWORD, and HANDLE.
The original text is:
You already know that there is no such compatibility between managed and unmanaged environments. In other words, .NET does not contain such types as HRESULT, DWORD, and HANDLE that exist in the realm of unmanaged code. Therefore, you need to find a .NET substitute or create your own if needed. That is what is called marshaling.
short answer:
it is just telling you that you must "map" one data type used in one programming language to another data type used in a different programming language, and the data types must match.
quick answer:
For this one, the details may not be correct, but the concept is.
These are a few of the data types defined in the Windows header files for C/C++. They are typedefs which "abstract" the primitive data types of C/C++ into more meaningful data types used in Windows programming. For instance, DWORD is really a 32-bit unsigned integer in C/C++, and it remains 32 bits even on 64-bit processors. The idea is to provide an abstraction layer between the data types needed by the processor and the data types used by the language.
During marshalling, this "dword" will be converted to the CLR data type you specify in the DllImport declaration. This is an important point.
Let's say you want to call a Windows API method that takes a DWORD parameter. When declaring this call in C# using DllImport, you must specify the parameter data type as System.UInt32. If you don't, "bad things will happen".
For example, suppose you mistakenly specify the parameter data type as System.UInt64. When the actual call is made, the stack will become corrupted because more bytes are placed on the stack than the API call expects. Which can lead to completely unexpected behavior, such as crashing the application, crashing Windows, invalid return values, or whatever.
That is why it is important to specify the correct data type.
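As a concrete illustration with a real API: kernel32's Sleep() takes a DWORD parameter, so the managed declaration uses uint (System.UInt32):
// Declaring dwMilliseconds as ulong instead would push too many bytes
// onto the stack, exactly the corruption described above.
[DllImport("kernel32.dll")]
static extern void Sleep(uint dwMilliseconds);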
data types in question:
DWORD is defined as a 32-bit unsigned integer, i.e. the CLR type System.UInt32.
HANDLE maps to the CLR types IntPtr, UIntPtr, or HandleRef.
HRESULT maps to System.Int32 or System.UInt32.
References:
Using P/Invoke to Call Unmanaged APIs from Your Managed Classes at http://msdn.microsoft.com/en-us/library/aa719104(v=vs.71).aspx has a table listing the Windows data type with its corresponding CLR data type that specifically answers your question.
Windows Data Types (Windows) at http://msdn.microsoft.com/en-us/library/aa383751(v=VS.85).aspx
.NET Column: Calling Win32 DLLs in C# with P/Invoke at http://msdn.microsoft.com/en-us/magazine/cc164123.aspx
HRESULT: http://en.wikipedia.org/wiki/HRESULT
In the field of computer programming, the HRESULT is a data type used in Windows operating systems, and the earlier IBM/Microsoft OS/2 operating system, to represent error conditions and warning conditions. The original purpose of HRESULTs was to formally lay out ranges of error codes for both public and Microsoft internal use in order to prevent collisions between error codes in different subsystems of the OS/2 operating system. HRESULTs are numerical error codes. Various bits within an HRESULT encode information about the nature of the error code, and where it came from. HRESULT error codes are most commonly encountered in COM programming, where they form the basis for a standardized COM error handling convention.
DWORD: http://en.wikipedia.org/wiki/DWORD#Size_families
HANDLE: http://en.wikipedia.org/wiki/Handle_(computing)
In computer programming, a handle is an abstract reference to a
resource. Handles are used when application software references blocks
of memory or objects managed by another system, such as a database or
an operating system. While a pointer literally contains the address of
the item to which it refers, a handle is an abstraction of a reference
which is managed externally; its opacity allows the referent to be
relocated in memory by the system without invalidating the handle,
which is impossible with pointers. The extra layer of indirection also
increases the control the managing system has over operations
performed on the referent. Typically the handle is an index or a
pointer into a global array of tombstones.
HRESULT, DWORD, and HANDLE are typedefs (i.e., they represent plain data types) defined by Microsoft for use by programmers compiling unmanaged code in a Windows environment. They are defined in a C (or C++) header file provided by Microsoft that is, typically, automatically included in unmanaged Windows projects created within Microsoft Visual Studio.
I'm using the PInvoke stuff in order to make use of the SetupAPI functions from C++. I'm using this to get paths to USB devices conforming to the HID spec. I've got everything working but something I don't understand has me puzzled. Using this structure from the SetupAPI:
typedef struct _SP_DEVICE_INTERFACE_DETAIL_DATA {
DWORD cbSize;
TCHAR DevicePath[ANYSIZE_ARRAY];
} SP_DEVICE_INTERFACE_DETAIL_DATA, *PSP_DEVICE_INTERFACE_DETAIL_DATA;
I don't get the same results as the example code I'm using. First off, I'm using an IntPtr and allocating memory using Marshal.AllocHGlobal() to pass this back and forth. I call SetupDiGetDeviceInterfaceDetail() twice, first to get the size of the buffer I need, and second to actually get the data I'm interested in. I'm looking to get the Path to this device, which is stored in this struct.
The code I'm going off of does this:
IntPtr pDevPath = new IntPtr(pDevInfoDetail.ToInt32() + 4);
string path = Marshal.PtrToStringAuto(pDevPath);
Which apparently works just fine in its original context. But when I did that, the string I got was gibberish. I had to change it to
IntPtr pDevPath = new IntPtr(pDevInfoDetail.ToInt32() + 4);
string path = Marshal.PtrToStringAnsi(pDevPath);
to make it work. Why is this? Am I missing some setting for the project/solution that informs this beast how to treat strings and chars? So far, the MSDN article for PtrToStringAuto() doesn't tell me much about it. In fact, it looks like this method should have made the appropriate decision, called either the Unicode or Ansi version for my needs, and all would be well.
Please explain.
First of all, +10000 on using a real P/Invoke interop type and not marshalling data by hand. But since you asked, here's what's going on with your strings.
The runtime decides how to treat strings and chars on a per-case basis, based on the attributes you apply to your interop declarations, the context in which you use interop, the methods you call, etc. Every type of P/Invoke declaration (extern method, delegate, or structure) allows you to specify the default character size for the scope of that definition. There are three options:
Use CharSet.Ansi, which converts the managed Unicode strings to 8-bit characters
Use CharSet.Unicode, which passes the string data as 16-bit characters
Use CharSet.Auto, which decides at runtime, based on the host OS, which one to use.
In general, I hate CharSet.Auto because it's mostly pointless. Since the Framework doesn't even support Windows 95, the only time "Auto" doesn't mean "Unicode" is when running on Windows 98. But there's a bigger problem here, which is that the runtime's decision on how to marshal strings happens at the "wrong time".
The unmanaged code you are calling made that decision at compile time, since the compiler had to decide whether TCHAR meant char or wchar_t -- that decision is based on the presence of the _UNICODE preprocessor macro. That means that, for most libraries, it's going to always use one or the other, and there's no point in letting the CLR "pick one".
For Windows system components, things are a bit better, because the Unicode-aware builds actually include two versions of most system functions. The Setup API, for example, has two functions: SetupDiGetDeviceInterfaceDetailA and SetupDiGetDeviceInterfaceDetailW. The *A version uses 8-bit "ANSI" strings and the *W version uses 16-bit wide "Unicode" strings. It similarly has ANSI and Wide versions of any structure that contains a string.
This is the kind of situation where CharSet.Auto shines, assuming you use it properly. When you apply a DllImport to a function, you can specify the character set. If you specify Ansi for the character set and the runtime doesn't find an exact match to your function name, it appends the A and tries again. (Oddly, if you specify Unicode, it will call the *W function first, and only try an exact match if that fails.)
Here's the catch: if you don't specify a character set on your DllImport, the default is CharSet.Ansi. This means you are going to get the ANSI version of the function, unless you specifically override the charset. That's most likely what is happening here: you are calling the ANSI version of SetupDiGetDeviceInterfaceDetail by default, and thus getting an ANSI string back, but PtrToStringAuto wants to use Unicode because you're probably running at least Windows XP.
The BEST option, assuming we can ignore Windows 98, would be to specify CharSet.Unicode all over the place, since SetupAPI supports it, but at the very least, you need to specify the same CharSet value everywhere.
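A hedged sketch of how that declaration could look with the character set stated explicitly; the parameter types are simplified to IntPtr since the question already fills the detail buffer by hand:
// CharSet.Unicode makes the marshaller bind SetupDiGetDeviceInterfaceDetailW,
// so the DevicePath in the returned buffer is utf-16 and PtrToStringUni
// reads it correctly.
[DllImport("setupapi.dll", CharSet = CharSet.Unicode, SetLastError = true)]
static extern bool SetupDiGetDeviceInterfaceDetail(
    IntPtr deviceInfoSet,
    IntPtr deviceInterfaceData,        // SP_DEVICE_INTERFACE_DATA*
    IntPtr deviceInterfaceDetailData,  // buffer from Marshal.AllocHGlobal
    uint deviceInterfaceDetailDataSize,
    out uint requiredSize,
    IntPtr deviceInfoData);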
I have written a VC++ dll. The declaration for one of the methods in the dll is as follows:
extern "C" _declspec(dllexport)
void startIt(int number)
{
capture = cvCaptureFromCAM(number);
}
I use this dll in a C# code using P/Invoke. I make the declaration as:
[DllImport("Tracking.dll", EntryPoint = "startIt")]
public extern static void startIt(int number);
and I call the function in the code as:
startIt(0);
Now, when this line is encountered, the compiler is throwing me this error:
A call to PInvoke function 'UsingTracking!UsingTracking.Form1::startIt' has
unbalanced the stack. This is likely because the managed PInvoke signature does
not match the unmanaged target signature. Check that the calling convention
and parameters of the PInvoke signature match the target unmanaged signature.
I cannot understand why it is throwing this error, as the signatures in the managed and unmanaged code are the same. Moreover, on another machine of mine, the same code runs perfectly in Visual Studio. This makes me think the error thrown is misleading.
Please help.
Thanks
When you p/invoke an external function, the calling convention used defaults to __stdcall. Since your function uses the __cdecl convention, you need to declare it as such:
[DllImport("Tracking.dll", EntryPoint = "startIt",
CallingConvention = CallingConvention.Cdecl)]
public extern static void startIt(int number);
Could you be missing CallingConvention=CallingConvention.Cdecl in your DllImport attribute?
Constantin and Frederic Hamidi have answered this question correctly as to how to fix this problem. This can help avoid an eventual stack overflow. I have been bitten by this several times myself. What is really at play here is that .NET 4 has enabled a managed debugging assistant for debug (not release) builds on 32-bit x86 machines (not 64-bit) that checks for an incorrectly specified p/invoke call. This MSDN article details it: http://msdn.microsoft.com/en-us/library/0htdy0k3.aspx. Stephen Cleary deserves the credit for identifying this in this post: pinvokestackimbalance -- how can I fix this or turn it off?