Casting IntPtr to int only works sometimes - c#

Consider this code:
IntPtr p = (IntPtr) (long.MaxValue); // Not a valid ptr in 32 bit,
// but this is to demonstrate the exception for 64 bit
Console.WriteLine((int)(long)p);
Console.WriteLine((int)p);
The second WriteLine throws an OverflowException when the code is compiled and run as a 64-bit process. This is documented here.
My question is: why?
When converting a pointer to an Int32 you conceptually lose all pointer semantics and reduce the pointer to just its integer representation. Then why throw an exception instead of truncating the value to fit inside an integer? That would be the most sensible thing to do, imho. Because if the programmer really wanted to avoid truncation at all costs, why put it into an int in the first place?
Is this design on purpose, or is this an incomplete implementation of the conversion operator?
I feel some clarification is necessary after reading the first comments. My question can also be read as follows:
Is there a realistic use case where you would put an IntPtr into an int and later translate it back to an IntPtr, such that it is still valid after that?
As a reply to people asking for use cases, let me give the only use case for putting an IntPtr into an int that I can come up with: Implementing GetHashCode for a managed wrapper object around an unmanaged object.
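That GetHashCode use case can be sketched as follows. This is a minimal illustration, not from any real API; the NativeResource class and its members are made-up names:

```csharp
using System;

var a = new NativeResource(new IntPtr(0x1234));
var b = new NativeResource(new IntPtr(0x1234));
Console.WriteLine(a.GetHashCode() == b.GetHashCode()); // True: equal handles hash equally

// Hypothetical managed wrapper around an unmanaged handle.
class NativeResource
{
    private readonly IntPtr _handle;

    public NativeResource(IntPtr handle) => _handle = handle;

    // Hash codes only need to be stable, not unique, so folding a 64-bit
    // pointer down to 32 bits is acceptable here. IntPtr.GetHashCode does
    // that fold itself and never throws an OverflowException, which is why
    // forwarding to it is safer than writing (int)_handle.
    public override int GetHashCode() => _handle.GetHashCode();
}
```

Forwarding to IntPtr.GetHashCode sidesteps the checked conversion entirely, because truncation is exactly what a hash code is allowed to do.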

The implementation of the cast operator for IntPtr to int is as follows (from ReferenceSource)
[System.Security.SecuritySafeCritical] // auto-generated
[System.Runtime.Versioning.NonVersionable]
public unsafe static explicit operator int (IntPtr value)
{
#if WIN32
return (int)value.m_value;
#else
long l = (long)value.m_value;
return checked((int)l);
#endif
}
As you can see, on 64-bit it explicitly uses checked to force an exception on overflow. (And I'm very glad that it does!)
Conversely, when you cast it to a long first and then cast to an int yourself, by default it will be done unchecked and therefore no exception will be thrown.
I can only guess at the reason they did this, but it seems fairly obvious that truncating a pointer is almost always A Bad Thing, so they decided that the code should throw an exception if truncation happens.
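The difference between the two casts can be reproduced in a few lines. A plain long-to-int cast is unchecked by default, while wrapping it in checked() mirrors what the IntPtr operator does in a 64-bit process:

```csharp
using System;

long big = long.MaxValue;

// A plain cast from long to int is unchecked by default and silently truncates.
int truncated = (int)big;
Console.WriteLine(truncated); // -1: the low 32 bits of 0x7FFFFFFFFFFFFFFF are all ones

// Wrapping the cast in checked() reproduces what the IntPtr-to-int
// conversion operator does in a 64-bit process.
try
{
    int never = checked((int)big);
    Console.WriteLine(never); // not reached
}
catch (OverflowException)
{
    Console.WriteLine("OverflowException, just like (int)p on 64 bit");
}
```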


Regarding an implementation detail of dotnet/runtime

Recently I've been enjoying reading parts of corefx (which recently moved to the dotnet/runtime repo), and I came across the Array.Copy method:
public static void Copy(Array sourceArray, long sourceIndex, Array destinationArray, long destinationIndex, long length)
{
int isourceIndex = (int)sourceIndex;
int idestinationIndex = (int)destinationIndex;
int ilength = (int)length;
if (sourceIndex != isourceIndex)
ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.sourceIndex, ExceptionResource.ArgumentOutOfRange_HugeArrayNotSupported);
if (destinationIndex != idestinationIndex)
ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.destinationIndex, ExceptionResource.ArgumentOutOfRange_HugeArrayNotSupported);
if (length != ilength)
ThrowHelper.ThrowArgumentOutOfRangeException(ExceptionArgument.length, ExceptionResource.ArgumentOutOfRange_HugeArrayNotSupported);
Copy(sourceArray, isourceIndex, destinationArray, idestinationIndex, ilength);
}
The thing that caught my eye is that the method converts the long arguments to int (discarding the most significant bits), compares them to the original arguments, and throws an exception if they're not equal. In other words, the exception is thrown if and only if any of these long arguments is greater than int.MaxValue or less than int.MinValue. The question is: why are those arguments long in the first place? Why not just make them int?
Only Microsoft can answer the "why?" question definitively. That said…
That method, and others like it, are provided as a convenience. There are situations where one might have a long value representing the region in the array(s) to be copied. For example, working with unmanaged memory, later copied piece-wise to a managed array.
Without such overloads, the caller would have to do the same work to have the same degree of safety, and quite possibly would do it incorrectly (e.g. just cast the value without checking for overflow).
By providing the long-parameter overloads, the framework offers both convenience and better resistance to client-code bugs. This way the client can still use long values when appropriate, and pass them directly to the framework API without having to do extra work to convert them properly.
The reason for this is that the API defines it as long to support a big range of values but this implementation does not support copying “huge arrays” so it throws an exception. It is not defined in the API that “huge arrays” shouldn’t be supported, it’s just the implementation. Other implementations might support any array size a long can define, or this implementation could be changed to support it. If the type were int then a breaking API change would be needed.
The .NET APIs usually handle lengths and sizes with longs anyway, so there's no need to change them if a need for huge values ever arises.
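The validate-by-round-trip pattern Array.Copy uses can be extracted into a standalone sketch. CheckedIndex is a made-up helper name for illustration:

```csharp
using System;

// Truncate-then-compare, the same pattern Array.Copy applies to each long argument.
Console.WriteLine(CheckedIndex(42L, "length")); // 42

try
{
    CheckedIndex((long)int.MaxValue + 1, "length");
}
catch (ArgumentOutOfRangeException)
{
    Console.WriteLine("rejected: value does not fit in an int");
}

static int CheckedIndex(long value, string paramName)
{
    // Truncate first, then compare: if the high 32 bits carried any
    // information, the round-tripped value differs from the original.
    int truncated = unchecked((int)value);
    if (value != truncated)
        throw new ArgumentOutOfRangeException(paramName, "Huge arrays are not supported.");
    return truncated;
}
```

Note that the comparison catches both values above int.MaxValue and values below int.MinValue, because truncation changes the numeric value in either case.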

Marshaling an array of boolean vs marshaling a single boolean (defined as int) to bool in C#

In a C API I have BOOL defined as follows
#ifndef BOOL
#define BOOL int
#endif
And I have a struct which, among others, has a simple BOOL member and an array of BOOLs
struct SomeStruct
{
BOOL bIsSomething;
BOOL bHasSomething[5];
};
Now I found out that when I want to cast the whole struct I have to marshal them differently:
the single BOOL I marshal with I1 and the fixed length array I have to marshal with I4 (if I don't their struct sizes won't match and I will have problems extracting an array of these structs into C#):
[StructLayout(LayoutKind.Sequential)]
public struct SomenNativeStruct
{
[MarshalAs(UnmanagedType.I1)]
public bool bIsSomething;
[MarshalAs(UnmanagedType.ByValArray, ArraySubType = UnmanagedType.I4, SizeConst = 5)]
public bool[] bHasSomething;
}
I suspect I do something wrong because I'm not sure why I should need to marshal the same type differently depending on whether I get it as a fixed size array or as a single member.
If I'm marshalling them all as I4 I get a System.ArgumentException
An unhandled exception of type 'System.ArgumentException' occurred in SomeDll.dll
Additional information: Type 'Namespace.Document+SomeNativeStruct' cannot be marshaled as an unmanaged structure; no meaningful size or offset can be computed.
bool is a tricky type for interop. There are many mutually incompatible definitions of what a boolean value is, so bool is considered a non-blittable type - that is, it needs to be truly marshalled, rather than just sticking a "totally a bool" tag on the data. And arrays of non-blittable types are doubly tricky.
The simplest solution would be to avoid using bool entirely. Just replace the bool[] with int[], and provided the original type is actually a 32-bit int (depends on the compiler and platform), you'll get correct interop. You can then manually copy the interop struct to a managed struct with a more sane layout, if you so choose - which also gives you full control over interpreting which int values correspond to true and false, respectively.
In general, native interop is always tricky; you need to have a good understanding of the actual memory layout as well as the meaning of the values and types you're dealing with. The types aren't enough - they're too ambiguous, especially in standard C (which is often the standard for native interop even today). Headers aren't enough - you also need the docs, and perhaps even a look in a (native) debugger.
Extra danger comes from the fact that there's no safety net that tells you you're doing things somewhat wrong - the wrong interop approach can appear to work just fine for years, and then suddenly blow up in your face when e.g. a true value happens to be 42 instead of the more usual -1, and your bitwise arithmetic breaks subtly (this can actually happen in C#, if you use unsafe code). Everything might work great for values smaller than 32768, and then break horribly. There are plenty of hard-to-catch error cases, so you need extra caution.
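The suggested int[] replacement might look like the sketch below. The struct and method names are illustrative, and the conversion to a managed view is the hand-written copy step the answer describes:

```csharp
using System;
using System.Runtime.InteropServices;

var native = new SomeNativeStruct
{
    bIsSomething = 42,                       // C treats any nonzero BOOL as true
    bHasSomething = new[] { 0, 1, -1, 0, 7 }
};
var managed = SomeManagedStruct.From(native);
Console.WriteLine(managed.IsSomething);      // True
Console.WriteLine(managed.HasSomething[2]);  // True

// Interop mirror of the native struct with every BOOL kept as a 32-bit int,
// so there is no ambiguity about field size or true/false encoding.
[StructLayout(LayoutKind.Sequential)]
struct SomeNativeStruct
{
    public int bIsSomething;
    [MarshalAs(UnmanagedType.ByValArray, SizeConst = 5)]
    public int[] bHasSomething;
}

// Hand-converted managed view with a saner layout; you control exactly
// which int values count as true.
struct SomeManagedStruct
{
    public bool IsSomething;
    public bool[] HasSomething;

    public static SomeManagedStruct From(SomeNativeStruct n) => new SomeManagedStruct
    {
        IsSomething = n.bIsSomething != 0,
        HasSomething = Array.ConvertAll(n.bHasSomething, v => v != 0),
    };
}
```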

Safe way to cast (V)C++ long* to C# Int32*?

I'm currently writing a DLL in C++/CLI which will act as "proxy" between an unmanaged program and another C# DLL. The calling program requires my "proxy DLL" to implement various procedures that will be called by the unmanaged program. So far, no problem.
But: One of the functions has the following prototype:
extern "C" __declspec ( dllexport ) long Execute(unsigned long command, long nInBytes, byte bInData[], long nOutBytes, long* pnUsedOutBytes, byte bOutData[])
Well, my proxy DLL simply calls the C# DLL which provides the following function prototype (which was also given by the documentation of the calling program):
unsafe public UInt32 Execute(UInt32 command, Int32 nInBytes, byte* pInData, Int32 nOutBytes, Int32* pnUsedOutBytes, byte* pOutData);
The compiler raises an error (C2664) at parameter 5, pnUsedOutBytes, telling me that long* cannot be converted to int*. Well, OK: long and int currently have the same representation, which might change at some point in the future, so the error is understandable (though, oddly, the non-pointer long parameters do not produce an error).
Back to the actual question: What is the best solution to call my C# function? I've already read that (of course) the best solution is to use .NET types when calling a .NET function. So: Is it safe to do a simple type casting when calling the function or might there by any bad circumstance where this type cast will not work?
Using this line calms down the compiler, but is it really safe?
curInstance->Execute(command, nInBytes, pInData, nOutBytes, (System::Int32*)pnUsedOutBytes, pOutData);
Thanks in advance!
No, don't use that cast. Suppose the actual value that pnUsedOutBytes points at was greater than 2^32. Best case, the call to Execute would overwrite the low bytes and leave the bits above 32 alone, resulting in a wrong answer.
The solution is to call Execute with a pointer to a 32-bit data type. Create one in your proxy, give it a sensible starting value if needed, make the call, and copy the resulting value into the long that pnUsedOutBytes points to.
Oh, and don't paraphrase error messages. The error message did not say that you can't cast long* to int*; you can. What it almost certainly said is that the compiler can't convert long* to int*. That's correct: there is no implicit conversion between the two types. Adding a cast tells the compiler to do it; with that you have an explicit conversion.
The easiest solution is just to fix the signature of the exported function:
extern "C" __declspec ( dllexport ) int32_t Execute(uint32_t command, int32_t nInBytes, byte bInData[], int32_t nOutBytes, int32_t* pnUsedOutBytes, byte bOutData[])
LoadLibrary will give you no grief whatsoever about the difference between int32_t and int and long, since they are all 32-bit integral types.
(Actually, LoadLibrary won't give you any grief for a bunch of actual errors either, ... but in this case you aren't using an incompatible type)

If Int32 is just an alias for int, how can the Int32 class use an int?

I've been browsing through the .NET Framework Reference Source, just for the fun of it, and found something I don't understand.
There is an Int32.cs file with the C# code for the Int32 type. Somehow that seems strange to me: how does the C# compiler compile the code for the Int32 type?
public struct Int32: IComparable, IFormattable, IConvertible {
internal int m_value;
// ...
}
But isn't this illegal in C#? If int is only an alias for Int32, it should fail to compile with Error CS0523:
Struct member 'struct2 field' of type 'struct1' causes a cycle in the struct layout.
Is there some magic in the compiler, or am I completely off track?
isn't this illegal in C#? If "int" is only alias for "Int32" it should fail to compile with error CS0523. Is there some magic in the compiler?
Yes; the error is deliberately suppressed in the compiler. The cycle checker is skipped entirely if the type in question is a built-in type.
Normally this sort of thing is illegal:
struct S { S s; int i; }
In that case the size of S is undefined because whatever the size of S is, it must be equal to itself plus the size of an int. There is no such size.
struct S { S s; }
In that case we have no information from which to deduce the size of S.
struct Int32 { Int32 i; }
But in this case the compiler knows ahead of time that System.Int32 is four bytes because it is a very special type.
Incidentally, the details of how the C# compiler (and, for that matter, the CLR) determines when a set of struct types is cyclic is extremely interesting. I'll try to write a blog article about that at some point.
int is an alias for Int32, but the Int32 struct you are looking at is simply metadata, it is not a real object. The int m_value declaration is possibly there only to give the struct the appropriate size, because it is never actually referenced anywhere else (which is why it is allowed to be there).
So, in other words, the compiler kind of saves this from being a problem. There is a discussion on the topic in the MSDN Forums.
From the discussion, here is a quote from the chosen answer that helps to try to determine how the declaration is possible:
while it is true that the type contains an integer m_value field - the
field is never referenced. In every supporting method (CompareTo,
ToString, etc), "this" is used instead. It is possible that the
m_value fields only exist to force the structures to have the
appropriate size.
I suspect that when the compiler sees "int", it translates it into "a
reference to System.Int32 in mscorlib.dll, to be resolved later", and
since it's building mscorlib.dll, it does end up with a cyclical
reference (but not one that can ever cause problems, because m_value
is never used). If this assumption is correct, then this trick would
only work for special compiler types.
Reading further, it can be determined that the struct is simply metadata, not a real object, so it is not bound by the same recursive definition constraints.

Difference between IntPtr and UIntPtr

I was looking at the P/Invoke declaration of RegOpenKeyEx when I noticed this comment on the page:
Changed IntPtr to UIntPtr: When invoking with IntPtr for the handles, you will run into an Overflow. UIntPtr is the right choice if you wish this to work correctly on 32 and 64 bit platforms.
This doesn't make much sense to me: both IntPtr and UIntPtr are supposed to represent pointers so their size should match the bitness of the OS - either 32 bits or 64 bits. Since these are not numbers but pointers, their signed numeric values shouldn't matter, only the bits that represent the address they point to. I cannot think of any reason why there would be a difference between these two but this comment made me uncertain.
Is there a specific reason to use UIntPtr instead of IntPtr? According to the documentation:
The IntPtr type is CLS-compliant, while the UIntPtr type is not. Only the IntPtr type is used in the common language runtime. The UIntPtr type is provided mostly to maintain architectural symmetry with the IntPtr type.
This, of course, implies that there's no difference (as long as someone doesn't try to convert the values to integers). So is the above comment from pinvoke.net incorrect?
Edit:
After reading MarkH's answer, I did a bit of checking and found out that .NET applications are not large address aware and can only handle a 2GB virtual address space when compiled in 32-bit mode. (One can use a hack to turn on the large address aware flag but MarkH's answer shows that checks inside the .NET Framework will break things because the address space is assumed to be only 2GB, not 3GB.)
This means that all correct virtual memory addresses a pointer can have (as far as the .NET Framework is concerned) will be between 0x00000000 and 0x7FFFFFFF. When this range is translated to signed int, no values would be negative because the highest bit is not set. This reinforces my belief that there's no difference in using IntPtr vs UIntPtr. Is my reasoning correct?
Fermat2357 pointed out that the above edit is wrong.
UIntPtr and IntPtr are internally implemented as
private unsafe void* m_value;
You are right that both simply manage the bits that represent an address.
The only place where I can think of an overflow issue is pointer arithmetic. Both types support adding and subtracting offsets, but even in this case the binary representation will be correct after such an operation.
From my experience I would prefer UIntPtr, because I think of a pointer as an unsigned object. But this is not relevant and only my opinion.
It seems to make no difference whether you use IntPtr or UIntPtr in your case.
EDIT:
IntPtr is CLS-compliant because there are languages on top of the CLR which do not support unsigned types.
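The pointer-arithmetic point above can be sketched in a few lines; the addresses here are arbitrary literals, not real pointers:

```csharp
using System;

// IntPtr.Add and UIntPtr.Add operate on the raw address bits, so the result
// is the same regardless of whether the address is viewed as signed or unsigned.
IntPtr p = new IntPtr(0x1000);
IntPtr q = IntPtr.Add(p, 0x10);
Console.WriteLine(q.ToInt64()); // 4112 (0x1010)

UIntPtr u = new UIntPtr(0x1000u);
UIntPtr v = UIntPtr.Add(u, 0x10); // note: the offset parameter is a signed int
Console.WriteLine(v.ToUInt64()); // 4112 (0x1010)
```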
This, of course, implies that there's no difference (as long as someone doesn't try to convert the values to integers).
Unfortunately, the framework attempts to do precisely this (when compiled specifically for x86). Both the IntPtr(long) constructor and the ToInt32() method attempt to cast the value to an int in a checked expression. Here's the implementation, as seen when using the framework debugging symbols.
public unsafe IntPtr(long value)
{
#if WIN32
m_value = (void *)checked((int)value);
#else
m_value = (void *)value;
#endif
}
Of course, the checked expression will throw the exception if the value is out of bounds. The UIntPtr doesn't overflow for the same value, because it attempts to cast to uint instead.
The difference between IntPtr and UIntPtr is the same as the difference between Int32 and UInt32 (i.e., it's all about how the numbers get interpreted).
Usually, it doesn't matter which you choose, but, as mentioned, in some cases, it can come back to bite you.
(I'm not sure why MS chose IntPtr to begin with (CLS compliance, etc.). Memory is handled as a DWORD (u32), meaning it's unsigned, so the preferred type should be UIntPtr, not IntPtr, right?)
Even: UIntPtr.Add()
Seems wrong to me: it takes a UIntPtr as the pointer and an 'int' for the offset, when, to me, 'uint' would make much more sense. Why feed a signed value to an unsigned method, when more than likely the code casts it to 'uint' under the hood? /facepalm
I would personally prefer UIntPtr over IntPtr simply because the unsigned values match the values of the underlying memory which I'm working with. :)
Btw, I'll likely end up creating my own pointer type (using UInt32), built specifically for working directly with memory. (I'm guessing that UIntPtr isn't going to catch all the possible bad-memory values, e.g. 0xBADF00D, which is a CTD waiting to happen... I'll have to see how the built-in type handles things first; hopefully a null/zero check properly filters out stuff like this.)
