Why can't I compare two IntPtrs with < and >? - c#

I'm currently having a problem with unsafe pointers that appears to be a compiler bug.
Note: the problem is not the fact that I am using pointers and unsafe code; that code works pretty well. The problem is a confirmed compiler bug that refuses to compile legal code under certain circumstances. If you're interested in that problem, see my other question.
Since my problem is with the declaration of pointer variables, I have decided to work around it by using IntPtr instead and casting to actual pointers when needed.
However, I noticed that I cannot do something like this:
IntPtr a = something;
IntPtr b = somethingElse;
if (a > b) // ERROR. Can't do this
{
}
The > and < operators don't seem to be defined for IntPtr. Notice that I can indeed compare two actual pointers.
IntPtr has a .ToInt64() method. However, this returns a signed value, which may produce incorrect results when comparing with > and < if both positive and negative values are involved.
To be honest, I don't really understand what use there is for a .ToInt64() method that returns a signed value, considering that pointer comparisons are performed unsigned, but that's not my question.
One could argue that IntPtrs are opaque handles, and that it is therefore meaningless to compare them with > and <. However, I would point out that IntPtr has addition and subtraction methods, which means there is actually a notion of order for IntPtr, and therefore > and < are indeed meaningful.
I guess I could cast the result of ToInt64() to a ulong and then compare, or cast the IntPtr to a pointer and then do the comparison, but it makes me wonder why > and < aren't defined for IntPtr.
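For example, here is a minimal sketch of the pointer-cast workaround I have in mind (the helper name is mine; it needs an unsafe context and the /unsafe compiler switch), relying on the fact that raw pointer comparisons are unsigned:

using System;

static class PointerCompare
{
    // Compares two IntPtr values as raw, unsigned addresses.
    public static unsafe bool IsGreater(IntPtr a, IntPtr b)
    {
        // Pointer comparison in C# is unsigned, unlike comparing ToInt64() results.
        return a.ToPointer() > b.ToPointer();
    }
}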
Why can't I compare two IntPtrs directly?

IntPtr has always been a little neglected. Until .NET 4.0 there weren't even the Add/operator+ and Subtract/operator-.
Now... if you want to compare two pointers, cast them to long if they are IntPtr or to ulong if they are UIntPtr. Note that on Windows you'll need a UIntPtr only if you are running a 32-bit program with the /3GB option, because otherwise a 32-bit program can only use the lower 2 GB of address space, while 64-bit programs use far less than 64 bits of address space (48 bits at the time of writing).
Clearly if you are doing kernel programming in .NET this changes :-) (I'm joking here, I hope :-) )
For the reason why IntPtr is preferred to UIntPtr, see: https://msdn.microsoft.com/en-us/library/system.intptr%28v=vs.110%29.aspx
The IntPtr type is CLS-compliant, while the UIntPtr type is not. Only the IntPtr type is used in the common language runtime. The UIntPtr type is provided mostly to maintain architectural symmetry with the IntPtr type.
There are some languages that don't have the distinction between signed and unsigned types. .NET wanted to support them.
I did some tests after marking the executable large-address-aware with
editbin /LARGEADDRESSAWARE myprogram.exe
(I was even able to crash my graphics adapter :-) )
// Requires: using System; using System.Collections.Generic; using System.Runtime.InteropServices;
static void Main(string[] args)
{
    Console.WriteLine("Is 64 bits: {0}", Environment.Is64BitProcess);
    const int memory = 128 * 1024;
    var lst = new List<IntPtr>(16384); // More than necessary
    while (true)
    {
        Console.Write("{0} ", lst.Count);
        IntPtr ptr = Marshal.AllocCoTaskMem(memory);
        //IntPtr ptr = Marshal.AllocHGlobal(memory);
        lst.Add(ptr);
        if ((long)ptr < 0)
        {
            Console.WriteLine("\nptr #{0} ({1}, {2}) is < 0", lst.Count, ptr, IntPtrToUintPtr(ptr));
        }
    }
}
I was able to allocate nearly 4 GB of memory in a 32-bit program (on a 64-bit OS), so I did get negative IntPtr values.
And here is a cast from IntPtr to UIntPtr:
public static UIntPtr IntPtrToUintPtr(IntPtr ptr)
{
    if (IntPtr.Size == 4)
    {
        // 32-bit: truncate to uint first so the value is not sign-extended.
        return unchecked((UIntPtr)(uint)(int)ptr);
    }

    return unchecked((UIntPtr)(ulong)(long)ptr);
}
Note that, thanks to how sign extension works, you can't simply do (UIntPtr)(ulong)(long)ptr in all cases, because at 32 bits the (long) cast sign-extends the pointer and the conversion breaks.
But note that few programs really support > 2 gb on 32 bits... http://blogs.msdn.com/b/oldnewthing/archive/2004/08/12/213468.aspx

Comparing IntPtr values is very, very dangerous, and that is the core reason why the C# language disallows it, even though the CLR has no problem with it.
IntPtr is frequently used to store an unmanaged pointer value. The big problem is that pointer values are not signed values; only UIntPtr is an appropriate managed type to store them. The big problem with UIntPtr is that it is not a CLS-compliant type. Lots of languages don't support unsigned types; Java, JScript and early versions of VB.NET are examples. All the framework methods therefore use IntPtr.
It is especially nasty because it often works without a problem. In 32-bit versions of Windows, the upper 2 GB of the address space is reserved for the operating system, so all pointer values used in a program are always <= 0x7FFFFFFF. That works just fine in an IntPtr.
But that is not true in every case. You could be running as a 32-bit app in the wow64 emulator on a 64-bit operating system. That upper 2 GB of address space is no longer needed by the OS so you get a 4 GB address space. Which is very nice, 32-bit code often skirts OOM these days. Pointer values now do get >= 0x80000000. And now the IntPtr comparison fails in completely random cases. A value of, say, 0x80000100 actually is larger than 0x7FFFFE00, but the comparison will say it is smaller. Not good. And it doesn't happen very often, pointer values tend to be similar. And it is quite random, actual pointer values are highly unpredictable.
That is a bug that nobody can diagnose.
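To make that concrete, here is a small sketch (using hard-coded example addresses instead of real allocations) of the signed comparison giving the wrong answer:

using System;

class SignedCompareBug
{
    static void Main()
    {
        // An address above the 2 GB line and one just below it.
        IntPtr high = (IntPtr)unchecked((int)0x80000100);
        IntPtr low = (IntPtr)0x7FFFFE00;

        // Viewed as signed numbers, the "high" address looks negative...
        Console.WriteLine((long)high);                       // -2147483392
        Console.WriteLine((long)high < (long)low);           // True - wrong for addresses

        // ...while the unsigned view preserves the real address order.
        Console.WriteLine((uint)(int)high > (uint)(int)low); // True - correct
    }
}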
Programmers that use C or C++ can easily make this mistake as well, their language doesn't stop them. Microsoft came up with another way to avoid that misery, such a program must be linked with a specific linker option to get more than 2 GB of address space.

IMHO, IntPtr wasn't designed for purposes like greater-than/less-than comparison. It is a structure that stores a memory address and may be tested only for equality. You should not reason about the relative position of anything in memory (which is managed by the CLI). It's like asking which of two IP addresses on the Internet is "higher".

Related

Opposite behavior of Marshal.SizeOf and sizeof operator for boolean and char data types in C#

I was comparing the Marshal.SizeOf API with the sizeof operator in C#. Their outputs for the char and bool data types are a little surprising. Here are the results:
For Boolean:
Marshal.SizeOf = 4
sizeof = 1
For char:
Marshal.SizeOf = 1
sizeof = 2
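A minimal snippet along these lines reproduces the numbers above (sizeof(bool) and sizeof(char) are allowed even in safe code because they are compile-time constants):

using System;
using System.Runtime.InteropServices;

class SizeComparison
{
    static void Main()
    {
        Console.WriteLine("Marshal.SizeOf(bool) = {0}", Marshal.SizeOf(typeof(bool))); // 4
        Console.WriteLine("sizeof(bool)         = {0}", sizeof(bool));                 // 1
        Console.WriteLine("Marshal.SizeOf(char) = {0}", Marshal.SizeOf(typeof(char))); // 1
        Console.WriteLine("sizeof(char)         = {0}", sizeof(char));                 // 2
    }
}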
On this link from MSDN I got the following text:
For all other types, including structs, the sizeof operator can be used only in unsafe code blocks. Although you can use the Marshal.SizeOf method, the value returned by this method is not always the same as the value returned by sizeof. Marshal.SizeOf returns the size after the type has been marshaled, whereas sizeof returns the size as it has been allocated by the common language runtime, including any padding.
I do not know a lot about the technicalities of marshaling, but it has something to do with run-time heuristics when things change. Going by that logic, for bool the size changes from 1 to 4. But for char it is just the reverse (from 2 to 1), which threw me. I thought the size for char should also increase, the way it did for bool. Can someone help me understand these conflicting behaviors?
Sorry, you really do have to consider the technicalities to make sense of these choices. The target language for pinvoke is the C language, a very old language by modern standards with a lot of history and used in a lot of different machine architectures. It makes very few assumptions about the size of a type, the notion of a byte does not exist. Which made the language very easy to port to the kind of machines that were common back when C was invented and the unusual architectures used in super-computers and digital signal processors.
C did not originally have a bool type. Logical expressions instead use int where a value of 0 represents false and any other value represents true. Also carried forward into the winapi, it does use a BOOL type which is an alias for int. So 4 was the logical choice. But not a universal choice and you have to watch out, many C++ implementations use a single byte, COM Automation chose two bytes.
C does have a char type, the only guarantee is that it has at least 8 bits. Whether it is signed or unsigned is unspecified, most implementations today use signed. Support for an 8-bit byte is universal today on the kind of architectures that can execute managed code so char is always 8 bits in practice. So 1 was the logical choice.
That doesn't make you happy; nobody is happy about it, you can't support text written in an arbitrary language with an 8-bit character type. Unicode came about to solve the disaster of the many possible 8-bit encodings that were in use, but it did not have much of an effect on the C and C++ languages. Their committees did add wchar_t (wide character) to the standard but, in keeping with old practices, they did not nail down its size. Which made it useless, forcing C++ to later add char16_t and char32_t. It is however always 16 bits in compilers that target Windows, since that is the operating system's choice for characters (aka WCHAR). It is not in the various Unix flavors; they favor utf8.
That works well in C# too, you are not stuck with 1-byte characters. Every single type in the .NET Framework has an implicit [StructLayout] attribute with a CharSet property. The default is CharSet.Ansi, matching the C language default. You can however easily apply your own and pick CharSet.Unicode. You now get two bytes per character, using the utf16 encoding; the string is copied as-is since .NET also uses utf16. Making sure that the native code expects strings in that encoding is however up to you.
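For illustration, a small sketch (the struct names here are hypothetical) of how the CharSet choice changes the marshaled size of a char field:

using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Ansi)]
struct AnsiChar { public char Value; }

[StructLayout(LayoutKind.Sequential, CharSet = CharSet.Unicode)]
struct UnicodeChar { public char Value; }

class CharSetDemo
{
    static void Main()
    {
        Console.WriteLine(Marshal.SizeOf(typeof(AnsiChar)));    // 1 - char marshaled as one ANSI byte
        Console.WriteLine(Marshal.SizeOf(typeof(UnicodeChar))); // 2 - char marshaled as UTF-16
    }
}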

what is equal to the c++ size_t in c#

I have a struct in c++:
struct some_struct {
    uchar* data;
    size_t size;
};
I want to pass it between managed (C#) and native (C++) code. What is the equivalent of size_t in C#?
P.S. I need an exact match in size, because any byte difference will result in huge problems while wrapping.
EDIT:
Both the native and managed code are under my full control (I can edit whatever I want).
There is no C# equivalent to size_t.
The C# sizeof() operator always returns an int value regardless of platform, so technically the C# equivalent of size_t is int, but that's no help to you.
(Note that Marshal.SizeOf() also returns an int.)
Also note that no C# object can be larger than 2 GB in size as far as sizeof() and Marshal.SizeOf() are concerned. Arrays can be larger than 2 GB, but you cannot use sizeof() or Marshal.SizeOf() with arrays.
For your purposes, you will need to know what the version of code in the DLL uses for size_t and use the appropriate size integral type in C#.
One important thing to realise is that in C/C++ size_t will generally have the same number of bits as intptr_t but this is NOT guaranteed, especially for segmented architectures.
I know lots of people say "use UIntPtr", and that will normally work, but it's not GUARANTEED to be correct.
From the C/C++ definition of size_t: size_t is the unsigned integer type of the result of the sizeof operator.
The best equivalent for size_t in C# is the UIntPtr type. It's 32-bit on 32-bit platforms, 64-bit on 64-bit platforms, and unsigned.
You are better off using nint/nuint, which are wrappers around IntPtr/UIntPtr.
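Putting that together, a minimal sketch of a managed mirror of the struct in the question (assuming C# 9 or later for nuint, and assuming size_t is pointer-sized on your target platform, which is the common case but, as noted above, not guaranteed):

using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct SomeStruct
{
    public IntPtr data; // uchar* - an unmanaged pointer, pointer-sized
    public nuint size;  // size_t - pointer-sized and unsigned (use UIntPtr on older C#)
}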

Difference between Pointers in C# and C

I have this code in C:
long data1 = 1091230456;
*(double*)&((data1)) = 219999.02343875566
When I use the same code in C# the result is:
*(double*)&((data1)) = 5.39139480005278E-315
But if I define another variable in C#:
unsafe
{
    long *data2 = &(data1);
}
now:
*(double*)&((data2)) = 219999.02343875566
Why the difference?
Casting pointers is always tricky, especially when you don't have guarantees about the layout and size of the underlying types.
In C#, long is always a 64-bit integer and double is always 64-bit floating point number.
In C, long can easily end up being smaller than the 64-bits needed. If you're using a compiler that translates long as a 32-bit number, the rest of the value will be junk read from the next piece of memory - basically a "buffer" overflow.
On Windows, you usually want to use long long for 64-bit integers. Or better, use something like int64_t, where you're guaranteed to have exactly 64-bits of data. Or the best, don't cast pointers.
C integer types can be confusing if you have a Java / C# background. They give you guarantees about the minimal range they must allow, but that's it. For example, int must be able to hold values in the [−32767, +32767] range (note that it's not −32768 - C had to support one's complement machines, which had two zeroes), close to C#'s short. long must be able to hold values in the [−2147483647, +2147483647] range, close to C#'s int. Finally, long long is close to C#'s long, having at least the [−(2^63 − 1), +(2^63 − 1)] range. float and double are specified even more loosely.
Whenever you cast pointers, you throw away even the tiny bits of abstraction C provides you with - you work with the underlying hardware layouts, whatever those are. This is one road to hell and something to avoid.
Sure, these days you probably will not find one's complement numbers, or other floating points than IEEE 754, but it's still inherently unsafe and unpredictable.
EDIT:
Okay, reproducing your example fully in a way that actually compiles:
unsafe
{
    long data1 = 1091230456;
    long *data2 = &data1;
    var result = *(double*)&((data2));
}
result ends up being 219999.002675845 for me, close enough to make it obvious. Let's see what you're actually doing here, in more detail:
Store 1091230456 in a local data1
Take the address of data1, and store it in data2
Take the address of data2, cast it to a double pointer
Take the double value of the resulting pointer
It should be obvious that whatever value ends up in result has little relation to the value you stored in data1 in the first place!
Printing out the various parts of what you're doing will make this clearer:
unsafe
{
    long data1 = 1091230456;
    long *pData1 = &data1;
    var pData2 = &pData1;
    var pData2Double = (double*)pData2;
    var result = *pData2Double;

    new
    {
        data1 = data1,
        pData1 = (long)pData1,
        pData2 = (long)pData2,
        pData2Double = (long)pData2Double,
        result = result
    }.Dump(); // .Dump() is LINQPad's output helper
}
This prints out:
data1: 1091230456
pData1: 91941328
pData2: 91941324
pData2Double: 91941324
result: 219999.002675845
This will vary according to many environmental settings, but the critical part is that pData2 is pointing to memory four bytes in front of the actual data! This is because of the way the locals are allocated on stack - pData2 is pointing to pData1, not to data1. Since we're using 32-bit pointers here, we're reading the last four bytes of the original long, combined with the stack pointer to data1. You're reading at the wrong address, skipping over one indirection. To get back to the correct result, you can do something like this:
var pData2Double = (double**)pData2;
var result = *(*pData2Double);
This results in 5.39139480005278E-315 - the original value produced by your C# code. This is the more "correct" value, as far as there can even be a correct value.
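As an aside, if the goal is simply to reinterpret the bits of a 64-bit integer as a double, here is a sketch of the safe way to get that same value without any pointers:

using System;

class BitReinterpret
{
    static void Main()
    {
        long data1 = 1091230456;
        // Reinterprets the 64-bit pattern as an IEEE 754 double.
        double asDouble = BitConverter.Int64BitsToDouble(data1);
        Console.WriteLine(asDouble); // 5.39139480005278E-315
    }
}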
The obvious answer here is that your C code is wrong as well - either due to different operand semantics, or due to some bug in the code you're not showing (or again, using a 32-bit integer instead of 64-bit), you end up with a pointer to a pointer to the value you want, and you mistakenly build the resulting double on a scrambled value that includes part of the original long, as well as the stack pointer - in other words, exactly one of the reasons you should be extra cautious whenever using unsafe code. Interestingly, this also implies that when compiled as a 64-bit executable, the result will be entirely decoupled from the value of data1 - you'll have a double built on the stack pointer exclusively.
Don't mess with pointers until you understand indirection very, very well. They have a tendency to "mostly work" when used entirely wrong. Then you change a tiny part of the code (for example, in this code you could add a third local, which could change where pData1 is allocated) or move to a different architecture (32-bit vs. 64-bit is quite enough in this example), or a different compiler, or a different OS... and it breaks completely. You don't guess around your way with pointers. Either you know exactly what every single expression in the code means, or you shouldn't deal with pointers at all.

Can someone tell me what this crazy c++ statement means in C#?

First off, no, I am not a student... just a C# guy porting a C++ library.
What do these two crazy lines mean? What are they equivalent to in C#? I'm mostly concerned with the size_t and sizeof; I'm not concerned about static_cast or assert... I know how to deal with those.
size_t Index = static_cast<size_t>((y - 1620) / 2);
assert(Index < sizeof(DeltaTTable)/sizeof(double));
y is a double and DeltaTTable is a double[]. Thanks in advance!
size_t is a typedef for an unsigned integer type. It is used for sizes of things, and may be 32 or 64 bits in size. The particular size of a size_t is implementation defined, but it is unsigned.
I suppose in C# you could use a 64-bit unsigned integer type.
All sizeof does is return the size in bytes of a C++ type. Every type takes up a certain quantity of room, and sizeof returns that size.
What your code is doing is computing the number of doubles (64-bit floats) that the DeltaTTable takes up. Essentially, it's ensuring that the table is larger than some size based on y, whatever that is.
There is no equivalent of sizeof in C#, nor does it need it. There is no reason for you to port this code to C#.
The bad news first: you can't do that in C#. There's no static cast, only dynamic casts. However, the good news is it doesn't matter.
The two lines of code are asserting that the index is in bounds of the table so that the code won't accidentally read some arbitrary memory location. The CLR takes care of that for you, so when porting, just ignore those lines; the checks are there for you automatically anyway.
Of course, this is an assumption based on the pattern of the code; there's no information on what y represents or how Index is used.
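For what it's worth, a rough C# sketch of those two lines under that assumption (the table and variable names are hypothetical; a C# array carries its own length, so the sizeof arithmetic disappears):

using System;
using System.Diagnostics;

class DeltaTPort
{
    static void Main()
    {
        // Hypothetical table and input, mirroring the C++ snippet.
        double[] deltaTTable = new double[200];
        double y = 1950;

        // size_t Index = static_cast<size_t>((y - 1620) / 2);
        int index = (int)((y - 1620) / 2);

        // assert(Index < sizeof(DeltaTTable)/sizeof(double)); -> the element count is just Length here.
        Debug.Assert(index >= 0 && index < deltaTTable.Length);
        Console.WriteLine(index); // 165
    }
}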
sizeof calculates how much memory, in bytes, the DeltaTTable type takes.
There is no equivalent way to calculate the size like this in C#, AFAIK.
I guess size_t must be some typedef'd type in the C++ code.

Why do C# containers and GUI classes use int and not uint for size related members?

I usually program in C++, but for school I have to do a project in C#.
So I went ahead and coded like I was used to in C++, but was surprised when the compiler complained about code like the following:
const uint size = 10;
ArrayList myarray = new ArrayList(size); // Arg 1: cannot convert from 'uint' to 'int'
OK, they expect int as the argument type, but why? I would feel much more comfortable with uint as the argument type, because uint fits much better in this case.
Why do they use int as the argument type pretty much everywhere in the .NET library, even though for many cases negative numbers don't make any sense (since no container nor GUI element can have a negative size)?
If the reason they used int is that they didn't expect the average user to care about signedness, why didn't they additionally add overloads for uint?
Is this just MS not caring about sign correctness, or are there cases where negative values make some sense / carry some information (error codes?) for container/GUI widget/... sizes?
I would imagine that Microsoft chose Int32 because UInt32 is not CLS-compliant (in other words not all languages that use the .NET framework support unsigned integers).
Because unsigned integers are not CLS compliant. There are languages that are missing support for them, Java would be an example.
In addition to the answers talking about CLS compliance, consider that integer math (e.g. 10 + 2) results in integer (as in signed) data, which only makes sense; now consider the bother of having to cast every mathematical expression to uint to pass it to one of the methods you refer to.
As for overloads that take uint -- in many cases method arguments are stored as (or used to calculate) property values, which are usually of type int (again for CLS compliance, and possibly for integer math convenience); the discrepancy in sign would be confusing, not to mention vulnerable to overflow.
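A small sketch of the friction the previous paragraph describes:

using System;
using System.Collections;

class UintFriction
{
    static void Main()
    {
        uint size = 10;

        // An API that takes int forces a cast...
        var list = new ArrayList((int)size);

        // ...and mixed uint/int arithmetic promotes to long, forcing another one.
        int half = (int)(size / 2);

        Console.WriteLine("{0} {1}", list.Capacity, half); // 10 5
    }
}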
Stroustrup prefers int over "uint" in The C++ Programming Language, and I think his reasons apply to C# too:
It's about underflow with no warning:
// Unintended very large uint: wraps around to 4294967291 with no warning
uint five = 5, ten = 10;
uint oops = five - ten;
or:
// Unintended infinite loop
for( uint counter = 10; counter >= 0; counter-- )
; // Do something
The extra bit of info is rarely worth having to watch for these kinds of bugs.
This possibly comes a "little" late, but I just found the question and want to add a missing bit.
A prime example where negative values make perfect sense is in graphical frameworks. For sizes, as stated in the question, negatives are out of the question, but for position values it's perfectly acceptable to have negative values. Such values make objects appear off-screen, or at least partially cropped.
It follows the very same principle as in mathematics: negative coordinates just move points toward the opposite side from where the axis grows. Assuming that (0,0) is at the upper-left corner of the screen, negative values displace things to the left of and above that point, making them only partially visible.
This is useful, for example, if you want to implement a scrolling region where the contents are larger than the available space. All object positions simply become negative so they start to disappear off the top, or larger than the height so they disappear off the bottom.
Such things aren't limited to C#. WinForms and WPF use this, as per the question, but most other graphical environments behave the same way. HTML+CSS can place elements in the same way, and the C/C++ library SDL can also make use of this effect.
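For example, a minimal WinForms sketch (Windows-only, needs a reference to System.Windows.Forms) that positions a control at negative coordinates so it is partially cropped:

using System.Drawing;
using System.Windows.Forms;

class OffScreenDemo
{
    static void Main()
    {
        var form = new Form { Width = 300, Height = 200 };
        var button = new Button { Text = "Half visible" };

        // Negative coordinates place the control's top-left corner above and to the
        // left of the form's client area, so only part of it is drawn.
        button.Location = new Point(-30, -10);

        form.Controls.Add(button);
        Application.Run(form);
    }
}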
