Why do the SByte and Int32 CompareTo() methods behave differently?

If you run the following code:
SByte w = -5;
Console.WriteLine(w.CompareTo(0));
Int32 x = -5;
Console.WriteLine(x.CompareTo(0));
SByte y = 5;
Console.WriteLine(y.CompareTo(0));
Int32 z = 5;
Console.WriteLine(z.CompareTo(0));
then you get the following output:
-5
-1
5
1
Why do these methods with the same name that have almost identical descriptions in the MSDN documentation behave so differently?

Because SByte.CompareTo() is implemented like
return m_value - value;
so it is a simple subtraction. This works because m_value is automatically converted to int, and every possible combination of sbyte values is "legal" when subtracted as ints.
With two Int32 values this can't be done, because, for example, Int32.MinValue.CompareTo(Int32.MaxValue) would become Int32.MinValue - Int32.MaxValue, which falls outside the int range (see the short example after the code below), so it is instead implemented as two comparisons:
if (m_value < value) return -1;
if (m_value > value) return 1;
return 0;
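To see the overflow concern concretely, here is a tiny illustration (mine, not part of the original answer): with plain subtraction the wrapped result can even come out with the wrong sign.
// If Int32.CompareTo used the subtraction trick, the sign could be wrong:
int result = unchecked(int.MinValue - int.MaxValue);
Console.WriteLine(result);   // prints 1 -- positive, even though int.MinValue < int.MaxValue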
in general
The only important "thing" about the return value of a CompareTo is its sign (or whether it is 0). The "value" itself is irrelevant: return values of 1, 5, 500, 5000 or 5000000 from CompareTo() all mean the same thing. CompareTo can't be used to measure the "distance" between numbers, so both implementations are equivalent.
It is totally wrong to do:
if (someValue.CompareTo(someOtherValue) == -1)
you must always write
if (someValue.CompareTo(someOtherValue) < 0)
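For example (a quick illustration, not from the original post): with SByte the returned "value" is the raw difference, so testing for exactly +1 or -1 silently gives the wrong answer, while testing the sign always works.
sbyte a = 5, b = 0;
Console.WriteLine(a.CompareTo(b) == 1);   // False -- CompareTo returned 5
Console.WriteLine(a.CompareTo(b) > 0);    // True  -- checking the sign is always correct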
why the SByte.CompareTo is built that way
SByte.CompareTo implements a "branchless" comparison (there are no ifs in the code; the flow of the code is linear). Processors have trouble with branches, so branchless code can be faster than "branchful" code, hence this micro-optimization. Clearly SByte.CompareTo could have been written in the same way as Int32.CompareTo.
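For the curious, here is one way a branchless three-way compare for two ints could be sketched (an illustration of the idea only, with made-up method names; it is not how the BCL implements Int32.CompareTo): widen to long so the subtraction cannot overflow, then extract the sign without any if.
static int CompareIntBranchless(int a, int b)
{
    long diff = (long)a - b;   // exact: cannot overflow in 64 bits
    // (diff >> 63) is -1 for negative diff and 0 otherwise;
    // the second term is 1 for positive diff and 0 otherwise.
    return (int)(diff >> 63) | (int)((ulong)(-diff) >> 63);
}
The sbyte version gets the same effect for free, because both operands are promoted to int before the subtraction, so the difference can never overflow.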
why any negative value is equivalent to -1 (and any positive value is equivalent to +1)
This is probably something derived directly from the C language: the qsort function, for example, compares items using a user-defined function that is described like this:
Pointer to a function that compares two elements.
This function is called repeatedly by qsort to compare two elements. It shall follow the following prototype:
int compar (const void* p1, const void* p2);
Taking two pointers as arguments (both converted to const void*). The function defines the order of the elements by returning (in a stable and transitive manner):
return value meaning
<0 The element pointed to by p1 goes before the element pointed to by p2
0 The element pointed to by p1 is equivalent to the element pointed to by p2
>0 The element pointed to by p1 goes after the element pointed to by p2
how is the .CompareTo implemented in other primitive types?
SByte, Byte, Int16, UInt16, Char all use the subtraction "method", while Int32, UInt32, Int64, UInt64 all use the if "method".

Looking at the source for these two methods, they are implemented differently:
public int CompareTo(sbyte value)
{
    return (int)(this - value);
}
vs
public int CompareTo(int value)
{
    if (this < value)
    {
        return -1;
    }
    if (this > value)
    {
        return 1;
    }
    return 0;
}
But none of this matters, since the sign of the returned value is the only thing that you should be checking.

Related

Unchecked assignment in generic function

I'm trying to write a generic method to perform an unchecked assignment from a long to other type. Here is the simplified version:
private static void AssignHex<T>(string hex, out T val) where T : struct
{
    if (long.TryParse(hex, NumberStyles.AllowHexSpecifier, null, out long lval))
    {
        unchecked
        {
            val = (T)Convert.ChangeType(lval, typeof(T));
        }
    }
    else
    {
        val = default(T);
    }
}
This works fine except for the input string "FFFFFFFF" with type int, where I expect to get -1 but instead get an OverflowException. The equivalent non-generic method works fine:
private static void AssignHex2(string hex, out int val)
{
    if (long.TryParse(hex, NumberStyles.AllowHexSpecifier, null, out long lval))
    {
        unchecked
        {
            val = (int)lval;
        }
    }
    else
    {
        val = default(int);
    }
}
I can simply write non-generics but it bothers me that I can't get the generic version to work. Any solution?
While System.Convert.ChangeType is really handy a lot of the time, you cannot use it in your scenario. FFFFFFFF in decimal is 4294967295, which cannot be represented by the int type, so the conversion fails. The specific code that throws is the range check inside Convert that rejects long values larger than int.MaxValue.
If you want this functionality, then you will have to manually write code similar to what System.Convert.ChangeType does internally: if statements on the different possible types, with the appropriate casts wrapped in an unchecked block. Or simply don't use generics and provide an overload for each type you are interested in.
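A rough sketch of what that could look like (the method name, the set of handled types and the fallback are my own assumptions, not code taken from Convert.ChangeType):
// Requires using System.Globalization; for NumberStyles.
private static void AssignHexManual<T>(string hex, out T val) where T : struct
{
    val = default(T);
    if (!long.TryParse(hex, NumberStyles.AllowHexSpecifier, null, out long lval))
        return;

    unchecked
    {
        if (typeof(T) == typeof(int))
            val = (T)(object)(int)lval;        // keeps only the low 32 bits
        else if (typeof(T) == typeof(short))
            val = (T)(object)(short)lval;      // keeps only the low 16 bits
        else if (typeof(T) == typeof(byte))
            val = (T)(object)(byte)lval;       // keeps only the low 8 bits
        else
            val = (T)Convert.ChangeType(lval, typeof(T));   // fallback for the remaining types
    }
}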
EDIT: It may be better to remove the unchecked blocks altogether and parse the hex value directly into the appropriate type instead of parsing it to long first. That way you parse directly to the expected value and get an error if the value is out of range, instead of silently discarding the leftover bits with an unchecked block.
EDIT2: Negative numbers are represented using two's complement. In short, you get a negative number from a positive one by flipping all the bits of the positive representation and adding 1. This means the binary or hex representation of a negative number depends on how many bits are allocated for the number: for an 8-bit number (sbyte) -1 is 0xFF, for a 16-bit number (short) it is 0xFFFF, for a 32-bit number (int) it is 0xFFFFFFFF, and for a 64-bit number (long) it is 0xFFFFFFFFFFFFFFFF. Since you are parsing the hex value 0xFFFFFFFF as a long, you are actually parsing 0x00000000FFFFFFFF, which is not -1. What your unchecked block does when casting to a lower-precision type is take as many bits as that type needs and discard the rest without any checks. Imagine now that you have 0xF0000000FFFFFFFF: parsed as a long it is roughly -1.15 quintillion, but an unchecked cast to int would give you -1, totally ignoring the nonzero bits in the upper half.
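To make the bit-discarding concrete, a small illustration (the 0xF0000000FFFFFFFF value comes from the paragraph above; the rest is mine):
long parsed = long.Parse("FFFFFFFF", NumberStyles.AllowHexSpecifier);
Console.WriteLine(parsed);                   // 4294967295, not -1
Console.WriteLine(unchecked((int)parsed));   // -1: only the low 32 bits survive the cast

long big = unchecked((long)0xF0000000FFFFFFFFUL);
Console.WriteLine(big);                      // -1152921500311879681
Console.WriteLine(unchecked((int)big));      // -1: the nonzero high bits are silently dropped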
Ok. Some pain was involved, but here is the answer:
private static bool AssignHex4<T>(string hex, out T val)
{
    Type t = typeof(T);
    MethodInfo mi = t.GetMethod("TryParse", new Type[] { typeof(string), typeof(NumberStyles), typeof(IFormatProvider), typeof(T).MakeByRefType() });
    if (mi != null)
    {
        object[] parameters = new object[] { hex, NumberStyles.AllowHexSpecifier, null, null };
        object result = mi.Invoke(null, parameters);
        if ((bool)result)
        {
            val = (T)parameters[3];
            return true;
        }
    }
    val = default(T);
    return false;
}
Shout outs are required to a few SO answers that were instrumental in putting this together:
MethodInfo.Invoke with out Parameter
Specifying out params for Type.GetMethod
I'll admit it is not a particularly elegant answer, but I think it is comprehensive and that is what I was looking for.
Consider...
AssignHex4("FF", out byte r1);
AssignHex4("FFFF", out short r2);
AssignHex4("FFFFFFFF", out int r3);
AssignHex4("FFFFFFFFFFFFFFFF", out long r4);
AssignHex4("FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF", out BigInteger r5);
Console.WriteLine("Convert in reflection function returns:" + r1 + ", " + r2 + ", " + r3 + ", " + r4 + ", " + r5);
results in...
Convert in reflection function returns:255, -1, -1, -1, -1

Compare two integer objects for equality regardless of type

I'm wondering how you could compare two boxed integers (either can be signed or unsigned) to each other for equality.
For instance, take a look at this scenario:
// case #1
object int1 = (int)50505;
object int2 = (int)50505;
bool success12 = int1.Equals(int2); // this is true. (pass)
// case #2
int int3 = (int)50505;
ushort int4 = (ushort)50505;
bool success34 = int3.Equals(int4); // this is also true. (pass)
// case #3
object int5 = (int)50505;
object int6 = (ushort)50505;
bool success56 = int5.Equals(int6); // this is false. (fail)
I'm stumped on how to reliably compare boxed integer types this way. I won't know what they are until runtime, and I can't just cast them both to long, because one could be a ulong. I also can't just convert them both to ulong because one could be negative.
The best idea I could come up with is to just trial-and-error-cast until I can find a common type or can rule out that they're not equal, which isn't an ideal solution.
In case 2, you actually end up calling int.Equals(int), because ushort is implicitly convertible to int. This overload resolution is performed at compile-time. It's not available in case 3 because the compiler only knows the type of int5 and int6 as object, so it calls object.Equals(object)... and it's natural that object.Equals will return false if the types of the two objects are different.
You could use dynamic typing to perform the same sort of overload resolution at execution time - but you'd still have a problem if you tried something like:
dynamic x = 10;
dynamic y = (long) 10;
Console.WriteLine(x.Equals(y)); // False
Here there's no overload that will handle long, so it will call the normal object.Equals.
One option is to convert the values to decimal:
object x = (int) 10;
object y = (long) 10;
decimal xd = Convert.ToDecimal(x);
decimal yd = Convert.ToDecimal(y);
Console.WriteLine(xd == yd);
This will handle comparing ulong with long as well.
I've chosen decimal as it can exactly represent every value of every primitive integer type.
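Putting that together, a minimal helper along these lines could look like this (the method name is mine, not a BCL API):
// Compares two boxed integer values of possibly different primitive types.
static bool IntegersEqual(object a, object b)
{
    // decimal can exactly represent every value of every primitive integer type,
    // so converting both sides first makes == meaningful across types.
    return Convert.ToDecimal(a) == Convert.ToDecimal(b);
}

object int5 = (int)50505;
object int6 = (ushort)50505;
Console.WriteLine(IntegersEqual(int5, int6));   // True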
An integer is a value type. When you compare two unboxed integers, the compiler compares their values.
object is a reference type, so when both variables are declared as object the call goes through Equals(object).
The interesting part is here:
object int5 = (int)50505;
The compiler performs a boxing operation, wrapping the value type in a reference type. The boxed int's Equals(object) override first checks that the argument is also a boxed int; since int6 holds a boxed ushort, it returns false without ever comparing the numeric values.
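One way to see that it is the type mismatch rather than the boxing itself (a small illustration, not part of the original answer): unbox each value back to its own type and the comparison succeeds again.
object int5 = (int)50505;
object int6 = (ushort)50505;
bool equal = (int)int5 == (ushort)int6;   // true: after unboxing, the ushort is widened to int for ==
Console.WriteLine(equal);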

Calculate Length in bytes of an object [duplicate]

I would normally do this in my C++ code:
int variable = 10;
int sizeOfVariable = sizeof(variable); //Returns 4 for 32-bit process
But that doesn't seem to work for C#. Is there an analog?
The sizeof operator in C# works only on compile-time known types, not on variables (instances).
The correct example would be
int variable = 10;
int sizeOfVariable = sizeof(int);
So you are probably looking for Marshal.SizeOf, which can be used on any object instance or runtime type.
int variable = 10;
int sizeOfVariable = Marshal.SizeOf(variable);
See here for more information
.NET 4.0 onwards:
if (Environment.Is64BitProcess)
Console.WriteLine("64-bit process");
else
Console.WriteLine("32-bit process");
Older versions of .NET framework:
public static bool Is64BitProcess
{
get { return IntPtr.Size == 8; }
}
(From your example I'm assuming you want to do this to determine the bitness of the process, which may in fact not be what you are trying to do!)
There are only a few standard situations where you'll want to do it:
int x = sizeof(T) // where T is a generic type
sadly it doesn't work :-)
int x = Marshal.SizeOf(T) // where T is a generic type
it does work except for char and bool (Marshal.SizeOf(typeof(char)) == 1 instead of 2, Marshal.SizeOf(typeof(bool)) == 4 instead of 1)
int x = sizeof(IntPtr);
it doesn't work, but you can do it as
int x = Marshal.SizeOf(typeof(IntPtr));
or, better
int x = IntPtr.Size;
All the other basic types (byte, sbyte, short, ushort, int, uint, long, ulong, float, double, decimal, bool, char) have a fixed length, so you can do sizeof(int) and it will always be 4.
You can use the Marshal.SizeOf() method, or use the sizeof operator in unsafe code:
Console.WriteLine(Marshal.SizeOf(typeof(int)));
This prints 4 on ideone.
Here is a link to Eric Lippert's blog describing the difference between the two sizeof options.
You can use sizeof on user-defined structs in unsafe contexts but unlike Marshal.SizeOf it does not support boxed objects
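A short sketch of that difference (the struct name and layout are just an example):
using System;
using System.Runtime.InteropServices;

struct MyPoint   // arbitrary example struct: two 4-byte fields
{
    public int X;
    public int Y;
}

class SizeDemo
{
    static unsafe void Main()   // sizeof on a user-defined struct needs an unsafe context
    {
        Console.WriteLine(sizeof(MyPoint));                         // 8: compile-time managed size
        Console.WriteLine(Marshal.SizeOf(typeof(MyPoint)));         // 8: works on a runtime Type
        Console.WriteLine(Marshal.SizeOf((object)new MyPoint()));   // 8: also accepts a boxed instance
    }
}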

int promotion to unsigned int in C and C#

Have a look at this C code:
#include <stdio.h>

int main()
{
    unsigned int y = 10;
    int x = -2;
    if (x > y)
        printf("x is greater");
    else
        printf("y is greater");
    return 0;
}
/* Output: x is greater. */
I understand why the output is x is greater, because when the computer compares both of them, x is promoted to an unsigned integer type.
When x is converted to unsigned, -2 wraps around to a huge value (65534 with a 16-bit int, 4294967294 with a typical 32-bit int), which is definitely greater than 10.
But why in C#, does the equivalent code give the opposite result?
public static void Main(String[] args)
{
    uint y = 10;
    int x = -2;
    if (x > y)
    {
        Console.WriteLine("x is greater");
    }
    else
    {
        Console.WriteLine("y is greater");
    }
}
//Output: y is greater.
In C#, both uint and int get promoted to a long before the comparison.
This is documented in 4.1.5 Integral types of the C# language spec:
For the binary +, –, *, /, %, &, ^, |, ==, !=, >, <, >=, and <= operators, the operands are converted to type T, where T is the first of int, uint, long, and ulong that can fully represent all possible values of both operands. The operation is then performed using the precision of type T, and the type of the result is T (or bool for the relational operators). It is not permitted for one operand to be of type long and the other to be of type ulong with the binary operators.
Since long is the first type that can fully represent all int and uint values, the variables are both converted to long, then compared.
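If you actually want the C behaviour in C#, you have to ask for it explicitly with a cast; a quick illustration (mine, not from the original answer):
uint y = 10;
int x = -2;
Console.WriteLine(x > y);         // False: both operands are promoted to long first
Console.WriteLine((uint)x > y);   // True:  -2 reinterpreted as 4294967294, like in the C example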
In C#, in a comparison between an int and uint, both values are promoted to long values.
"Otherwise, if either operand is of type uint and the other operand is of type sbyte, short, or int, both operands are converted to type long."
http://msdn.microsoft.com/en-us/library/aa691330(v=vs.71).aspx
C and C# have differing views for what integral types represent. See my answer https://stackoverflow.com/a/18796084/363751 for some discussion about C's view. In C#, whether integers represent numbers or members of an abstract algebraic ring is determined to some extent by whether "checked arithmetic" is turned on or off, but that simply controls whether out-of-bounds computation should throw exceptions. In general, the .NET framework regards all integer types as representing numbers, and aside from allowing some out-of-bounds computations to be performed without throwing exceptions C# follows its lead.
If unsigned types represent members of an algebraic ring, adding e.g. -5 to an unsigned 2 should yield an unsigned value which, when added to 5, will yield 2. If they represent numbers, then adding a -5 to an unsigned 2 should if possible yield a representation of the number -3. Since promoting the operands to Int64 will allow that to happen, that's what C# does.
Incidentally, I dislike the notion that operators (especially relational operators!) should always work by promoting their operands to a common compatible type, should return a result of that type, and should accept without squawking any combination of operands which can be promoted to a common type. Given float f; long l;, there are at least three sensible meanings for a comparison f==l [it could cast l to float, it could cast l and f to double, or it could ensure that f is a whole number which can be cast to long, and that when cast it equals l]. Alternatively, a compiler could simply reject such a mixed comparison. If I had my druthers, compilers would be enjoined from casting the operands to relational operators except in cases where there was only one plausible meaning. Requiring that things which are implicitly convertible everywhere must be directly comparable is IMHO unhelpful.

Why are negative enum members enumerated last by foreach?

In C#, if we define an enum that contains a member corresponding to a negative value, and then we iterate over that enum's values, the negative value comes last, not first. Why does that happen? In other languages (C, C++, Ada, etc.), iterating over an enum gives you the order in which it was defined.
MSDN has a good example of this behavior:
using System;

enum SignMagnitude { Negative = -1, Zero = 0, Positive = 1 };

public class Example
{
    public static void Main()
    {
        foreach (var value in Enum.GetValues(typeof(SignMagnitude)))
        {
            Console.WriteLine("{0,3} 0x{0:X8} {1}",
                              (int)value, ((SignMagnitude)value));
        }
    }
}
// The example displays the following output:
//   0 0x00000000 Zero
//   1 0x00000001 Positive
//  -1 0xFFFFFFFF Negative
From the very documentation page you link to, my emphasis:
The elements of the array are sorted by the binary values of the enumeration constants (that is, by their unsigned magnitude).
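A quick way to see that "unsigned magnitude" ordering for yourself (the cast chain is my own illustration):
foreach (var value in Enum.GetValues(typeof(SignMagnitude)))
{
    uint unsignedBits = unchecked((uint)(int)value);
    Console.WriteLine("{0,-8}  0x{1:X8}", value, unsignedBits);
}
// Zero      0x00000000
// Positive  0x00000001
// Negative  0xFFFFFFFF   <-- largest unsigned value, so it enumerates last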
Digging into the CLR code (the 2.0 SSCLI) and getting far lower-level than I'm really comfortable with, it looks like ultimately this is because internally enum values are stored in something that looks like this (note this is C++):
class EnumEEClass : public EEClass
{
    friend class EEClass;

private:
    DWORD m_countPlusOne; // biased by 1 so zero can be used as uninit flag
    union
    {
        void    *m_values;
        BYTE    *m_byteValues;
        USHORT  *m_shortValues;
        UINT    *m_intValues;
        UINT64  *m_longValues;
    };
    LPCUTF8 *m_names;
As can be seen, it's unsigned types that hold the actual values - so when these values are emitted for enumeration, naturally they are in their unsigned order.
