Are the types implemented the exact same? This seems weird from a "type-safe" perspective. What's really going on here?
I came across this scenario when, in my own code, I was able to do what's shown with the "b" variable. Playing around with the different types, I roughly see what's happening, but I'm curious how this works under the hood.
Char a2 = '0';
UInt16 a = a2;
// Int16 a = '2'; // Compile time error
Int32 b = '1';
Int64 c = '2';
They are not defined in terms of each other, but conversions from one to the other are defined. See MSDN: C# Type Conversions Table
An implicit type conversion is defined from the char data type to:
ushort (alias of UInt16)
int
uint
long
ulong
float
double
decimal
No implicit conversion to short is defined, because such a conversion could lose data. A char is defined in C# as a "Unicode 16-bit character"; using one of those 16 bits for a sign would mean that char values above 32767 could no longer be represented.
See also Eric Lippert's blog post, Why does char convert implicitly to ushort but not vice versa?
UInt16 = ushort = char = 0..65535 = 2 bytes unsigned = 16 bits
So you can put a char into a UInt16 (a ushort).
Int16 = short != char = -32768..32767 = 2 bytes signed = 15 bits + 1 sign bit
So without a valid cast or conversion you can't put a char into an Int16, because there is not enough room to fit 16 value bits into 15.
It is like trying to pour a liter of water into a teacup: you get an overflow.
The default behavior in C# is not to throw an exception on overflow.
To get one, use the checked keyword.
Here the compiler does not let you assign a char to an Int16 because it knows exactly what you are trying to do and refuses to do it.
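To make the overflow behavior concrete, here is a minimal sketch of my own (not from the answer above; it assumes a C# 9+ console program with top-level statements):
using System;
char c = 'A';
// short bad = c;            // compile-time error: no implicit char -> short conversion
short ok = (short)c;         // an explicit cast compiles; 'A' (65) fits in a short
Console.WriteLine(ok);       // 65
int big = 40000;             // does not fit in short's range (-32768..32767)
short wrapped = unchecked((short)big);
Console.WriteLine(wrapped);  // -25536: the default (unchecked) behavior wraps silently
// checked((short)big) would instead throw System.OverflowException at run time.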
Learn the basics about C# data types and variables:
Unsigned and signed
Shifting Behavior for Signed Integers
A Tutorial on Data Representation
Integer Overflow
Binary Overflow
I'm surprised by C# compiler behavior in the following example:
int i = 1024;
uint x = 2048;
x = x+i; // error CS0266: Cannot implicitly convert type 'long' to 'uint' ...
That seems reasonable, since int + uint can overflow. However, if uint is changed to int, the error disappears, as if int + int could not overflow:
int i = 1024;
int x = 2048;
x = x+i; // OK, int
Moreover, uint + uint = uint:
uint i = 1024;
uint x = 2048;
x = x+i; // OK, uint
It seems totally obscure.
Why int + int = int and uint + uint = uint, but int + uint = long?
What is the motivation for this decision?
Why int + int = int and uint + uint = uint, but int + uint = long? What is the motivation for this decision?
The way the question is phrased implies the presupposition that the design team wanted int + uint to be long, and chose type rules to attain that goal. That presupposition is false.
Rather, the design team thought:
What mathematical operations are people most likely to perform?
What mathematical operations can be performed safely and efficiently?
What conversions between numeric types can be performed without loss of magnitude and precision?
How can the rules for operator resolution be made both simple and consistent with the rules for method overload resolution?
As well as many other considerations such as whether the design works for or against debuggable, maintainable, versionable programs, and so on. (I note that I was not in the room for this particular design meeting, as it predated my time on the design team. But I have read their notes and know the kinds of things that would have concerned the design team during this period.)
Investigating these questions led to the present design: that arithmetic operations are defined as int + int --> int, uint + uint --> uint, long + long --> long, int may be converted to long, uint may be converted to long, and so on.
A consequence of these decisions is that when adding uint + int, overload resolution chooses long + long as the closest match, and long + long is long, therefore uint + int is long.
Making uint + int have some other behavior that you might consider more sensible was not a design goal of the team at all, because mixing signed and unsigned values is, first, rare in practice and, second, almost always a bug. The design team could have added special cases for every combination of signed and unsigned one-, two-, four-, and eight-byte integers, as well as char, float, double and decimal, or any subset of those many hundreds of cases, but that works against the goal of simplicity.
So in short, on the one hand we have a large amount of design work to make a feature that we want no one to actually use easier to use at the cost of a massively complicated specification. On the other hand we have a simple specification that produces an unusual behavior in a rare case we expect no one to encounter in practice. Given those choices, which would you choose? The C# design team chose the latter.
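For what it's worth, a quick sketch of my own (assuming a C# 9+ console program with top-level statements) confirms the resolved type at run time:
using System;
int  i = 1024;
uint u = 2048;
Console.WriteLine((u + i).GetType());  // System.Int64: the long + long overload was chosen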
The short answer is "because the Standard says that it shall be so"; see the informative §14.2.5.2 of ISO 23270. The normative §13.1.2 (Implicit numeric conversions) says:
The implicit numeric conversions are:
...
From int to long, float, double, or decimal.
From uint to long, ulong, float, double, or decimal.
...
Conversions from int, uint, long or ulong to float and from long or ulong to double
can cause a loss of precision, but will never cause a loss of magnitude.
The other implicit numeric conversions never lose any information. (emph. mine)
The [slightly] longer answer is that you are adding two different types: a 32-bit signed integer and a 32-bit unsigned integer:
the domain of a signed 32-bit integer is -2,147,483,648 (0x80000000) — +2,147,483,647 (0x7FFFFFFF).
the domain of an unsigned 32-bit integer is 0 (0x00000000) — +4,294,967,295 (0xFFFFFFFF).
So the types aren't compatible, since an int can't contain any arbitrary uint and a uint can't contain any arbitrary int. They are implicitly converted (a widening conversion, per the requirement of §13.1.2 that no information be lost) to the next largest type that can contain both: a long in this case, a signed 64-bit integer, which has the domain -9,223,372,036,854,775,808 (0x8000000000000000) — +9,223,372,036,854,775,807 (0x7FFFFFFFFFFFFFFF).
Edited to note: just as an aside, executing this code:
var x = 1024 + 2048u ;
Console.WriteLine( "'x' is an instance of `{0}`" , x.GetType().FullName ) ;
does not yield a long as in the original poster's example. Instead, what is produced is:
'x' is an instance of `System.UInt32`
This is because of constant folding. The first element in the expression, 1024, has no suffix and as such is an int; the second element, 2048u, is a uint, according to the rules:
If the literal has no suffix, it has the first of these types in which its value
can be represented: int, uint, long, ulong.
If the literal is suffixed by U or u, it has the first of these types in which its value can be represented: uint, ulong.
And since the optimizer knows what the values are, the sum is precomputed and evaluated as a uint.
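As a small check of the constant-folding point, here is a sketch of my own (assuming a C# 9+ console program with top-level statements) contrasting constant and variable operands:
using System;
var folded = 1024 + 2048u;              // constant expression: 1024 fits in a uint, so uint + uint applies
Console.WriteLine(folded.GetType());    // System.UInt32
int i = 1024;
var unfolded = i + 2048u;               // non-constant int operand: long + long is chosen instead
Console.WriteLine(unfolded.GetType());  // System.Int64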
Consistency is the hobgoblin of little minds.
This is a manifestation of overload resolution for numeric types.
Numeric promotion consists of automatically performing certain implicit conversions of the operands of the predefined unary and binary numeric operators. Numeric promotion is not a distinct mechanism, but rather an effect of applying overload resolution to the predefined operators. Numeric promotion specifically does not affect evaluation of user-defined operators, although user-defined operators can be implemented to exhibit similar effects.
http://msdn.microsoft.com/en-us/library/aa691328(v=vs.71).aspx
If you have a look at
long operator *(long x, long y);
uint operator *(uint x, uint y);
from that link, you see those are two possible overloads (the example refers to operator *, but the same is true for operator +).
The uint is implicitly converted to a long for overload resolution, as is int.
From uint to long, ulong, float, double, or decimal.
From int to long, float, double, or decimal.
http://msdn.microsoft.com/en-us/library/aa691282(v=vs.71).aspx
What is the motivation for this decision?
It would likely take a member of the design team to answer that aspect. Eric Lippert, where are you? :-) Note though that @Nicolas's reasoning below is very plausible, that both operands are converted to the "smallest" type that can contain the full range of values for each operand.
I think the behavior of the compiler is pretty logical and expected.
In the following code:
int i;
int j;
var k = i + j;
There is an exact overload for this operation, so k is int. The same logic applies when adding two uint, two byte or what have you. The compiler's job is easy here; it's happy because overload resolution finds an exact match. There is a pretty good chance that the person writing this code expects k to be an int and is aware that the operation can overflow in certain circumstances.
Now consider the case you are asking about:
uint i;
int j;
var k = i + j;
What does the compiler see? Well it sees an operation that has no matching overload; there is no operator + overload that takes an int and a uint as its two operands. So the overload resolution algorithm goes ahead and tries to find an operator overload that can be valid. This means it has to find an overload where the types involved can "hold" the original operands; that is, both i and j have to be implicitly convertible to said type(s).
The compiler can't implicitly convert uint to int because such a conversion doesn't exist. It can't implicitly convert int to uint either, because that conversion doesn't exist either (both could cause a change in magnitude). So the only choice it really has is to pick the first broader type that can "hold" both operand types, which in this case is long. Once both operands are implicitly converted to long, k being long is obvious.
The motivation of this behavior is, IMO, to choose the safest available option and not second-guess the dubious coder's intent. The compiler cannot make an educated guess as to what the person writing this code expects k to be. An int? Well, why not a uint? Both options seem equally bad. The compiler chooses the only logical path, the safe one: long. If the coder wants k to be either int or uint, they only have to explicitly cast one of the operands.
And last, but not least, the C# compiler's overload resolution algorithm does not consider the return type when deciding the best overload. So the fact that you are storing the operation result in a uint is completely irrelevant to the compiler and has no effect whatsoever in the overload resolution process.
This is all speculation on my part, and I may be completely wrong. But it does seem logical reasoning.
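To illustrate the point about return types, a minimal sketch of my own (assuming a C# 9+ console program with top-level statements); the target of the assignment makes no difference to the error:
using System;
uint i = 1024;
int  j = 2048;
long ok = i + j;     // fine: the expression is already typed as long
// uint k = i + j;   // CS0266 even though the target is uint: the target type
//                   // plays no part in choosing the operator + overload
Console.WriteLine(ok);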
There are several ways to make the assignment compile:
int i = 1024;
uint x = 2048;
// Technique #1: throws OverflowException at run time if i is negative
x = x + Convert.ToUInt32(i);
// Technique #2: also throws OverflowException at run time if i is negative
x = x + checked((uint)i);
// Technique #3: never throws; a negative i silently wraps around
x = x + unchecked((uint)i);
// Technique #4: same as #3 in the default (unchecked) context
x = x + (uint)i;
The numerical promotion rules for C# are loosely based upon those of Java and C, which work by identifying a type to which both operands can be converted and then making the result the same type. I think such an approach was reasonable in the 1980s, but newer languages should set it aside in favor of one that looks at how values are used (e.g., if I were designing a language, then given Int32 i1,i2,i3,i4; Int64 l; a compiler would process i4=i1+i2+i3; using 32-bit math [throwing an exception in case of overflow] but would process l=i1+i2+i3; with 64-bit math), but the C# rules are what they are and don't seem likely to change.
It should be noted that the C# promotion rules by definition always select the overloads which are deemed "most suitable" by the language specification, but that doesn't mean they're really the most suitable for any useful purpose. For example, double f=1111111100/11111111.0f; would seem like it should yield 100.0, and it would be correctly computed if both operands were promoted to double, but the compiler will instead convert the integer 1111111100 to float yielding 1111111040.0f, and then perform the division yielding 99.999992370605469.
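The division example is easy to check; a minimal sketch of my own (assuming a C# 9+ console program with top-level statements):
using System;
double viaFloat  = 1111111100 / 11111111.0f;  // the int operand converts to float (1111111040.0f) first
double viaDouble = 1111111100 / 11111111.0;   // the int operand converts to double instead
Console.WriteLine(viaFloat);   // roughly 99.9999923706055
Console.WriteLine(viaDouble);  // 100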
Following is the C# code:
static void Main(string[] args)
{
    uint y = 12;
    int x = -2;
    if (x > y)
        Console.WriteLine("x is greater");
    else
        Console.WriteLine("y is greater");
}
and this is the C++ code:
int _tmain(int argc, _TCHAR* argv[])
{
    unsigned int y = 12;
    int x = -2;
    if (x > y)
        printf("x is greater");
    else
        printf("y is greater");
    return 0;
}
Both give different results. Am I missing something basic? Any idea?
C++ and C# are different languages. They have different rules for handling type promotion in the event of comparisons.
In C++ and C, they're usually compared as if they were both unsigned. This is called "unsigned preserving". C++ and C compilers traditionally use "unsigned preserving" and the use of this is specified in the C++ standard and in K&R.
In C#, they're both converted to signed longs and then compared. This is called "value preserving". C# specifies value preserving.
ANSI C also specifies value preserving, but only when dealing with shorts and chars. Shorts and chars (signed and unsigned) are upconverted to ints in a value-preserving manner and then compared. So if an unsigned short were compared to a signed short, the result would come out like the C# example. Any time a conversion to a larger size is done, it's done in a value-preserving manner, but if the two variables are the same size (and not shorts or chars) and either one is unsigned, then they get compared as unsigned quantities in ANSI C. There's a good discussion of the up and down sides of both approaches in the comp.lang.c FAQ.
In C++, when you compare an unsigned int and a signed int, the signed int is converted to unsigned int. Converting a negative signed int to an unsigned int is done by adding UINT_MAX + 1, which is larger than 12 and hence the result.
In C#, if you are getting the opposite result, it means that both expressions are being converted to signed long (long, i.e. System.Int64)¹ and then compared.
In C++, your compiler must have given you the warning:
warning: comparison between signed and unsigned integer expressions
Rule:
Always take warnings emitted by the compiler seriously!
¹ As rightly pointed out by svick in the comments.
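A small sketch of my own (assuming a C# 9+ console program with top-level statements) showing both halves of this:
using System;
uint y = 12;
int  x = -2;
Console.WriteLine(x > y);               // False: both operands are promoted to long, so this is -2L > 12L
Console.WriteLine(unchecked((uint)x));  // 4294967294: the value C/C++ effectively compares against 12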
I don't know about the C# standard, but in the C++ standard, the usual arithmetic conversions are applied to both operands of relational operators:
[... rules for enum and floating-point types omitted ...]
— Otherwise, the integral promotions (4.5) shall be performed on both operands.
Then the following rules shall be applied to the promoted operands:
— If both operands have the same type, no further conversion is needed.
— Otherwise, if both operands have signed integer types or both have unsigned integer types, the operand with the type of lesser integer conversion rank shall be converted to the type of the operand with greater rank.
— Otherwise, if the operand that has unsigned integer type has rank greater than or equal to the rank of the type of the other operand, the operand with signed integer type shall be converted to the type of the operand with unsigned integer type.
— Otherwise, if the type of the operand with signed integer type can represent all of the values of the type of the operand with unsigned integer type, the operand with unsigned integer type shall be converted to the type of the operand with signed integer type.
— Otherwise, both operands shall be converted to the unsigned integer type corresponding to the type of the operand with signed integer type.
Thus, when unsigned int is compared with int, int would be converted to unsigned int, and -2 would become a very large number when converted to unsigned int.
I'm a Java programmer trying to migrate to C#, and this gotcha has me slightly stumped:
int a = 1;
a = 0x08000000 | a;
a = 0x80000000 | a;
The first line compiles just fine. The second does not. It seems to recognise that there is a constant with a sign bit, and for some reason it decides to cast the result to a long, resulting in the error:
Cannot implicitly convert type 'long' to 'int'.
An explicit conversion exists (are you missing a cast?)
The fix I have so far is:
a = (int)(0x80000000 | a);
Which deals with the cast but still leaves a warning:
Bitwise-or operator used on a sign-extended operand;
consider casting to a smaller unsigned type first
What would be the correct C# way to express this in an error/warning/long-free way?
I find it interesting that in all these answers, only one person actually suggested doing what the warning says. The warning is telling you how to fix the problem; pay attention to it.
Bitwise-or operator used on a sign-extended operand; consider casting to a smaller unsigned type first
The bitwise or operator is being used on a sign-extended operand: the int. That's causing the result to be converted to a larger type: long. An unsigned type smaller than long is uint. So do what the warning says; cast the sign-extended operand -- the int -- to uint:
result = (int)(0x80000000 | (uint) operand);
Now there is no sign extension.
Of course this just raises the larger question: why are you treating a signed integer as a bitfield in the first place? This seems like a dangerous thing to do.
A numeric integer literal is an int by default, unless the number is too large to fit in an int, in which case it becomes a uint instead (and so on for long and ulong).
As the value 0x80000000 is too large to fit in an int, it's a uint value. When you use the | operator on an int and a uint, both are converted to long, as neither can be safely converted to the other.
The value can be represented as an int, but then you have to ignore that it becomes a negative value. The compiler won't do that silently, so you have to instruct it to make the value an int without caring about the overflow:
a = unchecked((int)0x80000000) | a;
(Note: This only instructs the compiler how to convert the value, so there is no code created for doing the conversion to int.)
Your issue is that 0x80000000 won't fit in an int (as an int it would be a negative value), so the literal is typed as a uint and the bitwise-or gets promoted to long.
It should work fine if you use a uint.
a = ((uint)0x80000000) | a; //assuming a is a uint
Changing that line to
(int)((uint)0x80000000 | (uint)a);
does it for me.
The problem you have here is that
0x80000000 is an unsigned integer literal. The specification says that an integer literal is of the first type in the list (int, uint, long, ulong) which can hold the literal. In this case it is uint.
a is probably an int
This causes the result to be a long. I don't see a nicer way other than casting the result back to int, unless you know that a can't be negative; then you can cast a to uint or declare it that way in the first place.
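A quick sketch of my own (assuming a C# 9+ console program with top-level statements) showing the literal-typing rule in action:
using System;
Console.WriteLine((0x7FFFFFFF).GetType());   // System.Int32:  fits in int
Console.WriteLine((0x80000000).GetType());   // System.UInt32: too big for int, so uint
Console.WriteLine((0x100000000).GetType());  // System.Int64:  too big for uint, so long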
You can make it less ugly by creating a constant that starts with the letter 'O'. (And unlike on this website, in Visual Studio it doesn't show up in a different color.)
const int Ox80000000 = unchecked((int)0x80000000);
/// ...
const int every /* */ = Ox80000000;
const int thing /* */ = 0x40000000;
const int lines /* */ = 0x20000000;
const int up /* */ = 0x10000000;
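With constants like these, the original expression can then be written without casts or warnings; a hypothetical usage of the names defined above:
int a = 1;
a = Ox80000000 | a;   // int | int -> int: same bit pattern as 0x80000000 | a, no cast or warning
a = every | thing;    // combining the named flags also stays an int throughout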