If I have two bytes a and b, how come:
byte c = a & b;
produces a compiler error about casting byte to int? It does this even if I put an explicit cast in front of a and b.
Also, I know about this question, but I don't really know how it applies here. This seems like it's a question of the return type of operator &(byte operand, byte operand2), which the compiler should be able to sort out just like any other operator.
Why do C#'s bitwise operators always return int regardless of the format of their inputs?
I disagree with "always". This works, and the result of a & b is of type long:
long a = 0xffffffffffff;
long b = 0xffffffffffff;
long x = a & b;
The return type is not int if one or both of the arguments are long, ulong or uint.
Why do C#'s bitwise operators return int if their inputs are bytes?
The result of byte & byte is an int because there is no & operator defined on byte. (Source)
An & operator exists for int and there is also an implicit cast from byte to int so when you write byte1 & byte2 this is effectively the same as writing ((int)byte1) & ((int)byte2) and the result of this is an int.
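As a minimal sketch of what that means in practice (the cast has to be applied to the whole expression, not to the individual operands):
byte a = 0x0F;
byte b = 0xF0;
// a & b is evaluated as int & int, so the int result must be cast back down:
byte c = (byte)(a & b);   // compiles; casting a and b individually does not help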
This behavior is a consequence of the design of IL, the intermediate language generated by all .NET compilers. While it supports the short integer types (byte, sbyte, short, ushort), it has only a very limited number of operations on them. Load, store, convert, create array, that's all. This is not an accident, those are the kind of operations you could execute efficiently on a 32-bit processor, back when IL was designed and RISC was the future.
The binary comparison and branch operations only work on int32, int64, native int, native floating point, object and managed reference. These operands are 32 or 64 bits wide on any current CPU core, ensuring the JIT compiler can generate efficient machine code.
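A quick way to observe this restriction from C# (a minimal sketch; the inferred types mirror the IL limitation described above):
short s1 = 1, s2 = 2;
sbyte y1 = 3, y2 = 4;
var a = s1 + s2;                     // a is int, not short
var b = y1 & y2;                     // b is int, not sbyte
Console.WriteLine(a.GetType());      // System.Int32
Console.WriteLine(b.GetType());      // System.Int32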
You can read more about it in ECMA-335, Partition I, chapter 12.1 and Partition III, chapter 1.5.
I wrote a more extensive post about this over here.
Binary operators are not defined for byte types (among others). In fact, all binary (numeric) operators act only on the following native types:
int
uint
long
ulong
float
double
decimal
If there are any other types involved, it will use one of the above.
It's all in the C# specs version 5.0 (Section 7.3.6.2):
Binary numeric promotion occurs for the operands of the predefined +, –, *, /, %, &, |, ^, ==, !=, >, <, >=, and <= binary operators. Binary numeric promotion implicitly converts both operands to a common type which, in case of the non-relational operators, also becomes the result type of the operation. Binary numeric promotion consists of applying the following rules, in the order they appear here:
If either operand is of type decimal, the other operand is converted to type decimal, or a compile-time error occurs if the other operand is of type float or double.
Otherwise, if either operand is of type double, the other operand is converted to type double.
Otherwise, if either operand is of type float, the other operand is converted to type float.
Otherwise, if either operand is of type ulong, the other operand is converted to type ulong, or a compile-time error occurs if the other operand is of type sbyte, short, int, or long.
Otherwise, if either operand is of type long, the other operand is converted to type long.
Otherwise, if either operand is of type uint and the other operand is of type sbyte, short, or int, both operands are converted to type long.
Otherwise, if either operand is of type uint, the other operand is converted to type uint.
Otherwise, both operands are converted to type int.
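As a minimal illustration of those promotion rules (the inferred types follow directly from the list above):
byte b1 = 1, b2 = 2;
short s = 3;
uint u = 4;
long l = 5;
var r1 = b1 & b2;   // both operands promoted to int    -> r1 is int
var r2 = s + b1;    // both operands promoted to int    -> r2 is int
var r3 = u | b1;    // byte promoted to uint            -> r3 is uint
var r4 = l ^ u;     // uint promoted to long            -> r4 is long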
It's because & is defined on integers, not on bytes, and the compiler implicitly casts your two arguments to int.
I'm surprised by the C# compiler's behavior in the following example:
int i = 1024;
uint x = 2048;
x = x+i; // error CS0266: Cannot implicitly convert type 'long' to 'uint' ...
That seems reasonable, since int + uint can overflow. However, if uint is changed to int, the error disappears, as if int + int could not overflow:
int i = 1024;
int x = 2048;
x = x+i; // OK, int
Moreover, uint + uint = uint:
uint i = 1024;
uint x = 2048;
x = x+i; // OK, uint
The behavior seems completely opaque to me.
Why int + int = int and uint + uint = uint, but int + uint = long?
What is the motivation for this decision?
Why int + int = int and uint + uint = uint, but int + uint = long? What is the motivation for this decision?
The way the question is phrased implies the presupposition that the design team wanted int + uint to be long, and chose type rules to attain that goal. That presupposition is false.
Rather, the design team thought:
What mathematical operations are people most likely to perform?
What mathematical operations can be performed safely and efficiently?
What conversions between numeric types can be performed without loss of magnitude and precision?
How can the rules for operator resolution be made both simple and consistent with the rules for method overload resolution?
As well as many other considerations such as whether the design works for or against debuggable, maintainable, versionable programs, and so on. (I note that I was not in the room for this particular design meeting, as it predated my time on the design team. But I have read their notes and know the kinds of things that would have concerned the design team during this period.)
Investigating these questions led to the present design: that arithmetic operations are defined as int + int --> int, uint + uint --> uint, long + long --> long, int may be converted to long, uint may be converted to long, and so on.
A consequence of these decisions is that when adding uint + int, overload resolution chooses long + long as the closest match, and long + long is long, therefore uint + int is long.
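To see that consequence directly (a minimal sketch; var infers long because the chosen overload is long + long):
int i = 1;
uint u = 2;
var sum = i + u;                      // overload resolution picks +(long, long)
Console.WriteLine(sum.GetType());     // System.Int64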
Making uint + int have some other behavior that you might consider more sensible was not a design goal of the team at all, because mixing signed and unsigned values is, first, rare in practice and, second, almost always a bug. The design team could have added special cases for every combination of signed and unsigned one-, two-, four-, and eight-byte integers, as well as char, float, double and decimal, or any subset of those many hundreds of cases, but that works against the goal of simplicity.
So in short, on the one hand we have a large amount of design work to make a feature that we want no one to actually use easier to use at the cost of a massively complicated specification. On the other hand we have a simple specification that produces an unusual behavior in a rare case we expect no one to encounter in practice. Given those choices, which would you choose? The C# design team chose the latter.
The short answer is "because the Standard says that it shall be so"; see the informative §14.2.5.2 of ISO/IEC 23270. The normative §13.1.2 (Implicit numeric conversions) says:
The implicit numeric conversions are:
...
From int to long, float, double, or decimal.
From uint to long, ulong, float, double, or decimal.
...
Conversions from int, uint, long or ulong to float and from long or ulong to double can cause a loss of precision, but will never cause a loss of magnitude.
The other implicit numeric conversions never lose any information. (emph. mine)
The [slightly] longer answer is that you are adding two different types: a 32-bit signed integer and a 32-bit unsigned integer:
the domain of a signed 32-bit integer is -2,147,483,648 (0x80000000) — +2,147,483,647 (0x7FFFFFFF).
the domain of an unsigned 32-bit integer is 0 (0x00000000) — +4,294,967,295 (0xFFFFFFFF).
So the types aren't compatible, since an int can't hold an arbitrary uint and a uint can't hold an arbitrary int. They are implicitly converted (a widening conversion, per the requirement of §13.1.2 that no information be lost) to the next larger type that can contain both: in this case long, a signed 64-bit integer, which has the domain -9,223,372,036,854,775,808 (0x8000000000000000) — +9,223,372,036,854,775,807 (0x7FFFFFFFFFFFFFFF).
Edited to note: just as an aside, executing this code:
var x = 1024 + 2048u;
Console.WriteLine("'x' is an instance of `{0}`", x.GetType().FullName);
does not yield a long the way the original poster's example does. Instead, what is produced is:
'x' is an instance of `System.UInt32`
This is because of constant folding. The first element in the expression, 1024, has no suffix and as such is an int, and the second element in the expression, 2048u, is a uint, according to the rules:
If the literal has no suffix, it has the first of these types in which its value can be represented: int, uint, long, ulong.
If the literal is suffixed by U or u, it has the first of these types in which its value can be represented: uint, ulong.
And since the optimizer knows what the values are, the sum is precomputed and evaluated as a uint.
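For contrast, a minimal sketch with non-constant operands, where no folding can occur and the promotion to long does happen:
int i = 1024;
uint u = 2048u;
var y = i + u;   // operands are variables, so the +(long, long) overload is chosen
Console.WriteLine("'y' is an instance of `{0}`", y.GetType().FullName);
// prints: 'y' is an instance of `System.Int64`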
Consistency is the hobgoblin of little minds.
This is a manifestation of overload resolution for numeric types:
Numeric promotion consists of automatically performing certain implicit conversions of the operands of the predefined unary and binary numeric operators. Numeric promotion is not a distinct mechanism, but rather an effect of applying overload resolution to the predefined operators. Numeric promotion specifically does not affect evaluation of user-defined operators, although user-defined operators can be implemented to exhibit similar effects.
http://msdn.microsoft.com/en-us/library/aa691328(v=vs.71).aspx
If you have a look at
long operator *(long x, long y);
uint operator *(uint x, uint y);
from that link, you see those are two possible overloads (the example refers to operator *, but the same is true for operator +).
The uint is implicitly converted to a long for overload resolution, as is int.
From uint to long, ulong, float, double, or decimal.
From int to long, float, double, or decimal.
http://msdn.microsoft.com/en-us/library/aa691282(v=vs.71).aspx
What is the motivation for this decision?
It would likely take a member of the design team to answer that aspect. Eric Lippert, where are you? :-) Note though that @Nicolas's reasoning below is very plausible, that both operands are converted to the "smallest" type that can contain the full range of values for each operand.
I think the behavior of the compiler is pretty logical and expected.
In the following code:
int i;
int j;
var k = i + j;
There is an exact overload for this operation, so k is int. The same logic applies when adding two uints, two bytes, or what have you. The compiler's job is easy here; it's happy because overload resolution finds an exact match. There is a pretty good chance that the person writing this code expects k to be an int and is aware that the operation can overflow in certain circumstances.
Now consider the case you are asking about:
uint i;
int j;
var k = i + j;
What does the compiler see? Well, it sees an operation that has no matching overload; there is no operator + overload that takes an int and a uint as its two operands. So the overload resolution algorithm tries to find an overload that can be valid. This means it has to find an overload whose parameter types can "hold" the original operands; that is, both i and j have to be implicitly convertible to said type(s).
The compiler can't implicitly convert uint to int because such a conversion doesn't exist. It can't implicitly convert int to uint either, because that conversion doesn't exist either (both could cause a change in magnitude). So the only choice it really has is the first broader type that can "hold" both operand types, which in this case is long. Once both operands are implicitly converted to long, k being long is obvious.
The motivation for this behavior is, IMO, to choose the safest available option and not second-guess the dubious coder's intent. The compiler cannot make an educated guess as to what the person writing this code expects k to be. An int? Well, why not a uint? Both options seem equally bad. The compiler chooses the only logical path, the safe one: long. If the coder wants k to be either int or uint, he only has to explicitly cast one of the operands.
And last, but not least, the C# compiler's overload resolution algorithm does not consider the return type when deciding the best overload. So the fact that you are storing the operation result in a uint is completely irrelevant to the compiler and has no effect whatsoever in the overload resolution process.
This is all speculation on my part, and I may be completely wrong. But it does seem logical reasoning.
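A minimal sketch of that last point, reusing the declarations from the question: the target type of the assignment never changes what the expression's type is inferred to be.
uint i = 1024;
int j = 2048;
uint k = i + j;   // still an error (CS0266): i + j is typed long no matter what k is
long m = i + j;   // fine: the result type already matches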
If you want the result to stay uint, cast the int operand explicitly; any of the following compiles:
int i = 1024;
uint x = 2048;
// Technique #1
x = x + Convert.ToUInt32(i);
// Technique #2
x = x + checked((uint)i);
// Technique #3
x = x + unchecked((uint) i);
// Technique #4
x = x + (uint)i;
The numerical promotion rules for C# are loosely based upon those of Java and C, which work by identifying a type to which both operands can be converted and then making the result the same type. I think such an approach was reasonable in the 1980s, but newer languages should set it aside in favor of one that looks at how values are used (e.g. if I were designing a language, then given Int32 i1,i2,i3,i4; Int64 l; a compiler would process i4=i1+i2+i3; using 32-bit math [throwing an exception in case of overflow] and would process l=i1+i2+i3; with 64-bit math), but the C# rules are what they are and don't seem likely to change.
It should be noted that the C# promotion rules by definition always select the overloads which are deemed "most suitable" by the language specification, but that doesn't mean they're really the most suitable for any useful purpose. For example, double f=1111111100/11111111.0f; would seem like it should yield 100.0, and it would be correctly computed if both operands were promoted to double, but the compiler will instead convert the integer 1111111100 to float yielding 1111111040.0f, and then perform the division yielding 99.999992370605469.
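A minimal sketch of that example, contrasting the int-to-float promotion with an int-to-double promotion (the value in the first comment is the result quoted above):
double f1 = 1111111100 / 11111111.0f;   // int promoted to float first: ~99.999992370605469
double f2 = 1111111100 / 11111111.0;    // int promoted to double: exactly 100.0
Console.WriteLine(f1);
Console.WriteLine(f2);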
As per the MSDN article: https://msdn.microsoft.com/en-us/library/8s682k58%28v=vs.80%29.aspx
Explicit conversion is required by some compilers to support narrowing conversions.
Based on the statements made in the above MSDN link, is it safe to say that:
converting int to uint is a narrowing conversion, and
converting uint to int is also a narrowing conversion?
Converting an integral or floating point type from signed to unsigned and vice-versa is neither a narrowing nor a widening conversion, since the number of bits used to store it remains unchanged.
Instead, it is a change of representation and can utterly change the number (e.g. signed -1 is converted to unsigned 0xffffffff).
In fact, if you use unchecked arithmetic:
unchecked
{
int x = -1;
uint y = (uint) x;
int z = (int) y;
Debug.Assert(x == z); // Will succeed for all x.
uint a = 0xffffffff;
int b = (int) a;
uint c = (uint) b;
Debug.Assert(a == c); // Will succeed for all a.
}
So a round trip works in both directions, which proves that narrowing does not occur in either direction.
Narrowing and widening apply only to integral and floating-point types where a different number of bits is used to store one or more parts of the number.
However, because the number's value can be changed, you must cast to make such a conversion, to ensure that you don't do it accidentally.
It is not narrowing, but you must still be explicit. Both can hold values that the other can't: int can hold negative values, which uint can't, and uint can hold values greater than 2147483647, which is the maximum value of an int.
However, both int and uint use 32 bits (= 4 bytes) to hold information.
When you convert an int to a byte, you will lose information, because a byte can only hold 8 bits.
Converting between int and uint doesn't lose any bits, but in half of the cases the bits take on a different meaning.
You must be explicit about it to indicate that you know what you are doing. It will prevent mistakes that would be caused by an implicit cast.
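A small sketch of that point (hypothetical values, chosen to show the reinterpretation):
int negative = -1;
uint reinterpreted = (uint)negative;     // 4294967295: same 32 bits, different meaning
// uint oops = negative;                 // does not compile: no implicit conversion
uint big = 3000000000u;
int wrapped = unchecked((int)big);       // -1294967296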